Extracted Concepts List

This section is a short guide to using the Extracted Concepts list in corpus management.

The Extracted Concepts list contains all concepts from the thesaurus that have been detected in the uploaded documents.

In your opened corpus, click the Extracted Concepts (1) tab to open the Extracted Concepts list. The available options are described below.

Available Options

  • Use Search Concepts (2) to filter for concepts, then click Search to start the search. A list of results is displayed. Click Reset to display the whole list again.

  • In the Search Criteria (3) drop-down, you can choose to filter for concepts found in the corpus or, alternatively, for concepts not found in the corpus.

  • Use the Show Matching Terms icon (4) to display a list of extracted terms that are similar to the respective concept. This helps to identify synonyms (alternative labels) and new narrower concepts which can be added to a concept with a few mouse clicks.

  • Use the Add to Blacklist icon (5) to blacklist a concept and exclude it from the extraction results.

The Extracted Concepts list provides an overview of how often the concepts were found in the document corpus. It also lists the most frequently used label of each concept. In addition, the broader concepts and concept schemes of those concepts are displayed.

You can sort the first three table columns by clicking the table headers. The image shows a table sorted by the Relevance column:


Table Columns

The table columns in the Extracted Concepts tab can be used to sort the list by Preferred Label, Frequency and Most Frequent Label.

The columns and their content provide the following information:

  • Preferred Label: the actual label of the concept contained in the thesaurus that has been extracted from the corpus documents.

  • Frequency: the total number of times a concept has been found in the corpus documents.

  • Relevance: displays the scores of concepts found in the corpus during the Corpus Analysis. Use these scores as an indication of the validity of the extracted concepts. The relevance of concepts is calculated similarly to that of the terms in the Extracted Terms list.

  • Most Frequent Label: displays the label of the concept that has been found most often in the corpus documents, either as part of a term or as a phrase.

  • Broader Concepts: displays the broader concepts (skos:broader) of the concept in that table row.

  • Concept Scheme: displays the concept scheme the concept is part of.
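
To make the relationship between the Frequency and Most Frequent Label columns more concrete, here is a minimal Python sketch. It is not PoolParty code and does not use any PoolParty API. It assumes, purely for illustration, that each detected occurrence is available as a (preferred label, matched label) pair; the function name summarize_matches and the row keys are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_matches(matches):
    """Build table-like rows from (preferred_label, matched_label) pairs.

    Each pair stands for one occurrence of a concept in the corpus
    (an assumed input format, not PoolParty's internal representation).
    """
    # Count, per concept, how often each of its labels was matched.
    per_concept = defaultdict(Counter)
    for preferred_label, matched_label in matches:
        per_concept[preferred_label][matched_label] += 1

    rows = []
    for preferred_label, label_counts in per_concept.items():
        rows.append({
            "preferred_label": preferred_label,
            # Frequency: total number of occurrences of the concept.
            "frequency": sum(label_counts.values()),
            # Most Frequent Label: the label matched most often.
            "most_frequent_label": label_counts.most_common(1)[0][0],
        })

    # Sort by frequency, highest first, like clicking the column header.
    rows.sort(key=lambda row: row["frequency"], reverse=True)
    return rows
```

In this sketch a concept whose alternative label appears more often than its preferred label shows that alternative label in the Most Frequent Label column, while Frequency counts all of its labels together.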

Extracted Concepts - Use the Similar Terms List

This section contains a short guide on how to use the Similar Terms list for extracted concepts.

After you have run a corpus analysis and opened the Extracted Concepts tab, you can access the Similar Terms list.

The terms in this list are results of the corpus analysis. They can help you refine your thesaurus: you can select terms here and add them as synonyms or as new concepts. The analysis calculates the terms using PoolParty's similarity algorithms and displays them with their similarity scores. A higher score means the term is more similar to the concept in your thesaurus.

How to Use the Similar Terms List

  1. In the Extracted Concepts list, click the Show Matching Terms icon next to the concept's name.

  2. The Similar Terms list will open.

  3. You can sort the list by Terms in alphabetical order or by Similarity Score. The higher the score, the closer the term is in meaning to the selected concept. Select the terms of your choice by selecting the check box next to one, several or all of them.

  4. Next to Add Selected Terms as, choose from the drop-down how to add them: as Alternative Label, Hidden Label or Narrower Concept.

  5. Click Add to add them to the previously selected concept.

  6. When you are done, click Close.
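
The steps above can be pictured with a small Python sketch of what happens to the concept's data. This is an illustration only, not PoolParty's implementation or API. The dictionary layout, the function names pick_terms and add_selected_terms, and the score threshold are all assumptions; in the application you select terms manually with the check boxes.

```python
# Hypothetical mapping from the drop-down entries to SKOS property names.
ADD_AS_TO_SKOS = {
    "Alternative Label": "altLabel",
    "Hidden Label": "hiddenLabel",
    "Narrower Concept": "narrower",
}


def pick_terms(similar_terms, min_score):
    """Return the terms scoring at least min_score, highest score first.

    similar_terms is a list of (term, similarity_score) pairs.
    """
    ranked = sorted(similar_terms, key=lambda pair: pair[1], reverse=True)
    return [term for term, score in ranked if score >= min_score]


def add_selected_terms(concept, terms, add_as):
    """Attach the selected terms to the concept under the chosen property."""
    if add_as not in ADD_AS_TO_SKOS:
        raise ValueError(f"Unknown target: {add_as!r}")
    values = concept.setdefault(ADD_AS_TO_SKOS[add_as], [])
    for term in terms:
        # Skip terms the concept already carries under this property.
        if term not in values:
            values.append(term)
    return concept
```

Sorting by score before selecting mirrors sorting the list by Similarity Score in step 3, and the add_as argument corresponds to the drop-down choice in step 4.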