Shadow Concepts in PoolParty

This section contains a short guide on how to use the Shadow Concepts functionality in PoolParty.

You can use the Shadow Concepts function alongside the corpus management and term extraction functions in PoolParty.

Note

These topics presuppose that you are familiar with the basic Corpus Management functions and ideas in PoolParty.

The Shadow Concepts function in PoolParty is based on the idea that co-occurrences of concepts can help to further refine extraction results for corpus documents. It helps search applications return more reliable results that cover a broader range of documents. You can use the interface as described in this section to check the results of the co-occurrences calculation.

The co-occurrences calculation is the basis for all Shadow Concepts functions you can use.

The idea of Shadow Concepts can be described as follows:

  • Shadow concepts are concepts of your thesaurus.

  • During entity extraction in the corpus, concepts A, B and C often co-occur with concept D.

  • D will be suggested as a shadow concept.

    • A user might then search for D in a search application.

    • A given document, however, contains only the concepts A, B and C.

    • This document will still be listed in the search results, since D is its 'shadow concept'.
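The idea can be illustrated with a minimal sketch. It is not PoolParty's actual calculation, which this section does not specify. It simply counts how often pairs of concepts appear in the same document and suggests concepts that are missing from a document but frequently co-occur with the ones it contains. The function names, the `min_count` threshold and the sample corpus are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how often each pair of concepts appears in the same document."""
    counts = Counter()
    for concepts in documents:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return counts


def suggest_shadow_concepts(doc_concepts, counts, min_count=2):
    """Suggest concepts absent from a document that often co-occur with its concepts.

    min_count is a hypothetical threshold, not a PoolParty setting.
    """
    present = set(doc_concepts)
    scores = Counter()
    for (a, b), n in counts.items():
        if n < min_count:
            continue
        if a in present and b not in present:
            scores[b] += n
        elif b in present and a not in present:
            scores[a] += n
    return [concept for concept, _ in scores.most_common()]


# Hypothetical corpus: each entry lists the concepts extracted from one document.
corpus = [
    ["A", "B", "C", "D"],
    ["A", "B", "D"],
    ["B", "C", "D"],
    ["A", "C"],
]
counts = cooccurrence_counts(corpus)
suggestions = suggest_shadow_concepts(["A", "B", "C"], counts)
print(suggestions)
```

In this toy corpus, D co-occurs with A, B and C in several documents, so it is the concept suggested for a document that contains only A, B and C.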

Example

An example outlining this function: texts that deal with Peru and the Inca culture often contain the term 'Machu Picchu' in close proximity to 'Peru'. The concept 'Machu Picchu' will therefore be suggested as a 'shadow concept'.

When you search for the concept 'Machu Picchu', documents that contain only the concept 'Peru' will still be found and listed, since 'Machu Picchu' is their shadow concept.
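The search behaviour in this example can be sketched as follows. This is only an illustration of the principle, not the way a PoolParty-based search application is implemented. The index structure, the `search` function and the document ids are hypothetical.

```python
def search(query_concept, index):
    """Return ids of documents whose concepts or shadow concepts include the query."""
    hits = []
    for doc_id, entry in index.items():
        if query_concept in entry["concepts"] or query_concept in entry["shadow_concepts"]:
            hits.append(doc_id)
    return hits


# Hypothetical index: extracted concepts and shadow concepts per document.
index = {
    "doc-1": {"concepts": {"Peru", "Machu Picchu"}, "shadow_concepts": set()},
    "doc-2": {"concepts": {"Peru"}, "shadow_concepts": {"Machu Picchu"}},
    "doc-3": {"concepts": {"Chile"}, "shadow_concepts": set()},
}
results = search("Machu Picchu", index)
print(results)
```

Here doc-2 mentions only 'Peru', yet it is returned for the query 'Machu Picchu' because that concept is recorded as its shadow concept.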

The schematic representation of the co-occurrence calculation for 'Machu Picchu' looks like this, with a short text example:

23900022.png

You will find the following topics in this section:

How to Use the Shadow Concepts Function

This section contains a short guide on how to use the Shadow Concepts function in PoolParty during corpus analysis.

The Shadow Concepts function and its results are part of the corpus analysis in PoolParty. You will have to prepare a few things before you can start using Shadow Concepts.

Prerequisites for Using the Shadow Concepts Function

Steps to Activate the Shadow Concepts Function
  1. Open the Corpus Management.

  2. Activate the node of the corpus you want to use as basis for calculation.

  3. On the right find the Corpus Analysis Settings section.

  4. Check the box beside Calculate Co-Occurrences.

  5. Click Start Corpus Analysis.

23900024.png

Note

PoolParty now starts calculating the co-occurrences in your corpus documents. Terms that often co-occur in close proximity to the thesaurus concepts found in the documents will be listed as shadow concepts. Shadow concepts are therefore also displayed for documents that do not contain the thesaurus concepts themselves.

For details on how to use them, see: How to Check on Shadow Concepts in Corpus Documents

How to Check on Shadow Concepts in Corpus Documents

This section contains a short guide on how to use Shadow Concepts for your thesaurus.

Once the corpus has been analysed in the previous step, you will see extraction results in the Corpus Documents tab. The extractor's corpus analysis provides you with a selection of concepts, shadow concepts and terms. Because Shadow Concepts & Terms in PoolParty are an analysis tool rather than a function as such, a few prerequisites must be met for the analysis to work, and a few steps are needed to use it to advantage.

In the following steps you will find the options available in the corresponding dialogue. They will help you to do two things:

  • Check the Shadow Concepts & Terms found in that document. Since they are displayed in the term cloud beneath the text itself, this lets you check the overall relevance of that text.

    • More shadow concepts found in it increase its relevancy score.

  • Additionally, Shadow Concepts & Terms can give you an idea of whether the concepts of your thesaurus fit the corpus and its documents well: if a large number of concrete shadow concepts are found, consider adding some of them to your thesaurus as additional concepts.

Steps to Use Shadow Concepts in Your Thesaurus
  1. After you have accessed the Corpus Management, activate the corpus's node from the tree on the left.

  2. In the Corpus Documents tab open a document by double-clicking it in the list.

  3. The document will open in a new tab, displaying the information you need for using Shadow Concepts to advantage. For details about the available options, see below.

23900016.png
Available Options

In the area below the Document tab (3) you have these options:

  • Use the Export Document link to export this one document only.

  • Highlight check boxes (4): they control the highlighting of the respective element in this tab. Concepts, terms and / or shadow concepts will be highlighted in the text field itself as well as in the term cloud (6).

  • Inside the Text field (5), use your mouse cursor to highlight terms or phrases you want to add to the Candidate Concepts list.

  • The Term Cloud (6) displays all terms as well as concepts and shadow concepts found in that document's text.

23900017.png
Available Information

In the area below the Document tab (3) you find this information:

  • Title of the document.

  • Concept Schemes the concepts found in the document are part of.

  • Corpus Quality: a red, yellow or green Status icon indicates the respective status 'poor', 'moderate' or 'good'.

  • The Text field (5) contains the actual text of the document. Depending on the terms and concepts found, these will be highlighted here too.

Note

Use the Test Extraction Dialogue Box to check the individual scores of shadow concepts in more detail. See: Test Extraction Dialogue Box for Checking Extractor Calculation Scores