An Introduction to Inference Tagging

Inference Tagging is an extension of the proven PoolParty Extractor that generates additional metadata about documents or data in general. It recognizes and extracts explicitly mentioned concepts, and it also annotates and classifies content with relevant concepts that are mentioned only implicitly or indirectly.

The PoolParty Extractor extracts all concepts that appear explicitly in a given input text or file and that are also represented in the knowledge model (taxonomy and ontology) underlying the extraction service. Inference Tagging goes beyond these explicit concepts: it also extracts implicit concepts from documents by applying specific rules. All we need is a well-structured domain knowledge model.

All existing concepts are organized and stored using taxonomies and ontologies. Documents can be easily annotated using concepts through an extraction call. At this point it is the ontology with all the classes, subclasses, and relations which indeed makes the difference, since we can use classes and relations to infer additional concepts (new triples).

Let us take a closer look at how inference tagging works and the results it delivers. The standard variant of the PoolParty Extractor used to identify concepts in the input text saying “Climate change is a driver for heatwaves and drought”, would - provided a corresponding knowledge model is in place - extract the following concepts: climate change, heatwave, drought. Provided that the knowledge model contains relations to risks associated with heatwaves (e.g. heatstroke) and drought (e.g. crop failure) then those can also be extracted and used for both annotation and classification.
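The idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not PoolParty's implementation: the dictionary RISK_RELATIONS stands in for the ontology relations from the example, and the function name infer_concepts is hypothetical.

```python
# Toy stand-in for ontology relations: concept -> associated risks.
# The entries mirror the example in the text; the structure is an assumption.
RISK_RELATIONS = {
    "heatwave": ["heatstroke"],
    "drought": ["crop failure"],
}


def infer_concepts(extracted, relations):
    """Return concepts reachable in one hop that were not extracted explicitly."""
    inferred = []
    for concept in extracted:
        for target in relations.get(concept, []):
            if target not in extracted and target not in inferred:
                inferred.append(target)
    return inferred


# Concepts found explicitly in the input text.
extracted = ["climate change", "heatwave", "drought"]
inferred = infer_concepts(extracted, RISK_RELATIONS)
print(inferred)  # ['heatstroke', 'crop failure']
```

The explicit concepts come from the text itself, while the inferred ones come only from the relations in the knowledge model.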

Rules determining which additional concepts are to be extracted using the Inference Tagging feature can be created, tested and edited in the PoolParty Workbench. Inference Tagging uses two requests: an extract request and an expand request. Both can be easily created and tested using the PoolParty Workbench. An expansion call is a SPARQL request. For those less familiar with SPARQL, the PoolParty Workbench offers an Expansion Query Builder where such queries can be easily formulated.

The PoolParty Workbench is a useful tool allowing subject matter experts to define specific rules for both calls (i.e. extraction and expansion); these calls are stored together in a configuration file and can be tested at any point using the Workbench.

The PoolParty Workbench offers - reflecting your license - the following functionalities:

inference tagging
recommendation service

The image above shows the landing page of the PoolParty Workbench with the Inference Tagging feature enabled.

The grayed out button (1) opens the corresponding Settings tab, where you select the project; selecting a project is mandatory to be able to use the Inference Tagging function. Then you have to enter the input that should be tagged. Here you have three options: you can enter any input text (2), you can upload content from a URL (3), or you can upload a file (4). Clicking on the arrow symbol starts the tagging (5), whereas clicking the Configuration button (6) allows you to review all relevant settings and to modify or update them when required.

Let us assume that we want to make sure that we have selected the right project. To do so we will click the Configuration (6) button on the Workbench landing page to open the Settings (1) window with several tabs allowing us to fine tune the parameters.

Here we see three tabs: PROJECT (2a), EXTRACTION (2b) and EXPANSION (2c). The currently active tab is underlined in blue.

You can save your configuration at any time using the SAVE CONFIGURATION (3) button, or load a saved one using the LOAD CONFIGURATION (4) button.

The PROJECT tab has a single pulldown field, Select project (5), which allows you to select the desired project. If all other settings for extraction and expansion are already as desired, simply click the Apply (6) button; otherwise continue on the EXTRACTION or EXPANSION tab.

Note

Keep in mind that clicking the Cancel (6) button will discard all settings made so far.

After selecting the desired project we need to continue on the EXTRACTION (2b) tab.

On this tab (1) we will fine tune the extraction from our input text.

There are the following settings:

Extraction language (2) - this is the language used for extraction and the language used in your selected project - our example shows English - en.
Minimum concept score (3) - the higher the score the fewer results you will obtain - our example shows 30 as the minimum score.
Maximum number of concepts (4) - here you can define the maximum number of concepts to be extracted.
If you have a concept scheme defined in PoolParty for the selected project - you can use the Concept scheme filter (5) by selecting one of the available schemata using this pulldown.
The two checkboxes below - Filter nested concepts (6) and Use disambiguation (7) are optional but ticking them will narrow down the number of retrieved results.
If you have a corpus defined for the selected project in PoolParty you can specify it here using the corresponding Use corpus (8) pulldown field.
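The effect of the Minimum concept score and Maximum number of concepts settings can be sketched as follows. This is a minimal illustration, not the Extractor's actual logic: the function name filter_concepts, the sample scores, and the assumption that the threshold is inclusive are all hypothetical.

```python
def filter_concepts(concepts, min_score=30, max_concepts=20):
    """Keep concepts scoring at least min_score, best first, capped at max_concepts."""
    kept = [c for c in concepts if c["score"] >= min_score]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:max_concepts]


# Hypothetical extraction candidates; the scores are made up for illustration.
candidates = [
    {"label": "Employee", "score": 72},
    {"label": "Education", "score": 45},
    {"label": "Employment", "score": 28},
    {"label": "ESG Management", "score": 90},
]

result = filter_concepts(candidates, min_score=30, max_concepts=2)
print([c["label"] for c in result])  # ['ESG Management', 'Employee']
```

Raising the minimum score or lowering the maximum number both shrink the result list, which matches the behavior described for settings (3) and (4).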

After having fine tuned the extraction parameters we can move on to the EXPANSION tab. If you only want to run an extraction on your input then click on the Apply (10) button at this point.

Note

Use the SAVE CONFIGURATION (9) button to save your current configuration, or the LOAD CONFIGURATION (9) button to load an existing one.

Note

Keep in mind that clicking the Cancel (10) button will discard all settings made so far.

Our next step will be to define the expansion query which is a SPARQL query. For users less familiar with the SPARQL query language, the PoolParty feature Expansion Query Builder facilitates formulation of such queries; experts can of course use the command line input available on the Switch to SPARQL sub-tab where they can formulate sophisticated queries going beyond the options available on the UI.

We will now guide you through the Expansion Query Builder feature facilitating the formulation of the required SPARQL query illustrated by a sample query.

In our example we will create an inference tagging scenario and then use the Workbench to configure the required settings, in particular the Expansion Query Builder, to be able to run the inference tagging feature and obtain inferred concepts.

The following assumptions are underlying our scenario:

A knowledge manager wants to find out how certain regulations within the ESG (Environment, Social and Governance) domain address concepts tagged in their documents. Such regulations may or may not be explicitly mentioned in the content; however, with the combined power of Inference Tagging and the existing knowledge model, these documents can be annotated with these hidden concepts.
This knowledge manager uses both PoolParty Taxonomy and Ontology and has the PoolParty Workbench installed providing the Expansion Query Builder feature.
Our scenario uses the following input text: "As part of our environmental management program, we are committed to ensuring that our employees receive training to raise awareness of environmental issues and our processes to reduce environmental impact. In 2021, site workers received 1,500 hours of environmental training, focusing on job-specific environmental awareness, hazardous material management, spill response, and reporting."
In our example we will use the ESG Core Knowledge Model (taxonomy & ontology), which will return the concept Environmental Management Program. This concept has a few connections (i.e. relations) within our knowledge model; we are interested in the "is addressed by" relation, which points to concepts of the class Regulation. Here, the concept Environmental Management Program points to two separate resources, namely ISO 14000 and United Nations Global Compact. These are not explicitly mentioned in the text, but we want to annotate the content with them. When we run Extraction without the Inference Tagging functionality on top, we get the following tags for the above input text: Education, ESG Management, Employee, and Employment. What we want, however, is the following: Education, ESG Management, Employee, Employment + ISO 14000, United Nations Global Compact. To annotate the text with these additional, implicit concepts we need to set up an Inference Tagging rule stating: "If a certain concept is found within the text, then also include concepts connected via the 'is addressed by' relation in the Extraction results".

Now, as we have our Inference Tagging scenario, we can make use of the Expansion Query Builder to assist us in setting up the tagging rule. To do so, we click on the PoolParty Workbench landing page on Configuration. There click on Expansion and select the Expansion Query Builder sub-tab. The image below shows the Expansion Query Builder sub-tab with all fields filled in.

[Image: Expansion Query Builder sub-tab with all fields filled in]

After opening the Settings (1) window we navigate to the EXPANSION (2) tab where we have two sub-tabs; Expansion Query Builder (3) and Switch to SPARQL. We will now focus on how to use the Expansion Query Builder sub-tab to formulate a SPARQL query.

Tip

The Save query and Load query (9a) buttons on the right allow you to save your query or if you already have queries stored to load one.

The first step will be to specify the language using the Locale (4) pulldown. You have to keep in mind that only those concepts will be returned for which the specified language matches your selection here. This parameter is mandatory.
Then we can specify the Prefixes (5) which can be a standard prefix using the Choose Standard Prefix (5a) pulldown. The standard prefixes include skos, rdf, dcterms, etc. We can specify one or more standard prefixes. In our example we are using skos as the selected standard prefix. We can however also add our own prefix (5b) by entering the Name and the corresponding URI and clicking the ADD (5c) button. In our example we entered esg in the Name field and https://esg.poolparty.biz/esg-ontology/ in the URI field. Specifying prefixes upfront will allow us to skip the full path (URI) every time we use a prefix when defining the rules of our inference tagging. The saved custom prefix is always shown above the two input fields Name and URI; also here we can define more than one custom prefix.
The next step is to specify the rules. Sometimes we will need multiple rules; in that case we can use the Add a new rule button (7) on the very right.
In the Rules (6) section of this tab we can specify all the details required for each rule. Every rule consists of a relation and a score. A relation in turn consists of the URI field (6a), which is mandatory, and the Type field (6b). In our example, Relation #1 contains the relation esg:is-addressed-by. (Since we defined our custom prefix esg in the previous step, we only need to enter its name (esg) and can skip entering the complete URI.) The Type (6b) field is in our case esg:Regulation. This field works as an additional filter: here we specify that we want to receive inferred concepts linked to the extracted concepts via the esg:is-addressed-by relation, but only if such concepts belong to the esg:Regulation class. If, in contrast, we wanted to retrieve all possible concepts irrespective of their class, we would enter skos:Concept in the Type (6b) field. To add another relation to a rule, we use the Add a new relation (8) button to the right.
We can specify one or more rules, and for each rule one or more relations. All specified rules are applied and combined via OR; each rule is, however, independent. The relations specified within a rule define a path: if more than one relation is specified for a rule, they are applied one after the other and form a path within the graph.
Then we need to select the Score Type (6c) and enter the Score (6d). The Score Type field is a pulldown with two options: Multiplier and Fixed Score.
In our example we will use the Fixed Score option and enter 0.8 as the score; this means that each inferred concept will have this score. If we instead used the Multiplier option and entered 0.8 in the Score field, the inferred concept would have a variable score equal to 0.8 times the score of the original extracted concept; e.g. if our original concept has a score of 80, the Multiplier option results in the inferred concept having a score of 0.8 times 80 = 64.
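The way a rule combines a relation path, a Type filter and a score can be sketched in Python. This is a toy model under stated assumptions, not the query the Expansion Query Builder generates: the EDGES and CLASSES dictionaries, the rule dictionary layout, and the function names are hypothetical, and keeping the highest score when several rules reach the same concept is an assumption.

```python
# Toy graph: (subject, relation) -> objects. Names follow the example in the
# text; the data structures themselves are assumptions.
EDGES = {
    ("Environmental Management Program", "esg:is-addressed-by"): [
        "ISO 14000",
        "United Nations Global Compact",
    ],
}
CLASSES = {
    "ISO 14000": "esg:Regulation",
    "United Nations Global Compact": "esg:Regulation",
}


def follow_path(start, relations):
    """Follow the relations one after the other, starting from one concept."""
    frontier = [start]
    for relation in relations:
        frontier = [t for node in frontier for t in EDGES.get((node, relation), [])]
    return frontier


def apply_rules(extracted, rules):
    """Apply every rule independently (OR) and keep the best score per concept."""
    inferred = {}
    for concept, in_score in extracted.items():
        for rule in rules:
            for target in follow_path(concept, rule["relations"]):
                # skos:Concept means "no class restriction", as in the text.
                if rule["type"] != "skos:Concept" and CLASSES.get(target) != rule["type"]:
                    continue
                if rule["score_type"] == "fixed":
                    score = rule["score"]
                else:  # "multiplier": scale the score of the extracted concept
                    score = rule["score"] * in_score
                inferred[target] = max(score, inferred.get(target, 0))
    return inferred


# Hypothetical extraction result with a score of 80, as in the worked example.
extracted = {"Environmental Management Program": 80}
rule = {
    "relations": ["esg:is-addressed-by"],
    "type": "esg:Regulation",
    "score_type": "multiplier",
    "score": 0.8,
}
print(apply_rules(extracted, [rule]))
# {'ISO 14000': 64.0, 'United Nations Global Compact': 64.0}
```

Switching score_type to "fixed" would give both inferred concepts the score 0.8 regardless of the extracted concept's score, and a second entry in relations would extend the path by one more hop.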

Tip

You can test your SPARQL query before applying it.

To test your SPARQL query, scroll to the bottom of the window until you see the Test expansion query: (1) section. There you only need to enter a concept from the extraction in the Concepts (2) field and click on the TEST (3) button. The Concepts input field uses auto-complete, i.e. after you have entered the first three letters of a concept, you will automatically see a list of one or more concepts matching these leading letters. Simply select one of those shown. You can specify more than one concept for testing your query.
Click the Apply (4) button to add your query or the Cancel (4) button if you want to discard the formulated query and any other settings you may have made on the other two tabs - PROJECT and EXTRACTION.

Tip

You can also start your extraction and expansion from the Settings window by clicking the Apply (4) button.

After running both an extraction and expansion query we will see the following concepts (both the extracted and expanded color-coded concepts) returned for the input text (2) and the selected project (1).

Below the Input Document (2) heading you see the input text with identified concepts. To the right you see all Extracted concepts (3) and all Expanded concepts (5). Next to each concept you can see its score.

If you want to save the configuration or individual calls, click on the three dots (1) in the upper right section of the screen to open a context menu allowing you to import and export (2) configurations. You can export the complete configuration or individual calls as cURL.

These exported cURL files can be used later for API calls. Please refer to the PoolParty Workbench guide for more details on how to export configuration and individual calls. These requests may look like this:

Exporting as cURL opens the Export cURL requests (1) window where you see the Extract request (2), and a page symbol next to it (3) which will copy the complete request to clipboard; below is the second request - the Expand request (4) also with the page symbol next to it (5) allowing you to copy the complete request to clipboard.

The EXPANSION tab also offers the possibility to formulate advanced queries directly on the command line. This is possible on the Switch to SPARQL (2) sub-tab, which looks as follows:

In the top right area you have two buttons Save query (2) and Load query (2). The large section below called Expansion query (3) is where you enter the SPARQL query with all rules and relations you want to specify. Underneath you can set the Maximum number of expanded concepts (4) and test your query by entering concepts in the Concepts (5) field and clicking the TEST (6) button.

To test your SPARQL query, use the Test expansion query section. There you only need to enter a concept from the extraction in the Concepts (5) field and click on the TEST (6) button. The Concepts input field uses auto-complete, i.e. after you have entered the first three letters of a concept, you will automatically see a list of one or more concepts matching these leading letters. Simply select one of those shown. You can specify more than one concept for testing your query.

Click the Apply (7) button to add your query to the configuration or the Cancel (7) button if you want to discard the formulated query and any other settings you may have made on the other two tabs - PROJECT and EXTRACTION.

The example we will use is based on the following scenario:

A Knowledge Manager wants to identify opportunities that can be inferred from concepts found in their documents. Such opportunities may or may not be explicitly stated in the content, but thanks to the combined power of Inference Tagging and the knowledge model, all documents will be annotated with such inferred concepts.
The input text used in our example is: As part of our environmental management program, we are committed to ensuring that our employees receive training to raise awareness of environmental issues and our processes to reduce environmental impact. In 2021, site workers received 1,500 hours of environmental training, focusing on job-specific environmental awareness, hazardous material management, spill response, and reporting.

The intention of this query is to identify any opportunities that can be inferred from the concepts found in this input text.

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX esg: <https://esg.poolparty.biz/esg-ontology/>
SELECT ?uri ?label ?finalScore
WHERE {
    ?uri skos:prefLabel ?label .
    FILTER ( lang(?label) = 'en' )
    {
        SELECT ?uri (MAX(?expansionScore) AS ?finalScore)
        WHERE {
            VALUES (?x ?inScore) { <inputConcepts> }
            {
                # This part expands and scores the related concept for expressing the recommendation path
                {
                    ?x skos:related ?uri .
                    ?uri esg:positive-impact|esg:negative-impact ?z .
                    #?z a esg:Risk .
                    BIND(50 AS ?distScore) .
                }
                # This part expands and scores the risks speeded up or slowed down by the ontology relations
                UNION {
                    ?x skos:related ?y .
                    ?y esg:positive-impact|esg:negative-impact ?uri .
                    #?uri a esg:Risk .
                    BIND(100 AS ?distScore) .
                }
            }
            BIND(IF(BOUND(?distScore), ?distScore, ?inScore) AS ?expansionScore) .
        }
        GROUP BY ?uri
        ORDER BY DESC(?finalScore)
    }
}
LIMIT <numberOfConcepts>
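The query above contains two placeholders, <inputConcepts> and <numberOfConcepts>. How the Workbench fills them is not described here; the following Python sketch only illustrates one plausible substitution, assuming each input concept becomes a (URI score) row matching the VALUES (?x ?inScore) clause. The function name render_expansion_query, the shortened template and the example URI are hypothetical.

```python
def render_expansion_query(template, concepts, limit):
    """Fill the <inputConcepts> and <numberOfConcepts> placeholders of a template."""
    # One "(<uri> score)" row per extracted concept, matching VALUES (?x ?inScore).
    rows = " ".join(f"(<{uri}> {score})" for uri, score in concepts)
    return (
        template
        .replace("<inputConcepts>", rows)
        .replace("<numberOfConcepts>", str(limit))
    )


# Shortened template that keeps only the two placeholders of the full query.
TEMPLATE = "VALUES (?x ?inScore) { <inputConcepts> } ... limit <numberOfConcepts>"

# Hypothetical concept URI and score, for illustration only.
concepts = [("https://example.org/concept/1", 80)]
query = render_expansion_query(TEMPLATE, concepts, 20)
print(query)
# VALUES (?x ?inScore) { (<https://example.org/concept/1> 80) } ... limit 20
```

With the rows in place, ?inScore carries the extraction score of each input concept into the sub-select, where it serves as the fallback when no ?distScore is bound.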

After running both an extraction and expansion query we will see the following concepts (both the extracted and expanded color-coded concepts) returned for the input text and the selected project (1).

Below the Input Document (2) heading you see the input text with identified concepts. To the right you see all Extracted concepts (3) and all Expanded concepts (4). Next to each concept is its score.

If you want to save the configuration or individual calls, click on the three dots (6 above and 1 below) in the upper right section of the screen to open a context menu allowing you to import and export configurations. You can export the complete configuration or individual calls as cURL.
