Categorization

The Extractor can also categorize text.

Categories are the top concepts of a PoolParty project, that is, the concepts at the level just below the concept schemes. In the screenshot below, the top concepts are 'Alcoholic beverage', 'Beverages', 'Distilled beverage', 'Non-alcoholic beverage' and 'Wine'.

For categorization, the Extractor first detects the concepts in the text and then maps each of them to the top concepts it is connected to (a concept can map to several top concepts in the case of poly-hierarchies). Afterwards, the score of each category is calculated from the number of concepts and annotations in that category.
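
The mapping step described above can be illustrated with a small sketch. It is not the Extractor's implementation: the `BROADER` links, the `TOP_CONCEPTS` set and both function names are hypothetical, chosen only to mirror the wine example on this page, and scoring is left out.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: concept -> its broader concepts (assumed links).
BROADER = {
    "Wine": ["Alcoholic beverage"],
    "Champagne": ["Wine"],
    "Prosecco": ["Wine"],
    "Chardonnay": ["Wine"],
}
# Hypothetical subset of the project's top concepts.
TOP_CONCEPTS = {"Alcoholic beverage", "Wine"}


def top_concepts_of(concept):
    """Return every top concept reachable by following broader links upwards."""
    found, seen = set(), set()
    stack = list(BROADER.get(concept, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in TOP_CONCEPTS:
            found.add(current)
        stack.extend(BROADER.get(current, []))
    return found


def group_by_category(detected):
    """Map each detected concept to all top concepts above it."""
    categories = defaultdict(list)
    for concept in detected:
        for top in top_concepts_of(concept):
            categories[top].append(concept)
    return dict(categories)


detected = ["Wine", "Champagne", "Prosecco", "Chardonnay"]
categories = group_by_category(detected)
```

Because only concepts above the detected one are followed, a top concept such as 'Wine' is listed under 'Alcoholic beverage' but not under itself, while its narrower concepts are listed under both.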

[Screenshot: 24576877.png]

The following text is annotated:

Grapes used in the production of both Champagne and prosecco are set by their region’s respective governing bodies to ensure the quality and authenticity of the region’s wines. There are three main grapes allowed in the production of Champagne: chardonnay, pinot noir and pinot meunier. Prosecco is produced primarily from the prosecco or glera grape, which is native to the Veneto region of Italy.

To categorize, set the parameter 'categorize' to true:

Request

{{url}}/extractor/api/extract?text=Grapes used in the production of both Champagne and prosecco are set by their region’s respective governing bodies to ensure the quality and authenticity of the region’s wines. There are three main grapes allowed in the production of Champagne: chardonnay, pinot noir and pinot meunier. Prosecco is produced primarily from the prosecco or glera grape, which is native to the Veneto region of Italy.&projectId={{project}}&language=en&numberOfTerms=0&categorize=true&numberOfConcepts=0
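
The text in the request contains spaces, commas and other characters that need to be URL-encoded before the request is sent. The following sketch builds such a URL with the Python standard library. The function name `build_extract_url`, the host and the project ID are placeholders for illustration. Authentication and the actual HTTP call are not covered here.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, project_id, text, language="en"):
    """Build an extract URL with categorization enabled (hypothetical helper)."""
    params = {
        "text": text,
        "projectId": project_id,
        "language": language,
        "numberOfTerms": 0,
        "categorize": "true",
        "numberOfConcepts": 0,
    }
    # urlencode percent-encodes the values and joins them with '&'.
    return f"{base_url}/extractor/api/extract?{urlencode(params)}"


# Placeholder values standing in for {{url}} and {{project}}.
url = build_extract_url(
    "https://poolparty.example.com",
    "my-project-id",
    "Champagne & prosecco",
)
```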

The result shows two categories, 'Alcoholic beverage' and 'Wine'.

This is because of poly-hierarchies: the concept 'Wine' is both a top concept and a narrower concept of 'Alcoholic beverage'. That is also why 'Wine' appears among the extracted concepts of the first category but not of the second one.

{
    "categories": [
        {
            "prefLabel": "Alcoholic beverage",
            "uri": "https://nextrelease-cons.semantic-web.at/cocktails/f3000285-36b0-4ffe-af90-740c2dd8fff5",
            "score": 2,
            "categoryConceptResults": [
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/d5286375-41b3-468b-9602-8fcf698e3e83",
                    "prefLabel": "Wine",
                    "score": 40
                },
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/cdbac85a-d5b7-40f3-9e98-4d7e176b5565",
                    "prefLabel": "Champagne",
                    "score": 97
                },
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/a293047d-788b-4e8c-96bc-65e62fec4b0b",
                    "prefLabel": "Prosecco",
                    "score": 100
                },
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/e8629f00-4dbd-4774-9003-6278f1162b1d",
                    "prefLabel": "Chardonnay",
                    "score": 25
                }
            ]
        },
        {
            "prefLabel": "Wine",
            "uri": "https://nextrelease-cons.semantic-web.at/cocktails/d5286375-41b3-468b-9602-8fcf698e3e83",
            "score": 1.5,
            "categoryConceptResults": [
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/cdbac85a-d5b7-40f3-9e98-4d7e176b5565",
                    "prefLabel": "Champagne",
                    "score": 97
                },
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/a293047d-788b-4e8c-96bc-65e62fec4b0b",
                    "prefLabel": "Prosecco",
                    "score": 100
                },
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/e8629f00-4dbd-4774-9003-6278f1162b1d",
                    "prefLabel": "Chardonnay",
                    "score": 25
                }
            ]
        }
    ]
}
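
To see the effect of poly-hierarchies in a response like the one above, a client could invert the result and list the categories each concept was assigned to. This is a minimal sketch: the helper name `categories_per_concept` is hypothetical, and the sample is an abridged copy of the response with the URIs left out.

```python
import json
from collections import defaultdict

# Abridged copy of the response above (URIs omitted for brevity).
response_text = """
{
  "categories": [
    {"prefLabel": "Alcoholic beverage", "score": 2,
     "categoryConceptResults": [
       {"prefLabel": "Wine", "score": 40},
       {"prefLabel": "Champagne", "score": 97},
       {"prefLabel": "Prosecco", "score": 100},
       {"prefLabel": "Chardonnay", "score": 25}]},
    {"prefLabel": "Wine", "score": 1.5,
     "categoryConceptResults": [
       {"prefLabel": "Champagne", "score": 97},
       {"prefLabel": "Prosecco", "score": 100},
       {"prefLabel": "Chardonnay", "score": 25}]}
  ]
}
"""


def categories_per_concept(response):
    """Return a mapping: concept label -> labels of the categories it is in."""
    result = defaultdict(list)
    for category in response.get("categories", []):
        for concept in category.get("categoryConceptResults", []):
            result[concept["prefLabel"]].append(category["prefLabel"])
    return dict(result)


mapping = categories_per_concept(json.loads(response_text))
```

With this sample, 'Champagne' maps to both categories, whereas 'Wine' maps only to 'Alcoholic beverage'.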

Another example text:

Shake the gin, orange juice and lime juice in a cocktail shaker with ice. Strain the contents of the shaker into an ice-filled highball or Collins glass. Slowly add the ginger ale, and gently stir. Garnish with the orange slice, and serve.

Request

{{url}}/extractor/api/extract?text=Shake the gin, orange juice and lime juice in a cocktail shaker with ice. Strain the contents of the shaker into an ice-filled highball or Collins glass. Slowly add the ginger ale, and gently stir. Garnish with the orange slice, and serve.&projectId={{project}}&language=en&numberOfTerms=0&categorize=true&numberOfConcepts=0

The result shows that the text contains mostly non-alcoholic beverages, together with some other categories:

{
    "categories": [
        {
            "prefLabel": "Non-alcoholic beverage",
            "uri": "https://nextrelease-cons.semantic-web.at/cocktails/e86c1671-4a67-494b-ae5d-bcb750865acc",
            "score": 0.6,
            "categoryConceptResults": [
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/89ac12aa-d4f9-46d0-b127-fd8f4c21fbcc",
                    "prefLabel": "Ginger ale",
                    "score": 23
                },
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/2ce50273-13b2-4ba1-9ff2-8cfbd835c1f2",
                    "prefLabel": "Orange juice",
                    "score": 96
                },
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/88f5de3d-3de4-4a2f-9523-50bc7bb06600",
                    "prefLabel": "Lime juice",
                    "score": 85
                }
            ]
        },
        {
            "prefLabel": "Alcoholic beverage",
            "uri": "https://nextrelease-cons.semantic-web.at/cocktails/f3000285-36b0-4ffe-af90-740c2dd8fff5",
            "score": 0.2,
            "categoryConceptResults": [
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/5d96c50c-ee48-4c40-bb4c-4b9a42d11de6",
                    "prefLabel": "Gin",
                    "score": 100
                }
            ]
        },
        {
            "prefLabel": "Fruit",
            "uri": "https://nextrelease-cons.semantic-web.at/cocktails/4f20d6bb-710d-4870-bde4-b6e835d7d13f",
            "score": 0.2,
            "categoryConceptResults": [
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/99171f20-c59f-40c4-998a-a9c8f1909168",
                    "prefLabel": "Orange",
                    "score": 12
                }
            ]
        },
        {
            "prefLabel": "Distilled beverage",
            "uri": "https://nextrelease-cons.semantic-web.at/cocktails/403d1249-f37f-4f43-bebf-8dde9677d886",
            "score": 0.2,
            "categoryConceptResults": [
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/5d96c50c-ee48-4c40-bb4c-4b9a42d11de6",
                    "prefLabel": "Gin",
                    "score": 100
                }
            ]
        },
        {
            "prefLabel": "Tumbler",
            "uri": "https://nextrelease-cons.semantic-web.at/cocktails/2e839563-8037-4e59-8dbf-27736605d687",
            "score": 0.2,
            "categoryConceptResults": [
                {
                    "uri": "https://nextrelease-cons.semantic-web.at/cocktails/20d288cb-4e38-46f9-9bec-1c79d8397215",
                    "prefLabel": "Collins glass",
                    "score": 33
                }
            ]
        }
    ]
}
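
When a text falls into several categories, as in the response above, a client may want to rank them or keep only the strongest ones. The sketch below sorts the categories by their 'score' field. The helper `rank_categories` and its `min_score` threshold are hypothetical client-side additions, not Extractor parameters, and the sample is abridged from the response above.

```python
# Abridged from the response above: category labels and scores only.
response = {
    "categories": [
        {"prefLabel": "Non-alcoholic beverage", "score": 0.6},
        {"prefLabel": "Alcoholic beverage", "score": 0.2},
        {"prefLabel": "Fruit", "score": 0.2},
        {"prefLabel": "Distilled beverage", "score": 0.2},
        {"prefLabel": "Tumbler", "score": 0.2},
    ]
}


def rank_categories(response, min_score=0.0):
    """Return categories with score >= min_score, highest score first."""
    kept = [c for c in response.get("categories", []) if c["score"] >= min_score]
    # sorted() is stable, so categories with equal scores keep their order.
    return sorted(kept, key=lambda c: c["score"], reverse=True)


ranked = rank_categories(response)
strong = [c["prefLabel"] for c in rank_categories(response, min_score=0.5)]
```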