Named Entity Recognition

The PoolParty Extractor can also extract named entities.

To extract named entities, use the nerParameters parameter, which is an array of methods and types used for named entity recognition.

Note

The array index starts at 0. For example, you can use nerParameters[0].type=person&nerParameters[1].type=location to extract both person names and locations.

The parameters in the request have to be URL-encoded, which affects the brackets of the array index: [ becomes %5B and ] becomes %5D.
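The indexing and URL-encoding rules can be illustrated with a short Python sketch that builds the query string using only the standard library. The helper name build_ner_query, the project ID and the sample text are hypothetical; only the parameter names come from this page.

```python
from urllib.parse import quote, urlencode


def build_ner_query(text, project_id, entity_types,
                    method="MAXIMUM_ENTROPY", language="en"):
    """Build a URL-encoded query string for the extract call.

    build_ner_query is a hypothetical helper, not part of PoolParty.
    Each requested entity type gets its own nerParameters index,
    starting at 0.
    """
    params = [
        ("text", text),
        ("projectId", project_id),
        ("language", language),
        ("numberOfTerms", 0),
        ("numberOfConcepts", 0),
    ]
    for index, entity_type in enumerate(entity_types):
        params.append((f"nerParameters[{index}].method", method))
        params.append((f"nerParameters[{index}].type", entity_type))
    # urlencode percent-encodes the brackets in the parameter names;
    # quote_via=quote encodes spaces as %20 instead of "+".
    return urlencode(params, quote_via=quote)


# Placeholder project ID and text, for illustration only.
query = build_ner_query(
    "Many of these companies are based in Europe.",
    "my-project-id",
    ["person", "location"],
)
print(query)
```

The resulting string would be appended after the ? of the extract endpoint. Building the parameters in a loop keeps the indices consecutive when entity types are added or removed.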

Additionally, you can combine the nerParameters parameter with the showMatchingPosition parameter to get the position of the matched text. For more information on this parameter, see How to Use the Extraction Service.

The allowed methods for extracting named entities are maximum entropy (method=MAXIMUM_ENTROPY) and rule-based (method=RULE_BASED). The default is the maximum entropy method.

With the rule-based method, the PoolParty Extractor can extract person names (type=person) and locations (type=location).

With the maximum entropy method (method=MAXIMUM_ENTROPY), the PoolParty Extractor can extract:

  • person names (type=person), locations (type=location) and organisations (type=organization), using the corresponding pre-trained Apache OpenNLP models

  • custom named entities using your own Apache OpenNLP models

The following call uses the maximum entropy method to extract person names, locations and organisations:

Request

{{url}}/extractor/api/extract?text=Chris Stemman, the executive director of the British Coffee Association, says most of those techniques from decaffeination’s earliest days are still being used today. But the process isn’t as straightforward as you’d expect. “It isn’t done by the coffee companies themselves,” says Stemann. “There are specialist decaffeination companies that carry it out.” Many of these companies are based in Europe, Canada, the US and South America.&projectId={{project}}&language=en&numberOfTerms=0&nerParameters%5B0%5D.method=MAXIMUM_ENTROPY&nerParameters%5B0%5D.type=person&nerParameters%5B1%5D.method=MAXIMUM_ENTROPY&nerParameters%5B1%5D.type=organization&nerParameters%5B2%5D.method=MAXIMUM_ENTROPY&nerParameters%5B2%5D.type=location&numberOfConcepts=0

Result

{
    "namedEntities": [
        {
            "textValue": "Chris Stemman",
            "type": "person",
            "frequency": 1,
            "score": 100,
            "method": "MAXIMUM_ENTROPY",
            "positions": [
                {
                    "beginningIndex": 0,
                    "endIndex": 12
                }
            ]
        },
        {
            "textValue": "British Coffee Association",
            "type": "organization",
            "frequency": 1,
            "score": 85,
            "method": "MAXIMUM_ENTROPY",
            "positions": [
                {
                    "beginningIndex": 45,
                    "endIndex": 70
                }
            ]
        },
        {
            "textValue": "Europe",
            "type": "location",
            "frequency": 1,
            "score": 12,
            "method": "MAXIMUM_ENTROPY",
            "positions": [
                {
                    "beginningIndex": 395,
                    "endIndex": 400
                }
            ]
        },
        {
            "textValue": "Canada",
            "type": "location",
            "frequency": 1,
            "score": 11,
            "method": "MAXIMUM_ENTROPY",
            "positions": [
                {
                    "beginningIndex": 403,
                    "endIndex": 408
                }
            ]
        },
        {
            "textValue": "US",
            "type": "organization",
            "frequency": 1,
            "score": 9,
            "method": "MAXIMUM_ENTROPY",
            "positions": [
                {
                    "beginningIndex": 415,
                    "endIndex": 416
                }
            ]
        },
        {
            "textValue": "South America",
            "type": "location",
            "frequency": 1,
            "score": 8,
            "method": "MAXIMUM_ENTROPY",
            "positions": [
                {
                    "beginningIndex": 422,
                    "endIndex": 434
                }
            ]
        }
    ]
}
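
A result like the one above can be processed with a few lines of Python. The sketch below groups the extracted entities by type and recovers the matched text from a position entry. It assumes that endIndex is inclusive, which is what the sample suggests (0 to 12 for the 13-character "Chris Stemman") but is not stated explicitly on this page. The function names are hypothetical and the response is shortened to two entities.

```python
import json
from collections import defaultdict


def group_entities(response_json):
    """Group the textValue of each named entity by its type."""
    grouped = defaultdict(list)
    for entity in json.loads(response_json).get("namedEntities", []):
        grouped[entity["type"]].append(entity["textValue"])
    return dict(grouped)


def matched_text(text, position):
    """Return the slice of text covered by one position entry.

    Assumption: endIndex is inclusive, as the sample result suggests.
    """
    return text[position["beginningIndex"]:position["endIndex"] + 1]


# Shortened copy of the sample result, with two entities only.
response_json = """
{
  "namedEntities": [
    {"textValue": "Chris Stemman", "type": "person",
     "positions": [{"beginningIndex": 0, "endIndex": 12}]},
    {"textValue": "British Coffee Association", "type": "organization",
     "positions": [{"beginningIndex": 45, "endIndex": 70}]}
  ]
}
"""

text = ("Chris Stemman, the executive director of the "
        "British Coffee Association, says most of those techniques")

print(group_entities(response_json))
for entity in json.loads(response_json)["namedEntities"]:
    for position in entity["positions"]:
        print(matched_text(text, position))
```

Comparing the recovered slice with textValue is a simple way to confirm that the positions refer to the same text that was sent in the request.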