Named Entity Recognition
The PoolParty Extractor can also extract named entities.
To do so, use the nerParameters parameter, an array of the methods and types to use for named entity recognition.
Note
The array index starts at 0 and increases by 1 for each additional entry. For example, you can use nerParameters[0].type=person&nerParameters[1].type=location to extract both person names and locations.
The parameters in the method invocation must be URL-encoded, so the brackets of the array index become %5B and %5D.
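As a minimal sketch of the encoding described in the note, the following Python snippet builds the indexed nerParameters entries and URL-encodes them with the standard library. The helper name build_ner_query is hypothetical and not part of the PoolParty API; only the parameter names come from this page.

```python
from urllib.parse import urlencode

def build_ner_query(entity_types):
    """Return a URL-encoded query string with one nerParameters entry per type.

    build_ner_query is a hypothetical helper for illustration only.
    """
    params = []
    for index, entity_type in enumerate(entity_types):
        # The array index starts at 0; urlencode turns "[" and "]" into %5B and %5D.
        params.append((f"nerParameters[{index}].type", entity_type))
    return urlencode(params)

query = build_ner_query(["person", "location"])
print(query)
```

The resulting string can be appended to the other query parameters of the extract call.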
Additionally, you can combine the nerParameters parameter with the showMatchingPosition parameter to get the position of the matched text. For more information on this parameter, see How to Use the Extraction Service.
The allowed methods for extracting named entities are maximum entropy (method=MAXIMUM_ENTROPY) and rule-based (method=RULE_BASED). The default is the maximum entropy method.
With the rule-based method, the PoolParty Extractor can extract person names (type=person) and locations (type=location).
With the maximum entropy method (method=MAXIMUM_ENTROPY), the PoolParty Extractor can extract:
person names (type=person), locations (type=location) and organisations (type=organization) using the corresponding pre-trained Apache OpenNLP models
custom named entities using your own Apache OpenNLP models
The following call uses the maximum entropy method to extract person names, locations and organisations:
Request
{{url}}/extractor/api/extract?text=Chris Stemman, the executive director of the British Coffee Association, says most of those techniques from decaffeination’s earliest days are still being used today. But the process isn’t as straightforward as you’d expect. “It isn’t done by the coffee companies themselves,” says Stemann. “There are specialist decaffeination companies that carry it out.” Many of these companies are based in Europe, Canada, the US and South America.&projectId={{project}}&language=en&numberOfTerms=0&nerParameters%5B0%5D.method=MAXIMUM_ENTROPY&nerParameters%5B0%5D.type=person&nerParameters%5B1%5D.method=MAXIMUM_ENTROPY&nerParameters%5B1%5D.type=organization&nerParameters%5B2%5D.method=MAXIMUM_ENTROPY&nerParameters%5B2%5D.type=location&numberOfConcepts=0
Result
{
  "namedEntities": [
    {
      "textValue": "Chris Stemman",
      "type": "person",
      "frequency": 1,
      "score": 100,
      "method": "MAXIMUM_ENTROPY",
      "positions": [ { "beginningIndex": 0, "endIndex": 12 } ]
    },
    {
      "textValue": "British Coffee Association",
      "type": "organization",
      "frequency": 1,
      "score": 85,
      "method": "MAXIMUM_ENTROPY",
      "positions": [ { "beginningIndex": 45, "endIndex": 70 } ]
    },
    {
      "textValue": "Europe",
      "type": "location",
      "frequency": 1,
      "score": 12,
      "method": "MAXIMUM_ENTROPY",
      "positions": [ { "beginningIndex": 395, "endIndex": 400 } ]
    },
    {
      "textValue": "Canada",
      "type": "location",
      "frequency": 1,
      "score": 11,
      "method": "MAXIMUM_ENTROPY",
      "positions": [ { "beginningIndex": 403, "endIndex": 408 } ]
    },
    {
      "textValue": "US",
      "type": "organization",
      "frequency": 1,
      "score": 9,
      "method": "MAXIMUM_ENTROPY",
      "positions": [ { "beginningIndex": 415, "endIndex": 416 } ]
    },
    {
      "textValue": "South America",
      "type": "location",
      "frequency": 1,
      "score": 8,
      "method": "MAXIMUM_ENTROPY",
      "positions": [ { "beginningIndex": 422, "endIndex": 434 } ]
    }
  ]
}
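The positions in the result can be mapped back to the submitted text. The Python sketch below does this for a shortened copy of the sample above. It assumes that endIndex is inclusive, which is inferred from the sample values (for example, 0 to 12 for the 13-character "Chris Stemman") rather than stated on this page. The helper name matched_spans is hypothetical.

```python
import json

# Shortened copy of the sample request text; the first part has 45 characters,
# so "British" starts at index 45 as in the sample result.
text = ("Chris Stemman, the executive director of the "
        "British Coffee Association, says most of those techniques ...")

# Shortened copy of the sample result, keeping only the fields used here.
response = json.loads("""
{ "namedEntities": [
  { "textValue": "Chris Stemman", "type": "person",
    "positions": [ { "beginningIndex": 0, "endIndex": 12 } ] },
  { "textValue": "British Coffee Association", "type": "organization",
    "positions": [ { "beginningIndex": 45, "endIndex": 70 } ] }
] }
""")

def matched_spans(text, response):
    """Return (type, matched substring) pairs for every reported position."""
    spans = []
    for entity in response.get("namedEntities", []):
        for position in entity.get("positions", []):
            start = position["beginningIndex"]
            end = position["endIndex"]  # assumed inclusive, so add 1 when slicing
            spans.append((entity["type"], text[start:end + 1]))
    return spans

spans = matched_spans(text, response)
for entity_type, value in spans:
    print(entity_type, value)
```

Comparing each slice with the entity's textValue is a quick way to check that the text sent and the text indexed by the service are the same.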