
Entity Extractor Overview

Abstract

This section introduces the PoolParty Extractor. If you are new to the Extractor, read this section first to familiarize yourself with its main features.

What the Extractor Does

The Extractor is an API that provides text-mining functionality based on a thesaurus in PoolParty. The first prerequisite is therefore a thesaurus in a PoolParty project.

If you do not have one yet, we recommend starting with the PoolParty - Quick Start Guide.
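Since the Extractor is accessed as an API, a request to it typically carries the text to analyze and identifies the project whose thesaurus should be used. The Python sketch below only builds such a request with the standard library and does not send it. The endpoint path `/extractor/api/extract`, the parameter names `projectId`, `language` and `text`, the use of HTTP Basic authentication, the function name `build_extract_request`, and the server URL and credentials in the usage lines are all assumptions for illustration; check them against the Extractor API reference of your PoolParty version.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(base_url, project_id, text, language, username, password):
    """Build (but do not send) a POST request asking the Extractor to analyze `text`.

    ASSUMPTION: the endpoint path, parameter names and Basic authentication
    are illustrative; verify them against your PoolParty Extractor API reference.
    """
    url = base_url.rstrip("/") + "/extractor/api/extract"  # assumed endpoint path
    # Form-encoded body: the project selects the thesaurus, `text` is what gets mined.
    body = urlencode(
        {"projectId": project_id, "language": language, "text": text}
    ).encode("utf-8")
    req = Request(url, data=body, method="POST")
    # Assumed HTTP Basic authentication with a PoolParty user account.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


# Hypothetical server, project ID and credentials.
req = build_extract_request(
    "https://poolparty.example.com", "my-project-id", "Solar energy", "en", "user", "secret"
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it; the structure of the response depends on the server and is described in the API reference.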

Indexing - Extraction Model

One important detail to keep in mind is that the Extractor uses an indexed data structure built from the thesaurus, so that it can match text quickly against all of the thesaurus data. This data structure is called the extraction model. When the thesaurus changes, the extraction model must be refreshed before the latest data can be used for extraction. The refresh is triggered manually because rebuilding the index takes some time, so it cannot be done automatically every time a user changes something in the thesaurus.
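To make the refresh behaviour concrete, here is a toy model in Python. It is not PoolParty's implementation: the `Thesaurus` and `ExtractionModel` classes, the concept URIs, and the simple one- and two-word label matching are hypothetical. The sketch only shows why an index built from a snapshot of the thesaurus keeps returning old results after the thesaurus is edited, until the index is rebuilt.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Hypothetical toy thesaurus: concept URI -> list of labels."""
    concepts: dict = field(default_factory=dict)


class ExtractionModel:
    """Hypothetical stand-in for the extraction model: a label index built from a thesaurus."""

    def __init__(self, thesaurus):
        self.thesaurus = thesaurus
        self.index = {}
        self.refresh()

    def refresh(self):
        # Rebuild the whole index from the current thesaurus content.
        # In a real system this is the slow step, so it only runs on request.
        self.index = {
            label.lower(): uri
            for uri, labels in self.thesaurus.concepts.items()
            for label in labels
        }

    def extract(self, text):
        # Look up single words and two-word phrases in the index only;
        # the thesaurus itself is never consulted during extraction.
        words = re.findall(r"\w+", text.lower())
        candidates = set(words) | {" ".join(pair) for pair in zip(words, words[1:])}
        return {self.index[c] for c in candidates if c in self.index}


thesaurus = Thesaurus({"ex:Solar": ["solar energy", "photovoltaics"]})
model = ExtractionModel(thesaurus)

thesaurus.concepts["ex:Wind"] = ["wind power"]  # edit the thesaurus after indexing
print(sorted(model.extract("Wind power and photovoltaics")))  # ['ex:Solar'] (stale index)

model.refresh()  # explicit, manual refresh
print(sorted(model.extract("Wind power and photovoltaics")))  # ['ex:Solar', 'ex:Wind']
```

Rebuilding the whole dictionary on every refresh is the simplest choice for a sketch; it mirrors the trade-off described above, where extraction stays fast because it only reads the index, at the cost of an explicit rebuild step after thesaurus changes.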

This introduction walks you through the main methods of the API using concrete examples.

The examples use a PoolParty project that you can download from the data section to run them on your own.