Extraction Model and Indexing

The Extractor matches text against an indexed data structure built from the thesaurus, which allows fast matching across all data. This data structure is called the extraction model. It is not refreshed automatically: after changing the thesaurus, you have to trigger the refresh manually before the Extractor uses the latest data. Updating the index takes some time.
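To illustrate why a refresh is needed, here is a toy sketch in Python. It is not how PoolParty implements the extraction model; the function names `build_index` and `match`, the concept IDs, and the labels are all made up for illustration. It only shows the general idea that an index is a snapshot of the thesaurus, so changes stay invisible to matching until the index is rebuilt.

```python
from typing import Dict, List


def build_index(thesaurus: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every lower-cased label to its concept ID (a snapshot of the thesaurus)."""
    index: Dict[str, str] = {}
    for concept_id, labels in thesaurus.items():
        for label in labels:
            index[label.lower()] = concept_id
    return index


def match(index: Dict[str, str], text: str) -> List[str]:
    """Return the sorted IDs of concepts whose label occurs as a word in the text."""
    words = text.lower().split()
    return sorted({index[word] for word in words if word in index})


# Hypothetical thesaurus: concept ID -> labels.
thesaurus = {"c1": ["wine"], "c2": ["beer"]}
index = build_index(thesaurus)

# The thesaurus changes, but the index is still the old snapshot.
thesaurus["c3"] = ["cider"]
stale = match(index, "cider and wine")

# Only after rebuilding the index is the new concept found.
index = build_index(thesaurus)
fresh = match(index, "cider and wine")

print(stale)  # ['c1']
print(fresh)  # ['c1', 'c3']
```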

As mentioned before, we provide a sample PoolParty project and walk you through a few very simple and a few more complex calls. To get your copy, download the sample project file, which you will then use to create your project. Follow the instructions on how to create a PoolParty project using the Create Project from PoolParty Archive function.

You can use the PoolParty sample project to execute some Extractor calls in your browser's address bar. A simple Extractor call that queries the projects is {{url}}/extractor/api/projects/, where {{url}} stands for the server running your PoolParty installation. You can also use tools such as curl or Postman to execute calls to the API. Keep in mind that you need to authenticate using either OAuth 2.0 or Basic Authentication to access the API endpoints.
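The same call can also be scripted. The following is a minimal sketch in Python using only the standard library and Basic Authentication. The server address `https://poolparty.example.com`, the credentials, and the helper names `build_projects_request` and `fetch_projects` are placeholders chosen for this example; only the endpoint path is taken from the text above.

```python
import base64
import urllib.request


def build_projects_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Extractor project list."""
    # base_url takes the place of {{url}}; the path is the one given in the text.
    endpoint = base_url.rstrip("/") + "/extractor/api/projects/"
    # Basic Authentication sends "username:password" Base64-encoded in a header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(endpoint, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_projects(base_url: str, username: str, password: str) -> str:
    """Send the request and return the raw response body as text."""
    request = build_projects_request(base_url, username, password)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_projects("https://poolparty.example.com", "apiuser", "secret")` with your own server address and credentials should return the response body of the project query. The format of that response is not covered here.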

The concepts required for the extraction model are organized in concept schemes: each scheme has top concepts, and regular concepts are assigned to their respective top concept. Keep in mind that the Extractor needs top concepts to perform categorization, whereas basic information on each concept is contained in the extraction model. Furthermore, a thesaurus may contain multiple concept schemes, so you will need filters to narrow down the number of results. A more in-depth description is beyond the scope of this quick start guide.