Harvest Linked Data to Generate a Seed Thesaurus

The PoolParty Linked Data harvesting feature lets you create a seed thesaurus as a starting point for your thesaurus project.

Seed thesauri are based on DBpedia, the semantic web version of Wikipedia. PoolParty generates a first seed model for your thesaurus based on the category system and the respective resources available in this vast pool of information.
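To get a feeling for the category system a seed model draws on, you can explore DBpedia yourself. The following minimal sketch only builds SPARQL query strings that you could paste into a public DBpedia SPARQL endpoint. It assumes DBpedia's usual modelling, where categories are linked by skos:broader and resources are assigned to categories by dct:subject. The function names are hypothetical, and the sketch does not show how PoolParty queries DBpedia internally.

```python
# Illustrative sketch: build SPARQL queries against DBpedia's category system.
# The helper names are hypothetical; this is not PoolParty's own implementation.

DBPEDIA_CATEGORY_PREFIX = "http://dbpedia.org/resource/Category:"


def category_uri(category_name: str) -> str:
    """Turn a category label such as 'Renewable energy' into a DBpedia category URI."""
    return DBPEDIA_CATEGORY_PREFIX + category_name.strip().replace(" ", "_")


def subcategory_query(category_name: str, limit: int = 100) -> str:
    """Query for the direct subcategories of a category (linked via skos:broader)."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?subcategory WHERE {\n"
        f"  ?subcategory skos:broader <{category_uri(category_name)}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def member_query(category_name: str, limit: int = 100) -> str:
    """Query for the resources assigned to a category (linked via dct:subject)."""
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?resource WHERE {\n"
        f"  ?resource dct:subject <{category_uri(category_name)}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(subcategory_query("Renewable energy", limit=20))
```

The category names you pass in correspond to the categories you would pick in the dialogue described below. The subcategories returned by the first query are what a deeper harvest would descend into.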

Prerequisites for Using Linked Data Harvesting in Your Project

You have created a new project and concept schemes for the domains you want to harvest.

How to Execute Linked Data Harvesting

  • Select the Linked Data Harvesting menu entry from the context menu of a concept scheme. This opens the Generate Seed Thesaurus dialogue (1).

  • Type a search string into the search field (2). The available categories are then displayed by autocomplete (3) as a paginated list of categories matching your search string.

    • You can look up a category by clicking the Link icon next to it, and select categories by double-clicking them.

Once you have selected the categories you want to start from, you can set the following parameters in the Additional Information & URI Generation section (4) to generate a seed model:

  • Depth

    • You can choose how many levels of categories below the chosen categories you want to harvest.

  • Add Relations, Add Definitions, Add Alternative Labels

    • You can choose to include relations, definitions and alternative labels based on the information in DBpedia.

After selecting categories and defining the parameters for the harvesting process, click Generate to start the process.
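The effect of the Depth parameter can be illustrated with a small, self-contained sketch. It walks a toy category hierarchy level by level and stops at the chosen depth. The hierarchy data, the function name and the convention that a depth of 0 returns only the selected categories are assumptions made for this illustration. They are not taken from PoolParty.

```python
from collections import deque

# Toy stand-in for a category hierarchy (hypothetical data, not real DBpedia content).
SUBCATEGORIES = {
    "Energy": ["Renewable energy", "Fossil fuels"],
    "Renewable energy": ["Solar energy", "Wind power"],
    "Solar energy": ["Photovoltaics"],
}


def harvest_categories(start_categories, depth):
    """Return {category: level} for every category at most `depth` levels
    below the starting categories (the starting categories are level 0)."""
    levels = {name: 0 for name in start_categories}
    queue = deque(start_categories)
    while queue:
        current = queue.popleft()
        if levels[current] >= depth:
            continue  # depth limit reached, do not descend further
        for child in SUBCATEGORIES.get(current, []):
            if child not in levels:  # skip categories that were already visited
                levels[child] = levels[current] + 1
                queue.append(child)
    return levels


if __name__ == "__main__":
    for depth in range(3):
        print(depth, sorted(harvest_categories(["Energy"], depth)))
```

In this sketch each additional level can multiply the number of harvested categories, which is one reason why a larger depth makes the generation take longer.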

[Screenshot: the Generate Seed Thesaurus dialogue with the numbered elements (1) to (4)]

Note

The generation of a seed model may take a while depending on the number of categories and the parameters you have defined.

Tip

You can set up your own DBpedia Cache to improve the performance of the seed model generation. If you need more information or support, contact support@poolparty.biz.

PoolParty Academy Tutorial

(Duration: 16m10s)