Harvest Linked Data to Generate a Seed Thesaurus

The PoolParty Linked Data harvesting feature lets you create a seed thesaurus as a starting point for your thesaurus project.

Seed thesauri must be based on DBpedia, the semantic web version of Wikipedia. PoolParty generates a first seed model for your thesaurus from the DBpedia category system and the resources assigned to those categories.
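To illustrate what the DBpedia category system looks like as linked data, the following sketch builds a SPARQL query that lists the direct subcategories of a category. It relies on DBpedia modelling subcategories with skos:broader and on category URIs of the form http://dbpedia.org/resource/Category:Name. The function name subcategory_query is hypothetical, and this is not a description of how PoolParty queries DBpedia internally.

```python
# Illustrative sketch only; not PoolParty's actual harvesting query.
DBPEDIA_CATEGORY_PREFIX = "http://dbpedia.org/resource/Category:"


def subcategory_query(category_name: str) -> str:
    """Build a SPARQL query listing the direct subcategories of a DBpedia category."""
    # DBpedia category URIs use underscores in place of spaces.
    # Assumption: the name needs no further URL encoding.
    category_uri = DBPEDIA_CATEGORY_PREFIX + category_name.strip().replace(" ", "_")
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?sub ?label WHERE {\n"
        # A subcategory points to its parent category via skos:broader.
        f"  ?sub skos:broader <{category_uri}> .\n"
        "  ?sub rdfs:label ?label .\n"
        '  FILTER (lang(?label) = "en")\n'
        "}"
    )


if __name__ == "__main__":
    print(subcategory_query("Renewable energy"))
```

The resulting query string can be pasted into any SPARQL client that has access to a DBpedia endpoint.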

Prerequisites
  • You have created a new project and concept schemes for the domains you want to harvest.

  • You have activated the DBpedia Categories lookup for your project by selecting DBpedia Categories as the LOD source in the Linked Data Sources dialogue.

    To access this dialogue, select Advanced and then Linked Data Administration.

Procedure
  1. Right-click the concept scheme in your thesaurus project and select Linked Data Harvesting...

    Linked-data-harvesting-option.jpg

    The Generate Seed Thesaurus dialogue opens.

  2. Start typing the name of the DBpedia category that you want to use to seed your thesaurus into the Search Term field.

    As you type, the categories matching your query are listed in the Available Categories section.

  3. Select the LOD Source.

  4. Specify the parameters for the harvesting process in the Additional Information section.

    • Depth

      Determines how many levels of subcategories below the selected categories are included in your thesaurus.

    • Add Alternative Labels, Add Definitions, Add Relations

      Select these options if you want your thesaurus to contain alternative labels, definitions and relations based on the information in DBpedia.

  5. Select the categories you want to use to seed your thesaurus by double-clicking them in Available Categories.

    Tip

    To display the Wikipedia page of a category in a new tab, click Wikipedia.

  6. Click Generate to start the linked data harvesting process.

    Generate-Seed-Thesaurus-dialogue.jpg

    Note

    The generation of a seed model may take a while depending on the number of categories and the parameters you have defined.
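The effect of the Depth parameter from step 4 can be pictured as a depth-limited walk over the category hierarchy. The sketch below is a minimal illustration on a made-up toy hierarchy. The names SUBCATEGORIES and harvest are hypothetical, and it assumes that a depth of 0 means only the selected categories themselves and that each additional level adds one more layer of subcategories. PoolParty's actual behaviour may differ.

```python
from collections import deque

# Hypothetical toy hierarchy: category -> direct subcategories.
SUBCATEGORIES = {
    "Energy": ["Renewable energy", "Fossil fuels"],
    "Renewable energy": ["Solar energy", "Wind power"],
    "Solar energy": ["Photovoltaics"],
}


def harvest(selected, depth):
    """Return the selected categories plus subcategories up to `depth` levels below them."""
    seen = set(selected)
    result = list(selected)
    # Each queue entry is (category, level below the selected categories).
    queue = deque((category, 0) for category in selected)
    while queue:
        category, level = queue.popleft()
        if level == depth:
            continue  # Depth limit reached; do not descend further.
        for sub in SUBCATEGORIES.get(category, []):
            if sub not in seen:  # Categories can be reachable via several parents.
                seen.add(sub)
                result.append(sub)
                queue.append((sub, level + 1))
    return result


if __name__ == "__main__":
    print(harvest(["Energy"], 1))
    print(harvest(["Energy"], 2))
```

Each extra level can add many categories at once, which is one reason why generation takes longer as Depth increases.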

Tip

You can set up your own DBpedia Cache to improve the performance of the seed model generation. If you need more information or support, contact support@poolparty.biz.

If you would like to learn more about this topic, please watch this PoolParty Academy Tutorial video:

2.6 Linked Data Management Basics

If the video is not available, you can sign up for the PoolParty Academy.