Release Notes - PoolParty 4.0

The release notes for PoolParty 4.0.0 cover the major developments, improvements and changes made in this release. The release notes are divided into three chapters:

  • Highlights
    Newly developed functions or major improvements of existing functions.
  • Improvements
    Minor improvements of existing functions.
  • Fixes
    Fixes for problems found in previous releases of PoolParty.

Besides the functional changes listed below, PoolParty 4.0.0 also brings a major change in how PoolParty is delivered: we have switched from a component-driven approach to a product bundle approach. This provides greater flexibility in tailoring the PoolParty solution to customer needs. The following bundles are available:

  • PoolParty Basic Server (Thesaurus Server Basic Edition = PPT Basic)
  • PoolParty Advanced Server (Thesaurus Server Advanced Edition = PPT Advanced)
  • PoolParty PowerTagging (Bundle of Basic Server + Extractor Basic = PPP)
  • PoolParty Enterprise Server (Bundle of full versions of Thesaurus Server and Extraction Server = PP Enterprise)
  • PoolParty Semantic Integrator (Bundle of PP Enterprise + PoolParty Search Server + Virtuoso Connection = PP Integrator)

To find out which functionalities are offered by which PoolParty version, please take a look at the PoolParty Product Matrix (not all of the new features listed below are available or fully available for all of the product bundles).

Highlights

PoolParty Corpus Management

With PoolParty Corpus Management, the PoolParty Extractor has been tightly integrated into the thesaurus management workflow. It replaces the previous document management functionality and provides many new features to improve the thesaurus modeling workflow.

Depending on the license, one or many document corpora can be created per project. Documents can be uploaded to a corpus by selecting files from a folder on the local drive, by pasting text or by providing the URL of a website.

Corpus management - Upload documents

Once the documents have been provided, they are analyzed automatically. A user dialogue is provided to review the results: the interface shows how the documents have been annotated automatically and which concepts of the thesaurus have been found in each document. Additional terms that are relevant for the document but not yet available in the thesaurus are suggested for insertion into the controlled vocabulary. Each document in the list of uploaded documents can be reviewed, and the concepts found and the terms extracted can be highlighted.

Review tagging for documents

To support several workflows, three lists are generated for each corpus:

  • Candidate Terms
    List of terms selected manually from documents for addition to the thesaurus.
  • Extracted Concepts
    List of concepts found in the documents. 
  • Extracted Terms
    List of terms found in the documents which are not yet available in the thesaurus. 

The Extracted Terms list and the Candidate Terms list can be used to extend the thesaurus with alternative or hidden labels for existing concepts, or to add new narrower concepts below existing concepts. The Extracted Concepts list reveals gaps between the document corpus and the thesaurus: it shows the concepts from the thesaurus that were found in the document corpus, and it also lists the concepts in the thesaurus that were not found in the document corpus. All lists provide different functions to integrate new concepts into the thesaurus or to improve existing concepts by adding synonyms. Below you can see the extracted concepts list.

Extracted concepts list
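
The gap analysis described above can be illustrated with a short sketch. It is not the PoolParty Extractor's implementation: it assumes a thesaurus given as a plain dictionary of concept names and labels, and it uses naive case-insensitive substring matching where a real extractor would do linguistic analysis. The function name and the sample data are hypothetical.

```python
def corpus_gap_analysis(thesaurus_labels, documents):
    """Split thesaurus concepts into those found / not found in a corpus.

    thesaurus_labels: dict mapping a concept name to a list of its labels.
    documents: list of plain-text document strings.
    Matching is a naive substring test (an assumption made for brevity).
    """
    corpus_text = " ".join(documents).lower()
    found, missing = [], []
    for concept, labels in sorted(thesaurus_labels.items()):
        if any(label.lower() in corpus_text for label in labels):
            found.append(concept)
        else:
            missing.append(concept)
    return found, missing


# Hypothetical sample data
thesaurus = {
    "Cocktail": ["cocktail", "mixed drink"],
    "Glassware": ["glassware"],
    "Garnish": ["garnish"],
}
docs = [
    "A mixed drink is usually served chilled.",
    "Choose the glassware before you start.",
]
found, missing = corpus_gap_analysis(thesaurus, docs)
print(found)    # ['Cocktail', 'Glassware']
print(missing)  # ['Garnish']
```

Concepts in the second list are candidates for review: either the corpus does not cover them, or they lack the labels under which they appear in the documents.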

The functionality is basically available for all product bundles. For the Basic, Advanced and PowerTagging bundles, Corpus Management is limited to a single document corpus, and for the Basic and PowerTagging bundles only the Candidate Terms list is available.

Watch a video to learn more about this feature

PoolParty Quality Management

With the integration of qSKOS, the quality management in PoolParty has been significantly improved. The eight most relevant quality checks from qSKOS have been added to replace and extend the quality queries available in older PoolParty releases. The checks can be set to:

  • enforce (not available for all checks)
    violations are prohibited when editing
  • report
    violations are written to the quality report 
  • ignore
    violations are ignored
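
As a rough illustration of how the three modes differ, the sketch below models them as an enumeration and applies one mode to the violations of a single check. The names `CheckMode` and `handle_violations` are hypothetical and are not part of PoolParty or qSKOS. The sketch only mirrors the behaviour listed above.

```python
from enum import Enum


class CheckMode(Enum):
    ENFORCE = "enforce"  # violations are prohibited when editing
    REPORT = "report"    # violations are written to the quality report
    IGNORE = "ignore"    # violations are ignored


def handle_violations(mode, violations, report):
    """Apply one check's mode to the violations it produced.

    Hypothetical helper: rejects the edit, records the violations in
    `report`, or does nothing, depending on `mode`.
    """
    if not violations or mode is CheckMode.IGNORE:
        return
    if mode is CheckMode.ENFORCE:
        raise ValueError(f"edit rejected: {violations[0]}")
    report.extend(violations)


report = []
handle_violations(CheckMode.REPORT, ["label conflict on concept/1"], report)
handle_violations(CheckMode.IGNORE, ["orphan concept/7"], report)
print(report)  # ['label conflict on concept/1']
```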

Six predefined quality settings are provided. They configure the checks differently depending on the application for which the thesaurus is being developed. It is also possible to create a custom quality schema. The quality setting is assigned on project creation, but it can be changed later via the Quality Settings dialogue, which is available via the Advanced menu or directly in the project's metadata tab.

Adding quality settings for a project

A quality report can be generated for the project via the respective tab in the project's details view or the entry in the Tools menu. The output is available for the whole project in the project's details view, and also per concept in the concept's details view. For example, the report below shows that one concept was found where the same label is defined as a preferred and an alternative label in the same language.

Quality report for project
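
The kind of violation mentioned in this example (the same label used as preferred and alternative label in one language) can be sketched in a few lines. This is an illustration only, not the qSKOS or PoolParty implementation: it assumes concepts are held in plain dictionaries and compares labels by exact string equality. The function name and data layout are hypothetical.

```python
def find_label_conflicts(concepts):
    """Return (concept, language, label) triples where a label is used as
    both preferred and alternative label in the same language."""
    conflicts = []
    for name, labels in concepts.items():
        for lang, pref in labels.get("prefLabel", {}).items():
            alts = labels.get("altLabel", {}).get(lang, [])
            if pref in alts:  # exact match only (simplifying assumption)
                conflicts.append((name, lang, pref))
    return conflicts


# Hypothetical sample data: one prefLabel per language, many altLabels
concepts = {
    "concept/1": {
        "prefLabel": {"en": "Car", "de": "Auto"},
        "altLabel": {"en": ["Automobile", "Car"], "de": ["Wagen"]},
    },
    "concept/2": {
        "prefLabel": {"en": "Bicycle"},
        "altLabel": {"en": ["Bike"]},
    },
}
print(find_label_conflicts(concepts))  # [('concept/1', 'en', 'Car')]
```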

Watch a video to learn more about this feature

Linked data harvesting

The full Skossy functionality has been integrated into the thesaurus server backend. This offers the highly convenient option to generate seed thesauri on-the-fly, instead of starting with an empty project.


Autopopulate a thesaurus from DBpedia

This function is included in the Enterprise Server and the Semantic Integrator bundle. For the other product bundles it can be obtained as an additional add-on.

Watch a video to learn more about this feature

Import of Excel Taxonomies

The simple CSV list import has been replaced by a more elaborate Excel importer. It allows not only the import of simple lists but also the reuse of existing term hierarchies created in Excel. The data has to be provided in a specific structure. As before, labels in different languages are supported. A valid import file must follow these rules:

  • The first row is the header row and must be labeled according to the following rules.
  • The first column must be labeled "scheme".
  • The second column must be labeled "concept" without a language definition.
  • There can be multiple columns labeled "concept" to form a hierarchy.
  • After the concept columns, there can be one or more of the following columns in any order: prefLabel@lang, altLabel[@lang], hiddenLabel[@lang], scopeNote[@lang], definition[@lang]
  • prefLabel has a mandatory language tag, as there can only be one prefLabel per language, and the prefLabel in the default language will be set from the concept column.
  • The other fields, altLabel, hiddenLabel, scopeNote and definition, have an optional language tag.
  • altLabel and hiddenLabel can occur multiple times per language.
  • If no scheme should be added (Excel subtree import), the column is left empty.
  • Only one entry per row is allowed.

Find below a very simple import file providing a scheme with some concepts. An import including concept schemes can be done via the default project import dialogue or via the context menu of the project node. A subtree import can be done via the context menu of a concept scheme or concept.

scheme   | concept      | concept     | concept       | altLabel  | definition             | definition
Scheme 1 |              |             |               |           |                        |
         | TopConcept 1 |             |               |           |                        |
         |              | Concept 1.1 |               |           |                        |
         |              |             | Concept 1.1.1 |           |                        |
         |              | Concept 1.2 |               | Alt 1.2   |                        |
         | TopConcept 2 |             |               |           |                        |
         |              | Concept 2.1 |               |           |                        |
         |              |             | Concept 2.1.1 | Alt 2.1.1 | First Definition 2.1.1 | Second Definition 2.1.1

Simple Excel import file
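
The header rules listed above can be checked before an import is attempted. The following is a minimal sketch and not the importer's own validation: it assumes the header row has already been read from the spreadsheet into a list of strings, that column names are case-sensitive, and that a language tag consists of letters and hyphens only. `validate_header` is a hypothetical helper name.

```python
import re

# Columns allowed after the concept columns. The language tag is optional,
# except for prefLabel, where it is mandatory.
LABEL_COLUMN = re.compile(
    r"prefLabel@[A-Za-z-]+"
    r"|(?:altLabel|hiddenLabel|scopeNote|definition)(?:@[A-Za-z-]+)?"
)


def validate_header(header):
    """Return a list of problems found in the header row (empty if valid)."""
    problems = []
    if not header or header[0] != "scheme":
        problems.append('first column must be labeled "scheme"')
    if len(header) < 2 or header[1] != "concept":
        problems.append('second column must be labeled "concept"')
    # Skip over the run of "concept" columns that forms the hierarchy.
    i = 1
    while i < len(header) and header[i] == "concept":
        i += 1
    # Everything after the concept columns must be a label or note column.
    for position, name in enumerate(header[i:], start=i + 1):
        if not LABEL_COLUMN.fullmatch(name):
            problems.append(f"column {position}: unexpected label {name!r}")
    return problems


example = ["scheme", "concept", "concept", "concept",
           "altLabel", "definition", "definition"]
print(validate_header(example))                            # []
print(validate_header(["scheme", "concept", "prefLabel"]))
```

The first call uses the header of the sample file and reports no problems. The second call reports the third column, because prefLabel requires a language tag.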

Watch a video to learn more about this feature

Improvements

In the following section you will find a list of some of the most prominent improvements:

  • Create concepts from DBpedia
    Provides suggestions for new concepts in the autocomplete of the create concept dialog.

    This function is included in the Enterprise Server and the Semantic Integrator bundle. For the other product bundles it can be obtained as an additional add-on.

  • Simplified configuration for PP components
    • Configuration for all components can now be found in the config folder:
      • /opt/poolparty/config (GNU/Linux)
      • C:/Users/poolparty/PoolParty/config/ (MS Windows)
    • Configuration for overall settings is done in the poolparty.properties file.
  • Simplified GUI
    • User Information has been added and can be found together with the Help menu and Log-in in the right part of the menu.
    • Search bar is more user-friendly now.
    • Icon bar has been added to switch from thesaurus view to corpus view.
  • Links to LOD resources (e.g. DBpedia) have been moved to the default graph
    As the links to LOD resources can be considered to be part of the thesaurus data (in contrast to data copied from the LOD resources like, for example, abstracts and thumbnails from DBpedia), they have been moved to the default graph of a project. Copied data still resides in the respective linked data graphs.
  • PoolParty 4 is compatible with Java 7
  • Edit metadata for collections
  • Browser compatibility has been extended to cover Chrome and Safari
  • Project & concept scheme nodes are now available in the relation browser
  • In the LD Frontend, URIs ending with a trailing slash now resolve to the same URI without the trailing slash

Fixes

  • Non-resolving data dump URIs have been fixed.
  • Custom report (full details) provides all concepts for larger projects.
  • Alphabetic sorting in the Linked Data Frontend on concept level has been fixed.
  • BaseURL may include port number when generating a project.