Release Notes - PoolParty 5.0

Here you can find the release notes for PoolParty 5.0. As in all release notes, they cover the major developments, improvements and changes made in the respective release. The release notes are divided into three chapters: Highlights, Improvements and Fixes.

Highlights

A new license manager has been integrated into PoolParty. Licensing is now done per server rather than per component. All customers with active maintenance contracts will receive new license keys.

PoolParty Thesaurus Server (PPT)

Custom Schemes II

PoolParty's custom schema / ontology management functionality has been redesigned and extended. We now distinguish between ontologies and custom schemes. Ontologies include all predefined schemes that can be enabled (e.g. FOAF, DCterms, schema.org, ORG). These ontologies, or parts of them, can be reused in one or many custom schemes. Single classes or properties can be added to custom schemes directly from the Ontologies details view or via autocomplete when creating new classes or properties in the custom scheme. In addition, an entire ontology can be added as a custom scheme via the respective button.

PoolParty ontologies

Custom schemes are user-created schemes that can consist of classes, relations and attributes taken from any enabled ontology or created by the user. All defined custom schemes can be made available in a project as additional tabs, extending the information that can be defined for a concept or concept scheme. Labels of user-created classes and properties can be edited. Additionally, all custom schemes are published on the PoolParty server and can be made available to other applications.

PoolParty Custom Schemes

The expressivity of PoolParty's custom scheme functionality has been extended by the ability to define multiple domain and range restrictions for relations and attributes. Furthermore, a subset of schema.org has been added to the available ontologies; it can now be used as a basis for developing custom schemes.

Extended corpus crawling mechanism

The upload functionality in Corpus Management has been extended. You can now crawl websites up to four levels deep, provide the URL of an RSS feed to retrieve data from, or retrieve the data from all DBpedia/Wikipedia pages linked to the concepts in your thesaurus.

Extended corpus crawling mechanism
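How PoolParty's crawler works internally is not described in these notes. As a rough illustration of what a level limit means, here is a minimal breadth-first sketch, assuming a hypothetical in-memory link map (`LINKS`) in place of real HTTP fetching, and assuming that the start URL counts as level 1.

```python
from collections import deque

# Hypothetical link graph standing in for fetched pages: URL -> outgoing links.
LINKS = {
    "https://example.org/": ["https://example.org/a", "https://example.org/b"],
    "https://example.org/a": ["https://example.org/a/1"],
    "https://example.org/a/1": ["https://example.org/a/1/x"],
    "https://example.org/a/1/x": ["https://example.org/a/1/x/deep"],
    "https://example.org/b": ["https://example.org/"],  # cycle back to the start page
}

def crawl(start_url, max_levels=4, get_links=lambda url: LINKS.get(url, [])):
    """Breadth-first crawl returning {url: level}; the start URL is level 1 (assumption)."""
    visited = {start_url: 1}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        level = visited[url]
        if level == max_levels:
            continue  # pages on the last level are kept, but their links are not followed
        for link in get_links(url):
            if link not in visited:  # skip pages already seen (handles cycles)
                visited[link] = level + 1
                queue.append(link)
    return visited
```

With `max_levels=4`, the page `.../a/1/x` is collected on level 4, while `.../a/1/x/deep` is never visited.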


Corpus API

The Corpus API allows you to manage the corpora of your projects programmatically. For example, its services allow you to create new corpora and to upload documents to a corpus, either as plain text or as a file. To fully support a feedback loop, the corpus analysis can also be triggered via the API. As before, the manual curation review lets you insert all relevant term suggestions into your thesaurus.

With the Corpus API you can therefore keep your corpus continuously up to date with relevant content from external sources (e.g. a CMS), so that newly appearing terms are reflected in your thesaurus project.
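The feedback loop described above can be sketched as "upload what is new, then trigger one analysis run". The endpoints of the Corpus API are not listed here, so the client below (`FakeCorpusClient` with `upload_text` and `start_analysis`) is a hypothetical in-memory stand-in and not PoolParty's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class FakeCorpusClient:
    """Hypothetical in-memory stand-in for a Corpus API client (not PoolParty's API)."""
    documents: dict = field(default_factory=dict)
    analyses_started: int = 0

    def upload_text(self, corpus_id, doc_id, text):
        self.documents.setdefault(corpus_id, {})[doc_id] = text

    def start_analysis(self, corpus_id):
        self.analyses_started += 1

def sync_from_cms(client, corpus_id, cms_docs, already_synced):
    """Upload CMS documents not yet in the corpus, then trigger a single analysis run."""
    new_ids = [doc_id for doc_id in cms_docs if doc_id not in already_synced]
    for doc_id in new_ids:
        client.upload_text(corpus_id, doc_id, cms_docs[doc_id])
        already_synced.add(doc_id)
    if new_ids:
        # Re-analysing only when content changed; new terms then go to curation review.
        client.start_analysis(corpus_id)
    return new_ids
```

Running `sync_from_cms` periodically (e.g. from a scheduler) keeps the corpus in step with the CMS without re-triggering the analysis when nothing has changed.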

PoolParty Extractor (PPX)

Thesaurus based disambiguation

A frequently observed phenomenon in controlled vocabularies such as thesauri is ambiguous terms, i.e. different concepts that share the same label. This leads to incorrect annotations during text extraction. With this release, PoolParty provides a new method to distinguish such occurrences based on the thesaurus structure and the context surrounding the ambiguous term. The following example explains the method.

Consider an example thesaurus with two concepts, "Data mart" and "Data mining", which share the alternative label "DM":

Disambiguation example

The method takes into account all other concepts found near the ambiguous label in a given text and evaluates how closely they are connected to each candidate concept in the thesaurus. For example, "Data mart" has the related concept "OLAP cube", whereas "Data mining" is related to the concept "SEMMA".

Relations used for disambiguation

If one of these concepts is found near the term "DM" in the text, the system can decide how "DM" should be annotated, i.e. whether it should return "Data mining" or "Data mart". This considerably improves the annotation quality of PoolParty's text mining feature.
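PoolParty's exact scoring of thesaurus "closeness" is not documented in these notes. The sketch below is a simplified stand-in: for each candidate concept it counts how many of its directly linked concepts (here only the related concepts from the example) are mentioned within a window of tokens around the ambiguous label. `NEIGHBOURS`, `LABEL_INDEX` and the window size are assumptions made for illustration.

```python
# Hypothetical thesaurus fragment from the example: concept -> directly linked concepts.
NEIGHBOURS = {
    "Data mart": {"OLAP cube"},
    "Data mining": {"SEMMA"},
}
# Hypothetical label index: label (pref or alt) -> concepts carrying that label.
LABEL_INDEX = {
    "DM": ["Data mart", "Data mining"],
    "OLAP cube": ["OLAP cube"],
    "SEMMA": ["SEMMA"],
}

def disambiguate(mentions, index, window=10):
    """mentions: (token_offset, label) pairs found in a text.

    Returns the concept chosen for mentions[index], or None if undecidable.
    """
    offset, label = mentions[index]
    candidates = LABEL_INDEX.get(label, [])
    if len(candidates) <= 1:
        return candidates[0] if candidates else None  # nothing to disambiguate
    # Collect unambiguous concepts mentioned within `window` tokens of the label.
    context = set()
    for i, (other_offset, other_label) in enumerate(mentions):
        if i != index and abs(other_offset - offset) <= window:
            concepts = LABEL_INDEX.get(other_label, [])
            if len(concepts) == 1:
                context.add(concepts[0])
    # Score each candidate by how many of its thesaurus neighbours occur nearby.
    scores = {c: len(NEIGHBOURS.get(c, set()) & context) for c in candidates}
    best = max(scores.values())
    winners = [c for c, s in scores.items() if s == best]
    return winners[0] if best > 0 and len(winners) == 1 else None
```

For instance, `disambiguate([(3, "DM"), (8, "OLAP cube")], 0)` picks "Data mart", while a text in which "DM" appears without any nearby clue stays unresolved.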

Disambiguation is enabled and configured via the respective dialogue.

Disambiguation Settings dialog

Extract content from ZIP file

A new service allows you to extract concepts and terms from content that is provided as a ZIP container.

Users of the ZIP extraction service can decide whether the documents within the ZIP container are processed individually or as a whole. Individual processing is useful when you want to process a number of documents in a single call, e.g. for performance reasons. Alternatively, multiple documents can be tagged cumulatively, which is useful, for example, when tagging a document set for which individual results are not relevant.
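The difference between the two modes can be sketched with Python's standard `zipfile` module. The `tag` callback is a hypothetical stand-in for the actual extraction call, the function names are illustrative rather than PoolParty's service interface, and UTF-8 encoded text documents are assumed.

```python
import io
import zipfile

def read_zip_documents(data):
    """Return {member name: decoded text} for every file in a ZIP archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            info.filename: archive.read(info).decode("utf-8")  # assumes UTF-8 text files
            for info in archive.infolist()
            if not info.is_dir()
        }

def extract(data, tag, individually=True):
    """Apply `tag` (hypothetical extraction callback) per document or to all documents at once."""
    docs = read_zip_documents(data)
    if individually:
        return {name: tag(text) for name, text in docs.items()}  # one result per document
    return tag("\n\n".join(docs.values()))  # one cumulative result for the whole set
```

Individual mode returns a result per file name; cumulative mode returns a single result for the merged content.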

Improvements

PPT

Improved Suggest Concepts Workflow

The Suggest Concepts service has been extended so that suggestions for broader and related concepts can be added. Suggestions can also be detailed further by adding definitions, a note and a score. In addition, the Suggested Concepts list has been replaced by the Suggested Concepts dashboard, which makes it easier to integrate suggestions into the project.

Suggested Concepts dashboard

Deprecated Concepts List

All deleted/deprecated concepts are displayed in the Deprecated Concepts dashboard. You can add them again as suggested concepts and review the history of deleted concepts.

Deprecated Concepts dashboard


Advanced Filter and Reporting for History

An advanced filter mechanism has been added to the PoolParty History tab. It allows you to filter history entries by date, language, author and/or a search string. Additionally, an export option has been added that allows you to export history entries in all default RDF serializations and to Excel.

Enhanced filter and export for history
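As an illustration of combining these filter criteria, here is a minimal sketch over a hypothetical `HistoryEntry` record; PoolParty's internal data model is not described here. Criteria left as `None` are ignored, and all criteria that are set must match (AND semantics, an assumption of this sketch).

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class HistoryEntry:  # hypothetical shape of a history entry, for illustration only
    day: date
    language: str
    author: str
    text: str

def filter_history(entries, start=None, end=None, language=None, author=None, contains=None):
    """Keep entries matching every criterion that is set; None means 'no filter'."""
    result = []
    for e in entries:
        if start is not None and e.day < start:
            continue
        if end is not None and e.day > end:
            continue
        if language is not None and e.language != language:
            continue
        if author is not None and e.author != author:
            continue
        # The search string is matched case-insensitively against the entry text.
        if contains is not None and contains.lower() not in e.text.lower():
            continue
        result.append(e)
    return result
```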

Minor Improvements:

  • Stronger password security can be configured per server.
  • Projects are now sorted case-insensitively. In addition, a subject is no longer required when creating a project.
  • A search feature has been added to the Linked Projects dialogue.
  • A "Create another" option has been added to the Create Concept dialogue.
  • Error handling has been improved for import issues caused by incompatible data in the import files.
  • Snapshots are now compressed and deleted when the project is deleted.
  • Extraction models are deleted when the corresponding project is deleted.
  • Autocomplete language can be switched directly in the search field.
  • Linked Data Harvesting has been improved in terms of performance and stability.
    • The feature is now only available at the custom scheme level.
  • A search feature has been added to the Admin Dashboard.
  • In Corpus Management a message is displayed when a user tries to upload very large files.
  • API calls for adding and deleting Concept Schemes have been added.

PPX

Improved Concept Matching

Concept matching has been reworked to resolve issues with missing matches. Concept labels containing special characters such as "/" or "(" are now matched during text extraction.
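How PoolParty's matcher was changed is not detailed here. As a generic illustration of why such labels need care (not PoolParty's implementation), the sketch below escapes the label with `re.escape` so that "/" and "(" are matched literally, and uses `(?<!\w)`/`(?!\w)` lookarounds instead of `\b`, since `\b` fails next to labels that start or end with a non-word character such as ")" or "+".

```python
import re

def find_label(label, text):
    """Return the start offsets of `label` in `text`, matched case-insensitively.

    re.escape() makes special characters literal; the lookarounds only require that
    the match is not glued to a neighbouring word character.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(label) + r"(?!\w)", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]
```

For example, `find_label("C++", "Written in C++ today")` finds the label at offset 11, whereas a `\b`-delimited pattern would miss it.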

Corpus based Free Terms Extraction in PPX API

In the previous release, free term extraction was improved in the corpus analysis of the PoolParty Thesaurus Manager (see Analyse Documents in Your Document Corpus): the frequency and distribution of terms over the entire corpus are used to filter out terms of low significance in a document. This method is now also available in the PPX API. Users can select a corpus from the Thesaurus Manager, which is then used to evaluate the free terms extracted from a document and to adjust their scores according to the corpus statistics. This results in more meaningful free terms for the processed documents.
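The exact statistics PoolParty applies are not specified in these notes. As a familiar stand-in for "frequency and distribution of terms over the corpus", the sketch below uses a smoothed TF-IDF weighting: terms that occur in many corpus documents get a lower score, and terms below a hypothetical `min_score` threshold are dropped.

```python
import math
from collections import Counter

def rank_free_terms(doc_terms, corpus_docs, min_score=0.0):
    """Rank a document's candidate free terms by TF-IDF against a corpus (stand-in scoring).

    doc_terms: terms extracted from the document, with repetitions.
    corpus_docs: one set of terms per corpus document.
    """
    tf = Counter(doc_terms)
    n_docs = len(corpus_docs)
    ranked = []
    for term, count in tf.items():
        df = sum(1 for terms in corpus_docs if term in terms)  # corpus document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed: widespread terms weigh less
        score = (count / len(doc_terms)) * idf
        if score >= min_score:  # filter out terms of low significance
            ranked.append((term, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

With a corpus in which "data" appears almost everywhere and "olap cube" only once, "olap cube" ranks above "data" for a document mentioning both once.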

Fixes

  • Highlighting of concepts in the document view of Corpus Management now works correctly for all occurrences and variations of a concept.
  • Custom schemes with https URIs can now be imported via the retrieve from URL method.
  • Data copied from DBpedia via the linked data feature to SKOS properties (e.g. dbpedia:abstract -> skos:definition) is displayed again in the SKOS tab.
  • The linguistic service (e.g. the suggest service) now also works correctly for projects with only one language.
  • Vulnerabilities in YUI 2.9 have been addressed.