Release Notes - PoolParty 5.5

Here you can find the release notes for PoolParty release 5.5. In the release notes you can find major developments, improvements and changes made in the respective release. The release notes are divided into three chapters:

  • Highlights
    Newly developed functions or major improvements of existing functions.
  • Improvements
    Minor improvements of existing functions.
  • Fixes
    Fixes for problems found in previous releases of PoolParty.

Highlights

PoolParty Semantic Suite

Updates of 3rd party libraries

Release 5.5 includes updates of 3rd party components used by the PoolParty server:

  • Update of Apache Solr to version 5.4.1
    • As of version 5, Apache Solr is no longer distributed as a webapp in the PoolParty Tomcat. It is a separate server packaged with the PoolParty installation.

Consolidated graph structure

Detailed List of Repository and Graph Structure of Data stored in PoolParty

With Release 5.5, the graph structure of PoolParty projects has been re-organised. All data of a PoolParty project now resides in graphs whose names are based on the base URL of the project. The following table provides an overview.

Category | Description | Graph Name

Project Data
  Void | Project void graph | <http://<baseUrl>/<projectId>/metadata/void>
  ADMS | Project adms graph | <http://<baseUrl>/<projectId>/metadata/adms>

Thesaurus Data
  Thesaurus Data | Includes all thesaurus data e.g. concepts, concept schemes etc. | <http://<baseUrl>/<projectId>/thesaurus>
  History Data | Includes all PoolParty history information of a project. | <http://<baseUrl>/<projectId>/thesaurus/history>
  SPARQL List Data | Includes all SPARQL lists created. | <http://<baseUrl>/<projectId>/thesaurus/sparqllists>
  SKOS Notes | Includes all SKOS change, editorial and history notes data. | <http://<baseUrl>/<projectId>/thesaurus/notes>
  Workflow Data | Includes all PoolParty workflow information of a project. | <http://<baseUrl>/<projectId>/thesaurus/workflow>
  Quality Management Data | Includes all PoolParty quality management information of a project. | <http://<baseUrl>/<projectId>/thesaurus/quality>
  Suggested Concepts Data | Includes all suggested concepts data of a project. | <http://<baseUrl>/<projectId>/thesaurus/suggestions>
  Batch Linking Data | Includes all batch linking results data of a project. | <http://<baseUrl>/<projectId>/thesaurus/linking>
  Deprecated Concepts | Includes all resources that have been deleted. | <http://<baseUrl>/<projectId>/thesaurus/deprecated>

Corpus & Extraction Data
  Corpus Data | |
  Candidate Concepts Data | Candidate Concepts Graph | <http://<baseUrl>/<projectId>/corpus/candidates>
  Disambiguation Data | Disambiguation Graph | <http://<baseUrl>/<projectId>/corpus/disambiguation>

Linked Data
  Linked Data | Data copied from linked data sources e.g. DBPedia | <http://<baseUrl>/<projectId>/linkeddata/en/dbpedia>
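
As a rough illustration of this naming scheme, the following Python sketch derives graph names for a project. The function name, the short dictionary keys and the sample values example.org and myproject are assumptions made for this example; only the path suffixes are taken from the overview. The linked data graph is left out because its name also depends on the language and the source.

```python
# Path suffixes taken from the overview; the short keys are made up for this sketch.
GRAPH_SUFFIXES = {
    "void": "metadata/void",
    "adms": "metadata/adms",
    "thesaurus": "thesaurus",
    "history": "thesaurus/history",
    "sparqllists": "thesaurus/sparqllists",
    "notes": "thesaurus/notes",
    "workflow": "thesaurus/workflow",
    "quality": "thesaurus/quality",
    "suggestions": "thesaurus/suggestions",
    "linking": "thesaurus/linking",
    "deprecated": "thesaurus/deprecated",
    "candidates": "corpus/candidates",
    "disambiguation": "corpus/disambiguation",
}


def project_graphs(base_url: str, project_id: str) -> dict:
    """Return graph names of the form http://<baseUrl>/<projectId>/<suffix>."""
    prefix = f"http://{base_url.strip('/')}/{project_id}"
    return {key: f"{prefix}/{suffix}" for key, suffix in GRAPH_SUFFIXES.items()}


graphs = project_graphs("example.org", "myproject")
print(graphs["history"])  # http://example.org/myproject/thesaurus/history
```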

Users become Resources

Before 5.5, users in PoolParty were handled as strings. With 5.5 they become resources and therefore get a URI. The base URI for users is specified as a parameter in the PoolParty Configuration Files, similar to the project, custom scheme and ontology URIs.

Make sure you choose the URI carefully. It will be used as the base URI when creating new users, and the post update script uses it when converting existing users and their references in all projects.

PPT

Import Assistant

With PoolParty 5.5 we provide an import assistant that allows you to validate and repair issues for both RDF and Excel imports. Import validation is selected by default for RDF and Excel import.

Import Dialogue

When an import is done a series of checks is made to validate that the data ...

  • ... conforms to the RDF and SKOS standard
  • ... meets the requirements expected by PoolParty Semantic Suite

For Excel import, additional checks validate the consistency of the Excel import file and its conformance to the Excel format defined for PoolParty. When the import is triggered, all checks are run and a quality report presents the results. When violations are detected, the respective data can be fixed using the provided automatic and manual repair mechanisms. Bulk repair options are available where appropriate.

Import Quality Report

When all issues are fixed the data can be imported to become available in the PoolParty Thesaurus Manager.

Context aware data modelling

PoolParty 5.5 allows you to use the skos:inScheme property to model data. This lets you explicitly define in which context (concept scheme) your concept resides. Several settings define how the skos:inScheme property is used: it can be enabled per project, and you can specify in detail whether the property should be handled manually or automatically.

SKOS inScheme setting

Once enabled, the skos:inScheme relations can be managed as all other SKOS relations in the Advanced Details View of a concept. In addition the skos:inScheme relation can be inherited on concept schemes or subtrees similarly to custom classes from custom schemes.

Show inScheme relations in concept details

This allows great freedom in using the skos:inScheme relation for modelling purposes in different scenarios. See below an example ('Brexit') where we can specify for example that

  • 'Ireland' in the context of 'British Isles' is inScheme, but not in the context of the European Union, since it's not a political entity in contrast to the 'Republic of Ireland'
  • 'England' will remain broader of 'London', no matter in which context, but it might happen that London will stay in the EU, but England will exit for sure

inScheme modeling example

Terminology import and approval

The Terminology Import Assistant supports the user when translating the taxonomy and allows checking a list of terms against the taxonomy. The term translations can be uploaded in the form of Excel files where each column represents a translation in a specific language. Upon upload the user can choose the reference language (from the columns of the Excel file), which is used to find matching labels in the taxonomy. The user can work through each term to verify if the translation is correct. If not, the user can decide which translations to update, add or change.

Please ensure your Excel file contains only one sheet. By default, Excel creates three sheets in a workbook, so you need to delete the two extra sheets and keep only one. Otherwise the Terminology import will not work.

Terminology Import Assistant

Enhanced User Management

With PoolParty 5.5, users in PoolParty are moved "from strings to things". That means that each user has a URI (which can be freely defined when created) and becomes a resource. All information around the user (roles, groups etc.) is now related to this resource.

Define URI for user

All data created by a user is now linked to that user via their URI. This makes it possible to query user-specific usage data via SPARQL and to create custom reports from it.
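
A custom report could start from a query like the one built below. This is only a sketch: these release notes do not state which property links data to a user, so the use of dcterms:creator, the helper name and the example user URI are all assumptions.

```python
def concepts_created_by(user_uri: str) -> str:
    """Build a SPARQL query that counts the concepts attributed to one user.

    Assumption: resources point to their creator via dcterms:creator.
    """
    return f"""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT (COUNT(?concept) AS ?created)
WHERE {{
  ?concept a skos:Concept ;
           dcterms:creator <{user_uri}> .
}}
""".strip()


# Hypothetical user URI, for illustration only.
print(concepts_created_by("http://example.org/user/jdoe"))
```

The resulting string can be pasted into the SPARQL endpoint of the project or sent with any SPARQL client.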

Metadata per user

In addition, user settings for selecting the UI language, the used display languages, the autocomplete language and the default view for the SKOS details tab have been introduced.

User settings

PoolParty Extractor (PPX)

Knowledge-based filtering of suggest service results

The suggest service of the Extractor has been improved so that results can be filtered based on Custom Scheme data. Additional parameters make it possible to narrow down the result set of matching concepts by providing contextual information.

As an example, when you want to provide auto-complete functionality for a specific domain you could use Custom Classes when concepts belong to a specific type of class:

Custom Class applied to skos:Concept

Formulating an Extractor suggest request that contains the search string "Paris" and includes the Custom Class https://schema.org/Organization as a request parameter lets the system focus on the hotel in Paris, since this concept has the matching type applied. In this way you can separate the person 'Paris Hilton' from the organization 'Hilton Paris'.

Example request: suggest Organizations having 'Paris' in their label
{URL}/extractor/api/suggest?projectId={projectID}&language=en&searchString=Paris&customClasses=https://schema.org/Organization

Filtering can also be done by providing a list of Custom Classes, e.g. Artists and Scientists with "Newton" in their label.
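
The example request above can be assembled programmatically, for instance as in the following Python sketch. The helper name and the sample server URL and project ID are assumptions; the path and the parameter names are taken from the example request. How several Custom Classes are encoded in one request is not specified here, so the sketch handles a single class only.

```python
from urllib.parse import urlencode


def build_suggest_url(server_url, project_id, search_string, language, custom_class):
    """Build a suggest request URL filtered by one Custom Class."""
    params = {
        "projectId": project_id,
        "language": language,
        "searchString": search_string,
        "customClasses": custom_class,
    }
    # Keep ':' and '/' unescaped so the result reads like the example request.
    query = urlencode(params, safe=":/")
    return f"{server_url.rstrip('/')}/extractor/api/suggest?{query}"


# Hypothetical server and project ID.
print(build_suggest_url("https://pp.example.org", "1234", "Paris", "en",
                        "https://schema.org/Organization"))
```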

Richer response data for expand service

Building on the expand service introduced in version 5.3, we have extended this functionality to make the feature more complete. The extension supports a larger set of input parameters to define exactly the response needed to implement a thesaurus-based search query expansion application.

/api/expand

For example, when you use the expand service for the term 'University' you can use linguistic expansion to retrieve word forms like 'universities' to enrich your search query. In addition, you could collect applied custom classes like e.g. 'Academic Institution' to precisely describe the type of search request. Providing additional type information, e.g. in a search interface could also be used to improve search experience for users.

PPX - API extensions

Suggest service supports Custom Scheme data

/api/suggest

The suggest service allows filtering based on Custom Scheme data (Custom Classes and Custom Properties).

PP API - New services

Terminology check service

POST /thesaurus/{project}/terminologyCheck

Returns the JSON representation of the concepts found in the term list.

TfIdf moved from PPX to PPT

GET /corpusmanagement/{project}/createTfidfCorpus
GET /corpusmanagement/{project}/createTfidfIndex

Retrieve more complete results from corpus analysis

GET /corpusmanagement/{project}/results/blacklist/concepts
GET /corpusmanagement/{project}/results/blacklist/terms
GET /corpusmanagement/{project}/results/cooccurrence/concept
GET /corpusmanagement/{project}/results/cooccurrence/term

Get blacklisted concepts and terms. Get concept and term co-occurrence results.

Retrieve all paths of a concept up to the concept scheme level

GET /thesaurus/{project}/getPaths

Workflow Draft concepts can be retrieved

GET /workflow/{project}/draftConcepts

Get draft concepts assigned to the requesting user.

Suggest multiple concepts at once

POST /thesaurus/{project}/suggestConcepts

Service supports suggestion of multiple concepts in one request

PoolParty Semantic Integrator

Elasticsearch Integration

The integration of PoolParty with Elasticsearch allows users / developers / integration architects to choose to use either Solr or Elasticsearch for all functionalities within PoolParty that require a search index. The following features can be used with either Solr or Elasticsearch:

  • PoolParty / Extractor / Semantic integrator based on Elasticsearch
  • Extractor clustering based on Solr / Elasticsearch
  • PoolParty semantic search based on Solr / Elasticsearch (Graphsearch / Extractor API)
  • Full integration in existing customer Solr / Elasticsearch environment

The integration of Solr and Elasticsearch has been implemented to be fully equivalent. All functionalities and API calls (e.g. also the GraphSearch API) are independent from the use of either Solr or Elasticsearch in the background.

PP Graphsearch using Elasticsearch

Stardog Remote Graph Database Support

With PoolParty 5.5, Stardog is now fully supported as a remote graph database for PoolParty. All remote graph database features are available with Stardog:

  • Export thesauri, custom schemes and ontologies
  • Store extraction results from PPX API
  • Corpus Management

Improvements

PoolParty Thesaurus Server

Enhanced user interface

The UI has been reworked by adopting the new PoolParty CI. The header has been reduced by merging the search bar into the menu and by restructuring the menu.

New look and feel

The right menu has been removed and replaced by icons opening applications like the SPARQL endpoint, GraphSearch etc. (1) user related information and (2) messages have been moved into a separate box indicating the message status via an icon (3).

 

Right menu & messages

Improved URI generation

With 5.5, PoolParty again offers more flexibility for defining URI generation patterns for a project. PoolParty now allows you to specify different patterns per resource type (e.g. skos:Concept, skos:ConceptScheme ...).

URI pattern per resource type

In addition, the ability to define different patterns per concept scheme has been extended beyond only adding a qualifier. Different URI generation patterns can be defined per concept scheme.

URI pattern per concept scheme

Finally, if a URI has been created accidentally, it can be edited manually by users with Administrator rights.

Excel Import/Export improvements

  • Excel import and export now support SKOS-XL labels.
  • Multiple values can be added via multiple columns or multiple rows.

RDF Import/Export improvements

For formats that do not support named graphs, RDF export now produces one file per graph, applying the following naming convention:

  • pp_project_<projectId>.<graph>.<format> e.g. pp_project_reeglethesaurus.concepts.ttl, pp_project_reeglethesaurus.history.ttl

If several files are exported, they are provided as a zip archive. If the data is imported into a Remote Graph Database, it is stored in the respective graphs.

RDF import now supports zip archives. If they include files following the naming convention above, the information is imported into the respective graphs.
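
When preparing or inspecting such archives, the naming convention pp_project_<projectId>.<graph>.<format> can be handled with a small helper like the Python sketch below. The function names are made up for this example, and the sketch assumes that none of the three parts contains a dot.

```python
import re

# pp_project_<projectId>.<graph>.<format>, assuming no dots inside the parts.
EXPORT_NAME = re.compile(
    r"^pp_project_(?P<project>[^.]+)\.(?P<graph>[^.]+)\.(?P<format>[^.]+)$"
)


def export_file_name(project_id, graph, fmt):
    """Build a file name that follows the export naming convention."""
    return f"pp_project_{project_id}.{graph}.{fmt}"


def parse_export_file_name(name):
    """Return (project_id, graph, format), or None if the name does not match."""
    match = EXPORT_NAME.match(name)
    if match is None:
        return None
    return match.group("project"), match.group("graph"), match.group("format")


print(parse_export_file_name("pp_project_reeglethesaurus.concepts.ttl"))
# ('reeglethesaurus', 'concepts', 'ttl')
```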

Minor improvements

  • It is now possible to link to URIs using SKOS matching properties.
  • Open project is selected in Snapshot Dashboard.
  • User Activity (Login, Logout) has been added to logging.

PPX

Frequency based lemmatisation utilisation

Extracted terms (or free terms) are lemmatised in PoolParty. For corpus analysis, lemmatisation to their basic word form has the advantage that the number of suggested terms is reduced. This allows the PoolParty user to focus on only one word form for introduction into the taxonomy.

Experience has shown that this approach can have downsides when specialist terminology is used, e.g. 'Big Data' is lemmatised to 'big datum'. In PoolParty 5.5, these artifacts are filtered out by frequency analysis of the terms observed in the corpus.

Drawing on the terminology actually observed during Corpus Analysis, PoolParty is able to suggest an improved set of Extracted Terms, which are the basis of the taxonomy improvement process.
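
The general idea can be illustrated with a toy heuristic in Python. This is not PoolParty's implementation, whose details are not described here: the sketch simply assumes that a lemma which never occurs in the corpus is an artifact and replaces it with the most frequently observed form.

```python
from collections import Counter


def pick_term_form(lemma, observed_forms):
    """Keep the lemma if the corpus contains it, else use the most frequent form.

    Toy heuristic for illustration only; forms are compared case-insensitively.
    """
    counts = Counter(form.lower() for form in observed_forms)
    if not counts or counts[lemma.lower()] > 0:
        return lemma
    return counts.most_common(1)[0][0]


print(pick_term_form("big datum", ["Big Data", "big data", "Big Data"]))  # big data
print(pick_term_form("university", ["universities", "university"]))      # university
```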

Comparison of Extracted Terms as result of Corpus Analysis (Version 5.4 vs. Version 5.5)

Minor improvements

  • Improved plain text extraction avoiding extraction of invalid terms.
  • Improved filtering of terms avoiding creation of invalid terms.
  • Improved autocomplete functionality and option to manipulate autocomplete query via UI.

PPT API - Improved and adapted services

Extended User data reflected in User service extension

GET /user
POST /user/addUser
POST /user/updateUser

Optional full name is supported.

Please note that the /user/addUser service now uses a JSON object as an envelope to send user details to the server.

For details, see Web Service Method: Create a New User.

History services support fine-grained time patterns

GET /history/{project}
GET /history/{project}/concepts

The parameters fromTime and toTime support seconds. Example value: "25.07.2016T17:29:20"

Please see e.g. Web Service Method: Request the History of Concepts for more details.
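
Values in this format can be produced from a Python datetime as sketched below. The format string is inferred from the example value "25.07.2016T17:29:20" (day.month.year, then the time); the helper names are made up for this example.

```python
from datetime import datetime

# Inferred from the example value "25.07.2016T17:29:20".
HISTORY_TIME_FORMAT = "%d.%m.%YT%H:%M:%S"


def format_history_time(moment):
    """Format a datetime for the fromTime/toTime parameters."""
    return moment.strftime(HISTORY_TIME_FORMAT)


def parse_history_time(value):
    """Parse a fromTime/toTime value back into a datetime."""
    return datetime.strptime(value, HISTORY_TIME_FORMAT)


print(format_history_time(datetime(2016, 7, 25, 17, 29, 20)))  # 25.07.2016T17:29:20
```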

Export / Import change in data structure

GET /projects/{project}/export
POST /projects/{project}/import

Export and import allow you to define the specific PoolParty Project Modules where data will be stored. Please be aware that formats that do not support contexts (named graphs) allow export of only one module. For example, you cannot export PoolParty data of the modules workflow+history with the serialization format n3.

Please see e.g. Web Service Method: Export and Download Project Data for more details.

Retrieve Suggested Concepts provides cursor based pagination

GET /thesaurus/{project}/suggestedConcepts

Use the parameters offset and noOfConcepts to retrieve the list of suggested concepts in chunks.

See Web Service Method: Request Suggested Concepts of a Project for further details.
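
A client can walk through all chunks with a loop like the following Python sketch. The fetch_page callable is a placeholder for whatever code performs the actual request with the offset and noOfConcepts parameters; the sketch also assumes that offset starts at 0 and that a chunk shorter than requested marks the end of the list.

```python
from typing import Callable, Iterator, List


def iter_suggested_concepts(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 50,
) -> Iterator[dict]:
    """Yield all suggested concepts, requesting them chunk by chunk.

    fetch_page(offset, no_of_concepts) must return one chunk as a list.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) chunk means there is nothing left
        offset += page_size


# Stand-in for the real service call, backed by an in-memory list.
fake_data = [{"id": i} for i in range(5)]
fake_fetch = lambda offset, size: fake_data[offset:offset + size]
print(len(list(iter_suggested_concepts(fake_fetch, page_size=2))))  # 5
```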

Extension of response data

GET /thesaurus/{project}/concept

Added fields "topConceptOf" and "inScheme" to result JSON

GET /thesaurus/{project}/schemes

Added optional "properties" parameter (same as the "properties" parameter in /concepts)

Changes

Parameter Changes

For Web Service Method: Request Subtree of Concept or Concept Scheme, the parameter name 'uri' was changed to 'root'.

Fixes

  • Alphabetic sorting in LD Frontend has been fixed.
  • Corpus API returns Status 400: "Request rejected. Analysis task is already ..." if a calculation is triggered for a project while another calculation is still in progress.
  • Approving concepts on pages of the workflow dashboard now behaves correctly.

 

Please note that we adapted the PoolParty System Requirements for this version due to improved functionality of the application.