Release Notes - PoolParty 6.0

This page contains the release notes for PoolParty release version 6.0. In the release notes you can find major developments, improvements and changes made in the respective release.

For details on how to upgrade from your current version, see PoolParty Upgrades.

The release notes contain two chapters, highlights and improvements:

Highlights

  • Ontology-Driven Search with GraphSearch Server
  • Semantic Middleware Configurator
  • Word Sense Induction
  • Shadow Concept Extraction

Improvements

  • General Improvements
  • Taxonomy & Ontology Management Improvements
  • Corpus Management Improvements
  • Entity Extraction Improvements
  • UnifiedViews Improvements
  • Thesaurus & Ontology Manager API Changes
  • New Methods
  • Changes to Existing Methods
  • Entity Extractor API Changes
  • GraphSearch API Changes

Highlights

Ontology-Driven Search with GraphSearch Server

PoolParty's GraphSearch Server has received a major upgrade. In addition to taxonomy-based search features for search indexes (e.g. Solr, Elasticsearch), you can now configure ontology-based search applications on top of RDF data in graph databases (e.g. MarkLogic, Stardog, GraphDB). A new default GraphSearch interface is provided that can be used and configured right away. All functionality is also available via APIs, so you can integrate it into existing search applications or use it to develop custom search interfaces.

In addition, an Excel export of search results is available.

GraphSearch Server Administration

An administration dashboard for the GraphSearch server is now available, which lets you monitor and configure GraphSearch.

It offers the following functions and options:

  • The System view contains information about the PoolParty instance you are viewing, like memory and disk space used, availability of the PoolParty server as well as availability of search index or graph database used, and the connected UnifiedViews instance.
  • The Project view allows you to configure the PoolParty project or ontology as well as the search index to be used for GraphSearch.
  • The SearchFields view displays the search field keys, their labels and their allowed values (i.e. types). For index-based searches you can configure custom fields here.
  • The Charts view allows you to configure different types of charts/statistics that are displayed in the default search frontend.
  • The Mappings view is only available for a graph database/ontology based search, where you can define properties used as labels for resources.
  • The Web Feeds view is only available for a search index based search and shows the defined agents that are crawled regularly.

Statistics API for GraphSearch Server

The GraphSearch Charts Service allows you to create charts and statistics over the search results dynamically. By default, you can configure charts for the search interface and place them directly from the administration interface.

Semantic Middleware Configurator

The Semantic Middleware Configurator is the control center for your PoolParty server: configure and connect to available indexing engines and graph databases, and set up linked data sources and visualization tools in one place.

 Use PoolParty as your Linked Data Hub:

 

The following options are available in the Semantic Middleware Configurator:

  • Configure the available remote graph databases of your PoolParty installation.
  • Configure the indices (Solr, Elasticsearch) used for entity extraction and GraphSearch.
  • Configure external systems connected to PoolParty to display visualizations, such as WebVOWL or SkosPlay!.
  • Configure the Linked Data Sources connected to PoolParty.

For all defined connections, an availability check is performed. A green icon in the tree indicates that the external system is available.

Depending on the defined configuration, the features a connection can be used for are listed below its Features node. For example, Linked Data Harvesting is only available for DBpedia.

Word Sense Induction

Get help from PoolParty to identify potentially ambiguous terms in your calculated corpora. Word Sense Induction is now available in PoolParty's corpus analysis.

A word's sense is derived from its context. To understand, for example, whether the term 'Americano' refers to the coffee or the cocktail, you need to know the context of the word. To disambiguate the meaning, PoolParty checks the co-occurrences of terms relevant to the word 'Americano' in one or more corpora. The co-occurring terms are extracted and clustered.

You can create different candidate terms based on the disambiguation done by the corpus analysis. In addition to the candidate concepts, the co-occurrences are stored to provide disambiguation later on.
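The clustering idea can be illustrated with a toy sketch. It is not PoolParty's algorithm: the function name `induce_senses`, the sample sentences and the simple grouping rule (context words that share a sentence with the target word end up in the same cluster) are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def induce_senses(sentences, target):
    """Group the context words of `target` into clusters (candidate senses).

    Two context words land in the same cluster when they appear together in
    at least one sentence that also contains the target word.
    """
    # Build an undirected graph linking context words that share a sentence.
    graph = defaultdict(set)
    for sentence in sentences:
        words = set(sentence.lower().split())
        if target not in words:
            continue
        context = words - {target}
        for word in context:
            graph[word]  # make sure isolated context words get a node
        for a, b in combinations(sorted(context), 2):
            graph[a].add(b)
            graph[b].add(a)

    # Each connected component of the graph is one candidate sense.
    clusters, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


sentences = [
    "americano espresso water",
    "americano espresso cup",
    "americano campari vermouth",
    "americano vermouth soda",
]
senses = induce_senses(sentences, "americano")
```

With these sample sentences the coffee-related and cocktail-related context words fall into two separate clusters, each of which could back a distinct candidate term.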

Shadow Concept Extraction

Benefit from deep semantic analytics of your content; PoolParty extracts even implicit knowledge from your text.

'Shadow Concepts' in PoolParty refer to the fact that particular terms often appear in texts in a certain position and frequency, similar to the co-occurrences of actual concepts in a reference corpus. The underlying 'shadow concept', which is already part of your thesaurus, can therefore be deduced for those texts even if the actual concept is not found there. This can be useful to extend and refine the entity extraction, which is based on your thesaurus. The calculation of these shadow concepts is based on PoolParty's special frequency and co-occurrence algorithms.

Shadow concepts can be reviewed and used in corpus management to extend your thesaurus and are available via the extractor API to improve concept extraction.
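As a rough illustration of the principle, the sketch below deduces a concept from its typical co-occurring terms when the concept label itself is absent. It is not PoolParty's algorithm: the function `shadow_concepts`, the overlap measure, the threshold and the sample data are all hypothetical.

```python
def shadow_concepts(document_terms, cooccurrence_profiles, threshold=0.5):
    """Return concepts whose typical co-occurring terms appear in the document
    although the concept label itself does not."""
    terms = set(document_terms)
    found = []
    for concept, profile in cooccurrence_profiles.items():
        if concept in terms or not profile:
            continue  # explicit mentions are ordinary concept matches
        # Share of the concept's reference co-occurrences seen in the document.
        overlap = len(terms & profile) / len(profile)
        if overlap >= threshold:
            found.append((concept, overlap))
    return sorted(found, key=lambda item: item[1], reverse=True)


# Hypothetical co-occurrence profiles taken from a reference corpus.
profiles = {
    "americano": {"espresso", "water", "cup", "barista"},
    "negroni": {"gin", "campari", "vermouth"},
}
document = ["barista", "poured", "espresso", "water", "cup"]
result = shadow_concepts(document, profiles)
```

Here the document never mentions 'americano', yet all of its profile terms are present, so it is reported as a shadow concept.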

Improvements

General Improvements

Switch to Java 8

With PoolParty version 6.0 we switched to Java 8. Java 7 is not supported anymore.

Upgrade to Tomcat 7.0.77

PoolParty has been upgraded to use the latest stable version of Tomcat 7 (7.0.77).

Switch to RDF4J

Since openrdf-sesame has become the Eclipse RDF4J project, PoolParty has been upgraded to use the latest version of RDF4J (2.2.1) as internal graph database.

Support for IBM WebSphere

With release 6.0, PoolParty can be run on an IBM WebSphere web application server as an alternative to Apache Tomcat.

Taxonomy & Ontology Management Improvements

Improved New Project Dialogue

The New Project dialogue now allows you to choose from different options to start your project:

Besides creating an empty project to start from scratch or creating a project from an existing one, you can now:

  • Create a project based on a corpus and start by uploading documents or harvesting your web page.
  • Create a project by uploading your RDF data.
  • Create a project starting from your Excel taxonomy.

Advanced Ontology Management

Ontology management in PoolParty is now based on OWL, and the previously used RDF format has been removed. This means all your ontologies follow the OWL standard. However, you have to adapt ontologies to the expressivity supported by PoolParty before you import them.

You can now also create sub-properties when creating relations and attributes. This provides an additional level of expressivity in ontology management.

The label and literal attributes have been merged, which offers additional expressivity when managing your ontologies.

In addition it is now possible to add and edit metadata for custom schemes and ontologies and the included classes, relations and attributes.

The following new ontologies have been added to PoolParty's ontology catalogue:

  • BIBFRAME
  • PROV-O

Support for Third-party Visualization Tools

Include external apps in your PoolParty platform, for example to dynamically visualize ontologies and knowledge models by the use of third-party tools like WebVOWL.

You can configure external sources, for example WebVOWL, to visualize your ontologies. That way you can make them available to those applications for the following elements:

  • the whole project
  • a concept scheme or subtree
  • a custom scheme or ontology

Another usage example is to trigger visualizations of your data, as shown below with WebVOWL:

Additional Linked Data Sources

The following new Linked Data sources have been added:

  • DBpedia Dutch & Russian
  • Getty Vocabularies (AAT, TGN, ULAN)
  • PermID
    • requires an API Key.

Support for IBM WebSphere

As of PoolParty 6.0, IBM WebSphere can be used instead of Apache Tomcat to run the PoolParty Server. Setting up the IBM WebSphere server is easy and fast.

Using IBM WebSphere is only available if you acquire an additional support package.

Improved Knowledge Graph Visualization of PoolParty's Visual Mapper

PoolParty's Visual Mapper has been extended to provide more details about your data. On demand, you can now:

  • display all SKOS relations and SKOS data of your concepts by clicking Display Settings and activating the check boxes
  • browse by custom relations by clicking inside the dynamic chart directly
  • show custom attributes (2)
  • toggle concept details (1) and expand a details dialogue that way (3)

Improved Taxonomy Linking

Mapping and linking of various taxonomies is now easier than ever before. Where previously only SKOS was available, you can now choose from online and custom ontologies. You can:

  • use custom relations with domain and range restrictions for linking
  • run batch linking, which now only returns results that are not yet linked

In this example, the 'GEO' ontology has been linked to the project and appears as an option in the drop-down (1). As a result, a different option is now also available as Linking Predicate (2).

PoolParty Project History Improvements

The history in PoolParty has been extended to provide filtering for history events. In addition, you can export filtered history events (and import them again later) as well as delete them.

That way you can clean up your history without losing any information.

All of these improvements are also available via our API (see below). In addition, the API has been extended to provide full information about history events, so you can use it to synchronize your projects.

Improved Export/Import Functionality

PoolParty export now also allows you to export the corpora, schemes and ontologies used in a project. In addition, exports can be compressed, and the Pretty Print option groups the RDF statements in the exported files in a meaningful way.

On project import, information is provided about which thesaurus data, corpora, schemes and ontologies have been imported.

Enhanced User Interface

The rework of the interface to follow the new PoolParty corporate design has been continued to provide a consistent look and feel of interface elements.

Minor Improvements

  • The modified date has been added to the table in the open/delete project dialogue.
  • Custom scheme and ontology tables can be sorted by the table headers. In addition the URI of the name of classes and properties has been made a link to distinguish between opening the Linked Data frontend or providing the URI of the class or property.
  • The Excel structure of the history export has been improved to provide all information on history events.
  • The delete project dialogue allows you to select multiple projects for deletion.
  • SKOS documentation properties (definition, scope note etc.) now also support the new multiline field, so all special characters are supported.
  • Import validation can be triggered directly in the Quality Management tab and thus run independently of an import.
  • General performance improvements through parallelisation when fetching data.

Corpus Management Improvements

Corpus Language Detection

The language of corpus documents will be detected automatically on upload. If the document language does not match the corpus language, the document will not be imported and the respective information is provided in the interface. Language detection can be disabled on demand.

Improved Performance to Handle Large Corpora

Analyze thousands of documents and extend your knowledge model semi-automatically; let machines learn from text and benefit from high-quality taxonomies.

The corpus analysis and the handling of large text corpora have been improved, especially with regard to performance.

Sample Sentences

You can now display sample sentences for extracted terms in the corpus of a project, just by clicking the term in the list. A new dialogue shows the term in the context in which it appears in the documents.

Highlight Shadow Concepts in Documents View

In the Documents View of corpus management, you can now also display the shadow concepts that have been calculated for the document, in addition to the concepts and terms found.

Export and Import Corpora

A corpus export option has been added. The functionality is available via the context menu of the respective node of a corpus.

Exported corpora can be imported again. You can reach the functionality using the context menu of the Corpus node.

Minor Improvements

  • We revised the interface and messages to improve usability and clarity.
  • You can export all documents of a corpus as .zip file by simply using a button in the Corpus Documents tab.
  • If you search in corpus management for concepts in the thesaurus, the concept is shown in the thesaurus views in corpus management.
  • When crawling websites the link to the website is stored with the corpus document.
  • Concept to concept co-occurrences are available in corpus management: they improve the workflow of integration of candidate terms and suggest links in the thesaurus.
  • A corpus can be renamed using the respective entry in its context menu.
  • On corpus creation you can choose if the corpus should reside in a local RDF4J repository or a remote graph database configured in the Semantic Middleware Configurator.

Entity Extraction Improvements

Inclusion of Standard Language Models

Benefit from a no-black-box approach and choose between various approaches to optimize your text mining services. PoolParty now supports scoring based on a standard language model. PoolParty's extracted terms are scored against the language model to get more relevant scores. In other words: This is a new method that compares frequencies of terms in a corpus to frequencies in a generic corpus (here generated from DBpedia). It will generate new scores for each term.
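The comparison of corpus frequencies against a generic corpus can be sketched as follows. This is only an illustration under stated assumptions: the log-ratio formula with add-one smoothing, the function name `relevance_scores` and the tiny sample corpora are not taken from PoolParty, whose actual scoring may differ.

```python
import math
from collections import Counter


def relevance_scores(corpus_tokens, reference_tokens):
    """Score each corpus term by how much more frequent it is in the corpus
    than in a generic reference corpus (log ratio of smoothed frequencies)."""
    corpus_counts = Counter(corpus_tokens)
    reference_counts = Counter(reference_tokens)
    corpus_total = len(corpus_tokens)
    reference_total = len(reference_tokens)
    # Add-one smoothing keeps terms that are missing from the reference
    # corpus from causing a division by zero.
    vocabulary = len(set(corpus_counts) | set(reference_counts))
    scores = {}
    for term, count in corpus_counts.items():
        p_corpus = (count + 1) / (corpus_total + vocabulary)
        p_reference = (reference_counts[term] + 1) / (reference_total + vocabulary)
        scores[term] = math.log(p_corpus / p_reference)
    return scores


# Tiny stand-ins for a domain corpus and a generic language model.
corpus = "the taxonomy links the thesaurus to the taxonomy".split()
reference = "the cat sat on the mat and the dog sat on the rug".split()
scores = relevance_scores(corpus, reference)
```

A domain term such as 'taxonomy' receives a clearly positive score, while a word that is equally common in both corpora, such as 'the', scores around zero.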

You can find standard language models in the PoolParty download area. They have to be made available on the PoolParty Server. Once this is done you can enable them using the interface in the Language Model Configuration dialogue, which you can reach via Corpora > Language Model Settings.

Standard language models are loaded into PoolParty's Solr server. If they are made available, the Solr server has to be provided with more resources. A minimum of 4GB RAM dedicated to the Solr server is recommended.

Term Extraction and Handling

The extraction of words and terms has seen major improvements:

  • The methods for calculating scores for single and multiple words have been greatly enhanced.
  • Sub-terms are detected in the list of extracted terms and their score is reduced to provide better results. One example is the phrase 'new york city', where 'new' and 'city' will frequently occur before and after 'york', respectively.
  • Integrate scores of term extraction methods. Terms have either a content term score (single-word terms) or a compound score (multi-word terms). The two extraction methods can now be integrated: a relevance score is calculated from these and other extraction methods using a custom formula that you can use from the administration script interface.
  • Remove terms with repeated words. Previously, terms with repeated words could turn up after extraction and analysis. Such repetitions are now removed to make corpus handling more effective.
  • Match equivalent characters. The same word is sometimes spelled with different but equivalent characters, like ö = oe inside a word in German. The extractor has been extended with an equivalent characters language model, which ensures that these spellings are found as variants.
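Matching of equivalent characters can be pictured with a minimal sketch. The mapping table, the helper names `normalize` and `same_term`, and the restriction to German are assumptions for illustration; they do not reflect the contents of PoolParty's equivalent characters language model.

```python
# Hypothetical equivalence table for German; not PoolParty's language model.
EQUIVALENTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def normalize(term):
    """Rewrite a term so that equivalent spellings map to the same key."""
    lowered = term.lower()
    return "".join(EQUIVALENTS.get(char, char) for char in lowered)


def same_term(a, b):
    """Return True when two spellings are variants of the same term."""
    return normalize(a) == normalize(b)
```

For instance, `same_term("Köln", "Koeln")` holds because both spellings normalize to the same key, so they would be treated as variants of one term.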

Entity Extractor Supports Multiple Projects

Configure your own pipelines of extraction services and make use of various knowledge graphs for highly precise text mining.

Information of knowledge graphs and taxonomies may be spread across multiple projects. As of PoolParty version 6, extraction from these 'distributed projects' becomes easier: You can now specify a list of projects and get aggregated extraction results for them all at once.

UnifiedViews Improvements

Performance Improvements

UnifiedViews can be connected to GraphDB instead of the built-in RDF4J graph database to improve performance.

Certain DPUs can be switched between normal and debug mode. In normal mode no debug information is generated to improve performance when running pipelines in production environments.

Validation of Transformation Results

Validation of the output data of DPUs can be configured directly in the DPU configuration by providing SPARQL ASK queries.

Improvement on PoolParty DPUs

A GraphSearch DPU has been created that allows you to ingest data into the Solr and Elasticsearch search indexes via the PoolParty GraphSearch Content API.

The Entity Extraction DPU has been improved to support HTTPS and provide better URI handling.

Thesaurus & Ontology Manager API Changes

New Methods

Method: GET /corpusmanagement/{project}/documents/download

  • New endpoint for exporting corpus documents as ZIP file.

Method: GET /corpusmanagement/{project}/export

  • New endpoint for exporting the corpus graph in TRIG format as ZIP file.

Method: POST /history/{project}/delete

  • New delete history endpoint.

Method: POST /history/{project}/deleteByConcepts

  • New delete history by concept endpoint.

Method: POST /projects/{project}/delete

  • New delete projects endpoint.

Method: POST /projects/create

  • Create a new project by providing the metadata via a JSON construct.

Method: POST /projects/create

  • Create a new project by providing a .ppar file and supplying additional data via request parameters.

Method: GET /PoolParty/api/currentlyRunningTasks

  • Returns a list of currently running tasks.

Method: GET /PoolParty/api/corpusmanagement/{project}/analysisRunning

  • Returns true if an analysis is running.

Method: GET /projects/{project}/pparExport

  • New endpoint for exporting projects as .ppar files.

Changes to Existing Methods

 

Method: POST schema/createCustomOntology

  • New parameter for languages added.

Method: POST schema/createClass

  • New parameter for comment added.

Method: POST schema/createDirectedRelation

  • New parameter for comment added.

Method: POST schema/createInverseRelation

  • New parameter for comment added.

Method: POST schema/createSymmetricRelation

  • New parameter for comment added.

Method: POST schema/createAttribute

  • New parameter for comment added.

Method: GET /PoolParty/api/history/{project}

  • New parameters for text, eventtype and users added.

Method: GET /PoolParty/api/history/{project}/concepts

  • New parameters for text, eventtype and users added.

Method: GET /PoolParty/api/schema/export

  • Removed "language" parameter, since our custom schemes and ontologies are now standard OWL and there is no custom SWC format anymore. Added optional "compress" parameter for ZIP download.

Method: GET /projects/{project}/export

  • Additional parameter "prettyPrint".

Method: /api/corpusmanagement/---/results/cooccurrence/term

  • Z-score and abcount now return null. New value: cooccurringScore

Method: /api/corpusmanagement/—/results/cooccurrence/concept

  • Z-score and abcount now return null. New value: cooccurringScore

Method: /api/corpusmanagement/—/results/cooccurrence/closeconceptsforterm

  • Z-score and abcount now return null. New value: cooccurringScore

Method: /PoolParty/api/schema/getOntology

  • Changed the return structure of the call. Calls that previously returned status code 201 or 204 now return 200. Affected calls are: create and delete calls for custom schemes and ontologies, and create calls for candidate concepts.

Method: /api/corpusmanagement/{projectId}/results/cooccurrence/concept

  • New optional parameter limit.

Method: /api/corpusmanagement/{projectId}/results/cooccurrence/term

  • New optional parameter limit.

Method: /api/corpusmanagement/{projectId}/extractedterms

  • New optional parameter limit.

Method: POST /thesaurus/{project}/addLiteral

  • Harmonized input parameters for addLiteral, updateLiteral, removeLiteral, addRelation, removeRelation.

Entity Extractor API Changes

Method: api/extract

  • Added new parameter "shadowConceptCorpusId", which accepts a list of corpus URIs.
  • "projectId" and "corpusScoring" parameters now accept lists.
  • Algorithm for corpusScoring (corpusScoring).
  • Added new parameter: "properties". Accepts a list of custom class attribute/relation properties (and http://www.w3.org/1999/02/22-rdf-syntax-ns%23type) and returns the result. Set to "all", to fetch all properties.
  • Language parameter optional.
  • Matched label information: The separate matched labels of a concept and their individual frequencies are now returned.
  • Actual text matches of concept annotations: The matches found for a term in documents are listed in detail.
  • Concept matching positions: Additionally the position of a matched concept in documents will be returned.
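A request against several projects might be assembled as in the following sketch. Only the parameter names `projectId`, `properties` and `language` come from the notes above. The host and path prefix in `BASE_URL`, the `text` parameter, and the convention of repeating a parameter name to pass a list are assumptions; check the API documentation of your installation before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder host and path prefix; adjust to your own installation.
BASE_URL = "https://poolparty.example.com/extractor/api/extract"


def build_extract_url(project_ids, text, language=None, properties=None):
    """Build an api/extract URL that addresses several projects at once."""
    # Assumption: list values are sent by repeating the parameter name.
    params = [("projectId", project_id) for project_id in project_ids]
    # Assumption: the text to analyze is passed in a parameter named "text".
    params.append(("text", text))
    if language is not None:
        # The language parameter is optional as of 6.0.
        params.append(("language", language))
    for prop in properties or []:
        params.append(("properties", prop))
    return BASE_URL + "?" + urlencode(params)


url = build_extract_url(
    ["project-a", "project-b"],
    "An Americano with soda, please.",
    properties=["all"],
)
# Parse the query string back to inspect what would be sent.
query = parse_qs(urlsplit(url).query)
```

The sketch only builds the URL and does not send a request, so it can be tried without a running server.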

Method: api/annotate

  • Added new parameter: "properties". Accepts a list of custom class attribute/relation properties (and http://www.w3.org/1999/02/22-rdf-syntax-ns%23type) and returns the result. Set to "all", to fetch all properties.
  • Language parameter optional.

Method: /api/annotate/store

  • Language parameter optional.

Method: api/suggest

  • Added new parameter: "properties". Accepts a list of custom class attribute/relation properties (and http://www.w3.org/1999/02/22-rdf-syntax-ns%23type) and returns the result. Set to "all", to fetch all properties.

Method: /api/categorization

  • Language parameter optional.

Method: api/expand

  • Added new parameter: "properties". Accepts a list of custom class attribute/relation properties and returns the result. Set to "all", to fetch all properties.

GraphSearch API Changes

Method: /api/content/update

  • Request parameter mapping has changed
    • spaceKey -> context (not mandatory anymore)
    • pageUrl -> identifier
    • externalUrl -> website
    • dynUris & customAttributes merged -> annotations
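Client code that still builds requests with the old parameter names could be adapted with a small helper like the one below. The renames follow the mapping listed above. The helper name `migrate_update_request`, the dictionary representation of the request, and the assumption that `dynUris` and `customAttributes` are lists merged by simple concatenation are hypothetical; the actual structure of `annotations` may differ.

```python
# Renames taken from the parameter mapping listed above.
RENAMED = {
    "spaceKey": "context",
    "pageUrl": "identifier",
    "externalUrl": "website",
}
MERGED = ("dynUris", "customAttributes")


def migrate_update_request(old):
    """Translate pre-6.0 /api/content/update parameters to the 6.0 names."""
    new = {}
    for key, value in old.items():
        if key in MERGED:
            continue  # handled below
        new[RENAMED.get(key, key)] = value
    # Assumption: both old fields are lists and are merged by concatenation.
    annotations = []
    for key in MERGED:
        annotations.extend(old.get(key, []))
    if annotations:
        new["annotations"] = annotations
    return new


old_request = {
    "spaceKey": "docs",
    "pageUrl": "https://example.com/page/1",
    "externalUrl": "https://example.com",
    "dynUris": ["http://example.com/concept/1"],
    "customAttributes": ["http://example.com/attr/1"],
    "title": "Sample page",
}
new_request = migrate_update_request(old_request)
```

Parameters that are not part of the mapping, such as the illustrative `title` above, are passed through unchanged.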

Method: /api/content/delete/all

  • Request path changed to /api/content/delete/source

Method: /api/content/delete/

  • Request path changed to /api/content/delete/id

Method: /api/search & api/suggest & api/similiar & api/recommend 

  • Request parameter mapping has changed 
    • format -> removed
    • nativeQuery -> removed

Method: /admin/*

  • Request mappings & methods changed

Method: /api/search

  • Request & Response parameter mapping has changed