Release Notes - PoolParty 6.2

This section contains the release notes for PoolParty minor release version 6.2. In the release notes you can find major developments, improvements and changes made in the respective release.

For details on how to upgrade from your current version, see: PoolParty Upgrades

The release notes contain two chapters, highlights and improvements:

Highlights

PoolParty Semantic Classifier

PoolParty offers a sophisticated and unique tool to make sorting and assigning as well as classification of hundreds or even thousands of digital documents a breeze. It is available as an add-on for PoolParty Enterprise Server and Semantic Integrator.

The use cases are not confined to classifying documents: you can also classify emails or news, perform sentiment analysis, or match products or content to users or user groups.

The PoolParty Semantic Classifier combines machine learning algorithms (SVM, Deep Learning, Naive Bayes, etc.) with semantic knowledge graphs.

Use the Semantic Classifier functionality to assign categories to documents easily. The assignment is based on training calculations in so-called training classifiers, which you prepare yourself as templates.

After that you use the PoolParty Extractor API and the new Classification Management API to classify documents using the classifier's ID.

You can also combine the classifier with a UnifiedViews DPU pipeline or with GraphSearch, including a recommender plugin you may have created.

Procedure Summary

To use the Semantic Classifier, follow this procedure. Details on each step are given below:

  1. Prepare your training documents using PoolParty's Training Boxes function. (optional)
    • Using Training Boxes offers the advantage of being able to reuse documents in several different classifiers.
  2. Configure Classifiers as the basis for classification. Choose from the machine learning algorithms PoolParty offers as default calculation models for your new classifier.
  3. Define categories for the documents you want to classify.
  4. Add training documents to the classifiers from the Training Boxes.
  5. Configure, train and test the classifier using the training boxes and evaluate the results. If they are not yet satisfactory, change the training settings until the result scores reach the desired level.
  6. Classify new documents, emails or match content to users or user groups, using the API, based on the trained classifiers.

1. Configure Training Boxes (optional)

After you have opened your project and accessed the Semantic Classifier, you can create training boxes below the Training Boxes node.

Add documents to them whose category or categories you already know.

2. Create Classifiers

Now that you have created training boxes, set up classifiers. The PoolParty interface provides an easy and fast setup in the Train Classifiers node. Open the node to see the available classifiers or to create new ones.

3. Define Categories for the Classification

After you have created a classifier, add the corresponding categories. The Details View of the Semantic Classifier lets you do that with a few clicks.

4. Add Training Documents to the Classifiers

After you have set up a classifier and its categories, you have to add training documents to it. As the Training Boxes allow you to reuse the documents you uploaded, we recommend creating them first and then reusing the documents in a classifier from there.

5. Train the Classifiers Using the Training Boxes - Test the Classifiers Using Test Classifiers

The classifiers in PoolParty are based on machine learning algorithms that are included in PoolParty by default.

The classifier in training will search the documents for the categories you defined before and group them into those categories.

As an overall recommendation, a well-trained classifier should reach result scores of 70% or more for the Cross Validation values F1, recall and precision.

Test the classifiers by using them in Test Classifiers to make sure the results will be satisfactory and precise.
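
PoolParty calculates the result scores for you during training. To make the recommended values easier to interpret, the following minimal sketch shows the standard definitions of precision, recall and F1 for a single category. The counts are hypothetical, and how PoolParty aggregates the scores across categories and cross-validation runs is not covered here.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f1) for one category from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def meets_recommendation(scores, threshold=0.70):
    """Return True if every score reaches the recommended 70% level."""
    return all(score >= threshold for score in scores)


# Hypothetical counts for one category:
# 80 correct assignments, 20 wrong assignments, 10 missed documents.
precision, recall, f1 = precision_recall_f1(80, 20, 10)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print("meets 70% recommendation:", meets_recommendation((precision, recall, f1)))
```

A classifier that reaches high precision but low recall assigns categories reliably yet misses many documents, which is why all three values should be checked together.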

6. Classify New Documents via API

After you have trained one or more classifiers you can classify documents based on them.

To classify new documents, use the API, for example like this:

<server>/extractor/api/extract?text=<text>&documentClassifierIds=<classifier-ID1>,<classifier-ID2>

With a call like this you could, for example, classify emails that arrive on a server and use a classifier configured in PoolParty as a spam filter for them. If you specify the URIs of the respective custom-trained classifiers in the documentClassifierIds parameter, these emails can be filtered according to the categories you defined.
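
The example call can also be assembled programmatically, which takes care of URL-encoding the text. The following is a minimal sketch that only builds the request URL from the path and the two parameters shown in the example. The server name, the sample text and the classifier IDs are placeholders, the function name is made up for illustration, and a real call may require authentication and further parameters that the example call does not show.

```python
from urllib.parse import urlencode


def build_extract_url(server, text, classifier_ids):
    """Build the extract call from the example with URL-encoded parameters."""
    query = urlencode(
        {
            "text": text,
            # Several classifier IDs are passed as one comma-separated value.
            "documentClassifierIds": ",".join(classifier_ids),
        },
        safe=",",  # keep the commas between the IDs readable
    )
    return f"{server.rstrip('/')}/extractor/api/extract?{query}"


# Placeholder values: replace them with your own server and classifier IDs.
url = build_extract_url(
    "https://poolparty.example.com",
    "Win a free prize now",
    ["classifier-ID1", "classifier-ID2"],
)
print(url)
```

The resulting URL can then be sent with the HTTP client of your choice.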

Improvements

Improved Literal Attribute Creation

Creating attributes of the type literal now lets you create literals of all possible types via a set of radio buttons. Invalid combinations, for example combining 'Unique Per Language' with 'No Language Options', are prevented.

Scoring Method for Shadow Concepts

The scoring method for shadow concepts has been improved with regard to frequency and relevancy. Since shadow concepts are calculated from their co-occurrences with thesaurus concepts inside corpus documents, the frequency of the shadow concepts in documents and the absolute number of documents now depend on each other. As a result, the scoring results for shadow concepts are more accurate.

Full-text Search in GraphSearch UI

Previously, you could use the search field in the GraphSearch interface to search by facets. The search has been improved: a full-text search is now available, so any term or content in the GraphSearch documents can be found.

API Additions and Improvements

Extractor API

The Suggest Service has seen some major improvements:

  • Use a multitext search that finds not just individual terms but also phrases consisting of two or more terms that are similar to the search string. Combine terms as phrases in your search string. Search results now include not only exactly matching terms or phrases but also terms that contain parts of the search string's elements.
  • You can now search for custom attributes. This can help to narrow down the available terms further and improve the accuracy of search results.
  • In addition, you can now define which fields (for example labels or custom attributes) should be used for searching. For example, you could limit a search to the preferred labels or to a specific custom attribute.
  • Take advantage of a boost factor definition: if you add one of the respective digits to the search query terms, the search results are sorted according to these factors. Use the boost attribute with the new parameter searchParameters.
  • Define exact matching results by using the matchingStrategy attribute with the new parameter searchParameters.

The Concept Extraction Service has been extended for use with the Semantic Classifier:

  • This API is also central to using the new machine-learning-based classifier: use the documentClassifierIds parameter to specify a classifier by its ID. This applies to the url, text and file methods.

Classification Management APIs

A whole new set of API calls, the Classification Management APIs, has been introduced for the Semantic Classifier and for handling Training Boxes via API. You can already get a list of classifiers or training boxes, create new training boxes, and upload documents to training boxes.

For details, see: Classification Management APIs

Specific Configuration Changes

SAML-Authentication Adjustment

Adjust the following values and settings:

  • Change the auth.xml file for the SAML setup when upgrading to version 6.2 of PoolParty: the parameter defaultTargetUrl now needs the default value /auth/logout. For details, see: Setup SAML Authentication for PoolParty

Elasticsearch Index Changes

  • For the Elasticsearch folders, make the following changes manually:
    • Delete the folder /poolparty/data/elasticsearch/corpusterm and the respective index in: ${elasticsearch.home}/data
    • Delete the folder /poolparty/data/elasticsearch/conceptdata and the respective index in: ${elasticsearch.home}/data
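
If you prefer to script the folder cleanup, the following minimal sketch deletes the two folders listed above. It runs as a dry run by default and only reports what it would delete. The function name is made up for illustration, and the paths must be adjusted to your installation. The respective indices below ${elasticsearch.home}/data are not covered, because their exact folder names depend on your Elasticsearch setup.

```python
import shutil
from pathlib import Path

# Folder paths as listed above; adjust them to your installation.
FOLDERS = [
    Path("/poolparty/data/elasticsearch/corpusterm"),
    Path("/poolparty/data/elasticsearch/conceptdata"),
]


def delete_folders(folders, dry_run=True):
    """Delete each existing folder; with dry_run=True only report it."""
    handled = []
    for folder in folders:
        if not folder.is_dir():
            print(f"skipped (not found): {folder}")
            continue
        if dry_run:
            print(f"would delete: {folder}")
        else:
            shutil.rmtree(folder)
            print(f"deleted: {folder}")
        handled.append(folder)
    return handled


# Dry run by default, so running this as written deletes nothing.
delete_folders(FOLDERS)
```

Call delete_folders(FOLDERS, dry_run=False) only after you have checked the reported paths.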

Bug Fixes

The following issues or problems were fixed with this release:

  • A reported path manipulation vulnerability has been fixed.
  • An error with custom attributes of the type literal on migration from PoolParty 5.7 to 6.04 has been fixed.
  • The issue with the display of a Requires Language check box has been fixed in cases where no language is necessary on creating a custom attribute.
  • On confirming a suggested concept from the workflow below a top concept, the class type is now correctly added and shows up as triple.
  • An issue with linked projects' taxonomies not appearing in the Details View of the project has been corrected.
  • The display of spaces and tabs while editing scope notes and definitions has been fixed.
  • In the New Project dialogue the sample URI displays correctly again, according to the ID generation method setting.
  • Using advanced SKOS with linking of terms among projects now works again as expected.
  • The issue concerning searching with wildcards in autocomplete as well as in Advanced Search constraints has been fixed.
  • The problem with diacritics of a character preventing editing or deletion has been fixed; editing and deleting now work as expected.
  • Display of quality data details has been fixed and is working as expected again.
  • Error on export of subtree data and importing it again, due to graph data corruption, has been fixed.
  • On data export as .ppar all corpora again will be included in the export file.
  • An Excel import/export loop that occurred after using the update data feature on import has been fixed; exports are working as expected again.
  • RDF export of custom schemes and ontologies is working as expected again. An additional message about this has also been added to the interface.
  • The user icon now displays the notifications about assigned tasks for workflows again.
  • The broken link in the About dialogue has been fixed.
  • On project language update the display now updates properly again for added preferred labels of the new languages.
  • The display issue of search results in the Corpus Test interface has been taken care of.
  • An issue with the suggest concept API call and concepts not correctly matched has been fixed.
  • Sorting in suggest service also has been taken care of and now works again as expected, displaying search results sorted by score.
  • The following issues in entity extraction have been fixed:
    • Texts starting with quotes are extracted correctly.
    • The character '´' is now taken into account correctly with regard to variants such as Spain's or Spain´s.
  • The character '-' now is being taken into account correctly in the suggest concept call of the Extractor API.