PoolParty Semantic Classifier

The PoolParty Semantic Classifier combines Apache Spark machine learning algorithms with semantic knowledge graphs. The machine learning algorithms are trained on a defined set of data. Training is therefore a semi-automated process: you first define your categories and assign material to them, so that the classifier can learn from that material. The classifier works best when you provide more information about what the categories mean, using terms, concepts and shadow concepts.

The training process requires a data set to serve as training material, paired with a defined set of at least two categories. You choose one of the seven available Spark machine learning algorithms and adjust its parameters, which lets you train a classifier to suit your needs and desired outcomes. Validation and a range of features are included so that you can fine-tune and correct your classifier, which makes machine learning training more accessible. The classifier learns from the features used, which helps it identify the classification categories in your data set. Combining these features gives the classifier a better understanding of what each piece of material means and reinforces the data examples, thus improving your classification result.
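The idea of training on labelled material whose features mix plain terms with concepts can be illustrated with a minimal sketch. It is a stand-in written in plain Python (a small multinomial Naive Bayes), not one of the Spark algorithms and not PoolParty's implementation. The function names, the `concept:` prefix and the sample data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Train a tiny multinomial Naive Bayes model.

    examples: list of (features, category) pairs. Each features list mixes
    plain terms with concept identifiers (hypothetical "concept:" prefix).
    """
    categories = {category for _, category in examples}
    if len(categories) < 2:
        # Mirrors the requirement of a minimum of two categories.
        raise ValueError("training needs at least two categories")

    doc_counts = Counter(category for _, category in examples)
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for features, category in examples:
        feature_counts[category].update(features)
        vocabulary.update(features)

    return {
        "doc_counts": doc_counts,
        "feature_counts": feature_counts,
        "vocabulary": vocabulary,
        "total_docs": len(examples),
    }


def classify(model, features):
    """Return the category with the highest log-probability."""
    vocab_size = len(model["vocabulary"])
    best_category, best_score = None, -math.inf
    for category, n_docs in model["doc_counts"].items():
        counts = model["feature_counts"][category]
        total = sum(counts.values())
        score = math.log(n_docs / model["total_docs"])
        for feature in features:
            # Laplace smoothing keeps unseen features from zeroing the score.
            score += math.log((counts[feature] + 1) / (total + vocab_size))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def accuracy(model, held_out):
    """Fraction of held-out examples that are classified correctly."""
    correct = sum(classify(model, f) == c for f, c in held_out)
    return correct / len(held_out)


# Illustrative training material: two categories, terms plus concepts.
TRAINING = [
    (["loan", "interest", "concept:Banking"], "finance"),
    (["stock", "dividend", "concept:Investment"], "finance"),
    (["vaccine", "dose", "concept:Immunology"], "health"),
    (["clinic", "patient", "concept:Healthcare"], "health"),
]
HELD_OUT = [
    (["patient", "dose"], "health"),
    (["dividend", "loan"], "finance"),
]

model = train(TRAINING)
predicted = classify(model, ["interest", "concept:Banking"])
validation_accuracy = accuracy(model, HELD_OUT)
```

In this sketch, adding concept identifiers next to the raw terms gives each category more evidence to learn from, and the held-out check plays the role that validation plays during classifier training.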

The semantic classifier lets you achieve consistent classification results on a very large scale, reducing the impact of biased decisions and human error.

Trained classifiers are made available through the RESTful APIs of the PoolParty Extractor, which are designed for high throughput. Document text is analysed for terms and/or concepts and checked against the trained information. When the analysis is complete, one or more classified labels, depending on the algorithm in use, are returned in JSON format.

Because the classifier is combined with the extractor, results are combined with document information, and the classified label or labels are presented alongside the terms and concepts extracted from the material.
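As a rough illustration of consuming such a combined result, the sketch below parses a JSON body that carries classified labels next to extracted concepts and terms. The response shape and every field name (`categories`, `label`, `concepts`, `prefLabel`, `extractedTerms`) are hypothetical and chosen only for this example. Consult the PoolParty Extractor API documentation for the actual schema.

```python
import json

# Hypothetical response body. The field names are assumptions made for
# illustration and are not the documented PoolParty Extractor schema.
SAMPLE_RESPONSE = """
{
  "document": {"title": "Quarterly report"},
  "categories": [
    {"label": "finance"},
    {"label": "legal"}
  ],
  "concepts": [
    {"prefLabel": "Dividend", "uri": "https://example.org/concept/dividend"}
  ],
  "extractedTerms": ["quarterly", "shareholder"]
}
"""


def summarize(response_text):
    """Collect classified labels, concepts and terms from a JSON response."""
    data = json.loads(response_text)
    # One label or several may be present, depending on the algorithm in use.
    labels = [entry["label"] for entry in data.get("categories", [])]
    concepts = [entry["prefLabel"] for entry in data.get("concepts", [])]
    terms = list(data.get("extractedTerms", []))
    return {"labels": labels, "concepts": concepts, "terms": terms}


summary = summarize(SAMPLE_RESPONSE)
```

Using `data.get(..., [])` keeps the sketch tolerant of responses in which one of the sections is missing, for example when no concepts were found in the material.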