This section is a short guide to what Word Sense Induction means in PoolParty and how it helps you disambiguate terms in your corpus.
Ambiguous terms can be a problem during text extraction and data mining. A word such as 'Jaguar' can mean either the big cat or the car brand. To find out which meaning is intended in a given document, the context has to be analyzed. The function in PoolParty that calculates co-occurrences also handles such ambiguous terms and even lets you decide what to do with them.
To make the principle clearer, Word Sense Induction (1) can be compared to an unsupervised classifier. The extractor processes the terms and phrases in the corpus documents and groups them into meaningful clusters. The grouping is based on terms that occur with a certain frequency in close proximity to your thesaurus' concepts. To make the result more precise, the content of each document is also taken into account, for example whether the document is large or small and whether it contains high numbers of co-occurring terms. The results of this calculation are also saved in PoolParty, and you can use them from the Extracted Terms list as individual terms to add to the Candidate Concepts list.
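The clustering idea described above can be illustrated with a small, self-contained sketch. This is not PoolParty's implementation, which is not documented here. It is a simplified illustration that assumes whitespace tokenization, a tiny hand-picked stopword list, a fixed context window, and a greedy grouping by Jaccard overlap. All names in it (`context_windows`, `induce_senses`, `STOPWORDS`, the `window` and `threshold` parameters) are hypothetical and chosen only for this example.

```python
from collections import Counter

# Assumption: a tiny illustrative stopword list, not a PoolParty setting.
STOPWORDS = {"the", "a", "in", "is", "has", "this"}


def context_windows(tokens, target, window=3):
    """Return one set of neighbouring content words per occurrence of target."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            contexts.append(
                {t for t in tokens[lo:hi] if t != target and t not in STOPWORDS}
            )
    return contexts


def jaccard(a, b):
    """Overlap of two word sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def induce_senses(documents, target, window=3, threshold=0.2):
    """Greedily group the contexts of target into clusters of co-occurring words.

    Each cluster is a Counter of context words and stands for one induced sense.
    """
    clusters = []
    for doc in documents:
        tokens = doc.lower().split()
        for ctx in context_windows(tokens, target, window):
            best, best_score = None, threshold
            for cluster in clusters:
                score = jaccard(ctx, set(cluster))
                if score >= best_score:
                    best, best_score = cluster, score
            if best is None:
                clusters.append(Counter(ctx))  # no similar cluster: new sense
            else:
                best.update(ctx)  # similar enough: same sense
    return clusters


docs = [
    "the jaguar hunts prey in the rainforest",
    "a jaguar stalks prey in the jungle rainforest",
    "the new jaguar car has a powerful engine",
    "this jaguar car engine is fast",
]

senses = induce_senses(docs, "jaguar")
for number, sense in enumerate(senses, start=1):
    print(number, sense.most_common(3))
```

With these four sample sentences, the contexts around 'jaguar' fall into two groups, one sharing the word 'prey' and one sharing the word 'car', which mirrors the animal and car-brand readings. A real extractor would weigh frequencies and document size far more carefully than this greedy sketch does.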
To enable the Word Sense Induction feature, follow the steps described in: Analyse Documents in Your Document Corpus
Find the following topics in this section: