PoolParty Extractor - Suggest Service Configuration

Abstract

Before PoolParty 5.1, the lookup of the Concept Suggestion Request Service was based on prefix search: the service found only concepts whose labels started with the searchString.

Note

Since 5.1, the index configuration has been extended so that concepts whose labels contain the searchString are also found.

Example: Different Results

For a concept with the label 'Some Concept', the two versions return different results.

PoolParty 5.0:

searchString	Result
"Som"	"<em>Som</em>e Concept"
"Con"	none

PoolParty 5.1:

searchString	Result
"Som"	"<em>Som</em>e Concept"
"Con"	"Some <em>Con</em>cept"
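
The difference between the two tables can be illustrated with a small Python sketch. It is not PoolParty or Solr code: the function name `suggest` and its `mode` parameter are hypothetical, and the matching is a plain case-insensitive string search that only mimics the results shown above.

```python
from typing import Optional


def suggest(label: str, search_string: str, mode: str) -> Optional[str]:
    """Return the label with the first match wrapped in <em>, or None.

    mode="prefix"   mimics the behaviour described for PoolParty 5.0
    mode="contains" mimics the behaviour described for PoolParty 5.1
    (hypothetical helper, for illustration only)
    """
    pos = label.lower().find(search_string.lower())
    if pos < 0:
        return None  # searchString does not occur in the label at all
    if mode == "prefix" and pos != 0:
        return None  # prefix search accepts matches at the label start only
    end = pos + len(search_string)
    return f"{label[:pos]}<em>{label[pos:end]}</em>{label[end:]}"


print(suggest("Some Concept", "Som", "prefix"))    # <em>Som</em>e Concept
print(suggest("Some Concept", "Con", "prefix"))    # None
print(suggest("Some Concept", "Con", "contains"))  # Some <em>Con</em>cept
```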

Configuration

Note

After you change schema.xml, the index has to be rebuilt. This is described in Web Service Method: Build an Extraction Model.

This behaviour is controlled by the definition of the ngram fieldType. To find matches inside labels, use the NGramFilter as shipped in 5.1:

POOLPARTY_HOME/data/solr/conceptData/conf/schema.xml

<fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- vvv -->
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="1000"/>
    <!-- ^^^ -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  ...
</fieldType>

To disable matches inside labels, replace the NGramFilter with the EdgeNGramFilter. This produces only n-grams that begin at the start of the label and grow from left to right:

POOLPARTY_HOME/data/solr/conceptData/conf/schema.xml

<fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- vvv -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="1000" side="front"/>
    <!-- ^^^ -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  ...
</fieldType>
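
To make the difference between the two filters concrete, the following Python sketch generates the n-grams for a single, already lowercased token. It is a simplified illustration under stated assumptions, not Solr's implementation: the function names `ngrams` and `edge_ngrams` are hypothetical, the rest of the analyzer chain is ignored, and the order of the generated grams is not meant to mirror Solr's output.

```python
def ngrams(token: str, min_size: int = 1, max_size: int = 1000) -> list:
    """All substrings of the token with a length between min_size and max_size.

    Mimics the idea behind an n-gram filter (illustration only).
    """
    n = len(token)
    return [
        token[start:start + size]
        for size in range(min_size, min(max_size, n) + 1)
        for start in range(n - size + 1)
    ]


def edge_ngrams(token: str, min_size: int = 1, max_size: int = 1000) -> list:
    """Only the substrings that start at the front of the token.

    Mimics the idea behind an edge n-gram filter with side="front".
    """
    return [token[:size] for size in range(min_size, min(max_size, len(token)) + 1)]


print(edge_ngrams("some"))           # ['s', 'so', 'som', 'some']
print("om" in ngrams("some"))        # True: an inner gram is indexed
print("om" in edge_ngrams("some"))   # False: only front-anchored grams exist
```

Because a searchString can only match a gram that was indexed, the inner grams produced by the first function are what make matches inside a label possible.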
