PoolParty Extractor - Suggest Service Configuration
Before PoolParty 5.1, the Concept Suggestion Request Service lookup was based on prefix search: the service found only concepts whose labels started with the searchString.
Note
Since PoolParty 5.1, the index configuration has been extended to also find concepts whose labels contain the searchString.
Example: Different Results
For a concept with the label 'Some Concept', the two versions return different results.
PoolParty 5.0:
| searchString | Result |
|---|---|
| "Som" | "<em>Som</em>e Concept" |
| "Con" | none |
PoolParty 5.1:
| searchString | Result |
|---|---|
| "Som" | "<em>Som</em>e Concept" |
| "Con" | "Some <em>Con</em>cept" |
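The difference between the two tables can be sketched in a few lines of Python. This is only an illustration of prefix versus in-label matching, not the service's implementation; the function name `highlight` and the `mode` parameter are hypothetical and do not exist in PoolParty.

```python
def highlight(label, search, mode):
    """Return the label with the match wrapped in <em>, or None if no match.

    mode="prefix"   -> match only at the start of the label (5.0-style)
    mode="contains" -> match anywhere in the label (5.1-style)
    Hypothetical helper for illustration; not a PoolParty API.
    """
    if not search:
        return None
    # Case-insensitive search for the first occurrence.
    pos = label.lower().find(search.lower())
    if pos == -1:
        return None
    # In prefix mode the first occurrence must be at position 0.
    if mode == "prefix" and pos != 0:
        return None
    end = pos + len(search)
    return f"{label[:pos]}<em>{label[pos:end]}</em>{label[end:]}"


if __name__ == "__main__":
    for mode in ("prefix", "contains"):
        for search in ("Som", "Con"):
            print(mode, search, highlight("Some Concept", search, mode))
```

Running the script prints one line per row of the tables above, with `None` standing in for "none".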
Configuration
Note
After changing the schema.xml, the index has to be rebuilt. This is described in Web Service Method: Build an Extraction Model.
This behaviour is controlled by the definition of the ngram fieldType. To search with matches inside the labels, use the NGramFilter as shipped in 5.1:
POOLPARTY_HOME/data/solr/conceptData/conf/schema.xml
<fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- vvv -->
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="1000"/>
    <!-- ^^^ -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  ...
</fieldType>
To disable in-label matches, replace the NGramFilter with the EdgeNGramFilter. It produces only n-grams that begin at the start of the labels and grow from left to right:
POOLPARTY_HOME/data/solr/conceptData/conf/schema.xml
<fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- vvv -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="1000" side="front"/>
    <!-- ^^^ -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  ...
</fieldType>
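To see why the two filters behave differently, the following Python sketch generates both kinds of n-grams for a label. It is a simplified model and not Solr's implementation: it treats the lowercased label as a single string, which is an assumption that mirrors the behaviour described above, and it ignores the tokenizer and the other filters in the chain. The function names `ngrams` and `edge_ngrams` are hypothetical; the default sizes are taken from the minGramSize and maxGramSize values in the configuration.

```python
def ngrams(text, min_size=1, max_size=1000):
    """All substrings of text with a length between min_size and max_size.

    Simplified stand-in for the idea behind an n-gram filter.
    """
    n = len(text)
    return {
        text[start:start + size]
        for size in range(min_size, min(max_size, n) + 1)
        for start in range(n - size + 1)
    }


def edge_ngrams(text, min_size=1, max_size=1000):
    """Only the substrings anchored at the front of text (prefixes).

    Simplified stand-in for an edge n-gram filter with side="front".
    """
    return {
        text[:size]
        for size in range(min_size, min(max_size, len(text)) + 1)
    }


if __name__ == "__main__":
    label = "Some Concept".lower()
    for search in ("som", "con"):
        print(search,
              "ngram:", search in ngrams(label),
              "edge:", search in edge_ngrams(label))
```

In this model, "som" is present in both sets, while "con" appears only among the full n-grams, which corresponds to the "Con" row of the example tables.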