Concept Extraction Service
The PoolParty Extraction Service lets you send text to the API. The response contains a configurable number of concepts from a thesaurus and/or a configurable number of terms relevant to the text.
This API call accepts plain text, a web page referenced by a URL, and an uploaded file as input.
Example URL
https://<your server>/extractor/api/extract
Find details in the following topics:
Web Service Method: Extract from Text
Description |
---|
Extracts and returns meaningful metadata like concepts and terms from a given text. |
URL: /extractor/api/extract
Request
Supported Methods |
---|
POST |
GET |
HTTP Parameters
Parameter | Type | Required | Description |
---|---|---|---|
categorizationWithPpxBoost | boolean | false | Use Extractor boosting, default = false |
categorize | boolean | false | Categorization extraction, default = false |
conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUIDs) are provided |
customAttributeFilters | Array of CustomProperty | false | Custom attribute (property URI and string value) filters |
customClassFilters | Array of String | false | Custom class URI filters |
disambiguate | boolean | false | Use thesaurus based disambiguation, default = false |
displayText | boolean | false | Include text extracted from given text in response, default = false |
documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input |
documentId | String | false | Internal ID of the document, taken from documentUri |
extraConceptLanguages | Array of PPLocale | false | Additional languages used for concept extraction (en|de|es|fr|...); also supports wildcard * for all languages |
extractorVersion | String | false | Version of PPX Extractor used |
filterNestedConcepts | boolean | false | Remove concepts matches which are contained within other matches, default = false |
findPersonNames | boolean | false | Deprecated (use nerParameters) - extracts person names from the given text |
language | PPLocale | false | Extraction language (en|de|es|fr|...) |
lemmatization | boolean | false | Use lemmatization, default = true |
locationExtraction | boolean | false | Deprecated (use nerParameters) - extracts locations from a given text |
nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition. |
numberOfConcepts | Integer | false | Retrieve number of concepts, default = 25 |
numberOfTerms | Integer | false | Retrieve number of terms, default = 25 |
phraseLength | Integer | false | Phrase length, default = 4 |
projectId | Array of String | false | Thesaurus' project ID |
properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Furthermore it supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to |
regexFilename | String | false | File name for regex patterns |
sentimentAnalysis | boolean | false | Sentiment analysis, default: false |
shadowConceptCorpusId | Array of String | false | Shadow concepts calculation; enabled if corpusIds (UUID) are provided |
showMatchingDetails | boolean | false | Shows which concept labels were found inside the text, default = false |
showMatchingPosition | boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true. default = false |
text | String | true | Text of the document |
tfidfScoring | boolean | false | Use TFIDF scoring |
title | String | false | Title of the document |
useRelatedConcepts | boolean | false | Retrieve related concepts, default = false |
useTransitiveBroaderConcepts | boolean | false | Retrieve transitive broader concepts, default = false |
useTransitiveBroaderTopConcepts | boolean | false | Retrieve transitive broader top concepts, default = false |
useTypes | boolean | false | Retrieve custom types for concepts, default = false |
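The parameters above can be combined into a single POST request. The following is a minimal sketch, not an official client: it assumes form-encoded parameters (the content type this document lists only for the URL method), Basic authentication as in the curl example elsewhere in this document, and array parameters sent as repeated keys. The server name, credentials, and project ID are placeholders.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_text_extract_request(server, username, password, text, project_id, **options):
    """Build (but do not send) a POST request for /extractor/api/extract."""
    params = {"text": text, "projectId": project_id}
    # Booleans are sent as lowercase strings, e.g. showMatchingDetails=true.
    for name, value in options.items():
        params[name] = str(value).lower() if isinstance(value, bool) else value
    # doseq=True repeats a key for each list item (assumed array encoding).
    body = urlencode(params, doseq=True).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        f"https://{server}/extractor/api/extract",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


request = build_text_extract_request(
    "poolparty.example.org",  # hypothetical server
    "user",                   # hypothetical credentials
    "secret",
    text="Vienna is the capital of Austria.",
    project_id="00000000-0000-0000-0000-000000000000",  # placeholder UUID
    language="en",
    numberOfConcepts=10,
    showMatchingDetails=True,
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops before that step so it can be inspected without a running server.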
CustomProperty
Custom property
Attribute | Type | Comment |
---|---|---|
property | String | Property |
value | String | Value |
PPLocale
A PPLocale object
Attribute | Type | Comment |
---|---|---|
ALL_LANGUAGES | PPLocale | |
DUTCH | PPLocale | |
ENGLISH | PPLocale | |
FRENCH | PPLocale | |
GERMAN | PPLocale | |
RUSSIAN | PPLocale | |
SPANISH | PPLocale | |
VALID | PPLocale | |
country | String | |
language | String | |
languageTag | String |
NERConfig
Named Entity Recognition configuration
Attribute | Type | Comment |
---|---|---|
classUri | String | Class URI given to identified Named Entities |
method | Method | Method used for Named Entity Extraction (default: MAXIMUM_ENTROPY). Allowed values: RULE_BASED, MAXIMUM_ENTROPY |
type | String | Type of Named Entity Model; predefined models for MAXIMUM_ENTROPY: person, organization, location |
Example of a Named Entity Recognition Usage:
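As an illustration only, the NERConfig attributes from the table can be assembled as follows. This is a sketch under assumptions: the helper `ner_config` is hypothetical, and serializing the array as JSON is a guess about the wire format, so check how your server expects `nerParameters` to be passed.

```python
import json

# The two methods listed in the NERConfig table.
ALLOWED_METHODS = {"RULE_BASED", "MAXIMUM_ENTROPY"}


def ner_config(model_type, method="MAXIMUM_ENTROPY", class_uri=None):
    """Return one NERConfig entry as a plain dict, omitting unset attributes."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown NER method: {method}")
    config = {"method": method, "type": model_type}
    if class_uri is not None:
        config["classUri"] = class_uri
    return config


# The three predefined MAXIMUM_ENTROPY models named in the table.
ner_parameters = [ner_config(t) for t in ("person", "organization", "location")]

# Assumption: the array is shown as JSON here purely for readability.
print(json.dumps(ner_parameters, indent=2))
```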
Response
Returns
Content-Type: application/json
Response Attributes
Arrays of Response Attributes
ExtractionResponse
Results of a text extraction request. Properties with no entries are not present
Attribute | Type | Comment |
---|---|---|
categories | Array of Category | Categories of the document |
classificationResults | Array of DocumentClassification | Document classification results |
concepts | Array of ThesaurusConcept | Matched concepts |
detectedLanguage | PPLocale | Detected Language of the document |
extractedTerms | Array of ExtractedTerm | Extracted freeTerms |
locations | Array of Location | Matched locations |
namedEntities | Array of NamedEntityResponse | Named Entities |
personNames | Array of String | Deprecated |
regexMatches | Array of RegexMatches | Regex token matches |
sentiments | Array of Sentiment | Matched sentiments |
shadowConcepts | Array of ShadowConceptResponse | Shadow Concepts |
text | String | Text as extracted from url or file |
title | String | Title as extracted from url or file |
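Because properties with no entries are not present in the response, client code should not assume that every attribute exists. The sketch below reads a hypothetical, heavily shortened response body and treats missing arrays as empty; the sample values are invented for illustration.

```python
import json

# Hypothetical response body; real responses contain more attributes.
SAMPLE_RESPONSE = """
{
  "title": "Sample document",
  "concepts": [
    {"uri": "https://example.org/concept/1", "score": 87.5},
    {"uri": "https://example.org/concept/2", "score": 42.0}
  ],
  "extractedTerms": [
    {"textValue": "capital city", "score": 61.0}
  ]
}
"""


def summarize(response):
    """Collect the main result lists, treating absent attributes as empty."""
    return {
        "title": response.get("title"),
        "concept_uris": [c["uri"] for c in response.get("concepts", [])],
        "terms": [t["textValue"] for t in response.get("extractedTerms", [])],
        # "locations" is missing from the sample, so this becomes [].
        "locations": [loc["name"] for loc in response.get("locations", [])],
    }


summary = summarize(json.loads(SAMPLE_RESPONSE))
print(summary)
```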
Category
Categorization result
Attribute | Type | Comment |
---|---|---|
categoryConceptResults | Array of ConceptCategory | Categorized concepts |
prefLabel | String | Preferred label |
score | double | Score between 0.0-100.0 |
uri | String | Category URI |
ConceptCategory
Categorized concept
Attribute | Type | Comment |
---|---|---|
prefLabel | String | Preferred label |
score | double | Score from 0.0 to 100.0 |
uri | String | URI |
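A common use of the categorization result is to pick the best-scoring category and list the concepts that support it. The following sketch assumes the `categories` array has already been parsed from JSON; the sample data and the helper name `best_category` are hypothetical.

```python
def best_category(categories):
    """Return the category with the highest score, or None if there are none."""
    if not categories:
        return None
    return max(categories, key=lambda category: category["score"])


# Hypothetical "categories" array from an ExtractionResponse.
categories = [
    {
        "prefLabel": "Geography",
        "uri": "https://example.org/category/geography",
        "score": 78.0,
        "categoryConceptResults": [
            {"prefLabel": "Vienna", "uri": "https://example.org/concept/vienna", "score": 90.0},
            {"prefLabel": "Austria", "uri": "https://example.org/concept/austria", "score": 66.0},
        ],
    },
    {
        "prefLabel": "Politics",
        "uri": "https://example.org/category/politics",
        "score": 12.5,
        "categoryConceptResults": [],
    },
]

top = best_category(categories)
supporting = [concept["prefLabel"] for concept in top["categoryConceptResults"]]
print(top["prefLabel"], supporting)
```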
DocumentClassification
A DocumentClassification object.
Attribute | Type | Comment |
---|---|---|
predictedLabel | String | Predicted label |
probabilities | Array of Prediction | Probabilities |
uri | String | URI of the classifier |
ThesaurusConcept
Concept from a PoolParty thesaurus project.
Attribute | Type | Comment |
---|---|---|
altLabels | Map of PPLocale | Alternative labels |
broaderConcepts | Array of String | URIs of all direct broader concepts |
conceptSchemes | Array of ThesaurusConceptScheme | The concept schemes this concept resides in |
corporaScore | Double | Relevance score - e.g. when extracted from a text |
customAttributes | Array of CustomAttribute | Custom attributes |
customRelations | Array of CustomRelation | Custom relations |
customSchemeTypes | Array of CustomSchemeType | URIs of the custom types assigned to the concept |
frequencyInDocument | int | Frequency of the concept in the text |
frequencyInDocuments | int | Frequency of the concept in the text |
hiddenLabels | Map of PPLocale | Hidden labels |
id | String | Concept id |
languages | Array of PPLocale | Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept |
matchingLabels | Array of MatchingLabel | Matching labels |
prefLabels | Map of PPLocale | Preferred label |
project | String | UUID of the containing PoolParty project |
relatedConcepts | Array of String | URIs of all related concepts |
score | double | Normalized relevance score - e.g. when extracted from a text |
transitiveBroaderConcepts | Array of String | URIs of all transitive broader concepts |
transitiveBroaderTopConcepts | Array of String | URIs of all top concepts that this concept is connected to via a transitive broader-chain |
uri | String | Uniform resource identifier |
wordForms | Array of String | Lemmatized word forms |
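The server-side parameters conceptMinimumScore and numberOfConcepts already limit the returned concepts, but the same selection can be repeated on the client, for example to apply a stricter threshold to a cached response. This is a minimal sketch over hypothetical concept entries; only the `uri`, `score`, and `frequencyInDocument` attributes from the table are used.

```python
def top_concepts(concepts, minimum_score=0.0, limit=25):
    """Keep concepts scoring at least minimum_score, best first, at most limit."""
    kept = [c for c in concepts if c.get("score", 0.0) >= minimum_score]
    kept.sort(key=lambda c: c.get("score", 0.0), reverse=True)
    return kept[:limit]


# Hypothetical "concepts" array from an ExtractionResponse.
concepts = [
    {"uri": "https://example.org/concept/a", "score": 35.0, "frequencyInDocument": 1},
    {"uri": "https://example.org/concept/b", "score": 92.0, "frequencyInDocument": 7},
    {"uri": "https://example.org/concept/c", "score": 58.5, "frequencyInDocument": 3},
]

selected = top_concepts(concepts, minimum_score=50.0, limit=2)
print([c["uri"] for c in selected])
```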
ThesaurusConceptScheme
ConceptScheme from a PoolParty thesaurus project - acts as a container for concepts.
Attribute | Type | Comment |
---|---|---|
title | String | The localized title of this concept scheme |
uri | String | Uniform resource identifier |
CustomAttribute
Custom attribute
Attribute | Type | Comment |
---|---|---|
literal | Literal | Literal |
property | String | Property |
CustomRelation
Custom Relation
Attribute | Type | Comment |
---|---|---|
object | String | Object |
property | String | Property |
CustomSchemeType
(PoolParty) concept scheme - acts as a container for concepts.
Attribute | Type | Comment |
---|---|---|
title | String | The name of this custom scheme type |
uri | String | Uniform resource identifier |
ExtractedTerm
Phrase extracted from a text that does not match any concepts.
Attribute | Type | Comment |
---|---|---|
corporaScore | double | Corpora score |
frequencyInDocument | int | Frequency within the document where it was extracted |
frequencyInDocuments | int | Frequency within the documents where it was extracted |
score | double | Relevance score |
textValue | String | The term phrase |
Location
A geographical location extracted from a text.
Attribute | Type | Comment |
---|---|---|
countryCode | String | ISO 3166-1 alpha-2 country code |
latitude | float | Latitude |
longitude | float | Longitude |
matchedLabel | String | The location label that was found in the text |
name | String | Common name of the location |
score | Double | Relevance score |
type | LocationType | Location type, either City or Country |
uri | String | Uniform resource identifier of the location |
NamedEntityResponse
Named Entity
Attribute | Type | Comment |
---|---|---|
frequency | int | Frequency in document |
metadata | Map of String | Metadata |
method | String | Method |
positions | Array of SimpleTokenPosition | Position |
score | double | Score |
textValue | String | Matched text |
type | String | Type |
RegexMatches
Regex match
Attribute | Type | Comment |
---|---|---|
regexMatches | Array of String | Tokens from the input text that match the regex pattern |
regexPattern | String | The original pattern used for matching |
Sentiment
Sentiment result
Attribute | Type | Comment |
---|---|---|
negativeTerms | Array of String | List of negative terms |
positiveTerms | Array of String | List of positive terms |
score | float | Score |
sentiment | String | Sentiment |
ShadowConceptResponse
Shadow concept
Attribute | Type | Comment |
---|---|---|
altLabels | Map of PPLocale | Alternative labels |
broaderConcepts | Array of String | URIs of all direct broader concepts |
conceptSchemes | Array of ThesaurusConceptScheme | The concept schemes this concept resides in |
corporaScore | Double | Relevance score - e.g. when extracted from a text |
customAttributes | Array of CustomAttribute | Custom attributes |
customRelations | Array of CustomRelation | Custom relations |
customSchemeTypes | Array of CustomSchemeType | URIs of the custom types assigned to the concept |
hiddenLabels | Map of PPLocale | Hidden labels |
id | String | Concept id |
languages | Array of PPLocale | Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept |
prefLabels | Map of PPLocale | Preferred label |
project | String | UUID of the containing PoolParty project |
relatedConcepts | Array of String | URIs of all related concepts |
score | double | Normalized relevance score - e.g. when extracted from a text |
shadowConceptTerms | Array of ShadowTerm | Extracted terms that contribute to calculation of the shadow concept |
transitiveBroaderConcepts | Array of String | URIs of all transitive broader concepts |
transitiveBroaderTopConcepts | Array of String | URIs of all top concepts that this concept is connected to via a transitive broader-chain |
uri | String | Uniform resource identifier |
ShadowTerm
Phrase extracted from a text that does not match any Concepts
Attribute | Type | Comment |
---|---|---|
score | double | Relevance score |
textValue | String | The term phrase |
Web Service Method: Extract from File
Description |
---|
Extracts and returns meaningful metadata like concepts and terms from a given file upload. |
URL: /extractor/api/extract
Request
Supported Methods |
---|
POST |
Content-Type:
multipart/form-data
HTTP Parameters
Parameter | Type | Required | Description |
---|---|---|---|
categorizationWithPpxBoost | boolean | false | Use Extractor boosting, default = false |
categorize | boolean | false | Categorization extraction, default = false |
charset | String | false | Character set used in the File |
conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUID) are provided. |
customAttributeFilters | Array of CustomProperty | false | Custom attribute (property uri and string value) filters |
customClassFilters | Array of String | false | Custom class URI filters |
disambiguate | boolean | false | Use thesaurus based disambiguation, default = false |
displayText | boolean | false | Include text extracted from file in response, default = false |
documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input. |
documentId | String | false | Internal ID of the document |
extraConceptLanguages | Array of PPLocale | false | Additional languages used for concept extraction (en|de|es|fr|...); also supports wildcard * for all languages |
extractorVersion | String | false | Version of PPX Extractor used |
file | MultipartFile | true | File to be extracted (word, excel, powerpoint, pdf, open documents) - Mimetype of file must be 'multipart/form-data' |
filterNestedConcepts | boolean | false | Remove concepts matches which are contained within other matches, default = false |
findPersonNames | boolean | false | Deprecated (use nerParameters) - extracts person names from the given text |
language | PPLocale | false | Extraction language (en|de|es|fr|...) |
lemmatization | boolean | false | Use lemmatization, default = false |
locationExtraction | boolean | false | Deprecated (use nerParameters) - extracts locations from the given text |
metadata | String | false | Metadata of the document (concatenated fields with delimiter: '.') |
nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
numberOfConcepts | Integer | false | Retrieve number of concepts, default = 25 |
numberOfTerms | Integer | false | Retrieve number of terms, default = 25 |
phraseLength | Integer | false | Phrase length, default = 4 |
projectId | Array of String | false | Thesaurus projectIds |
properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. |
regexFilename | String | false | File name for regex patterns |
sentimentAnalysis | boolean | false | Sentiment analysis, default: false |
shadowConceptCorpusId | Array of String | false | Shadow concepts calculation. Enabled if corpusIds (UUID) are provided |
showMatchingDetails | boolean | false | Shows which concept labels were found inside the text, default = false |
showMatchingPosition | boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true. default = false |
tfidfScoring | boolean | false | Use TFIDF scoring |
title | String | false | Title of the document |
useRelatedConcepts | boolean | false | Retrieve related concepts, default = false |
useTransitiveBroaderConcepts | boolean | false | Retrieve transitive broader concepts, default = false |
useTransitiveBroaderTopConcepts | boolean | false | Retrieve transitive broader top concepts, default = false |
useTypes | boolean | false | Retrieve custom types for concepts, default = false |
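A file upload sends these parameters together with the `file` part as multipart/form-data. The sketch below builds such a body with the standard library only, to show what the request looks like on the wire. The helper name `encode_multipart`, the file name, and the file content are hypothetical; in practice an HTTP client library would normally do this encoding for you.

```python
import uuid


def encode_multipart(fields, file_name, file_bytes, file_type="application/octet-stream"):
    """Encode form fields plus one 'file' part as multipart/form-data.

    Returns a (content_type_header, body_bytes) tuple.
    """
    boundary = uuid.uuid4().hex
    lines = []
    # One part per plain form field (projectId, language, ...).
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    # The uploaded document goes into the part named "file".
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode(),
        f"Content-Type: {file_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


content_type, body = encode_multipart(
    {"projectId": "00000000-0000-0000-0000-000000000000", "language": "en"},
    file_name="report.pdf",      # hypothetical file name
    file_bytes=b"%PDF-1.4 ...",  # stand-in for real file content
    file_type="application/pdf",
)
print(content_type)
```

The returned header value and body can be passed to any HTTP client as the `Content-Type` header and the POST payload.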
CustomProperty
Custom property
Attribute | Type | Comment |
---|---|---|
property | String | Property |
value | String | Value |
PPLocale
A PPLocale object
Attribute | Type | Comment |
---|---|---|
ALL_LANGUAGES | PPLocale | |
DUTCH | PPLocale | |
ENGLISH | PPLocale | |
FRENCH | PPLocale | |
GERMAN | PPLocale | |
RUSSIAN | PPLocale | |
SPANISH | PPLocale | |
VALID | PPLocale | |
country | String | |
language | String | |
languageTag | String |
MultipartFile
A MultipartFile object
NERConfig
Named Entity Recognition configuration
Attribute | Type | Required | Comment |
---|---|---|---|
classUri | String | false | Class URI given to identified Named Entities |
method | Method | false | Method used for Named Entity Extraction (default: MAXIMUM_ENTROPY). Allowed values: RULE_BASED, MAXIMUM_ENTROPY |
type | String | false | Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location |
Content-Type: application/json
Results of a file based text extraction request. Properties with no entries are not present
Attribute | Type | Comment |
---|---|---|
document | ExtractionResponse | Extraction result |
metadata | ExtractionResponse | Metadata extraction result |
text | String | File text content |
title | String | File title |
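The file-based response wraps two ExtractionResponse objects, one for the document body and one for its metadata. The following sketch shows one way to read concepts from both parts; the sample response is hypothetical, and either part may be absent because empty properties are omitted.

```python
def concept_uris_by_source(file_response):
    """Map 'document' and 'metadata' to the concept URIs found in each part."""
    result = {}
    for part in ("document", "metadata"):
        # A missing part or a missing "concepts" array counts as no matches.
        extraction = file_response.get(part, {})
        result[part] = [c["uri"] for c in extraction.get("concepts", [])]
    return result


# Hypothetical file extraction response without a "metadata" part.
file_response = {
    "title": "Annual report",
    "document": {
        "concepts": [
            {"uri": "https://example.org/concept/revenue", "score": 80.0},
            {"uri": "https://example.org/concept/audit", "score": 55.0},
        ]
    },
}

uris = concept_uris_by_source(file_response)
print(uris)
```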
ExtractionResponse
Results of a text extraction request. Properties with no entries are not present
Attribute | Type | Comment |
---|---|---|
categories | Array of Category | Categories of the document |
classificationResults | Array of DocumentClassification | Document classification results |
concepts | Array of ThesaurusConcept | Matched concepts |
detectedLanguage | PPLocale | Detected Language of the document |
extractedTerms | Array of ExtractedTerm | Extracted freeTerms |
locations | Array of Location | Matched locations |
namedEntities | Array of NamedEntityResponse | Named Entities |
personNames | Array of String | Deprecated |
regexMatches | Array of RegexMatches | Regex token matches |
sentiments | Array of Sentiment | Matched sentiments |
shadowConcepts | Array of ShadowConceptResponse | Shadow Concepts |
text | String | Text as extracted from url or file |
title | String | Title as extracted from url or file |
Category
Categorization result
Attribute | Type | Comment |
---|---|---|
categoryConceptResults | Array of ConceptCategory | Categorized concepts |
prefLabel | String | Preferred label |
score | double | Score between 0.0-100.0 |
uri | String | Category URI |
ConceptCategory
Categorized concept
Attribute | Type | Comment |
---|---|---|
prefLabel | String | Preferred label |
score | double | Score from 0.0 to 100.0 |
uri | String | URI |
DocumentClassification
A DocumentClassification object.
Attribute | Type | Comment |
---|---|---|
predictedLabel | String | Predicted label |
probabilities | Array of Prediction | Probabilities |
uri | String | URI of the classifier |
ThesaurusConcept
Concept from a PoolParty thesaurus project.
Attribute | Type | Comment |
---|---|---|
altLabels | Map of PPLocale | Alternative labels |
broaderConcepts | Array of String | URIs of all direct broader concepts |
conceptSchemes | Array of ThesaurusConceptScheme | The concept schemes this concept resides in. |
corporaScore | Double | Relevance score - e.g. when extracted from a text. |
customAttributes | Array of CustomAttribute | Custom attributes |
customRelations | Array of CustomRelation | Custom relations |
customSchemeTypes | Array of CustomSchemeType | URIs of the custom types assigned to the concept |
frequencyInDocument | int | Frequency of the concept in the text |
frequencyInDocuments | int | Frequency of the concept in the text |
hiddenLabels | Map of PPLocale | Hidden labels |
id | String | Concept id |
languages | Array of PPLocale | Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept. |
matchingLabels | Array of MatchingLabel | Matching labels |
prefLabels | Map of PPLocale | Preferred label |
project | String | UUID of the containing PoolParty project |
relatedConcepts | Array of String | URIs of all related concepts |
score | double | Normalized relevance score - e.g. when extracted from a text. |
transitiveBroaderConcepts | Array of String | URIs of all transitive broader concepts |
transitiveBroaderTopConcepts | Array of String | URIs of all top concepts that this concept is connected to via a transitive broader-chain. |
uri | String | Uniform resource identifier |
wordForms | Array of String | Lemmatized word forms |
ThesaurusConceptScheme
ConceptScheme from a PoolParty thesaurus project - acts as a container for concepts.
Attribute | Type | Comment |
---|---|---|
title | String | The localized title of this concept scheme |
uri | String | Uniform resource identifier |
CustomAttribute
Custom attribute
Attribute | Type | Comment |
---|---|---|
literal | Literal | Literal |
property | String | Property |
CustomRelation
Custom relation
Attribute | Type | Comment |
---|---|---|
object | String | Object |
property | String | Property |
CustomSchemeType
(PoolParty) concept scheme - acts as a container for concepts
Attribute | Type | Comment |
---|---|---|
title | String | The name of this custom scheme type |
uri | String | Uniform resource identifier |
ExtractedTerm
Phrase extracted from a text that does not match any concepts
Attribute | Type | Comment |
---|---|---|
corporaScore | Double | Corpora score |
frequencyInDocument | int | Frequency within the document where it was extracted. |
frequencyInDocuments | int | Frequency within the documents where it was extracted. |
score | Double | Relevance score |
textValue | String | The term phrase |
Location
A geographical location extracted from a text.
Attribute | Type | Comment |
---|---|---|
countryCode | String | ISO 3166-1 alpha-2 country code |
latitude | float | Latitude |
longitude | float | Longitude |
matchedLabel | String | The location label that was found in the text |
name | String | Common name of the location |
score | Double | Relevance score |
type | LocationType | Location type, either City or Country |
uri | String | Uniform resource identifier of the location. |
NamedEntityResponse
Named Entity
Attribute | Type | Comment |
---|---|---|
frequency | int | Frequency in document |
metadata | Map of String | Metadata |
method | String | Method |
positions | Array of SimpleTokenPosition | Position |
score | double | Score |
textValue | String | Matched text |
type | String | Type |
RegexMatches
Regex match
Attribute | Type | Comment |
---|---|---|
regexMatches | Array of String | Tokens from the input text that match the regex pattern |
regexPattern | String | The original pattern used to match |
Sentiment
Sentiment result
Attribute | Type | Comment |
---|---|---|
negativeTerms | Array of String | List of negative terms |
positiveTerms | Array of String | List of positive terms |
score | float | Score |
sentiment | String | Sentiment |
ShadowConceptResponse
Shadow concept
Attribute | Type | Comment |
---|---|---|
altLabels | Map of PPLocale | Alternative labels |
broaderConcepts | Array of String | URIs of all direct broader concepts |
conceptSchemes | Array of ThesaurusConceptScheme | The concept schemes this concept resides in |
corporaScore | Double | Relevance score - e.g. when extracted from a text |
customAttributes | Array of CustomAttribute | Custom attributes |
customRelations | Array of CustomRelation | Custom relations |
customSchemeTypes | Array of CustomSchemeType | URIs of the custom types assigned to the concept |
hiddenLabels | Map of PPLocale | Hidden labels |
id | String | Concept id |
languages | Array of PPLocale | Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept |
prefLabels | Map of PPLocale | Preferred label |
project | String | UUID of the containing PoolParty project |
relatedConcepts | Array of String | URIs of all related concepts |
score | double | Normalized relevance score - e.g. when extracted from a text |
shadowConceptTerms | Array of ShadowTerm | Extracted terms that contribute to calculation of the shadow concept |
transitiveBroaderConcepts | Array of String | URIs of all transitive broader concepts |
transitiveBroaderTopConcepts | Array of String | URIs of all top concepts that this concept is connected to via a transitive broader-chain |
uri | String | Uniform resource identifier |
ShadowTerm
Phrase extracted from a text that does not match any concepts
Attribute | Type | Comment |
---|---|---|
score | double | Relevance score |
textValue | String | The term phrase |
Web Service Method: Extract from URL
Description |
---|
Extracts and returns meaningful metadata like concepts and terms from a given URL. |
URL: /extractor/api/extract
Supported Methods |
---|
POST |
GET |
Content-Type:
application/x-www-form-urlencoded
HTTP Parameters
Parameter | Type | Required | Description |
---|---|---|---|
categorizationWithPpxBoost | boolean | false | Use Extractor boosting, default = false |
categorize | boolean | false | Categorization extraction, default = false |
conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUID) are provided |
customAttributeFilters | Array of CustomProperty | false | Custom attribute (property uri and string value) filters |
customClassFilters | Array of String | false | Custom class URI filters |
disambiguate | boolean | false | Use thesaurus based disambiguation, default = false |
displayText | boolean | false | Include text extracted from url in response, default = false |
documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input |
documentId | String | false | Internal ID of the document |
extraConceptLanguages | Array of PPLocale | false | Additional languages used for concept extraction (en|de|es|fr|...) Also supports wildcard * for all languages |
extractorVersion | String | false | Version of PPX Extractor used |
filterNestedConcepts | boolean | false | Remove concepts matches which are contained within other matches, default = false |
findPersonNames | boolean | false | Deprecated (use nerParameters) - extracts person names from the given text |
language | PPLocale | false | Extraction language (en|de|es|fr|...) |
lemmatization | boolean | false | Use lemmatization, default = true |
locationExtraction | boolean | false | Deprecated (use nerParameters) - extracts locations from the given text |
nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
numberOfConcepts | Integer | false | Retrieve number of concepts, default = 25 |
numberOfTerms | Integer | false | Retrieve number of terms, default = 25 |
phraseLength | Integer | false | Phrase length, default = 4 |
projectId | Array of String | false | Thesaurus projectIds |
properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Set to |
regexFilename | String | false | File name for regex patterns |
sentimentAnalysis | boolean | false | Sentiment analysis, default: false |
shadowConceptCorpusId | Array of String | false | Shadow concepts calculation. Enabled if corpusIds (UUID) are provided |
showMatchingDetails | boolean | false | Shows which concept labels were found inside the text, default = false |
showMatchingPosition | boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true. default = false |
tfidfScoring | boolean | false | Use TFIDF scoring |
title | String | false | Title of the document |
url | String | true | URL to be extracted |
useRelatedConcepts | boolean | false | Retrieve related concepts, default = false |
useTransitiveBroaderConcepts | boolean | false | Retrieve transitive broader concepts, default = false |
useTransitiveBroaderTopConcepts | boolean | false | Retrieve transitive broader top concepts, default = false |
useTypes | boolean | false | Retrieve custom types for concepts, default = false |
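Since this method also supports GET, the parameters can be carried in the query string. The sketch below builds such a link; the important detail is that the target page URL must itself be percent-encoded. The server name and project ID are placeholders, and `build_url_extract_link` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url_extract_link(server, page_url, project_id, **options):
    """Return a GET link for /extractor/api/extract with an encoded query string."""
    params = {"url": page_url, "projectId": project_id, **options}
    # urlencode percent-encodes ':' and '/' inside the nested page URL.
    return f"https://{server}/extractor/api/extract?{urlencode(params, doseq=True)}"


link = build_url_extract_link(
    "poolparty.example.org",  # hypothetical server
    "https://en.wikipedia.org/wiki/Artificial_intelligence",
    "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    language="en",
    numberOfConcepts=5,
)
print(link)

# Decoding the query again recovers the original page URL.
query = parse_qs(urlsplit(link).query)
print(query["url"][0])
```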
CustomProperty
Custom property
Attribute | Type | Required | Comment |
---|---|---|---|
property | String | false | Property |
value | String | false | Value |
PPLocale
A PPLocale object
Attribute | Type | Comment |
---|---|---|
ALL_LANGUAGES | PPLocale | |
DUTCH | PPLocale | |
ENGLISH | PPLocale | |
FRENCH | PPLocale | |
GERMAN | PPLocale | |
RUSSIAN | PPLocale | |
SPANISH | PPLocale | |
VALID | PPLocale | |
country | String | |
language | String | |
languageTag | String |
NERConfig
Named Entity Recognition configuration
Attribute | Type | Required | Comment |
---|---|---|---|
classUri | String | false | Class URI given to identified Named Entities |
method | Method | false | Method used for Named Entity Extraction (default: MAXIMUM_ENTROPY). Allowed values: RULE_BASED, MAXIMUM_ENTROPY |
type | String | false | Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location |
curl --location 'https://docu.semantic-web.at/extractor/api/extract' \
  --header 'Authorization: Basic e3t1c2VybmFtZX19Ont7cGFzc3dvcmR9fQ==' \
  --header 'Cookie: JSESSIONID=92B9D4DDDE6465C32A1F7B37B666D4B3' \
  --form 'projectId="{{project_id}}"' \
  --form 'url="https://en.wikipedia.org/wiki/Artificial_intelligence"' \
  --form 'language="en"' \
  --form 'showMatchingDetails="true"' \
  --form 'displayText="true"'
Content-Type: application/json
ExtractionResponse
Results of a text extraction request. Properties with no entries are not present
Attribute | Type | Comment |
---|---|---|
categories | Array of Category | Categories of the document |
classificationResults | Array of DocumentClassification | Document classification results |
concepts | Array of ThesaurusConcept | Matched concepts |
detectedLanguage | PPLocale | Detected Language of the document |
extractedTerms | Array of ExtractedTerm | Extracted freeTerms |
locations | Array of Location | Matched locations |
namedEntities | Array of NamedEntityResponse | Named Entities |
personNames | Array of String | Deprecated |
regexMatches | Array of RegexMatches | Regex token matches |
sentiments | Array of Sentiment | Matched sentiments |
shadowConcepts | Array of ShadowConceptResponse | Shadow Concepts |
text | String | Text as extracted from url or file |
title | String | Title as extracted from url or file |
Category
Categorization result
Attribute | Type | Comment |
---|---|---|
categoryConceptResults | Array of ConceptCategory | Categorized concepts |
prefLabel | String | Preferred label |
score | double | Score between 0.0-100.0 |
uri | String | Category URI |
ConceptCategory
Categorized concept
Attribute | Type | Comment |
---|---|---|
prefLabel | String | Preferred label |
score | double | Score from 0.0 to 100.0 |
uri | String | URI |
DocumentClassification
A DocumentClassification object.
Attribute | Type | Comment |
---|---|---|
predictedLabel | String | Predicted label |
probabilities | Array of Double | Probabilities |
uri | String | URI of the classifier |
ThesaurusConcept
Concept from a PoolParty thesaurus project
Attribute | Type | Comment |
---|---|---|
altLabels | Map of PPLocale | Alternative labels |
broaderConcepts | Array of String | URIs of all direct broader concepts |
conceptSchemes | Array of ThesaurusConceptScheme | The concept schemes this concept resides in |
corporaScore | double | Relevance score - e.g. when extracted from a text. |
customAttributes | Array of CustomAttribute | Custom attributes |
customRelations | Array of CustomRelation | Custom relations |
customSchemeTypes | Array of CustomSchemeType | URIs of the custom types assigned to the concept |
frequencyInDocument | int | Frequency of the concept in the text |
frequencyInDocuments | int | Frequency of the concept in the text |
hiddenLabels | Map of PPLocale | Hidden labels |
id | String | Concept id |
languages | Array of PPLocale | Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept. |
matchingLabels | Array of MatchingLabel | Matching labels |
prefLabels | Map of PPLocale | Preferred labels |
project | String | UUID of the containing PoolParty project. |
relatedConcepts | Array of String | URIs of all related concepts |
score | double | Normalized relevance score - e.g. when extracted from a text. |
transitiveBroaderConcepts | Array of String | URIs of all transitive broader concepts |
transitiveBroaderTopConcepts | Array of String | URIs of all top concepts that this concept is connected to via a transitive broader-chain. |
uri | String | Uniform resource identifier |
wordForms | Array of String | Lemmatized word forms |
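A client will often want to rank extracted concepts and show one label per concept. The sketch below does this under two assumptions that the table does not confirm: that the concepts arrive as JSON objects with the attribute names above, and that `prefLabels` is a map from language code to label string. The sample data and the helper names `display_label` and `rank_concepts` are hypothetical.

```python
from typing import Any

# Hypothetical concepts shaped after the attribute table above; not real API output.
# Assumption: prefLabels maps a language code to a label string.
sample_concepts = [
    {"uri": "https://example.org/c/1",
     "prefLabels": {"en": "Solar energy", "de": "Solarenergie"},
     "score": 74.0, "frequencyInDocument": 3},
    {"uri": "https://example.org/c/2",
     "prefLabels": {"de": "Windkraft"},
     "score": 88.0, "frequencyInDocument": 1},
    {"uri": "https://example.org/c/3",
     "prefLabels": {},
     "score": 12.0, "frequencyInDocument": 1},
]


def display_label(concept: dict[str, Any], lang: str = "en") -> str:
    """Pick the label in the requested language, else any label, else the URI."""
    labels = concept.get("prefLabels") or {}
    if lang in labels:
        return labels[lang]
    return next(iter(labels.values()), concept["uri"])


def rank_concepts(concepts: list[dict[str, Any]], lang: str = "en") -> list[tuple[str, float]]:
    """Return (label, score) pairs ordered by descending normalized score."""
    ordered = sorted(concepts, key=lambda c: c.get("score", 0.0), reverse=True)
    return [(display_label(c, lang), c.get("score", 0.0)) for c in ordered]


for label, score in rank_concepts(sample_concepts):
    print(f"{score:5.1f}  {label}")
```

Falling back to the URI keeps the output usable when a concept has no label in any language.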
ConceptScheme from a PoolParty thesaurus project - acts as a container for concepts
Attribute | Type | Comment |
---|---|---|
title | String | The localized title of this concept scheme |
uri | String | Uniform resource identifier |
Custom attribute
Attribute | Type | Comment |
---|---|---|
literal | Literal | Literal |
property | String | Property |
Custom Relation
Attribute | Type | Comment |
---|---|---|
object | String | Object |
property | String | Property |
Custom scheme type assigned to a concept
Attribute | Type | Comment |
---|---|---|
title | String | The name of this custom scheme type |
uri | String | Uniform resource identifier |
Phrase extracted from a text that does not match any concepts
Attribute | Type | Comment |
---|---|---|
corporaScore | Double | Corpora score |
frequencyInDocument | int | Frequency within the document where it was extracted |
frequencyInDocuments | int | Frequency within the documents where it was extracted |
score | Double | Relevance score |
textValue | String | The term phrase |
A geographical location extracted from a text.
Attribute | Type | Comment |
---|---|---|
countryCode | String | ISO 3166-1 alpha-2 country code |
latitude | float | Latitude |
longitude | float | Longitude |
matchedLabel | String | The location label that was found in the text. |
name | String | Common name of the location. |
score | Double | Relevance score |
type | LocationType | Location type: either City or Country |
uri | String | Uniform resource identifier of the location |
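Extracted locations are commonly plotted on a map. The following sketch converts one location into a GeoJSON Feature; note that GeoJSON orders coordinates as longitude first, then latitude. The sample values, the `locationType` property name, and the helper `location_to_feature` are hypothetical, and the sketch assumes the location is delivered as a JSON object with the attribute names above.

```python
from typing import Any

# Hypothetical location shaped after the attribute table above; not real API output.
sample_location = {
    "name": "Vienna",
    "matchedLabel": "Wien",
    "countryCode": "AT",
    "latitude": 48.2082,
    "longitude": 16.3738,
    "score": 67.0,
    "type": "City",
    "uri": "https://example.org/location/vienna",
}


def location_to_feature(location: dict[str, Any]) -> dict[str, Any]:
    """Convert an extracted location into a GeoJSON Feature with a Point geometry."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON coordinate order is [longitude, latitude].
            "coordinates": [location["longitude"], location["latitude"]],
        },
        "properties": {
            "name": location.get("name"),
            "matchedLabel": location.get("matchedLabel"),
            "countryCode": location.get("countryCode"),
            "locationType": location.get("type"),
            "uri": location.get("uri"),
        },
    }


feature = location_to_feature(sample_location)
print(feature["geometry"]["coordinates"])
```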
Named Entity
Attribute | Type | Comment |
---|---|---|
frequency | int | Frequency in document |
metadata | Map of String | Metadata |
method | String | Method |
positions | Array of SimpleTokenPosition | Positions |
score | double | Score |
textValue | String | Matched text |
type | String | Type |
Regex match
Attribute | Type | Comment |
---|---|---|
regexMatches | Array of String | Tokens from the input text that match the regex pattern |
regexPattern | String | The original pattern used to match |
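Each regex match object pairs one pattern with the tokens it matched, so the attribute name `regexMatches` appears both on the response and on each entry. A minimal sketch for grouping the distinct tokens by pattern might look as follows; the sample data and the helper `tokens_by_pattern` are hypothetical, and a JSON response using the attribute names above is assumed.

```python
from typing import Any

# Hypothetical regex results shaped after the attribute table above; not real API output.
sample_regex_results = [
    {"regexPattern": r"\d{4}", "regexMatches": ["2021", "2023", "2021"]},
    {"regexPattern": r"[A-Z]{3}", "regexMatches": ["EUR"]},
]


def tokens_by_pattern(regex_results: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each pattern to its distinct matched tokens, keeping first-seen order."""
    grouped: dict[str, list[str]] = {}
    for result in regex_results:
        tokens = grouped.setdefault(result["regexPattern"], [])
        for token in result.get("regexMatches") or []:
            if token not in tokens:
                tokens.append(token)
    return grouped


print(tokens_by_pattern(sample_regex_results))
```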
Sentiment result
Attribute | Type | Comment |
---|---|---|
negativeTerms | Array of String | List of negative terms |
positiveTerms | Array of String | List of positive terms |
score | float | Score |
sentiment | String | Sentiment |
Shadow Concept
Attribute | Type | Comment |
---|---|---|
altLabels | Map of PPLocale | Alternative labels |
broaderConcepts | Array of String | URIs of all direct broader concepts |
conceptSchemes | Array of ThesaurusConceptScheme | The concept schemes this concept resides in |
corporaScore | Double | Relevance score - e.g. when extracted from a text |
customAttributes | Array of CustomAttribute | Custom attributes |
customRelations | Array of CustomRelation | Custom relations |
customSchemeTypes | Array of CustomSchemeType | URIs of the custom types assigned to the concept |
hiddenLabels | Map of PPLocale | Hidden labels |
id | String | Concept id |
languages | Array of PPLocale | Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept |
prefLabels | Map of PPLocale | Preferred labels |
project | String | UUID of the containing PoolParty project |
relatedConcepts | Array of String | URIs of all related concepts |
score | double | Normalized relevance score - e.g. when extracted from a text |
shadowConceptTerms | Array of ShadowTerm | Extracted terms that contribute to calculation of the shadow concept |
transitiveBroaderConcepts | Array of String | URIs of all transitive broader concepts |
transitiveBroaderTopConcepts | Array of String | URIs of all top concepts that this concept is connected to via a transitive broader-chain |
uri | String | Uniform resource identifier |
Phrase extracted from a text that does not match any concepts
Attribute | Type | Comment |
---|---|---|
score | double | Relevance score |
textValue | String | The term phrase |
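To see why a shadow concept was returned, it can help to list the terms that contributed most to it. The sketch below sorts `shadowConceptTerms` by score and returns the strongest phrases. The sample data and the helper `top_shadow_terms` are hypothetical, and the sketch assumes a JSON response using the attribute names from the two tables above.

```python
from typing import Any

# Hypothetical shadow concept shaped after the attribute tables above; not real API output.
sample_shadow_concept = {
    "uri": "https://example.org/c/solar-energy",
    "score": 55.0,
    "shadowConceptTerms": [
        {"textValue": "photovoltaic", "score": 61.0},
        {"textValue": "panel", "score": 35.5},
        {"textValue": "inverter", "score": 48.0},
        {"textValue": "grid", "score": 10.0},
    ],
}


def top_shadow_terms(shadow_concept: dict[str, Any], limit: int = 3) -> list[str]:
    """Return the textValue of the highest-scoring contributing terms."""
    terms = shadow_concept.get("shadowConceptTerms") or []
    ordered = sorted(terms, key=lambda t: t.get("score", 0.0), reverse=True)
    return [t["textValue"] for t in ordered[:limit]]


print(top_shadow_terms(sample_shadow_concept))
```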