Web Service Method: Extract from File
Description |
[file] Extracts and returns meaningful metadata, such as concepts and terms, from an uploaded file. |
URL: /extractor/api/extract
Supported Methods |
Parameter | Type | Required | Description |
categorizationWithPpxBoost | boolean | false | Use Extractor boosting, default = false |
categorize | boolean | false | Categorization extraction, default = false |
charset | String | false | Character set used in the File |
conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUID) are provided. |
customAttributeFilters | Array of CustomProperty | false | Custom attribute (property uri and string value) filters |
customClassFilters | Array of String | false | Custom class URI filters |
disambiguate | boolean | false | Use thesaurus based disambiguation, default = false |
displayText | boolean | false | Include text extracted from url in response, default = false |
documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input. |
documentId | String | false | Internal ID of the document |
extraConceptLanguages | Array of PPLocale | false | Additional languages used for concept extraction (en|de|es|fr|...). Also supports the wildcard * for all languages |
extractorVersion | String | false | Version of PPX Extractor used |
file | MultipartFile | true | File to be extracted (word, excel, powerpoint, pdf, open documents) - Mimetype of file must be 'multipart/form-data' |
filterNestedConcepts | boolean | false | Remove concepts matches which are contained within other matches, default = false |
findPersonNames | boolean | false | Deprecated (use nerParameters) - extracts person names from the given text |
language | PPLocale | false | Extraction language (en|de|es|fr|...) |
lemmatization | boolean | false | Use lemmatization, default = false |
locationExtraction | boolean | false | Deprecated (use nerParameters) - extracts locations from the given text |
metadata | String | false | Metadata of the document (concatenated fields with delimiter: '.') |
nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
numberOfConcepts | Integer | false | Retrieve number of concepts, default = 25 |
numberOfTerms | Integer | false | Retrieve number of terms, default = 25 |
phraseLength | Integer | false | Phrase length, default = 4 |
projectId | Array of String | false | Thesaurus projectIds |
properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. |
regexFilename | String | false | File name for regex patterns |
sentimentAnalysis | boolean | false | Sentiment analysis, default = false |
shadowConceptCorpusId | Array of String | false | Shadow concepts calculation. Enabled if corpusIds (UUID) are provided |
showMatchingDetails | boolean | false | Shows which concept labels were found inside the text, default = false |
showMatchingPosition | boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true, default = false |
tfidfScoring | boolean | false | Use TFIDF scoring |
title | String | false | Title of the document |
useRelatedConcepts | boolean | false | Retrieve related concepts, default = false |
useTransitiveBroaderConcepts | boolean | false | Retrieve transitive broader concepts, default = false |
useTransitiveBroaderTopConcepts | boolean | false | Retrieve transitive broader top concepts, default = false |
useTypes | boolean | false | Retrieve custom types for concepts, default = false |
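The parameters above can be combined into a single multipart upload. The following is a minimal sketch using only the Python standard library. It assumes the method is POST (the Supported Methods entry above is empty), that boolean values are sent as `true`/`false`, and that array parameters are sent as repeated form fields. The host name, project ID placeholder, and helper names are hypothetical.

```python
import urllib.request
import uuid


def to_form_fields(params):
    """Flatten parameters into (name, value) pairs.

    Assumption: arrays become repeated fields, booleans become true/false.
    """
    fields = []
    for name, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if isinstance(item, bool):
                item = "true" if item else "false"
            fields.append((name, str(item)))
    return fields


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus the required 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    for name, value in fields:
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
        body += part.encode("utf-8")
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    )
    body += head.encode("utf-8") + file_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode("utf-8")
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def build_extract_request(base_url, file_name, file_bytes, params):
    """Build (but do not send) a request for /extractor/api/extract."""
    body, content_type = encode_multipart(
        to_form_fields(params), file_name, file_bytes
    )
    return urllib.request.Request(
        base_url.rstrip("/") + "/extractor/api/extract",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",  # assumption, not stated in this reference
    )


request = build_extract_request(
    "https://poolparty.example.org",  # hypothetical server
    "report.pdf",
    b"%PDF-1.4 placeholder bytes",
    {
        "projectId": ["<project-uuid>"],  # placeholder, not a real ID
        "language": "en",
        "numberOfConcepts": 10,
        "showMatchingDetails": True,
        "showMatchingPosition": True,  # only shown with showMatchingDetails
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it. The sketch stops before that step so it can be inspected without a server.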
Custom property
Attribute | Type | Comment |
property | String | Property |
value | String | Value |
A PPLocale object
Attribute | Type | Comment |
DUTCH | PPLocale | |
ENGLISH | PPLocale | |
FRENCH | PPLocale | |
GERMAN | PPLocale | |
RUSSIAN | PPLocale | |
SPANISH | PPLocale | |
VALID | PPLocale | |
country | String | |
language | String | |
languageTag | String | |
A MultipartFile object
Named Entity Recognition configuration
Attribute | Type | Required | Comment |
classUri | String | false | Class URI given to identified Named Entities |
method | Method | false | Method used for Named Entity Extraction, default = MAXIMUM_ENTROPY. Allowed values: RULE_BASED, MAXIMUM_ENTROPY |
type | String | false | Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location |
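As an illustration of the NERConfig attributes above, the sketch below builds configuration objects as plain dictionaries and checks `method` against the two listed values. How `nerParameters` is encoded in the request is not stated in this reference. Serializing the array to JSON is an assumption, and the helper name `ner_config` and the example class URI are hypothetical.

```python
import json

# The two method values listed in the NERConfig table.
ALLOWED_METHODS = {"RULE_BASED", "MAXIMUM_ENTROPY"}


def ner_config(type_, method="MAXIMUM_ENTROPY", class_uri=None):
    """Build one NERConfig entry, omitting attributes that are not set."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported NER method: {method}")
    config = {"type": type_, "method": method}
    if class_uri is not None:
        config["classUri"] = class_uri
    return config


ner_parameters = [
    ner_config("person"),
    ner_config("location", class_uri="https://example.org/Place"),  # hypothetical URI
]

# Assumption: the array is serialized as JSON before being sent.
ner_parameters_json = json.dumps(ner_parameters)
```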
Content-Type: application/json
Results of a file-based text extraction request. Properties with no entries are not present.
Attribute | Type | Comment |
document | ExtractionResponse | Extraction result |
metadata | ExtractionResponse | Metadata extraction result |
text | String | File text content |
title | String | File title |
Results of a text extraction request. Properties with no entries are not present.
Attribute | Type | Comment |
categories | Array of Category | Categories of the document |
classificationResults | Array of DocumentClassification | Document classification results |
concepts | Array of ThesaurusConcept | Matched concepts |
detectedLanguage | PPLocale | Detected Language of the document |
extractedTerms | Array of ExtractedTerm | Extracted freeTerms |
locations | Array of Location | Matched locations |
namedEntities | Array of NamedEntityResponse | Named Entities |
personNames | Array of String | Deprecated |
regexMatches | Array of RegexMatches | Regex token matches |
sentiments | Array of Sentiment | Matched sentiments |
shadowConcepts | Array of ShadowConceptResponse | Shadow Concepts |
text | String | Text as extracted from url or file |
title | String | Title as extracted from url or file |
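Because properties with no entries are omitted from the response, client code should not assume that any array is present. The sketch below reads a parsed response defensively. The sample dictionary is invented for illustration and only uses attribute names from the tables in this reference. The function name `summarize` is hypothetical.

```python
def summarize(file_response, min_score=0.0):
    """Collect the title, concept URIs with scores, and free terms.

    Missing properties are treated as empty, since the service omits
    properties that have no entries.
    """
    document = file_response.get("document", {})
    concepts = [
        (concept["uri"], concept["score"])
        for concept in document.get("concepts", [])
        if concept.get("score", 0.0) >= min_score
    ]
    concepts.sort(key=lambda pair: pair[1], reverse=True)
    terms = [term["textValue"] for term in document.get("extractedTerms", [])]
    return {
        "title": file_response.get("title"),
        "concepts": concepts,
        "terms": terms,
    }


# Invented sample data; "extractedTerms" is left out on purpose.
sample = {
    "title": "Annual report",
    "document": {
        "concepts": [
            {"uri": "https://example.org/c/2", "score": 12.5},
            {"uri": "https://example.org/c/1", "score": 82.0},
        ],
    },
}

summary = summarize(sample, min_score=50.0)
```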
Categorization result
Attribute | Type | Comment |
categoryConceptResults | Array of ConceptCategory | Categorized concepts |
prefLabel | String | Preferred label |
score | double | Score between 0.0-100.0 |
uri | String | Category URI |
Categorized concept
Attribute | Type | Comment |
prefLabel | String | Preferred label |
score | double | Score from 0.0 to 100.0 |
uri | String | URI |
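A Category carries its own score and a nested list of categorized concepts, each with a score in the same 0.0 to 100.0 range. The sketch below picks the highest-scoring category and lists its concepts in descending score order. The sample data and the function name `best_category` are invented for illustration.

```python
def best_category(categories):
    """Return the top-scoring category with its concepts, or None if empty."""
    if not categories:
        return None
    top = max(categories, key=lambda category: category["score"])
    concepts = sorted(
        top.get("categoryConceptResults", []),
        key=lambda concept: concept["score"],
        reverse=True,
    )
    return {
        "uri": top["uri"],
        "prefLabel": top["prefLabel"],
        "score": top["score"],
        "concepts": [concept["prefLabel"] for concept in concepts],
    }


# Invented sample data using the attribute names from the tables above.
categories = [
    {
        "uri": "https://example.org/cat/finance",
        "prefLabel": "Finance",
        "score": 74.0,
        "categoryConceptResults": [
            {"uri": "https://example.org/c/7", "prefLabel": "Bond", "score": 20.0},
            {"uri": "https://example.org/c/8", "prefLabel": "Equity", "score": 55.0},
        ],
    },
    {"uri": "https://example.org/cat/law", "prefLabel": "Law", "score": 31.0},
]

top = best_category(categories)
```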
A DocumentClassification object.
Attribute | Type | Comment |
predictedLabel | String | Predicted label |
probabilities | Array of Prediction | Probabilities |
uri | String | URI of the classifier |
Concept from a PoolParty thesaurus project.
Attribute | Type | Comment |
altLabels | Map of PPLocale | Alternative labels |
broaderConcepts | Array of String | URIs of all direct broader concepts |
conceptSchemes | Array of ThesaurusConceptScheme | The concept schemes this concept resides in. |
corporaScore | Double | Relevance score - e.g. when extracted from a text. |
customAttributes | Array of CustomAttribute | Custom attributes |
customRelations | Array of CustomRelation | Custom relations |
customSchemeTypes | Array of CustomSchemeType | URIs of the custom types assigned to the concept |
frequencyInDocument | int | Frequency of the concept in the text |
frequencyInDocuments | int | Frequency of the concept in the text |
hiddenLabels | Map of PPLocale | Hidden labels |
id | String | Concept id |
languages | Array of PPLocale | Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept. |
matchingLabels | Array of MatchingLabel | Matching labels |
prefLabels | Map of PPLocale | Preferred label |
project | String | UUID of the containing PoolParty project |
relatedConcepts | Array of String | URIs of all related concepts |
score | double | Normalized relevance score - e.g. when extracted from a text. |
transitiveBroaderConcepts | Array of String | URIs of all transitive broader concepts |
transitiveBroaderTopConcepts | Array of String | URIs of all top concepts that this concept is connected to via a transitive broader-chain. |
uri | String | Uniform resource identifier |
wordForms | Array of String | Lemmatized word forms |
ConceptScheme from a PoolParty thesaurus project - acts as a container for concepts.
Attribute | Type | Comment |
title | String | The localized title of this concept scheme |
uri | String | Uniform resource identifier |
Custom attribute
Attribute | Type | Comment |
literal | Literal | Literal |
property | String | Property |
Custom Relation
Attribute | Type | Comment |
object | String | Object |
property | String | Property |
(PoolParty) concept scheme - acts as a container for concepts
Attribute | Type | Comment |
title | String | The name of this custom scheme type |
uri | String | Uniform resource identifier |
Phrase extracted from a text that does not match any concepts
Attribute | Type | Comment |
corporaScore | Double | Corpora score |
frequencyInDocument | int | Frequency within the document where it was extracted. |
frequencyInDocuments | int | Frequency within the documents where it was extracted. |
score | Double | Relevance score |
textValue | String | The term phrase |
A geographical location extracted from a text.
Attribute | Type | Comment |
countryCode | String | ISO 3166-1 alpha-2 country code |
latitude | float | Latitude |
longitude | float | Longitude |
matchedLabel | String | The location label that was found in the text |
name | String | Common name of the location |
score | Double | Relevance score |
type | LocationType | Location type, either City or Country |
uri | String | Uniform resource identifier of the location. |
Named Entity
Attribute | Type | Comment |
frequency | int | Frequency in document |
metadata | Map of String | Metadata |
method | String | Method |
positions | Array of SimpleTokenPosition | Position |
score | double | Score |
textValue | String | Matched text |
type | String | Type |
Regex match
Attribute | Type | Comment |
regexMatches | Array of String | Tokens from the input text that match the regex pattern |
regexPattern | String | The original pattern used to match |
Sentiment result
Attribute | Type | Comment |
negativeTerms | Array of String | List of negative terms |
positiveTerms | Array of String | List of positive terms |
score | float | Score |
sentiment | String | Sentiment |
Shadow Concept
Attribute | Type | Comment |
altLabels | Map of PPLocale | Alternative labels |
broaderConcepts | Array of String | URIs of all direct broader concepts |
conceptSchemes | Array of ThesaurusConceptScheme | The concept schemes this concept resides in |
corporaScore | Double | Relevance score - e.g. when extracted from a text |
customAttributes | Array of CustomAttribute | Custom attributes |
customRelations | Array of CustomRelation | Custom relations |
customSchemeTypes | Array of CustomSchemeType | URIs of the custom types assigned to the concept |
hiddenLabels | Map of PPLocale | Hidden labels |
id | String | Concept id |
languages | Array of PPLocale | Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept |
prefLabels | Map of PPLocale | Preferred label |
project | String | UUID of the containing PoolParty project |
relatedConcepts | Array of String | URIs of all related concepts |
score | double | Normalized relevance score - e.g. when extracted from a text |
shadowConceptTerms | Array of ShadowTerm | Extracted terms that contribute to calculation of the shadow concept |
transitiveBroaderConcepts | Array of String | URIs of all transitive broader concepts |
transitiveBroaderTopConcepts | Array of String | URIs of all top concepts that this concept is connected to via a transitive broader-chain |
uri | String | Uniform resource identifier |
Phrase extracted from a text that does not match any Concepts
Attribute | Type | Comment |
score | double | Relevance score |
textValue | String | The term phrase |