Web Service Method: Extract Metadata from Zip File - Aggregated

Abstract

Description
[file] Extracts meaningful metadata, such as concepts and terms, from the documents packed inside an uploaded archive file (*.zip) and returns it as a single aggregated document.

URL: /extractor/api/extract/zip/aggregated

Request

Supported Methods
POST

Content-Type

multipart/form-data

HTTP Parameter

Parameter	Type	Required	Description
categorizationWithPpxBoost	boolean	false	Use Extractor boosting, default = false
categorize	boolean	false	Categorization extraction, default = false
conceptSchemeFilters	Array of String	false	Concept scheme URI filters
corpusScoring	Array of String	false	Corpus term scoring. Enabled if corpusIds (UUID) are provided
customAttributeFilters	Array of CustomProperty	false	Custom attribute (property uri and string value) filters
customClassFilters	Array of String	false	Custom class URI filters
disambiguate	boolean	false	Use thesaurus based disambiguation, default = false
displayText	boolean	false	Include text extracted from url in response, default = false
documentClassifierIds	Array of String	false	Enable document classification by giving the document classifier IDs as input
documentId	String	false	Internal ID of the document
file	MultipartFile	true	File to be extracted (word, excel, powerpoint, pdf, open documents) - Mimetype of file must be 'multipart/form-data'
filterNestedConcepts	boolean	false	Remove concepts matches which are contained within other matches, default = false
findPersonNames	boolean	false	Person name extraction, default = false
language	String	false	Extraction language (en|de|es|fr|...)
lemmatization	boolean	false	Use lemmatization, default = true
locationExtraction	boolean	false	Location extraction, default = false
metadata	String	false	Metadata of the document (concatenated fields with delimiter: '.')
numberOfConcepts	Integer	false	Retrieve number of concepts, default = 25
numberOfTerms	Integer	false	Retrieve number of terms, default = 25
projectId	Array of String	false	Thesaurus projectIds
properties	Array of String	false	Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Furthermore it supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to `all` to fetch all properties.
regexFilename	String	false	File name for regex patterns
sentimentAnalysis	boolean	false	Sentiment analysis, default = false
shadowConceptCorpusId	Array of String	false	Shadow concepts calculation. Enabled if corpusIds (UUID) are provided
showMatchingDetails	boolean	false	Shows which concept labels were found inside the text, default = false
showMatchingPosition	boolean	false	Shows the position of the matched text. Only shown if showMatchingDetails = true, default = false
tfidfScoring	boolean	false	Use TFIDF scoring
useRelatedConcepts	boolean	false	Retrieve related concepts, default = false
useTransitiveBroaderConcepts	boolean	false	Retrieve transitive broader concepts, default = false
useTransitiveBroaderTopConcepts	boolean	false	Retrieve transitive broader top concepts, default = false
useTypes	boolean	false	Retrieve custom types for concepts, default = false
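
The request described above can be sketched with the Python standard library. This is a minimal illustration, not an official client: the host name is a placeholder, authentication is not covered on this page and is left out, and sending array parameters such as projectId as repeated form fields is an assumption about how the server parses them.

```python
import uuid
import urllib.request

# Hypothetical host; only the path is taken from this page.
BASE_URL = "https://poolparty.example.com"
ENDPOINT = "/extractor/api/extract/zip/aggregated"


def build_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode (name, value) pairs plus the required 'file' part.

    fields is a list of pairs so that a parameter name can appear more
    than once (assumed encoding for array parameters).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The archive itself goes into the part named 'file'.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/zip\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(fields, file_name, file_bytes):
    """Create a POST request object; nothing is sent until urlopen is called."""
    body, content_type = build_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

To send it, pass the result of build_request to urllib.request.urlopen and decode the JSON body. Boolean parameters are passed here as the strings "true" and "false", for example ("categorize", "true").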

Custom property

Attribute	Type	Comment
property	String	Property
value	String	Value

Response

This method returns execution results in JSON format.

ZipFileExtractionResponse

Results of a file-based text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
aggregatedResponse	FileExtractionResponse	Aggregated result
defaultExtractions	Array of FileExtractionResponse	List of extracted file results
message	String	Additional message
numberOfExtractedDocuments	int	Number of extracted documents

FileExtractionResponse

Results of a file-based text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
document	ExtractionResponse	Extraction result
metadata	ExtractionResponse	Metadata extraction result
text	String	File text content
title	String	File title

ExtractionResponse

Results of a text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
categories	Array of Category	Categories of the document
classificationResults	Array of DocumentClassification	Document classification results
concepts	Array of ThesaurusConcept	Matched concepts
detectedLanguage	String	Detected Language of the document
extractedTerms	Array of ExtractedTerm	Extracted free terms
locations	Array of Location	Matched locations
personNames	Array of String	Person name matches
regexMatches	Array of RegexMatches	Regex token matches
sentiments	Array of Sentiment	Matched sentiments
shadowConcepts	Array of ThesaurusConcept	Shadow Concepts
text	String	Text as extracted from url or file
title	String	Title as extracted from url or file
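
Because properties with no entries are omitted, client code should not assume that any key exists. The following sketch walks from ZipFileExtractionResponse through aggregatedResponse to the document-level ExtractionResponse. The sample payload is invented for illustration; only the attribute names are taken from the tables above.

```python
import json

# Invented sample payload; real responses will contain different values.
SAMPLE_RESPONSE = """
{
  "numberOfExtractedDocuments": 2,
  "aggregatedResponse": {
    "title": "docs.zip",
    "document": {
      "detectedLanguage": "en",
      "concepts": [
        {"prefLabel": "Solar energy", "uri": "http://example.com/c/1", "score": 87.0}
      ],
      "extractedTerms": [
        {"textValue": "roof panel", "score": 42.0, "frequencyInDocument": 3}
      ]
    }
  }
}
"""


def summarize(response):
    """Collect a few aggregated fields, tolerating absent properties."""
    aggregated = response.get("aggregatedResponse", {})
    document = aggregated.get("document", {})
    return {
        "documents": response.get("numberOfExtractedDocuments", 0),
        "language": document.get("detectedLanguage"),
        "concepts": [c.get("prefLabel") for c in document.get("concepts", [])],
        "terms": [t.get("textValue") for t in document.get("extractedTerms", [])],
    }


summary = summarize(json.loads(SAMPLE_RESPONSE))
```

Using dict.get with a default at every level keeps the same function usable for a response in which, for instance, no concepts were matched.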

Category

Categorization result

Attribute	Type	Comment
categoryConceptResults	Array of ConceptCategory	Categorized concepts
prefLabel	String	Preferred label
score	double	Score
uri	String	Uri

ConceptCategory

Categorized concept

Attribute	Type	Comment
prefLabel	String	Preferred label
score	double	Score
uri	String	Uri

DocumentClassification

A DocumentClassification object.

Attribute	Type	Comment
predictedLabel	String	Predicted label
probabilities	Array of Prediction	Probabilities
uri	String	URI of the classifier

ThesaurusConcept

Concept from a PoolParty thesaurus project

Attribute	Type	Comment
altLabels	Array of String	Alternative labels
broaderConcepts	Array of String	URIs of all direct broader concepts
conceptSchemes	Array of ThesaurusConceptScheme	The concept schemes this concept resides in
corporaScore	double	Relevance score - e.g. when extracted from a text
customAttributes	Array of CustomAttribute	Custom attributes
customRelations	Array of CustomRelation	Custom relations
customSchemeTypes	Array of CustomSchemeType	URIs of the custom types assigned to the concept
frequencyInDocument	int	Frequency of the concept in the text
frequencyInDocuments	int	Frequency of the concept in the text
hiddenLabels	Array of String	Hidden labels
id	String	Concept id
language	String	Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept
matchingLabels	Array of MatchingLabel	Matching labels
prefLabel	String	Preferred label
project	String	UUID of the containing PoolParty project
relatedConcepts	Array of String	URIs of all related concepts
score	double	Normalized relevance score - e.g. when extracted from a text
shadowConceptTerms	Array of ExtractedTerm
transitiveBroaderConcepts	Array of String	URIs of all transitive broader concepts
transitiveBroaderTopConcepts	Array of String	URIs of all top concepts that this concept is connected to via a transitive broader-chain
uri	String	Uniform resource identifier
wordForms	Array of String	Lemmatized word forms
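
A common follow-up step is ranking the returned ThesaurusConcept entries by their score attribute. The helper below is a hypothetical sketch: the function name, the threshold, and the sample values are assumptions, and a concept without a score is treated as 0.0.

```python
# Invented sample entries using attribute names from the table above.
SAMPLE_CONCEPTS = [
    {"prefLabel": "Wind power", "uri": "http://example.com/c/2", "score": 40.0},
    {"prefLabel": "Solar energy", "uri": "http://example.com/c/1", "score": 87.0},
    {"prefLabel": "Energy", "uri": "http://example.com/c/3"},  # score omitted
]


def top_concepts(concepts, min_score=0.0, limit=5):
    """Return (prefLabel, score) pairs sorted by descending score."""
    kept = [c for c in concepts if c.get("score", 0.0) >= min_score]
    kept.sort(key=lambda c: c.get("score", 0.0), reverse=True)
    return [(c.get("prefLabel"), c.get("score", 0.0)) for c in kept[:limit]]


ranked = top_concepts(SAMPLE_CONCEPTS, min_score=10.0)
```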

ThesaurusConceptScheme

ConceptScheme from a PoolParty thesaurus project - acts as a container for concepts

Attribute	Type	Comment
title	String	The localized title of this concept scheme
uri	String	Uniform resource identifier

CustomAttribute

Custom attribute

Attribute	Type	Comment
literal	Literal	Literal
property	String	Property

CustomRelation

Custom Relation

Attribute	Type	Comment
object	String	Object
property	String	Property

CustomSchemeType

(PoolParty) concept scheme - acts as a container for concepts

Attribute	Type	Comment
title	String	The name of this custom scheme type
uri	String	Uniform resource identifier

ExtractedTerm

Phrase extracted from a text that does not match any Concepts

Attribute	Type	Comment
corporaScore	double	Corpora score
frequencyInDocument	int	Frequency within the document where it was extracted
frequencyInDocuments	int	Frequency within the documents where it was extracted
score	double	Relevance score
textValue	String	The term phrase

Location

A geographical location extracted from a text

Attribute	Type	Comment
countryCode	String	ISO 3166-1 alpha-2 country code
latitude	float	Latitude
longitude	float	Longitude
matchedLabel	String	The location label that was found in the text
name	String	Common name of the location
score	Double	Relevance score
type	LocationType	Location type, either City or Country
uri	String	Uniform resource identifier of the location
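
As an illustration of consuming Location entries, the sketch below keeps only cities that carry coordinates. How the type value is serialized in JSON is not stated on this page, so the comparison is case-insensitive as a precaution; the sample data and the function name are hypothetical.

```python
# Invented sample entries using attribute names from the table above.
SAMPLE_LOCATIONS = [
    {"name": "Vienna", "type": "City", "countryCode": "AT",
     "latitude": 48.2082, "longitude": 16.3738},
    {"name": "Austria", "type": "Country", "countryCode": "AT"},
]


def city_coordinates(locations):
    """Return (name, latitude, longitude) for every city with coordinates."""
    cities = []
    for loc in locations:
        if str(loc.get("type", "")).lower() != "city":
            continue
        if "latitude" not in loc or "longitude" not in loc:
            continue
        cities.append((loc.get("name"), loc["latitude"], loc["longitude"]))
    return cities


cities = city_coordinates(SAMPLE_LOCATIONS)
```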

RegexMatches

Regex match

Attribute	Type	Comment
regexMatches	Array of String	Tokens from the input text that match the regex pattern
regexPattern	String	The original pattern used to match

Sentiment

Sentiment result

Attribute	Type	Comment
negativeTerms	Array of String	List of negative terms
positiveTerms	Array of String	List of positive terms
score	float	Score
sentiment	String	Sentiment
