Web Service Method: Extract Metadata from Inside Zip File

Description

Extracts meaningful metadata, such as concepts and terms, from the documents packed inside an uploaded archive file (*.zip) and returns the results as a list of documents.

URL: /extractor/api/extract/zip

Request

Supported Methods

POST

Content-Type

multipart/form-data

HTTP Parameters

Parameter

Type

Required

Description

categorizationWithPpxBoost

boolean

false

Use Extractor boosting, default = false

categorize

boolean

false

Categorization extraction, default = false

conceptSchemeFilters

Array of String

false

Concept scheme URI filters

corpusScoring

Array of String

false

Corpus term scoring. Enabled if corpusIds (UUID) are provided

customAttributeFilters

Array of CustomProperty

false

Custom attribute (property uri and string value) filters

customClassFilters

Array of String

false

Custom class URI filters

disambiguate

boolean

false

Use thesaurus based disambiguation, default = false

displayText

boolean

false

Include text extracted from url in response, default = false

documentClassifierIds

Array of String

false

Enable document classification by giving the document classifier IDs as input

documentId

String

false

Internal ID of the document

file

MultipartFile

true

File to be extracted (Word, Excel, PowerPoint, PDF, OpenDocument). The MIME type of the file must be 'multipart/form-data'

filterNestedConcepts

boolean

false

Remove concept matches that are contained within other matches, default = false

findPersonNames

boolean

false

Person name extraction, default = false

language

String

false

Extraction language (en|de|es|fr|...)

lemmatization

boolean

false

Use lemmatization, default = true

locationExtraction

boolean

false

Location extraction, default = false

metadata

String

false

Metadata of the document (concatenated fields with delimiter: '.')

numberOfConcepts

Integer

false

Number of concepts to retrieve, default = 25

numberOfTerms

Integer

false

Number of terms to retrieve, default = 25

projectId

Array of String

false

Thesaurus projectIds

properties

Array of String

false

Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Furthermore it supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type.

Set to all to fetch all properties.

regexFilename

String

false

File name for regex patterns

sentimentAnalysis

boolean

false

Sentiment analysis, default = false

shadowConceptCorpusId

Array of String

false

Shadow concepts calculation. Enabled if corpusIds (UUID) are provided

showMatchingDetails

boolean

false

Shows which concept labels were found inside the text, default = false

showMatchingPosition

boolean

false

Shows the position of the matched text. Only shown if showMatchingDetails = true, default = false

tfidfScoring

boolean

false

Use TFIDF scoring

useRelatedConcepts

boolean

false

Retrieve related concepts, default = false

useTransitiveBroaderConcepts

boolean

false

Retrieve transitive broader concepts, default = false

useTransitiveBroaderTopConcepts

boolean

false

Retrieve transitive broader top concepts, default = false

useTypes

boolean

false

Retrieve custom types for concepts, default = false

Custom property

Attribute

Type

Comment

property

String

Property

value

String

Value
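
The request described above can be sketched with the Python standard library alone. This is a minimal illustration, not an official client: the base URL, the project ID value, the `application/zip` type on the file part, and the string encoding of boolean and integer parameters are all assumptions, and authentication is left out because this page does not describe it. The helper names (`make_zip`, `build_multipart`, `build_request`) are hypothetical.

```python
import io
import urllib.request
import uuid
import zipfile


def make_zip(files):
    """Pack a {file_name: bytes} mapping into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def build_multipart(fields, file_name, file_bytes):
    """Encode (name, value) pairs plus one `file` part as multipart/form-data.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/zip\r\n\r\n"  # assumed type for the file part
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(base_url, zip_bytes, project_id, language="en"):
    """Prepare (but do not send) a POST to /extractor/api/extract/zip."""
    fields = [
        ("projectId", project_id),
        ("language", language),
        ("numberOfConcepts", 10),
        ("showMatchingDetails", "true"),  # assumed boolean encoding
    ]
    body, content_type = build_multipart(fields, "documents.zip", zip_bytes)
    return urllib.request.Request(
        base_url.rstrip("/") + "/extractor/api/extract/zip",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

A prepared request could then be sent with `urllib.request.urlopen(request)` once the credentials your server requires have been added to it.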

MultipartFileResponse

This method returns execution results in JSON format.

ZipFileExtractionResponse

Results of a file based text extraction request. Properties with no entries are not present

Attribute

Type

Comment

aggregatedResponse

FileExtractionResponse

Aggregated result

defaultExtractions

Array of FileExtractionResponse

List of extracted file results

message

String

Additional message

numberOfExtractedDocuments

int

Number of extracted documents
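
Because properties with no entries are omitted from the JSON, client code should not index keys directly. The sketch below reads a ZipFileExtractionResponse that has already been parsed into a Python dict and falls back to defaults for missing attributes. The function name and the sample payload are hypothetical and only illustrate the attribute names from the table above.

```python
def summarize_zip_response(payload):
    """Return (document count, message, titles) from a parsed response dict."""
    count = payload.get("numberOfExtractedDocuments", 0)
    message = payload.get("message", "")
    # Each entry of defaultExtractions is a FileExtractionResponse.
    titles = [item.get("title", "") for item in payload.get("defaultExtractions", [])]
    return count, message, titles


# Hypothetical payload: no "message", and the second file has no "title".
sample = {
    "numberOfExtractedDocuments": 2,
    "defaultExtractions": [
        {"title": "Annual report", "document": {"concepts": []}},
        {"text": "Plain text without a title"},
    ],
}
print(summarize_zip_response(sample))
```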

FileExtractionResponse

Results of a file based text extraction request. Properties with no entries are not present

Attribute

Type

Comment

document

ExtractionResponse

Extraction result

metadata

ExtractionResponse

Metadata extraction result

text

String

File text content

title

String

File title

ExtractionResponse

Results of a text extraction request. Properties with no entries are not present

Attribute

Type

Comment

categories

Array of Category

Categories of the document

classificationResults

Array of DocumentClassification

Document classification results

concepts

Array of ThesaurusConcept

Matched concepts

detectedLanguage

String

Detected Language of the document

extractedTerms

Array of ExtractedTerm

Extracted free terms

locations

Array of Location

Matched locations

personNames

Array of String

Person name matches

regexMatches

Array of RegexMatches

Regex token matches

sentiments

Array of Sentiment

Matched sentiments

shadowConcepts

Array of ThesaurusConcept

Shadow Concepts

text

String

Text as extracted from url or file

title

String

Title as extracted from url or file

Category

Categorization result

Attribute

Type

Comment

categoryConceptResults

Array of ConceptCategory

Categorized concepts

prefLabel

String

Preferred label

score

double

Score

uri

String

URI

ConceptCategory

Categorized concept

Attribute

Type

Comment

prefLabel

String

Preferred label

score

double

Score

uri

String

URI

DocumentClassification

A DocumentClassification object.

Attribute

Type

Comment

predictedLabel

String

Predicted label

probabilities

Array of Prediction

Probabilities

uri

String

URI of the classifier

ThesaurusConcept

Concept from a PoolParty thesaurus project

Attribute

Type

Comment

altLabels

Array of String

Alternative labels

broaderConcepts

Array of String

URIs of all direct broader concepts

conceptSchemes

Array of ThesaurusConceptScheme

The concept schemes in which this concept resides

corporaScore

double

Relevance score - e.g. when extracted from a text

customAttributes

Array of CustomAttribute

Custom attributes

customRelations

Array of CustomRelation

Custom relations

customSchemeTypes

Array of CustomSchemeType

URIs of the custom types assigned to the concept

frequencyInDocument

int

Frequency of the concept in the text

frequencyInDocuments

int

Frequency of the concept across the documents

hiddenLabels

Array of String

Hidden labels

id

String

Concept id

language

String

Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept

matchingLabels

Array of MatchingLabel

Matching labels

prefLabel

String

Preferred label

project

String

UUID of the containing PoolParty project

relatedConcepts

Array of String

URIs of all related concepts

score

double

Normalized relevance score - e.g. when extracted from a text

shadowConceptTerms

Array of ExtractedTerm

transitiveBroaderConcepts

Array of String

URIs of all transitive broader concepts

transitiveBroaderTopConcepts

Array of String

URIs of all top concepts that this concept is connected to via a transitive broader-chain

uri

String

Uniform resource identifier

wordForms

Array of String

Lemmatized word forms
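
As an illustration of how the ThesaurusConcept attributes might be consumed, the following sketch ranks the concepts of one FileExtractionResponse by their `score`. It assumes the response has been parsed into plain dicts; the function name and the sample data are hypothetical, and concepts without a `score` are treated as 0.0 for sorting purposes.

```python
def top_concepts(file_result, limit=5):
    """Return up to `limit` (prefLabel, uri, score) tuples, highest score first."""
    document = file_result.get("document", {})
    concepts = document.get("concepts", [])
    ranked = sorted(concepts, key=lambda c: c.get("score", 0.0), reverse=True)
    return [
        (c.get("prefLabel", ""), c.get("uri", ""), c.get("score", 0.0))
        for c in ranked[:limit]
    ]


# Hypothetical FileExtractionResponse with three matched concepts.
file_result = {
    "title": "Annual report",
    "document": {
        "concepts": [
            {"prefLabel": "Revenue", "uri": "http://example.org/c/1", "score": 0.4},
            {"prefLabel": "Taxonomy", "uri": "http://example.org/c/2", "score": 0.9},
            {"prefLabel": "Audit", "uri": "http://example.org/c/3"},
        ]
    },
}
print(top_concepts(file_result, limit=2))
```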

ThesaurusConceptScheme

ConceptScheme from a PoolParty thesaurus project - acts as a container for concepts

Attribute

Type

Comment

title

String

The localized title of this concept scheme

uri

String

Uniform resource identifier

CustomAttribute

Custom attribute

Attribute

Type

Comment

literal

Literal

Literal

property

String

Property

CustomRelation

Custom Relation

Attribute

Type

Comment

object

String

Object

property

String

Property

CustomSchemeType

(PoolParty) concept scheme - acts as a container for concepts

Attribute

Type

Comment

title

String

The name of this custom scheme type

uri

String

Uniform resource identifier

ExtractedTerm

Phrase extracted from a text that does not match any Concepts

Attribute

Type

Comment

corporaScore

double

Corpora score

frequencyInDocument

int

Frequency within the document where it was extracted

frequencyInDocuments

int

Frequency within the documents where it was extracted

score

double

Relevance score

textValue

String

The term phrase

Location

A geographical location extracted from a text

Attribute

Type

Comment

countryCode

String

ISO 3166-1 alpha-2 country code

latitude

float

Latitude

longitude

float

Longitude

matchedLabel

String

The location label that was found in the text

name

String

Common name of the location

score

Double

Relevance score

type

LocationType

Location type: either City or Country

uri

String

Uniform resource identifier of the location
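
A small sketch of one way to use the Location attributes: grouping the matched location names of an ExtractionResponse by `countryCode`. The function name, the `"unknown"` fallback key, and the sample data are hypothetical; the input is assumed to be the ExtractionResponse parsed into a dict.

```python
from collections import defaultdict


def locations_by_country(extraction):
    """Map each countryCode to the list of location names found for it."""
    grouped = defaultdict(list)
    for location in extraction.get("locations", []):
        code = location.get("countryCode", "unknown")
        grouped[code].append(location.get("name", ""))
    return dict(grouped)


# Hypothetical ExtractionResponse fragment.
extraction = {
    "locations": [
        {"name": "Vienna", "countryCode": "AT", "matchedLabel": "Wien"},
        {"name": "Graz", "countryCode": "AT"},
        {"name": "Berlin", "countryCode": "DE"},
    ]
}
print(locations_by_country(extraction))
```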

RegexMatches

Regex match

Attribute

Type

Comment

regexMatches

Array of String

Tokens from the input text that match the regex pattern

regexPattern

String

The original pattern used to match
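
The `regexMatches` array of an ExtractionResponse holds one RegexMatches object per pattern, each with its own inner `regexMatches` token list. A minimal sketch, assuming the response is already parsed into dicts, that flattens this into a pattern-to-tokens mapping (the function name and sample data are hypothetical):

```python
def regex_tokens(extraction):
    """Map each regexPattern to all tokens that matched it."""
    result = {}
    for entry in extraction.get("regexMatches", []):
        pattern = entry.get("regexPattern", "")
        # The inner key has the same name as the outer array.
        result.setdefault(pattern, []).extend(entry.get("regexMatches", []))
    return result


# Hypothetical ExtractionResponse fragment with one pattern.
extraction_with_regex = {
    "regexMatches": [
        {"regexPattern": r"\d{4}", "regexMatches": ["2019", "2020"]},
    ]
}
print(regex_tokens(extraction_with_regex))
```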

Sentiment

Sentiment result

Attribute

Type

Comment

negativeTerms

Array of String

List of negative terms

positiveTerms

Array of String

List of positive terms

score

float

Score

sentiment

String

Sentiment