Zip Extraction Service

Abstract

The web service methods in this section let you manage .zip files of project documents: you can upload and download them and define the exact content of the respective .zip file.

Details can be found on these pages:

Web Service Method: Extract Metadata from Inside Zip File

Description
[file] Extracts and returns a list of documents with meaningful metadata, such as concepts and terms, from the documents packed inside an uploaded archive file (*.zip).

URL: /extractor/api/extract/zip

Request

Supported Methods
POST

Content-Type

multipart/form-data

HTTP Parameters

Parameter	Type	Required	Description
categorizationWithPpxBoost	boolean	false	Use Extractor boosting, default = false
categorize	boolean	false	Categorization extraction, default = false
conceptSchemeFilters	Array of String	false	Concept scheme URI filters
corpusScoring	Array of String	false	Corpus term scoring. Enabled if corpusIds (UUID) are provided
customAttributeFilters	Array of CustomProperty	false	Custom attribute (property uri and string value) filters
customClassFilters	Array of String	false	Custom class URI filters
disambiguate	boolean	false	Use thesaurus based disambiguation, default = false
displayText	boolean	false	Include text extracted from url in response, default = false
documentClassifierIds	Array of String	false	Enable document classification by giving the document classifier IDs as input
documentId	String	false	Internal ID of the document
file	MultipartFile	true	Zip file to be extracted (may contain Word, Excel, PowerPoint, PDF, or OpenDocument files). The request must be sent as 'multipart/form-data'
filterNestedConcepts	boolean	false	Remove concept matches that are contained within other matches, default = false
findPersonNames	boolean	false	Person name extraction, default = false
language	String	false	Extraction language (en|de|es|fr|...)
lemmatization	boolean	false	Use lemmatization, default = true
locationExtraction	boolean	false	Location extraction, default = false
metadata	String	false	Metadata of the document (concatenated fields with delimiter: '.')
numberOfConcepts	Integer	false	Retrieve number of concepts, default = 25
numberOfTerms	Integer	false	Retrieve number of terms, default = 25
projectId	Array of String	false	Thesaurus projectIds
properties	Array of String	false	Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Furthermore it supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to `all` to fetch all properties.
regexFilename	String	false	File name for regex patterns
sentimentAnalysis	boolean	false	Sentiment analysis, default = false
shadowConceptCorpusId	Array of String	false	Shadow concepts calculation. Enabled if corpusIds (UUID) are provided
showMatchingDetails	boolean	false	Shows which concept labels were found inside the text, default = false
showMatchingPosition	boolean	false	Shows the position of the matched text. Only shown if showMatchingDetails = true, default = false
tfidfScoring	boolean	false	Use TFIDF scoring
useRelatedConcepts	boolean	false	Retrieve related concepts, default = false
useTransitiveBroaderConcepts	boolean	false	Retrieve transitive broader concepts, default = false
useTransitiveBroaderTopConcepts	boolean	false	Retrieve transitive broader top concepts, default = false
useTypes	boolean	false	Retrieve custom types for concepts, default = false

Custom property

Attribute	Type	Comment
property	String	Property
value	String	Value
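
As a rough illustration of how the parameters above might be passed, the following Python sketch encodes them into a query string for the endpoint. It assumes that the parameters are sent as URL query parameters and that array parameters are passed by repeating the parameter name; neither encoding is confirmed on this page. The host name and the helper `build_extract_url` are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical server address; replace with your own PoolParty host.
BASE_URL = "https://poolparty.example.com"


def build_extract_url(base_url, path="/extractor/api/extract/zip", **params):
    """Return the endpoint URL with the given parameters as a query string.

    Booleans become "true"/"false", lists are repeated per value (assumed
    encoding for "Array of String" parameters), and None values are skipped.
    """
    query = {}
    for name, value in params.items():
        if value is None:
            continue
        if isinstance(value, bool):
            query[name] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            query[name] = [str(item) for item in value]
        else:
            query[name] = str(value)
    return f"{base_url}{path}?{urlencode(query, doseq=True)}"


url = build_extract_url(
    BASE_URL,
    language="en",
    numberOfConcepts=3,
    projectId=["1DCE0ED2-D7E8-0001-86A1-18652DF0D7A0"],
    showMatchingDetails=True,
)
print(url)
```

Keeping the encoding in one helper makes it easy to adjust if your server expects a different array format.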

Response

This method returns execution results in JSON format.

ZipFileExtractionResponse

Results of a file-based text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
aggregatedResponse	FileExtractionResponse	Aggregated result
defaultExtractions	Array of FileExtractionResponse	List of extracted file results
message	String	Additional message
numberOfExtractedDocuments	int	Number of extracted documents

FileExtractionResponse

Results of a file-based text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
document	ExtractionResponse	Extraction result
metadata	ExtractionResponse	Metadata extraction result
text	String	File text content
title	String	File title

ExtractionResponse

Results of a text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
categories	Array of Category	Categories of the document
classificationResults	Array of DocumentClassification	Document classification results
concepts	Array of ThesaurusConcept	Matched concepts
detectedLanguage	String	Detected Language of the document
extractedTerms	Array of ExtractedTerm	Extracted freeTerms
locations	Array of Location	Matched locations
personNames	Array of String	Person name matches
regexMatches	Array of RegexMatches	Regex token matches
sentiments	Array of Sentiment	Matched sentiments
shadowConcepts	Array of ThesaurusConcept	Shadow Concepts
text	String	Text as extracted from url or file
title	String	Title as extracted from url or file

Category

Attribute	Type	Comment
categoryConceptResults	Array of ConceptCategory	Categorized concepts
prefLabel	String	Preferred label
score	double	Score
uri	String	Uri

ConceptCategory

Attribute	Type	Comment
prefLabel	String	Preferred label
score	double	Score
uri	String	Uri

DocumentClassification

Attribute	Type	Comment
predictedLabel	String	predictedLabel
probabilities	Array of Prediction	Probabilities
uri	String	URI of the classifier

ThesaurusConcept

Attribute	Type	Comment
altLabels	Array of String	Alternative labels
broaderConcepts	Array of String	URIs of all direct broader concepts
conceptSchemes	Array of ThesaurusConceptScheme	The concept schemes this concept resides in
corporaScore	double	Relevance score - e.g. when extracted from a text
customAttributes	Array of CustomAttribute	Custom attributes
customRelations	Array of CustomRelation	Custom relations
customSchemeTypes	Array of CustomSchemeType	URIs of the custom types assigned to the concept
frequencyInDocument	int	Frequency of the concept in the text
frequencyInDocuments	int	Frequency of the concept in the text
hiddenLabels	Array of String	Hidden labels
id	String	Concept id
language	String	Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept
matchingLabels	Array of MatchingLabel	Matching labels
prefLabel	String	Preferred label
project	String	UUID of the containing PoolParty project
relatedConcepts	Array of String	URIs of all related concepts
score	double	Normalized relevance score - e.g. when extracted from a text
shadowConceptTerms	Array of ExtractedTerm
transitiveBroaderConcepts	Array of String	URIs of all transitive broader concepts
transitiveBroaderTopConcepts	Array of String	URIs of all top concepts that this concept is connected to via a transitive broader-chain
uri	String	Uniform resource identifier
wordForms	Array of String	Lemmatized word forms

ThesaurusConceptScheme

Attribute	Type	Comment
title	String	The localized title of this concept scheme
uri	String	Uniform resource identifier

CustomAttribute

Attribute	Type	Comment
literal	Literal	Literal
property	String	Property

CustomRelation

Attribute	Type	Comment
object	String	Object
property	String	Property

CustomSchemeType

Attribute	Type	Comment
title	String	The name of this custom scheme type
uri	String	Uniform resource identifier

ExtractedTerm

Attribute	Type	Comment
corporaScore	double	Corpora score
frequencyInDocument	int	Frequency within the document where it was extracted
frequencyInDocuments	int	Frequency within the documents where it was extracted
score	double	Relevance score
textValue	String	The term phrase

Location

Attribute	Type	Comment
countryCode	String	ISO 3166-1 alpha-2 country code
latitude	float	Latitude
longitude	float	Longitude
matchedLabel	String	The location label that was found in the text
name	String	Common name of the location
score	Double	Relevance score
type	LocationType	Location type, either city or country (City | Country)
uri	String	Uniform resource identifier of the location

RegexMatches

Attribute	Type	Comment
regexMatches	Array of String	Tokens from the input text that match the regex pattern
regexPattern	String	The original pattern used to match

Sentiment

Attribute	Type	Comment
negativeTerms	Array of String	List of negative terms
positiveTerms	Array of String	List of positive terms
score	float	Score
sentiment	String	Sentiment
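
To show how the nested response types fit together, the Python sketch below walks a ZipFileExtractionResponse and collects the preferred labels of the matched concepts per file. The sample payload is invented for illustration; only the attribute names are taken from the tables above. Because properties with no entries are not present, every lookup falls back to an empty default.

```python
import json

# Invented sample payload; only the attribute names come from the tables above.
SAMPLE_RESPONSE = """
{
  "numberOfExtractedDocuments": 2,
  "defaultExtractions": [
    {
      "title": "mojito.pdf",
      "document": {
        "concepts": [
          {"prefLabel": "Rum", "score": 100.0, "uri": "http://example.com/rum"},
          {"prefLabel": "Mint", "score": 60.0, "uri": "http://example.com/mint"}
        ]
      }
    },
    {"title": "empty.pdf", "document": {}}
  ]
}
"""


def concepts_per_document(response):
    """Map each file title to the preferred labels of its matched concepts."""
    labels_by_title = {}
    for extraction in response.get("defaultExtractions", []):
        document = extraction.get("document", {})
        # "concepts" is absent when nothing matched, so default to an empty list.
        labels = [c.get("prefLabel") for c in document.get("concepts", [])]
        labels_by_title[extraction.get("title", "")] = labels
    return labels_by_title


result = concepts_per_document(json.loads(SAMPLE_RESPONSE))
print(result)
```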

Web Service Method: Extract Metadata from Inside Zip File Asynchronously

Description
[file] Asynchronously extracts and returns a list of documents with meaningful metadata, such as concepts and terms, from the documents packed inside an uploaded archive file (*.zip).

URL: /extractor/api/extract/zip/async

Request

Supported Methods
POST

Content-Type

multipart/form-data

HTTP Parameter

Parameter	Type	Required	Description
categorizationWithPpxBoost	boolean	false	Use Extractor boosting, default = false
categorize	boolean	false	Categorization extraction, default = false
conceptSchemeFilters	Array of String	false	Concept scheme URI filters
corpusScoring	Array of String	false	Corpus term scoring. Enabled if corpusIds (UUID) are provided
customAttributeFilters	Array of CustomProperty	false	Custom attribute (property uri and string value) filters
customClassFilters	Array of String	false	Custom class URI filters
disambiguate	boolean	false	Use thesaurus based disambiguation, default = false
displayText	boolean	false	Include text extracted from url in response, default = false
documentClassifierIds	Array of String	false	Enable document classification by giving the document classifier IDs as input
documentId	String	false	Internal ID of the document
file	MultipartFile	true	Zip file to be extracted (may contain Word, Excel, PowerPoint, PDF, or OpenDocument files). The request must be sent as 'multipart/form-data'
filterNestedConcepts	boolean	false	Remove concept matches that are contained within other matches, default = false
findPersonNames	boolean	false	Person name extraction, default = false
language	String	false	Extraction language (en|de|es|fr|...)
lemmatization	boolean	false	Use lemmatization, default = true
locationExtraction	boolean	false	Location extraction, default = false
metadata	String	false	Metadata of the document (concatenated fields with delimiter: '.')
numberOfConcepts	Integer	false	Retrieve number of concepts, default = 25
numberOfTerms	Integer	false	Retrieve number of terms, default = 25
projectId	Array of String	false	Thesaurus projectIds
properties	Array of String	false	Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Furthermore it supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to `all` to fetch all properties.
regexFilename	String	false	File name for regex patterns
sentimentAnalysis	boolean	false	Sentiment analysis, default = false
shadowConceptCorpusId	Array of String	false	Shadow concepts calculation. Enabled if corpusIds (UUID) are provided
showMatchingDetails	boolean	false	Shows which concept labels were found inside the text, default = false
showMatchingPosition	boolean	false	Shows the position of the matched text. Only shown if showMatchingDetails = true, default = false
tfidfScoring	boolean	false	Use TFIDF scoring
useRelatedConcepts	boolean	false	Retrieve related concepts, default = false
useTransitiveBroaderConcepts	boolean	false	Retrieve transitive broader concepts, default = false
useTransitiveBroaderTopConcepts	boolean	false	Retrieve transitive broader top concepts, default = false
useTypes	boolean	false	Retrieve custom types for concepts, default = false

Response

This method returns execution results in JSON format.

TaskSubmitResponse

Common base response defining the minimum result structure and semantics.

Attribute	Type	Comment
message	String	short descriptive message of the operation result, or an error description
result	Object	the actual response content body, defined by the resultType.
resultType	String	MIME type of the result if successful, or Exception type if an error occurred
status	int	HTTP status code of the requested operation
success	boolean	true if the operation was successful (i.e. returning a status of 2xx)
taskId	String
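
A client typically only needs the taskId from this response. The Python sketch below reads it from a TaskSubmitResponse body and raises an error when the success flag is false. The example payloads and the helper name `task_id_from_submit_response` are invented for illustration; only the attribute names come from the table above.

```python
import json


def task_id_from_submit_response(body):
    """Return the taskId of a TaskSubmitResponse, or raise if submission failed."""
    data = json.loads(body)
    if not data.get("success", False):
        raise RuntimeError(
            f"Submission failed (status {data.get('status')}): {data.get('message')}"
        )
    return data["taskId"]


# Invented example payloads; the field values are placeholders.
ok_body = '{"success": true, "status": 200, "message": "accepted", "taskId": "task-123"}'
failed_body = '{"success": false, "status": 500, "message": "boom"}'

print(task_id_from_submit_response(ok_body))
```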

Asynchronous Concept Extraction From a Zip Container

These services allow zip files to be processed asynchronously, so a client can collect the extraction response independently of the request that started the processing.

The client submits the file for processing and receives a taskId from the system, which identifies the processing task in later calls.

Once received, the zip file is inserted into a processing pipeline.

Finally, the extraction result can be collected using the taskId as soon as processing has finished.

Inserting Zip File Into Extraction Pipeline

As with Concept Extraction From a Zip Container, there are two basic processing options for zip containers:

1) extraction results for each document individually

Mimetype of request must be 'multipart/form-data'

POST /extractor/api/extract/zip/async + zip file

2) extraction results aggregated for the whole zip container

Mimetype of request must be 'multipart/form-data'

POST /extractor/api/extract/zip/aggregated/async + zip file

Checking the Processing Status of a Task

GET /extractor/api/extract/zip/taskstatus?taskId

Returns the current status of the specified task in the extraction pipeline.

Retrieve Extraction Results

After the zip file has passed through the processing pipeline, the result can be retrieved by providing the taskId. Depending on the processing option chosen when the pipeline was started, use one of these services:

Returns the result of an asynchronously called extraction, identified by the task ID

GET /extractor/api/extract/zip/task?taskId

Returns the result of an asynchronously called aggregated extraction, identified by the task ID

GET /extractor/api/extract/zip/task/aggregated?taskId
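
The pairing of submit and retrieval endpoints can be kept in a small lookup table so that a client always fetches the result from the service matching the option it started with. The Python sketch below builds the status and result URLs; it assumes the task ID is passed as `taskId=<value>` in the query string, and the mode names `individual` and `aggregated` as well as the host are made up for this example.

```python
from urllib.parse import urlencode

# Paths taken from this page, grouped by processing option.
ENDPOINTS = {
    "individual": {
        "submit": "/extractor/api/extract/zip/async",
        "result": "/extractor/api/extract/zip/task",
    },
    "aggregated": {
        "submit": "/extractor/api/extract/zip/aggregated/async",
        "result": "/extractor/api/extract/zip/task/aggregated",
    },
}
STATUS_PATH = "/extractor/api/extract/zip/taskstatus"


def status_url(base_url, task_id):
    """URL for checking the processing status of a task."""
    return f"{base_url}{STATUS_PATH}?{urlencode({'taskId': task_id})}"


def result_url(base_url, mode, task_id):
    """URL for retrieving the result that matches the chosen processing option."""
    path = ENDPOINTS[mode]["result"]
    return f"{base_url}{path}?{urlencode({'taskId': task_id})}"


# Hypothetical host and task ID.
print(status_url("https://poolparty.example.com", "task-123"))
print(result_url("https://poolparty.example.com", "aggregated", "task-123"))
```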

Example showing usage of the async Extraction service

Step 1: insert zip file into pipeline & receive taskId as immediate response

Step 2: check status of specified task

Step 3: retrieve extraction result of specific task after completion of the process
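
The three steps can be sketched as a simple polling loop in Python. The HTTP calls are replaced here by in-memory stand-ins so the example runs without a server. The format of the TaskStatus object is not described on this page, so the caller supplies an `is_finished` check; the status strings `RUNNING` and `FINISHED` are invented for the demonstration.

```python
import time


def wait_for_result(task_id, fetch_status, fetch_result, is_finished,
                    poll_seconds=2.0, max_polls=30, sleep=time.sleep):
    """Poll the task status until is_finished(status) is true, then fetch the result."""
    for _ in range(max_polls):
        status = fetch_status(task_id)        # Step 2: check the task status
        if is_finished(status):
            return fetch_result(task_id)      # Step 3: retrieve the result
        sleep(poll_seconds)
    raise TimeoutError(f"Task {task_id} did not finish after {max_polls} polls")


# In-memory stand-ins for the HTTP calls, so the sketch runs without a server.
# The status strings are invented; the real TaskStatus format is not shown here.
_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
task_id = "task-123"  # Step 1 would return this from the async submit call

result = wait_for_result(
    task_id,
    fetch_status=lambda tid: next(_statuses),
    fetch_result=lambda tid: {"numberOfExtractedDocuments": 3},
    is_finished=lambda status: status == "FINISHED",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result)
```

Bounding the number of polls keeps a client from waiting forever if a task never completes.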

Web Service Method: Extract Metadata from Zip File - Aggregated

Description
[file] Extracts and returns a single aggregated document with meaningful metadata, such as concepts and terms, from the documents packed inside an uploaded archive file (*.zip).

URL: /extractor/api/extract/zip/aggregated

Request

Supported Methods
POST

Content-Type

multipart/form-data

HTTP Parameter

Parameter	Type	Required	Description
categorizationWithPpxBoost	boolean	false	Use Extractor boosting, default = false
categorize	boolean	false	Categorization extraction, default = false
conceptSchemeFilters	Array of String	false	Concept scheme URI filters
corpusScoring	Array of String	false	Corpus term scoring. Enabled if corpusIds (UUID) are provided
customAttributeFilters	Array of CustomProperty	false	Custom attribute (property uri and string value) filters
customClassFilters	Array of String	false	Custom class URI filters
disambiguate	boolean	false	Use thesaurus based disambiguation, default = false
displayText	boolean	false	Include text extracted from url in response, default = false
documentClassifierIds	Array of String	false	Enable document classification by giving the document classifier IDs as input
documentId	String	false	Internal ID of the document
file	MultipartFile	true	Zip file to be extracted (may contain Word, Excel, PowerPoint, PDF, or OpenDocument files). The request must be sent as 'multipart/form-data'
filterNestedConcepts	boolean	false	Remove concept matches that are contained within other matches, default = false
findPersonNames	boolean	false	Person name extraction, default = false
language	String	false	Extraction language (en|de|es|fr|...)
lemmatization	boolean	false	Use lemmatization, default = true
locationExtraction	boolean	false	Location extraction, default = false
metadata	String	false	Metadata of the document (concatenated fields with delimiter: '.')
numberOfConcepts	Integer	false	Retrieve number of concepts, default = 25
numberOfTerms	Integer	false	Retrieve number of terms, default = 25
projectId	Array of String	false	Thesaurus projectIds
properties	Array of String	false	Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Furthermore it supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to `all` to fetch all properties.
regexFilename	String	false	File name for regex patterns
sentimentAnalysis	boolean	false	Sentiment analysis, default = false
shadowConceptCorpusId	Array of String	false	Shadow concepts calculation. Enabled if corpusIds (UUID) are provided
showMatchingDetails	boolean	false	Shows which concept labels were found inside the text, default = false
showMatchingPosition	boolean	false	Shows the position of the matched text. Only shown if showMatchingDetails = true, default = false
tfidfScoring	boolean	false	Use TFIDF scoring
useRelatedConcepts	boolean	false	Retrieve related concepts, default = false
useTransitiveBroaderConcepts	boolean	false	Retrieve transitive broader concepts, default = false
useTransitiveBroaderTopConcepts	boolean	false	Retrieve transitive broader top concepts, default = false
useTypes	boolean	false	Retrieve custom types for concepts, default = false

Custom property

Attribute	Type	Comment
property	String	Property
value	String	Value

Response

This method returns execution results in JSON format.

ZipFileExtractionResponse

Results of a file-based text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
aggregatedResponse	FileExtractionResponse	Aggregated result
defaultExtractions	Array of FileExtractionResponse	List of extracted file results
message	String	Additional message
numberOfExtractedDocuments	int	Number of extracted documents

FileExtractionResponse

Results of a file-based text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
document	ExtractionResponse	Extraction result
metadata	ExtractionResponse	Metadata extraction result
text	String	File text content
title	String	File title

ExtractionResponse

Results of a text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
categories	Array of Category	Categories of the document
classificationResults	Array of DocumentClassification	Document classification results
concepts	Array of ThesaurusConcept	Matched concepts
detectedLanguage	String	Detected Language of the document
extractedTerms	Array of ExtractedTerm	Extracted freeTerms
locations	Array of Location	Matched locations
personNames	Array of String	Person name matches
regexMatches	Array of RegexMatches	Regex token matches
sentiments	Array of Sentiment	Matched sentiments
shadowConcepts	Array of ThesaurusConcept	Shadow Concepts
text	String	Text as extracted from url or file
title	String	Title as extracted from url or file

Category

Attribute	Type	Comment
categoryConceptResults	Array of ConceptCategory	Categorized concepts
prefLabel	String	Preferred label
score	double	Score
uri	String	Uri

ConceptCategory

Attribute	Type	Comment
prefLabel	String	Preferred label
score	double	Score
uri	String	Uri

DocumentClassification

Attribute	Type	Comment
predictedLabel	String	predictedLabel
probabilities	Array of Prediction	Probabilities
uri	String	URI of the classifier

ThesaurusConcept

Attribute	Type	Comment
altLabels	Array of String	Alternative labels
broaderConcepts	Array of String	URIs of all direct broader concepts
conceptSchemes	Array of ThesaurusConceptScheme	The concept schemes this concept resides in
corporaScore	double	Relevance score - e.g. when extracted from a text
customAttributes	Array of CustomAttribute	Custom attributes
customRelations	Array of CustomRelation	Custom relations
customSchemeTypes	Array of CustomSchemeType	URIs of the custom types assigned to the concept
frequencyInDocument	int	Frequency of the concept in the text
frequencyInDocuments	int	Frequency of the concept in the text
hiddenLabels	Array of String	Hidden labels
id	String	Concept id
language	String	Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept
matchingLabels	Array of MatchingLabel	Matching labels
prefLabel	String	Preferred label
project	String	UUID of the containing PoolParty project
relatedConcepts	Array of String	URIs of all related concepts
score	double	Normalized relevance score - e.g. when extracted from a text
shadowConceptTerms	Array of ExtractedTerm
transitiveBroaderConcepts	Array of String	URIs of all transitive broader concepts
transitiveBroaderTopConcepts	Array of String	URIs of all top concepts that this concept is connected to via a transitive broader-chain
uri	String	Uniform resource identifier
wordForms	Array of String	Lemmatized word forms

ThesaurusConceptScheme

Attribute	Type	Comment
title	String	The localized title of this concept scheme
uri	String	Uniform resource identifier

CustomAttribute

Attribute	Type	Comment
literal	Literal	Literal
property	String	Property

CustomRelation

Attribute	Type	Comment
object	String	Object
property	String	Property

CustomSchemeType

Attribute	Type	Comment
title	String	The name of this custom scheme type
uri	String	Uniform resource identifier

ExtractedTerm

Attribute	Type	Comment
corporaScore	double	Corpora score
frequencyInDocument	int	Frequency within the document where it was extracted
frequencyInDocuments	int	Frequency within the documents where it was extracted
score	double	Relevance score
textValue	String	The term phrase

Location

Attribute	Type	Comment
countryCode	String	ISO 3166-1 alpha-2 country code
latitude	float	Latitude
longitude	float	Longitude
matchedLabel	String	The location label that was found in the text
name	String	Common name of the location
score	Double	Relevance score
type	LocationType	Location type, either city or country (City | Country)
uri	String	Uniform resource identifier of the location

RegexMatches

Attribute	Type	Comment
regexMatches	Array of String	Tokens from the input text that match the regex pattern
regexPattern	String	The original pattern used to match

Sentiment

Attribute	Type	Comment
negativeTerms	Array of String	List of negative terms
positiveTerms	Array of String	List of positive terms
score	float	Score
sentiment	String	Sentiment
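
For the aggregated variant, the interesting data sits under aggregatedResponse. The Python sketch below picks the highest-scoring concepts from it; sorting is done on the client side, since the order in which the service returns concepts is not stated here. The sample data and the helper name `top_aggregated_concepts` are invented for illustration.

```python
def top_aggregated_concepts(response, limit=3):
    """Return (prefLabel, score) pairs of the highest-scoring aggregated concepts."""
    document = response.get("aggregatedResponse", {}).get("document", {})
    concepts = sorted(
        document.get("concepts", []),
        key=lambda concept: concept.get("score", 0.0),
        reverse=True,
    )
    return [(c.get("prefLabel"), c.get("score")) for c in concepts[:limit]]


# Invented sample data; only the attribute names come from the tables above.
sample = {
    "numberOfExtractedDocuments": 3,
    "aggregatedResponse": {
        "document": {
            "concepts": [
                {"prefLabel": "Mint", "score": 60.0},
                {"prefLabel": "Rum", "score": 100.0},
                {"prefLabel": "Lime", "score": 80.0},
            ]
        }
    },
}
print(top_aggregated_concepts(sample, limit=2))
```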

Concept Extraction From a Zip Container

These services extract concepts and terms from content that is delivered in a zip container. They extend the main Concept Extraction Service for files.

There are two ways of processing content in zip containers:

Retrieve extraction results per document within the given zip container individually.
Retrieve extraction results aggregated for the whole zip container.

1. Extraction Results Per Document Individually

URL: /extractor/api/extract/zip

2. Extraction Results Aggregated for the Whole Zip Container

URL: /extractor/api/extract/zip/aggregated

Request

Supported Methods
POST

Content-Type

multipart/form-data

Specific HTTP Parameters

Parameter	Type	Required	Comment
file	MultipartFile	true	File to be extracted. Has to be a zip file.

Other parameters can be used like in the main Concept Extraction Service for files.

Example

You can use the file cocktails.zip (containing three cocktail recipes in PDF format) together with a PoolParty project such as 'All about Cocktails' (http://vocabulary.semantic-web.at/cocktails.html).

POST
http://vocabulary.semantic-web.at/extractor/api/extract/zip?language=en&numberOfConcepts=3&numberOfTerms=0&projectId=1DCE0ED2-D7E8-0001-86A1-18652DF0D7A0
Content-Type: multipart/form-data
file: cocktails.zip

Sample request, done with Postman:

Postman sample requests: Postman-zip_extraction.json

You can import the JSON file into a client such as the Postman REST Client.

Sample request, done with curl:

curl -i -X POST -H "Content-Type: multipart/form-data" -F "file=@cocktails.zip" "http://USERNAME:PASSWORD@vocabulary.semantic-web.at/extractor/api/extract/zip?language=en&numberOfConcepts=3&numberOfTerms=0&projectId=1DCE0ED2-D7E8-0001-86A1-18652DF0D7A0&displayText=true"
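
For clients that do not use curl, a comparable request can be assembled with the Python standard library. The sketch below builds the multipart/form-data body by hand and prepares the request without sending it. It assumes HTTP Basic authentication, as the USERNAME:PASSWORD part of the curl URL suggests; the helper `build_zip_request` and the tiny in-memory archive standing in for cocktails.zip are illustrative only.

```python
import base64
import io
import uuid
import zipfile
from urllib.request import Request


def build_zip_request(url, zip_bytes, filename, username, password):
    """Build (but do not send) a multipart/form-data POST carrying the zip file."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/zip\r\n\r\n",
        zip_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Authorization": f"Basic {credentials}",  # assumed Basic authentication
    }
    return Request(url, data=body, headers=headers, method="POST")


# Small in-memory archive standing in for cocktails.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mojito.txt", "Rum, mint, lime, sugar, soda water")
zip_bytes = buffer.getvalue()

request = build_zip_request(
    "http://vocabulary.semantic-web.at/extractor/api/extract/zip"
    "?language=en&numberOfConcepts=3&numberOfTerms=0"
    "&projectId=1DCE0ED2-D7E8-0001-86A1-18652DF0D7A0&displayText=true",
    zip_bytes, "cocktails.zip", "USERNAME", "PASSWORD",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request).read()
```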

Web Service Method: Extract Metadata from Zip File - Aggregated and Asynchronously

Description
[file] Asynchronously extracts and returns a single aggregated document with meaningful metadata, such as concepts and terms, from the documents packed inside an uploaded archive file (*.zip).

URL: /extractor/api/extract/zip/aggregated/async

Request

Supported Methods
POST

Content-Type

multipart/form-data

HTTP Parameter

Parameter	Type	Required	Description
categorizationWithPpxBoost	boolean	false	Use Extractor boosting, default = false
categorize	boolean	false	Categorization extraction, default = false
conceptSchemeFilters	Array of String	false	Concept scheme URI filters
corpusScoring	Array of String	false	Corpus term scoring. Enabled if corpusIds (UUID) are provided
customAttributeFilters	Array of CustomProperty	false	Custom attribute (property uri and string value) filters
customClassFilters	Array of String	false	Custom class URI filters
disambiguate	boolean	false	Use thesaurus based disambiguation, default = false
displayText	boolean	false	Include text extracted from url in response, default = false
documentClassifierIds	Array of String	false	Enable document classification by giving the document classifier IDs as input
documentId	String	false	Internal ID of the document
file	MultipartFile	true	Zip file to be extracted (may contain Word, Excel, PowerPoint, PDF, or OpenDocument files). The request must be sent as 'multipart/form-data'
filterNestedConcepts	boolean	false	Remove concept matches that are contained within other matches, default = false
findPersonNames	boolean	false	Person name extraction, default = false
language	String	false	Extraction language (en|de|es|fr|...)
lemmatization	boolean	false	Use lemmatization, default = true
locationExtraction	boolean	false	Location extraction, default = false
metadata	String	false	Metadata of the document (concatenated fields with delimiter: '.')
numberOfConcepts	Integer	false	Retrieve number of concepts, default = 25
numberOfTerms	Integer	false	Retrieve number of terms, default = 25
projectId	Array of String	false	Thesaurus projectIds
properties	Array of String	false	Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Furthermore it supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to all to fetch all properties.
regexFilename	String	false	File name for regex patterns
sentimentAnalysis	boolean	false	Sentiment analysis, default = false
shadowConceptCorpusId	Array of String	false	Shadow concepts calculation. Enabled if corpusIds (UUID) are provided
showMatchingDetails	boolean	false	Shows which concept labels were found inside the text, default = false
showMatchingPosition	boolean	false	Shows the position of the matched text. Only shown if showMatchingDetails = true, default = false
tfidfScoring	boolean	false	Use TFIDF scoring
useRelatedConcepts	boolean	false	Retrieve related concepts, default = false
useTransitiveBroaderConcepts	boolean	false	Retrieve transitive broader concepts, default = false
useTransitiveBroaderTopConcepts	boolean	false	Retrieve transitive broader top concepts, default = false
useTypes	boolean	false	Retrieve custom types for concepts, default = false

Custom property

Attribute	Type	Comment
property	String	Property
value	String	Value

Response

This method returns execution results in JSON format.

TaskSubmitResponse

Common base response defining the minimum result structure and semantics.

Attribute	Type	Comment
message	String	short descriptive message of the operation result, or an error description
result	Object	the actual response content body, defined by the resultType.
resultType	String	MIME type of the result if successful, or Exception type if an error occurred
status	int	HTTP status code of the requested operation
success	boolean	true if the operation was successful (i.e. returning a status of 2xx)
taskId	String

Web Service Method: Request an Aggregated Task Synchronously

Description
Returns the result of an asynchronously called aggregated extraction, identified by the task ID.

URL: /extractor/api/extract/zip/task/aggregated

Request

Supported Methods
GET

Content-Type

application/x-www-form-urlencoded

HTTP Parameter

Parameter	Type	Required	Description
taskId	String	true	Task ID of the asynchronously called task.

Response

This method returns execution results in JSON format.

ZipFileExtractionResponse

Results of a file-based text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
aggregatedResponse	FileExtractionResponse	Aggregated result
defaultExtractions	Array of FileExtractionResponse	List of extracted file results
message	String	Additional message
numberOfExtractedDocuments	int	Number of extracted documents

FileExtractionResponse

Results of a file-based text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
document	ExtractionResponse	Extraction result
metadata	ExtractionResponse	Metadata extraction result
text	String	File text content
title	String	File title

ExtractionResponse

Results of a text extraction request. Properties with no entries are not present.

Attribute	Type	Comment
categories	Array of Category	Categories of the document
classificationResults	Array of DocumentClassification	Document classification results
concepts	Array of ThesaurusConcept	Matched concepts
detectedLanguage	String	Detected Language of the document
extractedTerms	Array of ExtractedTerm	Extracted freeTerms
locations	Array of Location	Matched locations
personNames	Array of String	Person name matches
regexMatches	Array of RegexMatches	Regex token matches
sentiments	Array of Sentiment	Matched sentiments
shadowConcepts	Array of ThesaurusConcept	Shadow Concepts
text	String	Text as extracted from url or file
title	String	Title as extracted from url or file

Web Service Method: Request a Task's Status

Description
Returns the current status of an asynchronously called extraction identified by the task ID.

URL: /extractor/api/extract/zip/taskstatus

Request

Supported Methods
GET

Content-Type

application/x-www-form-urlencoded

HTTP Parameter

Parameter	Type	Required	Description
taskId	String	true	Task ID of the asynchronously called task
BASIC_AUTH	String	false
CLIENT_CERT_AUTH	String	false
DIGEST_AUTH	String	false
FORM_AUTH	String	false

Response

This method returns execution results in JSON format.

TaskStatus

A TaskStatus object.

Web Service Method: Request Task Information from Zip

Description
Returns an asynchronously called extraction identified by the task ID.

URL: /extractor/api/extract/zip/task

Request

Supported Methods
GET

Content-Type

application/json

HTTP Parameter

Parameter	Type	Required	Description
taskId	String	true	Task ID of the asynchronously called task

Response

This method returns execution results in JSON format.

In this section: