Web Service Method: Annotate from File

Description

[file] Returns the document annotated with extracted concepts and extracted terms in RDF/XML representation.

URL: /extractor/api/annotate

Request

Supported Methods

POST

Content-Type

Content-Type: multipart/form-data

HTTP Parameter

| Parameter | Type | Required | Description |
|---|---|---|---|
| categorizationWithPpxBoost | boolean | false | Use Extractor boosting, default = false |
| categorize | boolean | false | Categorization extraction, default = false |
| conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
| conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
| corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUID) are provided. |
| customAttributeFilters | Array of CustomProperty | false | Custom attribute (property URI and string value) filters |
| customClassFilters | Array of String | false | Custom class URI filters |
| disambiguate | boolean | false | Use thesaurus-based disambiguation, default = false |
| displayText | boolean | false | Include text extracted from url in response, default = false |
| documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input |
| documentId | String | false | Internal ID of the document, taken from documentUri |
| documentUri | String | true | URI of annotated document, used as ID |
| extractorVersion | String | false | Version of PPX Extractor used |
| file | MultipartFile | true | File to be annotated (Word, Excel, PowerPoint, PDF, OpenDocument). The MIME type of the request must be 'multipart/form-data'. |
| filterNestedConcepts | boolean | false | Remove concept matches that are contained within other matches, default = true |
| findPersonNames | boolean | false | Deprecated (use nerParameters). Extracts person names from the given text. |
| language | String | false | Extraction language (en, de, es, fr, ...) |
| lemmatization | boolean | false | Use lemmatization, default = true |
| locationExtraction | boolean | false | Deprecated (use nerParameters). Extracts locations from the given text. |
| nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
| numberOfConcepts | Integer | false | Number of concepts to retrieve, default = 25 |
| numberOfTerms | Integer | false | Number of terms to retrieve, default = 25 |
| phraseLength | Integer | false | Phrase length, default = 4 |
| projectId | Array of String | false | Thesaurus projectIds |
| properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Furthermore it supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to all to fetch all properties. |
| regexFilename | String | false | File name for regex patterns |
| resultFilterSparql | String | false | Specify an optional SPARQL query for filtering the RDF result |
| sentimentAnalysis | boolean | false | Sentiment analysis, default = false |
| shadowConceptCorpusId | Array of String | false | Shadow concepts calculation. Enabled if corpusIds (UUID) are provided. |
| showMatchingDetails | boolean | false | Shows which concept labels were found inside the text, default = false |
| showMatchingPosition | boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true. Default = false |
| tfidfScoring | boolean | false | Use TFIDF scoring, default = false |
| title | String | false | Title of the document |
| useRelatedConcepts | boolean | false | Retrieve related concepts, default = false |
| useTransitiveBroaderConcepts | boolean | false | Retrieve transitive broader concepts, default = false |
| useTransitiveBroaderTopConcepts | boolean | false | Retrieve transitive broader top concepts, default = false |
| useTypes | boolean | false | Retrieve custom types for concepts, default = false |
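
The request described above (a POST to /extractor/api/annotate with a multipart/form-data body) can be sketched in Python as follows. This is a minimal illustration using only the standard library, not an official client: the host name, the helper name build_annotate_request, and the choice to send array parameters as repeated form fields are assumptions, and authentication is left out because this page does not describe it.

```python
import uuid
import urllib.request


def build_annotate_request(base_url, file_name, file_bytes, params):
    """Build (but do not send) a multipart/form-data POST for /extractor/api/annotate."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in params.items():
        # Assumption: array parameters are sent as repeated form fields.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            if isinstance(v, bool):
                v = "true" if v else "false"
            parts.append(
                f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f'{v}\r\n'.encode("utf-8")
            )
    # The required "file" part carries the document to annotate.
    parts.append(
        (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            'Content-Type: application/octet-stream\r\n\r\n'
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        base_url.rstrip("/") + "/extractor/api/annotate",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Hypothetical host and values; documentUri and file are the two required parameters.
request = build_annotate_request(
    "https://poolparty.example.com",
    "report.pdf",
    b"placeholder file content",
    {"documentUri": "urn:example:report", "language": "en", "numberOfConcepts": 10},
)
# urllib.request.urlopen(request) would send the call to a real server.
```

Building the request separately from sending it keeps the sketch easy to inspect: request.data holds the encoded body and request.get_header("Content-type") shows the boundary in use.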

Custom Property

| Attribute | Type | Required | Comment |
|---|---|---|---|
| property | String | false | Property |
| value | String | false | Value |

Example of Custom Property Object
{
  "property" : "https://semantic-web.com/api/property#6376",
  "value" : "some value"
}

Named Entity Recognition Configuration

| Attribute | Type | Required | Comment |
|---|---|---|---|
| method | Method | false | Method used for Named Entity Extraction: RULE_BASED or MAXIMUM_ENTROPY (default: MAXIMUM_ENTROPY) |
| type | String | false | Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location |

Example of NERConfig Object
{
  "method" : "RULE_BASED",
  "type" : "https://semantic-web.com/api/type#20383"
}
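
As a rough illustration of the NERConfig object, the following Python sketch models the two attributes and serializes them to the JSON shape shown in the example. The class and helper names are hypothetical, and how the resulting objects are encoded in the nerParameters form field is not specified on this page.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Method(str, Enum):
    """The two extraction methods listed for NERConfig."""
    RULE_BASED = "RULE_BASED"
    MAXIMUM_ENTROPY = "MAXIMUM_ENTROPY"


@dataclass
class NERConfig:
    type: str
    # MAXIMUM_ENTROPY is the documented default method.
    method: Method = Method.MAXIMUM_ENTROPY

    def to_json(self):
        """Serialize to the JSON object shape used in the example."""
        return json.dumps({"method": self.method.value, "type": self.type})


# A pre-defined MAXIMUM_ENTROPY model and a rule-based model identified by URI.
person = NERConfig(type="person")
custom = NERConfig(
    type="https://semantic-web.com/api/type#20383",
    method=Method.RULE_BASED,
)
print(person.to_json())
print(custom.to_json())
```

Using an enum for method means an unsupported value such as Method("OTHER") fails on the client with a ValueError before any request is built.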
Response

By default, this method returns its results in the format application/rdf+xml.

Configure the Response Format

You can change the response format to any RDF format defined here: http://docs.rdf4j.org/javadoc/2.3/org/eclipse/rdf4j/rio/RDFFormat.html

Example Formats:
  • application/rdf+xml
  • application/n-triples
  • application/x-turtle
  • application/trix
  • application/trig

To configure the response format, add an Accept header to your call.

Using an HTTP REST client, such as Postman, the call would look like this:

[Image: 23901367.png]
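
Outside of a REST client, the Accept header can be set in code as well. The sketch below uses Python's standard library and only shows the header handling: the host name, the RDF_FORMATS lookup table, and the with_accept helper are illustrative assumptions, and the empty request body stands in for the real multipart payload.

```python
import urllib.request

# Short names (hypothetical) mapped to the example formats listed above.
RDF_FORMATS = {
    "rdfxml": "application/rdf+xml",
    "ntriples": "application/n-triples",
    "turtle": "application/x-turtle",
    "trix": "application/trix",
    "trig": "application/trig",
}


def with_accept(request, fmt):
    """Add an Accept header for the chosen RDF format and return the request."""
    request.add_header("Accept", RDF_FORMATS[fmt])
    return request


# Placeholder request; a real call needs the multipart/form-data body.
request = urllib.request.Request(
    "https://poolparty.example.com/extractor/api/annotate",
    data=b"",
    method="POST",
)
with_accept(request, "turtle")
print(request.get_header("Accept"))
```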