Annotations Service

The annotations service lets you add information (annotations) to documents or files, and remove it again, through web service method calls.

The following methods are available for the annotations service:

Web Service Method: Annotate and Store from Text

Description

[text] Returns the document annotated with extracted concepts and extracted terms in RDF/XML representation.

URL: /extractor/api/annotate/store/text

Request

Supported Methods

POST

Content-Type

Content-Type: application/json

HTTP Parameter

| Parameter | Type | Required | Comment |
| --- | --- | --- | --- |
| categorizationWithPpxBoost | boolean | false | Use Extractor boosting, default = false |
| categorize | boolean | false | Categorization extraction, default = false |
| conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
| conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
| corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUID) are provided. |
| customAttributeFilters | Array of CustomProperty | false | Custom attribute (property URI and string value) filters |
| customClassFilters | Array of String | false | Custom class URI filters |
| disambiguate | boolean | false | Use thesaurus-based disambiguation, default = false |
| displayText | boolean | false | Include text extracted from URL in response, default = false |
| documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input |
| documentId | String | false | Internal ID of the document, taken from documentUri |
| documentUri | String | true | URI of annotated document, used as ID |
| extractorVersion | String | false | Version of PPX Extractor used |
| filterNestedConcepts | boolean | false | Remove concept matches that are contained within other matches, default = true |
| findPersonNames | boolean | false | Deprecated (use nerParameters) - extracts person names from the given text |
| graphName | String | false | The name of the graph in the remote repository that the PPX results are written to |
| language | String | false | Extraction language (en, de, es, fr, ...) |
| lemmatization | boolean | false | Use lemmatization, default = true |
| locationExtraction | boolean | false | Deprecated (use nerParameters) - extracts locations from the given text |
| nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
| numberOfConcepts | Integer | false | Number of concepts to retrieve, default = 25 |
| numberOfTerms | Integer | false | Number of terms to retrieve, default = 25 |
| phraseLength | Integer | false | Phrase length, default = 4 |
| projectId | Array of String | false | Thesaurus projectIds |
| properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. It also supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to all to fetch all properties. |
| regexFilename | String | false | File name for regex patterns |
| repositoryId | String | false | Handle of the target repository to access, defaults to configured property 'remote.repositoryid' |
| resultFilterSparql | String | false | Specify an optional SPARQL query for filtering the RDF result |
| sentimentAnalysis | boolean | false | Sentiment analysis, default = false |
| shadowConceptCorpusId | Array of String | false | Shadow concepts calculation. Enabled if corpusIds (UUID) are provided. |
| showMatchingDetails | boolean | false | Shows which concept labels were found inside the text, default = false |
| showMatchingPosition | boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true, default = false |
| text | String | true | Text of the document |
| tfidfScoring | boolean | false | Use TFIDF scoring, default = false |
| title | String | false | Title of the document |
| useRelatedConcepts | boolean | false | Retrieve related concepts, default = false |
| useTransitiveBroaderConcepts | boolean | false | Retrieve transitive broader concepts, default = false |
| useTransitiveBroaderTopConcepts | boolean | false | Retrieve transitive broader top concepts, default = false |
| useTypes | boolean | false | Retrieve custom types for concepts, default = false |

Example
{
  "nerParameters" : [ {
    "method" : "RULE_BASED",
    "type" : "https://semantic-web.com/api/type#26656"
  }, {
    "method" : "RULE_BASED",
    "type" : "https://semantic-web.com/api/type#24840"
  } ],
  "tfidfScoring" : true,
  "useTransitiveBroaderTopConcepts" : false,
  "language" : "fr",
  "title" : "All about Chuck Norris",
  "numberOfTerms" : 29907,
  "resultFilterSparql" : "some resultFilterSparql",
  "findPersonNames" : false,
  "conceptMinimumScore" : 0.6875532724352691,
  "customAttributeFilters" : [ {
    "property" : "https://semantic-web.com/api/property#14358",
    "value" : "some value"
  }, {
    "property" : "https://semantic-web.com/api/property#2572",
    "value" : "some value"
  } ],
  "corpusScoring" : [ "some corpusScoring", "some corpusScoring" ],
  "locationExtraction" : true,
  "useRelatedConcepts" : false,
  "customClassFilters" : [ "some customClassFilters", "some customClassFilters" ],
  "text" : "some text",
  "shadowConceptCorpusId" : [ "some shadowConceptCorpusId", "some shadowConceptCorpusId", "some shadowConceptCorpusId" ],
  "categorize" : false,
  "filterNestedConcepts" : false,
  "useTransitiveBroaderConcepts" : false,
  "displayText" : true,
  "regexFilename" : "some regexFilename",
  "categorizationWithPpxBoost" : false,
  "documentUri" : "some documentUri",
  "numberOfConcepts" : 32518,
  "disambiguate" : true,
  "showMatchingPosition" : true,
  "graphName" : "some graphName",
  "extractorVersion" : "6.0.1",
  "sentimentAnalysis" : false,
  "useTypes" : false,
  "documentClassifierIds" : [ "some documentClassifierIds" ],
  "repositoryId" : "1DF1343D-0570-0001-FAAF-149079206440",
  "conceptSchemeFilters" : [ "https://semantic-web.com/api/conceptSchemeFilters#29423", "https://semantic-web.com/api/conceptSchemeFilters#2556", "https://semantic-web.com/api/conceptSchemeFilters#31614" ],
  "documentId" : "corpusDocument:0ac32384-b3c2-4e62-8bcf-7ed4fd67b630",
  "lemmatization" : false,
  "projectId" : [ "some projectId" ],
  "properties" : [ "https://semantic-web.com/api/properties#5962", "https://semantic-web.com/api/properties#2227" ],
  "showMatchingDetails" : true
}
Response

Content-Type: text/plain
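
The request described above can be sketched in Python with the standard library only. This is a minimal illustration, not a reference client: the host `https://poolparty.example.com`, the helper name `build_store_text_request`, and all sample values are assumptions, and authentication is left out because this section does not describe it. The early check on `showMatchingPosition` is a client-side choice, not documented server behaviour. The request is only constructed, not sent.

```python
import json
import urllib.request

ENDPOINT = "/extractor/api/annotate/store/text"  # path as given in this section


def build_store_text_request(base_url, text, document_uri, **options):
    """Build (but do not send) a POST request with a JSON body.

    `text` and `documentUri` are the two parameters marked as required;
    any other parameter from the table can be passed as a keyword argument.
    """
    if not text or not document_uri:
        raise ValueError("text and documentUri are required")
    # showMatchingPosition is documented as only shown when
    # showMatchingDetails = true, so reject the inconsistent combination early.
    if options.get("showMatchingPosition") and not options.get("showMatchingDetails"):
        raise ValueError("showMatchingPosition needs showMatchingDetails = true")
    payload = {"text": text, "documentUri": document_uri, **options}
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and values, for illustration only.
request = build_store_text_request(
    "https://poolparty.example.com",
    text="All about Chuck Norris",
    document_uri="https://example.com/docs/1",
    language="en",
    projectId=["d06bd0f8-03e4-45e0-8683-fed428fca242"],
    numberOfConcepts=10,
)
```

Sending the request would then be a call such as `urllib.request.urlopen(request)`; according to this section the response body arrives as text/plain.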

Web Service Method: Annotate and Store from File

Description

[file] Annotates the file with extracted concepts in RDF/XML format and stores it in the remote repository.

URL: /extractor/api/annotate

Request

Supported Methods

POST

Content-Type

Content-Type: multipart/form-data

HTTP Parameter

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| categorizationWithPpxBoost | Boolean | false | Use Extractor boosting, default = false |
| categorize | Boolean | false | Categorization extraction, default = false |
| conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
| conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
| corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUID) are provided. |
| customAttributeFilters | Array of CustomProperty | false | Custom attribute (property URI and string value) filters |
| customClassFilters | Array of String | false | Custom class URI filters |
| disambiguate | Boolean | false | Use thesaurus-based disambiguation, default = false |
| displayText | Boolean | false | Include text extracted from URL in response, default = false |
| documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input |
| documentId | String | false | Internal ID of the document, taken from documentUri |
| documentUri | String | true | URI of annotated document, used as ID |
| extractorVersion | String | false | Version of PPX Extractor used |
| file | MultipartFile | true | File to be annotated (Word, Excel, PowerPoint, PDF, OpenDocument). The MIME type of the request must be 'multipart/form-data'. |
| filterNestedConcepts | Boolean | false | Remove concept matches that are contained within other matches, default = true |
| findPersonNames | Boolean | false | Deprecated (use nerParameters) - extracts person names from the given text |
| language | String | false | Extraction language (en, de, es, fr, ...) |
| lemmatization | Boolean | false | Use lemmatization, default = true |
| locationExtraction | Boolean | false | Deprecated (use nerParameters) - extracts locations from the given text |
| nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
| numberOfConcepts | Integer | false | Number of concepts to retrieve, default = 25 |
| numberOfTerms | Integer | false | Number of terms to retrieve, default = 25 |
| phraseLength | Integer | false | Phrase length, default = 4 |
| projectId | Array of String | false | Thesaurus projectIds |
| properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. It also supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to all to fetch all properties. |
| regexFilename | String | false | File name for regex patterns |
| resultFilterSparql | String | false | Specify an optional SPARQL query for filtering the RDF result |
| sentimentAnalysis | Boolean | false | Sentiment analysis, default = false |
| shadowConceptCorpusId | Array of String | false | Shadow concepts calculation. Enabled if corpusIds (UUID) are provided. |
| showMatchingDetails | Boolean | false | Shows which concept labels were found inside the text, default = false |
| showMatchingPosition | Boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true, default = false |
| tfidfScoring | Boolean | false | Use TFIDF scoring, default = false |
| title | String | false | Title of the document |
| useRelatedConcepts | Boolean | false | Retrieve related concepts, default = false |
| useTransitiveBroaderConcepts | Boolean | false | Retrieve transitive broader concepts, default = false |
| useTransitiveBroaderTopConcepts | Boolean | false | Retrieve transitive broader top concepts, default = false |
| useTypes | Boolean | false | Retrieve custom types for concepts, default = false |

Array of Custom Property

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| property | String | false | Property |
| value | String | false | Value |

Example
{
  "property" : "https://semantic-web.com/api/property#26100",
  "value" : "some value"
}
Array of Named Entity Recognition Configuration

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| method | Method | false | Method used for Named Entity Extraction. Allowed values: RULE_BASED, MAXIMUM_ENTROPY (default: MAXIMUM_ENTROPY) |
| type | String | false | Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location |

Example
{
  "method" : "MAXIMUM_ENTROPY",
  "type" : "https://semantic-web.com/api/type#3179"
}
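
Because `findPersonNames` and `locationExtraction` are marked as deprecated in favour of `nerParameters`, a client that still sets the old flags may want to translate them. The following is a rough sketch of such a translation. The helper names are hypothetical, and it assumes that the pre-defined model names `person` and `location` can be passed directly as the `type` value; the generated example above uses a URI there, so verify this against your installation.

```python
ALLOWED_METHODS = {"RULE_BASED", "MAXIMUM_ENTROPY"}  # values listed in the table


def ner_config(model_type, method="MAXIMUM_ENTROPY"):
    """Return one NERConfig entry as a plain dict."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown NER method: {method}")
    return {"method": method, "type": model_type}


def migrate_deprecated_flags(params):
    """Replace findPersonNames / locationExtraction with nerParameters.

    Assumption: the pre-defined model names "person" and "location" are
    accepted as `type` values for MAXIMUM_ENTROPY.
    """
    migrated = dict(params)  # leave the caller's dict untouched
    ner = list(migrated.pop("nerParameters", []))
    if migrated.pop("findPersonNames", False):
        ner.append(ner_config("person"))
    if migrated.pop("locationExtraction", False):
        ner.append(ner_config("location"))
    if ner:
        migrated["nerParameters"] = ner
    return migrated


old_style = {"language": "en", "findPersonNames": True, "locationExtraction": False}
new_style = migrate_deprecated_flags(old_style)
```

With the input shown, `new_style` keeps `language` and carries a single MAXIMUM_ENTROPY entry for the person model; the deprecated keys are dropped.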
Array of ObjectStreamField

An ObjectStreamField object.

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| field | Field | false |  |
| name | String | false |  |
| offset | int | false |  |
| signature | String | false |  |
| type | Class | false |  |
| unshared | Boolean | false |  |

Example of ObjectStreamField Array

{
  "field" : {
    "genericInfo" : {
      "factory" : null,
      "tree" : null,
      "genericType" : null
    },
    "declaredAnnotations" : { },
    "overrideFieldAccessor" : { },
    "signature" : "some signature",
    "annotations" : [ 48 ],
    "securityCheckCache" : { },
    "slot" : 26477,
    "fieldAccessor" : { },
    "modifiers" : 24139,
    "type" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 32463,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 22746,
      "initted" : false,
      "cachedConstructor" : null,
      "useCaches" : true,
      "SYNTHETIC" : 27448,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null ],
      "serialVersionUID" : 10320,
      "ANNOTATION" : 14968,
      "enumConstants" : [ null, null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    },
    "ACCESS_PERMISSION" : {
      "serialVersionUID" : 26505,
      "name" : "some name"
    },
    "root" : {
      "genericInfo" : null,
      "declaredAnnotations" : { },
      "overrideFieldAccessor" : null,
      "signature" : "some signature",
      "annotations" : [ 119, 46, 76 ],
      "securityCheckCache" : null,
      "slot" : 5243,
      "fieldAccessor" : null,
      "modifiers" : 32720,
      "type" : null,
      "ACCESS_PERMISSION" : null,
      "root" : null,
      "name" : "some name",
      "override" : false,
      "reflectionFactory" : null,
      "clazz" : null
    },
    "name" : "some name",
    "override" : true,
    "reflectionFactory" : {
      "inflationThreshold" : 19524,
      "initted" : true,
      "soleInstance" : null,
      "reflectionFactoryAccessPerm" : null,
      "langReflectAccess" : null,
      "noInflation" : false
    },
    "clazz" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 14462,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 26901,
      "initted" : true,
      "cachedConstructor" : null,
      "useCaches" : true,
      "SYNTHETIC" : 15733,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null ],
      "serialVersionUID" : 1996,
      "ANNOTATION" : 14537,
      "enumConstants" : [ null, null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    }
  },
  "offset" : 8626,
  "signature" : "some signature",
  "unshared" : false,
  "name" : "some name",
  "type" : {
    "annotationData" : {
      "declaredAnnotations" : { },
      "redefinedCount" : 4678,
      "annotations" : { }
    },
    "genericInfo" : {
      "factory" : null,
      "superclass" : null,
      "tree" : null,
      "typeParams" : [ null, null, null ],
      "NONE" : null,
      "superInterfaces" : [ null, null, null ]
    },
    "ENUM" : 4825,
    "enumConstantDirectory" : { },
    "classRedefinedCount" : 19620,
    "initted" : true,
    "cachedConstructor" : {
      "genericInfo" : null,
      "declaredAnnotations" : { },
      "hasRealParameterData" : false,
      "parameterTypes" : [ null, null ],
      "signature" : "some signature",
      "annotations" : [ 0, 59 ],
      "securityCheckCache" : null,
      "constructorAccessor" : null,
      "slot" : 29100,
      "modifiers" : 9877,
      "ACCESS_PERMISSION" : null,
      "exceptionTypes" : [ null ],
      "root" : null,
      "override" : false,
      "parameterAnnotations" : [ 74, 86 ],
      "reflectionFactory" : null,
      "clazz" : null,
      "parameters" : [ null, null ]
    },
    "useCaches" : false,
    "SYNTHETIC" : 24161,
    "annotationType" : {
      "inherited" : false,
      "members" : { },
      "memberDefaults" : { },
      "$assertionsDisabled" : false,
      "memberTypes" : { },
      "retention" : "RUNTIME"
    },
    "newInstanceCallerCache" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 1033,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 10123,
      "initted" : false,
      "cachedConstructor" : null,
      "useCaches" : false,
      "SYNTHETIC" : 6635,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null, null ],
      "serialVersionUID" : 6521,
      "ANNOTATION" : 26847,
      "enumConstants" : [ null, null, null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    },
    "reflectionData" : {
      "next" : null,
      "discovered" : null,
      "referent" : null,
      "pending" : null,
      "lock" : null,
      "clock" : 24478,
      "queue" : null,
      "timestamp" : 16613
    },
    "classValueMap" : { },
    "serialPersistentFields" : [ {
      "field" : null,
      "offset" : 22640,
      "signature" : "some signature",
      "unshared" : true,
      "name" : "some name",
      "type" : null
    }, {
      "field" : null,
      "offset" : 23255,
      "signature" : "some signature",
      "unshared" : false,
      "name" : "some name",
      "type" : null
    } ],
    "serialVersionUID" : 20269,
    "ANNOTATION" : 24004,
    "enumConstants" : [ { }, { } ],
    "name" : "some name",
    "reflectionFactory" : {
      "inflationThreshold" : 17765,
      "initted" : true,
      "soleInstance" : null,
      "reflectionFactoryAccessPerm" : null,
      "langReflectAccess" : null,
      "noInflation" : true
    },
    "allPermDomain" : {
      "staticPermissions" : false,
      "debug" : null,
      "hasAllPerm" : false,
      "codesource" : null,
      "permissions" : null,
      "classloader" : null,
      "principals" : [ null, null ],
      "key" : null
    }
  }
}
Response

Content-Type: text/plain

Status: 200 - OK
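
For the file upload method (Annotate and Store from File), the request body has to be multipart/form-data. The sketch below shows one way to encode such a body with the Python standard library. The helper name `encode_multipart` and the sample values are assumptions, and sending list values as repeated parts is a guess about how array parameters are accepted, not something this page states.

```python
import uuid


def encode_multipart(fields, filename, file_bytes, file_field="file"):
    """Encode form fields plus one file as multipart/form-data.

    Returns (body, content_type). List values are sent as repeated parts,
    which is an assumption about how array parameters are accepted.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if isinstance(item, bool):
                item = "true" if item else "false"
            lines += [
                f"--{boundary}".encode(),
                f'Content-Disposition: form-data; name="{name}"'.encode(),
                b"",
                str(item).encode("utf-8"),
            ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    # Parts are separated by CRLF; the final empty item adds the trailing CRLF.
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


# Placeholder values; documentUri and file are the parameters marked as required.
body, content_type = encode_multipart(
    {"documentUri": "https://example.com/docs/report", "language": "en", "useTypes": True},
    filename="report.pdf",
    file_bytes=b"%PDF-1.4 placeholder bytes",
)
```

The resulting `body` and `content_type` can be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`, with `url` built from your host and the URL given for this method.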

Write PoolParty Extractor Results Into a Graph Database

When a graph database is configured as the remote repository, this service can be used to annotate documents and write the results directly into the graph database.

  • Method: annotateAndStore

  • URL: /extractor/api/annotate/store

This API call accepts plain text, a web page referenced by a URL, or an uploaded file as input.

Plain text input

Supported Methods

GET

POST

Specific HTTP Parameters

| Parameter | Type | Required | Value range | Comment |
| --- | --- | --- | --- | --- |
| text | String | true |  | The text to be used for the extraction request. |
| title | String | false |  | The title of the document. |

Web pages as input

Supported Methods

GET

POST

Specific HTTP Parameters

| Parameter | Type | Required | Value range | Comment |
| --- | --- | --- | --- | --- |
| url | String | true |  | The URL of the document to be used for the extraction request. |

File as input

Supported Methods

POST

The MIME type of the request must be 'multipart/form-data'.

Specific HTTP Parameters

| Parameter | Type | Required | Value range | Comment |
| --- | --- | --- | --- | --- |
| file | MultipartFile | true |  | The file to be uploaded for the extraction request. Supported input formats are Word, Excel, PowerPoint, PDF, and OpenDocument Format. |

Common HTTP Parameters

| Parameter | Type | Required | Value range | Comment |
| --- | --- | --- | --- | --- |
| projectId | String | true |  | The unique identifier of the PoolParty project to use for the extraction (the UUID of the project, e.g. "d06bd0f8-03e4-45e0-8683-fed428fca242") |
| text | String | true |  | The text to be used for the extraction request. |
| documentUri | String | true |  | A URI to identify the document. |
| graphName | String | false |  | The URI of the graph in the graph database that the results should be written to. If not specified, a new graph with the name of the document will be created. |
| language | String | true |  | The language of the text (e.g. "en", "de", "es", "fr", ...). Note: A stop word list is only available for the following languages: en (English), de (German), fr (French). Other languages can be added on demand. CJK languages are not supported. |
| transitiveBroaderConcepts | boolean | false | true, false | Retrieve transitive broader concepts. true: the URIs of transitive broader concepts are returned along with the extracted concepts. false: no transitive broaders are returned (default). Depending on the depth of the thesaurus hierarchy, this option can return a large number of transitive broaders per concept. Only set this parameter to true if you really need the information. |
| transitiveBroaderTopConcepts | boolean | false | true, false | Retrieve transitive broader top concepts. true: the URIs of transitive broader concepts that are top concepts are returned. false: no transitive broader top concepts are returned (default). |
| relatedConcepts | boolean | false | true, false | Retrieve related concepts. true: the URIs of the related concepts are returned. false: no related concepts are returned (default). |
| numberOfConcepts | Integer | false |  | The number of concepts to be retrieved. |
| numberOfTerms | Integer | false |  | The number of terms to be retrieved. |

This service generates an RDF graph for the results in the same way as the annotate service and writes it into the installed graph database. A document URI has to be specified for each document; it is used to identify the document in the store. If a graph name is provided, the results are written to that graph (useful when processing document sets). If no graph name is provided, the results for each document are written into a separate graph based on the document URI.
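
The graph selection rule and a GET call for plain text input could be sketched as follows. This is only an illustration: the host and sample values are made up, the helper names are hypothetical, and `target_graph` assumes that the per-document graph is named exactly after the document URI, whereas the text above only says it is "based on" it.

```python
from urllib.parse import urlencode

STORE_PATH = "/extractor/api/annotate/store"  # URL given in this section


def target_graph(document_uri, graph_name=None):
    """Mirror the documented rule: an explicit graph wins, else one graph per document.

    Assumption: the per-document graph is named exactly after documentUri.
    """
    return graph_name if graph_name else document_uri


def build_store_url(base_url, project_id, text, document_uri, language,
                    graph_name=None, **optional):
    """Build the GET URL for plain text input from the required common parameters."""
    params = {
        "projectId": project_id,
        "text": text,
        "documentUri": document_uri,
        "language": language,
    }
    if graph_name:
        params["graphName"] = graph_name
    # Booleans are written as lowercase true/false, as in the value range column.
    for key, value in optional.items():
        params[key] = str(value).lower() if isinstance(value, bool) else value
    return base_url.rstrip("/") + STORE_PATH + "?" + urlencode(params)


url = build_store_url(
    "https://poolparty.example.com",  # hypothetical host
    project_id="d06bd0f8-03e4-45e0-8683-fed428fca242",
    text="Renewable energy in Austria",
    document_uri="https://example.com/docs/42",
    language="en",
    graph_name="https://example.com/graphs/batch-1",
    relatedConcepts=True,
)
```

Passing the same `graph_name` for every document in a batch collects all results in one graph; omitting it yields one graph per document.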

Web Service Method: Annotate and Store from URL

Description

[url] Annotates the document from the url with extracted concepts and stores it in the remote repository.

URL: /extractor/api/annotate/store

Request

Supported Methods

POST

Content-Type

Content-Type: application/json

Response

This method returns execution results in JSON format.

HTTP Parameters

| Parameter | Description | Type | Required |
| --- | --- | --- | --- |
| url | URL of the document to be annotated | String | true |
| language | Language of text (en, de, es, fr, ...) | String | false |
| documentUri | Internal ID of the document | String | true |
| graphName | The name of the graph in the remote repository that the PPX results are written to | String | false |
| projectIds | Thesaurus projectIds | String | false |
| conceptSchemeFilters | Concept scheme filters | String | false |
| customClassFilters | Custom class filters | String | false |
| numberOfTerms | Number of terms to return | Integer | false |
| numberOfConcepts | Number of concepts to return | Integer | false |
| conceptMinimumScore | Minimum required score of concepts, default = 0 | Double | false |
| useTransitiveBroaderConcepts | Retrieve transitive broader concepts of the extracted concepts, default = false | Boolean | false |
| useTransitiveBroaderTopConcepts | Retrieve transitive broader top concepts of the extracted concepts, default = false | Boolean | false |
| useRelatedConcepts | Retrieve related concepts of the extracted concepts, default = false | Boolean | false |
| disambiguate | Use thesaurus-based disambiguation, default = false | boolean | false |
| useTypes | Retrieve the custom types for concepts, default = false | boolean | false |
| tfidfScoring | Use TFIDF scoring, default = false | boolean | false |
| corpusScoring | Adapt the document scores with the corpus analysis. Enabled if corpusId (UUID) is provided, default = disabled | String | false |
| properties | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Set to all to fetch all properties. | String | false |
| phraseLength | Phrase length, default = 4 | Integer | false |

Web Service Method: Annotate from Text

Description

[text] Returns the document annotated with extracted concepts and extracted terms in RDF/XML representation.

URL: /extractor/api/annotate

Request

Supported Methods

POST

GET

Content-Type

Content-Type: application/x-www-form-urlencoded

HTTP Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| categorizationWithPpxBoost | boolean | false | Use Extractor boosting, default = false |
| categorize | boolean | false | Categorization extraction, default = false |
| conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
| conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
| corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUID) are provided. |
| customAttributeFilters | Array of CustomProperty | false | Custom attribute (property URI and string value) filters |
| customClassFilters | Array of String | false | Custom class URI filters |
| disambiguate | boolean | false | Use thesaurus-based disambiguation, default = false |
| displayText | boolean | false | Include text extracted from URL in response, default = false |
| documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input |
| documentId | String | false | Internal ID of the document, taken from documentUri |
| documentUri | String | true | URI of annotated document, used as ID |
| extraConceptLanguages | Array of PPLocale | false | Additional languages used for concept extraction (en, de, es, fr, ...) |
| extractorVersion | String | false | Version of PPX Extractor used |
| filterNestedConcepts | boolean | false | Remove concept matches that are contained within other matches, default = true |
| findPersonNames | boolean | false | Deprecated (use nerParameters) - extracts person names from the given text |
| language | PPLocale | false | Extraction language (en, de, es, fr, ...) |
| lemmatization | boolean | false | Use lemmatization, default = true |
| locationExtraction | boolean | false | Deprecated (use nerParameters) - extracts locations from the given text |
| nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
| numberOfConcepts | Integer | false | Number of concepts to retrieve, default = 25 |
| numberOfTerms | Integer | false | Number of terms to retrieve, default = 25 |
| phraseLength | Integer | false | Phrase length, default = 4 |
| projectId | Array of String | false | Thesaurus projectIds |
| properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. It also supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to all to fetch all properties. |
| regexFilename | String | false | File name for regex patterns |
| resultFilterSparql | String | false | Specify an optional SPARQL query for filtering the RDF result |
| sentimentAnalysis | boolean | false | Sentiment analysis, default = false |
| shadowConceptCorpusId | Array of String | false | Shadow concepts calculation. Enabled if corpusIds (UUID) are provided. |
| showMatchingDetails | boolean | false | Shows which concept labels were found inside the text, default = false |
| showMatchingPosition | boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true, default = false |
| text | String | true | Text of the document |
| tfidfScoring | boolean | false | Use TFIDF scoring, default = false |
| title | String | false | Title of the document |
| useRelatedConcepts | boolean | false | Retrieve related concepts, default = false |
| useTransitiveBroaderConcepts | boolean | false | Retrieve transitive broader concepts, default = false |
| useTransitiveBroaderTopConcepts | boolean | false | Retrieve transitive broader top concepts, default = false |
| useTypes | boolean | false | Retrieve custom types for concepts, default = false |
| CASE_INSENSITIVE_ORDER | Comparator | false |  |
| hash | int | false |  |
| serialPersistentFields | Array of ObjectStreamField | false |  |
| serialVersionUID | long | false |  |
| value | Array of char | false |  |

Custom Property Object

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| property | String | false | Property |
| value | String | false | Value |

Example of CustomProperty Object
{
  "property" : "https://semantic-web.com/api/property#30874",
  "value" : "some value"
}
NERConfig Object

Named Entity Recognition configuration

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| method | Method | false | Method used for Named Entity Extraction. Allowed values: RULE_BASED, MAXIMUM_ENTROPY (default: MAXIMUM_ENTROPY) |
| type | String | false | Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location |

Example for an NER Configuration
{
  "method" : "MAXIMUM_ENTROPY",
  "type" : "https://semantic-web.com/api/type#16216"
}
ObjectStreamField Object

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| field | Field | false |  |
| name | String | false |  |
| offset | int | false |  |
| signature | String | false |  |
| type | Class | false |  |
| unshared | boolean | false |  |

Example of an ObjectStreamField Object

{
  "field" : {
    "genericInfo" : {
      "factory" : null,
      "tree" : null,
      "genericType" : null
    },
    "declaredAnnotations" : { },
    "overrideFieldAccessor" : { },
    "signature" : "some signature",
    "annotations" : [ 54, 99 ],
    "securityCheckCache" : { },
    "slot" : 7232,
    "fieldAccessor" : { },
    "modifiers" : 9075,
    "type" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 10078,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 31375,
      "initted" : false,
      "cachedConstructor" : null,
      "useCaches" : false,
      "SYNTHETIC" : 4809,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null ],
      "serialVersionUID" : 24960,
      "ANNOTATION" : 32249,
      "enumConstants" : [ null, null, null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    },
    "ACCESS_PERMISSION" : {
      "serialVersionUID" : 17792,
      "name" : "some name"
    },
    "root" : {
      "genericInfo" : null,
      "declaredAnnotations" : { },
      "overrideFieldAccessor" : null,
      "signature" : "some signature",
      "annotations" : [ 44 ],
      "securityCheckCache" : null,
      "slot" : 1259,
      "fieldAccessor" : null,
      "modifiers" : 15456,
      "type" : null,
      "ACCESS_PERMISSION" : null,
      "root" : null,
      "name" : "some name",
      "override" : true,
      "reflectionFactory" : null,
      "clazz" : null
    },
    "name" : "some name",
    "override" : true,
    "reflectionFactory" : {
      "inflationThreshold" : 16638,
      "initted" : true,
      "soleInstance" : null,
      "reflectionFactoryAccessPerm" : null,
      "langReflectAccess" : null,
      "noInflation" : true
    },
    "clazz" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 30901,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 13157,
      "initted" : false,
      "cachedConstructor" : null,
      "useCaches" : true,
      "SYNTHETIC" : 18792,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null ],
      "serialVersionUID" : 7011,
      "ANNOTATION" : 3943,
      "enumConstants" : [ null, null, null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    }
  },
  "offset" : 7713,
  "signature" : "some signature",
  "unshared" : false,
  "name" : "some name",
  "type" : {
    "annotationData" : {
      "declaredAnnotations" : { },
      "redefinedCount" : 9363,
      "annotations" : { }
    },
    "genericInfo" : {
      "factory" : null,
      "superclass" : null,
      "tree" : null,
      "typeParams" : [ null, null ],
      "NONE" : null,
      "superInterfaces" : [ null, null ]
    },
    "ENUM" : 10784,
    "enumConstantDirectory" : { },
    "classRedefinedCount" : 11824,
    "initted" : true,
    "cachedConstructor" : {
      "genericInfo" : null,
      "declaredAnnotations" : { },
      "hasRealParameterData" : true,
      "parameterTypes" : [ null, null, null ],
      "signature" : "some signature",
      "annotations" : [ 85, 4, 110 ],
      "securityCheckCache" : null,
      "constructorAccessor" : null,
      "slot" : 20713,
      "modifiers" : 7871,
      "ACCESS_PERMISSION" : null,
      "exceptionTypes" : [ null, null, null ],
      "root" : null,
      "override" : false,
      "parameterAnnotations" : [ 42 ],
      "reflectionFactory" : null,
      "clazz" : null,
      "parameters" : [ null, null ]
    },
    "useCaches" : true,
    "SYNTHETIC" : 5209,
    "annotationType" : {
      "inherited" : true,
      "members" : { },
      "memberDefaults" : { },
      "$assertionsDisabled" : true,
      "memberTypes" : { },
      "retention" : "SOURCE"
    },
    "newInstanceCallerCache" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 139,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 12963,
      "initted" : true,
      "cachedConstructor" : null,
      "useCaches" : true,
      "SYNTHETIC" : 23823,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null, null ],
      "serialVersionUID" : 10600,
      "ANNOTATION" : 1872,
      "enumConstants" : [ null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    },
    "reflectionData" : {
      "next" : null,
      "discovered" : null,
      "referent" : null,
      "pending" : null,
      "lock" : null,
      "clock" : 31429,
      "queue" : null,
      "timestamp" : 5262
    },
    "classValueMap" : { },
    "serialPersistentFields" : [ {
      "field" : null,
      "offset" : 18525,
      "signature" : "some signature",
      "unshared" : true,
      "name" : "some name",
      "type" : null
    }, {
      "field" : null,
      "offset" : 6141,
      "signature" : "some signature",
      "unshared" : true,
      "name" : "some name",
      "type" : null
    } ],
    "serialVersionUID" : 7640,
    "ANNOTATION" : 16250,
    "enumConstants" : [ { }, { }, { } ],
    "name" : "some name",
    "reflectionFactory" : {
      "inflationThreshold" : 19353,
      "initted" : false,
      "soleInstance" : null,
      "reflectionFactoryAccessPerm" : null,
      "langReflectAccess" : null,
      "noInflation" : false
    },
    "allPermDomain" : {
      "staticPermissions" : true,
      "debug" : null,
      "hasAllPerm" : true,
      "codesource" : null,
      "permissions" : null,
      "classloader" : null,
      "principals" : [ null, null ],
      "key" : null
    }
  }
}
Response

Default:

Content-type: text/plain

Status: 200 - Ok

Configure the Response Format

You can change the response format to any of the RDF formats defined here: http://docs.rdf4j.org/javadoc/2.3/org/eclipse/rdf4j/rio/RDFFormat.html

Example Formats:
  • application/rdf+xml

  • application/n-triples

  • application/x-turtle

  • application/trix

  • application/trig

In order to configure the response format, use an additional Accept header in your call.

In an HTTP REST client such as Postman, the call looks like this:

[Screenshot: 23901367.png]
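The same call can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the host is borrowed from the example URL used elsewhere in this document, the payload values are illustrative, and any authentication your server may require is left out. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: host taken from this document's example URL; replace with your server.
BASE_URL = "http://test.semantic-web.at"


def build_store_text_request(payload, accept="application/x-turtle"):
    """Build (but do not send) a POST request that asks for a specific RDF format."""
    return urllib.request.Request(
        BASE_URL + "/extractor/api/annotate/store/text",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": accept,  # one of the example formats listed above
        },
        method="POST",
    )


# Illustrative payload values; only documentUri is documented as required here.
request = build_store_text_request(
    {"documentUri": "SWC:1", "text": "Aspirin", "language": "en"},
    accept="application/n-triples",
)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would send it; the response body should then be serialized in the format named in the Accept header.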

Get Extraction Results as RDF

To obtain the extraction results as an RDF document, use the PoolParty Extractor 'annotate' service. It expects the same parameters as the 'extract' services but returns the results as RDF/XML.

The only difference is that the parameter 'documentId' is required because it is part of the RDF results.

URL: /extractor/api/annotate

This API call accepts plain text, a web page referenced by a URL, and an uploaded file as input.

Plain text input

Supported Methods

GET

POST

Specific HTTP Parameters

| Parameter | Type | Required | Value range | Comment |
| --- | --- | --- | --- | --- |
| text | String | true | | The text to be used for the extraction request. |
| title | String | false | | The title of the document. |

Web pages as input

Supported Methods

GET

POST

Specific HTTP Parameters

| Parameter | Type | Required | Value range | Comment |
| --- | --- | --- | --- | --- |
| url | String | true | | The URL of the document to be used for the extraction request. |

File as input

Supported Methods

POST

The MIME type of the request must be 'multipart/form-data'.

Specific HTTP Parameters

| Parameter | Type | Required | Value range | Comment |
| --- | --- | --- | --- | --- |
| file | MultipartFile | true | | The file to be uploaded for the extraction request. Supported input formats are Word, Excel, PowerPoint, PDF, and Open Document Format. |

Common HTTP Parameters

The parameters are very similar to the /api/extract call (see Concept Extraction Service for more details).

| Parameter | Type | Required | Value range | Comment |
| --- | --- | --- | --- | --- |
| text | String | true | | |
| language | String | true | | |
| documentUri | String | true | | The URI that will be used in the RDF output of the method. |
| projectId | String | false | | |
| numberOfConcepts | Integer | false | | |
| numberOfTerms | Integer | false | | |
| useTransitiveBroaderConcepts | Boolean | false | | |
| useTransitiveBroaderTopConcepts | Boolean | false | | |
| useRelatedConcepts | Boolean | false | | |
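As a rough illustration of how these parameters combine into a GET request, the following Python sketch builds a properly encoded query string. The helper name `build_annotate_url` is hypothetical, and the host and project ID are taken from the example URL in this document; substitute your own values.

```python
from urllib.parse import urlencode

# Assumption: host taken from this document's example URL; replace with your server.
BASE_URL = "http://test.semantic-web.at"


def build_annotate_url(project_id, language, text, document_uri, **optional):
    """Return a GET URL for /extractor/api/annotate with URL-encoded parameters."""
    params = {
        "projectId": project_id,
        "language": language,
        "text": text,
        "documentUri": document_uri,
    }
    # Optional parameters (for example numberOfConcepts) are added only when set.
    params.update({name: value for name, value in optional.items() if value is not None})
    return BASE_URL + "/extractor/api/annotate?" + urlencode(params)


url = build_annotate_url(
    "1DAB156D-F01F-0001-ABCE-16301D4023C0",
    "en",
    "Aspirin",
    "SWC:1",
    numberOfConcepts=10,
)
print(url)
```

Encoding through `urlencode` matters once the text contains spaces or the document URI contains reserved characters such as the colon in `SWC:1`.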

Example

A simple example with a text of just one word would look like this:

http://test.semantic-web.at/extractor/api/annotate?projectId=1DAB156D-F01F-0001-ABCE-16301D4023C0&language=en&text=Aspirin&documentUri=SWC:1
Result:
<rdf:RDF>
        <rdf:Description rdf:about="SWC:1">
                <ctag:tagged rdf:resource="ppx:98838d58-3650-4c10-8b60-896a663cdca8"/>
        </rdf:Description>
        <rdf:Description rdf:about="ppx:98838d58-3650-4c10-8b60-896a663cdca8">
                <ctag:label xml:lang="en">Aspirin</ctag:label>
                <ctag:means rdf:resource="http://www.nlm.nih.gov/mesh/D001241"/>
                <rdf:type rdf:resource="http://commontag.org/ns#AutoTag"/>
        </rdf:Description>
        <rdf:Description rdf:about="http://www.nlm.nih.gov/mesh/D001241">
                <ppx:score rdf:datatype="http://www.w3.org/2001/XMLSchema#long">100</ppx:score>
                <skos:altLabel xml:lang="en">Micristin</skos:altLabel>
                <skos:altLabel xml:lang="en">Polopirin</skos:altLabel>
                <skos:altLabel xml:lang="en">Magnecyl</skos:altLabel>
                <skos:altLabel xml:lang="en">Zorprin</skos:altLabel>
                <skos:altLabel xml:lang="en">Ecotrin</skos:altLabel>
                <skos:altLabel xml:lang="en">Solupsan</skos:altLabel>
                <skos:altLabel xml:lang="en">Acetylsalicylic Acid</skos:altLabel>
                <skos:altLabel xml:lang="en">Solprin</skos:altLabel>
                <skos:altLabel xml:lang="en">Dispril</skos:altLabel>
                <skos:altLabel xml:lang="en">Acid, Acetylsalicylic</skos:altLabel>
                <skos:altLabel xml:lang="en">Aloxiprimum</skos:altLabel>
                <skos:altLabel xml:lang="en">Endosprin</skos:altLabel>
                <skos:altLabel xml:lang="en">2-(Acetyloxy)benzoic Acid</skos:altLabel>
                <skos:altLabel xml:lang="en">Easprin</skos:altLabel>
                <skos:altLabel xml:lang="en">Polopiryna</skos:altLabel>
                <skos:altLabel xml:lang="en">Acetysal</skos:altLabel>
                <skos:altLabel xml:lang="en">Colfarit</skos:altLabel>
                <skos:altLabel xml:lang="en">Acylpyrin</skos:altLabel>
                <skos:inScheme rdf:resource="http://www.nlm.nih.gov/mesh/Chemicals_and_Drugs"/>
        </rdf:Description>
        <rdf:Description rdf:about="ppx:98838d58-3650-4c10-8b60-896a663cdca8">
        <ctag:taggingDate>Wed Nov 20 14:33:29 CET 2013</ctag:taggingDate>
        </rdf:Description>
</rdf:RDF>

The result contains three types of resources: the document, the tagging events, and the descriptions of the annotated concepts. The document links to the tagging events (predicate 'ctag:tagged'). Each tagging event contains the label (the 'skos:prefLabel' of the concept) and links to the information about the annotated concept (predicate 'ctag:means').
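The chain from document to tagging event to concept can be followed programmatically. The Python sketch below walks that chain over a shortened copy of the result. It assumes that the real response declares the `rdf` and `ctag` namespaces as shown in the sample (the listing above omits the declarations), and the function name `tagged_concepts` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CTAG = "http://commontag.org/ns#"  # assumption: namespace bound to the ctag prefix

# Shortened copy of the example result, with namespace declarations added.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ctag="http://commontag.org/ns#">
  <rdf:Description rdf:about="SWC:1">
    <ctag:tagged rdf:resource="ppx:98838d58-3650-4c10-8b60-896a663cdca8"/>
  </rdf:Description>
  <rdf:Description rdf:about="ppx:98838d58-3650-4c10-8b60-896a663cdca8">
    <ctag:label xml:lang="en">Aspirin</ctag:label>
    <ctag:means rdf:resource="http://www.nlm.nih.gov/mesh/D001241"/>
  </rdf:Description>
</rdf:RDF>"""


def tagged_concepts(rdf_xml, document_uri):
    """Return (label, concept URI) pairs for every tagging event of a document."""
    root = ET.fromstring(rdf_xml)
    # Group descriptions by subject, since one subject may appear more than once.
    by_subject = {}
    for description in root.findall(f"{{{RDF}}}Description"):
        by_subject.setdefault(description.get(f"{{{RDF}}}about"), []).append(description)

    results = []
    for document in by_subject.get(document_uri, []):
        for tagged in document.findall(f"{{{CTAG}}}tagged"):
            label, concept = None, None
            for event in by_subject.get(tagged.get(f"{{{RDF}}}resource"), []):
                label_node = event.find(f"{{{CTAG}}}label")
                means_node = event.find(f"{{{CTAG}}}means")
                if label_node is not None:
                    label = label_node.text
                if means_node is not None:
                    concept = means_node.get(f"{{{RDF}}}resource")
            results.append((label, concept))
    return results


print(tagged_concepts(SAMPLE, "SWC:1"))
```

For production use, a dedicated RDF library would be more robust than element-level XML parsing, because RDF/XML allows several equivalent serializations of the same graph.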

Web Service Method: Annotate from Text using JSON

Description

[text] Returns the document annotated with extracted concepts and extracted terms in RDF/XML representation.

URL: /extractor/api/annotate/text

Request

Supported Methods

POST

Content-Type

Content-Type: application/json

HTTP Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| CASE_INSENSITIVE_ORDER | Comparator | false | |
| hash | int | false | |
| serialPersistentFields | Array of ObjectStreamField | false | |
| serialVersionUID | long | false | |
| value | Array of char | false | |
| SC_ACCEPTED | int | false | |
| SC_BAD_GATEWAY | int | false | |
| SC_BAD_REQUEST | int | false | |
| SC_CONFLICT | int | false | |
| SC_CONTINUE | int | false | |
| SC_CREATED | int | false | |
| SC_EXPECTATION_FAILED | int | false | |
| SC_FORBIDDEN | int | false | |
| SC_FOUND | int | false | |
| SC_GATEWAY_TIMEOUT | int | false | |
| SC_GONE | int | false | |
| SC_HTTP_VERSION_NOT_SUPPORTED | int | false | |
| SC_INTERNAL_SERVER_ERROR | int | false | |
| SC_LENGTH_REQUIRED | int | false | |
| SC_METHOD_NOT_ALLOWED | int | false | |
| SC_MOVED_PERMANENTLY | int | false | |
| SC_MOVED_TEMPORARILY | int | false | |
| SC_MULTIPLE_CHOICES | int | false | |
| SC_NON_AUTHORITATIVE_INFORMATION | int | false | |
| SC_NOT_ACCEPTABLE | int | false | |
| SC_NOT_FOUND | int | false | |
| SC_NOT_IMPLEMENTED | int | false | |
| SC_NOT_MODIFIED | int | false | |
| SC_NO_CONTENT | int | false | |
| SC_OK | int | false | |
| SC_PARTIAL_CONTENT | int | false | |
| SC_PAYMENT_REQUIRED | int | false | |
| SC_PRECONDITION_FAILED | int | false | |
| SC_PROXY_AUTHENTICATION_REQUIRED | int | false | |
| SC_REQUESTED_RANGE_NOT_SATISFIABLE | int | false | |
| SC_REQUEST_ENTITY_TOO_LARGE | int | false | |
| SC_REQUEST_TIMEOUT | int | false | |
| SC_REQUEST_URI_TOO_LONG | int | false | |
| SC_RESET_CONTENT | int | false | |
| SC_SEE_OTHER | int | false | |
| SC_SERVICE_UNAVAILABLE | int | false | |
| SC_SWITCHING_PROTOCOLS | int | false | |
| SC_TEMPORARY_REDIRECT | int | false | |
| SC_UNAUTHORIZED | int | false | |
| SC_UNSUPPORTED_MEDIA_TYPE | int | false | |
| SC_USE_PROXY | int | false | |

Request Body

TextAnnotateRequest

Annotation request

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| categorizationWithPpxBoost | boolean | false | Use Extractor boosting, default = false |
| categorize | boolean | false | Categorization extraction, default = false |
| conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
| conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
| corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUID) are provided. |
| customAttributeFilters | Array of CustomProperty | false | Custom attribute (property URI and string value) filters |
| customClassFilters | Array of String | false | Custom class URI filters |
| disambiguate | boolean | false | Use thesaurus-based disambiguation, default = false |
| displayText | boolean | false | Include text extracted from URL in response, default = false |
| documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input |
| documentId | String | false | Internal ID of the document, taken from documentUri |
| documentUri | String | true | URI of annotated document, used as ID |
| extractorVersion | String | false | Version of PPX Extractor used |
| filterNestedConcepts | boolean | false | Remove concept matches that are contained within other matches, default = true |
| findPersonNames | boolean | false | Deprecated (use nerParameters) - extracts person names from the given text |
| language | String | false | Extraction language (en, de, es, fr, ...) |
| lemmatization | boolean | false | Use lemmatization, default = true |
| locationExtraction | boolean | false | Deprecated (use nerParameters) - extracts locations from the given text |
| nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
| numberOfConcepts | Integer | false | Number of concepts to retrieve, default = 25 |
| numberOfTerms | Integer | false | Number of terms to retrieve, default = 25 |
| phraseLength | Integer | false | Phrase length, default = 4 |
| projectId | Array of String | false | Thesaurus projectIds |
| properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Set to 'all' to fetch all properties. |
| regexFilename | String | false | File name for regex patterns |
| resultFilterSparql | String | false | Specify an optional SPARQL query for filtering the RDF result |
| sentimentAnalysis | boolean | false | Sentiment analysis, default = false |
| shadowConceptCorpusId | Array of String | false | Shadow concepts calculation. Enabled if corpusIds (UUID) are provided. |
| showMatchingDetails | boolean | false | Shows which concept labels were found inside the text, default = false |
| showMatchingPosition | boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true. Default = false |
| text | String | true | Text of the document |
| tfidfScoring | boolean | false | Use TFIDF scoring, default = false |
| title | String | false | Title of the document |
| useRelatedConcepts | boolean | false | Retrieve related concepts, default = false |
| useTransitiveBroaderConcepts | boolean | false | Retrieve transitive broader concepts, default = false |
| useTransitiveBroaderTopConcepts | boolean | false | Retrieve transitive broader top concepts, default = false |
| useTypes | boolean | false | Retrieve custom types for concepts, default = false |
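A request body for this method might be assembled as in the following Python sketch. It only checks the two attributes marked as required (documentUri and text) and serializes everything else as given. The helper name `build_text_annotate_body` is hypothetical and the attribute values are illustrative.

```python
import json

# Attributes marked "true" in the Required column of the TextAnnotateRequest table.
REQUIRED_ATTRIBUTES = ("documentUri", "text")


def build_text_annotate_body(**attributes):
    """Return the JSON body for POST /extractor/api/annotate/text."""
    missing = [name for name in REQUIRED_ATTRIBUTES if not attributes.get(name)]
    if missing:
        raise ValueError("missing required attributes: " + ", ".join(missing))
    return json.dumps(attributes)


body = build_text_annotate_body(
    documentUri="SWC:1",
    text="Aspirin",
    language="en",
    # projectId is documented as an array, so a list is passed even for one ID.
    projectId=["1DAB156D-F01F-0001-ABCE-16301D4023C0"],
    numberOfConcepts=10,
    showMatchingDetails=True,
)
print(body)
```

The resulting string is sent as the request body with the header Content-Type: application/json.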

Comparator

A Comparator object.

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |

Example

{ }

ObjectStreamField

An ObjectStreamField object.

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| field | Field | false | |
| name | String | false | |
| offset | int | false | |
| signature | String | false | |
| type | Class | false | |
| unshared | boolean | false | |

Web Service Method: Annotate from Text in NIF Format

Description

[text] Returns the document annotated with extracted concepts and extracted terms in NIF format.

URL: /extractor/api/annotate/nif

Request

Supported Methods

POST

GET

Content-Type

Content-Type: application/x-www-form-urlencoded

HTTP Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| includeConcepts | boolean | false | |
| includeNamedEntities | boolean | false | |
| includeTerms | boolean | false | |
| informat | String | false | The format in which the input will be processed: text (default) |
| input | String | true | The input to be processed by the service |
| intype | String | false | Determines how input is accessed or retrieved: direct (default) or url |
| nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
| outformat | String | false | The format in which the output will be serialized: turtle (default), text, json-ld, rdfxml, ntriples, or rdfa |
| phraseLength | Integer | false | Phrase length, default = 4 |
| prefix | String | true | The prefix part of new URIs |
| projectId | Array of String | true | Thesaurus projectId |

NERConfig Object

Named Entity Recognition configuration

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| method | Method | false | Method used for Named Entity Extraction: RULE_BASED or MAXIMUM_ENTROPY (default: MAXIMUM_ENTROPY) |
| type | String | false | Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location |

Example for an NER Configuration
{
  "method" : "RULE_BASED",
  "type" : "https://semantic-web.com/api/type#28577"
}
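A form-encoded call to the NIF method could look roughly like the Python sketch below. It is built but not sent. The host comes from the example URL elsewhere in this document, the prefix value is a placeholder, and sending the projectId array as a repeated form parameter is an assumption that this document does not confirm. How nerParameters is encoded in a form request is not documented here, so it is left out.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: host taken from this document's example URL; replace with your server.
BASE_URL = "http://test.semantic-web.at"


def build_nif_request(text, project_ids, prefix, outformat="turtle"):
    """Build (but do not send) a form-encoded POST for /extractor/api/annotate/nif."""
    form = [
        ("input", text),            # required
        ("prefix", prefix),         # required
        ("outformat", outformat),   # turtle is the documented default
        ("includeConcepts", "true"),
    ]
    # Assumption: array values are sent by repeating the parameter name.
    form += [("projectId", project_id) for project_id in project_ids]
    return urllib.request.Request(
        BASE_URL + "/extractor/api/annotate/nif",
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_nif_request(
    "Aspirin",
    ["1DAB156D-F01F-0001-ABCE-16301D4023C0"],
    prefix="http://example.org/nif/",  # placeholder prefix for new URIs
    outformat="json-ld",
)
print(request.data.decode("ascii"))
```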
Response

Default:

Content-type: text/plain

Status: 200 - Ok

Configure the Response Format

You can change the response format to any of the RDF formats defined here: http://docs.rdf4j.org/javadoc/2.3/org/eclipse/rdf4j/rio/RDFFormat.html

Example Formats:
  • application/rdf+xml

  • application/n-triples

  • application/x-turtle

  • application/trix

  • application/trig

In order to configure the response format, use an additional Accept header in your call.

In an HTTP REST client such as Postman, the call looks like this:

[Screenshot: 23901367.png]

Web Service Method: Annotate from Text in NIF Format in JSON

Description

[text] Returns the document annotated with extracted concepts and extracted terms in NIF format.

URL: /extractor/api/annotate/nif

Request

Supported Methods

POST

Content-Type

Content-Type: application/json

HTTP Parameters

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| includeConcepts | boolean | false | |
| includeNamedEntities | boolean | false | |
| includeTerms | boolean | false | |
| informat | String | false | The format in which the input will be processed: text (default) |
| input | String | true | The input to be processed by the service |
| intype | String | false | Determines how input is accessed or retrieved: direct (default) or url |
| nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
| outformat | String | false | The format in which the output will be serialized: turtle (default), text, json-ld, rdfxml, ntriples, or rdfa |
| phraseLength | Integer | false | Phrase length, default = 4 |
| prefix | String | true | The prefix part of new URIs |
| projectId | Array of String | true | Thesaurus projectId |

NERConfig Object

Named Entity Recognition configuration

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| method | Method | false | Method used for Named Entity Extraction: RULE_BASED or MAXIMUM_ENTROPY (default: MAXIMUM_ENTROPY) |
| type | String | false | Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location |

Example of a request body with an NER configuration
{
  "input" : "some input",
  "nerParameters" : [ {
    "method" : "MAXIMUM_ENTROPY",
    "type" : "https://semantic-web.com/api/type#10113"
  }, {
    "method" : "RULE_BASED",
    "type" : "https://semantic-web.com/api/type#491"
  }, {
    "method" : "MAXIMUM_ENTROPY",
    "type" : "https://semantic-web.com/api/type#16327"
  } ],
  "informat" : "some informat",
  "prefix" : "some prefix",
  "includeTerms" : false,
  "includeNamedEntities" : true,
  "includeConcepts" : false,
  "outformat" : "some outformat",
  "projectId" : [ "some projectId", "some projectId" ],
  "intype" : "some intype"
}
Response

Default:

Content-type: text/plain

Status: 200 - Ok

Configure the Response Format

You can change the response format to any of the RDF formats defined here: http://docs.rdf4j.org/javadoc/2.3/org/eclipse/rdf4j/rio/RDFFormat.html

Example Formats:
  • application/rdf+xml

  • application/n-triples

  • application/x-turtle

  • application/trix

  • application/trig

In order to configure the response format, use an additional Accept header in your call.

In an HTTP REST client such as Postman, the call looks like this:

[Screenshot: 23901367.png]

Web Service Method: Annotate from File

Description

[file] Returns the document annotated with extracted concepts and extracted terms in RDF/XML representation.

URL: /extractor/api/annotate

Request

Supported Methods

POST

Content-Type

Content-Type: multipart/form-data

HTTP Parameter

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| categorizationWithPpxBoost | boolean | false | Use Extractor boosting, default = false |
| categorize | boolean | false | Categorization extraction, default = false |
| conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
| conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
| corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUID) are provided. |
| customAttributeFilters | Array of CustomProperty | false | Custom attribute (property URI and string value) filters |
| customClassFilters | Array of String | false | Custom class URI filters |
| disambiguate | boolean | false | Use thesaurus-based disambiguation, default = false |
| displayText | boolean | false | Include text extracted from URL in response, default = false |
| documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input |
| documentId | String | false | Internal ID of the document, taken from documentUri |
| documentUri | String | true | URI of annotated document, used as ID |
| extractorVersion | String | false | Version of PPX Extractor used |
| file | MultipartFile | true | File to be annotated (Word, Excel, PowerPoint, PDF, Open Document). The MIME type of the request must be 'multipart/form-data'. |
| filterNestedConcepts | boolean | false | Remove concept matches that are contained within other matches, default = true |
| findPersonNames | boolean | false | Deprecated (use nerParameters) - extracts person names from the given text |
| language | String | false | Extraction language (en, de, es, fr, ...) |
| lemmatization | boolean | false | Use lemmatization, default = true |
| locationExtraction | boolean | false | Deprecated (use nerParameters) - extracts locations from the given text |
| nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
| numberOfConcepts | Integer | false | Number of concepts to retrieve, default = 25 |
| numberOfTerms | Integer | false | Number of terms to retrieve, default = 25 |
| phraseLength | Integer | false | Phrase length, default = 4 |
| projectId | Array of String | false | Thesaurus projectIds |
| properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. It also supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to 'all' to fetch all properties. |
| regexFilename | String | false | File name for regex patterns |
| resultFilterSparql | String | false | Specify an optional SPARQL query for filtering the RDF result |
| sentimentAnalysis | boolean | false | Sentiment analysis, default = false |
| shadowConceptCorpusId | Array of String | false | Shadow concepts calculation. Enabled if corpusIds (UUID) are provided. |
| showMatchingDetails | boolean | false | Shows which concept labels were found inside the text, default = false |
| showMatchingPosition | boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true. Default = false |
| tfidfScoring | boolean | false | Use TFIDF scoring, default = false |
| title | String | false | Title of the document |
| useRelatedConcepts | boolean | false | Retrieve related concepts, default = false |
| useTransitiveBroaderConcepts | boolean | false | Retrieve transitive broader concepts, default = false |
| useTransitiveBroaderTopConcepts | boolean | false | Retrieve transitive broader top concepts, default = false |
| useTypes | boolean | false | Retrieve custom types for concepts, default = false |

Custom Property

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| property | String | false | Property |
| value | String | false | Value |

Example of Custom Property Object
{
  "property" : "https://semantic-web.com/api/property#6376",
  "value" : "some value"
}
Named Entity Recognition Configuration

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| method | Method | false | Method used for Named Entity Extraction: RULE_BASED or MAXIMUM_ENTROPY (default: MAXIMUM_ENTROPY) |
| type | String | false | Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location |

Example of NERConfig Object
{
  "method" : "RULE_BASED",
  "type" : "https://semantic-web.com/api/type#20383"
}
ObjectStreamField Object

| Attribute | Type | Required | Comment |
| --- | --- | --- | --- |
| field | Field | false | |
| name | String | false | |
| offset | int | false | |
| signature | String | false | |
| type | Class | false | |
| unshared | boolean | false | |

Example of an ObjectStreamField object.
{
  "field" : {
    "genericInfo" : {
      "factory" : null,
      "tree" : null,
      "genericType" : null
    },
    "declaredAnnotations" : { },
    "overrideFieldAccessor" : { },
    "signature" : "some signature",
    "annotations" : [ 23 ],
    "securityCheckCache" : { },
    "slot" : 4350,
    "fieldAccessor" : { },
    "modifiers" : 27639,
    "type" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 2479,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 17528,
      "initted" : false,
      "cachedConstructor" : null,
      "useCaches" : true,
      "SYNTHETIC" : 28542,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null, null ],
      "serialVersionUID" : 19423,
      "ANNOTATION" : 2206,
      "enumConstants" : [ null, null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    },
    "ACCESS_PERMISSION" : {
      "serialVersionUID" : 23155,
      "name" : "some name"
    },
    "root" : {
      "genericInfo" : null,
      "declaredAnnotations" : { },
      "overrideFieldAccessor" : null,
      "signature" : "some signature",
      "annotations" : [ 87, 51 ],
      "securityCheckCache" : null,
      "slot" : 18207,
      "fieldAccessor" : null,
      "modifiers" : 24703,
      "type" : null,
      "ACCESS_PERMISSION" : null,
      "root" : null,
      "name" : "some name",
      "override" : true,
      "reflectionFactory" : null,
      "clazz" : null
    },
    "name" : "some name",
    "override" : true,
    "reflectionFactory" : {
      "inflationThreshold" : 28477,
      "initted" : false,
      "soleInstance" : null,
      "reflectionFactoryAccessPerm" : null,
      "langReflectAccess" : null,
      "noInflation" : false
    },
    "clazz" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 30581,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 12111,
      "initted" : false,
      "cachedConstructor" : null,
      "useCaches" : true,
      "SYNTHETIC" : 27304,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null ],
      "serialVersionUID" : 24089,
      "ANNOTATION" : 3326,
      "enumConstants" : [ null, null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    }
  },
  "offset" : 11522,
  "signature" : "some signature",
  "unshared" : false,
  "name" : "some name",
  "type" : {
    "annotationData" : {
      "declaredAnnotations" : { },
      "redefinedCount" : 11463,
      "annotations" : { }
    },
    "genericInfo" : {
      "factory" : null,
      "superclass" : null,
      "tree" : null,
      "typeParams" : [ null, null, null ],
      "NONE" : null,
      "superInterfaces" : [ null, null ]
    },
    "ENUM" : 2206,
    "enumConstantDirectory" : { },
    "classRedefinedCount" : 8783,
    "initted" : false,
    "cachedConstructor" : {
      "genericInfo" : null,
      "declaredAnnotations" : { },
      "hasRealParameterData" : false,
      "parameterTypes" : [ null, null ],
      "signature" : "some signature",
      "annotations" : [ 30 ],
      "securityCheckCache" : null,
      "constructorAccessor" : null,
      "slot" : 25006,
      "modifiers" : 3408,
      "ACCESS_PERMISSION" : null,
      "exceptionTypes" : [ null ],
      "root" : null,
      "override" : false,
      "parameterAnnotations" : [ 71, 121 ],
      "reflectionFactory" : null,
      "clazz" : null,
      "parameters" : [ null ]
    },
    "useCaches" : true,
    "SYNTHETIC" : 7276,
    "annotationType" : {
      "inherited" : true,
      "members" : { },
      "memberDefaults" : { },
      "$assertionsDisabled" : false,
      "memberTypes" : { },
      "retention" : "RUNTIME"
    },
    "newInstanceCallerCache" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 30429,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 13473,
      "initted" : true,
      "cachedConstructor" : null,
      "useCaches" : true,
      "SYNTHETIC" : 5278,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null, null ],
      "serialVersionUID" : 18766,
      "ANNOTATION" : 3482,
      "enumConstants" : [ null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    },
    "reflectionData" : {
      "next" : null,
      "discovered" : null,
      "referent" : null,
      "pending" : null,
      "lock" : null,
      "clock" : 1663,
      "queue" : null,
      "timestamp" : 29342
    },
    "classValueMap" : { },
    "serialPersistentFields" : [ {
      "field" : null,
      "offset" : 20136,
      "signature" : "some signature",
      "unshared" : true,
      "name" : "some name",
      "type" : null
    } ],
    "serialVersionUID" : 7837,
    "ANNOTATION" : 12014,
    "enumConstants" : [ { }, { } ],
    "name" : "some name",
    "reflectionFactory" : {
      "inflationThreshold" : 192,
      "initted" : false,
      "soleInstance" : null,
      "reflectionFactoryAccessPerm" : null,
      "langReflectAccess" : null,
      "noInflation" : false
    },
    "allPermDomain" : {
      "staticPermissions" : false,
      "debug" : null,
      "hasAllPerm" : true,
      "codesource" : null,
      "permissions" : null,
      "classloader" : null,
      "principals" : [ null, null ],
      "key" : null
    }
  }
}
Response

This method returns execution results in the format application/rdf+xml.

Configure the Response Format

You can change the response format to any of the RDF formats defined here: http://docs.rdf4j.org/javadoc/2.3/org/eclipse/rdf4j/rio/RDFFormat.html

Example Formats:
  • application/rdf+xml

  • application/n-triples

  • application/x-turtle

  • application/trix

  • application/trig

In order to configure the response format, use an additional Accept header in your call.

In an HTTP REST client such as Postman, the call looks like this:

[Screenshot: 23901367.png]
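For the file upload, the request body has to be encoded as multipart/form-data. The Python sketch below assembles such a body by hand with the standard library and sets the Accept header at the same time. The helper name `build_file_annotate_request`, the file name, and the placeholder bytes are hypothetical, and the host is borrowed from the example URL elsewhere in this document. The request is built but not sent.

```python
import uuid
import urllib.request

# Assumption: host taken from this document's example URL; replace with your server.
BASE_URL = "http://test.semantic-web.at"


def build_file_annotate_request(filename, content, fields, accept="application/rdf+xml"):
    """Build (but do not send) a multipart/form-data POST for /extractor/api/annotate."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # Plain form fields such as documentUri or language.
    for name, value in fields.items():
        parts += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            str(value).encode("utf-8"),
        ]
    # The uploaded document goes into the required "file" parameter.
    parts += [
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        delimiter + b"--",
        b"",
    ]
    return urllib.request.Request(
        BASE_URL + "/extractor/api/annotate",
        data=b"\r\n".join(parts),
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": accept,
        },
        method="POST",
    )


request = build_file_annotate_request(
    "report.pdf",
    b"%PDF-1.4 placeholder bytes",
    {"documentUri": "SWC:1", "language": "en"},
    accept="application/x-turtle",
)
print(request.get_header("Content-type"))
```

In practice the file content would be read from disk in binary mode, and an HTTP library with built-in multipart support would remove the need to assemble the body manually.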

Web Service Method: Annotate from URL

Description

[url] Returns the document annotated with extracted concepts and extracted terms in RDF/XML representation.

URL: /extractor/api/annotate

Request

Supported Methods

POST

GET

Content-Type

Content-Type: application/x-www-form-urlencoded

HTTP Parameter

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| categorizationWithPpxBoost | boolean | false | Use Extractor boosting, default = false |
| categorize | boolean | false | Categorization extraction, default = false |
| conceptMinimumScore | Double | false | Minimum required score of concepts, default = 0 |
| conceptSchemeFilters | Array of String | false | Concept scheme URI filters |
| corpusScoring | Array of String | false | Corpus term scoring. Enabled if corpusIds (UUID) are provided. |
| customAttributeFilters | Array of CustomProperty | false | Custom attribute (property URI and string value) filters |
| customClassFilters | Array of String | false | Custom class URI filters |
| disambiguate | boolean | false | Use thesaurus-based disambiguation, default = false |
| displayText | boolean | false | Include text extracted from URL in response, default = false |
| documentClassifierIds | Array of String | false | Enable document classification by giving the document classifier IDs as input |
| documentId | String | false | Internal ID of the document, taken from documentUri |
| documentUri | String | true | URI of annotated document, used as ID |
| extractorVersion | String | false | Version of PPX Extractor used |
| filterNestedConcepts | boolean | false | Remove concept matches that are contained within other matches, default = true |
| findPersonNames | boolean | false | Deprecated (use nerParameters) - extracts person names from the given text |
| language | String | false | Extraction language (en, de, es, fr, ...) |
| lemmatization | boolean | false | Use lemmatization, default = true |
| locationExtraction | boolean | false | Deprecated (use nerParameters) - extracts locations from the given text |
| nerParameters | Array of NERConfig | false | Array of models that are used for Named Entity Recognition |
| numberOfConcepts | Integer | false | Number of concepts to retrieve, default = 25 |
| numberOfTerms | Integer | false | Number of terms to retrieve, default = 25 |
| phraseLength | Integer | false | Phrase length, default = 4 |
| projectId | Array of String | false | Thesaurus projectIds |
| properties | Array of String | false | Array of custom class attributes and relations that will be fetched by providing their property URIs as input. It also supports http://www.w3.org/1999/02/22-rdf-syntax-ns#type. Set to 'all' to fetch all properties. |
| regexFilename | String | false | File name for regex patterns |
| resultFilterSparql | String | false | Specify an optional SPARQL query for filtering the RDF result |
| sentimentAnalysis | boolean | false | Sentiment analysis, default = false |
| shadowConceptCorpusId | Array of String | false | Shadow concepts calculation. Enabled if corpusIds (UUID) are provided. |
| showMatchingDetails | boolean | false | Shows which concept labels were found inside the text, default = false |
| showMatchingPosition | boolean | false | Shows the position of the matched text. Only shown if showMatchingDetails = true. Default = false |
| tfidfScoring | boolean | false | Use TFIDF scoring, default = false |
| title | String | false | Title of the document |
| url | String | true | URL of a web document to be annotated |
| useRelatedConcepts | boolean | false | Retrieve related concepts, default = false |
| useTransitiveBroaderConcepts | boolean | false | Retrieve transitive broader concepts, default = false |
| useTransitiveBroaderTopConcepts | boolean | false | Retrieve transitive broader top concepts, default = false |
| useTypes | boolean | false | Retrieve custom types for concepts, default = false |

Array of CustomProperty

Attribute

Type

Required

Comment

property

String

false

Property

value

String

false

Value

Example
{
  "property" : "https://semantic-web.com/api/property#6376",
  "value" : "some value"
}
Array of NERConfig

Named Entity Recognition Configuration

Attribute

Type

Required

Comment

method

Method

false

Method used for Named Entity Extraction: RULE_BASED | MAXIMUM_ENTROPY, default = MAXIMUM_ENTROPY

type

String

false

Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location

Example
{
  "method" : "RULE_BASED",
  "type" : "https://semantic-web.com/api/type#20383"
}
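
A nerParameters array can be assembled from the two attributes described above. The following is a minimal sketch: the Method enum and the ner_config helper are hypothetical names introduced for illustration, and the only values used are the two methods and the three pre-defined MAXIMUM_ENTROPY models named in the table.

```python
from enum import Enum


class Method(str, Enum):
    """The two extraction methods listed in the NERConfig table."""
    RULE_BASED = "RULE_BASED"
    MAXIMUM_ENTROPY = "MAXIMUM_ENTROPY"


def ner_config(model_type, method=Method.MAXIMUM_ENTROPY):
    """Build one NERConfig entry; raises ValueError for an unknown method."""
    return {"method": Method(method).value, "type": model_type}


# One entry per pre-defined MAXIMUM_ENTROPY model, using the default method.
ner_parameters = [ner_config(t) for t in ("person", "organization", "location")]
print(ner_parameters)
```

Passing the method as a plain string such as "RULE_BASED" also works, because the enum is looked up by value.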
Array of ObjectStreamField Object

An ObjectStreamField object.

Attribute

Type

Required

Comment

field

Field

false

name

String

false

offset

int

false

signature

String

false

type

Class

false

unshared

boolean

false

Example

{
  "field" : {
    "genericInfo" : {
      "factory" : null,
      "tree" : null,
      "genericType" : null
    },
    "declaredAnnotations" : { },
    "overrideFieldAccessor" : { },
    "signature" : "some signature",
    "annotations" : [ 23 ],
    "securityCheckCache" : { },
    "slot" : 4350,
    "fieldAccessor" : { },
    "modifiers" : 27639,
    "type" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 2479,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 17528,
      "initted" : false,
      "cachedConstructor" : null,
      "useCaches" : true,
      "SYNTHETIC" : 28542,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null, null ],
      "serialVersionUID" : 19423,
      "ANNOTATION" : 2206,
      "enumConstants" : [ null, null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    },
    "ACCESS_PERMISSION" : {
      "serialVersionUID" : 23155,
      "name" : "some name"
    },
    "root" : {
      "genericInfo" : null,
      "declaredAnnotations" : { },
      "overrideFieldAccessor" : null,
      "signature" : "some signature",
      "annotations" : [ 87, 51 ],
      "securityCheckCache" : null,
      "slot" : 18207,
      "fieldAccessor" : null,
      "modifiers" : 24703,
      "type" : null,
      "ACCESS_PERMISSION" : null,
      "root" : null,
      "name" : "some name",
      "override" : true,
      "reflectionFactory" : null,
      "clazz" : null
    },
    "name" : "some name",
    "override" : true,
    "reflectionFactory" : {
      "inflationThreshold" : 28477,
      "initted" : false,
      "soleInstance" : null,
      "reflectionFactoryAccessPerm" : null,
      "langReflectAccess" : null,
      "noInflation" : false
    },
    "clazz" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 30581,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 12111,
      "initted" : false,
      "cachedConstructor" : null,
      "useCaches" : true,
      "SYNTHETIC" : 27304,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null ],
      "serialVersionUID" : 24089,
      "ANNOTATION" : 3326,
      "enumConstants" : [ null, null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    }
  },
  "offset" : 11522,
  "signature" : "some signature",
  "unshared" : false,
  "name" : "some name",
  "type" : {
    "annotationData" : {
      "declaredAnnotations" : { },
      "redefinedCount" : 11463,
      "annotations" : { }
    },
    "genericInfo" : {
      "factory" : null,
      "superclass" : null,
      "tree" : null,
      "typeParams" : [ null, null, null ],
      "NONE" : null,
      "superInterfaces" : [ null, null ]
    },
    "ENUM" : 2206,
    "enumConstantDirectory" : { },
    "classRedefinedCount" : 8783,
    "initted" : false,
    "cachedConstructor" : {
      "genericInfo" : null,
      "declaredAnnotations" : { },
      "hasRealParameterData" : false,
      "parameterTypes" : [ null, null ],
      "signature" : "some signature",
      "annotations" : [ 30 ],
      "securityCheckCache" : null,
      "constructorAccessor" : null,
      "slot" : 25006,
      "modifiers" : 3408,
      "ACCESS_PERMISSION" : null,
      "exceptionTypes" : [ null ],
      "root" : null,
      "override" : false,
      "parameterAnnotations" : [ 71, 121 ],
      "reflectionFactory" : null,
      "clazz" : null,
      "parameters" : [ null ]
    },
    "useCaches" : true,
    "SYNTHETIC" : 7276,
    "annotationType" : {
      "inherited" : true,
      "members" : { },
      "memberDefaults" : { },
      "$assertionsDisabled" : false,
      "memberTypes" : { },
      "retention" : "RUNTIME"
    },
    "newInstanceCallerCache" : {
      "annotationData" : null,
      "genericInfo" : null,
      "ENUM" : 30429,
      "enumConstantDirectory" : { },
      "classRedefinedCount" : 13473,
      "initted" : true,
      "cachedConstructor" : null,
      "useCaches" : true,
      "SYNTHETIC" : 5278,
      "annotationType" : null,
      "newInstanceCallerCache" : null,
      "reflectionData" : null,
      "classValueMap" : { },
      "serialPersistentFields" : [ null, null, null ],
      "serialVersionUID" : 18766,
      "ANNOTATION" : 3482,
      "enumConstants" : [ null ],
      "name" : "some name",
      "reflectionFactory" : null,
      "allPermDomain" : null
    },
    "reflectionData" : {
      "next" : null,
      "discovered" : null,
      "referent" : null,
      "pending" : null,
      "lock" : null,
      "clock" : 1663,
      "queue" : null,
      "timestamp" : 29342
    },
    "classValueMap" : { },
    "serialPersistentFields" : [ {
      "field" : null,
      "offset" : 20136,
      "signature" : "some signature",
      "unshared" : true,
      "name" : "some name",
      "type" : null
    } ],
    "serialVersionUID" : 7837,
    "ANNOTATION" : 12014,
    "enumConstants" : [ { }, { } ],
    "name" : "some name",
    "reflectionFactory" : {
      "inflationThreshold" : 192,
      "initted" : false,
      "soleInstance" : null,
      "reflectionFactoryAccessPerm" : null,
      "langReflectAccess" : null,
      "noInflation" : false
    },
    "allPermDomain" : {
      "staticPermissions" : false,
      "debug" : null,
      "hasAllPerm" : true,
      "codesource" : null,
      "permissions" : null,
      "classloader" : null,
      "principals" : [ null, null ],
      "key" : null
    }
  }
}
Response

Content-Type

text/plain

Status: 200 - OK

This method returns execution results in application/rdf+xml format.

Configure the Response Format

You can change the response format to any RDF format defined in the RDF4J RDFFormat documentation: http://docs.rdf4j.org/javadoc/2.3/org/eclipse/rdf4j/rio/RDFFormat.html

Example Formats:
  • application/rdf+xml
  • application/n-triples
  • application/x-turtle
  • application/trix
  • application/trig

To configure the response format, add an Accept header to your call.

In an HTTP REST client such as Postman, set the Accept header to the media type of the format you want returned.

(Screenshot of the call in Postman: 23901367.png)
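
Outside a GUI client, the same Accept header can be set in code. The sketch below uses Python's standard urllib and only builds the request without sending it. The endpoint URL is a placeholder, build_request is a hypothetical helper, and sending the parameters as a JSON body with Content-Type application/json is an assumption based on the request description of the service.

```python
import json
import urllib.request


def build_request(endpoint, params, accept="application/rdf+xml"):
    """Build a POST request that asks for the given RDF media type."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(params).encode("utf-8"),
        headers={
            "Content-Type": "application/json",  # assumed request format
            "Accept": accept,                    # selects the response format
        },
        method="POST",
    )


request = build_request(
    "https://example.org/your-annotation-endpoint",  # placeholder, not a real path
    {"url": "https://example.org/article.html"},
    accept="application/x-turtle",
)
print(request.get_method(), request.get_header("Accept"))
```

Calling urllib.request.urlopen(request) would send the call; authentication and error handling are left out of this sketch.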