Web Service Method: Extract from URL

Description

Extracts and returns meaningful metadata, such as concepts and terms, from the content found at a given URL.

URL: /extractor/api/extract

Request

Supported Methods

POST

GET

Content-Type

application/x-www-form-urlencoded

HTTP Parameters

Parameter

Type

Required

Description

categorizationWithPpxBoost

boolean

false

Use Extractor boosting, default = false

categorize

boolean

false

Categorization extraction, default = false

conceptMinimumScore

Double

false

Minimum required score of concepts, default = 0

conceptSchemeFilters

Array of String

false

Concept scheme URI filters

corpusScoring

Array of String

false

Corpus term scoring. Enabled if corpusIds (UUID) are provided

customAttributeFilters

Array of CustomProperty

false

Custom attribute (property uri and string value) filters

customClassFilters

Array of String

false

Custom class URI filters

disambiguate

boolean

false

Use thesaurus based disambiguation, default = false

displayText

boolean

false

Include text extracted from url in response, default = false

documentClassifierIds

Array of String

false

Enable document classification by giving the document classifier IDs as input

documentId

String

false

Internal ID of the document

extraConceptLanguages

Array of PPLocale

false

Additional languages used for concept extraction (en|de|es|fr|...). Also supports the wildcard * for all languages.

extractorVersion

String

false

Version of PPX Extractor used

filterNestedConcepts

boolean

false

Remove concept matches that are contained within other matches, default = false

findPersonNames

boolean

false

Deprecated (use nerParameters) - extracts person names from the given text

language

PPLocale

false

Extraction language (en|de|es|fr|...)

lemmatization

boolean

false

Use lemmatization, default = true

locationExtraction

boolean

false

Deprecated (use nerParameters) - extracts locations from the given text

nerParameters

Array of NERConfig

false

Array of models that are used for Named Entity Recognition

numberOfConcepts

Integer

false

Retrieve number of concepts, default = 25

numberOfTerms

Integer

false

Retrieve number of terms, default = 25

phraseLength

Integer

false

Phrase length, default = 4

projectId

Array of String

false

Thesaurus projectIds

properties

Array of String

false

Array of custom class attributes and relations that will be fetched by providing their property URIs as input. Set to all to fetch all properties.

regexFilename

String

false

File name for regex patterns

sentimentAnalysis

boolean

false

Sentiment analysis, default = false

shadowConceptCorpusId

Array of String

false

Shadow concepts calculation. Enabled if corpusIds (UUID) are provided

showMatchingDetails

boolean

false

Shows which concept labels were found inside the text, default = false

showMatchingPosition

boolean

false

Shows the position of the matched text. Only takes effect if showMatchingDetails = true, default = false

tfidfScoring

boolean

false

Use TFIDF scoring

title

String

false

Title of the document

url

String

true

URL to be extracted

useRelatedConcepts

boolean

false

Retrieve related concepts, default = false

useTransitiveBroaderConcepts

boolean

false

Retrieve transitive broader concepts, default = false

useTransitiveBroaderTopConcepts

boolean

false

Retrieve transitive broader top concepts, default = false

useTypes

boolean

false

Retrieve custom types for concepts, default = false
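
The parameters above are sent as application/x-www-form-urlencoded fields. The following Python sketch shows one way such a request body could be assembled. It assumes that array-valued parameters (for example projectId) are sent as repeated fields and that booleans are written as lowercase true/false; neither convention is stated on this page, so verify both against your server. The helper name build_extract_form and the sample values are hypothetical.

```python
from urllib.parse import urlencode


def build_extract_form(url, **params):
    """Encode parameters for /extractor/api/extract as a form body.

    Assumption: list values become repeated fields and booleans are
    serialized as lowercase "true"/"false".
    """
    fields = [("url", url)]  # url is the only required parameter
    for name, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if isinstance(item, bool):
                item = "true" if item else "false"
            fields.append((name, str(item)))
    return urlencode(fields)


body = build_extract_form(
    "https://en.wikipedia.org/wiki/Artificial_intelligence",
    language="en",
    numberOfConcepts=10,
    showMatchingDetails=True,
    projectId=["hypothetical-project-uuid"],  # placeholder, not a real ID
)
print(body)
```

Parameters that are left out of the call are simply not sent, so the server-side defaults listed in the table apply.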

Custom Property

Attribute

Type

Required

Comment

property

String

false

Property

value

String

false

Value

PPLocale

A PPLocale object

Attribute

Type

Comment

ALL_LANGUAGES

PPLocale

DUTCH

PPLocale

ENGLISH

PPLocale

FRENCH

PPLocale

GERMAN

PPLocale

RUSSIAN

PPLocale

SPANISH

PPLocale

VALID

PPLocale

country

String

language

String

languageTag

String

NERConfig

Named Entity Recognition configuration

Attribute

Type

Required

Comment

classUri

String

false

Class URI given to identified Named Entities

method

Method

false

Method used for Named Entity Extraction: RULE_BASED | MAXIMUM_ENTROPY (default: MAXIMUM_ENTROPY)

type

String

false

Type of Named Entity Model. Pre-defined models for MAXIMUM_ENTROPY: person, organization, location

Example of a Named Entity Recognition configuration:

{
  "classUri": "some classUri",
  "method": "RULE_BASED",
  "type": "https://semantic-web.com/api/type#13359"
}
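
As an illustration of the NERConfig attributes, the sketch below builds one configuration entry per pre-defined MAXIMUM_ENTROPY model (person, organization, location). It assumes that the nerParameters value is transmitted as a JSON-encoded array, which this page does not confirm; the helper name ner_config is hypothetical.

```python
import json

VALID_METHODS = {"RULE_BASED", "MAXIMUM_ENTROPY"}


def ner_config(type_, method="MAXIMUM_ENTROPY", class_uri=None):
    """Return one NERConfig entry as a plain dict."""
    if method not in VALID_METHODS:
        raise ValueError(f"unknown NER method: {method}")
    config = {"method": method, "type": type_}
    if class_uri is not None:  # classUri is optional
        config["classUri"] = class_uri
    return config


# One entry per pre-defined MAXIMUM_ENTROPY model named in the table above.
ner_parameters = [ner_config(t) for t in ("person", "organization", "location")]

# Assumption: the server accepts the array as a JSON string.
ner_json = json.dumps(ner_parameters)
print(ner_json)
```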

Example cURL Request

curl --location 'https://docu.semantic-web.at/extractor/api/extract' \
--header 'Authorization: Basic e3t1c2VybmFtZX19Ont7cGFzc3dvcmR9fQ==' \
--header 'Cookie: JSESSIONID=92B9D4DDDE6465C32A1F7B37B666D4B3' \
--form 'projectId="{{project_id}}"' \
--form 'url="https://en.wikipedia.org/wiki/Artificial_intelligence"' \
--form 'language="en"' \
--form 'showMatchingDetails="true"' \
--form 'displayText="true"'

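
For readers who prefer Python over cURL, a roughly equivalent request could be prepared with the standard library as sketched below. The request is only constructed, not sent. The sketch uses the form-urlencoded Content-Type listed in the Request section and HTTP Basic authentication as in the cURL example; the function name make_extract_request is hypothetical, and the {{...}} placeholders must be replaced with real values.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def make_extract_request(base_url, username, password, fields):
    """Build (but do not send) a POST request for the extract method."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(
        base_url.rstrip("/") + "/extractor/api/extract",
        data=urlencode(fields).encode("ascii"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


request = make_extract_request(
    "https://docu.semantic-web.at",
    "{{username}}",
    "{{password}}",
    {
        "projectId": "{{project_id}}",
        "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "language": "en",
        "showMatchingDetails": "true",
        "displayText": "true",
    },
)
# To send it: urllib.request.urlopen(request), then parse the JSON body.
```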
Response

Returns

Content-Type: application/json

Arrays of Response Attributes

ExtractionResponse

Results of a text extraction request. Properties with no entries are omitted from the response.

Attribute

Type

Comment

categories

Array of Category

Categories of the document

classificationResults

Array of DocumentClassification

Document classification results

concepts

Array of ThesaurusConcept

Matched concepts

detectedLanguage

PPLocale

Detected Language of the document

extractedTerms

Array of ExtractedTerm

Extracted free terms

locations

Array of Location

Matched locations

namedEntities

Array of NamedEntityResponse

Named Entities

personNames

Array of String

Deprecated

regexMatches

Array of RegexMatches

Regex token matches

sentiments

Array of Sentiment

Matched sentiments

shadowConcepts

Array of ShadowConceptResponse

Shadow Concepts

text

String

Text as extracted from url or file

title

String

Title as extracted from url or file
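
Because properties with no entries are left out of the response, client code should not assume that every attribute is present. The sketch below counts the entries of each list attribute and treats missing keys as empty. The sample payload is hypothetical and much smaller than a real response.

```python
import json

# Hypothetical response body; real responses carry more attributes.
sample = json.loads("""
{
  "title": "Artificial intelligence",
  "concepts": [
    {"uri": "https://example.org/c/1", "score": 42.0},
    {"uri": "https://example.org/c/2", "score": 87.5}
  ],
  "extractedTerms": [{"textValue": "machine learning", "score": 63.0}]
}
""")

LIST_ATTRIBUTES = (
    "categories", "classificationResults", "concepts", "extractedTerms",
    "locations", "namedEntities", "regexMatches", "sentiments",
    "shadowConcepts",
)


def summarize(response):
    """Count entries per list attribute, treating absent keys as empty."""
    return {key: len(response.get(key, [])) for key in LIST_ATTRIBUTES}


summary = summarize(sample)
print(summary["concepts"], summary["locations"])  # prints: 2 0
```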

Category

Categorization result

Attribute

Type

Comment

categoryConceptResults

Array of ConceptCategory

Categorized concepts

prefLabel

String

Preferred label

score

double

Score from 0.0 to 100.0

uri

String

Category URI

ConceptCategory

Categorized concept

Attribute

Type

Comment

prefLabel

String

Preferred label

score

double

Score from 0.0 to 100.0

uri

String

URI

DocumentClassification

A DocumentClassification object.

Attribute

Type

Comment

predictedLabel

String

Predicted label

probabilities

Array of Double

Probabilities

uri

String

URI of the classifier

ThesaurusConcept

Concept from a PoolParty thesaurus project

Attribute

Type

Comment

altLabels

Map of PPLocale

Alternative labels

broaderConcepts

Array of String

URIs of all direct broader concepts

conceptSchemes

Array of ThesaurusConceptScheme

The concept schemes in which this concept resides

corporaScore

double

Relevance score - e.g. when extracted from a text.

customAttributes

Array of CustomAttribute

Custom attributes

customRelations

Array of CustomRelation

Custom relations

customSchemeTypes

Array of CustomSchemeType

URIs of the custom types assigned to the concept

frequencyInDocument

int

Frequency of the concept in the text

frequencyInDocuments

int

Frequency of the concept in the documents

hiddenLabels

Map of PPLocale

Hidden labels

id

String

Concept id

languages

Array of PPLocale

Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept.

matchingLabels

Array of MatchingLabel

Matching labels

prefLabels

Map of PPLocale

Preferred label

project

String

UUID of the containing PoolParty project.

relatedConcepts

Array of String

URIs of all related concepts

score

double

Normalized relevance score - e.g. when extracted from a text.

transitiveBroaderConcepts

Array of String

URIs of all transitive broader concepts

transitiveBroaderTopConcepts

Array of String

URIs of all top concepts that this concept is connected to via a transitive broader-chain.

uri

String

Uniform resource identifier

wordForms

Array of String

Lemmatized word forms
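
A common client-side step is to rank the returned concepts by their normalized score. The sketch below filters and sorts a list of ThesaurusConcept objects using only the uri, score, and frequencyInDocument attributes from the table above. The function name top_concepts and the sample concepts are hypothetical.

```python
def top_concepts(concepts, minimum_score=0.0, limit=25):
    """Return (uri, score, frequencyInDocument) tuples, best score first."""
    kept = [c for c in concepts if c.get("score", 0.0) >= minimum_score]
    kept.sort(key=lambda c: c.get("score", 0.0), reverse=True)
    return [
        (c["uri"], c.get("score", 0.0), c.get("frequencyInDocument", 0))
        for c in kept[:limit]
    ]


# Hypothetical concepts as they might appear in the "concepts" array.
concepts = [
    {"uri": "https://example.org/c/1", "score": 42.0, "frequencyInDocument": 3},
    {"uri": "https://example.org/c/2", "score": 87.5, "frequencyInDocument": 9},
    {"uri": "https://example.org/c/3", "score": 12.0, "frequencyInDocument": 1},
]

ranked = top_concepts(concepts, minimum_score=20.0)
for uri, score, frequency in ranked:
    print(f"{score:6.1f}  {frequency:3d}  {uri}")
```

The same filtering can be requested from the server with the conceptMinimumScore and numberOfConcepts parameters; doing it on the client is useful when one response feeds several views with different thresholds.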

ThesaurusConceptScheme

ConceptScheme from a PoolParty thesaurus project - acts as a container for concepts

Attribute

Type

Comment

title

String

The localized title of this concept scheme

uri

String

Uniform resource identifier

CustomAttribute

Custom attribute

Attribute

Type

Comment

literal

Literal

Literal

property

String

Property

CustomRelation

Custom Relation

Attribute

Type

Comment

object

String

Object

property

String

Property

CustomSchemeType

(PoolParty) concept scheme - acts as a container for concepts

Attribute

Type

Comment

title

String

The name of this custom scheme type

uri

String

Uniform resource identifier

ExtractedTerm

Phrase extracted from a text that does not match any concepts

Attribute

Type

Comment

corporaScore

Double

Corpora score

frequencyInDocument

int

Frequency within the document where it was extracted

frequencyInDocuments

int

Frequency within the documents where it was extracted

score

Double

Relevance score

textValue

String

The term phrase

Location

A geographical location extracted from a text.

Attribute

Type

Comment

countryCode

String

ISO 3166-1 alpha-2 country code

latitude

float

Latitude

longitude

float

Longitude

matchedLabel

String

The location label that was found in the text.

name

String

Common name of the location.

score

Double

Relevance score

type

LocationType

Location type, either city or country: City | Country

uri

String

Uniform resource identifier of the location

NamedEntityResponse

Named Entity

Attribute

Type

Comment

frequency

int

Frequency in document

metadata

Map of String

Metadata

method

String

Method

positions

Array of SimpleTokenPosition

Position

score

double

Score

textValue

String

Matched text

type

String

Type
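
To illustrate how NamedEntityResponse objects might be consumed, the sketch below groups matched texts by entity type and lists the most frequent matches first. It relies only on the type, textValue, and frequency attributes from the table above. The function name entities_by_type and the sample entities are hypothetical.

```python
from collections import defaultdict


def entities_by_type(named_entities):
    """Map each entity type to its matched texts, most frequent first."""
    grouped = defaultdict(list)
    ordered = sorted(
        named_entities, key=lambda e: e.get("frequency", 0), reverse=True
    )
    for entity in ordered:
        grouped[entity["type"]].append(entity["textValue"])
    return dict(grouped)


# Hypothetical entries as they might appear in the "namedEntities" array.
named_entities = [
    {"type": "person", "textValue": "Alan Turing", "frequency": 4},
    {"type": "location", "textValue": "Vienna", "frequency": 1},
    {"type": "person", "textValue": "Ada Lovelace", "frequency": 7},
]

grouped = entities_by_type(named_entities)
print(grouped["person"])  # prints: ['Ada Lovelace', 'Alan Turing']
```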

RegexMatches

Regex match

Attribute

Type

Comment

regexMatches

Array of String

Tokens from the input text that match the regex pattern

regexPattern

String

The original pattern used to match

Sentiment

Sentiment result

Attribute

Type

Comment

negativeTerms

Array of String

List of negative terms

positiveTerms

Array of String

List of positive terms

score

float

Score

sentiment

String

Sentiment

ShadowConceptResponse

Shadow Concept

Attribute

Type

Comment

altLabels

Map of PPLocale

Alternative labels

broaderConcepts

Array of String

URIs of all direct broader concepts

conceptSchemes

Array of ThesaurusConceptScheme

The concept schemes in which this concept resides

corporaScore

Double

Relevance score - e.g. when extracted from a text

customAttributes

Array of CustomAttribute

Custom attributes

customRelations

Array of CustomRelation

Custom relations

customSchemeTypes

Array of CustomSchemeType

URIs of the custom types assigned to the concept

hiddenLabels

Map of PPLocale

Hidden labels

id

String

Concept id

languages

Array of PPLocale

Language of the prefLabel, altLabels and hiddenLabels of this localized view of the concept

prefLabels

Map of PPLocale

Preferred label

project

String

UUID of the containing PoolParty project

relatedConcepts

Array of String

URIs of all related concepts

score

double

Normalized relevance score - e.g. when extracted from a text

shadowConceptTerms

Array of ShadowTerm

Extracted terms that contribute to calculation of the shadow concept

transitiveBroaderConcepts

Array of String

URIs of all transitive broader concepts

transitiveBroaderTopConcepts

Array of String

URIs of all top concepts that this concept is connected to via a transitive broader-chain

uri

String

Uniform resource identifier

ShadowTerm

Phrase extracted from a text that does not match any concepts

Attribute

Type

Comment

score

double

Relevance score

textValue

String

The term phrase

200 Response Example