Special Extractor Functionalities

Regular Expression Matching

The PPX extractor can match user-defined regular expressions in text. The regular expressions are defined in a file on the PoolParty server (see PoolParty Directory Structure Linux) and use Java syntax (see http://docs.oracle.com/javase/9/docs/api/java/util/regex/Pattern.html).

Examples

Regular expression:

\b(B|BA|BL|BM|BN|BR|BZ|DL|DO|E|EF|EU|FB|FE|FF|FK|FR|G|GB|GD|GF|GM|GR|GS|GU|HA|HB|HE|HL|HO|I|IL|IM|JE|JO|JU|K|KB|KF|KI|KL|KO|KR|KS|KU|L|LA|LB|LE|LF|LI|LL|LN|LZ|MA|MD|ME|MI|MU|MZ|ND|NK|OP|OW|P|PE|PL|RA|RE|RI|RO|S|SB|SD|SE|SL|SP|SR|SV|SW|SZ|TA|TU|UU|VB|VI|VK|VL|VO|W|WB|WE|WL|WN|WO|WT|WU|WY|WZ|ZE|ZT)[- -]?\d[\dA-Z]{2,4}[A-Z]\b

This regular expression matches Austrian license plates.
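The pattern uses only constructs that Java and Python regular expressions share, so it can be tried locally before it is stored on the server. The following is a minimal Python sketch and not part of PoolParty: it assumes a shortened district list (KI, W, SW, P, ME, WN) for readability, and `find_plates` is a hypothetical helper name.

```python
import re

# Shortened version of the pattern above: only a few district codes are kept
# (an assumption for readability; the full alternation works the same way).
PLATE_PATTERN = re.compile(r"\b(KI|W|SW|P|ME|WN)[- -]?\d[\dA-Z]{2,4}[A-Z]\b")


def find_plates(text: str) -> list[str]:
    """Return every substring of text that matches the plate pattern."""
    # group(0) is the whole match; findall() would return only the
    # captured district code because the pattern contains a group.
    return [m.group(0) for m in PLATE_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "KI-42KB and W-4333BB are plates; AM-23FW is skipped."
    print(find_plates(sample))
```

The sample deliberately includes AM-23FW, which is not matched because AM is not among the district codes in the alternation.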

Input text:

KI-42KB for a Kfz characteristic of the district Kirchdorf to the Krems

permissible combinations are: KI-1AAA, KI-10AA, KI-100A; KI-10ZZZ, KI-100ZZ, KI-1000Z.

Permissible combinationsfor Vienna are: W-10AAA, W-100AA, W-1000A, W-10000A; W-10ZZZZ; W-100ZZZ; W-1000ZZ, W-10000Z.

Some authorities (particularly in the larger cities) reserve certain letter combinations for vehicles with special use in the context of this system:

BB federal bus (public Kraftfahrlinien) (W-4333BB) and Federal Railroads (those parts, which do not notice sovereign tasks)

Funeral (W-1256BE)

EW power station (W-8322EW)

L.G. fire-brigade (AM-23FW)

GE municipality-own vehicles (SW-10GE)

GT commercial goods transport (W-1234GT)

GW gas works (W-4136GW)

KT commercial small transportation (W-3614KT)

LO urban line motorbuses (W-3982LO)

mA vehicle of municipal authorities (W-2412MA)

MW rented car (P-673MW)

RD emergency service (outpatient clinic) (ME-100RD)

RK red cross (WN-19RK)

TX Taxi (SW-45TX)

VB vehicle of the urban transporting enterprises (W-7261VB)

API call:

http://test.semantic-web.at/extractor/api/extract?projectId=1DAAFECD-CF6D-0001-C2FE-3F8F14A111E5&language=de&text=KI-42KB%20for%20a%20Kfz%20characteristic%20of%20the%20district%20Kirchdorf%20to%20the%20Krems%20permissible%20combinations%20are:%20KI-1AAA,%20KI-10AA,%20KI-100A;%20KI-10ZZZ,%20KI-100ZZ,%20KI-1000Z.%20Permissible%20combinationsfor%20Vienna%20are:%20W-10AAA,%20W-100AA,%20W-1000A,%20W-10000A;%20W-10ZZZZ;%20W-100ZZZ;%20W-1000ZZ,%20W-10000Z.%20Some%20authorities%20%28particularly%20in%20the%20larger%20cities%29%20reserve%20certain%20letter%20combinations%20for%20vehicles%20with%20special%20use%20in%20the%20context%20of%20this%20system:%20BB%20federal%20bus%20%28public%20Kraftfahrlinien%29%20%28W-4333BB%29%20and%20Federal%20Railroads%20%28those%20parts,%20which%20do%20not%20notice%20sovereign%20tasks%29%20Funeral%20%28W-1256BE%29%20EW%20power%20station%20%28W-8322EW%29%20L.G.%20fire-brigade%20%28AM-23FW%29%20GE%20municipality-own%20vehicles%20%28SW-10GE%29%20GT%20commercial%20goods%20transport%20%28W-1234GT%29%20GW%20gas%20works%20%28W-4136GW%29%20KT%20commercial%20small%20transportation%20%28W-3614KT%29%20LO%20urban%20line%20motorbuses%20%28W-3982LO%29%20mA%20vehicle%20of%20municipal%20authorities%20%28W-2412MA%29%20MW%20rented%20car%20%28P-673MW%29%20RD%20emergency%20service%20%28outpatient%20clinic%29%20%28ME-100RD%29%20RK%20red%20cross%20%28WN-19RK%29%20TX%20Taxi%20%28SW-45TX%29%20VB%20vehicle%20of%20the%20urban%20transporting%20enterprises%20%28W-7261VB%29&regexFilename=KFZ.txt&numberOfTerms=0
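Percent-encoding a long text by hand is error-prone. As a sketch, the query string of such a call could be assembled with Python's standard library. The endpoint and parameter names are the ones used in the call above; the function name `build_extract_url` and the shortened sample text are illustrative assumptions. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the example call above.
EXTRACT_ENDPOINT = "http://test.semantic-web.at/extractor/api/extract"


def build_extract_url(project_id: str, language: str, text: str,
                      regex_filename: str) -> str:
    """Assemble an extract call that applies the given regex file to text."""
    params = {
        "projectId": project_id,
        "language": language,
        "text": text,
        "regexFilename": regex_filename,
        "numberOfTerms": 0,
    }
    # quote_via=quote encodes spaces as %20, as in the call above
    # (the default, quote_plus, would produce '+').
    return EXTRACT_ENDPOINT + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_extract_url(
        "1DAAFECD-CF6D-0001-C2FE-3F8F14A111E5", "de",
        "KI-42KB for a Kfz characteristic", "KFZ.txt"))
```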

Result:

{
    "regexMatches": [
        {
            "regexMatches": [
                "KI-42KB",
                "KI-1AAA",
                "KI-10AA",
                "KI-100A",
                "KI-10ZZZ",
                "KI-100ZZ",
                "KI-1000Z",
                "W-10AAA",
                "W-100AA",
                "W-1000A",
                "W-10000A",
                "W-10ZZZZ",
                "W-100ZZZ",
                "W-1000ZZ",
                "W-10000Z",
                "W-4333BB",
                "W-1256BE",
                "W-8322EW",
                "SW-10GE",
                "W-1234GT",
                "W-4136GW",
                "W-3614KT",
                "W-3982LO",
                "W-2412MA",
                "P-673MW",
                "ME-100RD",
                "WN-19RK",
                "SW-45TX",
                "W-7261VB"
            ],
            "regexPattern": "\\b(B|BA|BL|BM|BN|BR|BZ|DL|DO|E|EF|EU|FB|FE|FF|FK|FR|G|GB|GD|GF|GM|GR|GS|GU|HA|HB|HE|HL|HO|I|IL|IM|JE|JO|JU|K|KB|KF|KI|KL|KO|KR|KS|KU|L|LA|LB|LE|LF|LI|LL|LN|LZ|MA|MD|ME|MI|MU|MZ|ND|NK|OP|OW|P|PE|PL|RA|RE|RI|RO|S|SB|SD|SE|SL|SP|SR|SV|SW|SZ|TA|TU|UU|VB|VI|VK|VL|VO|W|WB|WE|WL|WN|WO|WT|WU|WY|WZ|ZE|ZT)[- -]?\\d[\\dA-Z]{2,4}[A-Z]\\b"
        }
    ]
}
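
The result nests a list of matched strings under each pattern, so a client has to walk two levels that share the key `regexMatches`. A minimal Python sketch of reading such a response, assuming the JSON has exactly the shape shown above (the helper name `matches_by_pattern` and the shortened sample are illustrative):

```python
import json


def matches_by_pattern(response_text: str) -> dict[str, list[str]]:
    """Map each regexPattern in the response to its list of matched strings."""
    data = json.loads(response_text)
    result: dict[str, list[str]] = {}
    # Outer "regexMatches": a list of objects, each carrying one pattern
    # (as the shape of the result above suggests).
    for entry in data.get("regexMatches", []):
        # Inner "regexMatches": the strings that this pattern matched.
        result[entry["regexPattern"]] = list(entry.get("regexMatches", []))
    return result


if __name__ == "__main__":
    # Shortened, hand-written sample in the same shape as the result above.
    sample = r'{"regexMatches": [{"regexMatches": ["KI-42KB"], "regexPattern": "\\b(KI|W)"}]}'
    print(matches_by_pattern(sample))
```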

Person Name Detection

This functionality extracts person names from the input text.

Examples

Input text:

Brigitte Helm is one of a unique group of iconic actresses; like Greta Garbo, Marlene Dietrich, and Louise Brooks, her face and image are recognized across generations, and in most corners of the world.

API call:

http://test.semantic-web.at/extractor/api/extract?projectId=1DAAFB0B-F69F-0001-55C2-62A0481C1075&language=en&numberOfTerms=0&numberOfConcepts=0&findPersonNames=true&text=Brigitte%20Helm%20is%20one%20of%20a%20unique%20group%20of%20iconic%20actresses;%20like%20Greta%20Garbo,%20Marlene%20Dietrich,%20and%20Louise%20Brooks,%20her%20face%20and%20image%20are%20recognized%20across%20generations,%20and%20in%20most%20corners%20of%20the%20world.

Result:

{
    "personNames": [
        "Louise Brooks",
        "Greta Garbo",
        "Marlene Dietrich",
        "Brigitte Helm"
    ]
}
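
The names in the result above are not listed in the order in which they occur in the input text. If the original order matters, a client can restore it. A Python sketch, assuming each returned name occurs verbatim in the text (the function name `order_by_occurrence` is hypothetical):

```python
def order_by_occurrence(text: str, names: list[str]) -> list[str]:
    """Sort names by the position of their first occurrence in text."""
    # str.find returns -1 for a name that is not found verbatim;
    # such names are moved to the end instead of the front.
    def position(name: str) -> int:
        index = text.find(name)
        return index if index >= 0 else len(text)

    return sorted(names, key=position)


if __name__ == "__main__":
    text = ("Brigitte Helm is one of a unique group of iconic actresses; "
            "like Greta Garbo, Marlene Dietrich, and Louise Brooks")
    names = ["Louise Brooks", "Greta Garbo", "Marlene Dietrich", "Brigitte Helm"]
    print(order_by_occurrence(text, names))
```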

Location Extraction

The location extraction functionality lets you find major geographical locations in the input text.

Examples

API call:

http://test.semantic-web.at/extractor/api/extract?projectId=1DAB0E22-3005-0001-6A68-3EB01EB220C0&language=en&text=Germany&locationExtraction=true

Result:

{
    "locations": [
        {
            "latitude": 51.5,
            "longitude": 10.5,
            "uri": "http://sws.geonames.org/2921044/",
            "score": 0,
            "matchedLabel": "Germany",
            "countryCode": "http://reegle.info/countries/DE",
            "name": "Federal Republic of Germany",
            "type": "Country"
        }
    ]
}
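
A client often needs only a few of these fields. A Python sketch of reading the name and coordinates of each location, assuming the field names shown in the result above (the helper `location_coordinates` and the shortened sample are illustrative):

```python
import json


def location_coordinates(response_text: str) -> list[tuple[str, float, float]]:
    """Return (name, latitude, longitude) for every location in the response."""
    data = json.loads(response_text)
    return [
        (loc["name"], loc["latitude"], loc["longitude"])
        for loc in data.get("locations", [])
    ]


if __name__ == "__main__":
    # Shortened sample with the same field names as the result above.
    sample = """
    {"locations": [{"latitude": 51.5, "longitude": 10.5,
                    "uri": "http://sws.geonames.org/2921044/",
                    "name": "Federal Republic of Germany",
                    "type": "Country"}]}
    """
    print(location_coordinates(sample))
```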