Special Extractor Functionalities
The PPX extractor can match user-defined regular expressions in text. The regular expressions are defined in a file on the PoolParty server (see PoolParty Directory Structure Linux), and the file to use is selected with the regexFilename parameter of the extract call. The regular expressions use Java syntax (see http://docs.oracle.com/javase/9/docs/api/java/util/regex/Pattern.html).
Regular expression:
\b(B|BA|BL|BM|BN|BR|BZ|DL|DO|E|EF|EU|FB|FE|FF|FK|FR|G|GB|GD|GF|GM|GR|GS|GU|HA|HB|HE|HL|HO|I|IL|IM|JE|JO|JU|K|KB|KF|KI|KL|KO|KR|KS|KU|L|LA|LB|LE|LF|LI|LL|LN|LZ|MA|MD|ME|MI|MU|MZ|ND|NK|OP|OW|P|PE|PL|RA|RE|RI|RO|S|SB|SD|SE|SL|SP|SR|SV|SW|SZ|TA|TU|UU|VB|VI|VK|VL|VO|W|WB|WE|WL|WN|WO|WT|WU|WY|WZ|ZE|ZT)[- -]?\d[\dA-Z]{2,4}[A-Z]\b
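This regular expression matches Austrian license plates: one of the listed district prefixes, an optional separator (hyphen or space), a digit, two to four further digits or capital letters, and a final capital letter.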
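Before putting a pattern into the regex file on the server, it can help to try it locally. The following is a minimal sketch that uses Python's `re` module rather than `java.util.regex`; for this particular pattern and plain ASCII input the two engines should behave the same, but that is an assumption worth re-checking against Java for more complex patterns. The prefix list is copied from the pattern above; `find_plates` is a hypothetical helper name.

```python
import re

# District prefixes copied verbatim from the pattern above.
PREFIXES = (
    "B|BA|BL|BM|BN|BR|BZ|DL|DO|E|EF|EU|FB|FE|FF|FK|FR|G|GB|GD|GF|GM|GR|GS|GU|"
    "HA|HB|HE|HL|HO|I|IL|IM|JE|JO|JU|K|KB|KF|KI|KL|KO|KR|KS|KU|L|LA|LB|LE|LF|"
    "LI|LL|LN|LZ|MA|MD|ME|MI|MU|MZ|ND|NK|OP|OW|P|PE|PL|RA|RE|RI|RO|S|SB|SD|SE|"
    "SL|SP|SR|SV|SW|SZ|TA|TU|UU|VB|VI|VK|VL|VO|W|WB|WE|WL|WN|WO|WT|WU|WY|WZ|ZE|ZT"
)

# Same structure as the pattern above: prefix, optional separator ("[- -]"),
# one digit, 2-4 digits or capitals, and a final capital, between word boundaries.
PLATE = re.compile(r"\b(" + PREFIXES + r")[- -]?\d[\dA-Z]{2,4}[A-Z]\b")


def find_plates(text: str) -> list[str]:
    """Return every full match (group 0) in order of appearance."""
    # finditer is used because findall would return only the prefix group.
    return [m.group(0) for m in PLATE.finditer(text)]


print(find_plates("Funeral (W-1256BE) L.G. fire-brigade (AM-23FW) TX Taxi (SW-45TX)"))
# ['W-1256BE', 'SW-45TX']
```

This also shows why AM-23FW from the input text does not appear in the result: AM is not one of the listed prefixes.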
Input text:
KI-42KB for a Kfz characteristic of the district Kirchdorf to the Krems
permissible combinations are: KI-1AAA, KI-10AA, KI-100A; KI-10ZZZ, KI-100ZZ, KI-1000Z.
Permissible combinationsfor Vienna are: W-10AAA, W-100AA, W-1000A, W-10000A; W-10ZZZZ; W-100ZZZ; W-1000ZZ, W-10000Z.
Some authorities (particularly in the larger cities) reserve certain letter combinations for vehicles with special use in the context of this system:
BB federal bus (public Kraftfahrlinien) (W-4333BB) and Federal Railroads (those parts, which do not notice sovereign tasks)
Funeral (W-1256BE)
EW power station (W-8322EW)
L.G. fire-brigade (AM-23FW)
GE municipality-own vehicles (SW-10GE)
GT commercial goods transport (W-1234GT)
GW gas works (W-4136GW)
KT commercial small transportation (W-3614KT)
LO urban line motorbuses (W-3982LO)
mA vehicle of municipal authorities (W-2412MA)
MW rented car (P-673MW)
RD emergency service (outpatient clinic) (ME-100RD)
RK red cross (WN-19RK)
TX Taxi (SW-45TX)
VB vehicle of the urban transporting enterprises (W-7261VB)
API call:
http://test.semantic-web.at/extractor/api/extract?projectId=1DAAFECD-CF6D-0001-C2FE-3F8F14A111E5&language=de&text=KI-42KB%20for%20a%20Kfz%20characteristic%20of%20the%20district%20Kirchdorf%20to%20the%20Krems%20permissible%20combinations%20are:%20KI-1AAA,%20KI-10AA,%20KI-100A;%20KI-10ZZZ,%20KI-100ZZ,%20KI-1000Z.%20Permissible%20combinationsfor%20Vienna%20are:%20W-10AAA,%20W-100AA,%20W-1000A,%20W-10000A;%20W-10ZZZZ;%20W-100ZZZ;%20W-1000ZZ,%20W-10000Z.%20Some%20authorities%20%28particularly%20in%20the%20larger%20cities%29%20reserve%20certain%20letter%20combinations%20for%20vehicles%20with%20special%20use%20in%20the%20context%20of%20this%20system:%20BB%20federal%20bus%20%28public%20Kraftfahrlinien%29%20%28W-4333BB%29%20and%20Federal%20Railroads%20%28those%20parts,%20which%20do%20not%20notice%20sovereign%20tasks%29%20Funeral%20%28W-1256BE%29%20EW%20power%20station%20%28W-8322EW%29%20L.G.%20fire-brigade%20%28AM-23FW%29%20GE%20municipality-own%20vehicles%20%28SW-10GE%29%20GT%20commercial%20goods%20transport%20%28W-1234GT%29%20GW%20gas%20works%20%28W-4136GW%29%20KT%20commercial%20small%20transportation%20%28W-3614KT%29%20LO%20urban%20line%20motorbuses%20%28W-3982LO%29%20mA%20vehicle%20of%20municipal%20authorities%20%28W-2412MA%29%20MW%20rented%20car%20%28P-673MW%29%20RD%20emergency%20service%20%28outpatient%20clinic%29%20%28ME-100RD%29%20RK%20red%20cross%20%28WN-19RK%29%20TX%20Taxi%20%28SW-45TX%29%20VB%20vehicle%20of%20the%20urban%20transporting%20enterprises%20%28W-7261VB%29&regexFilename=KFZ.txt&numberOfTerms=0
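Long calls like this are easy to get wrong when encoded by hand. A minimal sketch of building one programmatically, assuming only the endpoint and the parameter names that appear in the example calls in this section; `build_extract_url` is a hypothetical helper, and it only builds the URL (sending the request and any authentication are not covered).

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the example calls in this section.
EXTRACT_ENDPOINT = "http://test.semantic-web.at/extractor/api/extract"


def build_extract_url(project_id: str, language: str, text: str, **options) -> str:
    """Hypothetical helper: build an extract URL from projectId, language,
    text and any extra query parameters (e.g. regexFilename, numberOfTerms)."""
    params = {"projectId": project_id, "language": language, "text": text}
    for name, value in options.items():
        # The examples pass booleans as lowercase "true".
        params[name] = str(value).lower() if isinstance(value, bool) else str(value)
    # quote (instead of the default quote_plus) encodes spaces as %20, as in the examples.
    return EXTRACT_ENDPOINT + "?" + urlencode(params, quote_via=quote)


url = build_extract_url(
    "1DAAFECD-CF6D-0001-C2FE-3F8F14A111E5", "de", "KI-42KB for a Kfz characteristic",
    regexFilename="KFZ.txt", numberOfTerms=0,
)
print(url)
```

Keyword arguments keep the helper open to the other switches used in this section, such as findPersonNames=True or locationExtraction=True.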
Result:
{
  "regexMatches": [
    {
      "regexMatches": [
        "KI-42KB", "KI-1AAA", "KI-10AA", "KI-100A", "KI-10ZZZ", "KI-100ZZ", "KI-1000Z",
        "W-10AAA", "W-100AA", "W-1000A", "W-10000A", "W-10ZZZZ", "W-100ZZZ", "W-1000ZZ", "W-10000Z",
        "W-4333BB", "W-1256BE", "W-8322EW", "SW-10GE", "W-1234GT", "W-4136GW", "W-3614KT",
        "W-3982LO", "W-2412MA", "P-673MW", "ME-100RD", "WN-19RK", "SW-45TX", "W-7261VB"
      ],
      "regexPattern": "\\b(B|BA|BL|BM|BN|BR|BZ|DL|DO|E|EF|EU|FB|FE|FF|FK|FR|G|GB|GD|GF|GM|GR|GS|GU|HA|HB|HE|HL|HO|I|IL|IM|JE|JO|JU|K|KB|KF|KI|KL|KO|KR|KS|KU|L|LA|LB|LE|LF|LI|LL|LN|LZ|MA|MD|ME|MI|MU|MZ|ND|NK|OP|OW|P|PE|PL|RA|RE|RI|RO|S|SB|SD|SE|SL|SP|SR|SV|SW|SZ|TA|TU|UU|VB|VI|VK|VL|VO|W|WB|WE|WL|WN|WO|WT|WU|WY|WZ|ZE|ZT)[- -]?\\d[\\dA-Z]{2,4}[A-Z]\\b"
    }
  ]
}
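In the result, the top-level regexMatches array holds one entry per pattern, each pairing a regexPattern with its own regexMatches list. A minimal sketch of grouping matches by pattern on the client side, assuming that response shape; the example only shows a single pattern, so treating a regex file with several patterns as several entries is an assumption. `matches_by_pattern` is a hypothetical helper, and the pattern in the sample is shortened for illustration.

```python
import json


def matches_by_pattern(response_body: str) -> dict[str, list[str]]:
    """Group the matched strings of an extract response by regexPattern."""
    data = json.loads(response_body)
    grouped: dict[str, list[str]] = {}
    # Each entry of the top-level "regexMatches" array pairs a pattern
    # with its own "regexMatches" list, as in the example result.
    for entry in data.get("regexMatches", []):
        grouped.setdefault(entry["regexPattern"], []).extend(entry.get("regexMatches", []))
    return grouped


# A trimmed response; the pattern is shortened for illustration.
sample = json.dumps({
    "regexMatches": [
        {"regexMatches": ["KI-42KB", "W-4333BB"],
         "regexPattern": r"\b(KI|W)[- -]?\d[\dA-Z]{2,4}[A-Z]\b"}
    ]
})
for pattern, found in matches_by_pattern(sample).items():
    print(pattern, found)
```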
The person name extraction functionality, enabled with the findPersonNames=true parameter, extracts person names from the input text.
Input text:
Brigitte Helm is one of a unique group of iconic actresses; like Greta Garbo, Marlene Dietrich, and Louise Brooks, her face and image are recognized across generations, and in most corners of the world.
API call:
http://test.semantic-web.at/extractor/api/extract?projectId=1DAAFB0B-F69F-0001-55C2-62A0481C1075&language=en&numberOfTerms=0&numberOfConcepts=0&findPersonNames=true&text=Brigitte%20Helm%20is%20one%20of%20a%20unique%20group%20of%20iconic%20actresses;%20like%20Greta%20Garbo,%20Marlene%20Dietrich,%20and%20Louise%20Brooks,%20her%20face%20and%20image%20are%20recognized%20across%20generations,%20and%20in%20most%20corners%20of%20the%20world.
Result:
{ "personNames": [ "Louise Brooks", "Greta Garbo", "Marlene Dietrich", "Brigitte Helm" ] }
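In the example result the names are not returned in the order they appear in the input text. If a client needs document order, one option is to sort them by their first occurrence; a minimal sketch, assuming each returned name appears verbatim in the input (names that do not are placed last). `in_document_order` is a hypothetical helper.

```python
def in_document_order(names: list[str], text: str) -> list[str]:
    """Sort names by first occurrence in text; names not found verbatim
    go last, in their original order (sorted() is stable)."""
    def position(name: str) -> int:
        index = text.find(name)
        return index if index >= 0 else len(text)
    return sorted(names, key=position)


text = ("Brigitte Helm is one of a unique group of iconic actresses; like Greta Garbo, "
        "Marlene Dietrich, and Louise Brooks, her face and image are recognized across "
        "generations, and in most corners of the world.")
names = ["Louise Brooks", "Greta Garbo", "Marlene Dietrich", "Brigitte Helm"]
print(in_document_order(names, text))
# ['Brigitte Helm', 'Greta Garbo', 'Marlene Dietrich', 'Louise Brooks']
```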
The location extraction functionality, enabled with the locationExtraction=true parameter, finds major geographical locations in the input text.
Input text:
Germany
API call:
http://test.semantic-web.at/extractor/api/extract?projectId=1DAB0E22-3005-0001-6A68-3EB01EB220C0&language=en&text=Germany&locationExtraction=true
Result:
{
  "locations": [
    {
      "latitude": 51.5,
      "longitude": 10.5,
      "uri": "http://sws.geonames.org/2921044/",
      "score": 0,
      "matchedLabel": "Germany",
      "countryCode": "http://reegle.info/countries/DE",
      "name": "Federal Republic of Germany",
      "type": "Country"
    }
  ]
}
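A minimal sketch of turning the locations array into typed records on the client side, assuming the field names shown in the example result (name, type, latitude, longitude, uri); `Location` and `parse_locations` are hypothetical names, and the other fields (score, matchedLabel, countryCode) are simply ignored here.

```python
import json
from typing import NamedTuple


class Location(NamedTuple):
    name: str
    kind: str  # the "type" field, e.g. "Country"
    latitude: float
    longitude: float
    uri: str


def parse_locations(response_body: str) -> list[Location]:
    """Turn the "locations" array of an extract response into Location tuples."""
    data = json.loads(response_body)
    return [
        Location(loc["name"], loc["type"], float(loc["latitude"]),
                 float(loc["longitude"]), loc["uri"])
        for loc in data.get("locations", [])
    ]


# The example result from above, rebuilt as a JSON string.
body = json.dumps({"locations": [{
    "latitude": 51.5, "longitude": 10.5, "uri": "http://sws.geonames.org/2921044/",
    "score": 0, "matchedLabel": "Germany", "countryCode": "http://reegle.info/countries/DE",
    "name": "Federal Republic of Germany", "type": "Country"}]})
for loc in parse_locations(body):
    print(f"{loc.name} ({loc.kind}) at {loc.latitude}, {loc.longitude}")
```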