RDF Loader

Description

RDF Loader (uv-t-rdfHttpLoader):

RDF HTTP Loader executes SPARQL Update queries on, and loads input RDF data into, a remote SPARQL endpoint over HTTP, based on the SPARQL 1.1 Update Protocol and the SPARQL 1.1 Graph Store HTTP Protocol. It is compatible with any SPARQL endpoint that supports these protocols.

Configuration Parameters

Name | Description | Example
--- | --- | ---
Host | Resolvable host name or IP address of the target remote SPARQL endpoint, without a protocol prefix such as "http://" | test.poolparty.biz
Port | Port number of the SPARQL endpoint | 8080
SPARQL Endpoint | Context path of the SPARQL endpoint, relative to the base URL | /sparql
Basic Authentication | Whether HTTP Basic Authentication is used for the SPARQL endpoint | true
Username | Name of the user account granted access to the SPARQL endpoint | dba
Password | Password of that user | ***
Input Data Type | Type of input data for this DPU; see Input Data Type under Notes below | RDF / File / SPARQL Update
RDF File Format | Serialization format of the RDF data when it is input as files | Turtle
Specify Target Graph | Whether the RDF data is loaded into a user-specified target graph | true
Graph URI | URI of the target RDF graph | http://example.org
Overwrite Existing Data | Whether new data replaces (true) or is appended to (false) the existing data | true
SPARQL Update | SPARQL Update query to be executed on the SPARQL endpoint | DELETE WHERE {?s ?p ?o}
Validate Update Query | Whether the syntax of the SPARQL Update query is validated | true
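The Host, Port, and SPARQL Endpoint parameters together identify the endpoint URL, and Basic Authentication adds an Authorization header to each request. The Python sketch below shows one plausible way to combine them; the function names, the default `http` scheme, and the password `secret` are assumptions for illustration, not the DPU's own code.

```python
import base64


def endpoint_url(host: str, port: int, path: str, scheme: str = "http") -> str:
    """Join Host, Port and SPARQL Endpoint into one URL.

    The Host parameter carries no protocol prefix, so the scheme is an
    assumption of this sketch (plain HTTP by default).
    """
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{host}:{port}{path}"


def basic_auth_header(username: str, password: str) -> str:
    """Value of the Authorization header for HTTP Basic Authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(endpoint_url("test.poolparty.biz", 8080, "/sparql"))
# http://test.poolparty.biz:8080/sparql
print(basic_auth_header("dba", "secret"))  # hypothetical credentials
```

Basic Authentication only encodes the credentials, it does not encrypt them, so it is usually combined with HTTPS when the endpoint is reachable over an untrusted network.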

Inputs and Outputs

Name | Type | Data Unit | Description | Mandatory
--- | --- | --- | --- | ---
rdfInput | input | RDFDataUnit | RDF data as RDF objects or an RDF data structure | No
fileInput | input | FilesDataUnit | RDF data serialized as files in a standard RDF serialization format | No

Notes

Input Data Type

RDF HTTP Loader deals with three types of input data:

  • RDF: selected when the RDF data comes from rdfInput, wrapped in an RDFDataUnit as Java objects. The input data is serialized to N-Triples, embedded in the body of a SPARQL Update query, and loaded into the target SPARQL endpoint with that update query. This approach is suitable only for small datasets whose N-Triples serialization is smaller than 10 MB.
  • File: selected when the RDF data comes from fileInput, wrapped in a FilesDataUnit as files in any standard RDF serialization format. The files are uploaded to the SPARQL endpoint in the body of the HTTP request, so RDF File Format must match the files: it determines the Content-Type header of the request. This approach is recommended for large datasets. Note that in nearly every RDF database the file upload endpoint differs from the SPARQL Update endpoint, so the SPARQL Endpoint path must be adjusted accordingly.
  • SPARQL Update: selected when the input data is written directly into the update query instead of coming from a connected DPU, or when an update or management task needs to be executed on the SPARQL endpoint.

The selected input data type determines which input data source is used to retrieve data. An error is thrown when the input data type and the input data source do not match.
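For the RDF input type, the data is serialized to N-Triples and embedded in a SPARQL Update query. A minimal Python sketch of that step follows. It assumes the query takes the form of an `INSERT DATA` operation (the exact update form is not stated here) and treats the 10 MB guideline as a hard limit, read as 10 MiB; `build_insert_data` and `MAX_NTRIPLES_BYTES` are hypothetical names.

```python
from typing import Optional

# The text gives a 10 MB guideline; reading it as 10 MiB is an assumption.
MAX_NTRIPLES_BYTES = 10 * 1024 * 1024


def build_insert_data(ntriples: str, graph_uri: Optional[str] = None) -> str:
    """Wrap an N-Triples document in a SPARQL 1.1 INSERT DATA operation."""
    size = len(ntriples.encode("utf-8"))
    if size >= MAX_NTRIPLES_BYTES:
        raise ValueError(
            f"N-Triples payload is {size} bytes; use the File input type instead")
    body = ntriples.strip()
    if graph_uri:
        # Load into a named graph instead of the default graph.
        body = f"GRAPH <{graph_uri}> {{\n{body}\n}}"
    return f"INSERT DATA {{\n{body}\n}}"


triples = '<http://example.org/s> <http://example.org/p> "o" .\n'
print(build_insert_data(triples, "http://example.org"))
```

Because the whole dataset travels inside a single query string, the endpoint has to parse it in one piece, which is why the File input type is the better fit for large datasets.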

SPARQL Endpoint

The service path of the SPARQL endpoint differs between RDF databases. The following table summarizes the paths for common RDF databases.

Database | Path Variable | Service Path for SPARQL Update | Service Path for SPARQL Graph Store HTTP Protocol
--- | --- | --- | ---
RDF4J | $REPOSITORY: RDF4J repository name | /$REPOSITORY/statements | /$REPOSITORY/rdf-graphs/service
Stardog | $DATABASE: Stardog database name | /$DATABASE/update | /$DATABASE
MarkLogic | None; the repository is determined by the port number | /v1/graphs/sparql | /v1/graphs
AllegroGraph | $REPOSITORY: AllegroGraph repository name | /repositories/$REPOSITORY | Not supported
GraphDB | $REPOSITORY: GraphDB repository name | /$REPOSITORY/statements | /$REPOSITORY/rdf-graphs/service
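The File input type uses the Graph Store HTTP Protocol path, while the RDF and SPARQL Update types use the SPARQL Update path, so the right SPARQL Endpoint value depends on both the database and the input type. The table can be encoded as a lookup, sketched below in Python; `SERVICE_PATHS` and `service_path` are hypothetical helpers, and the paths are only as accurate as the table itself.

```python
from typing import Dict, Optional, Tuple

# (SPARQL Update path, Graph Store HTTP Protocol path) per database,
# transcribed from the table above. "{name}" stands for $REPOSITORY or
# $DATABASE; None marks "Not supported".
SERVICE_PATHS: Dict[str, Tuple[str, Optional[str]]] = {
    "RDF4J": ("/{name}/statements", "/{name}/rdf-graphs/service"),
    "Stardog": ("/{name}/update", "/{name}"),
    "MarkLogic": ("/v1/graphs/sparql", "/v1/graphs"),  # repository chosen by port
    "AllegroGraph": ("/repositories/{name}", None),
    "GraphDB": ("/{name}/statements", "/{name}/rdf-graphs/service"),
}


def service_path(database: str, name: str, graph_store: bool) -> str:
    """Return the SPARQL Endpoint path for a database and input type."""
    update_path, graph_store_path = SERVICE_PATHS[database]
    template = graph_store_path if graph_store else update_path
    if template is None:
        raise ValueError(f"{database} does not support the Graph Store HTTP Protocol")
    return template.format(name=name)


print(service_path("RDF4J", "test", graph_store=False))  # /test/statements
print(service_path("RDF4J", "test", graph_store=True))   # /test/rdf-graphs/service
```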

Graph URI

The URI of the target RDF graph on the SPARQL endpoint specifies where the RDF data is loaded. If no graph URI is specified, the default graph is used. When the input data type is SPARQL Update, this option is disabled because graph operations must be specified in the update query itself.
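The SPARQL 1.1 Graph Store HTTP Protocol names the target graph with a `graph` query parameter, or addresses the default graph with the `default` parameter. The sketch below shows how a File upload URL could be formed that way; it assumes the DPU uses this indirect graph identification, and `graph_store_url` is a hypothetical name. For the RDF input type, the graph would instead be named inside the update query.

```python
from typing import Optional
from urllib.parse import urlencode


def graph_store_url(service_url: str, graph_uri: Optional[str] = None) -> str:
    """Address a graph using Graph Store indirect identification."""
    if graph_uri:
        # The graph IRI is percent-encoded as a query parameter value.
        return f"{service_url}?{urlencode({'graph': graph_uri})}"
    # No graph URI given: target the default graph.
    return f"{service_url}?default"


service = "http://localhost:8080/test/rdf-graphs/service"  # hypothetical host/port
print(graph_store_url(service, "http://example.org"))
print(graph_store_url(service))
```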

Overwrite Existing Data

When files are used as input, you can specify whether the new data is added to the existing target graph or replaces its contents, with the target graph cleared first. The SPARQL 1.1 Graph Store HTTP Protocol maps these two operations to HTTP POST (append) and HTTP PUT (replace). For SPARQL endpoints that do not strictly conform to this protocol, an HTTP PUT with the overwrite option to a target graph that does not yet exist may have no effect.
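The POST/PUT choice can be sketched with Python's standard `urllib.request`; the request is only constructed here, not sent. `graph_store_request` and the example URL are hypothetical, and the mapping from Overwrite Existing Data to the HTTP method follows the description above.

```python
import urllib.request


def graph_store_request(url: str, payload: bytes, content_type: str,
                        overwrite: bool) -> urllib.request.Request:
    """Build a Graph Store HTTP Protocol upload request.

    PUT replaces the contents of the target graph; POST merges the payload
    into whatever the graph already contains.
    """
    method = "PUT" if overwrite else "POST"
    return urllib.request.Request(url, data=payload, method=method,
                                  headers={"Content-Type": content_type})


req = graph_store_request(
    "http://localhost:8080/test/rdf-graphs/service?graph=http%3A%2F%2Fexample.org",
    b'<http://example.org/s> <http://example.org/p> "o" .\n',
    "text/turtle",
    overwrite=True,
)
print(req.get_method())  # PUT
# urllib.request.urlopen(req) would send it to a running endpoint.
```

Deciding the method from a single flag keeps the overwrite semantics in the protocol itself, so no separate "clear graph" request is needed on conforming endpoints.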

Examples

Write RDF Data Produced by another DPU to an RDF4J Server

The following image shows a fragment of a pipeline that retrieves data from a SQL database, transforms the relational data into RDF, and loads the RDF data into a remote SPARQL endpoint. The DPU receives RDF data on its "rdfInput" channel, so "Input Data Type" is set to "RDF". The target is an RDF4J repository named "test", so the path of the SPARQL Update endpoint is "/test/statements". The data is loaded into the named graph <http://example.org>. The DPU configuration is also illustrated in the image.
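In this example the generated update is sent to "/test/statements". The SPARQL 1.1 Protocol allows an update to be POSTed either URL-encoded in an `update` parameter or directly as the request body; the Python sketch below uses the URL-encoded form. Which form the DPU actually uses is not stated, and the host `localhost`, port `8080`, the sample triple, and `update_request` are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode


def update_request(endpoint: str, update: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol: update via POST with URL-encoded parameters."""
    body = urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        endpoint, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})


query = ('INSERT DATA { GRAPH <http://example.org> { '
         '<http://example.org/s> <http://example.org/p> "o" . } }')
req = update_request("http://localhost:8080/test/statements", query)
print(req.full_url, req.get_method())
```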


Write a Downloaded RDF file to an RDF4J Server

The following image shows a fragment of a pipeline that downloads an RDF Turtle file from the file system and loads it into a remote SPARQL endpoint. The DPU receives file data on its "fileInput" channel, so "Input Data Type" is set to "File". "RDF File Format" is set to "Turtle" to match the format of the input file. The target is an RDF4J repository named "test", so the path of the SPARQL Graph Store HTTP endpoint is "/test/rdf-graphs/service". The file is loaded into the named graph <http://example.org> without removing existing data. The DPU configuration is also illustrated in the image.
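The RDF File Format parameter becomes the Content-Type of the upload. A sketch of the request for this example follows, using registered media types for a few common serializations. The list of formats, the host and port, and `upload_request` are assumptions; keeping existing data corresponds to POST under the Graph Store HTTP Protocol.

```python
import urllib.request
from urllib.parse import urlencode

# Registered media types for common RDF serializations. Which of these the
# DPU's "RDF File Format" option offers is an assumption.
RDF_MEDIA_TYPES = {
    "Turtle": "text/turtle",
    "N-Triples": "application/n-triples",
    "RDF/XML": "application/rdf+xml",
    "JSON-LD": "application/ld+json",
}


def upload_request(service_url: str, graph_uri: str, payload: bytes,
                   rdf_format: str) -> urllib.request.Request:
    """POST (append) a file to a named graph via the Graph Store protocol."""
    url = f"{service_url}?{urlencode({'graph': graph_uri})}"
    return urllib.request.Request(
        url, data=payload, method="POST",
        headers={"Content-Type": RDF_MEDIA_TYPES[rdf_format]})


turtle = b'@prefix ex: <http://example.org/> .\nex:s ex:p "o" .\n'
req = upload_request("http://localhost:8080/test/rdf-graphs/service",
                     "http://example.org", turtle, "Turtle")
print(req.full_url)
```

A mismatched Content-Type typically makes the endpoint reject the upload with a parse error, which is why RDF File Format has to match the actual file.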

Delete an Existing Graph from an RDF4J Server

The following image shows a fragment of a pipeline that runs a SPARQL Update query on a SPARQL endpoint. The DPU needs no input because the update query is provided in the DPU configuration, so "Input Data Type" is set to "SPARQL Update". "SPARQL Endpoint" is the same as in the example that loads RDF data ("/test/statements"). "SPARQL Update" contains a query that drops a graph from the SPARQL endpoint, and the query is validated for syntax correctness. The DPU configuration is also illustrated in the image.
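The drop operation in this example can be expressed as a `DROP GRAPH` update. The sketch assumes the dropped graph is <http://example.org>, since the exact query in the image is not reproduced in the text, and posts the update directly with Content-Type `application/sparql-update`, one of the request forms allowed by the SPARQL 1.1 Protocol, assuming the endpoint accepts it. The host, port, and `drop_graph_request` are hypothetical.

```python
import urllib.request


def drop_graph_request(endpoint: str, graph_uri: str,
                       silent: bool = False) -> urllib.request.Request:
    """POST a DROP GRAPH update directly as the request body.

    With SILENT, the endpoint does not report an error if the graph is absent.
    """
    keyword = "DROP SILENT GRAPH" if silent else "DROP GRAPH"
    update = f"{keyword} <{graph_uri}>"
    return urllib.request.Request(
        endpoint, data=update.encode("utf-8"), method="POST",
        headers={"Content-Type": "application/sparql-update"})


req = drop_graph_request("http://localhost:8080/test/statements", "http://example.org")
print(req.data.decode("utf-8"))  # DROP GRAPH <http://example.org>
```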