Working with the RDF Data Unit
This section contains a short guide on how RDF entries (graphs/triples) may be obtained from or written to input RDF data units.
For basic information about data units, please see Working with the RDF Data Unit.
Reading RDF Graphs From Input RDF Data Unit
Please prepare DPU 'MyDpu' as described in Working with the RDF Data Unit. To read RDF graphs from an input RDF data unit, one has to define an input RDF data unit.
Code 1 - defining input data unit
@DataUnit.AsInput(name = "input") public RDFDataUnit input;
All data units must be public fields with the proper annotation; the annotation must at least contain a name, which will be the name visible in the UnifiedViews administration interface for pipeline developers. The code above goes to the main DPU class.
In order to work with the input RDF data unit, you typically use the RDFHelper class (eu.unifiedviews.helpers.dataunit.rdf.RDFHelper in uv-dataunit-helpers).
You can also use the data unit API directly, but this is typically not needed (for details, see Using API Classes Instead of RDFHelper).
The RDFHelper class provides methods to get the list of graphs/URIs the input RDF data unit contains and to operate with them. We introduce only a couple of them here; the remaining ones are in eu.unifiedviews.helpers.dataunit.rdf.RDFHelper:
static Set<RDFDataUnit.Entry> getGraphs(RDFDataUnit rdfDataUnit) throws DataUnitException
This method returns a set of entries (RDF graphs) in the given rdfDataUnit.
static Set<URI> getGraphsURISet(RDFDataUnit rdfDataUnit) throws DataUnitException
Similar to the one above, but it returns the set of URIs of the RDF graphs directly.
static org.openrdf.query.Dataset getDatasetWithDefaultGraphs(RDFDataUnit rdfDataUnit) throws DataUnitException
This method directly prepares a Dataset (see the OpenRDF API) with the default graphs set equal to the set of graphs within rdfDataUnit. This approach is useful for further querying of the data.
Code 2 shows how the method for getting graphs can be used. The code below goes to the innerExecute() method of the DPU.
Line 2 returns a set of entries in the form of URIs.
Code 2 - Iterating over input RDF graphs using RDFHelper
try {
    Set<URI> rdfGraphs = RDFHelper.getGraphsURISet(input);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error");
}
By having the list of graphs, we may then work with the RDF graphs using the standard approach of OpenRDF API.
For example, when we want to query the input RDF data unit (query all the triples in the input data unit), we may use the approach described below (Querying the RDF Data Unit), which works in this case with connection to the input data unit.
There is also a method (depicted in Code 3) which allows us to prepare a Dataset object, which we may then use during querying to particularize the dataset (set of graphs) the query operates on. See the approach described below in Code 7.
Code 3 - Getting dataset
try {
    org.openrdf.query.Dataset dataset = RDFHelper.getDatasetWithDefaultGraphs(input);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error");
}
Writing RDF Entries to Output RDF Data Unit
Please prepare DPU 'MyDpu' as described in Working with the RDF Data Unit. To write RDF entries (graphs/triples) to an output RDF data unit, one has to define an output RDF data unit.
Code 4 - defining output data unit
@DataUnit.AsOutput(name = "output") public WritableRDFDataUnit output;
All data units must be public fields with the proper annotation; the annotation must at least contain a name, which will be the name visible in the UnifiedViews administration interface for pipeline developers. The code above goes to the main DPU class.
In order to work with the output RDF data unit (create graphs in the output data unit), you typically use the RDFHelper class (eu.unifiedviews.helpers.dataunit.rdf.RDFHelper in uv-dataunit-helpers). You can also use the data unit API directly, but this is typically not needed.
Note
The RDFHelper class does not directly support adding triples to the output RDF data unit or querying data units. To add triples or query data units, one has to use standard OpenRDF API methods (simple examples are depicted in Code 6 and Code 7) or the special UnifiedViews extension SimpleRDF, described below.
There are these basic methods DPU developers may use to add graphs to the output RDF data unit:
public static RDFDataUnit.Entry createGraph(WritableRDFDataUnit rdfDataUnit, final String graphName) throws DataUnitException
This method creates a new entry in the output RDF data unit rdfDataUnit under the given graphName. The graphName must be unique in the context of the data unit, because graphName is used as the symbolicName of the entry. It also associates the newly created entry with the (generated) data graph URI. Such data graph URI may be used by the DPU developer to store RDF triples to (the data graph URI of the returned entry may be obtained by calling asGraph(entry)). For an explanation of symbolicNames and other metadata of entries in data units, please see Basic Concepts for DPU Developers. The generated metadata is stored in the working RDF store of UnifiedViews.
public static RDFDataUnit.Entry addGraph(WritableRDFDataUnit rdfDataUnit, URI graphURI) throws DataUnitException
This method adds the existing data graph URI graphURI to the rdfDataUnit data unit. The graphURI must be unique in the context of the data unit. It automatically creates a new entry in the rdfDataUnit data unit with the symbolicName being equal to the graphURI. For an explanation of symbolicNames and other metadata of entries in data units, please see Basic Concepts for DPU Developers. In this case, graphURI (the URI of the data graph where triples are stored) is specified by the DPU developer, who has to ensure that the name (URI) of such graph does not collide with the names of the graphs used by other entries/data units/DPUs. The URI of such graph can be based on the prefix obtained by calling rdfDataUnit.getBaseDataGraphURI() to ensure that the name of the graph does not collide with names of graphs in other data units/DPUs; however, the DPU developer must still ensure that the name of the data graph does not collide with names of other entries within the same data unit.
public static RDFDataUnit.Entry addGraph(WritableRDFDataUnit rdfDataUnit, URI graphURI, final String graphName)
The same as above, but the symbolicName for the created entry is explicitly specified as graphName.
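The collision-avoidance rule behind these methods can be sketched without the UnifiedViews classes. The fragment below is a self-contained illustration only: plain strings stand in for URI, and the hypothetical base prefix stands in for the value returned by getBaseDataGraphURI(); a numeric suffix under that prefix keeps graph names unique within one data unit.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class GraphNames {

    // Returns a graph URI under the given base prefix that is not yet used.
    // The prefix keeps data units/DPUs apart; the numeric suffix keeps
    // entries apart within one data unit (illustrative scheme only).
    public static String uniqueGraphUri(String basePrefix, Set<String> used) {
        int i = 0;
        String candidate = basePrefix + i;
        while (used.contains(candidate)) {
            i++;
            candidate = basePrefix + i;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> used = new LinkedHashSet<>();
        // Hypothetical prefix; a real DPU would use getBaseDataGraphURI().
        String base = "http://example.org/dpu/output/graph/";
        for (int k = 0; k < 3; k++) {
            used.add(uniqueGraphUri(base, used));
        }
        System.out.println(used);
    }
}
```

This only demonstrates the uniqueness requirement; the real helper methods additionally record the symbolicName metadata in the working RDF store.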
Note
If you would like to reuse the data graph from input (RDF) data unit also in the output data unit, please check Working with the RDF Data Unit first.
As the methods above return RDFDataUnit.Entry as a result, you may also use the public static URI asGraph(RDFDataUnit.Entry entry) method to convert the returned entry to the URI of the data graph of that entry.
Code 5a shows how a new graph can be created and added to the output RDF data unit.
Code 5a - Creating new data graph in the output data unit
RDFDataUnit.Entry entry = RDFHelper.createGraph(output, "http://output/graph");
URI outputURI = entry.getDataGraphURI();
The sample fragment in Code 5b shows how the developer can add an existing graph to the output RDF data unit.
Code 5b - adding graph
URI graphURI = ...
RDFHelper.addGraph(output, graphURI);
Code 6 shows how to add triples to the output RDF data unit (output is the output RDF data unit).
Code 6 - Adding triples to output data unit
RepositoryConnection connection = null;
try {
    connection = output.getConnection();
    ValueFactory factory = connection.getValueFactory();
    final URI subject = factory.createURI("http://data.example.org/resource/mySubject");
    final URI predicate = factory.createURI("http://data.example.org/ontology/myPred");
    final Literal object = factory.createLiteral("xxxx");
    Statement s = factory.createStatement(subject, predicate, object);
    connection.add(s, outputURI);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error.addition");
} catch (RepositoryException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error.repository");
} finally {
    if (connection != null) {
        try {
            connection.close();
        } catch (RepositoryException ex) {
            log.error("Error on close.", ex);
        }
    }
}
In Line 3, we use the output data unit output to obtain a connection to the RDF repository. Lines 5-8 construct a statement. In Line 9, we write the data (one statement) to the graph we created earlier using RDFHelper.
Note
You may remove statements by calling connection.remove(subject, predicate, object);
On stacks where RepositoryConnection implements AutoCloseable (e.g. the newer RDF4J API, which uses IRI/createIRI instead of URI/createURI), the same logic can be written more compactly with try-with-resources:
try (RepositoryConnection connection = output.getConnection()) {
    ValueFactory factory = connection.getValueFactory();
    final IRI subject = factory.createIRI("http://data.example.org/resource/subject");
    final IRI predicate = factory.createIRI("http://data.example.org/ontology/predicate");
    final Literal object = factory.createLiteral("value");
    Statement s = factory.createStatement(subject, predicate, object);
    connection.add(s, outputURI);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error.addition");
} catch (RepositoryException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error.repository");
}
To query the input/output RDF data unit, one obtains the connection as described in Code 6 and then uses the approach described here.
It is also possible to call the connection.getStatements(subject, null, null, false) method to get all statements from the repository having the given subject as the subject of the triple. See here for other options on how to work with RepositoryConnection.
You may also use the Dataset class to restrict the graphs on top of which the query is executed, for example as depicted in Code 7:
Code 7
...
String query = ...;
final Update update = connection.prepareUpdate(QueryLanguage.SPARQL, query);
Dataset dataset = new DatasetImpl();
dataset.addDefaultGraph(entry.getDataGraphURI());
dataset.addDefaultRemoveGraph(targetGraph);
dataset.setDefaultInsertGraph(targetGraph);
update.setDataset(dataset);
update.execute();
...
Apart from the RDFHelper, there is also the extension WritableSimpleRDF, which is not a data unit helper but a DPU extension. Such an extension may be used to write triples into the output data unit.
The advantages of such an extension are:
it has methods for adding triples to the output data unit directly
the methods for creating new RDF entries and adding existing RDF entries are a bit simpler, as they do not specify the data unit as a parameter. The WritableSimpleRDF instance is bound to a certain data unit at the initialization of the extension.
it automatically uses the FaultTolerant extension, if it is allowed for the DPU. So if you prepare your DPUs to be fault tolerant, you should consider using the WritableSimpleRDF extension, as it hides the fault tolerance calls smoothly.
For details about WritableSimpleRDF extension, please see Working with the RDF Data Unit.
Advanced Topics
It is possible to use the EntityBuilder helper to construct statements about a certain subject.
It is possible to use the SparqlUtils helper class to construct and execute SPARQL queries. This is an alternative to the pure OpenRDF approach.
Notes:
Imports needed for Code 6:
import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.query.GraphQuery;
import org.openrdf.query.GraphQueryResult;
import org.openrdf.query.QueryLanguage;
import org.openrdf.repository.RepositoryConnection;
import eu.unifiedviews.dataunit.DataUnit;
import eu.unifiedviews.dataunit.rdf.RDFDataUnit;
import eu.unifiedviews.dataunit.rdf.WritableRDFDataUnit;
import eu.unifiedviews.dataunit.DataUnitException;
import org.openrdf.repository.RepositoryException;
import eu.unifiedviews.helpers.dataunit.rdf.RDFHelper;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.Literal;
Using API Classes Instead of RDFHelper
This section contains a short guide on how to use the data unit API classes directly, instead of RDFHelper.
Note
You should use RDFHelper to access the RDF data unit if possible, see: Working with the RDF Data Unit
Read data to get a list of graphs within the data unit.
Further, let's start by showing how, as a DPU developer, you may iterate over the input data unit in order to get access to the RDF graphs which come over the input RDF data unit. The code below goes to the innerExecute() method of the DPU.
Code 1 - Iterating over input RDF graphs using API classes
Set<URI> rdfGraphs = new HashSet<>();
RDFDataUnit.Iteration it = null;
try {
    it = input.getIteration();
    while (it.hasNext()) {
        final URI dataGraphURI = it.next().getDataGraphURI();
        rdfGraphs.add(dataGraphURI);
    }
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error");
} finally {
    if (it != null) {
        try {
            it.close();
        } catch (DataUnitException ex) {
            log.error("Error on close.", ex);
        }
    }
}
In Lines 5-8, we iterate over the entries (RDF graphs) in the input RDF data unit.
In Line 6, we get the RDF data graph URI: the URI of the RDF graph in the working RDF store.
On Line 7, we add it to the set of such URIs.
As the iterator over entries does not extend AutoCloseable, we need to take care of closing it at the end (Line 14). That is why we do all the work in a try-catch block (Lines 3-11) with a finally statement (Lines 11-19). We also catch DataUnitException, which may be thrown by the iterator, in Lines 9-11.
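The close-in-finally pattern explained above is not specific to RDF data units. The following self-contained sketch uses a stand-in iteration type (not the real RDFDataUnit.Iteration) just to show the shape of the pattern:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class IterationPattern {

    // Stand-in for RDFDataUnit.Iteration: hands out graph URIs, must be
    // closed explicitly, and does NOT implement AutoCloseable.
    static class Iteration {
        private final Iterator<String> inner;
        private boolean closed = false;

        Iteration(Iterable<String> graphs) { this.inner = graphs.iterator(); }

        boolean hasNext() { return inner.hasNext(); }
        String next() { return inner.next(); }
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    // Collects all graph URIs, closing the iteration in finally so that
    // the close happens even if next() throws.
    public static Set<String> collectGraphs(Iteration it) {
        Set<String> graphs = new HashSet<>();
        try {
            while (it.hasNext()) {
                graphs.add(it.next());
            }
        } finally {
            it.close();
        }
        return graphs;
    }

    public static void main(String[] args) {
        Iteration it = new Iteration(Arrays.asList("http://g/1", "http://g/2"));
        Set<String> graphs = collectGraphs(it);
        System.out.println(graphs + " closed=" + it.isClosed());
    }
}
```

The real code additionally wraps the body in a catch for DataUnitException, as shown in Code 1.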
So after executing Code 1, the variable rdfGraphs holds the set of RDF graphs in which the input RDF data (triples) reside.
We may then work with the RDF graphs using the standard approach of the OpenRDF API.
The code introduced in Code 1 can be simplified by using the helper eu.unifiedviews.helpers.dataunit.rdf.RDFHelper in uv-dataunit-helpers; see Working with the RDF Data Unit.
In this case, the DPU developer does not need to manually handle iteration over input RDF graphs.
In general, as a DPU developer you should use helpers if possible.
There are two methods DPU developers may use to add graphs to the output RDF data unit:
URI addNewDataGraph(String symbolicName) throws DataUnitException;
This method creates a new entry in the output RDF data unit under the given symbolicName. The symbolicName must be unique in the context of the data unit. It also associates the newly created entry with the (generated) data graph URI, which is returned by this method. Such data graph URI may be used by the DPU developer to store RDF triples to.
Note
For an explanation of symbolicNames and other metadata of entries in data units, please see Basic Concepts for DPU Developers. The generated metadata is stored in the working RDF store of UnifiedViews.
void addExistingDataGraph(String symbolicName, URI existingDataGraphURI) throws DataUnitException;
This method adds an existing data graph URI to the output data unit under the given symbolicName. The symbolicName must be unique in the context of the data unit. It automatically creates a new entry in the output data unit with the given symbolicName. For an explanation of symbolicNames and other metadata of entries in data units, please see Basic Concepts for DPU Developers.
In this case, the name of the data graph (the URI of the data graph where triples are stored) is specified by the DPU developer, who has to ensure that the name (URI) of such graph does not collide with the names of the graphs used by other entries/data units/DPUs. The URI of such a graph can be based on the prefix obtained by calling getBaseDataGraphURI() to ensure that the name of the graph does not collide with names of graphs in other data units/DPUs.
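To make the semantics of the two methods concrete, here is a toy model only (plain strings instead of the real WritableRDFDataUnit and URI classes; the base prefix stands in for getBaseDataGraphURI()): each entry maps a unique symbolicName to a data graph URI, which is either generated or supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;

public class ToyDataUnit {
    // symbolicName -> data graph URI; symbolic names must be unique
    // within one data unit, mirroring the rule in the text.
    private final Map<String, String> entries = new HashMap<>();
    private final String basePrefix; // stands in for getBaseDataGraphURI()
    private int counter = 0;

    public ToyDataUnit(String basePrefix) { this.basePrefix = basePrefix; }

    // Like addNewDataGraph: generates the data graph URI and returns it.
    public String addNewDataGraph(String symbolicName) {
        requireFresh(symbolicName);
        String uri = basePrefix + counter++;
        entries.put(symbolicName, uri);
        return uri;
    }

    // Like addExistingDataGraph: the caller supplies the data graph URI.
    public void addExistingDataGraph(String symbolicName, String existingUri) {
        requireFresh(symbolicName);
        entries.put(symbolicName, existingUri);
    }

    private void requireFresh(String symbolicName) {
        if (entries.containsKey(symbolicName)) {
            throw new IllegalArgumentException("symbolicName already used: " + symbolicName);
        }
    }

    public String dataGraphUri(String symbolicName) { return entries.get(symbolicName); }
}
```

The real methods throw DataUnitException rather than IllegalArgumentException, and they persist the entry metadata in the working RDF store; this sketch only illustrates the naming contract.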
In order to add an existing RDF graph to the output data unit, the code below may be used in the innerExecute() method of the DPU.
Code 4 - Adding an existing RDF graph to the output data unit using API classes
String graphName = ...
URI graphURI = ...
output.addExistingDataGraph(graphName, graphURI);
In Line 3, the new entry in the output data unit is created; for such entry, the metadata symbolicName is set to be equal to graphName and existingDataGraphURI is set to graphURI.
Note
The API classes do not support directly adding triples to the output RDF data unit or querying data units. To add triples or query data units, one has to use standard OpenRDF API methods.