Working with the RDF Data Unit

Abstract

This section contains a short guide on how RDF entries (graphs/triples) may be obtained from input RDF data units or written to output RDF data units.

For basic information about data units, please see Working with the RDF Data Unit.

Reading RDF Graphs From Input RDF Data Unit

Please prepare DPU 'MyDpu' as described in Working with the RDF Data Unit. To read RDF graphs from an input RDF data unit, first define the input RDF data unit.

Code 1 - defining input data unit

 @DataUnit.AsInput(name = "input")
 public RDFDataUnit input;

All data units must be public and properly annotated: the annotation must contain at least a name, which is the name visible to pipeline developers in the UnifiedViews administration interface. The code above goes in the main DPU class.

To work with the input RDF data unit, you typically use the RDFHelper class (eu.unifiedviews.helpers.dataunit.rdf.RDFHelper in uv-dataunit-helpers).

You can also use the data unit API directly, but this is typically not needed (details can be found in Using API Classes Instead of RDFhelper).

The RDFHelper class provides methods to get the list of graphs/URIs the input RDF data unit contains and to operate on them. Only a few of them are introduced here; the remaining ones are documented in eu.unifiedviews.helpers.dataunit.rdf.RDFHelper:

static Set<RDFDataUnit.Entry> getGraphs(RDFDataUnit rdfDataUnit) throws DataUnitException
- This method returns a set of entries (RDF graphs) in the given rdfDataUnit.
static Set<URI> getGraphsURISet(RDFDataUnit rdfDataUnit) throws DataUnitException
- Similar to the one above, but it returns the set of URIs of the RDF graphs directly.
static org.openrdf.query.Dataset getDatasetWithDefaultGraphs(RDFDataUnit rdfDataUnit) throws DataUnitException
- This method directly prepares a Dataset (see OpenRDF API) whose default graphs are the graphs within rdfDataUnit. This approach is useful for further querying of the data.

Code 2 shows how the method for getting graphs can be used. The code below goes in the innerExecute() method of the DPU.

Line 2 returns the set of input graphs in the form of URIs.

Code 2 - Iterating over input RDF graphs using RDFHelper

try {
    Set<URI> rdfGraphs = RDFHelper.getGraphsURISet(input);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error");
}

Once we have the list of graphs, we may work with the RDF graphs using the standard approach of the OpenRDF API.

For example, when we want to query the input RDF data unit (query all the triples in the input data unit), we may use the approach described below (Querying the RDF Data Unit), which in this case works with a connection to the input data unit.

There is also a method (shown in Code 3) that prepares a Dataset object, which may then be used during querying to restrict the dataset (set of graphs) the query operates on. See the approach described below in Code 7.

Code 3 - Getting dataset

try {
    org.openrdf.query.Dataset dataset = RDFHelper.getDatasetWithDefaultGraphs(input);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error");
}

Writing RDF Entries to Output RDF Data Unit

Please prepare DPU 'MyDpu' as described in Working with the RDF Data Unit. To write RDF entries (graphs/triples) to an output RDF data unit, first define the output RDF data unit.

Code 4 - defining output data unit

@DataUnit.AsOutput(name = "output")
public WritableRDFDataUnit output;

To work with the output RDF data unit (that is, to create graphs in it), you typically use the RDFHelper class (eu.unifiedviews.helpers.dataunit.rdf.RDFHelper in uv-dataunit-helpers). You can also use the data unit API directly, but this is typically not needed.

Note

The RDFHelper class does not directly support adding triples to the output RDF data unit or querying data units. To add triples or query data units, one has to use standard OpenRDF API methods (simple examples are shown in Code 6 and Code 7) or the special UnifiedViews extension SimpleRDF, described below.

Basic Methods to Add Graphs to the Output RDF Data Unit

DPU developers may use the following basic methods to add graphs to the output RDF data unit:

public static RDFDataUnit.Entry createGraph(WritableRDFDataUnit rdfDataUnit, final String graphName) throws DataUnitException
- This method creates a new entry in the output RDF data unit rdfDataUnit under the given graphName. The graphName must be unique in the context of the data unit, because it is used as the symbolicName of the entry. The method also associates the newly created entry with a generated data graph URI, which the DPU developer may use to store RDF triples (the data graph URI of the returned entry may be obtained by calling asGraph(entry)). For an explanation of symbolicNames and other metadata of entries in data units, please see Basic Concepts for DPU Developers. The generated metadata is stored in the working RDF store of UnifiedViews.
public static RDFDataUnit.Entry addGraph(WritableRDFDataUnit rdfDataUnit, URI graphURI) throws DataUnitException
- This method adds the existing data graph URI graphURI to the rdfDataUnit data unit. The graphURI must be unique in the context of the data unit. The method automatically creates a new entry in rdfDataUnit with the symbolicName equal to graphURI. For an explanation of symbolicNames and other metadata of entries in data units, please see Basic Concepts for DPU Developers. In this case, graphURI (the URI of the data graph where triples are stored) is specified by the DPU developer, who has to ensure that the URI of the graph does not collide with the names of graphs used by other entries/data units/DPUs. The URI of such a graph can be based on the prefix obtained by calling rdfDataUnit.getBaseDataGraphURI(), which avoids collisions with graphs in other data units/DPUs; however, the DPU developer must still ensure that it does not collide with the data graphs of other entries within the same data unit.
public static RDFDataUnit.Entry addGraph(WritableRDFDataUnit rdfDataUnit, URI graphURI, final String graphName)
- The same as above, but the symbolicName for the created entry is explicitly specified as graphName.
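
The relationship between symbolic names and data graph URIs described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: OutputGraphRegistrySketch, its base prefix, and the counter-based URI generation are hypothetical stand-ins, java.net.URI replaces the OpenRDF URI type, and none of this is the actual RDFHelper or data unit implementation.

```java
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an output RDF data unit (NOT the UnifiedViews API).
class OutputGraphRegistrySketch {
    private final String baseGraphUri; // assumed prefix, plays the role of the base data graph URI
    private final Map<String, URI> entries = new LinkedHashMap<>(); // symbolicName -> data graph URI
    private int generated = 0;

    OutputGraphRegistrySketch(String baseGraphUri) {
        this.baseGraphUri = baseGraphUri;
    }

    // Like createGraph: the caller supplies the symbolic name, the graph URI is generated.
    URI createGraph(String graphName) {
        requireFreshName(graphName);
        generated++;
        URI graphUri = URI.create(baseGraphUri + "/" + generated);
        entries.put(graphName, graphUri);
        return graphUri;
    }

    // Like addGraph(unit, graphURI): the graph URI doubles as the symbolic name.
    URI addGraph(URI graphUri) {
        return addGraph(graphUri, graphUri.toString());
    }

    // Like addGraph(unit, graphURI, graphName): the symbolic name is given explicitly.
    URI addGraph(URI graphUri, String graphName) {
        requireFreshName(graphName);
        entries.put(graphName, graphUri);
        return graphUri;
    }

    // Read-only view of the registered entries.
    Map<String, URI> entries() {
        return Collections.unmodifiableMap(entries);
    }

    // Symbolic names must be unique within one data unit.
    private void requireFreshName(String symbolicName) {
        if (entries.containsKey(symbolicName)) {
            throw new IllegalArgumentException("Symbolic name already used: " + symbolicName);
        }
    }
}
```

In this model, calling createGraph twice with the same name fails, which mirrors the uniqueness requirement on graphName stated above.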

Note

If you would like to reuse the data graph from input (RDF) data unit also in the output data unit, please check Working with the RDF Data Unit first.

As the methods above return RDFDataUnit.Entry as a result, you may also use the public static URI asGraph(RDFDataUnit.Entry entry) method to convert the returned entry to the URI of the data graph of that entry.

Code 5a shows how a new graph can be created and added to the output RDF data unit.

Code 5a - Creating new data graph in the output data unit

RDFDataUnit.Entry createGraph = RDFHelper.createGraph(output, "http://output/graph");
URI outputURI = createGraph.getDataGraphURI();

The sample fragment in Code 5b shows how the developer can add an existing graph to the output RDF data unit.

Code 5b - adding graph

URI graphURI = ...
RDFHelper.addGraph(output, graphURI);

Using OpenRDF API to Work With RDF Data Units

Adding Triples to the Output RDF Data Unit

Code 6 shows how to add triples to the output RDF data unit (output is the output RDF data unit).

Code 6 - Adding triples to output data unit

RepositoryConnection connection = null;
try {
    connection = output.getConnection();
    ValueFactory factory = connection.getValueFactory();
    final URI subject = factory.createURI("http://data.example.org/resource/mySubject");
    final URI predicate = factory.createURI("http://data.example.org/ontology/myPred");
    final Literal object = factory.createLiteral("xxxx");
    Statement s = factory.createStatement(subject, predicate, object);
    connection.add(s, outputURI);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error.addition");
} catch (RepositoryException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error.repository");
} finally {
    if (connection != null) {
        try {
            connection.close();
        } catch (RepositoryException ex) {
            log.error("Error on close.", ex);
        }
    }
}

In Line 3, we use the output data unit output to obtain a connection to the RDF repository. Lines 5-8 construct a statement. In Line 9, we write the data (one statement) to the graph that we created before using RDFHelper, identified by its URI outputURI.

Note

You may remove statements by calling connection.remove(subject, predicate, object);

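Alternative for Adding Triples to the Output RDF Data Unit

The variant below uses try-with-resources, so the connection is closed automatically. It requires an OpenRDF API version in which RepositoryConnection implements AutoCloseable and in which the IRI type and createIRI() are available.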

try (RepositoryConnection connection = output.getConnection()) {
    ValueFactory factory = connection.getValueFactory();
    final IRI subject = factory.createIRI("http://data.example.org/resource/subject");
    final IRI predicate = factory.createIRI("http://data.example.org/ontology/predicate");
    final Literal object = factory.createLiteral("value");
    Statement s = factory.createStatement(subject, predicate, object);
    connection.add(s, outputURI);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error.addition");
} catch (RepositoryException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error.repository");
}

Querying the RDF Data Unit

To query an input or output RDF data unit, one obtains the connection as described in Code 6 and then uses the approach described here.

It is also possible to call the connection.getStatements(subject, null, null, false) method to get all statements from the repository that have the given subject. See here for other options for working with RepositoryConnection.
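
The null arguments in the getStatements call act as wildcards. A minimal sketch of that matching rule follows, assuming a plain in-memory list and strings in place of real RDF values; StatementPatternSketch and its Triple type are hypothetical, and the final boolean of the OpenRDF method (includeInferred) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory illustration of wildcard statement matching (NOT the OpenRDF API).
class StatementPatternSketch {
    // Minimal stand-in for an RDF statement; strings replace real RDF values.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // A null argument matches any value in that position.
    static List<Triple> getStatements(List<Triple> store, String subject, String predicate, String object) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : store) {
            boolean matches = (subject == null || subject.equals(t.subject))
                    && (predicate == null || predicate.equals(t.predicate))
                    && (object == null || object.equals(t.object));
            if (matches) {
                result.add(t);
            }
        }
        return result;
    }
}
```

Passing a subject and two nulls selects every statement about that subject, while three nulls select the whole store.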

Using Dataset While Querying

You may also use Dataset class to restrict the graphs on top of which the query is executed, for example as depicted in Code 7:

Code 7

...
String query = ...;
final Update update = connection.prepareUpdate(QueryLanguage.SPARQL, query);

DatasetImpl dataset = new DatasetImpl(); // the add/set methods below are defined on DatasetImpl, not on the Dataset interface
dataset.addDefaultGraph(entry.getDataGraphURI());
dataset.addDefaultRemoveGraph(targetGraph);
dataset.setDefaultInsertGraph(targetGraph);
update.setDataset(dataset);
update.execute();
...
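
To make the roles of the dataset graphs more concrete, here is a toy model of an update that reads only from the default graphs and writes only to the insert graph. It is a sketch under stated assumptions: DatasetScopeSketch, its string-encoded triples, and the copyAll operation are hypothetical and merely mimic the scoping effect; they do not reproduce SPARQL semantics or the OpenRDF implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy store keyed by graph name (NOT the OpenRDF API).
class DatasetScopeSketch {
    private final Map<String, Set<String>> graphs = new HashMap<>();

    // Adds a triple (encoded as a plain string) to the given graph.
    void add(String graph, String triple) {
        graphs.computeIfAbsent(graph, g -> new LinkedHashSet<>()).add(triple);
    }

    // Returns a read-only view of the triples stored in one graph.
    Set<String> triplesIn(String graph) {
        return Collections.unmodifiableSet(graphs.getOrDefault(graph, Collections.emptySet()));
    }

    // Reads only from defaultGraphs and writes only into insertGraph,
    // analogous to the default graphs and the default insert graph of a dataset.
    // Returns the number of triples newly added to insertGraph.
    int copyAll(Set<String> defaultGraphs, String insertGraph) {
        Set<String> matched = new LinkedHashSet<>();
        for (String graph : defaultGraphs) {
            matched.addAll(triplesIn(graph));
        }
        Set<String> target = graphs.computeIfAbsent(insertGraph, g -> new LinkedHashSet<>());
        int before = target.size();
        target.addAll(matched);
        return target.size() - before;
    }
}
```

Graphs that are not listed as default graphs are never read, which is the restriction the dataset provides.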

Using the WritableSimpleRDF DPU Extension

Apart from RDFHelper, there is also WritableSimpleRDF, which is not a data unit helper but a DPU extension. This extension may be used to write triples into the output data unit.

The advantages of this extension are:

- It has methods for adding triples to the output data unit directly.
- The methods for creating new RDF entries and adding existing RDF entries are a bit simpler, as they do not take the data unit as a parameter. WritableSimpleRDF is bound to a certain data unit when the extension is initialized.
- It automatically uses the FaultTolerant extension, if it is allowed for the DPU. So if you make your DPUs fault tolerant, you should consider using the WritableSimpleRDF extension, as it hides the fault tolerance calls.

For details about WritableSimpleRDF extension, please see Working with the RDF Data Unit.
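
As a rough idea of the kind of boilerplate such an extension can hide, the sketch below wraps an operation in a bounded retry loop. This assumes that fault tolerance here amounts to retrying a failed operation a limited number of times; RetrySketch and withRetries are hypothetical names, and the code is not the FaultTolerant or WritableSimpleRDF implementation.

```java
import java.util.function.Supplier;

// Hypothetical retry helper illustrating the idea of a fault-tolerant call (NOT the UnifiedViews API).
class RetrySketch {
    // Runs the action up to maxAttempts times and returns the first successful result.
    // If every attempt fails, the last failure is rethrown.
    static <T> T withRetries(int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException ex) {
                last = ex; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

Without such a wrapper, each repository call in the DPU would need its own retry handling.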

Advanced Topics

It is possible to use the EntityBuilder helper to construct statements about a certain subject.

It is possible to use the SparqlUtils helper class to construct and execute SPARQL queries. This is an alternative to the pure OpenRDF approach.

Notes:

Imports needed for Code 6:

import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.query.GraphQuery;
import org.openrdf.query.GraphQueryResult;
import org.openrdf.query.QueryLanguage;
import org.openrdf.repository.RepositoryConnection;

import eu.unifiedviews.dataunit.DataUnit;
import eu.unifiedviews.dataunit.rdf.RDFDataUnit;
import eu.unifiedviews.dataunit.rdf.WritableRDFDataUnit;

import eu.unifiedviews.dataunit.DataUnitException;
import org.openrdf.repository.RepositoryException;
import eu.unifiedviews.helpers.dataunit.rdf.RDFHelper;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.Literal;

Using API Classes Instead of RDFhelper

Abstract

This section contains a short guide on how to use the API classes instead of RDFHelper.

Note

You should use RDFHelper to access the RDF data unit if possible, see: Working with the RDF Data Unit

Using API Classes to Read Data From RDF Data Unit

The first step is to get a list of the graphs within the data unit.

Let's start by showing how you, as a DPU developer, may iterate over the input data unit to get access to the RDF graphs that arrive over the input RDF data unit. The code below goes in the innerExecute() method of the DPU.

Code 1 - Iterating over input RDF graphs using API classes

Set<URI> rdfGraphs = new HashSet<>();
RDFDataUnit.Iteration it = null;
try {
    it = input.getIteration();
    while (it.hasNext()) {
        final URI dataGraphURI = it.next().getDataGraphURI();
        rdfGraphs.add(dataGraphURI);
    }
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error");
} finally {
    if (it != null) {
        try {
            it.close();
        } catch (DataUnitException ex) {
            log.error("Error on close.", ex);
        }
    }
}

In Lines 5 - 8, we iterate over the entries (RDF graphs) in the input RDF data unit.
In Line 6, we get the RDF data graph URI: the URI of the RDF graph in the working RDF store.
In Line 7, we add this URI to the set of such URIs.
As the iterator over entries does not extend AutoCloseable, we need to take care of closing it at the end (Line 14). That is why we do all the work in a try-catch block (Lines 3 - 11) with a finally clause (Lines 11 - 19). We also catch DataUnitException, which may be thrown by the iterator, in Lines 9 - 11.
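
The closing pattern explained above can be tried out in isolation with a stand-in iteration type. The sketch assumes a hypothetical Iteration interface whose close() throws no checked exception (the real one throws DataUnitException, as Code 1 shows) and also shows one possible way to adapt such a type for try-with-resources; none of these types belong to UnifiedViews.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of closing an iteration that is not AutoCloseable.
class IterationCloseSketch {
    // Stand-in for a data unit iteration: it must be closed, but is not AutoCloseable.
    interface Iteration {
        boolean hasNext();
        String next();
        void close();
    }

    // List-backed implementation that records whether close() was called.
    static final class ListIteration implements Iteration {
        private final Iterator<String> delegate;
        private boolean closed = false;

        ListIteration(List<String> graphUris) {
            this.delegate = graphUris.iterator();
        }

        public boolean hasNext() { return delegate.hasNext(); }
        public String next() { return delegate.next(); }
        public void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    // Adapter type whose close() declares no checked exception.
    interface QuietCloseable extends AutoCloseable {
        @Override
        void close();
    }

    // Explicit try/finally, the same shape as Code 1.
    static List<String> collect(Iteration it) {
        List<String> graphUris = new ArrayList<>();
        try {
            while (it.hasNext()) {
                graphUris.add(it.next());
            }
        } finally {
            it.close(); // runs even if iteration fails
        }
        return graphUris;
    }

    // Same behaviour using try-with-resources through the adapter.
    static List<String> collectWithResources(Iteration it) {
        List<String> graphUris = new ArrayList<>();
        try (QuietCloseable closer = it::close) {
            while (it.hasNext()) {
                graphUris.add(it.next());
            }
        }
        return graphUris;
    }
}
```

Both variants close the iteration whether or not the loop completes normally; the second only trades the finally block for a small adapter.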

After executing Code 1, the variable rdfGraphs holds the set of RDF graphs in which the input RDF data (triples) reside. We may then work with the RDF graphs using the standard approach of the OpenRDF API.

The code in Code 1 can be simplified by using the helper eu.unifiedviews.helpers.dataunit.rdf.RDFHelper in uv-dataunit-helpers. See Working with the RDF Data Unit. In this case, the DPU developer does not need to handle iteration over the input RDF graphs manually.

In general, as a DPU developer you should use helpers, if possible.

Using API Classes to Create Graphs in the Output Data Unit

There are two methods that you, as a DPU developer, may use to add graphs to the output RDF data unit:

URI addNewDataGraph(String symbolicName) throws DataUnitException;
- This method creates a new entry in the output RDF data unit under the given symbolicName. The symbolicName must be unique in the context of the data unit. It also associates the newly created entry with the (generated) data graph URI, which is returned by this method. Such data graph URI may be used by the DPU developer to store RDF triples to.
  Note
  For an explanation of symbolicNames and other metadata of entries in data units, please see Basic Concepts for DPU Developers . The metadata generated is stored in the working RDF store of UnifiedViews.
void addExistingDataGraph(String symbolicName, URI existingDataGraphURI) throws DataUnitException;
- This method adds an existing data graph URI to the output data unit under the given symbolicName. The symbolicName must be unique in the context of the data unit. It automatically creates new entry in the output data unit with the given symbolicName. For explanation of symbolicNames and other metadata of entries in data units, please see Basic Concepts for DPU Developers.
  In this case, the name of the data graph (the URI of the data graph where triples are stored) is specified by the DPU developer, who has to ensure that the name (URI) of such graph does not collide with the names of the graphs used by other entries/data units/DPUs. The URI of such a graph can be based on the prefix obtained by calling getBaseDataGraphURI() to ensure that the name of the graph does not collide with names of graphs in other data units/DPUs.
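
One possible way to derive graph URIs from a base prefix while avoiding collisions within the same data unit is sketched below. The naming scheme (sanitized symbolic name plus a numeric suffix on collision) and the class GraphUriNamingSketch are assumptions made for illustration; the prefix string stands in for whatever getBaseDataGraphURI() returns, and nothing here is prescribed by UnifiedViews.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper that derives unique graph URIs under one base prefix.
class GraphUriNamingSketch {
    private final String basePrefix;              // assumed to come from the data unit's base graph URI
    private final Set<String> used = new HashSet<>(); // URIs already handed out by this helper

    GraphUriNamingSketch(String basePrefix) {
        this.basePrefix = basePrefix;
    }

    // Returns a URI under the base prefix that this helper has not returned before.
    String graphUriFor(String symbolicName) {
        // Replace characters that are awkward in a URI path segment.
        String slug = symbolicName.replaceAll("[^A-Za-z0-9_-]", "_");
        String candidate = basePrefix + "/" + slug;
        int suffix = 1;
        while (!used.add(candidate)) {
            suffix++;
            candidate = basePrefix + "/" + slug + "-" + suffix;
        }
        return candidate;
    }
}
```

The prefix guards against collisions with other data units and DPUs, while the set of used names covers collisions inside the same data unit.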

To add an existing RDF graph to the output data unit, the code below may be used in the innerExecute() method of the DPU.

Code 4 - Adding an existing RDF graph to the output data unit using API classes

String graphName = ...
URI graphURI = ...
output.addExistingDataGraph(graphName, graphURI);

In Line 3, a new entry is created in the output data unit; its symbolicName metadata is set to graphName and its existingDataGraphURI is set to graphURI.

Note

The API classes do not directly support adding triples to the output RDF data unit or querying data units. To add triples or query data units, one has to use standard OpenRDF API methods.
