
(work in progress)

This tutorial explains how Relational data unit entries (tables) may be obtained from input Relational data units, and how Relational entries may be written to the output data unit.

For basic information about data units, please see the basic description of data units.





(needs refinement)

 

To configure the Relational data unit (database connection), the following properties are available:

  • database.dataunit.sql.type
    • file (default)
    • inMemory
  • database.dataunit.sql.baseurl
    • jdbc:h2:file: (default)
  • database.dataunit.sql.user
    • filesUser (default)
  • database.dataunit.sql.password

 

By default, the Relational data unit uses an H2 database in embedded mode, with the tables stored in files.
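A minimal sketch of these properties in the backend configuration file might look as follows (the values shown are the documented defaults; the password value is purely illustrative):

```properties
database.dataunit.sql.type = file
database.dataunit.sql.baseurl = jdbc:h2:file:
database.dataunit.sql.user = filesUser
database.dataunit.sql.password = changeMe
```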

Technical:

Database file name (file DB mode): 

dbFileName.append("dataUnitDb");
dbFileName.append("_");
dbFileName.append(this.executionId);
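Putting the pieces together, the full JDBC URL of the file-backed H2 database can be sketched as follows. This is a simplified illustration: the working directory and executionId values are assumptions, and the actual path assembly is done internally by the data unit factory.

```java
public class DbUrlSketch {

    // Sketch of how the H2 JDBC URL is composed: the configured base URL
    // (jdbc:h2:file: by default) plus a per-execution database file name
    public static String buildDbUrl(String baseUrl, String workingDir, String executionId) {
        // Database file name as built above: "dataUnitDb" + "_" + executionId
        StringBuilder dbFileName = new StringBuilder();
        dbFileName.append("dataUnitDb");
        dbFileName.append("_");
        dbFileName.append(executionId);
        return baseUrl + workingDir + "/" + dbFileName;
    }

    public static void main(String[] args) {
        // Illustrative values only; the real directory and execution id
        // are supplied by the UnifiedViews backend
        System.out.println(buildDbUrl("jdbc:h2:file:", "/tmp/uv", "42"));
    }
}
```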

 

Reading tables from input Relational data unit:

Please prepare DPU "MyDpu" as described in Tutorial: Creating new DPU. To read tables from the input data unit, one has to define an input Relational data unit.

Code 1 - defining input data unit
 @DataUnit.AsInput(name = "input")
 public RelationalDataUnit input;

All data units must be public and carry the proper annotation; the annotation must at least contain a name, which will be the name visible to pipeline developers in the UnifiedViews administration interface. The code above goes to the main DPU class.

In order to work with the input data unit, you typically use the RelationalHelper class (eu.unifiedviews.helpers.dataunit.relational.RelationalHelper in uv-dataunit-helpers).


The RelationalHelper class provides methods to get the list of tables the input Relational data unit contains and to operate on them:
  • static Set<RelationalDataUnit.Entry> getTables(RelationalDataUnit relationalDataUnit) throws DataUnitException
    • Returns the set of entries (Relational tables) in the given relationalDataUnit.
  • static Map<String, RelationalDataUnit.Entry> getTablesMap(RelationalDataUnit relationalDataUnit) throws DataUnitException
    • Returns a map of entries (Relational tables) in the given relationalDataUnit; the key of each map entry is the symbolic name of the table.


Code 2 shows how the method for getting tables can be used (the code below goes to the innerExecute() method of the DPU). Line 2 returns the set of table entries.
Code 2 - Getting the input table entries using RelationalHelper
try {
    Set<RelationalDataUnit.Entry> tableEntries = RelationalHelper.getTables(input);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error");
}

Having the set of table entries, you may then iterate over tableEntries and process each table.
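The iteration typically looks like the sketch below. To keep the example self-contained, RelationalDataUnit.Entry is replaced by a minimal stand-in interface that is assumed to expose getTableName(); in a real DPU you would iterate over the Set<RelationalDataUnit.Entry> returned by RelationalHelper.getTables(input).

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class TableIterationSketch {

    // Minimal stand-in for RelationalDataUnit.Entry (assumption: the real
    // entry exposes the name of the underlying database table)
    public interface Entry {
        String getTableName();
    }

    // Collect the table names of all entries, as a DPU might do before
    // processing each table via SQL
    public static Set<String> collectTableNames(Set<Entry> tableEntries) {
        Set<String> names = new LinkedHashSet<>();
        for (Entry entry : tableEntries) {
            names.add(entry.getTableName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Illustrative table names only
        Set<Entry> entries = new LinkedHashSet<>();
        entries.add(() -> "table1");
        entries.add(() -> "table2");
        System.out.println(collectTableNames(entries));
    }
}
```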

 

Code 3 - Getting a dataset (applies to RDF data units; here, input is an RDF data unit and RDFHelper comes from eu.unifiedviews.helpers.dataunit.rdf)
try {
    org.openrdf.query.Dataset dataset = RDFHelper.getDatasetWithDefaultGraphs(input);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error");
}

 

Writing RDF entries to output RDF data unit

Please prepare DPU "MyDpu" as described in Tutorial: Creating new DPU. To write RDF entries (graphs/triples) to the output RDF data unit, one has to define an output RDF data unit.

Code 4 - defining output data unit
@DataUnit.AsOutput(name = "output")
public WritableRDFDataUnit output;

All data units must be public and carry the proper annotation; the annotation must at least contain a name, which will be the name visible to pipeline developers in the UnifiedViews administration interface. The code above goes to the main DPU class.

In order to work with the output RDF data unit (create graphs in the output data unit), you typically use the RDFHelper class (eu.unifiedviews.helpers.dataunit.rdf.RDFHelper in uv-dataunit-helpers). You can also use the data unit API directly, but this is typically not needed (details can be found here). Note: the RDFHelper class does not directly support adding triples to the output RDF data unit or querying data units. To add triples or query data units, one has to use standard OpenRDF API methods (simple examples are depicted in Code 6 and Code 7) or the special UnifiedViews extension SimpleRDF.


DPU developers may use these basic methods to add graphs to the output RDF data unit:
  • public static RDFDataUnit.Entry createGraph(WritableRDFDataUnit rdfDataUnit, final String graphName) throws DataUnitException
    • This method creates a new entry in the output RDF data unit rdfDataUnit under the given graphName. The graphName must be unique in the context of the data unit, because graphName is used as the symbolicName of the entry. It also associates the newly created entry with a (generated) data graph URI. This data graph URI may be used by the DPU developer to store RDF triples (the data graph URI of the returned entry may be obtained by calling asGraph(entry)). For an explanation of symbolicNames and other metadata of entries in data units, please see Basic Concepts for DPU developers. The generated metadata is stored in the working RDF store of UnifiedViews.
  • public static RDFDataUnit.Entry addGraph(WritableRDFDataUnit rdfDataUnit, URI graphURI) throws DataUnitException
    • This method adds an existing data graph URI graphURI to the rdfDataUnit data unit. The graphURI must be unique in the context of the data unit. It automatically creates a new entry in the rdfDataUnit data unit with the symbolicName equal to the graphURI. For an explanation of symbolicNames and other metadata of entries in data units, please see Basic Concepts for DPU developers. In this case, graphURI (the URI of the data graph where triples are stored) is specified by the DPU developer, who has to ensure that the name (URI) of that graph does not collide with the names of graphs used by other entries/data units/DPUs. The URI of such a graph can be based on the prefix obtained by calling rdfDataUnit.getBaseDataGraphURI() to avoid collisions with graphs in other data units/DPUs; however, the DPU developer must still ensure that the name of the data graph does not collide with the names of other entries within the same data unit.
  • public static RDFDataUnit.Entry addGraph(WritableRDFDataUnit rdfDataUnit, URI graphURI, final String graphName)
    • The same as above, but the symbolicName of the created entry is explicitly specified as graphName.

 

If you would like to reuse the data graph from input (RDF) data unit also in the output data unit, please check RDF Data Processing Performance Optimization first.



As the methods above return an RDFDataUnit.Entry, you may also use the public static URI asGraph(RDFDataUnit.Entry entry) method to convert the returned entry to the URI of its data graph.

Code 5a shows how a new graph can be created and added to the output RDF data unit.

Code 5a - Creating new data graph in the output data unit
RDFDataUnit.Entry createGraph = RDFHelper.createGraph(output, "http://output/graph");
URI outputURI = createGraph.getDataGraphURI();

 

The sample fragment in Code 5b shows how the developer can add an existing graph to the output RDF data unit.

 

Code 5b - adding an existing graph
URI graphURI = ...
RDFHelper.addGraph(output, graphURI);

 


Using OpenRDF API to work with RDF data units  

Adding triples to the output RDF data unit 

Code 6 shows how to add triples to output RDF data unit (output is the output RDF data unit). 

Code 6 - Adding triples to output data unit
RepositoryConnection connection = null;
try {
    connection = output.getConnection();
    ValueFactory factory = connection.getValueFactory();
    final URI subject = factory.createURI("http://data.example.org/resource/mySubject");
    final URI predicate = factory.createURI("http://data.example.org/ontology/myPred");
    final Literal object = factory.createLiteral("xxxx");
    Statement s = factory.createStatement(subject, predicate, object);
    connection.add(s, outputURI);
} catch (DataUnitException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error.addition");
} catch (RepositoryException ex) {
    throw ContextUtils.dpuException(ctx, ex, "dpuName.error.repository");
} finally {
    if (connection != null) {
        try {
            connection.close();
        } catch (RepositoryException ex) {
            log.error("Error on close.", ex);
        }
    }
}

In Line 3, we use the output data unit output to obtain a connection to the RDF repository. Lines 5-8 construct a statement. In Line 9, we write the statement to the graph (outputURI) which we created before using RDFHelper.

Note: You may remove statements by calling connection.remove(subject, predicate, object);

 

Querying RDF data unit 

To query an input/output RDF data unit, one obtains the connection as described in Code 6 and then uses the approach described here.

It is also possible to call connection.getStatements(subject, null, null, false) to get all statements from the repository having the given subject as the subject of the triple. See here for other options for working with RepositoryConnection.


Using Dataset while querying

You may also use the Dataset class to restrict the graphs over which the query is executed, e.g. as depicted in Code 7:

Code 7
...
String query = ...;
final Update update = connection.prepareUpdate(QueryLanguage.SPARQL, query);

Dataset dataset = new DatasetImpl();
dataset.addDefaultGraph(entry.getDataGraphURI());
dataset.addDefaultRemoveGraph(targetGraph);
dataset.setDefaultInsertGraph(targetGraph);
update.setDataset(dataset);
update.execute();
...

 


Using WritableSimpleRDF DPU extension

Apart from RDFHelper, there is also the extension WritableSimpleRDF, which is not a data unit helper but a DPU extension. This extension may be used to write triples into the output data unit. Its advantages are:

  • it has methods for adding triples to the output data unit directly
  • the methods for creating new RDF entries and adding existing RDF entries are a bit simpler, as they do not take the data unit as a parameter. The WritableSimpleRDF instance is bound to a certain data unit during the initialization of the extension.
  • it automatically uses the FaultTolerant extension, if it is enabled for the DPU. So if you write your DPUs to be fault tolerant, you should consider using the WritableSimpleRDF extension, as it hides the fault tolerance calls smoothly.

For details about WritableSimpleRDF extension, please see here

Advanced/Optional Topics

It is possible to use the EntityBuilder helper to construct statements about a certain subject. For sample use, please see here.

It is possible to use the SparqlUtils helper class to construct and execute SPARQL queries. For sample use, please see here. This is an alternative to the pure OpenRDF approach.

TODO: 

Describe/Document: 

  •  DataUnitUtils.generateSymbolicName(Metadata.class)
  • RDFDataUnitImpl - addNewDataGraph - new URI is used as data graph URI and at the same time also as entry URI
  •  MetadataUtils.set(output, symbolicName, FilesVocabulary.UV_VIRTUAL_PATH, fileName);  UV_VIRTUAL GRAPH should be used similarly

 

Notes: 

Imports needed for Code 6:

import org.openrdf.model.Literal;
import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.RepositoryException;

import eu.unifiedviews.dataunit.DataUnit;
import eu.unifiedviews.dataunit.DataUnitException;
import eu.unifiedviews.dataunit.rdf.RDFDataUnit;
import eu.unifiedviews.dataunit.rdf.WritableRDFDataUnit;
import eu.unifiedviews.helpers.dataunit.rdf.RDFHelper;



 
