Helpers for Adding Entries to RDF/Files Output Data Unit
This section contains a short guide on how to use helpers for adding RDF triples and files to output data units.
For basic information about data units, please refer to: Basic Concepts for DPU Developers.
To see the core data unit interfaces and how the particular types of data units (RDF, Files) extend such interfaces, please look at .
The helpers described on this page are advanced helpers that support adding entries (files, triples) to output data units (Files, RDF data units). There are also simpler data unit helpers, such as FilesHelper, which should also be considered. Such helpers have certain disadvantages, but are a bit simpler to use.
The DPU extensions described on this page do not support reading of entries. For reading entries from input data units, please see this tutorial.
Refer to this tutorial on how to use these simpler helpers. In general, you should use the DPU extensions described on this page (rather than the simpler helpers) for adding entries to an output data unit if you want your DPUs to be fault tolerant. For a more detailed comparison of these helpers, see here.
This helper is an extension.
Before this extension can be used, you have to insert the following code into the Main DPU class, where param in Line 1 contains the name of the data unit the helper wraps (in this case 'output').
@ExtensionInitializer.Init(param = "output")
public WritableSimpleFiles outputFilesHelper;
There are two methods DPU developers may use to add files to the output files data unit (using the helper):
public File create(final String fileName) throws DPUException
This method creates a new empty file in the output data unit with the symbolicName and virtualPath metadata equal to fileName. For an explanation of symbolicName, virtualPath and other metadata of entries in data units, please see Basic Concepts for DPU Developers. The physical name of the created file is generated, and the file is physically stored in the working directory of the given pipeline execution.
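To make the behaviour of create more concrete, here is a minimal, self-contained sketch. It does not use the real WritableSimpleFiles class: StubFilesHelper, Entry and the temporary working directory are hypothetical stand-ins written only to mirror the description above (generated physical name, file stored in the working directory, symbolicName and virtualPath both equal to fileName).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class CreateFileSketch {

    /** Hypothetical stand-in for one entry of a files data unit. */
    static final class Entry {
        final String symbolicName;
        final String virtualPath;
        final File physicalFile;

        Entry(String symbolicName, String virtualPath, File physicalFile) {
            this.symbolicName = symbolicName;
            this.virtualPath = virtualPath;
            this.physicalFile = physicalFile;
        }
    }

    /** Hypothetical stand-in for the helper; NOT the real WritableSimpleFiles. */
    static final class StubFilesHelper {
        private final File workingDir;
        // symbolicName -> entry
        final Map<String, Entry> entries = new LinkedHashMap<>();

        StubFilesHelper(File workingDir) {
            this.workingDir = workingDir;
        }

        /** Creates an empty file with a generated physical name and registers it. */
        File create(String fileName) {
            // The physical name is generated and is unrelated to fileName.
            File physical = new File(workingDir, UUID.randomUUID().toString());
            try {
                if (!physical.createNewFile()) {
                    throw new IllegalStateException("File already exists: " + physical);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // symbolicName and virtualPath are both set to fileName.
            entries.put(fileName, new Entry(fileName, fileName, physical));
            return physical;
        }
    }

    /** Creates an entry, writes the given text into the returned file, returns the helper. */
    static StubFilesHelper demo(String fileName, String content) {
        try {
            // Assumption: a temporary directory plays the role of the execution working directory.
            File workingDir = Files.createTempDirectory("execution-working-dir").toFile();
            StubFilesHelper helper = new StubFilesHelper(workingDir);
            File file = helper.create(fileName);
            Files.write(file.toPath(), content.getBytes(StandardCharsets.UTF_8));
            return helper;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StubFilesHelper helper = demo("report.csv", "id,name\n");
        Entry entry = helper.entries.get("report.csv");
        System.out.println(entry.symbolicName + " is stored as " + entry.physicalFile.getName());
    }
}
```

In a real DPU the pattern would presumably be the same: call create on the injected helper, then write the content into the returned File.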
public void add(final File file, final String fileName) throws DPUException
This method adds an existing file to the output data unit. It automatically creates a new entry in the output data unit with the symbolicName and virtualPath metadata equal to fileName. For an explanation of symbolicName, virtualPath and other metadata of entries in data units, please see Basic Concepts for DPU Developers. In this case, the real location and the physical name of the file remain as they were when the file was created, before this method was called. Be aware that, in this case, the file is not created in the working directory of the given pipeline execution.
Without using this helper, the task of adding an existing file may be executed as follows:
Symbolic symbolicName = output.addExistingFile(fileName, file.toURI().toString());
MetadataUtils.set(output, symbolicName, FilesVocabulary.UV_VIRTUAL_PATH, fileName);
In Line 1, the new entry in the output data unit is created, and symbolicName and fileURI are set for that entry. Line 2 then sets the virtualPath metadata for the same entry.
The advantage of the helper is that the code is cleaner: adding an existing file to the output files data unit takes one line with the helper, compared to two lines without it.
Additionally, when the helper is not used, you as a DPU developer must be aware of the virtualPath metadata and must know that the recommended practice is to set virtualPath = symbolicName.
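The difference between the two approaches can be sketched with plain Java stand-ins. StubFilesDataUnit and StubFilesHelper below are hypothetical classes, not part of the real API; they only model the idea that the helper performs both steps (registering the file URI and setting virtualPath to the same value as symbolicName) in a single call, while the file itself stays where it is.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class AddExistingFileSketch {

    /** Hypothetical stand-in for an output files data unit. */
    static final class StubFilesDataUnit {
        final Map<String, String> fileUris = new HashMap<>();     // symbolicName -> file URI
        final Map<String, String> virtualPaths = new HashMap<>(); // symbolicName -> virtualPath

        void addExistingFile(String symbolicName, String fileUri) {
            fileUris.put(symbolicName, fileUri);
        }

        void setVirtualPath(String symbolicName, String virtualPath) {
            virtualPaths.put(symbolicName, virtualPath);
        }
    }

    /** Hypothetical stand-in for the helper: one call performs both steps. */
    static final class StubFilesHelper {
        private final StubFilesDataUnit output;

        StubFilesHelper(StubFilesDataUnit output) {
            this.output = output;
        }

        void add(File file, String fileName) {
            // The file is not copied or moved; only its URI is recorded.
            output.addExistingFile(fileName, file.toURI().toString());
            // Recommended practice: virtualPath = symbolicName.
            output.setVirtualPath(fileName, fileName);
        }
    }

    public static void main(String[] args) {
        File existing = new File(System.getProperty("java.io.tmpdir"), "already-there.txt");

        // Manual approach: two steps, and the caller must remember the virtualPath.
        StubFilesDataUnit manual = new StubFilesDataUnit();
        manual.addExistingFile("data.txt", existing.toURI().toString());
        manual.setVirtualPath("data.txt", "data.txt");

        // Helper approach: a single call with the same result.
        StubFilesDataUnit viaHelper = new StubFilesDataUnit();
        new StubFilesHelper(viaHelper).add(existing, "data.txt");

        boolean same = manual.fileUris.equals(viaHelper.fileUris)
                && manual.virtualPaths.equals(viaHelper.virtualPaths);
        System.out.println("Both approaches produce the same entry: " + same);
    }
}
```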
This helper is an extension.
Before this extension can be used, the following code has to be inserted into the Main DPU class, where param in Line 1 contains the name of the data unit the helper wraps (in this case 'output').
@ExtensionInitializer.Init(param = "output")
public WritableSimpleRdf outputRdfHelper;
There are two methods DPU developers may use to add RDF triples to the output RDF data unit (using the helper):
public WritableSimpleRdf add(Resource s, URI p, Value o) throws SimpleRdfException, DPUException
This method adds one triple to the output RDF data unit. See the example below for how Resources, URIs and Values (all classes from the openRDF API) may be used.
public WritableSimpleRdf add(List<Statement> statements) throws SimpleRdfException, DPUException
This method adds a list of statements (triples) previously prepared using the openRDF API.
Sample usage of the first method, add(Resource s, URI p, Value o), is shown below:
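// Obtain the value factory from the helper and build the three parts of the triple.
org.openrdf.model.ValueFactory valueFactory = outputRdfHelper.getValueFactory();
org.openrdf.model.URI s = valueFactory.createURI("http://data.example.com/resource/subjectS");
org.openrdf.model.URI p = valueFactory.createURI("http://data.example.com/resource/predicateP");
org.openrdf.model.Value o = valueFactory.createLiteral(rowNumber);
// Add the triple to the output RDF data unit.
outputRdfHelper.add(s, p, o);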
By default, when the add methods above are called, the triples are added to the default entry (RDF graph), which is automatically generated for you in the output data unit that the helper wraps. The symbolicName of the default entry is set to default-output.
In typical cases, preparing one entry (RDF graph), where all the data (triples) is loaded, is sufficient.
In certain cases, if you have already prepared an entry (e.g. by using RDFDataUnitUtils), you may specify that you want to add triples to this prepared entry (and not to the generated default entry) by calling:
public WritableSimpleRdf setOutput(RDFDataUnit.Entry entry) throws SimpleRdfException, DPUException
This method sets the output entry (RDF graph) to which data (triples) is added. This must be called before any method for adding triples (see above) is called.
public WritableSimpleRdf setOutput(final List<RDFDataUnit.Entry> entries) throws SimpleRdfException, DPUException
This method sets the output entries (RDF graphs) to which data (triples) is added. This must be called before any method for adding triples (see above) is called. If the list of entries contains more than one entry, then the added triples are automatically added to all of the listed output data unit entries (RDF graphs).
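The interplay between the default entry and setOutput can be illustrated with a small, self-contained model. StubRdfHelper and Triple below are hypothetical stand-ins that use plain strings instead of openRDF types and entry names instead of RDFDataUnit.Entry objects; the exception thrown when setOutput is called too late is an assumption of this sketch, not documented behaviour of the real helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SetOutputSketch {

    /** Hypothetical stand-in for a triple; the real helper works with openRDF types. */
    static final class Triple {
        final String s;
        final String p;
        final String o;

        Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    /** Hypothetical stand-in for the RDF helper; NOT the real WritableSimpleRdf. */
    static final class StubRdfHelper {
        static final String DEFAULT_ENTRY = "default-output";

        // entry (graph) name -> triples stored in that graph
        final Map<String, List<Triple>> graphs = new LinkedHashMap<>();
        private List<String> targets = Collections.singletonList(DEFAULT_ENTRY);
        private boolean tripleAdded = false;

        /** Selects the entries that subsequent add calls write to. */
        StubRdfHelper setOutput(List<String> entryNames) {
            // Assumption of this sketch: a late call is rejected with an exception.
            if (tripleAdded) {
                throw new IllegalStateException("setOutput must be called before add");
            }
            targets = new ArrayList<>(entryNames);
            return this;
        }

        /** Adds the triple to every currently selected entry. */
        StubRdfHelper add(String s, String p, String o) {
            tripleAdded = true;
            for (String target : targets) {
                graphs.computeIfAbsent(target, k -> new ArrayList<>()).add(new Triple(s, p, o));
            }
            return this;
        }
    }

    public static void main(String[] args) {
        // Without setOutput, triples go to the default entry.
        StubRdfHelper byDefault = new StubRdfHelper();
        byDefault.add("subjectS", "predicateP", "1");
        System.out.println("Default run wrote to: " + byDefault.graphs.keySet());

        // With two entries selected, each added triple lands in both graphs.
        StubRdfHelper fanOut = new StubRdfHelper();
        List<String> entries = new ArrayList<>();
        entries.add("graphA");
        entries.add("graphB");
        fanOut.setOutput(entries).add("subjectS", "predicateP", "1");
        System.out.println("Fan-out run wrote to: " + fanOut.graphs.keySet());
    }
}
```

The stub returns itself from setOutput and add, mirroring the WritableSimpleRdf return type of the signatures above, so calls can be chained.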
uses output.setOutput() and output.add()
uses output.add()
uses output.setOutput() and output.add()