
Helpers for Adding Entries to RDF/Files Output Data Unit

Abstract


This section contains a short guide on how to use helpers for adding RDF triples and files to output data units.

For basic information about data units, please refer to: Basic Concepts for DPU Developers.

To see the core data unit interfaces and how the particular types of data units (RDF, Files) extend such interfaces, please look at .

The helpers described on this page are advanced helpers that support adding entries (files, triples) to output data units (Files and RDF data units). There are also simpler data unit helpers, such as FilesHelper, which are worth considering as well. Such helpers have certain disadvantages, but are somewhat simpler to use. Refer to this tutorial to learn how to use these simpler helpers.

The DPU extensions described on this page do not support reading entries. For reading entries from input data units, please see this tutorial.

In general, if you want your DPUs to be fault tolerant, you should use the DPU extensions described on this page (rather than the simpler helpers) for adding entries to an output data unit. For a more detailed comparison of these helpers, see here.

Adding Files to an Output Files Data Unit

This helper is an extension.

Before this extension can be used, you have to insert the following code into the main DPU class, where param in Line 1 contains the name of the data unit the helper wraps (in this case 'output').

@ExtensionInitializer.Init(param = "output")
public WritableSimpleFiles outputFilesHelper;
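Presumably the framework then initializes the annotated field for you, and you only declare it. Purely to illustrate what the param value means, namely that the helper is bound to the data unit with that name, the following sketch shows annotation-driven field initialization in plain Java. All names in it (InitSketch, FilesHelperSketch, MyDpuSketch, WiringSketch) are hypothetical stand-ins. This is not how UnifiedViews implements ExtensionInitializer.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical stand-in for @ExtensionInitializer.Init: 'param' names the data unit to wrap.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface InitSketch {
    String param();
}

// Hypothetical helper that only records which data unit it wraps.
class FilesHelperSketch {
    final String wrappedDataUnitName;
    final Object wrappedDataUnit;

    FilesHelperSketch(String wrappedDataUnitName, Object wrappedDataUnit) {
        this.wrappedDataUnitName = wrappedDataUnitName;
        this.wrappedDataUnit = wrappedDataUnit;
    }
}

// A DPU class declaring a helper field, mirroring the snippet above.
class MyDpuSketch {
    @InitSketch(param = "output")
    public FilesHelperSketch outputFilesHelper;
}

class WiringSketch {
    // Assigns each annotated public field a helper bound to the data unit named by 'param'.
    static void initialize(Object dpu, Map<String, Object> dataUnitsByName) {
        for (Field field : dpu.getClass().getFields()) {
            InitSketch init = field.getAnnotation(InitSketch.class);
            if (init == null) {
                continue;
            }
            Object dataUnit = dataUnitsByName.get(init.param());
            if (dataUnit == null) {
                throw new IllegalStateException("No data unit named '" + init.param() + "'");
            }
            try {
                field.set(dpu, new FilesHelperSketch(init.param(), dataUnit));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }
}
```

The point of the sketch is only the lookup by name: if the param value does not match the name of a declared data unit, there is nothing for the helper to wrap.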
Two Methods of Adding Files to an Output Data Unit

There are two methods DPU developers may use to add files to the output files data unit (using the helper):

  • public File create(final String fileName) throws DPUException

    • This method creates a new empty file in the output data unit with the symbolicName and virtualPath metadata equal to fileName. For an explanation of symbolicName, virtualPath and other metadata of entries in data units, please see Basic Concepts for DPU Developers. The physical name of the created file is generated, and the file is physically stored in the working directory of the given pipeline execution.

  • public void add(final File file, final String fileName) throws DPUException

    • This method adds an existing file to the output data unit. It automatically creates a new entry in the output data unit with the symbolicName and virtualPath metadata equal to fileName. For an explanation of symbolicName, virtualPath and other metadata of entries in data units, please see Basic Concepts for DPU Developers. In this case, the file keeps the location and physical name it had before this method was called. Be aware that the file is therefore not created in the working space of the given pipeline execution.
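The difference between the two methods can be summarized in a small in-memory model: create places a new empty file with a generated physical name in the working directory, while add only registers a file where it already lives. Both use fileName as symbolicName and as virtualPath. This is a hypothetical sketch (SimpleFilesSketch, FilesEntrySketch and the "uv-file-" naming scheme are invented for illustration), not the WritableSimpleFiles implementation.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical entry record: the metadata the text says the helper sets.
class FilesEntrySketch {
    final String symbolicName;
    final String virtualPath;
    final File physicalFile;

    FilesEntrySketch(String symbolicName, String virtualPath, File physicalFile) {
        this.symbolicName = symbolicName;
        this.virtualPath = virtualPath;
        this.physicalFile = physicalFile;
    }
}

// Hypothetical in-memory model of create(...) and add(...); NOT WritableSimpleFiles itself.
class SimpleFilesSketch {
    private final File workingDir;
    final Map<String, FilesEntrySketch> entries = new LinkedHashMap<>();

    SimpleFilesSketch(File workingDir) {
        this.workingDir = workingDir;
    }

    // create: new empty file with a generated physical name inside the working directory.
    File create(String fileName) {
        File physical = new File(workingDir, "uv-file-" + UUID.randomUUID());
        try {
            if (!physical.createNewFile()) {
                throw new IOException("File already exists: " + physical);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        register(fileName, physical);
        return physical;
    }

    // add: registers an existing file where it already is; nothing is copied.
    void add(File existing, String fileName) {
        register(fileName, existing);
    }

    // Both methods use fileName as symbolicName and as virtualPath.
    private void register(String fileName, File physical) {
        entries.put(fileName, new FilesEntrySketch(fileName, fileName, physical));
    }
}
```

Note that in the model, as in the description above, a file passed to add stays outside the working directory, so its lifetime is not tied to the pipeline execution.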

Without using this helper, the task of adding an existing file may be executed as follows:

Symbolic symbolicName = output.addExistingFile(fileName, file.toURI().toString());
MetadataUtils.set(output, symbolicName, FilesVocabulary.UV_VIRTUAL_PATH, fileName);
  • In Line 1, a new entry is created in the output data unit, and its symbolicName and fileURI are set.

  • Line 2 then sets the virtualPath metadata for the same entry.

Advantages of Using the Helper:

The advantage of the helper is that the code is cleaner: adding an existing file to the output files data unit takes one line with the helper vs. two lines without it.

Additionally, when the helper is not used, you as a DPU developer must be aware of the virtualPath metadata and must know that the recommended practice is to set virtualPath equal to symbolicName.
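The design choice can be sketched as a thin wrapper: the helper's add performs both manual steps in one call and applies the virtualPath = symbolicName convention itself, so callers cannot forget it. The classes below are hypothetical stand-ins (a map-based data unit instead of the real output data unit and MetadataUtils), assuming only what the two-line snippet above shows.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical low-level data unit mirroring the two manual steps:
// (1) create an entry with symbolicName and fileURI, (2) set virtualPath metadata.
class LowLevelFilesUnitSketch {
    final Map<String, String> fileUris = new HashMap<>();
    final Map<String, String> virtualPaths = new HashMap<>();

    void addExistingFile(String symbolicName, String fileUri) {
        fileUris.put(symbolicName, fileUri);
    }

    void setVirtualPath(String symbolicName, String virtualPath) {
        virtualPaths.put(symbolicName, virtualPath);
    }
}

// Hypothetical helper: one call performs both steps and applies the
// recommended convention virtualPath = symbolicName.
class WrappingHelperSketch {
    private final LowLevelFilesUnitSketch unit;

    WrappingHelperSketch(LowLevelFilesUnitSketch unit) {
        this.unit = unit;
    }

    void add(File file, String fileName) {
        unit.addExistingFile(fileName, file.toURI().toString());
        unit.setVirtualPath(fileName, fileName);
    }
}
```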

Adding Triples to Output RDF Data Unit

This helper is an extension.

Before this extension can be used, the following code has to be inserted into the main DPU class, where param in Line 1 contains the name of the data unit the helper wraps (in this case 'output').

@ExtensionInitializer.Init(param = "output")
public WritableSimpleRdf outputRdfHelper;
Two Methods of Adding Triples to an RDF Data Unit

There are two methods DPU developers may use to add RDF triples to the output RDF data unit (using the helper):

  • public WritableSimpleRdf add(Resource s, URI p, Value o) throws SimpleRdfException, DPUException

    • This method adds one triple to the output RDF data unit. See the example below for how Resources, URIs and Values (all classes from the openRDF API) may be used.

  • public WritableSimpleRdf add(List<Statement> statements) throws SimpleRdfException, DPUException

    • This method adds a list of statements (triples) previously prepared using the openRDF API.
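Both add methods are declared to return WritableSimpleRdf, which suggests the calls can be chained. The sketch below models the two variants (one triple vs. a prepared list) with a hypothetical TripleSketch class holding plain strings instead of openRDF Resource/URI/Value objects; SimpleRdfSketch is likewise invented for illustration and is not the real helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an openRDF Statement (subject, predicate, object as strings).
final class TripleSketch {
    final String s;
    final String p;
    final String o;

    TripleSketch(String s, String p, String o) {
        this.s = s;
        this.p = p;
        this.o = o;
    }
}

// Hypothetical model of the two add methods; each returns 'this',
// mirroring the documented WritableSimpleRdf return type, so calls can be chained.
class SimpleRdfSketch {
    private final List<TripleSketch> stored = new ArrayList<>();

    // Variant 1: add a single triple.
    SimpleRdfSketch add(String s, String p, String o) {
        stored.add(new TripleSketch(s, p, o));
        return this;
    }

    // Variant 2: add a list of triples prepared beforehand.
    SimpleRdfSketch add(List<TripleSketch> statements) {
        stored.addAll(statements);
        return this;
    }

    List<TripleSketch> stored() {
        return Collections.unmodifiableList(stored);
    }
}
```

For example, a DPU could collect one triple per input row into a list and pass the whole batch to the second variant in a single call.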

Sample usage of the first method add(Resource s, URI p, Value o) is depicted below:

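// Obtain the value factory from the helper and build the parts of the triple
org.openrdf.model.ValueFactory valueFactory = outputRdfHelper.getValueFactory();
org.openrdf.model.URI s = valueFactory.createURI("http://data.example.com/resource/subjectS");
org.openrdf.model.URI p = valueFactory.createURI("http://data.example.com/resource/predicateP");
// rowNumber is assumed to be a numeric counter maintained by your DPU
org.openrdf.model.Value o = valueFactory.createLiteral(rowNumber);
// Add the triple to the output RDF data unit
outputRdfHelper.add(s, p, o);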

By default, the add methods above add the triples to a default entry (RDF graph) that is automatically generated for you in the output data unit wrapped by the helper. The symbolicName of the default entry is default-output. In typical cases, a single entry (RDF graph) into which all the data (triples) is loaded is sufficient.

In certain cases, if you have already prepared an entry (e.g. by using RDFDataUnitUtils), you may specify that triples should be added to this prepared entry (rather than to the generated default entry) by calling:

  • public WritableSimpleRdf setOutput(RDFDataUnit.Entry entry) throws SimpleRdfException, DPUException

    • This method sets the output entry (RDF graph) to which data (triples) is added. It must be called before any of the methods for adding triples (see above).

  • public WritableSimpleRdf setOutput(final List<RDFDataUnit.Entry> entries) throws SimpleRdfException, DPUException

    • This method sets the output entries (RDF graphs) to which data (triples) is added. It must be called before any of the methods for adding triples (see above). If the list contains more than one entry, the added triples are automatically added to all of these output data unit entries (RDF graphs).
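The routing rules described above (triples go to the generated default-output entry unless setOutput was called first, and every triple is added to each entry passed to setOutput) can be modeled as follows. This is a hypothetical sketch: entries are represented by their names, triples by strings, and RdfRoutingSketch is invented for illustration. Rejecting a late setOutput call with an exception is an assumption made here to enforce the "must be called before" rule; the real helper's behavior in that case is not described on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of entry routing in the RDF helper.
class RdfRoutingSketch {
    static final String DEFAULT_ENTRY = "default-output";

    // Entry name -> triples added to that entry (RDF graph).
    final Map<String, List<String>> graphs = new LinkedHashMap<>();
    private List<String> targets = null; // chosen by setOutput or on the first add
    private boolean anyAdded = false;

    // Single-entry variant.
    RdfRoutingSketch setOutput(String entryName) {
        return setOutput(Collections.singletonList(entryName));
    }

    // Multi-entry variant; assumption: calling it after adding triples is an error.
    RdfRoutingSketch setOutput(List<String> entryNames) {
        if (anyAdded) {
            throw new IllegalStateException("setOutput must be called before adding triples");
        }
        targets = new ArrayList<>(entryNames);
        return this;
    }

    // Adds the triple to every target entry; falls back to the default entry.
    RdfRoutingSketch add(String triple) {
        if (targets == null) {
            targets = Collections.singletonList(DEFAULT_ENTRY);
        }
        for (String entry : targets) {
            graphs.computeIfAbsent(entry, k -> new ArrayList<>()).add(triple);
        }
        anyAdded = true;
        return this;
    }
}
```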

Actual Usage of This Helper for Inspiration

E-SparqlEndpoint

  • uses output.setOutput() and output.add()

T-Tabular

  • uses output.add()

T-Metadata

  • uses output.setOutput() and output.add()