Basic Concepts for DPU Developers

This section introduces the basic concepts behind UnifiedViews Data Processing Units (DPUs) for developers who write these plugins.

The key terms and ideas behind these concepts are explained in more detail below.

DPU - Data Processing Unit

A DPU (Data Processing Unit) is a plugin that performs one processing step within a UnifiedViews pipeline.

There are four DPU types:

  • Extractor: used for DPUs that obtain data from outside UnifiedViews, for example by downloading a file or querying a remote SPARQL endpoint.
  • Transformer: used for DPUs that transform input data into output data. Both the input and the output data are stored in the UnifiedViews working store.
  • Loader: the counterpart of Extractor, used for DPUs that move data out of UnifiedViews.
  • Quality (experimental): used for a special kind of DPU that assesses the quality of resources.
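The descriptions above boil down to where a DPU reads its data from and where it writes it to. The following minimal sketch encodes that rule of thumb as a small decision function. `DpuType` and `DpuTypes.classify` are hypothetical names for illustration, not part of the UnifiedViews API, and rejecting a DPU that both reads from and writes to the outside world is an assumption of this sketch.

```java
// Hypothetical types for illustration; not part of the UnifiedViews API.
enum DpuType { EXTRACTOR, TRANSFORMER, LOADER, QUALITY }

final class DpuTypes {
    private DpuTypes() {}

    /**
     * Picks a DPU type from where the DPU reads and writes its data.
     *
     * @param readsFromOutside the DPU obtains data from outside UnifiedViews
     * @param writesToOutside  the DPU moves data out of UnifiedViews
     * @param assessesQuality  the DPU assesses the quality of resources
     */
    static DpuType classify(boolean readsFromOutside, boolean writesToOutside, boolean assessesQuality) {
        if (assessesQuality) {
            return DpuType.QUALITY; // experimental type
        }
        if (readsFromOutside && writesToOutside) {
            // Assumption of this sketch: such a DPU should be split into an Extractor and a Loader.
            throw new IllegalArgumentException("split into an Extractor and a Loader");
        }
        if (readsFromOutside) {
            return DpuType.EXTRACTOR; // data enters UnifiedViews
        }
        if (writesToOutside) {
            return DpuType.LOADER; // data leaves UnifiedViews
        }
        return DpuType.TRANSFORMER; // input and output both stay in the working store
    }
}
```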

Pipeline

In UnifiedViews, every data processing task is represented by a data processing pipeline (or simply a pipeline). A pipeline contains Data Processing Units (DPUs) and the data flows between them.

Pipelines may be designed, executed, scheduled and debugged.
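Because a pipeline consists of DPUs connected by data flows, it can be modelled as a directed graph. The following minimal sketch assumes that the data flows form an acyclic graph and that a DPU may run only after every DPU feeding it has finished; under those assumptions, a valid execution order is a topological sort, computed here with Kahn's algorithm. `Pipeline`, `addFlow` and `executionOrder` are hypothetical names, and the actual UnifiedViews execution engine may schedule DPUs differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a pipeline as named DPUs plus directed data flows between them.
final class Pipeline {
    // For each DPU, the DPUs that consume its output (insertion order kept for determinism).
    private final Map<String, List<String>> flows = new LinkedHashMap<>();
    // For each DPU, the number of data flows entering it.
    private final Map<String, Integer> incoming = new LinkedHashMap<>();

    Pipeline addDpu(String name) {
        flows.putIfAbsent(name, new ArrayList<>());
        incoming.putIfAbsent(name, 0);
        return this;
    }

    /** Adds a data flow: the output of {@code from} becomes an input of {@code to}. */
    Pipeline addFlow(String from, String to) {
        addDpu(from);
        addDpu(to);
        flows.get(from).add(to);
        incoming.merge(to, 1, Integer::sum);
        return this;
    }

    /** Returns an order in which each DPU runs after all DPUs feeding it (Kahn's algorithm). */
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(incoming);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((dpu, count) -> {
            if (count == 0) {
                ready.add(dpu); // DPUs without inputs from other DPUs can start immediately
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String dpu = ready.poll();
            order.add(dpu);
            for (String next : flows.get(dpu)) {
                // A downstream DPU becomes ready once all of its inputs are done.
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != flows.size()) {
            throw new IllegalStateException("data flows contain a cycle");
        }
        return order;
    }
}
```

For example, `new Pipeline().addFlow("extractor", "transformer").addFlow("transformer", "loader").executionOrder()` returns `[extractor, transformer, loader]`.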

Data Unit

A data unit is a container for data being exchanged between DPUs.

We distinguish between input and output data units. An input data unit holds the data that a DPU consumes during its execution; an output data unit holds the data that the DPU produces.

There are currently three types of data unit, each of which can serve as either an input or an output data unit:

  • RDF data unit
  • Files data unit
  • Relational data unit

Every data unit holds entries of a single type, determined by the type of the data unit.

The supported entry types are:

  • RDF graphs: supported by the RDF data unit
  • Files: supported by the Files data unit
  • Relational tables: supported by the Relational data unit
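The rule that each data unit holds entries of exactly one type maps naturally onto Java generics. In the following sketch, the entry records, `Direction` and `DataUnit` are hypothetical types written for illustration only; the field chosen for each entry (a graph IRI, a file path, a table name) is an assumption, not the UnifiedViews data model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical entry types, one per data unit type.
interface Entry {
    String name();
}

record RdfGraphEntry(String name, String graphIri) implements Entry {}

record FileEntry(String name, String path) implements Entry {}

record TableEntry(String name, String tableName) implements Entry {}

// Whether the data unit feeds a DPU (input) or receives its results (output).
enum Direction { INPUT, OUTPUT }

// A data unit holds entries of exactly one type, fixed by its type parameter.
final class DataUnit<E extends Entry> {
    private final String name;
    private final Direction direction;
    private final List<E> entries = new ArrayList<>();

    DataUnit(String name, Direction direction) {
        this.name = name;
        this.direction = direction;
    }

    void add(E entry) {
        entries.add(entry);
    }

    /** Read-only view of the entries held by this data unit. */
    List<E> entries() {
        return Collections.unmodifiableList(entries);
    }

    Direction direction() {
        return direction;
    }

    String name() {
        return name;
    }
}
```

Because `DataUnit` is parameterised by its entry type, calling `add(new RdfGraphEntry(...))` on a `DataUnit<FileEntry>` is rejected at compile time.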

How to Use Data Units for DPU Design

As a DPU developer, you do not have to work with data units directly. Java helpers let you perform typical operations on data units, such as:

  • adding a new entry,
  • getting all entries,
  • specifying metadata of an entry.

Please refer to this Tutorial for details.
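To give an idea of what such helpers do, here is a minimal in-memory sketch of the three operations listed above for a files data unit. `InMemoryFilesHelper` and its methods are hypothetical and do not reproduce the real UnifiedViews helper API; storing metadata as string key/value pairs and requiring unique entry names are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a files data unit helper;
// the real UnifiedViews helper classes and method names may differ.
final class InMemoryFilesHelper {
    /** One entry: its name, its file location, and extra metadata. */
    static final class FileEntry {
        final String name;
        final String location;
        final Map<String, String> metadata = new LinkedHashMap<>();

        FileEntry(String name, String location) {
            this.name = name;
            this.location = location;
        }
    }

    private final Map<String, FileEntry> entries = new LinkedHashMap<>();

    /** Adds a new entry; this sketch assumes names are unique within the data unit. */
    FileEntry addEntry(String name, String location) {
        if (entries.containsKey(name)) {
            throw new IllegalArgumentException("duplicate entry: " + name);
        }
        FileEntry entry = new FileEntry(name, location);
        entries.put(name, entry);
        return entry;
    }

    /** Returns all entries in the order they were added. */
    List<FileEntry> getEntries() {
        return Collections.unmodifiableList(new ArrayList<>(entries.values()));
    }

    /** Sets one metadata value on an existing entry. */
    void setMetadata(String name, String key, String value) {
        FileEntry entry = entries.get(name);
        if (entry == null) {
            throw new IllegalArgumentException("unknown entry: " + name);
        }
        entry.metadata.put(key, value);
    }
}
```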

For each entry, data units hold basic metadata, such as the name of the entry, its location, and so on.

Metadata is represented in the form of RDF triples and stored in the UnifiedViews RDF working store.  
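The following sketch shows how entry metadata might be expressed as RDF triples: one triple per metadata property, with the entry as the subject. The namespace `http://example.org/metadata/` and the `Triple` and `EntryMetadata` types are placeholders assumed for illustration; the predicates UnifiedViews actually uses are not shown here, and a real implementation would typically rely on an RDF library rather than building N-Triples strings by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical triple type; the object is always treated as a plain string literal.
record Triple(String subject, String predicate, String object) {
    /** Renders the triple in N-Triples syntax. */
    String toNTriples() {
        return "<" + subject + "> <" + predicate + "> \"" + escape(object) + "\" .";
    }

    // Escapes characters that are not allowed unescaped in an N-Triples string literal.
    private static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }
}

final class EntryMetadata {
    // Placeholder namespace, not the UnifiedViews vocabulary.
    static final String NS = "http://example.org/metadata/";

    private EntryMetadata() {}

    /** Turns an entry's metadata map into one triple per property, in map iteration order. */
    static List<Triple> toTriples(String entryIri, Map<String, String> metadata) {
        List<Triple> triples = new ArrayList<>();
        metadata.forEach((key, value) -> triples.add(new Triple(entryIri, NS + key, value)));
        return triples;
    }
}
```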

The following topics describe the main metadata associated with each entry (based on the type of the entry):