Main Metadata Held by UnifiedViews Data Unit Entries

Abstract

This section is a short guide to the metadata that UnifiedViews Data Unit entries can hold. Data Units and their entries are component parts of Data Processing Units (DPUs), the UnifiedViews plugins.

Note

All data about DPUs and data units is stored in RDF format in the dedicated graph database of your UnifiedViews installation.

Main Metadata Held by All Types of Data Unit Entries

  • symbolicName = identifier of the entry (file, RDF graph, relational table) in the data unit.

    Note

    This identifier may change as the entry is processed by the pipeline. As a DPU developer, you must therefore not rely on the symbolicName staying stable: the next DPU in the pipeline execution may change it.

    For RDF data units, the entry symbolicName is typically taken from dataGraphURI (see details below).

    For Files data units, the entry symbolicName is typically derived from virtualPath.

Main Metadata Held by Files Data Unit Entries (Files)

  • fileURI = URI under which a file is stored by the UnifiedViews backend engine, as it is processed in the given pipeline execution.

  • virtualPath = path under which the given file should be stored when it is loaded outside of UnifiedViews to some target server.

    Note

    The value of fileURI cannot be used for this purpose because it points to the internal storage of the file. For example, if virtualPath is x/y/data.ttl and a loader loads the data to target folder F, the file should be placed automatically at F/x/y/data.ttl.

    In most cases, the symbolicName of a file equals its virtualPath.
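The relationship between a loader's target folder and virtualPath can be sketched in a few lines. This is a language-neutral illustration in Python, not part of the UnifiedViews API: the function name target_location is hypothetical, and rejecting absolute paths or ".." segments is an assumption about what a careful loader might do, not documented behaviour.

```python
from pathlib import PurePosixPath


def target_location(target_folder: str, virtual_path: str) -> str:
    """Join a loader's target folder with an entry's virtualPath."""
    relative = PurePosixPath(virtual_path)
    # Assumption: virtualPath is relative and must not escape the target folder.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"virtualPath must be a relative path: {virtual_path!r}")
    return str(PurePosixPath(target_folder) / relative)


print(target_location("F", "x/y/data.ttl"))  # prints F/x/y/data.ttl
```

Note that fileURI plays no role here: only virtualPath decides where the file lands on the target server.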

Example

The sample below, in TriG syntax, shows metadata produced by e-filesDownload, as they are stored in the internal working RDF store of UnifiedViews.

We also show the configuration of the e-filesDownload below.

[Figure: configuration of the e-filesDownload DPU (24577420.png)]

In this example of a Data Unit's metadata, the following details are present (line numbers refer to the listing at the end of this example, counted from its first @prefix line):

  • Line 8 defines symbolicName.

  • Line 9 defines the real working URI under which the file is stored in UnifiedViews' working store.

  • Line 10 defines virtualPath.

All such data is stored in the context 'http://unifiedviews.eu/resource/internal/dataunit/exec/83/dpu/80/du/0'.

This is the unique RDF graph for the given data unit within the given DPU instance and within the given pipeline execution.

All entries of that data unit appear in this graph:

@prefix meta: <http://unifiedviews.eu/DataUnit/MetadataDataUnit/> .
@prefix meta-files: <http://unifiedviews.eu/DataUnit/MetadataDataUnit/FilesDataUnit/> .
@prefix helpers-vp: <http://unifiedviews.eu/VirtualPathHelper/> .
 
<http://unifiedviews.eu/resource/internal/dataunit/exec/83/dpu/80/du/0> {

        <http://unifiedviews.eu/resource/internal/dataunit/exec/83/dpu/80/du/0/entry/1>
                meta:symbolicName "dokument.pdf" ;
                meta-files:fileURI "file:/Users/tomasknap/Documents/PROJECTS/ETL-SWProj/UnifiedView/Core/backend/working/exec_83/storage/dpu_80/0/09f881476f7896764633322479342" ;
                helpers-vp:virtualPath "dokument.pdf"

}
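The context and entry URIs in the listing above appear to follow a regular pattern. The sketch below reproduces that pattern in Python purely as an illustration. It assumes that the three numbers are the pipeline execution ID, the DPU instance ID, and the index of the data unit, which is inferred from this example rather than stated by UnifiedViews. The function names are hypothetical.

```python
BASE = "http://unifiedviews.eu/resource/internal/dataunit"


def data_unit_context(execution_id: int, dpu_id: int, data_unit_index: int) -> str:
    """Build the graph URI that holds the metadata of one data unit (inferred pattern)."""
    return f"{BASE}/exec/{execution_id}/dpu/{dpu_id}/du/{data_unit_index}"


def entry_uri(context: str, entry_number: int) -> str:
    """Build the URI of one entry inside a data unit context (inferred pattern)."""
    return f"{context}/entry/{entry_number}"


context = data_unit_context(83, 80, 0)
print(context)
print(entry_uri(context, 1))
```

Because these URIs are internal and differ for every pipeline execution, they are useful for inspecting the working store but should not be hard-coded in a DPU.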

Main Metadata Held by RDF Data Unit Entries (RDF Graphs)

  • dataGraphURI = URI under which the RDF graph is stored by the UnifiedViews backend engine as it is processed in the given pipeline execution, that is, the URI of the RDF graph in the UnifiedViews working RDF store. For RDF data unit entries, symbolicName is typically set equal to dataGraphURI.

  • virtualGraph = name of the graph under which the given RDF graph should be stored when it is loaded outside of UnifiedViews to some remote RDF store. The value of dataGraphURI cannot be used for this purpose because it is an auto-generated internal value that differs for each pipeline execution and does not follow any methodology for designing good RDF graph URIs. For example, if virtualGraph is http://data.company.com/graphs/myGraph, the data (triples) from this entry are automatically loaded to that graph on the remote RDF store.
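The division of roles between dataGraphURI and virtualGraph can be illustrated with a small model of an RDF entry. This is a sketch in Python, not UnifiedViews code: the RdfEntry class and the target_graph function are hypothetical, and raising an error when virtualGraph is missing is an assumption (a real loader may fall back to a configured default instead).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RdfEntry:
    symbolic_name: str
    data_graph_uri: str  # internal URI, regenerated for every pipeline execution
    virtual_graph: Optional[str] = None  # stable graph name for the remote store


def target_graph(entry: RdfEntry) -> str:
    """Return the graph a loader should write to on the remote RDF store."""
    # Assumption: without a virtualGraph there is no safe target, so fail loudly
    # rather than leak the internal dataGraphURI to the remote store.
    if not entry.virtual_graph:
        raise ValueError(f"entry {entry.symbolic_name!r} has no virtualGraph")
    return entry.virtual_graph


internal = "http://unifiedviews.eu/resource/internal/dataunit/exec/80/dpu/54/du/1/entry/1"
entry = RdfEntry(internal, internal, "http://data.company.com/graphs/myGraph")
print(target_graph(entry))  # prints http://data.company.com/graphs/myGraph
```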

The sample below shows metadata produced by the t-rdfGraphMerger DPU, which physically merges two or more RDF data units into one graph. The configuration of this t-rdfGraphMerger DPU is shown below.

[Figure: configuration of the t-rdfGraphMerger DPU (24577421.png)]

@prefix meta: <http://unifiedviews.eu/DataUnit/MetadataDataUnit/> .
@prefix meta-rdf: <http://unifiedviews.eu/DataUnit/MetadataDataUnit/RDFDataUnit/> .
@prefix helpers-vg: <http://unifiedviews.eu/VirtualGraphHelper/> .
 
<http://unifiedviews.eu/resource/internal/dataunit/exec/80/dpu/54/du/1> {
 
    <http://unifiedviews.eu/resource/internal/dataunit/exec/80/dpu/54/du/1/entry/1>
        meta:symbolicName "GraphMerge/output/generated-1440424555316" ;
        meta-rdf:dataGraphURI <http://unifiedviews.eu/resource/internal/dataunit/exec/80/dpu/54/du/1/entry/1> ;
        helpers-vg:virtualGraph "http://outputGraph.com"
}
 
<http://unifiedviews.eu/resource/internal/dataunit/exec/80/dpu/54/du/1/entry/1> {
    # triples in the data graph
    # ...
}

In this example, the following details are present (line numbers are counted from the first @prefix line of the listing above):

  • Line 8 defines symbolicName.

  • Line 9 defines the real working URI under which the RDF graph is stored in UnifiedViews' working graph database.

  • Line 10 defines virtualGraph.

    All such data is stored in the context 'http://unifiedviews.eu/resource/internal/dataunit/exec/80/dpu/54/du/1'. This is the unique RDF graph for the given data unit within the given DPU instance and within the given pipeline execution. All entries of that data unit appear in this graph.

  • Lines 13 to 16 contain the definition of the graph 'http://unifiedviews.eu/resource/internal/dataunit/exec/80/dpu/54/du/1/entry/1'. This graph contains the real data: the triples that were merged and are sent to the output.

    • So in the case of an RDF data unit, the working RDF store contains not only metadata but also data.

    • In the case of a Files data unit, the data are stored on the file system at the location given by the fileURI metadata.
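Since the data of a Files data unit entry lives on the file system, a tool that inspects the working store needs to turn the fileURI value into a local path. The following is a minimal sketch in Python using only the standard library. The function name local_path and the sample URI are hypothetical, and the sketch assumes a POSIX-style file: URI like the one in the Files example.

```python
from urllib.parse import unquote, urlparse


def local_path(file_uri: str) -> str:
    """Convert a file: URI, as stored in fileURI, to a local file system path."""
    parsed = urlparse(file_uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {file_uri!r}")
    # Percent-encoded characters (for example %20) are decoded to plain text.
    return unquote(parsed.path)


# Hypothetical working-store location, shaped like the fileURI in the Files example.
print(local_path("file:/tmp/uv/working/exec_83/storage/dpu_80/0/abc"))
```

For RDF data units no such step is needed, because the triples are read from the graph named by dataGraphURI in the same working store that holds the metadata.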

Support for DPU Developers to Work With Metadata

As a DPU developer, you do not need to modify the RDF triples that hold entry metadata directly; Java helpers are available for this.

Example: to define the virtualPath metadata value, you can use VirtualPathHelper from the uv-dataunit-helpers module.