
Tabular File To RDF

Abstract

Tabular File To RDF (uv-t-tabular):

This DPU converts tabular data into RDF data. As input, it takes CSV, DBF, and XLS files.

It supports RDF Validation Extension.

Configuration Parameters

Name | Description | Example
Resource URI base | Used as the base URI for automatic column property generation and to create an absolute URI when a relative URI is provided in the 'Property URI' column. | http://localhost/
Key column | Name of the column whose value is appended to 'Resource URI base' and used as the subject for each row. | Employee
Encoding | Character encoding of the input files. Possible values: UTF-8, UTF-16, ISO-8859-1, windows-1250. | UTF-8
Rows limit | Maximum number of lines to process. | 1,000
Class for a row entity | Used as the class for each row entity. If no value is specified, the default "Row" class is used. | http://unifiedviews.eu/ontology/t-tabular/Row
Full column mapping | A default mapping is generated for every column. | true
Ignore blank cells | Blank cells are ignored and no output is generated for them. | false
Use static row counter | When multiple files are processed, they share the same row counter. The process behaves as if the files were concatenated before parsing. | false
Advanced key column | 'Key column' is interpreted as a template. An example of a template is http://localhost/{type}/content/{id}, where "type" and "id" are names of columns in the input CSV file. | false
Generate row column | If checked, a triple containing the row number is generated for each row, in the form <URI> <http://linked.opendata.cz/ontology/odcs/tabular/row> <Row Number>. | true
Generate subject for table | A subject is created for each table that points to all rows in the given table via the predicate "http://linked.opendata.cz/ontology/odcs/tabular/hasRow". The symbolic name of the source table is also attached using the predicate "http://linked.opendata.cz/ontology/odcs/tabular/symbolicName". | false
Auto type as string | All auto-detected types are treated as strings. This can be useful together with full column mapping to enforce the same type across all columns and get rid of warning messages. | false
Generate table/row class | If checked, a class for the entire table is generated, in the form <File URI> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://unifiedviews.eu/ontology/t-tabular/Table>. Note: this additional triple is only generated when "Generate subject for table" is also checked. | false
Generate labels | If checked, a label is generated for each column URI. The corresponding value of the header row is used as the label; if the file does not contain a header, data from the first row is used. Labels are not generated for advanced mappings. | false
Remove trailing spaces | Trailing spaces in cells are removed. | false
Ignore missing columns | If a named column is missing, only an info-level log message is produced instead of an error-level one. | false
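
To make the effect of these options concrete, here is a hypothetical sketch (not taken from the official documentation) of the output for a single CSV row, assuming 'Resource URI base' = http://localhost/, 'Key column' = Employee, 'Generate row column' checked, and the default "Row" class. The column names and values are invented, and the exact form of the automatically generated column property URIs and of the row-number object may differ in practice:

Input row:
Employee,Department
Alice,Sales

Possible output triples (illustrative only):
<http://localhost/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://unifiedviews.eu/ontology/t-tabular/Row> .
<http://localhost/Alice> <http://linked.opendata.cz/ontology/odcs/tabular/row> "1" .
<http://localhost/Alice> <http://localhost/Department> "Sales" .   (assumed column property: base URI + column name)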

There are three different types of mapping available:

  • Simple mapping

  • Advanced mapping with templates

  • XLS mapping

The Simple mapping tab allows you to define how the columns are mapped to the resulting predicates, including datatype information. The Advanced mapping tab is equivalent to the Simple mapping tab, but it additionally allows you to specify templates for the values of the predicates. A sample template is http://localhost/{type}/content/{id}, where "type" and "id" are names of columns in the input file. The XLS mapping can be used for the static mapping of cells to named cells; named cells are accessible as an extension in every row.
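
For example, with hypothetical values, a row whose "type" column contains report and whose "id" column contains 42 would, under the template above, produce the value http://localhost/report/content/42.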

CSV Specific Settings

Name | Description | Example
Quote char | If no quote char is specified, no quote chars are used; in that case values must not contain separator characters. | "
Delimiter char | Character used to mark the boundary between separate values. | ,
Skip n first lines | The indicated number of rows is skipped when processing the file. | 10
Has header | If the file has no header, the columns are accessible as col0, col1, .... | true
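
A small hypothetical CSV input matching the example settings above (quote char ", delimiter ,, one header row, no skipped lines); the quoted value in the last row shows why a quote char is needed when values contain the delimiter:

Employee,Department,City
Alice,Sales,Vienna
"Smith, John",Support,Prague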

XLS Specific Settings

Name | Description | Example
Sheet name | Name of the sheet that is to be processed. | Table1
Skip n first lines | The indicated number of rows is skipped when processing the file. | 10
Has header | If the file has no header, the columns are accessible as col0, col1, .... | true
Strip header for nulls | Trailing null values in the header are removed. This can be useful if the header is wider than the data, so that no "diff number of cells in header" exception is thrown. | false
Use advanced parser for double | In XLS, integers, doubles and dates are all represented in the same way. This option enables advanced detection of integers and dates based on value and formatting. | false

Inputs and Outputs

Name | Type | Data Unit | Description | Required
table | input | FilesDataUnit | Input files containing tabular data | yes
triplifiedTable | output | RDFDataUnit | RDF data | yes
Examples

Download a CSV File, Convert the Table Data to RDF and Load It to Virtuoso

The following image shows a fragment of a pipeline which downloads a CSV file from the tmp folder of the UnifiedViews server. The data of the file is subsequently converted to RDF and loaded into a Virtuoso triple store. The DPU configuration is illustrated in the image below.

24577343.png
24577169.png
Download an Excel File Containing Download Links, Convert It to RDF and Use It to Configure Another Files Download DPU

The following image shows a fragment of a pipeline which downloads an Excel file (.xls) from the tmp folder of the UnifiedViews server. The data of the Excel file is subsequently converted to RDF and serves as input for a SPARQL Construct Query. The purpose of this query is to construct the configuration file of the second Files Download DPU. After the files are downloaded, they are uploaded to the tmp folder of the UnifiedViews server using the Files Upload DPU. The DPU configuration is illustrated in the image below.

24577344.png
24577174.png

The query used in this pipeline creates triples containing the download URI and the file name of the files that are to be downloaded. The query reads as follows:

CONSTRUCT {
  <http://localhost/resource/config> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://unifiedviews.eu/ontology/dpu/filesDownload/Config>;
      <http://unifiedviews.eu/ontology/dpu/filesDownload/hasFile> <http://localhost/resource/file/0>.

  <http://localhost/resource/file/0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://unifiedviews.eu/ontology/dpu/filesDownload/File>;
      <http://unifiedviews.eu/ontology/dpu/filesDownload/file/uri> ?fileUri;
      <http://unifiedviews.eu/ontology/dpu/filesDownload/file/fileName> ?fileName.
}
WHERE {
  ?s <http://localhost/fileuri/fileName> ?fileName.
  ?s <http://localhost/fileUri> ?fileUri.
}
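
For a single hypothetical result row, say ?fileUri bound to "http://example.org/data/report.csv" and ?fileName bound to "report.csv" (values invented for illustration; whether the tabular transformation yields literals or IRIs here depends on the column mapping), the query would produce a Files Download configuration along these lines:

<http://localhost/resource/config> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://unifiedviews.eu/ontology/dpu/filesDownload/Config>;
    <http://unifiedviews.eu/ontology/dpu/filesDownload/hasFile> <http://localhost/resource/file/0>.
<http://localhost/resource/file/0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://unifiedviews.eu/ontology/dpu/filesDownload/File>;
    <http://unifiedviews.eu/ontology/dpu/filesDownload/file/uri> "http://example.org/data/report.csv";
    <http://unifiedviews.eu/ontology/dpu/filesDownload/file/fileName> "report.csv".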
Generate RDF Data From a CSV File With Simple Mapping and Add UUIDs to the RDF Data

The following image shows a fragment of a pipeline which downloads a CSV file from the server and transforms it into RDF. With a SPARQL Construct we convert the URI generated by the Tabular File To RDF Transformer into a UUID. The DPU configuration is illustrated in the image below.

24577345.png
24577176.png
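
The UUID-generating query itself is not reproduced on this page. A minimal sketch of the idea, assuming the goal is to replace every subject URI produced by the Tabular File To RDF transformer with a fresh urn:uuid: IRI while keeping all triples of a row on the same new subject, could look like the following (the graph patterns are generic placeholders, not the pipeline's actual query):

CONSTRUCT {
  ?newSubject ?p ?o .
}
WHERE {
  {
    # Mint one UUID IRI per distinct original subject, so that all
    # triples of a given row end up on the same new subject.
    SELECT ?s (UUID() AS ?newSubject)
    WHERE { SELECT DISTINCT ?s WHERE { ?s ?p ?o } }
  }
  ?s ?p ?o .
}

Note that this sketch only rewrites subject positions; if the generated row URIs also occur as objects (for example via the hasRow predicate), those occurrences would need separate handling.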