Tabular File To RDF

Description

Tabular File To RDF (uv-t-tabular)

This DPU converts tabular data into RDF. It accepts CSV, DBF and XLS files as input.

Configuration Parameters

Name	Description	Example
Resource URI base	Base URI used for automatically generated column properties. It is also used to build an absolute URI when a relative URI is given in the 'Property URI' column.	http://localhost/
Key column	Name of the column whose value is appended to 'Resource URI base' to form the subject URI of each row.	Employee
Encoding	Character encoding of input files. Possible values: UTF-8, UTF-16, ISO-8859-1, windows-1250	UTF-8
Rows limit	Maximum number of lines to process.	1,000
Class for a row entity	This value is used as a class for each row entity. If no value is specified, the default "Row" class is used.	http://unifiedviews.eu/ontology/t-tabular/Row
Full column mapping	If checked, a default mapping is generated for every column.	true
Ignore blank cells	Blank cells are ignored and no output will be generated for them.	false
Use static row counter	If checked and multiple files are processed, the files share one row counter, as if they were concatenated before parsing.	false
Advanced key column	If checked, 'Key column' is interpreted as a template, for example http://localhost/{type}/content/{id}, where "type" and "id" are names of columns in the input CSV file.	false
Generate row column	If checked, a triple containing the row number is generated for each row. The triple looks like this: <URI> <http://linked.opendata.cz/ontology/odcs/tabular/row> <Row Number>.	true
Generate subject for table	If checked, a subject is created for each table and linked to all rows of that table via the predicate "http://linked.opendata.cz/ontology/odcs/tabular/hasRow". The symbolic name of the source table is also attached via the predicate "http://linked.opendata.cz/ontology/odcs/tabular/symbolicName".	false
Auto type as string	If checked, all automatically detected types are treated as strings. This can be useful with full column mapping to enforce the same type across all columns and avoid warning messages.	false
Generate table/row class	If checked, a class for the entire table is generated. The triple looks like this: <File URI> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://unifiedviews.eu/ontology/t-tabular/Table>. Note: this triple is only generated when "Generate subject for table" is also checked.	false
Generate labels	If checked, a label is generated for each column URI, using the corresponding value of the header row. If the file has no header, the values of the first row are used. No labels are generated for advanced mapping.	false
Remove trailing spaces	Trailing spaces in cells are removed.	false
Ignore missing columns	If checked, a missing named column is logged at info level instead of error level.	false
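
To illustrate how the basic options interact, here is a minimal Python sketch. It shows how one CSV file with a header might be turned into triples using 'Resource URI base', 'Key column', full column mapping, 'Generate row column', 'Ignore blank cells', 'Remove trailing spaces' and 'Rows limit'. It is not the DPU's implementation. The following are all assumptions:

- Predicates are formed as base + column name, with no escaping.
- Row numbers start at 1.
- Objects are kept as plain Python values rather than typed RDF literals.
- The function name `triplify` is hypothetical.

```python
import csv
import io
from typing import Iterator, Optional, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ROW_CLASS = "http://unifiedviews.eu/ontology/t-tabular/Row"
ROW_PREDICATE = "http://linked.opendata.cz/ontology/odcs/tabular/row"

# (subject, predicate, object); objects are plain Python values, not typed RDF literals
Triple = Tuple[str, str, object]


def triplify(text: str, base: str, key_column: str,
             rows_limit: Optional[int] = None,
             ignore_blank_cells: bool = False,
             remove_trailing_spaces: bool = False,
             generate_row_column: bool = True) -> Iterator[Triple]:
    """Hypothetical sketch: yield triples for each data row of a CSV text with a header."""
    reader = csv.DictReader(io.StringIO(text), restval="")
    for row_number, row in enumerate(reader, start=1):  # assumption: counting starts at 1
        if rows_limit is not None and row_number > rows_limit:
            break
        # Subject: 'Resource URI base' + value of the key column (no escaping here).
        subject = base + row[key_column]
        yield (subject, RDF_TYPE, ROW_CLASS)
        if generate_row_column:
            yield (subject, ROW_PREDICATE, row_number)
        # Full column mapping: one predicate per column, assumed to be base + column name.
        for column, value in row.items():
            if remove_trailing_spaces:
                value = value.rstrip(" ")
            if ignore_blank_cells and value == "":
                continue
            yield (subject, base + column, value)
```

With Resource URI base http://localhost/ and Key column Employee, a row whose Employee value is E1 gets the subject http://localhost/E1.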

There are three different types of mapping available:

Simple mapping
Advanced mapping with templates
XLS mapping

The Simple mapping tab lets you define how columns are mapped to the resulting predicates, including their datatypes. The Advanced mapping tab is equivalent to the Simple mapping tab, but it additionally lets you specify templates for predicate values. A sample template is http://localhost/{type}/content/{id}, where "type" and "id" are names of columns in the input file. The XLS mapping statically maps individual cells to named cells. Named cells are accessible as an extension of every row.
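
Templates such as http://localhost/{type}/content/{id} are used by both the advanced mapping and the advanced key column. The Python sketch below shows one way such a template could be expanded against a row. It is an assumption for illustration: the function name `expand_template` is hypothetical, and column values are inserted verbatim, whereas the DPU may escape or encode them.

```python
import re
from typing import Mapping

# Matches "{column}" placeholders; the column name is captured in group 1.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template: str, row: Mapping[str, str]) -> str:
    """Hypothetical sketch: replace each {column} with that column's value in the row."""
    def substitute(match: "re.Match") -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"template refers to missing column {column!r}")
        return row[column]  # inserted verbatim; no URI encoding in this sketch
    return PLACEHOLDER.sub(substitute, template)
```

For a row with type = "article" and id = "42", the template above expands to http://localhost/article/content/42.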

CSV Specific Settings

Name	Description	Example
Quote char	Character used to quote values. If no quote char is given, no quoting is used; in that case values must not contain the delimiter character.	"
Delimiter char	Character used to specify the boundary between separate values.	,
Skip n first lines	The given number of lines is skipped at the beginning of the file.	10
Has header	If the file has no header, the columns are accessible as col0, col1, ...	true
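
The Python sketch below shows what the CSV settings mean in practice, using the standard `csv` module: quote char, delimiter, skipping the first lines, and col0, col1, ... names when there is no header. It is illustrative only; the name `read_csv` is hypothetical. Lines are skipped physically before parsing, so quoted values spanning several lines are not handled.

```python
import csv
from typing import Dict, List, Optional


def read_csv(text: str, delimiter: str = ",", quote_char: Optional[str] = '"',
             skip_lines: int = 0, has_header: bool = True) -> List[Dict[str, str]]:
    """Hypothetical sketch of the CSV settings; returns one dict per data row."""
    lines = text.splitlines()[skip_lines:]  # 'Skip n first lines'
    if quote_char:
        reader = csv.reader(lines, delimiter=delimiter, quotechar=quote_char)
    else:
        # No quote char: quotes are ordinary characters, so values cannot contain the delimiter.
        reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    rows = list(reader)
    if not rows:
        return []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # Without a header, columns are named col0, col1, ...
        header = [f"col{i}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data]
```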

XLS Specific Settings

Name	Description	Example
Sheet name	Specify the name of the sheet that is to be processed.	Table1
Skip n first lines	The given number of lines is skipped at the beginning of the file.	10
Has header	If the file has no header, the columns are accessible as col0, col1, ...	true
Strip header for nulls	Trailing null values in the header are removed. This is useful if the header is longer than the data rows, as it prevents a "diff number of cells in header" exception from being thrown.	false
Use advanced parser for double	In XLS, integers, doubles and dates are all represented in the same way. This option enables advanced detection of integers and dates based on cell value and formatting.	false
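
The two XLS-specific options can be illustrated with a small Python sketch. The first function drops trailing null header cells. The second returns an integer when a numeric cell holds a whole number. Both function names are hypothetical, and the integer check is an assumption about one part of the "advanced parser". Detecting dates from cell formatting is not shown.

```python
from typing import List, Optional, Sequence, Union


def strip_header_nulls(header: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Hypothetical sketch: drop trailing None cells from a header row."""
    end = len(header)
    while end > 0 and header[end - 1] is None:
        end -= 1
    return list(header[:end])


def refine_number(value: float) -> Union[int, float]:
    """Hypothetical sketch: return an int for whole-number cells, else the float."""
    if value.is_integer():
        return int(value)
    return value
```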

Inputs and Outputs

Name	Type	Data Unit	Description	Required
table	input	FilesDataUnit	Input files containing tabular data
triplifiedTable	output	RDFDataUnit	RDF data

Examples

Download a CSV File, Convert the Table Data to RDF and Load It into Virtuoso

The following image shows a fragment of a pipeline which downloads a CSV file from the tmp folder of the UnifiedViews server. The data of the file is subsequently converted to RDF and loaded into a Virtuoso triple store. The DPU configuration is illustrated in the image below.

Download an Excel File Containing Download Links, Convert It to RDF and Use It to Configure Another Files Download DPU

The following image shows a fragment of a pipeline which downloads an Excel file (.xls) from the tmp folder of the UnifiedViews server. The data of the Excel file is subsequently converted to RDF and serves as input for a SPARQL Construct Query. The purpose of this query is to construct the configuration file of the second Files Download DPU. After the files are downloaded they are uploaded to the tmp folder of the UnifiedViews server using the Files Upload DPU. The DPU configuration is illustrated in the image below.

The query used in this pipeline creates triples containing the download URI and the file name of the files that are to be downloaded. The query reads as follows:

CONSTRUCT {
<http://localhost/resource/config>  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://unifiedviews.eu/ontology/dpu/filesDownload/Config>;
        <http://unifiedviews.eu/ontology/dpu/filesDownload/hasFile> <http://localhost/resource/file/0>.

<http://localhost/resource/file/0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://unifiedviews.eu/ontology/dpu/filesDownload/File>;
        <http://unifiedviews.eu/ontology/dpu/filesDownload/file/uri> ?fileUri; 
        <http://unifiedviews.eu/ontology/dpu/filesDownload/file/fileName> ?fileName.
}
WHERE {
?s <http://localhost/fileuri/fileName> ?fileName.
?s <http://localhost/fileUri> ?fileUri
}
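
To show the shape of the configuration this query produces, here is a Python sketch that imitates its evaluation on an in-memory list of triples. It matches the two predicates from the WHERE clause and emits the CONSTRUCT template for each solution as a set of triples. It is an illustration only. The function name is hypothetical, and IRIs and literals are both represented as plain strings.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FILES_DOWNLOAD_NS = "http://unifiedviews.eu/ontology/dpu/filesDownload/"
CONFIG = "http://localhost/resource/config"
FILE = "http://localhost/resource/file/0"


def construct_download_config(triples: List[Triple]) -> Set[Triple]:
    """Hypothetical sketch: evaluate the CONSTRUCT query above over a list of triples."""
    names: Dict[str, List[str]] = {}
    uris: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p == "http://localhost/fileuri/fileName":
            names.setdefault(s, []).append(o)
        elif p == "http://localhost/fileUri":
            uris.setdefault(s, []).append(o)
    result: Set[Triple] = set()
    # Each ?s with both a ?fileName and a ?fileUri yields one solution per combination.
    for s in names.keys() & uris.keys():
        for file_name in names[s]:
            for file_uri in uris[s]:
                result |= {
                    (CONFIG, RDF_TYPE, FILES_DOWNLOAD_NS + "Config"),
                    (CONFIG, FILES_DOWNLOAD_NS + "hasFile", FILE),
                    (FILE, RDF_TYPE, FILES_DOWNLOAD_NS + "File"),
                    (FILE, FILES_DOWNLOAD_NS + "file/uri", file_uri),
                    (FILE, FILES_DOWNLOAD_NS + "file/fileName", file_name),
                }
    return result
```

The query uses the fixed IRI http://localhost/resource/file/0, so every matching row adds its URI and file name to that same resource. The sketch reproduces this behaviour.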

Generate RDF Data From a CSV File With Simple Mapping and Add UUIDs to the RDF Data

The following image shows a fragment of a pipeline which downloads a CSV file from the server and transforms it into RDF. A SPARQL Construct query then replaces the URIs generated by the Tabular File To RDF DPU with UUIDs. The DPU configuration is illustrated in the image below.
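
The query used in this example is not reproduced here. In SPARQL 1.1, such a query could rely on the built-in UUID() function, which returns a fresh urn:uuid: IRI. The Python sketch below imitates that effect on a list of triples. Each distinct subject is mapped to one new UUID IRI, so all triples of a row keep a common subject. This is an assumption for illustration, and the function name is hypothetical. Objects are left untouched, so links such as hasRow would not be rewritten.

```python
import uuid
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, object]


def replace_subjects_with_uuids(triples: Iterable[Triple]) -> List[Triple]:
    """Hypothetical sketch: give every distinct subject a fresh urn:uuid: IRI."""
    mapping: Dict[str, str] = {}
    result: List[Triple] = []
    for s, p, o in triples:
        if s not in mapping:
            mapping[s] = uuid.uuid4().urn  # e.g. "urn:uuid:" followed by 36 characters
        result.append((mapping[s], p, o))
    return result
```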
