Tomas Knap: UnifiedViews in COMSODE pilot projects
The COMSODE project has received funding from the Seventh Framework Programme
of the European Union in the grant agreement number 611358.
UnifiedViews in COMSODE pilot projects
Tomas Knap (1, 2), Jakub Klimek (2)
(1) EEA s.r.o.,
http://www.eea.sk/
(2) Charles University in Prague,
Department of Software Engineering,
XML and Web Engineering Research Group
Agenda
UnifiedViews
UnifiedViews in Open Data Node
Pilot Applications
Slovak Environmental Agency
Czech Trade Inspection Authority
UnifiedViews
A tool for managing RDF data processing tasks
Task = a sequence of data processing units (DPUs)
Sample task:
Extract data from SPARQL Endpoint A
Extract data from CSV file B
Refine data with SPARQL queries X, Y, Z
Deduplicate data using Linker L
Publish data to SPARQL Endpoint B
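The sample task above can be pictured as a chain of functions, each standing in for one DPU that receives the previous DPU's output. This is a minimal Python sketch under stated assumptions, not UnifiedViews' own DPU API: the `extract`, `refine`, `deduplicate` and `publish` helpers, the prefixed names and the data are all hypothetical, and only a linear chain is modelled.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# A triple is (subject, predicate, object); prefixed names as plain strings keep the sketch small.
Triple = Tuple[str, str, str]
# Hypothetical DPU shape: a function from the current set of triples to a new set.
DPU = Callable[[Set[Triple]], Set[Triple]]


def extract(source: Iterable[Triple]) -> DPU:
    """Extractor: adds triples from a source (stand-in for SPARQL Endpoint A or CSV file B)."""
    triples = set(source)

    def run(data: Set[Triple]) -> Set[Triple]:
        return data | triples
    return run


def refine(predicate_map: Dict[str, str]) -> DPU:
    """Transformer: renames predicates (stand-in for SPARQL queries X, Y, Z)."""
    def run(data: Set[Triple]) -> Set[Triple]:
        return {(s, predicate_map.get(p, p), o) for s, p, o in data}
    return run


def deduplicate(same_as: Dict[str, str]) -> DPU:
    """Linker stand-in: rewrites duplicate subjects to one canonical IRI."""
    def run(data: Set[Triple]) -> Set[Triple]:
        return {(same_as.get(s, s), p, o) for s, p, o in data}
    return run


def publish(store: List[Triple]) -> DPU:
    """Loader: copies the triples into a target store (stand-in for SPARQL Endpoint B)."""
    def run(data: Set[Triple]) -> Set[Triple]:
        store.extend(sorted(data))
        return data
    return run


def run_pipeline(dpus: List[DPU]) -> Set[Triple]:
    """Runs the DPUs in order, feeding each DPU's output into the next one."""
    data: Set[Triple] = set()
    for dpu in dpus:
        data = dpu(data)
    return data


# Usage with made-up data: two sources describe the same city under different IRIs.
endpoint_b: List[Triple] = []
result = run_pipeline([
    extract([("ex:a1", "ex:name", "Bratislava")]),
    extract([("ex:b7", "rdfs:label", "Bratislava")]),
    refine({"ex:name": "rdfs:label"}),
    deduplicate({"ex:b7": "ex:a1"}),
    publish(endpoint_b),
])
print(sorted(result))  # → [('ex:a1', 'rdfs:label', 'Bratislava')]
```

Because the refine step aligns the predicates before deduplication, the two source triples collapse into one once their subjects are unified.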
UnifiedViews
UnifiedViews allows users to define, execute,
monitor, debug, schedule, and share tasks
UnifiedViews is an ETL tool for RDF data
It differs from other ETL tools by natively
supporting RDF data
UnifiedViews provides a set of plugins (DPUs) for
working with RDF data, and new custom plugins
can easily be created
Open source, http://unifiedviews.eu
Open Data Node
Publication platform for (Linked) Open Data
Open Source
Developed in COMSODE project
2013-2015
Mission of the Slovak Environmental Agency (SEA)
Policy support
Design and Implementation
Data provider/integrator
Land cover, environmental burdens, waste dumps
Infrastructure provider
Data Services (DB servers)
Consultancy provider
Analysis and design of environmental information systems
Initial Situation/Motivation
SEA publishes various geospatial data from the
environmental domain
SEA wanted to explore the potential to increase reuse
of its data if published as Linked Data
Goals
Publish datasets as Linked Data on:
Protected sites, species distribution, bio-geographical regions,
land cover, and contaminated sites registered as environmental burdens
Harvest and convert source data to RDF
Source data is available in the Geography Markup Language (GML)
via an API provided by the Web Feature Service (WFS), typically in
INSPIRE format
Initial barrier: the vocabularies mapping the INSPIRE XML schemas to
RDF were not available
Interlink with relevant RDF/Linked data resources
Provide visualizations and a querying interface
Approach and IT solution
Successfully deployed ODN with UnifiedViews on
SEA's remote cloud infrastructure
For each dataset, we built a data transformation
pipeline in UnifiedViews that harvested the data
from the data service and converted it to RDF
via XSL transformations
We also created pipelines for enriching the datasets
with links to external datasets
We associated these pipelines with the datasets in
the catalog
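Harvesting from a Web Feature Service usually means issuing GetFeature requests and paging through large feature collections. The sketch below builds WFS 2.0 GetFeature URLs in key-value-pair encoding using only Python's standard library; the endpoint URL and the `ps:ProtectedSite` type name are hypothetical, and the slides do not say which WFS version or paging scheme SEA's service uses.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit


def wfs_get_feature_url(endpoint: str, type_name: str, count: int, start_index: int = 0) -> str:
    """Builds a WFS 2.0 GetFeature request in key-value-pair (KVP) encoding."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,     # feature type to harvest
        "count": count,             # page size (WFS 2.0 name; 1.x used maxFeatures)
        "startIndex": start_index,  # offset of the first feature in this page
    }
    return endpoint + "?" + urlencode(params)


def paged_requests(endpoint: str, type_name: str, total: int, page_size: int) -> List[str]:
    """Returns one GetFeature URL per page, together covering `total` features."""
    return [
        wfs_get_feature_url(endpoint, type_name, page_size, start)
        for start in range(0, total, page_size)
    ]


# Hypothetical endpoint and feature type name.
urls = paged_requests("https://example.org/geoserver/wfs", "ps:ProtectedSite",
                      total=250, page_size=100)
for url in urls:
    print(url)
# Decoding a query string back shows the parameters survive URL encoding.
print(parse_qs(urlsplit(urls[-1]).query)["startIndex"])  # → ['200']
```

Each returned document would then be handed to the transformation step of the pipeline.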
Approach and IT solution
Data Transformation
Since GML is an XML format, we converted it to
RDF via XSL transformations
We extended XSL transformations developed by the
GeoKnow project (http://geoknow.eu)
The target vocabularies produced by the
transformations were derived from the INSPIRE
schemas, then simplified and adjusted to match
Linked Data conventions
Done in cooperation with the SmartOpenData project
(http://www.w3.org/2015/03/inspire)
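The project did this mapping in XSLT. Python's standard library has no XSLT processor, so this sketch swaps in an `xml.etree.ElementTree` mapping to show the same idea: match a feature element, mint an IRI from its `gml:id`, and emit N-Triples in a simplified target vocabulary. The `ps` source namespace, the element names, the IRI base and the vocabulary are hypothetical simplifications, not the real INSPIRE schema or the GeoKnow/SmartOpenData vocabularies.

```python
import xml.etree.ElementTree as ET
from typing import List

GML = "http://www.opengis.net/gml/3.2"
SRC = "http://example.org/inspire-ps"   # hypothetical, simplified source namespace
BASE = "http://example.org/sea/"        # hypothetical IRI base for SEA resources
VOCAB = "http://example.org/def/ps#"    # hypothetical simplified target vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def nt_literal(text: str, lang: str) -> str:
    """Escapes a string as an N-Triples literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def gml_to_ntriples(gml_text: str) -> List[str]:
    """Maps each ProtectedSite element to triples, much as an XSLT template rule would."""
    root = ET.fromstring(gml_text)
    lines = []
    for site in root.iter(f"{{{SRC}}}ProtectedSite"):
        gml_id = site.get(f"{{{GML}}}id")
        subject = f"<{BASE}site/{gml_id}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{VOCAB}ProtectedSite> .")
        name = site.findtext(f"{{{SRC}}}siteName")
        if name:
            lines.append(f"{subject} <{RDFS_LABEL}> {nt_literal(name.strip(), 'sk')} .")
    return lines


# Made-up input shaped like a WFS 2.0 response.
SAMPLE = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs/2.0"
    xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:ps="http://example.org/inspire-ps">
  <wfs:member>
    <ps:ProtectedSite gml:id="PS.1">
      <ps:siteName>Tatry</ps:siteName>
    </ps:ProtectedSite>
  </wfs:member>
</wfs:FeatureCollection>"""

for line in gml_to_ntriples(SAMPLE):
    print(line)
```

Geometries are left out here; mapping `gml:Point` and friends to an RDF geometry encoding also requires handling the coordinate reference system and axis order.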
Approach and IT solution
Data Enrichment
We linked the datasets to external datasets, including
Geonames.org and datasets from the European
Environment Agency:
Biogeographical regions 2011
Natura 2000
EUNIS
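The slides do not say how the links were discovered. One simple technique is to compare normalized labels and emit a link only when exactly one external resource matches. The sketch below uses hypothetical IRIs and labels; `owl:sameAs` is a strong identity claim, so a weaker predicate such as `skos:closeMatch` may be the safer choice for approximate matches.

```python
import unicodedata
from typing import Dict, List

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label: str) -> str:
    """Lower-cases, strips diacritics and collapses whitespace, e.g. 'Malá  Fatra' -> 'mala fatra'."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def link_by_label(local: Dict[str, str], external: Dict[str, str],
                  predicate: str = OWL_SAME_AS) -> List[str]:
    """Emits an N-Triples link for each local resource whose label matches exactly one external label."""
    index: Dict[str, List[str]] = {}
    for iri, label in external.items():
        index.setdefault(normalize(label), []).append(iri)
    links = []
    for iri, label in local.items():
        candidates = index.get(normalize(label), [])
        if len(candidates) == 1:  # skip missing and ambiguous matches
            links.append(f"<{iri}> <{predicate}> <{candidates[0]}> .")
    return links


# Hypothetical IRIs and labels.
local = {
    "http://example.org/sea/site/PS.1": "Tatry",
    "http://example.org/sea/site/PS.2": "Malá Fatra",
}
external = {
    "http://example.org/ext/1": "TATRY",
    "http://example.org/ext/2": "Mala  Fatra",
    "http://example.org/ext/3": "Pieniny",
}
for link in link_by_label(local, external):
    print(link)
```

Exact label matching is deliberately conservative; dedicated link discovery tools add similarity measures and spatial comparisons on top of this idea.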
Benefits of the Semantic Solution
A key benefit of the RDF version of the SEA
datasets is that they are straightforward to combine
with third-party datasets
We linked them to the GeoNames, Natura 2000
and EUNIS datasets
Lessons Learned
Open Data Node (and UnifiedViews) was able
to transform, enrich and publish RDF data in a
simple way, allowing easy maintenance for the
future
Making the data you publish adhere to common
standards, such as the INSPIRE schemas,
makes it more reusable
Reuse of XSL transformations from other projects
Next Steps
Linking more third-party datasets and extending
the coverage of the source data included in the
RDF version
Data visualizations are being designed and
developed as extensions of LDVMi (http://ldvm.net)
Mission of the Czech Trade Inspection Authority (CTIA)
Monitors and inspects businesses and
individuals who
Supply goods
Sell goods
Provide services
Provide consumer credit
Operate marketplaces
Motivation
CTIA wanted to publish its data
To be used by third-party applications
Instead of building their own map
visualizations
Goals
CTIA wanted to (and managed to) be the first
Czech public administration institution to
publish data in RDF (LOD)
CTIA wanted to publish additional anonymized
datasets
Approach and IT solution
UnifiedViews was successfully deployed, and
pipelines were prepared to publish the source
data as Linked Open Data
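A pipeline of this kind might turn tabular inspection records into N-Triples. The minimal sketch below assumes a CSV export with hypothetical columns `id`, `ico` (the business identification number), `date` and `fine`, plus a hypothetical IRI base and vocabulary; CTIA's actual source format and vocabulary are not described in the slides. Values are copied into IRIs without validation, which a production pipeline should not do.

```python
import csv
import io
from typing import List

BASE = "http://example.org/ctia/"        # hypothetical IRI base
VOCAB = "http://example.org/def/ctia#"   # hypothetical vocabulary
XSD = "http://www.w3.org/2001/XMLSchema#"


def inspections_to_ntriples(csv_text: str) -> List[str]:
    """Converts each CSV row into N-Triples describing one inspection."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        s = f"<{BASE}inspection/{row['id']}>"
        lines.append(f"{s} <{VOCAB}inspectedBusiness> <{BASE}business/{row['ico']}> .")
        lines.append(f'{s} <{VOCAB}date> "{row["date"]}"^^<{XSD}date> .')
        if row["fine"]:  # empty cell: emit no fine triple
            lines.append(f'{s} <{VOCAB}fine> "{row["fine"]}"^^<{XSD}decimal> .')
    return lines


# Made-up sample rows; the second inspection has an empty fine cell.
SAMPLE = "id,ico,date,fine\n1,12345678,2015-03-02,5000\n2,87654321,2015-03-05,\n"
for line in inspections_to_ntriples(SAMPLE):
    print(line)
```

Typed literals (`xsd:date`, `xsd:decimal`) let consumers filter and sort the data in SPARQL without guessing formats.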
Benefits of the Semantic Solution
A map application emerged
Uses RDF data combined with other datasets
Registry of Business Entities
Google Maps
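Combining the datasets amounts to a join on a shared key. The slides do not describe the map application's internals; this hypothetical sketch assumes the business identification number (IČO) as the key, counts inspections per business and attaches an address from a registry lookup, which a map could then geocode. The IDs and address are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple


def join_for_map(inspections: List[Tuple[str, str]], registry: Dict[str, str]) -> List[dict]:
    """Counts inspections per business ID and attaches the registry address.

    inspections: (inspection_id, ico) pairs from the inspection data
    registry:    ico -> address, standing in for the Registry of Business Entities
    Businesses missing from the registry are dropped, as in an inner join.
    """
    counts = Counter(ico for _, ico in inspections)
    return [
        {"ico": ico, "address": registry[ico], "inspections": n}
        for ico, n in sorted(counts.items())
        if ico in registry
    ]


# Made-up IDs and address; geocoding the address for a map is left out.
rows = join_for_map(
    [("1", "12345678"), ("2", "12345678"), ("3", "99999999")],
    {"12345678": "Example Street 1, Praha"},
)
print(rows)
```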
Lessons Learned and Next Steps
Publishing data as LOD pays off
Publishing data as LOD is not difficult
All you need to start is a spare PC
CTIA is in the process of implementing the
COMSODE methodology for publishing open
data
Demo
Resulting data published:
http://www.coi.cz/cz/spotrebitel/open-data-databaze-kontrol-sankci-a-zakazu/
(in Czech)
Conclusions
UnifiedViews
http://unifiedviews.eu
Open Data Node
http://opendatanode.org
Pilots:
Slovak Environmental Agency
Czech Trade Inspection Authority