EDF 2012 Datasets

Jens Lehmann AKSW Group, University of Leipzig 6 June 2012 Realising and Exploiting the EU Data Cloud European Data Forum, Copenhagen, Denmark Dataset Presentations

description

The presentation shows which datasets have been converted to RDF and interlinked within the LATC EU project. In particular, it shows the typical conversion process for one example dataset - the EU financial transparency system.

Transcript of EDF 2012 Datasets

Page 1: EDF 2012 Datasets

Jens Lehmann AKSW Group, University of Leipzig

6 June 2012

Realising and Exploiting the EU Data Cloud

European Data Forum, Copenhagen, Denmark

Dataset Presentations

Page 2: EDF 2012 Datasets

EU-Level Dataset Development

Page 3: EDF 2012 Datasets

List of LATC Datasets

Categories: Business, Legal, Institutions

FTS (EU finance)

Eur-Lex (European Law)

EuroStat (Statistical Data)

CORDIS (EU projects, finance)

N-Lex (National Law)

Institution List

Euraxess (EU jobs, companies)

Taxation & Customs

EU Who is Who

EURES (EU jobs)

EU Patent Office

EU Barometer

EC Competition (market overview)

EU Agencies

European Election Results

eSBN (eBusiness solutions)

PreLex (inter-institutional law)

European Parliament Media

UNODC (drugs & crime statistics)

European Central Bank Statistics

Other: Eventseer, Sciencewise

Total: 22 datasets

http://latc-project.eu/datasets/

Page 4: EDF 2012 Datasets

Financial Transparency System

Step 1: Analysing the Dataset

The Financial Transparency System (FTS) contains information about 110,000+ EU grants

Contains beneficiaries, amount of funding, year, responsible department, country etc.

Covers years 2007 – 2010

Originally published in HTML, XML and CSV
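As a rough illustration of this analysis step, the sketch below reads a CSV export and counts grants and total funding per year. The column names (`year`, `beneficiary`, `country`, `amount`) and the sample rows are assumptions made for illustration; the actual FTS export layout may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real FTS CSV layout and values may differ.
SAMPLE_CSV = """year,beneficiary,country,amount
2007,Example University,DE,1000.50
2007,Example Institute,FR,2500.00
2008,Example University,DE,750.25
"""


def summarise_by_year(csv_text):
    """Return {year: (number_of_grants, total_amount)} for a CSV string."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["year"]] += 1
        totals[row["year"]] += float(row["amount"])
    return {year: (counts[year], totals[year]) for year in counts}


summary = summarise_by_year(SAMPLE_CSV)
for year in sorted(summary):
    grants, total = summary[year]
    print(f"{year}: {grants} grants, {total:.2f} total")
```

A summary like this helps decide which fields need their own classes or properties in the RDF model.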

Page 5: EDF 2012 Datasets

Financial Transparency System Step 2: Modelling the Data in RDF and OWL

Michael Martin, Claus Stadler, Philipp Frischmuth, Jens Lehmann: Increasing the Financial Transparency of European Commission Project Funding. Semantic Web Journal (under review)

Page 6: EDF 2012 Datasets

Financial Transparency System Step 3: Converting the Dataset

Java classes generated automatically from XML Schema

XML data accessible as Java objects → script-based transformation

High flexibility for data cleansing and special cases

Source code of transformation

● https://github.com/AKSW/FTS-EC-2-RDF/

Diagram: XSD → JAXB → Java classes; XML → JAXB → Java objects → transformation → RDF
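The original pipeline works on JAXB-generated Java classes (see the repository above). The sketch below is not that code: it swaps in Python's `xml.etree.ElementTree` to show the same idea of walking the XML, cleansing values and emitting RDF as N-Triples. The element names (`commitment`, `year`, `beneficiary`, `amount`), the `id` attribute, the `http://example.org/fts/` namespace and the property names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real FTS XML schema differs.
SAMPLE_XML = """<commitments>
  <commitment id="c1">
    <year>2009</year>
    <beneficiary>  Example University </beneficiary>
    <amount>1 000,50</amount>
  </commitment>
</commitments>"""

BASE = "http://example.org/fts/"  # placeholder namespace, not the real one
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def clean_amount(raw):
    """Example cleansing rule (assumed format): '1 000,50' -> '1000.50'."""
    return raw.replace(" ", "").replace(",", ".")


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(xml_text):
    """Convert each <commitment> element into three N-Triples lines."""
    triples = []
    for node in ET.fromstring(xml_text).iter("commitment"):
        subject = f"<{BASE}commitment/{node.get('id')}>"
        year = node.findtext("year").strip()
        name = escape_literal(node.findtext("beneficiary").strip())
        amount = clean_amount(node.findtext("amount").strip())
        triples.append(f'{subject} <{BASE}year> "{year}" .')
        triples.append(f'{subject} <{BASE}beneficiaryName> "{name}" .')
        triples.append(f'{subject} <{BASE}amount> "{amount}"^^<{XSD_DECIMAL}> .')
    return triples


triples = to_ntriples(SAMPLE_XML)
print("\n".join(triples))
```

Keeping the cleansing rules in small functions such as `clean_amount` is one way to get the flexibility for special cases mentioned above.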

Page 7: EDF 2012 Datasets

Financial Transparency System

Step 4: Publishing the Dataset

Landing Page, Linked Data, SPARQL endpoint, browser at http://fts.publicdata.eu via OntoWiki

Metadata: Datahub (http://thedatahub.org)

Page 8: EDF 2012 Datasets

Financial Transparency System

Page 9: EDF 2012 Datasets

Financial Transparency System

Page 10: EDF 2012 Datasets

Financial Transparency System

Step 5: Enriching the Dataset

Linking with LIMES (http://limes.aksw.org)

Link targets:

● LinkedGeoData: cities
● DBpedia: cities, countries, years, schema

Geo-Coding of beneficiaries on city and address level – 45k coordinates

Metadata: author, license, source, statistics using Dublin Core, VoID, Data Cube
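To illustrate the linking idea, the sketch below matches city labels by string similarity and emits `owl:sameAs` links above a threshold. It is a simplified stand-in, not LIMES itself: it uses Python's `difflib` and a brute-force comparison. The FTS city URIs, the labels and the 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(source, target, threshold=0.9):
    """source/target map URI -> label. Link each source URI to its best
    matching target URI if the similarity reaches the threshold."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links


# Hypothetical source URIs and labels.
fts_cities = {
    "http://example.org/fts/city/1": "Leipzig",
    "http://example.org/fts/city/2": "Kobenhavn",
}
dbpedia_cities = {
    "http://dbpedia.org/resource/Leipzig": "Leipzig",
    "http://dbpedia.org/resource/Copenhagen": "Copenhagen",
}

links = link(fts_cities, dbpedia_cities)
for s, p, o in links:
    print(f"<{s}> <{p}> <{o}> .")
```

Here only the exact label match is linked; the differently spelled city falls below the threshold, which shows why label variants need extra handling (alternative labels, geo-coordinates) in practice.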

Page 11: EDF 2012 Datasets

Financial Transparency System

Step 6: Queries, Applications, Visualisation

RDF version allows:

● Find organisations with highest funding
● Compare funding across countries / beneficiaries
● Compare funding per year and country (from FTS) with gross domestic product (from DBpedia) – see next slide

→ overall increases transparency and may serve as input for research policy strategies

Page 12: EDF 2012 Datasets

Financial Transparency System

SELECT * {
  {
    SELECT ?ftsyear ?ftscountry (SUM(?amount) AS ?funding) {
      ?com rdf:type fts-o:Commitment .
      ?com fts-o:year ?year .
      ?year rdfs:label ?ftsyear .
      ?com fts-o:benefit ?benefit .
      ?benefit fts-o:detailAmount ?amount .
      ?benefit fts-o:beneficiary ?beneficiary .
      ?beneficiary fts-o:country ?country .
      ?country owl:sameAs ?ftscountry .
    }
    # sum the funding per year and country
    GROUP BY ?ftsyear ?ftscountry
  }
  {
    SELECT ?dbpcountry ?gdpyear ?gdpnominal {
      ?dbpcountry rdf:type dbp-o:Country .
      ?dbpcountry dbp-p:gdpNominal ?gdpnominal .
      ?dbpcountry dbp-p:gdpNominalYear ?gdpyear .
    }
  }
  FILTER ((?ftsyear = str(?gdpyear)) && (?ftscountry = ?dbpcountry))
}
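To make the join logic of this query explicit, the sketch below re-implements it in plain Python over in-memory rows: funding is summed per year and country, then joined with GDP records where the year (compared as a string) and the country URI match. All rows and figures are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (year label, country URI after owl:sameAs, amount).
commitments = [
    ("2009", "http://dbpedia.org/resource/Germany", 100.0),
    ("2009", "http://dbpedia.org/resource/Germany", 50.0),
    ("2009", "http://dbpedia.org/resource/France", 70.0),
    ("2010", "http://dbpedia.org/resource/Germany", 30.0),
]

# Hypothetical rows: (country URI, GDP year, nominal GDP); values are made up.
gdp = [
    ("http://dbpedia.org/resource/Germany", 2009, 3000.0),
    ("http://dbpedia.org/resource/France", 2010, 2000.0),
]


def funding_with_gdp(commitments, gdp):
    """Sum funding per (year, country), then join with GDP rows."""
    funding = defaultdict(float)
    for year, country, amount in commitments:
        funding[(year, country)] += amount  # first sub-select: SUM per group

    result = []
    for (year, country), total in sorted(funding.items()):
        for gdp_country, gdp_year, gdp_nominal in gdp:
            # same condition as the FILTER in the query
            if year == str(gdp_year) and country == gdp_country:
                result.append((year, country, total, gdp_nominal))
    return result


rows = funding_with_gdp(commitments, gdp)
for row in rows:
    print(row)
```

Only year and country combinations present in both sources survive the join, so gaps in the DBpedia GDP data directly reduce the result set.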

Page 13: EDF 2012 Datasets

Financial Transparency System

Page 14: EDF 2012 Datasets

European Employment Services

European Employment Services (EURES): cooperation network for free movement of workers in the EU

Publishes 1.2+ million job vacancies, 700,000 CVs, 25,000 employers

RDF version can be used to:

● Compare geographical and economic information for new jobs (DBpedia, LGD)
● Compare salaries relative to standards in the job region
● Check the quality of nearby schools

Page 15: EDF 2012 Datasets

European Employment Services

Neither API nor dump available → site scraping

Modelling considered existing ontologies

Published using D2R: http://www4.wiwiss.fu-berlin.de/eures/

7 million triples; classes: Offer, Skill, Employer

3000 links to DBpedia cities + regions + countries + languages + currencies, LEXVO languages, Eurostat

Updates can be performed by scraping only new pages
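The incremental update idea can be sketched as follows: keep a record of pages already scraped and fetch only URLs that are not in it. This is a minimal sketch, not the project's scraper; the function name and the `example.org` URLs are hypothetical, and the actual fetching and parsing are left out.

```python
def select_new_pages(listing_urls, seen_urls):
    """Return URLs not scraped yet, keeping listing order and dropping duplicates."""
    known = set(seen_urls)
    new_urls = []
    for url in listing_urls:
        if url not in known:
            known.add(url)
            new_urls.append(url)
    return new_urls


# Hypothetical state from a previous run and a fresh listing page.
seen = {"http://example.org/jobs/1", "http://example.org/jobs/2"}
listing = [
    "http://example.org/jobs/2",
    "http://example.org/jobs/3",
    "http://example.org/jobs/3",
    "http://example.org/jobs/4",
]

todo = select_new_pages(listing, seen)
print(todo)
```

Note that this only catches new offers; changed or withdrawn offers would need a separate check.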

Page 16: EDF 2012 Datasets

Euraxess

Contains research jobs in EU, 6400 organisations, 1700 open jobs, 61000 registered researchers, 18000 researcher CVs

http://ec.europa.eu/euraxess/

Contains information about people, jobs, skills, languages etc.

Links to DBpedia languages and LEXVO languages

Page 17: EDF 2012 Datasets

Euraxess + EURES Query

Query: aggregates information about jobs and companies in a country from two different sources

SELECT DISTINCT ?job ?company WHERE {
  SERVICE <http://www4.wiwiss.fu-berlin.de/eures/sparql> {
    ?job eures:country ?countryjob .
    ?countryjob a eures:Country .
    ?countryjob rdfs:label ?n .
  }
  SERVICE <http://www4.wiwiss.fu-berlin.de/euraxess/sparql> {
    ?company euraxess:country ?countrycomp .
    ?countrycomp a euraxess:Country .
    ?countryjob owl:sameAs ?countrycomp .
  }
}
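The join performed by this federated query can be sketched in plain Python: each `SERVICE` block yields rows from one source, and the `owl:sameAs` links between country resources connect them. The identifiers below are hypothetical placeholders, not real EURES or Euraxess data.

```python
def jobs_and_companies(jobs, companies, same_as):
    """jobs: [(job, eures_country)], companies: [(company, euraxess_country)],
    same_as: set of (eures_country, euraxess_country) links.
    Returns distinct (job, company) pairs located in linked countries."""
    pairs = set()
    for job, job_country in jobs:
        for company, company_country in companies:
            if (job_country, company_country) in same_as:
                pairs.add((job, company))
    return sorted(pairs)


# Hypothetical rows as they might come back from the two endpoints.
jobs = [("job1", "eures:DE"), ("job2", "eures:FR")]
companies = [("compA", "euraxess:DE")]
same_as = {("eures:DE", "euraxess:DE")}

pairs = jobs_and_companies(jobs, companies, same_as)
print(pairs)
```

Without the `owl:sameAs` links the two sources would share no common identifier, which is the integration benefit the interlinking step provides.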

Page 18: EDF 2012 Datasets

Summary / Take Away Messages

Linked Data increasingly important in EU E-Government

Many RDF conversion tools/techniques available depending on source format

Linked Data simplifies data integration – added value by enrichment, e.g. linking to other data sets or schema creation

LOD cloud provides rich background information

Thanks for your Attention!