Self-service Linked Government Data

Post on 09-May-2015

description

A publishing pipeline for Linked Government Data

Transcript of Self-service Linked Government Data

Copyright 2011 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Enabling networked knowledge

Self-service Linked Government Data

Fadi Maali, Richard Cyganiak, Vassilios Peristeras

firstname.lastname@deri.org

data.gov.uk

data.gov

4997 datasets

2590 in CSV

272 in RDF

Why Linked Government Data (LGD)?

Web accessible

Interlinkable

Decentralised publishing of data

Standardised

We need government data as Linked Data not just Raw Data

… aha, and of good quality!

LGD

LGD is Costly

We want governments to provide Linked Data, not just Raw Data… and of good quality

TIME

MONEY

SKILLS

http://code.google.com/p/google-refine/

DIY

Self-service Approach

DIY: Provide tools, models and algorithms that enable the self-service approach (a publishing pipeline)

Publishing pipeline requirements

Interactive approach

Graphical user interface

Reproducibility and traceability

Flexibility

Decentralisation

Results sharing

Google Refine

Powerful data editing, transformation and enriching capabilities

Import capabilities, e.g. JSON, Excel, CSV, TSV, XML

Persistent undo/redo history

Popular in the open data community

Extensible and under active development

Free and open source

http://code.google.com/p/google-refine/

DIY Recipe (1000 feet view)

Publishers provide RDF representation of their catalogues

Tool support to select datasets of interest and put them into RDF

User shares the RDF data

DIY Recipe (100 feet view)

Publishers provide RDF representation of their catalogues: dcat

Tool support to select datasets of interest and put them into RDF: Google Refine + RDF export extension + RDF reconciliation extension

User shares the RDF data: share RDF data publicly (on CKAN.net) along with a sufficient provenance description

A Walk-through (1/5)

A Walk-through (2/5)

A Walk-through (3/5)

A Walk-through (4/5)

A Walk-through (5/5)

Data on CKAN.net

Data Provenance (simplified)

Nodes: :dataset, :csv-ds, :export-process, :json-history

Properties: dct:source, :wasExportedBy, :usedData, :operations
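The simplified provenance graph can be written down as a handful of triples. The sketch below is only one plausible reading of the diagram: the way the four nodes are wired to the four properties, the `http://example.org/prov/` namespace standing in for the `:` prefix, and the helper names `provenance_triples` and `to_ntriples` are all assumptions, not taken from the slides.

```python
# Hypothetical namespace for the slide's ":" prefix (an assumption).
EX = "http://example.org/prov/"
# Dublin Core Terms namespace, used for dct:source.
DCT = "http://purl.org/dc/terms/"


def provenance_triples():
    """Return the simplified provenance graph as (subject, predicate, object) URIs.

    The wiring of nodes to properties is an assumed reading of the diagram.
    """
    return [
        (EX + "dataset", DCT + "source", EX + "csv-ds"),
        (EX + "dataset", EX + "wasExportedBy", EX + "export-process"),
        (EX + "export-process", EX + "usedData", EX + "csv-ds"),
        (EX + "export-process", EX + "operations", EX + "json-history"),
    ]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(provenance_triples()))
```

Read this way, the RDF dataset points back both to its CSV source and to the export process, and the export process points to the JSON operation history that would let someone replay the transformation.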

DIY Recipe (10 feet view)

Dcat

An RDF vocabulary to describe government catalogues

Current status: First Public Working Draft by the W3C GLD Working Group
http://www.w3.org/TR/vocab-dcat/

Used on data.gov.uk (RDFa) and CKAN-based catalogues

“Enabling Interoperability of Government Data Catalogues.” EGOV 2010
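To make the catalogue description concrete, here is a minimal sketch of how one catalogue entry might be expressed with dcat and Dublin Core terms, emitted as N-Triples using only the Python standard library. The term names (`dcat:Catalog`, `dcat:dataset`, `dcat:Dataset`, `dcat:distribution`, `dcat:Distribution`, `dcat:accessURL`) follow the published DCAT vocabulary and may differ in detail from the 2011 draft; the function name `describe_dataset`, the `/csv` URI pattern and all example.org URIs are hypothetical.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Render a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def describe_dataset(catalog_uri, dataset_uri, title, csv_url):
    """Describe one dataset and its CSV distribution inside a catalogue."""
    dist_uri = dataset_uri + "/csv"  # hypothetical URI pattern for the distribution
    triples = [
        (uri(catalog_uri), uri(RDF_TYPE), uri(DCAT + "Catalog")),
        (uri(catalog_uri), uri(DCAT + "dataset"), uri(dataset_uri)),
        (uri(dataset_uri), uri(RDF_TYPE), uri(DCAT + "Dataset")),
        (uri(dataset_uri), uri(DCT + "title"), literal(title)),
        (uri(dataset_uri), uri(DCAT + "distribution"), uri(dist_uri)),
        (uri(dist_uri), uri(RDF_TYPE), uri(DCAT + "Distribution")),
        (uri(dist_uri), uri(DCAT + "accessURL"), uri(csv_url)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(describe_dataset(
        "http://example.org/catalog",
        "http://example.org/dataset/1",
        "Example dataset",
        "http://example.org/files/1.csv",
    ))
```

With catalogue entries in this shape, a tool can list the datasets that have a CSV distribution and offer them for conversion.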

DIY Recipe (10 feet view)

RDF Mapping

More on RDF Mapping

RDF-centric mapping

Multiple tree structure

Expression language for custom expressions

Vocabularies/ontologies support
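A rough sketch of what an RDF-centric mapping could look like: the mapping is written from the point of view of the target graph (a root node per row with outgoing edges), and each value is computed by an expression over the row. This is not the RDF export extension's actual format; plain Python functions stand in for its expression language, and the base URI, the `vocab/locatedIn` property, the column names and the sample row are all made up for illustration.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base URI
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def slug(text):
    """Turn a cell value into a URI-safe path segment."""
    return quote(text.strip().lower().replace(" ", "-"))


# One root node per row, described by a subject expression and a list of
# edges. Each edge is (predicate URI, expression over the row, object kind).
MAPPING = {
    "subject": lambda row: BASE + "school/" + slug(row["Name"]),
    "edges": [
        (RDFS_LABEL, lambda row: row["Name"].strip(), "literal"),
        (BASE + "vocab/locatedIn", lambda row: BASE + "area/" + quote(row["Area"].strip()), "uri"),
    ],
}


def map_rows(csv_text, mapping=MAPPING):
    """Apply the mapping to every CSV row and return N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = mapping["subject"](row)
        for predicate, expr, kind in mapping["edges"]:
            value = expr(row)
            if kind == "uri":
                obj = f"<{value}>"
            else:
                obj = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
            lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines


if __name__ == "__main__":
    sample = "Name,Area\nSwords School,Swords\n"  # invented sample data
    print("\n".join(map_rows(sample)))
```

Keeping the mapping as data separate from the rows means the same mapping can be reapplied when the source CSV is updated.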

DIY Recipe (10 feet view)

Interlinking

Google Refine

RDF Reconcile Extension

Silk Server

SPARQL endpoint

Sindice search API

Crafted RDF

SPARQL

SPARQL endpoint with fulltext extension

Hybrid SPARQL

Silk LSL

More on Interlinking

Interlinking as a pre-RDF-creation step: fewer unnecessary owl:sameAs links

Focus on the interface

Semi-automatic process with good user support

“Re-using Cool URIs: Entity Reconciliation Against LOD Hubs.” LDOW 2011
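The semi-automatic idea can be illustrated with a small sketch: rank candidate URIs for a cell value by label similarity, accept only near-exact matches automatically, and leave the rest for the user to confirm. This is not the reconciliation extension's algorithm; `difflib` string similarity is swapped in as a simple stand-in, and the function names, the threshold and the candidate list are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def reconcile(cell_value, candidates, auto_threshold=0.95, limit=3):
    """Rank candidate URIs for a cell value.

    candidates maps URI -> label. Returns (auto_match, ranked), where
    auto_match is the best URI if its score reaches auto_threshold, else
    None, and ranked is a list of (score, uri), best first, for the user
    to review.
    """
    ranked = sorted(
        ((similarity(cell_value, label), uri) for uri, label in candidates.items()),
        reverse=True,
    )[:limit]
    best_score, best_uri = ranked[0] if ranked else (0.0, None)
    auto_match = best_uri if best_score >= auto_threshold else None
    return auto_match, ranked


if __name__ == "__main__":
    # Illustrative candidate labels; in practice these would come from a
    # SPARQL endpoint or a search service.
    candidates = {
        "http://dbpedia.org/resource/Fingal": "Fingal",
        "http://dbpedia.org/resource/Dublin": "Dublin",
    }
    print(reconcile("fingal ", candidates))
    print(reconcile("Fingl", candidates))
```

Because the matched URI is used directly when the RDF is generated, no separate owl:sameAs statement is needed for cells that reconcile successfully.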

DIY Recipe (10 feet view)

Sharing

Capture the operations applied to the data

Represent them according to the Open Provenance Model Vocabulary (OPMV)

Share the data and its provenance on CKAN.net

CKAN Extension for Google Refine
http://lab.linkeddata.deri.ie/2011/grefine-ckan/

Case study - Fingal Catalogue

Number of datasets: 74 (68 available in CSV and 56 in XML)

Top publishers: Fingal County Council (41), Central Statistics Office (17), Department of Education and Science (4)

Top domains: Demographics (18), Citizen Participation (18), Education (9)

http://data.fingal.ie

Case study - Fingal Catalogue

The catalogue was represented in Dcat

60 datasets were converted to RDF using the publishing pipeline (~300K triples)

Data Cube was used for statistical data

URIs were used consistently and shared among datasets, so the data was interlinked

Externally linked to DBpedia

Open Issues

Evaluating/Refining the crowd-sourcing aspects of the RDF creation process

RDF Modeling: Can we assist RDF modeling by examining the raw data?

Lessons Learned

Interactive approach

Focus on plumbing tools together but don’t enforce a rigid process

Make it easy to adopt best practices and good recipes