Self-service Linked Government Data

Copyright 2011 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute www.deri.ie Enabling networked knowledge Self-service Linked Government Data Fadi Maali, Richard Cyganiak, Vassilios Peristeras [email protected]

description

A publishing pipeline for Linked Government Data

Transcript of Self-service Linked Government Data

Page 1: Self-service Linked Government Data

Copyright 2011 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Enabling networked knowledge

Self-service Linked Government Data

Fadi Maali, Richard Cyganiak, Vassilios Peristeras [email protected]

Page 2: Self-service Linked Government Data

data.gov.uk

Page 3: Self-service Linked Government Data

data.gov.uk

Page 4: Self-service Linked Government Data

data.gov

Page 5: Self-service Linked Government Data

data.gov

4997 datasets

2590 in CSV

272 in RDF
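
To put the three counts above in proportion, a quick calculation gives the share of the catalogue available in each format. This is a minimal sketch that uses only the numbers from the slide; the variable and function names are illustrative.

```python
# Counts taken from the slide; nothing else is assumed.
total_datasets = 4997
by_format = {"CSV": 2590, "RDF": 272}


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {fmt: share(n, total_datasets) for fmt, n in by_format.items()}
for fmt, pct in shares.items():
    print(f"{fmt}: {pct}% of datasets")
```

Roughly half of the datasets are offered as CSV, while only about 5% are offered as RDF.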

Page 6: Self-service Linked Government Data

Why Linked Government Data (LGD)?

Web accessible

Interlinkable

Decentralised publishing of data

Standardised

Page 7: Self-service Linked Government Data

We need government data as Linked Data not just Raw Data

... aha, and of good quality!

LGD

Page 8: Self-service Linked Government Data

We want governments to provide Linked Data not just Raw Data… and of good quality

TIME

MONEY

SKILLS

LGD is Costly

http://code.google.com/p/google-refine/

Page 9: Self-service Linked Government Data

DIY

Self-service Approach

Page 10: Self-service Linked Government Data

Self-service Approach

DIY

Provide tools, models and algorithms that enable the self-service approach (a publishing pipeline)

Page 11: Self-service Linked Government Data

Interactive approach

Graphical user interface

Reproducibility and traceability

Flexibility

Decentralisation

Results sharing

Publishing pipeline requirements

Page 12: Self-service Linked Government Data

Interactive approach

Graphical user interface

Reproducibility and traceability

Flexibility

Decentralisation

Results sharing

Publishing pipeline requirements

Page 13: Self-service Linked Government Data

Powerful data editing, transformation and enriching capabilities

Import capabilities e.g. JSON, Excel, CSV, TSV, XML, etc.

Persistent undo/redo history

Popular in open data community

Extensible and under active development

Free and open source

Google Refine

http://code.google.com/p/google-refine/

Page 14: Self-service Linked Government Data

DIY Recipe (1000 feet view)

Publishers provide RDF representation of their catalogues

User shares the RDF data

Tool support to select datasets of interest and put them into RDF

Page 15: Self-service Linked Government Data

DIY Recipe (100 feet view)

Publishers provide RDF representation of their catalogues

dcat

User shares the RDF data

Tool support to select datasets of interest and put them into RDF

Page 16: Self-service Linked Government Data

Tool support to select datasets of interest and put them into RDF

User shares the RDF data

Publishers provide RDF representation of their catalogues

dcat

Google Refine

+ RDF export extension

+ RDF reconciliation extension

DIY Recipe (100 feet view)

Page 17: Self-service Linked Government Data

User shares the RDF data

Tool support to select datasets of interest and put them into RDF

Publishers provide RDF representation of their catalogues

dcat

Google Refine

+ RDF export extension

+ RDF reconciliation extension

Share RDF data publicly (on CKAN.net) along with a sufficient provenance description

DIY Recipe (100 feet view)

Page 18: Self-service Linked Government Data

A Walk-through (1/5)

Page 19: Self-service Linked Government Data

A Walk-through (2/5)

Page 20: Self-service Linked Government Data

A Walk-through (3/5)

Page 21: Self-service Linked Government Data

A Walk-through (4/5)

Page 22: Self-service Linked Government Data

A Walk-through (5/5)

Page 23: Self-service Linked Government Data

Data on CKAN.net

Page 24: Self-service Linked Government Data

Data Provenance (simplified)

[Diagram. Nodes: :dataset, :csv-ds, :export-process, :json-history. Edge labels: dct:source, :wasExportedBy, :usedData, :operations.]
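
The provenance diagram on this slide can be read as a small set of triples. The sketch below is one plausible reading, not the authors' actual model: the transcript preserves only the node and edge labels, so the edge directions are an assumption, and `http://example.org/` is a hypothetical namespace standing in for the empty prefix.

```python
# Hypothetical namespace for the ":" prefix; the slide shows prefixed names only.
EX = "http://example.org/"
DCT = "http://purl.org/dc/terms/"

# Assumed edge directions: the exported dataset points back to its CSV source
# and to the export process, which in turn used the CSV data and the recorded
# operation history.
TRIPLES = [
    (EX + "dataset", DCT + "source", EX + "csv-ds"),
    (EX + "dataset", EX + "wasExportedBy", EX + "export-process"),
    (EX + "export-process", EX + "usedData", EX + "csv-ds"),
    (EX + "export-process", EX + "operations", EX + "json-history"),
]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def describe(node, triples):
    """Return {predicate: object} for every triple whose subject is node."""
    return {p: o for s, p, o in triples if s == node}


print(to_ntriples(TRIPLES))
```

Calling `describe(EX + "dataset", TRIPLES)` answers the traceability question "where did this RDF come from?" by returning the source file and the export process.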

Page 25: Self-service Linked Government Data

An RDF vocabulary to describe government catalogues

Current status: First Public Working Draft by the W3C GLD Working Group

http://www.w3.org/TR/vocab-dcat/

Used on data.gov.uk (RDFa) and CKAN-based catalogues

“Enabling Interoperability of Government Data Catalogues.” EGOV 2010

DIY Recipe (10 feet view)

Dcat
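
As an illustration of what a dcat description of a catalogue might look like, the sketch below renders a tiny catalogue as Turtle and then selects the datasets that have a CSV download. The catalogue contents, the `example.org` URIs and the helper names are hypothetical; the dcat terms used (`dcat:Catalog`, `dcat:dataset`, `dcat:Dataset`, `dcat:distribution`, `dcat:Distribution`, `dcat:accessURL`) should be checked against the draft linked above.

```python
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)

# Hypothetical catalogue content, for illustration only.
CATALOGUE = {
    "uri": "http://example.org/catalogue",
    "datasets": [
        {"uri": "http://example.org/dataset/1", "title": "Example dataset A",
         "downloads": ["http://example.org/files/a.csv"]},
        {"uri": "http://example.org/dataset/2", "title": "Example dataset B",
         "downloads": ["http://example.org/files/b.xml"]},
    ],
}


def to_turtle(catalogue):
    """Render the catalogue, its datasets and their distributions as Turtle."""
    cat = catalogue["uri"]
    lines = [PREFIXES, f"<{cat}> a dcat:Catalog ."]
    for ds in catalogue["datasets"]:
        uri, title = ds["uri"], ds["title"]
        lines.append(f"<{cat}> dcat:dataset <{uri}> .")
        lines.append(f'<{uri}> a dcat:Dataset ; dct:title "{title}" .')
        for url in ds["downloads"]:
            # Each download becomes an anonymous distribution node.
            lines.append(
                f"<{uri}> dcat:distribution "
                f"[ a dcat:Distribution ; dcat:accessURL <{url}> ] ."
            )
    return "\n".join(lines)


def datasets_with_extension(catalogue, ext):
    """Select the datasets that offer at least one download ending in ext."""
    return [ds["uri"] for ds in catalogue["datasets"]
            if any(url.endswith(ext) for url in ds["downloads"])]


print(to_turtle(CATALOGUE))
print(datasets_with_extension(CATALOGUE, ".csv"))
```

The second function stands in for the "select datasets of interest" step: once catalogues are described uniformly, picking out convertible datasets becomes a simple filter.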

Page 26: Self-service Linked Government Data

RDF Mapping

DIY Recipe (10 feet view)

Page 27: Self-service Linked Government Data

RDF-centric mapping

Multiple tree structure

Expression language for custom expression

Vocabularies/ontologies support

More on RDF Mapping
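
The idea behind an RDF-centric mapping with custom expressions can be sketched as follows: the mapping describes the desired RDF (a subject pattern plus properties), and each value is computed from the tabular row by a small expression. This is a simplified stand-in rather than the extension's actual mapping format; the CSV content, the `example.org` URIs and the Python callables playing the role of expressions are all assumptions.

```python
import csv
import io

# Made-up tabular data for illustration.
CSV_TEXT = """id,name,population
A1, Area One ,100
A2,Area Two,250
"""

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# The mapping is written from the RDF side: one subject pattern and a list of
# (predicate, expression) pairs. Each expression receives the current row.
MAPPING = {
    "subject": lambda row: f"http://example.org/area/{row['id']}",
    "properties": [
        (RDFS_LABEL, lambda row: row["name"].strip()),
        ("http://example.org/population", lambda row: int(row["population"])),
    ],
}


def rows_to_triples(csv_text, mapping):
    """Apply the mapping to every CSV row and collect the resulting triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = mapping["subject"](row)
        for predicate, expression in mapping["properties"]:
            triples.append((subject, predicate, expression(row)))
    return triples


for triple in rows_to_triples(CSV_TEXT, MAPPING):
    print(triple)
```

Supporting several subject patterns per row (the "multiple tree structure" on the slide) would mean a list of such mappings instead of a single one.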

Page 28: Self-service Linked Government Data

Interlinking

RDF Reconcile Extension

Silk Server

SPARQL endpoint

Sindice search API

Crafted RDF

SPARQL

SPARQL endpoint with fulltext extension

Hybrid SPARQL

Silk LSL

Google Refine

DIY Recipe (10 feet view)

Page 29: Self-service Linked Government Data

More on Interlinking

Interlinking as a pre-RDF-creation step: fewer unnecessary owl:sameAs links

Focus on the interface

Semi-automatic process with good user support

“Re-using Cool URIs: Entity Reconciliation Against LOD Hubs.” LDOW 2011
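
A semi-automatic reconciliation step can be sketched as: score each cell value against candidate labels, accept a match automatically when the score is high enough, and otherwise hand the ranked candidates to the user. The code below uses plain string similarity from the standard library as a stand-in; it is not the matching algorithm of the RDF reconciliation extension, and the candidate list and threshold are illustrative (in the pipeline, candidates would come from a SPARQL endpoint or similar service).

```python
from difflib import SequenceMatcher

# Illustrative candidates: label -> URI to reuse.
CANDIDATES = {
    "Fingal": "http://dbpedia.org/resource/Fingal",
    "Dublin": "http://dbpedia.org/resource/Dublin",
}


def score(a, b):
    """Case-insensitive similarity between two labels, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def reconcile(value, candidates, auto_threshold=0.9):
    """Return (matched URI or None, candidates ranked by descending score)."""
    ranked = sorted(
        ((score(value, label), label, uri) for label, uri in candidates.items()),
        reverse=True,
    )
    best_score, _, best_uri = ranked[0]
    match = best_uri if best_score >= auto_threshold else None
    return match, ranked


match, ranked = reconcile(" fingal ", CANDIDATES)
print(match)
```

Because the cell value is replaced by an existing URI before the RDF is generated, no separate owl:sameAs link is needed for it afterwards.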

Page 30: Self-service Linked Government Data

Sharing

Capture the operations applied to the data

Represent them according to the Open Provenance Model Vocabulary (OPMV)

Share the data and its provenance on CKAN.net

CKAN Extension for Google Refine

http://lab.linkeddata.deri.ie/2011/grefine-ckan/

DIY Recipe (10 feet view)
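
Why a captured operation history helps reproducibility can be shown with a small sketch: if every transformation is recorded as data, anyone holding the raw file and the history can replay the steps and arrive at the same result. The operation names and JSON layout below are hypothetical and are not Google Refine's actual history format.

```python
import json

# Hypothetical operation records, serialised as JSON so they can be shared
# alongside the data.
HISTORY_JSON = json.dumps([
    {"op": "rename-column", "old": "nm", "new": "name"},
    {"op": "uppercase", "column": "name"},
])


def replay(rows, history_json):
    """Apply each recorded operation, in order, to a copy of the rows."""
    rows = [dict(row) for row in rows]
    for step in json.loads(history_json):
        if step["op"] == "rename-column":
            for row in rows:
                row[step["new"]] = row.pop(step["old"])
        elif step["op"] == "uppercase":
            for row in rows:
                row[step["column"]] = row[step["column"]].upper()
        else:
            raise ValueError(f"unknown operation: {step['op']}")
    return rows


raw_rows = [{"nm": "fingal"}]
print(replay(raw_rows, HISTORY_JSON))
```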

Page 31: Self-service Linked Government Data

Case study - Fingal Catalogue

Number of datasets: 74 (68 available in CSV and 56 in XML)

Top publishers: Fingal County Council (41), Central Statistics Office (17), Department of Education and Science (4)

Top domains: Demographics (18), Citizen Participation (18), Education (9)

http://data.fingal.ie

Page 32: Self-service Linked Government Data

Case study - Fingal Catalogue

The catalogue was represented in Dcat

60 datasets were converted to RDF using the publishing pipeline (~300K triples)

Data Cube was used for statistical data

URIs were used consistently and shared among datasets, so the data was interlinked

Externally linked to DBpedia
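
For readers unfamiliar with Data Cube, the sketch below shows the general shape of a single statistical observation: one resource typed `qb:Observation`, attached to a data set, with dimension values and a measured value. Only `qb:Observation` and `qb:dataSet` come from the Data Cube vocabulary; the dimension and measure properties, the `example.org` URIs and the value are made up and do not reflect the actual Fingal data.

```python
QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/"  # hypothetical namespace


def observation(obs_id, dataset, dimensions, measure_value):
    """Render one Data Cube observation as Turtle with full URIs."""
    lines = [
        f"<{EX}obs/{obs_id}> a <{QB}Observation> ;",
        f"    <{QB}dataSet> <{dataset}> ;",
    ]
    for prop, value in dimensions.items():
        lines.append(f"    <{prop}> <{value}> ;")
    # Hypothetical measure property holding the numeric value.
    lines.append(f"    <{EX}measure/value> {measure_value} .")
    return "\n".join(lines)


turtle = observation(
    "1",
    EX + "dataset/example",
    {EX + "dimension/area": EX + "area/A1"},
    42,
)
print(turtle)
```

Sharing the dimension value URIs (here `area/A1`) across datasets is what makes observations from different datasets joinable.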

Page 33: Self-service Linked Government Data

Evaluating/Refining the crowd-sourcing aspects of the RDF creation process

RDF Modeling: Can we assist RDF modeling by examining the raw data?

Open Issues

Page 34: Self-service Linked Government Data

Lessons Learned

Interactive approach

Focus on plumbing tools together but don’t enforce a rigid process

Make it easy to adopt best practices and good recipes