Using linked data for dataset publication

Post on 08-May-2015

Presentation to Liber IDCC workshop on metadata for dataset reuse.

Semantic web and linked data for data set publication

Dave Reynolds, Epimorphics Ltd (@der42)

Outline
• Background on linked data
• Roles in data set publishing
• Case study: Environment Agency
• Lessons

Linked data background

Linked data ...

publishing data on the web ...

... to enable integration, linking and reuse across silos

Linked data

Apply the principles of the web to the publication of data.

The linked data web:
• is a global network of things
• each identified by a URI
• fetching a URI gives a set of statements
• things connected by typed links
• open, anyone can say anything about anything else

Linked data is “data you can click on”

in RDF

Example schools information

(Diagram, built up over six slides. The first builds attach plain values to the school: label “Cardiff High School”, phase “Secondary”, district “Cardiff”. Later builds replace the phase and district values with resources that carry their own labels, then link out to Ordnance Survey data.)

http://education.data.gov.uk/id/school/401874
• rdf:type school:School
• rdfs:label “Cardiff High School”
• school:phase school:PhaseOfEducation_Secondary (which has its own rdfs:label)
• school:district http://statistics.data.gov.uk/id/local-authority-district/00PT (rdfs:label “Cardiff”)

http://data.ordnancesurvey.co.uk/id/7000000000025484
• admingeo:ward, admingeo:parish
• spatial:extent GML: 310499.4 184176.6 310476.5 ...

owl:sameAs (added in the final build)
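The school example can be read as a list of (subject, predicate, object) statements in which "clicking" means following a typed link. The following is a minimal sketch of that reading, not how any RDF library stores data. The `Literal` class and the `objects` and `follow` helpers are hypothetical names introduced here, and the pairing used for owl:sameAs is an assumption, since the extracted slide does not say which two resources it joins.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain text value, as opposed to a URI or prefixed name."""
    value: str


SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
OS_AREA = "http://data.ordnancesurvey.co.uk/id/7000000000025484"
PHASE = "school:PhaseOfEducation_Secondary"

TRIPLES = [
    (SCHOOL, "rdf:type", "school:School"),
    (SCHOOL, "rdfs:label", Literal("Cardiff High School")),
    (SCHOOL, "school:phase", PHASE),
    (SCHOOL, "school:district", DISTRICT),
    (DISTRICT, "rdfs:label", Literal("Cardiff")),
    # Assumption: the slide shows owl:sameAs, but which two resources it
    # joins is not recoverable from the transcript; this pairing is a guess.
    (DISTRICT, "owl:sameAs", OS_AREA),
]


def objects(triples, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(triples, start, *path):
    """Follow a chain of typed links from start and return the terms reached."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(triples, node, predicate)]
    return frontier


# From the school, follow the district link, then read the district's label.
district_labels = follow(TRIPLES, SCHOOL, "school:district", "rdfs:label")
print(district_labels)
```

Because the district is a resource rather than a string, the same `follow` call can continue across data sets, for example `follow(TRIPLES, SCHOOL, "school:district", "owl:sameAs")`.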

Role in data set publication

• well suited to describing things: schools, companies, animal species, music tracks, tv programmes ...
• what about datasets? environmental measurements, experimental results, statistical analyses ...

Approach 1: Data catalogues

• treat the dataset as a single resource, identify with a URI
• provide metadata as linked data: descriptive, categorical, technical and structural

Benefits?
• separation of metadata from resource & repository
• easy aggregation of metadata into catalogues
• schema-less, enables use-specific annotations and links
• use of sharable category schemes and reference data

=> support for discovery
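The "easy aggregation" and "schema-less" points can be illustrated with a small sketch. It assumes two repositories that each describe their datasets as triples using DCAT and Dublin Core terms (dcat:Dataset, dcat:keyword, dct:title). The second dataset URI, the ex:reviewedBy annotation, the titles, the keywords and the helper names are all made up for illustration.

```python
BWQ = "http://environment.data.gov.uk/data/bathing-water-quality"
OTHER = "http://example.org/data/river-levels"  # made-up dataset URI

# Metadata published by two hypothetical repositories.
repo_a = [
    (BWQ, "rdf:type", "dcat:Dataset"),
    (BWQ, "dct:title", "Bathing water quality"),
    (BWQ, "dcat:keyword", "water"),
    (BWQ, "dcat:keyword", "beaches"),
]
repo_b = [
    (OTHER, "rdf:type", "dcat:Dataset"),
    (OTHER, "dct:title", "River levels"),
    (OTHER, "dcat:keyword", "water"),
    # A use-specific annotation; repo_a needs no schema change to coexist with it.
    (OTHER, "ex:reviewedBy", "http://example.org/people/alice"),
]


def aggregate(*sources):
    """Build a catalogue by taking the union of all statements."""
    catalogue = set()
    for source in sources:
        catalogue.update(source)
    return catalogue


def find_by_keyword(catalogue, keyword):
    """URIs of datasets tagged with the keyword, sorted for stable output."""
    return sorted(s for s, p, o in catalogue if p == "dcat:keyword" and o == keyword)


catalogue = aggregate(repo_a, repo_b)
water_datasets = find_by_keyword(catalogue, "water")
print(water_datasets)
```

Aggregation here is a set union because every statement is self-contained; there is no shared table layout for the two repositories to agree on first.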

Approach 2: Fine grain publication

• publish the data set itself as linked data
• entities, terms, individual records in data identified by URIs
• data set structure and ontologies linked from data
• still include dataset metadata

Benefits?
• all benefits of approach 1 to support discovery
• self-describing data
• slices addressable (trace back, provenance, annotation)
• integration across sets - reuse of terms for dimensions, units, values
• fine grained access

=> integration, comparison, context, data as a service

bathing water quality

what we do...

(Timeline diagram: the bathing season runs from 15th May to 30th Sept, with 20-22 samples in 22 weeks. Also marked: press interest at start of season, annual report in November, and December.)

what information is relevant to the public about beaches

what we do

how linkable data helps

Photo by Skellig2008 (flickr)

Tenby Tourist Information Centre, Unit 2, The Gateway Complex, Tenby, Wales, SA70 7LT. Tel: 01834 842 402. Fax: 01834 845 439. Email: tenby.tic@pembrokeshire.gov.uk

(Diagram: layers of the bathing water linked data)

Vocabularies (e.g. http://location.data.gov.uk/def/ef/SampingPoint): Bathing Waters, Sampling Points, Assessments, Zones Of Influence

URI Set, Reference Data (e.g. http://location.data.gov.uk/so/ef/SamplingPoint/bwsp.eaew): Bathing Waters, Sampling Points, Zone Of Influence, Assessment, Observation

Datasets: http://environment.data.gov.uk/data/bathing-water-quality
• void:subset .../in-season (In-season Weekly Assessment)
• void:subset .../compliance (Annual Compliance)

Publishing the Bathing Water Quality data set

Data cube vocabulary

• collaborative development, sponsored by data.gov.uk
• simple, flexible vocabulary
• mirrors core information models from: SDMX (Statistical Data and Metadata eXchange), DDI (Data Documentation Initiative)
• extension to SCOVO vocabulary

image: dullhunk @ flickr

Data cube model

A set of observations, indexed by dimensions, describing measures, interpreted according to attributes.

(Diagram: a cube whose axes are dimensions, e.g. time and region. Each cell holds the measure(s), e.g. population = 32,567, together with attributes, e.g. unit of measure = count, status = preliminary, ...)
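A minimal sketch of this model, assuming a cube with the two example dimensions (time and region), one measure (population) and two attributes (unit and status). The population figure 32,567 is from the slide; the time and region values, the other rows and the `Observation` class are made up for illustration.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    time: str                    # dimension
    region: str                  # dimension
    population: int              # measure
    unit: str = "count"          # attribute
    status: str = "preliminary"  # attribute


observations = [
    Observation("2010", "region-A", 32567),
    Observation("2010", "region-B", 41200, status="final"),
    Observation("2011", "region-A", 32911),
]

# "Indexed by dimensions": the dimension values together identify one cell.
cube = {(o.time, o.region): o for o in observations}

cell = cube[("2010", "region-A")]
print(f"population = {cell.population:,} ({cell.unit}, {cell.status})")
# prints: population = 32,567 (count, preliminary)
```

The attributes do not take part in the index; they only say how the measured value should be read.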

Data cube vocabulary

1. Top level

DataSet
• provenance and metadata
• structure

Observation
• measured values, at dimensions, with attributes
• direct link to DataSet

Slice
• optional grouping by fixing dimensions
• guide to presentation
• allows for abbreviated data

(Diagram, built up over three slides: the classes qb:DataSet, qb:Slice, qb:Observation, qb:SliceKey and qb:DataStructureDefinition, connected by the properties qb:structure, qb:slice, qb:observation, qb:dataset, qb:sliceStructure, qb:sliceKey, qb:subSlice and qb:component. Each observation carries dimension values, measure value(s) and attribute values.)

Data cube vocabulary

2. Data Structure Definition

• explicit definition of cube structure, inline in the data
• enables: validation, visualization, discovery, abbreviation

(Diagram: qb:DataSet linked by qb:structure to qb:DataStructureDefinition, which is linked by qb:component to qb:ComponentSpecification. A component specification uses qb:dimension, qb:measure or qb:attribute, plus qb:componentRequired, qb:componentAttachment and qb:order.)
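To show why an explicit structure definition enables validation, here is a rough sketch that checks observations against a list of component specifications. It is not the integrity-checking procedure of the Data Cube specification. The `Component` class, the `validate` function and the population cube are hypothetical, and the `required` flag only loosely mirrors qb:componentRequired, which the vocabulary applies to attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str              # "dimension", "measure" or "attribute"
    required: bool = True  # loosely mirrors qb:componentRequired


# Hypothetical structure definition for a population cube.
STRUCTURE = [
    Component("time", "dimension"),
    Component("region", "dimension"),
    Component("population", "measure"),
    Component("status", "attribute", required=False),
]


def validate(observation, structure):
    """Return a list of problems; an empty list means the observation fits."""
    problems = []
    known = {c.name for c in structure}
    for c in structure:
        if c.required and c.name not in observation:
            problems.append(f"missing {c.kind} {c.name}")
    for key in observation:
        if key not in known:
            problems.append(f"unknown component {key}")
    return problems


good = {"time": "2010", "region": "region-A", "population": 32567}
bad = {"time": "2010", "population": 32567, "colour": "blue"}

print(validate(good, STRUCTURE))
print(validate(bad, STRUCTURE))
```

The same structure list could drive a table or chart layout (which components are axes, which is the value), which is the sense in which the definition also supports visualization.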

Bathing Water Quality cubes

measures
• total coliform count, entero virus count, ...
• sample classification

dimensions
• sampling point
• sampling week
• sampling year

attributes
• abnormal weather
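A slice of such a cube is what remains after fixing some of the dimensions. The sketch below assumes observations held as plain dictionaries keyed by the dimension, measure and attribute names listed above; the sampling point names, weeks and classification values are invented, and `take_slice` and `abbreviate` are hypothetical helpers.

```python
observations = [
    {"samplingPoint": "point-1", "year": 2012, "week": 20,
     "classification": "pass", "abnormalWeather": False},
    {"samplingPoint": "point-1", "year": 2012, "week": 21,
     "classification": "pass", "abnormalWeather": True},
    {"samplingPoint": "point-2", "year": 2012, "week": 20,
     "classification": "fail", "abnormalWeather": False},
    {"samplingPoint": "point-1", "year": 2011, "week": 20,
     "classification": "pass", "abnormalWeather": False},
]


def take_slice(observations, **fixed):
    """Keep only observations whose dimensions equal the fixed values."""
    return [o for o in observations if all(o[d] == v for d, v in fixed.items())]


def abbreviate(slice_obs, fixed):
    """Drop the fixed dimensions: within a slice they need not be repeated."""
    return [{k: v for k, v in o.items() if k not in fixed} for o in slice_obs]


fixed = {"samplingPoint": "point-1", "year": 2012}
season = take_slice(observations, **fixed)
short = abbreviate(season, fixed)
print(short)
```

Fixing the sampling point and the year leaves a series over the remaining dimension (week), which is a natural unit for presentation as a chart or table row.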

Everything has a URI

• Selected Lists and Individual Bathing Waters
• Lists and Individual Assessments: In-Season or Annual Compliance
• Vocabulary Terms
• Datasets (and subsets)

Presented as:
• HTML (for people)
• JSON, XML, RDF and CSV (for programs)
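One common way for a single URI to serve several representations is HTTP content negotiation, where the client names the format it wants in an Accept header. The sketch below only builds such requests with the Python standard library and does not send them. Whether this particular service selects formats by Accept header, by file extension or by both is an assumption, not something the slides state; `request_for` and the `MEDIA_TYPES` table are hypothetical names.

```python
from urllib.request import Request

DATASET = "http://environment.data.gov.uk/data/bathing-water-quality"

# Standard media types for the formats listed on the slide.
MEDIA_TYPES = {
    "html": "text/html",
    "json": "application/json",
    "xml": "application/xml",
    "rdf": "application/rdf+xml",
    "csv": "text/csv",
}


def request_for(uri, fmt):
    """Build (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": MEDIA_TYPES[fmt]})


req = request_for(DATASET, "csv")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the fetch; the response's Content-Type header then shows which representation the server actually chose.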

Data Platform and Applications

Web of Linked Data

http://environment.data.gov.uk/lab/bwq-os.html

Outcomes

• bathing water quality information available as both data set and set of web APIs, updated weekly (in season)
• third party applications to use and combine the data
• seed a web of environmental and location data
  • reference identifiers can be reused for related information
  • URI patterns designed to be compatible with INSPIRE

Wrapping up

image: erika g. @ flickr.com

Lessons

• importance of reference identifiers
• developer accessibility
  • linked data API
  • publish once, consume many ways
• importance of maintenance and QoS expectation
• reusable patterns:
  • reusable vocabularies - Data Cube, org ...
  • URI patterns
  • provenance - OPMV and specializations
• incremental approach

Acknowledgements

• Alex Coley (Environment Agency), for slides 17, 18, and for sponsoring the bathing water quality data publication
• Stuart Williams, developer of the bathing water application and slides 19, 27, 28
• John Sheridan (The National Archives), for sponsoring the development of data cube
• Richard Cyganiak, Jeni Tennison, co-developers of the data cube vocabulary

fin.

image: Christian Haugen @ flickr.com

Spare

Linked data principles

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things

Pattern of application of semantic web stack

Linked open data cloud: 2007

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Linked open data cloud: 2009

Linked open data cloud: 2010

Accessing all this data

• link following: HTTP GET, follow links, aggregate relevant statements
• query: SPARQL

SPARQL

• core idea is pattern matching
  • graph patterns with variables
  • any subgraph which matches yields row of bindings
• syntax based on Turtle syntax for RDF
• web API endpoints
• lots of power: filters, optionals, named graphs, sub-queries, property chains, aggregation, federated query, update, construct

(Diagram: the graph pattern ?school ont:districtAdministrative [ ], where [ ] rdfs:label “Cardiff”)
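The pattern matching idea can be sketched without a SPARQL engine. The toy matcher below handles only a conjunction of triple patterns, with none of the filters, optionals or other features listed above. The data is invented apart from the Cardiff school URI, the blank node in the diagram is replaced by a named variable ?district, and `match_pattern` and `query` are hypothetical helper names.

```python
CARDIFF_HIGH = "http://education.data.gov.uk/id/school/401874"
OTHER_SCHOOL = "http://example.org/id/school/1"  # made-up school URI

TRIPLES = [
    (CARDIFF_HIGH, "ont:districtAdministrative", "_:d1"),
    ("_:d1", "rdfs:label", "Cardiff"),
    (OTHER_SCHOOL, "ont:districtAdministrative", "_:d2"),
    ("_:d2", "rdfs:label", "Swansea"),
]


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None  # variable already bound to something else
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return new


def query(triples, patterns):
    """Return one row of bindings per subgraph that matches every pattern."""
    rows = [{}]
    for pattern in patterns:
        next_rows = []
        for row in rows:
            for triple in triples:
                extended = match_pattern(pattern, triple, row)
                if extended is not None:
                    next_rows.append(extended)
        rows = next_rows
    return rows


rows = query(TRIPLES, [
    ("?school", "ont:districtAdministrative", "?district"),
    ("?district", "rdfs:label", "Cardiff"),
])
print([row["?school"] for row in rows])
```

The shared variable ?district is what joins the two patterns: only bindings that satisfy both survive, so the school in the other district drops out.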

Accessing all this data

• link following: HTTP GET, follow links, aggregate relevant statements
• query: SPARQL
• linked data API
  • RESTful API onto linked data resources
  • simple query, usable without RDF stack, web dev friendly
  • easy to layer visualizations and UIs on top
• third parties: search engines and aggregators, e.g. Sindice, sameAs.org
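To give a feel for "simple query, usable without RDF stack", the sketch below builds the kind of URL a web developer might request from such an API: a list endpoint, a format extension and a few query-string filters. The endpoint, the filter name `phase` and the paging parameter `_pageSize` are illustrative assumptions and are not taken from any service named in the slides; `api_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def api_url(base, fmt="json", **filters):
    """Build a list-endpoint URL with a format extension and simple filters."""
    url = f"{base}.{fmt}"
    query = urlencode(filters)
    return f"{url}?{query}" if query else url


# Hypothetical endpoint and parameter names, for illustration only.
url = api_url("http://example.org/doc/school", phase="Secondary", _pageSize=10)
print(url)
# prints: http://example.org/doc/school.json?phase=Secondary&_pageSize=10
```

A client can fetch such a URL and parse the JSON with ordinary web tooling, while the same underlying resources remain available as RDF for clients that want the full graph.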

Semantic web layer cake

Data.gov.uk – visualizations on top of linked data

Data.gov.uk – linked datasets and APIs