Using linked data for dataset publication

Semantic web and linked data for data set publication
Dave Reynolds, Epimorphics Ltd, @der42

Description: Presentation to the Liber IDCC workshop on metadata for dataset reuse.

Transcript of Using linked data for dataset publication

Page 1: Using linked data for dataset publication

Semantic web and linked data for data set publication

Dave Reynolds, Epimorphics Ltd, @der42

Page 2

Outline
• Background on linked data
• Roles in data set publishing
• Case study: Environment Agency
• Lessons

Page 3

Linked data background

Page 4

Linked data ...

publishing data on the web ...

... to enable integration, linking and reuse across silos

Page 5

Linked data
Apply the principles of the web to the publication of data. The linked data web:
• is a global network of things, each identified by a URI
• fetching a URI gives a set of statements (in RDF)
• things are connected by typed links
• is open: anyone can say anything about anything else

Linked data is “data you can click on”.
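To make "fetching a URI gives a set of statements" concrete, here is a minimal sketch that models the web of data as a set of (subject, predicate, object) triples. The `WEB` set and the `fetch` and `links` helpers are hypothetical stand-ins for real HTTP dereferencing, and the `DISTRICT` predicate URI is an assumption; the school and district URIs come from the example on the following slides.

```python
# Toy model of the linked data web: statements are (subject, predicate, object)
# triples and things are named by URIs. WEB and fetch() are illustrative
# stand-ins for real HTTP dereferencing, not a real client.
SCHOOL = "http://education.data.gov.uk/id/school/401874"
CARDIFF = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumed predicate URI, for illustration only.
DISTRICT = "http://education.data.gov.uk/def/school/district"

WEB = {
    (SCHOOL, RDFS_LABEL, "Cardiff High School"),  # literal value
    (SCHOOL, DISTRICT, CARDIFF),                  # typed link to another thing
    (CARDIFF, RDFS_LABEL, "Cardiff"),
}


def fetch(uri, web=WEB):
    """Simulate looking up a URI: return the statements whose subject is that URI."""
    return {triple for triple in web if triple[0] == uri}


def links(uri, web=WEB):
    """Return the URIs this thing links to: the parts of the data you can 'click on'."""
    return sorted(obj for _, _, obj in fetch(uri, web) if obj.startswith("http"))


print(links(SCHOOL))
```

Looking up the school yields two statements, and one of them is a link you can follow to the Cardiff district.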

Page 6

Example: schools information
http://education.data.gov.uk/id/school/401874

Page 7

Example: schools information
http://education.data.gov.uk/id/school/401874
• label → “Cardiff High School”
• phase → “Secondary”
• district → “Cardiff”
• a → School

Page 8

Example: schools information
http://education.data.gov.uk/id/school/401874
• label → “Cardiff High School”
• phase → school:PhaseOfEducation_Secondary
• district → http://statistics.data.gov.uk/id/local-authority-district/00PT (label → “Cardiff”)
• a → school:School

Page 9

Example: schools information
http://education.data.gov.uk/id/school/401874
• rdfs:label → “Cardiff High School”
• school:phase → school:PhaseOfEducation_Secondary
• school:district → http://statistics.data.gov.uk/id/local-authority-district/00PT (rdfs:label → “Cardiff”)
• rdf:type → school:School
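The prefixed names on this slide (`rdfs:label`, `school:phase`, ...) are shorthand for full URIs. The sketch below expands them and writes the school's statements out as N-Triples. The `rdf:` and `rdfs:` namespaces are the standard W3C ones; the `school:` namespace is an assumption for illustration, and literal escaping is deliberately left out.

```python
# Expanding the prefixed names used on the slide into full URIs and writing the
# statements out as N-Triples. rdf: and rdfs: are the standard W3C namespaces;
# the school: namespace below is an assumption for illustration.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "school": "http://education.data.gov.uk/def/school/",  # assumed namespace
}


def expand(name):
    """Expand 'prefix:local' to a full URI; full URIs and unknown prefixes pass through."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return name


def term(value):
    """N-Triples term: quoted strings stay literals, everything else becomes <URI>.
    (No escaping of quotes or newlines inside literals: this is only a sketch.)"""
    if value.startswith('"'):
        return value
    return "<" + expand(value) + ">"


SCHOOL = "http://education.data.gov.uk/id/school/401874"
TRIPLES = [
    (SCHOOL, "rdf:type", "school:School"),
    (SCHOOL, "rdfs:label", '"Cardiff High School"'),
    (SCHOOL, "school:phase", "school:PhaseOfEducation_Secondary"),
    (SCHOOL, "school:district",
     "http://statistics.data.gov.uk/id/local-authority-district/00PT"),
]


def to_ntriples(triples):
    """One line per statement: subject, predicate, object, terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples(TRIPLES))
```

A full URI such as the district URI passes through `expand` untouched, because `http` is not a declared prefix.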

Page 10

Example: schools information
http://education.data.gov.uk/id/school/401874
[Diagram as on page 9, extended with the Ordnance Survey resource http://data.ordnancesurvey.co.uk/id/7000000000025484, which carries admingeo:ward, admingeo:parish and spatial:extent (GML: 310499.4 184176.6 310476.5 ...)]

Page 11

Example: schools information
http://education.data.gov.uk/id/school/401874
[Diagram as on page 10, plus an owl:sameAs link]
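An owl:sameAs link says two URIs name the same thing, so a consumer can merge ("smush") the statements made about each. The sketch below does that with union-find, assuming the owl:sameAs arc on this slide joins the district URI to the Ordnance Survey URI. The ward object is a hypothetical placeholder, and picking the lexically smallest URI as the representative is an arbitrary design choice.

```python
# "Smushing": treat URIs joined by owl:sameAs as one node, so statements made
# about either URI end up on the same resource. Union-find keeps this cheap.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
OS_CARDIFF = "http://data.ordnancesurvey.co.uk/id/7000000000025484"

TRIPLES = [
    (DISTRICT, "rdfs:label", "Cardiff"),
    (OS_CARDIFF, "admingeo:ward", "http://example.org/hypothetical-ward"),  # placeholder
    (DISTRICT, OWL_SAME_AS, OS_CARDIFF),  # assumed endpoints of the sameAs arc
]


def smush(triples):
    """Rewrite every term to its sameAs representative and drop the sameAs triples."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            a, b = find(s), find(o)
            if a != b:
                # Deterministic choice of representative: the lexically smallest URI.
                parent[max(a, b)] = min(a, b)

    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}


merged = smush(TRIPLES)
print({s for s, _, _ in merged})
```

After merging, the label "Cardiff" and the ward link hang off a single node.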

Page 12

Page 13

Role in data set publication
• well suited to describing things: schools, companies, animal species, music tracks, TV programmes ...
• what about datasets? environmental measurements, experimental results, statistical analyses ...

Page 14

Approach 1: Data catalogues
• treat the dataset as a single resource, identified with a URI
• provide metadata as linked data: descriptive, categorical, technical and structural

Benefits?
• separation of metadata from resource & repository
• easy aggregation of metadata into catalogues
• schema-less: enables use-specific annotations and links
• use of sharable category schemes and reference data

=> support for discovery
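A minimal sketch of approach 1, assuming Dublin Core terms for the metadata: each dataset is one resource with a URI, merging two catalogues is simply a set union of statements (there is no schema to reconcile), and discovery is a lookup against a shared category term. The river-levels dataset and the category URI are hypothetical.

```python
# Approach 1 in miniature: each dataset is one resource with a URI, described by
# metadata triples (Dublin Core terms here). Aggregating catalogues is a set union,
# and discovery is a lookup against a shared category scheme.
DCT = "http://purl.org/dc/terms/"
BWQ = "http://environment.data.gov.uk/data/bathing-water-quality"
RIVERS = "http://example.org/dataset/river-levels"  # hypothetical dataset
WATER = "http://example.org/category/water"         # hypothetical category term

catalogue_a = {
    (BWQ, DCT + "title", "Bathing water quality"),
    (BWQ, DCT + "subject", WATER),
}
catalogue_b = {
    (RIVERS, DCT + "title", "River levels"),
    (RIVERS, DCT + "subject", WATER),
}

# No schema to reconcile: merging catalogues is just a union of statements.
aggregate = catalogue_a | catalogue_b


def find_datasets(category, graph):
    """Discovery: datasets tagged with a given category term."""
    return sorted(s for s, p, o in graph if p == DCT + "subject" and o == category)


def title(dataset, graph):
    """Return the dcterms:title of a dataset."""
    return next(o for s, p, o in graph if s == dataset and p == DCT + "title")


for uri in find_datasets(WATER, aggregate):
    print(uri, "-", title(uri, aggregate))
```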

Page 15

Approach 2: Fine grain publication
• publish the data set itself as linked data
• entities, terms and individual records in the data identified by URIs
• data set structure and ontologies linked from the data
• still include dataset metadata

Benefits?
• all the benefits of approach 1 to support discovery
• self-describing data
• slices addressable (trace back, provenance, annotation)
• integration across sets: reuse of terms for dimensions, units, values
• fine-grained access

=> integration, comparison, context, data as a service
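The "addressable, trace back" benefit can be sketched like this: every record gets its own URI and carries a link back to its dataset. The link here uses qb:dataSet from the Data Cube vocabulary. The URI pattern (dataset URI plus percent-encoded key parts) and the record keys are hypothetical.

```python
from urllib.parse import quote

# Approach 2 in miniature: every record gets its own URI and carries a link back to
# its dataset (qb:dataSet from the Data Cube vocabulary), so anyone holding a single
# record can trace it to the dataset's metadata. The URI pattern is hypothetical.
QB_DATASET = "http://purl.org/linked-data/cube#dataSet"
DATASET = "http://environment.data.gov.uk/data/bathing-water-quality"


def record_uri(dataset, *key_parts):
    """Hypothetical pattern: dataset URI + '/'-joined, percent-encoded key parts."""
    return dataset + "/" + "/".join(quote(str(part), safe="") for part in key_parts)


def publish(dataset, keys):
    """One triple per record linking it to its dataset."""
    return {(record_uri(dataset, *key), QB_DATASET, dataset) for key in keys}


def dataset_of(record, triples):
    """Trace back from a record to the dataset it belongs to."""
    return next(o for s, p, o in triples if s == record and p == QB_DATASET)


# Hypothetical record keys: (sampling point, year, week).
triples = publish(DATASET, [("sp-001", 2011, 20), ("sp-001", 2011, 21)])
one = record_uri(DATASET, "sp-001", 2011, 20)
print(one, "->", dataset_of(one, triples))
```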

Page 16

Page 17

Bathing water quality: what we do

[Timeline figure: the bathing season runs from 15th May to 30th Sept, with 20-22 samples in 22 weeks; press interest at the start of the season; annual report (November); December]

What information is relevant to the public about beaches?

Page 18

How linkable data helps

Photo by Skellig2008 (flickr)

Tenby Tourist Information Centre, Unit 2, The Gateway Complex, Tenby, Wales, SA70 7LT. Tel: 01834 842 402. Fax: 01834 845 439.

Page 19

Publishing the Bathing Water Quality data set
[Diagram]
• Vocabularies: BathingWaters, SamplingPoints, Zones Of Influence, Assessments (e.g. http://location.data.gov.uk/def/ef/SampingPoint)
• URI Set / Reference Data: BathingWaters, SamplingPoints, Zone Of Influence, Assessment (e.g. http://location.data.gov.uk/so/ef/SamplingPoint/bwsp.eaew)
• Observation Datasets: http://environment.data.gov.uk/data/bathing-water-quality, with void:subset links to Annual Compliance (.../compliance) and In-season Weekly Assessment (.../in-season)
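The void:subset links let a consumer start at the top-level dataset and discover every subset beneath it. Here is a minimal traversal sketch. The subset URIs are placeholders, because the slide abbreviates the real ones as .../in-season and .../compliance, and the nested 2011 subset is purely hypothetical. It is included to show that the walk is transitive.

```python
# Walking void:subset links from the top-level dataset to find every subset,
# including subsets of subsets. The subset URIs are placeholders: the slide
# abbreviates the real ones.
VOID_SUBSET = "http://rdfs.org/ns/void#subset"
ROOT = "http://environment.data.gov.uk/data/bathing-water-quality"
IN_SEASON = "urn:example:in-season"            # placeholder
COMPLIANCE = "urn:example:compliance"          # placeholder
IN_SEASON_2011 = "urn:example:in-season-2011"  # hypothetical nested subset

TRIPLES = [
    (ROOT, VOID_SUBSET, IN_SEASON),
    (ROOT, VOID_SUBSET, COMPLIANCE),
    (IN_SEASON, VOID_SUBSET, IN_SEASON_2011),
]


def all_subsets(root, triples):
    """Transitive closure of void:subset starting from `root` (cycle-safe)."""
    found, stack = set(), [root]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == VOID_SUBSET and o not in found:
                found.add(o)
                stack.append(o)
    return found


print(sorted(all_subsets(ROOT, TRIPLES)))
```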

Page 20

Data cube vocabulary
• collaborative development, sponsored by data.gov.uk
• simple, flexible vocabulary
• mirrors core information models from SDMX (Statistical Data and Metadata eXchange) and DDI (Data Documentation Initiative)
• extension to the SCOVO vocabulary

image: dullhunk @ flickr

Page 21

Data cube model
A set of observations, indexed by dimensions, describing measures, interpreted according to attributes.

[Diagram: a cube with dimensions (e.g. time, region); each cell holds the measure(s), e.g. population = 32,567, and the attributes that interpret them, e.g. unit of measure = count, status = preliminary ...]
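The cube model can be sketched as a mapping from a tuple of dimension values to one observation's measures and attributes. The dimension names (time, region) and the population, unit and status values follow the slide. The particular time and region values are hypothetical, and refusing a second observation for the same cell is an assumed design choice.

```python
# A data cube as a mapping from dimension values to the measure and attribute
# values of one observation. Dimension names follow the slide (time, region);
# the particular time and region values are hypothetical.
DIMENSIONS = ("time", "region")


def add_observation(cube, dims, measures, attributes):
    """Index an observation by its dimension values; each cell may be filled once."""
    key = tuple(dims[d] for d in DIMENSIONS)  # KeyError if a dimension is missing
    if key in cube:
        raise ValueError(f"duplicate observation at {key}")
    cube[key] = {"measures": measures, "attributes": attributes}


cube = {}
add_observation(
    cube,
    {"time": "2010", "region": "region-A"},                 # hypothetical values
    {"population": 32567},                                  # measure, as on the slide
    {"unit of measure": "count", "status": "preliminary"},  # attributes interpret it
)
print(cube[("2010", "region-A")]["measures"]["population"])
```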

Page 22

Data cube vocabulary
1. Top level DataSet: provenance and metadata, structure

[Diagram]
• qb:structure: qb:DataSet → qb:DataStructureDefinition
• qb:component: qb:DataStructureDefinition → components
• qb:sliceKey: qb:DataStructureDefinition → qb:SliceKey
• qb:slice: qb:DataSet → qb:Slice
• qb:sliceStructure: qb:Slice → qb:SliceKey
• qb:subSlice: qb:Slice → qb:Slice
• qb:observation: qb:Slice → qb:Observation (dimension values, measure value(s), attribute values)
• qb:dataSet: qb:Observation → qb:DataSet

Page 23

Data cube vocabulary
1. Top level DataSet: provenance and metadata, structure
Observation: measured values, at dimensions, with attributes; direct link to DataSet

[Diagram as on page 22]

Page 24

Data cube vocabulary
1. Top level DataSet: provenance and metadata, structure
Observation: measured values, at dimensions, with attributes; direct link to DataSet
Slice: optional grouping by fixing dimensions; guide to presentation; allows for abbreviated data

[Diagram as on page 22]
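A slice fixes some dimensions (its slice key) and groups the observations that share those values. Inside a slice the fixed dimensions are stated once rather than repeated on every observation. The sketch below groups plain dictionaries this way. The observation values are made up for illustration.

```python
from collections import defaultdict

# Slices fix some dimensions (the slice key) and group the observations that
# share those values. Within a slice the fixed dimensions need not be repeated,
# which is the "abbreviated data" the slide mentions. Values are made up.
OBSERVATIONS = [
    {"time": "2009", "region": "A", "population": 100},
    {"time": "2010", "region": "A", "population": 110},
    {"time": "2010", "region": "B", "population": 90},
]


def make_slices(observations, slice_key):
    """Group observations by the values of the dimensions named in `slice_key`."""
    slices = defaultdict(list)
    for obs in observations:
        fixed = tuple((dim, obs[dim]) for dim in slice_key)
        # Drop the fixed dimensions: they are stated once on the slice instead.
        slices[fixed].append({k: v for k, v in obs.items() if k not in slice_key})
    return dict(slices)


# One slice per region: each is a time series for that region.
for fixed, members in make_slices(OBSERVATIONS, ("region",)).items():
    print(fixed, members)
```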

Page 25

Data cube vocabulary
2. Data Structure Definition: explicit definition of cube structure, inline in the data
Enables: validation, visualization, discovery, abbreviation

[Diagram]
• qb:structure: qb:DataSet → qb:DataStructureDefinition
• qb:component: qb:DataStructureDefinition → qb:ComponentSpecification
• each qb:ComponentSpecification names a qb:dimension, qb:measure or qb:attribute, with qb:componentRequired, qb:componentAttachment and qb:order
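One use of the structure definition is abbreviation: the DSD records the level at which each attribute is attached, in the spirit of qb:componentAttachment. A value attached to the DataSet can then be written once and copied down onto each observation when the data is read back. A minimal sketch, with hypothetical component names and made-up values. Letting observation-level values win over inherited ones is an assumed policy.

```python
# Abbreviation in practice: the DSD says at which level each attribute is
# attached. Values attached to the DataSet are written once and can be copied
# down onto each observation when the data is read back ("normalized").
# Component names and values here are hypothetical.
DSD = {
    "dimensions": ["time", "region"],
    "measures": ["population"],
    # attribute -> attachment level (cf. qb:componentAttachment)
    "attributes": {"unit of measure": "DataSet", "status": "Observation"},
}

DATASET_VALUES = {"unit of measure": "count"}  # stated once, on the DataSet
ABBREVIATED = [
    {"time": "2010", "region": "A", "population": 110, "status": "preliminary"},
    {"time": "2010", "region": "B", "population": 90, "status": "final"},
]


def normalize(dsd, dataset_values, observations):
    """Return full observations with DataSet-level attribute values filled in."""
    inherited = {
        attr: dataset_values[attr]
        for attr, level in dsd["attributes"].items()
        if level == "DataSet" and attr in dataset_values
    }
    # Values given on the observation itself win over inherited ones.
    return [{**inherited, **obs} for obs in observations]


for obs in normalize(DSD, DATASET_VALUES, ABBREVIATED):
    print(obs)
```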

Page 26

Bathing Water Quality cubes
• measures: total coliform count, entero virus count, ..., sample classification
• dimensions: sampling point, sampling week, sampling year
• attributes: abnormal weather
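Another use of the structure definition is validation. The sketch below checks observations against a DSD built only from the components named on this slide. The elided "..." measures are left out, and the camelCase identifiers are hypothetical. The three rules are an assumed, simplified reading of what DSD-based validation can check: every dimension is present, at least one measure is present, and nothing undeclared appears.

```python
# Checking observations against a data structure definition built from the
# components named on the slide. Identifiers are hypothetical; the rules are a
# simplified, assumed reading of what DSD-based validation can check.
BWQ_DSD = {
    "dimensions": {"samplingPoint", "samplingWeek", "samplingYear"},
    "measures": {"totalColiformCount", "enteroVirusCount", "sampleClassification"},
    "attributes": {"abnormalWeather"},  # optional
}


def validate(obs, dsd=BWQ_DSD):
    """Return a list of problems; an empty list means the observation conforms."""
    present = set(obs)
    problems = []
    missing = dsd["dimensions"] - present
    if missing:
        problems.append("missing dimensions: " + ", ".join(sorted(missing)))
    if not dsd["measures"] & present:
        problems.append("no measure value")
    declared = dsd["dimensions"] | dsd["measures"] | dsd["attributes"]
    undeclared = present - declared
    if undeclared:
        problems.append("undeclared components: " + ", ".join(sorted(undeclared)))
    return problems


ok = {"samplingPoint": "sp-001", "samplingWeek": 20, "samplingYear": 2011,
      "sampleClassification": "hypothetical-class"}
bad = {"samplingPoint": "sp-001", "temperature": 17}
print(validate(ok))
print(validate(bad))
```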

Page 27

Everything has a URI
• selected lists and individual bathing waters
• lists and individual assessments (in-season or annual compliance)
• vocabulary terms
• datasets (and subsets)

Presented as:
• HTML (for people)
• JSON, XML, RDF and CSV (for programs)
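Serving several presentations of the same URI needs a rule for picking one. The sketch below assumes a file-style suffix convention, which takes precedence, then a very naive reading of the HTTP Accept header, with HTML as the default for people. The suffixes and example paths are hypothetical; the media types are the standard ones. A real implementation would honour q-values, which this sketch ignores.

```python
# Choosing a representation for a resource: an explicit file-style suffix wins,
# otherwise the HTTP Accept header is consulted, and HTML (for people) is the
# default. The suffix convention is an assumption; the media types are standard.
FORMATS = {
    "html": "text/html",
    "json": "application/json",
    "xml": "application/xml",
    "rdf": "application/rdf+xml",
    "csv": "text/csv",
}


def choose_format(path, accept=""):
    """Return the name of the representation to serve for `path`."""
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        suffix = last_segment.rsplit(".", 1)[1]
        if suffix in FORMATS:
            return suffix
    # Naive Accept handling: ignores q-values and takes the first supported type.
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        for name, mime in FORMATS.items():
            if media_type == mime:
                return name
    return "html"


print(choose_format("/doc/bathing-water.csv"))
print(choose_format("/doc/bathing-water", "application/rdf+xml, text/html;q=0.5"))
print(choose_format("/so/ef/SamplingPoint/bwsp.eaew"))  # '.eaew' is not a format
```

Because only known suffixes count, identifiers that happen to contain a dot still fall through to the Accept header or the HTML default.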

Page 28

Data Platform and Applications

Web of Linked Data

http://environment.data.gov.uk/lab/bwq-os.html

Page 29

Outcomes
• bathing water quality information available as both a data set and a set of web APIs
• updated weekly (in season)
• enables third party applications to use and combine the data
• seeds a web of environmental and location data
• reference identifiers can be reused for related information
• URI patterns designed to be compatible with INSPIRE

Page 30

Wrapping up

image: erika g. @ flickr.com

Page 31

Lessons
• importance of reference identifiers
• developer accessibility: linked data API; publish once, consume many ways
• importance of maintenance and QoS expectations
• reusable patterns: reusable vocabularies (Data Cube, org ...), URI patterns, provenance (OPMV and specializations)
• incremental approach

Page 32

Acknowledgements
• Alex Coley (Environment Agency), for slides 17 and 18, and for sponsoring the bathing water quality data publication
• Stuart Williams, developer of the bathing water application and of slides 19, 27 and 28
• John Sheridan (The National Archives), for sponsoring the development of the data cube vocabulary
• Richard Cyganiak and Jeni Tennison, co-developers of the data cube vocabulary

Page 33

fin.

image: Christian Haugen @ flickr.com

Page 34

Spare

Page 35

Linked data principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things

A pattern for applying the semantic web stack

Page 36

Linked open data cloud: 2007

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 37

Linked open data cloud: 2009

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 38

Linked open data cloud: 2010

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 39

Accessing all this data
• link following: HTTP GET, follow links, aggregate relevant statements
• query: SPARQL
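Link following can be sketched as a breadth-first crawl. It dereferences a URI, keeps the statements it returns, and queues every object that is itself a URI, up to a depth limit. `WEB` and `http_get` stand in for real HTTP requests. Their contents, including the sameAs link from the district to the Ordnance Survey URI, are illustrative assumptions.

```python
from collections import deque

# Link following in miniature: dereference a URI, keep its statements, then
# follow any object that is itself a URI, breadth first, up to a depth limit.
# WEB stands in for HTTP GET; its contents are illustrative assumptions.
SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
OS_CARDIFF = "http://data.ordnancesurvey.co.uk/id/7000000000025484"

WEB = {
    SCHOOL: {(SCHOOL, "rdfs:label", "Cardiff High School"),
             (SCHOOL, "school:district", DISTRICT)},
    DISTRICT: {(DISTRICT, "rdfs:label", "Cardiff"),
               (DISTRICT, "owl:sameAs", OS_CARDIFF)},
}


def http_get(uri):
    """Stand-in for an HTTP GET returning RDF; unknown URIs yield no statements."""
    return WEB.get(uri, set())


def crawl(start, get=http_get, max_depth=2):
    """Aggregate statements reachable from `start` by following links."""
    seen, graph = {start}, set()
    queue = deque([(start, 0)])
    while queue:
        uri, depth = queue.popleft()
        statements = get(uri)
        graph |= statements
        if depth == max_depth:
            continue  # fetched, but do not follow its links any further
        for _, _, obj in statements:
            if obj.startswith("http") and obj not in seen:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return graph


print(len(crawl(SCHOOL, max_depth=0)), len(crawl(SCHOOL, max_depth=1)))
```

The `seen` set prevents refetching, so cycles of links (common once sameAs links point both ways) do not loop forever.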

Page 40

SPARQL
• core idea is pattern matching: graph patterns with variables; any subgraph which matches yields a row of bindings
• syntax based on the Turtle syntax for RDF
• web API endpoints
• lots of power: filters, optionals, named graphs, sub-queries, property paths, aggregation, federated query, update, construct

[Diagram: the graph pattern ?school ont:district [ rdfs:label "Cardiff" ]]
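The pattern-matching idea fits in a few lines. Each triple pattern may contain variables (written `?name`), and the patterns are joined left to right. Every set of bindings that survives all of them is one result row. This is a toy evaluator for basic graph patterns only, not SPARQL itself. The slide's blank node `[ ]` is treated as the variable `?d`, prefixed names are kept as plain strings, and the second school is hypothetical.

```python
# Basic graph pattern matching, the core idea behind SPARQL: each triple pattern
# may contain variables (?name); a solution is a set of variable bindings that
# makes every pattern match some triple. The blank node [ ] on the slide is
# treated here as just another variable, ?d.
GRAPH = {
    ("school:401874", "ont:district", "district:00PT"),
    ("school:401874", "rdfs:label", "Cardiff High School"),
    ("district:00PT", "rdfs:label", "Cardiff"),
    ("school:999999", "ont:district", "district:XXXX"),  # hypothetical
    ("district:XXXX", "rdfs:label", "Elsewhere"),        # hypothetical
}


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Extend `bindings` so `pattern` matches `triple`, or return None."""
    result = dict(bindings)
    for p_term, value in zip(pattern, triple):
        if is_var(p_term):
            if result.setdefault(p_term, value) != value:
                return None  # variable already bound to something else
        elif p_term != value:
            return None
    return result


def query(patterns, graph):
    """Join the patterns left to right; each surviving binding is one result row."""
    rows = [{}]
    for pattern in patterns:
        rows = [new for row in rows for triple in graph
                if (new := match(pattern, triple, row)) is not None]
    return rows


# ?school ont:district [ rdfs:label "Cardiff" ]
print(query([("?school", "ont:district", "?d"), ("?d", "rdfs:label", "Cardiff")], GRAPH))
```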

Page 41

Accessing all this data
• link following: HTTP GET, follow links, aggregate relevant statements
• query: SPARQL
• linked data API: RESTful API onto linked data resources; simple query, usable without an RDF stack, web-developer friendly; easy to layer visualizations and UIs on top
• third parties: search engines and aggregators, e.g. Sindice, sameAs.org
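The "simple query, usable without an RDF stack" point can be illustrated with a list endpoint over JSON-like views of resources. Ordinary query parameters filter on (possibly dotted) property paths. Parameters starting with `_` are treated as reserved options rather than filters. These parameter conventions and the `_about` key are assumptions made for this sketch, not a statement of any particular API's rules, and the second resource is hypothetical.

```python
from urllib.parse import parse_qsl

# A linked-data-API-style list endpoint over JSON-like views of resources.
# Parameter conventions and the "_about" key are assumptions for illustration;
# the second resource is hypothetical.
RESOURCES = [
    {"_about": "http://education.data.gov.uk/id/school/401874",
     "label": "Cardiff High School",
     "district": {"label": "Cardiff"}},
    {"_about": "http://example.org/school/hypothetical",
     "label": "Another School",
     "district": {"label": "Elsewhere"}},
]


def lookup(resource, path):
    """Follow a dotted property path such as 'district.label'."""
    value = resource
    for step in path.split("."):
        if not isinstance(value, dict) or step not in value:
            return None
        value = value[step]
    return value


def list_endpoint(query_string, resources=RESOURCES):
    """Return the URIs of resources matching every non-reserved parameter."""
    filters = [(k, v) for k, v in parse_qsl(query_string) if not k.startswith("_")]
    return [r["_about"] for r in resources
            if all(lookup(r, path) == value for path, value in filters)]


print(list_endpoint("district.label=Cardiff&_page=0"))
```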

Page 42

Semantic web layer cake

Page 43

Data.gov.uk: visualizations on top of linked data

Page 44

Data.gov.uk – linked datasets and APIs