Using linked data for dataset publication
Semantic web and linked data for data set publication
Dave Reynolds, Epimorphics Ltd, @der42
Outline
• Background on linked data
• Roles in data set publishing
• Case study: Environment Agency
• Lessons
Linked data background
Linked data: publishing data on the web to enable integration, linking and reuse across silos
Linked data: apply the principles of the web to the publication of data.
The linked data web:
• is a global network of things, each identified by a URI
• fetching a URI gives a set of statements
• things are connected by typed links
• is open: anyone can say anything about anything else
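The properties listed above can be shown with a minimal sketch. It assumes statements are held in memory as (subject, predicate, object) tuples and that "fetching a URI" is simulated by a local lookup instead of an HTTP GET. The district predicate's namespace and the helper names fetch and links are hypothetical.

```python
# Minimal sketch: a tiny linked data "web" held in memory as triples.
# Dereferencing is simulated by a lookup; a real client would issue an HTTP GET.
from typing import NamedTuple


class Triple(NamedTuple):
    s: str  # subject URI
    p: str  # predicate URI (the "typed link")
    o: str  # object: another URI or a literal value


SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DISTRICT_P = "http://education.data.gov.uk/def/school/district"  # ASSUMED namespace

WEB = [
    Triple(SCHOOL, RDFS_LABEL, "Cardiff High School"),
    Triple(SCHOOL, DISTRICT_P, DISTRICT),
    Triple(DISTRICT, RDFS_LABEL, "Cardiff"),
]


def fetch(uri: str) -> list:
    """Simulated dereference: every statement whose subject is `uri`."""
    return [t for t in WEB if t.s == uri]


def links(uri: str) -> list:
    """Typed links out of `uri`: (predicate, target) pairs whose target can itself be fetched."""
    return [(t.p, t.o) for t in fetch(uri) if fetch(t.o)]
```

Here fetch(SCHOOL) returns both statements about the school. links(SCHOOL) keeps only the district link, because a literal such as "Cardiff High School" has nothing further to fetch.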
Linked data is “data you can click on”, expressed in RDF.
Example schools information (RDF graph, built up over several slides)
http://education.data.gov.uk/id/school/401874
• rdf:type school:School
• rdfs:label “Cardiff High School”
• school:phase school:PhaseOfEducation_Secondary (rdfs:label “Secondary”)
• school:district http://statistics.data.gov.uk/id/local-authority-district/00PT (rdfs:label “Cardiff”)
The district is linked by owl:sameAs to the Ordnance Survey resource http://data.ordnancesurvey.co.uk/id/7000000000025484, which has admingeo:ward, admingeo:parish and spatial:extent links (GML: 310499.4 184176.6 310476.5 ...).
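The school example can be written out as plain RDF statements. The sketch below serialises the school, phase and district triples as N-Triples, one "subject predicate object ." statement per line. It assumes the school: prefix expands to http://education.data.gov.uk/def/school/, which the slides do not state. The Ordnance Survey triples are left out because the GML value is truncated.

```python
# Sketch: serialise the school example graph as N-Triples.
# ASSUMPTION: school: expands to http://education.data.gov.uk/def/school/ (not given in the slides).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SCHOOL_NS = "http://education.data.gov.uk/def/school/"  # assumed namespace

SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
PHASE = SCHOOL_NS + "PhaseOfEducation_Secondary"

# (subject, predicate, object, object_is_literal)
GRAPH = [
    (SCHOOL, RDF + "type", SCHOOL_NS + "School", False),
    (SCHOOL, RDFS + "label", "Cardiff High School", True),
    (SCHOOL, SCHOOL_NS + "phase", PHASE, False),
    (SCHOOL, SCHOOL_NS + "district", DISTRICT, False),
    (PHASE, RDFS + "label", "Secondary", True),
    (DISTRICT, RDFS + "label", "Cardiff", True),
]


def term(value: str, literal: bool) -> str:
    """Render one N-Triples object term: a URI in angle brackets or an escaped, quoted literal."""
    if not literal:
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(graph) -> str:
    """One 'subject predicate object .' line per statement."""
    return "\n".join(f"<{s}> <{p}> {term(o, lit)} ." for s, p, o, lit in graph)


print(to_ntriples(GRAPH))
```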
Role in data set publication
Linked data is well suited to describing things: schools, companies, animal species, music tracks, TV programmes ...
What about datasets? Environmental measurements, experimental results, statistical analyses ...
Approach 1: data catalogues
• treat the dataset as a single resource, identified by a URI
• provide metadata as linked data: descriptive, categorical, technical and structural
Benefits?
• separation of metadata from resource and repository
• easy aggregation of metadata into catalogues
• schema-less, which enables use-specific annotations and links
• use of shareable category schemes and reference data
=> support for discovery
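This sketch illustrates the aggregation and discovery benefits of approach 1. It merges metadata triples from two publishers into one catalogue and then finds datasets by a shared category URI. It assumes Dublin Core terms (dct:title, dct:subject), which the slides do not name. The second dataset, the category URI and the local annotation property are hypothetical.

```python
# Sketch: aggregating dataset metadata from several publishers into one catalogue.
# ASSUMPTIONS: Dublin Core terms as the metadata vocabulary; example.org URIs are hypothetical.
from collections import defaultdict

DCT = "http://purl.org/dc/terms/"
WATER = "http://example.org/topic/water"  # hypothetical shared category URI
BWQ = "http://environment.data.gov.uk/data/bathing-water-quality"
RIVERS = "http://example.org/dataset/river-levels"  # hypothetical dataset

publisher_a = [(BWQ, DCT + "title", "Bathing water quality"), (BWQ, DCT + "subject", WATER)]
publisher_b = [
    (RIVERS, DCT + "title", "River levels"),
    (RIVERS, DCT + "subject", WATER),
    (RIVERS, "http://example.org/def/localNote", "added by publisher B"),  # use-specific annotation
]


def aggregate(*sources):
    """Merge metadata triples into dataset -> property -> set of values.

    No schema is fixed in advance, so extra, use-specific properties simply accumulate.
    """
    catalogue = defaultdict(lambda: defaultdict(set))
    for source in sources:
        for dataset, prop, value in source:
            catalogue[dataset][prop].add(value)
    return catalogue


def find_by_subject(catalogue, subject):
    """Discovery: sorted URIs of the datasets tagged with a shared category URI."""
    return sorted(d for d, props in catalogue.items() if subject in props.get(DCT + "subject", set()))
```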
Approach 2: fine-grained publication
• publish the data set itself as linked data
• entities, terms and individual records in the data identified by URIs
• data set structure and ontologies linked from the data
• still include dataset metadata
Benefits?
• all the benefits of approach 1 for supporting discovery
• self-describing data
• slices are addressable (trace back, provenance, annotation)
• integration across sets: reuse of terms for dimensions, units and values
• fine-grained access
=> integration, comparison, context, data as a service
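The integration benefit can be seen in a small case. When two datasets reuse the same URIs for a dimension, their records line up by simple key equality. Both datasets below, with their URIs and values, are hypothetical.

```python
# Sketch: integration across datasets that reuse the same identifiers.
# ASSUMPTION: all URIs and values are hypothetical illustrations.
SP = "http://example.org/id/sampling-point/"  # hypothetical shared URI set

quality = {SP + "a": {"result": "pass"}, SP + "b": {"result": "fail"}}
locations = {SP + "a": {"district": "Cardiff"}, SP + "c": {"district": "Pembrokeshire"}}


def join_on_uri(left: dict, right: dict) -> dict:
    """Merge the records of identifiers that appear in both datasets.

    Because both sides use the same URI, no lookup table or fuzzy name
    matching is needed to line the records up.
    """
    return {uri: {**left[uri], **right[uri]} for uri in left.keys() & right.keys()}
```

Only sampling point "a" appears in both datasets, so it is the only record in the joined result.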
bathing water quality
what we do...
Timeline (diagram): press interest; start of season; 20-22 samples in 22 weeks; annual report, November; bathing season from 15th May to 30th Sept; December.
what information is relevant to the public about beaches
what we do
how linkable data helps
Photo by Skellig2008 (flickr)
Tenby Tourist Information Centre, Unit 2, The Gateway Complex, Tenby, Wales, SA70 7LT. Tel: 01834 842 402, Fax: 01834 845 439
Diagram: URI sets, reference data and observation datasets
• URI sets: BathingWaters, SamplingPoints, Zones Of Influence, Assessments, Vocabularies (e.g. http://location.data.gov.uk/def/ef/SampingPoint)
• Reference data: individual BathingWaters, SamplingPoints (e.g. http://location.data.gov.uk/so/ef/SamplingPoint/bwsp.eaew), Zone Of Influence, Assessment
• Observation datasets: http://environment.data.gov.uk/data/bathing-water-quality, with void:subset links to .../in-season (In-season Weekly Assessment) and .../compliance (Annual Compliance)
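The dataset hierarchy is expressed with void:subset. The sketch below walks such a hierarchy. It uses hypothetical example.org URIs in place of the real subset URIs, which the slide abbreviates with "...".

```python
# Sketch: listing every (sub)dataset reachable through void:subset.
# ASSUMPTION: example.org URIs are hypothetical stand-ins for the abbreviated real ones.
VOID_SUBSET = "http://rdfs.org/ns/void#subset"

ROOT = "http://example.org/data/bathing-water-quality"
TRIPLES = [
    (ROOT, VOID_SUBSET, ROOT + "/in-season"),
    (ROOT, VOID_SUBSET, ROOT + "/compliance"),
]


def all_subsets(dataset: str, triples) -> list:
    """Depth-first collection of transitive void:subset children, safe against cycles."""
    found, stack, seen = [], [dataset], {dataset}
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == VOID_SUBSET and o not in seen:
                seen.add(o)
                found.append(o)
                stack.append(o)  # subsets may have their own subsets
    return found
```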
Publishing the Bathing Water Quality data set
Data cube vocabulary
• collaborative development, sponsored by data.gov.uk
• simple, flexible vocabulary
• mirrors core information models from SDMX (Statistical Data and Metadata eXchange) and DDI (Data Documentation Initiative)
• extension to the SCOVO vocabulary
image: dullhunk @ flickr
Data cube model: a set of observations, indexed by dimensions, describing measures, interpreted according to attributes.
Diagram: a cube whose axes are dimensions (e.g. time, region); each cell holds an observation with measure(s) (e.g. population = 32,567) and attributes (e.g. unit of measure = count, status = preliminary ...).
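Here is a minimal Python model of the cube. It assumes observations are keyed by the tuple of their dimension values. The class names and the time and region values are hypothetical. Only the population figure and its attributes come from the slide.

```python
# Sketch of the data cube model: observations indexed by dimensions,
# carrying measures, interpreted by attributes. Names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Observation:
    dims: dict  # dimension -> value, e.g. {"time": ..., "region": ...}
    measures: dict  # measure -> value, e.g. {"population": 32567}
    attrs: dict = field(default_factory=dict)  # how to interpret the measures


class Cube:
    """Observations indexed by the tuple of their dimension values."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.cells = {}

    def key(self, dims):
        # Raises KeyError if a dimension value is missing: every cell needs a full coordinate.
        return tuple(dims[d] for d in self.dimensions)

    def add(self, obs):
        self.cells[self.key(obs.dims)] = obs

    def get(self, **dims):
        return self.cells[self.key(dims)]


cube = Cube(["time", "region"])
cube.add(Observation({"time": "2010", "region": "region-x"},  # hypothetical coordinates
                     {"population": 32567},
                     {"unit": "count", "status": "preliminary"}))
```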
Data cube vocabulary
1. Top level
• DataSet: provenance and metadata; structure
• Observation: measured values, at dimensions, with attributes; direct link to DataSet
• Slice: optional grouping by fixing dimensions; guide to presentation; allows for abbreviated data
Diagram: qb:DataSet links to its qb:DataStructureDefinition via qb:structure and to its slices via qb:slice; each qb:Slice links to its qb:SliceKey via qb:sliceStructure, to its observations via qb:observation and to nested slices via qb:subSlice; the qb:DataStructureDefinition lists its components (qb:component) and slice keys (qb:sliceKey); each qb:Observation links back to the qb:DataSet via qb:dataset and carries dimension values, measure value(s) and attribute values.
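Slices group observations by fixing some dimensions, which the slice key names. The sketch below assumes observations are plain dictionaries. The property names and values (a sampling point and week with a count) are hypothetical.

```python
# Sketch: forming slice-like groups by fixing some dimensions (the "slice key").
# ASSUMPTION: observations are plain dicts; names and values are hypothetical.
from collections import defaultdict


def slices(observations, fixed):
    """Group observations by the values of the `fixed` dimensions.

    Each group corresponds to one slice. Inside it only the free dimensions
    vary, so the fixed values need not be repeated (abbreviated data).
    """
    groups = defaultdict(list)
    for obs in observations:
        key = tuple(obs[d] for d in fixed)
        groups[key].append({k: v for k, v in obs.items() if k not in fixed})
    return dict(groups)


obs = [
    {"point": "sp1", "week": 1, "count": 10},
    {"point": "sp1", "week": 2, "count": 12},
    {"point": "sp2", "week": 1, "count": 3},
]
by_point = slices(obs, ["point"])  # one slice per sampling point, a series over weeks
```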
Data cube vocabulary
2. Data Structure Definition
• explicit definition of the cube structure, inline in the data
• enables validation, visualization, discovery and abbreviation
Diagram: qb:DataSet links via qb:structure to a qb:DataStructureDefinition, which links via qb:component to qb:ComponentSpecification nodes; each specification points to a property via qb:dimension, qb:measure or qb:attribute, and may carry qb:componentRequired, qb:componentAttachment and qb:order.
Bathing Water Quality cubes
• measures: total coliform count, entero virus count, ..., sample classification
• dimensions: sampling point, sampling week, sampling year
• attributes: abnormal weather
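A data structure definition makes cubes like these checkable. The sketch below writes the bathing water components as a simplified DSD and validates observations against it. Dimensions and measures are treated as mandatory. Attributes are optional unless flagged, which loosely mirrors qb:componentRequired. The property names and sample values are hypothetical, and qb:componentAttachment and qb:order are ignored.

```python
# Sketch: validating observations against a simplified data structure definition.
# ASSUMPTIONS: property names are hypothetical; attachment level and ordering are ignored.
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    prop: str
    kind: str  # "dimension", "measure" or "attribute"
    required: bool = False  # only consulted for attributes (cf. qb:componentRequired)


def validate(obs: dict, dsd: list) -> list:
    """Return a list of problems; an empty list means the observation fits the structure."""
    problems = [
        f"missing {c.kind} {c.prop}"
        for c in dsd
        if c.prop not in obs and (c.kind != "attribute" or c.required)
    ]
    known = {c.prop for c in dsd}
    problems += [f"unexpected property {p}" for p in obs if p not in known]
    return problems


# Hypothetical property names for the components listed on the slide.
BWQ_DSD = [
    Component("samplingPoint", "dimension"),
    Component("samplingWeek", "dimension"),
    Component("samplingYear", "dimension"),
    Component("totalColiformCount", "measure"),
    Component("abnormalWeather", "attribute"),  # optional here: an assumption
]
```

The check for unexpected properties is a design choice in this sketch. Open RDF data would more often ignore extra properties than reject them.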
Everything has a URI:
• selected lists and individual bathing waters
• lists and individual assessments (in-season or annual compliance)
• vocabulary terms
• datasets (and subsets)
Presented as HTML (for people) and as JSON, XML, RDF and CSV (for programs).
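Serving the same resource to people and to programs means choosing a representation for each request. The sketch below picks a format from a file-style suffix or, failing that, from the HTTP Accept header. The suffix convention and the media-type table are assumptions, not a description of how the Environment Agency service actually behaves. Accept q-values are ignored for brevity.

```python
# Sketch: choosing a representation for one resource.
# ASSUMPTION: the suffix convention and media-type table are hypothetical.
MEDIA_TYPES = {
    "html": "text/html",
    "json": "application/json",
    "xml": "application/xml",
    "rdf": "application/rdf+xml",
    "csv": "text/csv",
}


def choose_format(path: str, accept: str = "") -> str:
    last = path.rsplit("/", 1)[-1]
    if "." in last:
        suffix = last.rsplit(".", 1)[-1].lower()
        if suffix in MEDIA_TYPES:  # only known suffixes count; identifiers may contain dots
            return suffix
    # Otherwise take the first listed media type we support (q-values ignored).
    for part in accept.split(","):
        media = part.split(";")[0].strip()
        for fmt, media_type in MEDIA_TYPES.items():
            if media == media_type:
                return fmt
    return "html"  # default: a page for people
```

The suffix is checked against the known formats because identifiers such as bwsp.eaew already contain a dot.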
Diagram: data platform and applications, built on the web of linked data (http://environment.data.gov.uk/lab/bwq-os.html)
Outcomes
• bathing water quality information available as both a data set and a set of web APIs, updated weekly (in season)
• enables third-party applications to use and combine the data
• seeds a web of environmental and location data: reference identifiers can be reused for related information
• URI patterns designed to be compatible with INSPIRE
Wrapping up
image: erika g. @ flickr.com
Lessons
• importance of reference identifiers
• developer accessibility: linked data API; publish once, consume many ways
• importance of maintenance and QoS expectations
• reusable patterns: reusable vocabularies (Data Cube, org ...), URI patterns, provenance (OPMV and specializations)
• incremental approach
Acknowledgements
• Alex Coley (Environment Agency), for slides 17 and 18, and for sponsoring the bathing water quality data publication
• Stuart Williams, developer of the bathing water application and of slides 19, 27 and 28
• John Sheridan (The National Archives), for sponsoring the development of the data cube vocabulary
• Richard Cyganiak and Jeni Tennison, co-developers of the data cube vocabulary
fin.
image: Christian Haugen @ flickr.com
Spare
Linked data principles
• Use URIs as names for things
• Use HTTP URIs so that people can look up those names
• When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
• Include links to other URIs, so that they can discover more things
Pattern of application of semantic web stack
Linked open data cloud: 2007, 2009, 2010
Linking Open Data cloud diagrams, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Accessing all this data
• link following: HTTP GET, follow links, aggregate relevant statements
• query: SPARQL
Diagram: a graph pattern in which ?school is linked by ont:district to a blank node [ ] whose rdfs:label is “Cardiff” (the diagram also shows “Administrative”)
SPARQL
• core idea is pattern matching: graph patterns with variables; any subgraph which matches yields a row of bindings
• syntax based on the Turtle syntax for RDF
• web API endpoints
• lots of power: filters, optionals, named graphs, sub-queries, property chains, aggregation, federated query, update, construct
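The pattern in the diagram selects schools whose district is labelled “Cardiff”. It corresponds roughly to the SPARQL query SELECT ?school WHERE { ?school ont:district [ rdfs:label "Cardiff" ] }. A naive matcher illustrates the core idea. Terms beginning with ? are variables, the blank node becomes a variable, and each consistent match yields one row of bindings. This is a sketch, not a SPARQL engine: it has no filters, optionals and so on. The second school and its district are hypothetical.

```python
# Sketch of SPARQL's core idea: basic graph pattern matching with variables.
# Not a SPARQL engine; prefixed names are kept as plain strings.


def match(pattern, triples, binding=None):
    """Return every row of variable bindings under which all pattern triples occur in `triples`."""
    binding = binding or {}
    if not pattern:
        return [binding]
    first, rest = pattern[0], pattern[1:]
    rows = []
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable: bind it, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant that does not match this triple
        else:
            rows.extend(match(rest, triples, candidate))
    return rows


SCHOOL_ID = "http://education.data.gov.uk/id/school/"
DISTRICT_ID = "http://statistics.data.gov.uk/id/local-authority-district/"
DATA = [
    (SCHOOL_ID + "401874", "ont:district", DISTRICT_ID + "00PT"),
    (DISTRICT_ID + "00PT", "rdfs:label", "Cardiff"),
    (SCHOOL_ID + "999999", "ont:district", DISTRICT_ID + "XXXX"),  # hypothetical
    (DISTRICT_ID + "XXXX", "rdfs:label", "Elsewhere"),  # hypothetical
]

# ?school ont:district [ rdfs:label "Cardiff" ]; the blank node becomes ?d
PATTERN = [("?school", "ont:district", "?d"), ("?d", "rdfs:label", "Cardiff")]
rows = match(PATTERN, DATA)
```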
Accessing all this data (continued)
• linked data API: RESTful API onto linked data resources; simple query, usable without an RDF stack, web-dev friendly; easy to layer visualizations and UIs on top
• third parties: search engines and aggregators, e.g. Sindice, sameAs.org
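The linked data API idea is a RESTful view over linked data that works without an RDF stack. It can be sketched as a list endpoint that filters resources by query parameters and returns JSON. The endpoint path, parameter names and records are hypothetical, and this is not the Linked Data API specification itself.

```python
# Sketch: a RESTful list endpoint over resources, filtered by query parameters, returning JSON.
# ASSUMPTION: path, parameter names and records are hypothetical.
import json
from urllib.parse import parse_qsl, urlsplit

RESOURCES = [  # hypothetical records, already flattened from RDF into simple properties
    {"uri": "http://example.org/id/bathing-water/1", "label": "Beach One", "district": "Cardiff"},
    {"uri": "http://example.org/id/bathing-water/2", "label": "Beach Two", "district": "Pembrokeshire"},
]


def handle(url: str) -> str:
    """Return JSON listing the resources whose properties equal every query parameter."""
    filters = dict(parse_qsl(urlsplit(url).query))
    items = [r for r in RESOURCES if all(r.get(k) == v for k, v in filters.items())]
    return json.dumps({"items": items})
```

A web developer can call handle("/doc/bathing-water?district=Cardiff") and get back ordinary JSON, with no RDF tooling involved.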
Semantic web layer cake
Data.gov.uk: visualizations on top of linked data
Data.gov.uk – linked datasets and APIs