Using the Data Cube vocabulary for Publishing Environmental Linked Data on...

Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au Canberra Semantic Web meetup CSIRO COMPUTATIONAL INFORMATICS Laurent Lefort, Armin Haller

description

Canberra Semantic Web Meetup. Initiatives have been launched to develop semantic vocabularies representing statistical classifications and discovery metadata. Statistical organizations are also creating tools to support the publication of dimensional data conforming to the Data Cube specification, now in Last Call at W3C. The meeting will be an opportunity to hear about two Semantic Web and Linked Data initiatives for statistical data that are driven by the Australian Government. The Bureau of Meteorology and CSIRO have recently released a Linked Data version of the ACORN-SAT historical climate data at http://lab.environment.data.gov.au, and the ABS has released the Census data modelled in the Data Cube vocabulary. The Census release is part of a challenge the ABS is organising in the context of the SemStats Workshop (http://www.datalift.org/en/event/semstats2013/challenge) at the International Semantic Web Conference (ISWC) in Sydney (http://iswc2013.semanticweb.org). Come along to hear about these two projects, the challenges encountered and the solutions developed.

Transcript of Using the Data Cube vocabulary for Publishing Environmental Linked Data on...

Page 1: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Canberra Semantic Web meetup

CSIRO COMPUTATIONAL INFORMATICS

Laurent Lefort, Armin Haller

Page 2: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au | Laurent Lefort

Outline

• ACORN-SAT Dataset
• Building the Data Cube
• Enriching ACORN-SAT Linked Data with Metadata
• Published ACORN-SAT Linked Data


Page 3: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

The ACORN-SAT dataset

• Released by the Australian Bureau of Meteorology (23 March 2012)
• Available at http://www.bom.gov.au/climate/change/acorn-sat/
• 112 stations in total - 60 from 1910 to 2011
• Homogenised (adjusted) daily temperatures
• Tabular format (1 file per time series/station)


Page 4: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

“Catalogue websites do not unlock the full potential of the collected data and metadata”


Richard Cyganiak,

Page 5: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Limitations of ACORN-SAT in Tabular files

• Metadata fields are not documented
• Querying across the catalog is difficult
• Exploring the catalog through different facets (geographical/statistical/tabular) is not possible
• Bulk processing of the dataset or parts of it is not possible
• Social annotations are not possible
• Integrating the dataset within other datasets is difficult


Page 6: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

ACORN-SAT as Linked Data

Linked Data is a shift from publishing data in human readable HTML documents to machine readable documents.

Linked Data Principles:

1. Use URIs as identifiers for Things
   → http://sws.geonames.org/2172517
2. Make them actionable
   → http://www.geonames.org/2172517/canberra.html
3. Return information following standards
   → http://sws.geonames.org/2172517/about.rdf
4. Link to other information objects
   → <rdfs:seeAlso rdf:resource="http://dbpedia.org/resource/Canberra"/>


Page 7: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

ACORN-SAT as Linked Data

RDF Data Cube: a method to organise linked data in slices
• A vocabulary published by the W3C Government Linked Data (GLD) Working Group (Working Draft)
• Also the method used to publish statistics data and environmental data in Europe, e.g. for Bathing Water Quality in the UK http://www.epimorphics.com/web/projects/bathing-water-quality

Advantages
• Allows multiple views on the same data (similar to OLAP)
• Generic approach which supports the links to domain-specific definitions

Useable:
• In any browser via Linked Data API (HTML output)
• In JavaScript via Linked Data API (JSON output)
• In R via SPARQL


Page 8: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

RDF Data Cube 101 - Slices and observations

[Diagram: a Cube with Dimensions d1 to d7, Measures m1, m2, … and Attributes a1, a2, …, showing a Slice of the cube and a single Observation]
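The cube, slice and observation idea can be sketched in a few lines of Python. This is only an illustration, not the ACORN-SAT implementation: the dimension names follow the station/year/month/day structure described later in the deck, and the temperature values are invented sample data.

```python
from typing import Dict, List

# Each observation fixes a value for every dimension (station, year,
# month, day) and carries one measure. The values are invented.
Observation = Dict[str, object]

CUBE: List[Observation] = [
    {"station": "086071", "year": 1975, "month": 7, "day": 1, "maxTemperature": 14.2},
    {"station": "086071", "year": 1975, "month": 7, "day": 2, "maxTemperature": 15.0},
    {"station": "086071", "year": 1975, "month": 8, "day": 1, "maxTemperature": 16.1},
    {"station": "014015", "year": 1975, "month": 7, "day": 1, "maxTemperature": 30.4},
]


def take_slice(cube: List[Observation], **fixed: object) -> List[Observation]:
    """Return the observations whose dimensions match every fixed value."""
    return [
        obs for obs in cube
        if all(obs.get(dim) == value for dim, value in fixed.items())
    ]


# Fixing one dimension gives a coarse slice; fixing more narrows it.
station_slice = take_slice(CUBE, station="086071")
month_slice = take_slice(CUBE, station="086071", year=1975, month=7)
print(len(station_slice), len(month_slice))
```

A slice is therefore nothing more than the subset of observations that share the values of the fixed dimensions; the remaining dimensions stay free.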


Page 9: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

RDF Data Cube 101 – Dataset, Slice, Observation

[Diagram "Cube and Slice": the classes qb:DataSet, qb:Slice and qb:Observation (the cube observation), connected by the properties qb:slice, qb:subSlice, qb:observation, qb:dataSet and void:subset]

Page 10: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

RDF Data Cube 101 – Data Structure Definitions (DSDs)


RDF Data Cube model compatible with SDMX
http://sdmx.org/wp-content/uploads/2012/11/SDMX-Guidelines-for-the-Design-of-Data-Structure-Definitions.pdf

Page 11: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

5 basic steps

• 1. Define the prefixes to be used
• 2. Publish your schema
  • Define the dimension(s) – used to identify the observations (ex. time, region), what the observation applies to
  • Define the measure(s) – the phenomenon being observed
  • Define the attribute(s) – unit of measure
  • Define the DSD (attach components)
• 3. Publish your data
  • Define the Dataset (attach DSD)
  • Define Observations – the actual data
• 4. Include Slices (views) on your data
  • Define SliceKey(s) – the fixed dimensions
  • Define the DSD (attach SliceKey(s))
  • Define the Dataset (attach Slices to be defined)
  • Define Slices and Observations
• 5. Select appropriate URIs
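The steps above can be sketched as a small Python script that prints a minimal cube description in Turtle. Only the `qb:` terms come from the Data Cube vocabulary; the `ex:` and `data:` namespaces, the property names, the unit string and the observation value are placeholders invented for this sketch, not the ACORN-SAT schema.

```python
"""Sketch: emit a minimal RDF Data Cube description as Turtle text."""

# Step 1: prefixes. ex: and data: are hypothetical namespaces.
PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"
    "@prefix data: <http://example.org/data/> .\n"
)


def declare(properties, qb_class):
    """Step 2: type each component property (dimension, measure, attribute)."""
    return "".join(f"{p} a qb:{qb_class} .\n" for p in properties)


def build_dsd(dimensions, measures, attributes, slice_keys=()):
    """Steps 2 and 4: attach components (and slice keys) to the DSD."""
    components = (
        [f"[ qb:dimension {d} ]" for d in dimensions]
        + [f"[ qb:measure {m} ]" for m in measures]
        + [f"[ qb:attribute {a} ]" for a in attributes]
    )
    lines = ["ex:dsd a qb:DataStructureDefinition"]
    lines.append("    qb:component " + ", ".join(components))
    if slice_keys:
        lines.append("    qb:sliceKey " + ", ".join(slice_keys))
    return " ;\n".join(lines) + " .\n"


def build_cube():
    dims = ["ex:station", "ex:date"]
    measures = ["ex:maxTemperature"]
    attrs = ["ex:unit"]
    return "".join([
        PREFIXES,
        declare(dims, "DimensionProperty"),
        declare(measures, "MeasureProperty"),
        declare(attrs, "AttributeProperty"),
        # Step 4: the slice key names the dimension that a slice fixes.
        "ex:byStation a qb:SliceKey ; qb:componentProperty ex:station .\n",
        build_dsd(dims, measures, attrs, slice_keys=["ex:byStation"]),
        # Step 3: the dataset points at its DSD (and its slices).
        "data:dataset a qb:DataSet ; qb:structure ex:dsd ; qb:slice data:slice1 .\n",
        # Step 3: one observation with invented values.
        'data:obs1 a qb:Observation ; qb:dataSet data:dataset ;\n'
        '    ex:station "086071" ; ex:date "1975-07-01"^^xsd:date ;\n'
        '    ex:maxTemperature 14.2 ; ex:unit "degree Celsius" .\n',
        # Step 4: a slice fixing the station and listing its observations.
        'data:slice1 a qb:Slice ; qb:sliceStructure ex:byStation ;\n'
        '    ex:station "086071" ; qb:observation data:obs1 .\n',
    ])


turtle = build_cube()
print(turtle)
```

Step 5 (selecting URIs) is deliberately left out: the example.org namespaces stand in for whatever URI scheme the publisher chooses.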


Page 12: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au


1. Prefixes

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX interval: <http://reference.data.gov.uk/def/intervals/>
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
PREFIX acorn-sat: <http://lab.environment.data.gov.au/def/acorn/sat/>
PREFIX acorn-series: <http://lab.environment.data.gov.au/def/acorn/time-series/>
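As an illustration of what these declarations do, the sketch below keeps the same prefixes in a Python dictionary and expands prefixed names into full URIs. The helper names `expand` and `sparql_header` are made up for this example.

```python
# Prefix table copied from the slide; the helpers below are illustrative.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "qb": "http://purl.org/linked-data/cube#",
    "interval": "http://reference.data.gov.uk/def/intervals/",
    "gn": "http://www.geonames.org/ontology#",
    "ssn": "http://purl.oclc.org/NET/ssnx/ssn#",
    "acorn-sat": "http://lab.environment.data.gov.au/def/acorn/sat/",
    "acorn-series": "http://lab.environment.data.gov.au/def/acorn/time-series/",
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'qb:Observation' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


def sparql_header() -> str:
    """Render the table as PREFIX lines for the top of a SPARQL query."""
    return "\n".join(f"PREFIX {p}: <{uri}>" for p, uri in PREFIXES.items())


print(expand("qb:Observation"))
print(expand("acorn-sat:maxTemperature"))
```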


Page 13: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au


2. Define the schema

[Figure: schema definition annotated with its components: four Dimensions, three Measures and five Attributes]

Page 14: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au


3. Define the Observations


Page 15: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

4. Define the slices

Slice hierarchy, from the top level down to the observations:
• (1) ACORN-SAT Series/System (station)
• (2) Year
• (3) Month
• Day
• Observation
  - MinTemperature
  - MaxTemperature
  - Rainfall
  - Booleans for missing data

Current Data Cube structure (and URI/API logic)
• Stations/time series
• Year
• Month
• All linking to observations
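The station, year and month nesting is mirrored in the slice URIs. The sketch below rebuilds them in Python; the station-level and month-level patterns match URIs that appear in the SPARQL example later in the deck, while the year-level pattern is inferred from those two and should be treated as an assumption.

```python
from typing import Optional

BASE = "http://lab.environment.data.gov.au/data/acorn/climate/slice"


def slice_uri(station: str, year: Optional[int] = None,
              month: Optional[int] = None) -> str:
    """Build the URI of a station, year or month slice."""
    if month is not None and year is None:
        raise ValueError("a month slice needs a year")
    if month is not None and not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    uri = f"{BASE}/station/{station}"
    if year is not None:
        uri += f"/year/{year}"          # inferred pattern (assumption)
    if month is not None:
        uri += f"/month/{month:02d}"    # months are zero-padded
    return uri


print(slice_uri("086071"))
print(slice_uri("086071", 1975, 7))
```

Because each level extends the URI of its parent, a client can walk up the hierarchy by trimming path segments.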


Page 16: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Define the DSD


Page 17: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

5. Select appropriate URIs


Page 18: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au


(extra) Statistics at slice level
To port to DDI-RDF Discovery

Page 19: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Station metadata

• Data describing the deployment history
  • Available in ACORN-SAT station catalogue (pdf)
  • Not available in tabular format distribution
  • ACORN-SAT composite stations – composed of one or several BoM stations
  • BoM (Bureau of Meteorology) stations – composed of one or several stations sharing the same codes
  • Textual description of significant events
• Data describing the detailed conditions of observations
  • Sensors
  • Deployment Intervals

… using Semantic Sensor Network (SSN) ontology
• SSN-XG report http://www.w3.org/2005/Incubator/ssn/XGR-ssn/
• SSN Ontology http://purl.oclc.org/NET/ssnx/ssn


Page 20: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

SSN: deployed systems and observations

[Diagram: SSN ontology modules (Skeleton, Device, Deployment, PlatformSite, System) with the classes ssn:System, ssn:Deployment, ssn:DeploymentRelatedProcess, ssn:Platform, ssn:Device, ssn:Sensor, ssn:SensingDevice, ssn:Property and ssn:Observation, connected by the properties onPlatform, hasSubsystem, hasDeployment, deploymentProcessPart, deployedSystem, deployedOnPlatform, attachedSystem, observes, inDeployment, observedBy and observedProperty]


Page 21: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Example (Darwin)
Time series – Weather stations – Sites – (Sensors)

• Darwin Post Office 014016 (1910-1942)
• Darwin Airport 014015 (1941-2007 & 2001-now): 2 sites – 1km apart – same code used
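The Darwin example can be written down as a small data structure to show why deployment intervals matter. This is a plain-Python sketch, not the SSN encoding used on the site; the class name `Deployment`, the labels "site 1" and "site 2" and the helper `deployments_in` are invented for illustration, while the station codes and years are those on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Deployment:
    site: str
    station_code: str
    start: int
    end: Optional[int]  # None means the deployment is still running

    def covers(self, year: int) -> bool:
        return self.start <= year and (self.end is None or year <= self.end)


# One composite ACORN-SAT series built from several deployments.
# The two airport sites share the same BoM station code.
DARWIN: List[Deployment] = [
    Deployment("Darwin Post Office", "014016", 1910, 1942),
    Deployment("Darwin Airport (site 1)", "014015", 1941, 2007),
    Deployment("Darwin Airport (site 2)", "014015", 2001, None),
]


def deployments_in(year: int) -> List[str]:
    """List the sites that were operating in a given year."""
    return [d.site for d in DARWIN if d.covers(year)]


print(deployments_in(1920))
print(deployments_in(1941))
print(deployments_in(2005))
```

The overlap years show that the station code alone does not identify where a reading was taken, which is the information the deployment metadata adds.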


Page 22: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Deployment phases in Darwin


Page 23: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Multiple Views on Data – Mashups

• Display the station locations and their average temperature readings on a map
  • http://lab.environment.data.gov.au/mashup/drilldown
• Select a date range for climate readings for a given location
  • http://lab.environment.data.gov.au/mashup


Page 24: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Multiple Views on Data – ELDA Linked Data API

• ssn:hasSubSystem
• ssn:hasDeployment
• ssn:deploymentProcessPart
• ssn:observedBy


Page 25: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Multiple Views on Data – SPARQL


Page 26: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Multiple Views on Data – SPARQL

PREFIX cube: <http://purl.org/linked-data/cube#>
PREFIX sat: <http://lab.environment.data.gov.au/def/acorn/sat/>

SELECT ?x (MAX(?max) AS ?MaxEver)
WHERE {
  <http://lab.environment.data.gov.au/data/acorn/climate/slice/station/086071> cube:subSlice ?y .
  ?y cube:subSlice ?x .
  ?x sat:month ?z .
  ?x cube:observation ?obs .
  ?obs sat:maxTemperature ?max .
  FILTER regex(?z, "07")
}
GROUP BY ?x
ORDER BY DESC(?MaxEver)
LIMIT 1

RESULT:
http://lab.environment.data.gov.au/data/acorn/climate/slice/station/086071/year/1975/month/07 23.3
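A query like this one can be generated for any station and month. The Python sketch below builds the query text and a GET URL following the SPARQL protocol's `query` parameter; it writes the aggregate in standard SPARQL 1.1 form, with an explicit GROUP BY. The endpoint address is an assumption made for this example and is not stated in the slides, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed endpoint location; check the site for the real address.
ENDPOINT = "http://lab.environment.data.gov.au/sparql"

STATION_SLICE = (
    "http://lab.environment.data.gov.au/data/acorn/climate/slice/station/{station}"
)

QUERY_TEMPLATE = """\
PREFIX cube: <http://purl.org/linked-data/cube#>
PREFIX sat: <http://lab.environment.data.gov.au/def/acorn/sat/>
SELECT ?x (MAX(?max) AS ?MaxEver)
WHERE {{
  <{slice}> cube:subSlice ?y .
  ?y cube:subSlice ?x .
  ?x sat:month ?z .
  ?x cube:observation ?obs .
  ?obs sat:maxTemperature ?max .
  FILTER regex(?z, "{month}")
}}
GROUP BY ?x
ORDER BY DESC(?MaxEver)
LIMIT 1
"""


def build_query(station: str, month: int) -> str:
    """Query for the month slice holding the highest maximum temperature."""
    if not station.isdigit():
        raise ValueError("station must be a numeric code such as '086071'")
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return QUERY_TEMPLATE.format(
        slice=STATION_SLICE.format(station=station),
        month=f"{month:02d}",
    )


def request_url(station: str, month: int) -> str:
    """URL for a SPARQL protocol GET request carrying the query."""
    return ENDPOINT + "?" + urlencode({"query": build_query(station, month)})


print(build_query("086071", 7))
```

Validating the station code before formatting keeps arbitrary text out of the query string.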


Page 27: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au


Page 28: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Wrap up

• Experimental version of ACORN-SAT data
• Available at http://lab.environment.data.gov.au/
• Developed for the Australian Bureau of Meteorology (BOM) by CSIRO in cooperation with the Australian Government Information Management Office (AGIMO)
• Temperature (homogenised) plus Rainfall (not homogenised)
• First version presented at Australian GovHack Day
• Alternative to tabular data
• Last version, uploaded to LOD cloud
• http://thedatahub.org/dataset/acorn-sat
• Linked data (and well managed URIs) to build the bridges between the different agencies
• Current linked data pilot is one agency (BoM) and one server but applies solutions and schemes already in place in multi-agencies and multi-service providers context (e.g. UK)
• Thanks to AGIMO for helping us to set up http://lab.environment.data.gov.au/


Page 29: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Use It! http://michaelhalls.net/planforsun/index.php


Page 30: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Australian Government Linked Data Working Group (AGLDWG)

• Ad-hoc group established August 2012
  – BoM, OSP, CSIRO, AGIMO, DRALGAS, NAA, GA, ABS
• Terms of reference
  – Develop technical guidelines and best practice on the use of ‘linked-data’ by AG agencies
  – Inform the development of data.gov.au as a platform for publishing Commonwealth PSI
  – Promote the benefits and encourage adoption of ‘linked-data’ for publishing Commonwealth PSI
  – Where appropriate, undertake specific activities and coordinate projects in pursuit of these objectives
• Seeking formal endorsement


Page 31: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Conclusions

• Approach is applicable to all climate time series
• Opportunities to link to other datasets (Australia, World)
  • Geo-features (e.g. GeoNames - done) for weather station sites, districts
  • Other climate data e.g. regional and world climate data archives, cyclone tracks (not yet available as linked data)
  • Other environmental data (not yet available as linked data)


Page 32: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au


ISWC 2013


• The 12th International Semantic Web Conference and the 1st Australasian Semantic Web Conference, 21-25 October 2013, Sydney, Australia
  • http://iswc2013.semanticweb.org/
  • https://twitter.com/iswc2013
• First International Workshop on Semantic Statistics (SemStats 2013)
• SemStats 2013 Challenge
  • Call for Papers: http://datalift.org/en/event/semstats2013/challenge-cfp
  • Data: http://datalift.org/en/event/semstats2013/challenge

Recommended by!

Page 33: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

CSIRO Computational Informatics
Laurent Lefort
Ontologist
t +61 2 9123 4567
e [email protected]
csiro.au

CSIRO COMPUTATIONAL INFORMATICS

Thank you

Page 34: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au

Image credits

• Blair Trewin The ACORN-SAT station at Butlers Gorge in central Tasmania (surfacetemperatures.blogspot.com.au )

• Nathanael Boehm


Page 35: Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au


More information

• Laurent Lefort, Josh Bobruk, Armin Haller, Kerry Taylor and Andrew Woolf. A Linked Sensor Data Cube for a 100 Year Homogenised daily temperature dataset. Proc. SSN 2012.
