Linked Open Government Data in UK

John Sheridan @johnlsheridan 18 January 2012 Linked Data


Linked Data. John Sheridan, National Archives UK. REEEP Open Data Workshop, Abu Dhabi, UAE, 18 Jan 2011


Page 1: Linked Open Government Data in UK

John Sheridan

@johnlsheridan

18 January 2012

Linked Data

Page 2: Linked Open Government Data in UK

“We shape our tools and they in turn shape us”

Marshall McLuhan


Page 3: Linked Open Government Data in UK

The Wealth of Networks

“Different technologies make different kinds of human action and interaction easier or harder to perform. All other things being equal, things that are easier to do are more likely to be done and things that are harder to do are less likely to be done.

All other things are *never* equal.

That is why technological determinism in the strict sense–if you have technology “t” you should expect social structure or relation “s” to emerge–is false…Neither deterministic nor wholly malleable, technology sets some parameters of individual and social action. It can make some actions, relationships, organizations and institutions easier to pursue, and others harder…

The same technologies of networked computers can be adopted in very different patterns. There is no guarantee that networked information technology will lead to the improvements in innovation, freedom and justice that I suggest are possible…The way we develop will, in significant measure, depend on choices we make in the next decade or so.”

– Yochai Benkler, The Wealth of Networks

Page 4: Linked Open Government Data in UK

Information economics and data

• Better informed markets operate more efficiently
• Governments are making more data available on the web
• We are at the beginning of an age of data abundance
• Large scale data aggregation is now possible

Page 5: Linked Open Government Data in UK

Interoperability with the world?

• [DN: insert picture of globe]


Page 6: Linked Open Government Data in UK

UK POLICY CONTEXT


Page 7: Linked Open Government Data in UK

Transparency and data.gov.uk


Page 8: Linked Open Government Data in UK

Commitments


Page 9: Linked Open Government Data in UK

Which says…

16. GOVERNMENT TRANSPARENCY

The Government believes that we need to throw open the doors of public bodies, to enable the public to hold politicians and public bodies to account. We also recognise that this will help to deliver better value for money in public spending, and help us achieve our aim of cutting the record deficit. Setting government data free will bring significant economic benefits by enabling businesses and non-profit organisations to build innovative applications and websites.

We will ensure that all data published by public bodies is published in an open and standardised format, so that it can be used easily and with minimal cost by third parties.


Page 10: Linked Open Government Data in UK

Open Data Policy in the UK

• Open by default
• Open Government Licence
• Seeking to address substantial policy issues through the use of open data
• Health and Transport data are at the forefront of this drive
• Consultation in Autumn 2011, White Paper early this year

Page 11: Linked Open Government Data in UK

CHOICES


Page 12: Linked Open Government Data in UK

Choosing formats for data

Formats for people
• Focused on presentation or typographic layout
• Look good, but hard to access the underlying data

Formats for machines
• Focused on data interchange between computers
• Look dreadful, hard for people to understand but easy to import into other systems and use

Page 13: Linked Open Government Data in UK

A false dichotomy

Single source of data
• Formats for people: focused on presentation or typographic layout
• Formats for machines: focused on data interchange between computers

Page 14: Linked Open Government Data in UK

Download or programmatic access?

• Download
o Good for static information
o Small files
o Used for export/import
o Easy for publishers
o Most of the data registered on data.gov.uk

• Programmatic access
o Good for dynamic or real-time information or very large datasets
o Lets developers select and use just the information they need
o Retains more control for the publisher
o More complicated to implement but much more powerful
o Vital for many useful datasets

Page 15: Linked Open Government Data in UK

STANDARDS


Page 16: Linked Open Government Data in UK

He also developed the first industrially practical screw-cutting lathe in 1800, allowing standardisation of screw thread sizes for the first time. This allowed the concept of interchangeability (an idea that was already taking hold) to be practically applied to nuts and bolts. Before this, all nuts and bolts had to be made as matching pairs only. This meant that when machines were disassembled, careful account had to be kept of the matching nuts and bolts ready for when reassembly took place.

http://en.wikipedia.org/wiki/Henry_Maudslay

Henry Maudslay (1771–1831)

Page 17: Linked Open Government Data in UK

In 1841, Joseph Whitworth created a design that, through its adoption by many British railroad companies, became a national standard for the United Kingdom called British Standard Whitworth. During the 1840s through 1860s, this standard was often used in the United States and Canada as well, in addition to myriad intra- and inter-company standards.

http://en.wikipedia.org/wiki/Screw_thread#History_of_standardization

Joseph Whitworth (1804-1887)

Page 18: Linked Open Government Data in UK

Tim Berners-Lee five stars

* make your stuff available on the Web (whatever format) under an open licence

** make it available as structured data (e.g., Excel instead of image scan of a table)

*** use non-proprietary formats (e.g., CSV instead of Excel)

**** use URIs to identify things, so that people can point at your stuff

***** link your data to other data to provide context


Page 19: Linked Open Government Data in UK

LINKED DATA


Page 20: Linked Open Government Data in UK

Linked Data

• Give names, or web identifiers (URIs), to things
• Publish information about them as Web Resources
• Use RDF triples (subject, property, value)
• Link to other data about those things
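
As a minimal sketch of the triple model above, the following Python holds each statement as a (subject, property, value) tuple. The station URI follows the pattern used on the reference data slide of this deck; the `example.org` property URIs, the label and the operator identifier are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, property, value."""
    subject: str
    prop: str
    value: str


# The station URI appears on the reference data slide; the property URIs and
# the values below are hypothetical, for illustration only.
STATION = "http://transport.data.gov.uk/id/station/WAT"
LABEL = "http://example.org/def/label"
OPERATOR = "http://example.org/def/operator"

triples = [
    Triple(STATION, LABEL, "Waterloo"),
    Triple(STATION, OPERATOR, "http://example.org/id/operator/1"),
]


def describe(subject: str, data: list[Triple]) -> dict[str, str]:
    """Collect every property/value pair published about one subject."""
    return {t.prop: t.value for t in data if t.subject == subject}


def links(subject: str, data: list[Triple]) -> list[str]:
    """Values that are themselves web identifiers, i.e. links to follow."""
    return [t.value for t in data
            if t.subject == subject and t.value.startswith("http")]
```

Calling `links(STATION, triples)` returns the values that are themselves URIs, which is the "link to other data" step in its simplest form.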


Page 21: Linked Open Government Data in UK

Benefits

• Enables web-scale data publishing - distributed publication with web-based discovery mechanisms

• Everything is a resource – follow your nose to discover more about properties, classes, or codes within a code list

• Everything can be annotated - make comments about observations, data series, points on a map

• Easy to extend - create new properties as required, no need to plan everything up-front

• Easy to merge - slot together RDF graphs, no need to worry about name clashes
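
The "easy to merge" point can be sketched with plain Python sets, assuming each graph is a set of (subject, property, value) tuples. The department URI comes from the reference data slide; the property URIs and values are hypothetical. Because every name is a full URI, two publishers can describe the same thing without agreeing on a shared schema first.

```python
# Two independently published graphs, each a set of (subject, property, value)
# tuples. Everything under example.org, and both values, are hypothetical.
DEPT = "http://reference.data.gov.uk/id/department/CO"
LABEL = "http://example.org/def/label"
STAFF = "http://example.org/def/staffCount"

graph_a = {
    (DEPT, LABEL, "Cabinet Office"),
}
graph_b = {
    (DEPT, LABEL, "Cabinet Office"),
    (DEPT, STAFF, "100"),
}


def merge(*graphs: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Merging is set union: identical statements collapse, others are kept."""
    merged: set[tuple[str, str, str]] = set()
    for graph in graphs:
        merged |= graph
    return merged


combined = merge(graph_a, graph_b)
```

The duplicate label statement appears once in `combined`, and the extra property from `graph_b` is simply added alongside it.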


Page 22: Linked Open Government Data in UK

You can do more with Linked Data

Page 23: Linked Open Government Data in UK

UK Government has been:

• developing standards for responsible publishing of key types of data (financial data, organisation data, aggregate statistics, location data)

• developing guidance, practices and tools that make it easy to publish data in Linked Data form, at low cost

• making it easy for people to consume data in a programmatic way

Page 24: Linked Open Government Data in UK

Types of data:

     2008    2009    2010
A    1,345   1,456   2,301
B    2,112   3,543   2,111
C    2,345   2,987   2,455
D    6,342   6,256   6,123
E    7,435   7,432   8,102

Transaction   Date         Supplier            Amount
A-1263        09/09/2010   Spottiswoode & Co   £2,345
A-1264        09/09/2010   JSB & Sons          £2,111
A-1265        09/09/2010   BLG Ltd             £2,455
A-1266        09/09/2010   Spottiswoode & Co   £6,123
A-1267        09/09/2010   BLG Ltd             £8,102

Director General
Director (Operations)
Director (Strategy)
Deputy Director (A)
Deputy Director (A)

Page 25: Linked Open Government Data in UK

Naming things with URIs

• URI = uniform resource identifier
• Everything starts HTTP – which gives us actionable names
• There is choice about how to make URIs
• We are using {sector}.data.gov.uk/id/{something}
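
A small sketch of the `{sector}.data.gov.uk/id/{something}` pattern, assuming that `{something}` is one or more path segments. The function name `mint_uri` and the percent-encoding of segments are choices made for this illustration, not part of any published guidance.

```python
from urllib.parse import quote


def mint_uri(sector: str, *path: str) -> str:
    """Build an identifier of the form http://{sector}.data.gov.uk/id/{something}."""
    if not sector or not path:
        raise ValueError("a sector and at least one path segment are required")
    # Percent-encode each segment so that spaces or slashes inside a value
    # cannot change the structure of the URI.
    segments = "/".join(quote(str(part), safe="") for part in path)
    return f"http://{sector}.data.gov.uk/id/{segments}"
```

For example, `mint_uri("education", "school", "341451")` reproduces the school identifier listed on the reference data slide.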


Page 26: Linked Open Government Data in UK

Location URIs for INSPIRE

Page 27: Linked Open Government Data in UK

Naming things in legislation

Page 28: Linked Open Government Data in UK

Naming things in legislation

• If you visit legislation.gov.uk you will see we have taken great care with naming things

Returns an HTML document for United Kingdom Public General Act (ukpga), 2005, Chapter 14, Section 1

Returns an HTML document with a list from all legislation types where the title contains “wildlife”

Page 29: Linked Open Government Data in UK

Some names are quite sophisticated…

• UK Public General Act (ukpga)
• 1981
• Chapter 69
• Section 5
• As it extends to England
• As it stood on 30th January 2001
• Displayed as an HTML document with the timeline on
• Although URIs are opaque, having this type of design changes how people use the service

Page 30: Linked Open Government Data in UK

Legislation as Open Data

• Everything on legislation.gov.uk is available as open data under the terms of our Open Government Licence

• To access the data, visit any page and add:
o /data.xml
o /data.rdf
o /data.xht
• For lists
o /data.feed
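
The suffix convention above can be sketched as a small helper. The function name is hypothetical, and the page address used in the usage note is an assumption built from the Act, year, chapter and section named on the earlier slide. The sketch does not handle addresses that carry a query string.

```python
DOCUMENT_SUFFIXES = ("/data.xml", "/data.rdf", "/data.xht")
LIST_SUFFIX = "/data.feed"


def data_urls(page_url: str, is_list: bool = False) -> list[str]:
    """Append the open-data suffixes to a legislation.gov.uk page address."""
    base = page_url.rstrip("/")  # avoid a double slash before the suffix
    if is_list:
        return [base + LIST_SUFFIX]
    return [base + suffix for suffix in DOCUMENT_SUFFIXES]
```

With an assumed page address such as `http://www.legislation.gov.uk/ukpga/2005/14/section/1`, `data_urls` returns the three addresses ending in `/data.xml`, `/data.rdf` and `/data.xht`.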


Page 31: Linked Open Government Data in UK

Linked Data Standards

• Re-use where we can, create where we must
• Small, high level, light weight vocabularies
o Examples include datacube, organization, provenance
• Create local specialisations
o Examples include payments, central-government
• Post hoc linking

Page 32: Linked Open Government Data in UK

Data cube vocabulary

[Diagram: classes and properties of the Data Cube vocabulary]

Classes: qb:DataSet, qb:Slice, qb:SliceKey, qb:Observation, qb:DataStructureDefinition, qb:ComponentSpecification, qb:ComponentProperty, qb:DimensionProperty, qb:AttributeProperty, qb:MeasureProperty, qb:CodedProperty

qb:ComponentSpecification fields: qb:componentRequired : boolean, qb:componentAttachment : rdfs:Class, qb:order : xsd:int

Properties: qb:structure, qb:dataset, qb:slice, qb:subSlice, qb:observation, qb:sliceKey, qb:sliceStructure, qb:componentProperty, qb:dimension, qb:attribute, qb:measure, qb:measureType, qb:concept, qb:codeList

Related SDMX and SKOS terms: sdmx:Concept, sdmx:ConceptRole, sdmx:CodeList, sdmx:FrequencyRole, sdmx:CountRole, sdmx:EntityRole, sdmx:TimeRole, ..., skos:Concept, skos:ConceptScheme

Page 33: Linked Open Government Data in UK

Payments (a cube specialisation)

[Diagram: the payments vocabulary as a specialisation of the Data Cube vocabulary]

Classes: PaymentDataset, Payment, ExpenditureLine, Purchase, Item, ItemCategory

Referenced classes: foaf:Agent, interval:Interval, skos:Concept, org:OrganizationalUnit

Data Cube links: qb:structure, qb:slice, qb:dataset

Properties and other labels: payer, payee, unit, date, payment, expenditureLine, purchase, item, expenditureCode, procurementCategory, narrative, amountIncludingVAT, amountExcludingVAT, totalAmountIncludingVAT, totalAmountExcludingVAT, vatCategory, vatRate, order, invoice, contract, transactionReference, paymentReference, redacted, capital, revenue

Page 34: Linked Open Government Data in UK

DATA


Page 35: Linked Open Government Data in UK

Reference data

http://reference.data.gov.uk/id/day/2012-01-18

http://reference.data.gov.uk/id/department/CO

http://transport.data.gov.uk/id/station/WAT

http://education.data.gov.uk/id/school/341451

http://location.data.gov.uk/id/3245677362123

http://www.legislation.gov.uk/id/ukpga/2009/12/section/2

Page 36: Linked Open Government Data in UK

British time intervals

• http://reference.data.gov.uk/id/day/2011-06-1
• There are similar URIs for seconds, minutes, hours, weeks, months, quarters, years
• We were a bit slow (170 years) to move from the Julian to Gregorian Calendar (see the Calendar Act, 1750)
• To transition, we lost 11 days in 1752
• Convoluted explanation of why the tax year in the UK starts on the 6th April
• Our URIs for time intervals work this way too and the British time intervals URI Set is linked to the legislation

Page 37: Linked Open Government Data in UK

PRODUCTION


Page 38: Linked Open Government Data in UK

Chop-O-Matic

• Malcolm Gladwell article on Ron Popeil from 2000 in the New Yorker:

• ”And how do you persuade people to disrupt their lives? Not merely by ingratiation or sincerity, and not by being famous or beautiful. You have to explain the invention to consumers - not once or twice but three or four times, with a different twist each time. You have to show them exactly how it works and why it works, and make them follow your hands as you chop liver with it, and then tell them precisely how it fits into their routine, and, finally, sell them on the paradoxical fact that, revolutionary as the gadget is, it's not at all hard to use.”

Page 39: Linked Open Government Data in UK

Google Refine (formerly Gridworks)


Page 40: Linked Open Government Data in UK

Use Refine to map and export Linked Data


Page 41: Linked Open Government Data in UK

PUBLISHING


Page 42: Linked Open Government Data in UK


Page 43: Linked Open Government Data in UK

Linked Data API

• Open Standard
• Generic approach for creating APIs from Linked Data
• Sits on top of a Linked Data store
• Several implementations, most mature is Puelia

Page 44: Linked Open Government Data in UK


Page 45: Linked Open Government Data in UK


Page 46: Linked Open Government Data in UK

CASE STUDIES


Page 47: Linked Open Government Data in UK

Back to those commitments


Page 48: Linked Open Government Data in UK

Publishing Organisation Data

• We will require public bodies to publish online the job titles of every member of staff and the salaries and expenses of senior officials paid more than the lowest salary permissible in Pay Band 1 of the Senior Civil Service pay scale, and organograms that include all positions in those bodies.

Page 49: Linked Open Government Data in UK

Our first go…

• October 2010
• CSV template and PDFs of organograms, typically authored using PowerPoint
• Emphasis on visual appearance led to inconsistent datasets which are very hard to re-use
• No relationship between the organogram and data
• Not using web standards

Page 50: Linked Open Government Data in UK

Press Release

“The Government has published the most comprehensive organisational charts of the UK Civil Service ever released online, taking another step towards its goal of being the most transparent government in the world and opening up the structure of the Civil Service to public scrutiny”

Page 51: Linked Open Government Data in UK

It’s *all* Linked Data

• 100s of UK Government Organisations published their organisation data as Linked Data

• Distributed data publishing
• The data is deeply linked (Departments, Grades, Professions, date of the snapshot)
• Cross dataset queries are perhaps the most interesting
• Proves Linked Data is moving from research topic to commodity publishing
• We can now extend this approach to other types of dataset and link our transparency data

Page 52: Linked Open Government Data in UK

Our aims with Organogram Data

• Make it as simple as possible for people in Departments to create Linked Data

• Create high quality, consistent data that matches the policy intent and guidance

• Distributed capture and publishing
• Create open data in open standards using open source tools
• Human readable and machine readable from single source
• Provide download and API access in different formats (CSV, XML, JSON, RDF, HTML)
• Evolutionary route to create longitudinal datasets, reconciling against previous data
• Enable everyone to publish 5 Star Linked Data

Page 53: Linked Open Government Data in UK

The process

• Capture organisation data using a spreadsheet, which verifies policy rules and datatypes

• Upload spreadsheet
• Preview organogram
• Download RDF and two CSVs
• Publish on your website and register with data.gov.uk

Page 54: Linked Open Government Data in UK

The Excel bit…

• It’s the tool most Civil Servants have
• This *does* also work in Libre Office / Open Office etc

Page 55: Linked Open Government Data in UK


Page 56: Linked Open Government Data in UK


Page 57: Linked Open Government Data in UK


Page 58: Linked Open Government Data in UK

[Diagram: Linked Data Publishing Infrastructure]

Steps:
1. Upload Excel
2. Create CSVs
3. Create Mapping
4. Query (SPARQL)
5. Create RDF
6. Load RDF
7. Query (SPARQL)

Components: Organogram (PHP); XLWrap; Mapping TRiG; TDB; Sesame RDF Store; Linked Data API; API Config; Reconciliation; Organogram HTML, CSS & JavaScript

Files and outputs: Excel file; Senior CSV; Junior CSV; RDF file; JSON; XML; HTML

Page 59: Linked Open Government Data in UK

Linked Data adds value

• Implicit properties are made explicit (person, role, person in a role)

• Reconciliation adds value by automatic linking to other data
• Provenance
• Example data
• Explicit open licence
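
One way to picture "implicit properties made explicit" is to split a single spreadsheet row into separate statements about the person, the post, and the person holding that post. The sketch below is purely illustrative: the column names, the short property names and the `example.org` base are hypothetical and do not reflect the actual organogram vocabulary.

```python
def explicit_triples(row: dict[str, str], base: str) -> list[tuple[str, str, str]]:
    """Turn one spreadsheet row into statements about a person, a post and
    the person holding that post. All property names are hypothetical."""
    post = f"{base}/post/{row['post_id']}"
    person = f"{base}/person/{row['person_id']}"
    # A separate resource for "this person in this post", so the relationship
    # itself can be described (dates, grade, and so on).
    tenure = f"{post}/holder/{row['person_id']}"
    return [
        (person, "name", row["name"]),
        (post, "jobTitle", row["job_title"]),
        (tenure, "person", person),
        (tenure, "post", post),
    ]


row = {"post_id": "1", "person_id": "7", "name": "A. Example",
       "job_title": "Director (Operations)"}
statements = explicit_triples(row, "http://example.org/id")
```

The single flat row becomes four statements, and the person-in-a-role relationship gets an identifier of its own.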

Page 60: Linked Open Government Data in UK


Page 61: Linked Open Government Data in UK
Page 62: Linked Open Government Data in UK

On the web, everything is a claim

• How did you come by this information?
• What did you do with it?
• When, who and how?

Page 63: Linked Open Government Data in UK

An opportunity

• We are developing a new system for publishing legislation, operating inside the government secure intranet / extranet

• We want to provide evidence that supports the data we are publishing


Page 64: Linked Open Government Data in UK

Legislation workflows

• Complicated and vary by jurisdiction and content type
• We take documents in different formats (Word, FrameMaker) and convert them to a single format (XML)
• We store XML documents in an XML Database
• We take documents from a single format (XML) and transform them to different formats (HTML and PDF)
• Complex processes for handling images etc
• Sometimes mistakes are made, which can be corrected through a “Correction Slip”

Page 65: Linked Open Government Data in UK

Objectives for provenance with legislation

• Transparency and public trust - we substantiate our claim that this web page is what the legislation says

• The audit trail is repeatable
• Perform automatic checks along the way and evidence that checking
• Use digital signatures rather than rely on the immutability of paper, to ensure authenticity
• Create a data source we can use to resolve any disputes (where did that footnote go?)
• Create a data source we can use to measure contractual performance (how long did it take to publish that document?)

Page 66: Linked Open Government Data in UK

Our technology choices

• We use both XML and RDF
• XML is brilliant for single source publishing solutions – one source, many outputs
• RDF provides a flexible data model for other types of information (bibliographic metadata, but also things like which item of legislation has changed what)

• We are recording provenance in RDF using the Open Provenance Model Vocabulary


Page 67: Linked Open Government Data in UK

Open Provenance Model Vocabulary

[Diagram: Open Provenance Model Vocabulary]

Classes: opmv:Artifact, opmv:Process, opmv:Agent

Properties: opmv:wasGeneratedBy, opmv:used, opmv:wasControlledBy, opmv:wasPerformedBy

Instances shown: opmv:Artifact(k-1), opmv:Artifact(k), Document(k)

Page 68: Linked Open Government Data in UK

Provenance chain audit trail

[Diagram: Container1, Container2 and Container3, each with its signature: Signature(c1), Signature(c2), Signature(c3)]

<urn:uuid:6F677120-152C-11E1-8715-95963F5713B6> {
  <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/task/word-export-wml/1>
    a ns0:Process ;
    rdfs: "Word Export to WML1 Process" ;
    ns0:wasControlledBy <http://www.legislation.gov.uk/id/software/MsWord/2003> ,
                        <http://www.legislation.gov.uk/id/software/WordToClml/1.0> ;
    ns1:hasParentProcess <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/task/word-to-xml> ;
    ns2:source <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/data.doc> .
}

<urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> {
  <urn:uuid:6F677120-152C-11E1-8715-95963F5713B6>
    swp:assertedBy <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> ;
    swp:digest "N2U1ZGZhMzI3M2IzNmFjNDNlMmZkZTkyZTkwY2RlYWY4NmU5MDJiYw=="^^<http://www.w3.org/2001/XMLSchema#base64Binary> ;
    swp:digestMethod swp:JjcRdfC14N-sha1 .

  <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB>
    swp:assertedBy <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> ;
    swp:authority <http://www.tsoshop.co.uk> ;
    swp:signature "kWcf…6g=="^^<http://www.w3.org/2001/XMLSchema#base64Binary> ;
    swp:signatureMethod swp:JjcRdfC14N-rsa-sha1 .

  <http://www.tsoshop.co.uk> swp:X509Certificate "MIIG …." .
}
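
The idea behind the digest in the signed graph above can be sketched as: put the statements in a canonical order, hash them, and record the hash so anyone can recompute it later. The sketch below is a simplification under stated assumptions: sorting the statement strings stands in for a real RDF canonicalisation algorithm and is not the `swp:JjcRdfC14N` method named in the example, and the function names are hypothetical. It also stops at the digest and does not produce a signature.

```python
import base64
import hashlib


def graph_digest(statements: list[str]) -> str:
    """Base64-encoded SHA-1 digest of a sorted list of statement strings.

    Sorting stands in for a real RDF canonicalisation algorithm; it is NOT
    the swp:JjcRdfC14N method named in the example above.
    """
    canonical = "\n".join(sorted(s.strip() for s in statements)) + "\n"
    raw = hashlib.sha1(canonical.encode("utf-8")).digest()
    return base64.b64encode(raw).decode("ascii")


def verify(statements: list[str], expected: str) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    return graph_digest(statements) == expected
```

Because the statements are sorted before hashing, the same graph gives the same digest regardless of the order in which it was serialised, and any changed statement makes `verify` return `False`.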

Page 69: Linked Open Government Data in UK

Publishing provenance

• Provenance information may be associated by including a <link> element in the HTML <head> section:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <link rel="provenance" href="provenance-URI">
    <link rel="anchor" href="entity-URI">
    <title>Welcome to example.com</title>
  </head>
  <body>
    ...
  </body>
</html>
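
On the consuming side, a client could discover the provenance link by reading the `<link>` elements of the page. The following is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and the placeholder `href` values are the ones from the example above.

```python
from html.parser import HTMLParser


class ProvenanceLinkParser(HTMLParser):
    """Collect href values of <link> elements, keyed by their rel value."""

    def __init__(self) -> None:
        super().__init__()
        self.links: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            self.links[rel] = href


def find_provenance(html: str) -> dict[str, str]:
    """Return the rel -> href mapping for every <link> in the document."""
    parser = ProvenanceLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="provenance" href="provenance-URI">
<link rel="anchor" href="entity-URI">
<title>Welcome to example.com</title>
</head>
<body></body>
</html>"""
```

`find_provenance(PAGE)` returns a dictionary from which the `provenance` and `anchor` entries can be read.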


Page 70: Linked Open Government Data in UK

Summary

• Linked Data is essential to realising the promise of Open Government Data

• Using Linked Data means working on
o Standards
o Reference Data
o Production
o Publishing
• Benefits grow the more data you want to combine
• Lots of opportunities for international collaboration
• Best advice, just start

Page 71: Linked Open Government Data in UK

Questions?
