Linked (Open) Data - But what does it buy me?

Rinke Hoekstra, VU University Amsterdam/University of Amsterdam

Linked (Open) Data - But what does it buy me? by Rinke Hoekstra is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Monday 11 March 2013

http://www.youtube.com/watch?v=ga1aSJXCFe0

http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide.html

Linked Open Data

Texts taken from http://5stardata.info
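The five-star scheme is cumulative: every star presupposes the previous ones. The sketch below scores a dataset that way. The criteria are paraphrased from the scheme, while the Dataset fields and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# The five criteria, in order; each star presupposes all previous ones.
CRITERIA = (
    "available on the Web (whatever format) under an open license",
    "available as structured data (e.g. Excel instead of an image scan)",
    "available in a non-proprietary open format (e.g. CSV instead of Excel)",
    "uses URIs to denote things",
    "linked to other data to provide context",
)


@dataclass
class Dataset:
    # Hypothetical flags describing how a dataset is published.
    open_license_on_web: bool
    structured: bool
    non_proprietary: bool
    uses_uris: bool
    linked: bool


def stars(ds: Dataset) -> int:
    """Count criteria met consecutively from the first one (ratings are cumulative)."""
    flags = (ds.open_license_on_web, ds.structured, ds.non_proprietary,
             ds.uses_uris, ds.linked)
    count = 0
    for met in flags:
        if not met:
            break
        count += 1
    return count


def next_step(ds: Dataset) -> Optional[str]:
    """Return the first criterion that is not yet met, or None for five stars."""
    n = stars(ds)
    return CRITERIA[n] if n < len(CRITERIA) else None


closed_rdf = Dataset(False, True, True, True, True)  # linked RDF, but no open license
open_csv = Dataset(True, True, True, False, False)
print(stars(closed_rdf), stars(open_csv))  # 0 3
```

Linked RDF published without an open license scores zero stars. This is the "even for a single ★" objection raised on the following slides.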

Why people go “Meh”

• Data needs to be converted to RDF

• Data needs to be published on the Web

• An open license is required even for a single ★

What if people draw incorrect conclusions from my data?

What if journalists draw incorrect conclusions from my data?

What if combining data results in privacy infringement?

Pacific Barreleye, http://imgur.com/gallery/Mzyb5 (can rotate its eyes forwards or upwards to look through its transparent head at prey above)

... but LOD is just asking for more!

... how can I sell this internally?

Linked Open Data

Linked Data: Six Ingredients

• The missing ★

• Repeatable Transformation

• Choose your Grain Size

• Mix ‘n Mash

• Contextualize!

• Lower the Threshold

1 The missing ★

http://give.everything/a/URI

HTTP(S) URIs only, please! (or a resolver + URN)

Version information

Version agnostic

Guessable

Messy Data

http://wetten.overheid.nl/BWBIdService/BWBIdList.xml.zip

NB: The problem with the XML processing instruction was reported and fixed, but it came back a few weeks later.

Example: Juriconnect

• Existing identification standard: Juriconnect

• URN-like... but there is no naming server (cf. Digital Object Identifiers)

• Named elements do not carry an identifier

• No explicit version information, only contextual

1.0:c:BWBR0005416&artikel=6

vs.

http://wetten.overheid.nl/cgi-bin/deeplink/law1/bwbid=BWBR0005416/article=6/date=2005-01-14

vs.

http://wetten.overheid.nl/BWBR0005416/TitelII698946/HoofdstukII/Artikel16/geldigheidsdatum_14-01-2005
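To make "only contextual" version information concrete, here is a minimal sketch that turns the Juriconnect-style identifier above into the deeplink form. It assumes the identifier always has the shape version:type:BWB-id, followed by &element=number pairs. The artikel-to-article mapping is inferred from this single example. Both are assumptions, and the function names are hypothetical. The date has to be passed in separately because the identifier cannot supply it.

```python
from urllib.parse import parse_qsl

# Hypothetical translation of Dutch element names to the deeplink's path keys,
# inferred from the single example pair shown above.
KEY_MAP = {"artikel": "article"}


def parse_juriconnect(jci: str) -> dict:
    """Split an identifier such as '1.0:c:BWBR0005416&artikel=6'.

    Assumed shape: <version>:<type>:<BWB id>[&<element>=<number>...]
    """
    head, _, tail = jci.partition("&")
    version, kind, bwb_id = head.split(":", 2)
    return {"version": version, "type": kind, "bwb_id": bwb_id,
            "elements": parse_qsl(tail)}


def to_deeplink(jci: str, date: str) -> str:
    """Build a deeplink URL; the date comes from context, not from the identifier."""
    parsed = parse_juriconnect(jci)
    segments = [f"bwbid={parsed['bwb_id']}"]
    segments += [f"{KEY_MAP.get(k, k)}={v}" for k, v in parsed["elements"]]
    segments.append(f"date={date}")
    return "http://wetten.overheid.nl/cgi-bin/deeplink/law1/" + "/".join(segments)


print(to_deeplink("1.0:c:BWBR0005416&artikel=6", "2005-01-14"))
```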

Levels of Identification

• IFLA FRBR levels

• Work

• Expression

• Manifestation

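[Figure: the FRBR levels as kinds of Bibliographic Entity, illustrated for a regulation. Work: the regulation. Expression: a version of the regulation, which realizes the Work. Manifestation: the XML version of the regulation, which embodies the Expression. Item: the XML version of the regulation on my hard disk, which exemplifies the Manifestation.]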

• Hierarchical information (work)

• Version and language (expression)

• Format information (manifestation)

http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1

http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01

http://doc.metalex.eu/doc/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01/data.xml

http://doc.metalex.eu/id/BWBR0011823/artikel/1

Transparent = Guessable
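Because these URIs are transparent, they can be computed rather than looked up. The sketch below rebuilds the four example URIs from their parts: BWB identifier, hierarchy path, language, date and file name. The pattern is inferred from these examples only. The function names and the default data.xml file name are assumptions and do not come from the MetaLex service's own code.

```python
BASE = "http://doc.metalex.eu"


def work_uri(bwb_id, path):
    """path: (element, number) pairs, e.g. [("hoofdstuk", "1"), ("artikel", "1")]."""
    segments = [bwb_id] + [f"{element}/{number}" for element, number in path]
    return f"{BASE}/id/" + "/".join(segments)


def expression_uri(bwb_id, path, lang, date):
    # An expression adds language and version date to the work URI.
    return f"{work_uri(bwb_id, path)}/{lang}/{date}"


def manifestation_uri(bwb_id, path, lang, date, filename="data.xml"):
    # A manifestation lives under /doc/ instead of /id/ and names a concrete format.
    expression = expression_uri(bwb_id, path, lang, date)
    return expression.replace(f"{BASE}/id/", f"{BASE}/doc/", 1) + "/" + filename


path = [("hoofdstuk", "1"), ("artikel", "1")]
print(work_uri("BWBR0011823", path))
print(expression_uri("BWBR0011823", path, "nl", "2010-09-01"))
print(manifestation_uri("BWBR0011823", path, "nl", "2010-09-01"))
```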

Versioning Issues

• URIs don’t carry semantics...

• Detect changes:

• which element versions are the same

• ... and which versions are different?

Art. 44, lid 4 (2011-03-26)

Art. 44, lid 4 (2011-04-05)

From: Besluit prudentiële regels Wft, BWBR0020420

Opaque Identifiers

• Content information

• Unique SHA1 hash of the text

http://doc.metalex.eu/BWBR0011823/hoofdstuk/1/artikel/34b0cee26ee5138c74aa2c62caf2c117d3c616e9

[Figure, built up over four slides: two versions of SW (Successiewet) Hoofdstuk I, Artikel 10 (Chapter I, Article 10), dated 2011-01-01 and 2011-10-12, each annotated with dcterms:subject "vermogen van de erflater" (the deceased's estate) and connected via owl:sameAs to content-hash identifiers (SHA1 8738ef273ea4dbc73 and, in the final build, SHA1 a433f53273c78a56f2).]
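Content-based identifiers answer the versioning question "which element versions are the same?": two versions whose text hashes to the same value have identical text. Below is a minimal sketch using Python's hashlib. It assumes whitespace is normalized before hashing, which the slides do not specify, and uses hypothetical version names. The URI prefix is copied from the example identifier above.

```python
import hashlib
import re

# Prefix copied from the example URI on the slide; used here only for illustration.
PREFIX = "http://doc.metalex.eu/BWBR0011823/hoofdstuk/1/artikel/"


def content_uri(text: str, prefix: str = PREFIX) -> str:
    # Assumption: whitespace is collapsed before hashing, so layout-only edits
    # keep the same identifier. The slides do not say how text is normalized.
    normalized = re.sub(r"\s+", " ", text).strip()
    return prefix + hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def same_as_links(versions: dict) -> list:
    """Link each version identifier to its content-based identifier via owl:sameAs."""
    return [(version, "owl:sameAs", content_uri(text))
            for version, text in versions.items()]


# Hypothetical versions of one article: the text did not change between them.
versions = {
    "art10@2011-01-01": "Text of the article,\n  unchanged.",
    "art10@2011-10-12": "Text of the article, unchanged.",
}
links = same_as_links(versions)
# Both versions point at the same content URI, so they are detectably identical.
print(links[0][2] == links[1][2])  # True
```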

Network Analysis

2 Repeatable Transformation

Transformation should be part of the routine ... manageable and scalable ... repeatable ...

Linked Data will not be the official source anytime soon

http://www.w3.org/TR/prov-overview/

Provenance is key
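The slides link to the W3C PROV overview. As a minimal sketch of building provenance into a repeatable transformation, the wrapper below runs a conversion step and records the run with PROV-O terms: prov:Activity, prov:used, prov:wasAssociatedWith, prov:startedAtTime, prov:endedAtTime, prov:wasGeneratedBy and prov:wasDerivedFrom. The ex: identifiers and the CSV step are hypothetical. The triples are plain tuples, not serialized RDF.

```python
from datetime import datetime, timezone

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def timestamp() -> str:
    # Current UTC time as a typed literal string.
    return f'"{datetime.now(timezone.utc).isoformat()}"^^<{XSD_DATETIME}>'


def run_with_provenance(transform, source, data, output, agent, activity):
    """Run a (deterministic) transformation and describe the run with PROV-O terms.

    source, output, agent and activity are identifiers chosen by the caller.
    """
    started = timestamp()
    result = transform(data)
    ended = timestamp()
    triples = [
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:used", source),
        (activity, "prov:wasAssociatedWith", agent),
        (activity, "prov:startedAtTime", started),
        (activity, "prov:endedAtTime", ended),
        (output, "prov:wasGeneratedBy", activity),
        (output, "prov:wasDerivedFrom", source),
    ]
    return result, triples


def csv_row_to_list(raw: bytes) -> list:
    # Hypothetical transformation step; as a pure function, reruns give the same result.
    return [cell.strip() for cell in raw.decode("utf-8").split(",")]


result, prov = run_with_provenance(csv_row_to_list, "ex:sheet1.csv", b"a, b,c",
                                   "ex:sheet1.ttl", "ex:converter", "ex:run1")
print(result)  # ['a', 'b', 'c']
```

Keeping the transformation a pure function is the design choice that makes reruns comparable. Only the timestamps differ between two runs on the same input.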

LODStats

http://stats.lod2.eu

40,745,554,078 Triples! (1.6 Billion)

(I tried to check the latest figures, but http://stats.lod2.eu was down)

3 Choose your Grain Size

• The document is the traditional grain size (Dublin Core)

• Linked data allows for deep links into data

• Cost versus usefulness

• Are you the right party to provide detailed descriptions?

http://creatingandeducating.blogspot.nl/2011/11/blog-post.html

RDF Report Card

RDF Report Card by Leigh Dodds, talk at Semtech Biz London, 2011, http://slideshare.net/ldodds

[Figure: Report Card Categories (Metadata, Scope, Structure and Internals) on a scale from Low Detail to High Detail.]

4 Mix ‘n Mash

• Multiple vocabularies won’t bite

• Multiple identifiers won’t bite

• Choose what’s useful for you...

• ... then map to others!

Image © David Sykes 2009 All rights reserved

Good News: the bulk has already been done for you!
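"Choose what's useful, then map to others" can be as light as declaring your own properties to be sub-properties of common ones. The sketch below materializes such a mapping by hand. The ex: terms are hypothetical, while dcterms:title, dcterms:creator and foaf:maker are real vocabulary terms. It applies a single level of mapping, whereas a full RDFS reasoner would also follow chains of sub-properties.

```python
# Hypothetical local properties and the widely used terms they map to.
# Declaring e.g. ex:creator rdfs:subPropertyOf dcterms:creator lets an RDFS
# reasoner derive the same extra triples that expand() adds by hand.
SUBPROPERTY_OF = {
    "ex:title": ["dcterms:title"],
    "ex:creator": ["dcterms:creator", "foaf:maker"],
}


def expand(triples, mapping=SUBPROPERTY_OF):
    """Keep every triple and add a copy for each mapped predicate, without duplicates."""
    out, seen = [], set()
    for s, p, o in triples:
        for pred in [p] + mapping.get(p, []):
            triple = (s, pred, o)
            if triple not in seen:
                seen.add(triple)
                out.append(triple)
    return out


data = [("ex:doc1", "ex:title", '"Annual report"'),
        ("ex:doc1", "ex:creator", "ex:author1")]
for t in expand(data):
    print(t)
```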

Semantically-Interlinked Online Communities

Example: Provenance

[Figure: provenance graph for one version of a regulation. The expression (version) URI of a regulation, http://doc.metalex.eu/id/BWBR0017869/2009-10-23 (rdf:type ml:BibliographicExpression, opmv:Artifact), is linked via opmv:wasGeneratedBy and ml:resultOf to the process that generated the expression, http://doc.metalex.eu/id/process/BWBR0017869/2009-10-23 (rdf:type ml:LegislativeModification, opmv:Process). The creation event of the regulation is http://doc.metalex.eu/id/event/BWBR0017869/2009-10-23 (rdf:type sem:Event). The date at which the expression was created is http://doc.metalex.eu/id/date/2009-10-23 (rdf:type ml:Date, time:Instant, sem:Time), with value "2009-10-23"^^xsd:date. Further properties in the diagram: opmv:wasGeneratedAt, ml:date, time:hasEnd, time:inXSDDateTime, rdf:value, sem:eventType, sem:hasTime, sem:timeType and sem:hasTimeStamp.]

5 Contextualize!

• Information is not always compatible

• Make explicit in which context the information holds ...

• ... and who stated the information, why and how.

Flat Earth and Square Earth idea courtesy of Szymon Klarman

• Namespaces don’t mean anything

• Use named graphs to compartmentalize metadata

• Add provenance information about groups of statements

[Figure: named graphs for a census spreadsheet. There are two graphs, <http://example.com/workbook1/sheet1> and <http://example.com/workbook1/sheet1/corrected>. The corrected graph is linked via provo:wasGeneratedBy to the activity :curation20120126 (rdf:type provo:Activity). This activity provo:hadAgent :RinkeHoekstra and has provo:startedAt and provo:endedAt instants (_:a, _:b) with time:inXSDDateTime values "20120126T08:30:00" and "20120126T09:00:00" respectively. The data describe a census observation _:x using d2s:dimension (:14--15_1875--1874), d2s:populationSize ("11"^^xsd:int and "1"^^xsd:int both appear), d2s:ageGroup (:14-15), d2s:birthYears (:1875--1874), d2s:censusYear ("1889"^^xsd:int) and d2s:gemeente (municipality: :Assendelft).]
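Here is a minimal sketch of the named-graph idea in N-Quads, built from plain strings. Data statements go into named graphs, and provenance about a graph as a whole goes into the default graph. Several choices here differ from the slide or are assumptions:

- It uses the final PROV-O terms (prov:wasAssociatedWith, prov:startedAtTime, prov:endedAtTime) instead of the draft provo: terms on the slide.
- The timestamps are written as valid xsd:dateTime values.
- The d2s namespace is made up.
- Which population size sits in which graph is assumed for illustration.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.com/"
D2S = "http://example.com/d2s#"  # hypothetical: the real d2s namespace is not given


def iri(value: str) -> str:
    return f"<{value}>"


def lit(value: str, datatype: str) -> str:
    return f'"{value}"^^<{XSD}{datatype}>'


def quad(s: str, p: str, o: str, graph: str = "") -> str:
    """One N-Quads line; without a graph name the statement goes to the default graph."""
    return f"{s} {p} {o} {graph} ." if graph else f"{s} {p} {o} ."


sheet = iri(EX + "workbook1/sheet1")
corrected = iri(EX + "workbook1/sheet1/corrected")
activity = iri(EX + "curation20120126")

quads = [
    # The same census cell in two graphs; the value-to-graph assignment is assumed.
    quad("_:x", iri(D2S + "populationSize"), lit("11", "int"), sheet),
    quad("_:x", iri(D2S + "populationSize"), lit("1", "int"), corrected),
    # Provenance about the corrected graph as a whole, kept in the default graph.
    quad(corrected, iri(PROV + "wasGeneratedBy"), activity),
    quad(activity, RDF_TYPE, iri(PROV + "Activity")),
    quad(activity, iri(PROV + "wasAssociatedWith"), iri(EX + "RinkeHoekstra")),
    quad(activity, iri(PROV + "startedAtTime"), lit("2012-01-26T08:30:00", "dateTime")),
    quad(activity, iri(PROV + "endedAtTime"), lit("2012-01-26T09:00:00", "dateTime")),
]
print("\n".join(quads))
```

Both statements about _:x remain in the dataset. A consumer picks the graph it trusts, and the provenance statements tell it who produced the correction and when.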

Compliance

[Figure, built up over several slides: a UML-style state diagram with start and end nodes, states with entry/action, do/activity and exit/action compartments, and transitions labelled event/action(arguments). The states are annotated with the provisions they must comply with: Regulation A (version of 01-01-2011), Art 12 (04-02-2011), and Art 14, lid 3, 2e volzin (paragraph 3, second sentence), which exists in two versions (11-06-2008 and 01-07-2011).]
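One payoff of versioned identifiers in this compliance scenario is that when a provision gets a new version, you can find the process states that were modelled on the old one. This is a minimal sketch. It assumes the slide's dates are day-month-year (the Dutch convention) and uses hypothetical state names.

```python
from datetime import date, datetime


def parse(d: str) -> date:
    # Assumption: the slide's dates are day-month-year (Dutch convention).
    return datetime.strptime(d, "%d-%m-%Y").date()


# Versions of each provision, as listed on the slide.
VERSIONS = {
    "Regulation A": ["01-01-2011"],
    "Art 12": ["04-02-2011"],
    "Art 14, lid 3, 2e volzin": ["11-06-2008", "01-07-2011"],
}

# Hypothetical process states, each annotated with the provision version it was modelled on.
ANNOTATIONS = {
    "state 1": ("Regulation A", "01-01-2011"),
    "state 2": ("Art 12", "04-02-2011"),
    "state 3": ("Art 14, lid 3, 2e volzin", "11-06-2008"),
}


def outdated(annotations, versions, on):
    """List states whose provision has a newer version in force on the given date."""
    day = parse(on)
    stale = []
    for state, (provision, modelled_on) in annotations.items():
        in_force = [parse(v) for v in versions[provision] if parse(v) <= day]
        if in_force and max(in_force) > parse(modelled_on):
            stale.append((state, provision, max(in_force).isoformat()))
    return stale


print(outdated(ANNOTATIONS, VERSIONS, "01-08-2011"))
# [('state 3', 'Art 14, lid 3, 2e volzin', '2011-07-01')]
```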

Contextual Annotation

[Figure: the concept "vermogen van de erflater" (the deceased's estate) attached with dcterms:subject at four levels of the Successiewet (SW): the act as a whole, Hoofdstuk I (Chapter I), Hoofdstuk I, Artikel 10 (Article 10), and Art. 10, zin 1 (sentence 1).]

No nice background because Google Image search only returned boring images

6 Lower the Threshold

• Integrate Linked Data production into everyday tools

• Allow tools to do the work for you

• Use a built-in reward model

Image courtesy of http://themaisonette.net

Linked Data allows you to trace usage!

Wrap Legacy Systems

http://www.w3.org/TR/r2rml/
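R2RML is the W3C language for declaring how relational tables map to RDF, which lets a legacy database be exposed as Linked Data without changing it. The Python below is not an R2RML processor. It is a small stand-in for the same idea (a subject URI template, a class for every row, and column-to-predicate pairs), run over a hypothetical SQLite table and emitting N-Triples.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A stand-in for a legacy database; table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE regulation (bwb_id TEXT PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO regulation VALUES (?, ?)",
                 [("BWBR0011823", "Example title A"), ("BWBR0005416", "Example title B")])

# Mapping in the spirit of an R2RML TriplesMap: a subject template,
# a class for every row, and column-to-predicate pairs.
MAPPING = {
    "table": "regulation",
    "key": "bwb_id",
    "subject_template": "http://example.com/regulation/{bwb_id}",
    "class": "http://example.com/vocab#Regulation",
    "columns": {"title": "http://purl.org/dc/terms/title"},
}


def literal(value) -> str:
    # Escape backslashes, quotes and newlines so the value is a valid N-Triples string.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def wrap(conn, m):
    """Yield N-Triples lines for every row, leaving the legacy table untouched."""
    query = f"SELECT * FROM {m['table']} ORDER BY {m['key']}"  # trusted mapping only
    for row in conn.execute(query):
        values = {k: row[k] for k in row.keys()}
        s = f"<{m['subject_template'].format(**values)}>"
        yield f"{s} <{RDF_TYPE}> <{m['class']}> ."
        for column, predicate in m["columns"].items():
            yield f"{s} <{predicate}> {literal(row[column])} ."


for line in wrap(conn, MAPPING):
    print(line)
```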

Idea: use reward mechanisms of Web 2.0

• Lightweight Web Application

• Interface to API of existing data repositories

• Enrich metadata by linking to Linked Data resources

• Provide annotation services for data files

• Plugin based architecture

• Publish RDF metadata as a new data publication

http://linkitup.data2semantics.org

recoprov: Reconstruct provenance using Dropbox file edit history

[Figure: a reconstructed provenance graph with numbered nodes (0 to 24).]

Sara Magliacane and Paul Groth
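The core idea of reading provenance off an edit history can be sketched for the simplest case: a single file whose revisions form a linear chain. This is not recoprov's algorithm and does not call any Dropbox API. The revision list and identifiers are hypothetical. Only the PROV-O terms (prov:specializationOf, prov:generatedAtTime, prov:wasRevisionOf) come from a real vocabulary.

```python
def revision_chain(file_uri, revisions):
    """Turn a file's edit history into PROV statements.

    revisions: (revision id, ISO 8601 timestamp) pairs in any order; timestamps are
    assumed to share one format and time zone, so sorting the strings sorts by time.
    """
    triples = []
    previous = None
    for rev_id, when in sorted(revisions, key=lambda r: r[1]):
        version = f"{file_uri}#rev-{rev_id}"
        triples.append((version, "prov:specializationOf", file_uri))
        triples.append((version, "prov:generatedAtTime", when))
        if previous is not None:
            triples.append((version, "prov:wasRevisionOf", previous))
        previous = version
    return triples


# Hypothetical history, e.g. as exported from a file-sync service's revision list.
history = [("b", "2013-03-02T10:00:00Z"), ("a", "2013-03-01T09:00:00Z"),
           ("c", "2013-03-03T11:30:00Z")]
for t in revision_chain("ex:report.xlsx", history):
    print(t)
```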

plsheet

How are results calculated (1)? Automatic analysis of workflow in spreadsheets

Analyse dependencies between cells in complex spreadsheets

Martine de Vos, Jan Wielemaker and Willem van Hage

plsheet

Reconstruct and explain the workflow of computations

Martine de Vos, Jan Wielemaker and Willem van Hage
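The kind of cell-dependency analysis described here can be sketched in a few lines: extract A1-style references from each formula, build a dependency graph, and order the cells topologically. This is a generic illustration, not plsheet's implementation. It ignores ranges such as A1:A3, cross-sheet references, and function names that happen to look like cell references. Note that graphlib (Python 3.9+) raises CycleError on circular formulas.

```python
import re
from graphlib import TopologicalSorter

# Matches A1-style references, optionally with $ anchors (e.g. $B$2).
# Simplification: ranges such as A1:A3 are not expanded.
CELL_REF = re.compile(r"\$?([A-Z]{1,3})\$?([0-9]+)")


def dependencies(cells: dict) -> dict:
    """Map each cell to the set of cells its formula reads from."""
    graph = {}
    for name, content in cells.items():
        refs = set()
        if isinstance(content, str) and content.startswith("="):
            refs = {col + row for col, row in CELL_REF.findall(content)}
        graph[name] = refs
    return graph


def evaluation_order(cells: dict) -> list:
    """Order cells so that every cell comes after the cells it depends on."""
    return list(TopologicalSorter(dependencies(cells)).static_order())


sheet = {"A1": 10, "A2": 5, "B1": "=A1+A2", "C1": "=$B$1*2"}
print(dependencies(sheet)["C1"])  # {'B1'}
print(evaluation_order(sheet))
```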

Albert Meroño-Peñuela, Rinke Hoekstra, Laurens Rietveld, Christophe Guéret

TabLinker

http://www.cedar-project.nl

Semi-automatic RDF converter for eccentric spreadsheets

Linked Open Data

• The missing ★

• Repeatable Transformation

• Choose your Grain Size

• Mix ‘n Mash

• Contextualize!

• Lower the Threshold

... be sure to use it internally too!
