Introduction to Linked Data · sadl.kuleuven.be/docs/smeSpire_training...

1 Introduction to Linked Data Diederik Tirry (SADL KU Leuven)

2 Modules

1. Introduction

2. From data to information to knowledge

3. Towards the semantic web

4. Linked Data basics

5. Publishing Linked Data

6. Linked Data usage

7. Linked Data vs. Open Data

3 After the training you will be able to:

• Identify and describe the concepts of semantic web and linked data

• Identify and describe the building blocks of Linked Data

• Identify the different steps in publishing linked data

• Understand how linked data can be consumed

• Explain the difference between linked and open data

• Understand the benefits of linked data

4 Target Audience

This seminar is aimed at:

• managers, ICT strategists and professionals who need a basic understanding of Linked Data.

Prior knowledge:

• no explicit prerequisites.

5

Part 1

Introduction

7 Search for data - Sports

8 Search for data – Emergency Response

9 Search for data – Emergency Response

10 Search for data – Emergency Response

11 How to build such an application?

Approach 1:

• Site editors roam the Web for new facts

• They update the site manually

• And the site soon gets out of date

Approach 2:

• “Scrape” the sites with a program to extract the information, i.e. write some code to incorporate the new data

• Easily gets out of date again…

Approach 3:

• Write some code to incorporate the new data via APIs

• Easily gets out of date again…

12 How to build such a site?

Use external, public datasets

• Wikipedia, MusicBrainz, …

They are available as data

• not APIs or hidden on a Web site

• data can be extracted using, e.g., HTTP requests or standard queries

13

Part 2

From data to information to knowledge

14 Proper meaning of terms

Data, information & knowledge

Is there any distinction? What do these terms mean and how do they relate to each other?

15 Data

Data are a kind of raw material and can be considered as facts about the world.

Data can be a group of symbols, numbers, or writing.

16 Information

When data have been processed, they become information. Information is essentially data placed in a framework that gives them a useful meaning for the person who reads them.

Trendanalyzer, Hans Rosling

17 Knowledge

Knowledge is information that has been transformed into something that triggers people to act. It is the understanding of the rules needed to interpret information.

Using the previous example:

Hans Rosling could use his information to provide new insights on population growth

18 Distribution of data

The web as it is today !

• Data is delivered to us in the form of web pages: HTML with separate download links, or web applications.

• Documents that are linked to each other through the use of hyperlinks.

• Humans or machines can read these documents, but machines have difficulty extracting any meaning from these documents themselves.

19 Use of applications

20 Data on the web

• There is more and more data on the Web

government data, health related data, general knowledge, company information, flight information, restaurants,…

• More and more applications rely on the availability of that data

21 But… data are often in isolation, “silos”

22

Part 3

Towards the semantic web

23 Imagine…

A “Web” where • documents are available for download on the Internet • but there would be no hyperlinks among them

The problem is real !!!

24 Data on the web is not enough…

We need a proper infrastructure for a real Web of Data:

• data is available on the Web, accessible via standard Web technologies

• data are interlinked over the Web

• i.e. data can be integrated over the Web

This is where Semantic Web technologies come in !

25 I.e.,… connect the silos

26 Semantic web

The semantic web is an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a format that can be read and used by software agents, thus permitting them to find, share and integrate information more easily.

27 Web of data

The Web of Data is about enabling the access to this data, by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data, and put it together to do all kinds of things with it (permitted by the licence).

Machine-readable data (or metadata) is data in a format that can be interpreted by a computer.

2 types of machine-readable data:

• Human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;

• Data formats intended principally for computers, e.g. RDF, XML and JSON.

See also: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html http://linkeddatabook.com/editions/1.0/

28 Semantic web stack

29 Linked Data

• “A way of making the Semantic Web happen“ (it is hoped)

• Key concept: leverage the existence of structured data and combine it with the languages and infrastructures of the Web and the Semantic Web

https://en.wikipedia.org/wiki/Marie_Curie

http://dbpedia.org/page/Marie_Curie

30 Linked Data

31 Linked Data

“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”

The four design principles of Linked Data (by Tim Berners-Lee):

1. Use Uniform Resource Identifiers (URIs) as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).

4. Include links to other URIs so that they can discover more things.
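Principles 2 and 3 can be illustrated with a minimal sketch of how a client might prepare a lookup of an HTTP URI while asking for RDF rather than an HTML page. The helper name `build_lookup_request` is hypothetical, and the choice of `text/turtle` as the requested format is an assumption; whether a given server honours it depends on that server. The request is only constructed here, not sent.

```python
from urllib.request import Request


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF instead of HTML.

    The Accept header is how a client states which representation it prefers;
    the default media type is an assumption made for this sketch.
    """
    return Request(uri, headers={"Accept": media_type}, method="GET")


# A URI used elsewhere in these slides as the name of a thing.
request = build_lookup_request("http://dbpedia.org/resource/Belgium")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; that step is left out so the sketch runs without network access.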

32 The four principles in practice

33 Linked Open Data Cloud

34

Part 4

Linked data basics

35 Core components

Semantic technologies:

• URIs for naming things

• RDF for modelling data

• SPARQL for querying

• OWL for modelling concepts or ontologies

36

Linked data basics: URIs

37 Uniform Resource Identifier (URI)

“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”

• A person, e.g. Albert Einstein

http://dbpedia.org/resource/Albert_Einstein

• A country, e.g. Belgium

http://dbpedia.org/resource/Belgium

• A world heritage site, e.g. the Acropolis of Athens

http://dbpedia.org/resource/Acropolis_of_Athens

• A dataset, e.g. Fertility Indicators

http://open-data.europa.eu/en/data/dataset/03YMULVqadXL7IO6JZiBkQ

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris

38 Identify data items

[Figure: RDF graph. pd:cygri is a foaf:Person with foaf:name “Richard Cyganiak” and foaf:based_near dbpedia:Berlin.]

pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri

dbpedia:Berlin = http://dbpedia.org/resource/Berlin

From http://www.ai.sri.com/~nysmith/slides/aic-seminars/090724-bizer.ppt
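Prefixed names such as pd:cygri are shorthand for full URIs. A minimal sketch of that expansion follows; the `pd` and `dbpedia` namespaces are taken from the slide, the `foaf` namespace is the published FOAF one, and the function name `expand` is made up for this illustration.

```python
# Prefix table: "pd" and "dbpedia" come from the slide, "foaf" is the FOAF namespace.
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Turn a prefixed name like 'dbpedia:Berlin' into a full URI."""
    prefix, separator, local_part = name.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local_part


for short_name in ("pd:cygri", "dbpedia:Berlin", "foaf:name"):
    print(short_name, "=", expand(short_name))
```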

39 Resolving URIs over the web

[Figure: the same graph after looking up dbpedia:Berlin. pd:cygri (a foaf:Person with foaf:name “Richard Cyganiak”) is foaf:based_near dbpedia:Berlin; dbpedia:Berlin has dp:population 3.405.259 and skos:subject dp:Cities_in_Germany.]

From http://www.ai.sri.com/~nysmith/slides/aic-seminars/090724-bizer.ppt

40 Dereferencing URIs over the web

[Figure: the graph after also looking up dp:Cities_in_Germany. dbpedia:Hamburg and dbpedia:Muenchen have the same skos:subject, dp:Cities_in_Germany, as dbpedia:Berlin. pd:cygri (rdf:type foaf:Person, foaf:name “Richard Cyganiak”) is foaf:based_near dbpedia:Berlin, which has dp:population 3.405.259.]

From http://www.ai.sri.com/~nysmith/slides/aic-seminars/090724-bizer.ppt
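The idea of dereferencing one URI after another can be sketched without any network access. In the sketch below, `WEB` is a toy stand-in for the Web: each URI maps to the triples its document would return. The triples are a reading of the figure and are an assumption of this sketch, as are the names `WEB` and `crawl`.

```python
# Toy "web": looking up a URI returns the triples published at that URI.
WEB = {
    "pd:cygri": [("pd:cygri", "foaf:based_near", "dbpedia:Berlin")],
    "dbpedia:Berlin": [("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany")],
    "dp:Cities_in_Germany": [
        ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
        ("dbpedia:Muenchen", "skos:subject", "dp:Cities_in_Germany"),
    ],
}


def crawl(start: str, max_hops: int = 3) -> set:
    """Collect triples by repeatedly looking up every newly discovered URI."""
    seen, frontier, graph = set(), [start], set()
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            if uri in seen:
                continue
            seen.add(uri)
            for subject, predicate, obj in WEB.get(uri, []):
                graph.add((subject, predicate, obj))
                # Both ends of a triple are candidates for the next lookup.
                next_frontier.extend(n for n in (subject, obj) if n not in seen)
        frontier = next_frontier
    return graph


for triple in sorted(crawl("pd:cygri")):
    print(triple)
```

Starting from a single person URI, three hops are enough to learn about Hamburg and Muenchen, which were never mentioned in the starting document.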

41

Linked data basics: RDF

42 Resource Description Framework (RDF and RDFS)

RDF stands for:

– Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, dogs, products...

– Description: attributes, features, and relations of the resources

– Framework: model, languages and syntaxes for these descriptions

• RDF was published as a W3C recommendation in 1999.

• RDF was originally introduced as a data model for metadata.

• RDF was generalised to cover knowledge of all kinds.

43 Resource Description Framework (RDF and RDFS)

• The model is domain-neutral, application-neutral

• The model can be viewed as directed, labeled graphs or as an object-oriented model (object/attribute/value)

RDF data model is an abstract, conceptual layer independent of XML

consequently, XML is a transfer syntax for RDF, not a component of RDF

RDF data might never occur in XML form

44 Resource Description Framework (RDF and RDFS)

RDF breaks every piece of information down into triples:

• Subject – a resource, which may be identified with a URI.

• Predicate – the relationship, identified by a URI and ideally reused from an existing vocabulary.

• Object – a resource or literal to which the subject is related.

SPARQL is a standardised language for querying RDF data.

http://dbpedia.org/resource/Brussels is the capital of “Belgium”. OR

http://dbpedia.org/resource/Brussels is the capital of http://dbpedia.org/resource/Belgium.

Subject Predicate Object

See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
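The two variants of the Brussels statement differ only in the object: a literal (“Belgium”) or a resource (a URI). A minimal sketch of that distinction follows. The slide gives no URI for “is the capital of”, so `CAPITAL_OF` below is a hypothetical predicate, and the class and function names are made up for this illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]  # the object may be a resource or a literal


# Hypothetical predicate URI; the slide only says "is the capital of".
CAPITAL_OF = URI("http://example.org/vocab/isCapitalOf")
BRUSSELS = URI("http://dbpedia.org/resource/Brussels")

with_literal = Triple(BRUSSELS, CAPITAL_OF, Literal("Belgium"))
with_resource = Triple(BRUSSELS, CAPITAL_OF, URI("http://dbpedia.org/resource/Belgium"))


def is_linkable(triple: Triple) -> bool:
    """True when the object is a URI that can itself be looked up or described."""
    return isinstance(triple.obj, URI)


print(is_linkable(with_literal))   # False
print(is_linkable(with_resource))  # True
```

Only the second form lets a client continue from Brussels to further statements about Belgium, which is why linking to URIs rather than strings matters for Linked Data.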

45 RDF model example

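[Figure: RDF graph describing the resource http://www.w3.org/TR/REC-rdf-syntax/]

http://www.w3.org/TR/REC-rdf-syntax/ dc:Creator “Ora Lassila”

http://www.w3.org/TR/REC-rdf-syntax/ dc:Date “1999-02-22”

http://www.w3.org/TR/REC-rdf-syntax/ dc:Publisher “W3C”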

46 RDF model example

• Nike, Dahliastraat 24, 2160 Wommelgem

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rov="http://www.w3.org/ns/regorg#"
         xmlns:org="http://www.w3.org/ns/org#"
         xmlns:locn="http://www.w3.org/ns/locn#">
  <!-- The organisation, with its legal name and a link to its registered site -->
  <rov:RegisteredOrganization rdf:about="http://example.com/org/2172798119">
    <rov:legalName>Nike</rov:legalName>
    <org:hasRegisteredSite rdf:resource="http://example.com/site/1234"/>
  </rov:RegisteredOrganization>
  <!-- The address of that site -->
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>Dahliastraat 24, 2160 Wommelgem</locn:fullAddress>
  </locn:Address>
</rdf:RDF>

47 RDF serialization formats

• RDF/XML: the original XML syntax for RDF, and for a long time the only syntax standardised by W3C.

• N3 (Notation 3): a non-XML serialization of RDF models, designed to be easier to write by hand and, in some cases, easier to follow.

• Turtle (Terse RDF Triple Language): a format for expressing RDF data with a syntax similar to that of SPARQL; a W3C Recommendation since 2014.

• N-Triples: a line-based, plain-text serialisation format for RDF graphs, and a subset of the Turtle format.

• JSON-LD: JSON (JavaScript Object Notation) is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. JSON-LD, a W3C Recommendation since 2014, uses it to serialise Linked Data.
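To make the notion of a serialization concrete, here is a minimal sketch that writes the triples of the Nike example as N-Triples, the simplest of these formats (one triple per line). The function names are made up for this illustration, the namespace URIs are the published W3C namespaces of the vocabularies, and the escaping is deliberately simplified: it handles backslashes, double quotes and newlines only.

```python
def nt_term(term: str, is_literal: bool = False) -> str:
    """Format a URI as <...> or a plain literal as "..." (simplified escaping)."""
    if is_literal:
        escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object, object_is_literal) tuples, one per line."""
    lines = []
    for subject, predicate, obj, obj_is_literal in triples:
        lines.append(f"{nt_term(subject)} {nt_term(predicate)} {nt_term(obj, obj_is_literal)} .")
    return "\n".join(lines)


ORG = "http://example.com/org/2172798119"
SITE = "http://example.com/site/1234"

nike = [
    (ORG, "http://www.w3.org/ns/regorg#legalName", "Nike", True),
    (ORG, "http://www.w3.org/ns/org#hasRegisteredSite", SITE, False),
    (SITE, "http://www.w3.org/ns/locn#fullAddress", "Dahliastraat 24, 2160 Wommelgem", True),
]

print(to_ntriples(nike))
```

The same three triples could equally be written as RDF/XML, Turtle or JSON-LD; the graph stays the same and only the notation changes.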

48

Linked data basics: SPARQL

49 SPARQL

SPARQL is the standard language to query graph data represented as RDF triples.

• SPARQL Protocol and RDF Query Language

• One of the three core standards of the Semantic Web, along with RDF and OWL.

• SPARQL can be used to query and update RDF data.

• Became a W3C standard January 2008.

• SPARQL 1.1 became a W3C Recommendation in March 2013.

50 Query types

• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...

• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.

• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)

• ASK: Are there any X, Y, etc. satisfying the following conditions ...

51 Query example

Return the name of the organisation whose registered site has a particular URI.

Sample data

comp:A rov:legalName "Niké" .
comp:A org:hasRegisteredSite site:1234 .
comp:B rov:legalName "BARCO" .
site:1234 locn:fullAddress "Dahliastraat 24, 2160 Wommelgem" .

Query

PREFIX comp: <http://example.org/org/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX site: <http://example.org/site/>
PREFIX rov: <http://www.w3.org/ns/regorg#>
SELECT ?name
WHERE {
  ?x org:hasRegisteredSite site:1234 .
  ?x rov:legalName ?name .
}

Result

name
"Niké"
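How a query engine arrives at this result can be sketched as pattern matching over triples: each pattern in the WHERE clause is matched against the data, and variable bindings must agree across patterns. The sketch below is a toy evaluator, not a SPARQL implementation; prefixed names are kept as plain strings, the property is written `rov:legalName` as in the Registered Organization vocabulary, and all function names are made up for this illustration.

```python
DATA = [
    ("comp:A", "rov:legalName", "Niké"),
    ("comp:A", "org:hasRegisteredSite", "site:1234"),
    ("comp:B", "rov:legalName", "BARCO"),
    ("site:1234", "locn:fullAddress", "Dahliastraat 24, 2160 Wommelgem"),
]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def extend(binding: dict, pattern: tuple, triple: tuple):
    """Return the binding extended so that pattern matches triple, or None on a clash."""
    result = dict(binding)
    for pattern_term, value in zip(pattern, triple):
        if is_variable(pattern_term):
            if result.get(pattern_term, value) != value:
                return None  # variable already bound to something else
            result[pattern_term] = value
        elif pattern_term != value:
            return None  # constant term does not match
    return result


def solve(patterns, data):
    """Match all patterns in turn, keeping only mutually consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def select(variables, patterns, data):
    """SELECT: a table of values for the requested variables."""
    return [tuple(b[v] for v in variables) for b in solve(patterns, data)]


def ask(patterns, data) -> bool:
    """ASK: is there at least one solution?"""
    return bool(solve(patterns, data))


WHERE = [("?x", "org:hasRegisteredSite", "site:1234"), ("?x", "rov:legalName", "?name")]
print(select(["?name"], WHERE, DATA))
print(ask([("comp:B", "org:hasRegisteredSite", "?site")], DATA))
```

The shared variable `?x` is what joins the two patterns: only comp:A has the requested site, so only its legal name is returned.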

52

Linked data basics: RDF Schema

53 RDF schema

• First step towards the “extra knowledge”: terms, restrictions, relationships…

• Defines a small vocabulary for RDF:

o Class, subClassOf, type

o Property, subPropertyOf

o domain, range

• The vocabulary can be used to define other vocabularies for your application domain

[Figure: example RDFS graph. Student and Researcher are each subClassOf Person; Jeen and Frank are instances (type) of these classes; the property hasSuperVisor has a domain and a range and connects Jeen and Frank.]
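What subClassOf, domain and range buy you is inference: facts that were never stated can be derived. A minimal sketch under stated assumptions follows. The figure does not survive in this transcript, so the reading that hasSuperVisor has domain Student and range Researcher, and that Jeen is supervised by Frank, is an assumption; the `ex:` names and the function `rdfs_closure` are made up, and only three of the RDFS entailment rules are covered.

```python
def rdfs_closure(triples):
    """Apply the subClassOf, domain and range rules until nothing new is derived."""
    inferred = set(triples)
    while True:
        new = set()
        for subject, predicate, obj in inferred:
            for s2, p2, o2 in inferred:
                # type + subClassOf: an instance of a class is an instance of its superclass
                if predicate == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == obj:
                    new.add((subject, "rdf:type", o2))
                # domain: using a property types the subject
                if p2 == "rdfs:domain" and s2 == predicate:
                    new.add((subject, "rdf:type", o2))
                # range: using a property types the object
                if p2 == "rdfs:range" and s2 == predicate:
                    new.add((obj, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


SCHEMA_AND_DATA = [
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Researcher", "rdfs:subClassOf", "ex:Person"),
    ("ex:hasSuperVisor", "rdfs:domain", "ex:Student"),       # assumed reading of the figure
    ("ex:hasSuperVisor", "rdfs:range", "ex:Researcher"),     # assumed reading of the figure
    ("ex:Jeen", "ex:hasSuperVisor", "ex:Frank"),
]

closure = rdfs_closure(SCHEMA_AND_DATA)
for triple in sorted(closure - set(SCHEMA_AND_DATA)):
    print(triple)
```

From the single data triple about Jeen and Frank, the schema is enough to conclude that Jeen is a Student, Frank is a Researcher, and both are Persons.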

54 RDF schema in XML

55 RDFS constraints

• RDFS is a framework allowing:

– typing, subtyping

– properties to be put in a hierarchy

– datatypes to be defined

• RDFS is sufficient for many vocabularies, but not for all!

• Complex applications may want more possibilities. Can a program reason about some terms?

E.g.: “if «Person» resources «A» and «B» have the same «foaf:email» property, then «A» and «B» are identical”
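The rule in this example goes beyond RDFS, but it is easy to sketch what a program applying it would do: group resources by the value of the identifying property and report groups with more than one member. The property name `foaf:email` is used exactly as on the slide; the `ex:` resources, the mailbox values and the function name are hypothetical.

```python
from collections import defaultdict


def identical_by_property(triples, key_property="foaf:email"):
    """Group subjects that share a value for key_property (treated as identifying)."""
    subjects_by_value = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == key_property:
            subjects_by_value[obj].add(subject)
    # Only values shared by two or more resources tell us something new.
    return [sorted(group) for group in subjects_by_value.values() if len(group) > 1]


PEOPLE = [
    ("ex:A", "foaf:email", "mailto:a.person@example.org"),
    ("ex:B", "foaf:email", "mailto:a.person@example.org"),
    ("ex:C", "foaf:email", "mailto:someone.else@example.org"),
]

print(identical_by_property(PEOPLE))
```

RDFS has no way to declare that a property identifies its subject in this manner; that is the kind of extra expressiveness OWL adds.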

56

Linked data basics: OWL and ontologies

57 OWL and ontologies

• OWL = Web Ontology Language

• An ontology is a conceptual model.

• An Ontology is the collection of semantic definitions for a domain.

– Example: an Aircraft Ontology is the set of semantic definitions for the Aircraft domain, e.g.

• Predator is a subClassOf Aircraft.

• sensorID is a FunctionalProperty.

• Platform is an equivalentClass to Aircraft.

– Predator, Aircraft etc. are concepts.

• OWL is complex. It is a large set of additional terms and allows for logical axioms.

58 Basic idea of conceptual modeling

The semiotic triangle (not only in Semantic Web)

59 Ontologies

Communities of users (application builders, ...) can

• Search for ontologies

• Re-use existing ontologies

– Established domain-specific ontologies (e.g., real-estate, medicine, bioinformatics)

– „The big one“: Cyc, see www.cyc.com

• Link to existing ontologies

• Extend existing ontologies

60 Ontologies as conceptual model

Or Database (knowledge base) = Ontology + Instances

BookCatalogue

title | author | date
My Life and Times | Paul McCartney | June, 1998
Illusions | Richard Bach | 1972
First and Last Freedom | J. Krishnamurti | 1974

<owl:Class rdf:ID="BookCatalogue"/>

<owl:DatatypeProperty rdf:ID="title">

<rdfs:domain rdf:resource="#BookCatalogue"/>

<rdfs:range rdf:resource="&xsd;#string"/>

</owl:DatatypeProperty>

<owl:DatatypeProperty rdf:ID="author">

<rdfs:domain rdf:resource="#BookCatalogue"/>

<rdfs:range rdf:resource="&xsd;#string"/>

</owl:DatatypeProperty>

<owl:DatatypeProperty rdf:ID="date">

<rdfs:domain rdf:resource="#BookCatalogue"/>

<rdfs:range rdf:resource="&xsd;#date"/>

</owl:DatatypeProperty>

<?xml version="1.0"?>

<BookCatalogue>

<title>My Life and Times</title>

<author>Paul McCartney</author>

<date>June, 1998</date>

</BookCatalogue>

61 Popular ontologies or vocabularies

• Friend-of-a-Friend (FOAF): vocabulary for describing people

• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth...

• DOAP: vocabulary for describing projects

• ADMS.SW: vocabulary for describing open source software projects

• ADMS: vocabulary for describing interoperability assets

• Dublin Core: defines general metadata attributes

• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register

• Organization Ontology: ontology for describing the structure of organizations

• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location

• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration

• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft

See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies

62 Typical usage of owl:sameAs

Linking from one data set (DBpedia) to the other (Geonames):

This is a major mechanism of “Linking” in the Linked Open Data project

<http://dbpedia.org/resource/Amsterdam>

owl:sameAs <http://sws.geonames.org/2759793> .
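Because owl:sameAs is symmetric and transitive, a consumer can merge chains of such links into groups of URIs that all name the same thing. A minimal union-find sketch follows. The first link is the one from the slide; the second link and its `example.org` URI are hypothetical, added only to show the transitive case, and the function name is made up for this illustration.

```python
def same_as_clusters(links):
    """Group URIs connected by owl:sameAs links (symmetric, transitive)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps chains short
            uri = parent[uri]
        return uri

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


LINKS = [
    ("http://dbpedia.org/resource/Amsterdam", "http://sws.geonames.org/2759793"),
    # Hypothetical third dataset linking to Geonames only:
    ("http://example.org/place/amsterdam", "http://sws.geonames.org/2759793"),
]

for cluster in same_as_clusters(LINKS):
    print(sorted(cluster))
```

Although the hypothetical third dataset never mentions DBpedia, its URI ends up in the same group, so statements from all three sources can be combined.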

63

Part 5

Publishing Linked Data

64 5-star schema of Linked (Open) Data

• Make your stuff available on the Web (whatever format) under an open license.

• Make it available as structured data (e.g., Excel instead of image scan of a table)

• Use non-proprietary formats (e.g., CSV instead of Excel)

• Use URIs to denote things, so that people can point at your stuff

• Link your data to other data to provide context
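The five steps are cumulative: each star presupposes the ones before it. A minimal sketch of that rule as a rating function follows; the `Dataset` fields and the function name `stars` are made up for this illustration and simply mirror the five bullets above.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_with_open_licence: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def stars(dataset: Dataset) -> int:
    """Count stars in order, stopping at the first criterion that is not met."""
    criteria = [
        dataset.on_web_with_open_licence,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    count = 0
    for is_met in criteria:
        if not is_met:
            break
        count += 1
    return count


scanned_table = Dataset(True, False, False, False, False)
excel_sheet = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)

print(stars(scanned_table), stars(excel_sheet), stars(csv_file))  # 1 2 3
```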

65

★ Make your stuff available on the Web under an open licence

66

Pros & cons of ★ open data

As a consumer...

• You can look at it.

• You can store it locally.

• You can enter the data into any other system.

• You can change the data.

• You can share the data with anyone.

As a publisher...

• It is simple to publish.

• You do not have to explain repeatedly to others that they can use your data.

67

★ ★ Make it available as structured data

68

Pros & cons of ★ ★ open data

All the benefits of ★ open data; plus

As a consumer...

• You can directly process it with proprietary software to aggregate it, perform calculations, visualise it, etc.

• You can export it into another (structured) format.

As a publisher...

• It is still simple to publish.

69

★ ★ ★ Use non-proprietary formats

• Proprietary: Excel, Word, PDF...

• Non-proprietary: XML, CSV, RDF, JSON, ODF...

• Example dataset: Road safety, Accidents 2006

70

Pros & cons of ★ ★ ★ open data

All the benefits of ★ ★ open data; plus

As a consumer...

• You can manipulate the data in any way you like, without being confined by the capabilities of any particular software.

As a publisher...

• It is still simple to publish.

- But you do need converters or plug-ins to export the data from the proprietary format.

71

★ ★ ★ ★ Use URIs to denote things

For example, creating a URI for one of the units of the Greek Ministry of Administrative Reform and e-Governance:

http://org.testproject.eu/id/office/office-of-the-deputy-minister-for-administrative-reform-and-e-governance

See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
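One way such a URI could be minted is by turning a human-readable label into a lowercase, hyphen-separated slug and appending it to a fixed base. This is a minimal sketch, not the method actually used for the example URI: the base is taken from that URI, the label text is inferred from it (an assumption), and the function names are made up.

```python
import re
import unicodedata

# Base taken from the example URI on this slide.
BASE = "http://org.testproject.eu/id/office/"


def slugify(label: str) -> str:
    """Lowercase the label, drop accents, and join the words with hyphens."""
    ascii_label = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def mint_uri(label: str, base: str = BASE) -> str:
    """Build a URI for the thing named by label under the given base."""
    return base + slugify(label)


# Label inferred from the example URI (assumption).
label = "Office of the Deputy Minister for Administrative Reform and e-Governance"
print(mint_uri(label))
```

Deriving the URI from a stable label or identifier, rather than from a file name or a technology-specific path, is one way to keep it persistent when the underlying system changes.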

72

Pros & cons of ★ ★ ★ ★ open data

All the benefits of ★ ★ ★ open data; plus

As a consumer...

• You can link to it from any other place.

• You can bookmark it.

• You can reuse parts of the data.

• You may be able to reuse existing tools and libraries.

• You can combine the data safely with other data.

- But understanding the technology requires effort and can have a steep learning curve.

As a publisher...

• You have fine-granular control over the data items and can optimise their access.

• Other data publishers can now link into your data, promoting it to 5 star.

• You will be able to reuse vocabularies, data and metadata, and URI design patterns instead of creating them from scratch.

- But you typically need to invest some time in slicing and dicing your data.


★ ★ ★ ★ ★ Link your data to other data to provide context


Pros & cons of ★ ★ ★ ★ ★ open data

All the benefits of ★ ★ ★ ★ open data; plus

As a consumer...

• You can discover more (related) data while consuming the data.

• You can directly learn about the data schema.

• You can combine data from different sources, be innovative, gain new knowledge, be an entrepreneur...

- But you now have to deal with broken data links. Not all publishers/data sources will be reliable.

As a publisher...

• You make your data discoverable.

• You increase the context, expressivity, quality and value of your data (and consequently you give visibility to your organisation).

- But this requires an investment in time, money, technology and competencies/skills.


Part 6

Linked Data usage

Storing, accessing, combining and inferencing SW data

How is LD stored?

• Simple standalone RDF files

• In "Semantic Web / LOD databases": triplestores

– A triplestore is a purpose-built database for the storage and retrieval of Resource Description Framework (RDF) metadata.

– A triplestore can store very large numbers of RDF triples (up to billions).

– For a list of implementations, see http://en.wikipedia.org/wiki/Triplestore
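To make the idea of a triplestore more concrete, here is a toy in-memory version in plain Python. It only illustrates the core operations (adding triples and retrieving them by pattern); the class name `TinyTripleStore` and the `ex:` identifiers are invented for this sketch, and production triplestores add indexing, persistence and SPARQL support on top.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Toy store: a set of (subject, predicate, object) tuples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        for triple in sorted(self._triples):
            if all(want is None or want == have for want, have in zip(pattern, triple)):
                yield triple


store = TinyTripleStore()
# Illustrative triples; the ex: identifiers are hypothetical.
store.add("ex:office1", "rdf:type", "org:OrganizationalUnit")
store.add("ex:office1", "rdfs:label", "Office of the Deputy Minister")
store.add("ex:ministry", "org:hasUnit", "ex:office1")

about_office = list(store.match(subject="ex:office1"))
units = [o for _, _, o in store.match(predicate="org:hasUnit")]
print(about_office)
print(units)
```

Pattern matching with wildcards is also the basic building block of SPARQL queries, which combine several such patterns.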

How is LD accessed?

– By search engines that can extract the markup from Web pages

• e.g., Google

– By search engines that directly access triplestores

• e.g. Sindice

– By your own applications that directly access triplestores

• The data can then also be transformed into other RDF serialisations (e.g. RDFa) or into human-readable web pages; see the following example.

Example RDF->HTML
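One way to picture the RDF-to-HTML step is a small function that turns all triples about one resource into a table. The sketch below uses only the Python standard library; the function name `render_resource`, the `example.org` URIs and the sample triples are assumptions made for illustration.

```python
from html import escape
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def render_resource(subject: str, triples: Iterable[Triple]) -> str:
    """Render every triple about `subject` as one row of an HTML table."""
    rows = [
        f"  <tr><th>{escape(p)}</th><td>{escape(o)}</td></tr>"
        for s, p, o in triples
        if s == subject
    ]
    return "\n".join([f"<h1>{escape(subject)}</h1>", "<table>", *rows, "</table>"])


# Hypothetical sample data.
TRIPLES = [
    ("http://example.org/id/office/1", "rdfs:label", "Administrative Reform & e-Governance"),
    ("http://example.org/id/office/1", "rdf:type", "org:OrganizationalUnit"),
    ("http://example.org/id/office/2", "rdfs:label", "Another office"),
]

page = render_resource("http://example.org/id/office/1", TRIPLES)
print(page)
```

Escaping the values matters because literals in the data may contain characters such as `&` or `<` that would otherwise break the page.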

How can LD be combined?

What does the combination/integration of this information require?

– "Linkability" at the technical level: see Linked Data principles

– "Linkability" at the semantic level of identity: sameAs

– "Linkability" at the semantic level of more complex relationships: schema / ontology matching
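The identity level can be sketched in a few lines: when two datasets use different URIs for the same thing and an owl:sameAs link connects them, their statements can be merged under one identifier. The code below is an illustrative simplification that uses a small union-find structure; the URIs, the `ex:` predicates and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def canonical_map(triples: Iterable[Triple]) -> Dict[str, str]:
    """Map every URI that takes part in a sameAs link to one representative."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Keep the lexicographically smallest URI as representative.
                parent[max(root_s, root_o)] = min(root_s, root_o)
    return {x: find(x) for x in list(parent)}


def merge(triples: List[Triple]) -> Set[Triple]:
    """Rewrite all triples to use the representative URIs; drop the sameAs links."""
    canon = canonical_map(triples)
    return {
        (canon.get(s, s), p, canon.get(o, o))
        for s, p, o in triples
        if p != SAME_AS
    }


# Two hypothetical datasets describing the same city under different URIs.
a = "http://example.org/a/Leuven"
b = "http://example.com/b/city/123"
data = [
    (a, "ex:label", "Leuven"),
    (b, "ex:country", "Belgium"),
    (a, SAME_AS, b),
]

merged = merge(data)
print(sorted(merged))
```

Schema or ontology matching addresses the harder case where the predicates and classes differ as well, which this sketch does not attempt.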

Inferencing

Inference is the act or process of deriving logical conclusions from premises known or assumed to be true.

Deductive reasoning

• All swans are white.

• Tilly is a swan.

Therefore: Tilly is white.

• Truth-preserving!

Inductive reasoning

• Tilly and Edda and Edwin and … are swans.

• Tilly and Edda and Edwin and … are white.

Therefore: All swans are white.

• "Bringing new knowledge into the world" (but not truth-preserving)

OWL properties can be…

– Functional

– Inverse functional

– The inverse of another property

– Transitive

– Symmetric

– Asymmetric

– Reflexive

– Irreflexive

… and this allows for inferences on individuals
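As a small illustration of such inferences, the sketch below derives new statements from transitive and symmetric properties by repeating the two rules until nothing new appears. It is a simplified stand-in for what an OWL reasoner does; the property names `locatedIn` and `borders` and the function `infer` are chosen for this example only.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def infer(
    triples: Iterable[Triple],
    transitive: FrozenSet[str] = frozenset(),
    symmetric: FrozenSet[str] = frozenset(),
) -> Set[Triple]:
    """Apply the symmetric and transitive rules until a fixed point is reached."""
    facts = set(triples)
    while True:
        derived: Set[Triple] = set()
        for s, p, o in facts:
            if p in symmetric:
                # p(s, o) implies p(o, s)
                derived.add((o, p, s))
            if p in transitive:
                # p(s, o) and p(o, o2) imply p(s, o2)
                derived.update(
                    (s, p, o2) for s2, p2, o2 in facts if p2 == p and s2 == o
                )
        if derived <= facts:
            return facts
        facts |= derived


asserted = [
    ("Leuven", "locatedIn", "Flanders"),
    ("Flanders", "locatedIn", "Belgium"),
    ("Belgium", "borders", "France"),
]

facts = infer(
    asserted,
    transitive=frozenset({"locatedIn"}),
    symmetric=frozenset({"borders"}),
)
print(sorted(facts - set(asserted)))
```

The two inferred statements were never stated explicitly; they follow only from the declared characteristics of the properties.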

Inference example

Class(a:bus_driver complete intersectionOf(
    a:person
    restriction(a:drives someValuesFrom(a:bus))))

Class(a:driver complete intersectionOf(
    a:person
    restriction(a:drives someValuesFrom(a:vehicle))))

Class(a:bus partial a:vehicle)

Conclusion: bus drivers are drivers.
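The reasoning behind this conclusion can be mimicked with a very small structural check: a class defined as "a person who drives some X" is subsumed by "a person who drives some Y" whenever X is a subclass of Y. The sketch below hard-codes that single pattern and is only meant to show the idea; real OWL reasoners handle far more general class expressions, and all names in the code are assumptions.

```python
from typing import Dict, Set, Tuple

# Class(a:bus partial a:vehicle): every bus is a vehicle.
SUBCLASS_OF: Dict[str, Set[str]] = {"bus": {"vehicle"}}

# Complete definitions of the form: base class AND (property some filler).
DEFINITIONS: Dict[str, Tuple[str, str, str]] = {
    "bus_driver": ("person", "drives", "bus"),
    "driver": ("person", "drives", "vehicle"),
}


def is_subclass(sub: str, sup: str) -> bool:
    """True if `sub` equals `sup` or reaches it through asserted subclass links."""
    if sub == sup:
        return True
    return any(is_subclass(parent, sup) for parent in SUBCLASS_OF.get(sub, ()))


def subsumed_by(c: str, d: str) -> bool:
    """Structural check: is every instance of class c also an instance of d?"""
    base_c, prop_c, filler_c = DEFINITIONS[c]
    base_d, prop_d, filler_d = DEFINITIONS[d]
    return (
        is_subclass(base_c, base_d)
        and prop_c == prop_d
        and is_subclass(filler_c, filler_d)
    )


print(subsumed_by("bus_driver", "driver"))
print(subsumed_by("driver", "bus_driver"))
```

The check succeeds in one direction only: every bus driver is a driver, but a driver of an arbitrary vehicle is not necessarily a bus driver.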

Use of ontology concepts in LOD


Part 7

Linked Data vs. Open Data

Opening Data

• Increasing trend for all things ‘open’ in Europe.

• Driven by two very different motives:

– Need for transparency of administration

– Fuel for economic growth

• Removes many obstacles to sharing data

• Inspired by the USA

What is Open Data (OD)?

“A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.” --opendefinition.org

In summary, this means the following:

• Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.

• Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets.

• Universal Participation: everyone must be able to use, reuse and redistribute - there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.

What is Open Government Data (OGD)?

Open government data means:

– Data produced or commissioned by government or government controlled entities.

– Data which is open as defined in the Open Definition – that is, it can be freely used, reused and redistributed by anyone.

– Data that is not sensitive or private.

Source:[http://data.gov.uk/data]

Source:[http://publicdata.eu/]

Expected benefits of OGD

Transparency. Citizens need to know what their government is doing. They need to be able to freely access government data and information and to share that information with other citizens. Sharing and reuse allow the data to be analysed and visualised, which creates more understanding.

Releasing social and commercial value. Data is a key resource for social and commercial activities. Government creates or holds a large amount of information. Open government data can help drive the creation of innovative business and services that deliver social and commercial value.

Participatory governance. Open Data enables citizens to be much more directly informed and involved in decision-making, and facilitates their contribution to the process of governance.

Reducing government costs. Open Data enables the sharing of information within governments in machine-readable interoperable formats, hence reducing costs of information exchange and data integration. Governments themselves are the biggest reusers of Open Government Data.

Linked Data vs Open Data

Open data

Data can be published and be publicly available under an open licence without linking to other data sources.

Linked data

Data can be linked to URIs from other data sources, using open standards such as RDF without being publicly available under an open licence.

See also: Cobden et al., A research agenda for Linked Closed Data http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf


Examples of Linked data initiatives


DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT – Agenzia per l’Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems and the Classifications for the data in Public Administration.

NL – Building and address register

The Dutch Address and Buildings base register published as linked data.

UK – Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.

UK – Companies House

Publishing basic company details as linked data using a simple URI for each company in their database.


Non-governmental applications

Conclusions

• Linked data is a set of design principles for sharing machine-readable data on the Web.

• Linked data and open data are not the same.

• URIs, RDF, OWL and SPARQL form the foundational layer for Linked data.

• Linked data offers a number of advantages:

o Data integration with little impact on legacy systems;

o Semantic interoperability;

o Creativity and innovation through context and knowledge creation.

Discussion time

Grateful thanks and acknowledgements to

• W3C

– Ivan Herman, Introduction to the Semantic Web (2011 Semantic Technologies Conference, 6 June 2011, San Francisco, CA, USA)

• European Commission

– Open Data Support training modules (https://joinup.ec.europa.eu/community/ods/document/online-training-material)

• Bettina Berendt, KU Leuven

– Course Knowledge and the Web, 1st semester 2013/2014, http://www.cs.kuleuven.be/~berendt/teaching/

• Bart van Leeuwen, Netage.NL

– http://www.slideshare.net/semanticfire