Linked Data and Services

KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association Institute AIFB www.kit.edu Linked Data and Services Andreas Harth and Barry Norton

description

Presentation of Linked Open Services and Linked Data Services at the PlanetData kick-off

Transcript of Linked Data and Services

Page 1: Linked Data and Services

KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association

Institute AIFB

www.kit.edu

Linked Data and Services

Andreas Harth and Barry Norton

Page 2: Linked Data and Services

Outline

Motivation

Linked Data Principles

Query Processing over Linked Data

Linked Data Services (LIDS) and Linked Open Services (LOS)

Conclusion

Page 3: Linked Data and Services

Motivation

Semantic Web/Linked Data technologies are well-suited for data integration

11.04.2023

Data Integration

Interactive Data Exploration

Common Data Format/Access Protocol

!?

Page 4: Linked Data and Services

Linked Data Principles*

1. Use URIs to name things; not only documents, but also people, locations, concepts, etc.

2. To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs

3. When someone looks up a URI we provide useful information; with 'useful' in the strict sense we usually mean structured data in RDF.

4. Include links to other URIs allowing agents (machines and humans) to discover more things

(*) http://www.w3.org/DesignIssues/LinkedData.html

Page 5: Linked Data and Services

Correspondence between thing-URI and source-URI

User Agent

Web Server

http://www.polleres.net/foaf.rdf#me

http://www.polleres.net/foaf.rdf

HTTP GET

RDF

Page 6: Linked Data and Services

Correspondence between thing-URI and source-URI

User Agent

Web Server

http://dbpedia.org/resource/Gordon_Brown

http://dbpedia.org/data/Gordon_Brown

HTTP GET

303 redirect

HTTP GET

RDF

http://dbpedia.org/page/Gordon_Brown
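The two slides above show the two common ways a thing-URI maps to a source-URI: a hash URI, where the client drops the fragment before the GET, and a slash URI, where the server answers with a 303 redirect. The following is a minimal sketch of that mapping. The function name `source_uri` and the in-memory table `REDIRECTS_303` are assumptions made for illustration; a real client would issue the HTTP request and read the `Location` header instead of consulting a table.

```python
from urllib.parse import urldefrag

# Hypothetical stand-in for the server's 303 responses (RDF variant only).
REDIRECTS_303 = {
    "http://dbpedia.org/resource/Gordon_Brown": "http://dbpedia.org/data/Gordon_Brown",
}


def source_uri(thing_uri):
    """Return the URI of the document expected to describe thing_uri."""
    document, fragment = urldefrag(thing_uri)
    if fragment:
        # Hash URI: the fragment is never sent to the server.
        return document
    # Slash URI: follow the 303 redirect if one is known, else keep the URI.
    return REDIRECTS_303.get(thing_uri, thing_uri)


print(source_uri("http://www.polleres.net/foaf.rdf#me"))
print(source_uri("http://dbpedia.org/resource/Gordon_Brown"))
```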

Page 7: Linked Data and Services

Page 8: Linked Data and Services

Queries over Linked Data

SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n.}

?f ?n

SELECT ?x1 ?x2 WHERE { dblppub:HoganHP08 dc:creator ?a1. ?x1 owl:sameAs ?a1. ?x2 foaf:knows ?x1. }

Page 9: Linked Data and Services

Data warehousing or materialisation-based approaches (MAT)

Querying Data Across Sources

15.03.2010

CRAWL INDEX SERVE

SELECT * FROM…

R S

Distributed query processing approaches (DQP)

R S

Page 10: Linked Data and Services

DQP on Linked Data

[Figure: a relational query (SELECT * FROM…) over relations R and S accessed via ODBC, next to a SPARQL query (SELECT ?s WHERE…) over triple patterns (TP) accessed via HTTP GET]

Page 11: Linked Data and Services

Query Processing Overview

Andreas Harth

Data Summaries for On-Demand Queries over Linked Data

TP(an:f#ah foaf:knows ?f)

SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n.}

TP(?f foaf:name ?n)

?f ?n

http://danbri.org/foaf.rdf#danbri Dan Brickley

Select source(s) for each triple pattern

HTTP GET

RDF
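To make the overview concrete, here is a small sketch of how the two triple patterns of the example query could be joined once the RDF has been retrieved. It uses a plain nested-loop join over an in-memory list of triples; the helper names `match` and `evaluate` are assumptions for illustration, source selection and HTTP retrieval are left out, and the triples are the ones implied by the result row on the slide.

```python
# Triples assumed to have been retrieved from the selected sources.
TRIPLES = [
    ("an:f#ah", "foaf:knows", "http://danbri.org/foaf.rdf#danbri"),
    ("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley"),
]

QUERY = [
    ("an:f#ah", "foaf:knows", "?f"),
    ("?f", "foaf:name", "?n"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns, triples):
    """Join the triple patterns left to right with a nested-loop join."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


for row in evaluate(QUERY, TRIPLES):
    print(row["?f"], row["?n"])
```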

Page 12: Linked Data and Services

Problem: Source Selection for Triple Patterns

(?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o)

Given a triple pattern, which source can contribute bindings for the triple pattern?

Page 13: Linked Data and Services

Keep index of properties and/or classes contained in sources

(?s #p ?o), (?s rdf:type #o)

Covers only queries containing schema-level elements

Commonly used properties select potentially too many sources

Schema-Level Indices [Stuckenschmidt et al. 2004]

SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n.}

SELECT ?x1 ?x2 WHERE { dblppub:HoganHP08 dc:creator ?a1. ?x1 owl:sameAs ?a1. ?x2 foaf:knows ?x1. }
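A schema-level index as described on this slide can be sketched as a map from properties and classes to the sources that use them. The code below is an illustrative assumption, not the cited authors' implementation; the function names and the example.org sources are made up. It also shows the limitation named above: a pattern without a schema-level element cannot be answered from the index.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_schema_index(source_triples):
    """Map each property and each class to the set of sources that use it."""
    index = defaultdict(set)
    for source, triples in source_triples.items():
        for _s, p, o in triples:
            index[("property", p)].add(source)
            if p == RDF_TYPE:
                index[("class", o)].add(source)
    return index


def select_sources(index, pattern):
    """Return candidate sources, or None if the index cannot help."""
    _s, p, o = pattern
    if p == RDF_TYPE and not o.startswith("?"):
        return set(index.get(("class", o), set()))       # (?s rdf:type #o)
    if not p.startswith("?"):
        return set(index.get(("property", p), set()))    # (?s #p ?o)
    return None  # no schema-level element in the pattern


# Hypothetical sources and contents.
SOURCES = {
    "http://example.org/a.rdf": [
        ("ex:a", "foaf:knows", "ex:b"),
        ("ex:a", RDF_TYPE, "foaf:Person"),
    ],
    "http://example.org/b.rdf": [("ex:b", "foaf:name", "B")],
}
INDEX = build_schema_index(SOURCES)
print(select_sources(INDEX, ("?s", "foaf:knows", "?o")))
```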

Page 14: Linked Data and Services

Exploits correspondence between thing-URI and source-URI

Linked Data sources (aka RDF files) typically return triples whose subject corresponds to the source

Sometimes the sources return triples whose object corresponds to the source

(#s ?p ?o), (#s #p ?o), (#s #p #o), (?s ?p #o), (?s #p #o)

Incomplete with respect to patterns, and also with respect to URI reuse across sources

Limited parallelism, unclear how to schedule lookups

Direct Lookup (DL) [Hartig et al. 2009]

SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n.}

SELECT ?x1 ?x2 WHERE { dblppub:HoganHP08 dc:creator ?a1. ?x1 owl:sameAs ?a1. ?x2 foaf:knows ?x1. }
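Direct lookup needs no index: the candidate sources are derived from the URIs that appear in the triple pattern itself. The sketch below assumes hash URIs only (the fragment is dropped) and ignores 303 redirects; the function name `direct_lookup_sources` is hypothetical. A pattern with only a bound predicate yields no source, which illustrates the incompleteness noted above.

```python
from urllib.parse import urldefrag


def direct_lookup_sources(pattern):
    """Candidate sources obtained from the pattern's subject and object URIs."""
    s, _p, o = pattern
    sources = []
    for term in (s, o):
        if term.startswith(("http://", "https://")):
            # Drop the fragment to get the document URI (hash URIs assumed).
            sources.append(urldefrag(term).url)
    return sources


print(direct_lookup_sources(("http://danbri.org/foaf.rdf#danbri", "foaf:name", "?n")))
print(direct_lookup_sources(("?s", "foaf:knows", "?o")))
```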

Page 15: Linked Data and Services

Combined description of schema-level and instance-level

Use approximation to reduce index size (incurs false positives)

Possible to use entire query for source selection

Parallel lookups since sources can be determined for the entire query

(?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o) and combinations of triple patterns

Approximate Data Summaries

SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n.}

SELECT ?x1 ?x2 WHERE { dblppub:HoganHP08 dc:creator ?a1. ?x1 owl:sameAs ?a1. ?x2 foaf:knows ?x1. }
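The trade-off between index size and false positives can be illustrated with a very simple summary: hash every term into a small number of buckets and store only the bucket triples per source. This is a stand-in chosen for brevity, not necessarily the data structure behind these slides; `BUCKETS` and all function names are assumptions. The summary never misses a source that holds a matching triple, but hash collisions can select sources that hold none.

```python
import zlib

BUCKETS = 64  # assumed size: fewer buckets, smaller index, more false positives


def bucket(term):
    """Map an RDF term to one of BUCKETS hash buckets."""
    return zlib.crc32(term.encode("utf-8")) % BUCKETS


def summarise(triples):
    """Approximate a source by the set of its hashed triples."""
    return {(bucket(s), bucket(p), bucket(o)) for s, p, o in triples}


def may_contribute(summary, pattern):
    """True if some hashed triple is compatible with the pattern's constants."""
    wanted = [None if term.startswith("?") else bucket(term) for term in pattern]
    return any(
        all(w is None or w == h for w, h in zip(wanted, hashed))
        for hashed in summary
    )


def select_sources(summaries, patterns):
    """Sources that may contribute to at least one pattern of the query."""
    return {
        source
        for source, summary in summaries.items()
        if any(may_contribute(summary, pattern) for pattern in patterns)
    }


SUMMARIES = {
    "http://danbri.org/foaf.rdf": summarise(
        [("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley")]
    ),
    "http://example.org/empty.rdf": summarise([]),  # hypothetical empty source
}
QUERY = [("an:f#ah", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
print(select_sources(SUMMARIES, QUERY))
```

Because the sources for every pattern of the query are known up front, the lookups can be issued in parallel.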

Page 16: Linked Data and Services

Implementation

Deploy wrappers "in the cloud"

Google App Engine: hosting of Java and Python webapps on Google’s Cloud infrastructure

Limited amount of processing time (6hrs/day)

Single-threaded applications

Suited for deploying wrappers

e.g. http://twitter2foaf.appspot.com/ converts Twitter user data to RDF

Page 17: Linked Data and Services

Linking Open Data Cloud 2007

Page 18: Linked Data and Services

Linking Open Data Cloud 2008

Page 19: Linked Data and Services

Linking Open Data Cloud 2009

Page 20: Linked Data and Services

Linking Open Data Cloud 2010

Page 21: Linked Data and Services

Geonames Services

Page 22: Linked Data and Services

Geonames Services

Page 23: Linked Data and Services

Geonames Services

{"weatherObservation": {"clouds":"broken clouds", "weatherCondition":"drizzle", "observation":"LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010", "windDirection":30, "ICAO":"LESO", ...

Page 24: Linked Data and Services

{"weatherObservation": {"clouds":"broken clouds", "weatherCondition":"drizzle", "observation":"LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010", "windDirection":30, "ICAO":"LESO", ...

Geonames Services
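The JSON above is what a wrapper has to lift into RDF. A minimal sketch of such a lifting step follows, emitting one N-Triples line per JSON field. The vocabulary namespace `VOCAB`, the subject URI, and the function name are hypothetical, and only fields visible on the slide are used since the rest of the response is elided.

```python
import json

VOCAB = "http://example.org/weather#"  # hypothetical vocabulary namespace

# Only fields shown on the slide; the remainder of the response is elided there.
SAMPLE = '{"weatherObservation": {"clouds": "broken clouds", "windDirection": 30, "ICAO": "LESO"}}'


def observation_to_ntriples(document, subject):
    """Turn each field of the weather observation into one N-Triples line."""
    observation = json.loads(document)["weatherObservation"]
    lines = []
    for key, value in observation.items():
        literal = json.dumps(str(value))  # adds quotes and escapes the value
        lines.append(f"<{subject}> <{VOCAB}{key}> {literal} .")
    return lines


for line in observation_to_ntriples(SAMPLE, "http://example.org/obs/LESO"):
    print(line)
```

All values are emitted as plain string literals here; a fuller wrapper would add datatypes and link the observation to the station resource.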

Page 25: Linked Data and Services

Linked Open Service Principles

REST Principles

1. Application state and functionality is divided into resources

2. Every resource is uniquely addressable

3. All resources share a uniform interface: a) a constrained set of well-defined operations, b) a constrained set of content types

Linked Data Principles

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up those names

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)

4. Include links to other URIs, so that they can discover more things

Linked Open Service Principles

1. Describe services as LOD prosumers with input and output descriptions as SPARQL graph patterns

2. Communicate RDF by RESTful content negotiation

3. The output should make explicit its relation with the input

Page 26: Linked Data and Services

LOS Weather Service

Page 27: Linked Data and Services

LOS Geo Resources

Page 28: Linked Data and Services

Resource-Based Linked Open Services

Linked Data

GET, Accept: text/html

303 REDIRECT /page

GET, Accept: application/rdf+xml (or text/n3)

303 REDIRECT /data

Linked Service

GET /weather, Accept: application/rdf+xml (or text/n3)

200 <rdf:Description>
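The Linked Data half of this exchange can be sketched as a pure function from the Accept header to a redirect. This is an illustrative assumption about server behaviour, not the deployed implementation: `redirect_for` is a made-up name, quality values (`q=`) are ignored, and the first acceptable media type listed wins.

```python
RDF_TYPES = ("application/rdf+xml", "text/n3")


def redirect_for(name, accept):
    """Return (status, location) for a GET on the resource URI of `name`."""
    media_types = [part.split(";")[0].strip() for part in accept.split(",")]
    for media_type in media_types:
        if media_type in RDF_TYPES:
            return 303, "/data/" + name   # machine-readable description
        if media_type == "text/html":
            return 303, "/page/" + name   # human-readable description
    return 406, None                      # nothing acceptable to offer


print(redirect_for("Gordon_Brown", "text/html"))
print(redirect_for("Gordon_Brown", "application/rdf+xml"))
```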

Page 29: Linked Data and Services

Interlinking Data with Data from Services?

Page 30: Linked Data and Services

Linked Data Services

We’d like to integrate data services with Linked Data

1. LIDS need to adhere to Linked Data principles

We’d like to use data services in software programs

2. LIDS need machine-readable descriptions of input and output

Compared to naïve approach: assign URI to service output

Relationship between input and output is explicitly described

Dynamicity is supported

Multiple or no output resources can be linked to input

Page 31: Linked Data and Services

Interlink LIDS and Linked Data

Generate service URIs with input bindings, from evaluating: select Xi where Ti

sameAs: binding for i
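One way to read this slide: each binding of the input pattern yields one service URI, and an owl:sameAs triple ties the bound input resource to that URI. The sketch below follows that reading under stated assumptions: the endpoint, the parameter names, and the convention of naming the input entity in the URI fragment are all hypothetical.

```python
from urllib.parse import urlencode

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def interlink(endpoint, entity_var, bindings):
    """Build one service URI per binding and link the input entity to it."""
    links = []
    for binding in bindings:
        # Every bound variable except the entity itself becomes a parameter.
        params = sorted((k, v) for k, v in binding.items() if k != entity_var)
        uri = f"{endpoint}?{urlencode(params)}#{entity_var}"
        links.append((binding[entity_var], OWL_SAME_AS, uri))
    return links


# Hypothetical bindings, as they might result from evaluating the input pattern.
BINDINGS = [{"point": "http://example.org/place/1", "lat": "49.01", "lng": "8.41"}]
for triple in interlink("http://example.org/lids/findNearby", "point", BINDINGS):
    print(triple)
```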

Page 32: Linked Data and Services

Query Answering using LIDS and Linked Data

Query execution resolves URIs

=> enlarges data set

LIDS are interlinked

Query is executed again on new data set

Repeat until no new links or no new data

Combine results
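The loop described on this slide is a fixpoint computation, which can be sketched as follows. Everything here is a toy under stated assumptions: `run_query`, `dereference`, and `interlink` are passed in as functions, the "web" is an in-memory dictionary, and the service is assumed to return triples about the original resource so that no owl:sameAs reasoning is needed.

```python
def uris_in(data):
    """Collect HTTP URIs in subject or object position."""
    return {t for s, _p, o in data for t in (s, o) if t.startswith("http")}


def answer_iteratively(run_query, dereference, interlink, seed):
    """Repeat interlinking, lookup and querying until the data set stops growing."""
    data, fetched, results = set(seed), set(), set()
    while True:
        size_before = len(data)
        data |= interlink(data)                      # LIDS are interlinked
        for uri in sorted(uris_in(data) - fetched):  # resolve URIs not seen yet
            fetched.add(uri)
            data |= dereference(uri)                 # enlarges the data set
        results |= run_query(data)                   # combine results
        if len(data) == size_before:                 # no new links, no new data
            return results


# Toy stand-ins for the web, the interlinker and the query (all hypothetical).
WEB = {"http://svc.example/u1": {("http://ex.example/u1", "og:fan_count", "42")}}
SEED = {("http://ex.example/u1", "foaf:name", "Uni 1")}


def toy_dereference(uri):
    return WEB.get(uri, set())


def toy_interlink(data):
    return {(s, "owl:sameAs", s.replace("ex.example", "svc.example"))
            for s, p, _o in data if p == "foaf:name"}


def toy_query(data):
    return {(s, o) for s, p, o in data if p == "og:fan_count"}


print(answer_iteratively(toy_query, toy_dereference, toy_interlink, SEED))
```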

Page 33: Linked Data and Services

Experiment: Query Answering

Input: List of 562 (potential) universities from Facebook Graph API

Output: Facebook fans and DBpedia student numbers for 104 universities

PREFIX u: <http://openlids.org/universities.rdf#> SELECT ?n ?f ?s WHERE { u:list foaf:topic ?u . ?u foaf:name ?n . ?u og:fan_count ?f . ?u d:numberOfStudents ?s }

Page 34: Linked Data and Services

Linked * Services and PlanetData

Several areas seem likely to produce services:

Stream, incl. sensor, resources (latest values)

Any others exposing dynamic resources

Dynamic computations, inc. on-the-fly quality assessments

Other areas seem likely to consider service technologies and move towards more service-like HTTP interactions

Access control (OpenID, OAuth, etc.)

Finally, remaining areas could serve to complement LIDS/LOS alignment

Provenance