Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

ICWE 2012 Tutorial: An Introduction to SPARQL and Queries over Linked Data

Chapter 3: Querying Linked Data

Olaf Hartig (http://olafhartig.de/foaf.rdf#olaf, @olafhartig)

Database and Information Systems Research Group, Humboldt-Universität zu Berlin


These are the slides from my ICWE 2012 Tutorial "An Introduction to SPARQL and Queries over Linked Data"


Chapter 3

Accessing a SPARQL Endpoint

Queries over Multiple Datasets

Linked Data Queries

SPARQL Endpoints

● SPARQL query processing service

● Supports the SPARQL protocol

● Issuing a SPARQL query is an HTTP GET request with parameter query

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1

(The value of the query parameter is a URL-encoded string with the SPARQL query.)
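A minimal sketch of how such a request URL could be assembled with the Python standard library. The helper name `build_query_url` and the example query are assumptions made for illustration; only the `query` parameter and the DBpedia endpoint address come from the slides. Nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(endpoint: str, sparql: str) -> str:
    """Return the GET URL that carries `sparql` in the `query` parameter."""
    # urlencode percent-encodes reserved characters and turns spaces into "+"
    return endpoint + "?" + urlencode({"query": sparql})


# Hypothetical example query; any SPARQL string is handled the same way.
EXAMPLE_QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"

url = build_query_url("http://dbpedia.org/sparql", EXAMPLE_QUERY)
print(url)
```

Decoding the query string again (for instance with `parse_qs`) gives back the original SPARQL text, which is a quick way to check the encoding.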

Query Result Formats

● For SELECT and ASK queries: XML, JSON, plain text

● For CONSTRUCT and DESCRIBE: RDF/XML, Turtle, ...

● How to request?
  ● Accept header
  ● Non-standard alternative: parameter out

Request using the Accept header:

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accept: application/sparql-results+json

Request using the non-standard parameter out:

GET /sparql?out=json&query=... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
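As a rough illustration of the Accept-header variant, the sketch below prepares (but does not send) a request and reads a SPARQL JSON results document. The function names and the sample response are assumptions written for this example; the sample is hand-made and not real endpoint output.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(endpoint, sparql, accept="application/sparql-results+json"):
    """Prepare a GET request that asks for a specific result format."""
    url = endpoint + "?" + urlencode({"query": sparql})
    headers = {"Accept": accept, "User-Agent": "my-sparql-client/0.1"}
    return Request(url, headers=headers)


def bindings(results_json):
    """Turn a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL JSON results layout (not real data).
SAMPLE = (
    '{"head": {"vars": ["s"]}, "results": {"bindings": '
    '[{"s": {"type": "uri", "value": "http://example.org/a"}}]}}'
)

req = build_request("http://dbpedia.org/sparql", "SELECT ?s WHERE { ?s ?p ?o }")
rows = bindings(SAMPLE)
```

Passing `req` to `urllib.request.urlopen` would actually issue the request; that step is left out here so the sketch runs offline.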

SPARQL Client Libraries

● More convenient than on the protocol level:
  ● SPARQL JavaScript Library http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
  ● ARC for PHP http://arc.semsol.org/
  ● RAP – RDF API for PHP http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
  ● Jena / ARQ (Java) http://jena.sourceforge.net/
  ● Sesame (Java) http://www.openrdf.org/
  ● SPARQL Wrapper (Python) http://sparql-wrapper.sourceforge.net/
  ● PySPARQL (Python) http://code.google.com/p/pysparql/

SPARQL Client Libraries

● Example with Jena ARQ:

// Package name as used by Jena 2.x; Apache Jena 3 and later use org.apache.jena.query
import com.hp.hpl.jena.query.*;

String service = "...";       // address of the SPARQL endpoint
String query = "SELECT ...";  // your SPARQL query
QueryExecution e = QueryExecutionFactory.sparqlService( service, query );
ResultSet results = e.execSelect();
while ( results.hasNext() ) {
    QuerySolution s = results.nextSolution();  // one row of the result
    // …
}
e.close();  // release the resources held by the query execution

SPARQL Endpoints

● Several Linked Data sets exposed via SPARQL endpoint
  ● DBpedia http://dbpedia.org/sparql
  ● Musicbrainz http://dbtune.org/musicbrainz/sparql
  ● Semantic Web dog food http://data.semanticweb.org/sparql
  ● etc. http://esw.w3.org/topic/SparqlEndpoints

● Send your query, receive the result

Querying a single dataset is quite boring compared to issuing SPARQL queries over multiple datasets.


Chapter 3

Accessing a SPARQL Endpoint

Queries over Multiple Datasets
  ➢ Query a given collection
  ➢ Manage your own collection
  ➢ Use a query federation system
  ➢ Link traversal based query execution

Linked Data Queries

Querying a Given Collection

● Some public SPARQL endpoints provide access to a collection of data from multiple sources
  ● http://lod.openlinksw.com/sparql
  ● http://sparql.sindice.com/

● Pros:
  ● Nothing to set up
  ● Good query execution times

● Cons:
  ● Queried data might be out of date
  ● Not all relevant data in the collection

Setting up Your Own Collection

● RDF-specific DBMSs:
  ● Virtuoso http://virtuoso.openlinksw.com/
  ● Allegro Graph http://www.franz.com/agraph/allegrograph/
  ● Bigdata http://www.systap.com/bigdata.htm
  ● OWLIM http://www.ontotext.com/owlim
  ● 4store http://4store.org/
  ● Jena TDB http://jena.apache.org/
  ● Sesame http://www.openrdf.org/
  ● etc.

Populating Your Own Collection

● Datasets provided as RDF dumps

● (Focused) crawling
  ● ldspider http://code.google.com/p/ldspider/

Setting up Your Own Collection

● Pros:
  ● All relevant data
  ● Independent of existence, availability, efficiency of SPARQL endpoints
  ● Good query execution times (once set up properly)

● Cons:
  ● Effort to set up
  ● Effort to operate
  ● Queried data might be out of date


SPARQL Endpoint Federation

● Idea of federated query processing:
  ● Querying a query federation service (mediator)
  ● Mediator distributes sub-queries to relevant sources
  ● Finally, mediator combines sub-results

● Prototypes:
  ● FedX
  ● SPLENDID
  ● ANAPSID
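The mediator idea can be sketched in a few lines. This is an illustrative toy under stated assumptions, not how FedX, SPLENDID or ANAPSID work internally: endpoints are in-memory objects (`MockEndpoint` is hypothetical), source selection looks only at predicates, and all data values are made up.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):                 # variable
            if result.setdefault(p, t) != t:
                return None
        elif p != t:                          # constant must be identical
            return None
    return result


class MockEndpoint:
    """Stands in for a remote SPARQL endpoint (hypothetical, in-memory)."""

    def __init__(self, triples):
        self.triples = list(triples)

    def predicates(self):
        return {p for _, p, _ in self.triples}

    def evaluate(self, pattern):
        """Answer one sub-query: all bindings for a single triple pattern."""
        found = []
        for triple in self.triples:
            b = match(pattern, triple, {})
            if b is not None:
                found.append(b)
        return found


def compatible(a, b):
    """Two bindings are compatible if shared variables have equal values."""
    return all(b.get(k, v) == v for k, v in a.items())


def mediate(patterns, endpoints):
    """Send each pattern to the relevant endpoints, then join the sub-results."""
    solutions = [{}]
    for pattern in patterns:
        relevant = [e for e in endpoints
                    if pattern[1].startswith("?") or pattern[1] in e.predicates()]
        sub = [b for e in relevant for b in e.evaluate(pattern)]
        solutions = [{**s, **b} for s in solutions for b in sub if compatible(s, b)]
    return solutions


# Made-up data spread over two sources.
places = MockEndpoint([
    ("ex:v1", "rdf:type", "ex:Volcano"), ("ex:v1", "p:location", "ex:Italy"),
    ("ex:v2", "rdf:type", "ex:Volcano"), ("ex:v2", "p:location", "ex:Japan"),
])
eruptions = MockEndpoint([
    ("ex:v1", "p:lastEruption", "1950"), ("ex:v2", "p:lastEruption", "1900"),
])

query = [("?v", "rdf:type", "ex:Volcano"),
         ("?v", "p:location", "ex:Italy"),
         ("?v", "p:lastEruption", "?ve")]
result = mediate(query, [places, eruptions])
```

A real mediator would additionally optimize the join order and ship bindings to the sources instead of fetching every match for each pattern.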

SPARQL Endpoint Federation

● Pros:
  ● Queried data is up to date

● Cons:
  ● All relevant datasets must be exposed via a SPARQL endpoint
  ● Effort to set up mediator

SPARQL 1.1 Federation Extension

● SERVICE pattern in SPARQL 1.1
  ● Explicitly specify query patterns whose execution must be distributed to a remote SPARQL endpoint

SELECT ?v ?ve WHERE
{
  ?v rdf:type umbel-sc:Volcano ;
     p:location dbpedia:Italy .
  SERVICE <http://volcanos.example.org/query> {
    ?v p:lastEruption ?ve }
}

For all these approaches ...

● … you have to know the relevant data sources beforehand
  ● When selecting a SPARQL endpoint over an existing collection of datasets
  ● When setting up your own collection
  ● When configuring your federation system
  ● When using the SERVICE pattern

● … you restrict yourself to the selected sources

● … you do not tap the full potential of the Web


Main Idea

● Intertwine query evaluation with traversal of data links

● We alternate between:
  ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data
  ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset

Main Idea (example)

[Figure sequence: a query graph connecting http://.../movie2449, ?actor and ?loc via the edges filmingLocation, actor_in and lives_in, shown next to the queried data as it grows]

● The URI http://.../movie2449 mentioned in the query is looked up; the retrieved data is added to the queried data

● The queried data now contains the triple ( http://mdb.../Paul , actor_in , http://.../movie2449 ), which yields the intermediate solution ?actor = http://mdb.../Paul

● The URI http://mdb.../Paul from this intermediate solution is looked up; the retrieved data is added to the queried data

● The queried data now contains the triple ( http://mdb.../Paul , lives_in , http://geo.../Berlin ), which yields the solution ?actor = http://mdb.../Paul , ?loc = http://geo.../Berlin
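The alternation between evaluating triple patterns and looking up URIs can be sketched as follows. This is a simplified illustration, not the tutorial's implementation: the Web is a dictionary (`WEB`) from URI to triples, the short names `ex:movie2449`, `mdb:Paul` and `geo:Berlin` are stand-ins for the abbreviated URIs on the slides, the direction of the `filmingLocation` edge is assumed, and every bound value is treated as a URI worth looking up.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):                 # variable: bind or compare
            if result.setdefault(p, t) != t:
                return None
        elif p != t:                          # constant must be identical
            return None
    return result


def intermediate_solutions(patterns, dataset):
    """Bindings for every prefix of `patterns`; the last level is the result."""
    levels, current = [], [{}]
    for pattern in patterns:
        extended = []
        for solution in current:
            for triple in dataset:
                b = match(pattern, triple, solution)
                if b is not None:
                    extended.append(b)
        current = extended
        levels.append(current)
    return levels


def link_traversal(patterns, seeds, web):
    """Alternate between evaluation and URI lookups until nothing new is found."""
    dataset, looked_up, todo = set(), set(), set(seeds)
    while todo:
        for uri in todo:
            looked_up.add(uri)
            dataset.update(web.get(uri, []))  # simulated URI lookup
        levels = intermediate_solutions(patterns, dataset)
        bound = {v for level in levels for b in level for v in b.values()}
        todo = bound - looked_up              # URIs from intermediate solutions
    return intermediate_solutions(patterns, dataset)[-1]


# Hypothetical documents, keyed by the URI whose lookup returns them.
WEB = {
    "ex:movie2449": [("mdb:Paul", "actor_in", "ex:movie2449"),
                     ("ex:movie2449", "filmingLocation", "geo:Berlin")],
    "mdb:Paul": [("mdb:Paul", "lives_in", "geo:Berlin")],
}

QUERY = [("?actor", "actor_in", "ex:movie2449"),
         ("?actor", "lives_in", "?loc"),
         ("ex:movie2449", "filmingLocation", "?loc")]

result = link_traversal(QUERY, ["ex:movie2449"], WEB)
```

Re-evaluating the whole query after each round of lookups keeps the sketch short; a pipelined implementation would instead pass intermediate solutions from one operator to the next.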

“Real World” Example

Return phone numbers of authors of ontology engineering papers at ESWC'09.

SELECT DISTINCT ?author ?phone WHERE {
  ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
  ?pub swc:hasTopic ?topic .
  ?topic rdfs:label ?topicLabel .
  FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
  ?pub swrc:author ?author .
  { ?author owl:sameAs ?authorAlt }
  UNION
  { ?authorAlt owl:sameAs ?author }
  ?authorAlt foaf:phone ?phone
}

Result size: 2
# of retrieved docs: 297
# of accessed servers: 16
avg. execution time: 1min 30sec


Summary

O. Hartig and A. Langegger. A Database Perspective on Consuming Linked Data on the Web. Datenbankspektrum 10(2), 2010

Chapter 3

Accessing a SPARQL Endpoint

Queries over Multiple Datasets
  ➢ Query a given collection
  ➢ Manage your own collection
  ➢ Use a query federation system
  ➢ Link traversal based query execution

Linked Data Queries
  ➢ Foundations
  ➢ Iterator Based Implementation
  ➢ Query Planning

SPARQL Pattern Evaluation

$\mathrm{eval}(P, G) = \{ \mu_1, \mu_2, \ldots \}$

[Figure: the example query graph (http://.../movie2449, ?actor, ?loc; edges filmingLocation, actor_in, lives_in) and an example solution with ?actor = http://mdb.../Paul , ?loc = http://geo.../Berlin]

SPARQL Linked Data Query

$Q^{P}(W) = \{ \mu_1, \mu_2, \ldots \}$

[Figure: the example query graph (http://.../movie2449, ?actor, ?loc; edges filmingLocation, actor_in, lives_in) and an example solution with ?actor = http://mdb.../Paul , ?loc = http://geo.../Berlin]

Full-Web Semantics

$Q^{P}(W) = \mathrm{eval}(P, \mathrm{AllData}(W)) = \{ \mu_1, \mu_2, \ldots \}$


Reachability-based Semantics

● Seed URIs S

● Reachability criterion c

Reachability-based Semantics

$Q_{c}^{P,S}(W) = \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$

(Here $W^{*}$ stands for the part of $W$ that is reachable from the seed URIs $S$ under the reachability criterion $c$.)

Reachability criteria shown: $c_{\mathrm{All}}$, $c_{\mathrm{None}}$, $c_{\mathrm{Match}}$
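To make the three criteria more concrete, here is a small sketch that computes which documents are reachable under each of them. It rests on assumptions not spelled out on the slides: a criterion is modeled as a function of a triple, a URI in that triple and the query's triple patterns; $c_{\mathrm{All}}$ follows every URI, $c_{\mathrm{None}}$ follows none, and $c_{\mathrm{Match}}$ follows URIs only from triples that match a triple pattern of the query. The Web `WEB` and all names are made up.

```python
def matches(pattern, triple):
    """True if `triple` matches `pattern` (repeated variables must agree)."""
    seen = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if seen.setdefault(p, t) != t:
                return False
        elif p != t:
            return False
    return True


def c_all(triple, uri, patterns):
    return True                       # follow every URI


def c_none(triple, uri, patterns):
    return False                      # follow nothing beyond the seeds


def c_match(triple, uri, patterns):
    return any(matches(p, triple) for p in patterns)


def reachable_uris(web, seeds, patterns, criterion):
    """URIs whose documents belong to the reachable part of `web`."""
    reached, todo = set(), list(seeds)
    while todo:
        uri = todo.pop()
        if uri in reached or uri not in web:
            continue                  # already visited, or lookup yields nothing
        reached.add(uri)
        for triple in web[uri]:
            for term in triple:
                if criterion(triple, term, patterns):
                    todo.append(term)
    return reached


# Hypothetical Web: URI -> triples in the document retrieved for that URI.
WEB = {
    "ex:a": [("ex:a", "ex:knows", "ex:b"), ("ex:a", "ex:likes", "ex:c")],
    "ex:b": [("ex:b", "ex:knows", "ex:d")],
    "ex:c": [],
    "ex:d": [],
}
PATTERNS = [("?x", "ex:knows", "?y")]
SEEDS = ["ex:a"]

by_all = reachable_uris(WEB, SEEDS, PATTERNS, c_all)
by_none = reachable_uris(WEB, SEEDS, PATTERNS, c_none)
by_match = reachable_uris(WEB, SEEDS, PATTERNS, c_match)
```

In this toy Web the document for `ex:c` is reachable only under the permissive criterion, because the `ex:likes` triple does not match the query pattern.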

Computability

● (Ordinary) Turing machines unsuitable:
  ● Limited data access capabilities not properly captured

● Web machines
  ● Abiteboul and Vianu, 1997
  ● Mendelzon and Milo, 1997

[Figure labels: TM, $Q_{c_{\mathrm{Match}}}^{P,S}(W)$]

LD Machine

● Multi-tape Turing machine
  ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
  ➔ Input
  ➔ Work
  ➔ Output

● Access to Web input is restricted
  ● Only by performing a particular procedure in a particular state

Finitely Computable LD Queries

● For Q there exists an LD machine $M_Q$ such that for any W:
  ● $M_Q$ halts after a finite number of computation steps, and
  ● $M_Q$ outputs the complete result Q(W)

[Figure: the tapes of the machine (Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙ ; Input; Work; Output) over the computation steps 1, ∙ ∙ ∙ , k - 3, k - 2, k - 1, k; the output tape holds # enc(μ1) # enc(μ2) # ∙ ∙ ∙ # enc(μn) #]

Eventually Computable LD Queries

● For Q there exists an LD machine $M_Q$ such that for any W:
  1. The output always encodes a subset of the query result Q(W), and
  2. Each $\mu \in Q(W)$ eventually appears on the output

[Figure: the tapes of the machine (Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙ ; Input; Work; Output) over the computation steps ∙ ∙ ∙ , k - 3, k - 2, k - 1, k, k + 1, k + 2, ∙ ∙ ∙ ; the output tape holds # enc(μ1) # enc(μ2) so far]

✗ No guarantee for termination
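The two properties, and the missing termination guarantee, can be illustrated with a generator over a made-up infinite Web. This is only an analogy to the LD machine model, under the assumption that `lookup` simulates dereferencing a URI; the URI scheme and the predicate `ex:next` are invented for the example.

```python
from itertools import islice


def lookup(uri):
    """Hypothetical infinite Web: the document for .../n links to .../n+1."""
    n = int(uri.rsplit("/", 1)[1])
    return [(uri, "ex:next", f"http://example.org/{n + 1}")]


def solutions(seed):
    """Yield bindings for the pattern ( ?x , ex:next , ?y ) as data is discovered."""
    todo, seen = [seed], set()
    while todo:                       # never becomes empty on this Web
        uri = todo.pop(0)
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in lookup(uri):
            if p == "ex:next":
                yield {"?x": s, "?y": o}   # every emitted binding is a correct answer
                todo.append(o)             # follow the link to discover more data


# The generator never finishes by itself; the consumer decides when to stop.
first_three = list(islice(solutions("http://example.org/0"), 3))
```

Every binding the generator emits belongs to the result, and any particular binding is emitted after finitely many steps, yet iterating without `islice` would run forever.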

Main Results for cMatch-Semantics

Theorem: Any satisfiable SPARQL based Linked Data query $Q_{c_{\mathrm{Match}}}^{P,S}$ under cMatch-semantics that is monotonic is at least eventually computable; any non-monotonic $Q_{c_{\mathrm{Match}}}^{P,S}$ is either finitely computable or not even eventually computable.

Problem: TERMINATION(cMatch)

Web Input: W – a (potentially infinite) Web of Linked Data

Ord. Input: S – a finite but nonempty set of seed URIs; P – a SPARQL expression

Question: Does an LD machine exist that computes $Q_{c_{\mathrm{Match}}}^{P,S}(W)$ and halts?

Theorem: TERMINATION(cMatch) is not LD machine decidable.


Iterator Based Execution

Query:
  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>

Seed: <http://.../orgaX>

Iterators:
  I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
  I2: tp2 = ( ?p , ex:interested_in , ?b )
  I3: tp3 = ( ?b , rdf:type , <http://.../Book> )

Execution steps (all iterators read from the query-local dataset):

● I3 asks I2 for the next solution ("Next?"), I2 asks I1

● I1 finds the triple <http://.../alice> ex:affiliated_with <http://.../orgaX> in the query-local dataset and returns { ?p = <http://.../alice> }

● I2 substitutes this solution into its triple pattern: tp2' = ( <http://.../alice> , ex:interested_in , ?b )

● I2 finds the triple <http://.../alice> ex:interested_in <http://.../b1> in the query-local dataset and returns { ?p = <http://.../alice> , ?b = <http://.../b1> }

● I3 substitutes this solution into its triple pattern: tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

● I3 finds the triple <http://.../b1> rdf:type <http://.../Book> in the query-local dataset and returns { ?p = <http://.../alice> , ?b = <http://.../b1> }

Alternative Execution Order

Query:
  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>

Seed: <http://.../orgaX>

Iterators:
  I1: tp1 = ( ?b , rdf:type , <http://.../Book> )
  I2: tp2 = ( ?p , ex:interested_in , ?b )
  I3: tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> )

Execution steps:

● I3 asks I2 for the next solution ("Next?"), I2 asks I1

● The query-local dataset contains the triple <http://.../alice> ex:affiliated_with <http://.../orgaX>, but no triple that matches tp1, so I1 reports END!

● Consequently, I2 and I3 report END! as well

Computed query result may depend on the order of triple patterns (= logical query execution plan)
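The order dependence can be reproduced with a small pipeline of Python generators. This is a simplified sketch rather than the implementation presented in the tutorial: the Web is a dictionary, the names `ex:orgaX`, `ex:alice`, `ex:b1` and `ex:Book` stand in for the abbreviated URIs on the slides, and each iterator works on a snapshot of the query-local dataset taken when an input solution arrives.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def execute(patterns, seeds, web):
    """Run one iterator per triple pattern, chained in the given order."""
    dataset = set()                              # the query-local dataset

    def look_up(uri):
        dataset.update(web.get(uri, []))         # simulated URI lookup

    def root():
        yield {}                                 # the empty solution starts the chain

    def iterator(pattern, predecessor):
        for solution in predecessor:             # "Next?" to the predecessor
            for triple in list(dataset):         # snapshot; dataset grows meanwhile
                extended = match(pattern, triple, solution)
                if extended is not None:
                    for value in extended.values():
                        look_up(value)           # augment the dataset before passing on
                    yield extended

    for uri in seeds:
        look_up(uri)
    pipeline = root()
    for pattern in patterns:
        pipeline = iterator(pattern, pipeline)
    return list(pipeline)


# Hypothetical documents, keyed by the URI whose lookup returns them.
WEB = {
    "ex:orgaX": [("ex:alice", "ex:affiliated_with", "ex:orgaX")],
    "ex:alice": [("ex:alice", "ex:interested_in", "ex:b1")],
    "ex:b1": [("ex:b1", "rdf:type", "ex:Book")],
}
AFFILIATED = ("?p", "ex:affiliated_with", "ex:orgaX")
INTERESTED = ("?p", "ex:interested_in", "?b")
IS_BOOK = ("?b", "rdf:type", "ex:Book")

first_order = execute([AFFILIATED, INTERESTED, IS_BOOK], ["ex:orgaX"], WEB)
other_order = execute([IS_BOOK, INTERESTED, AFFILIATED], ["ex:orgaX"], WEB)
```

With the first order the lookups triggered by each intermediate solution bring in exactly the data the next iterator needs; with the second order the first iterator finds no match in the seed document, so no further data is ever retrieved.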


Query Plan Selection

Assumptions about $Q_{c_{\mathrm{Match}}}^{P,S}$:
  ● P refers to instance data
  ● S = uris(P)

● Assessment criteria:
  ● Cost (query execution time)
  ● Benefit (size of the computed result)

● Cost and benefit must be estimated without plan execution

● Estimation impossible due to “zero knowledge”

● Heuristic Based Plan Selection
  ● DEPENDENCY RESPECT RULE
  ● SEED TP RULE
  ● NO VOCAB SEED RULE
  ● FILTERING TP RULE

Page 61: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

DEPENDENCY RESPECT RULE

Use a dependency respecting query plan

Query:
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>

● Dependency respect: each triple pattern after the first contains a variable that already occurs in one of the preceding triple patterns

I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
I2: tp2 = ( ?p , ex:interested_in , ?b )
I3: tp3 = ( ?b , rdf:type , <http://.../Book> )

Page 64: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

DEPENDENCY RESPECT RULE

Use a dependency respecting query plan

Query:
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>

● Dependency respect: each triple pattern after the first contains a variable that already occurs in one of the preceding triple patterns
● Rationale: Avoid Cartesian products

Order that is not dependency respecting (?b in tp2 does not occur in tp1, so I2 has to compute a Cartesian product):
I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
I2: tp2 = ( ?b , rdf:type , <http://.../Book> )
I3: tp3 = ( ?p , ex:interested_in , ?b )
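The rule can be checked mechanically. The sketch below is an illustration only; the function names are hypothetical, and triple patterns are represented as plain tuples of strings in which variables start with `?`.

```python
from typing import Sequence, Set, Tuple

TriplePattern = Tuple[str, str, str]


def variables(tp: TriplePattern) -> Set[str]:
    """Return the variables (terms starting with '?') of a triple pattern."""
    return {term for term in tp if term.startswith("?")}


def is_dependency_respecting(plan: Sequence[TriplePattern]) -> bool:
    """True if every pattern after the first shares a variable with an earlier one."""
    bound: Set[str] = set()
    for position, tp in enumerate(plan):
        if position > 0 and not (variables(tp) & bound):
            return False  # evaluating tp here would need a Cartesian product
        bound |= variables(tp)
    return True


TP_AFFIL = ("?p", "ex:affiliated_with", "<http://.../orgaX>")
TP_INTEREST = ("?p", "ex:interested_in", "?b")
TP_BOOK = ("?b", "rdf:type", "<http://.../Book>")

respecting = is_dependency_respecting([TP_AFFIL, TP_INTEREST, TP_BOOK])
violating = is_dependency_respecting([TP_AFFIL, TP_BOOK, TP_INTEREST])
```

Here `respecting` is `True` and `violating` is `False`, because in the second order the pattern with `?b` comes directly after a pattern that only binds `?p`.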

Page 65: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

SEED TP RULE

Use a plan with a seed triple pattern

● Potential seed triple pattern: a triple pattern that contains at least one HTTP URI
● Seed triple pattern of a plan: the first triple pattern in the plan, if it is a potential seed triple pattern
● Rationale: good starting point

Query:
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>

Recall: S = uris(P)
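A minimal sketch of the two definitions, assuming that triple pattern terms are given as full URI strings or as variables starting with `?`. The `http://example.org/...` URIs and the function names are hypothetical.

```python
from typing import Sequence, Tuple

TriplePattern = Tuple[str, str, str]


def is_http_uri(term: str) -> bool:
    """Treat any term with an http(s) scheme as an HTTP URI (simplification)."""
    return term.startswith(("http://", "https://"))


def is_potential_seed(tp: TriplePattern) -> bool:
    """A potential seed triple pattern contains at least one HTTP URI."""
    return any(is_http_uri(term) for term in tp)


def has_seed(plan: Sequence[TriplePattern]) -> bool:
    """A plan has a seed triple pattern if its first pattern is a potential seed."""
    return len(plan) > 0 and is_potential_seed(plan[0])


TP_AFFIL = ("?p", "http://example.org/vocab/affiliated_with", "http://example.org/orgaX")
TP_ANY = ("?s", "?p", "?o")  # no URI at all, so nothing can be looked up

plan_with_seed = has_seed([TP_AFFIL, TP_ANY])
plan_without_seed = has_seed([TP_ANY, TP_AFFIL])
```

Under this definition a URI in predicate position is enough to make a pattern a potential seed, so most patterns in typical queries qualify.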

Page 66: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

NO VOCAB SEED RULE

Avoid a seed triple pattern with only vocabulary terms

● The seed triple pattern should not contain only vocabulary term URIs
● Patterns to avoid:
  ?s ex:any_property ?o
  ?s rdf:type ex:any_class
● Rationale: URIs for vocabulary terms usually resolve to vocabulary definitions with little instance data

Query:
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
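The two patterns to avoid can be recognized syntactically. The sketch below rests on an approximation that is an assumption of this example, not a statement from the slides: URIs in predicate position and URIs in object position of `rdf:type` are taken to be vocabulary terms, and every other URI is taken to identify instance data. The function name and the `http://example.org/...` URIs are hypothetical.

```python
from typing import Tuple

TriplePattern = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def is_var(term: str) -> bool:
    return term.startswith("?")


def should_avoid_as_seed(tp: TriplePattern) -> bool:
    """True for '?s ex:any_property ?o' and '?s rdf:type ex:any_class' shapes."""
    subject, predicate, obj = tp
    if not is_var(subject):
        return False  # a subject URI is assumed to identify an instance
    if is_var(obj):
        return True  # only the predicate can be a URI: ?s ex:any_property ?o
    return predicate == RDF_TYPE  # ?s rdf:type ex:any_class


TP_AFFIL = ("?p", "http://example.org/vocab/affiliated_with", "http://example.org/orgaX")
TP_INTEREST = ("?p", "http://example.org/vocab/interested_in", "?b")
TP_BOOK = ("?b", RDF_TYPE, "http://example.org/Book")

avoid_flags = [should_avoid_as_seed(tp) for tp in (TP_AFFIL, TP_INTEREST, TP_BOOK)]
```

For the example query only the pattern that mentions the organization remains as a recommended seed; the other two match the shapes to avoid.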

Page 67: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

FILTERING TP RULE

Use a plan where all filtering triple patterns are as close to the seed triple pattern as possible

● Filtering triple pattern: each variable already occurs in one of the preceding triple patterns
● For each result consumed as input, a filtering TP can only report 1 or 0 results as output
● Rationale: Reduce cost

I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
    passes on { ?p = <http://.../alice> }
I2: tp2 = ( ?p , ex:interested_in , ?b )
    tp2' = ( <http://.../alice> , ex:interested_in , ?b )
    passes on { ?p = <http://.../alice> , ?b = <http://.../b1> }
I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
    tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
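Filtering triple patterns can be located in a given plan with a single pass over its variables. This is a sketch under the same tuple representation as the slide's example; the function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

TriplePattern = Tuple[str, str, str]


def variables(tp: TriplePattern) -> Set[str]:
    return {term for term in tp if term.startswith("?")}


def filtering_positions(plan: Sequence[TriplePattern]) -> List[int]:
    """Return the 1-based positions of the filtering triple patterns in a plan."""
    bound: Set[str] = set()
    positions: List[int] = []
    for position, tp in enumerate(plan, start=1):
        # Filtering: all of its variables are bound by preceding patterns.
        if position > 1 and variables(tp) <= bound:
            positions.append(position)
        bound |= variables(tp)
    return positions


PLAN = [
    ("?p", "ex:affiliated_with", "<http://.../orgaX>"),
    ("?p", "ex:interested_in", "?b"),
    ("?b", "rdf:type", "<http://.../Book>"),
]

positions = filtering_positions(PLAN)
```

For the plan shown on the slide, `positions` is `[3]`: tp2 introduces the new variable `?b`, while tp3 only checks a binding that already exists, so it reports at most one result per input.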

Page 68: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

Evaluation Procedure

● Generate all possible plans
● Execute each plan:
  ● 5 runs (+ 1 initial warm-up run)
  ● Use an initially empty query-local dataset for each run
● Measure for each plan:
  ● Avg. execution time
  ● Avg. number of RDF documents retrieved during execution
  ● Avg. number of query results
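The procedure above could be scripted roughly as follows. This is only a sketch of a measurement harness: `execute_plan` is a hypothetical callable that runs a plan against the given query-local dataset and returns the number of retrieved documents and the number of query results, and `fake_executor` is a stand-in used to show the call pattern.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Executor = Callable[[List[Triple], Set[Triple]], Tuple[int, int]]


def measure_plan(execute_plan: Executor, plan: List[Triple], runs: int = 5) -> Dict[str, float]:
    """Run one warm-up execution, then average the measurements over `runs` runs."""
    execute_plan(plan, set())  # warm-up run, measurements discarded
    times, documents, results = [], [], []
    for _ in range(runs):
        dataset: Set[Triple] = set()  # initially empty query-local dataset
        start = time.perf_counter()
        n_documents, n_results = execute_plan(plan, dataset)
        times.append(time.perf_counter() - start)
        documents.append(n_documents)
        results.append(n_results)
    return {
        "avg_time": mean(times),
        "avg_documents": mean(documents),
        "avg_results": mean(results),
    }


calls = []


def fake_executor(plan: List[Triple], dataset: Set[Triple]) -> Tuple[int, int]:
    """Hypothetical stand-in: pretends to retrieve 3 documents and find 2 results."""
    calls.append(len(dataset))
    return 3, 2


stats = measure_plan(fake_executor, [("?s", "?p", "?o")])
```

With the stand-in executor, six executions take place (one warm-up plus five measured runs), each starting from an empty dataset.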

Page 69: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

Evaluation Query (Example)

Of what genus are the species that are
● classified in the same family as the American Badger,
● and expected in the same states as the American Badger?

SELECT ?spec ?genus WHERE {
  geospecies:4qyn7 gs:inFamily ?fam .
  ?fam skos:narrowerTransitive ?spec .
  ?spec skos:closeMatch ?sp2 .
  ?sp2 rdfs:subClassOf ?genus .
  ?spec gs:isExpectedIn ?loc .
  geospecies:4qyn7 gs:isExpectedIn ?loc .
  ?loc rdf:type gs:State . }

● 2 potential seed triple patterns that satisfy our NO VOCAB SEED RULE
● 56 different dependency respecting plans, each containing 2 filtering TPs
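The counts stated for this query can be reproduced by brute force over all orderings of its seven triple patterns. The sketch below makes several assumptions: prefixed names stand in for HTTP URIs, the seed check is a syntactic approximation of the NO VOCAB SEED RULE (URIs in predicate position and in object position of `rdf:type` count as vocabulary terms), and all helper names are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Set, Tuple

TriplePattern = Tuple[str, str, str]

QUERY = [
    ("geospecies:4qyn7", "gs:inFamily", "?fam"),
    ("?fam", "skos:narrowerTransitive", "?spec"),
    ("?spec", "skos:closeMatch", "?sp2"),
    ("?sp2", "rdfs:subClassOf", "?genus"),
    ("?spec", "gs:isExpectedIn", "?loc"),
    ("geospecies:4qyn7", "gs:isExpectedIn", "?loc"),
    ("?loc", "rdf:type", "gs:State"),
]


def variables(tp: TriplePattern) -> Set[str]:
    return {term for term in tp if term.startswith("?")}


def is_non_vocab_seed(tp: TriplePattern) -> bool:
    """Approximation: the pattern mentions a URI that is not a vocabulary term."""
    subject, predicate, obj = tp
    if not subject.startswith("?"):
        return True
    return not obj.startswith("?") and predicate != "rdf:type"


def is_dependency_respecting(plan: Sequence[TriplePattern]) -> bool:
    bound: Set[str] = set()
    for position, tp in enumerate(plan):
        if position > 0 and not (variables(tp) & bound):
            return False
        bound |= variables(tp)
    return True


def count_filtering(plan: Sequence[TriplePattern]) -> int:
    """Count patterns whose variables are all bound by preceding patterns."""
    bound: Set[str] = set()
    count = 0
    for position, tp in enumerate(plan):
        if position > 0 and variables(tp) <= bound:
            count += 1
        bound |= variables(tp)
    return count


seeds = [tp for tp in QUERY if is_non_vocab_seed(tp)]
plans = [
    plan for plan in permutations(QUERY)
    if is_non_vocab_seed(plan[0]) and is_dependency_respecting(plan)
]
filtering_counts = {count_filtering(plan) for plan in plans}
```

Under these assumptions the enumeration yields 2 seed candidates and 56 plans, and every plan contains exactly 2 filtering triple patterns, which agrees with the numbers on the slide.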

Picture source: Wikipedia

Page 70: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

Measurements

[Figure: plots of query results and of retrieved documents against query exec. times (in seconds); for the 1st filtering TP and the 2nd filtering TP, the percentage of plans in each group with a filtering TP in specific positions (TP position in the ordered BGP, 1 to 7)]

Page 71: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

Summary (Linked Data Queries)

● Theoretical foundations of Linked Data queries
  ● Full-Web semantics, (family of) reachability based semantics
  ● Theoretical properties of queries (e.g. computability)
● Link traversal based query execution
  ● Novel paradigm for executing Linked Data queries
  ● Sound and complete for conjunctive Linked Data queries under cMatch-semantics
● Iterator implementation of the LTBQE paradigm
  ● Trades off completeness for a termination guarantee
  ● Degree of completeness depends on execution order of TPs
● Heuristic based plan selection

Page 72: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

Chapter 3

Accessing a SPARQL Endpoint
Queries over Multiple Datasets
➢ Query a given collection
➢ Manage your own collection
➢ Use a query federation system
➢ Link traversal based query execution
Linked Data Queries
➢ Foundations
➢ Iterator Based Implementation
➢ Query Planning

Page 73: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

These slides have been created by Olaf Hartig
http://olafhartig.de

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)