Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)
ICWE 2012 Tutorial
An Introduction to SPARQL and Queries over Linked Data
Chapter 3: Querying Linked Data
Olaf Hartig
http://olafhartig.de/foaf.rdf#olaf
@olafhartig
Database and Information Systems Research Group
Humboldt-Universität zu Berlin
Olaf Hartig - ICWE 2012 Tutorial "An Introduction to SPARQL and Queries over Linked Data" - Chapter 3: Querying Linked Data 2
Chapter 3
● Accessing a SPARQL Endpoint
● Queries over Multiple Datasets
● Linked Data Queries
SPARQL Endpoints
● SPARQL query processing service
● Supports the SPARQL protocol
● Issuing a SPARQL query is an HTTP GET request with parameter query (a URL-encoded string with the SPARQL query)
GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
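As a minimal sketch of the request shown above, the following Python snippet builds the URL of such a GET request using only the standard library. The helper name build_sparql_get_url and the sample query are illustrative assumptions; nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Return the URL of a SPARQL protocol GET request (hypothetical helper)."""
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return endpoint + "?" + urlencode({"query": query})


# Illustrative query; any SPARQL query string works the same way.
query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"
url = build_sparql_get_url("http://dbpedia.org/sparql", query)
print(url)

# Decoding the query string again recovers the original SPARQL text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == query)
```

The resulting URL can be passed to any HTTP client; the client libraries listed later in this chapter perform this encoding internally.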
Query Result Formats
● For SELECT and ASK queries: XML, JSON, plain text
● For CONSTRUCT and DESCRIBE: RDF/XML, Turtle, ...
● How to request?
  ● Accept header
  ● Non-standard alternative: parameter out
GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accept: application/sparql-results+json

GET /sparql?out=json&query=... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
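The following sketch shows how a client might ask for the JSON result format via the Accept header and then read the solutions from a response. The response document SAMPLE is hand-written for illustration (its URIs and values are made up); only its structure follows the SPARQL JSON results format. The request object is constructed but never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Build (but do not send) a request that asks for JSON results.
req = Request(
    "http://dbpedia.org/sparql?" + urlencode({"query": "SELECT ?v ?ve WHERE { ?v ?p ?ve }"}),
    headers={"Accept": "application/sparql-results+json"},
)

# Hand-written sample response; the bindings are illustrative, not real data.
SAMPLE = """
{
  "head": { "vars": ["v", "ve"] },
  "results": {
    "bindings": [
      { "v":  { "type": "uri", "value": "http://example.org/Etna" },
        "ve": { "type": "literal", "value": "2011" } },
      { "v":  { "type": "uri", "value": "http://example.org/Vesuvius" } }
    ]
  }
}
"""


def solutions(doc: str):
    """Turn a SPARQL JSON results document into a list of plain dicts."""
    data = json.loads(doc)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable may be unbound in a solution, so skip missing keys.
        rows.append({var: binding[var]["value"] for var in variables if var in binding})
    return rows


rows = solutions(SAMPLE)
for row in rows:
    print(row)
```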
SPARQL Client Libraries
● More convenient than on the protocol level:
  ● SPARQL JavaScript Library http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
  ● ARC for PHP http://arc.semsol.org/
  ● RAP – RDF API for PHP http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
  ● Jena / ARQ (Java) http://jena.sourceforge.net/
  ● Sesame (Java) http://www.openrdf.org/
  ● SPARQL Wrapper (Python) http://sparql-wrapper.sourceforge.net/
  ● PySPARQL (Python) http://code.google.com/p/pysparql/
● Example with Jena ARQ:
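import com.hp.hpl.jena.query.*;  // Jena 2.x package; Apache Jena 3 and later use org.apache.jena.query

String service = "...";       // address of the SPARQL endpoint
String query = "SELECT ...";  // your SPARQL query
QueryExecution e = QueryExecutionFactory.sparqlService( service, query );
try {
    ResultSet results = e.execSelect();
    while ( results.hasNext() ) {
        QuerySolution s = results.nextSolution();
        // … process the solution, e.g. via s.get( "varName" )
    }
} finally {
    e.close();  // release the connection even if processing fails
}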
SPARQL Endpoints
● Several Linked Data sets exposed via SPARQL endpoint
  ● DBpedia http://dbpedia.org/sparql
  ● Musicbrainz http://dbtune.org/musicbrainz/sparql
  ● Semantic Web dog food http://data.semanticweb.org/sparql
  ● etc. http://esw.w3.org/topic/SparqlEndpoints
● Send your query, receive the result
Querying a single dataset is quite boring compared to issuing SPARQL queries over multiple datasets.
Chapter 3
● Accessing a SPARQL Endpoint
● Queries over Multiple Datasets
  ➢ Query a given collection
  ➢ Manage your own collection
  ➢ Use a query federation system
  ➢ Link traversal based query execution
● Linked Data Queries
Querying a Given Collection
● Some public SPARQL endpoints provide access to a collection of data from multiple sources
  ● http://lod.openlinksw.com/sparql
  ● http://sparql.sindice.com/
● Pros:
  ● Nothing to set up
  ● Good query execution times
● Cons:
  ● Queried data might be out of date
  ● Not all relevant data in the collection
Setting up Your Own Collection
● RDF-specific DBMSs:
  ● Virtuoso http://virtuoso.openlinksw.com/
  ● Allegro Graph http://www.franz.com/agraph/allegrograph/
  ● Bigdata http://www.systap.com/bigdata.htm
  ● OWLIM http://www.ontotext.com/owlim
  ● 4store http://4store.org/
  ● Jena TDB http://jena.apache.org/
  ● Sesame http://www.openrdf.org/
  ● etc.
Populating Your Own Collection
● Datasets provided as RDF dumps
● (Focused) crawling
  ● ldspider http://code.google.com/p/ldspider/
Setting up Your Own Collection
● Pros:
  ● All relevant data
  ● Independent of existence, availability, efficiency of SPARQL endpoints
  ● Good query execution times (once set up properly)
● Cons:
  ● Effort to set up
  ● Effort to operate
  ● Queried data might be out of date
SPARQL Endpoint Federation
● Idea of federated query processing:
  ● Querying a query federation service (mediator)
  ● Mediator distributes sub-queries to relevant sources
  ● Finally, mediator combines sub-results
● Prototypes:
  ● FedX
  ● SPLENDID
  ● ANAPSID
SPARQL Endpoint Federation
● Pros:
  ● Queried data is up to date
● Cons:
  ● All relevant datasets must be exposed via a SPARQL endpoint
  ● Effort to set up the mediator
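The mediator idea (distribute sub-queries to relevant sources, then combine the sub-results) can be sketched as follows. This is a toy model under stated assumptions, not how FedX, SPLENDID or ANAPSID are implemented: the Endpoint class is a hypothetical in-memory stand-in for a remote SPARQL endpoint, each sub-query is a single triple pattern, and the sample triples (including ex:Etna and ex:date1) are invented for illustration.

```python
def match(tp, triple, mu):
    """Extend solution mapping mu so that pattern tp matches triple; None if impossible."""
    mu = dict(mu)
    for term, value in zip(tp, triple):
        if term.startswith("?"):              # variable
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:                   # constant must be identical
            return None
    return mu


class Endpoint:
    """Hypothetical stand-in for a remote SPARQL endpoint (no network access)."""

    def __init__(self, triples):
        self.triples = list(triples)

    def can_answer(self, tp):
        # Source selection; a real mediator might use statistics or an ASK query.
        return any(match(tp, t, {}) is not None for t in self.triples)

    def evaluate(self, tp, mu):
        # Sub-query: one triple pattern, restricted by the bindings in mu.
        return [m for t in self.triples
                if (m := match(tp, t, mu)) is not None]


def federated_query(patterns, endpoints):
    """Mediator: send each pattern to the relevant sources and join the sub-results."""
    solutions = [{}]
    for tp in patterns:
        relevant = [e for e in endpoints if e.can_answer(tp)]
        solutions = [m for mu in solutions
                     for e in relevant
                     for m in e.evaluate(tp, mu)]
    return solutions


# Invented sample data spread over two sources.
geo = Endpoint([("ex:Etna", "rdf:type", "umbel-sc:Volcano"),
                ("ex:Etna", "p:location", "dbpedia:Italy")])
eruptions = Endpoint([("ex:Etna", "p:lastEruption", "ex:date1")])

result = federated_query(
    [("?v", "rdf:type", "umbel-sc:Volcano"),
     ("?v", "p:location", "dbpedia:Italy"),
     ("?v", "p:lastEruption", "?ve")],
    [geo, eruptions])
print(result)
```

Passing the current bindings along with each sub-query keeps the intermediate results small; sending unrestricted sub-queries and joining at the mediator would be the alternative design.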
SPARQL 1.1 Federation Extension
● SERVICE pattern in SPARQL 1.1
  ● Explicitly specify query patterns whose execution must be distributed to a remote SPARQL endpoint
SELECT ?v ?ve WHERE
{
?v rdf:type umbel-sc:Volcano ;
p:location dbpedia:Italy .
SERVICE <http://volcanos.example.org/query> {
?v p:lastEruption ?ve }
}
For all these approaches ...
● … you have to know the relevant data sources beforehand
  ● When selecting a SPARQL endpoint over an existing collection of datasets
  ● When setting up your own collection
  ● When configuring your federation system
  ● When using the SERVICE pattern
● … you restrict yourself to the selected sources
● … you do not tap the full potential of the Web
Main Idea
● Intertwine query evaluation with traversal of data links
● We alternate between:
  ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data
  ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset
Example (animated figure, described step by step):
Query:
  ?actor actor_in <http://.../movie2449>
  ?actor lives_in ?loc
  <http://.../movie2449> filmingLocation ?loc
1. Look up http://.../movie2449 and add the retrieved data to the queried data.
2. The triple <http://mdb.../Paul> actor_in <http://.../movie2449> matches; intermediate solution: ?actor = http://mdb.../Paul
3. Look up http://mdb.../Paul and add the retrieved data to the queried data.
4. The triple <http://mdb.../Paul> lives_in <http://geo.../Berlin> matches; intermediate solution: ?actor = http://mdb.../Paul , ?loc = http://geo.../Berlin
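The alternation described above can be sketched in a few lines of Python. This is a simplified fixpoint formulation under stated assumptions, not the implementation used by the author: the Web is simulated by the dictionary WEB (URI to retrieved triples), the short names such as ex:movie2449 are hypothetical stand-ins for the abbreviated URIs on the slide, and the query is a plain list of triple patterns.

```python
def match(tp, triple, mu):
    """Extend solution mapping mu so that pattern tp matches triple; None if impossible."""
    mu = dict(mu)
    for term, value in zip(tp, triple):
        if term.startswith("?"):
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return mu


# Simulated Web of Linked Data: looking up a URI returns a set of triples.
WEB = {
    "ex:movie2449": [("ex:Paul", "actor_in", "ex:movie2449"),
                     ("ex:movie2449", "filmingLocation", "ex:Berlin")],
    "ex:Paul": [("ex:Paul", "lives_in", "ex:Berlin")],
}


def link_traversal(patterns, seeds, web):
    dataset, visited, to_visit = set(), set(), list(seeds)
    while True:
        # Step 1: look up pending URIs and add the retrieved data
        # to the query-local dataset.
        while to_visit:
            uri = to_visit.pop()
            if uri not in visited:
                visited.add(uri)
                dataset.update(web.get(uri, ()))
        # Step 2: evaluate the triple patterns over the current dataset and
        # collect URIs from intermediate solutions for the next round.
        solutions = [{}]
        for tp in patterns:
            solutions = [m for mu in solutions for t in dataset
                         if (m := match(tp, t, mu)) is not None]
            to_visit.extend(u for m in solutions for u in m.values()
                            if u not in visited)
        if not to_visit:        # nothing new to look up: fixpoint reached
            return solutions


query = [("?actor", "actor_in", "ex:movie2449"),
         ("?actor", "lives_in", "?loc"),
         ("ex:movie2449", "filmingLocation", "?loc")]
result = link_traversal(query, ["ex:movie2449"], WEB)
print(result)
```

On a finite simulated Web this loop terminates because every URI is looked up at most once; on the real Web there is no such guarantee.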
“Real World” Example
SELECT DISTINCT ?author ?phone WHERE {
?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
?pub swc:hasTopic ?topic . ?topic rdfs:label ?topicLabel .
FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
?pub swrc:author ?author .
{ ?author owl:sameAs ?authorAlt }
UNION
{ ?authorAlt owl:sameAs ?author }
?authorAlt foaf:phone ?phone
}
Return phone numbers of authors of ontology engineering papers at ESWC'09.
● Result size: 2
● # of retrieved docs: 297
● # of accessed servers: 16
● avg. execution time: 1min 30sec
Summary
O. Hartig and A. Langegger. A Database Perspective on Consuming Linked Data on the Web. Datenbankspektrum 10(2), 2010
Chapter 3
● Accessing a SPARQL Endpoint
● Queries over Multiple Datasets
  ➢ Query a given collection
  ➢ Manage your own collection
  ➢ Use a query federation system
  ➢ Link traversal based query execution
● Linked Data Queries
  ➢ Foundations
  ➢ Iterator Based Implementation
  ➢ Query Planning
SPARQL Pattern Evaluation
$\mathrm{eval}(P, G) = \{\mu_1, \mu_2, \ldots\}$
● P: the example pattern over ?actor and ?loc (actor_in, lives_in, filmingLocation with http://.../movie2449)
● Example solution: μ = { ?actor = http://mdb.../Paul , ?loc = http://geo.../Berlin }
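To make eval(P, G) concrete, here is a minimal sketch for the special case where P is a list of triple patterns (a basic graph pattern) and G is a list of triples. The function names and the sample graph are illustrative assumptions; the short ex: names stand in for the abbreviated URIs in the figure.

```python
def match(tp, triple, mu):
    """Extend solution mapping mu so that pattern tp matches triple; None if impossible."""
    mu = dict(mu)
    for term, value in zip(tp, triple):
        if term.startswith("?"):
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return mu


def eval_bgp(patterns, graph):
    """Return all solution mappings of the triple patterns over the graph."""
    solutions = [{}]                       # start with the empty mapping
    for tp in patterns:
        # Extend every mapping found so far by every compatible triple.
        solutions = [m for mu in solutions for t in graph
                     if (m := match(tp, t, mu)) is not None]
    return solutions


G = [("ex:Paul", "actor_in", "ex:movie2449"),
     ("ex:Paul", "lives_in", "ex:Berlin"),
     ("ex:Mary", "actor_in", "ex:movie2449"),
     ("ex:Mary", "lives_in", "ex:Paris"),
     ("ex:movie2449", "filmingLocation", "ex:Berlin")]

P = [("?actor", "actor_in", "ex:movie2449"),
     ("?actor", "lives_in", "?loc"),
     ("ex:movie2449", "filmingLocation", "?loc")]

mappings = eval_bgp(P, G)
print(mappings)
```

Only ex:Paul survives the last pattern, because ex:Mary lives in a place that is not a filming location in this sample graph.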
SPARQL Linked Data Query
$Q^{P}(W) = \{\mu_1, \mu_2, \ldots\}$
● Same example pattern P, now evaluated over a Web of Linked Data W instead of a single RDF graph
● Example solution: μ = { ?actor = http://mdb.../Paul , ?loc = http://geo.../Berlin }
Full-Web Semantics
$Q^{P}(W) = \mathrm{eval}(P, \mathrm{AllData}(W))$
Reachability-based Semantics
● Seed URIs S
● Reachability criterion c
$Q_{c}^{P,S}(W) = \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$
● $W^{*}$: the part of W that is reachable from the seed URIs S under criterion c
● Reachability criteria considered: $c_{All}$, $c_{None}$, $c_{Match}$
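The following sketch computes the data of the reachable part of a simulated Web for the three criteria. The slides only name the criteria; the definitions used here are assumptions based on how they are commonly described: cAll follows every URI in every discovered triple, cNone follows none (only the seed documents are reachable), and cMatch follows the URIs of a triple only if the triple matches some triple pattern of P. The dictionary web and all names in it are invented.

```python
def matches(tp, triple):
    """True if the triple matches the triple pattern tp."""
    mu = {}
    for term, value in zip(tp, triple):
        if term.startswith("?"):
            if mu.setdefault(term, value) != value:
                return False
        elif term != value:
            return False
    return True


# Reachability criteria: decide whether URI `uri` in `triple` is followed.
def c_all(triple, uri, patterns):
    return True


def c_none(triple, uri, patterns):
    return False


def c_match(triple, uri, patterns):
    return any(matches(tp, triple) for tp in patterns)


def reachable_data(seeds, web, patterns, criterion):
    """Collect all triples from documents reachable from the seed URIs."""
    visited, queue, data = set(), list(seeds), set()
    while queue:
        uri = queue.pop()
        if uri in visited or uri not in web:
            continue                       # already retrieved, or lookup yields nothing
        visited.add(uri)
        for triple in web[uri]:
            data.add(triple)
            for term in triple:
                if criterion(triple, term, patterns):
                    queue.append(term)
    return data


web = {
    "ex:A": [("ex:A", "ex:knows", "ex:B"), ("ex:A", "ex:likes", "ex:C")],
    "ex:B": [("ex:B", "ex:knows", "ex:D")],
    "ex:C": [("ex:C", "ex:name", "c")],
}
P = [("?x", "ex:knows", "?y")]

sizes = {c.__name__: len(reachable_data(["ex:A"], web, P, c))
         for c in (c_none, c_match, c_all)}
print(sizes)
```

In this toy Web, cMatch reaches the document for ex:B through the matching ex:knows triple but never looks up ex:C, whereas cAll retrieves everything.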
Computability
● (Ordinary) Turing machines unsuitable:
  ● Limited data access capabilities not properly captured
● Web machines
  ● Abiteboul and Vianu, 1997
  ● Mendelzon and Milo, 1997
● Query under consideration: $Q_{c_{Match}}^{P,S}(W)$
LD Machine
● Multi-tape Turing machine
  ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
  ➔ Input
  ➔ Work
  ➔ Output
● Access to Web input is restricted
  ● Only by performing a particular procedure in a particular state
Finitely Computable LD Queries
● For Q there exists an LD machine MQ such that for any W:
  ● MQ halts after a finite number of computation steps, and
  ● MQ outputs the complete result Q(W)
● Tapes over the computation steps 1, ∙ ∙ ∙, k - 3, k - 2, k - 1, k:
  ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
  ➔ Input
  ➔ Work
  ➔ Output (after the final step k): # enc(μ1) # enc(μ2) # ∙ ∙ ∙ # enc(μn) #
Eventually Computable LD Queries
● For Q there exists an LD machine MQ such that for any W:
  1. Output always encodes a subset of query result Q(W), and
  2. Each μ ∈ Q(W) eventually appears on the output
✗ No guarantee for termination
● Tapes over the computation steps ∙ ∙ ∙, k - 3, k - 2, k - 1, k, k + 1, k + 2, ∙ ∙ ∙:
  ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
  ➔ Input
  ➔ Work
  ➔ Output (so far): # enc(μ1) # enc(μ2)
Main Results for cMatch-Semantics
Theorem: Any satisfiable SPARQL based Linked Data query $Q_{c_{Match}}^{P,S}$ under cMatch-semantics that is monotonic is at least eventually computable; any non-monotonic $Q_{c_{Match}}^{P,S}$ is either finitely computable or not even eventually computable.
Problem: TERMINATION(cMatch)
  Web Input: W – a (potentially infinite) Web of Linked Data
  Ord. Input: S – a finite but nonempty set of seed URIs
              P – a SPARQL expression
  Question: Does an LD machine exist that computes $Q_{c_{Match}}^{P,S}(W)$ and halts?
Theorem: TERMINATION(cMatch) is not LD machine decidable.
Iterator Based Execution
Query:
  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Seed: <http://.../orgaX>
Plan: a chain of iterators, one per triple pattern, all working on the same query-local dataset:
  I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
  I2: tp2 = ( ?p , ex:interested_in , ?b )
  I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
Execution (animated figure, described step by step):
1. The request "Next?" is passed down the chain: from I3 to I2 to I1.
2. I1 finds the triple <http://.../alice> ex:affiliated_with <http://.../orgaX> in the query-local dataset and returns { ?p = <http://.../alice> }.
3. I2 applies this solution to tp2, which gives tp2' = ( <http://.../alice> , ex:interested_in , ?b ), finds the triple <http://.../alice> ex:interested_in <http://.../b1> and returns { ?p = <http://.../alice> , ?b = <http://.../b1> }.
4. I3 applies this solution to tp3, which gives tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> ), finds the triple <http://.../b1> rdf:type <http://.../Book> and returns { ?p = <http://.../alice> , ?b = <http://.../b1> } as a solution of the query.
Between these steps, URIs in the intermediate solutions are looked up and the retrieved data is added to the query-local dataset.
Alternative Execution Order
Same query and seed ( <http://.../orgaX> ), but the triple patterns are ordered differently:
  I1: tp1 = ( ?b , rdf:type , <http://.../Book> )
  I2: tp2 = ( ?p , ex:interested_in , ?b )
  I3: tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
Execution:
1. The request "Next?" is passed down the chain: from I3 to I2 to I1.
2. The query-local dataset contains <http://.../alice> ex:affiliated_with <http://.../orgaX> but no triple that matches tp1, so I1 reports END!
3. Consequently I2 and I3 report END! as well, and the computed result is empty.
● Computed query result may depend on the order of triple patterns (= logical query execution plan)
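The effect of the execution order can be reproduced with a small generator pipeline. This is a sketch under stated assumptions rather than the author's implementation: the Web is simulated by the dictionary WEB, the short ex: names are hypothetical stand-ins for the abbreviated URIs on the slides, and each iterator looks up the URIs of a solution right before passing it on.

```python
def match(tp, triple, mu):
    """Extend solution mapping mu so that pattern tp matches triple; None if impossible."""
    mu = dict(mu)
    for term, value in zip(tp, triple):
        if term.startswith("?"):
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return mu


def tp_iterator(tp, upstream, dataset, look_up):
    """Iterator for one triple pattern; pulls input solutions from upstream."""
    for mu in upstream:
        # Snapshot the dataset: it may grow while we iterate, and triples
        # added later are not considered for this input solution.
        for triple in list(dataset):
            m = match(tp, triple, mu)
            if m is not None:
                for uri in m.values():
                    look_up(uri)           # augment the query-local dataset
                yield m


def execute(plan, seeds, web):
    dataset, visited = set(), set()

    def look_up(uri):
        if uri in web and uri not in visited:
            visited.add(uri)
            dataset.update(web[uri])

    for seed in seeds:
        look_up(seed)
    pipeline = iter([{}])                  # a single empty solution as input
    for tp in plan:
        pipeline = tp_iterator(tp, pipeline, dataset, look_up)
    return list(pipeline)


WEB = {
    "ex:orgaX": [("ex:alice", "ex:affiliated_with", "ex:orgaX")],
    "ex:alice": [("ex:alice", "ex:interested_in", "ex:b1")],
    "ex:b1": [("ex:b1", "rdf:type", "ex:Book")],
}
affiliated = ("?p", "ex:affiliated_with", "ex:orgaX")
interested = ("?p", "ex:interested_in", "?b")
is_book = ("?b", "rdf:type", "ex:Book")

first_order = execute([affiliated, interested, is_book], ["ex:orgaX"], WEB)
alternative = execute([is_book, interested, affiliated], ["ex:orgaX"], WEB)
print(first_order)
print(alternative)
```

The first order discovers the data for alice and b1 on the way, while the alternative order starts with a pattern that has no match in the seed data and therefore ends immediately with an empty result.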
Query Plan Selection
Assumptions about $Q_{c_{Match}}^{P,S}$:
  ● P refers to instance data
  ● S = uris(P)
● Assessment criteria:
  ● Cost (query execution time)
  ● Benefit (size of the computed result)
● Cost and benefit must be estimated without plan execution
● Estimation impossible due to “zero knowledge”
● Heuristic Based Plan Selection
  ● DEPENDENCY RESPECT RULE
  ● SEED TP RULE
  ● NO VOCAB SEED RULE
  ● FILTERING TP RULE
DEPENDENCY RESPECT RULE
Use a dependency respecting query plan
Query:
  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
● Dependency respect: at least one variable of each triple pattern (other than the first) already occurs in one of the preceding triple patterns
● Rationale: Avoid Cartesian products
Dependency respecting plan (√):
  I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
  I2: tp2 = ( ?p , ex:interested_in , ?b )
  I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
Plan that is not dependency respecting (tp2 shares no variable with tp1):
  I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
  I2: tp2 = ( ?b , rdf:type , <http://.../Book> )
  I3: tp3 = ( ?p , ex:interested_in , ?b )
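A check for the dependency respect property takes only a few lines. The sketch below represents a plan as an ordered list of triple patterns; the function names and the ex: abbreviations are illustrative assumptions.

```python
from itertools import permutations


def variables(tp):
    """Return the set of variables (terms starting with '?') in a triple pattern."""
    return {term for term in tp if term.startswith("?")}


def is_dependency_respecting(plan):
    """True if every pattern after the first shares a variable with an earlier one."""
    if not plan:
        return True
    seen = variables(plan[0])
    for tp in plan[1:]:
        if not (variables(tp) & seen):
            return False                   # would require a Cartesian product
        seen |= variables(tp)
    return True


affiliated = ("?p", "ex:affiliated_with", "ex:orgaX")
interested = ("?p", "ex:interested_in", "?b")
is_book = ("?b", "rdf:type", "ex:Book")

good_plan = [affiliated, interested, is_book]
bad_plan = [affiliated, is_book, interested]
print(is_dependency_respecting(good_plan), is_dependency_respecting(bad_plan))

# Enumerate all orderings of the three patterns and keep the respecting ones.
respecting = [p for p in permutations(good_plan) if is_dependency_respecting(p)]
print(len(respecting), "of 6 orderings are dependency respecting")
```

For this query, the two orderings that place the affiliation pattern and the type pattern next to each other at the start are rejected, since those two patterns have no variable in common.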
SEED TP RULE
Use a plan with a seed triple pattern
● Potential seed triple pattern
  … is a triple pattern that contains at least one HTTP URI
● Seed triple pattern of a plan
  … is the first triple pattern in the plan and
  … is a potential seed triple pattern
● Rationale: good starting point
Query (all three triple patterns are potential seed triple patterns, √):
  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Recall: S = uris(P)
NO VOCAB SEED RULE
Avoid a seed triple pattern with vocabulary terms
● Not only vocabulary term URIs in the seed triple pattern
● Patterns to avoid:
  ?s ex:any_property ?o
  ?s rdf:type ex:any_class
● Rationale: URIs for vocabulary terms usually resolve to vocabulary definitions with little instance data
Query (only the first triple pattern qualifies as seed, √; the other two have the shapes to avoid):
  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
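The SEED TP RULE and the NO VOCAB SEED RULE can be approximated with two small predicates. This is a heuristic sketch under stated assumptions: URIs in predicate position and the object of rdf:type are treated as vocabulary terms, everything else as instance data, and the example.org URIs are hypothetical stand-ins for the abbreviated URIs in the query.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def is_http_uri(term):
    return term.startswith("http://") or term.startswith("https://")


def is_potential_seed(tp):
    """SEED TP RULE: the pattern contains at least one HTTP URI."""
    return any(is_http_uri(term) for term in tp)


def instance_uris(tp):
    """URIs of tp that are assumed to identify instance data (heuristic)."""
    s, p, o = tp
    uris = {term for term in (s, o) if is_http_uri(term)}   # predicates are vocabulary terms
    if p == RDF_TYPE:
        uris.discard(o)                                      # the object of rdf:type is a class
    return uris


def satisfies_no_vocab_seed(tp):
    """NO VOCAB SEED RULE: the pattern has at least one non-vocabulary URI."""
    return bool(instance_uris(tp))


EX = "http://example.org/"                                   # hypothetical namespace
query = [
    ("?p", EX + "affiliated_with", EX + "orgaX"),
    ("?p", EX + "interested_in", "?b"),
    ("?b", RDF_TYPE, EX + "Book"),
]

potential = [is_potential_seed(tp) for tp in query]
recommended = [satisfies_no_vocab_seed(tp) for tp in query]
print(potential)
print(recommended)
```

A plan generator could use satisfies_no_vocab_seed to restrict the candidates for the first position of a plan.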
FILTERING TP RULE
Use a plan where all filtering triple patterns are as close to the seed triple pattern as possible
● Filtering triple pattern: each variable already occurs in one of the preceding triple patterns
● For each result consumed as input, a filtering TP can only report 1 or 0 results as output
● Rationale: Reduce cost
Example:
  I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
      output: { ?p = <http://.../alice> }
  I2: tp2 = ( ?p , ex:interested_in , ?b )
      tp2' = ( <http://.../alice> , ex:interested_in , ?b )
      output: { ?p = <http://.../alice> , ?b = <http://.../b1> }
  I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
      tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
      (filtering TP: tp3' contains no variable)
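Whether a triple pattern acts as a filter depends on its position in the plan. The following sketch reports the positions of the filtering triple patterns of a plan; the function names are illustrative, and the second plan lists the seven triple patterns of the evaluation query used later in this chapter, in the order in which they are written there.

```python
def variables(tp):
    """Return the set of variables (terms starting with '?') in a triple pattern."""
    return {term for term in tp if term.startswith("?")}


def filtering_positions(plan):
    """Return the 1-based positions of the filtering triple patterns in a plan."""
    seen, positions = set(), []
    for position, tp in enumerate(plan, start=1):
        tp_vars = variables(tp)
        # Filtering TP: every variable already occurs in a preceding pattern.
        if position > 1 and tp_vars <= seen:
            positions.append(position)
        seen |= tp_vars
    return positions


small_plan = [
    ("?p", "ex:affiliated_with", "ex:orgaX"),
    ("?p", "ex:interested_in", "?b"),
    ("?b", "rdf:type", "ex:Book"),
]

species_plan = [
    ("geospecies:4qyn7", "gs:inFamily", "?fam"),
    ("?fam", "skos:narrowerTransitive", "?spec"),
    ("?spec", "skos:closeMatch", "?sp2"),
    ("?sp2", "rdfs:subClassOf", "?genus"),
    ("?spec", "gs:isExpectedIn", "?loc"),
    ("geospecies:4qyn7", "gs:isExpectedIn", "?loc"),
    ("?loc", "rdf:type", "gs:State"),
]

print(filtering_positions(small_plan))
print(filtering_positions(species_plan))
```

In the written order, the two filtering patterns of the species query sit at the very end of the plan, which is the situation the rule advises against.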
Evaluation Procedure
● Generate all possible plans
● Execute each plan:
  ● 5 runs (+ 1 initial warm-up run)
  ● Use an initially empty query-local dataset for each run
● Measure for each plan:
  ● Avg. execution time
  ● Avg. number of RDF documents retrieved during execution
  ● Avg. number of query results
Evaluation Query (Example)
SELECT ?spec ?genus WHERE {
geospecies:4qyn7 gs:inFamily ?fam .
?fam skos:narrowerTransitive ?spec .
?spec skos:closeMatch ?sp2 .
?sp2 rdfs:subClassOf ?genus .
?spec gs:isExpectedIn ?loc .
geospecies:4qyn7 gs:isExpectedIn ?loc .
?loc rdf:type gs:State . }
Of what genus are the species that are
● classified in the same family as the American Badger,
● and expected in the same states as the American Badger?
● 2 potential seed triple patterns that satisfy our NO VOCAB SEED RULE
● 56 different dependency respecting plans, each containing 2 filtering TPs
Picture source: Wikipedia
Measurements
[Figure: two plots over query exec. times (in seconds), one showing query results and one showing retrieved documents; two charts titled "1st Filtering TP" and "2nd Filtering TP" showing the percentage of plans in each group with a filtering TP in specific positions, over the TP position (1 to 7) in the ordered BGP.]
Summary (Linked Data Queries)
● Theoretical foundations of Linked Data queries
  ● Full-Web semantics, (family of) reachability based semantics
  ● Theoretical properties of queries (e.g. computability)
● Link traversal based query execution
  ● Novel paradigm for executing Linked Data queries
  ● Sound and complete for conjunctive Linked Data queries under cMatch-semantics
● Iterator implementation of the LTBQE paradigm
  ● Trades off completeness for a termination guarantee
  ● Degree of completeness depends on execution order of TPs
● Heuristic based plan selection
These slides have been created by Olaf Hartig
http://olafhartig.de
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)