How Caching Improves Efficiency and Result Completeness for Querying Linked Data

60
How Caching Improves Efficiency and Result Completeness for Querying Linked Data Olaf Hartig http://olafhartig.de/foaf.rdf#olaf Database and Information Systems Research Group Humboldt-Universität zu Berlin

description

That's the slides with which I presented my paper at the Linked Data on the Web workshop (LDOW) at WWW 2011, Hyderabad, India.

Transcript of How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Page 1: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

How Caching ImprovesEfficiency and Result Completeness

for Querying Linked Data

Olaf Hartighttp://olafhartig.de/foaf.rdf#olaf

Database and Information Systems Research GroupHumboldt-Universität zu Berlin

Page 2: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 2

SELECT DISTINCT ?i ?labelWHERE {

?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ; foaf:topic_interest ?i .

OPTIONAL { ?i rdfs:label ?label FILTER( LANG(?label)="en" || LANG(?label)="") }}ORDER BY ?label

?

Can we query the Web of Dataas of it were a single,giant database?

Our approach: Link Traversal Based Query Execution [ISWC'09]

Page 3: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 3

Main Idea

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Page 4: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 4

Main Idea

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 5: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 5

query-localdataset

Main Idea

http://bob.name

?

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 6: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 6

query-localdataset

Main Idea

http://bob.name

?

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 7: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 7

query-localdataset

Main Idea

http://bob.name

?

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 8: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 8

query-localdataset

Main Idea

http://bob.name

?

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

“Descriptor object”

Page 9: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 9

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

n

ame

project ?prj

Query

?prjName

know

s

http://bob.name

?acq

Page 10: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 10

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

knows

http://bob.name

http://alice.name

n

ame

project ?prj

Query

?prjName

know

s

http://bob.name

?acq n

ame

project ?prj

Query

?prjName

know

s

http://bob.name

?acq

Page 11: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 11

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http://alice.name

?acq

n

ame

project ?prj

Query

?prjName

know

s

http://bob.name

?acq

knows

http://bob.name

http://alice.name

Page 12: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 12

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http

://al

ice.

nam

e

?

http://alice.name

?acq

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 13: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 13

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http

://al

ice.

nam

e

?

http://alice.name

?acq

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 14: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 14

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http

://al

ice.

nam

e

?

http://alice.name

?acq

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 15: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 15

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

http://alice.name

?acq

Page 16: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 16

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http://alice.name

?acq

n

ame

?prjName

know

s

http://bob.nameQuery

project ?prj?acq

Page 17: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 17

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

project http://.../AlicesPrj

http://alice.name

http://alice.name

?acq

n

ame

?prjName

know

s

http://bob.nameQuery

project ?prj?acq

Page 18: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 18

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http://alice.name http://.../AlicesPrj

?prj?acq

http://alice.name

?acq

n

ame

?prjName

know

s

http://bob.nameQuery

project ?prj?acq

project http://.../AlicesPrj

http://alice.name

Page 19: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 19

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http://alice.name

?acq

n

ame

?prjName

know

s

http://bob.nameQuery

project ?prj?acq

http://alice.name http://.../AlicesPrj

?prj?acq

Page 20: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 20

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http://alice.name

?acq

n

ame

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

http://alice.name http://.../AlicesPrj

?prj?acq

http://.../AlicesPrj “ … “

?prjName?prj

Page 21: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 21

query-localdataset

Main Idea

http://alice.name

?acq

n

ame

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

http://alice.name http://.../AlicesPrj

?prj?acq

http://.../AlicesPrj “ … “

?prjName?prj

http://alice.name

?acqhttp://.../AlicesPrj “ … “

?prjName?prj

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Page 22: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 22

Characteristics

● Link traversal based query execution:● Evaluation on a continuously augmented dataset● Discovery of potentially relevant data during execution● Discovery driven by intermediate solutions

● Main advantage:● No need to know all data sources in advance

● Limitations:● Query has to contain a URI as a starting point● Ignores data that is not reachable* by the query execution

* formal definition in the paper

Page 23: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 23

query-localdataset

The Issue

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

Page 24: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 24

query-localdataset

The Issue

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

http://bob.name?

Page 25: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 25

query-localdataset

The Issue

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

knows

http://bob.name

http://alice.name

Page 26: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 26

query-localdataset

The Issue

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

knows

http://bob.name

http://alice.name

?acq ?iLabel?i

Page 27: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 27

query-localdataset

query-localdataset

The Issue

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

nam

e

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

Page 28: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 28

query-localdataset

query-localdataset

Reusing the Query-Local Dataset

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

nam

e

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

Page 29: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 29

query-localdataset

Reusing the Query-Local Dataset

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

nam

e

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

knows

http://alice.name

http://bob.name

Page 30: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 30

query-localdataset

Reusing the Query-Local Dataset

lab

el

interest?i

Querykn

ows

?iLabel

?acq

http://bob.name

nam

e

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

http://alice.name

?acq

knows

http://bob.name

http://alice.name

Page 31: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 31

Re-using the query-local dataset (a.k.a. data caching)may benefit

query performance + result completeness

Hypothesis

Page 32: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 32

Contributions

● Systematic analysis of the impact of data caching● Theoretical foundation* ● Conceptual analysis* ● Empirical evaluation of the potential impact

● Out of scope: Caching strategies (replacement, invalidation)

*see paper

Page 33: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 33

Experiment – Scenario

● Information about thedistributed social network of FOAF profiles● 5 types of queries

● Experiment Setup:● 23 persons● Sequential use➔ 115 queries

Page 34: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 34

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse given order

Experiment – Complete Sequence

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

hit rate

● no reuse experiment:● No data caching

● given order experiment● Reuse of the query-local

dataset for the complete sequence of all 115 queries

● Hit rate:

look-ups answered from cacheall look-up requests

Page 35: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 35

Experiment – Complete Sequence

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

hit rate

● no reuse experiment:● No data caching

● given order experiment● Reuse of the query-local

dataset for the complete sequence of all 115 queries

● Hit rate:

look-ups answered from cacheall look-up requests

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse given order

Page 36: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 36

Experiment – Complete Sequence

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

hit rate query execution time(in seconds)

number of query results

0 5 10 15 20 25 30

0 5 10 15 20 25 30

0 20 40 60 80

0 20 40 60 80

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse given order

Page 37: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 37

Experiment – Complete Sequence

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

hit rate query execution time(in seconds)

number of query results

0 5 10 15 20 25 30

0 5 10 15 20 25 30

0 20 40 60 80

0 20 40 60 80

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse given order

Page 38: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 38

Summary

● Contributions:● Theoretical foundation● Conceptual analysis● Empirical evaluation

● Main findings:● Additional results possible (for semantically similar queries)● Impact on performance may be positive but also negative

● Future work:● Analysis of caching strategies in our context● Main issue: invalidation

Page 39: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 39

Backup Slides

Page 40: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 40

Contributions

● Theoretical foundation (extension of the original definition)

● Reachability by a Dseed

-initialized execution of a BGP query b

● Dseed

-dependent solution for a BGP query b

● Reachability R(B) for a serial execution of B = b1 , … , b

n

➔ Each solution for bcur

is also R(B)-dependent solution for bcur

● Conceptual analysis of the impact of data caching

● Performance factor: p( bcur

, B ) = c( bcur

, [ ] ) – c( bcur

, B )

● Serendipity factor: s( bcur

, B ) = b( bcur

, B ) – b( bcur

, [ ] )

● Empirical verification of the potential impact

● Out of scope: Caching strategies (replacement, invalidation)

Page 41: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 41

Query Template Contact

SELECT * WHERE { <PERSON> foaf:knows ?p .

OPTIONAL { ?p foaf:name ?name } OPTIONAL { ?p foaf:firstName ?firstName } OPTIONAL { ?p foaf:givenName ?givenName } OPTIONAL { ?p foaf:givenname ?givenname } OPTIONAL { ?p foaf:familyName ?familyName } OPTIONAL { ?p foaf:family_name ?family_name } OPTIONAL { ?p foaf:lastName ?lastName } OPTIONAL { ?p foaf:surname ?surname }

OPTIONAL { ?p foaf:birthday ?birthday }

OPTIONAL { ?p foaf:img ?img }

OPTIONAL { ?p foaf:phone ?phone } OPTIONAL { ?p foaf:aimChatID ?aimChatID } OPTIONAL { ?p foaf:icqChatID ?icqChatID } OPTIONAL { ?p foaf:jabberID ?jabberID } OPTIONAL { ?p foaf:msnChatID ?msnChatID } OPTIONAL { ?p foaf:skypeID ?skypeID } OPTIONAL { ?p foaf:yahooChatID ?yahooChatID }

}

Page 42: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 42

Query Template UnsetProps

SELECT DISTINCT ?result ?resultLabel WHERE{

?result rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> .?result rdfs:domain foaf:Person .

OPTIONAL { <PERSON> ?result ?var0 } FILTER ( !bound(?var0) )

<PERSON> foaf:knows ?var2 .?var2 ?result ?var3 .?result rdfs:label ?resultLabel .?result vs:term_status ?var1 .

}ORDER BY ?var1

Page 43: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 43

Query Template Incoming

SELECT DISTINCT ?result WHERE{

?result foaf:knows <PERSON> .

OPTIONAL{

?result foaf:knows ?var1 .FILTER ( <PERSON> = ?var1 )<PERSON> foaf:knows ?result .

}FILTER ( !bound(?var1) )

}

Page 44: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 44

Query Template 2ndDegree1

SELECT DISTINCT ?result WHERE{

<PERSON> foaf:knows ?p1 .<PERSON> foaf:knows ?p2 .FILTER ( ?p1 != ?p2 )

?p1 foaf:knows ?result .FILTER ( <PERSON> != ?result )?p2 foaf:knows ?result .

OPTIONAL {<PERSON> ?knows ?result .FILTER ( ?knows = foaf:knows )

}FILTER ( !bound(?knows) )

}

Page 45: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 45

Query Template 2ndDegree2

SELECT DISTINCT ?result WHERE{

<PERSON> foaf:knows ?p1 .<PERSON> foaf:knows ?p2 .FILTER ( ?p1 != ?p2 )

?result foaf:knows ?p1 .FILTER ( <PERSON> != ?result )?result foaf:knows ?p2 .

OPTIONAL {<PERSON> ?knows ?result .FILTER ( ?knows = foaf:knows )

}FILTER ( !bound(?knows) )

}

Page 46: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 46

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse upper bound

Experiment – Single Query

hit rate

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

● no reuse experiment:● No data caching

● upper bound experiment● Reuse of query-local dataset

for 3 executions of each query● Third execution measured

● Hit rate:

look-ups answered from cacheall look-up requests

Page 47: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 47

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse upper bound

Experiment – Single Query

hit rate

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

● no reuse experiment:● No data caching

● upper bound experiment● Reuse of query-local dataset

for 3 executions of each query● Third execution measured

● Hit rate:

look-ups answered from cacheall look-up requests

Page 48: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 48

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse upper bound

Experiment – Single Query

hit rate query execution time(in seconds)

number of query results

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

0 5 10 15 20 25 30

0 5 10 15 20 25 30

0 20 40 60 80

0 20 40 60 80

Page 49: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 49

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse upper bound

Experiment – Single Query

hit rate query execution time(in seconds)

number of query results

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

0 5 10 15 20 25 30

0 5 10 15 20 25 30

0 20 40 60 80

0 20 40 60 80

Page 50: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 50

Experiment – Single Query

● In the ideal case for Bupper

= [ bcur

, bcur

] :

● pupper

( bcur

, Bupper

) = c( bcur

, [ ] ) – c( bcur

, Bupper

) = c( bcur

, [ ] )

● supper

( bcur

, Bupper

) = b( bcur

, Bupper

) – b( bcur

, [ ] ) = 0

Experiment Avg.1 number of Query Results

(std.dev.)

Average1

Hit Rate

(std.dev.)

Avg.1 query Execution Time

(std.dev.)

no reuse4.983

(11.658)

0.576(0.182)

30.036 s(46.708)

upper bound5.070

(11.813)

0.996(0.017)

1.943 s(11.375)

1 Averaged over all 115 queries

Page 51: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 51

Experiment – Single Query

● Summary (measurement errors aside):● Same number of query results● Significant improvements in query performance

Experiment Avg.1 number of Query Results

(std.dev.)

Average1

Hit Rate

(std.dev.)

Avg.1 query Execution Time

(std.dev.)

no reuse4.983

(11.658)

0.576(0.182)

30.036 s(46.708)

upper bound5.070

(11.813)

0.996(0.017)

1.943 s(11.375)

1 Averaged over all 115 queries

Page 52: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 52

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse upper bound

given order

Experiment – Complete Sequence

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

hit rate

0 5 10 15 20 25 30

0 5 10 15 20 25 30

0 20 40 60 80

0 20 40 60 80

query execution time(in seconds)

number of query results

● given order experiment:● Reuse of the query-local

dataset for the complete sequence of all 115 queries

Page 53: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 53

0 5 10 15 20 25 30

0 5 10 15 20 25 30

0 20 40 60 80

0 20 40 60 80

query execution time(in seconds)

number of query results

Experiment – Complete Sequence

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

hit rate

● given order experiment:● Reuse of the query-local

dataset for the complete sequence of all 115 queries

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse upper bound

given order

Page 54: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 54

Experiment – Complete Sequence

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

0 5 10 15 20 25 30

0 5 10 15 20 25 30

0 20 40 60 80

0 20 40 60 80

hit rate query execution time(in seconds)

number of query results

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse upper bound

given order

Page 55: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 55

Experiment – Complete Sequence

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

0 5 10 15 20 25 30

0 5 10 15 20 25 30

0 20 40 60 80

0 20 40 60 80

hit rate query execution time(in seconds)

number of query results

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse upper bound

given order

Page 56: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 56

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse upper bound

given order

Experiment – Complete Sequence

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

0 5 10 15 20 25 30

0 5 10 15 20 25 30

0 20 40 60 80

0 20 40 60 80

hit rate query execution time(in seconds)

number of query results

Bgiven order

= [ q1 , … , q

38 ]

s( q39

, Bgiven order

) = b( q39

, Bgiven order

) – b( q39

, [ ] )

= 9 – 1

= 8

Page 57: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 57

ContactInfoPhillipe

UnsetPropsPhillipe

2ndDegree1Phillipe

2ndDegree2Phillipe

IncomingPhillipe

0 0,2 0,4 0,6 0,8 1

0 0,2 0,4 0,6 0,8 1

no reuse upper bound

given order

Experiment – Complete Sequence

(Query No. 36)

(Query No. 37)

(Query No. 38)

(Query No. 39)

(Query No. 40)

0 5 10 15 20 25 30

0 5 10 15 20 25 30

0 20 40 60 80

0 20 40 60 80

hit rate query execution time(in seconds)

number of query results

Bgiven order

= [ q1 , … , q

38 ]

p'( q39

, Bgiven order

) = c'( q39

, [ ] ) – c'( q39

, Bgiven order

)

= 31.48 s – 68.64 s

= – 37.16 s

Page 58: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 58

Experiment – Complete Sequence

● Summary:● Data cache may provide for additional query results● Impact on performance may be positive but also negative

Experiment Avg.1 number of Query Results

(std.dev.)

Average1

Hit Rate

(std.dev.)

Avg.1 query Execution Time

(std.dev.)

no reuse4.983

(11.658)

0.576(0.182)

30.036 s(46.708)

upper bound5.070

(11.813)

0.996(0.017)

1.943 s(11.375)

given order6.878

(12.158)

0.932(0.139)

39.845 s(145.898)

1 Averaged over all 115 queries

Page 59: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 59

Experiment – Complete Sequence

● Executing the query sequence in a random order results in measurements similar to the given order.

Experiment Avg.1 number of Query Results

(std.dev.)

Average1

Hit Rate

(std.dev.)

Avg.1 query Execution Time

(std.dev.)

no reuse4.983

(11.658)

0.576(0.182)

30.036 s(46.708)

upper bound5.070

(11.813)

0.996(0.017)

1.943 s(11.375)

given order6.878

(12.158)

0.932(0.139)

39.845 s(145.898)

random orders6.652

(11.966)

0.954(0.036)

36.994 s(118.700)

Page 60: How Caching Improves Efficiency and Result Completeness for Querying Linked Data

Olaf Hartig - How Caching Improves Efficiency and Result Completeness for Querying Linked Data 60

These slides have been created byOlaf Hartig

http://olafhartig.de

This work is licensed under aCreative Commons Attribution-Share Alike 3.0 License

(http://creativecommons.org/licenses/by-sa/3.0/)