SPARQL Query Graph Model (How to improve query evaluation?) Ralf Heese and Olaf Hartig Humboldt-Universität zu Berlin

Page 1

SPARQL Query Graph Model (How to improve query evaluation?)

Ralf Heese and Olaf Hartig

Humboldt-Universität zu Berlin

Page 2

Ralf Heese, SPARQL Query Graph Model 2

A Posting in a Newsgroup

Question:

• A series of SPARQL queries of the form: … WHERE { {?family <http://dad> ?d . ?d <http://name> "Peter" .} {?family <http://mom> ?m . ?m <http://name> "Robin" .} …

• My queries run very slowly

• Simple queries on a database of 10,000 trees describing families

Answer:

• Put the more specific part of the query first; it makes a significant difference. …

Reply:

• … My time went from 33000 ms to 150 ms. …

Date: Mar 8, 2006

http://groups.yahoo.com/group/jena-dev/message/21436

Page 3

One query, many ways to execute

Order 1:

{?family <http://dad> ?d . ?d <http://name> "Peter" .}

{?family <http://mom> ?m . ?m <http://name> "Robin" .}

{?family <http://pet> ?p . ?p <http://name> "Toller" .}

Order 2:

{?family <http://mom> ?m . ?m <http://name> "Robin" .}

{?family <http://dad> ?d . ?d <http://name> "Peter" .}

{?family <http://pet> ?p . ?p <http://name> "Toller" .}

Order 3:

{?family <http://mom> ?m . ?m <http://name> "Robin" .}

{?family <http://pet> ?p . ?p <http://name> "Toller" .}

{?family <http://dad> ?d . ?d <http://name> "Peter" .}
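Three group patterns admit 3! = 6 left-deep join orders; the slide lists three of them. The Python sketch below enumerates all six and picks the cheapest under a deliberately simple, assumed cost model: the sum of all intermediate result sizes, treating the patterns as statistically independent. The cardinality estimates in ESTIMATED_MATCHES are invented for illustration only. With them the sketch lands on the newsgroup advice of evaluating the most specific pattern first.

```python
from itertools import permutations

# Hypothetical estimates of how many of the 10,000 families match each
# group pattern; invented for illustration, not measured.
TOTAL_FAMILIES = 10_000
ESTIMATED_MATCHES = {
    "dad named Peter": 40,
    "mom named Robin": 25,
    "pet named Toller": 2,
}


def plan_cost(order):
    """Estimated cost of a left-deep join order: the sum of all
    intermediate result sizes, assuming independent patterns."""
    size = float(TOTAL_FAMILIES)
    cost = 0.0
    for pattern in order:
        # Joining with a pattern keeps only the fraction of families it matches.
        size *= ESTIMATED_MATCHES[pattern] / TOTAL_FAMILIES
        cost += size
    return cost


def best_order():
    """Enumerate all 3! = 6 orders and return the cheapest one."""
    return min(permutations(ESTIMATED_MATCHES), key=plan_cost)


if __name__ == "__main__":
    for order in sorted(permutations(ESTIMATED_MATCHES), key=plan_cost):
        print(f"{plan_cost(order):10.5f}  " + " -> ".join(order))
```

Real optimizers replace the invented numbers with statistics collected from the store; exhaustive enumeration is only affordable for a handful of patterns.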

Page 4

Outline

Query processing in databases

SPARQL query graph model (SQGM)

Transforming SQGMs

Evaluation

Conclusion

Page 5

Query Processing in Databases

Page 6

Internal representation of the query: SPARQL Query Graph Model

Tasks of the query engine

• Query parsing

• Query rewriting

• QEP generation

• QEP execution

QEP = Query Execution Plan

Page 7

SPARQL Query Graph Model (SQGM)

Page 8

Advantages

• Supports all phases of query processing

• Adaptable to changes of the query language

• Extensible to new concepts of the query language

• Stores additional information needed for query processing

Page 9

Basic Structures

Directed graph

Operation

• Head: provided variables

• Body: operation details

Dataflow

• Connects the output of one operation with the input of another
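A minimal Python sketch of these basic structures, assuming an operation is a plain object with a head (the list of variables it provides) and a free-form body, and a dataflow is a directed edge between two operations. The class and method names (Operation, Dataflow, QueryGraph, inputs_of) are hypothetical and not taken from the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """A node of the query graph: the head lists the variables the
    operation provides, the body holds operation-specific details."""
    name: str
    head: list[str]
    body: dict = field(default_factory=dict)


@dataclass
class Dataflow:
    """A directed edge: the output of `source` is an input of `target`."""
    source: Operation
    target: Operation


@dataclass
class QueryGraph:
    """A directed graph of operations connected by dataflows."""
    operations: list[Operation] = field(default_factory=list)
    dataflows: list[Dataflow] = field(default_factory=list)

    def add(self, op: Operation) -> Operation:
        self.operations.append(op)
        return op

    def connect(self, source: Operation, target: Operation) -> None:
        self.dataflows.append(Dataflow(source, target))

    def inputs_of(self, op: Operation) -> list[Operation]:
        # Identity comparison: two operations with equal fields are still distinct nodes.
        return [d.source for d in self.dataflows if d.target is op]


if __name__ == "__main__":
    g = QueryGraph()
    bgp = g.add(Operation("BGP", ["?s", "?n"], {"patterns": [("?s", "ub:name", "?n")]}))
    select = g.add(Operation("Select", ["?n"], {"projection": ["?n"]}))
    g.connect(bgp, select)
    print([op.name for op in g.inputs_of(select)])
```

Keeping the body as an open dictionary is one way to get the extensibility the previous slide asks for: a new operation type only needs new body entries, not a new graph structure.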

Page 10

Constructing an SQGM

SELECT ?n ?c FROM <http://example.org/university.rdf> WHERE { ?s rdf:type ub:GraduateStudent . OPTIONAL { ?s ub:takesCourse ?c . } ?s ub:name ?n . }

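Figure: the SQGM of this query. The graph http://example.org/university.rdf feeds three basic graph patterns: ?s rdf:type ub:GraduateStudent (head ?s), ?s ub:takesCourse ?c (head ?s ?c) and ?s ub:name ?n (head ?s ?n). A Join combines the first pattern with the takesCourse pattern, which arrives via an optional dataflow (head ?s ?c); a second Join adds the name pattern (head ?s ?c ?n); Select projects ?n ?c.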


Page 11

Operation Types and Dataflow Types

Variable providing operations

Graph providing operations

Variable dataflows

Graph dataflows

Figure: the SQGM of the example query, annotated with its operation and dataflow types.

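Building the SQGM of the example query can be sketched in Python as below. The sketch assumes that a join's head is the union of its inputs' heads, that the graph operation provides a graph rather than variables (so its head is empty), and that each dataflow records whether it carries a graph or variable bindings and whether it is optional. All names (Operation, Dataflow, union_heads, build_example) are hypothetical; the sketch only mirrors the figure and does not evaluate the query.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    kind: str            # "Graph", "BGP", "Join" or "Select"
    head: list[str]      # variables provided by the operation
    body: dict = field(default_factory=dict)


@dataclass
class Dataflow:
    source: Operation
    target: Operation
    kind: str            # "graph" or "variables"
    optional: bool = False


def union_heads(*ops):
    """Head of a join: the variables of all inputs, without duplicates."""
    seen = []
    for op in ops:
        for var in op.head:
            if var not in seen:
                seen.append(var)
    return seen


def build_example():
    flows = []
    # Graph providing operation: it delivers a graph, not variable bindings.
    graph = Operation("Graph", [], {"iri": "http://example.org/university.rdf"})

    def bgp(s, p, o, head):
        op = Operation("BGP", head, {"patterns": [(s, p, o)]})
        flows.append(Dataflow(graph, op, "graph"))  # graph dataflow
        return op

    student = bgp("?s", "rdf:type", "ub:GraduateStudent", ["?s"])
    course = bgp("?s", "ub:takesCourse", "?c", ["?s", "?c"])
    name = bgp("?s", "ub:name", "?n", ["?s", "?n"])

    # OPTIONAL { ?s ub:takesCourse ?c } becomes a Join whose second input is optional.
    join1 = Operation("Join", union_heads(student, course))
    flows += [Dataflow(student, join1, "variables"),
              Dataflow(course, join1, "variables", optional=True)]

    join2 = Operation("Join", union_heads(join1, name))
    flows += [Dataflow(join1, join2, "variables"),
              Dataflow(name, join2, "variables")]

    select = Operation("Select", ["?n", "?c"], {"projection": ["?n", "?c"]})
    flows.append(Dataflow(join2, select, "variables"))
    return select, flows


if __name__ == "__main__":
    select, flows = build_example()
    for f in flows:
        print(f.source.kind, f.source.head, "->", f.target.kind, f.kind, f.optional)
```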

Page 12

Transforming SQGMs

Page 13

Query Rewriting

Goals

• More efficient evaluation of a query

• Provide more options for the generation of query plans, e.g.,

Data access strategy

Join order

Selection of indexes

Means

• Rule-based transformation, i.e., restructuring of the query, detection of redundancies and contradictions

• Heuristic = set of rules

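Rule-based rewriting can be sketched as a loop that applies the rules of a heuristic until none of them changes the query any more. The Python sketch below is an illustration under stated assumptions, not the authors' rule engine: query trees are nested (operator, payload, children) tuples standing in for an SQGM, a rule returns a rewritten node or None, and the sample rule flatten_joins (a hypothetical name) merges nested inner joins. The loop terminates only if every rule eventually stops firing.

```python
# A node is (operator, payload, children); children is a tuple of nodes.
# This encoding is a hypothetical stand-in for an SQGM.

def flatten_joins(node):
    """Sample rule: Join(Join(a, b), c) -> Join(a, b, c); None if not applicable."""
    op, _, children = node
    if op != "Join" or not any(child[0] == "Join" for child in children):
        return None
    flat = []
    for child in children:
        flat.extend(child[2] if child[0] == "Join" else (child,))
    return ("Join", None, tuple(flat))


def rewrite_once(node, heuristic):
    """Apply the first matching rule at this node, otherwise recurse into the children."""
    for rule in heuristic:
        result = rule(node)
        if result is not None:
            return result
    op, payload, children = node
    return (op, payload, tuple(rewrite_once(child, heuristic) for child in children))


def rewrite(node, heuristic):
    """Apply a heuristic (a list of rules) until a fixpoint is reached."""
    while True:
        new_node = rewrite_once(node, heuristic)
        if new_node == node:
            return node
        node = new_node


if __name__ == "__main__":
    a = ("BGP", (("?s", "ex:p", "?a"),), ())
    b = ("BGP", (("?s", "ex:q", "?b"),), ())
    c = ("BGP", (("?s", "ex:r", "?c"),), ())
    query = ("Select", ("?a",), (("Join", None, (("Join", None, (a, b)), c)),))
    print(rewrite(query, [flatten_joins]))
```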

Page 14

Figure: the SQGM of the example query. The three basic graph patterns feed two different Join operations; ?s ub:takesCourse ?c is attached via an optional dataflow.

Heuristic: Combine Basic Graph Patterns

The basic graph patterns cannot be merged, but they could be merged if they were operands of the same join operation.

Apply transformation rules to the SQGM
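The combine heuristic can be expressed as a single rule. This Python sketch uses a hypothetical tuple encoding as a stand-in for an SQGM, with "LeftJoin" standing for a Join that has an optional input; combine_bgps and apply_bottom_up are illustrative names. Applied to the original query shape, the rule finds nothing to merge because the type and name patterns sit under different joins; it fires only once they are operands of the same Join.

```python
# Nodes are (operator, payload, children); "LeftJoin" stands for a Join whose
# second input is an optional dataflow. The encoding is hypothetical.

def combine_bgps(node):
    """Rule: merge all BGP operands of one inner Join into a single BGP."""
    op, _, children = node
    if op != "Join":
        return None
    bgps = [child for child in children if child[0] == "BGP"]
    if len(bgps) < 2:
        return None  # nothing to merge
    merged = ("BGP", tuple(triple for bgp in bgps for triple in bgp[1]), ())
    others = tuple(child for child in children if child[0] != "BGP")
    # If only BGPs were joined, the merged BGP replaces the Join entirely.
    return merged if not others else ("Join", None, (merged, *others))


def apply_bottom_up(node, rule):
    """Rewrite the children first, then try the rule on the node itself."""
    op, payload, children = node
    node = (op, payload, tuple(apply_bottom_up(child, rule) for child in children))
    result = rule(node)
    return node if result is None else result


TYPE = ("BGP", (("?s", "rdf:type", "ub:GraduateStudent"),), ())
COURSE = ("BGP", (("?s", "ub:takesCourse", "?c"),), ())
NAME = ("BGP", (("?s", "ub:name", "?n"),), ())

# Original shape: (TYPE OPTIONAL COURSE) joined with NAME; combine_bgps does not apply.
ORIGINAL = ("Join", None, (("LeftJoin", None, (TYPE, COURSE)), NAME))
# Reordered shape: TYPE and NAME are operands of the same Join.
REORDERED = ("LeftJoin", None, (("Join", None, (TYPE, NAME)), COURSE))

if __name__ == "__main__":
    print(apply_bottom_up(ORIGINAL, combine_bgps) == ORIGINAL)
    print(apply_bottom_up(REORDERED, combine_bgps))
```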

Page 15

Next Step

Figure: the SQGM after the rewrite. The pattern ?s ub:name ?n is now joined with ?s rdf:type ub:GraduateStudent first (head ?s ?n); a second Join attaches ?s ub:takesCourse ?c via an optional dataflow (head ?s ?c ?n); Select projects ?n ?c.

Apply another transformation rule

The two patterns that are now operands of the same Join are combined into one basic graph pattern, ?s rdf:type ub:GraduateStudent . ?s ub:name ?n (head ?s ?n).
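One rule that produces this shape moves an OPTIONAL past a join: (A OPTIONAL B) JOIN C becomes (A JOIN C) OPTIONAL B. The sketch below assumes this is the kind of rule applied on the slide and guards it with a condition under which both forms return the same answers when A, B and C are basic graph patterns: every variable shared by B and C must also occur in A. In the example, B = ?s ub:takesCourse ?c and C = ?s ub:name ?n share only ?s, which A = ?s rdf:type ub:GraduateStudent binds, so the rule applies. The tuple encoding and the function names are hypothetical.

```python
# Nodes are (operator, payload, children); "LeftJoin" stands for a Join with
# an optional second input. Encoding and names are hypothetical.

def variables(node):
    """All variables mentioned in a (sub)query."""
    op, payload, children = node
    if op == "BGP":
        return {term for triple in payload for term in triple if term.startswith("?")}
    return set().union(*(variables(child) for child in children))


def move_optional_up(node):
    """Rule: Join(LeftJoin(A, B), C) -> LeftJoin(Join(A, C), B).

    Applied only if every variable shared by B and C also occurs in A,
    which keeps the answers unchanged when A, B and C are basic graph patterns.
    """
    op, _, children = node
    if op != "Join" or len(children) != 2:
        return None
    # The LeftJoin may be either operand of the (commutative) inner Join.
    for left_join, other in (children, children[::-1]):
        if left_join[0] == "LeftJoin":
            a, b = left_join[2]
            if variables(b) & variables(other) <= variables(a):
                return ("LeftJoin", None, (("Join", None, (a, other)), b))
    return None


TYPE = ("BGP", (("?s", "rdf:type", "ub:GraduateStudent"),), ())
COURSE = ("BGP", (("?s", "ub:takesCourse", "?c"),), ())
NAME = ("BGP", (("?s", "ub:name", "?n"),), ())

if __name__ == "__main__":
    query = ("Join", None, (("LeftJoin", None, (TYPE, COURSE)), NAME))
    print(move_optional_up(query))
```

Without the guard the rule would be unsound: if C also used ?c, then evaluating C before the OPTIONAL could keep solutions that the original query discards.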

Page 16

Evaluation

Page 17

Prototype

Setup

• Jena Semantic Web Framework

• ARQ – SPARQL query processor for Jena

• RDF graphs stored on secondary storage

Extended by

• SPARQL query graph model

• Rule engine


Page 18

Interaction between ARQ and SQGM extension

Components: ARQ, SQGM extension

• SPARQL query

• Construction of an ARQ query model

• Translation into an SQGM

• Rewriting of the SQGM

• Translation into an ARQ model

• Generation of a query execution plan (QEP)

• Execution of the QEP

• Query result

Page 19

Evaluation – Setup

RDF Data

A set of 41 SPARQL queries

• Different combinations of graph patterns including OPTIONAL, FILTER and UNION

Dataset          UnivBench (1.0)   UnivBench (5.0)   UnivBench (10.0)
#Triples         100,543           624,532           1,272,575
#Resources       20,659            129,533           263,427

Generator: UBA (v1.7) of the Lehigh University Benchmark, http://swat.cse.lehigh.edu/projects/lubm/


Page 20

Evaluation – Results

Measured query execution time of a selected query

• Factor 2.4

• Time needed for transformation between models: < 1 ms

• Average time savings of approx. 87%

• Only one case with slightly higher execution time

Chart: execution time of the selected query in seconds

Dataset            Original query   Rewritten query
UnivBench (1.0)    5.8              2.5
UnivBench (5.0)    39.4             16.4
UnivBench (10.0)   77.9             32.3


Selected query: SELECT ?n ?c FROM <http://example.org/university.rdf> WHERE { ?s rdf:type ub:GraduateStudent . OPTIONAL { ?s ub:takesCourse ?c . } ?s ub:name ?n . }
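The speed-up behind the "Factor 2.4" bullet can be recomputed from the chart values. This small Python check assumes that for UnivBench (10.0) the original query took 77.9 s and the rewritten one 32.3 s (the two labels were fused in the extracted chart); under that reading each dataset yields a factor of about 2.3 to 2.4 for this one query.

```python
# Execution times in seconds (original, rewritten) read from the chart.
# Assumption: for UnivBench (10.0) the original query took 77.9 s.
TIMES = {
    "UnivBench (1.0)": (5.8, 2.5),
    "UnivBench (5.0)": (39.4, 16.4),
    "UnivBench (10.0)": (77.9, 32.3),
}


def summarize(times):
    """Return {dataset: (speed-up factor, relative time saving)}."""
    return {
        name: (original / rewritten, 1 - rewritten / original)
        for name, (original, rewritten) in times.items()
    }


if __name__ == "__main__":
    for name, (factor, saving) in summarize(TIMES).items():
        print(f"{name}: factor {factor:.1f}, time saving {saving:.0%}")
```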

Page 21

Explanation for the Result

Fast path algorithm of Jena

• Perform pattern matching within the underlying relational database

• Match multiple filtered basic graph patterns

Original query (fast path not applicable):

WHERE { ?s rdf:type ub:GraduateStudent . OPTIONAL { ?s ub:takesCourse ?c . } ?s ub:name ?n . }

Rewritten query (fast path applicable):

WHERE { ?s rdf:type ub:GraduateStudent . ?s ub:name ?n . OPTIONAL { ?s ub:takesCourse ?c . } }
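Why the rewrite helps pattern matching inside a relational database can be illustrated by translating one basic graph pattern into a single SQL query. The Python sketch below assumes a hypothetical table triples(s, p, o) holding plain strings and invented sample data (ex:alice, ex:bob); it is not Jena's database layout or fast-path code. After the rewrite, ?s rdf:type ub:GraduateStudent and ?s ub:name ?n form one basic graph pattern and therefore one self-join that the database can evaluate as a whole.

```python
import sqlite3


def bgp_to_sql(patterns):
    """Translate a basic graph pattern into one SQL self-join over a
    hypothetical table triples(s, p, o): one alias per triple pattern,
    equality conditions for shared variables, parameters for constants."""
    conditions, params, bound = [], [], {}
    for i, triple in enumerate(patterns):
        for column, term in zip(("s", "p", "o"), triple):
            ref = f"t{i}.{column}"
            if not term.startswith("?"):
                conditions.append(f"{ref} = ?")  # constant: filter via parameter
                params.append(term)
            elif term in bound:
                conditions.append(f"{ref} = {bound[term]}")  # shared variable: join
            else:
                bound[term] = ref  # first occurrence binds the variable
    # A pattern without variables only tests for existence.
    columns = ", ".join(f"{ref} AS {var[1:]}" for var, ref in bound.items()) or "1"
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("ex:alice", "rdf:type", "ub:GraduateStudent"),
        ("ex:alice", "ub:name", "Alice"),
        ("ex:bob", "rdf:type", "ub:UndergraduateStudent"),
        ("ex:bob", "ub:name", "Bob"),
    ])
    sql, params = bgp_to_sql([
        ("?s", "rdf:type", "ub:GraduateStudent"),
        ("?s", "ub:name", "?n"),
    ])
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

In the original query the OPTIONAL splits the mandatory patterns into two separate basic graph patterns, so a translation of this kind could not cover them with a single SQL query.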

Page 22

Conclusion

Page 23

Conclusion and Future Work

SQGM: a query model for SPARQL

• Supporting all phases of query processing

• Easy to extend

• Transformation rules and heuristics for SQGMs

Implementation illustrated the potential of SQGMs

Outlook

Develop further heuristics to rewrite SPARQL queries

Integrate index selection into the query optimization


Page 24

Thank you!