An Evolutionary Perspective on Approximate RDF Query Answering

Christophe Guéret, Eyal Oren, Stefan Schlobach, Frank van Harmelen and Martijn Schut
Vrije Universiteit, Amsterdam

description

RDF is increasingly being used to represent large amounts of data on the Web. Current query evaluation strategies for RDF are inspired by databases, assuming perfect answers on finite repositories. In this paper, we present a novel query method based on evolutionary computing, which allows us to handle uncertainty, incompleteness and unsatisfiability, and deal with large datasets, all within a single conceptual framework. Our technique supports approximate answers with anytime behaviour. We present initial results and analyse next steps for improvement.

Transcript of An Evolutionary Perspective on Approximate RDF Query Answering

Page 1: An Evolutionary Perspective on Approximate RDF Query Answering

An Evolutionary Perspective on Approximate RDF Query Answering

Christophe Guéret, Eyal Oren, Stefan Schlobach, Frank van Harmelen and Martijn Schut

Vrije Universiteit, Amsterdam

Page 2: An Evolutionary Perspective on Approximate RDF Query Answering

Problem and context Method proposed Experimental results Conclusion

The next 30 minutes in 4 points...

- RDF
  - Data on the Web
  - Inconsistent, uncertain, heterogeneous, huge and growing!
- RDF query answering
  - Finding data matching a criterion
  - ... many queries are actually not satisfiable
- Approximate RDF query answering
  - Finding some, almost valid, data
- The evolutionary perspective
  - Test different solutions
  - Progressive optimisation of the result

SUM 2008 - October 2, 2008 2 / 24

Page 13: An Evolutionary Perspective on Approximate RDF Query Answering

1. What’s the problem?
   - Querying RDF datastores
   - Standard techniques
2. And Now for Something Completely Different
   - Guessing the solution instead
   - The way we do it
3. Does it work?
   - Evolution of the quality
   - Some characteristics of this method
4. TODO list

Page 15: An Evolutionary Perspective on Approximate RDF Query Answering

Example

- RDF dataset

    <Ullman88> type Book .
    <Ullman88> label "Principles of Database and Knowledge-Base Systems" .
    <Ullman88> author b1 .
    b1 _1 ullman .
    ullman homepage <http://stanford.edu/~ullman/> .

- SPARQL query

    SELECT ?title WHERE {
      ?publication type Book .
      ?publication label ?title .
    }

- Expected answer

    ?title = "Principles of Database and Knowledge-Base Systems"

Page 16: An Evolutionary Perspective on Approximate RDF Query Answering

Problem description

- Triple = subject, predicate, object
- Dataset = graph of triples
- Querying: find a pattern in the graph

[Figure: a query and a graph [PSPARQL07]]

Page 17: An Evolutionary Perspective on Approximate RDF Query Answering

Standard techniques

- Standard approach:
  1. Find all the possible results for "?publication type Book"

       ?publication
       <Ullman88>

  2. Find all the possible results for "?publication label ?title"

       ?publication    ?title
       <Ullman88>      "Principles of ..."

  3. Do a join on the two tables and return the result
     - ?title = "Principles of ..."
- Fast thanks to the creation of indexes and query optimisation
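The three steps above can be sketched in Python as follows. This is a toy illustration only: the helper names `match` and `join` are hypothetical, the join is a plain nested loop, and no index or query optimisation is involved, unlike in a real RDF store.

```python
# Toy dataset from the example slide, stored as (subject, predicate, object).
TRIPLES = [
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", "Principles of Database and Knowledge-Base Systems"),
    ("<Ullman88>", "author", "b1"),
    ("b1", "_1", "ullman"),
    ("ullman", "homepage", "<http://stanford.edu/~ullman/>"),
]


def match(pattern, triples):
    """Return one binding (dict) per triple matching the pattern.

    Terms starting with '?' are variables; other terms must match exactly.
    """
    rows = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in a pattern must keep the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            rows.append(binding)
    return rows


def join(left, right):
    """Nested-loop join: keep pairs of rows that agree on shared variables."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[v] == r_row[v] for v in shared):
                joined.append({**l_row, **r_row})
    return joined


step1 = match(("?publication", "type", "Book"), TRIPLES)      # step 1
step2 = match(("?publication", "label", "?title"), TRIPLES)   # step 2
answers = [row["?title"] for row in join(step1, step2)]       # step 3
print(answers)
```

Note that every clause is fully evaluated before the join, so nothing is returned if any clause has no match.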

Page 25: An Evolutionary Perspective on Approximate RDF Query Answering

Motivation

- Standard techniques are designed to return results only when there are some
- Not designed for incomplete and approximate queries/answers
- Hard to distribute
- Approximate answers to precise queries
  - If the query is unsatisfiable, return the best almost-satisfying solution found
- Precise answers to approximate queries
  - Return a subset of existing solutions instead of showing them all
- Interactive querying
  - Use of intermediate results to help the user improve the query

Page 30: An Evolutionary Perspective on Approximate RDF Query Answering

Approach

- “I’m Feeling Lucky” approach:
  1. Assign some random values to the variables
     - ?publication = <Ullman88>
     - ?title = Book
  2. Verify if the solution is valid

       Triple                    Is in the graph?
       <Ullman88> type Book      yes
       <Ullman88> label Book     no

  3. If the solution is OK, stop. Otherwise, try again with something else
- Rely on membership testing (instead of lookup)
- The testing loop can be stopped at any time
- A result may satisfy part of the query
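A minimal sketch of this guess-and-test loop, assuming a Python set as the graph and purely random guessing (the evolutionary generation of candidates comes later in the talk). The names `satisfied` and `feeling_lucky` and the choice of candidate values are illustrative assumptions.

```python
import random

GRAPH = {
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", "Principles of ..."),
}
QUERY = [("?publication", "type", "Book"), ("?publication", "label", "?title")]

# Candidate values for the variables: here, simply every term of the graph.
TERMS = sorted({term for triple in GRAPH for term in triple})


def satisfied(assignment):
    """Count the query clauses whose instantiated triple is in the graph."""
    count = 0
    for clause in QUERY:
        triple = tuple(assignment.get(term, term) for term in clause)
        count += triple in GRAPH  # membership test only, no lookup of bindings
    return count


def feeling_lucky(max_tries, rng):
    """Guess until all clauses hold or the budget of tries is used up."""
    best, best_score = None, -1
    for _ in range(max_tries):  # the loop can be cut short at any time
        guess = {"?publication": rng.choice(TERMS), "?title": rng.choice(TERMS)}
        score = satisfied(guess)
        if score > best_score:  # remember the best partial answer so far
            best, best_score = guess, score
        if score == len(QUERY):  # every clause is satisfied: stop
            break
    return best, best_score


best, score = feeling_lucky(max_tries=1000, rng=random.Random(0))
print(best, score)
```

The guess from the slide, `?publication = <Ullman88>` and `?title = Book`, scores 1 out of 2 here, which is what "a result may satisfy part of the query" refers to.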

Page 39: An Evolutionary Perspective on Approximate RDF Query Answering

Our choices

- Need to pay attention to two aspects
  1. Each try should be a step closer to the solution
     - Random guessing may never end
     - Stopping the process at t + 1 should give better results than at t
  2. Testing a candidate solution must be fast
     - Will try a lot of solutions
- We made the following choices
  - Generation of solutions: evolutionary algorithm
  - Verification of solutions: Bloom filter based testing

Page 43: An Evolutionary Perspective on Approximate RDF Query Answering

Binary Bloom filters (1/2)

- Compact representation of information: a set of n = 8 bits

    1 2 3 4 5 6 7 8

- Supports two operations
  - INSERT(KEY): insert a key into the filter
  - CONTAINS(KEY): test for the presence of a key
- Use k = 3 hash functions to compute a set of bits from a key

    HASH1(“HELLO WORLD”) = 8
    HASH2(“HELLO WORLD”) = 6
    HASH3(“HELLO WORLD”) = 3

Page 44: An Evolutionary Perspective on Approximate RDF Query Answering

Binary Bloom filters (2/2)

- INSERT(“HELLO WORLD”)
  - [Figure: Current OR “Hello world” = New]
  - Bit-wise OR operation
  - Always successful (i.e. unlimited capacity)
  - Precision depends on the number of elements m
- CONTAINS(“BONJOUR !”)
  - [Figure: Current AND “Bonjour !” = Test result]
  - Bit-wise AND operation
  - A positive result can be a collision

$p_{\mathrm{error}} = \left(1 - e^{-km/n}\right)^{k}$

where n is the number of bits, m the number of inserted elements and k the number of hash functions.
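The two operations can be sketched as below. This is an illustrative implementation, not the one used in the talk: the k hash functions are simulated by salting SHA-256 with the hash index, and the bit array is held in a single Python integer. The function `false_positive_rate` uses the standard approximation, with n bits, m inserted elements and k hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Binary Bloom filter over n bits with k (simulated) hash functions."""

    def __init__(self, n_bits, k_hashes):
        self.n = n_bits
        self.k = k_hashes
        self.bits = 0  # the bit array, stored as one integer

    def _mask(self, key):
        """Bit mask with up to k bits set, one per hash function."""
        mask = 0
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % self.n
            mask |= 1 << position
        return mask

    def insert(self, key):
        # Bit-wise OR with the key's mask: always succeeds.
        self.bits |= self._mask(key)

    def contains(self, key):
        # Bit-wise AND with the key's mask: True may be a collision,
        # False is always correct.
        mask = self._mask(key)
        return (self.bits & mask) == mask


def false_positive_rate(n_bits, m_elements, k_hashes):
    """Approximate probability that contains() wrongly returns True."""
    return (1 - math.exp(-k_hashes * m_elements / n_bits)) ** k_hashes


bf = BloomFilter(n_bits=1024, k_hashes=3)
bf.insert("Hello world")
print(bf.contains("Hello world"))  # True: an inserted key is always found
print(false_positive_rate(n_bits=1024, m_elements=100, k_hashes=3))
```

Since a key is never removed and an insert only sets bits, the error rate grows with m for a fixed n, which is the sense in which precision depends on the number of elements.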

Page 45: An Evolutionary Perspective on Approximate RDF Query Answering

A first (naive) approach

- Insert all the triples into a single Bloom filter
  - INSERT(“<Ullman88>_type_Book”)
  - INSERT(“<Ullman88>_label_"Principles of ..."”)
  - ...
- Use the CONTAINS operation to verify a solution
  - CONTAINS(“<Ullman88>_type_Book”) ⇒ true
  - CONTAINS(“<Ullman88>_label_Book”) ⇒ false
- Not the best approach! Let’s see what happens in detail...

[Figure: ?publication label ?title → CONTAINS(“<Ullman88>_label_Book”) → modify ?publication and ?title]

Page 49: An Evolutionary Perspective on Approximate RDF Query Answering

Graph parsing

- Every triple of the graph is inserted into 4 Bloom filters

    Triple: <Ullman88> type Book

    Filter    Key
    SPO       <Ullman88>_type_Book
    SP        <Ullman88>_type
    PO        type_Book
    SO        <Ullman88>_Book

- Three domains are defined

    S = <Ullman88> b1 ullman
    P = type label author _1 homepage
    O = Book "Principles of ..." b1 ullman <http://...>

- Each term is replaced by an integer (with a dictionary)
  - <Ullman88> → 46
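The parsing step above might look like the following sketch. Several things here are assumptions made for illustration: plain Python sets stand in for the four Bloom filters (so membership is exact), keys are integer tuples rather than the underscore-joined strings shown on the slide, and the dictionary numbers terms in order of first appearance, so the identifiers differ from the 46 shown for <Ullman88>.

```python
from collections import defaultdict

TRIPLES = [
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", '"Principles of ..."'),
    ("<Ullman88>", "author", "b1"),
    ("b1", "_1", "ullman"),
    ("ullman", "homepage", "<http://...>"),
]

dictionary = {}  # term -> integer identifier


def encode(term):
    """Return the integer for a term, assigning a new one on first use."""
    return dictionary.setdefault(term, len(dictionary))


# Stand-ins for the SPO, SP, PO and SO Bloom filters.
filters = {"spo": set(), "sp": set(), "po": set(), "so": set()}
# Domains of encoded terms seen in subject, predicate and object position.
domains = defaultdict(set)


def index(triple):
    """Insert one triple into the four filters and update the domains."""
    s, p, o = (encode(term) for term in triple)
    filters["spo"].add((s, p, o))
    filters["sp"].add((s, p))
    filters["po"].add((p, o))
    filters["so"].add((s, o))
    domains["s"].add(s)
    domains["p"].add(p)
    domains["o"].add(o)


for triple in TRIPLES:
    index(triple)

print({name: len(values) for name, values in domains.items()})
```

Keeping partial keys (SP, PO, SO) lets a failed check point at the part of a candidate triple that is wrong, which a single SPO filter cannot do.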

Page 56: An Evolutionary Perspective on Approximate RDF Query Answering

Evolutionary algorithm flowchart [Eiben2003]

- Set of populations + set of operators
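As a rough illustration of the generic scheme the flowchart refers to, the loop below evolves a population with a set of operators (parent selection, recombination, mutation, survivor selection). It is a textbook-style sketch on a toy bit-counting problem, not the authors' code: the function names, the binary tournament, the keep-the-best survivor rule and every parameter value are assumptions.

```python
import random


def evolve(fitness, random_individual, crossover, mutate,
           pop_size=20, generations=50, rng=None):
    """Generic evolutionary loop: initialise, then repeat
    select parents -> recombine -> mutate -> select survivors."""
    rng = rng or random.Random(0)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Parent selection: binary tournament (an assumption).
            mum = max(rng.sample(population, 2), key=fitness)
            dad = max(rng.sample(population, 2), key=fitness)
            offspring.append(mutate(crossover(mum, dad, rng), rng))
        # Survivor selection: keep the best pop_size of parents + offspring.
        population = sorted(population + offspring, key=fitness,
                            reverse=True)[:pop_size]
    return max(population, key=fitness)


# Toy problem: maximise the number of 1s in a 16-bit string.
N = 16


def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(N)]


def one_point(a, b, rng):
    cut = rng.randrange(1, N)
    return a[:cut] + b[cut:]


def flip_one(bits, rng):
    i = rng.randrange(N)
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


best = evolve(sum, random_bits, one_point, flip_one)
```

Because the best individual found so far is always available, the loop can be stopped after any generation, which is the anytime behaviour the talk relies on.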

Page 57: An Evolutionary Perspective on Approximate RDF Query Answering

Query parsing

- Definition of the chromosome for the individuals

  ?publication1 ?publication2 ?title

- Creation of constraints to verify
  - Clause ?publication type Book .
      bloom(spo | ?publication1 type Book)
      bloom(sp | ?publication1 type)
      bloom(po | type Book)
  - Clause ?publication label ?title .
      bloom(spo | ?publication2 label ?title)
      bloom(sp | ?publication2 label)
      bloom(po | label ?title)
      bloom(so | ?publication2 ?title)
  - Equality constraint
      equal(?publication1, ?publication2)

(Callout: removed because always true)
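One way to derive the chromosome and the constraint list from a query is sketched below. It is an assumption-laden illustration rather than the authors' parser: `parse_query`, the tuple encoding of constraints and the renaming rule (a variable used in several places gets one numbered copy per occurrence) are hypothetical. The sketch emits all four Bloom patterns for every clause and does not attempt the "always true" pruning mentioned in the callout, so its output contains constraints the slide does not list.

```python
from collections import Counter
from itertools import combinations


def is_var(term):
    return term.startswith("?")


def parse_query(clauses):
    """Turn triple patterns into (chromosome, constraints)."""
    counts = Counter(t for clause in clauses for t in clause if is_var(t))
    seen = Counter()
    chromosome, constraints, copies = [], [], {}
    for clause in clauses:
        renamed = []
        for term in clause:
            if is_var(term) and counts[term] > 1:
                # Shared variable: create a numbered copy per occurrence.
                seen[term] += 1
                gene = f"{term}{seen[term]}"
                copies.setdefault(term, []).append(gene)
                term = gene
            if is_var(term) and term not in chromosome:
                chromosome.append(term)
            renamed.append(term)
        s, p, o = renamed
        constraints += [("spo", (s, p, o)), ("sp", (s, p)),
                        ("po", (p, o)), ("so", (s, o))]
    # Copies of the same original variable must end up with equal values.
    for genes in copies.values():
        constraints += [("equal", pair) for pair in combinations(genes, 2)]
    return chromosome, constraints


clauses = [("?publication", "type", "Book"),
           ("?publication", "label", "?title")]
chromosome, constraints = parse_query(clauses)
```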

Page 63: An Evolutionary Perspective on Approximate RDF Query Answering

Evaluation of a candidate solution

- The solution is checked against all the constraints. If one is satisfied:
  - A global reward w is won
  - Each variable used is equally rewarded

- Rewards for bloom(spo | ?publication2 label ?title):
  - reward(solution) += w
  - reward(?publication2) += w/2
  - reward(?title) += w/2
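The reward scheme can be sketched as below. The names (`evaluate`, `candidate`, `filters`) and the constraint encoding are hypothetical, w = 1.0 is an arbitrary choice, and plain Python sets stand in for the Bloom filters to keep the example short.

```python
def evaluate(candidate, constraints, filters, w=1.0):
    """Return (solution_reward, per_variable_rewards) for one candidate.

    candidate   -- dict mapping each variable to its current value
    constraints -- list of (kind, terms) pairs, e.g. ("sp", ("?x", "type"))
    filters     -- dict mapping "spo"/"sp"/"po"/"so" to a membership structure
    """
    total = 0.0
    per_variable = {var: 0.0 for var in candidate}
    for kind, terms in constraints:
        # Substitute the candidate's values for the variables.
        values = tuple(candidate.get(t, t) for t in terms)
        if kind == "equal":
            satisfied = values[0] == values[1]
        else:
            satisfied = values in filters[kind]
        if satisfied:
            used = [t for t in terms if t in candidate]
            total += w                      # global reward
            for var in used:                # equal share per variable used
                per_variable[var] += w / len(used)
    return total, per_variable


# Sets stand in for Bloom filters here (an assumption for brevity).
filters = {
    "spo": {("<Ullman88>", "label", '"Principles of ..."')},
    "sp": set(), "po": set(), "so": set(),
}
constraints = [("spo", ("?publication2", "label", "?title"))]
candidate = {"?publication2": "<Ullman88>", "?title": '"Principles of ..."'}
total, rewards = evaluate(candidate, constraints, filters)
```

With two variables in the satisfied constraint, each receives w/2, matching the example on the slide.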

Page 64: An Evolutionary Perspective on Approximate RDF Query Answering

Creation of new individuals

- Select two individuals and do a one-point crossover

  Randomly pick a pivot point:

    dblp:ullman  <Ullman88>   "Principles..."
    <Ullman88>   dblp:ullman  _:b1

  Swap the two parts:

    dblp:ullman  <Ullman88>   _:b1
    <Ullman88>   dblp:ullman  "Principles..."

- Mutate the least efficient variable

  Select the variable with the lowest reward:

    dblp:ullman  <Ullman88>  "Principles..."
    0            3 × w       2 × w

  Assign a random new value:

    <Ullman88>   <Ullman88>  "Principles..."
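Both operators are short enough to sketch directly. The function names, the list representation of a chromosome and the `domain` list are assumptions made for illustration. In particular, the slide does not say how the new value is drawn, so the sketch simply picks uniformly from a given domain and may by chance re-draw the current value.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two chromosomes after a random pivot."""
    pivot = rng.randrange(1, len(parent_a))
    return (parent_a[:pivot] + parent_b[pivot:],
            parent_b[:pivot] + parent_a[pivot:])


def mutate_least_rewarded(individual, rewards, domain, rng):
    """Give a random new value to the gene with the lowest reward."""
    weakest = min(range(len(individual)), key=lambda i: rewards[i])
    mutant = list(individual)
    mutant[weakest] = rng.choice(domain)
    return mutant


rng = random.Random(42)
a = ["dblp:ullman", "<Ullman88>", '"Principles..."']
b = ["<Ullman88>", "dblp:ullman", "_:b1"]
child_1, child_2 = one_point_crossover(a, b, rng)

w = 1.0
domain = ["<Ullman88>", "dblp:ullman", "_:b1", '"Principles..."']
# Rewards 0, 3w, 2w as on the slide: the first gene is the weakest.
mutant = mutate_least_rewarded(a, [0, 3 * w, 2 * w], domain, rng)
```

Targeting the lowest-reward gene focuses the random change on the part of the candidate that contributes least to the satisfied constraints.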

Page 65: An Evolutionary Perspective on Approximate RDF Query Answering

1 What's the problem?
    Querying RDF datastores
    Standard techniques

2 And Now for Something Completely Different
    Guessing the solution instead
    The way we do it

3 Does it work?
    Evolution of the quality
    Some characteristics of this method

4 TODO list

Page 66: An Evolutionary Perspective on Approximate RDF Query Answering

Results on some (small) datasets

- Datasets: FOAF (15k triples) and DBLP (3M triples)
- Queries with, respectively, 4 and 11 different variables
- Average result for 200 individuals and 500 generations

[Figure: two plots of fitness value (y-axis) against n-th generation (x-axis, 0 to 500)]

- Solutions with maximum reward (52) are found for FOAF
- Not enough time for DBLP (max 319)

Page 67: An Evolutionary Perspective on Approximate RDF Query Answering

Scalability & speed

- Low memory requirements
  - Only depends on the number of individuals and the size of the Bloom filters

  Table: Average memory usage (mostly due to dictionary)

  dataset   (a) parsing   (b) querying
  FOAF      65 MB         15 MB
  DBLP      230 MB        140 MB

- Computation can be distributed
  - Candidate solutions are independent
  - The dictionary can be based on a DHT
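To give a feel for how small the Bloom filters themselves can be, the standard sizing formulas m = -n ln p / (ln 2)^2 and k = (m/n) ln 2 can be evaluated for the two datasets. The 1% false-positive target is purely an assumption, since the slides do not state the filter size or error rate actually used. Using the triple count n for all four filters is an upper bound, because the SP, PO and SO filters hold at most n distinct keys.

```python
import math


def bloom_size(n_items, false_positive_rate):
    """Optimal bit count m and hash count k for a Bloom filter."""
    m = -n_items * math.log(false_positive_rate) / math.log(2) ** 2
    k = (m / n_items) * math.log(2)
    return math.ceil(m), round(k)


for name, n_triples in (("FOAF", 15_000), ("DBLP", 3_000_000)):
    bits, hashes = bloom_size(n_triples, 0.01)   # 1% is an assumed target
    megabytes = 4 * bits / 8 / 1e6               # four filters: SPO, SP, PO, SO
    print(f"{name}: {hashes} hashes, {megabytes:.2f} MB for 4 filters")
```

Under these assumptions the four DBLP filters need on the order of 14 MB, well below the totals in the table, which is consistent with the remark that memory usage is mostly due to the dictionary.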

Page 69: An Evolutionary Perspective on Approximate RDF Query Answering

Status and future work

- Current status
  - The search process can be slow to converge
  - Several parameters to tune (rewards, size of the population, number of generations, ...)

- Current work

  1 Improve benchmarking
    - Test with more queries and more datasets
    - Better study of the influence of the parameters

  2 Improve evolution
    - Experiment with different types of crossover and mutation
    - Implement dynamic valuations for the rewards
    - Improve early results on the tabu search approach

  3 Test other optimizers that are easy to parallelize and anytime
    - Swarm-based algorithm (PSO, ...) or another EA
    - CSP solver
