ISWC 2012 "Efficient execution of top-k SPARQL queries"

Efficient Execution of top-k SPARQL queries. Sara Magliacane (VU University Amsterdam), Alessandro Bozzon (Politecnico di Milano), Emanuele Della Valle (Politecnico di Milano)

description

Slides of my talk on efficient execution of top-k queries in SPARQL at ISWC 2012 in Boston. Including some bonus back-up slides :)

Transcript of ISWC 2012 "Efficient execution of top-k SPARQL queries"

Page 1: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Efficient Execution of top-k SPARQL queries. Sara Magliacane (VU University Amsterdam), Alessandro Bozzon (Politecnico di Milano), Emanuele Della Valle (Politecnico di Milano)

Page 2: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Outline

• Introduction
  • What are top-k queries?
  • Why do we need to optimize them?
• Our approach:
  • A rank-aware SPARQL algebra
  • A rank-aware execution model
  • Three planning strategies
• Evaluation

Page 3: ISWC 2012 "Efficient execution of top-k SPARQL queries"

What is a top-k query?

• A query that returns
  1. a limited number of results k
  2. ordered by a scoring function that combines several criteria

 

Page 4: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Rankings,  rankings  everywhere…  


Page 5: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Rankings,  rankings  everywhere…  


Page 6: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Rankings,  rankings  everywhere…  


Page 7: ISWC 2012 "Efficient execution of top-k SPARQL queries"

 

Why do we need to optimize them?

A very intuitive and simplified example:

• Top 3 largest countries (by both area and population)

Page 8: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The standard way: materialize-then-sort scheme

• Countries by area (242)
• Countries by population (242)
• Compute all the 242 join combinations
• Sort all the 242 join combinations
• Fetch 3 best results
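The three steps of the materialize-then-sort scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries hold made-up, already normalized scores keyed by a hypothetical country identifier, and the scoring function is assumed to be a plain sum.

```python
# Toy, made-up data: normalized scores in [0, 1], keyed by country.
area = {"A": 1.0, "B": 0.6, "C": 0.5, "D": 0.1}
population = {"A": 0.1, "B": 0.9, "C": 0.7, "D": 0.2}


def materialize_then_sort(left, right, k):
    """Return the k best (key, score) pairs after joining on the key."""
    # 1. Compute all the join combinations.
    joined = [(key, left[key] + right[key]) for key in left if key in right]
    # 2. Sort all of them by combined score, best first.
    joined.sort(key=lambda pair: pair[1], reverse=True)
    # 3. Only now fetch the k best results.
    return joined[:k]


top3 = materialize_then_sort(area, population, 3)
print([key for key, _ in top3])
```

Every combination is computed and sorted even though only k of them are returned, which is the cost the rest of the talk tries to avoid.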

Page 9: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Can we make it more efficient?

Can we exploit the available sorted access by area and by population?

• Countries by area (7)
• Countries by population (9)
• Order incrementally the combinations using partial orders
• Fetch 3 best results

Page 10: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The split-and-interleave scheme

• The intuition of the previous example can be formalized with the split-and-interleave scheme from RDBMS [Li2005, Hwang2007, Ilyas2004, Ilyas2008]
  1. Split the evaluation of the scoring function into single criteria
  2. Interleave them with other operators
  3. Use partial orders to construct incrementally the final order
• Standard assumptions:
  • Monotone scoring function
  • Each criterion is evaluated as a [0,1] number (normalization)
• Optimized for the case of fast sorted access for each criterion
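To make the role of the assumptions concrete, here is a minimal Python sketch of incremental top-k evaluation over two inputs with sorted access. It is an illustration in the spirit of threshold-based rank joins, not the authors' implementation: the function name, the toy data, the one-to-one join on a key and the sum as monotone scoring function are all assumptions.

```python
def rank_join_top_k(left, right, k):
    """left, right: non-empty lists of (key, score), sorted by score descending.

    Each key is assumed to appear at most once per list; scores are summed.
    Returns (top-k list of (key, score), number of sorted accesses performed).
    """
    seen_left, seen_right, results = {}, {}, {}
    accesses = 0
    depth = 0
    while depth < max(len(left), len(right)):
        # One sorted access on each input (when items remain).
        if depth < len(left):
            key, score = left[depth]
            seen_left[key] = score
            accesses += 1
            if key in seen_right:
                results[key] = score + seen_right[key]
        if depth < len(right):
            key, score = right[depth]
            seen_right[key] = score
            accesses += 1
            if key in seen_left:
                results[key] = seen_left[key] + score
        depth += 1
        # Upper bound on any combination not produced yet: it must use an
        # unread item from one side, paired at best with the top of the other.
        last_left = left[min(depth, len(left)) - 1][1]
        last_right = right[min(depth, len(right)) - 1][1]
        threshold = max(last_left + right[0][1], left[0][1] + last_right)
        best = sorted(results.items(), key=lambda kv: kv[1], reverse=True)[:k]
        if len(best) == k and best[-1][1] >= threshold:
            return best, accesses  # no unseen combination can do better
    best = sorted(results.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return best, accesses


# Made-up normalized scores, each list already sorted (sorted access).
by_area = [("A", 1.0), ("B", 0.6), ("C", 0.5), ("D", 0.1)]
by_population = [("B", 0.9), ("C", 0.7), ("D", 0.2), ("A", 0.1)]
best, accesses = rank_join_top_k(by_area, by_population, 1)
print(best, accesses)
```

The early exit is only sound because the scoring function is monotone and the inputs are read in score order, which is why those two points appear among the standard assumptions.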

Page 11: ISWC 2012 "Efficient execution of top-k SPARQL queries"

No  free  lunch…  

Users are interested in 1 <= k <= 100 (search engines)

[Figure: execution time of standard SPARQL versus SPARQL-Rank (split-and-interleave) as a function of the number of desired results k, with regions labelled "Improvement" and "Overhead" around a break-even point; the improvement is annotated "Orders of magnitude".]

Page 12: ISWC 2012 "Efficient execution of top-k SPARQL queries"


Page 13: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Top-k queries in SPARQL 1.1

Example query on BSBM [Bizer2009]:
• The top 10 offers ordered by the product ratings and offer price:

SELECT ?product ?offer
       (norm1(?avgRat1) + norm2(?avgRat2) + norm3(?price) AS ?score)
WHERE {
  ?product hasAvgRat1 ?avgRat1 .
  ?product hasAvgRat2 ?avgRat2 .
  ?product hasName ?name .
  ?product hasOffers ?offer .
  ?offer hasPrice ?price
}
ORDER BY DESC(?score)
LIMIT 10

Tens of seconds on 5M triples (could be improved to milliseconds)

Page 14: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Split-and-interleave in SPARQL? Related work

• A possible solution [Straccia2010, Bozzon2011]:
  • Rewrite SPARQL into SQL
  • Use existing optimized RDBMS (e.g. RankSQL [Li2005])
• Disadvantages:
  • Works only if the data are already in an RDBMS
  • What about native SPARQL optimizations?
• Federated queries over Linked Data [Wagner2012]: complementary to our approach

Page 15: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Challenges for native SPARQL split-and-interleave solutions

Differences with SQL and RDBMS → Proposed solution
• Different algebra → STEP 1: New algebra (algebraic operators and algebraic equivalences)
• Different cost of data access in native RDF triplestores (sorted access is slow) → STEP 2: New algorithms for physical operators, possibly using less sorted access
• Additional optimization dimensions → STEP 3: New planning strategies

[Figure: architecture diagram with the labels Query, Algebra generator, Algebraic tree, Planner, Query plan, Algebra, Planning strategies, Physical operators]

Page 16: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Step 1: a rank-aware algebra

• SPARQL-Rank algebra [Bozzon2011]
  • Extends the standard SPARQL algebra [Perez2009]
  • Ranked set of mappings: set of mappings augmented with an order relation
• Extended OPERATORS
• New EQUIVALENCES

Page 17: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The SPARQL-Rank algebraic operators

New operator rank

[Figure: three example plans, each ending in a projection on ?pr, ?of, ?score and SLICE [0,10]: (a) rank operators g1(?a1) and g3(?p1) over a seqScan of ?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP1 ?p1; (b) rank operator g3(?p1) over an orderScan_a1 of the same pattern; (c) a Sequence on ?pr = ?pr combining g1(?a1) and g3(?p1) over a seqScan of ?pr hasA1 ?a1 . ?pr hasO ?of . ?of hasP1 ?p1 with a seqScan of ?pr hasN ?n]

Page 18: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The Rank Operator

Ω
        ?x    ?y    ?p1    ?p2
µ1      1     8     0.8    0.8
µ2      3     3     0.3    0.6
µ3      3     4     0.4    0.6

ρp1(Ω)
        ?x    ?y    ?p1    Fp1
µ1      1     8     0.8    1.8
µ3      3     4     0.4    1.4
µ2      3     3     0.3    1.3
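The table above can be reproduced with a small Python sketch. It assumes that the scoring function is the sum of the two predicates (which matches Fp1 = ?p1 + 1 in the table) and that a predicate not yet evaluated contributes its maximal value 1; the names `rank`, `max_possible_score` and the `mu*` identifiers are hypothetical.

```python
PREDICATES = ("p1", "p2")  # assumed scoring function: F = p1 + p2


def max_possible_score(mapping, evaluated):
    # Evaluated predicates contribute their value, the others the maximum 1.
    return sum(mapping[p] if p in evaluated else 1.0 for p in PREDICATES)


def rank(mappings, evaluated, predicate):
    """Evaluate one more predicate and order by the maximal possible score."""
    now_evaluated = set(evaluated) | {predicate}
    ordered = sorted(
        mappings,
        key=lambda m: max_possible_score(m, now_evaluated),
        reverse=True,
    )
    return ordered, now_evaluated


omega = [
    {"id": "mu1", "x": 1, "y": 8, "p1": 0.8, "p2": 0.8},
    {"id": "mu2", "x": 3, "y": 3, "p1": 0.3, "p2": 0.6},
    {"id": "mu3", "x": 3, "y": 4, "p1": 0.4, "p2": 0.6},
]
ranked, evaluated = rank(omega, set(), "p1")
for m in ranked:
    print(m["id"], round(max_possible_score(m, evaluated), 2))
```

Applying `rank` again with `"p2"` would replace the optimistic 1 by the actual ?p2 value, which is how a monolithic scoring function can be evaluated one criterion at a time.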

Page 19: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The SPARQL-Rank algebraic operators

Redefined standard operators

Page 20: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The Join Operator

Ωp1
        ?x    ?y    ?p1    Fp1
µ1      1     8     0.8    1.8
µ3      3     4     0.4    1.4
µ2      3     3     0.3    1.3

Ω'p2
        ?x    ?z    ?p2    Fp2
µ4      1     9     0.8    1.8
µ5      3     0     0.6    1.6

Join of Ωp1 and Ω'p2
            ?x    ?y    ?z    ?p1    ?p2    Fp1∪p2
µ1 ∪ µ4     1     8     9     0.8    0.8    1.6
µ3 ∪ µ5     3     4     0     0.4    0.6    1.0
µ2 ∪ µ5     3     3     0     0.3    0.6    0.9
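As a rough illustration of the join on ranked sets of mappings, the Python sketch below merges compatible mappings (those that agree on shared variables, here ?x) and orders the result by the combined score. The sum as scoring function and the function names are assumptions; this version materializes the join for clarity, whereas a rank-join algorithm would produce the same order incrementally.

```python
def compatible(m1, m2):
    # Two mappings are compatible if they agree on every shared variable.
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def ranked_join(left, right, predicates):
    """Join compatible mappings and order them by the combined score."""
    joined = [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]
    joined.sort(key=lambda m: sum(m[p] for p in predicates), reverse=True)
    return joined


omega_p1 = [
    {"x": 1, "y": 8, "p1": 0.8},
    {"x": 3, "y": 4, "p1": 0.4},
    {"x": 3, "y": 3, "p1": 0.3},
]
omega_p2 = [
    {"x": 1, "z": 9, "p2": 0.8},
    {"x": 3, "z": 0, "p2": 0.6},
]
for m in ranked_join(omega_p1, omega_p2, ("p1", "p2")):
    print(m["x"], m["y"], m["z"], round(m["p1"] + m["p2"], 2))
```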

Page 21: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-Rank algebraic equivalences

Split

Page 22: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-Rank algebraic equivalences

• Allows the splitting of a monolithic scoring function into several rank operators

 

Page 23: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-Rank algebraic equivalences

Interleave

Page 24: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-Rank algebraic equivalences

• Allows ordering the results incrementally by pushing the rank operator inside the query tree.

     

Page 25: ISWC 2012 "Efficient execution of top-k SPARQL queries"

From algebra to execution

Image from: http://de-timekeeper.com/yahoo_site_admin/assets/images/benzinger20gold20gears200291.17120724_std.jpg

Page 26: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Step 2: physical operators (top-k algorithms)

• Rank operator
  • If there is a sorted access index on the ranking criterion, we use it
  • Otherwise: rank aggregation algorithms, e.g. [Hwang2007]
• Join operator
  • If the right operand does not influence the ranking: streaming index join
  • Otherwise: a rank-join algorithm [see next slides]
• Other operators are straightforward:
  • E.g. the standard FILTER preserves the ordering of its input

Page 27: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Rank-Join algorithms

• Different algorithms based on available access in the inputs:
  • Hash Rank-Join, e.g. HRJN [Ilyas2004]
  • Random Access Rank-Join, e.g. RA-HRJN [Ilyas2004]
  • RankSequence (e.g. RSEQ)
    • Minimum sorted access
    • Leverages random access

[Figure: access patterns of the three operators: (a) RankJoin: sortedAccess on both inputs; (b) RankSequence: randomAccess on one input, sortedAccess on the other; (c) RA-RankJoin: sortedAccess and randomAccess on both inputs]
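The access pattern "sorted access on one input, random access on the other" can be illustrated with the Python sketch below. It is not the RSEQ algorithm from the talk, whose details are not given in these slides; it only shows, under assumptions (sum as scoring function, scores normalized to [0,1], hypothetical names and data), how random access lets an operator stop after reading few items from the sorted input.

```python
def top_k_sorted_plus_random(sorted_input, lookup, k, max_other=1.0):
    """sorted_input: list of (key, score), sorted by score descending.
    lookup: dict key -> score, standing in for random access on the other input.
    max_other: largest score the other input can contribute (1.0 if normalized).
    Returns (top-k list of (key, score), number of sorted accesses performed).
    """
    results = []
    accesses = 0
    for key, score in sorted_input:
        accesses += 1
        if key in lookup:  # random access probe on the other input
            results.append((key, score + lookup[key]))
            results.sort(key=lambda r: r[1], reverse=True)
            del results[k:]  # keep only the current k best
        # Items still unread score at most `score` on this side,
        # so none of their combinations can exceed score + max_other.
        if len(results) == k and results[-1][1] >= score + max_other:
            break
    return results, accesses


# Made-up normalized scores.
by_area = [("A", 1.0), ("B", 0.6), ("C", 0.5), ("D", 0.1)]
population = {"A": 0.1, "B": 0.9, "C": 0.7, "D": 0.2}
print(top_k_sorted_plus_random(by_area, population, 1))
```

Only one input needs a sorted access index here, which is the situation the talk describes as relevant for native triplestores where sorted access is slow.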

Page 28: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Rank-Join algorithms

• Literature: Hash Rank-Join (e.g. HRJN [Ilyas2004]) and Random Access Rank-Join (e.g. RA-HRJN [Ilyas2004])

Page 29: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Rank-Join algorithms

• New: RankSequence (e.g. RSEQ), with minimum sorted access, leveraging random access

Page 30: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Step 3: planning strategies

• Using the algebraic equivalences we can produce several equivalent algebraic trees
• The planner can use them to implement several planning strategies
  1. Rank of BGPs
  2. Interleaved
  3. Rank Join

[Figure: equivalent algebraic trees for the example query: the materialize-then-sort tree (EXTEND [?score = g1(?a1)+g2(?a2)+g3(?p1)], ORDER [?score], SLICE [0,10] over the whole graph pattern) next to rank-aware trees built from rank operators g1(?a1), g2(?a2), g3(?p1), Join and RankJoin]

Page 31: ISWC 2012 "Efficient execution of top-k SPARQL queries"

1. Rank of BGPs (ROB)

• Split the monolithic scoring function into several incremental rank operators (rho)

[Figure: the materialize-then-sort plan (EXTEND [?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)], ORDER [?score], SLICE [0,10] over the whole graph pattern) compared with the Rank of BGPs plan, in which rank operators on norm1(?avgRat1), norm2(?avgRat2) and norm3(?price) are applied over the graph pattern before SLICE [0,10]]

Page 32: ISWC 2012 "Efficient execution of top-k SPARQL queries"

2. Interleaved (INTER)

• Separate the pattern into two groups:
  • Triple patterns that influence the ranking
  • Triple patterns that don't influence the ranking

[Figure: the materialize-then-sort plan compared with the Interleaved plan, in which the rank operators norm1(?avgRat1), norm2(?avgRat2) and norm3(?price) are applied over ?product hasAvgRat1 ?avgRat1 . ?product hasAvgRat2 ?avgRat2 . ?product hasOffer ?offer . ?offer hasPrice ?price, and the result is joined on ?product with {?product hasName ?name} before SLICE [0,10]]
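The slides do not say how the two groups are computed, so the following Python sketch uses a hypothetical heuristic chosen only because it reproduces the grouping of the example: a triple pattern influences the ranking if it binds a ranking variable, or if all its variables are already bound by such patterns (which keeps the connecting pattern ?product hasOffer ?offer in the ranking group).

```python
import re


def variables(triple_pattern):
    # Variables are the tokens starting with "?" in the pattern string.
    return set(re.findall(r"\?\w+", triple_pattern))


def split_by_ranking_influence(triple_patterns, ranking_vars):
    """Hypothetical heuristic: return (influencing, not_influencing)."""
    # Patterns that bind a ranking variable directly.
    influencing = [tp for tp in triple_patterns if variables(tp) & ranking_vars]
    bound = set().union(*(variables(tp) for tp in influencing))
    rest = []
    for tp in triple_patterns:
        if tp in influencing:
            continue
        if variables(tp) <= bound:
            influencing.append(tp)  # only connects variables already needed
        else:
            rest.append(tp)
    return influencing, rest


patterns = [
    "?product hasAvgRat1 ?avgRat1",
    "?product hasAvgRat2 ?avgRat2",
    "?product hasName ?name",
    "?product hasOffer ?offer",
    "?offer hasPrice ?price",
]
ranking, other = split_by_ranking_influence(
    patterns, {"?avgRat1", "?avgRat2", "?price"}
)
print(ranking)
print(other)
```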

Page 33: ISWC 2012 "Efficient execution of top-k SPARQL queries"

3. Rank-Join (RJ)

• Split into one triple pattern for each ranking criterion
• Most appropriate join algorithm based on available access

[Figure: the materialize-then-sort plan compared with the Rank-Join plan, in which the ranked sub-patterns (norm3(?price) over ?product hasOffer ?offer . ?offer hasPrice ?price, norm1(?avgRat1) over ?product hasAvgRat1 ?avgRat1, norm2(?avgRat2) over ?product hasAvgRat2 ?avgRat2) are combined by RankJoin operators on ?product, with {?product hasName ?name} attached by a join, before SLICE [0,10]]

Page 34: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Experimental  evaluation  


Page 35: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Experimental evaluation

• Prototype implementation of our system:
  • ARQ-Rank (extends Jena ARQ 2.8.9)
• Extended version of Berlin SPARQL Benchmark [Bizer2009]
  • Added ranking attributes
  • Added top-k queries
• Jena TDB 0.8.11 as storage
• Code and experiments: sparqlrank.search-computing.org

Page 36: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Experiment 1: compare planning strategies

• Example query, 5M triples dataset
• Worst-case scenario: no sorted access indexes (slow sorted access)

One to two orders of magnitude better

Page 37: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Experiment 1: compare planning strategies

• Example query, 5M triples dataset
• Standard scenario: sorted access indexes (fast sorted access)

Two orders of magnitude better

Page 38: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Experiment 2: Small Benchmark (8 queries)

[Figure: three plots over dataset size (100K, 250K, 500K, 1M, 5M triples)]

Page 39: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Conclusions and Future Work

• A system that speeds up the execution of top-k queries in SPARQL by orders of magnitude:
  • STEP 1: A rank-aware SPARQL algebra (SPARQL-Rank algebra)
  • STEP 2: A rank-join algorithm (RSEQ)
  • STEP 3: Three planning strategies (ROB, INTER, RJ)
• ARQ-Rank, a rank-aware extension of Jena ARQ
• A small benchmark for top-k queries, based on BSBM [Bizer2009]
• All available at sparqlrank.search-computing.org
• Future work:
  • More advanced, cost-based optimization techniques
  • Extension to federated top-k query processing
  • Top-k queries under OWL2QL entailment regime

Page 40: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Bibliography

• [Bozzon2011] A. Bozzon et al. Towards an efficient SPARQL top-k query execution in virtual RDF stores. In DBRANK workshop at VLDB ’11, 2011.
• [Wagner2012] A. Wagner et al. Top-k Linked Data Query Processing. In ESWC ’12. Springer, 2012.
• [Bizer2009] C. Bizer and A. Schultz. The Berlin SPARQL Benchmark. Int. J. Semantic Web Inf. Syst., 5(2), 2009.
• [Li2005] C. Li et al. RankSQL: query algebra and optimization for relational top-k queries. In SIGMOD ’05. ACM, 2005.
• [DellaValle2012] E. Della Valle et al. Order matters! Harnessing a world of orderings for reasoning over massive data. Semantic Web Journal, 2012.
• [Hwang2007] S.-w. Hwang and K. Chang. Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates. IEEE TKDE, 19(5), 2007.

Page 41: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Bibliography

• [Ilyas2004] I. F. Ilyas et al. Rank-aware Query Optimization. In SIGMOD ’04. ACM, 2004.
• [Ilyas2008] I. F. Ilyas et al. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv., 40(4), 2008.
• [Perez2009] J. Perez et al. Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3), 2009.
• [Schmidt2010] M. Schmidt et al. Foundations of SPARQL query optimization. In ICDT ’10. ACM, 2010.
• [Straccia2010] U. Straccia. SoftFacts: A top-k retrieval engine for ontology mediated access to relational databases. In SMC ’10. IEEE, 2010.

Page 42: ISWC 2012 "Efficient execution of top-k SPARQL queries"


Page 43: ISWC 2012 "Efficient execution of top-k SPARQL queries"

BACK-UP SLIDES

Page 44: ISWC 2012 "Efficient execution of top-k SPARQL queries"

 

Why do we need to optimize them?

An additional, less intuitive and less simplified example:

• Top 2 pairs of most populated cities and largest countries

Page 45: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The materialize-then-sort scheme

• Countries by area (249)
• Cities by population (14K*)
• Materialize all 14K combinations
• Sort all 14K join combinations
• Fetch 2 best results

* According to DBPedia, but probably more

[Figure: the two sorted inputs with normalized scores: cities by population (Shanghai, Istanbul, Karachi, Mumbai, Moscow, …) and countries by area (…, Vatican)]

Page 46: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Can we make it more efficient?

Can we exploit the sorted access by area and by population?

• Countries by area (9)
• Cities by population (13)
• Order incrementally the combinations using partial orders
• Fetch 2 best results

[Figure: the first entries of the two sorted inputs (cities: Shanghai, Istanbul, Karachi, Mumbai, Moscow, …)]

Page 47: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-Rank algebra: Definitions

Mapping µ: an intermediate SPARQL solution, equivalent to a SQL tuple

Set of mappings:
        ?x    ?y    ?p1    ?p2
µ1      1     8     0.8    0.8
µ2      3     3     0.3    0.6

Maximal possible score

Given a scoring function F(p1, …, pn) and a set of predicates P = {p1, …, pj}, the maximal possible score for a mapping µ is defined as:

\[
F_P(p_1, \dots, p_n)[\mu] = F(\bar{p}_1, \dots, \bar{p}_n),
\qquad
\bar{p}_i =
\begin{cases}
p_i[\mu] & \text{if } p_i \in P \\
1 & \text{otherwise}
\end{cases}
\quad \forall i
\]

Page 48: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-Rank algebra: Definitions

Ranked set of mappings

Given a set of predicates P, a ranked set of mappings ΩP is a set of mappings Ω augmented with the following properties:

• Score: for each mapping µ, the maximal possible score FP[µ]
• Order: the order relation <ΩP is defined on ΩP based on the scores of the single mappings

Ranking principle

Given two mappings µ1 and µ2 with FP[µ1] > FP[µ2], if we process µ2 we also need to process µ1.

Page 49: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The SPARQL-Rank algebraic operators

Page 50: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-Rank algebraic equivalences

Page 51: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-Rank algebraic equivalences

Allows ordering the results incrementally by pushing the rank operator inside the query execution tree.

Page 52: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The RSEQ algorithm

Page 53: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Evaluation: additional technical information

• Experimental setting:
  • AMD 64 bit processor 2.66 GHz
  • 4 GB RAM
  • Debian kernel 2.6.26-2
  • Sun Java 1.6.0
  • Maximum heap size 2GB
• 8 queries available at sparqlrank.search-computing.org

Page 54: ISWC 2012 "Efficient execution of top-k SPARQL queries"

More experimental results: the RankJoin operators

• Example query, 5M triples dataset
• Worst-case scenario: no sorted access indexes (left)
  • RSEQ is the best, especially for k < 1000
• Standard scenario: sorted access indexes (right)
  • All three are comparable, RA-HRJN is best for k > 1000

Page 55: ISWC 2012 "Efficient execution of top-k SPARQL queries"

ARQ-Rank architecture