CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

17
CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006

Transcript of CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

Page 1: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

CSE 636Data Integration

Answering Queries Using Views

Bucket Algorithm

Fall 2006

Page 2: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

2

• Each subgoal g of Q must be “covered” by some view

• Make a list of candidates (buckets) per query subgoal

• Consider combinations of candidates from different buckets

• Not all combos are “compatible”• Keep the compatible ones and minimize them• Discard the ones contained in another• Take their union

The Bucket Algorithm

Page 3: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

3

The Bucket Algorithm

q(X,Y,R) :- ForSale(X,Y,C,”auto”), Review(X,R,”auto”), Y > 1985

Step 1: For each subgoal, put the relevant sources into a bucket:

V1(name, year) :- ForSale(name, year, “France”, “auto”), year > 1990 would be relevantV3(name, year) :- ForSale(name, year, “France”, “cheese”) would be irrelevant

Step 2: Take the Cartesian product of the bucketsAlgorithm produces maximally contained rewritingIgnores interactions between subgoals in Step 1

Page 4: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

4

The Bucket Algorithm: Example

V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr ≤ Aut97

q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95

Step 1: For each query subgoal, put the relevant sources into a bucket

Page 5: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

5

The Bucket Algorithm: Example

V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr ≤ Aut97

q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95

PProf, CCrs, QQtr

Note: Arithmetic predicates don’t pose a problem

V2

Buckets

V4

teaches reg course

Page 6: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

6

The Bucket Algorithm: Example

V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr ≤ Aut97

q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95

SStd, CCrs, QQtr

Note: V3 doesn’t work: arithmetic predicates not consistentV4 doesn’t work: S not in the output of V4

V2

Buckets

V4

teaches reg course

V1V2

Page 7: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

7

The Bucket Algorithm: Example

V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr ≤ Aut97

q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95

CCrs, TTitle V2

Buckets

V4

teaches reg course

V1V2

V1V4

Page 8: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

8

The Bucket Algorithm: Example

Step 2:• Try all combos of views, one each from a bucket• Test satisfaction of arithmetic predicates in each case

– e.g., two views may not overlap, i.e., they may be inconsistent

• Desired rewriting = union of surviving ones

Query rewriting 1:

q1(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V1(S”,C,Q’,T)– no problem from arithmetic predicates (none in V2)– May or may not be minimal (why?)

V2V4

teaches reg course

V1V2

V1V4

Page 9: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

9

The Bucket Algorithm: Example

Unfolding of rewriting 1:q1’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), c(C,T’), r(S”,C,Q’), c(C,T), C ≥ 500, Q ≥ Aut98, C ≥ 500, Q’ ≥ Aut98

• Black r’s can be mapped to green r:S’S, S”S, Q’Q

• Black c can be mapped to green c:just extend above mapping to TT’

Minimized unfolding of rewriting 1:q1m’(S,C,P) :- t(P,C,Q), r(S,C,Q), c(C,T’), C ≥ 500, Q ≥ Aut98Minimized rewriting 1:q1m(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’)

Page 10: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

10

The Bucket Algorithm: Example

Query Rewriting 2:

q2(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V4(P’,C,T,Q’)q2’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), r(S,C,Q), c(C,T’), C ≥ 500, Q ≥ Aut98, r(S”,C,Q’), c(C,T), t(P’,C,Q’), Q’ ≤ Aut97• This combo is infeasible: consider the conjunction of

arithmetic predicates in V1 and V4

Query rewriting 3:

q3(S,C,P) :- V2(S’,P,C,Q), V2(S,P’,C,Q), V4(P”,C,T,Q’)

V2V4

teaches reg course

V1V2

V1V4

V2V4

teaches reg course

V1V2

V1V4

Page 11: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

11

The Bucket Algorithm: Example

Unfolding of rewriting 3:q3’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), t(P’,C,Q), r(S”,C,Q’), c(C,T), t(P”,C,Q’), Q’ ≤ Aut97• The green subgoals can cover the black ones under the

mapping: S’S, S”S, P’P, P”P, Q’Q

Minimized rewriting 3:q3m(S,C,P) :- V2(S,P,C,Q), V4(P,C,T,Q)

Verify that there are only two rewritings that are not covered by others

Maximally Contained Rewriting:q’ = q1m q3m

Page 12: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

12

The Bucket Algorithm: Example 2

Query:q(X) :- cites(X,Y), cites(Y,X), sameTopic(X,Y)

Views:V4(A) :- cites(A,B), cites(B,A)V5(C,D) :- sameTopic(C,D)V6(F,H) :- cites(F,G), cites(G,H), sameTopic(F,G)

Note: Should we list V4(X) twice in the buckets?

V4

Buckets

V6

cites cites sameTopic

V4

V6

V5

V6

Page 13: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

13

The Bucket Algorithm: Example 2

• Consider all combos & check for containment of the unfolded rewriting in Q

• V4(X) cannot be combined with anything (why?) Try q1(X) :- V4(X), V4(X), V5(X,Y)Try q2(X) :- V4(X), V6(X,Y), V5(X,Y)

• Does any of these work?• When can we discard a view from consideration?

Page 14: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

14

The Bucket Algorithm: Example 2

Here is a successful rewriting:q3(X) :- V6(X,Y), V6(X,Y), V6(X,Y)

• By itself is not contained in Q• But, with subgoal X=Y added, it is!

By minimizing the rewriting, we get:q3m(X,Y) :- V6(X,X)

Page 15: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

15

The Bucket Algorithm: Example 2

Remarks: • V4 didn’t contribute to any rewrite, but the

bucket algorithm doesn’t recognize it ahead• Consider:

q2(X,Y) :- cites(X,Y), cites(Y,X)• Then both cites predicates can be folded into V4

– Not recognized by the bucket algorithm

Page 16: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

16

The State of Affairs

• Bucket algorithm:– deals well with predicates, Cartesian product can be

large (containment check required for every candidate rewriting)

• Inverse rules:– modular (extensible to binding patterns, FD’s)– no treatment of predicates– resulting rewritings need significant further

optimization

Neither scales up• The MINICON algorithm:

– change perspective: look at query variables

Page 17: CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006.

17

References

• Querying Heterogeneous Information Sources Using Source Descriptors– By Alon Y. Levy, Anand Rajaraman and Joann J. Ordille– VLDB, 1996

• Laks VS Lakshmanan– Lecture Slides

• Alon Halevy– Answering Queries Using Views: A Survey– VLDB Journal, 2000– http://citeseer.ist.psu.edu/halevy00answering.html