Query Planning with Limited Source Capabilities
Chen Li, Stanford University
Edward Y. Chang, University of California, Santa Barbara
Motivation
• Heterogeneous information sources on the WWW
• Information-integration systems
• Limited query capabilities
– Music stores: amazon.com, cdnow.com
– Must specify a value of Artist or Title.
– The sources do not answer queries such as “Give me all your information about CDs.”
Example

| Source | View schema | Must bind |
|---|---|---|
| 1 | v1(Song, CD) | Song |
| 2 | v2(CD, Artist, Price) | CD |
| 3 | v3(CD, Artist, Price) | Artist |

Query: “Find the prices of CDs containing a song titled Friends.”
v1(Friends, CD) ⋈ v2(CD, Artist, Price)
v1(Friends, CD) ⋈ v3(CD, Artist, Price)
Source tuples
v1(Song, CD)
<Friends, Love>
<Friends, Life>
v2(CD, Artist, Price)
<Love, Lucy, $15>
<Story, Snoopy, $14>
v3(CD, Artist, Price)
<Story, Lucy, $13>
<Love, Snoopy, $10>
<Life, Charlie, $8>
Because of the binding restrictions, not all of these tuples can be retrieved from the sources.
Traditional approach: consider one join at a time.
v1 ⋈ v2: {$15}
v1 ⋈ v3: empty, since there is no binding for Artist.
v1(Song, CD)
<Friends, Love>
<Friends, Life>
v2(CD, Artist, Price)
<Love, Lucy, $15>
<Story, Snoopy, $14>
v3(CD, Artist, Price)
<Story, Lucy, $13>
<Love, Snoopy, $10>
<Life, Charlie, $8>
Our approach: retrieve as many tuples as possible.
[Figure: the six retrievable tuples, all except <Life, Charlie, $8>, are marked X.]
v1(Song, CD)
<Friends, Love>
<Friends, Life>
v2(CD, Artist, Price)
<Love, Lucy, $15>
<Story, Snoopy, $14>
v3(CD, Artist, Price)
<Story, Lucy, $13>
<Love, Snoopy, $10>
<Life, Charlie, $8>
v1 ⋈ v2: {$15}
v1 ⋈ v3: {$10}
This approach could save the user $15 - $10 = $5!
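The difference between the two strategies can be reproduced on the slide data with a small simulation. This is a sketch under our own assumptions: each source is an in-memory list, a source call returns the rows matching the values supplied for its must-bind attributes, prices are integers in dollars, and `call_source`, `retrieve` and `prices` are hypothetical helpers. `retrieve` keeps querying the allowed views with every binding seen so far until nothing new comes back.

```python
from itertools import product

# Slide data: view -> (attributes, binding pattern, tuples held by the source).
VIEWS = {
    "v1": (("Song", "CD"), "bf", [("Friends", "Love"), ("Friends", "Life")]),
    "v2": (("CD", "Artist", "Price"), "bff",
           [("Love", "Lucy", 15), ("Story", "Snoopy", 14)]),
    "v3": (("CD", "Artist", "Price"), "fbf",
           [("Story", "Lucy", 13), ("Love", "Snoopy", 10), ("Life", "Charlie", 8)]),
}


def call_source(view, binding):
    """Simulated source call: rows that agree with `binding` (attr -> value)."""
    attrs, _, rows = VIEWS[view]
    return [r for r in rows if all(r[attrs.index(a)] == x for a, x in binding.items())]


def retrieve(allowed, start):
    """Query only the `allowed` views with every binding seen so far,
    until no new tuple is returned. Returns view -> set of retrieved rows."""
    known = {a: set(xs) for a, xs in start.items()}
    got = {v: set() for v in allowed}
    changed = True
    while changed:
        changed = False
        for v in allowed:
            attrs, pattern, _ = VIEWS[v]
            bound = [a for a, p in zip(attrs, pattern) if p == "b"]
            # Every combination of known values for the must-bind attributes.
            for combo in product(*(sorted(known.get(a, ())) for a in bound)):
                for row in call_source(v, dict(zip(bound, combo))):
                    if row not in got[v]:
                        got[v].add(row)
                        changed = True
                        for a, x in zip(attrs, row):
                            known.setdefault(a, set()).add(x)
    return got


def prices(got, other):
    """Prices from v1(Friends, CD) joined with `other` on CD."""
    return {p for s, cd in got["v1"] if s == "Friends"
            for cd2, _artist, p in got[other] if cd2 == cd}


start = {"Song": {"Friends"}}
# Traditional: each join may only use its own two views.
traditional = (prices(retrieve(["v1", "v2"], start), "v2"),
               prices(retrieve(["v1", "v3"], start), "v3"))
# Maximal: all three views feed bindings to one another.
everything = retrieve(["v1", "v2", "v3"], start)
maximal = (prices(everything, "v2"), prices(everything, "v3"))
print(traditional)  # ({15}, set())
print(maximal)      # ({15}, {10})
```

Restricting `retrieve` to the views of one join gives the traditional result; letting all three views share bindings retrieves the six tuples other than <Life, Charlie, $8> and finds the cheaper $10 price.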
Observations
• Access views not in a join to retrieve bindings;
• Recursive process;
• Some tuples in the answer cannot be retrieved.
[Figure: six tuples are marked X; <Life, Charlie, $8> contributes to the answer but cannot be retrieved.]
v1(Song, CD)
<Friends, Love>
<Friends, Life>
v2(CD, Artist, Price)
<Love, Lucy, $15>
<Happy, Snoopy, $14>
v3(CD, Artist, Price)
<Happy, Lucy, $13>
<Love, Snoopy, $10>
<Life, Charlie, $8>
Questions
• How to compute the maximal answer?
• When should we access sources not in a query?
• What sources should be accessed?
Source views
• A set of source views V with binding patterns:
– b: a value must be specified for the attribute;
– f: free.
• Each view schema uses a set of global attributes: Song, CD, Artist, Price.
v1(Song, CD): b f
v2(CD, Artist, Price): b f f
v3(CD, Artist, Price): f b f
Hypergraph representation: [Figure: the attributes Song, CD, Artist, Price are nodes; each view is a hyperedge over its attributes]
Queries
• A query Q includes:
– Input attributes: I;
– Output attributes: O.
Example: input attribute {Song}, output attribute {Price}.
[Figure: hypergraph over Song, CD, Artist, Price with views v1(Song, CD), v2(CD, Artist, Price), v3(CD, Artist, Price)]
Connections
• Connection: a set of views that connect I and O in Q.
• Meaning: natural join of the views.
• Universal-relation-like assumptions, but connections can be generated in various ways.
Example: T1 = {v1, v2}, T2 = {v1, v3}
[Figure: hypergraph over Song, CD, Artist, Price with views v1(Song, CD), v2(CD, Artist, Price), v3(CD, Artist, Price)]
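One way to enumerate connections is sketched below, under an assumption the slide does not state: a connection is taken to be an inclusion-minimal set of views whose attribute sets cover I ∪ O and form a connected hypergraph. `connected` and `connections` are hypothetical names.

```python
from itertools import combinations


def connected(views, schema):
    """True if the views' attribute sets form one connected hypergraph."""
    views = list(views)
    reached, stack = {views[0]}, [views[0]]
    while stack:
        v = stack.pop()
        for w in views:
            # Two hyperedges are adjacent when they share an attribute.
            if w not in reached and schema[v] & schema[w]:
                reached.add(w)
                stack.append(w)
    return len(reached) == len(views)


def connections(schema, inputs, outputs):
    """Minimal view sets that cover I ∪ O and are connected (assumed definition)."""
    found = []
    names = sorted(schema)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            s = set(subset)
            if any(t <= s for t in found):
                continue  # a smaller connection is already inside s
            covered = set().union(*(schema[v] for v in s))
            if (inputs | outputs) <= covered and connected(s, schema):
                found.append(s)
    return found


music = {"v1": {"Song", "CD"},
         "v2": {"CD", "Artist", "Price"},
         "v3": {"CD", "Artist", "Price"}}
print(connections(music, {"Song"}, {"Price"}))
```

Because subsets are tried in order of size, any set that already contains a smaller connection is skipped, so only minimal ones are reported.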
Question 1: Computing the maximal answer
• Translate a query and source views into a Datalog program.
• We borrow the idea from Duschka and Levy [IJCAI-97].
– We eliminate useless source accesses.
• Why Datalog programs? Because retrieving bindings is a recursive process.
Constructing program (Q,V)
Connection rules:
ans(P) :- V1(s1, C) & V2(C, A, P)
ans(P) :- V1(s1, C) & V3(C, A, P)
Fact rule:
song(s1) :-
Rules for v1(Song, CD):
-rule: V1(S, C) :- song(S) & v1(S, C)
Domain rule: cd(C) :- song(S) & v1(S, C)
Rules for v2(CD, Artist, Price):
V2(C, A, P) :- cd(C) & v2(C, A, P)
artist(A) :- cd(C) & v2(C, A, P)
price(P) :- cd(C) & v2(C, A, P)
Rules for v3(CD, Artist, Price):
V3(C, A, P) :- artist(A) & v3(C, A, P)
cd(C) :- artist(A) & v3(C, A, P)
price(P) :- artist(A) & v3(C, A, P)
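The rule pattern for program (Q,V) is regular enough to generate mechanically. This is a minimal sketch, assuming the connections are given as lists of view names and the query constants as a dict; `build_program` and the attribute-to-variable map `VAR` are hypothetical. Each view gets one rule copying source tuples into its IDB predicate, guarded by domain predicates on its bound attributes, plus one domain rule per free attribute. For the music example this reproduces the rules listed on the slide.

```python
# Schema of the music example: view -> (attributes, binding pattern).
VIEWS = {
    "v1": (("Song", "CD"), "bf"),
    "v2": (("CD", "Artist", "Price"), "bff"),
    "v3": (("CD", "Artist", "Price"), "fbf"),
}
VAR = {"Song": "S", "CD": "C", "Artist": "A", "Price": "P"}  # Datalog variable per attribute


def build_program(connections, constants, output):
    """Emit the program's rules as strings, one per rule.

    constants maps each input attribute to the query constant, e.g. {"Song": "s1"}.
    """
    rules = []
    for conn in connections:  # connection rules: join the IDB copies of the views
        atoms = [f"{v.upper()}({', '.join(constants.get(a, VAR[a]) for a in VIEWS[v][0])})"
                 for v in conn]
        rules.append(f"ans({VAR[output]}) :- " + " & ".join(atoms))
    for a, c in constants.items():  # fact rules for the query's input values
        rules.append(f"{a.lower()}({c}) :-")
    for v, (attrs, pattern) in VIEWS.items():
        args = ", ".join(VAR[a] for a in attrs)
        # Guard each bound attribute with its domain predicate, then call the source.
        guards = [f"{a.lower()}({VAR[a]})" for a, p in zip(attrs, pattern) if p == "b"]
        body = " & ".join(guards + [f"{v}({args})"])
        rules.append(f"{v.upper()}({args}) :- {body}")
        # One domain rule per free attribute: source answers become new bindings.
        rules += [f"{a.lower()}({VAR[a]}) :- {body}"
                  for a, p in zip(attrs, pattern) if p == "f"]
    return rules


for rule in build_program([["v1", "v2"], ["v1", "v3"]], {"Song": "s1"}, "Price"):
    print(rule)
```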
• Binding assumptions:
– A binding for an attribute is from the attribute’s domain;
– Do not allow the “strategy” of trying all the possible strings to “test” the source (may not terminate);
– Any binding is either obtained from the query, or from a tuple returned by a source query.
• The program (Q,V) computes the maximal answer.
Question 2: when to access off-query sources?
[Figure: hypergraph over the attributes A, B, C, D, E, F]
Views and binding patterns:
v1(A, C): b f
v2(A, B, C): f f b
v3(C, D): b f
v4(C, E): f f
v5(E, F): b f
Query: input A = a1; output D = ?
Connections: T1 = {v1, v3}, T2 = {v2, v3}
Not all the views need to be accessed.
• T1: accessing sources outside T1 is NOT necessary.
[Figure: A → v1(A, C) → C → v3(C, D) → D]
• T2: accessing sources outside T2 is necessary to get C bindings.
[Figure: v2(A, B, C) and v3(C, D), joined on C]
Independent connections
• A connection T is independent if all the views in T can be queried starting from the input attributes as the initial bindings and using only the views in T.
• T2 is not independent: it needs C bindings. [Figure: v2(A, B, C) and v3(C, D), joined on C]
• T1 is independent. [Figure: A → v1(A, C) → C → v3(C, D) → D]
• Theorem: off-connection source accesses are only necessary for nonindependent connections.
Question 3: what sources should be accessed?
• A view v is relevant to connection T if we may miss some answers to T when v is not used.
[Figure: hypergraph over A–F with views v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), v5(E, F)]
• The relevant views of T2 are: v2, v3, v1, v4.
• How to find all the relevant views of a nonindependent connection?
Kernel
• A kernel of a connection is a minimal set of attributes that need to be initially bound, in addition to the input attributes, to query the full connection.
• A connection may have multiple kernels.
• T1 has one kernel: {} [Figure: A → v1(A, C) → C → v3(C, D) → D]
• T2 has one kernel: {C} [Figure: v2(A, B, C) and v3(C, D), joined on C]
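On small examples, kernels can be found by brute force: try extra binding sets of increasing size, and keep those that make the whole connection queryable and contain no smaller kernel. This is a sketch under stated assumptions, not the paper's procedure: candidate attributes are limited to those appearing in the connection, and `must_bind`, `f_closure`, `kernels` and `independent` are hypothetical helpers. A connection is independent exactly when its only kernel is the empty set.

```python
from itertools import combinations

# Running example: view -> (attributes, binding pattern).
VIEWS = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"),
    "v4": (("C", "E"), "ff"),
    "v5": (("E", "F"), "bf"),
}


def must_bind(v):
    attrs, pattern = VIEWS[v]
    return {a for a, p in zip(attrs, pattern) if p == "b"}


def f_closure(bound, views):
    """Views in `views` that can eventually be queried from the bindings `bound`."""
    bound, queried = set(bound), set()
    while True:
        ready = {v for v in set(views) - queried if must_bind(v) <= bound}
        if not ready:
            return queried
        queried |= ready
        for v in ready:
            bound |= set(VIEWS[v][0])


def kernels(conn, inputs):
    """All inclusion-minimal extra bindings K with f-closure(I ∪ K, T) = T."""
    candidates = sorted({a for v in conn for a in VIEWS[v][0]} - set(inputs))
    found = []
    for k in range(len(candidates) + 1):
        for extra in combinations(candidates, k):
            K = set(extra)
            if any(f <= K for f in found):
                continue  # contains a smaller kernel, so it is not minimal
            if f_closure(set(inputs) | K, conn) == set(conn):
                found.append(K)
    return found


def independent(conn, inputs):
    return kernels(conn, inputs) == [set()]


print(kernels(["v1", "v3"], {"A"}))  # [set()]
print(kernels(["v2", "v3"], {"A"}))  # [{'C'}]
```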
Algorithm FIND_REL: Finding relevant views of a connection
Find all the relevant views of connection T2 = {v2, v3}:
[Figure: hypergraph over A–F with views v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), v5(E, F)]
(1) Compute the queryable views: {v1, v2, v3, v4, v5};
(2) Find a kernel K of T2: K = {C};
(3) Compute all the views that can help produce bindings for the attributes in K: R = {v1, v2, v4};
(4) Return R ∪ T2 = {v1, v2, v3, v4}.
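The four steps of FIND_REL can be sketched as follows for the running example. The hypothetical helper `b_closure` uses our own simplified reading of “views that can help produce bindings”: a queryable view helps if one of its free attributes is wanted, and its must-bind attributes then become wanted too. `a_kernel` simply returns the first kernel found by increasing size; `split`, `f_closure` and `find_rel` are likewise hypothetical names.

```python
from itertools import combinations

# Running example: view -> (attributes, binding pattern).
VIEWS = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"),
    "v4": (("C", "E"), "ff"),
    "v5": (("E", "F"), "bf"),
}


def split(v):
    """(must-bind attributes, free attributes) of view v."""
    attrs, pattern = VIEWS[v]
    return ({a for a, p in zip(attrs, pattern) if p == "b"},
            {a for a, p in zip(attrs, pattern) if p == "f"})


def f_closure(bound, views):
    """Views in `views` that become queryable, step by step, from `bound`."""
    bound, queried = set(bound), set()
    while True:
        ready = {v for v in set(views) - queried if split(v)[0] <= bound}
        if not ready:
            return queried
        queried |= ready
        for v in ready:
            bound |= set(VIEWS[v][0])


def a_kernel(conn, inputs):
    """A smallest extra binding set K with f-closure(I ∪ K, T) = T.
    Binding every attribute of T always works, so the loop always returns."""
    attrs = sorted({a for v in conn for a in VIEWS[v][0]} - set(inputs))
    for k in range(len(attrs) + 1):
        for extra in combinations(attrs, k):
            if f_closure(set(inputs) | set(extra), conn) == set(conn):
                return set(extra)


def b_closure(targets, views):
    """Views whose free attributes can supply wanted bindings; the must-bind
    attributes of every such view are wanted in turn (simplified reading)."""
    wanted, helpers = set(targets), set()
    grew = True
    while grew:
        grew = False
        for v in sorted(set(views) - helpers):
            must, free = split(v)
            if free & wanted:
                helpers.add(v)
                wanted |= must
                grew = True
    return helpers


def find_rel(conn, inputs):
    queryable = f_closure(inputs, VIEWS)    # (1) queryable views
    kernel = a_kernel(conn, inputs)         # (2) a kernel K of T
    helpers = b_closure(kernel, queryable)  # (3) views producing bindings for K
    return helpers | set(conn)              # (4) R ∪ T


print(sorted(find_rel(["v2", "v3"], {"A"})))  # ['v1', 'v2', 'v3', 'v4']
print(sorted(find_rel(["v1", "v3"], {"A"})))  # ['v1', 'v3']
```

For the independent connection T1 the kernel is empty, so no off-connection view is returned, which agrees with the theorem on independent connections.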
Constructing an efficient program
• Compute the relevant views for each connection;
• Take the union of all these relevant source views;
• Use these views to construct a new program;
• Remove useless rules.
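The construction can be sketched on the running example under assumptions the slides leave open: the relevant-view sets are copied from the FIND_REL slide (T1 is independent, so its relevant views are its own), domain predicates are named domA, domB, … following the notation domA1(A1) on the evaluation slide, the connection rules are written by hand, and a rule counts as useless when its head predicate cannot contribute to ans. `view_rules` and `efficient_program` are hypothetical names.

```python
# Running example: view -> (attributes, binding pattern).
VIEWS = {
    "v1": (("A", "C"), "bf"), "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"), "v4": (("C", "E"), "ff"), "v5": (("E", "F"), "bf"),
}
# Relevant views per connection, as given on the slides (T1 is independent).
RELEVANT = {"T1": {"v1", "v3"}, "T2": {"v1", "v2", "v3", "v4"}}

# A rule is (head predicate, IDB predicates in its body, printable text).
CONNECTION_RULES = [
    ("ans", ["V1", "V3"], "ans(D) :- V1(a1, C) & V3(C, D)"),
    ("ans", ["V2", "V3"], "ans(D) :- V2(a1, B, C) & V3(C, D)"),
]
FACT_RULES = [("domA", [], "domA(a1) :-")]


def view_rules(v):
    """The view's IDB-copy rule plus one domain rule per free attribute."""
    attrs, pattern = VIEWS[v]
    args = ", ".join(attrs)
    guards = [f"dom{a}" for a, p in zip(attrs, pattern) if p == "b"]
    body = " & ".join([f"{g}({g[3:]})" for g in guards] + [f"{v}({args})"])
    rules = [(v.upper(), guards, f"{v.upper()}({args}) :- {body}")]
    rules += [(f"dom{a}", guards, f"dom{a}({a}) :- {body}")
              for a, p in zip(attrs, pattern) if p == "f"]
    return rules


def efficient_program():
    used_views = sorted(set().union(*RELEVANT.values()))  # union of relevant views
    rules = CONNECTION_RULES + FACT_RULES
    for v in used_views:
        rules += view_rules(v)
    # Keep only rules whose head is (transitively) needed to derive ans.
    needed, changed = {"ans"}, True
    while changed:
        changed = False
        for head, body, _ in rules:
            if head in needed and not set(body) <= needed:
                needed |= set(body)
                changed = True
    return [text for head, _, text in rules if head in needed]


for line in efficient_program():
    print(line)
```

Under this reading, v5 contributes no rules at all, and v4 keeps only its domain rule for C, since V4 never appears in a connection rule.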
Conclusions
• A query-planning framework, building on Duschka and Levy [IJCAI-97], to compute the maximal answer to a query;
• Techniques for telling when to access off-query views;
• Algorithms:
– finding all the relevant sources for a query;
– constructing an efficient program.
Other related work
• Rajaraman, Sagiv, and Ullman [PODS-95]:
– Show how to find an equivalent query rewriting using views with binding restrictions;
– We give the maximal rewriting of a query.
• Optimizing conjunctive queries with binding restrictions:
– Yerneni, Li, Garcia-Molina, and Ullman [ICDT-99];
– Florescu et al. [SIGMOD-99].
• Testing connection containment:
– Li [Stanford-CS-TR 2000] uses results on monadic programs to prove that the problem is decidable.
Predicates
• EDB predicates: v1(S, C), v2(C, A, P), v3(C, A, P)
• IDB predicates:
– -predicates: V1(S, C), V2(C, A, P), V3(C, A, P)
– Domain predicates: cd(C), song(S), artist(A), price(P)
– ans(P)
Evaluating program (Q,V)
• Assume the right side of an -rule or a domain rule is:
domA1(A1), …, domAp(Ap), vi(A1, …, Am)
where A1, …, Ap are the bound attributes of vi.
• Once we have bindings for domA1(A1), …, domAp(Ap), evaluate the rule and populate the domain predicates and the -predicate.
• Repeat until no more facts can be derived.
• Compute the maximal answer to the query.
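This bottom-up evaluation can be simulated on the data of the Observations slide. A sketch, assuming in-memory sources and a set `asked` that prevents sending the same source query twice; `evaluate` is a hypothetical helper. Each view rule fires once the domain predicates supply values for the view's must-bind attributes, and every value in a returned tuple is added to the corresponding domain predicate.

```python
from itertools import product

# Source contents from the Observations slide: view -> (attributes, pattern, tuples).
VIEWS = {
    "v1": (("Song", "CD"), "bf", {("Friends", "Love"), ("Friends", "Life")}),
    "v2": (("CD", "Artist", "Price"), "bff",
           {("Love", "Lucy", 15), ("Happy", "Snoopy", 14)}),
    "v3": (("CD", "Artist", "Price"), "fbf",
           {("Happy", "Lucy", 13), ("Love", "Snoopy", 10), ("Life", "Charlie", 8)}),
}


def evaluate(facts):
    """Fire each view rule once its guards have bindings; stop at the fixpoint."""
    dom = {a: set() for attrs, _, _ in VIEWS.values() for a in attrs}  # domain predicates
    for a, values in facts.items():
        dom[a] |= values
    idb = {v: set() for v in VIEWS}  # IDB copies V1, V2, V3
    asked = set()                    # (view, binding) calls already made
    changed = True
    while changed:
        changed = False
        for v, (attrs, pattern, rows) in VIEWS.items():
            bound = [i for i, p in enumerate(pattern) if p == "b"]
            for combo in product(*(sorted(dom[attrs[i]]) for i in bound)):
                if (v, combo) in asked:
                    continue
                asked.add((v, combo))
                # Simulated source call: rows matching the bound values.
                for row in rows:
                    if all(row[i] == x for i, x in zip(bound, combo)):
                        if row not in idb[v]:
                            idb[v].add(row)
                            changed = True
                        for a, x in zip(attrs, row):
                            if x not in dom[a]:
                                dom[a].add(x)
                                changed = True
    return idb, dom


idb, dom = evaluate({"Song": {"Friends"}})
# Connection rules: ans(P) :- V1(Friends, C) & V2(C, A, P), and the same with V3.
answer = {p for s, c in idb["v1"] if s == "Friends"
          for c2, _a, p in idb["v2"] | idb["v3"] if c2 == c}
print(sorted(answer))  # [10, 15]
```

The computed answer is {$10, $15}: the $8 answer from <Friends, Life> and <Life, Charlie, $8> is missed, because no retrieved tuple ever supplies Charlie as an Artist binding.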
Forward-closure
Given views W ⊆ V and attributes X, the forward-closure of X given W, denoted f-closure(X, W), is the set of views in W that can eventually be queried using the views in W, starting from the initial bindings X.
f-closure({A}, {v1, v2, v3}) = {v1, v2, v3}
f-closure({D}, {v1, v2, v3}) = {}
[Figure: hypergraph over A–F with views v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), v5(E, F)]
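The forward-closure can be computed round by round; recording which views become queryable in each round makes the recursion visible. A minimal sketch with hypothetical helpers `must_bind` and `f_closure_rounds`; the f-closure itself is the union of the rounds.

```python
# Running example: view -> (attributes, binding pattern).
VIEWS = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"),
    "v4": (("C", "E"), "ff"),
    "v5": (("E", "F"), "bf"),
}


def must_bind(v):
    attrs, pattern = VIEWS[v]
    return {a for a, p in zip(attrs, pattern) if p == "b"}


def f_closure_rounds(X, W):
    """Rounds of f-closure(X, W): each round holds the views of W whose
    must-bind attributes are covered by the bindings collected so far."""
    bound, queried, rounds = set(X), set(), []
    while True:
        ready = {v for v in set(W) - queried if must_bind(v) <= bound}
        if not ready:
            return rounds
        rounds.append(ready)
        queried |= ready
        for v in ready:
            bound |= set(VIEWS[v][0])  # answers bind every attribute of v


def f_closure(X, W):
    return set().union(*f_closure_rounds(X, W))


print(f_closure_rounds({"A"}, {"v1", "v2", "v3"}))
print(f_closure({"D"}, {"v1", "v2", "v3"}))  # set()
```

Starting from A, v1 is queryable in the first round and its C values unlock v2 and v3 in the second; starting from D, no view can be queried at all.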
Backward-closure
• The backward-closure of a set of attributes X, b-closure(X), is the set of views that can help retrieve bindings for X.
• Lemma: All backward-closures of a connection are the same.
b-closure(C) = {v1, v2, v4}
[Figure: hypergraph over A–F with views v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), v5(E, F)]
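BF-chain, backward-closure
• BF-chain: [Figure: a chain of views in the example hypergraph, with each view's attributes labeled bound or free]
• Backward-closure: b-closure(C) = {v1, v2, v4}
[Figure: hypergraph over A–F with views v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), v5(E, F)]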
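A sketch of b-closure that also records, for each view, which wanted attributes it supplies, so that the bound/free chain can be read off. The inclusion rule is our assumption, not a definition taken from the slides: a view is added when one of its free attributes is wanted, and its must-bind attributes then become wanted. `b_closure` is a hypothetical helper.

```python
# Running example: view -> (attributes, binding pattern).
VIEWS = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"),
    "v4": (("C", "E"), "ff"),
    "v5": (("E", "F"), "bf"),
}


def b_closure(X, views):
    """Map each view that can help retrieve bindings for X to the wanted
    attributes it supplies (assumed chaining rule, see the lead-in)."""
    wanted, chain = set(X), {}
    grew = True
    while grew:
        grew = False
        for v in sorted(views):
            if v in chain:
                continue
            attrs, pattern = VIEWS[v]
            supplies = sorted(a for a, p in zip(attrs, pattern) if p == "f" and a in wanted)
            if supplies:
                chain[v] = supplies
                # Bindings for v's must-bind attributes now help as well.
                wanted |= {a for a, p in zip(attrs, pattern) if p == "b"}
                grew = True
    return chain


chain = b_closure({"C"}, set(VIEWS))
for v, attrs in chain.items():
    print(v, "supplies", attrs)
print(sorted(chain))  # ['v1', 'v2', 'v4']
```

Here v1 and v4 supply C directly, and v2 is included because it supplies A, which v1 needs as a binding; v3 and v5 only produce D and F, which nothing on the chain needs.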
Other possibilities of obtaining bindings
• Cached data: For a cached tuple ti(a1,a2) for view vi(A1,A2), add the following rules to the program (Q, V):
vi(a1,a2) :-
domA1(a1) :-
domA2(a2) :-
• Domain knowledge: – student(name, dept, GPA).
– dept = CS, Physics, Chemistry, etc.
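Both kinds of extra bindings reduce to adding fact rules to the program. A small sketch with hypothetical helper names that print the rules in the slide's notation, using domA-style domain predicate names as on the evaluation slide. In the music example, caching <Life, Charlie, $8> would make Charlie available as an Artist binding.

```python
def cached_tuple_rules(view, attrs, values):
    """Fact rules for one cached tuple: the source fact plus a domain fact per attribute."""
    rules = [f"{view}({', '.join(values)}) :-"]
    rules += [f"dom{a}({x}) :-" for a, x in zip(attrs, values)]
    return rules


def domain_knowledge_rules(attr, values):
    """Fact rules for values known in advance to belong to an attribute's domain."""
    return [f"dom{attr}({x}) :-" for x in values]


for rule in cached_tuple_rules("v3", ["CD", "Artist", "Price"], ["Life", "Charlie", "$8"]):
    print(rule)
for rule in domain_knowledge_rules("dept", ["CS", "Physics", "Chemistry"]):
    print(rule)
```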
Computing a partial answer
• Independent connections: complete answers are computable.
• Nonindependent connections: access some relevant views; evaluation of the program may be stopped after some results have been computed.