Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented By Vinay Hoskere.

Join Synopses for Approximate Query Answering
Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy
Presented by Vinay Hoskere

Contents
1. Introduction
2. AQUA
3. Problem with joins
4. Join synopses
5. Allocation
6. Maintenance of join synopses
7. Experimental evaluation

Introduction
Traditional query processing: return exact answers while minimizing response time and maximizing throughput.
Data warehouses: the full precision of an exact answer is often not needed. An approximate answer that takes less time and minimizes accesses to the base data is preferable.
Random sampling techniques: used to generate approximate answers.
Foreign key joins: warehouse queries often join large base relations along foreign keys. Schemes that approximate join aggregates from random samples of the base relations have serious disadvantages.

AQUA
Approximate query answering: improves response time and avoids accessing the original database.
Maintains smaller statistical summaries of the data, called synopses.
Provides confidence bounds on its answers.
It has three components:
◦ Statistics collection
◦ Query rewriting
◦ Maintenance
It sits on top of a DBMS.
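
Problem with Joins
A natural set of synopses would be uniform random samples of each base relation, but joining such samples has two problems:
◦ Non-uniform result sample: the join of two uniform samples is not a uniform sample of the join result.
◦ Small join result sizes: the join of two samples typically contains very few tuples.
Using samples of the base relations to answer join queries is therefore not feasible.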
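The small-result-size problem can be illustrated with a short simulation. This is a minimal sketch, not taken from the slides: the table sizes, the 1% sampling rate, and the names join_on_fk and compare_sampling are assumptions chosen for illustration.

```python
import random

def join_on_fk(fact_rows, dim_rows):
    """Foreign key join: each fact row (id, dim_key) matches at most one dim row."""
    dim_by_key = {key: payload for key, payload in dim_rows}
    return [(fid, key, dim_by_key[key]) for fid, key in fact_rows if key in dim_by_key]

def compare_sampling(num_dim=1000, num_fact=100_000, rate=0.01, seed=0):
    """Return (size of join of two samples, size of sampled fact rows joined to the full dim table)."""
    rng = random.Random(seed)
    dim = [(k, f"d{k}") for k in range(num_dim)]                    # hypothetical dimension table
    fact = [(i, rng.randrange(num_dim)) for i in range(num_fact)]   # hypothetical fact table
    fact_sample = rng.sample(fact, int(rate * num_fact))
    dim_sample = rng.sample(dim, int(rate * num_dim))
    join_of_samples = join_on_fk(fact_sample, dim_sample)   # samples of both base relations
    sample_then_join = join_on_fk(fact_sample, dim)         # sample only the source relation
    return len(join_of_samples), len(sample_then_join)

small, full = compare_sampling()
print(small, full)
```

With these assumed sizes, every sampled fact row finds its partner in the full dimension table, while only about one in a hundred finds it in the 1% dimension sample, so the join of the two samples keeps only a small fraction of the sampled rows.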

Join Synopses
Naïve way: execute all possible join queries and collect samples of their results.
Join synopses: samples are taken from a small set of distinguished joins.
This scheme is for foreign key joins.
Model the database schema as a graph G:
◦ vertex: a base relation
◦ directed edge (u to v): present if u has at least one attribute that is a foreign key referencing v
Definition

Join Synopses
There is a 1-1 correspondence between a tuple in a relation r and a tuple in the output of any foreign key join involving r and any of its descendants in the graph.
The subgraph of G on the k nodes in any k-way foreign key join must be a connected subgraph with a single root node.
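The graph model and the single-root condition can be sketched in a few lines. The schema below is an illustrative assumption loosely modelled on the relations named in the experimental section. The edge list and the function name is_foreign_key_join are hypothetical and do not come from the slides.

```python
# Hypothetical schema graph: an edge u -> v means u holds a foreign key referencing v.
SCHEMA = {
    "lineitem": ["orders", "supplier"],
    "orders": ["customer"],
    "customer": ["nation"],
    "supplier": ["nation"],
    "nation": ["region"],
    "region": [],
}

def is_foreign_key_join(edges, relations):
    """True if the subgraph on `relations` is connected and has a single root."""
    rels = set(relations)
    # Keep only edges whose endpoints both lie in the chosen set of relations.
    sub = {u: [v for v in edges.get(u, ()) if v in rels] for u in rels}
    has_parent = {v for targets in sub.values() for v in targets}
    roots = rels - has_parent
    if len(roots) != 1:
        return False
    (root,) = roots
    # Every relation must be reachable from the root along foreign key edges.
    seen, stack = {root}, [root]
    while stack:
        for v in sub[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == rels

print(is_foreign_key_join(SCHEMA, ["lineitem", "orders", "customer"]))
print(is_foreign_key_join(SCHEMA, ["customer", "supplier", "nation"]))
```

In this sketch the first set has the single root lineitem, while the second has two roots (customer and supplier), so it is not a foreign key join in the sense used here.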
Definition

Join Synopses
For each node u in G, corresponding to a relation r1, define J(u) to be the output of the maximum foreign key join r1 ⋈ r2 ⋈ ... ⋈ rk with source r1 (the foreign key join with source r1 that involves the largest number of relations).
Let Su be a uniform random sample of r1.
The join synopsis J(Su) is the output of Su ⋈ r2 ⋈ r3 ⋈ ... ⋈ rk.
J(Su) is a uniform random sample of J(u) with |Su| tuples.
Thus we can extract from our synopses a uniform random sample of the output of any k-way foreign key join.
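The construction of J(Su) can be sketched as follows: sample the source relation, then follow foreign keys to attach the matching tuple from each descendant relation. This is a minimal sketch that assumes referential integrity. The toy relations, attribute names, and the helpers extend_with_descendants and build_join_synopsis are hypothetical.

```python
import random

def extend_with_descendants(row, referenced):
    """Join one source tuple with its unique match in every descendant relation."""
    joined = dict(row)
    pending = list(row)                      # attribute names still to inspect
    while pending:
        attr = pending.pop()
        if attr in referenced:               # attr is a foreign key
            parent = referenced[attr][joined[attr]]
            for name, value in parent.items():
                if name not in joined:
                    joined[name] = value
                    pending.append(name)     # the parent may hold foreign keys too
    return joined

def build_join_synopsis(source_rows, referenced, sample_size, seed=0):
    """Return (S_u, J(S_u)) for a uniform sample S_u of the source relation."""
    rng = random.Random(seed)
    sample = rng.sample(source_rows, sample_size)
    return sample, [extend_with_descendants(r, referenced) for r in sample]

# Toy data (assumed): orders -> customers -> nations.
nations = {1: {"n_name": "FRANCE"}, 2: {"n_name": "JAPAN"}}
customers = {10: {"c_name": "alice", "c_nationkey": 1},
             11: {"c_name": "bob", "c_nationkey": 2}}
orders = [{"o_id": i, "o_custkey": 10 + i % 2, "o_total": 100 * i} for i in range(20)]
referenced = {"o_custkey": customers, "c_nationkey": nations}

sample, synopsis = build_join_synopsis(orders, referenced, sample_size=5)
print(len(sample), len(synopsis))
```

Because each source tuple joins with exactly one tuple per descendant relation, the synopsis has one joined tuple per sampled tuple, which mirrors the |Su| statement above.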

Allocation
Optimal strategy: allocates the available space among the various join synopses when certain properties of the query workload are known.
Let S be a set of queries with selects, aggregates, group-bys, and foreign key joins.
For each relation Ri, find the fraction Fi of queries in S for which Ri is the source relation in a foreign key join.
The average relative error bound over the queries is proportional to sum(Fi / sqrt(ni)), where ni is the number of tuples allocated to the join synopsis for Ri.
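Minimizing this error bound under a space budget gives a closed-form allocation. The derivation below is a sketch under two assumptions that are not on the slide: s_i denotes the size of one join synopsis tuple for R_i, and N denotes the total space available.

```latex
\begin{align*}
&\text{minimize } \sum_i \frac{F_i}{\sqrt{n_i}}
  \quad\text{subject to}\quad \sum_i s_i n_i = N \\
&\frac{\partial}{\partial n_i}\Big[\sum_j F_j n_j^{-1/2}
  + \lambda\Big(\sum_j s_j n_j - N\Big)\Big]
  = -\tfrac{1}{2} F_i n_i^{-3/2} + \lambda s_i = 0 \\
&\Rightarrow\; n_i = \Big(\frac{F_i}{2\lambda s_i}\Big)^{2/3}
  \;\propto\; \Big(\frac{F_i}{s_i}\Big)^{2/3} \\
&\text{and, from the space constraint, }
  n_i = \frac{N\,(F_i/s_i)^{2/3}}{\sum_j s_j^{1/3} F_j^{2/3}}
\end{align*}
```

Under these assumptions the space given to R_i is s_i n_i, which is proportional to s_i^{1/3} F_i^{2/3}. When all F_i are equal this reduces to space in proportion to the cube root of the tuple size, which is the CubeJoin heuristic.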

Allocation
Heuristic allocation: used when the properties of the workload are not known.
There are three procedures:
◦ EqJoin: divides the allotted space equally among the relations.
◦ CubeJoin: divides the space among the relations in proportion to the cube root of their join synopsis tuple size.
◦ PropJoin: divides the space among the relations in proportion to their join synopsis tuple size.
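The three heuristics differ only in the weight given to each relation, which the following sketch makes explicit. The function name allocate_space, the relation names, and the tuple sizes are assumptions for illustration.

```python
def allocate_space(total_space, tuple_sizes, strategy):
    """Split total_space among relations; tuple_sizes maps relation -> join synopsis tuple size."""
    weight_of = {
        "EqJoin": lambda size: 1.0,                   # equal space per relation
        "CubeJoin": lambda size: size ** (1.0 / 3.0), # proportional to cube root of tuple size
        "PropJoin": lambda size: float(size),         # proportional to tuple size
    }[strategy]
    weights = {rel: weight_of(size) for rel, size in tuple_sizes.items()}
    total_weight = sum(weights.values())
    return {rel: total_space * w / total_weight for rel, w in weights.items()}

# Hypothetical join synopsis tuple sizes (bytes) for three relations.
sizes = {"A": 8, "B": 64, "C": 512}
for strategy in ("EqJoin", "CubeJoin", "PropJoin"):
    space = allocate_space(1400, sizes, strategy)
    tuples = {rel: space[rel] / sizes[rel] for rel in sizes}  # tuples that fit in each share
    print(strategy, space, tuples)
```

One consequence visible in the output: since PropJoin gives space in proportion to tuple size, every join synopsis ends up with the same number of tuples, whereas EqJoin gives relations with small tuples many more sample tuples.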

Maintenance of Join Synopses
Join synopses must be maintained when a base relation is updated (insert or delete).
The maintenance algorithm does not require frequent access to the base relations.
If a new tuple T is inserted into relation u:
◦ Let Pu be the probability with which a newly arrived tuple of relation u is included in the random sample Su.
◦ Let u ⋈ r2 ⋈ r3 ⋈ ... ⋈ rk be the maximum foreign key join with source u.
◦ We add T to Su with probability Pu.
◦ If T is added to Su, we add to J(Su) the tuple T ⋈ r2 ⋈ r3 ⋈ ... ⋈ rk.

Maintenance of Join Synopses
◦ If T is added to Su and Su exceeds its target size, then select uniformly at random a tuple T' to evict from Su and remove the tuple in J(Su) corresponding to T'.
On delete of a tuple T from u:
◦ If T is in Su, delete the tuple from Su and remove the tuple in J(Su) corresponding to T.
◦ If the sample becomes too small due to many deletions, repopulate it by scanning relation u.
The algorithm performs lookups on the base relations only with the small probability Pu.
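The insert and delete rules can be put together in a small sketch. It is a hedged illustration, not the authors' implementation: the class JoinSynopsis, the callback join_with_descendants (which stands in for the lookups on the base relations), and the min_size threshold are hypothetical, and how Pu is chosen is left to the caller.

```python
import random

class JoinSynopsis:
    """Keeps S_u and J(S_u) in step as tuples of relation u are inserted or deleted."""

    def __init__(self, target_size, join_with_descendants, rng=None):
        self.target_size = target_size
        self.join_with_descendants = join_with_descendants  # hypothetical base-relation lookup
        self.rng = rng or random.Random()
        self.sample = {}   # S_u: key -> tuple
        self.joined = {}   # J(S_u): key -> joined tuple

    def insert(self, key, row, p_u):
        """Add the new tuple with probability p_u; return True if it is in S_u afterwards."""
        if self.rng.random() >= p_u:
            return False                          # no base-relation lookup needed
        self.sample[key] = row
        self.joined[key] = self.join_with_descendants(row)
        if len(self.sample) > self.target_size:
            victim = self.rng.choice(list(self.sample))   # uniform eviction from S_u
            del self.sample[victim]
            del self.joined[victim]
        return key in self.sample

    def delete(self, key):
        """Remove the tuple from S_u and J(S_u) if it was sampled."""
        if key in self.sample:
            del self.sample[key]
            del self.joined[key]

    def needs_repopulation(self, min_size):
        """True when deletions have shrunk the sample enough to warrant rescanning u."""
        return len(self.sample) < min_size

lookups = []
synopsis = JoinSynopsis(5, lambda row: lookups.append(row) or {**row, "joined": True},
                        rng=random.Random(0))
for i in range(10):
    synopsis.insert(i, {"id": i}, p_u=1.0)
print(len(synopsis.sample), len(lookups))
```

In this sketch the callback runs only for tuples that pass the probability test, which is where the limited access to the base relations comes from.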

Experimental Evaluation
Test bed: TPC-D decision support benchmark, with a database of around 300 MB.
Machine: 296 MHz UltraSPARC-II, 256 MB of memory, Solaris 5.6.
The query used is based on Q5 and computes an aggregate on the join of Lineitem, Customer, Order, Supplier, Nation, and Region.
The query used is: [query shown as an image in the original slide]

Conclusion
Approximate query answering is becoming increasingly essential in data warehouses.
One of the fundamental problems faced here: computing approximate answers to aggregates on multi-way joins.
Join synopses: a solution for schemas that involve foreign key joins.
They provide better performance than schemes based on samples of the base relations.
They can be maintained efficiently during updates.
Approximating answers for group-by, rank, and set-valued queries remains an open problem.
Thank You.