Ripple Joins for Online Aggregation
By: Peter J. Haas and Joseph M. Hellerstein, published in June 1999
Presented by: Sthuti Kripanidhi
9/28/2010 CSE 6339 - Data Exploration
Overview
What the paper is all about
Traditional Algorithms
Online Aggregation
Ripple Joins: Introduction
How different is Ripple join
Ripple Join variants
Aspect ratios
Future Work
What the paper is about
The paper presents a class of join algorithms, called ripple joins, for the online processing of multi-table aggregation queries.
It shows how to join several tables and compute SUM, COUNT, or AVG aggregates (optionally with GROUP BY), displaying approximate results and their confidence intervals immediately, from the first few tuples retrieved.
Traditional Algorithms
Traditional algorithms take a long time because they have to process the entire tables (relations).
Users have to wait a long time before any results are returned.
A better method is online aggregation.
Online Aggregation
A running estimate of the final aggregates is continuously displayed to the user.
The goal is quick results rather than minimum time to completion.
The proximity of the running estimate to the final result is also displayed to the user (as a confidence interval).
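The idea of a running estimate with a confidence interval can be sketched for the simplest case, a single-table AVG. This is a minimal illustration and not the paper's implementation: the function name `running_avg_with_ci` and its parameters are hypothetical, the interval is a standard large-sample (central limit theorem) interval, and no finite-population correction is applied.

```python
import math
import random
from statistics import NormalDist


def running_avg_with_ci(values, confidence=0.95, report_every=1000, seed=0):
    """Yield (n, estimate, half_width) while scanning values in random order."""
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # random order, so every prefix is a random sample
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    n, mean, m2 = 0, 0.0, 0.0
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford's running sum of squared deviations
        if n >= 2 and n % report_every == 0:
            std = math.sqrt(m2 / (n - 1))
            yield n, mean, z * std / math.sqrt(n)


if __name__ == "__main__":
    data = [float(i % 100) for i in range(100_000)]
    for n, est, half_width in running_avg_with_ci(data, report_every=20_000):
        print(f"n={n:6d}  AVG ~ {est:.2f} +/- {half_width:.2f}")
```

The interval shrinks as more tuples are scanned, so the user can stop the query as soon as the estimate is precise enough.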
GUI
Ripple Joins: Introduction
Generalize the traditional block nested-loops and hash joins.
Non-blocking.
Square ripple join – samples are drawn from both relations at the same rate.
Rectangular ripple join – samples are drawn from one relation at a higher rate than from the other.
Ripple Join: Introduction
Typical query form:
SELECT op(expression) FROM R1, R2, …, RK
WHERE predicate
GROUP BY columns;
How different is Ripple join?
A traditional hash join blocks until the entire query output is finished. A ripple join reports approximate results after each sampling step and allows user intervention.
In a traditional nested-loops join, the inner loop scans an entire table. A ripple join expands the sample set incrementally.
Ripple joins avoid a complete scan of the relations.
How Ripple Join works
Assume a ripple join of relations R and S.
Select a random tuple r from R.
Join r with the previously selected S tuples.
Select a random tuple s from S.
Join s with the previously selected R tuples.
Join r and s.
Ripple Join: Square two table join
Rows and columns correspond to the sampled tuples of R and S. Each X marks a pair (r, s) that has been examined; after N sampling steps the examined pairs form an N × N square.
N = 1
X
N = 2
X X
X X
N = 3
X X X
X X X
X X X
Ripple Join Algorithm
for (max = 1 to infinity) {
    for (i = 1 to max-1)
        if (predicate(R[i], S[max]))
            output(R[i], S[max]);
    for (i = 1 to max)
        if (predicate(R[max], S[i]))
            output(R[max], S[i]);
}
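The loop above can be written as a runnable sketch. The following Python generator is an illustration under stated assumptions, not the authors' code: the name `square_ripple_join` is hypothetical, both relations are in-memory lists, random retrieval order is simulated by shuffling, and the loop stops when the smaller relation is exhausted (so it covers the full cross product only when both inputs have the same size).

```python
import random


def square_ripple_join(R, S, predicate, seed=0):
    """Yield each matching (r, s) pair once, in square ripple order."""
    rng = random.Random(seed)
    R, S = list(R), list(S)
    rng.shuffle(R)  # stand-in for retrieving tuples of R in random order
    rng.shuffle(S)  # same for S
    for n in range(min(len(R), len(S))):
        # The newly drawn S tuple is joined with the previously seen R tuples.
        for i in range(n):
            if predicate(R[i], S[n]):
                yield R[i], S[n]
        # The newly drawn R tuple is joined with all S tuples seen so far,
        # including the new one, which closes the corner of the square.
        for i in range(n + 1):
            if predicate(R[n], S[i]):
                yield R[n], S[i]


if __name__ == "__main__":
    same_remainder = lambda r, s: r % 3 == s % 3
    for pair in square_ripple_join(range(6), range(6), same_remainder):
        print(pair)
```

Because it is a generator, a consumer can stop pulling results at any time, which mirrors the non-blocking behavior described earlier.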
Ripple Join Iterator
An iterator-based DBMS invokes an iterator’s next() method each time an output tuple is needed.
The iterator needs to store the next position to be fetched from each of its inputs R and S.
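To make the stored state concrete, here is a hedged sketch of such an iterator using Python's iterator protocol. The class name `RippleJoinIterator` and its fields (`step`, `phase`, `i`) are assumptions made for illustration; the inputs are taken to be lists that are already in random order.

```python
class RippleJoinIterator:
    """Pull-based square ripple join; each next() call returns one match."""

    def __init__(self, R, S, predicate):
        self.R, self.S, self.predicate = R, S, predicate
        self.step = 0   # index of the newest tuple fetched from R and S
        self.phase = 0  # 0: new S tuple vs old R; 1: new R tuple vs seen S
        self.i = 0      # next position to compare within the current phase

    def __iter__(self):
        return self

    def __next__(self):
        limit = min(len(self.R), len(self.S))
        while self.step < limit:
            n = self.step
            if self.phase == 0:
                while self.i < n:
                    r, s = self.R[self.i], self.S[n]
                    self.i += 1  # remember where to resume on the next call
                    if self.predicate(r, s):
                        return r, s
                self.phase, self.i = 1, 0
            else:
                while self.i <= n:
                    r, s = self.R[n], self.S[self.i]
                    self.i += 1
                    if self.predicate(r, s):
                        return r, s
                self.phase, self.i = 0, 0
                self.step += 1  # move on to the next sampling step
        raise StopIteration


if __name__ == "__main__":
    it = RippleJoinIterator([1, 2, 3], [3, 1, 2], lambda r, s: r == s)
    print(next(it))  # the first matching pair found in ripple order
```

All progress lives in three small fields, so the operator can return control to its caller after every output tuple and resume exactly where it left off.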
Pipelining
Can easily be pipelined for multiple binary joins.
Cannot do three-table joins as two binary ripple joins.
Ripple Join Variants
Block Ripple Join
Hash Ripple Join
Index Ripple Join
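Block Ripple Join
Takes disk blocks of R and S in turn (not single tuples).
Read a disk block of R and scan it against the old S tuples.
Evict the block from memory.
Read a block of S and compare it with the older R tuples.
Saves I/O because each scan of the previously seen tuples is shared by a whole block of new tuples rather than a single tuple.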
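The block-at-a-time steps can be sketched as follows. This is an in-memory illustration only: the name `block_ripple_join` and the `block_size` parameter are hypothetical, list slices stand in for disk blocks, and the lists `r_seen` and `s_seen` stand in for the already-read portions of R and S that a real system would rescan from disk.

```python
def block_ripple_join(R, S, predicate, block_size=2):
    """Yield matching (r, s) pairs, reading R and S one block at a time."""
    r_seen, s_seen = [], []
    r_pos = s_pos = 0
    while r_pos < len(R) or s_pos < len(S):
        # Read the next block of R and make ONE pass over the old S tuples,
        # comparing every old S tuple with the whole new block.
        r_block = R[r_pos:r_pos + block_size]
        r_pos += len(r_block)
        for s in s_seen:
            for r in r_block:
                if predicate(r, s):
                    yield r, s
        r_seen.extend(r_block)
        # Read the next block of S and make one pass over all R tuples
        # seen so far (including the block just read).
        s_block = S[s_pos:s_pos + block_size]
        s_pos += len(s_block)
        for r in r_seen:
            for s in s_block:
                if predicate(r, s):
                    yield r, s
        s_seen.extend(s_block)


if __name__ == "__main__":
    R = [1, 2, 3, 4, 5]
    S = [5, 4, 3, 2, 1, 0, 3]
    print(list(block_ripple_join(R, S, lambda r, s: r == s)))
```

With a block of b tuples, each pass over the old tuples serves b new tuples at once, so the number of passes drops by roughly a factor of b compared with the tuple-at-a-time version.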
Index and Hash Ripple Joins
Index Ripple Join
Identical to an index-enhanced nested-loops join.
Hash Ripple Join
Used only for equijoin queries.
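For an equijoin, the "join with previously seen tuples" step can be answered by a hash lookup instead of a scan. The sketch below assumes one in-memory hash table per relation, keyed on the join attribute; the names `hash_ripple_join`, `r_key`, and `s_key` are hypothetical, and the inputs are taken to be lists already in random order.

```python
from collections import defaultdict


def hash_ripple_join(R, S, r_key, s_key):
    """Yield matching (r, s) pairs of an equijoin r_key(r) == s_key(s)."""
    r_table = defaultdict(list)  # join key -> R tuples seen so far
    s_table = defaultdict(list)  # join key -> S tuples seen so far
    for n in range(max(len(R), len(S))):
        if n < len(R):
            r = R[n]
            # Probe the S table with the new R tuple, then remember r.
            for s in s_table.get(r_key(r), ()):
                yield r, s
            r_table[r_key(r)].append(r)
        if n < len(S):
            s = S[n]
            # Probe the R table (which already holds the new r) with s.
            for r in r_table.get(s_key(s), ()):
                yield r, s
            s_table[s_key(s)].append(s)


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (1, "c")]
    S = [(1, "x"), (3, "y"), (1, "z"), (2, "w")]
    first = lambda t: t[0]
    for pair in hash_ripple_join(R, S, first, first):
        print(pair)
```

The lookup only works when the predicate is an equality on the key, which is why this variant is limited to equijoins.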
Statistical Considerations
Goal: to provide efficient, accurate, interactive estimation.
The estimator is unbiased and consistent.
The running average is biased but consistent.
Capable of giving tight confidence intervals.
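A sketch of how running estimates might be computed during a square ripple join: the total over the sampled pairs is scaled up by |R|·|S| / n², where n is the number of tuples drawn from each relation so far, and the running average is the ratio of the SUM and COUNT totals. The function name `running_estimates` is hypothetical, the inputs are assumed to be lists already in random order, and confidence-interval computation is left out.

```python
def running_estimates(R, S, predicate, expression):
    """Yield (n, sum_estimate, count_estimate, avg_estimate) per step."""
    total = 0.0   # sum of expression over matching sampled pairs
    matches = 0   # number of matching sampled pairs
    for n in range(1, min(len(R), len(S)) + 1):
        new_r, new_s = R[n - 1], S[n - 1]
        # Pairs that are new in this sampling step of the square ripple join.
        pairs = [(R[i], new_s) for i in range(n - 1)]
        pairs += [(new_r, S[i]) for i in range(n)]
        for r, s in pairs:
            if predicate(r, s):
                total += expression(r, s)
                matches += 1
        scale = len(R) * len(S) / (n * n)  # scale n x n sample to |R| x |S|
        avg = total / matches if matches else None  # ratio of the two totals
        yield n, scale * total, scale * matches, avg


if __name__ == "__main__":
    R = [1, 2, 3, 4]
    S = [2, 4, 1, 3]
    for row in running_estimates(R, S, lambda r, s: r == s,
                                 lambda r, s: r * s):
        print(row)
```

Once every tuple has been drawn, the scale factor is 1 and the estimates coincide with the exact answers.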
Aspect Ratios
Aspect ratio: how many tuples are retrieved from each base relation per sampling step,
e.g. β1 = 1, β2 = 3, …
Ripple join adjusts the aspect ratio during processing according to the statistical properties of the data.
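The effect of a fixed aspect ratio can be sketched with a rectangular ripple join that draws `beta_r` tuples from R and `beta_s` tuples from S in each sampling step. The function and parameter names are hypothetical, the ratio is held constant rather than adapted at run time, and the inputs are assumed to be lists already in random order.

```python
def rectangular_ripple_join(R, S, predicate, beta_r=1, beta_s=3):
    """Yield (step, r_seen, s_seen, new_matches) after each sampling step."""
    r_pos = s_pos = step = 0
    while r_pos < len(R) or s_pos < len(S):
        step += 1
        new_r = R[r_pos:r_pos + beta_r]  # beta_r new tuples from R
        new_s = S[s_pos:s_pos + beta_s]  # beta_s new tuples from S
        new_matches = []
        # New R tuples against the S tuples seen in earlier steps.
        for r in new_r:
            for s in S[:s_pos]:
                if predicate(r, s):
                    new_matches.append((r, s))
        r_pos += len(new_r)
        # New S tuples against every R tuple seen so far.
        for s in new_s:
            for r in R[:r_pos]:
                if predicate(r, s):
                    new_matches.append((r, s))
        s_pos += len(new_s)
        yield step, r_pos, s_pos, new_matches


if __name__ == "__main__":
    R = list(range(4))
    S = list(range(12))
    for step, r_seen, s_seen, found in rectangular_ripple_join(
            R, S, lambda r, s: r == s % 4):
        print(step, r_seen, s_seen, found)
```

With β1 = 1 and β2 = 3, the examined region after n steps is an n × 3n rectangle, so S is sampled three times as fast as R.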
Why is it called Ripple Join?
1. The algorithm seems to ripple out from a corner of the join.
2. Acronym: "Rectangles of Increasing Perimeter Length"
Performance
Conclusions and Future Work
A complete implementation of online aggregation must be able to handle multi-table queries.
This paper introduces ripple joins, a family of join algorithms designed to meet the performance needs of an online aggregation system.
Though ripple joins are symmetric, it is still not clear how a query optimizer should choose among the ripple join variants, nor how it should order a sequence of ripple joins.
References
Haas & Hellerstein, “Ripple Joins for Online Aggregation” (SIGMOD ’99)
Haas & Hellerstein, “Online Query Processing: A Tutorial”
Hellerstein, Haas & Wang, “Online Aggregation” (SIGMOD ’97)