Ripple Joins for Online Aggregation



By: Peter J. Haas and Joseph M. Hellerstein, published in June 1999

Presented by: Sthuti Kripanidhi

9/28/2010 CSE 6339 - Data Exploration


Overview

- What the paper is all about
- Traditional algorithms
- Online aggregation
- Ripple joins: introduction
- How different is ripple join
- Ripple join variants
- Aspect ratios
- Future work


What the paper is about

The paper presents a class of join algorithms, called ripple joins, for the online processing of multi-table aggregation queries.

It shows how to join several tables and compute SUM, COUNT, or AVG aggregates (including queries with GROUP BY clauses), displaying approximate results, together with confidence intervals, immediately after the first few tuples are retrieved.


Traditional Algorithms

Traditional join algorithms take a long time because they must process the entire input tables (relations) before returning an answer.

Users therefore wait a long time before any results are returned.

A better method is online aggregation.


Online Aggregation

A running estimate of the final aggregates is continuously displayed to the user.

The aim is to return quick results rather than to minimize the time to completion.

The proximity of the running estimate to the final result is also displayed to the user, as a confidence interval.
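To make the idea of a running estimate with a confidence interval concrete, here is a minimal single-table Python sketch. It scans the values in random order and, after each tuple, reports the running AVG and a large-sample (CLT-based) 95% half-width. The helper name `running_avg_ci` and the normal approximation with z = 1.96 are assumptions for illustration, not the exact interval construction used in the paper.

```python
import math
import random
from typing import Iterator, List, Tuple


def running_avg_ci(values: List[float], z: float = 1.96,
                   seed: int = 0) -> Iterator[Tuple[int, float, float]]:
    """Yield (n, running_mean, half_width) after each randomly sampled value.

    Hypothetical helper: the half-width uses a normal (CLT) approximation,
    so it is only meaningful once n is reasonably large.
    """
    order = values[:]
    random.Random(seed).shuffle(order)  # random scan order = sampling without replacement
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < 2:
            yield n, mean, float("inf")  # no variance estimate from a single tuple
        else:
            std = math.sqrt(m2 / (n - 1))
            yield n, mean, z * std / math.sqrt(n)


if __name__ == "__main__":
    data = [float(v) for v in range(1, 1001)]  # true AVG is 500.5
    for n, est, hw in running_avg_ci(data):
        if n in (10, 100, 1000):
            print(f"n={n:4d}  AVG ~ {est:8.2f} +/- {hw:.2f}")
```

The interval narrows roughly as 1/sqrt(n). This simple version ignores the finite-population correction, so it does not shrink to zero when the scan completes.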


GUI (figure not reproduced in this transcript)


Ripple Joins: Introduction

Ripple joins generalize the traditional block nested-loops and hash joins.

They are non-blocking.

Square ripple join: samples are drawn from both relations at the same rate.

Rectangular ripple join: one relation is sampled at a higher rate than the other.


Ripple Join: Introduction

Typical query form, where op is an aggregate such as SUM, COUNT, or AVG:

SELECT op(expression)
FROM R1, R2, …, RK
WHERE predicate
GROUP BY columns;


How different is ripple join?

A traditional hash join blocks until the entire query output is finished. Ripple join reports approximate results after each sampling step and allows user intervention.

In a traditional nested-loops join, the inner loop scans an entire table. Ripple join instead expands the sample set incrementally.

Ripple joins therefore avoid having to scan the relations completely before useful results appear.


How Ripple Join Works

Assume a ripple join of relations R and S. At each step:

1. Select a random tuple r from R and join it with the previously selected S tuples.
2. Select a random tuple s from S and join it with the previously selected R tuples.
3. Join r and s.
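The three steps can be simulated with a short Python sketch. It rests on stated assumptions: R and S are in-memory lists, a seeded shuffle stands in for random tuple retrieval, both relations have the same length, and the function name `square_ripple_join` is hypothetical. Each step yields only the new matching pairs found at that step.

```python
import random
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def square_ripple_join(R: List[T], S: List[U],
                       pred: Callable[[T, U], bool],
                       seed: int = 0) -> Iterator[Tuple[int, List[Tuple[T, U]]]]:
    """Yield (step, new_matches) for a square two-table ripple join.

    Sketch only: shuffled in-memory copies of R and S model random retrieval,
    and both inputs are assumed to have the same length.
    """
    rng = random.Random(seed)
    r_order, s_order = R[:], S[:]
    rng.shuffle(r_order)
    rng.shuffle(s_order)
    seen_r: List[T] = []
    seen_s: List[U] = []
    for step, (r, s) in enumerate(zip(r_order, s_order), start=1):
        new = [(r, old_s) for old_s in seen_s if pred(r, old_s)]   # step 1: r vs. earlier S
        new += [(old_r, s) for old_r in seen_r if pred(old_r, s)]  # step 2: earlier R vs. s
        if pred(r, s):                                              # step 3: r vs. s
            new.append((r, s))
        seen_r.append(r)
        seen_s.append(s)
        yield step, new
```

With an always-true predicate, step n examines 2n - 1 new pairs, so after n steps every one of the n x n pairs of sampled tuples has been examined exactly once.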


Ripple Join: Square two-table join

(Figure, three panels N = 1, 2, 3, with R and S on the two axes: after step N, the examined region of R x S is an N x N square of X marks, growing by one row and one column per step.)


Ripple Join Algorithm

for (max = 1 to infinity) {
    for (i = 1 to max-1)
        if (predicate(R[i], S[max]))
            output(R[i], S[max]);
    for (i = 1 to max)
        if (predicate(R[max], S[i]))
            output(R[max], S[i]);
}
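A direct Python translation of the loop is sketched below. The unbounded outer loop is assumed to stop once the (equal-length) inputs are exhausted, indices are 0-based, and `mx` replaces `max` to avoid shadowing the Python builtin; the function name `ripple_join_order` is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def ripple_join_order(R: Sequence, S: Sequence,
                      predicate: Callable[[object, object], bool]) -> List[Tuple[object, object]]:
    """Translate the square ripple-join loop (0-based), bounded by len(R).

    Assumes len(R) == len(S) so that each square can be completed.
    """
    out = []
    for mx in range(len(R)):       # mx plays the role of `max`
        for i in range(mx):        # i = 1 .. max-1: earlier R tuples vs. the new S tuple
            if predicate(R[i], S[mx]):
                out.append((R[i], S[mx]))
        for i in range(mx + 1):    # i = 1 .. max: the new R tuple vs. S tuples, including the new one
            if predicate(R[mx], S[i]):
                out.append((R[mx], S[i]))
    return out


if __name__ == "__main__":
    print(ripple_join_order(["r1", "r2", "r3"], ["s1", "s2", "s3"], lambda r, s: True))
```

With an always-true predicate the pairs come out as (r1, s1), then (r1, s2), (r2, s1), (r2, s2), then (r1, s3), (r2, s3), (r3, s1), (r3, s2), (r3, s3): each pass of the outer loop adds one new column and one new row to the square.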


Ripple Join Iterator

An iterator-based DBMS invokes an iterator's next() method each time an output tuple is needed.

The ripple join iterator must therefore remember, between calls, the next position to be fetched from each of its inputs R and S.
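A possible iterator-style version is sketched below in Python. It is hypothetical: the class name `RippleJoinIterator` is invented, the inputs are in-memory sequences of equal length, and the saved state is the loop position (`mx`, `phase`, `i`) rather than the disk cursors a real DBMS would keep. Each call to next() resumes the nested loops exactly where the previous call stopped.

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple


class RippleJoinIterator:
    """Hypothetical pull-based (next()) square ripple join."""

    def __init__(self, R: Sequence, S: Sequence,
                 predicate: Callable[[object, object], bool]):
        self.R, self.S, self.predicate = R, S, predicate
        self.mx = 0     # index of the newest tuple taken from each input
        self.phase = 0  # 0: earlier R tuples vs. S[mx]; 1: R[mx] vs. S[0..mx]
        self.i = 0      # position within the current inner loop

    def next(self) -> Optional[Tuple[object, object]]:
        """Return the next matching pair, or None when the inputs are exhausted."""
        n = min(len(self.R), len(self.S))  # extra tuples of a longer input are ignored
        while self.mx < n:
            if self.phase == 0:
                if self.i < self.mx:
                    r, s = self.R[self.i], self.S[self.mx]
                    self.i += 1
                    if self.predicate(r, s):
                        return r, s
                    continue
                self.phase, self.i = 1, 0
            else:
                if self.i <= self.mx:
                    r, s = self.R[self.mx], self.S[self.i]
                    self.i += 1
                    if self.predicate(r, s):
                        return r, s
                    continue
                self.mx, self.phase, self.i = self.mx + 1, 0, 0
        return None

    def __iter__(self) -> Iterator[Tuple[object, object]]:
        while (pair := self.next()) is not None:
            yield pair
```

The output order matches the nested-loop form of the algorithm; the difference is only that control returns to the caller after every matching pair.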


Pipelining

Ripple joins can easily be pipelined in a plan with multiple binary joins.

However, a three-table ripple join cannot simply be executed as two binary ripple joins.


Ripple Join Variants

- Block ripple join
- Hash ripple join
- Index ripple join


Block Ripple Join

Takes disk blocks of R and S in turn, rather than single tuples:

1. Read a disk block of R, scan it against the previously read S tuples, then evict it from memory.
2. Read a block of S and compare it with the previously read R tuples.

This saves I/O because a whole block, rather than a single tuple, is retrieved at a time.
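The block-at-a-time alternation can be sketched in Python as follows. This is a simplified, hypothetical model: a "disk block" is a slice of `block_size` tuples, previously read blocks are kept in Python lists, and eviction or re-reading from disk is not modelled. The function name `block_ripple_join` is invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def block_ripple_join(R: Sequence, S: Sequence, block_size: int,
                      predicate: Callable[[object, object], bool]) -> List[Tuple[object, object]]:
    """Square block ripple join sketch: alternate whole blocks of R and S.

    Each new block is joined with every block already read from the other
    relation, so each tuple pair is checked exactly once.
    """
    r_blocks = [R[k:k + block_size] for k in range(0, len(R), block_size)]
    s_blocks = [S[k:k + block_size] for k in range(0, len(S), block_size)]
    seen_r: List[Sequence] = []
    seen_s: List[Sequence] = []
    out = []
    for rb, sb in zip(r_blocks, s_blocks):
        # New R block against all previously read S blocks.
        for old_sb in seen_s:
            out += [(r, s) for r in rb for s in old_sb if predicate(r, s)]
        seen_r.append(rb)
        # New S block against all R blocks read so far, including rb.
        for old_rb in seen_r:
            out += [(r, s) for r in old_rb for s in sb if predicate(r, s)]
        seen_s.append(sb)
    return out
```

The region examined still grows as a square, but in steps of one block per side, which is where the I/O saving over tuple-at-a-time retrieval comes from.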


Index and Hash Ripple Joins

Index ripple join: identical to an index-enhanced nested-loops join.

Hash ripple join: used only for equijoin queries.
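For an equijoin, the hash variant can be pictured as the Python sketch below. It is an illustration under assumptions: both inputs are already in random order and of equal length, join keys are extracted by caller-supplied functions, and the name `hash_ripple_join` is hypothetical. Every tuple read is inserted into a hash table for its own relation and probed against the other relation's table, so a new tuple is compared only with earlier tuples that share its key.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterator, List, Sequence, Tuple


def hash_ripple_join(R: Sequence, S: Sequence,
                     r_key: Callable[[object], Hashable],
                     s_key: Callable[[object], Hashable]) -> Iterator[Tuple[object, object]]:
    """Square hash ripple join sketch for the equijoin r_key(r) == s_key(s)."""
    r_table: Dict[Hashable, List[object]] = defaultdict(list)
    s_table: Dict[Hashable, List[object]] = defaultdict(list)
    for r, s in zip(R, S):
        # New R tuple: remember it, then probe the S tuples seen so far.
        k = r_key(r)
        r_table[k].append(r)
        for old_s in s_table.get(k, []):
            yield r, old_s
        # New S tuple: remember it, then probe all R tuples seen so far (including r).
        k = s_key(s)
        s_table[k].append(s)
        for old_r in r_table.get(k, []):
            yield old_r, s
```

Because r is inserted before s probes, the pair (r, s) is produced exactly once, by the S-side probe.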


Statistical Considerations

Goal: provide efficient, accurate, interactive estimation.

The running estimators for SUM and COUNT are unbiased and consistent; the running average (AVG) is biased but consistent.

Ripple joins are capable of giving tight confidence intervals.
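A simplified Python sketch of these point estimates is given below. It assumes uniform samples drawn without replacement from each relation and examines every sampled pair. SUM and COUNT are scaled up by |R||S| / (n_R n_S), the inverse of the probability that a given pair is sampled, which is why they are unbiased. AVG is their ratio, which is biased but consistent. The function name `ripple_estimates` is hypothetical, and confidence intervals are not computed here.

```python
import random
from typing import Callable, Sequence, Tuple


def ripple_estimates(R: Sequence, S: Sequence, n_r: int, n_s: int,
                     predicate: Callable[[object, object], bool],
                     expr: Callable[[object, object], float],
                     seed: int = 0) -> Tuple[float, float, float]:
    """Estimate SUM, COUNT and AVG of expr(r, s) over the join from random samples."""
    rng = random.Random(seed)
    r_sample = rng.sample(list(R), n_r)  # uniform, without replacement
    s_sample = rng.sample(list(S), n_s)
    scale = (len(R) * len(S)) / (n_r * n_s)  # 1 / P(a given pair is sampled)
    total = 0.0
    count = 0
    for r in r_sample:
        for s in s_sample:
            if predicate(r, s):
                total += expr(r, s)
                count += 1
    sum_est = scale * total
    count_est = scale * count
    avg_est = total / count if count else float("nan")  # ratio estimator; scale cancels
    return sum_est, count_est, avg_est
```

When the samples cover both relations completely, the scale factor is 1 and all three estimates equal the exact answers.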


Aspect Ratios

Aspect ratio: how many tuples are retrieved from each base relation per sampling step, e.g. β1 = 1, β2 = 3, …

Ripple join can adjust the aspect ratios during execution, based on statistics of the data read so far, rather than fixing them in advance.
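How a fixed aspect ratio shapes the sampled region can be tabulated with a small Python sketch. It assumes a two-table rectangular ripple join in which, after step k, exactly β1·k tuples of R and β2·k tuples of S have been read. The function name `rectangle_growth` is hypothetical.

```python
from typing import List, Tuple


def rectangle_growth(beta1: int, beta2: int, steps: int) -> List[Tuple[int, int, int, int]]:
    """Return (step, tuples read from R, tuples read from S, new pairs examined).

    Assumes the examined region after step k is a (beta1*k) x (beta2*k) rectangle.
    """
    rows = []
    prev_area = 0
    for k in range(1, steps + 1):
        n_r, n_s = beta1 * k, beta2 * k
        area = n_r * n_s
        rows.append((k, n_r, n_s, area - prev_area))
        prev_area = area
    return rows


if __name__ == "__main__":
    for row in rectangle_growth(1, 3, 4):
        print(row)
```

With β1 = 1 and β2 = 3, the rows are (1, 1, 3, 3), (2, 2, 6, 9), (3, 3, 9, 15) and (4, 4, 12, 21): S is read three times as fast as R, and the region stays a 1:3 rectangle. Setting β1 = β2 = 1 gives the square ripple join.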


Why is it called Ripple Join?


1. The algorithm seems to ripple out from a corner of the join.

2. Acronym: "Rectangles of Increasing Perimeter Length"


Performance

(Figure: performance results, not reproduced in this transcript.)


Conclusions and Future Work

A complete implementation of online aggregation must be able to handle multi-table queries.

This paper introduces ripple joins, a family of join algorithms designed to meet the performance needs of online aggregation systems.


Though ripple joins are symmetric, it is still not clear how a query optimizer should choose among the ripple join variants, nor how it should order a sequence of ripple joins.


References

P. J. Haas and J. M. Hellerstein, “Ripple Joins for Online Aggregation,” SIGMOD ’99.

P. J. Haas and J. M. Hellerstein, “Online Query Processing: A Tutorial.”

P. J. Haas, J. M. Hellerstein, and H. J. Wang, “Online Aggregation,” Proc. 1997 ACM SIGMOD Intl. Conf. on Management of Data.
