Online Aggregation
Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang


Presented by Archana Vijayalakshmanan


Contents

• Introduction
• Example
• Advantages
• Requirements
• Approaches to building a system
• System issues
• Conclusion


Online Aggregation: Motivation

SELECT AVG(grade) FROM ENROLL;

A “fancy” interface:

[Figure: a "Query Results" window showing AVG = 3.262574342]


A Better Approach

Don’t process in batch! Online aggregation:


Example

SELECT AVG(grade) FROM ENROLL
GROUP BY major;


Advantages

• stopping condition set on the fly!
• statistical techniques are more sophisticated
• can handle GROUP BY w/o a priori knowledge


Requirements

Usability
• continuous output: non-blocking query plans
• time/precision control
• fairness/partiality

Performance
• time to accuracy
• time to completion
• pacing


A Naive Approach

SELECT running_avg(final_grade),
       running_confidence(final_grade),
       running_interval(final_grade)
FROM grades;

• No grouping
• Can’t meet performance & usability needs:
  - no guarantee of continuous output
  - no guarantee of fairness (or control over partiality)
  - no control over pacing
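To make the idea of a running aggregate concrete, here is a minimal Python sketch of what a function like running_avg paired with running_interval might compute. It is not the authors' implementation: it assumes rows arrive in an order uncorrelated with their values, uses Welford's update for the variance, and uses a normal-approximation interval (z = 1.96 for roughly 95% confidence). The function name, the `report_every` parameter, and the synthetic grades are all hypothetical.

```python
import math
import random

def running_avg_with_interval(values, z=1.96, report_every=1000):
    """Yield (rows_seen, running_avg, half_width) while scanning `values`.

    Assumes the scan order is random with respect to the values, so the
    rows seen so far behave like a random sample of the table.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean (Welford)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err

# Hypothetical data standing in for grades.final_grade.
random.seed(0)
grades = [random.choice([1.0, 2.0, 3.0, 4.0]) for _ in range(10_000)]
random.shuffle(grades)

estimates = list(running_avg_with_interval(grades))
for n, avg, half in estimates:
    print(f"after {n:>6} rows: AVG = {avg:.3f} +/- {half:.3f}")
```

The interval narrows as more rows are read, which is what lets a user stop the query once the estimate is precise enough. The sketch still has the limitations listed above: nothing in it controls grouping, fairness, or pacing.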


Random Access to Data

• Heap scan: OK if clustering is uncorrelated with the aggregate and grouping attributes
• Index scan: can scan an index on attributes uncorrelated with the aggregate or grouping attributes
• Sampling from indices: could introduce new sampling access methods (e.g. Olken’s work)


Group By & Distinct

• Can’t sort!
  - sorting blocks
  - sorting is unfair
• Must use hash-based techniques
  - non-blocking approach, but they do not scale gracefully
• Hybrid Hashing.
• “Hybrid Cache” even better.
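The non-blocking property of hash-based grouping can be illustrated with a small Python sketch. It assumes the whole hash table fits in memory, so it does not show Hybrid Hashing or the Hybrid Cache; the function name, the `report_every` parameter, and the sample rows are hypothetical.

```python
from collections import defaultdict

def online_group_avg(rows, report_every=2):
    """Yield a snapshot {group: running_avg} every `report_every` rows.

    Each row updates one hash-table entry, so results are available
    without first sorting (and therefore blocking on) the input.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, (group, value) in enumerate(rows, start=1):
        sums[group] += value
        counts[group] += 1
        if i % report_every == 0:
            yield {g: sums[g] / counts[g] for g in sums}

# Hypothetical (major, grade) rows.
rows = [("CS", 4.0), ("Math", 3.0), ("CS", 3.0),
        ("Math", 2.0), ("EE", 3.5), ("CS", 2.0)]
snapshots = list(online_group_avg(rows))
for snap in snapshots:
    print(snap)
```

A group appears in the output as soon as its first tuple is read, so no a priori knowledge of the groups is needed.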


Index Striding

For fair Group By:
• read tuples in round-robin fashion (want random tuple from Group 1, random tuple from Group 2, ...)
• each group is updated at appropriate rate
• gives info/speed match!
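The round-robin access pattern can be sketched in Python as follows. This is only an in-memory illustration: a dictionary of per-group lists stands in for an index on the grouping column, every group gets equal weight, and the names `index_stride`, `key`, and `seed` are hypothetical.

```python
import random
from collections import defaultdict

def index_stride(rows, key, seed=0):
    """Yield rows round-robin across groups, in random order within a group."""
    rng = random.Random(seed)
    buckets = defaultdict(list)  # stand-in for an index on the grouping column
    for row in rows:
        buckets[key(row)].append(row)
    for bucket in buckets.values():
        rng.shuffle(bucket)
    active = list(buckets)  # groups that still have unread tuples
    while active:
        for g in list(active):  # iterate over a copy so removal is safe
            yield buckets[g].pop()
            if not buckets[g]:
                active.remove(g)

# Hypothetical data: a large group (CS) and a small one (Math).
rows = [("CS", i) for i in range(6)] + [("Math", i) for i in range(2)]
order = [group for group, _ in index_stride(rows, key=lambda r: r[0])]
print(order)
# ['CS', 'Math', 'CS', 'Math', 'CS', 'CS', 'CS', 'CS']
```

With a plain scan of the same data, the small group would be seen far less often than the large one; striding gives each group one tuple per round until it is exhausted.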


Join Algorithms

Non-Blocking Joins
• no sorting!
• merge join OK, but watch for the sorted output
• hybrid hash not great
• symmetric pipeline hash
• nested loops always good, can be too slow
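As an illustration of the symmetric pipelined hash join named above, the following Python sketch keeps one hash table per input and probes the opposite table as each tuple arrives, so matches are emitted without waiting for either input to finish. It assumes both tables fit in memory; the tagged-stream input format and all names are hypothetical.

```python
from collections import defaultdict

def symmetric_hash_join(stream, key_left, key_right):
    """Join two interleaved inputs without blocking on either one.

    `stream` yields ("L", row) or ("R", row) in arrival order.
    """
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    for side, row in stream:
        if side == "L":
            k = key_left(row)
            left_table[k].append(row)          # build on the left side
            for match in right_table.get(k, []):  # probe the right side
                yield row, match
        else:
            k = key_right(row)
            right_table[k].append(row)         # build on the right side
            for match in left_table.get(k, []):   # probe the left side
                yield match, row

# Hypothetical inputs: students (name, major) and majors (major, building).
stream = [
    ("L", ("s1", "CS")),
    ("R", ("CS", "bldg A")),
    ("L", ("s2", "CS")),
    ("R", ("Math", "bldg B")),
    ("L", ("s3", "Math")),
]
results = list(symmetric_hash_join(stream, lambda r: r[1], lambda r: r[0]))
for pair in results:
    print(pair)
```

The first result is produced as soon as the second tuple arrives, which is the property continuous output needs.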


Query Optimization

• Avoid sorting and blocking sub-operations
• 2 components in cost function:
  - dead time (t_d): time spent doing “invisible” work; tax this at a high rate!
  - output time (t_o): time spent producing output
• Preference to plans that maximize user control, e.g., index striding


Extended Aggregate Functions

Basically, aggregate functions must provide running estimates.

• SUM, COUNT: straightforward
• VAR, STD DEV: algorithms return confidence intervals
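One common way to obtain running estimates for SUM and COUNT is to scale what has been seen so far up to the size of the whole table. The Python sketch below shows that scale-up estimator; it is not taken from the slides, and it assumes the total row count is known in advance and that rows are read in random order. The function name and parameters are hypothetical.

```python
def running_sum_count(sample, total_rows, predicate=lambda x: True):
    """Yield (rows_seen, estimated_count, estimated_sum) after every row.

    Counts and sums over the rows seen so far are multiplied by
    total_rows / rows_seen to estimate the value over the whole table.
    """
    seen = 0
    matched = 0
    matched_sum = 0.0
    for x in sample:
        seen += 1
        if predicate(x):
            matched += 1
            matched_sum += x
        scale = total_rows / seen
        yield seen, matched * scale, matched_sum * scale

# Hypothetical 4-row table, read completely.
all_rows = list(running_sum_count([1, 2, 3, 4], total_rows=4))
filtered = list(running_sum_count([1, 2, 3, 4], total_rows=4,
                                  predicate=lambda x: x > 2))
print(all_rows[-1])   # (4, 4.0, 10.0)
print(filtered[-1])   # (4, 2.0, 7.0)
```

Once every row has been read the scale factor is 1, so the running estimate coincides with the exact answer.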


API

Current API uses built-in methods, e.g.:
• StopGroup(cursor,groupval)
• speedUpGroup(cursor,groupval)
• slowDownGroup(cursor,groupval)
• setSkipFactor(cursor name,integer)


Future Work

• Better UI
  - online data visualization (Tioga DataSplash)
  - data viz = “graphical” aggregate
  - “drill down” and “roll up” facilities
• Nested Queries
• Control w/o Indices
• Checkpointing/continuation
• Tracking online queries
• Extensions of statistical results


References

control.cs.berkeley.edu/online/olamd/olamd.PPT