Incremental Aggregation on Multiple Continuous Queries


Chun Jin, Carnegie Mellon University

09/28/2006, ISMIS, Bari, Italy

Stream Processing

• Intelligence monitoring

• Fraud detection

• Epidemic onset patterns

• Network intrusion detection

• Geospatial changes

• Transactions

• Sensor network readings

• Network traffic data

Problem

• Aggregate queries

• Continuous evaluation

• Multiple concurrent queries

Solutions

• Incremental aggregation

• Incremental multiple aggregate query optimization (incremental sharing)

Roadmap

• System overview

• Query examples

• Incremental Aggregation

• Incremental sharing

• Evaluation

System Architecture

Figure components: Query Network, Query Coordinator, System Catalog, Common Computation Identifier, Network Operation Manager (NOM), Code Assembler, Sharing Optimizer (SO), Projection Manager (PM), Engine, Generator, Oracle.

New query insertion:
1. Index query network
2. Identify common computation
3. Select optimal sharing path
4. Expand query network

Query network execution:
1. Code assembly
2. Incremental aggregation
3. Periodic execution

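Query Examples

SELECT dis_cat, hospital, vdate, COUNT(*), AVERAGE(fee)
FROM Med
GROUP BY CAT(disease) AS dis_cat, hospital, DAY(visit_time) AS vdate
(a) Query A

SELECT hospital, vdate, AVERAGE(fee)
FROM Med
GROUP BY hospital, DAY(visit_time) AS vdate
(b) Query B

Result tables (figure):
Query A: dis_cat | hospital | vdate | COUNT(*) | SUM(fee) | AVERAGE(fee)
Query B: hospital | vdate | COUNT(*) | SUM(fee) | AVERAGE(fee)
Both result tables keep COUNT(*) and SUM(fee), even when the query does not select them, because AVERAGE(fee) is maintained incrementally from these two basics.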

Roadmap

• System overview

• Query examples

• Incremental Aggregation

• Incremental sharing

• Evaluation

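Aggregate Function Types

• Distributive: computable by applying an aggregate function to partial aggregates of the same function (e.g., SUM, COUNT; partial COUNTs are combined by summing).

• Algebraic: computable from a finite set of distributive aggregates (e.g., AVERAGE = SUM / COUNT).

• Holistic: no such finite set exists (e.g., quantiles such as MEDIAN).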
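The three categories can be illustrated on a stream split into a stored history and a newly arrived chunk. This is a minimal Python sketch with made-up fee values (an assumption for illustration); MEDIAN stands in for the quantiles mentioned above.

```python
from statistics import median

# Illustrative fee values (hypothetical): the stored history and one new chunk.
history = [3.0, 8.0, 1.0, 9.0]
new_chunk = [4.0, 6.0, 2.0]
everything = history + new_chunk

# Distributive: partial SUMs combine by summing, and so do partial COUNTs.
total_sum = sum([sum(history), sum(new_chunk)])
total_count = sum([len(history), len(new_chunk)])

# Algebraic: AVERAGE is derived from the finite set {SUM, COUNT}.
average = total_sum / total_count

# Holistic: the median of the partial medians (4.75 here) differs from the
# true median (4.0), so MEDIAN cannot simply be combined from partial results.
median_of_medians = median([median(history), median(new_chunk)])
true_median = median(everything)
```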

Incremental Aggregation

Holistic Aggregation

• Recomputes the aggregate over the entire history at every evaluation.

• Usage:
– For holistic aggregates.
– For aggregates computed over non-incrementally evaluated results.
– As the baseline for incremental aggregation.

Algorithm

Figure: incremental aggregation on a result table with columns GID, COUNT(*) AS COUNTA, SUM(fee) AS SUMA, AVERAGE(fee) AS AVGA, shown before and after the update.

0: Pre-update state
1: Aggregate the new data (SN) into AN
2: Merge groups, with t1 denoting AH and t2 denoting AN:
t2.COUNTA = t1.COUNTA + t2.COUNTA
t2.SUMA = t1.SUMA + t2.SUMA
3: Compute algebraic aggregate: AVGA = SUMA / COUNTA, on the rows of AN
4: Drop duplicates
5: Insert new results
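The five steps can be sketched in Python with dictionaries standing in for the AH and AN tables. This is a minimal illustration, not the system's implementation (which runs on top of a DBMS): group IDs are plain keys, each AH row is assumed to hold (COUNTA, SUMA, AVGA), and the hypothetical `recompute` function plays the role of the non-incremental baseline that revisits the entire history.

```python
def aggregate(rows):
    """Step 1: aggregate new (gid, fee) pairs into AN: gid -> (COUNTA, SUMA)."""
    an = {}
    for gid, fee in rows:
        count, total = an.get(gid, (0, 0.0))
        an[gid] = (count + 1, total + fee)
    return an


def incremental_update(ah, sn):
    """Apply steps 1-5 to AH (gid -> (COUNTA, SUMA, AVGA)) for new data SN."""
    an = aggregate(sn)                                   # 1: aggregate SN into AN
    new_rows = {}
    for gid, (count, total) in an.items():
        if gid in ah:                                    # 2: merge the matching AH group
            count += ah[gid][0]
            total += ah[gid][1]
        new_rows[gid] = (count, total, total / count)    # 3: AVGA = SUMA / COUNTA
    for gid in new_rows:                                 # 4: drop duplicates from AH
        ah.pop(gid, None)
    ah.update(new_rows)                                  # 5: insert new results
    return ah


def recompute(rows):
    """Non-incremental baseline: aggregate the entire history from scratch."""
    return {gid: (c, s, s / c) for gid, (c, s) in aggregate(rows).items()}


history = [("h1", 10.0), ("h2", 20.0), ("h1", 30.0)]
chunk = [("h1", 5.0), ("h3", 7.0)]
ah = incremental_update(recompute(history), chunk)
```

Comparing the incrementally maintained `ah` with `recompute(history + chunk)` is a convenient sanity check: both should hold identical rows, while the incremental path touches only the groups present in the new chunk.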

Complexity

1. Aggregate SN. T1 = O(|SN|)

2. Merge groups in AH to AN. Tcurr2 = O(|AH| + |AN|), Thash2 = O(|AH| + |AN|), Tprefetch2 = O(|AN|)

3. Compute algebraic aggregates in AN. T3 = O(|AN|)

4. Drop duplicates. Tcurr4 = O(|AN| * |A_H^N|) = O(|AN|^2), Thash4 = O(|AH| + |AN|), Tprefetch4 = O(|AN|)

5. Insert new results. T5 = O(|AN|)

Implementation

• System catalog:
– AggreRules
– AggreBasics

• Incremental aggregation instantiation

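System Catalog

AggreRules:
Function | Category | Incremental Aggregation Rule | Vertical Expansion Rule
AVERAGE | A | SUMX/COUNTW | SUMX/COUNTW
SUM | D | SUMX(H)+SUMX(N) | SUM(SUMX)
MEDIAN | H | NULL | NULL
COUNT | D | COUNTW(H)+COUNTW(N) | SUM(COUNTW)
(Category: D = distributive, A = algebraic, H = holistic. (H) and (N) denote values from the historical result AH and the new result AN.)

AggreBasics:
Function | Basics | Basic ID
AVERAGE | COUNT(W) | COUNTW
AVERAGE | SUM(X) | SUMX
SUM | SUM(X) | SUMX
COUNT | COUNT(W) | COUNTW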
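To make the two catalog tables concrete, here is a hypothetical in-memory encoding in Python. The real catalog is stored as DBMS tables; the dictionary layout and the helper names `basics_for` and `supports_incremental` are assumptions for illustration only.

```python
# AggreRules: function -> (category, incremental aggregation rule, vertical expansion rule).
# Category codes: D = distributive, A = algebraic, H = holistic; None plays the role of NULL.
AGGRE_RULES = {
    "AVERAGE": ("A", "SUMX/COUNTW", "SUMX/COUNTW"),
    "SUM": ("D", "SUMX(H)+SUMX(N)", "SUM(SUMX)"),
    "MEDIAN": ("H", None, None),
    "COUNT": ("D", "COUNTW(H)+COUNTW(N)", "SUM(COUNTW)"),
}

# AggreBasics: (function, basic aggregate, basic ID).
AGGRE_BASICS = [
    ("AVERAGE", "COUNT(W)", "COUNTW"),
    ("AVERAGE", "SUM(X)", "SUMX"),
    ("SUM", "SUM(X)", "SUMX"),
    ("COUNT", "COUNT(W)", "COUNTW"),
]


def basics_for(function):
    """Return the (basic aggregate, basic ID) pairs a function is built from."""
    return [(basic, basic_id) for f, basic, basic_id in AGGRE_BASICS if f == function]


def supports_incremental(function):
    """A NULL (None) rule marks a function that needs holistic re-aggregation."""
    _category, rule, _vertical = AGGRE_RULES[function]
    return rule is not None
```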

Instantiation

Figure: instantiating the incremental aggregation rules for New Query A, which requests AVERAGE(fee).

• Retrieve rules:
– AggreRules: AVERAGE = SUMX / COUNTW; SUMX(N) = SUMX(H) + SUMX(N); COUNTW(N) = COUNTW(H) + COUNTW(N)
– AggreBasics: AVERAGE: SUM(X): SUMX; AVERAGE: COUNT(W): COUNTW

• Substitute (X = fee, W = *): AVERAGE(fee) = SUMX / COUNTW, with SUM(fee): SUMX and COUNT(*): COUNTW

• Insert columns into GroupColumns: SUM(fee): SUMA; COUNT(*): COUNTA; AVERAGE(fee): AVGA

• Name mapping: SUMX → SUMA, COUNTW → COUNTA, AVERAGE(fee) → AVGA, giving
– AVGA = SUMA / COUNTA
– COUNTA(N) = COUNTA(H) + COUNTA(N); SUMA(N) = SUMA(H) + SUMA(N)
– t2.COUNTA = t1.COUNTA + t2.COUNTA; t2.SUMA = t1.SUMA + t2.SUMA
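Instantiation is essentially textual substitution of the generic basic IDs by the column names recorded for the query. A minimal sketch using Python's `re` module, assuming rules are stored as the strings shown in AggreRules; the function names and the `t1`/`t2` rewriting (t1 for AH, t2 for AN) are illustrative, modeled on the merge statements above.

```python
import re


def instantiate(rule, mapping):
    """Replace each generic basic ID (e.g. SUMX) in a rule with its column name."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], rule)


def to_merge_statement(column, rule):
    """Turn 'SUMA(H)+SUMA(N)' into 't2.SUMA = t1.SUMA+t2.SUMA' (t1: AH, t2: AN)."""
    expr = re.sub(r"(\w+)\(H\)", r"t1.\1", rule)
    expr = re.sub(r"(\w+)\(N\)", r"t2.\1", expr)
    return f"t2.{column} = {expr}"


# Name mapping for Query A's AVERAGE(fee): SUM(fee) is SUMX, COUNT(*) is COUNTW.
name_mapping = {"SUMX": "SUMA", "COUNTW": "COUNTA"}

avg_rule = instantiate("SUMX/COUNTW", name_mapping)
sum_rule = instantiate("SUMX(H)+SUMX(N)", name_mapping)
count_rule = instantiate("COUNTW(H)+COUNTW(N)", name_mapping)
merge_sum = to_merge_statement("SUMA", sum_rule)
merge_count = to_merge_statement("COUNTA", count_rule)
```

The `\b` word boundaries keep a basic ID from being rewritten inside a longer identifier that merely contains it.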

Roadmap

• System overview

• Query examples

• Incremental sharing

• Evaluation

Incremental Multiple Query Optimization (Incremental Sharing)

• Index existing query plan information R.

• Given a new query Q, identify the sharable computations from R.

• Select the optimal sharing path.

• Expand R to compute Q.

Incremental Sharing

Expanding Query Network

• Limited sharing on holistic aggregates

• Sharing on distributive/algebraic aggregates through vertical expansion
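Identifying a node that a new query can share through vertical expansion amounts to a containment check. A minimal sketch, assuming each indexed node records its source table, its group-by expressions, and the basic aggregates it materializes; the `Node` type, the `BASICS` table, node C, and the sizes are hypothetical, and group expressions are matched by exact name rather than through canonical forms (as GroupExprIndex would provide). A query that requests a holistic aggregate gets no candidates, reflecting the limited sharing on holistic aggregates.

```python
from dataclasses import dataclass

# Basic aggregates each query aggregate needs (None: holistic, no vertical expansion).
BASICS = {
    "AVERAGE(fee)": {"SUM(fee)", "COUNT(*)"},
    "SUM(fee)": {"SUM(fee)"},
    "COUNT(*)": {"COUNT(*)"},
    "MEDIAN(fee)": None,
}


@dataclass(frozen=True)
class Node:
    name: str
    source: str
    group_by: frozenset
    columns: frozenset   # basic aggregates materialized by the node
    size: int            # number of result rows (illustrative statistic)


def sharable_nodes(index, source, group_by, aggregates):
    """Nodes from which the query can be computed by further aggregation."""
    needed = set()
    for agg in aggregates:
        basics = BASICS[agg]
        if basics is None:               # holistic: not derivable from partial results
            return []
        needed |= basics
    return [n for n in index
            if n.source == source
            and group_by <= n.group_by   # the query's groups must be coarser
            and needed <= n.columns]


# Query A's node, plus a node grouped only by hospital (too coarse for Query B).
index = [
    Node("A", "Med", frozenset({"dis_cat", "hospital", "vdate"}),
         frozenset({"SUM(fee)", "COUNT(*)"}), 5000),
    Node("C", "Med", frozenset({"hospital"}),
         frozenset({"SUM(fee)", "COUNT(*)"}), 40),
]
query_b_candidates = sharable_nodes(index, "Med", {"hospital", "vdate"}, ["AVERAGE(fee)"])
```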


Vertical Expansion

Figure: B's result is computed from A's result by further aggregation. A's group columns are BID (B's group columns) plus the remaining columns (Rest).
A: BID | Rest | COUNT(*) AS COUNTA | SUM(fee) AS SUMA | AVERAGE(fee) AS AVGA
B: BID | COUNT(*) AS COUNTB | SUM(fee) AS SUMB | AVERAGE(fee) AS AVGB
1: Further aggregate: COUNTB = SUM(COUNTA), SUMB = SUM(SUMA) GROUP BY BID
Then compute AVGB = SUMB / COUNTB

Vertical Expansion

Figure: vertical expansion combined with the incremental aggregation steps.
A: Rest | ID | COUNT(*) AS COUNTA | SUM(fee) AS SUMA
B (new results BN): Rest | ID | COUNT(*) AS COUNTB | SUM(fee) AS SUMB | AVERAGE(fee) AS AVGB
1: Further aggregate: COUNTB = SUM(COUNTA), SUMB = SUM(SUMA) GROUP BY BID
2: Merge groups: t2.COUNTA = t1.COUNTA + t2.COUNTA, t2.SUMA = t1.SUMA + t2.SUMA
3: Compute algebraic aggregate: AVGB = SUMB / COUNTB
4: Drop duplicates
5: Insert new results
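The vertical expansion steps can be sketched in the same dictionary style. Assumptions for illustration: rows are (dis_cat, hospital, vdate, fee) tuples, A groups by (dis_cat, hospital, vdate) as in Query A, B groups by (hospital, vdate) as in Query B, and dictionaries stand in for the AN, BN, and BH tables; the helper names are hypothetical.

```python
def aggregate(rows, key):
    """Group rows by key(row) into key -> (COUNT(*), SUM(fee)); fee is row[3]."""
    out = {}
    for row in rows:
        k = key(row)
        count, total = out.get(k, (0, 0.0))
        out[k] = (count + 1, total + row[3])
    return out


def vertical_expand(an, bh, bid_of):
    """Fold A's new groups AN into B's history BH (BID -> (COUNTB, SUMB, AVGB))."""
    bn = {}
    for a_group, (count_a, sum_a) in an.items():
        bid = bid_of(a_group)        # 1: COUNTB = SUM(COUNTA), SUMB = SUM(SUMA) GROUP BY BID
        count_b, sum_b = bn.get(bid, (0, 0.0))
        bn[bid] = (count_b + count_a, sum_b + sum_a)
    for bid, (count_b, sum_b) in bn.items():
        if bid in bh:                # 2: merge with the matching BH group
            count_b += bh[bid][0]
            sum_b += bh[bid][1]
        # 3: AVGB = SUMB / COUNTB; 4-5: the new row replaces the old one
        bh[bid] = (count_b, sum_b, sum_b / count_b)
    return bh


def a_key(row):
    return row[0:3]                  # (dis_cat, hospital, vdate)


def b_key(row):
    return row[1:3]                  # (hospital, vdate)


history = [("flu", "h1", 1, 10.0), ("cold", "h1", 1, 20.0), ("flu", "h2", 1, 30.0)]
chunk = [("cold", "h1", 1, 5.0), ("flu", "h2", 2, 7.0)]

bh = {k: (c, s, s / c) for k, (c, s) in aggregate(history, b_key).items()}
an = aggregate(chunk, a_key)
vertical_expand(an, bh, bid_of=lambda a_group: a_group[1:3])
```

Only AN, which is typically much smaller than A's full result, is read to update B, so B never has to go back to the raw data.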

Vertical Expansion Complexity

• TVcurr = O(|AN|^2 + |BH|)

• TVhash = O(|AN| + |BH|)

• TVprefetch = O(|AN|)


System Catalog

Figure: catalog tables for incremental sharing (GroupTopology, GroupExprSet, GroupExprIndex, GroupColumns), with the following column sets:

• Original, DirectParent, NodeName, GroupID

• Original Expr, Canonical, ColumnName, NodeName

• Original GroupExpr, Canonical, GroupExprID

• GroupExprID, GroupID

Select Optimal Sharing Path

• Among the sharable nodes, select the one with the smallest size
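A minimal sketch of the selection rule, assuming each sharable candidate comes with a size estimate such as its row count; the function name, candidate names, and sizes below are made up. Under that assumption, choosing the smallest node minimizes the number of rows the further-aggregation step must read.

```python
def select_sharing_node(candidates):
    """Return the name of the smallest candidate node, or None if nothing is sharable.

    candidates: iterable of (node_name, size) pairs.
    """
    best = min(candidates, key=lambda c: c[1], default=None)
    return None if best is None else best[0]


choice = select_sharing_node([("A", 5000), ("A_daily", 1200), ("A_hourly", 20000)])
```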


Rerouting


Roadmap

• System overview

• Query examples

• Incremental sharing

• Evaluation

Evaluation

• Databases:
– Synthesized FedWire money transfers
– Anonymized medical patient admission records

• Queries:
– Seed queries
– Sharable queries generated from the seeds
– A wide range of queries (aggregates in this paper)

• Simulation:
– Historical data (300,000 on Fed and 600,000 on Med)
– Chunks of new data (4,000 per chunk)

Evaluation

Total execution time in seconds:
Method | FED (350 queries) | MED (450 queries)
Incremental aggregation | 662 | 316
Non-incremental aggregation | 6236 | 938
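The gap in the table is easier to read as a speedup ratio. A minimal arithmetic check, assuming the 662 s / 316 s row is the incremental configuration and the 6236 s / 938 s row is non-incremental aggregation:

```python
# Total execution times in seconds, taken from the table.
times = {
    "FED": {"incremental": 662, "non_incremental": 6236},
    "MED": {"incremental": 316, "non_incremental": 938},
}
speedup = {db: t["non_incremental"] / t["incremental"] for db, t in times.items()}
for db, ratio in speedup.items():
    print(f"{db}: {ratio:.1f}x")   # prints "FED: 9.4x" then "MED: 3.0x"
```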

Evaluation

Figure: SIA vs. NS-IA as the number of queries grows.
(a) FED: number of FED queries, 0 to 350
(b) MED: number of MED queries, 0 to 450

Conclusion

• Multiple aggregates over streams

• Solutions:
– Incremental aggregation
– Incremental MQO (incremental sharing)
– Built atop DBMSs for direct practical utility

• Large performance improvement

• Future work:
– A broader range of queries
– Building atop DSMSs

Acknowledgement

• Work with Professor Jaime Carbonell.

• Part of ARGUS by CMU and Dynamix.

• Team: Phil Hayes, Santosh Ananthraman, Bob Frederking, Eugene Fink, Dwight Dietrich, Ganesh Mani, Johny Mathew.

• Thanks to Professor Chris Olston for helpful discussion.

Evaluation

Figure (a), FED query pair 1: Non-VE ITT, VE ITT, Non-VE IBT, and VE IBT plotted against the incremental size |SN| (1 to 30000).
