Date posted: 22-Dec-2015
Monitoring Streams -- A New Class of Data Management Applications
Don Carney Brown University
Uğur Çetintemel Brown University
Mitch Cherniack Brandeis University
Christian Convey Brown University
Sangdon Lee Brown University
Greg Seidman Brown University
Michael Stonebraker MIT
Nesime Tatbul Brown University
Stan Zdonik Brown University
Background
• MIT/Brown/Brandeis team
• First Aurora, then Borealis
– Practical system
– Designed for Scalability: 10^6 stream inputs, queries
– QoS-Driven Resource Management
– Stream Storage Management
– Reliability / Fault Tolerance
– Distribution and Adaptivity
• First stream startup: StreamBase
– Financial applications
Example Stream Applications
• Market Analysis
– Streams of Stock Exchange Data
• Critical Care
– Streams of Vital Sign Measurements
• Physical Plant Monitoring
– Streams of Environmental Readings
• Biological Population Tracking
– Streams of Positions from Individuals of a Species
Not Your Average DBMS
1. External, Autonomous Data Sources
2. Querying Time-Series
3. Triggers-in-the-large
4. Real-time response requirements
5. Noisy Data, Approximate Query Results
Outline
1. Introduction
2. Aurora Overview / Query Model
3. Runtime Operation
4. Adaptivity
Aurora from 100,000 Feet
[Figure: Aurora serving several applications (App), each with a Query and a QoS specification]
Each Application Provides:
• A Query over input data streams
• A Quality-of-Service Specification (QoS) (specifies utility of partial or late results)
Aurora from 100 Feet
[Figure: an Aurora network of boxes (e.g., Slide, Tumble) connected by arcs, delivering to applications (App), each with a QoS specification]
Queries = Workflow (Boxes and Arcs)
• Workflow Diagram = “Aurora Network”
• Boxes = Query Operators
• Arcs = Streams
Streams (Arcs)
• stream: tuple sequence from common source (e.g., sensor)
• tuples timestamped on arrival (internal use: QoS)
Query Operators (Boxes)
• Simple: FILTER, MAP, RESTREAM
• Binary: UNION, JOIN, RESAMPLE
• Windowed: TUMBLE, SLIDE, XSECTION, WSORT
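The boxes-and-arcs model can be illustrated with a minimal Python sketch in which each box is a generator function and each arc is the iterator passed between boxes. This is an assumption-laden toy, not Aurora's operator semantics: `tumble_box` here simply aggregates consecutive, non-overlapping groups of a fixed number of tuples as a stand-in for TUMBLE, and all function names and the sensor readings are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def filter_box(pred: Callable[[dict], bool], arc: Iterable[dict]) -> Iterator[dict]:
    """FILTER: pass along only the tuples that satisfy the predicate."""
    return (t for t in arc if pred(t))


def map_box(fn: Callable[[dict], object], arc: Iterable[dict]) -> Iterator[object]:
    """MAP: apply a function to every tuple."""
    return (fn(t) for t in arc)


def tumble_box(size: int, agg: Callable[[List], object], arc: Iterable) -> Iterator[object]:
    """Simplified TUMBLE (assumption): aggregate consecutive,
    non-overlapping groups of `size` tuples. A trailing partial
    group is never emitted, as the stream is treated as unbounded."""
    window: List = []
    for t in arc:
        window.append(t)
        if len(window) == size:
            yield agg(window)
            window = []


# A three-box network: FILTER -> MAP -> TUMBLE (hypothetical sensor data).
readings = [{"sensor": "s1", "temp": v} for v in (20, 35, 22, 40, 41, 19)]
warm = filter_box(lambda t: t["temp"] > 21, readings)
temps = map_box(lambda t: t["temp"], warm)
averages = list(tumble_box(2, lambda w: sum(w) / len(w), temps))
print(averages)
```

Because each arc is a lazy iterator, tuples flow through the network one at a time, which loosely mirrors a stream passing along an arc.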
Aurora in Action
[Figure: the Aurora network in operation, with Slide and Tumble boxes feeding applications (App), each with a QoS specification]
“Box-at-a-time” Scheduling
Arcs: Tuple Queues
Outputs Monitored for QoS
Continuous and Historical Queries
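[Figure: a continuous query (boxes O1, O2, O3), an ad-hoc query (boxes O4, O5), and a view (boxes O7, O8, O9), each delivering to an App with a QoS specification; arcs are queues, and connection points are labeled with retention periods of 3 Days and 1 Hour]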
Quality-of-Service (QoS)
Specifies “Utility” of Imperfect Query Results
• Delay-Based (specify utility of late results)
• Delivery-Based, Value-Based (specify utility of partial results)
[Figure: three QoS graphs (A, B, C) with x-axes Delay, % Tuples Delivered, and Output Value]
QoS Influences…
Scheduling, Storage Management, Load Shedding
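A QoS specification of this kind can be modeled as a function from a measured quantity (delay, or % tuples delivered) to a utility between 0 and 1. The sketch below assumes piecewise-linear graphs given as a list of points; the helper `make_qos` and the two example graphs are hypothetical and are not taken from Aurora.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple


def make_qos(points: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Build a piecewise-linear utility function from (x, utility)
    points sorted by x. Values outside the range are clamped."""
    xs = [x for x, _ in points]
    us = [u for _, u in points]

    def utility(x: float) -> float:
        if x <= xs[0]:
            return us[0]
        if x >= xs[-1]:
            return us[-1]
        i = bisect_right(xs, x)  # xs[i-1] <= x < xs[i]
        x0, x1, u0, u1 = xs[i - 1], xs[i], us[i - 1], us[i]
        return u0 + (u1 - u0) * (x - x0) / (x1 - x0)

    return utility


# Hypothetical delay-based QoS: full utility up to 2 s, none after 6 s.
delay_qos = make_qos([(0, 1.0), (2, 1.0), (6, 0.0)])
# Hypothetical delivery-based QoS: utility rises with % tuples delivered.
delivery_qos = make_qos([(0, 0.0), (50, 0.8), (100, 1.0)])

print(delay_qos(4), delivery_qos(75))
```

A scheduler or load shedder could then consult such a function to estimate how much utility is lost by delaying or dropping tuples.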
Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions
Runtime Operation
Basic Architecture
[Figure: architecture diagram with components Router (inputs, outputs), Scheduler, QoS Monitor, Box Processors, Storage Manager (Buffer holding queues q1 … qn, Persistent Store), and Catalog]
Runtime Operation
Scheduling: Maximize Overall QoS
A: Cost: 1 sec, queued tuple (…, age: 1 sec)
B: Cost: 2 sec, queued tuple (…, age: 3 sec)
Choice 1: Delay = 2 sec, Utility = 0.5
Choice 2: Delay = 5 sec, Utility = 0.8
Schedule Box A now rather than later
Ideal: Maximize Overall Utility
Presently exploring scalable heuristics (e.g., feedback-based)
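The "maximize overall utility" ideal can be sketched by brute force: try every execution order, compute each output's delay (tuple age plus time until its box finishes), and sum the delay-based utilities. Everything below is an illustrative assumption: the box costs and ages reuse the numbers above, but the two QoS curves are hypothetical, and exhaustive enumeration is factorial in the number of boxes, which is why scalable heuristics are needed in practice.

```python
from itertools import permutations
from typing import Callable, Dict, Tuple


def delay_utility(full_until: float, zero_at: float) -> Callable[[float], float]:
    """Hypothetical delay-based QoS: utility 1.0 up to `full_until`
    seconds of delay, falling linearly to 0.0 at `zero_at` seconds."""
    def u(delay: float) -> float:
        if delay <= full_until:
            return 1.0
        if delay >= zero_at:
            return 0.0
        return (zero_at - delay) / (zero_at - full_until)
    return u


# name -> (cost in seconds, age of queued tuple in seconds, QoS function)
Boxes = Dict[str, Tuple[float, float, Callable[[float], float]]]


def total_utility(order: Tuple[str, ...], boxes: Boxes) -> float:
    """Sum of output utilities when boxes run back to back in `order`."""
    clock, total = 0.0, 0.0
    for name in order:
        cost, age, qos = boxes[name]
        clock += cost               # box finishes at this time
        total += qos(age + clock)   # delay = age on arrival + waiting + processing
    return total


def best_order(boxes: Boxes) -> Tuple[str, ...]:
    """Exhaustively pick the order with the highest total utility."""
    return max(permutations(boxes), key=lambda order: total_utility(order, boxes))


boxes: Boxes = {
    "A": (1.0, 1.0, delay_utility(2.0, 4.0)),    # steep curve: A is urgent
    "B": (2.0, 3.0, delay_utility(3.0, 13.0)),   # gentle curve: B tolerates delay
}
print(best_order(boxes))
```

With these assumed curves, running A first wins because A's utility collapses quickly with delay while B's degrades slowly.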
Runtime Operation
Scheduling: Minimizing Per-Tuple Processing Overhead
Train Scheduling:
• Default Operation: boxes A, B; inputs … x y z; steps: A(x), A(y), A(z), B(A(x)), B(A(y)), B(A(z))
• Box Trains: boxes A and B run together (AB); inputs … x y z; steps: B(A(x)), B(A(y)), B(A(z))
• Tuple Trains: boxes A, B; inputs … x y z; steps: A(z, y, x), B(A(z), A(y), A(x))
Legend: [symbol] = Context Switch
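The three modes can be compared with a small Python sketch that counts how often the scheduler invokes a box, used here as a rough proxy for context switches. The `CountingBox` class and the `run_*` helpers are hypothetical names for illustration; they only model the call pattern, not Aurora's scheduler.

```python
from typing import Callable, List, Tuple


class CountingBox:
    """A box that records how many times the scheduler invokes it."""

    def __init__(self, fn: Callable):
        self.fn = fn
        self.calls = 0

    def run(self, tuples: List) -> List:
        self.calls += 1
        return [self.fn(t) for t in tuples]


def run_default(f: Callable, g: Callable, tuples: List) -> Tuple[List, int]:
    """Default operation: one invocation per box per tuple."""
    a, b = CountingBox(f), CountingBox(g)
    mid = [a.run([t])[0] for t in tuples]
    out = [b.run([t])[0] for t in mid]
    return out, a.calls + b.calls


def run_box_train(f: Callable, g: Callable, tuples: List) -> Tuple[List, int]:
    """Box train: A and B are invoked together, once per tuple."""
    ab = CountingBox(lambda t: g(f(t)))
    out = [ab.run([t])[0] for t in tuples]
    return out, ab.calls


def run_tuple_train(f: Callable, g: Callable, tuples: List) -> Tuple[List, int]:
    """Tuple train: each box is invoked once on the whole batch."""
    a, b = CountingBox(f), CountingBox(g)
    out = b.run(a.run(tuples))
    return out, a.calls + b.calls


inc = lambda x: x + 1
dbl = lambda x: x * 2
for mode in (run_default, run_box_train, run_tuple_train):
    print(mode.__name__, mode(inc, dbl, [1, 2, 3]))
```

All three modes produce the same outputs; only the number of invocations differs (6, 3, and 2 for three tuples and two boxes).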
Runtime Operation
Storage Management
1. Run-time Queue Management
Prefetch Queues Prior to Being Scheduled
Drop Tuples from Queues to Improve QoS
2. Connection Point Management
Support Efficient (Pull-Based) Access to Historical Data
E.g., indexing, sorting, clustering, …
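As one way to picture connection point management, the sketch below keeps a bounded history of timestamped tuples and answers pull-based range queries with binary search over the timestamps. It is an assumption-based illustration: the `ConnectionPoint` class, its `retention` parameter, and the requirement that timestamps arrive in non-decreasing order are all hypothetical simplifications.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List


class ConnectionPoint:
    """Retains recent tuples so later queries can pull historical data."""

    def __init__(self, retention: float):
        self.retention = retention      # how long to keep tuples (seconds)
        self.times: List[float] = []    # arrival timestamps, kept sorted
        self.tuples: List[Any] = []

    def append(self, ts: float, tup: Any) -> None:
        """Store a tuple and discard everything older than the retention period.
        Assumes timestamps arrive in non-decreasing order."""
        self.times.append(ts)
        self.tuples.append(tup)
        cut = bisect_left(self.times, ts - self.retention)
        del self.times[:cut]
        del self.tuples[:cut]

    def pull(self, start: float, end: float) -> List[Any]:
        """Pull-based access: return tuples with start <= timestamp <= end."""
        lo = bisect_left(self.times, start)
        hi = bisect_right(self.times, end)
        return self.tuples[lo:hi]


cp = ConnectionPoint(retention=10)
for ts in (0, 5, 10, 15, 20):
    cp.append(ts, f"tuple@{ts}")
print(cp.pull(12, 20))
```

The sorted timestamp list plays the role of a simple index; a real store would presumably add persistence and richer indexing, sorting, or clustering.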
Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions
Stream Query Optimization
• Differences with Traditional Query Optimization?
Motivation of ‘Query Migration’
• Continuous query over streams
– Statistics unknown before start
– Statistics changing during execution
• Stream rates, arrival pattern, distribution, etc.
• Need for dynamic adaptation
– Plan re-optimization
• Change the shape of query plan tree
Stream Query Optimization
• New classes of operators (windows) may mean new rewrites
• New execution modes (continuous/pipelining)
• More dynamic fluctuations in statistics, so compile-time optimization is not possible
• Global optimization is not practical because query networks are huge, so adaptive optimization is needed
• Other cost models: taking memory into account, output rate rather than throughput, etc.
• Query optimization and load shedding
Query Optimization
Compile-time, Global Optimization Infeasible
Too Many Boxes
Too Much Volatility in Network, Data
Dynamic, Local Optimization
Scope re what to optimize
Threshold re when to optimize
Run-time Plan Re-Optimization
• Step 1 – Decide when to optimize
– Statistics Monitoring
• Step 2 – Generate new query plan
– Query Optimization
• Step 3 – Replace current plan by new plan
– Plan Migration
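The three steps can be sketched for the simple case of a chain of commutative filters, where each filter has a per-tuple cost and a selectivity. The drift test, the exhaustive plan search, and every name below are illustrative assumptions rather than the system's actual optimizer.

```python
from itertools import permutations
from typing import Dict, Tuple

Stats = Dict[str, Tuple[float, float]]  # filter name -> (cost, selectivity)


def expected_cost(order: Tuple[str, ...], stats: Stats) -> float:
    """Expected per-tuple cost of applying the filters in `order`."""
    total, surviving = 0.0, 1.0
    for name in order:
        cost, sel = stats[name]
        total += surviving * cost   # only surviving tuples reach this filter
        surviving *= sel
    return total


def drifted(planned: Stats, observed: Stats, threshold: float) -> bool:
    """Step 1: has any selectivity moved by more than `threshold`?"""
    return any(abs(observed[n][1] - planned[n][1]) > threshold for n in planned)


def best_plan(stats: Stats) -> Tuple[str, ...]:
    """Step 2: cheapest order by exhaustive search (small plans only)."""
    return min(permutations(stats), key=lambda o: expected_cost(o, stats))


def reoptimize(current: Tuple[str, ...], planned: Stats, observed: Stats,
               threshold: float = 0.2) -> Tuple[str, ...]:
    """Return the plan to run next. Step 3 (plan migration) would
    swap the returned plan in for the running one."""
    if not drifted(planned, observed, threshold):
        return current
    return best_plan(observed)


planned = {"f1": (1.0, 0.1), "f2": (1.0, 0.9)}
observed = {"f1": (1.0, 0.9), "f2": (1.0, 0.2)}
print(reoptimize(("f1", "f2"), planned, observed))
```

Here the selectivities have swapped roles since planning, so the sketch proposes running `f2` first.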
Adaptivity in Query Optimization
Dynamic Optimization: Migration
1. Identify Subnetwork
2. Buffer Inputs
3. Drain Subnetwork
4. Optimize Subnetwork
5. Turn on Taps
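The five migration steps can be walked through on a toy subnetwork of stateless boxes. This is a sketch under strong assumptions: the `Subnetwork` class, the single input queue, the reading of "taps" as held-back inputs, and the hand-written `optimize` rewrite are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional

Box = Callable[[int], Optional[int]]  # returns None to drop a tuple


class Subnetwork:
    """A chain of stateless boxes fed by one input queue (assumption)."""

    def __init__(self, boxes: List[Box]):
        self.boxes = list(boxes)
        self.queue: deque = deque()

    def drain(self) -> List[int]:
        """Run every queued tuple through all boxes, emptying the queue."""
        out = []
        while self.queue:
            t = self.queue.popleft()
            for box in self.boxes:
                t = box(t)
                if t is None:       # a box dropped the tuple
                    break
            if t is not None:
                out.append(t)
        return out


def migrate(sub: Subnetwork, optimize: Callable[[List[Box]], List[Box]],
            arrivals: Iterable[int]) -> List[int]:
    # 1. Identify subnetwork: `sub` is the part being re-optimized.
    held = deque(arrivals)           # 2. Buffer inputs arriving meanwhile
    out = sub.drain()                # 3. Drain tuples already inside
    sub.boxes = optimize(sub.boxes)  # 4. Optimize the now-empty subnetwork
    sub.queue.extend(held)           # 5. Turn on taps: release held inputs
    return out


double = lambda x: x * 2
keep_mult4 = lambda x: x if x % 4 == 0 else None
keep_even = lambda x: x if x % 2 == 0 else None

sub = Subnetwork([double, keep_mult4])
sub.queue.extend([1, 2, 3])
# Hypothetical rewrite: filter first, then double (same results, less work).
drained = migrate(sub, lambda boxes: [keep_even, double], arrivals=[4, 5])
print(drained, sub.drain())
```

Draining works here because the boxes hold no state; stateful operators complicate the picture.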
Stateful Operator in CQ
• But what about stateful operators?
– Need non-blocking operators in CQ
– Operator needs to output partial results
– State data structure keeps received tuples
Example: Symmetric NL join w/ window constraints
[Figure: join of inputs A and B with State A and State B; State B holds b1 … b5, tuple ax is stored in State A, and ax joins with b2 and b3 to produce outputs (ax, b2) and (ax, b3)]
Key Observation: The purge of tuples in states relies on processing of new tuples.
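A symmetric nested-loop join with a window constraint can be sketched as follows. Each arriving tuple first purges expired tuples from the opposite state, then probes that state, then is stored in its own state. The class name, the time-based window, and the equality predicate are assumptions chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple


class SymmetricWindowJoin:
    """Non-blocking join of streams A and B that keeps per-input state."""

    def __init__(self, window: float, predicate: Callable[[Any, Any], bool]):
        self.window = window
        self.predicate = predicate
        self.state: Dict[str, List[Tuple[float, Any]]] = {"A": [], "B": []}

    def insert(self, side: str, ts: float, value: Any) -> List[Tuple[Any, Any]]:
        other = "B" if side == "A" else "A"
        # Purge: expired tuples in the opposite state are removed only
        # now, because a new tuple has arrived to trigger the check.
        self.state[other] = [(t, v) for t, v in self.state[other]
                             if ts - t <= self.window]
        # Probe the opposite state with a nested loop; emit partial results.
        results = []
        for _, v in self.state[other]:
            pair = (value, v) if side == "A" else (v, value)
            if self.predicate(*pair):
                results.append(pair)
        # Store the new tuple so later arrivals on the other side can match it.
        self.state[side].append((ts, value))
        return results


join = SymmetricWindowJoin(window=5, predicate=lambda a, b: a == b)
join.insert("B", 0, 1)
join.insert("B", 3, 2)
print(join.insert("A", 4, 2))   # both B tuples are still inside the window
print(join.insert("A", 9, 1))   # both B tuples have expired and are purged
```

Note that State A is never purged in this run because no further B tuple arrives, which is the key observation in action.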
Naïve Migration Strategy Revisited
• Steps
(1) Pause execution of old plan
(2) Drain out all tuples inside old plan
(3) Replace old plan by new plan
(4) Resume execution of new plan
[Figure: join plans AB and BC over inputs A, B, C, annotated with (2) All tuples drained, (3) Old replaced by new, (4) Processing resumed]
Deadlock Waiting Problem:
Adaptivity
Query Optimization
State Movement Protocol
Parallel Track Protocol
Moving State Strategy
• Basic idea
– Share common states between two migration boxes
• Key steps
– State Matching
• Match states based on IDs.
– State Moving
• Create new pointers for matched states in new box
– What’s left?
• Unmatched states in new box
[Figure: Old Box and New Box, each reading input queues QA, QB, QC, QD and writing QABCD. Old Box: join AB (states SA, SB), join BC (states SAB, SC), join CD (states SABC, SD). New Box: join BC (states SB, SC), join CD (states SBC, SD), join AB (states SA, SBCD).]
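State matching and state moving can be sketched with dictionaries keyed by state ID. The state IDs follow the old-box and new-box labels in the figure; the function name, the sample tuples, and the choice to leave unmatched states empty are assumptions, since how unmatched states are filled is not spelled out here.

```python
from typing import Dict, List, Tuple


def move_state(old_states: Dict[str, list],
               new_state_ids: List[str]) -> Tuple[Dict[str, list], List[str]]:
    """Match states by ID and share matched ones with the new box.

    Returns the new box's states and the IDs that had no match."""
    new_states: Dict[str, list] = {}
    unmatched: List[str] = []
    for sid in new_state_ids:
        if sid in old_states:
            # State moving: point at the same object, no copying.
            new_states[sid] = old_states[sid]
        else:
            # What's left: the new box must build this state itself.
            new_states[sid] = []
            unmatched.append(sid)
    return new_states, unmatched


# Old box ((A join B) join C) join D; tuple contents are made up.
old_states = {
    "SA": ["a1"], "SB": ["b1", "b2"],
    "SAB": ["a1b1"], "SC": ["c1"],
    "SABC": ["a1b1c1"], "SD": ["d1"],
}
# New box A join ((B join C) join D).
new_ids = ["SB", "SC", "SBC", "SD", "SA", "SBCD"]

new_states, unmatched = move_state(old_states, new_ids)
print(unmatched)
```

In this example the four base states SA, SB, SC, SD are shared, while the intermediate states SBC and SBCD have no counterpart in the old box.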
Parallel Track Strategy
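• Basic idea
– Execute both plans in parallel and gradually “push” old tuples out of old box by purging
• Key steps
– Connect boxes
– Execute in parallel
• Until old box “expired” (no old tuple or sub-tuple)
– Disconnect old box
– Start executing new box only
[Figure: Old Box and New Box running side by side, each reading input queues QA, QB, QC, QD and writing QABCD. Old Box: join AB (states SA, SB), join BC (states SAB, SC), join CD (states SABC, SD). New Box: join BC (states SB, SC), join CD (states SBC, SD), join AB (states SA, SBCD).]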
Adaptivity
Load Shedding
1. Two Load Shedding Techniques:
• Random Tuple Drops
Add DROP box to network (DROP is a special case of FILTER)
Position to affect queries w/ tolerant delivery-based QoS reqts
• Semantic Load Shedding
FILTER values with low utility (acc. to value-based QoS)
2. Triggered by QoS Monitor
e.g., after Latency Analysis reveals certain applications are continuously receiving poor QoS
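Both shedding techniques can be expressed as filters, as the following Python sketch suggests. The function names, the drop rate, and the value-based utility function are hypothetical; where to place such boxes in the network is a separate decision not modeled here.

```python
import random
from typing import Callable, Iterable, Iterator


def drop_box(arc: Iterable, drop_rate: float, rng: random.Random) -> Iterator:
    """Random tuple drops: a FILTER whose predicate ignores tuple content
    and discards roughly `drop_rate` of the stream."""
    return (t for t in arc if rng.random() >= drop_rate)


def semantic_shed(arc: Iterable, value_utility: Callable[[object], float],
                  min_utility: float) -> Iterator:
    """Semantic load shedding: FILTER out tuples whose values have low
    utility according to a value-based QoS function."""
    return (t for t in arc if value_utility(t) >= min_utility)


rng = random.Random(0)  # seeded so the run is repeatable
kept = list(drop_box(range(1000), drop_rate=0.3, rng=rng))
print(len(kept), "of 1000 tuples kept by the random drop")

# Hypothetical value-based QoS: readings of 90 or more matter most.
utility = lambda v: 1.0 if v >= 90 else 0.1
important = list(semantic_shed([10, 95, 50, 99], utility, min_utility=0.5))
print(important)
```

The random drop loses tuples indiscriminately, while the semantic variant keeps the tuples the application values most.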
Adaptivity
Detecting Overload
Throughput Analysis
Cost = c, Selectivity = s
Input rate = r, Output rate = min(1/c, r) * s
1/c < r: Problem (input arrives faster than the box can process it)
[Figure: a network of boxes, each labeled C,S (cost, selectivity), I (input), O (output), and P]
Latency Analysis
Monitor each application’s Delay-based QoS
Problem: Too many apps in “bad zone”
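Throughput analysis can be sketched by propagating rates through a chain of boxes with the formula Output rate = min(1/c, r) * s. A box with per-tuple cost c can process at most 1/c tuples per second, so it falls behind whenever its input rate r exceeds 1/c. The chain, its costs, and the function names below are made-up illustrations.

```python
from typing import List, Tuple


def output_rate(r: float, c: float, s: float) -> float:
    """Output rate of a box with per-tuple cost c (seconds) and
    selectivity s, given input rate r (tuples per second)."""
    return min(1.0 / c, r) * s


def find_overloaded(chain: List[Tuple[str, float, float]], r: float) -> List[str]:
    """Walk a chain of (name, cost, selectivity) boxes and return the
    names of boxes whose input rate exceeds their capacity 1/c."""
    overloaded = []
    for name, c, s in chain:
        if r > 1.0 / c:
            overloaded.append(name)
        r = output_rate(r, c, s)   # this box's output feeds the next box
    return overloaded


# Hypothetical chain: a cheap filter followed by an expensive join.
chain = [("filter", 0.125, 0.5), ("join", 0.5, 1.0)]
print(find_overloaded(chain, r=6.0))
```

The filter can handle 8 tuples per second and keeps up with 6, but it emits 3 per second to a join that can only handle 2, so the join is flagged.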
Implementation
GUI
Implementation
Runtime
Conclusions
Aurora Stream Query Processing System
1. Designed for Scalability
2. QoS-Driven Resource Management
3. Continuous and Historical Queries
4. Stream Storage Management
5. Implemented Prototype
Web site: www.cs.brown.edu/research/aurora/