Towards Adaptive Dataflow Infrastructure Joe Hellerstein, UC Berkeley.
Towards Adaptive Dataflow Infrastructure
Joe Hellerstein, UC Berkeley
Online Query Processing: The CONTROL Project (’96-’01)
Data analysis on massive datasets takes forever
  No feedback, 100% accuracy
Challenge: make queries more like image delivery
  But images are pre-encoded in progressive format
  Query is ad hoc
Solution: Online Aggregation
  Continuous sampling w/o replacement
  New pipelining query processing algorithms with good statistical properties (e.g. Ripple Joins) and user control (Online Reordering – “Juggle”)
  Estimators and confidence intervals for aggregates
  Streaming samples, streaming answers
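A minimal sketch of the online aggregation idea above: rows are consumed in random order (sampling without replacement), and a running estimate of AVG is emitted together with a confidence half-width that shrinks as more rows arrive. The function name `online_mean`, the normal-approximation interval, and the finite-population correction are assumptions chosen for illustration; they are not taken from the CONTROL project's own estimators.

```python
import math
import random


def online_mean(values, z=1.96, seed=0):
    """Yield (n, running_mean, half_width) after each sampled row.

    Hypothetical helper: a CLT-style interval, assumed for illustration.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)               # random order = sampling w/o replacement
    total = len(order)
    n, mean, m2 = 0, 0.0, 0.0        # Welford running mean / sum of squares
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n == total:
            half = 0.0               # every row seen: the answer is exact
        elif n > 1:
            var = m2 / (n - 1)       # sample variance so far
            fpc = 1.0 - n / total    # finite-population correction
            half = z * math.sqrt(var / n * fpc)
        else:
            half = float("inf")      # one row gives no variance estimate
        yield n, mean, half
```

A user interface could redraw the estimate and its interval after every yielded tuple and let the user stop the query once the interval is narrow enough.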
Images Are Aggregates
Can do Online “Enumeration” Too
“Potter’s wheel”
Volatility in Streaming Queries: Analogies for Sensors
Query engines map queries to dataflows
  Flow graph laid out by a query optimizer (typically on cluster)
  Query executor runs the flow
User priorities change during CONTROL queries
  Breaks “compile-then-run” query optimization paradigm
  Dynamic reordering of commutative tasks: f(g(x)) or g(f(x))?
  Dynamic reordering of data objects: x1, x2, x3, …
  Requires dynamic competition among choices: f(x) or f’(x)?
Volatile networks are similar
  Hard to predict rates of consumption/production a priori
  Volatile over time, and queries may run “forever”
  Imagine interactive user “cockpit” on the sensor net!
  Added metrics of power and data quality
  And different kinds of volatility, no doubt
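The dynamic reordering of commutative tasks can be sketched as a filter chain that re-decides its order for every item from observed pass rates, instead of fixing the order at compile time. The class `AdaptiveFilterChain` and its greedy "most selective first" policy are hypothetical and deliberately simplified; this is not the routing policy of any particular system.

```python
class AdaptiveFilterChain:
    """Apply commutative filters, most selective (by observation) first."""

    def __init__(self, predicates):
        self.predicates = dict(predicates)           # name -> predicate
        self.seen = {name: 0 for name in self.predicates}
        self.passed = {name: 0 for name in self.predicates}

    def pass_rate(self, name):
        # Smoothed estimate so an unseen filter starts at 0.5 (assumption).
        return (self.passed[name] + 1) / (self.seen[name] + 2)

    def order(self):
        # Lowest pass rate first: it discards the most items early.
        return sorted(self.predicates, key=self.pass_rate)

    def process(self, item):
        # The order is recomputed per item, so it tracks changing data.
        for name in self.order():
            self.seen[name] += 1
            if not self.predicates[name](item):
                return False
            self.passed[name] += 1
        return True
```

Because the filters commute, the set of surviving items is the same under any order; only the amount of work changes. If the data drifts so that a different filter becomes the selective one, the statistics and therefore the order follow.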
Adaptive Dataflow: Convergence of DBs/Nets
The idea from two angles
  Queries are flows, query optimization is routing
    Sensor queries need nets-style adaptivity
  New networking SW looks like a query engine
    Click, Scout. Also CANs.
    Sensor Qs need DB-style semantic optimization (up to app)
Telegraph: An Adaptive Dataflow System
  Boxes & Arrows dataflow programming
  Adaptive reoptimization of the flow graph (Eddies)
  Adaptive prioritization of the delivery (Juggle)
  Adaptive load-balancing/FT across nodes (FLuX)
  Mix Push/Pull to blend streams and pools (Fjords)
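Adaptive prioritization of delivery can be illustrated with a small buffer that sits between a producer and a consumer and always hands out an item from the group the user currently cares about most. The class `PriorityReorderer` and its methods are hypothetical names for this sketch; the buffering and spooling strategy of the actual Juggle operator is not described in these slides and is not reproduced here.

```python
from collections import defaultdict, deque


class PriorityReorderer:
    """Buffer items by group; deliver from the highest-priority group."""

    def __init__(self):
        self.buffers = defaultdict(deque)    # group -> buffered items
        self.priority = defaultdict(float)   # group -> user priority

    def put(self, group, item):
        self.buffers[group].append(item)

    def set_priority(self, group, value):
        # May be called at any time, e.g. from a user interface.
        self.priority[group] = value

    def get(self):
        ready = [g for g, buf in self.buffers.items() if buf]
        if not ready:
            return None                      # nothing buffered right now
        best = max(ready, key=lambda g: self.priority[g])
        return best, self.buffers[best].popleft()
```

Changing a priority between two `get` calls immediately changes which group is served next, which is the behavior a "compile-then-run" plan cannot offer.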
Extra Slides on Telegraph
Telegraph Apps to Date
Web Queries: Election 2000 http://fff.cs.berkeley.edu
Enhanced P2P functionality
  Query by album or artist, via joins with web data
  Working on pure P2P query processing
Initial sensor app
  Join I-80 traffic movement with webcams and incidents
  Smart Dust Mote simulations
Telenap: Amazon Meets Napster
Movie Stars Who Donated to Bush
Query >> Search: http://fff.cs.berkeley.edu
“Federated Facts and Figures” Yahoo join FECInfo
Query >> Search: http://fff.cs.berkeley.edu
“Federated Facts and Figures” APBNews join FECInfo