Adaptivity in continuous query systems
![Page 1: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/1.jpg)
Adaptivity in continuous query systems
Luis A. Sotomayor & Zhiguo Xu
Professor Carlo Zaniolo
CS240B - Spring 2003
![Page 2: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/2.jpg)
Sotomayor - Xu 2
Outline
- Introduction
- Adapting to the “burstiness” of data streams by using a smart operator scheduling strategy
- Adapting to high volumes of data streamed by multiple data sources through the use of “adaptive filters”
- Conclusion
![Page 3: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/3.jpg)
Introduction
- Two distinguishing characteristics of data streams:
  - Volume of data is extremely high
  - Decisions are made in close to real time
- Traditional solutions are impractical
  - Data cannot be stored in static databases for offline querying
- Data streams are important because of the variety of their applications
![Page 4: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/4.jpg)
Applications of data streams
- Network monitoring
- Intrusion detection systems
- Fraud detection
- Financial monitoring
- E-commerce
- Sensor networks
![Page 5: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/5.jpg)
Research efforts
- The large number of applications has led to many efforts to build full-fledged data stream management systems (DSMS)
- Efforts have concentrated on issues of
  - System architectures
  - Query languages
  - Algorithm efficiency
- Issues such as efficient resource allocation and communication overhead have received less attention
![Page 6: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/6.jpg)
Importance of adaptivity
- DSMS deal with multiple long-running continuous queries
- Data streams do not usually arrive at a regular rate
  - Considerable “burstiness” and variation over time
- The environment conditions in which queries are executed frequently differ from the conditions for which the query plans were generated
- DSMS may face an increasing number of data sources and therefore an increased volume of traffic
![Page 7: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/7.jpg)
The “Chain” operator scheduling strategy
![Page 8: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/8.jpg)
The classic solution
- Buffer the backlog of unprocessed tuples
- Work through them during periods of light load
- Problem:
  - Heavy load could exceed physical memory (causing paging to disk)
  - The memory used for these backlogs has to be minimized
![Page 9: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/9.jpg)
Finding a better solution
- Claim: the operator scheduling strategy can have a significant impact on run-time resource consumption
- Use an operator scheduling strategy that will minimize the amount of memory used during query execution
  - I.e., reduce the size of the backlogs
![Page 10: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/10.jpg)
Chain scheduling
- A near-optimal operator scheduling strategy
- Outperforms competing operator scheduling strategies
- Strategy concentrates on
  - Single-stream queries involving
    - Selection
    - Projection
    - Foreign-key joins with stored relations
  - Sliding window queries over multiple streams
![Page 11: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/11.jpg)
The model
- Query execution is conceptualized as a data flow diagram (a directed acyclic graph)
  - Nodes correspond to pipelined operators
  - Edges represent compositions of operators
- An edge from A to B indicates that the output of operator A is the input to operator B
- Another interpretation: an edge represents an input queue that buffers the output from A before it is input to B
![Page 12: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/12.jpg)
An example
- Suppose the query is
    SELECT Name FROM EmployeeStream WHERE ID = '12345';
- Operators are
  - Projection (SELECT …)
  - Selection (WHERE …)
- Operator path: Input stream → Select → Project → Output stream
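The operator path above can be sketched as a pipeline of Python generators. This is only an illustration of the data-flow model, not code from the presentation: the helper names `select` and `project` and the sample tuples are assumptions made up for the sketch.

```python
def select(stream, predicate):
    """Selection operator: pass through only tuples that satisfy the predicate."""
    for tup in stream:
        if predicate(tup):
            yield tup


def project(stream, fields):
    """Projection operator: keep only the named fields of each tuple."""
    for tup in stream:
        yield {f: tup[f] for f in fields}


# Hypothetical sample of EmployeeStream; the field values are made up.
employee_stream = [
    {"ID": "12345", "Name": "Ann", "Dept": "R&D"},
    {"ID": "67890", "Name": "Bob", "Dept": "Ops"},
]

# Operator path: input stream -> Select -> Project -> output stream
selected = select(employee_stream, lambda t: t["ID"] == "12345")
output = list(project(selected, ["Name"]))
print(output)
```

Each generator plays the role of a node in the data flow diagram, and the hand-off between generators corresponds to an edge. A real DSMS would place an explicit queue on each edge so that a scheduler can decide which operator runs next.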
![Page 13: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/13.jpg)
Main ideas
- Operators are thought of as filters
  - Operate on a set of tuples
  - Produce s tuples in return
- s = selectivity of an operator
- If s = 0.2 we can interpret the value in two ways
  - Out of every 10 tuples, the operator outputs 2 tuples
  - If the input requires 1 unit of memory, the output will require 0.2 units of memory
![Page 14: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/14.jpg)
Example
- Consider an operator path with two operators O1 and O2
- Assume that O1 takes one unit of time to process a tuple and that its selectivity is 0.2
- Assume that O2 takes one unit of time to process 0.2 tuples and that its selectivity is 0
  - I.e., O2 outputs tuples out of the system
![Page 15: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/15.jpg)
Example (cont)
- Now consider two strategies
- FIFO
  - A tuple is passed through both operators in two consecutive time units
  - No other tuples are processed during that time
- Greedy strategy
  - If there is a tuple buffered before O1, then it is operated on using one time unit
  - Otherwise, if there are tuples buffered before O2, 0.2 tuples are processed using one time unit
![Page 16: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/16.jpg)
Example (cont)

Memory usage

| Time | Greedy scheduling | FIFO scheduling |
|------|-------------------|-----------------|
| 0    | 1                 | 1               |
| 1    | 1.2               | 1.2             |
| 2    | 1.4               | 2.0             |
| 3    | 1.6               | 2.2             |
| 4    | 1.8               | 3.0             |
| 5    | 2.0               | 3.2             |
| 6    | 2.2               | 4.0             |

- Need to consider the growth or reduction of data as it travels along the operator path
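The memory-usage figures for the two strategies can be reproduced with a small simulation. The sketch below assumes that one new tuple arrives at every time unit starting at time 0, which the slides do not state explicitly; under that assumption it matches the numbers in the table. The function name `simulate` is made up for this illustration, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction

SELECTIVITY_O1 = Fraction(1, 5)  # O1 keeps 0.2 of each tuple; O2 has selectivity 0


def simulate(strategy, steps):
    """Return the total queued memory at times 0..steps.

    Assumption of this sketch: one whole tuple arrives at every time unit.
    """
    before_o1 = 1             # whole tuples queued before O1 (arrival at time 0)
    before_o2 = Fraction(0)   # tuple mass queued before O2
    usage = [before_o1 + before_o2]
    for _ in range(steps):
        if strategy == "greedy":
            run_o1 = before_o1 > 0        # always prefer O1 when it has input
        else:  # "fifo": finish the oldest tuple (O1, then O2) before the next one
            run_o1 = before_o2 == 0 and before_o1 > 0
        if run_o1:
            before_o1 -= 1
            before_o2 += SELECTIVITY_O1   # 0.2 of the tuple survives O1
        elif before_o2 > 0:
            before_o2 -= SELECTIVITY_O1   # O2 consumes 0.2 tuples per time unit
        before_o1 += 1                    # next arrival
        usage.append(before_o1 + before_o2)
    return usage


for t, (g, f) in enumerate(zip(simulate("greedy", 6), simulate("fifo", 6))):
    print(t, float(g), float(f))
```

Greedy keeps running O1 because it removes 0.8 units of memory per time unit, while FIFO spends every other time unit on O2, which removes only 0.2.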
![Page 17: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/17.jpg)
Progress charts
- Behavior of data is captured by progress charts
- Each point represents an operator
- The i-th operator takes $(t_i - t_{i-1})$ units of time to process a tuple of size $s_{i-1}$
  - Result is a tuple of size $s_i$
![Page 18: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/18.jpg)
Progress charts (cont)
- We can define selectivity as the drop in tuple size across an operator
- In other words, the selectivity of the i-th operator is equal to $s_i / s_{i-1}$
![Page 19: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/19.jpg)
The lower envelope
- Consider some point $(t, s)$ on the progress chart
- Imagine there is a line from this point to every operator point $(t_i, s_i)$ to its right
- The operator point that corresponds to the line with the steepest downward slope is called the “steepest descent operator point”
![Page 20: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/20.jpg)
The lower envelope (cont)
- By starting at the first point $(t_0, s_0)$ and repeatedly calculating the steepest descent operator point, we find the lower envelope P’ for a progress chart P
- Notice that the steepness (magnitude of the slope) of successive segments is non-increasing
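The construction of the lower envelope can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the tie-breaking rule (prefer the point farther to the right), and the sample progress chart are all invented for the sketch. The chart mimics a four-operator query whose third operator is very selective.

```python
from fractions import Fraction


def lower_envelope(points):
    """Return the indices of the progress-chart points on the lower envelope.

    points is [(t0, s0), ..., (tn, sn)] with strictly increasing t values.
    """
    envelope = [0]
    current = 0
    last = len(points) - 1
    while current < last:
        t0, s0 = points[current]

        def steepness(j):
            t, s = points[j]
            return Fraction(s0 - s) / Fraction(t - t0)  # size drop per unit time

        # Steepest descent operator point; ties go to the point farther right
        # (an assumption of this sketch).
        current = max(range(current + 1, last + 1),
                      key=lambda j: (steepness(j), j))
        envelope.append(current)
    return envelope


def segment_steepness(points, envelope):
    """Steepness of each lower-envelope segment, in envelope order."""
    return [Fraction(points[a][1] - points[b][1]) / Fraction(points[b][0] - points[a][0])
            for a, b in zip(envelope, envelope[1:])]


# Hypothetical chart: (time, tuple size) after 0, 1, 2, 3 and 4 operators.
chart = [(0, 10), (1, 9), (2, 8), (3, 1), (4, 0)]
env = lower_envelope(chart)
print(env, segment_steepness(chart, env))
```

For this chart the envelope jumps from the first point directly to the third operator's point, so operators 1 to 3 form one group and operator 4 forms another, and the segment steepness values come out in non-increasing order.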
![Page 21: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/21.jpg)
The lower envelope (cont)
- So what is it?
  - A way to find which segments of the operator path yield the biggest drops in tuple size
  - It allows us to consider changes in selectivity across groups of operators
  - We call these groups “chains”
![Page 22: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/22.jpg)
“Chain” scheduling
- Chain assigns each operator a priority equal to the steepness of the lower envelope segment to which the operator belongs
- At any time
  - Out of all the operators with tuples in their input queues, the one with the highest priority is chosen
  - When there are “ties,” the operator with the oldest tuples is chosen (based on arrival time)
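The selection rule can be sketched as follows. The code assumes that the lower envelope has already been computed and is given as a list of point indices plus one steepness value per segment; the function names, the queue representation (a list of arrival timestamps, oldest first), and the sample values are hypothetical.

```python
def chain_priorities(envelope, steepness):
    """Map each operator (numbered from 1) to the steepness of its envelope segment."""
    priorities = {}
    for (start, end), slope in zip(zip(envelope, envelope[1:]), steepness):
        for op in range(start + 1, end + 1):
            priorities[op] = slope
    return priorities


def pick_operator(priorities, queues):
    """Choose the next operator to run, or None if every input queue is empty.

    queues maps an operator to the arrival timestamps of its queued tuples,
    oldest first. Highest priority wins; ties go to the oldest tuple.
    """
    ready = [op for op, q in queues.items() if q]
    if not ready:
        return None
    return max(ready, key=lambda op: (priorities[op], -queues[op][0]))


# Hypothetical envelope: operators 1-3 form one chain, operator 4 another.
priorities = chain_priorities([0, 3, 4], [3, 1])
queues = {1: [5, 6], 2: [2], 3: [], 4: [0]}
print(pick_operator(priorities, queues))
```

Operator 4 holds the oldest tuple overall, but it loses to operators 1 and 2 on priority; between those two, operator 2 wins because its oldest tuple arrived first.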
![Page 23: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/23.jpg)
The Chain strategy along the progress chart
- Tuples don’t actually move along the lower envelope
- They instead move along the operator path
- When the Chain strategy moves tuples along the actual progress chart P, the memory requirements are not much greater than they would be along the lower envelope
![Page 24: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/24.jpg)
Multiple stream queries
- Queries that have at least one tuple-based sliding window join between two streams
![Page 25: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/25.jpg)
Multiple stream query execution
- Query is first broken up into parallel operator paths
[Figure: parallel operator paths for streams R and S; labels: R, S, Shared]
![Page 26: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/26.jpg)
Experimental results
- Compared the performance of Chain, FIFO, Greedy, and Round-Robin
- 2 data sets (network data)
  - Synthetic data set
  - Real data set
- Queries used IP addresses and packet sizes in selection and projection predicates
![Page 27: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/27.jpg)
Experiment: single stream queries (4 operators)
- Query: 4 operators
- Third operator is very selective
  - In between two less selective operators
![Page 28: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/28.jpg)
Experiment results
![Page 29: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/29.jpg)
Multiple stream experiment
- Three simultaneous queries
  - A sliding window join
  - Two single stream queries with selectivities less than one
- Results show Chain outperforms other strategies by a large margin
![Page 30: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/30.jpg)
Multiple stream experiment results
![Page 31: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/31.jpg)
Summary
- Proved that the choice of operator scheduling strategy has a significant impact on resource consumption
- Proved that the Chain scheduling strategy outperforms competing strategies
- Future work
  - Latency and starvation issues
  - Consider query plans that change over time
  - Consider the sharing of computation and memory in query plans
![Page 32: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/32.jpg)
“Adaptive filters” for continuous queries over distributed data streams
![Page 33: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/33.jpg)
What’s the problem?
- Distributed data sources continuously stream updates to a centralized processor where continuous queries are evaluated
- Because of the high volume of data updates, the communication overhead jeopardizes system performance
  - E.g., path latency computed by monitoring queuing latency at routers: the volume of monitoring traffic from routers may exceed that of normal traffic
- Can we reduce the communication overhead to make continuous queries based on multiple data streams feasible and efficient?
![Page 34: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/34.jpg)
Important observations
- Exact precision for continuous queries is not always needed
  - E.g., path latency application: accuracy to within 5 ms
- Approximate answers of sufficient precision can usually be computed from a small fraction of the input stream
  - E.g., average network traffic volume received by all hosts within the organization
- The precision constraint for queries may change over time
  - E.g., more precise traffic volume needed in the face of an attack
![Page 35: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/35.jpg)
Overview of approach
- Reduce communication overhead at the cost of query precision
- Quantitative precision constraints specified with the continuous queries
  - Bounded approximate answer $[L, H]$
  - Precision constraint δ: $0 \le H - L \le \delta$
- Filters installed at the remote data sources by the stream processor
  - Filter at data object O’s source: $[L_O, H_O]$ of width $W_O$, centered around the most recent numeric update V
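A source-side filter of this kind can be sketched as a small class. The class name `SourceFilter` and the method `offer` are assumptions for illustration; the sketch only captures the idea that an update is streamed when it falls outside the current bound, after which the bound is re-centered on the new value.

```python
class SourceFilter:
    """Suppress updates that stay inside a bound of fixed width (hypothetical sketch)."""

    def __init__(self, value, width):
        self.width = width
        self._recenter(value)

    def _recenter(self, value):
        # The bound [low, high] has width `width` and is centered on `value`.
        self.low = value - self.width / 2
        self.high = value + self.width / 2

    def offer(self, value):
        """Return True if this update must be streamed to the processor."""
        if self.low <= value <= self.high:
            return False          # still inside the bound: nothing is sent
        self._recenter(value)     # outside the bound: send it and re-center
        return True


f = SourceFilter(value=10, width=4)        # bound is [8, 12]
sent = [f.offer(v) for v in (11, 13, 12)]  # 13 leaves the bound, then [11, 15]
print(sent)
```

The processor only needs to cache the bound: as long as no update arrives, it knows the true value lies somewhere inside it.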
![Page 36: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/36.jpg)
Naive filtering policy
- Uniform allocation
  - E.g., a single CQ: AVG(O1, O2, …, On)
  - Precision constraint δ
  - Filters with a bound of width δ
- The wider a bound, the more restrictive a filter and consequently the more imprecise the query answers
- Cons
  - Multiple CQs may be issued on one object. If the smallest bound width is chosen for the filter, the resulting higher update stream rate may be wasted on the CQs that do not need that precision.
  - Data update rates and magnitudes are not taken into account
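To see why uniform allocation satisfies the constraint for a single AVG query, the bounded answer can be computed directly from the per-object bounds. The helper name `avg_bound` and the sample bounds are invented for this sketch; every bound has width δ = 4.

```python
def avg_bound(bounds):
    """Bounded answer [L, H] of AVG over objects with individual bounds (low, high)."""
    n = len(bounds)
    low = sum(lo for lo, _ in bounds) / n
    high = sum(hi for _, hi in bounds) / n
    return low, high


delta = 4
# Hypothetical bounds for four objects, each of width delta (uniform allocation).
bounds = [(8, 12), (20, 24), (0, 4), (4, 8)]
low, high = avg_bound(bounds)
print(low, high, high - low <= delta)
```

The width of the AVG answer equals the average of the individual widths, so giving every filter width δ meets the constraint regardless of how often each object actually changes.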
![Page 37: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/37.jpg)
System structure
- Data source
- Filters
- Stream coordinator
- Precision manager
- Bound cache
- CQ evaluator
![Page 38: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/38.jpg)
System structure
![Page 39: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/39.jpg)
Adaptive filter setting algorithm
- Goal: set bound widths for stream filters adaptively to reduce communication costs while guaranteeing the precision constraints of CQs
  - Only AVG queries are analyzed
- Queries Q1, Q2, …, Qm with object sets S1, S2, …, Sm, where each Sj is a subset of the n data objects O1, O2, …, On
- Query result Qj:
  $$\frac{1}{|S_j|} \sum_{1 \le i \le n,\; O_i \in S_j} V_i$$
- Precision constraint:
  $$\sum_{1 \le i \le n,\; O_i \in S_j} W_i \le \delta_j \cdot |S_j|$$
- Basic idea:
  - Implicit bound width shrinking
  - Explicit bound width growing
![Page 40: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/40.jpg)
Bound shrinking
- Filtering bound width $W_i$ for object $O_i$
  - Maintained both at the central stream coordinator and at the source filter
- $W_i \leftarrow W_i \cdot (1 - S)$ every Γ time units
  - Γ: adjustment period
  - S: shrink percentage
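The shrinking rule amounts to one multiplication per object per adjustment period. In the sketch below, the function name and the sample values for the widths and for S are assumptions chosen for illustration.

```python
def shrink_bounds(widths, shrink_pct):
    """Apply W_i <- W_i * (1 - S) to every object's bound width."""
    return {obj: w * (1 - shrink_pct) for obj, w in widths.items()}


widths = {"O1": 8.0, "O2": 4.0}   # hypothetical starting widths
for _ in range(2):                # two adjustment periods of length Gamma
    widths = shrink_bounds(widths, shrink_pct=0.25)
print(widths)
```

Because both the coordinator and the source apply the same rule on the same schedule, no message has to be exchanged to keep their copies of the width in agreement.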
![Page 41: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/41.jpg)
Bound growing
- Burden score: the degree to which an object is contributing to the overall communication cost due to streamed updates
  $$B_i = \frac{C_i}{P_i \cdot W_i}$$
  where $C_i$ is the communication cost for $O_i$, $W_i$ is the current bound width, and
  $$P_i = \frac{\Gamma}{N_i}$$
  where $N_i$ is the number of updates of $O_i$ received by the stream coordinator in the last Γ time units
- Burden target: the lowest overall burden required of the objects in the query in order to meet the precision constraint at all times
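As a worked illustration of the burden score, the sketch below computes the cost divided by the product of update period and bound width, with the update period taken as Γ divided by the number of updates received. The function name, the sample numbers, and the treatment of an object with no updates (burden 0) are assumptions of this sketch.

```python
def burden_score(cost, width, n_updates, period):
    """Burden score B_i = C_i / (P_i * W_i), with P_i = Gamma / N_i."""
    if n_updates == 0:
        return 0.0   # assumption: an object that sent nothing carries no burden
    update_period = period / n_updates
    return cost / (update_period * width)


# Hypothetical object: cost 1 per update, width 2, 5 updates in a period of 10.
print(burden_score(cost=1.0, width=2.0, n_updates=5, period=10.0))
```

An object that updates often (small update period) or has a narrow bound gets a high score, which marks it as a good candidate for a wider bound.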
![Page 42: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/42.jpg)
Bound growing (cont)
- Burden deviation: the degree to which an object is “over-burdened” with respect to the burden targets of the queries that access it
  $$D_i = \max\Big\{0,\; B_i - \sum_{1 \le j \le m,\; O_i \in S_j} T_j\Big\}$$
- Queried objects are considered in order of decreasing deviation, and each object is assigned the maximum possible bound growth when it is considered. The bound width $W_i$ grows by
  $$\min_{1 \le j \le m,\; O_i \in S_j} \Big( \delta_j \cdot |S_j| - \sum_{1 \le k \le n,\; O_k \in S_j} W_k \Big)$$
![Page 43: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/43.jpg)
Bound growing (summary)
- Each object is assigned a burden score
- Each query is assigned a burden target by either averaging burden scores or invoking an iterative linear solver
- Each object is assigned a deviation value based on the difference between its burden score and the burden targets of the queries that access it
- The objects are considered in order of decreasing deviation, and each object is assigned the maximum possible bound growth when it is considered
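The growth step summarized above can be sketched as follows for AVG queries. This is a simplified illustration, not the authors' implementation: burden scores and burden targets are taken as inputs, the slack of a query is assumed to be its precision budget (precision constraint times number of objects) minus the sum of its objects' current widths, and all names and sample values are hypothetical.

```python
def grow_bounds(widths, burdens, queries, deltas, targets):
    """Grow bound widths in order of decreasing burden deviation.

    queries maps a query name to the set of objects it reads; deltas and
    targets map a query name to its precision constraint and burden target.
    """
    def accessing(obj):
        return [q for q, objs in queries.items() if obj in objs]

    # Deviation: how far an object's burden exceeds the targets of its queries.
    deviation = {obj: max(0.0, burdens[obj] - sum(targets[q] for q in accessing(obj)))
                 for obj in widths}

    new_widths = dict(widths)
    for obj in sorted(widths, key=lambda o: deviation[o], reverse=True):
        qs = accessing(obj)
        if not qs:
            continue
        # Maximum growth that keeps every query within its precision budget.
        slack = min(deltas[q] * len(queries[q]) - sum(new_widths[k] for k in queries[q])
                    for q in qs)
        if slack > 0:
            new_widths[obj] += slack
    return new_widths


widths = {"O1": 1.0, "O2": 1.0}
burdens = {"O1": 3.0, "O2": 1.0}
queries = {"Q1": {"O1", "O2"}}
result = grow_bounds(widths, burdens, queries, deltas={"Q1": 2.0}, targets={"Q1": 2.0})
print(result)
```

In this example the over-burdened object O1 absorbs all of the available slack, while O2, which is below the target, keeps its width and will continue to shrink in later periods.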
![Page 44: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/44.jpg)
Burden target computation
- Single AVG query Qk over every object O1, …, On
  - $B_1 = B_2 = \dots = B_n = T_k$, or
    $$T_k = \frac{1}{|S_k|} \sum_{1 \le i \le n,\; O_i \in S_k} B_i$$
- Intuitive explanation behind this formula
  - Objects having higher than average burden scores will be given a higher priority for bound width growth to lower their burden scores
  - Objects having lower than average burden scores will shrink by default, thereby raising their burden scores
![Page 45: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/45.jpg)
Burden target computation (cont)
- Multiple queries over different sets of objects
- $\theta_{i,j}$: the portion of object $O_i$’s burden score corresponding to query $Q_j$, with
  $$\sum_{1 \le j \le m,\; O_i \in S_j} \theta_{i,j} = B_i$$
- The goal for adjusting burden scores in the presence of overlapping queries is to have the burden score $B_i$ of each object $O_i$ equal the sum of the burden targets of the queries over $O_i$
- Burden target:
  $$T_j = \frac{1}{|S_j|} \sum_{1 \le i \le n,\; O_i \in S_j} \Big( B_i - \sum_{1 \le k \le m,\; k \ne j,\; O_i \in S_k} T_k \Big)$$
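One way to solve for the burden targets of overlapping queries is a simple fixed-point iteration, sketched below. Each target is repeatedly set to the average, over the query's objects, of the object's burden score minus the targets of the other queries that read the same object. The slides only mention "an iterative linear solver", so the specific iteration scheme, the fixed iteration count, the function name, and the sample data are all assumptions; convergence is not guaranteed in general.

```python
def burden_targets(burdens, queries, iterations=100):
    """Iteratively estimate the burden target T_j of every query.

    queries maps a query name to the set of objects it reads.
    """
    targets = {q: 0.0 for q in queries}
    for _ in range(iterations):
        for j, objs in queries.items():
            total = 0.0
            for obj in objs:
                # Share of the object's burden already claimed by other queries.
                others = sum(targets[k] for k, s in queries.items()
                             if k != j and obj in s)
                total += burdens[obj] - others
            targets[j] = total / len(objs)
    return targets


burdens = {"O1": 2.0, "O2": 6.0, "O3": 4.0}
queries = {"Q1": {"O1", "O2"}, "Q2": {"O2", "O3"}}   # O2 is shared
targets = burden_targets(burdens, queries)
print(targets)
```

With a single query the inner sum is empty and the result reduces to the plain average of the burden scores. In the two-query example the targets settle near 2 and 4, so the shared object O2 has a burden equal to the sum of both targets.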
![Page 46: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/46.jpg)
Validation against optimized strategy
- The adaptive bound width setting algorithm converges on bounds that are on par with those selected by an optimizer
![Page 47: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/47.jpg)
Implementation and experimental validation
- Single query
![Page 48: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/48.jpg)
Implementation and experimental validation
- Multiple queries
![Page 49: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/49.jpg)
Summary
- Trade the precision of query results for lower communication costs
  - The specification of precision for continuous queries
  - Adaptive filters
- Future work
  - How imprecision propagates through more complex query plans
  - Develop appropriate optimization techniques for adapting remote filter predicates in more complex environments
![Page 50: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/50.jpg)
Conclusion
- The problem
  - DSMS must consider the high volume as well as the “burstiness” of data streams
  - Effectiveness of systems depends on being able to gracefully adapt to environmental conditions (i.e., resource availability)
- Two different approaches for adaptivity
  - Minimizing the amount of memory at all times
  - Controlling the amount of data sent from multiple data sources
![Page 51: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/51.jpg)
Conclusion (cont)
- Chain operator scheduling minimizes the amount of memory used during execution, making the system more adaptable to variation in arrival rates
- Adaptive filters reduce the volume of data so that a system can perform efficiently while providing a certain level of precision
- Overall, adaptivity in DSMS is necessary because of the unpredictability of data streams
![Page 52: Adaptivity in continuous query systems](https://reader036.fdocuments.us/reader036/viewer/2022062314/56812e58550346895d93ff64/html5/thumbnails/52.jpg)
References
- J. M. Hellerstein et al. Adaptive Query Processing: Technology in Evolution. IEEE 2000.
- B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems. ACM SIGMOD/PODS 2002 Conference.
- B. Babcock, S. Babu, M. Datar, and R. Motwani. Chain: Operator Scheduling for Memory Minimization in Data Stream Systems. SIGMOD 2003.
- C. Olston, J. Jiang, and J. Widom. Adaptive Filters for Continuous Queries Over Distributed Data Streams. SIGMOD 2003.