Post on 25-Dec-2015

Adaptive Stream Filters for Entity-based Queries with Non-value Tolerance
VLDB 2005
Reynold Cheng (Speaker), The Hong Kong Polytechnic University
Ben Kao, Alan Kwan, The University of Hong Kong
Sunil Prabhakar, Yicheng Tu, Purdue University
Cheng, Kao, Prabhakar, Kwan, Tu: Adaptive Stream Filters for Entity-based Queries with Non-Value Tolerance

Data Streams and Applications
Data Stream Management Systems (DSMS)
  – Sensor networks, location-based applications
  – STREAM [ABB03], STEAM [HAFME03], AURORA [ACC03], CACQ [MSH02]
Stream applications
  – Telecom call records
  – Network security [BO03]
  – Habitat monitoring [MPS02]
  – Structural health monitoring
Continuous Queries

DSMS Model
[Figure: a user issues a continuous query to the query processing unit of a central processor, which receives several data streams over a network and returns a result that is refreshed if needed. Annotations: real-time response time requirement; massive, fast streams; limited memory, CPU, and network bandwidth.]

Trading Accuracy for Query Timeliness
A user may accept an answer with a carefully controlled error tolerance
  – wide-area resource accounting
  – load-balancing in replicated servers
The system exploits error tolerance to reduce communication and computation costs

Value-based Tolerance
Often assumed in literature [OJW03, JCW04]
Maximum error is a numerical value ε specified by the user
MAX Query: Return the sensor id with the highest temperature
Guarantee that the temperature of the returned sensor id is no more than ε lower than that of the true answer

Is Selecting ε Easy?
Location-based application: a user inquires about his closest neighbor
  – Should the tolerance be 0.1, 1, or 100 meters?
Sensor network collects humidity, temperature, UV-index, wind speed
  – Does the user know the range of error for each type?
Multi-dimensional data streams (e.g., location)
Multimedia data streams (e.g., CCTV images)

Is Selecting ε for MAX Query Easy?
Suppose a user accepts an object that ranks 2nd or above.
  – If ε (ε_small) is too small: tolerance wasted
  – If ε (ε_large) is too large: error unacceptable
  – The ideal ε (ε_ideal) lies between the two
[Figure: three panels illustrating ε_small, ε_large, and ε_ideal for the MAX query.]

Rank-based Tolerance
Express error tolerance as a rank
Error tolerance = no. of positions the returned sensor could rank below the highest one
More intuitive and easier to specify
[Figure: example with rank-based tolerance = 1]
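The rank-based tolerance for a MAX query can be checked with a few lines of Python. This is only an illustrative sketch: the function name `within_rank_tolerance`, the sensor ids, and the readings are hypothetical and do not come from the slides.

```python
def within_rank_tolerance(readings, returned_id, tolerance):
    """Return True if returned_id ranks at most `tolerance` positions
    below the sensor with the highest reading."""
    returned_value = readings[returned_id]
    # Positions below the top = number of sensors with a strictly higher value.
    higher = sum(1 for v in readings.values() if v > returned_value)
    return higher <= tolerance


# Hypothetical readings (sensor id -> temperature).
readings = {"a": 30.0, "b": 29.5, "c": 12.0}
ok_second = within_rank_tolerance(readings, "b", 1)  # ranks 2nd
ok_third = within_rank_tolerance(readings, "c", 1)   # ranks 3rd
```

With a tolerance of 1, the user accepts the sensor that ranks 1st or 2nd, without having to know how far apart the temperature values are.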

Non-Value Tolerance
Rank-based tolerance is a non-value tolerance
  – numerical value not used
Fraction-based Tolerance
  – False Positive F+(t): % of returned answers that are incorrect at time t
  – False Negative F-(t): % of correct answers not returned at time t
  – F+(t) ≤ ε+; F-(t) ≤ ε-

Entity-based Queries
Return sets of object ids, not numerical values [CKP03]
Rank-based queries: order of stream values decides the final answer
  – e.g., top-k query, k-nearest-neighbor query
Non-rank-based queries: order of stream values is not important
  – e.g., range query
Non-value tolerance matches entity-based queries!

Continuous Query Classification
[Figure: classification tree]
Approximate Continuous Queries
  – Value-based tolerance: non-rank-based queries, rank-based query
  – Non-value tolerance
      – Rank-based tolerance: rank-based query (kNN query)
      – Fraction-based tolerance: rank-based query (kNN query), non-rank-based query (range query)

Adaptive Filter [OJW03]: Initialization Phase
[Figure: the constraint assignment unit takes the user-defined tolerance and assigns filter bounds [l1,u1], [l2,u2], [l3,u3] to Data Streams 1, 2, and 3; the query processing unit produces the approximate answer.]
Answer tolerance is met as long as no update is generated

Adaptive Filter: Maintenance Phase
[Figure: Data Streams 1, 2, and 3 hold values v1, v2, v3 and filter bounds [l1,u1], [l2,u2], [l3,u3]. Data Stream 2 sends an update when v2 > u2 or v2 < l2. If the tolerance is violated, the maintenance phase is triggered: the query processing unit may request value v3, the constraint assignment unit uses the user-defined tolerance to issue a new filter bound [l2,u2], and the approximate answer is replaced by a corrected approximate answer.]
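The stream-side behavior of a filter bound can be sketched as follows. This is a minimal illustration of the update rule shown in the figure (report only when v > u or v < l); the class name `FilteredStream` and its methods are hypothetical and not taken from [OJW03].

```python
class FilteredStream:
    """A stream source that suppresses updates while its value stays in [lower, upper]."""

    def __init__(self, value, lower, upper):
        self.value = value
        self.lower = lower
        self.upper = upper

    def observe(self, new_value):
        """Record a new reading; return it only if it must be sent to the server."""
        self.value = new_value
        if new_value > self.upper or new_value < self.lower:
            return new_value  # bound violated: one update message
        return None           # inside the bound: no message

    def install(self, lower, upper):
        """Accept a new filter bound from the constraint assignment unit."""
        self.lower, self.upper = lower, upper


stream = FilteredStream(value=20, lower=10, upper=30)
silent = stream.observe(25)    # stays inside [10, 30]
reported = stream.observe(31)  # leaves the bound
```

Every reading that stays inside the bound costs no message, which is where the communication savings come from.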

Contributions
Apply filter bounds to rank-based / non-rank-based queries subject to rank-based / fraction-based tolerance to reduce message costs
Correctness proofs, cost analysis and experimental evaluation of each protocol

Filter Bound Protocols
[Figure: the continuous query classification tree (approximate continuous queries; value-based tolerance; non-value tolerance with rank-based and fraction-based tolerance; kNN and range queries), annotated with the protocols RTP, FT-RP, FT-NRP, ZT-RP, and ZT-NRP.]

Non-Rank-based Queries
Example: 1D Range Query, Range = [10, 30]
Streams (ordered by value):  S6  S5  S2  S7  S4  S8  S1  S3
Ordered values:               2   6  11  14  23  25  34  41
Answer set: the streams whose values fall within the range
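The 1D range query example can be reproduced in Python. The sketch assumes that the stream ids and the ordered values on the slide line up left to right (S6 = 2, S5 = 6, and so on); that pairing is inferred from the layout, and the function name `range_query` is hypothetical.

```python
# Assumed pairing of stream ids with the ordered values on the slide.
values = {"S6": 2, "S5": 6, "S2": 11, "S7": 14,
          "S4": 23, "S8": 25, "S1": 34, "S3": 41}


def range_query(values, lo, hi):
    """Return the ids of streams whose current value lies in [lo, hi]."""
    return {sid for sid, v in values.items() if lo <= v <= hi}


answer = range_query(values, 10, 30)
```

The answer is a set of ids, and the order of the values inside the range plays no role, which is what makes the query non-rank-based.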

Fraction-based Tolerance
[Figure: streams S6, S5, S2, S7, S4, S8, S1, S3 in value order against the range of Q = [l, u]; two updates are marked, one labeled as a false positive and one as a false negative.]

Fraction-based Tolerance
A(t): answer actually returned at time t
E+(t): no. of false positives in A(t); E-(t): no. of false negatives
True answer at time t contains |A(t)| - E+(t) + E-(t) streams
F^{+}(t) = \frac{E^{+}(t)}{|A(t)|}
F^{-}(t) = \frac{E^{-}(t)}{|A(t)| - E^{+}(t) + E^{-}(t)}
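The two fractions can be computed directly from the returned answer and the true answer. The sketch below assumes both are available as Python sets of stream ids; the function name `fraction_errors` and the sample sets are hypothetical.

```python
def fraction_errors(returned, true_answer):
    """Return (F+, F-) for a returned answer set and the true answer set."""
    e_plus = len(returned - true_answer)   # E+: returned but incorrect
    e_minus = len(true_answer - returned)  # E-: correct but not returned
    # |true_answer| equals |A| - E+ + E-, the denominator of F-.
    f_plus = e_plus / len(returned) if returned else 0.0
    f_minus = e_minus / len(true_answer) if true_answer else 0.0
    return f_plus, f_minus


# Hypothetical sets: S1 is a false positive, S8 is a false negative.
returned = {"S2", "S7", "S4", "S1"}
true_answer = {"S2", "S7", "S4", "S8"}
f_plus, f_minus = fraction_errors(returned, true_answer)
```

In this example one of four returned ids is wrong and one of four correct ids is missing, so both fractions come out at 0.25.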

Initialization Phase
  – Given ε+ and ε-
  1. Collect current stream values
  2. For streams satisfying the range query
       Calculate no. of streams (Emax+) that can be false positives
       Assign false +ve filters [-∞, +∞] to Emax+ streams
       Assign [l,u] to remaining ones
  3. For streams failing the range query
       Calculate no. of streams (Emax-) that can be false negatives
       Assign false -ve filters [+∞, +∞] to Emax- streams
       Assign [l,u] to remaining ones
  – Tolerance is satisfied if no new updates are received
At any time t without update, F+(t) ≤ ε+ and F-(t) ≤ ε-
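The initialization steps can be sketched in Python. The slides do not state how Emax+ and Emax- are calculated, so the formulas below are an assumption: they are derived from F+ = E+/|A| and F- = E-/(|A| - E+ + E-) by taking the worst case in which every stream with a false +ve or false -ve filter is actually wrong. The function name, the choice of which streams receive the special filters (first in sorted id order), and the sample data are also hypothetical.

```python
import math

NEG_INF, POS_INF = float("-inf"), float("inf")


def initialize_filters(values, lo, hi, eps_plus, eps_minus):
    """Assign one filter per stream; assumes 0 <= eps_plus, eps_minus < 1."""
    inside = sorted(s for s, v in values.items() if lo <= v <= hi)
    outside = sorted(s for s, v in values.items() if not (lo <= v <= hi))

    # Assumed bound: E+ / |A| <= eps_plus.
    e_max_plus = math.floor(eps_plus * len(inside))
    # Assumed bound: E- / (|A| - Emax+ + E-) <= eps_minus, solved for E-.
    correct = len(inside) - e_max_plus
    e_max_minus = min(len(outside),
                      math.floor(eps_minus * correct / (1 - eps_minus)))

    filters = {}
    for i, s in enumerate(inside):
        filters[s] = (NEG_INF, POS_INF) if i < e_max_plus else (lo, hi)
    for i, s in enumerate(outside):
        filters[s] = (POS_INF, POS_INF) if i < e_max_minus else (lo, hi)
    return set(inside), filters


# Hypothetical stream values with range [10, 30] and 25% tolerances.
values = {"S6": 2, "S5": 6, "S2": 11, "S7": 14,
          "S4": 23, "S8": 25, "S1": 34, "S3": 41}
answer, filters = initialize_filters(values, 10, 30, 0.25, 0.25)
```

Streams holding a false +ve or false -ve filter never report, so they cost no messages; the remaining streams report only when they cross the query range.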

Maintenance Phase: Good Update
[Figure: streams S6, S5, S2, S7, S4, S8, S1, S3 at time t0 and time tc; filter [l,u] coincides with the range of Q = [l, u].]
Insert S7 into A(tc)
F+ and F- drop: F+(tc) < F+(t0) ≤ ε+ and F-(tc) < F-(t0) ≤ ε-
Tolerance is met

Maintenance Phase: Bad Update
1. Remove Si from A(tc)
2. F+(tc) ≤ ε+ and F-(tc) ≤ ε- may not be true
3. Quality of answer becomes worse
4. Procedure Fix to maintain tolerance
[Figure: streams S6, S5, S2, S4, S8, S1, S3 at time t0 and time tc, with S7 shown separately; filter [l,u] coincides with the range of Q = [l, u].]

Fix: Consulting False Positive Filter
Select stream S4 ∈ A(tc) with [-∞, +∞] filter
Request S4 for its updated value
If V4 ∈ [l, u]
  – install [l, u] filter to S4
  – prove that F+(tc) ≤ ε+ and F-(tc) ≤ ε- are satisfied
If V4 ∉ [l, u], consult a false -ve filter
Worst case: 5 messages
[Figure: streams S6, S5, S2, S7, S4, S8, S1, S3 against the range of Q = [l, u]; S4 holds the filter [-∞, +∞].]
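The first step of Fix can be sketched as below. This covers only what the slide states: probe one answer stream that holds a [-∞, +∞] filter and, if its value is in range, install [l, u] on it. The function name, the `probe` callback (standing in for the request and reply messages), and the returned status strings are hypothetical; the out-of-range branch only signals that a false -ve filter should be consulted and does not implement that step.

```python
NEG_INF, POS_INF = float("-inf"), float("inf")


def consult_false_positive(filters, answer, probe, lo, hi):
    """Probe one answer stream that holds a false +ve filter."""
    candidates = [s for s in sorted(answer)
                  if filters[s] == (NEG_INF, POS_INF)]
    if not candidates:
        return "no_false_positive_filter"
    stream = candidates[0]
    value = probe(stream)  # one request message plus one reply
    if lo <= value <= hi:
        filters[stream] = (lo, hi)  # the stream is now known to be correct
        return "fixed"
    return "consult_false_negative"  # next step per the slide, not implemented


# Hypothetical state: S4 holds the false +ve filter and reports 23 when probed.
filters = {"S2": (10, 30), "S4": (NEG_INF, POS_INF)}
answer = {"S2", "S4"}
status = consult_false_positive(filters, answer, lambda s: 23, 10, 30)
```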

Filter Bound Protocols for Rank-based Queries
k-NN query is a representative of NN, Min, Max
Fraction-based tolerance / k-NN query
  – View a k-NN query as a range query, by using the kth nearest neighbor as the "range"
  – Adapt fraction-based tolerance / range query
Rank-based tolerance / k-NN query
  – Maintain knowledge about (k+r)th and (k+r+1)st item
  – Filter bound is defined by the average of the (k+r)th and (k+r+1)st item
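For the rank-based tolerance case, the filter bound can be sketched for 1D values and a query point. The sketch assumes r denotes the rank tolerance, measures distance as the absolute difference from the query point, and breaks ties by stream id; the function name and the sample values are hypothetical.

```python
def rank_tolerance_bound(values, query_point, k, r):
    """Return (bound, answer): the distance bound halfway between the
    (k+r)th and (k+r+1)st nearest items, and the k nearest stream ids."""
    if len(values) <= k + r:
        raise ValueError("need more than k + r streams")
    ranked = sorted((abs(v - query_point), sid) for sid, v in values.items())
    inner = ranked[k + r - 1][0]  # distance of the (k+r)th item
    outer = ranked[k + r][0]      # distance of the (k+r+1)st item
    bound = (inner + outer) / 2
    answer = [sid for _, sid in ranked[:k]]
    return bound, answer


# Hypothetical stream values, query point 20, k = 2, rank tolerance r = 1.
values = {"S6": 2, "S5": 6, "S2": 11, "S7": 14,
          "S4": 23, "S8": 25, "S1": 34, "S3": 41}
bound, answer = rank_tolerance_bound(values, 20, k=2, r=1)
```

As long as no stream crosses the distance bound, the same k+r streams remain the nearest ones, so each returned id still ranks within the top k+r.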

Experiments
Compare
  – No filter is used at all
  – Filter protocols with zero tolerance
  – Our tolerance-based protocols
Measure total no. of messages required for executing a continuous query

Experimental Setup
Real Data
  – 30 days of wide-area traces of TCP connections based on TCP trace [ITA20]
Synthetic Data
  – Generated by CSIM 18
  – Data value: Uniform distribution
  – Fluctuation of updates: Normal distribution
  – Interarrival time of updates: Exponential distribution

Fraction-based Tolerance for Range Query with Real Data

Fraction-based Tolerance for Range Query with Synthetic Data

Conclusions
Value-based tolerance can be difficult to specify for continuous queries in stream systems
Rank-based and fraction-based tolerance
  – Applied to rank-based queries and non-rank-based queries
Filter bound protocols translate non-value tolerance to filter bounds
Experiments illustrate protocol effectiveness
Please contact Reynold Cheng (csckcheng@comp.polyu.edu.hk) for details

Issues of Running Out of Filters
If all false positive and false negative filters run out, the system degrades to one in which no tolerance is exploited
To improve performance, the initialization phase may be executed again
Experiments over long-running queries

Long-Running Queries

False +ve / -ve Filters Selection Heuristic
[Figure: number of messages (y-axis, 23K to 32K) plotted over an x-axis from 0 to 0.5, comparing the Random and Boundary selection heuristics.]