Navneet Kumar Pandey1
Stéphane Weiss1
Roman Vitenberg1
Kaiwen Zhang2
Hans-Arno Jacobsen2
1University of Oslo, 2University of Toronto
Minimizing the communication cost of aggregation in Publish/Subscribe systems
Aggregation in Pub/Sub systems
Motivation: Stock Market Application
• Content providers: stock exchanges
• Aggregate subscriptions: stock indicators (e.g., MACD)
• Content subscribers: brokers, buyers
• Non-aggregate subscriptions: stock updates
Motivation: Intelligent Transport System (ITS)
• Information providers: road sensors, crowdsourced mobile apps
• Information seekers: commuters, police, first responders, radio networks, etc.
• Aggregate subscriptions:
  – Count the number of cars passing a street light per hour
  – Average speed of cars on a road segment per day
• Non-aggregate subscriptions:
  – Accident reports
  – Traffic violation reports
Image source: http://www.wired.com/images_blogs/autopia/2012/08/12A914.jpg
Objective: Aggregation in Pub/Sub
• Pub/sub is well known for efficient content filtering and dissemination between distributed event sources and sinks.
• However, pub/sub does not support aggregation, which emerging applications require.
• Our primary objective is to add support for aggregation while retaining the traditional pub/sub focus on low communication cost.
• Integrating aggregation into pub/sub is more communication- and computation-efficient than running two separate systems for pub/sub and aggregation.
Contributions: aggregation in pub/sub
• We introduce and formalize the problem of minimizing communication for aggregation in pub/sub.
• By reducing the problem to minimum vertex cover over bipartite graphs, we show that it is solvable in polynomial time.
• We present a solution that is optimal under complete knowledge of publications and subscriptions by a broker.
• We propose an alternative algorithm that is less computationally expensive.
• We evaluate the trade-off between communication and computation costs for these two solutions.
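Background: Advertisement-based pub/sub model
[Figure: publishers and subscribers attached to a network of brokers (B). An advertisement A[val, >, 4] and a subscription S[val, >, 3] propagate through brokers such as BI, Bp, Bq, and BS; a publication P[val, 8] is delivered to the subscriber along the Subscription Delivery Tree (SDT).]
Our design choice: to maximize communication efficiency, we reuse the dissemination flow, i.e., the SDTs.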
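To make the predicate notation in the figure concrete, the following Python sketch checks whether a publication satisfies a conjunction of attribute predicates. The `(attribute, op, value)` triple encoding and the function `matches` are hypothetical illustrations of the slide notation, not the PADRES API. In advertisement-based routing, subscriptions travel toward publishers whose advertisements overlap them, and matching publications flow back along the resulting SDT.

```python
import operator

# Hypothetical encoding of the figure's predicates as (attribute, op, value)
# triples; this mirrors the slide notation, not the PADRES API.
OPS = {">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "<=": operator.le, "=": operator.eq}


def matches(publication, predicates):
    """Return True if the publication satisfies every predicate (a conjunction)."""
    for attribute, op, value in predicates:
        # A publication lacking the attribute cannot satisfy the predicate.
        if attribute not in publication:
            return False
        if not OPS[op](publication[attribute], value):
            return False
    return True


advertisement = [("val", ">", 4)]  # A[val, >, 4]
subscription = [("val", ">", 3)]   # S[val, >, 3]
publication = {"val": 8}           # P[val, 8]

print(matches(publication, advertisement))  # True
print(matches(publication, subscription))   # True
```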
Proposal: aggregation in Pub/Sub systems
Aggregate subscription: {<conditional predicates>, operator, duration (ω), shift size (δ)}
[Figure: notification window ranges (NWR1, NWR2) of a subscription over time (0 to 3 hours), with window duration ω and shift size δ; publications Pub1, Pub2, Pub3 arrive over time.]
Example: {RoadID = 101, speed > 10, op = 'avg', ω = 2 hours, δ = 1 hour}
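The following sketch shows how a publication timestamp maps to the NWRs of a sliding window with duration ω and shift δ. It assumes, for illustration only, that windows start at time 0 and are half-open, so window k (0-based) covers [kδ, kδ + ω); the function name `matching_windows` and the publication times are hypothetical.

```python
import math


def matching_windows(t, omega, delta):
    """Return 0-based indices k of the windows [k*delta, k*delta + omega) containing t.

    Assumption (illustrative only): windows start at time 0 and are half-open.
    """
    # k*delta + omega > t  <=>  k > (t - omega) / delta
    first = max(0, math.floor((t - omega) / delta) + 1)
    # k*delta <= t  <=>  k <= t / delta
    last = math.floor(t / delta)
    return list(range(first, last + 1))


# Example from the slide: omega = 2 hours, delta = 1 hour.
# The publication times below are hypothetical.
for name, t in [("Pub1", 0.5), ("Pub2", 1.5), ("Pub3", 2.5)]:
    print(name, matching_windows(t, omega=2, delta=1))
# Pub1 [0]
# Pub2 [0, 1]
# Pub3 [1, 2]
```

When ω is a multiple of δ, each publication away from the start of the stream matches ω/δ windows, so a larger ω/δ ratio directly increases how many NWRs share each publication.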
Challenge: Distribute the computation across the brokers
[Figure: a broker on the SDT receives publications Pub1, Pub2, Pub3 and makes an aggregation decision. Forwarding sends publication messages (publication load: 3); aggregating sends result messages. With two NWRs (NWR1, NWR2), the result load is 2 (Res1, Res2); with four NWRs (NWR1 to NWR4), the result load is 4 (Res1 to Res4), while the publication load stays 3.]
Each broker on an SDT makes a local aggregation decision for each NWR: aggregate or forward the incoming publications for that NWR.
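A simplified comparison of the two extremes in the figure, for a single broker and a single subscription, is sketched below. It assumes that each forwarded publication costs one message and each aggregated NWR costs one result message; the actual decision is made per NWR and also depends on downstream brokers. `cheaper_strategy` is a hypothetical helper.

```python
def cheaper_strategy(num_publications, num_windows):
    """Compare the two extremes for one broker and one subscription.

    Simplifying assumptions: forwarding costs one message per matching
    publication, aggregating costs one result message per NWR, and ties
    are resolved in favor of forwarding.
    """
    publication_load = num_publications
    result_load = num_windows
    if result_load < publication_load:
        return ("aggregate", result_load)
    return ("forward", publication_load)


# The two scenarios in the figure: 3 publications with 2 or 4 NWRs.
print(cheaper_strategy(3, 2))  # ('aggregate', 2)
print(cheaper_strategy(3, 4))  # ('forward', 3)
```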
Trade-off: multiple factors affect the decision
Increasing parameter | Favors
Publication matching rate | Aggregate
Number of matched NWRs | Forward
Overlap among aggregate subscriptions | Forward
Ratio between aggregate and regular subscriptions | Aggregate
Challenges:
• No global knowledge about the topology.
• SDTs are beyond the control of the aggregation scheme.
• SDTs change dynamically during execution.
Unique challenges compared to other aggregation systems
Aggregation in pub/sub | Other aggregation systems
Topology is not known to individual broker nodes | Require a global view of the topology
Publication sources and sinks are dynamic | Require a priori knowledge of publication sources
Brokers are loosely coupled | Need a control layer
SDTs are dynamic and outside the control of the aggregation scheme | Demand a static query plan
Publications arrive at an irregular rate | Optimized for continuous data streams
Problem formulation: Minimum-Notification for Aggregation (MNA)
• Objective: given
  – the set of subscriptions, and
  – the set of incoming messages, which includes both publications and previously aggregated results,
  minimize the number of notifications, i.e., publications and aggregation results sent by a broker.
[Figure: matching graph with NWR vertices n (Na, Nb, Nc) and publication vertices p (P1 to P4); an edge denotes the matching of a publication to an NWR. The corresponding timeline shows publications P1 to P4 falling into windows NWRa, NWRb, NWRc.]
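One way to write MNA for a single broker as an integer program, assuming a matching graph G = (N ∪ P, E) in which an edge (n, p) means publication p matches NWR n, x_n = 1 means the broker aggregates n and sends its result, and y_p = 1 means it forwards p. Every edge must be served either by the result for n or by forwarding p, so the constraints are exactly those of minimum vertex cover on G.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{n \in N} x_n + \sum_{p \in P} y_p \\
\text{s.t.}\quad  & x_n + y_p \ge 1 \qquad \forall (n, p) \in E, \\
                  & x_n \in \{0, 1\},\; y_p \in \{0, 1\}.
\end{aligned}
```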
Optimal solution
Unrealistic assumption:
• Brokers have information about
  – all the matching publications, and
  – all the NWRs within the entire execution,
• and this information is available a priori.
Idea:
• Each broker constructs an undirected bipartite graph (the matching graph)
• and computes its minimum vertex cover.
[Figure: matching graph with NWR vertices Na, Nb, Nc and publication vertices P1 to P4.]
A minimum vertex cover = {Na, Nb, Nc}
• Computation cost:
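Because the matching graph is bipartite, a minimum vertex cover can be computed in polynomial time via König's theorem: the size of a minimum vertex cover equals the size of a maximum matching, and the cover can be read off from alternating paths. The Python sketch below uses Kuhn's augmenting-path matching for simplicity (Hopcroft-Karp is the faster standard choice); the adjacency-dict input format and the example graph are hypothetical, not taken from the slides.

```python
from collections import deque


def max_matching(adj):
    """Maximum bipartite matching via Kuhn's augmenting-path algorithm.

    adj maps each left vertex (NWR) to a list of right vertices (publications).
    """
    match_left = {u: None for u in adj}
    match_right = {}

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere.
            if v not in match_right or try_augment(match_right[v], seen):
                match_left[u] = v
                match_right[v] = u
                return True
        return False

    for u in adj:
        try_augment(u, set())
    return match_left, match_right


def min_vertex_cover(adj):
    """Minimum vertex cover of a bipartite graph via König's theorem."""
    match_left, match_right = max_matching(adj)
    # Alternating BFS from unmatched left vertices: left -> right along any
    # edge, right -> left along matched edges only.
    visited_left = {u for u in adj if match_left[u] is None}
    visited_right = set()
    queue = deque(visited_left)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in visited_right:
                continue
            visited_right.add(v)
            partner = match_right.get(v)
            if partner is not None and partner not in visited_left:
                visited_left.add(partner)
                queue.append(partner)
    # König: cover = (left vertices not reached) union (right vertices reached).
    return {u for u in adj if u not in visited_left} | visited_right


# Hypothetical matching graph: NWR -> matching publications.
graph = {"Na": ["P1", "P2"], "Nb": ["P2", "P3"], "Nc": ["P3", "P4"]}
print(sorted(min_vertex_cover(graph)))  # ['Na', 'Nb', 'Nc']
```

Interpreting the result, NWRs in the cover would be aggregated, and publications in the cover would be forwarded.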
Practical solution: Aggregation Decision, Optimal with Complete Knowledge (ADOCK)
• Idea: make decisions with partial knowledge, based on the current state of publications, NWRs, and their interconnectivity.
• Implication: decisions can be suboptimal.
[Figure: current matching graph with NWR vertices Na, Nb, Nc and publication vertices P1 to P4.]
Difference between optimal and ADOCK:
#Subscriptions | 90 | 180 | 270 | 360
Difference | 3.53% | 0.88% | 4.29% | 3.27%
• Communication cost: close to the optimal solution in the experiments.
• Notation: |N|: number of NWRs, |P|: number of publications, degA(N): average degree of NWR vertices.
• Scalability issue: computation cost grows more than quadratically with the number of NWRs.
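To illustrate the partial-knowledge idea, the following sketch recomputes a minimum vertex cover over only the edges observed so far and maps it to per-NWR decisions. It is one possible reading of ADOCK, not the authors' implementation: the brute-force search is exponential and only suitable for tiny graphs (a bipartite matching algorithm would be used in practice), ties between equal-size covers are broken by enumeration order, and `adock_decisions` and the example edges are hypothetical.

```python
from itertools import combinations


def brute_force_cover(edges):
    """Smallest vertex set touching every (nwr, publication) edge.

    Exponential brute force, for illustration on tiny graphs only.
    """
    vertices = sorted({v for edge in edges for v in edge})
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            chosen = set(candidate)
            if all(n in chosen or p in chosen for n, p in edges):
                return chosen
    return set(vertices)


def adock_decisions(current_edges, nwrs):
    """Hypothetical ADOCK-style step: decide per NWR from the current graph only."""
    cover = brute_force_cover(current_edges)
    # An NWR outside the cover has all its matching publications in the
    # cover, i.e., they are forwarded instead of sending a result.
    return {n: "aggregate" if n in cover else "forward" for n in nwrs}


# Edges observed so far (hypothetical): Na matches P1 to P3; Nb and Nc match P3.
seen = [("Na", "P1"), ("Na", "P2"), ("Na", "P3"), ("Nb", "P3"), ("Nc", "P3")]
print(adock_decisions(seen, ["Na", "Nb", "Nc"]))
# {'Na': 'aggregate', 'Nb': 'forward', 'Nc': 'forward'}
```

Here Na aggregates while Nb and Nc rely on P3 being forwarded; a publication that arrives later could make a different choice cheaper, which is where suboptimality under partial knowledge comes from.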
Practical solution: Weighted Aggregation Decision (WAD)
Idea:
• Reduce the number of vertices used for making a decision.
• Only vertices within 2 hops of an NWR affect its decision.
• As in ADOCK, a decision is taken per NWR.
Steps:
1. Assign each publication vertex a weight equal to the inverse of its degree.
2. Compute the cumulative weight of each NWR from its matching publications.
3. Aggregate the matching publications of an NWR if its cumulative weight is ≥ 1; otherwise, forward them.
[Figure: example with NWRs Na, Nb, Nc and publications P1 to P4 with weights 1/3, 1/3, 1/2, 1/2. Na has cumulative weight 2/3 < 1 and forwards; Nb and Nc reach cumulative weight ≥ 1 and aggregate.]
Low computation cost: O(degA(N) × |N|)
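The three WAD steps translate almost directly into code. The sketch below assumes the broker's current matches are given as a mapping from each NWR to the set of publications matching it (a hypothetical input format) and uses exact fractions so that the ≥ 1 threshold is not affected by floating-point rounding. Each loop touches every NWR-publication edge once, which is consistent with the O(degA(N) × |N|) cost stated above.

```python
from collections import defaultdict
from fractions import Fraction


def wad_decisions(matches):
    """Weighted Aggregation Decision for one broker.

    matches maps each NWR to the set of publications matching it
    (hypothetical input format).
    """
    # Degree of each publication vertex = number of NWRs it matches.
    degree = defaultdict(int)
    for pubs in matches.values():
        for p in pubs:
            degree[p] += 1
    # Step 1: weight of a publication vertex is the inverse of its degree.
    weight = {p: Fraction(1, d) for p, d in degree.items()}
    decisions = {}
    for nwr, pubs in matches.items():
        # Step 2: cumulative weight of the NWR from its matching publications.
        total = sum((weight[p] for p in pubs), Fraction(0))
        # Step 3: aggregate if the cumulative weight is at least 1.
        decisions[nwr] = "aggregate" if total >= 1 else "forward"
    return decisions


# Hypothetical matching graph (not the one on the slide).
example = {"Na": {"P1"}, "Nb": {"P1", "P2"}, "Nc": {"P2", "P3"}}
print(wad_decisions(example))
# {'Na': 'forward', 'Nb': 'aggregate', 'Nc': 'aggregate'}
```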
Experimental setup
• Implemented in Java over the PADRES framework
• Topology: 16 brokers
  – Combination of publisher-edge-only, subscriber-edge-only, and mixed brokers
[Figure: overlay topology of 16 brokers (B)]
• Real-life datasets:
  – Traffic dataset from the ONE-ITS service¹
  – Yahoo! Finance stock dataset
• Metrics:
  – Communication: number of messages exchanged
  – Computation: total computation overhead
• Existing baseline: per-Broker Adaptive Technique (BAT)
¹ http://one-its-webapp1.transport.utoronto.ca
Varying number of publications
Setting: Stock dataset
[Figure: computation cost and communication cost vs. number of publications]
• There is a trade-off between WAD and ADOCK in communication and computation cost.
• WAD is up to 73% faster than ADOCK at the expense of up to a 22% increase in communication cost.
• BAT sends more messages than either of the proposed solutions.
Varying number of subscriptions
[Figure: computation cost and communication cost of ADOCK and WAD vs. number of subscriptions]
• ADOCK's communication cost is around 12% lower than WAD's.
• However, ADOCK's computation overhead is more than twice that of WAD.
• This is consistent with the analytical findings.
Impact of sliding windows
• A higher ratio of window duration to shift size (ω/δ) increases NWR interconnectivity and enlarges the decision graph.
• ADOCK is up to four times slower than WAD, while WAD sends up to 27% more messages.
[Figure: CPU overhead (ADOCK - WAD)/WAD and message overhead (WAD - ADOCK)/ADOCK, from 0% to 350%, vs. duration (ω) to shift size (δ) ratio from 1 to 6.]
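The two quantities plotted in the figure are relative overheads. The sketch below shows how they are computed; the function names and input numbers are hypothetical. Note that "four times slower" corresponds to a CPU overhead of (4 - 1) = 300%.

```python
def cpu_overhead(adock_cpu, wad_cpu):
    """Extra computation of ADOCK relative to WAD: (ADOCK - WAD) / WAD."""
    return (adock_cpu - wad_cpu) / wad_cpu


def message_overhead(wad_msgs, adock_msgs):
    """Extra messages of WAD relative to ADOCK: (WAD - ADOCK) / ADOCK."""
    return (wad_msgs - adock_msgs) / adock_msgs


# Hypothetical measurements: ADOCK takes 4x the CPU time of WAD,
# and WAD sends 127 messages where ADOCK sends 100.
print(f"{cpu_overhead(400.0, 100.0):.0%}")  # 300%
print(f"{message_overhead(127, 100):.0%}")  # 27%
```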
Key lessons from experiments
• Our results confirm that interconnectivity is the key reason for the trade-off between computation and communication cost.
• The trade-off is substantially affected by these factors:
  – Publication matching rate
  – Number of matching NWRs
  – Overlap among aggregate subscriptions
  – Ratio between aggregate and regular subscriptions
• Recommendation:
  – ADOCK is preferred if the system expects a moderate number of subscriptions with high selectivity.
  – Otherwise, WAD is recommended.
Conclusions
• We formalize the MNA problem and reduce it to minimum vertex cover over a bipartite graph.
• We provide two solutions: communication-efficient ADOCK and computation-efficient WAD.
• We experimentally demonstrate the trade-off between computation and communication cost in these approaches.