Publisher Placement Algorithms in Content-Based Publish/Subscribe
Alex King Yeung Cheung and Hans-Arno Jacobsen
University of Toronto
June 24th, 2010, ICDCS 2010
MIDDLEWARE SYSTEMS RESEARCH GROUP
Slide 2
Problem
Publishers can join anywhere in the broker overlay, typically at the closest broker.
Impact: high delivery delay, and high system utilization from matching, bandwidth, and subscription storage.
[Figure: publishers P attached far from their matching subscribers S]
Slide 3
Motivation
High system utilization leads to overloads, high response times, and reliability issues.
This is critical for enterprise-grade publish/subscribe systems:
- GooPS: Google's internal publish/subscribe middleware
- SuperMontage: Tibco's pub/sub distribution network for Nasdaq's quote and order processing system
- GDSN (Global Data Synchronization Network): global pub/sub network that allows retailers and suppliers to exchange supply chain data
Slide 4
Goal
Adaptively move each publisher to the area of its matching subscribers.
Algorithms should be dynamic, transparent, scalable, and robust.
[Figure: publisher P relocated toward its subscribers S]
Slide 5
Terminology
[Figure: broker chain B1 through B5 with publisher P]
Relative to a reference broker, upstream points toward the publisher and downstream follows the publication flow away from it.
Slide 6
Publisher Placement Algorithms
POP (Publisher Optimistic Placement)
- Fully distributed design
- Retrieves trace information per traced publication
- Uses one metric: number of publication deliveries downstream
GRAPE (Greedy Relocation Algorithm for Publishers of Events)
- Computations are centralized at each publisher's broker, which makes implementing and debugging easier
- Retrieves trace information per trace session
- Customizable to minimize delivery delay, broker load, or a specified combination of both
- Uses two metrics: average delivery delay and total system message rate
Goal: Move publishers to where the subscribers are, based on past publication traffic
Slide 7
Choice of Minimizing Delivery Delay or Load
[Figure: publisher P sends [class,'STOCK], [symbol,'GOOG], [volume,9900000]. One group of subscribers matches [class,=,'STOCK], [symbol,=,'GOOG], [volume,>,1000000]; another matches [class,=,'STOCK], [symbol,=,'GOOG], [volume,>,0]. The candidate placements see different message rates (4 msg/s vs. 1 msg/s), and a weight slides between 100% Load and 100% Delay.]
Slide 8
GRAPE's 3 Phases
Phase 1: Discover the location of publication deliveries by tracing live publication messages; retrieve trace and broker performance information.
Phase 2: Pinpoint the broker that minimizes the average delivery delay or system load, in a centralized manner.
Phase 3: Migrate the publisher to the broker decided in Phase 2, transparently and with minimal routing table update and message overhead.
Slide 9
Phase 1 Illustration
Each broker logs, per publisher, the traced publications it receives: the message ID, the trace session ID, and the number of matching local subscribers. A bit vector marks which traced publications were received at this broker, and a counter tracks the total number of deliveries made to local subscribers.

GRAPE's data structure per publisher:

Message ID   Trace session ID   Matching subscribers
B34-M213     B34-M212            5
B34-M215     B34-M212           10
B34-M216     B34-M212            5
B34-M217     B34-M212            1
B34-M220     B34-M212            3
B34-M222     B34-M212           20
B34-M225     B34-M212            1
B34-M226     B34-M212            5
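A minimal sketch of this structure (class and method names are assumed for illustration, not taken from the paper):

// Sketch of GRAPE's per-publisher trace log kept at each broker.
import java.util.LinkedHashMap;
import java.util.Map;

public class TraceLog {
    private final Map<String, Integer> deliveries = new LinkedHashMap<>();
    private String sessionId;  // message ID of the session's first publication

    // Log one traced publication and its number of matching local subscribers.
    void log(String messageId, String traceSessionId, int matchingSubscribers) {
        if (!traceSessionId.equals(sessionId)) {  // new trace session: reset
            sessionId = traceSessionId;
            deliveries.clear();
        }
        deliveries.put(messageId, matchingSubscribers);
    }

    // Running total of deliveries made to local subscribers this session.
    int totalLocalDeliveries() {
        return deliveries.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        TraceLog log = new TraceLog();
        log.log("B34-M213", "B34-M212", 5);   // rows from the slide's table
        log.log("B34-M215", "B34-M212", 10);
        System.out.println(log.totalLocalDeliveries());  // 15
    }
}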
Slide 10
Phase 1 Trace Data and Broker Performance Retrieval
[Figure: publisher P at B1 with subscribers across B5-B8; trace replies aggregate as they travel upstream, e.g. Reply {B8}, Reply {B7}, Reply {B8, B7, B6}, Reply {B8, B7, B6, B5}]
Once G_threshold publications are traced, the trace session ends and the trace data is sent upstream to the publisher's broker.
Slide 11
Contents of Trace Reply in Phase 1
- Broker ID
- Neighbor ID(s)
- Bit vector (for estimating total system message rate)
- Total number of local deliveries (for estimating end-to-end delivery delay)
- Input queuing delay
- Average matching delay
- Output queuing delays to neighbor(s) and binding(s)
In terms of message overhead, GRAPE adds only 1 reply message per trace session.
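As a sketch, the reply's contents map naturally onto a simple record (field names are assumed for illustration):

// Hypothetical shape of a Phase 1 trace reply (field names assumed).
import java.util.BitSet;
import java.util.List;
import java.util.Map;

public record TraceReply(
        String brokerId,                           // reporting broker
        List<String> neighborIds,                  // overlay neighbors
        BitSet bitVector,                          // traced publications delivered locally
        int localDeliveries,                       // total deliveries to local subscribers
        double inputQueuingDelayMs,
        double avgMatchingDelayMs,
        Map<String, Double> outputQueuingDelaysMs  // per neighbor and per binding
) {}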
Slide 12
Phase 2 Broker Selection
Simulate placing the publisher at every downstream broker and estimate the average end-to-end delivery delay, using:
- local delivery counts
- processing delay at each broker (queuing and matching delays)
- publisher ping times to each broker
Simulate placing the publisher at every downstream broker and estimate the total system message rate, using:
- bit vectors
Slide 13
Phase 2 Estimating Average End-to-End Delivery Delay
[Figure: publisher P at B1 with a 10 ms ping time; 1 subscriber at B1, 2 at B6, 9 at B7, 5 at B8. Each broker reports its input queuing, matching, and per-link output queuing delays.]
Each hop contributes its input queuing + matching + output queuing delay toward the next link (the RMI binding for local deliveries):
Delay to the subscriber at B1: 30 + 20 + 100 = 150 ms
Delay to each subscriber at B6: (30 + 20 + 50) + (20 + 5 + 45) = 170 ms
Delay to each subscriber at B7: (30 + 20 + 50) + (20 + 5 + 40) + (30 + 10 + 70) = 275 ms
Delay to each subscriber at B8: (30 + 20 + 50) + (20 + 5 + 35) + (35 + 15 + 75) = 285 ms
Average end-to-end delivery delay: 10 + (150·1 + 170·2 + 275·9 + 285·5) / 17 = 10 + 4390 / 17 ≈ 268 ms
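A minimal sketch of this computation (the Hop type and all names are assumed; the numbers restate the slide's example):

// Sketch of GRAPE's delay estimate: sum per-hop queuing and matching delays
// along each subscriber group's path, then average over all subscribers.
import java.util.List;

public class DelayEstimate {
    // Per-broker cost on a delivery path: input queuing + matching +
    // output queuing toward the next link (RMI binding for local delivery).
    record Hop(double inputQ, double matching, double outputQ) {}

    static double pathDelay(List<Hop> path) {
        return path.stream()
                   .mapToDouble(h -> h.inputQ() + h.matching() + h.outputQ())
                   .sum();
    }

    public static void main(String[] args) {
        List<Hop> toB1 = List.of(new Hop(30, 20, 100));
        List<Hop> toB6 = List.of(new Hop(30, 20, 50), new Hop(20, 5, 45));
        List<Hop> toB7 = List.of(new Hop(30, 20, 50), new Hop(20, 5, 40), new Hop(30, 10, 70));
        List<Hop> toB8 = List.of(new Hop(30, 20, 50), new Hop(20, 5, 35), new Hop(35, 15, 75));

        double ping = 10;                 // publisher-to-broker ping time
        int[] subs = {1, 2, 9, 5};        // subscribers at B1, B6, B7, B8
        double[] delay = {pathDelay(toB1), pathDelay(toB6), pathDelay(toB7), pathDelay(toB8)};

        double weighted = 0;
        int total = 0;
        for (int i = 0; i < subs.length; i++) {
            weighted += subs[i] * delay[i];
            total += subs[i];
        }
        // 10 + 4390/17 ≈ 268 ms, matching the slide's result
        System.out.printf("average end-to-end delay: %.0f ms%n", ping + weighted / total);
    }
}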
Slide 14
Phase 2 Estimating Total Broker Message Rate
[Figure: bit vectors at each broker marking which traced publications were delivered to local subscribers]
Bit vectors are necessary to capture publication deliveries to local subscribers in content-based pub/sub systems.
The message rate through a broker is calculated by aggregating the bit vectors of all downstream brokers with the bitwise OR operator.
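A minimal sketch of this aggregation, assuming java.util.BitSet as the bit-vector representation (broker names and the sample vectors are illustrative):

// Sketch of GRAPE's rate estimate: a broker's traced traffic is the bitwise
// OR of its own local deliveries and everything its downstream brokers delivered.
import java.util.BitSet;
import java.util.List;

public class MessageRateEstimate {
    // Bit i is set iff traced publication i was delivered at that broker.
    static BitSet throughBroker(BitSet local, List<BitSet> downstream) {
        BitSet through = (BitSet) local.clone();
        for (BitSet d : downstream) {
            through.or(d);  // publications delivered downstream must pass through
        }
        return through;
    }

    public static void main(String[] args) {
        int traced = 10;  // publications traced in this session (G_threshold)
        BitSet b7 = BitSet.valueOf(new long[]{0b1111111111L});  // delivered all 10
        BitSet b8 = BitSet.valueOf(new long[]{0b0000110011L});  // delivered 4 of 10
        BitSet b6Local = new BitSet(traced);                    // no local deliveries

        BitSet b6 = throughBroker(b6Local, List.of(b7, b8));
        // Fraction of traced publications that must flow through B6
        System.out.printf("B6 carries %d of %d traced publications%n",
                          b6.cardinality(), traced);
    }
}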
Slide 15
Phase 2 Minimizing Delivery Delay with Weight P%
1. Get publisher-to-broker ping times.
2. Calculate the average delivery delay if the publisher is positioned at each of the downstream brokers.
3. Normalize, sort, and drop candidates with average delivery delays greater than (100 - P)%.
4. Calculate the total broker message rate if the publisher is positioned at each of the remaining candidate brokers.
5. Select the candidate that yields the lowest total system message rate.
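A sketch of this selection under assumed names (Candidate, normalize, and the sample numbers are illustrative, not from the paper); candidates are screened on normalized delay first, then the survivor with the lowest message rate wins:

import java.util.Comparator;
import java.util.List;

public class WeightedSelection {
    record Candidate(String brokerId, double avgDelayMs, double msgRate) {}

    // Steps 2-5 with delivery delay prioritized at weight P (0..100).
    static Candidate select(List<Candidate> candidates, double p) {
        double min = candidates.stream().mapToDouble(Candidate::avgDelayMs).min().orElse(0);
        double max = candidates.stream().mapToDouble(Candidate::avgDelayMs).max().orElse(1);
        return candidates.stream()
                // drop candidates whose normalized delay exceeds (100 - P)
                .filter(c -> normalize(c.avgDelayMs(), min, max) <= 100 - p)
                // among the survivors, pick the lowest total message rate
                .min(Comparator.comparingDouble(Candidate::msgRate))
                .orElseThrow();
    }

    // Map a value onto 0 (best candidate) .. 100 (worst candidate).
    static double normalize(double v, double min, double max) {
        return max == min ? 0 : 100 * (v - min) / (max - min);
    }

    public static void main(String[] args) {
        List<Candidate> cs = List.of(
                new Candidate("B1", 268, 40), new Candidate("B6", 150, 55),
                new Candidate("B7", 120, 70), new Candidate("B8", 140, 60));
        System.out.println(select(cs, 100).brokerId());  // P=100: pure delay, picks B7
        System.out.println(select(cs, 50).brokerId());   // P=50: picks B6, lowest rate among fast-enough brokers
    }
}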
Slide 16
POP's 3 Phases
Phase 1: Discover the location of publication deliveries by probabilistically tracing live publication messages.
Phase 2: Pinpoint the broker closest to the set of matching subscribers using trace data from Phase 1, in a decentralized fashion.
Phase 3: Migrate the publisher to the broker decided in Phase 2, transparently and with minimal routing table update and message overhead.
Slide 17
Phase 1 Publication Tracing
[Figure: brokers B1-B8 with publisher P and subscribers; trace replies flow upstream, e.g. Reply 9 (via B8) and Reply 5 (via B7) are aggregated into Reply 15 at B6. Each broker's Publisher Profile Table maps a neighbor to its number of downstream deliveries, e.g. B8: 9, B7: 5, B6: 15.]
Multiple publication traces are aggregated with an exponentially weighted moving average: S_i = α · S_new + (1 − α) · S_{i−1}
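A minimal sketch of this aggregation (the smoothing factor α is not given on the slide; 0.25 below is an assumed value):

// Sketch of POP's per-entry trace aggregation: an exponentially weighted
// moving average S_i = alpha * S_new + (1 - alpha) * S_{i-1}.
public class ProfileEntry {
    private static final double ALPHA = 0.25;  // assumed smoothing factor
    private double deliveries;                 // aggregated estimate S_i

    void update(double sNew) {
        deliveries = ALPHA * sNew + (1 - ALPHA) * deliveries;
    }

    double value() { return deliveries; }

    public static void main(String[] args) {
        ProfileEntry entry = new ProfileEntry();
        for (double reply : new double[]{15, 15, 9}) {
            entry.update(reply);  // fold in each trace reply as it arrives
        }
        System.out.println(entry.value());  // smoothed downstream delivery count
    }
}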
Slide 18
Phase 2 Broker Selection
[Figure: the selection message with AdvId: P, DestId: null, and Broker List B1, B5, B6 walks downstream from the publisher's broker and selects B6]
Slide 19
Experiment Setup
Experiments on both PlanetLab and a cluster testbed.
PlanetLab: 63 brokers, 1 broker per box; 20 publishers with publication rates of 10-40 msg/min; 80 subscribers per publisher (1,600 subscribers in total); P_threshold of 50; G_threshold of 50.
Cluster testbed: 127 brokers, up to 7 brokers per box; 30 publishers with publication rates of 30-300 msg/min; 200 subscribers per publisher (6,000 subscribers in total); P_threshold of 100; G_threshold of 100.
Slide 20
Experiment Setup - Workloads
Two workloads:
Random scenario: 5% of subscribers are high-rated and sink all traffic from their publisher; 25% are medium-rated and sink ~50% of traffic; 70% are low-rated and sink ~10% of traffic. Subscribers are randomly placed on N brokers.
Enterprise scenario: 5% are high-rated and sink all traffic from their publisher; 95% are low-rated and sink ~10% of traffic. All high-rated subscribers are clustered onto one broker, and all low-rated subscribers onto the remaining N-1 brokers.
Slide 21
Average Input Utilization Ratio vs. Subscriber Distribution
[Graph] Load reduction of up to 68%.
Slide 22
Average Delivery Delay vs. Subscriber Distribution
[Graph] Delivery delay reduction of up to 68%.
Slide 23
Average Message Overhead Ratio vs. Subscriber Distribution
[Graph]
Slide 24
Conclusions
POP and GRAPE move publishers to areas of matching subscribers to:
- Reduce load in the system to increase scalability, and/or
- Reduce the average delivery delay of publication messages to improve performance
POP is suitable for pub/sub systems that strive for simplicity, such as GooPS.
GRAPE is suitable for systems that strive to minimize one metric in the extreme, such as system load in sensor networks or delivery delay in SuperMontage, or that want the flexibility to adjust the trade-off between the two based on resource usage.
Slide 26
Related Approaches
Filter-based publish/subscribe: re-organize the broker overlay to minimize delivery delay and system load. R. Baldoni et al., The Computer Journal, 2007; Migliavacca et al., DEBS 2007.
Multicast-based publish/subscribe: assign similar subscriptions to one or more clusters of servers. Suitable for static workloads; may produce false-positive publication deliveries; the architecture is fundamentally different from filter-based approaches. Riabov et al., ICDCS 2002 and 2003; Voulgaris et al., IPTPS 2006; Baldoni et al., DEBS 2007.
Slide 27
Average Broker Message Rate vs. Subscriber Distribution
[Graph]
Slide 28
Average Output Utilization Ratio vs. Subscriber Distribution
[Graph]
Slide 29
Average Delivery Delay vs. Subscriber Distribution
[Graph]
Slide 30
Average Hop Count vs. Subscriber Distribution
[Graph]
Slide 31
Average Broker Message Rate vs. Subscriber Distribution
[Graph]
Slide 32
Average Delivery Delay vs. Subscriber Distribution
[Graph]
Slide 33
Average Message Overhead Ratio vs. Time
[Graph]
Slide 34
Message Rate vs. Time
[Graph]
Slide 35
Average Delivery Delay vs. Time
[Graph]
Slide 36
Average Hop Count vs. Time
[Graph]
Slide 37
Broker Selection Time vs. Migration Hop Count
[Graph]
Slide 38
Broker Selection Time vs. Migration Hop Count
[Graph]
Slide 39
Publisher Wait Time vs. Migration Hop Count
[Graph]
Slide 40
Results Summary
Under the random workload:
- No significant performance differences between POP and GRAPE
- The prioritization metric and weight have almost no impact on GRAPE's performance
Increasing the number of publication samples in POP:
- Increases the response time
- Increases the amount of message overhead
- Increases the average broker message rate
GRAPE reduces the input utilization ratio by up to 68%, the average message rate by 84%, the average delivery delay by 68%, and the message overhead relative to POP by 91%.
Slide 41
Phase 1 Logging Publication History
Each broker records, per publisher, the publications delivered to local subscribers.
Each trace session is identified by the message ID of the first publication of that session.
The trace session ID is carried in the header of each subsequent publication message.
G_threshold publications are traced per trace session.
Slide 42
POP - Intro
Publisher Optimistic Placement
Goal: Move publishers to the area with the highest number of publication deliveries, i.e., the highest concentration of matching subscribers.
Slide 43
POP's Methodology Overview
A 3-phase algorithm:
Phase 1: Discover the location of publication deliveries by probabilistically tracing live publication messages; ongoing, with minimal network, computational, and storage overhead.
Phase 2: Pinpoint the broker closest to the set of matching subscribers using trace data from Phase 1, in a decentralized fashion.
Phase 3: Migrate the publisher to the broker decided in Phase 2, transparently and with minimal routing table update and message overhead.
Slide 44
Phase 1 Aggregated Replies
[Figure: trace replies (e.g. Reply 9, Reply 5, Reply 15) travel upstream and are folded into each broker's Publisher Profile Table]
Multiple publication traces are aggregated with an exponentially weighted moving average: S_i = α · S_new + (1 − α) · S_{i−1}
Slide 45
Phase 2 Decentralized Broker Selection Algorithm
Phase 2 starts once P_threshold publications have been traced.
Goal: Pinpoint the broker closest to the highest concentration of matching subscribers, using trace information from only a subset of brokers.
The Next Best Broker condition: the next best neighboring broker is the one whose number of downstream subscribers is greater than the sum of all other neighbors' downstream subscribers plus the local broker's subscribers.
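As a sketch, the condition can be checked against a broker's Publisher Profile Table (method and variable names are assumed; the sample values are illustrative):

// Sketch of POP's Next Best Broker condition: forward the selection walk to
// a neighbor only if its downstream subscribers outnumber all other
// neighbors' downstream subscribers plus the local broker's own.
import java.util.Map;

public class NextBestBroker {
    static String next(Map<String, Integer> downstreamSubs, int localSubs) {
        int total = downstreamSubs.values().stream().mapToInt(Integer::intValue).sum();
        for (Map.Entry<String, Integer> n : downstreamSubs.entrySet()) {
            if (n.getValue() > total - n.getValue() + localSubs) {
                return n.getKey();  // this neighbor dominates: keep walking
            }
        }
        return null;  // no neighbor dominates: the current broker is selected
    }

    public static void main(String[] args) {
        // Illustrative profile table: B6 dominates because 15 > 1 + 3.
        System.out.println(next(Map.of("B1", 1, "B6", 15), 3));  // prints B6
    }
}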
Slide 46
Phase 2 Example
[Figure: the selection message with AdvId: P, DestId: null, and Broker List B1, B5, B6 walks downstream from the publisher's broker and selects B6]
Slide 47
Phase 3 - Example
[Figure: P migrates from B1 to B6. Each broker along the migration path (1) updates the last hop of P's advertisement to point toward B6, (2) removes all subscriptions whose last hop is the next broker on the path, and (3) forwards its matching subscriptions to that next broker.]
How do we tell when all subscriptions have been processed by B6 before P can publish again? A DONE message signals that the migration is complete.
Slide 48
Phase 2 Minimizing Load with Weight P%
1. Calculate the total broker message rate if the publisher is positioned at each of the downstream brokers.
2. Normalize, sort, and drop candidates with total message rates greater than (100 - P)%.
3. Get publisher-to-broker ping times for the remaining candidates.
4. Calculate the average delivery delay if the publisher is positioned at each of the remaining downstream brokers.
5. Select the candidate that yields the lowest average delivery delay.
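This load-first ordering mirrors the delay-first sketch after Slide 15; a minimal self-contained version (names and sample numbers assumed):

// Sketch of GRAPE's load-prioritized selection at weight P (0..100):
// screen candidates on normalized message rate, then pick the fastest.
import java.util.Comparator;
import java.util.List;

public class LoadFirstSelection {
    record Candidate(String brokerId, double avgDelayMs, double msgRate) {}

    static Candidate select(List<Candidate> cs, double p) {
        double min = cs.stream().mapToDouble(Candidate::msgRate).min().orElse(0);
        double max = cs.stream().mapToDouble(Candidate::msgRate).max().orElse(1);
        return cs.stream()
                // step 2: drop candidates whose normalized rate exceeds (100 - P)
                .filter(c -> (max == min ? 0 : 100 * (c.msgRate() - min) / (max - min)) <= 100 - p)
                // steps 3-5: among the survivors, pick the lowest average delay
                .min(Comparator.comparingDouble(Candidate::avgDelayMs))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<Candidate> cs = List.of(new Candidate("B1", 268, 40),
                new Candidate("B6", 150, 55), new Candidate("B7", 120, 70));
        System.out.println(select(cs, 100).brokerId());  // P=100: pure load, picks B1
    }
}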