Slingshot: Time-Critical Multicast for Clustered Applications
Mahesh Balakrishnan, Stefan Pleisch, Ken Birman
Cornell University
The Contemporary Datacenter
Building-wide super-clusters: 1000s of commodity blade-servers
Typically used as commercial website back-ends: Amazon, etc.
Software Paradigms: SOA, Eventing, Publish/Subscribe…
… many-to-many communication, Multicast!
Multicast in the Datacenter
IP Multicast is available: adding reliability to it is a well-researched problem…
Scalability dimensions: number of receivers, number of senders?, number of groups?
Metrics: throughput, timeliness?
Time-Critical Applications
… dealing in perishable data: stock quotes, location updates
… willing to trade complete reliability for timeliness, requiring tunable reliability/timeliness/overhead tradeoffs
Probabilistic Guarantee of Timeliness? For x% overhead, y% of lost packets are recovered within time t. The remainder can optionally be recovered within time t'.
Design Space
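Reactive vs. Proactive
Reactive (loss discovery): ACKs; sender-based sequencing; gossip (scalable)
• If the multicast rate in a group is constant, the inter-multicast time at any sender goes up linearly with the number of senders, so a receiver that discovers losses from gaps in a sender's sequence numbers notices them more slowly as senders are added
Proactive: FEC (tunable)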
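To make the linear growth concrete, the sketch below computes the gap between consecutive multicasts from one sender, assuming the group rate is split evenly across all senders. The function name is hypothetical; the 1000 packets-per-second group rate is the one used in the evaluation setup.

```python
def inter_multicast_time_ms(group_rate_pps: float, num_senders: int) -> float:
    """Milliseconds between two multicasts from the same sender,
    assuming every sender gets an equal share of the group rate."""
    per_sender_rate = group_rate_pps / num_senders  # packets/second per sender
    return 1000.0 / per_sender_rate


GROUP_RATE = 1000.0  # packets per second for the whole group
for n in (1, 8, 64):
    print(n, "senders:", inter_multicast_time_ms(GROUP_RATE, n), "ms")
```

With 64 senders each one multicasts only every 64 ms, so a loss detected by a sequence gap can sit unnoticed for tens of milliseconds before any recovery request is even issued.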
Slingshot Overview
Receiver-Based FEC:
Senders send initially via unreliable IP Multicast
Phase 1: Receivers repair losses by proactively sending each other FEC repair packets
Phase 2: Remaining losses are recovered from the sender
For every r data packets it receives, each receiver sends an error correction (XOR) packet over those r packets to c randomly selected receivers
Rate-of-fire parameter (r, c): Allows tuning of overhead-timeliness tradeoff
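Because a receiver emits one XOR over every r data packets and sends it to c receivers, the repair traffic it generates is c/r repair packets per data packet received. The sketch below computes that ratio; it counts packets only (ignoring the extra ID list that repair packets carry), and the (r, c) settings shown are hypothetical, not taken from the evaluation.

```python
from fractions import Fraction


def repair_overhead(r: int, c: int) -> Fraction:
    """Repair packets sent per data packet received, for rate-of-fire (r, c)."""
    if r < 1 or c < 0:
        raise ValueError("need r >= 1 and c >= 0")
    return Fraction(c, r)  # c repair packets for every r data packets


# Hypothetical settings: raising c or lowering r buys timeliness with overhead.
for r, c in [(8, 1), (8, 2), (4, 2)]:
    print((r, c), f"{float(repair_overhead(r, c)):.1%} overhead")
```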
Protocol Details 0
Two Packet Types:
Data Packet: Packet ID (Sender, SeqNo) + Application Payload
Repair Packet: XOR of Data Packets + List of Data Packet IDs: (sender1, seqno1), (sender2, seqno2), …
(Figure: Application MTU: 1024 bytes, less than the Network MTU)
Terminology: data packets XORed into a repair packet are said to be included in it
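The two packet types can be sketched as Python dataclasses. Field names and the `make_repair` helper are hypothetical, and the sketch assumes fixed-size (equal-length) payloads so that a plain byte-wise XOR is enough; the slides do not say how variable-length payloads are handled.

```python
from dataclasses import dataclass
from typing import List, Tuple

PacketId = Tuple[str, int]  # (sender, seqno)


@dataclass(frozen=True)
class DataPacket:
    pid: PacketId   # packet ID: (sender, seqno)
    payload: bytes  # application payload


@dataclass(frozen=True)
class RepairPacket:
    xor_payload: bytes              # XOR of the included payloads
    included: Tuple[PacketId, ...]  # IDs of the included data packets


def make_repair(packets: List[DataPacket]) -> RepairPacket:
    """Build a repair packet; assumes equal-length (fixed-size) payloads."""
    size = len(packets[0].payload)
    acc = bytearray(size)
    for p in packets:
        if len(p.payload) != size:
            raise ValueError("payloads must have equal length in this sketch")
        for i, byte in enumerate(p.payload):
            acc[i] ^= byte
    return RepairPacket(bytes(acc), tuple(p.pid for p in packets))


a = DataPacket(("n1", 1), b"\x01\x02")
b = DataPacket(("n2", 7), b"\x03\x04")
rp = make_repair([a, b])  # includes both a and b
```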
Protocol Details 1
Data Structures:
Data Buffer: received data packets
Repair Bin: pointers to the last (fewer than r) data packets
Arrival of Data Packet dp at Receiver:
dp is added to the data buffer, and &dp (a pointer to dp) is added to the repair bin
If the repair bin size equals r, a repair packet rp is created from its contents, and the repair bin is cleared
rp is dispatched to c random receivers
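The data-packet path above can be sketched as a small receiver class. `Receiver`, `on_data`, `xor_all`, and the `send(target, rp)` transport hook are hypothetical names; packet IDs in the repair bin stand in for pointers, payloads are assumed to be equal-length, and choosing c distinct targets is an assumption the slides do not spell out.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

PacketId = Tuple[str, int]  # (sender, seqno)


@dataclass
class RepairPacket:
    xor_payload: bytes
    included: Tuple[PacketId, ...]  # IDs of the data packets XORed together


def xor_all(payloads: List[bytes]) -> bytes:
    """XOR payloads together; assumes they all have the same length."""
    out = bytearray(len(payloads[0]))
    for p in payloads:
        for i, byte in enumerate(p):
            out[i] ^= byte
    return bytes(out)


class Receiver:
    """Hypothetical sketch of the data-packet path of one receiver."""

    def __init__(self, r: int, c: int, peers: List[str],
                 send: Callable[[str, RepairPacket], None]) -> None:
        self.r, self.c = r, c
        self.peers = peers  # other receivers in the group
        self.send = send    # assumed transport hook
        self.data_buffer: Dict[PacketId, bytes] = {}
        self.repair_bin: List[PacketId] = []  # IDs stand in for pointers

    def on_data(self, pid: PacketId, payload: bytes) -> None:
        self.data_buffer[pid] = payload  # add dp to the data buffer
        self.repair_bin.append(pid)      # add &dp to the repair bin
        if len(self.repair_bin) == self.r:
            ids = tuple(self.repair_bin)
            rp = RepairPacket(xor_all([self.data_buffer[i] for i in ids]), ids)
            self.repair_bin.clear()
            # dispatch rp to c distinct, randomly chosen receivers
            for target in random.sample(self.peers, min(self.c, len(self.peers))):
                self.send(target, rp)


# Usage: r = 2, c = 1, so the second data packet triggers one repair packet.
sent: List[Tuple[str, RepairPacket]] = []
rx = Receiver(r=2, c=1, peers=["b", "c"], send=lambda t, rp: sent.append((t, rp)))
rx.on_data(("a", 1), b"\x01")
rx.on_data(("a", 2), b"\x03")
```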
Protocol Details 2
Arrival of Repair Packet rp at Receiver, by number of missing included data packets:
0: rp is discarded
1: the missing packet is recovered by XORing rp with the other r-1 data packets
>1: rp is stored in a special buffer, in case future data packet arrivals and recoveries make it usable
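The three cases can be sketched as follows. `on_repair` and `retry_pending` are hypothetical names, the `pending` list plays the role of the special buffer, and payloads are assumed to be equal-length. `retry_pending` re-examines stored repair packets until a pass recovers nothing new, which is one simple way to act on "future recoveries make it usable".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

PacketId = Tuple[str, int]  # (sender, seqno)


@dataclass
class RepairPacket:
    xor_payload: bytes
    included: Tuple[PacketId, ...]


def on_repair(rp: RepairPacket, data_buffer: Dict[PacketId, bytes],
              pending: List[RepairPacket]) -> Optional[PacketId]:
    """Handle one repair packet; returns the recovered packet's ID, if any."""
    missing = [pid for pid in rp.included if pid not in data_buffer]
    if not missing:          # 0 missing: nothing to recover, discard
        return None
    if len(missing) > 1:     # >1 missing: store it for later
        pending.append(rp)
        return None
    # exactly 1 missing: XOR rp with the other r-1 included data packets
    acc = bytearray(rp.xor_payload)
    for pid in rp.included:
        if pid != missing[0]:
            for i, byte in enumerate(data_buffer[pid]):
                acc[i] ^= byte
    data_buffer[missing[0]] = bytes(acc)
    return missing[0]


def retry_pending(data_buffer: Dict[PacketId, bytes],
                  pending: List[RepairPacket]) -> None:
    """Re-examine stored repair packets until no further recovery happens."""
    progress = True
    while progress:
        progress = False
        for rp in list(pending):
            pending.remove(rp)
            if on_repair(rp, data_buffer, pending) is not None:
                progress = True


# One-byte payloads: a=0x01, b=0x04 received; c=0x02, d=0x08, e=0x04 lost.
buf = {("a", 1): b"\x01", ("b", 1): b"\x04"}
pending: List[RepairPacket] = []
on_repair(RepairPacket(b"\x07", (("a", 1), ("b", 1), ("c", 1))), buf, pending)
on_repair(RepairPacket(b"\x0c", (("d", 1), ("e", 1))), buf, pending)  # stored
on_repair(RepairPacket(b"\x09", (("a", 1), ("d", 1))), buf, pending)  # recovers d
retry_pending(buf, pending)  # the stored packet now recovers e
```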
Evaluation Setup
64-node rack-style cluster at Cornell
Loss rate fixed at 1%: packets dropped at end-host buffers
All nodes send and receive
Inter-node latencies = 50-100 microseconds
Group Data Rate: 1000 packets per second
Each node multicasts about 16 packets per second (1000/64), i.e. one packet every 64 milliseconds
Slingshot Tunability
For 27% overhead, 93.5% of lost packets are recovered in an average of 3.5 milliseconds
Example Tradeoff Points between Overhead, Timeliness, and Reliability
Overhead and Recovered Packets plotted on left y-axis, Recovery Time on right
Slingshot vs SRM
Slingshot recovers 93% in 10 ms, 97% in 25 ms
Fastest SRM packet recovery is 2.2 seconds; 93% in 4.85 seconds, 97% in 5.1 seconds
2-3 Orders of Magnitude faster
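The "2-3 orders of magnitude" figure follows directly from the recovery times quoted on this slide; the sketch below just divides SRM's time by Slingshot's at each recovery level and takes log10.

```python
import math

# (fraction of lost packets recovered, Slingshot time in ms, SRM time in ms),
# copied from the numbers quoted on this slide
points = [(0.93, 10.0, 4850.0), (0.97, 25.0, 5100.0)]

speedups = {frac: srm_ms / sling_ms for frac, sling_ms, srm_ms in points}

for frac, s in speedups.items():
    print(f"{frac:.0%} recovered: {s:.0f}x faster, "
          f"{math.log10(s):.2f} orders of magnitude")
```

This gives speedups of 485x at 93% and 204x at 97%, both between two and three orders of magnitude.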
Slingshot Scalability: Group Size
Simulation Results:
(Figure: Scalability in Group Size; x-axis: group size, y-axis: fraction of lost packets recovered)
Gossip-Style Scalability: insensitive to scale beyond a certain size
Conclusion
Slingshot provides a tunable, probabilistic guarantee of timeliness
Outperforms SRM by 2 orders of magnitude in a 64-node system
Insensitive to number of senders
Future Work:
Achieve scalability in other dimensions (number of groups)
Build a time-critical middleware layer that uses Slingshot as a generic primitive