Post on 25-Dec-2015
Datacast: A Scalable and Efficient Reliable Group Data
Delivery Service for Data Centers
Jiaxin Cao, Chuanxiong Guo, Guohan Lu, Yongqiang Xiong, Yixin Zheng, Yongguang Zhang, Yibo Zhu, Chen Chen
University of Science and Technology of ChinaMicrosoft Research Asia
Tsinghua UniversityUniversity of California, Santa Barbara
University of Pennsylvania
Reliable Group Data DeliveryThe problem of RGDD is:
<0,0>
01 02 03 10 11 12 13 20 21 22 23 30 31 32 33
<1,0> <1,1> <1,2> <1,3>
<0,1> <0,2> <0,3>
00
given a data source, Src, and a set of receivers, R1, R2, …, Rn,
how to reliably transmit bulk data from Src to all the receivers?
In a data center network,
DataDataDataDataDataData
Reliable Group Data Delivery•RGDD is important in DCNs:
• Bootstrapping or OS upgrading.
•Distributed file systems, e.g., GFS.
•VM setup.
•And more...
Reliable Group Data DeliveryA good RGDD design should have the following properties:
•Scalable (large group numbers and large group sizes)
•High bandwidth efficiency
Existing solutions to RGDD
Existing solutions can be classified into two categories: • Reliable IP multicast. Not scalable, e.g., ACK implosion.
• End-host based overlays. Low bandwidth efficiency.
None of the existing systems can perfectly achieve RGDD.
New opportunities in DCN
Recently, there are two clear trends in DCN: • Multiple edge-disjoint Steiner trees for RGDD.• Practical packet caching abilities in network devices.
We can cache packet!
<0,0>
00
01 02 03
<0,1>
10
11 12 13
<0,2>
20
21 22 23
<0,3>
30
31 32 33
<1,1> <1,2> <1,3>
<1,0>
00
10 20 30
<1,1>
01
11 21 31
<1,2>
02
12 22 32
<1,3>
03
13 23 33
<0,1> <0,2> <0,3>
The architecture of Datacast
Fabric Manager
Master i
Master jSrc
R1 R2
IMD
Src
R1 R2 R3 R4
RGDD Group i1
RGDD Group i2
RGDD Group in
NetworkTopology
How to calculate multiple Steiner
trees?
How to efficiently transmit data in
each Steiner tree?
Multiple edge-disjoint Steiner trees in DCN•Our multiple Steiner trees algorithm takes three steps:
1. Use specific algorithms to construct spanning trees.
2. Prune the spanning trees.
3. Use Breath First Search(BFS) to repair the trees broken by network failures.
This algorithm is fast (O(k|V|) + O(|E|) + O(k|E|)) and efficient.
Datacast transport protocol• Datacast is built on top of Content Centric Network (CCN):
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33
Inst
Data Data
Data Inst
Data Data
InstData
Inst
Data
Data
Data
Datacast transport protocol• Datacast uses a rate based congestion control algorithm at
the source side:
Datacast transport protocol• We built a fluid model for Datacast congestion control algorithm.
• and are the current data positions of the data source and the slowest receiver. • R is the “desired” rate of the slowest receiver.
Datacast transport protocol• Theorem 1 (Cache Usage). Datacast works at the full
rate, i.e., the rate of the slowest receiver, R, if the cache size, C, satisfies
When R is 100Mbps, is 5Mbps, T is 1ms, C only needs to be greater than 125KB.
Datacast transport protocol• Theorem 2 (Duplicate Data Ratio). When Datacast
works at the full rate, the duplicate data ratio of Datacast is
When R is 100Mbps, is 5Mbps, T is 1ms, and MTU is 1.5KB, the duplicate data ratio is 1.19%.
Simulation: multiple Steiner trees algorithm• We tested our algorithm in Fattree(24,3), BCube(8, 3), Torus(16, 3)
under the link failure rates (LFR) of 1%, 3% and 5%.
Running times. Steiner tree numbers.
Simulation: Datacast congestion control•Simulation Setup: • BCube(4, 1) with 1Gbps
links. • The link from <0,0> to
02 is slowed down to 100Mbps. • , T = 1ms•MTU = 1.5KB
<0,0>
00
01 02 03
<0,1>
10
11 12 13
<0,2>
20
21 22 23
<0,3>
30
31 32 33
<1,1> <1,2> <1,3>
<1,0>
00
10 20 30
<1,1>
01
11 21 31
<1,2>
02
12 22 32
<1,3>
03
13 23 33
<0,1> <0,2> <0,3>
Steiner Tree 1. Steiner Tree 2.
Simulation: Datacast congestion control• Based on Theorem 1, Datacast needs 125KB caches to work at full
rate. • Based on Theorem 2, the duplicate data ratios is 1.19%.
Cache Size (KB) Throughput(Mbps) Duplicate Data Ratio (%)
8 91.380 1.15
32 95.076 1.14
128 98.799 1.11
512 98.799 1.10
2048 98.799 1.12
Simulation: Datacast congestion control• Compare with BitTorrent.
Fattree. BCube. Torus.
Experiment: Datacast congestion control• We build Datacast with ServerSwitch platform, and evaluate it in a
BCube(4, 1). • = 5Mbps, T = 1ms, MTU = 8KB.
• Based on Theorem 1, Datacast works at the full rate when C > 256KB. • When C = 64KB, the average throughput is 91.998Mbps. • When C = 256KB, the average throughput is 99.595Mbps.
• Based on Theorem 2, the duplicate data ratio should be 3.48%. • When Datacast works at the full rate, the measured duplicate data ratio is
3.10%.
Experiment: Datacast congestion control• We compare Datacast with BitTorrent. We use both of them to
transmit 4GB data.
Finish time (s) Link stress
Datacast 16.9 1.01
BitTorrent 52 1.39
Related work
•Reliable IP multicast• Pgm congestion control (pgmcc)• Active Reliable Multicast (ARM)
•End-host based overlays• SplitStream• End System Multicast• Cornet
Conclusion
•In this paper, we propose Datacast which
• Calculates multiple edge-disjoint Steiner trees in DCNs
• Uses CCN to turn hard group states to soft packet caching
• Uses a simple rate-based AIMD congestion control algorithm to achieve high efficiency
Datacast is scalable and achieves high bandwidth efficiency
Thank you!