BASIC Regenerating Codes for Distributed Storage Systems
Kenneth Shum (joint work with Minghua Chen, Hanxu Hou and Hui Li)
Windows Azure data centers
Aug 2013 kshum 2
Inside a data center
http://technoblimp.com
Data distribution
• Encode and distribute a data file to n storage nodes.
Data file: “INC”
Data collector
• Data collector can retrieve the whole file by downloading from any k storage nodes.
“INC”
Three kinds of disk failures
• Transient error due to noise corruption
– repeat the disk access request
• Disk sector error
– partial failure
– detected and masked by the operating system
• Catastrophic error
– total failure, due to the disk controller for instance
– the whole disk is regarded as erased
Frequency of node failures
Figure from “XORing elephants: novel erasure codes for Big Data” by Sathiamoorthy et al.
Number of failed nodes over a single month in a 3000-node production cluster of Facebook.
Outline of this talk
• Repetition scheme
• Traditional erasure-correcting codes
– Reed-Solomon codes
• Network-coding-based scheme
– BASIC regenerating codes
Distributed storage system
• Encode a data file and distribute it to n disks
• (n,k) recovery property
– The data file can be rebuilt from any k disks.
• Repair
– If a node fails, we regenerate a new node by connecting to any d surviving disks and downloading data from them.
– Aim at minimizing the repair bandwidth (Dimakis et al., 2007).
• A coding scheme with the above properties is called a regenerating code.
Repetition scheme
• GFS: Replicate data 3 times
• Gmail: Replicate data 21 times
2x Repetition scheme
• Divide the data file into 2 parts, A and B
• [Figure: four storage nodes holding A, B, A and B, 1G each; the data collector recovers A, B.]
• Cannot tolerate double disk failures
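The reliability of the 2x repetition example can be checked by brute force. The sketch below assumes the layout read off the slide (four nodes holding A, B, A and B) and lists the failure patterns that leave some part with no surviving copy. The names `nodes` and `survives` are illustrative, not from the talk.

```python
from itertools import combinations

# Assumed layout: node 0 and node 2 hold part A, node 1 and node 3 hold part B.
nodes = ["A", "B", "A", "B"]

def survives(failed):
    """Return True if both parts still exist on some surviving node."""
    remaining = {part for i, part in enumerate(nodes) if i not in failed}
    return remaining == {"A", "B"}

# Failure patterns after which the file can no longer be rebuilt.
fatal_single = [f for f in combinations(range(4), 1) if not survives(f)]
fatal_double = [f for f in combinations(range(4), 2) if not survives(f)]

print("fatal single failures:", fatal_single)
print("fatal double failures:", fatal_double)
```

Under this layout every single failure is survivable, while losing both copies of the same part is fatal.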
Repair is easy for a repetition-based system
• [Figure: nodes holding A, B, A and B; the new node copies A (1G) from a surviving node.]
• Repair bandwidth = 1G
Reed-Solomon Code
• Divide the file into 2 parts, A and B
• [Figure: four storage nodes holding A, B, A+B and A+2B; the data collector recovers A, B.]
• It can tolerate double disk failures
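The "any 2 of 4 nodes" property of the A, B, A+B, A+2B example can be illustrated with a small sketch. It assumes arithmetic modulo the prime 257, chosen here only so that the coefficient 2 is meaningful and every 2x2 system is invertible; practical Reed-Solomon implementations usually work over a field such as GF(2^8) instead. The names `P`, `COEFFS`, `encode` and `decode` are hypothetical.

```python
from itertools import combinations

P = 257  # prime modulus (an assumption made for this illustration)

# Node i stores c0*A + c1*B (mod P); the rows give A, B, A+B, A+2B.
COEFFS = [(1, 0), (0, 1), (1, 1), (1, 2)]

def encode(a, b):
    """Return the content of the four nodes for parts a and b."""
    return [[(c0 * x + c1 * y) % P for x, y in zip(a, b)] for c0, c1 in COEFFS]

def decode(i, j, yi, yj):
    """Rebuild (a, b) from nodes i and j by inverting a 2x2 system mod P."""
    (p, q), (r, s) = COEFFS[i], COEFFS[j]
    det_inv = pow((p * s - q * r) % P, -1, P)  # modular inverse, Python 3.8+
    a = [((s * u - q * v) * det_inv) % P for u, v in zip(yi, yj)]
    b = [((p * v - r * u) * det_inv) % P for u, v in zip(yi, yj)]
    return a, b

a, b = [73, 78], [67, 0]  # toy file split into two parts
stored = encode(a, b)
ok = all(decode(i, j, stored[i], stored[j]) == (a, b)
         for i, j in combinations(range(4), 2))
print(ok)
```

Since any two nodes suffice, any two of the four nodes may fail without losing the file.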
Repair requires essentially decoding the whole file
• [Figure: nodes holding A, B, A+B and A+2B; the new node rebuilds A by downloading 1G from each of two surviving nodes.]
• Repair bandwidth = 2G
BASIC regenerating code
• BASIC: Binary Addition Shift Implementable Convolutional
• Divide the data file into 4 parts of 0.5G each
• Utilization of bit-wise shift in storage was proposed by Piret and Krol (1983), and Qureshi, Foh and Cai (2012).
Download from nodes 1 and 2
Download from nodes 1 and 3
Download from nodes 1 and 4
Download from nodes 2 and 3
Download from nodes 2 and 4
Download from nodes 3 and 4
[Figures, one per pair of nodes: the data collector downloads 1G from each of the two nodes to recover the four 0.5G parts.]
Zigzag decoding
à la Gollakota and Katabi (2008)
• [Figure: P1 combined with P2’, and P1 combined with P2.]
• Want to solve for P1 and P2.
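The zigzag idea can be sketched in a few lines. The example assumes two equal-length packets stored as Python integers and two coded packets: P1 XOR P2, and P1 XOR (P2 shifted by one bit). This is an illustrative setup, not the exact construction shown on the slides, and the function names are hypothetical. Because of the shift, the lowest bit of the second coded packet is a clean bit of P1, and each recovered bit then unlocks the next one.

```python
def shift_xor_encode(p1, p2):
    """Return two coded packets: p1 ^ p2 and p1 ^ (p2 shifted by one bit)."""
    return p1 ^ p2, p1 ^ (p2 << 1)

def zigzag_decode(c1, c2, length):
    """Recover (p1, p2), each `length` bits long, one bit at a time."""
    p1 = p2 = 0
    prev_p2_bit = 0  # bit of p2 that overlaps the current bit of p1 in c2
    for i in range(length):
        b1 = ((c2 >> i) & 1) ^ prev_p2_bit  # strip the known p2 bit from c2
        b2 = ((c1 >> i) & 1) ^ b1           # strip the known p1 bit from c1
        p1 |= b1 << i
        p2 |= b2 << i
        prev_p2_bit = b2
    return p1, p2

c1, c2 = shift_xor_encode(0b1011, 0b0110)
recovered = zigzag_decode(c1, c2, 4)
print(recovered == (0b1011, 0b0110))
```

Only XOR and bit shifts are used, so no finite field arithmetic is needed.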
Repair of BASIC regenerating code
• [Figure: the new node receives packets formed by XOR, by bitwise shift and XOR, and by bitwise shift and XOR.]
• Repair bandwidth = 1.5G
Repair of BASIC regenerating code (continued)
• Decode the blue and red packets by zigzag decoding
• Interference alignment
Comparison of the three examples
                           Repetition scheme           Reed-Solomon codes           BASIC regenerating codes
Storage efficiency         1/2                         1/2                          1/2
Reliability                Tolerate one disk failure   Tolerate two disk failures   Tolerate two disk failures
Repair bandwidth           1G                          2G                           1.5G
Computational complexity   Very small                  Finite field arithmetic      Binary addition and bit-wise shift
Summary
• We can reduce repair bandwidth by network coding.
• BASIC regenerating codes
– A failed storage node can be repaired by simple bit-wise shift and XOR operations.
– Small storage overhead due to shifting.
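To see why the overhead from shifting is small, consider a coded packet formed as P1 XOR (P2 shifted by s bits): it can be at most s bits longer than the original packets. The sketch below uses toy 8-bit packets and a shift of 3, both chosen only for illustration. The shift amounts actually used by BASIC codes are not given in this transcript, and reading "0.5G" as 2^29 bytes is likewise an assumption.

```python
def shift_xor(p1, p2, shift):
    """XOR packet p1 with packet p2 shifted left by `shift` bit positions."""
    return p1 ^ (p2 << shift)

def relative_overhead(packet_bits, shift):
    """Extra storage per coded packet, as a fraction of the packet size."""
    return shift / packet_bits

PACKET_BITS = 8
p1, p2, shift = 0b10110011, 0b11010010, 3
coded = shift_xor(p1, p2, shift)

# The coded packet needs `shift` more bits than an uncoded packet.
extra_bits = coded.bit_length() - PACKET_BITS
print(extra_bits)

# For a 0.5G packet (assumed to be 2**32 bits) the same shift is negligible.
print(relative_overhead(2**32, shift))
```

For a fixed shift, the extra storage is a constant number of bits, so its relative cost shrinks as the packets grow.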
References
• Piret and Krol, MDS convolutional codes, IEEE Trans. on Information Theory, 1983.
• Dimakis, Godfrey, Wainwright and Ramchandran, Network coding for distributed storage systems, INFOCOM, 2007.
• Gollakota and Katabi, Zigzag decoding: combating hidden terminals in wireless networks, Proc. ACM SIGCOMM, 2008.
• Qureshi, Foh and Cai, Optimal solution for the index coding problem using network coding over GF(2), Proc. IEEE Conf. on Sensor, Mesh and Ad Hoc Comm. and Networks, 2012.
• Sung and Gong, A zigzag decodable code with MDS property for distributed storage systems, Proc. IEEE Symp. on Information Theory, 2013.
• Hou, Shum, Chen and Li, BASIC regenerating code: binary addition and shift for exact repair, Proc. IEEE Symp. on Information Theory, 2013.
Two modes of repair
• Exact repair
– The content of the new node is exactly the same as the content of the failed node.
• Functional repair
– Only requires that the (n,k) recovery property is preserved.