BASIC Regenerating Codes for Distributed Storage Systems

Kenneth Shum (joint work with Minghua Chen, Hanxu Hou and Hui Li)

Windows Azure data centers

Aug 2013 kshum 2

Inside a data center

http://technoblimp.com/

Data distribution

• Encode and distribute a data file to n storage nodes.

Data file: “INC”

Data collector

• Data collector can retrieve the whole file by downloading from any k storage nodes.

“INC”

Three kinds of disk failures

• Transient error due to noise corruption
– repeat the disk access request

• Disk sector error
– partial failure
– detected and masked by the operating system

• Catastrophic error
– total failure, due to the disk controller for instance
– the whole disk is regarded as erased

Frequency of node failures

Figure from “XORing elephants: novel erasure codes for Big Data” by Sathiamoorthy et al.

Number of failed nodes over a single month in a 3000 node production cluster of Facebook.

Outline of this talk

• Repetition scheme

• Traditional erasure-correcting codes
– Reed-Solomon codes

• Network-coding-based scheme
– BASIC regenerating codes

Distributed storage system

• Encode a data file and distribute it to n disks.

• (n,k) recovery property
– The data file can be rebuilt from any k disks.

• Repair
– If a node fails, we regenerate a new node by connecting to any d surviving disks and downloading data from them.
– Aim at minimizing the repair bandwidth (Dimakis et al., 2007).

• A coding scheme with the above properties is called a regenerating code.

Repetition scheme

• GFS: Replicate data 3 times

• Gmail: Replicate data 21 times

2x Repetition scheme

• Divide the data file into 2 parts, A and B.
• Cannot tolerate double disk failures.

[Figure: four storage nodes store A, B, A, B (1G each); the data collector recovers A, B.]
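The claim that 2x repetition cannot tolerate double disk failures can be checked by enumerating the failure patterns. This is a minimal sketch, assuming the four nodes hold A, B, A, B in that order; the function name `fatal_failure_sets` is illustrative and not from the talk.

```python
from itertools import combinations

# Node contents in the 2x repetition example (assumed order: A, B, A, B).
NODES = ["A", "B", "A", "B"]

def fatal_failure_sets(nodes, num_failures):
    """Return the tuples of failed node indices after which some part is lost."""
    parts = set(nodes)
    fatal = []
    for failed in combinations(range(len(nodes)), num_failures):
        surviving = {nodes[i] for i in range(len(nodes)) if i not in failed}
        if surviving != parts:
            fatal.append(failed)
    return fatal

print(fatal_failure_sets(NODES, 1))  # single failures that lose data
print(fatal_failure_sets(NODES, 2))  # double failures that lose data
```

Under this layout no single failure is fatal, but a double failure is fatal whenever it hits both copies of the same part.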

Repair is easy for a repetition-based system

[Figure: nodes store A, B, A, B; the new node rebuilds A by downloading 1G.]

Repair bandwidth = 1G

Reed-Solomon Code

• Divide the file into 2 parts, A and B.
• It can tolerate double disk failures.

[Figure: four storage nodes store A, B, A+B and A+2B; the data collector recovers A, B.]
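The double-failure tolerance of the A, B, A+B, A+2B layout can be illustrated with a small sketch. The slides do not say which finite field is used, so this example assumes arithmetic modulo the prime 257 (a binary field would not work as written, since 2 = 0 there). The names `ROWS`, `encode` and `decode` are illustrative.

```python
from itertools import combinations

P = 257  # assumed prime modulus; the field is not specified in the talk

# Coefficient pair (a, b) means the node stores a*A + b*B (mod P).
ROWS = [(1, 0), (0, 1), (1, 1), (1, 2)]

def encode(A, B):
    """Return the contents of the four nodes for data parts A and B."""
    return [[(a * x + b * y) % P for x, y in zip(A, B)] for a, b in ROWS]

def decode(i, j, yi, yj):
    """Recover (A, B) from nodes i and j by solving a 2x2 system mod P."""
    (a, b), (c, d) = ROWS[i], ROWS[j]
    det_inv = pow((a * d - b * c) % P, -1, P)  # modular inverse, Python 3.8+
    A = [((d * u - b * v) * det_inv) % P for u, v in zip(yi, yj)]
    B = [((a * v - c * u) * det_inv) % P for u, v in zip(yi, yj)]
    return A, B

A, B = [72, 105, 3], [33, 200, 256]
nodes = encode(A, B)
# Any 2 of the 4 nodes suffice, so any 2 failures are tolerated.
ok = all(decode(i, j, nodes[i], nodes[j]) == (A, B)
         for i, j in combinations(range(4), 2))
print(ok)
```

Every pair of coefficient rows has a nonzero determinant modulo 257, which is what makes each pair of nodes decodable.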

Repair requires essentially decoding the whole file

[Figure: nodes store A, B, A+B and A+2B; the new node rebuilds A by downloading 1G from each of two surviving nodes.]

Repair bandwidth = 2G
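The repair cost can be made concrete with a short sketch. It assumes the lost node held A, that the two helpers are the nodes holding A+B and A+2B, and that arithmetic is modulo the prime 257; none of these choices is stated on the slide.

```python
P = 257  # assumed prime modulus; the field is not specified in the talk

def repair_node_a(a_plus_b, a_plus_2b):
    """Rebuild A from the two parity nodes: A = 2(A+B) - (A+2B) mod P."""
    return [(2 * u - v) % P for u, v in zip(a_plus_b, a_plus_2b)]

A = [10, 200, 33]
B = [7, 99, 256]
node3 = [(x + y) % P for x, y in zip(A, B)]      # stores A+B
node4 = [(x + 2 * y) % P for x, y in zip(A, B)]  # stores A+2B

rebuilt = repair_node_a(node3, node4)
downloaded = len(node3) + len(node4)  # symbols fetched for the repair
print(rebuilt == A, downloaded, len(A))
```

The new node fetches two full node contents to rebuild one, which is the 2G repair bandwidth for 1G of lost data.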

BASIC regenerating code

• BASIC = Binary Addition and Shift Implementable Convolutional
• Divide the data file into 4 parts of 0.5G each.
• Utilization of bit-wise shift in storage was proposed by Piret and Krol (1983), and by Qureshi, Foh and Cai (2012).

Download from nodes 1 and 2

[Figure, shown over six animation frames: the data collector downloads 1G from each of nodes 1 and 2 and recovers the four 0.5G parts.]

Zigzag decoding

à la Gollakota and Katabi (2008)

[Figure: two combinations of packets, one of P1 and P2 and one of P1 and P2’.]

Want to solve for P1 and P2.
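The idea of zigzag decoding can be sketched on bit lists. The exact combinations used in the talk are in the lost figure, so this example assumes one particular arrangement: the decoder holds y1 = P1 XOR P2 and y2 = P1 XOR (P2 delayed by one bit). The function names are illustrative.

```python
def encode(p1, p2):
    """Form y1 = P1 xor P2 and y2 = P1 xor (P2 delayed by one bit)."""
    n = len(p1)
    y1 = [a ^ b for a, b in zip(p1, p2)]
    # y2 is one bit longer than the packets because of the shift.
    y2 = [(p1[i] if i < n else 0) ^ (p2[i - 1] if i >= 1 else 0)
          for i in range(n + 1)]
    return y1, y2

def zigzag_decode(y1, y2):
    """Recover P1 and P2 bit by bit, alternating between y2 and y1."""
    p1, p2 = [], []
    prev_p2 = 0  # the delayed P2 contributes nothing at position 0
    for i in range(len(y1)):
        a = y2[i] ^ prev_p2  # bit i of P1: cancel the known bit of P2
        b = y1[i] ^ a        # bit i of P2: cancel the bit of P1 just found
        p1.append(a)
        p2.append(b)
        prev_p2 = b
    return p1, p2

p1 = [1, 0, 1, 1]
p2 = [0, 1, 1, 0]
y1, y2 = encode(p1, p2)
print(zigzag_decode(y1, y2) == (p1, p2), len(y2) - len(p1))
```

The first bit of y2 exposes a bit of P1 on its own, and each recovered bit then unlocks the next one in the other combination. The shifted combination is one bit longer than the packets, which matches the small storage overhead due to shifting mentioned in the summary.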

Repair of BASIC regenerating code

[Figure: repair of a new node; labels: XOR, bitwise shift and XOR (twice).]

Repair bandwidth = 1.5G

Repair of BASIC regenerating code

• Decode the blue and red packets by zigzag decoding.
• Interference alignment

Comparison of the three examples

 | Repetition scheme | Reed-Solomon codes | BASIC regenerating codes
Storage efficiency | 1/2 | 1/2 | 1/2
Reliability | Tolerate one disk failure | Tolerate two disk failures | Tolerate two disk failures
Repair bandwidth | 1G | 2G | 1.5G
Computational complexity | Very small | Finite field arithmetic | Binary addition and bit-wise shift
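The figures in the comparison follow from simple arithmetic, sketched here. It assumes a 2G file stored on 4 nodes of 1G each, as the examples suggest, and it ignores the small shifting overhead of the BASIC code; the variable names are illustrative.

```python
from fractions import Fraction

FILE_SIZE_G = 2  # assumed: two 1G parts (or four 0.5G parts)
NODE_SIZE_G = 1  # each node stores 1G
NUM_NODES = 4

# Repair bandwidth figures quoted in the comparison, in G.
REPAIR_G = {
    "repetition": Fraction(1),
    "Reed-Solomon": Fraction(2),
    "BASIC": Fraction(3, 2),
}

# Fraction of the total stored data that is original file content.
storage_efficiency = Fraction(FILE_SIZE_G, NUM_NODES * NODE_SIZE_G)

# Repair bandwidth as a fraction of the whole file size.
relative_repair = {name: bw / FILE_SIZE_G for name, bw in REPAIR_G.items()}

print(storage_efficiency)
for name, ratio in relative_repair.items():
    print(name, ratio)
```

Under these assumptions, Reed-Solomon repair downloads the equivalent of the whole file, while the BASIC example downloads three quarters of it.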

Summary

• We can reduce repair bandwidth by network coding.

• BASIC regenerating codes
– A failed storage node can be repaired by simple bit-wise shift and XOR operations.
– Small storage overhead due to shifting.

References

• Piret and Krol, MDS convolutional codes, IEEE Trans. on Information Theory, 1983.

• Dimakis, Godfrey, Wainwright and Ramchandran, Network coding for distributed storage systems, INFOCOM, 2007.

• Gollakota and Katabi, Zigzag decoding: combating hidden terminals in wireless networks, Proc. ACM SIGCOMM, 2008.

• Qureshi, Foh and Cai, Optimal solution for the index coding problem using network coding over GF(2), Proc. IEEE Conf. on Sensor, Mesh and Ad Hoc Communications and Networks, 2012.

• Sung and Gong, A zigzag decodable code with MDS property for distributed storage systems, Proc. IEEE Symp. on Information Theory, 2013.

• Hou, Shum, Chen and Li, BASIC regenerating code: binary addition and shift for exact repair, Proc. IEEE Symp. on Information Theory, 2013.

Two modes of repair

• Exact repair
– The content of the new node is exactly the same as the content of the failed node.

• Functional repair
– Only requires that the (n,k) recovery property is preserved.
