Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong...
![Page 1: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/1.jpg)
Cooperative regenerating codes for distributed storage systems
Kenneth Shum (Joint work with Yuchong Hu)
22nd July 2011
![Page 2: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/2.jpg)
Multiple node failures
• Large-scale storage system
– Google data center, example from Kannan’s talk.
– 800000 servers, fail rate = 4% per year
– Repair in 2 days
– Mean number of failed servers in 2 days = 175.
• The lazy-repair policy in TotalRecall
– A repair process is triggered only after the number of failed nodes has reached a certain threshold.
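The figure of 175 failed servers can be reproduced from the numbers quoted above. The sketch below assumes failures are spread uniformly over a 365-day year, which the slide does not state.

```python
# Expected number of server failures during one repair window,
# using the figures quoted on the slide.
servers = 800_000
annual_failure_rate = 0.04
repair_window_days = 2
days_per_year = 365  # assumption: failures spread uniformly over the year

expected_failures = servers * annual_failure_rate * repair_window_days / days_per_year
print(round(expected_failures))
```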
Jul, 2011 2kshum
![Page 3: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/3.jpg)
Jointly repair multiple failures
Hu et al. (JSAC, Feb 2010)
[Figure: storage nodes send data to the newcomers; the newcomers also exchange data among themselves.]
Can we further reduce the repair-bandwidth?
![Page 4: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/4.jpg)
Distributed storage (erasure coding)
[Figure: four storage nodes and a data collector (Wu, Dimakis, ISIT09).]
Node 1: A1, A2
Node 2: B1, B2
Node 3: A1+B1, 2A2+B2
Node 4: 2A1+B1, A2+B2
Data Collector: A1, A2, B1, B2
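A quick way to see that a data collector can rebuild A1, A2, B1, B2 from any two of the four nodes is to check that the packets of every pair of nodes have rank 4. The sketch below is illustrative only: the slides do not name the finite field, so the prime field GF(5) and the helper `rank_mod_p` are assumptions.

```python
from itertools import combinations

P = 5  # assumed field size; the slides do not name the field

# Coefficient vectors over (A1, A2, B1, B2) for the packets held by each node.
NODES = {
    1: [(1, 0, 0, 0), (0, 1, 0, 0)],   # A1, A2
    2: [(0, 0, 1, 0), (0, 0, 0, 1)],   # B1, B2
    3: [(1, 0, 1, 0), (0, 2, 0, 1)],   # A1+B1, 2A2+B2
    4: [(2, 0, 1, 0), (0, 1, 0, 1)],   # 2A1+B1, A2+B2
}

def rank_mod_p(rows, p=P):
    """Rank of a matrix over GF(p), computed by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                factor = rows[i][col]
                rows[i] = [(x - factor * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# Every pair of nodes should give four linearly independent packets.
recoverable = {pair: rank_mod_p(NODES[pair[0]] + NODES[pair[1]]) == 4
               for pair in combinations(NODES, 2)}
print(all(recoverable.values()))
```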
![Page 5: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/5.jpg)
Naive Repair
[Figure: the same four storage nodes; the node storing A1, A2 has failed. The newcomer downloads B1, B2 from one surviving node and two coded packets from another, then rebuilds A1, A2.]
4 packets required.
![Page 6: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/6.jpg)
Repair with “code alignment”
[Figure: the same four storage nodes; the node storing A1, A2 has failed. Each surviving node sends one packet to the newcomer.]
B1+B2
A1+2A2+B1+B2
2A1+A2+B1+B2
3 packets required.
Solve: P1 = A1+2A2
P2 = 2A1+A2
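The three-packet repair can be traced step by step. Each surviving node sends the sum of its two packets, so the B-terms align to the same combination B1+B2 and cancel, leaving the two equations P1 and P2. This is a sketch under the assumption that arithmetic is over the prime field GF(5); the slides do not name the field, and the function name is hypothetical.

```python
from itertools import product

P = 5  # assumed prime field; in characteristic 3 the final 2x2 system is singular

def repair_first_node(a1, a2, b1, b2, p=P):
    """Rebuild (A1, A2) from one packet sent by each of the three surviving nodes."""
    from_b_node = (b1 + b2) % p                      # B1 + B2
    from_coded_1 = ((a1 + b1) + (2 * a2 + b2)) % p   # A1 + 2A2 + B1 + B2
    from_coded_2 = ((2 * a1 + b1) + (a2 + b2)) % p   # 2A1 + A2 + B1 + B2
    # Cancel the aligned interference B1 + B2.
    p1 = (from_coded_1 - from_b_node) % p            # P1 = A1 + 2A2
    p2 = (from_coded_2 - from_b_node) % p            # P2 = 2A1 + A2
    # Cramer's rule on [[1, 2], [2, 1]]; the determinant is -3.
    det_inv = pow(-3 % p, -1, p)
    a1_rec = ((p1 - 2 * p2) * det_inv) % p
    a2_rec = ((p2 - 2 * p1) * det_inv) % p
    return a1_rec, a2_rec

# Exhaustive check over every possible file content in GF(5).
repair_ok = all(repair_first_node(a1, a2, b1, b2) == (a1, a2)
                for a1, a2, b1, b2 in product(range(P), repeat=4))
print(repair_ok)
```

The determinant of the last system is -3, so this particular repair step needs a field whose characteristic is not 3.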
![Page 7: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/7.jpg)
Multiple failures, separate repair
[Figure: the same four storage nodes; the nodes storing B1, B2 and 2A1+B1, A2+B2 have failed. Each newcomer downloads 2 packets from each of the two surviving nodes.]
8 packets in total, 4 packets per newcomer
![Page 8: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/8.jpg)
Multiple failures, cooperative repair (I)
[Figure: the same four storage nodes with two failed nodes; edge labels include A1, A2, B1, B2 and coded packets.]
6 packets in total, 3 packets per newcomer
![Page 9: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/9.jpg)
Multiple failures, cooperative repair (II)
[Figure: the same four storage nodes; the nodes storing B1, B2 and 2A1+B1, A2+B2 have failed. Packets on the edges: A1, A1+B1, A2, 2A2+B2 from the surviving nodes; 2A1+B1 and B2 between the newcomers.]
6 packets in total, 3 packets per newcomer
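One repair schedule consistent with the packet labels on this slide is sketched below. It is a reconstruction, not necessarily the exact schedule drawn in the figure. Newcomer X replaces the node storing B1, B2 and newcomer Y replaces the node storing 2A1+B1, A2+B2. The field GF(5) and all names are assumptions.

```python
from itertools import product

P = 5  # assumed prime field; the slides do not name one

def cooperative_repair(a1, a2, b1, b2, p=P):
    """Rebuild the nodes storing (B1, B2) and (2A1+B1, A2+B2); count packets sent."""
    packets = 0
    # Phase 1: each newcomer downloads one packet from each surviving node.
    x_a1, x_sum = a1 % p, (a1 + b1) % p        # newcomer X gets A1 and A1+B1
    y_a2, y_sum = a2 % p, (2 * a2 + b2) % p    # newcomer Y gets A2 and 2A2+B2
    packets += 4
    x_b1 = (x_sum - x_a1) % p                  # X recovers B1
    y_b2 = (y_sum - 2 * y_a2) % p              # Y recovers B2
    # Phase 2: the newcomers exchange one packet each.
    to_y = (2 * x_a1 + x_b1) % p               # X sends 2A1+B1 to Y
    to_x = y_b2                                # Y sends B2 to X
    packets += 2
    node_x = (x_b1, to_x)                      # (B1, B2)
    node_y = (to_y, (y_a2 + y_b2) % p)         # (2A1+B1, A2+B2)
    return node_x, node_y, packets

def expected(a1, a2, b1, b2, p=P):
    """Contents the two lost nodes held before the failure, plus the packet count."""
    return ((b1 % p, b2 % p), ((2 * a1 + b1) % p, (a2 + b2) % p), 6)

all_ok = all(cooperative_repair(*v) == expected(*v)
             for v in product(range(P), repeat=4))
print(all_ok)
```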
![Page 10: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/10.jpg)
Outline of the talk
• Is it optimal in terms of repair-bandwidth?
• What is the tradeoff between storage and repair-bandwidth for cooperative repair?
• Can we achieve the Pareto-optimal operating points on the tradeoff curve by linear network coding?
– Exact repair
– Functional repair
![Page 11: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/11.jpg)
Information flow graph
[Figure: information flow graph with source S, storage nodes drawn as In/Out pairs (In1/Out1 to In5/Out5), two newcomers drawn as In6, Mid6, Out6 and In7, Mid7, Out7, and a data collector; edges are labelled 1 and 2.]
![Page 12: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/12.jpg)
Is this regenerating code optimal?
[Figure: the cooperative repair (II) example repeated.]
6 packets in total, 3 packets per newcomer
![Page 13: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/13.jpg)
First cut
[Figure: information flow graph with a cut between the source and the data collector; the cut crosses the four download edges of the two newcomers.]
$B \le 4\beta_1$
![Page 14: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/14.jpg)
Second cut
[Figure: information flow graph with the second cut between the source and the data collector.]
$B \le 2 + \beta_1 + \beta_2$
![Page 15: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/15.jpg)
A linear programming problem
• Minimize $2\beta_1 + \beta_2$ (repair bandwidth)
• Subject to
$4 \le 4\beta_1$
$4 \le 2 + \beta_1 + \beta_2$
$\beta_1, \beta_2 \ge 0$
At least 3 packets
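The bound of 3 packets can be checked by solving this small linear program directly. The sketch reads the two decision variables as β1 (packets downloaded from each helper) and β2 (packets exchanged between the two newcomers); that reading is an assumption, since the Greek symbols are missing from the transcript. It enumerates the vertices of the feasible region with exact fractions rather than calling an LP solver.

```python
from fractions import Fraction
from itertools import combinations

# Constraints written as a*beta1 + b*beta2 >= c.
CONSTRAINTS = [
    (4, 0, 4),  # first cut:  4 <= 4*beta1
    (1, 1, 2),  # second cut: 4 <= 2 + beta1 + beta2
    (1, 0, 0),  # beta1 >= 0
    (0, 1, 0),  # beta2 >= 0
]

def objective(beta1, beta2):
    """Repair bandwidth per newcomer: two downloads plus one exchange."""
    return 2 * beta1 + beta2

def feasible(beta1, beta2):
    return all(a * beta1 + b * beta2 >= c for a, b, c in CONSTRAINTS)

def vertices():
    """Feasible intersections of pairs of constraint boundaries."""
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never intersect
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if feasible(x, y):
            yield x, y

# The objective is bounded below on this region, so a vertex attains the minimum.
best = min(vertices(), key=lambda v: objective(*v))
print(best, objective(*best))
```

The minimum is attained at β1 = β2 = 1, which is the operating point of the 6-packet cooperative repair example.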
![Page 16: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/16.jpg)
Non-homogeneous download traffic
[Figure: information flow graph for the first cut, with the four download edges carrying a, b, c and d packets.]
$B \le a + b + c + d$
![Page 17: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/17.jpg)
Non-homogeneous traffic
[Figure: information flow graph for the second cut; the download edges are labelled e, f, g, h and the exchange edges i, j.]
$B \le 2 + f + j$
![Page 18: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/18.jpg)
Non-homogeneous traffic
[Figure: information flow graph for the second cut; the download edges are labelled e, f, g, h and the exchange edges i, j.]
$B \le 2 + f + j$
$B \le 2 + h + i$
![Page 19: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/19.jpg)
Non-homogeneous traffic
[Figure: information flow graph for the second cut; the download edges are labelled e, f, g, h and the exchange edges i, j.]
$B \le 2 + f + j$
$B \le 2 + h + i$
$B \le 2 + e + j$
![Page 20: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/20.jpg)
Non-homogeneous traffic
[Figure: information flow graph for the second cut; the download edges are labelled e, f, g, h and the exchange edges i, j.]
$B \le 2 + f + j$
$B \le 2 + h + i$
$B \le 2 + e + j$
$B \le 2 + g + i$
![Page 21: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/21.jpg)
The same LP problem
• Minimize
• Subject to
At least 3 packets
![Page 22: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/22.jpg)
TRADEOFF BETWEEN STORAGE AND REPAIR-BANDWIDTH
![Page 23: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/23.jpg)
Storage vs Repair-bandwidth
[Figure: storage per node versus repair bandwidth per failed node. Curves: one-by-one repair; repairing 3 newcomers jointly.]
File size = 420, d = 8, k = 4
(S., ICC 2011; Kermarrec, Le Scouarnec and Straub, Netcod 2011.)
![Page 24: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/24.jpg)
Fair comparison?
[Figure: surviving nodes connected to the newcomers under each scheme.]
One-by-one repair: repair degree = 8, number of connections per newcomer = 8
Cooperative repair: number of connections per newcomer = 8+2
![Page 25: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/25.jpg)
MBCR and MSCR
[Figure: storage per node versus repair bandwidth per failed node. Curves: one-by-one repair; cooperative repair. Marked points: minimum bandwidth cooperative repair (MBCR) and minimum storage cooperative repair (MSCR).]
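The two end points of the cooperative tradeoff curve have closed forms in the cooperative regenerating code literature. The slides do not print them, so the formulas below are quoted from memory and should be treated as assumptions. Here B is the file size, k the number of nodes a data collector contacts, d the number of helpers per newcomer and r the number of newcomers repaired jointly. As a sanity check, they reproduce the per-node storage and repair bandwidth of the two small examples used later in the talk.

```python
from fractions import Fraction

def mscr_point(B, k, d, r):
    """(storage per node, repair bandwidth per newcomer) at minimum storage.

    Assumed closed form: alpha = B/k, gamma = B(d+r-1) / (k(d+r-k)).
    """
    alpha = Fraction(B, k)
    gamma = Fraction(B * (d + r - 1), k * (d + r - k))
    return alpha, gamma

def mbcr_point(B, k, d, r):
    """(storage per node, repair bandwidth per newcomer) at minimum bandwidth.

    Assumed closed form: alpha = gamma = B(2d+r-1) / (k(2d+r-k)).
    """
    value = Fraction(B * (2 * d + r - 1), k * (2 * d + r - k))
    return value, value

# Parameters of the two explicit constructions later in the deck.
print(mbcr_point(B=8, k=2, d=2, r=2))
print(mscr_point(B=6, k=3, d=3, r=2))
# Parameters of the tradeoff plot with file size 420, d = 8, k = 4, r = 3.
print(mscr_point(B=420, k=4, d=8, r=3), mbcr_point(B=420, k=4, d=8, r=3))
```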
![Page 26: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/26.jpg)
How much can we improve?
[Figure: storage per node versus repair bandwidth per failed node. Curves: one-by-one repair; repairing 10 newcomers jointly.]
File size = 2275, d = 30, k = 5
When d is large, joint repair does not have significant advantage over one-by-one repair.
![Page 27: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/27.jpg)
How much can we improve?
[Figure: storage per node versus repair bandwidth per failed node. Curves: one-by-one repair; repairing 10 newcomers jointly.]
File size = 616, d = 8, k = 4
Repair-bandwidth reduction is more prominent when d is not so large.
![Page 28: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/28.jpg)
AN EXPLICIT CONSTRUCTION FOR MINIMUM-BANDWIDTH COOPERATIVE REPAIR
![Page 29: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/29.jpg)
An explicit construction for MBCR
• Minimum repair-bandwidth
• Storage per node
• B = 8 information packets
• n = 4 nodes
• Each node stores 5 packets.
• Repair r = 2 failures simultaneously
• No. of connections for each DC = k = 2
• No. of helpers for each failed node = d = 2
(S., Hu, ISIT 2011.) Require d = k, r = n - d
![Page 30: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/30.jpg)
Min-Bandwidth point
[Figure: storage per node versus repair bandwidth per failed node. Curves: one-by-one repair; repairing 2 new nodes cooperatively.]
![Page 31: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/31.jpg)
Data Distribution
8 data packets: A, B, C, D, E, F, G, H
A, B, C, D, F+G
C, D, E, F, H+A
E, F, G, H, B+C
G, H, A, B, D+E
“+” denotes XOR.
5 packets: 4 systematic, 1 parity-check
![Page 32: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/32.jpg)
Data collection
A, B, C, D, F+G
C, D, E, F, H+A
E, F, G, H, B+C
G, H, A, B, D+E
[Figure: the data collector downloads A, B, C, D from the first node and E, F, G, H from the third node.]
Data collector: A, B, C, D, E, F, G, H
![Page 33: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/33.jpg)
Data collection
A, B, C, D, F+G
C, D, E, F, H+A
E, F, G, H, B+C
G, H, A, B, D+E
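[Figure: the data collector downloads A, B, C, F+G from the first node and D, E, F, H+A from the second node.]
Data collector: A B C D E F G H
Received packets A, B, C, D, E, F, F+G, H+A: triangular, full-rank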
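The data-collection property of this layout can be checked mechanically: the ten packets held by any two of the four nodes should span all eight data packets over GF(2). In the sketch below each packet is a bitmask over A to H and the rank is computed with an XOR basis. The helper names are illustrative.

```python
from itertools import combinations

SYMBOLS = "ABCDEFGH"

NODES = [
    ["A", "B", "C", "D", "F+G"],
    ["C", "D", "E", "F", "H+A"],
    ["E", "F", "G", "H", "B+C"],
    ["G", "H", "A", "B", "D+E"],
]

def packet(expr):
    """Bitmask of the data packets XORed together in expr, e.g. 'F+G'."""
    mask = 0
    for name in expr.split("+"):
        mask ^= 1 << SYMBOLS.index(name)
    return mask

def rank_gf2(masks):
    """Rank over GF(2) using an incrementally built XOR basis."""
    basis = []
    for m in masks:
        for b in basis:
            m = min(m, m ^ b)  # clears the leading bit of b from m when it is set
        if m:
            basis.append(m)
    return len(basis)

def recovers_everything(i, j):
    masks = [packet(e) for e in NODES[i] + NODES[j]]
    return rank_gf2(masks) == len(SYMBOLS)

all_pairs_ok = all(recovers_everything(i, j) for i, j in combinations(range(4), 2))

# The eight packets named on the slide: four from the first node, four from the second.
slide_download = ["A", "B", "C", "F+G", "D", "E", "F", "H+A"]
slide_download_ok = rank_gf2([packet(e) for e in slide_download]) == 8
print(all_pairs_ok, slide_download_ok)
```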
![Page 34: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/34.jpg)
Exact Repair
A, B, C, D, F+G
C, D, E, F, H+A
E, F, G, H, B+C
G, H, A, B, D+E
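[Figure: the first and third nodes have failed. The newcomers download A, B, C, D and E, F, G, H from the surviving nodes and exchange the parity packets F+G and B+C.]
How to repair?
Total repair-bandwidth = 10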
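A repair schedule that matches the packet labels on this slide can be written out as follows. It is a reconstruction under the assumption that the first and third nodes are the failed ones. Packets are modelled as small integers and “+” as bitwise XOR.

```python
def encode(data):
    """Lay out the eight packets on four nodes; data maps 'A'..'H' to integers."""
    A, B, C, D, E, F, G, H = (data[s] for s in "ABCDEFGH")
    return [
        [A, B, C, D, F ^ G],
        [C, D, E, F, H ^ A],
        [E, F, G, H, B ^ C],
        [G, H, A, B, D ^ E],
    ]

def repair_nodes_1_and_3(nodes):
    """Rebuild the first and third node from the second and fourth."""
    node2, node4 = nodes[1], nodes[3]
    transferred = 0
    # Phase 1: each newcomer downloads 2 packets from each surviving node.
    A, B = node4[2], node4[3]   # sent to newcomer 1
    C, D = node2[0], node2[1]   # sent to newcomer 1
    G, H = node4[0], node4[1]   # sent to newcomer 3
    E, F = node2[2], node2[3]   # sent to newcomer 3
    transferred += 8
    # Phase 2: the newcomers swap the parity packet the other one is missing.
    f_xor_g = F ^ G             # computed by newcomer 3, sent to newcomer 1
    b_xor_c = B ^ C             # computed by newcomer 1, sent to newcomer 3
    transferred += 2
    return [A, B, C, D, f_xor_g], [E, F, G, H, b_xor_c], transferred

# Arbitrary example data: one byte-sized value per packet.
data = {s: (17 * (i + 1)) % 251 for i, s in enumerate("ABCDEFGH")}
nodes = encode(data)
new1, new3, transferred = repair_nodes_1_and_3(nodes)
print(new1 == nodes[0], new3 == nodes[2], transferred)
```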
![Page 35: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/35.jpg)
Exact Repair
A, B, C, D, F+G
C, D, E, F, H+A
E, F, G, H, B+C
G, H, A, B, D+E
[Figure: a second pair of failed nodes; packets on the edges include C, D, G, H, E, F, D+E, H+A, B+C and F+G.]
How to repair?
Total repair-bandwidth = 10
![Page 36: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/36.jpg)
Min-Bandwidth point
[Figure: storage per node versus repair bandwidth per failed node. Curves: one-by-one repair; repairing 2 new nodes cooperatively.]
![Page 37: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/37.jpg)
AN EXPLICIT CONSTRUCTION FOR MINIMUM-STORAGE COOPERATIVE REPAIR
![Page 38: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/38.jpg)
An explicit construction for MSCR
• Minimum repair-bandwidth
• Storage per node
• B = 6 information packets
• n nodes
• Each node stores 2 packets.
• Repair r = 2 failures simultaneously
• No. of connections for each DC = k = 3
• No. of helpers for each failed node = d = 3
(S., ICC 2011.) Require d = k
![Page 39: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/39.jpg)
The min-storage point
[Figure: storage per node versus repair bandwidth per failed node. Curves: non-cooperative; cooperative.]
k = 3, d = 3, r = 2, B = 6
Cooperative: storage cost per node = 2, repair bandwidth per node = 4
![Page 40: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/40.jpg)
Data retrieval
[Figure: the source data is encoded with an MDS code with dimension k = 3 into two codewords, which are spread over the storage nodes so that each node holds 2 packets; the data collector decodes.]
![Page 41: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/41.jpg)
Repair: phase 1
[Figure: two storage nodes are lost; each newcomer downloads from the surviving storage nodes and decodes.]
![Page 42: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/42.jpg)
Repair: phase 2
[Figure: each newcomer re-encodes, and the two newcomers exchange packets to replace the lost nodes.]
Repair bandwidth per node = 8/2 = 4
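The two repair phases can be sketched with a concrete MDS code. The choices below are assumptions made for illustration, not details taken from the slides: a Reed-Solomon code over GF(7) with n = 5 nodes, where node j stores one symbol from each of two codewords. In phase 1 each newcomer downloads 3 symbols of one codeword and decodes it. In phase 2 it re-encodes the symbol the other newcomer is missing and the two exchange one packet each, for 8 packets in total.

```python
P = 7                      # assumed prime field
POINTS = [1, 2, 3, 4, 5]   # assumed: n = 5 nodes, one evaluation point per node
K = 3                      # dimension of the MDS code

def evaluate(coeffs, x, p=P):
    """Evaluate a polynomial (lowest-degree coefficient first) at x over GF(p)."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result

def interpolate_at(samples, x, p=P):
    """Lagrange interpolation: value at x of the polynomial through the samples."""
    total = 0
    for xi, yi in samples:
        num, den = 1, 1
        for xj, _ in samples:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def store(msg1, msg2):
    """Node j keeps one symbol of each codeword: (f1(x_j), f2(x_j))."""
    return [(evaluate(msg1, x), evaluate(msg2, x)) for x in POINTS]

def cooperative_repair(nodes, lost_a, lost_b):
    helpers = [j for j in range(len(POINTS)) if j not in (lost_a, lost_b)][:K]
    # Phase 1: newcomer a downloads codeword-1 symbols, newcomer b codeword-2 symbols.
    samples1 = [(POINTS[j], nodes[j][0]) for j in helpers]   # 3 packets to a
    samples2 = [(POINTS[j], nodes[j][1]) for j in helpers]   # 3 packets to b
    # Phase 2: each newcomer re-encodes its codeword and sends one symbol across.
    a_first = interpolate_at(samples1, POINTS[lost_a])
    b_first = interpolate_at(samples1, POINTS[lost_b])       # sent from a to b
    a_second = interpolate_at(samples2, POINTS[lost_a])      # sent from b to a
    b_second = interpolate_at(samples2, POINTS[lost_b])
    packets = len(samples1) + len(samples2) + 2
    return (a_first, a_second), (b_first, b_second), packets

nodes = store([3, 1, 4], [1, 5, 2])   # B = 6 message symbols in total
new_a, new_b, packets = cooperative_repair(nodes, 0, 3)
print(new_a == nodes[0], new_b == nodes[3], packets)
```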
![Page 43: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/43.jpg)
The construction is optimal
[Figure: storage per node versus repair bandwidth per failed node. Curves: non-cooperative; cooperative.]
k = 3, d = 3, r = 2, B = 6
Cooperative: storage cost per node = 2, repair bandwidth per node = 4
![Page 44: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/44.jpg)
EXISTENCE OF COOPERATIVE REGENERATING CODES UNDER FUNCTIONAL REPAIR
![Page 45: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/45.jpg)
Existence of optimal linear regenerating codes in general
• Sustainable storage system
– Will it work after arbitrarily many repairs?
• Technical difficulty: The information flow graph is unbounded.
• Can we work over a fixed finite field, for an unlimited number of regenerations?
– Yes if we can construct an exact regenerating code.
– The answer is also “yes” for cooperative functional repair in general.
(S., Hu, Netcod 2011.)
![Page 46: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/46.jpg)
Trellis structure
[Figure: trellis with stages 0, 1 and 2.]
m: message vector (row vector)
T0 is the “transfer matrix” in stage 0; the output of stage 0 is mT0
T1 is the “transfer matrix” in stage 1; the output of stage 1 is mT0T1
T2 is the “transfer matrix” in stage 2; the output of stage 2 is mT0T1T2
![Page 47: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/47.jpg)
Flow in information flow graph
[Figure: information flow graph from the source S through two repair stages to the data collector DC, with a flow value marked on each edge.]
The cut-set bound says that the cut capacity is at least 8.
Can we construct a flow with value 8?
![Page 48: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/48.jpg)
Cross-sectional flow pattern
[Figure: the same information flow graph, with the flow values across each cross-section between stages marked.]
![Page 49: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/49.jpg)
A recursive construction of flow
[Figure: one segment of the information flow graph between stage s and stage s+1, with cross-section flows (g1, g2, g3, g4) at stage s and (h1, h2, h3, h4) at stage s+1.]
1. Identify a set of cross-section flow patterns, say H.
2. For any cross-section flow pattern (h1, h2, h3, h4) in H at stage s+1, we can find a flow in this segment of the graph such that (g1, g2, g3, g4) is also in H.
3. Each pattern corresponds to a submatrix of the transfer matrix.
4. By the Schwartz-Zippel lemma, we can find the local encoding vectors so that the determinants of all such submatrices are non-zero, if the finite field is sufficiently large.
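The role of the field size in the last step can be illustrated on a toy case. The determinant of a 2x2 matrix is a polynomial of total degree 2 in its entries, so the Schwartz-Zippel lemma bounds the fraction of entry choices over GF(p) that make it vanish by 2/p. The sketch below counts that fraction exactly for a few small primes. It is a stand-in for the much larger determinants in the actual proof, not part of the authors' argument.

```python
from fractions import Fraction
from itertools import product

def nonzero_det_fraction(p):
    """Fraction of 2x2 matrices over GF(p) whose determinant is non-zero."""
    good = sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p != 0)
    return Fraction(good, p ** 4)

fractions_by_field = {p: nonzero_det_fraction(p) for p in (2, 3, 5, 7, 11)}

for p, frac in fractions_by_field.items():
    # Schwartz-Zippel: a non-zero polynomial of degree 2 vanishes on at most 2/p of the points.
    assert 1 - frac <= Fraction(2, p)
    print(p, float(frac))
```

As p grows, almost every choice of coefficients works, which is why a sufficiently large field always admits suitable local encoding vectors.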
![Page 50: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/50.jpg)
Summary
• Multiple node failures in medium-scale to large-scale storage systems
• Formulation as a linear program
• Functional repair: a linear regenerating code over a fixed finite field that matches the cut-set bound on repair-bandwidth exists.
• Exact repair: two families of explicit code constructions
– Minimum-bandwidth point: d = k, r = n - d
– Minimum-storage point: d = k, r arbitrary
![Page 51: Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.](https://reader030.fdocuments.us/reader030/viewer/2022032521/56649d5d5503460f94a3bde4/html5/thumbnails/51.jpg)
References
• Y. Wu and A. G. Dimakis, Reducing repair traffic for erasure coding-based storage via interference alignment, ISIT, Jul, 2009.
• Y. Hu, Y. Xu, X. Wang, C. Zhan and P. Li, Cooperative recovery of distributed storage systems from multiple losses with network coding, J. Sel. Area Comm., vol. 28, no. 2, pp. 268-275, Feb, 2010.
• K. W. Shum, Cooperative Regenerating Codes for Distributed Storage Systems, ICC, Jun, 2011.
• A.-M. Kermarrec, N. Le Scouarnec and G. Straub, Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes, Netcod, Jul, 2011.
• K. W. Shum and Y. Hu, Existence of Minimum-Repair-Bandwidth Cooperative Regenerating Codes, Netcod, Jul, 2011.
• K. W. Shum and Y. Hu, Exact Minimum-Repair-Bandwidth Cooperative Regenerating Codes for Distributed Storage Systems, ISIT, Aug, 2011.