A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes:
Algorithms and Performance Evaluation
Yinlong Xu
University of Science and Technology of China
A joint work with
Liping Xiang, John C.S. Lui and Qian Chang
I would like to thank
John C.S. Lui, Raymond W. Yeung,
Patrick B.C. Lee, Alfred C.L. Ho!
Outline
Background
A Hybrid Recovery Approach for Single Disk Failure
Row-Diagonal Optimal Recovery (RDOR) for Single Disk Failure
  A Recovery Scheme with Minimum Disk Reads
  Balancing Disk Reads
  Optimizing Memory Usage
Performance Evaluation
Summary
Remark:
This work can be applied to two RAID-6 codes, RDP and EVENODD.
This talk takes RDP as an example.
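RDP Code
An RDP stripe is a $(p-1) \times (p+1)$ array, where $p$ is a prime number.
The first $p-1$ columns are information columns.
The last two are parity columns (row parity, then diagonal parity).
Note: With RDP code, all information data is recoverable when any two disks fail.
[Figure: RDP array for $p = 5$, with the row parity column, the diagonal parity column and the missing diagonal marked.]
Row parity: $d_{0,4} = d_{0,0} \oplus d_{0,1} \oplus d_{0,2} \oplus d_{0,3}$
Diagonal parity: $d_{0,5} = d_{0,0} \oplus d_{3,2} \oplus d_{2,3} \oplus d_{1,4}$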
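The two parity rules above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the name `rdp_encode` and the use of small integers as symbols are assumptions made here for readability.

```python
from functools import reduce
from operator import xor

def rdp_encode(data, p):
    """Hypothetical helper: append row and diagonal parity to a data array.

    data: (p-1) rows x (p-1) columns of integer symbols.
    Returns a (p-1) x (p+1) array; column p-1 is row parity and
    column p is diagonal parity.
    """
    arr = [list(row) + [0, 0] for row in data]
    for i in range(p - 1):
        # Row parity: XOR of the p-1 information symbols in row i.
        arr[i][p - 1] = reduce(xor, arr[i][:p - 1], 0)
    for j in range(p - 1):
        # Diagonal j holds d[i][k] with (i + k) mod p == j, k = 0..p-1.
        # Diagonal p-1 is the missing diagonal and gets no parity symbol.
        acc = 0
        for k in range(p):
            i = (j - k) % p
            if i <= p - 2:
                acc ^= arr[i][k]
        arr[j][p] = acc
    return arr

p = 5
data = [[(7 * i + 3 * k + 1) % 256 for k in range(p - 1)] for i in range(p - 1)]
stripe = rdp_encode(data, p)
```

With `p = 5`, `stripe[0][4]` and `stripe[0][5]` follow the two example equations on this slide.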
Outline of our work
Problem: Recovery from a single disk failure in RDP-coded systems.
Motivation: RDP code tolerates two disk failures, but a single disk failure is much more probable than a double disk failure.
Contributions:
  Give the lower bound of disk reads.
  Propose a recovery scheme such that:
    Disk reads match the lower bound, a reduction of 1/4.
    Disk reads are balanced.
    Extra memory usage is minimal: $(p-1)/2$ blocks.
    XOR operations: no more than the conventional scheme.
A Naive Recovery Scheme for Single Disk Failure of RDP Code: Case 1
Case 1: A single information disk fails.
The row parity disk and the other information disks are used for the recovery.
[Figure: The recovery of Disk 1, showing the erased symbols $d_{0,1}$, $d_{1,1}$, $d_{2,1}$, $d_{3,1}$ in the array of Disk 0 to Disk 5.]
A Naive Recovery Scheme for Single Disk Failure of RDP Code: Case 2
Case 2: A single parity disk fails.
The recovery is equivalent to the parity encoding.
[Figure: The recovery of the diagonal parity disk, showing the erased symbols $d_{0,5}$, $d_{1,5}$, $d_{2,5}$, $d_{3,5}$ in the array of Disk 0 to Disk 5.]
Features of the Naive Recovery Scheme
It uses only one parity column for single disk failure recovery, although the array has two parity columns.
$(p-1)^2$ symbols are read from the disks for the recovery.
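As a point of reference for the read counts discussed later, the naive scheme can be sketched as follows. The function name `naive_recover` and the integer symbols are assumptions for illustration; only the row-parity part of the stripe is modelled, since the naive scheme never touches the diagonal parity column when an information disk fails.

```python
from functools import reduce
from operator import xor

def naive_recover(stripe, failed, p):
    """Rebuild column `failed` (0 <= failed <= p-1) from row parity only.

    stripe: (p-1) rows x p columns; column p-1 holds the row parity.
    Returns (recovered_column, number_of_symbols_read).
    """
    recovered, reads = [], 0
    for row in stripe:
        # Every surviving symbol of the row is read once.
        survivors = [row[k] for k in range(p) if k != failed]
        reads += len(survivors)
        recovered.append(reduce(xor, survivors, 0))
    return recovered, reads

p = 5
data = [[(5 * i + 2 * k + 3) % 251 for k in range(p - 1)] for i in range(p - 1)]
stripe = [row + [reduce(xor, row, 0)] for row in data]
column, reads = naive_recover(stripe, failed=1, p=p)
```

Each of the $p-1$ rows contributes $p-1$ reads, which gives the $(p-1)^2$ total stated above.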
Questions
Can the disk reads be reduced for the recovery of a single disk failure?
What if both parity disks are used for single disk failure recovery?
Some Benefits from Reducing Disk Reads
Speeding up the recovery
Relieving system bus load
Relieving disk load
Enhancing service performance for users
Saving disk energy
…
Row Parity or Diagonal Parity?
Either row parity or diagonal parity can be used to recover an erased symbol.
[Figure: Disk 0 to Disk 5, with the erased symbols $d_{0,1}$, $d_{1,1}$, $d_{2,1}$, $d_{3,1}$ on Disk 1.]
$d_{0,1}$ can be recovered by row parity.
$d_{0,1}$ can also be recovered by diagonal parity.
A Hybrid Recovery Approach for Single Disk Failure
Using diagonal parity to recover $d_{0,1}$;
Using diagonal parity to recover $d_{1,1}$;
Using row parity to recover $d_{2,1}$;
Using row parity to recover $d_{3,1}$.
[Figure: Disk 0 to Disk 5, with the erased symbols on Disk 1 and the overlapping symbols marked.]
Notes: There are 4 overlapping symbols, each of which would have to be read twice. If the 4 overlapping symbols are pre-stored in memory, the number of disk reads is reduced to $16 - 4 = 12 < 16$.
Consideration of Hybrid Recovery Approach
By using memory reads instead of disk reads:
  The recovery process is sped up. Note: a memory read is roughly 100 times faster than a disk read.
  The communication load of the storage system is reduced.
During the recovery, the more overlapping symbols there are, the fewer symbols have to be read from disks.
Questions
What is the lower bound of disk reads for single disk failure recovery?
How can a recovery scheme be designed to match this lower bound?
Row Parity Sets
$R_i = \{ d_{i,k} \mid 0 \le k \le p-1 \}$ is the $i$-th row parity set.
Disk0  Disk1  Disk2  Disk3  Disk4 (row parity)  Disk5 (diagonal parity)
d0,0   d0,1   d0,2   d0,3   d0,4                d0,5
d1,0   d1,1   d1,2   d1,3   d1,4                d1,5
d2,0   d2,1   d2,2   d2,3   d2,4                d2,5
d3,0   d3,1   d3,2   d3,3   d3,4                d3,5
Each symbol in $R_i$ can be recovered from the other symbols in $R_i$.
E.g. $R_0 = \{ d_{0,0}, d_{0,1}, d_{0,2}, d_{0,3}, d_{0,4} \}$.
Because $d_{0,4} = d_{0,0} \oplus d_{0,1} \oplus d_{0,2} \oplus d_{0,3}$, we have $d_{0,1} = d_{0,0} \oplus d_{0,2} \oplus d_{0,3} \oplus d_{0,4}$.
Diagonal Parity Sets
$D_j = \{ d_{i,k} \mid (i+k) \bmod p = j,\ 0 \le i \le p-2,\ 0 \le k \le p \}$ is the $j$-th diagonal parity set.
[Figure: RDP array for $p = 5$, Disk 0 to Disk 5, with Disk 4 as row parity and Disk 5 as diagonal parity.]
Each symbol in $D_j$ can be recovered from the other symbols in $D_j$.
E.g. $D_1 = \{ d_{1,0}, d_{0,1}, d_{3,3}, d_{2,4}, d_{1,5} \}$, so $d_{0,1} = d_{1,0} \oplus d_{3,3} \oplus d_{2,4} \oplus d_{1,5}$.
Overlapping Symbols
Each pair of $R_i$ and $D_j$ has exactly one symbol in common, called the overlapping symbol.
E.g. $R_1 \cap D_3 = \{ d_{1,2} \}$.
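The single-overlap property can be checked by brute force for a small prime. The sketch below assumes symbols are represented as `(row, column)` pairs; `row_set` and `diag_set` are hypothetical helper names that mirror the definitions of $R_i$ and $D_j$.

```python
def row_set(i, p):
    """R_i: row i, columns 0..p-1 (information symbols plus row parity)."""
    return {(i, k) for k in range(p)}

def diag_set(j, p):
    """D_j: symbols (i, k) with (i + k) mod p == j, including d[j][p]."""
    return {(i, k) for i in range(p - 1) for k in range(p + 1) if (i + k) % p == j}

p = 5
overlap = row_set(1, p) & diag_set(3, p)
# Sizes of R_i ∩ D_j over all rows i and all stored diagonals j.
sizes = {len(row_set(i, p) & diag_set(j, p)) for i in range(p - 1) for j in range(p - 1)}
```

For `p = 5` the set `sizes` contains only the value 1, and `overlap` is the position of $d_{1,2}$.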
Special Cases of Parity Sets
[Figure labels: symbols that belong only to their diagonal parity sets; symbols that belong only to their row parity sets.]
Disk $p$ can only be recovered by diagonal parity.
This work only considers the recovery of Disk $k$, with $k \ne p$.
Recovery Combination
A combination of parity sets $(R_i, \ldots, D_j)$ corresponds to a recovery scheme.
E.g. Using recovery combination $(D_1, D_2, R_2, R_3)$ to recover Disk 1:
  Using $D_1$ to recover $d_{0,1}$;
  Using $D_2$ to recover $d_{1,1}$;
  Using $R_2$ to recover $d_{2,1}$;
  Using $R_3$ to recover $d_{3,1}$.
[Figure: RDP array for $p = 5$, Disk 0 to Disk 5, with the parity sets of the combination marked.]
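To see how a recovery combination translates into disk reads, one can enumerate the symbols each parity set needs from the surviving disks. This is a sketch under the assumption that symbols are `(row, column)` pairs; `parity_set` and `surviving_reads` are names introduced here, not part of the original work.

```python
def parity_set(kind, idx, p):
    """Return R_idx (kind 'R') or D_idx (kind 'D') as a set of (row, column)."""
    if kind == "R":
        return {(idx, k) for k in range(p)}
    return {(i, k) for i in range(p - 1) for k in range(p + 1) if (i + k) % p == idx}

def surviving_reads(combination, failed, p):
    """For each parity set, the symbols it needs from the surviving disks."""
    return [{s for s in parity_set(kind, idx, p) if s[1] != failed}
            for kind, idx in combination]

p, failed = 5, 1
combo = [("D", 1), ("D", 2), ("R", 2), ("R", 3)]
per_set = surviving_reads(combo, failed, p)
total_with_repeats = sum(len(s) for s in per_set)
distinct = set().union(*per_set)
overlapping = total_with_repeats - len(distinct)
```

For this combination, 16 symbol uses collapse to 12 distinct symbols, so 4 symbols are shared between a row parity set and a diagonal parity set.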
Number of Overlapping Symbols
Assumption
  Disk $k$ is in erasure.
  The $p-1$ symbols $d_{0,k}, d_{1,k}, \ldots, d_{p-2,k}$ need to be recovered.
A Recovery Scheme
  $t$ erased symbols are recovered from diagonal parity sets.
  The other $p-1-t$ symbols are recovered from row parity sets.
Conclusion
  There are $t(p-1-t) = -\left( t - \frac{p-1}{2} \right)^2 + \frac{(p-1)^2}{4}$ overlapping symbols.
  When $t = (p-1)/2$, the number of overlapping symbols is maximized.
Lower Bound of Disk Reads for Single Failure Recovery
The maximum number of overlapping symbols is $(p-1)^2/4$.
So at most $(p-1)^2/4$ symbols can be saved from disk reads during recovery.
Conclusion: The lower bound of disk reads for recovery is $(p-1)^2 - (p-1)^2/4 = 3(p-1)^2/4$.
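The bound can be checked numerically by trying every split between diagonal and row parity sets. The helper `disk_reads` is a name introduced here, and the sketch assumes that every overlapping symbol is read from disk exactly once.

```python
def disk_reads(p, t):
    """Reads when t symbols use diagonal parity and p-1-t use row parity,
    assuming each overlapping symbol is read from disk only once."""
    return (p - 1) ** 2 - t * (p - 1 - t)

p = 7
reads_by_t = {t: disk_reads(p, t) for t in range(p)}
best_t = min(reads_by_t, key=reads_by_t.get)
```

For `p = 7` the minimum is reached at `t = 3`, that is $(p-1)/2$, with 27 reads, which equals $3(p-1)^2/4$.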
Read Optimal Recovery Scheme
Any recovery combination consisting of $(p-1)/2$ row parity sets and $(p-1)/2$ diagonal parity sets is read optimal.
Such a scheme is named Row-Diagonal Optimal Recovery (RDOR).
[Figure: Disk 0 to Disk 5. Legend: symbols to be read; overlapping symbols.]
Conclusion: RDOR reduces disk reads by approximately 25% compared with the naive scheme.
Example: Two Read Optimal Recovery Combinations
Combination                           Disk reads (Disk 1 to Disk 7)   Result
$(R_0, R_1, R_2, D_3, D_4, D_5)$      5 4 3 3 4 5 3                   Unbalanced
$(D_0, D_1, R_2, D_3, R_4, R_5)$      4 4 4 4 4 4 3                   Balanced
Problem and Questions
Problem: During recovery, the disk with the most read operations may slow down the recovery.
Question: To reduce the recovery time, what is a balanced and read-optimal recovery scheme?
A balanced scheme reads the same (or almost the same) number of symbols from each surviving disk.
Average Reads from Each Disk
The minimum number of disk reads for recovery is $3(p-1)^2/4$.
To achieve read optimality, $(p-1)/2$ symbols will be read from Disk $p$ (the diagonal parity disk).
Conclusion: The average number of symbols to be read from each of the other surviving disks (excluding Disk $k$ and Disk $p$) is $\left[ 3(p-1)^2/4 - (p-1)/2 \right] / (p-1) = (3p-5)/4$.
Note: A balanced read-optimal recovery should read $(p-1)/2$ symbols from Disk $p$ and $(3p-5)/4$ symbols from each of the other disks.
Example: A Balanced Example
E.g. $p = 7$, combination $(D_0, D_1, R_2, D_3, R_4, R_5)$
Disk reads: 4 4 4 4 4 4 3 (balanced)
Total: $3(p-1)^2/4 = 27$
Disk 7: $(p-1)/2 = 3$
Each of the other disks: $(27 - 3)/6 = 4$
Recovery Sequence
Define a recovery sequence $x_0, x_1, \ldots, x_{p-2}, x_{p-1}$ that corresponds to a recovery combination, where
  $x_i = 0$ means that $d_{i,k}$ is recovered from its row parity set;
  $x_i = 1$ means that $d_{i,k}$ is recovered from its diagonal parity set.
E.g. $(D_0, D_1, R_2, D_3, R_4, R_5)$ corresponds to 1 1 0 1 0 0 0, where the last entry is the additional symbol.
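The mapping between a recovery combination and its recovery sequence is mechanical, as the following sketch shows. The function names are hypothetical, and the sketch assumes that the additional last entry is always 0, as in the example on this slide.

```python
def to_sequence(kinds):
    """kinds[i] is 'R' or 'D' for erased symbol d[i][k], i = 0..p-2.
    The appended 0 is the additional entry x_{p-1}."""
    return [1 if kind == "D" else 0 for kind in kinds] + [0]

def to_combination(x, k, p):
    """Name the parity set used for each erased symbol of Disk k.
    Symbol d[i][k] lies on diagonal (i + k) mod p."""
    return [f"D{(i + k) % p}" if x[i] else f"R{i}" for i in range(p - 1)]

x = to_sequence(["D", "D", "R", "D", "R", "R"])
labels = to_combination(x, k=0, p=7)
```

Here `x` reproduces the sequence 1 1 0 1 0 0 0, and `labels` gives back the combination it came from.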
Condition of Read Optimal and Balanced Recovery Sequence
Recovery sequence $\{x_i\}_{0 \le i \le p-1}$ is read optimal and balanced if and only if:
  Read optimal: $x_0 + x_1 + \cdots + x_{p-2} + x_{p-1} = (p-1)/2$ (1)
  Symbols in the missing diagonal and the added row are recovered by row parity: $x_{p-1-k} = x_{p-1} = 0$ (2)
  $(3p-5)/4$ symbols are read from each Disk $j$ ($0 \le j \le p-1$, $j \ne k$) (3)
Read Optimal and Balanced Recovery: An Example
$(D_0, D_1, R_2, D_3, R_4, R_5)$ is a read optimal and balanced recovery combination for $p = 7$ and $k = 0$.
The corresponding recovery sequence $x_0 x_1 \ldots x_5 x_6 = 1101000$ satisfies:
  $x_0 + x_1 + \cdots + x_5 + x_6 = (p-1)/2 = 3$ (1)
  $x_{p-1-k} = x_{p-1} = 0$ (2)
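Condition of Read Optimal and Balanced Recovery Sequence (Cont.)
Here $\langle a \rangle_p$ denotes $a \bmod p$.
When $x_i = 0$ or $x_{\langle i+j-k \rangle_p} = 1$, $d_{i,j}$ in Disk $j$ is used for recovery.
When $d_{i,j}$ is used for recovery, $x_i (1 - x_{\langle i+j-k \rangle_p}) = 0$.
The number of symbols that need to be read from Disk $j$ is $(p-1) - \sum_{i=0}^{p-2} x_i (1 - x_{\langle i+j-k \rangle_p})$, where the sum is the number of symbols not read from Disk $j$.
E.g. Disk 3 ($p = 7$, $k = 0$): $x_2 = 0$, so $d_{2,3}$ is read; $x_0 = 1$, so $d_{4,3}$ is used.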
Read Optimal and Balanced Recovery: An Example (Cont.)
Recovery sequence $x_0 x_1 \ldots x_5 x_6 = 1101000$ also satisfies:
  4 symbols are read from each Disk $j$ ($0 \le j \le 6$, $j \ne 0$) (3)
E.g. Disk 3:
  $x_0 = 1$, $d_{4,3}$ is read;
  $x_1 = 1$, $d_{5,3}$ is read;
  $x_2 = 0$, $d_{2,3}$ is read;
  $x_3 = 1$, $d_{0,3}$ is read;
  $x_4 = 0$, $d_{4,3}$ is read;
  $x_5 = 0$, $d_{5,3}$ is read;
  the remaining symbols of Disk 3, $d_{1,3}$ and $d_{3,3}$, are not read.
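The per-disk read counts in this example can be reproduced from the recovery sequence alone. The sketch below is an assumption-laden illustration: `reads_per_disk` is a name introduced here, and it counts a symbol $d_{i,j}$ as skipped exactly when $x_i = 1$ and $x_{\langle i+j-k \rangle_p} = 0$, following the rule stated for condition (3).

```python
def reads_per_disk(x, k, p):
    """x: recovery sequence x_0..x_{p-1} with x_{p-1} = 0; k: failed disk, k != p.
    Returns {disk j: number of symbols read} for every surviving disk."""
    reads = {}
    for j in range(p):  # information disks and the row parity disk
        if j == k:
            continue
        # d[i][j] is skipped when row i is unused and its diagonal is unused.
        skipped = sum(1 for i in range(p - 1)
                      if x[i] == 1 and x[(i + j - k) % p] == 0)
        reads[j] = (p - 1) - skipped
    # One diagonal parity symbol is read per diagonal parity set in use.
    reads[p] = sum(x[:p - 1])
    return reads

balanced = reads_per_disk([1, 1, 0, 1, 0, 0, 0], k=0, p=7)
unbalanced = reads_per_disk([0, 0, 0, 1, 1, 1, 0], k=0, p=7)
```

`balanced` gives 4 reads on each of Disk 1 to Disk 6 and 3 on Disk 7, while `unbalanced` (the sequence of $(R_0, R_1, R_2, D_3, D_4, D_5)$) gives 5, 4, 3, 3, 4, 5 and 3.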
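Recovery Set
Given a recovery sequence $\{x_i\}_{0 \le i \le p-1}$, define $A = \{ i \mid x_i = 1,\ 0 \le i \le p-1 \}$ as the recovery set.
E.g. $x_0 x_1 \ldots x_5 x_6 = 1101000$ gives $A = \{0, 1, 3\}$.
$x_i x_{\langle i+t \rangle_p} = 1$ if and only if $i \in A$ and $\langle i+t \rangle_p \in A$. So $\sum_{i=0}^{p-1} x_i x_{\langle i+t \rangle_p}$ counts the pairs $a_1, a_2 \in A$ with $a_2 - a_1 \equiv t \pmod{p}$.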
Balanced Recovery Set
$A$ corresponds to a balanced sequence if and only if, for any $t$ ($1 \le t \le p-1$), $t$ has a multiplicity of $(p-3)/4$ in the multiset $M_A = \{ a_1 - a_2 \mid a_1, a_2 \in A,\ a_1 \ne a_2 \}$, with differences taken modulo $p$.
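The multiplicity condition is easy to test for a concrete recovery set. The following sketch assumes differences are reduced modulo $p$; `difference_multiset` is a name introduced here for illustration.

```python
from collections import Counter

def difference_multiset(A, p):
    """Multiset of (a1 - a2) mod p over ordered pairs of distinct elements of A."""
    return Counter((a1 - a2) % p for a1 in A for a2 in A if a1 != a2)

p = 7
M_A = difference_multiset({0, 1, 3}, p)
is_balanced = all(M_A[t] == (p - 3) // 4 for t in range(1, p))
```

For $A = \{0, 1, 3\}$ and $p = 7$, every $t$ from 1 to 6 appears exactly once, which matches $(p-3)/4 = 1$.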
The Existence of Read Optimal and Balanced Recovery Set
By using the concept of a (partial) difference set, we have the following conclusion.
Given a prime number $p$, the set of nonzero squares $D = \{ i^2 \mid 1 \le i \le (p-1)/2 \}$ in $F_p$ is a difference set.
There is $g \in F_p$ such that $A = D + g$ corresponds to a read-optimal and balanced recovery sequence $\{x_i\}_{0 \le i \le p-1}$ for the recovery of Disk $k$ ($k \ne p$).
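For a small prime, the construction $A = D + g$ can be explored directly. The sketch below only checks condition (2), that neither $p-1$ nor $p-1-k$ lies in $A$; it does not verify balance, and the names `nonzero_squares` and `valid_shifts` are assumptions introduced for illustration.

```python
def nonzero_squares(p):
    """D = { i^2 mod p : 1 <= i <= (p-1)/2 }."""
    return {(i * i) % p for i in range(1, (p - 1) // 2 + 1)}

def valid_shifts(p, k):
    """Shifts g for which A = D + g keeps x_{p-1} = x_{p-1-k} = 0."""
    D = nonzero_squares(p)
    shifts = []
    for g in range(p):
        A = {(d + g) % p for d in D}
        if (p - 1) not in A and (p - 1 - k) % p not in A:
            shifts.append(g)
    return shifts

p, k = 7, 0
D = nonzero_squares(p)
A = {(d + 6) % p for d in D}
shifts = valid_shifts(p, k)
```

With `p = 7` and `k = 0`, the shift `g = 6` turns $D = \{1, 2, 4\}$ into $A = \{0, 1, 3\}$, the recovery set used in the earlier example.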
Reviewing the Read Balance Problem
Find the average number of disk reads on each disk.
Define the recovery sequence and the recovery set to describe a recovery scheme.
Find the constraint conditions under which a recovery set is read optimal and balanced.
Use the concept of a (partial) difference set to solve these constraint conditions.
The read optimal and balanced recovery scheme corresponds to the solved recovery set.
Extra Memory Usage Problem
The number of overlapping symbols that must be stored in memory is at most $(p-1)^2/4$.
The more overlapping symbols, the more extra memory usage.
Question: How can the extra memory usage be minimized while the scheme stays read-optimal and balanced?
Main Idea of Optimizing Extra Memory Usage
E.g. Using $(D_1, D_2, R_2, R_3)$ to recover Disk 1.
Using $D_1$ to recover $d_{0,1}$; pre-store $d_{3,3}$; pre-store $d_{2,4}$;
Using $D_2$ to recover $d_{1,1}$; pre-store $d_{2,0}$; pre-store $d_{3,4}$;
Using $R_2$ to recover $d_{2,1}$; read $d_{2,0}$, $d_{2,4}$ from memory;
Using $R_3$ to recover $d_{3,1}$; read $d_{3,3}$, $d_{3,4}$ from memory.
[Figure: Disk 0 to Disk 5, with the erased symbols $d_{0,1}$, $d_{1,1}$, $d_{2,1}$, $d_{3,1}$ on Disk 1.]
This needs four extra memory units.
Main Idea of Optimizing Extra Memory Usage
Main Idea: Store the XOR-sum of overlapping symbols instead of the original symbols to optimize extra memory usage.
Two extra memory units M[2], M[3] are reserved for the recovery of $d_{2,1}$, $d_{3,1}$.
Initially: M[2] = 0, M[3] = 0;
After reading $D_1$: M[2] = $d_{2,4}$, M[3] = $d_{3,3}$;
After reading $D_2$: M[2] = $d_{2,0} \oplus d_{2,4}$, M[3] = $d_{3,3} \oplus d_{3,4}$.
[Figure: Disk 0 to Disk 5, with the erased symbols on Disk 1 and the memory units M[2], M[3].]
Only two extra memory units are needed.
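The XOR-accumulator idea can be sketched end to end for the $p = 5$ example. This is an illustrative reconstruction, not the authors' code: `rdp_encode` and `recover_disk` are hypothetical names, symbols are small integers, diagonal parity sets are processed first as in the M[2], M[3] walk-through, and the caller is assumed to pick `diag_rows` that avoid the missing diagonal. The `seen` set is bookkeeping of positions only and holds no symbol data.

```python
def rdp_encode(data, p):
    """Append row parity (column p-1) and diagonal parity (column p)."""
    arr = [list(row) + [0, 0] for row in data]
    for i in range(p - 1):
        for k in range(p - 1):
            arr[i][p - 1] ^= arr[i][k]
    for j in range(p - 1):
        for k in range(p):
            i = (j - k) % p
            if i <= p - 2:
                arr[j][p] ^= arr[i][k]
    return arr

def recover_disk(arr, p, failed, diag_rows):
    """Recover column `failed` (failed != p). Rows in diag_rows use diagonal
    parity, the rest use row parity. Returns (column, disk reads, memory units)."""
    row_rows = [i for i in range(p - 1) if i not in diag_rows]
    M = {r: 0 for r in row_rows}   # one XOR accumulator per row-recovered symbol
    seen = set()                   # positions already folded into an accumulator
    recovered = [None] * (p - 1)
    reads = 0
    for i in diag_rows:
        j = (i + failed) % p       # diagonal that contains d[i][failed]
        acc = 0
        for c in range(p + 1):
            r = (j - c) % p
            if c == failed or r > p - 2:
                continue
            value = arr[r][c]
            reads += 1
            acc ^= value
            if r in M and c < p:   # overlapping symbol: fold into its row's unit
                M[r] ^= value
                seen.add((r, c))
        recovered[i] = acc
    for r in row_rows:
        acc = M[r]
        for c in range(p):
            if c != failed and (r, c) not in seen:
                acc ^= arr[r][c]
                reads += 1
        recovered[r] = acc
    return recovered, reads, len(M)

p = 5
data = [[(11 * i + 5 * k + 2) % 256 for k in range(p - 1)] for i in range(p - 1)]
stripe = rdp_encode(data, p)
column, reads, units = recover_disk(stripe, p, failed=1, diag_rows=[0, 1])
```

In this run Disk 1 is rebuilt with 12 disk reads and two accumulators, matching the counts on the slides.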
Read Optimal and Balanced Recovery Scheme with Minimum Memory Usage
Use the read optimal and balanced recovery combination.
The recovery process is executed in a "row-parity-first" manner.
  First, recover all symbols that use row parity sets.
  Then, use diagonal parity sets to recover the other symbols.
$(p-1)/2$ memory units are reserved to recover the $(p-1)/2$ symbols that use diagonal parity sets for recovery.
Methodology
Experiment Settings
  Off-line recovery mode
  DiskSim simulation
  Disk array size $p + 1 = 8$, 14, and 20
  Strip size from 16KB to 64KB
Metrics
  Total recovery time
  Individual disk access time
Experimental Results: Recovery Time
The total recovery time of RDOR is less than that of the naive scheme as the strip size varies from 16KB to 64KB.
Moreover, with a strip size less than 32KB, the recovery time of RDOR is approximately 20% lower than that of the naive scheme.
Experimental Results: Disk Access Time
The average disk access time of RDOR is reduced by 15.16% to 22.60% when $p = 7$ and the strip size varies from 16KB to 64KB.
In on-line scenarios, each disk will be more available to serve user requests.
Summary
The proposed single-failure recovery scheme RDOR addresses the following issues:
Lower bound of disk reads for recovery
  When $k \ne p$, the number of symbols that must be read from disk is reduced by 1/4 compared with the conventional strategy.
Balancing disk reads
  The number of read operations on each disk is the same (or almost the same).
Minimum memory usage
  At any time, the maximum number of overlapping symbols or their computations stored in memory is $(p-1)/2$.
XOR operations
  No more than the conventional scheme.
Future work
Design efficient recovery algorithms for other codes.
Construct codes that tolerate multiple failures but recover more efficiently from a single failure.
Thank you!