WorkOut: I/O Workload Outsourcing for Boosting RAID ...WorkOut: I/O Workload Outsourcing for...

Post on 16-Apr-2020

1 views 0 download

Transcript of WorkOut: I/O Workload Outsourcing for Boosting RAID ...WorkOut: I/O Workload Outsourcing for...

WorkOut: I/O Workload Outsourcing for Boosting RAID Reconstruction PerformanceReconstruction Performance

Suzhen Wu1, Hong Jiang2, Dan Feng1, Lei Tian12, Bo Mao1

1Huazhong University of Science & Technology2University of Nebraska-LincolnUniversity of Nebraska Lincoln

Outline

BackgroundMotivationMotivationWorkOutPerformance EvaluationsConclusion

HUST & UNL 2

RAID Reconstruction

R th d t t t f il d di kRecovers the data content on a failed diskTwo metrics

Reconstruction timeUser response timeUser response time

CategoriesOff li st tiOff-line reconstructionOn-line reconstruction (commonly deployed)

HUST & UNL 3

Challengesg

Higher error rates than expectedg pComplete disk failures [Schroeder07, Pinheiro07, Jiang08]g ]Latent sector errors [Bairavasundaram07]

Correlation in drive failuresCorrelation in drive failurese.g. after one disk fails, another disk failure will likely occur soonwill likely occur soon.

RAID reconstruction might become the i l l tcommon case in large-scale systems.

Increasing number of drives

HUST & UNL 4

Reconstruction and Its Performance Impact70 times

3 times

HUST & UNL 5

I/O Intensity Impact on Reconstruction21 times

~4 times

Both the reconstruction time and user response time increase with IOPS.

HUST & UNL 6

p

Intuitive Idea

ObservationPerforming the rebuild IOs and user IOs simultaneously leads to disk bandwidth ycontention and frequent long seeks to and from the multiple separate data areas.

Our intuitive ideaOur intuitive ideaTo redirect the amount of user IOs that are issued to the degraded RAID setissued to the degraded RAID set.But, What to redirect? & Where to redirect to?

HUST & UNL 7

What To Redirect

Access localitycc ca tyExisting studies on workload analysis revealed that strong spatial and temporal locality exists that strong spatial and temporal locality exists even underneath the storage cache.

Answer to “what to redirect?”P l d tPopular read requestsAll write requests

8HUST & UNL

Where To Redirect To

Availability of spare or free space in data centers

A spare pool including a number of disksp p gFree space on other RAID sets

Answer to “Where to redirect to?”Answer to Where to redirect to?Spare or free space

C iComparisonExisting approaches: in the context of a single RAID setOur approach: in the context of data centers

HUST & UNL 9

with multiple RAID sets

Main Idea of WorkOut

Workload Outsourcing (Workout)W r a ut urc ng (W r ut)Temporarily redirect all write requests and popular read requests originally targeted at the popular read requests originally targeted at the degraded RAID set to a surrogate RAID set, to significantly improve on-line reconstruction g y pperformance.

GoalGoalApproaches reconstruction-time performance of the off-line reconstruction without of the off line reconstruction without affecting user-response-time performance at the same time.

HUST & UNL 10

m m .

WorkOut Architecture

Administrator

Popular DataIdentifier

AdministratorInterfaceSurrogate

Space ManagerIdentifierRequest

Redirector

Space ManagerReclaimer

Faile

dD

isk

Dis

k

Dis

k

Dis

k

Dis

k

Dis

k

Spar

e D

isk

HUST & UNL 11

Data Structure

D T bl l t bl th t th D_Table: a log table that manages the redirected data

D Fl 1 W it d t f th li ti D_Flag=1: Write data from the user application D_Flag=0: Popular read data from D-RAID to S-RAID

R LRU: n LRU st l list th t id ntifi s th R_LRU: an LRU-style list that identifies the most recent reads

HUST & UNL 12

Algorithm During Reconstructiong g

WorkflowWorkflowFor each write, it will be redirected to its previous location or a new location on the previous location or a new location on the surrogate RAID set according to whether it is an overwrite or notan overwrite or not.For each read, Check the D_Table:

Whether it hits D Table or not?Whether it hits D_Table or not?If a hit, full hit or partial hit?If a miss, whether it hits R_LRU?

HUST & UNL 13

Algorithm During Reclaimg g

The redirected write data should be The redirected write data should be reclaimed back to the newly recovered RAID set after the reconstruction process set after the reconstruction process completes.All b h k d i D T blAll requests must be checked in D_Table:

Each write request is served by the recovered RAID set and the corresponding log in D_Table should be deleted if it exists.Read requests can be also handled well, but it is complicated to explain in a short time. More d l b f d

HUST & UNL 14

details can be found in our paper.

Design Choicesg

Optional De ice

psurrogate RAID set

Device Overhead Performance Reliability Maintainability

A dedicated A dedicated surrogate RAID1 set

medium medium high simple

A dedicated surrogate RAID5 set

high high high simpleD5 s t

A live surrogate RAID5 t

low low medium-high complicatedRAID5 set

HUST & UNL 15

Data Consistency

Data ProtectionIn order to avoid data loss caused by a disk failure in the surrogate RAID set, all gredirected write data in the surrogate RAID set should be protected by a redundancy scheme, such as RAID1 or RAID5.

“Metadata” ProtectionThe content of D_Table should be stored in a NVRAM during the entire period when NVRAM during the entire period when WorkOut is activated, to prevent data loss in the event of a power supply failure

HUST & UNL 16

the event of a power supply failure.

Performance Evaluation

Prototype implementationA built-in module in MDIncorporated into PR & PRO

Experimental setupIntel Xeon 3.0GHz processor, 1GB DDR memory, 15 S t SATA di k (10GB) Li 2 6 11Seagate SATA disks (10GB), Linux 2.6.11

MethodologyO l lOpen-loop: trace replay

Trace: Financial1, Financial2, Websearch2Tool: RAIDmeterTool: RAIDmeter

Closed-loop: TPC-C-like benchmark

HUST & UNL 17

Experimental ResultsTrace Reconstruction Time (second)

Off-line PR WorkOut+PR Speedup PRO WorkOut+PRO Speedup

Fin1

136.4

1121.75 203.13 5.52 1109.62 188.26 5.89

Fin2 745.19 453.32 1.64 705.79 431.24 1.64

Web 9935.6 7623.22 1.30 9888.27 7851.36 1.26

Trace Average User Response Time during Reconstruction (millisecond)g p g ( )Normal Degraded PR WorkOut+PR Speedup PRO WorkOut+PRO Speedup

Fin1 7.92 9.52 12.71 4.43 2.87 9.83 4.58 2.15

Fin2 8.13 13.36 25.8 9.69 2.66 22.97 10.19 2.25

Web 18.46 26.95 38.57 28.35 1.36 35.58 29.12 1.22

Degraded RAID set: RAID5, 8 disks, 64KB stripe unit sizeSurrogate RAID set: RAID5, 4 disks, 64KB stripe unit sizeMinimum reconstruction bandwidth: 1MB/s

HUST & UNL 18

Minimum reconstruction bandwidth: 1MB/s

Percentage of Redirected Requestsg q

84%

Minimum reconstruction bandwidth of 1MB/s

HUST & UNL 19

Sensitivity Study (1)

ms)

on T

ime

(s)

se T

ime

(m

cons

truc

tio

ge R

espo

ns

Rec

Ave

rag

D ff b d d h

(a) (b)

Different minimum reconstruction bandwidth: 1MB/s, 10MB/s, 100MB/s

HUST & UNL 20

Sensitivity Study (2)

800900

) 4045

ms) PR

500600700800

on T

ime

(s)

25303540

nse

Tim

e (m PRO

WorkOut

200300400500

econ

stru

ctio

PRPRO

10152025

age

Res

pon

0100R

e

5 8 11

PROWorkOut

05

Ave

ra

5 8 11

D ff b f d k (5 8 11)

(a) (b)

Different number of disks (5, 8, 11)

HUST & UNL 21

Sensitivity Study (3)

40

n T

ime

(s)

25303540

PRWorkOut

onst

ruct

ion

10152025

Rec

o

05

RAID10 RAID6

(a) (b)

Different RAID level: RAID10 (4 disks), RAID6 (8 disks)

HUST & UNL 22

Different Surrogate Setg

4045

Dedicated RAID1

303540 Dedicated RAID5

Live RAID5PRThe same reconstruction time for the

152025three different surrogate sets

05

10

Dedicated RAID1: 2 disks

0Fin1 Fin2 Web

Dedicated RAID1: 2 disksDedicated RAID5: 4 disksLive RAID5: 4 disks (Replaying the Fin1 workload on it)

HUST & UNL 23

TPC-C-like Benchmark

15%

8000

10000

12000

tion

Rat

e

6000

8000

d T

rans

act

0

2000

4000

Nor

mal

ized

0N

(a) Transaction rate (b) Reconstruction time

Minimum reconstruction bandwidth of 1MB/s

HUST & UNL 24

Extendibility—Re-synchronizationy y(s

)

ms)

atio

n T

ime

nse

Tim

e (m

ynch

roni

za

age

Res

pon

Re-

sy

Ave

ra

( ) (b)

Re-synchronization: RAID5, 8 disks, 64KB stripe unit size

(a) (b)

Surrogate RAID set: RAID5, 4 disks, 64KB stripe unit sizeMinimum Re-synchronization bandwidth: 1MB/s

HUST & UNL 25

Conclusion

WorkOut outsources a significant amount of I/O t f th d d d user I/O requests away from the degraded

RAID set to a surrogate RAID set, thus i i RAID t ti fimproving RAID reconstruction performance;Insights and guidance for storage system designers and administrators by exploiting three design options;WorkOut can improve the performance of other background support RAID tasks such as g ppre-synchronization.

HUST & UNL 26

HUST & UNL 27