Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

32
Distributed RAID Architectures Distributed RAID Architectures for Cluster I/O Computing for Cluster I/O Computing Kai Hwang Kai Hwang Internet and Cluster Computing Lab. University of Southern California 1

Transcript of Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Page 1: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Distributed RAID Architectures Distributed RAID Architectures for Cluster I/O Computingfor Cluster I/O Computing

Kai Hwang Kai Hwang

Internet and Cluster Computing Lab.

University of Southern California

1

Page 2: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Presentation Outline :

n Scalable Cluster I/O

n The RAID-x Architecture

n Cooperative disk drivers

n Benchmark Experiments

n Security and Fault Tolerance

n Conclusions

Page 3: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 3

Scalable clusters providing SSI services are gradually replacing the SMP, cc-NUMA, and MPP in Servers,

Web Sites, and Database Centers

Page 4: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 4

g Size Scalability (physical & application)g Enhanced Availability (failure management)

g Single System Image (Middleware,OS extensions)

g Fast Communication (networks & protocols)

g Load Balancing (CPU, Net, Memory, Disk)

g Security and Encryption (clusters of clusters)

g Distributed Environment (User friendly)g Manageability (Jobs and resources )

g Programmability (simple API required)

g Applicability (cluster- and grid-awareness)

Issues in Cluster Design

Page 5: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 5

g Sixteen Pentinum PCs are housed in two 9-ft computer racks.

g All PCs run with the RedHatLinux v. 6.0 (Kernel v. 2.2.5)

g All nodes are connected by a 100 Mbps Fast Ethernet

g The cluster is ported with DQS, LSF, MPI, PVM, TreadMarks, Elias, and NAS benchmarks, etc.

g Scalable to a future system with 100’s of future PC nodes inter-connected by Gigabit networks

The USC Trojans Cluster ProjectInternet and Cluster Computing Lab. EEB Rm.104

http://andy.usc.edu/trojan/

Page 6: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 6

Trojans Linux Clusterwith Middleware for Security

and Checkpoint Recovery

PentiumPC

PentiumPC

Pentium PC

Gigabit Network Interconnect

Security and Checkpointing Middleware

Single-System Image and Availability Infrastructure

Programming Environments(Java, EDI, HTML, XML)

Web WindowsUser Interface

Other Subsystems(Database, OLTP, etc.)

Linux Linux Linux

Page 7: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

An I/O-centric cluster architecture

EntryPartition

Fast Ethernet

Internet/IntranetClient

DatabasePartition

ServicePartition

Service Flow Data Flow 4

Page 8: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 8

Distributed RAID Embedded in Clusters or Storage-Area Networks:

g I/O Bottleneck in Scalable Cluster Computing

n The gap between CPU/Memory and disk-IO widens

as the µµP doubles in speed every year

n Cluster applications are often I/O-bound

g Disks connected to hosts are often subject to failure by

hosts themselves. Distributed RAID has much higher

availability by fault isolation, rollback recovery, and

automatic file migration.

Page 9: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Distributed RAID with a single I/O space embedded in a cluster

Workstations or PCs

Cluster Network (SAN or LAN)

ds-RAID

Page 10: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Research Projects on Parallel and Distributed RAID

SystemAttributes

USC TrojansRAID-x

PrincetonTickerTAIP

DigitalPetal

Berkeley TertiaryDisk

HPAutoRAID

RAIDArchitectureenvironment

Orthogonal stripingand mirroring in aLinux cluster

RAID-5 withmultiplecontrollers

ChainedDeclusteringin Unix cluster

RAID-5 built witha PC cluster

Hierarchicalwith RAID-1and RAID-5

EnablingMechanismfor SIOS

Cooperative devicedrivers in Linuxkernel

Single RAIDserverimplementation

Petal devicedrivers atuser level

xFS storageservers at filelevel

Disk arraywithin singlecontroller

DataConsistencyChecking

Locks at devicedriver level

Sequencingof userrequests

Lamport’sPaxosalgorithm

Modified DASHprotocol in the xFSfile system

Use mark toupdate theparity disk

Reliabilityand FaultTolerance

Orthogonal stripingand mirroring

Parity checksin RAID-5

ChainedDeclustering

SCSI disks withparity in RAID-5

Mirroring andparity checks

Page 11: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Four RAID architectures using different mirroring and parity checking schemes

B8

M6

M7

M8

B6 B7

Disk 1 Disk 2 Disk 3 Disk 4

B0

B4

B8

P4

B12

B16

B1

B5

P3

B9

B13

B17

B2

P2

B6

B10

B14

P6

P1

B3

B7

B11

P5

B15

Disk 1 Disk 2 Disk 3 Disk 4

B0

B2

B4

B6

B8

B10

B1

B3

B5

B7

B9

B11

M0

M2

M4

M6

M8

M10

M1

M3

M5

M7

M9

M11

Disk 0 Disk 1 Disk 2 Disk 3

B0

B4

B1

B5

B2

M3

M4

M5

B3

M9

M10

M11

B9 B10 B11

M0

M1

M2

Disk 0 Disk 1 Disk 2 Disk 3

B0

M3

B4

M77

B1

M0

B5

M4

B2

M1

B6

M5

B3

M2

B7

M6

B8

M11

B9

M8

B10

M9

B11

M10

Data blocks

Mirrored blocks

(a) Striped mirroring in RAID-10 (b) Parity checking in RAID-5

(c) Orthogonal striping and mirroring(OSM) in the RAID-x

(d) Skewed striping in a chaineddeclustering RAID 6

Page 12: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Theoretical Peak Performance of Four RAID Architectures

7

PerformanceIndicators RAID-10 RAID-5

ChainedDeclustering RAID-x

Read n B n B n B n BLarge Write n B (n-1) B n B n BMax. I/O

Bandwidth Small Write n B nB / 2 n B n BLarge Read mR / n mR / n mR / n mR / nSmall Read R R R RLarge Write 2 mW / n mW / (n-1) 2 mW / n mW / n +

mW / n(n-1)

ParallelRead orParallel

Write TimeSmall Write 2W R+W 2W W

Max. Fault Coveragen/2 diskfailures

Single diskfailure

n/2 diskfailures

Single diskfailure

Page 13: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Distributed RAID-x architectureCluster Network

P/M

CDD

P/M

CDD

P/M

CDD

Node 0 Node 1 Node 3

D0

P/M

CDD

Node 2

D1 D2 D3B0B12B24M25M26M27

B1B13B25M14M15M24

B2B14B26M3M12M13

B3B15B27M0M1M2

D4 D5 D6 D7B4B16B28M29M30M31

B5B17B29M18M19M28

B6B18B30M7M16

M17

B7B19B31M4M5M6

D8 D9 D10 D11B8B20B32M33M34M35

B9B21B33M22M23M32

B10B22B34M11M20M21

B11B23B35

M8M9M10 8

Page 14: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 14

Single I/O space in a Distributed RAID enabled by CDDs at Linux kernel level

Cluster node

Cluster node

Cluster node

Interconnection Network

(b) A global virtual disk with a SIOS formed by cooperative disks

CDD CDD CDDCluster node

Cluster node

Central NFS Server

Interconnection Network

(a) Separate disks driven by independent disk drivers (IDDs)

IDDIDD

IDD

Page 15: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 15(b) Using CDDs to achieve a SIOS in a serverless cluster.

User Level

Kernel Level

UserLevel

KernelLevel

Client side Server side

CDD CDD

User Application

NFS Serveris bypassed

(a) Parallel disk I/O using the NFS in a server/client cluster.

User Level

Kernel Level

UserLevel

KernelLevel

Client side Server side

NFS Client

NFS ServerUser

Application

Traditional Device Driver

Remote disk access using central NFS server versus using cooperative disk drivers in the RAID-x cluster

13

54

6

1

3

456

2

2

Page 16: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 16

Architectural design of Architectural design of cooperative device driverscooperative device drivers

Node 2Node 1

CDD CDD

Virtual disksPhysical disks

Communications through the network

(a) Device masquerading

Cooperative Disk Driver (CDD)

StorageManager

CDD Client Module

Data Consistency Module

Communications through the network

(b) CDD architecture

Page 17: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 17

Cluster node 1

Application

/sios

dir1 dir2

file1 file2 file3

Cluster node 2

Application

/sios

dir1 dir2

file1 file2 file3

Cluster node 3

Application

/sios

dir1 dir2

file1 file2 file3

Cluster node 4

Application

/sios

dir1 dir2

file1 file2 file3

CDD CDD CDD CDD

Cluster Network

Maintaining consistency of the global directory /sios by all CDDs in the distributed RAID-x

Page 18: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Elapsed Time in Executing the Andrew Benchmark on the Linux Cluster at USC

0

5

10

15

20

25

30

35

1 4 8 12 16

Number of Clients

Elap

sed

Tim

e (s

ec)

CompileRead FileScan DirCopy FilesMake Dir

0

1

2

3

4

5

6

7

8

1 4 8 12 16

Number of Clients

Ela

psed

Tim

e (s

ec)

NFS results RAID-x results

Page 19: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Parallel Write Performance of four RAID Architectures against Traffic Rate

13Parallel writes (20MB per client)

0

2

4

6

8

10

12

14

16

18

1 4 8 12 16Number of Clients

Ag

gre

gat

e B

and

wid

th (

MB

/s)

RAID-x Chained DeclusteringRAID-10RAID-5NFS

Page 20: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Parallel Write Performance of four RAID architectures vs. Disk Array Size

13Parallel write

0

2

4

6

8

10

12

14

16

18

2 4 8 12 16Disk Numbers

Ag

gre

gat

e B

and

wid

th (

MB

/s)

RAID-xChained DeclusteringRAID-10RAID-5

Page 21: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 21

Achievable I/O Bandwidth and Improvement Factor on Trojans Cluster

NFS RAID-xI/OOperations 1 Client 16 Clients Improve 1 Client 16 Clients ImproveLarge Read 2.58 MB/s 2.3 MB/s 0.89 2.59 MB/s 15.63 MB/s 6.03

Large Write 2.11 MB/s 2.77 MB/s 1.31 2.92 MB/s 15.29 MB/s 5.24

Small Write 2.47 MB/s 2.81 MB/s 1.34 2.35 MB/s 15.1 MB/s 6.43Chained Declustering RAID-10Operations

1 Client 16 Clients Improve 1 Client 16 Clients Improve

Large Read 2.46 MB/s 15.8 MB/s 6.42 2.37 MB/s 10.76 MB/s 4.54

Large Write 2.62 MB/s 12.63 MB/s 4.82 2.31 MB/s 9.96 MB/s 4.31

Small Write 2.31 MB/s 12.54 MB/s 5.43 2.27 MB/s 9.98 MB/s 4.39

Page 22: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Effects of Stripe Unit Size on I/O Bandwidth of RAID Architectures

Large write (320MB for 16 clients) 14

0

2

4

6

8

10

12

14

16

18

20

16 32 64 128Stripe Unit Size (KB)

Ag

gre

gat

e B

and

wid

th (

MB

/s)

RAID-xChained DeclusteringRAID-10RAID-5

Page 23: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Bonnie Benchmark Results on Trojans ClusterBonnie Benchmark Results on Trojans Cluster

File rewrite

0

0.5

1

1.5

2

2.5

3

3.5

2 4 8 12 16Number of Disks

Ou

tpu

t R

ate

(MB

/s)

RAID-xchained declusteringRAID-10RAID-5

Page 24: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Securing Networks, Intranets,Clusters, or Grid Resources

with intrusion control and automatic recovery from malicious attacks

Highly secured Intranet with intrusion detection and

response, automatic recovery from

malicious attacks, and fault-tolerance

with distributed storage for reliable I/O

Gateway firewall

to screentraffic flow

between networks

Cluster with no security protection

Incr

easin

g R

elia

bil

ity

SMP cluster Intranet Grid

Increasing scalability

No dataprotection

Faulttolerance

Page 25: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 25

Distributed Micro-Firewalls

Source: Murali and Hwang, USC

Page 26: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Distributed Checkpointing on Distributed Checkpointing on The RAIDThe RAID--x in Trojans Clusterx in Trojans Cluster

Process 0

Time

Process 2

Process 4

Process 1

Process 3

Process 5

Process 6

Process 7

Process 8

Process 9

Process10

Process11

Stripe0

Stripe1

Stripe2

C S

C: Checkpointing overhead

S: Synchronization overhead 18

Page 27: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 27

Security Component Technologies

F Firewalls and Cryptography

F Cluster Middleware for Security

F Anti-virus and Immune Systems

F Intrusion Detection and Response

F Distributed Software RAIDs

F Security & Assurance Policies

Page 28: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Distributed Intrusion Detection and Responses

Security Threats Effectiveness in using Micro-Firewalls

Insider attacks Protect hosts against attack from insiders

Denial-of-Service attacks

Protect against denial-of-service attacks from any source

Trojan Program Protect hosts from trapdoors by any source

IP Address Spoofing

Can be reconfigured to prevent IP spoofing at the client host level

Probes and Scans

Use with IDS to block the probes and scans close to their sources

Unauthorized External access

Can prevent unauthorized access to the external networks at the source

Attacks on Intranet Infra-structure

Resist both internal and external attacks and provide fine-grained access control

Page 29: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 29

Checkpointing overhead on distributed RAIDs

0

0.5

1

1.5

2

2.5

3

3.5

1.21 2.21 3.21 4.21 5.21 6.21 7.21 8.11

checkpoint file size (MB)

chec

kpo

int

ove

rhea

d (

sec) NFS

Vaidya

Striped

Page 30: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 30

Advantages and Shortcomings of Distributed Checkpointing

Checkpointing Scheme Advantages Shortcomings Suitable applications

Simultaneous writing toa central storage(The NFS scheme)

Simple,no inconsistent state

Has network and I/Ocontentions, NFS is singlepoint of failure

Small size of checkpoint,small number of nodes,low I/O operation

Staggered writing to acentral storage(Vaidya scheme)

Eliminate the network and I/Ocontention

Network bandwidth iswasted, NFS is a singlepoint of failure

Small size ofcheckpointers, smallnumber of nodes,low I/O operations

Striped staggeringcheckpointing on anydistributed RAID(Our scheme)

Eliminate network and I/Ocontentions, low checkpointoverhead, fully utilize networkbandwidth, tolerate multiplefailures among stripe groups

Can not tolerate morenode failures withineach stripe group

Large size ofcheckpointers, largenumber of nodes,low communication,I/O intensiveapplications

Page 31: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

K. Hwang , March 15,2001 in BeijingK. Hwang , March 15,2001 in Beijing 31

F Distributed storage-area networks demands hardware or software support of a single I/O space not only in clusters but also in pervasive information grids.

F Hierarchical checkpointing with striping and staggered mirroring for building fault-tolerant clusters to provide continuous network services

F Hacker-proof clusters are in great demand for securingE-business, distributed computing, and metacomputinggrid applications.

F Exploring new applications in multiserver consolidation, collaborative design, and pervasive network services.

Conclusions :

Page 32: Distributed RAID Architectures for Cluster I/O Computing Kai Hwang

Call for Participation

IEEE Third International

Conference on Cluster Computing

CLUSTER 2001Sutton Place Hotel, Newport Beach, California

October 8 - 11, 2001