The HP AutoRAID Hierarchical Storage System

Page 1: The HP AutoRAID Hierarchical Storage System

The HP AutoRAID Hierarchical Storage System

John Wilkes, Richard Golding, Carl Staelin, and Tim Sullivan

Hewlett-Packard Laboratories

Page 2: The HP AutoRAID Hierarchical Storage System

File System Review

Page 5: The HP AutoRAID Hierarchical Storage System

File System Review

UNIX File System (1974)
- provides an addressable structure to store and retrieve files from disk
- simple and elegant, but slow (uses about 2% of disk bandwidth)

Berkeley Fast File System (1984)
- modified the block size, allowing bandwidth to reach up to 47%
- created cylinder groups that spread metadata to reduce seek times
- considered hardware specifics during file system parameterization

Sprite Log-structured File System (1991)
- relies on increasingly large file caches to handle most reads
- buffers multiple writes before going to disk
- the buffer is then copied in its entirety to disk in a single write
- introduced the concept of extents (large contiguous sets of free blocks)
- requires cleaning (garbage collection) and restructuring of active/non-active data
- improved crash recovery with roll-forward capability
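The write-buffering idea in the log-structured bullets can be sketched in a few lines. This is a toy model only, not Sprite LFS code: the class name `LogBuffer`, the `segment_blocks` parameter, and the in-memory `log` list standing in for the disk are all assumptions made for illustration.

```python
class LogBuffer:
    """Toy log-structured write buffer: batch small writes, flush them as one."""

    def __init__(self, segment_blocks):
        self.segment_blocks = segment_blocks  # blocks per large write (assumed value)
        self.pending = []   # (block_id, data) pairs still buffered in memory
        self.log = []       # stand-in for the disk: one entry per large write
        self.index = {}     # block_id -> (segment_no, slot) of the newest copy

    def write(self, block_id, data):
        self.pending.append((block_id, data))
        if len(self.pending) >= self.segment_blocks:
            self.flush()

    def flush(self):
        """Copy the whole buffer to the end of the log in a single write."""
        if not self.pending:
            return
        segment_no = len(self.log)
        self.log.append(list(self.pending))
        for slot, (block_id, _) in enumerate(self.pending):
            # Older copies of this block become garbage for the cleaner.
            self.index[block_id] = (segment_no, slot)
        self.pending.clear()

    def read(self, block_id):
        for bid, data in reversed(self.pending):  # newest buffered copy wins
            if bid == block_id:
                return data
        segment_no, slot = self.index[block_id]
        return self.log[segment_no][slot][1]
```

Overwriting a block never updates it in place; the old copy stays in an earlier log entry until it is cleaned, which is why the slide lists garbage collection as a requirement.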

Page 7: The HP AutoRAID Hierarchical Storage System

What about Hardware Failure?

System   Redundancy
FS       no
FFS      no
LFS      no
RAID     yes

Page 8: The HP AutoRAID Hierarchical Storage System

RAID

Redundant array of independent disks (early 80s)
- early days of mainframes

Redundant array of inexpensive disks (Patterson et al., 1988)
- for smaller computers such as PCs (became widely popular)
- introduced the concept of partial redundancy

Virtualization
- the array of disks is viewed as a single virtual disk
- requires an array controller, a SCSI connector, and hardware and software support
- the controller controls the array of disks

Page 9: The HP AutoRAID Hierarchical Storage System

The many levels of RAID
- Patterson introduced five levels
- no standards exist
- companies are free to invent their own versions

Page 10: The HP AutoRAID Hierarchical Storage System

RAID 0: Striping

Pros
- good performance on large requests
- 100% storage capacity

Cons
- not fault tolerant
- not considered RAID by many enthusiasts because nothing is redundant
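Striping is just an address calculation, which a short sketch can make concrete. The function name `stripe_location` and the `blocks_per_stripe_unit` parameter are illustrative assumptions; real controllers choose their own stripe-unit sizes and layouts.

```python
def stripe_location(logical_block, num_disks, blocks_per_stripe_unit=1):
    """Map a logical block number to (disk, block offset on that disk).

    Consecutive stripe units go to consecutive disks, round-robin, so a
    large request touches every disk and they can transfer in parallel.
    """
    unit, offset_in_unit = divmod(logical_block, blocks_per_stripe_unit)
    disk = unit % num_disks      # which disk holds this stripe unit
    row = unit // num_disks      # how many full stripes come before it
    return disk, row * blocks_per_stripe_unit + offset_in_unit
```

Nothing in the mapping stores a second copy or parity, which is why losing any one disk loses data.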

Page 11: The HP AutoRAID Hierarchical Storage System

RAID 1: Mirroring

Pros
- good performance
- fault tolerant

Cons
- 50% storage capacity
- gets expensive to scale

Page 12: The HP AutoRAID Hierarchical Storage System

Parity
- parity is calculated using XOR
- the controller takes a bit from each disk
  - if the total is even, parity = 0
  - if the total is odd, parity = 1
- protects against a single disk failure, as mirroring does, without the 50% capacity overhead
- usable capacity rises to 1 - 1/n of the raw capacity for n disks (80% with n = 5)
- easy to restore the bits of a single failed drive
  - for the missing data, which bit makes the parity correct?
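The parity rule and the rebuild question above can be worked through in code. This is a minimal sketch operating on whole bytes rather than single bits; the function names `xor_parity` and `rebuild_missing` are made up for the example.

```python
from functools import reduce


def xor_parity(blocks):
    """Bytewise XOR of equal-length blocks (even count of 1s gives 0, odd gives 1)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing block: XOR the survivors with the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]   # three data disks
parity = xor_parity(data)                         # stored on a fourth disk
recovered = rebuild_missing([data[0], data[2]], parity)  # disk 1 has failed
```

Because XOR is its own inverse, the same routine computes parity and answers "which bit makes the parity correct?" for a failed drive.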

Page 13: The HP AutoRAID Hierarchical Storage System

RAID 3: Combine Striping and Redundancy

Pros
- increased storage capacity: (1 - 1/N) of raw capacity
- high throughput for large files
- provides partial redundancy using parity

Cons
- parity is at the bit level
- poor performance for small I/O
- no parallel reads or writes possible because parity is on a single disk

Page 14: The HP AutoRAID Hierarchical Storage System

RAID 5: Spread Parity Across All Disks

Pros
- block-level striping
- allows hot-swappable disk replacement on failure
- small requests can be performed in parallel

Cons
- small writes require reading the old data, writing the new data, reading the corresponding old parity value, and writing the new parity value (the small-write problem)
- if the workload contains too many small writes, performance suffers dramatically
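The four I/Os behind the small-write problem can be counted in a short sketch. It assumes a stripe held as a Python list with one parity block; the names `small_write` and `xor_bytes` are hypothetical, and a real controller would issue these as disk requests rather than list accesses.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(stripe, parity, index, new_data):
    """Update one data block and its parity by read-modify-write.

    Returns (new_parity, io_count). The other data blocks are never
    touched, yet one logical write still costs four disk I/Os.
    """
    old_data = stripe[index]       # I/O 1: read the old data
    old_parity = parity            # I/O 2: read the old parity
    # Remove the old data's contribution, then add the new data's.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    stripe[index] = new_data       # I/O 3: write the new data
    return new_parity, 4           # I/O 4: write the new parity
```

A mirrored write, by contrast, needs only two writes and no reads, which is the gap the later slides set out to exploit.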

Page 20: The HP AutoRAID Hierarchical Storage System

All these levels, how do I choose the right one?

No level fits all occasions
- RAID 1 is fast but doesn't scale well (50% storage capacity)
- RAID 5 scales but can't handle multiple small writes

How can we combine the best of both levels?
- use RAID 1 for active data and RAID 5 for inactive data
- create a mapping that allows migration between the two
- assign a hierarchical preference to each level
- provide a way to migrate data between the two levels of the hierarchy

Page 21: The HP AutoRAID Hierarchical Storage System

Who Manages the Migration?

Not the system administrator
- error prone
- cannot adapt fast enough to a changing environment

Not the file system
- good idea, but not a portable solution

Could use an array controller if it were smart enough. It would have to
- identify active and inactive data
- migrate active data to mirrored storage and inactive data to RAID 5 storage
- provide a virtual disk to the existing file system
- be easy to configure

Page 22: The HP AutoRAID Hierarchical Storage System

HP AutoRAID

Super-intelligent array controller
- uses embedded software to manage the hierarchy
- presents virtual logical units to the file system
- the file system is unaware of
  - the storage hierarchy
  - the active/inactive grouping
  - data migration

It has to provide a mapping from virtual to physical addresses!

Page 23: The HP AutoRAID Hierarchical Storage System

Data Layout: placing the data on the disk

PEX (physical extent)
- 1 MB unit of disk space allocation
- these are the columns of data

PEG (physical extent group)
- group of at least three PEXes on different disks
- spread across disks to balance data

PEG states
- can be assigned to the mirrored storage class
- can be assigned to the RAID 5 storage class
- can be unassigned

Segment
- 128 KB of contiguous space
- included in a stripe or mirrored pair

RB (relocation block): 64 KB

LUN (logical unit): host-visible virtual disk

Stripe: row of parity and data segments in a RAID 5 PEG

Page 24: The HP AutoRAID Hierarchical Storage System

Mapping Structure
- LUN / virtual device tables: pointers to PEGs
- PEG tables: list of RBs, list of PEXes
- PEX tables: one per disk

[Figure: mapping structure. The OS file system addresses a LUN; the LUN/virtual device table maps RB1..RBn to PEG tables (PEG1..PEGn); each PEG table lists its RBs and its PEXes (PEX1, PEX2, PEX3); each PEX on Disk 1, Disk 2, Disk 3 has its own segment table.]
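One way to picture the table chain is as a two-step lookup from a host byte offset to a PEG and a slot inside it. The sketch below is an illustration under assumptions, not the controller's actual data structures: the classes `Lun` and `Peg`, their fields, and the use of Python dictionaries are invented here, and only the 64 KB RB size comes from the slides.

```python
RB_SIZE = 64 * 1024  # relocation block size given on the data layout slide


class Peg:
    """A physical extent group: a storage class plus the disks its PEXes live on."""

    def __init__(self, storage_class, pex_disks):
        self.storage_class = storage_class  # "mirrored", "raid5", or "unassigned"
        self.pex_disks = pex_disks          # one disk number per PEX
        self.rbs = {}                       # rb_number -> slot inside this PEG


class Lun:
    """Host-visible virtual disk: maps each RB to the PEG currently holding it."""

    def __init__(self):
        self.rb_to_peg = {}

    def place(self, rb_number, peg, slot):
        peg.rbs[rb_number] = slot
        self.rb_to_peg[rb_number] = peg

    def lookup(self, byte_offset):
        rb_number, offset_in_rb = divmod(byte_offset, RB_SIZE)
        peg = self.rb_to_peg[rb_number]
        return peg.storage_class, peg.pex_disks, peg.rbs[rb_number], offset_in_rb

    def migrate(self, rb_number, new_peg, slot):
        """Move an RB between storage classes; the host address never changes."""
        del self.rb_to_peg[rb_number].rbs[rb_number]
        self.place(rb_number, new_peg, slot)
```

The point of the indirection is visible in `migrate`: only the tables change, so the file system keeps using the same virtual address before and after the move.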

Page 25: The HP AutoRAID Hierarchical Storage System

HP AutoRAID: what can it do?
- initially the array starts out empty
- data is added to mirrored storage until it is full
- some mirrored storage is immediately reallocated to RAID 5 storage
  - just re-map PEXes in mirrored PEGs to RAID 5 PEGs
- as the workload changes
  - newly active data are promoted to mirrored storage
  - data that are less active are demoted to RAID 5 storage
  - all of this is done in the background, with no performance interference
- hot-pluggable disks allow a failed component to be removed while the system is running
- disks can be added to the array at any time, up to a maximum of 12
- controller fail-over support
- active hot spare to reduce the risk of having two drive failures
- RAID 5 uses log-structured writes for added performance
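The promote/demote behaviour can be sketched as a small two-level store. This is a simplified model under stated assumptions: "less active" is approximated as least recently written, demotion happens synchronously rather than in the background, and the class name `TwoLevelStore` and its `mirrored_capacity` parameter are invented for the example.

```python
from collections import OrderedDict


class TwoLevelStore:
    """Toy model: written RBs live in mirrored storage, cold RBs fall to RAID 5."""

    def __init__(self, mirrored_capacity):
        self.mirrored_capacity = mirrored_capacity  # RBs that fit in mirrored storage
        self.mirrored = OrderedDict()  # rb -> None, least recently written first
        self.raid5 = set()

    def write(self, rb):
        self.raid5.discard(rb)          # promotion: a written RB leaves RAID 5
        self.mirrored[rb] = None
        self.mirrored.move_to_end(rb)   # mark it as the most recently written
        while len(self.mirrored) > self.mirrored_capacity:
            victim, _ = self.mirrored.popitem(last=False)
            self.raid5.add(victim)      # demotion: the coldest RB moves to RAID 5
```

If the set of RBs being written is larger than `mirrored_capacity`, every write triggers a demotion, which is the thrashing case described on the performance slide.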

Page 27: The HP AutoRAID Hierarchical Storage System

HP AutoRAID is very Slick!
- added redundancy
- disks can be added to the array on the fly
- we pushed disk control from the file system to some fancy hardware with embedded software
- as far as the file system is concerned, we have solved all the problems, right?

Well, not really! RAID 5 uses log-structured writes, so what about the garbage collection?
- like layout balancing, garbage collection is done in the background
- this is done by identifying periods of idleness
- cleaning requires filling the holes left when data are promoted to the mirrored storage class

Page 28: The HP AutoRAID Hierarchical Storage System

Compaction, cleaning, hole plugging

Garbage collection of a RAID 5 PEG by hole plugging
- if the PEG is nearly full
  - RBs from almost empty PEGs are copied in to fill its holes
  - this minimizes data movement
- if the PEG is almost empty
  - its RBs are used to fill holes in the nearly full PEGs
- if the PEG is almost empty and no other holes are ready to be plugged
  - its valid RBs are written to the end of the log
  - the complete PEG is reclaimed as a unit
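The three cases above reduce to one routine for emptying an almost empty PEG. The sketch models a PEG as a Python list of slots where `None` marks a hole; that representation and the names `clean_peg` and `find_hole` are assumptions for illustration, and choosing which PEGs count as "almost empty" or "nearly full" is left to the caller.

```python
def find_hole(targets):
    """Return (peg, slot) for the first free slot in the nearly full PEGs, or None."""
    for peg in targets:
        for slot, rb in enumerate(peg):
            if rb is None:
                return peg, slot
    return None


def clean_peg(source, targets, log_tail):
    """Empty an almost empty PEG so that it can be reclaimed as a unit.

    Valid RBs plug holes in the nearly full PEGs when holes exist;
    otherwise they are appended to the end of the log.
    """
    for i, rb in enumerate(source):
        if rb is None:
            continue
        hole = find_hole(targets)
        if hole is not None:
            peg, slot = hole
            peg[slot] = rb          # hole plugging: no other data has to move
        else:
            log_tail.append(rb)     # no holes left: fall back to the log's end
        source[i] = None
    return source                   # every slot is now free
```

Plugging holes first keeps data movement small, since only the few live RBs in the sparse PEG are copied.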

Page 29: The HP AutoRAID Hierarchical Storage System

Performance

OLTP macrobenchmark results
- RAID: redundancy
- HP AutoRAID: redundancy
- JBOD-LVM: no redundancy
  - striping though, so geared for speed
- results are as expected

Transaction rate relative to number of disks
- working set too large for 5 drives
- write set doesn't fit entirely in mirrored storage
- thrashing causes poor performance

Page 30: The HP AutoRAID Hierarchical Storage System

Summary
- HP AutoRAID works well to provide performance and redundancy
- extremely easy to set up and use
- works in a variety of real-life environments
- provides outstanding general-purpose storage

Page 31: The HP AutoRAID Hierarchical Storage System

References
- Wilkes, John, et al. "The HP AutoRAID hierarchical storage system." Hewlett-Packard Laboratories.
- Patterson, David A. "A case for redundant arrays of inexpensive disks (RAID)." Department of Electrical Engineering, UC Berkeley.
- Henson, Val. "A Brief History of Unix File Systems." http://www.lugod.org/presentations/filesystems.pdf
- Rosenblum, Mendel, et al. "The design and implementation of a log-structured file system." Department of Electrical Engineering, UC Berkeley.
- McKusick, Marshall K., et al. "A fast file system for UNIX." Department of Electrical Engineering, UC Berkeley.
- RAID graphics from http://www.prepressure.com/techno/raid.htm
- Parity graphics from http://www.commodore.ca/windows/raid5/raid5.htm#Parity
- Tanenbaum, Andrew S. "Modern Operating Systems, 2nd Edition." Prentice-Hall of India.