A Systematic Approach to Designing a Storage System

Zhiqi Tao, Andreas Dilger, Eric Barton, Bryon Neitzel
Nov 16 2012


High Performance Data Division

Outline

•  Introduction

•  A Systematic Approach to Designing a Storage System

•  Case Study

•  Conclusion


Introduction

•  A good storage system is a well-balanced system.

o  Each individual component is suited for its purpose.

o  All the components fit together to achieve optimal performance.

o  A typical storage system comprises a variety of components:

[Figure: components of a typical storage system. Labels: Clients, Cluster Network, NAS Servers, HBA/NIC, SAN, Storage Controller, NIC, Disks and Enclosures]


Introduction

•  Years of experience have given many storage architects a collection of practical rules and guidelines about how to put together a storage system.

o  Do not mix different manufacturers’ hard disks in one RAID group.

•  Rules such as these are based on individual experience.

•  They may not be generally applicable and may even be outdated due to more recent advances in storage technology.


Introduction

•  This paper describes a methodical approach to architecting a large-scale, high-performance storage system.

•  A typical design process starts with a requirements analysis, a top-down process that creates a complete view of the system:

o  “Soft” requirements, such as identifying the users of the system and the type of input expected from the clients

o  “Hard” requirements, which are typically related to power, cost, rack space, and so forth.

•  Once the system boundaries are understood, the performance requirements can be determined at the layer or component level. The system can then be built up one component at a time.


A Systematic Approach to Designing a Storage System

•  The storage architect often needs to address two questions:

1.  What are the requirements for capacity, performance, and fault tolerance relative to the workload presented by clients?

2.  How can these requirements be satisfied at minimal cost?

•  How well and how quickly storage architects can find answers to these questions often relies primarily on their level of experience.

•  We propose a systematic methodology to facilitate the design process.


Pipeline

•  The components of a storage system are aligned like a pipeline. For each I/O operation, data “flows” through the pipeline. The faster and more reliable the flow, the better the pipeline is.

[Figure: Storage Layers in the Pipeline. Labels: Clients, Cluster Network, NAS Servers, HBA/NIC, SAN, Storage Controller, NIC, Disks and Enclosures]


Pipeline

•  The narrowest point in the pipeline determines the throughput of the pipeline. Thus, improving the performance of one or two components in the storage system pipeline will not improve overall performance if other components are “restricting” the flow of data.

•  As the storage system becomes more complex, each additional layer adds overhead that can cause performance degradation. To achieve optimal performance, it is best to minimize such degradations.
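The "narrowest point" rule can be sketched in a few lines of Python. The stage names and GB/s figures below are hypothetical and chosen only for illustration; they are not measurements from this deck.

```python
# Hypothetical per-stage throughput in GB/s (illustrative values only).
stages = {
    "disks_and_enclosures": 7.0,
    "storage_controller": 5.0,
    "san": 4.5,
    "server_hba": 4.0,
    "server_nic": 5.0,
    "cluster_network": 6.0,
}


def bottleneck(stage_throughput):
    """Return (stage_name, throughput) for the slowest stage in the pipeline."""
    name = min(stage_throughput, key=stage_throughput.get)
    return name, stage_throughput[name]


name, limit = bottleneck(stages)
print(f"pipeline throughput is capped at {limit} GB/s by {name}")

# Doubling a stage that is not the bottleneck leaves the cap unchanged.
upgraded = dict(stages, disks_and_enclosures=14.0)
assert bottleneck(upgraded) == (name, limit)
```

In this sketch, money spent on faster disks buys nothing until the slowest stage is widened, which is the point of the bullet above.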


Using an Iterative Design Process

1.  Design to meet requirements

2.  Test that design meets requirements

3.  Modify design as needed

4.  Add next component in pipeline to components designed so far


Case Study

A hypothetical storage system

•  A single namespace

•  10 PB usable space

•  100 GB/s aggregate bandwidth

•  No single point of failure


A Simple Generic Lustre System

[Figure: diagram of a simple generic Lustre system; surviving label: 10GbE]


Disks and Disk Enclosures

•  The first step is to determine a disk configuration that meets the requirements for capacity (10 PB usable space) and performance (100 GB/s aggregate bandwidth).

•  Capacity vs. Bandwidth vs. IOPS
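As a rough sketch of the capacity versus bandwidth trade-off, the disk count can be estimated from both requirements and the larger result taken. The 10 PB and 100 GB/s targets come from the case study. The 3 TB disk size, the 8+2 RAID 6 layout and especially the 100 MB/s sustained per-disk rate are assumptions made here for illustration. File system overhead and IOPS are ignored.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def size_disk_tier(usable_bytes, bandwidth_bytes_s, disk_bytes, disk_bytes_s,
                   data_disks=8, parity_disks=2):
    """Return (groups needed for capacity, groups needed for bandwidth, total disks)."""
    groups_for_capacity = ceil_div(usable_bytes, data_disks * disk_bytes)
    groups_for_bandwidth = ceil_div(bandwidth_bytes_s, data_disks * disk_bytes_s)
    groups = max(groups_for_capacity, groups_for_bandwidth)
    return groups_for_capacity, groups_for_bandwidth, groups * (data_disks + parity_disks)


cap_groups, bw_groups, total_disks = size_disk_tier(
    usable_bytes=10 * 10**15,        # 10 PB usable (case study requirement)
    bandwidth_bytes_s=100 * 10**9,   # 100 GB/s aggregate (case study requirement)
    disk_bytes=3 * 10**12,           # assumed 3 TB disks
    disk_bytes_s=100 * 10**6,        # assumed 100 MB/s sustained per disk (hypothetical)
)
print(cap_groups, bw_groups, total_disks)
```

Under these assumed figures capacity, not bandwidth, sets the disk count; smaller or slower disks could reverse that.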


Backend Storage

•  Lustre can work with any block device, but it has no knowledge of or control over the backend storage.

– Use RAID to protect against disk failures

– RAID 10 on MDT

– RAID 6 on OST

– Lustre does 1 MB I/O. A popular configuration is 10 disks in RAID 6 (8 data + 2 parity) with the segment size set to 128 KB.
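The arithmetic behind the 8+2 layout can be checked directly: eight data segments of 128 KB make a 1 MB full stripe, which matches the Lustre I/O size. In general, a write that covers a whole stripe lets the controller compute parity without first reading old data. The sketch below assumes "KB" and "MB" here mean KiB and MiB; the function names are illustrative.

```python
KIB = 1024


def full_stripe_bytes(data_disks, segment_bytes):
    """Bytes of user data in one full RAID stripe (parity excluded)."""
    return data_disks * segment_bytes


def is_stripe_aligned(io_bytes, data_disks, segment_bytes):
    """True if an I/O of io_bytes covers a whole number of full stripes."""
    return io_bytes % full_stripe_bytes(data_disks, segment_bytes) == 0


lustre_io = 1024 * KIB  # 1 MB Lustre I/O, taken as 1 MiB

print(full_stripe_bytes(8, 128 * KIB) == lustre_io)   # 8 data disks x 128 KiB
print(is_stripe_aligned(lustre_io, 8, 128 * KIB))     # 8+2 layout
print(is_stripe_aligned(lustre_io, 9, 128 * KIB))     # a 9+2 layout, for contrast
```

With nine data disks the full stripe would be 1152 KiB, so a 1 MiB write would only partly fill it.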


Backend Storage

•  Disk capacity is measured in decimal GB (10^9 bytes) while file system capacity is measured in binary GiB (2^30 bytes)

–  For example, a 3 TB hard disk is reported as about 2,794 GiB (roughly 2.73 TiB)

•  RAID, LVM, partitioning, and the file system each consume some percentage of the capacity.

•  Metadata is typically 1-2 percent of file system capacity, but it is better to over-budget the Metadata Target.

–  2 KB per inode in Lustre 2.x
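The unit conversion and a first-pass Metadata Target estimate can be sketched as follows. The 2 KB per inode figure is taken from the slide and read as 2048 bytes. The file count of one billion and the 2x headroom factor are hypothetical values chosen only to show the calculation.

```python
def decimal_tb_to_gib(tb):
    """Convert decimal terabytes (10^12 bytes) to binary GiB (2^30 bytes)."""
    return tb * 10**12 / 2**30


def decimal_tb_to_tib(tb):
    """Convert decimal terabytes (10^12 bytes) to binary TiB (2^40 bytes)."""
    return tb * 10**12 / 2**40


def mdt_bytes(file_count, bytes_per_inode=2048, headroom=2.0):
    """Estimate MDT size: inodes x bytes per inode, times an over-budget factor."""
    return int(file_count * bytes_per_inode * headroom)


print(round(decimal_tb_to_gib(3), 2))   # a 3 TB disk expressed in GiB
print(round(decimal_tb_to_tib(3), 3))   # the same disk expressed in TiB
print(mdt_bytes(10**9) / 10**12)        # MDT size in decimal TB for 1e9 files (hypothetical)
```

The estimate covers inode space only; RAID 10 mirroring on the MDT doubles the raw disk needed on top of it.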


Backend Storage

•  “Hidden” caches must be protected

– Lustre has a built-in mechanism to protect caches visible to Lustre.

– Cache mirroring and battery-backed cache on storage controllers

– Turn off HBA cache

[Figure: Storage Controller with Cache Mirroring. Labels: Sub-Controller Module A, Sub-Controller Module B, Disk Enclosure, Disk Enclosure]


Degradations in the pipeline


[Figure: Storage Layers in the Pipeline. Labels: Clients, Cluster Network, NAS Servers, HBA/NIC, SAN, Storage Controller, NIC, Disks and Enclosures]

7.2 GB/s | 4 GB/s | 4 GB/s | 3.6 GB/s | 3.6 GB/s
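The figures above can be turned into per-step retention ratios. The sketch assumes they are listed in pipeline order for a single storage building block, which the extracted slide does not state. It also assumes the 100 GB/s case study target would be met by replicating that block.

```python
import math

# Throughput figures from the slide, in GB/s, assumed to be in pipeline order.
observed = [7.2, 4.0, 4.0, 3.6, 3.6]

# Fraction of throughput retained at each step from one point to the next.
retained = [after / before for before, after in zip(observed, observed[1:])]

# Fraction of the first figure that survives to the end of the pipeline.
end_to_end = observed[-1] / observed[0]

# Building blocks needed for 100 GB/s if each delivers the final figure (assumption).
blocks_needed = math.ceil(100 / observed[-1])

print([round(r, 3) for r in retained])
print(end_to_end, blocks_needed)
```

On this reading, the largest single loss is at the first step, so that is where design effort would pay off most.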


Network Consideration

•  An un-optimized network architecture can potentially be a limiting factor.

[Figure: two InfiniBand fabrics built from 36-port IB switches. 2:1 oversubscribed InfiniBand Fabric: 24 ports, 24 ports, 12 ports. Non-oversubscribed InfiniBand Fabric: 18 ports, 18 ports, 18 ports.]
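The two fabric layouts can be compared by their oversubscription ratio. The sketch assumes that, on each 36-port switch, the larger port count faces the nodes and the smaller one carries inter-switch links; the figure residue does not state this split explicitly.

```python
from fractions import Fraction


def oversubscription(node_ports, uplink_ports):
    """Ratio of node-facing ports to inter-switch (uplink) ports on one switch."""
    return Fraction(node_ports, uplink_ports)


def worst_case_share(node_ports, uplink_ports):
    """Fraction of link bandwidth each node gets if all nodes use the uplinks at once."""
    return min(Fraction(1), Fraction(uplink_ports, node_ports))


for node_ports, uplink_ports in [(24, 12), (18, 18)]:
    assert node_ports + uplink_ports == 36  # all ports of a 36-port switch in use
    print(node_ports, uplink_ports,
          oversubscription(node_ports, uplink_ports),
          worst_case_share(node_ports, uplink_ports))
```

The 24/12 split connects more nodes per switch, but each node may see only half of its link bandwidth across switches under full load.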


Conclusion

•  The process of architecting a storage system is not straightforward, as many different aspects must be considered.

o  Design the backend disk storage first, then gradually move up the storage “pipeline” to the client.

o  Iteratively review the design and incrementally add more limiting factors for consideration.

Thank You
