A Systematic Approach to Designing a Storage System
Zhiqi Tao, Andreas Dilger, Eric Barton, Bryon Neitzel Nov 16 2012

High Performance Data Division
Outline
• Introduction • A Systematic Approach to Designing a Storage System • Case Study • Conclusion

Introduction
• A good storage system is a well-balanced system.
o Each individual component is suited for its purpose.
o All the components fit together to achieve optimal performance.
o A typical storage system comprises a variety of components:
[Figure: clients, cluster network, NAS servers (HBA/NIC), SAN, storage controller (NIC), disks and enclosures]

Introduction
• Years of experience have given many storage architects a collection of practical rules and guidelines about how to put together a storage system.
o Do not mix different manufacturers’ hard disks in one RAID group.
• Rules such as these are based on individual experience.
• They may not be generally applicable and may even be outdated due to more recent advances in storage technology.

Introduction
• This paper describes a methodical approach to architecting a large-scale, high-performance storage system.
• A typical design process starts with a requirements analysis, a top-down process that creates a complete view of the system
o “Soft” requirements, such as identifying the users of the system and the type of input expected from the clients
o “Hard” requirements, which are typically related to power, cost, rack space, and so forth.
• Once the system boundaries are understood, the performance requirements can be determined at the layer or component level. The system can then be built up one component at a time.

A Systematic Approach to Designing a Storage System
• The storage architect often needs to address two questions:
1. What are the requirements for capacity, performance, and fault tolerance relative to the workload presented by clients?
2. How can these requirements be satisfied with minimal costs?
• How well and how quickly storage architects can find answers to these questions often relies primarily on their level of experience.
• We propose a systematic methodology to facilitate the design process.

Pipeline
• The components of a storage system are aligned like a pipeline. For each I/O operation, data “flows” through the pipeline. The faster and more reliable the flow, the better the pipeline is.
[Figure: Storage Layers in the Pipeline: clients, cluster network, NAS servers (HBA/NIC), SAN, storage controller (NIC), disks and enclosures]

Pipeline
• The narrowest point in the pipeline determines the throughput of the pipeline. Thus, improving the performance of one or two components in the storage system pipeline will not improve overall performance if other components are “restricting” the flow of data.
• As the storage system becomes more complex, each additional layer adds overhead that can cause performance degradation. To achieve optimal performance, it is best to minimize such degradations.
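The bottleneck rule can be illustrated with a minimal sketch. The stage names and throughput figures below are hypothetical and chosen only for illustration; the point is that the pipeline runs at the speed of its slowest stage, so speeding up any other stage changes nothing.

```python
def pipeline_throughput(stages):
    """Return (bottleneck_name, throughput) for a serial pipeline.

    `stages` maps a stage name to its sustained throughput in GB/s.
    """
    name = min(stages, key=stages.get)
    return name, stages[name]


# Hypothetical figures, for illustration only.
stages = {
    "disks": 6.0,
    "controller": 5.0,
    "san": 4.0,
    "server": 4.5,
    "network": 3.0,
}

bottleneck, rate = pipeline_throughput(stages)

# Doubling disk throughput changes nothing while the network is the limit.
faster_disks = dict(stages, disks=12.0)
assert pipeline_throughput(faster_disks) == (bottleneck, rate)

print(bottleneck, rate)
```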

Using an Iterative Design Process
1. Design to meet requirements.
2. Test that the design meets requirements.
3. Modify the design as needed.
4. Add the next component in the pipeline to the components designed so far, then repeat.
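One way to picture the design, test, modify, and add-next-component cycle is the sketch below. It is an assumption-laden simplification: each layer is reduced to a single hypothetical per-unit throughput, "testing" is just a bandwidth check, and "modifying" means adding one more unit. The layer names and per-unit figures are invented for illustration.

```python
def design_pipeline(layers, target_gbps):
    """Size each layer in turn so that it meets target_gbps.

    `layers` is a list of (name, per_unit_gbps) pairs ordered from the
    disks toward the clients. Returns a dict of units needed per layer.
    """
    design = {}
    for name, per_unit_gbps in layers:              # add the next component
        units = 1                                   # initial design
        while units * per_unit_gbps < target_gbps:  # test against requirement
            units += 1                              # modify the design
        design[name] = units
    return design


# Hypothetical per-unit throughputs in GB/s, bottom of the pipeline first.
layers = [("disk_group", 0.5), ("controller", 4.0), ("server", 3.0)]
design = design_pipeline(layers, target_gbps=100)
print(design)
```

A real design pass would also check capacity, fault tolerance, and cost at every step, not bandwidth alone.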

Case Study
A hypothetical storage system
• A single namespace
• 10 PB usable space
• 100 GB/s aggregate bandwidth
• No single point of failure

A Simple Generic Lustre System
[Figure: a simple generic Lustre system (label: 10GbE)]

Disks and Disk Enclosures
• The first step is to determine a disk configuration that meets the requirement for capacity (10 PB usable space) and performance (100 GB/s aggregate bandwidth).
• Capacity vs. Bandwidth vs. IOPS

Backend Storage
• Lustre can work with any block device, but Lustre has no knowledge of or control over the backend storage.
– Use RAID to protect against disk failures
– RAID 10 on MDT
– RAID 6 on OST
– Lustre does 1 MB I/O. A popular configuration is 10 disks in RAID 6 (8 data + 2 parity) with the segment size set to 128 KB, so that one full stripe (8 × 128 KB) equals 1 MB.
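The arithmetic behind the 8 + 2 layout can be checked with a short sketch. The function name is hypothetical; the numbers come from the slide. With a full stripe equal to the 1 MB I/O size, a write covers every data segment of the stripe, which is the usual reason given for matching the two.

```python
def full_stripe_kib(total_disks, parity_disks, segment_kib):
    """Size of one full RAID stripe (data portion only) in KiB."""
    data_disks = total_disks - parity_disks
    return data_disks * segment_kib


# 10 disks in RAID 6 (8 data + 2 parity), 128 KB segments.
stripe = full_stripe_kib(total_disks=10, parity_disks=2, segment_kib=128)
lustre_io_kib = 1024  # 1 MB I/O size

print(stripe, stripe == lustre_io_kib)
```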

Backend Storage
• Disk capacity is measured in decimal GB (10^9 bytes), while file system capacity is measured in binary GiB (2^30 bytes).
– For example, a 3 TB hard disk holds about 2,794 GiB (roughly 2.73 TiB)
• RAID, LVM, partitioning, and the file system each consume some percentage of the capacity.
• Metadata is typically 1-2 percent of file system capacity, but it is better to over-budget the Metadata Target.
– 2K per inode in Lustre 2.x
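A rough sizing pass for the case study might look like the sketch below. It rests on several assumptions that are not stated on the slides: the 10 PB target is taken as decimal petabytes, disks are 3 TB drives in RAID 6 (8 + 2) groups, "2K per inode" is read as 2 KiB, and the file count of one billion is hypothetical. The `overhead` parameter is a placeholder for LVM, partitioning, and file system losses.

```python
import math

TB = 10**12   # decimal terabyte, as used on disk labels
GIB = 2**30   # binary gibibyte, as reported by file systems
PB = 10**15   # decimal petabyte (assumption: the 10 PB target is decimal)


def raid_groups_needed(usable_bytes, disk_bytes, data_disks, overhead=0.0):
    """Number of RAID groups needed to provide usable_bytes.

    `overhead` is the fraction lost to LVM, partitioning, and the file
    system on top of RAID parity (a hypothetical tuning knob).
    """
    per_group = disk_bytes * data_disks * (1.0 - overhead)
    return math.ceil(usable_bytes / per_group)


disk_gib = 3 * TB / GIB                    # about 2794 GiB per 3 TB disk
groups = raid_groups_needed(10 * PB, 3 * TB, data_disks=8)
disks = groups * 10                        # 8 data + 2 parity per group

# MDT sizing at 2 KiB per inode for a hypothetical one billion files.
mdt_bytes = 10**9 * 2 * 1024

print(round(disk_gib), groups, disks, mdt_bytes / TB)
```

Raising `overhead` above zero increases the group count, which is why the usable figure should be budgeted after all layers have taken their share.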

Backend Storage
• “Hidden” caches must be protected
– Lustre has a built-in mechanism to protect caches visible to Lustre.
– Cache mirroring and battery-backed cache on storage controllers
– Turn off HBA cache
[Figure: a storage controller with two sub-controller modules (A and B) linked by cache mirroring, and two disk enclosures]

Degradations in the pipeline
[Figure: Storage Layers in the Pipeline (clients, cluster network, NAS servers with HBA/NIC, SAN, storage controller with NIC, disks and enclosures), annotated with per-stage throughput: 7.2 GB/s | 4 GB/s | 4 GB/s | 3.6 GB/s | 3.6 GB/s]
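The per-stage figures above suggest how many such pipelines the case study would need. The sketch below assumes, without confirmation from the slides, that the five values describe one independent building block and that aggregate bandwidth scales linearly with the number of blocks.

```python
import math

# Per-stage throughput from the figure, in GB/s, in pipeline order.
stage_gbps = [7.2, 4.0, 4.0, 3.6, 3.6]

# One building block delivers only what its slowest stage allows.
block_gbps = min(stage_gbps)

# Blocks needed for the 100 GB/s aggregate target (linear scaling assumed).
target_gbps = 100
blocks = math.ceil(target_gbps / block_gbps)

print(block_gbps, blocks)
```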

Network Consideration
• An un-optimized network architecture can potentially be a limiting factor.
[Figure: two fabrics built from 36-port IB switches. 2:1 oversubscribed InfiniBand fabric (port groups: 24, 24, 12); non-oversubscribed InfiniBand fabric (port groups: 18, 18, 18)]
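The two fabric layouts can be compared with a small calculation. This sketch assumes that on each 36-port switch the larger port group faces the nodes and the remaining ports are inter-switch links, and that all links run at the same speed; the function name is hypothetical.

```python
from fractions import Fraction


def oversubscription(node_ports, uplink_ports):
    """Ratio of node-facing bandwidth to uplink bandwidth on one switch."""
    return Fraction(node_ports, uplink_ports)


# 36-port switch split 24 down / 12 up, and 18 down / 18 up.
oversubscribed = oversubscription(24, 12)
balanced = oversubscription(18, 18)

print(f"{oversubscribed}:1", f"{balanced}:1")
```

A ratio above 1 means that traffic crossing between switches can be limited by the uplinks even when every other stage of the pipeline has headroom.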

Conclusion
• The process of architecting a storage system is not straightforward, as many different aspects must be considered.
o Design the backend disk storage first and then gradually move up the storage “pipeline” to the client.
o Iteratively review the design and incrementally add more limiting factors for consideration.

Thank You