EMC Proven Professional Knowledge Sharing 2010
Best Practices for designing, deploying, and administering SAN using EMC CLARiiON Storage Systems
Anuj Sharma
[email protected]
TABLE OF CONTENTS

I. Abstract
II. Executive Summary
III. Introduction
IV. Essentials
A. Designing SAN
   1. SAN Topology Considerations
   2. SAN Topologies
   3. Information Gathering
   4. Choosing a Switch Type
   5. Sample SAN Fabric
B. Implementation Phase
   1. Switch and Zoning Best Practices
   2. IP SAN Best Practices
   3. RAID Group Best Practices
   4. HBA Tuning
   5. Hot Sparing Best Practices
   6. Optimizing Cache
   7. Vault Drive Best Practices
   8. Virtual Provisioning Best Practices
   9. Drive Spin Down Technology
   10. Aligning File System
C. Post Implementation Phase
   1. Health Checkup
   2. Performance Monitoring
LIST OF FIGURES

1. Digital Universe
2. SAN Features
3. Deployment Phases
4. Single Switch Topology
5. Full Mesh Topology
6. Partial Full Mesh Topology
7. Core Edge Fabric
8. SAN Fabric
9. Execution Throttle Change Snapshot
10. Extop Screenshot
11. Extop Screenshot
12. Changing Queue Depth
13. Thin Provisioning
14. Creating Storage Pools
15. Disk Drive Spin Down
16. Disk Crossing
17. DAE Checkup
18. LCC Checkup
19. Disk Module Checkup
20. DAE Checkup
21. SPE Checkup
22. SPE Checkup
23. Navisphere Analyzer
24. Navisphere Analyzer

LIST OF TABLES

1. Single Switch Subjective Rating
2. Full Mesh Switch Subjective Rating
3. Partial Mesh Switch Subjective Rating
4. Core Edge Subjective Rating
5. IOPS Requirements
6. Switch Case Scenario
7. 64 Switch Case Scenario
8. Increased ISLs between core and edge switches
9. HBA Tuning Parameters
10. Hot Spare Provisioning Example
11. Cache Recommendations
12. Vault Drive Recommendations
13. Data Center Environment Requirements
I. ABSTRACT

If your Windows or UNIX networks are expanding to keep pace with a growing business, you
need more than simple, server-based information storage solutions. You need an enterprise-
capable, fault-tolerant, high-availability solution, and EMC® CLARiiON® products are the
answer. CLARiiON storage systems employ the industry’s most extensive set of data integrity
and data availability features, such as dual-active storage processors, mirrored write caching,
data verification, and fault-recovery algorithms, and support complex cluster configurations,
including Oracle and Microsoft software. CLARiiON data storage solutions have long been
recognized as the most robust and innovative in the industry. Features of CLARiiON systems
include:
• Flash drives
• UltraFlex™ technology
• Fibre Channel/iSCSI connectivity
• Virtual provisioning
• Tiered storage
• Virtualization-aware management
• Virtual LUN technology
• Drive Spin-Down Technology
The performance and flexibility of CLARiiON storage systems have to be backed by superior
availability. This is where CLARiiON really shines. Its design has no single point of failure, all
the way down to the fans and power cords.
This paper includes the practices that I follow and recommend to realize the benefit of
CLARiiON’s features and optimally utilize its resources to maximize performance, from initial
solution design, to implementation, to SAN administration.
When designing a SAN, there are many important things to consider before you jump right in.
For starters, you need to know how the components fit together in order to choose a SAN
design that will work for you. Like most storage managers, you'll want to design your SAN to
fit today's storage needs as well as meet tomorrow's increased storage capacity requirements.
Aside from being scalable, the ideal SAN should also be designed for resiliency and high
availability with the least amount of latency. This article will touch on the following topics:
• Initial SAN solution design
• SAN topologies to be considered for different SAN deployments
• Zoning best practices
• Practices that should be followed while implementing the SAN
• Using thin provisioning the way it should be used, and how it brings utilization and capacity benefits
• VMware ESX Server with EMC CLARiiON storage systems
• …and many more
This article will benefit anyone who implements, manages, or administers SAN using EMC
CLARiiON storage systems.
II. Executive Summary
As an enterprise grows and evolves, the organization's data grows exponentially, along with
its data storage requirements. Meeting legal and organizational compliance requirements
has become more difficult as data retention periods have been extended. To avoid future
trouble, organizations have started taking compliance more seriously.
According to an IDC survey, by 2011, the digital universe will be 10 times its size in 2006.
The diversity of the digital universe can be seen in the variability of file sizes, from six
gigabyte movies on DVD to 128-bit signals from RFID tags. Because of the growth of VoIP,
sensors, and RFID, the number of electronic information “containers” — files, images,
packets, tag contents — is growing 50% faster than the number of gigabytes. The information
created in 2011 will be contained in more than 20 quadrillion — 20 million billion — of such
containers, a tremendous management challenge for both businesses and consumers.
Meanwhile, media, entertainment, and communications industries will account for 10 times
their share of the digital universe in 2011 as their portion of worldwide gross economic output.
The picture related to the source and governance of digital information remains intact:
approximately 70% of the digital universe is created by individuals, but enterprises are
responsible for the security, privacy, reliability, and compliance of 85%. So the requirement
for storage media is increasing twofold, day by day.
Figure 1 Digital Universe
DAS and NAS implementations allow companies to store and access data effectively, but
often inefficiently. This leads to the isolation of storage to the specific devices, making it
difficult to manage and share. Storage area networks (SANs) have the advantage of
centralization, resulting in improved efficiencies. A SAN is a dedicated storage network that
solves many of the complex business data storage needs. Fibre Channel switches enable
increased connectivity and performance, allowing for interconnected SANs and, ultimately,
enterprise-level accessibility of SAN applications and data.
As SANs continue to grow, many factors need to be considered to help scale and manage
them. A SAN should be designed with present and future needs in mind, keeping in mind the
needs of the data center manager. For example:
• 24X7 Data Availability
• Flexible Architecture
• Resilient and Robust Architecture
• Cost Effectiveness
• Hassle-Free Information Management
• Scalable Infrastructure
• Optimally catering to the bandwidth requirements of different applications
Figure 2 SAN FEATURES
The storage system is the most important component of a SAN. The speed and efficiency with which
storage arrays can respond to I/O requests from the servers is critical for minimizing
transaction response times.
In addition to speed and efficiency, the storage system should meet the following critical requirements:
• Availability – Ensure that data is accessible at all times when needed. Loss of access to
data can have significant financial impact on businesses.
• Security – Prevent unauthorized access to data. Mechanisms must be in place to allow
servers to access only their allocated resources on storage arrays.
• Capacity – Ability to add storage capacity “on-demand”, without interruption to the
business. If a database runs out of physical storage space, it comes to a halt, thus impacting
the business.
• Scalability – The storage solution should be able to grow with the business. As the
business grows, more servers are deployed and new applications/databases developed.
• Performance – Service all the I/O requests at high speed. With the centralized model,
several servers connect to one storage array. The intelligence of the array, the processors,
and architecture should enable optimal performance.
• Data Integrity – Throughout the I/O chain, checks have to be in place to ensure that data is
not corrupted along the way. The storage system has to “guarantee” that the data that was
sent to it was indeed the data that was written to disk and is available for retrieval when
requested.
• Manageability – The operations and activities required to meet all of these requirements
should be performed seamlessly and with minimal disruption to business activity.
Also, a 2006 IDC study found that power and cooling costs are escalating rapidly as newer,
denser servers and storage come online. Customers building new data centers are planning
for “Green IT”, a hot topic in IT circles. Today’s storage systems should have the intelligence
to use power wisely. EMC CLARiiON addresses this concern with the new Drive Spin Down
technology. While CLARiiON, as an intelligent storage system, meets all of the above critical
requirements, it may not always meet all the requirements of an administrator unless certain
practices are followed in the pre-implementation and implementation phases. There are
practices that we should follow while implementing a SAN that maximize resource utilization
and optimize storage system performance.
III. INTRODUCTION
This paper focuses on large IP-SAN or FC-SAN deployments within a data center, and
provides best practices and design considerations when designing a reliable and efficient
SAN using EMC CLARiiON storage systems.
This paper comprises the following three sections:
• Designing SAN
• Implementing SAN
• Administering SAN
Figure 3 Phases
Each section comprises best practices that I feel should be followed at different stages of
SAN deployment to optimally utilize the available resources. Having top-of-the-line SAN
equipment does not guarantee optimal performance. We need to focus on certain parameters
while designing, implementing, and administering a SAN to get optimal performance out of
the available resources.
Apart from best practices, this paper will also focus on the features that enable EMC
CLARiiON storage systems to stand tall among the competitors.
IV. ESSENTIALS

Domain ID
A byte-wide field in the three-byte Fibre Channel address that uniquely identifies a switch in a
fabric. The three fields in a FCID are domain, area, and port. A distinct Domain ID is
requested from the principal switch. The principal switch allocates one Domain ID to each
switch in the fabric. A user may be able to set a Preferred ID which can be requested of the
Principal switch, or set an Insistent Domain ID. If two switches insist on the same DID one or
both switches will segment from the fabric.
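The three-field layout of the Fibre Channel address can be illustrated with a short sketch; the function name and sample address below are invented for illustration, not part of any EMC tool:

```python
def parse_fcid(fcid: int) -> dict:
    """Split a 24-bit Fibre Channel address into its three byte-wide fields."""
    return {
        "domain": (fcid >> 16) & 0xFF,  # Domain ID: identifies the switch
        "area":   (fcid >> 8) & 0xFF,   # area within the switch
        "port":   fcid & 0xFF,          # port on the switch
    }

# A hypothetical address 0x6A0B2C decomposes as domain 0x6A, area 0x0B, port 0x2C:
print(parse_fcid(0x6A0B2C))
```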
Director Switches
An enterprise-class Fibre Channel switch, such as the Connectrix® ED-140M, MDS 9509, or
ED-48000B. Directors deliver high availability, failure ride-through, and repair under power to
ensure maximum uptime for business-critical applications. Major assemblies, such as power
supplies, fan modules, switch controller cards, switching elements, and port modules, are all
hot-swappable. The term director may also refer to a board-level module in the Symmetrix®
that provides the interface between host channels (through an associated adapter module in
the Symmetrix) and Symmetrix disk devices.

Interswitch Link (ISL)
A physical E_Port connection between any two switches in a Fibre Channel fabric. An ISL
forms a hop in a fabric.
HBA
A bus card in a host system that allows the host system to connect to the storage system.
Typically, the HBA communicates with the host over a PCI or PCI Express bus and has a
single Fibre Channel link to the fabric. The HBA contains an embedded microprocessor with
on-board firmware, one or more ASICs, and a Small Form Factor Pluggable module (SFP) to
connect to the Fibre Channel link.
Fabric
One or more switching devices that interconnect Fibre Channel N_Ports, and route Fibre
Channel frames based on destination IDs in the frame headers. A fabric provides discovery,
path provisioning, and state change management services for a Fibre Channel environment.
LUN
In computer storage, a logical unit number (LUN) is simply the number assigned to a logical
unit. A logical unit is a SCSI protocol entity, the only one which may be addressed by the
actual input/output (I/O) operations. Each SCSI target provides one or more logical units, and
does not perform I/O as itself, but only on behalf of a specific logical unit.
NAS
Network-attached storage (NAS) is file-level computer data storage connected to a computer
network providing data access to heterogeneous network clients. A NAS unit is essentially a
self-contained computer connected to a network, with the sole purpose of supplying file-
based data storage services to other devices on the network. The operating system and other
software on the NAS unit provide the functionality of data storage, file systems, and access to
files, as well as the management of these functionalities.
Over Subscription
The ratio of bandwidth required to bandwidth available. A switch is oversubscribed when all
ports, associated pair-wise in any random fashion, cannot sustain full duplex at full line rate.
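As a rough sketch, the ratio can be computed directly; the port counts and the 4 Gb/s link speed below are assumed for illustration:

```python
def oversubscription_ratio(required_gbps: float, available_gbps: float) -> float:
    """Bandwidth required divided by bandwidth available; > 1.0 means oversubscribed."""
    return required_gbps / available_gbps

# 32 host-facing ports at 4 Gb/s contending for 8 uplink ports at 4 Gb/s:
print(oversubscription_ratio(32 * 4, 8 * 4))  # 4.0, i.e. a 4:1 oversubscription
```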
Port Fencing
Port fencing is a policy-based feature that allows you to protect your SAN from repeated
operational or security problems experienced by switch ports. Port fencing allows you to set
threshold limits on the number of specific port events permitted during a given time period. If
the port generates more events during the specified time period, the Connectrix Manager
(Port Fencing feature) blocks the port, disabling transmit and receive traffic, until you have
time to investigate, solve the problem, and manually unblock the port.
Principal Switch
The switch in a multiswitch fabric that allocates domain IDs to itself and to all
other switches in the fabric. There is always one principal switch in a fabric. If a switch is not
connected to any other switches, it acts as its own principal switch.
SAN
A storage area network (SAN) is an architecture to attach remote computer storage devices
(such as disk arrays, tape libraries, and optical jukeboxes) to servers in such a way that the
devices appear locally attached to the operating system. Although SAN cost and complexity
are dropping, SANs are still uncommon outside larger enterprises.
VSAN
An allocation of switch ports that can span multiple physical switches, forming a virtual fabric.
A single physical switch can sometimes host more than one VSAN.
World Wide Node Name
A unique identifier, even on global networks. The WWN is a 64-bit number
(XX:XX:XX:XX:XX:XX:XX:XX). The WWN contains an OUI, which uniquely identifies the
equipment manufacturer. OUIs are administered by the Institute of Electrical and Electronics
Engineers (IEEE). The Fibre Channel environment uses two types of WWNs: a World Wide
Node Name (WWNN) and a World Wide Port Name (WWPN). Typically, the WWPN is used
for zoning (path provisioning function).
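A small sketch can validate the colon-separated WWN format and pull out the OUI. It assumes an IEEE registered-format WWN in which the OUI occupies bytes 3 through 5; the sample WWPN is made up for illustration:

```python
import re

# Eight colon-separated hex byte pairs, e.g. 10:00:00:00:c9:12:34:56
WWN_RE = re.compile(r"^([0-9A-Fa-f]{2}:){7}[0-9A-Fa-f]{2}$")

def wwn_oui(wwn: str) -> str:
    """Validate the 64-bit WWN format and return the manufacturer's OUI bytes.

    Assumes an IEEE registered (NAA Type 1) format, where the OUI occupies
    bytes 3-5 of the eight-byte WWN.
    """
    if not WWN_RE.match(wwn):
        raise ValueError(f"not a valid WWN: {wwn!r}")
    return ":".join(wwn.split(":")[2:5]).upper()

print(wwn_oui("10:00:00:00:c9:12:34:56"))  # 00:00:C9
```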
Zone
An information object implemented by the distributed Name Server (dNS) of a Fibre Channel
switch. A zone contains a set of members which are permitted to discover and communicate
with one another. The members can be identified by a WWPN or port ID. EMC recommends
the use of WWPNs in zone management. Zoning allows an administrator to group several
devices by function or by location. All devices connected to a connectivity product, such as a
Connectrix switch, may be configured into one or more zones.
Zone Set
An information object implemented by the distributed Name Server (dNS) of a Fibre Channel
switch. A Zone Set contains a set of Zones. A Zone Set is activated against a fabric, and only
one Zone Set can be active in a fabric.
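The zone/zone-set relationship can be modeled in a few lines: two members may discover and communicate with each other only if they share membership in some zone of the active zone set. The zone names and WWPNs below are invented for illustration; this is not a Connectrix API:

```python
# Hypothetical active zone set: each zone maps to its set of member WWPNs.
active_zone_set = {
    "zone_oracle": {"10:00:00:00:c9:11:11:11", "50:06:01:60:00:00:00:a1"},
    "zone_backup": {"10:00:00:00:c9:22:22:22", "50:06:01:60:00:00:00:a2"},
}

def can_communicate(wwpn_a: str, wwpn_b: str) -> bool:
    """True if the two WWPNs share at least one zone in the active zone set."""
    return any(wwpn_a in members and wwpn_b in members
               for members in active_zone_set.values())

print(can_communicate("10:00:00:00:c9:11:11:11", "50:06:01:60:00:00:00:a1"))  # True
print(can_communicate("10:00:00:00:c9:11:11:11", "50:06:01:60:00:00:00:a2"))  # False
```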
A. Designing SAN
1. SAN Topology Considerations
The adoption of SANs is driven by a variety of objectives. Some examples are:
• The need for more efficient use of enterprise storage arrays
• Decreasing size of backup/restore windows
• Increasing size of data sets to be backed up
• The need for improved high availability and disaster tolerance solutions
• The need to enhance storage resource management
SAN design can appear to be a challenging task, due to the large number of variables
involved in picking an appropriate design strategy. Designing a fabric involves many variables
that require consideration. With each variable consideration comes a separate design
decision that must be made. Each design decision will help you create a fabric design that is
appropriate for your business information model. The following parameters will help in
choosing the right SAN topology according to the requirements.
a. Accessibility
Accessibility refers to the ability of your hosts to access the storage that is required to service
their applications. Accessibility can be measured by your ability to physically connect and
communicate with the individual storage arrays, as well as your ability to provide enough
bandwidth resources to meet your full-access performance requirements. A storage array that
is physically accessible, but cannot be accessed within accepted performance limits because
of oversaturated paths to the device, may be just as useless as an array that cannot be
reached physically.
A fabric, like the telephone system, is a statistical bandwidth network. The telephone system
is not constructed with enough bandwidth resources to allow every subscriber to
communicate simultaneously. You may have heard "all lines are currently busy; please try
your call again later." This message indicates that the number of subscribers has saturated
the bandwidth currently available, so no new connections are possible until resources are
freed.
Similar issues can arise in the design and implementation of a fabric. You should also
consider the internal design of the switching devices used in your fabric when considering
accessibility. While switches may be designed for high levels of connectivity and allow many
physical attachments, their internal designs may cause internal bandwidth congestion.
b. Availability
Availability is a measurement of the amount of time that your data can be accessed,
compared to the amount of time the data is not accessible because of issues in the
environment. Lack of availability might be a result of failures in the environment that cause a
total loss of paths to the device, or it might be an event that caused so much bandwidth
congestion that the access performance renders the device virtually unavailable. Availability
is impacted not only by your choice of components used to build the fabric, but also by your
ability to build redundancy into the environment.
Another concept that adds to the availability of an environment is sparing, which is the
process of dedicating resources to remain unused until they are needed to take the place of a
failed resource. The following must be considered in your redundancy and sparing plan:
◆ How much bandwidth do I need to preserve after a single event occurs?
◆ What other applications might be affected when the original storage resources move to a new
path or down to a single path?
◆ Do I need to plan for scenarios that include successive failures?
◆ Do I want redundancy built into my connectivity components (as seen with director-class
switching devices)?
◆ Do I want to build site redundancy and copy data to another site using CLARiiON
MirrorView™?
◆ Do I want to build redundancy at the host level with a load-balancing and paths failover
application (like PowerPath®)?
◆ How do I rank my business applications so that I can identify lower priority tasks, so these
resources can be used as spares during a failure event? An example of this would be if task one
had failed due to all of its fiber links being damaged and fiber links from task two were used to
bring up the resources associated with task one. When the resources were back online, both
tasks would be working at 50 percent efficiency.
c. Resource Consolidation
Resource consolidation includes the concepts of both physical and logical consolidation.
Physical consolidation involves the physical movement of resources to a centralized location.
Now that these resources are located together, you may be able to more efficiently use
facility resources, such as HVAC (heating, ventilation, and air conditioning), power protection,
personnel, and physical security. The trade-off that comes with physical consolidation is the
loss of resilience against a site failure. Flexibility is a measure of how rapidly you are able to
deploy, shift, and redeploy new storage and host assets in a dynamic fashion without
interrupting your currently running environment. An example of flexibility is the ability to
simply connect new storage into the fabric and then zone it to any host in the fabric.
d. Security
Security refers to the ability to protect your operations from external and internal malicious
intrusions, as well as the ability to protect against accidental or unintentional data access by
unauthorized parties. Security can range from restricting physical access to the servers,
storage, and switches by placing them in a locked room, to logical security associated with
zoning and volume access/masking.
e. Supportability
Supportability is the measure of how easy it is to effectively identify and troubleshoot issues,
as well as to identify and implement a viable repair solution in the environment. The ability to
troubleshoot may be enhanced through good fabric designs, purposeful placement of servers
and storage on the fabric, and a switch's ability to identify and report issues on the switch
itself or in the fabric. Fabric topologies can be designed so that data traffic patterns are
deterministic, traffic bandwidth requirements can be easily associated with individual
components, and placement policies can be documented so that troublesome components
can be identified quickly.
2. SAN Topologies
This section describes the SAN topologies that can be considered for designing a SAN
according to the relative importance of the parameters explained above. This will help us
decide which topology best meets our requirements.
2.a Simple Fibre Channel SAN topologies
A simple Fibre Channel SAN consists of fewer than four directors and switches connected by
ISLs, with no more than two hops. A single-switch fabric, consisting of only one switch, is the
simplest of the simple Fibre Channel SAN topologies.
Figure 4. Single Switch Topology
[Table 1: Single Switch Subjective Rating. The attributes Accessibility, Availability,
Consolidation, Flexibility, Scalability, Security, and Supportability are each rated on a scale
from 5 (most) to 1 (least).]
The following best practices are specific to two-switch fabrics.
ISL subscription best practice — While planning the SAN, keep track of how many
host and storage pairs utilize the ISLs between domains. As a general best practice,
if two switches are connected by ISLs, ensure that there is a minimum of two ISLs
between them and that there are no more than six initiator and target pairs per ISL.
For example, if 14 initiators access a total of 14 targets between two domains, a total
of three ISLs would be necessary. This best practice should not be applied blindly
when setting up a configuration. Consider the applications that will use the ISLs.
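The rule of thumb above (a floor of two ISLs, at most six initiator/target pairs per ISL) reduces to a one-line calculation; the function name is illustrative:

```python
import math

def min_isls(initiator_target_pairs: int, pairs_per_isl: int = 6, floor: int = 2) -> int:
    """Minimum ISLs between two domains: at least `floor` ISLs, and no more
    than `pairs_per_isl` initiator/target pairs per ISL."""
    return max(floor, math.ceil(initiator_target_pairs / pairs_per_isl))

print(min_isls(14))  # 14 pairs at 6 per ISL -> 3 ISLs, matching the example above
print(min_isls(4))   # few pairs, but never fewer than 2 ISLs
```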
2.b Complex Fibre Channel SAN topologies
2.b.a Full-Mesh fabric
A full-mesh fabric is any collection of Fibre Channel switches in which each switch
is connected to every other switch in the fabric by one or more ISLs. For best host
and storage accessibility, it is recommended that a full-mesh fabric contain no
more than four switches. A mesh may contain departmental switches, directors,
or both, depending on your connectivity needs. When designing and
implementing a full-mesh fabric, it is recommended that you lay out the storage
and servers in a single-tier logical topology design and plan your ISL
requirements based on the assumption that 50% of the traffic on any one switch
will remain local and the other 50% will originate from the remaining remote
switches.
Figure 5. Full-Mesh Topology
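The 50% local / 50% remote planning assumption above can be turned into a rough ISL-sizing sketch; the 4 Gb/s link speeds and port counts are assumed for illustration, not prescribed values:

```python
import math

def isls_per_neighbor(host_ports: int, port_gbps: float, n_switches: int,
                      remote_fraction: float = 0.5, isl_gbps: float = 4.0) -> int:
    """Rough ISL count toward each neighbor for one switch in a full mesh,
    assuming `remote_fraction` of its host traffic is spread evenly across
    the other switches."""
    remote_bw = host_ports * port_gbps * remote_fraction
    per_neighbor = remote_bw / (n_switches - 1)
    return max(1, math.ceil(per_neighbor / isl_gbps))

# 24 host ports at 4 Gb/s on each of 4 switches:
print(isls_per_neighbor(24, 4.0, 4))  # 48 Gb/s remote over 3 neighbors -> 4 ISLs each
```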
Benefits
Full-mesh configurations give you, at most, one-hop access from any server to any storage
device on the fabric. This means that when you are adding or migrating storage or server
attachments, you have the greatest possibility of placing the server attachment and matching
storage attachments anywhere in the fabric and achieving the same response time. Meshes
also ensure that you always have multiple local and remote paths to the data even after fabric
events have occurred.
Limitations
Scaling a full-mesh solution becomes complicated and costly when increasing the number of
switches and required ISLs to guarantee traffic performance.
[Table 2: Full-Mesh Switch Subjective Rating. The attributes Accessibility, Availability,
Consolidation, Flexibility, Scalability, Security, and Supportability are each rated on a scale
from 5 (most) to 1 (least).]
2.b.b Partial Mesh fabric
Figure 6. Partial-Mesh Topology
A partial-mesh fabric is different from a full mesh in that each switch does not have to be
connected to all other switches. However, to be considered a partial mesh, the fabric must be
a configuration where splitting it results in each new sub-fabric being a full mesh. For best
fabric response times, both the managed switch (where zoning is activated) and the principal
switch should be at the logical center of the fabric.
Benefits
Partial-mesh designs offer extensive access to both local switch storage and single-hop
storage. A partial mesh also extends accessibility and provides many unique paths to the
storage. Increasing accessibility while maintaining the same level of robustness is a design
goal for every topology. Partial meshes also offer a simple progression into a core/edge
design. If you look at the center of the partial mesh as the core, you can create the new
infrastructure by simply removing some of the ISLs at the outer edges of the fabric.
Limitations
Increasing the fabric size always increases the dependencies within the fabric. This does not
necessarily cause a problem, but it does increase the complexity of troubleshooting and the impact on
unrelated processes during a fabric event.
[Table 3: Partial-Mesh Switch Subjective Rating. The attributes Accessibility, Availability,
Consolidation, Flexibility, Scalability, Security, and Supportability are each rated on a scale
from 5 (most) to 1 (least).]
2.c Core Edge Fibre Channel SAN topologies
1. Two-tier: Core-edge design
Within the two-tier design, servers connect to the edge switches, and storage devices
connect to one or more core switches. This allows the core switch to provide storage services
to one or more edge switches, thus servicing more servers in the fabric. The interswitch links
(ISLs) will have to be designed so that the overall fabric maintains both the fan-out ratio of
servers to storage and the overall end-to-end oversubscription ratio.
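The end-to-end oversubscription from an edge switch can be estimated as aggregate server bandwidth divided by aggregate ISL bandwidth; the port counts and 4 Gb/s speeds below are illustrative assumptions:

```python
def edge_oversubscription(server_ports: int, port_gbps: float,
                          isls: int, isl_gbps: float) -> float:
    """Edge-to-core oversubscription: total server bandwidth entering an edge
    switch divided by total ISL bandwidth leaving it toward the core."""
    return (server_ports * port_gbps) / (isls * isl_gbps)

# 28 servers at 4 Gb/s behind 4 ISLs at 4 Gb/s:
print(edge_oversubscription(28, 4.0, 4, 4.0))  # 7.0, i.e. a 7:1 ratio
```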
2. Three-tier: Edge-core-edge design
A three-tier design may be ideal in environments where future network growth will result in
the number of storage devices exceeding the number of ports available at the core switch.
This type of topology still uses a set of edge switches for server connectivity, but adds
another set of edge switches for storage devices. Both sets of edge switches connect to a
core switch via ISLs.
Figure 7. Core Edge Fabric
Benefits
The compound core/edge model maintains a robust, highly efficient traffic model while
reducing the required ISLs, thus increasing the available ports for both storage and host
attachments. It also offers a simple method for the expansion of two or more simple
core/edge fabrics into a single environment. You can easily create a compound core topology
by connecting the core switches from simple core/edge fabrics into a full mesh. The
compound core/edge topology creates a robust back-end fabric that can extend the
opportunities for sharing of both backup and storage resources.
Limitations
Core/edge design models produce a physically larger, tiered fabric which could result in
slightly longer fabric management propagation times over smaller, more compact designs.
Neither compound nor complex core/edge fabrics provide for single-hop access to all storage.
[Table 4: Core Edge Subjective Rating. The attributes Accessibility, Availability,
Consolidation, Flexibility, Scalability, Security, and Supportability are each rated on a scale
from 5 (most) to 1 (least).]
Best Practices for Core Edge Topology
• Lay out the host and storage connectivity such that if a switch fails, not all of a
particular host's storage becomes inaccessible.
• The use of two separate management networks is more common with balanced
fabrics, but it can still be employed when only one fabric is used.
• ISL subscription best practice — While planning the SAN, keep track of the number of
host and storage pairs that would be utilizing the ISLs between domains. As a
general best practice, if two switches are connected by ISLs, ensure that there is a
minimum of two ISLs between them, and that there are no more than six initiator and
target pairs per ISL. For example, if 14 initiators access a total of 14 targets between
two domains, a total of three ISLs are necessary.
3. Information Gathering
Designing a SAN begins with gathering the information about the infrastructure which will
help in choosing the right topology. Information that should be captured includes:
a) Details of the Servers
Server details, along with the operating system and storage requirements, should be gathered.

Server Name      Operating System   Storage Requirements
anujsanwin2008   Windows 2008       200 GB
anujaix61        AIX 6.1            100 GB
b) Applications running on Servers
Application details and LUN requirements should be captured.

Application   LUN Size   Number of LUNs
Oracle 9i     1 TB       2
IBM DB2       2 TB       4
c) Desired IOPS and Read/Write Access Distributions
Application                          Bandwidth Utilization   Read/Write Mix        Typical Access   Typical I/O Size
OLTP, email, UFS, ecommerce, CIFS    Light                   80% read, 20% write   Random           8 KB
OLTP (raw)                           Light                   80% read, 20% write   Random           2 KB to 4 KB
Decision support, seismic, imaging   Medium to heavy         90% read, 10% write   Sequential       16 KB to 128 KB
Video server                         Heavy                   98% read, 2% write    Sequential       >64 KB
SAN applications: LAN-free backup,   Medium to heavy         Variable              Sequential       >64 KB
snapshots, third-party copy
Table 5: IOPS requirements
d) Future Requirements
To get an idea of the customer's future requirements, ask questions such as:
• Number of servers likely to be commissioned in the near future
• Storage growth trends
Gathering the above information will help you design the SAN for present and future needs.
Once you have an idea of what a SAN will be used for and the physical location of each piece
of equipment is settled, you need to consider how many host and storage ports will be
deployed initially, and how the environment is expected to grow, before you can decide on
the right topology. These are just guidelines; your actual implementation will vary as more or
fewer ISLs are needed between switches. Scalability information for full mesh with two and
four ISLs, and for full mesh cores with edge switches, has been included below to help you
find the right topology.
In Table 6, combinations that result in negative numbers or result in greater than 2048
available ports are grayed out and should not be considered for use.
Table 6
In Table 6:
Number of switches: the number of switches in the fabric.
Number of ISLs between switches: the number of ISLs that will connect every switch to
every other switch. Since this is a full mesh, all switches connect to each other.
Number of ports on switch (i.e. 16, 24, 32, etc.): the number of ports on each switch in the
fabric. It assumes all switches have the same port count.
# total: the total raw port count, determined by summing the port count of each switch. As
of publication, the raw port count cannot exceed 2048 in any single fabric.
# avail: the number of ports available for Nx_Ports to attach to.
% avail: the number of ports not consumed by E_Ports, expressed as a percentage of the
total ports. Generally, you should not use a topology that has 50% or less of its ports
available.
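Under these definitions, the full mesh port arithmetic can be sketched as follows (a sketch; full_mesh_ports is our helper name, and it assumes every switch dedicates (n - 1) * ISLs of its ports to E_Ports):

```python
def full_mesh_ports(switches: int, isls_between: int, ports_per_switch: int):
    """Return (total, available, percent available) for a full mesh fabric.

    Each switch connects to every other switch, so each switch spends
    (switches - 1) * isls_between of its ports on E_Ports.
    """
    total = switches * ports_per_switch
    e_ports = switches * (switches - 1) * isls_between
    avail = total - e_ports
    return total, avail, 100.0 * avail / total
```

A combination is ruled out when the available count goes negative or the total exceeds 2048, matching the grayed-out cells in Table 6.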
Let's take the case of using 64-port core switches.
Table 7
The core ports and the appropriate value under Number of ports on edge switch need to be
added to determine the total port count. When this is done, a few configurations fall outside of
the support envelope of 2048 Nx_Ports. If the number of ISLs is increased between cores
and edge switches, then some of the fabrics fall back into the supportable range. Table 8
shows the increase of ISLs between cores and edge switches.
Table 8: Number of ISLs increased between cores and edge switches
Note: Remember, if you are considering deploying a mirrored fabric (and you should,
because this is a best practice), the number of ports needed on each fabric will be roughly
half of the total ports needed.
4. Choosing a Switch Type
This section provides considerations for choosing a vendor and selecting a model.
Choosing a vendor:
If an environment has standardized on a switch vendor such as Brocade or Cisco, you
should use a switch from their product line. Although test coverage of interop environments
has improved, interop fabrics remain the least tested configurations, as switch vendors
spend much more time verifying interoperability within their own product lines than testing
against other vendors' products.
The subject of interoperability is raised because even if the fabrics are not connected when
installed, there is a chance that connecting them will be desired at some point in the future.
Training is an equally important reason for using the same vendor. A user who has
standardized on a particular vendor is less likely to need training on the product. Typically,
their expectation of the product’s performance is more realistic and any infrastructure
challenges (power, monitoring) have already been dealt with.
Sometimes it is not possible to keep the same vendor as the decision has already been
made to migrate to another vendor.
If a particular vendor has not been standardized, then determine which features will work
best for you.
Selecting a model: Once a vendor has been chosen, it is time to select a model. There are
many different aspects to consider, but this section is only in regard to port count. Switches
provide between 8 and 64 ports of connectivity. Directors provide between 8 and 528 ports of
connectivity. Keep in mind when ordering a director that they all have minimum shipping
configurations. For example, assuming 4 Gb/s FC will be used:
For Cisco: 9513, 9509, and 9506, there is a minimum of one blade per chassis (16 ports).
For Brocade Silkworm 48000, there is a minimum of two blades per chassis (64 ports).
For Brocade M Series Intrepid 6140, there is a minimum of two Universal Port Modules
(UPMs) (8 ports).
Factor in these minimums when considering which switch and how many of each to
purchase.
5. Sample SAN Fabric
Figure 8
The deployment shown in Figure 8 allows scaling to nearly 1500 devices in a single fabric.
The actual production environment has approximately 190 storage ports and roughly 1050
host ports. The environment required a minimum of 12:1 oversubscription within the network,
which required each host edge switch to have a 36-Gbps port channel, using nine physical
links. Storage ports will not grow quite as rapidly and the core switch has room to grow to add
more host edge switches. With data centers continually growing, SAN administrators must
design networks that meet their current needs and can scale for demanding growth.
Administrators deploying large SAN fabrics can use the design parameters and best practices
discussed in this paper to design optimized and scalable SANs.
B. IMPLEMENTATION PHASE
1. Switch and Zoning Best Practices
Connect the host and storage ports in such a way as to prevent a single point of failure
from affecting redundant paths. For example, if you have a dual-attached host and each HBA
accesses its storage through a different storage port, do not place both storage ports for the
same server on the same line card or ASIC.
Use two separate power sources for the host and storage layout.
To reduce the possibility of congestion, and maximize ease of management, connect hosts
and storage port pairs to the same switch where possible.
Use a port fencing policy.
Use the latest supported firmware version and ensure that the same version of firmware is
used throughout the fabric. In homogeneous switch vendor environments, all switch firmware
versions inside each fabric should be equivalent, except during the firmware upgrade
process.
Periodically (or following any changes) back up switch configurations.
Use persistent Domain IDs.
A zoneset can be managed and activated from any switch in the fabric, but it is
recommended that it be managed from a single entry switch within a fabric to avoid
complications with multiple users accessing different switches in a fabric to make concurrent
zone changes.
While it is possible to see and share tapes and disks over the same HBA in a Fibre
Channel fabric, it is not best practice to do so. The reason is simple: tape devices tend to
issue many SCSI reset commands on rewind, and this can wreak havoc on disk data
streams. Also, tape traffic, since it is usually one long continuous data stream, will tend to
monopolize the bandwidth of the link. If you are trying to run backups while production is
running, performance will suffer.
A better method is to zone tape ports to dedicated HBAs used for tape backup.
The system administrators should coordinate zoning configuration activity to avoid running
into a situation where two administrators are making changes simultaneously.
To avoid lengthy outages due to errors in Connectrix B SAN configurations, it is
recommended to back up the existing configuration before making any changes.
To avoid the high risk involved in adding a new unauthorized switch to a Connectrix B
fabric, it is advisable to limit the creation of switch-to-switch ports. This can be done by
locking the already connected switch-to-switch ports in the SAN using the portCfgEport
command. Such locking down of E_Ports is persistent across reboots. Run portCfgEport
<port number>,0 (disable) on every port that is not connected to another switch in the
fabric to block it from forming ISLs.
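The lockdown loop can be scripted; the sketch below only generates the command strings, using the syntax shown above (the helper name is ours; verify portCfgEport behavior against your Fabric OS release before running the output on a switch):

```python
def eport_lockdown_commands(total_ports, isl_ports):
    """List portCfgEport disable commands for every port that is not an
    authorized ISL, so those ports cannot form E_Ports."""
    isl = set(isl_ports)
    return ["portCfgEport %d,0" % p for p in range(total_ports) if p not in isl]
```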
The administrator configuring a Connectrix B SAN must be aware that frame-level
trunking for Connectrix B switches requires that all ports in a given ISL trunk reside within
an ASIC group on each end of the link.
On 2 Gb/s switches, port groups are built on contiguous 4-port groups, called quads. For
example, on a Connectrix DS-8B2, there are two quads: ports 0-3 and ports 4-7.
On 4 Gb/s switches like the Connectrix DS-4100B, trunking port groups are built on
contiguous 8-port groups called octets. In this product, there are four octets: ports 0-7,
8-15, 16-23, and 24-31. The administrator must use ports within a single group to form an
ISL trunk. It is also possible to configure multiple trunks.
IVR NAT port login (PLOGI) requests received from hosts are delayed for a few seconds to
perform the rewrite on the FC ID address. If the host's PLOGI timeout value is set to less
than five seconds, the PLOGI may be unnecessarily aborted and the host left unable to
access the target. EMC recommends configuring the host bus adapter for a timeout of at
least ten seconds (most HBAs default to a value of 10 or 20 seconds).
If using FC ID or Domain port zone member types, it is recommended that the Domain ID
of each switch in the fabric be locked.
When a new switch is installed in a fabric, it is recommended that it not have a configured
zoning database or an active zoneset. Run the switch's zoning reset command to clear the
zone configuration.
Host and storage layout
The best practice is to place hosts on edge switches and high-use storage ports on core
switches. This is recommended because high-use storage ports are often accessed by
many different hosts on different parts of the fabric. If this is the case in your environment,
this configuration is still the best option. However, if you have high-use storage ports that
are only accessed by a couple of hosts and it is possible to locate them all on the same
switch, this is the preferred configuration instead of forcing the use of ISLs. ISL resources
should be reserved for providing connectivity between ports that cannot be placed on the
same switch.
With this in mind, the following information provides helpful general guidelines:
Whenever practical, locate HBAs and the storage ports they will access on the same
switch. If it is not practical to do this, minimize the number of ISLs the host and storage need
to traverse.
Some of the switch class products being produced today only contain a single ASIC. If this
is the case, then the positioning of the host and storage ports is strictly a matter of personal
preference. However, if the switch being used contains multiple ASICs, try to connect host
and storage pairs to the same ASIC. This prevents using the shared internal data transfer bus
and reduces switch latency. In addition to performance concerns, consider fault tolerance as
well. For example, if a host has two HBAs, each one accessing its own storage port, do not
attach both HBAs, both storage ports, or all of the HBA and storage ports to the same ASIC.
When working with hosts that have more than one connection to more than one storage
port, always connect the HBAs and, if possible, the storage ports that they access to
different FC switches. If a completely separate fabric is available, connect each HBA and
storage port pair to different fabrics.
For homogeneous Brocade M Series fabrics: if Enterprise Fabric mode is available, enable
it. If Enterprise Fabric mode is not available, enable:
Fabric Binding
Switch Binding
Port Binding
For heterogeneous fabrics containing Brocade M Series switches, enable Switch
Binding and Port Binding.
Security
It is important to secure your fabric. General security best practices for an FC SAN include:
Implement some form of zoning
Change default password
Disable unused or infrequently used management interfaces
Use SSL or SSH if available
Limit physical access to FC switches
2. IP SAN Best Practices
Jumbo frames
When supported by the network, we recommend using jumbo frames to increase bandwidth.
Jumbo frames can carry more iSCSI commands and a larger iSCSI payload than normal
frames without fragmenting (or with less fragmenting, depending on the payload size). If
using jumbo frames, all switches and routers in the paths to the storage system must
support, and be configured for, jumbo frames.
The following general recommendations apply to iSCSI usage:
iSCSI is not recommended with applications having the highest bandwidth requirements,
including high performance remote replication.
When possible, use a dedicated LAN for storage traffic, or segregate storage traffic to its
own virtual LAN (VLAN).
Use the most recent version of the iSCSI initiator supported by EMC, and the latest version
of the NIC driver for the host.
Configure iSCSI 1 Gb/s (GigE) and 10 Gb/s (10 GigE) ports to Ethernet full duplex on all
network devices in the initiator-to-target path.
Use CAT6 cabling on the initiator-to-target path whenever possible to ensure consistent
behavior at GigE and 10 GigE Ethernet speeds.
Use jumbo frames and TCP flow control for long distance transfers or with networks
containing low-powered servers.
Use a ratio of 1:1 SP iSCSI ports to NICs on GigE SANs for workloads with high read
bandwidths. 10 GigE SANs can use higher ratios of iSCSI ports to NICs.
Ensure the Ethernet connection to the host is equal to or exceeds the bandwidth rating of
the host NIC.
Ensure the Ethernet connection to the CLARiiON is equal to or exceeds the bandwidth of
the CLARiiON's iSCSI FlexPort.
It is recommended to use a dedicated storage network for iSCSI traffic. If you do not use a
dedicated storage network, iSCSI traffic should be either separated onto a separate physical
LAN, separate LAN segments, or a virtual LAN (VLAN). With VLANs, you can create multiple
virtual LANs, as opposed to multiple physical LANs in your Ethernet infrastructure. This
allows more than one network to share the same physical network while maintaining a logical
separation of the information. FLARE release 29.0 and later support VLAN tagging (IEEE
802.1q) on 1 Gb/s and 10 Gb/s iSCSI interfaces. Ethernet switch-based VLANs are
supported by all FLARE revisions. VLAN tagging with the compatible network switch support
isolates iSCSI traffic from general LAN traffic; this improves SAN performance by reducing
the scope of the broadcast domains.
Network latency
Both bandwidth and throughput rates are subject to network conditions and latency. It is
common for network contention, routing inefficiency, and errors in VLAN configuration to
adversely affect iSCSI performance. It is important to profile and constantly monitor the
network carrying iSCSI traffic to ensure the best iSCSI connectivity and SAN performance. In
general, simple network topologies offer the best performance. Latency can detract
substantially from iSCSI system performance. As the distance from the host to the CLARiiON
increases, a latency of about 1 millisecond per 200 kilometers (125 miles) is introduced. This
latency has a noticeable effect on WANs supporting sequential I/O workloads. For example, a
40 MB/s 64 KB single stream would average 25 MB/s over a 200 km distance. EMC
recommends increasing the number of streams to maintain the highest bandwidth with these
long-distance, sequential I/O workloads.
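Assuming the one-way latency above is added serially to each I/O, the single-stream effect of distance can be estimated as follows (a sketch with a hypothetical helper name; real links also depend on TCP windowing and buffering):

```python
def long_distance_bandwidth(io_size_kb: float, local_mb_s: float, km: float) -> float:
    """Estimate single-stream bandwidth when ~1 ms of latency per
    200 km is added to each I/O's local service time."""
    io_mb = io_size_kb / 1024.0
    service_s = io_mb / local_mb_s       # per-I/O time with no distance
    latency_s = (km / 200.0) / 1000.0    # ~1 ms per 200 km
    return io_mb / (service_s + latency_s)
```

For a 64 KB stream at 40 MB/s over 200 km this yields roughly 24 to 25 MB/s, in line with the example above; adding parallel streams recovers the lost bandwidth.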
3. RAID Group Best Practices
RAID performance characteristics
Different RAID levels have different performance and availability characteristics, depending
on the type of RAID and the number of drives in the RAID group. Certain RAID types and RAID group sizes
are more suitable for particular workloads than others, so choosing the appropriate RAID
implementation is a crucial task in the implementation phase.
When to use RAID 0
We do not recommend using RAID 0 for data with any business value.
RAID 0 groups can be used for non-critical data needing high speed (particularly write
speed) and low cost capacity in situations where the time to rebuild will not affect business
processes. Information on RAID 0 groups should be already backed up or replicated in
protected storage. RAID 0 offers no level of redundancy. Proactive hot sparing is not enabled
for RAID 0 groups. A single drive failure in a RAID 0 group will result in complete data loss of
the group. An unrecoverable media failure can result in a partial data loss. A possible use of
RAID 0 groups is scratch drives or temporary storage.
When to use RAID 1
We do not recommend using RAID 1. RAID 1 groups are not expandable. Use RAID 1/0
(1+1) groups as an alternative for single mirrored RAID groups.
When to use RAID 3
For workloads characterized by large-block sequential reads, RAID 3 delivers several MB/s
more bandwidth than the alternatives. RAID 3 delivers the highest read bandwidth under the
following conditions:
Drives create the bottleneck, such as when there are a small number of drives for each
back-end loop.
Sequential streams are larger than 2 MB.
The file system is not fragmented or is using raw storage.
The block size is 64 KB or greater.
RAID 3 can be used effectively in backup-to-disk applications. In this case, configure RAID
groups as either (4+1) or (8+1). Do not use more than five backup streams per LUN.
In general, RAID 5 usage is recommended over RAID 3. RAID 3 should only be used for
highly sequential I/O workloads, because RAID 3 can bottleneck at the parity drive on random
writes. Also, when more than one RAID 3 group is actively running sequential reads on a
back-end bus, the bus can rapidly become the bottleneck and performance is no different
from RAID 5.
When to use RAID 5
RAID 5 is favored for messaging, data mining, medium-performance media serving, and
RDBMS implementations in which the DBA is effectively using read-ahead and write-behind.
If the host OS and HBA are capable of greater than 64 KB transfers, RAID 5 is a compelling
choice. The following applications are ideal for RAID 5:
Random workloads with modest IOPS-per-gigabyte requirements
High performance random I/O where writes are less than 30 percent of the workload
A DSS database in which access is sequential (performing statistical analysis on sales
records)
Any RDBMS tablespace where record size is larger than 64 KB and access is random
(personnel records with binary content, such as photographs)
RDBMS log activity
Messaging applications
Video/media
When to use RAID 6
RAID 6 offers increased protection against media failures and simultaneous double drive
failures in a parity RAID group. It has similar performance to RAID 5, but requires additional
storage for the additional parity calculated. This additional storage is equivalent to adding a
drive that is not available for data storage to the RAID group.
RAID 6 can be used as an alternative to RAID 5 when the need for increased reliability
outweighs the overhead of the additional parity drive.
RAID 6 groups can be four to 16 drives. A small group is up to six drives (4+2). A medium
group is up to 12 drives (10+2), with large groups being the remainder. Small groups stream
well. However, small random writes destage slowly and can adversely affect the efficiency of
the system write cache. Medium-sized groups perform well for both sequential and random
workloads. The optimal RAID 6 group sizes are 10 drives (8+2) and 12 drives (10+2).
When to use RAID 1/0
RAID 1/0 provides the best performance on workloads with small, random, write-intensive
I/O. A write-intensive workload’s operations consist of greater than 30 percent random writes.
Some examples of random, small I/O workloads are:
High transaction rate OLTP
Large messaging installations
Real-time data/brokerage records
RDBMS data tables containing small records, such as frequently updated account
balances
RAID 1/0 also offers performance advantages during certain degraded modes, including
when write cache is disabled or when a drive has failed in a RAID group. RAID 1/0 groups
of (3+3) and (4+4) have a good balance of capacity and performance.
4. HBA Tuning
The latest supported drivers for the HBA should be installed and the firmware should be updated.
Transaction-based and throughput-based processing are types of workload. The workload is
the total amount of work that is performed at the storage server, and is measured through the
following formula:
Workload = [transactions (number of host IOPS)] * [throughput (amount of data sent in one I/O)]
Since a storage server can sustain a given maximum workload, the above formula shows that
when the number of host transactions increases, the throughput decreases. Conversely, if the
host is sending large volumes of data with each I/O, the number of transactions decreases.
A workload characterized by a high number of transactions (IOPS) is called a
transaction-based workload.
A workload characterized by large I/Os is called throughput-based workload.
These two workload types are conflicting in nature, and consequently require different
configuration settings across all parts of the storage solution.
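The workload formula and its transaction/throughput trade-off can be written out directly (a sketch; the helper names are ours, not a CLARiiON API):

```python
def workload_mb_s(iops: float, io_size_kb: float) -> float:
    """Workload = transactions (IOPS) * amount of data sent per I/O."""
    return iops * io_size_kb / 1024.0

def max_iops(workload_limit_mb_s: float, io_size_kb: float) -> float:
    """For a fixed maximum workload, larger I/Os mean fewer transactions."""
    return workload_limit_mb_s * 1024.0 / io_size_kb
```

For example, a storage server sustaining 62.5 MB/s supports 1000 IOPS at 64 KB per I/O but only 500 IOPS at 128 KB, illustrating why the two workload types need different tuning.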
Table 9: HBA Tuning parameters.
In Microsoft Windows environments, the following three HBA parameters affect HBA
performance: Execution Throttle, Frame Size, and Fibre Channel Data Rate. Of these, the
Frame Size and Fibre Channel Data Rate default settings are pre-set to 2112 bytes (2048
bytes + headers) and auto-negotiate to provide the best possible performance in any
environment. Therefore, Execution Throttle is the only HBA parameter that you can tune to
improve HBA performance in a Windows environment.
Tuning Execution Throttle
In a SAN configuration with three or more servers accessing the same storage array, QLogic
recommends changing the default Execution Throttle value for each HBA. By default, all
QLogic EMC 4Gb FC HBAs have their Execution Throttle value set to maximum; if you
decide to change this from its default value, use the guidelines below to derive a new value.
To calculate the new execution throttle, first determine if all servers carry the same I/O load. If
all servers carry the same I/O load, calculate the value by dividing 250 by the number of
servers in the SAN. Set each HBA in the SAN to the calculated value. For example, in a four-
server configuration, divide 250 by 4 to arrive at 62.5. The Execution Throttle value for each
HBA is 62. Assign the value of 62 to all HBAs.
If some of the servers carry heavier I/O loads, first calculate the Execution Throttle value by
dividing 250 by the number of servers, and then adjust the values so that servers with higher
I/O loads have higher Execution Throttle values and servers with lower I/O loads have lower
Execution Throttle values. For example, in a four-server configuration, you can assign the
value of 72 to the HBAs in the server with the highest I/O load, the value of 52 to the HBAs in
the second server, and the value of 62 to the HBAs in the remaining two servers.
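The equal-load case of this guideline reduces to one division (a sketch; the ~250 figure is the per-storage-port budget cited above, and the helper name is ours):

```python
def execution_throttle(num_servers: int, port_budget: int = 250) -> int:
    """Per-HBA Execution Throttle when all servers carry the same I/O
    load: the storage port's command budget split evenly, rounded down."""
    return port_budget // num_servers
```

For uneven loads, shift the per-server values up or down (72/52/62/62 in the example above) while keeping the total near 250.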
How to Change the Execution Throttle
The Execution Throttle value for each port of a HBA can be easily changed with the QLogic
SANsurfer FC HBA Manager application (or the SANsurfer command line interface (CLI)) on
Windows Environments (see the figure below).
Figure 9
Tuning HBA Queue Depth for your ESX Environment
Use the VMware ESX tool esxtop to view the current HBA queue utilization while I/O is active
on the HBA ports. Navigate through esxtop to find the disk statistics. A man esxtop
command issued from the ESX console provides detailed information on its usage.
Figure 10 shows the output of the storage statistics section of esxtop while I/O is active.
LQLEN shows the current HBA queue depth set in the QLogic HBA driver. A value of LOAD >
1 indicates that the host application is placing more data in the HBA queue than its current
size can handle. A system with this issue can benefit from an increase in the HBA queue
depth.
Figure 10: esxtop screenshot
Figure 11 shows the effect of increasing the HBA queue depth. This result of esxtop has been
captured after increasing the HBA queue depth from its default value of 32 to 64. Note that
the LOAD is < 1 and there is a significant increase in the READS/s operations, which means
that performance has increased.
Figure 11: esxtop screenshot
How to Change the HBA Queue Depth
To change the queue depth of a QLogic HBA in VMware ESX, follow these steps:
• Log on to the VMware ESX Console as root.
• Create a copy of /etc/vmware/esx.conf so you have a backup copy.
• Edit the file /etc/vmware/esx.conf in your favorite editor.
• Locate the following entry
/vmkmodule[0002]/module = "qla2300_707.o"
/vmkmodule[0002]/options = ""
• Modify the entry as shown, where xx is the queue depth value:
/vmkmodule[0002]/module = "qla2300_707.o"
/vmkmodule[0002]/options = "ql2xmaxqdepth=xx"
Figure 12: Changing Queue Depth
• Save the file.
• Reboot the VMware ESX Server.
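The edit in the steps above can also be applied programmatically. This sketch rewrites an in-memory copy of the esx.conf text; it assumes the two-line module/options layout shown above and an initially empty options string, and you should still keep a backup of the file:

```python
def set_qdepth(conf_text: str, qdepth: int, module: str = "qla2300_707.o") -> str:
    """Set ql2xmaxqdepth on the options line that follows the given
    vmkernel module entry, leaving all other lines untouched."""
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        if line.endswith('/options = ""') and i > 0 and module in lines[i - 1]:
            lines[i] = line.replace('""', '"ql2xmaxqdepth=%d"' % qdepth)
    return "\n".join(lines)
```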
5. EMC CLARiiON Hot Sparing Best Practices
The following summarizes hot spare best practices:
The storage system should have at least one hot spare of every speed, maximum
needed capacity, and hard drive type.
Position hot spares on the same buses containing the drives they may be required to
replace.
Maintain a minimum 1:30 ratio (round to one per two DAEs) of hot spares to data hard
drives.
EFD storage devices can only be hot spares for, and be hot spared by, other EFD
devices.
DAE     Backend bus   Vault Drives   Data Drives   Hot Spares   Total DAE drives
DAE 0   0             5X FC          10X FC        0            15
DAE 1   1             0              14X SATA      1X SATA      15
DAE 2   0             0              15X FC        0            15
DAE 3   0             0              10X FC        1X FC        11
Total FC data drives: 49
Total FC hot spares: 1
Total SATA data drives: 14
Total SATA hot spares: 1
Total drives: 65
Table 10: Hot spare provisioning example
6. Optimizing EMC CLARiiON Cache
Generally, for storage systems with 2 GB or less of available cache memory, use about 20
percent of the memory for read cache and the remainder for write cache. For larger capacity
cache configurations, use as much memory for write cache as possible while reserving about
1 GB for read cache. Specific recommendations are as follows:
                   CX4-120   CX4-240   CX4-480   CX4-960
WRITE CACHE (MB)   498       1011      3600      9760
READ CACHE (MB)    100       250       898       1000
TOTAL CACHE (MB)   598       1261      4498      10760
Table 11: Cache Recommendations
7. EMC CLARiiON Vault Drive Best Practices
It is recommended that no LUNs with high IOPS requirements be bound to the vault drives,
as this can lead to performance degradation. The IOPS supported by the vault drives are
given in the table below and can be used to decide upon LUN provisioning on the vault
drives.
VAULT HARD DRIVE TYPE   MAX IOPS   MAX BANDWIDTH (MB/s)
FC                      100        10
SAS                     100        10
SATA                    50         5
EFD                     1500       69
Table 12: Vault Drive Performance Parameters
It is not recommended that the vault drives (0.0.0 to 0.0.4) be left unbound. If drives are
unbound, they are not regularly verified by FLARE. This means there would be no early
warning of drive faults, which could cause booting problems for the SP. Therefore, if no user
data needs to be bound on the first five drives, then a small (e.g. 1 GB) test LUN should be
bound across any unbound vault drives. This LUN should be named as a verification LUN
and should not be placed in a storage group.
8. EMC CLARiiON Virtual Provisioning Best Practices
Virtual Provisioning provides for thin provisioning of LUNs. Thin LUNs present more storage
to an application than is physically available. The presentation of storage not physically
available avoids over-provisioning the storage system and under-utilizing its capacity. When a
thin LUN eventually requires additional physical storage, capacity is non-disruptively and
automatically added from a storage pool. In addition, the storage pool’s capacity can be non-
disruptively and incrementally added to with no effect on the pool’s thin LUNs.
Figure 13: Thin Provisioning
Recommendations for creating pools are as follows:
We recommend Fibre Channel hard drives for thin storage pools due to their overall
higher performance and availability.
Create pools using storage devices that are the same type, speed, and size. In
particular, keep Fibre Channel and SATA hard drives in separate pools.
Usually, it is better to use the RAID 5 level for pools. It provides the highest user data
capacity per number of pool storage devices.
Use RAID 6 if the pool is composed of SATA drives and will eventually exceed a total
of 80 drives. Pools made up of large capacity (>500 GB) drives should use RAID 6.
Initially provision the pool with the largest number of hard drives that is practical. For
RAID 5 pools, the initial drive allocation should be at least five drives and a quantity
evenly divisible by five. RAID 6 pool initial allocations should be evenly divisible by
eight.
• If you specify 15 drives for a RAID 5 pool, Virtual Provisioning creates three 5-
drive (4+1) RAID groups. This is optimal provisioning.
In a thin LUN pool, the subscribed capacity is the amount of capacity that has been
assigned to LUNs. When designing your system, make sure that the expected
subscribed capacity does not exceed the capacity that is provided by the maximum
number of drives allowed in a storage system’s pool.
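The initial-allocation rules above are easy to check mechanically (a sketch with a hypothetical helper name):

```python
def valid_initial_pool_drives(drives: int, raid_level: int) -> bool:
    """Check an initial pool allocation against the guideline above:
    RAID 5 pools use a multiple of five drives (at least five),
    RAID 6 pools a multiple of eight."""
    if raid_level == 5:
        return drives >= 5 and drives % 5 == 0
    if raid_level == 6:
        return drives >= 8 and drives % 8 == 0
    raise ValueError("pool RAID level must be 5 or 6")
```

valid_initial_pool_drives(15, 5) is True, matching the three (4+1) groups example.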
Figure 14 shows the options to be selected to create a storage pool from Navisphere
Manager.
Figure 14: Creating Storage Pools
Expanding storage pools
For best performance, expand storage pools infrequently, maintain the original character of
the pool's storage devices, and make the largest practical expansions.
Following are recommendations for expanding pools:
Adjust the % Full Threshold parameter (default is 70%) to the pool size and the rate
applications are consuming capacity. A pool with only a few small capacity drives will
quickly consume its available capacity. For this type of pool you should have lower
alerting thresholds. For larger pools slowly consuming capacity you should use higher
thresholds. For example, for the largest pools, a good initial % Full Threshold
parameter value is 85%.
Expand the pool using the same type and same speed hard drives used in the original
pool.
Expand the pool in large increments. For RAID 5 pools, use increments of drives
evenly divisible by five, and not less than five. RAID 6 pools should be expanded in
increments evenly divisible by eight.
Creating thin LUNs
The largest capacity thin LUN that can be created is 14 TB.
The number of thin LUNs created on the storage system subtracts from the storage
system’s total LUN hosting budget.
• Avoid trespassing thin LUNs. Changing a thin LUN's SP ownership may adversely affect performance: after a trespass, the thin LUN's private information remains under the control of the original owning SP, so the trespassed LUN's I/Os continue to be handled by the original SP. This means both SPs are involved in handling the I/Os, which increases the time needed to complete each I/O.
• When planning to use a thin LUN in a bandwidth-intensive workload, pre-allocate the thin LUN's required storage. Pre-allocation results in sequential addressing within the pool's thin LUN, ensuring high bandwidth performance. Pre-allocation can be performed in several ways, including migrating from a traditional LUN, performing a full format of the file system, performing a file write from within the host file system, or creating a single Oracle table from within the host application. Perform only one pre-allocation per storage pool at any one time; concurrently pre-allocating more than one thin LUN per pool can reduce overall SP performance.
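Of the methods listed, a file write from within the host file system is the easiest to script. A hedged sketch, assuming the thin LUN's file system is already mounted; the mount point and size are hypothetical:

```shell
# Pre-allocate pool capacity for a thin LUN by writing real (non-sparse)
# blocks into a file on the LUN's file system. The mount point and size
# are hypothetical; adjust to the capacity the workload requires.
preallocate() {
  mount_point=$1; size_mb=$2
  dd if=/dev/zero of="$mount_point/prealloc.bin" \
     bs=1M count="$size_mb" conv=fsync 2>/dev/null
}

# preallocate /mnt/thinlun 10240   # hypothetical: pre-allocate 10 GB
```

Per the guidance above, run only one such pre-allocation per storage pool at a time.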
• There is a fixed capacity overhead associated with each thin LUN created in the pool. Take into account the number of LUNs anticipated to be created, particularly with small allocated-capacity pools.
A thin LUN is composed of metadata and user data, both of which come from the storage
pool. A thin LUN's metadata is a capacity overhead that subtracts from the pool's user data
capacity. A thin LUN of any size consumes about 3 GB of pool capacity: slightly more than
1 GB for metadata, an initial 1 GB for user data, and an additional 1 GB of pool capacity
prefetched before the first GB is consumed, in anticipation of further usage. This metadata
allocation remains about the same from the smallest to the largest (>2 TB, host-dependent)
LUNs. Additional metadata is allocated from the first 1 GB of user data as the LUN's user
capacity increases.
To estimate the capacity consumed, follow this rule of thumb:
Consumed capacity = (User Consumed Capacity * 0.02) + 3GB.
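This rule of thumb translates directly into a small calculation (a sketch; the function name is ours, and input and output are in GB):

```shell
# Estimate the capacity consumed by a thin LUN per the rule of thumb:
# about 2% overhead on the user-consumed capacity plus a fixed ~3 GB.
consumed_gb() {
  awk -v u="$1" 'BEGIN { printf "%.1f\n", u * 0.02 + 3 }'
}

consumed_gb 500   # prints 13.0
consumed_gb 0     # prints 3.0 (the fixed per-LUN overhead)
```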
Plan ahead for metadata capacity usage when provisioning the pool. With small capacity
pools, the percentage of capacity used by metadata may be large. Create pools with enough
initial capacity to account for metadata usage and any initial user data for the planned
number of LUNs.
9. EMC CLARiiON Drive Spin Down Technology
Disk-drive Spin Down conserves power by spinning down the drives in a RAID group when the
RAID group has not been accessed for 30 minutes, allowing the drives to enter an idle state. In
the idle state, the drives do not rotate and thus use less power. (A RAID group that is idle for
30 minutes or longer uses 60 percent less electricity.) Figure 15 shows how to enable drive
Spin Down from Navisphere Manager when creating a RAID group.
Figure 15: Drive Spin Down
When an I/O request is made to a LUN whose drives are in Spin Down (idle) mode, the drives
must spin up before the I/O request can be executed. A RAID group can remain in the idle state
for any length of time. The storage system periodically verifies that idle RAID groups are ready
for full-powered operation; RAID groups that fail the verification are rebuilt. Spin Down can be
configured at either the storage-system level or the individual RAID group level. We recommend
the storage-system level, which automatically puts unbound drives and hot spares into idle.
Spin Down is recommended for storage systems that support development, test, and training,
because these hosts tend to be idle at night. It is also recommended for storage systems that
back up hosts. A host application will see an increased response time for the first I/O request
to a LUN whose RAID group(s) are in standby; it takes less than two minutes for the drives to
spin up. The storage system administrator must weigh this delay against the application's
ability to wait when deciding whether to enable disk-drive Spin Down for a RAID group.
10. Aligning File System
File System Fragmentation
Fragmented file systems decrease the opportunity for sequential I/O, which reduces overall
throughput. File systems should therefore be defragmented at a regular interval (for example,
monthly) using host utilities. Note: if the file system is NTFS, it cannot be formatted at
anything but the default extent size.
File System Alignment affects performance in two ways:
• Misalignment causes disk crossings, i.e., an I/O broken across two disk drives.
• Misalignment makes it hard to stripe-align large uncached writes.
Figure 16: Disk Crossing
Figure 16 depicts a single 64 KB I/O split across two disk drives, so a write operation must
access two disk drives to complete. Similarly, a read operation must access two disk drives
to complete. In an aligned system, the 64 KB I/O would have been serviced by a single disk
drive.
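The crossing condition can be checked arithmetically: an I/O crosses a stripe element boundary when its first and last bytes fall in different 64 KB elements. A small sketch (byte units; the 32256-byte offset is the historical 63-sector default partition start, used here for illustration):

```shell
# Does an I/O starting at byte OFFSET with length SIZE cross a 64 KB
# stripe element boundary?
crosses_element() {
  offset=$1; size=$2; element=65536
  start=$(( offset / element ))
  end=$(( (offset + size - 1) / element ))
  if [ "$start" -ne "$end" ]; then
    echo "crossing"
  else
    echo "aligned"
  fi
}

crosses_element 0 65536       # prints aligned: fits in one element
crosses_element 32256 65536   # prints crossing: 63-sector default offset
```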
It is recommended that the operating system's disk utility be used to adjust partitions.
For Oracle and OLTP applications, the volume manager stripe element should be set to
the CLARiiON stripe size, typically 128 KB or 512 KB.
Linux file-system alignment procedure
The following procedure uses fdisk to create a single aligned partition on a second Linux
file system LUN (such as sda or sdc), utilizing all of the LUN's available capacity. In
this example, the partition will be:
/dev/nativedevicename.
The procedure is:
fdisk /dev/nativedevicename # sda and sdc
n # New partition
p # Primary
1 # Partition 1
<Enter> # 1st cylinder=1
<Enter> # Default for last cylinder
x # Expert mode
b # Starting block
1 # Partition 1
128 # Stripe element = 128
w # Write
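To confirm the result, check the partition's start sector, for example with fdisk -lu /dev/sda, and verify it is a multiple of 128 sectors (128 x 512 bytes = 64 KB). A small sketch of that check (the function name is ours; feed it the Start value fdisk reports):

```shell
# An aligned partition starts on a 128-sector (64 KB) boundary,
# matching the stripe element set in the procedure above.
aligned_start() {
  sector=$1
  if [ "$sector" -ne 0 ] && [ $(( sector % 128 )) -eq 0 ]; then
    echo "aligned"
  else
    echo "misaligned"
  fi
}

aligned_start 128   # prints aligned: the start set by the procedure
aligned_start 63    # prints misaligned: the historical default start
```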
C. Post Implementation Phase
Following the implementation phase, we need to monitor the SAN continuously to rectify any
problem before it becomes critical. The status lights on the CLARiiON storage system can
be used to monitor the array and check the health of the CLARiiON system.
1. Health Checkup
DAE Checkup
Figure 17: DAE Checkup
LCC Checkup
Figure 18: LCC Checkup
Disk Module Checkup
Figure 19: Disk Module Checkup
Power Supply Checkup
Figure 20: DAE Checkup
SPE Checkup
Figure 21: SPE Checkup
Figure 22: SPE Checkup
Monitor the following data center parameters daily so that they remain well within the
operating limits of the EMC CLARiiON.
Table 13: Data center Environment Requirements
2. Performance Monitoring
We can use Navisphere Analyzer to monitor CLARiiON performance. First, enable logging on
each Storage Processor by right-clicking the SP and choosing Enable Statistical Logging.
It takes some time for nar files to accumulate enough data to reflect actual performance;
ideally, retrieve the nar files after logging has been enabled for 2-3 days. Then retrieve
the nar files, as shown in Figure 23, by going to Tools > Analyzer > Archive Retrieve.
Figure 23: Navisphere Analyzer
Open the nar file for analysis by going to Tools > Analyzer > Open and selecting the
recently retrieved nar file. We can then see the different parameters for each SP, for
individual LUNs as well as a cumulative report for all LUNs.
Parameters that can be monitored include:
• Utilization
• Queue Length
• Total Throughput (IO/s)
• Read Bandwidth
• Read Size
• Read Throughput
• Response Time (ms)
• Write Size
• Write Throughput
Figure 24 shows the options that can be chosen and the respective graph so that we can see
whether or not the CLARiiON is optimally utilized.
Figure 24: Navisphere Analyzer
We can see that the maximum IOPS being serviced by SPA is 6267. We can zoom the graph to
see individual LUN IOPS, and can similarly choose different parameters from the left-hand
window to see their respective values. This helps us determine the actual status of our
EMC CLARiiON.
TROUBLESHOOTING
Troubleshooting a SAN is complex, but you can save yourself a lot of work if you do two
things. First, verify that you have a SAN issue and not a generic storage issue. Second, begin
the troubleshooting process at the center of the SAN so that you can quickly locate the
general area of the problem.
When you're troubleshooting a SAN, you'll find that most problems aren't actually related to
the SAN. Suppose that you're suddenly unable to read data from the SCSI disk on your
standalone PC. Several things could be causing the problem. The hard disk might have gone
out. Maybe you've got a bad cable or a bad disk controller. Maybe the data on the drive has
been accidentally erased, or the partition has been deleted or corrupted. Just because you
can't access your data does not mean that a hardware failure has occurred.
Let's look at this same situation in the context of a SAN. A SAN is basically a way of linking a
server to a logical device on a disk array or some other storage mechanism. The SAN works
by allowing the server to communicate with the storage device using SCSI commands.
Suppose the server is suddenly unable to read data off the SAN. You may have a SAN
problem, but the problem might relate not to the SAN but to the data itself. It could be that
connectivity between the server and the storage unit is functional, but the data has been
erased, corrupted, or disassociated from the server. In that case, you'd troubleshoot the
problem the same way you would if the storage mechanism were directly attached to your
server.
But what if the SAN were the problem? Your best strategy is to start the troubleshooting
process in the center of the SAN and work out toward the edges.
Step 1: Start troubleshooting at the fabric level. The switches sit at the center of your
SAN and should have connectivity to both the server and the storage device.
Verify that the switch can communicate with the server and the storage device. If you can
verify communications, you can rule out the fabric as being at fault. While examining the
fabric, look for things like unstable links, missing devices, incorrect zoning
configurations, and incorrect switch configurations.
Step 2: Use diagnostic software to test the switch connections. This will verify whether the
storage device is connected to the switch. If it is not, you know the problem has to do with
the storage device: it may be a physical connection issue between the switch and the storage
device, or the storage software configuration may be incorrect.
If the switch can communicate with the storage device, but the server can't, then you know
that the problem lies somewhere between the switch and the server. This is why you start
troubleshooting at the center of the SAN. A few simple tests and you eliminate half of the
SAN as a possible cause of the problem (either the server side or the storage side of the
network).
Step 3: If the problem lies between the server and the switch, check these possible causes.
If you determine that the problem is between the server and the switch, you've got your work
cut out for you.
Possible causes include a bad host bus adapter or a missing or incorrectly configured
driver. The problem may also be related to the way your server is configured to access the
virtual storage device. Start with your hardware manufacturer's diagnostic utility. You can
also run a protocol analyzer to verify that the network interface card (NIC) is functional
and that its driver is working. If the NIC appears functional, the problem is almost
certainly configuration related.
We often encounter the "host agent not reachable" error in Navisphere Manager, with the
host displayed as unmanaged. To troubleshoot, check the following:
• Ensure that the Agent IP address listed in Navisphere is routable from the CLARiiON array; try to ping the host from the array. If the ping succeeds but the agent is still unreachable, the Agent service is probably not running; restart the Agent service on the host.
• Verify that port 6389 is open on any hardware firewall; for a software firewall such as Windows Firewall, add port 6389 as an exception.
• Restart the Management service if the host IP address has changed.
• Set the host NIC to auto-negotiate.
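The routability and port checks above can be scripted from any host on the management network. A hedged sketch: it relies on bash's /dev/tcp virtual path, and the address shown is hypothetical.

```shell
# Probe a TCP port (e.g. 6389, the Navisphere agent port). "closed or
# filtered" means nothing is listening or a firewall is in the way.
port_open() {
  host=$1; port=$2
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    echo "open"
  else
    echo "closed or filtered"
  fi
}

# port_open 192.168.1.50 6389   # hypothetical management address
```

If the host responds to ping but port 6389 reports closed, restart the Agent service or add the firewall exception as described above.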