Enterprise Storage Architecture – Planning


HANA TDI - Enterprise Storage


© 2013 SAP AG

Enterprise Storage Architecture – Planning

Technical Infrastructure for HANA

Raiffeisenring 44, D-68789 St. Leon-Rot

SAP Active Global Support, July 2013

Version 2.2


Table of Contents

1 Introduction
1.1 SAP's vantage point towards an Enterprise Storage Architecture
1.2 Main issues tackled by SAP services for Storage Infrastructures
1.3 Goals of this Document
2 Shared Storage Infrastructure
2.1 Why not use shared Storage Systems for SAP HANA?
3 Storage Systems
3.1 Frontend Adapter
3.2 Processor Complex
3.3 Disk Adapter
3.4 Physical Disks
3.5 Redundant Array of Independent Disks (RAID)
3.6 How Storage Systems provide storage for Host Systems
4 Interconnection of Storage and Host Systems
4.1 Fiber Channel Protocol
4.2 Storage Network
5 Managing Storage on Host Systems
5.1 Logical Volume Manager
5.2 File System
5.3 Cluster File System
5.4 Database System
5.5 Storage management on Host Systems is crucial for optimal I/O Performance
6 The optimal configuration of the I/O Stack
6.1 Threefold Striping
6.2 Managing the Data Growth
6.3 Assessment of the suggested I/O Stack Configuration
7 Monitoring the I/O Stack
7.1 Configuration Analysis
7.2 Performance Analysis
7.3 Summary – Monitoring the I/O Stack
8 Implementing SAP HANA on shared Enterprise Storage
8.1 HANA Persistence Layer
8.2 HANA I/O Pattern
8.3 HANA Storage Connector API
8.4 HANA on an optimal configured I/O Stack
9 References


1 Introduction

Storage – more precisely, the storage system – is the common denominator for all SAP systems, with their requirement for disk space to store their databases and files.

In the technical infrastructure, storage systems must be considered as dedicated computers with special-purpose disk storage management functions, specifically designed to consolidate the storage demands of multiple hosts. Storage is presented to the host systems in the form of Logical Units (called LUNs) or files, whereas the physical layers remain completely hidden. The functions of storage systems include:

Reliability and Availability: All components of the storage system are redundant, so that in case of a malfunction the remaining component can take over the work. The data on the physical disks is protected by redundancy (RAID – Redundant Array of Independent Disks). This means that the storage system always keeps redundant data to reconstruct the data in case of a failed disk.

Scalability: In case of increasing storage demand, the capacity of the storage systems can easily be expanded without impact on operation.

Serviceability: Specifically designed tools are provided to ease storage management functions, such as capacity expansion or monitoring of system components.

Performance: The storage system's capability to stripe data across multiple physical disks – together with memory-cached I/O and special-purpose CPUs for the disk-to-cache (disk adapter) and cache-to-host (host adapter) interfaces – offers business-critical, high I/O throughput.

Storage systems provide "fast copy functions" for data copies within the same (local) and between (remote) storage systems, enabling the implementation of solutions for data high availability, system (landscape) cloning and backup/restore.

1.1 SAP’s vantage point towards an Enterprise Storage Architecture

Basically, all storage systems with the properties described above are suitable for SAP systems, regardless of the deployed disk type (Hard Disk Drives (HDDs) with rapidly rotating discs (platters) coated with magnetic material, or Solid State Disks (SSDs) without any moving mechanical components, using flash memory to store data), the transmission technology (Fiber Channel, Serial Attached SCSI, SATA), the RAID protection (level 5, 6, 10), the interconnection architecture of disk adapters, cache and host adapters, or the protocols used to attach storage to hosts. Due to the selected technologies, all storage systems will have limitations with regard to their I/O performance capabilities; but based on a storage sizing that considers capacity and performance requirements, the storage vendors will provide suitable storage systems.

Because consolidating databases on common resources is reasonable – since cost-efficient – it is essential to balance all data across all storage system components to achieve optimal performance. No matter how many storage systems are used, the best performance can only be achieved if all available storage system components are evenly used.


1.2 Main issues tackled by SAP services for Storage Infrastructures

Storage infrastructure does not fulfill I/O performance requirements – The provided storage fulfills capacity requirements in terms of the space needed to store all data, but storage system components such as physical disks, disk adapters, cache or the storage system's network adapters are not sufficient to fulfill I/O performance requirements.

Key I/O performance indicators (KPIs) are throughput (measured in I/O operations per second (IOPS)), bandwidth (measured in transferred MB per second (MB/s)) and latency (the time that elapses until the successful completion of an I/O operation, measured in milliseconds (ms)) – a calculation sketch follows after this list of issues.

Insufficient I/O performance for some SAP systems – Although the provided storage systems fulfill the performance requirements of the storage sizing, some SAP systems suffer from insufficient I/O performance.

I/O performance deterioration after data growth – Customers start SAP operation with an initially optimal storage system configuration, but as data grows the I/O performance deteriorates.

Configuration and performance analysis of the complete I/O stack needed – In case business users of SAP systems suspect the storage infrastructure to be the culprit for unsatisfactory overall performance, be prepared to conduct an analysis of the entire I/O stack – on database system, operating system and storage system level – consisting of configuration and performance analyses.

What is an optimal configuration of the I/O stack? – Customers are planning to redeploy their storage systems and are looking for suggestions on how to optimally configure the I/O stack.
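The following minimal sketch (not part of the original document; the counter names are illustrative) shows how these three KPIs are derived from raw counters sampled over a measurement interval:

    # Derive the three I/O KPIs from counters collected over a measurement interval.
    # The counter names are illustrative; real values come from OS or storage monitors.
    def io_kpis(completed_ios, transferred_bytes, total_io_time_ms, interval_s):
        iops = completed_ios / interval_s                           # throughput (IOPS)
        mb_per_s = transferred_bytes / (1024 * 1024) / interval_s   # bandwidth (MB/s)
        latency_ms = total_io_time_ms / completed_ios if completed_ios else 0.0
        return iops, mb_per_s, latency_ms

    # Example: 45,000 I/Os moving 1.4 GiB within a 60 s interval, 270,000 ms of I/O time
    print(io_kpis(45_000, 1.4 * 1024**3, 270_000, 60))   # ~750 IOPS, ~23.9 MB/s, 6.0 ms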

1.3 Goals of this Document

With this document we would like to establish a common understanding with regard to the building blocks and physical components of an Enterprise Storage Architecture. If necessary, we will introduce a "normalized" naming of Enterprise Storage components to ease the comparison of certain vendor solutions.

It is not our intention to explain in detail the ever evolving technologies and hardware components that are deployed in state-of-the-art storage infrastructures. For those who are interested in a "deeper dive", we provide references as an entry point into these broad topics.

Since in SAP Business Suite systems most of the I/O operations are performed by the database server, we will focus on this type of server.

Working through the main issues described above leads to topics that need to be carefully considered when planning the Enterprise Storage Architecture:

Determine design guidelines for the I/O stack that are optimal regarding I/O performance and management of system growth – these guidelines are provided in chapter "6 The optimal configuration of the I/O Stack".

Establish a monitoring infrastructure for the key components of the storage systems as well as for the entire I/O stack – refer to chapter "7 Monitoring the I/O Stack".

Feedback is very welcome; please send it to [email protected].


2 Shared Storage Infrastructure

To illustrate the advantages of a common use of storage resources by the systems of the SAP Business Suite, see the following graphic.

One approach for providing an IT infrastructure for the SAP Business Suite could be one host system for each Business Suite system. Each host system is equipped with sufficient CPU power, main memory and storage (SSD and HDD) resources to fulfill the needs of the hosted Business Suite system.

With regard to CPU power and main memory resources, it is undoubtedly the right decision to run the database services for each productive Business Suite system on its own host system to get the best performance. The application servers as well as the non-productive database servers may share the CPU power and main memory resources of host systems – this consolidation is provided by server virtualization technologies.

A key property of virtualization technologies is, among others, the efficient use of the acquired physical resources, achieved by provisioning virtual – rather than actual – resources. Why not apply the same concept to storage – hiding all physical layers and taking advantage of storage technologies to solve typical challenges such as managing system growth or performance bottlenecks?


Take advantage of a shared Storage Infrastructure

Managing the system growth might be a challenge in an IT infrastructure where each host system has its own storage. Consider that a Business Suite system on one host may run out of storage space, while systems on other hosts do not need all of the provided space. In a non-shared storage environment, access to storage resources on other hosts can be enabled through a network file system, but since this will most likely increase I/O latency, additional storage will generally be provided on the host of the requesting Business Suite system. The latter leads to an inefficient utilization of purchased resources.

Whenever additional storage resources are provided, this provision must not interrupt the availability of the Business Suite system. Since simply adding physical disks may lead to an accumulation of new data on the new disks, the management of system growth must ensure a balanced distribution of data across all available disks to avoid hot spots.

Performance bottlenecks on storage level occur whenever single components such as SSDs or HDDs are overstrained. As described above, this may be caused by an unbalanced distribution of data or by insufficient resources. The latter is the case when, for example, customers decide to purchase a few high-capacity disks instead of more, smaller disks to fulfill their storage needs.

Hiding the physical storage layers from the application simplifies the storage configuration. However, finding bottlenecks becomes more complicated. Depending on where – on database or storage level – a performance bottleneck was found, questions must be answered such as: "On which physical disk is a certain database object stored?" or "Which database objects are stored on overstrained physical disks?". This shows that the analysis of I/O bottlenecks not only requires performance monitors for the storage components, but also methods that determine how database objects are mapped onto the I/O stack.

Service Level Agreements such as constant availability and non-disruptive backup are typical for an IT infrastructure that is used for SAP Business Suite systems. Availability is the main feature of storage systems and is accomplished by redundancy: all components of a storage system are redundant, so that in case of a malfunction the remaining component can take over the work. The data on the physical disks, for example, is protected by redundancy (RAID – Redundant Array of Independent Disks). Storage systems provide "fast copy functions" for data copies within the same (local) and between (remote) storage systems, enabling the implementation of solutions for data high availability, disaster protection and backup/restore.

Bottom line

For all systems that share the same storage resources, Storage Systems provide features that help to manage the system growth as well as I/O performance. Moreover, they provide properties that enable implementing solutions for data high availability, disaster protection and backup/restore. We therefore recommend shared Storage Systems for all SAP Business Suite systems.


Without a doubt, the above illustration shows a very simplified storage system, but it depicts the basic ideas of shared storage systems. For all SAP Business Suite systems, storage is provided here in the form of Logical Units (called LUNs). The LUNs are provided from Storage Pools, and each LUN is evenly distributed across all physical disks belonging to a Storage Pool. The figure above shows one Storage Pool containing HDDs and SSDs, representing a newer feature of Storage Systems – "Storage Tiering".

Each tier of storage consists of physical disks with the same performance characteristics. For Storage Tiering, at least two types of physical disks with different performance characteristics are necessary, such as SSD and HDD. Storage Tiering provides automatic relocation of the most frequently used data to the physical disks with the highest performance capabilities.

In the following chapter we will discuss in some more detail the architecture of a Storage System and its basic components. Essentially, Storage Systems consist of physical disks that are connected via Disk Adapters and a backplane with the Controllers. The Controllers provide, among other things, functions for redundant (RAID) data storage and are therefore often called RAID controllers. In some publications even the entire Storage System is called a RAID controller, but in this document we will use the term Storage System. All Storage Systems have in common that all read and write I/O operations pass through the Cache (a main memory layer). Access to Storage System LUNs is provided via network interfaces – the Frontend Adapters.


2.1 Why not use shared Storage Systems for SAP HANA?

Many SAP customers are already taking advantage of the features provided by shared storage systems. The acquired storage resources are efficiently used by their business systems, so that each system gets the required capacity and I/O performance. The storage resources are centrally managed and monitored, and if capacity or performance needs to be increased, additional resources are added without interrupting system availability. To ensure business continuity, IT solutions have been implemented for high availability and disaster recovery that very often are based on specific capabilities of shared storage systems.

With the introduction of SAP HANA, "complete IT landscapes" – consisting of CPU, memory, network components and storage – have been delivered to customers. These appliances have been configured to precisely meet the customer's requirements and are equipped with components that perfectly fit together. The good thing about appliances is that the hardware and software vendors use components that are coordinated and certified, while the customer can use the appliance easily – just "plug and play".

From the perspective of data center integration, the "appliance solution" causes additional effort. Established and reliable solutions for high availability and disaster recovery cannot be used if they are based on specific capabilities of shared storage systems. The procurement of additional storage for the appliance may be quite different than for shared storage systems, and systems that do not run on the appliance gain no benefit. Therefore, a number of customers with a state-of-the-art shared storage infrastructure prefer integrating SAP HANA into this existing infrastructure.

As of May 2013, SAP opened HANA for a tailored data center integration (see the references for further information). SAP provides a procedure that allows customers and their storage providers to check whether their current shared storage infrastructure is suitable for SAP HANA. If all I/O requirements are met, the advantages of a shared storage infrastructure can be used for SAP HANA.


3 Storage Systems

The goal of this chapter is to describe a storage system architecture containing all components that need to be considered when determining design guidelines for the I/O stack and establishing a monitoring infrastructure.

3.1 Frontend Adapter

Frontend Adapters are devices that act as interfaces between the storage system and the host systems – or between the storage system and a network connecting hosts and storage, in case the hosts are not directly attached to the storage system. The corresponding devices on host system level are the Host Bus Adapters (HBAs).

In accordance with the Open Systems Interconnection (OSI) model, both the Frontend Adapter and the Host Bus Adapter belong to the lowest, the physical layer. Their function is to convert the electrical signals of the storage system (or host system) into serial signals that are passed through the network between storage and host system.

The Frontend Adapter connects to the storage system's inter-component backplane. A Frontend Adapter can have more than one port, and for the management of I/O operations it contains CPUs and memory. In technical specifications, Frontend Adapters are characterized by their maximum bandwidth, specified in gigabits per second (Gb/s). Storage system monitors provide information about the currently achieved bandwidth to assess whether a Frontend Adapter is overstrained – some storage system manufacturers additionally provide the utilization of the Frontend Adapter CPUs for this assessment.

There are several manufacturers of Frontend Adapters and of the corresponding devices (HBAs) on the host systems, and when planning the storage infrastructure it is important to consider that these must be matched. Fortunately, storage vendors take on this task by testing the interoperability of all kinds of combinations of these devices in their labs.

3.2 Processor Complex

The Processor Complex is the main component of an Enterprise Storage System and, due to availability requirements, this complex consists of redundant components. To explain this redundancy, we selected a storage system model that is configured like a server cluster. The clusters consist of dedicated CPUs and main memory (cache), and they are interconnected using high-speed data transmission technology characterized by low latency and high bandwidth.

Both clusters serve I/O requests, and should one cluster become unavailable, the host systems access their data through the remaining cluster.

All functions of a storage system are provided by the micro-code (so to speak, the operating system) that runs on each cluster system. These functions include, for example: fast copy functions for data copies within the same (local) and between (remote) storage systems, algorithms for storing data with redundancy (RAID) and across many physical disks, provision of Logical Units (LUNs) and LUN masking (meaning visibility and accessibility of individual LUNs to specific hosts), and cache algorithms that, for example, detect sequential access patterns and start pre-fetching data ahead of the I/O requests.

Cache is crucial for optimal I/O performance. Consider that ALL I/O operations pass through the cache!

READ I/O operations are processed in the following basic steps: (1) The cache is queried to determine whether the requested data is in the cache. (2) If the data is not in the cache, it is retrieved from the physical disks and transferred to the cache (STAGED). (3) The data is sent from the cache to the requesting application buffer.

WRITE I/O operations are processed in the following basic steps: (1) The data is stored in the cache. (2) The storage system signals the host that the I/O has completed. (3) The RAID algorithms are processed and the data is transferred (asynchronously) from the cache to the physical disks (DESTAGED).

The latter means that WRITE I/O operations requested by a host benefit most from the cache, since the host can continue operation as soon as the data has arrived in the cache and has been confirmed by the storage system as successfully processed. Since data is transferred asynchronously from cache to physical disk, the storage system provides features (the cache is battery-buffered) so that no data is lost even if the power fails.

Since WRITE and READ operations share the cache, storage systems allow only a part of the cache to be filled with modified data that has not yet been written to disk – the "Write Pending Limit". Whenever this limit is reached, the storage system must switch to "deferred write" mode to destage the data – and this causes a considerable degradation of write performance. Storage system monitors provide information about this performance-critical cache property. To avoid deferred-write situations for WRITE-intensive database objects, such as the DB system LOGs, it is recommended to distribute these objects across many physical disks.
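To make the read/write path and the Write Pending Limit more concrete, here is a minimal, illustrative model in Python (not taken from any vendor's micro-code; the class, method and threshold names are assumptions of this rewrite):

    class StorageCache:
        """Simplified model of the cache behavior described above."""
        def __init__(self, capacity_blocks, write_pending_limit=0.7):
            self.capacity = capacity_blocks
            self.write_pending_limit = write_pending_limit
            self.clean = {}   # block id -> data staged from disk
            self.dirty = {}   # block id -> data not yet destaged

        def read(self, block_id, disk):
            # (1) query the cache, (2) stage from disk on a miss, (3) return to the host
            if block_id in self.dirty:
                return self.dirty[block_id]
            if block_id not in self.clean:
                self.clean[block_id] = disk.read(block_id)          # STAGE
            return self.clean[block_id]

        def write(self, block_id, data, disk):
            # (1) store in cache, (2) acknowledge to the host immediately,
            # (3) destage asynchronously; forced here only when the Write Pending
            #     Limit is reached ("deferred write" - the slow case to avoid)
            self.dirty[block_id] = data
            if len(self.dirty) >= self.write_pending_limit * self.capacity:
                self.destage(disk)
            return "ack"

        def destage(self, disk):
            for block_id, data in self.dirty.items():
                disk.write(block_id, data)                          # DESTAGE (RAID applied here)
            self.clean.update(self.dirty)
            self.dirty.clear()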


3.3 Disk Adapter

Disk Adapters provide the paths from the Processor Complex to the physical disks. For redundancy, each physical disk is reachable via two different Disk Adapters. Data is transferred over a network between Disk Adapter and physical disks, mainly using the Fiber Channel (FC) protocol – other common protocols are SATA (Serial Advanced Technology Attachment) and SAS (Serial Attached SCSI (= Small Computer System Interface)). The network topology is mainly a switched fabric or an arbitrated loop.

In an Arbitrated Loop topology, data flows through all disks before arriving at either end of the Disk Adapter. The full loop is required to participate in data transfer, and loop stability can be affected by physical disk failures.

In a Switched Fabric topology, the switches enable a direct data flow between Disk Adapter and physical disks (direct point-to-point links); therefore a failure of one physical disk will not impair the data transfer to the remaining disks.

Like Frontend Adapters, the Disk Adapters are characterized by their maximum bandwidth, specified in gigabits per second (Gb/s). Storage system monitors provide information about the currently achieved bandwidth to assess whether a Disk Adapter is overstrained.

3.4 Physical Disks

Physical Disks – as depicted in the storage system model above – are Hard Disk Drives (HDDs) with a stack of rapidly rotating discs (platters) coated with magnetic material.

Between the platters are the disk arms, each carrying a READ/WRITE head at its tip that can be positioned on the tracks of a platter. The READ/WRITE heads always have the same distance from the spindle, and all tracks beneath the current READ/WRITE head positions are called a cylinder.

The capacity (measured in gigabytes (GB)) of a physical disk depends on (1) the number of platters, (2) the number of tracks per platter (Track Density) and (3) the number of bits of data that fit on a given surface area of a platter (Areal Density).

Since manufacturers agreed on certain form factors for physical disks (5.25, 3.5 or 2.5 inches), it is mainly the development of the Areal Density that contributes to higher capacity of physical disks. As the "outer" tracks on a platter can hold more data than the "inner" tracks, the algorithms used to store data first fill the outer tracks to optimize the sustained Transfer Rate (measured in MB/s) – high sustained Transfer Rates are beneficial for sequential I/O patterns.

The Access Time (measured in milliseconds (ms)) of random I/O operations depends on (1) the Latency, the time it takes the disk to rotate until the data appears beneath the READ/WRITE head, (2) the Seek Time, the time it takes the READ/WRITE head to be positioned on the right track, and (3) the Transfer Time, the time needed to transfer data between the physical disk and the cache.


Currently used physical disks have a rotational speed of 15,000 rpm (rotations per minute). Assuming that the data will on average appear beneath the READ/WRITE head after half a rotation, the Latency will be 2 ms (= 1 minute / rotational speed / 2 = 60,000 ms / 15,000 rpm / 2). The technical specifications of a physical disk provide the Seek Time; currently installed disks have a Seek Time of about 3 ms. The Transfer Time is typically a fraction of a millisecond. Neglecting the Transfer Time, a single physical disk can serve roughly 200 I/O operations per second (1 s / (Latency + Seek Time) = 1000 ms / (2 ms + 3 ms)); once the Transfer Time is included, the realistic figure is closer to 170–190 IOPS.
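A short worked example of this arithmetic (an illustrative sketch using the figures quoted above; the Transfer Time value is an assumption):

    # Rough single-disk performance estimate for a 15,000 rpm drive.
    rpm = 15_000
    rotational_latency_ms = 60_000 / rpm / 2   # half a rotation on average -> 2.0 ms
    seek_time_ms = 3.0                         # from the drive's technical specification
    transfer_time_ms = 0.3                     # assumed "fraction of a millisecond"

    service_time_ms = rotational_latency_ms + seek_time_ms + transfer_time_ms
    iops = 1000 / service_time_ms              # ~189 IOPS; 200 IOPS with zero transfer time
    print(f"latency={rotational_latency_ms:.1f} ms, service={service_time_ms:.1f} ms, ~{iops:.0f} IOPS")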

Although manufacturers attempt to improve the throughput (IOPS), the sustained Transfer Rate (MB/s) and the Access Time of a physical disk by adding cache to the disk, this is not sufficient to meet the demands of SAP Business Suite systems. Even if the entire physical disk consists of flash memory – without any moving mechanical components – as in Solid State Disks (SSDs), single disks will most likely not meet the I/O performance needs.

3.5 Redundant Array of Independent Disks (RAID)

The introduction of the Redundant Array of Independent Disks (RAID) concept helped to overcome the performance limitations of single physical disks.

RAID 0 (Striping) is the concept for increasing I/O performance. A certain number of physical disks is grouped into one disk array, and physical storage is provided as LUNs (Logical Units), which are evenly distributed across the disk array. Data transmitted to a LUN with one I/O operation is broken into as many pieces of equal size as there are disks within the group, and these pieces in turn are evenly distributed across all physical disks. Although the I/O performance is increased by a factor equal to the number of disks in the group, this improvement comes at a high price: if one disk of a stripe set fails, all data is lost!

RAID 10 (Mirroring and Striping) combines data protection and performance improvement. First, several mirror pairs of two physical disks are created, and then data is striped across these pairs. As long as no entire mirror pair fails, up to half of all disks of a disk array can fail before data is lost. Read I/O operations on a LUN provided from a RAID 10 disk array are served by all disks, write I/O operations only by half of the disks. Due to the mirroring, the usable storage capacity is only half of the procured capacity – 50% of the capacity is used for data protection.

RAID 5 (Parity Protection and Striping) is a compromise between costly data protection and performance improvement. Only the storage capacity of one physical disk is used for redundant information – meaning the disk array consists of n+1 physical disks. All data transferred to a LUN of a RAID 5 disk array is broken into n pieces of equal size, and a "parity block" (the redundant information needed to rebuild data) is created. Finally, all data pieces and the parity block are evenly distributed across all physical disks. In case one physical disk fails, all data can be rebuilt using the information stored on the remaining n physical disks. In a RAID 5 protected disk array consisting of n data + 1 parity disks, (1 – n / (n+1)) * 100 is the percentage of capacity used for protection – for example, in a RAID 5 (7+1) configured disk array, 12.5% of the capacity is used for data protection.
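As an illustration of these capacity trade-offs, the following sketch (not part of the original document) computes usable capacity and protection overhead for the RAID levels discussed above:

    def raid_capacity(level, disks, disk_size_gb):
        # Usable capacity and protection overhead for RAID 0, 10 and 5.
        raw = disks * disk_size_gb
        if level == 0:                              # striping only, no protection
            usable = raw
        elif level == 10:                           # mirrored pairs, then striped
            usable = raw / 2
        elif level == 5:                            # n data disks + 1 parity disk
            usable = (disks - 1) * disk_size_gb
        else:
            raise ValueError("only RAID 0, 5 and 10 are sketched here")
        overhead_pct = (1 - usable / raw) * 100
        return usable, overhead_pct

    print(raid_capacity(10, 8, 600))   # (2400.0, 50.0)  -> 50% used for protection
    print(raid_capacity(5, 8, 600))    # (4200.0, 12.5)  -> RAID 5 (7+1): 12.5%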


The RAID levels described above are the most common. However, there are many other levels that are not covered in this document, for example RAID 3, 4 and 6 (other types of parity protection) or nested (hybrid) types such as RAID 51.

Basically, all RAID levels that provide data protection are suitable for SAP Business Suite systems. The performance capabilities of a RAID level depend on the manufacturer-specific implementation.

3.6 How Storage Systems provide storage for Host Systems

The LUNs, which are provided from RAID-configured disk arrays, can be supplied directly to host systems using block-based transmission protocols such as Fibre Channel (FC), SCSI, iSCSI, ATA over Ethernet (AoE) or HyperSCSI (which bypasses the internet protocol suite (TCP/IP) and works directly over Ethernet). A storage infrastructure that provides storage as devices (LUNs) using a block-based protocol is called a Storage Area Network (SAN). Since mainly the FC protocol is used in these infrastructures, Fiber Channel has become a synonym for SAN.

Alternatively, storage can be provided to host systems using file-based transmission protocols such as NFS (Network File System, popular on UNIX systems) or SMB/CIFS (Server Message Block/Common Internet File System, used with MS Windows systems). A storage infrastructure that provides storage as files, using a file-based protocol, is called Network Attached Storage (NAS).

The conceptual difference between NAS and SAN is that NAS appears to the client operating system (OS) as a file server, whereas a device (LUN) available through a SAN still appears to the client OS as a device. The storage devices are managed either on storage system level (in the case of NAS) or on operating system level (in the case of SAN) by utilities for Logical Volume Management (LVM) and File System (FS) management. We will explain these levels of the I/O stack in the chapter "Managing Storage on Host Systems".


4 Interconnection of Storage and Host Systems

As mentioned in the introduction, we focus in this document on host systems that provide database services and that are interconnected with storage systems using a dedicated network. Unlike networks between peer systems – the server network (client to server, server to server) – a storage network connects non-peer host and shared storage systems, and each host considers storage as its private property. Both network types may use similar hardware, such as copper or optical cables for data transmission, a similar topology and similar network components, but they differ in the data transmission protocols used.

4.1 Fiber Channel Protocol

To standardize the communication in networks, the International Organization for Standardization (ISO) provided the Open Systems Interconnection (OSI) model. It is a way of subdividing a communication system into smaller parts called layers. Similar communication functions are grouped into logical layers. A layer provides services to its upper layer while it receives services from the layer below.

Since mainly the Fiber Channel Protocol (FCP) is used in storage networks, we will briefly introduce the OSI model and relate the Fiber Channel (FC) layers to the OSI layers.

The OSI Physical Layer (FC-0) defines electrical and physical specifications for devices (Frontend Adapter, switches, Host Bus Adapter). In particular, it defines the relationship between a device and a transmission medium, such as a copper or optical cable. This relationship includes the layout of pins, voltages, cable specifications and more. On a Fiber Channel link, data is transmitted serially – at the time of writing, speeds (bandwidths) of 4 or 8 Gb/s were standard.

The OSI Data Link Layer (FC-1) provides functions and procedures to transfer data between network entities. This layer also detects and corrects errors that might occur in the Physical Layer. These functions are enabled by signal encoding: signals transmitted serially never arrive at exactly identical intervals, therefore the receiver must synchronize regularly with the transmitter. For this synchronization, the data stream clock can be used. In all serial transmission techniques, the data stream clock can be derived thanks to a special encoding; for example, each byte (8 bits) is translated into a 10-bit character (8b/10b encoding). This explains why an 8 Gb/s FC adapter never reaches a bandwidth of 1 gigabyte per second (GB/s), but at most 20% less. Manufacturers are working on encodings with less overhead.
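A short worked example of the encoding overhead, consistent with the simplified figures in the text (illustrative only):

    # Effective payload bandwidth of an 8 Gb/s link with 8b/10b encoding:
    # every 8 payload bits are sent as 10 line bits, so 20% of the line rate is overhead.
    line_rate_gbit = 8.0
    payload_gbit = line_rate_gbit * 8 / 10    # 6.4 Gb/s of payload
    payload_gbyte = payload_gbit / 8          # 0.8 GB/s instead of the naive 1 GB/s
    print(f"{payload_gbyte:.1f} GB/s usable payload bandwidth")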

The OSI Network Layer (FC-2) provides functions and procedures for transferring variable-length data sequences. Large sequences are divided by the transmitter into several frames. A sequence is passed to the next higher layer after all frames have arrived. If a frame gets lost, today's FC networks retransmit the entire sequence. Since a transmitter may only transmit as many frames as the receiver can process (the "credit method"), it is important to coordinate the credits between HBA and Frontend Adapter (end-to-end credits) and between switch and HBA/Frontend Adapter (buffer-to-buffer credits).

The OSI Transport Layer (FC-3) provides transparent transfer of data between users and additional services to the upper layers, such as multipathing, striping, mirroring, compression and encryption. Today's FC products do not provide these additional services; currently, software beyond the Fiber Channel protocol stack and the switches provide, for example, multipathing features.


The upper OSI layers (FC-4) (the Upper Layer Protocols) define which protocol is used to exchange data between devices. A Fiber Channel network (layers FC-0 up to FC-3) can be used for multiple protocols, such as SCSI, Internet Protocol (IP) or FICON. The protocol for SCSI (= Small Computer System Interface) on a Fiber Channel network is called the Fiber Channel Protocol (FCP).

4.2 Storage Network

The interconnection of storage and host systems is provided by a storage network consisting of a fabric of interface devices (Frontend Adapters, Host Bus Adapters) and switches that couple network segments together via optical cables. The data is transmitted according to the specifications of the selected protocol.

For each database host system, access paths between an HBA port and a Frontend Adapter port are defined – this is called "zoning". In case multiple paths are defined to access storage, storage system manufacturers offer "multipathing software", providing access to storage should one path fail as well as I/O workload balancing. The multipathing software (driver) has to be installed on the host operating system.

Each host considers the access paths and the underlying hardware as its private property. This means that each host has point-to-point connections from its HBA ports to Frontend Adapter ports, and – as depicted above – in case a switch or Frontend Adapter port has a malfunction, access is provided through an alternate path. But if host systems share Frontend Adapter ports, they may impair each other. The storage network for SAP Business Suite systems with the highest I/O performance requirements should therefore be configured in such a way that these systems do not share Frontend Adapter ports.
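To illustrate what a multipathing driver does conceptually – balance I/O across the available paths and fail over when a path goes down – here is a minimal, hypothetical sketch (the class and path names are illustrative, not a vendor API):

    class MultipathDevice:
        # Round-robin load balancing with failover across redundant SAN paths.
        def __init__(self, paths):
            self.paths = list(paths)            # e.g. ["hba0->fa_port1", "hba1->fa_port2"]
            self.failed = set()
            self._next = 0

        def mark_failed(self, path):
            self.failed.add(path)               # e.g. switch or Frontend Adapter port down

        def pick_path(self):
            alive = [p for p in self.paths if p not in self.failed]
            if not alive:
                raise IOError("no path to the LUN available")
            path = alive[self._next % len(alive)]   # spread I/O across all healthy paths
            self._next += 1
            return path

    dev = MultipathDevice(["hba0->fa_port1", "hba1->fa_port2"])
    print([dev.pick_path() for _ in range(4)])  # alternates between both paths
    dev.mark_failed("hba0->fa_port1")
    print(dev.pick_path())                      # all I/O now uses the remaining path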


5 Managing Storage on Host Systems

In the following we assume that storage is provided to the operating system of a database host as devices (LUNs) using a block-based protocol – that is, storage from a Storage Area Network (SAN). On operating system level we will use the more general term Host Devices for the LUNs provided by the SAN.

In the previous chapter we learned that the Host Bus Adapter can buffer data ("frames", see OSI Network Layer), and that the number of buffered frames must be coordinated with the capabilities of the network device connected to the HBA (buffer-to-buffer credits). Manufacturers of Host Bus Adapters provide the software for managing I/O operations on HBA level – the HBA driver.

There is also a "driver" for the management of I/O operations on Host Devices, and this "Host Device Driver" manages for each Host Device a buffer that can queue the I/O operations requested by an application. Since many Host Devices share one HBA, their queue lengths must be coordinated with the buffer capabilities of the HBA.

Both the I/O buffers on HBA level and the queues on Host Device level are crucial for I/O performance; from the database application's point of view, the more of each buffer is available for I/O operations on a database file, the better the I/O performance.

Distributing the I/O requests for a Host Device across multiple HBAs provides more HBA buffers – this is achieved by "I/O multipathing". The I/O multipathing driver provides access to SAN storage should one path fail, as well as I/O workload balancing across all available paths.

The number of Host Device queues available for I/O operations on DB files can be increased by distributing these objects across multiple Host Devices, and this is achieved by the Logical Volume Manager.


5.1 Logical Volume Manager

Like the Storage System, the Logical Volume Manager (LVM) provides methods for increasing I/O performance and data protection. Since Storage System RAID configurations perfectly combine data protection and performance improvement, this is very often considered sufficient and the LVM capabilities are left aside.

Yes, the storage system redundancy algorithms (RAID) are undoubtedly sufficient for data protection, but only the additional striping of data across Host Devices enables the utilization of multiple Host Device queues. To achieve this, Host Devices are combined into a Volume Group (VG), and Logical Volumes (LVs) are created that are distributed across the Host Devices belonging to the VG.

When creating Logical Volumes in a Volume Group, storage is allocated on the Host Devices in "contiguous chunks" (often called "partitions" or "extents"), having a size of 1 MB up to several GB. This size is defined per Volume Group, meaning that the "chunks" of all Logical Volumes in a VG have the same size.

Only if these chunks are allocated striped (next chunk on next Host Device) will the data be balanced across all Host Devices of a Volume Group, so that I/O requests can take advantage of multiple Host Device queues. If all storage of a Volume Group is exhausted and more is needed due to database growth, the Volume Group must be extended by multiple Host Devices to keep a balanced distribution of Logical Volumes and the utilization of multiple Host Device queues. Ideally, the Volume Group should always be extended with the same number of Host Devices, for example as many as initially used. We suggest creating striped-allocated Logical Volumes for database files containing data and indexes.

Another way to allocate the chunks of storage is to use the entire storage of one Host Device first, before allocating storage on the next Host Device of a VG. This concatenated allocation may lead to an accumulation of the most frequently used data on a single Host Device, and the I/O performance will be limited to the performance of one Host Device. Concatenated-allocated Logical Volumes are therefore inappropriate for DB objects of SAP Business Suite systems.
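The difference between striped and concatenated chunk allocation can be illustrated with a small sketch (hypothetical device names, not an actual LVM implementation):

    def allocate_chunks(num_chunks, host_devices, policy="striped"):
        # Map logical-volume chunks to host devices under the two allocation policies.
        layout = {dev: [] for dev in host_devices}
        if policy == "striped":
            # next chunk on next host device -> balanced, uses all device queues
            for chunk in range(num_chunks):
                layout[host_devices[chunk % len(host_devices)]].append(chunk)
        else:
            # concatenated: fill one device completely before using the next
            per_device = -(-num_chunks // len(host_devices))   # ceiling division
            for chunk in range(num_chunks):
                layout[host_devices[chunk // per_device]].append(chunk)
        return layout

    devices = ["hdisk0", "hdisk1", "hdisk2", "hdisk3"]
    print(allocate_chunks(8, devices, "striped"))       # chunks spread round-robin
    print(allocate_chunks(8, devices, "concatenated"))  # consecutive chunks land on the same device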

For some database objects, such as the performance-critical Redo LOGs, the granularity of the chunks allocated on Host Devices (> 1 MB) will not be sufficient, since the Redo LOG write I/O blocks are often smaller than the allocated chunks, and therefore the I/O performance might be limited to the performance (queue) of one Host Device. Fortunately, the LVM also provides a "striping of blocks" for LVs, each block having a size of 4 KB up to several MB, which helps to overcome this performance limitation.

We suggest providing a separate Volume Group for the creation of the Logical Volumes that will be used for Redo LOG files. Each Logical Volume is again striped allocated. Additionally, all Host Devices in the Volume Group form a stripe set (the number of Host Devices is often called the "stripe width"), and the blocks of the stripe set – each block having the same "stripe size" – are balanced across all Host Devices of the VG.

These "block-level striped" Logical Volumes use the queues of all <n> Host Devices belonging to the Volume Group, since I/O operations requested by the database are "chopped" into <n> same-sized blocks, and these in turn are processed in parallel – WRITE as well as READ I/O operations. The parallel processing improves the I/O performance significantly. For best Redo LOG I/O performance it is essential to distribute these objects across many Host Devices.
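A minimal sketch of how a single I/O request is chopped into stripe-size blocks and spread across the Host Devices of the stripe set (illustrative; the 256 KB stripe size is an assumption):

    def chop_io(offset_bytes, length_bytes, stripe_size, stripe_width):
        # Split one logical I/O into per-device requests of a block-level striped LV.
        requests = []                       # (host_device_index, device_offset, length)
        pos = offset_bytes
        end = offset_bytes + length_bytes
        while pos < end:
            stripe_index = pos // stripe_size
            device = stripe_index % stripe_width            # round-robin across devices
            within = pos % stripe_size
            length = min(stripe_size - within, end - pos)   # stay inside this stripe block
            device_offset = (stripe_index // stripe_width) * stripe_size + within
            requests.append((device, device_offset, length))
            pos += length
        return requests

    # A 1 MB write against a 4-device stripe set with a 256 KB stripe size:
    for req in chop_io(0, 1_048_576, 262_144, 4):
        print(req)   # four 256 KB pieces, one per Host Device, processed in parallel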

The Logical Volume Manager and Storage Systems have one more concept in common: data mirroring, enabling the implementation of solutions for data high availability.

The basic concept of mirroring is simple – keep the data in different locations to eliminate the possibility of data loss in a disk or block failure situation. From the high-availability point of view, each mirror of the data should be located on separate physical disks, and separate I/O adapters should be used. Ideally, the physical disks should be located in separate storage systems to protect against data loss in a power failure situation.

As mentioned in the chapter on Storage Systems, these have functions that allow copies within and between storage systems – data is copied from one storage system LUN to another. Data mirroring on Logical Volume Manager level – also known as "Host Level Mirroring" – is achieved by copying data from one Logical Volume to another, and both LVs can be physically stored on separate storage systems.

Both data mirroring concepts, on Storage System and on Logical Volume Manager level, are equally suitable for a data high availability solution; however, mirroring on Storage System level is completely transparent to the operating system and does not require any resources of the database host.

The next level above the Logical Volume Manager is the file system. Before discussing more details about file systems, it should be mentioned that for database applications, file systems are not needed to achieve optimal I/O performance.

Basically, the Logical Volumes can be provided to the database system directly – as so-called raw devices – and data blocks are then transferred directly between the database buffers and the Logical Volumes (Direct I/O). To achieve best performance, database algorithms try to keep data in the buffers (main memory) as long as possible and perform I/O operations asynchronously whenever possible (Asynchronous I/O). Since database systems coordinate the concurrent data access of multiple processes (by database locking), there is no need for locking I/O operations (Concurrent I/O).

File systems are preferred to raw devices because they provide advantages such as simpler and safer administration, but no matter which file system type is implemented, it should provide the key performance capabilities of raw devices – that is, Direct I/O, Asynchronous I/O and Concurrent I/O.


5.2 File System

Today's file systems must meet different challenges – they must be fast, robust, ideally infinitely scalable, and they must provide capabilities supporting data center operations, such as seamless file system growth and backup/restore. Note: the term file system is often equated with the software that manages file systems.

Data (database objects) is stored in files, and for fast access the file system management organizes the files in a tree structure.

The elements of the tree are called I-nodes. The node at the top is called the root and the nodes at the bottom are called leaves.

Only the leaves contain data, whereas the upper nodes contain organizational information – pointers to nodes or leaves.

The pointers are located in contiguously allocated chunks of space – the extents. The extents that build the leaves are of variable size (a multiple of the file system block size). The ability to allocate extents reduces the administrative burden for growing file systems and improves performance.

The file system management keeps the tree balanced (B-tree), so that all paths from the root node to the leaves have the same length and are as short as possible. This provides fast access to the part of a file containing the data requested by the application (here the database system).

Changes to DB files due to insert, update and delete operations may lead to a re-balancing of the B-tree, and the file system management may need to temporarily lock I-nodes. This I-node locking can have a severe impact on I/O performance; to avoid it, the file system option "Concurrent I/O" should be activated whenever available.

Ideally, change operations on DB files should not block any database process. To achieve this, the file system option "Asynchronous I/O" should be activated whenever available. With this option, I/O operations run in the background and do not block database processes. This improves performance, since I/O operations and database processing can run simultaneously.

Since file systems manage the files of any application (not only databases), they also cache data changes in main memory to improve I/O performance. To prevent losing cached data in the event of a system crash, file systems must provide measures that guarantee data consistency. This robustness is achieved by logging write I/O operations, and these logs are kept in file system journals.

Although file system caching generally improves I/O performance, it is unsuitable for database applications. The database system already buffers all data in main memory, therefore additional file system caching wastes memory, while double copying (storage to file system cache and file system cache to DB buffer) wastes CPU time. For file systems containing data, indexes and Redo LOG data, we suggest activating the option "Direct I/O" whenever available.
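As an illustration of what Direct I/O means on operating system level, the following Linux-specific sketch (an assumption of this rewrite, with a hypothetical file path) opens a file while bypassing the file system cache; note that O_DIRECT requires buffer, offset and length to be aligned, here to 4 KB:

    import os, mmap

    BLOCK = 4096                                    # alignment required by O_DIRECT
    path = "/sapdata1/datafile.dbf"                 # hypothetical database file

    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)   # bypass the file system cache
    buf = mmap.mmap(-1, BLOCK)                      # anonymous mapping -> page-aligned buffer
    nread = os.preadv(fd, [buf], 0)                 # read the first aligned block directly
    os.close(fd)
    print(f"read {nread} bytes directly from storage, not from the page cache")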


File systems are created with a certain size, and due to database growth it may become necessary to increase the size of a file system. Assuming that the Logical Volume on which the file system is "mounted" (respectively the Volume Group) has spare storage resources, the size of a file system can be increased online – without any impact on operation. Decreasing the size is not supported by some file system types, and administrators must implement their own procedures, executing the steps: back up the files, create a new, smaller file system and restore the saved files.

Backup and restore are the daily business of every computing center. The backup of file systems may be a challenge if business application users cannot afford any downtime, since the backup requires a consistent state of the file systems. Fortunately, file systems – more precisely, the file system management in cooperation with the Logical Volume Management – offer the Snapshot capability, providing a fast and consistent point-in-time copy.

To obtain a Snapshot, the file system offers a command that freezes all I/O operations, creates a copy of all pointers (I-nodes) providing access to the current data, and finally thaws I/O operations. No actual data is copied – this makes the process very fast – and after I/O operations are thawed, all data changes are written to new locations while the "old" data (the point-in-time copy) is kept. The point-in-time copy is consistent and can be copied to a backup medium or used to create a system "clone".

5.3 Cluster File System

So far we have discussed file systems as part of the operating system kernel, supporting the management of application data that is stored in files and organized in file directories – so to speak, single-OS file systems.

Today's ever increasing demands for performance and high availability, especially for database applications, are met by the power of multiple host systems sharing a common database. To keep the common database (files) consistent, a kind of coordinator is needed, "wrapped" around the single-OS file systems and synchronizing their activities. This functionality is provided by a Cluster File System.

Cluster File Systems rely on an inter-host-system network characterized by low latency and high bandwidth. Basically, Cluster File Systems consist of three components: (1) the Cluster Framework describes which host systems belong to a cluster and take over functions of the Cluster File System, (2) the Locking Mechanism ensures that no two host systems can simultaneously make changes to the same database file, and (3) the Fencing ensures that a malfunction on one host system does not destroy any data.

The currently available Cluster File Systems can be distinguished by the physical location of the common database. The most common Cluster File System solutions use a shared storage infrastructure, and storage is provided using a block-based protocol (SAN). In other solutions, the storage is provided from the local physical disks of the host systems that form the cluster.

Another way to allow multiple host systems to share a common database is to interpose a dedicated coordinator that provides the files used by the database system to the host systems. This File Server and the database host systems are interconnected via the server network, and the file system is installed only on the File Server – the Network File System (NFS).

Basically, the File Server can be regarded as a single-OS file system:

Storage may be provided from a storage network using a block-based protocol (SAN) and is managed by the File Server's Logical Volume Manager.

The File System manages all files and provides them to the database host systems.

The File Server must also provide a Locking Mechanism ensuring that no two host systems can simultaneously make changes to the same file.

As discussed in the chapter "Storage Systems", the functionality of the File Server can be part of the storage infrastructure, which is then called Network Attached Storage (NAS).

5.4 Database System

At the top of the I/O stack are the database objects: tables, indexes and Redo LOGs. For storing tables and indexes, database systems provide a concept – the table spaces – that combines these objects. The table spaces in turn consist of "chunks" of space (called "extent", "segment" or "superblock") that are stored in files. Usually table spaces consist of many database files, and – as discussed in the previous chapter – the DB files are distributed across multiple file directories (these directories are also called file systems).

If multiple DB files are available for a table space, database systems balance the extents across the DB files (next extent in next DB file) – this is called "extent-based striping". Ideally, each table space consists of as many same-sized DB files as there are file systems, with each DB file stored in a different file system. Due to this provisioning of DB files and the extent-based striping, all tables and indexes are evenly distributed across all file systems. For growing table spaces, "full stripes" of same-sized DB files – one file in each file system – should be provided to keep the data distribution balanced.
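A small sketch of extent-based striping (illustrative; the file names and paths are assumptions):

    def place_extents(num_extents, db_files):
        # Round-robin placement of table space extents across DB files
        # (next extent in next DB file), as described above.
        placement = {f: 0 for f in db_files}
        for extent in range(num_extents):
            placement[db_files[extent % len(db_files)]] += 1
        return placement

    # One DB file per file system, e.g. /sapdata1 ... /sapdata4 (hypothetical paths)
    files = [f"/sapdata{i}/ts_datafile_{i}.dbf" for i in range(1, 5)]
    print(place_extents(1000, files))   # 250 extents per file -> even distribution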


5.5 Storage management on Host Systems is crucial for optimal I/O Performance

At the beginning of this chapter we emphasized that it is crucial for I/O performance to use as many I/O buffers of the Host Devices and Host Bus Adapters as possible. We discussed which components of the I/O stack (multipath driver, Logical Volume Manager and database system) help to achieve this goal on the host systems.

Only the consideration of the storage management capabilities on both Host System and Storage System level will help us to determine design guidelines for the I/O stack that are optimal regarding I/O performance and management of system growth.


6 The optimal configuration of the I/O Stack

From now on we assume that the reader is familiar with the building blocks and physical components of an Enterprise Storage Architecture, which were described in the previous chapters. We will now present design guidelines for the entire I/O stack, which we consider to be optimal, since the data distribution techniques used ensure a uniform utilization of all available physical resources.

6.1 Threefold Striping

In the following, we assume that storage is provided from a Storage Area Network – LUNs using a block-based protocol. We will describe the I/O stack configuration from bottom to top.

First level of Striping

On Storage System level, the physical disks are organized in groups of disks. The LUNs that are provided by the disk groups are RAID-configured, using a RAID level that offers both data protection and data distribution. The LUNs are evenly distributed (striped) across the physical disks of a RAID group, and in case the Storage System has the feature of combining RAID groups into a storage "pool", the provided LUNs are additionally striped across multiple RAID groups.


For production SAP Business Suite systems, access to a LUN is provided via multiple paths (at least two), and the "zoning" between Frontend Adapter ports and Host Bus Adapter ports deploys multiple different adapters on storage and host system level. On the host system a "multipathing driver" is installed, providing access to storage should one path fail, as well as I/O workload balancing.

Second level of Striping

On Host System level, the Host Devices (LUNs) are managed by the Logical Volume Manager and organized in Volume Groups. The Logical Volumes consist of contiguous chunks of data that are allocated striped (next chunk on next Host Device) across all Host Devices belonging to the Volume Group. The Logical Volumes that are used for the DB system Redo LOG file systems are additionally block-level striped. When choosing the chunk and block size, consider that smaller sizes lead to a better distribution.

The file systems are "mounted" on Logical Volumes – there is always a 1:1 relationship between a file system and a Logical Volume – and the number of file systems is arbitrary, BUT should NOT be increased due to increasing storage demands. The file systems are implemented with the following mount options: Direct I/O, Asynchronous I/O and Concurrent I/O, to achieve the performance capabilities of raw devices.

In the above model, each file system is built on a separate Volume Group, and when the Volume Group is extended by further Host Devices, the additional capacity is clearly associated with one Logical Volume, respectively one file system. The use of multiple Volume Groups may be necessary if the Logical Volume Manager imposes limits per Volume Group, such as the size of a Logical Volume, the total number of manageable "chunks" or the number of Host Devices. If no Volume Group limits exist and administrators prefer to provide all Logical Volumes from one Volume Group, this is also suitable, BUT at least the Redo LOG file systems should be stored in their own Volume Group(s).

Third level of Striping

To enable extent-based striping on Database System level, <n> SAPDATA file systems are created. The DB table spaces are created with a uniform extent size, and storage is provided in the form of <n> same-sized DB files, each stored in a separate file system.

6.2 Managing the Data Growth

Two tasks must be managed when the storage demands grow: provisioning more physical storage and extending table spaces on the already allocated storage. Both tasks should be easy to manage without any impact on operation – in particular, neither a performance decrease nor downtime.

Extend physical storage

Each Volume Group is extended by the same number of Host Devices as originally used – a full stripe of <m> equally sized LUNs. The new Host Devices do not necessarily need to have the same size as the initially used Host Devices. To keep the utilization of the storage system components balanced, the new LUNs must be balanced across the RAID groups – respectively storage pools.

If this is not done automatically by the Logical Volume Manager, the Logical Volume must be increased by the provided capacity. Finally, the file system that is mounted on the Logical Volume can be increased.


Extend Table Spaces

Add one new file to each file system – a full stripe of <n> equally sized files. If a very slowly growing table space should not be extended by <n> files, it is better to resize all existing files uniformly instead of adding just one new file.

6.3 Assessment of the suggested I/O Stack Configuration

Due to the suggested threefold striping, all performance-crucial components of the I/O stack are evenly utilized: on the host, the Host Device queues, Host Bus Adapter buffers and multiple I/O paths; on the Storage System, the Frontend Adapters, Cache and Processor Complex, Disk Adapters and physical disks.

Data growth can be managed without impact on operation and – as long as the guidelines are followed – the balanced utilization of all performance-crucial components will be kept.


7 Monitoring the I/O Stack

Think of a situation where the business users of SAP systems are not satisfied with the overall performance, and since the transactions or business processes that they run are I/O intensive (processing a huge amount of Read and Write I/O operations), the storage infrastructure is suspected to be the culprit.

After reading the previous chapters, it should be clear that potential I/O bottlenecks are not necessarily caused by the components of the storage infrastructure, but can be caused on all levels of the I/O stack.

Since the storage infrastructure is used by many systems, it should be noted that the systems may interfere with each other. If the storage infrastructure executes data replication between storage systems, due to an implemented High Availability solution, the I/O performance of a system may even become impaired by systems that operate on a different storage system (see chapter Storage Network).

This makes clear that for monitoring the I/O stack it is important to get a complete picture of how a system is mapped to the components of the I/O stack, and which other systems use these components as well. A detailed configuration analysis makes it possible to pinpoint the part of the I/O stack that causes performance bottlenecks, and to detect these, detailed performance analyses are needed.

The monitors and tools that provide the raw data for the I/O stack analyses depend on the DB System, Operating System and Storage System used. Since any combination of these systems can be deployed and each system level has many monitors, we will not describe specific monitors or tools, but rather the information they must supply to enable an end-to-end I/O performance analysis. Usually the levels of the I/O stack are administered by different experts and special authorizations are needed for the monitors and tools; it is therefore essential for the analysis of the entire I/O stack that all experts cooperate.

7.1 Configuration Analysis

To get a complete picture of how the SAP system is configured on the DB host and the Storage System, the configuration analysis comprises the mapping of the application system's database objects (Tables, Indexes, LOGs) to the different parts of the I/O stack, such as DB files, File Systems, Logical Volumes, Volume Groups, Host Devices and Host Adapters, and finally to the Storage System components such as Frontend Adapters, Cache, Disk Adapters, RAID groups and physical disks.

This analysis will show how far the current configuration differs from the design guidelines.
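Such a mapping can be kept in a simple machine-readable form. The following minimal sketch shows one possible representation with entirely hypothetical names, together with a helper that reveals which DB files share components on a given stack level; it is an illustration of the kind of record the configuration analysis should produce, not a prescribed format.

```python
# Minimal sketch (all names hypothetical): one record per DB file, mapping it through
# every level of the I/O stack. With such records it is easy to see, for example,
# which DB files share a Volume Group, an HBA or a RAID group.
io_stack_map = {
    "sapdata1/data1.dbf": {
        "file_system":       "/oracle/PRD/sapdata1",
        "logical_volume":    "lv_sapdata1",
        "volume_group":      "vg_sapdata1",
        "host_devices":      ["sdc", "sdd", "sde", "sdf"],   # LUNs
        "host_bus_adapters": ["hba0", "hba1"],
        "frontend_adapters": ["FA-1A", "FA-2B"],
        "raid_groups":       ["RG-07", "RG-08"],
    },
}

def components_shared_with(db_file, level, mapping):
    """Return all other DB files that use at least one of the given file's
    components on the given stack level (e.g. 'raid_groups')."""
    mine = set(mapping[db_file][level])
    return [f for f, cfg in mapping.items()
            if f != db_file and mine & set(cfg[level])]

print(components_shared_with("sapdata1/data1.dbf", "raid_groups", io_stack_map))
```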

Database System

The DB System maintains system tables that contain information describing how tables and indexes are distributed across the DB files that belong to a table space. This information shows, for example, whether database extent-based striping is achieved or whether the most frequently used tables and indexes are accumulated in a few DB files.

Operating System

Detailed information about the Host System hardware is needed to assess whether the available computing power can fulfill the requirements of the business application. This information comprises the equipment with CPUs, Memory, interface cards to the server network and Host Bus Adapter interfaces to the SAN. For all these components the current parameter settings are needed as well.

For File Systems, detailed settings such as mount options and block sizes are needed, and the data gathered from the Logical Volume Manager must in particular show how Logical Volumes are built on Host Devices, the I/O block sizes and the queue length defined for the Host Devices.

Finally, information is required from the Multipath Driver to enable the analysis of I/O paths, such as the selected I/O distribution policy, the HBA buffer length, the used HBA ports and the used storage system frontend ports.

Storage System

Only the storage administrators know all Host Systems that share the storage infrastructure, how many storage systems are used and how the storage systems are related to each other.

For each storage system that may impair the performance of the selected Host System, inventory information is needed, such as Frontend Adapters, Cache (main memory size, thresholds), Disk Adapters, physical disks (interface, size, RPM) and RAID configuration (which physical disks belong to a group and which RAID level is implemented). For each LUN, information is needed about its size and its assignment to RAID group, Disk Adapters and Frontend Adapters; if LUNs are provided from storage pools that combine RAID groups, this information is needed as well.

For all Host Systems that share Storage Systems with the selected Host System, at least their names and their zoning to Frontend Adapter ports are needed.

7.2 Performance Analysis

As mentioned in the introduction of this chapter, it is mostly business users who complain about unsatisfactory I/O performance, so the performance analyses should be conducted during periods with peak business workload, such as “Month End Closing” or “load of data into Business Warehouse cubes” – any artificial workload that is generated outside business periods will not be helpful.

The data needed for the analyses must be collected in parallel on all levels of the I/O stack – DB System, Operating System and Storage System. Monitors are available for this that provide comprehensive performance metrics. Depending on the level of the I/O stack, the metrics will be collected in different intervals, since, for example, small intervals on storage system level can affect the performance of all systems that share the storage systems. The different interval lengths must be taken into account when interpreting the data. We recommend using an interval length of 5 minutes on DB and storage system level, and 10 seconds on the operating system.

The goal of the analysis is to pinpoint which part of the I/O stack causes performance bottlenecks. Since performance optimization on each level of the I/O stack needs special expertise, it is crucial to identify the level that is causing bottlenecks as fast as possible in order to involve the right experts.

A suitable performance indicator is the I/O latency (the time that elapses until the successful completion of an I/O operation, measured in milliseconds (ms)), which can be derived on all levels of the I/O stack for most of the components. Latency on the different I/O stack levels should be compared in periods with significant I/O workload, which can be derived on all levels as throughput (measured in I/O operations per second (IOPS)) and bandwidth (measured in transferred MB per second (MB/s)).

If, for comparable periods, the latency on the current level is significantly higher than on the next level below, the bottleneck may be caused by components between the current level and the level below. It is therefore reasonable to start the performance analysis on operating system level, to pinpoint the level that causes the bottleneck faster.
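The comparison can be automated with a few lines of code. The following minimal sketch uses hypothetical level names and latency values and simply flags the largest jump between adjacent levels; the 5 ms threshold is an arbitrary illustrative choice.

```python
# Minimal sketch (level names, latencies and threshold are hypothetical): compare
# average I/O latency per stack level for the same peak-load period. A large jump
# between two adjacent levels points to the components in between.
latency_ms = {              # measured top-down for comparable periods
    "db_file":       18.0,
    "host_device":   17.5,
    "storage_lun":    4.0,
    "physical_disk":  3.5,
}

levels = list(latency_ms)
for upper, lower in zip(levels, levels[1:]):
    delta = latency_ms[upper] - latency_ms[lower]
    flag = "  <-- investigate the components between these two levels" if delta > 5 else ""
    print(f"{upper:14s} {latency_ms[upper]:5.1f} ms | "
          f"{lower:14s} {latency_ms[lower]:5.1f} ms | delta {delta:4.1f} ms{flag}")
```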

Operating System

If the latency for Read or Write I/O operations measured on DB System level is significantly higher than the latency measured on Operating System (Host Device) level, the bottleneck is very likely caused by components above the Host Devices or even on DB System level.

The memory and CPU utilization must be analysed to verify whether the database host is memory or CPU bound. If this is not the case, the distribution of the I/O workload across the Host Devices may not be optimal and a few Host Devices must serve most of the I/O requests, which is indicated by a high utilization of these Host Devices. The latter will lead to unpredictable latency – just acceptable I/O service times followed shortly afterwards by poor ones. Another reason for high I/O latency on DB System level might be an improper block size on file system level.

If the I/O latency is unsatisfactory on both DB System and Operating System level, the bottleneck is very likely on Host Device level or on components below, that is, on SAN level. Host Devices might be highly utilized, have an unsuitable queue size, or the used Host Bus Adapters are overloaded. The HBA overload may be caused by too many Host Devices that use the same HBA, HBA buffers that are too small, or a multipathing driver that does not properly balance the I/O workload.

Storage System

A high utilization of the used Frontend Adapters will lead to high I/O latency. If the Frontend Adapters are shared with other Host Systems, their I/O workload may be the reason for the high utilization and consequently for the unsatisfactory I/O latency.

A high latency of Write I/O operations may be caused by cache shortages, meaning the cache is filled with modified data that has not yet been written to disk (Write Pending Limit reached). This is very likely caused by a bottleneck in the storage backend, and the I/O performance of the LUNs must be analyzed. If the Write I/O latency is high but the Write Pending Limit is not reached, a synchronous replication of data to the cache of a remote storage system may be the reason – either network bottlenecks or a reached Write Pending Limit on the remote storage system.

LUNs with unsatisfactory I/O latency may be accumulated on the same RAID groups (physical disks), stored on too few RAID groups, or the LUNs of other Host Systems with high I/O workload use the same RAID groups.


7.3 Summary – Monitoring the I/O Stack

There are many monitors and tools for monitoring the I/O stack, but few that allow an end-to-end analysis starting at the application and going down through the components of the database, operating and storage system. Therefore it is important to select, for the implemented infrastructure, exactly those monitors and tools that enable this end-to-end analysis.

The monitor and tool selection is made by the administrators who are responsible for the different levels of the I/O stack. On database level the monitors can be queries on statistics tables, and on operating system level, commands that collect metrics from the kernel. Storage system manufacturers often provide administrators with complete tool suites for configuration and monitoring of their systems. Whatever monitors and tools are chosen, the administrators must verify that they can perform the end-to-end configuration and performance analysis outlined above.


8 Implementing SAP HANA on shared Enterprise Storage

In the previous chapters the building blocks and physical components of an Enterprise Storage Architecture were described, and we outlined design guidelines for an optimal configuration of database objects on the entire I/O stack. In this chapter we will provide guidelines for the implementation of SAP HANA on storage that is provided from a Storage Area Network – LUNs using a block-based protocol.

To develop these guidelines, we first need to know which objects SAP HANA saves on physical disk storage, and for this we give an overview of the SAP HANA persistence layer. The components of the I/O stack provide measures to optimize sequential or random I/O operations with small or large I/O blocks; we therefore discuss the SAP HANA I/O patterns and the processes that generate them. Finally we introduce the Storage Connector API, the SAP HANA built-in solution for file access sharing and fencing of storage that is needed for HANA scale-out solutions.

8.1 HANA Persistence Layer

The SAP HANA database services – Index Server, Name Server, Statistics Server and XS Server – save data on the physical storage level. For each database service, HANA distinguishes two types of data that must be stored: transactional Redo LOG information and all other DATA.

The XS Server is a lightweight application server that is integrated into SAP HANA. The Statistics Server evaluates information about status, performance and resource consumption from all components belonging to the system. It stores the monitoring and alert information in its own database tables. From there the information can be accessed by administrative tools such as the SAP HANA studio.

With regard to availability and consistency of business data, only the Name and Index Server are relevant. The Name Server knows the topology – which tables, table replicas, or partitions of tables are located on which Index Server, either in a single-host or in a distributed multi-host (scale-out) HANA environment. The Index Server manages the business data – the database tables. Multiple Index Servers can be active, and each maintains its own subset of tables – “shared nothing”.

HANA DATA
The database tables can be stored row by row (Row Store) or column by column (Column Store). The Row Store works directly with pages (blocks of data), while the Column Store uses an additional abstraction layer, the containers.

There are different types of containers. Virtual files provide a file-like interface. Virtual files are used by the Column Store to write and read the data (main storage) and the delta logs (delta storage) of columnar tables. Other types of containers, such as fixed size entry containers, var size entry containers and B* trees, are used internally by the persistence layer. Data of all different types of containers will finally be passed as pages to the lower layers.

Each HANA database service (Index, Name, Statistics and XS Server) stores the pages – whether from Row Store or Column Store – in its own Data Volume. From the perspective of the Linux operating system, the Data Volume is a physical file. The physical files are managed by the HANA “Page I/O” module, which partitions each file into so-called “Superblocks”, each having a size of 64 MB.


The HANA “Page I/O” module ensures that each superblock contains only pages of the same size. Pages that are used for Row Store tables always have a size of 16 KB, while pages of the Column Store have a size of 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB or 16 MB. The pages of the Row Store do not share superblocks with 16 KB pages that come from the Column Store. When the HANA “Page I/O” module has to write data, it will try to make the pages as large as possible.
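The following minimal sketch illustrates the consequence of this packing rule: because each 64 MB superblock holds pages of one size only, the number of superblocks touched by a write depends on the page-size mix. The amounts of modified data per page size are hypothetical example values.

```python
# Minimal sketch (the page-size mix is hypothetical): HANA's Page I/O module keeps
# pages of only one size per 64 MB superblock, so the number of superblocks a write
# touches depends on the page-size mix, not only on the total data volume.
SUPERBLOCK_MB = 64

# hypothetical amount of modified data per page size (page size in KB -> MB of data)
dirty_mb = {16: 320, 1024: 512, 16384: 2048}

for size_kb, mb in sorted(dirty_mb.items()):
    pages = mb * 1024 // size_kb                         # pages of this size
    pages_per_superblock = SUPERBLOCK_MB * 1024 // size_kb
    superblocks = -(-pages // pages_per_superblock)      # ceiling division
    print(f"{size_kb:>6} KB pages: {pages:>6} pages -> {superblocks} superblock(s)")
```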

Apart from the business data, other information is stored on the Data Volumes, such as the Undo LOG, the Restart Record, the list of open transactions and other technical data.

HANA LOG
Each HANA database service (Index, Name, Statistics and XS Server) has its own Log Volumes that are used to record changes. As in other transactional DB systems, HANA uses this information after a system crash to redo all completed transactions.

At least 2 Log Volumes – called Log Segments – are initially created, so that logging can continue while a full log segment is archived. Additional Log Segments can be created if needed. The Log Segments are represented on Linux operating system level as physical files, and the HANA Logger writes pages with a size of 4 KB, 16 KB, 64 KB, 256 KB or 1 MB into these files.

8.2 HANA I/O Pattern

In this section we arrange the processes that generate I/O operations by their I/O frequency: first processes that continuously generate I/O operations, then processes that periodically or rarely generate I/O operations.

Redo LOG writing
The information that is needed to redo a committed transaction must be successfully written to the physical storage level before processing can continue – the continuous Redo LOG write I/O operations are synchronous. The LOG write I/O pattern is sequential, and depending on the commit frequency and the filling state of the Log Buffer, Redo LOG I/O blocks are between 4 KB and 1 MB. The major part of the performance-crucial LOG I/O operations processes 4 KB blocks; therefore measures should be taken that optimize sequential writes of 4 KB I/O blocks with regard to I/O latency.

Savepoint writing
The persistence layer periodically performs savepoints. During the savepoint operation, modified data pages in the page cache are written to the physical storage level. Buffered redo log entries are flushed to the physical storage level as well. The purpose of performing savepoints is to speed up the restart, since the redo log need not be processed from the beginning but only from the last savepoint position. Therefore periodic savepoints are essential to be prepared for a fast restart. By default, savepoints are processed every 5 minutes – the savepoint period is adaptable.

The periodic Savepoint write I/O operations are asynchronous. Savepoints generate some load on the Log Volumes, but the main load is on the Data Volumes. The Savepoint I/O pattern is sequential and I/O blocks between 4 KB and 16 MB are processed. Due to HANA's practice of making the pages as large as possible, rather large I/O blocks can be expected. Therefore measures should be taken that optimize I/O bandwidth (the processed MB/s) rather than throughput (IOPS).


Delta Merge writing
HANA keeps columnar organized data in the main storage, a memory area containing data that is compressed and organized for highest read performance. Modified data is stored in a memory area related to the main storage – the delta storage – so that the performance properties of the main storage are retained.

The purpose of the delta merge operation is to move modified data that is collected in the delta storage into the read-optimized main storage. During the merge operation, the complete main storage is rewritten to disk. The delta merge is performed asynchronously to the transactions that made the changes. Certain column store events trigger a delta merge, such as: (1) the number of lines in the delta storage exceeds the specified limit or (2) the memory consumption of the delta storage exceeds the specified limit.

The regular Delta Merge write I/O operations are asynchronous. Delta Merges generate load on the Data Volumes. The Delta Merge I/O pattern is sequential and I/O blocks between 4 KB and 16 MB are processed. Due to HANA's practice of making the pages as large as possible, rather large I/O blocks can be expected. Therefore measures should be taken that optimize I/O bandwidth (the processed MB/s) rather than throughput (IOPS).

Backup processing
Two kinds of Backup are distinguished – the Log Backup and the Data Backup.

Log Backups are automated local processes that are triggered and executed by each HANA server (Index, Name, Statistics and XS Server) autonomously. The Log Backup is performed (1) when a log segment is full, (2) when a configured time limit is exceeded or (3) after startup of a server.

Data Backups are coordinated for all HANA servers by the “Backup Manager”. The backup manager first tells all HANA servers to perform a global savepoint. The global savepoint is required to get a system-wide consistent snapshot of the database. When the global savepoint is complete, a database-internal snapshot is created based on this savepoint. This means that the pages belonging to this savepoint will not be overwritten by subsequent savepoints. Now the data volumes on the persistence layer contain a snapshot with the frozen consistent state of the database.

In the following phase of the backup procedure, the backup manager tells the backup executors of the servers to write the content of the previously created snapshot to the backup files. Each server reads the pages of the snapshot from the persistence layer and writes them to the backup files.

The regular Backup I/O operations are asynchronous. The first phase of a Data Backup (the savepoint) generates write I/O load – some load on the Log Volumes, but the main load on the Data Volumes. The I/O pattern is sequential and I/O blocks between 4 KB and 16 MB are processed.

During the Log Backup or the second phase of the Data Backup, large I/O blocks are sequentially read and then written to the backup files. Since usual database operations (Savepoint writing, Delta Merges) continue during the second phase of the Data Backup, backup Read I/O operations may compete with Write I/O operations.

Since Backups sequentially process large I/O blocks, measures should be taken that optimize I/O bandwidth (the processed MB/s).


Read I/O operations
Read I/O operations are rare events, as they are mainly performed during backups. Of course, data will be read from the persistence layer during system startup, and during usual database processing read I/O operations will be performed to load rarely used tables.

In a multi-host HANA environment a high read I/O load occurs during “failover”, when a standby HANA host takes over the work. The time needed to fail over to the standby host depends on the size of the Row Store, since HANA can start processing only after the Row Store is completely loaded. The latter is only relevant if the HANA server node that is hosting the Row Store is affected by the server failure.

Application transactions, respectively system users, must wait until these rare Read I/O operations are completed. The I/O pattern is sequential and I/O blocks between 4 KB and 16 MB are processed; therefore measures should be taken that optimize I/O bandwidth (the processed MB/s).

Bottom line
From the perspective of the I/O layer, SAP HANA performs mainly sequential write I/O operations. On the Data Volumes mainly large I/O blocks are processed, while on the Log Volumes mainly 4 KB I/O blocks are processed.

To provide the best I/O performance for HANA, I/O operations on the Data Volumes should be optimized for high bandwidth (MB/s), and due to the synchronous Log write I/O operations on the Log Volumes, these should be optimized for low latency (ms).

8.3 HANA Storage Connector API

In HANA scale-out solutions – one HANA database deploys a cluster of hosts, and one standby host is ready to take over in case an active host fails – a layer is required that coordinates the access to Data and Log Volumes. As discussed in chapter 5.3, this layer can be a Cluster File System that provides the following components:

The Cluster-Framework, which describes which Host Systems belong to a cluster and take over functions of the Cluster File System.
The Locking Mechanism, which ensures that no two Host Systems can simultaneously make changes to the same file.
The Fencing, which ensures that a malfunction on one Host System does not destroy any data.

The HANA Storage Connector API provides exactly this functionality. The Storage Connector manages 2 LUNs on each HANA host: one LUN is used for the Data Volumes and the other for the Log Volumes of the HANA servers (Index, Name, Statistics and XS Server) running on the host. SAP offers a ready-to-use implementation of this Storage Connector API for all storage subsystems attached via Fiber Channel using native Linux multipathing and supporting the SCSI-3 protocol – in particular, the Persistent Reservation (PR) feature of the SCSI-3 protocol is used for fencing.

If an active host of a HANA cluster fails, HANA calls the appropriate Storage Connector API method to (1) allow the storage device driver to re-mount the required Data and Log LUNs on the standby host and (2) fence off these LUNs from the failed host.
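The failover sequence can be pictured as follows. This is purely a conceptual sketch with hypothetical placeholder functions and device names – it is not the SAP Storage Connector API; the real implementation performs these steps via SCSI-3 Persistent Reservations and the Linux multipathing layer.

```python
# Conceptual sketch of the failover ordering only -- all functions and device names
# are hypothetical placeholders, NOT the SAP Storage Connector API.

def fence_luns_from_failed_host(luns, failed_host):
    """Revoke the failed host's access (SCSI-3 persistent reservation) on each LUN,
    so that it can no longer write even if it is still partially running."""
    for lun in luns:
        print(f"fencing {failed_host} off {lun}")

def attach_luns_to_standby(luns, standby_host):
    """Claim the LUNs for the standby host and mount their file systems there."""
    for lun in luns:
        print(f"reserving {lun} for {standby_host} and mounting its file system")

def failover(data_lun, log_lun, failed_host, standby_host):
    # Order matters: fence first, then attach, so no two hosts ever write concurrently.
    fence_luns_from_failed_host([data_lun, log_lun], failed_host)
    attach_luns_to_standby([data_lun, log_lun], standby_host)

failover("/dev/mapper/hana_data_3", "/dev/mapper/hana_log_3",
         failed_host="hanahost03", standby_host="hanahost04")
```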


If the Storage Connector does not suffice, for example because of a shared storage approach or the lack of SCSI-3 persistent reservations, a custom Storage Connector can be implemented.

8.4 HANA on an optimally configured I/O Stack

For single-host SAP HANA systems, the configuration guidelines described in chapter 6 (The optimal configuration of the I/O Stack) are basically applicable.

HANA requires 2 separate storage devices for its servers (Index, Name, Statistics and XS Server) – one for the Data Volumes and a second for the Log Volumes. These 2 storage devices can be Logical Volumes (LV) that are managed by the Linux Logical Volume Manager (LVM), and each Logical Volume should be provided from a separate Volume Group (VG) consisting of LUNs from the Storage Area Network.

The Logical Volumes consist of contiguous chunks of data that should be allocated in a striped fashion (next chunk on next Host Device = LUN) across all Host Devices belonging to the Volume Group. Since HANA partitions the Data Volumes into Superblocks of 64 MB, the LV chunks should have the same size. With this measure, the performance capabilities of “many” LUNs can be utilized for I/O operations on the Data Volumes.

The Logical Volume used for the HANA Log Volumes should be prepared for parallel I/O operations on all LUNs that belong to the Volume Group, to increase bandwidth and to minimize the I/O latency of Log writes. This is achieved by “block-level striped” Logical Volumes. With this implementation, the LUNs of the Volume Group build a stripe set (the number of LUNs is the “stripe width”) and the blocks of the stripe set (each block has the same “stripe size”) are balanced across all LUNs of the VG.

The parameters for the optimization of the I/O performance on the Log Volumes are the stripe width and the stripe size, and these should be adapted to the expected peak write load. The following graphic gives an idea of how the I/O bandwidth (MB/s) on a Log Volume depends on the LV stripe width and stripe size, assuming that the write I/O latency on the LUNs is 2 ms (consider that larger I/O blocks may have a higher latency).

In this example the stripe width varies between 4 LUNs and 10 LUNs, and the stripe size between 4 KB and 64 KB. The assumed average write I/O latency is realistic; in our analyses of storage infrastructures, even better write I/O latencies were measured. If the expected peak write load of all Log Volumes that are stored on the LOG Logical Volumes is 250 MB/s, then a stripe width of 8 LUNs and a stripe size of 64 KB will be sufficient, and if the average write I/O latency is 1 ms, a bandwidth of 500 MB/s will be reached.

Bandwidth [MB/s] = (Stripe Size [KB] / Latency [ms] * 1000) / 1024 * Stripe Width
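The relationship can be reproduced directly from the formula above. The following minimal sketch evaluates it for the stripe widths, stripe sizes and latency values assumed in the example; for stripe width 8, stripe size 64 KB and 2 ms latency it yields the 250 MB/s mentioned in the text, and 500 MB/s at 1 ms.

```python
# Evaluate the bandwidth estimate from the text:
#   Bandwidth [MB/s] = (Stripe Size [KB] / Latency [ms] * 1000) / 1024 * Stripe Width
def log_bandwidth_mb_s(stripe_size_kb, stripe_width, latency_ms):
    return (stripe_size_kb / latency_ms * 1000) / 1024 * stripe_width

for width in (4, 8, 10):                  # stripe width in LUNs
    for size_kb in (4, 16, 64):           # stripe size in KB
        for latency in (2.0, 1.0):        # assumed average write I/O latency in ms
            bw = log_bandwidth_mb_s(size_kb, width, latency)
            print(f"width={width:2d} LUNs  size={size_kb:2d} KB  "
                  f"latency={latency} ms -> {bw:6.1f} MB/s")
```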

In a multi-host SAP HANA scale-out solution where the Storage Connector API is used, the Linux Logical Volume Manager cannot be used – this is the current state at the time of writing this document (SAP HANA SP6, July 2013). This means that the Logical Volume Manager based optimizations cannot be applied. Thus the 2 storage devices that are required for each HANA host are LUNs from the Storage Area Network, and all HANA I/O performance demands must be satisfied from this level.

The graphic shows the HANA I/O stack on one host of a multi-host scale-out solution (the i-th host). Two LUNs (Host Devices) are provided from the Storage Area Network – on the Data Host Device (blue) the file system /hana/data/<sid>/mnt<i> (<sid> = SAP system identifier; <i> = host number) is mounted, and on the Log Host Device (red) the file system /hana/log/<sid>/mnt<i>. The Linux ext3 block-based file system is used.

For every HANA service there is a subdirectory on both file systems – in the example above, the subdirectory /hdb<j> for the nameserver and the subdirectory /hdb<j+1> for the indexserver. Every HANA service stores its DATAvolume<m> in the DATA file system and its Log Volumes, LOGsegment<m>_<n>, in the LOG file system.

In each SAP HANA system there is one master nameserver that owns the topology and distribution data. This data is replicated to all other nameservers, called slave nameservers. The slave nameservers write the replicated data to a cache in shared memory, from where the indexservers of the same instance can read it. The master nameserver has its own persistence where it stores the name server data (topology, distribution data). The slave nameservers have no persistence, as they only hold replicated data.

In a HANA scale-out solution it is best practice to run only one indexserver on each host per SID.


How do we get the I/O performance required by the HANA services?
The following considerations are valid for both single-host and multi-host (scale-out) HANA solutions. As discussed in chapter 5 (Managing Storage on Host Systems), it is crucial for I/O performance to use as many I/O buffers of Host Devices and Host Bus Adapters (HBAs) as possible, and to provide “sufficient” I/O paths for the LUNs – these are the connections from HBAs via Switches to the Frontend Adapter Ports of the storage system.

Paths
For data availability, at least 2 paths should be defined for each LUN, to provide access to SAN storage should one path fail, as well as I/O workload balancing across all available paths. On Linux operating system level, the Device-Mapper-Multipath (dm-multipath) module is used for this purpose. For the distribution of I/O requests across the available paths, dm-multipath offers 3 path_selector algorithms (a conceptual comparison follows the list):

round-robin loops through every path in the path group, sending the same amount of I/O load to each.
queue-length sends the next bunch of I/O down the path with the least amount of outstanding I/O.
service-time chooses the path for the next bunch of I/O based on the amount of outstanding I/O to the path and its relative throughput (I/O operations per second).
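To make the difference between the three policies concrete, the following toy model implements each decision rule as described above; it is a simplified sketch with hypothetical path names and load figures, not the actual dm-multipath kernel code.

```python
# Toy model of the three dm-multipath path selection policies described above
# (simplified decision rules only, not the actual kernel implementation).

paths = [
    # hypothetical per-path state: outstanding I/Os and relative throughput (IOPS)
    {"name": "hba0->FA-1A", "outstanding": 12, "iops": 20000},
    {"name": "hba1->FA-2B", "outstanding":  3, "iops": 20000},
    {"name": "hba2->FA-3A", "outstanding":  7, "iops": 10000},
]

_rr_state = {"next": 0}

def round_robin(paths):
    """Ignore load, simply loop over the paths in turn."""
    p = paths[_rr_state["next"] % len(paths)]
    _rr_state["next"] += 1
    return p

def queue_length(paths):
    """Pick the path with the fewest outstanding I/Os."""
    return min(paths, key=lambda p: p["outstanding"])

def service_time(paths):
    """Pick the path with the best ratio of outstanding I/O to relative throughput."""
    return min(paths, key=lambda p: p["outstanding"] / p["iops"])

print("round-robin :", round_robin(paths)["name"])
print("queue-length:", queue_length(paths)["name"])
print("service-time:", service_time(paths)["name"])
```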

Up to Linux kernel 2.6.31, the default number of I/Os to route to a path before switching to the next path in the same path group was 1000 – specified by the parameter rr_min_io. Starting with Linux kernel 2.6.32, the new parameter rr_min_io_rq was introduced, which likewise specifies the number of I/Os to route to a path before switching to the next path in the same path group, but for request-based dm-multipath – its default is 1.

Assessment and suggestions
Although the “round-robin” path_selector algorithm is suggested as the default, notice that this algorithm does not consider the impact of I/O load on components (HBAs or Frontend Adapter Ports) that are also used by other paths. The other two path_selector algorithms do consider the utilization of the components that are used for the path; therefore we suggest using the path_selector algorithm “queue-length” for the Data LUNs and “service-time”, which takes the I/O service time into account, for the Log LUNs.

Apparently 1000 I/O operations before switching to the next path is not optimal for applications that sequentially process “large” I/O blocks (such as HANA), since this may lead to a high utilization of the HBA and Frontend Adapter Port that are used for the path, while components used on other paths are idle. Obviously, switching every next I/O to another path does not lead to a considerable overhead (storage vendor benchmarks confirm this), and so the new parameter rr_min_io_rq with a default of 1 was introduced.

We suggest checking which setting the storage vendors propose for their storage systems. If there are no proposals, then start for the HANA LUNs with rr_min_io_rq = 10 (respectively rr_min_io = 10) and consider increasing this value in case the bandwidth (MB/s) is not sufficient.

How many paths are actually sufficient depends on the performance of the components (4 Gbit, 8 Gbit, ... HBAs and Frontend Adapter Ports) and their availability. It is best practice to use no more than 4 paths for a LUN, and optimally the paths share neither HBAs nor Frontend Adapter Ports of the storage system. Since the I/O patterns on the HANA Data and Log LUNs are the same, they can share the paths. If the continuous Log write I/O operations on the Log LUNs are impaired by the periodic writes on the Data LUNs, use separate paths for Log and Data LUNs.


I/O Queues
The queue length is the number of I/O operations executed in parallel; for Host Bus Adapters the maximum is 2048. The sum of the queue lengths of all Host Devices that share a Host Bus Adapter must not be greater than the HBA queue length.

Notice that HBAs of different host systems may share Frontend Adapter Ports of the storage system. If this is the case, the I/O load on an HBA with a maximum queue length may impair the performance of I/O operations that are generated on another host system. The information about the exact number of parallel I/O operations that can be processed on a certain type of Frontend Adapter can only be provided by the storage vendors.

Example
Published benchmarks for Frontend Adapter cards from different vendors show that 8 Gbit cards can process at maximum about 3200 I/O operations in parallel (queue length). In this example we assume that 4 storage system Frontend Adapter cards are used for our HANA database, each with a maximum of 3200 parallel I/O operations.

We further assume that we have a HANA scale-out solution with 2 active hosts plus 1 standby host, and that on each host 4 Host Bus Adapter cards are used. Each of the two LUNs on a host makes use of all 4 HBAs – each LUN has 4 paths.

How should the queue lengths for the 4 LUNs and 8 HBAs be set, assuming that HANA generates the same I/O load on all LUNs?

According to our assumption, the maximum number of parallel I/O operations on storage level is 4 * 3200 = 12800. We further assumed that the I/O load is equivalent on both active hosts; therefore each host can process 12800 / 2 = 6400 and each of the 4 HBAs 6400 / 4 = 1600 parallel I/O operations. Thus the queue length for all HBAs will be 1600 and for each LUN 800.

Due to the continuous load on the LOG LUNs, we might consider defining a greater queue length for these, to favor Redo Log write I/O operations during peak load periods.
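The arithmetic of the example can be written down as a small calculation; all input figures are the assumptions stated above (the 3200 figure is the assumed vendor benchmark value, not a general constant).

```python
# Recalculate the queue lengths from the example above (assumptions as in the text).
FRONTEND_ADAPTERS = 4
QUEUE_PER_FRONTEND_ADAPTER = 3200   # assumed vendor benchmark figure for an 8 Gbit card
ACTIVE_HOSTS = 2
HBAS_PER_HOST = 4
LUNS_PER_HOST = 2                   # one Data LUN and one Log LUN per host

total_parallel_io = FRONTEND_ADAPTERS * QUEUE_PER_FRONTEND_ADAPTER   # 12800
per_host          = total_parallel_io // ACTIVE_HOSTS                # 6400
hba_queue_length  = per_host // HBAS_PER_HOST                        # 1600
lun_queue_length  = hba_queue_length // LUNS_PER_HOST                # 800 - both LUNs use every HBA

print(f"HBA queue length: {hba_queue_length}, LUN queue length: {lun_queue_length}")
```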


Optimizations on Storage System level
SAP HANA performs mainly sequential write I/O operations, and apart from Backups, Read I/O operations are rare events. If data is actually read, then this happens during the initial system start or when switching to the standby host.
As discussed in chapter 3 (Storage Systems), ALL I/O operations pass through the cache of the storage system, but the data that HANA wants to read is very likely not in the cache. Therefore all measures that provide as much “write cache” as possible will be beneficial for HANA write I/O operations; however, most enterprise storage systems do not distinguish between read and write cache – all I/O operations share the cache. Storage systems allow only a part of the cache to be filled with modified data that has not yet been written to disk (destaged) – the “Write Pending Limit”. Whenever this limit is reached, the storage system must switch to “deferred write” mode to destage the data – and this will cause a considerable degradation of write performance.
Consequently, cache optimization means avoiding “deferred writes” and writing modified data as fast as possible to the physical disks.

Optimize Storage System Backend I/O performance
Basically this is achieved by LUNs that are distributed across many physical disks and by a balanced utilization of all Disk Adapters. For this, storage systems distribute (stripe) LUNs evenly across the physical disks of a RAID group, and if the Storage System has the feature of combining RAID groups in a storage “Pool”, the provided LUNs are additionally striped across multiple RAID groups.

If the storage system offers a “Storage Tiering” feature, we suggest using it. The storage tiers consist of physical disks with the same performance characteristics (the highest performance is provided by the tier equipped with SSDs (Solid State Disks), then tiers with Fiber Channel (FC) disks, fast SAS (Serial Attached SCSI) disks and/or SATA disks), and Storage Tiering provides an automatic relocation of the most frequently used data to the physical disks with the highest performance capabilities.

Notice that the HANA database is very likely not the only application that uses the storage pool of the storage system; therefore the pool must be equipped with as many physical disks as needed to meet the I/O requirements of all applications.

Impact on I/O performance by synchronous data replication between storage systems
The synchronous replication of data changes from the primary to a secondary storage system increases the Write I/O latency by about 1 ms per 300 km of cable length (speed of light). In addition, “deferred write” occurrences on the secondary storage system, caused by insufficient backend I/O performance, will also increase the latency of synchronous Write I/O operations – HANA Log writes – on the primary storage system. The measures used to optimize the backend I/O performance of the primary storage system must therefore also be applied on the secondary storage system.


9 References

Introduction to Storage Area Networks and System Networking, IBM Redbook SG24-5470-04, Jon Tate, Pall Beck, Hector Hugo Ibarra, Shanmuganathan Kumaravel, Libor Miklas, November 2012

iX kompakt Storage – von SSD bis Cloud, Susanne Nolte et al., February 2011

Overview – SAP HANA tailored data center integration, www.saphana.com

SAP HANA Fiber Channel Storage Connector Admin Guide, Thomas Weichert, April 2013

