© 2009 IBM Corporation
AIX Rightsizing
Clea Zolotow, Senior Technical Staff Member, IBM Corporation
Nicholas Lydakis, Manager, Capacity Planning, WellPoint Corporation
June 3, 2011
ABSTRACT
There are many ways to reduce cost in a datacenter. One of the easiest ways to decrease costs is to decrease the number of servers on the floor. Now, along with physical consolidation, we can logically simplify the datacenter by utilizing virtualization.
Some technical barriers to virtualization are performance concerns (workloads competing for resources), growth concerns (workloads cannot reserve space for growth), and architectural constraints (servers run out of IO or memory before they run out of CPU).
This presentation provides a mass analysis methodology to address performance and growth concerns and architectural constraints, as well as methodologies that can be used to coadunate LPARs to achieve higher utilization rates at the hardware level.
This methodology has been quite successful at IBM. Our biggest cost savings were a run rate of $2.4 million per year in hardware and a $2 million software savings due to decreased engine utilization.
Virtualization = Infrastructure Simplification
Efficient virtualization provides the best ROI and minimizes the risk.
Figure: progression from a complex environment, through physical consolidation, to logical simplification with virtual servers, storage and networks.
Complex:
– 1 workload per server
– Manual provisioning
– No sharing
– Vertical silos
– Disparate management tools
– Multiple sites
Physical Consolidation:
– Fewer sites
– Use of larger servers / SANs
– Mostly environmental savings
– Disparate management tools
– Labor-intensive provisioning
– Workload management and isolation issues
Logical Simplification (Virtualization):
– Multiple virtual servers (OSs) per physical server
– Significant savings: fewer servers, higher utilization
– Rapid “provisioning”
– Automatic workload management
– Preserve logical “server to application” relations
Virtualization’s popularity today is based on its ability to optimize IT
Virtualization has been around for decades
And it is here to stay
Large and small organizations alike are rapidly adopting the technology
Virtualization motivators
Reduce costs 57%
Simplify IT infrastructure and administration 48%
Increase server utilization 48%
Increase scalability of infrastructure 29%
Enhance resiliency and reliability 25%
Improve application performance 15%
Manage a heterogeneous server environment 9%
Source: IBM Systems and Technology Group (1Q06)
Why do organizations adopt virtualization?
For reasons that range from reduced IT costs to simplified IT environments, streamlined management and increased IT flexibility
Each Workload is Evaluated for Suitability Based on Technical Attributes
Priority Workloads for Consolidation:
WebSphere® applications
Domino® Applications
Selected tools: Tivoli®, WebSphere® and internally developed
WebSphere MQ
DB2® Universal Database™
Current Mid-Range Server Location by State – Physical Consolidation Opportunities still exist!
Unix and Intel by Location
California Colorado Connecticut Georgia Illinois
Indiana Kentucky Maine Massachusetts Michigan
Missouri Nevada New Hampshire New York North Carolina
Ohio Texas Virginia West Virginia Wisconsin
Analysis methodology to address performance and growth concerns: Rightsize individual LPARs (CPU and Memory)
Know your current hardware utilization rates and derive potential cost savings to get customer/app owner buy-in.
Rightsize individual LPARs:
– The initial pass is the “perfect world.”
– The second pass is the initial meeting with app owners.
– Third and subsequent passes take into account the most-loved and business-critical applications.
Roll out resizing in waves:
– Capacity planning has to measure before and after each wave to ensure that there is headroom for processing.
– Find potential resource problems before the app owner does.
Actual hardware savings are usually 50% or less of the perfect-world analysis.
Chart: Physical Box Busy, Non-Production vs. Production (plotted values 8.43 and 13.66).
UNIX Virtualized vs. Non-Virtualized Utilization Large Company – Recent Data
Metric                             Non-Production LPAR   Non-Production Server   Production LPAR   Production Server   Total/Averages
Average CPU Busy                   11.88                 9.43                    12.94             10.85               11.28
Average CPU Max                    84.36                 71.26                   81.81             67.25               76.17
Number of Virtual Machines/LPARs   499                   54                      406               153                 1112
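The Total/Averages column appears to be the unweighted mean of the four group columns (with the LPAR counts summed), not an average weighted by LPAR count. A small Python check of that reading, using only the numbers in the table; the dictionary layout and function names are illustrative:

```python
# Figures copied from the utilization table above.
groups = {
    "Non-Production LPAR": {"avg_busy": 11.88, "avg_max": 84.36, "count": 499},
    "Non-Production Server": {"avg_busy": 9.43, "avg_max": 71.26, "count": 54},
    "Production LPAR": {"avg_busy": 12.94, "avg_max": 81.81, "count": 406},
    "Production Server": {"avg_busy": 10.85, "avg_max": 67.25, "count": 153},
}


def simple_mean(metric: str) -> float:
    """Unweighted mean of one metric across the four groups."""
    values = [g[metric] for g in groups.values()]
    return sum(values) / len(values)


def weighted_mean(metric: str) -> float:
    """Mean of one metric weighted by how many LPARs/servers each group has."""
    total = sum(g["count"] for g in groups.values())
    return sum(g[metric] * g["count"] for g in groups.values()) / total


total_count = sum(g["count"] for g in groups.values())
print(total_count)  # 1112
print(f"simple: {simple_mean('avg_busy'):.3f}, weighted: {weighted_mean('avg_busy'):.3f}")
```

With these values the unweighted mean of Average CPU Busy is about 11.275 (shown as 11.28), while weighting by count gives about 12.01, because the busier LPAR groups are also the largest.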
Capped and Uncapped Mode
Micro-Partitioning offers two modes, capped and uncapped. They differ in whether a partition can use extra capacity available in the system. If a partition donates unused cycles back to the shared pool, or if the system has idle capacity (because there is not enough workload running), the extra cycles may be used by other partitions, depending on their type and configuration.
Capped mode The processing capacity never exceeds the assigned processing capacity.
Uncapped mode The processing capacity may be exceeded when the shared processing pool has available resources.
Capped Mode
A capped partition is defined with a hard maximum limit of processing capacity. That means that it cannot go over its defined maximum capacity in any situation, unless you change the configuration for that partition (either by modifying the partition profile or by executing a dynamic LPAR operation). Even if the system is otherwise idle, the capped partition cannot exceed its entitled capacity.
Uncapped Mode
With an uncapped partition, you must specify the uncapped weight of that partition. If multiple uncapped logical partitions require idle processing units, the managed system distributes idle processing units to the logical partitions in proportion to each logical partition's uncapped weight. The higher the uncapped weight of a logical partition, the more processing units the logical partition gets.
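A simplified Python model of the weighted distribution described above. It assumes each uncapped partition states how many extra processing units it could use (`demand`, already limited by its virtual CPUs), shares idle units in proportion to uncapped weight, and passes any share a partition cannot use on to the others. Capped partitions are simply left out, since they never receive pool cycles. The names and numbers are hypothetical, and this illustrates the proportional rule rather than the hypervisor's actual dispatching.

```python
def distribute_idle(idle_units, demand, weight):
    """Split idle processing units among uncapped partitions by uncapped weight.

    demand: partition -> extra processing units it could consume
    weight: partition -> uncapped weight
    Returns partition -> extra processing units granted.
    """
    granted = {name: 0.0 for name in demand}
    # Partitions with nothing to absorb or a zero weight get nothing.
    active = {n for n in demand if demand[n] > 0 and weight.get(n, 0) > 0}
    remaining = idle_units
    while remaining > 1e-9 and active:
        total_weight = sum(weight[n] for n in active)
        handed_out = 0.0
        for name in sorted(active):
            share = remaining * weight[name] / total_weight
            give = min(share, demand[name] - granted[name])
            granted[name] += give
            handed_out += give
        # Partitions whose demand is met drop out; leftovers go round again.
        active = {n for n in active if demand[n] - granted[n] > 1e-9}
        remaining -= handed_out
        if handed_out <= 1e-12:
            break
    return granted


grants = distribute_idle(
    idle_units=3.0,
    demand={"A": 2.0, "B": 2.0, "C": 0.5},
    weight={"A": 128, "B": 64, "C": 128},
)
print({name: round(units, 3) for name, units in grants.items()})
# {'A': 1.667, 'B': 0.833, 'C': 0.5}
```

Partition C only needs 0.5 units, so the rest of its share is split between A and B in their 2:1 weight ratio.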
Min, Max and Desired
When assigning processor values you must specify minimum, desired, and maximum values for both processing units and virtual processors.
If any of the three types of resources cannot satisfy the specified minimum and required values, activation of the partition fails. If the available resources satisfy all the minimum and required values but not the desired values, the activated partition gets as many of the resources as are available.
                   Minimum   Desired   Maximum
Processing units   0.1       0.5       1
Virtual CPUs       1         1         2
The maximum value is used to limit the maximum processor resources when dynamic logical partitioning operations are performed on the partition.
This is the Cap
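The activation rule and the role of the maximum can be modeled in a few lines of Python, using the processing-unit values from the table. The class and function names are hypothetical and the model covers a single resource; it is a sketch of the stated rules, not HMC code.

```python
from dataclasses import dataclass


@dataclass
class ProcessorSpec:
    """Minimum / desired / maximum values for one resource (e.g. processing units)."""
    minimum: float
    desired: float
    maximum: float

    def __post_init__(self):
        if not self.minimum <= self.desired <= self.maximum:
            raise ValueError("expected minimum <= desired <= maximum")


def activate(spec: ProcessorSpec, available: float) -> float:
    """Units granted at activation: fail below the minimum, else up to desired."""
    if available < spec.minimum:
        raise RuntimeError("activation fails: minimum cannot be satisfied")
    return min(spec.desired, available)


def dlpar_add(spec: ProcessorSpec, current: float, requested: float) -> float:
    """A dynamic LPAR add can never take the partition above its maximum."""
    return min(current + requested, spec.maximum)


units = ProcessorSpec(minimum=0.1, desired=0.5, maximum=1.0)
print(activate(units, available=2.0))                # 0.5 (desired is met)
print(activate(units, available=0.3))                # 0.3 (takes what is available)
print(dlpar_add(units, current=0.5, requested=0.8))  # 1.0 (capped at maximum)
```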
Rightsizing Methodology: AIX CPU Sizing Parameters (Uncapped)
Engine type: Physical (processing units)
– Minimum: half of the physical entitlement.
– Entitlement: the average CPU consumed by the LPAR, or 10% of the virtual entitlement, whichever is higher. The total of this number cannot exceed the activated CPUs on the frame.
– Maximum: twice the physical entitlement.
Engine type: Virtual (virtual CPUs)
– Minimum: half of the virtual entitlement.
– Entitlement: the maximum CPU consumed by the LPAR × 1.3 (a 30% uplift).
– Maximum: twice the virtual entitlement.
Definitions:
– Minimum = the lowest configuration available without rebooting.
– Physical entitlement = the starting configuration of the LPAR.
– Virtual entitlement = the maximum the LPAR can receive.
– Maximum = the highest configuration available without rebooting.
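The uncapped sizing rules can be sketched in Python for a single LPAR. The 1.3 uplift, whole-CPU rounding of the virtual entitlement, and rounding of the physical entitlement up to one decimal place mirror the spreadsheet formulas shown later in the deck; rounding the virtual minimum up to a whole CPU is an assumption of this sketch. The 10% floor presumably keeps at least the 1/10th-processor Micro-Partitioning minimum behind each virtual CPU. Function names and inputs are hypothetical.

```python
from decimal import Decimal, ROUND_CEILING


def round_up(value: Decimal, places: str) -> Decimal:
    """Round up to the decimal places of `places` ("1" = whole, "0.1" = tenths)."""
    return value.quantize(Decimal(places), rounding=ROUND_CEILING)


def size_uncapped(avg_cpus: float, peak_cpus: float, uplift: str = "1.3") -> dict:
    """Apply the uncapped sizing rules to one LPAR's CPU consumption."""
    avg, peak = Decimal(str(avg_cpus)), Decimal(str(peak_cpus))
    # Virtual entitlement: peak consumption plus the uplift, in whole virtual CPUs.
    virtual_ent = round_up(peak * Decimal(uplift), "1")
    # Physical entitlement: the higher of average use or 10% of the virtual entitlement.
    physical_ent = round_up(max(avg, virtual_ent / 10), "0.1")
    return {
        "physical_min": physical_ent / 2,
        "physical_ent": physical_ent,
        "physical_max": physical_ent * 2,
        # Assumption: virtual CPUs are whole numbers, so half is rounded up.
        "virtual_min": round_up(virtual_ent / 2, "1"),
        "virtual_ent": virtual_ent,
        "virtual_max": virtual_ent * 2,
    }


print({k: str(v) for k, v in size_uncapped(avg_cpus=0.42, peak_cpus=2.6).items()})
```

For an LPAR averaging 0.42 CPUs with a 2.6-CPU peak, this gives 4 virtual CPUs (range 2 to 8) and 0.5 processing units (range 0.25 to 1.0). Decimal arithmetic keeps a value such as 0.7 from being pushed up a step by binary floating-point error.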
Rightsizing Methodology: AIX CPU Sizing Parameters (Capped)
Engine type: Physical (processing units)
– Minimum: half of the physical entitlement.
– Entitlement: the maximum CPU consumed by the LPAR × 1.3 (a 30% uplift). The total of this number cannot exceed the activated CPUs on the frame.
– Maximum: twice the physical entitlement.
Definitions:
– Minimum = the lowest configuration available without rebooting.
– Physical entitlement = the capacity the LPAR can receive.
– Maximum = the highest configuration available without rebooting.
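A Python sketch of the capped rule together with its frame-level constraint. It reads the entitlement rule as peak consumption plus a 30% uplift, consistent with the 1.3 factor in the spreadsheet formulas; rounding each entitlement up to 0.1 is an assumption. The function names and example peaks are hypothetical.

```python
from decimal import Decimal, ROUND_CEILING


def capped_entitlement(peak_cpus: float, uplift: str = "1.3") -> Decimal:
    """Capped entitlement: peak CPUs consumed plus the uplift, rounded up to 0.1."""
    return (Decimal(str(peak_cpus)) * Decimal(uplift)).quantize(
        Decimal("0.1"), rounding=ROUND_CEILING
    )


def frame_fits(peaks, activated_cpus):
    """Check that the summed entitlements stay within the frame's activated CPUs."""
    total = sum((capped_entitlement(p) for p in peaks), Decimal(0))
    return total <= activated_cpus, total


print(frame_fits([3.1, 5.0, 7.4], activated_cpus=24))  # (True, Decimal('20.3'))
print(frame_fits([3.1, 5.0, 7.4], activated_cpus=20))  # (False, Decimal('20.3'))
```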
Advanced Power Virtualization
Figure: Advanced Power Virtualization overview showing the Hypervisor hosting dynamically resizable AIX 5L V5.2, AIX 5L V5.3, Linux and i5/OS V5R3 partitions, Micro-Partitioning, a Virtual I/O Server partition providing Ethernet and storage sharing over virtual I/O paths, and Partition Load Manager (manager server, PLM agents, managed and unmanaged partitions), with IVM.
Virtual I/O Server
– Shared Ethernet
– Shared SCSI and Fibre Channel-attached disk subsystems
– Supports AIX 5L V5.3 and Linux partitions
Micro-Partitioning
– Share processors across multiple partitions
– Minimum partition: 1/10th of a processor
Partition Load Manager
– Balances processor and memory requests
Managed via HMC or IVM
Tooling and Data Retrieval: SRM
To the right is the SRM methodology and data streams. This works like many other performance and capacity systems.
Agents collect data every minute (1) and send it to an interim holding area (2), where the data is processed and summarized into 15-minute or hourly intervals (3); it is then stored in DB2 and presented on the SRM website (4).
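SRM's internal processing is not described beyond this, so the following is only a sketch of the summarization step (3), assuming once-a-minute CPU-busy samples and a simple mean per interval. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def summarize(samples, interval_minutes=15):
    """Average (timestamp, cpu_busy_pct) samples into fixed intervals.

    interval_minutes must divide 60 (e.g. 15 or 60) so buckets align to the hour.
    Returns {interval_start: mean value}, ordered by time.
    """
    if 60 % interval_minutes:
        raise ValueError("interval_minutes must divide 60")
    buckets = defaultdict(list)
    for ts, value in samples:
        start_minute = (ts.minute // interval_minutes) * interval_minutes
        start = ts.replace(minute=start_minute, second=0, microsecond=0)
        buckets[start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Two hours of hypothetical once-a-minute readings.
base = datetime(2011, 6, 3, 10, 0)
samples = [(base + timedelta(minutes=i), float(i % 60)) for i in range(120)]
quarter_hourly = summarize(samples, 15)
hourly = summarize(samples, 60)
print(len(quarter_hourly), len(hourly))  # 8 2
```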
Tooling and Data Retrieval: Brio (ODBC)
After the data is loaded into the SRM data warehouse, it is extracted to the PC using Microsoft’s Open Database Connectivity (ODBC).
There, the architectural and utilization information is merged together to produce three reports utilized for rightsizing and server consolidation studies.
Figure: data flow from the SRM data warehouse through Brio, combining utilization information, architectural information and custom categorization into rightsizing, architectural and utilization reporting.
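The merge step can be pictured as a join keyed by host name. This Python sketch uses plain dictionaries in place of the ODBC extracts; the field names, host names and the "Uncategorized" default are hypothetical, and nothing here reflects Brio's own behavior.

```python
def merge_for_reporting(architecture, utilization, categories):
    """Join per-LPAR architectural, utilization and category data by host name."""
    rows = []
    for host in sorted(architecture):
        row = {"host": host, **architecture[host]}
        # LPARs with no utilization data stay visible instead of silently dropping out.
        row.update(utilization.get(host, {"avg_busy": None, "max_busy": None}))
        row["category"] = categories.get(host, "Uncategorized")
        rows.append(row)
    return rows


architecture = {
    "lpar01": {"frame": "frameA", "mode": "Capped", "cpus": 4},
    "lpar02": {"frame": "frameA", "mode": "Uncapped", "cpus": 2},
}
utilization = {"lpar01": {"avg_busy": 9.5, "max_busy": 71.0}}
categories = {"lpar01": "Production"}

rows = merge_for_reporting(architecture, utilization, categories)
for row in rows:
    print(row)
```

Keeping LPARs that lack utilization data in the output makes gaps in the collection visible before any rightsizing decision is made.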
Rightsizing Methodology: AIX CPU Sizing Parameters
Part One: Pull the data.
Part Two: Analyze it.
Use this later; start with the forest, not the trees.
=ROUNDUP(IF(A3="Capped",(G3*I3/100)*1.3,J3),0)
=IF(K3/10>M3,K3/10,M3)
=ROUNDUP(IF(A3="Capped",G3*I3/100,J3),1)
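The three spreadsheet formulas can be transcribed into Python to make the ROUNDUP behavior explicit. The parameter names simply mirror the cells (A3, G3, I3, J3, K3, M3) because the slide does not say what each column holds, and the sample inputs are hypothetical. Decimal arithmetic avoids binary floating-point error pushing a value such as 3.4 up an extra step.

```python
from decimal import Decimal, ROUND_UP


def excel_roundup(value, digits: int) -> Decimal:
    """Excel-style ROUNDUP: round away from zero to `digits` decimal places."""
    step = Decimal(1).scaleb(-digits)  # 0 -> 1, 1 -> 0.1
    return Decimal(str(value)).quantize(step, rounding=ROUND_UP)


def formula_1(a3, g3, i3, j3) -> Decimal:
    # =ROUNDUP(IF(A3="Capped",(G3*I3/100)*1.3,J3),0)
    if a3 == "Capped":
        value = Decimal(str(g3)) * Decimal(str(i3)) / 100 * Decimal("1.3")
    else:
        value = j3
    return excel_roundup(value, 0)


def formula_2(k3, m3) -> Decimal:
    # =IF(K3/10>M3,K3/10,M3)
    tenth = Decimal(str(k3)) / 10
    m = Decimal(str(m3))
    return tenth if tenth > m else m


def formula_3(a3, g3, i3, j3) -> Decimal:
    # =ROUNDUP(IF(A3="Capped",G3*I3/100,J3),1)
    value = Decimal(str(g3)) * Decimal(str(i3)) / 100 if a3 == "Capped" else j3
    return excel_roundup(value, 1)


print(formula_1("Capped", 8, 42.5, None))                # 5
print(formula_3("Capped", 8, 42.5, None))                # 3.4
print(formula_2(5, formula_3("Capped", 8, 42.5, None)))  # 3.4
print(formula_1("Uncapped", 8, 42.5, 2.37))              # 3
```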
The Big Picture
In the previous example, I chose only the 34 32-way boxes at this corporation (1088 CPUs).
385 physical CPUs on capped LPARs are currently allocated to the workload.
After rightsizing, in a perfect world, we uncapped all the LPARs and could run them on 261 virtual CPUs and 174.8 physical CPUs, or 5.5 32-way boxes: a savings of 25 physical frames after accounting for headroom (2 CPUs per frame) and 4 engines per frame dedicated to VIOS.
Your mileage will vary.
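As a rough check on the frame arithmetic, this Python sketch assumes every 32-way frame gives up 2 CPUs of headroom and 4 VIOS engines, leaving 26 usable CPUs, and treats the 174.8 physical CPUs as freely divisible across frames (no per-LPAR packing). Under those assumptions the workload needs 7 frames, which matches the seven machines used in the memory calculation later in this deck. How the slide arrives at its 25-frame savings figure is not shown, so the sketch does not try to reproduce it.

```python
import math


def frames_needed(physical_cpus: float, cpus_per_frame: int = 32,
                  headroom_cpus: int = 2, vios_cpus: int = 4) -> int:
    """Frames needed once headroom and VIOS engines are set aside on every frame."""
    usable = cpus_per_frame - headroom_cpus - vios_cpus
    return math.ceil(physical_cpus / usable)


print(round(174.8 / 32, 1))  # 5.5 raw 32-way boxes, before headroom and VIOS
print(frames_needed(174.8))  # 7
```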
Technical Barriers to Virtualization: Workloads Competing for Resources
Monitoring workloads is essential.
Siloed corporations seem to believe that in shared-host systems, someone else is stealing their CPU.
The next chart shows how physical utilization can be calculated at the frame level.
Uncapped LPAR utilization is calculated from the number of CPUs dispatched to service the workload, and therefore includes any LPAR or frame overhead (the PURR value, physical processors consumed).
Capped LPAR utilization can be calculated in two ways:
– A simple count of engines, since they are no longer in the shared pool (i.e., the number of physical CPUs).
– CPU utilization × the number of physical CPUs assigned.
To prove to management that the boxes are underutilized and to run a cost savings project, I usually use CPU utilization (as seen on the next page).
To prove to application owners that their CPUs aren’t being “stolen,” I use the simple count of engines for the capped environment and the CPUs dispatched for the uncapped environment.
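Both ways of counting can be expressed in one Python function for a single frame. The record layout is hypothetical; `physc` stands for the physical processors consumed by an uncapped LPAR, and the two capped methods correspond to the two bullets above.

```python
def frame_physical_cpus(lpars, capped_method="utilization"):
    """Physical CPUs used on one frame.

    Uncapped LPARs contribute physical processors consumed ("physc").
    Capped LPARs contribute either their whole engine count (capped_method="count")
    or engines * CPU utilization (capped_method="utilization").
    """
    total = 0.0
    for lpar in lpars:
        if lpar["mode"] == "Uncapped":
            total += lpar["physc"]
        elif capped_method == "count":
            total += lpar["cpus"]
        elif capped_method == "utilization":
            total += lpar["cpus"] * lpar["util_pct"] / 100
        else:
            raise ValueError(f"unknown capped_method: {capped_method}")
    return total


frame = [
    {"mode": "Uncapped", "physc": 1.8},
    {"mode": "Capped", "cpus": 4, "util_pct": 25},
    {"mode": "Capped", "cpus": 2, "util_pct": 60},
]
print(frame_physical_cpus(frame, "utilization"))  # about 4.0: the view for management
print(frame_physical_cpus(frame, "count"))        # about 7.8: the view for application owners
```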
Technical Barriers to Virtualization: Workloads Competing for Resources
Chart: Physical CPUs and CPU Utilization for each of the 34 frames (legend: CPUs on Frame, Max HW Used, Avg HW Used, 90thPCtile HW Used).
The top (yellow bar) is the number of physical CPUs, here 32.
The red square is the 90th percentile of the CPU utilization of the frame, using hourly data.
The top of the blue line is the maximum CPU utilization of the frame.
The bottom of the blue line is the average utilization of the frame.
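The chart's per-frame statistics can be computed from hourly data roughly as follows. The slide does not say which percentile definition was used, so this Python sketch assumes linear interpolation (the same definition as Excel's PERCENTILE.INC); the input is a hypothetical list of hourly physical CPUs used on one frame.

```python
def percentile_inclusive(values, p):
    """Linear-interpolation percentile (same definition as Excel PERCENTILE.INC)."""
    if not values or not 0 <= p <= 1:
        raise ValueError("need at least one value and 0 <= p <= 1")
    data = sorted(values)
    rank = p * (len(data) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (rank - lower) * (data[upper] - data[lower])


def frame_summary(hourly_cpus_used, cpus_on_frame):
    """The four series plotted per frame: CPUs on frame, max, average, 90th percentile."""
    return {
        "cpus_on_frame": cpus_on_frame,
        "max_hw_used": max(hourly_cpus_used),
        "avg_hw_used": sum(hourly_cpus_used) / len(hourly_cpus_used),
        "p90_hw_used": percentile_inclusive(hourly_cpus_used, 0.9),
    }


hourly = [3.0, 2.5, 4.1, 9.8, 3.3, 2.9, 5.6, 4.4, 3.1, 6.2]  # hypothetical hourly values
print(frame_summary(hourly, cpus_on_frame=32))
```

Here the 90th percentile (about 6.56) sits well below the single 9.8 peak, which is why the red square is a steadier basis for sizing than the top of the blue line.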
Growth Concerns – Workloads Cannot Reserve Space for Growth
In an uncapped environment, workloads can reserve space for growth through the number of virtual CPUs available to the workload.
This was used to “sell” the benefits of uncapped LPARs to the application owners.
In the previous example, a 30% uplift was built into the calculation for the virtual CPUs:
– =ROUNDUP(IF(A3="Capped",(G3*I3/100)*1.3,J3),0)
– As you work with your individual environment, you can customize that uplift.
– Note that the uplift covers not only growth but also intra-hour peaks (as I used hourly average data).
Architectural Constraints – Servers Run Out of IO or Memory Before They Run Out of CPU
These machines require 1,393,664 MB of memory to run their workload. (Memory optimization will have to wait for another day.)
Spread evenly over 7 machines, each machine would require 199,095 MB of memory, or 200,704 MB when rounded up to a multiple of 4,096 MB (204,800 MB for a multiple of 8,192 MB).
Unfortunately, these machines came with 131,072 MB.
Further, there are 7 Oracle databases for which the application owner will not let the LPAR run on a shared VIOS, adding to the number of frames and the number of engines.
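The memory arithmetic above, as a short Python check. It assumes memory can be split evenly across machines and ignores per-LPAR placement and hypervisor overhead, so the last figure (how many 131,072 MB machines the total needs) is a lower bound on the memory-driven frame count rather than a plan.

```python
import math


def per_machine_memory(total_mb: int, machines: int, increment_mb: int = 1) -> int:
    """Memory each machine needs if the total is spread evenly, rounded up to an increment."""
    even_share = math.ceil(total_mb / machines)
    return math.ceil(even_share / increment_mb) * increment_mb


TOTAL_MB = 1_393_664
INSTALLED_MB = 131_072

print(per_machine_memory(TOTAL_MB, 7))        # 199095
print(per_machine_memory(TOTAL_MB, 7, 4096))  # 200704
print(per_machine_memory(TOTAL_MB, 7, 8192))  # 204800
print(math.ceil(TOTAL_MB / INSTALLED_MB))     # 11 machines at 131,072 MB each
```

So memory, not CPU, sets the floor here: at least 11 machines of this configuration versus the 7 that CPU alone would suggest.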
Methodologies to Coadunate LPARs
coadunation: the state or condition of being united by growth.
coadunate, adj.
Coadunation Example
Mixing workloads shares headroom, but you pay in response time at low utilization. Workload management shifts peaks based on business priorities to use the “white space,” but the response time of lower-priority work is traded off.
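Coadunating LPARs is, at its simplest, a packing problem. As an illustration only, here is first-fit decreasing, a standard bin-packing heuristic, placing LPARs onto frames by a single CPU figure per LPAR with a utilization target that preserves headroom. The deck does not describe the algorithm inside its SPOT tool, so this is not that tool's method; a statistical approach would also consider when each workload peaks, which this sketch ignores. All names and numbers are hypothetical.

```python
def first_fit_decreasing(demands, frame_cpus, target=0.8):
    """Pack LPARs (name -> CPUs needed) onto frames with first-fit decreasing.

    Each frame is filled to at most target * frame_cpus. LPARs too big for
    one frame at that target are returned separately as stand-alone candidates.
    """
    limit = frame_cpus * target
    frames = []  # each entry: [cpus_used, [lpar names]]
    standalone = []
    for name, cpus in sorted(demands.items(), key=lambda item: item[1], reverse=True):
        if cpus > limit:
            standalone.append(name)
            continue
        for frame in frames:
            if frame[0] + cpus <= limit:
                frame[0] += cpus
                frame[1].append(name)
                break
        else:
            frames.append([cpus, [name]])
    return [names for _, names in frames], standalone


placement, standalone = first_fit_decreasing(
    {"a": 6, "b": 5, "c": 4, "d": 3, "e": 2, "f": 12}, frame_cpus=16, target=0.5
)
print(placement)   # [['a', 'e'], ['b', 'd'], ['c']]
print(standalone)  # ['f']
```

The stand-alone list plays the same role as the LPARs "migrated to stand-alone" in the study results.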
Data Preparation
Data is readily available from the SRM database at srmweb.raleigh.ibm.com.
Data is extracted and normalized to the receiving machine using the Ideas International database.
The CSV file is briefly edited and then loaded into SPOT.
This extraction and load process takes about 20 minutes (depending on the response time of the SRM database).
The SPOT tool takes about 10 minutes to run each datacenter (Southbury and Boulder).
Total study time is 60 minutes. Easy!
Results of Co-adunation Study, Boulder
Boulder has 24 physical frames holding 93 LPARs, an average of 3.875 LPARs per frame. Based on CPU utilization, the LPARs could all be deployed to 5 x445s, which would then run an average of 47.4% busy, a savings of 19 physical frames. 2 LPARs would be migrated to stand-alone. (This is an average of 18.2 LPARs per frame.)
Current host utilization for Boulder for March 2007 was 7.33% busy.
Results of Co-adunation Study, Southbury
Southbury has 17 physical frames holding 59 LPARs, averaging 3.47 LPARs per frame. Based on CPU utilization, the LPARs could all be deployed to 4 x445s, which would then run an average of 45.6% busy, a savings of 13 physical frames. 2 LPARs would be migrated to stand-alone. (This is an average of 14.25 LPARs per frame.)
Current utilization for Southbury was 7.62% busy.
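The per-frame figures quoted for both studies follow from the frame and LPAR counts, assuming the two stand-alone LPARs are left out of the after-consolidation average (which is what reproduces 18.2 and 14.25). A quick Python check, with a hypothetical function name:

```python
def consolidation_study(frames: int, lpars: int, target_frames: int, standalone: int) -> dict:
    """Per-frame figures before and after consolidation."""
    return {
        "lpars_per_frame_now": lpars / frames,
        "frames_saved": frames - target_frames,
        "lpars_per_frame_after": (lpars - standalone) / target_frames,
    }


boulder = consolidation_study(frames=24, lpars=93, target_frames=5, standalone=2)
southbury = consolidation_study(frames=17, lpars=59, target_frames=4, standalone=2)
print(boulder)    # {'lpars_per_frame_now': 3.875, 'frames_saved': 19, 'lpars_per_frame_after': 18.2}
print(southbury)  # about 3.47 LPARs per frame now, 13 frames saved, 14.25 after
```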
Conclusion
There are many ways to reduce cost in a datacenter.
Decrease the number of servers on the floor using physical or virtual consolidation.
Address concerns:
– Performance concerns: workloads competing for resources;
– Growth concerns: workloads cannot reserve space for growth; and
– Architectural constraints: servers run out of IO or memory before they run out of CPU.
Utilize a statistical or bin-packing mass analysis methodology to coadunate LPARs to achieve higher utilization rates at the hardware level.
Get those cost savings!