Esx Presentation

  • 8/6/2019 Esx Presentation

    1/64

    Performance and Capacity Considerations with

    VMware ESX 3.5 - DRS and HA

    Ellen Friedman

    Philadelphia CMG

    May 8, 2009

  • 8/6/2019 Esx Presentation

    2/64

    Agenda

    Virtualization Candidate Consideration

    DRS and HA Overview

    Metrics: Performance/Capacity

    Capacity Planning and Performance Considerations

  • 8/6/2019 Esx Presentation

    3/64

    Server Consolidation

    A solution for server sprawl

    Consolidation can lead to savings/efficiency and reduced TCO:

        Hardware (not typically software; depends on licensing model)

        IT resources and systems management costs

        Power, cooling, and floor space

    Add flexibility for future growth

    Can facilitate on-demand resource provisioning, depending upon the solution

    Create an opportunity to provide HA and D/R at reduced cost

    Create a single point of control

  • 8/6/2019 Esx Presentation

    4/64

    Server Consolidation Considerations

    What consolidation ratio are you trying to achieve? E.g., 20:1, 40:1?

    Need to mitigate risks:

        Time to do back-ups

        Start-up issues and possible outages

        SAN is now the major point of contention and a possible single point of failure, depending upon implementation

    Still not optimal, even with ESX 3.5, for decision support or large database systems

  • 8/6/2019 Esx Presentation

    5/64

    Server Consolidation Considerations (contd)

    Business and Application Considerations

    Candidates for Consolidation and Virtualization

    Similar availability and SLA requirements

        Solutions will be based on these

    Understand the political landscape

        Business/application should be able to work in a shared services environment

    Minimal I/O resource requirements:

        For virtualized environments there is still some penalty for I/O, and MPIO may not be achievable depending upon the solution

  • 8/6/2019 Esx Presentation

    6/64

    Capacity Planning and Balancing

    What ROI does management expect?

    What is the expected consolidation ratio?

    Monitor as you roll out

        Monitor resource usage for the hosts and the VMs

        Compare actual vs. projected resource usage.

        Projected resource usage will come from analysis of the list of candidate servers

        Projected CPU, validate memory, validate I/O and network

    Develop realistic timeframes to provision additional hardware, including cabling/network requirements.

        Incorporate tolerances to accommodate delays

    Ensure proper planning for server (CPU, memory, I/O, network, storage)

  • 8/6/2019 Esx Presentation

    7/64

    Capacity Planning and Balancing (contd)

    Planning for the SAN

    Understand the I/O performance capabilities, native vs. virtualized

        Utilize IOMETER to benchmark I/O performance by planning for your profiles, e.g., amount and size of sequential reads/writes vs. random reads/writes

        IOMETER will produce I/O loading and response time reports for various types/sizes of I/O.

        Need to plan for peak IOPs as well as storage cache (kbytes/second)

    Ability to scale and load balance for better performance

    Ability to re-size LUNs

        E.g., 22 GB VM storage and LUN size of 400 GB (find the sweet spot here)

        Number of VMs per LUN (12-15 max)

        Accommodate growth in VM storage requirements

        Accommodate snapshots and swap space (function of memory size)

        Minimize impact of test on production
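
    The LUN sizing guidance above can be sketched as a small calculation. This is only an illustration: the 20% snapshot headroom and the rule that swap space equals VM memory size are assumptions made for the example, not figures from the slides, and the function name is hypothetical.

```python
import math

def vms_per_lun(lun_gb, vm_disk_gb, vm_memory_gb,
                snapshot_fraction=0.2, max_vms=15):
    """Estimate how many VMs fit on one LUN.

    snapshot_fraction is an assumed headroom for snapshots, and swap is
    assumed to equal the VM's memory size; tune both to your environment.
    max_vms reflects the 12-15 VMs per LUN guidance on the slide.
    """
    per_vm_gb = vm_disk_gb * (1 + snapshot_fraction) + vm_memory_gb
    by_capacity = math.floor(lun_gb / per_vm_gb)
    return min(by_capacity, max_vms)

# 22 GB of VM storage, 2 GB of memory (swap), 400 GB LUN
print(vms_per_lun(400, 22, 2))
```

    With these assumed inputs the capacity limit, not the VM-count ceiling, is what constrains the LUN, which is the kind of "sweet spot" check the slide suggests.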

  • 8/6/2019 Esx Presentation

    8/64

    Virtualization Candidate Considerations and Creating Profiles

    Create templates for VM builds and profiles of the work.

    Develop resource profiles for candidate applications in terms of CPU, I/O, memory and network

        Collect the data for at least one peak week (e.g., month end), but of course more is better! Sampling rate should be between 1-5 minutes.

        Summarize the data to obtain peak estimates per server (e.g., 95th percentile for CPU, I/O, NIC, memory)

    Do the candidates have similar resource profiles as those of the systems you are considering?

        Monitor resource usage over time to see if profiles change.

        High I/O requirements may be problematic for virtualization

    Summarize the data and rank the candidates, perhaps limiting servers with high I/O demand and minimizing the number of servers with large memory footprints.
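
    As a rough sketch of the summarize-and-rank step, the following computes a 95th percentile per server and orders candidates by I/O demand. The nearest-rank percentile method, the server names, and the sample values are assumptions chosen for illustration.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def rank_by_io(profiles, pct=95):
    """profiles maps a server name to its list of I/O samples (MB/s).
    Returns (name, peak estimate) pairs, lowest I/O demand first."""
    peaks = {name: percentile(io, pct) for name, io in profiles.items()}
    return sorted(peaks.items(), key=lambda item: item[1])

# Hypothetical samples collected at 1-5 minute intervals
profiles = {
    "web1": [2, 3, 2, 4, 3, 2, 5, 3, 2, 3],
    "db1": [20, 35, 40, 38, 25, 60, 45, 30, 28, 33],
}
for name, peak in rank_by_io(profiles):
    print(name, peak)
```

    Servers at the bottom of the ranking are the ones the slide suggests limiting or deferring.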

  • 8/6/2019 Esx Presentation

    9/64

    Virtualization Candidate Considerations and Creating Profiles (contd)

    Availability and service requirements

        Will dictate back-up requirements and Disk/SAN/NAS solutions to meet SLA

        VMs with different SLAs and availability requirements should not be placed in the same resource pool, and perhaps not in the same cluster?

    Do all the virtual candidates fit the same model in terms of storage requirements? Does the template work?

    What are the differences in I/O profiles?

  • 8/6/2019 Esx Presentation

    10/64

    What data to Measure in Windows - CPU

    CPU Usage

        Total CPU usage: percent busy and normalized usage in SPECints, not MHz (remember, not all MHz are created equal)

    How busy? The recommendation from VMware, to limit overhead/scheduling, is 1 virtual CPU per VM

        This means you want to eliminate candidates which are projected to consume more than 1 socket/core on the consolidated platform.

        How many VMs to deploy requiring multiple VCPUs?

    How busy?

        Depends upon your ROI, how many servers you need to consolidate, and whether CPU becomes a rate-limiting factor

        Plan based on what can fit in your initial environment and size accordingly
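
    One way to apply the normalization idea is to convert each candidate's peak busy percentage into benchmark units and compare it against a single core of the target host. The ratings below are made-up placeholders, not published SPECint results, and the function names are hypothetical.

```python
def normalized_demand(peak_busy_pct, source_rating):
    """CPU demand in benchmark units: the fraction of the source server
    that is busy at peak, times that server's whole-machine rating."""
    return peak_busy_pct / 100 * source_rating

def fits_one_core(peak_busy_pct, source_rating, target_rating_per_core):
    """True if the candidate's peak demand fits within one core of the
    consolidated platform (the 1 virtual CPU per VM guideline)."""
    return normalized_demand(peak_busy_pct, source_rating) <= target_rating_per_core

# Placeholder ratings for illustration only
print(fits_one_core(50, 40, 25))  # 20 units of demand against a 25-unit core
print(fits_one_core(90, 40, 25))  # 36 units of demand against a 25-unit core
```

    Candidates that fail this check are the ones that would need multiple VCPUs or should be dropped from the initial phase.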

  • 8/6/2019 Esx Presentation

    11/64

    What data to Measure in Windows - Memory

    Measuring memory consumption

        Committed Bytes

        Total Working Set Size

        Available Memory

        Page file utilization

    Projected memory requirement: 1 GB, 2 GB, 3 GB???

        Based on Committed Bytes and Total Working Set

        Data must be collected at a more granular level since memory is an instantaneous counter

        Need to account for memory requirements during normal processing and during back-ups

        Allocate and round up based on templates that you will create

    How big can you go? How much can fit?

        Depends upon ROI, e.g., consolidation ratio and how much memory you have

        You need to allocate what the system/server requires or else it will page or swap

        Therefore typically < 4 GB
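
    The "allocate and round up based on templates" step might look like the following. The template sizes are hypothetical, and taking the larger of peak Committed Bytes and peak Total Working Set as the requirement is an assumption made for this sketch.

```python
# Hypothetical VM template memory sizes, in GB
TEMPLATE_SIZES_GB = [1, 2, 3, 4]

def template_memory_gb(peak_committed_gb, peak_working_set_gb,
                       templates=TEMPLATE_SIZES_GB):
    """Round a server's peak memory requirement up to the smallest
    template that covers it. Returns None when no template is large
    enough, which flags the server as a poor candidate."""
    required = max(peak_committed_gb, peak_working_set_gb)
    for size in sorted(templates):
        if size >= required:
            return size
    return None

print(template_memory_gb(1.4, 1.1))  # rounds up to the 2 GB template
print(template_memory_gb(5.0, 3.0))  # None: exceeds the largest template
```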

  • 8/6/2019 Esx Presentation

    12/64

    What data to Measure in Windows and from SAN for I/O

    Two Sides of the Coin

    Side 1 - SAN

        What is the SAN capacity: how many IOPs can you achieve in total, e.g., total throughput

        Total throughput specifically for VMware

    Side 2 - Windows requirements

        Measure the IOPs and kbytes/second during peak timeframes.

        For the candidate list, can the aggregate I/O throughput be met?

        Do you need to eliminate high I/O loads?

        Example: If capacity is 200 MB per second but you have a total requirement for 300 MB per second, then you would eliminate the highest I/O candidates which caused you to exceed your constraint for this phase.
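
    The 200 MB/s vs. 300 MB/s example can be expressed as a simple greedy trim: drop the heaviest I/O candidate until the aggregate fits. The server names and per-server figures are invented for illustration, and greedy removal is just one reasonable reading of the slide.

```python
def trim_to_capacity(candidates, capacity_mb_s):
    """candidates maps a server name to its peak I/O in MB/s.
    Repeatedly removes the highest-I/O server until the total fits
    within capacity. Returns (kept, dropped)."""
    kept = dict(candidates)
    dropped = []
    while kept and sum(kept.values()) > capacity_mb_s:
        heaviest = max(kept, key=kept.get)
        dropped.append(heaviest)
        del kept[heaviest]
    return kept, dropped

# Hypothetical candidate list totalling 300 MB/s against 200 MB/s capacity
candidates = {"db1": 120, "file1": 80, "web1": 60, "app1": 40}
kept, dropped = trim_to_capacity(candidates, 200)
print(sorted(kept), dropped)
```

    The dropped servers are deferred to a later phase rather than rejected outright, which matches the "for this phase" wording above.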

  • 8/6/2019 Esx Presentation

    13/64

    I/O Pathing in ESX 3.0

    Multi-pathing Features

    Two modes: Fixed/Preferred and MRU

    MRU (Most Recently Used): Active/Passive devices

        Devices that maintain a single active path to the disk and fail over to the alternate path in the event of a component failure

    Fixed/Preferred: Active/Active devices

        Allows for manual balancing of LUNs between HBAs.

    NOTE: I/O Pathing is supposedly improved in ESX 3.5

  • 8/6/2019 Esx Presentation

    14/64

    MRU Design

    Note: No I/O balancing between HBAs - single-path I/O

  • 8/6/2019 Esx Presentation

    15/64

    Preferred Path

    [Figure: diagram of LUN0 through LUN3 with "Preferred Path" and "Failover Path" labels]

    Poor Man's I/O Load Balancing

    Manually load balance to improve performance

  • 8/6/2019 Esx Presentation

    16/64

    The VMkernel

    VMkernel

        A high-performance operating system that occupies the virtualization layer and manages most of the physical resources on the hardware, including memory, physical processors, storage, and networking controllers.

    Network and I/O go through I/O queues within the virtualization layer

    Support for 10GigE cards is now available with ESX Server 3.5.

  • 8/6/2019 Esx Presentation

    17/64

    VMware and the Virtual View

    Managing Resources

    Data centers

        Virtual Center can view all of the resources and manage the resources of data centers.

    Cluster

        A group of hosts: in order to use the HA and DRS features, the hosts must be defined as part of a cluster to provide for load balancing and fail-over between hosts.

    Host

        A container for virtual machines

    Resource Pool

        A collection of virtual machines on a host or within a cluster that possess the ability to have processor and memory resources controlled at an aggregated level.

  • 8/6/2019 Esx Presentation

    18/64

    What/Why Are Resource Pools?

    Resource pools can be used to hierarchically partition available CPU and memory resources.

        For each resource pool, you can specify reservation, limit, shares, and whether the reservation should be expandable.

        The resource pool resources are then available to child resource pools and virtual machines.

    Why?

        Provides a container view: isolate and protect different workloads in a box (critical vs. non-critical)

    VMware DRS helps you balance resources across virtual machines.

  • 8/6/2019 Esx Presentation

    19/64

    Resource Allocation per VM

    CPU and Memory (not I/O)

        Can guarantee resources with Reservations

        Can cap resource usage with Limits

        Prioritization - Fair Share Scheduling

    Specifying Shares: Apportionment Ratio

        High 4 : Medium 2 : Low 1

    Shares for CPU

        High: 2000, Normal: 1000, Low: 500

    Shares for Memory

        High: 20 shares per MB of VM memory

        Normal: 10 shares per MB of VM memory

        Low: 5 shares per MB of VM memory
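
    To make the 4:2:1 ratio concrete, here is a minimal sketch of proportional-share apportionment using the CPU share values above. It assumes full contention and ignores reservations and limits, so it shows only the ratio arithmetic, not the actual ESX scheduler. The VM names and the 7000 MHz capacity are made up.

```python
# CPU share values from the slide
CPU_SHARES = {"high": 2000, "normal": 1000, "low": 500}

def apportion(capacity, vm_levels, shares=CPU_SHARES):
    """Split capacity among VMs in proportion to their share values.
    vm_levels maps a VM name to 'high', 'normal', or 'low'.
    Assumes every VM is competing for the resource at once."""
    total = sum(shares[level] for level in vm_levels.values())
    return {vm: capacity * shares[level] / total
            for vm, level in vm_levels.items()}

# Three hypothetical VMs contending for 7000 MHz
print(apportion(7000, {"vm-a": "high", "vm-b": "normal", "vm-c": "low"}))
```

    With one VM at each level, the split comes out 4000 : 2000 : 1000 MHz, which is the 4:2:1 ratio.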

  • 8/6/2019 Esx Presentation

    20/64

    VMware ESX 3.5 Features

    Storage VMotion: simplifies array migration and upgrade tasks and reduces I/O bottlenecks by moving virtual machines to the best available storage resource in your environment.

    Provisioning across datacenters: VirtualCenter 2.5 allows you to provision virtual machines across datacenters.

        Administrators can now clone a virtual machine on one datacenter to another datacenter. Templates can now be cloned between datacenters. You can also perform a cold migration of a virtual machine across datacenters.

    SRM: Site Recovery Manager

    Support for 10GigE cards is now available

    http://pubs.vmware.com/vi3/wwhelp/wwhimpl/js/html/wwhelp.htm

  • 8/6/2019 Esx Presentation

    21/64

    VMware ESX 3.X Features

    DRS

        Allocates and balances computing capacity dynamically across collections of hardware resources for virtual machines

        Policy can be created to determine migration thresholds

        Automation level (Manual, Partially Automated, Fully Automated)

        Manual/Partially Automated will display recommendations of candidates to move

        Use Manual mode to get started

        Set custom levels (exclude specific VMs, set affinities)

    VMware HA

        Feature that provides easy-to-use, cost-effective high availability for applications running in virtual machines. In the event of server failure, affected virtual machines are automatically restarted on other production servers that have spare capacity.

    VMware VMotion

        Feature that enables the live migration of running virtual machines from one physical server to another with zero down time, continuous service availability, and complete transaction integrity

  • 8/6/2019 Esx Presentation

    22/64

    New With ESX 3.5

    Storage VMotion: simplifies array migration and upgrade tasks and reduces I/O bottlenecks by moving virtual machines to the best available storage resource in your environment.

        You can move a powered-on VMware guest virtual machine from one ESX host AND from one ESX datastore to another (say, from one SAN to another or from a local host to a SAN) with no downtime

    Provisioning across datacenters: VirtualCenter 2.5 allows you to provision virtual machines across datacenters.

        As a result, VMware Infrastructure administrators can now clone a virtual machine on one datacenter to another datacenter. You can also clone a virtual machine on one datacenter to a template on another datacenter. Templates can now be cloned between datacenters. You can also perform a cold migration of a virtual machine across datacenters.

    VMotion with local swap: ESX 3.5 and VC 2.5 now provide for swap files to be stored on local storage while still supporting VMotion.

        Users can configure a swap datastore policy at the host or cluster level, although the policy can be overridden by the virtual machine configuration.

        During a VMotion migration or a failover for virtual machines with swap files on local storage, if local storage on the destination is selected, the virtual machine swap file is recreated.

        The creation time for the virtual machine swap file depends on local disk I/O (or on whether too many concurrent virtual machines are starting due to an ESX Server host failover with VMware HA).

  • 8/6/2019 Esx Presentation

    23/64

    VMware ESX 3.5 Features (contd)

    VMware Site Recovery Manager

        To automate the D/R recovery process

        SRM guides users through the process of creating, automating, and testing disaster recovery plans for their virtual infrastructure.

        SRM will typically require 30% more storage, and maybe more, for snapshots for testing your recovery plans.

        It works in conjunction with your SAN mirroring technology, e.g., Clariion Mirror View.

    VMware Update Manager

        With Update Manager, you can automatically patch ESX hosts AND (get this) virtual guest machines!

        It will also ensure that you don't boot too many machines simultaneously, which was something you previously needed to manually schedule

  • 8/6/2019 Esx Presentation

    24/64

  • 8/6/2019 Esx Presentation

    25/64

    DRS Migration Levels

    Level                          Stars
    1 = Most Conservative          5 or more Stars
    2 = Moderately Conservative    4 or more Stars
    3 = Midpoint                   3 or more Stars
    4 = Moderately Aggressive      2 or more Stars
    5 = Most Aggressive            1 or more Stars

    Examples of migration recommendations when in Manual or Partially Automated mode:

        Balance Average CPU Loads

        Balance Average Memory Loads

        Satisfy Affinity Rule

        Satisfy Anti-Affinity Rule
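
    The level-to-stars table reduces to a simple threshold: a recommendation is applied when its star rating meets the minimum for the configured level. The sketch below encodes only that table; the recommendation tuples are invented, and this is not the DRS algorithm itself.

```python
def min_stars_for_level(level):
    """Minimum star rating applied at a DRS migration level
    (1 = most conservative, 5 = most aggressive), per the table."""
    if level not in range(1, 6):
        raise ValueError("level must be between 1 and 5")
    return 6 - level

def recommendations_to_apply(recommendations, level):
    """recommendations is a list of (description, stars) pairs.
    Keeps those rated at or above the threshold for the level."""
    threshold = min_stars_for_level(level)
    return [rec for rec in recommendations if rec[1] >= threshold]

# Hypothetical recommendations
recs = [("Satisfy Anti-Affinity Rule", 5),
        ("Balance Average CPU Loads", 3),
        ("Balance Average Memory Loads", 1)]
print(recommendations_to_apply(recs, 3))
```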

  • 8/6/2019 Esx Presentation

    26/64

    VMware Availability and HA

    Detects server failures automatically, using a heartbeat on servers.

    Monitors capacity continuously to ensure space is always available to restart virtual machines in the event of server failure.

    Restarts virtual machines on a different physical server within the same resource pool.

    If used in conjunction with VMware DRS, it will load balance and choose the host on which to load the VMs.

  • 8/6/2019 Esx Presentation

    27/64

  • 8/6/2019 Esx Presentation

    28/64

    HA Capacity Considerations

    Memory

        Memory requirements must be satisfied for all reservations + overhead for each VM to be powered on.

        HA memory requirements would need to be satisfied on the host with the smallest memory configuration.

    Memory overhead for each VM:
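
    A literal reading of the memory rule on this slide can be sketched as follows. The per-VM overhead figures are placeholders chosen for the example (the overhead table from the slide is not reproduced in this transcript), and the function names are hypothetical.

```python
def ha_memory_required_mb(vms):
    """vms is a list of (reservation_mb, overhead_mb) pairs.
    Returns the total memory needed to power all of them on."""
    return sum(reservation + overhead for reservation, overhead in vms)

def fits_smallest_host(vms, host_memory_mb):
    """True if the total requirement fits on the host with the
    smallest memory configuration in the cluster."""
    return ha_memory_required_mb(vms) <= min(host_memory_mb)

# Placeholder reservations and overheads, in MB
vms = [(1024, 100), (2048, 150), (512, 80)]
print(ha_memory_required_mb(vms))
print(fits_smallest_host(vms, [16384, 8192, 32768]))
```

    Sizing against the smallest host is conservative: if the check passes there, it passes on every other host in the cluster.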

  • 8/6/2019 Esx Presentation

    29/64

    DRS and HA Considerations when the Cluster is Constrained

    Re-start Priority

        Indicates relative priority for re-starting a virtual machine in case of host failure

        Highest priority VMs will start first

        If sufficient resources aren't available, then lower priority VMs won't start up

    Re-start priority is especially important if there isn't sufficient capacity

        Example: You have turned off admission control, which allows you to operate when HA rules are violated, but you have more VMs than there is capacity for

        Example: Failure capacity is set to 1 but 2 hosts have failed and there is not sufficient capacity.
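
    The effect of re-start priority in a constrained cluster can be illustrated with a toy planner that starts VMs in priority order until spare memory runs out. This is a simplified model for intuition only; real HA placement considers more than a single memory pool, and the VM names and sizes are invented.

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def plan_restarts(vms, spare_memory_mb):
    """vms is a list of (name, priority, memory_needed_mb).
    Walks VMs from highest to lowest priority, starting each one
    that still fits. Returns (started, not_started)."""
    started, not_started = [], []
    remaining = spare_memory_mb
    for name, priority, need_mb in sorted(
            vms, key=lambda vm: PRIORITY_ORDER[vm[1]]):
        if need_mb <= remaining:
            started.append(name)
            remaining -= need_mb
        else:
            not_started.append(name)
    return started, not_started

# Hypothetical VMs from a failed host, with 6144 MB spare elsewhere
vms = [("test1", "low", 2048), ("db1", "high", 4096), ("web1", "medium", 2048)]
print(plan_restarts(vms, 6144))
```

    Here the low-priority VM is the one left down, which is the outcome the slide warns about when capacity is short.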

  • 8/6/2019 Esx Presentation

    30/64

    CPU Metrics

    CPU Usage (%): This is the cumulative % used across all CPUs in the host.

    CPU Usage in MHz: This counter identifies the total CPU usage in MHz.

        This counter is useful for normalizing % CPU for capacity planning when there are hosts with differing CPU speeds or core counts in the same cluster.

    % Ready (in ESXTOP, per VM): This identifies the amount of time a VM is ready to run but cannot because the VMkernel is unable to schedule the VM process on a physical CPU. High % Ready typically means there is CPU contention on the ESX host.

    CPU Ready (in Virtual Center, per VM): This counter reflects the same information as % Ready in ESXTOP; Virtual Center reflects this measurement in milliseconds. The VI Client can view a virtual machine's CPU ready time in real time, but historical tracking requires the Virtual Center logging level to be set to 3.

    Calculating % Ready from CPU Ready in Virtual Center

        The CPU Ready value in Virtual Center is displayed in milliseconds and is refreshed every 20 seconds.

        CPU Ready time of 280 ms over the default refresh interval (20000 ms):

        % Ready = (280 / 20000) * 100 = 1.4%
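
    The conversion is easy to script when exporting CPU Ready values. A minimal sketch, assuming the 20-second real-time refresh interval stated above (the function name is hypothetical):

```python
def pct_ready(cpu_ready_ms, interval_ms=20000):
    """Convert a Virtual Center CPU Ready value (milliseconds summed
    over one refresh interval) into a % Ready figure."""
    return cpu_ready_ms * 100 / interval_ms

print(pct_ready(280))   # the 280 ms example from this slide
print(pct_ready(4942))  # the 4.942 sec example from the next slide
```

    If the data comes from a chart with a different sampling interval, pass that interval in milliseconds instead of the default.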

  • 8/6/2019 Esx Presentation

    31/64

    High % CPU Ready for VM

    % CPU Ready = 4.942 sec / 20 sec = 24.7% (roughly 25%)

  • 8/6/2019 Esx Presentation

    32/64

    Memory Management

    3 mechanisms are utilized for memory management to expand/contract the amount of memory used by VMs

    Transparent Page Sharing
        Redundant virtual machine memory pages are "shared"
        The VMkernel removes duplicate pages from physical RAM, and the page table is adjusted to redirect the virtual machine's virtual page back to the page in RAM. This eliminates redundant pages in physical memory

    Swapping
        Used to forcibly reclaim memory (ESX decides, not the guest)

    Ballooning
        vmmemctl module loaded into the guest operating system.
        Guest OS determines which pages to page/swap out
        Part of VMware Tools; guest OS must be configured with sufficient swap space.
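
The idea behind transparent page sharing can be shown with a toy model. The sketch below is not the VMkernel's implementation: pages are plain byte strings, and finding duplicates through a content hash followed by a full comparison is an assumption of this example rather than something stated on the slide. The name `share_pages` is hypothetical.

```python
import hashlib


def share_pages(pages):
    """Toy model: back identical guest pages with one physical copy.

    Returns (physical, page_table), where page_table[i] is the index of the
    physical page backing guest page i.
    """
    physical = []     # unique page contents kept in "RAM"
    by_digest = {}    # content digest -> index into physical
    page_table = []
    for content in pages:
        digest = hashlib.sha256(content).digest()
        slot = by_digest.get(digest)
        if slot is not None and physical[slot] == content:
            # Duplicate page: redirect to the existing physical copy
            page_table.append(slot)
        else:
            physical.append(content)
            slot = len(physical) - 1
            by_digest.setdefault(digest, slot)
            page_table.append(slot)
    return physical, page_table


if __name__ == "__main__":
    guest_pages = [b"\x00" * 4096, b"A" * 4096, b"\x00" * 4096]
    physical, table = share_pages(guest_pages)
    print(len(guest_pages), "guest pages backed by", len(physical), "physical pages")
    print("page table:", table)
```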


    Memory

    Memory Granted: the amount of memory that the vmkernel has allocated to all virtual machines running on the server.
        This represents a rough estimate of how much RAM will be needed for another host to spin up the VMs.

    Memory Usage %: identifies the memory used on the host as a percentage, based on Memory Consumed divided by the total memory in the ESX host.

    Memory Consumed: how much memory is actually being used by the VMs.
        This takes into account transparent page sharing, zero pages, vmkernel and service console memory usage, and virtualization overhead.

    Memory Balloon: this counter should be zero under normal circumstances. Otherwise, it indicates the system is under memory constraints and has begun borrowing memory from virtual machines to meet demands.

    Memory Swap Used: this counter should always be zero. Memory swapping is used as a last resort, so if swapping currently exists on the host it means memory is severely overcommitted.

    Memory Zero: the amount of memory that is not being used by the allocation.
        This metric can be helpful in identifying VMs that may have been configured for more RAM than necessary
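
A minimal sketch of how these counters might be checked once they have been exported from Virtual Center. The function names and the sample values are hypothetical; only the rules (usage = consumed / host total, balloon and swap expected to be zero) come from the slide.

```python
def memory_usage_percent(consumed_mb: float, host_total_mb: float) -> float:
    """Memory Usage % = Memory Consumed / total memory in the ESX host."""
    if host_total_mb <= 0:
        raise ValueError("host_total_mb must be positive")
    return consumed_mb / host_total_mb * 100.0


def memory_warnings(balloon_mb: float, swap_used_mb: float) -> list:
    """Flag the two counters that should be zero on a healthy host."""
    warnings = []
    if balloon_mb > 0:
        warnings.append("ballooning active: host is under memory constraints")
    if swap_used_mb > 0:
        warnings.append("swapping active: memory is severely overcommitted")
    return warnings


if __name__ == "__main__":
    # Hypothetical host: 32 GB installed, 16 GB consumed, 512 MB ballooned
    print(f"usage: {memory_usage_percent(16384, 32768):.1f}%")
    for line in memory_warnings(balloon_mb=512, swap_used_mb=0):
        print("WARNING:", line)
```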


    Memory Usage Displayed from VC

    [Chart: memory counters in Virtual Center] Granted = 24 GB, Active = 3.5 GB, Shared = 1 GB, Ovhd = 2 GB, Zero = 830 MB, Swap = 0


    Disk Usage Metrics

    Disk Usage (KBps): this number is cumulative across all HBAs in the host.

    VMFS Volume Free Space: this is monitored by querying either an ESX host that has access to the VMFS, or by querying the Virtual Center database. The former will provide the most up-to-date information.

    SAN Performance Data
        Response time per LUN (read time, write time), LUN queue length, read vs. write activity, IOPS per LUN


    VM CPU Usage (max=100%)

    CPU Usage near 100%


    CPU Usage on 5-6

    CPU Usage near 100%


    %CPU Ready DEV 35 SIT

    Compute %CPU Ready = CPU Ready time (ms) / 20,000 * 100; should be 12%


    Windows Performance CPU

    Queue length: Avg=12, Max=47


    Determining Processor Resource Requirements

    MHz or SPECints vs. CPU busy:
        VMware reports MHz used, a normalized metric that is used to determine capacity requirements.
        SPECints would be a better mechanism

    E.g., total capacity is 2.6 GHz * number of CPUs
        Each VM is consuming approximately 200 MHz
        How many VMs can you support on 2 dual-core 2.6 GHz processors?
        Need to understand overhead

    As you add VMs, you also add overhead.
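
The sizing question on this slide can be worked as simple arithmetic. The sketch below assumes that "number of CPUs" means total cores, and it introduces a hypothetical `reserve_fraction` parameter to stand in for virtualization overhead and headroom; the 25% value in the second call is purely illustrative, not a figure from the presentation.

```python
def max_vms(sockets: int, cores_per_socket: int, core_mhz: float,
            mhz_per_vm: float, reserve_fraction: float = 0.0) -> int:
    """Estimate how many VMs fit on a host from its aggregate MHz."""
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    total_mhz = sockets * cores_per_socket * core_mhz
    usable_mhz = total_mhz * (1.0 - reserve_fraction)
    return int(usable_mhz // mhz_per_vm)


if __name__ == "__main__":
    # 2 dual-core 2.6 GHz processors, 200 MHz per VM, no overhead allowance
    print(max_vms(2, 2, 2600, 200))
    # Same host, holding back a hypothetical 25% for overhead and headroom
    print(max_vms(2, 2, 2600, 200, reserve_fraction=0.25))
```

The raw figure (10,400 MHz / 200 MHz = 52 VMs) is an upper bound only; the overhead allowance is what has to be measured.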


    CPU Usage as VMs are Added

    6 VMs were added on 10-14
    8 VMs were added on 10-21
    CPU usage increased from 8% to 20%
    Minimal overhead as we increased from 8 to 14 VMs per host
        CPU MHz per VM at 6 VMs = 143.8 MHz
        CPU MHz per VM at 14 VMs = 153.8 MHz

    What is the CPU requirement per VM at 20 VMs or 30 VMs?
        Non-linear increase in overhead. Need to benchmark it.
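
To put the two measurements above side by side, the short sketch below computes the total MHz at each VM count and the relative growth in per-VM cost. It deliberately does not extrapolate to 20 or 30 VMs, since the slide notes that the increase is non-linear and has to be benchmarked. The function name is hypothetical.

```python
def per_vm_growth_percent(mhz_before: float, mhz_after: float) -> float:
    """Relative increase in per-VM CPU cost between two measurements."""
    if mhz_before <= 0:
        raise ValueError("mhz_before must be positive")
    return (mhz_after - mhz_before) / mhz_before * 100.0


if __name__ == "__main__":
    samples = {6: 143.8, 14: 153.8}   # VM count -> measured MHz per VM
    for count, mhz in samples.items():
        print(f"{count} VMs: {count * mhz:.1f} MHz total")
    growth = per_vm_growth_percent(samples[6], samples[14])
    print(f"per-VM cost grew by {growth:.2f}%")
```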


    Benchmarking and Loading your Environment

    HOW?

    Slowly load up one host with a mix of workloads (Large, Medium, Small) and measure the resource consumption and overhead as you load.

    Look for a measure of user experience: end-to-end response time (ETE R/T)
        SQL: SQL R/T; Web servers: page load times; other ETE

    Measure CPU usage and overhead as you load

    Measure the memory consumption

    Measure I/O throughput and I/O response time as the load increases. (Use data from SAN tools)

    Remember: you don't have true MPIO


    Processor Resource Usage by VM, Past Week


    Processor Usage MHz vs % Busy


    CPU Capacity

    Monitor physical CPU usage and %Ready for all VMs

    If %Ready is >5% and starts to approach 10% delay, the host is saturated. It doesn't have to be running at 100% physical utilization.

    Monitor usage of VMkernel

    Monitor System Services (only dispatchable on CPU 0)
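
The %Ready rule of thumb above could be applied across a set of VMs roughly as follows. The 5% and 10% figures come from the slide; splitting them into a "warning" band and a "saturated" band is an assumption of this sketch, and the VM names and values are invented for illustration.

```python
def classify_ready(ready_pct: float) -> str:
    """Map a % Ready value onto the slide's 5% / 10% rule of thumb."""
    if ready_pct < 0:
        raise ValueError("ready_pct must be non-negative")
    if ready_pct >= 10.0:
        return "saturated"
    if ready_pct > 5.0:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical per-VM % Ready samples
    samples = {"vm-a": 1.4, "vm-b": 7.2, "vm-c": 24.7}
    for name, pct in samples.items():
        print(f"{name}: {pct:.1f}% ready -> {classify_ready(pct)}")
```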


    VMware Distributed Resource Scheduler (DRS) and CPU Saturation

    Verify that all systems in the DRS cluster are carrying load when the server you are interested in is overloaded.

    Change the aggressiveness of the DRS algorithm.

    Review VM reservations for CPU and memory against other hosts in the cluster to ensure that virtual machines can migrate (e.g., sufficient capacity available after reservations are satisfied).

    Increase the number of hosts (servers) in the DRS cluster so virtual machines from the server you are evaluating can migrate to servers with available resources.
        Or, reduce the number of VMs in the cluster and move them to a different cluster.


    Improving Capacity for DRS Cluster (Options)

    Increase host capacity (add more physical processors and/or upgrade to have additional cores and overall capacity)

    Tune the VMs (reduce I/O or network requirements, change back-up strategy, reduce 3rd party software tools overhead)

    Use the latest copy of VMware Tools

    Reduce the virtual CPU count for VMs that cannot take full advantage of multiple cores.
        Most efficient to restrict the number of logical processors to a VM to 1.


    If a VM is CPU or Memory Constrained

    Balance resources within the cluster and move VMs, or use DRS more aggressively

    Add capacity or add more logical CPUs for the VM

    Adjust the VM's priority by increasing its shares and increasing its CPU or memory reservation.

    Use faster processors

    Review CPU usage and CPU constraints from within the VM's perspective, e.g., Windows metrics
        CPU utilization, CPU queue length

    Review % CPU ready


    Is system memory constrained?

    Use ESXtop or Virtual Center and measure:
        Swapping: should be zero
        Ballooning: should be zero and not invoked (only used to recover memory)
        Ovhd for each VM: this value will change and is needed for the VMkernel, depending upon the number of VMs on the host and the number of aggregate virtual CPUs

    Provide memory reservations for critical applications and perhaps for database systems.

    Reduce the memory footprint for VMs not needing or using the total granted memory. This will free memory for other VMs.

    Review the templates for memory assignment and consider deploying additional templates.

    Review VM memory metrics from within the VM's memory statistics, e.g., Windows metrics.
        Paging, Available Memory, Committed Bytes, Total Working Set


    What Data to Look at if You Have a Problem? Memory and I/O

    With Windows guests: monitor the number of machines you are able to re-boot simultaneously

    Look to load balance datastores and I/O load across the hosts within a cluster

    Monitor I/O usage over time by VM and monitor memory usage
        Windows applications relying on file cache or SQL Server cache will perform more I/O if they are memory constrained.
        Ensure that you have provided sufficient memory from your server candidate analysis.


    Best Practices - Performance: CPU

    Account for the CPU overhead and plan accordingly by adjusting the # of VMs on an ESX host.

    Monitor overhead across the ESX hosts. Ensure that VMs are installed with the correct HAL version: uni vs. smp

    Consider setting reservations for critical systems.

    The guest operating system timer rate can have an impact on performance.
        Linux guests keep time by counting timer interrupts. The overhead of delivering so many virtual clock interrupts can negatively impact guest performance and increase host CPU consumption.

    Time synchronization: install VMware Tools in the guest OS.


    Best Practices Summary

    Start Small but Think BIG!

    Build your core team and train them well!
        Create a core team from multiple disciplines and architecture groups to establish the design based on availability and service level considerations.
        Understand Change, Problem and Incident Management in the new virtualized world.

    Start in test

    Develop a pilot project for production
        Measure the impact to the end user: end user response time

    Make sure that you did a good job in identifying candidates for virtualization

    Make sure that you measured peak requirements for the workloads
        CPU, I/O load to the SAN, and memory requirements.
        Measure resource requirements after patching and rebooting VMs


    Best Practices Summary (cont'd)

    Determine what Service Level you need to meet.
        I/O performance and I/O availability

    A SAN is required in order to realize the benefits of moving a VM to another host.

    Data centers, Hosts, Clusters, Resource Pools and Virtual Machines
        The size of the clusters that will work together for VMotion

    Identify and plan for High Availability (HA) and D/R requirements
        Identify application/system requirements and any single points of failure

    Account for additional capacity requirements to satisfy HA
        DRS is not a substitute for capacity planning.

    DRS automation level: begin with a conservative setting


    Capacity Planning Summary: DRS and HA

    These new paradigms do not replace capacity planning and measurements

    Good capacity planning practices are required and more critical
        Remember: true resource usage is only provided by ESX 3.5

    Monitor resource usage over time and perform workload balancing based on business information and resource usage

    Understand overhead as VMs are added to host and cluster

    Ensure sufficient capacity for fail-over for HA

    Manage resources at the cluster level.
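
A rough way to test the "sufficient capacity for fail-over" point is an N+1 style check at the cluster level. The sketch below is a simplified illustration only: it compares aggregate MHz demand with the capacity left after the largest hosts fail, and it does not model how VMware HA admission control actually works. The function name, host sizes and demand figures are hypothetical.

```python
def survives_host_failure(host_capacity_mhz, total_demand_mhz, failures=1):
    """Return True if the remaining hosts can carry the cluster demand.

    Worst case is assumed: the hosts with the most capacity are the ones lost.
    """
    if failures < 0:
        raise ValueError("failures must be non-negative")
    keep = max(len(host_capacity_mhz) - failures, 0)
    remaining = sorted(host_capacity_mhz)[:keep]   # drop the largest hosts
    return sum(remaining) >= total_demand_mhz


if __name__ == "__main__":
    hosts = [10400, 10400, 10400, 10400]   # four identical hosts, in MHz
    print(survives_host_failure(hosts, total_demand_mhz=28000))
    print(survives_host_failure(hosts, total_demand_mhz=32000))
```

The same check can be repeated for memory, using granted or consumed memory as the demand figure.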


    References and Tools

    www.vmware.com/support/pubs/vi_pubs.html
    www.teamquest.com (used by Merrill Lynch)
    www.perfman.com (used by BCBS-NJ)
    www.vkernel.com (VMware only)
    www.cirba.com
    www.uptime.com
    www.sysload.com
    www.bmc.com (used at UPS)
    www.metron.co.uk (used by Tokyo)