Leveraging VMware Storage I/O Control in CloudStack


Page 1: Leveraging VMware Storage I/O Control in CloudStack

Leveraging VMware SIOC in CloudStack

Page 2

Mike Tutkowski (on Twitter as @mtutkowski)

- Full-time CloudStack software engineer, CloudStack PMC member

- Focused on CloudStack's storage component

NetApp SolidFire (http://www.solidfire.com/)

- Based out of Boulder, CO, USA

- Develops a scale-out SAN technology (using industry-standard hardware)

- Built from the ground up to support guaranteed Quality of Service (QoS) on a per-volume (logical unit) basis (min, max, and burst IOPS per volume)

- All-SSD architecture (no spinning disks)

- Leverages compression, de-duplication, and thin provisioning (all inline) on a 4-KB block boundary across the entire cluster to drive down cost/GB to be on par with traditional disk-based SANs

- REST-like API to enable automation of all aspects of the SAN

Page 3

CloudStack from the top down

Page 4

Primary Storage

• Objective: Storage for VM root and data disks
• Use cases: Production applications, traditional IT systems, database-driven apps, messaging/collaboration, dev/test systems
• Workloads: High-change content; smaller, random R/W; higher/“bursty” IO

Secondary Storage

• Objective: Data to be stored for future retrieval
• Use cases: VM templates, ISO images, backups of volumes
• Workloads: Typically more static content; larger, sequential IO (more reads than writes); lower IOPS

Storage Use Cases & Workloads

Page 5

• Managed Primary Storage

● Can exist at the Zone or Cluster Level

● Supports a 1:1 mapping between a virtual disk and a backend volume

● A virtual disk can be assigned QoS that's directly supported by the backend volume (e.g., if the backend volume gets 500 4-KB IOPS, then so does the virtual disk that makes use of that backend volume)

● Allows for fast and space-efficient snapshots that reside on primary storage

What is Ideal Primary Storage?

Page 6

• Some hypervisors have important limitations to take into consideration.

● vSphere/ESXi: Only supports 256 – 512 datastores per cluster (depending on version).

● XenServer: Only supports around 200 – 600 storage repositories per cluster (depending on version).

● KVM: No pertinent limit here. You can use managed storage for all virtual disks.

Why not use Managed Storage all the time?

Page 7

• VMware has a relatively new feature in production called VVols.

• VVols is essentially VMware's version of what CloudStack calls “managed storage.”

• In VVols, each virtual disk can be backed by its own volume on a storage system.

• CloudStack does not yet support VVols.

• In the meantime, we can enhance CloudStack's QoS capabilities when using vSphere by leveraging VMware Storage I/O Control.

Focusing on vSphere

Page 8

• Hypervisor-based QoS: VMware's technique for rate-limiting virtual disk IO and/or balancing the available IO of a datastore across the virtual disks in use within it.

• Two of the primary SIOC control knobs:

● Limit IOPS

● Disk Resource Shares

• Limit IOPS = Simply rate limiting. This allows you to limit the amount of IO that a virtual disk can perform per second.

• Disk Resource Shares = When the datastore is not able to draw a sufficient amount of performance from the backend volume that supports it, SIOC uses this variable to determine how much attention to give one virtual disk relative to the others.

What is VMware Storage I/O Control (SIOC)?

Page 9

Example 1:

• SolidFire volume with 3,000 32-KB IOPS

• VMware SIOC-enabled datastore using the SolidFire volume

● Virtual Disk 1: Limit IOPS = 1,000 (32-KB IOPS)

● Virtual Disk 2: Limit IOPS = 2,000 (32-KB IOPS)

• 1,000 + 2,000 = 3,000 (no need to refer to disk resource shares because the datastore always has enough performance for both virtual disks simultaneously)

Example 2:

• SolidFire volume with 3,000 32-KB IOPS

• VMware SIOC-enabled datastore using the SolidFire volume

● Virtual Disk 1: Limit IOPS = 1,000 (32-KB IOPS)

● Virtual Disk 2: Limit IOPS = 2,500 (32-KB IOPS)

• 1,000 + 2,500 = 3,500 (in cases where the datastore does not have enough performance to support the IO needs of both virtual disks simultaneously, the disk resource shares of the virtual disks are consulted)

Simple Examples of SIOC in Action
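The arithmetic in both examples comes down to one comparison. We add up the Limit IOPS of the disks on the datastore and check the total against what the backend volume can deliver. Below is a minimal Python sketch that assumes every figure is already in 32-KB IOPS. The function name `shares_may_be_consulted` is hypothetical. Real SIOC only falls back on shares once it detects latency above its congestion threshold, so this is a worst-case check, not a model of SIOC itself.

```python
from typing import Optional


def shares_may_be_consulted(volume_iops: int,
                            limit_iops: list[Optional[int]]) -> bool:
    """Return True if the virtual disks, all running flat out, could ask for
    more IOPS than the backend volume provides.

    None stands for Limit IOPS = Unlimited, which can always outrun the volume.
    Only when this returns True can the datastore run short of performance,
    which is when disk resource shares would come into play.
    """
    if any(limit is None for limit in limit_iops):
        return True
    return sum(limit for limit in limit_iops if limit is not None) > volume_iops


# The two examples above (all figures in 32-KB IOPS).
print(shares_may_be_consulted(3_000, [1_000, 2_000]))  # False: 3,000 <= 3,000
print(shares_may_be_consulted(3_000, [1_000, 2_500]))  # True: 3,500 > 3,000
```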

Page 10

• mClock, by default, makes use of a 32-KB boundary for IO: each IO counts against Limit IOPS as Ceil(IO size / 32 KB) IOs.

● Examples of calculating max throughput for a virtual disk with Limit IOPS = 1,000:

● 8-KB IO: 1,000 / Ceil(8 / 32) = 1,000 IOPS; 1,000 IOPS * 8 KB = 8,000 KB/s

● 16-KB IO: 1,000 / Ceil(16 / 32) = 1,000 IOPS; 1,000 IOPS * 16 KB = 16,000 KB/s

● 32-KB IO: 1,000 / Ceil(32 / 32) = 1,000 IOPS; 1,000 IOPS * 32 KB = 32,000 KB/s

● 64-KB IO: 1,000 / Ceil(64 / 32) = 500 IOPS; 500 IOPS * 64 KB = 32,000 KB/s

mClock Disk Scheduler (ESXi 5.5 - Present)
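The per-IO-size calculation above fits in a small Python sketch. It assumes, as the slide does, that mClock charges each IO Ceil(IO size / 32 KB) units against Limit IOPS. The function names are hypothetical.

```python
import math

MCLOCK_BOUNDARY_KB = 32  # default IO accounting boundary described on the slide


def effective_iops(limit_iops: int, io_size_kb: int,
                   boundary_kb: int = MCLOCK_BOUNDARY_KB) -> float:
    """IOs per second a disk can actually complete under its Limit IOPS.

    Each IO is charged Ceil(io_size / boundary) units against the limit, so
    IOs at or below 32 KB cost one unit and a 64-KB IO costs two.
    """
    return limit_iops / math.ceil(io_size_kb / boundary_kb)


def max_throughput_kb_per_s(limit_iops: int, io_size_kb: int,
                            boundary_kb: int = MCLOCK_BOUNDARY_KB) -> float:
    """Maximum throughput in KB/s = effective IOPS * IO size."""
    return effective_iops(limit_iops, io_size_kb, boundary_kb) * io_size_kb


for size_kb in (8, 16, 32, 64):
    print(f"{size_kb} KB IO: {effective_iops(1_000, size_kb):,.0f} IOPS, "
          f"{max_throughput_kb_per_s(1_000, size_kb):,.0f} KB/s")
```

Running the loop reproduces the four rows above. For example, the last line printed is "64 KB IO: 500 IOPS, 32,000 KB/s". Past 32 KB, larger IOs no longer raise throughput under a fixed Limit IOPS.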

Page 11

● SIOC is relevant for active virtual disks only, as inactive virtual disks do not require performance resources.

● Each VMDK file attached to a VM (whether the VM is running or not) has the Limit IOPS and Disk Resource Shares fields.

● CloudStack does not set those fields. As such, those fields have default values: Limit IOPS = Unlimited; Disk Resource Shares = 1,000.

● In the current implementation, the CloudStack API Plug-in for SIOC can update these two fields for any virtual disk that belongs to a VM that CloudStack has in its database.

SIOC Notes

Page 12

● CloudStack is not aware of all vSphere VMs.

● Temporary “worker” VMs are used for the following background tasks:

● Copying a template from secondary storage to primary storage

● Copying a VM snapshot from primary storage to secondary storage

● Those worker activities require datastore performance, and there is no way to limit how many of these VMs are running concurrently (not with the simple introduction of a new API plug-in).

Current Issue with SIOC in CloudStack

Page 13

Let's say we have a goal to provide each virtual disk that's on a given datastore with 10 4-KB IOPS per GB.

● Create a SolidFire volume with 15,000 4-KB IOPS

● Determine size of volume: 15,000 IOPS / (10 IOPS per GB) = 1,500 GB

● 75% of datastore for foreground disks: 1,125 GB and 11,250 4-KB IOPS

● This leaves the following amount of performance for background disks: 15,000 IOPS - 11,250 IOPS = 3,750 IOPS

● CloudStack can notify you when a primary storage (in this case, a datastore) reaches a certain percentage full.

● The SolidFire API enables you to query for volume stats such as actual IOPS and average IO size.

Creating a Backend Volume for an SIOC Datastore
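The sizing steps above can be written as one function. This is a minimal Python sketch. The name `plan_sioc_volume` and the `SiocVolumePlan` tuple are hypothetical, and the 75% foreground fraction is the example's choice, not a fixed rule.

```python
from typing import NamedTuple


class SiocVolumePlan(NamedTuple):
    size_gb: float
    foreground_gb: float
    foreground_iops: float
    background_iops: float


def plan_sioc_volume(volume_iops: int, iops_per_gb: int,
                     foreground_fraction: float) -> SiocVolumePlan:
    """Size a backend volume so each GB of the datastore maps to
    `iops_per_gb` IOPS, then reserve part of the capacity (and therefore
    part of the IOPS) for foreground virtual disks."""
    size_gb = volume_iops / iops_per_gb
    foreground_gb = size_gb * foreground_fraction
    foreground_iops = foreground_gb * iops_per_gb
    # Whatever the foreground disks are not entitled to is left for
    # background (worker VM) activity.
    background_iops = volume_iops - foreground_iops
    return SiocVolumePlan(size_gb, foreground_gb, foreground_iops, background_iops)


print(plan_sioc_volume(15_000, 10, 0.75))
# SiocVolumePlan(size_gb=1500.0, foreground_gb=1125.0, foreground_iops=11250.0, background_iops=3750.0)
```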

Page 14

If there is enough IO to the volume that SIOC detects latency above a (configurable) threshold, the disk resource shares of the virtual disks are utilized.

● Background virtual disks have this value set to 1,000.

● Foreground virtual disks can have this value set anywhere from 2,000 to 4,000 (based on their size).

● Since foreground virtual disks always have their disk resource shares set at least twice as high as that of background virtual disks, each foreground disk gets at least twice as many IOs as a background disk during this contention state.

SIOC: Falling back on Disk Resource Shares
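Here is a rough illustration of "at least twice as many IOs". The Python sketch below splits a datastore's available IOPS in proportion to disk resource shares, using a weighted max-min fair split. This is a common textbook model of proportional-share scheduling, not VMware's actual mClock/SIOC algorithm, which also accounts for latency, reservations, and Limit IOPS. The function name and all IOPS figures are illustrative assumptions.

```python
def split_by_shares(available_iops: float, shares: dict[str, int],
                    demand_iops: dict[str, float]) -> dict[str, float]:
    """Weighted max-min fair split of `available_iops` across virtual disks.

    Disks that want less than their share-proportional slice get exactly
    what they want. The leftover is re-divided among the rest, again in
    proportion to their disk resource shares.
    """
    alloc = {disk: 0.0 for disk in shares}
    active = {disk for disk in shares if demand_iops.get(disk, 0) > 0}
    remaining = float(available_iops)
    while active:
        total_shares = sum(shares[d] for d in active)
        fair = {d: remaining * shares[d] / total_shares for d in active}
        satisfied = {d for d in active if demand_iops[d] <= fair[d]}
        if not satisfied:
            # Every remaining disk is contended: hand out proportional slices.
            for d in active:
                alloc[d] = fair[d]
            break
        for d in satisfied:
            alloc[d] = float(demand_iops[d])
            remaining -= demand_iops[d]
        active -= satisfied
    return alloc


# One foreground disk (2,000 shares) and two background disks (1,000 each)
# competing for a datastore that can deliver 3,000 IOPS.
print(split_by_shares(
    3_000,
    {"foreground": 2_000, "worker-1": 1_000, "worker-2": 1_000},
    {"foreground": 5_000, "worker-1": 5_000, "worker-2": 100},
))
```

In this run, the worker that only asks for 100 IOPS gets exactly 100. The remaining 2,900 IOPS are split 2:1 between the foreground disk and the busy worker (about 1,933 and 967 IOPS).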

Page 15

● CloudStack does not support Datastore Clusters.

● If you'd like to create a datastore with more IOPS than is possible with one backend volume, you can create a datastore with multiple extents (each extent is a backend volume).

Creating Large Datastores
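To plan a multi-extent datastore, we can work out how many backend volumes are needed and how to spread the IOPS target across them. Below is a minimal Python sketch that assumes the datastore's aggregate IOPS is the sum of its extents' IOPS, as the slide implies. In practice, how evenly IO lands on each extent depends on where VMFS places the data. The function name and the 15,000-IOPS per-volume cap in the usage line are hypothetical.

```python
import math


def plan_extents(target_iops: int, max_iops_per_volume: int) -> list[int]:
    """Split a datastore's IOPS target across the fewest backend volumes
    (extents), as evenly as possible, without exceeding the per-volume cap."""
    if target_iops <= 0 or max_iops_per_volume <= 0:
        raise ValueError("IOPS values must be positive")
    count = math.ceil(target_iops / max_iops_per_volume)
    base, extra = divmod(target_iops, count)
    # The first `extra` extents each absorb one IOPS of the remainder.
    return [base + 1 if i < extra else base for i in range(count)]


print(plan_extents(40_000, 15_000))  # [13334, 13333, 13333]
```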

Page 16

cloudmonkey updateSiocInfo zoneid=1 storagetag=SIOC-10 sharespergb=10 limitiopspergb=10

For all VMFS-based datastores with the storage tag SIOC-10
    For each volume in this storage pool (datastore)
        If the volume is attached to a VM
            Store this VM name in a list

For each VM name in the list
    For each of its virtual disks
        If the virtual disk is on the datastore (storage pool) we are looking at
            Update the Limit IOPS and Disk Resource Shares of the virtual disk

Note: If the VM a virtual disk is attached to is running, the virtual disk must be SCSI-based to change its Limit IOPS or Disk Resource Shares.

Global Setting: vmware.root.disk.controller = SCSI

Invoking CloudStack API Plug-in for SIOC
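The two loops above translate fairly directly into code. Below is a minimal Python sketch over an in-memory model. The `Datastore`, `Volume`, `VM`, and `VirtualDisk` classes and the `update_sioc_info` function are hypothetical stand-ins, not the plug-in's code: the real plug-in reads CloudStack's database and talks to vCenter. The per-GB formulas are borrowed from the footnotes of the next slide. The SCSI check mirrors the note above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VirtualDisk:
    datastore: str
    size_gb: int
    controller: str                   # "scsi" or "ide"
    limit_iops: Optional[int] = None  # None = Unlimited (vSphere default)
    shares: int = 1_000               # vSphere default


@dataclass
class VM:
    name: str
    running: bool
    disks: list[VirtualDisk] = field(default_factory=list)


@dataclass
class Datastore:
    name: str
    is_vmfs: bool
    tags: set[str] = field(default_factory=set)


@dataclass
class Volume:
    pool: str               # datastore (storage pool) the volume lives on
    vm_name: Optional[str]  # None if the volume is not attached to a VM


def update_sioc_info(datastores: list[Datastore], volumes: list[Volume],
                     vms: list[VM], storage_tag: str, shares_per_gb: int,
                     limit_iops_per_gb: int) -> list[tuple[str, str]]:
    """Apply per-GB SIOC settings; return (vm, datastore) pairs skipped."""
    vms_by_name = {vm.name: vm for vm in vms}
    skipped: list[tuple[str, str]] = []
    for ds in datastores:
        if not ds.is_vmfs or storage_tag not in ds.tags:
            continue
        # VMs with a volume attached in this storage pool
        # (dict.fromkeys de-duplicates while keeping order).
        vm_names = dict.fromkeys(v.vm_name for v in volumes
                                 if v.pool == ds.name and v.vm_name)
        for name in vm_names:
            vm = vms_by_name.get(name)
            if vm is None:
                continue
            for disk in vm.disks:
                if disk.datastore != ds.name:
                    continue
                # A running VM's disk can only be changed if it is SCSI-based.
                if vm.running and disk.controller != "scsi":
                    skipped.append((vm.name, ds.name))
                    continue
                disk.limit_iops = limit_iops_per_gb * disk.size_gb
                disk.shares = min(shares_per_gb * disk.size_gb + 2_000, 4_000)
    return skipped
```

With `storage_tag="SIOC-10"`, `shares_per_gb=10`, and `limit_iops_per_gb=10`, a 100-GB SCSI disk on a tagged datastore ends up with Limit IOPS = 1,000 and 3,000 shares. An IDE disk on a running VM keeps its defaults and is reported as skipped.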

Page 17

cloudmonkey updateSiocInfo zoneid=1 storageid=12 sharespergb=10 limitiopspergb=10 iopsnotifythreshold=15000

// Use similar logic to the previous slide, but also keep track of the sum of the
// Limit IOPS of each virtual disk.
// Then, go on to update SIOC values and count Limit IOPS only for VMs whose
// name doesn't start with “VM-” (VMs whose names start with “VM-” are worker VMs).

for each VM in zoneid
    if VM's name doesn't start with “VM-”
        for each virtual disk
            if virtual disk is on storageid
                set limit_IOPS* and disk_resource_shares**
                add limit_IOPS to total_limit_IOPS

if total_limit_IOPS < iopsnotifythreshold
    send text in response indicating OK
else
    send text in response indicating alert state

* limit_IOPS = 10 IOPS per GB * size of virtual disk (GB)
** disk_resource_shares = Min(10 shares per GB * size of virtual disk (GB) + 2,000, 4,000)

Invoking CloudStack API Plug-in for SIOC
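Here is the same logic as a runnable Python sketch, including the notification threshold. The `VM` and `VirtualDisk` classes, the `update_and_check` name, and the "OK"/"ALERT" strings are hypothetical. The `vms` list is assumed to be already filtered to the zone. Worker VMs are recognized by the “VM-” name prefix, as in the pseudocode, and left at vSphere defaults so that they behave as background disks.

```python
from dataclasses import dataclass, field
from typing import Optional

WORKER_VM_PREFIX = "VM-"  # per the slide, worker VM names start with this


@dataclass
class VirtualDisk:
    storage_id: int
    size_gb: int
    limit_iops: Optional[int] = None  # None = Unlimited (vSphere default)
    shares: int = 1_000               # vSphere default


@dataclass
class VM:
    name: str
    disks: list[VirtualDisk] = field(default_factory=list)


def update_and_check(vms: list[VM], storage_id: int, shares_per_gb: int,
                     limit_iops_per_gb: int,
                     iops_notify_threshold: int) -> tuple[int, str]:
    """Set Limit IOPS and shares on non-worker disks on `storage_id`, then
    compare the summed Limit IOPS with the notification threshold."""
    total_limit_iops = 0
    for vm in vms:
        if vm.name.startswith(WORKER_VM_PREFIX):
            continue  # worker VM: leave at defaults (background disk)
        for disk in vm.disks:
            if disk.storage_id != storage_id:
                continue
            disk.limit_iops = limit_iops_per_gb * disk.size_gb
            disk.shares = min(shares_per_gb * disk.size_gb + 2_000, 4_000)
            total_limit_iops += disk.limit_iops
    status = "OK" if total_limit_iops < iops_notify_threshold else "ALERT"
    return total_limit_iops, status
```

Because the threshold test is a strict "<", a storage pool whose Limit IOPS add up to exactly 15,000 already reports the alert state.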

Page 18

Limit IOPS is consolidated per virtual machine per datastore.

Examples for a single virtual machine with four virtual disks (no other VMs in this environment):

Example 1: All virtual disks located in one datastore. Each virtual disk has Limit IOPS = 100.

As each disk is limited to 100 IOPS, the total IOPS for the datastore is 400. If disks 1, 2, and 3 issue 10 IOPS each, disk 4 could issue 370 IOPS without being restricted.

Example 2: Disks 1 and 2 in datastore A; disks 3 and 4 in datastore B. All Limit IOPS are set to 100.

The IOPS are consolidated to 200 for datastore A and 200 for datastore B. If disks 1 and 3 issue 10 IOPS each, disks 2 and 4 could issue 190 IOPS each without being restricted.

Example 3: All virtual disks located in one datastore. One disk is set to Limit IOPS = Unlimited; all other disks are set to Limit IOPS = 100.

As one of the disks in the datastore is set to Unlimited, the IOPS for the datastore are also Unlimited.

Curious SIOC Details
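The consolidation rule behind these three examples is easy to model. For one VM, the Limit IOPS values are pooled per datastore, and a single Unlimited disk makes that datastore's pool Unlimited. The Python sketch below assumes exactly the behavior the slide describes; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# One VM's virtual disks as (datastore, Limit IOPS); None means Unlimited.
Disk = tuple[str, Optional[int]]


def pooled_limits(disks: list[Disk]) -> dict[str, Optional[int]]:
    """Consolidate per-disk Limit IOPS into one pool per datastore."""
    pools: dict[str, Optional[int]] = {}
    for datastore, limit in disks:
        if datastore in pools and pools[datastore] is None:
            continue  # this datastore's pool is already Unlimited
        if limit is None:
            pools[datastore] = None  # one Unlimited disk lifts the whole pool
        else:
            pools[datastore] = (pools.get(datastore) or 0) + limit
    return pools


def restricted_datastores(disks: list[Disk], demand_iops: list[int]) -> set[str]:
    """Datastores where the VM's combined demand exceeds its pooled limit."""
    pools = pooled_limits(disks)
    demand_per_ds = defaultdict(int)
    for (datastore, _), demand in zip(disks, demand_iops):
        demand_per_ds[datastore] += demand
    return {ds for ds, total in demand_per_ds.items()
            if pools[ds] is not None and total > pools[ds]}


example1 = [("A", 100)] * 4
print(restricted_datastores(example1, [10, 10, 10, 370]))  # set()
example2 = [("A", 100), ("A", 100), ("B", 100), ("B", 100)]
print(restricted_datastores(example2, [10, 190, 10, 190]))  # set()
example3 = [("A", None)] + [("A", 100)] * 3
print(pooled_limits(example3))  # {'A': None}
```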

Page 19

1620 Pearl Street, Boulder, Colorado 80302

Phone: 720.523.3278

Email: [email protected]

www.solidfire.com