How to Fail at VDI

BriForum London 2012

Transcript of How to Fail at VDI

Page 1: How to Fail at VDI

BriForum | © TechTarget

Welcome

Page 2: How to Fail at VDI

Dan Brinkmann
@dbrinkmann
blog.danbrinkmann.com
Solutions Architect, VMware vExpert
Lewan & Associates (Denver, CO)

How to Fail at VDI

Page 3: How to Fail at VDI

“What business problem are we solving?”

Page 4: How to Fail at VDI

Business/Expectation VDI Failures

● No business problem
● Desktop virtualization is not server virtualization
● Saving money
● Project in the hands of the vSphere administrator
● No success criteria
● Assume you know what users do
● The same or better experience remotely as locally

Page 5: How to Fail at VDI

Agenda

● Compute
● Storage
● Guessing

To understand what causes VDI failures

Page 6: How to Fail at VDI

How to Fail at VDI

● Test with 5 users
● Use vendor-provided users/core sizing
● Use vendor-provided IOPS estimates
● Ignore anti-virus
● Ignore user profile management
● Use existing desktop images built for physical PCs
● Guess

The technology failure points

Page 7: How to Fail at VDI

Compute

● Multi-threaded apps
● Latency-sensitive workloads
● Hyperthreading
● Latency = Health

It’s magic until it stops working

Page 8: How to Fail at VDI

Compute

● The CPU scheduler in vSphere is entitlement/consumption based, not priority based (unlike Windows)
● There is no priority in the CPU scheduler
● Given equal entitlement, the more a VM/world consumes, the more likely it is to be preempted by another VM/world
● http://www.vmware.com/resources/techresources/10131

CPU scheduler in vSphere
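
The entitlement/consumption idea can be sketched with a toy model. This is not the ESXi scheduler; it is a minimal illustration that assumes a single core, fixed one-tick time slices, and a hypothetical `World` record whose `shares` field stands in for entitlement. The only rule is that the next slice goes to whichever world has consumed the least relative to its entitlement, so a world that has just run is the one most likely to lose the core next.

```python
from dataclasses import dataclass


@dataclass
class World:
    """A schedulable entity (for example, one vCPU of a VM). Hypothetical type."""
    name: str
    shares: int          # relative entitlement; higher means a bigger slice
    consumed: int = 0    # CPU ticks received so far


def pick_next(worlds):
    """Pick the world that has consumed the least relative to its entitlement."""
    return min(worlds, key=lambda w: w.consumed / w.shares)


def simulate(worlds, ticks):
    """Hand out `ticks` time slices on one core, one slice per tick."""
    for _ in range(ticks):
        pick_next(worlds).consumed += 1
    return {w.name: w.consumed for w in worlds}


# Equal entitlement: the two worlds keep preempting each other.
equal = simulate([World("vm-a", 1000), World("vm-b", 1000)], ticks=100)

# Unequal entitlement: vm-a is entitled to twice as much as vm-b.
weighted = simulate([World("vm-a", 2000), World("vm-b", 1000)], ticks=90)

print(equal)
print(weighted)
```

With equal shares the ticks split evenly; with a 2:1 share ratio the split follows the same ratio. No world ever wins because it is "more important", only because it is further behind its entitlement.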

Page 9: How to Fail at VDI

Compute with a Physical PC

[Diagram: one OS/Apps/Profile box on CPU 1]

Page 10: How to Fail at VDI

Compute with Citrix XenApp

[Diagram: eight OS/Apps/Profile boxes sharing CPU 1 and CPU 2]

Page 11: How to Fail at VDI

Compute with VDI

[Diagram: CPU 1, CPU 2]

Page 12: How to Fail at VDI

vSphere Compute

This is poor performance monitoring

Page 13: How to Fail at VDI

vSphere Compute

This is better performance monitoring - ESXTOP

Display | Metric | Threshold | Explanation
CPU | %RDY | 10 | Overprovisioning of vCPUs, excessive usage of vSMP, or a limit (check %MLMTD) has been set.
CPU | %CSTP | 3 | Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM. This should lead to increased scheduling opportunities.
CPU | %SYS | 20 | The percentage of time spent by system services on behalf of the world. Most likely caused by a high-IO VM. Check other metrics and the VM for a possible root cause.
CPU | %MLMTD | 0 | The percentage of time the vCPU was ready to run but deliberately wasn’t scheduled because that would violate the “CPU limit” settings. If larger than 0, the world is being throttled due to the limit on CPU.
CPU | %SWPWT | 5 | VM waiting on swapped pages to be read from disk. Possible cause: memory overcommitment.
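
The thresholds above lend themselves to a simple automated check. The following is a minimal sketch, not an esxtop parser: it assumes the readings have already been collected into a dictionary keyed by metric name, and the function name `flag_cpu_metrics` and the sample values are hypothetical. A metric is flagged only when it exceeds its threshold, which matches the "if larger than 0" wording for %MLMTD.

```python
# Thresholds copied from the table above (values in percent).
CPU_THRESHOLDS = {
    "%RDY": 10,
    "%CSTP": 3,
    "%SYS": 20,
    "%MLMTD": 0,
    "%SWPWT": 5,
}


def flag_cpu_metrics(sample):
    """Return the metrics in `sample` that exceed their threshold, sorted by name."""
    return sorted(
        name
        for name, value in sample.items()
        if name in CPU_THRESHOLDS and value > CPU_THRESHOLDS[name]
    )


# Hypothetical readings for one VM, typed in by hand for illustration.
sample = {"%RDY": 14.2, "%CSTP": 4.5, "%SYS": 1.0, "%MLMTD": 0.0, "%SWPWT": 0.0}
print(flag_cpu_metrics(sample))
```

A VM that trips both %RDY and %CSTP, as in this sample, is the pattern the following slides attribute to too many vCPUs.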

Page 14: How to Fail at VDI

vSphere Compute

Page 15: How to Fail at VDI

vSphere Compute

%CSTP probably driving %RDY values

Page 16: How to Fail at VDI

vSphere Compute

Now with fewer vCPUs

Page 17: How to Fail at VDI

Summary on Compute

● Multithreading, vSMP
● Not priority based
● % utilization is not the complete picture
● Latency = Health
● http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017926

Page 18: How to Fail at VDI

Storage

● #1 cause of performance issues in server virtualization
● #1 cause of performance issues in desktop virtualization
● Latency = Health
  - 20 ms: in trouble
  - 50 ms: your users hate you

The wrath of the math

Page 19: How to Fail at VDI

What You Need to Know

● Capacity vs performance
● Random vs sequential
● Average vs peak
● Where it’s coming from
● Most are guessing

Page 20: How to Fail at VDI

Storage

Spinning disk

Page 21: How to Fail at VDI

RAID Penalty

Page 22: How to Fail at VDI

The Math – RAID 5 50/50

● 500 users, Windows 7, 20 IOPS avg, 50/50 read/write, RAID 5
● 500 * 20 = 10,000 IOPS – 5,000 read, 5,000 write
● 5,000 write * 4 = 20,000 + 5,000 read = 25,000 IOPS
● 25,000 IOPS on 15K RPM spindles (200 IOPS each) = 125 spindles

Some back of the napkin math

Page 23: How to Fail at VDI

The Math – RAID 10 50/50

● 500 users, Windows 7, 20 IOPS avg, 50/50 read/write, RAID 10
● 500 * 20 = 10,000 IOPS – 5,000 read, 5,000 write
● 5,000 write * 2 = 10,000 + 5,000 read = 15,000 IOPS
● 15,000 IOPS on 15K RPM spindles (200 IOPS each) = 75 spindles

Some back of the napkin math

Page 24: How to Fail at VDI

The Math – RAID 10 20/80

● 500 users, Windows 7, 20 IOPS avg, 20/80 read/write, RAID 10
● 500 * 20 = 10,000 IOPS – 2,000 read, 8,000 write
● 8,000 write * 2 = 16,000 + 2,000 read = 18,000 IOPS
● 18,000 IOPS on 15K RPM spindles (200 IOPS each) = 90 spindles

Some back of the napkin math
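
The three napkin calculations share one formula: back-end IOPS = reads + writes × write penalty, divided by what one spindle can deliver. A minimal sketch that reproduces them is shown here. The write penalties (4 for RAID 5, 2 for RAID 10) and the 200 IOPS per 15K spindle figure come from the slides; the function name `spindles_needed` and its parameters are hypothetical.

```python
import math

# Write penalties used on the slides: RAID 5 = 4, RAID 10 = 2.
WRITE_PENALTY = {"RAID 5": 4, "RAID 10": 2}


def spindles_needed(users, iops_per_user, write_pct, raid, iops_per_spindle=200):
    """Return (back-end IOPS, spindle count) for a front-end workload.

    write_pct is the write share of the workload as a whole-number percentage.
    """
    frontend = users * iops_per_user
    writes = frontend * write_pct // 100
    reads = frontend - writes
    backend = reads + writes * WRITE_PENALTY[raid]
    return backend, math.ceil(backend / iops_per_spindle)


for raid, write_pct in [("RAID 5", 50), ("RAID 10", 50), ("RAID 10", 80)]:
    backend, spindles = spindles_needed(500, 20, write_pct, raid)
    print(f"{raid}, {write_pct}% write: {backend:,} IOPS -> {spindles} spindles")
```

Changing only the RAID level or the read/write ratio moves the spindle count between 75 and 125 for the same 500 users, which is why a guessed ratio is a sizing risk.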

Page 25: How to Fail at VDI

vSphere Storage Latency

[Diagram: the storage I/O stack. Guest: Application, Filesystem, I/O Drivers. VMkernel: Virtual SCSI, Filesystem, Device Queue. Latency measurement points are marked A, R, S, G, K, D.]

A = Application Latency
R = Physical Disk “Disk Secs/Transfer”
G = Guest Latency
K = ESX Kernel
D = Device Latency

Page 26: How to Fail at VDI

vSphere Storage

Performance monitoring for storage

Display | Metric | Threshold | Explanation
DISK | GAVG | 20 | Look at “DAVG” and “KAVG”, as the sum of both is GAVG.
DISK | DAVG | 20 | Disk latency most likely to be caused by the array.
DISK | KAVG | 2 | Disk latency caused by the VMkernel; high KAVG usually means queuing. Check “QUED”.
DISK | QUED | 1 | Queue maxed out. Possibly queue depth set too low. Check with the array vendor for the optimal queue depth value.
DISK | ABRTS/s | 1 | Aborts issued by the guest (VM) because storage is not responding. Can be caused by failed paths.
DISK | RESETS/s | 1 | The number of commands reset per second.
DISK | CONS/s | 20 | SCSI reservation conflicts per second. Can be caused by too many VMDKs on a datastore.
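
Because GAVG is the sum of DAVG and KAVG, a high guest latency can be attributed to the array or to the VMkernel with simple arithmetic. The sketch below is an illustration under stated assumptions: the function name `diagnose_latency` is hypothetical, inputs are in milliseconds, the health labels reuse the 20 ms and 50 ms marks from the Storage slide, and the choice of "at or above" for those marks and "above" for the table thresholds is an assumption.

```python
def diagnose_latency(davg_ms, kavg_ms):
    """Classify guest latency (GAVG = DAVG + KAVG) and point at the likely layer."""
    gavg_ms = davg_ms + kavg_ms

    # Health labels from the Storage slide: 20 ms and 50 ms.
    if gavg_ms >= 50:
        health = "your users hate you"
    elif gavg_ms >= 20:
        health = "in trouble"
    else:
        health = "ok"

    # Per-layer thresholds from the table above: DAVG 20, KAVG 2.
    suspects = []
    if davg_ms > 20:
        suspects.append("array (DAVG)")
    if kavg_ms > 2:
        suspects.append("VMkernel queuing (KAVG), check QUED")
    return gavg_ms, health, suspects


# Hypothetical readings for illustration.
print(diagnose_latency(davg_ms=27.0, kavg_ms=0.5))
print(diagnose_latency(davg_ms=4.0, kavg_ms=3.0))
```

The second call shows why both components matter: total latency looks fine, yet KAVG is already over its threshold, which hints at queuing before users notice.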

Page 27: How to Fail at VDI

Building for Read IOPs

● Memory - Storage controller cache, PVS
● Host/Hypervisor - CBRC, IntelliCache
● Storage - SSD tiering / flash cache

Fairly easy

Page 28: How to Fail at VDI

Building for Write IOPs

● Profiles/Apps
● Spinning disk
● SSD tiering
● Local disk
● IO optimization (dedupe, serializing IO)

Much harder…and expensive

Page 29: How to Fail at VDI

Storage Summary

● 25,000 IOPS R5 50/50 – 125 spindles
● 15,000 IOPS R10 50/50 – 75 spindles
● 18,000 IOPS R10 20/80 – 90 spindles
● Latency is the key metric
● Write IOPS, and the things that cause them, are the #1 focus

Page 30: How to Fail at VDI

How does this relate to VDI failure?

● Pilot performance is great, then terrible in production
● Boot storm vs login storm
● Applications in gold image vs streamed
● Read/write ratio is important
● Anti-virus software
● Existing desktop images

Page 31: How to Fail at VDI

Guessing

● Initial sizing
● Determine peaks and when
● Baseline application impact
● Monitor application impact over time
● Application updates/changes

You need to use tools to do this
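
What such tools compute for "determine peaks and when" can be sketched in a few lines. The sample data below is invented for illustration and is not a measurement; the function name `summarize` is hypothetical. The point is only that the average and the peak of the same day can differ widely, and that the time of the peak is part of the answer.

```python
# Hypothetical front-end IOPS samples (time of day, IOPS), not measured data.
samples = [
    ("07:00", 2_000),
    ("08:00", 9_000),
    ("09:00", 21_000),   # for example, a login storm
    ("10:00", 11_000),
    ("13:00", 12_000),
    ("17:00", 5_000),
]


def summarize(samples):
    """Return average IOPS, peak IOPS, and the time at which the peak occurred."""
    average = sum(iops for _, iops in samples) / len(samples)
    peak_time, peak = max(samples, key=lambda s: s[1])
    return average, peak, peak_time


average, peak, peak_time = summarize(samples)
print(f"average {average:,.0f} IOPS, peak {peak:,} IOPS at {peak_time}")
print(f"peak is {peak / average:.1f}x the average")
```

In this made-up day the average matches the 10,000 IOPS used in the napkin math, while the peak is more than double that. Storage sized for the average would be overrun at 09:00.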

Page 32: How to Fail at VDI

Project testing

● Unit/system testing
● Application testing
● Performance/scalability testing
● Operational testing
● User acceptance testing

Good to know what you are and aren’t doing

Page 33: How to Fail at VDI

Summary

● Understand your limited resources (compute/storage)
● Don’t guess
● 5 users = what kind of testing? What are you really accomplishing?
