Storage Troubleshooting with VC Ops 5

Post on 24-Feb-2016



Transcript of Storage Troubleshooting with VC Ops 5

© 2010 VMware Inc. All rights reserved

Confidential

Storage Troubleshooting with VC Ops 5


Things we want to know in performance

The storage team has requested greater visibility

• Joint troubleshooting, capacity planning, performance monitoring.

• Is there any storage bottleneck? If yes, where?

• You need to know both the big picture and the details

Your needs:

• Be able to quickly tell the overall workload

• Be able to quickly tell which VMs are generating the most IOPS.

• Be able to tell the total IOPS generated by all VMs, and chart it to see whether there is a spike.

You want to know the 3 dimensions

• IOPS: Read, Write, Read/Write Ratio, Total IOPS

• Latency: Read, Write, Total

• Throughput

This is Level 300 material. I’m assuming you’re hands-on with both vSphere 5 and VC Ops 5.

This is based on vSphere 5.0.1 and VC Ops 5.0.1

Please read speaker notes.


The challenges

Your environment

• Production Site

• 500 server VMs, 3000 desktop VMs

• 2 vCenters, 80 ESXi hosts, 10 clusters, 60 datastores, 6 RDMs

• 50 physical servers (mostly UNIX)

• You use VMFS on FC and NFS on 10 GE

• 2 storage arrays: 1 high end, 1 midrange

• DR Site

• Let’s not talk about this. The production site is complex enough already!


Storage counters: ESXi host

Datastore, Disk, Storage Adapter or Storage Path


ESXi: Adapter, Device and Path

1 adapter can access many Devices (LUNs).

1 Device is accessed via many paths.

1 path can only access 1 Device.
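The relationships above can be sketched as a small data model. This is only an illustration: the class name, the path labels and the LUN names are hypothetical, and nothing here queries a real ESXi host.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePath:
    name: str     # hypothetical path label
    adapter: str  # adapter the path goes through, e.g. "vmhba2"
    device: str   # the single Device (LUN) this path reaches


def devices_per_adapter(paths):
    """1 adapter can access many Devices."""
    result = defaultdict(set)
    for p in paths:
        result[p.adapter].add(p.device)
    return dict(result)


def paths_per_device(paths):
    """1 Device is accessed via many paths."""
    result = defaultdict(list)
    for p in paths:
        result[p.device].append(p.name)
    return dict(result)


# Hypothetical sample: 2 adapters, 2 LUNs, 4 paths.
paths = [
    StoragePath("path-1", "vmhba2", "lun-A"),
    StoragePath("path-2", "vmhba2", "lun-B"),
    StoragePath("path-3", "vmhba3", "lun-A"),
    StoragePath("path-4", "vmhba3", "lun-B"),
]

print(devices_per_adapter(paths))
print(paths_per_device(paths))
```

Because each path record holds exactly one device, the "1 path can only access 1 Device" rule is enforced by the structure itself.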


ESXi: Disk


ESXi: Adapter, Device and Path

[Diagram: ESXi 5.0 storage stack. Labels: Storage Adapter 1 and Storage Adapter 2 (vmhba2, vmhba3), Storage Paths, Disks, VMFS datastores, an RDM, and an NFS datastore on a vmnic.]


Storage counters: VM

Disk, Virtual Disk (VMDK, RDM), Datastore

[Diagram: a VM with Drive 1, Drive 2 and Drive 3. Labels: vDisks (scsi0:0, scsi0:2), Datastores, Disks, and the backing types RDM, VMFS and NFS.]


VC Ops has 4 groups of Storage metrics for a VM

Which counters do you take? There are so many of them. Say you want Write Latency. Which one do you take: Virtual Disk, Datastore, Disk, or Storage?

I’ll try to answer in the next few slides.

If you want to know now, the counters marked with a black arrow are the ones that I think we should use.

[Screenshot callouts: IOPS counters; Latency counters; Throughput counters; Other counters; "Why only at Disk level?"; "These don’t exist in vCenter. RDM?"; "Don’t use"; two counters marked "? Not sure what this is".]


VM: Storage


Comparing VC Ops with vCenter

Datastore shows the metric for this VM only, not for every VM in that datastore. Datastore figures will be higher if your VM has a snapshot.

Disk = the physical LUN backing the datastore. If there is no extent, then Disk = Datastore.

Where does the Storage counter come from, as there is no Storage in vCenter? vCenter only has Datastore, Disk and Virtual Disk, as shown in this screenshot. If you know, let me know.


VC Ops has 2 groups of Storage metrics for a Datastore

I am not sure of the difference between Max Observed and Highest Observed.

Which counters do you take? There are so many of them. Say you want Write Latency. Which one do you take: Virtual Disk, Datastore, Disk, or Storage?

I’ll try to answer in the next few slides.

[Screenshot callouts, shown for a VMFS datastore and an NFS datastore: IOPS counters; Latency counters; Throughput counters; Other counters.]


VC Ops has 4 groups of Storage metrics for an ESXi host

Which counters do you take? There are so many of them. Say you want Write Latency. Which one do you take: Virtual Disk, Datastore, Disk, or Storage?

I’ll try to answer in the next few slides.

[Screenshot callouts: IOPS counters; Latency counters; Throughput counters; Other counters.]


VC Ops: Storage metrics from Cluster up to World

Notice that the group is Disk, not Storage. I was hoping for Storage as it is more intuitive.

For IOPS or Throughput, it is the sum of all components (e.g. all VMs in that vCenter).

For Latency, I’m not sure if it is an average or the max. If it is a max, that would be an awesome Super Metric!

[Screenshot callouts, shown for Cluster, Datacenter, vCenter and World: IOPS counters; Latency counters; Throughput counters; Other counters.]
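To illustrate why the average-versus-max question matters for Latency, here is a minimal sketch of a roll-up over child objects. The sample values and the function name `rollup` are made up for illustration; this is not how VC Ops is known to compute its container metrics.

```python
def rollup(vm_metrics):
    """Roll per-VM metrics up to a container (cluster, vCenter, ...).

    IOPS is summed. Latency is reported both ways, because an average
    can hide a single badly behaving VM that a max would expose.
    """
    latencies = [m["latency_ms"] for m in vm_metrics]
    return {
        "total_iops": sum(m["iops"] for m in vm_metrics),
        "avg_latency_ms": sum(latencies) / len(latencies),
        "max_latency_ms": max(latencies),
    }


# Hypothetical per-VM samples: two healthy VMs and one with high latency.
vm_metrics = [
    {"iops": 100, "latency_ms": 2.0},
    {"iops": 250, "latency_ms": 4.0},
    {"iops": 50, "latency_ms": 30.0},
]

summary = rollup(vm_metrics)
print(summary)
```

With these made-up numbers the average is 12 ms while the max is 30 ms, which is why a max-based latency metric would be the more useful one for troubleshooting.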


Storage counters at VC level


Storage counters at World level


Part 1: IOPS


Same data, but on 1 chart


vCenter: performance chart

This is the object name. In this case, this is a VM and its name is vCenter5

This one tells us that it is the Datastore group, and it is showing Past day data (last 24 hours)


Same VM & timeline, but from the Disk counter.


vCenter Ops might aggregate differently than vCenter

Same info, but this time from vCenter Ops. The two are similar, but not identical. Is this because of the way VC Ops aggregates?

Read peaks at 245 in vCenter vs 217 in VC Ops, which is around 11% lower in VC Ops.

Write peaks at 137 vs 135. This is close enough.
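The size of the gap depends on which figure is used as the baseline. The sketch below works through the peaks quoted above; the function name `percent_gap` is hypothetical. The 28 IOPS read gap is about 11% of the vCenter figure, or about 13% of the VC Ops figure.

```python
def percent_gap(a, b, baseline):
    """Absolute difference between a and b, as a percentage of baseline."""
    return abs(a - b) / baseline * 100.0


# Peak values quoted on the slide.
vcenter_read, vcops_read = 245, 217
vcenter_write, vcops_write = 137, 135

read_gap_vs_vcenter = round(percent_gap(vcenter_read, vcops_read, vcenter_read), 1)
read_gap_vs_vcops = round(percent_gap(vcenter_read, vcops_read, vcops_read), 1)
write_gap_vs_vcenter = round(percent_gap(vcenter_write, vcops_write, vcenter_write), 1)

print(read_gap_vs_vcenter)   # VC Ops read peak is this much lower than vCenter
print(read_gap_vs_vcops)     # vCenter read peak is this much higher than VC Ops
print(write_gap_vs_vcenter)  # write peaks are nearly identical
```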


IOPS: Snapshot causes real IOPS penalty

This is from the Virtual Disk counters.

173 reads at Virtual Disk translate into 245 reads at Datastore. This is about 40% more.

70 writes at Virtual Disk translate into 137 writes at Datastore. This is almost double!

So a snapshot can cause much higher IOPS.
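As a worked check of the figures above, the extra IOPS seen at the Datastore level can be expressed as a percentage of the Virtual Disk figure. The function name `overhead_percent` is hypothetical; the inputs are the numbers quoted on the slide.

```python
def overhead_percent(virtual_disk_iops, datastore_iops):
    """Extra IOPS at the Datastore level, relative to the Virtual Disk level."""
    return (datastore_iops - virtual_disk_iops) / virtual_disk_iops * 100.0


read_overhead = round(overhead_percent(173, 245), 1)
write_overhead = round(overhead_percent(70, 137), 1)

print(read_overhead)   # reads: roughly 42% more at Datastore
print(write_overhead)  # writes: roughly 96% more, i.e. almost double
```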


Again, the same gap remains between vCenter and VC Ops.


IOPS: Conclusion

Use the Datastore counter for vmdk

• The Virtual Disk counter is useful if you are comparing with actual IOPS issued at Guest OS level. It will be too low if you have a snapshot.

• The Storage counter = Virtual Disk

• The Disk counter is useful if you are discussing with the Storage team, who are showing you LUN by LUN metrics. Disk = LUN.

• It is not useful if your datastore spans multiple LUNs due to Extent.

• In most cases, Disk = Datastore as you should avoid Extent.

Use the Disk counter for RDM

VC Ops counter may differ from vCenter

• If the number looks strange, check with vCenter.

• Sometimes the data in vCenter itself is wrong.

• Check a few VMs, not just 1.
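The rule of thumb in this conclusion can be written down as a small lookup. This is only a sketch of the recommendation on the slide; the function name and the string labels are hypothetical and do not correspond to any VC Ops API.

```python
def recommended_iops_counter_group(disk_type):
    """Return the counter group this deck recommends for IOPS.

    disk_type: "vmdk" for a virtual disk on a datastore, "rdm" for a raw device mapping.
    """
    kind = disk_type.strip().lower()
    if kind == "vmdk":
        return "Datastore"  # includes the extra IO caused by snapshots
    if kind == "rdm":
        return "Disk"       # an RDM maps straight to a LUN, and Disk = LUN
    raise ValueError(f"unknown disk type: {disk_type!r}")


print(recommended_iops_counter_group("vmdk"))
print(recommended_iops_counter_group("RDM"))
```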


Part 2: Latency


VM level: Total Latency


VM Level: Read Latency


Avoid the counter “Datastore | Highest Latency”


Data at VC Ops


Total Latency ≠ Read Latency + Write Latency


View at Datastore level


Latency: Conclusion

Use the Datastore counter for vmdk

• The Virtual Disk counter is useful if you are comparing with actual IOPS issued at Guest OS level. It will be too low if you have a snapshot.

• The Storage counter = Virtual Disk

• The Disk counter is useful if you are discussing with the Storage team, who are showing you LUN by LUN metrics. Disk = LUN.

• It is not useful if your datastore spans multiple LUNs due to Extent.

• In most cases, Disk = Datastore as you should avoid Extent.

Use the Disk or Virtual Disk counter for RDM

VC Ops counter may differ from vCenter

• If the number looks strange, check with vCenter.

• Sometimes the data in vCenter itself is wrong.

• Check a few VMs, not just 1.


Latency: Conclusion

Do not use the Total Latency

• When creating a super metric, manually add the Read and the Write.

Use the Datastore counter for vmdk

Use the Disk counter for RDM

VC Ops counter may differ from vCenter

• If the number looks strange, check with vCenter.

• Sometimes the data in vCenter itself is wrong.

• Check a few VMs, not just 1.
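The advice to add Read and Write manually can be sketched as follows. The sample values are invented purely for illustration, and the field names are hypothetical; the point is only to compute the sum yourself and to spot where a reported Total Latency disagrees with it.

```python
def manual_totals(samples):
    """Read latency plus write latency for each sample, computed by hand."""
    return [s["read_ms"] + s["write_ms"] for s in samples]


def mismatches(samples, tolerance_ms=0.01):
    """Indexes of samples whose reported total differs from read + write."""
    return [
        i
        for i, s in enumerate(samples)
        if abs((s["read_ms"] + s["write_ms"]) - s["reported_total_ms"]) > tolerance_ms
    ]


# Hypothetical samples: the first reported total does not equal read + write.
samples = [
    {"read_ms": 3.0, "write_ms": 5.0, "reported_total_ms": 4.0},
    {"read_ms": 1.0, "write_ms": 2.0, "reported_total_ms": 3.0},
]

print(manual_totals(samples))
print(mismatches(samples))
```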


Part 3: Throughput


Throughput counters for VM


Throughput counters for VM


Same VM, vastly different data


Throughput: Conclusion

Use the Datastore counter for vmdk

• The Virtual Disk counter is useful if you are comparing with actual IOPS issued at Guest OS level. It will be too low if you have a snapshot.

• The Storage counter = Virtual Disk

• The Disk counter is useful if you are discussing with the Storage team, who are showing you LUN by LUN metrics. Disk = LUN.

• It is not useful if your datastore spans multiple LUNs due to Extent.

• In most cases, Disk = Datastore as you should avoid Extent.

Be careful with the Disk counters, as they can report large numbers

• vCenter: Disk | Disk Throughput usage

• VC Ops: Disk | IO Usage capacity

VC Ops counter may differ from vCenter

• If the number looks strange, check with vCenter.

• Sometimes the data in vCenter itself is wrong.

• Check a few VMs, not just 1.


Part 4: Other Interesting Metrics


Built-in Super Metric?

The 3 charts below show a summary at World level

• The actual world is on the right. It has 5 vCenters


Other interesting metrics


vCenter “equivalent” dashboard


Capacity

You have 1000 VMs on 50 datastores.

max([$This:M180/14,$This:M1978/($This:M1977*0.8)]) which is translated to: max([This Resource: summary|total_number_vms/14, This Resource: capacity|used_space/(This Resource: capacity|total_capacity*0.8)])

The result is greater than 1 for any datastore where either the number of attached VMs is more than 14 or the space left is less than 20%.

You can imagine how great this can look on a heatmap.
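The intent of this super metric can be sketched in a few lines. The sketch assumes the goal stated on the slide (flag a datastore with more than 14 VMs or less than 20% free space), so it divides the used fraction by 0.8 to make the score cross 1 exactly at 80% used. The function name, parameter names and sample values are hypothetical.

```python
def capacity_score(vm_count, used_space, total_capacity,
                   max_vms=14, max_used_fraction=0.8):
    """Score above 1.0 means the datastore breaches at least one limit.

    Assumed reading of the super metric: too many VMs (more than max_vms)
    or too little free space (used fraction above max_used_fraction).
    """
    if total_capacity <= 0:
        raise ValueError("total_capacity must be positive")
    vm_ratio = vm_count / max_vms
    space_ratio = used_space / (total_capacity * max_used_fraction)
    return max(vm_ratio, space_ratio)


# Hypothetical datastores: (VM count, used GB, total GB).
too_many_vms = capacity_score(20, 500, 1000)
too_full = capacity_score(10, 900, 1000)
healthy = capacity_score(10, 500, 1000)

print(too_many_vms > 1, too_full > 1, healthy > 1)
```

Taking the max of the two ratios gives a single number per datastore, which is what makes it convenient to colour on a heatmap.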