Post on 24-Feb-2016
© 2010 VMware Inc. All rights reserved
Confidential
Storage Troubleshooting with VC Ops 5
Things we want to know in performance
The storage team has requested greater visibility
• Joint troubleshooting, capacity planning, performance monitoring.
• Is there any storage bottleneck? If yes, where?
• You need to know both the big picture and the details
Your needs:
• Be able to quickly tell the overall workload
• Be able to quickly tell which VMs are generating the big IOPS.
• Be able to tell the total IOPS generated by all VMs, and see a chart that shows whether there is a spike.
You want to know the 3 dimensions
• IOPS: Read, Write, Read/Write Ratio, Total IOPS
• Latency: Read, Write, Total
• Throughput
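To make the needs above concrete, here is a minimal sketch of the IOPS dimension: total IOPS across all VMs, the biggest IOPS generators, and the read share per VM. The class, the function names and the sample figures are all hypothetical; in practice the numbers would come from vCenter or VC Ops.

```python
from dataclasses import dataclass


@dataclass
class VmIops:
    name: str
    read_iops: float
    write_iops: float

    @property
    def total(self) -> float:
        # Total IOPS = Read + Write
        return self.read_iops + self.write_iops

    @property
    def read_fraction(self) -> float:
        # Share of reads in the total; 0.0 when the VM is idle
        return self.read_iops / self.total if self.total else 0.0


def total_iops(vms):
    """Sum of total IOPS generated by all VMs."""
    return sum(vm.total for vm in vms)


def top_talkers(vms, n=3):
    """Return the n VMs generating the most IOPS."""
    return sorted(vms, key=lambda vm: vm.total, reverse=True)[:n]


# Hypothetical sample values, not taken from a real environment
sample = [
    VmIops("vm-a", 120, 30),
    VmIops("vm-b", 10, 5),
    VmIops("vm-c", 400, 100),
]

print(total_iops(sample))
print([vm.name for vm in top_talkers(sample, 2)])
```

The same shape works for Latency and Throughput, except that latency should not be summed across VMs.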
This is Level 300 material. I’m assuming you’re hands-on with both vSphere 5 and VC Ops 5.
This is based on vSphere 5.0.1 and VC Ops 5.0.1
Please read speaker notes.
The challenges
Your environment
• Production Site
  • 500 server VMs, 3000 desktop VMs
  • 2 vCenters, 80 ESXi hosts, 10 clusters, 60 datastores, 6 RDMs
  • 50 physical servers (mostly UNIX)
  • You use VMFS on FC and NFS on 10 GE
  • 2 storage arrays: 1 high end, 1 midrange
• DR Site
  • Let’s not talk about this. The production is complex enough already!
Storage counters: ESXi host
Datastore
Disk
Storage Adapter or Storage Path
ESXi: Adapter, Device and Path
1 adapter can access many Devices (LUNs).
1 Device is accessed via many paths.
1 path can only access 1 Device.
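The relationships above can be sketched as a small data model. This is only an illustration: the path names, adapter names and LUN identifiers are made up, and each path record carries exactly one Device, which is what enforces "1 path can only access 1 Device".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePath:
    name: str     # illustrative path name
    adapter: str  # the adapter this path goes through
    device: str   # exactly one Device (LUN) per path


def devices_per_adapter(paths):
    """1 adapter can access many Devices."""
    result = defaultdict(set)
    for p in paths:
        result[p.adapter].add(p.device)
    return dict(result)


def paths_per_device(paths):
    """1 Device is accessed via many paths."""
    result = defaultdict(list)
    for p in paths:
        result[p.device].append(p.name)
    return dict(result)


# Hypothetical host with 2 adapters, each seeing the same 2 LUNs
paths = [
    StoragePath("vmhba2:C0:T0:L0", "vmhba2", "lun-0"),
    StoragePath("vmhba2:C0:T0:L1", "vmhba2", "lun-1"),
    StoragePath("vmhba3:C0:T0:L0", "vmhba3", "lun-0"),
    StoragePath("vmhba3:C0:T0:L1", "vmhba3", "lun-1"),
]

print(devices_per_adapter(paths))
print(paths_per_device(paths))
```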
ESXi: Disk
ESXi: Adapter, Device and Path
[Diagram: ESXi 5.0 host. Labels: Storage Adapter 1 and Storage Adapter 2 (vmhba2, vmhba3), vmnic, Storage Paths, Disks, Datastores (VMFS, NFS), RDM.]
Storage counters: VM
[Diagram: a VM with Drive 1, Drive 2 and Drive 3 mapped to vDisks (scsi0:0, scsi0:2) on VMFS, NFS and RDM. Counter groups shown: Disk, Virtual Disk (VMDK, RDM), Datastore.]
VC Ops has 4 groups of Storage metrics for a VM
Which counters do you take? There are so many of them. Say you want Write Latency. Which one do you take: Virtual Disk, Datastore, Disk, or Storage?
I’ll try to answer in the next few slides.
If you want to know now, the counters marked with a black arrow are the ones I think we should use.
Screenshot callouts:
• IOPS counters
• Latency counters
• Throughput counters
• Other counters
• Two counters are marked “Not sure what this is”
• “Why only at Disk level?”
• “These don’t exist in vCenter. RDM?”
• “Don’t use”
VM: Storage
Comparing VC Ops with vCenter
Datastore shows the metric for this VM only, not for every VM in that datastore. Datastore figures will be higher if your VM has a snapshot.
Disk = physical LUN backing the datastore. If there is no extent, then Disk = Datastore.
Where does the Storage counter come from, as there is no Storage group in vCenter? vCenter only has Datastore, Disk and Virtual Disk, as shown in this screenshot. If you know, let me know.
VC Ops has 2 groups of Storage metrics for a Datastore
I’m not sure of the difference between Max Observed and Highest Observed.
Which counters do you take? There are so many of them. Say you want Write Latency. Which one do you take: Virtual Disk, Datastore, Disk, or Storage?
I’ll try to answer in the next few slides.
IOPS counters
Other counters
Latency counters
Throughput counters
VMFS datastore NFS datastore
VC Ops has 4 groups of Storage metrics for an ESXi host
Which counters do you take? There are so many of them. Say you want Write Latency. Which one do you take: Virtual Disk, Datastore, Disk, or Storage?
I’ll try to answer in the next few slides.
IOPS counters
Other counters
Latency counters
Throughput counters
VC Ops: Storage metrics from Cluster up to World
Notice that the group is not Storage, but Disk. I was hoping for Storage as it is more intuitive.
For IOPS or Throughput, it is the sum of all components (e.g. all VMs in that vCenter).
For Latency, I’m not sure if it is an average or the max. If it is a max, that would be an awesome Super Metric!
IOPS counters
Other counters
Latency counters
Throughput counters
Cluster, Datacenter, World, vCenter
Storage counters at VC level
Storage counters at World level
Part 1: IOPS
Same data, but on 1 chart
vCenter: performance chart
This is the object name. In this case, this is a VM and its name is vCenter5
This one tells us that it is the Datastore group, and it is showing Past day data (last 24 hours)
Same VM & timeline, but from the Disk counter.
vCenter Ops might aggregate differently than vCenter
Same info, but this time from vCenter Ops.
They are similar, but not identical. Is this because of the way VC Ops aggregates?
Read peaks at 245 in vCenter vs 217 in VC Ops. Around 11% lower in VC Ops.
Write peaks at 137 vs 135. This is close enough.
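The size of the gap depends on which side you take as the baseline. A quick check with the peak values above (the function names are mine, just for illustration): measured against the vCenter peak, VC Ops reads are about 11% lower; measured against the VC Ops peak, vCenter reads are about 13% higher.

```python
def percent_lower(reference, other):
    """How much lower 'other' is, as a percentage of 'reference'."""
    return (reference - other) / reference * 100.0


def percent_higher(reference, other):
    """How much higher 'other' is, as a percentage of 'reference'."""
    return (other - reference) / reference * 100.0


# Peak values quoted on the slide
vcenter_read, vcops_read = 245, 217
vcenter_write, vcops_write = 137, 135

read_gap_vs_vcenter = percent_lower(vcenter_read, vcops_read)
read_gap_vs_vcops = percent_higher(vcops_read, vcenter_read)
write_gap_vs_vcenter = percent_lower(vcenter_write, vcops_write)

print(f"Read: VC Ops is {read_gap_vs_vcenter:.1f}% lower than vCenter")
print(f"Read: vCenter is {read_gap_vs_vcops:.1f}% higher than VC Ops")
print(f"Write: VC Ops is {write_gap_vs_vcenter:.1f}% lower than vCenter")
```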
IOPS: Snapshot causes real IOPS penalty
This is from the Virtual Disk counters.
173 reads at Virtual Disk translate into 245 reads at Datastore. This is about 40% more.
70 writes at Virtual Disk translate into 137 writes at Datastore. This is almost double (about 196% of the Virtual Disk figure)!
So a snapshot can cause much higher IOPS.
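The arithmetic behind the snapshot penalty, as a small sketch. The function name is hypothetical; the inputs are the peak values quoted above. A ratio of 1.0 would mean the Datastore counter sees exactly what the Virtual Disk counter sees.

```python
def amplification(virtual_disk_iops, datastore_iops):
    """Ratio of Datastore IOPS to Virtual Disk IOPS for the same VM."""
    return datastore_iops / virtual_disk_iops


read_ratio = amplification(173, 245)
write_ratio = amplification(70, 137)

print(f"Read:  x{read_ratio:.2f} ({(read_ratio - 1) * 100:.0f}% more)")
print(f"Write: x{write_ratio:.2f} ({(write_ratio - 1) * 100:.0f}% more)")
```

So reads at the Datastore are roughly 1.4 times the Virtual Disk figure, and writes are close to 2 times.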
Again, the same gap remains between vCenter and VC Ops.
IOPS: Conclusion
Use the Datastore counter for vmdk
• The Virtual Disk counter is useful if you are comparing with the actual IOPS issued at Guest OS level. It will be too low if you have a snapshot.
• The Storage counter = Virtual Disk
• The Disk counter is useful if you are discussing with the Storage team, who are showing you LUN-by-LUN metrics. Disk = LUN.
• It is not useful if your datastore spans multiple LUNs due to Extent.
• In most cases, Disk = Datastore as you should avoid Extent.
Use the Disk counter for RDM
VC Ops counter may differ from vCenter
• If the number looks strange, check with vCenter.
• Sometimes the data in vCenter itself is wrong.
• Check a few VMs, not just 1.
Part 2: Latency
VM level: Total Latency
VM Level: Read Latency
Avoid the counter “Datastore | Highest Latency”
Data at VC Ops
Total Latency ≠ Read Latency + Write Latency
View at Datastore level
Latency: Conclusion
Use the Datastore counter for vmdk
• The Virtual Disk counter is useful if you are comparing with the actual IOPS issued at Guest OS level. It will be too low if you have a snapshot.
• The Storage counter = Virtual Disk
• The Disk counter is useful if you are discussing with the Storage team, who are showing you LUN-by-LUN metrics. Disk = LUN.
• It is not useful if your datastore spans multiple LUNs due to Extent.
• In most cases, Disk = Datastore as you should avoid Extent.
Use the Disk or Virtual Disk counter for RDM
VC Ops counter may differ from vCenter
• If the number looks strange, check with vCenter.
• Sometimes the data in vCenter itself is wrong.
• Check a few VMs, not just 1.
Latency: Conclusion
Do not use the Total Latency
• When creating a super metric, manually add the Read and the Write.
Use the Datastore counter for vmdk
Use the Disk counter for RDM
VC Ops counter may differ from vCenter
• If the number looks strange, check with vCenter.
• Sometimes the data in vCenter itself is wrong.
• Check a few VMs, not just 1.
Part 3: Throughput
Throughput counters for VM
Throughput counters for VM
Same VM, vastly different data
Throughput: Conclusion
Use the Datastore counter for vmdk
• The Virtual Disk counter is useful if you are comparing with the actual IOPS issued at Guest OS level. It will be too low if you have a snapshot.
• The Storage counter = Virtual Disk
• The Disk counter is useful if you are discussing with the Storage team, who are showing you LUN-by-LUN metrics. Disk = LUN.
• It is not useful if your datastore spans multiple LUNs due to Extent.
• In most cases, Disk = Datastore as you should avoid Extent.
Be careful with the Disk counters, as they can report large numbers
• vCenter: Disk | Disk Throughput usage
• VC Ops: Disk | IO Usage capacity
VC Ops counter may differ from vCenter
• If the number looks strange, check with vCenter.
• Sometimes the data in vCenter itself is wrong.
• Check a few VMs, not just 1.
Part 4: Other Interesting Metrics
Built-in Super Metric?
The 3 charts below show a summary at World level
• The actual world is on the right. It has 5 vCenters
Other interesting metrics
vCenter “equivalent” dashboard
Capacity
You have 1000 VMs on 50 datastores.
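max([$This:M180/14,$This:M1978/$This:M1977*0.8])
which is translated to:
max([This Resource: summary|total_number_vms/14,This Resource: capacity|used_space/This Resource: capacity|total_capacity*0.8])
This means: show me all datastores where either the number of attached VMs is more than 14 or the space left is less than 20%. Note that, as written, the capacity term is multiplied by 0.8, so it tops out at 0.8 and does not share the VM term’s threshold of 1; it crosses 1 at 80% used only if it is divided by 0.8 instead.
You can imagine how great this can look on a heatmap.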
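To sanity-check the thresholds outside VC Ops, here is a minimal sketch of the same idea in plain Python. The function and parameter names are mine. `score_as_quoted` follows the expression above literally; `score_normalized` divides the used-space ratio by 0.8 instead of multiplying, which is my assumption about the intent, so that both terms cross 1 exactly at their limits (14 VMs, 80% used).

```python
def score_as_quoted(vm_count, used_space, total_capacity):
    # Literal translation of the super metric quoted above
    return max(vm_count / 14, used_space / total_capacity * 0.8)


def score_normalized(vm_count, used_space, total_capacity,
                     max_vms=14, max_used_fraction=0.8):
    # Assumption: divide by the limit so each term reaches 1 at its limit
    vm_term = vm_count / max_vms
    space_term = (used_space / total_capacity) / max_used_fraction
    return max(vm_term, space_term)


# Hypothetical datastores: (VM count, used GB, total GB)
datastores = {
    "ds-too-many-vms": (15, 500, 1000),
    "ds-nearly-full": (10, 850, 1000),
    "ds-healthy": (10, 500, 1000),
}

for name, (vms, used, total) in datastores.items():
    flagged = score_normalized(vms, used, total) > 1
    print(name, "flagged" if flagged else "ok")
```

With the quoted form, a datastore that is 85% full with 10 VMs scores below 1, so it would only stand out on a heatmap if the colour thresholds are set lower than 1.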