Post on 16-Apr-2017
Conducting a Successful Virtual SAN 6.2 Proof of Concept
Paudie O'Riordan, VMware, Inc
Cormac Hogan, VMware, Inc
STO7535
CONFIDENTIAL 2
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Agenda
1 Introduction to Session
2 Introduction to Virtual SAN
3 Tools to conduct a successful Virtual SAN proof of concept (POC)
4 POC validation scenarios
5 Data Services Considerations
6 Measuring Performance
This session…
• Virtual SAN has been available since March 2014, almost 2.5 years
• To date, we have almost 5,000 VSAN customers
• VMware recognises that conducting a Virtual SAN proof of concept can be challenging
• Since the launch of Virtual SAN, additional tools for managing, monitoring and troubleshooting Virtual SAN have become available
• This session discusses the tools available to vSphere and Virtual SAN administrators, and how they can help deliver a Virtual SAN proof of concept
Introduction to VMware Virtual SAN
• Storage scale-out architecture built into the hypervisor
• Aggregates locally attached storage from each ESXi host in a cluster
• Dynamic capacity and performance scalability
• Flash-optimized storage solution
• Fully integrated with vSphere and interoperable with:
– vMotion, DRS, HA, VDP, VR …
• VM-centric data operations
• Many new data services
[Figure: Virtual SAN datastore]
What I Need to Be Successful
Tools to conduct a successful Virtual SAN POC
Before You Begin: Verify Your Components Against the HCL
• VMware Virtual SAN Hardware
• Server, Controller, SSD, Disk on HCL
• Controller Firmware, Driver
• Disk Firmware
• Enclosure Firmware
• SAS/SATA SSD Minimum Firmware is Critical
– Rule is minimum or higher
• NVMe Firmware
– HCL lists absolute version only
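The two firmware rules above behave differently, and a small sketch makes the difference concrete. This is an illustration only: it assumes firmware versions are dotted numeric strings (real vendor firmware strings often are not), and the function names `parse_version` and `firmware_ok` are hypothetical, not part of any VMware tool.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted numeric version such as '1.2.10' into (1, 2, 10).

    Assumption: versions are dotted integers; vendor strings may differ.
    """
    return tuple(int(part) for part in version.split("."))


def firmware_ok(device_type: str, installed: str, hcl_listed: str) -> bool:
    """Apply the slide's rule for a single device.

    SAS/SATA SSD: the HCL version is a minimum, so equal or higher passes.
    NVMe: the HCL lists an absolute version, so only an exact match passes.
    """
    if device_type.lower() == "nvme":
        return parse_version(installed) == parse_version(hcl_listed)
    return parse_version(installed) >= parse_version(hcl_listed)
```

For example, `firmware_ok("sas", "1.2.10", "1.2.9")` passes because the installed firmware is newer than the listed minimum, while the same pair of versions fails for `"nvme"`.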
Success Tool #1 : Health Plugin – Reactive Health Checks
• Introduced with Virtual SAN 6.0
• Incorporated in the vSphere Web Client
• The Virtual SAN Health Check tool includes:
– General Health
– Proactive tests
– Virtual SAN HCL health
– Physical disk health
• Especially useful when injecting errors into the cluster and verifying that they have been remediated
Success Tool #1 : Health Plugin – Proactive Health Checks
• Proactive tools running on the Virtual SAN cluster and pre-production tests
– VM Creation test
– Storage Performance
– Multicast performance test
Success Tool #2 : Capacity Views
• Dedupe and Compression Savings
• Group by Object Type
– Filesystem overhead
– Dedupe overhead
– Checksum overhead
– Virtual disks
– Swap
– Home namespace
Success Tool #3 : Performance Service
• Enable it once
• Integrated with vSphere
• Simplified metrics
– Backend (VSAN)
– Frontend (VM)
• Distributed Architecture
– No SPOF
• Historical data
• Status monitored by health checks
Success Tool #4 : HCIbench
• Hyperconverged Infrastructure benchmark
• Based on Vdbench
• Designed to work on distributed architectures like Virtual SAN
• UI Driven
• Free
• Provides results both as text and in a format that can be viewed in VSAN Observer
• Now available from https://labs.vmware.com/flings
Success Tool #5 : RVC/Virtual SAN Observer
• Native tools installed on Linux/Appliance and Windows versions of vCenter Server
• Used for Configuration and Status of the Virtual SAN Cluster
• For Performance and Activity monitoring on demand
– VM level
– Host level
– VMDK level
– HDD/SSD Level
• Any anomalies will show up with the metric in question shown in red
• Follow the I/O : VM -> VMDK -> Disk Group -> Disk -> Congestion
Success Tool #5 : RVC/Virtual SAN Observer (ctd.)
Virtual SAN Operation
vsan.apply_license_to_cluster
vsan.enable_vsan_on_cluster
vsan.disable_vsan_on_cluster
vsan.clear_disks_cache
vsan.cluster_change_autoclaim
vsan.cluster_set_default_policy
vsan.enter_maintenance_mode
vsan.fix_renamed_vms
vsan.object_reconfigure
vsan.host_wipe_vsan_disks
vsan.recover_spbm
vsan.reapply_vsan_vmknic_config
Virtual SAN Information
Cluster
vsan.check_limits
vsan.check_state
vsan.cluster_info
vsan.cmmds_find
vsan.whatif_host_failures
vsan.resync_dashboard
Disk
vsan.disk_object_info
vsan.disks_info
vsan.disks_stats
Host
vsan.host_info
vsan.host_consume_disks
Networking
vsan.lldpnetmap
VM
vsan.vm_object_info
vsan.vm_perf_stats
vsan.vmdk_stats
vsan.obj_status_report
vsan.object_info
Virtual SAN Monitoring
Troubleshooting
vsan.support_information
vsan.observer
Validation Scenarios
Expected outcomes from POC activities
PoC Validation
• What are the most important validation tests?
1. Successful VSAN configuration
2. Successful VM deployments on VSAN datastore
3. VM Availability in the event of failures (host, storage device, network)
4. VSAN Serviceability (maintenance of hosts, disk groups, disks)
5. VM Performance meets expectations
6. VSAN Data Services (Dedupe, Compression, RAID-5/6, Checksum) working as expected
Case #1 – Successful VSAN Deployment – Checklist
• Correct vSphere versions
• Appropriate licenses – especially if PoC is expected to take a long time (> 60 days)
• Correctly Configured Network
– VSAN requires multicast, so prep the network team
• Minimum of three servers
– Or 2 servers plus a witness appliance if doing Remote Office/Branch Office (ROBO)
Remember, the VSAN Health Check will do most of this work for you.
Case #1 – Successful VSAN Deployment – Checklist (ctd.)
• Minimum of three servers contributing storage:
– At least one storage controller – you’ve checked the HCL, and drivers and firmware are valid, right?
– At least one flash device (SSD, PCIe) for cache – check the HCL
– At least one magnetic disk (hybrid) or flash device (all-flash) for capacity – check the HCL
• Or consider VSAN Ready Nodes as an option …
Remember, the VSAN Health Check will do most of this work for you.
Case #1 – Successful VSAN Deployment – Device Claiming
• Devices not visible
– Some RAID controllers won’t present individual disks without RAID configuration
– May need RAID-0 configuration set on storage devices via controller
• Devices not being claimed
– Some controllers allow devices to be shared, so devices get presented as “non-local”
– VSAN will only claim devices that are local
• SSD showing up as HDD
– Placing devices in RAID-0 will do this
• All-Flash using wrong devices for cache/capacity
– Set VSAN to “Manual mode” when setting up all-flash
– Gives control over which devices are used for cache and which devices are used for capacity
Case #1 – Successful VSAN Deployment – Overall health
Run health checks after every test!
Clear Alarms!
Use it to verify that a problem that was previously introduced is now fixed!
Check the Virtual SAN Health Check regularly
Case #2 : Successful VM Deployment on VSAN
Use the Health Check – Proactive Tests to do the initial VM deployment check
Part of the Proactive Tests. This will verify whether virtual machines can be created on the VSAN cluster
Case #2 : Successful VM Deployment on VSAN
I created a new VM, but where and how is the VM stored?
Component host location
Case #3 : VM Availability in the Event of Failures
• Various failures may be introduced as part of a typical POC
– Host failure
– Flash device / Magnetic Disk failure – Cache/Capacity device failures
– Network failure
• Objective: ensure that the VM continues to be available in the event of a failure. The VM may be restarted on another node in the cluster.
• vSphere HA is fully integrated with Virtual SAN, so virtual machines on the failed host are restarted on other hosts in the cluster
Case #3.1 : Host Failures
• How many hosts do I really need?
• A minimum of 3 hosts is needed to support VSAN.
• What about rebuilding after a failure or maintenance mode operations?
• If you want virtual machines to remain highly available on VSAN during these scenarios, consider configuring additional capacity, i.e. a minimum of 4 nodes.
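The host counts above follow from a simple rule, sketched here under stated assumptions: objects are mirrored (RAID-1), for which tolerating n failures needs 2n + 1 hosts, and one extra host is added as headroom for rebuilds and maintenance mode. The function names are hypothetical and the sketch does not cover RAID-5/6 or ROBO configurations.

```python
def min_hosts(failures_to_tolerate: int) -> int:
    """Minimum hosts for mirrored (RAID-1) objects: 2 * FTT + 1."""
    if not 1 <= failures_to_tolerate <= 3:
        raise ValueError("failures_to_tolerate must be between 1 and 3")
    return 2 * failures_to_tolerate + 1


def hosts_with_rebuild_headroom(failures_to_tolerate: int) -> int:
    """Add one host so data can be rebuilt while a host is failed or in maintenance."""
    return min_hosts(failures_to_tolerate) + 1
```

With the default of one failure to tolerate, this gives the 3-host minimum and the 4-node recommendation from the slide.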
Case #3.2 : Storage Failures
• The Virtual SAN 6.0 Proof Of Concept Guide has details on how to inject temporary disk errors for the purpose of testing.
– A real disk failure results in immediate rebuild activity initiated by VSAN
• Eject/Offline/Unplug: Absent
– VSAN waits 60 minutes before remediation
• Failure: Degraded
– Immediate remediation
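The difference between the two states matters when timing a POC test, so here is a minimal sketch of it. The 60-minute delay and the two state names come from the slide; the class and function names are hypothetical, and the sketch ignores any site-specific change to the delay.

```python
from enum import Enum


class ComponentState(Enum):
    ABSENT = "absent"      # device ejected, offlined or unplugged
    DEGRADED = "degraded"  # device reported a real failure


ABSENT_REPAIR_DELAY_MINUTES = 60  # value taken from the slide


def minutes_until_rebuild(state: ComponentState, minutes_since_event: float) -> float:
    """Return how long until rebuild activity should be expected to start."""
    if state is ComponentState.DEGRADED:
        return 0.0  # degraded components are remediated immediately
    return max(0.0, ABSENT_REPAIR_DELAY_MINUTES - minutes_since_event)
```

If a disk was unplugged 45 minutes ago, `minutes_until_rebuild(ComponentState.ABSENT, 45)` indicates that rebuild traffic is still about 15 minutes away, so an idle resync dashboard at that point is expected behaviour.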
Case #3.2 : Storage Failures (ctd.)
• Additional considerations when dedupe/compression are enabled on VSAN
– Deduplication and compression hash tables/metadata are spread across all disks in a disk group
– A single device failure in the disk group will render the whole of the disk group unavailable
– All data in the disk group will be rebuilt elsewhere in the cluster (if resources allow)
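To size the rebuild before injecting a failure, the effect described above can be sketched as follows. This is a simplified illustration with hypothetical names and figures: it only models the failure of a capacity device and treats used space per device as the amount of data to rebuild.

```python
from typing import List


def data_to_rebuild_gb(
    used_gb_per_capacity_device: List[float],
    failed_device: int,
    dedupe_enabled: bool,
) -> float:
    """Estimate how much data must be rebuilt after one capacity device fails."""
    if dedupe_enabled:
        # The whole disk group becomes unavailable, so all of it is rebuilt.
        return sum(used_gb_per_capacity_device)
    # Without dedupe/compression only the failed device's data is rebuilt.
    return used_gb_per_capacity_device[failed_device]
```

For a disk group with 400, 350 and 420 GB used, losing the second device means rebuilding 350 GB without dedupe/compression but 1170 GB with it, which is why free capacity elsewhere in the cluster should be checked first.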
Case #3.3 : Network Failure
Part of the Proactive Tests. This will verify whether multicast performance is acceptable for the VSAN cluster
Multicast configuration is the most common issue
Start simple
If you want features like LACP, don’t implement them initially. Turn off QoS/Flow Control, then add them afterwards
Case #3.4 : Validating Rebuild Activity After Failure
• Virtual SAN might need to move data around in the background: change policy, host failure, long term/permanent component loss, user triggered reconfig, maintenance mode, etc.
• UI Resync Dashboard shows the VMs that are resyncing and remaining bytes to sync
Remember! Test one thing at a time!
Case #4 : VSAN Serviceability – Maintenance Mode
I want to update one of the ESXi hosts in a VSAN cluster. What do I do?
VSAN provides multiple options for maintenance mode
Case #4 : VSAN Serviceability – Maintenance Mode
Ensure Accessibility
– Loss of VM compliance
– Short time maintenance
– Short storage preparation
– Limited free storage space required
Full Data Migration
– Full VM data compliance
– More than one hour of maintenance
– Long storage preparation
– Free storage space requirements on the other nodes
No Data Migration
– No VM availability ensured
– Short time maintenance
– No impact (storage preparation)
– No impact (free storage space)
Full migration unavailable in 3 node clusters!
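The trade-offs above can be turned into a rough decision helper. This is a sketch under stated assumptions, not VMware guidance: it uses only the one-hour threshold and the 3-node restriction from the slide, the function name is hypothetical, and it never suggests the no-migration option because that option does not ensure VM availability.

```python
def suggest_maintenance_mode(expected_minutes: int, host_count: int) -> str:
    """Suggest a maintenance mode option from the expected outage and cluster size."""
    if host_count < 3:
        raise ValueError("this sketch assumes a cluster of at least 3 hosts")
    long_maintenance = expected_minutes > 60
    # Full data migration needs somewhere to put the data, so it is
    # unavailable in a 3 node cluster.
    if long_maintenance and host_count > 3:
        return "Full Data Migration"
    return "Ensure Accessibility"
```

A two-hour firmware update on a 4-node cluster yields "Full Data Migration", while the same work on a 3-node cluster falls back to "Ensure Accessibility", with the loss of VM compliance noted above.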
Case #5 : Management – Disks Serviceability
The disk serviceability feature enables identification of magnetic disks and flash-based devices that need to be replaced
Case #5 : Management – Disk/Disk Group Evacuation
• Allows you to evacuate data from disk groups and individual disks before removing a disk/disk group from a Virtual SAN host
• Allows Virtual SAN to ensure all workloads stay fully compliant with their policy!
– Supported in the UI, ESXCLI and RVC.
– Check box in the “Remove disk/disk group” UI screen.
PoC considerations for New Data Services in VSAN 6.2
New Data Services in VSAN 6.2
• Erasure Coding – RAID-5/RAID-6 Support
• Deduplication / Compression
• Checksum
• IOPS limits / QoS
There are performance considerations associated with all of the above.
There are also some issues to be aware of!
Capacity Overhead of the New Data Services
• Overheads are all calculated in advance
– Deduplication/Compression maintain hash tables: approx. 5% overhead
– Checksum Metadata is stored separately from data: approx. 1.2% overhead
Many customers are surprised by the amount of overhead when data services are first enabled
Data Services File System Overheads – Don’t Panic
• Deduplication and Compression File System Overhead is 5% (approx.) of Total Virtual SAN Capacity
• Checksum Overhead is approx. 1.2% of capacity
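To set expectations before enabling the data services, the approximate overhead can be worked out in advance. The sketch below uses only the two percentages quoted above; the function name and the 100 TB example are hypothetical, and actual figures shown in the capacity views may differ.

```python
DEDUPE_COMPRESSION_OVERHEAD = 0.05  # approx. 5% of total capacity
CHECKSUM_OVERHEAD = 0.012           # approx. 1.2% of capacity


def data_services_overhead_tb(
    total_capacity_tb: float,
    dedupe_compression: bool = True,
    checksum: bool = True,
) -> float:
    """Estimate the file system overhead reserved by the enabled data services."""
    fraction = 0.0
    if dedupe_compression:
        fraction += DEDUPE_COMPRESSION_OVERHEAD
    if checksum:
        fraction += CHECKSUM_OVERHEAD
    return total_capacity_tb * fraction
```

On a 100 TB Virtual SAN datastore with both services enabled, this comes to roughly 6.2 TB shown as overhead as soon as the services are switched on.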
How to Measure Virtual SAN Performance?
How to Test Performance…
• Distributed architecture => best performance when the pooled compute and storage resources in the cluster are well utilized.
• This usually means a number of VMs, each running the specified workload, should be distributed across the cluster and run in a consistent manner to deliver aggregated performance.
• This part of an evaluation can be complex and time-consuming
• Real application workloads are best, but …
– synthetic workloads (IOmeter) might be easier to set up
– simplistic workloads don’t really reflect what Virtual SAN can do
• Worth a read: Pro Tips For Storage Performance Testing
– http://blogs.vmware.com/storage/2015/08/12/tips-storage-performance-testing/
Performance Testing Considerations (Primarily for Hybrid)
Is the test utilising the distributed storage resources of Virtual SAN?
• Multiple VMs across multiple hosts deliver better performance than one VM on one host.
Is the working set fully in cache, utilising flash performance?
• Read-cache misses will incur latency. Is the workload cache friendly?
• Sustained sequential write workloads fill cache, which must then be destaged. Mixed R/W workloads with repeat patterns are best.
Is the cache warmed if using VSAN hybrid?
• Initial results from the start of a test will not be reflective of overall performance.
Warning: Make sure the dedupe scrubber is disabled. It causes performance issues on hybrid *
* KB 2146267
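Because early samples on a cold hybrid cache are not representative, one simple way to report results is to discard a warm-up window before averaging. The sketch below assumes latency samples collected at a fixed interval; the function name, the sample values and the choice of warm-up length are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def steady_state_mean(samples: Sequence[float], warmup_samples: int) -> float:
    """Average the samples after dropping the first `warmup_samples` entries."""
    if warmup_samples < 0:
        raise ValueError("warmup_samples must not be negative")
    steady = samples[warmup_samples:]
    if not steady:
        raise ValueError("no samples left after discarding the warm-up window")
    return mean(steady)
```

With latency samples of 9, 7, 2, 2 and 2 ms and a two-sample warm-up window, the reported steady-state latency is 2 ms rather than the 4.4 ms that a plain average over the whole run would show.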
Performance Test with HCIbench/vdbench
• VMs will be distributed equally across all hosts
• Select I/O size
• Select R/W ratio
• Select random/sequential
• Select duration of test
• Disks can be zeroed with “dd”*
• VMs will be removed (optionally) when test completes
• Produces results per VM
– IOPS, Latency, Throughput, etc
• Produces results consumable by VSAN Observer
* Avoid zeroing disks if deduplication is enabled – will create a hot-spot
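The selections above (I/O size, R/W ratio, random/sequential, duration) map onto a handful of Vdbench parameters. The sketch below builds a small parameter file from them for illustration; HCIbench generates its own files through the UI, so the function name, the device path and the exact layout here are assumptions rather than what HCIbench produces.

```python
def vdbench_profile(
    lun: str, io_size: str, read_pct: int, random_pct: int, duration_s: int
) -> str:
    """Build a minimal Vdbench parameter file for one raw device."""
    for name, value in (("read_pct", read_pct), ("random_pct", random_pct)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    lines = [
        # Storage definition: the device under test, opened with direct I/O.
        f"sd=sd1,lun={lun},openflags=o_direct",
        # Workload definition: I/O size, read percentage, random percentage.
        f"wd=wd1,sd=sd1,xfersize={io_size},rdpct={read_pct},seekpct={random_pct}",
        # Run definition: uncapped I/O rate, test duration, 1-second reporting.
        f"rd=run1,wd=wd1,iorate=max,elapsed={duration_s},interval=1",
    ]
    return "\n".join(lines)
```

For example, `vdbench_profile("/dev/sdb", "4k", 70, 100, 3600)` describes a one-hour, 4 KB, 70% read, fully random test against a hypothetical `/dev/sdb`.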
Q & A