vSAN Stretched Cluster Technical Deep Dive
Paudie O’Riordan, VMware
Mansi Shah, VMware
#vmworld #HCI2088BE
VMworld 2018 Content: Not for publication or distribution
Disclaimer
©2018 VMware, Inc.
This presentation may contain product features or functionality that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new features/functionality/technology discussed or presented have not been determined.
Agenda
Stretched Clusters
SDDC Use Cases
Day 0-2 Best Practices
Storage Policies & Data Placement
Failure Scenarios
Stretched Clusters
What are vSphere Stretched Clusters?
Infrastructure spanning two physical locations
Benefits
• Site-level high availability for business continuity
• Disaster and downtime avoidance
• Workload mobility and load-balancing

Challenges
• Complex configuration and management
• Expensive to deploy and operate
• Creates silos of specialized hardware
[Diagram: vSphere stretched cluster spanning Site A and Site B, with data replication between Storage A and Storage B, and a 3rd site for the witness]
What are vSAN Stretched Clusters?
Simple active-active data centers
Objectives
• Provide the same benefits as traditional stretched clusters, and more
• Remove the need for expensive, complex storage arrays

Benefits
• Single HCI cluster stretched across two sites
• Does not require any specialized hardware
• Simplified management
• VM-granular local and remote protection configured via policies
[Diagram: vSAN stretched cluster spanning Preferred Site and Secondary Site, with a 3rd site for the vSAN witness]
vSAN Stretched Cluster Architecture
[Diagram: vSphere/vSAN cluster across two sites over a 5ms RTT, 10GbE ISL; RAID-1 across sites with RAID-6 within each site; 3rd site for witness]
Characteristics
• Built on vSAN Fault Domains (each site is a FD)
• High-bandwidth ISL for synchronous traffic
  – Latency requirement: <=5ms RTT
• Witness serves as tie-breaker for split-brain scenarios and stores only witness components
  – Latency requirement: 100-200ms RTT
Data Placement
• VM with site protection (RAID-1 / PFTT=1) and local protection (RAID-6 / SFTT=2)
  – Two copies of a RAID-6 tree, one at each site
  – RAID-1 witness component resides at the witness site
  – RAID-6 parity components spread across hosts at each site
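The placement above can be illustrated with a small sketch. This is a simplified model of the component tree for a PFTT=1 / SFTT=2 (RAID-6) object, not vSAN's actual placement logic; the dictionary keys and component names are illustrative only:

```python
# Simplified model: component layout for a stretched-cluster object with
# site protection PFTT=1 (RAID-1 across sites) and local protection
# SFTT=2 via RAID-6 (4 data + 2 parity components per site).
def component_layout():
    raid6_tree = ["data"] * 4 + ["parity"] * 2   # 6 components on 6 hosts
    return {
        "preferred_site": list(raid6_tree),      # one full RAID-6 copy
        "secondary_site": list(raid6_tree),      # second full RAID-6 copy
        "witness_site": ["witness"],             # RAID-1 tie-breaker only
    }

layout = component_layout()
total = sum(len(components) for components in layout.values())
print(total)  # 13 components for one such object
```

Under this model a single VMDK with this policy consumes 13 components: two RAID-6 trees of 6 components each, plus one witness component at the witness site.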
SDDC Use Cases
vSAN for Remote Offices / Branch Offices (ROBO)
2 Node vSAN Cluster
Same architecture as a 1+1+W Stretched Cluster
2 Node direct connect removes the need for expensive 10Gb switches at each site
Maximum of 500ms RTT latency to/from the witness site
Limited to RAID-1 protection of all VMs
Per-VM licensing available
VMs can run on either node (no affinity rules required)
[Diagram: two ROBO sites, each a 2-node vSAN cluster with a 1Gb switch; vSAN data traffic runs between the two nodes, and vSAN witness traffic goes back to the witness in the Data Center]
Active-Active Data Center with Additional DR Protection
vSAN Stretched Clusters and Site Recovery Manager
Use vSphere Replication to a third site, enabling an RPO as low as 5 minutes
Use Site Recovery Manager for disaster recovery orchestration
Stretched across metro distance, replicated across geo
Place the witness at the DR cluster
[Diagram: stretched cluster (synchronous I/O, 5ms RTT, 10Gb) with a 3rd site for the witness; asynchronous replication over any distance via vSphere Replication and Site Recovery Manager to a DR cluster]
AWS Availability Zone Level Resiliency
vSAN Stretched Clusters for VMware Cloud on AWS
Provide increased data resiliency for VMware Cloud on AWS, without the need for an additional DRaaS solution
Provide customer and management workload resiliency against AZ failure, within a region
Support for RAID-1 across AZs, RAID-5 within AZ
Managed service lifecycle of the vSAN witness
[Diagram: vSAN stretched cluster spanning two AZs within an AWS region, with the witness in a third AZ; HA restarts VMs on AZ failure]
Day 0-2 Best Practices
Day 0/1: Deployment and Configuration
[Diagram: vSphere/vSAN stretched cluster, Preferred and Secondary sites over a 5ms RTT, 10GbE ISL; 3rd site for witness]
Network
• Design for ISL redundancy
• Always L3 to the witness, with witness traffic separation

HA/DRS
• Enforce use and ongoing adherence to affinity rules and groups
• Set DRS to fully automated
Day 2: vSAN Witness
Use the OVA appliance for an integrated license
Use Witness Traffic Separation to isolate vSAN data traffic and assign different MTU sizes
Ensure no failures and all VMs are compliant before maintenance mode
• Witness down = site failure
Simply redeploy a new witness to reduce the time the cluster is degraded
Thin-provision and over-provision witness resources
[Diagram: witness traffic at 1500 MTU; vSAN data traffic between Preferred and Secondary sites at 9000 MTU]
Day 2: Lifecycle and Maintenance
[Diagram: vSphere/vSAN stretched cluster, Preferred and Secondary sites over a 5ms RTT, 10GbE ISL; 3rd site for witness]
Use VUM rolling upgrades/patches
• Start with the preferred site, then secondary, then witness
• Use VUM to also lifecycle the witness (not part of the rolling upgrade of the vSAN cluster)
• Disable HA admission control beforehand

Site-wide maintenance
1. Modify affinity rules
2. Migrate all VMs to the other site
3. Place hosts into maintenance mode one at a time, using Ensure Accessibility

In general, only put one host into maintenance mode at a time
Storage Policies & Data Placement
Site Disaster To Tolerate
Failures To Tolerate
Failure Tolerance Method (either Mirroring [default] or Erasure Coding)
IOPS limit for object
Disable object checksum
Force provisioning
Number of disk stripes per object
Flash read cache reservation (%)
Object space reservation (%)
Affinity (when PFTT=0 in stretched clusters)
Interesting Policies for Stretched Clusters
Consideration: policies have a direct impact on the number of components that are created when provisioning a VM on vSAN.
There are 11 unique policies in vSAN 6.7.
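As a rough illustration of how policy choices drive component counts, the sketch below models minimum per-object component counts under simplified assumptions (RAID-1 places FTT+1 replicas plus at least one witness component; RAID-5 places 3+1 and RAID-6 places 4+2 components). Real vSAN placement also depends on object size and other factors, so treat this as a lower-bound heuristic, not vSAN's actual algorithm:

```python
def min_components(ftt: int, erasure_coding: bool = False,
                   stripe_width: int = 1) -> int:
    """Rough lower bound on component count for a single vSAN object."""
    if erasure_coding:
        if ftt == 1:
            return 4  # RAID-5: 3 data + 1 parity
        if ftt == 2:
            return 6  # RAID-6: 4 data + 2 parity
        raise ValueError("erasure coding supports FTT=1 or FTT=2 only")
    # RAID-1 mirroring: FTT+1 replicas, each split into stripe_width
    # stripes, plus at least one witness component for tie-breaking.
    return (ftt + 1) * stripe_width + 1

print(min_components(ftt=1))                       # 3: 2 replicas + 1 witness
print(min_components(ftt=1, stripe_width=2))       # 5: 4 stripes + 1 witness
print(min_components(ftt=2, erasure_coding=True))  # 6: RAID-6
```

The point of the model: raising FTT, switching failure tolerance method, or widening stripes all multiply components, which is why policy choices matter for component limits and rebuild traffic.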
Site Disaster To Tolerate
[Diagram: RAID-1 across sites with a RAID-0 replica at each site, over a 5ms RTT, 10GbE ISL; witness at the 3rd site]
• Each site has one full copy of the data
• The witness component is placed on the remote witness site as a tie-breaker
• Synchronous replication between sites needs a low-latency link
• The witness component only holds metadata and can sit behind a higher-latency link
Site Failure Protection
Local & Remote Protection for Stretched Clusters
[Diagram: vSphere/vSAN cluster across two sites over a 5ms RTT, 10GbE ISL; RAID-1 across sites with RAID-6 within each site; 3rd site for witness]
• Redundancy locally and across sites
• With a site failure, vSAN maintains availability with local redundancy in the surviving site
• No change in stretched cluster configuration steps
• Optimized site-locality logic to minimize I/O traffic across sites
Site Disaster To Tolerate and (Secondary) Failures To Tolerate with Erasure Coding
Stretched Cluster Local Failure Protection – RAID-1
[Diagram: VM's VMDK mirrored across Preferred and Secondary sites, witness at the Tertiary site; data and witness components within each site]
• Each site needs 2N+1 votes to tolerate N failures within the site
• (Local) witness components are added on each site to satisfy votes
• Each site can now independently handle N failures
• Additionally, it can tolerate one site failure

Failures To Tolerate within Site = 1 (RAID-1 with 2 mirrors)
Hosts needed to tolerate 1 failure = 2N+1 = 3
Site Disaster To Tolerate and (Secondary) Failures To Tolerate with Mirroring
Stretched Cluster Local Failure Protection – RAID-6
[Diagram: VM's VMDK protected with RAID-6 within each of the Preferred and Secondary sites, witness at the Tertiary site]
Failures To Tolerate within Site = 2 (RAID-6 can tolerate 2 failures)
Hosts needed for quorum with 2 failures = 2N+1 = 5
Site Disaster To Tolerate and (Secondary) Failures To Tolerate with Erasure Coding
• Each site needs 2N+1 hosts for quorum to tolerate N failures within the site
• RAID-6 protects against 2 host failures, so N = 2
• RAID-6 places 6 stripes (4 data + 2 parity) on 6 unique hosts
• Hence a RAID-6 configuration always spans 6 hosts per site, exceeding the 2N+1 = 5 quorum minimum
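The per-site host arithmetic on this and the previous slide reduces to the generic 2N+1 quorum rule, sketched below (illustrative only; note that RAID-6 data placement itself needs 6 hosts per site, more than the quorum minimum):

```python
def hosts_for_quorum(n_failures: int) -> int:
    """Minimum hosts per site to retain a voting majority
    through n_failures host failures (the 2N+1 rule)."""
    return 2 * n_failures + 1

# RAID-1 tolerating one local failure: quorum needs 3 hosts per site
print(hosts_for_quorum(1))  # 3
# Two local failures tolerated: quorum needs 5 hosts per site,
# though RAID-6's 4+2 layout actually spans 6 distinct hosts
print(hosts_for_quorum(2))  # 5
```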
Site Affinity for Stretched Clusters
• User can specify single site location of VM’s components if site level protection is unnecessary
• Policy driven using SPBM
• Reduces network and storage requirements
• Ideal for solutions that already use application redundancy (Exchange DAGs, SQL Availability groups, etc.)
[Diagram: 3rd site for witness; all of a VM's components placed RAID-0 or RAID-6 on a single site; no site failure protection]
I/O Traffic in Stretched Clusters
vSAN Stretched Cluster - Writes
[Diagram: RAID-1 across sites with RAID-6 within each site over a 5ms RTT, 10GbE ISL; a proxy on the remote site fans out write copies locally; 3rd site for witness]
• Data needs to be sent to multiple replicas on the remote site
• This would consume network bandwidth if not optimized
• vSAN sends a single copy to a proxy on the remote site
• The proxy replicates the copy locally as many times as required
• Network bandwidth is conserved
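A back-of-the-envelope model of what the proxy saves on the inter-site link (illustrative only; it ignores metadata, acknowledgements, and resync traffic, and the replica count is an assumed example):

```python
def isl_bytes_per_write(payload_bytes: int, remote_replicas: int,
                        use_proxy: bool) -> int:
    """Bytes crossing the inter-site link for one write.

    Without the proxy optimization, each remote replica would receive
    its own copy over the ISL; with it, one copy crosses the link and
    the proxy host fans it out locally."""
    copies_over_isl = 1 if use_proxy else remote_replicas
    return payload_bytes * copies_over_isl

# Example: a 64 KiB write landing on 6 remote components (RAID-6 layout)
print(isl_bytes_per_write(65536, remote_replicas=6, use_proxy=False))  # 393216
print(isl_bytes_per_write(65536, remote_replicas=6, use_proxy=True))   # 65536
```

In this model the proxy cuts ISL write traffic by a factor equal to the number of remote copies, which is the conservation effect the slide describes.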
vSAN Stretched Clusters – Reads
• Read Locality is enabled by default
• Reads occur only on the site the VM resides on
• VMs typically do not move between sites
  – Consider cache re-warm in vSAN hybrid architectures
[Diagram: RAID-1 across sites with a RAID-0 replica at each site over a 5ms RTT, 10GbE ISL; 3rd site for witness]
Failure Scenarios
Failure Scenario #1: Site Isolation
[Diagram: VM with RAID-1 across sites and RAID-6 within each site; Preferred Site marked failed]
• Preferred Site is Down
Failure Scenario #2: Multiple Host Failures
[Diagram: same layout; Preferred Site failed and one host failed in the Secondary Site]
• Preferred Site is Down
• One Failure in Secondary
Failure Scenario #3: Multiple Site Failures
[Diagram: Preferred Site failed, one host failed in the Secondary Site, and the Witness Site failed]
• Preferred Site is Down
• One Failure in Secondary
• Witness Site is Down
Failure Scenario #3 – Votes Matter: Multiple Site Failures
[Diagram: Preferred Site failed, one host failed in the Secondary Site, and the Witness Site failed; per-component votes shown]
Dead votes: 11 · Available votes: 8 (of 19 total – no quorum)
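The vote math can be expressed as a simple rule of thumb (a simplified model: an object stays accessible only with a strict majority of votes AND a full copy of its data):

```python
def object_accessible(available_votes: int, total_votes: int,
                      full_copy_available: bool) -> bool:
    """Simplified vSAN accessibility rule: a strict majority of
    votes must be reachable AND a full copy of the data must exist."""
    return full_copy_available and available_votes > total_votes / 2

# Scenario #3 above: 11 of 19 votes dead, only 8 reachable.
# Even though the surviving site still holds a full RAID-6 copy,
# the object loses quorum and becomes inaccessible.
print(object_accessible(8, 19, full_copy_available=True))  # False
```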
Failure Scenario #4: Inter-site Disconnect
[Diagram: inter-site link disconnected; Secondary Site unreachable from the Preferred Site]
• Secondary Site is Down
Failure Scenario #4: Inter-site Disconnect + Host Failures
[Diagram: inter-site link disconnected and two hosts failed in the Preferred Site]
• Secondary Site is Down
• Two Host Failures in Primary
Summary of Failure Semantics
• ANY failure on a node – disk, disk group, NIC – leads to component failure.
• Site isolation is equivalent to the entire site going down for replicated VMs.
• Failures are hierarchical – an object is inaccessible after two site failures, irrespective of the secondary FTT setting.
• Multiple failures within a site can equal a site failure.
• Object accessibility depends on: availability of a full copy of the data, and availability of a quorum of votes.
Questions?
DON’T FORGET TO FILL OUT YOUR SURVEY.
#vmworld #HCI2088BE
THANK YOU!
#vmworld #HCI2088BE