vSAN Stretched Cluster Technical Deep Dive
Paudie O’Riordan, VMware
Mansi Shah, VMware
#vmworld #HCI2088BE
VMworld 2018 Content: Not for publication or distribution
Disclaimer
©2018 VMware, Inc.
This presentation may contain product features or functionality that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new features/functionality/technology discussed or presented have not been determined.
Agenda
Stretched Clusters
SDDC Use Cases
Day 0-2 Best Practices
Storage Policies & Data Placement
Failure Scenarios
Stretched Clusters
What are vSphere Stretched Clusters?
Infrastructure spanning two physical locations
Benefits
• Site-level high availability for business continuity
• Disaster and downtime avoidance
• Workload mobility and load-balancing

Challenges
• Complex configuration and management
• Expensive to deploy and operate
• Creates silos of specialized hardware
[Diagram: vSphere stretched cluster spanning Site A and Site B, with data replication between Storage A and Storage B, and a 3rd site for the witness]
What are vSAN Stretched Clusters?
Simple active-active data centers
Objectives
• Provide the same benefits as traditional stretched clusters, and more
• Remove the need for expensive, complex storage arrays

Benefits
• Single HCI cluster stretched across two sites
• Does not require any specialized hardware
• Simplified management
• VM-granular local and remote protection configured via policies
[Diagram: vSAN stretched cluster spanning Preferred Site and Secondary Site, with a 3rd site for the vSAN witness]
vSAN Stretched Cluster Architecture
[Diagram: vSphere/vSAN cluster across two sites over a 5ms RTT, 10GbE ISL; RAID-1 across sites with RAID-6 within each site; 3rd site for witness]
Characteristics
• Built on vSAN Fault Domains (each site is a FD)
• High-bandwidth ISL for synchronous traffic
  – Latency requirement: <=5ms RTT
• Witness serves as tie-breaker for split-brain scenarios and stores only witness components
  – Latency requirement: 100-200ms RTT
Data Placement
• VM with site protection (RAID-1 / PFTT=1) and local protection (RAID-6 / SFTT=2)
  – Two copies of a RAID-6 tree, one at each site
  – RAID-1 witness component resides at the witness site
  – RAID-6 parity components spread across hosts at each site
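The placement above can be illustrated with a small sketch. This is a simplified model of the component tree for a PFTT=1 / SFTT=2 (RAID-6) object, not vSAN's actual placement logic; the dictionary keys and component names are illustrative only:

```python
# Simplified model: component layout for a stretched-cluster object with
# site protection PFTT=1 (RAID-1 across sites) and local protection
# SFTT=2 via RAID-6 (4 data + 2 parity components per site).
def component_layout():
    raid6_tree = ["data"] * 4 + ["parity"] * 2   # 6 components on 6 hosts
    return {
        "preferred_site": list(raid6_tree),      # one full RAID-6 copy
        "secondary_site": list(raid6_tree),      # second full RAID-6 copy
        "witness_site": ["witness"],             # RAID-1 tie-breaker only
    }

layout = component_layout()
total = sum(len(components) for components in layout.values())
print(total)  # 13 components for one such object
```

Under this model a single VMDK with this policy consumes 13 components: two RAID-6 trees of 6 components each, plus one witness component at the witness site.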
SDDC Use Cases
vSAN for Remote Offices / Branch Offices (ROBO)
2 Node vSAN Cluster
Same architecture as a 1+1+W Stretched Cluster
2 Node direct connect removes the need for expensive 10Gb switches at each site
Maximum of 500ms RTT latency to/from the witness site
Limited to RAID-1 protection of all VMs
Per-VM licensing available
VMs can run on either node (no affinity rules required)
[Diagram: two ROBO sites, each a 2-node vSAN cluster with a 1Gb switch; vSAN data traffic runs between the two nodes, and vSAN witness traffic goes back to the witness in the Data Center]
Active-Active Data Center with Additional DR Protection
vSAN Stretched Clusters and Site Recovery Manager
Use vSphere Replication to a third site, enabling an RPO as low as 5 minutes
Use Site Recovery Manager for disaster recovery orchestration
Stretched across metro distance, replicated across geo
Place the witness at the DR cluster
[Diagram: stretched cluster (synchronous I/O, 5ms RTT, 10Gb) with a 3rd site for the witness; asynchronous replication over any distance via vSphere Replication and Site Recovery Manager to a DR cluster]
AWS Availability Zone Level Resiliency
vSAN Stretched Clusters for VMware Cloud on AWS
Provide increased data resiliency for VMware Cloud on AWS, without the need for an additional DRaaS solution
Provide customer and management workload resiliency against AZ failure, within a region
Support for RAID-1 across AZs, RAID-5 within AZ
Managed service lifecycle of the vSAN witness
[Diagram: vSAN stretched cluster spanning two AZs within an AWS region, with the witness in a third AZ; HA restarts VMs on AZ failure]
Day 0-2 Best Practices
Day 0/1: Deployment and Configuration
[Diagram: vSphere/vSAN stretched cluster, Preferred and Secondary sites over a 5ms RTT, 10GbE ISL; 3rd site for witness]
Network
• Design for ISL redundancy
• Always L3 to the witness, with witness traffic separation

HA/DRS
• Enforce use and ongoing adherence to affinity rules and groups
• Set DRS to fully automated
Day 2: vSAN Witness
Use the OVA appliance for an integrated license
Use Witness Traffic Separation to isolate vSAN data traffic and assign different MTU sizes
Ensure no failures and all VMs are compliant before maintenance mode
• Witness down = site failure
Simply redeploy a new witness to reduce the time the cluster is degraded
Thin-provision and over-provision witness resources
[Diagram: witness traffic at 1500 MTU; vSAN data traffic between Preferred and Secondary sites at 9000 MTU]
Day 2: Lifecycle and Maintenance
[Diagram: vSphere/vSAN stretched cluster, Preferred and Secondary sites over a 5ms RTT, 10GbE ISL; 3rd site for witness]
Use VUM rolling upgrades/patches
• Start with the preferred site, then secondary, then witness
• Use VUM to also lifecycle the witness (not part of the rolling upgrade of the vSAN cluster)
• Disable HA admission control beforehand

Site-wide maintenance
1. Modify affinity rules
2. Migrate all VMs to the other site
3. Place hosts into maintenance mode one at a time, using Ensure Accessibility

In general, only put one host into maintenance mode at a time
Storage Policies & Data Placement
Site Disaster To Tolerate
Failures To Tolerate
Failure Tolerance Method (either Mirroring [default] or Erasure Coding)
IOPS limit for object
Disable object checksum
Force provisioning
Number of disk stripes per object
Flash read cache reservation (%)
Object space reservation (%)
Affinity (when PFTT=0 in stretched clusters)
Interesting Policies for Stretched Clusters
Consideration: policies have a direct impact on the number of components that are created when provisioning a VM on vSAN.
There are 11 unique policies in vSAN 6.7.
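As a rough illustration of how policy choices drive component counts, the sketch below models minimum per-object component counts under simplified assumptions (RAID-1 places FTT+1 replicas plus at least one witness component; RAID-5 places 3+1 and RAID-6 places 4+2 components). Real vSAN placement also depends on object size and other factors, so treat this as a lower-bound heuristic, not vSAN's actual algorithm:

```python
def min_components(ftt: int, erasure_coding: bool = False,
                   stripe_width: int = 1) -> int:
    """Rough lower bound on component count for a single vSAN object."""
    if erasure_coding:
        if ftt == 1:
            return 4  # RAID-5: 3 data + 1 parity
        if ftt == 2:
            return 6  # RAID-6: 4 data + 2 parity
        raise ValueError("erasure coding supports FTT=1 or FTT=2 only")
    # RAID-1 mirroring: FTT+1 replicas, each split into stripe_width
    # stripes, plus at least one witness component for tie-breaking.
    return (ftt + 1) * stripe_width + 1

print(min_components(ftt=1))                       # 3: 2 replicas + 1 witness
print(min_components(ftt=1, stripe_width=2))       # 5: 4 stripes + 1 witness
print(min_components(ftt=2, erasure_coding=True))  # 6: RAID-6
```

The point of the model: raising FTT, switching failure tolerance method, or widening stripes all multiply components, which is why policy choices matter for component limits and rebuild traffic.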
Site Disaster To Tolerate
[Diagram: RAID-1 across sites with a RAID-0 replica at each site, over a 5ms RTT, 10GbE ISL; witness at the 3rd site]
• Each site has one full copy of the data
• The witness component is placed on the remote witness site as a tie-breaker
• Synchronous replication between sites needs a low-latency link
• The witness component only holds metadata and can sit behind a higher-latency link
Site Failure Protection
Local & Remote Protection for Stretched Clusters
[Diagram: vSphere/vSAN cluster across two sites over a 5ms RTT, 10GbE ISL; RAID-1 across sites with RAID-6 within each site; 3rd site for witness]
• Redundancy locally and across sites
• With a site failure, vSAN maintains availability with local redundancy in the surviving site
• No change in stretched cluster configuration steps
• Optimized site-locality logic to minimize I/O traffic across sites
Site Disaster To Tolerate and (Secondary) Failures To Tolerate with Erasure Coding
Stretched Cluster Local Failure Protection – RAID-1
[Diagram: VM's VMDK mirrored across Preferred and Secondary sites, witness at the Tertiary site; data and witness components within each site]
• Each site needs 2N+1 votes to tolerate N failures within the site
• (Local) witness components are added on each site to satisfy votes
• Each site can now independently handle N failures
• Additionally, it can tolerate one site failure

Failures To Tolerate within Site = 1 (RAID-1 with 2 mirrors)
Hosts needed to tolerate 1 failure = 2N+1 = 3
Site Disaster To Tolerate and (Secondary) Failures To Tolerate with Mirroring
Stretched Cluster Local Failure Protection – RAID-6
[Diagram: VM's VMDK protected with RAID-6 within each of the Preferred and Secondary sites, witness at the Tertiary site]
Failures To Tolerate within Site = 2 (RAID-6 can tolerate 2 failures)
Hosts needed for quorum with 2 failures = 2N+1 = 5
Site Disaster To Tolerate and (Secondary) Failures To Tolerate with Erasure Coding
• Each site needs 2N+1 hosts for quorum to tolerate N failures within the site
• RAID-6 protects against 2 host failures, so N = 2
• RAID-6 places 6 stripes (4 data + 2 parity) on 6 unique hosts
• Hence a RAID-6 configuration always spans 6 hosts per site, exceeding the 2N+1 = 5 quorum minimum
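The per-site host arithmetic on this and the previous slide reduces to the generic 2N+1 quorum rule, sketched below (illustrative only; note that RAID-6 data placement itself needs 6 hosts per site, more than the quorum minimum):

```python
def hosts_for_quorum(n_failures: int) -> int:
    """Minimum hosts per site to retain a voting majority
    through n_failures host failures (the 2N+1 rule)."""
    return 2 * n_failures + 1

# RAID-1 tolerating one local failure: quorum needs 3 hosts per site
print(hosts_for_quorum(1))  # 3
# Two local failures tolerated: quorum needs 5 hosts per site,
# though RAID-6's 4+2 layout actually spans 6 distinct hosts
print(hosts_for_quorum(2))  # 5
```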
Site Affinity for Stretched Clusters
• User can specify single site location of VM’s components if site level protection is unnecessary
• Policy driven using SPBM
• Reduces network and storage requirements
• Ideal for solutions that already use application redundancy (Exchange DAGs, SQL Availability groups, etc.)
[Diagram: 3rd site for witness; all of a VM's components placed RAID-0 or RAID-6 on a single site; no site failure protection]
I/O Traffic in Stretched Clusters
vSAN Stretched Cluster - Writes
[Diagram: RAID-1 across sites with RAID-6 within each site over a 5ms RTT, 10GbE ISL; a proxy on the remote site fans out write copies locally; 3rd site for witness]
• Data needs to be sent to multiple replicas on the remote site
• This would consume network bandwidth if not optimized
• vSAN sends a single copy to a proxy on the remote site
• The proxy replicates the copy locally as many times as required
• Network bandwidth is conserved
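A back-of-the-envelope model of what the proxy saves on the inter-site link (illustrative only; it ignores metadata, acknowledgements, and resync traffic, and the replica count is an assumed example):

```python
def isl_bytes_per_write(payload_bytes: int, remote_replicas: int,
                        use_proxy: bool) -> int:
    """Bytes crossing the inter-site link for one write.

    Without the proxy optimization, each remote replica would receive
    its own copy over the ISL; with it, one copy crosses the link and
    the proxy host fans it out locally."""
    copies_over_isl = 1 if use_proxy else remote_replicas
    return payload_bytes * copies_over_isl

# Example: a 64 KiB write landing on 6 remote components (RAID-6 layout)
print(isl_bytes_per_write(65536, remote_replicas=6, use_proxy=False))  # 393216
print(isl_bytes_per_write(65536, remote_replicas=6, use_proxy=True))   # 65536
```

In this model the proxy cuts ISL write traffic by a factor equal to the number of remote copies, which is the conservation effect the slide describes.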
vSAN Stretched Clusters – Reads
• Read Locality is enabled by default
• Reads occur only on the site the VM resides on
• VMs typically do not move between sites
  – Consider cache re-warm in vSAN hybrid architectures
[Diagram: RAID-1 across sites with a RAID-0 replica at each site over a 5ms RTT, 10GbE ISL; 3rd site for witness]
Failure Scenarios
Failure Scenario #1: Site Isolation
[Diagram: VM with RAID-1 across sites and RAID-6 within each site; Preferred Site marked failed]
• Preferred Site is Down
Failure Scenario #2: Multiple Host Failures
[Diagram: same layout; Preferred Site failed and one host failed in the Secondary Site]
• Preferred Site is Down
• One Failure in Secondary
Failure Scenario #3: Multiple Site Failures
[Diagram: Preferred Site failed, one host failed in the Secondary Site, and the Witness Site failed]
• Preferred Site is Down
• One Failure in Secondary
• Witness Site is Down
Failure Scenario #3 – Votes Matter: Multiple Site Failures
[Diagram: Preferred Site failed, one host failed in the Secondary Site, and the Witness Site failed; per-component votes shown]
Dead votes: 11 · Available votes: 8 (of 19 total – no quorum)
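The vote math can be expressed as a simple rule of thumb (a simplified model: an object stays accessible only with a strict majority of votes AND a full copy of its data):

```python
def object_accessible(available_votes: int, total_votes: int,
                      full_copy_available: bool) -> bool:
    """Simplified vSAN accessibility rule: a strict majority of
    votes must be reachable AND a full copy of the data must exist."""
    return full_copy_available and available_votes > total_votes / 2

# Scenario #3 above: 11 of 19 votes dead, only 8 reachable.
# Even though the surviving site still holds a full RAID-6 copy,
# the object loses quorum and becomes inaccessible.
print(object_accessible(8, 19, full_copy_available=True))  # False
```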
Failure Scenario #4: Inter-site Disconnect
[Diagram: inter-site link disconnected; Secondary Site unreachable from the Preferred Site]
• Secondary Site is Down
Failure Scenario #4: Inter-site Disconnect + Host Failures
[Diagram: inter-site link disconnected and two hosts failed in the Preferred Site]
• Secondary Site is Down
• Two Host Failures in Primary
Summary of Failure Semantics
• ANY failure on a node – disk, disk group, NIC – leads to component failure.
• Site isolation is equivalent to the entire site going down for replicated VMs.
• Failures are hierarchical – an object is inaccessible after two site failures, irrespective of the secondary FTT setting.
• Multiple failures within a site can equal a site failure.
• Object accessibility depends on: availability of a full copy of the data, and availability of a quorum of votes.
Questions?
DON’T FORGET TO FILL OUT YOUR SURVEY.
#vmworld #HCI2088BE
THANK YOU!
#vmworld #HCI2088BE