
Disaster Recovery by Stretching Hyper-V Clusters Across Sites
Symon Perriman
Program Manager II
Clustering & High-Availability
Microsoft Corporation

SESSION CODE: VIR303

Session Objectives And Takeaways

Session Objectives:
- Understanding the need and benefit of multi-site clusters
- What to consider as you plan, design, and deploy your first multi-site cluster

Windows Server Failover Clustering with Hyper-V is a great solution for not only high availability, but also disaster recovery

Multi-Site Clustering

Introduction | Networking | Storage | Quorum

Defining High-Availability

But what if there is a catastrophic event and you lose the entire datacenter?

Site A

High-Availability (HA) allows applications or VMs to maintain service availability by moving them between nodes in a cluster

Defining Disaster Recovery

Disaster Recovery (DR) allows applications or VMs to maintain service availability by moving them to a cluster node in a different physical location

Site B

Node is located at a physically separate site

SAN

Site A Site B

Benefits of a Multi-Site Cluster

Protects against loss of an entire location: power outage, fires, hurricanes, floods, earthquakes, terrorism

Automates failover: reduced downtime, lower-complexity disaster recovery plan

Reduces administrative overhead: automatically synchronizes application and cluster changes; easier to keep consistent than standalone servers

What is the primary reason why DR solutions fail?

Dependence on People

Flexible Hardware

Two simple requirements for support:
1. All components must be logoed
   http://www.microsoft.com/whdc/winlogo/default.mspx
2. Complete solution must pass the Cluster Validation Test
   http://technet.microsoft.com/en-us/library/cc732035.aspx

The same Windows Server 2008 hardware will work — no reason not to move to R2!

CSV has the same storage requirements: iSCSI, Fibre Channel, or Serial-Attached SCSI

Support Policy: KB 943984

Multi-Site Clustering

Introduction | Networking | Storage | Quorum

Stretching the Network

Longer distance traditionally means greater network latency. Missed inter-node health checks can cause false failover. Cluster heartbeating is fully configurable.

SameSubnetDelay (default = 1 second): frequency at which heartbeats are sent

SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down

CrossSubnetDelay (default = 1 second): frequency at which heartbeats are sent to nodes on dissimilar subnets

CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down for nodes on dissimilar subnets

Command Line: Cluster.exe /prop
PowerShell (R2): Get-Cluster | fl *
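As a sketch of tuning these settings (the values below are illustrative assumptions, not recommendations), the R2 PowerShell module exposes them as cluster common properties, specified in milliseconds:

```powershell
# Sketch: inspect and relax cross-subnet heartbeat settings on a stretched cluster
Import-Module FailoverClusters

# View the four heartbeat-related cluster common properties
Get-Cluster | Format-List *Subnet*

# Tolerate higher inter-site latency: send cross-subnet heartbeats every
# 2 seconds and allow 10 missed heartbeats (a 20-second detection window)
$cluster = Get-Cluster
$cluster.CrossSubnetDelay     = 2000   # milliseconds
$cluster.CrossSubnetThreshold = 10
```

Raising these values delays failure detection, so they trade faster failover against fewer false failovers over a slow WAN.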

Security over the WAN

Encrypt inter-node communication:
0 = clear text
1 = signed (default)
2 = encrypted
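A minimal sketch of checking and raising the security level from PowerShell (R2), using the cluster's SecurityLevel common property:

```powershell
# Sketch: encrypt inter-node cluster traffic crossing the WAN
Import-Module FailoverClusters

(Get-Cluster).SecurityLevel        # 0 = clear text, 1 = signed (default), 2 = encrypted
(Get-Cluster).SecurityLevel = 2    # sign and encrypt inter-node communication
```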

Site A

10.10.10.1 20.20.20.1

30.30.30.1 40.40.40.1

Site B

Network Considerations

Network deployment options:
1. Stretch VLANs across sites
2. Cluster nodes reside in different subnets

Site A

Public Network

10.10.10.1 20.20.20.1

30.30.30.1 40.40.40.1

Redundant Network

Site B

DNS Considerations: with nodes in dissimilar subnets, the VM obtains a new IP address, and clients need that new IP address from DNS to reconnect

10.10.10.111 20.20.20.222

DNS Server 1 | DNS Server 2 | DNS Replication

Record Created

VM = 10.10.10.111

Record Updated

VM = 20.20.20.222

Site A Site B

Record Updated | Record Obtained

Faster Failover for Multi-Subnet Clusters

RegisterAllProvidersIP (default = 0 for FALSE): determines whether all IP addresses for a Network Name will be registered in DNS
TRUE (1): IP addresses can be online or offline and will still be registered
Ensure the application is set to try all IP addresses, so clients can come online quicker

HostRecordTTL (default = 1200 seconds): controls how long the DNS record for a cluster network name lives in the client cache
Shorter TTL: DNS records on clients are updated sooner
Exchange Server 2007 recommends a value of five minutes (300 seconds)
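A hedged sketch of applying these two settings to a clustered network name resource ("VM Network Name" is a placeholder resource name — use the actual Network Name resource in your group):

```powershell
# Sketch: tune DNS behavior of a clustered network name for faster
# cross-subnet client reconnects
Import-Module FailoverClusters

$res = Get-ClusterResource "VM Network Name"

# Register every provider IP (online or offline) in DNS
$res | Set-ClusterParameter RegisterAllProvidersIP 1

# Shorten client-side caching of the record to 5 minutes
$res | Set-ClusterParameter HostRecordTTL 300

# Restart the resource so the new parameters take effect
$res | Stop-ClusterResource
$res | Start-ClusterResource
```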

Solution #1: Local Failover First
Configure local failover first for high availability

No change in IP addresses; no DNS replication issues; no data going over the WAN

Cross-site failover for disaster recovery
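One way to sketch "local failover first" is to order a group's preferred owners so same-site nodes are tried before remote ones (node and group names below are placeholders):

```powershell
# Sketch: prefer local failover by listing Site A nodes first as
# preferred owners of a VM group
Import-Module FailoverClusters

Set-ClusterOwnerNode -Group "Virtual Machine VM1" `
    -Owners SiteA-Node1,SiteA-Node2,SiteB-Node1,SiteB-Node2

# Verify the ordering
Get-ClusterOwnerNode -Group "Virtual Machine VM1"
```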

10.10.10.111

DNS Server 1

VM = 10.10.10.111

Site A Site B

20.20.20.222

Solution #2: Stretch VLANs

Deploying a VLAN minimizes client reconnection times; the IP of the VM never changes

DNS Server 1 DNS Server 2

FS = 10.10.10.111

Site A Site B

10.10.10.111

VLAN

Solution #3: Abstraction in Networking Device

The networking device uses an independent third IP address, which is registered in DNS and used by clients

10.10.10.111 20.20.20.222

DNS Server 1

DNS Server 2

VM = 30.30.30.30
Site A  Site B

30.30.30.30

Cluster Shared Volumes Networking Considerations

CSV does not support having nodes in dissimilar subnets. Use VLANs if you want to use CSV with multi-site clusters.

Note: CSV and live migration are independent, but complementary, technologies
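A quick sketch for checking whether the cluster networks satisfy CSV's single-subnet requirement — each network's address and role can be listed from PowerShell (R2):

```powershell
# Sketch: review cluster networks and their subnets; for CSV, nodes must
# share subnets (e.g. via a stretched VLAN) rather than span dissimilar ones
Import-Module FailoverClusters

Get-ClusterNetwork | Format-Table Name,Address,AddressMask,Role -AutoSize
```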

Site A Site B

VLAN / CSV Network

Updating a VM's IP Address on Cross-Subnet Failover

On cross-subnet failover, if guest is…

Best to use DHCP in guest OS for cross-subnet failover

DHCP: IP updated automatically
Static IP: admin needs to configure the new IP; can be scripted
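For the static-IP case, the update can be scripted inside the guest OS; a rough sketch using netsh (the interface name, addresses, and gateway are placeholders for your environment):

```powershell
# Sketch: re-address a guest after a cross-subnet failover to Site B
netsh interface ip set address name="Local Area Connection" static 20.20.20.222 255.255.255.0 20.20.20.1 1
netsh interface ip set dns name="Local Area Connection" static 20.20.20.10

# Re-register the guest's records so clients learn the new address sooner
ipconfig /registerdns
```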

Live Migrating Across Sites

Live migration moves a running VM between cluster nodes; TCP reconnects make the move unnoticeable to clients

Use VLANs to achieve live migrations between sites; the IP the client is connected to will not change

Network Bandwidth Planning
Live migration may require significant network bandwidth, based on the amount of memory allocated to the VM
Live migration times will be longer over high-latency or low-bandwidth WAN connections
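A back-of-the-envelope sketch for that planning (the numbers are assumptions; real transfers also re-copy memory pages dirtied during the migration, so this is a lower bound):

```powershell
# Sketch: minimum time to copy a VM's memory across the WAN link
$vmMemoryGB = 8        # memory allocated to the VM (assumed)
$wanMbps    = 1000     # usable WAN bandwidth, megabits/second (assumed)

$seconds = ($vmMemoryGB * 8 * 1024) / $wanMbps
"Minimum transfer time: ~{0:N0} seconds" -f $seconds
```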

Multi-Subnet vs. VLAN Recap

VLAN: Live Migration (seamless), fast failover, Cluster Shared Volumes, static IPs in the guest — at the cost of stretched-network complexity

Multi-Subnet: Quick Migration and greater deployment flexibility, with no stretched VLAN to maintain

Choosing the right networking model for you depends on your business requirements

Multi-Site Clustering

Introduction | Networking | Storage | Quorum

Storage in Multi-Site Clusters

Different from local clusters:
- Multiple storage arrays, independent per site
- Nodes commonly access their own site's storage
- No 'true' shared disk visible to all nodes

Site B

SAN

Site A Site B

Storage Considerations

Site A

Changes are made on Site A and replicated to Site B

DR requires data replication mechanism between sites

Site B

SAN

Site A Site B

Replica

Replication Partners

Hardware storage-based replication: block-level replication

Software host-based replication: file-level replication

Appliance replication: file-level replication

Synchronous Replication

Host receives “write complete” response from the storage after the data is successfully written on both storage devices

PrimaryStorage

SecondaryStorage

WriteComplete

Replication

Acknowledgement

WriteRequest

Asynchronous Replication

Host receives a “write complete” response from the storage after the data is successfully written to just the primary storage device; replication to the secondary follows

PrimaryStorage

SecondaryStorage

WriteComplete

WriteRequest

Replication

Synchronous versus Asynchronous

Synchronous: no data loss; requires a high-bandwidth/low-latency connection; stretches over shorter distances; write latencies impact application performance

Asynchronous: potential data loss on hard failures; needs enough bandwidth to keep up with data replication; stretches over longer distances; no significant impact on application performance

Cluster Validation with Replicated Storage

Multi-Site clusters are not required to pass the Storage tests to be supported

Validation Guide and Policyhttp://go.microsoft.com/fwlink/?LinkID=119949

What about DFS-Replication?

It is not supported to use the file server DFS-R feature to replicate VM data on a multi-site failover cluster

DFS-R performs replication on file close:
- Works well for Office documents
- Not designed for application workloads where the file is held open, like VHDs or databases

Cluster Shared Volume Overview

Cluster Shared Volumes (CSV): a distributed file access solution for Hyper-V
- Enables multiple nodes to concurrently access a single 'truly' shared volume
- Provides VMs complete transparency with respect to which node actually owns a LUN
- Guest VMs can be moved without requiring any disk ownership changes

No dismounting and remounting of volumes is required

Disk5

Single Volume

VHD VHD VHD

SAN

Concurrent access to a single file system

Site A  Site B

CSV with Replicated Storage

Traditional architectural assumptions do not hold true: traditional replication solutions assume only one array is accessed at a time, while CSV assumes all nodes can concurrently access a LUN

CSV is supported by many replication vendors. Talk to your storage vendor to understand their support story.

VHD

Read/Only | Read/Write

VM attempts to access replica

Site A  Site B

Storage Virtualization Abstraction

Some replication solutions provide complete abstraction in the storage array:
- Servers are unaware of the accessible disk location
- Fully compatible with Cluster Shared Volumes (CSV)

Virtualized storage presents logical LUN

Servers abstracted from storage

Choosing a Multi-Site Storage Model

Options: Traditional Cluster Storage vs. Cluster Shared Volumes

Live Migration with hardware, software, or appliance replication: consult your vendor about support for each storage model

Choosing the right storage model for you depends on your business requirements

EMC for Windows Server Failover Clustering
Txomin Barturen
Senior Manager, Symmetrix and Virtualization
EMC Corporation

PARTNER

What’s Storage Got To Do With It?
Storage controllers can be powerful compute and replication resources, providing multiple forms of replication styles

Synchronous – Metro configurations
Asynchronous – Continental configurations
…and various combinations of those

Arrays/Appliances are able to provide Consistency Technology to replication

Binds databases and transaction logs together as an atomic unit; required for disaster recovery scenarios

A single consolidated solution for all environments, as opposed to per-application solutions; operational ease and automated operations

Geographical Windows Clustering
Long history of geographical Windows solutions

The original “GeoSpan” was introduced in the 1990s; the current product is called “Cluster Enabler”

Support for multiple storage replication mechanisms

Symmetrix Remote Data Facility (SRDF)
CLARiiON MirrorView
EMC RecoverPoint (Appliance)

Support for multiple replication implementations

Synchronous (SRDF/S, MV/S, RP)Asynchronous (SRDF/A, MV/A, RP)

Select the best replication fit for SLA

Cluster Enabler – Integration with Failover Clustering
Cluster Enabler is implemented as a cluster group resource

DLL manages disk state when necessary

Disaster or site move requests
Custom MMC for administration

Provides insight into replication relationships; allows for management of storage resources

Add/remove storage devices
All cluster functions managed through Failover Cluster Manager

Simplified management

Unique Cluster Configuration Support
Concurrent Replication

Cascaded Replication

Heterogeneous Replication

Challenges of Block Storage Replication
Storage block-level replication is typically uni-directional (per LUN)

Changed blocks flow from the source site to the remote site
It is possible to have different LUNs replicating in different directions
Storage cannot enforce block-level collision resolution

Application must determine resolution, or be coordinated

Applications today implement a shared-nothing model
Surfacing storage as read/write at multiple sites is only useful if the application can handle a distributed-access device
Few applications implement the necessary support

Obvious exception is CSV

EMC VPLEX METRO support for Hyper-V and Cluster Shared Volumes

ANNOUNCING

Federated Storage Infrastructure

A new hardware and software platform that extends storage beyond the boundaries of the data center
Located in the SAN to present hosts with a federated view of EMC and heterogeneous storage

VPLEX Local and VPLEX Metro configurations

Unique value:

Distributed coherent cache – AccessAnywhere™
N+1 scale-out cluster
Data at a distance, architected for global apps

Workload “travels” with application

Sample VPLEX METRO Configuration

CSV - Volume1 - OS VHDs
CSV - Volume2 - OS VHDs

CSV - Volume3 - OS VHDs

CSV - Volume4 - OS VHDs

NewYork-01

NewYork-02

NewYork-03

NewYork-04

NewJersey-01
NewJersey-02

NewJersey-03

NewJersey-04

VPLEX Cluster-1

VPLEX Cluster-2

CSV - Volume1 - SQL VHDs
CSV - Volume2 - SQL VHDs

CSV - Volume3 - SQL VHDs

CSV - Volume4 - SQL VHDs

EMC VPLEX Metro with Cluster Shared Volumes

DEMO

Multi-Site Clustering

Introduction | Networking | Storage | Quorum

Quorum Overview

Majority is greater than 50%
Possible voters: nodes (1 vote each) + 1 witness (Disk or File Share)

4 quorum types:
- Node Majority
- Node and Disk Majority
- Node and File Share Majority
- Disk Only (not recommended)
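For example, switching a cluster to Node and File Share Majority can be sketched with the R2 quorum cmdlets (the share path is a placeholder; the share would ideally live in a third site, per the guidance below):

```powershell
# Sketch: review the current quorum model, then move to
# Node and File Share Majority for a multi-site cluster
Import-Module FailoverClusters

Get-ClusterQuorum                                        # current model
Set-ClusterQuorum -NodeAndFileShareMajority \\Witness\ClusterFSW
```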

Replicated Disk Witness

A witness is a tie-breaker when nodes lose network connectivity. The witness disk must be a single decision maker, or problems can occur.

Do not use a Disk Witness in multi-site clusters unless directed by vendor

Replicated Storage


Node Majority

Site A  Site B

Cross-site network connectivity broken!

Each node asks: can I communicate with a majority of the nodes in the cluster?
Yes → stay up
No → drop out of cluster membership

5-node cluster: majority = 3

Majority in Primary Site

Node Majority

Disaster at Site A!

Surviving nodes ask: can I communicate with a majority of the nodes in the cluster?
No → drop out of cluster membership

5-node cluster: majority = 3
Need to force quorum manually

Site A

We are down!

Site B

Majority in Primary Site

Forcing Quorum

Forcing quorum is a way to manually override and start a node even if the cluster does not have quorum

Important: understand why quorum was lost
The cluster starts in a special “forced” state
Once majority is achieved, it drops out of the “forced” state

Command Line: net start clussvc /fixquorum (or /fq)

PowerShell (R2): Start-ClusterNode -FixQuorum (or -fq)
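A sketch of the recovery sequence after losing a site (the node name is a placeholder): force quorum on a surviving node, then check which nodes have rejoined before failing workloads back.

```powershell
# Sketch: force quorum on a surviving Site B node after losing Site A,
# then confirm node states
Import-Module FailoverClusters

Start-ClusterNode -Name SiteB-Node1 -FixQuorum

Get-ClusterNode | Format-Table Name,State -AutoSize
```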

Multi-Site with File Share Witness

Site A Site B

Site C (branch office)

Complete resiliency and automatic recovery from the loss of any one site

\\Foo\Share

WAN

File Share Witness


Multi-Site with File Share Witness

\\Foo\Share

WAN

Complete resiliency and automatic recovery from the loss of connection between sites

Site A  Site B

One site asks: can I communicate with a majority of the nodes in the cluster? No (witness lock failed) → drop out of cluster membership

The other site asks: can I communicate with a majority of the nodes (+ FSW) in the cluster? Yes → stay up

Site C (branch office)

File Share Witness (FSW) Considerations

A simple Windows file server; a single file server can serve as a witness for multiple clusters

Each cluster requires its own share
Can be made highly available on a separate cluster
Recommended to be at a 3rd, separate site for DR

FSW cannot be on a node in the same cluster
FSW should not be in a VM running on the same cluster
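Setting up such a witness share might be sketched as follows (the server, path, and cluster computer account names are placeholders; matching NTFS permissions for the cluster account are also required):

```powershell
# Sketch: create a witness share on a third-site file server and grant
# the cluster's computer account full control
New-Item -Path C:\ClusterFSW -ItemType Directory
net share ClusterFSW=C:\ClusterFSW /GRANT:DOMAIN\MyCluster$,FULL

# Then, from a cluster node, point the cluster at the share:
Set-ClusterQuorum -NodeAndFileShareMajority \\WitnessServer\ClusterFSW
```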

Quorum Model Recap

Node and File Share Majority: even number of nodes; the highest-availability solution has the FSW in a 3rd site

Node Majority: odd number of nodes; more nodes in the primary site

Node and Disk Majority: use as directed by vendor

No Majority: Disk Only: not recommended; use as directed by vendor

Multi-Site Clustering Content
Design guide: http://technet.microsoft.com/en-us/library/dd197430.aspx
Deployment guide/checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx

Session Summary

Multi-site failover clusters have many benefits. You can achieve high availability and disaster recovery in a single solution using Windows Server Failover Clustering.

Multi-site clusters have additional considerations:
- Determine network topology across sites
- Choose a storage replication solution
- Plan quorum model and nodes

Passion for High Availability?

Are You Up For a Challenge?

Become a Cluster MVP!

Contact: ClusMVP@microsoft.com

Related Content

Breakout Sessions
WSV313 | Failover Clustering Deployment Success
WSV314 | Failover Clustering Pro Troubleshooting with Windows Server 2008 R2
VIR303 | Disaster Recovery by Stretching Hyper-V Clusters across Sites
ARC308 | High Availability: A Contrarian View
DAT207 | SQL Server High Availability: Overview, Considerations, and Solution Guidance
DAT303 | Architecting and Using Microsoft SQL Server Availability Technologies in a Virtualized World
DAT305 | See the Largest Mission Critical Deployment of Microsoft SQL Server around the World
DAT401 | High Availability and Disaster Recovery: Best Practices for Customer Deployments
DAT407 | Windows Server 2008 R2 and Microsoft SQL Server 2008: Failover Clustering Implementations
UNC304 | Microsoft Exchange Server 2010: High Availability Deep Dive
UNC305 | Microsoft Exchange Server 2010 High Availability Design Considerations

Interactive Sessions
VIR06-INT | Failover Clustering with Hyper-V Unleashed with Windows Server 2008 R2
UNC01-INT | Real-World Database Availability Group (DAG) Design
VIR02-INT | Hyper-V Live Migration over Distance: A Multi-Datacenter Approach
BOF34-IT | Microsoft Exchange Server High Availability and Disaster Recovery: Are You Prepared?

Hands-on Labs
WSV01-HOL | Failover Clustering in Windows Server 2008 R2
DAT01-HOL | Create a Two-Node Windows Server 2008 R2 Failover Cluster
DAT02-HOL | Create a Windows Server 2008 R2 MSDTC Cluster
DAT09-HOL | Installing a Microsoft SQL Server 2008 + SP1 Clustered Instance
DAT12-HOL | Maintaining a Microsoft SQL Server 2008 Failover Cluster
UNC02-HOL | Microsoft Exchange Server 2010 High Availability and Storage Scenarios
VIR06-HOL | Implementing High Availability and Live Migration with Windows Server 2008 R2 Hyper-V

Visit the Cluster Team in the TLC

Failover Clustering Booth

WSV-7

Failover Clustering Resources
Cluster Team Blog: http://blogs.msdn.com/clustering/

Cluster Resources: http://blogs.msdn.com/clustering/archive/2009/08/21/9878286.aspx

Cluster Information Portal: http://www.microsoft.com/windowsserver2008/en/us/clustering-home.aspx

Clustering Technical Resources: http://www.microsoft.com/windowsserver2008/en/us/clustering-resources.aspx

Clustering Forum (2008): http://forums.technet.microsoft.com/en-US/winserverClustering/threads/

Clustering Forum (2008 R2): http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2highavailability/threads/

R2 Cluster Features: http://technet.microsoft.com/en-us/library/dd443539.aspx

Multi-Site Clustering Design guide: http://technet.microsoft.com/en-us/library/dd197430.aspx

Multi-Site Clustering Deployment guide/checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx

Hyper-V Business Continuity portal: http://www.microsoft.com/virtualization/en/us/solution-continuity.aspx

Microsoft Cross-Site Disaster Recovery Solutions whitepaper: http://download.microsoft.com/download/3/6/1/36117F2E-499F-42D7-9ADD-A838E9E0C197/SiteRecoveryWhitepaper_final_120309.pdf

Virtualization Track Resources
Stay tuned into virtualization at TechEd NA 2010 by visiting our event website, Facebook and Twitter pages. Don’t forget to visit the Virtualization TLC area (orange section) to see product demos, speak with experts, and sign up for promotional giveaways.
Microsoft.com/Virtualization/Events
Facebook.com/Microsoft.Virtualization
Twitter.com/MS_Virt
Like this session? Write a blog on 2 key learnings from this session and send it to #TE_VIR and you could win a Lenovo IdeaPad™ S10-3 with Windows 7 Netbook! Review the rules on our event website: Microsoft.com/Virtualization/Events

Resources

Sessions On-Demand & Community: www.microsoft.com/teched
Microsoft Certification & Training Resources: www.microsoft.com/learning
Resources for IT Professionals: http://microsoft.com/technet
Resources for Developers: http://microsoft.com/msdn

Complete an evaluation on CommNet and enter to win!

Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st

http://northamerica.msteched.com/registration

You can also register at the North America 2011 kiosk located at registration. Join us in Atlanta next year!

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

JUNE 7-10, 2010 | NEW ORLEANS, LA