The 10 Most Important Things About Failover Clustering

Jim Teague, Program Manager, Microsoft Corporation

Agenda

Failover Clustering in Windows Server Longhorn is different. Here are 10 big changes you need to know about:

1. Cluster Validation

2. Revamped Setup

3. New Cluster Experience

4. Networking Enhancements

5. Getting to Longhorn

6. New Security Model

7. New Quorum Model

8. Geographically Dispersed Cluster Enhancements

9. Shared Storage Topologies

10. Storage Compatibility

Terminology Changes

Beta: Wolfpack

Windows NT 4.0: Microsoft Cluster Service (MSCS)

Windows 2000 Server / Windows Server 2003: Server Clustering

Windows Server codenamed Longhorn: Failover Clustering (WSFC)

Where Is Clustering Going…

What’s Clustering in Longhorn all about?

Simplicity, Security, Stability

Clusters for people without PhDs

Easy to create, use, and manage

Enabling the IT Generalist

Reduce Clustering Total Cost of Ownership

Making Clusters a smart business choice for the enterprise

Improvements in Security, Networking, Eventing, and Storage

Key Cluster Changes: 1. Cluster Validation

Motivation For Validate

Configuration Issues

Cabling mistakes

SP and Hotfix binaries

Driver mismatches

Inconsistent Settings

Complexity

Best Practices

Supportability Requirements

Hardware Compatibility

If we can eliminate the configuration issues up front, we can ensure a better cluster experience (installation and operation)

48% of Cluster support calls are due to configuration problems (Microsoft PSS)

80% of failures are due to human error (Gartner)

What Is Cluster Validate?

Runs a focused set of tests on a collection of servers that are intended to be a cluster

Catch hardware or configuration problems before the cluster goes into production

Ensures that the solution you are about to deploy is rock solid

Run validate each and every time you install a new cluster

It’s the very first thing you do!

Validate can also be run on configured clusters as a diagnostic tool

Disk Resources need to be in an Offline state to be validated

What Does Validate Inventory?

OS Binary Consistency: Correct and same OS version; same Hotfix and Service Pack level

Architecture: Same CPU architecture

Configuration: Consistent Domain membership

Devices: Analysis of unsigned drivers; Storage HBAs and Networking NICs
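As a rough illustration of the inventory idea, the sketch below compares a few facts across nodes and reports every field that differs. It is not how Validate is implemented; the record layout, the field names, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical inventory fields; Validate's real schema is not described in the deck.
FIELDS = ("os_version", "service_pack", "cpu_arch", "domain")


def find_inconsistencies(inventory):
    """Return {field: {value: [nodes]}} for every field that differs across nodes."""
    problems = {}
    for field in FIELDS:
        nodes_by_value = defaultdict(list)
        for node, facts in inventory.items():
            nodes_by_value[facts[field]].append(node)
        if len(nodes_by_value) > 1:  # more than one distinct value means a mismatch
            problems[field] = dict(nodes_by_value)
    return problems


# Illustrative data only: node2 has a different CPU architecture.
inventory = {
    "node1": {"os_version": "6.0", "service_pack": "SP0",
              "cpu_arch": "x64", "domain": "corp.example"},
    "node2": {"os_version": "6.0", "service_pack": "SP0",
              "cpu_arch": "x86", "domain": "corp.example"},
}
report = find_inconsistencies(inventory)
```

An empty report would mean the nodes agree on every inventoried field; here only the architecture mismatch is flagged.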

What Does Validate Verify?

Infrastructure: Inter-cluster communication

Storage: Shared disks accessible from all machines and uniquely identifiable; Shared Storage Persistent Reservation functionality and compliance; Disk I/O latencies

Network: Each NIC has a different IP address; Network I/O latencies

Functionality: Failover simulation

Key Cluster Changes: 2. Revamped Setup

Easy To Create Clusters

Simple: Create an entire cluster in one step. Setup is streamlined, simplified, and intuitive.

Validation: All the power of a full cluster test suite in your hands to ensure that the actual cluster you are setting up will provide rock solid stability. Catch configuration issues.

Deployable: Fully scriptable for automated deployments. New Create Cluster API allows a fully customizable experience.

Create The Cluster: Windows Server 2003 (today)

[Screenshots of the Windows Server 2003 setup wizard: Create The Cluster, through step 12, including "Enter Cluster Services Account…" at steps 7 and 8; then Add Second Node, steps 13 to 23.]

Finally!

LH Easy To Create Clusters

Simple: Setup is simplified and intuitive

Validation: The Validate tool is the 1st menu step

Fast: Create a 16 node cluster in one step

Cluster Setup

[Screenshots: Cluster Setup, three slides.]

It's that easy!

Key Cluster Changes: 3. New Cluster Experience

New User Experience

All New Cluster Management Tool!!

Designed to be task-based and easy to use

Fewer dials-n-knobs to worry about

What’s all this IsAlive/LooksAlive stuff I don’t care about? Just make my cluster work!

Tell us what you want to do and we’ll take care of the rest

I would like to make this File Share Highly Available…

Cluster Administrator Tool Today…

New Cluster MMC Snap-In

Manageability

Command line (cluster.exe)

Cluster Management Console

Fully Scriptable with WMI

Richer Tool Experience

Exposes Advanced Options

Task Oriented

Phasing out MSClus

Cluster MOM Management Pack

Key Cluster Changes: 4. Networking Enhancements

Networking Enhancements

Integrated with new Longhorn TCP/IP Stack

Full IPv6 Support

IPv6 support for client access, both native and tunneled

Inter-node communication with IPv6

DHCP Support for IPv4 Resources

Obtain cluster IP address from a DHCP server

Relieves management pain of static IPs

Networking Enhancements

No more legacy dependencies on NetBIOS

Ready for NetBIOS-less environments

Simplifying the transport of SMB traffic

Removing WINS and NetBIOS name resolution broadcasts

Standardizing name resolution on DNS

Moved from datagram RPC protocols to more secure TCP session-oriented protocols

Improvements in IPSec to allow almost instantaneous failover for clients

Key Cluster Changes: 5. Getting to Longhorn

Cluster Migrations

Cluster Migration Tool

Will assist migration of a cluster configuration from one cluster to another

Copies resources and cluster configurations from one cluster to another

No Mixed Version Compatibility

A Longhorn node and a Windows Server 2003 node cannot be in the same cluster at the same time

No rolling upgrades

Migrating To Longhorn: Step 1

All nodes running Windows Server 2003

Group owned by Node 1 hosting IP Address, Network Name, Physical Disk, and File Share resources

[Diagram: a single Windows Server 2003 cluster. Node 1 and Node 2 attach to a SAN; the group holds the IP, Name, Disk, and File Share resources.]

Migrating To Longhorn: Step 2

Evict Node 2 from the Windows Server 2003 cluster

Perform a clean install of Longhorn on Node 2

Create an independent single-node cluster with a new Cluster Name

[Diagram: two separate clusters attached to the SAN. Node 1 runs Windows Server 2003 and hosts the IP, Name, Disk, and File Share resources; Node 2 runs Longhorn.]

Migrating To Longhorn: Step 3

Run the Longhorn Migration Wizard on Node 2

Designate Node 2 as target and Node 1 as source

Perform a group-by-group copy of resources

Groups are created in an Offline state

[Diagram: Node 1 holds the IP, Name, Disk, and File Share resources Online; Node 2 holds copies of the same resources Offline. Both nodes attach to the SAN.]

Migrating To Longhorn: Step 4

Longhorn Cluster is pre-staged and ready for migration

Bring group Offline on Node 1

Bring group Online on Node 2

[Diagram: the IP, Name, Disk, and File Share resources are Offline on Node 1 and Online on Node 2. Both nodes attach to the SAN.]

Migrating To Longhorn: Step 5

Install Longhorn on Node 1

Join Node 1 to the existing Longhorn cluster with Node 2

Resources can now be failed back and forth

Migration is now complete!

[Diagram: a single cluster again. Node 1 and Node 2 attach to the SAN; the IP, Name, Disk, and File Share resources are Online.]
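The five steps above can be walked through with a toy model. This is only a sketch of the sequence on plain dictionaries, not the Migration Wizard; the function names, the dictionary layout, and the group name "FileGroup" are invented for illustration. The point it shows is the ordering in Step 4: the group goes Offline on the old cluster before it comes Online on the new one, so the shared disk is never claimed by both.

```python
def copy_groups(source, target):
    """Step 3: copy each group to the target cluster, created in an Offline state."""
    for name, group in source["groups"].items():
        target["groups"][name] = {"resources": list(group["resources"]),
                                  "state": "Offline"}


def cut_over(source, target, name):
    """Step 4: take the group Offline on the source, then Online on the target."""
    source["groups"][name]["state"] = "Offline"
    target["groups"][name]["state"] = "Online"


def online_in_both(source, target):
    """Groups that are Online in both clusters at once (should always be empty)."""
    return [n for n, g in source["groups"].items()
            if g["state"] == "Online"
            and target["groups"].get(n, {}).get("state") == "Online"]


# Step 1: one Windows Server 2003 cluster, group owned by Node1.
old = {"nodes": ["Node1", "Node2"],
       "groups": {"FileGroup": {"resources": ["IP Address", "Network Name",
                                              "Physical Disk", "File Share"],
                                "state": "Online"}}}
# Step 2: evict Node2 and build a separate single-node Longhorn cluster on it.
old["nodes"].remove("Node2")
new = {"nodes": ["Node2"], "groups": {}}
# Step 3: pre-stage the groups on the new cluster.
copy_groups(old, new)
staged_state = new["groups"]["FileGroup"]["state"]
# Step 4: move the workload.
cut_over(old, new, "FileGroup")
# Step 5: reinstall Node1 and join it to the Longhorn cluster.
old["nodes"].remove("Node1")
new["nodes"].append("Node1")
```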

Key Cluster Changes: 6. New Security Model

Service Manageability

Improved Security Model

Cluster Service now runs in the context of the LocalSystem built-in account

No more Cluster Service Account (CSA)

No more account password management

No need to pre-stage defined user accounts

More resilient to configuration issues

Addresses supportability issues where privileges are accidentally stripped by group policies

Increased security

New Security Context

How does this impact you?

Cluster Service starts with set privileges

Resource Hosting Subsystem launched in the same context with the same privileges

Resource DLLs and Applications are launched in the same context as RHS with the same set of privileges

No common identity

In short, any custom resource DLL or applications leveraging the Generic Application or Generic Script resource types will have reduced privileges and no remote-ability

You are responsible for handling the credentials your applications require

Test your apps and resources with Windows Server Longhorn!

Key Cluster Changes: 7. New Quorum Model

New Quorum Model

Majority based cluster membership

Who and what gets a vote is fully configurable

Eliminating failure points

Original design assumed that storage would be always available

New best-of-both-worlds quorum model

Hybrid of legacy Majority Node Set (MNS) logic and Shared Disk Quorum model

This model will replace both existing models

No single point of failure!

Can survive loss of the Quorum disk

Majority Quorum Model

New majority based quorum model

Majority of Nodes based quorum

Disk is optional witness to have a vote in deciding majority

3 total votes, with 2 needed for majority

So the Cluster can survive the loss of any 1 vote

[Diagram: Node 1 and Node 2 attached to a SAN. Each node counts as 1 vote; the Shared Storage Device gets 1 vote.]

Majority Of Nodes

Only Nodes get votes

3+ Node votes without Shared Storage vote

Majority of votes needed to operate cluster

No shared disk vote

[Diagram: Node 1, Node 2, and Node 3, each with 1 vote, using Replicated Storage Devices.]

Witness Disk

Only Disk gets a vote

Nodes have no votes

Quorum disk is the master

Cluster stays up even if only 1 node can talk to the disk

Achieves same behavior as legacy quorum model

[Diagram: Node 1 and Node 2 attached to a SAN. The Shared Storage Device is master and holds the only vote.]

File Share Witness

File Share Witness allows a 2-node cluster with no shared disk

Majority of Nodes + Witness based quorum

Excellent solution for GeoClusters

Witness could reside at a 3rd Site

Single file server could serve as the Witness for multiple clusters

[Diagram: Node 1 and Node 2, each counting as 1 vote, with a Witness File Share on an independent server.]
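The vote counting on the quorum slides above comes down to simple majority arithmetic. The sketch below is a minimal model, not the cluster service's algorithm; the function names and the labels in MODELS are illustrative, and counting the file share witness as one vote is an assumption based on the "Majority of Nodes + Witness" wording.

```python
def votes_needed(total_votes):
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present, total_votes):
    """True while the reachable votes still form a majority."""
    return votes_present >= votes_needed(total_votes)


def tolerated_vote_losses(total_votes):
    """How many votes can be lost before the majority is gone."""
    return total_votes - votes_needed(total_votes)


# Total votes per configuration, as drawn on the slides (labels are illustrative).
MODELS = {
    "2 nodes + witness disk": 3,
    "majority of nodes (3 nodes)": 3,
    "witness disk only": 1,
    "2 nodes + file share witness": 3,  # assumes the witness counts as one vote
}

# Maps each model to (votes needed for majority, vote losses it can survive).
summary = {name: (votes_needed(total), tolerated_vote_losses(total))
           for name, total in MODELS.items()}
```

For the 3-vote configurations this gives 2 votes needed and 1 loss tolerated, matching the "3 total votes, with 2 needed for majority" slide. In the disk-only model the single vote cannot be lost, which is why the disk is described as the master there.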

Key Cluster Changes: 8. Geographically Dispersed Cluster Enhancements

Stretching Clusters

Stretching Nodes across the river used to be good enough…

But businesses are now demanding more!

Geographically Dispersed Clusters

No More Single-Subnet Limitation

Allow cluster nodes to communicate across network routers

No more having to connect nodes with VLANs!

Configurable Heartbeat Timeouts

Increase to extend Geographically Dispersed Clusters over greater distances

Decrease to detect failures faster and take recovery actions for quicker failover
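The heartbeat trade-off can be made concrete with a small calculation. The model here is an assumption for illustration only: a node is declared down after a fixed number of consecutive heartbeats go missing, so detection time is roughly interval times threshold. The parameter names are hypothetical and are not the cluster's actual settings.

```python
def detection_time_ms(interval_ms, missed_threshold):
    """Approximate time to declare a node down under the assumed model."""
    return interval_ms * missed_threshold


def survives_delay(interval_ms, missed_threshold, worst_case_delay_ms):
    """True if a transient network delay is shorter than the detection window."""
    return worst_case_delay_ms < detection_time_ms(interval_ms, missed_threshold)


# Illustrative numbers: a tight setting for a local cluster, a relaxed one for a long link.
local = detection_time_ms(interval_ms=1000, missed_threshold=5)
stretched = detection_time_ms(interval_ms=2000, missed_threshold=10)
```

Raising either value lets the cluster ride out longer delays on a wide-area link; lowering them shortens the time before failover begins.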

Enhanced Dependencies

New Dependency Filter Objects

Network Name resource stays up if either IP Address resource A or B is up

Today both resource A and B have to be online for the Network Name to be available to users

Allows redundant resources and scoping impact to dependent services and applications

[Diagram: Network Name Resource depends on IP Address Resource A OR IP Address Resource B.]
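The difference between today's behavior and the new OR dependency can be sketched as a boolean rule. This is an illustrative model only; the function name and the "AND"/"OR" policy strings are assumptions, not the cluster's dependency syntax.

```python
def network_name_available(ip_online, policy):
    """Decide whether the Network Name can stay up given its IP Address resources.

    ip_online maps an IP resource name to True (online) or False (offline).
    policy is "AND" (today: every IP must be online) or "OR" (Longhorn: any one suffices).
    """
    states = list(ip_online.values())
    if policy == "AND":
        return all(states)
    if policy == "OR":
        return any(states)
    raise ValueError("policy must be 'AND' or 'OR'")


# IP Address Resource A is down (for example, its subnet is unreachable); B is still up.
ips = {"IP Address Resource A": False, "IP Address Resource B": True}
today = network_name_available(ips, "AND")
longhorn = network_name_available(ips, "OR")
```

With one IP address offline, the name is unavailable under the AND rule and stays available under the OR rule.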

Key Cluster Changes: 9. Shared Storage Topologies

Shared Storage Topology Requirements

Only storage that supports Persistent Reservations will be supported in Longhorn Failover Clustering

Deprecating parallel-SCSI support

Serial Attached SCSI (SAS) based clusters will replace parallel-SCSI

Supported Shared Bus Types: Fibre Channel, iSCSI, SAS

Key Cluster Changes: 10. Storage Compatibility

Storage Enhancements

Improved disk fencing for shared disks

Enhanced mechanism to use Persistent Reservations

New algorithm for managing shared disks

No more device resets with PRs!

No longer uses SCSI Bus Resets which can be disruptive on a SAN

Disks are never left in an unprotected state

Tight integration into core OS disk management

Support for GPT disks

Windows Server Longhorn Will Be A Clean Slate

Compatibility

Some hardware may not be upgradeable

Cannot assume solutions that previously worked with clustering will continue to work in Longhorn Clustering

Supportability

There will be no grandfathering of support for currently qualified solutions listed on the Windows Server Catalog

Windows Server 2003 clustering solutions will not necessarily work with failover clustering in Longhorn

Work with your vendor to find out

SCSI Command Requirements

Storage must support the following SCSI-3 SPC-3 compliant SCSI Commands:

Unique IDs

Vital product data (VPD), device identification page (page code 83h) with Identifier Type 2 (EUI-64 based), 3 (NAA), or 8

PERSISTENT RESERVE IN Read Keys (00h)

PERSISTENT RESERVE IN Read Reservation (01h)

PERSISTENT RESERVE OUT Reserve (01h), Scope: LU_SCOPE (0h), Type: Write Exclusive – Registrants Only (5h)

PERSISTENT RESERVE OUT Release (02h)

PERSISTENT RESERVE OUT Clear (03h)

PERSISTENT RESERVE OUT Preempt (04h)

PERSISTENT RESERVE OUT Register AND Ignore Existing Key (06h)
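The requirement list above can be captured as a small checklist. The service action codes, scope, reservation type, and identifier types are copied from the slide; the device capability dictionary and the `missing_requirements` helper are hypothetical, invented only to show how such a list might be checked.

```python
from enum import IntEnum


class PersistentReserveIn(IntEnum):
    READ_KEYS = 0x00
    READ_RESERVATION = 0x01


class PersistentReserveOut(IntEnum):
    RESERVE = 0x01
    RELEASE = 0x02
    CLEAR = 0x03
    PREEMPT = 0x04
    REGISTER_AND_IGNORE_EXISTING_KEY = 0x06


LU_SCOPE = 0x0                          # scope required for Reserve
WRITE_EXCLUSIVE_REGISTRANTS_ONLY = 0x5  # reservation type required for Reserve
ACCEPTED_ID_TYPES = {2, 3, 8}           # VPD page 83h identifier types


def missing_requirements(device):
    """List the slide's requirements that a (hypothetical) capability report lacks."""
    missing = []
    if not ACCEPTED_ID_TYPES & set(device["vpd_83h_identifier_types"]):
        missing.append("VPD page 83h identifier type 2, 3, or 8")
    for action in PersistentReserveIn:
        if action not in device["pr_in_actions"]:
            missing.append(f"PERSISTENT RESERVE IN {action.name} ({action.value:02X}h)")
    for action in PersistentReserveOut:
        if action not in device["pr_out_actions"]:
            missing.append(f"PERSISTENT RESERVE OUT {action.name} ({action.value:02X}h)")
    return missing


# Illustrative report for an array that does not implement Preempt.
device = {
    "vpd_83h_identifier_types": [3],
    "pr_in_actions": {0x00, 0x01},
    "pr_out_actions": {0x01, 0x02, 0x03, 0x06},
}
gaps = missing_requirements(device)
```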

New Cluster Architecture

[Architecture diagram spanning User and Kernel mode, with a control path marked. Components shown: CluAdmin.msc, WMI, Validate, ClusAPI, CPrepSrv, ClusSvc.exe, RHS.exe, ClusRes.dll (Disk Resource), Volumes C:\ and F:\, PartMgr.sys, Disk.sys, ClusDisk.sys, NetFT, MS MPIO Filter, Storport, Miniport, HBA, Storage enclosure.]

Major change is that ClusDisk no longer is in the disk fencing business

Persistent Reservation Table

Persistent Reservation Table in the external storage

Registration Table: Node1_HBA1 = Key1; Node1_HBA2 = Key1; Node2_HBA1 = Key2; Node2_HBA2 = Key2

Reservation Table: Key1

1. Every interface has an entry in the registration table

2. You must be registered to place a reservation

3. Challenging nodes attempt to register

4. Registrations with unknown keys are periodically scrubbed

Key is known and unique

Anyone who knows the key has access to the disk

Registration Defense Protocol: Successful defense

[Timeline diagram, in seconds (0 to 16). Defender Node: Register and Reserve, then periodic Read, including one Read and Purge. Challenger Node: Challenge with Register and Reserve (fails), Read, then a Preempt attempt that fails. Outcome: successful defense.]

Registration Defense Protocol: Successful challenge

[Timeline diagram, in seconds (0 to 16). Defender Node: Existing Reserve. Challenger Node: Challenge with Register and Reserve (fails), Read, then Preempt and Reserve. Outcome: successful challenge.]
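The two timelines can be reproduced with a toy in-memory model of the registration and reservation tables. This is a sketch under simplifying assumptions, not the ClusDisk or storage firmware logic: the `SharedDisk` class and `challenge` function are invented for illustration, timing is collapsed into a single "defender is alive" flag, and one key stands in for a whole node.

```python
class SharedDisk:
    """Toy model of a disk's registration and reservation tables."""

    def __init__(self):
        self.registrations = set()
        self.reservation_holder = None

    def register(self, key):
        self.registrations.add(key)

    def reserve(self, key):
        # You must be registered, and nobody else may hold the reservation.
        if key not in self.registrations:
            return False
        if self.reservation_holder not in (None, key):
            return False
        self.reservation_holder = key
        return True

    def preempt(self, key, victim_key):
        # Only a key that is still registered can preempt the current holder.
        if key not in self.registrations:
            return False
        self.registrations.discard(victim_key)
        self.reservation_holder = key
        return True

    def scrub(self, known_keys):
        # The defender purges registrations it does not recognize.
        self.registrations &= set(known_keys)


def challenge(disk, defender_key, challenger_key, defender_alive):
    """Run one challenge; return True if the challenger ends up owning the disk."""
    disk.register(challenger_key)
    disk.reserve(challenger_key)       # fails while the defender holds the reservation
    if defender_alive:
        disk.scrub({defender_key})     # a live defender removes the unknown key in time
    return disk.preempt(challenger_key, defender_key)


def fresh_disk():
    disk = SharedDisk()
    disk.register("Key1")
    disk.reserve("Key1")
    return disk


defended = fresh_disk()
defense_result = challenge(defended, "Key1", "Key2", defender_alive=True)
taken = fresh_disk()
takeover_result = challenge(taken, "Key1", "Key2", defender_alive=False)
```

When the defender is alive, the challenger's registration is scrubbed and its preempt fails. When the defender is gone, the registration survives and the preempt moves the reservation to the challenger.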

HBA Requirements

All Host Bus Adapters (HBA) must use a Storport mini-port driver

All multi-path software must be based on MS MPIO

If using a custom DSM it must have a logo

All components in a cluster must have a Designed for Windows logo

Summary/Call To Action

Try out the new cluster experience and send us feedback

Test Storage to ensure compatibility with Persistent Reservations

Test custom resource DLLs and cluster aware applications for compatibility with new security model

Resources

Feature Overview webcast of all the new Failover Clustering features coming in Windows Server codenamed Longhorn:

http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032271683&Culture=en-US

© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
