The 10 Most Important Things About Failover Clustering
Jim Teague, Program Manager, Microsoft Corporation
Agenda
Failover Clustering in Windows Server Longhorn is different. Here are 10 big changes you need to know about:
1. Cluster Validation
2. Revamped Setup
3. New Cluster Experience
4. Networking Enhancements
5. Getting to Longhorn
6. New Security Model
7. New Quorum Model
8. Geographically Dispersed Cluster Enhancements
9. Shared Storage Topologies
10. Storage Compatibility
Terminology Changes
Beta: Wolfpack
Windows NT 4.0: Microsoft Cluster Service (MSCS)
Windows 2000 Server / Windows Server 2003: Server Clustering
Windows Server codenamed Longhorn: Failover Clustering (WSFC)
Where Is Clustering Going…
What’s Clustering in Longhorn all about?
Simplicity, Security, Stability
Clusters for people without PhDs
Easy to create, use, and manage
Enabling the IT Generalist
Reduce Clustering Total Cost of Ownership
Making Clusters a smart business choice for the enterprise
Improvements in Security, Networking, Eventing, and Storage
Key Cluster Changes
1. Cluster Validation
2. Revamped Setup
3. New Cluster Experience
4. Networking Enhancements
5. Getting to Longhorn
6. New Security Model
7. New Quorum Model
8. Geographically Dispersed Cluster Enhancements
9. Shared Storage Topologies
10. Storage Compatibility
Motivation For Validate
Configuration Issues
Cabling mistakes
SP and Hotfix binaries
Driver mismatches
Inconsistent Settings
Complexity
Best Practices
Supportability Requirements
Hardware Compatibility
If we can eliminate the configuration issues up front, we can ensure a better cluster experience (installation and operation)
48% of Cluster support calls are due to configuration problems
-Microsoft PSS
80% of failures are due to human error
-Gartner
What Is Cluster Validate?
Runs a focused set of tests on a collection of servers that are intended to be a cluster
Catch hardware or configuration problems before the cluster goes into production
Ensures that the solution you are about to deploy is rock solid
Run validate each and every time you install a new cluster
It’s the very first thing you do!
Validate can also be run on configured clusters as a diagnostic tool
Disk Resources need to be in an Offline state to be validated
What Does Validate Inventory?
OS Binary Consistency: Correct and same OS version; same Hotfix and Service Pack level
Architecture: Same CPU architecture
Configuration: Consistent Domain membership
Devices: Analysis of unsigned drivers; Storage HBAs and Networking NICs
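The inventory step amounts to collecting the same facts from every node and flagging any fact that differs. A minimal sketch of that idea follows, assuming each node's inventory is already available as a dictionary. The field names (`os_version`, `service_pack`, `cpu_arch`, `domain`) and the sample values are hypothetical and are not taken from the Validate tool itself.

```python
from collections import defaultdict


def find_inconsistencies(inventories):
    """Return {field: {value: [nodes]}} for every field whose value differs across nodes.

    `inventories` maps a node name to a dict of inventory facts.
    The field names are illustrative, not the Validate tool's own schema.
    """
    by_field = defaultdict(lambda: defaultdict(list))
    for node, facts in inventories.items():
        for field, value in facts.items():
            by_field[field][value].append(node)
    # A field is inconsistent when more than one distinct value was seen.
    return {
        field: dict(values)
        for field, values in by_field.items()
        if len(values) > 1
    }


# Hypothetical two-node inventory with a CPU architecture mismatch.
nodes = {
    "node1": {"os_version": "6.0", "service_pack": "SP1",
              "cpu_arch": "x64", "domain": "corp.example"},
    "node2": {"os_version": "6.0", "service_pack": "SP1",
              "cpu_arch": "x86", "domain": "corp.example"},
}
problems = find_inconsistencies(nodes)
print(problems)
```

Grouping nodes by value, rather than comparing everything to the first node, shows which nodes are the odd ones out when more than two servers are being checked.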
What Does Validate Verify?
Infrastructure: Inter-cluster communication
Storage: Shared disks accessible from all machines and uniquely identifiable; Shared Storage Persistent Reservation functionality and compliance; Disk I/O latencies
Network: Each NIC has different IP address; Network I/O latencies
Functionality: Failover simulation
Easy To Create Clusters
Simple: Create an entire cluster in one step; setup is streamlined and simplified; intuitive
Validation: All the power of a full cluster test suite in your hands to ensure that the actual cluster you are setting up will provide rock solid stability; catch configuration issues
Deployable: Fully scriptable for automated deployments; new Create Cluster API allows fully customizable experience
Create The Cluster: Windows Server 2003 (today)
[Screenshots: wizard steps 1 to 12 to create the cluster, including "Enter Cluster Services Account…", followed by steps 13 to 23 to add the second node]
Finally!
LH Easy To Create Clusters
Simple: Setup is simplified and intuitive
Validation: The Validate tool is the 1st menu step
Fast: Create a 16 node cluster in one step
Cluster Setup
[Screenshots: Longhorn cluster setup wizard]
It's that easy!
New User Experience
All New Cluster Management Tool!!
Designed to be task-based and easy to use
Fewer dials-n-knobs to worry about
What’s all this IsAlive/LooksAlive stuff I don’t care about? Just make my cluster work!
Tell us what you want to do and we’ll take care of the rest
I would like to make this File Share Highly Available…
Cluster Administrator Tool Today…
New Cluster MMC Snap-In
Manageability
Command line (cluster.exe)
Cluster Management Console
Fully Scriptable with WMI
Richer Tool Experience
Exposes Advanced Options
Task Oriented
Phasing out MSClus
Cluster MOM Management Pack
Networking Enhancements
Integrated with new Longhorn TCP/IP Stack
Full IPv6 Support
Native IPv6 support for client access, native and tunnels
Inter-node communication with IPv6
DHCP Support for IPv4 Resources
Obtain cluster IP address from a DHCP server
Relieves management pain of static IPs
Networking Enhancements
No more legacy dependencies on NetBIOS
Ready for NetBIOS-less environments
Simplifying the transport of SMB traffic
Removing WINS and NetBIOS name resolution broadcasts
Standardizing name resolution on DNS
Moved from datagram RPC protocols to more secure TCP session oriented protocols
Improvements in IPSec to allow almost instantaneous failover for clients
Cluster Migrations
Cluster Migration Tool
Will assist migration of a cluster configuration from one cluster to another
Copies resources and cluster configurations from one cluster to another
No Mixed Version Compatibility
LH node and Win2003 node cannot be in the same cluster at the same time
No rolling upgrades
Migrating To Longhorn: Step 1
All nodes running Windows Server 2003
Group owned by Node 1 hosting IP Address, Network Name, Physical Disk, and File Share resources
[Diagram: Node 1 and Node 2 attached to the SAN in a single Windows Server 2003 cluster; the group contains IP, Name, Disk, and File Share resources]
Migrating To Longhorn: Step 2
Evict Node 2 from the Windows 2003 cluster
Perform clean install of Longhorn on Node 2
Create an independent single node cluster with a new Cluster Name
[Diagram: two separate clusters attached to the SAN; Node 1 runs Windows Server 2003 and still owns the IP, Name, Disk, and File Share group; Node 2 runs Longhorn]
Migrating To Longhorn: Step 3
Run the Longhorn Migration Wizard on Node 2
Designate Node 2 as target and Node 1 as source
Perform a group by group copy of resources
Groups are created in an Offline state
[Diagram: Node 1 and Node 2 attached to the SAN; the IP, Name, Disk, and File Share resources are Online on Node 1 and their copies are Offline on Node 2]
Migrating To Longhorn: Step 4
Longhorn Cluster is pre-staged and ready for migration
Bring group Offline on Node 1
Bring group Online on Node 2
[Diagram: Node 1 and Node 2 attached to the SAN; the IP, Name, Disk, and File Share resources are Offline on Node 1 and Online on Node 2]
Migrating To Longhorn: Step 5
Install Longhorn on Node 1
Join Node 1 to the existing Longhorn cluster with Node 2
Resources can now be failed back and forth
Migration is now complete!
[Diagram: Node 1 and Node 2 attached to the SAN in a single Longhorn cluster; the IP, Name, Disk, and File Share resources are Online]
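The five migration steps can be read as a sequence of state changes on two clusters. The sketch below models that sequence for a two-node cluster. It is only an illustration of the ordering, not the Migration Wizard: the `Cluster` class, the `migrate` function, and the cluster and group names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Hypothetical model: a cluster is a name, an OS version, a set of nodes,
    # and a mapping of group name -> "Online" / "Offline".
    name: str
    os: str
    nodes: set = field(default_factory=set)
    groups: dict = field(default_factory=dict)


def migrate(old, new_name, target_os="Longhorn"):
    """Walk the five slide steps for a two-node cluster and return the new cluster."""
    keep, move = sorted(old.nodes)               # Step 1: both nodes in the old cluster
    old.nodes.remove(move)                       # Step 2: evict one node, clean install,
    new = Cluster(new_name, target_os, {move})   #         create a single-node cluster
    for group in old.groups:                     # Step 3: copy groups, created Offline
        new.groups[group] = "Offline"
    for group in old.groups:                     # Step 4: cut over group by group
        old.groups[group] = "Offline"
        new.groups[group] = "Online"
    old.nodes.remove(keep)                       # Step 5: reinstall the remaining node
    new.nodes.add(keep)                          #         and join it to the new cluster
    return new


old = Cluster("OLDCLUSTER", "Windows Server 2003",
              {"Node1", "Node2"}, {"FileShareGroup": "Online"})
new = migrate(old, "NEWCLUSTER")
print(new)
```

At no point does a node running the old OS share a cluster with a node running the new one, which reflects the "no mixed version" constraint stated earlier.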
Service Manageability
Improved Security Model
Cluster Service now runs in the context of the LocalSystem built-in account
No more Cluster Service Account (CSA)
No more account password management
No need to pre-stage defined user accounts
More resilient to configuration issues
Addresses supportability issues where privileges are accidentally stripped by group policies
Increased security
New Security Context
How does this impact you?
Cluster Service starts with set privileges
Resource Hosting Subsystem launched in the same context with the same privileges
Resource DLLs and Applications are launched in the same context as RHS with the same set of privileges
No common identity
In short, any custom resource DLL or applications leveraging the Generic Application or Generic Script resource types will have reduced privileges and no remote-ability
You are responsible for handling the credentials your applications require
Test your apps and resources with Windows Server Longhorn!
New Quorum Model
Majority based cluster membership
Who and what gets a vote is fully configurable
Eliminating failure points
Original design assumed that storage would be always available
New best-of-both-worlds quorum model
Hybrid of legacy Majority Node Set (MNS) logic and Shared Disk Quorum model
This model will replace both existing models
No single point of failure!
Can survive loss of the Quorum disk
Majority Quorum Model
New majority based quorum model
Majority of Nodes based quorum
Disk is optional witness to have a vote in deciding majority
3 total votes, with 2 needed for majority
So the Cluster can survive the loss of any 1 vote
[Diagram: Node 1 and Node 2 attached to the SAN; each node counts as 1 vote and the Shared Storage Device gets 1 vote]
Majority Of Nodes
Only Nodes get votes
3+ Node votes without Shared Storage vote
Majority of votes needed to operate cluster
No shared disk vote
[Diagram: Node 1, Node 2, and Node 3 with Replicated Storage Devices; each node has 1 vote]
Witness Disk
Only Disk gets a vote
Nodes have no votes
Quorum disk is the master
Cluster stays up even if only 1 node can talk to the disk
Achieves same behavior as legacy quorum model
[Diagram: Node 1 and Node 2 attached to the SAN; the Shared Storage Device is master and holds the only vote]
File Share Witness
File Share Witness allows a 2-node cluster with no shared disk
Majority of Nodes + Witness based quorum
Excellent solution for GeoClusters
Witness could reside at a 3rd Site
Single file server could serve as the Witness for multiple clusters
[Diagram: Node 1 and Node 2, each counting as 1 vote, with a File Share on an independent server acting as the Witness]
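The four quorum configurations differ only in who holds a vote. The sketch below works through the arithmetic: a simple majority of the configured votes is needed, except in the witness-disk-only case where the disk is the sole voter. The mode strings and the `has_quorum` function are illustrative labels for the slides, not product settings or API names.

```python
def has_quorum(mode, nodes_up, total_nodes, witness_up=False):
    """Return True if the cluster may keep running under the given quorum mode.

    Mode names are hypothetical labels for the four configurations described above.
    """
    if mode == "witness_disk_only":
        # Only the disk votes: the cluster stays up if at least one node reaches it.
        return witness_up and nodes_up >= 1
    if mode == "node_majority":
        total_votes = total_nodes
        votes = nodes_up
    elif mode in ("node_and_disk_majority", "node_and_file_share_majority"):
        # Every node has one vote and the witness (disk or file share) has one vote.
        total_votes = total_nodes + 1
        votes = nodes_up + (1 if witness_up else 0)
    else:
        raise ValueError(f"unknown quorum mode: {mode}")
    # Strict majority of all configured votes.
    return 2 * votes > total_votes


# Two nodes plus a witness disk: 3 votes, 2 needed, so any single vote can be lost.
print(has_quorum("node_and_disk_majority", nodes_up=2, total_nodes=2, witness_up=False))
print(has_quorum("node_and_disk_majority", nodes_up=1, total_nodes=2, witness_up=True))
print(has_quorum("node_and_disk_majority", nodes_up=1, total_nodes=2, witness_up=False))
```

The same function shows why a two-node cluster needs a witness: with node votes only, losing either node leaves 1 of 2 votes, which is not a majority.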
Stretching Clusters
Stretching Nodes across the river used to be good enough…
But businesses are now demanding more!
Geographically Dispersed Clusters
No More Single-Subnet Limitation
Allow cluster nodes to communicate across network routers
No more having to connect nodes with VLANs!
Configurable Heartbeat Timeouts
Increase to extend Geographically Dispersed Clusters over greater distances
Decrease to detect failures faster and take recovery actions for quicker failover
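The trade-off behind configurable heartbeat timeouts can be shown with simple arithmetic. The sketch below assumes a common heartbeat scheme in which a node is declared down after a number of consecutive missed heartbeats. The slides do not name the actual settings or their values, so `interval_ms` and `missed_threshold` are hypothetical parameters.

```python
def detection_time_ms(interval_ms, missed_threshold):
    """Time until a silent node is declared down: interval times missed heartbeats.

    Both parameters are hypothetical; the slides do not give the real setting names.
    """
    return interval_ms * missed_threshold


def is_node_down(last_heartbeat_ms, now_ms, interval_ms, missed_threshold):
    """True once the silence has lasted at least the full detection time."""
    return now_ms - last_heartbeat_ms >= detection_time_ms(interval_ms, missed_threshold)


# A tolerant setting for a long-distance link versus an aggressive local setting.
print(detection_time_ms(interval_ms=2000, missed_threshold=10))  # 20000
print(detection_time_ms(interval_ms=500, missed_threshold=4))    # 2000
```

A larger product tolerates the latency and jitter of a long-distance link at the cost of slower failover; a smaller one detects failures sooner but risks false positives on a slow network.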
Enhanced Dependencies
New Dependency Filter Objects
Network Name resource stays up if either IP Address resource A or B is up
Today both resource A and B have to be online for the Network Name to be available to users
Allows redundant resources and scoping impact to dependent services and applications
[Diagram: Network Name Resource depends on IP Address Resource A OR IP Address Resource B]
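The difference between today's AND-only dependencies and the new OR dependencies can be sketched as a small expression evaluator. This is an illustration of the logic only; `resource`, `all_of`, and `any_of` are hypothetical helper names, not part of the cluster API.

```python
def resource(name):
    """Dependency on a single resource: satisfied when that resource is Online."""
    return lambda state: state.get(name) == "Online"


def all_of(*deps):
    """AND dependency (the behavior described as 'today'): every dependency must hold."""
    return lambda state: all(dep(state) for dep in deps)


def any_of(*deps):
    """OR dependency (the new behavior): at least one dependency must hold."""
    return lambda state: any(dep(state) for dep in deps)


ip_a = resource("IP Address A")
ip_b = resource("IP Address B")

# One of the two IP Address resources has failed, for example a site's subnet is down.
state = {"IP Address A": "Online", "IP Address B": "Offline"}

network_name_today = all_of(ip_a, ip_b)
network_name_longhorn = any_of(ip_a, ip_b)

print(network_name_today(state))     # False
print(network_name_longhorn(state))  # True
```

Because the helpers compose, an expression such as `all_of(disk, any_of(ip_a, ip_b))` would describe a resource that needs its disk plus at least one reachable address.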
Shared Storage Topology Requirements
Only storage that supports Persistent Reservations will be supported in Longhorn Failover Clustering
Deprecating parallel-SCSI support
Serial Attached SCSI (SAS) based clusters will replace parallel-SCSI
Supported Shared Bus Types: Fibre Channel, iSCSI, SAS
Storage Enhancements
Improved disk fencing for shared disks
Enhanced mechanism to use Persistent Reservations
New algorithm for managing shared disks
No more device resets with PRs!
No longer uses SCSI Bus Resets which can be disruptive on a SAN
Disks are never left in an unprotected state
Tight integration into core OS disk management
Support for GPT disks
Windows Server Longhorn Will Be A Clean Slate
Compatibility
Some hardware may not be upgradeable
Cannot assume solutions that previously worked with clustering will continue to work in Longhorn Clustering
Supportability
There will be no grandfathering of support for currently qualified solutions listed on the Windows Server Catalog
Windows Server 2003 clustering solutions will not necessarily work with failover clustering in Longhorn
Work with your vendor to find out
SCSI Command Requirements
Storage must support the following SCSI-3 SPC-3 compliant SCSI Commands:
Unique IDs
Vital product data (VPD), device identification page (page code 83h) with Identifier Type 2 (EUI-64 based), 3 (NAA), or 8
PERSISTENT RESERVE IN Read Keys (00h)
PERSISTENT RESERVE IN Read Reservation (01h)
PERSISTENT RESERVE OUT Reserve (01h), Scope: LU_SCOPE (0h), Type: Write Exclusive – Registrants Only (5h)
PERSISTENT RESERVE OUT Release (02h)
PERSISTENT RESERVE OUT Clear (03h)
PERSISTENT RESERVE OUT Preempt (04h)
PERSISTENT RESERVE OUT Register AND Ignore Existing Key (06h)
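The command list above can be captured as a lookup table and compared against what a storage device reports. The sketch below does that. The service action codes come from the list above; the operation codes 5Eh and 5Fh are the SPC-3 values for PERSISTENT RESERVE IN and OUT and do not appear on the slide. The `missing_commands` function and the way a device's supported set is obtained are hypothetical.

```python
# SPC-3 operation codes (not on the slide).
PERSISTENT_RESERVE_IN = 0x5E
PERSISTENT_RESERVE_OUT = 0x5F

# (operation code, service action) -> name, using the service actions listed above.
REQUIRED_COMMANDS = {
    (PERSISTENT_RESERVE_IN, 0x00): "Read Keys",
    (PERSISTENT_RESERVE_IN, 0x01): "Read Reservation",
    (PERSISTENT_RESERVE_OUT, 0x01): "Reserve",
    (PERSISTENT_RESERVE_OUT, 0x02): "Release",
    (PERSISTENT_RESERVE_OUT, 0x03): "Clear",
    (PERSISTENT_RESERVE_OUT, 0x04): "Preempt",
    (PERSISTENT_RESERVE_OUT, 0x06): "Register and Ignore Existing Key",
}

# Reservation parameters required for the Reserve service action.
RESERVE_SCOPE_LU = 0x0                           # LU_SCOPE
RESERVE_TYPE_WRITE_EXCL_REGISTRANTS_ONLY = 0x5   # Write Exclusive, Registrants Only


def missing_commands(supported):
    """Return the names of required commands absent from `supported`.

    `supported` is a set of (operation code, service action) pairs; how it is
    gathered from a real device is outside this sketch.
    """
    return sorted(
        name for command, name in REQUIRED_COMMANDS.items()
        if command not in supported
    )


# Hypothetical device that implements everything except Preempt.
device = set(REQUIRED_COMMANDS) - {(PERSISTENT_RESERVE_OUT, 0x04)}
print(missing_commands(device))
```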
New Cluster Architecture
[Diagram: component stack spanning user and kernel mode down to the HBA and storage enclosure. Labels: CluAdmin.msc, ClusAPI, WMI, Validate, CPrepSrv, ClusSvc.exe, RHS.exe, ClusRes.dll (Disk Resource), Volume C:\, Volume F:\, PartMgr.sys, Disk.sys, ClusDisk.sys (control path), NetFT, MS MPIO Filter, Storport, Miniport]
Major change is that ClusDisk no longer is in the disk fencing business
Persistent Reservation Table
Persistent Reservation Table in the external storage
Registration Table      Reservation Table
Node1_HBA1  Key1        Key1
Node1_HBA2  Key1
Node2_HBA1  Key2
Node2_HBA2  Key2
1. Every interface has an entry in the registration table
2. You must be registered to place a reservation
3. Challenging nodes attempt to register
4. Registrations with unknown keys are periodically scrubbed
Key is known and unique
Anyone who knows the key has access to the disk
Registration Defense Protocol: Successful defense
[Timeline figure, 0 to 16 seconds. Defender Node: Register and Reserve, then periodic Read, with a Read and Purge after the challenge. Challenger Node: Register and Reserve (fails), then the Preempt attempt fails. Outcome: successful defense]
Registration Defense Protocol: Successful challenge
[Timeline figure, 0 to 16 seconds. Defender Node: existing Reserve. Challenger Node: Register and Reserve (fails), Read, then Preempt and Reserve. Outcome: successful challenge]
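The defense and challenge sequences can be simulated with a small in-memory model of the registration and reservation tables. This is a simplified sketch of the idea, not the cluster disk driver or real SCSI semantics: the `SharedDisk` class, the `arbitrate` function, and the single `defender_alive` flag (standing in for the defender's periodic read-and-purge) are all hypothetical.

```python
class SharedDisk:
    """Toy model of the persistent reservation tables held by the storage."""

    def __init__(self):
        self.registrations = {}   # interface name -> registered key
        self.reservation = None   # key that holds the reservation, or None

    def register(self, interface, key):
        self.registrations[interface] = key

    def reserve(self, interface):
        """Succeeds only for a registered interface when no other key holds the reservation."""
        key = self.registrations.get(interface)
        if key is None or self.reservation not in (None, key):
            return False
        self.reservation = key
        return True

    def purge_unknown(self, known_keys):
        """Defender's scrub: drop every registration whose key it does not recognize."""
        self.registrations = {
            iface: key for iface, key in self.registrations.items()
            if key in known_keys
        }

    def preempt(self, interface, victim_key):
        """Take over the reservation; fails if the caller is no longer registered."""
        key = self.registrations.get(interface)
        if key is None:
            return False
        self.registrations = {
            iface: k for iface, k in self.registrations.items() if k != victim_key
        }
        self.reservation = key
        return True


def arbitrate(disk, defender_key, challenger_iface, challenger_key, defender_alive):
    """Run one challenge and return 'defender' or 'challenger' as the winner."""
    disk.register(challenger_iface, challenger_key)
    if disk.reserve(challenger_iface):
        return "challenger"                  # disk was not reserved at all
    # The challenger waits; a live defender scrubs unknown keys in the meantime.
    if defender_alive:
        disk.purge_unknown({defender_key})
    if disk.preempt(challenger_iface, defender_key):
        return "challenger"
    return "defender"


disk = SharedDisk()
disk.register("Node1_HBA1", "Key1")
disk.reserve("Node1_HBA1")

print(arbitrate(disk, "Key1", "Node2_HBA1", "Key2", defender_alive=True))   # defender
print(arbitrate(disk, "Key1", "Node2_HBA1", "Key2", defender_alive=False))  # challenger
```

The design point the model illustrates is that the disk never passes through an unreserved state: ownership changes only through a preempt, and a preempt only works for a challenger whose registration survived the waiting period.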
HBA Requirements
All Host Bus Adapters (HBA) must use a Storport mini-port driver
All multi-path software must be based on MS MPIO
If using a custom DSM it must have a logo
All components in a cluster must have a Designed for Windows logo
Summary/Call To Action
Try out the new cluster experience and send us feedback
Test Storage to ensure compatibility with Persistent Reservations
Test custom resource DLLs and cluster aware applications for compatibility with new security model
Resources
Feature Overview webcast of all the new Failover Clustering features coming in Windows Server codenamed Longhorn:
http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032271683&Culture=en-US
Appendix
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.