04 Cluster Verification Enhancements


Transcript of 04 Cluster Verification Enhancements

Page 1: 04 Cluster Verification Enhancements

pSeries High Availability Software Development

© 2005 IBM Corporation

Cluster verification enhancements

Changes in cluster verification and synchronization

June-August 2005 | pSeries HACMP Train The Trainer (T3) Presentations

IBM eServer pSeries

Verification Enhancements

Allows smoother cluster performance and higher reliability

Eliminates time-consuming manual editing

Reduces maintenance issues

Enables faster resolution of configuration inconsistencies

Improves ease-of-use and performance capabilities

Reduces issues and costs related to planned downtime

Lessens possibility of human error that could result in unplanned downtime

Page 2: 04 Cluster Verification Enhancements

HACMP 5.3 Train The Trainer: More HACMP Verification Functionality

Ability to detect and correct problems with cluster configurations automatically at startup

Detects potential SPOF previously only discovered by the Automatic Error Notification (AEN) feature

Performs additional automatic checks that enable the administrator to maintain a consistent cluster configuration

Automatically detects presence of HACMP/XD features and checks that management policies are set correctly

Auto-population of clhosts file eliminates manual editing on each client node


Introduction

Automatic verification and synchronization during cluster startup (AV&S)

New functionality

Additional checks and corrective actions

Page 3: 04 Cluster Verification Enhancements


Customers try to start the cluster services on an unverified cluster, or an unsynchronized cluster

HACMP can now perform the following actions without user involvement:

Verify the cluster configuration

Execute all available auto-corrective actions

Synchronize the cluster configuration

This functionality is enabled by default and can be disabled through the provided SMIT interface

If there are unresolved problems that HACMP cannot correct by itself, the cluster will not start, unless the user explicitly requests it anyway
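The startup sequence described on this slide can be sketched roughly as follows. This is only an illustration of the decision flow, not HACMP code: the function, its parameters, and the `VerificationResult` type are hypothetical names, and the exact behavior when errors are ignored is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VerificationResult:
    """Outcome of one (hypothetical) verification pass."""
    errors: List[str] = field(default_factory=list)       # problems that cannot be auto-corrected
    correctable: List[str] = field(default_factory=list)  # problems with a corrective action


def start_cluster_services(
    verify: Callable[[], VerificationResult],
    auto_correct: Callable[[List[str]], None],
    synchronize: Callable[[], None],
    ignore_verification_errors: bool = False,
    verify_prior_to_startup: bool = True,
) -> bool:
    """Return True if cluster services would be started."""
    if not verify_prior_to_startup:
        return True  # automatic verification disabled: start without checking

    result = verify()
    if result.correctable:
        auto_correct(result.correctable)  # execute all available corrective actions
        result = verify()                 # re-check after the corrections

    if result.errors:
        # Unresolved problems: start only if the user asked to ignore them (assumption:
        # no synchronization is attempted in that case).
        return ignore_verification_errors

    synchronize()
    return True
```

The two flags mirror the SMIT fields "Ignore verification errors?" and "Verify Cluster Prior to Startup?" shown on the following slides.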


Nodes Joining Inactive HACMP Cluster

Verify local Default Configuration Database (DCD) is consistent across all starting nodes

Verification run against local DCD configuration

– Errors on any of the selected starting nodes will result in cluster services not starting on any node

Nodes that are out of sync

– Snapshot: taken on nodes where DCD is inconsistent

– Synchronization: local DCD will be synchronized to nodes that are out of sync

Page 4: 04 Cluster Verification Enhancements


Verification

– Active configuration on the already running node is used to perform verification

Nodes that are out of sync with the active node

– Snapshot taken on nodes where the DCD of the starting node is inconsistent with the Active Configuration

– Synchronization of HACMP cluster configuration to out-of-sync node(s)

DARE

– Nodes that are DARE’d out of the active cluster configuration are not allowed to start cluster services


SMIT Starting Cluster Services

smitty clstart

System Management (C-SPOC) -> Manage HACMP Services -> Start Cluster Services

Start Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                               [Entry Fields]
* Start now, on system restart or both          now
  Start Cluster Services on these nodes         [rac1n1]
  BROADCAST message at startup?                 true
  Startup Cluster Information Daemon?           true
  Reacquire after forced down?                  false
  Ignore verification errors?                   false

Page 5: 04 Cluster Verification Enhancements


smitty cl_startup_options (new)

Extended Configuration -> Extended Cluster Service Settings

Extended Cluster Service Settings

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                               [Entry Fields]
  Start HACMP at system restart?                false
  BROADCAST message at startup?                 true
  Startup Cluster Information Daemon?           true
  Verify Cluster Prior to Startup?              true


New Functionality Overview

Automatic Error Notification (AEN) for Single Points of Failure (SPOF) automatic setup

XD solutions verification integration

Deprecation of command-line utilities cldiag and clverify

Auto-population of clhosts.client file for client nodes

Page 6: 04 Cluster Verification Enhancements


Provided AEN configuration functionality:

– Reset (add), list, and remove AEN stanzas

Available via AEN for SPOF SMIT screens

Used HA cluster configuration ODMs

Needed to be reset manually after a cluster configuration change

Cluster synchronization utilities issued a reminder for the user to reset the AEN stanzas


AEN for SPOF – Verification Changes

Two reasons to refresh AEN for SPOF ODMs:

– After a cluster configuration change

– If there is a SPOF without AEN configured for it

HACMP refreshes AEN ODMs after verification and synchronization

No refresh if neither of the above cases is detected

Refresh occurs quickly and automatically

An extra refresh does no harm

Reminder has been removed

Page 7: 04 Cluster Verification Enhancements


Currently supported XD solutions:

– PPRC & SVC PPRC

– GEO

– GLVM

– ERCMF

– …expecting more

Existing verifications for XD solutions:

– Separate utilities

– Some need to be executed separately

– Different usage

Verification categories – topology, resources, both, none


XD Verification Integration – Feature Changes

Verification is aware of all supported XD solutions

Separate verification category after topology and resources

For each supported XD solution:

– Look for installed XD filesets

– If found, verify related HACMP resources are configured

– If no such resource exists, issue a warning

– Otherwise, execute the related XD verification utility

– Redirect output to verification logs

Same design, same reports, same output sources for all
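The per-solution steps listed on this slide can be sketched as a simple loop. The function and parameter names here are hypothetical, and the warning text is made up for illustration; the real check would query the installed filesets and the HACMP ODMs instead of taking sets as input.

```python
from typing import Callable, Dict, List, Set


def verify_xd_solutions(
    solutions: Dict[str, Callable[[], List[str]]],  # solution name -> its verification utility
    installed: Set[str],                            # solutions whose filesets are installed
    configured: Set[str],                           # solutions with related HACMP resources
) -> List[str]:
    """Walk every supported XD solution and collect lines for the verification log."""
    log: List[str] = []
    for name, run_utility in solutions.items():
        if name not in installed:
            continue  # filesets not installed: nothing to verify
        if name not in configured:
            log.append(f"WARNING: {name} is installed but no related resource is configured")
            continue
        log.extend(run_utility())  # utility output is redirected to the common log
    return log
```

Because every solution goes through the same loop, all of them share the same reporting path, which is the point of "same design, same reports, same output sources for all".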

Page 8: 04 Cluster Verification Enhancements


User visible command-line based utilities

Utilities independent of SMIT verification performed verification at the command line

Most functionality was already available via SMIT interfaces

cldiag allowed the user to set the debug level of the clstrmgr.debug output file using old HAS standards:

– The old debug level was a scale of 0...9

Current clstrmgr debug level settings are different


cldiag and clverify – Feature Changes

Usage discontinued

Relocated to /usr/es/sbin/cluster/samples

When executed, they report that they are no longer supported

Replaced with scripts reporting the support discontinuation

All verification functionality is available via SMIT

New SMIT screens provided to change clstrmgr.debug log file debug level:

– High

– Standard (default)

Page 9: 04 Cluster Verification Enhancements


smit cluster_manager_log_param

Problem Determination Tools -> HACMP Log Viewing and Management -> Change/Show Cluster Manager Log File Parameters

Change / Show Cluster Manager Log File Parameters

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[Entry Fields]

Debug Level [ Standard/High ]


clhosts.client File Population

Prototype of the clhosts file, intended for use on client nodes only

New verification check:

– clhosts.client should have all IP labels known to HACMP

– Excluding standby and private networks

Warning message reported during verification if there are missing entries from clhosts.client

If corrective actions are enabled, missing entries will be added into clhosts.client file

/usr/es/sbin/cluster/etc/clhosts client files on cluster nodes are
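The clhosts.client check and its corrective action amount to a set difference followed by an append. The sketch below assumes a simplified, hypothetical data model (a mapping from IP label to network kind, and the file contents as a list of labels); it does not read or write the real file.

```python
from typing import Dict, Iterable, List

# Network kinds excluded from clhosts.client, per the slide (the string values are assumptions).
EXCLUDED_KINDS = {"standby", "private"}


def missing_clhosts_entries(labels: Dict[str, str], current: Iterable[str]) -> List[str]:
    """Return the IP labels known to HACMP that are absent from clhosts.client.

    labels maps each IP label to its kind, for example 'service', 'boot',
    'standby' or 'private'.
    """
    present = set(current)
    return sorted(
        label
        for label, kind in labels.items()
        if kind not in EXCLUDED_KINDS and label not in present
    )


def apply_corrective_action(current: List[str], missing: List[str]) -> List[str]:
    """Return the file contents with the missing entries appended."""
    return list(current) + list(missing)
```

Without corrective actions enabled, a non-empty result from `missing_clhosts_entries` would only produce the warning message.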

Page 10: 04 Cluster Verification Enhancements


RSCT failures are sometimes caused by non-HACMP related settings

However, HACMP verifications can detect them

More opportunities for volume group verification enhancements have been identified

Common verification errors with obvious corrective actions can be resolved without user involvement

And more…


Incompatibilities Between Network and Adapter Type

Compare types for HACMP interface and HACMP network, stored in HACMP ODMs. Report an error if there are inconsistencies

Report an error if the HACMP interface type stored in the HACMP ODM differs from the type of the associated NIC stored in CuAt

Page 11: 04 Cluster Verification Enhancements


The following network options must be the same when cluster services are running:

– tcp_pmtu_discover

– udp_pmtu_discover

– ipignoreredirects

If a node is not running HACMP cluster services, it will not be checked during verification. A verification error is reported indicating that a node is out of sync with other HACMP cluster nodes:

ERROR: Network option: “tcp_pmtu_discover” has different settings between nodes: “nodeA” and “nodeB”. Please make sure that the command no -o “tcp_pmtu_discover” provides the same output on all nodes.

A corrective action (if verification is executed with corrective actions enabled) will automatically adjust inactive nodes that are out of sync with running nodes
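A minimal sketch of the consistency check, assuming the option values have already been collected from each node into a dictionary (how they are gathered is outside this example, and the function name is hypothetical). The message format loosely follows the error text shown on the slide.

```python
from itertools import combinations
from typing import Dict, List

# Options that must match across nodes, as listed on the slide.
CHECKED_OPTIONS = ("tcp_pmtu_discover", "udp_pmtu_discover", "ipignoreredirects")


def check_network_options(settings: Dict[str, Dict[str, str]]) -> List[str]:
    """Compare each checked option between every pair of nodes.

    settings maps node name -> {option name: value}.
    """
    errors: List[str] = []
    for option in CHECKED_OPTIONS:
        for node_a, node_b in combinations(sorted(settings), 2):
            if settings[node_a].get(option) != settings[node_b].get(option):
                errors.append(
                    f'ERROR: Network option: "{option}" has different settings '
                    f'between nodes: "{node_a}" and "{node_b}".'
                )
    return errors
```

A pairwise comparison reports one message per mismatching pair; for large clusters a grouping by value would be more compact, but the pairwise form matches the two-node wording of the original message.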


HACMP Network Option Settings Consistency (Cont)

The following network option must have the value of one when cluster services are running:

routerevalidate == 1

If a node is not running HACMP cluster services, it will not be checked during verification. A verification warning is reported:

WARNING: Network option: “routerevalidate” is set to “0” on node “nodeA”. Please be aware that this setting will be changed to “1” during HACMP startup.

A corrective action (if verification is executed with corrective actions enabled) will automatically adjust the node

Page 12: 04 Cluster Verification Enhancements


The following verifications are performed only on inactive nodes to ensure that none of these options is disabled (RSCT will enable these network options on startup):

– nonlocsrcroute

– ipsrcroutesend

– ipsrcrouterecv

– ipsrcrouteforward

A verification warning is reported indicating that a node is out of sync with other HACMP cluster nodes:

WARNING: Network option: “nonlocsrcroute” is set to “0” on node “nodeA”. Please be aware that this setting will be changed to “1” during HACMP startup.

A corrective action (if enabled) will automatically adjust network option on the non-running cluster node. Otherwise, RSCT would be expected to do the job at cluster startup.


Verify Interface MTU Size

RSCT requires MTU size consistency among all interfaces on the network

Verification will check the interfaces defined to HACMP to ensure the MTU size is the same among all interfaces. If there is an inconsistency, an error will be reported:

ERROR: The MTU sizes do not match for communication interface: ip_label_mtu1500 and ip_label_mtu2000. The NIC en1 on node nodeA has an MTU size of 1500, and the NIC on node nodeB has an MTU size of 2000. To correct this error, make sure that the MTU size is consistent across all NICs on the same HACMP network.

This does not guarantee that RSCT will be able to communicate between nodes. If a device in the network has jumbo packets enabled, or changes the MTU size, RSCT may not properly communicate over that network
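The MTU check can be pictured as grouping interfaces by HACMP network and flagging any network that carries more than one MTU value. The `Interface` record and the network names below are hypothetical; the sketch only looks at the values configured on the nodes, so, as the slide notes, it says nothing about devices in between.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Interface(NamedTuple):
    """One communication interface defined to HACMP (illustrative fields)."""
    network: str
    node: str
    nic: str
    ip_label: str
    mtu: int


def mtu_mismatches(interfaces: List[Interface]) -> Dict[str, List[Interface]]:
    """Return the networks whose interfaces do not all share one MTU size."""
    by_network: Dict[str, List[Interface]] = defaultdict(list)
    for itf in interfaces:
        by_network[itf.network].append(itf)
    return {
        network: members
        for network, members in by_network.items()
        if len({m.mtu for m in members}) > 1  # more than one distinct MTU
    }
```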

Page 13: 04 Cluster Verification Enhancements


Verification checks the following filesets for level differences where the AIX level is the same but the RSCT fileset level is different:

– rsct.basic.hacmp

– rsct.basic.rte

– rsct.core.utils

– rsct.core.sec

A warning message is reported to the user if a mismatch is detected in any of the above filesets

WARNING: The RSCT level is different on nodes: nodeA and nodeB. Both nodes have AIX level 5.3.0.1 installed, and RSCT software is at 2.3.0.0 on node nodeA and 2.2.0.0 on nodeB. To ensure HACMP is working properly it is recommended the same level RSCT software be installed on nodes with the same level of AIX.
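The rule "same AIX level but different RSCT fileset level" can be sketched as below. The inputs are plain dictionaries with made-up node names; collecting the levels from the nodes is not shown, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Filesets compared by the check, as listed on the slide.
RSCT_FILESETS = ("rsct.basic.hacmp", "rsct.basic.rte", "rsct.core.utils", "rsct.core.sec")


def rsct_level_warnings(
    aix_levels: Dict[str, str],           # node -> AIX level
    filesets: Dict[str, Dict[str, str]],  # node -> {fileset: level}
) -> List[Tuple[str, str, str]]:
    """Return (node_a, node_b, fileset) for each mismatch between nodes on the same AIX level."""
    warnings: List[Tuple[str, str, str]] = []
    for node_a, node_b in combinations(sorted(aix_levels), 2):
        if aix_levels[node_a] != aix_levels[node_b]:
            continue  # nodes on different AIX levels are not compared
        for fileset in RSCT_FILESETS:
            if filesets[node_a].get(fileset) != filesets[node_b].get(fileset):
                warnings.append((node_a, node_b, fileset))
    return warnings
```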


WAN Software Validation

If a user has configured an SNA/X.25 communication link, a new verification check will detect missing SNA/X.25 filesets on participating nodes of the resource group containing an adapter of that type:

SNA/LAN HA Communication Link: sna.rte version 6.1.0.0 or higher

X.25 HA Communication Link: sx25.rte version 2.0.0.0 or higher

SNA/X.25 Communication Link requires both

Page 14: 04 Cluster Verification Enhancements


Checks nodes in a resource group list only

Flag for concurrent capable is not consistent – error

PVID list is not identical – error

Corrective action

– Use the volume group with the latest timestamp

Disk availability checks (warning):

– cl_querypv command

Not available during cluster runtime


Site Policy Validation

Resource group site policy other than ignore:

– Requires sites to be defined in the HACMP topology

– XD software must be installed

Verification check for RG with defined site policy:

– Check for sites defined; otherwise report an error

– Check for XD software installed; otherwise report a warning

Protecting the error condition:

– When removing all sites, change all RG site policies to ignore
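The site policy rules above reduce to two conditions with different severities. A small sketch, with a hypothetical function name and illustrative message wording (the real verification messages are not shown in the source):

```python
from typing import List


def check_site_policy(site_policy: str, sites_defined: bool, xd_installed: bool) -> List[str]:
    """Return verification messages for one resource group's site policy."""
    messages: List[str] = []
    if site_policy == "ignore":
        return messages  # no site requirements apply
    if not sites_defined:
        messages.append("ERROR: site policy is set but no sites are defined in the topology")
    if not xd_installed:
        messages.append("WARNING: site policy is set but XD software is not installed")
    return messages
```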

Page 15: 04 Cluster Verification Enhancements


Cross-site mirroring expects the forced varyon flag to be set to true

– A warning will be displayed during verification if forced varyon is not set to true


New Corrective Actions for Existing Verification Checks

RSCT instance number verification check

– If out of sync, synchronize the entire cluster configuration prior to starting services

Boot Time IP-Addresses

– Only for alias service interfaces, IPAT via replacement carries routing information that will not move

– clcomd will reconnect when changing interface used by clcomd

Disable File Systems Auto-Mount

– /etc/filesystems: remove automatic from mount option for configured file

Page 16: 04 Cluster Verification Enhancements


Shared Volume Group With Auto-varyon

Verification checks any HACMP cluster nodes containing volume groups that were defined as:

– Shared volume groups directly configured in a resource group

– Owning a file system defined in a resource group

Verification will report an error:

ERROR: Volume Group: VG1 used in resource group RG1 has automatic varyon attribute configured to “yes” on node waltham. This parameter needs to be changed to “no” in order for HACMP to function

If corrective actions are enabled, verification runs a corrective action to disable the auto-varyon flag. This requires that the volume group can be brought online


Log Files, Troubleshooting

/var/hacmp/log/autoverify.log – AV&S uses this log file to record execution trace output (set -x) from the script that runs prior to cluster startup

VERBOSE_LOGGING=high – turns debugging on in various components including pre-post scripts for cluster startup

/var/hacmp/clverify/* directory contains HACMP verification log files

Page 17: 04 Cluster Verification Enhancements


An automatic cluster verification and synchronization option is provided, easily tunable by the user through the SMIT interface

New verification functionality has been added:

– AEN for SPOF refresh

– XD external verifications have been consistently integrated

– Command-based cldiag and clverify utilities have been discontinued, with all functionality preserved via SMIT

– Automatic creation of the clhosts.client file, to be used as the prototype of the clhosts file for client nodes

New verifications, both with and without corrective actions

New corrective actions for existing verifications