HACMP Administration Tasks

Transcript of HACMP Administration Tasks

Page 1: HACMP Administration Tasks

© 2010 IBM Corporation

Michael Herrera

PowerHA SystemMirror (HACMP) for AIX

ATS Certified IT Specialist

[email protected]

PowerHA SystemMirror Common Tasks for HA Administrators

Session ID: 41CO

Page 2: HACMP Administration Tasks


Agenda

• Management
  – Starting & stopping cluster services
  – Moving resources
  – Saving off the configuration

• Maintenance
  – Upgrading AIX & cluster software
  – C-SPOC: LVM changes
  – Adding / removing physical volumes
  – Network changes (dynamic)
  – Setting up pager notification
  – Deploying file collections
  – Custom cluster verification methods
  – Practical use of UDE events
  – Online Planning Worksheets

• Configuration Optimization
  – Hostname changes
  – Naming requirements in V7.x
  – Automatic start of cluster services (or not)
  – Dynamic Node Priority
  – Application monitoring
  – DLPAR integration
  – Resource group dependencies

• Tunables
  – Cluster security
  – Failure Detection Rate (FDR)
  – Adding users
  – Password changes

• Common Commands (CLI)
  – clmgr, lscluster

Page 3: HACMP Administration Tasks


Attention: HA 7.1.1 SP2 or SP3 is not reported back properly. The halevel command probes with the wrong option, and since the "server.rte" fileset is not updated it will not catch the updates to the cluster.cspoc.rte filesets.

How do you check what version of code you are running? (A clcmd sketch for checking all nodes at once follows below.)

• Historically we have run:

# lslpp -l cluster.es.server.rte
  Fileset                      Level    State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  cluster.es.server.rte        7.1.1.1  COMMITTED  Base Server Runtime
Path: /etc/objrepos
  cluster.es.server.rte        7.1.1.1  COMMITTED  Base Server Runtime

• Now you can also run:

# halevel -s
7.1.1 SP1   ← even though the machine may be running SP2

• Also useful:

# lssrc -ls clstrmgrES | grep fix
cluster fix level is "3"
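Since halevel can under-report the service pack, it helps to compare the fileset level and the cluster manager fix level on every node at once. A minimal ksh sketch, assuming a V7 cluster where the CAA clcmd distributed-command utility is available (clcmd prefixes each node's output with a NODE header):

#!/bin/ksh
# Report the PowerHA code level from every cluster node in one pass
clcmd lslpp -l cluster.es.server.rte                 # fileset level per node
clcmd /usr/es/sbin/cluster/utilities/halevel -s      # halevel per node
clcmd lssrc -ls clstrmgrES | grep -E "NODE|fix"      # cluster fix level per node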

Page 4: HACMP Administration Tasks


Upgrade Considerations

There are two main areas to consider – the OS and the HA software.

Operating System:
• Should you do AIX first or the HA code?
  – Should you combine the upgrades?
  – New OS requirements for HA
  – What is your back-out plan?
    • Alternate disk install
    • mksysb
• BOS updates will typically require a reboot (hence a disruption)

Cluster Software Code:
• What type of migration?
  – Snapshot migration
  – Rolling migration
  – Non-disruptive update
• Evaluate the source-to-target level
  – Can you perform an NDU (non-disruptive update)?
  – New minimum OS requirements
  – New required settings
    • IP multicasting, hostname restrictions
    • Required topology changes

• Change controls: what is your ability to apply and test the updates?
• Consider things like interim fixes locking down the system (see the emgr sketch below)
  – Will they need to be reapplied?
  – Will they need to be rebuilt?
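For the interim fix question above, a hedged sketch of the emgr commands typically involved; the <label> and <ifix.epkg.Z> values are placeholders for your own fixes:

# List interim fixes installed on this node (or on all V7 cluster nodes with clcmd)
emgr -l
clcmd emgr -l

# Remove a fix that locks the filesets before running update_all (label is a placeholder)
emgr -r -L <label>

# Reapply or apply a rebuilt fix after the update (package name is a placeholder)
emgr -e <ifix.epkg.Z>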

Page 5: HACMP Administration Tasks


AIX Upgrade Flow in Clustered Environment

Hypothetical example – 2-node cluster running AIX 7.1

Starting point – Standby system (Operating System @ AIX 7.1.0.0):
- Stop cluster services
- OS update TL1 & SPs
- Reboot
- Reintegrate into the cluster with AIX 7.1.1.5

Active production environment (Operating System @ AIX 7.1.0.0):
- Stop with takeover (the standby system, now running the new level, acquires the resource group / application)
- OS update TL1 & SPs
- Reboot
- Reintegrate into the cluster with AIX 7.1.1.5
- Issue an rg_move back, or continue to run on the standby system

You can start the upgrade on either node, but obviously an update to the node hosting the application would cause a disruption to operations.

Common question: Can the cluster run with the nodes running different levels?

Page 6: HACMP Administration Tasks


Flow of PowerHA Software Upgrade

Hypothetical example – 2-node cluster, HA Version 5.5 to 6.1

Starting point – Standby system (HA Version 5.5):
- UNMANAGE resources
- smit update_all
- smit clstart
→ Node running Version 6.1

Active production environment (HA Version 5.5):
- UNMANAGE resources – the application is still running
- smit update_all – HA level & patches; be mindful of new base filesets
- smit clstart – start scripts will get reinvoked
→ Node running the new 6.1 version – the application is still active

We advise against stopping the cluster with the UNMANAGE option on more than one node at a time. Note that it can be done, but there are various factors to consider.

Common question: How long can the cluster run in a mixed mode? What operations are supported?

Page 7: HACMP Administration Tasks


Client Scenario – Database Binary Upgrade

Scenario:
- The client had an environment running independent Oracle databases in a mutual-takeover cluster configuration. They wanted to update the Oracle binaries one node at a time and to avoid an unexpected fallover during the process, so they wished to UNMANAGE cluster resources on all nodes at the same time.

Lessons learned:
• Do not upgrade the cluster filesets while resources are unmanaged on all nodes
  – This would recycle the clstrmgrES daemon and the cluster would lose its internal state
• Application monitors are not suspended when you UNMANAGE the resources
  – If you manually stop the application and forget about the monitors, existing application monitors could auto-restart it or initiate a takeover, depending on your configuration
• Application start scripts will get invoked again on restart of cluster services
  – Be aware of what happens when your start script is invoked while the application is already running, or comment out the scripts prior to restarting cluster services (see the idempotent start script sketch below)
• Leave the Manage Resources attribute set to "Automatic"
  – Otherwise the RG will continue to show as UNMANAGED until you do an RG move ONLINE
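One way to make a start script safe to re-invoke, as happens when cluster services are restarted after an UNMANAGE stop, is to make it idempotent. A minimal sketch; the process name, user and start command are hypothetical placeholders:

#!/bin/ksh
# Hypothetical idempotent start script: exit cleanly if the application is
# already running so a restart of cluster services does not launch a second copy
APP_PROC="ora_pmon_PROD"                      # placeholder process name

if ps -ef | grep "$APP_PROC" | grep -v grep > /dev/null; then
    echo "$(basename $0): application already running - nothing to do"
    exit 0
fi

# Not running yet - start it the normal way (placeholder command)
su - appadmin -c "/usr/local/hascripts/start_app.sh"
exit 0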

Page 8: HACMP Administration Tasks


PowerHA SystemMirror: Cluster Startup Behavior

• What is the "Best Practice"?
  – Cluster services can be set to automatically start up on boot-up
  – All currently supported releases perform a cluster verification on startup and will validate whether the node can enter the cluster

Page 9: HACMP Administration Tasks


PowerHA SystemMirror - Cluster Start up Behavior

• The cluster manager daemon is now running all of the time

# clshowsrv -v
Status of the RSCT subsystems used by HACMP:
 Subsystem         Group            PID          Status
 cthags            cthags           4980948      active
 ctrmc             rsct             4063376      active

Status of the HACMP subsystems:
 Subsystem         Group            PID          Status
 clstrmgrES        cluster          4915234      active
 clcomd            caa              6422738      active

Status of the optional HACMP subsystems:
 Subsystem         Group            PID          Status
 clinfoES          cluster          8847544      active

• Settings can be altered within the cluster panels:

# lssrc -ls clstrmgrES | grep state
Current state: ST_STABLE

  – The default startup behavior is false
  – Verify Cluster should be left set to true

Page 10: HACMP Administration Tasks


So how do you start up Cluster Services ?

• smitty sysmirror → System Management → PowerHA SystemMirror Services → Start / Stop
• smitty clstart (FastPath)
• clmgr start cluster
  – clmgr online node nodeA
  – clmgr start node nodeA
• IBM Systems Director Plug-In

Page 11: HACMP Administration Tasks


PowerHA SystemMirror: Cluster Stop Options

• What is the purpose of each option?
• You cannot non-disruptively upgrade from a pre-7.x version to newer releases
• The upgrade from 7.1.0 to 7.1.1 is also disruptive

For non-disruptive updates, stop services on only one node at a time so that one node retains the status of the cluster resources.

Page 12: HACMP Administration Tasks


UNMANAGE Resource Group Feature in PowerHA

• Function used for non-disruptive updates (one node at a time)
  – Previously known as the Forced Stop
• HA daemons will continue to run, but resources will not be monitored

Application monitors will continue to run. Depending on the implementation, it might be wise to suspend monitors prior to this operation.

Page 13: HACMP Administration Tasks


Moving Resources between Nodes

• clRGmove -g <RGname> -n <nodename> -m
• clmgr move rg <RGname> node=<nodename>
• IBM Systems Director Plug-In
• smitty cl_admin

If multiple RGs are selected, the operation and resources will be processed sequentially.

Page 14: HACMP Administration Tasks


Types of Available RG Dependencies

• Parent / Child dependencies → made available in V5.2
• Location dependencies → made available in V5.3
  – Online on Same Node
  – Online on Different Nodes
  – Online on Same Site
• Start After & Stop After → made available in V7.1.1

Most of this is old news, but the use of dependencies can affect where and how the resources get acquired. More importantly, it can affect the steps required to move resource groups, so more familiarity with the configuration is required.

Page 15: HACMP Administration Tasks


Moving Resource Groups with Dependencies

• Invoked with: clRGmove -g <RGname> -n <nodename> -m

Page 16: HACMP Administration Tasks


Automatic Corrections on Verify & Sync

There are Verify & Sync options in the first two paths; however, note that they do not include the auto-corrective option. You need to follow the Custom Cluster Configuration path for that.

The custom path will allow corrective actions only if cluster services are not running on ANY of the cluster nodes. By default it will not perform any corrective actions.

Page 17: HACMP Administration Tasks


Automatic Nightly Cluster Verification

• By default the cluster will run a nightly verification check at midnight
• The clutils.log file should show the results of the nightly check

Be aware of the clcomd changes for Version 7 clusters.

Page 18: HACMP Administration Tasks


Cluster Custom Verification Methods

Note: Automatic verify & sync on node startup does not include any custom verification methods.

• Cluster verification is made up of a set of data collectors
• Checks will return PASSED or FAILED
  – They will often provide more details than what is reported in the smit.log output
• Custom verification methods may be defined to run during the Verify / Sync operations

Page 19: HACMP Administration Tasks


Adding Custom Verification Methods

Problem Determination Tools > PowerHA SystemMirror Verification > Configure Custom Verification Method
• Add a Custom Verification Method and press Enter

Output in the smit.log and clverify.log files:

Currently Loaded Interim Fixes:

NODE mutiny.dfw.ibm.com
PACKAGE                       INSTALLER    LABEL
============================  ===========  ==========
bos.rte.security              installp     passwdLock

NODE munited.dfw.ibm.com
PACKAGE                       INSTALLER    LABEL
============================  ===========  ==========
bos.rte.security              installp     passwdLock

Please Ensure that they are consistent between the nodes!

Page 20: HACMP Administration Tasks


Custom Verification Methods

• Custom methods should be in a common path on all cluster members
  – e.g. /usr/local/hascripts/custom_ver_check.sh
• The methods are stored in the cluster ODM stanzas
• Script logic & return codes – how fancy do you want to get? (see the extended sketch after the sample below)

#!/bin/ksh
echo "Currently Loaded Interim Fixes:"
clcmd emgr -P
echo "Please Ensure that they are consistent between the nodes!"

Page 21: HACMP Administration Tasks


PowerHA SystemMirror: Cluster Snapshots

• Snapshot files live in /usr/es/sbin/cluster/snapshots/ as <snapshotname>.odm and <snapshotname>.info
  – The .odm file holds the cluster configuration ODM stanzas (HACMPcluster, HACMPnode, HACMPadapter, ...)
  – The .info file holds a cluster report & CLI output (HTML tags plus cllsnode, cllscf, cllsif output, ...)
• Snapshots are saved off automatically any time a Verify / Sync operation is invoked
• The .info file is not necessary in order to be able to restore the configuration
• The snapshot menu asks for a <name> and a <description> as the only required fields
• The snapshot upgrade migration path requires the entire cluster to be down

Page 22: HACMP Administration Tasks


PowerHA SystemMirror: Changing the Hostname

• CAA does not currently support changing a system's hostname
  – Basically this means: do not attempt to do this in a Version 7.x cluster
  * This restriction is currently under evaluation by the CAA development team and may be lifted in a future update
• The hostname (and its UUID) is recorded in the lscluster output
• (Diagram) A resource group with an application controller (start.sh sets a new hostname, stop.sh unsets it), a service IP, a volume group and /filesystems moves between nodes; each node keeps its own inet0 hostname, and only the service IP should be swapping between nodes
• The same is true for the cluster repository disk: its UUID is stored, hence you should not attempt to replicate the volume or create mirrors of the caa_private volume group

Page 23: HACMP Administration Tasks


Naming requirements in V7 clusters

• The COMMUNICATION_PATH has to resolve to the hostname IP
  – In prior releases the communication path could be any path to the node
• The node name can be different from the hostname
• The use of a "-" is not supported in the node name
  – Clients further highlighted this limitation by using clmgr to create the cluster: if a node name is not specified and the hostname contains a "-", the default node name assigned will also try to use a "-"
  – ksh restrictions were removed to allow the use of a "-" in service IP labels, so both V6.1 and V7.x support their use in that name

Page 24: HACMP Administration Tasks


Changes to Node outbound traffic

There were changes made to AIX & PowerHA alias processing:

• Cluster running HA V6.1 SP7 with AIX 6.1 TL2
  – The service IP alias is listed after the persistent & base addresses
• Cluster running HA V7.1 SP3 with AIX 7.1 TL1 SP4
  – The service IP alias is automatically listed before the base address (note that no persistent IP is configured in this environment)

Page 25: HACMP Administration Tasks


Number of Resources & Fallover Times

Common questions:
– Will the number of disks or volume groups affect my fallover time?
– Should I configure fewer larger LUNs or more smaller LUNs?

Versions 6.1 and earlier allowed standard VGs or enhanced concurrent (ECM) VGs; Version 7.x requires the use of ECM volume groups.

Your answers:
• Standard VGs require an openx call against each physical volume
  – Processing could take several seconds to minutes depending on the number of LUNs
• ECM VGs are varied on on all nodes (ACTIVE / PASSIVE)
  – It takes seconds per VG
• Parallel processing will attempt to varyon all VGs in parallel

Page 26: HACMP Administration Tasks


Number of Resource Groups

(Diagram)
NODE A: RG1 (NodeA, NodeB) – Service IP, VG1, APP Server 1; RG3 (NodeA, NodeB) – Service IP, VG3, APP Server 3
NODE B: RG2 (NodeB, NodeA) – Service IP, VG2, APP Server 2; RG4 (NodeB, NodeA) – Service IP, VG4, APP Server 4

• RG decisions beyond startup, fallover & fallback behavior – further options:
  – 1 RG vs. multiple RGs: selective fallover behavior (VG / IP)
  – RG processing: parallel vs. sequential
  – Delayed fallback timer: when do you want to fall back?
  – RG dependencies: Parent / Child, Location, Start After / Stop After

Best practice: Always try to keep it simple, but stay current with new features and take advantage of existing functionality to avoid added manual customization.

Page 27: HACMP Administration Tasks


Filesystem Definitions in a Resource Group

• Should you explicitly define the filesystems in a resource group?
• The PowerHA default behavior is to mount ALL filesystems in the volume group
• Reasons to explicitly define them:
  – Nested filesystems
  – Only mount the filesystems specified
• Scenario: 10 filesystems in the volume group & only 1 defined in the RG
  – HA processing will only mount that one filesystem

What are the implications going forward if you add new filesystems via C-SPOC and forget to append them to the resource group definition?

Page 28: HACMP Administration Tasks


Event Processing of resources

Invoked during parallel processing:
• acquire_svc_addr
• acquire_takeover_addr
• node_down
• node_up
• release_svc_addr
• release_takeover_addr
• start_server
• stop_server

Not invoked:
• get_disk_vg_fs
• node_down_local
• node_down_remote
• node_down_local_complete
• node_down_remote_complete
• node_up_local
• node_up_remote
• node_up_local_complete
• node_up_remote_complete
• release_vg_fs

• Resource groups are processed in parallel unless you implement RG dependencies or set a customized serial processing order (HA 4.5+)
• The newer process_resources event script is organized around job types: ACQUIRE, RELEASE, ONLINE, OFFLINE, DISKS, TAKEOVER_LABELS, APPLICATIONS and more (i.e. JOB_TYPE=VGS)

* Be mindful of this with the implementation of pre/post events

Page 29: HACMP Administration Tasks


Defining Pre / Post Events

• Pre/Post-event commands are NOT the same thing as User Defined Events

A custom event will never get invoked unless you explicitly define it as a pre- or post-event command to an existing cluster event.

Page 30: HACMP Administration Tasks


User Defined Events - UDE

• This option allows you to exploit RMC resource monitors to trigger events
• Familiarize yourself with the lsrsrc command
  – A Practical Guide for Resource Monitoring and Control – SG24-6615

# odmget HACMPude

HACMPude:
        name = "Herrera_UDE_event"
        state = 0
        recovery_prog_path = "/usr/local/hascripts/Herrera_UDE"
        recovery_type = 2
        recovery_level = 0
        res_var_name = "IBM.FileSystem"
        instance_vector = "Name = \"/\""
        predicate = "PercentTotUsed > 95"
        rearm_predicate = "PercentTotUsed < 70"

Notes:
• Recycle cluster services after updating UDE events
• Scripts must exist on all cluster nodes (path, permissions)
• Logic in the recovery program can be configured to send notification, append more space, etc. (a sketch follows below)
• Multiple values can be specified in the Selection String field
• Actions are logged in the clstrmgr.debug and hacmp.out files
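As an illustration of the notes above, a hedged sketch of what the recovery program named in the stanza might contain for this particular predicate (root filesystem over 95% full); the mail recipient and the amount of space added are placeholders:

#!/bin/ksh
# Hypothetical recovery program for the UDE shown above: notify the
# administrator and try to add a little space to the root filesystem
FS="/"

echo "UDE fired on $(hostname): $FS has exceeded the 95% usage threshold" | \
    mail -s "PowerHA UDE: $FS filling up" root          # placeholder recipient

# Attempt to grow the filesystem (requires free partitions in rootvg)
chfs -a size=+1G $FS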

Page 31: HACMP Administration Tasks


PowerHA SystemMirror: File Collections

• Introduced in HA 5.2
  – Ability to automatically push files every 10 minutes from the source node specified
  – Default collections are created but not enabled by default

• Configuration_Files
  – /etc/hosts
  – /etc/services
  – /etc/snmpd.conf
  – /etc/snmpdv3.conf
  – /etc/rc.net
  – /etc/inetd.conf
  – /usr/es/sbin/cluster/netmon.cf
  – /usr/es/sbin/cluster/etc/clhosts
  – /usr/es/sbin/cluster/etc/rhosts
  – /usr/es/sbin/cluster/etc/clinfo.rc

• SystemMirror_Files
  – Pre, post & notification events
  – Start & stop scripts
  – Scripts specified in monitors
  – Custom pager text messages
  – SNA scripts
  – Scripts for tape support
  – Custom snapshot methods
  – User defined events

• Not intended to maintain users & passwords between cluster nodes

Page 32: HACMP Administration Tasks


File Collections Application script Scenario

# smitty sysmirror → System Management → File Collections

If the propagation option is set to yes, files will be propagated every 10 minutes.

(Diagram) Node A: /usr/local/hascripts/app* start/stop scripts containing the RED updates. Node B: the same paths containing the BLUE logic.

Page 33: HACMP Administration Tasks


PowerHA SystemMirror - User & Group Administration

# smitty sysmirror → System Management → Security and Users

• Can select:
  – Local (files)
  – LDAP
• Select Nodes by Resource Group
  – No selection means all nodes
• Users will be propagated to all of the applicable cluster nodes
• The password command can be altered to ensure consistency across all nodes

Page 34: HACMP Administration Tasks


PowerHA SystemMirror - User Passwords (clpasswd)

# smitty sysmirror → System Management → Security and Users → Passwords in a PowerHA SystemMirror cluster

• Optional list of users whose passwords will be propagated to all cluster nodes
  – The passwd command is aliased to clpasswd
• Functionality available since HACMP 5.2 (Fall 2004)

Page 35: HACMP Administration Tasks


Repository Disk Failure

Page 36: HACMP Administration Tasks


Pager Notification Events

• As long as sendmail is enabled you can easily receive event notification

smitty sysmirror → Custom Cluster Configuration → Events → Cluster Events → Remote Notification Methods → Add a Custom Remote Notification Method

Sample email:
From: root 10/23/2012
Subject: HACMP
Node mhoracle1: Event acquire_takeover_addr occurred at Tue Oct 23 16:29:36 2012, object =

Page 37: HACMP Administration Tasks


Pager Notification Methods

HACMPpager:
        methodname = "Herrera_notify"
        desc = "Lab Systems Pager Event"
        nodename = "connor kaitlyn"
        dialnum = "[email protected]"
        filename = "/usr/es/sbin/cluster/samples/pager/sample.txt"
        eventname = "acquire_takeover_addr config_too_long event_error node_down_complete node_up_complete"
        retrycnt = 3
        timeout = 45

# cat /usr/es/sbin/cluster/samples/pager/sample.txt
Node %n: Event %e occurred at %d, object = %o

Sample email (a hypothetical customized message follows below):
From: root 09/01/2009
Subject: HACMP
Node kaitlyn: Event acquire_takeover_addr occurred at Tue Sep 1 16:29:36 2009, object =

Attention: sendmail must be working and accessible through the firewall to receive notifications.

• Action taken: halted node Connor
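The sample.txt template above shows the available substitution tokens: %n (node), %e (event), %d (date/time) and %o (object). A customized pager text message is simply free text plus those tokens; as a purely hypothetical example:

Lab cluster alert from node %n
Cluster event %e fired at %d (object: %o).
Check hacmp.out on the surviving node and follow the lab runbook.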

Page 38: HACMP Administration Tasks


Online Planning Worksheets Discontinued in Version 7

• The fileset is still there, but the content is no longer there

There is a push to leverage IBM Systems Director, which will guide you through the step-by-step configuration of the cluster.

Page 39: HACMP Administration Tasks


PowerHA SystemMirror – Deadman Switch (CAA)

• The CAA DMS tunable (deadman_mode) allows two different actions:
  – Assert (crash) the system (default behavior)
  – Generate an AHAFS event
• The Version 7 cluster software changes the old behavior

Recent client failure scenario: The repository disk LUN had been locked and had not been responsive for days. The client was unaware, and the standby node had a problem. The primary system was brought down when it was unable to write to the repository disk.

Page 40: HACMP Administration Tasks


LVM Dynamic Updates

• The cluster is easy to set up, but what about changes going forward?
• ECM volume groups (required at HA V7)
  – New LVs will get pushed across; filesystems will not
    • LV updates get pushed across but do not update /etc/filesystems
    • Lazy Update would resolve this issue
  – ECM limitations lifted for:
    • reorgvg & chvg -g size changes
• Cluster import option
  – Correcting out-of-sync timestamps → auto-corrections or import
• Built-in Lazy Update

Page 41: HACMP Administration Tasks


C-SPOC Allows for a Multitude of DARE Operations

• The Cluster Single Point of Control (C-SPOC) options facilitate dynamic operations

# smitty cl_admin

Follow these panels to dynamically add or remove resources from the cluster, or to perform resource group movements between nodes.

There are C-SPOC-specific logs in the HA cluster that will provide details in the event of a problem.

Page 42: HACMP Administration Tasks


CSPOC: Storage & LVM Menus

Page 43: HACMP Administration Tasks


Tunable Failure Detection Rate in 7.1.1

• Note that the SMIT menu to alter these values was missing prior to HA 7.1.1 SP1
• Checking the current settings:

root@mhoracle1 /> clctrl -tune -o node_down_delay
sapdemo71_cluster(07552a84-057b-11e1-b7cb-46a6ba546402).node_down_delay = 10000
root@mhoracle1 /> clctrl -tune -o node_timeout
sapdemo71_cluster(07552a84-057b-11e1-b7cb-46a6ba546402).node_timeout = 20000

• Modifying via the command line:

clmgr modify cluster HEARTBEAT_FREQUENCY=10000 GRACE_PERIOD=5000

*** The settings take effect only after the next sync. The attributes are stored in the HACMPcluster object class.

Page 44: HACMP Administration Tasks


FDR comparison to Version 6.1 & earlier versions

RSCT (topsvcs) – V6.1 and earlier:
– Heartbeat settings can be defined for each network type (NIM)
– The settings for heartbeat are: Grace Period, Failure Cycle, Interval between heartbeats
– The combination of heartbeat rate and failure cycle determines how quickly a failure can be detected and may be calculated using this formula (a worked example follows):
  (heartbeat rate) * (failure cycle) * 2 seconds
– The failure cycle is the time after which another node may consider the adapter DOWN if it receives no incoming heartbeats
– The grace period is the waiting period after a failure is detected before it is reported

CAA – V7.1 and later:
– Heartbeat settings are the same for all networks in the cluster (one perspective is that only Ethernet networks are supported)
– The settings for heartbeat are: Grace Period (5 - 30 seconds), Failure Cycle (1 - 20 seconds)
– The actual heartbeat rate is calculated from the Failure Cycle
– The grace period is the waiting period after a failure is detected before it is reported

*** Note that HA 7.1.0 had a self-tuning failure detection rate
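As a quick worked example of the RSCT formula above (numbers are illustrative only): with a heartbeat interval of 1 second and a failure cycle of 10, a failure would be detected in roughly 1 * 10 * 2 = 20 seconds; halving either value halves the detection time.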

Page 45: HACMP Administration Tasks


Application Monitoring within PowerHA SystemMirror

• Some monitors are provided in the Smart Assists
  – e.g. cluster.es.assist.oracle → /usr/es/sbin/cluster/sa/oracle/sbin/DBInstanceMonitor
• A monitor is bound to the Application Controller (example: OracleDB)

(Diagram) Monitor types bound to the controller:
  – Startup Monitor: confirms the startup of the application; only invoked on application startup (ties into the new Application Startup Mode in HA 7.1.1)
  – Process Monitor: checks the process table (e.g. 60-second interval)
  – Custom Monitor: invokes the custom logic (e.g. 60-second interval; see the sketch below)

Long-running monitors will continue to run locally with the running application.
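A custom monitor is simply a script the cluster runs on the configured interval. A minimal process-check sketch, assuming the usual convention that the monitor returns 0 when the application is healthy and non-zero to signal a failure; the process name is a hypothetical placeholder:

#!/bin/ksh
# Hypothetical custom application monitor: healthy only when the
# application's main process is present in the process table
APP_PROC="ora_pmon_PROD"                      # placeholder process name

if ps -ef | grep "$APP_PROC" | grep -v grep > /dev/null; then
    exit 0      # application looks healthy
else
    exit 1      # non-zero is treated as a monitor failure
fi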

Page 46: HACMP Administration Tasks


PowerHA SystemMirror: Application Startup 7.1.1

• The cluster invokes the start script but doesn't confirm its success
• Consider at least an application startup monitor
• An enhancement was introduced in HA Version 7.1.1: the application start may be set to run in the foreground

(Diagram) Resource Group A: Application Controller (start.sh / stop.sh) with a Startup Monitor and a Long-Running Monitor, plus the Service IP, Volume Group and /filesystems.

Page 47: HACMP Administration Tasks


PowerHA SystemMirror – HMC Definition

Food for thought: How many DLPAR operations can be handled at once?

• Multiple HMC IPs may be defined, separated by a space
• The information is stored in HA ODM object classes
• There was no SDMC support; this is no longer much of an issue

Page 48: HACMP Administration Tasks


PowerHA SystemMirror – Integrated DLPAR Menu

Add Dynamic LPAR and CoD Resources for Applications

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                  [Entry Fields]
* Application Controller Name                           Application_svr1
* Minimum number of processing units                    [0.00]
* Desired number of processing units                    [0.00]
* Minimum number of CPUs                                [0]       #
* Desired number of CPUs                                [0]       #
* Minimum amount of memory (in megabytes)               [0]       #
* Desired amount of memory (in megabytes)               [0]       #
* Use CoD if resources are insufficient?                [no]      +
* I agree to use CoD resources                          [no]      +
  (Using CoD may result in extra costs)

You must ensure that:
* CoD enablement keys are activated
* CoD resources are not used for any other purpose

Note: The HMC IPs are defined and stored in a different HA panel.

Page 49: HACMP Administration Tasks


The many uses of the clmgr utility

• V7 clustering introduces many applications for this command:

Add a new cluster / add a new node:
# clmgr add cluster clmgr_cluster REPOSITORY=hdisk2 CLUSTER_IP=228.1.1.36
# clmgr add node clmgr2

# clmgr add network net_ether_01 TYPE=ether
# clmgr add interface clmgr2b2 NETWORK=net_ether_02 NODE=clmgr2 INTERFACE=en1
# clmgr add persistent clmgr1p1 NETWORK=net_ether_01 NODE=clmgr1
# clmgr add service_ip clmgrsvc1 NETWORK=net_ether_01

Add an application controller:
# clmgr add application_controller test_app1 STARTSCRIPT="/home/apps/start1.sh" STOPSCRIPT="/home/apps/stop1.sh" STARTUP_MODE=background

# clmgr add volume_group test_vg1 NODES="clmgr1,clmgr2" PHYSICAL_VOLUMES=hdisk3 TYPE=original MAJOR_NUMBER=35 ACTIVATE_ON_RESTART=false

Add a new resource group:
# clmgr add resource_group clmgr_RG1 NODES="clmgr1,clmgr2" STARTUP=OHN FALLOVER=FNPN FALLBACK=NFB VOLUME_GROUP=test_vg SERVICE_LABEL=clmgrsvc1 APPLICATIONS=test_app1

Verify / sync the cluster:
# clmgr verify cluster CHANGES_ONLY=no FIX=yes LOGGING=standard
# clmgr sync cluster CHANGES_ONLY=no FIX=yes LOGGING=standard

Start cluster services:
# clmgr online cluster WHEN=now MANAGE=auto BROADCAST=true CLINFO=true

Change the cluster name:
# clmgr modify cluster NAME=my_new_cls_label

Suspend / resume application monitors:
# clmgr manage application_controller suspend test_app1 RESOURCE_GROUP="clmgr_RG1"
# clmgr manage application_controller resume test_app1 RESOURCE_GROUP="clmgr_RG2"

Page 50: HACMP Administration Tasks


Summary

• There are some notable differences between V7 and HA 6.1 and earlier
  – Pay careful attention to where some of the options are available
  – A summary chart of new features is appended to the presentation
• Version 7.1.2 is scheduled for GA on Nov 9th
  – Brings Enterprise Edition to V7 clusters
• This session is an attempt to make you aware of available options in PowerHA
  – Take my recommendations with a grain of salt!
• Take advantage of integrated features & interfaces like:
  – Application monitoring infrastructure
  – File Collections
  – Pre/Post Events and User Defined Events
  – Pager Notification Methods
  – The new clmgr CLI

SG24-8030

Page 51: HACMP Administration Tasks


Summary Chart

New Functionality & Changes
– New CAA infrastructure (7.1.x)
  • IP multicast-based heartbeat protocol
  • HBA-based SAN heartbeating
  • Private network support
  • Tunable Failure Detection Rate
  • New service IP distribution policies
  • Full IPv6 support (7.1.2)
– Disk fencing enhancements (7.1.0)
– Rootvg system event (7.1.0)
– Disk rename function (7.1.0)
– Repository disk resilience (7.1.1)
  • Backup repository disks (7.1.2)
– New application startup mode (7.1.1)
– Exploitation of JFS2 Mount Guard (7.1.1)
– Adaptive fallover (7.1.0)
– New RG dependencies (7.1.0)
  • Start After, Stop After
– Federated Security (7.1.1)
  • RBAC, EFS & security system administration

Extended Distance Clusters
– XIV Replication integration (12/16/2011)
– XP12000, XP24000 (11/18/2011)
– HP9500 (8/19/2011)
– Storwize V7000 (9/30/2011)
– SVC 6.2 (9/30/2011)

Smart Assists (Application Integration)
– SAP liveCache with DS or SVC (7.1.1)
– MQ Series (7.1.1)

DR Capabilities
– Stretched & Linked Clusters (7.1.2)
– DS8000 HyperSwap (7.1.2)

Management
– New command line interface (7.1.0): clcmd, clmgr utility, lscluster
– IBM Systems Director management (7.1.0)

Page 52: HACMP Administration Tasks


Questions?

Thank you for your time!

Page 53: HACMP Administration Tasks


Additional Resources

• PowerHA SystemMirror 7.1.1 Update – SG24-8030
  http://www.redbooks.ibm.com/redpieces/abstracts/sg248030.html?Open

• PowerHA SystemMirror 7.1 Redbook – SG24-7845 (removed from the download site)
  http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247845.html?Open

• Disaster Recovery Redbook: SG24-7841 – Exploiting PowerHA SystemMirror Enterprise Edition for AIX
  http://www.redbooks.ibm.com/abstracts/sg247841.html?Open

• RedGuide: High Availability and Disaster Recovery Planning: Next-Generation Solutions for Multiserver IBM Power Systems Environments
  http://www.redbooks.ibm.com/abstracts/redp4669.html?Open

• PowerHA SystemMirror marketing page
  http://www-03.ibm.com/systems/power/software/availability/aix/index.html

• PowerHA SystemMirror wiki page
  http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/High+Availability