HACMP Administration Tasks
PowerHA SystemMirror (HACMP) for AIX
PowerHA SystemMirror Common Tasks for HA Administrators
Michael Herrera, ATS Certified IT Specialist
Session ID: 41CO
© 2010 IBM Corporation
Agenda
• Management
  – Starting & stopping cluster services
  – Moving resources
  – Saving off the configuration
• Maintenance
  – Upgrading AIX & the cluster software
  – CSPOC LVM changes
  – Adding / removing physical volumes
  – Network changes (dynamic)
  – Setting up pager notification
  – Deploying file collections
  – Custom cluster verification methods
  – Practical use of UDE events
  – Online Planning Worksheets
• Configuration Optimization
  – Hostname changes
  – Naming requirements in V7.x
  – Automatic start of cluster services (or not)
  – Dynamic Node Priority
  – Application Monitoring
  – DLPAR integration
  – Resource group dependencies
• Tunables
  – Cluster security
  – Failure Detection Rate (FDR)
  – Adding users
  – Password changes
• Common Commands (CLI)
  – clmgr, lscluster
Attention:
Be aware that HA 7.1.1 SP2 or SP3 is not reported back properly. The halevel command probes with the wrong option, and since the "server.rte" fileset is not updated it will not catch the updates to the cluster.cspoc.rte filesets.
How do you check what version of code you are running?
• Historically we have run:
# lslpp -l cluster.es.server.rte
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
cluster.es.server.rte 7.1.1.1 COMMITTED Base Server Runtime
Path: /etc/objrepos
cluster.es.server.rte 7.1.1.1 COMMITTED Base Server Runtime
• Now you can also run:
# halevel -s
7.1.1 SP1        ← even though the machine may be running SP2
• Also useful:
# lssrc -ls clstrmgrES | grep fix
cluster fix level is "3"
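On a version 7 cluster, the clcmd distributed command (used later in this deck with emgr) can run the same check on every node at once; a minimal sketch:

# clcmd lssrc -ls clstrmgrES

Scan each node's section of the output for the "cluster fix level" line and confirm the values match across the nodes.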
Upgrade Considerations
Operating System:
• Should you do AIX first or the HA code?
  – Should you combine the upgrades?
  – New OS requirements for HA
  – What is your back-out plan? (see the sketch at the end of this slide)
    • Alternate disk install
    • mksysb
• BOS updates will typically require a reboot (hence a disruption)
Cluster Software Code:
• What type of migration?
  – Snapshot Migration
  – Rolling Migration
  – Non-Disruptive Update
• Evaluate the source and target levels
  – Can you perform an NDU update?
  – New minimum OS requirements
  – New required settings
    • IP multicasting, hostname restrictions
    • Required topology changes
There are two main areas to consider – the OS & the HA software.
• Change controls: what is your ability to apply and test the updates?
• Consider things like interim fixes locking down the system
  – Will they need to be reapplied?
  – Will they need to be rebuilt?
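For the alternate-disk back-out plan, a minimal sketch, assuming hdisk1 is a free disk (a hypothetical name):

# alt_disk_copy -d hdisk1     (clone the running rootvg before applying updates)
# bootlist -m normal hdisk1   (if the update must be backed out, boot the untouched clone)
# shutdown -Fr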
AIX Upgrade Flow in Clustered Environment
Hypothetical Example – a 2-node cluster running AIX 7.1

Starting Point – Standby System: Operating System @ AIX 7.1.0.0
- Stop Cluster Services
- OS update to TL1 & SPs
- Reboot
- Reintegrate into the cluster with AIX 7.1.1.5
→ Standby system now running the new level
- Acquire the Resource Group / Application

Active Production Environment: Operating System @ AIX 7.1.0.0
- Stop with Takeover
- OS update to TL1 & SPs
- Reboot
- Reintegrate into the cluster with AIX 7.1.1.5
- Issue an rg_move back, or continue to run on the standby system

You can start the upgrade on either node, but obviously an update to the node hosting the application would cause a disruption to operations.

Common Question: Can the cluster run with the nodes running different levels?
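A minimal command sketch of the standby-first flow above, using the clmgr forms shown later in this deck (node and group names are hypothetical):

# clmgr stop node standby1                  (stop cluster services on the standby)
  ... apply TL1 & SPs via smitty update_all, then reboot ...
# clmgr start node standby1                 (reintegrate at AIX 7.1.1.5)
# clmgr move rg app_rg node=standby1        (move the application to the updated node)
  ... repeat the OS update on the production node ...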
Flow of PowerHA Software Upgrade
Hypothetical Example – a 2-node cluster going from HA version 5.5 to 6.1

Starting Point – Standby System: HA Version 5.5
- UNMANAGE resources
- smit update_all
- smit clstart
→ Node running version 6.1

Active Production Environment – HA Version 5.5
- UNMANAGE resources – the application is still running
- smit update_all – HA level & patches; be mindful of new base filesets
- smit clstart – the start scripts will get reinvoked
→ Node running at the new 6.1 version – application still active

We advise against stopping the cluster with the UNMANAGE option on more than one node at a time. Note that it can be done, but there are various factors to consider.

Common Question: How long can the cluster run in a mixed mode? What operations are supported?
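A minimal per-node sketch of the same flow using the newer clmgr syntax (names are hypothetical; the MANAGE options mirror the cluster start form shown later in this deck):

# clmgr stop node nodeA MANAGE=unmanage     (daemons keep running, application stays up)
# smitty update_all                         (apply the new HA level & patches)
# clmgr start node nodeA MANAGE=auto        (resume management of the resources)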
Client Scenario – Database Binary Upgrade
Scenario:
- The client had an environment running independent Oracle databases in a mutual-takeover cluster configuration. They wanted to update the Oracle binaries one node at a time and to avoid an unexpected fallover during the process, so they wished to UNMANAGE cluster resources on all nodes at the same time.
Lessons Learned:
• Do not upgrade the cluster filesets while unmanaged on all nodes
  – This would recycle the clstrmgrES daemon and the cluster would lose its internal state
• Application monitors are not suspended when you UNMANAGE the resources
  – If you manually stop the application and forget about the monitors, existing application monitors could auto-restart it or initiate a takeover, depending on your configuration (see the suspend sketch below)
• Application start scripts will get invoked again on restart of cluster services
  – Be aware of what happens when you invoke your start script while the application is already running, or comment out the scripts prior to restarting cluster services
• Leave the Manage Resources attribute set to "Automatic"
  – Otherwise it will continue to show the RG as UNMANAGED until you do an RG move ONLINE
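A minimal sketch of suspending and resuming the monitors around a manual application stop, using the clmgr form shown at the end of this deck (names are hypothetical):

# clmgr manage application_controller suspend ora_app1 RESOURCE_GROUP="db_rg1"
  ... stop the database, update the binaries, restart ...
# clmgr manage application_controller resume ora_app1 RESOURCE_GROUP="db_rg1"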
PowerHA SystemMirror: Cluster Startup Behavior
• What is the "Best Practice"?
  – Cluster services can be set to automatically start on boot-up
  – All currently supported releases perform a cluster verification on start-up and will validate whether the node can enter the cluster
PowerHA SystemMirror - Cluster Start up Behavior
• The cluster manager daemon is now running all of the time:
# clshowsrv -v
Status of the RSCT subsystems used by HACMP:
Subsystem Group PID Status
cthags cthags 4980948 active
ctrmc rsct 4063376 active
Status of the HACMP subsystems:
Subsystem Group PID Status
clstrmgrES cluster 4915234 active
clcomd caa 6422738 active
Status of the optional HACMP subsystems:
Subsystem Group PID Status
clinfoES cluster 8847544 active
• Settings can be altered within the cluster panels:
  – The default start-on-boot behavior is false
  – Verify Cluster should be left set to true

# lssrc -ls clstrmgrES | grep state
Current state: ST_STABLE
So how do you start up Cluster Services?
• smitty sysmirror → System Management → PowerHA SystemMirror Services → Start / Stop
• smitty clstart (fast path)
• clmgr start cluster
  – clmgr online node nodeA
  – clmgr start node nodeA
• IBM Systems Director Plug-In
PowerHA SystemMirror: Cluster Stop Options
• What is the purpose of each option?
  – You cannot non-disruptively upgrade from pre-7.X versions to newer releases
  – The upgrade from 7.1.0 to 7.1.1 is also disruptive

For non-disruptive updates, stop services on only one node at a time so that one node retains the status of the cluster resources.
UNMANAGE Resource Group Feature in PowerHA
• Function used for Non-Disruptive Updates (one node at a time)
  – Previously known as the Forced Stop
• HA daemons will continue to run, but resources will not be monitored

Application monitors will continue to run. Depending on the implementation, it might be wise to suspend monitors prior to this operation.
Moving Resources between Nodes
• clRGmove -g <RGname> -n <nodename> -m
• clmgr move rg <RGname> node=<nodename>
• IBM Systems Director Plug-In
• smitty cl_admin

If multiple RGs are selected, the operation and resources will be processed sequentially.
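A usage sketch with hypothetical names, moving a group to its standby node (either form works):

# clRGmove -g app_rg1 -n nodeB -m
# clmgr move rg app_rg1 node=nodeB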
Types of Available RG Dependencies
• Parent / Child Dependencies → made available in V5.2
• Location Dependencies → made available in V5.3
  – Online on Same Node
  – Online on Different Nodes
  – Online on Same Site
• Start After & Stop After → made available in V7.1.1
Most of this is old news, but the use of dependencies can affect where and how the resources get acquired. More importantly, it can affect the steps required to move resource groups, so more familiarity with the configuration is required.
Moving Resource Groups with Dependencies
• Invoked via: clRGmove -g <RGname> -n <nodename> -m
Automatic Corrections on Verify & Sync
There are Verify & Sync options in the first two paths; however, note that they do not include the Auto-Corrective option. You need to follow the Custom Cluster Configuration path for that. The custom path will allow corrective actions only if ALL cluster nodes are not running cluster services. By default it will not perform any corrective actions.
Automatic Nightly Cluster Verification
• By default the cluster will run a nightly verification check at midnight
• The clutils.log file should show the results of the nightly check
• Be aware of the clcomd changes for version 7 clusters
Cluster Custom Verification Methods
Note: Automatic verify & sync on node start-up does not include any custom verification methods
• Cluster verification is made up of a number of data collectors
• Checks will return PASSED or FAILED
  – They will often provide more details than what is reported in the smit.log output
• Custom verification methods may be defined to run during the Verify / Sync operations
Adding Custom Verification Methods
Problem Determination Tools → PowerHA SystemMirror Verification → Configure Custom Verification Method
• Add a Custom Verification Method and press Enter

Output in the smit.log and clverify.log files:

Currently Loaded Interim Fixes:

NODE mutiny.dfw.ibm.com
PACKAGE             INSTALLER   LABEL
=================== =========== ==========
bos.rte.security    installp    passwdLock

NODE munited.dfw.ibm.com
PACKAGE             INSTALLER   LABEL
=================== =========== ==========
bos.rte.security    installp    passwdLock

Please ensure that they are consistent between the nodes!
Custom Verification Methods
• Custom methods should be in a common path between the cluster members
  – i.e. /usr/local/hascripts/custom_ver_check.sh
• The methods are stored in cluster ODM stanzas
• Script logic & return codes – how fancy do you want to get? (a return-code sketch follows the sample below)
#!/bin/ksh
echo "Currently Loaded Interim Fixes:"
clcmd emgr -P
echo "Please Ensure that they are consistent between the nodes!"
PowerHA SystemMirror: Cluster Snapshots
• Snapshot files are stored in /usr/es/sbin/cluster/snapshots/
  – <snapshotname>.odm  – the cluster configuration: ODM stanzas (HACMPcluster, HACMPnode, HACMPadapter, ...)
  – <snapshotname>.info – the cluster report: CLI output (cllsnode, cllscf, cllsif, ...) wrapped in HTML tags

[Diagram: each snapshot (A, B, C) is a pair of files – a .odm file holding the cluster ODM stanzas and a .info file holding the cluster report]
• Snapshots are saved off automatically any time a Verify / Sync operation is invoked
• The .info file is not necessary in order to be able to restore the configuration
• The snapshot menu will ask for a <name> and a <description> as the only required fields
• The snapshot upgrade migration path requires the entire cluster to be down
PowerHA SystemMirror: Changing the Hostname
• CAA does not currently support changing a system's hostname
  – Basically this means: do not attempt it in a Version 7.X cluster
  * This restriction is currently under evaluation by the CAA development team and may be lifted in a future update
[Diagram: the lscluster output records each node's hostname and its UUID. The resource group contains the Application Controller (start.sh / stop.sh), the Service IP, and the Volume Group / filesystems; each node keeps its own inet0 hostname, with the start/stop scripts shown setting and unsetting a hostname only to illustrate what moves with the group.]

• The same is true for the cluster repository disk: the UUID is stored, hence you should not attempt to replicate the volume or create mirrors of the caa_private volume group
• Only the service IP should be swapping between nodes
Naming requirements in V7 clusters
• The COMMUNICATION_PATH has to resolve to the hostname IP
  – In prior releases the communication path could be any path to the node
• The node name can be different from the hostname
• The use of a "-" is not supported in the node name
  – Clients further highlighted this limitation by using clmgr to create the cluster: if a node name is not specified and the hostname contains a "-", the default node name assigned will also try to use a "-"
  – ksh restrictions were removed to allow the use of a "-" in service IP labels, so both V6.1 and V7.X support their use in that name
Changes to Node outbound traffic
There were changes made to AIX & PowerHA alias processing:
• Cluster running HA V6.1 SP7 with AIX 6.1 TL2
  – The service IP alias is listed after the persistent & base addresses
• Cluster running HA V7.1 SP3 with AIX 7.1 TL1 SP4
  – The service IP alias is automatically listed before the base address; note that no persistent IP is configured in this environment
Number of Resources & Fallover Times
Common Questions:
– Will the number of disks or volume groups affect my fallover time?
– Should I configure fewer larger LUNs or more smaller LUNs?

Versions 6.1 and earlier allowed Standard VGs or Enhanced Concurrent (ECM) VGs; Version 7.X requires the use of ECM volume groups.

Your Answers:
• Standard VGs require an openx call against each physical volume
  – Processing could take several seconds to minutes depending on the number of LUNs
• ECM VGs are varied on across all nodes (ACTIVE / PASSIVE)
  – It takes seconds per VG
• Parallel processing will attempt to vary on all VGs in parallel
Number of Resource Groups
[Diagram: NODE A hosts RG1 and RG3 (node list: NodeA, NodeB); NODE B hosts RG2 and RG4 (node list: NodeB, NodeA). Each RG contains a Service IP, a volume group (VG1–VG4), and an application server (APP Server 1–4).]
• RG decisions beyond startup, fallover & fallback behavior

Best Practice: Always try to keep it simple, but stay current with new features and take advantage of existing functionality to avoid added manual customization.

Further Options:
• 1 RG vs. multiple RGs
  – Selective fallover behavior (VG / IP)
• RG processing – parallel vs. sequential
• Delayed Fallback Timer
  – When do you want to fail back?
• RG dependencies
  – Parent / Child, Location
  – Start After / Stop After
Filesystem Definitions in a Resource Group
• Should you explicitly define the filesystems in a Resource Group?
• The PowerHA default behavior is to mount ALL filesystems in the volume group
• Reasons to explicitly define them:
  – Nested filesystems
  – Only mount the filesystems specified
• Scenario: 10 filesystems in the volume group & only 1 defined in the RG
  – HA processing will only mount the one filesystem

What are the implications going forward if you add new filesystems via CSPOC and forget to append them to the resource group definition? (A query sketch follows below.)
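One way to audit for that drift is to compare the RG definition against what the volume group actually contains; a minimal sketch with hypothetical names (the attribute labels in clmgr output vary by release):

# clmgr query resource_group app_rg1 | grep -i filesystem     (filesystems defined to the RG)
# lsvg -l app_vg1                                             (LVs / filesystems actually in the VG)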
Event Processing of resources
Invoked during parallel processing:
• acquire_svc_addr
• acquire_takeover_addr
• node_down
• node_up
• release_svc_addr
• release_takeover_addr
• start_server
• stop_server

Not invoked:
• get_disk_vg_fs
• node_down_local
• node_down_remote
• node_down_local_complete
• node_down_remote_complete
• node_up_local
• node_up_remote
• node_up_local_complete
• node_up_remote_complete
• release_vg_fs

• Resource groups are processed in parallel unless you implement RG dependencies or set a customized serial processing order (HA 4.5+)
• The newer process_resources event script is organized around job types: ACQUIRE, RELEASE, ONLINE, OFFLINE, DISKS, TAKEOVER_LABELS, APPLICATIONS and more (i.e. JOB_TYPE = VGS)
* Be mindful of this with the implementation of Pre/Post events
Defining Pre / Post Events
• Pre/Post-Event Commands are NOT the same thing as User Defined Events

A custom event will never get invoked unless you explicitly define it as a Pre- or Post-event command to an existing cluster event.
User Defined Events - UDE
• This option allows you to exploit RMC resource monitors to trigger events
• Familiarize yourself with the lsrsrc command
  – A Practical Guide for Resource Monitoring and Control – SG24-6615
# odmget HACMPude

HACMPude:
        name = "Herrera_UDE_event"
        state = 0
        recovery_prog_path = "/usr/local/hascripts/Herrera_UDE"
        recovery_type = 2
        recovery_level = 0
        res_var_name = "IBM.FileSystem"
        instance_vector = "Name = \"/\""
        predicate = "PercentTotUsed > 95"
        rearm_predicate = "PercentTotUsed < 70"
Notes:
• Recycle cluster services after updating UDE events
• Scripts must exist on all cluster nodes (same path and permissions)
• Logic in the recovery program can be configured to send notification, append more space, etc. (a sketch follows below)
• Multiple values can be specified in the Selection String field
• Actions are logged in the clstrmgr.debug and hacmp.out files
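A minimal sketch of what the recovery program at /usr/local/hascripts/Herrera_UDE might contain; the notification address and the size delta are hypothetical:

#!/bin/ksh
# Invoked by the cluster when PercentTotUsed of / exceeds 95
echo "$(hostname): / filesystem is over 95% full" | mail -s "UDE: rootfs nearly full" admin@example.com
# Optionally append space to the filesystem (illustrative; uncomment to use)
# chfs -a size=+512M /
exit 0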
PowerHA SystemMirror: File Collections
• Configuration_Files
  – /etc/hosts
  – /etc/services
  – /etc/snmpd.conf
  – /etc/snmpdv3.conf
  – /etc/rc.net
  – /etc/inetd.conf
  – /usr/es/sbin/cluster/netmon.cf
  – /usr/es/sbin/cluster/etc/clhosts
  – /usr/es/sbin/cluster/etc/rhosts
  – /usr/es/sbin/cluster/etc/clinfo.rc
• Introduced in HA 5.2
  – Ability to automatically push files every 10 minutes from the specified source node
  – Default collections are created but not enabled by default
• SystemMirror_Files
  – Pre-, Post- & Notification events
  – Start & stop scripts
  – Scripts specified in monitors
  – Custom pager text messages
  – SNA scripts
  – Scripts for tape support
  – Custom snapshot methods
  – User-defined events
• Not intended to maintain users & passwords between cluster nodes
File Collections Application script Scenario
# smitty sysmirror → System Management → File Collections
• If automatic propagation is set to yes, files will be propagated every 10 minutes

[Diagram: Node A holds /usr/local/hascripts/app* start & stop scripts containing updated (RED) logic; Node B still has the older (BLUE) logic in the same paths. Adding the scripts to a file collection propagates the updates from Node A to Node B automatically.]
PowerHA SystemMirror - User & Group Administration
# smitty sysmirror → System Management → Security and Users
• You can select – Local (files) or LDAP
• Select Nodes by Resource Group
  – No selection means all nodes
• Users will be propagated to all of the applicable cluster nodes
• The password command can be altered to ensure consistency across all nodes
PowerHA SystemMirror - User Passwords (clpasswd)
# smitty sysmirror → System Management → Security and Users → Passwords in a PowerHA SystemMirror cluster
• Optional list of users whose passwords will be propagated to all cluster nodes
  – The passwd command is aliased to clpasswd
• Functionality available since HACMP 5.2 (Fall 2004)
Repository Disk Failure
Pager Notification Events
• As long as sendmail is enabled you can easily receive event notification

smitty sysmirror → Custom Cluster Configuration → Events → Cluster Events → Remote Notification Methods → Add a Custom Remote Notification Method

Sample Email:
From: root 10/23/2012
Subject: HACMP
Node mhoracle1: Event acquire_takeover_addr occurred at Tue Oct 23 16:29:36 2012, object =
Pager Notification Methods
HACMPpager:
        methodname = "Herrera_notify"
        desc = "Lab Systems Pager Event"
        nodename = "connor kaitlyn"
        dialnum = "[email protected]"
        filename = "/usr/es/sbin/cluster/samples/pager/sample.txt"
        eventname = "acquire_takeover_addr config_too_long event_error node_down_complete node_up_complete"
        retrycnt = 3
        timeout = 45

# cat /usr/es/sbin/cluster/samples/pager/sample.txt
Node %n: Event %e occurred at %d, object = %o

Sample Email:
From: root 09/01/2009
Subject: HACMP
Node kaitlyn: Event acquire_takeover_addr occurred at Tue Sep 1 16:29:36 2009, object =

Attention: Sendmail must be working and accessible through the firewall to receive notifications
• Action Taken: Halted Node Connor
Online Planning Worksheets Discontinued in Version 7
• The fileset is still there, but the content is no longer there
• There is a push to leverage IBM Systems Director, which will guide you through the step-by-step configuration of the cluster
PowerHA SystemMirror – Deadman Switch (CAA)
• The CAA DMS tunable (deadman_mode) allows two different actions:
  – Assert (crash) the system (default behavior)
  – Generate an AHAFS event
• The version 7 cluster software changes the old behavior

Recent Client Failure Scenario: The repository disk LUN had been locked and had not been responsive for days. The client was unaware, and the standby node had a problem. The primary system was brought down when it was unable to write to the repository disk.
LVM Dynamic Updates
• The cluster is easy to set up, but what about changes going forward?
• ECM volume groups (required at HA V7)
  – New LVs will get pushed across; filesystems will not
    • LV updates get pushed across but do not update /etc/filesystems
    • Lazy Update would resolve this issue
  – ECM limitations lifted for:
    • reorgvg & chvg -g size changes
• Cluster Import option
  – Correcting out-of-sync timestamps → auto-corrections or import
• Built-in Lazy Update
CSPOC allows for a multitude of DARE operations
• The Cluster Single Point Of Control options facilitate dynamic operations

# smitty cl_admin
• Follow these panels to dynamically add or remove resources from the cluster, or perform resource group movements between nodes
• There are CSPOC-specific logs in the HA cluster that will provide details in the event of a problem
CSPOC: Storage & LVM Menus
Tunable Failure Detection Rate in 7.1.1
• Note that the SMIT menu to alter these values was missing prior to HA 7.1.1 SP1
• Checking the current settings:
root@mhoracle1 /> clctrl -tune -o node_down_delay
sapdemo71_cluster(07552a84-057b-11e1-b7cb-46a6ba546402).node_down_delay = 10000
root@mhoracle1 /> clctrl -tune -o node_timeout
sapdemo71_cluster(07552a84-057b-11e1-b7cb-46a6ba546402).node_timeout = 20000
• Modifying via the command line:
# clmgr modify cluster HEARTBEAT_FREQUENCY=10000 GRACE_PERIOD=5000
*** The settings will take effect only after the next sync
• The attributes are stored in the HACMPcluster object class
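Since the change only takes effect after the next sync, a complete sketch of the sequence (using the clmgr sync form shown later in this deck):

# clmgr modify cluster HEARTBEAT_FREQUENCY=10000 GRACE_PERIOD=5000
# clmgr sync cluster
# clctrl -tune -o node_timeout     (confirm the new value propagated)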
FDR Comparison to Version 6.1 & Earlier Versions

RSCT (topsvcs) – version 6.1 and earlier:
• Heartbeat settings can be defined for each network type (NIM)
• The heartbeat settings are: Grace Period, Failure Cycle, and the Interval between Heartbeats
• The combination of heartbeat rate and failure cycle determines how quickly a failure can be detected, and may be calculated using this formula:
  (heartbeat rate) * (failure cycle) * 2 seconds

CAA – version 7:
• Heartbeat settings are the same for all networks in the cluster
• One perspective is that only Ethernet networks are supported
• The heartbeat settings are: Grace Period (5 - 30 seconds) and Failure Cycle (1 - 20 seconds)
• The Failure Cycle is the time after which another node may consider the adapter to be DOWN if it receives no incoming heartbeats; the actual heartbeat rate is calculated from the Failure Cycle
• The Grace Period is the waiting period after detecting the failure before it is reported

*** Note that HA 7.1.0 had a self-tuning failure detection rate
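A worked example of the RSCT formula with hypothetical values: a heartbeat interval of 1 second and a failure cycle of 10 gives 1 * 10 * 2 = 20 seconds to detect a node failure, which lines up with the 20000 ms node_timeout shown on the previous slide.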
Application Monitoring within PowerHA SystemMirror
• Some monitors are provided in the Smart Assistants
  – i.e. cluster.es.assist.oracle → /usr/es/sbin/cluster/sa/oracle/sbin/DBInstanceMonitor
• A monitor is bound to the Application Controller
  – Example: OracleDB
[Diagram: three monitor types bound to the controller]
• Startup Monitor – confirms the startup of the application; only invoked on application startup (new Application Startup Mode in HA 7.1.1)
• Process Monitor – checks the process table, e.g. on a 60-second interval
• Custom Monitor – invokes your custom logic, e.g. on a 60-second interval
• Long-running monitors will continue to run locally with the running application
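A minimal sketch of a custom monitor script, assuming the usual convention that exit 0 means healthy and any non-zero exit triggers the monitor's recovery action (the process name is hypothetical):

#!/bin/ksh
# Report healthy only if the database background process is in the process table
if ps -ef | grep -v grep | grep -q "ora_pmon_PROD"; then
    exit 0    # healthy - the cluster takes no action
fi
exit 1        # failure - the cluster restarts or falls over per the monitor settings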
PowerHA SystemMirror: Application Startup 7.1.1
• The cluster invokes the start script but doesn't confirm its success
• Consider at least an application startup monitor
• An enhancement was introduced in HA Version 7.1.1:
  – The application start may be set to run in the foreground

[Diagram: Resource Group A contains the Application Controller (start.sh / stop.sh), the Service IP, and the Volume Group / filesystems; a Startup Monitor confirms the start, and a Long-Running Monitor watches the application thereafter.]
PowerHA SystemMirror – HMC Definition
Food for Thought: How many DLPAR operations can be handled at once?
• Multiple HMC IPs may be defined, separated by a space
• The information is stored in HA ODM object classes
• There was no SDMC support – no longer much of an issue
PowerHA SystemMirror – Integrated DLPAR Menu
Add Dynamic LPAR and CoD Resources for Applications

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                  [Entry Fields]
* Application Controller Name                           Application_svr1
* Minimum number of processing units                   [ 0.00]
* Desired number of processing units                   [ 0.00]
* Minimum number of CPUs                               [0]      #
* Desired number of CPUs                               [0]      #
* Minimum amount of memory (in megabytes)              [0]      #
* Desired amount of memory (in megabytes)              [0]      #
* Use CoD if resources are insufficient?               [no]     +
* I agree to use CoD resources                         [no]     +
  (Using CoD may result in extra costs)

You must ensure that:
* CoD enablement keys are activated
* CoD resources are not used for any other purpose

• HMC IPs are defined and stored in a different HA panel
The many uses of the clmgr utility
• V7 clustering introduces many applications for this command

Add a new cluster & node:
# clmgr add cluster clmgr_cluster REPOSITORY=hdisk2 CLUSTER_IP=228.1.1.36
# clmgr add node clmgr2
# clmgr add network net_ether_01 TYPE=ether
# clmgr add interface clmgr2b2 NETWORK=net_ether_02 NODE=clmgr2 INTERFACE=en1
# clmgr add persistent clmgr1p1 NETWORK=net_ether_01 NODE=clmgr1
# clmgr add service_ip clmgrsvc1 NETWORK=net_ether_01

Add an Application Controller:
# clmgr add application_controller test_app1 STARTSCRIPT="/home/apps/start1.sh" STOPSCRIPT="/home/apps/stop1.sh" STARTUP_MODE=background

Add a new Resource Group (with its volume group):
# clmgr add volume_group test_vg1 NODES="clmgr1,clmgr2" PHYSICAL_VOLUMES=hdisk3 TYPE=original MAJOR_NUMBER=35 ACTIVATE_ON_RESTART=false
# clmgr add resource_group clmgr_RG1 NODES="clmgr1,clmgr2" STARTUP=OHN FALLOVER=FNPN FALLBACK=NFB VOLUME_GROUP=test_vg1 SERVICE_LABEL=clmgrsvc1 APPLICATIONS=test_app1

Verify / Sync the cluster:
# clmgr verify cluster CHANGES_ONLY=no FIX=yes LOGGING=standard
# clmgr sync cluster CHANGES_ONLY=no FIX=yes LOGGING=standard

Start Cluster Services:
# clmgr online cluster WHEN=now MANAGE=auto BROADCAST=true CLINFO=true

Change the cluster name:
# clmgr modify cluster NAME=my_new_cls_label

Suspend / Resume Application Monitors:
# clmgr manage application_controller suspend test_app1 RESOURCE_GROUP="clmgr_RG1"
# clmgr manage application_controller resume test_app1 RESOURCE_GROUP="clmgr_RG2"
Summary
• There are some notable differences between V7 and HA 6.1 and earlier
  – Pay careful attention to where some of the options are available
  – A summary chart of new features is appended to this presentation
• Version 7.1.2 is scheduled to GA on Nov 9th
  – It brings the Enterprise Edition to V7 clusters
• This session is an attempt to make you aware of the available options in PowerHA
  – Take my recommendations with a grain of salt!
• Take advantage of integrated features & interfaces like:
  – The application monitoring infrastructure
  – File Collections
  – Pre/Post Events and User Defined Events
  – Pager Notification Methods
  – The new clmgr CLI
  (See the redbook SG24-8030)
Summary Chart
New Functionality & Changes
– New CAA Infrastructure (7.1.X)
  • IP multicast-based heartbeat protocol
  • HBA-based SAN heartbeating
  • Private network support
  • Tunable Failure Detection Rate
  • New service IP distribution policies
  • Full IPv6 support (7.1.2)
– Disk fencing enhancements (7.1.0)
– Rootvg system event (7.1.0)
– Disk rename function (7.1.0)
– Repository disk resilience (7.1.1)
  • Backup repository disks (7.1.2)
– New application startup mode (7.1.1)
– Exploitation of JFS2 Mount Guard (7.1.1)
– Adaptive fallover (7.1.0)
– New RG dependencies (7.1.0)
  • Start After, Stop After
– Federated security (7.1.1)
  • RBAC, EFS & security system administration

Extended Distance Clusters
– XIV Replication Integration (12/16/2011)
– XP12000, XP24000 (11/18/2011)
– HP9500 (8/19/2011)
– Storwize V7000 (9/30/2011)
– SVC 6.2 (9/30/2011)

Smart Assistants (Application Integration)
– SAP liveCache with DS or SVC (7.1.1)
– MQ Series (7.1.1)

DR Capabilities
– Stretched & Linked Clusters (7.1.2)
– DS8000 HyperSwap (7.1.2)

Management
– New command line interface (7.1.0)
  • clcmd
  • clmgr utility
  • lscluster
– IBM Systems Director management (7.1.0)
Questions?
Thank you for your time!
Additional Resources
• PowerHA SystemMirror 7.1.1 Update – SG24-8030
  http://www.redbooks.ibm.com/redpieces/abstracts/sg248030.html?Open
• PowerHA SystemMirror 7.1 Redbook – SG24-7845 (removed from the download site)
  http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247845.html?Open
• Disaster Recovery Redbook – SG24-7841, Exploiting PowerHA SystemMirror Enterprise Edition for AIX
  http://www.redbooks.ibm.com/abstracts/sg247841.html?Open
• RedGuide: High Availability and Disaster Recovery Planning: Next-Generation Solutions for Multi-server IBM Power Systems Environments
  http://www.redbooks.ibm.com/abstracts/redp4669.html?Open
• PowerHA SystemMirror Marketing Page
  http://www-03.ibm.com/systems/power/software/availability/aix/index.html
• PowerHA SystemMirror Wiki Page
  http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/High+Availability