Linux-HA with Pacemaker
Transcript of Linux-HA with Pacemaker
Linux High Availability
Kris Buytaert
@krisbuytaert
● I used to be a Dev, then became an Op
● Senior Linux and Open Source Consultant @inuits.be
● „Infrastructure Architect“
● Building Clouds since before the Cloud
● Surviving the 10th floor test
● Co-Author of some books
● Guest Editor at some sites
What is HA Clustering ?
● One service goes down
=> others take over its work
● IP address takeover, service takeover, ...
● Not designed for high performance
● Not designed for high throughput (load balancing)
Does it Matter ?
● Downtime is expensive
● You miss out on $$$
● Your boss complains
● New users don't return
Lies, Damn Lies, and Statistics
Counting nines (slide by Alan R)
99.9999%  30 sec
99.999%   5 min
99.99%    52 min
99.9%     9 hr
99%       3.5 day
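The nines map availability percentages to allowed downtime per year. A quick shell sketch (assuming a 365.25-day year; the function name is mine) reproduces the numbers:

```shell
# Allowed downtime per year, in whole seconds, for a given availability %.
downtime_seconds() {
    awk -v a="$1" 'BEGIN { printf "%.0f\n", (1 - a/100) * 365.25 * 24 * 3600 }'
}

downtime_seconds 99.99    # roughly 3156 seconds, i.e. ~52 minutes
downtime_seconds 99.999   # roughly 316 seconds, i.e. ~5 minutes
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the jump from 99.9% to 99.999% is so expensive.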
The Rules of HA
● Keep it Simple
● Keep it Simple
● Prepare for Failure
● Complexity is the enemy of reliability
● Test your HA setup
Myths
● Virtualization will solve your HA Needs
● Live migration is the solution to all your problems
● VM mirroring is the solution to all your problems
● HA will make your platform more stable
Eliminating the SPOF
● Find out what Will Fail
• Disks
• Fans
• Power (Supplies)
● Find out what Can Fail
• Network
• Going Out Of Memory
Split Brain
● Communications failures can lead to separated partitions of the cluster
● If those partitions each try to take control of the cluster, it's called a split-brain condition
● If this happens, then bad things will happen
• http://linux-ha.org/BadThingsWillHappen
You care about ?
● Your data ?
• Consistent
• Realtime
• Eventually Consistent
● Your Connection
• Always
• Most of the time
Shared Storage
● Shared Storage
● Filesystem
• e.g. GFS, GPFS
● Replicated ?
● Exported Filesystem ?
● $$$ 1+1 <> 2
● Storage = SPOF
● Split Brain :(
● Stonith
(Shared) Data
● Issues :
• Who Writes ?
• Who Reads ?
• What if 2 active applications want to write ?
• What if an active server crashes during writing ?
• Can we accept delays ?
• Can we accept read-only data ?
● Hardware Requirements
● Filesystem Requirements (GFS, GPFS, ...)
DRBD
● Distributed Replicated Block Device
● In the Linux Kernel (as of very recent)
● Usually only 1 mount
• Multi mount as of 8.X
• Requires GFS / OCFS2
● Regular FS ext3 ...
● Only 1 application instance Active accessing data
● Upon Failover application needs to be started on other node
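For context, a single-primary DRBD resource of the kind described above could be declared roughly like this in drbd.conf (classic 8.x syntax; the resource name, hostnames, disks and addresses are hypothetical):

```
resource r0 {
    protocol C;                     # synchronous replication: writes ack'd by both nodes
    on node-a {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on node-b {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
```

Only the node currently in the Primary role can mount /dev/drbd0; on failover the cluster manager promotes the peer and starts the application there.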
DRBD(2)
● What happens when you pull the plug of a Physical machine ?
• Minimal Timeout
• Why did the crash happen ?
• Is my data still correct ?
Alternatives to DRBD
● GlusterFS looked promising
• “Friends don't let Friends use Gluster”
• Consistency problems
• Stability Problems
• Maybe later
● MogileFS
• Not POSIX
• App needs to implement the API
● Ceph
• ??
HA Projects
● Linux HA Project
● Red Hat Cluster Suite
● LVS/Keepalived
● Application Specific Clustering Software
• e.g. Terracotta, MySQL NDBD
Heartbeat
● Heartbeat v1
• Max 2 nodes
• No fine-grained resources
• Monitoring using “mon”
● Heartbeat v2
• XML usage was a consulting opportunity
• Stability issues
• Forking ?
Heartbeat v1
/etc/ha.d/ha.cf
/etc/ha.d/haresources
mdb-a.menos.asbucenter.dz ntc-restart-mysql mon IPaddr2::10.8.0.13/16/bond0 \
    IPaddr2::10.16.0.13/16/bond0.16 mon
/etc/ha.d/authkeys
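To make the three files concrete, a minimal ha.cf and authkeys pair could look like this (node names, interface, timings and the shared secret are hypothetical; only the directives shown are standard Heartbeat ones):

```
# /etc/ha.d/ha.cf -- hypothetical two-node setup
keepalive 2                  # heartbeat interval, seconds
deadtime 30                  # declare a peer dead after 30s of silence
bcast bond0                  # interface used for heartbeat traffic
node node-a node-b           # cluster members, must match `uname -n`

# /etc/ha.d/authkeys -- must be owned by root, mode 0600
auth 1
1 sha1 SomeSharedSecret
```

haresources then lists, per line, the preferred node followed by the resources it owns, as in the example above.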
Heartbeat v2
“A consulting Opportunity”
LMB
Clone Resource
Clones in v2 were buggy
Resources were started on 2 nodes
Stopped again on “1”
Heartbeat v3
• No more /etc/ha.d/haresources
• No more xml
• Better integrated monitoring
• /etc/ha.d/ha.cf has
• crm=yes
Pacemaker ?
● Not a fork
● Only CRM Code taken out of Heartbeat
● As of Heartbeat 2.1.3
• Support for both OpenAIS / Heartbeat
• Different Release Cycles than Heartbeat
Heartbeat, OpenAIS, Corosync ?
● All Messaging Layers
● Initially only Heartbeat
● OpenAIS
● Heartbeat got unmaintained
● OpenAIS had heisenbugs :(
● Corosync
● Heartbeat maintenance taken over by LinBit
● CRM Detects which layer
[Stack diagram: Pacemaker on top of Cluster Glue, running on either OpenAIS or Heartbeat]
● stonithd : The Heartbeat fencing subsystem.
● lrmd : Local Resource Management Daemon. Interacts directly with resource agents (scripts).
● pengine : Policy Engine. Computes the next state of the cluster based on the current state and the configuration.
● cib : Cluster Information Base. Contains definitions of all cluster options, nodes, resources, their relationships to one another and current status. Synchronizes updates to all cluster nodes.
● crmd : Cluster Resource Management Daemon. Largely a message broker for the PEngine and LRM, it also elects a leader to co-ordinate the activities of the cluster.
● openais : messaging and membership layer.
● heartbeat : messaging layer, an alternative to OpenAIS.
● ccm : Short for Consensus Cluster Membership. The Heartbeat membership layer.
Pacemaker Architecture
Configuring Heartbeat with puppet

heartbeat::hacf {"clustername":
  hosts => ["host-a","host-b"],
  hb_nic => ["bond0"],
  hostip1 => ["10.0.128.11"],
  hostip2 => ["10.0.128.12"],
  ping => ["10.0.128.4"],
}
heartbeat::authkeys {"ClusterName":
  password => "ClusterName",
}

http://github.com/jtimberman/puppet/tree/master/heartbeat/
CRM
● Cluster Resource Manager
● Keeps Nodes in Sync
● XML Based
● cibadmin
● CLI manageable
● crm
configure
property $id="cib-bootstrap-options" \
    stonith-enabled="FALSE" \
    no-quorum-policy="ignore" \
    start-failure-is-fatal="FALSE"
rsc_defaults $id="rsc_defaults-options" \
    migration-threshold="1" \
    failure-timeout="1"
primitive d_mysql ocf:local:mysql \
    op monitor interval="30s" \
    params test_user="sure" test_passwd="illtell" test_table="test.table"
primitive ip_db ocf:heartbeat:IPaddr2 \
    params ip="172.17.4.202" nic="bond0" \
    op monitor interval="10s"
group svc_db d_mysql ip_db
commit
Heartbeat Resources
● LSB
● Heartbeat resource (+status)
● OCF (Open Cluster FrameWork) (+monitor)
● Clones (don't use in HAv2)
● Multi State Resources
LSB Resource Agents
● LSB == Linux Standards Base
● LSB resource agents are standard System V-style init scripts commonly used on Linux and other UNIX-like OSes
● LSB init scripts are stored under /etc/init.d/
● This enables Linux-HA to immediately support nearly every service that comes with your system, and most packages which come with their own init script
● It's straightforward to change an LSB script to an OCF script
OCF
● OCF == Open Cluster Framework
● OCF Resource agents are the most powerful type of resource agent we support
● OCF RAs are extended init scripts
• They have additional actions:
• monitor – for monitoring resource health
• meta-data – for providing information about the RA
● OCF RAs are located in /usr/lib/ocf/resource.d/provider-name/
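The monitor action is what turns an init script into an OCF RA. A minimal sketch of such an agent, assuming a hypothetical "myservice" tracked via a pidfile (a real RA must also implement meta-data and validate-all, and follow the full OCF exit-code table):

```shell
#!/bin/sh
# Hypothetical minimal OCF resource agent skeleton -- illustrative only.
# Install under /usr/lib/ocf/resource.d/<provider>/ to use with Pacemaker.

OCF_SUCCESS=0       # action succeeded / resource is running
OCF_NOT_RUNNING=7   # monitor: resource is cleanly stopped

myservice_start() {
    # A real agent would launch the daemon here; we only record "running".
    touch "${PIDFILE:-/tmp/myservice.pid}"
    return $OCF_SUCCESS
}

myservice_stop() {
    rm -f "${PIDFILE:-/tmp/myservice.pid}"
    return $OCF_SUCCESS
}

myservice_monitor() {
    # A real agent would check the process / query the service health.
    if [ -f "${PIDFILE:-/tmp/myservice.pid}" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

case "${1:-}" in
    start)   myservice_start ;;
    stop)    myservice_stop ;;
    monitor) myservice_monitor ;;
    *)       : ;;  # meta-data, validate-all etc. omitted in this sketch
esac
```

Pacemaker calls the script with `monitor` at the configured interval and reads the exit code, which is why the distinction between 0 (running) and 7 (stopped) matters.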
Monitoring
● Defined in the OCF Resource script
● Configured in the parameters
● You have to support multiple states
• Not running
• Running
• Failed
Anatomy of a Cluster config
• Cluster properties
• Resource Defaults
• Primitive Definitions
• Resource Groups and Constraints
Cluster Properties

property $id="cib-bootstrap-options" \
    stonith-enabled="FALSE" \
    no-quorum-policy="ignore" \
    start-failure-is-fatal="FALSE"

No-quorum-policy = We'll ignore the loss of quorum on a 2 node cluster
Start-failure : When set to FALSE, the cluster will instead use the resource's failcount and value for resource-failure-stickiness
Resource Defaults

rsc_defaults $id="rsc_defaults-options" \
    migration-threshold="1" \
    failure-timeout="1" \
    resource-stickiness="INFINITY"

failure-timeout means that after a failure there will be a 60 second timeout before the resource can come back to the node on which it failed.
Migration-threshold=1 means that after 1 failure the resource will try to start on the other node
Resource-stickiness=INFINITY means that the resource really wants to stay where it is now.
Primitive Definitions

primitive d_mine ocf:custom:tomcat \
    params instance_name="mine" \
        monitor_urls="health.html" \
        monitor_use_ssl="no" \
    op monitor interval="15s" \
        on-fail="restart"
primitive ip_mine_svc ocf:heartbeat:IPaddr2 \
    params ip="10.8.4.131" cidr_netmask="16" nic="bond0" \
    op monitor interval="10s"
Parsing a config
● Isn't always done correctly
● Even a verify won't find all issues
● Unexpected behaviour might occur
Where a resource runs
• multi state resources
• Master – Slave
• e.g. mysql master-slave, drbd
• Clones
• Resources that can run on multiple nodes, e.g.
• Multimaster mysql servers
• Mysql slaves
• Stateless applications
• location
• Preferred location to run resource, e.g. based on hostname
• colocation
• Resources that have to live together
• e.g. ip address + service
• order
• Define what resource has to start first, or wait for another resource
• groups
• Colocation + order
e.g. A Service on DRBD
● DRBD can only be active on 1 node
● The filesystem needs to be mounted on that active DRBD node

group svc_mine d_mine ip_mine
ms ms_drbd_storage drbd_storage \
    meta master_max="1" master_node_max="1" clone_max="2" clone_node_max="1" notify="true"
colocation fs_on_drbd inf: svc_mine ms_drbd_storage:Master
order fs_after_drbd inf: ms_drbd_storage:promote svc_mine:start
location cli-prefer-svc_db svc_db \
    rule $id="cli-prefer-rule-svc_db" inf: #uname eq db-a
Crm commands
crm : Start the cluster resource manager
crm resource : Change into resource mode
crm configure : Change into configure mode
crm configure show : Show the current resource config
crm resource show : Show the current resource state
cibadmin -Q : Dump the full Cluster Information Base in XML
Using crm
● crm configure
● Edit primitive
● Verify
● Commit
But We love XML
● cibadmin -Q
Checking the Cluster State

crm_mon -1
============
Last updated: Wed Nov 4 16:44:26 2009
Stack: Heartbeat
Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, unknown expected votes
2 Resources configured.
============
Online: [ xms-1 xms-2 ]
Resource Group: svc_mysql
    d_mysql (ocf::ntc:mysql): Started xms-1
    ip_mysql (ocf::heartbeat:IPaddr2): Started xms-1
Resource Group: svc_XMS
    d_XMS (ocf::ntc:XMS): Started xms-2
    ip_XMS (ocf::heartbeat:IPaddr2): Started xms-2
    ip_XMS_public (ocf::heartbeat:IPaddr2): Started xms-2
Stopping a resource

crm resource stop svc_XMS
crm_mon -1
============
Last updated: Wed Nov 4 16:56:05 2009
Stack: Heartbeat
Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, unknown expected votes
2 Resources configured.
============
Online: [ xms-1 xms-2 ]
Resource Group: svc_mysql
    d_mysql (ocf::ntc:mysql): Started xms-1
    ip_mysql (ocf::heartbeat:IPaddr2): Started xms-1
Starting a resource

crm resource start svc_XMS
crm_mon -1
============
Last updated: Wed Nov 4 17:04:56 2009
Stack: Heartbeat
Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, unknown expected votes
2 Resources configured.
============
Online: [ xms-1 xms-2 ]
Resource Group: svc_mysql
    d_mysql (ocf::ntc:mysql): Started xms-1
    ip_mysql (ocf::heartbeat:IPaddr2): Started xms-1
Resource Group: svc_XMS
Moving a resource
● Resource migrate
● Is permanent, even upon failure
● Useful in upgrade scenarios
● Use resource unmigrate to restore
Moving a resource

[xpoll-root@XMS-1 ~]# crm resource migrate svc_XMS xms-1
[xpoll-root@XMS-1 ~]# crm_mon -1
Last updated: Wed Nov 4 17:32:50 2009
Stack: Heartbeat
Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, unknown expected votes
2 Resources configured.
Online: [ xms-1 xms-2 ]
Resource Group: svc_mysql
    d_mysql (ocf::ntc:mysql): Started xms-1
    ip_mysql (ocf::heartbeat:IPaddr2): Started xms-1
Resource Group: svc_XMS
    d_XMS (ocf::ntc:XMS): Started xms-1
    ip_XMS (ocf::heartbeat:IPaddr2): Started xms-1
    ip_XMS_public (ocf::heartbeat:IPaddr2): Started xms-1
Migrate vs Standby
● Think clusters with more than 2 nodes
● Migrate : send resource to node X
• Only use that available one
● Standby : do not send resources to node X
• But use the other available ones
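The two approaches map onto crm subcommands roughly like this (resource and node names are hypothetical):

```
crm resource migrate svc_db db-b   # pin svc_db to db-b (adds a location constraint)
crm resource unmigrate svc_db      # drop that constraint again
crm node standby db-a              # evacuate db-a; resources move to the remaining nodes
crm node online db-a               # bring db-a back into service
```

Migrate constrains where one resource may run; standby constrains what one node may run, which is usually what you want during node maintenance.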
Debugging
● Check crm_mon -f
● Failcounts ?
● Did the application launch correctly ?
● /var/log/messages
• Warning: very verbose
Resource not running

[menos-val3-root@mrs-a ~]# crm
crm(live)# resource
crm(live)resource# show
Resource Group: svc-MRS
d_MRS (ocf::ntc:tomcat) Stopped
ip_MRS_svc (ocf::heartbeat:IPaddr2) Stopped
ip_MRS_usr (ocf::heartbeat:IPaddr2) Stopped
Resource Failcount

[menos-val3-root@mrs-a ~]# crm
crm(live)# resource
crm(live)resource# failcount d_MRS show mrs-a
scope=status name=fail-count-d_MRS value=1
crm(live)resource# failcount d_MRS delete mrs-a
crm(live)resource# failcount d_MRS show mrs-a
scope=status name=fail-count-d_MRS value=0
Pacemaker and Puppet
● Plenty of non-usable modules around
• HAv1
● https://github.com/rodjek/puppet-pacemaker.git
• Strict set of ops / parameters
● Make sure your modules don't enable resources
● I've been using templates to populate
● cibadmin to configure
● crm is complex, even crm doesn't parse correctly yet
● Plenty of work ahead !
Getting Help
● http://clusterlabs.org
● #linux-ha on irc.freenode.org
● http://www.drbd.org/users-guide/
Contact :
Kris Buytaert
[email protected]
Further Reading
@krisbuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.be/
http://www.virtualization.com/
http://www.oreillygmt.com/

Esquimaux
Kheops Business Center
Avenue Georges Lemaître 54
6041 Gosselies
889.780.406
+32 495 698 668

Inuits
't Hemeltje
Gemeentepark 2
2930 Brasschaat
891.514.231
+32 473 441 636