Operations & Availability
-
Copyright IBM Corporation 2004
Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
Welcome to:
3.1
Operations and Availability
-
Unit Objectives
After completing this unit, you should be able to:
  Understand how to achieve high availability
  Consider the role that backup, failure recovery, and applying updates play in daily operations
  List what needs to be backed up and when
  Discuss additional monitoring and operational technology aids that can help with daily management
-
Topic 1
  High Availability
  Daily Operations, Backup, and Recovery
  External Tools for Monitoring and Operations
-
Failure Recovery - Software and Hardware
Need a plan in place to recover from failure before it happens
Software Failure
  Recovery may be automatic
  Re-create components if necessary
  Restore backups if necessary
Hardware Failure, Single System
  Recovery less likely to be automatic
  For high availability requirements use:
    WMQ Clustering and/or
    Platform-level failover using HACMP, MS Cluster Server, Veritas, and so forth
    Then WMQ and Message Broker recovery using the SupportPac failover procedure
  Combine Clustering and Platform Failover to build a highly available system
Hardware Failure, Catastrophic
  Recovery almost never automatic, usually at a different physical site
  Create components
  Restore backup for all components
  Determine need for reprocessing of work and initiate it
Success depends on good, synchronized backups
-
High Availability
Availability
  Percent Availability = Up-Time / (Up-Time + Down-Time) * 100
  Down-Time = Scheduled Down-Time + Unscheduled Down-Time
Factors to ensure high availability
  Hardware and Software Configurations
    Disk mirroring
    Server redundancy
    Veritas
    Uninterruptible power supplies (UPS)
  Application Design
    Applications need to support non-disruptive release upgrades
    Avoid single points of failure
  Data Center Organization
    Strict change control
    Comprehensive testing
    Operations support
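As a worked example of the availability formula on this slide, the following is a minimal Python sketch. The function name and the sample figures (hours in a 30-day month) are assumptions chosen for illustration; they do not come from the course material.

```python
def percent_availability(up_time: float,
                         scheduled_down: float,
                         unscheduled_down: float) -> float:
    """Percent Availability = Up-Time / (Up-Time + Down-Time) * 100."""
    # Down-Time = Scheduled Down-Time + Unscheduled Down-Time
    down_time = scheduled_down + unscheduled_down
    total = up_time + down_time
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return up_time / total * 100


# Hypothetical figures, in hours, for a 720-hour month:
# 4 hours of scheduled maintenance and 2 hours of unplanned outage.
print(f"{percent_availability(714.0, 4.0, 2.0):.2f}%")
```

Note that scheduled down-time counts against availability in this formula just as unscheduled down-time does, which is why non-disruptive upgrades appear among the application design factors.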
-
Issues Contributing to High Availability
  Reliable hardware
  Shared queues
  Health monitoring
  Failover clustering
  Online backup
  Dual networks
  Reliable operating system
  Online reconfiguration
  Application design
  WebSphere MQ Clustering
  Fast reboot
  RAID Disks
  Crash recovery
  Speed of recovery
  Fast startup
  IP takeover
  Documented procedures
  Practicing procedures
-
Highly Available WebSphere MQ/WBIMB Configurations
Achieved by a combination of two distinct technologies:
  Hardware clustering (for example, HACMP)
    To provide high availability of a single server within a WMQI hub
  Software clustering
    To provide load balancing and high service availability across the whole hub (by allowing individual servers within a hub to become unavailable while the other servers continue to operate and service requests to the hub)
Use both of these technologies in messaging hubs to achieve high throughput and availability
-
Messaging and Availability
[Table: access to existing messages and access for new messages (continuous, automatic, or none) for Shared Queues, Tandem, Restart/Failover Clustering (HA, RAID), Distributed Clustering (Fastnet), and no clustering support; individual cell values are not preserved in the transcript]
-
Software Clustering
Logical hub network technology
  Multiple physical WebSphere MQ queue managers
  Multiple brokers in WBIMB
Spreads the workload
Improves performance and availability
Messaging Hubs
  Potential for bottlenecks
    Resolved by scaling
  Potential single point of failure
    Resolved by availability/recoverability
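The idea of spreading workload across several queue managers while some are unavailable can be illustrated with a toy Python sketch. This is not WebSphere MQ's actual workload algorithm; the function, the queue manager names, and the simple round-robin policy are all assumptions made for illustration.

```python
from itertools import cycle
from typing import Iterable, List, Set, Tuple


def route_messages(messages: Iterable[str],
                   queue_managers: List[str],
                   available: Set[str]) -> List[Tuple[str, str]]:
    """Assign each message to the next available queue manager in turn."""
    if not any(qm in available for qm in queue_managers):
        raise RuntimeError("no queue manager in the hub is available")
    rotation = cycle(queue_managers)
    assignments = []
    for msg in messages:
        qm = next(rotation)
        # Skip over queue managers that are currently down.
        while qm not in available:
            qm = next(rotation)
        assignments.append((msg, qm))
    return assignments


# Hypothetical hub of three queue managers with QM2 unavailable.
hub = ["QM1", "QM2", "QM3"]
for msg, qm in route_messages(["m1", "m2", "m3", "m4"], hub, {"QM1", "QM3"}):
    print(msg, "->", qm)
```

New messages keep flowing to QM1 and QM3 while QM2 is down, which is the availability benefit the slide describes. Messages already queued on the failed server are a separate concern that this sketch does not model.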
-
Clustered Servers Running a Queue Manager
-
Cold Standby
-
Active/Active
[Diagram labels: queue managers QM-A and QM-B, each with an IP address (ipaddr); /var/mqm and /usr/lpp/mqm shown twice; /var/mqm/log/QM-A and /var/mqm/qmgrs/QM-A; /var/mqm/log/QM-B and /var/mqm/qmgrs/QM-B]
-
Availability - Two Servers
-
z/OS Shared Queues
-
Failover on Various Platforms
Platform          Failover Facility
z/OS              ARM
AIX               HACMP
HP-UX             ServiceGuard
OpenVMS
Solaris           Sun Cluster
Tru64             TruCluster
WinNT, Win2000    MSCS
-
Topic 2
  High Availability
  Daily Operations, Backup, and Recovery
  External Tools for Monitoring and Operations
-
Operations
Daily or Regularly Scheduled Operations Tasks
  Backups
    Database: Configuration Manager and Broker Repositories
    WMQ Log
    Software Configuration and Individual Workspace Files
  Monitoring
    Runtime systems and application monitoring for problem detection
    Business process monitoring for business analysis
Exceptional Operations Tasks
  Code maintenance
  Problem Determination
    Identification, Repair, and Implementation
  Failure recovery
-
What Could Go Wrong?
All WBIMB components must be protected against the following failures:
  CPU failure
  Disk failure
  System loss
  Application failure
  WMQ object corruption
  WMQ object deletion
  Database corruption
  Database deletion
  Loss/reset of environment variable settings
The above situations can affect any WBIMB component running in development, test, or production
-
Key Areas to Consider in Backing Up WBI MB
WBIMB does not hold any data itself.
  Messages exist on WMQ queues
  Configuration data is held in databases
  Source code is kept in the developer workspace or an external SCM
    Message flows
    Message sets
    User-defined nodes and parsers (plug-ins)
Aspects of WBIMB Backups
  1. WMQ queues used by Message Broker
  2. WBIMB code and configuration files
  3. WBIMB product and application databases
  4. Developer artifacts (source code)
-
Backup - 1. WMQ Queues
Brokers are WMQ applications processing production data.
  Tight backup procedures are required.
  Backup procedures for base MQSeries apply to broker queue managers and their queues.
The queue manager used by the configuration manager does not handle production data.
  Backup procedures need not be as tight.
  Deploys and command messages can be redriven.
  Ensure that the queue manager can be rebuilt.
Select appropriate log file type for disaster recovery
  Circular versus linear logs
Archive log files, back up WMQ config file, synchronize with DB backup
-
Backup - 2. WBIMB Product Code and Configuration
The complete WBIMB directory structure
  Developer Toolkits
  Configuration manager workstation
  Runtime broker machines
Code and plugin LIL files in usr/opt/mqsi
  Back up on installation and plugin update
Configuration in broker file system /var/mqsi
  Back up on broker creation and change, and when a user database is added (odbc.ini)
Consider development, test, and production systems
Registry entries for ConfigMgr and Broker
  Configuration information such as DBs to access, DB/Service IDs, passwords
  HKEY_LOCAL_MACHINE\SOFTWARE\IBM\WebSphereMQIntegrator
Code for user-written plugin nodes and parsers
  Runtime plugin source code
  Runtime executable code
  Node definition files used at configuration time (in Toolkit)
-
Backup - 3. WBIMB Databases
After each production deployment, back up:
  Configuration Manager database
    Contains critical domain configuration information and ACLs (domain and Pub/Sub)
  Broker databases
    Contain deployed object information
For broker database tables updated outside of the deploy process
  Retained publications
  Subscriber list
  Configure the database manager to store changes in its log files.
  If DB2 is used, the DBM should employ archival logging
Application databases
  Contain user data and are accessed by in-flight WBIMB message flows
  Backup should conform to the practices in place for other databases in the enterprise
-
Backup - 4. Developer Artifacts
Message flows, ESQL, mappings, message sets, test data, plug-in nodes
All code artifacts stored in file systems
  Programmer workspace
    Projects can be distributed in file system
  Software Configuration Manager repository
Toolkit provides Local History
  Customize days to keep files, entries per file, max. file size
-
Recovery Scenarios
It is recommended that recovery plans contain documented procedures for recovery from various failure scenarios
For WBIMB consider the following list
  Execution group or single message flow fails
  Broker fails
  WMQ queue manager fails
  UserNameServer fails
  Configuration manager fails
  Configuration manager and queue manager fail
Include for each recovery scenario:
  Details of the individual components that are needed
  Where each component is restored from
  Steps to be performed to restore each component
    Order the work items to be performed
    List the personnel involved
Test the recovery procedure
  Record details of the time taken to restore full service
  Highlight critical stages
  Capture details regarding the complexity of stages
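The items a documented recovery scenario should contain can be captured in a small data structure. The following Python sketch is one possible shape, assuming hypothetical class names, field names, and sample values; none of them come from the course material or from any WBIMB tooling.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RestoreStep:
    order: int             # position in the ordered list of work items
    component: str         # component that is needed
    restored_from: str     # where the component is restored from
    action: str            # what is done to restore it
    critical: bool = False # highlight critical stages


@dataclass
class RecoveryPlan:
    scenario: str
    personnel: List[str]
    steps: List[RestoreStep] = field(default_factory=list)
    rehearsal_minutes: List[float] = field(default_factory=list)

    def ordered_steps(self) -> List[RestoreStep]:
        """Return the work items in the order they must be performed."""
        return sorted(self.steps, key=lambda s: s.order)

    def record_rehearsal(self, minutes: float) -> None:
        """Record the time taken to restore full service in one test run."""
        self.rehearsal_minutes.append(minutes)

    def worst_case_minutes(self) -> Optional[float]:
        """Longest recorded restore time, or None if never rehearsed."""
        return max(self.rehearsal_minutes) if self.rehearsal_minutes else None


# Hypothetical plan for the "Broker fails" scenario.
plan = RecoveryPlan("Broker fails", personnel=["broker admin", "DBA"])
plan.steps.append(RestoreStep(2, "broker", "configuration backup", "re-create and redeploy"))
plan.steps.append(RestoreStep(1, "broker database", "post-deployment backup", "restore", critical=True))
plan.record_rehearsal(95.0)
print([s.component for s in plan.ordered_steps()], plan.worst_case_minutes())
```

A plan whose worst-case time is still None has never been rehearsed, which is a quick way to spot scenarios that have been documented but not tested.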
-
Monitoring
Brokers should be monitored for performance, errors, and so forth
Several places to watch:
  Message flow input queues - number of messages and backout count on first message
  Message flow output queues
  Failure queues
  Dead letter queue
  Backout queue
  System Log (NT Event Log and UNIX System Logs)
Can (and should) be done with automated tools
  More on available tools in the next topic
May want to enable automated response
Can use the same XML messaging that is used by the Configuration Manager to monitor the Broker
  Subscribe to $SYS topics
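An automated check on the places to watch can be as simple as comparing observed values against thresholds. The Python sketch below assumes that queue depth and the backout count of the first message have already been collected by some other means; the class, the function, the queue names, and the thresholds are all hypothetical and no WebSphere MQ API is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueReading:
    """One observation of a queue (hypothetical structure for this sketch)."""
    queue: str
    depth: int
    first_message_backout_count: int = 0


def find_alerts(readings: List[QueueReading],
                max_depth: int,
                max_backout: int) -> List[str]:
    """Return one alert line for each threshold that a reading exceeds."""
    alerts = []
    for r in readings:
        if r.depth > max_depth:
            alerts.append(f"{r.queue}: depth {r.depth} exceeds {max_depth}")
        if r.first_message_backout_count > max_backout:
            alerts.append(
                f"{r.queue}: first message backed out "
                f"{r.first_message_backout_count} times"
            )
    return alerts


# Hypothetical readings for two input queues and a failure queue.
readings = [
    QueueReading("FLOW1.IN", depth=12),
    QueueReading("FLOW2.IN", depth=5400, first_message_backout_count=7),
    QueueReading("FLOW1.FAILURE", depth=3),
]
for alert in find_alerts(readings, max_depth=1000, max_backout=3):
    print(alert)
```

A single depth threshold is used here for brevity. In practice a failure or dead letter queue would probably warrant a much lower threshold than an input queue, since any message arriving there indicates a problem.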
-
Problem Determination
Should have a plan in place before you really need it
Who will troubleshoot problems?
  May be multiple groups
    Database, WMQ, System, Network administrators
    Application experts
    In addition to the person or group supporting the Message Broker
How will problems be found?
  Broker syslog entries
  Message Broker explicit failure handling techniques
    TryCatch/Throw/Trace nodes, Exception Lists in Failure/Catch paths
    UserTrace and Debugger for test/development
  Monitoring products - Message Broker, DB2 and WMQ
  May also need to analyze system monitor information
Who will fix problems?
  Need a plan to get the right groups involved after determining the failure point
  Fixes should be applied by the appropriate group/person
  Appropriate regression and promotion processes followed
-
Code Maintenance
Infrastructure Code Fixes (CSDs) apply to all components
Online Software Updates for Message Brokers Toolkit
  Documentation and interim fixes
  Get all from Help->Software Updates->New Updates
  Or download from ftp.software.ibm.com/software/mqseries/fixes/wbimbv50/ and apply selectively via Install/Update Perspective
  Save current configuration - can go back to saved configurations
Integration Code Fixes
  Follow your shop practices
  Regression testing for function
  Stress testing for performance
  Catalog and have available previous safe release of the code
  Make sure you can regress to a previous safe level
-
Topic 3
  High Availability
  Daily Operations, Backup, and Recovery
  External Tools for Monitoring and Operations
-
External Monitoring Tools
The tools provided with the Message Broker and WMQ offer basic control and monitoring
Need something more if your needs are more sophisticated
External tools have one or more of the following capabilities:
  Automated monitoring of application (either by looking at processes, using WMQI internal query queue, or watching queue depth)
  Automated response to problems
    Varying options from automated recovery to operator notification
  Performance monitoring
  Report production
  Integrates with base MQ monitoring product for overall solution
  May require a specific plug-in node supplied by the provider
The Broker facilities include Statistics and Accounting services
Will look at one tool that monitors business processes rather than the operational components of Broker domains
-
IBM Tivoli
Tivoli Manager for WebSphere MQ/WebSphere Business Integration Message Broker
  Integrates with Tivoli Framework
  Allows monitoring of WebSphere MQ and Message Broker using Tivoli Distributed Monitoring
  Allows control of Message Broker components
  Events are generated through Tivoli Enterprise Console
    Rules can be created to handle various events, including paging, sending e-mail, and automated responses
  Instrumentation is provided on Broker CD for monitoring by Tivoli Business Systems Manager
  Requires Tivoli Framework
    Can't be used stand-alone
-
IBM WebSphere BI Monitor
WebSphere Business Integration Monitor
  Can be used for all WebSphere Business Integration Technology
  Is used to monitor business processes rather than software components
  Can be used to look at immediate or historical information
  Looks at business processes from a macro level
    Whole multi-component business processes
    Shows progress of automation in a high-level step view
  Not capable of monitoring and control at operational level
  Requires WebSphere Business Integration Modeler
-
CandleNet Command Center
CandleNet Command Center (CCC)
  Allows monitoring of WMQI brokers and events
  Gathers performance statistics
  Other reporting tools (subscriptions)
  Can perform automated actions
  Also has WebSphere MQ base counterpart
  Refer to: http://www.candle.com
-
BMC PATROL
PATROL for WebSphere MQ Integrator
  Allows monitoring of WMQI brokers and events
  Keeps historical data of problems
  Can perform automated actions
  Integrates with PATROL for MQ - Operator
  Refer to: http://www.bmc.com
-
MQSoftware QPasa!
MQSoftware's QPasa!
  Allows monitoring of WMQI brokers and events
  Monitors throughput
  Historical data tracking
  Can personalize GUI to meet user's needs
  Can perform automated actions
  Can also monitor WebSphere MQ
  Refer to: http://www.mqsoftware.com
-
Unit Summary
High availability achieved through mix of HW, SW and procedures
Many daily tasks need to be considered and planned for to make a WebSphere BI Message Broker installation run smoothly.
  Define roles and responsibilities
  Create disaster recovery plan
  Design backup schedules for configuration and production data
  Coordinate with
    Change control
    Problem determination procedures
  Test disaster recovery scenarios - often
Many external products are available to help with daily monitoring and response