Post on 13-May-2015
High Availability with Novell® Cluster Services for Novell Open Enterprise Server on Linux
Kent Boogert, kboogert@novell.com
Charles Gonzales, cgonzales@novell.com
© Novell, Inc. All rights reserved.
Agenda
• Storage Foundation
• High Availability Concepts
• Novell® Cluster Services 1.8 Architecture
• Installing NCS
• Cluster Resources
• NCS management tools
Storage Foundation
Novell® Open Enterprise Server Storage Foundation
[Diagram: storage stack. Labels: Disk (FireWire, SCSI, FC, iSCSI, SATA, SAS); EVMS; NetWare MM; I/O fence; NCP redirector FS; Linux vfs (+efl); CFS; NCS; Backup & restore / Replication / SRM; mon; Samba; PDC; DFS; NCP; Freeze/thaw; NSS; Reiser; Ext3; CIMOM; SMI-S schema; eDir; Policy engine; SMI-S client; LAN; Storage Network; Management Network; iManager; CLI scripting; NFS]
High Availability Concepts
Fault Tolerant Environment
[Diagram: Server Cluster; LAN Fabric (Ethernet); SAN Fabric (Fibre Channel); Storage Arrays with controller Ctrl 1 presenting LUN 0, LUN 1, LUN …]
Eliminating Single Points of Failure
• Servers
Eliminating Single Points of Failure
• Servers
• Local area network
Fault Tolerant Environment
[Diagram: Server Cluster with Dual NICs; LAN Fabric (Ethernet); SAN Fabric (Fibre Channel); Storage Arrays with controller Ctrl 1 presenting LUN 0, LUN 1, LUN …]
Eliminating Single Points of Failure
• Servers
• Local area network
• Storage area network
Fault Tolerant Environment
[Diagram: Server Cluster with Dual NICs and Dual HBAs; LAN Fabric (Ethernet); SAN Fabric (Fibre Channel); Storage Arrays with controllers Ctrl 1 and Ctrl 2 presenting LUN 0, LUN 1, LUN …]
Eliminating Single Points of Failure
• Servers
• Local area network
• Storage area network
• Cluster communication
LAN Heartbeat Protocol
[Diagram: four nodes (0 to 3), each with its own SYS volume, connected to the LAN and to SAN storage that is sharable for clustering. Node 0 is the master and nodes 1 to 3 are slaves. The master sends broadcast(s); each slave sends unicast(s) back to the master.]
SAN Split Brain Detection
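[Diagram: four nodes (0 to 3), each with its own SYS volume, connected to the LAN and to SAN storage that is sharable for clustering. Node 0 is the master and nodes 1 to 3 are slaves; each slave sends unicast(s) back to the master. All nodes perform SBD disk I/O against the (mirrored) cluster partition (SBD) on the SAN.]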
Novell® Cluster Services 1.8 Architecture
Directory-based Configuration
• eDirectory™ for replicated cluster configuration
– NCS:NetWare® Cluster
> Name is misleading, but we haven't changed it
– NCS:Cluster Resource
– NCS:Resource Template
– NCS:NCP™ Server
> Don't confuse with NCP Server
– NCS:Volume Resource
© Novell, Inc. All rights reserved.18
eDirectory™ Object Relationships
[Diagram: each object is shown with its object class and key attributes]
NCS:NetWare Cluster
VIRTUAL NCP SERVER OBJECT = KCB_CLUSTER_P69_SERVER
NCP SERVER, SERVER
CLUSTER NODE OBJECT = srv01_kcb_cluster.novell
NCS:NCP SERVER
CLUSTER POOL RESOURCE OBJECT = P69_Server.kcb_cluster.novell
NCS:Volume Resource
NCP SERVER OBJECT = Srv01.novell
NCP SERVER, SERVER
NCP VOLUME OBJECT = kcb_cluster_VOL1.novell
VOLUME RESOURCE
NCS:Network Address: (cluster address)
Network address = (resource address i.e. 69)
NCS:Network cluster = kcb_cluster.novell
NCS:Volumes = kcb_cluster_VOL1.novell
NCS:GIPC NODE Number
NCS:Network Address (Server Address)
NCS:NCP Server
NCS:CRM Preferred Nodes
NCS:NCP Server = (KCS_CLUSTER_P69_SERVER.novell)
NCS:Volumes = (kcb_cluster_VOL1.novell)
NCS:CLUSTER RESOURCES
NCS:CRM Preferred Node
NCS:RESOURCE TEMPLATE
NCS:CRM Preferred Node
NCS:NetWare Cluster = not set!!
NCS:Volumes = not set!!
network address: Server address
Cluster object = kcb_cluster.novell
HOSTSERVER = KCB_CLUSTER_P69_SERVER.novell
HOSTSERVER RESOURCE name = VOL1
Linux NCP Mount Point = NSS /media/nss/vol1
Directory-based Configuration
• eDirectory™ for replicated cluster configuration
• Configuration is in cluster container
– Files to know who I am
> /etc/opt/novell/ncs/clstrlib.conf
> /etc/opt/novell/ncs/nodename
• Cluster configuration daemon (ncs-configd)
– Syncs configuration between LDAP and local files
• Cluster master declares current configuration
Group Interprocess Communication (GIPC)
• Heartbeat, membership, and multi-cast protocols
– Common parameters and tuning concepts
• Linux kernel module creates two raw sockets
– /proc/net/raw
• Linux Ethereal/Wireshark offers basic packet decoder
– Good for tracing master / slave heartbeat packets
panning clusterid 56867377
heartbeat rate_usecs 1000000
censustaker tolerance 8000000
sequencer master_watchdog 1000000
sequencer slave_watchdog 8000000
sequencer retrans_max 30
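The GIPC parameter values on this slide read as module, parameter, value triples. As a minimal sketch, the snippet below parses those triples and converts the timing values to seconds. It assumes that tolerance, like rate_usecs, is expressed in microseconds; gipc_params, gipc_value, and usecs_to_secs are illustrative names and not part of NCS.

```shell
#!/bin/bash
# Parameter values copied from the slide, one "<module> <parameter> <value>" per line.
gipc_params='panning clusterid 56867377
heartbeat rate_usecs 1000000
censustaker tolerance 8000000
sequencer master_watchdog 1000000
sequencer slave_watchdog 8000000
sequencer retrans_max 30'

# Print the value stored for one module/parameter pair.
gipc_value() {
    printf '%s\n' "$gipc_params" |
        awk -v m="$1" -v p="$2" '$1 == m && $2 == p { print $3 }'
}

# Convert microseconds to whole seconds.
usecs_to_secs() {
    echo $(( $1 / 1000000 ))
}

heartbeat_s=$(usecs_to_secs "$(gipc_value heartbeat rate_usecs)")
# Assumption: tolerance uses the same unit as rate_usecs.
tolerance_s=$(usecs_to_secs "$(gipc_value censustaker tolerance)")

echo "heartbeat every ${heartbeat_s}s, tolerance ${tolerance_s}s"
echo "heartbeat intervals per tolerance window: $(( tolerance_s / heartbeat_s ))"
```

With the slide's values, this gives a 1-second heartbeat and an 8-second tolerance.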
Split Brain Detector (SBD)
• Nodes coordinate membership via shared disk
– Linux kernel module does direct I/O
> e.g. /dev/sda1 or /dev/evms/cluster.sbd
• Nodes locate matching SBD given cluster name
– SBD partition is created at cluster creation
> Linux device special filename is unimportant
• See man sbdutil for Linux SBD command line utility
– e.g. sbdutil -v
Cluster Resource Manager (CRM)
• Manages cluster-wide resource states
– Linux kernel module executes distributed FSM
• Cluster resource daemon (ncs-resourced)
– Forks once per resource action
– Forks again for load, unload, and monitor script
> stdout & stderr redirected to /var/opt/novell/log/ncs/*.out
– Returns script status to kernel
• Cluster resource scripts run from local files
– /bin/bash is default interpreter
– Shell functions in /opt/novell/ncs/lib/ncsfuncs
Load and Unload Script Example
Cluster Pool Resource Load script
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
exit_on_error nss /poolact=AUTO_POOL_01
exit_on_error ncpcon mount AUTO_VOL_012=253
exit_on_error ncpcon mount AUTO_VOL_011=254
exit_on_error add_secondary_ipaddress 151.155.189.131
exit_on_error ncpcon bind --ncpservername=CGAO_SP3_PR6_CLUSTER_AUTO_POOL_01_SERVER --ipaddress=151.155.189.131
exit 0

Cluster Pool Resource Unload script
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
ignore_error ncpcon unbind --ncpservername=CGAO_SP3_PR6_CLUSTER_AUTO_POOL_01_SERVER --ipaddress=151.155.189.131
ignore_error del_secondary_ipaddress 151.155.189.131
ignore_error nss /pooldeact=AUTO_POOL_01
exit 0

Cluster Pool Resource Monitor script
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
exit_on_error status_fs /dev/evms/AUTO_POOL_01 /opt/novell/nss/mnt/.pools/AUTO_POOL_01 nsspool
exit_on_error status_secondary_ipaddress 151.155.189.131
exit_on_error ncpcon volume AUTO_VOL_012
exit_on_error ncpcon volume AUTO_VOL_011
exit 0
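The load and monitor scripts wrap each step in exit_on_error, while the unload script uses ignore_error. The sketch below shows one plausible way such wrappers could behave, to make the difference concrete. These are hypothetical stand-ins written for illustration; the real helpers are defined in /opt/novell/ncs/lib/ncsfuncs and may differ.

```shell
#!/bin/bash
# Hypothetical stand-ins for the ncsfuncs helpers (illustration only).

# Run a command; if it fails, report it and end the script with its status.
exit_on_error() {
    "$@" || {
        rc=$?
        echo "error: '$*' failed with status $rc" >&2
        exit "$rc"
    }
}

# Run a command; if it fails, report it and carry on.
ignore_error() {
    "$@" || echo "ignored: '$*' failed with status $?" >&2
    return 0
}

# Demo in a subshell, so the failing step ends only the subshell.
(
    exit_on_error true      # succeeds, script continues
    ignore_error false      # fails, but the failure is ignored
    exit_on_error false     # fails, subshell exits here
    echo "not reached"
) || echo "load-style script stopped with status $?"
```

Under this reading, a load script stops at the first failing step and returns a failure status, whereas an unload script always runs every cleanup step.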
Cluster Resources
– AFP
– Certificate Server
– CIFS
– DFS
– DHCP server
> On Linux POSIX
> On Novell Storage Services™
– DNS Server
– iFolder 3.8
– iPrint
– MySQL
– NetStorage
– QuickFinder
– Samba
– DST (shadow volumes)
– Linux POSIX volumes
– NCP™ volumes
– NSS
Cluster documentation for most OES2 services
Cluster Resource Templates
• ArkManager
• DHCP
• DNS
• Generic_FS
• Generic_IP
• iFolder
• iPrint
• MySQL
• Samba
• XenLive*
• Xen*
Cluster Management Agent (CMA)
• Common cluster management interface
– Linux kernel module exports management files
> /admin/Novell/Cluster
• Enables iManager and cluster command line interface
– Built on Novell® admin file system (adminfs)
• Direct access to XML formatted cluster state
– e.g. /admin/Novell/Cluster/ResourceState.xml
• See NCS for Linux Administration Guide section 8.17
– http://www.novell.com/documentation/oes2/clus_admin_lx/data/h4hgu4hs.html
Handling Fault Conditions
Split brain
– Server isolated from the LAN
Fatal SAN error
– Server isolated from the SAN
Poison pills
– panic
> /proc/sys/kernel/panic
– run eDirectory™ NCS:NodeIsolationScript
hangcheck_timer
Enterprise Volume Management System (EVMS)
• EVMS is an extensible host-based disk volume manager
– Extended by Novell® via shared library plugins
> NetWare® segment and cluster managers
• EVMS Cluster Extension (ECE)
– Manage any node's local plus shared storage
– Novell Cluster Services plugin
> http://evms.sourceforge.net/cluster
• Cluster awareness similar to NetWare media manager
– e.g. Provides safe online file system expansion
• Cluster / globally unique persistent device naming
Installing Novell® Cluster Services
Installing Novell® Cluster Services
YaST based installation
– YaST2 select Open Enterprise Server
> OES Install and Configuration
Local or remote
– Specify how NCS will access eDirectory™ via LDAP
New Cluster
– Enter cluster FDN, IP address and optional device for SBD
Existing Cluster
– Enter cluster FDN
Demo
Cluster Migrations?
Rolling cluster conversion
– Cluster resources remain available
Three simple steps
– Decommission NetWare® on server
– Install Novell® Open Enterprise Server Linux onto server
– Add Linux server to existing cluster
Rollback to NetWare
– Convert Linux server back to NetWare
Sequential and parallel options
– Convert one-by-one or many at once
Automatic Script Translation
During cluster conversion...
– NetWare® resources fail over to Linux
– ncs-resourced translates scripts on-the-fly
Translates NetWare commands
– e.g. cluster cvsbind add vserver 10.0.0.0
> ncpcon bind --ncpservername=vserver --ipaddress=10.0.0.0
Committing the conversion
– cluster convert preview [all | (resource_name)]
– cluster convert commit
> script translation saved to eDirectory™
> update cluster revision number = 282
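To make the translation idea concrete, here is a minimal sketch that rewrites the one NetWare command shown on this slide into its Linux form and passes every other line through unchanged. It is an illustration only: translate_line is a hypothetical name, and the actual rules applied by ncs-resourced are not shown in this deck and cover far more commands.

```shell
#!/bin/bash
# Illustrative only: translate "cluster cvsbind add <name> <ip>" into the
# equivalent "ncpcon bind" command; any other line is printed unchanged.
translate_line() {
    line=$1
    # Split the line into words without expanding glob characters.
    set -f
    set -- $line
    set +f
    verb=$(printf '%s %s %s' "${1:-}" "${2:-}" "${3:-}" | tr '[:upper:]' '[:lower:]')
    if [ "$#" -eq 5 ] && [ "$verb" = "cluster cvsbind add" ]; then
        printf 'ncpcon bind --ncpservername=%s --ipaddress=%s\n' "$4" "$5"
    else
        printf '%s\n' "$line"
    fi
}

translate_line "cluster cvsbind add vserver 10.0.0.0"
```

The command match is case-insensitive because NetWare console commands are commonly written in upper case; that choice is an assumption of this sketch.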
Rules for Mixed Clusters
• Online storage reconfiguration is not supported
• Can't add NetWare® nodes using deployment manager
• Resources created on Linux won't run on NetWare
Cluster Management Tools
cluster
– Perl-based command line
iManager snapins
– Common interface for NetWare® and Linux
ncs-emaild
– User-space cluster event email daemon
sbdutil
– Split brain detector utility
Tips, Tricks, and Troubleshooting
• Getting output
– grep ncs- /var/log/messages
– grep CLUSTER /var/log/messages
– sbdutil -v
– cluster stats display
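The two grep commands above can be combined into one pass over the log. A small sketch, assuming a hypothetical helper name ncs_log_lines and the default log path from the slide:

```shell
#!/bin/bash
# Print log lines that mention an NCS daemon ("ncs-") or "CLUSTER".
# The optional argument is the log file; the default is the path from the slide.
ncs_log_lines() {
    grep -E 'ncs-|CLUSTER' "${1:-/var/log/messages}"
}
```

For example, ncs_log_lines | tail -n 50 would show the most recent matches, given read access to the log.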
Tips, Tricks, and Troubleshooting
• Getting MORE output
– /admin/Novell/Cluster/EventLog.xml
> echo -n "trace crm on" > /proc/ncs/cluster
> Can change in ldncs
– export NCSCONFIGD=1
– export NCSRESOURCED=1
Tips, Tricks, and Troubleshooting
• SBD problems
– Use sbdutil -f to search for the cluster's SBD
• Keep the server from rebooting after panic
– echo -n XX > /proc/sys/kernel/panic
» Where XX is the number of seconds to delay before rebooting
» 0 disables automatic rebooting after a panic
– echo -n X > /proc/sys/kernel/panic_on_oops
> When X is 0, a kernel oops does not trigger a panic (no reboot)
> When X is 1, a kernel oops triggers a panic (and the reboot behavior above)
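As a quick check of what a given /proc/sys/kernel/panic value means, the sketch below maps the two cases named on the slide (0 and a positive number of seconds) to a plain description. describe_panic_timeout is a hypothetical helper; other values are deliberately reported as unrecognized rather than interpreted.

```shell
#!/bin/bash
# Describe a kernel.panic value as explained on the slide.
describe_panic_timeout() {
    secs=$1
    case $secs in
        ''|*[!0-9]*)
            echo "unrecognized value: $secs"
            return 1
            ;;
        0)
            echo "no automatic reboot after a panic"
            ;;
        *)
            echo "reboot $secs seconds after a panic"
            ;;
    esac
}

# Report the current setting where the file is readable (Linux only).
if [ -r /proc/sys/kernel/panic ]; then
    describe_panic_timeout "$(cat /proc/sys/kernel/panic)" || true
fi
```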
More Tips and Tricks
• Resources going comatose
> Review /var/opt/novell/log/ncs/<resource>.load.out
> Review /var/opt/novell/log/ncs/<resource>.unload.out
– Scripts can be run outside of the cluster. Be careful
– When launched from ncs-resourced, the environment may be different
> e.g. the path is different; therefore include the full path to executables.
• Update schema prior to install
– Be sure to patch!
– YaST2
> Open Enterprise Server section - Schema tool
• Maintenance mode
– cluster maintenance on/off
> Will ignore loss of heartbeat. Use when doing LAN work, upgrades, etc.
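For the comatose-resource tip, a small helper can build the two log paths for a resource and show the tail of each one that exists. This is a sketch: ncs_script_logs and show_script_logs are hypothetical names, and the 20-line tail is an arbitrary choice.

```shell
#!/bin/bash
# Print the load and unload script log paths for one cluster resource.
ncs_script_logs() {
    resource=$1
    log_dir=/var/opt/novell/log/ncs
    for action in load unload; do
        printf '%s/%s.%s.out\n' "$log_dir" "$resource" "$action"
    done
}

# Show the last lines of each log that is present and readable.
show_script_logs() {
    ncs_script_logs "$1" | while IFS= read -r f; do
        if [ -r "$f" ]; then
            echo "== $f"
            tail -n 20 "$f"
        fi
    done
}
```

Calling show_script_logs with a resource name prints nothing on a machine where the logs are absent.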
Particularly Hot
• Deleting nodes
– Delete all the server objects
– On master node execute
> cluster exec "/opt/novell/ncs/bin/ncs-configd.py -init"
• Console commands
– cluster help
– Most non-configuration items can be done from command line
Unpublished Work of Novell, Inc. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary, and trade secret information of Novell, Inc. Access to this work is restricted to Novell employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of Novell, Inc. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Novell, Inc. makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for Novell products remains at the sole discretion of Novell. Further, Novell, Inc. reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All Novell marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.