GlusterFS CTDB Integration
Red Hat K.K. All rights reserved.
GlusterFS / CTDB Integration
v1.0 2013.05.14
Etsuji Nakai
Senior Solution Architect, Red Hat K.K.
$ who am i
Etsuji Nakai (@enakai00)
● Senior solution architect and cloud evangelist at Red Hat K.K.
● The author of the “Professional Linux Systems” series.
● Available in Japanese. Translation offers from publishers are welcome ;-)
Professional Linux SystemsTechnology for Next Decade
Professional Linux SystemsDeployment and Management
Professional Linux SystemsNetwork Management
Contents
CTDB Overview
Why does CTDB matter?
CTDB split-brain resolution
Configuration steps for demo set-up
Summary
Disclaimer
This document explains how to set up a clustered Samba server using GlusterFS and CTDB with the following software components.
● Base OS, Samba, CTDB: RHEL6.4 (or any of your favorite clone)
● GlusterFS: GlusterFS 3.3.1 (Community version)
● http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/
Since this is based on the community version of GlusterFS, you cannot receive commercial support from Red Hat for this configuration. If you need commercial support, please consider using Red Hat Storage Server (RHS). In addition, the conditions for a supportable configuration with RHS are different. Please consult a Red Hat sales representative for details.
Red Hat accepts no liability for the content of this document, or for the consequences of any actions taken on the basis of the information provided. Any views or opinions presented in this document are solely those of the author and do not necessarily represent those of Red Hat.
CTDB Overview
What's CTDB?
TDB = Trivial Database
● Simple backend DB for Samba, used to store user info, file lock info, etc...
CTDB = Clustered TDB
● Cluster extension of TDB, necessary when multiple Samba hosts serve the same filesystem contents.
All clients see the same contents through different Samba hosts.
Samba Samba Samba
・・・
Shared Filesystem
What's wrong without CTDB?
Windows file locks are not shared among Samba hosts.
● You would see the following alert when someone else has the same file open.
● Without CTDB, if others have the same file open through a different Samba host than yours, you never see that alert.
● This is because file lock info is stored in the local TDB if you don't use CTDB.
● CTDB was initially developed as a shared TDB for multiple Samba hosts to overcome this problem.
xxx.xls: Windows file locks are not shared.
Locked! Locked!
CTDB interconnect (heartbeat) network
Yet another benefit of CTDB
Floating IPs can be assigned across hosts for transparent failover.
● When one of the hosts fails, its floating IP is moved to another host.
● Mutual health checking is done through the CTDB interconnect (the so-called “heartbeat”) network.
● CTDB can also be used for an NFS server cluster to provide the floating IP feature. (CTDB doesn't provide shared file locking for NFS, though.)
Floating IP#1
・・・
Floating IP#2 Floating IP#N
Floating IP#1
・・・
Floating IP#2 Floating IP#N
Floating IP#1
Why does CTDB matter?
Access path of GlusterFS native client
The native client communicates directly with all storage nodes.
● Transparent failover is implemented on the client side. When the client detects a node failure, it accesses the replicated node.
● Floating IPs are unnecessary by design for the native client.
file01 file02 file03
・・・
GlusterFS Storage Nodes
file01, file02, file03
GlusterFSNative Client
GlusterFS Volume
Native client sees the volume as a single filesystem
The real locations of files are calculated on the client side.
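That client-side calculation can be illustrated with a toy sketch. This is an assumed simplification: GlusterFS's distribute translator really maps a 32-bit hash of the file name onto per-brick hash ranges, and cksum here is only a stand-in hash.

```shell
# Toy illustration of client-side file placement (assumed simplification,
# not the real DHT algorithm): the brick is derived from the file name
# alone, so no central metadata lookup is needed.
brick_for() {
    n_bricks=4
    h=$(printf '%s' "$1" | cksum | cut -d' ' -f1)   # stand-in hash
    echo "brick$(( h % n_bricks ))"
}

# The same name always resolves to the same brick, on every client.
brick_for file01
```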
CIFS/NFS use case for GlusterFS
The downside of the native client is that it's not available for Unix/Windows.
● You need to rely on CIFS/NFS for Unix/Windows clients.
● In that case, Windows file lock sharing and the floating IP feature are not part of GlusterFS itself. They must be provided by an external tool.
CTDB is the tool for it ;-)
・・・
CIFS/NFS Client
CIFS/NFS client connects to just one specified node.
GlusterFS storage node acts as a proxy “client”.
Different clients can connect to different nodes. DNS round-robin may work for it.
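As a sketch of that round-robin, a single name with multiple A records spreads clients across the nodes. This is a hypothetical zone fragment; the name "storage" is an assumption, and the addresses are the floating IPs used later in the demo setup.

```
; hypothetical zone fragment for simple DNS round-robin
storage  IN  A  192.168.122.201
storage  IN  A  192.168.122.202
storage  IN  A  192.168.122.203
storage  IN  A  192.168.122.204
```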
Network topology overview without CTDB
Storage Nodes
CIFS/NFS Clients
GlusterFS interconnect
CIFS/NFS Access segment
...
If you don't need floating IPs or Windows file lock sharing, you can go without CTDB.
● NFS file lock sharing (NLM) is provided by GlusterFS's internal NFS server.
Although it's not mandatory, you can separate the CIFS/NFS access segment from the GlusterFS interconnect for the sake of network performance.
Samba Samba Samba Samba
glusterd glusterd glusterd glusterd
Network topology overview with CTDB
Storage Nodes
CIFS/NFS Clients
GlusterFS interconnect
CIFS/NFS access segment
...
If you use CTDB with GlusterFS, you need to add an independent CTDB interconnect (heartbeat) segment for a reliable cluster.
● The reason is explained later.
CTDB interconnect (Heartbeat)
Demo - Seeing is believing!
http://www.youtube.com/watch?v=kr8ylOBCn8o
CTDB split-brain resolution
What's CTDB split-brain?
When the heartbeat is cut off for any reason (possibly a network problem) while the cluster nodes are still running, there must be some mechanism to choose which "island" should survive and keep running.
● Without this mechanism, the same floating IPs would be assigned on both islands. This is not specific to CTDB; every cluster system in the world needs to take care of "split-brain".
In the case of CTDB, a master node is elected through the "lock file" on the shared filesystem. The island with the master node survives. In the case of GlusterFS in particular, the lock file is stored on a dedicated GlusterFS volume called the "lock volume".
● The lock volume is locally mounted on each storage node. If you share the CTDB interconnect with the GlusterFS interconnect, access to the lock volume is not guaranteed when the heartbeat is cut off, resulting in an unpredictable condition.
Storage Nodes
GlusterFS interconnect
CTDB interconnect (Heartbeat)
Lock Volume
Master
The master takes an exclusive lock on the lock file.
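The election can be sketched with flock. This is a conceptual illustration only, not actual CTDB code; the /tmp path is a stand-in for the lock file on the shared lock volume.

```shell
# Conceptual sketch of master election (not CTDB source): whichever node
# takes an exclusive lock on the shared lock file first becomes the
# master; any node that finds the lock already held follows the master.
LOCKFILE=/tmp/lockfile_demo   # stand-in for /gluster/lock/lockfile

elect() {
    if flock -n 9; then
        echo "master"
    else
        echo "follower"
    fi 9>"$LOCKFILE"
}

elect   # no other node holds the lock here, so this node wins
```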
Typical volume config seen from storage node
# df
Filesystem                       1K-blocks    Used Available Use% Mounted on
/dev/vda3                          2591328 1036844   1422852  43% /
tmpfs                               510288       0    510288   0% /dev/shm
/dev/vda1                           495844   33450    436794   8% /boot
/dev/mapper/vg_bricks-lv_lock        60736    3556     57180   6% /bricks/lock
/dev/mapper/vg_bricks-lv_brick01   1038336   33040   1005296   4% /bricks/brick01
localhost:/lockvol                  121472    7168    114304   6% /gluster/lock
localhost:/vol01                   2076672   66176   2010496   4% /gluster/vol01
# ls -l /gluster/lock/
total 2
-rw-r--r--. 1 root root 294 Apr 26 15:43 ctdb
-rw-------. 1 root root   0 Apr 26 15:57 lockfile
-rw-r--r--. 1 root root  52 Apr 26 15:56 nodes
-rw-r--r--. 1 root root  96 Apr 26 15:04 public_addresses
-rw-r--r--. 1 root root 218 Apr 26 16:31 smb.conf
Locally mounted lock volume.
Locally mounted data volume, exported with Samba.
Lock file to elect the master.
Common config files can be placed on the lock volume.
What about sharing CTDB interconnect with the access segment?
No, it doesn't work.
When the NIC for the access segment fails, the cluster detects the heartbeat failure and elects a master node through the lock file on the shared volume. However, if the node with the failed NIC holds the lock, it becomes the master even though it cannot serve clients.
● In reality, CTDB event monitoring also detects the NIC failure, and the node goes into "UNHEALTHY" status.
CTDB event monitoring
CTDB provides a custom event monitoring mechanism which can be used to monitor application status, NIC status, etc...
● Monitoring scripts are stored in /etc/ctdb/events.d/
● They need to implement handlers for pre-defined events.
● They are called in the order of their file names when an event occurs.
● In particular, the "monitor" event is issued every 15 seconds. If the "monitor" handler of some script exits with a non-zero return code, the node becomes "UNHEALTHY" and will be rejected from the cluster.
● For example, “10.interface” checks the link status of NIC on which floating IP is assigned.
● See README for details - http://bit.ly/14KOjlC
# ls /etc/ctdb/events.d/
00.ctdb       11.natgw           20.multipathd  41.httpd  61.nfstickle
01.reclock    11.routing         31.clamd       50.samba  70.iscsi
10.interface  13.per_ip_routing  40.vsftpd      60.nfs    91.lvs
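A minimal monitor handler can be sketched as follows. The script name 99.myapp and the pidfile path are assumptions for illustration; a real script would be installed as /etc/ctdb/events.d/99.myapp, and /tmp is used here only for the demo.

```shell
# Sketch of a hypothetical events.d script (real path would be
# /etc/ctdb/events.d/99.myapp). CTDB calls each script with the event
# name as $1; a non-zero exit from the "monitor" handler marks the node
# UNHEALTHY.
cat > /tmp/99.myapp <<'EOF'
#!/bin/sh
case "$1" in
    monitor)
        # Hypothetical health check: fail when the application's
        # pidfile is gone (i.e. the application is not running).
        [ -f /var/run/myapp.pid ] || exit 1
        ;;
esac
exit 0
EOF
chmod +x /tmp/99.myapp
```

Events this script does not care about fall through to `exit 0`, so only the "monitor" check can make the node unhealthy.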
Configuration steps for demo set-up
Step1 – Install RHEL6.4
Install RHEL6.4 on storage nodes.
● Scalable File System Add-On is required for XFS.
● Resilient Storage Add-On is required for CTDB packages.
Configure public key ssh authentication between nodes.
● This is for an administrative purpose.
Configure network interfaces as in the configuration pages.
192.168.122.11 gluster01
192.168.122.12 gluster02
192.168.122.13 gluster03
192.168.122.14 gluster04

192.168.2.11 gluster01c
192.168.2.12 gluster02c
192.168.2.13 gluster03c
192.168.2.14 gluster04c

192.168.1.11 gluster01g
192.168.1.12 gluster02g
192.168.1.13 gluster03g
192.168.1.14 gluster04g
/etc/hosts
NFS/CIFS Access Segment
CTDB Interconnect
GlusterFS Interconnect
Step1 – Install RHEL6.4
Configure iptables on all nodes
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 139 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 445 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24050 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 38465:38468 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 4379 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
/etc/sysconfig/iptables
# vi /etc/sysconfig/iptables
# service iptables restart
Port annotations: 111 = portmap, 139/445 = CIFS, 24007:24050 = bricks, 38465:38468 = NFS/NLM, 4379 = CTDB.
Step2 – Prepare bricks
Create and mount brick directories on all nodes.
# pvcreate /dev/vdb
# vgcreate vg_bricks /dev/vdb
# lvcreate -n lv_lock -L 64M vg_bricks
# lvcreate -n lv_brick01 -L 1G vg_bricks

# yum install -y xfsprogs
# mkfs.xfs -i size=512 /dev/vg_bricks/lv_lock
# mkfs.xfs -i size=512 /dev/vg_bricks/lv_brick01

# echo '/dev/vg_bricks/lv_lock /bricks/lock xfs defaults 0 0' >> /etc/fstab
# echo '/dev/vg_bricks/lv_brick01 /bricks/brick01 xfs defaults 0 0' >> /etc/fstab
# mkdir -p /bricks/lock
# mkdir -p /bricks/brick01
# mount /bricks/lock
# mount /bricks/brick01
/dev/vdb
lv_lock
lv_brick01
vg_bricks
Mount on /bricks/lock, used for lock volume.
Mount on /bricks/brick01, used for data volume.
Step3 – Install GlusterFS and create volumes
Install GlusterFS packages on all nodes

# wget -O /etc/yum.repos.d/glusterfs-epel.repo \
    http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/RHEL/glusterfs-epel.repo
# yum install -y rpcbind glusterfs-server
# chkconfig rpcbind on
# service rpcbind start
# service glusterd start
# gluster peer probe gluster02g
# gluster peer probe gluster03g
# gluster peer probe gluster04g

# gluster vol create lockvol replica 2 \
    gluster01g:/bricks/lock gluster02g:/bricks/lock \
    gluster03g:/bricks/lock gluster04g:/bricks/lock
# gluster vol start lockvol

# gluster vol create vol01 replica 2 \
    gluster01g:/bricks/brick01 gluster02g:/bricks/brick01 \
    gluster03g:/bricks/brick01 gluster04g:/bricks/brick01
# gluster vol start vol01
Do not auto-start glusterd with chkconfig.
Need to specify GlusterFS interconnect NICs.
Configure cluster and create volumes from gluster01
Step4 – Install and configure Samba/CTDB
● Create the following config files on the shared volume.
# yum install -y samba samba-client ctdb
# mkdir -p /gluster/lock
# mount -t glusterfs localhost:/lockvol /gluster/lock
Do not auto-start smb and ctdb with chkconfig.
CTDB_PUBLIC_ADDRESSES=/gluster/lock/public_addresses
CTDB_NODES=/etc/ctdb/nodes
# Only when using Samba. Unnecessary for NFS.
CTDB_MANAGES_SAMBA=yes
# some tunables
CTDB_SET_DeterministicIPs=1
CTDB_SET_RecoveryBanPeriod=120
CTDB_SET_KeepaliveInterval=5
CTDB_SET_KeepaliveLimit=5
CTDB_SET_MonitorInterval=15
/gluster/lock/ctdb
# yum install -y rpcbind nfs-utils
# chkconfig rpcbind on
# service rpcbind start
Install Samba/CTDB packages on all nodes
If you use NFS, install the following packages, too.
Configure CTDB and Samba only on gluster01
Step4 – Install and configure Samba/CTDB
192.168.2.11
192.168.2.12
192.168.2.13
192.168.2.14
/gluster/lock/nodes
192.168.122.201/24 eth0
192.168.122.202/24 eth0
192.168.122.203/24 eth0
192.168.122.204/24 eth0
/gluster/lock/public_addresses
[global]
workgroup = MYGROUP
server string = Samba Server Version %v
clustering = yes
security = user
passdb backend = tdbsam

[share]
comment = Shared Directories
path = /gluster/vol01
browseable = yes
writable = yes
/gluster/lock/smb.conf
CTDB cluster nodes. Need to specify CTDB interconnect IP addresses.
Floating IP list.
Samba config. Need to specify "clustering = yes".
Step4 – Install and configure Samba/CTDB
Set SELinux permissive for smbd_t on all nodes due to the non-standard smb.conf location.
● It would be better to set an appropriate security context, but there is an open issue with using chcon on GlusterFS.
● https://bugzilla.redhat.com/show_bug.cgi?id=910380
# mv /etc/sysconfig/ctdb /etc/sysconfig/ctdb.orig
# mv /etc/samba/smb.conf /etc/samba/smb.conf.orig
# ln -s /gluster/lock/ctdb /etc/sysconfig/ctdb
# ln -s /gluster/lock/nodes /etc/ctdb/nodes
# ln -s /gluster/lock/public_addresses /etc/ctdb/public_addresses
# ln -s /gluster/lock/smb.conf /etc/samba/smb.conf
# yum install -y policycoreutils-python
# semanage permissive -a smbd_t
Create symlink to config files on all nodes.
Step4 – Install and configure Samba/CTDB
Create the following script for start/stop services
#!/bin/sh

function runcmd {
    echo "exec on all nodes: $@"
    ssh gluster01 "$@" &
    ssh gluster02 "$@" &
    ssh gluster03 "$@" &
    ssh gluster04 "$@" &
    wait
}

case $1 in
    start)
        runcmd service glusterd start
        sleep 1
        runcmd mkdir -p /gluster/lock
        runcmd mount -t glusterfs localhost:/lockvol /gluster/lock
        runcmd mkdir -p /gluster/vol01
        runcmd mount -t glusterfs localhost:/vol01 /gluster/vol01
        runcmd service ctdb start
        ;;
    stop)
        runcmd service ctdb stop
        runcmd umount /gluster/lock
        runcmd umount /gluster/vol01
        runcmd service glusterd stop
        runcmd pkill glusterfs
        ;;
esac
ctdb_manage.sh
Step5 – Start services
Now you can start/stop services.
● After a few moments, ctdb status becomes “OK” for all nodes.
● And floating IP's are configured on each node.
# ./ctdb_manage.sh start
# ctdb status
Number of nodes:4
pnn:0 192.168.2.11     OK (THIS NODE)
pnn:1 192.168.2.12     OK
pnn:2 192.168.2.13     OK
pnn:3 192.168.2.14     OK
Generation:1489978381
Size:4
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
Recovery master:1
# ctdb ip
Public IPs on node 0
192.168.122.201 node[3] active[] available[eth0] configured[eth0]
192.168.122.202 node[2] active[] available[eth0] configured[eth0]
192.168.122.203 node[1] active[] available[eth0] configured[eth0]
192.168.122.204 node[0] active[eth0] available[eth0] configured[eth0]
Step5 – Start services
Set the Samba password and check the shared directories via one of the floating IPs.
# pdbedit -a -u root
new password:
retype new password:
# smbclient -L 192.168.122.201 -U root
Enter root's password:
Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-151.el6]

        Sharename       Type      Comment
        ---------       ----      -------
        share           Disk      Shared Directories
        IPC$            IPC       IPC Service (Samba Server Version 3.6.9-151.el6)
Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-151.el6]

        Server               Comment
        ---------            -------

        Workgroup            Master
        ---------            -------
Password DB is shared by all hosts in the cluster.
Configuration hints
To specify the GlusterFS interconnect segment, "gluster peer probe" should be done for the IP addresses on that segment.
To specify the CTDB interconnect segment, IP addresses on that segment should be specified in "/gluster/lock/nodes" (symlink from "/etc/ctdb/nodes").
To specify the NFS/CIFS access segment, NIC names on that segment should be specified in "/gluster/lock/public_addresses" (symlink from "/etc/ctdb/public_addresses") associated with floating IP's.
To restrict NFS accesses for a volume, you can use “nfs.rpc-auth-allow” and “nfs.rpc-auth-reject” volume options. (reject supersedes allow.)
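As a sketch of those options (the client addresses are assumptions for illustration, and the exact address syntax accepted may vary by GlusterFS version):

```
# gluster vol set vol01 nfs.rpc-auth-allow 192.168.122.101,192.168.122.102
# gluster vol set vol01 nfs.rpc-auth-reject 192.168.122.102
```

With both set as above, 192.168.122.102 ends up rejected, since reject supersedes allow.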
The following tunables in "/gluster/lock/ctdb" (symlink from "/etc/sysconfig/ctdb") may be useful for adjusting the CTDB failover timings. See the ctdbd man page for details.
● CTDB_SET_DeterministicIPs=1
● CTDB_SET_RecoveryBanPeriod=300
● CTDB_SET_KeepaliveInterval=5
● CTDB_SET_KeepaliveLimit=5
● CTDB_SET_MonitorInterval=15
Summary
Summary
CTDB is a tool that combines well with the CIFS/NFS use case for GlusterFS.
Network design is crucial to building a reliable cluster, not only for CTDB but for every cluster in the world ;-)
Enjoy!
And one important fine print....
● Samba is not well tested on large-scale GlusterFS clusters. The use of CIFS as a primary access protocol on Red Hat Storage Server 2.0 is not officially supported by Red Hat. This will be improved in future versions.
WE CAN DO MORE WHEN WE WORK TOGETHER
THE OPEN SOURCE WAY