Reliable Replicated File Systems with GlusterFS
John Sellens
@jsellens
USENIX LISA 28, 2014
November 14, 2014
Notes PDF at http://www.syonex.com/notes/
Contents
Preamble and Introduction
Setting Up GlusterFS Servers
Mounting on Clients
Managing, Monitoring, Fixing
Wrap Up
© 2014 John Sellens, USENIX LISA 28, 2014
Preamble and Introduction
Overview
• Network Attached Storage is handy to have in many cases
– And sometimes we have limited budgets
• GlusterFS provides a scalable NAS system
– On “normal” systems and hardware
• An introduction to GlusterFS and its uses
• And how to implement and maintain a GlusterFS file service
Notes:
• http://www.gluster.org/
• We’re not going to cover everything in this Mini Tutorial session
– But it should get you started
– In time for mid-afternoon break!
• Both USENIX and I will very much appreciate your feedback — please fill
out the evaluation form
Solving a Problem
• Needed to replace a small but reliable network file service
– Expanding the existing service wasn’t going to work
• Wanted something comprehensive but comprehensible
• Needed POSIX filesystem semantics, and NFS
• Wanted something that would let me sleep at night
• GlusterFS seemed a good fit
– Supported by Red Hat; provides NFS, CIFS, . . .
– User space, on top of regular filesystem
Notes:
• I have a small hosting infrastructure that I like to implement reliably
• Red Hat Storage Server is a supported GlusterFS implementation
Alternatives I Was Less Enthused About
• Block replication – DRBD, HAST
– Not transparent – hard to look and confirm consistency
– Hard to expand, limited to two server nodes
• Object stores – Ceph, Hadoop, etc.
– No need for shared block devices for KVMs, etc
– Not always POSIX and NFS
• Others – MooseFS, Lustre, etc.
– Some needed separate meta-data server(s)
– Some had single master servers
Notes:
• I was running HAST on FreeBSD, and tried (and failed) to expand it
– Partly due to old hardware I was using
Why I Like GlusterFS
• Can run on just two servers – all functions on both
• Sits on top of a standard filesystem (ext3, xfs)
– Files in GlusterFS volumes are visible as normal files
– So if everything fails very badly, I can likely copy the files out
– Easy to compare replicated copies of files for consistency
• Fits nicely with CentOS which I tend to use
– NFS server support means that my existing FreeBSD boxes
would work “just fine”
Notes:
• I like to be both simple-minded and paranoid
– So being able to check and copy if need be was appealing
Hardware – Don’t Use Your Old Junk
• I have some old 32-bit machines
– Bad, bad idea
• These days, code doesn’t seem to be tested well on 32 bit
• GlusterFS inodes (or equivalent) are 64 bits
– Which doesn’t sit well with 32 bit NFS clients
• In theory 32 bit should work, in practice it’s at least annoying
• 64 bit Yes! but 32 bit No!
Notes:
• This is not just GlusterFS related
• My old 32 bit FreeBSD HAST systems started misbehaving when I tried
to update and expand
Setting Up GlusterFS Servers
Set Up Some Servers
• Ordinary servers with ordinary storage
– All the “normal” speed/reliability questions
– I’ll suggest CentOS 7 (or 6)
• Leave unallocated space to use for GlusterFS
• Separate storage network?
– Traffic and security
• Dedicated servers for storage?
– Likely want storage servers to be static and dedicated
Notes:
• Since Red Hat does the development, it’s pretty likely that GlusterFS will work well on CentOS
– Should work on Fedora and Debian as well, if you’re that way inclined
• GlusterFS 3.6 likely to have FreeBSD and MacOS support (I hope)
https://forums.freebsd.org/viewtopic.php?t=46923
• And of course, it should go without saying, but make sure NTP and DNS
and networking are working properly.
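Those prerequisites are easy to sanity-check before installing anything; a minimal sketch, where node1 and node2 are assumed example peer hostnames:

```shell
# Pre-install sanity checks (node1/node2 are assumed example hostnames)
ntpstat                    # clock synchronized? (or: chronyc tracking)
getent hosts node1 node2   # do both peer names resolve, and consistently?
ping -c 1 node2            # is the peer reachable over the storage network?
```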
RAID on the Servers?
• GlusterFS hardware failures “should be” non-disruptive
• RAID should provide better I/O performance
– Especially hardware RAID with cache
• Re-building/resilvering an entire server for a disk failure is boring
– Overall storage performance will suffer in the meantime
– A second failure might be a big problem
• Small general purpose deployment?
– Use good servers and suitable RAID
• Other situations may suit non-RAID
– Lots of servers, more than 2 replicas, etc.
Notes:
• Configuration management should mean that a server rebuild is “easy”
– Your mileage may vary
• Remember that a failed disk means lots of I/O and time to repair, and
you’re vulnerable to other failures while rebuilding
Networks and Security
• GlusterFS has limited security and access controls
– Assumption: all servers and networks are friendly
• A separate storage network may be prudent
– glusterfs mounts need to reach gluster peer addresses
– NFS mounts by default are available on all interfaces
• Generally you want to isolate GlusterFS traffic if you can
– Firewalls, subnets, iptables, . . .
Notes:
• I have very limited experience trying to contain GlusterFS
• If you’re using only glusterfs mounts an isolated network would be useful
– For performance and “containment”
IPs and Addressing
• Generally you will want fixed and floating addresses
• GlusterFS peers need to talk to each other
• glusterfs mounts need to find one peer then talk to the others
– First peer provides details of the volumes and peers
• NFS and CIFS mounts want floating service addresses
– Active/passive mounts need just one
– Active/active mounts need more
• CTDB is recommended for IP address manipulation
Notes:
• With two servers, I have 6 addresses total
– Management addresses
– Storage network peer addresses
– Floating addresses that are normally one per server
• More on CTDB in the IP Addresses and CTDB slide below
Installing GlusterFS
• Use the standard gluster.org repositories
– See notes
• Install with:
    yum install glusterfs-server
    service glusterd start
    chkconfig glusterd on
• or:
    apt-get install glusterfs-server
• Current version is 3.6.1
Notes:
• Versions – use 3.5.x
– I seemed to have less reliable/stable behaviour with 3.4
• Everything is under the download link at
http://download.gluster.org/pub/gluster/glusterfs/LATEST/
• CentOS:
    wget -P /etc/yum.repos.d \
        http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-epel.repo
• Debian – see
    http://download.gluster.org/pub/gluster/glusterfs/3.5/LATEST/Debian/wheezy/README
A Little Terminology
• A set of GlusterFS servers is a Trusted Storage Pool
– Members of a pool are peers of each other
• A GlusterFS filesystem is a Volume
• Volumes are composed of storage Bricks
• Volumes can be one of three types, or most combinations of them
– Distributed – different files are on different bricks
– Striped – (very large) files are split across bricks
– Replicated – two or more copies on different bricks
• Distributed Replicated – more servers than replicas
• A Sub-Volume is a replica set within a Volume
Notes:
• Distributed provides no redundancy
– Though you might have RAID disks on servers
– But you’re still in trouble if a server goes down
Set Up the Peers
• All servers in a pool need to know each other:
    node1# gluster peer probe node2
• Doesn’t hurt to do this (I think it’s optional):
    node2# gluster peer probe node1
• And make sure they are talking:
    node1# gluster peer status
  – That only lists the other peer(s)
• List the servers in a pool:
    node1# gluster pool list
Set Us Up the Brick
• A brick is just a directory in an OS filesystem
• One brick per filesystem
– Disk storage dedicated to a volume
– /data/gluster/volname/brickN/brick
• Could have multiple bricks in a filesystem
– Disk storage shared between volumes
– /data/gluster/disk1/volname/brickN
• Don’t want a brick to be a filesystem mount point
– Big problems if underlying storage not mounted
• Multiple volumes? Use the latter for better utilization
Notes:
• XFS is the suggested filesystem to use
• A suggested naming convention for bricks:
http://www.gluster.org/community/documentation/index.php/HowTos:Brick_naming_conventions
• With disk mount points, and multiple bricks per OS filesystem, one GlusterFS volume can use up space and “fill up” other volumes
• With multiple bricks per OS filesystem, it’s harder to know which gluster
volume is using up space – df shows the same for all volumes
• Depends on your use case
– One big volume or multiple volumes for different purposes
– Will volumes shrink, or only grow?
– Is it convenient to have multiple OS disk partitions?
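As a sketch of the one-brick-per-filesystem layout described above (the device name and paths are illustrative assumptions, not requirements):

```shell
# Dedicate one XFS filesystem to one brick; /dev/sdb1 is an assumed example
mkfs.xfs /dev/sdb1
mkdir -p /data/gluster/vol1/brick1
mount /dev/sdb1 /data/gluster/vol1/brick1
# Use a subdirectory below the mount point as the actual brick, so that an
# unmounted filesystem shows up as a missing brick, not an empty one
mkdir /data/gluster/vol1/brick1/brick
echo '/dev/sdb1 /data/gluster/vol1/brick1 xfs defaults 0 0' >> /etc/fstab
```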
Sizing Up a Brick
• How big should a brick (partition) be?
• One brick using all space on a server is easy to create
– But harder to move or replace if needed
• Consider using bricks of manageable size e.g. 500GB, 1TB
– Will likely be easier to migrate/replace if needed
– Of course, if you have a lot of storage, a zillion bricks might
be difficult
• Keep more space free than is on any one server?
Notes:
• I think there are some subtleties here that aren’t quite so obvious
• And might be worth a thought or two before you commit yourself to a
storage layout that will be hard to change
Create a Volume
• Volume creation is straightforward:
    node1# gluster volume create vol1 replica 2 \
        node1:/data/glusterfs/disk1/vol1/brick1 \
        node2:/data/glusterfs/disk1/vol1/brick1 \
        node1:/data/glusterfs/disk2/vol1/brick2 \
        node2:/data/glusterfs/disk2/vol1/brick2
    node1# gluster volume start vol1
    node1# gluster volume info vol1
    node1# mount -t glusterfs localhost:/vol1 /mnt
    node1# showmount -e node2
• Replicas are across the first two bricks, and next two
• Name things sensibly now, save your brain later
Notes:
• Each brick will now have a .glusterfs directory
• Adding files or directories to the volume causes them to show up in the
bricks of one of the replicated pairs
• You can look, but do not touch
– Only change a volume through a mount
– Never by modifying a brick directly
• Likely best to stick with the built-in NFS server
• You can set options on a volume with
gluster volume set volname option value
• If you’re silly (like me) and have 32 bit NFS clients:
    gluster volume set volname nfs.enable-ino32 on
IP Addresses and CTDB
• CTDB is a clustered TDB database built for Samba
• Includes IP address failover
• Set up CTDB on each node – /etc/ctdb/nodes
• Manage public IPs – /etc/ctdb/public_addresses
• Needs a shared private directory for locks, etc.
• Starts/stops Samba
• Active/active with DNS round robin
Notes:
• Setup is fairly easy – follow these pages
http://www.gluster.org/community/documentation/index.php/CTDB
http://wiki.samba.org/index.php/CTDB_Setup
http://ctdb.samba.org/
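A concrete sketch of the two CTDB files mentioned above; all addresses and the interface name are made-up examples:

```shell
# /etc/ctdb/nodes -- the fixed (storage network) address of every node,
# identical file on all nodes
10.0.0.1
10.0.0.2

# /etc/ctdb/public_addresses -- floating service addresses that CTDB
# assigns to, and fails over between, the nodes
192.0.2.11/24 eth0
192.0.2.12/24 eth0
```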
Mounting on Clients
Native Mount or NFS?
• Many small files, mostly read?
– e.g. a web server?
– Use NFS client
• Write heavy load?
– Use native gluster client
• Client not Linux?
– Use NFS client
– Or CIFS if Windows client
Notes:
• http://www.gluster.org/documentation/Technical_FAQ/
Gluster Native Mount
• Install glusterfs-fuse or glusterfs-client:
    client# mount -t glusterfs ghost:/vol1 /mnt
• Use a public/floating IP/hostname for the mount
• Gluster client gets volume info
• Then uses the peer names used when adding bricks
– So a gluster client must have access to the storage network
• Client handles if nodes disappear
Notes:
• mount.glusterfs(8) does not mention all the mount options
• In particular, the option backupvolfile-server=node2 might be useful, if you don’t use public/floating IPs
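Combining the native mount with that option, a possible /etc/fstab entry (hostnames and mount point are assumptions):

```shell
# /etc/fstab -- native gluster mount; node2 is consulted for the volume
# info if node1 is unreachable at mount time
node1:/vol1  /mnt/vol1  glusterfs  defaults,_netdev,backupvolfile-server=node2  0 0
```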
NFS Mount
• Like any other NFS mount:
    client# mount glusterhost:/vol1 /mnt
• Use a public/floating IP/hostname for the mount
• NFS talks to that IP/hostname
– So an NFS client need not have access to the storage
network
• NFS must use TCP, not UDP
• Failover should be handled by CTDB IP switch
– But for a planned outage, you might move NFS clients and adjust mounts in advance
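Since the built-in gluster NFS server speaks NFSv3 over TCP, it can help to be explicit in the mount options; a sketch, with an assumed floating hostname:

```shell
# Force NFSv3 over TCP against a floating service address
client# mount -t nfs -o vers=3,proto=tcp glusterhost:/vol1 /mnt
```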
CIFS Mounts
• Similar to NFS mounts
– Use the public/floating IP’s hostname
• Need to configure Samba as appropriate on the servers:
    clustering = yes
    idmap backend = tdb2
    private dir = /gluster/shared/lock
• CTDB will start/stop Samba
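A minimal smb.conf sketch putting those settings together with a share; the share name and path are assumptions, not gluster requirements:

```shell
# /etc/samba/smb.conf (fragment)
[global]
    clustering = yes
    idmap backend = tdb2
    private dir = /gluster/shared/lock

[vol1]
    # export a glusterfs mount that exists on every server
    path = /mnt/vol1
    read only = no
```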
Managing, Monitoring, Fixing
Ongoing Management
• When all is going well, there’s not much to do
• Monitor filespace usage and other normal things
• Gluster monitoring – check for
– Processes running
– All bricks connected
– Free space
– Volume heal info
• Lots of logs in /var/log/glusterfs
• Note well: GlusterFS, like RAID, is not a backup
Notes:
• I use check_glusterfs by Mark Ruys, [email protected]
http://exchange.nagios.org/directory/Plugins/
System-Metrics/File-System/GlusterFS-checks/details
• I run it as root via SNMP
• Unsynced entries (from heal info) are normally 0, but when busy there
can be transitory unsynced entries
– My gluster volumes are not heavy write
– You may see more unsynced
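If you prefer to roll your own check, a crude sketch of the items listed above (the volume name is assumed, and gluster’s output format varies by version, so treat the parsing as illustrative rather than authoritative):

```shell
#!/bin/sh
# Crude gluster health check sketch for an assumed volume "vol1"
pgrep glusterd >/dev/null || echo "CRITICAL: glusterd not running"

# Look for bricks reported as not online
gluster volume status vol1 detail | grep -q '^Online.*N' \
    && echo "WARNING: a brick of vol1 is offline"

# Unsynced entries are often transitory under write load; alert only if persistent
gluster volume heal vol1 info | grep 'Number of entries:' | grep -qv ': 0' \
    && echo "WARNING: vol1 has unsynced entries"
```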
Command Line Stuff
• The gluster command is the primary tool:
    node1# gluster volume info vol1
    node1# gluster volume log rotate vol1
    node1# gluster volume status vol1
    node1# gluster volume heal vol1 info
    node1# gluster help
• The volume heal subcommands provide info on consistency
– And can trigger a heal action
Adding More Space
• Expanding the underlying filesystem provides more space
– But likely want to keep things consistent across servers
• And of course you can add bricks:
    node1# gluster volume add-brick vol1 \
        node1:/path/brick2 node2:/path/brick2
    node1# gluster volume rebalance vol1 start
• Note that you must add bricks in multiples of the replica count
– Each new pair is a replica pair, just like for create
• Increase replica count by setting new count and adding enough
bricks
Notes:
• If you have a replica with bricks of different sizes, you may be wasting
space
• You don’t have to add-brick on a particular node, any server that
knows about the volume should likely work fine
– I’m just a creature of habit
• But you can’t reduce the replica count . . .
– At least, I don’t think you can reduce the replica count
• A rebalance could be useful if file deletions have left bricks (sub-volumes)
unbalanced
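The "increase replica count" bullet above might look like this in practice, assuming a third server node3 with bricks matching the earlier create example:

```shell
# Go from replica 2 to replica 3: state the new count and add one new
# brick per existing sub-volume (node3 and its paths are assumptions)
node1# gluster peer probe node3
node1# gluster volume add-brick vol1 replica 3 \
    node3:/data/glusterfs/disk1/vol1/brick1 \
    node3:/data/glusterfs/disk2/vol1/brick2
node1# gluster volume heal vol1 full    # populate the new replicas
```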
Removing Space
• Remove bricks with start, status, commit:
    node1# gluster volume remove-brick vol1 \
        node1:/path/brick1 node2:/path/brick1 start
• Replace start with status for progress
• When complete, run commit
• For replicated volumes, you have to remove all the bricks of a
sub-volume at the same time
Notes:
• This of course is never needed, because space needs never decrease
Replacing or Moving a Brick
• Move a brick with replace-brick
    node1# gluster volume replace-brick vol1 \
        node1:/path/brick1 node2:/path/brick1 start
• Start, status, commit like remove-brick
• If you’re adding a third server to a pool with replicas
– Should be able to shuffle bricks to the desired result
– Or, if there’s extra space, add and remove bricks
• If a brick is dead, you may need commit force
– With RAID, this is less of a problem . . .
Notes:
• The Red Hat manual suggests that this is much more complicated
• This is a nice description of adding a third server
http://joejulian.name/blog/
how-to-expand-glusterfs-replicated-clusters-by-one-server/
Taking a Node Out of Service
• In theory it should be simple:
    node1# ctdb disable
    node1# service glusterd stop
• In practice, you might want to manually move NFS clients first
• Clients with native gluster mounts should be “just fine”
• On restart, volumes should “self-heal”
Notes:
• I’m paranoid about time for an NFS client to notice a new server
Split Brain Problems
• With multiple servers (more than 2), useful to set:
    node1# gluster volume set all \
        cluster.server-quorum-ratio 51%
    node1# gluster volume set VOLNAME \
        cluster.server-quorum-type server
• With two nodes, could add a 3rd “dummy” node with no storage
• If heal info reports unsync’d entries:
    node1# gluster volume heal VOLNAME
• Sometimes a client-side “stat” of affected file can fix things
– Or a copy and move back
Notes:
• Default quorum ratio is more than 50%
– Or so the docs seem to say
• The Red Hat Storage Administration Guide has a nice discussion
– And lots of details on recovery
• Fixing split brain:
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
• Remember: do not modify bricks directly!
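The client-side fix-ups mentioned above, sketched against a hypothetical affected file (always through a mount, never on a brick):

```shell
# From a client mount of the volume, not from a brick directory
stat /mnt/vol1/path/to/file      # a fresh lookup can trigger self-heal
# If that is not enough, copy out and move back through the mount
cp /mnt/vol1/path/to/file /tmp/f
mv /tmp/f /mnt/vol1/path/to/file
```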
Wrap Up
We Haven’t Talked About
• GlusterFS has many features and options
• Snapshots
• Geo-Replication
• Object storage – OpenStack Storage (Swift)
• Quotas
Notes:
• We’ve tried to hit the key areas to get started with Gluster
• We didn’t cover everything
• Hopefully you’ve learned some of the more interesting aspects
• And can apply them in your own implementations
Where to Get Gluster Help
• gluster.org web site has a lot of links
– Mailing lists, IRC, . . .
• Quick Start Guide
• Red Hat Storage documentation is pretty good
• HowTo page
• GlusterFS Administrator Guide
Notes:
• GlusterFS documentation is currently a bit disjointed
• http://www.gluster.org/
• http://www.gluster.org/documentation/quickstart/index.html
• Administrator Guide is currently a link to a github repository of markdown
files
• https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/
• http://www.gluster.org/documentation/howto/HowTo/
And Finally!
• Please take the time to fill out the tutorial evaluations
– The tutorial evaluations help USENIX offer the best possible
tutorial programs
– Comments, suggestions, criticisms gratefully accepted
– All evaluations are carefully reviewed, by USENIX and by the
presenter (me!)
• Feel free to contact me directly if you have any unanswered
questions, either now, or later: [email protected]
• Questions? Comments?
• Thank you for attending!
Notes:
• Thank you for taking this tutorial, and I hope that it was (and will be)
informative and useful for you.
• I would be very interested in your feedback, positive or negative, and suggestions for additional things to include in future versions of this tutorial, on the comment form, here at the conference, or later by email.