GlusterFS Architecture & Roadmap, Atin, Pune Meetup 2015
atin-mukherjee
07/02/2015 GlusterFS Meetup
Agenda
● Introduction to the Gluster community
● Current stable releases
● What is GlusterFS?
● Architecture
● GlusterFS 3.6 Features
● GlusterFS 3.7 Features planned
● GlusterFS 4.0 and beyond
● Q&A
Introduction to the Gluster community
● Different roles
● Users, testers, supporters, developers, editors, ...
● Different organizations
● Products based on / containing GlusterFS
● Service, consulting and support
● Integration in other (Open Source) projects
Introduction to the Gluster community
● Regular IRC meetings
● Discussions and support over mailing lists and on IRC
● Providing packages (RPMs, DEBs)
● Work with different Linux and BSD distributions to improve portability and availability
● Infrastructure hosting for Gluster related projects
● Gerrit and Jenkins for code review and testing
● Gluster Forge for git/wiki hosting of projects
Introduction to the Gluster community
● Some numbers from 2014
● Approx. 175 IRC participants
● The two main mailing lists reach ~600 emails/month
● About 100 active users and 60 active developers posting to the lists
● Around 2200 patches merged into the master branch
● Patches from ~90 developers were included
Current stable releases
● Maintenance of three minor releases
● 3.6, 3.5 and 3.4
● Bugfixes only, non-intrusive features on high demand
● Patches get backported to fix reported bugs
Integration with GlusterFS
● More projects built and enhanced around the GlusterFS ecosystem – dockit, gluster-deploy, gluster-nagios, glusterfsiostat, puppet-gluster to name a few.
● Improved integration with broader ecosystem projects like Ambari, NFS-Ganesha, OpenStack, oVirt and Samba.
What is GlusterFS?
● A general-purpose, scale-out distributed file system.
● Aggregates storage exports over a network interconnect to provide a single unified namespace.
● Filesystem is stackable and completely in userspace.
● Layered on disk file systems that support extended attributes.
Typical GlusterFS Deployment
● Global namespace
● Scale-out storage building blocks
● Supports thousands of clients
● Access using GlusterFS native, NFS, SMB and HTTP protocols
● Linear performance scaling
GlusterFS Architecture – Foundations
● Software only, runs on commodity hardware
● No external metadata servers
● Scale-out with Elasticity
● Extensible and modular
● Deployment agnostic
● Unified access
● Largely POSIX compliant
GlusterFS concepts – Trusted Storage Pool
● Trusted Storage Pool (cluster) is a collection of storage servers.
● Trusted Storage Pool is formed by invitation: an existing member "probes" a new member from inside the cluster, not vice versa.
● Logical partition for all data and management operations.
● Membership information used for determining quorum.
● Members can be dynamically added to and removed from the pool.
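The membership rules above can be sketched as a toy model. This is not glusterd code: the class name, method names and the "strictly more than half" quorum rule are assumptions made for illustration only.

```python
class TrustedStoragePool:
    """Toy model of pool membership (hypothetical, not the glusterd implementation)."""

    def __init__(self, first_node):
        self.members = {first_node}

    def probe(self, inviter, new_node):
        # The pool grows by invitation: only an existing member may probe a new node.
        if inviter not in self.members:
            raise PermissionError(f"{inviter} is not a pool member and cannot probe")
        self.members.add(new_node)

    def detach(self, node):
        # Members can be removed dynamically.
        self.members.discard(node)

    def has_quorum(self, reachable):
        # Assumption: quorum means strictly more than half of the members are reachable.
        alive = self.members & set(reachable)
        return len(alive) * 2 > len(self.members)


pool = TrustedStoragePool("node1")
pool.probe("node1", "node2")   # node1 invites node2
pool.probe("node2", "node3")   # any member may invite
print(sorted(pool.members))
print(pool.has_quorum({"node1", "node2"}))
print(pool.has_quorum({"node1"}))
```

On a real cluster the equivalent operations are performed with the gluster command line tool from a node that is already in the pool.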
GlusterFS concepts – Trusted Storage Pool
[Figure: Node1 probes Node2; once the probe is accepted, Node1 and Node2 are peers in a trusted storage pool.]
GlusterFS concepts – Trusted Storage Pool
[Figure: Node1, Node2 and Node3 form a trusted storage pool; a node leaves the pool through a detach operation.]
GlusterFS concepts - Bricks
● A brick is the combination of a node and an export directory, e.g. hostname:/dir
● Each brick inherits the limits of the underlying filesystem
● No limit on the number of bricks per node
● Ideally, each brick in a cluster should be of the same size
[Figure: three storage nodes exporting 3 bricks (/export1 to /export3), 5 bricks (/export1 to /export5) and 3 bricks (/export1 to /export3) respectively.]
GlusterFS concepts - Volumes
● A volume is a logical collection of bricks.
● Volume is identified by an administrator provided name.
● Volume is a mountable entity and the volume name is provided at the time of mounting.
– mount -t glusterfs server1:/<volname> /my/mnt/point
● Bricks from the same node can be part of different volumes
GlusterFS concepts - Volumes
[Figure: Node1, Node2 and Node3 each export /export/brick1 and /export/brick2; bricks across the nodes are grouped into two volumes, music and Videos.]
Volume Types
➢ Type of a volume is specified at the time of volume creation
➢ Volume type determines how and where data is placed
➢ Following volume types are supported in glusterfs:
a) Distribute
b) Stripe
c) Replication
d) Distributed Replicate
e) Striped Replicate
f) Distributed Striped Replicate
g) Dispersed
h) Distributed Dispersed
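How the volume type determines placement can be illustrated with a small sketch for three of the types. Everything here is a simplification: the function names are hypothetical, MD5 with a modulo stands in for the real DHT hash and its hash ranges, and consecutive bricks are assumed to form a replica set.

```python
import hashlib


def file_hash(name):
    # Stand-in hash: first 4 bytes of MD5 of the file name (the real DHT hash differs).
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")


def place_distribute(name, bricks):
    # Distribute: each file lives on exactly one brick, chosen by hash.
    return [bricks[file_hash(name) % len(bricks)]]


def place_replicate(name, bricks):
    # Replication: every brick holds a full copy of the file.
    return list(bricks)


def place_distributed_replicate(name, bricks, replica):
    # Group consecutive bricks into replica sets, then distribute files across the sets.
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    sets = [bricks[i:i + replica] for i in range(0, len(bricks), replica)]
    return sets[file_hash(name) % len(sets)]


bricks = ["node1:/export/brick1", "node2:/export/brick1",
          "node1:/export/brick2", "node2:/export/brick2"]
print(place_distribute("song.mp3", bricks))
print(place_replicate("song.mp3", bricks[:2]))
print(place_distributed_replicate("song.mp3", bricks, replica=2))
```

Listing the bricks so that each replica set spans two nodes keeps both copies of a file on different machines.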
GlusterFS 3.6
● Better SSL support
● Heterogeneous bricks
● Erasure coding
● Meta translator
● Volume snapshots and user-serviceability
Better SSL support
● SSL support for management plane
● SSL for authorizing and authenticating access to volumes.
● Paves way for fine-grained access to volumes in the storage pool*.
● Makes self-service style management at a volume-level possible*.
* - Not implemented yet; technically possible
Heterogeneous bricks
● Allows distribution of data to account for bricks of different sizes
● Uniform distribution can potentially penalise smaller bricks with more allocations
● Changes were made to the DHT (distribute) translator
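One way to picture size-aware distribution is to give each brick a slice of the hash space proportional to its capacity. The sketch below is an illustration under assumptions (a 32-bit hash space, half-open ranges, a hypothetical function name); it is not the DHT translator's code.

```python
def weighted_layout(brick_sizes, hash_space=2**32):
    """Split [0, hash_space) into half-open ranges proportional to brick size."""
    total = sum(brick_sizes.values())
    items = list(brick_sizes.items())
    layout, start = {}, 0
    for i, (brick, size) in enumerate(items):
        # The last brick takes the remainder so the ranges cover the whole space.
        if i == len(items) - 1:
            end = hash_space
        else:
            end = start + hash_space * size // total
        layout[brick] = (start, end)
        start = end
    return layout


# Hypothetical bricks: the second one is three times larger than the first.
layout = weighted_layout({"node1:/export1": 1000, "node2:/export1": 3000})
for brick, (lo, hi) in layout.items():
    print(brick, lo, hi, (hi - lo) / 2**32)
```

With a uniform split both bricks would receive half of the files, so the smaller brick would fill up first; the weighted split gives it a quarter of the hash space instead.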
Erasure Coding
● Provides resilience to brick failures using erasure codes
● Configurable redundancy and fault tolerance
● Reduces disk space consumption in comparison to replicated volumes
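The space saving follows from generic erasure coding arithmetic: with n bricks of which r hold redundancy, (n - r)/n of the raw capacity is usable, while a replicated volume with k copies leaves 1/k. The helper names below are hypothetical, and GlusterFS may restrict which combinations of n and r are allowed.

```python
def usable_fraction_dispersed(bricks, redundancy):
    # Generic erasure coding: data survives the loss of up to `redundancy` bricks.
    if not 0 < redundancy < bricks:
        raise ValueError("redundancy must be between 1 and bricks - 1")
    return (bricks - redundancy) / bricks


def usable_fraction_replicated(replica):
    # Every file is stored `replica` times; survives the loss of replica - 1 copies.
    if replica < 1:
        raise ValueError("replica count must be at least 1")
    return 1 / replica


# Both configurations tolerate two brick failures.
print(usable_fraction_dispersed(bricks=6, redundancy=2))
print(usable_fraction_replicated(replica=3))
```

For the same fault tolerance, the dispersed layout in this example keeps two thirds of the raw capacity usable, against one third for three-way replication.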
Meta xlator
● Provides a /proc like interface to GlusterFS runtime
● Allows users to inspect internals of translators present in GlusterFS runtime 'stack'.
● E.g. cat /mnt/glusterfs/.meta/version fetches the version of the glusterfs mount process
● tree /mnt/glusterfs/.meta/graphs/active
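Because the meta translator exposes its data as ordinary files, a script can read it like any other path. A minimal sketch, assuming a volume is mounted at /mnt/glusterfs as in the example above; the function name is hypothetical.

```python
from pathlib import Path


def read_meta_version(mount_point):
    # .meta/version is the path shown on the slide; it is read as plain text.
    return (Path(mount_point) / ".meta" / "version").read_text().strip()


try:
    print(read_meta_version("/mnt/glusterfs"))
except OSError as err:
    # No GlusterFS volume mounted at that path on this machine.
    print(f"could not read .meta/version: {err}")
```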
AFRv2
● Refactored AFR implementation
● Improved performance of the healing process
● Paves the way for better introspection into the healing process. More in 3.7
RDMA Support
● Minor fixes that made RDMA transport more usable for GlusterFS volumes
GlusterFS 3.7
● Small file performance
● Data classification
● Bitrot detection
● Better OpenStack integration, e.g. Manila
Small file performance
● Multithreaded epoll – Transport layer
● Caching stat and xattr calls on small files – Storage layer
● Migrate .glusterfs to SSDs – Physical layer
● Batching of RPCs per file access
Data Classification
● Mapping file characteristics to subvolume characteristics
● File characteristics – size, age, access rate, type (extension)
● Subvolume characteristics – physical location, storage type (SSD, disk), encoding method (deduplicated, erasure coded)
● User provided mappings via 'tags'
● Implementation using 'DHT over DHT' pattern
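The mapping from file characteristics to subvolume characteristics can be pictured as an ordered rule list. This is only a sketch of the idea: the rule format, thresholds and subvolume names are hypothetical and do not reflect the planned 3.7 implementation.

```python
def classify(file_info, rules, default):
    # Return the subvolume of the first rule whose predicate matches the file.
    for predicate, subvolume in rules:
        if predicate(file_info):
            return subvolume
    return default


# Hypothetical rules based on size, access rate and age.
rules = [
    (lambda f: f["size"] < 64 * 1024 and f["access_rate"] > 100, "ssd"),
    (lambda f: f["age_days"] > 365, "erasure-coded"),
]

hot_small = {"size": 4096, "access_rate": 500, "age_days": 1}
old_large = {"size": 10**9, "access_rate": 0, "age_days": 800}
ordinary = {"size": 10**6, "access_rate": 5, "age_days": 30}

print(classify(hot_small, rules, default="disk"))
print(classify(old_large, rules, default="disk"))
print(classify(ordinary, rules, default="disk"))
```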
BitRot detection
● Silent disk corruption
● Useful for archival or WORM workloads
● Lazy, policy-based and incremental checksum computation
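The detection idea can be sketched as two passes: sign each object once with a checksum, then periodically recompute and compare. The sketch below keeps objects and signatures in dictionaries and uses SHA-256; the function names and storage layout are assumptions, not the GlusterFS design.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).hexdigest()


def sign(store, signatures):
    # Incremental: only objects without a signature are checksummed.
    for name, data in store.items():
        if name not in signatures:
            signatures[name] = checksum(data)


def scrub(store, signatures):
    # Report objects whose current content no longer matches the stored signature.
    return [name for name, data in store.items()
            if name in signatures and checksum(data) != signatures[name]]


store = {"a.dat": b"hello", "b.dat": b"world"}
signatures = {}
sign(store, signatures)
print(scrub(store, signatures))   # nothing has changed yet

store["b.dat"] = b"w0rld"         # simulate silent corruption on disk
print(scrub(store, signatures))
```

This fits archival or WORM data well, because such files are not expected to change after being written, so any mismatch points to corruption rather than a legitimate update.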
Better OpenStack integration
● Manila – File share as a service
● Cinder – Block storage as a service
● Swift – Object storage as a service
● Sahara – Hadoop as a service
● For Kilo release
GlusterFS 4.0 Vision
● To be the best in class distributed commodity storage with unified access of data
GlusterFS 4.0 Vision
● Community scaling – design by community
● Node scaling
● Technology scaling
● Development process scaling
GlusterFS 4.0 Vision
● 'Thousand node glusterd'
● DHT scalability
● NSR – Log based, chain replication
● Better brick management
● Split Network
● ... and many more. See http://www.gluster.org/community/documentation/index.php/Planning40
Resources
Mailing lists: [email protected]@gluster.org
IRC: #gluster and #gluster-dev on freenode
Links: http://www.gluster.org, http://hekafs.org, http://forge.gluster.org, http://www.gluster.org/community/documentation/index.php/Arch