GlusterFS – a Scale-Out Data Platform
John Mark Walker
Gluster Community Guy
10/25/12
Topics
● What is GlusterFS
● Humble beginnings
● Evolution
● Community Process
● GlusterFS 3.3
● The Future is Gluster
Simple Economics
● Simplicity, scalability, less cost
Multi-Tenant, Virtualized, Automated, Commoditized
Scale on Demand, In the Cloud, Scale Out, Open Source
Simplicity Bias
● FC, FCoE, iSCSI → HTTP, Sockets
● Modified BSD OS → Linux / User Space / C, Python & Java
● Appliance based → Application based
Scale-out Open Source is the winner
Bengaluru Office
Conference Room
US Head Office
Bengaluru Office
Community Deployments
Not a Storage Company
● At first a cluster-building company
● Engineering team excelled at building open source HPC systems
Necessity: The Mother of Invention
The big idea: Storage should be simple
What is Simple Storage?
● Low-risk, easy to deploy and administer, data consistency, open source, software-only, user space
What is GlusterFS, Really?
Gluster is a unified, distributed storage system
● User space, global namespace, stackable, POSIX-y, scale-out NAS platform, inspired by GNU Hurd
Some Features
● No single point of failure
● DHT
● Synchronous and asynchronous replication
● Proactive self-healing
What Can You Do With It?
● Media – Docs, Photos, Video
● Shared storage – multi-tenant environments
● Big Data – Log Files, RFID Data
● Objects – Long Tail Data
Standard Deployment
● Distributed over multiple servers
● Replicate volumes
● On top of a disk FS with xattr support (e.g. XFS, ext4)
● Multi-protocol access
Storage for Any Environment
Scale-out NAS for On-premises and Public Clouds
● Standardized NAS infrastructure
● On-premises and public cloud
● POSIX-ish
● Apps move easily between environments
● Replicate between both
[Diagram labels: Public Cloud, On-premises]
First Versions
● Toolkit for building storage systems
● Very hacker-friendly
● Community integral part of development
  – Drove feature development
  – Repeatable use cases
Mid-2011 Snapshot
● Scale-out NAS
● Distributed and replicated
● NFS, CIFS and native GlusterFS
● User-space, stackable architecture
● Lots of users, not many devs
→ A good platform to build on
GlusterFS 3.3: Building on the Foundation
● Granular locking
● Proactive self-healing
● Improved rebalancing
● More access methods
Granular Locking
– Server fails, comes back
– Files evaluated
– Block-by-block until healed
[Diagram: Server 1 and Server 2, each running GlusterFS; Virtual Disks 1-1, 1-2, 2-1 and 2-2; blocks compared]
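The block-by-block idea on this slide can be sketched in a few lines of Python. This is only a toy model, assuming fixed-size blocks compared by checksum; the names `BLOCK_SIZE`, `split_blocks` and `heal` are hypothetical and are not GlusterFS code or APIs.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def heal(good, stale, size=BLOCK_SIZE):
    """Rebuild the stale copy from the good copy, one block at a time.

    Returns the healed bytes and the indices of the blocks that had to be
    copied. Blocks whose checksums already match are left alone.
    """
    good_blocks = split_blocks(good, size)
    stale_blocks = split_blocks(stale, size)
    healed = []
    copied = []
    for index, good_block in enumerate(good_blocks):
        stale_block = stale_blocks[index] if index < len(stale_blocks) else None
        same = (
            stale_block is not None
            and hashlib.sha256(stale_block).digest()
            == hashlib.sha256(good_block).digest()
        )
        if same:
            healed.append(stale_block)
        else:
            healed.append(good_block)  # only this block is transferred
            copied.append(index)
    return b"".join(healed), copied


# A virtual disk image where only the middle block diverged while a server was down.
healed_image, copied_blocks = heal(b"AAAABBBBCCCC", b"AAAAXXXXCCCC")
```

Working per block rather than per file means a large VM image does not need to be copied, or locked, as a whole while it is being healed.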
Proactive Self-healing
– Performed server-to-server
– Recovered node queries peers
[Diagram: Server 1 (good), Server 2 (recovered), Server 3 (good), Server 4 (good); labels: Replicated, Distributed, Self-healing; File 1, File 2, File 3; Hidden: Symlink 1, Symlink 2, Symlink 3]
Easier Rebalancing
– Now faster
  ● Previously, created entire new hash set, moving data unnecessarily
  ● Now recreates hash map and compares to old
– Easier to decommission server nodes
– Proof point for synchronous translator API
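A rough sketch of "recreate the hash map and compare to old", assuming a simplified layout in which each brick owns one contiguous range of a 32-bit hash space. The hash function, the layout format and the names `build_layout`, `lookup` and `plan_moves` are illustrative assumptions, not the actual GlusterFS DHT implementation.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumed 32-bit hash space for this toy model


def file_hash(name):
    """Map a file name to a position in the hash space (MD5 is a stand-in)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_layout(bricks):
    """Give each brick an equal, contiguous range: (start, end_exclusive, brick)."""
    span = HASH_SPACE // len(bricks)
    layout = []
    for i, brick in enumerate(bricks):
        start = i * span
        end = HASH_SPACE if i == len(bricks) - 1 else (i + 1) * span
        layout.append((start, end, brick))
    return layout


def lookup(layout, name):
    """Return the brick whose range contains the file's hash."""
    h = file_hash(name)
    for start, end, brick in layout:
        if start <= h < end:
            return brick
    raise ValueError("hash outside layout")


def plan_moves(names, old_layout, new_layout):
    """Compare old and new layouts; list only files whose brick changes."""
    moves = {}
    for name in names:
        src = lookup(old_layout, name)
        dst = lookup(new_layout, name)
        if src != dst:
            moves[name] = (src, dst)
    return moves


names = ["file%d" % i for i in range(100)]
old_layout = build_layout(["brick1", "brick2"])
new_layout = build_layout(["brick1", "brick2", "brick3"])
moves = plan_moves(names, old_layout, new_layout)
print("files to move:", len(moves), "of", len(names))
```

Files whose hash lands on the same brick in both layouts never appear in `moves`, which is the saving the slide describes.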
Unified File and Object (UFO)
– S3, Swift-style object storage
– Access via UFO or Gluster mount
[Diagram: Client → Proxy; HTTP Request with ID=/dir/sub/sub2/file; Account, Container, Object; Volume, Directory, File; Client with NFS or GlusterFS Mount]
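One way to picture the dual access path is as a pure path translation. The sketch below assumes, from the diagram labels, that an account corresponds to a volume, a container to a directory and an object to a file. The URL shape (`/v1/AUTH_<account>/...`), the mount root `/mnt/gluster` and the function name are hypothetical choices for illustration only.

```python
from pathlib import PurePosixPath


def object_to_file_path(url_path, mount_root="/mnt/gluster"):
    """Translate a Swift-style object path into a path under a file mount.

    Assumed mapping: account -> volume, container -> directory,
    object -> file (possibly nested in sub-directories).
    """
    parts = url_path.strip("/").split("/", 3)
    if len(parts) != 4:
        raise ValueError("expected /<version>/<account>/<container>/<object>")
    _version, account, container, obj = parts
    # Drop the assumed "AUTH_" prefix to get the volume name.
    volume = account[len("AUTH_"):] if account.startswith("AUTH_") else account
    return str(PurePosixPath(mount_root) / volume / container / obj)


path = object_to_file_path("/v1/AUTH_myvol/dir/sub/sub2/file")
print(path)  # /mnt/gluster/myvol/dir/sub/sub2/file
```

The same bytes are then reachable either through the HTTP request or through an NFS or GlusterFS mount of the volume.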
Unified File and Object (UFO)
– Your gateway to the cloud
– Your data, accessed your way
HDFS Compatibility
– Run MapReduce jobs on GlusterFS
– Add unstructured data to Hadoop
[Diagram: Hadoop Server; HDFS Connector (Jar file); GlusterFS; Local Disk]
4. Coming Attractions
API Check
● Ways to interface with GlusterFS
  – Translators
    ● Stackable, async and sync
  – FUSE mount
    ● GlusterFS client
  – libgfapi
    ● FUSE bypass
API Check
● Ways to interface with GlusterFS
  – Marker framework
    ● Geo-replication, quickly ID changes
  – UFO RESTful API
  – HDFS library
  – Management API
    ● oVirt 3.1
Better VM Image Handling
– Better responsiveness for random I/O use cases
– Contribution: Block Device Translator
Enabling GlusterFS for Virtualization use
● QEMU-GlusterFS integration
● Native integration, no FUSE mount
● Gluster as QEMU block back end
● QEMU talks to Gluster and Gluster hides different image formats and storage types underneath
● Block device support in GlusterFS via Block Device translator
● Logical volumes as VM images
GlusterFS & QEMU
Libglusterfs Client API
– Previously abandoned
– Brought back to life
  ● In part because of QEMU FUSE bypass contributions
Multi-Master Geo Rep
– Async rep previously only master-slave
– Multi-master gives admins greater flexibility
– Cascading, > 2-way
Split Brain
– Nodes cannot see each other, but can all still write
– Often due to network outages
– Sometimes results in conflicts
– Up to 3.2, GlusterFS had no concept of “quorum”
Quorum Enforcement
– Which node has valid data?
– If quorum, keep writing, else stop
  ● Configurable option
[Diagram: Server 1, Server 2 and Server 3 with a broken connection; the side without quorum stops writing, the servers with quorum keep writing]
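The "if quorum, keep writing, else stop" rule can be written down as a small decision function. This is a minimal sketch assuming a strict-majority definition of quorum; the function and parameter names are hypothetical, and the real GlusterFS option names and semantics are not shown on the slide.

```python
def has_quorum(reachable_replicas, total_replicas):
    """Strict majority: more than half of all replicas are reachable.

    reachable_replicas counts the local replica itself.
    """
    return reachable_replicas > total_replicas / 2


def may_write(reachable_replicas, total_replicas, enforce_quorum=True):
    """Decide whether a node should keep accepting writes.

    enforce_quorum models the "configurable option" bullet: when it is
    switched off, the node keeps writing regardless of what it can see.
    """
    if not enforce_quorum:
        return True
    return has_quorum(reachable_replicas, total_replicas)


# Three replicas, one cut off by a broken connection.
isolated_side = may_write(reachable_replicas=1, total_replicas=3)   # stops writing
majority_side = may_write(reachable_replicas=2, total_replicas=3)   # keeps writing
```

With only two replicas, a partition leaves each side with one of two, so neither side has a majority under this rule.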
Quorum Enforcement
– After connection restored, self-heal kicks off
[Diagram: Replica 1, Replica 2 and Replica 3, shown during the outage and after reconnection; the replica without quorum stopped writing, the replicas with quorum kept writing; self-heal follows]
Enhanced Quorum
– Quorum tracking on the servers
– Need quorum for any management changes
– 3rd party arbiters / observers so never N=2
Management UI & REST API
– Collaboration with oVirt project
– Management GUI for admins
– RESTful gateway for devs
– First community release... ?
Multi-tenancy & Encryption
– HekaFS created this for cloud deployments
– Being added to master branch
Down the Road
– Snapshots
– Versioning
– GeoRep Sparse Replicas
– File compression & de-dupe
Server-side Processing
– Implementing gfind, glocate
– Fast traversal of metadata in xattrs
  ● Find and locate responsive
– Inotify-esque behavior: triggers based on I/O activity, e.g. file close
  ● Why rely on Hadoop batch-processing?
Goals
● Gluster is the standard platform for big data and cloud computing
● Integration with every major big data, cloud and storage technology
● Signifies distributed data workloads
● Encompasses storage and big data spheres
● In 2012, GlusterFS will be the “Foundation for Big Data”
Goal: Intelligent Storage
● Just storing and retrieving data is not enough
● Should be able to store, analyze, transform, mutilate, and retrieve
● Intelligent storage gives sysadmins and developers the ultimate data swiss army knife