TECHNICAL PRESENTATION
Uzoma Nwosu, Solution Architect, Red Hat
September 25th, 2014
Agenda
● Introduction
● Basics: Data Placement
● Basics: Data Accessibility
● Basics: Deployment
● Advanced Features: Data Protection
● Advanced Features: Management and Monitoring
● Developers: APIs and Automation
● What about Ceph?
INTRODUCTION
Open source, software-defined storage for unstructured file data at petabyte scale
[Figure: example workloads such as media and video, machine/log data, geospatial data, and research documents]
Red Hat Storage server advantage
OPEN: Open, software-defined distributed file and object storage system
● Based on the GlusterFS open source community project
● Uses a proven local file system (XFS)
● Data is stored in native format

SCALABLE: No metadata server
● Uses an elastic hashing algorithm for data placement
● Uses the local file system's extended attributes to store metadata (see the sketch after this list)
● Nothing-shared scale-out architecture

MODULAR: No kernel dependencies
● GlusterFS is based on File System in Userspace (FUSE)
● Modular, stackable architecture allows easy addition of features without being tied to any kernel version

ACCESSIBLE: Multi-protocol access to the same data
● Global namespace
● NFS, SMB, object, HDFS, Gluster native protocol
● POSIX compliant

Data protection and availability:
● Synchronous replication with self-healing for server failure
● Asynchronous geo-replication for site failure
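As a small illustration of the extended-attribute point above, the sketch below lists the Gluster-related xattrs on a file as it is stored on a brick. This is not from the presentation; the brick path is hypothetical, and reading the trusted.* namespace normally requires root.

```python
import os

# Hypothetical path of a file as it appears on a brick's local XFS file system.
brick_file = "/export1/brick/somefile.txt"

# Gluster keeps its metadata (for example the file's GFID and layout
# information) in extended attributes of the underlying local file system.
for name in os.listxattr(brick_file):
    print(name, os.getxattr(brick_file, name))
```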
BASICS: DATA PLACEMENT
Elastic hash algorithm
● No central metadata
  – No performance bottleneck
  – Eliminates risk scenarios
● Location hashed on the file name
  – Unique identifiers, similar to md5sum
● The elastic part
  – Files assigned to virtual volumes
  – Virtual volumes assigned to multiple bricks
  – Volumes easily reassigned on the fly (see the placement sketch below)
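To make the idea concrete, here is a conceptual sketch of name-hash-based placement. It is an illustration only, not Gluster's actual DHT code, and the brick names are hypothetical.

```python
import hashlib

# Hypothetical bricks; in GlusterFS each brick is host:/directory.
bricks = ["server1:/export1", "server2:/export1", "server3:/export1"]

def brick_for(filename, bricks):
    """Pick a brick from a hash of the file name (conceptual only).

    GlusterFS actually hashes the name into a fixed hash space and assigns
    hash *ranges* to bricks via per-directory layout xattrs, so ranges can
    be reassigned elastically; the modulo below is just an illustration.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return bricks[int(digest, 16) % len(bricks)]

for name in ["report.pdf", "video.mp4", "log-2014-09-25.txt"]:
    print(name, "->", brick_for(name, bricks))
```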
Red Hat Storage concepts

VOLUME: A namespace presented as a POSIX mount point, comprised of bricks.
BRICK: The basic unit of storage, represented by an export directory on a server.
SERVER/NODES: Contain the bricks.
Bricks

[Diagram: three storage nodes, each exporting bricks such as /export1 through /export5]

● A brick is the combination of a node and a file system: hostname:/dir
● Each brick inherits the limits of the underlying file system (XFS).
● Red Hat Storage Server operates at the brick level, not at the node level.
● Ideally, each brick in a cluster should be the same size.
Volumes

[Diagram: the same three storage nodes; volume "scratchspace" is a 6-brick distribute volume exported as /scratchspace, and volume "homeshares" is a 6-brick replica volume exported as /homeshare]

● A volume is some number of bricks (two or more) clustered and exported with Gluster (see the creation sketch after this list).
● Volumes have administrator-assigned names (= export names).
● A brick is a member of only one volume.
● A global namespace can have a mix of replicated and distributed volumes.
● Data in different volumes physically exists on different bricks.
● Volumes can be sub-mounted on clients using NFS, CIFS, and/or GlusterFS clients.
● The directory structure of the volume exists on every brick in the volume.
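As a rough illustration of how the two example volumes above might be created, the sketch below drives the gluster CLI from Python. Node names and brick paths are hypothetical, and the exact syntax should be checked against the Red Hat Storage documentation for your release.

```python
import subprocess

def gluster(*args):
    """Run a gluster CLI command and raise if it fails."""
    subprocess.run(["gluster", *args], check=True)

# A 6-brick distribute volume (node names and brick paths are hypothetical).
gluster("volume", "create", "scratchspace",
        "node1:/export1/brick", "node1:/export2/brick",
        "node2:/export1/brick", "node2:/export2/brick",
        "node3:/export1/brick", "node3:/export2/brick")

# A 6-brick replica volume: with "replica 2", consecutive bricks form
# mirrored pairs, giving a distributed-replicated (3x2) layout.
gluster("volume", "create", "homeshares", "replica", "2",
        "node1:/export3/brick", "node2:/export3/brick",
        "node1:/export4/brick", "node2:/export4/brick",
        "node3:/export3/brick", "node3:/export4/brick")

gluster("volume", "start", "scratchspace")
gluster("volume", "start", "homeshares")
```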
Data placement strategies
GlusterFS volume types and their characteristics:

Distributed
● Distributes files across bricks in the volume
● Used where scaling and redundancy requirements are not important, or are provided by other hardware or software layers

Replicated
● Replicates files across bricks in the volume
● Used in environments where high availability and high reliability are critical

Distributed replicated
● Offers improved read performance in most environments
● Used in environments where high reliability and scalability are critical
Default data placement (distributed volume): uniformly distributes files across all bricks in the namespace.
[Diagram: a mount point over a distributed volume with bricks server1:/exp1 and server2:/exp1; FILE 1, FILE 2, and FILE 3 are spread across the two bricks. A second panel shows a replicated volume over the same two bricks.]
Fault-tolerant data placement (distributed replicated volume): creates a fault-tolerant distributed volume by mirroring the same file across two bricks.
[Diagram: a mount point over a distributed volume composed of Replicated Volume 0 (brick exp1 on server1 and brick exp2 on server2) and Replicated Volume 1 (brick exp3 on server3 and brick exp4 on server4); FILE 1 and FILE 2 are each mirrored within one replica pair]
BASICS: DATA ACCESSIBILITY
Multi-protocol access
Primarily accessed as scale-out file storage, with optional Swift object APIs.
[Diagram: clients access hundreds of server nodes, on-premise or in the cloud, through the file API (native high-performance client, NFS, CIFS/Samba, Hadoop HDFS*, C library gfapi) and the object API (S3/Swift REST)]
GlusterFS native client (FUSE)
● Based on the FUSE kernel module, which allows the file system to operate entirely in user space

● The mount can be pointed at any GlusterFS server (see the mount sketch after this list)

● The native client fetches the volfile from the mount server, then communicates directly with all nodes to access data

● Recommended for high concurrency and high write performance

● Load is inherently balanced across distributed volumes
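A minimal native-client mount sketch (hypothetical server, volume, and mount point; requires root and the glusterfs-fuse client package):

```python
import subprocess

# Mount the volume "homeshares" from any server in the trusted pool.
# The server named here is only used to fetch the volfile; after that
# the client talks to all bricks directly.
subprocess.run(
    ["mount", "-t", "glusterfs", "node1:/homeshares", "/mnt/homeshares"],
    check=True,
)
```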
GlusterFS native client data flow: clients talk directly to the data bricks based on the elastic hash.
[Diagram: an application server running the GlusterFS client talks directly to the glusterfsd brick server processes on the RHS servers; bricks are arranged as mirrored pairs]
NFS
Accessibility from UNIX and Linux systems
● Standard NFSv3 clients connect to the GlusterFS NFS server process (user space) on a storage node

● Mount the GlusterFS volume from any storage node (see the mount sketch after this list)
● GlusterFS NFS server includes network lock manager (NLM) to synchronize locks across clients
● Better performance for reading many small files from a single client
● Load balancing must be managed externally
● Standard automounter is supported
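A corresponding NFSv3 mount sketch (server, volume, and mount point are hypothetical; vers=3 over TCP matches what the Gluster NFS server speaks):

```python
import subprocess

# Mount the same volume over NFSv3 from one storage node. Unlike the
# native client, all traffic for this mount flows through node1, so load
# balancing across nodes must be handled externally (e.g. round-robin DNS).
subprocess.run(
    ["mount", "-t", "nfs", "-o", "vers=3,tcp",
     "node1:/homeshares", "/mnt/homeshares-nfs"],
    check=True,
)
```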
SMB/CIFS
Accessibility from Windows systems
● Storage node uses Samba with winbind to connect with Active Directory environments
● Samba uses Libgfapi library to communicate directly with GlusterFS server process without going through FUSE
● SMB clients can connect to any storage node running Samba
● SMB version 2.0 supported
● Load balancing must be managed externally
● CTDB is required for Samba clustering
NFS, CIFS & object data flow
Clients first talk to the mounted storage node and are then directed to the data bricks. A non-native protocol adds an additional network hop.
[Diagram: an application server running an NFS, SMB, or HTTP client reaches a distributed-replicated volume through an IP load-balancing and high-availability layer (CTDB, RRDNS, ucarp, HAProxy, or hardware such as F5 or ACE); each RHS server exposes NFS, SMB, and Swift front ends over its glusterfsd brick server processes]
Object access of GlusterFS volume
● Built upon OpenStack’s Swift object storage
● GlusterFS is the back-end file system for Swift
● Implements objects as files and directories under the container
● Accounts are implemented as GlusterFS volumes
● Store and retrieve files using the REST interface (see the sketch after this list)

● Supports integration with the SWAuth and Keystone authentication services
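To make the REST interface concrete, here is a sketch that stores and retrieves an object with Python's requests library, using the Swift v1.0-style auth handshake. The endpoint, account/user, key, container, and object names are all hypothetical.

```python
import requests

BASE = "http://gluster-node:8080"

# 1. Authenticate (SWAuth/tempauth-style v1.0): returns a token and storage URL.
resp = requests.get(
    BASE + "/auth/v1.0",
    headers={"X-Auth-User": "myvolume:admin", "X-Auth-Key": "secret"},
)
resp.raise_for_status()
auth = {"X-Auth-Token": resp.headers["X-Auth-Token"]}
storage_url = resp.headers["X-Storage-Url"]  # e.g. .../v1/AUTH_myvolume

# 2. Create a container (a directory in the GlusterFS volume).
requests.put(storage_url + "/photos", headers=auth).raise_for_status()

# 3. Upload an object (a file under that directory).
requests.put(storage_url + "/photos/hello.txt",
             headers=auth, data=b"example object contents").raise_for_status()

# 4. Download it again.
obj = requests.get(storage_url + "/photos/hello.txt", headers=auth)
obj.raise_for_status()
print(obj.content)
```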
Object store architecture
[Diagram: end users reach a load balancer in front of nodes 1 through n; each node runs the Swift proxy, account, container, and object servers plus memcached and glusterfsd, with a Red Hat Storage volume serving as the Swift account]
BASICS: DEPLOYMENT
Red Hat Storage Server deployment in AWS
Scale-out performance, capacity, and availability

● Red Hat Storage Server Amazon Machine Images (AMIs)
● Red Hat Storage Server provides the only solution to achieve high availability of Elastic Block Storage (EBS)
● Multiple EBS devices pooled
● POSIX compatible (no application rewrite required to run on Amazon EC2)
● Scale-out performance and capacity as needed

[Diagram: a distribute-replicate GlusterFS volume (primary) in the primary data center (east region), with bricks replicated across Availability Zones A and B for failover; geo-replication feeds a secondary distribute-replicate volume in the secondary data center (west region); each RHSS AMI pools EBS devices in RAID 0]
Red Hat Storage Server deployment on physical machines
Scale out performance, capacity, and availability

● Deploys on Red Hat-supported servers and underlying storage: DAS, JBOD.
● Scales out linearly.
● Replicates synchronously and asynchronously.

[Diagram: a single global namespace spanning many on-premise Red Hat Storage servers (CPU/MEM); capacity scales up by adding disks (1TB drives shown) and scales out by adding servers]
ADVANCED FEATURES: DATA PROTECTION
Server-side quorum scenario
In a storage pool with four nodes (A, B, C, and D) in a 2x2 distributed-replicated configuration, A and B form one replica pair and C and D form the other. The quorum ratio is set to the default value of > 50%.

Node A dies, and a write destined for the A ↔ B pair arrives.
● The write happens on B.
● When A comes back online, self-heal kicks in to fix the discrepancy.
● No change in this behavior with or without quorum enabled.

Node A dies, and a write destined for the C ↔ D pair arrives.
● The write happens on C and D.
● No change in this behavior with or without quorum enabled.

Both A and B die, and a write destined for the A ↔ B pair arrives.
● Quorum enabled, quorum ratio not met: all the bricks on A, B, C, and D go down.
● Quorum not enabled: the write fails, and the bricks on C and D stay alive.

Both A and B die, and a write destined for the C ↔ D pair arrives.
● Quorum enabled, quorum ratio not met: all the bricks on A, B, C, and D go down.
● Quorum not enabled: the write to C and D succeeds.
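The behavior above is controlled per volume and per pool. A minimal configuration sketch, assuming the cluster.server-quorum-type and cluster.server-quorum-ratio volume options and a hypothetical volume name:

```python
import subprocess

def gluster(*args):
    subprocess.run(["gluster", *args], check=True)

# Enable server-side quorum for one volume; glusterd on each node then
# stops its bricks when it cannot see quorum of the trusted storage pool.
gluster("volume", "set", "homeshares", "cluster.server-quorum-type", "server")

# The quorum ratio is a pool-wide setting (default > 50%).
gluster("volume", "set", "all", "cluster.server-quorum-ratio", "51%")
```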
Avoiding split brain: Client-side quorum

● Client-side quorum is implemented to minimize split brain during data-modify operations (write/create).

● The client-side quorum configuration determines the number of bricks that must be online for data modification to be allowed.

● If client-side quorum is not met, files in that replica group become read-only.

● The client-side quorum configuration applies to all the replica groups in the volume.

● Enforcement is at the volume level.

● The volume settings are cluster.quorum-type and cluster.quorum-count (see the sketch after this list).

cluster.quorum-type = none (cluster.quorum-count: not applicable)
● Quorum is not in effect.

cluster.quorum-type = auto (cluster.quorum-count: not applicable)
● Writes to a file are allowed only if more than 50% of the bricks that constitute that replica are active.
● Exception: for replica count = 2, the first brick in the pair must be online to allow writes.

cluster.quorum-type = fixed (cluster.quorum-count: 1 through replica-count)
● The minimum number of bricks that must be active in a replica set to allow writes.
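As a plain illustration (not Gluster code) of the rule the settings above encode, the sketch below decides whether a single replica group accepts writes; the corresponding volume options are set with gluster volume set, as named in the list above.

```python
def writes_allowed(active_bricks, replica_count, quorum_type,
                   quorum_count=None, first_brick_up=True):
    """Return True if client-side quorum allows writes for one replica group."""
    if quorum_type == "none":
        return True
    if quorum_type == "auto":
        # More than 50% of the replica's bricks must be active;
        # for replica 2 the first brick of the pair must also be up.
        if replica_count == 2:
            return active_bricks >= 1 and first_brick_up
        return active_bricks > replica_count / 2
    if quorum_type == "fixed":
        return active_bricks >= quorum_count
    raise ValueError("unknown quorum type")

# Example: a replica-3 group with one brick down and quorum-type auto is writable.
print(writes_allowed(active_bricks=2, replica_count=3, quorum_type="auto"))
```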
Multi-site content distribution: Geo-replication

● Asynchronous across LAN, WAN, or Internet
● Master-slave model, cascading possible
● Continuous and incremental
● One way (see the setup sketch below)

[Diagram: site A replicating to site B, and a cascading topology fanning out from site A through intermediate sites to sites D, E, F, and G]
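A minimal setup sketch for such a session, assuming the gluster volume geo-replication CLI; the master volume, slave host, and slave volume names are hypothetical, and prerequisites such as passwordless SSH are covered in the administration guide.

```python
import subprocess

def gluster(*args):
    subprocess.run(["gluster", *args], check=True)

# Create and start an asynchronous, one-way session from the local (master)
# volume "homeshares" to volume "homeshares-dr" on the remote site.
gluster("volume", "geo-replication", "homeshares",
        "drnode1::homeshares-dr", "create", "push-pem")
gluster("volume", "geo-replication", "homeshares",
        "drnode1::homeshares-dr", "start")

# Monitor the session.
gluster("volume", "geo-replication", "homeshares",
        "drnode1::homeshares-dr", "status")
```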
Geo-replication features

● Performance
  – Parallel transfers
  – Efficient source scanning
  – Pipelined and batched
  – File type/layout agnostic
● Checkpoints
● Failover and failback
Red Hat Storage Server snapshots

● Snapshot: a point-in-time state of the storage system/data (can be used later as a recovery point when needed)
● Volume level; ability to create, list, restore, and delete
● LVM2 based; operates only on thin-provisioned volumes
● Crash consistent
● Supports a maximum of 256 snapshots per volume
● A snapshot can be taken on one volume at a time
● Snapshot names must be cluster-wide unique
● Managed via the CLI (see the sketch after this list)
● User-serviceable snapshots: end users can recover one or more files without admin involvement (tech preview)
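A minimal CLI sketch for the snapshot lifecycle described above; the volume and snapshot names are hypothetical, and the volume's bricks must sit on thin-provisioned LVM as noted.

```python
import subprocess

def gluster(*args):
    subprocess.run(["gluster", *args], check=True)

# Point-in-time snapshot of one volume (LVM thin-provisioned bricks required).
gluster("snapshot", "create", "homeshares-snap1", "homeshares")

# List and inspect snapshots.
gluster("snapshot", "list")
gluster("snapshot", "info", "homeshares-snap1")

# Remove the snapshot when it is no longer needed...
gluster("snapshot", "delete", "homeshares-snap1")
# ...or roll the volume back to it instead:
# gluster("snapshot", "restore", "homeshares-snap1")
```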
Red Hat Storage Server snapshot
[Diagram: a Red Hat Storage Server volume with bricks Brick 1 and Brick 2 on each storage node, and the corresponding snapshot volume built from snapshot bricks Brick 1_s1 and Brick 2_s1]
ADVANCED FEATURES: MANAGEMENT AND MONITORING
Disk utilization and capacity management: Quota

● Red Hat Storage Server quota allows you to control disk utilization at both the directory and volume level (see the sketch after this list).

● Two levels of quota limits: soft and hard

● Warning messages are issued on reaching the soft quota limit

● Writes fail with an EDQUOT error after the hard limit is reached

● Hard and soft quota timeouts

● The default soft limit (80%) is an attribute of the volume, specified as a percentage.
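A minimal quota sketch against a hypothetical volume and directory; the soft-timeout and hard-timeout subcommands are assumptions based on the quota CLI and may vary by release.

```python
import subprocess

def gluster(*args):
    subprocess.run(["gluster", *args], check=True)

# Enable quota on the volume, then cap a directory at 100 GB with an
# 80% soft limit (the default); exceeding the hard limit returns EDQUOT.
gluster("volume", "quota", "homeshares", "enable")
gluster("volume", "quota", "homeshares", "limit-usage", "/projects", "100GB", "80%")

# How often usage is re-checked: a relaxed interval below the soft limit
# and a tighter one once the soft limit has been crossed.
gluster("volume", "quota", "homeshares", "soft-timeout", "60")
gluster("volume", "quota", "homeshares", "hard-timeout", "5")

# Report current usage against the configured limits.
gluster("volume", "quota", "homeshares", "list")
```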
Directory quotas architecture
[Diagram: the quota daemon requests usage information from the volume's bricks (Brick 0 through Brick n); an aggregator collects per-brick accounting and an enforcer applies the limits; volumes V0, V1, and V2 are shown]
Understanding quota timeout thresholds
[Diagram: a volume (or directory) with a soft limit and a hard limit; the soft timeout is in effect while usage is below the soft limit, and the hard timeout takes over once usage crosses the soft limit]
Functional architecture of Red Hat Storage Server monitoring
[Diagram: storage administrators and 3rd-party tools send queries and commands (HTTP requests, REST API calls, SNMP traps) to the Red Hat Storage Server Console and the Nagios server for RHSS; Nagios active checks run against the storage nodes, and the results feed system trends]
Simplified and unified storage management
Single view for converged storage and compute

MANAGEMENT TOOLS & FRAMEWORK

Storage operations (Red Hat Storage Server Console)
● Intuitive user interface
● Volume management
● On-premise and public cloud

Virtualization and storage (RHEV Manager)
● Shared management with Red Hat Enterprise Virtualization Manager

Provisioning and life cycle management (Red Hat Network Satellite)
● Installation and configuration
● Update management
● Lifecycle management
● Familiar Red Hat Enterprise Linux tools

Note: Add Entitlement feature for CDN
Other management capabilities

● Non-disruptive upgrade from Red Hat Storage Server 2.1 to 3.0

● Entitlement management via Red Hat's Content Delivery Network (CDN)

● Catalog/ID-based logging
DEVELOPERS: APIs and AUTOMATION
Libgfapi native application interface

● User-space library for accessing data in GlusterFS

● Filesystem-like API

● Runs in your application process

● No FUSE, no copies, no context switches, but the same volfiles, translators, etc.

● Already in use by various access methods such as the QEMU storage layer, the Samba VFS plugin, and NFS-Ganesha

● C and Python bindings available, with more languages to come (see the sketch after this list)
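A minimal sketch using the Python bindings mentioned above, assuming the libgfapi-python package and its gfapi.Volume class; exact class and method names may differ between releases, and the host and volume names are hypothetical.

```python
# Assumes the libgfapi-python bindings (github.com/gluster/libgfapi-python);
# API names follow that project and may differ between releases.
from gluster import gfapi

vol = gfapi.Volume("node1", "homeshares")  # volfile server + volume name
vol.mount()                                # initialise the in-process client

# File-system-like calls run inside this process: no FUSE, no extra copies.
vol.mkdir("reports", 0o755)
with vol.fopen("reports/summary.txt", "w") as f:
    f.write("hello from libgfapi\n")

with vol.fopen("reports/summary.txt", "r") as f:
    print(f.read())

vol.umount()
```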
What about Ceph?

● Announcements coming next month.
THANK YOU