High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability...
Transcript of High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability...
![Page 1: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/1.jpg)
High Availability StorageHigh Availability Extensions
Goldwyn RodriguesHigh Availability Storage Engineer
SUSE
![Page 2: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/2.jpg)
2
High Availability Extensions
• Highly available services for mission critical systems
• Integrated suite over robust open-source technologies
• Business Continuity
• Protect Data Integrity
• Reduce Unplanned Downtime
• Commodity Hardware for high availability
![Page 3: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/3.jpg)
3
Cluster
• A set of computers interacting with each other
• One goes down, another picks up its responsibility
• Should be available as much as possible
• Cluster Types:‒ Active/Active vs Active/Passive (N+1, N+M)
‒ Physical vs Virtual vs Hybrid
‒ Local Clusters vs Metro vs Geo Clusters
![Page 4: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/4.jpg)
4
Why Cluster
• Increased Availability
• Improved Performance
• Low cost of operation
• Scalability
• Disaster Recovery
• Data Protection
• Server Consolidation
• Storage Consolidation
![Page 5: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/5.jpg)
5
CAP TheoremBrewers Theorem
Con
sist
ency Availability
Partition Tolerance
A guarantee that every request receives a response about whether it was successful or failed
The system continues to operate despite arbitrary message loss or failure of part of the system
All nodes see the same data at the same time
![Page 6: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/6.jpg)
6
Fault Tolerance vs High Availability
• Specialized hardware to detect a hardware fault and switch to redundant hardware
• Expensive redundant and replicated components
• A set of computers system-wide, shared resources that cooperate to guarantee essential services
• Software and hardware to quickly restore services
![Page 7: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/7.jpg)
7
Cluster
DB Server
NFS Server
Mail ServerApplication
Server
Web Server
![Page 8: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/8.jpg)
8
A Dysfunctional Cluster
SP
L IT B
RA
IN
![Page 9: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/9.jpg)
9
STONITHShoot the Other Node In the Head
RESET
![Page 10: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/10.jpg)
10
Quorum
![Page 11: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/11.jpg)
11
Quorum Policies
• Ignore‒ Continue cluster operations as usual
• Freeze‒ Resource management continues
‒ New resources are not started
• Stop‒ All resources affected partition are stopped
• Suicide‒ Fence all nodes in affected partition
![Page 12: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/12.jpg)
12
Resource Agents
• Open Cluster Framework (OCF)
• Manage resources‒ Web Server
‒ IP Address
‒ Shared Filesystem
• Resource Operations‒ Id
‒ Name
‒ Interval
‒ timeout
![Page 13: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/13.jpg)
13
Configuration Tools
• Cluster Resource Manager (CRM)‒ Powerful Command line tool
• YasT‒ Basic Cluster setup
‒ DRBD
‒ IP Load balancing
• High Availability Web Konsole (Hawk)‒ Web based
![Page 14: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/14.jpg)
14
Architecture
![Page 15: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/15.jpg)
15
Resource Agent Constraints
1. IPAddr
2. Web Server
Database
Loca
tion
Colocation
Order
![Page 16: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/16.jpg)
16
Shared Storage
What's the problem here?
![Page 17: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/17.jpg)
17
Shared Storage
• A common view for all nodes
• Node Failure: One node can pick up from where the other left
• Local filesystems don't work‒ Data corruptions because of writing files other nodes access
‒ Node Cache Inconsistencies
![Page 18: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/18.jpg)
18
DLMDistributed Lock Manager
• Provides a Cluster-wide locking for data access
• Different Level for wide variety of uses
• No centralized-control‒ Easy for take over
‒ The node accessing the object first gets to create the distributed lock object
• Lock Value Block (LVB) for data synchronization
![Page 19: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/19.jpg)
19
DLMDistributed Lock Manager
Mode NL CR CW PR PW EX
NL Yes Yes Yes Yes Yes Yes
CR Yes Yes Yes Yes Yes No
CW Yes Yes Yes No No No
PR Yes Yes No Yes No No
PW Yes Yes No No No No
EX Yes No No No No No
![Page 20: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/20.jpg)
20
cLVMClustered Logical Volume Manager
• Logical Volume Manager for the cluster
• Add or remove devices as storage needs change
• Linear‒ Add storage as required
‒ Simple addition of devices in a linear form
• Mirrored‒ Redundancy of devices over the cluster
![Page 21: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/21.jpg)
21
DRBDDistributed Replicated Block Device
DRBD DRBD
Primary Node Secondary
Node
Disk 1
Disk 2
![Page 22: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/22.jpg)
22
DRBDDistributed Replicated Block Device
• Shared Storage without a SAN‒ Governed by network speeds
• RAID1 over network block device and local device
• Fully synchronous, memory synchronous or asynchronous modes of operation
• Dual Primary for clustered filesytems
![Page 23: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/23.jpg)
23
Shared Filesystemocfs2
• Simultaneously mounted on all nodes‒ All nodes should have access to the data
• Cluster Filesystem with a good throughput‒ Should not be bogged down with multiple access
‒ Force cache flush on other nodes using filesystem when current node access the file
![Page 24: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/24.jpg)
24
OCFS2 Features
• B-tree Extent based
• Inline data
• Indexed Directories
• Metadata ECC
• Refcount
• Extended Attributes‒ ACL
• Quota
Read man mkfs.ocfs2 for more information..
![Page 25: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/25.jpg)
25
Usage Scenarios
• Database Server Applications‒ Common database repository
‒ CTDB (Samba)
• Web Servers‒ WWW root
• Highly Available Virtual Machines‒ VM Disks
![Page 26: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/26.jpg)
26
Generic Clustering Tips
• Always use STONITH‒ Recommend multiple STONITH devices
‒ SBD is an alternative
• Time synchronization
• Read the logs when things are not working‒ Record Time of event
• Redundant Communications‒ Network Device bonding
‒ Redundant Ring Protocol
![Page 27: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/27.jpg)
27
OCFS2 Tips
• Prefer hardware based RAID (mirroring)
• If you don't want a feature don't enable it‒ Quota, ACL, Inline directories use additional data on disk and
additional lookups
‒ You can always enable it later using tunefs.ocfs2
• Inline directories for large number of files
• MetaECC for better data protection‒ Protects filesystem from getting corrupted further
‒ Immediately run fsck.ocfs2
![Page 28: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/28.jpg)
Thank you.
28
SUSE® Linux Enterprise High Availability Extensionhttps://www.suse.com/products/highavailability/
![Page 29: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/29.jpg)
29
![Page 30: High Availability Storage - SUSECONHigh Availability Storage Engineer SUSE 2 High Availability Extensions • Highly available services for mission critical systems • Integrated](https://reader035.fdocuments.us/reader035/viewer/2022063023/5ffada62844e743a8713f2ab/html5/thumbnails/30.jpg)
Unpublished Work of SUSE LLC. All Rights Reserved.This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General DisclaimerThis document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.