Stacking up with OpenStack: building for High Availability
Posted 19-Oct-2014 | Category: Technology
Transcript of Stacking up with OpenStack: building for High Availability
Stacking up with OpenStack: Building for High Availability
Utpal Thakrar, Sr. Product Manager, April 17, 2013
# 2
Cloud Management
#rightscale
My relationship with HA 1975
# 3
My relationship with HA 1991
# 4
My relationship with HA 2001
How many 9s can your product do?
# 5
So what did they mean by five 9s?
| Availability | Allowed downtime per year |
|---|---|
| 99% | 3.65 days |
| 99.9% | 8.76 hours |
| 99.99% | 52.56 minutes |
| 99.999% | 5.26 minutes |
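Every row of the table comes from one line of arithmetic: allowed downtime = (1 - availability) × minutes per year. A minimal Python sketch that reproduces the table, assuming a 365-day year as the table itself does:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years, like the table


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Yearly downtime budget, in minutes, for an availability given in percent."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}%: {minutes:.2f} min = {minutes / 60:.2f} h = {minutes / 1440:.2f} days")
```

Each extra 9 divides the budget by ten, which is why the jump from 99.99% to 99.999% is so much harder to deliver than it looks.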
# 6
Stuff happens: are you prepared?
# 7
Who dunnit?…
# 8
And you see these …
# 9
Is 100% Outage-proofing possible?
# 10
Old School Fault-Tolerance: Build Two
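"Build two" works because the service is down only when both copies are down at once. A minimal Python sketch of the arithmetic, assuming failures are independent and failover is perfect; correlated failures (shared power, the same zone, the same cloud) break that assumption, which is what the zone and cloud slides later address:

```python
def parallel_availability(*components: float) -> float:
    """Redundant copies: the service survives while at least one copy is up.

    Assumes independent failures, so P(all down) is the product of each P(down).
    """
    all_down = 1.0
    for a in components:
        all_down *= 1 - a
    return 1 - all_down


def serial_availability(*components: float) -> float:
    """A chain where every component must be up (e.g. load balancer -> app -> DB)."""
    up = 1.0
    for a in components:
        up *= a
    return up


# Two independent 99%-available servers behind a failover mechanism: ~99.99%.
print(f"{parallel_availability(0.99, 0.99):.4%}")
# The same two servers in series (both required) drop to ~98.01%.
print(f"{serial_availability(0.99, 0.99):.4%}")
```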
# 11
Golden Age of Cloud Computing
• No Up-Front Capital Expense
• Pay Only for What You Use
• Self-Service Infrastructure
• Easily Scale Up and Down
• Improve Agility & Time-to-Market
• Low Cost
# 12
Golden Age for Fault-Tolerance
• No Up-Front HA Capital Expense
• Pay for DR Only When You Use It
• Self-Service DR Infrastructure
• Easily Deliver Fault-Tolerant Applications
• Improve Agility & Time-to-Recovery
• Low Cost Backups
The benefits translate!
# 13
Yeah, but …
What about my private cloud?
Applications deployed in private clouds have to worry about:
• Private cloud infrastructure being HA
• Application architecture HA / DR
• With public clouds, you get what your provider gives you
# 14
Private Cloud Infrastructure HA
Several single points of failure in an OpenStack deployment:
• OpenStack API services
• MySQL
• RabbitMQ
Solved in various ways:
• Pacemaker cluster management
• Keepalived (e.g., RAX Private Cloud)
• MySQL (Galera), RabbitMQ (active-active mirrored queues)
Eliminate SPoFs as best you can.
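Keepalived removes the API-endpoint SPoF by floating a virtual IP between nodes using VRRP. The toy Python model below shows only the election rule, as a sketch and not keepalived's code: the highest-priority healthy node owns the VIP, and ties go to the higher IP address as in VRRP. The node names, addresses, and priorities are hypothetical:

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional


@dataclass
class Node:
    name: str
    ip: str
    priority: int  # configured VRRP priority; higher wins
    healthy: bool  # outcome of a health check, e.g. "does the API answer?"


def elect_vip_holder(nodes: List[Node]) -> Optional[Node]:
    """Return the node that should hold the virtual IP, or None if none is healthy.

    Highest priority wins; ties go to the higher IP address, as in VRRP.
    Dropping unhealthy nodes outright is a simplification: keepalived's
    tracking scripts can adjust a node's priority instead.
    """
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda n: (n.priority, IPv4Address(n.ip)))


# Hypothetical two-controller setup fronting the OpenStack API services.
controllers = [
    Node("ctl-1", "10.0.0.11", priority=150, healthy=True),
    Node("ctl-2", "10.0.0.12", priority=100, healthy=True),
]
print(elect_vip_holder(controllers).name)  # ctl-1
controllers[0].healthy = False  # ctl-1 fails its health check
print(elect_vip_holder(controllers).name)  # ctl-2
```

Clients keep talking to the same VIP throughout; only the node answering on it changes.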
# 15
What about my app? Design for failure:
• If your application relies on the cloud infrastructure SLA for its HA needs, you are STUCK with that vendor / infrastructure
• Balance cost and complexity against risk tolerance
• Design the application for failure at every level:
  • Build for server failure
  • Build for zone failure
  • Build for cloud failure
  • Keep the management layer separate from the infrastructure
# 16
Build for Server Failure
• Set up auto-scaling
• Set up database mirroring (master/slave configuration)
• Use static public IPs
• Use dynamic DNS for private IPs
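The slide does not say which scaling policy to use. One common choice is target tracking: resize the server array so average CPU lands near a target, using the same ceil(current × metric / target) rule as Kubernetes' Horizontal Pod Autoscaler. A minimal Python sketch; the 60% target and the 2 to 10 server bounds are hypothetical:

```python
import math


def desired_servers(current: int, avg_cpu: float, target_cpu: float = 60.0,
                    min_servers: int = 2, max_servers: int = 10) -> int:
    """Target-tracking scale decision, clamped to [min_servers, max_servers].

    min_servers defaults to 2 so a single server failure never leaves the
    tier empty while a replacement launches.
    """
    if current <= 0:
        return min_servers
    wanted = math.ceil(current * avg_cpu / target_cpu)
    return max(min_servers, min(max_servers, wanted))


print(desired_servers(current=4, avg_cpu=90.0))  # 6: scale out
print(desired_servers(current=4, avg_cpu=15.0))  # 2: scale in, but not below the floor
```

The floor matters as much as the formula here: auto-scaling replaces a failed server, but only a second running server covers the gap while it boots.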
# 17
Build for Zone Failure
A creative deployment model: make your private cloud an “AZ” (availability zone) by placing it in close physical proximity to a public cloud provider.
[Diagram: Zone 1 and Zone 2 behind DNS, each with load balancers and auto-scaled app servers on static public IPs; a master DB replicating to a slave DB in another zone; snapshots kept in object and block storage.]
• Snapshot the data volume for backups so the database can be readily recovered within the region.
• Place slave databases in one or more zones for failover.
• Where possible, use a NoSQL database such as Cassandra or MongoDB.
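Zone failure is survivable only if instances are actually spread across zones. A minimal Python sketch of round-robin placement, plus a check of how many servers survive when one zone is lost; the zone names are hypothetical:

```python
from collections import Counter
from typing import List


def spread_across_zones(count: int, zones: List[str]) -> List[str]:
    """Assign `count` servers to zones round-robin; zone sizes differ by at most one."""
    if not zones:
        raise ValueError("need at least one zone")
    return [zones[i % len(zones)] for i in range(count)]


def survivors_after_zone_loss(placement: List[str], lost_zone: str) -> int:
    """How many servers keep running if every server in `lost_zone` goes down."""
    return sum(1 for zone in placement if zone != lost_zone)


placement = spread_across_zones(5, ["zone-1", "zone-2"])
print(Counter(placement))  # Counter({'zone-1': 3, 'zone-2': 2})
print(survivors_after_zone_loss(placement, "zone-1"))  # 2
```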
# 18
Build for Cloud Failure (Cold DR)
[Diagram: a primary private cloud and a DR site in DALLAS; labels include load balancers, app servers, master and slave DBs with replication, DNS, snapshots, block storage, and Cloud Files.]
• Staged server configuration and generally no staged data
• Not recommended if rapid recovery is required
• Slow to replicate data to the other cloud and bring the database online
Cost: $
# 19
Build for Cloud Failure (Warm DR)
[Diagram: a primary private cloud and a DR site in DALLAS; labels include load balancers, app servers, master and slave DBs with replication to a running slave at the DR site, DNS, snapshots, block storage, and Cloud Files.]
• Staged server configuration, pre-staged data, and a running slave database server
• Generally recommended DR solution
• Minimal additional cost; allows fairly rapid recovery
Cost: $$
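Warm DR only pays off if the running slave is actually current when disaster strikes. A minimal Python sketch of a check that could run from monitoring, assuming the input is one row of MySQL's `SHOW SLAVE STATUS` read into a dict; the 300-second recovery point objective (RPO) is a hypothetical value:

```python
from typing import Any, Mapping


def slave_within_rpo(status: Mapping[str, Any], rpo_seconds: int = 300) -> bool:
    """Is this warm-DR slave replicating and close enough to the master to promote?

    `status` is one row of SHOW SLAVE STATUS as a dict (for example from a
    dictionary cursor). Seconds_Behind_Master can be NULL (None) when
    replication is stopped, which is treated as unhealthy.
    """
    if status.get("Slave_IO_Running") != "Yes" or status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False
    return int(lag) <= rpo_seconds


# Hypothetical status row from the DR-site slave.
row = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 12}
print(slave_within_rpo(row))  # True
```

Alerting on a False result turns "we have a warm standby" into something that is verified continuously rather than discovered during an outage.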
# 20
Build for Cloud Failure (Hot DR)
[Diagram: a primary private cloud and a fully running DR site in DALLAS; labels include load balancers, app servers at both sites, master and slave DBs with replication, DNS, snapshots, block storage, and Cloud Files.]
• Parallel deployment with all servers running but all traffic going to the primary
• Not recommended
• Very high additional cost to allow rapid recovery
Cost: $$$
# 21
Availability vs. Cost Dial
[Figure: two dials, Availability and Cost, each running from Min to Max.]
# 22
Make sure workload is portable across clouds
# 23
Automate and test everything
• Automate backups of your data
• Set up monitoring and alerts
• Run fire drills! Plan and practice your recovery procedures!
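Automated backups also need automated pruning, or snapshot storage grows without bound. A common scheme (not named on the slide) is grandfather-father-son retention. A minimal Python sketch that decides which dated snapshots to keep; the 7/4/12 retention counts are hypothetical defaults:

```python
from datetime import date, timedelta
from typing import Callable, Hashable, Iterable, Set


def snapshots_to_keep(snapshot_dates: Iterable[date], daily: int = 7,
                      weekly: int = 4, monthly: int = 12) -> Set[date]:
    """Grandfather-father-son retention.

    Keeps the newest snapshot in each of the `daily` most recent days, the
    `weekly` most recent ISO weeks and the `monthly` most recent calendar
    months that have snapshots. Everything else is a deletion candidate.
    """
    newest_first = sorted(set(snapshot_dates), reverse=True)
    keep: Set[date] = set()

    def keep_newest_per(bucket_of: Callable[[date], Hashable], limit: int) -> None:
        seen = set()
        for d in newest_first:
            bucket = bucket_of(d)
            if bucket in seen:
                continue  # a newer snapshot already represents this bucket
            if len(seen) == limit:
                break  # buckets only get older from here
            seen.add(bucket)
            keep.add(d)

    keep_newest_per(lambda d: d, daily)
    keep_newest_per(lambda d: tuple(d.isocalendar())[:2], weekly)  # (ISO year, week)
    keep_newest_per(lambda d: (d.year, d.month), monthly)
    return keep


# Hypothetical history: one snapshot per day for the 90 days up to the talk's date.
today = date(2013, 4, 17)
history = [today - timedelta(days=i) for i in range(90)]
kept = snapshots_to_keep(history)
print(f"keep {len(kept)} of {len(history)} snapshots")  # keep 11 of 90 snapshots
```

A fire drill should restore from one of the *old* kept snapshots too, not just last night's.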
# 24
Separate Management layer from Infrastructure
• Keep the keys to the car outside the car
# 25
Automating HA and DR
• Use dynamic DNS for your database servers
  • Allow app servers to use a single FQDN
  • Use a low TTL to allow rapid failover when the master database changes
• Automatic connection of app servers to load-balancing servers
  • App servers connect to all load balancers automatically at launch
  • No manual intervention
  • No DNS modifications
• Automated promotion of slave to master
  • The process is automated
  • The decision to run the process is manual
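The "automated process, manual decision" split can be made explicit in code: the script picks the best slave and prepares the promotion, but acts only after an operator confirms. A minimal Python sketch, assuming each slave reports `Relay_Master_Log_File` and `Exec_Master_Log_Pos` from `SHOW SLAVE STATUS` and all slaves replicate from the same master; the host names are hypothetical and the real promotion steps are left as a comment:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    host: str
    reachable: bool
    master_log_file: str  # Relay_Master_Log_File, e.g. "mysql-bin.000042"
    exec_pos: int  # Exec_Master_Log_Pos


def promotion_candidate(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the reachable slave that has executed the most of the old master's binlog."""
    live = [r for r in replicas if r.reachable]
    if not live:
        return None
    # Binlog names end in a zero-padded sequence number, so (file, pos) sorts correctly.
    return max(live, key=lambda r: (r.master_log_file, r.exec_pos))


def promote(replicas: List[Replica], operator_confirmed: bool) -> str:
    """Automated process, manual decision: nothing changes without confirmation."""
    candidate = promotion_candidate(replicas)
    if candidate is None:
        return "no reachable slave to promote"
    if not operator_confirmed:
        return f"would promote {candidate.host}; waiting for operator confirmation"
    # Real steps (stop replication, make the server writable, repoint the
    # database FQDN via dynamic DNS) would run here.
    return f"promoting {candidate.host}"


slaves = [
    Replica("db-zone2", True, "mysql-bin.000042", 9800),
    Replica("db-dallas", True, "mysql-bin.000042", 1200),
]
print(promote(slaves, operator_confirmed=False))
# would promote db-zone2; waiting for operator confirmation
```

Once the DNS record is repointed, app servers reconnect within roughly one TTL, which is why the slide recommends a low TTL on the database FQDN.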
# 26
Samsung SDS, Mr. Kirk Kim
Copyright © 2013 Samsung SDS Co., Ltd. All rights reserved
# 27
Hybrid Cloud Network Architecture
[Diagram: the SPCS private network connected to a public cloud VPC; components include firewall/IPS, VPN gateway, CF router (public ASN: XXXX), virtual gateway, Internet gateway, VMs with private 10.x.x.x/24 and public *.*.*.0/24 addresses, EIPs e.x.y.a and e.x.y.b, compute, and object storage.]
Traffic paths shown:
• Between SPCS and the public cloud using public IPs
• Between SPCS and the public cloud using private IPs
• Internet traffic to SPCS and the public cloud using public IPs
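Routing private-IP traffic between SPCS and the public cloud VPC generally requires that the private ranges on each side do not overlap. A minimal Python sketch of a pre-deployment check using the standard `ipaddress` module; the slide elides the real 10.x.x.x/24 ranges, so the subnet plan below is hypothetical:

```python
from ipaddress import ip_network
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_subnets(subnets: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named subnets whose address ranges overlap."""
    nets = {name: ip_network(cidr) for name, cidr in subnets.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


# Hypothetical address plan for the hybrid deployment.
plan = {
    "spcs-private": "10.10.1.0/24",
    "vpc-private": "10.20.1.0/24",
    "vpc-private-2": "10.10.1.128/25",
}
print(overlapping_subnets(plan))  # [('spcs-private', 'vpc-private-2')]
```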
# 28
How RightScale makes it possible: RightScale ServerTemplates™
• Reproducible: predictable deployment
• Dynamic: configuration from scripts at boot time
• Multi-cloud: cloud-agnostic and portable
• Modular: role and behavior abstracted from cloud infrastructure
# 29
How RightScale makes it possible: MultiCloud Images
• MultiCloud Images can be launched across regions and clouds without modification
The diagram shows three steps:
1. A ServerTemplate contains a list of MultiCloud Images (MCIs), e.g. "Cloud A, B, Image 1", "Cloud A, C, Image 2", "Cloud B, Image 1".
2. When the server is created, a specific MCI is chosen (e.g. "Cloud A, B, Image 1" for a server on Cloud B).
3. The appropriate RightImage (Image 1) is used at launch.
RightImage: stability across clouds
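The selection step can be pictured as a lookup from the target cloud to an image. The Python below is a toy model only, not RightScale's API: the `MultiCloudImage` class, the first-match rule, and the cloud and image names are all hypothetical, loosely following the slide's "Cloud A, B, Image 1" example:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MultiCloudImage:
    name: str
    images_by_cloud: Dict[str, str]  # cloud -> image to launch there


def choose_image(mcis: List[MultiCloudImage], cloud: str) -> Optional[Tuple[str, str]]:
    """Return (MCI name, image) from the first MCI in the list that supports `cloud`."""
    for mci in mcis:
        if cloud in mci.images_by_cloud:
            return mci.name, mci.images_by_cloud[cloud]
    return None


# Hypothetical MCI list attached to a ServerTemplate.
template_mcis = [
    MultiCloudImage("MCI 1", {"Cloud A": "Image 1", "Cloud B": "Image 1"}),
    MultiCloudImage("MCI 2", {"Cloud A": "Image 2", "Cloud C": "Image 2"}),
]
print(choose_image(template_mcis, "Cloud B"))  # ('MCI 1', 'Image 1')
print(choose_image(template_mcis, "Cloud C"))  # ('MCI 2', 'Image 2')
```

The point of the indirection is that the template itself never names a cloud-specific image, so the same template can relaunch in a DR cloud.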
# 30
Outage-Proofing Best Practices
Three areas: Resource Placement, Application Design, Replication and Failover
• Place load balancers, app servers, and databases in more than one zone
• Maintain capacity to absorb zone or region failures
• Replicate data across zones
• Design stateless apps for resilience to reboot / relaunch
• Back up across regions & clouds
• Monitor, alert, and automate operations to speed up failover
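"Maintain capacity to absorb zone failures" has a concrete cost: if one of Z zones can vanish, the remaining Z - 1 zones must carry the whole peak. A minimal Python sketch, assuming stateless, interchangeable servers, even load balancing, and a hypothetical peak of 12 servers:

```python
import math


def per_zone_capacity(peak_servers: int, zones: int, zones_lost: int = 1) -> int:
    """Servers each zone needs so the surviving zones can still carry the peak."""
    surviving = zones - zones_lost
    if surviving < 1:
        raise ValueError("cannot absorb the loss of every zone")
    return math.ceil(peak_servers / surviving)


# Hypothetical workload that needs 12 servers at peak.
for zones in (2, 3, 4):
    each = per_zone_capacity(12, zones)
    print(f"{zones} zones: {each} per zone, {each * zones} total")
```

This prints 24, 18, and 16 total servers for 2, 3, and 4 zones: spreading across more zones shrinks the spare capacity needed, one concrete setting of the availability vs. cost dial.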
# 31
Thank you!
Sign up for a free account at: www.rightscale.com
Check out job postings at: www.rightscale.com/jobs
We are hiring!