OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"
-
Upload
cloudscaling-inc -
Category
Documents
-
view
4.650 -
download
3
description
Transcript of OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"
![Page 1: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/1.jpg)
CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution** All unlicensed or borrowed works retain their original licenses
Redundancy Doesn't Always Mean "HA" or "Cluster"A cautionary tale against using hammers to solve all redundancy and resiliency problems ...
1
Randy Bias@randybiasCTO, Cloudscaling
Dan Sneddon@dxsSr. Engineer, Cloudscaling
OpenStack Design Summit – Oct 2012
Thursday, October 18, 12
![Page 2: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/2.jpg)
1. “HA” pairs are not the only type of redundancy
2. Alternative redundancy patterns for HA
3. Redundancy patterns in Open Cloud System*
Our Journey Today
2
* Cloudscaling’s OpenStack-powered cloud operating system (“distribution”)
Thursday, October 18, 12
![Page 3: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/3.jpg)
We mean what most people mean ...What Do We Mean By “HA”?
3
Two servers or network devices that look like one
Thursday, October 18, 12
![Page 4: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/4.jpg)
HA pairs come in a couple flavors“HA HA”?
4
Active / Passive
Thursday, October 18, 12
![Page 5: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/5.jpg)
People like this flavor best, but it’s not always possible...
“HA HA”?
5
Active / Active
Thursday, October 18, 12
![Page 6: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/6.jpg)
Many people wish they could get it more like this ...“HA HA HA HA HA”??
6
HA cluster aka ‘massive operational nightmare’
Thursday, October 18, 12
![Page 7: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/7.jpg)
Imagine this was 4 or 6 nodes in the clusterCluster<bleep>!
7
• 4 network tech.
• 7 NICs / node
• A million different ways to break
Thursday, October 18, 12
![Page 8: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/8.jpg)
Herein lies the problem ...“HA” Pairs Are One Type of Redundancy
8
Thursday, October 18, 12
![Page 9: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/9.jpg)
There are many, but these two matter most ...
• Catastrophic failures
• No scale out
The Problem With “HA”-mmers
9
Thursday, October 18, 12
![Page 10: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/10.jpg)
Either working or dead, nothing in-betweenHA Pairs Have Binary Failures
10
Thursday, October 18, 12
![Page 11: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/11.jpg)
What is Scale-out?
11
A B
A B
A B C D N
Scale-up - Make boxes bigger (usually an HA pair)
Scale-out - Make moar boxes
Thursday, October 18, 12
![Page 12: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/12.jpg)
Scaling out is a mindset
12
bowzer.company.com web001.company.com
Servers *are* cattle
Scaling up is like treating your servers as pets
Thursday, October 18, 12
![Page 13: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/13.jpg)
Hardware rarely fails, operators fail, software failsHA Pair Failures* - 100% down
13
Who Type Year Why Duration
Apple Switch 2005 Bug 2 hrs
Flexiscale SAN 2007 Ops Err 24 hrs
Vendio NAS 2008 Ops Err 8 hrs
UOL Brazil SAN 2011 Bug 72 hrs
Twitter Datacenter 2012 Bug+Ops 2 hrs
* This is a handful of examples as a baseline; I’m sure you can find many more
Thursday, October 18, 12
![Page 14: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/14.jpg)
They better not fail ...“HA” Pairs Are an All-in Move
14
Thursday, October 18, 12
![Page 15: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/15.jpg)
Many small failure domains is usually betterRisk Reduction
15
Thursday, October 18, 12
![Page 16: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/16.jpg)
Would you rather have the whole cloud down or just a small bit for a short period of time?
Big failure domains vs. small
16
Still a scale-up pattern ... wouldn’t you rather scale-out?
Thursday, October 18, 12
![Page 17: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/17.jpg)
No scale-outPair vs. Scale-out Load Balancing
17
State Sync
(100% loss)
Shared-nothing Architecture
(20% loss)
Thursday, October 18, 12
![Page 18: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/18.jpg)
No scale-outPair vs. Scale-out Load Balancing
17
State Sync
(100% loss)
Shared-nothing Architecture
(20% loss)
Thursday, October 18, 12
![Page 19: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/19.jpg)
Everything ...What’s Usually an “HA” Pair in OpenStack?
18
Service Endpoints(APIs)
Messaging System(RPC)
Worker Threads(e.g. Scheduler,
Networking)
Database(MySQL)
Thursday, October 18, 12
![Page 20: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/20.jpg)
Not much needs state synchronizationWhat needs to be an HA pair?
19
Service Endpoints(APIs)
Messaging System(RPC)
Worker Threads(e.g. Scheduler,
Networking)
Database(MySQL)
Thursday, October 18, 12
![Page 21: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/21.jpg)
Fault Tolerance Methodologies
20
Thursday, October 18, 12
![Page 22: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/22.jpg)
Fault Tolerance in OCS
21
Thursday, October 18, 12
![Page 23: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/23.jpg)
Service Distribution
22
Resilient Stateless Scale-out
High Availability Without Compromise
Thursday, October 18, 12
![Page 24: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/24.jpg)
Combines Standard Networking TechnologiesService Distribution
23
OSPF
Anycast
Load-BalancingProxy
/etc/quagga/ospfd.confrouter ospf ospf router-id 10.1.1.1 network 10.1.255.1 area 0.0.0.0
/etc/quagga/zebra.confinterface lo:2 description Pound listening address ip address 10.1.255.1/32
/etc/pound/pound.conf
ListenHTTP Address 10.1.255.1 Port 8774 xHTTP 1 Service BackEnd Address 10.1.1.1 Port 8774 End BackEnd Address 10.1.1.2 Port 8774 End EndEnd
Thursday, October 18, 12
![Page 25: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/25.jpg)
Horizontally Scalable, No Single Point Of FailureResilient OpenStack
Service Endpoints(APIs)
Messaging System(RPC)
Worker Threads(e.g. Scheduler,
Networking)
Database(MySQL)
Service Distribution ZeroMQ
MMR + HAService Distribution
Thursday, October 18, 12
![Page 26: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/26.jpg)
What Makes This a Superior Solution?Service Distribution Advantages
25
• True horizontal scalability with no centralized controller
• Services are always running, failover is nearly instant
• Reduced complexity, fewer idle resources
• No need for separate load balancers
Server Server Server Server Server Server Server ...Failover Distributed Servicesvs.
Thursday, October 18, 12
![Page 27: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/27.jpg)
Service Distribution Works With Multiple Sites
Perfect For Site Resiliency
26
• Traditional HA pairs do not support cross-site resiliency
• Service Distribution fail across sites without DNS redirections
Thursday, October 18, 12
![Page 28: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/28.jpg)
27
Example: Distributed Load Balancing
HTTP Proxy
V
HTTP Proxy
OSPF Router(s)
OSPF1)
OSPFadvertisement
OSPFadvertisement
Quagga Quagga
Service Distribution in Action
Thursday, October 18, 12
![Page 29: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/29.jpg)
28
Example: Distributed Load BalancingService Distribution in Action
HTTP Proxy
V
HTTP Proxy
OSPF Router(s)
OSPF1)
OSPFadvertisement
OSPFadvertisement
Quagga Quagga
Per-FlowLoad
Balancing
ECMP Per-flowLoad Balancing
2)
HTTP ProxyHTTP Proxy
2)
Load-balancingHTTP Proxy
3)
1)
Thursday, October 18, 12
![Page 30: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/30.jpg)
29
Example: Distributed Load BalancingService Distribution in Action
HTTP Proxy
V
HTTP Proxy
OSPF Router(s)
OSPF1)
OSPFadvertisement
OSPFadvertisement
Quagga Quagga
Per-FlowLoad
Balancing
ECMP Per-flowLoad Balancing
2)
HTTP ProxyHTTP Proxy
2)
Load-balancingHTTP Proxy
3)
1)
HTTP ProxyHTTP Proxy
Unlimited #of Back-EndServers
4)
Server ServerServer Server
Thursday, October 18, 12
![Page 31: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/31.jpg)
30
10% LoadEach
Server
Load Balancer/Proxy
Load Balancer/Proxy
Load Balancer/Proxy
Load Balancer/Proxy
Server Server Server Server
Client
1 2 3 4
1 2 3 4
Failure Resiliency
Load Balancer/Proxy
Server
Server Server Server Server Server
Client Client Client
Thursday, October 18, 12
![Page 32: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/32.jpg)
31
10% LoadEach
Server
Load Balancer/Proxy
Load Balancer/Proxy
Load Balancer/Proxy
Load Balancer/Proxy
Server Server Server Server
Client
1 2 3 4
1 2 3 4
Failure Resiliency
Load Balancer/Proxy
Server
Server Server Server Server Server
Client Client Client
X1
Thursday, October 18, 12
![Page 33: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/33.jpg)
32
Load Balancer/Proxy
Load Balancer/Proxy
Load Balancer/Proxy
Load Balancer/Proxy
Server Server Server Server
Client
1 2 3 4
1 2 3 4
Failure Resiliency
Load Balancer/Proxy
Server
Server Server Server Server Server
Client Client Client
X 10% Increased
Server Load
Thursday, October 18, 12
![Page 34: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/34.jpg)
Example: Scale-out Network Address TranslationOCS NAT Service
33
NAT
VMs
BGP Multiple ISPproviders
ServiceDistribution
Thursday, October 18, 12
![Page 35: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/35.jpg)
Avoiding RabbitMQ’s Single Point Of FailureBrokerless Messaging With ZeroMQ
34
Nova-Compute
Nova-Scheduler Nova-API
RabbitMQBroker
RabbitMQ(Brokered)
Single Point Of Failure
Thursday, October 18, 12
![Page 36: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/36.jpg)
Avoiding RabbitMQ’s Single Point Of FailureBrokerless Messaging With ZeroMQ
35
Nova-Compute
Nova-Scheduler Nova-API
RabbitMQBroker
RabbitMQ(Brokered)
Single Point Of Failure
Nova-Compute
Nova-Scheduler Nova-API
vs. ZeroMQ(Peer To Peer)
Thursday, October 18, 12
![Page 37: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/37.jpg)
What did we learn today?
36
1. HA-mmers are for nails
2. Scale-out rules for redundancy
3. Design-for-failure is a mentality, not a pair
4. Resiliency over redundancy
Thursday, October 18, 12
![Page 38: OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"](https://reader033.fdocuments.us/reader033/viewer/2022051818/54b822df4a7959ad3f8b45c7/html5/thumbnails/38.jpg)
Q&A
37
Randy Bias@randybiasCTO, Cloudscaling
Dan Sneddon@dxsSr. Engineer, Cloudscaling
OCS 2.0Public Cloud Benefits | Private Cloud Control | Open Cloud Economics
Thursday, October 18, 12