Pacemaker: OpenStack's PID 1

David Vossel <[email protected]>

OpenStack Summit, May 21, 2015

Story Time

The Future of HA

And how Pacemaker Saved the Day

There once was a database

Not like the other databases

A distributed, self-replicating database

[Diagram: Active/Active Replicated Database with four DB nodes]

Everyone: HOORAY!

Everyone: Load balance?

[Diagram: Clients and the four DB nodes, with nothing but question marks between them]

HAProxy enters the scene.

[Diagram: Clients reach the Active/Active Replicated Database (four DB nodes) through a single proxy]

HAProxy: No problem.

Everyone: Hooray!

Everyone: ummmmm

Everyone: What if a node dies?

HAProxy: No problem.

Everyone: ummmmm

Everyone: But... what if the proxy dies?

HAProxy: ...

Everyone: What?????

Everyone: Uhhh???!!!!??

Everyone: Hello????

Everyone: Anyone?

Everyone: I knew this cloud thing wouldn't work...

Keepalived: Wait! Guys... I've got an idea!!!

Keepalived enters the scene.

[Diagram: keepalived; Active/Active Replicated Database with four DB nodes; a proxy and a VIP on each node; Clients]

Everyone: Whoa, proxy failover!

Everyone: Awesome!!

Keepalived: I know, right?!!

Everyone: No way!!!

Keepalived: I rock!

Everyone: Wait... hold up.

Everyone: But... what actually happens to the “other” nodes?

Peripeteia - per·i·pe·tei·a

a sudden reversal of fortune or change in circumstances, especially in reference to fictional narrative.

[Diagram: multiple keepalived instances and Active/Active Replicated Database groups; DB, proxy, and VIP on each node; Clients]

Keepalived: What other nodes?

Clients: HEY?! How come the thing I just wrote isn't in the database?

Clients: Yeah, what's going on?!

Everyone: I've made a huge mistake...

The Missing Piece?

Pacemaker: System Level HA

● System level HA is holistic.

● Defines the policy of how to recover a set of applications.

● Enforces the policy to achieve system-wide deterministic behavior.

[Diagram: Pacemaker over the Active/Active Replicated Database; DB, proxy, and VIP on each of four nodes]

Back to the Story... How does Pacemaker Help?

● We don't question what happened to the other nodes... We know.

● And how do we know this?

Introducing STONITH

Shoot the Other Node in the Head

STONITH = Pacemaker's Fencing Daemon.

● Pacemaker knows the state of lost/misbehaving nodes.

● Because with fencing via STONITH... that state is dead.
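The fencing rule above can be sketched in a few lines of Python. This is an illustrative model only, not Pacemaker's implementation: `Cluster`, `fence`, and `recover_lost_node` are hypothetical names, and the fencing device is simulated by a callback. The point it shows is the ordering: recovery happens only after the lost node is confirmed dead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Cluster:
    """Toy model of resource placement (hypothetical, not Pacemaker's API)."""
    placement: Dict[str, List[str]]   # node name -> resources running on it
    fence: Callable[[str], bool]      # returns True once the node is confirmed dead
    log: List[str] = field(default_factory=list)

    def recover_lost_node(self, lost: str, target: str) -> bool:
        """Move resources off a lost node, but only after fencing succeeds."""
        if not self.fence(lost):
            # The lost node's state is unknown: recovering now could run
            # the same resource in two places, so nothing is moved.
            self.log.append(f"fencing {lost} failed; recovery blocked")
            return False
        self.log.append(f"{lost} fenced")
        for resource in self.placement.pop(lost, []):
            self.placement.setdefault(target, []).append(resource)
            self.log.append(f"{resource} recovered on {target}")
        return True


if __name__ == "__main__":
    cluster = Cluster(
        placement={"node1": ["vip", "proxy"], "node2": ["db"]},
        fence=lambda node: True,  # assumption: the fencing device always works
    )
    cluster.recover_lost_node("node1", target="node2")
    print(cluster.log)
```

If the `fence` callback returns False, the placement is left untouched, which is the "we know, we don't guess" behavior the slides describe.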

Quick Recap...

Without Pacemaker+STONITH...

Keepalived: I dunno. Who cares?

With Pacemaker+STONITH...

Pacemaker: They are dead because I killed them.

The Takeaway.

● Pacemaker and load balancing are NOT mutually exclusive.

● Pacemaker and HAProxy are meant for one another.

Pacemaker: The Distributed PID 1

Modern PID 1's role.

● systemd:

● Launches services in parallel,

● yet observes strict ordering between dependent services.

● Monitors/recovers failed resources.

[Diagram: systemd start order for galera, rabbitmq, redis, Nova, and ceilometer. Unrelated dependencies start in parallel. Ordering is enforced: these services can start in parallel only after their dependencies start.]
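The "parallel, yet ordered" start behavior described above can be illustrated with a topological sort. The sketch below uses Python's standard `graphlib` module. The function name `start_batches` is hypothetical, and the dependency map is an assumption based on the talk's examples (Nova needs Galera, Ceilometer needs Redis). It models the idea only, not how systemd or Pacemaker schedule work internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_batches(depends_on):
    """Group services into batches; each batch can start in parallel.

    depends_on maps a service to the set of services that must be
    started before it.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError if the dependencies form a loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are up
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    # Assumed dependencies for illustration.
    deps = {
        "rabbitmq": set(),
        "nova": {"galera"},
        "ceilometer": {"redis"},
    }
    for number, batch in enumerate(start_batches(deps), start=1):
        print(number, batch)
```

Services in the same batch have no ordering between them, so they are free to start concurrently. A later batch starts only once the earlier one is up.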

The Problem

● OpenStack services are not isolated to a local machine.

● systemd can't coordinate this.

[Diagram: a separate systemd on each of NODE1 through NODE4. Form the Galera cluster, then start Nova.]

The Fix.

● But Pacemaker can,

● because Pacemaker is distributed.

[Diagram: Pacemaker across NODE1 through NODE4. Form the Galera cluster, then start Nova.]

● Pacemaker, just like systemd, can...

● Launch services in parallel,

● yet observe strict ordering between dependent services.

● Monitor/recover failed resources.

● Except Pacemaker can coordinate this across any number of nodes.

[Diagram: Pacemaker across NODE1 through NODE9, running RabbitMQ, Galera, Redis, Ceilometer, and Nova clusters]

● With any number of resources.

[Diagram: Pacemaker across NODE1 through NODE9, with HAProxy instances and per-service VIPs (VIP-Galera, VIP-Redis, VIP-Rabbit, VIP-Nova, VIP-keys, VIP-cinder, VIP-celio, VIP-glance, VIP-neutron) in front of the Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron, Swift, and Heat clusters]

Resource Constraints

● Pacemaker has unique capabilities for managing resources and modeling complex resource dependencies.

● Examples:

● Start resource X then start resource Y

● Colocate resource X with resource Y

● Resource X prefers node A over node B

● Resource X prefers node A between 8am-5pm
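To make the last two examples above concrete, here is a minimal Python sketch of score-based node preference with an optional time window. `LocationPreference` and `pick_node` are hypothetical names, and the scoring rule (sum the applicable scores, highest total wins) is a simplification chosen for illustration. It is not Pacemaker's placement algorithm or configuration syntax.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass
class LocationPreference:
    """Hypothetical 'resource prefers node' rule with an optional time window."""
    resource: str
    node: str
    score: int
    start: Optional[time] = None  # window start, inclusive
    end: Optional[time] = None    # window end, exclusive

    def applies(self, now: time) -> bool:
        if self.start is None or self.end is None:
            return True  # no window means the preference always applies
        return self.start <= now < self.end


def pick_node(resource: str, nodes: List[str],
              prefs: List[LocationPreference], now: time) -> str:
    """Return the node with the highest total preference score for the resource."""
    totals = {node: 0 for node in nodes}
    for pref in prefs:
        if pref.resource == resource and pref.node in totals and pref.applies(now):
            totals[pref.node] += pref.score
    # max() keeps the first node on ties, so list order is the tie-breaker.
    return max(nodes, key=lambda node: totals[node])


if __name__ == "__main__":
    prefs = [
        LocationPreference("X", "B", score=50),
        LocationPreference("X", "A", score=100, start=time(8), end=time(17)),
    ]
    print(pick_node("X", ["A", "B"], prefs, now=time(9, 30)))  # inside the window
    print(pick_node("X", ["A", "B"], prefs, now=time(20, 0)))  # outside the window
```

Ordering and colocation constraints are left out of this sketch to keep it short.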

Architecture: HA OpenStack Controller Nodes

How it works... in one sentence

Pacemaker managing Virtual IPs + Load Balancers + Controller services to maximize the availability of OpenStack APIs

[Diagram: Pacemaker across NODE1 through NODE3; a VIP in front of HA-proxy, with a Service instance on each node]

Each distributed service has a front-end Virtual IP tied to a load balancer.

Each load balancer routes VIP traffic to its respective Active/Active service instances.

Active/Active OpenStack controller services... like Nova, Glance, Keystone, Galera, Redis, RabbitMQ, etc.

Scaling with Resource Clones

● Pacemaker's ability to clone services makes scaling trivial.

● Want more instances?

● Increment the number of clone instances Pacemaker is allowed to run for a service to scale service instances.

[Diagram: a PROXY CLONE (HA-proxy) and a SERVICE CLONE (Service) behind one VIP, growing from NODE1 through NODE3 to NODE1 through NODE6]
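The scaling idea above, raising the cap on how many clone instances may run, can be modeled in a few lines. In this sketch the parameter is called `clone_max` after Pacemaker's `clone-max` clone option. The functions `place_clone` and `running` are hypothetical, and the placement rule (fill nodes in list order, one instance per node) is a deliberate simplification.

```python
from typing import Dict, List


def place_clone(nodes: List[str], clone_max: int) -> Dict[str, bool]:
    """Run one instance per node until clone_max instances are placed.

    Simplified rule for illustration: the first clone_max nodes in list
    order get an instance.
    """
    if clone_max < 0:
        raise ValueError("clone_max must not be negative")
    return {node: index < clone_max for index, node in enumerate(nodes)}


def running(placement: Dict[str, bool]) -> List[str]:
    """List the nodes that currently run an instance."""
    return [node for node, active in placement.items() if active]


if __name__ == "__main__":
    nodes = [f"NODE{n}" for n in range(1, 7)]
    print(running(place_clone(nodes, clone_max=3)))  # three instances
    print(running(place_clone(nodes, clone_max=6)))  # scaled out to six
```

Scaling out is then a one-number change. The service definition itself stays the same.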

How it works, continued...

● Services interact with one another using each service's Virtual IP.

● Example: Both Glance and Nova need access to Galera... Galera is accessed via the front-end Virtual IP and those requests are distributed to the backend Galera cluster.

[Diagram: Glance and Nova send DB requests to the Galera VIP; HA-proxy distributes them to Galera on NODE1 through NODE3 under Pacemaker]

Deployment Strategies: HA OpenStack Controller Nodes

Collapsed Architecture

[Diagram: Pacemaker across NODE1, NODE2, and NODE3. Each node runs Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Galera, Redis, Neutron N, Swift, Heat, and Neutron S behind HAProxy, with a separate VIP per service. Clients reach a service through its VIP, for example the Nova VIP or the Glance VIP.]

● All controller nodes run the same services.

● VIP+load balancers distribute access to APIs across cloned nodes.

Collapsed Architecture Scaling

[Diagram: the collapsed layout extended from NODE1 through NODE3 to NODE1 through NODE6. Every node runs the full controller service set behind HAProxy, with a separate VIP per service.]

Startup Ordering Revisited.

[Diagram: collapsed layout, Pacemaker across NODE1 through NODE3, each node running the full controller service set behind HAProxy, with a separate VIP per service]

Bootstrap and start the Galera cluster across multiple nodes.

Then start Nova instances, which depend on an active Galera cluster.

Stop Ordering Revisited.

[Diagram: collapsed layout, Pacemaker across NODE1 through NODE3, each node running the full controller service set behind HAProxy, with a separate VIP per service]

If we're shutting down the Galera cluster...

Stop everything that depends on Galera, like Nova...

Then shut down the Galera cluster.
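Stop ordering is the start ordering run backwards: dependents go down before the services they rely on. The sketch below shows this with Python's standard `graphlib`. The helper names `start_order` and `stop_order` are hypothetical, and the single dependency (Nova on Galera) is taken from the example above. It illustrates the principle, not Pacemaker's scheduler.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_order(depends_on):
    """Return one valid start order: dependencies before dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Return a stop order: dependents before the services they rely on."""
    return list(reversed(start_order(depends_on)))


if __name__ == "__main__":
    deps = {"nova": {"galera"}}  # Nova depends on an active Galera cluster
    print("start:", start_order(deps))
    print("stop: ", stop_order(deps))
```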

Complex Startup Ordering

[Diagram: collapsed layout, Pacemaker across NODE1 through NODE3. Redis starts as a Slave on every node, then the instance on NODE3 becomes the Master.]

Start the Redis clone.

Promote one instance of the Redis clone to be the Master instance.

Then start the Ceilometer cluster, which depends on Redis.

Segregated Architecture

● Each service runs on its own dedicated hardware.

● Scales much further.

● Add capacity where capacity makes sense.

● Requires lots and lots of nodes.

● Take a closer look.

● Possible to have an entire set of nodes just for load balancing.

● Dedicated cluster for Galera.

[Diagram: Pacemaker across dedicated node groups: Galera on NODE1 through NODE3, Proxy on NODE4 through NODE6, Nova on NODE7 through NODE9, with more services (such as Glance) continuing that way, each with its own VIPs]

Mixed Architecture

● Mixture of collapsed and segregated.

● Break some components into separate hardware

● Most of the cluster is collapsed.

[Diagram: Pacemaker across Galera on NODE1 through NODE3 and Proxy on NODE4 through NODE6 as dedicated nodes, while NODE7 through NODE9 each run Keystone, Ceilometer, Nova, Cinder, Glance, Horizon, RabbitMQ, Redis, Neutron N, Swift, Heat, and Neutron S, each service with its own VIPs]

Pacemaker Advantages

● Automates bootstrap of services that previously required hand holding.

● Example: Automate Galera bootstrap

1. Find out which Galera instance is most up-to-date.

2. Bootstrap the most current Galera instance first.

3. Then sync the other Galera instances.
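The three bootstrap steps above amount to "pick the instance with the newest data, start it first, let the rest join". A minimal Python sketch of that decision follows. `plan_bootstrap` is a hypothetical helper, and its input, a map from node name to the last transaction number each node recorded, is an assumption. How that number is collected from each Galera instance is outside this sketch.

```python
from typing import Dict, List, Tuple


def plan_bootstrap(last_committed: Dict[str, int]) -> Tuple[str, List[str]]:
    """Pick the most up-to-date instance to bootstrap; the rest sync from it.

    last_committed maps a node name to the last transaction number that
    node recorded (assumed input).
    """
    if not last_committed:
        raise ValueError("no Galera instances to bootstrap")
    # Highest transaction number wins; ties go to the alphabetically first node.
    bootstrap = min(last_committed, key=lambda node: (-last_committed[node], node))
    joiners = sorted(node for node in last_committed if node != bootstrap)
    return bootstrap, joiners


if __name__ == "__main__":
    state = {"node1": 1042, "node2": 1050, "node3": 1050}
    first, rest = plan_bootstrap(state)
    print("bootstrap:", first)
    print("then sync:", rest)
```

The tie-break rule is arbitrary but deterministic, so every node evaluating the same state reaches the same plan.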

Pacemaker Advantages, Continued...

● Start/stop distributed services in a graceful, ordered manner.

● Gracefully put a controller node into standby for maintenance.

● Dynamically grow capacity by adding more Pacemaker nodes.

● Centralized view of distributed service state.

Architecture: HA OpenStack Compute Nodes

HA for Cattle

● Both Pets and Cattle need High Availability.

● Recognize the techniques used for each are different.

Pacemaker for Pets and small Herds.

● No limits in the number of resources.

● Pacemaker supports “n-node” clusters.

● Clusters are limited by the Corosync messaging layer to 16 nodes.

Pacemaker Remote for the Cattle.

● Pacemaker Remote allows clusters to scale beyond Corosync membership layer limitations.

● Pacemaker Remote can scale clusters to 100s, possibly 1000s, of nodes.

[Diagram: Pacemaker + Pacemaker Remote]

The Solution: Pacemaker Remote

● Pacemaker Remote is a single daemon, pacemaker_remoted.

● This daemon is a lightweight way of integrating nodes into the cluster.

● Cluster services spread out across pacemaker and pacemaker_remote nodes as a single cluster partition.

[Diagram: a Pacemaker cluster (Node 1 through Node 3) plus Pacemaker Remote nodes (Node 4 through Node 16), with Cluster Services spanning both]

Pacemaker Remote use case

● Management of services on Pacemaker Remote can work just like Pacemaker.

● But it thrives in the cattle use case, where every remote instance is identical.

● Cloned services scale quite well on Pacemaker Remote.

[Diagram: Cluster Services on the Pacemaker nodes (Node 1 through Node 3) and a cloned service on each Pacemaker Remote node]

Compute Node HA Strategy.

[Diagram: the collapsed controller cluster (Pacemaker across NODE1 through NODE3, full service set behind HAProxy, separate VIP per service) alongside a remote node running Pacemaker Remote. The remote node hosts a Compute Service Group: Libvirtd, Neutron Open vSwitch, Ceilometer Compute, and Nova Compute.]

Why HA Compute Nodes?

● Maximize the availability of compute instances.

● Detection of dead Cattle instances.

● Automate recovery of Cattle instances.

● Pacemaker Remote also has a secret weapon.

STONITH + Pacemaker Remote.

The Future

Limits?

[Diagram: Pacemaker + Pacemaker Remote]

How many services?

How many nodes?

Questions?

Visit us at

clusterlabs.org