Transcript of Building a High-Availability PostgreSQL Cluster
Source: teamarin.net/.../Building-a-High-Availability-PostgreSQL-Cluster.pdf

Page 1:

Building a High-Availability PostgreSQL Cluster

Presenter: Devon Mizelle, System Administrator

Co-Author: Steven Bambling, System Administrator

ARIN — “critical internet infrastructure”

Page 2:

What is ARIN?

• Regional Internet registry for North America and parts of the Caribbean
• Distributes IPv4 & IPv6 addresses and Autonomous System Numbers (Internet number resources) in the region
• Provides authoritative WHOIS services for number resources in the region

Page 3:

ARIN’s Internal Data

Inside our database exist all of the IPv4 and IPv6 networks that we manage, the organizations they belong to, and the contacts at those organizations. This means that data integrity and how we store that data are extremely important.

Page 4:

Requirements

• Multi-member
• Automatic failover
• Prevent a 'tainted' master from coming online
• Needs to be ACID-compliant

Page 5:

Why Not Slony or pgpool-II?

• Slony replaces pgSQL's built-in replication – why do this? Why not let pgSQL handle it?

• pgpool-II is not ACID-compliant – it doesn't confirm writes to multiple nodes


Page 6:

Our solution

• CMAN / Corosync – Red Hat + open-source solution for cross-node communication
• Pacemaker – Red Hat and Novell's solution for service management and fencing
• Both under active development by ClusterLabs

We were interested in using this stack due to the active development by ClusterLabs.

Page 7:

CMAN / Corosync

• Provides a messaging framework between nodes
• Handles a heartbeat between nodes – "Are you up and available?"
• Does not provide the 'status' of a service; Pacemaker does
• Pacemaker uses Corosync to send messages between nodes

CMAN has the ability to do more, but we just use it as a messaging framework.

Page 8:

CMAN / Corosync

• Builds a cluster 'ring' using a configuration file
• Used by Pacemaker to pass status messages between the nodes
• Simply a framework for communication – no heavy lifting in our implementation
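To make the "configuration file" concrete, here is a minimal sketch of what a CMAN cluster definition could look like, written as a shell heredoc. The cluster name and node names are placeholders invented for this example, not ARIN's actual configuration.

# Hypothetical minimal /etc/cluster/cluster.conf defining a three-node ring
cat > /etc/cluster/cluster.conf <<'EOF'
<?xml version="1.0"?>
<cluster name="pgsql-ha" config_version="1">
  <clusternodes>
    <clusternode name="db-node1" nodeid="1"/>
    <clusternode name="db-node2" nodeid="2"/>
    <clusternode name="db-node3" nodeid="3"/>
  </clusternodes>
</cluster>
EOF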

Page 9:

About Pacemaker

• Developed / maintained by Red Hat and Novell
• Scalable – anywhere from a two-node to a 16-node setup
• Scriptable – resource scripts can be written in any language (see the sketch after this list)
• Monitoring – watches for service state changes
• Fencing – disables a box and switches roles when failures occur
• Shared database between nodes about the status of services / nodes
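To illustrate what "resource scripts can be written in any language" means, below is a bare-bones sketch of an OCF-style resource agent in shell. It is a simplified illustration, not the presenters' actual script – in their cluster the community pgsql agent does this work.

#!/bin/sh
# Skeleton OCF-style resource agent: Pacemaker passes the action as $1
# and reads the exit code to learn the resource state.
case "$1" in
  start)
    # start the managed service here
    exit 0 ;;   # OCF_SUCCESS
  stop)
    # stop the managed service here
    exit 0 ;;
  monitor)
    # report state: 0 = running, 7 = not running (OCF_NOT_RUNNING)
    exit 7 ;;
  meta-data)
    # print the agent's XML metadata (omitted in this sketch)
    exit 0 ;;
  *)
    exit 3 ;;   # OCF_ERR_UNIMPLEMENTED
esac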

Page 10:

Pacemaker

(Diagram: a Master node with Sync and Async slaves)

• An XML 'database' (known as a CIB – cluster information base) is generated with the status of each resource and passed between nodes
• The state of pgSQL is controlled by Pacemaker itself
• Pacemaker uses a 'resource script' to interact with pgSQL
• The script can determine the state of the service (Master / Sync / Async)
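As a rough sketch of how such a resource might be declared with the crm shell, the commands below define a master/slave pgsql resource. The paths, addresses, node names, and resource names are assumptions made for this example, not the configuration shown in the talk.

# Hypothetical crm-shell definition of a master/slave pgsql resource
crm configure primitive postgresql ocf:heartbeat:pgsql \
    params pgctl="/usr/pgsql-9.2/bin/pg_ctl" \
           pgdata="/var/lib/pgsql/9.2/data" \
           rep_mode="sync" \
           node_list="db-node1 db-node2 db-node3" \
           master_ip="10.0.0.10" \
    op monitor interval="10s" role="Slave" \
    op monitor interval="9s" role="Master"

# Wrap it in a multi-state resource so exactly one node is promoted to Master
crm configure ms msPostgresql postgresql \
    meta master-max="1" clone-max="3" notify="true"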

Page 11:

Other Pacemaker Resources

Pacemaker also handles the following resources besides pgSQL:
• Fencing of resources
• IP address colocation

Page 12:

How does it all tie together? From the bottom up…

Page 13:

Pacemaker

(Diagram: Master, Sync, and Async nodes; an App server; a client 'vip' and a replication 'vip')

• All slaves in the cluster point to a replication 'vip'
• This interface moves to whichever node is the master – this is called a colocation constraint
• Another 'vip', which our application servers connect to, follows the master as well
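For illustration, constraints of this kind can be expressed in the crm shell roughly as below. The resource names and IP addresses are placeholders, and the example assumes the hypothetical msPostgresql resource sketched earlier.

# Hypothetical vips plus colocation/order constraints tying them to the Master
crm configure primitive vip-client ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.20" cidr_netmask="24" op monitor interval="10s"
crm configure primitive vip-replication ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.10" cidr_netmask="24" op monitor interval="10s"

# Keep both vips on whichever node currently holds the pgSQL Master role
crm configure colocation vip-client-with-master inf: vip-client msPostgresql:Master
crm configure colocation vip-repl-with-master inf: vip-replication msPostgresql:Master

# Start (move) the client vip only after promotion has completed
crm configure order promote-then-vip inf: msPostgresql:promote vip-client:start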

Page 14:

Event Scenario

(Diagram: Master / Sync / Async roles before and after a node failure)

• In the event that a node becomes unavailable, CMAN notifies Pacemaker to 'fence' the node – shut off communication to it via SNMP to the switch
• The SYNC slave becomes the Master
• The ASYNC slave becomes the SYNC slave
• Upon manual recovery, the old Master becomes the async slave
• If any resources inside Pacemaker on the master fail their monitoring check, fencing occurs as well. These resources include both the replication and client 'vips'
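Fencing a node by disabling its switch port over SNMP is normally done through a STONITH fence agent. The sketch below is only a rough illustration of the idea – the choice of agent (fence_ifmib), the switch address, the SNMP community, and the host-to-port mapping are all assumptions, not details given in the talk.

# Hypothetical STONITH resource that cuts a failed node off at the switch via SNMP
crm configure primitive fence-switch stonith:fence_ifmib \
    params ipaddr="10.0.0.1" community="private" \
           pcmk_host_map="db-node1:17;db-node2:18;db-node3:19" \
    op monitor interval="60s"

# Make sure fencing is enabled cluster-wide
crm configure property stonith-enabled="true"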

Page 15:

PostgreSQL

• Still in charge of replicating data
• The state of the service and how it starts are controlled by Pacemaker
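Because pgSQL still does the replication itself, the usual streaming-replication settings sit underneath Pacemaker. A minimal sketch for a PostgreSQL 9.x-era setup with a synchronous standby follows; the hostnames, addresses, and application names are placeholders, and when the pgsql resource agent manages replication it typically generates the recovery settings itself.

# Illustrative settings in postgresql.conf on the master
cat >> "$PGDATA/postgresql.conf" <<'EOF'
wal_level = hot_standby
max_wal_senders = 5
synchronous_standby_names = 'db-node2'   # writes are confirmed on the sync slave
hot_standby = on
EOF

# Illustrative recovery.conf on a slave
cat > "$PGDATA/recovery.conf" <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.10 application_name=db-node2'
EOF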

Page 16:

Layout

(Diagram: a Master and two Slave nodes, each running cman, with a Client connecting to the cluster)

Page 17:

Using Tools to Look Deeper: Introspection…

Page 18:

# crm_mon -i 1 -Arf

• We disable quorum within the Pacemaker HA cluster to allow the cluster to keep running, down to a single node, in the event multiple nodes fail
• 8 resources configured
• ocf::heartbeat:IPaddr2 is the resource agent used to create the vip – agents can be shell, Ruby, etc.
• Primitive vs. multi-state:
  • Primitive – only runs on one of the nodes in the cluster (vips, fencing)
  • Multi-state resource – runs on multiple nodes (pgsql)
• The vips are colocated. If anything happens to either of them, the entire node fails and the master role moves to the next node in line
• There is a specific check interval for each resource
• stonith is used for fencing
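"Disabling quorum" is normally expressed as a cluster property. A minimal sketch of the kind of properties involved (the values are illustrative, not quoted from the talk):

# Hypothetical cluster-wide properties
crm configure property no-quorum-policy="ignore"   # keep running even after quorum is lost
crm configure property stonith-enabled="true"      # fence failed nodes rather than trusting them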

Page 19:

# crm_mon -i 1 -Arf (cont)

• All of the status comes from the pgsql Pacemaker resource script
• receiver-status shows an error because the resource is written to monitor and check for cascading replication; we don't use cascading and haven't invested cycles in it
• master-postgresql is the 'weight'. Pacemaker uses the weight to determine who should be promoted next in line, which is why the async slave has -INFINITY
• STREAMING
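The 'weight' is the promotion score that the resource agent reports back to Pacemaker; agents commonly do this with crm_master from their monitor action. A small illustration of the mechanism (the score value is an example only):

# Report this node's promotion preference from inside the agent's monitor action.
# The pgsql agent scores the sync slave highly and gives the async slave
# -INFINITY, so only the sync slave is eligible to become the next Master.
crm_master -l reboot -v 1000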

Page 20:

Questions?

Devon Mizelle