Fail over, fail back


Fail over
Fail back

Josh Berkus
PostgreSQL Experts Inc.
NYC PgDay 2014

The Mozilla logo is a trademark of the Mozilla Corporation. Used here under fair use.

2 servers

1 command

(Flowchart: admin executes failover → can we connect to the master? → classify the error (shutdown, no response, other) → shut down the master, aborting if it fails to shut down → bring up the standby → did the standby come up, and was it actually standing by?)

Automated
Failover

Image from libarynth.com, used under Creative Commons Share-Alike.

www.handyrep.org

Fail over

Goals

1. Minimize Downtime

2. Minimize data loss

3. Don't make it worse!

?

Planned
vs.
Emergency

Failover once a quarter

Postgres updates

Kernel updates

Disaster Recovery drills

Just for fun!

Automated or Not?

< 1 hr allowable downtime: automate

false failover

testing, testing, testing

complex software

>= 1 hr allowable downtime: manual

the 2am call

training

simple script

sysadmin > software

failover in 3 parts

(1) Detecting Failure

(2) Failing Over the DB

(3) Failing Over the App

1. Detecting Failure

can't connect to master

could not connect to server: Connection refused
Is the server running on host "192.168.0.1" and accepting TCP/IP connections on port 5432?

can't connect to master

down?

too busy?

network problem?

configuration error?

can't connect to master

down? → fail over

too busy? → don't fail over

pg_isready

pg_isready -h 192.168.1.150 -p 6433 -t 15

192.168.1.150:6433 - accepting connections

pg_isready

0 == running and accepting connections (even if too busy)
1 == running but rejecting connections (security settings)
2 == not responding (down?)
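A minimal shell sketch, not from the slides, of acting on those exit codes; the host, port, and timeout are the placeholder values shown above, and the branch actions are just illustrative echoes.

# Act on pg_isready's exit status (exit 3 means the check itself could not run).
MASTER_HOST=192.168.1.150
MASTER_PORT=6433

pg_isready -h "$MASTER_HOST" -p "$MASTER_PORT" -t 15 > /dev/null
case $? in
    0) echo "master accepting connections; no failover" ;;
    1) echo "master up but rejecting connections; investigate, don't fail over" ;;
    2) echo "no response; run the additional checks before failing over" ;;
    *) echo "bad parameters; fix the check itself" ;;
esac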

more checks

(Flowchart: can we ssh to the master? → are there postgres processes on the master? → attempt a restart → outcomes: "master is down; failover", "master is OK; no failover", or "exit with error".)

check replica

(Flowchart: does pg_isready respond, and is the server actually a replica? → "OK to failover" or "exit with error".)
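A hedged sketch of that replica check in shell: pg_isready for reachability, then pg_is_in_recovery() to confirm the server really is standing by. The replica host and the superuser name are placeholders.

# Is the replica reachable, and is it actually a standby?
REPLICA_HOST=192.168.1.151

if ! pg_isready -h "$REPLICA_HOST" -p 5432 -t 15 > /dev/null; then
    echo "replica not responding; exit with error" >&2
    exit 1
fi

in_recovery=$(psql -h "$REPLICA_HOST" -U postgres -Atc "SELECT pg_is_in_recovery();")
if [ "$in_recovery" = "t" ]; then
    echo "replica is standing by; OK to fail over"
else
    echo "server is not a replica; exit with error" >&2
    exit 1
fi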

some rules

don't just ping 5432

misconfiguration > downtime

tradeoff: confidence vs. time to failover

failover time

(Timeline, in seconds: master poll fail → ssh master → attempt restart → verify replica → failover; each check adds to the total failover time.)

(Diagrams: AppServerOne and AppServerTwo during a network partition; then the same pair connecting through a broker; then through a proxy.)

2. Failing Over the DB

choose a replica target

shut down the master

promote the replica

verify the replica

remaster other replicas

Choosing a replica

One replica

Designated replica

Furthest ahead replica

One Replica

fail over to it, or don't. Well, that's easy.

Designated Replica

load-free replica, or

cascade master, or

synchronous replica

Furthest Ahead

Pool of replicas

Least data loss

Least downtime

Other replicas can remaster

but what's furthest ahead?

receive vs. replay

receive == data it has

replay == data it applied

receive vs. replay

receive == data it has → furthest ahead

replay == data it applied → most caught up

receive vs. replay

get the furthest ahead, but not more than 2 hours behind on replay

receive vs. replay

get the furthest ahead, but not more than 1GB behind on replay

timestamp?

pg_last_xact_replay_timestamp()

last transaction commit

not last data

same timestamp, different receive positions
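For illustration (not from the slides), the usual timestamp-based lag estimate looks like this; the replica host is a placeholder, and because it measures the last replayed commit rather than the last data received, it drifts upward on an idle master, which is exactly the caveat above.

# Approximate replay lag on a replica, from the last replayed commit timestamp.
psql -h "$REPLICA_HOST" -U postgres -Atc \
    "SELECT now() - pg_last_xact_replay_timestamp();"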

Position?

pg_xlog_location_diff(): compare two XLOG locations

byte position

comparable at byte granularity

Position?

SELECT pg_xlog_location_diff(
    pg_current_xlog_location(), '0/0000000');
---------------
 701505732608

Position?

rep1: 701505732608

rep2: 701505737072

rep3: 701312124416

Replay?

more unreplayed WAL == slower promotion

figure out max. acceptable

sacrifice the delayed replica

Replay?

SELECT pg_xlog_location_diff(
    pg_last_xlog_receive_location(),
    pg_last_xlog_replay_location());
---------------
 1232132

Replay?

SELECT pg_xlog_location_diff(
    pg_last_xlog_receive_location(),
    pg_last_xlog_replay_location());
---------------
 4294967296
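Putting the two queries together, a rough shell sketch of picking the promotion target might look like the following; the replica list, the user, and the 1GB cutoff (from the earlier slide) are assumptions, and on a standby we read pg_last_xlog_receive_location() since pg_current_xlog_location() is not available during recovery.

# Pick the furthest-ahead replica, skipping any more than 1GB behind on replay.
REPLICAS="rep1 rep2 rep3"
MAX_REPLAY_LAG=$((1024 * 1024 * 1024))   # 1GB

best_host=""
best_pos=-1
for host in $REPLICAS; do
    pos=$(psql -h "$host" -U postgres -Atc \
        "SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), '0/0000000');")
    lag=$(psql -h "$host" -U postgres -Atc \
        "SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(),
                                      pg_last_xlog_replay_location());")
    if [ "$lag" -gt "$MAX_REPLAY_LAG" ]; then
        echo "skipping $host: $lag bytes of unreplayed WAL"
        continue
    fi
    if [ "$pos" -gt "$best_pos" ]; then
        best_pos=$pos
        best_host=$host
    fi
done
echo "promotion target: $best_host (receive position $best_pos)"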

master shutdown

STONITH

make sure master can't restart

or can't be reached

Terminate or Isolate
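A hedged sketch of "terminate or isolate" over ssh; the host, data directory, and iptables rule are placeholders, and real STONITH normally means a fencing device rather than ssh.

OLD_MASTER=192.168.1.150
PGDATA=/var/lib/pgsql/data

# Terminate: stop the old master hard, if it is still running.
ssh postgres@"$OLD_MASTER" "pg_ctl stop -D $PGDATA -m immediate" || true

# Isolate: make sure clients (and restarts) can't reach it on the Postgres port.
ssh root@"$OLD_MASTER" "iptables -A INPUT -p tcp --dport 5432 -j REJECT"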

promotion

pg_ctl promote, then make sure it worked

may have to wait. How long?
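A minimal sketch of promotion plus verification, assuming ssh access; the new-master host, data directory, and the 60-second wait are placeholders. It polls pg_is_in_recovery() until the server reports it has left recovery or the timeout expires.

NEW_MASTER=192.168.1.151

ssh postgres@"$NEW_MASTER" "pg_ctl promote -D /var/lib/pgsql/data"

# Promotion is asynchronous: poll until the server has actually left recovery.
for i in $(seq 1 30); do
    state=$(psql -h "$NEW_MASTER" -U postgres -Atc "SELECT pg_is_in_recovery();" 2>/dev/null)
    if [ "$state" = "f" ]; then
        echo "promotion succeeded"
        exit 0
    fi
    sleep 2
done
echo "promotion did not complete in time" >&2
exit 1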

Remastering

remastering pre-9.3

all replicas are set to:
recovery_target_timeline = 'latest'

change primary_conninfo to new master

all must pull from common archive

restart replicas


remastering post-9.3

all replicas are set to:
recovery_target_timeline = 'latest'

change primary_conninfo to new master

restart replicas
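A sketch of remastering one replica, with placeholder hosts, paths, and replication user; the commented-out restore_command is only needed in the pre-9.3 case, where the timeline switch must be read from a WAL archive shared with the new master.

NEW_MASTER=192.168.1.151
PGDATA=/var/lib/pgsql/data

cat > "$PGDATA/recovery.conf" <<EOF
standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'host=$NEW_MASTER port=5432 user=replicator'
# pre-9.3 only: pull the timeline switch from an archive shared with the new master
# restore_command = 'cp /shared/wal_archive/%f %p'
EOF

pg_ctl restart -D "$PGDATA" -m fast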

restart problem

must restart to remaster (not likely to change soon)

break connections
vs.
fall behind

3. Application Failover

old master → new master
for read-write

old replicas → new replicas
for load balancing

fast: prevent split-brain

CMS method

update Configuration Management System

push change to all application servers

CMS method

slow

asynchronous

hard to confirm 100% complete

network split?

zookeeper method

write new connection config to zookeeper

application servers pull connection info from zookeeper

zookeeper method

asynchronous, or poor response time

delay to verify

network split?

Pacemaker method

master has virtual IP

applications connect to VIP

Pacemaker reassigns VIP on fail

Pacemaker advantages

2-node solution (mostly)

synchronous

fast

absolute isolation

Pacemaker drawbacks

really hard to configure

poor integration with load-balancing

automated failure detection too simple; can't be disabled

proxy method

application servers connect to db via proxies

change proxy config

restart/reload proxies
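A sketch of those steps using pgBouncer as the example proxy; the config path, the "appdb" alias, and the pid file location are assumptions. Either a SIGHUP or the admin console's RELOAD makes pgBouncer re-read its configuration.

NEW_MASTER=192.168.1.151

# Point the alias the application connects to at the new master...
sed -i "s/^appdb = .*/appdb = host=$NEW_MASTER port=5432 dbname=appdb/" \
    /etc/pgbouncer/pgbouncer.ini

# ...then reload the proxy, either via signal:
kill -HUP "$(cat /var/run/pgbouncer/pgbouncer.pid)"
# or via the admin console:
# psql -p 6432 -U pgbouncer pgbouncer -c "RELOAD;"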

(Diagrams: AppServerOne and AppServerTwo connecting to the database through a proxy, before and after the proxy is repointed at the new master.)

proxies

pgBouncer

pgPool

HAProxy

Zeus, BigIP, Cisco

FEMEBE

Fail back

what?

after failover, make the old master the master again

why?

old master is better machine?

some server locations hardcoded?

doing maintenance on both servers?

why not?

bad infrastructure design?

takes a while?

need to verify old master?

just spin up a new instance?

pg_basebackup

rsync

reduce time/data for old master recopy

doesn't work as well as you'd expect (hint bits)
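For illustration, a failback recopy with rsync might look like the sketch below; hosts and paths are placeholders, and the pg_start_backup()/pg_stop_backup() bracketing is there because the new master stays up during the copy. Because hint-bit updates dirty many pages, even this checksum-based delta copy saves less than you'd hope, which is the slide's point.

NEW_MASTER=192.168.1.151
PGDATA=/var/lib/pgsql/data

pg_ctl stop -D "$PGDATA" -m fast                      # old master must be down

psql -h "$NEW_MASTER" -U postgres -c "SELECT pg_start_backup('failback', true);"
rsync -a --delete --checksum \
    --exclude=pg_xlog --exclude=postmaster.pid --exclude=recovery.conf \
    postgres@"$NEW_MASTER":"$PGDATA"/ "$PGDATA"/
psql -h "$NEW_MASTER" -U postgres -c "SELECT pg_stop_backup();"

# Then write recovery.conf pointing at the new master (as in the remastering
# sketch) and start the old master as a standby.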

pg_rewind ++

use XLOG + data files for rsync

super fast master resync

pg_rewind --

not yet stable

need to have all XLOGs (doesn't yet support archives)

need checksums, or 9.4's wal_log_hints
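For reference, a hedged sketch of a pg_rewind-based resync; the flags shown follow the syntax of the later in-core tool and may differ from the 2014-era external version. Hosts and paths are placeholders; the old master must be shut down cleanly first, and checksums (or 9.4's wal_log_hints) must already be enabled.

NEW_MASTER=192.168.1.151
PGDATA=/var/lib/pgsql/data

pg_ctl stop -D "$PGDATA" -m fast

pg_rewind --target-pgdata="$PGDATA" \
    --source-server="host=$NEW_MASTER port=5432 user=postgres dbname=postgres"

# Afterwards, add recovery.conf pointing at the new master and start the old
# master as a standby.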

Automated
Failback

www.handyrep.org

Fork it on Github!

Questions?

github.com/pgexperts/HandyRep (fork it!)

Josh Berkus: [email protected] / www.pgexperts.com

Blog: www.databasesoup.com

Copyright 2014 PostgreSQL Experts Inc. Released under the Creative Commons Share-Alike 3.0 License. All images, logos and trademarks are the property of their respective owners and are used under principles of fair use unless otherwise noted.
