Posted on 15-Jan-2017
POSTGRESQL REPLICATION
NUR AGUS SURYOKO
SIMPLISTIC WAY TO REPLICATE POSTGRES
Primary DB, Standby DB
1. Backup (primary)
2. Rsync (primary to standby)
3. Restore (standby)
THIS METHOD IS…
It works, yes. But:
• As the database grows, replication gets slower
• When the duration of backup + rsync + restore > the interval = disaster
• It relies heavily on scripts
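The "disaster" condition above is plain arithmetic, so it can be checked before each scheduled run. A minimal sketch, assuming the three stage durations are known in seconds from earlier runs; the function name cycle_fits_interval and the sample numbers are hypothetical, not part of the original setup:

```bash
# Return 0 if backup + rsync + restore finishes within the schedule interval,
# 1 otherwise. All four arguments are durations in seconds.
cycle_fits_interval() {
  local backup="$1" rsync="$2" restore="$3" interval="$4"
  local total=$(( backup + rsync + restore ))
  [ "$total" -le "$interval" ]
}

# Hypothetical numbers for an hourly schedule.
if cycle_fits_interval 600 300 900 3600; then
  echo "cycle fits in the interval"
else
  echo "cycle overruns the interval: the standby falls behind"
fi
```

As the database grows, the first three numbers grow with it while the interval stays fixed, which is why this approach eventually breaks.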
PROPOSAL
Postgres Replication: Asynchronous Hot Standby
IN ONE SENTENCE
The same idea as Oracle Data Guard and DB2-HADR
AT A GLANCE
Primary DB Standby DB
Archive-Log Archive-Log
1. SQL Connection Open
2. Restore Database + Copy Archive Log
3. Replication in Sync
REMEMBER DB2-HADR? JUST THE SAME
The asynchronous one
YEAH, BABY!
READ-ONLY QUERY ON STANDBY
Same capability as Oracle Active Data Guard and DB2-HADR RoS (Read on Standby)
LET’S GET TO WORK
2-node installation: download the EnterpriseDB PostgreSQL binary distribution; it saves a lot of work
Conventions:
Node1 hostname: userver
Node2 hostname: userver2
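Create sample table:
-- uuid_generate_v4() is provided by the uuid-ossp extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE test (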
id uuid DEFAULT uuid_generate_v4() NOT NULL,
waktu timestamp with time zone DEFAULT now() NOT NULL,
text character varying(255)
);
CONTINUOUS
Create this script:
#!/bin/bash
while :
do
  psql -d test -c "insert into test(text) values('X')"
  sleep 1
done
Run this script on node1: primary
You see what we are trying to do here…
NETWORKS
• Maintain /etc/hosts on each node so that the nodes can ping each other
• Configure an ssh key for user postgres on each node. Allow passwordless ssh
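A minimal sketch of the /etc/hosts step. It assumes userver is 192.168.56.104 (the address the standby log reports for it) and userver2 is 192.168.56.105 (the address allowed in pg_hba.conf); the helper name ensure_hosts_entry is hypothetical. It is demonstrated on a scratch file, since writing to the real /etc/hosts needs root:

```bash
# Append "ip name" to a hosts file unless that exact mapping already exists.
ensure_hosts_entry() {
  local file="$1" ip="$2" name="$3"
  if awk -v ip="$ip" -v name="$name" '
       $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
       END { exit !found }' "$file"; then
    return 0   # mapping already present, nothing to do
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$file"
}

# Demonstration on a scratch file; as root, pass /etc/hosts instead.
hosts_file=$(mktemp)
ensure_hosts_entry "$hosts_file" 192.168.56.104 userver
ensure_hosts_entry "$hosts_file" 192.168.56.105 userver2
ensure_hosts_entry "$hosts_file" 192.168.56.105 userver2   # repeat call adds nothing
cat "$hosts_file"
```

For the passwordless ssh bullet, the usual tools are ssh-keygen and ssh-copy-id, run as user postgres on each node.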
PRIMARY SETUP
psql# create user replicator replication login encrypted password 'Initial1';
Keep the name. We don’t want any trouble. Believe me, I tried
PRIMARY SETUP
edit postgresql.conf:
wal_level = hot_standby
fsync = on
synchronous_commit = local
wal_sync_method = fsync
wal_compression = on
archive_mode = on
archive_command = 'test ! -f /opt/postgresql/9.5/backup/archive/%f && cp %p /opt/postgresql/9.5/backup/archive/%f'
max_wal_senders = 8
wal_keep_segments = 24
wal_sender_timeout = 10s
hot_standby = on
Careful with the items marked in red on the slide; adjust them accordingly
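A typo in postgresql.conf only surfaces at restart, so it can help to read the values back from the file first. A minimal sketch; conf_value is a hypothetical helper that only understands simple "key = value" lines and treats any "#" as the start of a comment, so it is not suitable for values that contain "#". The scratch file below stands in for the real postgresql.conf:

```bash
# Print the effective value of a key: the last uncommented "key = value" line,
# with trailing comments and surrounding single quotes removed.
conf_value() {
  local file="$1" key="$2"
  sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" \
    | tail -n 1 \
    | sed -E "s/[[:space:]]*#.*\$//; s/^'(.*)'\$/\\1/"
}

# Scratch file with a few of the settings from the slide.
conf=$(mktemp)
cat > "$conf" <<'EOF'
#wal_level = minimal
wal_level = hot_standby        # needed for a hot standby on 9.5
max_wal_senders = 8
wal_sender_timeout = 10s
hot_standby = on
EOF

for pair in wal_level=hot_standby max_wal_senders=8 wal_sender_timeout=10s hot_standby=on; do
  key=${pair%%=*}
  want=${pair#*=}
  got=$(conf_value "$conf" "$key")
  if [ "$got" = "$want" ]; then
    echo "ok   $key = $got"
  else
    echo "FAIL $key = '$got' (want '$want')"
  fi
done
```

On a running server, SHOW wal_level; in psql reports the value actually in effect.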
PRIMARY SETUP
as root: mkdir -p /opt/postgresql/9.5/backup/archive
as root: mkdir -p /opt/postgresql/9.5/backup
as root: chown -R postgres:postgres /opt/postgresql/9.5/backup
restart postgres primary
ensure no error in $PGDATA/pg_log
PRIMARY SETUP
edit pg_hba.conf:
host all replicator 192.168.56.105/32 md5
host replication replicator 192.168.56.105/32 md5
restart postgres primary
ensure no error in $PGDATA/pg_log
Why 2 entries? Because "replication" is not a database. It is a special keyword in pg_hba.conf, and "all" does not match it
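Because both entries are needed, a quick check that neither was forgotten can save a restart cycle. A minimal sketch; hba_has_rule is a hypothetical helper that only matches plain "host" lines field by field, and the scratch file stands in for the real pg_hba.conf:

```bash
# Succeed if the file has a line: host <db> <user> <addr> <any method>.
hba_has_rule() {
  local file="$1" db="$2" user="$3" addr="$4"
  awk -v db="$db" -v user="$user" -v addr="$addr" '
    $1 == "host" && $2 == db && $3 == user && $4 == addr { found = 1 }
    END { exit !found }' "$file"
}

# Scratch copy holding the two entries from the slide.
hba=$(mktemp)
cat > "$hba" <<'EOF'
# TYPE  DATABASE     USER        ADDRESS             METHOD
host    all          replicator  192.168.56.105/32   md5
host    replication  replicator  192.168.56.105/32   md5
EOF

for db in all replication; do
  if hba_has_rule "$hba" "$db" replicator 192.168.56.105/32; then
    echo "found rule for database '$db'"
  else
    echo "missing rule for database '$db'"
  fi
done
```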
STANDBY SETUP
test connection to primary:
psql -h userver -d test -U replicator -W
STANDBY SETUP
edit postgresql.conf:
wal_level = hot_standby
fsync = on
synchronous_commit = local
wal_sync_method = fsync
wal_compression = on
archive_mode = on
archive_command = 'test ! -f /opt/postgresql/9.5/backup/archive/%f && cp %p /opt/postgresql/9.5/backup/archive/%f'
max_wal_senders = 8
wal_keep_segments = 24
wal_sender_timeout = 10s
hot_standby = on
STANDBY SETUP
destroy db on standby
root: rm -rf /opt/postgresql/9.5/data
root: mkdir -p /opt/postgresql/9.5/data
root: chown -R postgres:postgres /opt/postgresql/9.5/data
root: chmod -R 700 /opt/postgresql/9.5/data
Destroy db, not the engine
back up and restore directly from primary:
pg_basebackup -h userver -D /opt/postgresql/9.5/data -U replicator -v -P
STANDBY SETUP
prepare file /opt/postgresql/9.5/data/recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=userver port=5432 user=replicator password=Initial1'
trigger_file = '/home/postgres/trigger.file'
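Copying recovery.conf from slides risks picking up typographic quotes, which PostgreSQL will not accept in place of straight single quotes. Generating the file from a script avoids that. A minimal sketch; write_recovery_conf is a hypothetical helper, and it is demonstrated in a scratch directory rather than the real data directory:

```bash
# Write a recovery.conf for a hot standby into the given data directory.
write_recovery_conf() {
  local datadir="$1" primary_host="$2" password="$3"
  cat > "$datadir/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$primary_host port=5432 user=replicator password=$password'
trigger_file = '/home/postgres/trigger.file'
EOF
  chmod 600 "$datadir/recovery.conf"   # the file holds a password
}

# Demonstration in a scratch directory; on the standby the target would be
# /opt/postgresql/9.5/data, and the script would run as user postgres.
scratch=$(mktemp -d)
write_recovery_conf "$scratch" userver Initial1
cat "$scratch/recovery.conf"
```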
start standby
check test table. data should be updated periodically
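STANDBY SETUP
check replication status:
test=# select * from pg_stat_replication;
state must be 'streaming' (sync_state reads 'async' here, because the setup is asynchronous and no synchronous_standby_names is configured)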
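The status check can be scripted so it fits into monitoring. A minimal sketch; replication_streaming is a hypothetical helper that reads "state|sync_state" lines, the format psql prints with -At. Canned samples are used so the sketch runs without a database:

```bash
# Read "state|sync_state" lines on stdin and succeed only if at least one
# standby is in the streaming state.
replication_streaming() {
  awk -F'|' '$1 == "streaming" { n++ } END { exit (n > 0 ? 0 : 1) }'
}

# On the primary, the input would come from:
#   psql -d test -Atc "select state, sync_state from pg_stat_replication"
if printf 'streaming|async\n' | replication_streaming; then
  echo "standby is streaming"
fi
if ! printf '' | replication_streaming; then
  echo "no standby connected"
fi
```

An empty result means no standby is attached, which is exactly what the primary reports while the standby is down.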
SYNC
TEST1: NODE2 SHUTDOWN OS
node1: primary
node2: standby
shutdown node2
transactions on primary keep running
replication status:
test=# select * from pg_stat_replication;
<empty>
Test Result: OK
TEST2: START STANDBY
node1: primary
node2: standby
startup node2
postgres auto-starts after the OS boots, and replication resumes
check replication status:
test=# select * from pg_stat_replication;
state = 'streaming' (sync_state = 'async', because the setup is asynchronous)
Test Result: OK
TEST3: NORMAL FAILOVER
node1: primary
node2: standby
stop periodic insert
mark last inserted value (primary):
2668fadd-59dc-468a-ae85-65f9750bc336 | 2015-11-05 22:04:15.658038+07 | X
shutdown node1
TEST3: NORMAL FAILOVER
the standby log reports the lost connection; this is normal:
2015-11-05 22:14:18 WIB FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "userver" (192.168.56.104) and accepting
TCP/IP connections on port 5432?
TEST3: NORMAL FAILOVER
node2 as primary
touch /home/postgres/trigger.file
check log, this is a successful failover message:
2015-11-05 22:15:53 WIB LOG: trigger file found: /home/postgres/trigger.file
2015-11-05 22:15:53 WIB LOG: redo done at 0/10000028
2015-11-05 22:15:53 WIB LOG: last completed transaction was at log time 2015-11-05 22:04:15.658499+07
2015-11-05 22:15:53 WIB LOG: selected new timeline ID: 2
2015-11-05 22:15:54 WIB LOG: archive recovery complete
2015-11-05 22:15:54 WIB LOG: MultiXact member wraparound protections are now enabled
2015-11-05 22:15:54 WIB LOG: database system is ready to accept connections
2015-11-05 22:15:54 WIB LOG: autovacuum launcher started
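Reading the log by eye works for a test; for a scripted failover the same check can be automated. A minimal sketch; promotion_complete is a hypothetical helper that only looks for three of the messages shown above and does not verify their order or timestamps. The scratch log stands in for the real file under $PGDATA/pg_log:

```bash
# Succeed if the log shows the trigger file was seen, recovery finished,
# and the server opened for normal (read-write) connections.
promotion_complete() {
  local log="$1"
  grep -q 'trigger file found' "$log" &&
  grep -q 'archive recovery complete' "$log" &&
  grep -q 'database system is ready to accept connections' "$log"
}

# Scratch log with lines taken from the failover output above.
log=$(mktemp)
cat > "$log" <<'EOF'
2015-11-05 22:15:53 WIB LOG: trigger file found: /home/postgres/trigger.file
2015-11-05 22:15:54 WIB LOG: archive recovery complete
2015-11-05 22:15:54 WIB LOG: database system is ready to accept connections
EOF

if promotion_complete "$log"; then
  echo "standby has been promoted"
else
  echo "promotion not finished yet"
fi
```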
TEST3: NORMAL FAILOVER
check latest inserted data, and compare to marked data:
2668fadd-59dc-468a-ae85-65f9750bc336 | 2015-11-05 22:04:15.658038+07 | X
data match!
ensure /home/postgres/trigger.file is deleted automatically by postgres
Test Result: OK
TEST4: NETWORK DISCONNECT
node1: primary
node2: standby
ensure the database is in sync
cut network connection
node1: primary: not in sync
node2: cannot connect to primary
re-connect network
node1: sync
node2: log report started streaming WAL
Test Result: OK
TEST5: NODE1 SHUTDOWN
node1: primary
node2: standby
shutdown node1: primary
startup node1: primary
after startup, automatically sync
check test database, updated periodically
Test Result: OK
THANK YOU