PostgreSQL replication strategies Understanding High Availability and choosing the right solution

47
© Continuent 06/17/22 PostgreSQL replication strategies Understanding High Availability and choosing the right solution [email protected] om [email protected] Slides available at http://sequoia.continuent.org/Resources

description

Slides available at http://sequoia.continuent.org/Resources. PostgreSQL replication strategies Understanding High Availability and choosing the right solution. [email protected] [email protected]. What Drives Database Replication?. - PowerPoint PPT Presentation

Transcript of PostgreSQL replication strategies Understanding High Availability and choosing the right solution

Page 1: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

© Continuent04/21/23

PostgreSQL replication strategies

Understanding High Availability and choosing the right solution

[email protected]

[email protected]

Slides available at http://sequoia.continuent.org/Resources

Page 2: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

2 © Continuent www.continuent.com

What Drives Database Replication?

/ Availability – Ensure applications remain up and running when there are hardware/software failures as well as during scheduled maintenance on database hosts

/ Read Scaling – Distribute queries, reports, and I/O-intensive operations like backup, e.g., on media or forum web sites

/ Write Scaling – Distribute updates across multiple databases, for example to support telco message processing or document/web indexing

/ Super Durable Commit – Ensure that valuable transactions such as financial or medical data commit to multiple databases to avoid loss

/ Disaster Recovery – Maintain data and processing resources in a remote location to ensure business continuity

/ Geo-cluster – Allow users in different geographic locations to use a local database for processing with automatic synchronization to other hosts

Page 3: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

3 © Continuent www.continuent.com

High availability

/ The magic nines

Percent uptime Downtime/month Downtime/year

99.0% 7.2 hours 3.65 days

99.9% 43.2 minutes 8.76 hours

99.99% 4.32 minutes 52.56 minutes

99.999% 0.43 minutes 5.26 minutes

99.9999% 2.6 seconds 31 seconds

Page 4: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

4 © Continuent www.continuent.com

Few definitions

/ MTBF• Mean Time Between Failure• Total MTBF of a cluster must combine MTBF of its

individual components• Consider mean-time-between-system-abort (MTBSA)

or mean-time-between-critical-failure (MTBCF)

/ MTTR• Mean Time To Repair• How is the failure detected?• How is it notified?• Where are the spare parts for hardware?• What does your support contract say?

Page 5: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

5 © Continuent www.continuent.com

Outline

/ Database replication strategies

/ PostgreSQL replication solutions

/ Building HA solutions

/ Management issues in production

Page 6: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

6 © Continuent www.continuent.com

/ Clients connect to the application server

/ Application server builds web pages with data coming from the database

/ Application server clustering solves application server failure

/ Database outage causes overall system outage

Internet

Database

DatabaseDisk

Applicationservers

Problem: Database is the weakest link

Page 7: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

7 © Continuent www.continuent.com

Disk replication/clustering

/ Eliminates the single point of failure (SPOF) on the disk

/ Disk failure does not cause database outage

/ Database outage problem still not solved

Internet

Database

Database disks

Applicationservers

Page 8: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

8 © Continuent www.continuent.com

/ Multiple database instances share the same disk

/ Disk can be replicated to prevent SPOF on disk

/ No dynamic load balancing

/ Database failure not transparent to users (partial outage)

/ Manual failover + manual cleanup needed

Internet

Databases DatabaseDisks

Applicationservers

Database clustering with shared disk

Page 9: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

9 © Continuent www.continuent.com

Master/slave replication

/ Lazy replication at the disk or database level

/ No scalability

/ Data lost at failure time

/ System outage during failover to slave

/ Failover requires client reconfiguration

Internet

MasterDatabase

DatabaseDisks

Applicationservers

Slave Database

log shippinghot standby

Page 10: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

10 © Continuent www.continuent.com

Internet

Web frontend

App. server Master

Scaling the database tierMaster-slave replication

/ Pros• Good solution for disaster recovery with remote slaves

/ Cons• failover time/data loss on master failure• read inconsistencies• master scalability

Page 11: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

11 © Continuent www.continuent.com

Internet

/ Pros• consistency provided by multi-master replication

/ Cons• atomic broadcast scalability• no client side load balancing• heavy modifications of the database engine

Atomicbroadcast

Scaling the database tierAtomic broadcast

Page 12: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

12 © Continuent www.continuent.com

Scaling the database tier – SMP

Internet

Web frontend

App. server

Well-known database

vendor here

Database

Well-known hardware +database vendors here

/ Pros• Performance

/ Cons• Scalability limit• Limited reliability• Cost

Page 13: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

13 © Continuent www.continuent.com

Internet

/ Pros• no client application modification• database vendor independent• heterogeneity support• pluggable replication algorithm• possible caching

/ Cons• latency overhead• might introduce new deadlocks

Middleware-based replication

Page 14: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

14 © Continuent www.continuent.com

/ Failures can happen • in any component• at any time of a request execution• in any context (transactional, autocommit)

/ Transparent failover • masks all failures at any time to the client • perform automatic retry and preserves consistency

Internet

Sequoia

Transparent failover

Page 15: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

15 © Continuent www.continuent.com

Outline

/ Database replication strategies

/ PostgreSQL replication solutions

/ Building HA solutions

/ Management issues in production

Page 16: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

16 © Continuent www.continuent.com

Feature pgpool-I pgpool-II PGcluster-I PGcluster-II Slony-I Sequoia

Replication type

Hot standby Multi-master

Multi-master Shared disk Master/Slave Multi-master

Commodity hardware

Yes Yes Yes No Yes Yes

Application modifications

No No Yes if reading from slaves

No Yes if reading from slaves

Client driver update

Database modifications

No No Yes Yes No No

PG support >=7.4 Unix >=7.4 Unix 7.3.9, 7.4.6, 8.0.1 Unix

8.? Unix only?

>= 7.3.3 All versions

Data loss on failure

Yes Yes? No? No Yes No

Failover on DB failure

Yes Yes Yes No if due to disk

Yes Yes

Transparent failover

No No No No No Yes

Disaster recovery

Yes Yes Yes No if disk Yes Yes

Queries load balancing

No Yes Yes Yes Yes Yes

PostgreSQL replication solutions compared

Page 17: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

17 © Continuent www.continuent.com

Feature pgpool-I pgpool-II PGcluster-I PGcluster-II Slony-I Sequoia

Read scalability

Yes Yes Yes Yes Yes Yes

Write scalability

No No No Yes No No

Query parallelization

No Yes No No? No No

Replicas 2 up to 128 LB or replicator limit

SAN limit unlimited unlimited

Super durable commit

No Yes No Yes No Yes

Add node on the fly

No Yes Yes Yes Yes (slave) Yes

Online upgrades

No No Yes No Yes (small downtime)

Yes

Heterogeneous clusters

PG >=7.4 Unix only

PG >=7.4 Unix only

PG PG PG>=7.3.3 Yes

Geo-cluster support

No No Possible but don’t use

No Yes Yes

PostgreSQL replication solutions compared

Page 18: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

18 © Continuent www.continuent.com

Performance vs Scalability

/ Performance• latency different from throughput

/ Most solutions don’t provide parallel query execution• No parallelization of query execution plan• Query do not go faster when database is not loaded

/ What a perfect load distribution buys you• Constant response time when load increases• Better throughput when load surpasses capacity of a single

database

Page 19: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

19 © Continuent www.continuent.com

Understanding scalability (1/2)

Performance vs. Time

0

50

100

150

200

250

300

350

400

450

500

00:00:00 01:12:00 02:24:00 03:36:00 04:48:00 06:00:00 07:12:00 08:24:00 09:36:00 10:48:00

Time (sec.)

Re

sp

on

se

tim

e

1 Database - Load in users

1 Database - Response time

Sequoia 2 DBs - Load in users

Sequoia 2 DBs - Response time

20 users

Single DB

Sequoia

Page 20: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

20 © Continuent www.continuent.com

Understanding scalability (2/2)

Performance vs. Time

0

500

1000

1500

2000

2500

00:00:00 00:28:48 00:57:36 01:26:24 01:55:12 02:24:00 02:52:48

Time (sec.)

Re

sp

on

se

tim

e

1 DB - Load in users

1 DB - Response time

Sequoia 2DB - Load in users

Sequoia 2 DB - Response time

90 users

Single DB

Sequoia

Page 21: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

21 © Continuent www.continuent.com

RAIDb Concept: Redundant Array of Inexpensive Databases

/ RAIDb controller – creates single virtual db, balances load

/ RAIDb 0,1,2: various performance/fault tolerance tradeoffs

/ New combinations easy to implement

tables2 & 3

table ...

RAIDb controller

table n-1table 1 table n

SQL

• partitioning (whole tables)• no duplication• no fault tolerance• at least 2 nodes

RAIDb-0• mirroring• performance bounded by

write broadcast• at least 2 nodes• uni/cluster certifies only RAIDb-1

RAIDb-1

Full DB

RAIDb controller

SQL

Full DB Full DB Full DB Full DB

• partial replication• at least 2 copies of each

table for fault tolerance• at least 3 nodes

table x table y

tables x & yFull DB table z

SQL

RAIDb controller

RAIDb-2

Page 22: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

22 © Continuent www.continuent.com

JVM

Sequoia JDBC driver

Sequoiacontroller

JVM

PostgreSQLJDBC Driver PostgreSQL

Sequoia architectural overview

/ Middleware implementing RAIDb• 100% Java implementation• open source (Apache v2 License)

/ Two components• Sequoia driver (JDBC, ODBC, native lib)• Sequoia Controller

/ Database neutral

Page 23: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

23 © Continuent www.continuent.com

Sequoia Controller

Derby

Sequoia driver

Derby

Virtual database 1

Database Backend

Connection Manager

Database Backend

Connection Manager

Request Manager

Query result cache

Scheduler

Load balancer

Derby JDBC driver

Derby JDBC driver

Recovery Log

Authentication Manager

Derby

Database Backend

Connection Manager

Derby JDBC driver

Sequoia driver

Client application (Servlet, EJB, ...)

Client application (Servlet, EJB, ...)

connect myDBconnect login, passwordexecute SELECT * FROM t

ordering

exec

RR, WRR, LPRF, … get connection from poolupdate cache

(if available)

Sequoia read request

Page 24: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

24 © Continuent www.continuent.com

Sequoia Controller

Distributed Request Manager

Sequoia Controller

Distributed Request Manager

Sequoia driver

Virtual database 1

Database Backend

Connection Manager

Database Backend

Connection Manager

Derby JDBC driver

Derby JDBC driver

Virtual database 2

Database Backend

Connection Manager

Database Backend

Connection Manager

Request Manager

Query result cache

Scheduler

Load balancer

Authentication Manager

Derby JDBC driver

Sequoia driver

Client application (Servlet, EJB, ...)

Sequoia driver

Client application (Servlet, EJB, ...)

Client application (Servlet, EJB, ...)

Request Manager

Query result cache

Scheduler

Load balancer

Authentication Manager

Recovery Log

Recovery Log

Derby Derby Derby

Recovery Database

Embedded Derby

Derby JDBC driver

Derby

Recovery Database

Embedded Derby

Database Backend

Connection Manager

Database Backend

Connection Manager

Derby JDBC driver

Derby

Derby JDBC driver

Derby

Database Backend

Connection Manager

Database Backend

Connection Manager

Derby JDBC driver

Derby JDBC driver

Derby Derby

jdbc:sequoia://node1,node2/myDB

Total order reliable multicast

Sequoia write request

Page 25: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

25 © Continuent www.continuent.com

Alternative replication algorithms

/ GORDA API• European consortium defining API for pluggable replication

algorithms

/ Sequoia 3.0 GORDA compliant prototype for PostgreSQL

• Uses triggers to compute write-sets• Certifies transaction at commit time• Propagate write-sets to other nodes

/ Tashkent/Tashkent+• Research prototype developed at EPFL• Uses workload information for improved load balancing

/ More information• http://sequoia.continuent.org• http://gorda.di.uminho.pt/

Page 26: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

26 © Continuent www.continuent.com

PostgreSQL specific issues

/ Indeterminist queries• Macros in queries (now(), current_timestamp, rand(), …)• Stored procedures, triggers, …• SELECT … LIMIT can create non-deterministic results in UPDATE statements if

the SELECT does not have an ORDER BY with a unique index:UPDATE FOO SET KEYVALUE=‘x’ WHERE ID IN (SELECT ID FROM FOO WHERE KEYVALUE IS NULL LIMIT 10)

/ Sequences• setval() and nextval() are not rollback• nextval() can also be called within SELECT

/ Serial type

/ Large objects and OIDs

/ Schema changes

/ User access control • not stored in database (pg_hba.conf)• host-based control might be fooled by proxy• backup/restore with respect to user rights

/ VACUUM

Page 27: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

27 © Continuent www.continuent.com

Outline

/ Database replication strategies

/ PostgreSQL replication solutions

/ Building HA solutions

/ Management issues in production

Page 28: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

28 © Continuent www.continuent.com

Simple hot-standby solution (1/3)

/ Virtual IP address + Heartbeat for failover

/ Slony-I for replication

Page 29: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

29 © Continuent www.continuent.com

Simple hot-standby solution (2/3)

/ Virtual IP address + Heartbeat for failover

/ Linux DRDB for replication

/ Only 1 node serving requests Client ApplicationsClient Applications

Virtual IP

Postgres

Linux OS

DRBD

Heartbeat

/dev/drbd0

/dev/drbd0

Postgres

Linux OS

DRBD

Heartbeat

Page 30: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

30 © Continuent www.continuent.com

Simple hot-standby solution (3/3)

/ pgpool for failover

/ proxy might become bottleneck• requires 3 sockets per client connection• increased latency

/ Only 1 node serving requests

Client ApplicationsClient Applications

pgpool

Postgres1

Postgres2

Page 31: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

31 © Continuent www.continuent.com

Internet

/ Apache clustering• L4 switch, RR-DNS, One-IP techniques, LVS, Linux-HA, …

/ Web tier clustering• mod_jk (T4), mod_proxy/mod_rewrite (T5), session replication

/ PostgreSQL multi-master clustering solution

Highly available web site

mod-jkRR-DNS

Page 32: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

32 © Continuent www.continuent.com

Internet

/ Consider MTBF (Mean time between failure) of every hardware and software component

/ Take MTTR (Mean Time To Repair) into account to prevent long outages

/ Tune accordingly to prevent trashing

Sequoia

Highly available web applications

Page 33: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

33 © Continuent www.continuent.com

Building Geo-Clusters

Database

Database

Database

Database

Database

Database

America masterEurope slave

Asia slave

America slaveEurope master

Asia slave

America slaveEurope slaveAsia master

asynchronousWAN replication

Page 34: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

34 © Continuent www.continuent.com

Split brain problem (1/2)

/ This is what you should NOT do:• At least 2 network adapters in controller• Use a dedicated network for controller communication

Database

Database

Database

Database

Client servers

Controllers Databases

Networkswitch

eth0

eth0eth1

eth1

eth2

eth2

Page 35: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

35 © Continuent www.continuent.com

Split brain problem (2/2)

/ When controllers lose connectivity clients may update inconsistently each half of the cluster

/ No way to detect this scenario (each half thinks that the other half has simply failed)

Database

Database

Database

Database

Client servers

Controllers Databases

Networkswitch

eth0

eth0eth1

eth1

eth2

eth2

Page 36: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

36 © Continuent www.continuent.com

Avoiding network failure and split-brain

/ Collocate all network traffic using Linux Bonding

/ Replicate all network components (mirror the network configuration)

/ Various configuration options available for bonding (active-backup or trunking)

Database

Database

Database

Database

Client servers

Controllers Databases

eth1eth0

bond0

eth1eth0

bond0

bond0 eth0eth1

bond0 eth0eth1

bond0eth0eth1

bond0eth0eth1

bond0eth0eth1

bond0eth0eth1

Page 37: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

37 © Continuent www.continuent.com

Synchronous GeoClusters

/ Multi-master replication requires group communication optimized for WAN environments

/ Split-brain issues will happen unless expensive reliable dedicated links are used

/ Reconciliation procedures are application dependent

DB 6DB 5

DB native JDBC driver

DB 7

Sequoia driver

DB 1 DB 2

DB native JDBC driver

DB 3

DB native JDBC driver

DB 4

Sequoia controller Full replication

Sequoia controller Full replication

Sequoia controller Full replication

Sequoia controller Full replication

Sequoia driverJVM

Client program

Sequoia driver

JVM

Client program

Sequoia driver

JVM

Client program

Sequoia driver

Sequoia driver

DB 9

DB native JDBC driver

DB 10

Sequoia controller Full replication

DB 8

DB 12

DB native JDBC driver

DB 13

Sequoia controller Full replication

DB 11

Page 38: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

38 © Continuent www.continuent.com

Outline

/ Database replication strategies

/ PostgreSQL replication solutions

/ Building HA solutions

/ Management issues in production

Page 39: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

39 © Continuent www.continuent.com

Managing a cluster in production

/ Diagnosing reliably cluster status

/ Getting proper notifications/alarms when something goes wrong

• Standard email or SNMP traps• Logging is key for diagnostic

/ Minimizing downtime• Migrating from single database to cluster• Expanding cluster• Staging environment is key to test

/ Planned maintenance operations• Vacuum• Backup• Software maintenance (DB, replication software, …)• Node maintenance (reboot, power cycle, …)• Site maintenance (in GeoCluster case)

Page 40: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

40 © Continuent www.continuent.com

Dealing with failures

/ Sotfware vs Hardware failures• client application, database, replication software, OS, VM, …• power outage, node, disk, network, Byzantine failure, …• Admission control to prevent trashing

/ Detecting failures require proper timeout settings

/ Automated failover procedures• client and cluster reconfiguration• dealing with multiple simultaneous failures• coordination required between different tiers or admin scripts

/ Automatic database resynchronization / node repair

/ Operator errors• automation to prevent manual intervention• always keep backups and try procedures on staging environment first

/ Disaster recovery• minimize data loss but preserve consistency• provisioning and planning are key

/ Split brain or GeoCluster failover• requires organization wide coordination• manual diagnostic/reconfiguration often required

Page 41: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

41 © Continuent www.continuent.com

Summary

/ Different replication strategies for different needs

/ Performance ≠ Scalability

/ Manageability becomes THE major issue in production

Page 42: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

42 © Continuent www.continuent.com

/ pgpool: http://pgpool.projects.postgresql.org/

/ PGcluster: http://pgcluster.projects.postgresql.org/

/ Slony: http://slony.info/

/ Sequoia: http://sequoia.continuent.org

/ GORDA: http://gorda.di.uminho.pt/

/ Slides: http://sequoia.continuent.org/Resources

http://www.continuent.org

Links

Page 43: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

© Continuent04/21/23

Bonus slides

Page 44: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

44 © Continuent www.continuent.com

RAIDb-2 for scalability

/ limit replication of heavily written tables to subset of nodes

/ dynamic replication of temp tables / reduces disk space requirements

DB native JDBC driverDB native JDBC driver DB native JDBC driver

Sequoia controller RAIDb-2

Sequoia controller RAIDb-2

Sequoia controller RAIDb-2

Sequoia driver

Client program

Sequoia driver

Client program

Sequoia driver

Client program

RO + temp tables

All tables RO tables RO tablesAll tablesWO sub1 tables

RO + temp tables

WO sub2 tables

Page 45: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

45 © Continuent www.continuent.com

RAIDb-2 for heterogeneous clustering

/ Migrating from MySQL to Oracle

/ Migrating from Oracle x to Oracle x+1

DB native JDBC driverDB native JDBC driver Oracle 11h driver

Sequoia controller RAIDb-2

Sequoia controller RAIDb-2

Sequoia controller RAIDb-2

Sequoia driver

Client program

Sequoia driver

Client program

Sequoia driver

Client program

Oracle migrated

tables

MySQL Old tables

Oracle new apps

MySQL Old tables

Oracle driverMySQL driver MySQL driver Oracle driver

Oracle new apps

Oracle migrated

+ new apps

Page 46: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

46 © Continuent www.continuent.com

Server farms with master/slave db replication

/ No need for group communication between controller

/ Admin. operations broadcast to all controllers

RW

Client application node 1

Sequoia controller 1 ParallelDB

Sequoia driver

...

RO RORO

MySQL master

MySQL slave

MySQL slave

MySQL slave

MySQL JDBC driver

Client application node 2

Sequoia driver

Client application node 3

Sequoia driver

Client application node n-1

Sequoia driver

Client application node n

Sequoia driver

Sequoia controller 2 ParallelDB

MySQL JDBC driver

...Sequoia controller x

ParallelDB

MySQL JDBC driver

Page 47: PostgreSQL replication strategies Understanding High Availability and choosing the right solution

47 © Continuent www.continuent.com

Composing Sequoia controllers

/ Sequoia controller viewed as single database by client (app. or other Sequoia controller)

/ No technical limit on composition deepness

/ Backends/controller cannot be shared by multiple controllers

/ Can be expanded dynamically

RO RORO

RAC RAC MySQL master

MySQL slave

MySQL slave

MySQL slave

RAC RAC

SAN

DB native JDBC driver

Sequoia controller ParallelDB

DB native JDBC driver

Sequoia controller ParallelDB

Sequoia driver

Sequoia controller RAIDb-1

DB native JDBC driver

Sequoia controller RAIDb-2

Sequoia driver

Sequoia controller RAIDb-1

DB native driver

DB

DB DB DB