MariaDB Galera Cluster - Percona...13 Inconsistency for Master-Slave Master node is always the...

Post on 03-Apr-2020

23 views 2 download

Transcript of MariaDB Galera Cluster - Percona...13 Inconsistency for Master-Slave Master node is always the...

MariaDB Galera ClusterDealing with Inconsistency Issues

Seppo JaakolaCodership

2

www.galeracluster.com

➢ Seppo Jaakola➢ One of the Founders of Codership

➢ Codership – Galera Replication developers➢ Partner of MariaDB for developing and supporting MariaDB

Galera Cluster➢ Galera releases since 2009

3

www.galeracluster.com

Galera Project

WSREP API

MySQLCommunity edition

Galera Replication Plugin

Galera Cluster for MySQL

WSREP API

MariaDB

merge

MariaDB Galera Cluster

R&D

4

www.galeracluster.com

MariaDB Galera Cluster

● MGC releases based on MariaDB 5.5 and 10.0

● Since MariaDB 10.1, Galera is inbuilt in MariaDB● If wsrep_provider is not specified, works as native MariaDB

● And will be present in MariaDB 10.2, 10.3...

R&D

5

www.galeracluster.com

Agenda

● Master-Slave vs Multi-Master Replication Topologies

● Galera Replication Overview● Reasons for Inconsistencies

● Bad SST● Cluster misuse

● Effect – the Harm Done● How to Detect Inconsistencies● Inconsistency Recovery● Galera Inconsistency Voting Protocol

Inconsistency in Master-Slave and Multi-Master Topologies

7

www.galeracluster.com

Master-Slave Async Replication

MariaDBMaster

MariaDBSlave

Relaylog

Binlog

MariaDBSlave

Relaylog

8

www.galeracluster.com

Master-Slave Async Replication

read & write read only

MariaDBMaster

MariaDBSlave

Relaylog

Binlog

MariaDBSlave

Relaylog

9

www.galeracluster.com

Master-Slave Async Replication

read & write read only

MariaDBMaster

MariaDBSlave

Relaylog

Binlog

MariaDBSlave

Relaylog

Master node is trusted dataSource

And is used for recovery

10

www.galeracluster.com

Master-Slave Async Replication

read & write

MariaDBMaster

MariaDBSlave

Relaylog

Binlog

MariaDBSlave

Relaylog

If slave nodes arenot used for reading,

No immediate inconsistency issues

11

www.galeracluster.com

Master-Slave Async Replication

read & write read only

MariaDBMaster

MariaDBSlave

Relaylog

Binlog

MariaDBSlave

Relaylog

If slave nodes get readsBut reads are not used for

anything critical,No problems for business logic

12

www.galeracluster.com

Master-Slave Async Replication

MariaDBMaster

MariaDBSlave

Relaylog

Binlog

MariaDBSlave

Relaylog

If reads from slave nodes areused as input for writes,

Inconsistency may violate business logic

read only write

13

www.galeracluster.com

Inconsistency for Master-Slave

● Master node is always the trusted data source● Recovery from inconsistency will be straight forward

● If slave nodes are not used for reading, inconsistency issue does not cause immediate problem

● If slaves are used for reading, application will get false data

● But this is safe, is these reads are not used for anything critical

● If slave read results are trusted, and used as input for later write operations business logic, the application business logic will be violated as well

14

www.galeracluster.com

MariaDB MariaDB MariaDB

Multi-Master Replication

Galera ReplicationReplication is synchronous

read & write read & write read & write

Client can connect to any node

There can be several nodes

Read & write access to any node

15

www.galeracluster.com

MariaDB

Multi-Master Replication

a

Multi-master cluster looks like one big database with multiple entry points

read & write read & write read & write

16

www.galeracluster.com

MariaDB

Multi-Master Replication

a

Adding more nodesOpens new connectionports

read & write read & writeread & write read & write

18

www.galeracluster.com

Inconsistency for Multi-Master

● The effect of inconsistency more prominent● Inconsistency may immediately violate the business logic

19

www.galeracluster.com

MariaDB MariaDB MariaDB

Inconsistency for Multi-Master

Galera Replication

read & write read & write read & write

Reads from compromised node will immediately hurt business logic

20

www.galeracluster.com

MariaDB MariaDB MariaDB

Inconsistency for Multi-Master

Galera Replication

read & write read & write read & write

Reads from compromised node will immediately hurt business logic

Reasons for Inconsistency

23

www.galeracluster.com

Reasons for Inconsistency

● Bad SST● Full cluster Snapshot State Transfer is external operation

(to cluster), which happens through SST API● Cluster picks a donor node to help the joining node to get

same state as active cluster● 3 main variants: mysqldump, rsync and xtrabackup based

versions, (and more to come, MariaBackup)● Improper SST leaves joining node with inconsistent

database

● Cluster misuse● wsrep_on session and global variable, requires SUPER

privileges, tells if wsrep replication plugin should replicate the transaction or not

● sql_log_bin session variable, skips binlogging for the session in master node. With no binlog events, Galera has nothing to replicate

24

www.galeracluster.com

Reasons for Inconsistency

● Replication filtering● binlog_do* No binlogging in master node● replicate_do* slave node does not apply incoming events

● Slave lag● Not real inconsistency, but replication may be slow● if reads happen from too old version of data, the

consequence for application is same as having inconsistent database

● Bug

25

www.galeracluster.com

Consistent Reads from a Cluster

● Galera Cluster orders transactions and forces them to commit in strictly same order in every cluster node

● Cluster has builtin flow control, which makes nodes’ commit pace to be very close to each other

● gcs.fc_limit, gcs.fc_factor

● If client has only one connection (to one node), it will see and operate with safe data access

26

www.galeracluster.com

MariaDB MariaDB

Consistent Reads from a Cluster

403938 434241 464544 494847

27

www.galeracluster.com

Consistent Reads from a Cluster

● If client has several connections, or connects through proxy or load balancer, which can direct client’s connections to several cluster nodes, there may happen read causality issue

28

www.galeracluster.com

MariaDB MariaDB

Read Causality

Galera Replication Replication is “synchronous”

INSERT INTO mydata

INS

29

www.galeracluster.com

MariaDB MariaDB

Read Causality

Galera Replication

INSERT INTO mydata SELECT FROM mydata

INS

30

www.galeracluster.com

MariaDB MariaDB

Read Causality

Galera Replication

Causality between INSERT and SELECT

INSERT INTO mydata SELECT FROM mydata

INS

31

www.galeracluster.com

MariaDB MariaDB

Read Causality

Galera Replication

Proxy may use several connectionsAnd trigger read causality

INSERT INTO mydata SELECT FROM mydata

INS

MaxScale

32

www.galeracluster.com

Read Causality

● Can be avoided by wsrep_sync_wait

● Makes session to wait for slave queue flushing before issuing next read

● Sync wait can be applied for:● Select, show, begin, update, delete

● Makes session’s all reads somewhat slower, should be used only when read causality is really needed

● Read causality may look like inconsistency, at first glance, but however, cluster is totally consistent

33

www.galeracluster.com

Triggers, Stored Procedures, Events

● Triggers fire in master node only, and data manipulation effects will be replicated as ROW events

● Same applies for Stored Procedures, they are processed in master node only

● Events should be carefully used. They will execute in all nodes, where they are enabled

● Managing in which node event should be enabled or slaveside_disabled, may be error prone, especially when nodes drop and join the cluster often

34

www.galeracluster.com

Foreign Key Constraint

● FK constraints add new level for data coherence in the cluster

● Galera enforces FK constraints in all cluster nodes, and any problem in constraint validation will lead to node emergency shutdown

● Supporting FK constraints in Galera replication was a major effort in Galera release 2 (year 2011-2012)

● FK violation in slave node, shows up as cluster inconsistency

35

www.galeracluster.com

Async Replication

● Galera Cluster node can operate as slave node for external MariaDB master

● Incoming native replication is treated like any client connection

● However, it is possible to configure async replication to be “pre-ordered” replication, and this type of replication stream, is trusted and no certification is performed

● This may be a vulnerability for consistency, if not carefully used

● See: wsrep_preordered

The Effect of Inconsistency

37

www.galeracluster.com

The Harm Done

● MariaDB asynchronous replication slave thread stops for an error● But, it can be configured to accept certain errors in replication

● e.g. with: --slave-skip-errors● Note that, if replication errors are not dealt with, slaves

may become inconsistent and the inconsistency just gets worse by the time

● Galera is very strict about inconsistency:● Errors in DDL are neglected● Any error in regular DML transaction replication will

cause emergency abort for the node● Errors also in foreign key constraint validation, will cause

emergency abort

38

www.galeracluster.com

MariaDB MariaDB MariaDB

Inconsistency for Multi-Master

Galera Replication

read & write read & write read & write

Error in applying will trigger emergency abort

WSWS

39

www.galeracluster.com

MariaDB MariaDB MariaDB

Inconsistency for Multi-Master

Galera Replication

read & write read & write

Error in applying will trigger emergency abort

40

www.galeracluster.com

MariaDB

Inconsistency for Multi-Master

Galera Replication

Every failed write set will be logged in GRA_x_y.dat log file x=thread ID y=seqno

WS

Binlogevents

GRA_x_y.dat

41

www.galeracluster.com

Could Some Errors Be Neglected?

● Failure to delete a row, when the row is already deleted

● Failure to insert a row, when exactly same row already exists

● In failing node, we should allow reads● This requirement is implemented by

wsrep_dirty_reads

42

www.galeracluster.com

Could Some Errors Be Neglected?

● Application implements certain business logic

● There are reads, to get knowledge of the current state, and based on results of reads, application does certain write operations

Begin Select… Select… Select… Update… Update…Commit

reads

writes

Business logic

43

www.galeracluster.com

Could Some Errors Be Neglected?

● Any replication error is a sign of data inconsistency● Even a delete of non existing row

● Inconsistency may be anything between a fresh event or a hidden inconsistency from system startup time

● Detecting an inconsistency is always “after the fact”,business logic may have been violated for ages already

● Therefore strict measures are needed, dirty data may not be used anymore

44

www.galeracluster.com

Could Some Errors Be Neglected?

● Should failing node stay in the cluster instead of shutting down?

● Should we allow reads from non primary node?● This is a long term wish, and now supported by option:

wsrep_dirty_reads ● Is it wise to read from non primary node?

● No● Why DDL errors are neglected?

● Galera replicates DDL up-front as verbatim SQL statement

● Some application frameworks generate impossible DDL on regular basis, like DROP TABLE <non-existent>

45

www.galeracluster.com

About Foreign Key Constraint

● FK constraint adds extra discipline for database content

● If data inconsistency is about to happen, FK constraints may make it surface earlier

● Supporting FK constraints, in Galera Cluster is somewhat complicated, because:

● Galera allows multi-master access● Replication slaves are highly parallel

● For this, we have to carefully control, when and how FK constraints are enforced, and what kind of multi-master operation must be rejected as FK violation

Recovering from Inconsistency

47

www.galeracluster.com

How to Recover from Inconsistency

● Practically only SST helps

● However, troubleshooting analysis is needed for figuring out which node has the most reliable data

● Also, the reason for the inconsistency should be found out

● This may help to prevent that similar problem would happen in future

● Unfortunately, analysis of inconsistency is probably the hardest problem with database clusters

● There is no indication of when the inconsistency was brought to the system

● And, often there are no signs in the logs about the incdent which caused the inconsistency

48

www.galeracluster.com

How to Recover from Inconsistency

● Can you tune your database consistent by tolerating the applying errors and hoping that RBR events will eventually write over bad data?

● In multi-master topology, absolutely NO

● In master-slave topology, some good progress might happen

● But not necessarily● Delete and inserts would work correctly● But updates, which cannot match ROW before image

would fail● And reads from this node would be always a compromise

Detecting Inconsistencies

50

www.galeracluster.com

Detecting Inconsistencies up-front

● To prevent sudden inconsistency emergencies, it would be good to harvest the database before hand for inconsistencies

● Early inconsistency detection would:• Reduce the harm of the inconsistency for business logic● Allow better time for dealing with the issue● However, the found inconsistency is equally bad to

troubleshoot● Also, the constant database scanning takes some

resources, and can itself cause issues for the cluster use● But how can you check data consistency in a live cluster, with transactions processing in high speed

51

www.galeracluster.com

Galera Consistency Checking

● Galera has inbuilt a method for running consistency checking in isolated transaction sequence slot

● Consistency checking support is performed for statements:

• INSERT...SELECT ● REPLACE...SELECT● Having special version comment /* !99997 */

CREATE TABLE consistency (id INT AUTO_INCREMENT, checksum VARCHAR(32), … )

INSERT INTO consistency (checksum,...) SELECT MD5(GROUP_CONCAT(i,j)) FROM mytable /* !99997 */

52

www.galeracluster.com

Galera Consistency Checking

MariaDB MariaDB MariaDB

Galera Replication

read & write read & write read & write

WS947

WS948

WS949

WS950

WS951

WS952

WS953

WS945

WS946

53

www.galeracluster.com

Galera Consistency Checking

MariaDB MariaDB MariaDB

Galera Replication

read & write read & write read & write

WS956

WS948

WS949

WS950

WS951

WS952

WS953

WS954

WS955

INSERT...SELECT… /* !99997 */

54

www.galeracluster.com

Galera Consistency Checking

MariaDB MariaDB MariaDB

Galera Replication

read & write read & write read & write

WS956

WS957

WS958

WS959

WS960

WS961

WS962

WS954

WS955

INSERT...SELECT… /* 99997 */

WS963

WS964

WS965

55

www.galeracluster.com

Galera Consistency Checking

MariaDB MariaDB MariaDB

Galera Replication

read & write read & write read & write

WS956

WS957

WS958

WS959

WS960

WS961

WS962

WS955

WS963

WS964

WS965

56

www.galeracluster.com

Galera Consistency Checking

● The INSERT...SELECT is run in total order isolation (TOI)

● It will block all commits until the INSERT..SELECT has completed

● This may have impact on overall transaction throughput, avoid too large select result set

● Pt-table-checksum tool uses Galera consistency checking support, and scans automatically all tables for inconsistencies

Optimizing Inconsistency Shutdown

58

www.galeracluster.com

Optimizing Inconsistency Shutdown

● Current Policy for Inconsistency:● For suspected inconsistency, cluster node will do emergency shutdown

● However, DDL failures are logged only as warnings

● Injected inconsistency in one node can cause all other nodes to shutdown

59

www.galeracluster.com

Inconsistency Shutdown

Galera Replication

Node A Node B Node C

Create table t1 (i int)

60

www.galeracluster.com

Inconsistency Shutdown

Galera Replication

Node A Node B Node C

t1 t1 t1

Create table t1 (i int)

61

www.galeracluster.com

Inconsistency Shutdown

Galera Replication

Node A Node B Node C

Set wsrep_on=OFFInsert into t values (8)

t1 t1 t1

8

62

www.galeracluster.com

Inconsistency Shutdown

Node A Node B Node C

Set wsrep_on=ONDelete from t;

t1 t1 t1

8

Del 8

Del 8

63

www.galeracluster.com

Inconsistency Shutdown

Node A Node B Node C

t1 t1 t1

Set wsrep_on=ONDelete from t;

Del 8

Del 8

64

www.galeracluster.com

Inconsistency Shutdown

Node A Node B Node C

t1 t1 t1

Node A, remains in minority and changes to

Non-Primary

65

www.galeracluster.com

Optimizing Consistency Shutdown

● Codership has developed a method to optimize node emergency shutdowns due to suspected inconsistency, to happen only for minimal set of compromised nodes

● When inconsistency is observed, nodes will communicate through consistency voting protocol to compare which nodes face the similar issue in applying

● Target is to find the majority, which have same understanding about the fate of the offending write set

66

www.galeracluster.com

Inconsistency Shutdown

Node A Node B Node C

8t1 t1 t1

Set wsrep_on=ONDelete from t;

Del 8

Del 8

Consistency Voting

67

www.galeracluster.com

Consistency Voting

Node A Success

Node B Error: ‘row not found’

Node C Error: ‘row not found’

68

www.galeracluster.com

Inconsistency Shutdown

Node A Node B Node C

t1 t1 t1

Consistency Voting

69

www.galeracluster.com

Galera Consistency Voting Protocol

● With consistency voting, Galera Cluster can mitigate the harm of inconsistency for the cluster

● In the best case, only one node has to abort, and majority can continue operating normally

● However, the database has been inconsistent, for indefinitely long period, and application business logic may have been hurt

70

www.galeracluster.com

Summary

● Database inconsistency is a critical problem and very hard to track down

● It affects both MS and MM topologies● But, inconsistency problem is immediate in MM

● It is possible to scan Galera Clusters for inconsistencies up-front

● Recovery requires full database copy

● Galera Consistency Voting protocol will help, but only in saving the cluster to remain in service

74

www.galeracluster.com

4

Happy Clustering :-)Thank you for listening!