MariaDB Europe Roadshow 2015 - Galera Cluster 4.0 New Features

Galera Cluster New Features
Seppo Jaakola, Codership


www.galeracluster.com

Agenda

● MariaDB Galera Cluster
● Release 3.0 Features
  ● WAN Replication
  ● MySQL Replication Support
● Towards Galera 4
  ● Intelligent Donor Selection
  ● Cluster Crash Recovery
  ● Inconsistency Voting
  ● Huge Transaction Support
  ● Non Blocking DDL


MariaDB Galera Cluster

[Diagram: clients connect to several MariaDB nodes; each MariaDB node uses the WSREP API to talk to the Galera Replication Plugin]

➔ Synchronous Replication
➔ Multi-Master
➔ Reads & Writes to Any Node
➔ Automatic Node Provisioning
➔ No Lost Transactions
➔ No Slave Lag
➔ Scalability
➔ Works in LAN / WAN / Cloud


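Galera Cluster

[Diagram: the DBMS calls the wsrep Replication API through wsrep hooks; the Galera Plugin (wsrep provider) is loaded with dlopen and implements replication and certification on top of the GCS framework, which has group communication backends (vs, gcomm, spread)]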


Replication Plugin

● The replication plugin is runtime loadable:
  ● SET GLOBAL wsrep_provider='none';
  ● SET GLOBAL wsrep_provider='/usr/lib/libgalera_smm.so';
● With no replication plugin specified, the server works as a vanilla MariaDB server
● MariaDB 10.1 will have the wsrep API built in


Galera 3.0

Released Nov 2013, featuring:
● Optimization for WAN replication
  ● The cluster can be divided into segments based on location
● Asynchronous replication topologies
  ● Async replication can be interleaved with Galera replication
  ● Support for MySQL 5.6 GTID
● New write set key format
  ● Makes certification faster and takes less RAM
  ● A step towards huge transaction support
● A number of bug fixes and minor improvements
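To illustrate why location-based segments help over a WAN, here is a rough message-count model. It assumes, as a simplification, that with segments the origin node sends each write set across the WAN once per remote segment and a relay node then forwards it inside that segment, while without segments the origin sends to every remote node directly. The function wan_messages and the node layout are hypothetical, not part of Galera.

```python
from typing import Dict


def wan_messages(segments: Dict[str, int], origin: str, segment_aware: bool) -> int:
    """Count WAN crossings needed to deliver one write set from `origin`.

    segments maps node name -> segment id (for example, one id per data center).
    A message crosses the WAN when sender and receiver are in different segments.
    """
    origin_segment = segments[origin]
    remote_nodes = [n for n, s in segments.items() if s != origin_segment]
    if not segment_aware:
        # Origin sends directly to every node in another location.
        return len(remote_nodes)
    # Origin sends once per remote segment; a relay node forwards it locally.
    return len({segments[n] for n in remote_nodes})


nodes = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1, "c1": 2, "c2": 2}
print(wan_messages(nodes, "a1", segment_aware=False))  # 5
print(wan_messages(nodes, "a1", segment_aware=True))   # 2
```

With three locations, the WAN traffic per write set in this model drops from one copy per remote node to one copy per remote location.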


Towards 4.0

1. 3.x improvements
   ● We will have a constant backlog of 3.0 issues to sort out while 4.0 is under development
2. Test system development
   ● MySQL test suite integration
3. New features
   ● Some new features are going into 3.* releases
   ● Major uplift for 4.0, feature set fixed
   ● Feedback from community & partners


New Features...

Galera 4.0:
● Non Blocking DDL
● Huge transactions by streaming replication
● Inconsistency voting

Galera 3.*:
● Intelligent Donor selection
● Cluster crash recovery

Donor Selection


Better Donor Selection Support

● In 3.0, the SST donor was selected at random
● A new SST “handshake” makes an intelligent donor choice:
  ● Favor a donor which can provide IST
  ● Favor proximity (segment)
  ● Introduced in Galera 3.6
● The SST donor can still be forced with wsrep_sst_donor
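The two preferences above can be sketched as a ranking function. This is a minimal illustration, not Galera's actual selection code: the Node fields, the rule that a donor can serve IST when its cache still holds every write set after the joiner's seqno, the single-name forced donor, and the name tie-break are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    segment: int             # location segment of the node
    gcache_first_seqno: int  # oldest write set still cached (hypothetical field)
    synced: bool = True      # only synced nodes are considered as donors


def can_serve_ist(donor: Node, joiner_seqno: int) -> bool:
    # IST works only if the donor still caches every write set the joiner is missing.
    return donor.gcache_first_seqno <= joiner_seqno + 1


def choose_donor(candidates: List[Node], joiner_seqno: int, joiner_segment: int,
                 forced: Optional[str] = None) -> Optional[Node]:
    synced = [n for n in candidates if n.synced]
    if forced is not None:
        # Like wsrep_sst_donor (simplified here to one name): the named donor wins.
        return next((n for n in synced if n.name == forced), None)
    if not synced:
        return None
    # Rank: IST-capable first, then same segment, then name for a deterministic choice.
    return min(synced, key=lambda n: (not can_serve_ist(n, joiner_seqno),
                                      n.segment != joiner_segment,
                                      n.name))


nodes = [Node("a", 0, 500), Node("b", 1, 100), Node("c", 0, 90)]
print(choose_donor(nodes, joiner_seqno=120, joiner_segment=0).name)  # c
```

Here "c" wins because it can serve IST and sits in the joiner's segment; if no node could serve IST, proximity alone would decide.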

Cluster Recovery


Engine Room Power Out

[Diagram sequence:
1. A three-node cluster is running: Node A, Node B, Node C
2. Power out: all three nodes are down and their state is unknown (? ? ?)
3. Node C is bootstrapped as a new cluster: service mysql start --wsrep-new-cluster
4. Node A and Node B are started normally and rejoin: service mysql start]

Cluster Crash Recovery

● Engine room power out use case: all nodes shut down
  ● A new cluster must be started, and the first node must be elected to bootstrap it
  ● This is a manual operation (error prone)
  ● Other nodes can join back automatically, through either IST or SST


Cluster Crash Recovery

● Configure automatic crash recovery:
  ● pc.recovery=ON
● Nodes maintain the group information in persistent storage
● After shutdown, the full group can start with the same configuration
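A rough model of the idea, assuming each node persists the membership of the last primary component it belonged to, and that the component is restored automatically only once every saved member is reachable again. The data layout and the recovered_primary function are hypothetical; they do not reproduce the provider's state file or its exact rules.

```python
from typing import Dict, FrozenSet, Optional


def recovered_primary(saved_views: Dict[str, FrozenSet[str]],
                      reachable: FrozenSet[str]) -> Optional[FrozenSet[str]]:
    """Return the primary component to restore, or None if nodes must keep waiting.

    saved_views: per node, the membership of the last primary component it
    persisted before the outage (hypothetical representation).
    reachable: nodes that are back up and can talk to each other.
    """
    views = {saved_views[n] for n in reachable if n in saved_views}
    if len(views) != 1:
        # Nothing saved, or nodes disagree about the last primary component.
        return None
    (last_pc,) = views
    # Restore automatically only once every member of the saved component is back.
    return last_pc if last_pc <= reachable else None


pc = frozenset({"A", "B", "C"})
saved = {"A": pc, "B": pc, "C": pc}
print(recovered_primary(saved, frozenset({"A", "B"})))  # None: C is still missing
print(recovered_primary(saved, pc) == pc)               # True
```

Compared with the manual procedure, no operator has to pick a bootstrap node: the saved membership decides when the group is complete.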

Engine Room Power Out

[Diagram sequence:
1. Node A, Node B and Node C each keep the primary component (PC) information in persistent storage
2. Power out: all nodes go down, but each keeps its saved PC information
3. On restart, the nodes reconcile their saved PC information
4. Node A, Node B and Node C form the primary component again]

Huge Transaction Support


Huge Transaction Support

● Currently, a transaction is processed only in the master node until commit time
● For large transactions, the write set will be big and hard to handle
● Maximum supported write set size: 2GB
● There are means to prevent too large transactions:
  ● wsrep_max_ws_rows
  ● wsrep_max_ws_size (not enforced at the moment)
  ● wsrep_provider_options='repl.max_ws_size=#'
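The limits listed above amount to a check made before a write set is replicated. The sketch below is a hypothetical model of that check: the WriteSet type and admit_writeset function are illustrative, treating 0 as "no row limit" is an assumption, and taking the 2GB ceiling as 2**31 - 1 bytes is also an assumption. It is not how the server actually implements wsrep_max_ws_rows or repl.max_ws_size.

```python
from dataclasses import dataclass

# The slide quotes a 2GB maximum write set size; 2**31 - 1 bytes is assumed here.
MAX_WS_SIZE = 2**31 - 1


@dataclass
class WriteSet:
    rows: int        # rows modified by the transaction
    size_bytes: int  # encoded write set size


def admit_writeset(ws: WriteSet, max_rows: int = 0, max_size: int = MAX_WS_SIZE) -> bool:
    """Return True if the write set may be replicated.

    max_rows plays the role of wsrep_max_ws_rows and max_size the role of
    repl.max_ws_size; max_rows == 0 means "no row limit" in this sketch.
    """
    if max_rows and ws.rows > max_rows:
        return False  # too many rows: the transaction is rejected
    # A configured size limit can only lower the hard ceiling, never raise it.
    return ws.size_bytes <= min(max_size, MAX_WS_SIZE)


print(admit_writeset(WriteSet(rows=50_000, size_bytes=64 * 2**20)))                  # True
print(admit_writeset(WriteSet(rows=50_000, size_bytes=64 * 2**20), max_rows=10_000))  # False
```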

Huge Transaction Replication

[Diagram sequence, Node A and Node B connected by Galera replication:
1. A huge transaction (huge trx) runs in Node A
2. At commit, the whole transaction is replicated as a single write set (WS)
3. In Node B, the huge write set is applied while later write sets (WS WS WS) pile up in the slave queue]

Huge Transaction Support

● Galera Cluster 4.0 implements streaming replication:
  ● Possible to replicate transactions of any size
  ● Transaction size limits will remain; the cluster can still reject too large transactions

Streaming Replication

[Diagram sequence, Node A and Node B connected by Galera replication:
1. While the huge transaction (huge trx) is still executing in Node A, it is replicated to Node B in small write sets (WS)
2. Node B receives each write set as it is produced
3. The last write set is replicated together with the commit]

Streaming Replication

● The transaction is replicated in small increments
● The size threshold for replication is configurable
● Replicated rows are locked in all cluster nodes
  ➔ they cannot be conflicted later
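The three points above can be sketched as a loop that cuts a transaction into fixed-size fragments and locks each fragment's rows on every node as soon as it is delivered. This is a toy model under stated assumptions: the threshold is counted in rows, "delivery" is just a set update, and the Node class, replicate_fragment and stream_transaction are hypothetical names, not Galera code.

```python
from typing import Iterable, List, Set


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.locked_rows: Set[int] = set()  # rows held for the streaming transaction


def replicate_fragment(fragment: List[int], nodes: List[Node]) -> None:
    # Once a fragment is delivered, its rows are locked on every node, so the
    # streaming transaction cannot be conflicted on them later.
    for node in nodes:
        node.locked_rows.update(fragment)


def stream_transaction(row_ids: Iterable[int], fragment_rows: int, nodes: List[Node]) -> int:
    """Replicate a transaction in fragments of at most `fragment_rows` rows.

    Returns the number of fragments sent; the last, possibly shorter fragment
    stands in for the one replicated together with the commit.
    """
    if fragment_rows <= 0:
        raise ValueError("fragment_rows must be positive")
    fragment: List[int] = []
    sent = 0
    for row in row_ids:
        fragment.append(row)
        if len(fragment) == fragment_rows:
            replicate_fragment(fragment, nodes)
            sent += 1
            fragment = []
    if fragment:
        replicate_fragment(fragment, nodes)
        sent += 1
    return sent


cluster = [Node("A"), Node("B")]
print(stream_transaction(range(25_000), 10_000, cluster))  # 3
print(len(cluster[1].locked_rows))                          # 25000
```

Because the rows are locked fragment by fragment, no single huge write set ever has to be certified and applied in one go.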

Streaming Replication

[Diagram: the huge transaction streams from Node A to Node B in write sets (WS) while another statement, Update t1....., arrives]

Inconsistency Voting


Inconsistency Voting

● Current policy for inconsistency:
  ● On suspected inconsistency, a cluster node will do an emergency shutdown
  ● (However, DDL failures are logged only as warnings)
  ● An inconsistency injected in one node can cause all other nodes to shut down

Inconsistency Voting

[Diagram sequence, Node A, Node B and Node C connected by Galera replication:
1. CREATE TABLE t1 (i int) is replicated; all three nodes have t1
2. In Node A: SET wsrep_on=OFF; INSERT INTO t1 VALUES (8); the row exists only in Node A
3. In Node A: SET wsrep_on=ON; DELETE FROM t1; the "Del 8" write set is replicated to Node B and Node C
4. Node B and Node C cannot apply "Del 8" and do an emergency shutdown
5. Node A is left alone: quorum lost]

Inconsistency Voting

● Galera Cluster 4.0 will minimize downtime due to suspected inconsistency

● Nodes will communicate through an inconsistency voting protocol if an inconsistency is observed
● The target is to shut down the minimal number of nodes
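A minimal sketch of the voting idea, assuming each node reports the outcome of applying the same write set and a strict majority decides which outcome is "correct". The vote function, the outcome labels, and the rule that a tie makes every node suspect are illustrative assumptions, not the actual Galera 4 protocol.

```python
from collections import Counter
from typing import Dict, Set


def vote(outcomes: Dict[str, str]) -> Set[str]:
    """Return the nodes that should leave the cluster after an apply failure.

    outcomes maps node name -> outcome of applying the same write set
    ("ok" or an error label). The outcome reported by a strict majority wins,
    and only the nodes that disagree with it are shut down. Without a strict
    majority every node is treated as suspect (a conservative assumption).
    """
    if not outcomes:
        return set()
    winner, votes = Counter(outcomes.values()).most_common(1)[0]
    if votes * 2 <= len(outcomes):
        return set(outcomes)
    return {node for node, outcome in outcomes.items() if outcome != winner}


# The scenario from the slides: only Node A has row 8, so only A can apply "Del 8".
print(vote({"A": "ok", "B": "row_not_found", "C": "row_not_found"}))  # {'A'}
```

In the power-of-three example this shuts down one node instead of two, which matches the stated target of minimal downtime.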

Inconsistency Voting

[Diagram sequence, same scenario with voting:
1. Node A has the extra row 8 and runs SET wsrep_on=ON; DELETE FROM t1;
2. Node B and Node C fail to apply the replicated "Del 8"
3. The nodes run inconsistency voting
4. Majority wins: Node B and Node C outvote Node A]

Non Blocking DDL


Non-Blocking DDL

Current DDL replication blocks the whole cluster for the duration of DDL statement processing

Galera Cluster 4.0 optimizes DDL replication in TOI (Total Order Isolation) mode to lock only the affected table

Non-Blocking DDL - TOI

[Diagram sequence, Node A and Node B:
1. ALTER TABLE t1 is issued in Node A
2. The ALTER is replicated as a write set (WS) and gets a seqno in the total order
3. ALTER t1 runs in Node A and Node B
4. While ALTER t1 runs, UPDATE t1 and UPDATE t3 arrive; UPDATE t1 gets its WS seqno and waits behind the ALTER, while UPDATE t3 is not blocked
5. Afterwards, UPDATE t1 and UPDATE t4 are processed]

Non Blocking DDL

● The affected table is locked in all cluster nodes
  ● This table lock is a native MySQL lock; Galera is not adding replication-level locks anymore
● Other tables are accessible to everybody
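The difference between the old and new behaviour can be shown with a small scheduling model. It assumes each write set lists the tables it touches and that a DDL statement's only effect is to make later write sets wait; the WriteSet type and blocked_by_ddl function are hypothetical, and real TOI processing involves more than this.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class WriteSet:
    seqno: int        # position in the cluster-wide total order
    sql: str
    tables: Set[str]  # tables the write set touches


def blocked_by_ddl(ddl: WriteSet, later: List[WriteSet], table_level: bool) -> List[int]:
    """Return seqnos of write sets ordered after `ddl` that must wait for it.

    table_level=False models DDL that blocks the whole cluster; True models
    the 4.0 approach where only the DDL's own table is locked.
    """
    if not table_level:
        return [ws.seqno for ws in later]
    return [ws.seqno for ws in later if ws.tables & ddl.tables]


alter = WriteSet(1, "ALTER TABLE t1 ...", {"t1"})
queue = [
    WriteSet(2, "UPDATE t1 ...", {"t1"}),
    WriteSet(3, "UPDATE t3 ...", {"t3"}),
    WriteSet(4, "UPDATE t4 ...", {"t4"}),
]
print(blocked_by_ddl(alter, queue, table_level=False))  # [2, 3, 4]
print(blocked_by_ddl(alter, queue, table_level=True))   # [2]
```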


Happy Clustering :-)

Thank you for listening!

Huge Transaction Demo Setup

1. Two nodes

2. Steady load of pure autocommit updates to measure trx throughput

3. A huge table with ~1.5M rows

4. Run an update on the huge table to modify all rows

→ Monitor the trx/sec rate in the cluster when the huge transaction kicks in

Impact of Huge Transaction

[Chart: cluster throughput in trx/sec while the huge transaction runs; annotated "Huge Transaction Slave Lag"]

● Trx in master: 24 secs
● Trx in slave: 9 secs

Streaming Replication Demo Setup

1. Same scenario as before

2. Configure node1 to fragment the huge transaction into 10K batches

→ Monitor the trx/sec rate in the cluster as streaming replication progresses
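As a back-of-the-envelope check of this setup: if "10K batches" means 10,000 rows per fragment (an assumption, since the slide does not give the unit) and the table has about 1.5M rows, the update is replicated as roughly 150 fragments instead of one huge write set. fragment_count is a hypothetical helper.

```python
def fragment_count(total_rows: int, rows_per_fragment: int) -> int:
    """Number of fragments needed when every fragment carries rows_per_fragment rows."""
    # Integer ceiling division: a partial last fragment still counts as one.
    return (total_rows + rows_per_fragment - 1) // rows_per_fragment


# ~1.5M-row table from the demo, assuming "10K batches" means 10,000 rows each.
print(fragment_count(1_500_000, 10_000))  # 150
```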

Streaming Replication

[Chart: cluster throughput in trx/sec over time while the huge transaction is streamed; annotated "Streaming Replication: 70 secs"]