MariaDB/MySQL Replication - State of the art
© MariaDB Corporation Ab
MariaDB / MySQL Replication - State of the art
Serge Frezefond, Cloud Solution Architect
[email protected] @sfrezefond
http://serge.frezefond.com
Agenda
• Replication Usage
• Replication improvements
• MariaDB parallel slave apply
• MariaDB Multi-source replication
• MariaDB GTID
• Galera Cluster Synchronous Replication
• MaxScale Binlog Server
• Conclusion
So, how does MariaDB Replication work?
• Replication is asynchronous: data is pulled by the slave from the master as fast as possible
• To know what to replicate to the slave, the master keeps an ordered log, the binlog, of statements to replicate
• The slave keeps track of its position in the master binlog either by a file/position pair or by a global transaction ID
• The slave reads the master binlog and copies it to a relay log on the slave
  • This is done by the slave IO thread
• Then the slave reads the relay log and applies the statements to the slave database
  • This is done by the slave SQL thread
[Diagram: Master (binlog) -> Slave (relay log)]
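The flow described above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the class names, the key/value "events" and the two step methods are hypothetical and do not correspond to any MariaDB API.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    # Ordered log of changes: the "binlog" of this toy model.
    binlog: list = field(default_factory=list)

    def execute(self, key, value):
        # Every change is appended to the binlog in commit order.
        self.binlog.append((key, value))


@dataclass
class Slave:
    read_pos: int = 0    # how far the IO thread has read in the master binlog
    apply_pos: int = 0   # how far the SQL thread has applied the relay log
    relay_log: list = field(default_factory=list)
    data: dict = field(default_factory=dict)

    def io_thread_step(self, master):
        # Pull new events from the master binlog into the local relay log.
        new_events = master.binlog[self.read_pos:]
        self.relay_log.extend(new_events)
        self.read_pos += len(new_events)

    def sql_thread_step(self):
        # Apply, in order, the relay-log events not applied yet.
        for key, value in self.relay_log[self.apply_pos:]:
            self.data[key] = value
        self.apply_pos = len(self.relay_log)


master = Master()
slave = Slave()
master.execute("order_19", 57)
master.execute("order_20", 12)

slave.io_thread_step(master)            # events are in the relay log only
data_before_apply = dict(slave.data)    # still empty: this is the slave lag
slave.sql_thread_step()                 # events are now applied on the slave
```

Separating the two steps shows why a slave can have received an event without having applied it yet, which is the lag discussed later in the deck.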
Replication topologies
• One to many (can have multiple layers and filters)
• Circular / master<->master
  • Any modification hits all servers
• Many to one (called multi-source replication in MariaDB 10 and in the MySQL 5.7 labs release)
• Complex topologies lead to tricky failover
• No conflict resolution included
• Divergence is a nightmare
Replication can be statement based or row based
• Statement based means that the SQL text of the statement is replicated
  • UPDATE orders SET customer_id = 57 WHERE order_id = 19
• Row based means that each row change is replicated as a binary entity which is idempotent
• There is also a MIXED format, which logs statements by default and switches to row format for statements that are not safe to replicate as text
MariaDB / MySQL Replication continuous improvement
• Delivered with MySQL 5.1
  • Row based Replication
• Delivered with MySQL 5.5
  • Semisync Replication
  • Replication Heartbeat
  • Per server replication filtering
• Delivered with MariaDB 5.5
  • Same as MySQL 5.5
  • Added binlog group commit
  • RBR for tables with no primary key
  • Checksums for binlog events
  • START TRANSACTION WITH CONSISTENT SNAPSHOT now also works with the binary log
  • Dynamic variables: replicate_do_*, replicate_ignore_*, replicate_wild_*
• Delivered with MySQL 5.6
  • Optimized Row Based Replication
  • Binlog group commit (already in MariaDB 5.5)
  • Multi-Threaded Slave (per schema)
  • Global Transaction Identifier
  • Crash Safe Slave and Binlog
  • Replication Event Checksums (already in MariaDB 5.5)
• Delivered with MySQL 5.6 (continued)
  • Time Delayed Replication
  • Server UUID
  • Utilities for Failover and Admin
• Delivered with MariaDB 10.0
  • Global Transaction ID
    • With domain_id concept
  • Parallel Replication (not limited to schema)
    • Domain_id can help
  • Multi-source replication
• Coming with MySQL 5.7
  • Non-blocking SHOW SLAVE STATUS
  • Replication info in performance schema
  • Intra-schema multithreaded slave (in MariaDB 10)
  • GTID replication no longer requires binary logs on the slaves (already true with MariaDB 10)
  • GTID history stored in a system table (in MariaDB 10.0)
    • With compaction of GTID sets (not needed in MariaDB 10.0)
• Coming with MySQL 5.7 (continued)
  • Semi-sync with better performance and behavior (Loss-Less)
  • Dump thread no longer needs to lock the binary log
  • Multi-source Replication (same replication filters for all of the masters) (in MariaDB 10) / labs release
  • MySQL Group Replication / labs release
    • Proven technology in MariaDB Galera Cluster
• Coming with MariaDB 10.1
  • Running triggers on the slave for RBR
  • Filtering on domain_id
    • CHANGE MASTER TO DO_DOMAIN_IDS=(1,2);
    • CHANGE MASTER TO IGNORE_DOMAIN_IDS=(1,2);
  • Performance tuning of Parallel Replication and GTID, more benchmarks
  • Integration of Galera multi-master clusters into MariaDB
    • Fully tested, integrated, performance tuned
So, what is MariaDB replication used for?
• High Availability
• Scale-out
• Backup servers
• Disaster Recovery
• Secure way to do online schema change
• Reporting servers / BI
Replication for High-Availability
• MariaDB Replication is asynchronous, which is an issue for HA
• On failover, promotion has to wait for the slave to catch up
• Using MHA makes the process somewhat automatic and less problematic
Replication for Scale-out
• Scale-out is exactly what MariaDB Replication was intended to do from the start
• Use one master that takes the writes and one or more read-only slaves
• Slaves put a low load on the master
• Easy to use and set up
• Often combined with a Load Balancer
[Diagram: Load Balancer]
Replication can optionally be semi-synchronous
• Semi-sync is an option to MariaDB Replication
• Semi-sync means that the master will ensure that a committed transaction is replicated to at least N slaves before the commit is acknowledged
• If there is a failure in replication, then the master can fall back to normal asynchronous replication
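The acknowledgement rule and the fallback described above can be sketched as a small Python function. It is a simplified model, not the server's implementation: the function name, its parameters and the millisecond delays are assumptions made for illustration.

```python
def semi_sync_commit(ack_delays_ms, required_acks, timeout_ms):
    """Model one commit under semi-sync replication.

    ack_delays_ms: time each slave needs to acknowledge (None = never answers).
    required_acks: number of slave acknowledgements to wait for (N >= 1).
    timeout_ms:    how long the master waits before giving up.
    Returns (mode, wait_ms).
    """
    in_time = sorted(
        delay for delay in ack_delays_ms
        if delay is not None and delay <= timeout_ms
    )
    if len(in_time) >= required_acks:
        # The commit only waits for the N-th fastest acknowledgement.
        return ("semi-sync", in_time[required_acks - 1])
    # Not enough acknowledgements in time: fall back to asynchronous mode.
    return ("async-fallback", timeout_ms)


one_ack = semi_sync_commit([78, 1, None], required_acks=1, timeout_ms=1000)
two_acks = semi_sync_commit([78, 1, None], required_acks=2, timeout_ms=1000)
no_ack = semi_sync_commit([None, 5000], required_acks=1, timeout_ms=1000)
```

With one required acknowledgement the commit latency follows the closest slave; requiring two makes it follow the slower one, which is why the network round trip dominates semi-sync performance.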
Without Semi-sync replication, the Master doesn't care about the Slaves
• The master just writes to the binlog, nothing else
• The master also purges the binlogs as needed
• There is no real "cluster" setup where the nodes know about each other and act in a particular manner
Semi-sync replication
• Semi-sync replication (since 5.5 / from Google)
• Not much control: just a timeout
• Very poor performance in 5.5/5.6 and 10.0
• "Enhanced Semisync" implementation done by Zhou Zhenxing
• Improvement in 5.7 as Loss-Less Semisync
• Google working on MariaDB enhanced semisync (https://mariadb.atlassian.net/browse/MDEV-162)
Current Semi-sync behavior
Important to notice:
• Currently, committed data is exposed on the master before the semi-sync acknowledgement is returned to the client
• Phantom read if the master crashes
  • Data has been exposed on the master that does not show up on the slave
[Diagram: Storage Engine Commit -> Slave Receive -> Ack -> Client Ack]
Semisync replication Performance
[Chart not recoverable; surviving data labels: 1691, 13, 11, 347, 200]
Master in us-east-1, slave in us-west-1 (~78ms ping RTT). Benchmark by Jay Jansen
[Chart not recoverable]
Master in us-east-1b, slave in us-east-1d (~1ms ping RTT). Benchmark by Morgan Tocker
Binlog Group Commit vs InnoDB Group Commit
● binlog_commits
  ● Total number of transactions committed to the binary log
● binlog_group_commits
  ● Total number of groups of transactions committed to the binary log
  ● When sync_binlog=1 it is the number of fsync() calls
MariaDB Binlog Group Commit
[Diagram: Binlog Group Commit. Labels: sessions S1 to S4, transactions T1 to T4, binlog files, InnoDB recovery logs; settings innodb_flush_log_at_trx_commit=1 and sync_binlog=1]
MariaDB Binlog Group Commit: huge performance impact
• InnoDB and replication group commit working together
• Limits the number of required fsync() calls (Kristian Nielsen made decisive improvements)
MariaDB Binlog Group Commit: monitoring efficiency
MariaDB > SHOW VARIABLES LIKE 'binlog_commit%';
…
binlog_commit_wait_count 0
binlog_commit_wait_usec 100000
MariaDB > SHOW STATUS LIKE 'binlog_%commits';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Binlog_commits       | 15732 |
| Binlog_group_commits | 1492  |  <- far fewer fsync() calls
+----------------------+-------+
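To read these two counters, one can divide them to get the average number of transactions per group commit. The helper below is a minimal sketch; the function name is hypothetical and the inputs are the values shown in the slide output.

```python
def group_commit_ratio(binlog_commits, binlog_group_commits):
    """Average number of transactions written per binlog group commit."""
    if binlog_group_commits == 0:
        return 0.0  # nothing committed yet, avoid dividing by zero
    return binlog_commits / binlog_group_commits


# Values taken from the SHOW STATUS output above.
ratio = group_commit_ratio(15732, 1492)
print(f"{ratio:.2f} transactions per group commit")
```

A ratio close to 1 means group commit is not helping; here roughly ten transactions share each group, and with sync_binlog=1 each group costs one fsync().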
Big replication pain: Slave lag
• Serial application of binlog events
• Can hurt HA or read consistency
• Prefetching data to have a warm cache
  • mk-slave-prefetch
  • Complex and not efficient enough
• Parallel slave apply is the solution!
Parallel Slave Thread Replication
● Sponsored by Google
● Transactions are applied in parallel if they have been executed in parallel on the master.
● It works beyond the boundaries of the MySQL 5.6 parallel slave (single database)
● Parallel threads apply to:
  ● Queries that are run on the master in one group commit.
  ● Queries that are from different domains.
  ● Queries from different masters (when using multi-source replication).
● slave_parallel_threads
  ● Number of parallel threads on the slave node
● slave_parallel_max_queued
  ● Memory limit for SQL threads
MariaDB GTID + Domain Id: reduce replication lag
The costly ALTER TABLE operation will be run in its own thread without impacting the replication of other transactions (out-of-order parallel replication)
SET SESSION gtid_domain_id=1;
ALTER TABLE t ADD INDEX myidx(b);
SET SESSION gtid_domain_id=0;
MariaDB Global Transaction ID
"3-2-21" is a GTID :-) (domain ID 3, server ID 2, sequence number 21)
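A MariaDB GTID is written as domain ID, server ID and sequence number joined by dashes, and a position is a comma-separated list of GTIDs. The parser below is a minimal sketch of that notation; the `Gtid` type and function names are hypothetical helpers, not part of any client library.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text):
    """Parse one GTID such as '3-2-21' into its three numeric parts."""
    parts = text.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"not a MariaDB GTID: {text!r}")
    return Gtid(*(int(part) for part in parts))


def parse_gtid_list(text):
    """Parse a position such as '1-1-15,0-3-1' into a list of GTIDs."""
    return [parse_gtid(item) for item in text.split(",") if item.strip()]


single = parse_gtid("3-2-21")
position = parse_gtid_list("1-1-15,0-3-1")
```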
MariaDB GTID benefits
• Easy to set up GTID-based replication
  • Position and GTID are always there
• Reconfigure replication topology (multi-level)
• Supports multi-source replication
• Helps to parallelize the slave beyond group commit
CHANGE MASTER TO master_use_gtid = slave_pos;
MariaDB allows switching a configuration to GTID without database unavailability
• In MariaDB 10, GTIDs are always there and can be used or not :-)
• In MySQL 5.6, all servers using GTIDs need to be switched at the same time. They need to be stopped for that :-(
CHANGE MASTER TO master_use_gtid = slave_pos;
> SELECT binlog_gtid_pos('mysqld3-bin.000004', 638);
> 1-1-15,0-3-1
MariaDB Global Transaction ID: Domain id usage
• Helps increase parallelism
• More versatile than schema-based parallelism
• Can be set in a session (currently requires SUPER privilege)
• One domain_id for multiple servers or multiple domain_ids per server (a logical concept, different from server_id)
SET gtid_domain_id = 2;
[Diagram: replication streams across Server 1 to Server 5. The streams A1 A2 A3 A4 and B1 B2 B3 B4 appear interleaved in different ways on different servers:
A1 A2 B1 A3 B2 B3 A4
A1 A2 B1 B2 A3 B3
A1 B1 A2 A3 B2 A4 B3]
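The interleavings in this diagram all keep the events of each stream in their original order, while the order between A and B events is free. The check below is a small sketch of that property, assuming that A and B stand for two replication domains and that event names start with the domain letter; the function name is hypothetical.

```python
def preserves_domain_order(stream, per_domain):
    """True if each domain's events appear in `stream` in their original order."""
    for domain, events in per_domain.items():
        seen = [event for event in stream if event.startswith(domain)]
        # A server may lag behind, so only a prefix of the domain is required.
        if seen != events[:len(seen)]:
            return False
    return True


domains = {
    "A": ["A1", "A2", "A3", "A4"],
    "B": ["B1", "B2", "B3", "B4"],
}
# The three interleaved streams shown in the diagram.
streams = [
    "A1 A2 B1 A3 B2 B3 A4".split(),
    "A1 A2 B1 B2 A3 B3".split(),
    "A1 B1 A2 A3 B2 A4 B3".split(),
]
results = [preserves_domain_order(stream, domains) for stream in streams]
broken = preserves_domain_order("A2 A1 B1".split(), domains)
```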
MariaDB GTID variables
MariaDB [test]> show variables like '%gtid%';
+------------------------+--------------------------------+
| Variable_name          | Value                          |
+------------------------+--------------------------------+
| gtid_binlog_pos        | 0-3-3,4-2-1,2-2-2,1-1-16       |
| gtid_binlog_state      | 0-3-3,4-2-1,2-2-2,1-1-16       |
| gtid_current_pos       | 1-1-16,0-3-3,3-1-4,2-2-2,4-2-1 |
| gtid_domain_id         | 4                              |
| gtid_ignore_duplicates | OFF                            |
| gtid_seq_no            | 0                              |
| gtid_slave_pos         | 1-1-16,0-3-3,3-1-4,2-1-1       |
| gtid_strict_mode       | OFF                            |
| last_gtid              | 4-2-1                          |
+------------------------+--------------------------------+
9 rows in set (0.00 sec)
MariaDB GTID vs master_pos
MariaDB [(none)]> show slave status\G
Master_Host: 127.0.0.1
Master_User: repl
Master_Port: 3309
Master_Log_File: mysqld3-bin.000005
Read_Master_Log_Pos: 344
…
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Using_Gtid: Current_Pos
Gtid_IO_Pos: 1-1-16,0-3-3,3-1-4,2-1-1
MariaDB GTID and mysqldump
mysqldump --master-data --all-databases --single-transaction --user=root -S /var/run/mysqld/mysqld3.sock > all.sql
…
-- MySQL dump 10.15 Distrib 10.0.15-MariaDB, for Linux (x86_64)
-- Position to start replication or point-in-time recovery from
CHANGE MASTER TO MASTER_LOG_FILE='mysqld3-bin.000005', MASTER_LOG_POS=344;
-- GTID to start replication from
-- SET GLOBAL gtid_slave_pos='1-1-16,0-3-3';
MariaDB GTID and mysqldump: fully online operation
• START TRANSACTION WITH CONSISTENT SNAPSHOT is used by mysqldump
• No need for FLUSH TABLES WITH READ LOCK, which can badly hurt a server
mysqldump --single-transaction --master-data
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SHOW STATUS LIKE 'binlog_snapshot%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| Binlog_snapshot_file     | mysqld2-bin.000009 |
| Binlog_snapshot_position | 376                |
+--------------------------+--------------------+
2 rows in set (0.00 sec)
MariaDB GTID and mysqlbinlog
mysqlbinlog mysqld3-bin.000004
# at 793
#141209 15:30:40 server id 1 end_log_pos 831 GTID 1-1-16
/*!100001 SET @@session.gtid_domain_id=1*//*!*/;
/*!100001 SET @@session.server_id=1*//*!*/;
/*!100001 SET @@session.gtid_seq_no=16*//*!*/;
BEGIN
/*!*/;
# at 831
#141209 15:30:40 server id 1 end_log_pos 921 Query thread_id=3 exec_time=0 error_code=0
SET TIMESTAMP=1418139040/*!*/;
insert into titi values(35)
/*!*/;
# at 921
#141209 15:30:40 server id 1 end_log_pos 948 Xid = 51
COMMIT/*!*/;
MariaDB GTID: Master promotion among slaves
• Promoting a new master among a set of slaves
• The new master needs to be ahead of all the other slaves to avoid losing events
MariaDB > select @@session.last_gtid;
| @@session.last_gtid |
| 0-3-234             |
CHANGE MASTER TO master_host="S2";
START SLAVE UNTIL master_gtid_pos = "<S2 GTID position>";
…
CHANGE MASTER TO master_host="Sn";
START SLAVE UNTIL master_gtid_pos = "<Sn GTID position>";
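Before promoting, it helps to see whether one slave is already ahead of all the others. The sketch below compares GTID positions per domain; it assumes that sequence numbers only increase within a domain, and the slave names, positions and function names are hypothetical examples.

```python
def parse_pos(text):
    """Map domain_id -> seq_no for a GTID position such as '0-3-234,1-1-16'."""
    pos = {}
    for item in text.split(","):
        domain, _server, seq = (int(part) for part in item.strip().split("-"))
        pos[domain] = seq
    return pos


def is_ahead_or_equal(a, b):
    """True if position `a` has reached position `b` in every domain of `b`."""
    return all(a.get(domain, 0) >= seq for domain, seq in b.items())


def most_advanced(slaves):
    """Names of slaves that are ahead of (or equal to) every other slave."""
    positions = {name: parse_pos(text) for name, text in slaves.items()}
    return [
        name for name, pos in positions.items()
        if all(is_ahead_or_equal(pos, other) for other in positions.values())
    ]


single_domain = most_advanced({"S1": "0-3-234", "S2": "0-3-230", "S3": "0-3-232"})
two_domains = most_advanced({"S1": "0-3-10,1-1-5", "S2": "0-3-8,1-1-9"})
```

When no slave is ahead in every domain, as in the second call, the candidate still has to catch up from the others, which is what the START SLAVE UNTIL sequence on the slide achieves.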
MariaDB GTID: Robust DDL replication
CREATE TABLE ... SELECT
• With MySQL it is forbidden when @@GLOBAL.ENFORCE_GTID_CONSISTENCY = 1
• With MariaDB it is safe with GTID replication
• CREATE OR REPLACE TABLE was added in MariaDB 10.0.8 to make replication more robust if it has to roll back and repeat statements such as CREATE ... SELECT on slaves
CREATE OR REPLACE TABLE table_name (a int);
instead of
DROP TABLE IF EXISTS table_name;
CREATE TABLE table_name (a int);
slave_ddl_exec_mode=IDEMPOTENT or STRICT
MySQL GTID-related functions: not needed for MariaDB
• MySQL implementation is based on GTID sets
• MariaDB keeps the idea of a position in a stream (per domain ID): no complex GTID set manipulations
• gtid_executed and gtid_purged system variables are GTID sets
• Functions on GTIDs
  • GTID_SUBSET()
  • GTID_SUBTRACT()
MariaDB / MySQL GTID efficiency
• When a slave connects, it needs to do costly GTID set operations
  • Scan all binlogs
  • What if there are 100s of binlogs on the master?
• Errant transactions
  • Require set operations on GTIDs to be solved
• Already applied MySQL GTIDs are stored
  • In the binlog
  • In a table in MySQL 5.7
  • Forever growing if there are holes in the sequence?
MariaDB 10 Scalability: Multi-Source Replication
• Collects data for analytics using built-in replication.
• Aids in administration, for example consolidated backups of multiple databases.
• Uses MariaDB 10's improved Global Transaction ID (GTID).
• Up to 64 masters
Easier analytics, more insight, simpler administration, fewer headaches.
[Diagram: three masters, each with four slaves (Online E-Commerce Application, Content Management System, Click-stream data), feeding one Data Warehouse slave; ETL]
Availability: MySQL ✘, MariaDB ✔ in 10.0
MariaDB 10 Multi-Source Replication
• The --replicate-rewrite-db option can be set on a per-master-connection basis.
[mysqld]
...
master_usa.replicate-rewrite-db=customer->customer_usa
master_emea.replicate-rewrite-db=customer->customer_emea
...
mysql> CHANGE MASTER 'master_usa' TO MASTER_HOST= …
mysql> CHANGE MASTER 'master_emea' TO MASTER_HOST=…
mysql> START ALL SLAVES;
Consistent Read from slave: avoiding stale reads
• With MariaDB in position mode
  • SELECT MASTER_POS_WAIT('master_log_file', master_log_pos [, timeout][, channel_name])
• With MariaDB in GTID mode
  • SELECT MASTER_GTID_WAIT(gtid-list[, timeout])
  • Reads can be sent safely
  • Can be embedded in a proxy (MaxScale)
  • No need to replicate with GTID (always there)
Time Delayed Replication
• Introduced in MySQL 5.6
• Does not exist in MariaDB:
  • Use Percona pt-slave-delay
• Issue with the relay log
  • Not suitable for the SQL thread consumption problem
  • relay-log-info-repository set to TABLE
  • relay-log-recovery = 1
  • Needs to be refetched from the master (if 100s of slaves :-( )
• MariaDB MaxScale Binlog Server can help (no burden on master)
Compatibility between versions
• The principle has always been that an older version should be able to replicate to a newer version.
• MySQL 5.6 to MariaDB 10 replication
  • OK if not in GTID mode
• MariaDB 10 to MySQL 5.6 replication
MariaDB Galera Cluster - HA Replication
• For true High Availability, we want all servers in a consistent state
• This requires synchronous replication
• But HA also requires
  • Controlled failover
  • Cluster status information and management
  • Adding and removing cluster nodes
• To the rescue: MariaDB Galera Cluster
MariaDB Galera Cluster
• MariaDB Galera Cluster is separate from and does not depend on MariaDB Replication
• MariaDB Galera Cluster is a complete HA cluster setup
  • Based on and requires InnoDB
  • Synchronous replication
  • Multi-master with conflict detection
  • All nodes are "cluster aware"
  • Adding and removing nodes is built in
● Read & write access to any node
● Client can connect to any node
● There can be several nodes
● Automatic node provisioning
● Replication is synchronous
[Diagram: three MariaDB nodes connected by Galera Replication]
MariaDB Galera Cluster setup
[Diagram: WSREP Library on each node]
• MariaDB 10.0 version includes WSREP API
• MariaDB 10.1 has this as standard; for 5.5 and 10.0 patched versions are available
• WSREP API installed as a MariaDB plugin
• WSREP does replication, failover and management
MariaDB Galera Cluster and Standard replication
[Diagram: Galera nodes Mgc1, Mgc2, Mgc3 and servers Srv1, Srv2, Srv3; labels: asynchronous replication, synchronous replication]
[Diagram: MariaDB multi-source asynchronous replication combined with synchronous replication]
[Diagram: Galera node Mgc1 and a slave; labels: asynchronous replication, synchronous replication]
Slave can easily bind to a new master with GTID
MariaDB MaxScale Log Server
• A binlog server carries binlogs / no databases
• Small lag / small drain on master
• Easy reconfiguration of topology
MariaDB MaxScale Proxy
• Connection Load Balancing
• RW splitting
• Galera Connection Load Balancing
• MaxScale monitors the backends' status
• Other usages: filtering, auditing, query rewriting …
[Diagram: MaxScale (Technology Preview)]
MariaDB Replication
• Still a great area of improvement and innovation
• Critical for high-scale architectures
  • Sharding + Replication
• Key architecture element for DBaaS in cloud (OpenStack, …)
• Smooth integration of asynchronous and synchronous technologies
  • Tooling important here
Questions?
www.mariadb.com www.facebook.com/mariadb.dbms
www.twitter.com/mariadb