Lessons Learned: Troubleshooting Replication

April 11, 2017, Sveta Smirnova


Sveta Smirnova

∙ MySQL Support engineer
∙ Author of
    ∙ MySQL Troubleshooting
    ∙ JSON UDF functions
    ∙ FILTER clause for MySQL
∙ Speaker
    ∙ Percona Live, OOW, Fosdem, DevConf, HighLoad...

Table of Contents

∙ Typical Replication Errors
∙ MySQL and MariaDB Replication: Must Know
∙ Master
∙ Slave IO thread
∙ Slave SQL thread
∙ Multithreaded slave
∙ Multi-master

Typical Replication Errors

Replication Stopped

Slave Lags Behind the Master

Master Increases Resource Usage

Not a Full List

MySQL and MariaDB Replication: Must Know

Asynchronous

Master:
∙ Sends the packet ->

Slave:
∙ <- Initiates
∙ <- Requests a packet
∙ ... ?

Semisynchronous Plugin

Master:
∙ Sends the packet ->
∙ Waits for "Ack"

Slave:
∙ <- Initiates
∙ <- Requests a packet
∙ <- Sends "Ack"

Logical

Master:
∙ Receives a change
∙ Sends to SE ->
∙ Writes into binary log
∙ Synchronizes ->

Storage Engine:
∙ Writes into table
∙ <- Returns control
∙ <- Synchronizes

Two Kinds of Slave Threads

IO thread:
∙ Reads from the master
∙ Stores in the relay log

SQL thread:
∙ <- Reads from the relay log
∙ Executes

Several SQL Threads

∙ Multiple SQL threads in MariaDB 10.0.5+ / MySQL 5.6+
∙ From the troubleshooting point of view
    ∙ Single IO thread
    ∙ Single relay log
    ∙ Slave lag still possible
    ∙ Error in one thread stops all

Multi-Source (Multi-Channel)

∙ Multiple masters in MariaDB 10.0.1+ / MySQL 5.7+
∙ From the troubleshooting point of view
    ∙ Multiple sets of relay logs
    ∙ Multiple IO threads
    ∙ Multiple SQL threads
    ∙ MySQL: slave_parallel_workers for each channel
    ∙ Channels/sources are independent
    ∙ Error in one stops only one
    ∙ No automatic conflict resolution

Position-Based

∙ You must specify
    ∙ Name of the master’s binary log file
    ∙ Position
∙ From the troubleshooting point of view
    ∙ An event executes if it is at the current position
    ∙ Easy to skip
    ∙ Easy to move position backward
    ∙ No conflict resolution
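A minimal sketch of pointing a slave at an explicit binary log file and position. The host, port, log file name and position are borrowed from the examples elsewhere in this deck; the 'repl' account and its password are hypothetical placeholders.

```sql
-- Run on the slave.
-- 'repl' / 'repl_password' are assumed names, not taken from the talk.
STOP SLAVE;

CHANGE MASTER TO
    MASTER_HOST     = '127.0.0.1',
    MASTER_PORT     = 13000,
    MASTER_USER     = 'repl',
    MASTER_PASSWORD = 'repl_password',
    MASTER_LOG_FILE = 'master-bin.000001',  -- name of the master's binary log file
    MASTER_LOG_POS  = 989;                  -- position to start reading from

START SLAVE;
```

Moving the position backward is the same statement with a smaller MASTER_LOG_POS. Nothing stops the slave from re-applying events it has already executed, which is what "no conflict resolution" means in practice.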

Global Transaction Identifiers (GTID)

∙ Each transaction has a unique number: GTID
    ∙ MySQL: AUTO_POSITION=1
    ∙ MariaDB: master_use_gtid = { slave_pos | current_pos }
∙ No need to specify binary log and position
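As a sketch, switching an already configured slave to GTID-based positioning might look like this. It assumes connection parameters (host, user, password) were set earlier and, for MySQL, that gtid_mode is already ON on both servers.

```sql
-- MySQL 5.6+: let the slave negotiate the position from its GTID set.
STOP SLAVE;
CHANGE MASTER TO MASTER_AUTO_POSITION = 1;
START SLAVE;

-- MariaDB 10.0.2+: the equivalent option is master_use_gtid.
-- slave_pos is one of the two values named on the slide; current_pos is the other.
STOP SLAVE;
CHANGE MASTER TO master_use_gtid = slave_pos;
START SLAVE;
```

Run only the variant that matches your server. The two statements are not interchangeable between MySQL and MariaDB.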

Statement-Based Binary Log Format

Client:
∙ INSERT INTO ... ->

Binary log:
∙ SET TIMESTAMP...
∙ SET sql_mode...
∙ INSERT INTO ...

Row-Based Binary Log Format

Client:
∙ UPDATE ... ->

Binary log:
∙ SET TIMESTAMP...
∙ SET sql_mode...
∙ Row before changes
∙ Row with changes

Main Instruments

∙ Error log file
∙ At the slave
    ∙ SHOW SLAVE STATUS
    ∙ MySQL: Tables in Performance Schema
    ∙ System database mysql
∙ At the master
    ∙ SHOW MASTER STATUS
    ∙ SHOW BINLOG EVENTS
    ∙ mysqlbinlog
∙ Percona Toolkit
∙ MySQL Utilities
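The master-side statements listed above can be combined into a quick inspection session. This is only an illustration: the log file name and start position are borrowed from the error-log example in the appendix and will differ on a real server.

```sql
-- Run on the master.
SHOW MASTER STATUS;   -- current binary log file and position
SHOW BINARY LOGS;     -- all binary log files the master still keeps

-- List a few events starting at the position where the slave stopped.
-- The FROM value must be the start of an event, e.g. a position
-- reported by the slave or by mysqlbinlog.
SHOW BINLOG EVENTS IN 'master-bin.000001' FROM 989 LIMIT 10;
```

For row-based events SHOW BINLOG EVENTS shows only event types and positions. mysqlbinlog is the tool for looking at the row content itself.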

Replication Must Know: Summary

∙ Always available, requires setup
∙ Asynchronous
∙ Master
    ∙ Keeps all changes in the binary log
      Two formats: ROW and STATEMENT
∙ Slave
    ∙ IO thread reads from the master into relay log
    ∙ SQL thread executes updates
      Multiple SQL threads in MariaDB 10.0.5+ / MySQL 5.6+
      Multiple channels/sources (masters) in MariaDB 10.0.1+ / MySQL 5.7+
∙ GTID in MariaDB 10.0.2+ / MySQL 5.6+

Master

Performance

∙ More writes
    ∙ binlog_row_image = FULL | MINIMAL | NOBLOB
    ∙ binlog_cache_size
      Watch Binlog_cache_disk_use
    ∙ binlog_stmt_cache_size
      Watch Binlog_stmt_cache_disk_use
∙ Synchronization
    ∙ sync_binlog
    ∙ Do not disable!
    ∙ You may set it greater than 1
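One way to watch the counters named above is sketched here. The new cache size at the end is a hypothetical value chosen for illustration, not a recommendation from the talk.

```sql
-- Run on the master.
-- How often transactions used the binary log cache, and how often
-- they spilled to a temporary file on disk:
SHOW GLOBAL STATUS LIKE 'Binlog_cache%';
SHOW GLOBAL STATUS LIKE 'Binlog_stmt_cache%';

-- Current cache sizes and binary log synchronization setting:
SHOW GLOBAL VARIABLES LIKE 'binlog%cache_size';
SHOW GLOBAL VARIABLES LIKE 'sync_binlog';

-- If Binlog_cache_disk_use keeps growing, a larger cache may help.
-- 1048576 (1 MiB) is an assumed example value.
SET GLOBAL binlog_cache_size = 1048576;
```

The cache is allocated per connection, so raising it multiplies across all clients that write transactions.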

Behavior

∙ Binary log lifetime
    ∙ expire_logs_days
∙ Synchronization
    ∙ SBR is not safe with READ COMMITTED and READ UNCOMMITTED
∙ Order of records in the binary log
    ∙ Non-deterministic events and SBR

Slave IO thread

Network

∙ SHOW SLAVE STATUS

    Slave_IO_Running: Connecting
    Slave_SQL_Running: Yes
    ...
    Last_IO_Errno: 1045
    Last_IO_Error: error connecting to master '[email protected]:13000' - retry-time: 60 retries: 1
    Last_SQL_Errno: 0
    Last_SQL_Error:
    ...
    Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
    Master_Retry_Count: 86400
    Master_Bind:
    Last_IO_Error_Timestamp: 160824 03:18:36
    Last_SQL_Error_Timestamp:

∙ P_S.replication_connection_status

    mysql> select * from performance_schema.replication_connection_status\G
    *************************** 1. row ***************************
    CHANNEL_NAME:
    GROUP_NAME:
    SOURCE_UUID:
    THREAD_ID: NULL
    SERVICE_STATE: CONNECTING
    COUNT_RECEIVED_HEARTBEATS: 0
    LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
    RECEIVED_TRANSACTION_SET:
    LAST_ERROR_NUMBER: 1045
    LAST_ERROR_MESSAGE: error connecting to master '[email protected]:13000' - retry-time: 60 retries: 4
    LAST_ERROR_TIMESTAMP: 2016-08-24 03:21:36
    1 row in set (0,01 sec)

∙ Error log

    2016-08-24T00:18:36.077384Z 3 [ERROR] Slave I/O for channel '': error connecting to master '[email protected]:13000' - retry-time: 60 retries: 1, Error_code: 1045
    2016-08-24T00:19:36.299011Z 3 [ERROR] Slave I/O for channel '': error connecting to master '[email protected]:13000' - retry-time: 60 retries: 2, Error_code: 1045
    2016-08-24T00:20:36.485315Z 3 [ERROR] Slave I/O for channel '': error connecting to master '[email protected]:13000' - retry-time: 60 retries: 3, Error_code: 1045
    2016-08-24T00:21:36.677915Z 3 [ERROR] Slave I/O for channel '': error connecting to master '[email protected]:13000' - retry-time: 60 retries: 4, Error_code: 1045
    2016-08-24T00:22:36.872066Z 3 [ERROR] Slave I/O for channel '': error connecting to master '[email protected]:13000' - retry-time: 60 retries: 5, Error_code: 1045

∙ Access

    $ perror 1045
    MySQL error code 1045 (ER_ACCESS_DENIED_ERROR): Access denied for user '%-.48s'@'%-.64s' (using password: %s)

    ∙ Connect with the MySQL client using the slave’s login and password

    $ mysql -h127.0.0.1 -P13000 -uroot -pbar
    Warning: Using a password on the command line interface can be insecure.
    ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)

    ∙ SHOW GRANTS

    mysql> SHOW GRANTS;
    +----------------------------------+
    | Grants for foo@%                 |
    +----------------------------------+
    | GRANT SELECT ON *.* TO 'foo'@'%' |
    +----------------------------------+

    ∙ Fix privileges on the master
    ∙ Restart slave
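The last two steps above could look like the following sketch. It assumes the replication account is the 'foo'@'%' user shown in the SHOW GRANTS output, which has only SELECT and therefore lacks the privilege a slave needs. Error 1045 can also mean a wrong password, in which case the fix is on the slave side, in CHANGE MASTER TO ... MASTER_PASSWORD.

```sql
-- On the master: give the replication account the privilege it needs.
GRANT REPLICATION SLAVE ON *.* TO 'foo'@'%';
SHOW GRANTS FOR 'foo'@'%';

-- On the slave: restart replication so the IO thread reconnects
-- immediately instead of waiting for the next retry.
STOP SLAVE;
START SLAVE;
```

After the restart, Slave_IO_Running in SHOW SLAVE STATUS should change from Connecting to Yes if access was the only problem.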

Performance

∙ Regular performance troubleshooting
∙ Check with command line client
∙ Troubleshooting hardware resource usage webinar

Slave SQL thread

SQL thread: typical issues

∙ One master - one slave
    ∙ Different data
      Slave cannot execute event from the relay log
    ∙ Different errors on master and slave
    ∙ Slave lags behind the master
∙ Circular replication and other writes in addition to SQL thread
    ∙ Different data

Different Data

∙ Did table change outside of the replication?
    ∙ How?
    ∙ Can it cause conflict with changes on the master?
∙ Are table structures identical?
    ∙ Percona Toolkit
      pt-table-checksum, pt-table-sync
    ∙ MySQL Utilities
      MySQL: mysqlrplsync
      mysqldbcompare, mysqldiff
∙ Are changes in the correct order?
    ∙ mysqlbinlog
    ∙ Application logic on the master
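For a single suspicious table, a quick manual comparison is possible before reaching for the tools named above. This sketch uses the m2.t1 table from the error-log example in the appendix and is not a replacement for pt-table-checksum: the results are only comparable when no writes hit the table between the two runs.

```sql
-- Run the same statements on the master and on the slave,
-- then compare the output by eye or with diff.

-- Are table structures identical?
SHOW CREATE TABLE m2.t1;

-- Is the data identical? A differing checksum means the rows differ.
CHECKSUM TABLE m2.t1;
```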

Updates in the wrong order

∙ Only with SBR
∙ Row-level locks
∙ Triggers
    ∙ SET GLOBAL sql_slave_skip_counter (without GTIDs only!)
    ∙ Skip the transaction (with GTIDs)
    ∙ Synchronize tables!
∙ Different options: for old versions
    ∙ Start slave with master’s options
    ∙ Restart SQL thread
    ∙ Most issues are fixed in recent versions
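The two ways of skipping a failed event mentioned above can be sketched as follows. The GTID value is the one from the failed worker in the multithreaded-slave example later in this deck. On a real server, take it from the error message or from SHOW SLAVE STATUS. Committing an empty transaction under the failing GTID is the usual manual technique in MySQL and is an addition here, not something the slide spells out.

```sql
-- Position-based replication (no GTIDs): skip one event group.
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;

-- MySQL with GTIDs: sql_slave_skip_counter does not apply.
-- Instead, commit an empty transaction with the failing GTID so the
-- slave considers it already executed.
STOP SLAVE;
SET GTID_NEXT = 'd318bc17-66dc-11e6-a471-30b5c2208a0f:4986';
BEGIN;
COMMIT;
SET GTID_NEXT = 'AUTOMATIC';
START SLAVE;
```

Either way the skipped change is lost on the slave, which is why the slide insists on synchronizing the tables afterwards.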

Slave lags behind the master

∙ Threads
    ∙ Master executes changes in multiple threads
    ∙ Slave uses one
∙ Seconds_behind_master increases: you cannot rely on this number 100%!
∙ Tune slave performance
    ∙ Multi-threaded slave
      One thread for one database in MySQL 5.6
      There may be conflicts between multiple slave SQL threads
    ∙ Indexes on the slave
      Makes sense for SBR only

Multithreaded slave

Performance

∙ Single relay log
∙ Speed in high concurrent environment may be less than on master
∙ MySQL: slave_parallel_workers
∙ MySQL: slave_parallel_type=DATABASE | LOGICAL_CLOCK
∙ MariaDB: slave_parallel_threads
∙ MariaDB: slave_parallel_max_queued
∙ MariaDB: slave_domain_parallel_threads
∙ MariaDB: slave_parallel_mode=optimistic | conservative | aggressive | minimal | none
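A sketch of enabling parallel apply with the variables listed above. The worker count of 4 is an arbitrary example value. The right number depends on the workload and should be found by measuring.

```sql
-- MySQL 5.7: new values take effect when the SQL thread is restarted.
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_type    = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 4;   -- assumed example value
START SLAVE SQL_THREAD;

-- MariaDB 10.1+: the slave must be stopped to change these settings.
STOP SLAVE;
SET GLOBAL slave_parallel_mode    = 'optimistic';
SET GLOBAL slave_parallel_threads = 4;   -- assumed example value
START SLAVE;
```

Run only the variant for your server. SET GLOBAL does not survive a restart, so a setting that proves useful also belongs in the configuration file.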

Wrong Behavior

∙ Same methods as for single-threaded
∙ Error of one thread stops all

    mysql> select WORKER_ID, SERVICE_STATE, LAST_SEEN_TRANSACTION, LAST_ERROR_NUMBER,
        -> LAST_ERROR_MESSAGE from performance_schema.replication_applier_status_by_worker\G
    *************************** 1. row ***************************
    WORKER_ID: 1
    SERVICE_STATE: OFF
    LAST_SEEN_TRANSACTION: d318bc17-66dc-11e6-a471-30b5c2208a0f:4988
    LAST_ERROR_NUMBER: 0
    LAST_ERROR_MESSAGE:
    *************************** 2. row ***************************
    WORKER_ID: 3
    SERVICE_STATE: OFF
    LAST_SEEN_TRANSACTION: d318bc17-66dc-11e6-a471-30b5c2208a0f:4986
    LAST_ERROR_NUMBER: 1032
    LAST_ERROR_MESSAGE: Worker 2 failed executing transaction...

    MariaDB [test]> select id, command, time, state, progress from information_schema.processlist
        -> where user='system user';
    +----+---------+------+------------------------------------------------------------------+
    | id | command | time | state                                                            |
    +----+---------+------+------------------------------------------------------------------+
    | 25 | Connect | 4738 | Waiting for master to send event                                 |
    | 24 | Connect | 5096 | Slave has read all relay log; waiting for the slave I/O thread t |
    | 23 | Connect |    0 | Waiting for work from SQL thread                                 |
    | 22 | Connect |    0 | Unlocking tables                                                 |
    | 21 | Connect |    0 | Update_rows_log_event::ha_update_row(-1)                         |
    | 20 | Connect |    0 | Waiting for prior transaction to start commit before starting ne |
    | 19 | Connect |    0 | Update_rows_log_event::ha_update_row(-1)                         |
    | 18 | Connect |    0 | Update_rows_log_event::ha_update_row(-1)                         |
    | 17 | Connect |    0 | Update_rows_log_event::find_row(-1)                              |
    ...

Multi-master

Specifics

∙ Replication must be set up for each channel/source
∙ You may use masters with and without GTID at the same time
∙ Same issues as with regular replication
∙ MySQL: Filters work for all channels
∙ MariaDB: You may set up filters for each source
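Setting up one named channel (MySQL) or connection (MariaDB) might look like the sketch below. The channel name 'master-1' appears in the error-log example in the appendix. The host and port come from the earlier examples, and the 'repl' account and its password are hypothetical.

```sql
-- MySQL 5.7: every statement takes a FOR CHANNEL clause.
-- Multi-source replication requires master_info_repository=TABLE
-- and relay_log_info_repository=TABLE on the slave.
CHANGE MASTER TO
    MASTER_HOST          = '127.0.0.1',
    MASTER_PORT          = 13000,
    MASTER_USER          = 'repl',           -- assumed account
    MASTER_PASSWORD      = 'repl_password',  -- assumed password
    MASTER_AUTO_POSITION = 1
    FOR CHANNEL 'master-1';
START SLAVE FOR CHANNEL 'master-1';
SHOW SLAVE STATUS FOR CHANNEL 'master-1';

-- MariaDB 10.0+: the connection name follows the command name.
CHANGE MASTER 'master-1' TO
    MASTER_HOST     = '127.0.0.1',
    MASTER_PORT     = 13000,
    MASTER_USER     = 'repl',
    MASTER_PASSWORD = 'repl_password',
    master_use_gtid = slave_pos;
START SLAVE 'master-1';
SHOW SLAVE 'master-1' STATUS;
```

Repeat the block with a different name and different connection parameters for each additional master. Because channels are independent, each one is started, stopped and diagnosed on its own.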

Summary

∙ Issues on the master
    ∙ Same as for standalone server
    ∙ More writes and consistency checks
∙ Slave IO thread
    ∙ Common network issues
    ∙ mysql command line client for tests
∙ Slave SQL thread
    ∙ Regular query-related issues
    ∙ Regular storage engine issues
    ∙ Fewer execution threads than on master

Time for questions

Thank You!

http://www.slideshare.net/SvetaSmirnova
https://twitter.com/svetsmirnova
https://github.com/svetasmirnova

Appendix: Replication in Details

Asynchronous: Slave Q&A

∙ Is data up to date?
∙ Are these the same?
    ∙ Table structures
    ∙ Storage Engine
    ∙ Data
∙ Any write can break replication

Semi-synchronous Replication

∙ Writes on master are slower than with asynchronous replication
∙ How many "Ack"s does the master wait for?
    ∙ Before 5.7: from a single slave
    ∙ Now in MySQL: rpl_semi_sync_master_wait_for_slave_count
    ∙ Does not wait for the others
∙ What does "Ack" mean?
    ∙ Event is written into relay log
    ∙ No guarantee it is executed
∙ What happens in case of timeout?
    ∙ Replication becomes asynchronous
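The behavior described above can be observed on the master with the semisynchronous plugin's own status and system variables. This sketch assumes the master-side plugin is already installed and enabled. The count of 2 is an example value.

```sql
-- Run on the master.
-- ON while the master waits for acknowledgements; switches to OFF
-- after a timeout, i.e. when replication fell back to asynchronous.
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status';

-- Number of commits that were not acknowledged by a slave.
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_no_tx';

-- How long (in milliseconds) the master waits before giving up.
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master_timeout';

-- MySQL 5.7+: wait for acknowledgements from two slaves instead of one.
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 2;
```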

Logical Replication

∙ Every change written twice:
    SE files: logs, data, ...
    Binary Log
∙ You can write on slave

Just for Comparison: Physical Replication

∙ Does not exist in MySQL/MariaDB!
    ∙ There are two closed-source solutions
∙ Master writes only into SE files
∙ Which are replicated to slave
∙ From the troubleshooting point of view
    ∙ IO: changes are written only once
    ∙ You cannot write on slave in parallel
    ∙ Any data inconsistency leads to replication break

Two Kinds of Threads, Two Kinds of Issues

∙ Data transfer
∙ Execution
∙ Different
    ∙ Diagnostics
    ∙ Fixes

GTID

∙ Guaranteed that every transaction will be executed only once
∙ Simple failover
∙ It is not easy to skip a transaction
    ∙ MySQL: use mysqlslavetrx
    ∙ MariaDB: set global gtid_slave_pos='X-Y-Z';
∙ Be careful with expire_logs_days!

Binary Log Formats

∙ Statement-based (SBR)
    ∙ Queries are written as received
    ∙ There is a risk of data inconsistency (non-safe)
      INSERT IGNORE
      LIMIT without ORDER BY
      Non-deterministic functions
      ...
∙ Row-based (RBR)
    ∙ Usually more data are written
      IO
      Transfer speed
      binlog_row_image
    ∙ Performance may be worse if table does not have primary (unique) key
      MariaDB may use any index
∙ Mixed
    ∙ Advantages of both formats
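Checking and changing the format on a running master can be sketched like this. Choosing ROW here is only an example, not a recommendation from the talk.

```sql
-- Run on the master.
SHOW GLOBAL VARIABLES LIKE 'binlog_format';     -- STATEMENT, ROW or MIXED
SHOW GLOBAL VARIABLES LIKE 'binlog_row_image';  -- FULL, MINIMAL or NOBLOB

-- Switch to row-based logging. The global value applies to new
-- connections only; existing sessions keep their current format.
SET GLOBAL binlog_format = 'ROW';
```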

Appendix: Main Tools by Example

∙ Slave start
∙ Errors

2016-08-23T12:11:21.867440Z 4 [ERROR] Slave SQL for channel 'master-1': Could not execute Update_rows event on table m2.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 1213, Error_code: 1032
2016-08-23T12:11:21.867471Z 4 [Warning] Slave: Can't find record in 't1' Error_code: 1032
2016-08-23T12:11:21.867484Z 4 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 989

∙ Slave stop

Error log

53
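The SQL thread error above carries everything needed to locate the failing event: the channel, the event type, the table, the error code, and the master log coordinates. A minimal sketch of pulling those fields out of such a line follows. The function name `parse_sql_thread_error` and the regular expression are assumptions written against this one sample (with straight quotes, as the server writes them), so other error messages will need their own patterns.

```python
import re

# Sample line taken from the error log excerpt on this slide.
LINE = (
    "2016-08-23T12:11:21.867440Z 4 [ERROR] Slave SQL for channel 'master-1': "
    "Could not execute Update_rows event on table m2.t1; "
    "Can't find record in 't1', Error_code: 1032; "
    "handler error HA_ERR_END_OF_FILE; "
    "the event's master log master-bin.000001, end_log_pos 1213, Error_code: 1032"
)

# Pattern written for this message shape only (an assumption, not a full grammar).
PATTERN = re.compile(
    r"\[ERROR\] Slave SQL for channel '(?P<channel>[^']*)': "
    r"Could not execute (?P<event>\w+) event on table (?P<table>[\w.]+); "
    r".*Error_code: (?P<code>\d+); "
    r".*master log (?P<log>\S+), end_log_pos (?P<end_pos>\d+)"
)


def parse_sql_thread_error(line):
    """Extract channel, event type, table, error code and binlog coordinates."""
    match = PATTERN.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["code"] = int(info["code"])
    info["end_pos"] = int(info["end_pos"])
    return info


if __name__ == "__main__":
    print(parse_sql_thread_error(LINE))
```

The extracted log name and end_log_pos are the values to pass to SHOW BINLOG EVENTS or mysqlbinlog when inspecting the event.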

All information about slave
∙ IO thread Configuration
∙ SQL thread Configuration
∙ IO thread Status
∙ SQL thread Status
∙ Errors
    Only last one
    All are in the error log

SHOW SLAVE STATUS

54

mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 127.0.0.1
...
            Master_Log_File: master-bin.000002
        Read_Master_Log_Pos: 63810611
             Relay_Log_File: [email protected]
              Relay_Log_Pos: 1156
      Relay_Master_Log_File: master-bin.000001
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
...
Replicate_Wild_Ignore_Table:
                 Last_Errno: 1032
                 Last_Error: Could not execute Update_rows event on table m2.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 1213
               Skip_Counter: 0
...

SHOW SLAVE STATUS

54
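When SHOW SLAVE STATUS does have to be parsed, the vertical output is simple enough to read into a dictionary. The sketch below does that for a shortened copy of the output above and reports which thread is stopped. The helper names `parse_vertical` and `stopped_threads` are assumptions for illustration, not part of any MySQL tool.

```python
# Shortened copy of the vertical output shown on this slide.
SAMPLE = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 63810611
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 1032
"""


def parse_vertical(output):
    """Turn vertical client output into a dict of field name -> value."""
    fields = {}
    for line in output.splitlines():
        if line.startswith("*") or ":" not in line:
            continue  # skip row separators and lines without a field name
        name, _, value = line.partition(":")  # split on the first colon only
        fields[name.strip()] = value.strip()
    return fields


def stopped_threads(fields):
    """Return the names of the replication threads that are not running."""
    checks = (("IO", "Slave_IO_Running"), ("SQL", "Slave_SQL_Running"))
    return [name for name, key in checks if fields.get(key) != "Yes"]


if __name__ == "__main__":
    status = parse_vertical(SAMPLE)
    print("stopped:", stopped_threads(status))
    print("IO thread reads:", status["Master_Log_File"])
    print("SQL thread executes:", status["Relay_Master_Log_File"])
```

In this sample the IO thread is already reading master-bin.000002 while the SQL thread is stuck in master-bin.000001, which matches the stopped SQL thread and Last_Errno 1032.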

∙ No need to parse SHOW
∙ Configuration
    ∙ replication_connection_configuration
    ∙ replication_applier_configuration

mysql> select * from replication_connection_configuration
    -> join replication_applier_configuration using(channel_name);

∙ IO thread status
    ∙ replication_connection_status
∙ SQL thread status
    ∙ replication_applier_status
    ∙ replication_applier_status_by_coordinator - MTS!
    ∙ replication_applier_status_by_worker

mysql> select * from replication_applier_status join
    -> replication_applier_status_by_coordinator
    -> using(channel_name);

mysql> select * from replication_applier_status join
    -> replication_applier_status_by_worker
    -> using(channel_name);

Tables in Performance Schema

55

∙ Master Info

mysql> select * from slave_master_info\G
*************************** 1. row ***************************
      Number_of_lines: 25
      Master_log_name: mysqld-bin.000001
       Master_log_pos: 154
                 Host: 127.0.0.1
            User_name: root
        User_password: secret
                 Port: 13000
        Connect_retry: 60
          Enabled_ssl: 0
...
                 Uuid: 31ed7c8f-74ea-11e6-8de8-30b5c2208a0f
          Retry_count: 86400
...
Enabled_auto_position: 1
...

∙ Relay log info

mysql> select * from slave_relay_log_info\G
*************************** 1. row ***************************
  Number_of_lines: 7
   Relay_log_name: ./[email protected]
    Relay_log_pos: 1156
  Master_log_name: master-bin.000001
   Master_log_pos: 989
        Sql_delay: 0
Number_of_workers: 0
               Id: 1
     Channel_name: master-1

∙ Worker info: multi-threaded slave

mysql> select * from slave_worker_info\G
*************************** 1. row ***************************
                        Id: 1
...
*************************** 8. row ***************************
                        Id: 8
            Relay_log_name: ./Thinkie-relay-bin.000004
             Relay_log_pos: 1216
           Master_log_name: mysqld-bin.000001
            Master_log_pos: 1342
 Checkpoint_relay_log_name: ./Thinkie-relay-bin.000004
  Checkpoint_relay_log_pos: 963
Checkpoint_master_log_name: mysqld-bin.000001
 Checkpoint_master_log_pos: 1089
...

system database mysql: only on the slave

56

mysql> show master status\G
*************************** 1. row ***************************
             File: master-bin.000005
         Position: 154
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0,00 sec)

SHOW MASTER STATUS

57

mysql> show binlog events in 'master-bin.000001' from 989;
+-------------------+------+----------------+-----------+-------------+---------------------------------+
| Log_name          | Pos  | Event_type     | Server_id | End_log_pos | Info                            |
+-------------------+------+----------------+-----------+-------------+---------------------------------+
| master-bin.000001 |  989 | Anonymous_Gtid |         1 |        1054 | SET @@SESSION.GTID_NEXT= ...    |
| master-bin.000001 | 1054 | Query          |         1 |        1124 | BEGIN                           |
| master-bin.000001 | 1124 | Table_map      |         1 |        1167 | table_id: 109 (m2.t1)           |
| master-bin.000001 | 1167 | Update_rows    |         1 |        1213 | table_id: 109 flags: STMT_END_F |
| master-bin.000001 | 1213 | Xid            |         1 |        1244 | COMMIT /* xid=64 */             |
+-------------------+------+----------------+-----------+-------------+---------------------------------+
5 rows in set (0,00 sec)

SHOW BINLOG EVENTS

58
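The error log reported end_log_pos 1213 and a stop position of 989, and the event list ties the two together: 989 is where the transaction starts, and 1213 is where the failing Update_rows event ends. A small sketch of that lookup follows, using rows copied from the output above. The names `EVENTS`, `find_failed_event` and `is_contiguous` are assumptions for illustration.

```python
# (Pos, Event_type, End_log_pos) rows copied from the SHOW BINLOG EVENTS output.
EVENTS = [
    (989, "Anonymous_Gtid", 1054),
    (1054, "Query", 1124),
    (1124, "Table_map", 1167),
    (1167, "Update_rows", 1213),
    (1213, "Xid", 1244),
]


def find_failed_event(events, end_log_pos):
    """Return (Pos, Event_type) of the event that ends at end_log_pos, or None."""
    for pos, event_type, end_pos in events:
        if end_pos == end_log_pos:
            return pos, event_type
    return None


def is_contiguous(events):
    """Check that each event starts exactly where the previous one ended."""
    return all(prev[2] == cur[0] for prev, cur in zip(events, events[1:]))


if __name__ == "__main__":
    print("failed event:", find_failed_event(EVENTS, 1213))
    print("transaction starts at:", EVENTS[0][0])
    print("events are contiguous:", is_contiguous(EVENTS))
```

The start position of the failing event is the value to pass to mysqlbinlog as the start position when only that event should be decoded.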

$ mysqlbinlog var/mysqld.1/data/master-bin.000001 --start-position=989 --stop-position=1213
...
# at 1167
#160822 14:15:11 server id 1 end_log_pos 1213 CRC32 0x1f346c6b Update_rows: table id 109 flags: STMT_END_F

BINLOG '
v966VxMBAAAAKwAAAI8EAAAAAG0AAAAAAAEAAm0yAAJ0MQABAwABY2HOoQ==
v966Vx8BAAAALgAAAL0EAAAAAG0AAAAAAAEAAgAB///+BQAAAP4GAAAAa2w0Hw==
'/*!*/;
ROLLBACK /* added by mysqlbinlog */ /*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
...

mysqlbinlog

59

$ mysqlbinlog -v var/mysqld.1/data/master-bin.000001 --start-position=989 --stop-position=1213
...
# at 1167
#160822 14:15:11 server id 1 end_log_pos 1213 CRC32 0x1f346c6b Update_rows: table id 109 flags: STMT_END_F

BINLOG '
v966VxMBAAAAKwAAAI8EAAAAAG0AAAAAAAEAAm0yAAJ0MQABAwABY2HOoQ==
v966Vx8BAAAALgAAAL0EAAAAAG0AAAAAAAEAAgAB///+BQAAAP4GAAAAa2w0Hw==
'/*!*/;
### UPDATE `m2`.`t1`
### WHERE
###   @1=5
### SET
###   @1=6
ROLLBACK /* added by mysqlbinlog */ /*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
...

mysqlbinlog

60
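With -v, mysqlbinlog adds the decoded row change as `###` comment lines, which makes it easy to see which row the slave was asked to change. The sketch below reads those lines into a small structure. The function name `parse_row_event` is an assumption, and the parser only handles the UPDATE shape shown in this sample.

```python
# Decoded pseudo-SQL lines copied from the mysqlbinlog -v output above.
SAMPLE = """\
### UPDATE `m2`.`t1`
### WHERE
###   @1=5
### SET
###   @1=6
"""


def parse_row_event(text):
    """Parse one decoded UPDATE row event into operation, table and column images."""
    result = {"operation": None, "table": None, "WHERE": {}, "SET": {}}
    section = None
    for raw in text.splitlines():
        if not raw.startswith("###"):
            continue  # ignore everything except the decoded pseudo-SQL
        line = raw[3:].strip()
        if line in ("WHERE", "SET"):
            section = line  # following @N lines belong to this image
        elif line.startswith("@") and section:
            column, _, value = line.partition("=")
            result[section][column] = value
        elif line:
            operation, _, table = line.partition(" ")
            result["operation"] = operation
            result["table"] = table.replace("`", "")
    return result


if __name__ == "__main__":
    print(parse_row_event(SAMPLE))
```

Here the WHERE image (`@1=5`) is the row the slave looked for and could not find, which is consistent with error 1032 in the error log.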

∙ Percona Toolkit
    ∙ pt-table-checksum
        Checks data consistency
    ∙ pt-table-sync
        Fixes data inconsistencies
    ∙ pt-slave-find
        Shows topology
∙ MySQL Utilities
    ∙ mysqlrplcheck
        Checks if MySQL servers are ready to replicate
    ∙ mysqlrplshow
        Shows topology
    ∙ mysqlrplsync
        Checks data consistency
    ∙ mysqlslavetrx
        Skips 1-N transactions
    ∙ mysqldbcompare
        Compares two databases
        MariaDB-friendly
    ∙ mysqldiff
        Checks objects definitions
        MariaDB-friendly
    ∙ mysqlserverinfo
        Shows main options, such as port and datadir
        Replication-oriented
        MariaDB-friendly

Toolkits

61

∙ Error log
∙ Slave
    ∙ SHOW SLAVE STATUS
    ∙ Tables in Performance Schema
    ∙ Tables in mysql database
∙ Master
    ∙ SHOW MASTER STATUS
    ∙ SHOW BINLOG EVENTS
∙ mysqlbinlog
∙ mysql command line client

Main Instruments: Summary

62