Lessons Learned: Troubleshooting Replication
-
Upload
sveta-smirnova -
Category
Software
-
view
390 -
download
1
Transcript of Lessons Learned: Troubleshooting Replication
∙ MySQL Support engineer∙ Author of
∙ MySQL Troubleshooting∙ JSON UDF functions∙ FILTER clause for MySQL
∙ Speaker∙ Percona Live, OOW, Fosdem,
DevConf, HighLoad...
Sveta Smirnova
2
∙Typical Replication Errors∙MySQL and MariaDB Replication: Must Know∙Master∙Slave IO thread∙Slave SQL thread∙Multithreaded slave∙Multi-master
Table of Contents
3
Master
Sends the packet ->Waits "Ack"
Slave<- Initiates
<- Requests a packet
<- Sends "Ack"
Semisynchrous Plugin
11
Master
Sends the packet ->Waits "Ack"
Slave<- Initiates<- Requests a packet
<- Sends "Ack"
Semisynchrous Plugin
11
Master
Sends the packet ->
Waits "Ack"
Slave<- Initiates<- Requests a packet
<- Sends "Ack"
Semisynchrous Plugin
11
Master
Sends the packet ->Waits "Ack"
Slave<- Initiates<- Requests a packet
<- Sends "Ack"
Semisynchrous Plugin
11
Master
Sends the packet ->Waits "Ack"
Slave<- Initiates<- Requests a packet
<- Sends "Ack"
Semisynchrous Plugin
11
MasterRecieves a change
Sends to SE ->
Writes into binary logSynchronizes ->
Storage Engine
Writes into table<- Returns control
<- Synchronizes
Logical
12
MasterRecieves a changeSends to SE ->
Writes into binary logSynchronizes ->
Storage Engine
Writes into table<- Returns control
<- Synchronizes
Logical
12
MasterRecieves a changeSends to SE ->
Writes into binary logSynchronizes ->
Storage Engine
Writes into table
<- Returns control
<- Synchronizes
Logical
12
MasterRecieves a changeSends to SE ->
Writes into binary logSynchronizes ->
Storage Engine
Writes into table<- Returns control
<- Synchronizes
Logical
12
MasterRecieves a changeSends to SE ->
Writes into binary log
Synchronizes ->
Storage Engine
Writes into table<- Returns control
<- Synchronizes
Logical
12
MasterRecieves a changeSends to SE ->
Writes into binary logSynchronizes ->
Storage Engine
Writes into table<- Returns control
<- Synchronizes
Logical
12
IO threadReads from the master
Stores in the relay log
SQL thread
<- Reads from the relay logExecutes
Two Kinds of Slave Threads
13
IO threadReads from the masterStores in the relay log
SQL thread
<- Reads from the relay logExecutes
Two Kinds of Slave Threads
13
IO threadReads from the masterStores in the relay log
SQL thread
<- Reads from the relay log
Executes
Two Kinds of Slave Threads
13
IO threadReads from the masterStores in the relay log
SQL thread
<- Reads from the relay logExecutes
Two Kinds of Slave Threads
13
∙ Multiple SQL threads in 10.0.5+/5.6+
∙ From the troubleshooting point of view
∙ Single IO thread∙ Single Relay log∙ Slave lag still possible∙ Error in one thread stops all
Several SQL Threads
14
∙ Multiple SQL threads in 10.0.5+/5.6+∙ From the troubleshooting point of view
∙ Single IO thread
∙ Single Relay log∙ Slave lag still possible∙ Error in one thread stops all
Several SQL Threads
14
∙ Multiple SQL threads in 10.0.5+/5.6+∙ From the troubleshooting point of view
∙ Single IO thread∙ Single Relay log
∙ Slave lag still possible∙ Error in one thread stops all
Several SQL Threads
14
∙ Multiple SQL threads in 10.0.5+/5.6+∙ From the troubleshooting point of view
∙ Single IO thread∙ Single Relay log∙ Slave lag still possible
∙ Error in one thread stops all
Several SQL Threads
14
∙ Multiple SQL threads in 10.0.5+/5.6+∙ From the troubleshooting point of view
∙ Single IO thread∙ Single Relay log∙ Slave lag still possible∙ Error in one thread stops all
Several SQL Threads
14
∙ Multiple masters in 10.0.1+/5.7+
∙ From the troubleshooting point of view
∙ Multiple sets of relay logs∙ Multiple IO threads∙ Multiple SQL threads∙ MySQL: slave_parallel_workers for each channel∙ Channels/sources are independent∙ Error in one stops only one∙ No automatic conflict resolution
Multi-Source (Multi-Channel)
15
∙ Multiple masters in 10.0.1+/5.7+∙ From the troubleshooting point of view
∙ Multiple sets of relay logs
∙ Multiple IO threads∙ Multiple SQL threads∙ MySQL: slave_parallel_workers for each channel∙ Channels/sources are independent∙ Error in one stops only one∙ No automatic conflict resolution
Multi-Source (Multi-Channel)
15
∙ Multiple masters in 10.0.1+/5.7+∙ From the troubleshooting point of view
∙ Multiple sets of relay logs∙ Multiple IO threads
∙ Multiple SQL threads∙ MySQL: slave_parallel_workers for each channel∙ Channels/sources are independent∙ Error in one stops only one∙ No automatic conflict resolution
Multi-Source (Multi-Channel)
15
∙ Multiple masters in 10.0.1+/5.7+∙ From the troubleshooting point of view
∙ Multiple sets of relay logs∙ Multiple IO threads∙ Multiple SQL threads
∙ MySQL: slave_parallel_workers for each channel∙ Channels/sources are independent∙ Error in one stops only one∙ No automatic conflict resolution
Multi-Source (Multi-Channel)
15
∙ Multiple masters in 10.0.1+/5.7+∙ From the troubleshooting point of view
∙ Multiple sets of relay logs∙ Multiple IO threads∙ Multiple SQL threads∙ MySQL: slave_parallel_workers for each channel
∙ Channels/sources are independent∙ Error in one stops only one∙ No automatic conflict resolution
Multi-Source (Multi-Channel)
15
∙ Multiple masters in 10.0.1+/5.7+∙ From the troubleshooting point of view
∙ Multiple sets of relay logs∙ Multiple IO threads∙ Multiple SQL threads∙ MySQL: slave_parallel_workers for each channel∙ Channels/sources are independent
∙ Error in one stops only one∙ No automatic conflict resolution
Multi-Source (Multi-Channel)
15
∙ Multiple masters in 10.0.1+/5.7+∙ From the troubleshooting point of view
∙ Multiple sets of relay logs∙ Multiple IO threads∙ Multiple SQL threads∙ MySQL: slave_parallel_workers for each channel∙ Channels/sources are independent∙ Error in one stops only one
∙ No automatic conflict resolution
Multi-Source (Multi-Channel)
15
∙ Multiple masters in 10.0.1+/5.7+∙ From the troubleshooting point of view
∙ Multiple sets of relay logs∙ Multiple IO threads∙ Multiple SQL threads∙ MySQL: slave_parallel_workers for each channel∙ Channels/sources are independent∙ Error in one stops only one∙ No automatic conflict resolution
Multi-Source (Multi-Channel)
15
∙ You must specify∙ Name of the master’s binary log file∙ Position
∙ From the troubleshooting point ov view
∙ Event executes if on the current position∙ Easy to skip∙ Easy to move position backward∙ No conflict resolution
Position-Based
16
∙ You must specify∙ Name of the master’s binary log file∙ Position
∙ From the troubleshooting point ov view∙ Event executes if on the current position
∙ Easy to skip∙ Easy to move position backward∙ No conflict resolution
Position-Based
16
∙ You must specify∙ Name of the master’s binary log file∙ Position
∙ From the troubleshooting point ov view∙ Event executes if on the current position∙ Easy to skip
∙ Easy to move position backward∙ No conflict resolution
Position-Based
16
∙ You must specify∙ Name of the master’s binary log file∙ Position
∙ From the troubleshooting point ov view∙ Event executes if on the current position∙ Easy to skip∙ Easy to move position backward
∙ No conflict resolution
Position-Based
16
∙ You must specify∙ Name of the master’s binary log file∙ Position
∙ From the troubleshooting point ov view∙ Event executes if on the current position∙ Easy to skip∙ Easy to move position backward∙ No conflict resolution
Position-Based
16
∙ Each transaction has unique number: GTID
∙ MySQL: AUTO_POSITION=1∙ MariaDB: master_use_gtid = { slave_pos | current_pos }
∙ No need to specify binary log and position
Global Transaction Identifiers (GTID)
17
∙ Each transaction has unique number: GTID∙ MySQL: AUTO_POSITION=1
∙ MariaDB: master_use_gtid = { slave_pos | current_pos }
∙ No need to specify binary log and position
Global Transaction Identifiers (GTID)
17
∙ Each transaction has unique number: GTID∙ MySQL: AUTO_POSITION=1∙ MariaDB: master_use_gtid = { slave_pos | current_pos }
∙ No need to specify binary log and position
Global Transaction Identifiers (GTID)
17
∙ Each transaction has unique number: GTID∙ MySQL: AUTO_POSITION=1∙ MariaDB: master_use_gtid = { slave_pos | current_pos }
∙ No need to specify binary log and position
Global Transaction Identifiers (GTID)
17
Client
INSERT INTO ... ->
Binary log
SET TIMESTAMP...SET sql_mode...INSERT INTO ...
Statement-Based Binary Log Format
18
ClientINSERT INTO ... ->
Binary log
SET TIMESTAMP...SET sql_mode...INSERT INTO ...
Statement-Based Binary Log Format
18
ClientINSERT INTO ... ->
Binary log
SET TIMESTAMP...
SET sql_mode...INSERT INTO ...
Statement-Based Binary Log Format
18
ClientINSERT INTO ... ->
Binary log
SET TIMESTAMP...SET sql_mode...
INSERT INTO ...
Statement-Based Binary Log Format
18
ClientINSERT INTO ... ->
Binary log
SET TIMESTAMP...SET sql_mode...INSERT INTO ...
Statement-Based Binary Log Format
18
Client
UPDATE ... ->
Binary log
SET TIMESTAMP...SET sql_mode...Row before changesRow with changes
Row-Based Binary Log Format
19
ClientUPDATE ... ->
Binary log
SET TIMESTAMP...SET sql_mode...Row before changesRow with changes
Row-Based Binary Log Format
19
ClientUPDATE ... ->
Binary log
SET TIMESTAMP...
SET sql_mode...Row before changesRow with changes
Row-Based Binary Log Format
19
ClientUPDATE ... ->
Binary log
SET TIMESTAMP...SET sql_mode...
Row before changesRow with changes
Row-Based Binary Log Format
19
ClientUPDATE ... ->
Binary log
SET TIMESTAMP...SET sql_mode...Row before changes
Row with changes
Row-Based Binary Log Format
19
ClientUPDATE ... ->
Binary log
SET TIMESTAMP...SET sql_mode...Row before changesRow with changes
Row-Based Binary Log Format
19
∙ Error log file
∙ At the slave∙ At the master∙ Percona Toolkit∙ MySQL Utilities
Main Instruments
20
∙ Error log file∙ At the slave
∙ SHOW SLAVE STATUS∙ MySQL: Tables in Performance Schema∙ System database mysql
∙ At the master∙ Percona Toolkit∙ MySQL Utilities
Main Instruments
20
∙ Error log file∙ At the slave∙ At the master
∙ SHOW MASTER STATUS∙ SHOW BINLOG EVENTS∙ mysqlbinlog
∙ Percona Toolkit∙ MySQL Utilities
Main Instruments
20
∙ Error log file∙ At the slave∙ At the master∙ Percona Toolkit
∙ MySQL Utilities
Main Instruments
20
∙ Always available, requires setup∙ Asynchronous∙ Master
∙ Keeps all changes in the binary logTwo formats: ROW и STATEMENT
∙ Slave∙ IO thread reads from the master into relay log∙ SQL thread executes updates
Multiple SQL threads in 10.0.5+/5.6+Multiple channels/sources (masters) in 10.0.1+/5.7+
∙ GTID in 10.0.2+/5.6+
Replication Must Know: Summary
21
∙ More writes∙ binlog_row_image =
FULL | MINIMAL | NOBLOB
∙ Synchronization∙ sync_binlog∙ Do not disable!∙ You may set it greater than 1
Performance
23
∙ More writes∙ binlog_row_image =
FULL | MINIMAL | NOBLOB∙ binlog_cache_size
Watch Binlog_cache_disk_use
∙ Synchronization∙ sync_binlog∙ Do not disable!∙ You may set it greater than 1
Performance
23
∙ More writes∙ binlog_row_image =
FULL | MINIMAL | NOBLOB∙ binlog_cache_size
Watch Binlog_cache_disk_use
∙ binlog_stmt_cache_sizeWatch Binlog_stmt_cache_disk_use
∙ Synchronization∙ sync_binlog∙ Do not disable!∙ You may set it greater than 1
Performance
23
∙ More writes∙ Synchronization
∙ sync_binlog∙ Do not disable!∙ You may set it greater than 1
Performance
23
∙ Binary log lifetime∙ expire_log_days
∙ Synchronization∙ Order of records in the binary log
Behavior
24
∙ Binary log lifetime∙ Synchronization
∙ SBR is not safe with READ COMMITTED andREAD UNCOMMITTED
∙ Order of records in the binary log
Behavior
24
∙ Binary log lifetime∙ Synchronization∙ Order of records in the binary log
∙ Non-deterministic events and SBR
Behavior
24
∙ SHOW SLAVE STATUSSlave_IO_Running: Connecting
Slave_SQL_Running: Yes...
Last_IO_Errno: 1045Last_IO_Error: error connecting to master ’[email protected]:13000’ -
retry-time: 60 retries: 1Last_SQL_Errno: 0Last_SQL_Error:...
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updatesMaster_Retry_Count: 86400
Master_Bind:Last_IO_Error_Timestamp: 160824 03:18:36
Last_SQL_Error_Timestamp:
∙ P_S.replication_connection_status∙ Error log∙ Access
∙ MySQL client slave’s login-password
SHOW GRANTS∙ Fix privileges on the master∙ Restart slave
Network
26
∙ SHOW SLAVE STATUS∙ P_S.replication_connection_status
mysql> select * from performance_schema.replication_connection_status\G*************************** 1. row ***************************
CHANNEL_NAME:GROUP_NAME:
SOURCE_UUID:THREAD_ID: NULL
SERVICE_STATE: CONNECTINGCOUNT_RECEIVED_HEARTBEATS: 0LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00RECEIVED_TRANSACTION_SET:
LAST_ERROR_NUMBER: 1045LAST_ERROR_MESSAGE: error connecting to master ’[email protected]:13000’ -
retry-time: 60 retries: 4LAST_ERROR_TIMESTAMP: 2016-08-24 03:21:36
1 row in set (0,01 sec)
∙ Error log∙ Access
∙ MySQL client slave’s login-password
SHOW GRANTS∙ Fix privileges on the master∙ Restart slave
Network
26
∙ SHOW SLAVE STATUS∙ P_S.replication_connection_status∙ Error log
2016-08-24T00:18:36.077384Z 3 [ERROR] Slave I/O for channel ”: error connecting tomaster ’[email protected]:13000’ - retry-time: 60 retries: 1, Error_code: 10452016-08-24T00:19:36.299011Z 3 [ERROR] Slave I/O for channel ”: error connecting tomaster ’[email protected]:13000’ - retry-time: 60 retries: 2, Error_code: 10452016-08-24T00:20:36.485315Z 3 [ERROR] Slave I/O for channel ”: error connecting tomaster ’[email protected]:13000’ - retry-time: 60 retries: 3, Error_code: 10452016-08-24T00:21:36.677915Z 3 [ERROR] Slave I/O for channel ”: error connecting tomaster ’[email protected]:13000’ - retry-time: 60 retries: 4, Error_code: 10452016-08-24T00:22:36.872066Z 3 [ERROR] Slave I/O for channel ”: error connecting tomaster ’[email protected]:13000’ - retry-time: 60 retries: 5, Error_code: 1045
∙ Access∙ MySQL client slave’s login-password
SHOW GRANTS∙ Fix privileges on the master∙ Restart slave
Network
26
∙ SHOW SLAVE STATUS∙ P_S.replication_connection_status∙ Error log∙ Access
$ perror 1045MySQL error code 1045 (ER_ACCESS_DENIED_ERROR): Access denied for user ’%-.48s’@’%-.64s’(using password: %s)
∙ MySQL client slave’s login-password
SHOW GRANTS∙ Fix privileges on the master∙ Restart slave
Network
26
∙ SHOW SLAVE STATUS∙ P_S.replication_connection_status∙ Error log∙ Access
∙ MySQL client slave’s login-password$ mysql -h127.0.0.1 -P13000 -uroot -pbarWarning: Using a password on the command line interface can be insecure.ERROR 1045 (28000): Access denied for user ’root’@’localhost’ (using password: YES)
SHOW GRANTS∙ Fix privileges on the master∙ Restart slave
Network
26
∙ SHOW SLAVE STATUS∙ P_S.replication_connection_status∙ Error log∙ Access
∙ MySQL client slave’s login-passwordSHOW GRANTSmysql> SHOW GRANTS;+----------------------------------+| Grants for foo@% |+----------------------------------+| GRANT SELECT ON *.* TO ’foo’@’%’ |+----------------------------------+
∙ Fix privileges on the master∙ Restart slave
Network
26
∙ SHOW SLAVE STATUS∙ P_S.replication_connection_status∙ Error log∙ Access
∙ MySQL client slave’s login-passwordSHOW GRANTS
∙ Fix privileges on the master∙ Restart slave
Network
26
∙ Regular performance troubleshooting∙ Check with command line client∙ Troubleshooting hardware resource usage
webinar
Performance
27
∙ One master - one slave∙ Different data
Slave cannot execute event from the relay log
∙ Different errors on master and slave∙ Slave lags behind the master
∙ Circle replication and other writes in additionto SQL thread
∙ Different data
SQL thread: typical issues
29
∙ One master - one slave∙ Different data
Slave cannot execute event from the relay log
∙ Different errors on master and slave∙ Slave lags behind the master
∙ Circle replication and other writes in additionto SQL thread
∙ Different data
SQL thread: typical issues
29
∙ Did table change outside of the replication?∙ How?∙ Can it cause conflict with changes on the master?
∙ Are table structures identical?∙ Are changes in the correct order?
Different Data
30
∙ Did table change outside of the replication?∙ Are table structures identical?
∙ Percona Toolkitpt-table-checksum, pt-table-sync
∙ MySQL UtilitiesMySQL: mysqlrplsyncmysqldbcompare, mysqldiff
∙ Are changes in the correct order?
Different Data
30
∙ Did table change outside of the replication?∙ Are table structures identical?∙ Are changes in the correct order?
∙ mysqlbinlog∙ Application logic on the master
Different Data
30
∙ Only with SBR
∙ Row-level locks∙ Triggers∙ Different options: for old versions
Updates in the wrong order
31
∙ Only with SBR∙ Row-level locks
∙ Triggers∙ Different options: for old versions
Updates in the wrong order
31
∙ Only with SBR∙ Row-level locks∙ Triggers
∙ SET GLOBAL sql_slave_skip_counter – NoGTIDs!
∙ Skip transaction – GTIDs∙ Synchronize tables!
∙ Different options: for old versions
Updates in the wrong order
31
∙ Only with SBR∙ Row-level locks∙ Triggers∙ Different options: for old versions
∙ Start slave with master’s options∙ Restart SQL thread∙ Most issues are fixed in recent versions
Updates in the wrong order
31
∙ Threads∙ Master executes changes in multiple threads∙ Slave uses one
∙ Seconds_behind_master increases – Youcannot 100% rely on this number!
∙ Tune slave performance
Slave lags from the master
32
∙ Threads∙ Seconds_behind_master increases – You
cannot 100% rely on this number!
∙ Tune slave performance
Slave lags from the master
32
∙ Threads∙ Seconds_behind_master increases – You
cannot 100% rely on this number!∙ Tune slave performance
∙ Multi-threaded slaveOne thread for one database in MySQL 5.6There may be conflicts between multiple slave SQL threads
Slave lags from the master
32
∙ Threads∙ Seconds_behind_master increases – You
cannot 100% rely on this number!∙ Tune slave performance
∙ Multi-threaded slaveOne thread for one database in MySQL 5.6There may be conflicts between multiple slave SQL threads
∙ Indexes on the slaveMakes sense for SBR only
Slave lags from the master
32
∙ Single relay log∙ Speed in high concurrent environment may be
less than on master
∙ MariaDB: slave_parallel_threads∙ MariaDB: slave_parallel_max_queued∙ MariaDB: slave_domain_parallel_threads∙ MariaDB: slave_parallel_mode=optimistic | conservative |aggressive | minimal | none
Performance
34
∙ Single relay log∙ Speed in high concurrent environment may be
less than on master∙ MySQL: slave_parallel_workers
∙ MariaDB: slave_parallel_threads∙ MariaDB: slave_parallel_max_queued∙ MariaDB: slave_domain_parallel_threads∙ MariaDB: slave_parallel_mode=optimistic | conservative |aggressive | minimal | none
Performance
34
∙ Single relay log∙ Speed in high concurrent environment may be
less than on master∙ MySQL: slave_parallel_workers∙ MySQL: slave_parallel_type=DATABASE | LOGICAL_CLOCK
∙ MariaDB: slave_parallel_threads∙ MariaDB: slave_parallel_max_queued∙ MariaDB: slave_domain_parallel_threads∙ MariaDB: slave_parallel_mode=optimistic | conservative |aggressive | minimal | none
Performance
34
∙ Single relay log∙ Speed in high concurrent environment may be
less than on master∙ MariaDB: slave_parallel_threads
∙ MariaDB: slave_parallel_max_queued∙ MariaDB: slave_domain_parallel_threads∙ MariaDB: slave_parallel_mode=optimistic | conservative |aggressive | minimal | none
Performance
34
∙ Single relay log∙ Speed in high concurrent environment may be
less than on master∙ MariaDB: slave_parallel_threads∙ MariaDB: slave_parallel_max_queued
∙ MariaDB: slave_domain_parallel_threads∙ MariaDB: slave_parallel_mode=optimistic | conservative |aggressive | minimal | none
Performance
34
∙ Single relay log∙ Speed in high concurrent environment may be
less than on master∙ MariaDB: slave_parallel_threads∙ MariaDB: slave_parallel_max_queued∙ MariaDB: slave_domain_parallel_threads
∙ MariaDB: slave_parallel_mode=optimistic | conservative |aggressive | minimal | none
Performance
34
∙ Single relay log∙ Speed in high concurrent environment may be
less than on master∙ MariaDB: slave_parallel_threads∙ MariaDB: slave_parallel_max_queued∙ MariaDB: slave_domain_parallel_threads∙ MariaDB: slave_parallel_mode=optimistic | conservative |aggressive | minimal | none
Performance
34
∙ Same methods as for single-threaded∙ Error of one thread stops all
mysql> select WORKER_ID, SERVICE_STATE, LAST_SEEN_TRANSACTION, LAST_ERROR_NUMBER,-> LAST_ERROR_MESSAGE from performance_schema.replication_applier_status_by_worker\G
*************************** 1. row ***************************WORKER_ID: 1
SERVICE_STATE: OFFLAST_SEEN_TRANSACTION: d318bc17-66dc-11e6-a471-30b5c2208a0f:4988
LAST_ERROR_NUMBER: 0LAST_ERROR_MESSAGE:
*************************** 2. row ***************************WORKER_ID: 3
SERVICE_STATE: OFFLAST_SEEN_TRANSACTION: d318bc17-66dc-11e6-a471-30b5c2208a0f:4986
LAST_ERROR_NUMBER: 1032LAST_ERROR_MESSAGE: Worker 2 failed executing transaction...
Wrong Behavior
35
∙ Same methods as for single-threaded∙ Error of one thread stops all
MariaDB [test]> select id, command, time, state, progress from information_schema.processlist-> where user=’system user’;
+----+---------+------+------------------------------------------------------------------+| id | command | time | state |+----+---------+------+------------------------------------------------------------------+| 25 | Connect | 4738 | Waiting for master to send event || 24 | Connect | 5096 | Slave has read all relay log; waiting for the slave I/O thread t || 23 | Connect | 0 | Waiting for work from SQL thread || 22 | Connect | 0 | Unlocking tables || 21 | Connect | 0 | Update_rows_log_event::ha_update_row(-1) || 20 | Connect | 0 | Waiting for prior transaction to start commit before starting ne || 19 | Connect | 0 | Update_rows_log_event::ha_update_row(-1) || 18 | Connect | 0 | Update_rows_log_event::ha_update_row(-1) || 17 | Connect | 0 | Update_rows_log_event::find_row(-1)...
Wrong Behavior
35
∙ Replication must be set for eachchannel/source
∙ You may use master with GTID and withoutsame time
∙ Same issues as with regular replication∙ MySQL: Filters work for all channels∙ MariaDB: You may setup filters for each source
Specifics
37
∙ Replication must be set for eachchannel/source
∙ You may use master with GTID and withoutsame time
∙ Same issues as with regular replication∙ MySQL: Filters work for all channels∙ MariaDB: You may setup filters for each source
Specifics
37
∙ Replication must be set for eachchannel/source
∙ You may use master with GTID and withoutsame time
∙ Same issues as with regular replication
∙ MySQL: Filters work for all channels∙ MariaDB: You may setup filters for each source
Specifics
37
∙ Replication must be set for eachchannel/source
∙ You may use master with GTID and withoutsame time
∙ Same issues as with regular replication∙ MySQL: Filters work for all channels
∙ MariaDB: You may setup filters for each source
Specifics
37
∙ Replication must be set for eachchannel/source
∙ You may use master with GTID and withoutsame time
∙ Same issues as with regular replication∙ MySQL: Filters work for all channels∙ MariaDB: You may setup filters for each source
Specifics
37
∙ Issues on the master∙ Same as for standalone server∙ More writes and consistency checks
∙ Slave IO thread∙ Slave SQL thread
Summary
39
∙ Issues on the master∙ Slave IO thread
∙ Common network issues∙ mysql command line client for tests
∙ Slave SQL thread
Summary
39
∙ Issues on the master∙ Slave IO thread∙ Slave SQL thread
∙ Regular query-related issues∙ Regular storage engine issues∙ Less execution threads than on master
Summary
39
∙ Basic Techniques – troubleshooting webinar∙ Troubleshooting hardware webinar∙ Introduction into SE troubleshooting webinar
∙ Percona Monitoring and Management∙ Percona Toolkit∙ MySQL Utilities∙ Book MySQL High Availability∙ MySQL Replication Team blog∙ Replication in MariaDB
More Information
40
∙ Basic Techniques – troubleshooting webinar∙ Troubleshooting hardware webinar∙ Introduction into SE troubleshooting webinar∙ Percona Monitoring and Management∙ Percona Toolkit∙ MySQL Utilities
∙ Book MySQL High Availability∙ MySQL Replication Team blog∙ Replication in MariaDB
More Information
40
∙ Basic Techniques – troubleshooting webinar∙ Troubleshooting hardware webinar∙ Introduction into SE troubleshooting webinar∙ Percona Monitoring and Management∙ Percona Toolkit∙ MySQL Utilities∙ Book MySQL High Availability∙ MySQL Replication Team blog∙ Replication in MariaDB
More Information
40
http://www.slideshare.net/SvetaSmirnova
https://twitter.com/svetsmirnova
https://github.com/svetasmirnova
Thank You!
42
∙ Is data up to date?∙ Are these same?
∙ Table structures∙ Storage Engine∙ Data
∙ Any write can break replication
Asynchronous: Slave Q&A
45
∙ Writes on master are slower than in case ofasynchronous
∙ How many "Ack"s waits master?∙ What does "Ack"mean?∙ What happens in case of timeout?
Semi-synchrous Replication
46
∙ Writes on master are slower than in case ofasynchronous
∙ How many "Ack"s waits master?
∙ What does "Ack"mean?∙ What happens in case of timeout?
Semi-synchrous Replication
46
∙ Writes on master are slower than in case ofasynchronous
∙ How many "Ack"s waits master?∙ Before 5.7: from single slave
∙ What does "Ack"mean?∙ What happens in case of timeout?
Semi-synchrous Replication
46
∙ Writes on master are slower than in case ofasynchronous
∙ How many "Ack"s waits master?∙ Before 5.7: from single slave∙ Now in MySQL:rpl_semi_sync_master_wait_for_slave_count
∙ What does "Ack"mean?∙ What happens in case of timeout?
Semi-synchrous Replication
46
∙ Writes on master are slower than in case ofasynchronous
∙ How many "Ack"s waits master?∙ Before 5.7: from single slave∙ Now in MySQL:rpl_semi_sync_master_wait_for_slave_count
∙ Would not wait others
∙ What does "Ack"mean?∙ What happens in case of timeout?
Semi-synchrous Replication
46
∙ Writes on master are slower than in case ofasynchronous
∙ How many "Ack"s waits master?∙ What does "Ack"mean?
∙ What happens in case of timeout?
Semi-synchrous Replication
46
∙ Writes on master are slower than in case ofasynchronous
∙ How many "Ack"s waits master?∙ What does "Ack"mean?
∙ Event is written into relay log
∙ What happens in case of timeout?
Semi-synchrous Replication
46
∙ Writes on master are slower than in case ofasynchronous
∙ How many "Ack"s waits master?∙ What does "Ack"mean?
∙ Event is written into relay log∙ No guarantee it is executed
∙ What happens in case of timeout?
Semi-synchrous Replication
46
∙ Writes on master are slower than in case ofasynchronous
∙ How many "Ack"s waits master?∙ What does "Ack"mean?∙ What happens in case of timeout?
Semi-synchrous Replication
46
∙ Writes on master are slower than in case ofasynchronous
∙ How many "Ack"s waits master?∙ What does "Ack"mean?∙ What happens in case of timeout?
∙ Replication becomes asynchronous
Semi-synchrous Replication
46
∙ Every change written twice:SE files:logs, data, ... Binary Log
∙ You can write on slave
Logical Replication
47
∙ Every change written twice:SE files:logs, data, ... Binary Log
∙ You can write on slave
Logical Replication
47
∙ Does not exist in MySQL/MariaSB!
∙ Master writes only into SE files∙ Which are replicated to slave∙ From the troubleshooting point of view
∙ IO: changes are written only once∙ You cannot write on slave in parallel∙ Any data inconsistency leads to replication break
Just for Comparison: Physical Replication
48
∙ Does not exist in MySQL/MariaSB!∙ There are two closed-source solutions
∙ Master writes only into SE files∙ Which are replicated to slave∙ From the troubleshooting point of view
∙ IO: changes are written only once∙ You cannot write on slave in parallel∙ Any data inconsistency leads to replication break
Just for Comparison: Physical Replication
48
∙ Does not exist in MySQL/MariaSB!∙ Master writes only into SE files
∙ Which are replicated to slave∙ From the troubleshooting point of view
∙ IO: changes are written only once∙ You cannot write on slave in parallel∙ Any data inconsistency leads to replication break
Just for Comparison: Physical Replication
48
∙ Does not exist in MySQL/MariaSB!∙ Master writes only into SE files∙ Which are replicated to slave
∙ From the troubleshooting point of view∙ IO: changes are written only once∙ You cannot write on slave in parallel∙ Any data inconsistency leads to replication break
Just for Comparison: Physical Replication
48
∙ Does not exist in MySQL/MariaSB!∙ Master writes only into SE files∙ Which are replicated to slave∙ From the troubleshooting point of view
∙ IO: changes are written only once∙ You cannot write on slave in parallel∙ Any data inconsistency leads to replication break
Just for Comparison: Physical Replication
48
∙ Data transfer∙ Execution∙ Different
∙ Diagnostics∙ Fixes
Two Kinds of Threads – Two Kinds of Issues
49
∙ Guaranteed that every transaction will beexecuted only once
∙ Simple failover∙ It is not easy to skip a transaction
∙ MySQL: use mysqlslavetrx∙ MariaDB: set global gtid_slave_pos=’X-Y-Z’;
∙ Be careful with expire_logs_days!
GTID
50
∙ Guaranteed that every transaction will beexecuted only once
∙ Simple failover
∙ It is not easy to skip a transaction
∙ MySQL: use mysqlslavetrx∙ MariaDB: set global gtid_slave_pos=’X-Y-Z’;
∙ Be careful with expire_logs_days!
GTID
50
∙ Guaranteed that every transaction will beexecuted only once
∙ Simple failover∙ It is not easy to skip a transaction
∙ MySQL: use mysqlslavetrx∙ MariaDB: set global gtid_slave_pos=’X-Y-Z’;
∙ Be careful with expire_logs_days!
GTID
50
∙ Guaranteed that every transaction will beexecuted only once
∙ Simple failover∙ It is not easy to skip a transaction
∙ MySQL: use mysqlslavetrx
∙ MariaDB: set global gtid_slave_pos=’X-Y-Z’;
∙ Be careful with expire_logs_days!
GTID
50
∙ Guaranteed that every transaction will beexecuted only once
∙ Simple failover∙ It is not easy to skip a transaction
∙ MySQL: use mysqlslavetrx∙ MariaDB: set global gtid_slave_pos=’X-Y-Z’;
∙ Be careful with expire_logs_days!
GTID
50
∙ Guaranteed that every transaction will beexecuted only once
∙ Simple failover∙ It is not easy to skip a transaction
∙ MySQL: use mysqlslavetrx∙ MariaDB: set global gtid_slave_pos=’X-Y-Z’;
∙ Be careful with expire_logs_days!
GTID
50
∙ Statement-based (SBR)∙ Queries are written as received
∙ There is a risk of data inconsistency (non-safe)INSERT IGNORELIMIT without ORDER BYNon-deterministic functions...
∙ Row-based (RBR)∙ Mixed
Binary Log Formats
51
∙ Statement-based (SBR)∙ Queries are written as received∙ There is a risk of data inconsistency (non-safe)
INSERT IGNORELIMIT without ORDER BYNon-deterministic functions...
∙ Row-based (RBR)∙ Mixed
Binary Log Formats
51
∙ Statement-based (SBR)∙ Row-based (RBR)
∙ Usually more data are writtenIOTransfer speedbinlog_row_image
∙ Performance may be worse if table does not haveprimary (unique) key
MariaDB may use any index
∙ Mixed
Binary Log Formats
51
∙ Statement-based (SBR)∙ Row-based (RBR)
∙ Usually more data are writtenIOTransfer speedbinlog_row_image
∙ Performance may be worse if table does not haveprimary (unique) key
MariaDB may use any index
∙ Mixed
Binary Log Formats
51
∙ Slave start∙ Errors
2016-08-23T12:11:21.867440Z 4 [ERROR] Slave SQL for channel ’master-1’: Could not executeUpdate_rows event on table m2.t1; Can’t find record in ’t1’, Error_code: 1032; handler errorHA_ERR_END_OF_FILE; the event’s master log master-bin.000001, end_log_pos 1213, Error_code: 10322016-08-23T12:11:21.867471Z 4 [Warning] Slave: Can’t find record in ’t1’ Error_code: 10322016-08-23T12:11:21.867484Z 4 [ERROR] Error running query, slave SQL thread aborted.Fix the problem, and restart the slave SQL thread with "SLAVE START".We stopped at log ’master-bin.000001’ position 989
∙ Slave stop
Error log
53
All information about slave∙ IO thread Configuration∙ SQL thread Configuration∙ IO thread Status∙ SQL thread Status∙ Errors
Only last oneAll are in the error log
SHOW SLAVE STATUS
54
mysql> show slave status \G*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send eventMaster_Host: 127.0.0.1...
Master_Log_File: master-bin.000002Read_Master_Log_Pos: 63810611
Relay_Log_File: [email protected]_Log_Pos: 1156
Relay_Master_Log_File: master-bin.000001Slave_IO_Running: Yes
Slave_SQL_Running: No...
Replicate_Wild_Ignore_Table:Last_Errno: 1032Last_Error: Could not execute Update_rows event on table m2.t1; Can’t findrecord in ’t1’, Error_code: 1032; handler error HA_ERR_END_OF_FILE;the event’s master log master-bin.000001, end_log_pos 1213
Skip_Counter: 0...
SHOW SLAVE STATUS
54
∙ No need to parse SHOW
∙ Configuration∙ IO thread status∙ SQL thread status
Tables in Performance Schema
55
∙ No need to parse SHOW∙ Configuration
∙ replication_connection_configuration∙ replication_applier_configuration∙
mysql> select * from replication_connection_configuration-> join replication_applier_configuration using(channel_name);
∙ IO thread status∙ SQL thread status
Tables in Performance Schema
55
∙ No need to parse SHOW∙ Configuration∙ IO thread status
∙ replication_connection_status
∙ SQL thread status
Tables in Performance Schema
55
∙ No need to parse SHOW∙ Configuration∙ IO thread status∙ SQL thread status
∙ replication_applier_status∙ replication_applier_status_by_coordinator - MTS!
mysql> select * from replication_applier_status join-> replication_applier_status_by_coordinator-> using(channel_name);
Tables in Performance Schema
55
∙ No need to parse SHOW∙ Configuration∙ IO thread status∙ SQL thread status
∙ replication_applier_status∙ replication_applier_status_by_worker
mysql> select * from replication_applier_status join-> replication_applier_status_by_worker-> using(channel_name);
Tables in Performance Schema
55
∙ Master Infomysql> select * from slave_master_info\G*************************** 1. row ***************************
Number_of_lines: 25Master_log_name: mysqld-bin.000001Master_log_pos: 154
Host: 127.0.0.1User_name: root
User_password: secretPort: 13000
Connect_retry: 60Enabled_ssl: 0...
Uuid: 31ed7c8f-74ea-11e6-8de8-30b5c2208a0fRetry_count: 86400...
Enabled_auto_position: 1...
∙ Relay log info∙ Worker info: multi-threaded slave
system database mysql: only on the slave
56
∙ Master Info∙ Relay log info
mysql> select * from slave_relay_log_info\G*************************** 1. row ***************************
Number_of_lines: 7Relay_log_name: ./[email protected]_log_pos: 1156
Master_log_name: master-bin.000001Master_log_pos: 989
Sql_delay: 0Number_of_workers: 0
Id: 1Channel_name: master-1
∙ Worker info: multi-threaded slave
system database mysql: only on the slave
56
∙ Master Info∙ Relay log info∙ Worker info: multi-threaded slave
mysql> select * from slave_worker_info\G*************************** 1. row ***************************
Id: 1...
*************************** 8. row ***************************Id: 8
Relay_log_name: ./Thinkie-relay-bin.000004Relay_log_pos: 1216
Master_log_name: mysqld-bin.000001Master_log_pos: 1342
Checkpoint_relay_log_name: ./Thinkie-relay-bin.000004Checkpoint_relay_log_pos: 963
Checkpoint_master_log_name: mysqld-bin.000001Checkpoint_master_log_pos: 1089
...
system database mysql: only on the slave
56
mysql> show master status\G*************************** 1. row ***************************
File: master-bin.000005Position: 154
Binlog_Do_DB:Binlog_Ignore_DB:
Executed_Gtid_Set:1 row in set (0,00 sec)
SHOW MASTER STATUS
57
mysql> show binlog events in ’master-bin.000001’ from 989;+-------------------+------+----------------+-----------+-------------+--------------------------------+| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |+-------------------+------+----------------+-----------+-------------+--------------------------------+| master-bin.000001 | 989 | Anonymous_Gtid | 1 | 1054 | SET @@SESSION.GTID_NEXT= ... || master-bin.000001 | 1054 | Query | 1 | 1124 | BEGIN || master-bin.000001 | 1124 | Table_map | 1 | 1167 | table_id: 109 (m2.t1) || master-bin.000001 | 1167 | Update_rows | 1 | 1213 | table_id: 109 flags: STMT_END_F|| master-bin.000001 | 1213 | Xid | 1 | 1244 | COMMIT /* xid=64 */ |+-------------------+------+----------------+-----------+-------------+--------------------------------+5 rows in set (0,00 sec)
SHOW BINLOG EVENTS
58
$ mysqlbinlog var/mysqld.1/data/master-bin.000001 –start-position=989 –stop-position=1213...# at 1167#160822 14:15:11 server id 1 end_log_pos 1213 CRC32 0x1f346c6bUpdate_rows: table id 109 flags: STMT_END_F
BINLOG ’v966VxMBAAAAKwAAAI8EAAAAAG0AAAAAAAEAAm0yAAJ0MQABAwABY2HOoQ==v966Vx8BAAAALgAAAL0EAAAAAG0AAAAAAAEAAgAB///+BQAAAP4GAAAAa2w0Hw==’/*!*/;ROLLBACK /* added by mysqlbinlog */ /*!*/;SET @@SESSION.GTID_NEXT= ’AUTOMATIC’ /* added by mysqlbinlog */ /*!*/;...
mysqlbinlog
59
$ mysqlbinlog -v var/mysqld.1/data/master-bin.000001 –start-position=989 –stop-position=1213...# at 1167#160822 14:15:11 server id 1 end_log_pos 1213 CRC32 0x1f346c6bUpdate_rows: table id 109 flags: STMT_END_F
BINLOG ’v966VxMBAAAAKwAAAI8EAAAAAG0AAAAAAAEAAm0yAAJ0MQABAwABY2HOoQ==v966Vx8BAAAALgAAAL0EAAAAAG0AAAAAAAEAAgAB///+BQAAAP4GAAAAa2w0Hw==’/*!*/;### UPDATE ‘m2‘.‘t1‘### WHERE### @1=5### SET### @1=6ROLLBACK /* added by mysqlbinlog */ /*!*/;SET @@SESSION.GTID_NEXT= ’AUTOMATIC’ /* added by mysqlbinlog */ /*!*/;...
mysqlbinlog
60
∙ Percona Toolkit∙ pt-table-checksum
Checks data consistency∙ pt-table-sync
Fixes data inconsistencies
∙ MySQL Utilities
Toolkits
61
∙ Percona Toolkit∙ pt-table-checksum
Checks data consistency∙ pt-table-sync
Fixes data inconsistencies∙ pt-slave-find
Shows topology
∙ MySQL Utilities
Toolkits
61
∙ MySQL Utilities∙ mysqlrplcheck
Checks if MySQL servers are ready to replicate∙ mysqlrplshow
Shows topology
Toolkits
61
∙ MySQL Utilities∙ mysqlrplcheck
Checks if MySQL servers are ready to replicate∙ mysqlrplshow
Shows topology∙ mysqlrplsync
Checks data consistency
Toolkits
61
∙ MySQL Utilities∙ mysqlrplcheck
Checks if MySQL servers are ready to replicate∙ mysqlrplshow
Shows topology∙ mysqlrplsync
Checks data consistency∙ mysqlslavetrx
Skips 1-N transactions
Toolkits
61
∙ MySQL Utilities∙ mysqldbcompare
Compares two databasesMariaDB-friendly
∙ mysqldiffChecks objects definitionsMariaDB-friendly
Toolkits
61
∙ MySQL Utilities∙ mysqldbcompare
Compares two databasesMariaDB-friendly
∙ mysqldiffChecks objects definitionsMariaDB-friendly
∙ mysqlserverinfoShows main options, such as port and datadirReplication-orientedMariaDB-friendly
Toolkits
61