Galera Cluster Best Practices for DBA's and DevOps Part 1

Galera Cluster Best Practices for DBAs and DevOps. Philip Stoev, Codership Oy


Page 1: Galera Cluster Best Practices for DBA's and DevOps Part 1

Galera Cluster Best Practices for DBAs and DevOps

Philip Stoev
Codership Oy

Page 2: Galera Cluster Best Practices for DBA's and DevOps Part 1

Agenda
• A very quick overview of Galera Cluster
• Ongoing monitoring of the cluster and detection of bottlenecks
• Backup strategies
• Selecting the optimal State Snapshot Transfer (SST) method

Page 3: Galera Cluster Best Practices for DBA's and DevOps Part 1

Galera Cluster Overview
Synchronous
– each transaction is immediately replicated on all nodes at commit
– no stale slaves
Multi-Master
– read from and write to any node
– automatic transaction conflict detection
Replication
– a copy of the entire dataset is available on all nodes
– new nodes can join automatically
For MySQL
– based on a modified version of MySQL (5.5 and 5.6, with 5.7 coming up)
– InnoDB storage engine

Page 4: Galera Cluster Best Practices for DBA's and DevOps Part 1

And more …
• Recovers from node failures within seconds
• Data consistency protections
– avoids reading stale data
– prevents unsafe data modifications
• Cloud and WAN support

Page 5: Galera Cluster Best Practices for DBA's and DevOps Part 1

Monitoring Galera Cluster

Page 6: Galera Cluster Best Practices for DBA's and DevOps Part 1

General Principles of Monitoring
• Galera Cluster is first and foremost MySQL + InnoDB:
– all traditional advice still applies (e.g. slow query log)
– InnoDB still does most of the heavy lifting (e.g. I/O)
• Galera reports metrics via SHOW GLOBAL STATUS
– FLUSH STATUS resets counters
• Galera reports events via the MySQL server error log
– it is a good idea to check your logs or log rotation scripts
• Monitor each node separately for maximum visibility

Page 7: Galera Cluster Best Practices for DBA's and DevOps Part 1

Monitoring the Network
Especially in WAN or complex network environments:
• monitor the health of the network using a third-party tool:
– round-trip (ping) times
– bandwidth utilization
• monitor each network link between any two segments separately
– Galera assumes all links perform equally well

Page 8: Galera Cluster Best Practices for DBA's and DevOps Part 1

Detecting Issues With Cluster Health
• SHOW GLOBAL STATUS variables:

MySQL [test]> show status like '%wsrep%';
+---------------------------+------------------------------------+
| Variable_name             | Value                              |
+---------------------------+------------------------------------+
| wsrep_ready               | ON                                 |
| wsrep_cluster_status      | Primary (or non-Primary)           |
| wsrep_local_state_comment | Synced (or Donor/Desynced, Joiner) |
| wsrep_cluster_size        | 3                                  |
+---------------------------+------------------------------------+
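The four status variables above lend themselves to a simple per-node check. The following is a minimal sketch in Python, assuming the SHOW GLOBAL STATUS output has already been fetched into a dictionary of strings. The function name `node_health_problems` and the `expected_size` parameter are illustrative and not part of Galera.

```python
def node_health_problems(status, expected_size):
    """Return a list of human-readable problems for one node.

    `status` maps wsrep status variable names to their string values,
    as reported by SHOW GLOBAL STATUS LIKE 'wsrep%'.
    """
    problems = []
    if status.get("wsrep_ready") != "ON":
        problems.append("node is not ready (wsrep_ready is not ON)")
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("node is not part of the Primary component")
    state = status.get("wsrep_local_state_comment")
    if state != "Synced":
        problems.append("node state is %s, not Synced" % state)
    if int(status.get("wsrep_cluster_size", 0)) < expected_size:
        problems.append("cluster has fewer nodes than expected")
    return problems


# Values taken from the healthy case shown on the slide.
healthy = {
    "wsrep_ready": "ON",
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_cluster_size": "3",
}
print(node_health_problems(healthy, expected_size=3))
```

Since each node should be monitored separately, a check like this would be run against every node rather than against a single entry point.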

Page 9: Galera Cluster Best Practices for DBA's and DevOps Part 1

Detecting Galera-Specific Bottlenecks
MySQL [test]> show global status like '%wsrep%';
+---------------------------+----------+
| Variable_name             | Value    |
+---------------------------+----------+
| wsrep_flow_control_paused | 0.100000 |
| wsrep_flow_control_sent   | 123      |
| wsrep_local_bf_aborts     | 77       |
| wsrep_local_cert_failures | 24       |
+---------------------------+----------+

In case of BF aborts or cert failures:
• try increasing wsrep_retry_autocommit
• move conflicting workload to just one node
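Because most of these values are counters, they are easiest to interpret as a change between two samples. The sketch below compares two snapshots of the variables shown above. It assumes both snapshots were already read into dictionaries. The function name `bottleneck_warnings` and the 5% pause threshold are illustrative choices, not Galera recommendations.

```python
COUNTERS = (
    "wsrep_flow_control_sent",
    "wsrep_local_bf_aborts",
    "wsrep_local_cert_failures",
)


def bottleneck_warnings(before, after, paused_threshold=0.05):
    """Compare two samples of wsrep status variables taken some time apart."""
    warnings = []
    # wsrep_flow_control_paused is a fraction of time, not a counter.
    paused = float(after["wsrep_flow_control_paused"])
    if paused > paused_threshold:
        warnings.append(
            "replication was paused %.0f%% of the time" % (paused * 100))
    for name in COUNTERS:
        delta = int(after[name]) - int(before[name])
        # A negative delta means the counters were reset (FLUSH STATUS).
        if delta > 0:
            warnings.append("%s increased by %d" % (name, delta))
    return warnings


earlier = {
    "wsrep_flow_control_paused": "0.000000",
    "wsrep_flow_control_sent": "100",
    "wsrep_local_bf_aborts": "70",
    "wsrep_local_cert_failures": "20",
}
# The later sample uses the values from the slide.
later = {
    "wsrep_flow_control_paused": "0.100000",
    "wsrep_flow_control_sent": "123",
    "wsrep_local_bf_aborts": "77",
    "wsrep_local_cert_failures": "24",
}
for line in bottleneck_warnings(earlier, later):
    print(line)
```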

Page 10: Galera Cluster Best Practices for DBA's and DevOps Part 1

Detecting Galera-Specific Bottlenecks #2
• SHOW PROCESSLIST:

MySQL [test]> show processlist;
+----+------+-----------+------+---------+------+-----------+---------------------------+
| Id | User | Host      | db   | Command | Time | State     | Info                      |
+----+------+-----------+------+---------+------+-----------+---------------------------+
...
|  5 | root | localhost | test | Query   |   10 | query end | INSERT INTO t1 VALUES (1) |
...
3 rows in set (0.05 sec)

Page 11: Galera Cluster Best Practices for DBA's and DevOps Part 1

Backups

Page 12: Galera Cluster Best Practices for DBA's and DevOps Part 1

Backups
• Multiple nodes do not remove the need for backups
– backups also help in case of human error
• All nodes in Galera have the same information, so backing up one node is sufficient
• Galera does not use the binary log files, so generate and back those up only if you have a specific need
• Practicing the restore operation is highly recommended

Page 13: Galera Cluster Best Practices for DBA's and DevOps Part 1

Performance Considerations
• Backups slow down the node being backed up
– which in turn can cause the entire cluster to slow down
• or reduce the number of available nodes in the cluster
– can affect quorum calculations
• you can have an async slave and take backups from it
– just do not let it fall far behind the master cluster
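The quorum remark can be illustrated with a little arithmetic. This is a simplified sketch that assumes every node carries equal weight and that the cluster stays operational only while the surviving nodes form a strict majority. The function name `keeps_quorum` is illustrative.

```python
def keeps_quorum(cluster_size, failed_nodes):
    """True if the surviving nodes are a strict majority of the cluster."""
    surviving = cluster_size - failed_nodes
    return surviving * 2 > cluster_size


# A 3-node cluster tolerates the loss of one node.
print(keeps_quorum(3, 1))
# With one node taken out for backup, the remaining 2-node cluster
# cannot tolerate any further unexpected failure.
print(keeps_quorum(2, 1))
```

Under these assumptions, taking one node of a three-node cluster out for a backup leaves no margin for a further failure until the node returns.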

Page 14: Galera Cluster Best Practices for DBA's and DevOps Part 1

XtraBackup
Non-blocking backup for InnoDB
• --galera-info option records the precise position when the backup was taken in a file named xtrabackup_galera_info
• a short lock is still taken

Page 15: Galera Cluster Best Practices for DBA's and DevOps Part 1

Backup Procedure with a Dedicated Node
• Make sure backup node will get no write traffic
– make sure all existing sessions are disconnected
• Temporarily disconnect a node from the cluster
– SET GLOBAL wsrep_provider = 'none'
• Perform backup
• Restore the value of wsrep_provider; check for wsrep_ready=ON

Page 16: Galera Cluster Best Practices for DBA's and DevOps Part 1

Restoring Single Node from Backup
Option #1:
– wipe the data directory clean and restart
– let it rejoin the cluster via SST
– the operation will involve a donating node as well
Option #2:
– use a backup that was recently taken
– wipe the data directory clean and restore data
– put the information from xtrabackup_galera_info into grastate.dat

# GALERA saved state
version: 2.1
uuid: 021a77ed-80cf-11e6-9e8e-6249ad0d3a57
seqno: 1234
cert_index:

– restart node and it will catch up via IST
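The grastate.dat step in Option #2 can be scripted. The sketch below assumes that xtrabackup_galera_info holds a single line of the form `<uuid>:<seqno>`, which should be verified against the actual backup before relying on it. The function name `grastate_from_galera_info` is illustrative, and the output layout simply mirrors the example file shown on the slide.

```python
def grastate_from_galera_info(galera_info_text):
    """Build grastate.dat contents from the text of xtrabackup_galera_info.

    Assumes the input is one line formatted as "<uuid>:<seqno>".
    Raises ValueError if the line does not have that shape.
    """
    uuid, seqno = galera_info_text.strip().rsplit(":", 1)
    int(seqno)  # raises ValueError if the position is not a number
    return (
        "# GALERA saved state\n"
        "version: 2.1\n"
        "uuid: %s\n"
        "seqno: %s\n"
        "cert_index:\n" % (uuid, seqno)
    )


# Example input using the UUID and seqno from the slide.
info = "021a77ed-80cf-11e6-9e8e-6249ad0d3a57:1234\n"
print(grastate_from_galera_info(info))
```

The resulting text would be written to grastate.dat in the restored data directory before the node is restarted.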

Page 17: Galera Cluster Best Practices for DBA's and DevOps Part 1

Restoring Entire Cluster #1
Note: a new logical cluster is being created
Easiest approach:
1. Restore data on one node and restart first node with --wsrep-new-cluster
2. Start one more node. The node joins via SST, with the first node as donor
3. Restart two more nodes; the first two nodes serve as donors, and so forth.

Page 18: Galera Cluster Best Practices for DBA's and DevOps Part 1

Restoring Entire Cluster #2
Optimized approach:
1. Restore backup on first node, bootstrap 1-node cluster. Prevent new transactions from being issued.
2. Restore backup on other nodes and create grastate.dat on them:

# GALERA saved state
version: 2.1
uuid: <new-cluster-uuid>
seqno: 0
cert_index:

Page 19: Galera Cluster Best Practices for DBA's and DevOps Part 1

SST

Page 20: Galera Cluster Best Practices for DBA's and DevOps Part 1

General Principles
• It is always best to avoid SST altogether by sizing gcache.size appropriately
• SST choice is determined by:
– availability of a dedicated donating node
– encryption requirements (SST encryption is configured separately from other cluster operations)
– bandwidth utilization
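One common way to approach sizing gcache.size is to multiply the replication write rate by the longest outage a node should survive without needing SST. The sketch below is a rough estimate under that assumption. It takes two readings of replicated bytes (for example the sum of wsrep_replicated_bytes and wsrep_received_bytes) as plain numbers. The function name `estimate_gcache_bytes` and the sample figures are illustrative.

```python
def estimate_gcache_bytes(bytes_before, bytes_after,
                          sample_seconds, outage_seconds):
    """Estimate the write-set volume produced during an outage window."""
    # Average replication rate over the sampling interval, in bytes/second.
    rate = (bytes_after - bytes_before) / float(sample_seconds)
    return int(rate * outage_seconds)


# Illustrative numbers: 600 MB replicated in one minute, and nodes
# should be able to stay away for up to one hour and still use IST.
needed = estimate_gcache_bytes(0, 600000000, 60, 3600)
print(needed)
```

Sampling during peak write load gives a more conservative figure than sampling at an average moment.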

Page 21: Galera Cluster Best Practices for DBA's and DevOps Part 1

SST Methods
• Rsync (the default)
– causes the donor to block new transactions
• XtraBackup
– donor remains operational
• Mysqldump
– logical, rather than physical database copy

Page 22: Galera Cluster Best Practices for DBA's and DevOps Part 1

Configuration without a Dedicated Donor
• Use xtrabackup-v2 SST method
– requires the installation of XtraBackup
– there is still performance impact on the donor
– encryption, compression are available

Page 23: Galera Cluster Best Practices for DBA's and DevOps Part 1

Configuration with Dedicated Donor
• rsync may be the best method to use:
– available on any machine
– compression and partial file transfers are possible
• use wsrep_sst_donor to specify donor node to use:
– takes a list of node names as configured with wsrep_node_name
– if the list terminates with a comma, other nodes can also be used for donors if none of the nodes from the list are available
• if using an arbitrator, consider switching it to a full Galera node that can then act as a dedicated donor
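The trailing-comma rule for wsrep_sst_donor is easy to get wrong by hand. The sketch below builds the value from a list of preferred donors. The function name `sst_donor_value` and the node names `node3` and `node2` are hypothetical; real names must match each node's wsrep_node_name.

```python
def sst_donor_value(preferred, allow_fallback=True):
    """Build a wsrep_sst_donor value from preferred donor node names.

    With allow_fallback=True the list ends with a comma, which lets any
    other node act as donor when none of the listed nodes is available.
    """
    if not preferred:
        raise ValueError("at least one donor name is required")
    for name in preferred:
        if "," in name or not name.strip():
            raise ValueError("invalid node name: %r" % name)
    value = ",".join(preferred)
    if allow_fallback:
        value += ","
    return value


print(sst_donor_value(["node3", "node2"]))
print(sst_donor_value(["node3"], allow_fallback=False))
```

Omitting the trailing comma restricts SST to the listed donors only, which suits a setup where no other node should ever be slowed down by a transfer.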

Page 24: Galera Cluster Best Practices for DBA's and DevOps Part 1

Questions

• Please use the Question/Chat box in the GoToWebinar panel

• Ideas welcome for future webinars

Page 25: Galera Cluster Best Practices for DBA's and DevOps Part 1

Thank You

http://www.galeracluster.com

Discussion group:

[email protected]