Become a MySQL DBA - webinar series - slides: Which High Availability solution?

Copyright 2015 Severalnines AB

Designing HA for MySQL

July 28, 2015

Krzysztof Książek

Severalnines

[email protected]

1



! We want to help all non-DBA people who have to take care of MySQL infrastructure

! Discuss most common activities

! Share tips and good practices

! If you missed them, we’d like to encourage you to watch the replays of “Deep Dive Into How to Monitor Galera Cluster” and “Deciding on a relevant backup solution”

! http://www.slideshare.net/Severalnines/videos

2

“Become a MySQL DBA” series



! HA - what is it?

! Caching layer

! HA solutions

! Proxy layer

! Common problems

3

Agenda


! It’s not enough to just build a database infrastructure, you have to keep it available

! Redundancy is your friend

! Automate as much of the failover process as possible

! Know your business requirements and then decide

! Higher availability = higher costs - you need to find a sweet spot

4

High Availability - what is it about?
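
To make the cost trade-off concrete, it can help to translate an availability target into a yearly downtime budget. The sketch below is plain arithmetic, not something from the slides; the targets shown are only illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is usually where the extra cost comes from.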


Caching layer

5


! Reduce the load on the database

! Works as a buffer between the application and the database tier

! Gives you some HA features as it can serve the data even if the database is not available

! It’s a must-have for any larger application - cheaper than a database and easier to scale

6

Caching layer - why do I need it?


! Memcached, Redis, Couchbase, you name it

! Database access is expensive and should be avoided - therefore we have cache to handle reads

! Avoid cache miss storms

! Serve outdated data while the cache is refreshed, or wait for the refresh if you can’t serve outdated data

! Refresh the cache by executing a query ONCE!

! If you can serve old results, you can partially hide issues

7

Read cache
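
The "refresh ONCE, serve old results meanwhile" idea can be sketched as below. This is a minimal in-process illustration, not a Memcached or Redis client: the class name `StaleWhileRefreshCache` and the `load_from_db` callable are hypothetical, and a real deployment would need the same coordination across application servers.

```python
import threading
import time


class StaleWhileRefreshCache:
    """Tiny in-process cache: one refresh per key at a time, stale data otherwise."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}            # key -> (value, stored_at)
        self._refreshing = set()   # keys whose query is currently running
        self._cond = threading.Condition(threading.Lock())

    def get(self, key, load_from_db):
        with self._cond:
            while True:
                entry = self._data.get(key)
                if entry is not None and time.monotonic() - entry[1] < self.ttl:
                    return entry[0]              # fresh hit
                if key not in self._refreshing:
                    self._refreshing.add(key)    # this caller runs the query
                    break
                if entry is not None:
                    return entry[0]              # serve outdated data
                self._cond.wait()                # nothing to serve: wait for the refresh
        try:
            value = load_from_db(key)            # runs outside the lock, once per refresh
            with self._cond:
                self._data[key] = (value, time.monotonic())
            return value
        finally:
            with self._cond:
                self._refreshing.discard(key)
                self._cond.notify_all()          # wake callers waiting for this key
```

Concurrent callers that miss on the same key either get the old value or wait, so the database sees a single query instead of a storm.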


! Do not write directly to the database, use a persistent queue for that (Kinesis or RabbitMQ, for example)

! Helps you to avoid overloading the data tier with writes

! You can define the exact number of workers to handle the writes

! Helps to minimize the impact of the database tier not being available by caching writes

8

Write cache
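
A fixed pool of workers draining a queue might look like the sketch below. It uses Python's in-memory `queue.Queue` purely as a stand-in: it is not persistent, so a real setup would swap in a durable queue such as the ones named above. `apply_write` is a hypothetical callable that performs the actual database write.

```python
import queue
import threading

WORKER_COUNT = 2  # exact number of writers allowed to hit the database


def run_writers(write_queue, apply_write, worker_count=WORKER_COUNT):
    """Start a fixed pool of workers that drain the queue and apply each write."""

    def worker():
        while True:
            item = write_queue.get()
            if item is None:            # sentinel: stop this worker
                write_queue.task_done()
                return
            try:
                apply_write(item)       # hypothetical: performs the INSERT/UPDATE
            finally:
                write_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    return threads


def stop_writers(write_queue, threads):
    """Send one stop sentinel per worker and wait for all of them to exit."""
    for _ in threads:
        write_queue.put(None)
    for thread in threads:
        thread.join()
```

The application only calls `write_queue.put(...)`; the number of concurrent database writers never exceeds `worker_count`, and writes simply pile up in the queue while the database is unavailable. Error handling (retry or requeue on a failed write) is left out of the sketch.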


High Availability Solutions

9


! RAID 1 over TCP

! Maintain an exact copy (or two) of your volume in a separate location

! Active - passive model, only one volume can be mounted

! Works nicely if you have a single database node

10

Distributed Replicated Block Device


! Passive -> active switch takes time (InnoDB recovery is required)

! You can’t use the passive node for anything

! Just a single node, not feasible for larger environments

! Great tool but with limited use cases

11

Distributed Replicated Block Device


! Best known HA solution for MySQL

! You can use it for scaling too

! By default - asynchronous

! Failover process may be tricky without GTID

! Many tools are available to automate it, though

12

MySQL Replication


! Locate the most advanced slave, apply missing updates from master’s binary logs (if needed and possible)

! Ensure all slaves are on the same position for reslaving

! Reslave the rest of the slaves to the chosen node

! Perform the failover

! The whole process is error-prone and tricky

! You can use MHA to manage it for you

13

MySQL Replication - failover
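
The first step, locating the most advanced slave, can be sketched as a comparison of replication coordinates. The sketch assumes you have already collected `Relay_Master_Log_File` and `Exec_Master_Log_Pos` from `SHOW SLAVE STATUS` on every slave; the host names and the helper functions are hypothetical, and binary log names are assumed to end in a numeric sequence (for example `mysql-bin.000042`).

```python
from typing import Dict, List, Tuple

# (Relay_Master_Log_File, Exec_Master_Log_Pos) as reported by SHOW SLAVE STATUS
Position = Tuple[str, int]


def binlog_sort_key(position: Position) -> Tuple[int, int]:
    """Order positions by binlog sequence number, then by offset within the file."""
    log_file, log_pos = position
    sequence = int(log_file.rsplit(".", 1)[1])  # "mysql-bin.000042" -> 42
    return sequence, log_pos


def most_advanced_slave(positions: Dict[str, Position]) -> str:
    """Return the slave that has executed the most of the master's binary log."""
    return max(positions, key=lambda host: binlog_sort_key(positions[host]))


def slaves_needing_catch_up(positions: Dict[str, Position]) -> List[str]:
    """List the slaves that are behind the most advanced one."""
    best = binlog_sort_key(positions[most_advanced_slave(positions)])
    return sorted(h for h, p in positions.items() if binlog_sort_key(p) < best)


positions = {
    "slave1": ("mysql-bin.000041", 9000),
    "slave2": ("mysql-bin.000042", 120),
    "slave3": ("mysql-bin.000042", 4),
}
print(most_advanced_slave(positions))
print(slaves_needing_catch_up(positions))
```

This only covers candidate selection. Applying missing events from the old master's binary logs and reslaving are the error-prone parts that tools such as MHA automate.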


! Reslaving became much easier (CHANGE MASTER TO … MASTER_AUTO_POSITION=1)

! You still have to choose the most advanced slave to promote

! You still have to replay binary logs (if possible) to apply missing changes

! You may still benefit from additional tooling

14

MySQL Replication - GTID
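
With GTID, reslaving no longer needs a binary log file name and position. As a small illustration, the helper below builds the statements you would run on each remaining slave. The function name and the host `db2.example.com` are hypothetical, and replication credentials are assumed to be configured on the slave already.

```python
def reslave_statements(new_master_host: str, new_master_port: int = 3306) -> list:
    """Build the statements that point a slave at a new master via GTID auto-positioning."""
    # Very conservative host check, since the value is placed into SQL text.
    if not new_master_host.replace("-", "").replace(".", "").isalnum():
        raise ValueError("unexpected characters in host name")
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_HOST='{new_master_host}', "
        f"MASTER_PORT={int(new_master_port)}, MASTER_AUTO_POSITION=1;",
        "START SLAVE;",
    ]


for statement in reslave_statements("db2.example.com"):
    print(statement)
```

The slave then works out by itself which transactions it is missing, which is why the position-matching step from the non-GTID procedure disappears.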


! Handles the failover process for you

! Can be used as a standalone solution or as part of a larger scheme (Pacemaker)

! masterha_manager is a SPOF - you need to monitor it

! Can work with GTID and regular replication

! Make sure that you have shutdown_script defined for STONITH in MHA config

15

MySQL Replication - MHA


Clustering

16


! Synchronous cluster

! Based on NDB engine (not InnoDB!)

! Great point-select performance

! Great insert performance

! Using data partitioning for redundancy

! Behaves differently than InnoDB - especially range queries

! It’s not a drop-in replacement for regular MySQL

17

MySQL Cluster


! Virtually synchronous cluster (lag <1s, typically a few ms)

! Doesn’t split the data, each node is a full copy

! Harder to scale as you can’t increase disk capacity by adding new nodes

! Easier to run reporting queries as all data is available on every node

18

Galera Cluster


! "almost" a drop-in replacement for regular MySQL (uses InnoDB engine)

! Different AUTO_INCREMENT handling

! All tables should have a primary key defined

! Basically, it’s InnoDB only (avoid MyISAM)

! Large transactions may be problematic or impossible

! Schema changes may become more complex

19

Galera Cluster
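
The slide does not spell out what is different about AUTO_INCREMENT. One likely reading, stated here as an assumption, is Galera's default behavior of adjusting `auto_increment_increment` and `auto_increment_offset` per node so that nodes never generate the same value. The sketch below only models the resulting sequences for a three-node cluster.

```python
def auto_increment_values(offset: int, increment: int, count: int) -> list:
    """First `count` values generated with the given auto_increment offset and increment."""
    return [offset + i * increment for i in range(count)]


cluster_size = 3  # assumed: increment equals the number of nodes
for node_offset in range(1, cluster_size + 1):
    values = auto_increment_values(node_offset, cluster_size, 4)
    print(f"node with offset {node_offset}: {values}")
```

The practical consequence is that generated IDs have gaps and are not strictly ordered by insert time across nodes, so applications should not rely on consecutive values.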


Proxy layer

20


! It’s a set of servers that will work as a middle man between the application and the database layer

! They route the traffic to the database layer

! They should detect failed instances and topology changes

! It’s useful to hide the database layer complexity from the application

21

Proxy layer - why do we need it?


! Popular and proven tool

! Not database-aware, it just moves packets

! Can check if the port is available

! Can do HTTP tests - very useful for building custom health-check logic

22

HAProxy


! Use the read_only variable to differentiate the master from the slaves

! Have a script that works in the background, checks the state of a node and stores it in shared memory

! Have a script that will be executed via xinetd, checks the state from shared memory and returns HTTP codes (200, 503) accordingly

! Make sure that the read_only variable is changed only after the old master has been stopped (by shutdown_script in MHA) - otherwise split brain may happen

23

HAProxy
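
The xinetd-side script boils down to turning a known node state into an HTTP status. The sketch below shows only that mapping. The function name and the `pool` argument are hypothetical, and the `read_only` value is passed in directly where the real script would read it from shared memory.

```python
def health_check_response(read_only: bool, pool: str) -> bytes:
    """Return a raw HTTP response: 200 if the node belongs in `pool`, otherwise 503.

    pool is "writer" (expects read_only=OFF) or "reader" (expects read_only=ON).
    """
    if pool == "writer":
        healthy = not read_only
    elif pool == "reader":
        healthy = read_only
    else:
        raise ValueError("pool must be 'writer' or 'reader'")

    status = "200 OK" if healthy else "503 Service Unavailable"
    body = f"read_only={'ON' if read_only else 'OFF'}\r\n"
    response = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
        f"{body}"
    )
    return response.encode("ascii")


print(health_check_response(read_only=False, pool="writer").decode("ascii"))
```

HAProxy would run its HTTP check against the port served by xinetd and take the node out of the backend whenever it sees the 503. Whether the master should also pass the "reader" check is a design choice this sketch does not make for you.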


! MySQL-aware proxy

! Read/write splitting

! Still new software, requires detailed tests

! Tends to use a significant amount of CPU for R/W split

! Updated frequently, you may want to follow the changes

24

MaxScale


! Proxies need to be highly available too

! Multiple options to choose from:

! Put a proxy in front of the proxy (ELB)

! VIP + failover (e.g. keepalived)

! Colocate proxies with web nodes and handle config changes via orchestration tools

25

HA for proxies


! Every approach has pros and cons

! ELB - easiest but available only in AWS (similar tools may be available for other cloud providers too)

! VIP - only one proxy node will be active at a given time - CPU utilization may become an issue

! Colocating proxies allows you to use more of them, but maintaining the configuration can become a burden and may be error-prone

26

HA for proxies


Common problems

27


! Errant transactions are transactions executed on the slave only, not on the master

! With GTID, all transactions executed on a given host will be requested by its slaves once the host becomes a master

! If those transactions are no longer available in the binlogs, replication will break

! http://www.severalnines.com/blog/mysql-replication-and-gtid-based-failover-deep-dive-errant-transactions

28

Errant transactions in GTID
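
Detecting errant transactions comes down to subtracting the master's executed GTID set from the slave's. On a live server this is what the `GTID_SUBTRACT()` function does; the sketch below reimplements the idea in simplified form so the logic is visible. The helper names and the shortened UUIDs are hypothetical, and expanding intervals into Python sets is only practical for small examples.

```python
def parse_gtid_set(text: str) -> dict:
    """Parse 'uuid:1-5:8,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid, set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            numbers.update(range(int(first), int(last or first) + 1))
    return result


def errant_transactions(slave_executed: str, master_executed: str) -> dict:
    """Return transactions present on the slave but missing on the master."""
    master = parse_gtid_set(master_executed)
    errant = {}
    for uuid, numbers in parse_gtid_set(slave_executed).items():
        extra = numbers - master.get(uuid, set())
        if extra:
            errant[uuid] = sorted(extra)
    return errant


master_gtids = "aaaa:1-100"
slave_gtids = "aaaa:1-100,bbbb:1-2"   # two transactions executed directly on the slave
print(errant_transactions(slave_gtids, master_gtids))
```

A non-empty result before a failover means that promoting this slave would make the other hosts request those transactions from it.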



! Master loses network connection, failover is deemed necessary

! One of the slaves is staged to be a new master, others are reslaved

! VIP is assigned to the new master

! VIP is still assigned to the old master (as it’s not available and VIP can’t be removed)

! Once the old master comes up, data will be written to it

29

Split Brain


! STONITH is the solution - kill it with fire

! Ensure that the old master won’t come back up (think about how your automated recovery will behave)

! Start with a dedicated network connection (if available) - use a patch cord. Maybe it’s a network error only?

! Try to use IPMI/iLO/KVM-ish solutions to stop the server

! Try to stop the server using a manageable power strip

30

Split Brain - STONITH


! In the cloud - try to stop the instance completely

! You can also try to bond your interfaces for better network availability (although you never know what’s on the other side of the black box)

! For MHA - STONITH process is executed as shutdown_script. For other tools - ensure you have implemented similar behavior

! This is relevant for MySQL replication - Galera doesn’t require such harsh methods

31

Split Brain - STONITH
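
The escalation described on the two STONITH slides (dedicated link first, then IPMI/iLO, then the power strip or a cloud API) can be sketched as an ordered list of attempts. Everything here is hypothetical: `fence_old_master` and the method callables are placeholders for whatever your `shutdown_script` or equivalent actually does.

```python
from typing import Callable, Iterable, Optional, Tuple

# (method name, callable that returns True when the host is confirmed down)
FencingMethod = Tuple[str, Callable[[str], bool]]


def fence_old_master(host: str, methods: Iterable[FencingMethod]) -> Optional[str]:
    """Try each fencing method in order; return the name of the first that succeeds."""
    for name, attempt in methods:
        try:
            if attempt(host):
                return name
        except Exception as error:  # a failed method must not stop the escalation
            print(f"{name} failed for {host}: {error}")
    return None
```

A failover tool would only promote a new master when the returned value is not `None`. If no method can confirm that the old master is down, stopping and alerting a human is usually safer than risking two writable masters.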

32

Solution          | Synchronous             | Load-balancing reads | Load-balancing writes | WAN                   | Scalable                     | MySQL/InnoDB compatible
DRBD              | yes                     | no                   | no                    | yes (standby DR site) | no                           | yes
MySQL replication | no (async or semi-sync) | yes                  | no                    | yes                   | only reads                   | yes
MySQL Cluster     | yes                     | yes                  | yes                   | yes                   | yes                          | expect different behavior
Galera            | virtually               | yes                  | yes                   | yes                   | reads, writes to some extent | almost


! More blogs in “Become a MySQL DBA” series:

! http://www.severalnines.com/blog/become-dba-blog-series-monitoring-and-trending

! http://www.severalnines.com/blog/become-mysql-dba-blog-series-backup-restore

! Contact: [email protected]

33

Thank You!
