Become a MySQL DBA - webinar series - slides: Which High Availability solution?
Copyright 2015 Severalnines AB
Designing HA for MySQL
July 28, 2015
Krzysztof Książek
Severalnines
1
! We want to help all non-DBA people who have to take care of MySQL infrastructure
! Discuss most common activities
! Share tips and good practices
! If you missed them, we encourage you to watch the replays of “Deep Dive Into How to Monitor Galera Cluster” and “Deciding on a relevant backup solution”
! http://www.slideshare.net/Severalnines/videos
2
“Become a MySQL DBA” series
! HA - what is it?
! Caching layer
! HA solutions
! Proxy layer
! Common problems
3
Agenda
! It’s not enough to just build a database infrastructure, you have to keep it available
! Redundancy is your friend
! Automate as much of the failover process as possible
! Know your business requirements and then decide
! Higher availability = higher costs - you need to find a sweet spot
4
High Availability - what is it about?
! Reduce the load on the database
! Works as a buffer between the application and the database tier
! Gives you some HA features, as it can serve data even if the database is not available
! It’s a must-have for any larger application - cheaper than a database and easier to scale
6
Caching layer - why do I need it?
! Memcached, Redis, Couchbase, you name it
! Database access is expensive and should be avoided - therefore we have cache to handle reads
! Avoid cache miss storm
! Serve outdated data or wait for refresh if you can’t do that
! Refresh the cache by executing a query ONCE!
! If you can serve old results, you can partially hide issues
7
Read cache
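The read-cache advice (refresh with a single query, serve old data while that happens) can be sketched as follows. This is a minimal in-process illustration, not a Memcached or Redis client: the class name `SingleRefreshCache` and the `load_from_db` callback are hypothetical, and a real deployment would keep the data and the per-key lock in the cache tier itself.

```python
import threading
import time


class SingleRefreshCache:
    """Toy cache: one refresh per key at a time, stale data served meanwhile."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}    # key -> (value, stored_at)
        self._locks = {}   # key -> lock held by whoever refreshes that key
        self._guard = threading.Lock()

    def _key_lock(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, entry):
        return entry is not None and time.monotonic() - entry[1] < self.ttl

    def get(self, key, load_from_db):
        entry = self._data.get(key)
        if self._fresh(entry):
            return entry[0]
        lock = self._key_lock(key)
        if not lock.acquire(blocking=False):
            # Another caller is already running the query for this key.
            if entry is not None:
                return entry[0]      # serve the outdated value
            lock.acquire()           # nothing to serve: wait for the refresh
        try:
            entry = self._data.get(key)
            if self._fresh(entry):   # refreshed while we were waiting
                return entry[0]
            value = load_from_db(key)  # the query runs once per refresh
            self._data[key] = (value, time.monotonic())
            return value
        finally:
            lock.release()
```

With this shape, a burst of requests for an expired key results in one database query; the other callers either get the old value or wait for the new one.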
! Do not write directly to the database, use a persistent queue for that (Kinesis or RabbitMQ, for example)
! Helps you to avoid overloading data tier with writes
! You can define exact number of workers to handle the writes
! Helps to minimize impact of the database tier not being available by caching writes
8
Write cache
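The idea of a fixed number of workers draining a write queue can be sketched as below. Note the assumptions: `queue.Queue` is an in-memory stand-in and is not persistent, so it only illustrates the worker pattern; in practice the queue would be Kinesis, RabbitMQ or similar. `start_writers` and `apply_write` are hypothetical names.

```python
import queue
import threading


def start_writers(write_queue, apply_write, worker_count):
    """Start exactly worker_count threads that apply queued writes."""

    def worker():
        while True:
            item = write_queue.get()
            try:
                if item is None:      # sentinel: stop this worker
                    return
                apply_write(item)     # e.g. execute an INSERT/UPDATE
            finally:
                write_queue.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    return threads
```

The application only ever calls `write_queue.put(...)`; the database sees at most `worker_count` concurrent writers regardless of how many requests arrive.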
! RAID 1 over TCP
! Maintain an exact copy (or two) of your volume in a separate location
! Active - passive model, only one volume can be mounted
! Works nicely if you have a single database node
10
Distributed Replicated Block Device (DRBD)
! Passive -> active switch takes time (InnoDB recovery is required)
! You can’t use the passive node for anything
! Just a single node, not feasible for larger environments
! Great tool but with limited use cases
11
Distributed Replicated Block Device (DRBD)
! Best known HA solution for MySQL
! You can use it for scaling too
! By default - asynchronous
! Failover process may be tricky without GTID
! Many tools are available to automate it, though
12
MySQL Replication
! Locate the most advanced slave, apply missing updates from master’s binary logs (if needed and possible)
! Ensure all slaves are on the same position for reslaving
! Reslave rest of the slaves to the chosen node
! Perform the failover
! Whole process is error-prone and tricky
! You can use MHA to manage it for you
13
MySQL Replication - failover
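The first failover step, locating the most advanced slave, can be sketched as a comparison of binary log coordinates. This assumes each slave's `SHOW SLAVE STATUS` output has already been fetched into a dict, and that binlog file names end in a numeric sequence (for example `mysql-bin.000012`). The function names are hypothetical, and a tool like MHA does considerably more than this.

```python
def binlog_coordinates(status):
    """Turn one slave's status into a sortable (file sequence, position) pair."""
    file_name = status["Master_Log_File"]
    sequence = int(file_name.rsplit(".", 1)[1])
    return (sequence, int(status["Read_Master_Log_Pos"]))


def most_advanced_slave(statuses):
    """Return the host that has received the most data from the old master."""
    return max(statuses, key=lambda host: binlog_coordinates(statuses[host]))


statuses = {
    "db2": {"Master_Log_File": "mysql-bin.000012", "Read_Master_Log_Pos": 4500},
    "db3": {"Master_Log_File": "mysql-bin.000013", "Read_Master_Log_Pos": 120},
    "db4": {"Master_Log_File": "mysql-bin.000012", "Read_Master_Log_Pos": 9000},
}
candidate = most_advanced_slave(statuses)
```

The file sequence is compared before the position, so a slave that has moved on to a newer binlog file wins even if its position within that file is small.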
! Reslaving became much easier (CHANGE MASTER TO … MASTER_AUTO_POSITION=1)
! You still have to choose the most advanced slave to promote
! You still have to replay binary logs (if possible) to apply missing changes
! You still may benefit from additional tooling
14
MySQL Replication - GTID
! Handles the failover process for you
! Can be used as a standalone solution or as part of a larger setup (Pacemaker)
! masterha_manager is a SPOF - you need to monitor it
! Can work with GTID and regular replication
! Make sure that you have shutdown_script defined for STONITH in MHA config
15
MySQL Replication - MHA
! Synchronous cluster
! Based on NDB engine (not InnoDB!)
! Great point-select performance
! Great insert performance
! Using data partitioning for redundancy
! Behaves differently than InnoDB - especially range queries
! It’s not a drop-in replacement for regular MySQL
17
MySQL Cluster
! Virtually synchronous cluster (lag <1s, typically a few ms)
! Doesn’t split the data, each node is a full copy
! Harder to scale as you can’t increase disk capacity by adding new nodes
! Easier to run reporting queries as all data is available on every node
18
Galera Cluster
! "almost" a drop-in replacement for regular MySQL (uses InnoDB engine)
! Different AUTO_INCREMENT handling
! All tables should have a primary key defined
! Basically, it’s InnoDB only (avoid MyISAM)
! Large transactions may be problematic or impossible
! Schema changes may become more complex
19
Galera Cluster
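The "different AUTO_INCREMENT handling" point can be illustrated with a little arithmetic. The sketch assumes the usual multi-writer arrangement, where `auto_increment_increment` equals the number of nodes and each node gets its own `auto_increment_offset`; the helper name is hypothetical and the exact values on a live cluster may differ.

```python
def auto_increment_values(offset, increment, count):
    """First `count` ids a node generates for a given offset and increment."""
    return [offset + i * increment for i in range(count)]


# Three-node cluster: increment 3, offsets 1, 2 and 3.
node1 = auto_increment_values(offset=1, increment=3, count=4)
node2 = auto_increment_values(offset=2, increment=3, count=4)
node3 = auto_increment_values(offset=3, increment=3, count=4)
```

Each node draws ids from its own sequence, so concurrent inserts on different nodes do not collide. The practical consequence for the application is that ids have gaps and are not strictly ordered by insert time.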
! It’s a set of servers that will work as a middle man between the application and the database layer
! They route the traffic to the database layer
! They should detect failed instances and topology changes
! It’s useful to hide the database layer complexity from the application
21
Proxy layer - why do we need it?
! Popular and proven tool
! Not database-aware, it just moves packets
! Can check if the port is available
! Can do HTTP tests - very useful to build a logic
22
HAProxy
! Use read_only variable to differentiate master and slaves
! Have a script that works in the background, checks the state of a node and stores it in shared memory
! Have a script that is executed via xinetd, reads the state from shared memory and returns HTTP codes (200, 503) accordingly
! Make sure that the read_only variable will be changed after the old master was stopped (by shutdown_script in MHA) - otherwise split brain may happen
23
HAProxy
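The xinetd-side check can be sketched as a function that turns the stored node state into an HTTP response. The state layout (`alive`, `read_only`) and the pool names `writer` and `reader` are assumptions made for this illustration; how the state is read from shared memory is left out.

```python
def health_response(state, pool):
    """Build the raw HTTP response HAProxy receives for its HTTP check.

    state: {"alive": bool, "read_only": bool} as stored by the background
           checker (layout assumed for this sketch).
    pool:  "writer" accepts only a node with read_only=OFF,
           "reader" accepts only nodes with read_only=ON.
    """
    is_master = state["alive"] and not state["read_only"]
    is_slave = state["alive"] and state["read_only"]
    healthy = is_master if pool == "writer" else is_slave

    status = "200 OK" if healthy else "503 Service Unavailable"
    word = "up" if healthy else "down"
    body = f"{pool}: {word}\r\n"
    return (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
        f"{body}"
    )
```

A script run by xinetd would write this string to standard output; HAProxy then marks the backend up on 200 and down on 503.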
! MySQL-aware proxy
! Read/write splitting
! Still new software, requires thorough testing
! Tends to use a significant amount of CPU for R/W split
! Updated frequently, you may want to follow the changes
24
MaxScale
! Proxies need to be highly available too
! Multiple options to choose from:
! Put a proxy in front of the proxy (ELB)
! VIP + failover (e.g. keepalived)
! Colocate proxies with web nodes and handle config changes via orchestration tools
25
HA for proxies
! Every approach has pros and cons
! ELB - easiest but available only in AWS (similar tools may be available for other cloud providers too)
! VIP - only one proxy node will be active at a given time - CPU utilization may become an issue
! Colocating proxies allows you to use more of them, but maintaining the configuration can become a burden and may be error-prone
26
HA for proxies
! Errant transactions are transactions executed on the slave only, not on the master
! With GTID, all transactions executed on a given host will be requested by the slaves once the host becomes a master
! If transactions are not available in binlogs, replication will break
! http://www.severalnines.com/blog/mysql-replication-and-gtid-based-failover-deep-dive-errant-transactions
28
Errant transactions in GTID
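Detecting errant transactions comes down to subtracting the master's executed GTID set from the slave's. MySQL can do this itself with `GTID_SUBTRACT()`; the sketch below shows the same set arithmetic in Python for illustration. The function names are hypothetical, the UUIDs are shortened placeholders, and expanding intervals into Python sets is only reasonable for small examples.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: {transaction ids}}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ids = result.setdefault(uuid, set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            ids.update(range(int(first), int(last or first) + 1))
    return result


def errant_transactions(slave_executed, master_executed):
    """Return {uuid: [ids]} executed on the slave but unknown to the master."""
    slave = parse_gtid_set(slave_executed)
    master = parse_gtid_set(master_executed)
    errant = {}
    for uuid, ids in slave.items():
        extra = ids - master.get(uuid, set())
        if extra:
            errant[uuid] = sorted(extra)
    return errant


found = errant_transactions(
    slave_executed="aaaa:1-10,bbbb:1-2",   # bbbb is the slave's own UUID
    master_executed="aaaa:1-10",
)
```

A non-empty result on a failover candidate means those transactions must be dealt with before the host is promoted.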
! Master loses network connection, failover is deemed necessary
! One of the slaves is staged to be a new master, others are reslaved
! VIP is assigned to the new master
! VIP is still assigned to the old master as well (the host is unreachable, so the VIP can’t be removed from it)
! Once the old master comes up, data will be written to it
29
Split Brain
! STONITH is the solution - kill it with fire
! Ensure that the old master won’t come back up (think about how your automated recovery will behave)
! Start with a dedicated network connection (if available) - use a patch cord. Maybe it’s only a network error?
! Try to use IPMI/iLO/KVM-ish solutions to stop the server
! Try to stop the server using a manageable power strip
30
Split Brain - STONITH
! In the cloud - try to stop the instance completely
! You can also try to bond your interfaces for better network availability (although you never know what’s on the other side of the black box)
! For MHA - STONITH process is executed as shutdown_script. For other tools - ensure you have implemented similar behavior
! This is relevant for MySQL replication - Galera doesn’t require such harsh methods
31
Split Brain - STONITH
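The STONITH advice describes an escalation: try the gentlest way of stopping the old master first, then move to harsher ones. A minimal sketch of that loop follows. The method names and the callables are hypothetical placeholders; each would wrap whatever your environment offers (SSH over a dedicated link, IPMI/iLO, a power strip, a cloud API).

```python
def fence_old_master(methods):
    """Try fencing methods in order; return the name of the first that works.

    methods: list of (name, action) pairs. Each action returns True only
    when it has confirmed that the old master is down.
    """
    for name, action in methods:
        try:
            if action():
                return name
        except Exception:
            # A broken fencing path must not stop the escalation.
            continue
    return None  # nothing worked: do NOT promote a new master blindly


# Placeholder actions standing in for real fencing commands.
result = fence_old_master([
    ("ssh_over_dedicated_link", lambda: False),
    ("ipmi_power_off", lambda: True),
    ("power_strip_off", lambda: True),
])
```

If the function returns `None`, the safe choice is to stop the failover and involve a human, since the old master may still be accepting writes.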
32
Solution          | Synchronous             | Load-balancing reads | Load-balancing writes | WAN                   | Scalable                      | MySQL/InnoDB compatible
DRBD              | yes                     | no                   | no                    | yes (standby DR site) | no                            | yes
MySQL replication | no (async or semi-sync) | yes                  | no                    | yes                   | only reads                    | yes
MySQL Cluster     | yes                     | yes                  | yes                   | yes                   | yes                           | expect different behavior
Galera            | virtually               | yes                  | yes                   | yes                   | reads, writes to some extent  | almost
! More blogs in “Become a MySQL DBA” series:
! http://www.severalnines.com/blog/become-dba-blog-series-monitoring-and-trending
! http://www.severalnines.com/blog/become-mysql-dba-blog-series-backup-restore
! Contact: [email protected]
33
Thank You!