Webinar slides: 9 DevOps Tips for Going in Production with Galera Cluster for MySQL / MariaDB
Confidential 1
Your host & some logistics
¤ I'm Jean-Jérôme from the Severalnines Team and I'm your host for today's webinar!
¤ Feel free to ask any questions in the Questions section of this application or via the Chat box.
¤ You can also contact me directly via the chat box or via email: [email protected] during or after the webinar.
Copyright 2016 Severalnines AB
About Severalnines and ClusterControl
What we do
¤ Manage
¤ Scale
¤ Monitor
¤ Deploy
ClusterControl Automation & Management
¤ Provisioning
¤ Deploy a cluster in minutes
¤ On-premises or in the cloud (AWS)
¤ Monitoring
¤ Systems view
¤ 1-second resolution
¤ DB / OS stats & performance advisors
¤ Configurable dashboards
¤ Query Analyzer
¤ Real-time / historical
¤ Management
¤ Multi cluster/data-center
¤ Automate repair/recovery
¤ Database upgrades
¤ Backups
¤ Configuration management
¤ Cloning
¤ One-click scaling
Supported Databases
Customers
Agenda
¤ #1: 101 Sanity Check
¤ #2: Operating System
¤ #3: Backup Strategies
¤ #4: Replication & Sync
¤ #5: Query Performance
¤ #6: Schema Changes
¤ #7: Security / Encryption
¤ #8: Reporting
¤ #9: Protecting from Disasters
¤ Q&A
#1: 101 Sanity Check
101 Sanity Check
¤ Ensure ALL tables use the correct storage engine
¤ MySQL: InnoDB or XtraDB
¤ InnoDB supports FULLTEXT indexes as of MySQL 5.6
¤ MyISAM tables: don't use them
¤ Disabled/forbidden in Percona XtraDB Cluster 5.7: ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits use of LOCK TABLE/FLUSH TABLE <table> WITH READ LOCK with pxc_strict_mode = ENFORCING
¤ Ensure ALL tables have a PRIMARY KEY
¤ If no PRIMARY KEY is defined: add one!
¤ Ensure you have NO unbounded queries
¤ E.g. UPDATE <table> SET x=x+1 (when there are many rows)
¤ Update/delete in smaller batches (e.g. 1000 records).
¤ UPDATE <table> SET x=x+1 LIMIT 1000
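The batching advice above can be sketched as a small helper that splits one large UPDATE into primary-key ranges, so that each statement touches at most a fixed number of key values. This is only an illustration: the table name `t1`, the key column `id` and the function name are hypothetical, and ranges are used instead of a bare LIMIT because a repeated LIMIT without a WHERE clause may pick the same rows again.

```python
def batched_updates(table, set_clause, min_id, max_id, batch_size=1000, pk="id"):
    """Yield UPDATE statements that each cover at most batch_size key values."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        # String formatting is for illustration only; identifiers are assumed
        # to be trusted constants, never user input.
        yield f"UPDATE {table} SET {set_clause} WHERE {pk} BETWEEN {start} AND {end}"
        start = end + 1


if __name__ == "__main__":
    for stmt in batched_updates("t1", "x = x + 1", 1, 2500):
        print(stmt)
```

Each generated statement would be run and committed as its own transaction, which keeps the writesets replicated by Galera small.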
101 Sanity Check
¤ Ensure that the application can tolerate non-sequential auto-increment values.
¤ Galera manages the auto-increment settings.
¤ Redirect deadlock-prone update queries on hot tables and rows to a single Galera node:
¤ E.g. UPDATE counter_tbl SET counter = counter + 1;
¤ http://www.severalnines.com/blog/avoiding-deadlocks-galera-set-haproxy-single-node-writes-and-multi-node-reads
¤ Ensure your application does not use LOCK TABLES
¤ Use wsrep_sst_method=xtrabackup-v2
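To illustrate why auto-increment values are non-sequential: with Galera's automatic auto-increment control, auto_increment_increment follows the cluster size and each node gets its own auto_increment_offset. The sketch below shows the values each node of a three-node cluster would hand out under that assumption. The function name is hypothetical, and a live cluster can show further gaps, for example after rollbacks or membership changes.

```python
from itertools import count, islice


def node_ids(offset, increment, how_many):
    """First `how_many` values for one node: offset, offset + increment, ..."""
    return list(islice(count(offset, increment), how_many))


if __name__ == "__main__":
    cluster_size = 3  # assumed number of Galera nodes
    for offset in range(1, cluster_size + 1):
        print(f"node with offset {offset}: {node_ids(offset, cluster_size, 4)}")
```

The per-node sequences never overlap, so inserts on different nodes cannot collide on the generated key, but an application must not expect consecutive IDs.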
101 Sanity (WAN replication)
¤ Galera
¤ Increase timeouts:
wsrep_provider_options="evs.keepalive_period=PT3S;evs.inactive_check_period=PT10S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M;evs.send_window=1024;evs.user_send_window=512"
¤ This relaxes how quickly a node is evicted from the cluster.
¤ The default values are usually fine for networks with a ping time below 10-15 ms.
#2: Operating System
Operating System
¤ Swapping
¤ echo "1" > /proc/sys/vm/swappiness
¤ NUMA on multi-socket systems
¤ Can lead to contention and strange lock-ups, but this has mostly been resolved nowadays
¤ Is it enabled?
¤ dmesg | grep -i numa
¤ Grub boot option "numa=off"
¤ … and other possibilities
¤ Filesystem
¤ Reduce writes by mounting with noatime
¤ Check /etc/mtab
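The /etc/mtab check can be automated with a few lines. The sketch below parses mtab-style text and lists mount points that lack noatime; the function name, the filesystem-type filter and the sample text are assumptions made for illustration, not part of the original slides.

```python
def mounts_without_noatime(mtab_text, fstypes=("ext3", "ext4", "xfs")):
    """Return mount points of the given filesystem types that lack noatime."""
    missing = []
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype in fstypes and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


# Hypothetical sample in /etc/mtab format (device, mount point, type, options).
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /var/lib/mysql xfs rw,noatime 0 0
proc /proc proc rw,nosuid 0 0
"""

if __name__ == "__main__":
    print(mounts_without_noatime(SAMPLE))
```

On a real host the text would come from reading /etc/mtab, and the MySQL data directory is the mount point that matters most.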
Operating System
¤ In virtualized environments it is easy to over-commit resources on a single host.
¤ Keep track of the host running the VMs
¤ Is it heavily loaded?
¤ CPU Steal (check on the VMs)?
¤ Is it swapping?
¤ Be prepared to kill off slow nodes
#3: Backup
Backup
¤ Logical backups
¤ mysqldump
¤ Physical backups
¤ Percona XtraBackup
¤ Full / incremental backups
¤ Streaming backups
¤ Parallelism, compression and encryption
¤ Filesystem snapshots
¤ S3 / Glacier or Swift can be used for offline/offsite storage
Backup
¤ Implement a backup policy
¤ Full backup every night
¤ Incremental backup every 4 hours
¤ Enable binary logging
¤ Point-in-time recovery (PITR)!
#4: Replication and Sync
Replication and Sync
¤ Galera: IST vs SST
¤ IST (Incremental State Transfer) is (mostly) quicker
¤ Uses the gcache to retrieve the incremental state
¤ Avoid SST (Snapshot State Transfer) over WAN
¤ SST is triggered when the gcache no longer holds the writesets needed for IST
Replication and Sync
¤ Galera SST
¤ Ensure you are using a non-blocking SST method
¤ wsrep_sst_method=xtrabackup-v2
¤ Use other, more efficient ways to sync larger DBs, e.g. snapshots
¤ Or a recent backup stored on the node or on an attached disk.
Replication and Sync
¤ Dimension the gcache. Example for a maintenance window of 6 hours:
¤ Writes to cluster per second: 1 MB/s
¤ Maintenance window (seconds) = 6 hours * 60 * 60 = 21600 s
¤ gcache size = 1 MB/s x 21600 s = 21600 MB ≈ 21 GB
¤ Multiply by 1.5x or 2x for a safety margin, e.g. 2x:
¤ gcache.size=42G
¤ wsrep_provider_options='gcache.size=42G'
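The sizing arithmetic above can be written as a short calculation. This is a minimal sketch with hypothetical function names; it assumes 1 GB = 1024 MB and rounds to the nearest whole gigabyte, which reproduces the 42G figure from the example.

```python
def gcache_size_gb(write_rate_mb_s, window_hours, margin=2.0):
    """Estimate the gcache size in GB for a given write rate and outage window."""
    window_seconds = window_hours * 60 * 60
    total_mb = write_rate_mb_s * window_seconds * margin
    return total_mb / 1024  # assumption: 1 GB = 1024 MB


def provider_option(size_gb):
    """Format the estimate as a gcache.size setting, rounded to whole GB."""
    return f"gcache.size={round(size_gb)}G"


if __name__ == "__main__":
    size = gcache_size_gb(write_rate_mb_s=1, window_hours=6, margin=2.0)
    print(provider_option(size))
```

The write rate is the input that needs measuring on the real cluster; the margin only protects against moderate bursts above that rate.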
#5: Query Performance
Query Performance
¤ A number of things to watch out for:
¤ Badly written queries or missing indexes
¤ Transactions locking many records: BEGIN; SELECT * FROM t1 FOR UPDATE; … LOCK TABLES .. ; /* do something */ ; UNLOCK TABLES;
¤ DML updating/deleting many records in one chunk
¤ Update/delete in “small” batches of 1000-10000 records. Do not update 100000 records at once.
¤ Deadlocks and deadlock-prone code
¤ E.g. running two mysqldumps at the same time
¤ Updating the very same record in a very hot table from multiple threads on multiple hosts
¤ Use your favorite tool to detect the problems
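Besides detecting deadlock-prone code, applications on Galera usually need to retry: a certification conflict between nodes is reported to the client as a deadlock error (MySQL error 1213). The sketch below shows one way to wrap a transaction in a retry loop. `DeadlockError` is a hypothetical stand-in for whatever exception a real driver raises, and the retry count and backoff are arbitrary assumptions.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver exception carrying MySQL error 1213."""


def run_with_retry(transaction, attempts=3, backoff_s=0.0):
    """Call transaction() and retry it when it raises DeadlockError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == attempts:
                raise  # give up and surface the error to the caller
            time.sleep(backoff_s * attempt)  # linear backoff before the next try


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_transaction():
        # Simulated transaction that hits a deadlock on its first two runs.
        state["calls"] += 1
        if state["calls"] < 3:
            raise DeadlockError("deadlock found when trying to get lock")
        return "committed"

    print(run_with_retry(flaky_transaction), "after", state["calls"], "calls")
```

Retrying only makes sense for transactions that are safe to re-run from the start; hot-row updates are still better routed to a single node.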
Query Performance
¤ When performance grinds to a halt you want to know!
Query Performance
¤ You want to be warned about any slowdowns
Query Performance
¤ If a deadlock happens, have something your devs can look at
Query Performance
¤ And see if there is any overflow of queries happening
#6: Schema Changes
Schema Changes
¤ Make a plan on how to deal with schema changes
¤ MySQL replication and Galera apply DDL changes differently!
¤ Compatible or incompatible schema change?
¤ Naturally you have a test cluster to make sure your plan sticks.
Schema Changes
¤ Online schema change tools for MySQL:
¤ Facebook OSC
¤ Percona OSC
¤ GitHub gh-ost
Schema Changes
¤ MySQL Galera
¤ TOI (Total Order Isolation) is the default
¤ Executed on all nodes at the same time
¤ Works fine for non-copying ALTER TABLEs; otherwise it locks the table
¤ Run locking ALTERs only on TINY tables (1000 records)
¤ If it takes 1 sec, your app will be blocked for 1 sec.
¤ RSU (Rolling Schema Update)
¤ DDL is not replicated, so it is only executed locally
¤ Changes must be compatible with queries executed on the other nodes
¤ For each node, do: SET GLOBAL wsrep_OSU_method=RSU; ALTER TABLE …
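The per-node RSU procedure can be laid out as an explicit plan before anything is run. The sketch below only builds the list of statements per node and executes nothing. The node names and the DDL are hypothetical, the SET statement mirrors the one on the slide (whether GLOBAL or SESSION scope is appropriate depends on the Galera version), and switching back to TOI afterwards is an assumption added here.

```python
def rsu_plan(nodes, ddl):
    """Return (node, statement) pairs for applying ddl one node at a time."""
    plan = []
    for node in nodes:
        plan.append((node, "SET GLOBAL wsrep_OSU_method='RSU'"))
        plan.append((node, ddl))
        # Assumption (not on the slide): restore the default method afterwards.
        plan.append((node, "SET GLOBAL wsrep_OSU_method='TOI'"))
    return plan


if __name__ == "__main__":
    # Hypothetical node names and schema change.
    steps = rsu_plan(["galera1", "galera2", "galera3"],
                     "ALTER TABLE t1 ADD COLUMN note VARCHAR(64) NULL")
    for node, statement in steps:
        print(f"{node}: {statement}")
```

Finishing one node completely before moving to the next keeps the remaining nodes serving traffic, which is why the change must stay compatible with queries running against the old schema.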
#7: Security / Encryption
Security / Encryption
¤ Enable SSL client-server encryption
¤ MySQL protocols can be sniffed
¤ Encrypt replication links using SSL
¤ WAN connections
¤ MySQL Galera
#8: Reporting
Reporting
¤ Try to separate OLTP and OLAP if possible
¤ Run reports off an (async) slave/secondary or a dedicated node
¤ Remember: huge queries eat CPU, RAM and disk.
¤ Galera is not faster than its slowest node.
¤ Watch out for reports with side effects
¤ Large updates writing back?
#9: Protecting from Disasters
Protecting from Disasters
¤ Eventually a disaster will happen
¤ Software bugs
¤ Network / router upgrades
¤ Availability Zone / DC down
¤ Schema / software / hardware upgrade going wrong
¤ Too many connections
¤ User Errors
Protecting from Disasters (Galera)
¤ One way of protecting from cluster failures is to use an asynchronous slave replicating from the Galera cluster.
¤ If the cluster fails, the asynchronous slave can take over and handle the application workload until the cluster error has been resolved.
Protecting from Disasters
¤ Using GTIDs (available from MySQL 5.6 and MariaDB 10.1* onwards) allows for easy fail-over from MASTER1 to MASTER2:
¤ slave> CHANGE MASTER TO MASTER_HOST='MASTER2', MASTER_AUTO_POSITION=1; START SLAVE;
*) Yes, MariaDB 10.0 also has GTID support, but it is not integrated with Galera.
Protecting from Disasters
¤ A common problem is overload situations, which can originate from:
¤ DDOS
¤ The website loads slowly and users reload, creating more and more connections
¤ Eventually the database server runs out of connections (max_connections)
¤ Throttle connections with a load balancer!
¤ E.g. HAProxy, ProxySQL, etc.
¤ Cache rarely changing data!
¤ Redis
¤ Memcached
Protecting from Disasters
¤ Limit the # of backend connections
¤ HAProxy will queue the requests
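The effect of capping backend connections and queueing the rest can be illustrated with a toy simulation. This is not HAProxy's implementation, only a sketch of the idea: at most `maxconn` requests run at once, and later arrivals wait in FIFO order until a slot frees up. The function name and the sample workload are hypothetical.

```python
import heapq


def simulate(requests, maxconn):
    """Return the start time of each request under a concurrency cap.

    requests: list of (arrival_time, duration) tuples, sorted by arrival_time.
    """
    running = []  # min-heap holding the finish times of busy slots
    starts = []
    for arrival, duration in requests:
        # Release slots whose requests finished by the time this one arrives.
        while running and running[0] <= arrival:
            heapq.heappop(running)
        if len(running) < maxconn:
            start = arrival  # a slot is free, start immediately
        else:
            # All slots are busy: queue until the earliest one frees up.
            start = heapq.heappop(running)
        heapq.heappush(running, start + duration)
        starts.append(start)
    return starts


if __name__ == "__main__":
    burst = [(0, 10), (0, 10), (1, 10), (2, 10)]  # four overlapping requests
    print("maxconn=2:", simulate(burst, maxconn=2))
    print("maxconn=4:", simulate(burst, maxconn=4))
```

With the cap in place the database never sees more than `maxconn` concurrent connections; the cost is added latency for the queued requests rather than a max_connections error.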
READY.FOR.PRODUCTION
Q&A
Thank You!
¤ ClusterControl: www.severalnines.com/product/clustercontrol
¤ ClusterControl, Getting Started: www.severalnines.com/getting-started
¤ Polyglot Persistence meetups: http://goo.gl/64Ga5z
¤ Severalnines Blog: www.severalnines.com/blog
¤ Contact: [email protected]