Resolving and Preventing MySQL Downtime
Common MySQL service-impacting challenges, resolutions, and prevention.
Jervin Real
• Technical Services Manager – APAC
• Engineer Engineering Engineers
What is Downtime?
Application
Users
BOSS
Why Prevent Downtime?
• Your business loses money when the application is down
• You and your team’s reputation suffers
Agenda
• Real world adventures
  • Problems
  • Solutions
  • Prevention
• Putting them all together
I Had a Crash on You
I Had a Crash on You: Page Corruption
• Bad sectors on disk
• No monitoring or health checks
• Page corruption at the disk level; mysqld crashes when reading the page from disk
• … and it keeps crashing
I Had a Crash on You: Page Corruption
• Percona Server, we tried:
  • innodb_corrupt_table_action = salvage
  • Worked!
  • Dropped table, recreated - application back online
• Worst case:
  • innodb_force_recovery > 0
  • Data Recovery
I Had a Crash on You: Assertion
• Running 5.6.11, early adopter, InnoDB FULLTEXT
• Upgrade to 5.6.18, MySQL crashed
• Data was unusable - bug#72079
I Had a Crash on You: Assertion
• Downgrade and restore from backup
• Re-execute upgrade to avoid the bug
I Had a Crash on You: Assertion
• innodb_corrupt_table_action = salvage|warn
• pt-table-checksum
  • Regularly read through all your data and check the error log for errors
• RAID card health checks
  • Can vary by vendor
• SMART checks
  • Be vigilant for disk level errors
• Plan your upgrades properly
Nobody’s Watching
Nobody’s Watching: Nobody Cared
• Percona XtraDB Cluster, 3 nodes
• A few months ago, node 3 went down due to a conflict, but nobody noticed
• A few hours ago, node 2 was killed by the OOM killer and the cluster lost quorum
• EVERYBODY NOTICED!
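Why losing one more node was fatal can be shown with a little arithmetic. The sketch below is a simplified model, assuming equal node weights and that the cluster had already shrunk to two members after node 3 left; the function name `has_quorum` is illustrative, not part of any Galera API.

```python
def has_quorum(reachable: int, previous_members: int) -> bool:
    """True if the reachable nodes are a strict majority of the previous membership."""
    return 2 * reachable > previous_members


# Node 3 drops out of a 3-node cluster: 2 of 3 remain, still a majority.
print(has_quorum(2, 3))

# Months later node 2 is OOM-killed: 1 of 2 remains, exactly half, not a majority.
print(has_quorum(1, 2))
```

With only two members left, any single failure leaves exactly half the cluster, which is why the unnoticed loss of node 3 turned a routine OOM kill into an outage.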
Nobody’s Watching: Nobody Cared
• Bootstrap remaining node
  • mysql> SET GLOBAL wsrep_provider_options='pc.bootstrap=1';
• SST second and third node
• Define wsrep_notify_cmd temporarily
• Implement better alerting
Nobody’s Watching: Dropped the Bomb
• New sysadmin received disk space alert
• du -hx --max-depth=1 /
• /var has lots of data
• find /var/ -size +5G -exec rm -rf {} \;
• Bam, ibdata1 gone!
• Restart maintenance occurred later in the day ...
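A less destructive first step than `find ... -exec rm` is to list the large files and review them before touching anything. The following is a minimal sketch of such a dry run; the helper name `large_files` is hypothetical, and unlike `du -x` it does not stay on a single filesystem.

```python
import os


def large_files(root: str, min_bytes: int):
    """Yield (size, path) for regular files under root of at least min_bytes.

    Nothing is deleted; this only reports candidates for a human to review.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # do not follow or report symlinks
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable
            if size >= min_bytes:
                yield size, path


def report(root: str, min_bytes: int) -> None:
    """Print the candidates, largest first."""
    for size, path in sorted(large_files(root, min_bytes), reverse=True):
        print(f"{size / 1024**3:8.1f} GiB  {path}")
```

Calling `report("/var", 5 * 1024**3)` would roughly mirror the size filter used above, and a file such as ibdata1 would show up in the listing instead of silently disappearing.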
Nobody’s Watching: Dropped the Bomb
• Restore from backup
• Really, they were lucky!
• What if there were no backups and innodb_file_per_table = 0?
Nobody’s Watching: Prevention
• Percona Monitoring Plugins
  • pmp-check-deleted-files
  • pmp-check-mysql-status
  • pmp-check-mysql-innodb
• Define a script executable by mysql user
  • Triggered on node state changes
• Take backups, and alert on failure
  • https://github.com/dotmanila/pyxbackup
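The state-change script mentioned above could look something like the sketch below, intended to be pointed at by wsrep_notify_cmd. It assumes the notification command is invoked with `--status`, `--uuid`, `--primary`, `--members` and `--index` arguments, as the Galera documentation describes; the alerting rule and the `print` placeholder are illustrative only.

```python
import argparse
import sys


def parse_notification(argv):
    """Parse the arguments a wsrep notification command is assumed to receive."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--status")
    parser.add_argument("--uuid")
    parser.add_argument("--primary")
    parser.add_argument("--members")
    parser.add_argument("--index")
    args, _unknown = parser.parse_known_args(argv)  # tolerate extra arguments
    return args


def alert_reason(args):
    """Return a message if this notification deserves an alert, else None."""
    if args.primary == "no":
        return "node is in a non-primary component (no quorum)"
    if args.status is not None and args.status != "Synced":
        return f"node state changed to {args.status}"
    return None


def main(argv):
    reason = alert_reason(parse_notification(argv))
    if reason:
        # Placeholder: replace with a real pager, mail or chat hook.
        print(f"ALERT: {reason}", file=sys.stderr)
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

The script must be executable by the mysql user and should return quickly, since it runs on every node state or membership change.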
Self Induced Pain
Self Induced Pain: Query Cache Lock
• “Waiting for query cache lock”
Self Induced Pain: Query Cache Lock
• Global mutex, point of contention
  • More so on hot dataset/table
• Worse, with large QC
Self Induced Pain: Query Cache Lock
• Set it to a small size to reduce performance overhead
• Disable it completely to avoid contention
• Hint offending queries to skip the query cache, i.e. SELECT SQL_NO_CACHE
Self Induced Pain: Buffer Pool Dump/Restore
• Dumps buffer pool page list to disk
• Reloads buffer pool based on this list at startup
• Meant to help speed up buffer pool warmup
Self Induced Pain: Buffer Pool Dump/Restore
• Maintenance restart, buffer dump and restore enabled
• Yay! Expecting everything to go well.
• 30 minutes in, performance is still really bad, IO thrashing
• Large buffer pool, busy read/write
Self Induced Pain: Buffer Pool Dump/Restore
• If possible, extend your maintenance period to let the server warm up; otherwise the warmup and live traffic will contend for IO
• RAID1 of 2 SATA disks is not a license to use buffer pool warmup on 240GB of buffer pool
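A back-of-the-envelope estimate shows why the warmup could not finish inside the maintenance window. The sketch below assumes the default 16 KiB InnoDB page size, a dump that covers the whole buffer pool, and a purely hypothetical 200 random read IOPS for the small SATA mirror; the restore reads pages in sorted order, so real throughput may be better than this worst case.

```python
def warmup_hours(buffer_pool_gib: float, read_iops: float, page_kib: int = 16) -> float:
    """Rough hours needed to read every buffer pool page back at one page per IO."""
    pages = buffer_pool_gib * 1024 * 1024 / page_kib
    return pages / read_iops / 3600


# 240 GiB of buffer pool at an assumed 200 read IOPS.
print(round(warmup_hours(240, 200), 1))
```

Under these assumptions the reload takes roughly 22 hours, all of it competing with production reads and writes on the same two disks.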
Self Induced Pain: Prevention
• Percona Toolkit
  • pt-sift
  • pt-stalk
  • pt-kill
• Optimize for IO
MySQL, MySQL! What Have Suffereth Ye Thee?
MySQL, MySQL! What Have Suffereth Ye Thee?: Grind to a Halt
• Slow queries
• Connections build up
• Slow response times
• Long running transactions
• Stop the world scenario
MySQL, MySQL! What Have Suffereth Ye Thee?: Grind to a Halt
--innodb--
txns: 486xACTIVE (28s) 994xnot (0s) 227xLOCK WAIT (25844s)
0 queries inside InnoDB, 0 queries in queue
Main thread: sleeping, pending reads 0, writes 28, flush 1
Log: lsn = 2147483647, chkp = 2147483647, chkp age = 210625191
MySQL, MySQL! What Have Suffereth Ye Thee?: Grind to a Halt
---TRANSACTION 230207990, ACTIVE 13779 sec fetching rows
mysql tables in use 1, locked 1
80337 lock struct(s), heap size 8271400, 10979242 row lock(s)
MySQL thread id 671621, OS thread handle 0x7fe03528a700, query id 37505085 localhost magento Sending data
SELECT `sales_flat_quote_item`.* FROM `sales_flat_quote_item` LIMIT 376 OFFSET 491056
MySQL, MySQL! What Have Suffereth Ye Thee?: Grind to a Halt
• Kill long running trx
• pt-kill for persistent long running trx
• Deploy immediate code changes to disable the offending code
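Spotting the long-running transactions to kill can be scripted against the InnoDB status output. The sketch below only illustrates the idea and is not how pt-kill works internally; it assumes the `---TRANSACTION <id>, ACTIVE <n> sec` line format shown earlier, and the second transaction in the sample is made up for the demonstration.

```python
import re

TRX_RE = re.compile(r"---TRANSACTION (\d+), ACTIVE (\d+) sec")


def long_transactions(status_text: str, min_seconds: int):
    """Return (trx_id, active_seconds) pairs at or above min_seconds, longest first."""
    found = [(int(trx), int(secs)) for trx, secs in TRX_RE.findall(status_text)]
    slow = [pair for pair in found if pair[1] >= min_seconds]
    return sorted(slow, key=lambda pair: pair[1], reverse=True)


# First entry is from the slide; the second is a hypothetical short transaction.
sample = """---TRANSACTION 230207990, ACTIVE 13779 sec fetching rows
mysql tables in use 1, locked 1
---TRANSACTION 230209999, ACTIVE 28 sec
"""

print(long_transactions(sample, 3600))
```

In practice the text would come from SHOW ENGINE INNODB STATUS, and the threshold would be chosen per workload rather than fixed at one hour.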
MySQL, MySQL! What Have Suffereth Ye Thee?: CPU Load
• MySQL is still responding
• All sorts of mutexes
  • trx_sys->mutex
  • block->lock
  • lock_sys->mutex
  • lock_sys->wait_mutex
• … and is killing latency
• Service impact means lost income
MySQL, MySQL! What Have Suffereth Ye Thee?: CPU Load
• innodb_thread_concurrency > 0
MySQL, MySQL! What Have Suffereth Ye Thee?: Prevention
• pt-kill --log
• Separate your OLTP from analytics if possible
• Proactive analysis on performance and queries
  • pt-query-digest (PMM)
  • pt-stalk
• MySQL Server Configuration
  • Remember to tune innodb_thread_concurrency (default is 0)
  • innodb_concurrency_tickets, innodb_sync_spin_loops, etc.
• Application Stack Configuration (Schema Design)
  • Single tenant per schema
  • Multiple tenants per schema (each table has client_id column)
  • All tenants in one schema
Wizard of OS: Disk Performance
• Disk performance cascading to MySQL to application
Wizard of OS: Disk Performance
• Slow writes, binlogs, redo logs, syncs
• Transactions stalling on COMMIT, updating, inserting …
• Replication getting delayed if node is a slave
• Translates to latency
Wizard of OS: Disk Performance
• RAID Controller in Write-Through
• Could also be bad disk
• Default IO elevator – switch to deadline|noop
• Bad mount options - add noatime
• NFS?
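Checking which IO elevator a block device is actually using is easy to automate. The sketch below assumes the usual Linux sysfs layout, where `/sys/block/<dev>/queue/scheduler` lists the available schedulers with the active one in square brackets; the function names are illustrative.

```python
import re


def active_scheduler(sysfs_text: str):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def read_scheduler(device: str):
    """Read the active scheduler for a device name such as 'sda' (Linux only)."""
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return active_scheduler(handle.read())


# Example content of the sysfs file on a host still using cfq.
print(active_scheduler("noop deadline [cfq]"))
```

A monitoring check could call `read_scheduler` for each data volume and warn when the result is neither `deadline` nor `noop`.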
Wizard of OS: Swapping
• Swapping heavily, with a significant amount of RAM free
Wizard of OS: Swapping
• Swapping induces a significant amount of IO
• Swapping in and out of disk is mighty expensive
• Affects MySQL in magnificent ways
• Swap insanity!
Wizard of OS: Swapping
• NUMA Interleave
• Percona Server is NUMA configurable
  • numa_interleave
  • flush_caches
• Check numastat - perl check_numa.pl
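The tell-tale symptom, swap in use while plenty of RAM is still free, can be detected from `/proc/meminfo`. The sketch below is a rough heuristic only: the 10% free-memory threshold is an arbitrary assumption, the sample numbers are hypothetical, and swap being occupied does not prove the host is actively swapping right now (vmstat's si/so columns show that).

```python
def meminfo_kib(meminfo_text: str) -> dict:
    """Parse /proc/meminfo style text into {field: value in KiB}."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_with_free_ram(meminfo_text: str, min_free_fraction: float = 0.1) -> bool:
    """True if swap is in use while at least min_free_fraction of RAM is free."""
    info = meminfo_kib(meminfo_text)
    swap_used = info["SwapTotal"] - info["SwapFree"]
    return swap_used > 0 and info["MemFree"] >= min_free_fraction * info["MemTotal"]


# Hypothetical host: about 6 GB of swap used while about 60 GB of RAM sits free.
sample_meminfo = """MemTotal:       264000000 kB
MemFree:         60000000 kB
SwapTotal:        8000000 kB
SwapFree:         2000000 kB
"""

print(swap_with_free_ram(sample_meminfo))
```

When this pattern shows up on a multi-socket server, per-node memory imbalance in numastat is the next thing to look at.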
Wizard of OS: Prevention
• Tune
  • NUMA Policy
  • vm.swappiness (always have swap space)
  • mount options - noatime
  • Disk scheduler/IO Elevator – noop|deadline
• Blog: Linux performance tuning tips for MySQL
• Blog: InnoDB performance optimization basics (redux)
Summary
• Be proactive and analyze your performance regularly
  • https://www.percona.com/blog/2016/04/18/percona-monitoring-and-management/
• Monitor, monitor wisely
• Test, tune, repeat
• Plan and plan more
Join us at Percona Live Europe
When: October 3-5, 2016
Where: Amsterdam, Netherlands
The Percona Live Open Source Database Conference is a great event for users of any level using open source database technologies.
§ Get briefed on the hottest topics
§ Learn about building and maintaining high-performing deployments
§ Listen to technical experts and top industry leaders
Get the advanced registration rate before prices go up on Sep 5th! Register now!
Sponsorship opportunities are available as well.
Questions?
DATABASE PERFORMANCE MATTERS