Monitoring MySQL - blog.koehntopp.de/uploads/monitoring_mysql_slides_en.pdf

Post on 12-Sep-2018

Monitoring MySQL

Kristian Köhntopp

Wednesday, 28 October 2009

I am...

• Kristian Köhntopp

• Database architecture at a small travel agency in Amsterdam

• In previous lives: MySQL, web.de, NetUSE, MMC Kiel, PHP, PHPLIB, various FAQs and Howtos

You are

• Job…

• DBA, Developer, General IT, IT Management

• Using version...

• 3.23, 4.0, 4.1, 5.0, 5.1

• Using MySQL for...

• Webapps, Enterprise, Embedded, ...

• Number of servers...

• 1, <3, <10, <25, <100, 100 or more

Why Monitoring?

• Audience

• Consumers of monitoring data

• Metrics

• What kind of data necessary?

• Toolbox

• Which kind of tool to use?

Why Monitoring?

• Who requires monitoring?

• Operations (Incident)

• Infrastructure Development (Capacity)

• Feature Development (Debug)

• Compliance (SLA, Legal)

Why Monitoring?

• Each kind of monitoring has different

• Purpose

• Metric

• Raw data, Notification latency

• Deliverable

• HA requirements

Incident detection

• Purpose: “Are we still online?”

• Metric: “High level availability test w/ binary outcome”

• Deliverable: Ticket to Helpdesk ➜ Operating ➜ Incident Management

• Latency: Seconds

• HA Requirement: Minutes, high

Capacity planning

• Purpose: “When can I guarantee server overload?” (Negative SLA)

• Metric: detailed records of variables in all subsystems

• Deliverable: weekly/monthly report to IT Management, general Management

• Latency: days

• HA Requirement: days/low

Debugging

• Purpose: “Which query crashes the server? Why is this statement slow?”

• Metric: detailed records of variables while processing a single query

• Deliverable: individual report to single developer

• Latency: Seconds, Minutes

• HA Requirement: none

Compliance

• Purpose: “Are we fulfilling our contracts?”

• Metric: “high level availability tests w/ binary outcome”, query times

• Deliverable: weekly/monthly report to IT management/customer

• Latency: Days

• HA Requirement: days, low

For Audit

• Purpose:

• Detect tampering, alteration and access, create accountability records for changes and access

• Metric: high level event records with application semantics

• additionally: inescapable, unforgeable

• Deliverable: daily/weekly report

• HA Requirement: Out of path

Metrics

• Data sources at OS level

• Data sources in MySQL

• Derived data sources

OS Level

• Internal Availability Check:

• presence of PID file

• presence of process

• test -f linux.pid is not good enough

• “kill -0 $(cat linux.pid)” is better than “ps axuwww | grep mysql[d]”
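
The `kill -0` idea carries over to a monitoring script. A minimal sketch, assuming a POSIX system; the function name `mysqld_alive` is made up for illustration and the PID file path is whatever your server actually writes:

```python
import os

def mysqld_alive(pid_file):
    """Return True if the PID stored in pid_file belongs to a live process."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # no PID file, or no number inside it
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False  # stale PID file: the process is gone
    except PermissionError:
        return True   # process exists, but belongs to another user
    return True
```

This only proves that some process owns the recorded PID. It does not prove that the process is mysqld or that it answers queries, which is why the external check is still needed.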

OS Level

• External Availability Check:

• ping check

• trivial query

• Set timeouts for the trivial query according to SLA
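
One way to enforce an SLA timeout on the trivial query is to run the probe in a worker thread and stop waiting once the budget is used up. A sketch only: `probe` stands for any callable you supply (for example one that connects and runs `SELECT 1` through your driver), and `timed_probe` is a hypothetical helper name.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_probe(probe, timeout_s):
    """Run probe() and return (ok, seconds); ok is False on error or timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(probe)
    try:
        future.result(timeout=timeout_s)
        ok = True
    except Exception:
        # Covers both a probe that raised (connection refused, SQL error)
        # and a probe that did not finish within timeout_s.
        ok = False
    finally:
        pool.shutdown(wait=False)  # do not block on a hung probe
    return ok, time.monotonic() - start
```

A probe that is slower than the SLA allows is reported as down, which matches the binary outcome wanted for incident detection. The hung probe thread itself is not killed; it is merely no longer waited for.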

OS Level

• Memory Checks:

• process size in memory

• VSIZE vs. RSS

• buffer cache size (“free -m”)

• swap check!

• vm.swappiness = 0
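
On Linux, the virtual size, resident size and swapped-out size of a process can be read from `/proc/<pid>/status`. A small sketch that parses that text; it assumes the usual `VmSize`, `VmRSS` and `VmSwap` lines in kB (`VmSwap` is missing on older kernels, in which case it is simply absent from the result):

```python
def parse_proc_status(text):
    """Extract VmSize, VmRSS and VmSwap (in kB) from /proc/<pid>/status text."""
    wanted = {"VmSize", "VmRSS", "VmSwap"}
    sizes = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            sizes[key] = int(rest.split()[0])  # the number is followed by "kB"
    return sizes
```

Usage would be along the lines of `parse_proc_status(open("/proc/%d/status" % pid).read())`. A large gap between `VmSize` and `VmRSS` is normal; a non-zero and growing `VmSwap` for mysqld is the thing to alert on.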

OS Level

• “iostat -x 1 3” output

• In general, databases are limited by seeks/sec, not MB/sec

• SSD, FusionIO

• Network I/O quality

• Smokeping? (Cluster!)

MySQL

• General Counters and Config:

• SHOW /*!50000 GLOBAL */ STATUS;

• SHOW /*!50000 GLOBAL */ VARIABLES;

• What is running?

• SHOW FULL PROCESSLIST;

MySQL

• File Handles:

• SHOW TABLE STATUS;

• SHOW OPEN TABLES;

MySQL

• Replication:

• SHOW SLAVE STATUS;

• SHOW MASTER LOGS;

• SHOW MASTER STATUS;

MySQL

• InnoDB:

• SHOW ENGINE INNODB STATUS;

• SHOW GLOBAL STATUS LIKE 'inno%';

Status: General

• qps: questions/uptime

• COM_% Counters

• Read/Writes: ( select + qcache_hits ) / ( insert + update + delete + replace )

• Transactions: #commit, rollback/commit, writes/commit
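
The general ratios can be computed from one snapshot of `SHOW GLOBAL STATUS`. A sketch under the assumption that the snapshot has already been loaded into a dict with lower-cased counter names and integer values; `general_ratios` is an illustrative name, not part of any tool:

```python
def general_ratios(s):
    """s: SHOW GLOBAL STATUS as a dict, keys lower-cased, values as int."""
    reads = s["com_select"] + s["qcache_hits"]
    writes = (s["com_insert"] + s["com_update"]
              + s["com_delete"] + s["com_replace"])
    commits = s["com_commit"]
    return {
        "qps": s["questions"] / s["uptime"],
        "read_write_ratio": reads / writes if writes else float("inf"),
        "rollbacks_per_commit": s["com_rollback"] / commits if commits else 0.0,
        "writes_per_commit": writes / commits if commits else 0.0,
    }
```

Note that `questions / uptime` is an average since server start. For graphs, the difference between two snapshots divided by the interval is usually more telling.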

Status: Caches

• Table Cache

• (opened_tables/sec )

• (table_cache_size – open_tables)

• Thread Cache

• (threads_created/sec)

• (thread_cache_size – threads_cached)
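
Both cache checks come in two flavours: a rate (how fast a counter grows) and a headroom (configured size minus current use). A sketch with two hypothetical helpers; the samples are assumed to be dicts of lower-cased status counters taken `interval_s` seconds apart:

```python
def per_second(before, after, counter, interval_s):
    """Growth rate of an ever-increasing counter between two samples."""
    return (after[counter] - before[counter]) / interval_s

def headroom(limit, in_use):
    """Free slots left, e.g. table cache size minus open_tables."""
    return limit - in_use
```

For example, `per_second(t0, t1, "opened_tables", 60)` gives table opens per second, and `headroom(thread_cache_size, threads_cached)` gives the unused thread cache slots. A rate that stays well above zero while the headroom is zero suggests the cache is too small.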

Status: Caches

• Query-Cache Hit Ratio:

• qcache_hits*100 / ( qcache_hits + com_select )

• Hits vs. Inserts vs. Not Cached
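
The hit ratio formula, together with the raw counters for the hits / inserts / not cached comparison, as a sketch. It assumes a dict of lower-cased status counters; `qcache_stats` is an illustrative name:

```python
def qcache_stats(s):
    """Query cache hit ratio in percent plus the counters behind it."""
    lookups = s["qcache_hits"] + s["com_select"]
    hit_pct = s["qcache_hits"] * 100 / lookups if lookups else 0.0
    return {
        "hit_pct": hit_pct,
        "hits": s["qcache_hits"],
        "inserts": s["qcache_inserts"],
        "not_cached": s["qcache_not_cached"],
    }
```

`com_select` only counts SELECT statements that were actually executed, so hits have to be added to get the total number of lookups.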

Status: Caches

• Lowmem Prunes:

• qcache_lowmem_prunes / uptime

• qcache_lowmem_prunes per second

Status: Caches

• Increase query cache size:

• fewer prunes, higher hit ratio

• Sometimes it is better to delay writes or to split tables instead

Status: Connections

• Connections

• max_connections – max_used_connections

• max_connections - threads_connected
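
The two differences answer different questions: the first says how close the server has ever come to the limit, the second how many connections are free right now. A sketch, assuming `variables` and `status` are dicts built from `SHOW GLOBAL VARIABLES` and `SHOW GLOBAL STATUS` with lower-cased names:

```python
def connection_headroom(variables, status):
    """Connection slots never used so far, and slots free at this moment."""
    limit = variables["max_connections"]
    return {
        "never_used": limit - status["max_used_connections"],
        "free_now": limit - status["threads_connected"],
    }
```

A `never_used` value near zero means the limit has nearly been reached at some point since server start, even if `free_now` looks comfortable.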

Status: MyISAM

• Key Cache Hit Ratio:

• key_read_requests / key_reads

• 300-1000 target

• 99.7% or better hit ratio
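
The ratio target and the percentage target are the same statement in two notations: 300 requests per physical read is a hit ratio of 1 - 1/300, about 99.67%, and 1000 requests per read is 99.9%. A sketch using the status counters `key_read_requests` and `key_reads` from a dict with lower-cased names; `key_cache` is an illustrative name:

```python
def key_cache(s):
    """Requests per physical read and hit ratio in percent for the key cache."""
    reads, requests = s["key_reads"], s["key_read_requests"]
    if reads == 0 or requests == 0:
        # Nothing had to be read from disk (or nothing was requested yet).
        return {"requests_per_read": float("inf"), "hit_pct": 100.0}
    return {
        "requests_per_read": requests / reads,
        "hit_pct": 100.0 * (1 - reads / requests),
    }
```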

Status: MyISAM

• MyISAM Lock Contention

• table_locks_waited * 100 / table_locks_immediate

• <1% good, 1% warning, >3% you are currently dying

• distinctly nonlinear behavior
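
The thresholds translate directly into a check. A sketch, assuming a dict of lower-cased status counters; the level names and the choice to treat everything from 1% up to 3% as a warning are this example's interpretation of the thresholds:

```python
def lock_contention(s):
    """Return (percent of table locks that had to wait, severity level)."""
    immediate = s["table_locks_immediate"]
    pct = s["table_locks_waited"] * 100 / immediate if immediate else 0.0
    if pct < 1:
        level = "good"
    elif pct <= 3:
        level = "warning"
    else:
        level = "critical"
    return pct, level
```

Because the behavior is nonlinear, the warning band is narrow on purpose: by the time the value is visibly above 3%, the server is already in trouble.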

Status: InnoDB

• Page Cache Usage:

• Innodb_buffer_pool_pages_free *100 / Innodb_buffer_pool_pages_total

• Cache Miss Ratio:

• (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)*100

• target: <3%, <1%
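
Both InnoDB percentages follow directly from the counters. A sketch, assuming a dict of lower-cased status counters; `innodb_buffer_pool` is an illustrative name:

```python
def innodb_buffer_pool(s):
    """Return (free pages in percent, cache miss ratio in percent)."""
    free_pct = (s["innodb_buffer_pool_pages_free"] * 100
                / s["innodb_buffer_pool_pages_total"])
    requests = s["innodb_buffer_pool_read_requests"]
    miss_pct = s["innodb_buffer_pool_reads"] * 100 / requests if requests else 0.0
    return free_pct, miss_pct
```

With 2000 physical reads out of 100000 read requests the miss ratio is 2%, which meets the looser target but not the stricter one.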

Status: InnoDB

• Cache Monitoring: Innodb_buffer_pool_wait_free must not count up!

• Log-Monitoring: Innodb_log_waits must not count up!
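
"Must not count up" means comparing two samples rather than looking at absolute values. A sketch, assuming two dicts of lower-cased status counters taken at different times; `counters_that_moved` is an illustrative name:

```python
MUST_STAY_FLAT = ("innodb_buffer_pool_wait_free", "innodb_log_waits")

def counters_that_moved(before, after, names=MUST_STAY_FLAT):
    """Return {counter: delta} for every watched counter that changed."""
    return {n: after[n] - before[n] for n in names if after[n] != before[n]}
```

An empty result is the healthy case. A negative delta means the counter was reset between the samples, typically by a server restart.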

Status: InnoDB

• InnoDB has many more stats

• See Innotop output, read up on theory

• Worth a talk of its own

Status: temp tables

• Temp tables per second:

• created_tmp_tables

• Temp tables to Disk:

• created_tmp_disk_tables * 100 / created_tmp_tables
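
The share of temporary tables that went to disk, as a sketch. The server reports the two counters as `Created_tmp_disk_tables` and `Created_tmp_tables`; the dict with lower-cased keys and the function name are assumptions of this example:

```python
def tmp_disk_pct(s):
    """Percentage of temporary tables that were created on disk."""
    created = s["created_tmp_tables"]
    return s["created_tmp_disk_tables"] * 100 / created if created else 0.0
```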

Status: temp tables

• Additional hints:

• What kind of filesystem is tmpdir pointing to?

• Are we selecting BLOB/TEXT types?

• tmp_table_size and max_heap_table_size must match

Status: Replication

• Functionality:

• Slave_IO_Running: Yes, Slave_SQL_Running: Yes

• Lag:

• Seconds_Behind_Master

• Rate:

• Read_Master_Log_Pos/sec,
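
The functionality and lag checks can be applied to one row of `SHOW SLAVE STATUS`. A sketch, assuming the row has been fetched as a dict keyed by the column names `Slave_IO_Running`, `Slave_SQL_Running` and `Seconds_Behind_Master`; the 60 second default for `max_lag_s` is an arbitrary placeholder, not a recommendation:

```python
def replication_problems(row, max_lag_s=60):
    """Return a list of problem descriptions; an empty list means healthy."""
    problems = []
    if row.get("Slave_IO_Running") != "Yes":
        problems.append("IO thread not running")
    if row.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread not running")
    lag = row.get("Seconds_Behind_Master")
    if lag is None:
        # The server reports NULL when the lag cannot be determined.
        problems.append("lag unknown (NULL)")
    elif int(lag) > max_lag_s:
        problems.append(f"lag {int(lag)}s exceeds {max_lag_s}s")
    return problems
```

The rate check works like any other counter: sample `Read_Master_Log_Pos` twice and divide the difference by the interval, keeping in mind that the position restarts when the master switches to a new log file.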

Status: Slow Queries

• Slow Queries in general:

• Slow_queries/sec

• Counting evil queries:

• select_full_join / sec

• select_full_join / com_select

Toolbox: Incidents

• Incident detection:

• Nagios family

• Load Balancer Live Check

• Post Mortem:

• A small shell script running per minute

Toolbox: Incidents

• Nagios Plugins Quality:

• Bad checks

• Scripts

• Not compliant w/ standards

• Incident monitors vs. Compliance monitors

Toolbox: Post Mortem

• Record per minute, keep one week

• logged to /var/log/mysql_pl

• uptime, ps auxwww, df -Th, free -m

• show full processlist; show slave status;

• show engine innodb status

• if HAVE_INNODB == YES
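
The two decisions such a recorder has to make, which statements to run and which old snapshots to delete, can be sketched as below. Both helper names are hypothetical, and the sketch leaves out the actual running of commands and writing of files:

```python
WEEK_S = 7 * 24 * 3600  # keep one week of per-minute snapshots

def snapshot_statements(have_innodb):
    """SQL statements to record in every snapshot."""
    statements = ["SHOW FULL PROCESSLIST", "SHOW SLAVE STATUS"]
    if have_innodb:
        # Only meaningful when the server reports InnoDB as available.
        statements.append("SHOW ENGINE INNODB STATUS")
    return statements

def expired(snapshots, now, keep_s=WEEK_S):
    """snapshots: {filename: mtime in seconds}. Return names older than keep_s."""
    return sorted(name for name, mtime in snapshots.items()
                  if now - mtime > keep_s)
```

Run from cron once per minute, the recorder would write one file per run, then delete whatever `expired` returns.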

Toolbox: Capacity

• MySQL Enterprise Monitor

• Cacti or Munin

• Actual overload tests

• increase LB weights

• monitor latency of standardized probe to detect breakage

Tools: Cacti

• Shiny, but creating templates is a pain!

• Ready-made templates from

• http://code.google.com/p/mysql-cacti-templates/

Tools: Capacity

• Free MySQL SNMP tools are rare

• Exporting Variables, Status and Slave Status to SNMP:

• Perl Coprocess (PoC at best)

• http://mysqldump.azundris.com/archives/63-guid.html

Toolbox: Console

• innotop (mytop is dead)

• maatkit (indispensable)

• tuning-primer.sh

• http://www.day32.com/MySQL/

• Self-written tools

• Establish a culture of tool creation

Toolbox: Debug

• MEM w/ proxy

• Proxy sometimes problematic

• Alternative: Rig DB access class

• mk-query-digest

• w/ SPAN port at switch

Toolbox: Audit

• Trailing controls

• Agile development

• Dump comparison

• mysqldump --no-data & git & diff

• mk-show-grants & git & diff

• etc.
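
The dump comparison boils down to a line diff between two schema dumps. Git does this for free once the dumps are committed; the sketch below shows the same idea with Python's standard `difflib`, with `schema_diff` and the two file labels being illustrative names only:

```python
import difflib

def schema_diff(old_dump, new_dump):
    """Unified diff (list of lines) between two schema dumps given as text."""
    return list(difflib.unified_diff(
        old_dump.splitlines(), new_dump.splitlines(),
        fromfile="yesterday.sql", tofile="today.sql", lineterm=""))
```

An empty list means the schema did not change between the two dumps. Anything else is a change that somebody should be able to account for.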

Toolbox: Audit

• We have very many servers

• Critical data on isolated servers

• Limited access

• Wonders of an unlimited license
