MySQL Performance Monitoring
MySQL Performance Monitoring using StatsD and Graphite
Art van Scheppingen, Head of Database Engineering
Overview
1. Who are we?
2. What monitoring tools do we use?
3. What are StatsD, Collectd and Graphite?
4. How MySQL logs to StatsD
5. Graphing examples
6. Challenges
7. Questions?
Who are we? Who is Spil Games?
Facts
• Company founded in 2001
• 350+ employees worldwide
• 180M+ unique visitors per month
• Over 50M registered users
• 45 portals in 19 languages

• Casual games
• Social games
• Real-time multiplayer games
• Mobile games

• 35+ MySQL clusters
• 60k queries per second (3.5 billion queries per day)
Geographic Reach
180 Million Monthly Active Users(*)
Source: (*) Google Analytics, August 2012
Brands
Girls, Teens and Family
spielen.com, juegos.com, gamesgames.com, games.co.uk
Monitoring
We use(d) many many many monitoring tools so far!
Existing monitoring systems we use(d)
• Opsview/Nagios (mainly availability)
• Cacti (using Baron Schwartz/Percona templates)
• MONyog
• Good ol' RRD
Opsview/Nagios
• Strong points:
  • Easy to create (Nagios) plugins
  • Slaves for scaling out
• Weak points:
  • Stats gathering through polling
  • Low granularity (1 to 5 minutes)
  • Difficult URIs for graphs
Cacti
• Strong points:
  • Awesome Percona templates
  • Great overviews and graphs
• Weak points:
  • Hard to add new metrics (to 90+ servers)
  • Not scalable
  • Low granularity (1 to 5 minutes)
  • Hard to correlate
MONyog
• Strong points:
  • Easy to set up
  • Compare any server with another
  • Compare configurations
• Weak points:
  • "Closed source"
  • Not scalable
  • Jack of all trades
Poll limitations
• Limited to a set interval
• Data gets averaged out
• (Host) checks are run serially
• A slowdown in one run means missing or less data
• Scaling means adding more masters/slaves
• Setting up an SSH connection is slow
Difficult to add a new metric
host065
bash-3.2# netstat -s | grep "listen queue"
        26 times the listen queue of a socket overflowed

host066
bash-3.2# netstat -s | grep "listen queue"
        33 times the listen queue of a socket overflowed
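With a push model, adding a metric like this one is a small script on each host rather than a change to a central poller. A minimal Python sketch, assuming the `netstat -s` wording shown above and a hypothetical metric name `prod.syseng.<host>.listen_queue_overflows`, turns that output into a StatsD gauge line:

```python
import re

# Matches the counter line printed by `netstat -s` on the hosts above,
# e.g. "26 times the listen queue of a socket overflowed".
LISTEN_QUEUE_RE = re.compile(r"(\d+) times the listen queue of a socket overflowed")


def listen_queue_overflows(netstat_output: str) -> int:
    """Return the overflow counter from `netstat -s` output (0 if the line is absent)."""
    match = LISTEN_QUEUE_RE.search(netstat_output)
    return int(match.group(1)) if match else 0


def to_statsd_gauge(host: str, value: int) -> str:
    """Format a StatsD gauge line; the metric path is a hypothetical example."""
    return f"prod.syseng.{host}.listen_queue_overflows:{value}|g"


if __name__ == "__main__":
    sample = "        26 times the listen queue of a socket overflowed\n"
    print(to_statsd_gauge("host065", listen_queue_overflows(sample)))
```

Because the value is an ever-growing total, sending it as a gauge and applying Graphite's `nonNegativeDerivative()` at render time gives the number of overflows per interval.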
Other things you can’t do!
StatsD + Collectd + Graphite
What are they?
What is Graphite?
• Highly scalable real-time graphing system
• Collects numeric time-series
• Backend daemon: Carbon
  • Carbon-cache: receives data
  • Carbon-aggregator: aggregates data
  • Carbon-relay: replication and sharding
• RRD or Whisper database
Graphite's capabilities
• Each metric is in its own bucket
• Periods in the metric name become folders
  • prod.syseng.mmm.<hostname>.admin_offline
• Metric types
  • Counters
  • Gauges
• Retention can be set per metric name using a regex:
  [mysql]
  pattern = ^prod\.syseng\.mysql\..*$
  retentions = 2s:1d,1m:3d,5m:7d,1h:5y
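The `retentions` line defines one archive per `precision:duration` pair. A small Python sketch that works out how many datapoints each archive keeps; it assumes the unit suffixes s, m, h, d, w and y, with a year taken as 365 days, and does not handle every form Carbon accepts:

```python
# Seconds per unit suffix; "y" is assumed to be 365 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec: str) -> int:
    """Convert '2s', '1m', '5y' and similar to seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def parse_retentions(retentions: str) -> list[tuple[int, int]]:
    """Return (seconds_per_point, number_of_points) for each archive."""
    archives = []
    for archive in retentions.split(","):
        precision, duration = archive.split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives


if __name__ == "__main__":
    archives = parse_retentions("2s:1d,1m:3d,5m:7d,1h:5y")
    total_points = sum(points for _, points in archives)
    # Rough size estimate, assuming about 12 bytes per stored datapoint.
    print(archives, total_points, total_points * 12)
```

For the `[mysql]` schema above this gives 43,200 two-second points for the first day and 93,336 points per metric across all four archives, which is worth keeping in mind when every status variable of every host becomes its own metric.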
What is Collectd?
• Unix daemon that gathers system statistics
• Over 90 (input/output) plugins
• Plugin to send metrics to Graphite/Carbon
• Very useful for system metrics
What is StatsD?
• Front-end proxy for Graphite/Carbon (by Etsy)
• Node.js daemon (also available in other languages)
• Receives UDP (on localhost)
• Buffers metrics locally
• Periodically flushes data to Graphite/Carbon (TCP)
• Client libraries available in almost any language
• Send any metric you like!
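Because a client only writes one line to a UDP socket, a StatsD client fits in a few lines. The Python sketch below uses the plain-text line format of Etsy's StatsD (`<name>:<value>|<type>`, with an optional `|@<rate>` sample rate) and the default port 8125; the function names are our own, not those of any particular client library:

```python
import socket


def format_metric(name: str, value, metric_type: str, sample_rate: float = 1.0) -> str:
    """Build one StatsD line: <name>:<value>|<type>[|@<rate>]."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = "localhost", port: int = 8125) -> None:
    """Fire-and-forget UDP send: the caller is not blocked if StatsD is down."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    send_metric(format_metric("prod.app1.pages_rendered", 1, "c"))
```

Using UDP is the design choice that makes instrumenting hot code paths cheap: a lost packet costs one datapoint, never a stalled request.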
StatsD functions
• update_stats
• increment/decrement
• set
• gauge
• timers
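These functions all end up as one of four StatsD metric types, and what StatsD does with them at each flush is what lands in Graphite. A simplified Python model of that flush step follows: counters are summed (sent as a raw count and a per-second rate), gauges keep their last value, sets count distinct values and timers are summarised. The output names loosely follow StatsD's legacy namespace (`stats.`, `stats_counts.`, `stats.gauges.`, `stats.sets.`, `stats.timers.`); this is an illustration, not Etsy's implementation:

```python
from collections import defaultdict
from statistics import mean


def flush(lines: list[str], interval_s: float) -> dict[str, float]:
    """Aggregate one flush interval of StatsD lines.

    Simplified: sample rates and relative gauge updates (+n/-n) are ignored.
    """
    counters: dict[str, float] = defaultdict(float)
    gauges: dict[str, float] = {}
    sets: dict[str, set] = defaultdict(set)
    timers: dict[str, list] = defaultdict(list)
    for line in lines:
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|")[:2]
        if metric_type == "c":
            counters[name] += float(value)
        elif metric_type == "g":
            gauges[name] = float(value)  # last value in the interval wins
        elif metric_type == "s":
            sets[name].add(value)  # only distinct values are counted
        elif metric_type == "ms":
            timers[name].append(float(value))

    out: dict[str, float] = {}
    for name, total in counters.items():
        out[f"stats_counts.{name}"] = total  # raw count for the interval
        out[f"stats.{name}"] = total / interval_s  # per-second rate
    for name, value in gauges.items():
        out[f"stats.gauges.{name}"] = value
    for name, members in sets.items():
        out[f"stats.sets.{name}.count"] = len(members)
    for name, samples in timers.items():
        out[f"stats.timers.{name}.count"] = len(samples)
        out[f"stats.timers.{name}.mean"] = mean(samples)
        out[f"stats.timers.{name}.upper"] = max(samples)
        out[f"stats.timers.{name}.lower"] = min(samples)
    return out
```

The split between a `stats.` rate and a `stats_counts.` total is why both prefixes show up in the graph targets later in this talk.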
StatsD PHP code examples
PHP:
$statsd = new StatsD();
$statsd->increment("prod.app1.pages_rendered", 1);
$statsd->gauge("prod.app1.page_concurrency", 10);
$statsd->set("prod.app1.unique_users", $userid);
…
$start = microtime(true);
serve_out_content_to_clients();
// microtime(true) returns seconds; StatsD timers are in milliseconds
$statsd->timing("prod.app1.rendering_time", (microtime(true) - $start) * 1000);

Library:
https://github.com/etsy/statsd/blob/master/examples/php-example.php
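Our Graphite cluster(s)
[Diagram: Graphite cluster layout. Components shown: client requesting graphs; load balancer (port 443); Graphite rendering cluster; Carbon relay; load balancer (port 2003); storage clusters DEV, SYSENG, SERVICES1 and SERVICES2, each with Server-1, Server-2, …, Server-n; node counts shown: 8 nodes, 3 nodes, 2 nodes]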
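Port 2003 is Carbon's default plaintext listener, which accepts one `<metric path> <value> <unix timestamp>` line per datapoint over TCP. A minimal Python sketch of a sender, assuming a hypothetical relay or load balancer address; it is an illustration of the protocol, not the code behind the clusters above:

```python
import socket
import time
from typing import Optional


def carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """One line of Carbon's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def send_to_carbon(lines: list[str], host: str, port: int = 2003) -> None:
    """Send plaintext lines over one TCP connection to a Carbon relay or cache."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    # "carbon-relay.example" is a placeholder host name.
    send_to_carbon([carbon_line("prod.syseng.example.test_metric", 1)], "carbon-relay.example")
```

Putting a load balancer in front of the relays means senders only need one address, while the relays take care of spreading metrics over the storage nodes.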
Graphite Storage Clusters
Collectd
[Diagram: Collectd data-gathering plugins (CPU, DISK, LOAD, …) send to Carbon over TCP at a 30 second interval]
StatsD
[Diagram: inputs # of logins and cache hit/miss (application level) and STATUS and INNODB STATUS (MySQL_Statsd) go to StatsD on localhost:8125 (UDP); StatsD sends to Carbon over TCP at a 2 second interval]
Global scale?
MySQL + StatsD
How do we use them?
Why use StatsD over Collectd?
• MySQL plugin for Collectd
  • Sends SHOW STATUS
  • No INNODB STATUS
  • Plugin not flexible
• DBI plugin for Collectd
  • Metrics based on columns
• Different granularity needed
• Separate daemon (with persistent connection)
• StatsD is easy as ABC
MySQL StatsD daemon
• Written in Python
• Gathers data every 0.5 seconds
• Sends to StatsD (localhost) after every run
• Easy to set up: no configuration
• Persistent connection
• Baron Schwartz' InnoDB status parser (Cacti poller)
• Other interesting metrics and counters
  • Information Schema
  • MySQL 5.5/5.6 Performance Schema
  • MariaDB specific
  • Galera specific
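To give a rough idea of what an InnoDB status parser does, the Python sketch below pulls two counters out of `SHOW ENGINE INNODB STATUS` text with regular expressions. It is a simplified stand-in, not Baron Schwartz' parser, and assumes the MySQL 5.5-style wording in which the log sequence number is a single integer (older versions print it as two numbers):

```python
import re

# Line patterns in SHOW ENGINE INNODB STATUS output (MySQL 5.5-style wording assumed).
PATTERNS = {
    "history_list_length": re.compile(r"^History list length (\d+)", re.M),
    "log_sequence_number": re.compile(r"^Log sequence number (\d+)", re.M),
}


def parse_innodb_status(text: str) -> dict[str, int]:
    """Extract a few numeric values from the status text; missing lines are skipped."""
    values = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            values[name] = int(match.group(1))
    return values
```

The history list length is a good first gauge to extract: it shows how many undo records are still waiting for purge.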
MySQL StatsD overview
[Diagram: MySQLCollector runs SHOW STATUS, SHOW INNODB STATUS and SHOW VARIABLES over a persistent connection; the results are sent to StatsD, flushed every 0.5 seconds]
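One way a collector like this can turn `SHOW GLOBAL STATUS` snapshots into StatsD metrics is to send the delta of each counter as a StatsD counter and a few absolute values as gauges. A minimal Python sketch of that idea; the metric prefix, the gauge list and the injected `fetch_status`/`send` callables are hypothetical, not taken from MySQL_Statsd:

```python
import time
from collections.abc import Callable

# Hypothetical: status variables that are sent as absolute values rather than deltas.
DEFAULT_GAUGES = frozenset({"Threads_connected", "Threads_running"})


def status_to_statsd(prev: dict[str, str], curr: dict[str, str], prefix: str,
                     gauges: frozenset = DEFAULT_GAUGES) -> list[str]:
    """Turn two SHOW GLOBAL STATUS snapshots (name -> value) into StatsD lines."""
    lines = []
    for name, raw in curr.items():
        if not raw.isdigit():  # skip ON/OFF and other non-numeric values
            continue
        key = f"{prefix}.status.{name.lower()}"
        if name in gauges:
            lines.append(f"{key}:{raw}|g")
        elif prev.get(name, "").isdigit():
            delta = int(raw) - int(prev[name])
            if delta > 0:  # unchanged counters and resets (FLUSH STATUS) are skipped
                lines.append(f"{key}:{delta}|c")
    return lines


def run(fetch_status: Callable[[], dict[str, str]], send: Callable[[str], None],
        prefix: str, interval: float = 0.5) -> None:
    """Poll every `interval` seconds (e.g. over one persistent connection) and push deltas."""
    prev = fetch_status()
    while True:
        time.sleep(interval)
        curr = fetch_status()
        for line in status_to_statsd(prev, curr, prefix):
            send(line)
        prev = curr
```

Sending deltas as counters lets StatsD sum them per flush, which would produce `stats_counts.<prefix>.status.questions`-style series like the ones used in the graph examples.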
MySQL Multi-Master (MMM) patch
• Perl (Net::Statsd)
• Sends any status change to StatsD (localhost)
• Non-blocking (thanks to UDP)
• Drawn with drawAsInfinite() in Graphite
MMM Perl code example
use Net::Statsd;
$Net::Statsd::HOST = 'localhost';    # Default
$Net::Statsd::PORT = 8125;           # Default

…

# ONLINE -> HARD_OFFLINE
unless ($ping && $mysql) {
    Net::Statsd::update_stats('prod.syseng.mmm.'.$host.'.hard_offline', 1);
    FATAL sprintf("State of host '%s' changed from %s to HARD_OFFLINE (ping: %s, mysql: %s)", $host, $state, ($ping? 'OK' : 'not OK'), ($mysql? 'OK' : 'not OK'));
    $agent->state('HARD_OFFLINE');
}

…
Other metrics
• Deployments
• User-initiated actions
  • Logins
  • High scores
  • Comments / ratings
  • Images uploaded
  • Payments
• Application metrics
  • Error counts
  • Cache statistics (cache hit/miss)
  • Request timers
  • Image sizes
Start graphing!
Now it starts to get interesting!
What is important for you?
• Identify your KPIs
• Don't graph everything
  • More graphs == less overview
  • Combine metrics
  • Stack clusters
Correlate!
• Include other metrics in your graphs
  • Deployments
  • Failover(s)
• Combine application metrics with your database metrics
• Other influences
  • Solar flares
  • Start of the new Maya calendar
Graphite Graphing Engine
• URI-based rendering API
• Support for wildcards
  • stats.prod.syseng.mysql.*.status.com_select
  • sumSeries(stats.prod.syseng.mysql.*.status.com_select)
  • aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)
• Many functions
  • Nth percentile
  • Holt-Winters forecast
  • Timeshift
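Because the rendering API is URI based, any script can build graph URLs. A small Python sketch that assembles a `/render/` URL from a list of targets plus extra parameters; the host name is a placeholder, while `target`, `from`, `until` and `format` are standard render API parameters:

```python
from urllib.parse import urlencode


def render_url(base: str, targets: list[str], **params: str) -> str:
    """Build a Graphite render URL; every target becomes its own 'target=' parameter."""
    query = [("target", t) for t in targets] + list(params.items())
    return f"{base}/render/?{urlencode(query)}"


if __name__ == "__main__":
    print(render_url(
        "https://graphitehost",
        ["aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)"],
        **{"from": "-1h", "format": "json"},  # 'from' is a Python keyword, hence the dict
    ))
```

In `aliasByNode(..., 4)` the nodes are counted from zero, so node 4 of `stats.prod.syseng.mysql.<host>.status.com_select` is the host name, which becomes the legend of each line.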
Graphite Aggregator
syseng => {
    nodes => ["databasehost1", "databasehost2"],
    copying_relay_instances => 8,
    hashing_relay_instances => 8,
    cache_instances => 8,
    aggregation => {
        0 => {
            name => "mysql",
            pattern => '.*\.mysql\..*',
            send_raw => 1,
        },
    }
}

stats.<env>.syseng.mysql.cluster1.status.questions.all (2) = sum stats.<env>.syseng.mysql.*.status.questions
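The rule on the last line uses Carbon's aggregation-rules format, `output_template (frequency) = method input_pattern`: every 2 seconds the aggregator sums the `questions` metric of all hosts into one `...cluster1.status.questions.all` series. A simplified Python model of the matching and summing; it ignores the time buckets and assumes `*` matches within a single node and `<field>` captures one node, as in Carbon's rules:

```python
from fnmatch import fnmatchcase
from typing import Optional


def match_nodes(pattern: str, metric: str) -> Optional[dict[str, str]]:
    """Match node by node; return captured <fields>, or None if the metric does not match."""
    p_nodes, m_nodes = pattern.split("."), metric.split(".")
    if len(p_nodes) != len(m_nodes):
        return None
    fields = {}
    for p, m in zip(p_nodes, m_nodes):
        if p.startswith("<") and p.endswith(">"):
            fields[p[1:-1]] = m  # a <field> captures exactly one node
        elif not fnmatchcase(m, p):  # '*' is applied within a single node
            return None
    return fields


def aggregate_sum(output_template: str, input_pattern: str,
                  datapoints: dict[str, float]) -> dict[str, float]:
    """Sum one interval's matching datapoints into the output metric(s)."""
    out: dict[str, float] = {}
    for metric, value in datapoints.items():
        fields = match_nodes(input_pattern, metric)
        if fields is None:
            continue
        name = output_template
        for field, node in fields.items():
            name = name.replace(f"<{field}>", node)
        out[name] = out.get(name, 0.0) + value
    return out
```

With `send_raw => 1` in the configuration above, the per-host series are still stored next to the aggregated one.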
Graphite web interface
Graphite Example URL
https://graphitehost/render/?width=722&height=357&_salt=1366550446.553&rightDashed=1&target=alias%28sumSeries%28stats.prod.services.profilar.request.total.count.*%29%2C%22Number%20of%20profile%20requests%22%29&target=alias%28secondYAxis%28sumSeries%28stats_counts.prod.syseng.mysql.<node1>.status.questions%2C%20stats_counts.prod.syseng.mysql.<node2>.status.questions%29%29%2C%22Number%20of%20queries%20profiles%20cluster%22%29&from=00%3A00_20130415&until=23%3A59_20130421
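The encoded URL is easier to read once decoded. A minimal Python sketch using the standard library's `urlsplit` and `parse_qs`, applied here to a shortened copy of the URL above:

```python
from urllib.parse import parse_qs, urlsplit


def render_params(url: str) -> dict[str, list[str]]:
    """Decode a Graphite render URL into {parameter: [values]}; targets can repeat."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    example = (
        "https://graphitehost/render/?target=alias%28sumSeries%28"
        "stats.prod.services.profilar.request.total.count.*%29%2C"
        "%22Number%20of%20profile%20requests%22%29"
        "&from=00%3A00_20130415&until=23%3A59_20130421"
    )
    for name, values in render_params(example).items():
        print(name, values)
```

For the full URL this yields two targets: the summed profile requests, and the summed `questions` counters of the two database nodes on the second Y axis, for 15 to 21 April 2013.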
Other examples: MMM
Other examples: timeshift
Other examples: multiple weeks
Challenges
The road ahead
What challenges do we have?
• MySQL_statsd rewrite necessary (not open source yet)
• No alerting through Graphite (yet)
• Machine learning
• Eternal hunger for more metrics
• Abuse of the system
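What lessons have we learned?
• Persistent connections + REPEATABLE READ
  • The InnoDB history list length skyrocketed (a transaction left open on the persistent connection keeps its read view, so purge cannot remove old row versions)
• Too many metrics slow down graphing
• Too many metrics can kill a host
• EstatsD for Erlang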
Questions…
Practical links
• Graphite: http://graphite.readthedocs.org/en/latest/
• Collectd: https://collectd.org/
• StatsD on GitHub by Etsy: https://github.com/etsy/statsd/wiki
• Etsy on StatsD: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
Thank you!
• Presentation can be found at: http://spil.com/perconasc2013
• If you wish to contact me: [email protected]
• Don't forget to rate my talk!