Winning the metrics battle
Transcript of Winning the metrics battle
![Page 1: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/1.jpg)
Winning the metrics battle (finally)
![Page 2: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/2.jpg)
Winning the metrics battle (finally)
Simon Hildrew
Infrastructure Developer
The Guardian
Nick Satterly
Monitoring Engineer
The Guardian
![Page 3: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/3.jpg)
![Page 4: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/4.jpg)
The metrics battlefield
![Page 5: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/5.jpg)
Total metrics: 1,400 / 2,800 / 50,000 / 180,000
![Page 6: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/6.jpg)
5 minutes
every 15 seconds
http://www.flickr.com/photos/ghostsigns/6676069121
http://www.flickr.com/photos/millynet/134071210
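A rough sketch of what this change in frequency means in volume, assuming the slide contrasts a 5-minute collection interval with a 15-second one. Pairing it with the 180,000 metrics figure from the previous slide is an assumption made here for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds):
    """Number of samples one metric produces per day at a fixed interval."""
    return SECONDS_PER_DAY // interval_seconds


def datapoints_per_day(metric_count, interval_seconds):
    """Total datapoints per day across all metrics at a fixed interval."""
    return metric_count * samples_per_day(interval_seconds)


if __name__ == "__main__":
    # 180,000 is the largest "Total metrics" value in the deck; using it
    # for both intervals is an assumption for illustration.
    for interval in (300, 15):
        print(interval, samples_per_day(interval),
              datapoints_per_day(180_000, interval))
```

At a 15-second interval each metric yields 20 times as many samples as at 5 minutes, which is why storage and collection tooling matter at this scale.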
![Page 7: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/7.jpg)
developer dashboards
![Page 8: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/8.jpg)
[Chart: Physical screens vs. Screensaver hacks, axis 0 to 20]
![Page 9: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/9.jpg)
dev
hack
![Page 10: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/10.jpg)
business dashboards
![Page 11: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/11.jpg)
metrics + dashboards = culture change
![Page 12: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/12.jpg)
http://www.flickr.com/photos/chrisjames_taylor/5454315456
![Page 13: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/13.jpg)
Side project
Incremental upgrade
Use off the shelf tool
Pragmatic solution
Done in a year
our approach
➡ Prioritise
➡ Understand the real problem
➡ Question the tools
➡ Be ambitious
➡ Keep learning
![Page 14: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/14.jpg)
Prioritise
![Page 15: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/15.jpg)
drowning in work
http://www.flickr.com/photos/iampeas/246738971
![Page 16: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/16.jpg)
a dedicated monitoring and metrics engineer
![Page 17: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/17.jpg)
Understand the real problem
![Page 18: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/18.jpg)
Urgent issue - current tool end of life
![Page 19: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/19.jpg)
The story so far...
![Page 20: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/20.jpg)
metrics were not helping us solve production outages
![Page 21: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/21.jpg)
ballooning number of applications
![Page 22: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/22.jpg)
but... difficult to instrument applications
![Page 23: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/23.jpg)
T.T. Fix = T.T. Detect + T.T. Diagnose + T.T. Resolve
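The decomposition on this slide (time to fix as the sum of time to detect, time to diagnose, and time to resolve) can be made concrete with a small worked example. The `Incident` class and the minute values below are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical record of how long each phase of an outage took."""
    detect_minutes: float
    diagnose_minutes: float
    resolve_minutes: float

    def time_to_fix(self):
        # Time to fix is the sum of the three phases.
        return self.detect_minutes + self.diagnose_minutes + self.resolve_minutes

    def slowest_phase(self):
        # Name of the phase that contributed the most time.
        phases = {
            "detect": self.detect_minutes,
            "diagnose": self.diagnose_minutes,
            "resolve": self.resolve_minutes,
        }
        return max(phases, key=phases.get)


if __name__ == "__main__":
    outage = Incident(detect_minutes=10, diagnose_minutes=45, resolve_minutes=5)
    print(outage.time_to_fix(), outage.slowest_phase())
```

Breaking an outage down this way shows which phase dominates, and therefore where better metrics would shorten the total.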
![Page 24: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/24.jpg)
inaccessible tools
http://www.flickr.com/photos/kdashy/2678539087
![Page 25: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/25.jpg)
inconsistent data
http://www.flickr.com/photos/sybrenstuvel/2468506922
![Page 26: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/26.jpg)
hypothesising & arguing easier than measuring
http://www.flickr.com/photos/nouqraz/200049988
![Page 27: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/27.jpg)
The ‘right’ thing
• measure everything
• measure frequently
• measure each data point once
• input and output must be open
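One way to read "input and output must be open" is a plain-text wire format that any application can emit. The sketch below formats datapoints in Graphite's plaintext format (`<path> <value> <timestamp>`); Graphite is named later in this deck, but whether the authors fed it this way is not stated. The metric paths and the host name are hypothetical.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one datapoint as '<path> <value> <timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def batch(datapoints, timestamp):
    """Join several (path, value) pairs that share one timestamp."""
    return "".join(graphite_line(p, v, timestamp) for p, v in datapoints)


def send(payload, host, port=2003):
    """Send a payload over TCP; 2003 is carbon's default plaintext port."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric names; each data point is measured and sent once.
    payload = batch(
        [("apps.frontend.requests", 42), ("apps.frontend.errors", 1)],
        timestamp=1357000000,
    )
    print(payload, end="")
    # send(payload, "graphite.example.com")  # hypothetical host, not called here
```

Because the format is one line of text per datapoint, instrumenting a new application needs nothing more than a socket.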
![Page 28: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/28.jpg)
Question the tools
![Page 29: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/29.jpg)
Brute force?
http://www.flickr.com/photos/epublicist/3546059144
![Page 30: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/30.jpg)
The safe option?
http://www.flickr.com/photos/alicebartlett/2361209195
![Page 31: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/31.jpg)
Unintuitive?
http://www.flickr.com/photos/merlijnhoek/2841785343
![Page 32: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/32.jpg)
http://www.flickr.com/photos/evansville/8953838/
Imposing a flawed model?
![Page 33: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/33.jpg)
Too difficult / no progress?
http://www.flickr.com/photos/ginja_andy/4165849136/
![Page 34: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/34.jpg)
Nagios
• the “IBM” of monitoring tools
• compromise over quantity and frequency of checks
• < insert your criticism of nagios here >
![Page 35: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/35.jpg)
Zabbix
• metric collection tightly coupled to monitoring tool
• confusing UI with poor visualisation
• needed brute force to make limited API work
![Page 36: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/36.jpg)
The ‘right’ thing
• measure everything
• measure frequently
• measure each data point once
• input and output must be open
![Page 37: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/37.jpg)
![Page 38: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/38.jpg)
don’t compromise
![Page 39: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/39.jpg)
Be ambitious
![Page 40: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/40.jpg)
Throw work away
http://www.flickr.com/photos/mugley/2961131550
![Page 41: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/41.jpg)
Draw your dream
![Page 42: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/42.jpg)
Get as far as you can
http://www.flickr.com/photos/sk8geek/7358702704
![Page 43: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/43.jpg)
graphite
Etsy dashboard
FITB ganglia
network applications hosts
db?
api?
SNMP? syslog?
alerting?
message queue
screens users
![Page 44: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/44.jpg)
Develop missing pieces
http://www.flickr.com/photos/kalexanderson/5969012589
![Page 45: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/45.jpg)
graphite
Etsy dashboard
FITB ganglia
network applications hosts
mongodb elastic search
ganglia alerts
ganglia-api
syslog alerts
SNMP alerts
alerta
message queue
screens users
![Page 46: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/46.jpg)
Guardian Management
https://github.com/guardian/guardian-management
![Page 47: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/47.jpg)
Ganglia API
https://github.com/guardian/ganglia-api
![Page 48: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/48.jpg)
Alerta
https://github.com/guardian/alerta
![Page 49: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/49.jpg)
• Ganglia
• FITB
• Graphite
• Etsy dashboards
• Guardian management: https://github.com/guardian/guardian-management
• Guardian ganglia-api: https://github.com/guardian/ganglia-api
• Guardian alerta: https://github.com/guardian/alerta
Current stack
![Page 50: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/50.jpg)
Keep learning
![Page 51: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/51.jpg)
we are not there yet
![Page 52: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/52.jpg)
Watch the cultural changes
![Page 53: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/53.jpg)
detecting
![Page 54: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/54.jpg)
diagnosis
![Page 55: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/55.jpg)
diagnosis
![Page 56: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/56.jpg)
performance testing
![Page 57: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/57.jpg)
confirmation
![Page 58: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/58.jpg)
#monitoringsucks
![Page 59: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/59.jpg)
➡ Prioritise
➡ Understand the real problem
➡ Question the tools
➡ Be ambitious
➡ Keep learning
![Page 60: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/60.jpg)
tools can change culture
![Page 61: Winning the metrics battle](https://reader034.fdocuments.us/reader034/viewer/2022051323/54b76a1f4a795957768b4699/html5/thumbnails/61.jpg)
Thank you
Simon Hildrew @sihil
Nick Satterly @nicksatterly
http://github.com/guardian
http://gu.com/p/3ap5f