Data Driven Monitoring
Daniel Schauenberg
@mrtazz
Item by TheBackPackShoppe
http://www.flickr.com/photos/brianglanz/1095706242
How comfortable are you deploying a change right now?
“If this is your first day at Etsy, you deploy the site”
Ganglia
• System level metrics
• Instance per DC/environment
• > 220k RRD files
• Fully configured through Chef role attributes
Rainbow Graphs!
StatsD
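StatsD accepts metrics as small UDP datagrams of the form `name:value|type`. A minimal sketch of emitting a counter, assuming a StatsD daemon listening on its default port 8125; the metric name `deploys.web` and both helper names are hypothetical, not taken from the talk:

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Build a StatsD counter datagram, e.g. b'deploys.web:1|c'."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one counter increment over UDP (fire-and-forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(name, value), (host, port))
    finally:
        sock.close()
```

Calling `send_counter("deploys.web")` from application code would count one deploy. Because the transport is UDP, the application is not slowed down or broken if the daemon is unreachable.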
Graphite
• Application level metrics
• 96G RAM, 20 Cores, 7.3T SSD RAID 10
• 525k metrics/minute
• Mirrored Primary/Primary Setup
• Functionally sharded relays
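For a sense of scale, 525k metrics per minute is 8,750 datapoints per second. Each datapoint can be sent to Graphite in its plaintext format, one `path value timestamp` line per metric. A small sketch of both; the metric path is a hypothetical example and `graphite_line` is an illustrative helper, not part of Graphite itself:

```python
import time


def graphite_line(path, value, timestamp=None):
    """Format one datapoint in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


# Ingest rate quoted on the slide, converted to a per-second figure.
METRICS_PER_MINUTE = 525_000
metrics_per_second = METRICS_PER_MINUTE / 60

# Hypothetical metric path, with a fixed timestamp for reproducibility.
line = graphite_line("app.checkout.latency_ms", 42, 1400000000)
```

The resulting line would typically be written to a carbon relay or cache over TCP (port 2003 by default).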
nagios
<3 nagios
Nagios
• 2 instances in each DC/environment
• Fully Chef generated configuration
• Service checks and contacts in git
• Notifications via email->SMS gateway
• ~75% ops on-call
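Service checks like the ones kept in git are ordinary programs that follow the Nagios plugin convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). A minimal sketch of the threshold logic, assuming a "higher is worse" metric; the function name and the thresholds are hypothetical:

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_threshold(value, warn, crit):
    """Return (exit_code, status_line) for a 'higher is worse' metric."""
    if value is None:
        # No data is reported as UNKNOWN rather than silently OK.
        return UNKNOWN, "UNKNOWN - no value available"
    if value >= crit:
        return CRITICAL, f"CRITICAL - value {value} >= {crit}"
    if value >= warn:
        return WARNING, f"WARNING - value {value} >= {warn}"
    return OK, f"OK - value {value}"


# Example: disk usage at 85% with warn=80 and crit=90.
code, message = check_threshold(85, warn=80, crit=90)
```

A real plugin would print `message` and call `sys.exit(code)`, and Nagios would use the exit code to decide whether to notify.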
Much more…
• Syslog-ng
• Logstash
• Logster
• Supergrep
• Eventinator
Information Overload
Image by http://jasoncasteel.deviantart.com/
Alert Fatigue
We have the data
We can make it better
Item by PicksFromThePast
nagios-herald
[Diagram: Failed Check → nagios-herald (Formatter, with Helpers for Graphite, Ganglia, Logstash) → Message]
github.com/etsy/nagios-herald
opsweekly
Alert categorization
Wearables!
Item by JennysTrinketShoppe
Sleep tracking
github.com/etsy/opsweekly
Summary
• Set of trusted tools for monitoring
• Always experiment
• Always learn
• Always improve
• Use the data, Luke
Shout out to @lozzd and @Ryan_Frantz
codeascraft.com
etsy.com/codeascraft/talks
etsy.github.com
etsy.com/careers
Questions?