Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SENSE AND SENSU-BILITY Painless Metrics And Monitoring In The Cloud with Sensu Bethany Erskine Velocity NYC 2013 http://github.com/skymob/sensu-tutorial Monday, October 14, 13

description

Are you unhappy with the state of monitoring in your organization? Are you successfully automating “all the things” except your monitoring checks? Are you tired of looking at monitoring dashboards that hark back to another era? Do you long to access your monitoring system via a REST API? Paperless Post recently solved these problems by replacing Nagios with Sensu, a new and awesome free monitoring and metrics router that is designed with configuration management and cloud deployments in mind. In my presentation, we’ll take an in-depth look at why we chose Sensu and how we monitor our services and collect system metrics to send to Graphite. Subtopics will include how we planned for and executed the migration, mistakes we made along the way, how we knew when to scale, and how we did it. I’ll also cover how we’re making our Sensu setup redundant and highly available, how we’re monitoring and collecting metrics about Sensu, and how we’ve integrated our internal tools with Sensu.

Transcript of Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

Page 1: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SENSE AND SENSU-BILITY

Painless Metrics And Monitoring In The Cloud with Sensu

Bethany Erskine

Velocity NYC 2013

http://github.com/skymob/sensu-tutorial

Monday, October 14, 13

Page 2: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

BEFORE I BEGIN...

IF YOU DID NOT SET UP SENSU-TUTORIAL BEFORE THE CLASS:

1. grab a USB key

2. follow the instructions on the README

If you don’t have a computer, no sweat!

Page 3: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

DO YOU LOVE YOUR

MONITORING SETUP?

Page 4: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

#MONITORINGLOVE

Page 5: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

MY STORY

+

(╯(╰,)

Page 6: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

Page 7: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

Page 8: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

+

Page 9: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

WHY SENSU

✓Ruby

✓Plugins can be written in any language

✓sensu-chef cookbook

✓community
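
Because plugins can be written in any language, a check can be as small as a script that prints one status line and exits with a Nagios-style code (0 OK, 1 WARNING, 2 CRITICAL). The sketch below is in Python; the `check_disk` and `measure_used_percent` names and the 85/95 thresholds are illustrative assumptions, not something from the talk.

```python
import shutil

# Nagios-style exit codes, which Sensu checks also follow.
OK, WARNING, CRITICAL = 0, 1, 2


def check_disk(used_percent, warn=85.0, crit=95.0):
    """Return (exit_code, one-line message) for a disk usage percentage.

    The thresholds are illustrative defaults, not values from the talk.
    """
    if used_percent >= crit:
        return CRITICAL, "CheckDisk CRITICAL: %.1f%% used" % used_percent
    if used_percent >= warn:
        return WARNING, "CheckDisk WARNING: %.1f%% used" % used_percent
    return OK, "CheckDisk OK: %.1f%% used" % used_percent


def measure_used_percent(path="/"):
    """Measure how full the filesystem holding `path` is."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


code, message = check_disk(measure_used_percent("/"))
print(message)
# A real plugin would end with sys.exit(code) so the status reaches Sensu.
```

The message goes to STDOUT and the exit code carries the severity, which is also why existing Nagios checks can be reused unchanged.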

Page 10: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

WHY SENSU

✓re-use Nagios checks!

✓metrics and checks all collected by one system

✓Graphite integration

✓easy to scale
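
A metric plugin works like a check, except that its output is data rather than a status message. A minimal sketch, assuming the Graphite plaintext format of `path value timestamp`, one metric per line; the `graphite_line` and `metric_lines` helpers and the `web01` scheme are hypothetical.

```python
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext form: path value timestamp."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "%s %s %d" % (path, value, ts)


def metric_lines(scheme, readings, timestamp=None):
    """Turn a dict of readings into plaintext lines under a common prefix."""
    return [
        graphite_line("%s.%s" % (scheme, name), value, timestamp)
        for name, value in sorted(readings.items())
    ]


# Hypothetical readings for a host named web01.
for line in metric_lines("web01.load", {"one": 0.42, "five": 0.37}, 1381752000):
    print(line)
```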

Page 11: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

WHY SENSU

✓“Can I do X with Sensu?” probably!

Page 12: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

WHY SENSU

Page 13: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

WHY SENSU?

✓Sensu source is well-written and easy to parse

✓https://github.com/sensu

Page 14: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

WHY SENSU?

✓sensu-community-plugins

✓80 contributors

✓over 600 plugins

✓https://github.com/sensu/sensu-community-plugins

Page 15: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

TODAY at PAPERLESS

Two Sensu environments (prod/testing)

~250-275 instances of sensu-client

4-6 sensu-server instances

25k metrics/hour to Graphite

1 custom dashboard

1 custom CLI

Page 16: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

RESOURCES

✓All of our Sensu infrastructure is virtualized.

✓We typically give a sensu-server box 1.5GB RAM and 2 processors, scaling up RAM for any box running more than one Sensu service on it.

✓ 4GB RAM for a monolithic Sensu install (Rabbit, Redis, all Sensu components on one)

Page 17: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

AS WE GREW

Growing pains and lessons learned...

Page 18: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

NEEDS MORE SENSU

✓High load on Sensu server

✓Backed-up queues in RabbitMQ

✓TIP: set up a check to monitor the RabbitMQ ready queue size; you'll want an email when the queue grows above 10K and stays there
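
The "stays there" part of that tip is what separates a real backlog from a brief spike. A minimal sketch of the decision logic, assuming queue depths are sampled at a regular interval by some other means; the `queue_status` name and the three-sample window are assumptions, and only the 10K threshold comes from the tip.

```python
def queue_status(samples, threshold=10000, sustained=3):
    """Decide whether a backlog is sustained.

    samples: ready-queue depths, oldest first.
    Warn only when the last `sustained` samples are all above `threshold`,
    so a single spike does not send an email.
    """
    recent = samples[-sustained:]
    if len(recent) == sustained and all(depth > threshold for depth in recent):
        return 1, "RabbitMQ WARNING: ready queue above %d for %d samples (now %d)" % (
            threshold, sustained, recent[-1])
    latest = samples[-1] if samples else 0
    return 0, "RabbitMQ OK: ready queue at %d" % latest


print(queue_status([500, 12000, 800])[1])       # a spike that drained again
print(queue_status([11000, 12000, 15000])[1])   # a backlog that stays
```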

Page 19: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

HOW TO SCALE

✓Add more sensu-server instances

✓No special configuration needed

✓checks will be distributed in round-robin fashion to the sensu-servers
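
To see why adding servers spreads the load without extra configuration, here is a toy simulation of round-robin distribution. It assumes every sensu-server consumes from one shared queue that hands out messages in turn; the `distribute` function and the server names are hypothetical and this is not Sensu code.

```python
from collections import Counter
from itertools import cycle


def distribute(results, servers):
    """Hand each result to the next server in turn and count the totals."""
    assignment = Counter()
    for _result, server in zip(results, cycle(servers)):
        assignment[server] += 1
    return dict(assignment)


# 600 results shared between two servers, then between three.
print(distribute(range(600), ["sensu-a", "sensu-b"]))
print(distribute(range(600), ["sensu-a", "sensu-b", "sensu-c"]))
```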

Page 20: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

GRAPHITE PAINS

✓symptoms: backed up queues in RabbitMQ, spotty graphs

✓cluster couldn’t keep up with the large volume of metrics we were now sending it via AMQP

Page 21: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

GRAPHITE PAINS

✓Solution: stop collecting metrics every 10 seconds (excessive!)

✓moved staging metrics to staging Graphite cluster

✓Moved prod Graphite cluster to SSD
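
A quick back-of-the-envelope calculation shows how much the collection interval matters. The host and metric counts below are made-up round numbers for illustration, not Paperless Post figures.

```python
def datapoints_per_hour(hosts, metrics_per_host, interval_seconds):
    """Datapoints written to Graphite per hour at a given collection interval."""
    return hosts * metrics_per_host * (3600 // interval_seconds)


# Hypothetical fleet: 100 hosts, 10 metrics each.
every_10s = datapoints_per_hour(100, 10, 10)
every_60s = datapoints_per_hour(100, 10, 60)
print(every_10s, every_60s, every_10s // every_60s)
```

Moving from a 10-second to a 60-second interval cuts the write load by a factor of six before any hardware changes.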

Page 22: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

THE MIGRATION

or, How To Quit Nagios in Ten Easy Steps

Page 23: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

STEP 1: NUKE AND PAVE

Page 24: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

STEP 2: PLAN

METRICS AND MONITORING SURVEY

Page 25: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

METRICS AND MONITORING SURVEY

Page 26: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

STEP 3: DEFINE GLOBALS

✓CHECKS: must be actionable!

✓METRICS: go nuts

✓HANDLERS: EMAIL for everything initially, added Pagerduty later.

Page 27: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

OUR GLOBALS

✓CHECKS: disk usage, swap usage, zombie processes, RO filesystems

✓METRICS: vmstat, disk usage, cpu, memory, interface and disk perf

✓HANDLERS: Email, Campfire, Pagerduty

Page 28: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

STEP 4: DEFINE SPECIFICS

✓For each server role, define additional states to be checked and alerted on:

✓Process Checks

✓System Checks

✓Service Checks

✓Service Metrics

Page 29: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

STEP 5: SET UP A PLACE TO TEST

✓Set up a permanent testing Sensu stack using your CM tool of choice

✓we used sensu-chef cookbook

Page 30: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

STEP 6: SET A WORKFLOW

✓Develop and document a workflow for implementing, testing, deploying and signing off on checks

✓You’ll get the best coverage if anyone (developers or ops) can easily add checks and metrics to Sensu

Page 31: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

EXAMPLE WORKFLOW

✓add new sensu_check definitions to the appropriate cookbook in Chef

✓deploy new check to staging env using Chef

✓Pull Request with sample graphs or alerts

✓Code Review from colleague

✓Deploy to Prod

Page 32: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SENSU IN CHEF

Page 33: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

STEP 7: EXECUTE WORKFLOW

✓Starting with the low-hanging fruit (plugins that already existed in sensu-community-plugins repository), configure and deploy each check in the worksheet to the testing Sensu server

✓deploy sensu-client to a few select machines

Page 34: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

STEP 8: WATCH THE WATCHER

✓Set up some bare-minimum 3rd party monitoring for the Sensu servers

✓We use Panopta’s agent to check for aliveness, disk usage and CPU usage.

Page 35: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

Page 36: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

MONITOR THE MONITOR

✓Other ideas: have Testing Sensu monitor Prod Sensu

✓Sensu can collect metrics about itself

Page 37: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

STEP 9: ROLLOUT

✓Deploy your Production server infrastructure

✓Roll out the client and checks to the rest of your prod environments.

Page 38: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

STEP 10: TUNE

✓Laissez le bon alertes roulent!

✓Expect to need to tune thresholds and alert occurrences.
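
One common tuning knob is to notify only after a check has failed several times in a row. A minimal sketch, assuming the event passed to a handler carries an `occurrences` counter; the `should_notify` name and the default of three are hypothetical.

```python
def should_notify(event, min_occurrences=3):
    """Suppress notifications until a failure has repeated enough times.

    `event` is a dict parsed from the event JSON; the `occurrences` key is
    an assumption about its layout.
    """
    return event.get("occurrences", 1) >= min_occurrences


print(should_notify({"occurrences": 1}))  # first failure: stay quiet
print(should_notify({"occurrences": 3}))  # third failure in a row: alert
```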

Page 39: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SENSU ARCHITECTURE

Page 40: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SENSU ARCHITECTURE

Page 41: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

OMNIBUS INSTALLER

is awesome

Page 42: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

LET’S PLAY WITH SENSU

If you haven’t been able to get your sandboxes up and running,

please pair with someone near you.

Page 43: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SANDBOX GOALS

✓Get familiar with Sensu configuration

✓Install a Handler

✓Deploy a check

✓Trigger an alert on that check

✓Give you something to take home and hack on

Page 44: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

OOPS

If you mess anything up:

vagrant halt; vagrant up

Worst case:

vagrant destroy; vagrant up

Page 45: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

TWO VIRTUALBOXES

Sensu-Server and Sensu-Client

Vagrant/Chef

CentOS 6.4

Sensu Version 0.10.2

Page 46: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SENSU CONFIGURATION

✓Please open up a terminal and SSH into both your sensu-server and sensu-client VMs

✓sudo su -

✓cd /etc/sensu

Page 47: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SENSU CONFIGURATION

✓/etc/sensu/config.json - config for redis, rabbitmq, api and dashboard

✓/etc/sensu/conf.d/ - checks go here

✓/etc/sensu/conf.d/client.json - client configuration, subscriptions

✓/etc/sensu/{extensions|handlers|mutators|plugins}
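
The split between config.json and conf.d/ works because the separate JSON files end up as one combined settings structure. The sketch below illustrates that idea with a recursive merge; it is not Sensu's own loader, and the `deep_merge` and `load_settings` helpers and the sample snippets are hypothetical.

```python
import json


def deep_merge(base, extra):
    """Merge `extra` into a copy of `base`, recursing into nested dicts."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_settings(snippets):
    """Combine several JSON documents (one per config file) into one dict."""
    settings = {}
    for text in snippets:
        settings = deep_merge(settings, json.loads(text))
    return settings


# Two hypothetical conf.d files that both add an entry under "handlers".
files = [
    '{"handlers": {"default": {"type": "pipe", "command": "cat"}}}',
    '{"handlers": {"email": {"type": "pipe", "command": "/etc/sensu/handlers/mailer.rb"}}}',
]
print(sorted(load_settings(files)["handlers"]))
```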

Page 48: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

TRIGGER AN ALERT!

On sensu-client:

service sensu-client stop

Page 49: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

CHECK YOUR DASHBOARD

✓Open a web browser and go to http://10.254.254.10:8080

✓username: admin / password: secret

Page 50: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

HANDLERS

✓A HANDLER takes action on an event using a pipe, TCP, UDP, AMQP, or a set of other handlers

✓Examples: send an email, send event to Pagerduty, send metrics to Graphite

✓Default is “debug”
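
A pipe handler is just a program that receives the event as JSON on STDIN and does something with it. A minimal sketch in Python, assuming the event nests a `client` object with a `name` and a `check` object with `name`, `status` and `output`; that layout and the `summarize_event` helper are assumptions, not taken from the slides.

```python
import json

STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def summarize_event(raw):
    """Build a one-line summary (e.g. an email subject) from event JSON."""
    event = json.loads(raw)
    check = event["check"]
    status = STATUS_NAMES.get(check["status"], "UNKNOWN")
    return "%s - %s/%s: %s" % (
        status, event["client"]["name"], check["name"], check["output"].strip())


# Hypothetical event, shaped the way this sketch assumes.
sample = json.dumps({
    "client": {"name": "sensu-client"},
    "check": {"name": "cornbread_process", "status": 1,
              "output": "CheckProcs WARNING: found 0 matching processes\n"},
})
print(summarize_event(sample))
```

In a real handler the input would come from `sys.stdin.read()` instead of the sample string.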

Page 51: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

HANDLER EXAMPLES

✓BASIC: send an email to ops@

✓ADVANCED: attempt to remediate the alert (e.g. run a custom script that spins up additional ec2 instances)

Page 52: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

HANDLERS

✓Let’s configure an EMAIL handler to send an informative email for an event.

✓The /etc/sensu/handlers/mailer.rb plugin is already installed for you; we just need to configure it and install the handler

Page 53: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

CONFIGURE THE PLUGIN

{ "mailer": { "mail_from": "[email protected]", "mail_to": "[email protected]" }}

ON SENSU SERVER:

vim /etc/sensu/conf.d/handlers/mailer.json

Page 54: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

CONFIGURE THE HANDLER

cp /etc/sensu/conf.d/handlers/default.json /etc/sensu/conf.d/handlers/email.json

vim /etc/sensu/conf.d/handlers/email.json

Page 55: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

EMAIL.JSON

{ "handlers": { "email": { "type": "pipe", "command": "/etc/sensu/handlers/mailer.rb" }}}

Page 56: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

CHECK GEM DEPENDENCIES

/opt/sensu/embedded/bin/gem list | grep mail

Page 57: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

FIX PERMISSIONS

chown -R .sensu /etc/sensu/conf.d/

Page 58: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

RESTART SERVICES

service sensu-server restart

tail -100 /var/log/sensu/sensu-server.log | grep mail

Page 59: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

CHECKS

✓Sensu-client runs CHECKS that are defined and scheduled either locally (standalone) or on the sensu-server (subscription).

✓A CHECK sends a RESULT as an EVENT to a HANDLER - this applies to anything - service checks, metrics, etc

Page 60: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

CHECK EXECUTION

✓Either scheduled by the server (subscription) or scheduled by the client (standalone)

✓Today we will configure a subscription-based check on the server that will run on our client

Page 61: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

LET’S CONFIGURE A CHECK

✓Use check-procs.rb to make sure at least one instance of cornbread is running

Page 62: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

DETERMINE OUR CHECK COMMAND

On your SENSU CLIENT:

/opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check-procs.rb -p cornbread -W1
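
To make the logic of that command concrete, here is a much-simplified Python sketch of a process-count check. It assumes `-p cornbread` selects processes whose command line contains the pattern and `-W1` means "warn when fewer than one is found"; the real check-procs.rb has more options, and the `check_procs` function here is only an illustration.

```python
def check_procs(command_lines, pattern, warn_under=1):
    """Count command lines containing `pattern`; warn if too few are found."""
    count = sum(1 for line in command_lines if pattern in line)
    state = "WARNING" if count < warn_under else "OK"
    code = 1 if count < warn_under else 0
    return code, "CheckProcs %s: found %d matching processes; cmd /%s/" % (
        state, count, pattern)


# Hypothetical process lists, as they might come from `ps`.
print(check_procs(["/usr/sbin/sshd", "/usr/bin/cornbread --serve"], "cornbread")[1])
print(check_procs(["/usr/sbin/sshd"], "cornbread")[1])
```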

Page 63: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

INSTALL OUR CHECK

✓On your SENSU SERVER:

✓vim /etc/sensu/conf.d/checks/cornbread_process.json

Page 64: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

CORNBREAD_PROCESS.JSON
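
The contents of this slide did not survive in the transcript. As a rough guide only, the sketch below builds a subscription check definition of the general shape Sensu expects and prints it as JSON. The command and the `email` handler come from the earlier slides; the `all` subscription and the 60-second interval are assumptions and must match your own client.json and needs.

```python
import json

# A guess at the general shape, not the slide's actual contents.
check_definition = {
    "checks": {
        "cornbread_process": {
            "command": "/opt/sensu/embedded/bin/ruby "
                       "/etc/sensu/plugins/check-procs.rb -p cornbread -W1",
            "subscribers": ["all"],  # assumption: a subscription in client.json
            "interval": 60,          # assumption: run once a minute
            "handlers": ["email"],   # the handler configured earlier
        }
    }
}

print(json.dumps(check_definition, indent=2))
```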

Page 65: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

RESTART SERVICES

service sensu-server restart

tail -100 /var/log/sensu/sensu-server.log | grep cornbread

Page 66: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

CHECK YOUR DASHBOARD

Page 67: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

CHECK YOUR EMAIL

Page 68: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SENSU API

✓REST API

✓HTTP/4567

✓on SENSU SERVER try:

curl -l http://localhost:4567/events | python -mjson.tool
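
The same request can be made from a script, which is how internal tools typically hook into the API. A minimal sketch using only the Python standard library; `fetch_events` and `pretty` are hypothetical helper names, and the base URL assumes you are on the sensu-server VM as in the command above.

```python
import json
from urllib.request import urlopen


def pretty(raw):
    """Re-indent a JSON document, much like `python -mjson.tool`."""
    return json.dumps(json.loads(raw), indent=4, sort_keys=True)


def fetch_events(base_url="http://localhost:4567"):
    """GET /events from the Sensu API and return it pretty-printed."""
    with urlopen(base_url + "/events") as response:
        return pretty(response.read().decode("utf-8"))


# On the sensu-server VM: print(fetch_events())
print(pretty('[{"status": 2, "check": "cornbread_process"}]'))
```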

Page 69: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SENSU SERVICES

✓Sensu API

✓Sensu Server

✓Sensu Client

✓Sensu Dashboard

Page 70: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

EVERYTHING OK?

✓/etc/init.d/sensu-service {client|server|api|dashboard} {start|stop|status|restart}

✓ps -ef | grep sensu

✓tail -f /var/log/sensu/*.log

✓curl -l localhost:4567/info

Page 71: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

COOL SENSU TRICKS

Page 72: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

SEND DIRECTLY TO SENSU

netcat to: 127.0.0.1:3030
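
Anything that can open a socket can do what netcat does here. A minimal sketch, assuming the client listens on the loopback address at port 3030 and accepts a JSON result with `name`, `output` and `status` keys; the `build_result` and `send_result` helpers and the `nightly_backup` check name are hypothetical.

```python
import json
import socket


def build_result(name, output, status=0):
    """Serialize a check result in the shape this sketch assumes."""
    return json.dumps({"name": name, "output": output, "status": status})


def send_result(payload, host="127.0.0.1", port=3030):
    """Write the payload to the local sensu-client socket over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("utf-8"))


payload = build_result("nightly_backup", "backup finished", 0)
print(payload)
# On a machine running sensu-client: send_result(payload)
```

This is handy for cron jobs and deploy scripts that want to report success or failure without being scheduled by Sensu.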

Page 73: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

AGGREGATE ALERTS

✓Handy for preventing alert floods

✓Alert when X% of checks are not OK
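
The percentage rule can be sketched as a small function over per-status counts. The input shape (a dict of counts keyed by `ok`, `warning`, `critical`, `unknown`) and the 20% default are assumptions made for illustration; `aggregate_status` is a hypothetical name.

```python
def aggregate_status(counts, max_not_ok_percent=20.0):
    """Return (exit_code, message) from per-status result counts."""
    total = sum(counts.values())
    if total == 0:
        return 3, "Aggregate UNKNOWN: no results"
    not_ok = total - counts.get("ok", 0)
    percent = 100.0 * not_ok / total
    if percent >= max_not_ok_percent:
        return 2, "Aggregate CRITICAL: %.0f%% of %d results not OK" % (percent, total)
    return 0, "Aggregate OK: %.0f%% of %d results not OK" % (percent, total)


print(aggregate_status({"ok": 9, "critical": 1})[1])
print(aggregate_status({"ok": 7, "warning": 1, "critical": 2})[1])
```

One failing host out of ten stays quiet here, while three out of ten produces a single alert instead of three.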

Page 74: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

MY SENSU TIPS

✓install the RabbitMQ management web interface and bookmark it (see http://10.254.254.10:15672/#/ )

✓lock your plugins’ gem dependency versions

Page 75: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

TIPS TIPS TIPS

✓have alternate ways to access your Dashboard information

✓we integrated our command-line developer tools with Sensu API

✓we also created our own Ops dashboard that queries Sensu, Graphite and our app for data

Page 76: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

MORE TIPS

✓Put NGINX in front of sensu-dashboard

Page 77: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

HA SENSU

✓Redundancy is easy (bring up more sensu-servers)

✓Making Redis and RabbitMQ HA is more challenging

✓We’re still running one solitary Redis and RabbitMQ but are OK with this risk for now

Page 78: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

WHERE TO GO FOR HELP

✓http://docs.sensuapp.org

✓IRC: #sensu - freenode

✓sensu-users mailing list

Page 79: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

QUESTIONS

Page 80: Sense and Sensu-bility: Painless Metrics And Monitoring In The Cloud with Sensu

THANK YOU

[email protected]

@skymob - twitter

robotwitharose - #sensu on IRC (freenode)
