WTF is Sensu and Monitoring
.WTF/is/sensu
A DevOps guide to monitoring
.WTF/is/monitoring
A DevOps guide to monitoring
.WTF/whoisself:
  author: 'Toby Jackson <[email protected]>'
  role: 'Operations Engineer'
  twitter: '@warmfusion'
  github: 'github.com/warmfusion'
  employer: 'www.futureplc.com/yourfuturejob/'
.WTF/is/monitoring?experience
● Developer turned Engineer
● Implemented Sensu at Future PLC
  ○ 340+ hosts, VMs, switches, etc.
● Helped shape our approach to monitoring
.WTF/is/monitoring?_index
Why do we monitor our systems?
What should we look for?
How can Sensu help us?
Questions…?
.WTF/is/monitoring?why
Part One - Why do we monitor our systems?
.WTF/is/monitoring?why
● Client - Are they down, or is it just me?
● CEO - Are we making money?
● Manager - Are we meeting our SLAs?
● Engineer - Am I being woken up for the right reasons?
● Developer - Did my deploy work?
● Everyone...
  ○ What’s happening in our environment?
.WTF/is/monitoring?why_tomorrow
● Client - Is maintenance going to happen soon?
● CEO - Are we going to keep making money?
● Manager - Can we meet new SLAs?
● Engineer - Why might I get woken up tonight?
● Developer - When do I need to optimise?
● Everyone...
  ○ What’s going to happen in our environment?
.WTF/is/monitoring?what
Part Two - What should we look for?
.WTF/is/monitoring?disclaimer
Some approaches work better than others; don’t be afraid to experiment.
.WTF/is/monitoring?principles
Focus on your customers
Use a couple of monitoring systems
Decouple your checks from your code
Remember workflow events
Many simple checks > fewer clever checks
Don’t wake me up if it can wait
.WTF/is/monitoring?first_steps
● Look for the big-impact entry points
● Review past incidents for danger zones
● Don’t be afraid to admit that risky code exists
.WTF/is/monitoring?common
● Disk, RAM, load, network
● Patches available
● Uptime
● Logged-in users
● Config management status
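Most of these host-level checks follow the Nagios plugin convention that Sensu also understands. The check prints a single line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal disk-usage check written in that style. It is a sketch only: the 80% and 90% thresholds, the default path `/` and the `check_disk`/`main` names are assumptions for illustration.

```python
import shutil

# Nagios-style exit codes, which Sensu interprets as the check status.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(path="/", warn_pct=80.0, crit_pct=90.0):
    """Return (status, one_line_output) describing disk usage at `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN: cannot stat {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN: {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    if used_pct >= crit_pct:
        status, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn_pct:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    return status, f"DISK {label}: {path} is {used_pct:.1f}% used"


def main(argv):
    """Print the single output line and return the exit code Sensu should see."""
    status, output = check_disk(argv[0] if argv else "/")
    print(output)
    return status


# To use this as a real check script, finish the file with:
#     import sys
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv[1:]))
```

Keeping to one line of output plus an exit code is what lets existing Nagios-style scripts and new ones run side by side.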
.WTF/is/monitoring?services
● Create HTTP status endpoints
● JSON is great
● 200 OK / 503 Service Unavailable
● Lightweight
● Downstream dependencies?
● Service metrics?
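Here is a lightweight status endpoint along those lines, written with only the Python standard library. It answers with JSON, returns 200 when every downstream dependency is healthy and 503 when any of them is not. Several details are hypothetical and should be replaced with your own: the `/status` path, port 8080, and the `check_dependencies` probe with its `database`/`cache` names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies():
    """Hypothetical probes; replace with cheap real checks (DB ping, cache ping, ...)."""
    return {"database": True, "cache": True}


def status_response(deps):
    """Build (http_status, json_body) from a mapping of dependency name -> healthy?"""
    healthy = all(deps.values())
    body = {
        "status": "ok" if healthy else "unavailable",
        "dependencies": {name: "ok" if up else "down" for name, up in deps.items()},
    }
    return (200 if healthy else 503), json.dumps(body, sort_keys=True)


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        code, body = status_response(check_dependencies())
        payload = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Serve GET /status on `port`; blocks until interrupted."""
    HTTPServer(("", port), StatusHandler).serve_forever()
```

Because the HTTP status code carries the verdict, a generic HTTP check can watch any service that exposes this endpoint. The JSON body then tells a human which dependency failed.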
.WTF/is/monitoring?clusters
● Aggregate checks
● Individual members don’t matter
● Deploys and maintenance are OK
● Avoid bypassing balancers
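An aggregate check looks at the cluster as a whole rather than at its individual members. The sketch below assumes the latest status of each member's own check has already been collected (how you fetch them is left out). It reports OK while more than half of the members are OK, which matches the "> 50% of backends" rule used later in the talk. The function name and the 50% default are illustrative assumptions.

```python
# Nagios-style statuses, as reported by each member's own check.
OK, WARNING, CRITICAL = 0, 1, 2


def aggregate_status(member_statuses, min_healthy_pct=50.0):
    """OK while more than `min_healthy_pct` percent of members are OK.

    `member_statuses` maps member name -> that member's latest check status.
    Failed members are named in the output, but a single failure never pages.
    """
    total = len(member_statuses)
    if total == 0:
        return CRITICAL, "CLUSTER CRITICAL: no members reporting"
    down = sorted(name for name, status in member_statuses.items() if status != OK)
    healthy_pct = 100.0 * (total - len(down)) / total
    summary = f"{total - len(down)}/{total} members OK"
    if down:
        summary += f" (not OK: {', '.join(down)})"
    if healthy_pct > min_healthy_pct:
        return OK, f"CLUSTER OK: {summary}"
    return CRITICAL, f"CLUSTER CRITICAL: {summary}"
```

With this approach, taking one member out for a deploy or for maintenance shows up in the output but does not wake anyone.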
.WTF/is/monitoring?company
● Programmatic goals can be monitored
● See if revenue, purchases or direct customer interactions can be watched
● Watch for social media mentions
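Business metrics can be checked the same way as technical ones. The difference is that for something like purchases, a *low* value is the warning sign. This is a minimal "lower is worse" sketch. The hourly purchase count is assumed to come from a query you supply, and the thresholds of 20 and 5 are purely hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def purchases_status(purchases_last_hour, warn_below=20, crit_below=5):
    """'Lower is worse' check for a business metric such as hourly purchases."""
    if purchases_last_hour < crit_below:
        return CRITICAL, f"PURCHASES CRITICAL: only {purchases_last_hour} in the last hour"
    if purchases_last_hour < warn_below:
        return WARNING, f"PURCHASES WARNING: {purchases_last_hour} in the last hour"
    return OK, f"PURCHASES OK: {purchases_last_hour} in the last hour"
```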
.WTF/is/monitoring?practise_simple
● nginx & php running
● Balancer: 200 OK
● nginx: 200 OK
● Cron: ignore for now
[Diagram: Web Load Balancer in front of Web01 (nginx, php, cron) and Web02 (nginx, php)]
.WTF/is/monitoring?practise_adv
● Balancer: > 50% of backends up
● nginx: < 200ms response
● Cron: error log empty && < 1hr old
[Diagram: Web Load Balancer in front of Web01 (nginx, php, cron) and Web02 (nginx, php)]
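The cron rule "error log empty && < 1hr old" could be implemented as the check below. It rests on one assumption: the cron job rewrites its error log on every run, for example with a shell `2>` redirect. Under that assumption, an empty and recently modified file means "ran recently, wrote no errors". The function name and the one-hour default are illustrative.

```python
import os
import time

OK, CRITICAL = 0, 2


def check_cron_error_log(log_path, max_age_s=3600, now=None):
    """CRITICAL unless `log_path` exists, is empty, and was modified within `max_age_s`."""
    now = time.time() if now is None else now
    try:
        info = os.stat(log_path)
    except FileNotFoundError:
        return CRITICAL, f"CRON CRITICAL: {log_path} not found"
    age_s = now - info.st_mtime
    if info.st_size > 0:
        return CRITICAL, f"CRON CRITICAL: {log_path} contains {info.st_size} bytes of errors"
    if age_s > max_age_s:
        return CRITICAL, f"CRON CRITICAL: {log_path} last updated {age_s / 60:.0f} min ago"
    return OK, f"CRON OK: {log_path} empty, updated {age_s / 60:.0f} min ago"
```

A single check covers both failure modes, "the job errored" and "the job stopped running", because a stale file fails just as surely as a non-empty one.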
.WTF/is/monitoring?practise_clever
● Spike in traffic
● Failure counts above thresholds
● Response sizes are curiously large
● Lots of (valid) API auth requests
[Diagram: Web Load Balancer in front of Web01 (nginx, php, cron) and Web02 (nginx, php)]
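Most of these "clever" checks come down to comparing a number against thresholds. Some compare against fixed limits, such as failure counts. Others compare against a baseline, such as a traffic spike relative to normal traffic. The sketch below covers both cases. It makes several assumptions: the 2x/4x spike ratios, the "same hour last week" baseline, and the source of the numbers (logs, Graphite, etc.), which is left out.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def threshold_status(value, warn, crit):
    """Map a 'higher is worse' value (e.g. failures per minute) onto a status."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def spike_status(current, baseline, warn_ratio=2.0, crit_ratio=4.0):
    """Flag traffic that is `warn_ratio`/`crit_ratio` times a baseline.

    The baseline might be, for example, the same hour last week.
    """
    if baseline <= 0:
        return UNKNOWN  # no usable baseline to compare against
    return threshold_status(current / baseline, warn_ratio, crit_ratio)
```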
.WTF/is/monitoring?what
Your users matter: know when they’re in pain
Develop a standardised app status page: conventional checks get used more often
Check lots of small things: it scales better and helps isolate incidents quickly
.WTF/is/sensu
Part Three - How can Sensu help us?
.WTF/is/sensu?introduction
A “new generation” monitoring solution
Open source, with a paid-for Enterprise edition
Site: sensuapp.org
GitHub: github.com/sensu
IRC: freenode - #sensu
.WTF/is/sensu?what
Consistent way to describe a service check
Executes those checks as required
Reliably handles events (and metrics)
.WTF/is/sensu?why
● Tries to do one thing well: handling events
● Compatible with existing check scripts
● Large, active open-source community
● Scales effectively
.WTF/is/sensu?experience
● Replaced Nagios, crons, etc.
● Raised the visibility of monitoring
● Devolved control to development
● 340-ish hosts, VMs, switches, firewalls, etc.
● Managed exclusively through Puppet
● Developed custom plugins and extensions
.WTF/is/sensu?architecture_simple
[Diagram: simple Sensu architecture]
.WTF/is/sensu?how
The Sensu standalone check process:
a. Sensu-Client runs a script that prints one line of output and returns an exit code
b. Sensu-Client wraps the result in a JSON event and publishes it to RabbitMQ
c. Sensu-Server reads the event and passes it to handlers
d. Handlers process the event, performing some action
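Steps a and b can be sketched in a few lines. The sketch runs the command, keeps the first line of output and the exit code, and wraps both in JSON. The `client`/`check` layout is only loosely modelled on Sensu's check results; the exact fields Sensu publishes differ. The actual publish to RabbitMQ is also omitted. All function names here are hypothetical.

```python
import json
import socket
import subprocess
import time

UNKNOWN = 3


def run_check(name, command, timeout=30):
    """Step a: run `command`, keeping the first line of stdout and the exit code."""
    executed = int(time.time())
    try:
        proc = subprocess.run(command, shell=True, capture_output=True,
                              text=True, timeout=timeout)
        lines = proc.stdout.splitlines()
        output = lines[0] if lines else ""
        # Exit codes outside 0-3 (or death by signal) are treated as UNKNOWN.
        status = proc.returncode if proc.returncode in (0, 1, 2, 3) else UNKNOWN
    except subprocess.TimeoutExpired:
        output, status = f"check timed out after {timeout}s", UNKNOWN
    return {"name": name, "command": command, "output": output,
            "status": status, "executed": executed}


def to_result_json(check, client_name=None):
    """Step b: wrap the result in JSON, ready to publish to the message broker."""
    return json.dumps({"client": client_name or socket.gethostname(),
                       "check": check}, sort_keys=True)
```

Because the contract is just "one line plus an exit code", any existing Nagios-style script can act as a check without modification.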
.WTF/is/sensu?architecture_simple
[Diagram: Sensu architecture, with a “You are here” marker]
.WTF/is/sensu?standalone_check
● Describes
  ○ what check to run
  ○ how to handle events
● Runs at a given interval (default 60s)
● sensu-client handles the output and emits events over the message broker
● Can include custom configuration, which is passed along in the event sent to handlers

    sensu::checks:
      'sensu-server':
        command: 'check-procs.rb -p bin/sensu-server -c 1'
        handlers: ['high', 'pagerduty']
        custom:
          runbook: 'https://wiki.ftr.com/x/4oqq'
          tip: 'Check /var/log/sensu-server.log'
          slack:
            channels:
              - '#craggyisland'
.WTF/is/sensu?runbook
URI to a page summarising:
● Impacted services
● Troubleshooting
● Common problems
● How to fix
● Who to talk to
● References to other information
.WTF/is/sensu?tip
Tweet-length one-liner
Gets included in PagerDuty and Slack notices
Useful at 4am on a Sunday morning
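Custom attributes such as `tip` and `runbook` travel with the check into the event that handlers receive. A notifier can therefore put them straight into the alert text. The sketch below assumes an event shaped roughly like `{"client": {"name": ...}, "check": {"name", "status", "output", "tip", "runbook"}}`. The 140-character cap and the `format_notice` name are illustrative assumptions.

```python
MAX_TIP_LEN = 140  # assumed "tweet length" cap for the tip


def format_notice(event):
    """Build a one-line alert that carries the check's tip and runbook link."""
    check = event["check"]
    status_name = {0: "OK", 1: "WARNING", 2: "CRITICAL"}.get(check["status"], "UNKNOWN")
    parts = [f"[{status_name}] {event['client']['name']}/{check['name']}: {check['output']}"]
    tip = check.get("tip")
    if tip:
        if len(tip) > MAX_TIP_LEN:
            tip = tip[:MAX_TIP_LEN - 3] + "..."  # keep the tip tweet-length
        parts.append(f"Tip: {tip}")
    if check.get("runbook"):
        parts.append(f"Runbook: {check['runbook']}")
    return " | ".join(parts)
```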
.WTF/is/sensu?architecture_simple
[Diagram: Sensu architecture, with a “You are here” marker]
.WTF/is/sensu?handler
● Process events
● Perform some (or no) action
● Typically used to send alerts or emails

    sensu::handler:
      slack:
        type: 'pipe'
        command: 'slack.rb'
        config:
          webhook_token: 'SECRET/KEY'
          bot_name: 'sensu'
          channel: '#alerts'
      pagerduty:
        type: 'pipe'
        command: 'pagerduty.rb'
        severities: ['ok', 'critical']
        config:
          api_key: SECRET_TOKEN_HERE
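A `pipe` handler is simply a command that receives the event as JSON on its standard input. The sketch below shows the idea, together with a simplified version of the `severities` filter from the PagerDuty example. Be aware of two simplifications. First, Sensu applies `severities` on the server before it runs the handler. Second, Sensu's treatment of resolution events is more nuanced than this. The event shape and function names are assumptions, and a real handler would call Slack or PagerDuty where this one only builds a string.

```python
import json

SEVERITY_NAMES = {0: "ok", 1: "warning", 2: "critical"}


def severity(status):
    """Map a check exit status to a severity name."""
    return SEVERITY_NAMES.get(status, "unknown")


def should_handle(event, severities=("ok", "critical")):
    """Simplified severities filter: act only on statuses listed for this handler."""
    return severity(event["check"]["status"]) in severities


def handle(raw_json, severities=("ok", "critical")):
    """Parse the piped event and return the alert text, or None to do nothing."""
    event = json.loads(raw_json)
    if not should_handle(event, severities):
        return None
    return f"{severity(event['check']['status']).upper()}: {event['client']['name']}/{event['check']['name']}"


# Pipe handler entry point (the event JSON arrives on stdin):
#     import sys
#     if __name__ == "__main__":
#         message = handle(sys.stdin.read())
#         if message:
#             print(message)
```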
.WTF/is/sensu?standalone_metrics
● The same as checks, but...
● handlers: ['metrics']
  ○ A special handler for this kind of result
● type: metric
  ○ Tells Sensu to always send the output to the handler

    sensu::checks:
      cpu-pcnt-usage-metrics:
        command: 'cpu-pcnt-usage-metrics.rb'
        handlers: ['metrics']
        type: metric
.WTF/is/sensu?metric_example

    Key                     Value  Timestamp
    ix-sensu01.cpu.user     70.92  1440425049
    ix-sensu01.cpu.nice     0.00   1440425049
    ix-sensu01.cpu.system   8.16   1440425049
    ix-sensu01.cpu.idle     19.90  1440425049
    ix-sensu01.cpu.iowait   0.00   1440425049
    ix-sensu01.cpu.irq      0.00   1440425049
    ix-sensu01.cpu.softirq  1.02   1440425049
    ix-sensu01.cpu.steal    0.00   1440425049
    ix-sensu01.cpu.guest    0.00   1440425049
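Each line of that output follows the Graphite plaintext layout: `key value timestamp`, separated by single spaces, with the timestamp in Unix epoch seconds. A metric check prints lines like these and exits 0. Because the check has `type: metric`, Sensu still forwards the output to the metrics handler. This is a minimal sketch of producing and parsing such lines; the function names are hypothetical.

```python
import time


def graphite_lines(scheme, values, timestamp=None):
    """Format {metric: value} as 'scheme.metric value timestamp' lines."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return [f"{scheme}.{key} {value:.2f} {ts}" for key, value in values.items()]


def parse_graphite_line(line):
    """Split one plaintext line back into (key, value, timestamp)."""
    key, value, ts = line.split()
    return key, float(value), int(ts)
```

For example, `graphite_lines("ix-sensu01.cpu", {"user": 70.92}, 1440425049)` gives `["ix-sensu01.cpu.user 70.92 1440425049"]`, which is the first line of the sample above.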
.WTF/is/sensu?dashboards
● Uchiwa - github.com/sensu/uchiwa
● Mosaic - github.com/warmfusion/mosaic
● Sensu-Grid - github.com/alex-leonhardt/sensu-grid
.WTF/is/sensu?issues
● Uchiwa isn’t perfect
● Sensu-API can crash sometimes
● No history kept beyond the last 20 or so events
● Check dependencies are handled on clients
● Redis as the datastore
  ○ Redundancy is a little harder (for me at least)
.WTF/is/sensu?wins
● Alerts into Slack channels
● Handles network partitions really well
● Easy to create new checks and handlers
.WTF/is/monitoring?further_reading
Programmatic Alert Correlation - Elik Eizenberg
youtu.be/EXk19d09n54
Effective Incident Communication - Scott Klein
youtu.be/ySSdqfZlC7Y
Search for “Operability 2015” on YouTube
.WTF/whois?q=self:
  author: 'Toby Jackson <[email protected]>'
  role: 'Operations Engineer'
  twitter: '@warmfusion'
  github: 'github.com/warmfusion'
  employer: 'www.futureplc.com/yourfuturejob/'

Any Questions…?