How Yelp Uses Sensu to Monitor Services in a SOA World
kyle-anderson
Outline
● Let’s visit the dark ages
● How Sensu Works
● Special (open source) Yelp + Sensu Sauce
● Mini-Demo
● How PaaSTA Uses Sensu
● Second Demo
The Dark Ages
● One Word: Nagios
● Monitoring for Services: “Also Nagios”
● Probably alerts go to OPS anyway
● Probably just making sure the LB is up
● Very little developer visibility
● Hard to articulate to Nagios what you want
An Aside: Map Versus Territory
● Territory: The actual things in production running right now
● Map: What your monitoring system *thinks* is running right now
Who/What keeps these in sync?????
How Sensu Works
[Diagram: a Sensu client on some host publishes check results to RabbitMQ; the Sensu server asks the queue, “Any events for me to handle?”]
● Clients execute checks
● Servers don’t know what checks exist beforehand; they just operate on events
How Sensu Works - In Words
● Clients can Schedule and Execute checks, but just put the results on the queue
● Servers handle results off the queue, route them to things like email, pagerduty, JIRA, etc.
● Also API, CLI, check history, silencing, dashboard, etc.
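The client/server split described above can be sketched in a few lines. This is only an illustration, not Sensu's code: an in-process `queue.Queue` stands in for RabbitMQ, and the names `client_publish`, `server_drain`, and `email_handler` are hypothetical.

```python
import json
import queue


def run_check(name, command_fn):
    """Client side: execute a check and package the result."""
    status, output = command_fn()
    return {"name": name, "status": status, "output": output}


def client_publish(results_queue, name, command_fn):
    # The client only publishes the result; it makes no routing decisions.
    results_queue.put(json.dumps(run_check(name, command_fn)))


def server_drain(results_queue, handlers):
    """Server side: pull results off the queue and pass non-OK ones to handlers."""
    handled = []
    while True:
        try:
            raw = results_queue.get_nowait()
        except queue.Empty:
            break
        event = json.loads(raw)
        # The server never needed to know this check existed beforehand.
        if event["status"] != 0:
            for handler in handlers:
                handled.append(handler(event))
    return handled


def email_handler(event):
    # Hypothetical handler: a real one would send mail, page, or file a ticket.
    return "email: {name} is status {status}".format(**event)


results = queue.Queue()  # stand-in for RabbitMQ
client_publish(results, "check_disk", lambda: (2, "disk 98% full"))
client_publish(results, "check_ssh", lambda: (0, "ok"))
handled = server_drain(results, [email_handler])
print(handled)
```

Only the failing check reaches a handler; the passing one is consumed and dropped.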
Special (Open Source) Yelp-Sensu Sauce
● https://github.com/Yelp/sensu_handlers
● “Smart” handlers that respond to Sensu events based on the event data
● Team is the “primary key” when determining what to do
Declare Your Teams
sensu_handlers::teams:
  dev:
    pagerduty_api_key: 1234
    pages_irc_channel: 'dev1-pages'
    notifications_irc_channel: 'devs'
  ops:
    pagerduty_api_key: 78923
    pages_irc_channel: 'ops-pages'
    notifications_irc_channel: 'operations-notifications'
    notification_email: 'operations@localhost'
    project: OPS
  hardware:
    # Uses the ops Pagerduty service for page-worthy events,
    # but otherwise just jira tickets
    pagerduty_api_key: 78923
    project: METAL
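To make "team is the primary key" concrete, here is a rough sketch of how a handler might turn an event plus the team declarations above into a list of actions. The real sensu_handlers are written in Ruby and their rules may differ; `plan_actions` and its routing rules are assumptions made for illustration only.

```python
# Team data copied from the declaration above.
TEAMS = {
    "dev": {
        "pagerduty_api_key": "1234",
        "pages_irc_channel": "dev1-pages",
        "notifications_irc_channel": "devs",
    },
    "ops": {
        "pagerduty_api_key": "78923",
        "pages_irc_channel": "ops-pages",
        "notifications_irc_channel": "operations-notifications",
        "notification_email": "operations@localhost",
        "project": "OPS",
    },
    "hardware": {
        "pagerduty_api_key": "78923",
        "project": "METAL",
    },
}


def plan_actions(event, teams=TEAMS):
    """Hypothetical routing: look up the event's team, then decide what to do."""
    team = teams.get(event.get("team"))
    if team is None:
        return []  # unknown team: nothing to route to
    actions = []
    if event.get("page") and "pagerduty_api_key" in team:
        actions.append("pagerduty")
        if "pages_irc_channel" in team:
            actions.append("irc:" + team["pages_irc_channel"])
    elif "notifications_irc_channel" in team:
        actions.append("irc:" + team["notifications_irc_channel"])
    # Assumed rule: an event may name its own project, else use the team default.
    project = event.get("project") or team.get("project")
    if event.get("ticket") and project:
        actions.append("jira:" + project)
    if "notification_email" in team:
        actions.append("email:" + team["notification_email"])
    return actions


print(plan_actions({"team": "ops", "page": True, "ticket": False}))
print(plan_actions({"team": "hardware", "page": False, "ticket": True}))
```

The check itself carries only a team name and a few flags; everything about where notifications go lives in one place, the team declaration.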
Mini-Demo
What does it look like when you can dynamically define checks on Sensu clients in a team-centric way?
{
    "name": "test_alert_for_kwa",
    "team": "kwa",
    "irc_channels": [],
    "notification_email": "[email protected]",
    "ticket": false,
    "project": false,
    "page": false,
    "output": "Test output from send-test-sensu-alert",
    "status": 2,
    "command": "send-test-sensu-alert"
}
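An event like the one above could be handed to a local Sensu client along these lines. This is a sketch, not the `send-test-sensu-alert` script itself: it assumes the client's local input socket is listening on localhost:3030 (check your own client configuration), and `build_event` and `send_event` are hypothetical helper names.

```python
import json
import socket


def build_event(name, team, status, output, **extra):
    """Assemble an ad-hoc check result carrying team-centric routing keys."""
    event = {"name": name, "team": team, "status": status, "output": output}
    event.update(extra)
    return event


def send_event(event, host="localhost", port=3030):
    """Write the JSON event to the local Sensu client socket (assumed address)."""
    payload = json.dumps(event).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


event = build_event(
    "test_alert_for_kwa",
    "kwa",
    2,
    "Test output from send-test-sensu-alert",
    page=False,
    ticket=False,
    irc_channels=[],
)
print(json.dumps(event, sort_keys=True))
# send_event(event)  # uncomment on a host with a running Sensu client
```

Because the client accepts arbitrary results, nothing has to be registered on the server before this event is sent.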
What just happened?
How PaaSTA Uses Sensu
● Take advantage of Sensu’s ability to receive arbitrary events
● We already know which team owns each service (started documenting that with the soa-configs)
● We already know where services are deployed and what latency zones they are in
Sensu + PaaSTA Demo
What if your monitoring system knew all about your services and how they are supposed to be deployed?
What just happened?
● We “went behind PaaSTA’s back” to simulate a failure of an AZ
● We got a replication alert because one of the latency zones didn’t meet our expected replication count (0 out of 3)
● We decided to “remediate” it by expanding our latency zone to “region”
● PaaSTA “made it so”: our alert resolved, and the status command reflected that we are now expecting 6 in that one region
How Did Sensu “Know”?
● Sensu doesn’t “Know” anything except for the “Teams” metadata hash
● PaaSTA checks Haproxy in each latency zone because it can read the same SOA configs that SmartStack does!
● PaaSTA “Knows” which team owns each service because we told it in SOA configs!
● Sensu just processes the event like normal
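The division of labor above can be sketched as a per-latency-zone replication check that emits an ordinary event. This is not PaaSTA's code: `check_replication`, the zone names, and the service name are hypothetical, and the per-zone counts are passed in rather than read from HAProxy.

```python
OK, CRITICAL = 0, 2  # Nagios-style status codes, as in the test event earlier


def check_replication(service, team, expected, observed_by_zone):
    """Compare observed instances per latency zone against the expected count."""
    short = {
        zone: count
        for zone, count in observed_by_zone.items()
        if count < expected
    }
    if short:
        status = CRITICAL
        details = ", ".join(
            "{}: {} out of {}".format(zone, count, expected)
            for zone, count in sorted(short.items())
        )
        output = "{} is under-replicated ({})".format(service, details)
    else:
        status = OK
        output = "{} has at least {} instances in every latency zone".format(
            service, expected
        )
    # The team comes from service metadata; Sensu only sees a normal event.
    return {
        "name": "replication_" + service,
        "team": team,
        "status": status,
        "output": output,
    }


# One zone lost all of its instances, as in the simulated AZ failure.
failing = check_replication(
    "example_service", "dev", 3, {"zone-a": 3, "zone-b": 0, "zone-c": 3}
)
# After widening the latency zone to the whole region.
resolved = check_replication("example_service", "dev", 6, {"region-1": 6})
print(failing["output"])
print(resolved["status"])
```

The resulting dictionaries have the same shape as any other check result, so the monitoring side needs no special knowledge of services or zones.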
Conclusion
● Use a monitoring system that can receive and process arbitrary events for easy integration (Sensu)
● Keep service metadata in an easy-to-access place for pieces to integrate easily (SOA configs)
● Monitor the exact thing you care about (replication in each latency zone)
Reading Comprehension Question: (What was the purpose of this talk?)
A. To describe how cool Sensu is
B. To make viewers feel inadequate about their own Nagios installation
C. To tease viewers about Sensu glue that is not open source yet
D. To inspire viewers to build their own dynamic monitoring based on some of these ideas!
E. Other?
Tools Used:
● Sensu: https://sensuapp.org/
● Yelp’s Sensu Handlers: https://github.com/Yelp/sensu_handlers
● Mesos: http://mesos.apache.org/
● Marathon: https://mesosphere.github.io/marathon/
● Smartstack: http://nerds.airbnb.com/smartstack-service-discovery-cloud/