So you want to switch off ? © 2014 - Olivier Jan - Check my Website ...
So you want to switch off?
Time to say goodbye to your Nagios-based setup!
© 2014 - Olivier Jan - Check my Website - @olivjan
About me
❖ System admin and architect
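❖ Co-founder of the « Communauté Francophone de la Supervision Libre » (French-speaking open source monitoring community)
❖ Author of the book « Nagios 3 au cœur de la supervision Open Source » (Nagios 3 at the heart of Open Source monitoring)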
❖ Co-founder of Check my Website, a SaaS service for remote monitoring of websites and applications (current)
Content
❖ Why switch off? The good, and maybe not so good, reasons to do so!
❖ Which way to take?
❖ Building a monitoring solution without Nagios:
  ❖ Tools available
  ❖ A personal work in progress
❖ Migrating from Nagios to this kind of solution
Some reasons to switch off…
❖ Is the godfather of OSS monitoring dead as an Open Source project?
❖ Can’t do better with it
❖ Cool new kids out there
❖ Better « cloud » support
❖ Clear distinction between monitoring states, metrics and messages
❖ Better charting solution
❖ Near-real-time monitoring
❖ Routing, aggregation, correlation…
❖ YOUR reasons ;)
Which way to take?
❖ The « four musketeers »
  ❖ Naemon
  ❖ Icinga 2
  ❖ Shinken
  ❖ Centreon
❖ Reboot from building blocks
  ❖ Collect
  ❖ Store
  ❖ Visualize
  ❖ Alert
Tools : Collecting metrics and messages
❖ Packetbeat (metrics & messages)
❖ Rsyslog, NXLog, syslog-ng (messages)
❖ sFlow Toolkit, Host sFlow
❖ Logstash-forwarder (messages)
❖ Collectd (metrics)
❖ Diamond (metrics)
❖ osquery, WMI (metrics)
❖ Network level (sFlow)
❖ System level
❖ Application level
Tools : External collecting
❖ End-user perspective
❖ Checks performed as close to the end user as possible
❖ Application behavior
❖ Real User Monitoring
❖ WebPageTest
❖ Selenium
❖ Phantomas
❖ Boomerang
❖ Bucky
Tools : Routing metrics and messages
❖ Messages : Logstash, Flume, Fluentd
❖ Metrics : StatsD
❖ Metrics : carbon-relay-ng
One or more messages can fire an event
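The idea that one or more messages can fire an event can be pictured as a sliding-window counter: an event fires when enough matching messages arrive within a time window. The Python sketch below assumes exactly that. The class name, the `ERROR` pattern, the threshold and the window are hypothetical choices, not features of Logstash, Flume or Fluentd.

```python
from collections import deque


class MessageCorrelator:
    """Hypothetical helper: fire one event when `threshold` messages
    containing `pattern` arrive within `window` seconds."""

    def __init__(self, threshold: int, window: float, pattern: str = "ERROR") -> None:
        self.threshold = threshold
        self.window = window
        self.pattern = pattern
        self.timestamps = deque()  # arrival times of matching messages

    def feed(self, timestamp: float, message: str) -> bool:
        """Record one message and return True if it fires an event."""
        if self.pattern not in message:
            return False
        self.timestamps.append(timestamp)
        # Forget matches that have slid out of the time window
        while self.timestamps and timestamp - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.threshold:
            self.timestamps.clear()  # one burst fires one event
            return True
        return False
```

For example, with `threshold=3` and `window=60`, three ERROR lines at t=0, 10 and 20 seconds fire a single event on the third message.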
Tools : Databases
❖ Graphite : the most widely used
❖ OpenTSDB : built on HBase
❖ KairosDB : built on Cassandra
❖ InfluxDB : the most promising?
❖ Elasticsearch : Index database
Tools : Visualizing metrics and messages
❖ Kibana
❖ Grafana
❖ Dashboards collection
Tools : Alerting
❖ Seyren : Alerting dashboard for Graphite.
❖ Cabot : Get alerted when services go down or metrics go crazy
❖ Bosun : An advanced, open-source monitoring and alerting system
❖ Skyline : Real-time anomaly detection system
❖ Oculus : Anomaly correlation component of Etsy's Kale system
❖ Esper : Complex Event Processing
The French Monitoring Community Xperience
❖ Reboot from building blocks
  ❖ Collect
  ❖ Store
  ❖ Visualize
  ❖ Alert
The French Monitoring Community Xperience
Is it working? What is not working?
Collecting metrics : Collectd
❖ InfluxDB Collectd proxy
  ❖ Written in Go, like InfluxDB
  ❖ Temporary solution
❖ Native Collectd plugin
LoadPlugin network
<Plugin network>
  # Address of the InfluxDB collectd proxy
  Server "127.0.0.1" "8096"
</Plugin>
❖ PHP5-FPM metrics
❖ Nginx metrics
❖ MariaDB metrics
❖ System metrics
❖ <metricname>:<value>|<type>
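The `<metricname>:<value>|<type>` line above is the StatsD line format, which collectd accepts through its statsd plugin. Below is a minimal Python parser for it. It assumes the usual StatsD types (`c` counter, `g` gauge, `ms` timer, `s` set, plus `h`, which only some daemons accept) and the optional `|@<rate>` sample-rate suffix. `parse_statsd_line` and `Metric` are hypothetical names, not part of collectd.

```python
from typing import NamedTuple, Union

# counter, gauge, timer, histogram ('h' is not supported by every daemon)
NUMERIC_TYPES = {"c", "g", "ms", "h"}


class Metric(NamedTuple):
    name: str
    value: Union[float, str]  # sets ('s') keep the raw string value
    metric_type: str
    sample_rate: float = 1.0


def parse_statsd_line(line: str) -> Metric:
    """Parse '<metricname>:<value>|<type>' with an optional '|@<rate>' suffix."""
    name, sep, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not name or not sep or len(fields) < 2:
        raise ValueError(f"malformed StatsD line: {line!r}")
    raw_value, metric_type = fields[0], fields[1]
    sample_rate = 1.0
    if len(fields) > 2 and fields[2].startswith("@"):
        sample_rate = float(fields[2][1:])
    if metric_type == "s":
        return Metric(name, raw_value, metric_type, sample_rate)
    if metric_type not in NUMERIC_TYPES:
        raise ValueError(f"unknown metric type: {metric_type!r}")
    return Metric(name, float(raw_value), metric_type, sample_rate)
```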
Collecting messages : Rsyslog
❖ Log consumption nearly ready out of the box
❖ Native distribution package
❖ Nginx logs, MySQL slow query log
# Format each message as a one-line JSON document for Logstash
template(name="ls_json" type="list" option.json="on") {
  constant(value="{")
  constant(value="\"@timestamp\":\"")
  property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"@version\":\"1")
  constant(value="\",\"message\":\"")
  property(name="msg")
  constant(value="\",\"host\":\"")
  property(name="hostname")
  constant(value="\",\"severity\":\"")
  property(name="syslogseverity-text")
  constant(value="\",\"facility\":\"")
  property(name="syslogfacility-text")
  constant(value="\",\"programname\":\"")
  property(name="programname")
  constant(value="\",\"procid\":\"")
  property(name="procid")
  constant(value="\"}\n")
}
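The Python sketch below builds a JSON line with the same keys, in the same order, as the `ls_json` template above. It only illustrates the output shape Logstash receives and is not part of rsyslog. The function name and sample values are hypothetical. `json.dumps` escapes are close to, but not byte-identical with, what `option.json="on"` does.

```python
import json


def ls_json(timereported: str, msg: str, hostname: str, severity: str,
            facility: str, programname: str, procid: str) -> str:
    """Mimic the field layout of the rsyslog 'ls_json' template (illustration only)."""
    document = {
        "@timestamp": timereported,  # property timereported, RFC 3339 format
        "@version": "1",
        "message": msg,
        "host": hostname,
        "severity": severity,        # syslogseverity-text, e.g. "info"
        "facility": facility,        # syslogfacility-text, e.g. "local7"
        "programname": programname,
        "procid": procid,
    }
    # json.dumps escapes quotes and backslashes inside values, which is what
    # option.json="on" is for in the template; compact separators, one line
    return json.dumps(document, separators=(",", ":"), ensure_ascii=False) + "\n"
```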
Collecting @ network level : Packetbeat
❖ Dedicated agent
❖ Collects traffic for:
  ❖ HTTP
  ❖ MySQL
  ❖ PostgreSQL
  ❖ Redis
Routing messages : Logstash
❖ Inputs
❖ Codecs/filters
❖ Outputs
input {
  udp {
    port => 10514
    codec => "json"
    type => "syslog"
  }
}

filter {
  # This replaces the host field with the host that generated the message (sysloghost)
  if [sysloghost] {
    mutate {
      replace => [ "host", "%{sysloghost}" ]
      remove_field => "sysloghost"
    }
  }
}

output {
  elasticsearch { host => localhost }
}
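The pipeline above is easy to misread, so here is the same event handling written as plain Python functions over a dictionary. This is a model of what the input and the conditional `mutate` do to each event, not Logstash code, and both function names are hypothetical. One behavior it reflects from Logstash's documentation: an input's `type` setting does not overwrite a type the event already carries.

```python
import json


def udp_json_input(datagram: bytes) -> dict:
    """Model of `udp { codec => "json" type => "syslog" }`: decode the JSON
    payload and set the type unless the event already has one."""
    event = json.loads(datagram.decode("utf-8"))
    event.setdefault("type", "syslog")
    return event


def sysloghost_filter(event: dict) -> dict:
    """Model of the filter: when [sysloghost] is set, replace 'host' with it
    and drop the 'sysloghost' field. Returns a new dict."""
    result = dict(event)
    if result.get("sysloghost") is not None:
        # replace => [ "host", "%{sysloghost}" ] renders the value as a string
        result["host"] = str(result.pop("sysloghost"))
    return result
```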
Routing metrics : StatsD
❖ Now a protocol implemented in many languages
❖ InfluxDB plugin
❖ Collectd can act as a StatsD daemon (statsd plugin)
❖ Very easy to push metrics:
echo "foo:1|c" | nc -u -w0 127.0.0.1 8125
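The `nc` one-liner above has a direct equivalent in any language with UDP sockets. Below is a minimal Python sketch. It assumes a StatsD-compatible daemon (StatsD itself, or collectd's statsd plugin) listening on 127.0.0.1:8125 as in the example. `send_metric` and `format_number` are hypothetical helpers. Because UDP is fire-and-forget, sending usually succeeds even when no daemon is listening.

```python
import socket
from typing import Optional


def format_number(value: float) -> str:
    """Render 1.0 as '1' and keep decimals otherwise."""
    return str(int(value)) if float(value).is_integer() else str(value)


def send_metric(name: str, value: float, metric_type: str = "c",
                sample_rate: Optional[float] = None,
                host: str = "127.0.0.1", port: int = 8125) -> bytes:
    """Send one '<metricname>:<value>|<type>' line over UDP and return the payload."""
    line = f"{name}:{format_number(value)}|{metric_type}"
    if sample_rate is not None and sample_rate < 1:
        line += f"|@{sample_rate}"
    payload = line.encode("ascii")
    # No reply is expected: the daemon aggregates and flushes on its own schedule
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

`send_metric("foo", 1)` sends the same `foo:1|c` datagram as the shell example.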
Storing metrics : InfluxDB
❖ Make it behave like Graphite
  ❖ graphite-api
  ❖ carbon-relay-ng
  ❖ graphite-influxdb
❖ Cluster, cluster, cluster
❖ Designed for both events and metrics
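On the ingestion side, making InfluxDB "behave like Graphite" means accepting Graphite's Carbon protocols, which is what carbon-relay-ng relays. As an illustration, this Python sketch pushes data points using Carbon's plaintext protocol: one `<metric path> <value> <timestamp>` line per point, over TCP. Port 2003 is Carbon's usual plaintext port. Whether your relay or InfluxDB setup listens there is an assumption to check. The helper names are hypothetical.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point in Carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the Unix epoch
    return f"{path} {value} {timestamp}\n"


def send_to_carbon(lines: Iterable[str], host: str = "127.0.0.1",
                   port: int = 2003, timeout: float = 5.0) -> None:
    """Open one TCP connection and write all plaintext lines."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```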
Storing messages : Elasticsearch
❖ Index database
❖ Cluster, cluster, cluster
❖ Full text search
Visualizing @ network level : Packetbeat
❖ Modified version of Kibana 3
❖ Ready-made dashboards out of the box
Visualizing metrics : Grafana
❖ Compatible with:
  ❖ Graphite
  ❖ InfluxDB
  ❖ OpenTSDB
❖ Built on Kibana 3
Visualizing messages : Kibana 4
❖ Easy install
❖ Interactive dashboards
❖ Multiple indices
What's missing? Wishes
❖ Alerting
❖ External monitoring
❖ Repository for dashboards…
❖ Giving sense to metrics and messages
Alerting reboot
❖ Alert only on end-user problems, from an end-user perspective
❖ IRC, chat channels…
❖ Alert thresholds based on history vs static thresholds
❖ Statistics functions
❖ Boolean conditions
❖ Dynamic thresholds
❖ Anomaly detection
❖ Standard deviation
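Alerting on history rather than on static thresholds can be illustrated with the standard-deviation rule listed above. A value is flagged when it falls outside a band of k standard deviations around the mean of recent history. This Python sketch is deliberately simple. `k = 3`, the 10-point minimum history and the function names are arbitrary, hypothetical choices, not the detectors used by Skyline, Bosun or the other alerting tools mentioned earlier.

```python
import statistics
from typing import Sequence, Tuple


def dynamic_band(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (low, high) = mean -/+ k population standard deviations of history."""
    data = [float(x) for x in history]
    mean = statistics.mean(data)
    stdev = statistics.pstdev(data, mu=mean)
    return mean - k * stdev, mean + k * stdev


def is_anomalous(history: Sequence[float], value: float, k: float = 3.0,
                 min_points: int = 10) -> bool:
    """True when `value` falls outside the dynamic band built from `history`."""
    if len(history) < min_points:
        return False  # too little history to derive a meaningful threshold
    low, high = dynamic_band(history, k)
    return not (low <= value <= high)
```

Unlike a static threshold, the band moves with the data. A flat history yields a zero-width band, so any change is flagged.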
Coming from Nagios
❖ Graphios injects Nagios perfdata into Graphite or InfluxDB
❖ check_graphite lets Nagios query the Graphite API and alert based on history
❖ Logstash can send events to Nagios through NSCA
❖ Nagios logs in Kibana, parsed with the Grok pattern %{NAGIOSLINE}
❖ Keep Nagios for states ?
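The first step of this migration path, the Graphios-style move from Nagios performance data to metric points, can be sketched in a few lines. This Python example parses the standard plugin perfdata format (`'label'=value[UOM];[warn];[crit];[min];[max]`) and builds Graphite plaintext lines without a trailing newline. It keeps only label, value and unit. Items whose value is not numeric (such as `U`) are skipped. The `nagios.<host>.<service>.<label>` path layout and the function names are hypothetical, not Graphios' actual naming scheme.

```python
import re
from typing import List, Tuple

# One perfdata item; the label may be single-quoted when it contains spaces
PERFDATA_ITEM = re.compile(
    r"(?:'(?P<qlabel>[^']+)'|(?P<label>[^\s=']+))="
    r"(?P<value>[-+]?\d+(?:\.\d+)?)(?P<uom>[a-zA-Z%]*)"
)


def parse_perfdata(perfdata: str) -> List[Tuple[str, float, str]]:
    """Return (label, value, unit) tuples from a Nagios perfdata string."""
    items = []
    for match in PERFDATA_ITEM.finditer(perfdata):
        label = match.group("qlabel") or match.group("label")
        items.append((label, float(match.group("value")), match.group("uom")))
    return items


def to_graphite(host: str, service: str, perfdata: str, timestamp: int) -> List[str]:
    """Build Carbon plaintext lines under a hypothetical nagios.<host>.<service> prefix."""
    def clean(part: str) -> str:
        # Dots are Graphite path separators, so replace them (and spaces, slashes...)
        return re.sub(r"[^A-Za-z0-9_-]", "_", part)

    return [
        f"nagios.{clean(host)}.{clean(service)}.{clean(label)} {value} {timestamp}"
        for label, value, _unit in parse_perfdata(perfdata)
    ]
```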