Graphite, an introduction
Uploaded by jamesrwu
Graphite: An Introduction
Scaling real-time monitoring
The purpose today
What is graphite
Why it’s so great
How to graph (it’s really easy!)
How we use graphite
First, a definition
Alerts + Metrics = Monitoring
Metrics: Graphite, Cacti, Munin
Alerting: Nagios, Icinga
Both: Zenoss, Hyperic, Zabbix, PNP4Nagios
What is graphite
About graphite
● Django web application consisting of 3 parts:
○ carbon (relays, caches, aggregates metrics)
○ whisper (graphite’s equivalent of RRD files)
○ Web UI (graph composer, simple dashboard)
Why graphite?
Why graphing?
Discover trends and patterns
What time of the day do we get the most users?
When x happened, what was the effect on y?
How many hits am I getting per hour? How does this compare to last week? Last month?
Predict future events
When will we need to add more servers? Databases?
Negative feedback
Did the release into production fix problem x?
Cacti SUCKS
A few reasons:
Ancient user interface (no JavaScript/AJAX), terrible workflow, cannot push metrics, no
formulas, no graph introspection, cannot feed out-of-sequence metrics, ugly graphs,
no API, must expose system/OS metrics on the host via SNMP, no graph composer,
no custom graphs, must predefine metrics, must predefine graphs, static polling interval,
unscalable, tons of work to create one graph, no 3rd party ecosystem, etc.
Graphite ++
Simple
Powerful
Functions (sum, derivatives, integrals, timeshift, mostDeviant, scale, averages, etc.)
API (Nagios integration, 3rd party custom dashboards)
Scalable
Easy to feed data
Wide ecosystem of 3rd party tools and dashboards
http://graphite.readthedocs.org/en/latest/tools.html
Tools
StatsD
Logster
Skyline
Collectd
Dashboards
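The functions and API listed above are usually reached through graphite-web's render endpoint, where each graph is just a URL. A minimal sketch of building such a URL follows. The host `graphite.example.com` and the metric names are hypothetical; `sumSeries` and `timeShift` are Graphite functions, and the helper name `render_url` is made up for this example.

```python
from urllib.parse import urlencode

def render_url(base_url, targets, from_time="-24h", fmt="json"):
    """Build a graphite-web /render URL for one or more target expressions."""
    params = [("target", t) for t in targets]
    params += [("from", from_time), ("format", fmt)]
    return base_url.rstrip("/") + "/render?" + urlencode(params)

# Hypothetical host and metric names, for illustration only.
url = render_url(
    "http://graphite.example.com",
    [
        "sumSeries(web.*.hits)",                   # total hits across all web hosts
        'timeShift(sumSeries(web.*.hits), "7d")',  # the same series, one week earlier
    ],
)
print(url)
```

Because the whole graph definition lives in the URL, dashboards and Nagios checks can request the same data without any extra configuration on the Graphite side.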
Graphite --
No poller
No all in one solution
No easy backups
It probably will become business critical
How to graph
There are tons of ways to feed graphite your data
Bash
#!/bin/bash
timestamp=$(date +%s)
value=10
echo "dot.delimited.metric.name $value $timestamp" | nc -w 1 graphite.host.name 2003
Python
import socket

def send_msg(message, HOST, PORT):
    # message is one plaintext line: "metric.name value timestamp\n"
    sock = socket.create_connection((HOST, PORT))
    sock.sendall(message.encode("ascii"))
    sock.close()
Python using graphite-pymetrics
from metrics import timing

@timing("heavy.task")
def heavy_task(x, y, z):
    # do heavy stuff here
    pass
Ruby
require 'socket'
Host = 'somegraphitehost'
conn = TCPSocket.new Host, 2003
conn.puts 'Metrics value timestamp'
conn.close
Java
import java.io.DataOutputStream;
import java.net.Socket;
Socket conn = new Socket("somegraphitehost", 2003);
DataOutputStream dos = new DataOutputStream(conn.getOutputStream());
dos.writeBytes("metrics value timestamp\n");
conn.close();
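All of the snippets above send the same thing: one line per datapoint, in the form "metric.path value timestamp", terminated by a newline, to carbon's plaintext port (2003 in the examples). A minimal sketch that makes the line format explicit and sends several datapoints over one connection might look like this. The helper names and metric names are hypothetical.

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Return one plaintext-protocol line: '<path> <value> <timestamp>' plus newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)

def send_metrics(lines, host, port=2003):
    """Send several pre-formatted lines over a single TCP connection."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)

# Hypothetical metric names; the timestamp is fixed so the output is repeatable.
lines = [
    format_metric("app.signup.count", 3, 1400000000),
    format_metric("app.signup.latency_ms", 41.5, 1400000000),
]
print("".join(lines), end="")
# To actually send: send_metrics(lines, "graphite.host.name")
```

Batching lines into one connection avoids opening a socket per datapoint, which matters once the metric count grows.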
How we use graphite
700K+ metrics per minute
A Common Graphite Stack
[Diagram components: Graphite-web, Collectd, Poller(s), Applications, Carbon, Whisper, Dashboards, Statsd, Scripts, Nagios]
Collectd
Agent for system/hardware level metrics
Growing repository of plugins for a wide variety of applications:
disk i/o, disk space, cpu, memory, mysql, JMX, java, Redis, file sizes, load, etc.
https://collectd.org/wiki/index.php/Table_of_Plugins
Write your custom plugin in python
Nagios integration
You can write Nagios plugins that alert on metric values
Nagios can also feed graphite performance data, events (e.g. update a counter each time an email is sent), etc.
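A Nagios plugin that alerts on a metric value can be quite small. The sketch below is one possible shape, not a finished plugin: it takes JSON in the form returned by graphite-web's `/render?format=json`, picks the latest non-null datapoint, and maps it to the standard Nagios exit codes. The function names, thresholds, and sample data are all made up for illustration.

```python
import json

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes

def latest_value(render_json):
    """Return the most recent non-null value from graphite-web JSON render output."""
    series = json.loads(render_json)
    if not series:
        return None
    for value, _timestamp in reversed(series[0]["datapoints"]):
        if value is not None:
            return value
    return None

def check(value, warn, crit):
    """Map a metric value to a Nagios status code and message."""
    if value is None:
        return UNKNOWN, "UNKNOWN - no data"
    if value >= crit:
        return CRITICAL, "CRITICAL - value %s >= %s" % (value, crit)
    if value >= warn:
        return WARNING, "WARNING - value %s >= %s" % (value, warn)
    return OK, "OK - value %s" % value

# Sample response in the shape returned by /render?format=json (values made up).
sample = ('[{"target": "servers.web1.load", "datapoints": '
          '[[0.7, 1400000000], [5.2, 1400000060], [null, 1400000120]]}]')
status, message = check(latest_value(sample), warn=4, crit=8)
print(message)
# A real plugin would fetch the JSON over HTTP and finish with sys.exit(status).
```

Skipping trailing nulls matters because the newest interval in a render response is often still empty when the check runs.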
What to collect?
Hardware/OS metrics
Load
Disk space
Disk I/O
Network data
Application metrics
How often function x is called
Average value of function x
Average running time of function x
Database/Datastore performance metrics
number of records with value == ?
number of slow queries
Events
Deployments
send a 1, draw with drawAsInfinite()
Log files
http access logs (2xx, 3xx, 4xx, 5xx)
Application logs
Exception counts, results, important events, hits
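The HTTP access log item above can be turned into metrics with very little code, which is roughly the job that log-parsing tools such as Logster do. The following is a minimal sketch, assuming common/combined log format lines; the function names, the `web.access` prefix, and the sample log lines are hypothetical.

```python
import re
import time
from collections import Counter

# Matches the status code that follows the quoted request in common/combined log format.
STATUS_RE = re.compile(r'" (\d{3}) ')

def count_status_classes(log_lines):
    """Count responses per status class (2xx, 3xx, 4xx, 5xx)."""
    counts = Counter()
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)[0] + "xx"] += 1
    return counts

def to_graphite_lines(counts, prefix, timestamp):
    """Turn the counts into plaintext-protocol lines, sorted by status class."""
    return ["%s.%s %d %d\n" % (prefix, cls, n, timestamp)
            for cls, n in sorted(counts.items())]

# Made-up log lines, for illustration only.
sample_log = [
    '10.0.0.1 - - [10/Oct/2014:13:55:36 +0000] "GET / HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2014:13:55:37 +0000] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.3 - - [10/Oct/2014:13:55:38 +0000] "POST /login HTTP/1.1" 500 0',
    '10.0.0.1 - - [10/Oct/2014:13:55:39 +0000] "GET /about HTTP/1.1" 200 1024',
]
counts = count_status_classes(sample_log)
print("".join(to_graphite_lines(counts, "web.access", int(time.time()))), end="")
```

The resulting lines can be sent to carbon exactly like any other metric, so a spike in 5xx responses shows up on the same graph as load or deployments.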
Final Musings
Treat graphite like ‘Big Data’
You don’t know what metrics you need until you need them
Get RAID 10 SSDs once you decide to scale
More devopsy
You can start graphing today!