PDX DevOps Graphite replacement


A presentation on a replacement Graphite stack.


Page 1: PDX DevOps Graphite replacement

Riemann + InfluxDB + Grafana

An easier to deploy and (hopefully) better performing replacement for Graphite

Page 2: PDX DevOps Graphite replacement

Me

Sysadmin at NetXposure Inc.

First post-college job


https://github.com/nickchappell

Page 3: PDX DevOps Graphite replacement

Graphite

Page 4: PDX DevOps Graphite replacement

Other tools in the graphing/metrics space

Old school tools

RRD for data storage

RRDtool for generating graphs

Cacti for managing dashboards of RRD graphs

Page 5: PDX DevOps Graphite replacement

Graphite architecture

3 main components

whisper

graphite-web

Carbon takes in metrics

Whisper stores metrics

graphite-web retrieves metrics

carbon-cache/carbon-relay

API

Other dashboards

Page 6: PDX DevOps Graphite replacement

Graphite

What’s wrong with it?

A few different things, mainly with 2 parts of the stack

carbon-cache/carbon-relay

whisper

graphite-web

Page 7: PDX DevOps Graphite replacement

graphite-web

Graphite web is actually quite good and featureful (is that a word?) as an API

The built-in dashboard and graph builder isn’t very stylish, but is great for exploring and finding out what metrics you want to query via the API or graph in another dashboard tool

Page 8: PDX DevOps Graphite replacement

What’s wrong with Whisper?

Disk IO problems when running queries for lots of metrics

graphite-web ends up having to touch lots and lots of files
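To make the "lots of files" point concrete: Whisper keeps one .wsp file per metric, with the dots in the metric name turned into directories. Here's a rough sketch of that mapping. The function names are made up for illustration, and the root path is an assumption (the usual default install location), so adjust it for your setup.

```python
import posixpath

# Assumed default Whisper storage directory; yours may differ.
WHISPER_ROOT = "/opt/graphite/storage/whisper"


def whisper_path(metric_name, root=WHISPER_ROOT):
    """Map a dotted metric name to the .wsp file that holds its data."""
    return posixpath.join(root, *metric_name.split(".")) + ".wsp"


def files_touched(metric_names, root=WHISPER_ROOT):
    """Every distinct metric in a query is a separate file to open and read."""
    return sorted({whisper_path(name, root) for name in metric_names})
```

A dashboard graphing one CPU metric for 100 hosts means 100 separate files to open and read for a single graph.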

Page 9: PDX DevOps Graphite replacement

What’s wrong with carbon?

The stock Python interpreter, CPython; specifically, the GIL (global interpreter lock)

Multiple native threads are not allowed to execute Python bytecode at the same time

https://wiki.python.org/moin/GlobalInterpreterLock

Why? According to the Python folks, CPython's memory management is not thread-safe

Your time series data is getting munged on somewhere in here

Page 10: PDX DevOps Graphite replacement

Does Graphite have any redeeming qualities?

The format that Graphite receives metrics in is dead-simple

metric_name value timestamp\n

foo.bar.baz 42 74857843

skynet.cyberdyne.mil.cpu-system 423 74857843
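The format is simple enough that producing and parsing it takes a few lines. A minimal sketch in Python; format_metric and parse_metric are names invented for this example, not part of any Graphite library.

```python
import time


def format_metric(name, value, timestamp=None):
    """Build one line of the plaintext format: name, value, Unix time."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{name} {value} {int(timestamp)}\n"


def parse_metric(line):
    """Split a plaintext line back into (name, value, timestamp)."""
    name, value, timestamp = line.split()
    return name, float(value), int(timestamp)
```

Sending a metric is then just a matter of writing that string to whatever TCP port your Carbon (or Carbon replacement) listens on.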

Page 11: PDX DevOps Graphite replacement

Metrics formats

For better or worse, Graphite’s format is the lingua franca of the metrics world right now

Most everything that outputs metrics can output them in Graphite’s format

Page 12: PDX DevOps Graphite replacement

Scaling Graphite

Some strategies include…

…decoupling components

Split up the roles for Carbon into Carbon relays and Carbon caches

…throwing more hardware at it

Get a faster CPU

Get faster disks (SSDs or set up RAM drives)

Put HAproxy in front of a few Carbon instances to spread the load around

To this end, set up multiple Carbon instances on the same machine, listening on different ports and tie them to separate CPU cores

Page 13: PDX DevOps Graphite replacement

What are other people doing?

Some people are trying to rewrite parts of the stack

Others have set up some pretty impressive (but complex) Graphite architectures (next few slides)...

Dieterbe rewrote carbon-cache and carbon-relay in Go: https://github.com/graphite-ng/

Page 14: PDX DevOps Graphite replacement

http://grey-boundary.com/the-architecture-of-clustering-graphite/

Page 15: PDX DevOps Graphite replacement

http://adminberlin.de/graphite-scale-out/

Page 16: PDX DevOps Graphite replacement

http://librelist.com/browser/graphite/2013/9/10/graphite-carbon-using-apache-cassandra-via-backend-database-plugin/

Page 17: PDX DevOps Graphite replacement

https://www.hostedgraphite.com

Uses Riak as a storage backend

Page 18: PDX DevOps Graphite replacement

What are the Graphite people doing?

The Graphite folks have 2 projects in the works to overcome some of its problems

megacarbon: replacement for Carbon, supposedly will perform better

ceres: replacement for Whisper, supposedly will perform better and natively allow writes from multiple carbon/megacarbon instances

Page 19: PDX DevOps Graphite replacement

Will these help?

Probably not.

megacarbon is a branch of the main carbon repository that is ahead of carbon’s master branch, but hasn’t yet been merged back in and may never be

ceres hasn’t been touched since December 2013

For that matter, whisper hasn’t been touched since January 2014

Page 20: PDX DevOps Graphite replacement

One more thing…

Graphite is a PITA to install/deploy

Manual setup:

• use apt/yum to install a bunch of prerequisite Python packages

• set up Apache/Nginx with WSGI or Gunicorn to run graphite-web, which is actually a Django app

• BYOIS/UF

It’s somewhat easier now that carbon, whisper and graphite-web are available in pip

Before, you had to clone each component’s Git repo and check out a tagged release

Page 21: PDX DevOps Graphite replacement

3 newer metrics/monitoring tools

Logstash

Heka

Riemann

Page 22: PDX DevOps Graphite replacement

Logstash

Accepts logs, processes and stores them in Elasticsearch

A web app, Kibana, accesses the index data in Elasticsearch

Sound familiar?

There are tons of other ways to use it, though!

Page 23: PDX DevOps Graphite replacement

Heka

Newest monitoring tool of the bunch

Written in Go

https://github.com/mozilla-services/heka

Page 24: PDX DevOps Graphite replacement

Riemann

An “event stream” processor

http://riemann.io

Page 25: PDX DevOps Graphite replacement

What is an event?

“Event stream processor” sounds really abstract

Any data your systems or applications could emit (syslog messages, stack traces, metrics, etc.) is an event

Riemann can have lots of overlap with Logstash when it comes to data that’s primarily text

Riemann does for events what Logstash does for logs (aggregates and processes them, sends them off elsewhere)

Page 26: PDX DevOps Graphite replacement

Riemann

Can take in almost anything (log line, Graphite-format metric, etc.)

Like Logstash/Heka, can process it and then send it elsewhere

Outputs include…

…IM like IRC

…email

…a more traditional monitoring system like Nagios/Icinga

Page 27: PDX DevOps Graphite replacement

Riemann and HipChat

https://github.com/aphyr/riemann/blob/master/src/riemann/hipchat.clj

Page 28: PDX DevOps Graphite replacement

Riemann and Slack

https://github.com/aphyr/riemann/blob/master/src/riemann/slack.clj

Page 29: PDX DevOps Graphite replacement

Riemann has packages!

One advantage over carbon-cache/relay: Riemann has Debian and RPM packages, including init scripts!

The only dependency Riemann has is the JDK itself

Page 30: PDX DevOps Graphite replacement

Riemann

Written in Clojure, a functional programming language

Riemann being written in Clojure is kinda neat, but what makes it special is what Clojure runs on: the JVM

Clojure also brings one benefit that Graphite desperately needs: safe threading!

Unlike CPython, the JVM can actually execute more than 1 thread in parallel and can use more than 1 CPU core

Clojure has software transactional memory and other tools for parallel/concurrent programming

Page 31: PDX DevOps Graphite replacement
Page 32: PDX DevOps Graphite replacement

Riemann events

Like Logstash, events in Riemann are pieces of text with multiple fields

{ :host riemann1.local, :service cpu-0.cpu-wait, :metric 3.399911, :tags collectd, :time 1405715017, :ttl 30 }

Page 33: PDX DevOps Graphite replacement

The Riemann index

Riemann can store incoming events in memory in the index

Events can be given a TTL (in seconds)…

{ :host riemann1.local, :service cpu-0.cpu-wait, :metric 3.399911, :tags collectd, :time 1405715017, :ttl 60 }

...and removed with the periodically-expire function:

; Scan the index for expired events every 30 seconds:
(periodically-expire 30)

Events in the index are what we can use for alerting!

(I’ll mention possible tool integrations later on…)
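If the index + TTL idea is still fuzzy, here's a toy version in Python. This is only a sketch of the concept, not how Riemann implements it: the Index class and its method names are invented for illustration, and it assumes the index keeps the latest event per host+service pair.

```python
class Index:
    """Keeps the latest event per (host, service) until its TTL runs out."""

    def __init__(self):
        self.events = {}

    def update(self, event):
        # A newer event for the same host+service replaces the old one.
        self.events[(event["host"], event["service"])] = event

    def expire(self, now):
        """Remove and return events whose time + ttl is in the past."""
        expired = [e for e in self.events.values() if e["time"] + e["ttl"] < now]
        for e in expired:
            del self.events[(e["host"], e["service"])]
        return expired
```

Calling expire() on a timer plays the role of periodically-expire: an event with :time 1405715017 and :ttl 60 survives a scan at 1405715047 and is dropped by a scan at 1405715078.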

Page 34: PDX DevOps Graphite replacement

Riemann configs

Riemann configs are Clojure programs

https://github.com/nickchappell/vagrantfiles/blob/master/metrics/riemann/files/riemann/configs/riemann.config

Page 35: PDX DevOps Graphite replacement

Riemann as part of a Graphite replacement

Riemann can actually take Graphite-format metrics as an input

http://riemann.io/api/riemann.transport.graphite.html

Can Riemann be used in the place of Carbon?

Yes. It can be instructed to listen on arbitrary TCP and UDP ports

Page 36: PDX DevOps Graphite replacement

graphite-server is just Netty under the hood

http://netty.io

Page 37: PDX DevOps Graphite replacement

Riemann graphite-server config

TCP and UDP graphite-server functions

Page 38: PDX DevOps Graphite replacement

Other parts of the stack

So, we have Carbon replaced. What else do we need?

A replacement for Whisper

A replacement for the API component of graphite-web

A replacement for the web UI component of graphite-web

Page 39: PDX DevOps Graphite replacement

InfluxDB

A time series database written in Go

Page 40: PDX DevOps Graphite replacement

InfluxDB specifics

Some conceptual similarities to SQL DBs:

A database in InfluxDB is just like a database in a SQL system

A series in InfluxDB is like a table in SQL

A point or event in a series is like a row in a table

Points can have columns of values

Points in a series don’t all have to have the same columns, so InfluxDB is sort of schema-less
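To see the series/columns/points model in one place, here's a sketch that builds a write body shaped like what I understand the v0.8 HTTP API to accept (a list of objects with name, columns and points). Treat the exact JSON shape as an assumption to check against the docs for your version; series_payload is a made-up helper name.

```python
import json


def series_payload(series, points):
    """Build a v0.8-style write body for one series from a list of dict points."""
    # Points don't need the same keys: take the union of all of them as columns.
    columns = sorted({key for point in points for key in point})
    # Columns a point doesn't have are filled with None (null in JSON).
    rows = [[point.get(col) for col in columns] for point in points]
    return [{"name": series, "columns": columns, "points": rows}]


body = json.dumps(series_payload("response_times", [
    {"time": 1, "value": 10},
    {"time": 2, "value": 12, "host": "web01"},
]))
```

The second point carries an extra host column the first one lacks, which is the "sort of schema-less" part.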

Page 41: PDX DevOps Graphite replacement

InfluxDB storage engines

Uses LevelDB as the underlying storage engine (can also be configured to use RocksDB, HyperLevelDB or LMDB)

RocksDB and HyperLevelDB are based on LevelDB

All 3 LevelDB variants compress data on disk as it’s written (LMDB doesn't)

All 3 LevelDB variants can shrink storage engine files when getting rid of old metrics (LMDB deletes don't actually reclaim disk space)

Page 42: PDX DevOps Graphite replacement

InfluxDB storage engines

InfluxDB is going to move to RocksDB as the default in the next version

Storage engine benchmarks:

http://influxdb.com/blog/2014/06/20/leveldb_vs_rocksdb_vs_hyperleveldb_vs_lmdb_performance.html

Page 43: PDX DevOps Graphite replacement

SQL-ish query language

Has a SQL-like query language

Selecting with date/time ranges:

select value from response_times
where time > '2013-08-12 23:32:01.232' and time < '2013-08-13';

Relative time ranges and limits:

select value from response_times where time > now() - 1h limit 1000;

Regexes and compound where statements:

select * from events
where (email =~ /.*gmail.*/ or email =~ /.*yahoo.*/) and state = 'ny';

Joins:

select hosta.value + hostb.value
from cpu_load as hosta
inner join cpu_load as hostb
where hosta.host = 'hosta.influxdb.org' and hostb.host = 'hostb.influxdb.org';

http://influxdb.com/docs/v0.8/api/query_language.html

Page 44: PDX DevOps Graphite replacement

InfluxDB advantages over Whisper

All of the underlying storage engines can perform better than Whisper

InfluxDB has Debian and RPM packages! (with init scripts too!)

Page 45: PDX DevOps Graphite replacement

Can InfluxDB replace Whisper?

Yes!

One particular feature it has that Whisper does not is clustering

A group of InfluxDB nodes can communicate via a Raft-based protocol to coordinate writes and reads and split up data into shards

Whisper is still single-instance only

InfluxDB replaces Whisper for storage and graphite-web for metric retrieval

Page 46: PDX DevOps Graphite replacement

InfluxDB disadvantages

API and built-in math functions are not as featureful as graphite-web, at least not yet

This can be solved with development work, and unlike the Graphite projects, InfluxDB is being actively developed!

Page 47: PDX DevOps Graphite replacement

Riemann + InfluxDB

Guess what Riemann can write data out to?

Page 48: PDX DevOps Graphite replacement

InfluxDB data partitioning

Do I store 1 series per host? 1 series per metric? 1 series per hostname + metric name combo?

InfluxDB works best with large numbers of series with fewer columns in each one

Why? Points are indexed by time, not by any other columns.

Arbitrary column indexes are going to be added in the future, though

Page 49: PDX DevOps Graphite replacement

InfluxDB data partitioning

Bad: only using 1 series in a DB

Time      Name     Host   Metric  Service
32141234  cpu      web01  78      cpu
32141235  disk_io  web02  98844   disk_io
32141236  load     db1    5       load
32141237  eth0_in  ldap0  35875   eth0_in

Good: Using multiple series in a DB

Time      Name     Host   Metric  Service
32141234  cpu      web01  78      cpu
32141235  cpu      web01  45      cpu
32141236  cpu      web01  38      cpu
32141237  cpu      web01  92      cpu

Time      Name     Host   Metric  Service
32141234  disk_io  web01  87323   disk_io
32141235  disk_io  web01  98844   disk_io
32141236  disk_io  web01  9233    disk_io
32141237  disk_io  web01  93262   disk_io

Time      Name     Host   Metric  Service
32141234  cpu      web02  78      cpu
32141235  cpu      web02  45      cpu
32141236  cpu      web02  38      cpu
32141237  cpu      web02  92      cpu

Time      Name     Host   Metric  Service
32141234  disk_io  web02  87323   disk_io
32141235  disk_io  web02  98844   disk_io
32141236  disk_io  web02  9233    disk_io
32141237  disk_io  web02  93262   disk_io

Page 50: PDX DevOps Graphite replacement

InfluxDB data partitioning

Isn't a series for every host+service combo excessive?

Because of the way InfluxDB's storage engines work, no!

We can include the series name as the first part of our query:

By doing that, InfluxDB can ignore all of the other data in the other series and will only have to access 1 LevelDB/RocksDB file on disk per Grafana query

(show Riemann config that creates a series per host+service)

Page 51: PDX DevOps Graphite replacement

InfluxDB data partitioning

By doing this, we’re not really querying by hostname or metric name, i.e. querying by those columns

It’s kind of a hack, but it takes advantage of the characteristics of how InfluxDB’s storage engines work

Page 52: PDX DevOps Graphite replacement

InfluxDB data partitioning

I did this originally to get around a quirk of Grafana’s UI

For InfluxDB data sources, Grafana doesn’t let you use where or do selects by more than 1 column

Without splitting data up into more than 1 series, there’s no way to get metric values for an individual host, metric or host+metric combo

Page 53: PDX DevOps Graphite replacement

Riemann and InfluxDB data partitioning

Riemann’s built-in InfluxDB output function looks like this:

Page 54: PDX DevOps Graphite replacement

Riemann and InfluxDB data partitioning

:series #(str (:host %) "." (:service %))

…tells Riemann to take this:

{ :host riemann1.local, :service cpu-0.cpu-wait, :metric 3.399911, :tags collectd, :time 1405715017, :ttl 30 }

…and write it to InfluxDB with riemann1.local.cpu-0.cpu-wait as the automatically generated series name

InfluxDB behaves like Graphite with new metrics: it will automatically create a new series if it’s for a hostname.metric combo it doesn’t already have a series for
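The same naming scheme, plus the kind of query it makes possible, sketched in Python. Both function names are made up for this example, and double-quoting the series name is my understanding of how v0.8 handles names with dots and dashes, so check that against the query language docs.

```python
def series_name(event):
    """Mirror the host + "." + service naming scheme from the Riemann config."""
    return f"{event['host']}.{event['service']}"


def last_hour_query(event, column="value"):
    """Build a query that names exactly one series, so no other series is read."""
    return f'select {column} from "{series_name(event)}" where time > now() - 1h;'
```

For the example event, series_name gives riemann1.local.cpu-0.cpu-wait, and the query only has to look at that one series instead of filtering a giant shared one by host and service columns.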

Page 55: PDX DevOps Graphite replacement

Synchronous vs asynchronous InfluxDB writes

The included InfluxDB writer opens up a new socket to InfluxDB for EVERY single data point that gets written!

7 VMs with 8-9 collectd plugins enabled on each resulted in 2000 open sockets from Riemann to InfluxDB!

It only stopped increasing when InfluxDB cleared out the older sockets after they sat unused!

Page 56: PDX DevOps Graphite replacement

Synchronous vs asynchronous InfluxDB writes

GitHub issue for this: https://github.com/aphyr/riemann/issues/411

Comment with code sample for an asynchronous InfluxDB writer:

https://github.com/aphyr/riemann/issues/411#issue-36716498

Capacitor, the Clojure InfluxDB library Riemann uses, can send writes in batches
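The batching idea on its own, as a Python sketch. This is not Capacitor's API: BatchingWriter and the send callable are invented for illustration, and a real async writer would also flush on a timer from a background thread.

```python
class BatchingWriter:
    """Buffers points and hands them to `send` in batches instead of one by one."""

    def __init__(self, send, batch_size=100):
        self.send = send              # callable taking a list of (series, point) pairs
        self.batch_size = batch_size  # assumed default; tune for your load
        self.buffer = []

    def write(self, series, point):
        self.buffer.append((series, point))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered as one batch (one request, one socket)."""
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.send(batch)
```

With a batch size of 100, a burst of 2000 points turns into 20 requests instead of 2000 sockets.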

Page 57: PDX DevOps Graphite replacement

Asynchronous InfluxDB writes

Page 58: PDX DevOps Graphite replacement

Asynchronous InfluxDB writes

(fn [event]
  (let [series (format "%s.%s" (:host event) (:service event))]
    (write-influxdb-async series
      {:host  (:host event)
       :time  (:time event)
       :value (:metric event)
       :name  (:service event)})))

The let [series... statement is what creates series names as hostname.metricname with the async writer

Page 59: PDX DevOps Graphite replacement

Asynchronous InfluxDB writes

Much better performance!

3 sockets open instead of 2000+

Much less CPU usage by Riemann and InfluxDB:

Performance should improve when InfluxDB adds support for protobuf inputs

Page 60: PDX DevOps Graphite replacement

Grafana

Now, we just need a dashboard...

Page 61: PDX DevOps Graphite replacement

Grafana data sources

Based on Kibana 3 (it's just HTML, JS and CSS)

Deploy it by unpacking a tarball to a place where your webserver can serve the contents

Grafana can graph data from Graphite, InfluxDB and OpenTSDB

Edit config.js to add data sources:

Page 62: PDX DevOps Graphite replacement

Grafana and Elasticsearch

Grafana can store dashboards in Elasticsearch

Usage is tiny, only as many JSON docs as dashboards that you save

Elasticsearch is not used to store metrics!

Edit config.js to add an Elasticsearch server:

Page 63: PDX DevOps Graphite replacement

A new Graphite stack

Riemann

collectd collectd collectd collectd

statsd statsd

InfluxDB

Grafana

Page 64: PDX DevOps Graphite replacement

Show me some graphs!

(do a demo)

Page 65: PDX DevOps Graphite replacement

Demo time!

(do an install of Riemann)

(do an install of InfluxDB)

(do an install of Grafana)

Page 66: PDX DevOps Graphite replacement

Disadvantages of this new stack

Grafana doesn't take advantage of all of InfluxDB's features

InfluxDB doesn't have as many built-in functions as graphite-web

Riemann's documentation is sparse, and if you've never written Clojure, there's a learning curve for writing configs to do more than basic stuff

There are a bunch of dashboard tools that can talk to graphite-web's API

Not as many can talk to InfluxDB

Page 67: PDX DevOps Graphite replacement

InfluxDB and Graphite

InfluxDB actually has a Graphite listener built in

So why use Riemann in the middle? Alerting on metrics!

Page 68: PDX DevOps Graphite replacement

Riemann and alerting

Having events in the index means we can keep track of metrics over short periods of time

In the collectd metrics for load average across every machine, calculate an average over the last 5 mins and send an email alert if it's over a certain threshold

In the collectd metrics for load average on an individual machine, calculate a derivative over the last 5 mins and send a Nagios alert if the derivative is above a certain level
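Both checks boil down to keeping a short window of recent samples and computing something over it. A sketch of that logic in Python; in practice you'd write this with Riemann's stream functions, and the Window class, its method names and the thresholds here are all made up for illustration.

```python
from collections import deque


class Window:
    """Holds (time, value) samples from the last `seconds` seconds."""

    def __init__(self, seconds=300):
        self.seconds = seconds
        self.samples = deque()

    def add(self, time, value):
        self.samples.append((time, value))
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] < time - self.seconds:
            self.samples.popleft()

    def mean(self):
        """Average of the samples in the window (assumes at least one sample)."""
        return sum(v for _, v in self.samples) / len(self.samples)

    def rate(self):
        """Change in value per second between the oldest and newest sample."""
        (t0, v0), (t1, v1) = self.samples[0], self.samples[-1]
        return (v1 - v0) / (t1 - t0) if t1 > t0 else 0.0


def should_alert(window, threshold):
    """True when the windowed average is over the threshold."""
    return window.mean() > threshold
```

The email case checks mean() against a load threshold; the Nagios case does the same comparison on rate().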

Page 69: PDX DevOps Graphite replacement

Native Riemann outputs

Some tools, like collectd, can output data to Riemann in Riemann's native binary protobuf format

Graphite's format for metrics has become the most commonly used format

For things that only output Graphite format metrics, Riemann's Graphite server functionality is incredibly useful

Page 70: PDX DevOps Graphite replacement

Pie in the sky: Scaling Riemann

Because of the way it works internally, Riemann doesn't have support for clustering the way InfluxDB does

Specifically, there's no way to share the in-memory index across nodes

https://github.com/jdmaturen/reimann/blob/master/riemann.config.guide#L234

You can set up multiple Riemann servers, and they can forward events to a central one, or one of several behind HAproxy

Page 71: PDX DevOps Graphite replacement

Pie in the sky: Scaling Riemann

This sounds like how you would scale Graphite, but because we have InfluxDB, each Riemann instance can write to the same InfluxDB instance, or one of many nodes in a cluster

Because InfluxDB can take data in over a network connection, can cluster and is not plain-file-based like Whisper, multiple Riemanns writing to 1 or more InfluxDBs in a cluster shouldn't be an issue

Some of the monitoring uses of Riemann (calculating moving averages or derivatives) will break because the in-memory indexes can’t be shared, though

Page 72: PDX DevOps Graphite replacement

Pie in the sky: Scaling InfluxDB

InfluxDB has out-of-the-box support for clustering

Uses Raft for a consensus protocol

Databases and series can be broken up into shards

Sharding is done by blocks of time (time periods are configurable)

Metadata shared via Raft lets each node know which shards, covering which time periods and series/DBs, are on each node

Also has a WAL (write ahead log)

http://sssslide.com/speakerdeck.com/pauldix/the-internals-of-influxdb
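Sharding by blocks of time just means rounding each point's timestamp down to the start of its block. A simplified sketch in Python; the 7-day default is an assumption picked for the example, the function names are invented, and real InfluxDB sharding involves more than this (databases, series, replication).

```python
def shard_start(timestamp, shard_seconds=7 * 24 * 3600):
    """Round a Unix timestamp down to the start of its time block."""
    return timestamp - (timestamp % shard_seconds)


def group_by_shard(timestamps, shard_seconds=7 * 24 * 3600):
    """Group timestamps by the time block (shard) they fall into."""
    shards = {}
    for ts in timestamps:
        shards.setdefault(shard_start(ts, shard_seconds), []).append(ts)
    return shards
```

A query for the last hour only needs the shard (or two) covering that hour, and dropping old data is a matter of dropping whole shards.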

Page 73: PDX DevOps Graphite replacement

Tool integrations

Riemann and Logstash can output events to each other

Both can output events to Graphite

This can get really confusing, really quickly

One possible use: Logstash takes in web server logs, counts response codes, and outputs metrics in Graphite format to Riemann

Page 74: PDX DevOps Graphite replacement

Pies in the sky

What I want to experiment with and get working:

Email alerts from Riemann

Riemann telling Nagios/Icinga to send alerts based on thresholds for averages or derivatives of metric values

Send web logs to Logstash and make Logstash output metrics from them to Riemann (how many HTTP 404, 503, etc. responses is my web server sending out?)
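The response-code idea, sketched in plain Python rather than an actual Logstash config: count status codes in access log lines and turn the counts into Graphite-format metrics. It assumes common/combined log format (status code right after the quoted request), and the function names and metric prefix are made up for the example.

```python
from collections import Counter


def count_status_codes(log_lines):
    """Count HTTP status codes, assuming common/combined log format lines."""
    counts = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) >= 3:
            # The status code is the first field after the quoted request.
            fields = parts[2].split()
            if fields and fields[0].isdigit():
                counts[fields[0]] += 1
    return counts


def to_graphite(counts, prefix, timestamp):
    """Turn the counts into Graphite plaintext lines, one per status code."""
    return [f"{prefix}.{code} {n} {timestamp}\n" for code, n in sorted(counts.items())]
```

Each output line can go straight to anything that speaks Graphite's format, including Riemann's Graphite server.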

Page 75: PDX DevOps Graphite replacement

Links

My super basic Riemann Puppet module: https://github.com/nickchappell/puppet-riemann/

Page 76: PDX DevOps Graphite replacement

Links

http://riemann.io/

http://influxdb.com/

http://grafana.org/

Page 77: PDX DevOps Graphite replacement

Links

Monitorama PDX 2014 Grafana workshop: http://vimeo.com/95316672

Monitorama PDX 2014 InfluxDB talk: http://vimeo.com/95311877

Monitorama Boston 2013 Riemann talk: http://vimeo.com/67181466