Mo' Metrics, Mo' Problems

Erin Willingham, Infrastructure Engineer at Krux Digital. Twitter: GreenSilex. LinkedIn: https://www.linkedin.com/in/erin-willingham-104082126

Transcript of Mo' Metrics, Mo' Problems

Page 1: Mo' Metrics, Mo' Problems

MO’ METRICS, MO’ PROBLEMS

Erin Willingham, Infrastructure Engineer at Krux Digital

Twitter: GreenSilex

https://www.linkedin.com/in/erin-willingham-104082126

Page 2

Krux

http://www.krux.com

Page 3

GRAPHITE: THEN & NOW

What works, what doesn't, and why we did what we did


Page 4

GRAPHS

http://i.stack.imgur.com/WBsLg.png

Page 5

<metric path> <metric value> <metric timestamp>

test.bash.stats.count_ps 50 1473048113

test/bash/stats/count_ps.wsp
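The plaintext line above maps directly to a Whisper file on disk. Each dot in the metric path becomes a directory level, and the last node gets a `.wsp` extension. Here is a minimal sketch in Python. It assumes a carbon listener on the conventional plaintext port 2003. The helper names and the `localhost` default are illustrative and were not part of the talk.

```python
import os
import socket
import time
from typing import Optional


def format_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one plaintext-protocol line: '<metric path> <metric value> <metric timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())  # carbon expects Unix epoch seconds
    return f"{path} {value} {timestamp}\n"


def whisper_file(path: str) -> str:
    """Relative Whisper file for a metric path: dots become directories, plus '.wsp'."""
    return path.replace(".", os.sep) + ".wsp"


def send_metric(line: str, host: str = "localhost", port: int = 2003) -> None:
    """Send one line to a carbon plaintext listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, `format_line("test.bash.stats.count_ps", 50, 1473048113)` produces the example line from the slide. `whisper_file` on the same path gives `test/bash/stats/count_ps.wsp` on Linux.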

Page 6

statsd & collectd → relay → aggregator → graphite whisper

Page 7

GRAPHITE 1.0 ARCHITECTURE

Page 8

RULES, MERGING, EFFICIENCY & OPERATIONS

Graphite v1

Page 9

Relays are aware of aggregation rules
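Because the relay knows the aggregation rules, it can send only the matching metrics to the aggregators and everything else straight to storage. This minimal sketch assumes rules written in carbon's `aggregation-rules.conf` shape, `output_template (frequency) = method input_pattern`. It handles only `<field>` and `*` in patterns, while carbon supports more syntax than that. The example rule and the "aggregator"/"backend" destination labels are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    output: str        # output template, e.g. "<env>.applications.<app>.all.requests"
    frequency: int     # aggregation interval in seconds
    method: str        # e.g. "sum" or "avg"
    regex: re.Pattern  # compiled input pattern


RULE_LINE = re.compile(r"^(\S+)\s+\((\d+)\)\s*=\s*(\w+)\s+(\S+)$")


def compile_pattern(pattern: str) -> re.Pattern:
    """<field> captures one path node; * matches one node; everything else is literal."""
    parts = []
    for token in re.split(r"(<\w+>|\*)", pattern):
        if token == "*":
            parts.append(r"[^.]*")
        elif re.fullmatch(r"<\w+>", token):
            parts.append(f"(?P<{token[1:-1]}>[^.]+)")
        else:
            parts.append(re.escape(token))
    return re.compile("^" + "".join(parts) + "$")


def parse_rule(line: str) -> Rule:
    m = RULE_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unparseable rule: {line!r}")
    output, freq, method, pattern = m.groups()
    return Rule(output, int(freq), method, compile_pattern(pattern))


def route(path: str, rules: list[Rule]) -> tuple[str, Optional[str]]:
    """('aggregator', output_path) for the first matching rule, else ('backend', None)."""
    for rule in rules:
        m = rule.regex.match(path)
        if m:
            # Fill the output template with the fields captured from the input path.
            out = re.sub(r"<(\w+)>", lambda g: m.group(g.group(1)), rule.output)
            return "aggregator", out
    return "backend", None
```

With the rule `<env>.applications.<app>.all.requests (60) = sum <env>.applications.<app>.*.requests`, the path `prod.applications.apache.www01.requests` is routed to the aggregator as `prod.applications.apache.all.requests`. A host metric such as `prod.hosts.web01.cpu` goes directly to the backend.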

Page 10

Graphite Whisper merges metrics!

Page 11

Graphite Aggregators are really efficient.
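One reason an aggregator can be cheap is that it keeps only a small buffer per output metric and emits one datapoint per interval. This minimal sketch of that idea assumes fixed intervals aligned to the epoch and supports only `sum` and `avg`. It illustrates the technique and is not carbon-aggregator's actual code.

```python
from collections import defaultdict


class IntervalAggregator:
    """Buffer values per (output metric, interval start) and flush one aggregate for each."""

    METHODS = {
        "sum": sum,
        "avg": lambda values: sum(values) / len(values),
    }

    def __init__(self, frequency: int, method: str = "sum"):
        self.frequency = frequency
        self.reduce = self.METHODS[method]
        self.buffers = defaultdict(list)  # (output_path, interval_start) -> [values]

    def add(self, output_path: str, value: float, timestamp: int) -> None:
        # Snap the timestamp down to the start of its interval.
        interval_start = timestamp - (timestamp % self.frequency)
        self.buffers[(output_path, interval_start)].append(value)

    def flush(self, now: int) -> list[tuple[str, float, int]]:
        """Emit (path, aggregate, interval_start) for every interval that has fully elapsed."""
        ready = [key for key in self.buffers if key[1] + self.frequency <= now]
        return [(path, self.reduce(self.buffers.pop((path, start))), start)
                for path, start in sorted(ready)]
```

Take a 60-second `sum` as an example. Two values (5 and 7) arriving within the same minute collapse into a single datapoint of 12. That datapoint is emitted once the minute is over.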

Page 12

THREADING, SCALING, RELAY CPU, & STORAGE

Graphite v1

Page 13

Python - single threaded

Page 14

Relay is CPU intensive

Page 15

Graphite Whisper - requires sharding and is very I/O intensive

http://obfuscurity.com/
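Sharding Whisper is commonly done in the relay using consistent hashing. Each metric path hashes to a position on a ring of storage nodes, so adding or removing a node only moves the metrics that belonged to that node. The sketch below is simplified and assumes MD5 with 100 virtual points per node. Carbon's own hash ring differs in its details, and the node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping metric paths to storage nodes."""

    def __init__(self, nodes, replicas: int = 100):
        self.replicas = replicas
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def add_node(self, node: str) -> None:
        # Several virtual points per node spread its share of the ring more evenly.
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._position(f"{node}:{i}"), node))

    def remove_node(self, node: str) -> None:
        self.ring = [entry for entry in self.ring if entry[1] != node]

    def get_node(self, metric_path: str) -> str:
        if not self.ring:
            raise LookupError("ring is empty")
        pos = self._position(metric_path)
        # First ring point at or after the key's position, wrapping around to the start.
        index = bisect.bisect_left(self.ring, (pos, "")) % len(self.ring)
        return self.ring[index][1]
```

A given metric path always lands on the same node. When a node is removed, every metric that lived on one of the other nodes stays where it was.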

Page 16

Slow UI when using distributed remote backends

Page 17

What are we trying to solve? What is forcing the change?

Storage!

Page 18

Relay & Aggregator CPU usage high

Page 19

Faster UI

Page 20

KEEP COSTS LOW

Page 21

GRAPHITE ALTERNATIVES

Circonus: All the insights you ever wanted

Hosted Graphite

Zabbix: OSS self-hosted monitoring

Page 22

CARBON-C-RELAY, KAFKA, SOCAT, CARBON-RELAY-NG, KAFKACAT

The Tools

Page 23

Carbon-c-relay

https://github.com/grobian/carbon-c-relay

GRAPHITE 2.0 TOOLS

Page 24

Carbon-relay-ng

https://github.com/graphite-ng/carbon-relay-ng

Page 25

Kafka Producer: tcp-stream-kafka-producer

https://github.com/krux/tcp-stream-kafka-producer

Page 26

kafkacat

https://github.com/edenhill/kafkacat

Page 27

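socat: consume a Kafka topic with kafkacat, starting from the newest offset, and write every line into the local relay

socat "exec:/usr/bin/kafkacat -C -o end -b <kafka broker> -t <kafka topic>",pty,ctty,echo=0 tcp4-connect:localhost:<relay port>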
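In that pipeline, socat's only job is to copy kafkacat's output line by line into the relay's TCP port. Below is a minimal Python equivalent. It assumes the input lines come from `kafkacat -C -o end -b <kafka broker> -t <kafka topic>` and that the relay accepts the plaintext protocol. `forward_lines` is a hypothetical helper, and socat is still the simpler tool for this job.

```python
import socket
from typing import Iterable


def forward_lines(lines: Iterable[str], host: str, port: int) -> int:
    """Send each non-empty metric line to a relay over one TCP connection; return lines sent."""
    sent = 0
    with socket.create_connection((host, port)) as sock:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines rather than sending empty metrics
            sock.sendall((line + "\n").encode("utf-8"))
            sent += 1
    return sent
```

To use it, pipe kafkacat's stdout into a script that calls `forward_lines(sys.stdin, "localhost", <relay port>)`.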

Page 28

BACKEND - STORAGE

Page 29

• Whisper

• Ceres

• InfluxDB

• Cyanite

• Riak

• KairosDB

• OpenTSDB

Page 30

Graphite - Whisper

Page 31

InfluxDB

Page 32

KairosDB

Page 33

GRAPHITE 2.0 ARCHITECTURE

Page 34

GRAPHITE ARCHITECTURE - SCALABLE

Why?

Page 35

LOAD TESTING THE PARTS AND THE PIPELINE

https://github.com/feangulo/graphite-stresser

All the Metrics!

Metrics / min
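Load testing comes down to generating a known number of metric paths at a fixed interval and then watching metrics/min at each stage. Here is a minimal sketch of that arithmetic plus a batch generator. The `loadtest.host<N>.metric_<M>` naming scheme is hypothetical. graphite-stresser has its own options, and they are not reproduced here.

```python
import random
import time
from typing import Optional


def metrics_per_minute(hosts: int, metrics_per_host: int, interval_s: int) -> float:
    """Datapoints per minute when every host reports every metric once per interval."""
    return hosts * metrics_per_host * (60 / interval_s)


def generate_batch(hosts: int, metrics_per_host: int, timestamp: Optional[int] = None) -> list[str]:
    """One interval's worth of plaintext-protocol lines with random values."""
    ts = int(time.time()) if timestamp is None else timestamp
    return [
        f"loadtest.host{h}.metric_{m} {random.random():.4f} {ts}"
        for h in range(hosts)
        for m in range(metrics_per_host)
    ]
```

For example, 10 hosts × 100 metrics reported every 10 seconds gives 10 × 100 × 6 = 6,000 metrics/min.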

Page 36

WHAT WORKED?

Pre-aggregated

Post-aggregated

Page 37

Page 38

MIRROR PRODUCTION DATA

Page 39

UH OH! THE GRAPHS DON’T MATCH


Old Cluster

New Cluster
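When the old and new clusters disagree, diffing the raw datapoints is more reliable than comparing graphs by eye. This minimal sketch assumes both clusters are queried through graphite-web's render API with `format=json`. That format returns a list of `{"target": ..., "datapoints": [[value, timestamp], ...]}` objects. The 1% tolerance is a hypothetical default.

```python
import json
import math
from typing import Optional


def datapoints(render_json: str) -> dict[int, Optional[float]]:
    """Map timestamp -> value for the first series in a render API JSON response."""
    series = json.loads(render_json)[0]
    return {ts: value for value, ts in series["datapoints"]}


def mismatches(old_json: str, new_json: str,
               rel_tol: float = 0.01) -> list[tuple[int, Optional[float], Optional[float]]]:
    """Timestamps where the clusters disagree: missing, null on one side, or beyond tolerance."""
    old, new = datapoints(old_json), datapoints(new_json)
    diffs = []
    for ts in sorted(set(old) | set(new)):
        a, b = old.get(ts), new.get(ts)
        if a is None and b is None:
            continue  # both clusters have a gap here; not a disagreement
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            diffs.append((ts, a, b))
    return diffs
```

The mismatch list shows at a glance whether the new cluster is dropping points, producing nulls, or aggregating to different values.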

Page 40

HOW DO WE FIX THIS?

Page 41

TESTING CARBON-RELAY-NG

Page 42

Carbon-relay-ng uses more than 2 CPUs!

Page 43

FAILURE POINT FOR CARBON-RELAY-NG

Post-aggregated

Pre-aggregated

Page 44

Carbon-relay-ng: room for improvement

Page 45

• Scale out aggregators horizontally

• Monitor metrics per second and scale out as needed

• Pass metrics that don’t need to be aggregated directly to the backend

https://github.com/edenhill/kafkacat

SOLUTION
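The first two points of the solution reduce to a capacity calculation. Measure metrics per second, divide by what one aggregator can sustain, and keep some headroom. Here is a minimal sketch. The per-instance capacity and the 30% headroom are hypothetical values you would take from your own load tests; they are not figures from this talk.

```python
import math


def aggregators_needed(metrics_per_second: float, capacity_per_instance: float,
                       headroom: float = 0.3) -> int:
    """Instances needed so each runs below (1 - headroom) of its measured capacity."""
    if capacity_per_instance <= 0 or not 0 <= headroom < 1:
        raise ValueError("capacity must be positive and headroom in [0, 1)")
    usable = capacity_per_instance * (1 - headroom)
    return max(1, math.ceil(metrics_per_second / usable))
```

Re-running this against the live metrics-per-second figure shows when it is time to add another aggregator.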

Page 46

Page 47

QUESTIONS?