Monitoring In Motion - events.static.linuxfound.org · Monitoring In Motion Challenges in...

Monitoring In Motion Challenges in monitoring kubernetes, containers, and dynamic infrastructure. ContainerCon Toronto Aug 24, 2016 Ilan Rabinovitch Director, Technical Community Datadog


Monitoring In Motion
Challenges in monitoring Kubernetes, containers, and dynamic infrastructure.

ContainerCon Toronto
Aug 24, 2016

Ilan Rabinovitch Director, Technical Community Datadog


$ finger ilan@datadog

[datadoghq.com]
Name: Ilan Rabinovitch
Role: Director, Technical Community
Interests:
 * Monitoring and Metrics
 * Large scale web operations
 * FL/OSS Community Events


Datadog Overview

• SaaS-based infrastructure and app monitoring
• Open Source Agent
• Time series data (metrics and events)
• Processing nearly a trillion data points per day
• Intelligent Alerting
• We’re hiring! (www.datadoghq.com/careers/)


Monitor Everything

Operating Systems, Cloud Providers, Containers, Web Servers, Datastores, Caches, Queues and more...


$ cat ~/.plan

1. Intro: The Importance of Monitoring

2. The Challenge: Monitoring Dynamic Infrastructure

3. Finding the Signal: How do we know what to monitor?

4. Implementation: Applying it to Containerized Workloads


Our Focus Area

Culture

Automation

Metrics

Sharing

Damon Edwards and John Willis DevOps Day LA


Culture

“organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations”

- Melvin E. Conway


Follow @honest_update on Twitter


Collecting data is cheap; not having it when you need it can be expensive


Instrument all the things!


Sharing: Looping Back on Culture

Describe the problem as your “enemy”, not each other

Learn Together


Sharing

Using and sharing the same metrics and measurements across teams is key to avoiding misunderstandings.


Source: http://bit.ly/1SvvbuP


Source: http://bit.ly/1RQRsXW


Operational Complexity Increases with...

• Number of things to measure

• Velocity of change


https://www.datadoghq.com/docker-adoption/


How much do we measure?

1 instance
• 10 metrics from cloud providers

1 operating system (e.g., Linux)
• 100 metrics

~50 metrics per application


How much do we measure?

1 instance
• 10 metrics from cloud providers

1 operating system (e.g., Linux)
• 100 metrics

~50 metrics per application

N containers
• 150*N metrics


Operational Complexity

100 instances

500 containers


Operational Complexity: Scale

160 metrics per host

800 metrics per host

Assuming 5 containers per host


Operational Complexity: Scale

100 instances

80,000 metrics

Assuming 5 containers per host
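
The per-host and fleet-wide figures on these slides can be reproduced with a few lines of arithmetic. This is only a sketch that follows the slides' own numbers: the constant names are illustrative, and reading the 800 figure as 160 metrics multiplied by 5 containers is an assumption about how the slide arrived at it.

```python
# Figures taken from the slides; the names are illustrative.
CLOUD_PROVIDER_METRICS = 10
OS_METRICS = 100
APP_METRICS = 50
CONTAINERS_PER_HOST = 5
INSTANCES = 100

# One instance running one operating system and one application.
metrics_per_host = CLOUD_PROVIDER_METRICS + OS_METRICS + APP_METRICS

# Assumption: the whole per-host figure is scaled by the container count.
metrics_per_container_host = metrics_per_host * CONTAINERS_PER_HOST

# Scale up to the whole fleet.
fleet_metrics = metrics_per_container_host * INSTANCES

print(metrics_per_host)            # 160
print(metrics_per_container_host)  # 800
print(fleet_metrics)               # 80000
```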


How much do we measure?

1 instance
• 10 metrics from cloud providers

1 operating system (e.g., Linux)
• 100 metrics

~50 metrics per application

N containers
• 150*N metrics

Metrics Overload!


Operational Complexity Increases with...

• Number of things to measure

• Velocity of change


Source: Datadog


Source: http://bit.ly/1qFylWK


Operational Complexity Increases with...

• Number of things to measure

• Velocity of change


Open Questions

• Where is my container running?
• What is the capacity of my cluster?
• What port is my app running on?
• What’s the total throughput of my app?
• What’s its response time per tag? (app, version, region)
• What’s the distribution of 5xx errors per container?


Source: http://bit.ly/1YxJ7Jy


More Details at: http://www.datadoghq.com/blog/monitoring-101-alerting/


Monitoring 101


Finding Signal - Categorizing Your Metrics


Examples: NGINX - Metrics

Work Metrics:
• Requests Per Second
• Request Time
• Error Rates (4xx or 5xx)
• Success (2xx)

Resource Metrics:
• Disk I/O
• Memory
• CPU
• Queue Length
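
Work metrics such as requests per second and error rate are usually derived from cumulative counters rather than read directly. The sketch below shows one way to do that from two snapshots; the counter names (`requests`, `errors_5xx`) and the sample values are hypothetical and do not correspond to any particular NGINX status output.

```python
def work_metrics(previous, current, interval_seconds):
    """Derive work metrics from two snapshots of cumulative counters."""
    requests = current["requests"] - previous["requests"]
    errors = current["errors_5xx"] - previous["errors_5xx"]
    return {
        "requests_per_second": requests / interval_seconds,
        # Guard against division by zero when no requests arrived.
        "error_rate": errors / requests if requests else 0.0,
    }


# Hypothetical snapshots taken 60 seconds apart.
before = {"requests": 10_000, "errors_5xx": 40}
after = {"requests": 10_600, "errors_5xx": 46}

metrics = work_metrics(before, after, interval_seconds=60)
print(metrics)
```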


Examples: NGINX - Events

• Configuration Change
• Code Deployment
• Service Started / Stopped


Examples: Events


When to let a sleeping engineer lie?


When to alert?


Recurse until you find root cause


What to demand from our monitoring tooling?


Cryptic Alerts

WHAT?


EVERY ALERT MUST BE ACTIONABLE


Host Centric


Service Centric


Static vs Dynamic

Static configurations tracking dynamic infrastructure are not a recipe for success.


Query Based Monitoring

“What’s the average throughput of application:nginx per version?”

“Alert me when one of my pods from replication controller:foo is not behaving like the others.”

“Show me the rate of HTTP 500 responses from nginx”

“… across all data centers”

“… running my app version 2…”
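
A query such as the first one above boils down to filtering data points by tag and grouping the survivors by another tag. The following is a minimal sketch of that idea over in-memory data; the metric names, tag keys, and the `average_by` helper are all hypothetical and do not reflect any particular monitoring product's query language.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data points: each carries a value plus free-form tags.
POINTS = [
    {"metric": "nginx.requests_per_s", "value": 120.0,
     "tags": {"application": "nginx", "version": "1", "region": "us-east"}},
    {"metric": "nginx.requests_per_s", "value": 80.0,
     "tags": {"application": "nginx", "version": "1", "region": "eu-west"}},
    {"metric": "nginx.requests_per_s", "value": 200.0,
     "tags": {"application": "nginx", "version": "2", "region": "us-east"}},
    {"metric": "redis.ops_per_s", "value": 900.0,
     "tags": {"application": "redis", "version": "3", "region": "us-east"}},
]


def average_by(points, metric, filters, group_by):
    """Average `metric` over points matching `filters`, grouped by one tag."""
    groups = defaultdict(list)
    for point in points:
        if point["metric"] != metric:
            continue
        tags = point["tags"]
        # Skip points whose tags do not match every requested filter.
        if any(tags.get(key) != wanted for key, wanted in filters.items()):
            continue
        groups[tags.get(group_by)].append(point["value"])
    return {group: mean(values) for group, values in groups.items()}


# "What's the average throughput of application:nginx per version?"
result = average_by(POINTS, "nginx.requests_per_s",
                    {"application": "nginx"}, "version")
print(result)
```

Because the query names tags rather than hosts, containers that start or stop only change which points match; the query itself stays the same.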


Getting at the metrics…


Resource Metrics

Utilization:
• CPU (user + system)
• memory
• i/o
• network traffic

Saturation:
• throttling
• swap

Error:
• Network Errors (receive vs transmit)


Container Events

• Starting / Stopping Containers
• Scaling Events for Underlying Instances
• Deploying a new container build


How do we get at the upper layers?


Getting at the Metrics

Access method    CPU metrics   Memory metrics   I/O metrics   Network metrics
pseudo-files     Yes           Yes              Some          Yes, in 1.6.1+
stats command    Basic         Basic            No            Basic
API              Yes           Yes              Some          Yes


Pseudo-files

• Provide visibility into container metrics via the file system.
• Generally under:
  /cgroup/<resource>/docker/$CONTAINER_ID/
  or
  /sys/fs/cgroup/<resource>/docker/$CONTAINER_ID/


Pseudo-files: CPU Metrics

$ cat /sys/fs/cgroup/cpuacct/docker/$CONTAINER_ID/cpuacct.stat
> user 2451 # time spent running processes since boot
> system 966 # time spent executing system calls since boot

Pseudo-files: CPU Throttling

$ cat /sys/fs/cgroup/cpu/docker/$CONTAINER_ID/cpu.stat
> nr_periods 565 # Number of enforcement intervals that have elapsed
> nr_throttled 559 # Number of times the group has been throttled
> throttled_time 12119585961 # Total time that members of the group were throttled (12.12 seconds)
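
Since these pseudo-files are plain "key value" text, a monitoring script can parse them directly. The sketch below parses the `cpu.stat` sample from the slide and derives the throttled time in seconds and the share of throttled periods. The helper names are hypothetical, the path layout is the one shown on the slide and may differ on your system, and the nanosecond unit is inferred from the slide's own conversion to 12.12 seconds.

```python
from pathlib import Path


def parse_stat(text):
    """Parse 'key value' lines, as found in cpuacct.stat and cpu.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def read_cpu_stat(container_id, root="/sys/fs/cgroup"):
    """Read cpu.stat for one container, using the path layout from the slide."""
    path = Path(root) / "cpu" / "docker" / container_id / "cpu.stat"
    return parse_stat(path.read_text())


# Sample values copied from the slide, so the sketch runs without Docker.
SAMPLE = """nr_periods 565
nr_throttled 559
throttled_time 12119585961
"""

stats = parse_stat(SAMPLE)
throttled_seconds = stats["throttled_time"] / 1e9  # nanoseconds to seconds
throttled_ratio = stats["nr_throttled"] / stats["nr_periods"]

print(round(throttled_seconds, 2))
print(round(throttled_ratio, 3))
```

On a Docker host, `read_cpu_stat(container_id)` would return the same kind of dictionary for a live container, provided the cgroup hierarchy is mounted where the slide shows it.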


Docker API

• Detailed streaming metrics as JSON over an HTTP socket

$ curl -v --unix-socket /var/run/docker.sock http://localhost/containers/28d7a95f468e/stats
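
Each JSON document streamed by that endpoint can be reduced to a couple of headline numbers. The sketch below is one way to do so, under stated assumptions: the sample payload is illustrative and heavily trimmed, the field names follow the Docker Engine API stats response as commonly documented (verify them against your Docker version), and `summarize` is a hypothetical helper.

```python
import json

# Illustrative, heavily trimmed payload; real responses carry many more fields.
SAMPLE = json.loads("""
{
  "cpu_stats":    {"cpu_usage": {"total_usage": 2000000000},
                   "system_cpu_usage": 100000000000, "online_cpus": 2},
  "precpu_stats": {"cpu_usage": {"total_usage": 1000000000},
                   "system_cpu_usage": 90000000000},
  "memory_stats": {"usage": 75000000, "limit": 500000000}
}
""")


def summarize(stats):
    """Reduce one stats document to CPU and memory percentages."""
    cpu, prev = stats["cpu_stats"], stats["precpu_stats"]
    # CPU time used by the container between the previous and current sample.
    cpu_delta = cpu["cpu_usage"]["total_usage"] - prev["cpu_usage"]["total_usage"]
    # CPU time used by the whole host over the same window.
    system_delta = cpu["system_cpu_usage"] - prev["system_cpu_usage"]
    # Assumption: fall back to one CPU if the field is absent.
    cpus = cpu.get("online_cpus", 1)
    cpu_pct = cpu_delta / system_delta * cpus * 100 if system_delta > 0 else 0.0
    mem = stats["memory_stats"]
    mem_pct = mem["usage"] / mem["limit"] * 100
    return {"cpu_percent": cpu_pct, "memory_percent": mem_pct}


summary = summarize(SAMPLE)
print(summary)
```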


STATS Command

# Usage: docker stats CONTAINER [CONTAINER...]
$ docker stats $CONTAINER_ID
CONTAINER      CPU %   MEM USAGE/LIMIT     MEM %    NET I/O             BLOCK I/O
ecb37227ac84   0.12%   71.53 MiB/490 MiB   14.60%   900.2 MB/275.5 MB   266.8 MB/872.7 MB


Side Car Containers


Aren’t we still missing a layer?


Open Questions

• What is the capacity of my cluster?
• What’s the total throughput of my app?
• What’s its response time per tag? (app, version, region)
• What’s the distribution of 5xx errors per container?
• Where is my container running? On what port?


Service Discovery

[Diagram labels: Docker API, Orchestrator, Monitoring Agent Container, Containers List & Metadata, Additional Metadata (Tags, etc), Config Backend, Integration Configurations, Host Level Metrics]


Custom Metrics

• Instrument custom applications

• You know your key transactions best.

• Use async protocols like Etsy’s StatsD or DogStatsD
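
As a rough illustration of why these protocols suit custom instrumentation, the sketch below builds StatsD-style datagrams and sends them over UDP without waiting for a reply. The metric names, tags, and helper functions are hypothetical; the `name:value|type` layout is the plain StatsD line format, the `|#tag,tag` suffix is the DogStatsD tagging extension, and port 8125 is assumed to be where a local agent listens.

```python
import socket


def format_metric(name, value, metric_type, tags=None):
    """Build a StatsD datagram; the '|#' tag suffix is the DogStatsD extension."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one datagram over UDP; no reply is awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


# Hypothetical metrics for a checkout transaction.
counter = format_metric("checkout.completed", 1, "c")
timer = format_metric("checkout.duration", 320, "ms",
                      tags=["version:2", "region:us-east"])
print(counter)
print(timer)
```

Passing either payload to `send_metric` would hand it to the agent; because the datagram is fire-and-forget, a slow or absent agent does not hold up the key transaction being measured.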


Source: http://bit.ly/1NoW6aj


Resources

Monitoring 101: Alerting https://www.datadoghq.com/blog/monitoring-101-alerting/

Monitoring 101: Collecting the Right Data https://www.datadoghq.com/blog/monitoring-101-collecting-data/

Monitoring 101: Investigating performance issues https://www.datadoghq.com/blog/monitoring-101-investigation/

The Power of Tagged Metrics https://www.datadoghq.com/blog/the-docker-monitoring-problem/

How to Collect Docker Metrics https://www.datadoghq.com/blog/how-to-collect-docker-metrics/

8 surprising facts about Docker Adoption https://www.datadoghq.com/docker-adoption/