Nagios Conference 2014 - David Josephsen - Alert on What You Draw

Post on 02-Jul-2015

description

David Josephsen's presentation on Alert on What You Draw. The presentation was given during the Nagios World Conference North America held Oct 13th - Oct 16th, 2014 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/conference

Transcript of Nagios Conference 2014 - David Josephsen - Alert on What You Draw

hi.

dave@librato.com @davejosephsen

github: djosephsen

Graphing Nagios

Alert on What you Draw

tim.mycorp.com bob.mycorp.com

Developer cat

Wants to change things

Change control

Says no.

You guys ok?

YUP YUP

Process Oriented Model

Given a finite number of reliable systems and full environmental control, run processes for as long as possible.

The Cloud

Virtual

Multi-Tenant

Massive Infrastructure

Compulsory Maintenance

[Diagram, built up over several slides: US-EAST region with availability zones AZ-1, AZ-2, AZ-3; the last frame is marked XXXX]

Services Oriented Model

Design reliable services atop an infinite number of unreliable and uncontrollable systems.

Here’s a log line wrapped in an HTTP GET request

HTTP 200 OK!

[Diagram labels: <100ms, <100ms]

tim.mycorp.com

It Scales

I can change how it works

It’s resilient

I can change how it works

Latency, Queues, Workers

Monitored from within…

…by Developers

Summarized at the service level
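
As a minimal sketch of "summarized at the service level", the snippet below pools latency samples reported by individual hosts into one service-wide summary, then alerts on the same number that would be drawn on a graph. The function names, the summary fields, and the 100 ms default threshold are illustrative assumptions, not something shown in the talk.

```python
from statistics import mean


def summarize_service(latencies_by_host):
    """Collapse per-host latency samples (ms) into one service-level summary."""
    # Pool every sample: the service, not any single host, is the unit of interest.
    pooled = [ms for samples in latencies_by_host.values() for ms in samples]
    if not pooled:
        return {"count": 0, "mean_ms": None, "max_ms": None}
    return {"count": len(pooled), "mean_ms": mean(pooled), "max_ms": max(pooled)}


def should_alert(summary, threshold_ms=100.0):
    """Alert on the same summarized value that is graphed (threshold is assumed)."""
    return summary["count"] > 0 and summary["mean_ms"] > threshold_ms


# Hypothetical samples from two hosts backing one service.
summary = summarize_service({"tim": [50, 70], "bob": [90, 110]})
print(summary, should_alert(summary))
```

Because the alert reads the pooled summary rather than a per-host check, losing or replacing one host changes the inputs but not the alerting logic.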

Can be polled externally
State data about hosts, on the order of minutes
Operations controls and configures hosts and services

Must be instrumented internally
Performance data about services, on the order of seconds
Any engineer can create new ad-hoc metrics

I can change how it works

internally instrumented

metrics measured every few seconds

write-accessible by every engineer
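
The three properties above can be sketched as a tiny in-process registry: code records its own measurements, any engineer creates a metric simply by naming it, and a flush (intended to run every few seconds) hands summaries to whatever draws graphs and evaluates alerts. The `Metrics` class, its method names, and the metric names are hypothetical, written only for illustration; they are not the API of Librato, Riemann, Heka, or Nagios.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """Hypothetical in-process metrics registry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the timer can be tested
        self._samples = defaultdict(list)

    def record(self, name, value):
        # A new ad-hoc metric comes into existence the first time it is named.
        self._samples[name].append(value)

    @contextmanager
    def timer(self, name):
        # Time the wrapped block and record the elapsed milliseconds.
        start = self._clock()
        try:
            yield
        finally:
            self.record(name, (self._clock() - start) * 1000.0)

    def flush(self):
        """Return one summary per metric and reset; meant to run every few seconds."""
        out = {
            name: {"count": len(v), "sum": sum(v), "max": max(v)}
            for name, v in self._samples.items()
        }
        self._samples.clear()
        return out


metrics = Metrics()
with metrics.timer("db.query_ms"):
    time.sleep(0.01)  # stand-in for real work
metrics.record("queue.depth", 3)
print(metrics.flush())
```

The flushed summaries are the single source for both the graph and the alert, which is the sense in which one alerts on what one draws.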

I can change how it works

Undermines Credibility

Silos Knowledge

Multiplies Burden

Riemann: http://riemann.io/

Heka: http://hekad.readthedocs.org/en/v0.7.2/

LIVE Demo Ahead!

Questions?