Page 1: Car Alarms & Smoke Alarms [Monitorama]

Car Alarms & Smoke Alarms & Monitoring

Page 2: Car Alarms & Smoke Alarms [Monitorama]

Who’s this punk?

• Dan Slimmon

• @danslimmon on the Twitters

• Senior Platform Engineer at Exosite

• Previously Operations Team Manager at Blue State Digital

Page 3: Car Alarms & Smoke Alarms [Monitorama]
Page 4: Car Alarms & Smoke Alarms [Monitorama]

Learn to do some stats and visualization.

You’ll be right much more often, & people will THINK you’re right even more often than that!

Page 5: Car Alarms & Smoke Alarms [Monitorama]

Signal-To-Noise Ratio

Page 6: Car Alarms & Smoke Alarms [Monitorama]

A word problem

You’ve invented an automated test for plagiarism.

Page 7: Car Alarms & Smoke Alarms [Monitorama]
Page 8: Car Alarms & Smoke Alarms [Monitorama]

A word problem

• Plagiarism: 90% chance of positive

• No plagiarism: 20% chance of positive

• Jerkwad kids plagiarize 30% of the time

Page 9: Car Alarms & Smoke Alarms [Monitorama]

Question 1

Given a random paper, what’s the probability that you’ll get a negative result?

• Plagiarism: 90% chance of positive

• No Plagiarism: 20% chance of positive

• 30% chance of plagiarism

Page 10: Car Alarms & Smoke Alarms [Monitorama]

Question 2

If there’s plagiarism, what’s the probability PLAJR will detect it?

• Plagiarism: 90% chance of positive

• No plagiarism: 20% chance of positive

• 30% chance of plagiarism

Page 11: Car Alarms & Smoke Alarms [Monitorama]

Question 2

If there’s plagiarism, what’s the probability you’ll detect it?

• Plagiarism: 90% chance of positive

• No plagiarism: 20% chance of positive

• 30% chance of plagiarism

Page 12: Car Alarms & Smoke Alarms [Monitorama]

Question 3

If you get a positive result, what’s the probability that the paper is plagiarized?

• Plagiarism: 90% chance of positive

• No plagiarism: 20% chance of positive

• 30% chance of plagiarism

Page 13: Car Alarms & Smoke Alarms [Monitorama]

[Figure: No Plagiarism | Plagiarism]

Page 14: Car Alarms & Smoke Alarms [Monitorama]

[Figure: No Plagiarism (Negative, Positive)]

Page 15: Car Alarms & Smoke Alarms [Monitorama]

[Figure: No Plagiarism (Negative, Positive) | Plagiarism (Negative, Positive)]

Page 16: Car Alarms & Smoke Alarms [Monitorama]

Question 1

Given a random paper, what’s the probability that you’ll get a negative result?

Page 17: Car Alarms & Smoke Alarms [Monitorama]

[Figure: No Plagiarism (Negative, Positive) | Plagiarism (Negative, Positive)]

Page 18: Car Alarms & Smoke Alarms [Monitorama]

Question 2

If the paper is plagiarized, what’s the probability that you’ll get a positive result?

Page 19: Car Alarms & Smoke Alarms [Monitorama]

[Figure: No Plagiarism (Negative, Positive) | Plagiarism (Negative, Positive)]

Page 20: Car Alarms & Smoke Alarms [Monitorama]

Question 3

If you get a positive result, what’s the probability that the paper was plagiarized?

Page 21: Car Alarms & Smoke Alarms [Monitorama]

[Figure: No Plagiarism (Negative, Positive) | Plagiarism (Negative, Positive)]

Page 22: Car Alarms & Smoke Alarms [Monitorama]

Question 3

If you get a positive result, what’s the probability that the paper was plagiarized?

(Dark Green) / [(Dark Blue) + (Dark Green)]

Page 23: Car Alarms & Smoke Alarms [Monitorama]

Question 3

If you get a positive result, what’s the probability that the paper was plagiarized?

27 / (14 + 27)

Page 24: Car Alarms & Smoke Alarms [Monitorama]

Question 3

If you get a positive result, what’s the probability that the paper was plagiarized?

65.9%
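
If you'd rather check all three questions with code than with colored boxes, here is a minimal sketch. It is not from the talk: the variable names are made up for illustration, and the only inputs are the three percentages from the word problem.

```python
# Inputs taken from the word problem.
p_plagiarism = 0.30         # 30% of papers are plagiarized
p_pos_given_plag = 0.90     # plagiarized paper -> positive result
p_pos_given_clean = 0.20    # non-plagiarized paper -> positive result

# Joint probabilities of the two kinds of positive result.
p_true_positive = p_plagiarism * p_pos_given_plag            # about 0.27
p_false_positive = (1 - p_plagiarism) * p_pos_given_clean    # about 0.14

# Question 1: probability that a random paper tests negative.
p_negative = 1 - (p_true_positive + p_false_positive)

# Question 2: probability of a positive result, given plagiarism.
p_detect = p_pos_given_plag

# Question 3: probability of plagiarism, given a positive result.
p_plag_given_pos = p_true_positive / (p_true_positive + p_false_positive)

print(f"Q1: {p_negative:.4f}")
print(f"Q2: {p_detect:.4f}")
print(f"Q3: {p_plag_given_pos:.4f}")
```

Question 3 comes out near 0.6585, which is the same 27 / (14 + 27) ratio read off the diagram.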

Page 25: Car Alarms & Smoke Alarms [Monitorama]

Sensitivity & Specificity

Sensitivity:

% of actual positives that are identified as such

Specificity:

% of actual negatives that are identified as such

Page 26: Car Alarms & Smoke Alarms [Monitorama]

Sensitivity & Specificity

Sensitivity:

High sensitivity

Test is very sensitive to problems

Specificity:

High specificity

Test works for a specific type of problem

Page 27: Car Alarms & Smoke Alarms [Monitorama]

Sensitivity & Specificity

Sensitivity:

Probability that, if a paper is plagiarized, you’ll get a positive: 90%

Specificity:

Probability that, if a paper isn’t plagiarized, you’ll get a negative: 80%
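
As a rough illustration, this is how the two numbers fall out of raw counts. The counts assume a hypothetical batch of 100 papers that follows the word problem's percentages exactly, and the function names are invented for this sketch.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual positives that the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of actual negatives that the test reports as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical batch of 100 papers: 30 plagiarized, 70 not.
true_pos, false_neg = 27, 3     # 90% of the 30 plagiarized papers test positive
false_pos, true_neg = 14, 56    # 20% of the 70 clean papers test positive

print(sensitivity(true_pos, false_neg))   # 0.9
print(specificity(true_neg, false_pos))   # 0.8
```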

Page 28: Car Alarms & Smoke Alarms [Monitorama]

Specificity

Sensitivity

Prevalence

Page 29: Car Alarms & Smoke Alarms [Monitorama]

http://i.imgur.com/LkxcxLt.png

Page 30: Car Alarms & Smoke Alarms [Monitorama]

Positive Predictive Value

The probability that

If you get a positive result,

Then it’s a true positive.

Page 31: Car Alarms & Smoke Alarms [Monitorama]

When you get paged at 3 AM, Positive Predictive

Value is the probability that something is actually

wrong.

Page 32: Car Alarms & Smoke Alarms [Monitorama]

Imagine if you will...

• Service has 99.9% uptime

• Probe has 99% sensitivity

• Probe has 99% specificity

Page 33: Car Alarms & Smoke Alarms [Monitorama]

Pretty decent, right?

Page 34: Car Alarms & Smoke Alarms [Monitorama]

Let’s calculate the PPV.

Page 35: Car Alarms & Smoke Alarms [Monitorama]

                   Condition Present    Condition Absent
Positive Result    True Positive        False Positive
Negative Result    False Negative       True Negative

Page 36: Car Alarms & Smoke Alarms [Monitorama]

The true-positive probability

Let’s calculate the probability that any given probe run will produce a true positive.

P(TP) = (prob. of service failure) * (sensitivity)

P(TP) = 0.1% * 99%

P(TP) = 0.099%

Page 37: Car Alarms & Smoke Alarms [Monitorama]

The true-positive probability

P(TP) = 0.099%

So roughly 1 in every 1000 checks will be a true positive.

Page 38: Car Alarms & Smoke Alarms [Monitorama]

The false-positive probability

P(FP) = (prob. working) * (100% - specificity)

P(FP) = 99.9% * 1%

P(FP) = 0.999%

So roughly 1 in every 100 checks will be a false positive.

Page 39: Car Alarms & Smoke Alarms [Monitorama]
Page 40: Car Alarms & Smoke Alarms [Monitorama]

Positive predictive value

PPV = P(TP) / [P(TP) + P(FP)]

PPV = 0.099% / (0.099% + 0.999%)

PPV = 9.0%

If you get a positive, there’s only a 1 in 10 chance that something’s actually wrong.
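
The same arithmetic fits in a few lines of code. This is only a sketch: the function name `ppv` and its parameter names are assumptions for illustration, and prevalence here means the probability that the service is down on any given check.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability that a positive result is a true positive."""
    p_true_pos = prevalence * sensitivity
    p_false_pos = (1 - prevalence) * (1 - specificity)
    return p_true_pos / (p_true_pos + p_false_pos)


# Service with 99.9% uptime, so the failure prevalence is 0.1%.
value = ppv(prevalence=0.001, sensitivity=0.99, specificity=0.99)
print(f"PPV = {value:.3f}")
```

With these inputs the result lands at roughly 9%, so about ten pages out of eleven are false alarms.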

Page 41: Car Alarms & Smoke Alarms [Monitorama]

Why is this terrible?

Page 42: Car Alarms & Smoke Alarms [Monitorama]

Car Alarms

http://inserbia.info/news/wp-content/uploads/2013/06/carthief.jpg

Page 43: Car Alarms & Smoke Alarms [Monitorama]

Smoke Alarms

http://www.props.eric-hart.com/wp-content/uploads/2011/03/nysf_firedrill_2011.jpg

Page 44: Car Alarms & Smoke Alarms [Monitorama]

You want smoke alarms, not car alarms.

Page 45: Car Alarms & Smoke Alarms [Monitorama]

Practical Advice

Page 46: Car Alarms & Smoke Alarms [Monitorama]

(Semi-)Practical Advice

Page 47: Car Alarms & Smoke Alarms [Monitorama]

Why do we have such noisy checks?

Page 48: Car Alarms & Smoke Alarms [Monitorama]

“Office Space”, 1999.

Page 49: Car Alarms & Smoke Alarms [Monitorama]

Monty Python’s Flying Circus, 1975.

Page 50: Car Alarms & Smoke Alarms [Monitorama]

Semi-Practical Advice

Undetected outages are embarrassing, so we tend to focus on sensitivity.

That’s good.

But be careful with thresholds.

Page 51: Car Alarms & Smoke Alarms [Monitorama]

Semi-Practical Advice

[Plot axes: Response Time Threshold, Positive Predictive Value]
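
To get a feel for how a threshold trades sensitivity against PPV, you can sweep it over labelled history. The sketch below uses entirely made-up samples (hypothetical response times and outage labels, not data from the talk), and the function name is an assumption.

```python
# Hypothetical labelled history: (median response time in ms, was it a real outage?)
samples = [
    (120, False), (150, False), (180, False), (220, False), (260, False),
    (310, False), (340, True), (420, False), (480, True), (650, True),
]


def ppv_and_sensitivity(threshold_ms: float) -> tuple[float, float]:
    """Treat a response time above the threshold as a positive result."""
    true_pos = sum(1 for ms, outage in samples if ms > threshold_ms and outage)
    false_pos = sum(1 for ms, outage in samples if ms > threshold_ms and not outage)
    outages = sum(1 for _, outage in samples if outage)
    positives = true_pos + false_pos
    ppv = true_pos / positives if positives else float("nan")
    return ppv, true_pos / outages


for threshold_ms in (200, 300, 400, 500):
    ppv, sens = ppv_and_sensitivity(threshold_ms)
    print(f"threshold {threshold_ms} ms: PPV {ppv:.2f}, sensitivity {sens:.2f}")
```

On this toy data, raising the threshold pushes PPV up while sensitivity falls, which is the squeeze you are stuck with when a single threshold is your only knob.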

Page 52: Car Alarms & Smoke Alarms [Monitorama]

Semi-Practical Advice

Get more degrees of freedom.

Page 53: Car Alarms & Smoke Alarms [Monitorama]

Semi-Practical Advice

[Plot axes: Response Time Threshold, Positive Predictive Value]

Page 54: Car Alarms & Smoke Alarms [Monitorama]

Semi-Practical Advice

Hysteresis is a great way to add degrees of freedom.

• State machines

• Time-series analysis
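
One way the state-machine flavor of hysteresis could look is sketched below. The class name and the two streak lengths are hypothetical choices for illustration, not something the talk prescribes.

```python
class HysteresisAlert:
    """Two-state machine that flips only after enough consecutive opposing results."""

    def __init__(self, fail_after: int = 3, recover_after: int = 2) -> None:
        self.fail_after = fail_after          # consecutive failures needed to start alerting
        self.recover_after = recover_after    # consecutive successes needed to clear the alert
        self.alerting = False
        self._streak = 0                      # consecutive results that contradict the current state

    def observe(self, check_failed: bool) -> bool:
        """Feed in one probe result; return whether the alert is active afterwards."""
        contradicts_state = check_failed != self.alerting
        self._streak = self._streak + 1 if contradicts_state else 0
        needed = self.recover_after if self.alerting else self.fail_after
        if self._streak >= needed:
            self.alerting = not self.alerting
            self._streak = 0
        return self.alerting


alert = HysteresisAlert()
results = [True, False, True, True, True, False, False]   # True means the probe failed
states = [alert.observe(r) for r in results]
print(states)   # [False, False, False, False, True, True, False]
```

The lone failure at the start never pages anyone; only the run of three does. The cost is slower detection of real outages.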

Page 55: Car Alarms & Smoke Alarms [Monitorama]

Semi-Practical Advice

As your uptime increases, so must your specificity.

It affects your PPV much more than sensitivity.
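
A quick numeric comparison makes the asymmetry visible. This sketch reuses the 99.9% uptime example; the `ppv` helper and the variable names are assumptions made for illustration.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability that a positive result is a true positive."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


prevalence = 0.001   # 99.9% uptime

baseline = ppv(prevalence, sensitivity=0.99, specificity=0.99)
better_sensitivity = ppv(prevalence, sensitivity=0.999, specificity=0.99)
better_specificity = ppv(prevalence, sensitivity=0.99, specificity=0.999)
higher_uptime = ppv(0.0001, sensitivity=0.99, specificity=0.99)   # 99.99% uptime

print(f"baseline:           {baseline:.3f}")
print(f"better sensitivity: {better_sensitivity:.3f}")
print(f"better specificity: {better_specificity:.3f}")
print(f"higher uptime:      {higher_uptime:.3f}")
```

Adding a nine to sensitivity barely moves PPV, adding a nine to specificity lifts it to nearly one half, and adding a nine to uptime with the same probe drags it down to about 1%.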

Page 56: Car Alarms & Smoke Alarms [Monitorama]

Specificity

Sensitivity

Uptime Prevalence

False Positive Rate

False Negative Rate

Page 57: Car Alarms & Smoke Alarms [Monitorama]

Specificity

Sensitivity

Uptime

Page 58: Car Alarms & Smoke Alarms [Monitorama]

Semi-Practical Advice

Separate the concerns of problem detection and problem identification

Page 59: Car Alarms & Smoke Alarms [Monitorama]

Semi-Practical Advice

• Check Apache process count

• Check swap usage

• Check median HTTP response time

• Check requests/second

Page 60: Car Alarms & Smoke Alarms [Monitorama]

Your alerting should tell you whether work is getting done.

Baron Schwartz (paraphrased)

Page 61: Car Alarms & Smoke Alarms [Monitorama]

Semi-Practical Advice

• Check Apache process count

• Check swap usage

• Check median HTTP response time

• Check requests/second

Page 62: Car Alarms & Smoke Alarms [Monitorama]

Semi-Practical Advice

• Check Apache process count

• Check swap usage

• Check median HTTP response time & requests/second

Page 63: Car Alarms & Smoke Alarms [Monitorama]

A Pony I Want

Something like Nagios, but which

• Helps you separate detection from diagnosis

• Is SNR-aware

Page 64: Car Alarms & Smoke Alarms [Monitorama]

Other useful stuff

• Medical paper with a nice visualization: http://tinyurl.com/specsens

• Blog post with some algebra: http://tinyurl.com/carsmoke

• Base rate fallacy: http://tinyurl.com/brfallacy

• Bischeck: http://tinyurl.com/bischeck

Page 65: Car Alarms & Smoke Alarms [Monitorama]
Page 66: Car Alarms & Smoke Alarms [Monitorama]

Come find me and chat.