Anomaly Detection - New York Machine Learning

85
© 2014 MapR Technologies 1 © MapR Technologies, confidential How to Find What You Didn’t Know to Look For Anomaly Detection October 14, 2014

description

Anomaly detection is the art of finding what you don't know how to ask for. In this talk, I walk through the why and how of building probabilistic models for a variety of problems including continuous signals and web traffic. This talk blends theory and practice in a highly approachable way.

Transcript of Anomaly Detection - New York Machine Learning

Page 1: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 1

© MapR Technologies, confidential

How to Find What You Didn’t Know to Look For

Anomaly Detection

October 14, 2014

Page 2: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 2

Anomaly Detection: How To Find What You Didn’t Know to Look For

Ted Dunning, Chief Applications Architect MapR Technologies

Email [email protected] [email protected]

Twitter @Ted_Dunning

Ellen Friedman, Consultant and Commentator

Email [email protected]

Twitter @Ellen_Friedman

Page 3: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 3

e-book available courtesy of MapR

http://bit.ly/1jQ9QuL

A New Look at Anomaly Detectionby Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)

Page 4: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 4

Practical Machine Learning series (O’Reilly)

• Machine learning is becoming mainstream• Need pragmatic approaches that take into account real world

business settings:– Time to value– Limited resources– Availability of data– Expertise and cost of team to develop and to maintain system

• Look for approaches with big benefits for the effort expended

Page 5: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 5

Anomaly Detection

Page 6: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 6

Who Needs Anomaly Detection?

Utility providers using smart meters

Page 7: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 7

Who Needs Anomaly Detection?

Feedback from manufacturing assembly lines

Page 8: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 8

Who Needs Anomaly Detection?

Monitoring data traffic on communication networks

Page 9: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 9

What is Anomaly Detection?

• The goal is to discover rare events – especially those that shouldn’t have happened

• Find a problem before other people see it– especially before it causes a problem for customers

• Why is this a challenge?– I don’t know what an anomaly looks like (yet)

Page 10: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 10

Spot the Anomaly

Page 11: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 11

Spot the Anomaly

Looks pretty anomalous

to me

Page 12: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 12

Spot the Anomaly

Will the real anomaly please stand up?

Page 13: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 13

Basic idea:Find “normal” first

Page 14: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 14

Steps in Anomaly Detection

• Build a model: Collect and process data for training a model• Use the machine learning model to determine what is the normal

pattern • Decide how far away from this normal pattern you’ll consider to

be anomalous• Use the AD model to detect anomalies in new data

– Methods such as clustering for discovery can be helpful

Page 15: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 15

How hard is it to set an alert for anomalies?

Grey data is from normal events; x’s are anomalies.Where would you set the threshold?

Page 16: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 16

Basic idea:Set adaptive thresholds

Page 17: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 17

What Are We Really Doing

• We want action when something breaks (dies/falls over/otherwise gets in trouble)

• But action is expensive• So we don’t want too many false alarms• And we don’t want too many false negatives

• What’s the right threshold to set for alerts?– We need to trade off costs

Page 18: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 18

A Second Look

Page 19: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 19

A Second Look

99.9%-ile

Page 20: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 20

New algorithm: t-digest

Page 21: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 21

Online Summarizer

99.9%-ile

t

x > t ? Alarm !x

How Hard Can it Be?

Page 22: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 22

Detecting Anomalies in Sporadic Events

Page 23: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 23

Using t-Digest

• Apache Mahout uses t-digest as an on-line percentile estimator– very high accuracy for extreme tails– new in version Mahout v 0.9

• t-digest also available elsewhere– in streamlib (open source library on github)– standalone (github and Maven Central)

• What’s the big deal with anomaly detection?

• This looks like a solved problem

Page 24: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 24

Already Done? Etsy Skyline?

Page 25: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 25

What About This?

Page 26: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 26

Model Delta Anomaly Detection

Online Summarizer

δ > t ?

99.9%-ile

t

Alarm !

Model

-

+ δ

Page 27: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 27

The Real Inside Scoop

• The model-delta anomaly detector is really just a sum of random variables– the model we know about already– and a normally distributed error

• The output (delta) is (roughly) the log probability of the sum distribution (really δ2)

• Thinking about probability distributions is good

• But how do you handle AD in systems with sporadic events?

Page 28: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 28

Spot the Anomaly

Anomaly?

Page 29: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 29

Maybe not!

Page 30: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 30

Where’s Waldo?

This is the real anomaly

Page 31: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 31

Normal Isn’t Just Normal

• What we want is a model of what is normal

• What doesn’t fit the model is the anomaly

• For simple signals, the model can be simple …

• The real world is rarely so accommodating

Page 32: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 32

We Do Windows

Page 33: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 33

We Do Windows

Page 34: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 34

We Do Windows

Page 35: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 35

We Do Windows

Page 36: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 36

We Do Windows

Page 37: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 37

We Do Windows

Page 38: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 38

We Do Windows

Page 39: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 39

We Do Windows

Page 40: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 40

We Do Windows

Page 41: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 41

We Do Windows

Page 42: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 42

We Do Windows

Page 43: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 43

We Do Windows

Page 44: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 44

We Do Windows

Page 45: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 45

We Do Windows

Page 46: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 46

We Do Windows

Page 47: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 47

Windows on the World

• The set of windowed signals is a nice model of our original signal• Clustering can find the prototypes

– Fancier techniques available using sparse coding

• The result is a dictionary of shapes• New signals can be encoded by shifting, scaling and adding

shapes from the dictionary

Page 48: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 48

Most Common Shapes (for EKG)

Page 49: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 49

Reconstructed signal

Original signal

Reconstructed signal

Reconstructionerror

< 1 bit / sample

Page 50: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 50

An Anomaly

Original technique for finding 1-d anomaly works against reconstruction error

Page 51: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 51

Close-up of anomaly

Not what you want your heart to do.

And not what the model expects it to do.

Page 52: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 52

A Different Kind of Anomaly

Page 53: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 53

Model Delta Anomaly Detection

Online Summarizer

δ > t ?

99.9%-ile

t

Alarm !

Model

-

+ δ

Page 54: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 54

The Real Inside Scoop

• The model-delta anomaly detector is really just a sum of random variables– the model we know about already– and a normally distributed error

• The output (delta) is (roughly) the log probability of the sum distribution (really δ2)

• Thinking about probability distributions is good

Page 55: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 55

Anomalies among sporadic events

Page 56: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 56

Sporadic Web Traffic to an e-Business Site

It’s important to know if traffic is stopped or delayed because of a problem…

But visits to site normally come at varying intervals.

How long after the last event should you begin to worry?

Page 57: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 57

Sporadic Web Traffic to an e-Business Site

It’s important to know if traffic is stopped or delayed because of a problem…

But visits to site normally come at varying intervals.

And how do you let your CEO sleep through the night?

Page 58: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 58

Basic idea:Time interval between events is how to

convert to something useful you can measure

Page 59: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 59

Sporadic Events: Finding Normal and Anomalous Patterns

• Time between intervals is much more usable than absolute times

• Counts don’t link as directly to probability models

• Time interval is log ρ

• This is a big deal

Page 60: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 60

Event Stream (timing)

• Events of various types arrive at irregular intervals– we can assume Poisson distribution

• The key question is whether frequency has changed relative to expected values– This shows up as a change in interval

• Want alert as soon as possible

Page 61: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 61

Converting Event Times to Anomaly

99.9%-ile

99.99%-ile

Page 62: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 62

But in the real world, event rates often change

Page 63: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 63

Time Intervals Are Key to Modeling Sporadic Events

Page 64: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 64

Model-Scaled Intervals Solve the Problem

Page 65: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 65

Model Delta Anomaly Detection

Online Summarizer

δ > t ?

99.9%-ile

t

Alarm !

Model

-

+ δ

log p

Page 66: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 66

Detecting Anomalies in Sporadic Events

Page 67: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 67

Detecting Anomalies in Sporadic Events

Page 68: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 68

Slipped Week: Simple Rate Predictor

Page 69: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 69

Poisson Distribution

• Time between events is exponentially distributed

• This means that long delays are exponentially rare

• If we know λ we can select a good threshold– or we can pick a threshold empirically

Page 70: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 70

Seasonality Poses a Challenge

Page 71: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 71

Something more is needed …

Page 72: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 72

We need a better rate predictor…

Page 73: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 73

A New Rate Predictor for Sporadic Events

Page 74: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 74

Improved Prediction with Adaptive Modeling

Page 75: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 75

Anomaly Detection + Classification Useful Pair

• Use the AD model to detect anomalies in new data– Methods such as clustering for discovery can be helpful

• Once you have well-defined models in your system, you may also want to use classification to tag those

• Continue to use the AD model to find new anomalies

Page 76: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 76

Recap (out of order)

• Anomaly detection is best done with a probability model• -log p is a good way to convert to anomaly measure• Adaptive quantile estimation (t-digest) works for auto-setting

thresholds

Page 77: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 77

Recap

• Different systems require different models• Continuous time-series

– sparse coding to build signal model

• Events in time– rate model base on variable rate Poisson– segregated rate model

• Events with labels– language modeling– hidden Markov models

Page 78: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 78

Why Use Anomaly Detection?

Page 79: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 79

Keep in mind…

• Model normal, then find anomalies

• t-digest for adaptive threshold

• Probabilistic models for complex patterns

-

Page 80: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 80

Keep in mind…

• Time intervals are key for sporadic events

• Complex time shift to predict rate with seasonality

• Sequence of events reveals phishing attack

Page 81: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 81

e-book available courtesy of MapR

http://bit.ly/1jQ9QuL

A New Look at Anomaly Detectionby Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)

Page 82: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 82

Coming in October: Time Series Databasesby Ted Dunning and Ellen Friedman © Oct 2014 (published by O’Reilly)

Page 83: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 83

Thank you for coming today!

Page 84: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 85

© MapR Technologies, confidential

Page 85: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 86

Sandbox