NY Open Statistical Programming Meetup - Feb 2019 Time Series, Two Ways: Anomaly Detection & Forecasting Catherine Zhou


NY Open Statistical Programming Meetup - Feb 2019

Time Series, Two Ways: Anomaly Detection & Forecasting

Catherine Zhou


About Me_

● Proud New YorkeR
● Currently @ Codecademy
● Formerly @ JetBlue & New York Sports Clubs

catzhou.com
twitter @catherinezh
#rstats #rstatsnyc #rladies


Agenda

By the end of this talk you will be able to:

● Assess data quality in time series
● Forecast seasonal trends
● Manage holidays + special events in your forecast
● Plot and visualize Google Trends data
● Explore different anomaly detection algorithms
● Explain case studies for forecasting & anomaly detection


PART ONE

The Origin Story_
… how I got roped into talking about time-series


PART TWO

Forecasting with Time-Series
… what to do when your company confuses data science with fortune-telling


EXPECTATION

We want to work with data that is:
● Clean and well-organized
● Daily or weekly patterns
● Clear seasonal trends
● Key metrics to monitor
● Actionable insights


VS.

REALITY

We often work with data that has:
● Inconsistent trends and patterns
● Terabytes in size
● Missing or unclean data
● Multiple key metrics
  ○ Difficult to monitor
  ○ Difficult to interpret


When should we forecast our time series data?


Seasonal trends: holidays and known events

Use cases:
● Sales + Promotions
● Workforce Planning



Enough historical data and domain knowledge
e.g. quarterly earnings calls


When shouldn’t we forecast our time series data?


External factors influence your primary metric.


Behavior is too unpredictable to forecast.


We only started collecting data recently (or something recently changed how the data behaves), so there is not enough history to forecast.


For data scientists, growth is a double-edged sword.


Growth creates uncertainty in time series forecasting...

...and uncertain forecasts aren’t very useful.


LIVE CODE SESSION: PROPHET

Let’s get started!

Follow along:
twitter @catherinezh
github @cattystats


y(t) = g(t) + s(t) + h(t) + ε_t

y = forecast output
g = growth trend (directional trend)
s = seasonality (weekly, monthly, etc.)
h = holidays (user-provided)


1. LOAD PROPHET + PREPARE DATA
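
The code shown on this slide did not survive in the transcript. As a minimal sketch of what the step involves, assuming a synthetic weekly series in place of real Google Trends data: prophet expects a data frame with a date column named `ds` and a numeric column named `y`. The names `raw`, `week`, and `hits` are hypothetical.

```r
# Minimal sketch: shape a weekly series into the two columns prophet expects.
# `raw`, `week`, and `hits` are hypothetical names; the values are synthetic.
library(prophet)

set.seed(42)
weeks <- seq(as.Date("2014-01-05"), by = "week", length.out = 260)
raw <- data.frame(
  week = weeks,
  hits = 50 + 20 * sin(2 * pi * seq_along(weeks) / 52) + rnorm(260, sd = 3)
)

# prophet requires a date column named `ds` and a numeric column named `y`.
df <- data.frame(ds = raw$week, y = raw$hits)

str(df)
```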


2. SPECIFY CEILING AND FLOOR USING DOMAIN KNOWLEDGE


3. MAKE FUTURE DATA FRAME. NOW FORECAST!
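
Steps 2 and 3 could be sketched roughly as follows. This is not the speaker's code: the data are synthetic, and the floor of 0 and cap of 100 are an assumption based on Google Trends reporting interest on a 0 to 100 scale. In prophet, `cap` and `floor` columns take effect when the model is fit with `growth = "logistic"`.

```r
# Sketch of steps 2 and 3 on synthetic weekly data (all numbers are made up).
library(prophet)

set.seed(42)
n <- 156
df <- data.frame(ds = seq(as.Date("2016-01-03"), by = "week", length.out = n))
df$y <- 50 + 20 * sin(2 * pi * seq_len(n) / 52) + rnorm(n, sd = 3)

# Step 2: assumed bounds, chosen because Trends interest is scaled 0-100.
df$floor <- 0
df$cap <- 100

# Logistic growth makes prophet respect the saturating bounds.
m <- prophet(df, growth = "logistic")

# Step 3: extend the date range by 26 weeks, then forecast.
future <- make_future_dataframe(m, periods = 26, freq = "week")
future$floor <- 0
future$cap <- 100
forecast <- predict(m, future)

# yhat is the point forecast; yhat_lower and yhat_upper bound the interval.
tail(forecast[, c("ds", "yhat", "yhat_lower", "yhat_upper")])
```

The same objects can be passed to `plot(m, forecast)` for the forecast view and to `prophet_plot_components(m, forecast)` for the trend and seasonality panels.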


FORECASTING WEEKLY GOOGLE SEARCHES FOR ‘EARNINGS’
FORECAST OUTPUT

Predicts Increased Search Volume for April 2019 Q1 Earnings Call Season


FORECASTING WEEKLY GOOGLE SEARCHES FOR ‘EARNINGS’
PLOT FORECAST


FORECASTING WEEKLY GOOGLE SEARCHES FOR ‘EARNINGS’
FORECAST COMPONENTS


INPUT RELEVANT HOLIDAYS, COMPANY EARNINGS CALLS, PRODUCT RELEASES
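
A hedged sketch of how known events might be supplied: prophet accepts a `holidays` data frame with `holiday` and `ds` columns, plus optional `lower_window` and `upper_window` columns that widen the effect around each date. The event name and dates below are invented for illustration and are not taken from the talk.

```r
# Sketch: describe known events in the data frame format prophet accepts.
# The event name and dates are invented for illustration only.
library(prophet)

events <- data.frame(
  holiday = "earnings_call",
  ds = as.Date(c("2018-01-25", "2018-04-26", "2018-07-26", "2018-10-25")),
  lower_window = -1,  # effect may start one day before the event
  upper_window = 1    # and linger one day after
)

# Passing no data builds an unfitted model that carries the events.
m <- prophet(holidays = events)

# With a history data frame `df` (columns ds, y), fit it afterwards:
# m <- fit.prophet(m, df)
```

For weekly series, event dates may need to be snapped to the timestamps used in the data so that they line up with actual rows.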



Forecasting Best Practices_

● Set clear expectations
  ○ What level of certainty can a forecast provide given the data at hand?
● Share upper and lower ranges
  ○ Sharing just the midpoint might lead to speculation if the upper and lower ranges are broad.
● Update your priors (capacity, floor, holidays/launches, etc.)
● Differentiate company forecasts vs targets
● Domain knowledge is valuable! (Use your ‘spreadsheet people’)


PART THREE

Anomaly Detection_


Anomaly detection on key metrics can lead to earlier detection of irregularities and reduce the number of fire drills.

We can be proactive instead of reactive.


DATA SCIENCE FIRE DRILLS


PART THREE

Anomaly Detection with Time Series
… or how to know when something is terribly wrong

SENIOR DATA SCIENTIST + MANAGER


Applications of Anomaly Detection_

● Fraud Detection
● KPI Monitoring
● Identify Breakage
● Workforce Planning
● Nature (e.g. weather)
● … and more!

“monitor key metrics, website breakage, and fraudulent activity… we can build a system for anomaly detection to uncover blind spots in large datasets and reduce fire drills at work”

- me talking about why anomaly detection matters


Before...


AFTER!


LIVE CODE SESSION: ANOMALIZE

Let’s get started!

Follow along:
twitter @catherinezh
github @cattystats

bit.ly/anomaly-r


1. CREATE A DATA FRAME

install + load gtrendsR: choose a keyword that interests you
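
The slide's code is missing from the transcript. One plausible sketch of this step uses `gtrends()` from gtrendsR, whose result includes an `interest_over_time` data frame. The wrapper name `fetch_trends` is hypothetical, and the call itself is left commented out because it needs network access.

```r
# Sketch: pull Google Trends interest for one keyword.
# Requires network access; "vote" is just an example keyword.
# install.packages("gtrendsR")
library(gtrendsR)

# Hypothetical helper wrapping the gtrendsR call.
fetch_trends <- function(keyword, time = "all") {
  res <- gtrends(keyword = keyword, time = time)
  # interest_over_time holds one row per date, with relative interest in `hits`.
  res$interest_over_time
}

# trends <- fetch_trends("vote")
# head(trends)
```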


2. PREPARE DATA

install + load tidyverse and anomalize
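
A minimal sketch of the preparation, assuming the goal is a tibble with one date column and one numeric column. The `trends` data frame is a hand-made stand-in for a gtrendsR result, and mapping "<1" to 0 is an assumption made for illustration (gtrendsR can return `hits` as text when very low values appear).

```r
# Sketch: reduce a gtrendsR-style result to a tidy date/hits tibble.
# `trends` is a hand-made stand-in for interest_over_time.
library(tidyverse)
library(anomalize)  # loaded here so it is ready for the next step

trends <- data.frame(
  date = as.Date(c("2018-09-01", "2018-10-01", "2018-11-01")),
  hits = c("<1", "35", "100"),
  keyword = "vote",
  stringsAsFactors = FALSE
)

prepared <- trends %>%
  as_tibble() %>%
  mutate(
    date = as.Date(date),
    # Very low interest can arrive as the text "<1"; treat it as 0 (an assumption).
    hits = as.numeric(if_else(hits == "<1", "0", as.character(hits)))
  ) %>%
  select(date, hits)

prepared
```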



anomalize!

3. ANOMALIZE!
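
The anomalize workflow is usually written as a three-function pipeline: `time_decompose()`, `anomalize()`, `time_recompose()`. The sketch below runs it on a synthetic monthly series with two injected spikes; the data and the choice of STL + IQR are assumptions, not the talk's exact code.

```r
# Sketch: the decompose -> anomalize -> recompose pipeline on synthetic data.
library(tidyverse)
library(anomalize)

set.seed(1)
n <- 180
series <- tibble(
  date = seq(as.Date("2004-01-01"), by = "month", length.out = n),
  hits = 40 + 10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 2)
)
series$hits[c(59, 155)] <- c(95, 100)  # inject two artificial spikes

result <- series %>%
  time_decompose(hits, method = "stl") %>%  # split into season, trend, remainder
  anomalize(remainder, method = "iqr") %>%  # flag outlying remainders
  time_recompose()                          # rebuild bounds on the original scale

# Rows the detector labelled as anomalies.
result %>% filter(anomaly == "Yes")
```

Passing `result` to `plot_anomalies(time_recomposed = TRUE)` draws the series with the flagged points and the recomposed bounds.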


3. ANOMALIZE... TADA!
KEYWORD: VOTE

Plot annotations: ‘04 presidential; Obama ‘08; ‘10 midterms; Obama ‘12; ‘16 presidential; ‘18 midterms outrank previous pres elections in relative interest!


LET’S TRY THIS WITH...
KEYWORD: FLORIDA

Plot annotations: ‘16 presidential; Hurricane Irma; ‘18 midterms



KEYWORD: FLORIDA

plot_anomaly_decomposition()

visualize inner workings of how algorithm detects anomalies in the “remainder”


4. EXPLORE METHODS BASED ON TIME SERIES ATTRIBUTES

anomalize cheat sheet:
● Twitter + GESD better for highly seasonal data
● STL + IQR if seasonality is not a major factor
● adjust trend period using domain knowledge
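
To try the pairings from the cheat sheet side by side, a small comparison like the one below may help. It is a sketch on synthetic data: `count_anomalies` is a hypothetical helper, and the "2 years" trend window is an arbitrary stand-in for a value chosen from domain knowledge.

```r
# Sketch: compare decomposition/detector pairings on the same synthetic series.
library(tidyverse)
library(anomalize)

set.seed(7)
n <- 180
series <- tibble(
  date = seq(as.Date("2004-01-01"), by = "month", length.out = n),
  hits = 40 + 15 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 2)
)
series$hits[c(41, 137)] <- c(100, 95)  # inject two artificial spikes

# Hypothetical helper: run one pairing and count the flagged rows.
count_anomalies <- function(data, decomp, detector, trend = "auto") {
  flagged <- data %>%
    time_decompose(hits, method = decomp, trend = trend) %>%
    anomalize(remainder, method = detector)
  sum(flagged$anomaly == "Yes")
}

counts <- c(
  stl_iqr      = count_anomalies(series, "stl", "iqr"),
  twitter_gesd = count_anomalies(series, "twitter", "gesd"),
  # A shorter trend window, as might be picked from domain knowledge.
  stl_iqr_2yr  = count_anomalies(series, "stl", "iqr", trend = "2 years")
)
counts
```

Counting flags is only a rough comparison; plotting each result shows which points the methods disagree on.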


STL + IQR
KEYWORD: MOVERS


TWITTER DECOMPOSE CONTROLS FOR SEASONALITY
KEYWORD: MOVERS


TWITTER + GESD PULLS IN THE MEDIAN/MEAN
KEYWORD: MOVERS


TRY THIS ON DIFFERENT KEYWORDS


PETE DAVIDSON

recent news, and 2018 data only -- seasonality is not really a factor, so we go back to using STL + IQR


PETE DAVIDSON

outlier #1 - engaged weeks after they start dating, internet goes wild


PETE DAVIDSON

outlier #1 - gets engaged

outlier #2 - they call off their engagement, internet breaks again


TRY THIS AT HOME!

Keywords To Try:
● NBA
● Weather
● Politicians
● Elon Musk
● Bitcoin
● Holidays
● Memes
● Events

… and anything else you might think of!


Additional Resources
● R Code + Notebook
● Introducing Anomalize
● Github: Anomalize
● Facebook Prophet
● NSSD Podcast: Sean Taylor (on prophet)
● Codecademy (Learn R coming soon!)


March Meetup: Choosing A Deep Learning Library



SUPPORT EACH OTHER.
GOOD LUCK AND HAVE FUN!

twitter @catherinezh
github @cattystats
catzhou.com
#rstats #rstatsnyc #rladies