Predictive Analytics_Stephen

27 May 2015 Theory & Practice

Transcript of Predictive Analytics_Stephen

Page 1: Predictive Analytics_Stephen

27 May 2015

Theory & Practice

Page 2: Predictive Analytics_Stephen

• Predictive Analytics

• Definition

• Process

• Theory

• Predictive Analytics Classification Model


Page 3: Predictive Analytics_Stephen

Goal-driven analysis of “large” data sets…

To identify an approach for allocating organizational resources

That enhances performance on the organization’s self-defined performance metrics

To better achieve the organization’s business objectives

Using a repeatable, consistent strategy

[Diagram: Business Objectives, Data, Quantitative Analysis, Spreadsheets, Business Intelligence, Data Mining, Predictive Analytics]

Page 4: Predictive Analytics_Stephen

1. Define Business Objective

2. Define Performance Metrics

3. Identify Behaviors that Impact Performance

4. Identify Scarce Resources to be Allocated

5. Confirm Availability of Historical Data


Page 5: Predictive Analytics_Stephen

Process Steps

1. Determine Business Question(s) & Performance Metric(s)

2. Identify Relevant Data

3. Pull Relevant Data

4. Data Preparation

5. Model Data

6. Evaluate Models

7. Deploy Validated Model(s)

Participants

• Business Expert(s)

• Data Expert(s)

• Math Expert(s)

Barriers

• Knowledge of Available Data: Within Internal and External Sources

• Access to – or Quick Creation of – Usable Data; Minimizing Replication & Security Issues

• Connection to Front-Line Business Issues

• Time to Devote to Project; Collaboration with Fellow Data Scientists

• Integration with Many Modeling Software Tools

• Volume of Data to Validate Models

• Data Expert Support to Embed Working Model into System; Business Expert Support to Implement Business Process Changes

• Reporting Structure & Access to Expertise

Page 6: Predictive Analytics_Stephen

Statistically Significant Difference, p < 0.05 ?
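As one way to make the p < 0.05 question concrete, here is a minimal sketch of a two-proportion z-test using only the Python standard library. The response counts (120 of 1,000 versus 90 of 1,000) and the function name are hypothetical and do not come from the slides.

```python
import math


def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for the difference between two response rates."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical campaign results: group A vs. group B.
p_value = two_proportion_p_value(120, 1000, 90, 1000)
print(f"p = {p_value:.4f}, below 0.05: {p_value < 0.05}")
```

A small p-value only says the difference is unlikely under pure chance; it does not say the difference is large enough to matter for the business objective.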


Page 7: Predictive Analytics_Stephen

Identification of the Extremes

Hold-Out Sample Validation


Page 8: Predictive Analytics_Stephen

[Figure: scatter plots labeled Classification (points marked x and o) and Forecasting; label: Success]

Page 9: Predictive Analytics_Stephen

Phased Development

Phase 1: Positive Behavior

Phase 2: Negative Behavior

Phase 3: Conflict Resolution

Phase 4: Ranking Across Continuum

- Tony Rathburn and The Modeling Agency

Page 10: Predictive Analytics_Stephen

Phased Development

Phase 5: Enhance Dimensionality

Phase 6: Refine Precision

Phase 7: Forecasting

- Tony Rathburn and The Modeling Agency

Page 11: Predictive Analytics_Stephen

Overfitting occurs when a person looking at a set of data develops a model that perfectly explains the previous data points but has poor predictive power.

Overfitting: “…the most important scientific problem you’ve never heard of.” - Nate Silver
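A small sketch of the overfitting idea, assuming made-up data where y = 2x plus noise. The "memorizer" model (a nearest-neighbor lookup, chosen here only for illustration) explains the training points perfectly but predicts new points worse than a plain fitted line.

```python
import random

random.seed(0)


def make_data(n):
    """Noisy samples from the hypothetical relationship y = 2x + noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares line; returns a prediction function."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """Overfit model: repeats the outcome of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(500)
test_x, test_y = make_data(500)  # fresh data the models never saw

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

print("line      train MSE:", mse(line, train_x, train_y),
      " test MSE:", mse(line, test_x, test_y))
print("memorizer train MSE:", mse(memorizer, train_x, train_y),
      " test MSE:", mse(memorizer, test_x, test_y))
```

The gap between the memorizer's training error and its test error is the symptom to watch for; the validation techniques on the next page are ways of measuring it.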

Page 12: Predictive Analytics_Stephen

Validation Techniques

Data: Train, Test, Validate

• Holdout Sample

• Monte Carlo

• Bootstrapping

• K-Fold Cross Validation

• Target Shuffling (Placebo)

Target Shuffling (Placebo):

1. Build model, note strength

2. Randomly shuffle target

3. Build new “bogus” model, save strength

4. Repeat 2 & 3 often to create distribution of strengths

5. Evaluate where “true” strength is on or beyond distribution
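The five target-shuffling steps can be sketched as follows. This is an illustration under stated assumptions, not the presenter's implementation: the data are synthetic, the "model" is a one-variable linear fit, and "strength" is taken to be R squared.

```python
import random

random.seed(1)


def r_squared(xs, ys):
    """Strength of a one-variable linear model: squared Pearson correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical data: one condition variable with a real effect on the target.
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [x + random.gauss(0, 3) for x in xs]

# Step 1: build the model on the real target and note its strength.
true_strength = r_squared(xs, ys)

# Steps 2-4: shuffle the target, build a "bogus" model, save its strength, repeat.
shuffled_strengths = []
target = ys[:]
for _ in range(1000):
    random.shuffle(target)
    shuffled_strengths.append(r_squared(xs, target))

# Step 5: how often does a bogus model match or beat the real one?
beats = sum(s >= true_strength for s in shuffled_strengths)
p_value = beats / len(shuffled_strengths)
print(f"true strength = {true_strength:.3f}, share of shuffles at or above it = {p_value:.3f}")
```

If the true strength sits well beyond the shuffled distribution, chance alone is an unlikely explanation for the model's apparent fit.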

Page 13: Predictive Analytics_Stephen

[Diagram: data table with six rows and eight columns]

Unit of Analysis: Each row represents a given consumer, household, SKU, store, etc.

Variables: Each column contains information that is unique to each row

Page 14: Predictive Analytics_Stephen

[Diagram: data table with six rows; the outcome column holds the values 1, 0, 1, 1, 0, 0]

Unit of Analysis: Each row represents a given consumer, household, SKU, store, etc.

Condition Variables: Everything else we know about each row in the data set

Outcome Variable: What we are trying to predict; the outcome
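A minimal sketch of this layout in code. The field names (customer_id, age, visits_last_year, responded) and their values are hypothetical; only the outcome values 1, 0, 1, 1, 0, 0 are taken from the slide.

```python
# Each dict is one row, i.e. one unit of analysis (here, a consumer).
rows = [
    {"customer_id": "A", "age": 34, "visits_last_year": 12, "responded": 1},
    {"customer_id": "B", "age": 51, "visits_last_year": 2, "responded": 0},
    {"customer_id": "C", "age": 29, "visits_last_year": 9, "responded": 1},
    {"customer_id": "D", "age": 42, "visits_last_year": 15, "responded": 1},
    {"customer_id": "E", "age": 63, "visits_last_year": 1, "responded": 0},
    {"customer_id": "F", "age": 38, "visits_last_year": 3, "responded": 0},
]

identifier = "customer_id"  # names the row; not used for prediction
outcome = "responded"       # outcome variable: what we are trying to predict

# Condition variables: everything else we know about each row.
condition_vars = [name for name in rows[0] if name not in (identifier, outcome)]

X = [[row[name] for name in condition_vars] for row in rows]
y = [row[outcome] for row in rows]

print("condition variables:", condition_vars)
print("outcome column:", y)
```

Most modeling tools expect this split: a matrix of condition variables and a separate outcome column with one entry per row.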

Page 15: Predictive Analytics_Stephen

• Analytics: The insights from data are used to recommend action or to guide decision making rooted in business context. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology. – Wikipedia –

• Advanced Analytics: While the traditional analytical tools that comprise basic business intelligence (BI) examine historical data, tools for advanced analytics focus on forecasting future events and behaviors, allowing businesses to conduct what-if analyses to predict the effects of potential changes in business strategies. Predictive analytics, data mining, big data analytics, and location intelligence are just some of the analytical categories that fall under the heading of advanced analytics. http://searchbusinessanalytics.techtarget.com/definition/advanced-analytics

• There is an increasing use of the term advanced analytics, typically used to describe the technical aspects of analytics, especially predictive modeling, machine learning techniques, and neural networks. http://en.wikipedia.org/wiki/Analytics


Page 16: Predictive Analytics_Stephen

What it is:

• Data Expert/Engineer (Business Analyst, Data Analyst & ETL)

• Business process integration

• Data exploration/mining

• Applying math in a predictive fashion to a data set

• Visual analytics used for data exploration

• Storytelling (using visualizations)

• Predictive & Prescriptive Analytics

• Future orientation

What it is not:

• Reporting

• Dashboarding

• Database modeling of star schema, relational

• Technical system implementation

• Descriptive & Diagnostic Analytics

• Past or current orientation

Page 17: Predictive Analytics_Stephen

[Diagram: continuum from Descriptive to Predictive]

Descriptive: Reporting (Descriptive & Diagnostic)

“Gray” Area: Optimization, Factor Analysis, Cluster Analysis

Predictive: Predictive Analytics (Classification to Forecasting)

Page 18: Predictive Analytics_Stephen


Page 19: Predictive Analytics_Stephen

              Per item                                                        Total for number chosen
Item          Sodium (mg)  Fat (g)  Calories  Item Cost ($)  Number (1 to 5)  Sodium (mg)  Fat (g)  Calories  Item Cost ($)
Beef Patty             50       17       220          $0.25                5          250       85      1100          $1.25
Bun                   330        9       260          $0.15                5         1650       45      1300          $0.75
Cheese                310        6        70          $0.10                1          310        6        70          $0.10
Onions                  1        2        10          $0.09                5            5       10        50          $0.45
Pickles               260        0         5          $0.03                1          260        0         5          $0.03
Lettuce                 3        0         4          $0.04                3            9        0        12          $0.12
Ketchup               160        0        20          $0.02                3          480        0        60          $0.06
Tomato                  3        0         9          $0.04                1            3        0         9          $0.04
Totals:                                                                              2967      146      2606          $2.80

Simple math, but can get very complicated.

For example, Trade Spend Optimization.
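The totals row in the table above can be reproduced with a few lines of arithmetic. This sketch uses the per-item values and quantities from the table; the function and variable names are illustrative, and costs are held in cents to avoid rounding noise.

```python
# Per single item: (name, sodium mg, fat g, calories, cost in cents).
ITEMS = [
    ("Beef Patty", 50, 17, 220, 25),
    ("Bun", 330, 9, 260, 15),
    ("Cheese", 310, 6, 70, 10),
    ("Onions", 1, 2, 10, 9),
    ("Pickles", 260, 0, 5, 3),
    ("Lettuce", 3, 0, 4, 4),
    ("Ketchup", 160, 0, 20, 2),
    ("Tomato", 3, 0, 9, 4),
]

# Number of each item chosen (1 to 5), as in the table.
quantities = {
    "Beef Patty": 5, "Bun": 5, "Cheese": 1, "Onions": 5,
    "Pickles": 1, "Lettuce": 3, "Ketchup": 3, "Tomato": 1,
}


def totals(quantities):
    """Return total (sodium mg, fat g, calories, cost in dollars)."""
    sodium = fat = calories = cents = 0
    for name, item_sodium, item_fat, item_calories, item_cents in ITEMS:
        count = quantities[name]
        sodium += count * item_sodium
        fat += count * item_fat
        calories += count * item_calories
        cents += count * item_cents
    return sodium, fat, calories, cents / 100


print(totals(quantities))
```

An optimizer would search over the quantities (each between 1 and 5) for the combination that best meets a chosen goal, such as lowest cost under a sodium limit. Evaluating a single combination is the simple part; the number of combinations is what makes the search complicated.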

Page 20: Predictive Analytics_Stephen


Page 21: Predictive Analytics_Stephen

Identified two groups: Low/High levels and low/high variation

Identified clear seasonality component


Page 22: Predictive Analytics_Stephen

1. Ability to analyze data and build reports.

2. Ability to build predictive models.

3. Repeatable analytics. Repeatable process for building and deploying analytic models.

4. Enterprise level analytics. Analytics are used throughout an enterprise and integrated together.

5. Strategy driven analytics. Analytics integrated with an analytic strategy.


Page 23: Predictive Analytics_Stephen

Analytic Maturity Level 2 organizations make predictions about future events instead of summarizing past events.

Organizations at Analytic Maturity Level 2 know the difference between business rules and analytics and integrate both of them into deployed systems.


Page 24: Predictive Analytics_Stephen

Analytic Maturity Level 3 organizations remove barriers to building models, such as when modelers do not have easy access to the data.

Analytic Maturity Level 3 organizations remove barriers to deploying models.


Page 25: Predictive Analytics_Stephen

Analytic Maturity Level 4 organizations use a consistent and repeatable process to produce analytic models across the enterprise.

Analytic Maturity Level 4 organizations integrate analytic models from across the organization to improve decision making.

Analytic Maturity Level 4 organizations have a culture of analytics.


Page 26: Predictive Analytics_Stephen

Organizational Evolution

[Diagram: executive team growing in stages; axis label: Analytic Strategy]

CEO

CEO, CFO

CEO, CFO, CIO: IT is recognized as having a strategic role

CEO, CFO, CIO, CAO: Data & analytics are recognized as having strategic roles

Page 27: Predictive Analytics_Stephen

“New research by the McKinsey Global Institute (MGI) projects that by 2018, the United States alone may face a 50 to 60 percent gap between supply and the requisite demand of deep analytic talent, i.e., people with advanced training in statistics or machine learning…

The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.”

http://www.mckinsey.com/features/big_data

Page 28: Predictive Analytics_Stephen


• Data mining solves a common paradox…

– The more customer data you have, the more difficult and time-consuming it is to effectively analyze and draw meaning from them.

• What should be a gold mine often lies unexplored due to a lack of personnel, time, or expertise.

– Data mining uses a clear business orientation and powerful analytic technologies to quickly and thoroughly explore mountains of data, pulling out the valuable, usable information – the business insight – that you need.

-SPSS White Paper

Older Term vs. “Predictive Analytics”

Page 29: Predictive Analytics_Stephen

1. Business Understanding: Achieve a clear understanding of your business challenges

2. Data Understanding: Determine what data are available to mine for answers

3. Data Preparation: Prepare the data in the appropriate format to answer your business questions

4. Modeling: Design data models to meet your requirements

5. Evaluation: Test your results against the goals of your project

6. Deployment: Make the results of the project available to decision makers


Process