Predictive Analytics_Stephen
Uploaded by stephen-greenway
Transcript of Predictive Analytics_Stephen
27 May 2015
Theory & Practice
• Predictive Analytics
• Definition
• Process
• Theory
• Predictive Analytics Classification Model
Goal-driven analysis of “large” data sets…
To identify an approach for allocating organizational resources
That enhances performance on the organization’s self-defined performance metrics
To better achieve the organization’s business objectives
Using a repeatable, consistent strategy
[Diagram: Business Objectives, Data, Quantitative Analysis, Spreadsheets, Business Intelligence, Data Mining, Predictive Analytics]
1. Define Business Objective
2. Define Performance Metrics
3. Identify Behaviors that Impact Performance
4. Identify Scarce Resources to be Allocated
5. Confirm Availability of Historical Data
Participants: Business Expert(s), Data Expert(s), Math Expert(s)
Process Steps:
1. Determine Business Question(s) & Performance Metric(s)
2. Identify Relevant Data
3. Pull Relevant Data
4. Data Preparation
5. Model Data
6. Evaluate Models
7. Deploy Validated Model(s)
Barriers:
• Knowledge of Available Data: Within Internal and External Sources
• Access to – or Quick Creation of – Usable Data; Minimizing Replication & Security Issues
• Connection to Front-Line Business Issues
• Time to Devote to Project; Collaboration with Fellow Data Scientists
• Integration with Many Modeling Software Tools
• Volume of Data to Validate Models
• Data Expert Support to Embed Working Model into System; Business Expert Support to Implement Business Process Changes
• Reporting Structure & Access to Expertise
Statistically Significant Difference, p < 0.05 ?
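One way to answer the p < 0.05 question for two groups with a yes/no outcome is a two-proportion z-test. The sketch below is only an illustration: the slide does not say which test was used, and the counts (120 and 90 responders out of 1,000) are hypothetical.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one success rate."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Standard normal CDF evaluated at |z| via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical counts: responders out of 1,000 customers in each group.
z, p_value = two_proportion_z_test(120, 1000, 90, 1000)
significant = p_value < 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}, significant at 0.05: {significant}")
```

The 0.05 threshold is a convention rather than a guarantee, so a result like this is best read alongside the size of the difference.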
Identification of the Extremes
Hold-Out Sample Validation
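A hold-out sample is a portion of the data set aside before modeling and used only to check the finished model. A minimal sketch, assuming a 70/30 split and a fixed random seed (both are illustrative choices, not values from the slides):

```python
import random

def holdout_split(rows, holdout_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and set aside a hold-out sample."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    # The first n_holdout shuffled rows are reserved; the rest are for training.
    return shuffled[n_holdout:], shuffled[:n_holdout]

rows = list(range(100))  # stand-ins for 100 customer records
train, holdout = holdout_split(rows)
print(len(train), "training rows,", len(holdout), "hold-out rows")
```

The model is built on `train` only; its performance on `holdout` is the estimate of how it will behave on new data.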
[Figure: two scatter plots, “Classification” (x and o points) and “Forecasting” (dots); label: “Success”]
Phased Development
Phase 1: Positive Behavior
Phase 2: Negative Behavior
Phase 3: Conflict Resolution
Phase 4: Ranking Across Continuum
Phase 5: Enhance Dimensionality
Phase 6: Refine Precision
Phase 7: Forecasting
- Tony Rathburn and The Modeling Agency
Overfitting occurs when a person looking at a set of data develops a model that perfectly explains the previous data points but has poor predictive power.
Overfitting: “…the most important scientific problem you’ve never heard of.” - Nate Silver
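The effect can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions: the data follow a hypothetical rule y = 2x plus noise, the "memorizer" is a 1-nearest-neighbor lookup that reproduces its training points exactly, and the comparison model is an ordinary least-squares line.

```python
import random

rng = random.Random(0)

def make_data(n):
    """Hypothetical data: y = 2x plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 3) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_memorizer(xs, ys):
    """1-nearest-neighbor: predicts the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(300)
test_x, test_y = make_data(300)

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

memorizer_train_mse = mse(memorizer, train_x, train_y)
memorizer_test_mse = mse(memorizer, test_x, test_y)
line_train_mse = mse(line, train_x, train_y)
line_test_mse = mse(line, test_x, test_y)

print("memorizer train/test MSE:", memorizer_train_mse, memorizer_test_mse)
print("line      train/test MSE:", line_train_mse, line_test_mse)
```

The memorizer explains the training data perfectly because it has stored the noise along with the signal; on fresh data that stored noise becomes error, which is the pattern the definition above describes.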
Validation Techniques
Data: Train, Test, Validate
• Holdout Sample
• Monte Carlo
• Bootstrapping
• K-Fold Cross Validation
• Target Shuffling (Placebo)
1. Build model, note strength
2. Randomly shuffle target
3. Build new “bogus” model, save strength
4. Repeat 2 & 3 often to create distribution of strengths
5. Evaluate where “true” strength is on or beyond distribution
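The five target-shuffling steps above can be sketched as follows. This is a simplified illustration, not the presenter's implementation: the "model" is reduced to a single predictor, its "strength" is assumed to be the absolute Pearson correlation, and the data are synthetic.

```python
import random

def correlation(xs, ys):
    """Pearson correlation, used here as the model 'strength' (an assumption)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def target_shuffle(xs, ys, n_shuffles=1000, seed=0):
    """Return (true strength, share of shuffled runs at least as strong)."""
    rng = random.Random(seed)
    true_strength = abs(correlation(xs, ys))          # step 1
    shuffled = list(ys)
    bogus_strengths = []
    for _ in range(n_shuffles):                       # step 4
        rng.shuffle(shuffled)                         # step 2
        bogus_strengths.append(abs(correlation(xs, shuffled)))  # step 3
    # Step 5: where does the true strength sit in the bogus distribution?
    beats = sum(s >= true_strength for s in bogus_strengths)
    return true_strength, (beats + 1) / (n_shuffles + 1)

# Synthetic data with a real relationship between predictor and target.
data_rng = random.Random(1)
xs = list(range(50))
ys = [x + data_rng.gauss(0, 5) for x in xs]

strength, p_value = target_shuffle(xs, ys)
print(f"true strength = {strength:.3f}, placebo p-value = {p_value:.4f}")
```

If the true strength sits well beyond the shuffled distribution, the relationship is unlikely to be an artifact of chance; if it sits inside the distribution, the model is no better than a placebo.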
[Figure: data table with six rows and eight columns]
Unit of Analysis: Each row represents a given consumer, household, SKU, store, etc.
Variables: Each column contains information that is unique to each row.
[Figure: data table with six rows, split into condition variables and one outcome column holding 1, 0, 1, 1, 0, 0]
Unit of Analysis: Each row represents a given consumer, household, SKU, store, etc.
Condition Variables: Everything else we know about each row in the data set.
Outcome Variable: What we are trying to predict; the outcome.
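In code, this layout usually becomes a table of rows that is split into condition variables and an outcome column before modeling. A minimal sketch, with hypothetical column names and values (only the outcome pattern 1, 0, 1, 1, 0, 0 comes from the slide):

```python
# Each dict is one unit of analysis (here, a hypothetical customer).
rows = [
    {"customer_id": 1, "age": 34, "visits_last_year": 12, "responded": 1},
    {"customer_id": 2, "age": 51, "visits_last_year": 2, "responded": 0},
    {"customer_id": 3, "age": 28, "visits_last_year": 9, "responded": 1},
    {"customer_id": 4, "age": 45, "visits_last_year": 15, "responded": 1},
    {"customer_id": 5, "age": 62, "visits_last_year": 1, "responded": 0},
    {"customer_id": 6, "age": 39, "visits_last_year": 3, "responded": 0},
]

ID_COLUMN = "customer_id"      # identifies the row; not used for prediction
OUTCOME_COLUMN = "responded"   # what we are trying to predict

# Condition variables: everything else we know about each row.
condition_names = [
    name for name in rows[0] if name not in (ID_COLUMN, OUTCOME_COLUMN)
]
X = [[row[name] for name in condition_names] for row in rows]
y = [row[OUTCOME_COLUMN] for row in rows]

print("condition variables:", condition_names)
print("outcome:", y)
```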
• Analytics: The insights from data are used to recommend action or to guide decision making rooted in business context. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology. – Wikipedia –
• Advanced Analytics: While the traditional analytical tools that comprise basic business intelligence (BI) examine historical data, tools for advanced analytics focus on forecasting future events and behaviors, allowing businesses to conduct what-if analyses to predict the effects of potential changes in business strategies. Predictive analytics, data mining, big data analytics, and location intelligence are just some of the analytical categories that fall under the heading of advanced analytics. http://searchbusinessanalytics.techtarget.com/definition/advanced-analytics
• There is an increasing use of the term advanced analytics, typically used to describe the technical aspects of analytics, especially predictive modeling, machine learning techniques, and neural networks. http://en.wikipedia.org/wiki/Analytics
What it is:
• Data Expert/Engineer (Business Analyst, Data Analyst & ETL)
• Business process integration
• Data exploration/mining
• Applying math in a predictive fashion to a data set
• Visual analytics used for data exploration
• Storytelling (using visualizations)
• Predictive & Prescriptive Analytics
• Future Orientation
What it is not:
• Reporting
• Dashboarding
• Database modeling of star schema, relational
• Technical system implementation
• Descriptive & Diagnostic Analytics
• Past or current orientation
[Diagram: continuum from Descriptive to Predictive]
• Descriptive: Reporting; Descriptive & Diagnostic
• “Gray” Area: Optimization, Factor Analysis, Cluster Analysis
• Predictive: Predictive Analytics; Classification to Forecasting
Item | Sodium (mg) | Fat (g) | Calories | Item Cost ($) | Number (1 to 5) | Total Sodium (mg) | Total Fat (g) | Total Calories | Total Item Cost ($)
Beef Patty | 50 | 17 | 220 | $0.25 | 5 | 250 | 85 | 1100 | $1.25
Bun | 330 | 9 | 260 | $0.15 | 5 | 1650 | 45 | 1300 | $0.75
Cheese | 310 | 6 | 70 | $0.10 | 1 | 310 | 6 | 70 | $0.10
Onions | 1 | 2 | 10 | $0.09 | 5 | 5 | 10 | 50 | $0.45
Pickles | 260 | 0 | 5 | $0.03 | 1 | 260 | 0 | 5 | $0.03
Lettuce | 3 | 0 | 4 | $0.04 | 3 | 9 | 0 | 12 | $0.12
Ketchup | 160 | 0 | 20 | $0.02 | 3 | 480 | 0 | 60 | $0.06
Tomato | 3 | 0 | 9 | $0.04 | 1 | 3 | 0 | 9 | $0.04
Totals: | | | | | | 2967 | 146 | 2606 | $2.80
Simple math, but can get very complicated.
For example, Trade Spend Optimization.
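The "simple math" on this slide is a quantity-weighted sum: each item's per-unit values are multiplied by its Number and added up. A minimal sketch using the slide's figures, with costs held in cents to avoid floating-point rounding (the variable and function names are illustrative):

```python
# Per-unit values from the slide: (sodium mg, fat g, calories, cost in cents).
PER_UNIT = {
    "Beef Patty": (50, 17, 220, 25),
    "Bun": (330, 9, 260, 15),
    "Cheese": (310, 6, 70, 10),
    "Onions": (1, 2, 10, 9),
    "Pickles": (260, 0, 5, 3),
    "Lettuce": (3, 0, 4, 4),
    "Ketchup": (160, 0, 20, 2),
    "Tomato": (3, 0, 9, 4),
}

# Number of each item (1 to 5), as chosen on the slide.
QUANTITIES = {
    "Beef Patty": 5, "Bun": 5, "Cheese": 1, "Onions": 5,
    "Pickles": 1, "Lettuce": 3, "Ketchup": 3, "Tomato": 1,
}

def totals(quantities):
    """Sum per-unit values weighted by the chosen quantity of each item."""
    result = {"sodium_mg": 0, "fat_g": 0, "calories": 0, "cost_cents": 0}
    for item, count in quantities.items():
        sodium, fat, calories, cost = PER_UNIT[item]
        result["sodium_mg"] += sodium * count
        result["fat_g"] += fat * count
        result["calories"] += calories * count
        result["cost_cents"] += cost * count
    return result

total = totals(QUANTITIES)
print(total, "cost in dollars:", total["cost_cents"] / 100)
```

An optimizer would search over the quantities for the combination that best meets an objective under limits on these totals; with eight items and five choices each there are already 390,625 combinations, which is how simple math gets complicated quickly.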
Identified two groups: Low/High levels and low/high variation
Identified clear seasonality component
1. Ability to analyze data and build reports.
2. Ability to build predictive models.
3. Repeatable analytics. Repeatable process for building and deploying analytic models.
4. Enterprise level analytics. Analytics are used throughout an enterprise and integrated together.
5. Strategy driven analytics. Analytics integrated with an analytic strategy.
Analytic Maturity Level 2 organizations make predictions about future events instead of summarizing past events.
Organizations at Analytic Maturity Level 2 know the difference between business rules and analytics and integrate both of them into deployed systems.
Analytic Maturity Level 3 organizations remove barriers to building models, such as when modelers do not have easy access to the data.
Analytic Maturity Level 3 organizations remove barriers to deploying models.
Analytic Maturity Level 4 organizations use a consistent and repeatable process to produce analytic models across the enterprise.
Analytic Maturity Level 4 organizations integrate analytic models from across the organization to improve decision making.
Analytic Maturity Level 4 organizations have a culture of analytics.
Organizational Evolution (Analytic Strategy)
1. CEO
2. CEO, CFO
3. CEO, CFO, CIO: IT is recognized as having a strategic role
4. CEO, CFO, CIO, CAO: Data & analytics are recognized as having strategic roles
“New research by the McKinsey Global Institute (MGI)
projects that by 2018, the United States alone may face a 50
to 60 percent gap between supply and the requisite demand
of deep analytic talent, i.e., people with advanced training
in statistics or machine learning…
The United States alone faces a shortage of 140,000 to
190,000 people with analytical expertise and 1.5 million
managers and analysts with the skills to understand and
make decisions based on the analysis of big data.”
http://www.mckinsey.com/features/big_data
• Data mining solves a common paradox…
– The more customer data you have, the more difficult and time-consuming it is to effectively analyze and draw meaning from them.
• What should be a gold mine often lies unexplored due to a lack of personnel, time, or expertise.
– Data mining uses a clear business orientation and powerful analytic technologies to quickly and thoroughly explore mountains of data, pulling out the valuable, usable information – the business insight – that you need.
-SPSS White Paper
Older Term vs. “Predictive Analytics”
1. Business Understanding: Achieve a clear understanding of your business challenges
2. Data Understanding: Determine what data are available to mine for answers
3. Data Preparation: Prepare the data in the appropriate format to answer your business questions
4. Modeling: Design data models to meet your requirements
5. Evaluation: Test your results against the goals of your project
6. Deployment: Make the results of the project available to decision makers
Process