What’s in your wallet? Opportunity modeling approaches and applications
-
Upload
axel-leblanc -
Category
Documents
-
view
29 -
download
0
description
Transcript of What’s in your wallet? Opportunity modeling approaches and applications
© 2006 IBM Corporation
IBM Research
What’s in your wallet? Opportunity modeling approaches and applications
Claudia Perlich
Chief Scientist
Formerly: IBM Research
Collaborators: Saharon Rosset, Rick Lawrence, Srujana Merugu, et al.
Predictive Modeling Group – Mathematical Sciences – IBM Research
© Copyright IBM Corporation 2010
Publications & Recognition
2009 Finalist in the INFORMS Edelman competition
2007 Data Mining Practice Prize at KDD 2007, “Predictive modeling for marketing”, Runner Up
2007 IBM Outstanding Technical Award, “Opportunity models and validation for the Market Alignment Program (MAP)”
2005 IBM Research Award for contributions to Market Alignment Program (MAP)
“Operations Research Improves Sales Force Productivity at IBM” R. Lawrence, C.Perlich, S.Rosset, et al. Forthcoming INFORMS Journal on Computing
“Analytics-driven solutions for customer targeting and sales force allocation”, J. Arroyo, M. Callahan, M. Collins, A. Ershov, I. Khabibrakhmanov, R. Lawrence, S.Mahatma, M. Niemaszyk, C. Perlich, S. Rosset, S. Weiss. IBM Systems Journal 46 (4) (2007)
“A Data Mining Case Study: Analytics-driven solutions for customer targeting and sales force allocation” R. Lawrence, C. Perlich, S. Rosset, I. Khabibrakhmanov, S. Mahatma, S. Weiss. Second Workshop on Data Mining Case Studies and Practice Prize at SIGKDD 2007
“High Quantile Modeling for Customer Wallet Estimation with Other Applications” Perlich, C., S. Rosset, R. Lawrence, and B. Zadrozny, 13th SIGKDD International Conference on Knowledge Discovery and Data Mining 2007
“Quantile Modeling for Marketing”, Perlich, C., S. Rosset and B. Zadrozny. Workshop on Data Mining for Business Applications at 12th SIGKDD International Conference on Knowledge Discovery and Data Mining 2006
“A New Multi-View Regression Approach with an Application to Customer Wallet Estimation” Merugu, S. S.Rosset and C. Perlich. 12th SIGKDD International Conference on Knowledge Discovery and Data Mining 2006
“Wallet Estimation Models” Rosset, S., C. Perlich, B. Zadrozny, S. Merugu, S. Weiss and R. Lawrence. International Workshop on Customer Relationship Management: Data Mining Meets Marketing, NYU 2005
“Modeling Quantiles” Perlich, C., S. Rosset and B.Zadrozny. In Encyclopedia of Data Warehousing and Mining, Second Edition
© 2011 IBM Corporation
Presentation Outline
Wallet Definitions and Business Considerations
Modeling Approaches
Evaluation of Wallet Models
Business Impact – Market Alignment Project (MAP)
© 2011 IBM Corporation
What is Wallet/Opportunity? Total amount of money that the customer
(company) can spend in a certain product category in a given period
Company Revenue
IT Wallet
IBM Sales
IBM sales IT wallet Company revenue
Company Revenue
© 2011 IBM Corporation
Why Are We Interested in Wallet?
Customer targeting
– Focus on acquiring customers with high wallet
– Evaluate customers’ growth potential by combining wallet estimates and sales history
– For existing customers, focus on high wallet, low share-of-wallet customers
Sales force management
– Make resource assignment decisions
• Concentrate resources on untapped
– Evaluate success of sales personnel and sales channel by share-of-wallet they attain
© 2011 IBM Corporation
Wallet Modeling Challenge
The customer wallet is never observed
–Nothing to “fit a model”
–Even if you have a model, how do you evaluate it?
Need a predictive approach from available data
– Firmographics (Sales, Industry, Employees)
– IBM Sales and transaction history
© 2011 IBM Corporation
Existing Approaches to Wallet Modeling Bottom up: learn a model for individual companies
– Get “true” wallet values through surveys
– Very expensive
– Small, typically not representative sample
– Unreliable because ill defined
– Coarse level of IT categories
Top down: this approach was used by IBM Market Intelligence in North America (called ITEM)
– Use econometric models to assign total “opportunity” to segment (e.g., industry geography)
– Assign to companies in segment proportional to their size
– Completely Ad hoc without any validation
© 2011 IBM Corporation
Multiple Wallet Definitions TOTAL: Total customer available budget in the relevant area (e.g.,
total IT)
– Can we really hope to attain all of it?
SERVED: Total customer spending on IT products covered by IBM
– Better definition for our marketing purposes
REALISTIC: IBM spending of the “best similar customers”
REALISTIC SERVED TOTAL
Company Revenue
TOTAL
SERVED
REALISTIC
IBM Sales
© 2011 IBM Corporation Slide 9
We formulate the problem as Quantile Estimation Imagine 1,000 customers with identical customer features
Consider the distribution of the IBM Sales to these customers:
Opportunity is High Quantile
BestCustomers
IBM Sales
© 2011 IBM Corporation
Distribution of IBM sales s to the customer given customer attributes x: s|x ~ f,x
Two obvious ways to get at the pth percentile:
– Estimate the conditional by integrating over a neighborhood of similar customers Take pth percentile of spending in neighborhood
– Create a global model for pth percentile Build global regression models, e.g.,
Formally: Percentile of Conditional
),(~| 2 xNxs
E(s|x) REALISTIC
© 2011 IBM Corporation
Evaluation and Validation- Quantile Loss
- MAP Feedback
‘Ad HOC’
Overview of analytical approaches
kNN-Industry
- Size
Optimization
Quantile Regression Decomposition
Model Form - Linear - Decision Tree - Quanting
- Linear Model- Adjustment
General kNN - K - Distance - Features
© 2011 IBM Corporation
K-Nearest Neighbor
Distance metric:
– Industry match
– Euclidean distance on firmographics and past IBM sales
– Scaling issung
Neighborhood sizes (k):
– Neighborhood size has significant effect on prediction quality
Prediction:
– Quantile of firms in the neighborhoodIndustry
Employees Revenue
Universe of IBM customers with D&B information
Neighborhood of target company
Target company i
Fre
qu
en
cy
IBM Sales
Wallet Estimate
© 2011 IBM Corporation
Global Estimation: the Quantile Loss Function
The mean minimizes a sum of squared residuals:
The median minimizes a sum of absolute residuals.
The p-th quantile minimizes an asymmetrically weighted sum of absolute residuals:
n
iiy
1
2)(min
n
iim my
1
||min
n
iiipy yyL
i1
ˆ )ˆ,(min
-3 -2 -1 0 1 2 3
01
23
4
p=0.8
p=0.5 (absolute loss)
yyyyp
yyyypyyLp ˆ if )ˆ()1(
ˆ if )ˆ()ˆ,(
© 2011 IBM Corporation
Quantile Regression Traditional Regression:
– Estimation of conditional expected value by minimizing sum of squares:
Quantile Regression:
– Minimize Quantile loss:
Implementation:
– assume linear function , solution using linear programming
n
iiip xfyL
1
)),(,(min
quantile regression
loss function
n
iii xfy
1
2)),((min
yyyyp
yyyypyyLp ˆ if )ˆ()1(
ˆ if )ˆ()ˆ,(
xy
© 2011 IBM Corporation
Linear Quantile Regression (Koenker)
Slide 15
10 20 30 40 50 60 70 801
2
3
4
5
6
7
8
9
Firm Sales
IBM
Re
venu
e
Company Sales
IBM
Re
ven
ue
Opportunity for C 2
Opportunity for C 1
C1
C2
10 20 30 40 50 60 70 801
2
3
4
5
6
7
8
9
Firm Sales
IBM
Re
venu
e
Company Sales
IBM
Re
ven
ue
Opportunity for C 2
Opportunity for C1
C1
C2
10 20 30 40 50 60 70 801
2
3
4
5
6
7
8
9
10 20 30 40 50 60 70 801
2
3
4
5
6
7
8
9
Firm Sales
IBM
Re
venu
e
Company Sales
IBM
Re
ven
ue
Opportunity for C 2
Opportunity for C 1
C1
C2
10 20 30 40 50 60 70 801
2
3
4
5
6
7
8
9
20 30 40 50 60 70 801
2
3
4
5
6
7
8
9
Firm Sales
IBM
Re
venu
e
Company Sales
IBM
Re
ven
ue
Opportunity for C 2
Opportunity for C1
C1
C2
yyyyp
yyyypyyLp ˆ if )ˆ()1(
ˆ if )ˆ()ˆ,(
© 2011 IBM Corporation
Quantile Regression Tree Motivation:
– Identify a locally optimal definition of neighborhood
– Inherently nonlinear
Adjustments of M5/CART for Quantile prediction:
– Predict the percentile rather than the mean of the leaf
– Splitting/pruning criteria: Quantile or squared error loss?
Industry = ‘Banking’
Sales<100K
IBM Rev 2003>10K
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
no
yes
no
no
yes
yes
Industry = ‘Banking’
Sales<100K
IBM Rev 2003>10K
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
Fre
qu
ency
IBM Sales
Wallet Estimate
no
yes
no
no
yes
yes
© 2011 IBM Corporation
Quanting
Transform the quantile regression into a series of classification
– non-linearity, if non-linear classifiers are used
– theoretical guarantee: if the classifiers minimize the expected classification error, the quanting algorithm minimizes the quantile loss
Training
– Each classifier is trained to decide whether or not the conditional quantile is above a threshold T
– Original observations are re-labeled and re-weighted to train each classifier appropriately similar to the quantile loss
Prediction
– Find the threshold where the classifier predictions switch from one to zero
0
0
C5000
00111
00011
C6000C4000C3000C2000C1000
250
Prediction
350
© 2011 IBM Corporation
(Graphical model approach to SERVED Wallets)
Wallet is unobserved, all other variables are
Two families of variables --- firmographics and IBM relationship are conditionally independent given wallet
We develop inference procedures and demonstrate them
Theoretically attractive, practically questionable
Company
firmographics
IT spendwith IBM
Historical relationship
with IBM
SERVEDWallet
© 2011 IBM Corporation 19
Empirical Evaluation of Quantile Estimation
Setup
– Four domains with relevant quantile modeling problems
– Performance on test set in terms of 0.9 quantile loss
– Approaches: Linear quantile regression, Q-kNN, Quantile trees, Bagged quantile trees, Quanting
Baselines
– Best constant
– Traditional regression models for expected values, adjusted under Gaussian assumption (+1.28)
© 2011 IBM Corporation 20
Performance on Quantile Loss
Best result in BOLD, variance in parenthesis
Observations– Regression + 1.28 is not competitive (because the residuals are not normal)
– Splitting criterion is irrelevant
– Q-kNN is not competitive
– Quanting (using decision trees) and bagged quantile tree perform comparably
© 2011 IBM Corporation 21
Additional Insights
Irrelevance of splitting criterion
– Good news! Because squared error is much more efficient
– Reason:
• SSE measures the decrease of the conditional variance• SSE measures the ‘goodness’ of the local neighborhood• Good estimate of the conditional distribution -> good quantile
Linear model does well on IBM and KDD-CUP98
– Match of model bias
• Both domains have strong autocorrelation • Last years donation/revenue is a great predictor of this years• Hard for tree-based models to express linear relationships
© 2011 IBM Corporation
Evaluating REALISTIC Wallet
We still don’t know the truth
Quantile loss only evaluates the ability to predict quantile – but is a quantile a good wallet?
– Which quantile 80%, 90%, 99%?
Distribution is highly skewed
– Most error measure are very sensitive to outliers
– What is the right scale ? Log?
Even good survey data is not the truth
– Not available on a IBM product level
– Probably irrelevant for the REALISTIC wallet
Slide 23
MAP: Market Alignment Program Re-deploying IBM sales resources
Old Sales Process
Use prior-year revenue as proxy for future revenue generation
Assign quota based largely on recent revenue history
New Sales Process Using MAP ...
Use OR models to develop forward-looking view of opportunity by client
Assign quota based on future opportunity and productivity
Focused on Existing Relationship Focused on Future Opportunity
Predictive Modeling Group – Mathematical Sciences – IBM Research
© Copyright IBM Corporation 2010Slide 24
Integrated Data
The MAP process and components
IBM Sales Team Interviews
MAP Models
MAP Web Interface
Modeled Opportunity
Validated Opportunity
MAP Workshops
Data Model
Realign Sales Resources
Expert Feedback
Model Estimates
Predictive Modeling Group – Mathematical Sciences – IBM Research
© Copyright IBM Corporation 2010Slide 25
Explanatory features are extracted from multiple sources
IBM Client Transactions
Dun & Bradstreet(D&B) Data
IBM Transactional Features
Entity MatchingFeature Extraction
IndustryRevenue (Rank)EmployeesState D&B Structure Code…
Prior-year revenue in other product brands
Long-term revenue in other product brands
…
D&B Features
Train model against current year revenue based on previous year
Apply model by rolling forward to current year and predicting future opportunity
Predictive Modeling Group – Mathematical Sciences – IBM Research
© Copyright IBM Corporation 2010Slide 26
MAP Validation and Expert Feedback
0
2
4
6
8
10
12
14
16
18
20
0 2 4 6 8 10 12 14 16 18 20
Expert Feedback
MODEL_OPPTY
Experts reduced opportunity to 0(15%)
Experts acceptopportunity (45%)
Experts changeopportunity (40%)
Increase (17%)
Decrease (23%)
kNN Opportunity
Exp
ert
-va
lida
ted
Op
port
unity
0
2
4
6
8
10
12
14
16
18
20
0 2 4 6 8 10 12 14 16 18 20
Expert Feedback
MODEL_OPPTY
Experts reduced opportunity to 0(15%)
Experts acceptopportunity (45%)
Experts changeopportunity (40%)
Increase (17%)
Decrease (23%)
0
2
4
6
8
10
12
14
16
18
20
0 2 4 6 8 10 12 14 16 18 20
Expert Feedback
MODEL_OPPTY
Experts reduced opportunity to 0(15%)
Experts acceptopportunity (45%)
Experts changeopportunity (40%)
Increase (17%)
Decrease (23%)
kNN Opportunity
Exp
ert
-va
lida
ted
Op
port
unity
Model Opportunity (log)
Exp
ert
Val
idat
es O
pp
ort
un
ity
(lo
g)
Predictive Modeling Group – Mathematical Sciences – IBM Research
© Copyright IBM Corporation 2010
Observations
Many accounts are set for external reasons to zero Exclude from evaluation since no model can predict the competitive
environment
Exponential distribution of opportunities Evaluation on the original (non-log) scale suffers from huge outliers
Experts seem to make percentage adjustments Consider log scale evaluation in addition to original scale and root
as intermediate Suspect strong “anchoring” bias, 45% of opportunities were not
touched
© 2011 IBM Corporation
Evaluation Measures
Different scales to avoid outlier artifacts
– Original: e = model - expert
– Root: e = root(model) - root(expert)
– Log: e = log(model) - log(expert)
Statistics on the distribution of the errors
– Mean of e2
– Mean of |e|
Total of 6 criteria
© 2011 IBM Corporation
Model Comparison Results
Model Rational DB2 Tivoli
Displayed Model (kNN) 6 6 4 5 6 6
Max 03-05 Revenue 1 1 0 3 1 4
Linear Quantile 0.8 5 6 2 4 3 5
Regression Tree 1 3 2 4 1 2
Q-kNN 50 + flooring 2 3 6 6 4 6
Decomposition Center 0 0 3 5 0 4
Quantile Tree 0.8 0 1 2 4 1 4
(Anchoring)
(Best)
We count how often a model scores within the top 10 and 20 for each of the 6 measures:
© 2011 IBM Corporation
MAP Experiments Conclusions Q-kNN performs very well after flooring but is typically
inferior prior to flooring
80th percentile Linear quantile regression performs consistently well (flooring has a minor effect)
Experts are strongly influenced by displayed opportunity (and displayed revenue of previous years)
Models without last year’s revenue don’t perform well
Use Linear Quantile Regression with q=0.8 in MAP 06
Predictive Modeling Group – Mathematical Sciences – IBM Research
© Copyright IBM Corporation 2010Slide 31
Scope and some of the tedious details
• 3 Million customers
• 20 Brands (Product categories)
• 4 Markets
• Annual model refresh
• The Quantile is chosen for each brand and market separately based on market insights on IBM market share
• Whitespace model for customers with no prior IBM revenue are build using the same methodology but only D&B features
• Entity matching between IBM customer records and D&B hierarchy is HARD
• Evaluation remains somewhat subjective and we collect feedback
Predictive Modeling Group – Mathematical Sciences – IBM Research
© Copyright IBM Corporation 2010Slide 32
In 2008 MAP covered 50+ countries and ~100% of IBM revenue and opportunity
g
Resources shifted to high growth Markets and Accounts Shifted resources performed >10 pts better
2005 2006 2007 2008 ,
Predictive Modeling Group – Mathematical Sciences – IBM Research
© Copyright IBM Corporation 2010Slide 33
Va
lida
ted
Re
ven
ue
Op
po
rtu
nit
y
Prior Year Actual Revenue
Core GrowthModest growth
potential
MAP output drives account segmentation and resource allocation decisions
Core OptimizeFlat or declining
InvestHigh
growth potential
Opportunistic
Small Accounts
Resource implications
Shift resources to Core Growth and Invest Accounts
Reduce resource overlap 8,000 sellers shifted
(2006 – 2009 )
Sellers shifted
Predictive Modeling Group – Mathematical Sciences – IBM Research
© Copyright IBM Corporation 2010Slide 34
MAP drove significant revenue impact in 2008
[3,000 Sellers] x [$2M Revenue / Seller] x [10% Performance Improvement]
Va
lida
ted
Re
ven
ue
Op
po
rtu
nit
y
Prior Year Actual Revenue
Core Growth
Core Optimize
Invest
Opportunistic
3,000 sellers shifted (2008)
$53B of Revenue
$9B of Revenue
30,000 sellers
= $600M (2008 Revenue Impact)
Predictive Modeling Group – Mathematical Sciences – IBM Research
© Copyright IBM Corporation 2010
MAP Take away
Interesting predictive modeling task that calls for an unorthodox loss function
Combination of data mining AND expert feedback Integration into the annual sales management cycle Significant effort on data collection and preparation Many additional analytical tools were build on top of
MAP Territory definition and assignmentQuota assignment
Substantial impact on the bottom line