What’s in your wallet? Opportunity modeling approaches and applications

36
© 2006 IBM Corporation IBM Research What’s in your wallet? Opportunity modeling approaches and applications Claudia Perlich Chief Scientist Formerly: IBM Research Collaborators: Saharon Rosset, Rick Lawrence, Srujana Merugu, et al.

description

What’s in your wallet? Opportunity modeling approaches and applications. Claudia Perlich Chief Scientist Formerly: IBM Research. Collaborators: Saharon Rosset, Rick Lawrence, Srujana Merugu, et al. Publications & Recognition. 2009 Finalist in the INFORMS Edelman competition - PowerPoint PPT Presentation

Transcript of What’s in your wallet? Opportunity modeling approaches and applications

© 2006 IBM Corporation

IBM Research

What’s in your wallet? Opportunity modeling approaches and applications

Claudia Perlich

Chief Scientist

Formerly: IBM Research

Collaborators: Saharon Rosset, Rick Lawrence, Srujana Merugu, et al.

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010

Publications & Recognition

2009 Finalist in the INFORMS Edelman competition

2007 Data Mining Practice Prize at KDD 2007, “Predictive modeling for marketing”, Runner Up

2007 IBM Outstanding Technical Award, “Opportunity models and validation for the Market Alignment Program (MAP)”

2005 IBM Research Award for contributions to Market Alignment Program (MAP)

“Operations Research Improves Sales Force Productivity at IBM” R. Lawrence, C.Perlich, S.Rosset, et al. Forthcoming INFORMS Journal on Computing

“Analytics-driven solutions for customer targeting and sales force allocation”, J. Arroyo, M. Callahan, M. Collins, A. Ershov, I. Khabibrakhmanov, R. Lawrence, S.Mahatma, M. Niemaszyk, C. Perlich, S. Rosset, S. Weiss. IBM Systems Journal 46 (4) (2007)

“A Data Mining Case Study: Analytics-driven solutions for customer targeting and sales force allocation” R. Lawrence, C. Perlich, S. Rosset, I. Khabibrakhmanov, S. Mahatma, S. Weiss. Second Workshop on Data Mining Case Studies and Practice Prize at SIGKDD 2007

“High Quantile Modeling for Customer Wallet Estimation with Other Applications” Perlich, C., S. Rosset, R. Lawrence, and B. Zadrozny, 13th SIGKDD International Conference on Knowledge Discovery and Data Mining 2007

“Quantile Modeling for Marketing”, Perlich, C., S. Rosset and B. Zadrozny. Workshop on Data Mining for Business Applications at 12th SIGKDD International Conference on Knowledge Discovery and Data Mining 2006

“A New Multi-View Regression Approach with an Application to Customer Wallet Estimation” Merugu, S. S.Rosset and C. Perlich. 12th SIGKDD International Conference on Knowledge Discovery and Data Mining 2006

“Wallet Estimation Models” Rosset, S., C. Perlich, B. Zadrozny, S. Merugu, S. Weiss and R. Lawrence. International Workshop on Customer Relationship Management: Data Mining Meets Marketing, NYU 2005

“Modeling Quantiles” Perlich, C., S. Rosset and B.Zadrozny. In Encyclopedia of Data Warehousing and Mining, Second Edition

© 2011 IBM Corporation

Presentation Outline

Wallet Definitions and Business Considerations

Modeling Approaches

Evaluation of Wallet Models

Business Impact – Market Alignment Project (MAP)

© 2011 IBM Corporation

What is Wallet/Opportunity? Total amount of money that the customer

(company) can spend in a certain product category in a given period

Company Revenue

IT Wallet

IBM Sales

IBM sales IT wallet Company revenue

Company Revenue

© 2011 IBM Corporation

Why Are We Interested in Wallet?

Customer targeting

– Focus on acquiring customers with high wallet

– Evaluate customers’ growth potential by combining wallet estimates and sales history

– For existing customers, focus on high wallet, low share-of-wallet customers

Sales force management

– Make resource assignment decisions

• Concentrate resources on untapped

– Evaluate success of sales personnel and sales channel by share-of-wallet they attain

© 2011 IBM Corporation

Wallet Modeling Challenge

The customer wallet is never observed

–Nothing to “fit a model”

–Even if you have a model, how do you evaluate it?

Need a predictive approach from available data

– Firmographics (Sales, Industry, Employees)

– IBM Sales and transaction history

© 2011 IBM Corporation

Existing Approaches to Wallet Modeling Bottom up: learn a model for individual companies

– Get “true” wallet values through surveys

– Very expensive

– Small, typically not representative sample

– Unreliable because ill defined

– Coarse level of IT categories

Top down: this approach was used by IBM Market Intelligence in North America (called ITEM)

– Use econometric models to assign total “opportunity” to segment (e.g., industry geography)

– Assign to companies in segment proportional to their size

– Completely Ad hoc without any validation

© 2011 IBM Corporation

Multiple Wallet Definitions TOTAL: Total customer available budget in the relevant area (e.g.,

total IT)

– Can we really hope to attain all of it?

SERVED: Total customer spending on IT products covered by IBM

– Better definition for our marketing purposes

REALISTIC: IBM spending of the “best similar customers”

REALISTIC SERVED TOTAL

Company Revenue

TOTAL

SERVED

REALISTIC

IBM Sales

© 2011 IBM Corporation Slide 9

We formulate the problem as Quantile Estimation Imagine 1,000 customers with identical customer features

Consider the distribution of the IBM Sales to these customers:

Opportunity is High Quantile

BestCustomers

IBM Sales

© 2011 IBM Corporation

Distribution of IBM sales s to the customer given customer attributes x: s|x ~ f,x

Two obvious ways to get at the pth percentile:

– Estimate the conditional by integrating over a neighborhood of similar customers Take pth percentile of spending in neighborhood

– Create a global model for pth percentile Build global regression models, e.g.,

Formally: Percentile of Conditional

),(~| 2 xNxs

E(s|x) REALISTIC

© 2011 IBM Corporation

Evaluation and Validation- Quantile Loss

- MAP Feedback

‘Ad HOC’

Overview of analytical approaches

kNN-Industry

- Size

Optimization

Quantile Regression Decomposition

Model Form - Linear - Decision Tree - Quanting

- Linear Model- Adjustment

General kNN - K - Distance - Features

© 2011 IBM Corporation

K-Nearest Neighbor

Distance metric:

– Industry match

– Euclidean distance on firmographics and past IBM sales

– Scaling issung

Neighborhood sizes (k):

– Neighborhood size has significant effect on prediction quality

Prediction:

– Quantile of firms in the neighborhoodIndustry

Employees Revenue

Universe of IBM customers with D&B information

Neighborhood of target company

Target company i

Fre

qu

en

cy

IBM Sales

Wallet Estimate

© 2011 IBM Corporation

Global Estimation: the Quantile Loss Function

The mean minimizes a sum of squared residuals:

The median minimizes a sum of absolute residuals.

The p-th quantile minimizes an asymmetrically weighted sum of absolute residuals:

n

iiy

1

2)(min

n

iim my

1

||min

n

iiipy yyL

i1

ˆ )ˆ,(min

-3 -2 -1 0 1 2 3

01

23

4

p=0.8

p=0.5 (absolute loss)

yyyyp

yyyypyyLp ˆ if )ˆ()1(

ˆ if )ˆ()ˆ,(

© 2011 IBM Corporation

Quantile Regression Traditional Regression:

– Estimation of conditional expected value by minimizing sum of squares:

Quantile Regression:

– Minimize Quantile loss:

Implementation:

– assume linear function , solution using linear programming

n

iiip xfyL

1

)),(,(min

quantile regression

loss function

n

iii xfy

1

2)),((min

yyyyp

yyyypyyLp ˆ if )ˆ()1(

ˆ if )ˆ()ˆ,(

xy

© 2011 IBM Corporation

Linear Quantile Regression (Koenker)

Slide 15

10 20 30 40 50 60 70 801

2

3

4

5

6

7

8

9

Firm Sales

IBM

Re

venu

e

Company Sales

IBM

Re

ven

ue

Opportunity for C 2

Opportunity for C 1

C1

C2

10 20 30 40 50 60 70 801

2

3

4

5

6

7

8

9

Firm Sales

IBM

Re

venu

e

Company Sales

IBM

Re

ven

ue

Opportunity for C 2

Opportunity for C1

C1

C2

10 20 30 40 50 60 70 801

2

3

4

5

6

7

8

9

10 20 30 40 50 60 70 801

2

3

4

5

6

7

8

9

Firm Sales

IBM

Re

venu

e

Company Sales

IBM

Re

ven

ue

Opportunity for C 2

Opportunity for C 1

C1

C2

10 20 30 40 50 60 70 801

2

3

4

5

6

7

8

9

20 30 40 50 60 70 801

2

3

4

5

6

7

8

9

Firm Sales

IBM

Re

venu

e

Company Sales

IBM

Re

ven

ue

Opportunity for C 2

Opportunity for C1

C1

C2

yyyyp

yyyypyyLp ˆ if )ˆ()1(

ˆ if )ˆ()ˆ,(

© 2011 IBM Corporation

Quantile Regression Tree Motivation:

– Identify a locally optimal definition of neighborhood

– Inherently nonlinear

Adjustments of M5/CART for Quantile prediction:

– Predict the percentile rather than the mean of the leaf

– Splitting/pruning criteria: Quantile or squared error loss?

Industry = ‘Banking’

Sales<100K

IBM Rev 2003>10K

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

no

yes

no

no

yes

yes

Industry = ‘Banking’

Sales<100K

IBM Rev 2003>10K

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

Fre

qu

ency

IBM Sales

Wallet Estimate

no

yes

no

no

yes

yes

© 2011 IBM Corporation

Quanting

Transform the quantile regression into a series of classification

– non-linearity, if non-linear classifiers are used

– theoretical guarantee: if the classifiers minimize the expected classification error, the quanting algorithm minimizes the quantile loss

Training

– Each classifier is trained to decide whether or not the conditional quantile is above a threshold T

– Original observations are re-labeled and re-weighted to train each classifier appropriately similar to the quantile loss

Prediction

– Find the threshold where the classifier predictions switch from one to zero

0

0

C5000

00111

00011

C6000C4000C3000C2000C1000

250

Prediction

350

© 2011 IBM Corporation

(Graphical model approach to SERVED Wallets)

Wallet is unobserved, all other variables are

Two families of variables --- firmographics and IBM relationship are conditionally independent given wallet

We develop inference procedures and demonstrate them

Theoretically attractive, practically questionable

Company

firmographics

IT spendwith IBM

Historical relationship

with IBM

SERVEDWallet

© 2011 IBM Corporation 19

Empirical Evaluation of Quantile Estimation

Setup

– Four domains with relevant quantile modeling problems

– Performance on test set in terms of 0.9 quantile loss

– Approaches: Linear quantile regression, Q-kNN, Quantile trees, Bagged quantile trees, Quanting

Baselines

– Best constant

– Traditional regression models for expected values, adjusted under Gaussian assumption (+1.28)

© 2011 IBM Corporation 20

Performance on Quantile Loss

Best result in BOLD, variance in parenthesis

Observations– Regression + 1.28 is not competitive (because the residuals are not normal)

– Splitting criterion is irrelevant

– Q-kNN is not competitive

– Quanting (using decision trees) and bagged quantile tree perform comparably

© 2011 IBM Corporation 21

Additional Insights

Irrelevance of splitting criterion

– Good news! Because squared error is much more efficient

– Reason:

• SSE measures the decrease of the conditional variance• SSE measures the ‘goodness’ of the local neighborhood• Good estimate of the conditional distribution -> good quantile

Linear model does well on IBM and KDD-CUP98

– Match of model bias

• Both domains have strong autocorrelation • Last years donation/revenue is a great predictor of this years• Hard for tree-based models to express linear relationships

© 2011 IBM Corporation

Evaluating REALISTIC Wallet

We still don’t know the truth

Quantile loss only evaluates the ability to predict quantile – but is a quantile a good wallet?

– Which quantile 80%, 90%, 99%?

Distribution is highly skewed

– Most error measure are very sensitive to outliers

– What is the right scale ? Log?

Even good survey data is not the truth

– Not available on a IBM product level

– Probably irrelevant for the REALISTIC wallet

Slide 23

MAP: Market Alignment Program Re-deploying IBM sales resources

Old Sales Process

Use prior-year revenue as proxy for future revenue generation

Assign quota based largely on recent revenue history

New Sales Process Using MAP ...

Use OR models to develop forward-looking view of opportunity by client

Assign quota based on future opportunity and productivity

Focused on Existing Relationship Focused on Future Opportunity

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010Slide 24

Integrated Data

The MAP process and components

IBM Sales Team Interviews

MAP Models

MAP Web Interface

Modeled Opportunity

Validated Opportunity

MAP Workshops

Data Model

Realign Sales Resources

Expert Feedback

Model Estimates

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010Slide 25

Explanatory features are extracted from multiple sources

IBM Client Transactions

Dun & Bradstreet(D&B) Data

IBM Transactional Features

Entity MatchingFeature Extraction

IndustryRevenue (Rank)EmployeesState D&B Structure Code…

Prior-year revenue in other product brands

Long-term revenue in other product brands

D&B Features

Train model against current year revenue based on previous year

Apply model by rolling forward to current year and predicting future opportunity

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010Slide 26

MAP Validation and Expert Feedback

0

2

4

6

8

10

12

14

16

18

20

0 2 4 6 8 10 12 14 16 18 20

Expert Feedback

MODEL_OPPTY

Experts reduced opportunity to 0(15%)

Experts acceptopportunity (45%)

Experts changeopportunity (40%)

Increase (17%)

Decrease (23%)

kNN Opportunity

Exp

ert

-va

lida

ted

Op

port

unity

0

2

4

6

8

10

12

14

16

18

20

0 2 4 6 8 10 12 14 16 18 20

Expert Feedback

MODEL_OPPTY

Experts reduced opportunity to 0(15%)

Experts acceptopportunity (45%)

Experts changeopportunity (40%)

Increase (17%)

Decrease (23%)

0

2

4

6

8

10

12

14

16

18

20

0 2 4 6 8 10 12 14 16 18 20

Expert Feedback

MODEL_OPPTY

Experts reduced opportunity to 0(15%)

Experts acceptopportunity (45%)

Experts changeopportunity (40%)

Increase (17%)

Decrease (23%)

kNN Opportunity

Exp

ert

-va

lida

ted

Op

port

unity

Model Opportunity (log)

Exp

ert

Val

idat

es O

pp

ort

un

ity

(lo

g)

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010

Observations

Many accounts are set for external reasons to zero Exclude from evaluation since no model can predict the competitive

environment

Exponential distribution of opportunities Evaluation on the original (non-log) scale suffers from huge outliers

Experts seem to make percentage adjustments Consider log scale evaluation in addition to original scale and root

as intermediate Suspect strong “anchoring” bias, 45% of opportunities were not

touched

© 2011 IBM Corporation

Evaluation Measures

Different scales to avoid outlier artifacts

– Original: e = model - expert

– Root: e = root(model) - root(expert)

– Log: e = log(model) - log(expert)

Statistics on the distribution of the errors

– Mean of e2

– Mean of |e|

Total of 6 criteria

© 2011 IBM Corporation

Model Comparison Results

Model Rational DB2 Tivoli

Displayed Model (kNN) 6 6 4 5 6 6

Max 03-05 Revenue 1 1 0 3 1 4

Linear Quantile 0.8 5 6 2 4 3 5

Regression Tree 1 3 2 4 1 2

Q-kNN 50 + flooring 2 3 6 6 4 6

Decomposition Center 0 0 3 5 0 4

Quantile Tree 0.8 0 1 2 4 1 4

(Anchoring)

(Best)

We count how often a model scores within the top 10 and 20 for each of the 6 measures:

© 2011 IBM Corporation

MAP Experiments Conclusions Q-kNN performs very well after flooring but is typically

inferior prior to flooring

80th percentile Linear quantile regression performs consistently well (flooring has a minor effect)

Experts are strongly influenced by displayed opportunity (and displayed revenue of previous years)

Models without last year’s revenue don’t perform well

Use Linear Quantile Regression with q=0.8 in MAP 06

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010Slide 31

Scope and some of the tedious details

• 3 Million customers

• 20 Brands (Product categories)

• 4 Markets

• Annual model refresh

• The Quantile is chosen for each brand and market separately based on market insights on IBM market share

• Whitespace model for customers with no prior IBM revenue are build using the same methodology but only D&B features

• Entity matching between IBM customer records and D&B hierarchy is HARD

• Evaluation remains somewhat subjective and we collect feedback

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010Slide 32

In 2008 MAP covered 50+ countries and ~100% of IBM revenue and opportunity

g

Resources shifted to high growth Markets and Accounts Shifted resources performed >10 pts better

2005 2006 2007 2008 ,

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010Slide 33

Va

lida

ted

Re

ven

ue

Op

po

rtu

nit

y

Prior Year Actual Revenue

Core GrowthModest growth

potential

MAP output drives account segmentation and resource allocation decisions

Core OptimizeFlat or declining

InvestHigh

growth potential

Opportunistic

Small Accounts

Resource implications

Shift resources to Core Growth and Invest Accounts

Reduce resource overlap 8,000 sellers shifted

(2006 – 2009 )

Sellers shifted

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010Slide 34

MAP drove significant revenue impact in 2008

[3,000 Sellers] x [$2M Revenue / Seller] x [10% Performance Improvement]

Va

lida

ted

Re

ven

ue

Op

po

rtu

nit

y

Prior Year Actual Revenue

Core Growth

Core Optimize

Invest

Opportunistic

3,000 sellers shifted (2008)

$53B of Revenue

$9B of Revenue

30,000 sellers

= $600M (2008 Revenue Impact)

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010

MAP Take away

Interesting predictive modeling task that calls for an unorthodox loss function

Combination of data mining AND expert feedback Integration into the annual sales management cycle Significant effort on data collection and preparation Many additional analytical tools were build on top of

MAP Territory definition and assignmentQuota assignment

Substantial impact on the bottom line

Predictive Modeling Group – Mathematical Sciences – IBM Research

© Copyright IBM Corporation 2010

Questions?