Predicting Customer Conversion with Random Forests

Predicting Customer Conversion with Random Forests: A Decision Trees Case Study
Daniel Gerlanc, Principal, Enplus Advisors, Inc.
www.enplusadvisors.com [email protected]


Talk given for New England Artificial Intelligence on October 10, 2012.

Transcript of Predicting Customer Conversion with Random Forests

Page 1: Predicting Customer Conversion with Random Forests

Predicting Customer Conversion with Random Forests

Daniel Gerlanc, Principal, Enplus Advisors, [email protected]

A Decision Trees Case Study

Page 2: Predicting Customer Conversion with Random Forests

Topics

•Objectives: Research Question

•Data: Bank Prospect Conversion

•Methods: Decision Trees, Random Forests

•Results

Page 3: Predicting Customer Conversion with Random Forests

Objective

•Which customers or prospects should you call today?

•To whom should you offer incentives?

Page 4: Predicting Customer Conversion with Random Forests

Dataset

•Direct Marketing campaign for bank loans

•http://archive.ics.uci.edu/ml/datasets/Bank+Marketing

•45,211 records, 17 features (16 inputs plus the outcome)

Page 5: Predicting Customer Conversion with Random Forests

Dataset

Page 6: Predicting Customer Conversion with Random Forests

Decision Trees

Page 7: Predicting Customer Conversion with Random Forests

Decision Trees

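[Figure: example decision tree for deciding whether to wear a coat. Decision nodes "Sunny" and "Windy" with yes/no branches; leaves "Coat" and "No Coat".]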

Page 8: Predicting Customer Conversion with Random Forests

Statistical Decision Trees

•Randomness

•May not know the relationships ahead of time

Page 9: Predicting Customer Conversion with Random Forests

Decision Trees

Page 10: Predicting Customer Conversion with Random Forests

Splitting

Deterministic process
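Splitting is deterministic in the sense that, for a given dataset and impurity measure, the search returns the same best split every time. The following is a minimal base-R sketch of that search on a single numeric feature, assuming Gini impurity (the default criterion rpart uses for classification). The function names `gini` and `best.split` and the toy data are illustrative and do not come from the talk.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini <- function(y) {
  if (length(y) == 0) return(0)
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Exhaustively search thresholds on one numeric feature and return the
# one that minimises the size-weighted impurity of the two child nodes.
# Assumes x has at least two distinct values.
best.split <- function(x, y) {
  xs <- sort(unique(x))
  # Candidate thresholds: midpoints between consecutive distinct values
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  score <- sapply(cuts, function(cut) {
    left <- y[x < cut]
    right <- y[x >= cut]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  list(threshold = cuts[which.min(score)], impurity = min(score))
}

# Toy data (hypothetical): account balance and whether the prospect converted
balance <- c(100, 200, 300, 400, 500, 600)
converted <- c("no", "no", "no", "yes", "yes", "yes")
best <- best.split(balance, converted)
best$threshold  # 350: separates the two classes perfectly
```

A real tree repeats this search over every feature at every node and then recurses into the two children.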

Page 11: Predicting Customer Conversion with Random Forests

Decision Tree Code

library(rpart)
tree.1 <- rpart(takes.loan ~ ., data=bank)

•See the ‘rpart’ and ‘rpart.plot’ R packages.

•Many parameters are available to control the fit.

Page 12: Predicting Customer Conversion with Random Forests

Make Predictions

predict(tree.1, type="vector")
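For a classification tree, `type="vector"` returns the predicted class as a number (the index of the factor level), which is easy to misread. The sketch below shows the other prediction types side by side. It uses the `kyphosis` data that ships with rpart as a stand-in, because the prepared `bank` data frame from the talk is not reproduced here.

```r
library(rpart)

# Stand-in data: 'kyphosis' ships with rpart; Kyphosis is a two-level factor
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

pred.index <- predict(fit, type = "vector")  # numeric index of the predicted level
pred.class <- predict(fit, type = "class")   # factor of predicted labels
pred.prob  <- predict(fit, type = "prob")    # matrix of class probabilities

# Confusion matrix on the training data: rows actual, columns predicted
conf.mat <- table(actual = kyphosis$Kyphosis, predicted = pred.class)
```

When no new data is passed, `predict` scores the training data, so a confusion matrix built this way is optimistic compared with held-out data.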

Page 13: Predicting Customer Conversion with Random Forests

How’d it do?

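              Predicted
Actual        no            yes
no            (1) 38,904    (2) 1,018
yes           (3) 3,444     (4) 1,845

Naïve Accuracy: 11.7% (5,289 / 45,211, the share of prospects who convert)

Decision Tree Recall: 34.9% (1,845 / 5,289)

Decision Tree Precision: 64.4% (1,845 / 2,863)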
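The rates for the decision tree can be recomputed from the four cell counts. This base-R sketch assumes the rows of the matrix are actual outcomes and the columns are predictions. That reading is supported by the row sums, which equal the class totals 39,922 and 5,289. The variable names are illustrative.

```r
# Decision tree confusion matrix from the slide.
# Assumption: rows are actual outcomes, columns are predictions.
conf.mat <- matrix(c(38904, 1018,
                      3444, 1845),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = c("no", "yes"),
                                   predicted = c("no", "yes")))

true.pos  <- conf.mat["yes", "yes"]
base.rate <- sum(conf.mat["yes", ]) / sum(conf.mat)  # share of actual "yes"
precision <- true.pos / sum(conf.mat[, "yes"])       # share of predicted "yes" that is correct
recall    <- true.pos / sum(conf.mat["yes", ])       # share of actual "yes" that is found

round(100 * c(base.rate = base.rate, precision = precision, recall = recall), 1)
# base.rate 11.7, precision 64.4, recall 34.9
```

Under this reading, 1,845 / 5,289 is the share of actual converters the tree finds (recall), while precision divides by the 2,863 prospects the tree flagged.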

Page 14: Predicting Customer Conversion with Random Forests

Decision Tree Problems

•Overfitting the data (high variance)

•May not use all relevant features

Page 15: Predicting Customer Conversion with Random Forests

Random Forests

One Decision Tree

Many Decision Trees (Ensemble)

Page 16: Predicting Customer Conversion with Random Forests

Building RF

•Sample from the data

•At each split, sample from the available variables

•Repeat for each tree
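The two sources of randomness in these steps can be sketched in base R. This is only an illustration of the sampling, not a working forest. The feature names are a handful of column names from the UCI bank data, and `n.rows`, `n.trees`, and `forest.plan` are hypothetical names chosen for the example.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n.rows <- 10  # toy number of records
features <- c("age", "job", "balance", "housing", "loan", "duration")
mtry <- floor(sqrt(length(features)))  # features tried per split (classification default)
n.trees <- 3

forest.plan <- lapply(seq_len(n.trees), function(i) {
  # 1. Bootstrap: draw n.rows row indices with replacement for this tree
  rows <- sample(n.rows, size = n.rows, replace = TRUE)
  # 2. At a split, only a random subset of mtry features is considered.
  #    A real forest redraws this subset at every split, not once per tree.
  candidates <- sample(features, size = mtry)
  list(rows = rows,
       out.of.bag = setdiff(seq_len(n.rows), rows),  # rows this tree never saw
       split.candidates = candidates)
})
```

The rows left out of each bootstrap sample are what the randomForest package uses for its out-of-bag error estimate.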

Page 17: Predicting Customer Conversion with Random Forests

Motivations for RF

•Create uncorrelated trees

•Variance reduction

•Subspace exploration
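Why uncorrelated trees matter for variance reduction can be seen from the standard formula for the variance of an average of identically distributed, equally correlated predictions. The sketch below evaluates it directly. The function name `ensemble.variance` and the values of `sigma2` and `rho` are illustrative assumptions, not figures from the talk.

```r
# Variance of the average of n.trees predictions, each with variance sigma2
# and pairwise correlation rho:
#   rho * sigma2 + (1 - rho) * sigma2 / n.trees
ensemble.variance <- function(sigma2, rho, n.trees) {
  rho * sigma2 + (1 - rho) * sigma2 / n.trees
}

ensemble.variance(sigma2 = 1, rho = 0.0, n.trees = 500)  # 0.002
ensemble.variance(sigma2 = 1, rho = 0.5, n.trees = 500)  # 0.501
ensemble.variance(sigma2 = 1, rho = 1.0, n.trees = 500)  # 1
```

Adding trees shrinks only the second term, so the correlation between trees sets a floor on the ensemble's variance. Sampling rows and variables is how a random forest lowers that correlation.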

Page 18: Predicting Customer Conversion with Random Forests

Random Forests

library(randomForest)
rffit.1 <- randomForest(takes.loan ~ ., data=bank)

Most important parameters are:

Parameter   Description                                           Default
ntree       Number of trees                                       500
mtry        Number of variables to randomly select at each node   •square root of # predictors for classification
                                                                  •# predictors / 3 for regression
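As a worked example of the `mtry` defaults, the sketch below applies the rounding the randomForest package uses (round down, with a minimum of 1 for regression). The helper `default.mtry` is a hypothetical name, and the count of 16 predictors assumes that one of the 17 columns in the bank data is the outcome.

```r
# Default mtry as computed by randomForest(), given the number of predictors p
default.mtry <- function(p, classification = TRUE) {
  if (classification) floor(sqrt(p)) else max(floor(p / 3), 1)
}

p <- 16  # assumption: 17 columns in the bank data, one of which is the outcome
default.mtry(p, classification = TRUE)   # 4
default.mtry(p, classification = FALSE)  # 5
```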

Page 19: Predicting Customer Conversion with Random Forests

How’d it do?

Naïve Accuracy: 11.7%

Random Forest

•Precision: 64.5% (2,541 / 3,937)

•Recall: 48.0% (2,541 / 5,289)

              Predicted
Actual        yes          no
yes           (1) 2,541    (3) 2,748
no            (2) 1,396    (4) 38,526

Page 20: Predicting Customer Conversion with Random Forests

Tuning RF

rffit.1 <- tuneRF(X, y, mtryStart=1, stepFactor=2, improve=0.05)
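One detail worth knowing when using `tuneRF`: with the default `doBest=FALSE` it returns a matrix of the `mtry` values tried and their out-of-bag errors, not a fitted forest. The sketch below shows both modes. It uses `iris` as a stand-in for the talk's `X` and `y`, which are not reproduced here, and turns off the tracing output and plot.

```r
library(randomForest)

set.seed(1)  # arbitrary seed; tuning results depend on the random draws

# Stand-in data: iris replaces the talk's X (predictors) and y (outcome)
X <- iris[, -5]
y <- iris$Species

# Default doBest = FALSE: returns a matrix of mtry values tried and OOB error
tuning <- tuneRF(X, y, mtryStart = 1, stepFactor = 2, improve = 0.05,
                 trace = FALSE, plot = FALSE)

# doBest = TRUE: refits with the best mtry found and returns the forest itself
rffit.best <- tuneRF(X, y, mtryStart = 1, stepFactor = 2, improve = 0.05,
                     trace = FALSE, plot = FALSE, doBest = TRUE)
```

Here `stepFactor=2` doubles (or halves) `mtry` at each step, and the search continues only while the out-of-bag error improves by at least the relative amount given in `improve`.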

Page 21: Predicting Customer Conversion with Random Forests

Benefits of RF

•Good accuracy with default settings

•Relatively easy to make parallel

•Many implementations

•R, Weka, RapidMiner, Mahout

Page 22: Predicting Customer Conversion with Random Forests

References

• A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.

• L. Breiman, J. Friedman, R. Olshen and C. Stone (1984). Classification and Regression Trees. Belmont, Calif.: Wadsworth International Group.

• L. Breiman and A. Cutler. Random Forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm

• S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Guimarães, Portugal, October, 2011. EUROSIS.