Predicting Customer Conversion with Random Forests
Daniel Gerlanc, Principal, Enplus Advisors, [email protected]
A Decision Trees Case Study
Topics
• Objectives: Research Question
• Data: Bank Prospect Conversion
• Methods: Decision Trees, Random Forests
• Results
Objective
• Which customers or prospects should you call today?
•To whom should you offer incentives?
Dataset
•Direct Marketing campaign for bank loans
•http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
• 45,211 records, 17 features
Decision Trees

[Figure: toy decision tree for deciding whether to wear a coat. The root splits on Sunny (yes/no); one branch then splits on Windy; the leaves are Coat, No Coat, and Coat.]
Statistical Decision Trees
•Randomness
•May not know the relationships ahead of time
Decision Trees: Splitting
• Splitting is a deterministic process
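That deterministic split search can be sketched in a few lines of base R (a toy illustration only, not the actual rpart implementation): every candidate threshold is scored by the weighted Gini impurity of the two child nodes, and the best one always wins, with no randomness involved.

```r
# Toy sketch of a deterministic split search using Gini impurity.
# (Illustrative only -- not the actual rpart implementation.)
gini <- function(y) {
  p <- table(y) / length(y)   # class proportions
  1 - sum(p^2)                # Gini impurity: 0 means a pure node
}

best_split <- function(x, y) {
  thresholds <- sort(unique(x))[-1]           # candidate cut points
  scores <- sapply(thresholds, function(t) {  # weighted child impurity
    left <- y[x < t]; right <- y[x >= t]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  thresholds[which.min(scores)]               # same data in, same split out
}
```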
Decision Tree Code

tree.1 <- rpart(takes.loan ~ ., data=bank)

• See the 'rpart' and 'rpart.plot' R packages.
• Many parameters are available to control the fit.
Make Predictions

predict(tree.1, type="vector")
How’d it do?

                 Actual
Predicted        no          yes
no           38,904        1,018
yes           3,444        1,845

Naïve accuracy (predict that every prospect converts): 11.7%
Decision Tree Precision: 34.8% (1,845 / 5,289)
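A confusion matrix like the one above can be produced in base R with table(). The vectors below are toy stand-ins for the real inputs, which in the slides' setup would be predict(tree.1, type="class") and the bank data's outcome column:

```r
# Toy stand-ins for the model's predictions and the true labels
# (the real ones would come from predict() and bank$takes.loan).
pred   <- factor(c("no", "no", "yes", "yes", "no"), levels = c("no", "yes"))
actual <- factor(c("no", "yes", "yes", "no", "no"), levels = c("no", "yes"))

conf <- table(Predicted = pred, Actual = actual)
precision <- conf["yes", "yes"] / sum(conf["yes", ])  # TP / predicted yes
```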
Decision Tree Problems
•Overfitting the data (high variance)
•May not use all relevant features
Random Forests

[Figure: one decision tree vs. many decision trees (an ensemble).]
Building RF
•Sample from the data
•At each split, sample from the available variables
•Repeat for each tree
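The three steps above can be sketched in base R (a hypothetical helper, not the randomForest internals):

```r
# Sketch of the random-forest recipe: bootstrap rows for each tree and
# draw `mtry` candidate variables. A real implementation would repeat
# the variable draw at every split while growing each tree.
grow_forest <- function(data, outcome, ntree = 500, mtry = 2) {
  predictors <- setdiff(names(data), outcome)
  lapply(seq_len(ntree), function(i) {
    rows <- sample(nrow(data), replace = TRUE)    # 1. sample the data
    vars <- sample(predictors, mtry)              # 2. sample the variables
    list(boot = data[rows, ], candidates = vars)  # 3. grow one tree from these
  })
}
```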
Motivations for RF
•Create uncorrelated trees
•Variance reduction
•Subspace exploration
Random Forests

rffit.1 <- randomForest(takes.loan ~ ., data=bank)

The most important parameters are:

Variable   Description                                          Default
ntree      Number of trees                                      500
mtry       Number of variables randomly selected at each node   square root of # predictors for classification;
                                                                # predictors / 3 for regression
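For the bank dataset (17 features, so 16 predictors once the takes.loan outcome is removed), those defaults work out as follows:

```r
# Default mtry values, as randomForest would compute them for 16 predictors.
p <- 16
mtry_classification <- floor(sqrt(p))        # 4 variables tried per split
mtry_regression     <- max(floor(p / 3), 1)  # 5 variables tried per split
```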
How’d it do?

                 Predicted
Actual           yes         no
yes           2,541      2,748
no            1,396     38,526

Naïve accuracy (predict that every prospect converts): 11.7%
Random Forest
• Precision: 64.5% (2,541 / 3,937)
• Recall: 48.0% (2,541 / 5,289)
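Both figures come straight from the confusion matrix:

```r
# Cells of the random forest confusion matrix from the slide.
tp <- 2541   # predicted yes, actually yes
fp <- 1396   # predicted yes, actually no
fn <- 2748   # predicted no, actually yes

precision <- tp / (tp + fp)  # 2541 / 3937, about 64.5%
recall    <- tp / (tp + fn)  # 2541 / 5289, about 48.0%
```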
Tuning RF

rffit.1 <- tuneRF(X, y, mtryStart=1, stepFactor=2, improve=0.05)

• stepFactor: factor by which mtry is inflated (or deflated) at each step
• improve: relative improvement in OOB error required for the search to continue
Benefits of RF
•Good accuracy with default settings
•Relatively easy to make parallel
•Many implementations
•R, Weka, RapidMiner, Mahout
References
• A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18--22.
• Breiman, Leo, Jerome Friedman, Richard Olshen, and Charles Stone. Classification and Regression Trees. Belmont, Calif: Wadsworth International Group, 1984. Print.
• Breiman, Leo and Adele Cutler. Random Forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm
• S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Guimarães, Portugal, October, 2011. EUROSIS.