Fitting Models to Data
Transcript of Fitting Models to Data
Fitting Models to Data: Linear and Quadratic Discriminant Analysis, Decision Trees
Year  What                                       Notes                       Who
1963  AID: Automatic Interaction Detector        Continuous                  James Morgan, John Sonquist
1973  THAID: THeta AID                           Categorical                 James Morgan, Robert Messenger
1980  CHAID: CHi-Square AID                      Multiple splits             Kass
1984  CART: Classification and Regression Trees  Popular approach            Leo Breiman
1986  Iterative Dichotomiser 3 (ID3)             Categorical                 Ross Quinlan
1994  C4.5 Algorithm                             Continuous and categorical  Ross Quinlan
1994  Bagging                                    Resampling                  Leo Breiman
      Boosting                                   Cascading small trees       Rob Schapire, Jerry Friedman
2001  Random Forests                             Many trees                  Leo Breiman, Adele Cutler
AID: Automatic Interaction Detector
Association, Co-Occurrence
CHAID
CART: Classification and Regression Trees
The CART family is oriented toward statistics and uses the concept of impurity, which measures how well the two classes are separated. Ideally we would like to separate all 0s and 1s.
http://freakonometrics.hypotheses.org/1279
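The impurity idea can be illustrated with a minimal sketch. It assumes the Gini index as the impurity measure (the slides do not say which measure is meant) and a single numeric feature; the function names gini, split_impurity and best_threshold are hypothetical, chosen only for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(xs, ys, threshold):
    """Weighted Gini impurity of the two child nodes after splitting at x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(xs, ys):
    """Try every observed value as a threshold and keep the lowest-impurity split."""
    return min(sorted(set(xs)), key=lambda t: split_impurity(xs, ys, t))

# Toy data (made up for illustration): the classes separate cleanly between 3 and 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
print(gini(ys))                # impurity of the unsplit node
print(best_threshold(xs, ys))  # threshold whose children are purest
```

A split that separates all 0s from all 1s drives the weighted impurity of the children to zero, which is the ideal case described on the slide.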
Overfitting
Bagging
• Builds multiple decision trees by repeatedly resampling the training data with replacement
• Fits a model to each sample
• Votes across the trees for a consensus prediction
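The bagging steps listed here can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: each "tree" is a one-split stump on a single numeric feature, and the names fit_stump and bagging, the default of 25 trees, and the fixed seed are all hypothetical choices made for the example.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split tree: predict the majority class on each side of a threshold."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right, reuse the left label for that side.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def bagging(xs, ys, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample and predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_trees):
        # Resample the training data with replacement (a bootstrap sample).
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Toy data (made up for illustration): class 0 for x <= 5, class 1 otherwise.
xs = [float(i) for i in range(1, 11)]
ys = [0 if x <= 5 else 1 for x in xs]
predict = bagging(xs, ys)
print(predict(1.0), predict(10.0))
```

Individual stumps disagree near the class boundary because each sees a different resample; the vote smooths those disagreements into one consensus prediction.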
Boosting
• Learns slowly
• Given the current model, we fit a decision tree to the residuals (misclassifications) from the model.
• We then add this new decision tree into the fitted function in order to update the residuals.
• Each of these trees can be rather small, with just a few terminal nodes, determined by the parameter d in the algorithm.
• By fitting small trees to the residuals, we slowly improve the fit in areas where the model does not perform well.
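The boosting loop described in these bullets can be sketched as follows. The sketch assumes a numeric response and a single numeric feature, uses one-split trees (d = 1), and introduces a shrinkage factor to make the model learn slowly; the shrinkage value, the number of trees, and the names fit_regression_stump and boost are assumptions for illustration and do not come from the slides.

```python
def fit_regression_stump(xs, residuals):
    """Fit a tree with a single split (d = 1) to the residuals by least squares."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def boost(xs, ys, n_trees=200, shrinkage=0.1):
    """Repeatedly fit a small tree to the current residuals and add a shrunken copy."""
    trees = []
    residuals = list(ys)  # the initial model predicts 0, so the residuals equal y
    for _ in range(n_trees):
        tree = fit_regression_stump(xs, residuals)
        trees.append(tree)
        # Adding the shrunken tree to the model is the same as subtracting
        # its shrunken prediction from the residuals.
        residuals = [r - shrinkage * tree(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(shrinkage * tree(x) for tree in trees)

# Toy data (made up for illustration): a step from 1 to 5 between x = 4 and x = 5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = boost(xs, ys)
print(model(2.0), model(7.0))
```

With a single tree the prediction covers only a small fraction of the response; many small steps are what bring the fitted function close to the data, which is the sense in which boosting learns slowly.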
Random Forests
http://www.stat.berkeley.edu/~breiman/RandomForests/
Gradient Boosting
Many Algorithms
Decision Trees
• rpart (CART)
• tree (CART)
• ctree (conditional inference tree)
• CHAID (chi-squared automatic interaction detection)
• evtree (evolutionary algorithm)
• mvpart (multivariate CART)
• knnTree (nearest-neighbor-based trees)
• RWeka (J4.8, M5', LMT)
• LogicReg (Logic Regression)
• BayesTree
• TWIX (with extra splits)
• party (conditional inference trees, model-based trees)
Random Forests
• randomForest (CART-based random forests)
• randomSurvivalForest (for censored responses)
• party (conditional random forests)
• gbm (tree-based gradient boosting)
• mboost (model-based and tree-based gradient boosting)