Transcript of "Random Forests for Regression and Classification", slides by Adele Cutler, Utah State University (90 pages).

Page 1:

Random Forests for Regression and Classification

Adele Cutler, Utah State University

1

Page 2:

Leo Breiman, 1928 - 2005

1954: PhD Berkeley (mathematics)

1960-1967: UCLA (mathematics)

1969-1982: Consultant

1982-1993: Berkeley (statistics)

1984: "Classification and Regression Trees" (with Friedman, Olshen, Stone)

1996: "Bagging"

2001: "Random Forests"

2

Page 3:

Impact

CART (1984): 27,926 citations

Bagging (1996): 13,090 citations

Random Forests (2001): 17,492 citations

Total: 72,796 citations

3

Page 4:

Random Forests for Regression and Classification

4

Page 5:

Outline

• Background.
• Trees.
• Bagging.
• Random Forests.
• Variable importance.
• Partial dependence plots and interpretation of effects.
• Proximity.
• Visualization.
• New developments.

5

Page 6:

What is Regression?

Given data on predictor variables (inputs, X) and a continuous response variable (output, Y), build a model for:

– Predicting the value of the response from the predictors.
– Understanding the relationship between the predictors and the response.

e.g. predict a person's systolic blood pressure based on their age, height, weight, etc.

6

Page 7:

Regression Examples

• Y: income; X: age, education, sex, occupation, …

• Y: crop yield; X: rainfall, temperature, humidity, …

• Y: test scores; X: teaching method, age, sex, ability, …

• Y: selling price of homes; X: size, age, location, quality, …

7

Page 8:

Regression Background

• Linear regression: Y = β0 + β1X + ε

• Multiple linear regression, e.g. Y = β0 + β1X1 + β2X2 + ε

• Nonlinear regression (parametric), e.g. Y = β0 + β1X + β2X² + ε

• Nonparametric regression (smoothing) – data-driven:
  – Kernel smoothing
  – B-splines
  – Smoothing splines
  – Wavelets

8


Pages 9-16: [figure-only slides]

Page 17:

What is Classification?

Given data on predictor variables (inputs, X) and a categorical response variable (output, Y), build a model for:

– Predicting the value of the response from the predictors.
– Understanding the relationship between the predictors and the response.

e.g. predict a person's 5-year survival (yes/no) based on their age, height, weight, etc.

17

Page 18:

Classification Examples

• Y: presence/absence of disease; X: diagnostic measurements

• Y: land cover (grass, trees, water, roads, …); X: satellite image data (frequency bands)

• Y: loan defaults (yes/no); X: credit score, own or rent, age, marital status, …

• Y: dementia status; X: scores on a battery of psychological tests

18

Page 19:

Classification Background

• Linear discriminant analysis (1930s)
• Logistic regression (1944)
• Nearest-neighbor classifiers (1951)

19

Pages 20-28: Classification Picture [figure-only slides]

Page 29:

Regression and Classification

Given data D = {(xi, yi), i = 1,…,n} where xi = (xi1,…,xip), build a model f so that

Ŷ = f(X) for random variables X = (X1,…,Xp) and Y.

Then f will be used for:

– Predicting the value of the response from the predictors: ŷ0 = f(x0) where x0 = (x01,…,x0p).

– Understanding the relationship between the predictors and the response.

29

Page 30:

Assumptions

• Independent observations
  – Not autocorrelated over time or space
  – Not usually from a designed experiment
  – Not matched case-control

• Goal is prediction and (sometimes) understanding
  – Which predictors are useful? How? Where?
  – Is there any "interesting" structure?

30

Page 31:

Predictive Accuracy

• Regression
  – Expected mean squared error

• Classification
  – Expected (classwise) error rate

31

Page 32:

Estimates of Predictive Accuracy

• Resubstitution
  – Use the accuracy on the training set as an estimate of generalization error.

• AIC etc.
  – Use assumptions about the model.

• Crossvalidation
  – Randomly select a training set, use the rest as the test set.
  – 10-fold crossvalidation.

32

Page 33:

10-Fold Crossvalidation

Divide the data at random into 10 pieces, D1,…,D10.

• Fit the predictor to D2,…,D10; predict D1.
• Fit the predictor to D1,D3,…,D10; predict D2.
• Fit the predictor to D1,D2,D4,…,D10; predict D3.
• …
• Fit the predictor to D1,D2,…,D9; predict D10.

Compute the estimate of predictive accuracy using the assembled predictions and their observed values.
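A minimal sketch, not from the slides, of this procedure in R; the rpart package and its built-in kyphosis data are stand-ins for whatever predictor and dataset are at hand:

```r
library(rpart)  # CART-style trees; the kyphosis data ships with the package

set.seed(1)
d <- kyphosis
folds <- sample(rep(1:10, length.out = nrow(d)))  # random fold label for each row

pred <- factor(rep(NA, nrow(d)), levels = levels(d$Kyphosis))
for (k in 1:10) {
  fit <- rpart(Kyphosis ~ Age + Number + Start,
               data = d[folds != k, ], method = "class")  # fit to the other 9 folds
  pred[folds == k] <- predict(fit, d[folds == k, ], type = "class")  # predict fold k
}

table(observed = d$Kyphosis, predicted = pred)  # assembled predictions
mean(pred != d$Kyphosis)                        # crossvalidation error estimate
```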

33

Page 34:

Estimates of Predictive Accuracy

Typically, resubstitution estimates are optimistic compared to crossvalidation estimates.

Crossvalidation estimates tend to be pessimistic because they are based on smaller samples.

Random Forests has its own way of estimating predictive accuracy (“out-of-bag” estimates).

34

Page 35:

Case Study: Cavity Nesting birds in the Uintah Mountains, Utah

• Red-naped sapsucker (Sphyrapicus nuchalis) (n = 42 nest sites)

• Mountain chickadee (Parus gambeli) (n = 42 nest sites)

• Northern flicker (Colaptes auratus) (n = 23 nest sites)

• n = 106 non-nest sites

35

Page 36:

Case Study: Cavity Nesting birds in the Uintah Mountains, Utah

• Response variable is the presence (coded 1) or absence (coded 0) of a nest.

• Predictor variables (measured on 0.04 ha plots around the sites) are:
  – Numbers of trees in various size classes, from less than 1 inch in diameter at breast height to greater than 15 inches in diameter.
  – Number of snags and number of downed snags.
  – Percent shrub cover.
  – Number of conifers.
  – Stand type, coded as 0 for pure aspen and 1 for mixed aspen and conifer.

36

Page 37:

Assessing Accuracy in Classification

Actual Class   Predicted Absence (0)   Predicted Presence (1)   Total
Absence (0)    a                       b                        a+b
Presence (1)   c                       d                        c+d
Total          a+c                     b+d                      n

37

Page 38:

Assessing Accuracy in Classification

Error rate = (b + c) / n
For the absence class (0): b / (a+b)
For the presence class (1): c / (c+d)

Actual Class   Predicted Absence (0)   Predicted Presence (1)   Total
Absence (0)    a                       b                        a+b
Presence (1)   c                       d                        c+d
Total          a+c                     b+d                      n
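As a quick sketch, the same calculations in R; the counts in cm are illustrative only:

```r
# hypothetical 2x2 confusion matrix laid out as above (rows = actual, columns = predicted)
cm <- matrix(c(83, 23,
               22, 85),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

(cm[1, 2] + cm[2, 1]) / sum(cm)  # overall error rate: (b + c) / n
cm[1, 2] / sum(cm[1, ])          # class-0 error: b / (a + b)
cm[2, 1] / sum(cm[2, ])          # class-1 error: c / (c + d)
```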

38

Page 39:

Resubstitution Accuracy (fully grown tree)

Error rate = (0 + 1)/213 ≈ 0.005, or 0.5%

Actual Class   Predicted Absence (0)   Predicted Presence (1)   Total
Absence (0)    105                     1                        106
Presence (1)   0                       107                      107
Total          105                     108                      213

39

Page 40:

Crossvalidation Accuracy (fully grown tree)

Error rate = (22 + 23)/213 ≈ 0.21, or 21%

Actual Class   Predicted Absence (0)   Predicted Presence (1)   Total
Absence (0)    83                      23                       106
Presence (1)   22                      85                       107
Total          105                     108                      213

40

Page 41:

Outline

• Background.
• Trees.
• Bagging predictors.
• Random Forests algorithm.
• Variable importance.
• Proximity measures.
• Visualization.
• Partial plots and interpretation of effects.

41

Page 42:

Classification and Regression Trees

Pioneers:
• Morgan and Sonquist (1963).
• Breiman, Friedman, Olshen, Stone (1984): CART.
• Quinlan (1993): C4.5.

42

Page 43:

Classification and Regression Trees

• Grow a binary tree.
• At each node, "split" the data into two "daughter" nodes.
• Splits are chosen using a splitting criterion.
• Bottom nodes are "terminal" nodes.

43

Page 44:

Classification and Regression Trees

• For regression the predicted value at a node is the average response variable for all observations in the node.

• For classification the predicted class is the most common class in the node (majority vote).

• For classification trees, one can also get the estimated probability of membership in each of the classes.
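For illustration, here is how both kinds of prediction look with rpart, R's CART-style tree package; kyphosis is just a convenient built-in dataset:

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")            # classification tree

head(predict(fit, type = "class"))        # majority-vote class in each terminal node
head(predict(fit, type = "prob"))         # estimated class membership probabilities
```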

44

Page 45:

A Classification Tree

Predict hepatitis (0 = absent, 1 = present) using protein and alkaline phosphatase (alkphos).

"Yes" goes left.

[Tree diagram: root split protein < 45.43; further splits on protein >= 26, alkphos < 171, protein < 38.59, and alkphos < 129.40; terminal nodes show the predicted class and class counts.]

45

Page 46:

Splits are chosen to minimize a splitting criterion:

• Regression: residual sum of squares
  RSS = Σleft (yi – yL*)² + Σright (yi – yR*)²
  where yL* = mean y-value for the left node, yR* = mean y-value for the right node.

• Classification: Gini criterion
  Gini = NL Σk=1,…,K pkL(1 – pkL) + NR Σk=1,…,K pkR(1 – pkR)
  where pkL = proportion of class k in the left node, pkR = proportion of class k in the right node.
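A small sketch scoring one candidate split by hand in R; the toy vectors x, y, and cls are made up for illustration:

```r
# toy data: one predictor, a continuous response, and a class label
x   <- c(1, 2, 3, 4, 5, 6)
y   <- c(1.1, 0.9, 1.0, 3.2, 2.9, 3.1)
cls <- factor(c("a", "a", "a", "b", "a", "b"))

rss_split <- function(x, y, cut) {
  yl <- y[x < cut]; yr <- y[x >= cut]
  sum((yl - mean(yl))^2) + sum((yr - mean(yr))^2)  # left RSS + right RSS
}

gini_split <- function(x, cls, cut) {
  node <- function(g) {                 # N * sum_k p_k (1 - p_k) for one node
    p <- prop.table(table(g))
    length(g) * sum(p * (1 - p))
  }
  node(cls[x < cut]) + node(cls[x >= cut])
}

rss_split(x, y, 3.5)     # RSS for the candidate split "x < 3.5"
gini_split(x, cls, 3.5)  # Gini for the same split
```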

46

Page 47:

Choosing the best horizontal split

Best horizontal split is at 3.67 with RSS = 68.09.

47


Page 48:

Choosing the best vertical split

Best vertical split is at 1.05 with RSS = 61.76.

48

Page 49:

Regression tree (prostate cancer)

49

Page 50:

Choosing the best split in the left node

Best horizontal split is at 3.66 with RSS = 16.11.

50

Page 51:

Choosing the best split in the left node

Best vertical split is at -0.48 with RSS = 13.61.

51

Page 52:

Regression tree (prostate cancer)

52

Page 53:

Choosing the best split in the right node

Best horizontal split is at 3.07 with RSS = 27.15.

53

Page 54:

Choosing the best split in the right node

Best vertical split is at 2.79 with RSS = 25.11.

54

Page 55:

Regression tree (prostate cancer)

55

Page 56:

Choosing the best split in the third node

Best horizontal split is at 3.07 with RSS = 14.42, but this is too close to the edge. Use 3.46 with RSS = 16.14.

56

Page 57:

Choosing the best split in the third node

Best vertical split is at 2.46 with RSS = 18.97.

57

Page 58:

Regression tree (prostate cancer)

58

Page 59:

Regression tree (prostate cancer)

59

Page 60:

Regression tree (prostate cancer)

[3-D plot: fitted surface with axes lcavol, lweight, and lpsa]

60

Page 61:

Classification tree (hepatitis)

[Pages 61-65 repeat the hepatitis classification tree from page 45 (splits: protein < 45.43, protein >= 26, alkphos < 171, protein < 38.59, alkphos < 129.40), highlighting a different part of the tree on each slide.]

Page 66:

Pruning

• If the tree is too big, the lower “branches” are modeling noise in the data (“overfitting”).

• The usual paradigm is to grow the trees large and “prune” back unnecessary splits.

• Methods for pruning trees have been developed. Most use some form of crossvalidation. Tuning may be necessary.
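As one concrete route, rpart controls tree size with the complexity parameter cp; a minimal sketch, again using kyphosis as a stand-in:

```r
library(rpart)

set.seed(1)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class",
             control = rpart.control(cp = 0.001))  # grow a large tree

printcp(fit)                      # crossvalidated error for each candidate subtree
pruned <- prune(fit, cp = 0.035)  # prune back unnecessary splits
```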

66

Page 67:

Case Study: Cavity Nesting birds in the Uintah Mountains, Utah

Choose cp = 0.035

67

Page 68:

Crossvalidation Accuracy (cp = 0.035)

Actual Class   Predicted Absence (0)   Predicted Presence (1)   Total
Absence (0)    85                      21                       106
Presence (1)   19                      88                       107
Total          104                     109                      213

Error rate = (19 + 21)/213 ≈ 0.19, or 19%

68

Page 69:

Classification and Regression Trees

Advantages

• Applicable to both regression and classification problems.

• Handle categorical predictors naturally.

• Computationally simple and quick to fit, even for large problems.

• No formal distributional assumptions (non-parametric).

• Can handle highly non-linear interactions and classification boundaries.

• Automatic variable selection.

• Handle missing values through surrogate variables.

• Very easy to interpret if the tree is small.

69

Presenter note: Arguably one of the most successful tools of the last 20 years. Why?
Page 70:

Classification and Regression Trees

Advantages (cont'd)

• The picture of the tree can give valuable insights into which variables are important and where.

• The terminal nodes suggest a natural clustering of data into homogeneous groups.

[Tree diagram: hepatitis classification tree splitting on protein, fatigue, alkphos, age, albumin, varices, and firm; terminal nodes show the predicted class and class counts.]

70

Page 71:

Classification and Regression Trees

Disadvantages

• Inaccuracy – current methods, such as support vector machines and ensemble classifiers, often have 30% lower error rates than CART.

• Instability – if we change the data a little, the tree picture can change a lot. So the interpretation is not as straightforward as it appears.

Today, we can do better: Random Forests!

71

Page 72:

Outline

• Background.
• Trees.
• Bagging predictors.
• Random Forests algorithm.
• Variable importance.
• Proximity measures.
• Visualization.
• Partial plots and interpretation of effects.

72

Page 73:

Data and Underlying Function


73

Page 74:

Single Regression Tree


74

Page 75:

10 Regression Trees


75

Page 76:

Average of 100 Regression Trees


76

Page 77:

Hard problem for a single tree:

[Scatter plot: two classes, labeled 0 and 1, heavily intermixed over the unit square.]

77

Page 78:

Single tree:


78

Page 79:

25 Averaged Trees:


79

Page 80:

25 Voted Trees:


80

Page 81:

Bagging (Bootstrap Aggregating)

Breiman, “Bagging Predictors”, Machine Learning, 1996.

Fit classification or regression models to bootstrap samples from the data and combine by voting (classification) or averaging (regression).

Bootstrap sample ➾ f1(x)
Bootstrap sample ➾ f2(x)
Bootstrap sample ➾ f3(x)
…
Bootstrap sample ➾ fM(x)

Combine f1(x),…,fM(x) ➾ f(x). The fi(x)'s are "base learners".

MODEL AVERAGING
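A hand-rolled sketch of bagging regression trees in R, assuming noisy sine-curve data similar in spirit to the simulated example above; M = 100 trees are averaged:

```r
library(rpart)

set.seed(1)
n <- 200
d <- data.frame(x = runif(n, -3, 3))
d$y <- sin(d$x) + rnorm(n, sd = 0.3)       # noisy underlying function
grid <- data.frame(x = seq(-3, 3, length.out = 200))

M <- 100
preds <- sapply(1:M, function(m) {
  boot <- d[sample(n, replace = TRUE), ]   # bootstrap sample
  predict(rpart(y ~ x, data = boot), grid) # base learner f_m evaluated on a grid
})
f_bag <- rowMeans(preds)                   # averaged prediction f(x)

plot(d$x, d$y, col = "grey")
lines(grid$x, f_bag, lwd = 2)              # the bagged fit
```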

81

Page 82:

Bagging (Bootstrap Aggregating)

• A bootstrap sample is chosen at random with replacement from the data. Some observations end up in the bootstrap sample more than once, while others are not included ("out of bag") – see the sketch after this list.

• Bagging reduces the variance of the base learner but has limited effect on the bias.

• It’s most effective if we use strong base learners that have very little bias but high variance (unstable). E.g. trees.

• Both bagging and boosting are examples of “ensemble learners” that were popular in machine learning in the ‘90s.
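A tiny sketch of the out-of-bag idea mentioned in the first bullet above (the sample size of 10 is arbitrary):

```r
set.seed(1)
n <- 10
boot <- sample(n, replace = TRUE)  # bootstrap sample of row indices
table(boot)                        # some rows are drawn more than once
setdiff(1:n, boot)                 # the "out of bag" rows for this sample
                                   # (on average about 1/3 of rows are out of bag)
```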

82

Page 83:

Bagging CART

Dataset        # cases   # vars   # classes   CART   Bagged CART   Decrease %
Waveform       300       21       3           29.1   19.3          34
Heart          1395      16       2           4.9    2.8           43
Breast Cancer  699       9        2           5.9    3.7           37
Ionosphere     351       34       2           11.2   7.9           29
Diabetes       768       8        2           25.3   23.9          6
Glass          214       9        6           30.4   23.6          22
Soybean        683       35       19          8.6    6.8           21

83

Leo Breiman (1996) “Bagging Predictors”, Machine Learning, 24, 123-140.

Page 84:

Outline

• Background.
• Trees.
• Bagging predictors.
• Random Forests algorithm.
• Variable importance.
• Proximity measures.
• Visualization.
• Partial plots and interpretation of effects.

84

Page 85:

Random Forests

Dataset        # cases   # vars   # classes   CART   Bagged CART   Random Forests
Waveform       300       21       3           29.1   19.3          17.2
Breast Cancer  699       9        2           5.9    3.7           2.9
Ionosphere     351       34       2           11.2   7.9           7.1
Diabetes       768       8        2           25.3   23.9          24.2
Glass          214       9        6           30.4   23.6          20.6

85

Leo Breiman (2001) “Random Forests”, Machine Learning, 45, 5-32.

Page 86:

Random Forests

Grow a forest of many trees. (The R default is 500.)

Grow each tree on an independent bootstrap sample* from the training data.

At each node:
1. Select m variables at random out of all M possible variables (independently for each node).
2. Find the best split on the selected m variables.

Grow the trees to maximum depth (classification).

Vote/average the trees to get predictions for new data.

*Sample N cases at random with replacement.
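With the randomForest package in R, the whole recipe is a single call; iris is just a convenient stand-in dataset, and mtry = 2 matches the classification default of √M here:

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris,
                   ntree = 500,   # number of trees (the R default)
                   mtry = 2)      # m: variables tried at each node
print(rf)                         # confusion matrix and "out-of-bag" error estimate
predict(rf, head(iris))           # vote the trees to predict new data
```

The printed object reports the out-of-bag error, which is the forest's built-in substitute for the crossvalidation estimates discussed earlier.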

86

Presenter note: the default m is p/3 for regression and √p for classification.
Page 87:

Random Forests

Inherit many of the advantages of CART:

• Applicable to both regression and classification problems. Yes.

• Handle categorical predictors naturally. Yes.

• Computationally simple and quick to fit, even for large problems. Yes.

• No formal distributional assumptions (non-parametric). Yes.

• Can handle highly non-linear interactions and classification boundaries. Yes.

• Automatic variable selection. Yes. But need variable importance too.

• Handles missing values through surrogate variables. Random Forests instead uses proximities.

• Very easy to interpret if the tree is small. NO!

87

Presenter note: With 100 variables, 100 trees in a forest can be grown in the same time as growing 3 single CART trees.
Page 88:

Random Forests

But do not inherit:

• The picture of the tree can give valuable insights into which variables are important and where. NO!

• The terminal nodes suggest a natural clustering of data into homogeneous groups. NO!

[Figure: six classification trees for the hepatitis data with differing root splits (protein < 50.5, protein < 46.5, protein < 45.43, protein < 45, …) and differing structure – the tree picture and its node groupings change from fit to fit.]

88

Page 89:

Random Forests

Improve on CART with respect to:

• Accuracy – Random Forests is competitive with the best known machine learning methods (but note the “no free lunch” theorem).

• Stability – if we change the data a little, the individual trees may change but the forest is relatively stable because it is a combination of many trees.

89

Page 90:

References

• Leo Breiman, Jerome Friedman, Richard Olshen, Charles Stone (1984) "Classification and Regression Trees" (Wadsworth).
• Leo Breiman (1996) "Bagging Predictors", Machine Learning, 24, 123-140.
• Leo Breiman (2001) "Random Forests", Machine Learning, 45, 5-32.
• Trevor Hastie, Rob Tibshirani, Jerome Friedman (2009) "The Elements of Statistical Learning" (Springer).

90