

Ensemble Learning

C. Andy Tsao

Institute of Statistics/Department of Applied Math, National Dong Hwa University, Hualien

March 2015, Kaohsiung, Taiwan


Outline

Overview

Regression: theme and variations

CART and tree-based methods

Random Forests

Boosting: AdaBoost and its variants

Concluding Remarks


Ensemble Learning

Figure: What are they?


Figure: Machine Learning


Figure: (Jazz) Ensemble


Figure: Ensemble 13


Supervised Learning

Training data: $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathcal{X} \subset \mathbb{R}^p$ and $y_i \in \mathcal{Y} = \{\pm 1\}$ for classification ($\mathcal{Y} = \mathbb{R} = (-\infty, \infty)$ for regression)

Testing (generalization) data: $\{(x'_j, y'_j)\}_{j=1}^{m}$

Data: $(x, y)$ from $(X, Y) \overset{iid}{\sim} P_{X,Y}$

Machine or classifier: $F \in \mathcal{F}$ such that $F: \mathcal{X} \to \mathcal{Y}$

Training error:

$$TE = \frac{1}{n} \sum_{i=1}^{n} 1_{[y_i \neq F(x_i)]} = \frac{1}{n} \sum_{i=1}^{n} 1_{[y_i F(x_i) < 0]}$$

Testing (generalization) error:

$$\widehat{GE} = \frac{1}{m} \sum_{j=1}^{m} 1_{[y'_j F(x'_j) < 0]}, \quad \text{an estimate of } GE = E_{X,Y}\{1_{[YF(X) < 0]}\}$$
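A quick illustration in R (not from the slides; the data and the score function are invented for the example): with labels coded as ±1, the 0-1 training error is the fraction of cases with $y_i F(x_i) < 0$.

```r
# Minimal sketch: 0-1 training error of a classifier scoring function.
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 2), ncol = 2)
y <- ifelse(x[, 1] + x[, 2] > 0, 1, -1)      # +/-1 labels

F_hat <- function(x) x[, 1] + 0.8 * x[, 2]   # hypothetical fitted score

TE <- mean(y * F_hat(x) < 0)                 # (1/n) sum 1[y_i F(x_i) < 0]
TE
```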


Supervised Learning-II

With respect to a loss L

$$TE(F) = \frac{1}{n} \sum_{i=1}^{n} L(y_i, F(x_i)), \qquad \widehat{GE}(F) = \frac{1}{m} \sum_{j=1}^{m} L(y'_j, F(x'_j));$$

again $\widehat{GE}$ is an estimate of $E_{Y,X}\, L(Y, F(X))$.

For regression, the squared-error loss $L(y, F(x)) = (y - F(x))^2$ is widely used.
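As a concrete sketch (not from the slides; the model and the train/test split are invented), the same two quantities under squared-error loss:

```r
# Minimal sketch: training vs. generalization error under squared-error loss.
set.seed(1)
n <- 100; m <- 50; p <- 3
X  <- matrix(rnorm(n * p), n, p)             # training inputs
Xt <- matrix(rnorm(m * p), m, p)             # testing inputs
beta <- c(2, -1, 0.5)
y  <- drop(1 + X  %*% beta) + rnorm(n)
yt <- drop(1 + Xt %*% beta) + rnorm(m)

fit <- lm(y ~ X)                                        # fitted learner F
TE  <- mean((y - fitted(fit))^2)                        # (1/n) sum L(y_i, F(x_i))
GE  <- mean((yt - drop(cbind(1, Xt) %*% coef(fit)))^2)  # (1/m) sum L(y'_j, F(x'_j))
c(TE = TE, GE = GE)
```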


Regression-theme

(Classical) Regression

Data: $(x_i, y_i)_{i=1}^{n}$, where $x_i \in \mathcal{X} = \mathbb{R}^p$, $y_i \in \mathcal{Y} = \mathbb{R}$.
Distribution: $(y_i \mid x_i)_{i=1}^{n} \overset{indep.}{\sim} P_{Y|x}$.

Class of learners: $\mathcal{F} = \{f(X) : f(X) = \beta_0 + \beta'X = \beta_0 + \sum_{j=1}^{p} \beta_j X_j, \text{ for } \beta_0 \in \mathbb{R}, \beta \in \mathbb{R}^p\}$.

Construction: least squares (LSE),

$$SSE(\hat F) = \|Y - \hat Y\|^2 = \sum_{i=1}^{n} (y_i - \hat F(x_i))^2 = \min_{F \in \mathcal{F}} \sum_{i=1}^{n} (y_i - F(x_i))^2,$$

where $\hat F(x) = \hat\beta_0 + \hat\beta' x$.

Evaluation: sum of squared errors (SSE, or equivalently MSE).
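In R this construction is just `lm()`; a minimal sketch on simulated data (the variable names and coefficients are invented):

```r
# Minimal sketch: classical linear regression by least squares.
set.seed(2)
n <- 60
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)   # true beta = (1, 2, -1)

fit <- lm(y ~ x1 + x2)             # LSE over F = {beta0 + beta'X}
coef(fit)                          # fitted beta_0, beta_1, beta_2
deviance(fit)                      # SSE = sum (y_i - F_hat(x_i))^2
```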


Regression-v-class

Naive regression (Classification)

Data: $(x_i, y_i)_{i=1}^{n}$, where $x_i \in \mathcal{X} = \mathbb{R}^p$, $y_i \in \mathcal{Y} = \{\pm 1\}$.
Distribution: $(y_i, x_i)_{i=1}^{n} \overset{indep.}{\sim} P_{Y,X}$.

Class of learners: $\mathcal{F}$, the collection of linear functions of $1, X_1, \cdots, X_p$.

Construction: least squares (LSE)

Evaluation: TE and GE (with respect to the zero-one loss),

$$TE = \frac{1}{n} \sum_{i=1}^{n} 1_{[y_i F(x_i) < 0]}, \qquad \widehat{GE} = \frac{1}{m} \sum_{j=1}^{m} 1_{[y'_j F(x'_j) < 0]}$$
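A minimal sketch of this naive-regression classifier (invented data): fit the ±1 labels by LSE and classify by the sign of the fitted linear function.

```r
# Minimal sketch: "naive regression" classification via lm() + sign.
set.seed(3)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- ifelse(x1 - x2 + rnorm(n, sd = 0.5) > 0, 1, -1)

fit   <- lm(y ~ x1 + x2)        # treat the +/-1 labels as numeric responses
F_hat <- fitted(fit)            # real-valued scores
TE    <- mean(y * F_hat < 0)    # zero-one training error
TE
```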


Regression-v-random

(random ensemble) Regression

Data: $(x_i, y_i)_{i=1}^{n}$, where $x_i \in \mathcal{X} = \mathbb{R}^p$, $y_i \in \mathcal{Y} = \mathbb{R}$.
Distribution: $(y_i, x_i)_{i=1}^{n} \overset{indep.}{\sim} P_{Y,X}$.

Class of base learners: $\mathcal{F}_B = \{1, X_1, \cdots, X_p\}$

Construction: random subset regression. For $k = 1, 2, \cdots, K$:

1. Randomly choose $m$ base learners $f \in \mathcal{F}_B$, $m < p + 1$.
2. Fit a (subset) regression (LSE) $f_k$, i.e., regress $Y$ on the chosen $m$ independent variables.

$F = \sum_{k=1}^{K} w_k f_k$, where $w_k$ is the weight of the $k$-th learner. Usually $\sum_k w_k = 1$, $0 < w_k < 1$.

Evaluation: sum of squared errors (SSE, or equivalently mean squared error, MSE)


Random (Average) Regression

Data: $(x_i, y_i)_{i=1}^{n}$, where $x_i \in \mathcal{X} = \mathbb{R}^p$, $y_i \in \mathcal{Y} = \mathbb{R} = (-\infty, \infty)$.
Distribution: $(y_i, x_i)_{i=1}^{n} \overset{indep.}{\sim} P_{Y,X}$.

Class of base learners: $\mathcal{F}_B = \{1, X_1, \cdots, X_p\}$

Construction: random subset regression. Repeat steps 1-2 for $K$ times (an R sketch follows below):

1. Randomly choose $m$ $X$'s from $\mathcal{F}_B$, $m < p + 1$.
2. Fit (LSE) a (subset) regression $f_k$, i.e., regress $Y$ on the chosen $m$ independent variables.

$F(x) = \frac{1}{K} \sum_{k=1}^{K} f_k(x)$.

Evaluation: SSE or MSE
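A minimal sketch of random-average regression (invented data): $K$ subset regressions on randomly chosen predictors, with predictions averaged. Equal weights $w_k = 1/K$ give the random-average scheme; unequal weights give the general random ensemble of the previous slide.

```r
# Minimal sketch: random average (random subset) regression.
set.seed(4)
n <- 100; p <- 10; K <- 25; m <- 3
X <- matrix(rnorm(n * p), n, p); colnames(X) <- paste0("X", 1:p)
y <- 1 + X[, 1] - 2 * X[, 2] + rnorm(n)
dat <- data.frame(y, X)

fits <- lapply(1:K, function(k) {
  vars <- sample(colnames(X), m)                        # step 1: random m predictors
  lm(reformulate(vars, response = "y"), data = dat)     # step 2: subset LSE fit f_k
})
F_hat <- rowMeans(sapply(fits, predict, newdata = dat)) # F(x) = (1/K) sum f_k(x)
mean((y - F_hat)^2)                                     # MSE of the ensemble
```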


Recap

Random Average Scheme (RA) + Base Learners

Random Average Regression: RA+(subset) Regression

Random Forests: RA+(subset) CART

Boosting and variations

Weighted average of CARTs with reweighted data-feeds

The population version of AdaBoost is a Newton-like update for minimizing the exponential criterion $E_{Y|x}\{e^{-YF(x)}\}$ (the loss used for construction)

Friedman, Hastie and Tibshirani (2000)


CART Algorithm (Regression)

Data: $(x_i, y_i)_{i=1}^{n}$, where $x_i \in \mathcal{X} = \mathbb{R}^p$, $y_i \in \mathcal{Y} = \mathbb{R} = (-\infty, \infty)$, and $x_i = (x_{i1}, \cdots, x_{ip})'$, $i = 1, \cdots, n$.

Greedy recursive binary partition (an rpart sketch follows below):

1. Find the split variable/point $(j, t)$ that solves
$$SSE_1 = \min_{j,t}\left[\min_{c_1} \sum_{x_i \in R_1(j,t)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,t)} (y_i - c_2)^2\right], \tag{1}$$
where $R_1(j,t) = \{X \mid X_j \le t\}$ and $R_2(j,t) = \{X \mid X_j > t\}$.
2. Given $(j, t)$, $(\hat c_1, \hat c_2)$ solves the inner minimization, and $\hat c_l = \mathrm{ave}(y_i \mid x_i \in R_l(j,t))$, $l = 1, 2$.
3. Continue adding splits one at a time, yielding regions $R_1, \cdots, R_M$ and
$$F(x) = \sum_{m=1}^{M} \hat c_m 1_{[x \in R_m]}.$$

Evaluation: SSE or MSE

Hastie, Tibshirani and Friedman (2001).
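A minimal sketch of this greedy partitioning with the rpart package (invented piecewise-constant data):

```r
# Minimal sketch: a regression tree grown by greedy binary splitting.
library(rpart)

set.seed(5)
n <- 200
x1 <- runif(n); x2 <- runif(n)
y  <- ifelse(x1 < 0.5, 2, -1) + rnorm(n, sd = 0.3)   # piecewise-constant signal
dat <- data.frame(y, x1, x2)

tree <- rpart(y ~ x1 + x2, data = dat, method = "anova")
printcp(tree)                   # splits and complexity table
mean((y - predict(tree))^2)     # training MSE
```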


RF and Boosting R packages

Titanic: Getting Started With R

rpart and randomForest packages on CRAN

Boosting algorithms

ada, adabag, gbm, mboost packages

Links: RStudio, Kaggle
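A minimal sketch of the basic calls, using the built-in iris data rather than the Kaggle Titanic set:

```r
# Minimal sketch: a single tree vs. ensembles, via CRAN packages.
library(rpart)
library(randomForest)
library(adabag)       # provides boosting()

data(iris)
tree <- rpart(Species ~ ., data = iris)                      # one CART
rf   <- randomForest(Species ~ ., data = iris, ntree = 500)  # random forest
ada  <- boosting(Species ~ ., data = iris, mfinal = 25)      # AdaBoost ensemble
rf$confusion                                                 # OOB confusion matrix
```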


Random Forests (for regression)

Data: $(x_i, y_i)_{i=1}^{n}$, where $x_i \in \mathcal{X} = \mathbb{R}^p$, $y_i \in \mathcal{Y} = \mathbb{R}$.
Distribution: $(y_i, x_i)_{i=1}^{n} \overset{iid}{\sim} P_{Y,X}$.

Class of base learners: $\mathcal{F}_B$, the collection of regression trees whose independent variables are a subset of $X_1, \cdots, X_p$.

Construction: Random Forests. For $k = 1, 2, \cdots, K$ (a from-scratch sketch follows below):

1. Draw a bootstrap sample $Z^*$ of size $n$ from the data.
2. Randomly choose $m$ independent variables among $X_1, \cdots, X_p$.
3. Fit a regression tree (CART), i.e., construct $f_k$ for $Y$ on the chosen $m$ independent variables with $Z^*$.

$F = \frac{1}{K} \sum_{k=1}^{K} f_k$.

Evaluation criterion: SSE, MSE.
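A from-scratch sketch of the algorithm exactly as stated on the slide: bootstrap plus one random subset of $m$ variables per tree, trees averaged. (Breiman's randomForest re-draws the candidate variables at every split; this simpler per-tree version follows the slide.)

```r
# Minimal sketch: random forest for regression, per the slide's steps.
library(rpart)

set.seed(6)
n <- 200; p <- 6; K <- 50; m <- 2
X <- matrix(rnorm(n * p), n, p); colnames(X) <- paste0("X", 1:p)
y <- sin(X[, 1]) + X[, 2]^2 + rnorm(n, sd = 0.3)
dat <- data.frame(y, X)

trees <- lapply(1:K, function(k) {
  idx  <- sample(n, replace = TRUE)                 # step 1: bootstrap sample Z*
  vars <- sample(colnames(X), m)                    # step 2: m random variables
  rpart(reformulate(vars, "y"), data = dat[idx, ])  # step 3: CART fit f_k
})
F_hat <- rowMeans(sapply(trees, predict, newdata = dat))  # F = (1/K) sum f_k
mean((y - F_hat)^2)                                       # training MSE
```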


Random Forests (classification)

The classification algorithm is similar, except:

the class of base learners is the collection of classification trees;

in choosing the split variable/point, squared error is substituted by a classification impurity measure, say misclassification error;

$F(x) = \sum_{m=1}^{M} \hat c_m 1_{[x \in R_m]}$, where $\hat c_m = \mathrm{majority}(y_i \mid x_i \in R_m)$.


AdaBoost-I

Given $(x_i, y_i)_{i=1}^{n}$ where $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y} = \{\pm 1\}$.

Algorithm (an R sketch follows below):

1. Initialize $D_1(i) = n^{-1}$ for $i = 1, 2, \cdots, n$; $D_t(i)$ is the weight on the $i$-th case in the $t$-th iteration.
2. For $t = 1, 2, \cdots, T$:
   Construct the (trained) learner $h_t: \mathcal{X} \to \mathcal{Y}$ using the weights $D_t$ on the data.
   Compute the error $\varepsilon_t = \sum_{i=1}^{n} D_t(i) 1_{[h_t(x_i) y_i < 0]}$; then $\alpha_t = \ln\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right)$.
   Update the weights $D_{t+1}(i) = D_t(i)\, e^{\alpha_t 1_{[h_t(x_i) y_i < 0]}}$ for all $i$.
   Normalize: $D_{t+1}(i) \leftarrow D_{t+1}(i) \left(\sum_{i=1}^{n} D_{t+1}(i)\right)^{-1}$ for all $i$.
3. Output the final hypothesis $F(x) = \mathrm{sgn}\left[\sum_{t=1}^{T} \alpha_t h_t(x)\right]$.
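A from-scratch sketch of this algorithm in R, with depth-1 rpart stumps as the weak learners $h_t$ (data and tuning values invented):

```r
# Minimal sketch: AdaBoost with rpart stumps, per the algorithm above.
library(rpart)

set.seed(7)
n <- 300
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- ifelse(x1^2 + x2^2 > 2, 1, -1)
dat <- data.frame(y, x1, x2)

T_iter <- 30
D <- rep(1 / n, n)                   # D_1(i) = 1/n
alpha <- numeric(T_iter)
score <- numeric(n)

for (t in 1:T_iter) {
  fit <- rpart(factor(y) ~ x1 + x2, data = dat, weights = D,
               control = rpart.control(maxdepth = 1, cp = 0))  # weak learner h_t
  h   <- ifelse(predict(fit, dat, type = "class") == "1", 1, -1)
  eps <- sum(D * (h * y < 0))                                  # weighted error
  alpha[t] <- log((1 - eps) / eps)
  D <- D * exp(alpha[t] * (h * y < 0)); D <- D / sum(D)        # reweight, normalize
  score <- score + alpha[t] * h
}
mean(sign(score) != y)   # training error of F = sgn(sum alpha_t h_t)
```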


ABC about Boosting

Start with a simple type of learner (regression, tree, etc.)

Train the learner several times (the learning process) and obtain the fitted learner using the data

Higher weights to misclassified cases each time

Final prediction: weighted majority vote

Criteria: TE (training error) and GE (testing error)

Under the weak base hypothesis assumption, TE → 0 (see the bound below).

Rather immune to overfitting for less noisy data in applications.

Breiman (1996): “Best off-the-shelf classifier in the world”
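To make the TE → 0 claim concrete: a classical bound of Freund and Schapire (1997), not stated on the slide, gives exponential decay of the training error whenever every weak learner beats random guessing by a margin $\gamma_t = 1/2 - \varepsilon_t > 0$:

$$TE \le \prod_{t=1}^{T} 2\sqrt{\varepsilon_t (1 - \varepsilon_t)} = \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2} \le \exp\Big(-2 \sum_{t=1}^{T} \gamma_t^2\Big).$$

(The bound is derived for $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$; the slide's $\alpha_t$ is twice that, which leaves the sign of the final vote, and hence the classifier, unchanged.)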


FHT’s Results

Friedman, Hastie, and Tibshirani (2000)

The population version of AdaBoost is a Newton-like update for minimizing the exponential criterion $E_{Y|x}\{e^{-YF(x)}\}$.

Let $J_e(F) = E_{Y|x}\{e^{-YF(x)}\}$ and consider

$$\arg\min_f J_e(F + f) \approx \arg\min_{s,c} \tilde J_e(F + cs),$$

where $\tilde J_e$ is an approximation of $J_e$. The solutions are

$$s(x) = \begin{cases} +1, & \text{if } E_{w_e}\{Y \mid x\} > 0, \\ -1, & \text{otherwise}, \end{cases} \qquad c = \frac{1}{2} \ln\left(\frac{1 - err}{err}\right),$$

where $w_e = e^{-YF(x)}$ and $err = E_{w_e}\{1_{[Y \neq s(x)]}\}$.
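For intuition, the step size $c$ follows from a one-line calculation (standard, though not spelled out on the slide): conditionally on $x$, with weights $w_e$,

$$E_{w_e}\{e^{-cYs(x)}\} = e^{-c}(1 - err) + e^{c}\, err, \qquad \frac{d}{dc}\left[e^{-c}(1 - err) + e^{c}\, err\right] = 0 \;\Longrightarrow\; c = \frac{1}{2}\ln\Big(\frac{1 - err}{err}\Big).$$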


Exponential Loss

The 0-1 loss can be bounded by the exponential loss:

$$1_{[Y \neq F(x)]} = 1_{[YF(x) < 0]} \le e^{-YF(x)} \overset{def}{=} L_e[Y, F(x)].$$

Figure: Exponential loss vs. 0-1 loss as functions of YF(X).

Alternative losses can be used for classification and regression problems.
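A minimal sketch reproducing the slide's figure in base R, over the axis range shown there:

```r
# Exponential loss upper-bounds the 0-1 loss as a function of y * F(x).
margin <- seq(-0.5, 1.5, length.out = 200)
plot(margin, exp(-margin), type = "l", col = "blue",
     xlab = "YF(X)", ylab = "Loss", ylim = c(0, 1.5))
lines(margin, as.numeric(margin < 0), col = "red")   # 0-1 loss
legend("topright", c("Exponential Loss", "0-1 Loss"),
       col = c("blue", "red"), lty = 1)
```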


Concluding remarks

Framework

RF and Boosting are powerful and better "off-the-shelf" (?) learners

Ready for high-dimensional problems


Figure: What does a Data Scientist do?


Thanks for your attention!


References

Biau, G., Devroye, L. and Lugosi, G. (2008). Consistency of random forests and other averaging classifiers. Journal of Machine Learning Research, 9, 2039–2057.

Breiman, L. (2000). Some infinity theory for predictor ensembles. Technical Report 577, Statistics Department, UC Berkeley.

Breiman, L. (2001). Random Forests. Machine Learning, 45, 5–32.

Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and Regression Trees. Wadsworth.

Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag.

Friedman, J. H., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28, 337–407.

Hong, B.-Z. (2013). Random Average Regression Methods. Master Thesis, National Dong Hwa University, Taiwan.

Tsao, C. A. (2014). A Statistical Introduction to Ensemble Learning Methods. Journal of Chinese Statistical Association, 52, 115–132.


Performance of Average Learner

For the training data $D = (y_i, x_i)_{i=1}^{n}$, the MSE for learner $f_k$ is

$$MSE(f_k) = \frac{1}{n} \sum_{i=1}^{n} (y_i - f_k(x_i))^2.$$

Let $w_k$ be the weight of $f_k$, with $\sum_{k=1}^{K} w_k = 1$ and $0 < w_k < 1$. Note that

$$E_w\, MSE(f_w) = \sum_{k=1}^{K} w_k \left(\frac{1}{n} \sum_{i=1}^{n} (y_i - f_k(x_i))^2\right) \ge \frac{1}{n} \sum_{i=1}^{n} (y_i - E_w f_w(x_i))^2 = MSE(E_w f_w),$$

where $E_w f_w(x) = \sum_{k=1}^{K} w_k f_k(x)$. The inequality is Jensen's inequality applied, for each fixed $i$, to the convex function $u \mapsto (y_i - u)^2$.


When $w_k = 1/K$, $E_w f_w(x)$ is the average of the $K$ learners, and

$$MSE(E_w f_w) \le \mathrm{average}_k\, MSE(f_k) = \frac{1}{K} \sum_k MSE(f_k).$$

However, it does not necessarily hold that $MSE(E_w f_w) \le MSE(f_k)$ for all $k$: the averaged learner is at least as good as the average of the individual learners, but not necessarily as good as the best single one. (A numerical check follows below.)
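A numerical check of the averaging inequality (a sketch, not from the slides; data and ensemble sizes invented):

```r
# The MSE of the averaged learner is no worse than the weighted
# average of the individual learners' MSEs.
set.seed(8)
n <- 100; p <- 8; K <- 20; m <- 2
X <- matrix(rnorm(n * p), n, p); colnames(X) <- paste0("X", 1:p)
y <- X[, 1] - X[, 2] + rnorm(n)
dat <- data.frame(y, X)

preds <- sapply(1:K, function(k) {
  vars <- sample(colnames(X), m)
  predict(lm(reformulate(vars, "y"), data = dat))   # f_k(x_i), i = 1..n
})
w <- rep(1 / K, K)                                  # equal weights
mse_each <- colMeans((y - preds)^2)                 # MSE(f_k)
sum(w * mse_each)                                   # E_w MSE(f_w)
mean((y - preds %*% w)^2)                           # MSE(E_w f_w), <= the above
```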
