© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

Ensemble Methods

An ensemble method constructs a set of base classifiers from the training data

– Ensemble or Classifier Combination

Predict class label of previously unseen records by aggregating predictions made by multiple classifiers


General Idea

[Diagram: general idea of an ensemble]
Original training data: D
Step 1: Create multiple data sets D1, D2, ..., Dt-1, Dt
Step 2: Build multiple classifiers C1, C2, ..., Ct-1, Ct
Step 3: Combine the classifiers into C*
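As a rough illustration of Step 3, the sketch below combines already-trained base classifiers by majority vote. The function name combine_classifiers and the assumption that each base classifier exposes a scikit-learn-style predict() method are illustrative choices, not from the slides.

```python
import numpy as np

def combine_classifiers(classifiers, X):
    """Predict class labels for X by majority vote over fitted base classifiers.

    Assumes integer class labels and base classifiers with a .predict() method.
    """
    # Collect each base classifier's predictions: shape (t, n_records)
    votes = np.stack([clf.predict(X) for clf in classifiers])
    # For every record, return the most frequently predicted label
    return np.array([np.bincount(col).argmax() for col in votes.T])
```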


Why does it work?

Suppose there are 25 base classifiers

– Each classifier has error rate ε = 0.35

– Assume classifiers are independent

– Probability that the ensemble classifier makes a wrong prediction:

$$P(\text{ensemble makes a wrong prediction}) = \sum_{i=13}^{25} \binom{25}{i}\,\varepsilon^{i}\,(1-\varepsilon)^{25-i} \approx 0.06$$
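The 0.06 figure can be checked numerically. A quick verification (my own, not part of the slides) using scipy's binomial distribution:

```python
from scipy.stats import binom

eps = 0.35   # error rate of each base classifier
n = 25       # number of independent base classifiers
# A majority vote is wrong when 13 or more of the 25 classifiers err
ensemble_error = sum(binom.pmf(i, n, eps) for i in range(13, n + 1))
print(round(ensemble_error, 3))   # ~0.06, far below the individual 0.35
```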


Methods

By manipulating the training data set: a classifier is built on each sampled subset of the training data.

– Two such ensemble methods: bagging (bootstrap aggregating) and boosting.


Characteristics

Ensemble methods work better with unstable classifiers.

– Base classifiers that are sensitive to minor perturbations in the training set, for example decision trees and ANNs.

– The variability among training examples is one of the primary sources of errors in a classifier.


Bias-Variance Decomposition

Consider the trajectories of a projectile launched at a particular angle. The observed distance between where the projectile lands and the target can be divided into 3 components.

– Force (f) and angle (θ)

– Suppose the target is t, but the projectile hits the ground at y, at distance d away from t.

$$d_{f,\theta}(y, t) = \mathrm{Bias}_{\theta} + \mathrm{Variance}_{f} + \mathrm{Noise}_{t}$$


Two Decision Trees (1)


Two Decision Trees (2)

Bias: The stronger the assumptions made by a classifier about the nature of its decision boundary, the larger the classifier's bias will be.

– A smaller tree makes stronger assumptions.
– With a large bias, the algorithm cannot learn the target.

Variance: Variability in the training data affects the expected error, because different compositions of the training set may lead to different decision boundaries.

Noise: Intrinsic noise in the target class.

– The target class for some domains can be non-deterministic.
– The same attribute values can have different class labels.


Bagging

Sampling with replacement

Build classifier on each bootstrap sample

Each training record has probability 1 − (1 − 1/n)^n of being selected in a given bootstrap sample. When n is large, a bootstrap sample Di therefore contains about 63.2% of the original training data.

Original Data:     1  2  3  4  5  6  7  8  9  10
Bagging (Round 1): 7  8  10 8  2  5  10 10 5  9
Bagging (Round 2): 1  4  9  1  2  3  2  7  3  2
Bagging (Round 3): 1  8  5  10 5  5  9  6  3  7
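A quick empirical check of the ~63.2% coverage claim; a minimal sketch assuming uniform sampling with replacement (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                                  # training set size
bootstrap = rng.integers(0, n, size=n)      # one bootstrap sample D_i, drawn with replacement
coverage = len(np.unique(bootstrap)) / n    # fraction of distinct original records included
print(coverage, 1 - (1 - 1 / n) ** n)       # both close to 0.632
```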


Bagging Algorithm
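The algorithm figure from this slide is not reproduced in the transcript. The sketch below follows the standard bagging procedure described above (bootstrap sampling, one base classifier per sample, majority vote); scikit-learn's DecisionTreeClassifier as the base learner and the function name bagging are my assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging(X, y, X_test, k=10, seed=0):
    """Train k base classifiers on bootstrap samples of (X, y), then vote on X_test."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)                 # D_i: n records drawn with replacement
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    votes = np.stack([clf.predict(X_test) for clf in classifiers])
    # Majority vote over the k predictions (assumes integer class labels)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```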


A Bagging Example (1)

Consider a one-level binary decision tree with test x <= k, where the split point k is chosen to minimize the entropy.

Without bagging, the best decision stump is

– x <= 0.35 or x >= 0.75
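The example data from the following slides is not reproduced in this transcript, so the sketch below only shows how such a stump threshold could be searched; the arrays x and y it would be called with are placeholders:

```python
import numpy as np

def entropy(labels):
    """Entropy (in bits) of an integer label array; 0 for an empty split."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def best_stump(x, y):
    """Find the threshold k for the rule 'x <= k' with minimum weighted entropy."""
    values = np.unique(x)
    thresholds = (values[:-1] + values[1:]) / 2          # candidate splits between adjacent values
    best_k, best_e = None, np.inf
    for k in thresholds:
        left, right = y[x <= k], y[x > k]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if e < best_e:
            best_k, best_e = k, e
    return best_k, best_e
```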


A Bagging Example (2)


A Bagging Example (3)


A Bagging Example (4)


Summary on Bagging

Bagging improves generalization error by reducing the variance of the base classifier.

Bagging depends on the stability of the base classifier.

If a base classifier is unstable, bagging helps to reduce the errors associated with random fluctuations in the training data.

If a base classifier is stable, then the error of the ensemble is primarily caused by bias in the base classifier. Bagging may even make the error larger, because each bootstrap sample effectively contains about 37% fewer distinct training examples.


Boosting

An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records

– Initially, all N records are assigned equal weights

– Unlike bagging, the weights may change at the end of each boosting round

– The weights can be used by a base classifier to learn a model that is biased toward higher-weight examples.


Boosting

Records that are wrongly classified will have their weights increased

Records that are classified correctly will have their weights decreased

Original Data:      1  2  3  4  5  6  7  8  9  10
Boosting (Round 1): 7  3  2  8  7  9  4  10 6  3
Boosting (Round 2): 5  4  9  4  2  5  1  7  4  2
Boosting (Round 3): 4  4  8  10 4  5  4  6  3  4

• Example 4 is hard to classify

• Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds
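A minimal sketch of how a boosting round could be sampled according to the record weights (the weights below are illustrative, not from the slides; numpy's Generator.choice does the weighted draw):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
weights = np.full(n, 1 / n)       # initially, all N records get equal weight
weights[3] *= 3                   # suppose record 4 was misclassified and its weight increased
weights /= weights.sum()          # keep the weights a valid probability distribution
# Draw the next round's training set: hard-to-classify records are chosen more often
round_sample = rng.choice(np.arange(1, n + 1), size=n, replace=True, p=weights)
print(round_sample)
```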


Example: AdaBoost

Base classifiers: C1, C2, …, CT

Error rate:

$$\varepsilon_i = \frac{1}{N}\sum_{j=1}^{N} w_j\,\delta\big(C_i(x_j) \neq y_j\big)$$

Importance of a classifier:

$$\alpha_i = \frac{1}{2}\ln\!\left(\frac{1-\varepsilon_i}{\varepsilon_i}\right)$$
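A small numeric illustration of these two formulas (the example weights and labels are my own, not from the slides). It assumes the record weights are normalized to sum to one, so the 1/N factor is absorbed into the weights:

```python
import numpy as np

def importance(w, pred, y):
    """Weighted error rate and importance alpha for one base classifier."""
    eps = np.sum(w * (pred != y))               # weighted error rate (weights sum to 1)
    alpha = 0.5 * np.log((1 - eps) / eps)
    return eps, alpha

w = np.full(10, 0.1)                             # ten records, equal weights
y    = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 0]) # one mistake -> eps = 0.1
print(importance(w, pred, y))                    # (0.1, alpha ~= 1.10)
```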


Example: AdaBoost

Weight update:

$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$

where $Z_j$ is the normalization factor that ensures $\sum_i w_i^{(j+1)} = 1$.

If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated.

Classification:

$$C^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j\,\delta\big(C_j(x) = y\big)$$
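Putting the pieces together, here is a compact AdaBoost-style sketch built from these formulas. It uses scikit-learn decision stumps as base classifiers and trains them with sample_weight instead of explicit resampling, which is a common variant; all function and variable names are mine:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=10):
    """Train T decision stumps with AdaBoost-style weight updates."""
    n = len(X)
    w = np.full(n, 1 / n)                            # all N records start with equal weight
    classifiers, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = np.sum(w * (pred != y))                # weighted error rate
        if eps > 0.5:                                # bad round: revert to uniform weights
            w = np.full(n, 1 / n)
            continue
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))
        w = w * np.where(pred == y, np.exp(-alpha), np.exp(alpha))
        w /= w.sum()                                 # Z_j: renormalize so the weights sum to 1
        classifiers.append(stump)
        alphas.append(alpha)
    return classifiers, np.array(alphas)

def adaboost_predict(classifiers, alphas, X, classes):
    """C*(x) = argmax_y sum_j alpha_j * delta(C_j(x) = y)."""
    scores = np.zeros((len(X), len(classes)))
    for clf, a in zip(classifiers, alphas):
        pred = clf.predict(X)
        for c_idx, c in enumerate(classes):
            scores[:, c_idx] += a * (pred == c)
    return np.asarray(classes)[scores.argmax(axis=1)]
```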


A Boosting Example (1)

Consider a one-level binary decision tree with test x <= k, where the split point k is chosen to minimize the entropy.

Without boosting, the best decision stump is

– x <= 0.35 or x >= 0.75


A Boosting Example (2)


A Boosting Example (3)