Boosting


Transcript of Boosting

Page 1: Boosting

Boosting

Rong Jin

Page 2: Boosting

Inefficiency with Bagging

[Figure: the bagging pipeline. Bootstrap sampling draws datasets D1, D2, …, Dk from the training set D; classifiers h1, h2, …, hk are trained on them and combined with equal weight via Pr(c_i | h_i, x).]

Inefficient bootstrap sampling:
• Every example has an equal chance of being sampled
• No distinction between “easy” examples and “difficult” examples

Inefficient model combination:
• A constant weight for each classifier
• No distinction between accurate classifiers and inaccurate classifiers
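To make both inefficiencies concrete, here is a minimal sketch of bagging in Python (the base-learner callback train_base and the {+1, -1} label convention are placeholders, not from the slides): sampling is uniform over examples, and the final vote is unweighted.

```python
import numpy as np

def bagging(X, y, train_base, k=10, seed=0):
    """Train k classifiers on bootstrap samples of (X, y).
    Every example is drawn with equal probability, regardless of difficulty."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)  # uniform sampling with replacement
        classifiers.append(train_base(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """Equal-weight majority vote over {+1, -1} predictions."""
    votes = np.sum([h(X) for h in classifiers], axis=0)
    return np.sign(votes)
```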

Page 3: Boosting

Improve the Efficiency of Bagging

Better sampling strategy:
• Focus on the examples that are difficult to classify

Better combination strategy:
• Accurate models should be assigned larger weights

Page 4: Boosting

Intuition

[Figure: training examples (x1, y1), (x2, y2), (x3, y3), (x4, y4). Classifier 1 misclassifies (x1, y1) and (x3, y3); combining Classifier 1 and Classifier 2 leaves only (x1, y1) misclassified; adding Classifier 3 eliminates the remaining mistake.]

No training mistakes!! But the combined model may overfit!!

Page 5: Boosting

AdaBoost Algorithm
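The algorithm on this slide was an image that did not survive transcription. Below is a minimal sketch of the standard binary AdaBoost loop (the slides sample from D_t; this sketch uses the equivalent reweighting form, where the weak learner — the placeholder name train_weak — receives the distribution directly):

```python
import numpy as np

def adaboost(X, y, train_weak, T=50):
    """Standard binary AdaBoost; labels y must be in {+1, -1}.
    train_weak(X, y, D) returns a classifier h with h(X) in {+1, -1}."""
    n = len(X)
    D = np.full(n, 1.0 / n)                  # D_0: uniform over examples
    hs, alphas = [], []
    for t in range(T):
        h = train_weak(X, y, D)
        pred = h(X)
        eps = D[pred != y].sum()             # weighted error of h_t
        eps = np.clip(eps, 1e-10, 1 - 1e-10)
        if eps >= 0.5:                       # weak learner must beat chance
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        D *= np.exp(-alpha * y * pred)       # up-weight mistakes
        D /= D.sum()                         # renormalize to a distribution
        hs.append(h)
        alphas.append(alpha)
    # Final classifier: sign of the alpha-weighted vote
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hs)))
```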

Page 6: Boosting

AdaBoost Example: α_t = ln 2

Training set: (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5)

D0: (1/5, 1/5, 1/5, 1/5, 1/5)

Sample {(x1, y1), (x3, y3), (x5, y5)} → train h1

Update weights: h1 misclassifies x1 and x3, so their weights are doubled (multiplied by e^{α_t} = 2) and all weights are renormalized

D1: (2/7, 1/7, 2/7, 1/7, 1/7)

Sample {(x1, y1), (x3, y3)} → train h2

Update weights: h2 misclassifies x3, so its weight is doubled again and renormalized

D2: (2/9, 1/9, 4/9, 1/9, 1/9)

Sample …
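A few lines of Python confirm the weight arithmetic above (multiply each misclassified example's weight by e^{α_t} = e^{ln 2} = 2, then normalize):

```python
import numpy as np

alpha = np.log(2)            # alpha_t = ln 2, so e^alpha = 2
D = np.full(5, 1 / 5)        # D_0: uniform over 5 examples

# Round 1: h1 misclassifies x1 and x3 (0-based indices 0 and 2)
D[[0, 2]] *= np.exp(alpha)
D /= D.sum()
print(D)                     # [2/7, 1/7, 2/7, 1/7, 1/7]

# Round 2: h2 misclassifies x3
D[2] *= np.exp(alpha)
D /= D.sum()
print(D)                     # [2/9, 1/9, 4/9, 1/9, 1/9]
```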

Page 7: Boosting

How To Choose α_t in AdaBoost?

How to construct the best distribution D_{t+1}(i):
1. D_{t+1}(i) should be significantly different from D_t(i)
2. D_{t+1}(i) should create a situation in which classifier h_t performs poorly
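The accompanying formula image was lost in transcription; the standard AdaBoost update that satisfies both requirements is sketched below in LaTeX (Z_t, the normalizer, is the only symbol not taken from the slides; the example on slide 6 uses the equivalent variant that multiplies only the misclassified weights by e^{α_t}):

```latex
% Standard AdaBoost weight update; Z_t normalizes D_{t+1} to a distribution.
D_{t+1}(i) = \frac{D_t(i)\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t}
% With the standard choice of \alpha_t (next slide), h_t has weighted error
% exactly 1/2 under D_{t+1}, i.e. it "performs poorly" on the new distribution:
%   \sum_i D_{t+1}(i)\,[h_t(x_i) \ne y_i] = 1/2
```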

Page 8: Boosting

How To Choose α_t in AdaBoost?
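This slide's equations were also an image; the standard choice they present (derived via the optimization view on the following slides) is, writing ε_t for the weighted error of h_t:

```latex
\varepsilon_t = \sum_{i} D_t(i)\,\mathbb{1}\bigl[h_t(x_i) \neq y_i\bigr],
\qquad
\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}
```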

Page 9: Boosting

Optimization View for Choosing α_t

h_t(x): x → {+1, -1}, a base (weak) classifier
H_T(x) = Σ_{t=1}^{T} α_t h_t(x), a linear combination of base classifiers

Goal: minimize training error

Approximate the 0/1 training error with an exponential function
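The bound shown on the slide (lost in transcription) is the standard exponential upper bound on the 0/1 loss, using 1[z ≤ 0] ≤ e^{-z}:

```latex
\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\bigl[y_i \neq \operatorname{sign}(H_T(x_i))\bigr]
\;\le\;
\frac{1}{n}\sum_{i=1}^{n}\exp\!\bigl(-y_i\,H_T(x_i)\bigr)
```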

Page 10: Boosting

AdaBoost: Greedy Optimization

Fix H_{T-1}(x), and solve for h_T(x) and α_T
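The derivation on this slide was an image; the standard greedy step it presents goes as follows (ε is the D_T-weighted error of h_T; constant factors are dropped at the proportionality step):

```latex
\sum_i e^{-y_i H_T(x_i)}
  = \sum_i e^{-y_i H_{T-1}(x_i)}\, e^{-\alpha_T\, y_i h_T(x_i)}
  \;\propto\; (1-\varepsilon)\, e^{-\alpha_T} + \varepsilon\, e^{\alpha_T}
% Setting the derivative with respect to \alpha_T to zero yields
\alpha_T = \frac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}
```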

Page 11: Boosting

Empirical Study of AdaBoost

AdaBoosting decision trees:
• Generate 50 decision trees by AdaBoost
• Linearly combine the decision trees using the weights of AdaBoost

In general:
• AdaBoost = Bagging > C4.5
• AdaBoost usually needs fewer classifiers than Bagging
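For a modern point of reference (not the slides' original C4.5 setup), scikit-learn's AdaBoostClassifier reproduces the 50-tree recipe; note the parameter is named base_estimator rather than estimator in scikit-learn versions before 1.2:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the original study used benchmark datasets
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 50 boosted depth-1 trees, linearly combined with the AdaBoost weights
booster = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=0,
)
print(cross_val_score(booster, X, y, cv=5).mean())
```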

Page 12: Boosting

Bias-Variance Tradeoff for AdaBoost
• AdaBoost can reduce both variance and bias simultaneously

[Figure: bias and variance compared for a single decision tree, bagging decision trees, and AdaBoosting decision trees.]