AdaBoost

Lecturer: Jiří Matas
Authors: Jan Šochman, Jiří Matas, Jana Kostlivá, Ondřej Drbohlav
Centre for Machine Perception, Czech Technical University, Prague
http://cmp.felk.cvut.cz
Last update: 16.11.2017
2/33 AdaBoost

Presentation outline

• AdaBoost algorithm
  • Why is it of interest?
  • How does it work?
  • Why does it work?
• AdaBoost variants

History

• 1990 – Boost-by-majority algorithm (Freund)
• 1995 – AdaBoost (Freund & Schapire)
• 1997 – Generalized version of AdaBoost (Schapire & Singer)
• 2001 – AdaBoost in face detection (Viola & Jones)
3/33 What is Discrete AdaBoost?

AdaBoost is an algorithm for designing a strong classifier H(x) from weak classifiers h_t(x) (t = 1, ..., T) selected from a weak classifier set B. The strong classifier H(x) is constructed as

H(x) = sign(f(x)),

where

f(x) = Σ_{t=1}^{T} α_t h_t(x)

is a linear combination of the weak classifiers h_t(x) with positive weights α_t > 0. Every weak classifier h_t is a binary classifier that outputs −1 or 1.

AdaBoost handles both the selection of h_t(x) ∈ B and the choice of α_t, for gradually increasing t.

The set of weak classifiers B = {h(x)} can be finite or infinite.
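To make the definition concrete, here is a minimal sketch of evaluating the strong classifier. The two weak classifiers, their thresholds, and the weights are hypothetical values chosen purely for illustration:

```python
def strong_classifier(x, weak_classifiers, alphas):
    """H(x) = sign(f(x)), where f(x) = sum_t alpha_t * h_t(x)."""
    f = sum(a * h(x) for h, a in zip(weak_classifiers, alphas))
    return 1 if f >= 0 else -1

# Hypothetical weak classifiers: threshold tests on the coordinates of a 2-D point
h1 = lambda x: 1 if x[0] > 0.0 else -1
h2 = lambda x: 1 if x[1] > 0.5 else -1

# f((1.0, 0.0)) = 0.7*1 + 0.3*(-1) = 0.4 > 0, so H = 1
print(strong_classifier((1.0, 0.0), [h1, h2], [0.7, 0.3]))
```

Note that H(x) can differ from every individual h_t(x): the weighted vote, not any single weak classifier, decides the label.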
4/33 Example 1 – Training Set and Weak Classifier Set

[Figure: the two class-conditional densities p(x|1) and p(x|2) over (x1, x2), the optimal decision boundary, and the profile of the distributions along the shown line]

Training set: samples generated from the two distributions shown, with

p(1) = p(2) = 0.5.    (1)

The Bayes error is 2.6%.

In the slides to follow, the classes are renamed from (1, 2) to (1, −1).
5/33 Example 1 – Training Set and Weak Classifier Set

[Figure: the training points of the two classes, one projection direction w with bias b, and the training error ε as a function of the offset −b along that projection direction]

Training set: (x1, y1), ..., (xL, yL), where xi ∈ R² and yi ∈ {−1, 1}, generated from the two distributions, N = 200 points from each.

The class distributions are not known to AdaBoost.

Weak classifier: a linear classifier

h_{w,b}(x) = sign(w · x + b),

where w is the projection direction vector and b is the bias.

Weak classifier set B:

{h_{w,b} | w ∈ {w1, w2, ..., wN}, b ∈ R}

• N is the number of projection directions used
• for each projection direction w, varying the bias b results in different training errors ε.
6/33 AdaBoost Algorithm – Schapire & Singer (1997)

Input: (x1, y1), ..., (xL, yL), where xi ∈ X and yi ∈ {−1, 1}.
Initialize data weights D_1(i) = 1/L.

For t = 1, ..., T:

• (WeakLearn) Find h_t = argmin_{h ∈ B} ε_t, where

  ε_t = Σ_{i=1}^{L} D_t(i) ⟦y_i ≠ h(x_i)⟧,   with ⟦true⟧ := 1, ⟦false⟧ := 0.

• If ε_t ≥ 1/2 then stop.

• Set α_t = (1/2) log((1 − ε_t)/ε_t).   Note: α_t > 0 because ε_t < 1/2.

• Update

  D_{t+1}(i) = D_t(i) e^{−α_t y_i h_t(x_i)} / Z_t,   Z_t = Σ_{i=1}^{L} D_t(i) e^{−α_t y_i h_t(x_i)},

  where Z_t is a normalization factor chosen so that D_{t+1} is a distribution.

Output the final classifier:

H(x) = sign(f(x)),   f(x) = Σ_{t=1}^{T} α_t h_t(x)
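The loop above can be sketched in a few dozen lines. This is a minimal illustration that uses exhaustive search over axis-aligned decision stumps as the WeakLearn step; the stump family, the toy dataset, and the clamping of ε_t away from zero are assumptions for illustration, not part of the original slides:

```python
import math

def stump_predict(x, dim, thresh, sign):
    """Axis-aligned decision stump: predict sign if x[dim] > thresh, else -sign."""
    return sign if x[dim] > thresh else -sign

def train_adaboost(X, y, T):
    L = len(X)
    D = [1.0 / L] * L                       # D_1(i) = 1/L
    ensemble = []                           # list of (alpha_t, dim, thresh, sign)
    for _ in range(T):
        # WeakLearn: pick the stump with minimal weighted error eps_t
        best = None
        for dim in range(len(X[0])):
            for thresh in sorted({x[dim] for x in X}):
                for sign in (1, -1):
                    eps = sum(D[i] for i in range(L)
                              if stump_predict(X[i], dim, thresh, sign) != y[i])
                    if best is None or eps < best[0]:
                        best = (eps, dim, thresh, sign)
        eps, dim, thresh, sign = best
        if eps >= 0.5:                      # weak classifier no better than chance
            break
        eps = max(eps, 1e-10)               # avoid log(0) on perfectly separable data
        alpha = 0.5 * math.log((1 - eps) / eps)
        # Reweight: up-weight misclassified points, then normalize by Z_t
        D = [D[i] * math.exp(-alpha * y[i] * stump_predict(X[i], dim, thresh, sign))
             for i in range(L)]
        Z = sum(D)
        D = [d / Z for d in D]
        ensemble.append((alpha, dim, thresh, sign))
    return ensemble

def predict(ensemble, x):
    f = sum(a * stump_predict(x, dim, th, s) for a, dim, th, s in ensemble)
    return 1 if f >= 0 else -1

# Toy 2-D dataset (made up for illustration): two separable clusters
X = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 2), (2, 3)]
y = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(X, y, T=3)
print([predict(model, x) for x in X])  # matches y on this separable toy set
```

The exhaustive stump search makes the WeakLearn step explicit; any learner that returns a classifier with ε_t < 1/2 under the current weights D_t could be substituted.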
7/33 Note: Data Reweighting

D_t(i): previous data weights; D_{t+1}(i): new data weights.

Weight update (recall that α_t > 0):

D_{t+1}(i) = D_t(i) e^{−α_t y_i h_t(x_i)} / Z_t    (2)

e^{−α_t y_i h_t(x_i)} = e^{−α_t} < 1 if y_i = h_t(x_i), and e^{α_t} > 1 if y_i ≠ h_t(x_i)    (3)

⇒ the weight of wrongly classified examples increases, and the weight of correctly classified examples decreases.
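Numerically, one reweighting step looks like this; the four-point set with ε_t = 1/4 and a single misclassified point is a hypothetical example:

```python
import math

eps = 0.25
alpha = 0.5 * math.log((1 - eps) / eps)      # alpha_t = 0.5 * ln 3
D = [0.25, 0.25, 0.25, 0.25]                 # D_t: uniform weights
correct = [True, True, True, False]          # h_t misclassifies only the last point

D_new = [d * math.exp(-alpha if c else alpha) for d, c in zip(D, correct)]
Z = sum(D_new)                               # Z_t = 2*sqrt(eps*(1-eps)) = sqrt(3)/2
D_new = [d / Z for d in D_new]

print(D_new)  # the misclassified point now carries weight 1/2, the others 1/6 each
```

After normalization the misclassified point holds half of the total weight, so the next weak classifier is strongly pushed to get it right.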
8/33 Example 1 – iteration 1

ε_t = 28.8%, α_t = 0.454, Z_t = 0.905
Training error of H_t: 28.7%, test error of H_t: 33.7%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
9/33 Example 1 – iteration 2

ε_t = 32.1%, α_t = 0.375, Z_t = 0.934
Training error of H_t: 28.7%, test error of H_t: 33.7%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
10/33 Example 1 – iteration 3

ε_t = 29.2%, α_t = 0.443, Z_t = 0.909
Training error of H_t: 13.0%, test error of H_t: 18.3%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
11/33 Example 1 – iteration 4

ε_t = 28.3%, α_t = 0.465, Z_t = 0.901
Training error of H_t: 20.2%, test error of H_t: 24.5%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
12/33 Example 1 – iteration 5

ε_t = 34.9%, α_t = 0.312, Z_t = 0.953
Training error of H_t: 19.2%, test error of H_t: 25.1%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
13/33 Example 1 – iteration 6

ε_t = 29.0%, α_t = 0.447, Z_t = 0.908
Training error of H_t: 9.50%, test error of H_t: 12.9%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
14/33 Example 1 – iteration 7

ε_t = 29.8%, α_t = 0.429, Z_t = 0.915
Training error of H_t: 7.75%, test error of H_t: 12.4%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
15/33 Example 1 – iteration 8

ε_t = 32.3%, α_t = 0.369, Z_t = 0.935
Training error of H_t: 11.0%, test error of H_t: 16.7%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
16/33 Example 1 – iteration 9

ε_t = 31.0%, α_t = 0.400, Z_t = 0.925
Training error of H_t: 3.75%, test error of H_t: 7.90%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
17/33 Example 1 – iteration 10

ε_t = 29.2%, α_t = 0.443, Z_t = 0.909
Training error of H_t: 4.50%, test error of H_t: 9.13%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
18/33 Example 1 – iteration 20

ε_t = 32.0%, α_t = 0.376, Z_t = 0.933
Training error of H_t: 2.25%, test error of H_t: 5.27%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
19/33 Example 1 – iteration 40

ε_t = 40.4%, α_t = 0.194, Z_t = 0.982
Training error of H_t: 1.00%, test error of H_t: 3.34%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
20/33 Example 1 – iteration 60

ε_t = 41.1%, α_t = 0.179, Z_t = 0.984
Training error of H_t: 0.250%, test error of H_t: 3.33%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
21/33 Example 1 – iteration 68

ε_t = 38.3%, α_t = 0.239, Z_t = 0.972
Training error of H_t: 0.00%, test error of H_t: 3.35%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
22/33 Example 1 – iteration 100

ε_t = 40.7%, α_t = 0.189, Z_t = 0.982
Training error of H_t: 0.00%, test error of H_t: 3.36%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
23/33 Example 1 – iteration 150

ε_t = 42.6%, α_t = 0.149, Z_t = 0.989
Training error of H_t: 0.00%, test error of H_t: 3.40%

[Figure: ε(w) at the optimal b; f′_t(x); H_t(x) = sign(f′(x)); train/test error of H_t over iterations]
24/33 Upper bound theorem (1/2)

Theorem: In iteration T, the following upper bound holds for the training error ε of H_T:

ε = (1/L) Σ_{i=1}^{L} ⟦y_i ≠ H_T(x_i)⟧ ≤ ∏_{t=1}^{T} Z_t.

[Figure: y H_T(x) and exp(−y f_T(x)) as functions of y f_T(x)]

Proof: It holds that

⟦H_T(x_i) ≠ y_i⟧ ≤ exp(−y_i f_T(x_i)),    (4)

which can be checked by a simple observation (the inequality follows from the first and last columns):

⟦H_T(x) ≠ y⟧   classification   y H_T(x)   y f_T(x)   exp(−y f_T(x))
0              correct          1          > 0        ≥ 0
1              incorrect        −1         < 0        ≥ 1

Summing over the training dataset and dividing by L, we get

ε = (1/L) Σ_i ⟦H_T(x_i) ≠ y_i⟧ ≤ (1/L) Σ_i exp(−y_i f_T(x_i)).
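Inequality (4) is easy to spot-check numerically; the grid of f-values below is an arbitrary choice for illustration:

```python
import math

# Check [[H_T(x) != y]] <= exp(-y * f_T(x)) on a grid of values of f_T(x)
for f in [k / 10 for k in range(-30, 31)]:
    for y in (-1, 1):
        H = 1 if f >= 0 else -1                 # H_T(x) = sign(f_T(x))
        indicator = 1 if H != y else 0
        assert indicator <= math.exp(-y * f)
print("inequality (4) holds on all sampled values")
```

The key case is a misclassification: there y f_T(x) ≤ 0, so exp(−y f_T(x)) ≥ 1, which dominates the indicator value of 1.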
25/33 Upper bound theorem (2/2)

Theorem: In iteration T, the following upper bound holds for the training error ε of H_T:

ε = (1/L) Σ_{i=1}^{L} ⟦y_i ≠ H_T(x_i)⟧ ≤ ∏_{t=1}^{T} Z_t.

Proof (contd.):

ε = (1/L) Σ_i ⟦H_T(x_i) ≠ y_i⟧ ≤ (1/L) Σ_i exp(−y_i f_T(x_i)).

But from the distribution update rule,

D_{T+1}(i) = exp(−y_i f_T(x_i)) / (L ∏_{t=1}^{T} Z_t),

we have

(1/L) Σ_i exp(−y_i f_T(x_i)) = (∏_{t=1}^{T} Z_t) (Σ_i D_{T+1}(i)) = ∏_{t=1}^{T} Z_t,

since Σ_i D_{T+1}(i) = 1. This completes the proof.
26/33 AdaBoost as a Minimiser of the Upper Bound on the Empirical Error

• The main objective is to minimize ε = (1/L) Σ_{i=1}^{L} ⟦y_i ≠ H_T(x_i)⟧ (and to maximize the margin).
• ε has just been shown to be upper-bounded: ε(H_T) ≤ ∏_{t=1}^{T} Z_t.
• AdaBoost minimizes this upper bound.
• It does so by greedily minimizing Z_t in each iteration.
• Recall that

  Z_t = Σ_{i=1}^{L} D_t(i) e^{−α_t y_i h_t(x_i)};

  given the dataset {(x_i, y_i)} and the distribution D_t in iteration t, the variables to minimize Z_t over are α_t and h_t.
27/33 Choosing α_t

Let us minimize Z_t = Σ_i D_t(i) e^{−α_t y_i h_t(x_i)} with respect to α_t:

dZ_t/dα_t = −Σ_{i=1}^{L} D_t(i) e^{−α_t y_i h_t(x_i)} y_i h_t(x_i) = 0

−Σ_{i: y_i = h_t(x_i)} D_t(i) e^{−α_t} + Σ_{i: y_i ≠ h_t(x_i)} D_t(i) e^{α_t} = 0,   where the two sums of D_t(i) equal 1 − ε_t and ε_t, respectively

−e^{−α_t} (1 − ε_t) + e^{α_t} ε_t = 0

α_t + log ε_t = −α_t + log(1 − ε_t)

α_t = (1/2) log((1 − ε_t)/ε_t)

[Figure: dependence of α_t on ε_t]
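The closed form can be checked against a brute-force scan of Z_t(α); the grid and the sample value ε_t = 0.3 are arbitrary choices for illustration:

```python
import math

eps = 0.3
Z = lambda a: (1 - eps) * math.exp(-a) + eps * math.exp(a)   # Z_t as a function of alpha

alpha_closed = 0.5 * math.log((1 - eps) / eps)               # analytic minimizer
alpha_scan = min((k / 1000 for k in range(5001)), key=Z)     # brute-force scan over [0, 5]

print(round(alpha_closed, 3), alpha_scan)  # both approximately 0.424
```

Since Z_t(α) is convex with a single minimum, the scan lands on the grid point nearest the analytic value.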
28/33 Choosing h_t

Let us substitute α_t = (1/2) log((1 − ε_t)/ε_t) into Z_t:

Z_t = Σ_{i=1}^{L} D_t(i) e^{−α_t y_i h_t(x_i)}
    = Σ_{i: y_i = h_t(x_i)} D_t(i) e^{−α_t} + Σ_{i: y_i ≠ h_t(x_i)} D_t(i) e^{α_t}
    = (1 − ε_t) e^{−α_t} + ε_t e^{α_t}
    = 2 √(ε_t (1 − ε_t))

⇒ Z_t is minimised by selecting the h_t with minimal weighted error ε_t.

Weak classifier examples

• Decision tree, perceptron – B infinite
• Selecting the best one from a given finite set B

[Figure: dependence of Z_t on ε_t]
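A quick numeric check of the last line, and of the fact that it keeps the bound ∏_t Z_t shrinking; the sample values of ε_t are arbitrary:

```python
import math

for eps in (0.1, 0.25, 0.4, 0.49):
    alpha = 0.5 * math.log((1 - eps) / eps)
    Z = (1 - eps) * math.exp(-alpha) + eps * math.exp(alpha)
    # At the optimal alpha_t, Z_t collapses to 2*sqrt(eps*(1-eps)) < 1 for eps < 1/2,
    # so every boosting round multiplies the training-error bound by a factor below 1
    assert abs(Z - 2 * math.sqrt(eps * (1 - eps))) < 1e-12
    assert Z < 1
print("Z_t = 2*sqrt(eps_t*(1-eps_t)) < 1 for all tested eps_t")
```

Note how flat the factor becomes near ε_t = 1/2: a nearly random weak classifier (ε_t = 0.49 gives Z_t ≈ 0.9998) barely tightens the bound.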
29/33 Minimization of an Upper Bound on the Empirical Error – Recapitulation

Choosing α_t and h_t

• For any weak classifier h_t with error ε_t, Z_t(α) is a convex differentiable function with a single minimum at

  α_t = (1/2) log((1 − ε_t)/ε_t).

• Z_t = 2 √(ε_t (1 − ε_t)) ≤ 1 for optimal α_t ⇒ Z_t is minimized by the h_t with minimal ε_t.

Comments

• The process of selecting α_t and h_t(x) can be interpreted as a single optimisation step minimising the upper bound on the empirical error. Improvement of the bound is guaranteed, provided that ε_t < 1/2.
• The process can be interpreted as a component-wise local optimisation (Gauss–Southwell iteration) in the (possibly infinite-dimensional!) space of α⃗ = (α_1, α_2, ...), starting from α⃗_0 = (0, 0, ...).
30/33 Summary of the Algorithm

Initialization ...
For t = 1, ..., T:

• Find h_t = argmin_{h ∈ B} ε_t, where ε_t = Σ_{i=1}^{L} D_t(i) ⟦y_i ≠ h(x_i)⟧
• If ε_t ≥ 1/2 then stop
• Set α_t = (1/2) log((1 − ε_t)/ε_t)
• Update

  D_{t+1}(i) = D_t(i) e^{−α_t y_i h_t(x_i)} / Z_t,   Z_t = Σ_{i=1}^{L} D_t(i) e^{−α_t y_i h_t(x_i)} = 2 √(ε_t (1 − ε_t))

Output the final classifier:

H(x) = sign(Σ_{t=1}^{T} α_t h_t(x))

Comments

• The computational complexity of selecting h_t is independent of t
• All information about previously selected "features" is captured in D_t

[Figure: the evolving decision boundary for t = 1, ..., 7 and t = 40; plots of the train error, test error, upper bound on the empirical error, and Bayes error over 40 iterations, over 100 iterations, and in detail for iterations 30–40]
31/33 Does AdaBoost generalise?

Margins in SVM:

max min_{(x,y) ∈ T} y (α⃗ · h⃗(x)) / ‖α⃗‖_2

Margins in AdaBoost:

max min_{(x,y) ∈ T} y (α⃗ · h⃗(x)) / ‖α⃗‖_1

Maximising margins in AdaBoost:

P_S[y f(x) ≤ θ] ≤ 2^T ∏_{t=1}^{T} √(ε_t^{1−θ} (1 − ε_t)^{1+θ}),   where f(x) = (α⃗ · h⃗(x)) / ‖α⃗‖_1

Upper bound based on the margin:

P_D[y f(x) ≤ 0] ≤ P_S[y f(x) ≤ θ] + O( (1/√L) ( d log²(L/d) / θ² + log(1/δ) )^{1/2} )
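The quantity P_S[y f(x) ≤ θ] is just the fraction of training points whose L1-normalized margin falls below θ. A toy computation follows; the three weights, the five points, and their weak-classifier outputs are all made up for illustration:

```python
alphas = [0.7, 0.5, 0.3]                       # hypothetical weak-classifier weights
# h_values[i]: outputs (h_1(x_i), h_2(x_i), h_3(x_i)) on five labeled points
h_values = [(1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1), (-1, -1, 1)]
labels = [1, 1, 1, -1, -1]

norm = sum(alphas)                             # ||alpha||_1 (all weights are positive)
margins = [yy * sum(a * h for a, h in zip(alphas, hs)) / norm
           for hs, yy in zip(h_values, labels)]

theta = 0.5
frac = sum(m <= theta for m in margins) / len(margins)   # empirical P_S[y f(x) <= theta]
print(frac)  # 0.4: two of the five points have margin at most 0.5
```

A negative normalized margin means the point is misclassified by the ensemble, so P_S[y f(x) ≤ 0] is exactly the training error.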
32/33 Pros and cons of AdaBoost

Advantages

• Very simple to implement
• Performs feature selection on very large sets of features
• Fairly good generalisation
• A linear classifier, with all its desirable properties
• Output converges to the logarithm of the likelihood ratio
• A feature selector with a principled strategy (minimisation of an upper bound on the empirical error)
• Close to sequential decision making (it produces a sequence of gradually more complex classifiers)

Disadvantages

• Suboptimal solution for α⃗
• Can overfit in the presence of noise
33/33 AdaBoost variants

Freund & Schapire 1995

• Discrete (h: X → {0, 1})
• Multiclass AdaBoost.M1 (h: X → {0, 1, ..., k})
• Multiclass AdaBoost.M2 (h: X → [0, 1]^k)
• Real-valued AdaBoost.R (Y = [0, 1], h: X → [0, 1])

Schapire & Singer 1997

• Confidence-rated prediction (h: X → R, two-class)
• Multilabel AdaBoost.MR, AdaBoost.MH (different formulations of the minimised loss)

... and many other modifications since then (WaldBoost, cascaded AdaBoost, online AdaBoost, ...)