ICML 2009 Tutorial
Survey of Boosting from an Optimization Perspective
Part I: Entropy Regularized LPBoost
Part II: Boosting from an Optimization Perspective
Manfred K. Warmuth (UCSC), S.V.N. Vishwanathan (Purdue & Microsoft Research)
Updated: March 23, 2010
Warmuth (UCSC) ICML ’09 Boosting Tutorial 1 / 62
Outline
1 Introduction to Boosting
2 What is Boosting?
3 Entropy Regularized LPBoost
4 Overview of Boosting algorithms
5 Conclusion and Open Problems
Introduction to Boosting
Setup for Boosting
[Giants of the field: Schapire, Freund]
examples: 11 apples
labels: +1 if artificial, −1 if natural
goal: classification
Setup for Boosting
[Scatter plot: feature 2 vs. feature 1; +1/−1 examples, weight d_n ≈ marker size; the data is separable]
Weak hypotheses
[Scatter plot: feature 2 vs. feature 1 with candidate decision stumps]
weak hypotheses: decision stumps on the two features; no single stump classifies all examples
goal: find a convex combination of weak hypotheses that classifies all examples
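As an aside (not from the slides), a decision stump of the kind pictured is just an axis-parallel threshold; a minimal sketch, with illustrative feature/threshold values:

```python
import numpy as np

def stump_predict(X, feature, threshold, sign=1):
    """Decision stump: +1 on one side of an axis-parallel threshold, -1 on the other."""
    return sign * np.where(X[:, feature] >= threshold, 1, -1)

# Three points in the (feature 1, feature 2) plane, split on feature 1 at 0.4.
X = np.array([[0.2, 0.7], [0.8, 0.3], [0.5, 0.9]])
print(stump_predict(X, feature=0, threshold=0.4))
```

Flipping `sign` gives the stump that predicts +1 on the other side.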
Boosting: 1st iteration
[Scatter plot: first decision stump]
First hypothesis: error 1/11, edge 9/11
low error = high edge: edge = 1 − 2·error
Update after 1st
[Scatter plot: example weights after the first update]
Misclassified examples: increased weights
After the update, the edge of the hypothesis has decreased
Before 2nd iteration
[Scatter plot: example weights before the second iteration]
Hard examples get high weight
Boosting: 2nd hypothesis
[Scatter plot: second decision stump]
Pick hypotheses with high (weighted) edge
Update after 2nd
[Scatter plot: example weights after the second update]
After the update, the edges of all past hypotheses should be small
3rd hypothesis
[Scatter plot: third decision stump]
Update after 3rd
[Scatter plot: example weights after the third update]
4th hypothesis
[Scatter plot: fourth decision stump]
Update after 4th
[Scatter plot: example weights after the fourth update]
Final convex combination of all hypotheses
Decision: ∑_{t=1}^T w_t h_t(x) ≥ 0 ?
[Scatter plot: decision boundary of the convex combination]
Positive total weight vs. negative total weight
Protocol of Boosting [FS97]
Maintain a distribution on the N ±1-labeled examples
At iteration t = 1, …, T:
- receive a “weak” hypothesis h_t of high edge
- update d^{t−1} to d^t: more weight on “hard” examples
Output a convex combination of the weak hypotheses: ∑_{t=1}^T w_t h_t(x)
Two sets of weights:
- distribution d on the examples
- distribution w on the hypotheses
Data representation
u^t_n := y_n h_t(x_n)   (perfect: +1, opposite: −1, neutral: 0)

labels y_n | h_1(x_n) | u^1_n
    −1     |    −1    |  +1
    −1     |    −1    |  +1
    −1     |    −1    |  +1
    −1     |    +1    |  −1
    +1     |    +1    |  +1
    +1     |    +1    |  +1
    +1     |    +1    |  +1
    +1     |    −1    |  −1

(the examples x_n are shown as apple images on the slide)
Edge vs. margin [Br99]
Edge of a hypothesis h_t for a distribution d on the examples:
  ∑_{n=1}^N u^t_n d_n   (weighted accuracy of the hypothesis),   d ∈ P^N
Margin of example n for the current hypothesis weighting w:
  ∑_{t=1}^T u^t_n w_t   (weighted accuracy of the example),   w ∈ P^T
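In matrix form, with U the table of u^t_n values, both quantities are weighted sums along the two axes of U; a small numpy sketch with made-up numbers (not from the tutorial):

```python
import numpy as np

# U[t, n] = y_n * h_t(x_n): +1 where hypothesis t is right on example n, -1 where wrong.
U = np.array([[ 1,  1, -1,  1],
              [-1,  1,  1,  1]])

d = np.array([0.25, 0.25, 0.25, 0.25])  # distribution on the examples (d in P^N)
w = np.array([0.5, 0.5])                # distribution on the hypotheses (w in P^T)

edges = U @ d    # edge of each hypothesis: weighted accuracy over the examples
margins = w @ U  # margin of each example: weighted accuracy over the hypotheses
print(edges, margins)
```

Edges sum over the example axis, margins over the hypothesis axis: the two views of the same matrix that the min-max duality later connects.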
AdaBoost
Initialize t = 0 and d^0_n = 1/N
For t = 1, …, T:
  get h_t whose edge w.r.t. the current distribution is 1 − 2ε_t
  set w_t = (1/2) ln((1 − ε_t)/ε_t)
  update the distribution:
    d^t_n = d^{t−1}_n exp(−w_t u^t_n) / ∑_{n′} d^{t−1}_{n′} exp(−w_t u^t_{n′})
Final hypothesis: sgn(∑_{t=1}^T w_t h_t(·))
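A minimal sketch of this loop, assuming a fixed pool of candidate hypotheses with the oracle simulated by picking the highest-edge column (illustrative, not the tutorial's code):

```python
import numpy as np

def adaboost(U, T):
    """Toy AdaBoost. U[t, n] = y_n * h_t(x_n) over a fixed hypothesis pool;
    the 'oracle' picks the hypothesis with maximum edge under d."""
    H, N = U.shape
    d = np.full(N, 1.0 / N)   # d^0_n = 1/N
    w = np.zeros(H)
    for _ in range(T):
        edges = U @ d
        t = int(np.argmax(edges))
        eps = (1.0 - edges[t]) / 2.0      # edge = 1 - 2*eps
        if eps <= 0.0 or eps >= 0.5:      # perfect or useless hypothesis: stop
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        w[t] += alpha
        d = d * np.exp(-alpha * U[t])     # up-weight misclassified examples
        d /= d.sum()
    return w, d

# Three hypotheses, each wrong on exactly one of three examples.
U = np.array([[ 1,  1, -1],
              [-1,  1,  1],
              [ 1, -1,  1]], dtype=float)
w, d = adaboost(U, 40)
print(w @ U)  # margins of the final combination: all positive on this toy data
```

On this separable toy problem every example ends up with a positive margin, matching the "low error = high edge" picture from the earlier slides.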
Objectives
Edge
Edges of past hypotheses should be small after update
Minimize maximum edge of past hypotheses
Margin
Choose the convex combination of weak hypotheses that maximizes the minimum margin
[Scatter plot: margins of the combined hypothesis]
Which margin?
SVM: 2-norm margin (weights on examples); Boosting: 1-norm margin (weights on base hypotheses)
Connection between the objectives?
Edge vs. margin
min max edge = max min margin

min_{d ∈ S^N} max_{q=1,…,t−1} u^q · d   (edge of hypothesis q)
  = max_{w ∈ S^{t−1}} min_{n=1,…,N} ∑_{q=1}^{t−1} u^q_n w_q   (margin of example n)
Linear Programming duality
Boosting as zero-sum-game [FS97]
Rock, Paper, Scissors game

gain matrix U (row player d vs. column player w):
            R (w1)  P (w2)  S (w3)
  R (d1)       0       1      −1
  P (d2)      −1       0       1
  S (d3)       1      −1       0

A single row is a pure strategy of the row player; d is a mixed strategy.
A single column is a pure strategy of the column player; w is a mixed strategy.
The row player minimizes and the column player maximizes the
payoff = dᵀUw = ∑_{i,j} d_i U_{i,j} w_j
Optimum strategy
            R (w1=.33)  P (w2=.33)  S (w3=.33)
R (d1=.33)      0           1          −1
P (d2=.33)     −1           0           1
S (d3=.33)      1          −1           0

Min-max theorem:
min_d max_w dᵀUw = min_d max_j dᵀUe_j
  = max_w min_d dᵀUw = max_w min_i e_iᵀUw
  = value of the game (0 in the example)
where e_j denotes a pure strategy
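The value and an optimal mixed strategy can be computed as a linear program (min v subject to Uᵀd ≤ v·1, ∑d = 1, d ≥ 0); a sketch using scipy, which is not part of the tutorial:

```python
import numpy as np
from scipy.optimize import linprog

U = np.array([[ 0,  1, -1],
              [-1,  0,  1],
              [ 1, -1,  0]], dtype=float)  # Rock-Paper-Scissors gain matrix
N, T = U.shape

# Variables (d_1..d_N, v): minimize v s.t. U^T d <= v*1, sum(d) = 1, d >= 0.
c = np.concatenate([np.zeros(N), [1.0]])
A_ub = np.hstack([U.T, -np.ones((T, 1))])
b_ub = np.zeros(T)
A_eq = np.concatenate([np.ones(N), [0.0]]).reshape(1, -1)
bounds = [(0, None)] * N + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
print(res.x[:N], res.fun)  # uniform strategy, game value 0
```

For Rock-Paper-Scissors the uniform strategy is the unique optimum, so the LP recovers d = (1/3, 1/3, 1/3) with value 0, as in the slide.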
Connection to Boosting?
Rows are the examples
Columns uq encode weak hypothesis hq
Row sum: margin of example
Column sum: edge of weak hypothesis
Value of game:
min max edge = max min margin
von Neumann’s Minimax Theorem
Edges/margins
            R (w1=.33)  P (w2=.33)  S (w3=.33) | margin
R (d1=.33)      0           1          −1      |   0
P (d2=.33)     −1           0           1      |   0   ← min
S (d3=.33)      1          −1           0      |   0
edge:           0           0           0      ← max
value of game: 0
New column added: boosting
            R (w1=.44)  P (w2=0)  S (w3=.22)  new (w4=.33) | margin
R (d1=.22)      0          1         −1            1       |  .11
P (d2=.33)     −1          0          1            1       |  .11   ← min
S (d3=.44)      1         −1          0           −1       |  .11
edge:          .11       −.22        .11          .11      ← max
Value of game increases from 0 to .11
Row added: on-line learning
            R (w1=.33)  P (w2=.44)  S (w3=.22) | margin
R (d1=0)        0           1          −1      |  .22
P (d2=.22)     −1           0           1      | −.11   ← min
S (d3=.44)      1          −1           0      | −.11
   (d4=.33)    −1           1          −1      | −.11
edge:         −.11        −.11        −.11     ← max
Value of game decreases from 0 to −.11
Boosting: maximize margin incrementally
iteration 1: column u¹ = (0, 1, −1) with weight w¹₁, distribution d¹
iteration 2: columns u¹ = (0, 1, −1), u² = (−1, 0, 1) with weights w²₁, w²₂, distribution d²
iteration 3: columns u¹ = (0, 1, −1), u² = (−1, 0, 1), u³ = (1, −1, 0) with weights w³₁, w³₂, w³₃, distribution d³
In each iteration, solve an optimization problem to update d
The column player / oracle provides a new hypothesis
Boosting is a column generation method in the d domain and coordinate descent in the w domain
What is Boosting?
Boosting = greedy method for increasing margin
Converges to optimum margin w.r.t. all hypotheses
Want a small number of iterations
Assumption on next weak hypothesis
For the current weighting of the examples, the oracle returns a hypothesis of edge ≥ g

Goal
For a given ε, produce a convex combination of weak hypotheses with soft margin ≥ g − ε
Number of iterations: O(log N / ε²)
Recall min max thm
min_{d ∈ S^N} max_{q=1,…,t} u^q · d   (edge of hypothesis q)
  = max_{w ∈ S^t} min_{n=1,…,N} ∑_{q=1}^t u^q_n w_q   (margin of example n)
Visualizing the margin
[Scatter plot: feature 2 vs. feature 1 with the margin of the combined hypothesis shown]
Min max thm - inseparable case
Slack variables in the w domain = capping in the d domain

min_{d ∈ S^N, d ≤ (1/ν)1} max_{q=1,…,t} u^q · d   (edge of hypothesis q)
  = max_{w ∈ S^t, ψ ≥ 0} min_{n=1,…,N} [ ∑_{q=1}^t u^q_n w_q + ψ_n ] − (1/ν) ∑_{n=1}^N ψ_n
    (soft margin of example n)
Visualizing the soft margin
[Plot: hypothesis 2 vs. hypothesis 1; slack ψ shown for an example violating the margin]
LPBoost
[Plot: LPBoost objective value P_LP as a function of d1]
Choose the distribution that minimizes the maximum edge of the current hypotheses by solving:
  min_{∑_n d_n = 1, d ≤ (1/ν)1} max_{q=1,…,t} u^q · d   (=: P^t_LP)
All weight is put on examples with minimum soft margin
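This capped LP can be written out with scipy (a sketch; the hypothesis rows U and the cap 1/ν below are made up, and this is not the tutorial's code):

```python
import numpy as np
from scipy.optimize import linprog

def lpboost_distribution(U, nu):
    """min over d of max_q u^q . d, subject to sum(d) = 1 and 0 <= d_n <= 1/nu.
    U has one row per weak hypothesis q, one column per example n."""
    Q, N = U.shape
    c = np.concatenate([np.zeros(N), [1.0]])       # minimize the auxiliary bound v
    A_ub = np.hstack([U, -np.ones((Q, 1))])        # u^q . d <= v for every q
    A_eq = np.concatenate([np.ones(N), [0.0]]).reshape(1, -1)
    bounds = [(0, 1.0 / nu)] * N + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(Q), A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:N], res.fun

# One hypothesis that is wrong only on the last example.
U = np.array([[1.0, 1.0, 1.0, -1.0]])
d1, v1 = lpboost_distribution(U, nu=1)  # uncapped: all weight on the hard example
d2, v2 = lpboost_distribution(U, nu=2)  # capped at 1/2: the weight must spread out
print(v1, v2)
```

The two calls illustrate the point of capping: without it, all weight piles onto the single hardest example (value −1); with d ≤ 1/2 the distribution must also cover well-classified examples (value 0).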
Entropy Regularized LPBoost
min_{∑_n d_n = 1, d ≤ (1/ν)1} max_{q=1,…,t} u^q · d + (1/η) Δ(d, d⁰)

d_n = exp(−η · soft margin of example n) / Z   (“soft min”)

This form of the weights first appeared in the ν-Arc algorithm [RSS+00]
Regularization in the d domain makes the problem strongly convex
Gradient of the dual is Lipschitz continuous in w [e.g. HL93, RW97]
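The resulting weights are a softmin over the margins; a small numpy sketch (the η values and margins are made up for illustration):

```python
import numpy as np

def softmin_weights(margins, eta):
    """d_n proportional to exp(-eta * margin_n), normalized to a distribution."""
    a = -eta * np.asarray(margins, dtype=float)
    a -= a.max()              # shift for numerical stability before exponentiating
    d = np.exp(a)
    return d / d.sum()

m = [0.1, 0.1, 0.5, 0.9]
print(softmin_weights(m, eta=1e-3))   # tiny eta: near-uniform distribution
print(softmin_weights(m, eta=100.0))  # large eta: concentrates on min-margin examples
```

As η → ∞ this recovers LPBoost's behavior (all weight on the minimum soft-margin examples); small η keeps the distribution close to uniform, which is what makes ERLPBoost less brittle.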
The effect of entropy regularization
Different distribution on the examples
[Two scatter plots of the example weights]
LPBoost: lots of zeros / brittle.  ERLPBoost: smoother.
Overview of Boosting algorithms
AdaBoost [FS97]
d^t_n := d^{t−1}_n exp(−w_t u^t_n) / ∑_{n′} d^{t−1}_{n′} exp(−w_t u^t_{n′}),
where w_t is chosen so that ∑_{n′} d^{t−1}_{n′} exp(−w u^t_{n′}) is minimized, i.e.
setting ∂/∂w [ ∑_{n′} d^{t−1}_{n′} exp(−w u^t_{n′}) ] = 0 at w = w_t gives
  ∑_n u^t_n d^{t−1}_n exp(−w_t u^t_n) / ∑_{n′} d^{t−1}_{n′} exp(−w_t u^t_{n′}) = u^t · d^t = 0

Easy to implement
Adjusts the distribution so that the edge of the last hypothesis is zero
Gets within half of the optimal hard margin [RSD07], but only in the limit
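The zero-edge property is easy to check numerically; the u-vector and starting distribution below are made up:

```python
import numpy as np

u = np.array([1, 1, 1, -1, -1], dtype=float)   # accuracies of the last hypothesis
d = np.array([0.1, 0.2, 0.3, 0.2, 0.2])        # current distribution on the examples

eps = d[u < 0].sum()                  # weighted error of the hypothesis
w = 0.5 * np.log((1 - eps) / eps)     # AdaBoost's step size
d_new = d * np.exp(-w * u)
d_new /= d_new.sum()

print(u @ d_new)  # edge of this hypothesis after the update: 0 up to rounding
```

With this choice of w, the total weight on correctly and incorrectly classified examples balances exactly, so the updated edge u · dᵗ vanishes.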
Corrective versus totally corrective
Processing last hypothesis versus all past hypotheses
Corrective     | Totally Corrective
AdaBoost       | LPBoost
LogitBoost     | TotalBoost
AdaBoost*      | SoftBoost
[SS, COLT08]   | ERLPBoost
From AdaBoost to ERLPBoost
AdaBoost (as interpreted in [KW99, La99])
Primal:  min_d Δ(d, d^{t−1})   s.t. d · u^t = 0, ‖d‖₁ = 1
Dual:    max_w −ln ∑_n d^{t−1}_n exp(−η u^t_n w_t)   s.t. w ≥ 0
Achieves half of the optimum hard margin in the limit

AdaBoost* [RW05]
Primal:  min_d Δ(d, d^{t−1})   s.t. d · u^t ≤ γ_t, ‖d‖₁ = 1
Dual:    max_w −ln ∑_n d^{t−1}_n exp(−η u^t_n w_t) − γ_t ‖w‖₁   s.t. w ≥ 0
where the edge bound γ_t is adjusted downward by a heuristic
Good iteration bound for reaching the optimum hard margin
SoftBoost [WGR07]
Primal:  min_d Δ(d, d⁰)   s.t. ‖d‖₁ = 1, d ≤ (1/ν)1, d · u^q ≤ γ_t for 1 ≤ q ≤ t
Dual:    min_{w,ψ} −ln ∑_n d⁰_n exp(−η ∑_{q=1}^t u^q_n w_q − η ψ_n) − (1/ν)‖ψ‖₁ − γ_t‖w‖₁
         s.t. w ≥ 0, ψ ≥ 0
where the edge bound γ_t is adjusted downward by a heuristic
Good iteration bound for reaching the soft margin

ERLPBoost [WGV08]
Primal:  min_{d,γ} γ + (1/η) Δ(d, d⁰)   s.t. ‖d‖₁ = 1, d ≤ (1/ν)1, d · u^q ≤ γ for 1 ≤ q ≤ t
Dual:    min_{w,ψ} −(1/η) ln ∑_n d⁰_n exp(−η ∑_{q=1}^t u^q_n w_q − η ψ_n) − (1/ν)‖ψ‖₁
         s.t. w ≥ 0, ‖w‖₁ = 1, ψ ≥ 0
where, for the iteration bound, η is fixed to max((2/ε) ln(N/ν), 1/2)
Good iteration bound for reaching the soft margin
Corrective ERLPBoost [SS08]
Primal:  min_d ∑_{q=1}^t w_q (u^q · d) + (1/η) Δ(d, d⁰)   s.t. ‖d‖₁ = 1, d ≤ (1/ν)1
Dual:    min_ψ −(1/η) ln ∑_n d⁰_n exp(−η ∑_{q=1}^t u^q_n w_q − η ψ_n) − (1/ν)‖ψ‖₁
         s.t. ψ ≥ 0
where, for the iteration bound, η is fixed to max((2/ε) ln(N/ν), 1/2)
Good iteration bound for reaching the soft margin
Iteration bounds
Corrective     | Totally Corrective
AdaBoost       | LPBoost
LogitBoost     | TotalBoost
AdaBoost*      | SoftBoost
[SS, COLT08]   | ERLPBoost

Strong oracle: returns the hypothesis with maximum edge
Weak oracle: returns a hypothesis with edge ≥ g

In O(log(N/ν) / ε²) iterations:
  within ε of the maximum soft margin for the strong oracle, or within ε of g for the weak oracle
Ditto for the hard margin case
In O(log N / g²) iterations: consistency with the weak oracle
LPBoost may require Ω(N) iterations
w1 w2 w3 w4 w5 | margin
 0   0   0   0   0
d1 .125 +1 -.95 -.93 -.91 -.99 −d2 .125 +1 -.95 -.93 -.91 -.99 −d3 .125 +1 -.95 -.93 -.91 -.99 −d4 .125 +1 -.95 -.93 -.91 -.99 −d5 .125 -.98 +1 -.93 -.91 +.99 −d6 .125 -.97 -.96 +1 -.91 +.99 −d7 .125 -.97 -.95 -.94 +1 +.99 −d8 .125 -.97 -.95 -.93 -.92 +.99 −
edge .0137 -.7075 -.6900 -.6725 .0000
value -1
LPBoost may require Ω(N) iterations
w1 w2 w3 w4 w5 | margin
 1   0   0   0   0
d1 0 +1 -.95 -.93 -.91 -.99 1
d2 0 +1 -.95 -.93 -.91 -.99 1
d3 0 +1 -.95 -.93 -.91 -.99 1
d4 0 +1 -.95 -.93 -.91 -.99 1
d5 1 -.98 +1 -.93 -.91 +.99 -.98
d6 0 -.97 -.96 +1 -.91 +.99 -.97
d7 0 -.97 -.95 -.94 +1 +.99 -.97
d8 0 -.97 -.95 -.93 -.92 +.99 -.97
edge -.98 1 -.93 -.91 .99
value -1 -.98
LPBoost may require Ω(N) iterations
w1 w2 w3 w4 w5 | margin
 0   1   0   0   0
d1 0 +1 -.95 -.93 -.91 -.99 -.95
d2 0 +1 -.95 -.93 -.91 -.99 -.95
d3 0 +1 -.95 -.93 -.91 -.99 -.95
d4 0 +1 -.95 -.93 -.91 -.99 -.95
d5 0 -.98 +1 -.93 -.91 +.99 1
d6 1 -.97 -.96 +1 -.91 +.99 -.96
d7 0 -.97 -.95 -.94 +1 +.99 -.95
d8 0 -.97 -.95 -.93 -.92 +.99 -.95
edge -.97 -.96 1 -.91 .99
value -1 -.98 -.96
LPBoost may require Ω(N) iterations
w1 w2 w3 w4 w5 | margin
 0   0   1   0   0
d1 0 +1 -.95 -.93 -.91 -.99 -.93
d2 0 +1 -.95 -.93 -.91 -.99 -.93
d3 0 +1 -.95 -.93 -.91 -.99 -.93
d4 0 +1 -.95 -.93 -.91 -.99 -.93
d5 0 -.98 +1 -.93 -.91 +.99 -.93
d6 0 -.97 -.96 +1 -.91 +.99 1
d7 1 -.97 -.95 -.94 +1 +.99 -.94
d8 0 -.97 -.95 -.93 -.92 +.99 -.93
edge -.97 -.95 -.94 1 .99
value -1 -.98 -.96 -.94
LPBoost may require Ω(N) iterations
w = (0, 0, 0, 1, 0)

|      | d | h1   | h2   | h3   | h4   | h5   | margin |
|------|---|------|------|------|------|------|--------|
| d1   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.91   |
| d2   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.91   |
| d3   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.91   |
| d4   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.91   |
| d5   | 0 | -.98 | +1   | -.93 | -.91 | +.99 | -.91   |
| d6   | 0 | -.97 | -.96 | +1   | -.91 | +.99 | -.91   |
| d7   | 0 | -.97 | -.95 | -.94 | +1   | +.99 | 1      |
| d8   | 1 | -.97 | -.95 | -.93 | -.92 | +.99 | -.92   |
| edge |   | -.97 | -.95 | -.94 | -.92 | .99  |        |

value: -1, -.98, -.96, -.94, -.92
LPBoost may require Ω(N) iterations
w = (.5, .0026, 0, 0, .4975)

|      | d    | h1    | h2    | h3    | h4    | h5    | margin |
|------|------|-------|-------|-------|-------|-------|--------|
| d1   | .497 | +1    | -.95  | -.93  | -.91  | -.99  | .0051  |
| d2   | 0    | +1    | -.95  | -.93  | -.91  | -.99  | .0051  |
| d3   | 0    | +1    | -.95  | -.93  | -.91  | -.99  | .0051  |
| d4   | 0    | +1    | -.95  | -.93  | -.91  | -.99  | .0051  |
| d5   | 0    | -.98  | +1    | -.93  | -.91  | +.99  | .0051  |
| d6   | .490 | -.97  | -.96  | +1    | -.91  | +.99  | .0051  |
| d7   | 0    | -.97  | -.95  | -.94  | +1    | +.99  | .0051  |
| d8   | .013 | -.97  | -.95  | -.93  | -.92  | +.99  | .0051  |
| edge |      | .0051 | .0051 | .9055 | .9100 | .0051 |        |

value: -1, -.98, -.96, -.94, -.92, .0051 — no ties!
LPBoost may return a bad final hypothesis

How good is the master hypothesis returned by LPBoost compared to the best possible convex combination of hypotheses?

Any linearly separable dataset can be reduced to a dataset on which LPBoost misclassifies all examples by

adding a bad example

adding a bad hypothesis
Adding a bad example
w = (.5, .0026, 0, 0, .4975)

|      | d | h1   | h2   | h3   | h4   | h5   | margin |
|------|---|------|------|------|------|------|--------|
| d1   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | .0051  |
| d2   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | .0051  |
| d3   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | .0051  |
| d4   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | .0051  |
| d5   | 0 | -.98 | +1   | -.93 | -.91 | +.99 | .0051  |
| d6   | 0 | -.97 | -.96 | +1   | -.91 | +.99 | .0051  |
| d7   | 0 | -.97 | -.95 | -.94 | +1   | +.99 | .0051  |
| d8   | 0 | -.97 | -.95 | -.93 | -.92 | +.99 | .0051  |
| d9   | 1 | -.03 | -.03 | -.03 | -.03 | -.03 | -.03   |
| edge |   | -.03 | -.03 | -.03 | -.03 | -.03 |        |

value: -1, -.98, -.96, -.94, -.92, -.03
Adding a bad hypothesis
w = (0, 0, 0, 0, 0, 1)

|      | d | h1   | h2   | h3   | h4   | h5   | h6   | margin |
|------|---|------|------|------|------|------|------|--------|
| d1   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | .0051  |
| d2   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | .0051  |
| d3   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | .0051  |
| d4   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | .0051  |
| d5   | 0 | -.98 | +1   | -.93 | -.91 | +.99 | -.01 | .0051  |
| d6   | 0 | -.97 | -.96 | +1   | -.91 | +.99 | -.01 | .0051  |
| d7   | 0 | -.97 | -.95 | -.94 | +1   | +.99 | -.01 | .0051  |
| d8   | 0 | -.97 | -.95 | -.93 | -.92 | +.99 | -.01 | .0051  |
| d9   | 1 | -.03 | -.03 | -.03 | -.03 | -.03 | -.02 | .0051  |
| edge |   | -.03 | -.03 | -.03 | -.03 | -.03 | -.02 |        |

value: -1, -.98, -.96, -.94, -.92, -.03
w = (0, 0, 0, 0, 0, 1)

|      | d | h1   | h2   | h3   | h4   | h5   | h6   | margin |
|------|---|------|------|------|------|------|------|--------|
| d1   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | -.01   |
| d2   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | -.01   |
| d3   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | -.01   |
| d4   | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | -.01   |
| d5   | 0 | -.98 | +1   | -.93 | -.91 | +.99 | -.01 | -.01   |
| d6   | 0 | -.97 | -.96 | +1   | -.91 | +.99 | -.01 | -.01   |
| d7   | 0 | -.97 | -.95 | -.94 | +1   | +.99 | -.01 | -.01   |
| d8   | 0 | -.97 | -.95 | -.93 | -.92 | +.99 | -.01 | -.01   |
| d9   | 1 | -.03 | -.03 | -.03 | -.03 | -.03 | -.02 | -.02   |
| edge |   | -.03 | -.03 | -.03 | -.03 | -.03 | -.02 |        |

value: -1, -.98, -.96, -.94, -.92, -.03, -.02
w = (.5, 0, 0, 0, .5, 0)

|    | d | h1   | h2   | h3   | h4   | h5   | h6   | margin |
|----|---|------|------|------|------|------|------|--------|
| d1 | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | +.005  |
| d2 | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | +.005  |
| d3 | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | +.005  |
| d4 | 0 | +1   | -.95 | -.93 | -.91 | -.99 | -.01 | +.005  |
| d5 | 0 | -.98 | +1   | -.93 | -.91 | +.99 | -.01 | +.005  |
| d6 | 0 | -.97 | -.96 | +1   | -.91 | +.99 | -.01 | +.01   |
| d7 | 0 | -.97 | -.95 | -.94 | +1   | +.99 | -.01 | +.01   |
| d8 | 0 | -.97 | -.95 | -.93 | -.92 | +.99 | -.01 | +.01   |
| d9 | 1 | -.03 | -.03 | -.03 | -.03 | -.03 | -.02 | -.03   |
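The punchline of the bad-example/bad-hypothesis construction can be checked numerically. A minimal numpy sketch (matrix transcribed from the slides; variable names are my own illustration): with all weight on the bad hypothesis h6 every margin is negative, while the mixture w = (.5, 0, 0, 0, .5, 0) separates every example except the inherently bad d9.

```python
import numpy as np

# 9 examples x 6 hypotheses, after adding the bad example d9
# and the bad hypothesis h6 (entries transcribed from the slides).
U = np.array([
    [+1.00, -.95, -.93, -.91, -.99, -.01],  # d1
    [+1.00, -.95, -.93, -.91, -.99, -.01],  # d2
    [+1.00, -.95, -.93, -.91, -.99, -.01],  # d3
    [+1.00, -.95, -.93, -.91, -.99, -.01],  # d4
    [-.98, +1.00, -.93, -.91, +.99, -.01],  # d5
    [-.97, -.96, +1.00, -.91, +.99, -.01],  # d6
    [-.97, -.95, -.94, +1.00, +.99, -.01],  # d7
    [-.97, -.95, -.93, -.92, +.99, -.01],   # d8
    [-.03, -.03, -.03, -.03, -.03, -.02],   # d9
])

w_lp = np.zeros(6); w_lp[5] = 1.0         # all weight on the bad hypothesis h6
w_good = np.array([.5, 0, 0, 0, .5, 0])   # mix h1 and h5 instead

m_lp = U @ w_lp      # margins under the bad final hypothesis
m_good = U @ w_good  # margins under the good mixture
```

Under `w_lp` all nine margins are negative, so every example is misclassified; under `w_good` the first eight margins are positive, matching the table above.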
Synopsis
LPBoost is often unstable

For safety, add relative entropy regularization

Corrective algorithms:
Sometimes easy to code
Fast per iteration

Totally corrective algorithms:
Smaller number of iterations
Faster overall time when ε is small
Weak versus strong oracle makes a big difference in practice
O(log N / ε²) iteration bounds
Good
Bound is major design tool
Any reasonable Boosting algorithm should have this bound
Bad
Bound is weak
ln N / ε² ≥ N when
ε = .01 and N ≤ 1.2 × 10⁵
ε = .001 and N ≤ 1.7 × 10⁷
Why are totally corrective algorithms much better in practice?
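The weakness can be checked directly: the bound ln N / ε² exceeds N (and so is vacuous) up to the stated sizes. A minimal sketch that finds the crossover by bisection (function name is my own; it searches above N = 1/ε², where ln N − N ε² is decreasing):

```python
import math

def bound_vacuous_up_to(eps, hi=10**9):
    """Largest N with ln(N) / eps^2 >= N, found by bisection.

    For N > 1/eps^2 the function ln(N) - N * eps^2 is decreasing,
    so bisecting for the last N where it is >= 0 is valid.
    """
    lo = max(3, int(1 / eps**2))
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if math.log(mid) >= mid * eps**2:
            lo = mid      # bound still vacuous at mid
        else:
            hi = mid - 1  # bound informative at mid
    return lo
```

This gives roughly 1.2 × 10⁵ for ε = .01 and roughly 1.7 × 10⁷ for ε = .001, in line with the slide.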
Lower bounds on the number of iterations
Majority of Ω(log N / g²) hypotheses needed for achieving consistency with a weak oracle of guarantee g [Fr95]

Easy: Ω(1/ε²) iteration bound for getting within ε of the hard margin with a strong oracle

Harder: Ω(log N / ε²) iteration bound for a strong oracle [Ne83?]
Conclusion and Open Problems
Outline
1 Introduction to Boosting
2 What is Boosting?
3 Entropy Regularized LPBoost
4 Overview of Boosting algorithms
5 Conclusion and Open Problems
Conclusion
Adding relative entropy regularization to LPBoost leads to a good boosting algorithm

Boosting is an instantiation of the MaxEnt and MinxEnt principles [Jaynes 57, Kullback 59]

Relative entropy regularization smoothes one-norm regularization

Open

When hypotheses have one-sided error, then O(log N / ε) iterations suffice [As00, HW03]

Does ERLPBoost have an O(log N / ε) bound when hypotheses are one-sided?

Replace geometric optimizers by entropic ones

Compare ours with Freund's algorithms that don't just cap, but forget examples
Acknowledgment
Rob Schapire and Yoav Freund for pioneering Boosting
Gunnar Rätsch for bringing in optimization
Karen Glocer for helping with figures and plots