Transcript of Online Learning, Games, Boosting
baner029/Teaching/Spring06/online.pdf

Page 1:

Online Learning, Games, Boosting

Arindam Banerjee


Page 2:

Predicting with Expert Advice

Basic setting: n experts, boolean prediction

Prediction proceeds in trials:
    Each expert makes a boolean prediction
    The “master algorithm” combines the predictions
    The true label is revealed

No explicit notion of input

Master algorithms with “good” cumulative/relative loss


Page 3:

The Halving Algorithm

Assume one of the experts is known to be always correct:

Always predict the majority vote of the remaining experts

If the master is wrong, get rid of the experts that were wrong

Proposition: The master makes at most log2 n mistakes (each master mistake eliminates at least half of the remaining experts)
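For concreteness, a minimal Python sketch of this procedure (the function and variable names are mine, not from the slides):

```python
# Illustrative sketch of the halving algorithm, assuming some expert
# is correct on every trial.

def halving(expert_predictions, true_labels):
    """expert_predictions[t][i] is expert i's boolean prediction on trial t."""
    n = len(expert_predictions[0])
    alive = set(range(n))               # experts not yet eliminated
    mistakes = 0
    for preds, label in zip(expert_predictions, true_labels):
        votes_for_1 = sum(preds[i] for i in alive)
        master = 1 if 2 * votes_for_1 >= len(alive) else 0  # majority vote
        if master != label:
            mistakes += 1
            # the majority was wrong, so this removes >= half of `alive`
            alive = {i for i in alive if preds[i] == label}
    return mistakes   # at most log2(n) if some expert is never wrong
```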


Page 4:

Weighted Majority (Simple)

Initialize the weights wi = 1, for i = 1, . . . , n

Each expert makes a boolean prediction zi ∈ {0, 1}, for i = 1, . . . , n

The master predicts 1 if

∑i: zi=1 wi ≥ ∑i: zi=0 wi

and 0 otherwise

The true label ℓ is received

Modify the weights as follows: if zi = ℓ, no change; if zi ≠ ℓ, wi ← wi/2, for i = 1, . . . , n

Theorem: The master makes at most 2.41(m + log2 n) mistakes, where m is the number of mistakes by the best expert so far
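A direct Python transcription of this algorithm (illustrative; the names are mine):

```python
# Illustrative sketch of the (deterministic) Weighted Majority algorithm.

def weighted_majority(expert_predictions, true_labels):
    n = len(expert_predictions[0])
    w = [1.0] * n                                   # w_i = 1 initially
    mistakes = 0
    for preds, label in zip(expert_predictions, true_labels):
        w1 = sum(w[i] for i in range(n) if preds[i] == 1)
        w0 = sum(w[i] for i in range(n) if preds[i] == 0)
        master = 1 if w1 >= w0 else 0               # weighted majority vote
        if master != label:
            mistakes += 1
        # halve the weight of every expert that predicted wrongly
        w = [wi / 2 if zi != label else wi for wi, zi in zip(w, preds)]
    return mistakes   # bound: 2.41 * (m + log2(n))
```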


Page 5:

Weighted Majority (Randomized)

Initialize the weights wi = 1, for i = 1, . . . , n

Each expert makes a boolean prediction zi ∈ {0, 1}, for i = 1, . . . , n

The master predicts zi with probability wi/W, where W = ∑i wi

The true label ℓ is received

Modify the weights as follows: if zi = ℓ, no change; if zi ≠ ℓ, wi ← βwi, for i = 1, . . . , n, with β ∈ [0, 1]

Theorem: The expected number of mistakes the master makes is at most (m ln(1/β) + ln n) / (1 − β), where m is the number of mistakes by the best expert so far

Corollary: If β is adjusted dynamically, or if the total number of trials is known upfront, the expected number of mistakes is at most m + ln n + O(√(m ln n))
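A sketch of the randomized variant (illustrative; `random.choices` does the weighted sampling):

```python
import random

# Illustrative sketch of randomized Weighted Majority with parameter beta.

def randomized_wm(expert_predictions, true_labels, beta=0.5):
    n = len(expert_predictions[0])
    w = [1.0] * n
    mistakes = 0
    for preds, label in zip(expert_predictions, true_labels):
        # follow expert i's prediction with probability w_i / W
        i = random.choices(range(n), weights=w)[0]
        if preds[i] != label:
            mistakes += 1
        # multiply the weight of each wrong expert by beta
        w = [wi * beta if zi != label else wi for wi, zi in zip(w, preds)]
    return mistakes   # E[mistakes] <= (m ln(1/beta) + ln n) / (1 - beta)
```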


Page 6:

Extensions

The range of prediction is more general, i.e., not necessarily boolean

An explicit notion of input exists

The adversary is not arbitrary, e.g., choosing labels from a function, a fixed distribution, etc.

Labels from a function with random noise

Learning (boolean) functions:
    Realizable learning
    Agnostic learning
    Random noise


Page 7:

Game Theory

            Rock   Paper   Scissors
Rock        1/2    1       0
Paper       0      1/2     1
Scissors    1      0       1/2

The loss matrix M for the row player

The actual game has payoffs 2(M − 1/2)

Pure strategies: Rock, Paper, Scissors

Randomized play:
    Row player chooses distribution P
    Column player chooses distribution Q

The expected loss for the row player:

M(P, Q) = Pᵀ M Q = ∑i,j P(i) Q(j) M(i, j)
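As a quick worked example (my own numbers, not from the slides), the expected loss of uniform play in this game, computed with NumPy:

```python
import numpy as np

M = np.array([[0.5, 1.0, 0.0],   # Rock
              [0.0, 0.5, 1.0],   # Paper
              [1.0, 0.0, 0.5]])  # Scissors

P = np.array([1/3, 1/3, 1/3])    # row player's mixed strategy: uniform
Q = np.array([0.5, 0.5, 0.0])    # some mixed strategy for the column player

loss = P @ M @ Q                 # M(P, Q) = sum_ij P(i) Q(j) M(i, j)
print(loss)                      # 0.5 -- uniform play loses 1/2 on average
```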


Page 8:

Sequential Play

Row player chooses a (randomized) strategy P

Given P, the column player chooses a strategy Q such that the loss of the row player is maximized, i.e., maxQ M(P, Q)

Row player chooses P upfront to minimize the maximum loss:

minP maxQ M(P, Q)

If the column player chooses first, then the loss of the row player is

maxQ minP M(P, Q)

“Obviously” maxQ minP M(P, Q) ≤ minP maxQ M(P, Q): the player who commits second can exploit knowledge of the other’s strategy

Von Neumann says the following:

maxQ minP M(P, Q) = minP maxQ M(P, Q)
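A small numeric check of the “obvious” inequality on the RPS matrix, restricted to pure strategies (my own example; computing the mixed-strategy values exactly would need a linear program):

```python
import numpy as np

M = np.array([[0.5, 1.0, 0.0],
              [0.0, 0.5, 1.0],
              [1.0, 0.0, 0.5]])

max_min = M.min(axis=0).max()    # column player commits first: 0.0
min_max = M.max(axis=1).min()    # row player commits first:    1.0
assert max_min <= min_max
# with mixed strategies the gap closes: both sides equal the value 1/2
```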


Page 9:

Repeated Play

Row is the learner, Column is the adversary
The game is played repeatedly for t = 1, . . . , T

On round t, learner chooses Pt

Adversary chooses Qt

Learner gets losses M(i, Qt) for the individual strategies
Learner suffers loss M(Pt, Qt)

Learner’s algorithm (Hedging):
    Initialize the weights w1(i) = 1, for i = 1, . . . , n
    At round t, choose Pt(i) = wt(i) / ∑j wt(j)
    Upon receiving loss M(i, Qt), update wt+1(i) = wt(i) β^M(i,Qt)
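A minimal sketch of the Hedge update (the `loss_of_strategies` callable standing in for the adversary is a hypothetical interface of mine):

```python
import numpy as np

def hedge(loss_of_strategies, n, T, beta=0.9):
    """loss_of_strategies(t, P) -> array of losses M(i, Q_t), i = 1..n."""
    w = np.ones(n)                        # w_1(i) = 1
    total = 0.0
    for t in range(T):
        P = w / w.sum()                   # P_t(i) = w_t(i) / sum_j w_t(j)
        losses = loss_of_strategies(t, P)  # adversary picks Q_t, reveals M(i, Q_t)
        total += P @ losses               # learner suffers M(P_t, Q_t)
        w = w * beta ** losses            # w_{t+1}(i) = w_t(i) * beta^M(i, Q_t)
    return total
```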


Page 10:

Main Results

Theorem: The sequence of strategies P1, . . . , PT chosen by the algorithm satisfies

∑_{t=1}^T M(Pt, Qt) ≤ aβ minP ∑_{t=1}^T M(P, Qt) + cβ ln n

where aβ = ln(1/β) / (1 − β) and cβ = 1 / (1 − β)

Corollary: With β = 1 / (1 + √(2 ln n / T)), the average per-trial loss satisfies

(1/T) ∑_{t=1}^T M(Pt, Qt) ≤ minP (1/T) ∑_{t=1}^T M(P, Qt) + ∆T

where ∆T = O(√(ln n / T))
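To get a feel for the constants, a bit of illustrative arithmetic with sample values of n and T (my own numbers, just to see the scale of the terms):

```python
import math

n, T = 10, 10000
beta = 1 / (1 + math.sqrt(2 * math.log(n) / T))   # ~0.979
a_beta = math.log(1 / beta) / (1 - beta)          # ~1.01, just above 1
c_beta = 1 / (1 - beta)                           # ~48
delta_order = math.sqrt(math.log(n) / T)          # Delta_T = O(~0.015)
print(beta, a_beta, c_beta, delta_order)
```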


Page 11:

Learning in Games

Corollary: If v is the value of the game,

(1/T) ∑_{t=1}^T M(Pt, Qt) ≤ v + ∆T

A (new) proof of the minimax theorem

minP maxQ Pᵀ M Q ≤ maxQ (1/T) ∑t Ptᵀ M Q
                 ≤ (1/T) ∑t maxQ Ptᵀ M Q = (1/T) ∑t Ptᵀ M Qt
                 ≤ minP (1/T) ∑t Pᵀ M Qt + ∆T
                 ≤ maxQ minP Pᵀ M Q + ∆T

Since ∆T → 0 as T → ∞, this combines with the “obvious” inequality to give equality


Page 12:

Weak Learning

A weak learner predicts slightly better than random

The PAC setting (basics):
    Let X be an instance space, c : X → {0, 1} a target concept, and H a hypothesis space (h : X → {0, 1})
    Let Q be a fixed but unknown distribution on X

An algorithm, after training on (xi, c(xi)), i = 1, . . . , m, selects h ∈ H such that

Px∼Q[h(x) ≠ c(x)] ≤ 1/2 − γ

The algorithm is called a γ-weak learner

We assume the existence of such a learner


Page 13:

The Boosting Model

Boosting converts a weak learner to a strong learner

Boosting proceeds in rounds:
    Booster constructs Dt on X, the training set
    Weak learner produces a hypothesis ht ∈ H so that Px∼Dt[ht(x) ≠ c(x)] ≤ 1/2 − γ

After T rounds, the weak hypotheses ht, t = 1, . . . , T, are combined into a final hypothesis hfin

We need procedures (a schematic loop is sketched below):
    for obtaining Dt at each step
    for combining the weak hypotheses
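Here `weak_learner` and `update_D` are hypothetical callables standing in for those two procedures; only the overall loop structure comes from the slides:

```python
import numpy as np

def boost(X, c, weak_learner, update_D, T):
    """Schematic boosting loop: X is the training set, c the target labels."""
    m = len(X)
    D = np.ones(m) / m                  # start from the uniform distribution
    hs = []
    for t in range(T):
        h_t = weak_learner(X, c, D)     # error under D_t at most 1/2 - gamma
        hs.append(h_t)
        D = update_D(D, h_t, X, c)      # re-weight the training points
    def h_fin(x):                       # combine by (unweighted) majority vote
        return int(sum(h(x) for h in hs) > T / 2)
    return h_fin
```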


Page 14:

Minimax and Majority Votes

Consider the mistake matrix M(h, x) = 1 if h(x) ≠ c(x) and 0 otherwise

The row player can choose P over H
The column player can choose Q over X
Now minP maxx M(P, x) = v = maxQ minh M(h, Q)

However M(h, Q) = Px∼Q[h(x) ≠ c(x)]

Now ∃Q∗ such that M(h, Q∗) = Px∼Q∗[h(x) ≠ c(x)] ≥ v, ∀h
From weak learning, ∃h with Px∼Q∗[h(x) ≠ c(x)] ≤ 1/2 − γ

Hence v ≤ 1/2 − γ

Also ∃P∗ such that ∀x,

M(P∗, x) = Ph∼P∗[h(x) ≠ c(x)] ≤ v ≤ 1/2 − γ < 1/2

Majority voting according to P∗ is equivalent to c: on every x, less than half of the P∗-mass errs, so the weighted majority vote agrees with c(x)



Page 24:

Putting it together

Boosting maintains a distribution over the data

The minimax argument requires a distribution over hypotheses

Change the mistake matrix to M′ = 1 − Mᵀ, so that M′(x, h) = 1 − M(h, x) = 1 if h(x) = c(x) and 0 otherwise

On round t of boosting:
    Let Dt = Pt from hedging over rows (data)
    Weak learner returns ht with Px∼Dt[ht(x) = c(x)] ≥ 1/2 + γ
    Update weights using the adversary’s play Qt = ht

Let hfin be the majority vote of ht, t = 1, . . . , T (see the sketch below)
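A sketch of this reduction, running Hedge over the rows (data points) of M′, with the weak learner playing the columns; the `weak_learner` interface is hypothetical, as before:

```python
import numpy as np

def boost_by_hedge(X, c, weak_learner, T, beta=0.9):
    m = len(X)
    w = np.ones(m)                       # Hedge weights over the data points
    hs = []
    for t in range(T):
        D = w / w.sum()                  # D_t = P_t over the rows
        h_t = weak_learner(X, c, D)      # the adversary's column: Q_t = h_t
        hs.append(h_t)
        # loss of row x is M'(x, h_t) = 1 iff h_t(x) = c(x), so the weights of
        # correctly classified points shrink: boosting focuses on hard points
        correct = np.array([1.0 if h_t(x) == c(x) else 0.0 for x in X])
        w = w * beta ** correct
    def h_fin(x):                        # majority vote of h_1, ..., h_T
        return int(sum(h(x) for h in hs) > T / 2)
    return h_fin
```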



Page 29:

Analysis of Boosting

We have M′(Pt, ht) = Px∼Pt[ht(x) = c(x)] ≥ 1/2 + γ

Hence, from the Corollary,

1/2 + γ ≤ (1/T) ∑t M′(Pt, ht) ≤ minx (1/T) ∑t M′(x, ht) + ∆T

Therefore, ∀x (and T sufficiently large that ∆T < γ),

(1/T) ∑t M′(x, ht) ≥ 1/2 + γ − ∆T > 1/2

Hence the majority vote satisfies hfin(x) = c(x), ∀x: on every training point, more than half of the ht are correct

But what happens on unseen data?

