(Useful) Information...

(Useful) Information Geometry
Shane Gu and Nilesh Tripuraneni
sg717@cam.ac.uk, nt357@cam.ac.uk

02/04/2015

AdaBoost

- Input: a pool of weak learners (rules) $h \in H$, training data $(x_i, y_i) \in X \times \{-1, 1\}$, and an initial sample weight distribution $D_0$.
- Weak learner: find a weak rule $h_t \in H$ that gives the smallest weighted error $\varepsilon_t$ under $D_t$.
- Booster: adjust the sample weights

$$D_{t+1}(i) = \frac{1}{Z_t} D_t(i) \exp(-\alpha_t y_i h_t(x_i)) \qquad (1)$$

  where $\alpha_t = \frac{1}{2} \ln \frac{1-\varepsilon_t}{\varepsilon_t}$ and $Z_t = \sum_i D_t(i) \exp(-\alpha_t y_i h_t(x_i))$.
- Repeat until convergence or satisfaction.
- Output: strong learner (rule) $F(x) = \mathrm{sgn}\left(\sum_t \alpha_t h_t(x)\right)$.
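The loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the weak-learner pool is assumed to be axis-aligned decision stumps, the initial distribution is assumed uniform, and the names `best_stump`, `stump_predict`, `adaboost`, and `predict` are hypothetical.

```python
import numpy as np

def stump_predict(X, j, thr, s):
    """Decision stump: s * (+1 if X[:, j] <= thr else -1)."""
    return s * np.where(X[:, j] <= thr, 1, -1)

def best_stump(X, y, D):
    """Weak learner: exhaustive search for the stump with the smallest weighted error under D."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                err = D[stump_predict(X, j, thr, s) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, s)
    return best

def adaboost(X, y, rounds=10):
    n = len(y)
    D = np.full(n, 1.0 / n)              # assumed uniform initial distribution
    ensemble = []
    for _ in range(rounds):
        eps, j, thr, s = best_stump(X, y, D)
        eps = np.clip(eps, 1e-12, 1.0)   # avoid log(0) when a stump is perfect
        if eps >= 0.5:                   # no rule better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        h = stump_predict(X, j, thr, s)
        D = D * np.exp(-alpha * y * h)   # booster update, equation (1)
        D /= D.sum()                     # division by Z_t
        ensemble.append((alpha, j, thr, s))
    return ensemble

def predict(ensemble, X):
    """Strong rule: sign of the alpha-weighted vote."""
    score = sum(a * stump_predict(X, j, thr, s) for a, j, thr, s in ensemble)
    return np.sign(score)

# Toy 1-D problem that no single stump can solve.
X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([1, 1, -1, -1, 1, 1])
ensemble = adaboost(X, y, rounds=3)
```

On this toy data a single stump misclassifies two of the six points, while the three-round vote separates the classes.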


AdaBoost

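- At the end of each round, the weight of an example is exponentially proportional to its loss. Writing $F_t(x) = \sum_{t' \le t} \alpha_{t'} h_{t'}(x)$ for the unthresholded combined score, and assuming a uniform initial distribution:

$$D_{t+1}(i) \propto \prod_{t' \le t} \frac{\exp(-y_i \alpha_{t'} h_{t'}(x_i))}{Z_{t'}} \propto \exp(-y_i F_t(x_i))$$

- Total loss is proportional to $Z_t$:

$$\sum_i \exp(-y_i F_t(x_i)) = \sum_i \exp(-y_i (F_{t-1}(x_i) + \alpha_t h_t(x_i))) \propto \sum_i D_t(i) \exp(-y_i \alpha_t h_t(x_i)) \equiv Z_t$$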
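Both proportionality claims can be checked numerically. The sketch below assumes a uniform initial distribution and uses random $\pm 1$ vectors as stand-ins for the weak-rule outputs $h_t(x_i)$ (hypothetical data, not from the slides). Under these assumptions the normalised weights equal the normalised exponential losses, and the mean exponential loss equals the product of the $Z_t$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 50, 5
y = rng.choice([-1, 1], size=n)
H = rng.choice([-1, 1], size=(T, n))   # hypothetical outputs h_t(x_i), one row per round

D = np.full(n, 1.0 / n)                # uniform initial distribution (assumption)
F = np.zeros(n)                        # running score F_t(x_i)
Z_product = 1.0
for h in H:
    eps = D[h != y].sum()
    alpha = 0.5 * np.log((1 - eps) / eps)
    w = D * np.exp(-alpha * y * h)
    Z = w.sum()
    D = w / Z                          # booster update
    F += alpha * h
    Z_product *= Z

loss = np.exp(-y * F)                  # per-example exponential loss
weights_match = np.allclose(D, loss / loss.sum())
total_loss_match = np.isclose(loss.mean(), Z_product)
```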


Sequential Error Minimization

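Find $\alpha_t$ and $h_t$ to minimize $Z_t(\alpha_t, h_t)$:

$$Z_t(\alpha_t, h_t) = \sum_i D_t(i) \exp(-\alpha_t y_i h_t(x_i)) = \sum_{i : y_i = h_t(x_i)} D_t(i) \exp(-\alpha_t) + \sum_{i : y_i \neq h_t(x_i)} D_t(i) \exp(\alpha_t)$$

$$= (\exp(\alpha_t) - \exp(-\alpha_t)) \sum_{i=1}^{N} D_t(i)\, I(y_i \neq h_t(x_i)) + \exp(-\alpha_t) \sum_{i=1}^{N} D_t(i)$$

For $\alpha_t > 0$ the first coefficient is positive and the second term does not depend on $h_t$, so choose $h_t$ to minimize the weighted error.

With $\varepsilon_t = \sum_i D_t(i)\, I(y_i \neq h_t(x_i))$ and $\sum_i D_t(i) = 1$:

$$Z_t(\alpha_t, h_t) = \exp(-\alpha_t)(1 - \varepsilon_t) + \exp(\alpha_t)\,\varepsilon_t$$

Choose $\alpha_t$ such that $\frac{dZ_t}{d\alpha_t} = 0 \implies \alpha_t = \frac{1}{2} \ln\left(\frac{1-\varepsilon_t}{\varepsilon_t}\right)$.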
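As a worked step, the stationarity condition can be solved explicitly. The value of $Z_t$ at the optimum is a standard consequence that the slide does not state; it is included here only as a sanity check.

```latex
\begin{align*}
\frac{dZ_t}{d\alpha_t}
  &= -e^{-\alpha_t}(1-\varepsilon_t) + e^{\alpha_t}\varepsilon_t = 0
  \;\Longrightarrow\; e^{2\alpha_t} = \frac{1-\varepsilon_t}{\varepsilon_t}
  \;\Longrightarrow\; \alpha_t = \tfrac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}, \\
Z_t(\alpha_t)
  &= (1-\varepsilon_t)\sqrt{\frac{\varepsilon_t}{1-\varepsilon_t}}
   + \varepsilon_t\sqrt{\frac{1-\varepsilon_t}{\varepsilon_t}}
   = 2\sqrt{\varepsilon_t(1-\varepsilon_t)}.
\end{align*}
```

Since $2\sqrt{\varepsilon_t(1-\varepsilon_t)} < 1$ whenever $\varepsilon_t \neq \tfrac{1}{2}$, any weak rule that beats chance strictly shrinks the total loss.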


Orthogonality of D

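So $\alpha_t$ is chosen such that $\frac{dZ_t}{d\alpha_t} = 0$, and $h_t$ is chosen to minimize the weighted error $\sum_i D_t(i)\, I(y_i \neq h_t(x_i))$. Then the booster constructs a new distribution $D_{t+1}$ such that the correlation with $h_t$ is zero:

$$\sum_i D_{t+1}(i)\, y_i h_t(x_i) = \frac{1}{Z_t} \sum_i D_t(i) \exp(-\alpha_t y_i h_t(x_i))\, y_i h_t(x_i) = -\frac{1}{Z_t} \frac{dZ_t}{d\alpha_t} = 0$$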
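A quick numerical check of this orthogonality, under assumed toy data: the labels and the starting distribution are random, and the weak rule `h` is built by flipping five labels so that its weighted error lies strictly between 0 and 1. The helper name `booster_update` is hypothetical.

```python
import numpy as np

def booster_update(D, y, h):
    """One booster step: returns D_{t+1} and alpha_t for weak-rule outputs h."""
    eps = D[h != y].sum()
    alpha = 0.5 * np.log((1 - eps) / eps)
    w = D * np.exp(-alpha * y * h)
    return w / w.sum(), alpha

rng = np.random.default_rng(1)
n = 20
y = rng.choice([-1, 1], size=n)
h = y.copy()
h[:5] *= -1                             # hypothetical weak rule: wrong on the first 5 examples
D = rng.dirichlet(np.ones(n))           # arbitrary current distribution D_t

D_next, alpha = booster_update(D, y, h)
edge_before = np.sum(D * y * h)         # correlation of h with the labels under D_t
edge_after = np.sum(D_next * y * h)     # zero up to rounding under D_{t+1}
mass_on_errors = D_next[h != y].sum()   # the misclassified examples now carry half the mass
```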


Alternative View of AdaBoost

- Weak learner: given $D_t$, find $h_t \in H$ minimizing the weighted error $\sum_i D_t(i)\, I(h_t(x_i) \neq y_i)$, or equivalently maximizing the weighted agreement $\sum_i D_t(i)\, I(h_t(x_i) = y_i)$.
- Booster: given $h_t$, compute $D_{t+1}$ such that

$$\sum_i D_{t+1}(i)\, y_i h_t(x_i) = 0$$

I.e., is the booster pursuing a distribution $D$ such that

$$\sum_i D(i)\, y_i h_j(x_i) = 0$$

for every $h_j \in H$? This is a set of constraints linear in $D$.


Optimization Problem for AdaBoost?

Solve:

$$\min_D \; KL(D \| U)$$

such that

$$\sum_i D(i)\, y_i h_j(x_i) = 0, \quad \forall j$$

$$D(i) \geq 0, \quad \forall i$$

$$\sum_i D(i) = 1$$

Let us assume the feasible set $P$ defined by the constraints is non-empty.


Iterative Projections

- Initialize $D_1 = U$.
- Choose $h_t \in H$ corresponding to one constraint (weak learner).
- Find $D_{t+1} = \arg\min_{D : \sum_i D(i) y_i h_t(x_i) = 0} KL(D \| D_t)$ (booster).
- Iterate.

Greedy selection of constraints: choose $h_t$ so that $KL(D_{t+1} \| D_t)$ is maximized.

Each round of iterative projection is equivalent to one round of AdaBoost.
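The claimed equivalence can be probed numerically. The sketch below relies on the exponential form of the minimizer, $D(i) \propto D_t(i)\exp(-\alpha y_i h_t(x_i))$, which follows from the Lagrangian argument, and finds the multiplier $\alpha$ by plain bisection on the constraint. The helper `kl_project`, the bracket $[-20, 20]$, and the toy data are assumptions for illustration.

```python
import numpy as np

def kl_project(D, u, lo=-20.0, hi=20.0, iters=200):
    """KL-project D onto {D' : sum_i D'(i) u_i = 0}, where u_i = y_i h(x_i).

    Assumes the minimizer has the form D' ∝ D * exp(-a * u) and finds a by
    bisection on the constraint value, which is decreasing in a.
    """
    def edge(a):
        return np.sum(D * np.exp(-a * u) * u)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if edge(mid) > 0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    w = D * np.exp(-a * u)
    return w / w.sum(), a

rng = np.random.default_rng(0)
n = 30
y = rng.choice([-1, 1], size=n)
h = y.copy()
h[:8] *= -1                        # hypothetical weak rule: wrong on 8 of 30 examples
u = y * h
D_t = np.full(n, 1.0 / n)

D_proj, alpha_numeric = kl_project(D_t, u)

# AdaBoost's closed-form step size for the same weak rule.
eps = D_t[u < 0].sum()
alpha_closed = 0.5 * np.log((1 - eps) / eps)
```

The multiplier found by bisection agrees with AdaBoost's $\alpha_t$, so the projected distribution coincides with the booster update for this rule.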


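Equivalence Proof

- Booster: using Lagrange multipliers/duality:

$$\max_{\alpha, \mu} \min_D L(\alpha, \mu, D) = KL(D \| D_t) + \alpha \sum_i D(i)\, y_i h_t(x_i) + \mu \left( \sum_i D(i) - 1 \right)$$

$$0 = \frac{\partial L}{\partial D(i)} = \ln \frac{D(i)}{D_t(i)} + 1 + \alpha y_i h_t(x_i) + \mu$$

$$D^*(i) = D_t(i) \exp(-\alpha y_i h_t(x_i) - 1 - \mu) = \frac{1}{Z(\alpha)} D_t(i) \exp(-\alpha y_i h_t(x_i))$$

$$L(\alpha) = -\ln Z(\alpha)$$

  Choose $\alpha$ to minimize $Z_t$, so $D, \alpha, Z \equiv D_{t+1}, \alpha_t, Z_t$ for the same $h_t$.

- Weak learner: find $h_t$ to maximize

$$KL(D_{t+1} \| D_t) = \sum_i D_{t+1}(i) (-\alpha_t y_i h_t(x_i) - \ln Z_t) = -\ln Z_t$$

  This is equivalent to choosing $h_t$ to minimize $Z_t$.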
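The identity $KL(D_{t+1} \| D_t) = -\ln Z_t$ and the greedy rule can be illustrated on a toy case. In this sketch the two candidate rules `h_a` and `h_b` are hypothetical, with weighted errors 0.2 and 0.4 under a uniform $D_t$; the helper names `kl` and `project` are also assumptions.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

def project(D, y, h):
    """Booster step for weak-rule outputs h; returns (D_next, Z)."""
    eps = D[h != y].sum()
    alpha = 0.5 * np.log((1 - eps) / eps)
    w = D * np.exp(-alpha * y * h)
    Z = w.sum()
    return w / Z, Z

n = 10
y = np.ones(n, dtype=int)
D = np.full(n, 1.0 / n)

h_a = y.copy(); h_a[:2] = -1    # hypothetical rule with weighted error 0.2
h_b = y.copy(); h_b[:4] = -1    # hypothetical rule with weighted error 0.4

D_a, Z_a = project(D, y, h_a)
D_b, Z_b = project(D, y, h_b)
kl_a, kl_b = kl(D_a, D), kl(D_b, D)
```

The more accurate rule `h_a` yields the smaller $Z$ and the larger KL step, so maximizing the projection distance and minimizing $Z_t$ pick the same rule.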


Convergence of AdaBoost

- Recall that $P$ is the feasible set defined by the constraints, and define $Q$ as the set of distributions $D(i) \propto \exp\left(-\sum_j \lambda_j y_i h_j(x_i)\right)$. If $d \in P \cap Q$, then by the Pythagorean theorem (as before) $d$ uniquely solves $\min_{p \in P} KL(p \| U)$.
- $D_t$ computed by iterative projection converges to the unique point $d \in P \cap Q$. By the Pythagorean theorem:

$$KL(D^* \| D_{t+1}) = KL(D^* \| D_t) - KL(D_{t+1} \| D_t)$$

  so we are always getting closer!
- I.e., the loss is $\geq 0$ and non-increasing, so the drop in loss must converge to 0.
- Moreover, if the drop in loss is 0, then $D \in P$.
- Construction of $d$ implies $D^* \in Q$.
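The Pythagorean identity for a single projection step can be checked numerically. In the sketch below, `p` stands in for $D^*$: it is an arbitrary distribution that satisfies the one constraint being projected onto (half of its mass on the examples $h_t$ gets right, half on those it gets wrong), which is all the identity needs for that step. The data and names are hypothetical.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(2)
n = 12
y = rng.choice([-1, 1], size=n)
h = y.copy()
h[:4] *= -1                          # hypothetical weak rule: wrong on 4 examples
u = y * h

D_t = rng.dirichlet(np.ones(n))      # arbitrary current distribution

# Projection of D_t onto {D : sum_i D(i) u_i = 0} (the booster update).
eps = D_t[u < 0].sum()
alpha = 0.5 * np.log((1 - eps) / eps)
w = D_t * np.exp(-alpha * u)
D_next = w / w.sum()

# Any feasible p: positive weights rescaled to put mass 1/2 on each side.
p = rng.random(n) + 0.1
p[u > 0] *= 0.5 / p[u > 0].sum()
p[u < 0] *= 0.5 / p[u < 0].sum()

lhs = kl(p, D_next)
rhs = kl(p, D_t) - kl(D_next, D_t)
```

Because `kl(D_next, D_t)` is non-negative, each projection step cannot move the iterate farther from any feasible point.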


Duality

Minimizing the exponential loss $E(\exp(-yF(x)))$ is the convex dual of solving the KL-projection problem subject to linear constraints.


Afterthoughts and Bregman Divergences

Why and when does this work?

For a convex function $F$, the induced Bregman divergence is:

$$B_F(p \| q) = F(p) - F(q) - \nabla F(q) \cdot (p - q)$$

Bregman divergences are in 1-to-1 correspondence with exponential families (i.e. contours of equal density define the Bregman distance).

Theorem: For a large family of Bregman divergences, there exists a unique $d^*$ satisfying

- $d^* \in P \cap Q$
- $d^* = \arg\min_{p \in P} B_F(p \| q_0)$
- Pythagorean theorem
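To make the definition concrete, the sketch below evaluates $B_F$ for two standard choices of $F$ that are not listed on the slide and are assumed here for illustration: negative entropy, which gives the KL divergence when $p$ and $q$ are probability distributions, and half the squared norm, which gives half the squared Euclidean distance. Function names are hypothetical.

```python
import numpy as np

def bregman(F, grad_F, p, q):
    """B_F(p || q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

def neg_entropy(p):
    return np.sum(p * np.log(p))

def neg_entropy_grad(p):
    return np.log(p) + 1.0

def half_sq_norm(p):
    return 0.5 * np.dot(p, p)

def half_sq_norm_grad(p):
    return p

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

b_entropy = bregman(neg_entropy, neg_entropy_grad, p, q)      # KL(p || q) for distributions
b_square = bregman(half_sq_norm, half_sq_norm_grad, p, q)     # 0.5 * ||p - q||^2
kl_direct = np.sum(p * np.log(p / q))
```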


Citations

- R. E. Schapire and Y. Freund, Boosting: Foundations and Algorithms.
- M. Collins, R. E. Schapire, and Y. Singer, "Logistic regression, AdaBoost and Bregman distances," Machine Learning, vol. 48, no. 1-3, pp. 253-285, 2002.
