A decision-theoretic generalization of on-line learning and an application to boosting


Transcript of "A decision-theoretic generalization of on-line learning and an application to boosting"

Page 1

A decision-theoretic generalization of on-line learning and an application to boosting [1]

From Regret Learning to AdaBoost

Xing Wang

Department of Computer Science, TAMU

Date: May 6, 2015

Page 2

Table of Contents

1 AdaBoost Algorithm

2 Upper Bound for AdaBoost Algorithm

3 Experiment Evaluation
    Experiment 1
    Experiment 2

4 Generalization Analysis

Page 3

External Regret Learning

Initialize w_i^1 \in [0, 1] with \sum_{i=1}^N w_i^1 = 1;
for t = 1 ... T do
    get p^t = w^t / \sum_i w_i^t;
    receive loss vector l^t \in [0, 1]^N; suffer loss p^t \cdot l^t;
    update weight w_i^{t+1} = w_i^t \beta^{l_i^t};
end
Algorithm 1: PW Algorithm
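
A minimal Python sketch of Algorithm 1 (the function name, the fixed value of beta, and the array-based interface are illustrative assumptions, not from the paper):

import numpy as np

def pw_algorithm(losses, beta=0.5):
    """Run Algorithm 1 on a T x N array of losses in [0, 1];
    returns the total loss suffered by the learner."""
    T, N = losses.shape
    w = np.full(N, 1.0 / N)          # initial weights: w_i^1 in [0, 1], summing to 1
    total = 0.0
    for t in range(T):
        p = w / w.sum()              # p^t = w^t / sum_i w_i^t
        total += p @ losses[t]       # suffer loss p^t . l^t
        w = w * beta ** losses[t]    # w_i^{t+1} = w_i^t * beta^{l_i^t}
    return total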

Page 4

From Regret to Adaptive Boosting

input: N labeled samples (x_1, y_1), ..., (x_N, y_N);
       a distribution D over the N samples;
       a weak learning algorithm WeakLearn
Initialize w_i^1 = D(i);
for t = 1 ... T do
    provide WeakLearn with the distribution p^t = w^t / \sum_i w_i^t over the samples, get a hypothesis h_t : X \to [0, 1];
    calculate the error of h_t: \epsilon_t = \sum_{i=1}^N p_i^t |h_t(x_i) - y_i|;
    set \beta_t = \epsilon_t / (1 - \epsilon_t), update the weight vector w_i^{t+1} = w_i^t \beta_t^{1 - |h_t(x_i) - y_i|};
end
Algorithm 2: AdaBoost

h_f(x) = 1 if \sum_{t=1}^T \log(1/\beta_t) h_t(x) \ge \frac{1}{2} \sum_{t=1}^T \log(1/\beta_t), and h_f(x) = 0 otherwise.   (1)
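
A runnable sketch of Algorithm 2 and the final hypothesis (1), assuming binary labels y_i in {0, 1} and using the sklearn decision trees from the later experiments as WeakLearn (function names are mine):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=40, max_depth=4):
    """Algorithm 2: returns the per-round hypotheses h_t and values beta_t."""
    y = np.asarray(y)
    w = np.full(len(y), 1.0 / len(y))              # w_i^1 = D(i), here uniform
    hypotheses, betas = [], []
    for t in range(T):
        p = w / w.sum()                            # distribution p^t
        h = DecisionTreeClassifier(max_depth=max_depth)
        h.fit(X, y, sample_weight=p)               # WeakLearn under p^t
        err = np.abs(h.predict(X) - y)             # |h_t(x_i) - y_i|
        eps = np.sum(p * err)                      # epsilon_t
        if eps == 0 or eps >= 0.5:                 # stop if perfect or no better than random
            break
        beta = eps / (1.0 - eps)                   # beta_t = eps_t / (1 - eps_t)
        w = w * beta ** (1.0 - err)                # weight update
        hypotheses.append(h)
        betas.append(beta)
    return hypotheses, betas

def final_hypothesis(hypotheses, betas, X):
    """Eq. (1): weighted vote of the h_t with coefficients log(1/beta_t)."""
    alphas = np.log(1.0 / np.array(betas))
    votes = sum(a * h.predict(X) for a, h in zip(alphas, hypotheses))
    return (votes >= 0.5 * alphas.sum()).astype(int)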

Page 5

Table of Contents

1 AdaBoost Algorithm

2 Upper Bound for AdaBoost Algorithm

3 Experiment Evaluation
    Experiment 1
    Experiment 2

4 Generalization Analysis

Page 6

Error Bound for AdaBoost

Theorem: The error of the final hypothesis h_f, \epsilon = \sum_{i : h_f(x_i) \ne y_i} D(i), is bounded by

    \epsilon \le \prod_{t=1}^T 2\sqrt{\epsilon_t (1 - \epsilon_t)}

Figure 1: 2\sqrt{\epsilon_t (1 - \epsilon_t)}
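
A quick worked instance of the bound (illustrative numbers, assuming a constant per-round error):

% If every round achieves \epsilon_t = 0.3, each factor of the bound is
% 2\sqrt{0.3 \cdot 0.7} \approx 0.917, so
\epsilon \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
         \;=\; \bigl(2\sqrt{0.21}\bigr)^{T} \;\approx\; 0.917^{T},
% which is below 0.04 at T = 40: the training error decays geometrically
% as long as the weak learner stays strictly better than random guessing.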

Page 7

Theorem proof, part 1

From the weight update rule,

    w_i^{T+1} = D(i) \prod_{t=1}^T \beta_t^{1 - |h_t(x_i) - y_i|}   (2)

By (1), h_f(x) = 1 iff \sum_{t=1}^T \log(1/\beta_t) h_t(x) \ge \frac{1}{2} \sum_{t=1}^T \log(1/\beta_t). Then h_f making a mistake (h_f(x_i) \ne y_i) is equivalent to

    \prod_{t=1}^T \beta_t^{-|h_t(x_i) - y_i|} \ge \Bigl(\prod_{t=1}^T \beta_t\Bigr)^{-1/2}   (3)

Plugging (3) into (2) for the mislabeled samples, we have

    \sum_{i=1}^N w_i^{T+1} \ge \sum_{i : h_f(x_i) \ne y_i} w_i^{T+1}
                           \ge \Bigl(\sum_{i : h_f(x_i) \ne y_i} D(i)\Bigr) \Bigl(\prod_{t=1}^T \beta_t\Bigr)^{1/2}
                            = \epsilon \Bigl(\prod_{t=1}^T \beta_t\Bigr)^{1/2}   (4)

Page 8

Theorem proof, part 2

Using \beta^x \le 1 - (1 - \beta)x for x \in [0, 1] (convexity of \beta^x), the total weight evolves as

    \sum_{i=1}^N w_i^{t+1} = \sum_{i=1}^N w_i^t \beta_t^{1 - |h_t(x_i) - y_i|}
                         \le \sum_{i=1}^N w_i^t \bigl(1 - (1 - \beta_t)(1 - |h_t(x_i) - y_i|)\bigr)
                          = \Bigl(\sum_{i=1}^N w_i^t\Bigr) \bigl(1 - (1 - \epsilon_t)(1 - \beta_t)\bigr)   (5)

where \epsilon_t = \sum_{i=1}^N w_i^t |h_t(x_i) - y_i| / \sum_{j=1}^N w_j^t. Applying (5) repeatedly for t = 1, ..., T and using \sum_{i=1}^N w_i^1 = \sum_{i=1}^N D(i) = 1,

    \sum_{i=1}^N w_i^{T+1} \le \Bigl(\sum_{i=1}^N w_i^1\Bigr) \prod_{t=1}^T \bigl(1 - (1 - \epsilon_t)(1 - \beta_t)\bigr)
                           \le \prod_{t=1}^T \bigl(1 - (1 - \epsilon_t)(1 - \beta_t)\bigr)   (6)

Page 9

Theorem proof, part 3

Combining (4), \epsilon \bigl(\prod_{t=1}^T \beta_t\bigr)^{1/2} \le \sum_{i=1}^N w_i^{T+1}, with (6), \sum_{i=1}^N w_i^{T+1} \le \prod_{t=1}^T \bigl(1 - (1 - \epsilon_t)(1 - \beta_t)\bigr), we get:

    \epsilon \le \prod_{t=1}^T \frac{1 - (1 - \epsilon_t)(1 - \beta_t)}{\sqrt{\beta_t}}   (7)

The right-hand side attains its minimum at \beta_t = \epsilon_t / (1 - \epsilon_t); plugging this value in finishes the proof: \epsilon \le 2^T \prod_{t=1}^T \sqrt{\epsilon_t (1 - \epsilon_t)}.
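
The minimization step spelled out (a routine calculus check, added for completeness):

% Each factor of (7), viewed as a function of \beta_t with \epsilon_t fixed:
f(\beta_t) = \frac{1 - (1-\epsilon_t)(1-\beta_t)}{\sqrt{\beta_t}}
           = \epsilon_t \beta_t^{-1/2} + (1-\epsilon_t)\,\beta_t^{1/2}
% Setting the derivative to zero:
f'(\beta_t) = -\tfrac{1}{2}\,\epsilon_t\,\beta_t^{-3/2}
              + \tfrac{1}{2}\,(1-\epsilon_t)\,\beta_t^{-1/2} = 0
  \;\Longrightarrow\; \beta_t = \frac{\epsilon_t}{1-\epsilon_t}
% Substituting back yields exactly the factor in the theorem:
f\!\Bigl(\frac{\epsilon_t}{1-\epsilon_t}\Bigr)
  = \epsilon_t\sqrt{\tfrac{1-\epsilon_t}{\epsilon_t}}
    + (1-\epsilon_t)\sqrt{\tfrac{\epsilon_t}{1-\epsilon_t}}
  = 2\sqrt{\epsilon_t(1-\epsilon_t)}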

Page 10

Table of Contents

1 AdaBoost Algorithm

2 Upper Bound for AdaBoost Algorithm

3 Experiment Evaluation
    Experiment 1
    Experiment 2

4 Generalization Analysis

Page 11

Experiment settings

Two datasets:

DRIVE [2]: retinal images, blood vessel vs. background

UCI [4]: Japanese credit screening dataset

Decision tree as weak learner:

package from sklearn

max depth of 4

initial sample weight w_i^1 = 0.5 / |{j : l_j = l_i}| (see the sketch below)
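
A small helper implementing this class-balanced initialization (the function name is mine):

import numpy as np

def balanced_initial_weights(labels):
    """w_i^1 = 0.5 / |{j : l_j == l_i}|: each class carries total
    initial weight 0.5, regardless of class imbalance."""
    labels = np.asarray(labels)
    values, counts = np.unique(labels, return_counts=True)
    count_of = dict(zip(values.tolist(), counts.tolist()))
    return np.array([0.5 / count_of[l] for l in labels.tolist()])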

Page 12

Retinal blood vessel / background classification


20 training images, a total of 4,541,006 pixels, of which 569,415 are blood vessel pixels.

Two shape features, energy and symmetry, derived from the daisy graph [3].

Page 13

Evaluation results on the Retina Image

Figure 2: \epsilon_t \to 0.5, \beta_t \to 1

There is little update to the sample weights.

\log(1/\beta_t) \to 0: the corresponding classifiers contribute less.

2\sqrt{\epsilon_t (1 - \epsilon_t)} \to 1: no reduction in 2^T \prod_{t=1}^T \sqrt{\epsilon_t (1 - \epsilon_t)}.

Page 14

Credit Screening

UCI Japanese Credit Screening: http://goo.gl/4gBRXb, 532 samples.

Features used: 2, 3, 8, 11, 14, 15 (six continuous features).

Class labels: + (296) / - (357)

Page 15

Evaluation results on the Credit Screening

\epsilon_t of each round is below 0.4.

\epsilon on the training set converges to 0 after 40 rounds.

Page 16

Table of Contents

1 AdaBoost Algorithm

2 Upper Bound for AdaBoost Algorithm

3 Experiment Evaluation
    Experiment 1
    Experiment 2

4 Generalization Analysis

Page 17

PAC framework and VC dimension

Based on [5], with probability 1 - \delta,

    error_{true}(h) \le error_{train}(h) + \sqrt{\frac{\ln H + \ln(1/\delta)}{2m}}   (8)

H is the VC dimension of the hypothesis class.

m is the number of samples.

\delta = 0.05 for the later analysis.
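
A direct transcription of the complexity term in (8) (a sketch; the function and argument names are mine):

import math

def complexity_term(H, m, delta=0.05):
    """The square-root term of Eq. (8): sqrt((ln H + ln(1/delta)) / (2m))."""
    return math.sqrt((math.log(H) + math.log(1.0 / delta)) / (2.0 * m))

# e.g. with H = 100 and the 532 credit-screening samples:
# complexity_term(100, 532) is roughly 0.085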

Page 18

VC dimension

The VC dimension of a hypothesis class is the largest number of samples such that every assignment of labels to the samples can be separated by some hypothesis in the class.

Example: in one dimension, with the hypothesis class {x > a} (labeled + or -):

there exist two samples that are always separable;

for any three samples, there exists a label assignment that is not separable (checked by the sketch below).
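
A brute-force check of this one-dimensional example (an illustrative helper, not from the slides):

def shatters(points):
    """Check whether the class {x > a} (with either labeling orientation,
    i.e. + or - on each side) realizes every labeling of the given points."""
    thresholds = [min(points) - 1.0] + sorted(points)
    realizable = set()
    for a in thresholds:
        for flip in (False, True):
            realizable.add(tuple((x > a) != flip for x in points))
    return len(realizable) == 2 ** len(points)

print(shatters([0.0, 1.0]))       # True: two points are always separable
print(shatters([0.0, 1.0, 2.0]))  # False: e.g. (+, -, +) is not realizable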

Page 19

VC dimension of decision tree

The VC dimension of a decision tree of depth k on an n-dimensional space is bounded by:

Lower bound: 2^{k-1}(n + 1)

Upper bound [5]: 2(2n)^{2^k - 1}

Page 20

VC dimension of AdaBoost

Let H be the class of hypotheses given by the WeakLearner, with VC dimension d \ge 2. Then the VC dimension of the class of hypotheses given by AdaBoost after T rounds is at most

    2(d + 1)(T + 1) \log_2(e(T + 1))   (9)
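
A sketch plugging (9) into the complexity term of (8) to see how the bound behaves as T grows (the values of d and m in the comment are illustrative assumptions):

import math

def adaboost_vc_dim(d, T):
    """Upper bound (9) on the VC dimension after T rounds of boosting."""
    return 2 * (d + 1) * (T + 1) * math.log2(math.e * (T + 1))

def bound_term(d, T, m, delta=0.05):
    """Complexity term of (8) with H replaced by the bound (9)."""
    H = adaboost_vc_dim(d, T)
    return math.sqrt((math.log(H) + math.log(1.0 / delta)) / (2.0 * m))

# The term grows with T while the training error shrinks, so the PAC
# framework predicts a finite bound-optimal number of rounds, e.g.:
# bound_term(d=24, T=10, m=532) < bound_term(d=24, T=100, m=532)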

Page 21

Mean of Leave-one-out generalization test

Figure 3: Generalization test on Credit Screening

The optimal iteration number given by the PAC framework is less than the optimal number of iterations actually needed.

This is consistent with the paper's results.

Page 22

Thanks, Q&A

Page 23

Reference I

[1] Freund, Yoav, and Robert E. Schapire. "A decision-theoretic generalization of on-line learning and an application to boosting." Journal of Computer and System Sciences 55.1 (1997): 119-139.

[2] Staal, J.J., M.D. Abramoff, M. Niemeijer, M.A. Viergever, and B. van Ginneken. "Ridge based vessel segmentation in color images of the retina." IEEE Transactions on Medical Imaging, vol. 23, 2004, pp. 501-509.

[3] Ying, Huajun, Xing Wang, and Jyh-Charn Liu. "Statistical pattern analysis of blood vessel features on retina images and its application to blood vessel mapping algorithms." EMBC 2014.

Page 24

Reference II

[4] Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

[5] Zettlemoyer, Luke. PAC-learning, VC Dimension. UW, 2012.

Page 25

Iteration statistic

There are cases where boosting iterates fewer than 40 times; the iteration ends because the error rate no longer changes.