
AdaBoost Algorithm Upper Bound for Adaboost Algorithm Experiment Evaluation Generalization Analysis

A decision-theoretic generalization of on-line learning and an application to boosting [1]

From Regret Learning to AdaBoost

Xing Wang

Department of Computer Science, TAMU

Date: May 6, 2015


Table of Contents

1 AdaBoost Algorithm

2 Upper Bound for Adaboost Algorithm

3 Experiment Evaluation
    Experiment 1
    Experiment 2

4 Generalization Analysis


External Regret Learning

Initialize w^1 with ∑_{i=1}^N w^1_i = 1, w^1_i ∈ [0, 1];
for t = 1 ... T do
    get p^t = w^t / ∑_i w^t_i;
    receive loss vector l^t ∈ [0, 1]^N;
    suffer loss p^t · l^t;
    update weight w^{t+1}_i = w^t_i · β^{l^t_i};
end
Algorithm 1: PW Algorithm
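As a minimal sketch (not the authors' code), the PW update loop can be written in a few lines of numpy; the toy loss matrix below is made up for illustration:

```python
import numpy as np

def pw_algorithm(losses, beta=0.5):
    """PW/Hedge: keep weights over N experts and multiplicatively
    penalize expert i by beta**l^t_i after each loss vector l^t."""
    T, N = losses.shape
    w = np.full(N, 1.0 / N)          # initial weights, summing to 1
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()              # distribution p^t
        total_loss += p @ losses[t]  # suffer loss p^t . l^t
        w = w * beta ** losses[t]    # w^{t+1}_i = w^t_i * beta^{l^t_i}
    return total_loss

# toy run: expert 0 always suffers loss 1, expert 1 never does
losses = np.array([[1.0, 0.0]] * 10)
total = pw_algorithm(losses)
```

Because weight shifts quickly onto the lossless expert, the cumulative loss stays bounded by (ln N + L_min ln(1/β)) / (1 − β), here ln 2 / 0.5 ≈ 1.39.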


From Regret to Adaptive Boosting

input: N labeled samples (x_1, y_1), ..., (x_N, y_N);
       distribution D over the N samples;
       weak learning algorithm WeakLearn
Initialize w^1_i = D(i);
for t = 1 ... T do
    provide WeakLearn with the distribution p^t = w^t / ∑_i w^t_i over the samples,
    get a hypothesis h_t : X → [0, 1];
    calculate the error of h_t: ε_t = ∑_{i=1}^N p^t_i |h_t(x_i) − y_i|;
    set β_t = ε_t / (1 − ε_t), and update the weight vector:
    w^{t+1}_i = w^t_i · β_t^{1 − |h_t(x_i) − y_i|};
end
Algorithm 2: AdaBoost

h_f(x) = 1 if ∑_{t=1}^T log(1/β_t) h_t(x) ≥ (1/2) ∑_{t=1}^T log(1/β_t), and 0 otherwise    (1)
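A minimal numpy sketch of Algorithm 2 (not the authors' code; the one-feature threshold weak learner and the toy data are made up for illustration). It also records the per-round errors ε_t, so the training-error bound of the next section can be checked empirically:

```python
import numpy as np

def weak_learn(X, y, p):
    """Hypothetical weak learner: best 1-D threshold stump
    (either polarity) under distribution p."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = (sign * (X[:, j] - thr) >= 0).astype(float)
                err = float(np.sum(p * np.abs(pred - y)))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    err, j, thr, sign = best
    h = lambda Z, j=j, thr=thr, sign=sign: (sign * (Z[:, j] - thr) >= 0).astype(float)
    return h, err

def adaboost(X, y, T=10):
    """Algorithm 2 with a uniform initial distribution D."""
    N = len(y)
    w = np.full(N, 1.0 / N)                        # w^1_i = D(i)
    hyps, betas, eps_list = [], [], []
    for _ in range(T):
        p = w / w.sum()                            # distribution p^t
        h, eps = weak_learn(X, y, p)
        if eps <= 0 or eps >= 0.5:                 # weak-learning assumption violated
            break
        beta = eps / (1.0 - eps)                   # beta_t = eps_t / (1 - eps_t)
        w = w * beta ** (1.0 - np.abs(h(X) - y))   # correct samples shrink in weight
        hyps.append(h); betas.append(beta); eps_list.append(eps)
    logs = np.log(1.0 / np.array(betas))
    def h_f(Z):                                    # weighted-majority vote, Eq. (1)
        votes = sum(l * h(Z) for l, h in zip(logs, hyps))
        return (votes >= 0.5 * logs.sum()).astype(int)
    return h_f, eps_list

# toy 1-D data that no single stump classifies perfectly
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
h_f, eps_list = adaboost(X, y, T=20)
train_err = float(np.mean(h_f(X) != y))
bound = float(np.prod([2 * np.sqrt(e * (1 - e)) for e in eps_list]))
```

By the theorem of the next section, `train_err` can never exceed `bound`, whatever data is used.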


Table of Contents

1 AdaBoost Algorithm

2 Upper Bound for Adaboost Algorithm

3 Experiment Evaluation
    Experiment 1
    Experiment 2

4 Generalization Analysis


Error Bound for AdaBoost

Theorem. The error of the final hypothesis h_f, ε = ∑_{i : h_f(x_i) ≠ y_i} D(i), is bounded by

ε ≤ ∏_{t=1}^T 2√(ε_t(1 − ε_t))

Figure 1: 2√(ε_t(1 − ε_t))


Theorem proof, part 1

w^{T+1}_i = D(i) ∏_{t=1}^T β_t^{1 − |h_t(x_i) − y_i|}    (2)

h_f(x) = 1 iff ∑_{t=1}^T log(1/β_t) h_t(x) ≥ (1/2) ∑_{t=1}^T log(1/β_t). Hence h_f making a mistake (h_f(x_i) ≠ y_i) is equivalent to

∏_{t=1}^T β_t^{−|h_t(x_i) − y_i|} ≥ (∏_{t=1}^T β_t)^{−1/2}    (3)

Plugging (3) into (2) for the mislabeled samples, we have

∑_{i=1}^N w^{T+1}_i ≥ ∑_{i : h_f(x_i) ≠ y_i} w^{T+1}_i ≥ (∑_{i : h_f(x_i) ≠ y_i} D(i)) (∏_{t=1}^T β_t)^{1/2} = ε (∏_{t=1}^T β_t)^{1/2}    (4)


Theorem proof, part 2

∑_{i=1}^N w^{t+1}_i = ∑_{i=1}^N w^t_i β_t^{1 − |h_t(x_i) − y_i|}
                    ≤ ∑_{i=1}^N w^t_i (1 − (1 − β_t)(1 − |h_t(x_i) − y_i|))
                    = (∑_{i=1}^N w^t_i)(1 − (1 − ε_t)(1 − β_t))    (5)

where ε_t = ∑_{i=1}^N w^t_i |h_t(x_i) − y_i| / ∑_{j=1}^N w^t_j. The inequality uses β^x ≤ 1 − (1 − β)x for β ≥ 0 and x ∈ [0, 1]. Applying (5) repeatedly for t = 1, ..., T:

∑_{i=1}^N w^{T+1}_i ≤ (∑_{i=1}^N w^1_i) ∏_{t=1}^T (1 − (1 − ε_t)(1 − β_t)) ≤ ∏_{t=1}^T (1 − (1 − ε_t)(1 − β_t))    (6)


Theorem proof, part 3

Combining (4), ε (∏_{t=1}^T β_t)^{1/2} ≤ ∑_{i=1}^N w^{T+1}_i, with (6), ∑_{i=1}^N w^{T+1}_i ≤ ∏_{t=1}^T (1 − (1 − ε_t)(1 − β_t)), we get:

ε ≤ ∏_{t=1}^T (1 − (1 − ε_t)(1 − β_t)) / √β_t    (7)

The right-hand side attains its minimum at β_t = ε_t / (1 − ε_t); plugging in this value finishes the proof: ε ≤ 2^T ∏_{t=1}^T √(ε_t(1 − ε_t)).
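The minimizing β_t can be checked numerically with a small grid search (a sketch, with ε fixed at an arbitrary 0.3): the minimizer of each factor of (7) sits at ε/(1 − ε) and the minimum value is 2√(ε(1 − ε)).

```python
import numpy as np

eps = 0.3  # arbitrary weak-learner error for the check
f = lambda b: (1 - (1 - eps) * (1 - b)) / np.sqrt(b)   # one factor of (7)

# brute-force minimization over a fine grid of beta values
grid = np.linspace(0.001, 0.999, 99901)
b_star = grid[np.argmin(f(grid))]
```

Here `b_star` lands at ε/(1 − ε) = 3/7, and f(3/7) matches 2√(0.3 · 0.7) to machine precision.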


Table of Contents

1 AdaBoost Algorithm

2 Upper Bound for Adaboost Algorithm

3 Experiment Evaluation
    Experiment 1
    Experiment 2

4 Generalization Analysis


Experiment settings

Two datasets:

- DRIVE [2] retinal images, blood vessel vs. background
- UCI [4] Japanese credit screening dataset

Decision tree as the weak learner:

- package from sklearn
- max depth of 4
- initial sample weight w^1_i = 0.5 / |{j : l_j = l_i}|
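The class-balanced initialization can be sketched in a few lines; the label vector below is a made-up stand-in for the real data (the experiments themselves use sklearn's DecisionTreeClassifier with max_depth=4 as the weak learner):

```python
import numpy as np

# toy label vector standing in for the real dataset's labels
labels = np.array([0, 0, 0, 1, 1])

# class-balanced initial weights: w^1_i = 0.5 / |{j : l_j = l_i}|
counts = np.bincount(labels)
w1 = 0.5 / counts[labels]
```

Each class then carries total weight 0.5 regardless of its size, so the weights sum to 1 and a majority class cannot dominate the first round.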


Retinal blood vessel / background classification

(a) (b)

20 training images, a total of 4,541,006 pixels, of which 569,415 are blood vessel pixels.

Two shape features, energy and symmetry, derived from the daisy graph [3].


Evaluation results on the retina images

Figure 2: ε_t → 0.5, β_t → 1

- There is little update on the sample weights.
- log(1/β_t) → 0, so the corresponding classifiers contribute less.
- 2√(ε_t(1 − ε_t)) → 1, so there is no further reduction of 2^T ∏_{t=1}^T √(ε_t(1 − ε_t)).


Credit Screening

UCI Japanese Credit Screening: http://goo.gl/4gBRXb, 532 samples.

Features used: 2, 3, 8, 11, 14, 15 (six continuous features).

Class label: +(296) / −(357)


Evaluation results on the credit screening dataset

- ε_t of each round is below 0.4.
- ε on the training set converges to 0 after 40 rounds.


Table of Contents

1 AdaBoost Algorithm

2 Upper Bound for Adaboost Algorithm

3 Experiment Evaluation
    Experiment 1
    Experiment 2

4 Generalization Analysis


PAC framework and VC dimension

Based on [5], with probability 1 − δ,

error_true(h) ≤ error_train(h) + √((ln H + ln(1/δ)) / (2m))    (8)

- H is the VC dimension of the hypothesis class
- m is the number of samples
- δ = 0.05 for the later analysis
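The complexity term of (8) is easy to evaluate directly; the values below are illustrative (a hypothetical complexity H = 1000, with m = 532 matching the credit dataset):

```python
import math

def generalization_gap(H, m, delta=0.05):
    """Second term of (8): sqrt((ln H + ln(1/delta)) / (2m))."""
    return math.sqrt((math.log(H) + math.log(1.0 / delta)) / (2.0 * m))

# illustrative: hypothetical complexity H = 1000, m = 532 samples
gap = generalization_gap(1000, 532)
```

The gap shrinks like 1/√m, so quadrupling the sample count halves the bound.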


VC dimension

The VC dimension of a hypothesis class is the largest number of samples such that every assignment of labels to the samples can be separated by some hypothesis in the class.
Example: in one dimension, with the threshold hypothesis class {x > a} (either polarity):

- there exist two samples for which every labeling is separable;
- for any three samples, there exists a label assignment that is not separable.
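The threshold example can be verified by brute force (a small sketch; the helper names are mine): enumerate every labeling a threshold hypothesis can realize and compare against all 2^n labelings.

```python
def threshold_labelings(points):
    """All labelings of `points` achievable by a hypothesis x > a,
    with either polarity."""
    labelings = set()
    # one threshold below all points, plus one at each point,
    # covers every distinct cut position
    cuts = [min(points) - 1] + sorted(points)
    for a in cuts:
        h = tuple(int(x > a) for x in points)
        labelings.add(h)
        labelings.add(tuple(1 - b for b in h))  # flipped polarity
    return labelings

def shattered(points):
    return len(threshold_labelings(points)) == 2 ** len(points)

print(shattered([0.0, 1.0]))        # True: two points are shattered
print(shattered([0.0, 1.0, 2.0]))   # False: e.g. (0, 1, 0) is unreachable
```

So the VC dimension of the threshold class is exactly 2, matching the example above.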


VC dimension of decision tree

The VC dimension of a decision tree of depth k on an n-dimensional space is bounded by:

- Lower bound: 2^{k−1}(n + 1)
- Upper bound [5]: 2(2n)^{2^k − 1}


VC dimension of the Adaboost

Let H be the class of hypotheses produced by WeakLearn, with VC dimension d ≥ 2. Then the VC dimension of the hypothesis class produced by AdaBoost after T rounds is at most

2(d + 1)(T + 1) log2(e(T + 1))    (9)
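Bound (9) is straightforward to evaluate; the d and T values below are illustrative, not taken from the experiments:

```python
import math

def boosted_vc_upper_bound(d, T):
    """Bound (9): 2 (d + 1) (T + 1) log2(e (T + 1))."""
    return 2 * (d + 1) * (T + 1) * math.log2(math.e * (T + 1))

# e.g. a weak-learner class with VC dimension d = 5, boosted for T = 40 rounds
bound = boosted_vc_upper_bound(5, 40)
```

The bound grows roughly linearly in T (up to a log factor), which is why running many rounds eventually loosens the generalization guarantee of (8).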


Mean of Leave-one-out generalization test

Figure 3: Generalization test on Credit Screening

- The optimal iteration number given by the PAC framework is smaller than the number of iterations actually needed.
- This is consistent with the paper's results.


Thanks, Q&A


Reference I

[1] Freund, Yoav, and Robert E. Schapire. "A decision-theoretic generalization of on-line learning and an application to boosting." Journal of Computer and System Sciences 55.1 (1997): 119-139.

[2] J.J. Staal, M.D. Abramoff, M. Niemeijer, M.A. Viergever, B. van Ginneken. "Ridge-based vessel segmentation in color images of the retina." IEEE Transactions on Medical Imaging, 2004, vol. 23, pp. 501-509.

[3] Huajun, Ying, Wang Xing, and Liu Jyh-Charn. "Statistical pattern analysis of blood vessel features on retina images and its application to blood vessel mapping algorithms." EMBC 2014.


Reference II

[4] Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

[5] Luke Zettlemoyer. PAC-learning, VC Dimension. UW, 2012.


Iteration statistic

There are cases where boosting iterates fewer than 40 times; the iteration stops early because the error rate no longer changes.