
Online Learning of Maximum Margin Classifiers with p-Norm and Bias

Kohei HATANO
Kyushu University
(Joint work with K. Ishibashi and M. Takeda)

COLT 2008

Plan of this talk

1. Introduction
2. Preliminaries
   – ROMMA
3. Our results
   – Our new algorithm (PUMMA)
   – Our implicit reduction
4. Experiments

Maximum Margin Classification

[Figure: separating hyperplane with margin]

• SVMs [Boser et al. 92]
  – 2-norm margin
• Boosting [Freund & Schapire 97]
  – ∞-norm margin (approximately)
• Why maximum (or large) margin?
  – Good generalization [Schapire et al. 98][Shawe-Taylor et al. 98]
  – Formulated as convex optimization problems (QP, LP)

Scaling up Max. Margin Classification

1. Decomposition Methods (for SVMs)
   – Break the original QP into smaller QPs
   – SMO [Platt 99], SVMlight [Joachims 99], LIBSVM [Chang & Lin 01]
   – state-of-the-art implementations
2. Online Learning (our approach)

Online Learning

Advantages of online learning:
• Simple & easy to implement
• Uses less memory
• Adapts to changing concepts

Online Learning Algorithm:
For t = 1 to T
1. Receive an instance x_t in R^n
2. Guess a label ŷ_t = sign(w_t · x_t + b_t)
3. Receive the label y_t in {−1, +1}
4. Update (w_{t+1}, b_{t+1}) = UPDATE_RULE(w_t, b_t, x_t, y_t)
end

[Figure: the learner predicts +1 or −1 on x_t and updates (w_t, b_t) to (w_{t+1}, b_{t+1})]
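As an illustration of this protocol, here is a minimal Python sketch of the loop above; the perceptron-style update rule shown in the trailing comment is only an example, not one of the algorithms discussed in this talk.

```python
import numpy as np

def online_learn(stream, update_rule, n):
    """Run the online learning protocol from the slide above.

    stream yields (x_t, y_t) pairs; update_rule maps (w, b, x, y) to (w, b).
    Returns the final hypothesis and the number of prediction mistakes.
    """
    w, b = np.zeros(n), 0.0
    mistakes = 0
    for x, y in stream:
        y_hat = 1 if np.dot(w, x) + b >= 0 else -1   # guess a label
        if y_hat != y:
            mistakes += 1
        w, b = update_rule(w, b, x, y)               # UPDATE_RULE
    return (w, b), mistakes

# Example update rule (classical Perceptron with bias, for illustration only):
# perceptron = lambda w, b, x, y: (w + y * x, b + y) if y * (np.dot(w, x) + b) <= 0 else (w, b)
```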

Online Learning Algorithms for maximum margin classification

• Max Margin Perceptron [Kowalczyk 00]
• ROMMA [Li & Long 02]
• ALMA [Gentile 01]
• LASVM [Bordes et al. 05]
• MICRA [Tsampouka & Shawe-Taylor 07]
• Pegasos [Shalev-Shwartz et al. 07]
• etc.

Most of these online algorithms cannot learn a hyperplane with bias!

[Figure: a hyperplane with bias vs. a hyperplane through the origin (w/o bias)]

Typical Reduction to deal with bias [Cf. Cristianini & Shawe-Taylor 00]

Add an extra dimension corresponding to the bias:

Original space:
• instance: $\mathbf{x}_j \in \mathbb{R}^n$, with $R = \max_j \|\mathbf{x}_j\|$
• hyperplane: $(\mathbf{u}, b)$, with $\|\mathbf{u}\| = 1$
• margin (over normalized instances): $\gamma = \min_j \dfrac{y_j (\mathbf{u} \cdot \mathbf{x}_j + b)}{R}$

Augmented space:
• instance: $\tilde{\mathbf{x}}_j = (\mathbf{x}_j, R) \in \mathbb{R}^{n+1}$, with $\tilde{R} = \max_j \|\tilde{\mathbf{x}}_j\|$
• hyperplane: $\tilde{\mathbf{u}} = (\mathbf{u}, b/R)$
• margin: $\tilde{\gamma} = \min_j \dfrac{y_j (\tilde{\mathbf{u}} \cdot \tilde{\mathbf{x}}_j)}{\|\tilde{\mathbf{u}}\| \tilde{R}}$

NOTE: $\tilde{\mathbf{u}} \cdot \tilde{\mathbf{x}}_j = \mathbf{u} \cdot \mathbf{x}_j + b$, so $\tilde{\mathbf{u}}$ is equivalent to $(\mathbf{u}, b)$.

This reduction weakens the margin guarantee:

$$\frac{\gamma}{2} \le \tilde{\gamma} \le \gamma$$

→ it might cause a significant difference in generalization!
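Concretely, the reduction only changes the data representation, so it can be written in a few lines; this is a minimal sketch, not code from the talk.

```python
import numpy as np

def augment(X):
    """Typical reduction: append the constant coordinate R = max_j ||x_j||
    to every instance, so a hyperplane through the origin can encode a bias."""
    R = np.max(np.linalg.norm(X, axis=1))
    extra = np.full((X.shape[0], 1), R)
    return np.hstack([X, extra]), R

# A hyperplane (u, b) over the original instances corresponds to
# u_tilde = (u, b / R) over the augmented ones, because
# u_tilde . x_tilde_j = u . x_j + b for every j.
```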

Our New Online Learning Algorithm

PUMMA (P-norm Utilizing Maximum Margin Algorithm)

• PUMMA can learn maximum margin classifiers with bias directly (without using the typical reduction!).
• Margin is defined by the p-norm (p ≥ 2).
  – For p = 2, similar to the Perceptron.
  – For p = O(ln n) [Gentile '03], similar to Winnow [Littlestone '89]: fast when the target is sparse.
• Extended to the linearly inseparable case (omitted).
  – Soft margin with 2-norm slack variables.

Problem of finding the p-norm maximum margin hyperplane [Cf. Mangasarian 99]

Given: (linearly separable) S = ((x₁, y₁), …, (x_T, y_T)),

$$(\mathbf{w}^*, b^*) = \arg\min_{\mathbf{w}, b} \frac{1}{2}\|\mathbf{w}\|_q^2 \quad \text{sub. to: } y_j(\mathbf{w} \cdot \mathbf{x}_j + b) \ge 1 \quad (j = 1, \ldots, T),$$

where q is the dual norm of p (1/p + 1/q = 1). E.g., p = 2 gives q = 2; p = ∞ gives q = 1.

Goal: Find an approximate solution of (w*, b*). We want an online algorithm solving the problem with a small number of updates.
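For reference, the offline problem above can be handed to a generic convex solver; below is a minimal sketch using the cvxpy library (not part of the talk), with the dual exponent q computed from p.

```python
import cvxpy as cp
import numpy as np

def pnorm_max_margin(X, y, p=2.0):
    """Offline p-norm maximum margin hyperplane with bias.

    Minimizes (1/2)||w||_q^2, q dual to p, subject to y_j (w . x_j + b) >= 1.
    X: (T, n) instance matrix; y: (T,) labels in {-1, +1}.
    """
    q = p / (p - 1.0)            # dual exponent: 1/p + 1/q = 1 (for p = inf, set q = 1 directly)
    w = cp.Variable(X.shape[1])
    b = cp.Variable()
    objective = cp.Minimize(0.5 * cp.square(cp.norm(w, q)))
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value
```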

ROMMA (Relaxed Online Maximum Margin Algorithm) [Li & Long '02]

Given: S = ((x₁, y₁), …, (x_{t−1}, y_{t−1})), x_t,
1. Predict ŷ_t = sign(w_t · x_t), and receive y_t.
2. If y_t(w_t · x_t) < 1 − δ (margin is "insufficient"), update:

$$\mathbf{w}_{t+1} = \arg\min_{\mathbf{w}} \frac{1}{2}\|\mathbf{w}\|_2^2 \quad \text{sub. to: } y_t(\mathbf{w} \cdot \mathbf{x}_t) \ge 1, \quad \mathbf{w} \cdot \mathbf{w}_t \ge \|\mathbf{w}_t\|_2^2.$$

3. Otherwise, w_{t+1} = w_t.

2 constraints only: one over the last example which causes an update, and one over the last hyperplane.
NOTE: bias is fixed to 0.

ROMMA [Li & Long '02]

[Figure: weight space, showing the constraints of examples 1–4, the iterates w₁, w₂, w₃, the feasible region of SVM, and w_SVM]

SVM (without bias):
$$\min_{\mathbf{w}} \frac{1}{2}\|\mathbf{w}\|_2^2 \quad \text{sub. to: } y_j(\mathbf{w} \cdot \mathbf{x}_j) \ge 1 \quad (j = 1, \ldots, 4).$$

ROMMA:
$$\min_{\mathbf{w}} \frac{1}{2}\|\mathbf{w}\|_2^2 \quad \text{sub. to: } y_t(\mathbf{w} \cdot \mathbf{x}_t) \ge 1, \quad \mathbf{w} \cdot \mathbf{w}_{t-1} \ge \|\mathbf{w}_{t-1}\|_2^2.$$

Solution of ROMMA

Solution of ROMMA is an additive update:

(i) If $y_t(\mathbf{w}_t \cdot \mathbf{x}_t) \ge 0$, then $\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha y_t \mathbf{x}_t$, where $\alpha = \dfrac{1 - y_t(\mathbf{w}_t \cdot \mathbf{x}_t)}{\|\mathbf{x}_t\|_2^2}$.

(ii) Otherwise, $\mathbf{w}_{t+1} = \alpha \mathbf{w}_t + \beta y_t \mathbf{x}_t$, where

$$\alpha = \frac{\|\mathbf{x}_t\|_2^2 \|\mathbf{w}_t\|_2^2 - y_t(\mathbf{w}_t \cdot \mathbf{x}_t)}{\|\mathbf{x}_t\|_2^2 \|\mathbf{w}_t\|_2^2 - (\mathbf{w}_t \cdot \mathbf{x}_t)^2}, \qquad \beta = \frac{\|\mathbf{w}_t\|_2^2 \left(1 - y_t(\mathbf{w}_t \cdot \mathbf{x}_t)\right)}{\|\mathbf{x}_t\|_2^2 \|\mathbf{w}_t\|_2^2 - (\mathbf{w}_t \cdot \mathbf{x}_t)^2}.$$
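The case split above translates directly into code; the following numpy sketch mirrors the reconstruction on this slide (the first-update and degenerate cases are handled only minimally).

```python
import numpy as np

def romma_update(w, x, y, delta=0.0):
    """One ROMMA step: returns the new weight vector (bias fixed to 0).

    w: current weights; x: instance; y: label in {-1, +1};
    delta: update whenever the functional margin is below 1 - delta.
    """
    m = y * np.dot(w, x)                  # functional margin y (w . x)
    if m >= 1.0 - delta:                  # margin sufficient: keep w
        return w
    xx = np.dot(x, x)                     # ||x||_2^2
    ww = np.dot(w, w)                     # ||w||_2^2
    if ww == 0.0:                         # first update: project the origin
        return (y / xx) * x
    if m >= 0.0:                          # case (i): only y (w . x) >= 1 is active
        return w + ((1.0 - m) / xx) * y * x
    denom = xx * ww - np.dot(w, x) ** 2   # case (ii): both constraints active
    alpha = (xx * ww - m) / denom
    beta = ww * (1.0 - m) / denom
    return alpha * w + beta * y * x
```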

PUMMA

Given: S = ((x₁, y₁), …, (x_{t−1}, y_{t−1})), x_t,
1. Predict ŷ_t = sign(w_t · x_t + b_t), and receive y_t.
2. If y_t(w_t · x_t + b_t) < 1 − δ, update:

$$(\mathbf{w}_{t+1}, b_{t+1}) = \arg\min_{\mathbf{w}, b} \frac{1}{2}\|\mathbf{w}\|_q^2 \quad \text{sub. to: } \mathbf{w} \cdot \mathbf{x}_t^{pos} + b \ge 1, \quad -(\mathbf{w} \cdot \mathbf{x}_t^{neg} + b) \ge 1, \quad \mathbf{w} \cdot f(\mathbf{w}_t) \ge \|\mathbf{w}_t\|_q^2.$$

3. Otherwise, w_{t+1} = w_t.

Here the bias is optimized, q is the dual norm (1/p + 1/q = 1), x_t^pos and x_t^neg are the last positive and negative examples which incurred updates, and f is the link function [Grove et al. 97]:

$$f(\mathbf{w})_i = \frac{\mathrm{sign}(w_i)\,|w_i|^{q-1}}{\|\mathbf{w}\|_q^{q-2}}.$$

Constraint over the last hyperplane:
• ROMMA: $\mathbf{w} \cdot \mathbf{w}_t \ge \|\mathbf{w}_t\|_2^2$
• PUMMA: $\mathbf{w} \cdot f(\mathbf{w}_t) \ge \|\mathbf{w}_t\|_q^2$
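The link function is easy to compute coordinate-wise; this is a small numpy sketch, together with a sanity check of the standard fact that the p-norm link inverts the q-norm link when 1/p + 1/q = 1.

```python
import numpy as np

def link(w, q):
    """Link function: f(w)_i = sign(w_i) |w_i|^(q-1) / ||w||_q^(q-2)."""
    norm_q = np.linalg.norm(w, ord=q)
    return np.sign(w) * np.abs(w) ** (q - 1) / norm_q ** (q - 2)

# Sanity check: the dual link inverts the link (1/p + 1/q = 1).
p, q = 3.0, 1.5
w = np.array([0.5, -2.0, 1.0])
assert np.allclose(link(link(w, q), p), w)
```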

Solution of PUMMA

Solution of PUMMA is found numerically. Let $\mathbf{z}_t = \mathbf{x}_t^{pos} - \mathbf{x}_t^{neg}$, where x_t^pos and x_t^neg are the last positive and negative examples which incurred updates.

(i) If the constraint over the last hyperplane is inactive, $\mathbf{w}_{t+1}$ is obtained in closed form from $f(\mathbf{w}_t)$ and $\mathbf{z}_t$.
(ii) Otherwise, $\mathbf{w}_{t+1}$ minimizes $\frac{1}{2}\|\mathbf{w}\|_q^2$ with both constraints active, which is solved by the Newton method.

In either case, the bias follows from the two example constraints:

$$b_{t+1} = -\frac{\mathbf{w}_{t+1} \cdot \mathbf{x}_t^{pos} + \mathbf{w}_{t+1} \cdot \mathbf{x}_t^{neg}}{2}.$$

Observation: For p = 2, the solution is the same as that of ROMMA for $\mathbf{z}_t = \mathbf{x}_t^{pos} - \mathbf{x}_t^{neg}$.
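Using the p = 2 observation, one way to realize the update is to reuse the `romma_update` sketch from earlier on the pair difference; feeding z/2 is an assumption on my part that rescales ROMMA's target margin from 1 to the pairwise margin 2, and the bias recovery uses the formula above.

```python
import numpy as np

def pumma2_update(w, x_pos, x_neg, delta=0.0):
    """PUMMA step for p = 2 via ROMMA on z = x_pos - x_neg (sketch).

    Assumes romma_update from the earlier sketch; z is halved so that
    ROMMA's constraint y (w . x) >= 1 matches w . z >= 2 (an assumption).
    """
    z = x_pos - x_neg
    w_new = romma_update(w, 0.5 * z, +1, delta)
    b_new = -(np.dot(w_new, x_pos) + np.dot(w_new, x_neg)) / 2.0
    return w_new, b_new
```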

Our (implicit) reduction which preserves the margin

Hyperplane with bias:
$$(\mathbf{w}^*, b^*) = \arg\min_{\mathbf{w}, b} \frac{1}{2}\|\mathbf{w}\|_2^2 \quad \text{sub. to: } \mathbf{w} \cdot \mathbf{x}_i^{pos} + b \ge 1 \;\; (i = 1, \ldots, P), \quad -(\mathbf{w} \cdot \mathbf{x}_j^{neg} + b) \ge 1 \;\; (j = 1, \ldots, N).$$

Hyperplane without bias, over pairs of positive and negative instances:
$$\tilde{\mathbf{w}} = \arg\min_{\mathbf{w}} \frac{1}{2}\|\mathbf{w}\|_2^2 \quad \text{sub. to: } \mathbf{w} \cdot (\mathbf{x}_i^{pos} - \mathbf{x}_j^{neg}) \ge 2 \quad (i = 1, \ldots, P,\; j = 1, \ldots, N).$$

Thm. $\tilde{\mathbf{w}} = \mathbf{w}^*$.
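To make the pairing explicit, this small numpy sketch builds the bias-free pairwise instances used in the second problem; the function name is mine, not from the paper.

```python
import numpy as np

def pair_instances(X_pos, X_neg):
    """All pairwise differences x_i^pos - x_j^neg, one row per pair (i, j).

    Solving the bias-free problem 'w . z >= 2' over these rows yields
    w* of the original problem with bias (by the theorem above).
    """
    diffs = X_pos[:, None, :] - X_neg[None, :, :]   # shape (P, N, n)
    return diffs.reshape(-1, X_pos.shape[1])        # shape (P * N, n)
```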

Main Result

Thm. Suppose that given S = ((x₁, y₁), …, (x_T, y_T)), there exists a linear classifier (u, b) s.t. y_t(u · x_t + b) ≥ 1 for t = 1, …, T. Then:

• (# of updates of PUMMA_p(δ)) ≤ $(p-1)\|\mathbf{u}\|_q^2 R^2 / \delta^2$.
• After $(p-1)\|\mathbf{u}\|_q^2 R^2 / \delta^2$ updates, PUMMA_p(δ) outputs a hypothesis with p-norm margin ≥ (1 − δ)γ (γ: margin of (u, b)),

where $R = \max_{t=1,\ldots,T} \|\mathbf{x}_t\|_p$.

These bounds are similar to those of previous algorithms.

Experiment over artificial data

• example (x, y):
  – x: n(=100)-dimensional {−1, +1}-valued vector
  – y = f(x), where $f(\mathbf{x}) = \mathrm{sign}(x_1 + x_2 + \cdots + x_{16} + b)$
• generate 1000 examples randomly
• 3 datasets (b = 1 (small), 9 (medium), 15 (large))
• Compare with ROMMA (p = 2) and ALMA (p = 2 ln n).
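The data generator is fully specified by the bullets above; here is a minimal numpy sketch (function name and seed are mine). With 16 relevant ±1 coordinates and an odd b, the argument of sign is never zero.

```python
import numpy as np

def make_dataset(T=1000, n=100, k=16, b=1, seed=0):
    """x uniform over {-1,+1}^n; y = sign(x_1 + ... + x_k + b)."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1, 1], size=(T, n))
    y = np.sign(X[:, :k].sum(axis=1) + b)   # never 0: even sum + odd b
    return X, y
```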

Results over Artificial Data

[Figure: margin vs. # of updates, two panels. Left (p = 2): PUMMA(15), PUMMA(1), ROMMA(15), ALMA(1); maximum margin level 0.0225. Right (p = 2 ln N): ALMA(15), ALMA(9), ALMA(1), PUMMA(15), PUMMA(9), PUMMA(1); maximum margin level 0.0461. Parenthesized numbers are the bias b of the dataset.]

NOTE 1: margin is defined over the original space (w/o reduction).
NOTE 2: We omit the results for b = 9 for clarity.

Computation Time

For p = 2, PUMMA is faster than ROMMA. For p = 2 ln n, PUMMA is faster than ALMA, even though PUMMA uses the Newton method.

[Figure: computation time (sec.) vs. bias (large → small), two panels: p = 2 (PUMMA vs. ROMMA) and p = 2 ln n (PUMMA vs. ALMA)]

Results over UCI Adult data

• Fix p = 2.
• 2-norm soft margin formulation for linearly inseparable data.
• Run ROMMA and PUMMA until they achieve 99% of the maximum margin.

adult (# of data: 32561)

algorithm     sec.    margin rate (%)
SVMlight      5893    100
ROMMA (99%)   71296   99.03
PUMMA (99%)   44480   99.14

Results over MNIST data

• Fix p = 2.
• Use polynomial kernels.
• 2-norm soft margin formulation for linearly inseparable data.
• Run ROMMA and PUMMA until they achieve 99% of the maximum margin.

MNIST

algorithm     sec.      margin rate (%)
SVMlight      401.36    100
ROMMA (99%)   1715.57   93.5
PUMMA (99%)   1971.30   99.2

Summary

• PUMMA can learn p-norm maximum margin classifiers with bias directly.
  – # of updates is similar to those of previous algorithms.
  – achieves (1 − δ) times the maximum p-norm margin.
• PUMMA outperforms other online algorithms when the underlying hyperplane has a large bias.

Future work

• Maximizing the ∞-norm margin directly.
• Tighter bounds on the # of updates:
  – In our experiments, PUMMA is faster especially when the bias is large (like WINNOW).
  – Our current bound does not reflect this fact.