Jacob Gardner · 2015. 3. 20. · Quan Zhou, Wenlin Chen, Shiji Song, Jacob R. Gardner, Kilian Q....

Post on 14-Aug-2021


Quan Zhou, Wenlin Chen, Shiji Song, Jacob R. Gardner, Kilian Q. Weinberger, Yixin Chen

Support Vector Elastic Network

“Sven the Terrible”

Traditional Computer Science

Data + Program → Computer → Output

Machine Learning

Data + Output → Computer → Program

Support Vector Machines

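$w^\top x$

$$\min_{w} \; \underbrace{\tfrac{1}{2}\|w\|_2^2}_{\text{L2 regularization}} \; + \; C \underbrace{\sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i (w^\top x_i)\bigr)^2}_{\text{squared hinge loss}}$$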

14644 Citations

Published in ML journals

Usable means MATLAB

Fast means parallel

Many GPU Implementations
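To make the SVM objective on this slide concrete, here is a minimal Python sketch that evaluates it for a given weight vector. Python is used rather than the MATLAB that appears later in the talk so that the snippet runs without any toolbox. The function name `svm_objective` and the list-of-lists data layout are assumptions made for illustration and do not come from the talk.

```python
def svm_objective(w, X, y, C):
    """Evaluate 0.5 * ||w||_2^2 + C * sum_i max(0, 1 - y_i * w^T x_i)^2.

    w: list of weights; X: list of samples (each a list of features);
    y: list of labels in {+1, -1}; C: loss trade-off parameter.
    No bias term is used, matching the objective on the slide.
    """
    # L2 regularization term
    reg = 0.5 * sum(wj * wj for wj in w)
    # Squared hinge loss summed over all samples
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        loss += max(0.0, 1.0 - margin) ** 2
    return reg + C * loss


if __name__ == "__main__":
    # Hypothetical toy data: two samples with two features each.
    print(svm_objective([1.0, 0.0], [[2.0, 0.0], [0.5, 0.0]], [1, -1], C=1.0))
```

A sample with margin at least 1 contributes nothing to the loss; only samples that violate the margin are penalized, and the penalty grows quadratically with the violation.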


Elastic Net/Lasso

$$\min_{\beta} \; \|X\beta - y\|_2^2 + \lambda_2 \|\beta\|_2^2 \quad \text{such that } |\beta|_1 \le t$$

13856 Citations

Published in stats journals

Usable means R

Fast means Fortran

Zero GPU Implementations
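The constrained Elastic Net formulation above has two parts: an objective (squared error plus an L2 penalty) and an L1 budget on the coefficients. The following Python sketch evaluates both for a candidate coefficient vector. The names `elastic_net_objective` and `within_l1_budget`, and the small numerical tolerance, are assumptions for illustration only.

```python
def elastic_net_objective(beta, X, y, lambda2):
    """Evaluate ||X beta - y||_2^2 + lambda2 * ||beta||_2^2.

    X: list of n rows, each with p features; y: list of n targets;
    beta: list of p coefficients.
    """
    squared_error = 0.0
    for row, target in zip(X, y):
        prediction = sum(xij * bj for xij, bj in zip(row, beta))
        squared_error += (prediction - target) ** 2
    l2_penalty = lambda2 * sum(bj * bj for bj in beta)
    return squared_error + l2_penalty


def within_l1_budget(beta, t, tol=1e-12):
    """Check the constraint |beta|_1 <= t (tol is an assumed numerical slack)."""
    return sum(abs(bj) for bj in beta) <= t + tol


if __name__ == "__main__":
    # Hypothetical toy problem with n = 2 samples and p = 2 features.
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [1.0, 2.0]
    beta = [1.0, 1.0]
    print(elastic_net_objective(beta, X, y, lambda2=0.5))
    print(within_l1_budget(beta, t=2.0))
```

Shrinking the budget `t` forces more coefficients toward zero, which is what produces the sparse, interpretable solutions the talk attributes to the Elastic Net.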


Elastic Net/Lasso

$$\min_{\beta} \; \|X\beta - y\|_2^2 + \lambda_2 \|\beta\|_2^2 \quad \text{such that } |\beta|_1 \le t$$

Here $t$ is the L1 budget.

[Figure: Equivalence of regularization path. Two panels, Glmnet and SVEN (GPU). x-axis: L1 budget t; y-axis: coefficients β_i.]

Elastic Net/Lasso:
+ interpretable
- slow
- does not scale

SVM:
+ parallel
+ scales to large data
+ multi-platform
- not interpretable

Reductions

Problem A → Problem B → Solution B → Solution A

Elastic Net (input X, Y; output β) → SVM (input Xnew, Ynew; output α)

function beta = SVEN(X,Y,t,lambda)
% Reduce the Elastic Net (X: n-by-p, Y: n-by-1) to an SVM with 2p samples.
[n,p] = size(X);
Xnew = [bsxfun(@minus,X,Y./t) bsxfun(@plus,X,Y./t)]';   % 2p-by-n: each feature column shifted by -Y/t and +Y/t
Ynew = [ones(p,1); -ones(p,1)];                         % first p samples labeled +1, last p labeled -1
C = 1/(2*lambda);
% Train the SVM on the constructed data set.
model = trainsvmGPU(Ynew,sparse(Xnew),['-q -s 1 -c ' num2str(C)]);
% Compute alpha from the trained weights, then map it back to the Elastic Net coefficients.
alpha = C * max(1 - Ynew.*(Xnew*model.w),0);
beta = t*(alpha(1:p) - alpha(p+1:2*p)) / sum(alpha);
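The two data-handling steps of the SVEN function, building the SVM problem and mapping the SVM result back to Elastic Net coefficients, can be sketched in plain Python as follows. This is a sketch under stated assumptions, not the authors' implementation: the SVM training step is left to the caller (any squared-hinge-loss SVM solver that returns a weight vector `w` of length n), the names `sven_build` and `sven_recover` are hypothetical, and the guard against an all-zero `alpha` is an addition that the slide's code does not contain.

```python
def sven_build(X, y, t, lambda2):
    """Construct the SVM problem (X_new, y_new, C) from Elastic Net inputs.

    X: list of n rows with p features each; y: list of n targets;
    t: L1 budget; lambda2: L2 regularization weight.
    """
    n, p = len(X), len(X[0])
    # Each feature column j becomes two SVM samples of dimension n:
    # column_j - y/t with label +1, and column_j + y/t with label -1.
    pos = [[X[i][j] - y[i] / t for i in range(n)] for j in range(p)]
    neg = [[X[i][j] + y[i] / t for i in range(n)] for j in range(p)]
    X_new = pos + neg
    y_new = [1] * p + [-1] * p
    C = 1.0 / (2.0 * lambda2)
    return X_new, y_new, C


def sven_recover(w, X_new, y_new, C, t):
    """Map SVM weights w back to Elastic Net coefficients beta."""
    p = len(X_new) // 2
    # alpha_k = C * max(0, 1 - y_k * w^T x_k), as in the slide's MATLAB code.
    alpha = [
        C * max(0.0, 1.0 - yk * sum(wi * xi for wi, xi in zip(w, xk)))
        for xk, yk in zip(X_new, y_new)
    ]
    total = sum(alpha)
    if total == 0.0:
        # Assumed guard (not in the slide): avoid dividing by zero.
        return [0.0] * p
    return [t * (alpha[j] - alpha[p + j]) / total for j in range(p)]


if __name__ == "__main__":
    # Hypothetical toy inputs; w here is an arbitrary vector, not a trained SVM.
    X_new, y_new, C = sven_build([[1.0, 2.0], [3.0, 4.0]], [1.0, 2.0], t=2.0, lambda2=0.5)
    print(sven_recover([1.0, 0.0], X_new, y_new, C, t=2.0))
```

Because every `alpha` entry is non-negative, each |alpha_j - alpha_{p+j}| is at most alpha_j + alpha_{p+j}, so the recovered coefficients always satisfy the L1 budget |β|_1 ≤ t, whatever weight vector is passed in.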

Results

[Figure: Equivalence of regularization path. Two panels, Glmnet and SVEN (GPU). x-axis: L1 budget t; y-axis: coefficients β_i.]

Results

[Figure: Runtime comparison on four datasets: MITFaces [n=489410, p=361], Yahoo [n=141397, p=519], YMSD [n=463715, p=90], FD [n=400000, p=900]. x-axis: SVEN (GPU) runtime (sec); y-axis: other algorithm runtime (sec); axis ticks in powers of ten. Each panel marks the regions "SVEN (GPU) faster" and "SVEN (GPU) slower". Methods compared: glmnet, SVEN (CPU), Shotgun, L1_Ls.]

n>>d datasets

Running time: $O(d^2)$

Or…

Results

[Figure: Runtime comparison on eight datasets: GLI85 [n=85, p=22283], arcene [n=900, p=10000], SMKCAN187 [n=187, p=19993], GLABRA180 [n=180, p=49151], PEMS [n=440, p=138672], scene15 [n=544, p=71963], dorothea [n=800, p=88119], E2006 [n=3308, p=72812]. x-axis: SVEN (GPU) runtime (sec); y-axis: other algorithm runtime (sec); axis ticks in powers of ten. Each panel marks the regions "SVEN (GPU) faster" and "SVEN (GPU) slower". Methods compared: glmnet, SVEN (CPU), Shotgun, L1_Ls.]

d>>n datasets

Running time: $O(n^2)$

Conclusion

Elastic Net and SVM are equivalent problems.

Many optimizations previously available only for SVMs now apply to the Elastic Net.

This leads to the fastest Elastic Net solver we are aware of.

Questions?

“Sven the Nice?”