Transcript of: RSVM: Reduced Support Vector Machines

Page 1

RSVM: Reduced Support Vector Machines

Y.-J. Lee & O. L. Mangasarian

First SIAM International Conference on Data Mining

Chicago, April 6, 2001

University of Wisconsin-Madison

Page 2

Outline of Talk

What is a support vector machine (SVM) classifier?

The smooth support vector machine (SSVM)

A new SVM solvable without an optimization package

Difficulties with nonlinear SVMs:

Computational: handling the massive $m \times m$ kernel matrix

Storage: separating surface depends on almost the entire dataset

Reduced Support Vector Machines (RSVMs)

Reduced kernel: a much smaller $m \times \bar{m}$ rectangular matrix

$\bar{m}$: 1% to 10% of $m$

Speeds computation & reduces storage

Numerical results

e.g., a 32,562-point dataset classified in 17 minutes, compared to 2.15 hours by a standard algorithm (SMO)

Page 3

What is a Support Vector Machine?

An optimally defined surface

Typically nonlinear in the input space

Linear in a higher dimensional space

Implicitly defined by a kernel function

Page 4

What are Support Vector Machines Used For?

Classification

Regression & data fitting

Supervised & unsupervised learning

(Will concentrate on classification)

Page 5

Geometry of the Classification Problem
2-Category Linearly Separable Case

[Figure: the point sets $A+$ and $A-$ separated by the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with normal vector $w$.]

Page 6

Support Vector Machines
Maximizing the Margin between Bounding Planes

[Figure: the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$ around $A+$ and $A-$; their separation $\frac{2}{\|w\|_2}$ along the normal $w$ is the margin.]

Page 7

Support Vector Machine Formulation

The margin is maximized by minimizing $\frac{1}{2}\|(w,\gamma)\|_2^2$.

Solve the quadratic program for some $\nu > 0$:

$$\min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\ \ y \ge 0 \qquad \text{(QP)}$$

where the diagonal matrix $D$, with $D_{ii} = \pm 1$, denotes $A+$ or $A-$ membership.

Page 8

SVM as an Unconstrained Minimization Problem

At the solution of (QP): $y = (e - D(Aw - e\gamma))_+$, where $(\cdot)_+ = \max\{\cdot\,,0\}$.

Hence (QP) is equivalent to the nonsmooth SVM:

$$\min_{w,\gamma}\ \frac{\nu}{2}\|(e - D(Aw - e\gamma))_+\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$
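A toy numeric check of this unconstrained form, with all data and parameter values illustrative (not from the talk):

```python
import numpy as np

# Toy 2-D dataset: first three rows belong to A+, last two to A-
A = np.array([[2.0, 2.0], [1.5, 2.5], [2.5, 1.0],
              [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1.0, 1.0, 1.0, -1.0, -1.0])    # the +/-1 diagonal of D
e = np.ones(len(d))                          # vector of ones

w, gamma, nu = np.array([0.5, 0.5]), 0.0, 1.0     # a candidate plane x'w = gamma
y = np.maximum(e - d * (A @ w - gamma * e), 0.0)  # y = (e - D(Aw - e*gamma))_+
obj = 0.5 * nu * (y @ y) + 0.5 * (w @ w + gamma**2)
print(y, obj)   # zero slacks here: this toy data satisfies d*(Aw - gamma) >= 1
```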

Page 9

SSVM: The Smooth Support Vector Machine

Replacing the plus function $(\cdot)_+$ in the nonsmooth SVM by the smooth $p(\cdot\,,\alpha)$ gives our SSVM:

$$\min_{(w,\gamma)\in R^{n+1}}\ \frac{\nu}{2}\|p(e - D(Aw - e\gamma),\alpha)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$

Here $p(\cdot\,,\alpha)$ is an accurate smooth approximation of $(\cdot)_+$, obtained by integrating the sigmoid function of neural networks. (sigmoid = smoothed step)

The solution of SSVM converges to the solution of the nonsmooth SVM as $\alpha$ goes to infinity. (Typically, $\alpha = 5$.)
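The slides leave $p$ implicit; in the authors' SSVM paper it is $p(x,\alpha) = x + \frac{1}{\alpha}\log(1 + e^{-\alpha x})$, the integral of the sigmoid $1/(1 + e^{-\alpha x})$. A minimal NumPy sketch (function names mine) showing the approximation tighten as $\alpha$ grows:

```python
import numpy as np

def plus(x):
    """The plus function (x)_+ = max(x, 0), applied componentwise."""
    return np.maximum(x, 0.0)

def p(x, alpha=5.0):
    """SSVM smoothing p(x, alpha) = x + log(1 + exp(-alpha*x)) / alpha,
    rewritten in an overflow-safe form that is algebraically identical."""
    return np.maximum(x, 0.0) + np.log1p(np.exp(-alpha * np.abs(x))) / alpha

x = np.linspace(-3.0, 3.0, 121)
for alpha in (1.0, 5.0, 25.0):
    # The worst-case gap is log(2)/alpha, attained at x = 0
    print(alpha, np.max(np.abs(p(x, alpha) - plus(x))))
```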

Page 10

Nonlinear Smooth Support Vector Machine

Nonlinear separating surface: $K(x',A')Du = \gamma$

Use a nonlinear kernel $K(A,A')$ in SSVM:

$$\min_{u,\gamma}\ \frac{\nu}{2}\|p(e - D(K(A,A')Du - e\gamma),\alpha)\|_2^2 + \frac{1}{2}\|(u,\gamma)\|_2^2$$

The kernel matrix $K(A,A') \in R^{m\times m}$ is fully dense

Use the Newton algorithm to solve the problem; each iteration solves $m+1$ linear equations in $m+1$ variables

The nonlinear separating surface depends on the entire dataset: $K(x',A')Du = \gamma$
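A new point $x$ is then labeled by the side of the surface it falls on. A minimal sketch of that decision rule (names are mine; `kernel` is any function in the $K(A,B)$ convention of the next slide):

```python
import numpy as np

def classify(x, A, d, u, gamma, kernel):
    """Label x by the side of the surface K(x', A') D u = gamma;
    d holds the +/-1 diagonal of D."""
    row = kernel(x[None, :], A.T)           # the 1 x m row K(x', A')
    return np.sign(row @ (d * u) - gamma)   # +1 or -1
```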

Page 11

Examples of Kernels

$K(A,B): R^{m\times n} \times R^{n\times l} \longmapsto R^{m\times l}$

$A \in R^{m\times n}$, $a \in R^m$, $\mu \in R$, $d$ is an integer

Polynomial kernel: $(AA' + \mu a a')^d_{\bullet}$ (componentwise $d$-th power)

(Linear kernel $AA'$: $\mu = 0$, $d = 1$)

Gaussian (radial basis) kernel: $K(A,A')_{ij} = \varepsilon^{-\mu\|A_i - A_j\|_2^2}$, $i,j = 1,\ldots,m$, where $\varepsilon$ is the base of natural logarithms

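A sketch of the Gaussian kernel in this $K(A,B)$ convention, with $B$ passed as an $n \times l$ array (function name and the example value of $\mu$ are mine):

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """Gaussian kernel: A is m x n, B is n x l, and the result is the
    m x l matrix with entries exp(-mu * ||A_i - B_.j||^2)."""
    sq = (np.sum(A * A, axis=1)[:, None]     # ||A_i||^2 down a column
          + np.sum(B * B, axis=0)[None, :]   # ||B_.j||^2 along a row
          - 2.0 * (A @ B))                   # cross terms A_i . B_.j
    return np.exp(-mu * np.maximum(sq, 0.0)) # clip roundoff negatives
```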
Page 12

Difficulties with Nonlinear SVM for Large Problems

The nonlinear kernel $K(A,A') \in R^{m\times m}$ is fully dense

Long CPU time to compute the $m^2$ numbers

Runs out of memory while storing the $m\times m$ kernel matrix

Computational complexity depends on $m$: complexity of nonlinear SSVM is $\approx O((m+1)^3)$

Separating surface depends on almost the entire dataset

Need to store the entire dataset after solving the problem

Page 13

Overcoming Computational & Storage Difficulties
Use a Rectangular Kernel

Choose a small random sample $\bar{A} \in R^{\bar{m}\times n}$ of $A$

The small random sample $\bar{A}$ is a representative sample of the entire dataset

Typically $\bar{A}$ is 1% to 10% of the rows of $A$

Replace $K(A,A') \in R^{m\times m}$ by the rectangular kernel $K(A,\bar{A}') \in R^{m\times\bar{m}}$, with corresponding $\bar{D} \subset D$, in the nonlinear SSVM

Only need to compute and store $m\times\bar{m}$ numbers for the rectangular kernel

Computational complexity reduces to $O((\bar{m}+1)^3)$

The nonlinear separator depends only on $\bar{A}$

Using $K(\bar{A},\bar{A}')$ gives lousy results!

Page 14

Reduced Support Vector Machine Algorithm

Nonlinear separating surface: $K(x',\bar{A}')\bar{D}\bar{u} = \gamma$

(i) Choose a random subset matrix $\bar{A} \in R^{\bar{m}\times n}$ of the entire data matrix $A \in R^{m\times n}$

(ii) Solve the following problem by the Newton method, with corresponding $\bar{D} \subset D$:

$$\min_{(\bar{u},\gamma)\in R^{\bar{m}+1}}\ \frac{\nu}{2}\|p(e - D(K(A,\bar{A}')\bar{D}\bar{u} - e\gamma),\alpha)\|_2^2 + \frac{1}{2}\|(\bar{u},\gamma)\|_2^2$$

(iii) The separating surface is defined by the optimal solution $(\bar{u},\gamma)$ of step (ii): $K(x',\bar{A}')\bar{D}\bar{u} = \gamma$
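An end-to-end sketch of steps (i)-(iii), reusing the `gaussian_kernel` and `p` sketches above; it substitutes SciPy's general-purpose L-BFGS solver for the authors' Newton method, and all function names and default parameter values are mine:

```python
import numpy as np
from scipy.optimize import minimize

def rsvm_train(A, d, frac=0.1, nu=1.0, alpha=5.0, mu=0.1, seed=0):
    """RSVM sketch. d holds the +/-1 labels (the diagonal of D).
    Returns (Abar, dbar, ubar, gamma) defining K(x', Abar') Dbar ubar = gamma."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    idx = rng.choice(m, size=max(1, int(frac * m)), replace=False)
    Abar, dbar = A[idx], d[idx]               # step (i): random subset; Dbar from D
    K = gaussian_kernel(A, Abar.T, mu)        # rectangular kernel K(A, Abar'), m x mbar

    def objective(z):                         # the smooth RSVM objective
        ubar, gamma = z[:-1], z[-1]
        r = p(1.0 - d * (K @ (dbar * ubar) - gamma), alpha)
        return 0.5 * nu * (r @ r) + 0.5 * (ubar @ ubar + gamma * gamma)

    z = minimize(objective, np.zeros(len(idx) + 1), method="L-BFGS-B").x  # step (ii)
    return Abar, dbar, z[:-1], z[-1]          # step (iii)
```

The trained surface is then evaluated exactly as in the `classify` sketch, with `Abar`, `dbar`, `ubar` in place of `A`, `d`, `u`.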

Page 15

How to Choose $\bar{A}$ in RSVM?

$\bar{A}$ is a representative sample of the entire dataset; it need not be a subset of $A$

A good selection of $\bar{A}$ may generate a classifier using a very small $\bar{m}$

Possible ways to choose $\bar{A}$ (a sketch of the last follows):

Choose $\bar{m}$ random rows from the entire dataset $A$

Choose $\bar{A}$ such that the distance between its rows exceeds a certain tolerance

Use $k$ cluster centers of $A+$ and of $A-$ as $\bar{A}$
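A sketch of the cluster-center option using scikit-learn's KMeans (not the authors' code); since the centers are generally not rows of $A$, each center simply inherits the label of its class:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_center_abar(A, d, k=25, seed=0):
    """Build Abar from k cluster centers of A+ and k of A-,
    returning (Abar, dbar) in the format the rsvm_train sketch expects."""
    centers, labels = [], []
    for sign in (1.0, -1.0):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        centers.append(km.fit(A[d == sign]).cluster_centers_)
        labels.append(np.full(k, sign))
    return np.vstack(centers), np.concatenate(labels)
```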

Page 16

A Nonlinear Kernel Application
Checkerboard Training Set: 1000 Points in $R^2$

Separate 486 Asterisks from 514 Dots

Page 17

Conventional SVM Result on Checkerboard
Using 50 Randomly Selected Points Out of 1000

$K(\bar{A},\bar{A}') \in R^{50\times 50}$

Page 18

RSVM Result on Checkerboard
Using SAME 50 Random Points Out of 1000

$K(A,\bar{A}') \in R^{1000\times 50}$

Page 19

RSVM on Moderate Sized Problems
(Best Test Set Correctness %, CPU seconds)

Dataset (size m x n, m̄)          K(A,Ā') (m x m̄)   K(A,A') (m x m)   K(Ā,Ā') (m̄ x m̄)
Cleveland Heart (297 x 13, 30)    86.47, 3.04       85.92, 32.42      76.88, 1.58
BUPA Liver (345 x 6, 35)          74.86, 2.68       73.62, 32.61      68.95, 2.04
Ionosphere (351 x 34, 35)         95.19, 5.02       94.35, 59.88      88.70, 2.13
Pima Indians (768 x 8, 50)        78.64, 5.72       76.59, 328.3      57.32, 4.64
Tic-Tac-Toe (958 x 9, 96)         98.75, 14.56      98.43, 1033.5     88.24, 8.87
Mushroom (8124 x 22, 215)         89.04, 466.20     N/A               83.90, 221.50

Page 20

RSVM on Large UCI Adult Dataset ($A \in R^{m\times 123}$)
Average Test Correctness % & Standard Deviation over 50 Runs
(RSVM standard deviation over 50 runs: 0.001)

Dataset Size       K(A,Ā') (m x m̄)      K(Ā,Ā') (m̄ x m̄)      m̄      m̄/m
(Train, Test)      Test %   Std.Dev.    Test %   Std.Dev.
(6414, 26148)      84.47    0.001       77.03    0.014         210    3.2%
(11221, 21341)     84.71    0.001       75.96    0.016         225    2.0%
(16101, 16461)     84.90    0.001       75.45    0.017         242    1.5%
(22697, 9865)      85.31    0.001       76.73    0.018         284    1.2%
(32562, 16282)     85.07    0.001       76.95    0.013         326    1.0%

Page 21

CPU Times on UCI Adult Dataset
RSVM, SMO and PCGC with a Gaussian Kernel

Adult Dataset: Training Set Size vs. CPU Time in Seconds

Size    3185    4781     6414     11221     16101     22697     32562
RSVM    44.2    83.6     123.4    227.8     342.5     587.4     980.2
SMO     66.2    146.6    258.8    781.4     1784.4    4126.4    7749.6
PCGC    380.5   1137.2   2530.6   11910.6   (ran out of memory)

Page 22

CPU Time Comparison on UCI Dataset
RSVM, SMO and PCGC with a Gaussian Kernel

[Figure: CPU time in seconds vs. training set size, plotting the table above.]

Page 23

Conclusion

RSVM: an effective classifier for large datasets

Classifier uses 10% or less of the dataset

Can handle massive datasets

Much faster than other algorithms

Test set correctness: same or better than the full dataset kernel $K(A,A')$, and much better than a randomly chosen subset kernel $K(\bar{A},\bar{A}')$

Applicable to all nonlinear kernel problems

Rectangular kernel $K(A,\bar{A}')$: a novel practical idea