Efficient Model Selection for Support Vector Machines Shibdas Bandyopadhyay.

Outline

• Brief Introduction to SVM
• Cross-Validation
• Methods for Parameter Tuning
• Grid Search
• Genetic Algorithm
• Auto-tuning for Classification
• Results
• Conclusion
• Pattern Search for Regression


Support Vector Machines

• Classification
- Given a set (x_1, y_1), (x_2, y_2), …, (x_m, y_m) ⊂ X × Y, where X = set of input vectors and Y = set of classes, we are to predict the class y for an unseen x ∈ X

• Regression
- Given a set (x_1, y_1), (x_2, y_2), …, (x_m, y_m) ⊂ X × Y, where X = set of input vectors and Y = set of values, we are to predict the value y for an unseen x ∈ X

[Figure: separating hyperplane with maximal margin between classes A+ and A−; ε-tube around the regression estimate]


Support Vector Machines

• Kernels

- A kernel maps the linearly non-separable data into a higher-dimensional feature space, where it may become linearly separable


Support Vector Classification

• Soft Margin Classifier

Optimization Problem

minimize    $\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i$

subject to  $y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i,\quad \xi_i \ge 0$

where C (> 0) is the trade-off between margin maximization and training error minimization


10-fold Cross-Validation

[Figure: training set split into 10 folds; in each run, nine folds are used for training and the remaining fold for testing]


Cross-Validation

• Widely regarded as the best method to measure the generalization (test) error

• Training set is divided into p folds

• Training runs are done using all possible combinations of (p – 1) training folds

• Testing is done on the remaining fold for each run

• We are to find the parameter values for which average cross-validation error is minimum
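The procedure above can be sketched in pure Python (no SVM library assumed; `train` and `error` are hypothetical stand-ins for fitting a classifier at fixed parameters and measuring its misclassification rate):

```python
# Sketch of p-fold cross-validation as the model-selection criterion.
# `train` and `error` are stand-ins for an SVM library call.
from statistics import mean

def k_fold_splits(n, p):
    """Yield (train_indices, test_indices) pairs for p folds over n samples."""
    folds = [list(range(i, n, p)) for i in range(p)]  # simple round-robin folds
    for i in range(p):
        test = folds[i]
        train = [j for k in range(p) if k != i for j in folds[k]]
        yield train, test

def cv_error(data, labels, p, train, error):
    """Average test error over the p runs: the quantity to minimize over parameters."""
    errs = []
    for tr, te in k_fold_splits(len(data), p):
        model = train([data[i] for i in tr], [labels[i] for i in tr])
        errs.append(error(model, [data[i] for i in te], [labels[i] for i in te]))
    return mean(errs)
```

Each candidate parameter setting gets its own `cv_error` value, and the setting with the minimum average error is selected.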


Model Parameter Selection

• Consider RBF kernel and SVM classification (soft-margin case)

• RBF kernel is given by $K(x_1, x_2) = e^{-\gamma\|x_1 - x_2\|^2},\ \gamma > 0$

• Two parameters: C (trade-off of the soft-margin case) and γ of the kernel

• Benchmark dataset – Breast Cancer (100 realizations)

• Changing the parameters changes the test error:
  for C = 10, γ = 1: mean error 33.6364, std 4.9616
  for C = 1, γ = 10: mean error 28.53, std 4.3150

• Parameters should be chosen such that the test error is minimal
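The RBF kernel above translates directly into code; a minimal sketch in plain Python (`gamma` is the γ parameter):

```python
import math

def rbf_kernel(x1, x2, gamma):
    """K(x1, x2) = exp(-gamma * ||x1 - x2||^2), gamma > 0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)
```

The kernel value is 1 when the points coincide and decays toward 0 as they move apart, with γ controlling how fast.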


Approach

[Figure: raw data and a parameter range feed the parameter-selection loop; each candidate (C, gamma) is scored by the SVM classifier's misclassification error; the optimal C and gamma are then used by the SVM classifier to produce the final results]


Methods for parameter tuning

• Grid Search

• Genetic Algorithm

• Auto-tuning for Classification


Grid Search

Two-Dimensional Parameter Space

[Figure: coarse grid over 1 ≤ C ≤ 5000 and 1 ≤ γ ≤ 1000; the best point (C, γ) = (2857, 571.5) defines the refined search region 2142.8 ≤ C ≤ 3571.4, 428.5 ≤ γ ≤ 714.3]


Grid Search

• Simple technique resembling exhaustive search

• Take exponentially increasing values in a particular range

• Find the set with minimum Cross-validation error

• Adjust the new range in the neighborhood of that chosen set

• Repeat the process until a satisfactory value for cross-validation error is obtained
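The steps above can be sketched as a coarse-to-fine loop on a toy error function (linear spacing for brevity, whereas the slide suggests exponentially increasing values; `err` is a stand-in for the cross-validation error at a given (C, γ)):

```python
def grid_search(err, c_range, g_range, steps=5, rounds=3):
    """Coarse-to-fine grid search: evaluate a steps x steps grid, then
    shrink the range around the best (C, gamma) found and repeat."""
    (c_lo, c_hi), (g_lo, g_hi) = c_range, g_range
    best = None
    for _ in range(rounds):
        cs = [c_lo + i * (c_hi - c_lo) / (steps - 1) for i in range(steps)]
        gs = [g_lo + i * (g_hi - g_lo) / (steps - 1) for i in range(steps)]
        # Best (error, C, gamma) triple on the current grid
        best = min((err(c, g), c, g) for c in cs for g in gs)
        _, c_best, g_best = best
        c_half, g_half = (c_hi - c_lo) / steps, (g_hi - g_lo) / steps
        c_lo, c_hi = c_best - c_half, c_best + c_half  # refine around the best point
        g_lo, g_hi = g_best - g_half, g_best + g_half
    return best
```

Each round evaluates steps² parameter pairs, so the total cost is rounds × steps² training runs of the cross-validation procedure.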


Genetic Algorithm

• Genetic Algorithm is a subclass of “Evolutionary Computing”

• It is based on Darwin’s theory of evolution

• Widely accepted for parameter search and optimization

• Has a high probability of finding the global optimum


Genetic Algorithm - Steps

• Selection – “Survival of the fittest”: choose the set of parameter values for which the objective function is optimal

• Cross-Over – Combine the chosen values

• Mutation – Modify the combined values to produce the next generation
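The three steps can be sketched as a toy real-valued GA (tournament selection, blend crossover, and gaussian mutation are illustrative choices here, not the operators used later in the talk):

```python
import random

def genetic_search(fitness, bounds, pop_size=20, gens=30, seed=0):
    """Toy real-valued GA minimizing `fitness` over the box `bounds`."""
    rng = random.Random(seed)
    lo, hi = zip(*bounds)
    pop = [[rng.uniform(l, h) for l, h in bounds] for _ in range(pop_size)]
    for _ in range(gens):
        def pick():
            # Selection: keep the fitter of two random individuals
            a, b = rng.sample(pop, 2)
            return a if fitness(a) < fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            w = rng.random()  # Cross-over: blend the two chosen parents
            child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            # Mutation: small random perturbation, clipped to the bounds
            child = [min(h, max(l, x + rng.gauss(0, 0.02 * (h - l))))
                     for x, l, h in zip(child, lo, hi)]
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)
```

For SVM model selection, `fitness` would be the cross-validation error as a function of (C, γ), and `bounds` the parameter ranges.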


Genetic Algorithm – Selection

• Set a criterion for choosing the parents which will cross over

• For example, two individuals represented as binary strings are selected, with strings containing more 1s preferred over those with more 0s

[Figure: example population of binary strings; the strings with the most 1s are selected as parents]


Genetic Algorithm – Cross-Over

• Combine the chosen parents to produce the offspring

• For example, two parents represented as binary strings performing cross-over

[Figure: two parent bit strings exchange segments at a crossover point to produce two offspring]


Genetic Algorithm – Mutation

• The structure of the produced offspring is changed

• Prevents the algorithm from being trapped in a local minimum

• For example, the produced offspring is mutated (one bit position is flipped)

[Figure: offspring 01100 mutated to 00100 by flipping one bit]


Genetic Algorithm - Coding

• Parameters are to be coded into strings before applying GA

• Real-coded GA operates directly on real numbers

• It simulates cross-over and mutation through various operators

• Simulated Binary Cross-over and polynomial mutation operators are used


Auto-tuning

• Consider a bound for the expected generalization error

• Try to minimize it by varying the parameters

• Apply well known minimization procedures to make this “automatic”


Generalization Error Estimates

• Validation Error
- Keep a part of the training data for validation
- Find the error while performing tests on the validation set
- Try to minimize the error on that set

• Leave-One-Out Error
- Keep one element of the training data set for testing
- Do training on the remaining elements
- Test the element which was previously removed
- Do this for all training data elements
- Provides an unbiased estimate of the expected generalization error
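The leave-one-out procedure can be sketched in pure Python (`train` and `predict` are hypothetical stand-ins for the learning machine being evaluated):

```python
def loo_error(data, labels, train, predict):
    """Leave-one-out: train on all points but one, test on the held-out
    point; the average over all points estimates the generalization error."""
    mistakes = 0
    for p in range(len(data)):
        tr_x = data[:p] + data[p + 1:]   # remove the p-th element
        tr_y = labels[:p] + labels[p + 1:]
        model = train(tr_x, tr_y)
        mistakes += predict(model, data[p]) != labels[p]
    return mistakes / len(data)
```

Note the cost: one full training run per training example, which is why the bounds on the following slides are attractive substitutes.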


Leave-One-Out Bounds

• Span Bound

$T \le \frac{1}{l}\sum_{p=1}^{l}\theta\!\left(\alpha_p^0 S_p^2 - 1\right)$

where $S_p$ is the distance between the point $x_p$ and the set $\Lambda_p$, where

$\Lambda_p = \Big\{ \textstyle\sum_{i \ne p,\ \alpha_i^0 > 0} \lambda_i x_i \ :\ \sum_{i \ne p} \lambda_i = 1 \Big\}$

• Radius-margin Bound

$T \le \frac{1}{l}\,\frac{R^2}{M^2}$

where R is the radius of the smallest sphere enclosing all data points and M is the margin obtained from the SVM optimization solution


Why Radius-margin Bound?

• It can be thought of as an upper bound on the span-bound, which is an accurate estimate of the test error

• Minimization of the span-bound is more difficult to implement and to control (more local minima)

• Margin can be obtained from the solution of SVM optimization problem

• Radius can be calculated by solving a Quadratic optimization problem

• Soft-margin SVM can be easily incorporated by modifying the kernel of the hard margin version so that C will be considered just as another parameter of the kernel function


Auto-tuning - Steps

• M = 1 / ||w||, where ||w|| can be obtained by solving the problem:

maximize    $W(\alpha) = \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{m}\alpha_i\alpha_j\,y_i y_j\,k(x_i, x_j)$

subject to  $\sum_{i=1}^{m}\alpha_i y_i = 0,\quad \alpha_i \ge 0$

• R is obtained by solving the quadratic optimization problem

maximize    $R^2 = \sum_{i=1}^{l}\beta_i K(x_i, x_i) - \sum_{i,j=1}^{l}\beta_i\beta_j K(x_i, x_j)$

subject to  $\sum_{i=1}^{l}\beta_i = 1,\quad \beta_i \ge 0$


Auto-tuning – Steps

Let θ = set of parameters. The steps are as follows:

• Initialize θ to some value

• Using SVM, find the maximum of W: $\alpha^0(\theta) = \arg\max_{\alpha} W(\alpha, \theta)$

• Update θ by a minimization method such that T is minimized

• Go to step 2, or stop when the minimum of T is achieved
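The update loop above can be sketched with a simple gradient-free rule (a stand-in for whatever minimization method is used on θ; `T` is the bound being minimized, here a toy function rather than the actual $R^2/M^2$ computation):

```python
def auto_tune(T, theta, step=0.5, tol=1e-3, max_iter=200):
    """Gradient-free minimization of a generalization-error bound T(theta):
    probe each parameter up and down, keep any improving move, and shrink
    the step size when no probe helps."""
    best = T(theta)
    for _ in range(max_iter):
        improved = False
        for i in range(len(theta)):
            for delta in (step, -step):
                cand = list(theta)
                cand[i] += delta
                val = cand_val = T(cand)
                if cand_val < best:
                    theta, best, improved = cand, val, True
        if not improved:
            step /= 2  # no parameter move helped: refine the step
            if step < tol:
                break
    return theta, best
```

In the actual method, evaluating `T` at a candidate θ means re-solving the SVM problem for $\alpha^0(\theta)$ and recomputing the bound, so each probe costs one training run.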


Results

• Methods are tested on five benchmark datasets

• Mean error, minimum error among 100 realizations, maximum error among 100 realizations, and standard deviation are reported

• Breast-Cancer Dataset
• Thyroid Dataset
• Titanic Dataset
• Heart Dataset
• Diabetes Dataset


Classification Results – Breast Cancer Dataset

• Number of train patterns : 200

• Number of test patterns : 77

• Input dimension : 9

• Output dimension : 1

Methods             Mean Error   Min. Error   Max. Error   Std. Deviation
Benchmark           26.04        –            –            4.74
Grid Search         27.22        14.58        36.36        4.75
Auto-tuning         27.47        16.88        36.36        3.97
Genetic Algorithm   25.40        15.58        33.77        4.39


Classification Results – Thyroid Dataset

• Number of train patterns : 140

• Number of test patterns : 75

• Input dimension : 5

• Output dimension : 1

Methods             Mean Error   Min. Error   Max. Error   Std. Deviation
Benchmark           4.80         –            –            2.19
Grid Search         4.32         0            8.00         1.74
Auto-tuning         4.56         0            9.333        2.02
Genetic Algorithm   4.44         0            10.667       2.43


Classification Results – Titanic Dataset

• Number of train patterns : 150

• Number of test patterns : 2051

• Input dimension : 3

• Output dimension : 1

Methods             Mean Error   Min. Error   Max. Error   Std. Deviation
Benchmark           22.42        –            –            1.02
Grid Search         23.08        21.55        33.21        1.18
Auto-tuning         23.01        20.87        33.21        1.33
Genetic Algorithm   22.66        21.69        33.21        1.11


Classification Results – Heart Dataset

• Number of train patterns : 170

• Number of test patterns : 100

• Input dimension : 13

• Output dimension : 1

Methods             Mean Error   Min. Error   Max. Error   Std. Deviation
Benchmark           15.95        –            –            3.26
Grid Search         15.49        8.00         23.00        3.29
Auto-tuning         15.65        8.00         23.00        3.21
Genetic Algorithm   15.87        10.00        25.00        3.27


Classification Results – Diabetes Dataset

• Number of train patterns : 468

• Number of test patterns : 300

• Input dimension : 8

• Output dimension : 1

Methods             Mean Error   Min. Error   Max. Error   Std. Deviation
Benchmark           23.53        –            –            1.73
Grid Search         23.14        19.33        26.67        1.17
Auto-tuning         23.68        19.33        27.33        1.68
Genetic Algorithm   23.69        19.00        28.33        1.71


Conclusion

• Grid Search is the best technique when the number of parameters is low, as it does an exhaustive search of the parameter space

• Auto-tuning performs far fewer training runs in all cases

• Genetic Algorithm is quite steady and gives near-optimal solutions

• Future work would be to test these techniques for regression

• Analysis of pattern search method for regression


Support Vector Regression

• Regression Estimate

$f(x) = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)\,K(x_i, x) + b$

• Optimization Problem

maximize

$W(\alpha, \alpha^*) = -\varepsilon\sum_{i=1}^{m}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{m}(\alpha_i - \alpha_i^*)\,y_i - \frac{1}{2}\sum_{i,j=1}^{m}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,k(x_i, x_j)$

subject to

$0 \le \alpha_i, \alpha_i^* \le C,\quad \sum_{i=1}^{m}(\alpha_i - \alpha_i^*) = 0$


Pattern Search

• Simple and efficient optimization technique

• No derivatives, only direct function evaluations are needed

• It rarely gets trapped in a bad local minimum

• Converges rapidly to an optimum



Pattern Search

• Patterns determine which points of the parameter space are searched

• The pattern is usually specified by a matrix. We have considered the matrix whose columns generate trial points $x_k^n = x_k + c_k d_k$ (step size $c_k$ along pattern direction $d_k$), corresponding to the compass pattern

[Figure: compass pattern around a center (x, y) with step size d: probe points (x+d, y), (x−d, y), (x, y+d), (x, y−d)]


Pattern Search - Algorithm

Cross-validation error is the function to be minimized

1. Fix a pattern matrix $P_k$; set k = 0

2. Given a tolerance τ and a step size $\Delta_k$, randomly pick an initial center of the pattern $q_k$

3. Compute the function value $f(q_k)$ and set min ← $f(q_k)$

4. If $\Delta_k < \tau$ then stop.
   For i = 1 … (p − 1), where p is the number of columns in $P_k$:
   compute $q_k^i = q_k + \Delta_k s^i$ (with $s^i$ the i-th column of $P_k$);
   if $f(q_k^i) <$ min, then set $q_{k+1} = q_k^i$, min $= f(q_k^i)$, $k = k + 1$;
   otherwise set $\Delta_{k+1} = \frac{1}{2}\Delta_k$, $k = k + 1$

5. Go to step 2
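The algorithm can be sketched for the two-parameter compass pattern (pure Python; `f` stands in for the cross-validation error as a function of the two parameters):

```python
def pattern_search(f, q, d=1.0, tau=1e-4):
    """Compass pattern search: probe the four points around the current
    center, move to an improving probe, otherwise halve the step size d;
    stop when d falls below the tolerance tau."""
    x, y = q
    fmin = f(x, y)
    while d >= tau:
        moved = False
        for nx, ny in ((x + d, y), (x - d, y), (x, y + d), (x, y - d)):
            fv = f(nx, ny)
            if fv < fmin:
                x, y, fmin, moved = nx, ny, fv, True
                break  # recenter the pattern on the improving point
        if not moved:
            d /= 2  # no probe improved: contract the pattern
    return (x, y), fmin
```

Only direct function evaluations are needed, matching the derivative-free property claimed above; for SVM regression the two coordinates would be the model parameters being tuned.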


Thank You

Mail your Questions/ Suggestions at: [email protected]


Genetic Algorithm - Implementation

• Simulated Binary Cross-over

- $u_i$ is chosen randomly between 0 and 1

- $\beta_i$ follows the distribution

$p(\beta_i) = \begin{cases} 0.5\,(\eta_c + 1)\,\beta_i^{\eta_c} & \text{if } \beta_i \le 1 \\ 0.5\,(\eta_c + 1)\,/\,\beta_i^{\eta_c + 2} & \text{otherwise} \end{cases}$

- find $\beta_{qi}$ such that the cumulative probability density up to $\beta_{qi}$ is $u_i$:

$u_i = \int_0^{\beta_{qi}} p(\beta)\,d\beta$


Genetic Algorithm – Implementation (Cont…)

- Generate the offspring $x_i^{(1,t+1)}$ and $x_i^{(2,t+1)}$ from the parents $x_i^{(1,t)}$ and $x_i^{(2,t)}$:

$x_i^{(1,t+1)} = 0.5\,[(1 + \beta_{qi})\,x_i^{(1,t)} + (1 - \beta_{qi})\,x_i^{(2,t)}]$

$x_i^{(2,t+1)} = 0.5\,[(1 - \beta_{qi})\,x_i^{(1,t)} + (1 + \beta_{qi})\,x_i^{(2,t)}]$

• Polynomial Mutation

- A random number $r_i$ is selected between 0 and 1

- $\bar\delta_i$ is found such that the cumulative probability of the polynomial distribution up to $\bar\delta_i$ is $r_i$. The polynomial distribution can be written as

$P(\delta) = 0.5\,(\eta_m + 1)\,(1 - |\delta|)^{\eta_m}$

- Mutated offspring are obtained using the following rule:

$y_i^{(1,t+1)} = x_i^{(1,t+1)} + \big(x_i^{(U)} - x_i^{(L)}\big)\,\bar\delta_i$

where $x_i^{(U)}$ and $x_i^{(L)}$ are respectively the upper and lower bounds on $x_i$
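The two operators can be sketched for a single real variable (pure Python; `eta_c` and `eta_m` are the distribution indices $\eta_c$ and $\eta_m$, and the closed-form expressions for $\beta_q$ and $\delta$ follow from inverting the cumulative distributions above):

```python
import random

def sbx_crossover(p1, p2, eta_c, rng):
    """Simulated Binary Cross-over for one variable: draw beta_q from the
    polynomial density by inverse transform, then form the two symmetric
    blends of the parents."""
    u = rng.random()
    if u <= 0.5:
        beta_q = (2 * u) ** (1 / (eta_c + 1))
    else:
        beta_q = (1 / (2 * (1 - u))) ** (1 / (eta_c + 1))
    c1 = 0.5 * ((1 + beta_q) * p1 + (1 - beta_q) * p2)
    c2 = 0.5 * ((1 - beta_q) * p1 + (1 + beta_q) * p2)
    return c1, c2

def polynomial_mutation(x, x_lo, x_hi, eta_m, rng):
    """Polynomial mutation: delta lies in [-1, 1] under the polynomial
    distribution; the perturbation is scaled by the variable's bounds."""
    r = rng.random()
    if r < 0.5:
        delta = (2 * r) ** (1 / (eta_m + 1)) - 1
    else:
        delta = 1 - (2 * (1 - r)) ** (1 / (eta_m + 1))
    return x + (x_hi - x_lo) * delta
```

A useful sanity check: SBX offspring always preserve the parents' mean, and larger distribution indices concentrate the offspring closer to the parents.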


LOO Bounds

• Jaakkola-Haussler Bound

$T \le \frac{1}{l}\sum_{p=1}^{l}\psi\!\left(\alpha_p^0\,K(x_p, x_p) - 1\right)$

where $\alpha_p^0$ is the α obtained from the solution of the SVM optimization problem when testing with the p-th training example, $\psi$ is the step function ($\psi(x) = 1$ when $x > 0$ and $\psi(x) = 0$ otherwise), and l is the number of elements in the training set

• Opper-Winther Bound

$T \le \frac{1}{l}\sum_{p=1}^{l}\psi\!\left(\frac{\alpha_p^0}{(K_{SV}^{-1})_{pp}} - 1\right)$

where $K_{SV}$ is the matrix of dot products of the support vectors


Support Vector Classification

• Finds the optimal hyper-plane which separates the two classes in feature space

• Decision Function

$f(x) = \operatorname{sgn}\!\Big(\sum_{i=1}^{m} y_i\,\alpha_i\,k(x, x_i) + b\Big)$

• Quadratic Optimization Problem

minimize    $\frac{1}{2}\|w\|^2,\quad w \in H,\ b \in \mathbb{R}$

subject to  $y_i(\langle w, x_i\rangle + b) \ge 1$  for all i = 1…m