Incremental Reduced Support Vector Machines
Yuh-Jye Lee, Hung-Yi Lo, and Su-Yun Huang
National Taiwan University of Science and Technology and Institute of Statistical Science, Academia Sinica
2003 International Conference on Informatics, Cybernetics, and Systems, ISU, Kaohsiung, Dec. 14, 2003
Outline
Support Vector Machines for classification problems: linear and nonlinear SVMs
Difficulties with nonlinear SVMs for large problems: storage and computational complexity
Reduced Support Vector Machines
Incremental Reduced Support Vector Machines
Numerical Results
Conclusions
Support Vector Machines (SVMs)
Powerful tools for data mining
SVMs have a sound theoretical foundation, based on statistical learning theory
SVMs can be trained very efficiently and achieve high accuracy
SVMs have an optimally defined separating surface
SVMs have become the most promising learning algorithm for classification and regression
SVMs can be extended from the linear to the nonlinear case by using kernel functions
Support Vector Machines for Classification: Maximizing the Margin between Bounding Planes
[Figure: two point classes A+ and A- separated by bounding planes, with the margin between the planes maximized]
Support Vector Machine Formulation
Solve the quadratic program for some $\nu > 0$:

$$\min_{(w,\gamma,y)\,\in\,\mathbb{R}^{n+1+m}} \ \frac{\nu}{2}\,y^{\top}y + \frac{1}{2}\left(w^{\top}w + \gamma^{2}\right) \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\ y \ge 0 \qquad \text{(QP)}$$

where $A \in \mathbb{R}^{m \times n}$ holds the training points, $e$ is a vector of ones, and $D_{ii} = \pm 1$ denotes $A+$ or $A-$ membership.

SSVM: Smooth Support Vector Machine is an efficient SVM algorithm proposed by Yuh-Jye Lee
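To make the smooth reformulation concrete, here is a minimal numpy/scipy sketch of an SSVM-style solve. It is an illustration only, not the authors' code: the smoothing parameter alpha, the weight nu, the toy data, and the use of a generic optimizer (SSVM proper uses a Newton iteration) are all our assumptions.

```python
# Sketch of a smooth SVM solve (illustrative; not the authors' SSVM code).
# The slack y is replaced by the smooth plus-function approximation
# p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x)).
import numpy as np
from scipy.optimize import minimize

def smooth_plus(x, alpha=5.0):
    return x + np.log1p(np.exp(-alpha * x)) / alpha

def objective(params, A, d, nu=1.0, alpha=5.0):
    w, gamma = params[:-1], params[-1]
    y = smooth_plus(1.0 - d * (A @ w - gamma), alpha)   # smoothed slacks
    return 0.5 * nu * (y @ y) + 0.5 * (w @ w + gamma ** 2)

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 2))                   # toy data, m = 200 points
d = np.where(A[:, 0] + A[:, 1] > 0, 1.0, -1.0)  # +1 / -1 class labels
res = minimize(objective, np.zeros(A.shape[1] + 1), args=(A, d))
w, gamma = res.x[:-1], res.x[-1]                # separating plane w.x = gamma
```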
Nonlinear Support Vector Machine
Extend to nonlinear cases by using kernel functions

Nonlinear Support Vector Machine formulation:

$$\min_{(u,\gamma,y)} \ \frac{\nu}{2}\,y^{\top}y + \frac{1}{2}\left(u^{\top}u + \gamma^{2}\right) \quad \text{s.t.} \quad D\left(K(A, A^{\top})Du - e\gamma\right) + y \ge e,\ y \ge 0$$

The value of the kernel function represents the inner product in the feature space

Map data from the input space to a higher dimensional feature space where the data can be separated linearly
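For concreteness, the Gaussian kernel is the usual choice in this line of work; a minimal numpy sketch follows, where the width parameter mu is an illustrative value.

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    # K[i, j] = exp(-mu * ||A_i - B_j||^2): inner product in feature space.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)

# Full kernel matrix: K = gaussian_kernel(A, A) is m x m and fully dense.
```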
Difficulties with Nonlinear SVM for Large Problems
Separating surface depends on almost the entire dataset: need to store the entire dataset after solving the problem
The nonlinear kernel $K(A, A^{\top})$ is fully dense: long CPU time to compute its $m^{2}$ entries, and storing the kernel matrix runs out of memory
Computational complexity depends on $m$; the complexity of nonlinear SSVM is $O(m^{3})$
Reduced Support Vector Machines: Overcoming Computational & Storage Difficulties by Using a Rectangular Kernel
Choose a small random sample $\bar{A} \in \mathbb{R}^{\bar{m} \times n}$ of $A$
The small random sample is a representative sample of the entire dataset
Typically $\bar{m}$ is 1% to 10% of the rows of $A$
Replace $K(A, A^{\top})$ by the rectangular kernel $K(A, \bar{A}^{\top})$, with corresponding $\bar{u}$, in nonlinear SSVM
Only need to compute and store $m \times \bar{m}$ numbers for the rectangular kernel
Computational complexity reduces from $O(m^{3})$ to $O(\bar{m}^{3})$
The nonlinear separator only depends on $\bar{A}$ (see the sketch below)
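A minimal sketch of building the rectangular reduced kernel; the 5% sampling rate and the toy data are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 10))       # toy data, m = 1000 rows
mbar = A.shape[0] // 20               # ~5% of the rows (illustrative)
Abar = A[rng.choice(A.shape[0], size=mbar, replace=False)]
K_rect = gaussian_kernel(A, Abar)     # m x mbar instead of m x m
```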
The reduced set plays the most important role in RSVM
It is natural to raise two questions:
Is there a way to choose the reduced set, other than random selection, so that RSVM will have better performance?
Is there a mechanism that determines the size of the reduced set automatically or dynamically?
The incremental reduced support vector machine is proposed to answer these questions
Our Observations (Ⅰ)
The nonlinear separating surface $K(x^{\top}, \bar{A}^{\top})\,\bar{u} = \gamma$ is a linear combination of a set of kernel functions $\{ K(x^{\top}, \bar{A}_{i}^{\top}) \}_{i=1}^{\bar{m}}$
If the kernel functions are very similar, the hypothesis space spanned by these kernel functions will be very limited
Our Observations (Ⅱ)
Start with a very small reduced set, then add a new data point only when its kernel function is dissimilar to the current set of kernel functions
These points contribute the most extra information
The information criterion: the distance from the kernel vector to the column space of the current reduced kernel matrix $K(A, \bar{A}^{\top})$ is greater than a threshold
This distance can be determined by solving a least squares problem
How do we measure dissimilarity? By solving least squares problems
Dissimilarity Measurement: Solving a Least Squares Problem

For a candidate point $x$ with kernel vector $k = K(A, x)$, solve

$$\min_{\beta \in \mathbb{R}^{\bar{m}}} \left\| K(A, \bar{A}^{\top})\,\beta - k \right\|_{2}^{2}$$

When $K(A, \bar{A}^{\top})$ has full column rank, the problem has a unique solution $\beta^{*}$, and the distance is the residual norm $\left\| K(A, \bar{A}^{\top})\,\beta^{*} - k \right\|_{2}$
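This maps directly onto a standard least-squares routine; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def distance_to_column_space(K_bar, k):
    # Solve min_beta ||K_bar @ beta - k||_2; the residual norm is the
    # distance from k to the column space of K_bar.
    beta, *_ = np.linalg.lstsq(K_bar, k, rcond=None)
    return np.linalg.norm(K_bar @ beta - k)
```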
IRSVM Algorithm pseudo-code
(sequential version)
1  Randomly choose two data points from the training data as the initial reduced set
2  Compute the reduced kernel matrix
3  For each data point not in the reduced set:
4      Compute its kernel vector
5      Compute the distance from the kernel vector
6          to the column space of the current reduced kernel matrix
7      If the distance exceeds a certain threshold:
8          Add this point to the reduced set and form the new reduced kernel matrix
9  Until several successive failures happen in line 7
10 Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11 A new data point is classified by the separating surface
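A Python sketch of the selection loop (lines 1-9); the threshold, the stopping count max_failures, and the helper names are assumptions, and the final QP solve of line 10 is left out.

```python
import numpy as np

def irsvm_select(A, kernel, threshold, max_failures=20, seed=0):
    # Sequential IRSVM reduced-set selection (a sketch of the pseudo-code).
    rng = np.random.default_rng(seed)
    order = rng.permutation(A.shape[0])
    reduced = list(order[:2])                 # line 1: two random points
    failures = 0
    for i in order[2:]:                       # line 3
        K_bar = kernel(A, A[reduced])         # line 2: reduced kernel matrix
        k = kernel(A, A[[i]])[:, 0]           # line 4: candidate kernel vector
        beta, *_ = np.linalg.lstsq(K_bar, k, rcond=None)
        dist = np.linalg.norm(K_bar @ beta - k)   # lines 5-6: distance
        if dist > threshold:                  # line 7
            reduced.append(i)                 # line 8: grow the reduced set
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:      # line 9: successive failures
                break
    return A[reduced]
```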
Speed up IRSVM
Note that we have to solve the least squares problem many times; with an $m \times \bar{m}$ reduced kernel matrix, each solve has time complexity $O(m\bar{m}^{2})$
The main cost, factoring the reduced kernel matrix, depends on the current reduced set but not on the candidate point, so it can be shared across candidates
Taking advantage of this fact, we propose a batch version of IRSVM that examines a batch of points at once
IRSVM Algorithm pseudo-code
(batch version)
1  Randomly choose two data points from the training data as the initial reduced set
2  Compute the reduced kernel matrix
3  For each batch of data points not in the reduced set:
4      Compute their kernel vectors
5      Compute the corresponding distances from these kernel vectors
6          to the column space of the current reduced kernel matrix
7      For those points whose distance exceeds a certain threshold:
8          Add those points to the reduced set and form the new reduced kernel matrix
9  Until no data points in a batch were added in lines 7-8
10 Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11 A new data point is classified by the separating surface
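A sketch of the batch variant: np.linalg.lstsq accepts a matrix right-hand side, so one solve covers a whole batch of candidate kernel vectors. The batch_size value and function name are illustrative choices.

```python
import numpy as np

def irsvm_select_batch(A, kernel, threshold, batch_size=50, seed=0):
    # Batch IRSVM reduced-set selection: one least-squares solve
    # handles all candidate kernel vectors in the batch.
    rng = np.random.default_rng(seed)
    order = rng.permutation(A.shape[0])
    reduced = list(order[:2])
    for start in range(2, len(order), batch_size):
        batch = order[start:start + batch_size]
        K_bar = kernel(A, A[reduced])          # current reduced kernel
        K_cand = kernel(A, A[batch])           # m x batch_size candidates
        B, *_ = np.linalg.lstsq(K_bar, K_cand, rcond=None)
        dist = np.linalg.norm(K_bar @ B - K_cand, axis=0)
        added = batch[dist > threshold]        # all dissimilar points at once
        if added.size == 0:                    # no additions in this batch: stop
            break
        reduced.extend(added.tolist())
    return A[reduced]
```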
IRSVM on four public data sets
Conclusions

IRSVM: an advanced algorithm built on RSVM
Starts with an extremely small reduced set and sequentially expands it to include informative data points
Determines the size of the reduced set automatically and dynamically; no pre-specified size is needed
The reduced set generated by IRSVM is more representative
All advantages of RSVM for dealing with large scale nonlinear classification problems are retained
Experimental tests show that IRSVM uses a smaller reduced set without sacrificing classification accuracy