SVM classifiers · cvml.ajou.ac.kr/wiki/images/1/1e/Ch3_1_SVM-1.pdf · 2017. 1. 12. · Linear...
-
SVM classifiers
-
Binary classification
Given training data (x_i, y_i) for i = 1 . . . N, with x_i ∈ R^d and y_i ∈ {-1, 1},
learn a classifier f(x) such that
f(x_i) ≥ 0 if y_i = +1, and f(x_i) < 0 if y_i = -1
i.e. y_i f(x_i) > 0 for a correct classification.
-
Linear separability
[Figure: a linearly separable data set vs. one that is not linearly separable]
-
Linear classifiers
A linear classifier has the form
f(x) = w^T x + b
• In 2D the discriminant is a line
• w is the normal to the line, and b the bias
• w is known as the weight vector
-
Linear classifiers
A linear classifier has the form
f(x) = w^T x + b
• In 3D the discriminant is a plane, and in nD it is a hyperplane
For a K-NN classifier it was necessary to “carry” the training data
For a linear classifier, the training data is used to learn w and then discarded
Only w is needed for classifying new data
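As a minimal sketch of this point: once w and b are known, classifying a new point is a single dot product. The w and b below are made-up illustrative values, not learned from data.

```python
import numpy as np

# Illustrative (not learned) weight vector and bias.
w = np.array([2.0, -1.0])   # normal to the decision line
b = -0.5                    # bias

def f(x):
    """Discriminant function f(x) = w^T x + b."""
    return w @ x + b

def classify(x):
    """Predict +1 if f(x) >= 0, else -1."""
    return 1 if f(x) >= 0 else -1

print(classify(np.array([1.0, 0.0])))   # f = 1.5  -> +1
print(classify(np.array([0.0, 1.0])))   # f = -1.5 -> -1
```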
-
Reminder: The Perceptron Classifier
Given linearly separable data x_i labelled into two categories y_i ∈ {-1, 1}, find a weight vector w such that the discriminant function
f(x_i) = w^T x_i + b
separates the categories for i = 1, …, N
โข How can we find this separating hyperplane?
The Perceptron Algorithm
Write the classifier as f(x_i) = w̃^T x̃_i + w_0 = w^T x_i, where w = (w̃, w_0) and x_i = (x̃_i, 1)
• Initialize w = 0
• Cycle through the data points {x_i, y_i}
• If x_i is misclassified then w ← w + α y_i x_i (equivalently w - α sign(f(x_i)) x_i, since y_i = -sign(f(x_i)) for a misclassified point)
• Until all the data is correctly classified
-
For example in 2D
• Initialize w = 0
• Cycle through the data points {x_i, y_i}
• If x_i is misclassified then w ← w + α y_i x_i (equivalently w - α sign(f(x_i)) x_i)
• Until all the data is correctly classified
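The loop above can be sketched in NumPy on a toy separable 2D data set, using the augmented representation w = (w̃, w_0), x_i = (x̃_i, 1) so the bias is absorbed into w. The data and learning rate α are illustrative choices.

```python
import numpy as np

# Toy linearly separable 2D data.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

X_aug = np.hstack([X, np.ones((len(X), 1))])  # append constant 1 feature
w = np.zeros(X_aug.shape[1])                  # initialize w = 0
alpha = 1.0                                   # learning rate

converged = False
while not converged:                          # cycle until no misclassification
    converged = True
    for xi, yi in zip(X_aug, y):
        if yi * (w @ xi) <= 0:                # misclassified (or on the boundary)
            w = w + alpha * yi * xi           # perceptron update
            converged = False

print(w)
```

For linearly separable data this loop is guaranteed to terminate (the perceptron convergence theorem), though, as noted above, possibly slowly.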
-
โข If the data is linearly separable, then the algorithm will converge
• Convergence can be slow …
• The separating line can end up close to the training data
• We would prefer a larger margin for generalization
-
โข Maximum margin solution: most stable under perturbations of the inputs
-
Support Vector Machine
-
SVM โ sketch derivation
• Since w^T x + b = 0 and c(w^T x + b) = 0 define the same plane, we have the freedom to choose the normalization of w
• Choose the normalization such that w^T x_+ + b = +1 and w^T x_- + b = -1 for the
positive and negative support vectors respectively
โข Then the margin is given by
(w / ||w||)^T (x_+ - x_-) = w^T (x_+ - x_-) / ||w|| = 2 / ||w||
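A quick numeric check of this margin formula: pick a w and b, construct support vectors x_+ and x_- satisfying the ±1 normalization, and verify the projected distance equals 2/||w||. The values below are hand-picked for the check.

```python
import numpy as np

w = np.array([3.0, 4.0])            # ||w|| = 5
b = 1.0
n = w / np.linalg.norm(w)           # unit normal to the plane

x0 = -b * w / (w @ w)               # a point on w^T x + b = 0
x_pos = x0 + n / np.linalg.norm(w)  # lies on w^T x + b = +1
x_neg = x0 - n / np.linalg.norm(w)  # lies on w^T x + b = -1

# Margin: component of (x_+ - x_-) along the unit normal.
margin = n @ (x_pos - x_neg)
print(margin, 2 / np.linalg.norm(w))  # both are 2/||w|| = 0.4
```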
-
Support Vector Machine: linearly separable data
-
SVM โ Optimization
โข Learning the SVM can be formulated as an optimization:
max_w 2 / ||w|| subject to w^T x_i + b ≥ +1 if y_i = +1, and w^T x_i + b ≤ -1 if y_i = -1,
for i = 1 . . . N
โข Or equivalently
min_w ||w||^2 subject to y_i (w^T x_i + b) ≥ 1 for i = 1 . . . N
โข This is a quadratic optimization problem subject to linear constraints and
there is a unique minimum
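A sketch of this quadratic program on a toy data set, using SciPy's general-purpose SLSQP solver. This is an assumption for illustration: any QP solver would do, and dedicated SVM solvers are used in practice.

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data; the closest points are (1,0) and (-1,0).
X = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, -1.0]])
y = np.array([1, -1, 1, -1])

def objective(params):
    w = params[:2]
    return w @ w                       # minimize ||w||^2 (b is unpenalized)

# One inequality constraint per point: y_i (w^T x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq",
     "fun": lambda p, xi=xi, yi=yi: yi * (p[:2] @ xi + p[2]) - 1}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints,
               method="SLSQP")
w, b = res.x[:2], res.x[2]
print(w, b, 2 / np.linalg.norm(w))     # margin = 2/||w||
```

For this symmetric data the unique minimum is w = (1, 0), b = 0, giving the maximum margin 2.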
-
Linear separability again: What is the best w?
• The points can be linearly separated, but only with a very narrow margin
• In general there is a trade-off between the margin and the number of mistakes on the training data
• But possibly the large-margin solution is better, even though one constraint is violated
-
Introduce “slack” variables
-
“Soft” margin solution
The optimization problem becomes
min_{w ∈ R^d, ξ_i ≥ 0} ||w||^2 + C Σ_i ξ_i
subject to
y_i (w^T x_i + b) ≥ 1 - ξ_i for i = 1 . . . N
• Every constraint can be satisfied if ξ_i is sufficiently large
• C is a regularization parameter:
- small C allows constraints to be easily ignored → large margin
- large C makes constraints hard to ignore → narrow margin
- C = ∞ enforces all constraints: hard margin
• This is still a quadratic optimization problem and there is a unique minimum.
Note, there is only one parameter, C.
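A sketch of the soft-margin problem: since ξ_i = max(0, 1 - y_i(w^T x_i + b)) at the optimum, it can be rewritten as the unconstrained objective ||w||^2 + C Σ_i max(0, 1 - y_i(w^T x_i + b)) and minimized by sub-gradient descent. The data, step size, and iteration count below are illustrative choices.

```python
import numpy as np

# Two well-separated Gaussian clusters as toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.5, (20, 2)),
               rng.normal([-2, -2], 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

C, lr = 10.0, 0.01
w, b = np.zeros(2), 0.0
for _ in range(2000):
    margins = y * (X @ w + b)
    viol = margins < 1                 # points with nonzero slack xi_i
    # Sub-gradient of ||w||^2 + C * sum_i max(0, 1 - y_i f(x_i)).
    grad_w = 2 * w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

slack = np.maximum(0, 1 - y * (X @ w + b))
print(w, b, slack.max())
```

Raising C drives the slack variables toward zero (approaching the hard margin); lowering it tolerates more violations in exchange for a larger margin.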
-
โข Data is linearly separable
โข But only with a narrow margin
-
C = ∞ : hard margin
-
C = 10 : soft margin
-
Application: Pedestrian detection in
Computer Vision
โข Objective: detect (localize) standing humans in an image
(cf. face detection with a sliding-window classifier)
โข reduces object detection to binary classification
โข does an image window contain a person or not?
-
Detection problem as a (binary) classification problem
-
Each window is separately classified
-
Training data
โข 64x128 images of humans cropped from a varied set of personal photos
• Positive data - 1239 positive window examples (with reflections → 2478)
• Negative data - 1218 person-free training photos (12180 patches)
-
Training
โข A preliminary detector
โข Trained with (2478) vs (12180) samples
โข Retraining
โข With augmented data set
โข initial 12180 + hard examples
โข Hard examples
• The 1218 negative training photos are searched exhaustively for false positives
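The retraining step (hard-negative mining) can be sketched on synthetic feature vectors standing in for HOG windows. The centroid-based linear scorer below is only a stand-in for the actual SVM training, and all dimensions and counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16                                         # toy feature dimension
pos = rng.normal(0.3, 1.0, (200, d))           # "person" windows
neg = rng.normal(-0.3, 1.0, (2000, d))         # windows from person-free photos

def train(pos, neg):
    """Toy linear scorer f(x) = w^T x + b from class centroids
    (a stand-in for SVM training)."""
    w = pos.mean(axis=0) - neg.mean(axis=0)
    mid = (pos.mean(axis=0) + neg.mean(axis=0)) / 2
    return w, -(w @ mid)

# Preliminary detector: positives vs an initial negative sample.
w, b = train(pos, neg[:500])

# Search all negatives exhaustively for false positives ("hard examples").
scores = neg @ w + b
hard = neg[scores >= 0]

# Retrain with the augmented negative set: initial sample + hard examples.
w2, b2 = train(pos, np.vstack([neg[:500], hard]))
print(len(hard), int(np.sum(neg @ w2 + b2 >= 0)))
```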
-
Feature: histogram of oriented gradients
(HOG)
-
Averaged examples
-
Algorithm
• Training (learning)
• Represent each example window by a HOG feature vector
• Train an SVM classifier
• Testing (detection)
• Sliding-window classifier
f(x) = w^T x + b
x_i ∈ R^d, with d = 1024
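The sliding-window pass can be sketched as follows: score every window of the image with f(x) = w^T x + b and keep the best. Raw pixel values stand in for the 1024-d HOG descriptor, and the weights are made up rather than learned.

```python
import numpy as np

rng = np.random.default_rng(2)
H, W = 32, 32                  # small "image" for the sketch
h, w_win = 8, 8                # window size (real detector: 64x128)
image = rng.normal(0, 1, (H, W))
image[10:18, 12:20] += 3.0     # a bright "person-like" region

w = np.ones(h * w_win)         # made-up weights (real detector: learned SVM w)
b = 0.0

best, best_score = None, -np.inf
for i in range(H - h + 1):
    for j in range(W - w_win + 1):
        x = image[i:i + h, j:j + w_win].ravel()  # window -> feature vector
        score = w @ x + b                        # f(x) = w^T x + b
        if score > best_score:
            best, best_score = (i, j), score

print(best, best_score)
```

A real detector thresholds the scores and applies non-maximum suppression rather than keeping a single best window.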
-
Learned model
f(x) = w^T x + b