Towards Scalable Support Vector Machines Using Squashing

Page 1: Towards Scalable Support Vector Machines Using Squashing

Towards Scalable Support Vector Machines Using Squashing

• Authors: Dmitry Pavlov, Darya Chudova, Padhraic Smyth

• Information and Computer Science, University of California

• Advisor: Dr. Hsu

• Reporter: Hung Ching-Wen

Page 2: Towards Scalable Support Vector Machines Using Squashing

Outline

• 1. Motivation

• 2. Objective

• 3. Introduction

• 4. SVM

• 5. Squashing for SVM

• 6. Experiments

• 7. Conclusion

Page 3: Towards Scalable Support Vector Machines Using Squashing

Motivation

• SVMs provide a classification model with a strong theoretical foundation and excellent empirical performance.

• But the major drawback of SVMs is the need to solve a large-scale quadratic programming problem.

Page 4: Towards Scalable Support Vector Machines Using Squashing

Objective

• This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets.

Page 5: Towards Scalable Support Vector Machines Using Squashing

Introduction

• The applicability of SVMs to large datasets is limited because of their high computational cost.

• Speed-up training algorithms: chunking, Osuna's decomposition method, SMO

• They can accelerate the training, but they cannot scale well with the size of the training data.

Page 6: Towards Scalable Support Vector Machines Using Squashing

Introduction

• Reducing the computational cost :

• Sampling

• Boosting

• Squashing (DuMouchel et al., Madigan et al.)

• The authors propose Squashing-SMO to address the high computational cost of SVMs.

Page 7: Towards Scalable Support Vector Machines Using Squashing

SVM

• Training data: D = {(xi, yi): i = 1, …, N}

• xi is a feature vector, yi ∈ {+1, -1}

• In a linear SVM, the separating classifier is y = <w, x> + b

• w is the normal vector

• b is the intercept of the hyperplane
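
For reference, the optimization problem behind this separating hyperplane is the standard hard-margin SVM for linearly separable data (textbook background, not taken from the slides):

```latex
% Hard-margin linear SVM: maximize the margin 2/||w|| subject to
% every training point being classified correctly.
\[
\min_{w,\,b}\ \frac{1}{2}\|w\|^{2}
\quad \mathrm{s.t.} \quad y_i\,(\langle w, x_i\rangle + b) \ge 1, \qquad i = 1, \dots, N
\]
```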

Page 8: Towards Scalable Support Vector Machines Using Squashing

SVM (non-separable)
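
The equation on this slide is an image and is missing from the transcript; the standard soft-margin formulation it presumably shows introduces slack variables ξi for the non-separable case:

```latex
% Soft-margin linear SVM: slack variables xi_i absorb margin violations,
% C trades margin width against training error.
\[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}\xi_i
\quad \mathrm{s.t.} \quad y_i\,(\langle w, x_i\rangle + b) \ge 1 - \xi_i,\quad \xi_i \ge 0
\]
```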

Page 9: Towards Scalable Support Vector Machines Using Squashing

SVM (a prior on w)

Page 10: Towards Scalable Support Vector Machines Using Squashing

Squashing for SVM

• (1) Select a probabilistic model P((X, Y) | θ)

• (2) Our objective is to find the maximum likelihood estimate θML
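
Spelled out, the maximum likelihood objective over the training set is the standard one (assuming i.i.d. data):

```latex
% Maximum likelihood estimate over N i.i.d. training points
\[
\hat{\theta}_{ML} = \arg\max_{\theta}\ \sum_{i=1}^{N} \log P\big((x_i, y_i) \mid \theta\big)
\]
```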

Page 11: Towards Scalable Support Vector Machines Using Squashing

Squashing for SVM

• (3) The training data D = {(xi, yi): i = 1, …, N} can be grouped into Nc clusters

• (Xc, Yc)sq: the squashed data point placed at cluster c

• βc: the weight of cluster c
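
A minimal sketch of this grouping step in Python, assuming a plain k-means clustering per class is used to form the clusters; the paper's likelihood-based squashing chooses the clusters differently, so this is illustrative only (the squash name and signature are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

def squash(X, y, n_clusters):
    """Group (X, y) into per-class clusters and return one weighted pseudo-point
    per cluster: centers X_sq, their labels y_sq, and weights beta, where
    beta[c] is the number of original points that cluster c represents."""
    X_sq, y_sq, beta = [], [], []
    # Cluster each class separately so every squashed point gets a clean label.
    for label in np.unique(y):
        Xl = X[y == label]
        k = max(1, min(n_clusters, len(Xl)))
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Xl)
        X_sq.append(km.cluster_centers_)
        y_sq.append(np.full(k, label))
        beta.append(np.bincount(km.labels_, minlength=k))
    return np.vstack(X_sq), np.concatenate(y_sq), np.concatenate(beta)
```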

Page 12: Towards Scalable Support Vector Machines Using Squashing

Squashing for SVM

• If we take the prior on w to be P(w) ~ exp(-||w||^2)
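
The consequence shown on the slide is an equation image missing from the transcript; under the usual hinge-loss likelihood reading of SVMs, this prior turns MAP estimation into the familiar regularized objective (a sketch based on that assumption):

```latex
% With P(w) ~ exp(-||w||^2) and a hinge-loss likelihood
% P(y_i | x_i, w, b) ~ exp(-C [1 - y_i(<w, x_i> + b)]_+),
% the negative log-posterior is the regularized SVM objective:
\[
-\log P(w, b \mid D) \;\propto\; \|w\|^{2}
 + C\sum_{i=1}^{N}\big[\,1 - y_i(\langle w, x_i\rangle + b)\,\big]_{+} + \mathrm{const}
\]
```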

Page 13: Towards Scalable Support Vector Machines Using Squashing

Squashing for SVM

• (4) The optimization model for the squashed data is shown below.
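
The formula itself is an image in the original slide; a natural weighted form, assuming each squashed point simply carries its cluster weight βc into the soft-margin loss, would be:

```latex
% Weighted soft-margin SVM on the N_c squashed points: each pseudo-point
% (x_c, y_c) enters the loss with its cluster weight beta_c.
\[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C\sum_{c=1}^{N_c}\beta_c\,\xi_c
\quad \mathrm{s.t.} \quad y_c\,(\langle w, x_c\rangle + b) \ge 1 - \xi_c,\quad \xi_c \ge 0
\]
```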

Page 14: Towards Scalable Support Vector Machines Using Squashing

Squashing for SVM

• Important design issues for the squashing algorithm:

• (1) The choice of the number and location of the squashing points

• (2) Sample the values of w from the prior p(w)

• (3) b can be obtained from the optimization model

• (4) With w and b fixed, evaluate the likelihood of each training point, and repeat the selection procedure L times (L is the profile length); see the sketch below
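
A minimal sketch of steps (2)-(4) in Python, assuming a Gaussian draw for w, a crude choice of b, and a hinge-based likelihood; the exact choices in the paper may differ, and likelihood_profiles is a hypothetical helper name:

```python
import numpy as np

def likelihood_profiles(X, y, L=10, C=1.0, seed=0):
    """Build an L-dimensional likelihood profile for every training point by
    scoring it under L classifiers (w, b) sampled from the prior.
    Points with similar profiles can then be grouped into one squashed point."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    profiles = np.empty((N, L))
    for l in range(L):
        # Sample w from the prior p(w) ~ exp(-||w||^2), i.e. Gaussian with variance 1/2.
        w = rng.normal(scale=np.sqrt(0.5), size=d)
        b = -np.median(X @ w)                 # crude intercept choice, purely for illustration
        hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
        profiles[:, l] = np.exp(-C * hinge)   # hinge-based likelihood score per point
    return profiles
```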

Page 15: Towards Scalable Support Vector Machines Using Squashing

Experiments

• Experiment datasets:

• Synthetic data

• UCI Machine Learning repository

• UCI KDD repository

Page 16: Towards Scalable Support Vector Machines Using Squashing

Experiments

• Evaluated methods:

• Full-SMO, srs-SMO (simple random sampling), squash-SMO, boost-SMO

• Results are reported over 100 runs

• Performance measures:

• Misclassification rate, learning time, and memory requirements
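
To make the protocol concrete, here is a toy full-SMO vs. squash-SMO comparison in Python, reusing the squash() helper sketched under Page 11 and scikit-learn's SMO-based SVC; this only illustrates the kind of comparison run in the experiments, not the paper's actual code, data, or results:

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data standing in for the synthetic experiments.
X, Y = make_classification(n_samples=10000, n_features=20, random_state=0)
Y = 2 * Y - 1  # labels in {-1, +1}

t0 = time.time()
full = SVC(kernel="linear", C=1.0).fit(X, Y)        # "full-SMO": train on all points
t_full = time.time() - t0

X_sq, Y_sq, beta = squash(X, Y, n_clusters=100)     # squash() from the Page 11 sketch
t0 = time.time()
sq = SVC(kernel="linear", C=1.0).fit(X_sq, Y_sq, sample_weight=beta)  # "squash-SMO"
t_sq = time.time() - t0

print(f"full-SMO:   error={1 - full.score(X, Y):.4f}  time={t_full:.1f}s")
print(f"squash-SMO: error={1 - sq.score(X, Y):.4f}  time={t_sq:.1f}s")
```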

Page 17: Towards Scalable Support Vector Machines Using Squashing

Experiments (Results on synthetic data)

• (Wf, bf): estimated by full-SMO

• (Ws, bs): estimated from squashed or sampled data

Page 18: Towards Scalable Support Vector Machines Using Squashing

Experiments (Results on synthetic data)

Page 19: Towards Scalable Support Vector Machines Using Squashing

Experiments (Results on synthetic data)

Page 20: Towards Scalable Support Vector Machines Using Squashing

Experiments (Results on benchmark data)

Page 21: Towards Scalable Support Vector Machines Using Squashing

Experiments (Results on benchmark data)

Page 22: Towards Scalable Support Vector Machines Using Squashing

Experiments (Results on benchmark data)

Page 23: Towards Scalable Support Vector Machines Using Squashing

Experiments (Results on benchmark data)

Page 24: Towards Scalable Support Vector Machines Using Squashing

Conclusion

• 1. We describe how the use of squashing makes the training of SVMs applicable to large datasets.

• 2. Comparisons with full-SMO show that squash-SMO and boost-SMO achieve near-optimal performance with much lower time and memory requirements.

• 3. srs-SMO has a higher misclassification rate.

• 4. squash-SMO and boost-SMO can tune parameters by cross-validation, which is infeasible for full-SMO.

Page 25: Towards Scalable Support Vector Machines Using Squashing

Conclusion

• 5. The performance of squash-SMO and boost-SMO is similar on the benchmark problems.

• 6. However, squash-SMO offers better interpretability of the model and can be expected to run faster than boost-SMO on data sets that do not reside in memory.

Page 26: Towards Scalable Support Vector Machines Using Squashing

Opinion

• It is a good idea that the authors describe how the use of squashing makes the training of SVMs applicable to large datasets.

• We could change the prior distribution of w according to the nature of the data, for example to an exponential or log-normal distribution, or use a nonparametric method.
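
For instance, swapping the Gaussian-type prior for a Laplace (exponential-tailed) prior would turn the regularizer into an L1 penalty; a sketch under that assumption:

```latex
% Laplace prior P(w) ~ exp(-lambda ||w||_1): the regularizer becomes an L1 penalty.
\[
-\log P(w, b \mid D) \;\propto\; \lambda\|w\|_{1}
 + C\sum_{i=1}^{N}\big[\,1 - y_i(\langle w, x_i\rangle + b)\,\big]_{+} + \mathrm{const}
\]
```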