
Appendix of “Linear and Kernel Classification: When to Use Which?”

Hsin-Yuan Huang∗ Chih-Jen Lin†

In the following content, we assume that the loss function ξ(w; x, y) can be represented as a function of wᵀx when y is fixed. That is,

\[
\xi(w;\, x, y) = \bar{\xi}(w^T x;\, y).
\]

Note that the three loss functions in Section 2 satisfy the above property.
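
For example, the squared hinge (L2) loss satisfies this assumption; as a worked illustration (using the usual definition of the L2 loss rather than restating the one in Section 2), with z = wᵀx we have

\[
\xi(w;\, x, y) = \max\bigl(0,\ 1 - y\,w^T x\bigr)^2 = \max\bigl(0,\ 1 - y z\bigr)^2 = \bar{\xi}(z;\, y), \qquad z = w^T x.
\]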

I Parameter r Is not Needed for Degree-2 Expansions

From the definition of φ, we have

\[
\phi_{\gamma,r}(x) = \gamma\,\phi_{1,\,r/\gamma}(x).
\]

By Theorem 4.1, the optimal solutions for (C, γ, r) and (γ²C, 1, r/γ) are w/γ and w, respectively. Therefore the two decision functions are the same:

\[
\Bigl(\frac{w}{\gamma}\Bigr)^T \phi_{\gamma,r}(x) = w^T \phi_{1,\,r/\gamma}(x).
\]

Thus, if C and r are chosen properly, γ is not needed.
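
To see why the first identity holds, one can check it componentwise. The sketch below assumes the standard degree-2 expansion of the kernel (γxᵀz + r)²; the exact definition is in the main paper and is not restated in this appendix:

\[
\gamma\,\phi_{1,\,r/\gamma}(x)
= \gamma \Bigl[\tfrac{r}{\gamma},\ \sqrt{\tfrac{2r}{\gamma}}\,x_1, \ldots, \sqrt{\tfrac{2r}{\gamma}}\,x_n,\ x_1^2, \ldots, x_n^2,\ \sqrt{2}\,x_1x_2, \ldots\Bigr]^T
= \Bigl[r,\ \sqrt{2\gamma r}\,x_1, \ldots, \sqrt{2\gamma r}\,x_n,\ \gamma x_1^2, \ldots, \gamma x_n^2,\ \sqrt{2}\,\gamma x_1x_2, \ldots\Bigr]^T
= \phi_{\gamma,r}(x).
\]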

II Proof of Theorem 3.1

We prove the result by contradiction. Assume there exists w′ such that

\[
\sum_{i=1}^{l} \xi(w';\, \bar{x}_i, y_i) < \sum_{i=1}^{l} \xi(D^{-1}w^*;\, \bar{x}_i, y_i).
\]

Then from (3.8) we have

\[
\sum_{i=1}^{l} \bar{\xi}\bigl((Dw')^T x_i;\, y_i\bigr) < \sum_{i=1}^{l} \bar{\xi}\bigl((DD^{-1}w^*)^T x_i;\, y_i\bigr).
\]

Thus,

\[
\sum_{i=1}^{l} \xi(Dw';\, x_i, y_i) < \sum_{i=1}^{l} \xi(w^*;\, x_i, y_i),
\]

which contradicts the assumption that w∗ is an optimal solution for (3.9). Thus D⁻¹w∗ is an optimal solution for (3.10).
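
The only algebraic fact used in the second step is wᵀ(Dx) = (Dw)ᵀx for a diagonal (hence symmetric) D. A minimal numerical sketch of the resulting loss identity, using NumPy and the squared hinge loss as an example of ξ̄ (the data, D, and w below are random illustrative choices, not from the paper):

import numpy as np

rng = np.random.default_rng(0)
l, n = 20, 5
X = rng.normal(size=(l, n))                 # rows are the original instances x_i
y = rng.choice([-1.0, 1.0], size=l)
D = np.diag(rng.uniform(0.5, 2.0, size=n))  # diagonal scaling matrix
w = rng.normal(size=n)

def l2_hinge(z, y):
    # squared hinge loss written as a function of z = w^T x, i.e. xi_bar(z; y)
    return max(0.0, 1.0 - y * z) ** 2

# Loss of w on the scaled instances D x_i ...
loss_on_scaled = sum(l2_hinge(w @ (D @ X[i]), y[i]) for i in range(l))
# ... equals the loss of D w on the original instances x_i, since w^T (D x) = (D w)^T x.
loss_on_original = sum(l2_hinge((D @ w) @ X[i], y[i]) for i in range(l))
assert np.isclose(loss_on_scaled, loss_on_original)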

∗Department of Computer Science, National Taiwan University. momohuang@gmail.com

†Department of Computer Science, National Taiwan University. cjlin@csie.ntu.edu.tw

III Proof of Theorem 4.1

It can be clearly seen that the following two optimization problems are equivalent:

\[
(\text{III.1}) \qquad \min_{w}\ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi(w;\, x_i, y_i)
\]

and

\[
(\text{III.2}) \qquad \min_{w}\ \frac{1}{2} \Bigl(\frac{w}{\Delta}\Bigr)^T \Bigl(\frac{w}{\Delta}\Bigr) + \frac{C}{\Delta^2} \sum_{i=1}^{l} \xi\Bigl(\frac{w}{\Delta};\, \Delta x_i, y_i\Bigr).
\]

Indeed, ξ(w/Δ; Δx_i, y_i) = ξ̄(wᵀx_i; y_i) = ξ(w; x_i, y_i) by the assumption on ξ, so the objective of (III.2) is that of (III.1) multiplied by the constant 1/Δ². With the substitution w̄ = w/Δ, problem (III.2) can be written as

\[
(\text{III.3}) \qquad \min_{\bar{w}}\ \frac{1}{2} \bar{w}^T \bar{w} + \frac{C}{\Delta^2} \sum_{i=1}^{l} \xi(\bar{w};\, \Delta x_i, y_i),
\]

which trains the scaled data Δx_i, ∀i, under the regularization parameter C/Δ². Therefore, if w is optimal for (III.1), then w/Δ is optimal for (III.3).
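
A quick empirical sketch of this conclusion, using scikit-learn's LinearSVC (a wrapper of LIBLINEAR's primal L2-loss SVM solver); the data set, Δ, and tolerance below are illustrative assumptions rather than the paper's settings:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
C, delta = 1.0, 4.0

# Solve (III.1): original data with parameter C.
m1 = LinearSVC(C=C, loss='squared_hinge', dual=False,
               fit_intercept=False, tol=1e-8, max_iter=100000).fit(X, y)
# Solve (III.3): scaled data delta * x_i with parameter C / delta^2.
m2 = LinearSVC(C=C / delta**2, loss='squared_hinge', dual=False,
               fit_intercept=False, tol=1e-8, max_iter=100000).fit(delta * X, y)

# If w is optimal for (III.1), then w / delta should be optimal for (III.3).
print(np.allclose(m1.coef_ / delta, m2.coef_, atol=1e-4))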

IV Details of Experimental Settings

All experiments run on a 4 GHz Intel Core i7 (i7-4790K) machine with 16 GB of RAM. We transform some problems from multi-class to binary. For news20, we consider classes 2, . . . , 7 and 12, . . . , 15 as positive and the others as negative. For the digit-recognition problem mnistOvE, we consider odd numbers as positive and even numbers as negative. For poker, class 0 forms the positive class and the rest forms the negative class.

For the reference linear classifier and all linear classifiers built within our kernel-check methods, we consider the primal-based Newton method for L2-loss SVM in LIBLINEAR. For the reference Gaussian-kernel model, we use LIBSVM, which implements L1-loss SVM. We do not consider L1-loss SVM in LIBLINEAR because the automatic parameter-selection procedure is available only for the logistic and L2-hinge losses. We take this chance to see whether our kernel-check methods are sensitive to the choice of loss functions. All default settings of LIBSVM and LIBLINEAR are used, except that the stopping tolerance of LIBLINEAR is reduced to 10⁻⁴.
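
For readers who want to reproduce a comparable setup, the following sketch uses scikit-learn's wrappers of LIBLINEAR and LIBSVM (an illustration only; the paper uses the LIBLINEAR/LIBSVM packages directly, and the C and γ values here are placeholders rather than the tuned ones):

from sklearn.svm import LinearSVC, SVC

# Linear classifier: LIBLINEAR's primal Newton method for L2-loss SVM,
# with the stopping tolerance reduced to 1e-4 as in the experiments.
linear_clf = LinearSVC(loss='squared_hinge', dual=False, tol=1e-4, C=1.0)

# Reference kernel model: LIBSVM's L1-loss SVM with the Gaussian (RBF) kernel.
kernel_clf = SVC(kernel='rbf', C=1.0, gamma=1.0)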

We conduct feature-wise scaling such that each instance x_i becomes Dx_i, where D is a diagonal matrix with

\[
D_{jj} = \frac{1}{\max_{t} |(x_t)_j|}, \qquad j = 1, \ldots, n.
\]

The resulting feature values are in [−1, 1]. Although the ranges of the features may be slightly different, this scaling ensures that the sparsity of the data is kept.
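
A minimal sketch of this scaling for sparse data, assuming a SciPy CSR matrix X whose rows are the instances x_i (the matrix and variable names are hypothetical):

import numpy as np
import scipy.sparse as sp

X = sp.random(100, 50, density=0.05, format='csr', random_state=0)

# D_jj = 1 / max_t |(x_t)_j|; guard against all-zero columns.
col_max = np.abs(X).max(axis=0).toarray().ravel()
scale = np.where(col_max > 0, 1.0 / np.maximum(col_max, 1e-12), 1.0)

# Multiplying by a diagonal matrix rescales columns without creating
# new nonzeros, so the sparsity pattern of X is preserved.
X_scaled = X @ sp.diags(scale)
assert X_scaled.nnz == X.nnz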

V Experiments on Data Scaling

We compare the performance (CV accuracy) of the original, feature-wisely scaled, and instance-wisely scaled data under different C values in Figure (I). In addition to the data sets considered in the paper, we include more sets from the LIBSVM data sets without modifications.1 For a few problems, the curves with/without scaling are the same. Problem a9a has 0/1 feature values, so the set remains the same after feature-wise scaling. For gisette, the original data is already feature-wisely scaled. For rcv1, real-sim and webspam, the original data is already instance-wisely scaled. We consider the primal-based Newton method in LIBLINEAR for L2-loss SVM. The stopping tolerance is set to 10⁻⁴, except 10⁻⁶ for breast cancer and 10⁻² for yahoo-japan.

From Figure (I), we make the following observations.

1. The best CV accuracy is about the same for all three settings, except that instance-wise normalization gives significantly lower CV accuracy for australian.

2. The curve for instance-wise scaling always lies to the right of that for the original data, and for many data sets the shape of the curve is also very similar.

The above observations are consistent with our analysis in Section 4.
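
A rough sketch of how such CV-accuracy-versus-C curves can be produced, using scikit-learn for cross-validation and assuming instance-wise scaling means normalizing each instance to unit ℓ2 norm (the data set, C grid, and fold count are illustrative assumptions):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_inst = normalize(X, norm='l2')   # instance-wise scaling: x_i / ||x_i||

for C in 2.0 ** np.arange(-10, 11, 2):
    cv_orig = cross_val_score(LinearSVC(C=C, loss='squared_hinge', dual=False,
                                        tol=1e-4), X, y, cv=5).mean()
    cv_inst = cross_val_score(LinearSVC(C=C, loss='squared_hinge', dual=False,
                                        tol=1e-4), X_inst, y, cv=5).mean()
    print(f"C=2^{int(np.log2(C)):+d}  original={cv_orig:.3f}  instance-wise={cv_inst:.3f}")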

VI MultiLinear SVM using Different Settings

Experiments with MultiLinear SVM using different clustering algorithms and numbers of clusters are given in Figures (II) and (III) for all 15 data sets.
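
The MultiLinear SVM procedure itself is not restated in this appendix; the sketch below is only a rough illustration of the general idea, assuming it denotes partitioning the data with a clustering algorithm (here k-means) and training one linear SVM per cluster, with the number of clusters k being one of the settings varied in Figures (II) and (III):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
k = 4  # number of clusters; one of the varied settings

# Partition the data (here with k-means) and train one linear SVM per cluster.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
models = {}
for c in range(k):
    idx = km.labels_ == c
    if len(np.unique(y[idx])) > 1:          # need both classes to train
        models[c] = LinearSVC(loss='squared_hinge', dual=False,
                              tol=1e-4).fit(X[idx], y[idx])

# Predict a point with the model of its nearest cluster.
def predict(x):
    c = km.predict(x.reshape(1, -1))[0]
    return models[c].predict(x.reshape(1, -1))[0] if c in models else None

print(predict(X[0]))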

1 One exception is yahoo-japan, which is not publicly available.

Figure (I): Performance of linear classifiers using different scaling methods. Panels: (a) mnistOvE, (b) madelon, (c) heart, (d) cod-rna, (e) news20, (f) yahoo-japan, (g) a9a, (h) covtype, (i) ijcnn1, (j) svmguide1, (k) german.numer, (l) fourclass, (m) gisette, (n) webspam, (o) rcv1, (p) real-sim, (q) australian, (r) breast cancer.

Figure (II): Performance of MultiLinear SVM under different settings. Panels (a)–(t): CV accuracy (AC) and training time (Time) for fourclass, mnistOvE, webspam, madelon, a9a, gisette, news20, cod-rna, german.numer, and ijcnn1.

Figure (III): Performance of MultiLinear SVM under different settings (continued). Panels (a)–(j): CV accuracy (AC) and training time (Time) for rcv1, real-sim, covtype, poker, and svmguide1.