Big Data Low is difficult, high is easy Regression Classification Dimensionality reduction Correlation analysis Extensions Applications Conclusions
Linear Algebra in and for Least Squares Support Vector Machines

Katholieke Universiteit Leuven, Department of Electrical Engineering
ESAT-STADIUS
Outline
1 Big Data
2 Low is difficult, high is easy
3 Regression
4 Classification
5 Dimensionality reduction
6 Correlation analysis
7 Extensions
8 Applications
9 Conclusions
Wishlist
Big Data: Volume, Velocity, Variety, . . .
Dealing with high dimensional input spaces
Need for powerful black box modeling techniques
Avoid pitfalls of nonlinear optimization (convergence issues, local minima, ...)
Preferable: (numerical) linear algebra, convex optimization
Algorithms for (un-)supervised function regression and estimation, (predictive) modeling, clustering and classification, data dimensionality reduction, correlation analysis (spatial-temporal modeling), feature selection, (early - intermediate - late) data fusion, ranking, outlier and fraud detection, decision support systems (process industry 4.0, digital health, ...)
. . .
Wishlist
|              | Linear Algebra | LS-SVM/Kernel         |
|--------------|----------------|-----------------------|
| Supervised   | Least squares  | Function Estimation   |
|              | Classification | Kernel Classification |
| Unsupervised | SVD - PCA      | Kernel PCA            |
|              | Angles - CCA   | Kernel CCA            |
System Identification
System Identification: PEM
LTI models
Non-convex optimization
Considered 'solved' by the early nineties

Linear Algebra approach

⇒ Large block Hankel data matrices; SVD; orthogonal and oblique projections; least squares ⇒ Subspace methods
Polynomial optimization problems
Multivariate polynomial optimization problems

Multivariate polynomial objective function + constraints
Non-convex optimization
Computer Algebra, Homotopy methods,Numerical Optimization
Linear Algebra approach
⇒ Macaulay matrix; SVD; kernel; realization theory ⇒ Smallest eigenvalue of a large matrix
Least Squares Support Vector Machines. . .
Nonlinear regression, modelling and clustering
Most regression, modelling and clustering problems are nonlinear when formulated in the input data space

This requires nonlinear nonconvex optimization algorithms

Linear Algebra approach

⇒ Least Squares Support Vector Machines

'Kernel trick' = projection of input data to a high-dimensional feature space

Regression, modelling, clustering problem becomes a large scale linear algebra problem (set of linear equations, eigenvalue problem)
Least squares
$Xw = y$

Consistent $\iff \operatorname{rank}(X) = \operatorname{rank}([X \; y])$

The estimate $w$ is then unique iff $X$ has full column rank

Inconsistent if $\operatorname{rank}([X \; y]) = \operatorname{rank}(X) + 1$

Find the 'best' linear combination of the columns of $X$ to approximate $y$ $\Longrightarrow$ Least Squares
Least squares
$$\min_w \|y - Xw\|_2^2 = \min_w \left( w^T X^T X w - 2 y^T X w + y^T y \right)$$

Setting the derivatives w.r.t. $w$ to zero $\Longrightarrow$ normal equations

$$X^T X w = X^T y$$

If $X$ has full column rank: $w = (X^T X)^{-1} X^T y$
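As a quick NumPy sketch of this slide's algebra (the data is synthetic and the sizes are illustrative), the normal-equations solution agrees with an SVD-based least squares solver, and the residual is orthogonal to the column space of $X$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))   # full column rank with probability 1
y = rng.standard_normal(100)

# Normal equations: solve X^T X w = X^T y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference: SVD-based least squares
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_normal, w_lstsq)

# Residual orthogonal to columns of X: X^T (y - Xw) = 0
e = y - X @ w_normal
assert np.allclose(X.T @ e, 0)
```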
Least squares
Equivalently: call $e = y - Xw$

$$\min_{e,w} \|e\|_2^2 = e^T e \quad \text{subject to} \quad y - Xw - e = 0$$

Lagrangean $L(e, w, l) = \frac{1}{2} e^T e + l^T (y - Xw - e)$

$$\frac{\partial L}{\partial e} = 0 = e - l \qquad \frac{\partial L}{\partial w} = 0 = X^T l \qquad \frac{\partial L}{\partial l} = 0 = y - Xw - e$$

$$\Longrightarrow \quad e = l \,, \qquad X^T e = 0 \,, \qquad X^T X w = X^T y$$
Least squares
Consider

$$\min_{e,w} \; \frac{1}{2} e^T V^{-1} e + \frac{1}{2} w^T W^{-1} w \quad \text{subject to} \quad y = Xw + e$$

This is maximum likelihood/Bayesian with priors

$$e \sim \mathcal{N}(0, V) \quad \text{and} \quad w \sim \mathcal{N}(0, W) \,.$$

Lagrangean

$$L(w, l, e) = \frac{1}{2} e^T V^{-1} e + \frac{1}{2} w^T W^{-1} w - l^T (y - Xw - e)$$
Least squares
$$\frac{\partial L}{\partial w} = 0 = W^{-1} w + X^T l \qquad \frac{\partial L}{\partial l} = 0 = y - Xw - e \qquad \frac{\partial L}{\partial e} = 0 = V^{-1} e + l$$

Hence

$$\begin{pmatrix} 0 & X & I \\ X^T & W^{-1} & 0 \\ I & 0 & V^{-1} \end{pmatrix} \begin{pmatrix} l \\ w \\ e \end{pmatrix} = \begin{pmatrix} y \\ 0 \\ 0 \end{pmatrix}$$

the Karush-Kuhn-Tucker equations
Least squares
Let $X \in \mathbb{R}^{p \times q}$. Eliminate $e = y - Xw$:

$$V^{-1} y - V^{-1} X w + l = 0 \qquad W^{-1} w + X^T l = 0$$

Primal: eliminate $l$:
$$w = (X^T V^{-1} X + W^{-1})^{-1} X^T V^{-1} y \quad \rightarrow \quad \text{'small' } q \times q \text{ inverse}$$

Dual: eliminate $w$:
$$l = -(V + X W X^T)^{-1} y \quad \rightarrow \quad \text{'large' } p \times p \text{ inverse}$$
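A NumPy sketch of this equivalence (the data and the choices of $V$ and $W$ are made-up illustrations): the $q \times q$ primal solve and the $p \times p$ dual solve, followed by $w = -W X^T l$, return the same estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 50, 3                       # p samples, q parameters
X = rng.standard_normal((p, q))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(p)
V = 0.01 * np.eye(p)               # prior covariance of e
W = 10.0 * np.eye(q)               # prior covariance of w

# Primal: 'small' q x q system
w_primal = np.linalg.solve(X.T @ np.linalg.solve(V, X) + np.linalg.inv(W),
                           X.T @ np.linalg.solve(V, y))

# Dual: 'large' p x p system, then recover w = -W X^T l
l = -np.linalg.solve(V + X @ W @ X.T, y)
w_dual = -W @ X.T @ l

assert np.allclose(w_primal, w_dual)
```

The two routes are algebraically identical (a matrix inversion lemma identity); one picks whichever inverse is smaller.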
Least Squares Support Vector Machine Regression
Given $X \in \mathbb{R}^{N \times q}$, $y \in \mathbb{R}^N$ with $i$-th row $x_i$. Consider a nonlinear vector function $\varphi(x_i) \in \mathbb{R}^{1 \times q}$ and the constrained least squares optimization problem:

$$\min_{w,e} \; \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e$$

subject to

$$y = \begin{pmatrix} \varphi(x_1) \\ \vdots \\ \varphi(x_N) \end{pmatrix} w + e = X_\varphi w + e$$
Least Squares Support Vector Machine Regression
Lagrangean

$$L(w, e, l) = \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e + l^T (y - X_\varphi w - e)$$

Eliminate $e$ and $w$ to find

$$l = \left( X_\varphi X_\varphi^T + \frac{1}{\gamma} I_N \right)^{-1} y$$
Least Squares Support Vector Machine Regression
Call $X_\varphi X_\varphi^T$ the kernel matrix $K(.,.)$, with element $i,j$: $K(x_i, x_j) = \varphi(x_i) \cdot \varphi^T(x_j)$, an 'inner product', so that $l = (K + \frac{1}{\gamma} I_N)^{-1} y$. Then, obviously,

$$y(x) = \sum_{i=1}^{N} K(x_i, x) \, l_i$$
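A numerical sketch of this dual solve and kernel prediction (the RBF kernel, $\gamma$, and the sine-wave data are illustrative choices, not from the slides):

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gram matrix with K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))                 # training inputs
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(60)

gamma = 100.0                                        # regularization constant
K = rbf_kernel(X, X)
l = np.linalg.solve(K + np.eye(60) / gamma, y)       # dual variables

Xt = np.linspace(-3, 3, 200)[:, None]                # test grid
y_hat = rbf_kernel(Xt, X) @ l                        # y(x) = sum_i K(x_i, x) l_i

# Sanity check: l solves (K + I/gamma) l = y
assert np.allclose(K @ l + l / gamma, y)
```

Note that $\varphi(.)$ never appears explicitly: only kernel evaluations are needed.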
Least Squares Support Vector Machine Regression
Observations:
Kernel: $N \times N$, symmetric positive definite matrix

$q$ can be large (possibly $\infty$)

$\varphi(x_i)$ nonlinearly maps data row $x_i$ into a high- (possibly $\infty$-) dimensional space.

Not needed to know $\varphi(.)$ explicitly. In machine learning, we fix a symmetric continuous kernel that satisfies Mercer's condition:

$$\int K(x, z) \, g(x) \, g(z) \, dx \, dz \geq 0 \,,$$

for any square integrable function $g(x)$. Then $K(x, z)$ separates: $\exists$ a Hilbert space $\mathcal{H}$, $\exists$ maps $\phi_i(.)$ and $\exists$ $\lambda_i > 0$ such that

$$K(x, z) = \sum_i \lambda_i \, \phi_i(x) \, \phi_i(z) \,.$$

Kernel trick: work out the 'dual formulation' with Lagrange multipliers; generate 'long' ($\infty$) inner products with $\varphi(.)$.
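A quick empirical check of Mercer's condition (illustrative, using the Gaussian/RBF kernel, which is known to satisfy it): any Gram matrix it produces on a finite sample should be symmetric positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 4))                 # 40 sample points in R^4

# RBF Gram matrix: K[i, j] = exp(-||x_i - x_j||^2 / 2)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 2.0)

assert np.allclose(K, K.T)                       # symmetric
assert np.linalg.eigvalsh(K).min() > -1e-8       # PSD up to roundoff
```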
Least Squares Support Vector Machine Regression
Kernels:
Mathematical form: linear, polynomial, radial basis function, splines, wavelets, string kernels, kernels from graphical models, Fisher kernels, graph kernels, data fusion kernels, spike kernels, ...
Application inspired: Text mining, bioinformatics, images, . . .
Linear classification
LS-SVM classifier
PCA - SVD
Given a data matrix $X$. Find vectors of maximum variance:

$$\max_{w,e} \; e^T e \,, \quad \text{subject to} \quad e = Xw \,, \; w^T w = 1 \,.$$

Lagrangean:

$$L(w, e, l, \lambda) = \frac{1}{2} e^T e - l^T (e - Xw) + \lambda (1 - w^T w)$$

giving

$$0 = e - l \qquad 0 = X^T l - 2 w \lambda \qquad 0 = e - Xw \qquad 1 = w^T w$$
PCA - SVD
Hence

$$l = Xw \,, \qquad X^T l = w (2\lambda) \,, \qquad l^T l = 2\lambda \,.$$

Call $v = l / \sqrt{2\lambda}$ and $\sigma = \sqrt{2\lambda}$, then

$$Xw = v\sigma \,, \quad w^T w = 1$$
$$X^T v = w\sigma \,, \quad v^T v = 1$$

SVD !! So, the left singular vectors of $X$ are Lagrange multipliers in a PCA problem.

Example: 12 600 genes, 72 patients:

- 28 Acute Lymphoblastic Leukemia (ALL)
- 24 Acute Myeloid Leukemia (AML)
- 20 Mixed Lineage Leukemia (MLL)
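A small NumPy check of these stationarity conditions (the data is synthetic; sizes are illustrative): the top singular pair of $X$ satisfies $Xw = v\sigma$, $X^T v = w\sigma$, and $w$ maximizes $\|Xw\|$ over unit vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 72 x 5 data with a dominant variance direction
X = rng.standard_normal((72, 5)) @ np.diag([5.0, 2.0, 1.0, 0.5, 0.1])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
w = Vt[0]        # first right singular vector = max-variance direction
v = U[:, 0]      # first left singular vector = scaled Lagrange multiplier l

# Stationarity conditions from the slide
assert np.allclose(X @ w, v * s[0])
assert np.allclose(X.T @ v, w * s[0])

# ||Xw|| is maximal among unit-norm directions
for _ in range(100):
    u = rng.standard_normal(5)
    u /= np.linalg.norm(u)
    assert np.linalg.norm(X @ u) <= s[0] + 1e-12
```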
Kernel PCA
Principal angles and directions. . .
Given two data matrices $X, Y$. Find directions in the column space of $X$, resp. $Y$, that maximally correlate:

$$\min_{e,f,v,w} \; \frac{1}{2} \|e - f\|_2^2 \,,$$

subject to

$$e = Xv \,, \quad f = Yw \,, \quad e^T e = 1 \,, \quad f^T f = 1 \,.$$

Notice that

$$\|e - f\|_2^2 = 1 + 1 - 2 e^T f = 2 (1 - \cos\theta)$$

Minimizing the distance = maximizing the cosine = minimizing the angle between the column spaces
Principal angles and directions. . .
Lagrangean:

$$L(e, f, v, w, a, b, \alpha, \beta) = 1 - e^T f + a^T (e - Xv) + b^T (f - Yw) - \frac{\alpha}{2} (1 - e^T e) - \frac{\beta}{2} (1 - f^T f)$$

resulting in

$$-f + a = -e\alpha \qquad e = Xv$$
$$-e + b = -f\beta \qquad f = Yw$$
$$X^T a = 0 \qquad e^T e = 1$$
$$Y^T b = 0 \qquad f^T f = 1$$

Eliminating $a, b, e, f$ gives

$$X^T Y w = X^T X v \, \alpha$$
$$Y^T X v = Y^T Y w \, \beta$$

Hence: $\alpha = \beta = \lambda$ (say) $(= \cos\theta)$.
Principal angles and directions. . .
Principal angles and directions follow from the generalized eigenvalue problem

$$\begin{pmatrix} 0 & X^T Y \\ Y^T X & 0 \end{pmatrix} \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} X^T X & 0 \\ 0 & Y^T Y \end{pmatrix} \begin{pmatrix} v \\ w \end{pmatrix} \lambda$$

$$v^T X^T X v = w^T Y^T Y w = 1$$

Numerically correct way: use 3 SVDs (see Golub/Van Loan)
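A minimal sketch of the numerically sound route (here using QR in place of the first two SVDs to get orthonormal bases, then one SVD of the cross product; the example matrices are made up): the singular values of $Q_X^T Q_Y$ are the cosines of the principal angles:

```python
import numpy as np

def principal_cosines(X, Y):
    """Cosines of the principal angles between the column spaces of X and Y."""
    Qx, _ = np.linalg.qr(X)                  # orthonormal basis of col(X)
    Qy, _ = np.linalg.qr(Y)                  # orthonormal basis of col(Y)
    c = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(c, 0.0, 1.0)              # guard against roundoff > 1

# Two planes in R^3 sharing exactly one direction:
# first principal angle 0 (cosine 1), second angle 90 deg (cosine 0)
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
c = principal_cosines(X, Y)
assert np.allclose(c, [1.0, 0.0])
```

Forming $X^T Y$ directly squares the conditioning; working with orthonormal bases avoids that.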
Kernel CCA
‘Core’ LS-SVM problems
Software
http://www.esat.kuleuven.be/sista/lssvmlab/
Enforcing sparsity
51 / 69
Robustness
52 / 69
Robustness
53 / 69
Fixed-size LS-SVM
54 / 69
Prediction intervals
55 / 69
Semi-supervised learning
56 / 69
Tensor data
57 / 69
Outline
1 Big Data
2 Low is difficult, high is easy
3 Regression
4 Classification
5 Dimensionality reduction
6 Correlation analysis
7 Extensions
8 Applications
9 Conclusions
58 / 69
Modeling a tsunami of data
59 / 69
Electric Load Forecasting
60 / 69
Electric Load Forecasting
61 / 69
Electric Load Forecasting
62 / 69
Medical applications
Data volumes, smallest to largest:

1 CD-ROM: 750 MegaByte
1 small animal image: 1 GigaByte
Index of 20 million biomedical PubMed records: 23 GigaByte
1 slice of mouse brain MSI at 10 μm resolution: 81 GigaByte
Raw NGS data of 1 full genome: 1 TeraByte
Genomics core, HiSeq 2000 full-speed exome sequencing: 1 TeraByte / week
PACS UZ Leuven: 1.6 PetaByte
Sequencing all newborns by 2020 (125k births / year): 125 PetaByte / year

(1 kB = 1 000; 1 MB = 1 000 000; 1 GB = 1 000 000 000; 1 TB = 1 000 000 000 000; 1 PB = 1 000 000 000 000 000 bytes)
63 / 69
Magnetic resonance spectroscopic imaging
64 / 69
Proteomics
65 / 69
Ranking from data fusion
66 / 69
Outline
1 Big Data
2 Low is difficult, high is easy
3 Regression
4 Classification
5 Dimensionality reduction
6 Correlation analysis
7 Extensions
8 Applications
9 Conclusions
67 / 69
Conclusions
LS-SVMs form a unifying framework for supervised and unsupervised ML tasks: regression, (predictive) modeling, clustering and classification, dimensionality reduction, correlation analysis (spatial-temporal modeling), feature selection, (early, intermediate, late) data fusion, ranking, outlier detection

They are a core ingredient of decision support systems with a 'human decision maker in the loop': policies in climate, energy, pollution; clinical decision support (digital health); industrial decision support (yield, monitoring, emission control); zillions of application areas

A tsunami of Big Data (high-dimensional input spaces, high complexity and interrelations, ...) is generated by Internet-of-Things multi-sensor networks, clinical monitoring equipment, etc.

Via the kernel trick: it's all linear algebra!
68 / 69
Conclusions
STADIUS spin-offs, "Going beyond research": www.esat.kuleuven.be/stadius/spinoffs.php

Automation & Optimization
Transport & Mobility research
Financial compliance
Data mining industry solutions
Automated PCR analysis
Data handling & mining for clinical genetics
69 / 69