MLCC 2018
Dimensionality Reduction and PCA
Lorenzo Rosasco
UNIGE-MIT-IIT
Outline
PCA & Reconstruction
PCA and Maximum Variance
PCA and Associated Eigenproblem
Beyond the First Principal Component
PCA and Singular Value Decomposition
Kernel PCA
Dimensionality Reduction
In many practical applications it is of interest to reduce the dimensionality of the data:

- data visualization
- data exploration: for investigating the "effective" dimensionality of the data
Dimensionality Reduction (cont.)
This problem of dimensionality reduction can be seen as the problem of defining a map

$$M : \mathcal{X} = \mathbb{R}^D \to \mathbb{R}^k, \qquad k \ll D,$$

according to some suitable criterion.

In the following, data reconstruction will be our guiding principle.
Principal Component Analysis
PCA is arguably the most popular dimensionality reduction procedure.
It is a data-driven procedure that, given an unsupervised sample

$$S = (x_1, \dots, x_n),$$

derives a dimensionality reduction defined by a linear map $M$.

PCA can be derived from several perspectives; here we give a geometric derivation.
Dimensionality Reduction by Reconstruction
Recall that, if

$$w \in \mathbb{R}^D, \qquad \|w\| = 1,$$

then $(w^T x)\, w$ is the orthogonal projection of $x$ onto $w$.
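As a quick sanity check, the projection can be written in a couple of lines of NumPy; this is an illustrative sketch (variable names are ours, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5
x = rng.standard_normal(D)
w = rng.standard_normal(D)
w /= np.linalg.norm(w)      # make w a unit vector

proj = (w @ x) * w          # (w^T x) w: orthogonal projection of x on w

# the residual x - proj is orthogonal to w
assert np.isclose(w @ (x - proj), 0.0)
```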
Dimensionality Reduction by Reconstruction (cont.)
First, consider $k = 1$. The associated reconstruction error is

$$\|x - (w^T x)\, w\|^2$$

(that is, how much we lose by projecting $x$ along the direction $w$).

Problem: Find the direction $p$ allowing the best reconstruction of the training set.
Dimensionality Reduction by Reconstruction (cont.)
Let $S^{D-1} = \{w \in \mathbb{R}^D \mid \|w\| = 1\}$ be the unit sphere in $D$ dimensions. Consider the empirical reconstruction minimization problem

$$\min_{w \in S^{D-1}} \frac{1}{n} \sum_{i=1}^{n} \|x_i - (w^T x_i)\, w\|^2.$$

The solution $p$ to the above problem is called the first principal component of the data.
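To make the objective concrete, here is a small brute-force sketch on synthetic 2D data (illustrative names; this exhaustive search over directions is only for intuition, not how the problem is actually solved):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# synthetic centered data, stretched along one axis
X = rng.standard_normal((n, 2)) * np.array([3.0, 0.5])
X -= X.mean(axis=0)

def reconstruction_error(w, X):
    """Empirical reconstruction error (1/n) sum_i ||x_i - (w^T x_i) w||^2."""
    proj = np.outer(X @ w, w)
    return np.mean(np.sum((X - proj) ** 2, axis=1))

# crude search over unit directions on the circle S^1
angles = np.linspace(0.0, np.pi, 1000)
candidates = np.stack([np.cos(angles), np.sin(angles)], axis=1)
errors = [reconstruction_error(w, X) for w in candidates]
p = candidates[int(np.argmin(errors))]  # approximate first principal component
print("approximate first PC:", p)       # close to (+-1, 0) for this data
```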
An Equivalent Formulation
A direct computation shows that $\|x_i - (w^T x_i)\, w\|^2 = \|x_i\|^2 - (w^T x_i)^2$ (expand the square and use $\|w\| = 1$).
Then, the problem

$$\min_{w \in S^{D-1}} \frac{1}{n} \sum_{i=1}^{n} \|x_i - (w^T x_i)\, w\|^2$$

is equivalent to

$$\max_{w \in S^{D-1}} \frac{1}{n} \sum_{i=1}^{n} (w^T x_i)^2.$$
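A quick numerical check of this equivalence on random data (a sketch; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10)
w = rng.standard_normal(10)
w /= np.linalg.norm(w)

lhs = np.sum((x - (w @ x) * w) ** 2)  # ||x - (w^T x) w||^2
rhs = x @ x - (w @ x) ** 2            # ||x||^2 - (w^T x)^2
assert np.isclose(lhs, rhs)           # so minimizing lhs = maximizing (w^T x)^2
```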
Outline
PCA & Reconstruction
PCA and Maximum Variance
PCA and Associated Eigenproblem
Beyond the First Principal Component
PCA and Singular Value Decomposition
Kernel PCA
Reconstruction and Variance
Assume the data are centered, $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = 0$; then we can interpret the term

$$(w^T x)^2$$

as the variance of $x$ in the direction $w$.

The first PC can be seen as the direction along which the data have maximum variance:

$$\max_{w \in S^{D-1}} \frac{1}{n} \sum_{i=1}^{n} (w^T x_i)^2$$
Centering
If the data are not centered, we should consider

$$\max_{w \in S^{D-1}} \frac{1}{n} \sum_{i=1}^{n} \big(w^T (x_i - \bar{x})\big)^2,$$

equivalent to

$$\max_{w \in S^{D-1}} \frac{1}{n} \sum_{i=1}^{n} (w^T x_i^c)^2$$

with $x^c = x - \bar{x}$.
Centering and Reconstruction
If we consider the effect of centering on reconstruction, it is easy to see that we get

$$\min_{w \in S^{D-1},\, b \in \mathbb{R}^D} \frac{1}{n} \sum_{i=1}^{n} \big\|x_i - \big((w^T (x_i - b))\, w + b\big)\big\|^2,$$

where

$$(w^T (x_i - b))\, w + b$$

is an affine (rather than an orthogonal) projection.
Outline
PCA & Reconstruction
PCA and Maximum Variance
PCA and Associated Eigenproblem
Beyond the First Principal Component
PCA and Singular Value Decomposition
Kernel PCA
PCA as an Eigenproblem
A further manipulation shows that PCA corresponds to an eigenvalue problem.

Using the symmetry of the inner product,

$$\frac{1}{n} \sum_{i=1}^{n} (w^T x_i)^2 = \frac{1}{n} \sum_{i=1}^{n} w^T x_i\, w^T x_i = \frac{1}{n} \sum_{i=1}^{n} w^T x_i x_i^T w = w^T \Big( \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T \Big) w.$$

Then, we can consider the problem

$$\max_{w \in S^{D-1}} w^T C_n w, \qquad C_n = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T.$$
PCA as an Eigenproblem (cont.)
We make two observations:
- The ("covariance") matrix $C_n = \frac{1}{n} X_n^T X_n$ is symmetric and positive semi-definite.
- The objective function of PCA can be written as
$$\frac{w^T C_n w}{w^T w},$$
the so-called Rayleigh quotient.

Note that, if $C_n u = \lambda u$, then $\frac{u^T C_n u}{u^T u} = \lambda$.

Indeed, it is possible to show that the Rayleigh quotient achieves its maximum at a vector corresponding to the maximum eigenvalue of $C_n$.
PCA as an Eigenproblem (cont.)
Computing the first principal component of the data reduces to computing the biggest eigenvalue of the covariance matrix and the corresponding eigenvector:

$$C_n u = \lambda u, \qquad C_n = \frac{1}{n} X_n^T X_n.$$
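In code this is a single symmetric eigendecomposition; a minimal NumPy sketch (our names, synthetic centered data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 500, 10
X = rng.standard_normal((n, D))
X -= X.mean(axis=0)               # center the data

C = (X.T @ X) / n                 # covariance matrix C_n (D x D)

# eigh handles symmetric matrices; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(C)
first_pc = eigvecs[:, -1]         # eigenvector of the largest eigenvalue
print("largest eigenvalue:", eigvals[-1])
```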
Outline
PCA & Reconstruction
PCA and Maximum Variance
PCA and Associated Eigenproblem
Beyond the First Principal Component
PCA and Singular Value Decomposition
Kernel PCA
Beyond the First Principal Component
We discuss how to consider more than one principal component ($k > 1$):

$$M : \mathcal{X} = \mathbb{R}^D \to \mathbb{R}^k, \qquad k \ll D.$$

The idea is simply to iterate the previous reasoning.
Residual Reconstruction
The idea is to consider the one-dimensional projection that can best reconstruct the residuals

$$r_i = x_i - (p^T x_i)\, p.$$

An associated minimization problem is given by

$$\min_{w \in S^{D-1},\, w \perp p} \frac{1}{n} \sum_{i=1}^{n} \|r_i - (w^T r_i)\, w\|^2$$

(note the constraint $w \perp p$).
Residual Reconstruction (cont.)
Note that, for all $i = 1, \dots, n$,

$$\|r_i - (w^T r_i)\, w\|^2 = \|r_i\|^2 - (w^T r_i)^2 = \|r_i\|^2 - (w^T x_i)^2,$$

since $w \perp p$.

Then, we can consider the following equivalent problem

$$\max_{w \in S^{D-1},\, w \perp p} \frac{1}{n} \sum_{i=1}^{n} (w^T x_i)^2 = \max_{w \in S^{D-1},\, w \perp p} w^T C_n w.$$
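The deflation view can be checked numerically: the best direction for the residuals coincides (up to sign) with the second eigenvector of $C_n$. A sketch under our own naming:

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 500, 5
X = rng.standard_normal((n, D)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
X -= X.mean(axis=0)

C = (X.T @ X) / n
eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues
p = eigvecs[:, -1]                     # first principal component

R = X - np.outer(X @ p, p)             # residuals r_i = x_i - (p^T x_i) p
C_res = (R.T @ R) / n                  # covariance of the residuals

w = np.linalg.eigh(C_res)[1][:, -1]    # best direction for the residuals
second_pc = eigvecs[:, -2]
assert np.isclose(abs(w @ second_pc), 1.0, atol=1e-6)
```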
PCA as an Eigenproblem
$$\max_{w \in S^{D-1},\, w \perp p} \frac{1}{n} \sum_{i=1}^{n} (w^T x_i)^2 = \max_{w \in S^{D-1},\, w \perp p} w^T C_n w.$$

Again, we have to maximize the Rayleigh quotient of the covariance matrix, with the extra constraint $w \perp p$.

Similarly to before, it can be proved that the solution of the above problem is given by the second eigenvector of $C_n$ and the corresponding eigenvalue.
PCA as an Eigenproblem (cont.)
$$C_n u = \lambda u, \qquad C_n = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T$$

The reasoning generalizes to more than two components: the computation of $k$ principal components reduces to finding $k$ eigenvalues and eigenvectors of $C_n$.
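A minimal sketch of $k$-component PCA along these lines (illustrative names; the data matrix is assumed centered):

```python
import numpy as np

def pca(X, k):
    """Top-k principal components of a centered (n, D) data matrix X.

    A sketch via the covariance eigenproblem, not an optimized routine."""
    n = X.shape[0]
    C = (X.T @ X) / n
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending order
    P = eigvecs[:, ::-1][:, :k]            # top-k eigenvectors as columns
    return P, X @ P                        # components and k-dim codes

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))
X -= X.mean(axis=0)
P, Z = pca(X, k=3)
print(Z.shape)                             # (300, 3)
```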
Remarks
- Computational complexity is roughly $O(kD^2)$ (the complexity of forming $C_n$ is $O(nD^2)$). If we have $n$ points in $D$ dimensions and $n \ll D$, can we compute PCA in less than $O(nD^2)$?
- The dimensionality reduction induced by PCA is a linear projection. Can we generalize PCA to non-linear dimensionality reduction?
Outline
PCA & Reconstruction
PCA and Maximum Variance
PCA and Associated Eigenproblem
Beyond the First Principal Component
PCA and Singular Value Decomposition
Kernel PCA
Singular Value Decomposition
Consider the data matrix $X_n$; its singular value decomposition is given by

$$X_n = U \Sigma V^T$$

where:

- $U$ is an $n \times k$ matrix with orthonormal columns,
- $V$ is a $D \times k$ matrix with orthonormal columns,
- $\Sigma$ is a diagonal matrix such that $\Sigma_{i,i} = \sqrt{\lambda_i}$, $i = 1, \dots, k$, and $k \le \min\{n, D\}$.

The columns of $U$ and the columns of $V$ are the left and right singular vectors, and the diagonal entries of $\Sigma$ are the singular values.
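The link to PCA can be checked with NumPy's thin SVD; a sketch with our own names (note the $1/n$ normalization of $C_n$ used here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 100, 8
X = rng.standard_normal((n, D))
X -= X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U diag(s) V^T

# right singular vectors are the principal components
C = (X.T @ X) / n
eigvals, eigvecs = np.linalg.eigh(C)              # ascending order
assert np.isclose(abs(Vt[0] @ eigvecs[:, -1]), 1.0)

# singular values and covariance eigenvalues: s_i^2 / n = lambda_i
assert np.allclose(np.sort(s**2 / n), eigvals)
```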
Singular Value Decomposition (cont.)
The SVD can be equivalently described by the equations

$$C_n p_j = \lambda_j p_j, \qquad \frac{1}{n} K_n u_j = \lambda_j u_j,$$

$$X_n p_j = \sqrt{\lambda_j}\, u_j, \qquad \frac{1}{n} X_n^T u_j = \sqrt{\lambda_j}\, p_j,$$

for $j = 1, \dots, k$, and where $C_n = \frac{1}{n} X_n^T X_n$ and $\frac{1}{n} K_n = \frac{1}{n} X_n X_n^T$.
PCA and Singular Value Decomposition
If $n \ll D$ we can consider the following procedure (see the sketch after this list):

- form the matrix $K_n$, which is $O(Dn^2)$;
- find the first $k$ eigenvectors of $K_n$, which is $O(kn^2)$;
- compute the principal components using
$$p_j = \frac{1}{\sqrt{\lambda_j}} X_n^T u_j = \frac{1}{\sqrt{\lambda_j}} \sum_{i=1}^{n} x_i\, u_j^i, \qquad j = 1, \dots, k,$$
where $u_j = (u_j^1, \dots, u_j^n)$. This is $O(knD)$ if we consider $k$ principal components.
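A sketch of this Gram-matrix route (our names; we follow the $\frac{1}{n} K_n u_j = \lambda_j u_j$ convention of the previous slide, which puts a factor $\sqrt{n}$ into the final rescaling):

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 50, 2000                        # n << D
X = rng.standard_normal((n, D))
X -= X.mean(axis=0)

K = X @ X.T                            # Gram matrix K_n (n x n): O(D n^2)
eigvals, U = np.linalg.eigh(K / n)     # (1/n) K_n u_j = lambda_j u_j
k = 3
lam = eigvals[::-1][:k]                # top-k eigenvalues
Uk = U[:, ::-1][:, :k]                 # corresponding eigenvectors

# principal components p_j from the data and the u_j's: O(k n D)
P = (X.T @ Uk) / np.sqrt(n * lam)      # each column has unit norm

# check against the top right singular vector of X
v1 = np.linalg.svd(X, full_matrices=False)[2][0]
assert np.isclose(abs(P[:, 0] @ v1), 1.0, atol=1e-6)
```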
Outline
PCA & Reconstruction
PCA and Maximum Variance
PCA and Associated Eigenproblem
Beyond the First Principal Component
PCA and Singular Value Decomposition
Kernel PCA
Beyond Linear Dimensionality Reduction?
By considering PCA we are implicitly assuming the data to lie on a linear subspace...

...it is easy to think of situations where this assumption might be violated.

Can we use kernels to obtain a non-linear generalization of PCA?
From SVD to KPCA
Using the SVD, the projection of a point $x$ on a principal component $p_j$, for $j = 1, \dots, k$, is

$$(M(x))_j = x^T p_j = \frac{1}{\sqrt{\lambda_j}}\, x^T X_n^T u_j = \frac{1}{\sqrt{\lambda_j}} \sum_{i=1}^{n} x^T x_i\, u_j^i.$$

Recall

$$C_n p_j = \lambda_j p_j, \qquad \frac{1}{n} K_n u_j = \lambda_j u_j,$$

$$X_n p_j = \sqrt{\lambda_j}\, u_j, \qquad \frac{1}{n} X_n^T u_j = \sqrt{\lambda_j}\, p_j.$$
PCA and Feature Maps
$$(M(x))_j = \frac{1}{\sqrt{\lambda_j}} \sum_{i=1}^{n} x^T x_i\, u_j^i$$

What if we consider a non-linear feature map $\Phi : \mathcal{X} \to \mathcal{F}$ before performing PCA?

$$(M(x))_j = \Phi(x)^T p_j = \frac{1}{\sqrt{\lambda_j}} \sum_{i=1}^{n} \Phi(x)^T \Phi(x_i)\, u_j^i,$$

where $K_n u_j = \sigma_j u_j$ and $(K_n)_{i,j} = \Phi(x_i)^T \Phi(x_j)$.
Kernel PCA
$$(M(x))_j = \Phi(x)^T p_j = \frac{1}{\sqrt{\lambda_j}} \sum_{i=1}^{n} \Phi(x)^T \Phi(x_i)\, u_j^i$$

If the feature map is defined by a positive definite kernel $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, then

$$(M(x))_j = \frac{1}{\sqrt{\lambda_j}} \sum_{i=1}^{n} K(x, x_i)\, u_j^i,$$

where $K_n u_j = \sigma_j u_j$ and $(K_n)_{i,j} = K(x_i, x_j)$.
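A minimal kernel PCA sketch with a Gaussian kernel (names and bandwidth are ours; centering in feature space is omitted for brevity, so the formulas are applied to the raw kernel matrix):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))            # training sample

K = gaussian_kernel(X, X)                    # (K_n)_{i,j} = K(x_i, x_j)
sig, U = np.linalg.eigh(K)                   # K_n u_j = sigma_j u_j
k = 2
sig, U = sig[::-1][:k], U[:, ::-1][:, :k]    # top-k eigenpairs

def kpca_map(x_new):
    """(M(x))_j = (1/sqrt(sigma_j)) sum_i K(x, x_i) u_j^i."""
    Kx = gaussian_kernel(np.atleast_2d(x_new), X)
    return (Kx @ U) / np.sqrt(sig)

print(kpca_map(rng.standard_normal(3)))      # 2-dimensional code of a new point
```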
Wrapping Up
In this class we introduced PCA as a basic tool for dimensionality reduction. We discussed computational aspects and extensions to non-linear dimensionality reduction (KPCA).
Next Class
In the next class, beyond dimensionality reduction, we ask how we can devise interpretable data models, and discuss a class of methods based on the concept of sparsity.