Dimensionality Reduction
Hung Le
University of Victoria
March 23, 2019
Motivation
Matrices are everywhere:
Graphs: the Web or social networks.
Pairwise interactions between two types of entities: movie ratings, image tags.
Spatial representations: images.
In many applications, the raw matrix can be summarized by "narrower" matrices.
"Narrower" means the output matrices have far fewer columns/rows.
"Can be summarized" means we can almost recover the original matrix from the summarized matrices.
Dimensionality reduction: find the "narrower" matrices of the original matrix.
Eigenvectors and Eigenvalues of Symmetric Matrices
e is an eigenvector corresponding to an eigenvalue λ of a square matrix M iff:
Me = λe        (1)
Fact 1: If e is an eigenvector of M, then for any constant c ≠ 0, ce is also an eigenvector of M (with the same eigenvalue) ⇒ we often require that ||e||_2 = 1.
Fact 2: If M ∈ R^{n×n} is real and symmetric, then M has n real eigenvectors and eigenvalues ⇒ in this lecture, M is real and symmetric.
Fact 3: We can order the eigenvalues
λ_1 ≥ λ_2 ≥ ... ≥ λ_n        (2)
with corresponding eigenvectors e_1, e_2, ..., e_n.
We can make e_i, e_j orthogonal, i.e., e_i^T e_j = 0 for all i ≠ j.
λ_1 and e_1 are called the principal eigenvalue and the principal eigenvector, respectively.
Eigenvectors and Eigenvalues of Symmetric Matrices - An Example
M = [ 3  2 ]
    [ 2  6 ]        (3)
has:
λ_1 = 7 and x_1 = [1/√5, 2/√5]^T
λ_2 = 2 and x_2 = [2/√5, -1/√5]^T        (4)
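As a quick numerical check of this example, the eigenpairs can be reproduced with NumPy; this is a minimal sketch, not part of the original slides.

```python
import numpy as np

M = np.array([[3.0, 2.0],
              [2.0, 6.0]])

# eigh is specialized for symmetric matrices: real eigenvalues, orthonormal eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(M)

# eigh returns eigenvalues in increasing order; reverse to get λ_1 ≥ λ_2
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

print(eigenvalues)         # [7. 2.]
print(eigenvectors[:, 0])  # ±[0.447, 0.894] = ±[1/√5, 2/√5]
print(eigenvectors[:, 1])  # ±[0.894, -0.447] = ±[2/√5, -1/√5]
```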
Finding Eigenvalues by Solving Equations
Eigenvalues are the roots of the following equation (with variable λ):
det(M − λI) = 0        (5)
where I is the identity matrix, and det(X) is the determinant of an n × n matrix X.
Fact 4: det(M − λI) is a degree-n polynomial in the variable λ.
Fact 5: If M is real and symmetric, Equation 5 has n real roots (counted with multiplicity).
Fact 6: Computing the determinant of a matrix takes O(n^3) time.
Finding Eigenvalues by Solving Equations - An Example
M = [ 3  2 ]
    [ 2  6 ]        (6)
We have:
det(M − λI) = det( [ 3−λ    2  ]
                   [  2    6−λ ] ) = (3 − λ)(6 − λ) − 4 = λ^2 − 9λ + 14        (7)
The two roots are λ_1 = 7 and λ_2 = 2.
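The same computation can be mirrored numerically; a sketch using NumPy's characteristic-polynomial and root helpers (not part of the slides):

```python
import numpy as np

M = np.array([[3.0, 2.0],
              [2.0, 6.0]])

# Coefficients of the characteristic polynomial of M: here [1, -9, 14]
coeffs = np.poly(M)
print(coeffs)            # [ 1. -9. 14.]

# Its roots are the eigenvalues
print(np.roots(coeffs))  # [7. 2.]
```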
Finding Eigenvalues and Eigenvectors by Power Iteration
Finding the principal eigenvector and eigenvalue.
PowerIteration1(M)
    v_0 ← a random vector
    for t ← 1 to k
        v_t = M v_{t−1} / ||M v_{t−1}||
    return v_k, v_k^T M v_k
Running time: O((m + n)k), where m is the number of non-zeros of M.
e_1 ≈ v_k and λ_1 ≈ v_k^T M v_k.
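A minimal NumPy sketch of PowerIteration1 (the function name and the fixed iteration count k are choices made for this sketch only):

```python
import numpy as np

def power_iteration(M, k=50, seed=0):
    """Approximate the principal eigenvector/eigenvalue of a symmetric matrix M."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])   # v_0: a random vector
    for _ in range(k):
        w = M @ v
        v = w / np.linalg.norm(w)         # v_t = M v_{t-1} / ||M v_{t-1}||
    return v, v @ M @ v                   # e_1 ≈ v_k, λ_1 ≈ v_k^T M v_k
```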
Power Iteration - An Example
M = [ 3  2 ]
    [ 2  6 ]        (8)
and v_0 = [1, 1]^T. Then
M v_0 = [ 3  2 ] [ 1 ]   =   [ 5 ]
        [ 2  6 ] [ 1 ]       [ 8 ]        (9)
Thus, v_1 = M v_0 / ||M v_0||_2 = [0.530, 0.848]^T. Repeating a second time, we get:
v_2 = [0.471, 0.882]^T        (10)
The limiting vector is:
v_k = [0.447, 0.894]^T        (11)
with λ = v_k^T M v_k = 6.993.
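A few lines of NumPy (again only a sketch) reproduce these iterates:

```python
import numpy as np

M = np.array([[3.0, 2.0],
              [2.0, 6.0]])
v = np.array([1.0, 1.0])     # v_0 = [1, 1]^T
for t in range(2):
    w = M @ v
    v = w / np.linalg.norm(w)
    print(t + 1, v)          # v_1 ≈ [0.530, 0.848], v_2 ≈ [0.471, 0.882]
print(v @ M @ v)             # ≈ 6.996, approaching λ_1 = 7 as iterations continue
```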
Finding Eigenvalues and Eigenvectors by Power Iteration
Finding the second largest eigenvalue and eigenvector.
PowerIteration2(M)
    (e_1, λ_1) ← PowerIteration1(M)
    v_0 ← a random vector
    M_2 ← M − λ_1 e_1 e_1^T
    for t ← 1 to k
        v_t = M_2 v_{t−1} / ||M_2 v_{t−1}||
    return v_k, v_k^T M_2 v_k
Running time: O((m + n)k), where m is the number of non-zeros of M.
e_2 ≈ v_k and λ_2 ≈ v_k^T M v_k.
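A matching sketch of the deflation step (power_iteration is repeated here so the snippet stands alone):

```python
import numpy as np

def power_iteration(M, k=50, seed=0):
    """Principal eigenpair of a symmetric matrix by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    for _ in range(k):
        w = M @ v
        v = w / np.linalg.norm(w)
    return v, v @ M @ v

def second_eigenpair(M, k=50):
    """Deflate the principal component, then run power iteration again."""
    e1, lam1 = power_iteration(M, k)
    M2 = M - lam1 * np.outer(e1, e1)   # M_2 = M - λ_1 e_1 e_1^T
    return power_iteration(M2, k)

M = np.array([[3.0, 2.0], [2.0, 6.0]])
print(second_eigenpair(M))             # ≈ ([0.894, -0.447] up to sign, 2.0)
```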
The Matrix of Eigenvectors
E = [ e_1  e_2  ...  e_n ]        (12)
Then:
E^T E = E E^T = I        (13)
The Matrix of Eigenvectors - An Example
M = [ 3  2 ]
    [ 2  6 ]        (14)
has:
x_1 = [1/√5, 2/√5]^T    x_2 = [2/√5, -1/√5]^T        (15)
Thus,
E = [ 1/√5    2/√5 ]
    [ 2/√5   -1/√5 ]        (16)
It is straightforward to verify that:
E E^T = E^T E = [ 1  0 ]
                [ 0  1 ]        (17)
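Equation (17) can be verified in a couple of lines (a sketch):

```python
import numpy as np

s = 1 / np.sqrt(5)
E = np.array([[1 * s,  2 * s],   # columns are the unit eigenvectors x_1 and x_2
              [2 * s, -1 * s]])

print(np.round(E.T @ E, 10))     # identity matrix
print(np.round(E @ E.T, 10))     # identity matrix
```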
Principal Component Analysis - PCA
Given a set of points D = {x_1, x_2, ..., x_n} in R^d, find a direction along which all the points line up best.
A direction is a (unit) vector w.
All the points line up best along w when:
∑_{i=1}^{n} (x_i^T w)^2        (18)
is maximum.
Principal Component Analysis - PCA
Let:
X = [ x_1^T ]
    [ x_2^T ]
    [  ...  ]
    [ x_n^T ]        (19)
Then
∑_{i=1}^{n} (x_i^T w)^2 = ||Xw||_2^2 = w^T X^T X w        (20)
Thus, ∑_{i=1}^{n} (x_i^T w)^2 is maximum when w is the principal eigenvector of X^T X.
Question: what is w^T X^T X w when w is the principal eigenvector of X^T X?
Answer: λ_1(X^T X).
PCA - An Example
X = [ 1  2 ]
    [ 2  1 ]
    [ 3  4 ]
    [ 4  3 ]
and
X^T X = [ 30  28 ]
        [ 28  30 ]        (21)
has λ_1(X^T X) = 58 with eigenvector x_1 = [1/√2, 1/√2]^T.
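A short NumPy sketch of this example (illustrative only):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])

C = X.T @ X                    # [[30, 28], [28, 30]]
eigenvalues, eigenvectors = np.linalg.eigh(C)

# eigh returns ascending eigenvalues; the last one is the principal eigenvalue
print(eigenvalues[-1])         # 58.0
print(eigenvectors[:, -1])     # ≈ ±[0.707, 0.707] = ±[1/√2, 1/√2]
```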
PCA - More Components
Given a set of points D = {x_1, x_2, ..., x_n} in R^d, find the second-best direction along which the points line up.
A direction u is the second best if:
u^T w = 0, where w is the best direction.
All the points line up best along u among all directions orthogonal to w. That is,
∑_{i=1}^{n} (x_i^T u)^2        (22)
is maximum among all directions orthogonal to w.
Fact: u is the eigenvector of X^T X corresponding to the second largest eigenvalue λ_2(X^T X).
PCA - More Components
Given a set of points D = {x_1, x_2, ..., x_n} in R^d, find the k-th best direction along which the points line up.
Fact: The k-th best direction is the eigenvector of X^T X corresponding to the k-th largest eigenvalue λ_k(X^T X).
Using PCA for Dimensionality Reduction
Given the k eigenvectors e_1, e_2, ..., e_k corresponding to the k largest eigenvalues, we can represent each data point x_i ∈ D as:
[ x_i^T e_1,  x_i^T e_2,  ...,  x_i^T e_k ]        (23)
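Putting the pieces together, a sketch of this k-dimensional representation (here with k = 1, reusing the example data from before):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
k = 1

eigenvalues, eigenvectors = np.linalg.eigh(X.T @ X)
E_k = eigenvectors[:, ::-1][:, :k]   # top-k eigenvectors as columns

# Row i of the result is [x_i^T e_1, ..., x_i^T e_k], as in equation (23)
X_reduced = X @ E_k
print(X_reduced)                     # ≈ ±[[2.12], [2.12], [4.95], [4.95]]
```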
Singular Value Decomposition
Let M be an m × n matrix and let r be the rank of M. A Singular Value Decomposition of M is a decomposition M = UΣV^T into three matrices U, Σ, V where:
U is a column-orthogonal m × r matrix, i.e., U^T U = I_r.
Σ is a diagonal r × r matrix. The elements on the diagonal of Σ are the singular values, in decreasing order.
V is a column-orthogonal n × r matrix, i.e., V^T V = I_r.
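A minimal NumPy sketch of this reduced SVD, mainly to show the shapes and the orthogonality conditions (the matrix M below is made up for the sketch):

```python
import numpy as np

M = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])            # a small rank-2 matrix

U, s, Vt = np.linalg.svd(M, full_matrices=False)
r = np.linalg.matrix_rank(M)
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]      # keep only the r nonzero singular values

print(s)                                    # singular values in decreasing order
print(np.allclose(U.T @ U, np.eye(r)))      # True: U is column-orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(r)))    # True: V is column-orthogonal
print(np.allclose(U @ np.diag(s) @ Vt, M))  # True: M = U Σ V^T
```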
Understanding SVD
Think of U,Σ,V as a representation of concepts hidden in M.
Two concepts:
Science Fiction = {The Matrix, Alien, Star Wars}
Romance = {Casablanca, Titanic}
Dimensionality Reduction by SVD
A rank-k SVD approximation of M is the matrix U_k Σ_k V_k^T where:
U_k contains the first k columns of U.
Σ_k contains the k largest (diagonal) elements of Σ.
V_k contains the first k columns of V.
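A sketch of how the rank-k approximation can be computed with NumPy (the matrix below is made up for illustration):

```python
import numpy as np

def rank_k_approximation(M, k):
    """Return A_k = U_k Σ_k V_k^T, the rank-k SVD approximation of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

M = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [0.0, 0.0, 5.0]])
A2 = rank_k_approximation(M, k=2)
print(np.round(A2, 2))
```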
Dimensionality Reduction by SVD - An Example
A rank-2 approximation of M′ [figure omitted].
Dimensionality Reduction by SVD - Why?
Theorem
Given M, the rank-k SVD approximation of M, denoted by A_k = U_k Σ_k V_k^T, has
||M − A_k||_F        (24)
minimum among all possible rank-k matrices.
||X||_F is the Frobenius norm of X:
||X||_F = sqrt( ∑_{i=1}^{n} ∑_{j=1}^{m} X[i, j]^2 )        (25)
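Numerically, the Frobenius error of A_k can be checked against the discarded singular values (a standard identity: ||M − A_k||_F equals the square root of the sum of the squared singular values beyond the k-th). A sketch, reusing the made-up matrix from above:

```python
import numpy as np

M = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [0.0, 0.0, 5.0]])
k = 2

U, s, Vt = np.linalg.svd(M, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.norm(M - A_k, 'fro'))   # ||M - A_k||_F
print(np.sqrt(np.sum(s[k:] ** 2)))      # the same value, from the discarded singular values
```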
How to choose k
Choose k so that at least 90% of the energy of Σ is preserved.
Energy(Σ) = ∑_{i=1}^{r} Σ[i, i]^2        (26)
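One way to apply this rule in code (a sketch; the 0.9 threshold is the slide's 90% figure and the singular values are made up):

```python
import numpy as np

def choose_k(singular_values, threshold=0.9):
    """Smallest k such that the top-k singular values keep `threshold` of the energy."""
    energy = singular_values ** 2
    cumulative = np.cumsum(energy) / np.sum(energy)
    return int(np.searchsorted(cumulative, threshold) + 1)

s = np.array([12.4, 9.5, 1.3])   # example singular values, in decreasing order
print(choose_k(s))                # 2: the first two singular values keep > 90% of the energy
```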
Querying Concept
Suppose that a new person P has seen only one movie, The Matrix, and rated it 4. Recall the SVD of the rating matrix from the earlier example [figure omitted].
Querying Concept (Cont.)
Let q be the row representation of P, that is:
q = [4 0 0 0 0]        (27)
We can determine the "concept space" of P by:
qV = [2.32 0]        (28)
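A sketch of this query in NumPy. The concept matrix V below is a stand-in with the same shape as in the movie example (one row per movie, one column per concept); the actual values would come from the SVD of the rating matrix, which is not reproduced here:

```python
import numpy as np

# Hypothetical 5-movie x 2-concept matrix V (illustrative values only)
V = np.array([[0.58, 0.00],   # The Matrix
              [0.58, 0.00],   # Alien
              [0.58, 0.00],   # Star Wars
              [0.00, 0.71],   # Casablanca
              [0.00, 0.71]])  # Titanic

q = np.array([4.0, 0.0, 0.0, 0.0, 0.0])   # P rated only The Matrix, with a 4

print(q @ V)   # ≈ [2.32, 0]: P maps onto the science-fiction concept
```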
Computing SVD
Recall
M = UΣV^T        (29)
Hence,
M^T M = VΣU^T UΣV^T = VΣ^2 V^T        (30)
which implies:
M^T M V = VΣ^2        (31)
Conclusion: the columns of V are eigenvectors of M^T M. By the same argument, the columns of U are eigenvectors of M M^T.
Question: What is Σ?
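A closing sketch: recovering V and Σ from the eigendecomposition of M^T M. The diagonal entries of Σ turn out to be the square roots of the eigenvalues of M^T M (the made-up matrix from the earlier sketches is reused):

```python
import numpy as np

M = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [0.0, 0.0, 5.0]])

# Eigendecomposition of M^T M: eigenvectors give V, eigenvalues give Σ^2
eigenvalues, V = np.linalg.eigh(M.T @ M)
order = np.argsort(eigenvalues)[::-1]            # sort in decreasing order
eigenvalues, V = eigenvalues[order], V[:, order]

sigma = np.sqrt(np.clip(eigenvalues, 0, None))   # Σ[i, i] = sqrt(λ_i(M^T M))
print(sigma)
print(np.linalg.svd(M, compute_uv=False))        # matches NumPy's singular values
```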