Dimensionality Reduction
Hung Le
University of Victoria
March 23, 2019
Motivation
Matrices are everywhere:
Graphs: the Web or social networks.
Pairwise interactions between two types of entities: movie ratings, image tags.
Spatial representations: images.
In many applications, the raw matrix can be summarized by "narrower" matrices.
"Narrower" means the output matrices have far fewer columns/rows.
"Can be summarized" means we can almost recover the original matrix from the summarized matrices.
Dimensionality reduction: find the "narrower" matrices of the original matrix.
Eigenvectors and Eigenvalues of Symmetric Matrices
e is an eigenvector corresponding to an eigenvalue λ of a square matrix M iff:
Me = λe        (1)
Fact 1: If e is an eigenvector of M, then for any constant c ≠ 0, ce is also an eigenvector of M (with the same eigenvalue) ⇒ we often require that ||e||_2 = 1.
Fact 2: If M ∈ R^{n×n} is real and symmetric, then M has n real eigenvectors and eigenvalues ⇒ in this lecture, M is real and symmetric.
Fact 3: We can order the eigenvalues
λ_1 ≥ λ_2 ≥ ... ≥ λ_n        (2)
with corresponding eigenvectors e_1, e_2, ..., e_n.
We can make e_i, e_j orthogonal, i.e., e_i^T e_j = 0 for all i ≠ j.
λ_1 and e_1 are called the principal eigenvalue and the principal eigenvector, respectively.
Eigenvectors and Eigenvalues of Symmetric Matrices - An Example
M = [ 3  2 ]
    [ 2  6 ]        (3)
has:
λ_1 = 7 and x_1 = [1/√5, 2/√5]^T
λ_2 = 2 and x_2 = [2/√5, -1/√5]^T        (4)
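As a quick numerical check of this example, the eigenpairs can be reproduced with NumPy; this is a minimal sketch, not part of the original slides.

```python
import numpy as np

M = np.array([[3.0, 2.0],
              [2.0, 6.0]])

# eigh is specialized for symmetric matrices: real eigenvalues, orthonormal eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(M)

# eigh returns eigenvalues in increasing order; reverse to get λ_1 ≥ λ_2
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

print(eigenvalues)         # [7. 2.]
print(eigenvectors[:, 0])  # ±[0.447, 0.894] = ±[1/√5, 2/√5]
print(eigenvectors[:, 1])  # ±[0.894, -0.447] = ±[2/√5, -1/√5]
```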
Finding Eigenvalues by Solving Equations
Eigenvalues are the roots of the following equation (with variable λ):
det(M − λI) = 0        (5)
where I is the identity matrix, and det(X) is the determinant of an n × n matrix X.
Fact 4: det(M − λI) is a degree-n polynomial in the variable λ.
Fact 5: If M is real and symmetric, Equation 5 has n real roots (counted with multiplicity).
Fact 6: Computing the determinant of a matrix takes O(n^3) time.
Finding Eigenvalues by Solving Equations - An Example
M = [ 3  2 ]
    [ 2  6 ]        (6)
We have:
det(M − λI) = det( [ 3−λ    2  ]
                   [  2    6−λ ] ) = (3 − λ)(6 − λ) − 4 = λ^2 − 9λ + 14        (7)
The two roots are λ_1 = 7 and λ_2 = 2.
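The same computation can be mirrored numerically; a sketch using NumPy's characteristic-polynomial and root helpers (not part of the slides):

```python
import numpy as np

M = np.array([[3.0, 2.0],
              [2.0, 6.0]])

# Coefficients of the characteristic polynomial of M: here [1, -9, 14]
coeffs = np.poly(M)
print(coeffs)            # [ 1. -9. 14.]

# Its roots are the eigenvalues
print(np.roots(coeffs))  # [7. 2.]
```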
Finding Eigenvalues and Eigenvectors by Power Iteration
Finding the principal eigenvector and eigenvalue.
PowerIteration1(M)
    v_0 ← a random vector
    for t ← 1 to k
        v_t = M v_{t−1} / ||M v_{t−1}||
    return v_k, v_k^T M v_k
Running time: O((m + n)k), where m is the number of non-zeros of M.
e_1 ≈ v_k and λ_1 ≈ v_k^T M v_k.
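A minimal NumPy sketch of PowerIteration1 (the function name and the fixed iteration count k are choices made for this sketch only):

```python
import numpy as np

def power_iteration(M, k=50, seed=0):
    """Approximate the principal eigenvector/eigenvalue of a symmetric matrix M."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])   # v_0: a random vector
    for _ in range(k):
        w = M @ v
        v = w / np.linalg.norm(w)         # v_t = M v_{t-1} / ||M v_{t-1}||
    return v, v @ M @ v                   # e_1 ≈ v_k, λ_1 ≈ v_k^T M v_k
```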
Power Iteration - An Example
M = [ 3  2 ]
    [ 2  6 ]        (8)
and v_0 = [1, 1]^T. Then
M v_0 = [ 3  2 ] [ 1 ]   =   [ 5 ]
        [ 2  6 ] [ 1 ]       [ 8 ]        (9)
Thus, v_1 = M v_0 / ||M v_0||_2 = [0.530, 0.848]^T. Repeating a second time, we get:
v_2 = [0.471, 0.882]^T        (10)
The limiting vector is:
v_k = [0.447, 0.894]^T        (11)
with λ = v_k^T M v_k = 6.993.
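A few lines of NumPy (again only a sketch) reproduce these iterates:

```python
import numpy as np

M = np.array([[3.0, 2.0],
              [2.0, 6.0]])
v = np.array([1.0, 1.0])     # v_0 = [1, 1]^T
for t in range(2):
    w = M @ v
    v = w / np.linalg.norm(w)
    print(t + 1, v)          # v_1 ≈ [0.530, 0.848], v_2 ≈ [0.471, 0.882]
print(v @ M @ v)             # ≈ 6.996, approaching λ_1 = 7 as iterations continue
```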
Finding Eigenvalues and Eigenvectors by Power Iteration
Finding the second largest eigenvalue and eigenvector.
PowerIteration2(M)
    (e_1, λ_1) ← PowerIteration1(M)
    v_0 ← a random vector
    M_2 ← M − λ_1 e_1 e_1^T
    for t ← 1 to k
        v_t = M_2 v_{t−1} / ||M_2 v_{t−1}||
    return v_k, v_k^T M_2 v_k
Running time: O((m + n)k), where m is the number of non-zeros of M.
e_2 ≈ v_k and λ_2 ≈ v_k^T M v_k.
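A matching sketch of the deflation step (power_iteration is repeated here so the snippet stands alone):

```python
import numpy as np

def power_iteration(M, k=50, seed=0):
    """Principal eigenpair of a symmetric matrix by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    for _ in range(k):
        w = M @ v
        v = w / np.linalg.norm(w)
    return v, v @ M @ v

def second_eigenpair(M, k=50):
    """Deflate the principal component, then run power iteration again."""
    e1, lam1 = power_iteration(M, k)
    M2 = M - lam1 * np.outer(e1, e1)   # M_2 = M - λ_1 e_1 e_1^T
    return power_iteration(M2, k)

M = np.array([[3.0, 2.0], [2.0, 6.0]])
print(second_eigenpair(M))             # ≈ ([0.894, -0.447] up to sign, 2.0)
```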
The Matrix of Eigenvectors
E = [ e_1  e_2  ...  e_n ]        (12)
Then:
E^T E = E E^T = I        (13)
The Matrix of Eigenvectors - An Example
M = [ 3  2 ]
    [ 2  6 ]        (14)
has:
x_1 = [1/√5, 2/√5]^T    x_2 = [2/√5, -1/√5]^T        (15)
Thus,
E = [ 1/√5    2/√5 ]
    [ 2/√5   -1/√5 ]        (16)
It is straightforward to verify that:
E E^T = E^T E = [ 1  0 ]
                [ 0  1 ]        (17)
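Equation (17) can be verified in a couple of lines (a sketch):

```python
import numpy as np

s = 1 / np.sqrt(5)
E = np.array([[1 * s,  2 * s],   # columns are the unit eigenvectors x_1 and x_2
              [2 * s, -1 * s]])

print(np.round(E.T @ E, 10))     # identity matrix
print(np.round(E @ E.T, 10))     # identity matrix
```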
Principal Component Analysis - PCA
Given a set of points D = {x_1, x_2, ..., x_n} in R^d, find a direction along which all the points line up best.
A direction is a (unit) vector w.
All the points line up best along w when:
∑_{i=1}^{n} (x_i^T w)^2        (18)
is maximum.
Principal Component Analysis - PCA
Let:
X = [ x_1^T ]
    [ x_2^T ]
    [  ...  ]
    [ x_n^T ]        (19)
Then
∑_{i=1}^{n} (x_i^T w)^2 = ||Xw||_2^2 = w^T X^T X w        (20)
Thus, ∑_{i=1}^{n} (x_i^T w)^2 is maximum when w is the principal eigenvector of X^T X.
Question: what is w^T X^T X w when w is the principal eigenvector of X^T X?
Answer: λ_1(X^T X).
PCA - An Example
X = [ 1  2 ]
    [ 2  1 ]
    [ 3  4 ]
    [ 4  3 ]
and
X^T X = [ 30  28 ]
        [ 28  30 ]        (21)
has λ_1(X^T X) = 58 with eigenvector x_1 = [1/√2, 1/√2]^T.
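A short NumPy sketch of this example (illustrative only):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])

C = X.T @ X                    # [[30, 28], [28, 30]]
eigenvalues, eigenvectors = np.linalg.eigh(C)

# eigh returns ascending eigenvalues; the last one is the principal eigenvalue
print(eigenvalues[-1])         # 58.0
print(eigenvectors[:, -1])     # ≈ ±[0.707, 0.707] = ±[1/√2, 1/√2]
```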
PCA - More Components
Given a set of points D = {x_1, x_2, ..., x_n} in R^d, find the second-best direction along which the points line up.
A direction u is the second best if:
u^T w = 0, where w is the best direction.
All the points line up best along u among all directions orthogonal to w. That is,
∑_{i=1}^{n} (x_i^T u)^2        (22)
is maximum among all directions orthogonal to w.
Fact: u is the eigenvector of X^T X corresponding to the second largest eigenvalue λ_2(X^T X).
PCA - More Components
Given a set of points D = {x_1, x_2, ..., x_n} in R^d, find the k-th best direction along which the points line up.
Fact: The k-th best direction is the eigenvector of X^T X corresponding to the k-th largest eigenvalue λ_k(X^T X).
Using PCA for Dimensionality Reduction
Given the k eigenvectors e_1, e_2, ..., e_k corresponding to the k largest eigenvalues, we can represent each data point x_i ∈ D as:
[ x_i^T e_1,  x_i^T e_2,  ...,  x_i^T e_k ]        (23)
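Putting the pieces together, a sketch of this k-dimensional representation (here with k = 1, reusing the example data from before):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
k = 1

eigenvalues, eigenvectors = np.linalg.eigh(X.T @ X)
E_k = eigenvectors[:, ::-1][:, :k]   # top-k eigenvectors as columns

# Row i of the result is [x_i^T e_1, ..., x_i^T e_k], as in equation (23)
X_reduced = X @ E_k
print(X_reduced)                     # ≈ ±[[2.12], [2.12], [4.95], [4.95]]
```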
Singular Value Decomposition
Let M be an m × n matrix and let r be the rank of M. A Singular Value Decomposition of M is a decomposition M = UΣV^T into three matrices U, Σ, V where:
U is a column-orthogonal m × r matrix, i.e., U^T U = I_r.
Σ is a diagonal r × r matrix. The elements on the diagonal of Σ are the singular values, in decreasing order.
V is a column-orthogonal n × r matrix, i.e., V^T V = I_r.
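A minimal NumPy sketch of this reduced SVD, mainly to show the shapes and the orthogonality conditions (the matrix M below is made up for the sketch):

```python
import numpy as np

M = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])            # a small rank-2 matrix

U, s, Vt = np.linalg.svd(M, full_matrices=False)
r = np.linalg.matrix_rank(M)
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]      # keep only the r nonzero singular values

print(s)                                    # singular values in decreasing order
print(np.allclose(U.T @ U, np.eye(r)))      # True: U is column-orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(r)))    # True: V is column-orthogonal
print(np.allclose(U @ np.diag(s) @ Vt, M))  # True: M = U Σ V^T
```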
Understanding SVD
Think of U,Σ,V as a representation of concepts hidden in M.
Two concepts:
Science Fiction = {The Matrix, Alien, Star Wars}
Romance = {Casablanca, Titanic}
Dimensionality Reduction by SVD
A rank-k SVD approximation of M is the matrix U_k Σ_k V_k^T where:
U_k contains the first k columns of U.
Σ_k contains the k largest (diagonal) elements of Σ.
V_k contains the first k columns of V.
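A sketch of how the rank-k approximation can be computed with NumPy (the matrix below is made up for illustration):

```python
import numpy as np

def rank_k_approximation(M, k):
    """Return A_k = U_k Σ_k V_k^T, the rank-k SVD approximation of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

M = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [0.0, 0.0, 5.0]])
A2 = rank_k_approximation(M, k=2)
print(np.round(A2, 2))
```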
Dimensionality Reduction by SVD - An Example
A rank-2 approximation of M′ [figure omitted].
Dimensionality Reduction by SVD - Why?
Theorem
Given M, the rank-k SVD approximation of M, denoted by A_k = U_k Σ_k V_k^T, has
||M − A_k||_F        (24)
minimum among all possible rank-k matrices.
||X||_F is the Frobenius norm of X:
||X||_F = sqrt( ∑_{i=1}^{n} ∑_{j=1}^{m} X[i, j]^2 )        (25)
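Numerically, the Frobenius error of A_k can be checked against the discarded singular values (a standard identity: ||M − A_k||_F equals the square root of the sum of the squared singular values beyond the k-th). A sketch, reusing the made-up matrix from above:

```python
import numpy as np

M = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [0.0, 0.0, 5.0]])
k = 2

U, s, Vt = np.linalg.svd(M, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.norm(M - A_k, 'fro'))   # ||M - A_k||_F
print(np.sqrt(np.sum(s[k:] ** 2)))      # the same value, from the discarded singular values
```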
How to choose k
Choose k so that at least 90% of the energy of Σ is preserved.
Energy(Σ) = ∑_{i=1}^{r} Σ[i, i]^2        (26)
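One way to apply this rule in code (a sketch; the 0.9 threshold is the slide's 90% figure and the singular values are made up):

```python
import numpy as np

def choose_k(singular_values, threshold=0.9):
    """Smallest k such that the top-k singular values keep `threshold` of the energy."""
    energy = singular_values ** 2
    cumulative = np.cumsum(energy) / np.sum(energy)
    return int(np.searchsorted(cumulative, threshold) + 1)

s = np.array([12.4, 9.5, 1.3])   # example singular values, in decreasing order
print(choose_k(s))                # 2: the first two singular values keep > 90% of the energy
```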
Querying Concept
Suppose that a new person P has seen only one movie, The Matrix, and rated it 4. Recall the SVD of the rating matrix from the earlier example [figure omitted].
Querying Concept (Cont.)
Let q be the row representation of P, that is:
q = [4 0 0 0 0]        (27)
We can determine the "concept space" of P by:
qV = [2.32 0]        (28)
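A sketch of this query in NumPy. The concept matrix V below is a stand-in with the same shape as in the movie example (one row per movie, one column per concept); the actual values would come from the SVD of the rating matrix, which is not reproduced here:

```python
import numpy as np

# Hypothetical 5-movie x 2-concept matrix V (illustrative values only)
V = np.array([[0.58, 0.00],   # The Matrix
              [0.58, 0.00],   # Alien
              [0.58, 0.00],   # Star Wars
              [0.00, 0.71],   # Casablanca
              [0.00, 0.71]])  # Titanic

q = np.array([4.0, 0.0, 0.0, 0.0, 0.0])   # P rated only The Matrix, with a 4

print(q @ V)   # ≈ [2.32, 0]: P maps onto the science-fiction concept
```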
Computing SVD
Recall
M = UΣV^T        (29)
Hence,
M^T M = VΣU^T UΣV^T = VΣ^2 V^T        (30)
which implies:
M^T M V = VΣ^2        (31)
Conclusion: the columns of V are eigenvectors of M^T M. By the same argument, the columns of U are eigenvectors of M M^T.
Question: What is Σ?
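A closing sketch: recovering V and Σ from the eigendecomposition of M^T M. The diagonal entries of Σ turn out to be the square roots of the eigenvalues of M^T M (the made-up matrix from the earlier sketches is reused):

```python
import numpy as np

M = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [0.0, 0.0, 5.0]])

# Eigendecomposition of M^T M: eigenvectors give V, eigenvalues give Σ^2
eigenvalues, V = np.linalg.eigh(M.T @ M)
order = np.argsort(eigenvalues)[::-1]            # sort in decreasing order
eigenvalues, V = eigenvalues[order], V[:, order]

sigma = np.sqrt(np.clip(eigenvalues, 0, None))   # Σ[i, i] = sqrt(λ_i(M^T M))
print(sigma)
print(np.linalg.svd(M, compute_uv=False))        # matches NumPy's singular values
```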