
Transcript
Page 1: Principal Components Analysis (PCA) 273A Intro Machine Learning

Principal Components Analysis (PCA)

273A Intro Machine Learning

Page 2: Principal Components Analysis (PCA) 273A Intro Machine Learning

Principal Components Analysis

• We search for those directions in space that have the highest variance.

• We then project the data onto the subspace of highest variance.

• This structure is encoded in the sample covariance of the data:

• Note that PCA is an unsupervised learning method (why?)

\mu = \frac{1}{N} \sum_{i=1}^{N} x_i

C = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T
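As an illustration, here is a minimal numpy sketch of the sample mean and covariance above; the data matrix X, its size, and all variable names are made up for the example.

    import numpy as np

    # X: N x d data matrix with one sample x_i per row (made-up example data)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))

    mu = X.mean(axis=0)            # sample mean: (1/N) sum_i x_i
    Xc = X - mu                    # centered samples x_i - mu
    C = (Xc.T @ Xc) / X.shape[0]   # sample covariance: (1/N) sum_i (x_i - mu)(x_i - mu)^T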

Page 3: Principal Components Analysis (PCA) 273A Intro Machine Learning

PCA

• We want to find the eigenvectors and eigenvalues of this covariance:

C = U \Lambda U^T, \quad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d), \quad U = [u_1, u_2, \ldots, u_d]

eigenvalue \lambda_i = variance in the direction of eigenvector u_i

( in matlab [U,L]=eig(C) )

[Figure: data cloud with the principal directions u_1 and u_2 drawn as arrows.]

Orthogonal, unit-length eigenvectors.
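As a rough numpy analogue of the matlab call above, a sketch continuing the made-up C from the previous snippet; note that numpy returns eigenvalues in ascending order, so they are re-sorted to put the largest-variance direction first.

    lam, U = np.linalg.eigh(C)        # eigendecomposition of the symmetric matrix C
    order = np.argsort(lam)[::-1]     # reorder eigenvalues from largest to smallest
    lam, U = lam[order], U[:, order]  # columns of U are the orthonormal eigenvectors u_1, ..., u_d
    assert np.allclose(C, U @ np.diag(lam) @ U.T)   # check C = U Lambda U^T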

Page 4: Principal Components Analysis (PCA) 273A Intro Machine Learning

PCA properties

C = \sum_{i=1}^{d} \lambda_i u_i u_i^T \quad (U eigenvectors)

C u_j = \sum_{i=1}^{d} \lambda_i u_i (u_i^T u_j) = \lambda_j u_j

U^T U = U U^T = I \quad (u orthonormal, U a rotation)

y_i = U_{1:k}^T x_i \quad (projection)

C_{1:k} = U_{1:k} \Lambda_{1:k} U_{1:k}^T \quad (rank-k approximation)

e.g. for k = 3: U_{1:3} = [u_1, u_2, u_3], \quad \Lambda_{1:3} = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3)

For centered x_i, the covariance of the projected data is

C_y = \frac{1}{N} \sum_{i=1}^{N} U_{1:k}^T x_i x_i^T U_{1:k} = U_{1:k}^T \left( \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T \right) U_{1:k} = U_{1:k}^T U \Lambda U^T U_{1:k} = \Lambda_{1:k}
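A short sketch of the projection and its covariance, continuing the made-up Xc, lam, and U from the previous snippets; the choice k = 2 is arbitrary and only for illustration.

    k = 2
    Uk = U[:, :k]                    # U_{1:k}: top-k eigenvectors as columns
    Y = Xc @ Uk                      # projections y_i = U_{1:k}^T x_i, one per row
    Cy = (Y.T @ Y) / Y.shape[0]      # covariance of the projected data
    assert np.allclose(Cy, np.diag(lam[:k]))    # C_y = Lambda_{1:k}: the projected data is decorrelated
    Ck = Uk @ np.diag(lam[:k]) @ Uk.T           # rank-k approximation C_{1:k}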

Page 5: Principal Components Analysis (PCA) 273A Intro Machine Learning

PCA properties

C_{1:k} is the optimal rank-k approximation of C in Frobenius norm, i.e. it minimizes the cost function:

\sum_{i=1}^{d} \sum_{j=1}^{d} \left( C_{ij} - \sum_{l=1}^{k} A_{il} A^T_{lj} \right)^2, \quad with \quad A = U_{1:k} \Lambda_{1:k}^{1/2}

Note that there are infinitely many solutions that minimize this norm: if A is a solution, then AR with R R^T = I is also a solution.

The solution provided by PCA is unique because U is orthogonal and ordered by decreasing eigenvalue.

The solution is also nested: if I solve for a rank-(k+1) approximation, the first k eigenvectors are the same ones found by the rank-k approximation (etc.)
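A small numerical check of the statements above, continuing the earlier made-up variables; the rotation R is an arbitrary 2 x 2 example chosen only to illustrate the non-uniqueness.

    A = Uk @ np.diag(np.sqrt(lam[:k]))          # A = U_{1:k} Lambda_{1:k}^{1/2}
    assert np.allclose(A @ A.T, Ck)             # A A^T reproduces the rank-k approximation of C
    theta = 0.3                                 # any R with R R^T = I gives another minimizer A R
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    assert np.allclose((A @ R) @ (A @ R).T, Ck)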