Principal Components Analysis (PCA)
273A Intro Machine Learning
Principal Components Analysis
• We search for those directions in space that have the highest variance.
• We then project the data onto the subspace of highest variance.
• This structure is encoded in the sample covariance of the data:
• Note that PCA is an unsupervised learning method (why?)
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad\qquad C = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T$$
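As a concrete illustration (not from the slides), here is a minimal NumPy sketch that computes the sample mean and covariance exactly as defined above; the synthetic data matrix `X` is an assumption for demonstration only:

```python
import numpy as np

# X: N x d data matrix, one sample x_i per row (synthetic data for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.2]])

N = X.shape[0]
mu = X.mean(axis=0)          # sample mean: (1/N) sum_i x_i
Xc = X - mu                  # centered data
C = (Xc.T @ Xc) / N          # sample covariance: (1/N) sum_i (x_i - mu)(x_i - mu)^T

# np.cov uses the 1/(N-1) convention; rescale to match the 1/N definition above
assert np.allclose(C, np.cov(X, rowvar=False) * (N - 1) / N)
```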
PCA
• We want to find the eigenvectors and eigenvalues of this covariance:
$$C = U \Lambda U^T, \qquad \Lambda = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_d \end{pmatrix}, \qquad U = \begin{pmatrix} u_1 & u_2 & \cdots & u_d \end{pmatrix}$$

• Each eigenvalue $\lambda_i$ is the variance in the direction of its eigenvector $u_i$.
• The eigenvectors are orthogonal and of unit length.
(In MATLAB: [U,L] = eig(C).)
[Figure: 2-D data scatter with the principal directions $u_1$ and $u_2$ drawn as arrows.]
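Continuing the NumPy sketch above (the slides reference MATLAB's `eig`; `np.linalg.eigh` is the analogue for symmetric matrices), with eigenvalues reordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$:

```python
def eig_sorted(C):
    """Eigendecomposition C = U diag(lam) U^T, eigenvalues sorted descending."""
    lam, U = np.linalg.eigh(C)      # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(lam)[::-1]   # reorder by largest eigenvalue first
    return lam[order], U[:, order]

lam, U = eig_sorted(C)              # C from the previous snippet
assert np.allclose(C, U @ np.diag(lam) @ U.T)   # C = U Lambda U^T
```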
PCA properties
$$C = \sum_{i=1}^{d} \lambda_i\, u_i u_i^T$$

$$C u_j = \Big(\sum_{i=1}^{d} \lambda_i\, u_i u_i^T\Big) u_j = \sum_{i=1}^{d} \lambda_i\, u_i (u_i^T u_j) = \lambda_j u_j \qquad (\text{the } u_i \text{ are eigenvectors of } C)$$

$$U^T U = U U^T = I \qquad (u_i \text{ orthonormal, } U \text{ a rotation})$$
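These identities can be checked numerically; an illustrative continuation of the sketch above:

```python
# C equals the sum of rank-one terms lam_i * u_i u_i^T
assert np.allclose(C, sum(l * np.outer(u, u) for l, u in zip(lam, U.T)))

# each u_j is an eigenvector: C u_j = lam_j u_j
assert np.allclose(C @ U, U * lam)   # broadcasting scales column j by lam_j

# U is orthonormal: U^T U = U U^T = I
d = U.shape[0]
assert np.allclose(U.T @ U, np.eye(d)) and np.allclose(U @ U.T, np.eye(d))
```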
$$y_i = U_{1:k}^T\, x_i \qquad \text{(projection onto the first } k \text{ eigenvectors)}$$

$$C_{1:k} = U_{1:k}\, \Lambda_{1:k}\, U_{1:k}^T \qquad \text{(rank-}k\text{ approximation)}$$

For example, with $k = 3$: $U_{1:3} = (u_1\; u_2\; u_3)$ and $\Lambda_{1:3} = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3)$.

The covariance of the projected data (for centered data, $\mu = 0$) is diagonal:

$$C_y = \frac{1}{N}\sum_{i=1}^{N} U_{1:k}^T x_i x_i^T U_{1:k} = U_{1:k}^T \Big(\frac{1}{N}\sum_{i=1}^{N} x_i x_i^T\Big) U_{1:k} = U_{1:k}^T U \Lambda U^T U_{1:k} = \Lambda_{1:k}$$
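A short continuation of the sketch, projecting the centered data onto the top-$k$ eigenvectors and checking that the projected covariance is indeed $\Lambda_{1:k}$ (the choice $k = 2$ is arbitrary):

```python
k = 2
Uk = U[:, :k]                # U_{1:k}: first k eigenvectors as columns
Y = Xc @ Uk                  # projections y_i = U_{1:k}^T x_i, one per row

Cy = (Y.T @ Y) / N           # covariance of the projected data
assert np.allclose(Cy, np.diag(lam[:k]))   # equals Lambda_{1:k}, i.e. diagonal
```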
PCA properties
$C_{1:k}$ is the optimal rank-$k$ approximation of $C$ in Frobenius norm, i.e. it minimizes the cost function:

$$\min_A \sum_{i=1}^{d}\sum_{j=1}^{d}\Big(C_{ij} - \sum_{l=1}^{k} A_{il} A^T_{lj}\Big)^2 \qquad \text{with} \quad A = U_{1:k}\,\Lambda_{1:k}^{1/2}$$
Note that there are infinitely many solutions that minimize this norm: if $A$ is a solution, then $AR$ with $R R^T = I$ is also a solution.
The solution provided by PCA is unique because $U$ is orthogonal and ordered by largest eigenvalue.
The solution is also nested: if I solve for a rank-$(k+1)$ approximation, the first $k$ eigenvectors are those found by the rank-$k$ approximation (etc.)
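To make the non-uniqueness concrete, an illustrative check (my own sketch, reusing `Uk`, `lam`, and $k = 2$ from above): $A = U_{1:k}\Lambda_{1:k}^{1/2}$ reconstructs $C_{1:k} = A A^T$, and rotating $A$ by any orthogonal $R$ leaves the product unchanged:

```python
# optimal rank-k factor A = U_{1:k} Lambda_{1:k}^{1/2}
A = Uk * np.sqrt(lam[:k])    # scales column l of Uk by sqrt(lam_l)
Ck = A @ A.T                 # rank-k approximation C_{1:k}
assert np.allclose(Ck, Uk @ np.diag(lam[:k]) @ Uk.T)

# any A R with R R^T = I achieves the same Frobenius cost
theta = 0.7                  # arbitrary rotation angle (k = 2 here)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose((A @ R) @ (A @ R).T, Ck)   # A R R^T A^T = A A^T
```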