PCA & Fisher Discriminant Analysis
Dimensionality Reduction
Lecturer: Javier Hernandez Rivera
30th September 2010
MAS 622J/1.126J: Pattern Recognition & Analysis
"A man's mind, stretched by new ideas, may never return to it's original
dimensions" Oliver Wendell Holmes Jr.
Outline
Before
Linear Algebra
Probability
Likelihood Ratio
ROC
ML/MAP
Today
Accuracy, Dimensions & Overfitting (DHS 3.7)
Principal Component Analysis (DHS 3.8.1)
Fisher Linear Discriminant/LDA (DHS 3.8.2)
Other Component Analysis Algorithms
Accuracy, Dimensions & Overfitting
Does classification accuracy depend on the dimensionality?
[Figure: accuracy on training vs. testing data]
Complexity (computational/temporal), in Big O notation:
O(c) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^k) < O(2^n) < O(n!) < O(n^n)
Overfitting
[Figure: error vs. model complexity for classification and regression, with the overfitting regime marked]
Dimensionality Reduction
Optimal data representation
2D example: evaluating students' performance from two features, # of As and # of study hours. The two dimensions are largely redundant (unnecessary complexity): most of the structure lies along a single direction.
Dimensionality Reduction
Optimal data representation
3D example: find the most informative point of view.
Principal Component Analysis
Assumptions for new basis:
Large variance has important structure
Linear projection
Orthogonal basis
$Y = W^T X$
$X \in \mathbb{R}^{d \times n}$: data, $d$ dimensions, $n$ samples ($x_{ij}$ = dimension $i$ of sample $j$)
$Y \in \mathbb{R}^{k \times n}$: projected data
$W^T \in \mathbb{R}^{k \times d}$: $k$ basis vectors of dimension $d$, orthonormal: $w_j^T w_i = 1$ if $i = j$, $w_j^T w_i = 0$ if $i \neq j$
Principal Component Analysis
Optimal W?
1. Minimize projection cost (Pearson, 1901): $\min \|x - \hat{x}\|$
2. Maximize projected variance (Hotelling, 1933)
Both criteria lead to the same solution.
Principal Component Analysis
Covariance Algorithm
1. Zero mean: $x_{ij} \leftarrow x_{ij} - \mu_i$, where $\mu_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$
2. Unit variance: $x_{ij} \leftarrow x_{ij} / \sigma_i$, where $\sigma_i^2 = \frac{1}{n-1}\sum_{j=1}^{n}(x_{ij} - \mu_i)^2$
3. Covariance matrix: $\Sigma = X X^T$
4. Compute the eigenvectors of $\Sigma$, sorted by decreasing eigenvalue: rows of $W^T$
5. Project X onto the k principal components: $Y_{k \times n} = W^T_{k \times d}\, X_{d \times n}$, with $k < d$
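A minimal NumPy sketch of the covariance algorithm above (the function and variable names are mine, not from the slides); it standardizes each dimension of X, eigendecomposes $X X^T$, and projects onto the top k eigenvectors:

```python
import numpy as np

def pca_covariance(X, k):
    """PCA via the covariance algorithm (sketch). X is d x n: columns are samples."""
    # 1. Zero mean (per dimension)
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    # 2. Unit variance (per dimension)
    sigma = Xc.std(axis=1, ddof=1, keepdims=True)
    Xc = Xc / sigma
    # 3. Covariance matrix (d x d)
    S = Xc @ Xc.T
    # 4. Eigenvectors of S, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eigh(S)            # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Project onto the first k principal components: Y = W^T X
    W = eigvecs[:, :k]                              # d x k
    Y = W.T @ Xc                                    # k x n
    return Y, W, eigvals

# toy usage: 5-dimensional data, 100 samples, reduced to 2 dimensions
X = np.random.randn(5, 100)
Y, W, eigvals = pca_covariance(X, k=2)
print(Y.shape)   # (2, 100)
```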
Principal Component Analysis
How many PCs?
There are d eigenvectors; choose the first k based on their eigenvalues, so the projected data has only k dimensions.
[Figure: normalized eigenvalue spectrum; e.g., the 3rd PC carries 14% of the energy]
Keep the smallest k such that $\frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{d}\lambda_i} >$ threshold (e.g., 0.9 or 0.95)
$Y_{k \times n} = W^T_{k \times d}\, X_{d \times n}$
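A small sketch of that energy criterion, assuming the eigenvalues are already sorted in decreasing order (the helper name choose_k is mine):

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest k whose leading eigenvalues capture `threshold` of the total energy."""
    energy = np.cumsum(eigvals) / np.sum(eigvals)   # eigvals assumed sorted, decreasing
    return int(np.searchsorted(energy, threshold) + 1)

# e.g. with the eigenvalues returned by pca_covariance above
print(choose_k(np.array([5.0, 3.0, 1.0, 0.5, 0.5]), threshold=0.9))  # -> 3
```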
PCA Applications: EigenFaces
Images from faces:
• Aligned
• Gray scale
• Same size
• Same lighting
$X_{d \times n}$: data matrix, d = # pixels/image, n = # images
$W_{d \times d}$: the eigenfaces, i.e. the eigenvectors of $X X^T$
Sirovich and Kirby, 1987
Matthew Turk and Alex Pentland, 1991
PCA Applications: EigenFaces
[Figure: example faces and the corresponding eigenfaces]
Eigenfaces are standardized face ingredients
A human face may be considered to be a combination of these standard faces
PCA Applications: EigenFaces
Dimensionality reduction: project a new image x, $y_{new} = W^T x$
Classification: assign $y_{new}$ to class $\hat{c} = \arg\min_c \|y_{new} - y_c\|$, accepting the match if $\|y_{new} - y_{\hat{c}}\| < \theta_1$
Outlier detection: flag a projected non-face image $y_{car}$ (e.g., a picture of a car) as an outlier if $\min_c \|y_{car} - y_c\| > \theta_2$
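A hypothetical sketch of nearest-class matching in eigenface space; for brevity it combines the classification and outlier tests with a single threshold, and the projected class means are assumed to be precomputed:

```python
import numpy as np

def classify_or_reject(y_new, class_means, theta1):
    """Nearest class mean in the projected space; reject if even the best match is too far.
    class_means: dict mapping class label -> projected mean vector y_c."""
    dists = {c: np.linalg.norm(y_new - y_c) for c, y_c in class_means.items()}
    best = min(dists, key=dists.get)
    if dists[best] < theta1:
        return best          # classification
    return None              # outlier / unknown: distance to every class exceeds the threshold

# toy usage with two face classes in a 3-dimensional PCA space
means = {"alice": np.array([1.0, 0.0, 0.0]), "bob": np.array([0.0, 1.0, 0.0])}
print(classify_or_reject(np.array([0.9, 0.1, 0.0]), means, theta1=0.5))   # -> "alice"
print(classify_or_reject(np.array([5.0, 5.0, 5.0]), means, theta1=0.5))   # -> None
```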
PCA Applications: EigenFaces
Reconstructing with partial information: $\hat{x} = y_1 w_1 + y_2 w_2 + \dots + y_k w_k$
[Figure: reconstructed images, adding 8 PCs at each step]
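A one-function sketch of that reconstruction, assuming W holds the first k eigenfaces as columns and y the projected coefficients:

```python
import numpy as np

def reconstruct(W, y, mean=None):
    """Reconstruct an image from its first k PCA coefficients: x_hat = W y (+ mean).
    W: d x k eigenfaces, y: k coefficients."""
    x_hat = W @ y
    if mean is not None:
        x_hat = x_hat + mean    # add back the mean face if the data was centered
    return x_hat

# using more coefficients (larger k) gives a progressively more faithful reconstruction
```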
Principal Component Analysis
Any problems?
Define a sample as an image of 100x100 pixels. Then d = 10,000 and $X X^T$ is 10,000 x 10,000: storing it is $O(d^2)$ and the eigendecomposition is $O(d^3)$, so computing the eigenvectors becomes infeasible.
Possible solution (only when n << d): use $W^T = (X E)^T$, where $E \in \mathbb{R}^{n \times k}$ holds k eigenvectors of $X^T X$.
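A NumPy sketch of this small-n trick (function name and toy sizes are mine; X is assumed centered, with columns as images); it eigendecomposes the n x n matrix $X^T X$ and maps the eigenvectors back with $W = X E$:

```python
import numpy as np

def pca_small_n(X, k):
    """Eigenface trick for n << d: eigenvectors of the small n x n matrix X^T X,
    mapped back to d dimensions and normalized to unit length."""
    d, n = X.shape
    G = X.T @ X                                      # n x n instead of d x d
    eigvals, E = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:k]
    E = E[:, order]                                  # n x k
    W = X @ E                                        # d x k; X (X^T X) e = lambda (X e)
    W /= np.linalg.norm(W, axis=0, keepdims=True)    # unit-length principal directions
    return W

# e.g. 10,000-pixel images, 50 of them: eigendecompose a 50 x 50 matrix, not 10,000 x 10,000
X = np.random.randn(10_000, 50)
W = pca_small_n(X - X.mean(axis=1, keepdims=True), k=10)
print(W.shape)   # (10000, 10)
```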
Principal Component Analysis
SVD Algorithm
1. Zero mean: $x_{ij} \leftarrow x_{ij} - \mu_i$, where $\mu_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$
2. Unit variance: $x_{ij} \leftarrow x_{ij} / \sigma_i$, where $\sigma_i^2 = \frac{1}{n-1}\sum_{j=1}^{n}(x_{ij} - \mu_i)^2$
3. Compute the SVD of X: $X = U \Sigma V^T$
4. Projection matrix: $W = U$. Why? $X X^T = U \Sigma V^T V \Sigma^T U^T = U (\Sigma \Sigma^T) U^T$, so the columns of U are the eigenvectors of the covariance matrix.
5. Project the data: $Y_{k \times n} = W^T_{k \times d}\, X_{d \times n}$, with $k \le n < d$
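A NumPy sketch of the SVD algorithm (names are mine); numpy.linalg.svd already returns the singular values in decreasing order, so the first k columns of U are the principal directions:

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD (sketch): standardize, take X = U S V^T, use W = U."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, ddof=1, keepdims=True)
    Xs = (X - mu) / sigma
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)   # singular values sorted, decreasing
    W = U[:, :k]                                         # first k principal directions
    Y = W.T @ Xs                                         # k x n projected data
    return Y, W, S

X = np.random.randn(5, 100)
Y, W, S = pca_svd(X, k=2)
print(Y.shape)   # (2, 100)
```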
Principal Component Analysis
MATLAB: [U S V] = svd(A);
$X_{d \times n} = U_{d \times d}\, S_{d \times n}\, V^T_{n \times n}$
X (data): columns are the data points
U (left singular vectors): columns are the eigenvectors of $X X^T$
S (singular values): diagonal matrix of sorted values
$V^T$ (right singular vectors): rows are the eigenvectors of $X^T X$
PCA Applications: Action Unit Recognition
T. Simon, J. Hernandez, 2009
Applications
Visualize high dimensional data (e.g. AnalogySpace)
Find hidden relationships (e.g. topics in articles)
Compress information
Avoid redundancy
Reduce algorithm complexity
Outlier detection
Denoising
Demos:
http://www.cs.mcgill.ca/~sqrt/dimr/dimreduction.html
PCA for pattern recognition
Principal Component Analysis: projects onto the direction of highest variance, which can be bad for discriminability.
Fisher Linear Discriminant / Linear Discriminant Analysis: projects onto a direction of smaller variance but good discriminability.
Linear Discriminant Analysis
Assumptions for new basis:
Maximize distance between projected class means
Minimize projected class variance
$y = w^T x$
Linear Discriminant Analysis
Algorithm
1. Compute class means: $m_i = \frac{1}{n_i}\sum_{x \in C_i} x$
2. Compute $w = S_W^{-1}(m_2 - m_1)$, where
   $S_W = \sum_{j=1}^{2}\sum_{x \in C_j}(x - m_j)(x - m_j)^T$ (within-class scatter)
   $S_B = (m_2 - m_1)(m_2 - m_1)^T$ (between-class scatter)
3. Project data: $y = w^T x$
Objective: $\arg\max_w J(w) = \frac{w^T S_B w}{w^T S_W w}$
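A two-class NumPy sketch of this algorithm (function name and toy data are mine); it computes the class means, the within-class scatter, and the closed-form direction $w = S_W^{-1}(m_2 - m_1)$:

```python
import numpy as np

def fisher_lda(X1, X2):
    """Two-class Fisher discriminant (sketch). X1, X2: d x n_i matrices, columns are samples.
    Returns w maximizing J(w) = (w^T S_B w) / (w^T S_W w)."""
    # 1. Class means
    m1 = X1.mean(axis=1)
    m2 = X2.mean(axis=1)
    # 2. Within-class scatter S_W and the closed-form solution w = S_W^{-1} (m2 - m1)
    S_W = (X1 - m1[:, None]) @ (X1 - m1[:, None]).T \
        + (X2 - m2[:, None]) @ (X2 - m2[:, None]).T
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

# 3. Project data: y = w^T x
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0, 0], scale=1.0, size=(50, 2)).T   # class 1, 2 x 50
X2 = rng.normal(loc=[3, 3], scale=1.0, size=(50, 2)).T   # class 2, 2 x 50
w = fisher_lda(X1, X2)
y1, y2 = w @ X1, w @ X2
print(y1.mean(), y2.mean())   # the projected class means are well separated
```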
PCA vs LDA
PCA: Perform dimensionality reduction while preserving as much of the variance in the high dimensional space as possible.
LDA: Perform dimensionality reduction while preserving as much of the class discriminatory information as possible.
Other Component Analysis Algorithms
Independent Component Analysis (ICA)
PCA: uncorrelated PCs; ICA: independent PCs
Blind source separation (e.g., M. Poh, D. McDuff, and R. Picard, 2010)
Recall: independence implies uncorrelatedness, but uncorrelatedness does not imply independence
Independent Component Analysis
[Diagram: Sound Sources 1-3 → Mixtures 1-3 → ICA → Outputs 1-3]
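A blind-source-separation sketch using scikit-learn's FastICA; the synthetic signals and the mixing matrix A are made up to stand in for the sound sources and microphone mixtures:

```python
import numpy as np
from sklearn.decomposition import FastICA

# three synthetic "sound sources"
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
s3 = np.random.default_rng(0).laplace(size=t.shape)
S = np.c_[s1, s2, s3]                   # n_samples x n_sources

A = np.array([[1.0, 0.5, 0.3],
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])         # unknown mixing matrix
X = S @ A.T                             # three observed "microphone" mixtures

ica = FastICA(n_components=3, random_state=0)
S_hat = ica.fit_transform(X)            # recovered sources, up to permutation and scale
print(S_hat.shape)                      # (2000, 3)
```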
Other Component Analysis Algorithms
Curvilinear Component Analysis (CCA)
PCA/LDA: linear projection
CCA: non-linear projection while preserving proximity between points
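CCA itself is not available in common Python libraries, so as a stand-in here is a sketch of another non-linear, neighborhood-preserving reduction (Isomap from scikit-learn) applied to a curved 3-D manifold:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# a curved 2-D sheet embedded in 3-D; linear PCA cannot flatten it without folding
X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # 1000 x 3
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(Y.shape)   # (1000, 2): the roll is "unrolled" into the plane
```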
What you should know
Why dimensionality reduction?
PCA/LDA assumptions
PCA/LDA algorithms
PCA vs LDA
Applications
Possible extensions
Recommended Readings
Tutorials:
• A Tutorial on PCA. J. Shlens
• A Tutorial on PCA. L. Smith
• Fisher Linear Discriminant Analysis. M. Welling
• Eigenfaces (http://www.cs.princeton.edu/~cdecoro/eigenfaces/). C. DeCoro
Publications:
• Eigenfaces for Recognition. M. Turk and A. Pentland
• Using Discriminant Eigenfeatures for Image Retrieval. D. Swets, J. Weng
• Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. P. Belhumeur, J. Hespanha, and D. Kriegman
• PCA versus LDA. A. Martinez, A. Kak