Dimensionality Reduction: Linear Discriminant Analysis and Principal Component Analysis
CMSC 678, UMBC
Outline
Linear Algebra/Math Review
Two Methods of Dimensionality Reduction
Linear Discriminant Analysis (LDA, LDiscA)
Principal Component Analysis (PCA)
Covariance
covariance: how (linearly) correlated two variables are

covariance of variables i and j, where $x_{ki}$ is the value of variable $i$ in object $k$ and $\mu_i$ is the mean of variable $i$:

$$\sigma_{ij} = \frac{1}{N-1}\sum_{k=1}^{N}(x_{ki}-\mu_i)(x_{kj}-\mu_j)$$
Covariance

The covariance matrix collects all pairwise covariances and is symmetric, $\sigma_{ij} = \sigma_{ji}$:

$$\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1D} \\ \vdots & \ddots & \vdots \\ \sigma_{D1} & \cdots & \sigma_{DD} \end{pmatrix}$$
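The definition above can be checked directly in code. A minimal sketch with NumPy (the data values are made up for illustration):

```python
import numpy as np

# toy data (values made up for illustration):
# N = 4 objects (rows), D = 2 variables (columns)
X = np.array([[ 2.0,  1.0],
              [ 4.0,  2.0],
              [ 2.0, -1.0],
              [-2.0, -1.0]])

N = X.shape[0]
mu = X.mean(axis=0)                      # per-variable means mu_j

# sigma_ij = 1/(N-1) * sum_k (x_ki - mu_i) * (x_kj - mu_j)
Sigma = (X - mu).T @ (X - mu) / (N - 1)

# agrees with NumPy's built-in estimator (rows are observations)
assert np.allclose(Sigma, np.cov(X, rowvar=False))
# and is symmetric: sigma_ij == sigma_ji
assert np.allclose(Sigma, Sigma.T)
```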
Eigenvalues and Eigenvectors

$$A x = \lambda x \qquad \text{(matrix } A\text{, vector } x\text{, scalar } \lambda\text{)}$$

for a given matrix operation (multiplication): which non-zero vectors are merely scaled (changed only by a single scalar multiple)?
An example:

$$A = \begin{pmatrix} 1 & 5 \\ 0 & 1 \end{pmatrix}$$
Look for vectors satisfying $Ax = \lambda x$:

$$\begin{pmatrix} 1 & 5 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + 5y \\ y \end{pmatrix} \overset{?}{=} \lambda \begin{pmatrix} x \\ y \end{pmatrix}$$
The only non-zero direction that $A$ simply scales is $(1, 0)^\top$:

$$\begin{pmatrix} 1 & 5 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = 1 \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$
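This shear example can be verified numerically; a small sketch using NumPy:

```python
import numpy as np

A = np.array([[1.0, 5.0],
              [0.0, 1.0]])

# A is a shear: both eigenvalues are 1, and (up to scale) the only
# eigenvector is (1, 0) -- exactly the vector found on the slide
vals, vecs = np.linalg.eig(A)
assert np.allclose(vals, [1.0, 1.0])

x = np.array([1.0, 0.0])
assert np.allclose(A @ x, 1.0 * x)   # A x = lambda x with lambda = 1
```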
Outline
Linear Algebra/Math Review
Two Methods of Dimensionality Reduction
Linear Discriminant Analysis (LDA, LDiscA)
Principal Component Analysis (PCA)
Dimensionality Reduction
Original (lightly preprocessed) data
Compressed representation
N instances
D input features
L reduced features
Dimensionality Reduction
clarity of representation vs. ease of understanding
oversimplification: loss of important or relevant information
Courtesy Antano Žilinsko
Why “maximize” the variance?

How can we efficiently summarize? We maximize the variance captured within our summarization; we do not increase the variance in the dataset itself.

How can we capture the most information with the fewest axes?
Summarizing Redundant Information

points: (2, 1), (2, −1), (−2, −1), (4, 2)
In the standard basis: (2, 1) = 2·(1, 0) + 1·(0, 1)
Instead, choose basis vectors u1 = (2, 1) and u2 = (2, −1). Then:
(2, 1) = 1·u1 + 0·u2
(4, 2) = 2·u1 + 0·u2
(−2, −1) = −1·u1 + 0·u2
(Is it the most general choice? These basis vectors aren’t orthogonal.)
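Coordinates in a non-orthogonal basis can't be read off with dot products; they come from solving a small linear system. A sketch using the slide's basis vectors:

```python
import numpy as np

# basis vectors from the slide, stacked as columns of B
u1, u2 = np.array([2.0, 1.0]), np.array([2.0, -1.0])
B = np.column_stack([u1, u2])

# the coordinates c of a point v in this (non-orthogonal) basis solve B c = v
for v, expected in [((2, 1), (1, 0)),      # (2,1)   = 1*u1 + 0*u2
                    ((4, 2), (2, 0)),      # (4,2)   = 2*u1
                    ((-2, -1), (-1, 0))]:  # (-2,-1) = -u1
    c = np.linalg.solve(B, np.array(v, dtype=float))
    assert np.allclose(c, expected)
```

Because u1 and u2 are not orthogonal, projecting with dot products (as one would onto an orthonormal basis) would give different, wrong coordinates here.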
Outline
Linear Algebra/Math Review
Two Methods of Dimensionality Reduction
Linear Discriminant Analysis (LDA, LDiscA)
Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA, LDiscA) and Principal Component Analysis (PCA)
Summarize D-dimensional input data by uncorrelated axes
Uncorrelated axes are also called principal components
Use the first L components to account for as much variance as possible
Geometric Rationale of LDiscA & PCA
Objective: to rigidly rotate the axes of the D-dimensional space to new positions (principal axes):
ordered such that principal axis 1 has the highest variance, axis 2 has the next highest variance, .... , and axis D has the lowest variance
covariance among each pair of the principal axes is zero (the principal axes are uncorrelated)
Courtesy Antano Žilinsko
Remember: MAP Classifiers are Optimal for Classification

$$\min_{\mathbf{w}} \; \mathbb{E}\left[\ell_{0/1}(y_i, \hat{y}_i)\right] \;\;\rightarrow\;\; \max_{\mathbf{w}} \; p(\hat{y}_i = y_i \mid x_i)$$

$$p(\hat{y}_i = y_i \mid x_i) \;\propto\; p(x_i \mid \hat{y}_i)\, p(\hat{y}_i), \qquad x_i \in \mathbb{R}^D$$

(posterior ∝ class-conditional likelihood × class prior)
Linear Discriminant Analysis
MAP Classifier where:
1. class-conditional likelihoods are Gaussian
2. common covariance among class likelihoods
LDiscA: (1) What if likelihoods are Gaussian

$$p(\hat{y}_i = y_i \mid x_i) \propto p(x_i \mid \hat{y}_i)\, p(\hat{y}_i)$$

$$p(x_i \mid k) = \mathcal{N}(\mu_k, \Sigma_k) = \frac{\exp\left(-\frac{1}{2}(x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k)\right)}{(2\pi)^{D/2}\, |\Sigma_k|^{1/2}}$$

https://upload.wikimedia.org/wikipedia/commons/5/57/Multivariate_Gaussian.png
https://en.wikipedia.org/wiki/Normal_distribution
LDiscA: (2) Shared Covariance

$$\log\frac{p(\hat{y}_i = k \mid x_i)}{p(\hat{y}_i = l \mid x_i)} = \log\frac{p(x_i \mid k)}{p(x_i \mid l)} + \log\frac{p(k)}{p(l)}$$
Substituting the Gaussian likelihoods:

$$= \log\frac{p(k)}{p(l)} + \log\frac{\exp\left(-\frac{1}{2}(x_i - \mu_k)^\top \Sigma_k^{-1}(x_i - \mu_k)\right) \Big/ \left((2\pi)^{D/2}\,|\Sigma_k|^{1/2}\right)}{\exp\left(-\frac{1}{2}(x_i - \mu_l)^\top \Sigma_l^{-1}(x_i - \mu_l)\right) \Big/ \left((2\pi)^{D/2}\,|\Sigma_l|^{1/2}\right)}$$
Setting $\Sigma_l = \Sigma_k = \Sigma$ (the shared covariance), the normalizers and determinants cancel:

$$= \log\frac{p(k)}{p(l)} + \log\frac{\exp\left(-\frac{1}{2}(x_i - \mu_k)^\top \Sigma^{-1}(x_i - \mu_k)\right)}{\exp\left(-\frac{1}{2}(x_i - \mu_l)^\top \Sigma^{-1}(x_i - \mu_l)\right)}$$
Expanding the quadratics:

$$= \log\frac{p(k)}{p(l)} - \frac{1}{2}(\mu_k + \mu_l)^\top \Sigma^{-1}(\mu_k - \mu_l) + x_i^\top \Sigma^{-1}(\mu_k - \mu_l)$$

linear in $x_i$ (check for yourself: why did the quadratic $x_i$ terms cancel?)
Rewriting only in terms of $x_i$ (the data) and single-class terms:

$$= \left( x_i^\top \Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^\top \Sigma^{-1}\mu_k + \log p(k) \right) - \left( x_i^\top \Sigma^{-1}\mu_l - \frac{1}{2}\mu_l^\top \Sigma^{-1}\mu_l + \log p(l) \right)$$
Classify via Linear Discriminant Functions

$$\delta_k(x_i) = x_i^\top \Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^\top \Sigma^{-1}\mu_k + \log p(k)$$

$\operatorname{argmax}_k \; \delta_k(x_i)$ is equivalent to the MAP classifier
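The discriminant function above translates directly to code. A minimal sketch; the class means, priors, and shared covariance here are made up for illustration:

```python
import numpy as np

def discriminant(x, mu_k, Sigma_inv, prior_k):
    """delta_k(x) = x^T Sigma^{-1} mu_k - 1/2 mu_k^T Sigma^{-1} mu_k + log p(k)"""
    return x @ Sigma_inv @ mu_k - 0.5 * mu_k @ Sigma_inv @ mu_k + np.log(prior_k)

# made-up parameters: two classes sharing one covariance
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
priors = [0.5, 0.5]
Sigma_inv = np.linalg.inv(np.eye(2))

x = np.array([2.5, 2.9])        # much closer to the second class's mean
scores = [discriminant(x, m, Sigma_inv, p) for m, p in zip(mus, priors)]
pred = int(np.argmax(scores))   # argmax_k delta_k(x) = MAP class
assert pred == 1
```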
LDiscA

Parameters to learn: $\{p(k)\}_k$, $\{\mu_k\}_k$, $\Sigma$

$$p(k) \propto N_k$$

where $N_k$ is the number of items labeled with class $k$.
$$\mu_k = \frac{1}{N_k}\sum_{i:\, y_i = k} x_i$$
One option for $\Sigma$ is the pooled within-class covariance:

$$\Sigma = \frac{1}{N-K}\sum_k \mathrm{scatter}_k = \frac{1}{N-K}\sum_k \sum_{i:\, y_i = k} (x_i - \mu_k)(x_i - \mu_k)^\top$$
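These estimators are easy to write out. A sketch on synthetic labeled data (the two-blob dataset is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy labeled data: two Gaussian blobs sharing a covariance
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),
               rng.normal([3.0, 3.0], 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

N, K = len(X), 2
priors = np.array([(y == k).mean() for k in range(K)])      # p(k) ∝ N_k
mus = np.array([X[y == k].mean(axis=0) for k in range(K)])  # per-class means

# pooled within-class covariance: sum of per-class scatters, divided by N - K
Sigma = sum((X[y == k] - mus[k]).T @ (X[y == k] - mus[k])
            for k in range(K)) / (N - K)
```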
Computational Steps for Full-Dimensional LDiscA

1. Compute means, priors, and covariance
2. Diagonalize the covariance (eigendecomposition):

$$\Sigma = U D U^\top$$

where $D$ is the diagonal matrix of eigenvalues and $U$ is a $D \times D$ orthonormal matrix of eigenvectors
3. Sphere the data (get unit covariance):

$$X^* = D^{-1/2}\, U^\top X$$

4. Classify according to the linear discriminant functions $\delta_k(x_i^*)$
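The sphering step (2–3) can be checked numerically: after the transform, the sample covariance is the identity. A sketch (the covariance used to generate the data is made up; the code stores observations as rows, so the slide's $X^* = D^{-1/2}U^\top X$ becomes a right-multiplication):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=500)

Sigma = np.cov(X, rowvar=False)
D_vals, U = np.linalg.eigh(Sigma)      # Sigma = U D U^T (symmetric)

# X* = D^{-1/2} U^T X, applied row-wise since our rows are observations
X_star = (np.diag(D_vals ** -0.5) @ U.T @ X.T).T

# sphered data has unit covariance (up to floating-point error)
assert np.allclose(np.cov(X_star, rowvar=False), np.eye(2), atol=1e-8)
```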
Two Extensions to LDiscA

Quadratic Discriminant Analysis (QDA): keep separate covariances per class

$$\delta_k(x_i) = -\frac{1}{2}(x_i - \mu_k)^\top \Sigma_k^{-1}(x_i - \mu_k) + \log p(k) - \frac{\log|\Sigma_k|}{2}$$
Regularized LDiscA: interpolate between the shared covariance estimate (LDiscA) and the class-specific estimates (QDA)

$$\Sigma_k(\alpha) = \alpha\,\Sigma_k + (1 - \alpha)\,\Sigma$$
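The interpolation is a one-liner; a sketch with made-up covariances showing that the endpoints recover QDA ($\alpha = 1$) and LDiscA ($\alpha = 0$):

```python
import numpy as np

def regularized_cov(Sigma_k, Sigma, alpha):
    """Sigma_k(alpha) = alpha * Sigma_k + (1 - alpha) * Sigma"""
    return alpha * Sigma_k + (1.0 - alpha) * Sigma

Sigma_k = np.array([[2.0, 0.3], [0.3, 1.0]])   # class-specific (QDA-style)
Sigma = np.eye(2)                              # shared (LDiscA-style)

assert np.allclose(regularized_cov(Sigma_k, Sigma, 1.0), Sigma_k)  # pure QDA
assert np.allclose(regularized_cov(Sigma_k, Sigma, 0.0), Sigma)    # pure LDiscA
```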
Vowel Classification
LDiscA (left) vs. QDA (right)
ESL 4.3
Regularized LDiscA

$$\Sigma_k(\alpha) = \alpha\,\Sigma_k + (1 - \alpha)\,\Sigma$$

ESL 4.3
Supervised → Unsupervised

Supervised learning: learning with a teacher. You had training data which was (feature, label) pairs, and the goal was to learn a mapping from features to labels.
Unsupervised learning: learning without a teacher. Only features, and no labels.

Why is unsupervised learning useful?
Visualization: dimensionality reduction (lower-dimensional features might also help learning)
Discovering hidden structure in the data: clustering
Outline
Linear Algebra/Math Review
Two Methods of Dimensionality Reduction
Linear Discriminant Analysis (LDA, LDiscA)
Principal Component Analysis (PCA)
Geometric Rationale of LDiscA & PCA
Objective: to rigidly rotate the axes of the D-dimensional space to new positions (principal axes):
ordered such that principal axis 1 has the highest variance, axis 2 has the next highest variance, .... , and axis D has the lowest variance
covariance among each pair of the principal axes is zero (the principal axes are uncorrelated)
Adapted from Antano Žilinsko
LDiscA, but now without the labels!
L-Dimensional PCA

1. Compute the mean $\mu$ and covariance $\Sigma$:

$$\mu = \frac{1}{N}\sum_i x_i \qquad \Sigma = \frac{1}{N}\sum_i (x_i - \mu)(x_i - \mu)^\top$$

2. Sphere the data (zero mean, unit covariance)
3. Compute the (top L) eigenvectors from the sphered data, via the eigendecomposition $\Sigma^* = V \tilde{D} V^\top$
4. Project the data
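The steps above (here without the optional sphering, working directly on centered data) can be sketched in NumPy; the 2-D dataset is generated with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([5, 3], [[3.0, 1.2], [1.2, 1.0]], size=400)

mu = X.mean(axis=0)                 # 1. mean
Xc = X - mu                         # center the data
Sigma = Xc.T @ Xc / len(X)          # covariance (1/N, as on the slide)

vals, V = np.linalg.eigh(Sigma)     # 3. eigenvectors of the covariance
order = np.argsort(vals)[::-1]      # sort by decreasing variance
vals, V = vals[order], V[:, order]

L = 1
Z = Xc @ V[:, :L]                   # 4. project onto the top-L components

# the projection's variance equals the top eigenvalue,
# and the full set of PC coordinates is uncorrelated
assert np.isclose(Z.var(ddof=0), vals[0])
Z_full = Xc @ V
assert abs(np.cov(Z_full, rowvar=False)[0, 1]) < 1e-9
```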
2D Example of PCA

[Scatter plot of variables X1 and X2: the two variables have positive covariance and each has a similar variance; the sample means are marked with a +.]

Courtesy Antano Žilinsko
Configuration is Centered

Subtract the component-wise mean.

[Same scatter plot after centering: the point cloud now sits at the origin.]

Courtesy Antano Žilinsko
Compute Principal Components

PC 1 has the highest possible variance (9.88); PC 2 has a variance of 3.03; PC 1 and PC 2 have zero covariance.

[Scatter plot with the PC 1 and PC 2 axes drawn through the centered data.]

Courtesy Antano Žilinsko
[The same PC 1 and PC 2 axes overlaid on the original X1 and X2 axes.]

Courtesy Antano Žilinsko
PC axes are a rigid rotation of the original variables. PC 1 is simultaneously the direction of maximum variance and a least-squares “line of best fit” (squared distances of points away from PC 1 are minimized).

[Scatter plot showing PC 1 and PC 2 relative to variables X1 and X2.]

Courtesy Antano Žilinsko
Generalization to p dimensions

If we take the first k principal components, they define the k-dimensional “hyperplane of best fit” to the point cloud.

Of the total variance of all p variables, PCs 1 to k represent the maximum possible proportion of that variance that can be displayed in k dimensions.

Courtesy Antano Žilinsko
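The "proportion of variance" is just the top-k eigenvalues over their total. Using the PC variances from the 2-D example above:

```python
import numpy as np

# PC variances (eigenvalues) from the worked 2-D example: 9.88 and 3.03
eigenvalues = np.array([9.88, 3.03])

explained = eigenvalues / eigenvalues.sum()

# PC 1 alone captures about 76.5% of the total variance
assert np.isclose(explained[0], 9.88 / (9.88 + 3.03))
assert np.isclose(explained.sum(), 1.0)
```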
How many axes are needed?

Does the (k+1)th principal axis represent more variance than would be expected by chance?

A common rule of thumb, when PCA is based on correlations, is that axes with eigenvalues > 1 are worth interpreting.

Courtesy Antano Žilinsko
PCA as Reconstruction Error

With $Z = XU$, where $X$ is $N \times D$ and $U$ is $D \times L$ with orthonormal columns:

$$\min_U \; \|X - ZU^\top\|_F^2 \;=\; \min_U \; \|X - XUU^\top\|_F^2 \;=\; \min_U \; \left( \|X\|_F^2 - \|XU\|_F^2 \right) \;=\; \min_U \; \left( C - \|XU\|_F^2 \right)$$

maximizing variance ↔ minimizing reconstruction error
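This equivalence can be checked numerically: among orthonormal directions, the max-variance direction gives the smallest reconstruction error, and the identity $\|X - XUU^\top\|_F^2 = \|X\|_F^2 - \|XU\|_F^2$ holds exactly. A sketch on made-up 3-D data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0, 0], np.diag([5.0, 2.0, 0.5]), size=300)
X = X - X.mean(axis=0)   # center

vals, V = np.linalg.eigh(X.T @ X)
order = np.argsort(vals)[::-1]   # eigenvectors by decreasing variance
V = V[:, order]

def recon_error(U):
    # ||X - X U U^T||_F^2 for a D x L matrix U with orthonormal columns
    return np.linalg.norm(X - X @ U @ U.T) ** 2

U_best = V[:, :1]     # top principal direction
U_other = V[:, 1:2]   # a different orthonormal direction

# the max-variance direction gives the smaller reconstruction error...
assert recon_error(U_best) < recon_error(U_other)
# ...and ||X||^2 - ||XU||^2 equals the reconstruction error exactly
assert np.isclose(recon_error(U_best),
                  np.linalg.norm(X) ** 2 - np.linalg.norm(X @ U_best) ** 2)
```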
Slides Credit
https://www.mii.lt/zilinskas/uploads/visualization/lectures/lect4/lect4_pca/PCA1.ppt