Lecture 13: Principal Components Analysis
Statistical Learning (BST 263)
Jeffrey W. Miller
Department of Biostatistics
Harvard T.H. Chan School of Public Health
(Figures from An Introduction to Statistical Learning, James et al., 2013,
and The Elements of Statistical Learning, Hastie et al., 2008)
Outline
Unsupervised Learning
Principal Components Analysis (PCA)
Covariance method of computing PCA
SVD method of computing PCA
Principal Components Regression (PCR)
High-dimensional issues
Unsupervised Learning
So far, we have focused on supervised learning.
In supervised learning, we are given examples (xi, yi), and we try to predict y for future x's.
In unsupervised learning, we are given only xi's, with no outcome yi.
Unsupervised learning is less well defined, but basically consists of finding some structure in the x's.
Two most common types of unsupervised learning:
1. Finding lower-dimensional representations.
2. Finding clusters / groups.
Unsupervised Learning
Data: x1, . . . , xn.
Often, xi ∈ Rp, but the xi's could be anything.
- e.g., time-series data, documents, images, movies, mixed-type.
Unsupervised learning can be used in many different ways:
- Visualization of high-dimensional data
- Exploratory data analysis
- Feature construction for supervised learning
- Discovery of hidden structure
- Removal of unwanted variation (e.g., batch effects, technical biases, population structure)
- Matrix completion or de-noising
- Density estimation
- Compression
Unsupervised Learning
In supervised learning, we can use the outcome y to reliably evaluate performance.
This enables us to:
- choose model settings and
- estimate test performance
via cross-validation or similar train/test split approaches.
However, we don't have this luxury in unsupervised learning.
A challenge is that often there is no standard way to evaluate the performance of an unsupervised method.
Principal Components Analysis (PCA)
PCA is an unsupervised method for dimension reduction.
That is, finding a lower-dimensional representation.
PCA is the oldest and most commonly used method in this class.
- PCA goes back at least to Karl Pearson in 1901.
Basic idea: Find a low-dimensional representation that approximates the data as closely as possible in Euclidean distance.
PCA example: Ad spending
PC1 and PC2 scores:
In this example, pop and ad are both highly correlated with PC1:
Pop and ad versus PC2:
PCA example: Half-sphere simulation
PCA directions, scores, and scales

PC directions
- The PC1 direction is the direction along which the data has the largest variance.
- The PCm direction is the direction along which the data has the largest variance, among all directions orthogonal to the first m − 1 PC directions.

PC scores
- The PCm score for point xi is the position of xi along the mth PC direction.
- Mathematically, the PCm score is the dot product of xi with the PCm direction.

PC scales
- The PCm scale is the standard deviation of the data along the PCm direction.
Other interpretations of PCA
Best approximation interpretation:
- PC1 score × PC1 direction = the best 1-dimensional approximation to the data in terms of MSE.
- ∑_{m=1}^{M} PCm score × PCm direction = the best M-dimensional approximation to the data in terms of MSE.

Eigenvector interpretation:
- The PCm direction is the mth eigenvector (normalized to unit length) of the covariance matrix, sorting the eigenvectors by the size of their eigenvalues.
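The best-approximation interpretation can be checked numerically. Below is a minimal sketch in Python/NumPy (the lecture's tooling is R, but the math is identical; the data here is simulated purely for illustration): the rank-M reconstruction built from the top M PC directions should have MSE no larger than projection onto any other M-dimensional subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)              # center the data

# Eigenvector interpretation: PC directions = eigenvectors of the covariance
C = X.T @ X / (X.shape[0] - 1)
eigvals, V = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
V = V[:, order]                     # columns = PC directions, largest variance first

M = 2
scores = X @ V[:, :M]               # PC1..PCM scores
X_approx = scores @ V[:, :M].T      # sum of score_m x direction_m: best rank-M approx
```

By the best-approximation property, `((X - X_approx)**2).mean()` is minimal over all M-dimensional linear projections of the centered data.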
Computing the PCA directions and scores
Covariance method
- Simplest way of doing PCA.
- Based on the eigenvector interpretation.
- Slow when p is large.

Singular value decomposition (SVD) method
- Faster for large p.
- Truncated SVD allows us to compute only the top PCs, which is much faster than computing all PCs when p is large.
- More numerically stable.
The SVD method is usually preferred.
Covariance method of computing PCA
Data: x1, . . . , xn where xi ∈ Rp.
For simplicity, assume the data has been centered so that (1/n) ∑_{i=1}^{n} x_ij = 0 for each j.
Usually, it is a good idea to also scale the data to have unit variance along each dimension j.
However, if the data are already in common units then it may be better to not standardize the scales.
Covariance method of computing PCA
Put the data into an n × p matrix X whose ith row is x_i^T.

Estimate the covariance matrix:
C = (1/(n−1)) X^T X = (1/(n−1)) ∑_{i=1}^{n} x_i x_i^T.

Compute the eigendecomposition: C = V Λ V^T.
- Λ = diag(λ_1, …, λ_p), where λ_1 ≥ ⋯ ≥ λ_p ≥ 0 are the eigenvalues of C.
- V is orthogonal, and the mth column of V is the mth eigenvector of C.

PCm direction is the mth column of V.
PCm scale is √λ_m.
PC score vector (i.e., PC1–PCp scores) for xi is V^T x_i.
- So XV gives us all the scores for all the xi's.
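The steps above can be sketched in Python/NumPy (the lecture uses R, where the `eigen` function plays the role of `np.linalg.eigh`; the data below is simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])
X = X - X.mean(axis=0)              # center each column
n = X.shape[0]

C = X.T @ X / (n - 1)               # sample covariance matrix
eigvals, V = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
idx = np.argsort(eigvals)[::-1]     # sort descending, lambda_1 >= ... >= lambda_p
eigvals, V = eigvals[idx], V[:, idx]

directions = V                      # PCm direction = mth column of V
scales = np.sqrt(eigvals)           # PCm scale = sqrt(lambda_m)
scores = X @ V                      # row i = PC score vector V^T x_i
```

The sample variance of the mth score column equals λ_m, matching the definition of the PCm scale as the standard deviation along the PCm direction.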
Notes on the covariance method
A matrix V ∈ R^{p×p} is orthogonal if V V^T = V^T V = I. Since V is an orthogonal matrix,
- the PC directions are orthogonal to one another,
- the PC directions have unit length (Euclidean norm of 1), and
- the PC scores XV are a rotated version of the data.

Most languages have tools for computing the eigendecomposition, e.g., the eigen function in R.

Notation: diag(λ_1, …, λ_p) denotes the p × p diagonal matrix with λ_1, …, λ_p on the diagonal and zeros elsewhere.

The identity C = (1/(n−1)) ∑_{i=1}^{n} x_i x_i^T follows from "block matrix multiplication":
X^T X = [x_1 ⋯ x_n] [x_1^T; …; x_n^T] = ∑_{i=1}^{n} x_i x_i^T.
PCA example: Hand-written digits
PCA can be used to make a low-dimensional approximation to each image.
The PCA approximation is the sum of scores times directions, plus the sample mean (since the data is centered at 0):
PC approx = sample mean + score1 × dir1 + score2 × dir2
In this example, each PC direction is, itself, an image.
PCA example: Gene expression
PCA scores for gene expression of leukemia samples:
(figure from Armstrong et al., 2002, Nature Genetics)
SVD method of computing PCA
The covariance method is simple, but slow when p is large.
SVD = Singular Value Decomposition.
The SVD method is faster for large p.
Truncated SVD allows us to compute only the top PCs, whichis much faster than computing all PCs when p is large.
The SVD of X ∈ R^{n×p} is X = U S V^T, where
- U ∈ R^{n×n} is orthogonal,
- V ∈ R^{p×p} is orthogonal, and
- S ∈ R^{n×p} is zero everywhere except the diagonal entries s_11 ≥ s_22 ≥ ⋯ ≥ 0, which are called the singular values.
SVD method of computing PCA
Put the data into an n × p matrix X whose ith row is x_i^T.

Compute the SVD: X = U S V^T.

PCm direction is the mth column of V.
PCm scale is s_mm / √(n − 1).
PC score vector (i.e., PC1–PCp scores) for xi is V^T x_i.

Connection to covariance method:
V Λ V^T = C = (1/(n−1)) X^T X = (1/(n−1)) V S^T U^T U S V^T = (1/(n−1)) V S^T S V^T = V ((1/(n−1)) S^T S) V^T.
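A sketch of the SVD method in Python/NumPy, checking the connection to the covariance method numerically (simulated data for illustration; note that PC directions from the two methods agree only up to sign, since eigenvectors and singular vectors are determined up to a sign flip):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)              # center the data
n = X.shape[0]

# SVD method
U, s, Vt = np.linalg.svd(X, full_matrices=False)
directions_svd = Vt.T               # PCm direction = mth column of V
scales_svd = s / np.sqrt(n - 1)     # PCm scale = s_mm / sqrt(n - 1)

# Covariance method, for comparison
eigvals, V = np.linalg.eigh(X.T @ X / (n - 1))
idx = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[idx], V[:, idx]
```

Here scales_svd**2 should equal the eigenvalues, i.e., λ_m = s_mm^2 / (n − 1), exactly as in the derivation above.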
SVD method of computing PCA
The truncated SVD allows us to compute only the top PCs.
Truncated SVD computes U[, 1:nu], S, and V[, 1:nv] for user-specified choices of nu and nv.
This is much faster than computing all PCs when p is large.
Usually, we only need the top few PCs anyway.
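A sketch of truncated SVD in Python, assuming SciPy is available: `scipy.sparse.linalg.svds` computes only the top k singular triplets (it returns the singular values in ascending order, so we re-sort to match the PCA convention):

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 40))
X = X - X.mean(axis=0)              # center the data

k = 3                               # number of top PCs wanted
U, s, Vt = svds(X, k=k)             # truncated SVD: top-k only
order = np.argsort(s)[::-1]         # svds returns ascending order; re-sort descending
s, Vt = s[order], Vt[order, :]

top_directions = Vt.T               # p x k matrix of top PC directions
top_scores = X @ top_directions     # top-k PC scores for every x_i
```

The top singular values from the truncated computation match the leading values of a full SVD, at a fraction of the cost when p is large.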
Principal Components Regression (PCR)
We have seen methods of controlling variance by:
- using a less flexible model, e.g., fewer parameters,
- selecting a subset of predictors,
- regularization / shrinkage.

Another approach: transform the predictors to be lower dimensional.
- Various transformations: PCA, ICA, principal curves.

Combining PCA with linear regression leads to principal components regression (PCR).
Principal Components Regression (PCR)
PCR = PCA + linear regression:
- Choose how many PCs to use, say, M.
- Use PCA to define a feature vector ϕ(xi) containing the PC1, …, PCM scores for xi.
- Use least-squares linear regression with this model: Yi = ϕ(xi)^T β + εi.

PCR works well when the directions in which the original predictors vary most are the directions that are predictive of the outcome.

PCR versus least-squares:
- When M = p, PCR = least-squares.
- PCR has higher bias but lower variance.
- PCR can handle p > n.
- PCR can handle some collinearity. (Why?)
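The PCR recipe above can be sketched in Python/NumPy (simulated data for illustration; all names are illustrative, not from the lecture). The features ϕ(xi) are the top-M PC scores of the centered predictors, and the fit is ordinary least squares on those features plus an intercept:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, M = 100, 10, 3
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + 0.1 * rng.normal(size=n)

Xc = X - X.mean(axis=0)                     # center the predictors
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Phi = Xc @ Vt[:M].T                         # phi(x_i): top-M PC scores

# Least squares on the PC scores: y_i = phi(x_i)^T beta + eps_i
A = np.column_stack([np.ones(n), Phi])      # add an intercept column
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta_hat
```

When M = p, the PC scores span the same column space as the centered predictors, so PCR reproduces the least-squares fit exactly; smaller M trades bias for variance.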
Example #1: PCR versus Ridge and Lasso
Simulation example with p = 45 and n = 50.
True model is linear regression with all nonzero coefficients.
Example #2: PCR versus Ridge and Lasso
Simulation example with p = 45 and n = 50.
True model is linear regression with 2 nonzero coefficients.
Example #3: PCR versus Ridge and Lasso
Simulation example with p = 45.
True model is PCR with 5 nonzero PC coefficients.
PCR versus Ridge and Lasso
PCR does not select a subset of predictors/features.
PCR is more closely related to Ridge than Lasso.
Ridge can be thought of as a continuous version of PCR.
(See ESL Section 3.5 for more info.)
Cross-validation
Can choose PCR dimensionality M using cross-validation.
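A minimal sketch of choosing M by 5-fold cross-validation in Python/NumPy (simulated data; `pcr_mse` is a hypothetical helper written for this illustration, not from the lecture). For each candidate M, PCA and the regression fit are re-computed on the training folds only, and the held-out MSE is averaged:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 120, 8
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=n)

def pcr_mse(Xtr, ytr, Xte, yte, M):
    """Fit PCR with M components on the training data; return test MSE."""
    mu = Xtr.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xtr - mu, full_matrices=False)
    W = Vt[:M].T                                  # top-M PC directions (train only)
    A = np.column_stack([np.ones(len(ytr)), (Xtr - mu) @ W])
    b, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    At = np.column_stack([np.ones(len(yte)), (Xte - mu) @ W])
    return ((yte - At @ b) ** 2).mean()

folds = np.array_split(np.arange(n), 5)           # 5-fold CV index sets
cv = []
for M in range(1, p + 1):
    errs = [pcr_mse(np.delete(X, f, 0), np.delete(y, f), X[f], y[f], M)
            for f in folds]
    cv.append(np.mean(errs))
best_M = int(np.argmin(cv)) + 1                   # M minimizing CV error
```

Centering and computing the PC directions inside each fold matters: doing PCA on the full data before splitting would leak test information into the features.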
High-dimensional issues (p ≥ n)
Overfitting:
High-dimensional issues (p ≥ n)
R2, RSS, AIC, BIC often don't accurately assess fit (unless you reduce dimensionality first).
High-dimensional issues (p ≥ n)
Regularization or variable selection is key.
Tuning the flexibility knob(s) is important.
Adding uninformative features hurts performance.
Assess performance using held-out test sets (e.g., cross-validation).
References
James G, Witten D, Hastie T, and Tibshirani R (2013). An Introduction to Statistical Learning, Springer.
Hastie T, Tibshirani R, and Friedman J (2008). The Elements of Statistical Learning, Springer.