Principal Component Analysis Zelin Jia Shengbin Lin 10/20/2015.
Principal Component Analysis
Zelin Jia, Shengbin Lin
10/20/2015
What is PCA?
• An orthogonal transformation
• Converts a set of correlated variables into uncorrelated artificial variables (principal components)
• The resulting vectors are an orthogonal basis set
• A tool in exploratory data analysis
https://en.wikipedia.org/wiki/Principal_component_analysis
Why use PCA?
• Reduce the dimensionality of the data
• Compress the data
• Prepare the data for further analysis using other techniques
• Understand your data better by interpreting the loadings, and by graphing the derived variables
http://psych.colorado.edu/wiki/lib/exe/fetch.php?media=labs:learnr:emily_-_principal_components_analysis_in_r:pca_how_to.pdf
Dr. Peter Westfall
How PCA works
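1. PCA begins with the covariance matrix of the column-centered data matrix $X$: $\mathrm{Cov}(X) = X^T X / (n - 1)$. The constant factor $1/(n - 1)$ is often dropped, since it does not change the eigenvectors.
2. Calculate the eigenvectors and eigenvalues of the covariance matrix.
3. This gives a set of eigenvectors $z_i$ and eigenvalues $\lambda_i$ (constraint: $z_i^T z_i = 1$).
4. Arrange the eigenvectors in decreasing order of their eigenvalues.
5. Pick the leading eigenvectors and multiply the original data matrix $X$ by them to get the PC matrix.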
https://www.riskprep.com/all-tutorials/36-exam-22/132-understanding-principal-component-analysis-pca
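The five steps above can be sketched in plain Python. This is an illustrative sketch, not the R code used in the presentation: the helper names (`center`, `covariance`, `top_eigenpair`, `pca`) and the toy data are assumptions, and power iteration with deflation stands in for a library eigen-solver.

```python
import math

def center(rows):
    """Subtract each column mean so every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(x):
    """Step 1: sample covariance X^T X / (n - 1) of centered data x."""
    n, p = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_eigenpair(s, iters=1000):
    """Steps 2-3: leading eigenvalue and unit eigenvector by power iteration."""
    p = len(s)
    # Arbitrary start vector; assumed not orthogonal to the leading eigenvector.
    z = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(v * v for v in z))
    z = [v / norm for v in z]
    for _ in range(iters):
        w = [sum(s[i][j] * z[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break
        z = [v / norm for v in w]  # keeps the constraint z^T z = 1
    lam = sum(z[i] * s[i][j] * z[j] for i in range(p) for j in range(p))
    return lam, z

def pca(rows, k):
    """Return the k largest eigenvalues, their eigenvectors, and the PC scores."""
    x = center(rows)
    s = covariance(x)
    p = len(s)
    eigvals, eigvecs = [], []
    for _ in range(k):
        lam, z = top_eigenpair(s)
        eigvals.append(lam)
        eigvecs.append(z)
        # Step 4: deflation removes the component just found, so the next
        # pass returns the next-largest eigenvalue (decreasing order).
        s = [[s[i][j] - lam * z[i] * z[j] for j in range(p)] for i in range(p)]
    # Step 5: multiply the centered data by the chosen eigenvectors.
    scores = [[sum(r[j] * z[j] for j in range(p)) for z in eigvecs] for r in x]
    return eigvals, eigvecs, scores

# Toy data: 10 observations of 2 correlated variables (illustrative only).
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
eigvals, eigvecs, scores = pca(data, k=2)
print("eigenvalues:", eigvals)
print("first eigenvector:", eigvecs[0])
```

In practice a library eigen-solver or an SVD routine would replace `top_eigenpair`; the sketch only mirrors the order of the steps.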
Example of how PCA works (in R)
• A sample financial dataset with 8 variables and 25 observations
• Perform PCA on this data to reduce the number of variables from 8 to something more manageable
https://www.riskprep.com/all-tutorials/36-exam-22/132-understanding-principal-component-analysis-pca
Simulating PCA on uncorrelated and highly correlated data (in R)
PCA works better on more highly correlated data, because a greater reduction in dimensionality is achievable.
Provided by Dr. Peter Westfall
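The claim can be checked with a small simulation. The sketch below is not the R code referenced on the slide; it is a hypothetical two-variable example in Python, where `first_pc_share`, the seed, the sample size, and the noise level 0.1 are all assumptions chosen for illustration. For two variables, the largest eigenvalue of the 2 × 2 covariance matrix has a closed form, so no eigen-solver is needed.

```python
import math
import random

def first_pc_share(xs, ys):
    """Fraction of the total variance carried by the first PC of two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Largest eigenvalue of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    lam1 = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return lam1 / (sxx + syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

y_uncorrelated = noise                                        # independent of x
y_correlated = [xi + 0.1 * ei for xi, ei in zip(x, noise)]    # nearly a copy of x

share_uncorrelated = first_pc_share(x, y_uncorrelated)
share_correlated = first_pc_share(x, y_correlated)
print(f"uncorrelated: PC1 carries {share_uncorrelated:.1%} of the variance")
print(f"correlated:   PC1 carries {share_correlated:.1%} of the variance")
```

With uncorrelated variables the first component carries only about half of the variance, so dropping the second one loses a lot; with highly correlated variables one component carries almost everything.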
PCA standardization
Why: Without standardization, a variable measured in small numbers, even if it is the more important one, is overwhelmed by variables measured in larger numbers in what it contributes to the covariance.
https://www.riskprep.com/all-tutorials/36-exam-22/132-understanding-principal-component-analysis-pca
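A small sketch of the effect, using hypothetical toy numbers (revenue in dollars, profit margin as a fraction); the helper names are assumptions. The total variance, which is the trace of the covariance matrix, equals the sum of the eigenvalues, so a variable measured in large units takes nearly all of it and the leading components follow that variable. Standardizing each variable to unit variance, which is equivalent to running PCA on the correlation matrix, puts the variables on an equal footing.

```python
import statistics

# Hypothetical toy data on very different scales.
revenue = [1_200_000.0, 950_000.0, 1_500_000.0, 1_100_000.0, 1_300_000.0]
margin = [0.12, 0.08, 0.15, 0.05, 0.10]

def standardize(values):
    """Rescale to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def variance_shares(*columns):
    """Each variable's share of the total variance (trace of the covariance)."""
    variances = [statistics.variance(c) for c in columns]
    total = sum(variances)
    return [v / total for v in variances]

raw_shares = variance_shares(revenue, margin)
std_shares = variance_shares(standardize(revenue), standardize(margin))
print("raw shares:         ", raw_shares)
print("standardized shares:", std_shares)
```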
Properties of principal components
• The number of principal components is less than or equal to the number of original variables.
• The first principal component has the largest possible variance.
• Each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
https://en.wikipedia.org/wiki/Principal_component_analysis
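These properties can be written as a variance-maximization problem. A sketch in standard notation, assuming $S$ denotes the sample covariance matrix of the $p$ original variables (columns of the centered data matrix $X$) and $z_k$, $\lambda_k$ its unit eigenvectors and eigenvalues:

```latex
\begin{aligned}
z_1 &= \arg\max_{z^T z = 1} \; z^T S z, \\
z_k &= \arg\max_{\substack{z^T z = 1 \\ z^T z_j = 0,\; j < k}} \; z^T S z,
      \qquad k = 2, \dots, p, \\
\operatorname{Var}(X z_k) &= z_k^T S z_k = \lambda_k,
      \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 .
\end{aligned}
```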
What is SVD?
Applied_Regression_Analysis_A_Research_Tool.pdf
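A reminder of the standard definition of the singular value decomposition, written with the symbols $U$, $L^{1/2}$ and $Z$; the assumption here is that $X$ is an $n \times p$ matrix with $n \ge p$ and that the thin form of the SVD is meant:

```latex
X = U L^{1/2} Z^T, \qquad
U^T U = I_p, \quad Z^T Z = Z Z^T = I_p, \quad
L^{1/2} = \operatorname{diag}\!\left(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_p}\right),
\quad \lambda_1 \ge \dots \ge \lambda_p \ge 0 .
```

Here $U$ is $n \times p$, $Z$ is $p \times p$, the $\sqrt{\lambda_i}$ are the singular values of $X$, and the $\lambda_i$ are the eigenvalues of $X^T X$.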
Relationship between SVD and PCA
From the SVD we have $X = U L^{1/2} Z^T$
$\Rightarrow W = XZ = U L^{1/2}$
If X is an n × p matrix of observations on p variables, each column of W is a new variable defined as a linear transformation of the original variables.
Applied_Regression_Analysis_A_Research_Tool.pdf
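The step from the SVD to $W$ can be spelled out. A short derivation, assuming the usual SVD conventions $U^T U = I$, $Z^T Z = I$ and $L = \operatorname{diag}(\lambda_1, \dots, \lambda_p)$:

```latex
\begin{aligned}
X^T X &= Z L^{1/2} U^T U L^{1/2} Z^T = Z L Z^T, \\
W &= X Z = U L^{1/2} Z^T Z = U L^{1/2}, \\
W^T W &= L^{1/2} U^T U L^{1/2} = L .
\end{aligned}
```

So the columns of $Z$ are the eigenvectors of $X^T X$, the $\lambda_i$ are its eigenvalues, and the columns of $W$ (the principal components) are mutually orthogonal with sums of squares $\lambda_i$.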
EFA vs PCA
• EFA: provides a model that explains why the data look the way they do.
• PCA: does not explain why the data look the way they do. There is no model at all.
Provided by Dr. Peter Westfall
EFA vs PCA
http://www.gac-usp.com.br/resources/use_of_exploratory_factor_analysis_park_dailey.pdf
EFA vs PCA
EFA: in EFA one postulates that there is a smaller set of unobserved (latent) variables or constructs underlying the variables actually observed or measured (this is commonly done to assess validity)
PCA: in PCA one is simply trying to mathematically derive a relatively small number of variables to use to convey as much of the information in the observed/measured variables as possible
http://www.gac-usp.com.br/resources/use_of_exploratory_factor_analysis_park_dailey.pdf
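The contrast can be written down in symbols. A sketch in common textbook notation, not taken from the cited sources: $x$ is the vector of $p$ observed variables with mean $\mu$, $f$ is a vector of $m < p$ latent factors assumed to have identity covariance, $\Lambda$ is the $p \times m$ loading matrix, and $\varepsilon$ holds unique errors that are uncorrelated with $f$ and have diagonal covariance $\Psi$.

```latex
\begin{aligned}
\text{EFA:}\quad & x = \mu + \Lambda f + \varepsilon,
  \qquad \operatorname{Cov}(x) = \Lambda \Lambda^T + \Psi, \\
\text{PCA:}\quad & w = Z^T (x - \mu),
  \qquad \text{no error term and no latent variables.}
\end{aligned}
```

EFA makes a testable statement about the covariance structure of the data, while PCA is only a rotation of the observed variables.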
Application of PCA
• Data visualization
• Image compression
Data visualization
If a multivariate dataset is visualized as a set of coordinates in a high-dimensional data space (1 axis per variable), PCA can supply the user with a lower-dimensional picture.
https://en.wikipedia.org/wiki/Principal_component_analysis
Using PCA for image compression
The PCA formulation may be used as a digital image compression algorithm with a low level of loss.
http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1679-45082012000200004
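One way to see where the compression comes from is the rank-$k$ truncation of the SVD. A sketch, assuming the image is stored as an $n \times p$ matrix $X$ of pixel intensities with SVD $X = U L^{1/2} Z^T$, where the subscript $k$ keeps only the first $k$ columns (or diagonal entries); the $512 \times 512$ image and $k = 50$ are hypothetical numbers:

```latex
\begin{aligned}
X_k &= U_k L_k^{1/2} Z_k^T = W_k Z_k^T, \qquad
\lVert X - X_k \rVert_F^2 = \sum_{i > k} \lambda_i, \\
\text{storage:}\quad & k(n + p) \text{ numbers instead of } np, \quad
\text{e.g. } 50\,(512 + 512) = 51\,200 \text{ vs. } 512^2 = 262\,144 \;(\approx 19.5\%).
\end{aligned}
```

The loss is small when the discarded $\lambda_i$ are small. If the columns are centered before the decomposition, the $p$ column means must be stored as well.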
princomp vs prcomp
For prcomp:
The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy.
For princomp:
The calculation is done using eigen on the correlation or covariance matrix, as determined by cor. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use svd on x, as is done in prcomp.
http://stats.stackexchange.com/questions/20101/what-is-the-difference-between-r-functions-prcomp-and-princomp
Thanks!