Examples of Dimensionality Reduction
CIS 660 Data Mining
Sunnie Chung
Problem: The Curse of Dimensionality
High dimensionality makes it hard for machine learning algorithms to classify data by the
observed class variables (labels).
How to Deal with High Dimensionality
• Dimensionality Reduction WITHOUT Loss of Information
• Feature Selection
• Feature Extraction
• Identify Highly Positively Correlated Features to Merge or Remove
• Same Information: the correlation between height and urefu ("height" in Swahili) is ~= 1
• Example: Features X1, X2, X3, X4, X5 that are all highly positively correlated with Temperature
• Apply Transformation Algorithms to Extract the True Dimensions and Reduce Dimensionality
• Apply Well-Known Data Reduction Methods: PCA or SVD
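The correlation check described above can be sketched in a few lines. The data here is synthetic, and the feature names (height, urefu, weight) are illustrative, following the slide's height/urefu example:

```python
# Sketch: detecting a redundant feature via correlation (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
height_cm = rng.normal(170, 10, 100)
urefu_cm = height_cm + rng.normal(0, 0.1, 100)  # same quantity, tiny measurement noise
weight_kg = rng.normal(70, 8, 100)              # an unrelated feature

X = np.column_stack([height_cm, urefu_cm, weight_kg])
corr = np.corrcoef(X, rowvar=False)  # 3x3 correlation matrix

# height and urefu are nearly perfectly correlated -> merge/remove one of them
print(corr[0, 1])  # close to 1.0
```

A correlation near 1 between two columns means one carries no extra information and can be dropped without loss.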
PCA Procedure
1. Find the eigenvectors of the covariance matrix
2. The eigenvectors define the new space
Project the two red points onto the blue eigenvector e, which preserves the greatest variability (spread) of the data, rather than onto the green eigenvector, on which the two original red points project to the same location.
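The two steps above can be sketched with NumPy on synthetic correlated 2-D data; the variable e below stands in for the principal eigenvector (the "blue e"):

```python
# Sketch of the PCA procedure: eigenvectors of the covariance matrix
# define the new space, then data is projected onto them.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 3, 200)
y = 0.5 * x + rng.normal(0, 0.5, 200)   # a correlated feature pair
X = np.column_stack([x, y])

Xc = X - X.mean(axis=0)                  # center the data
cov = np.cov(Xc, rowvar=False)           # step 1: covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # its eigenvectors define the new space

order = np.argsort(eigvals)[::-1]        # sort axes by variance explained
e = eigvecs[:, order[0]]                 # principal axis (the "blue e")
scores = Xc @ e                          # step 2: project onto it

print(eigvals[order])                    # largest variance first
```

The variance of the projected scores equals the largest eigenvalue, which is exactly why projecting onto this axis preserves the greatest variability.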
See the rest of the procedure here.
http://eecs.csuohio.edu/~sschung/CIS660/MahalanobisDistance.pdf
https://plot.ly/ipython-notebooks/principal-component-analysis/
https://www.youtube.com/watch?v=IbE0tbjy6JQ&list=PLBv09BD7ez_5_yapAg86Od6JeeypkS4YM
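SVD, named earlier alongside PCA as a data reduction method, can be sketched the same way. This is a minimal example on synthetic data with a known low rank, so the reduction loses no information:

```python
# Sketch: rank-k reduction with a truncated SVD (synthetic rank-2 data).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 10))  # 10 columns, true rank 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_reduced = U[:, :k] * s[:k]          # 50x2 coordinates in the top-k subspace
X_approx = X_reduced @ Vt[:k]         # reconstruction from only 2 dimensions

print(np.allclose(X, X_approx))       # True: no information lost at rank 2
```

On real data the singular values decay gradually rather than dropping to zero, and k is chosen to keep most of the total variance.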
When to Use PCA for Your Data Analytics