Interpreting Principal Components
description
Transcript of Interpreting Principal Components
![Page 1: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/1.jpg)
Interpreting Principal Components
Simon Mason
International Research Institute for Climate Prediction
The Earth Institute of Columbia University
L i n k i n g S c i e n c e t o S o c i e t yL i n k i n g S c i e n c e t o S o c i e t y
![Page 2: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/2.jpg)
Retaining Principal Components
Principal components analysis is specifically designed as a data reduction technique.
How many of the new variables should be retained to represent the total variability of the original variables adequately? A stopping rule is required to identify at which point additional principal components are no longer required.
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
![Page 3: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/3.jpg)
Retaining Principal Components
There is a range of criteria that could be used to formulate a stopping rule:
Internal criteria
1. Total variance explained;
2. Marginal variance explained;
3. Comparison with other deleted/retained eigenvalues;
External criteria
4. Usefulness;
5. Physical interpretability.
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
![Page 4: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/4.jpg)
Retaining Principal Components
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
Total variance explained
Ensures a minimum loss of information, but
No a priori criteria for defining the proportion of signal.
1i
i
c
![Page 5: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/5.jpg)
Retaining Principal Components
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
Marginal variance explained
Ensures that each component explains a substantial proportion of the total variance.
Choice of c?
i c
![Page 6: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/6.jpg)
Retaining Principal Components
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
Marginal variance explained
1. Original variables
For the correlation matrix, the Guttmann - Kaiser criterion sets c = 1.
For the covariance matrix, Kaiser’s rule sets c to the average of the original variables:
tri p
Λ
![Page 7: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/7.jpg)
Retaining Principal Components
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
Marginal variance explained
2. Significant
a. The “broken stick” rule
b. Rule N
Randomization procedures.
tr 1p
ij ip j
Λ
![Page 8: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/8.jpg)
Retaining Principal Components
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
Similar variance explained
Delete if components with similar variance are deleted.
1. χ2 approximations
2. Scree test
Delete eigenvalues below the elbow.
![Page 9: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/9.jpg)
Retaining Principal Components
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
Similar variance explained
3. Log-eigenvalue test
Scree test using logarithms of eigenvalues.
Based on the assumption that the eigenvalues should decline exponentially.
![Page 10: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/10.jpg)
Retaining Principal Components
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
Usefulness
If principal components are to be used in other applications, retain the number that gives the best results.
Use cross-validation.
Perhaps retain subsets that do not necessarily include the first few components.
Possibly subject to sampling errors, especially subset selection.
![Page 11: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/11.jpg)
Retaining Principal Components
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
Physical interpretability
1. Time scores
Do the time scores differ from white noise?
2. Spatial loadings
Loadings identify “modes” of variability.
![Page 12: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/12.jpg)
Interpreting the Principal Components
Principal components are notoriously difficult to interpret physically.
The weights are defined to maximize the variance, not maximize the interpretability!
With spatial data (including climate data) the interpretation becomes even more difficult because there are geometric controls on the correlations between the data points.
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
![Page 13: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/13.jpg)
Buell patterns
Imagine a rectangular domain in which all the points are strongly correlated with their neighbours.
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
![Page 14: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/14.jpg)
Buell patterns
The points in the middle of the domain will have the strongest average correlations with all other points, simply because their average distance to all other grids is a minimum.
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
The strong correlations between neighbouring grids will be represented by PC 1, with the central grids dominating.
![Page 15: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/15.jpg)
Buell patterns
The points in the corners of the domain will have the weakest average correlations with all other points, simply because their average distance to all other grids is a maximum.
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
The weak correlations between distant grids will be represented by PC 2. The direction of the dipole reflects the domain shape.
![Page 16: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/16.jpg)
Buell patterns?
Are these real, or are they a function of the domain shape?
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
![Page 17: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/17.jpg)
Buell patterns
Because of domain shape dependency:
1. the first PC frequently indicates positive loadings with strongest values in the centre of the domain;
2. the second PC frequently indicates negative loadings on one side and positive loadings on the other side in the direction of the longest dimension of the domain.
Similar kinds of problems arise when using:
1. gridded data with converging longitudes, or simply with longitude spacing different from latitude spacing;
2. station data.
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
![Page 18: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/18.jpg)
Rotation
The principal component weights are defined to maximize the variance, not maximize the interpretability!
The weights could be redefined to meet alternative criteria. Rotation is sometimes performed to maximize the weights of as many metrics as possible, and to minimize the weights of the others.
An objective of rotation is to attain simple structure:
1. weights are either close to zero or close to one;
2. variables have high weights on only one component.
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
![Page 19: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/19.jpg)
Rotation
The principal component weights are defined to maximize the variance, not maximize the interpretability!
The weights could be redefined to meet alternative criteria. Rotation is sometimes performed to maximize the weights of as many metrics as possible, and to minimize the weights of the others.
An objective of rotation is to attain simple structure:
1. weights are either close to zero or close to one;
2. variables have high weights on only one component.
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
![Page 20: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/20.jpg)
Rotation
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !
Commonly used rotation procedures include:
• Varimax – maximises the variance of the squared loadings.
• Quartimin – oblique rotation
• Procrustes – maximises the similarity between one set of loadings and a target set. Can be orthogonal or oblique.
![Page 21: Interpreting Principal Components](https://reader036.fdocuments.us/reader036/viewer/2022062519/568151f1550346895dc029fd/html5/thumbnails/21.jpg)
Rotation
Rotation does NOT solve Buell pattern problems, nor station and uneven gridded data problems, it only reduces them. What if a mode does not have simple structure – for example, a general warming trend?
These problems are only of concern for interpretation.
Rotation may be redundant if the principal components are used as input into some other procedures.
L i n k i n g S c i e n c e t o S p o r t !L i n k i n g S c i e n c e t o S p o r t !