Factor Analysis and Principal Components


Page 1: Factor Analysis and Principal Components

Factor Analysis and Principal Components
Removing Redundancies and Finding Hidden Variables

Page 2: Factor Analysis and Principal Components

Two Goals
• Measurements are not independent of one another, and we need a way to reduce the dimensionality and remove collinearity – Principal components
• Measurements are affected by unobserved, latent factors, and we want to estimate those factors – Factor analysis

Page 3: Factor Analysis and Principal Components

Principal Components
• Qualities we are interested in studying can be measured indirectly
• Measurements have redundancy – e.g. multiple measurements reflect size
• Measurements reflect more than one property – e.g. size and shape

Page 4: Factor Analysis and Principal Components

Steps
• Select variables – generally interval or ratio scale variables – dichotomies can also be used
• Analysis usually begins with a covariance or correlation matrix of the variables
• Principal components are extracted that reflect correlations between variables
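A minimal sketch of these steps in R, assuming the built-in USArrests data as a stand-in (it is not part of the original slides). It extracts components by an eigen-decomposition of the correlation matrix; the variable names are illustrative only.

```r
# Sketch: principal components from a correlation matrix.
# USArrests ships with R and is used here only as a stand-in dataset.
X <- USArrests

R <- cor(X)                       # correlation matrix of the variables
e <- eigen(R)                     # eigen-decomposition of that matrix
eigenvalues  <- e$values          # variance "explained" by each component
eigenvectors <- e$vectors         # directions (weights) of the components
rownames(eigenvectors) <- colnames(X)

# Component scores: standardized data projected onto the eigenvectors
Z <- scale(X)
scores <- Z %*% eigenvectors

print(round(eigenvalues, 3))
print(round(eigenvalues / sum(eigenvalues), 3))   # proportion of variance
```

The variance of each column of scores equals the corresponding eigenvalue, which is why eigenvalues are read as variance "explained".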

Page 5: Factor Analysis and Principal Components

Terminology
• Eigenvalues – a measure of the variance “explained” by a component
• Eigenvectors – the dimensions extracted from the correlation matrix, i.e. the principal components
• Communality – the amount of variance in a variable “explained” by a subset of the components

Page 6: Factor Analysis and Principal Components

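Issues
• Need more cases than variables
• With a correlation matrix, the sum of the eigenvalues = number of variables; the number of non-zero eigenvalues is at most the number of variables or the number of cases – 1, whichever is smaller
• Principal component scores are often standardized to a variance of 1
• Components are uncorrelated with one another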
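These properties can be checked numerically. The sketch below assumes the built-in USArrests data (not the handaxe data) and uses prcomp(); the three-case subset is an artificial illustration of what happens with fewer cases than variables.

```r
# Sketch: checking eigenvalue sums and independence of components.
X <- USArrests                          # 50 cases, 4 variables (stand-in data)

pc_full <- prcomp(X, scale. = TRUE)     # scale. = TRUE -> correlation-based PCA
ev_full <- pc_full$sdev^2               # eigenvalues
print(sum(ev_full))                     # total equals the number of variables

# Fewer cases than variables: only (cases - 1) eigenvalues can be non-zero
X_small  <- X[1:3, ]                    # 3 cases, 4 variables
ev_small <- prcomp(X_small, scale. = TRUE)$sdev^2
n_nonzero <- sum(ev_small > 1e-10)
print(n_nonzero)

# Component scores are mutually uncorrelated
C <- cor(pc_full$x)
max_offdiag <- max(abs(C[upper.tri(C)]))
print(max_offdiag)
```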

Page 7: Factor Analysis and Principal Components

Results
• Eigenvalues for extracted components and proportion of variance “explained”
• Loadings (correlations) between variables and components
• Scores for the components for each case
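A hedged sketch of how these three results can be pulled out of a prcomp() fit, again assuming the built-in USArrests data. Note that R's rotation matrix holds eigenvectors, so the sketch rescales them by the component standard deviations to obtain loadings in the correlation sense.

```r
# Sketch: eigenvalues, correlation loadings and scores from prcomp().
pc <- prcomp(USArrests, scale. = TRUE)       # stand-in data, correlation-based

eigenvalues <- pc$sdev^2
prop_var    <- eigenvalues / sum(eigenvalues)   # proportion of variance

# Loadings as correlations: eigenvector entries times sqrt(eigenvalue)
cor_loadings <- sweep(pc$rotation, 2, pc$sdev, `*`)

# The same quantities computed directly as variable-component correlations
direct <- cor(USArrests, pc$x)

scores <- pc$x                                # one row of scores per case
print(round(prop_var, 3))
print(round(cor_loadings, 3))
print(head(scores, 3))
```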

Page 8: Factor Analysis and Principal Components

Number of Components
• Principal components can be used simply to produce k independent components for k inter-related variables
• More commonly, the number of components extracted is limited to a smaller number, e.g. those with eigenvalues > 1
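The eigenvalue > 1 rule is simple to apply in code. This is a sketch on the built-in USArrests data, an assumption made for illustration; the cutoff of 1 is the rule named on the slide and not a guarantee of the "right" number of components.

```r
# Sketch: keep only the components whose eigenvalue exceeds 1.
pc <- prcomp(USArrests, scale. = TRUE)      # stand-in data
eigenvalues <- pc$sdev^2

k <- sum(eigenvalues > 1)                   # number of components retained
reduced_scores <- pc$x[, seq_len(k), drop = FALSE]

# Share of total variance kept by the retained components
kept <- sum(eigenvalues[seq_len(k)]) / sum(eigenvalues)
print(k)
print(round(kept, 3))
```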

Page 9: Factor Analysis and Principal Components

Example
• Rcmdr: Statistics | Dimensional analysis | Principal-components
• princomp() and prcomp() in R compute principal components – prcomp() uses a singular value decomposition and is numerically more stable
• Packages psych and ade4 also have principal component functions
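A short comparison of the two base R functions, sketched on the built-in USArrests data (an assumption; the slides use the handaxe data). With a correlation-based analysis the two should agree on the component standard deviations, and on the eigenvectors up to sign.

```r
# Sketch: prcomp() and princomp() on the same correlation-based problem.
pc_svd   <- prcomp(USArrests, scale. = TRUE)   # SVD of the scaled data
pc_eigen <- princomp(USArrests, cor = TRUE)    # eigen() on the correlation matrix

# Component standard deviations agree
sd_match <- all.equal(unname(pc_svd$sdev), unname(pc_eigen$sdev))
print(sd_match)

# Eigenvectors agree up to an arbitrary sign flip per component
max_diff <- max(abs(abs(pc_svd$rotation) - abs(unclass(loadings(pc_eigen)))))
print(max_diff)
```

The component scores themselves can differ slightly in scale between the two functions, because princomp() uses the divisor n for variances while prcomp() uses n - 1.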

Page 10: Factor Analysis and Principal Components

Handaxes
• Collection of 600 handaxes from Furze Platt, Maidenhead, England, held at the Royal Ontario Museum
• Seven dimensional measurements capture shape and size

Page 11: Factor Analysis and Principal Components

> .PC <- princomp(~L+L1+T+T1+W+W1+W2, cor=TRUE, data=HandAxes)

> unclass(loadings(.PC)) # component loadings
       Comp.1      Comp.2     Comp.3      Comp.4     Comp.5
L  -0.3920231 -0.32304228  0.3538015 -0.33843343 -0.4808967
L1 -0.3315569  0.53860582  0.2426709 -0.47116870 -0.1314518
T  -0.3634691 -0.05646815  0.6714878  0.49319567  0.4072064
T1 -0.3630703  0.28215177 -0.2868995  0.62010656 -0.5665372
W  -0.4413891 -0.23565830 -0.2611803 -0.15214093  0.1240457
W1 -0.3839257  0.42527974 -0.3223177 -0.11082837  0.4786462
W2 -0.3608806 -0.53511821 -0.3326024 -0.01609245  0.1420839
        Comp.6       Comp.7
L   0.50165401  0.139034239
L1 -0.54544728  0.065516026
T  -0.06346368 -0.026815813
T1 -0.02654645  0.007598814
W  -0.07495719 -0.798293730
W1  0.51900435  0.238953875
W2 -0.41365937  0.530309784

Page 12: Factor Analysis and Principal Components

> .PC$sd^2 # component variances
    Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6
4.24372416 1.18476216 0.56766626 0.54851088 0.27589883 0.09909247
    Comp.7
0.08034523

> summary(.PC) # proportions of variance
Importance of components:
                          Comp.1    Comp.2     Comp.3    Comp.4
Standard deviation     2.0600301 1.0884678 0.75343630 0.7406152
Proportion of Variance 0.6062463 0.1692517 0.08109518 0.0783587
Cumulative Proportion  0.6062463 0.7754980 0.85659323 0.9349519
                           Comp.5     Comp.6     Comp.7
Standard deviation     0.52526073 0.31478956 0.28345235
Proportion of Variance 0.03941412 0.01415607 0.01147789
Cumulative Proportion  0.97436604 0.98852211 1.00000000

> biplot(.PC, cex=c(.5, 1))
> # PC1..PC4 are presumably component scores already added to HandAxes (that step is not shown)
> scatterplotMatrix(~PC1+PC2+PC3+PC4, reg.line=FALSE,
+   smooth=FALSE, spread=FALSE, span=0.5, diagonal = 'density',
+   data=HandAxes, pch=20)
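The proportions in the summary follow directly from the component variances. The sketch below simply re-does that arithmetic from the seven variances printed on this slide; the HandAxes data itself is not needed.

```r
# Sketch: reproduce the "Proportion of Variance" rows from the
# component variances reported for the handaxe analysis.
variances <- c(4.24372416, 1.18476216, 0.56766626, 0.54851088,
               0.27589883, 0.09909247, 0.08034523)
names(variances) <- paste0("Comp.", 1:7)

total <- sum(variances)            # about 7, the number of variables
prop  <- variances / total         # proportion of variance
cum   <- cumsum(prop)              # cumulative proportion

print(round(prop, 4))
print(round(cum, 4))

n_kaiser <- sum(variances > 1)     # components with eigenvalue > 1
print(n_kaiser)
```

Only the first two components have variances above 1, and together they account for roughly 78% of the total variance.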

Page 15: Factor Analysis and Principal Components

Factor Analysis
• We are interested in studying something that cannot be directly observed
• We can, however, observe variables which are affected by the unobserved factors
• Correlations between observed variables are assumed to reflect the unobserved factors

Page 16: Factor Analysis and Principal Components

Steps
• Select variables as with principal components
• Analysis usually begins with a correlation matrix of the variables
• Communality estimates are defined
• Extract one or more factors
• Rotate factors for interpretability
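A minimal sketch of these steps with factanal(), assuming the ability.cov covariance data that ships with R (a stand-in, not the handaxe data). factanal() fits by maximum likelihood, so the communality step happens inside the fit and is not set by hand.

```r
# Sketch: a two-factor maximum-likelihood factor analysis.
# ability.cov is a built-in covariance matrix for six ability tests.
fa <- factanal(factors = 2, covmat = ability.cov, rotation = "varimax")

print(fa$uniquenesses)               # unique variance of each variable
print(loadings(fa), cutoff = 0.3)    # rotated loadings, small values hidden
```

Factor scores cannot be computed here because only a covariance matrix is supplied; with raw data, scores = "regression" would request them as in the handaxe example.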

Page 17: Factor Analysis and Principal Components

Terminology
• Eigenvalues, Eigenvectors, and Communality
• Communality is the common variance in a variable (shared with the factors), as opposed to its unique variance; for a standardized variable, uniqueness = 1 – communality.
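One common way to write this split is sketched below, assuming standardized variables and uncorrelated factors. The slide's own formula did not survive in the transcript, so the notation here is an assumption and not a reconstruction of the original.

```latex
x_i = \sum_{j=1}^{k} \lambda_{ij} f_j + e_i,
\qquad
\operatorname{Var}(x_i) = 1
  = \underbrace{\sum_{j=1}^{k} \lambda_{ij}^{2}}_{\text{communality } h_i^{2}}
  + \underbrace{\psi_i}_{\text{uniqueness}}
```

Here \lambda_{ij} is the loading of variable i on factor j, f_j are the k common factors, and e_i is the part of x_i not shared with any factor.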

Page 18: Factor Analysis and Principal Components

Issues
• Need more cases than variables
• With a correlation matrix, the sum of the eigenvalues = number of variables; the number of non-zero eigenvalues is at most the number of variables or the number of cases – 1, whichever is smaller
• Factors are often standardized to a variance of 1
• Factors are uncorrelated if no rotation or an orthogonal rotation is used

Page 19: Factor Analysis and Principal Components

Results
• Eigenvalues for extracted factors and proportion of variance “explained”
• Loadings (correlations) between variables and factors
• Factor rotation results
• Factor scores for each case

Page 20: Factor Analysis and Principal Components

Number of Factors
• Default choice is usually to select the factors with eigenvalues > 1 – each of these factors explains at least as much variance as one original (standardized) variable
• Scree plots can be used to select more or fewer factors
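A scree plot is just the eigenvalues plotted in decreasing order. The sketch below assumes the built-in ability.cov data and draws the plot with base graphics, adding a reference line at 1 for the default rule.

```r
# Sketch: scree plot with the eigenvalue > 1 reference line.
R  <- cov2cor(ability.cov$cov)      # correlation matrix (stand-in data)
ev <- eigen(R)$values               # eigenvalues, largest first

plot(ev, type = "b", xlab = "Factor number", ylab = "Eigenvalue",
     main = "Scree plot")
abline(h = 1, lty = 2)              # eigenvalue > 1 cutoff

n_kaiser <- sum(ev > 1)             # count suggested by the default rule
print(n_kaiser)
```

The "elbow" where the curve flattens may suggest a different number of factors than the cutoff, which is the point of looking at the plot.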

Page 21: Factor Analysis and Principal Components

Rotation
• Rotation is used to make the factors more interpretable
• Rotation tries to produce factors on which each variable has either a very high or a very low loading
• Orthogonal rotation preserves the independence of the factors
• Oblique rotation produces correlated factors
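The difference between the two kinds of rotation can be sketched with varimax() (orthogonal) and promax() (oblique) from base R, assuming the built-in ability.cov data. The object names are illustrative only.

```r
# Sketch: orthogonal versus oblique rotation of the same unrotated solution.
fa0 <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
L0  <- unclass(loadings(fa0))          # unrotated loading matrix

vm <- varimax(L0)                      # orthogonal rotation
pm <- promax(L0)                       # oblique rotation

# Orthogonal rotation leaves each variable's communality unchanged
h2_before <- rowSums(L0^2)
h2_after  <- rowSums(unclass(vm$loadings)^2)
print(round(cbind(h2_before, h2_after), 3))

# The varimax rotation matrix is orthogonal: t(T) %*% T is the identity
print(round(crossprod(vm$rotmat), 6))
# The promax rotation matrix generally is not, so factors become correlated
print(round(crossprod(pm$rotmat), 3))
```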

Page 22: Factor Analysis and Principal Components

Interpretation
• Interpretability is not a test that the factors are “real”
• Factors are interpreted using information about the variables that load highly on them
• Interpretations should be evaluated against other information

Page 23: Factor Analysis and Principal Components

Example
• In Rcmdr use Statistics | Dimensional analysis | Factor Analysis
• factanal() in base R or fa() in the psych package

Page 24: Factor Analysis and Principal Components

> .FA <- factanal(~L+L1+T+T1+W+W1+W2, factors=2,
+   rotation="varimax", scores="regression", data=HandAxes)

> .FA

Call:
factanal(x = ~L + L1 + T + T1 + W + W1 + W2, factors = 2, data = HandAxes, scores = "regression", rotation = "varimax")

Uniquenesses:
    L    L1     T    T1     W    W1    W2
0.369 0.191 0.594 0.515 0.081 0.240 0.064

Loadings:
   Factor1 Factor2
L  0.707   0.362
L1         0.895
T  0.470   0.430
T1 0.374   0.587
W  0.850   0.444
W1 0.337   0.804
W2 0.965

Page 25: Factor Analysis and Principal Components

               Factor1 Factor2
SS loadings      2.637   2.309
Proportion Var   0.377   0.330
Cumulative Var   0.377   0.707

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 371.35 on 8 degrees of freedom.
The p-value is 2.5e-75

> scatterplot(F2~F1, reg.line=FALSE, smooth=FALSE, spread=FALSE, boxplots=FALSE, span=0.5, data=HandAxes)
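The pieces of the factanal output are linked by simple arithmetic. The sketch below re-derives a few of them from the numbers printed in the transcript; it does not need the HandAxes data, and the variable names are illustrative.

```r
# Sketch: arithmetic behind the reported factanal output.

# Uniqueness of L from its two loadings: 1 - sum of squared loadings
load_L        <- c(Factor1 = 0.707, Factor2 = 0.362)
communality_L <- sum(load_L^2)
uniqueness_L  <- 1 - communality_L
print(round(uniqueness_L, 3))        # compare with the reported 0.369

# Proportion Var = SS loadings / number of variables (7)
ss       <- c(Factor1 = 2.637, Factor2 = 2.309)
prop_var <- ss / 7
print(round(prop_var, 3))            # compare with 0.377 and 0.330

# Upper-tail p-value for the reported chi-square statistic on 8 df
p_value <- pchisq(371.35, df = 8, lower.tail = FALSE)
print(p_value)
```

A p-value this small means the hypothesis that two factors are sufficient is rejected, even though the two factors account for about 71% of the variance.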
