Factor analysis, Caroline van Baal, March 3rd 2004, Boulder.
Factor analysis
Caroline van Baal
March 3rd 2004, Boulder
Phenotypic Factor Analysis
• (Approximate) description of the relations between different variables
– Compare to Cholesky decomposition
• Testing of hypotheses on relations between different variables by comparing different (nested) models
– How many underlying factors?
Factor analysis and related methods
• Data reduction
– Consider 6 variables: height, weight, arm length, leg length, verbal IQ, performance IQ
– You expect the first 4 to be correlated, and the last 2 to be correlated, but do you expect high correlations between the first 4 and the last 2?
Data analysis in non-experimental designs using latent constructs
• Principal Components Analysis
• Triangular Decomposition (Cholesky)
• Exploratory Factor Analysis
• Confirmatory Factor Analysis
• Structural Equation Models
Exploratory Factor Analysis
• Account for covariances among observed variables in terms of a smaller number of latent, common factors
• Includes error components for each variable
• x = P * f + u
– x = observed variables
– f = latent factors
– u = unique factors
– P = matrix of factor loadings
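[Path diagram: one common factor (Factor 1: IQ, “g”) loading on the subtests SIM, INF, VOC, COD, COM, ARI, DIG, BLC, MAZ, PIC, PIA, OBA; the factor is labeled 1]
[Path diagram: two factors (Factor 1: verbal; Factor 2: performance) loading on the same subtests; both factors are labeled 1]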
EFA equations
• C = P * D * P’ + U * U’
– C = observed covariance matrix: nvar by nvar, symmetric
– P = factor loadings: nvar by nfac, full
– D = correlations between factors: nfac by nfac, standardized
– U = specific influences, errors: nvar by nvar, diagonal
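A minimal NumPy sketch of the two equations above, assuming made-up values for P, D and U (4 variables, 2 factors; none of these numbers come from the slides). It builds the model-implied covariance matrix and then simulates x = P * f + u to check that the sample covariances approach it.

```python
import numpy as np

# Hypothetical values for 4 variables and 2 factors (not from the slides).
P = np.array([[0.8, 0.0],
              [0.7, 0.0],
              [0.0, 0.6],
              [0.0, 0.9]])          # nvar by nfac factor loadings
D = np.array([[1.0, 0.3],
              [0.3, 1.0]])          # nfac by nfac factor correlations (standardized)
U = np.diag([0.6, 0.7, 0.8, 0.4])   # nvar by nvar diagonal, unique influences

# Model-implied covariance matrix: C = P * D * P' + U * U'
C = P @ D @ P.T + U @ U.T

# Simulate x = P * f + u and compare the sample covariance matrix with C.
rng = np.random.default_rng(0)
n = 200_000
f = rng.multivariate_normal(np.zeros(2), D, size=n)   # latent factors
u = rng.standard_normal((n, 4)) @ U                   # unique factors
x = f @ P.T + u                                       # observed variables
S = np.cov(x, rowvar=False)

print(np.round(C, 2))
print(np.round(S, 2))
```

With a large simulated sample, the two printed matrices should agree to about two decimals.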
Exploratory factor analysis
• No prior assumption on number of factors
• All variables load on all latent factors
• Factors are either all correlated or all uncorrelated
• Unique factors are uncorrelated
• Underidentification
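One way to read the underidentification point, sketched in NumPy with made-up loadings (an illustration, not taken from the slides): with 2 uncorrelated factors, any rotation of the loading matrix implies exactly the same covariance matrix, so the data cannot choose between the solutions. Picking the rotation that sets one loading to 0 removes this freedom.

```python
import numpy as np

def rotation(theta):
    """2 by 2 orthogonal rotation matrix."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

P = np.array([[0.7, 0.3],
              [0.6, 0.4],
              [0.5, 0.5],
              [0.3, 0.7],
              [0.2, 0.8]])               # hypothetical 5 by 2 loadings
U = np.diag([0.5, 0.6, 0.6, 0.5, 0.4])   # hypothetical unique influences

# Any rotated loading matrix reproduces the same covariances.
P_rot = P @ rotation(0.4)
C1 = P @ P.T + U @ U.T
C2 = P_rot @ P_rot.T + U @ U.T
print(np.allclose(C1, C2))

# The rotation that makes the loading of variable 1 on factor 2 equal to 0.
theta0 = np.arctan2(P[0, 1], P[0, 0])
P_fixed = P @ rotation(theta0)
print(np.round(P_fixed, 3))
```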
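[Path diagram: two factors (Factor 1: verbal; Factor 2: performance) loading on the subtests SIM, INF, VOC, COD, COM, ARI, DIG, BLC, MAZ, PIC, PIA, OBA, with one loading fixed to 0; both factors are labeled 1]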
Confirmatory factor analysis
• An initial model is constructed, because:
– its elements are described by a theoretical process
– its elements have been obtained from a previous analysis in another sample
• The model has a specific number of factors
• Variables do not have to load on all factors
• Measurement errors may correlate
• Some latent factors may be correlated, while others are not
[Path diagrams: confirmatory two-factor models (Factor 1: verbal; Factor 2: performance; both labeled 1) and three-factor models (VC, FD, PO) for the subtests SIM, INF, VOC, COD, COM, ARI, DIG, BLC, MAZ, PIC, PIA, OBA]
CFA equations
• x = P * f + u
– x = observed variables, f = latent factors
– u = unique factors, P = factor loadings
• C = P * D * P’ + U * U’
– C = observed covariance matrix
– P = factor loadings
– D = correlations between factors
– U = diagonal matrix of errors
Structural equation models
• The factor model x = P * f + u is sometimes referred to as the measurement model
• The relations between latent factors can also be modeled
• This is done in the covariance structure model, or the structural equation model
• Higher order factor models
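[Path diagram: second-order factor “g” above the first-order factors VC, FD, PO (labeled F1, F2, F3), which load on the subtests SIM, INF, VOC, COD, COM, ARI, DIG, BLC, MAZ, PIC, PIA, OBA]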
• Second order factor model: C = P*(A*I*A’+B*B')*P' + U*U’
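A short NumPy sketch of the second-order equation, under stated assumptions: all numbers are made up, A is read as the loadings of the 3 first-order factors on the second-order factor, I as its variance (fixed to 1), and B as the diagonal matrix of first-order residuals. These readings follow the usual layout of such models and are not spelled out on the slide.

```python
import numpy as np

# Hypothetical numbers: 6 variables, 3 first-order factors, 1 second-order factor.
P = np.array([[0.8, 0.0, 0.0],
              [0.7, 0.0, 0.0],
              [0.0, 0.6, 0.0],
              [0.0, 0.7, 0.0],
              [0.0, 0.0, 0.8],
              [0.0, 0.0, 0.5]])          # first-order loadings (nvar by nfac)
A = np.array([[0.9], [0.6], [0.7]])      # loadings on the second-order factor (assumed role)
I = np.eye(1)                            # variance of the second-order factor, fixed to 1
B = np.diag(np.sqrt(1 - A[:, 0] ** 2))   # first-order residuals, scaled so each factor has variance 1
U = np.diag([0.6, 0.7, 0.8, 0.7, 0.6, 0.85])

# Covariances among the first-order factors, then among the observed variables.
F = A @ I @ A.T + B @ B.T
C = P @ F @ P.T + U @ U.T

print(np.round(F, 2))
print(np.round(C, 2))
```

In this sketch the correlations among the first-order factors are products of their second-order loadings, which is how the higher-order factor accounts for them.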
Five steps characterize structural equation models
• Model specification
• Identification
– E.g., if a factor loads on 2 variables only, multiple solutions are possible, and the factor loadings have to be equated
• Estimation of parameters
• Testing of goodness of fit
• Respecification
• K.A. Bollen & J. Scott Long: Testing Structural Equation Models, 1993, Sage Publications
Practice!
• IQ and brain volumes (MRI)
• 3 brain volumes
– Total cerebellum, Grey matter, White matter
• 2 IQ subtests
– Calculation, Letters / numbers
• Brain and IQ factors are correlated
• Datafile: mri-IQ-all-twinA-5.dat
Script: phenofact.mx
BEGIN MATRICES ;
P FULL NVAR NFACT free ; ! factor loadings
D STAND NFACT NFACT !free ; ! correlations between factors
U DIAG NVAR NVAR free ; ! subtest specific influences
M FULL 1 NVAR free ; ! means
END MATRICES ;
BEGIN ALGEBRA;
C= P*D*P' +U*U' ; ! variance covariance matrix
END ALGEBRA;
Means M /
Covariances C /
• In exploratory factor analysis, if nfact = 2, one of the factor loadings has to be fixed to 0 to make it an identified model
  fix P 1 2
• In confirmatory factor analysis, specify a brain and an IQ factor
  SPECIFY P
  101 0
  102 0
  103 0
  0 204
  0 205
  0 206
• (If a factor loads on 2 variables only, it is not possible to estimate both factor loadings. Equate them, or fix one of them to 1)
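To make the specification pattern concrete, here is a small Python sketch. It assumes the usual reading of such a pattern: 0 marks an element fixed to zero, any other number labels a free parameter, and elements that share a label are equated. The helper `fill_from_spec` and all parameter values are hypothetical. The pattern used here has 5 rows (3 brain volumes, 2 IQ subtests) and equates the two IQ loadings, as the last bullet suggests.

```python
import numpy as np

def fill_from_spec(spec, params):
    """Build a loading matrix from a specification pattern (hypothetical helper).

    0 marks an element fixed to zero; any other number labels a free
    parameter, and elements sharing a label receive the same value.
    """
    spec = np.asarray(spec)
    out = np.zeros(spec.shape)
    for label, value in params.items():
        out[spec == label] = value
    return out

# Brain factor on the first 3 variables, IQ factor on the last 2.
# Label 204 appears twice, so the two IQ loadings are equated.
spec = [[101, 0],
        [102, 0],
        [103, 0],
        [0, 204],
        [0, 204]]

params = {101: 0.8, 102: 0.75, 103: 0.7, 204: 0.68}   # made-up values
P = fill_from_spec(spec, params)
n_free = len(set(np.asarray(spec).ravel()) - {0})      # distinct free loadings

print(P)
print(n_free)
```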
Phenotypic Correlations: MRI-IQ, Dutch twins (A), n=111/296 pairs

              brain   brain   brain   IQ      IQ
              cereb   grey    white   calc    L/n
Cerebellum    1
Grey          .63     1
White         .61     .55     1
calculation   .23     .25     .26     1
Letter/numb.  .30     .19     .19     .46     1
• What is the fit of a 1 factor model?
– C = P * P’ + U*U’, P = 5x1 full, U = 5x5 diagonal
• What is the fit of a 2 factor model?
– Same, P = 5x2 full with 1 factor loading fixed to 0
– (Reduction: fix first 3 factor loadings of factor 2 to 0)
• Data suggest 2 latent factors: a brain (first 3) and an IQ factor (last 2): what is the evidence for this model?
– Same, P = 5x2 full with 5 factor loadings fixed to 0
• Can the 2 factor model be improved by allowing a correlation between these 2 factors?
– C = P * D * P’ + U*U’, P = 5x2 full matrix (5 fixed), D = stand 2x2 matrix, U = 5x5 diagonal matrix
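When comparing these nested models it helps to count degrees of freedom first. The sketch below uses the standard counting rule (number of observed variances and covariances minus number of free parameters) and assumes the means are left out because they are fitted exactly. The parameter counts are my reading of the model descriptions above, and the rule is a necessary check only, not a proof of identification.

```python
def degrees_of_freedom(n_var, n_free):
    """Observed variances and covariances minus free parameters."""
    n_stats = n_var * (n_var + 1) // 2
    return n_stats - n_free

n_var = 5  # 3 brain volumes + 2 IQ subtests

# Free parameters per model: factor loadings + unique influences (+ factor correlation).
models = {
    "1 factor": 5 + 5,
    "2 factors, exploratory (1 loading fixed to 0)": 9 + 5,
    "2 factors, brain and IQ, uncorrelated (IQ loadings equated)": 4 + 5,
    "2 factors, brain and IQ, correlated": 5 + 5 + 1,
}

for name, n_free in models.items():
    print(name, "df =", degrees_of_freedom(n_var, n_free))
```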
Principal Components Analysis
• SPSS, SAS, Mx (functions \eval, \evec)
• Transformation of the data, not a model
• Is used to reduce a large set of correlated observed variables (xi) to (a smaller number of) uncorrelated (orthogonal) components (ci)
• xi is a linear function of ci
PCA path diagram
• S = observed covariances = P * D * P’
[Path diagram: components c1 to c5 (labeled D) with paths (labeled P) to the observed variables x1 to x5]
PCA equations
• Covariance matrix: S (q by q) = P (q by q) * D (q by q) * P’ (q by q)
• P = full q by q matrix of eigenvectors
• D = diagonal matrix of eigenvalues
• P is orthogonal: P * P’ = I (identity)
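These PCA equations can be checked numerically. The NumPy sketch below uses the MRI-IQ correlation matrix from the practice section as S. The use of `numpy.linalg.eigh` is my choice, standing in for the Mx functions \eval and \evec.

```python
import numpy as np

# MRI-IQ phenotypic correlations (cerebellum, grey, white, calculation, letters/numbers).
S = np.array([[1.00, 0.63, 0.61, 0.23, 0.30],
              [0.63, 1.00, 0.55, 0.25, 0.19],
              [0.61, 0.55, 1.00, 0.26, 0.19],
              [0.23, 0.25, 0.26, 1.00, 0.46],
              [0.30, 0.19, 0.19, 0.46, 1.00]])

# eigh returns eigenvalues in ascending order; sort them largest first.
eigenvalues, P = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, P = eigenvalues[order], P[:, order]
D = np.diag(eigenvalues)

print(np.round(eigenvalues, 2))
print(np.allclose(P @ D @ P.T, S))        # S = P * D * P'
print(np.allclose(P @ P.T, np.eye(5)))    # P * P' = I
```

Because S is a correlation matrix, the eigenvalues sum to the number of variables (5), so each eigenvalue divided by 5 gives the proportion of variance carried by that component.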
Criteria for number of factors
• Kaiser criterion, scree plot, % of variance explained
• Important: models not identified!
[Path diagram: components c1 to c5 and observed variables x1 to x5]
Correlations: satisfaction, n=100

        Var 1   Var 2   Var 3   Var 4   Var 5   Var 6
        work    work    work    home    home    home
Var 1   1
Var 2   .65     1
Var 3   .65     .73     1
Var 4   .14     .14     .16     1
Var 5   .15     .18     .24     .66     1
Var 6   .14     .24     .25     .59     .73     1
[Diagram: expected loading pattern of Var 1 to Var 6 on two factors, work and home, with entries marked + or 0]
PCA: Factor loadings (eigenvalues 2.89 & 1.79)

               Factor 1   Factor 2
Var 1 (work)   .65        .56
Var 2 (work)   .72        .54
Var 3 (work)   .74        .51
Var 4 (home)   .63        -.56
Var 5 (home)   .71        -.57
Var 6 (home)   .71        -.53
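The loadings in this table can be approximately recomputed from the satisfaction correlations. The NumPy sketch below takes each loading as the eigenvector entry times the square root of the eigenvalue, which is the common PCA convention and is assumed here. The sign of each column is arbitrary, so columns are flipped to make the loadings of Var 1 positive.

```python
import numpy as np

# Satisfaction correlations, n=100 (Var 1-3: work, Var 4-6: home).
R = np.array([[1.00, 0.65, 0.65, 0.14, 0.15, 0.14],
              [0.65, 1.00, 0.73, 0.14, 0.18, 0.24],
              [0.65, 0.73, 1.00, 0.16, 0.24, 0.25],
              [0.14, 0.14, 0.16, 1.00, 0.66, 0.59],
              [0.15, 0.18, 0.24, 0.66, 1.00, 0.73],
              [0.14, 0.24, 0.25, 0.59, 0.73, 1.00]])

vals, vecs = np.linalg.eigh(R)            # ascending order
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Loadings on the first 2 components: eigenvector * sqrt(eigenvalue).
loadings = vecs[:, :2] * np.sqrt(vals[:2])
loadings *= np.sign(loadings[0])          # fix the arbitrary sign of each column

n_kaiser = int((vals > 1).sum())          # Kaiser criterion: eigenvalues above 1

print(np.round(vals, 2))
print(np.round(loadings, 2))
print(n_kaiser)
```

The first two eigenvalues can be compared with the 2.89 and 1.79 reported in the table. The second component contrasts the work variables with the home variables rather than isolating either group, which is why unrotated PCA loadings do not show the work/home pattern directly.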
Triangular decomposition (Cholesky)
[Path diagram: latent variables y1 to y5 (each labeled 1) with triangular paths to the observed variables x1 to x5]
• One operationalization of all PCA outcomes
• Model is just identified! Model is saturated (df=0)
Triangular decomposition
• S = Q * Q’ ( = P# * P#’, where P# is P * sqrt(D), with sqrt(D) the diagonal matrix of square roots of the eigenvalues)
• Q (5 by 5) =
  f11  0    0    0    0
  f21  f22  0    0    0
  f31  f32  f33  0    0
  f41  f42  f43  f44  0
  f51  f52  f53  f54  f55
• Q is a lower triangular matrix
• This is not a model! This is a transformation of the observed matrix S. Fully determinate!
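A brief NumPy check of the triangular decomposition, using the 3 by 3 brain-volume block of the MRI-IQ correlations as S. It also shows that the eigenvector matrix scaled by the square roots of the eigenvalues (P# taken as P times the square root of D, the scaling under which P# * P#' equals P * D * P') reproduces S equally well, although it is not triangular.

```python
import numpy as np

# Correlations among cerebellum, grey matter and white matter volumes.
S = np.array([[1.00, 0.63, 0.61],
              [0.63, 1.00, 0.55],
              [0.61, 0.55, 1.00]])

# Triangular (Cholesky) decomposition: S = Q * Q', Q lower triangular.
Q = np.linalg.cholesky(S)

# Eigen decomposition: S = P * D * P' = (P * sqrt(D)) * (P * sqrt(D))'.
vals, P = np.linalg.eigh(S)
P_sharp = P @ np.diag(np.sqrt(vals))

print(np.round(Q, 3))
print(np.allclose(Q @ Q.T, S))
print(np.allclose(P_sharp @ P_sharp.T, S))
```

Both factorizations use as many parameters as S has distinct elements, so neither imposes any restriction on the data.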
Saturated model, # latent factors
Script: phenochol.mx
BEGIN MATRICES ;
Q LOWER NVAR NVAR free ; ! factor loadings
M FULL 1 NVAR free ; ! means
END MATRICES ;
BEGIN ALGEBRA;
C= Q*Q' ; ! variance covariance matrix
K=\stnd(C) ; ! correlation matrix
X=\eval(K) ; ! eigenvalues (i.e., variance of latent factors)
Y=\evec(K) ; ! eigenvectors (i.e., regression coefficients)
END ALGEBRA;
Means M /
Covariances C /