Introduction to Factor Analysis - Faculty faculty.nps.edu/rdfricke/OA4109/Lecture 10-1 -- Factor...
Introduction to Factor Analysis
Professor Ron Fricker, Naval Postgraduate School
Monterey, California
3/2/13 1
Reading Assignment: Fricker, Kulzy & Appleget (2012)
Goals for this Lecture
• Learn about factor analysis as a tool for:
  – Deriving unobserved latent variables from observed survey question responses
  – Data reduction
• Understand the steps in conducting factor analysis and the R functions/syntax
• Illustrate the application of factor analysis to survey data
Why Factor Analysis?
• Factor analysis is a method for identifying latent traits from question-level survey data
  – Useful in survey analysis whenever the phenomenon of interest is complex and not directly measurable via a single question
• In such situations, one must ask a series of questions about the phenomenon, then appropriately combine the resulting responses into a single measure or “factor”
  – Such factors then become the observed measures of the unobservable, or latent, phenomenon
Goal of Factor Analysis
• Factor analysis is a hybrid of social and statistical science
• Dates to the early 1900s, when the goal was multivariate data reduction
• Idea is to explain the correlation structure observed in p dimensions via a linear combination of r factors, where:
  – the number of factors is smaller than the number of observed variables (r < p), and
  – the factors achieve both “statistical simplicity and scientific meaningfulness” (Harman, 1976)
Factor Analysis and Survey Data
• A common use of exploratory factor analysis is to “determine what sets of items hang together in a questionnaire” (DeCoster, 1998)
  – Particularly important for instruments with a large number of items (i.e., for data reduction)
  – Also useful when sets of items need to be summarized in terms of their commonalities (i.e., when results are expressed in terms of latent variables)
• In practice, factor analysis can make complex survey results easier to interpret and summarize, and more meaningful and efficient
Three Types of Factor Analysis
• Principal components
  – Empirical data reduction methodology, but not focused on achieving “scientific meaningfulness”
• Exploratory factor analysis
  – Also an empirical data reduction methodology, but one that often does derive scientifically meaningful factors
  – Focus of this lecture
• Confirmatory factor analysis
  – Variety of methods focused on testing hypotheses about the structure of factors
  – See Maj Steve Jones’ thesis (2012) for more info.
A Bit About Principal Components
• Standard statistical method for data reduction
• Seeks to explain as much variance as possible with a small number of orthogonal linear combinations of the original data
• Useful when the goal is to reduce the number of variables in a model/analysis while capturing much of the variability
• However, as just noted, the resulting components do not necessarily achieve “scientific meaningfulness”
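As a rough illustration of principal components (a sketch in Python/NumPy, not code from the lecture, which uses R), the hypothetical function `principal_components` below standardizes an n × p response matrix, eigendecomposes its correlation matrix, and returns the component scores together with the share of total variance each component explains. It assumes complete numeric data with no missing values.

```python
import numpy as np


def principal_components(X, k):
    """PCA on the correlation matrix of an (n x p) response array X.

    Returns the first k component scores and the share of total variance
    that each of those k components explains.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each question so that PCA operates on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order; flip to descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :k]  # orthogonal (uncorrelated) linear combinations
    explained = eigvals[:k] / eigvals.sum()
    return scores, explained
```

The scores are uncorrelated by construction, and `explained` shows how much variability the first k components capture. Nothing in the computation pushes the components toward interpretable groupings of questions, which is the limitation the slide points out.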
A Bit About Confirmatory Factor Analysis!
• Intended as a way to test theories/hypotheses about factor constructs!
• My preference: Whenever possible, test results via reproducibility (on separate data) vice confirmatory factor analysis (CFA)
  – “Finally, the process of reproducing Factor Analysis on out-of-sample data (the 2011 survey) proved much more useful than conducting CFA. Although CFA most undoubtedly has uses for some models and some data sets, it is neither powerful enough, nor informative enough, to justify its use compared to the reproduction of Factor Analysis” (Jones, 2012).
✓ Reproducibility is the appropriate scientific standard and important to do for any statistical analysis
Exploratory Factor Analysis in a Picture
• Example: Six questions that are functions of two underlying (unobserved) factors
[Figure: the six questions and the two underlying factors]
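To make the picture concrete, here is a small simulation under assumed values: six questions generated from two unobserved factors. The loadings are invented for illustration, and the responses are continuous normal scores rather than Likert ratings. The sample correlation matrix of the simulated answers should come out close to the correlation matrix that the two-factor model implies.

```python
import numpy as np

# Hypothetical loadings: questions 1-3 depend on factor 1, questions 4-6 on factor 2.
LOADINGS = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.8],
    [0.0, 0.7],
    [0.0, 0.6],
])
# Unique loadings chosen so that every simulated question has unit variance.
UNIQUE = np.sqrt(1.0 - np.sum(LOADINGS**2, axis=1))


def simulate_responses(n, seed=0):
    """Simulate n respondents' answers to the six questions."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((n, LOADINGS.shape[1]))  # unobserved common factors
    U = rng.standard_normal((n, LOADINGS.shape[0]))  # unique factors, one per question
    return F @ LOADINGS.T + U * UNIQUE


# Correlation matrix implied by the two-factor model.
implied = LOADINGS @ LOADINGS.T + np.diag(UNIQUE**2)
sample = np.corrcoef(simulate_responses(100_000), rowvar=False)
```

Questions 1 to 3 correlate with each other (for example 0.8 × 0.7 = 0.56 for questions 1 and 2) but not with questions 4 to 6. Factor analysis runs this logic in reverse, starting from `sample` and searching for loadings that reproduce it.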
Mathematically
• The idea is to find a set of r common factors, F1, …, Fr, such that, when they are used to estimate the data, the correlation structure of the estimated data is close to the correlation structure of the actual data
[Figure: the model equation, annotated to identify the common factors, the loadings, and the unique loading (and its factor)]
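One standard way to write the model that the slide annotates uses textbook notation, which is assumed here because the original equation did not survive. Question i is a weighted sum of the r common factors plus a unique term, which corresponds to the "unique loading (and its factor)":

```latex
X_i = \lambda_{i1}F_1 + \lambda_{i2}F_2 + \cdots + \lambda_{ir}F_r + \psi_i U_i, \qquad i = 1, \dots, p

% With standardized X_i, and F_1, ..., F_r, U_1, ..., U_p mutually uncorrelated with unit variance:
\operatorname{cor}(X) = \Lambda\Lambda^{\top} + \Psi^{2}, \qquad \Lambda = [\lambda_{ij}]_{p \times r}, \quad \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)
```

Under these assumptions, ψ_i² is the uniqueness of question i and 1 − ψ_i² = Σ_j λ_ij² is its communality, the share of its variance explained by the common factors.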
Steps in (Exploratory) Factor Analysis
• Determine the number of factors
  – Seems like a Catch-22 (“How can I know the number of factors if they’re unobserved?”), but there is a way that works well
• Fit the exploratory factor analysis model
• Rotate the model to achieve desired solution
  – Two main approaches: promax and varimax
  – Decide whether to keep all variables in each factor or use a cut-off for the loadings
• Interpret the resulting factors
  – Re-rotate as necessary
Determining the Number of Factors
• Getting the number of factors right is critical
  – Too few, and factors load with irrelevant items
  – Too many, and items spread out over many factors
  – Both make the resulting factors hard to interpret and may obscure the real underlying factors
• Variety of methods proposed:
  – Kaiser rule (retain factors with eigenvalues greater than 1), scree plot, etc.
• What works well is parallel analysis
  – Idea: Factors derived from real data should have larger eigenvalues than equivalent factors derived from equivalent simulated data
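Below is a minimal sketch of the parallel-analysis idea in Python/NumPy. The function name and defaults are assumptions, not the lecture's code. It compares each eigenvalue of the observed correlation matrix with the 95th percentile of eigenvalues from random normal data of the same size, then keeps the leading factors that beat the simulation. This is only the principal-components variant. The R output shown later also includes factor-analysis and resampled-data versions, which this sketch omits.

```python
import numpy as np


def parallel_analysis(X, n_sims=100, quantile=95, seed=0):
    """Horn-style parallel analysis on principal-component eigenvalues.

    Returns the suggested number of factors, the observed eigenvalues, and the
    simulated thresholds they were compared against.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)

    def sorted_eigenvalues(data):
        # Eigenvalues of the correlation matrix, largest first.
        return np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    observed = sorted_eigenvalues(X)
    # Same n and p, but independent normal columns: no real factors at all.
    simulated = np.array([sorted_eigenvalues(rng.standard_normal((n, p)))
                          for _ in range(n_sims)])
    threshold = np.percentile(simulated, quantile, axis=0)
    above = observed > threshold
    # Keep factors up to the first observed eigenvalue that fails to beat simulation.
    n_factors = p if above.all() else int(np.argmin(above))
    return n_factors, observed, threshold
```

Comparing against a high percentile rather than the simulated mean is a design choice that makes the rule a little more conservative. Either choice is common.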
Parallel Analysis with QOL Data
• Consider question 7 from the QOL survey
  – 5-point Likert rating of 15 NPS services
  – Four of the services removed for too much missing data (950, 864, 818, and 505 out of 1,368 responses missing, respectively)
Data Preparation
• Re-coded Likert scale: 1 = Very Satisfied to 5 = Very Unsatisfied
• Deleted all records in which respondents failed to answer one or more of the 11 parts (casewise deletion)
  – Only did this here for convenience, to illustrate factor analysis
  – In a real analysis, would have used nearest neighbor hot deck imputation based on demographics
• Final result: 11 questions for 555 respondents
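The preparation steps can be sketched as follows. This uses Python/NumPy, and `prepare_responses` is a hypothetical helper. The slides do not state the rule used to drop mostly missing services, so the `max_missing_frac` threshold is an assumption. The reverse-coding step likewise assumes the raw scale ran in the opposite direction, which the slides do not say.

```python
import numpy as np


def prepare_responses(X, max_missing_frac=0.3, reverse=False, scale_max=5):
    """Drop mostly missing questions, optionally reverse-code, then casewise-delete.

    X is an (n x p) array of Likert answers with np.nan marking non-responses.
    max_missing_frac is a hypothetical threshold, not a rule from the lecture.
    """
    X = np.asarray(X, dtype=float)
    # Fraction of respondents who skipped each question (column).
    keep_cols = np.isnan(X).mean(axis=0) <= max_missing_frac
    X = X[:, keep_cols]
    if reverse:
        # Flip the scale, e.g. 1 <-> 5 and 2 <-> 4 on a 5-point scale (NaN stays NaN).
        X = (scale_max + 1) - X
    # Casewise deletion: keep only respondents who answered every remaining question.
    complete = ~np.isnan(X).any(axis=1)
    return X[complete], keep_cols, complete
```

If the casewise-deletion line were replaced with an imputation step, such as the nearest neighbor hot deck the slide recommends, respondents who skipped only some questions would be kept rather than discarded.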
Results for QOL Q7 Data
[Figure: Parallel Analysis Scree Plots. x-axis: Factor Number; y-axis: eigenvalues of principal components and factor analysis; series: PC Actual Data, PC Simulated Data, PC Resampled Data, FA Actual Data, FA Simulated Data, FA Resampled Data]
• Indicates that 6 factors are appropriate for Q7
Fitting the Model
• Idea: Find factors and associated loadings so that the covariance of their linear combination is “close to” the covariance of the original data
  – I.e., find the estimated data $\hat{X} = \hat{\Lambda}\hat{F} + \hat{\Psi}$ so that $\mathrm{cor}(X) \approx \mathrm{cor}(\hat{X})$
• Mathematics beyond the scope of what we’ll cover today
• Because the factors and their loadings are all unknown, there is no unique solution
  – In fact, there are an infinite number of solutions
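The non-uniqueness can be checked numerically. In this sketch the loadings are invented and the helper names are hypothetical. Rotating the loading matrix by any orthogonal matrix T produces a different set of loadings that implies exactly the same correlation matrix, because (ΛT)(ΛT)ᵀ = ΛTTᵀΛᵀ = ΛΛᵀ.

```python
import numpy as np


def implied_correlation(loadings, uniquenesses):
    """Correlation matrix implied by an orthogonal factor model: L L^T + diag(uniquenesses)."""
    return loadings @ loadings.T + np.diag(uniquenesses)


def random_orthogonal(r, seed=0):
    """A random r x r orthogonal matrix, from the QR decomposition of a Gaussian matrix."""
    q, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((r, r)))
    return q


L = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
              [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
psi2 = 1.0 - np.sum(L**2, axis=1)  # uniquenesses (psi_i^2)
T = random_orthogonal(2, seed=3)
L_rotated = L @ T                   # a different set of loadings ...
same = np.allclose(implied_correlation(L, psi2),
                   implied_correlation(L_rotated, psi2))  # ... with the same fit
```

This is why a rotation step follows the fit. The data alone cannot choose among these equivalent solutions, so a criterion such as varimax picks one of them.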
Fitting the Model in R
• Given the desired number of factors, use the factanal() function in base R
• Basic syntax is factanal(dataframe, nr_factors)
  – Here “dataframe” contains only those variables to be used in the factor analysis
  – And “nr_factors” is an integer
  – Default rotation is varimax, but promax can also be specified (rotation = "promax")
    • Varimax results in orthogonal factors
    • Promax allows for correlated factors
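factanal() fits the model by maximum likelihood. For readers working outside R, below is a deliberately simpler stand-in written in NumPy: iterated principal-axis factoring. This is a different estimation method from factanal()'s, the function name is hypothetical, and it returns unrotated loadings, so a rotation such as varimax would still follow.

```python
import numpy as np


def principal_axis(R, r, max_iter=1000, tol=1e-8):
    """Iterated principal-axis factoring of a p x p correlation matrix R.

    Returns unrotated p x r loadings and the uniquenesses (1 - communalities).
    """
    R = np.asarray(R, dtype=float)
    # Start communalities at the squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # replace the 1s on the diagonal with communalities
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:r]
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = np.sum(loadings**2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, 1.0 - h2
```

When the correlation matrix truly follows an r-factor model, the iteration settles on loadings whose products reproduce the off-diagonal correlations. On real survey data it only approximates them, and it can produce communalities above 1 (Heywood cases), which this sketch does not guard against.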
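Varimax Rotation
• Varimax finds the rotation that makes the high loadings as high as possible while also making the low loadings as low as possible
• I.e., varimax finds an orthogonal transformation that maximizes:
$$V = \frac{1}{p}\sum_{j=1}^{r}\left[\sum_{i=1}^{p}\tilde{\lambda}_{ij}^{4} - \frac{1}{p}\left(\sum_{i=1}^{p}\tilde{\lambda}_{ij}^{2}\right)^{2}\right], \qquad \tilde{\lambda}_{ij} = \hat{\lambda}_{ij}/\hat{h}_i, \quad \hat{h}_i^{2} = \sum_{j=1}^{r}\hat{\lambda}_{ij}^{2}$$
  – Each bracketed term, divided by p, is essentially the variance of the jth factor’s (rescaled, squared) loadings over the p questions
  – V is the sum of these “variances” over the r factors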
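The criterion above can be maximized one pair of factors at a time, because for two factors the best rotation angle has a closed form: treat each row's pair of loadings as a complex number z, set w = z², and rotate by one quarter of the angle of Σw² − (Σw)²/p. This is a sketch of Kaiser's pairwise approach in NumPy, with a hypothetical function name. Other software may use a different iterative algorithm for the same criterion. The rows are rescaled by h_i exactly as in the formula.

```python
import numpy as np


def varimax(loadings, normalize=True, max_sweeps=100, tol=1e-10):
    """Varimax rotation by Kaiser-style pairwise planar rotations.

    Returns the rotated p x r loadings and the r x r orthogonal rotation matrix T,
    so that rotated == loadings @ T.
    """
    L = np.array(loadings, dtype=float)
    p, r = L.shape
    # Rescale each row by the square root of its communality (h_i), as in the criterion.
    h = np.sqrt(np.sum(L**2, axis=1)) if normalize else np.ones(p)
    h[h == 0] = 1.0  # leave all-zero rows untouched
    L /= h[:, None]
    T = np.eye(r)
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for j in range(r - 1):
            for k in range(j + 1, r):
                # Each row's (j, k) loadings as a complex number; squaring doubles its angle.
                w = (L[:, j] + 1j * L[:, k]) ** 2
                K = np.sum(w**2) - np.sum(w) ** 2 / p
                phi = np.angle(K) / 4.0  # angle that maximizes the criterion for this pair
                c, s = np.cos(phi), np.sin(phi)
                G = np.array([[c, -s], [s, c]])
                L[:, [j, k]] = L[:, [j, k]] @ G
                T[:, [j, k]] = T[:, [j, k]] @ G
                largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:
            break
    return L * h[:, None], T
```

Because T is orthogonal, the communalities (row sums of squared loadings) and the implied correlation matrix are unchanged by the rotation. Only the way the variance is spread across factors changes.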
Example #1: QOL Results
• In the end, I found the following 6 factors using a loadings cut-off of 0.4 (a subjective choice):
[Figure: the six factors, labeled Healthcare Service, Exchange and Comm., Auto Services, MWR Services, NPS Student Services, and Fitness Services]
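Once the rotated loadings are available, applying a loading cut-off is mechanical. The small sketch below uses a hypothetical function and item names, and its default of 0.4 mirrors the slide's subjective choice. It lists which items each factor retains, which also makes it easy to spot cross-loading items and items that load on no factor.

```python
import numpy as np


def items_by_factor(loadings, item_names, cutoff=0.4):
    """List, for each factor, the items whose absolute loading meets the cut-off.

    An item can appear under several factors (cross-loading) or under none.
    """
    L = np.asarray(loadings, dtype=float)
    return {
        f"Factor {j + 1}": [name for name, lam in zip(item_names, L[:, j]) if abs(lam) >= cutoff]
        for j in range(L.shape[1])
    }
```

For example, with loadings [[0.7, 0.1], [0.5, 0.45], [0.2, 0.8], [0.3, 0.35]] and items a to d, item b appears under both factors and item d under neither. Cases like these are the ones worth re-rotating or revisiting at the interpretation step.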
Compare to Principal Components
Example #1: Discussion
• This is only an illustration
  – Use of casewise deletion was extreme
    • Better to use demographics and nearest neighbor hot deck imputation
  – Also, running factor analysis on only a small subset of the survey questions was extreme
    • Better to run factor analysis on all the questions
    • How might the additional information have affected the factor formulation? What else might have entered into the factors?
• Compared to principal components, the resulting factors are more intuitively interpretable
Example #2: Four National Surveys
• 140 questions common across four countries
• Fielded in 2010 to:
  – 3,770 respondents in Country “A”
  – 1,661 respondents in Country “B”
  – 1,874 respondents in Country “C”
  – 1,481 respondents in Country “D”
• Survey asked about:
  – quality of life
  – governance, politics, and international relations
  – security, social tolerance
Example #2 (continued)
• Figure shows the results from fa.parallel (psych package) for Country A, which resulted in setting r = 27
  – Sensitivity analysis using other values of r confirmed that r = 27 was appropriate
• Country B: r = 28; Country C: r = 25; etc.
“Government Trust” Factors
[Figure panels: Country “A”, Country “B”, Country “C”, Country “D”]
“Trustor Propensity” Factors
[Figure panels: Country “A”, Country “B”, Country “C”, Country “D”]
“Ability” Factors
[Figure: Ability Factors & Loadings; panels: Country “A”, Country “B”, Country “C”, Country “D”]
“Benevolence/Integrity” Factors
[Figure panels: Country “A”, Country “B”, Country “C”, Country “D”]
What We Have Just Learned
• Learned about factor analysis as a tool for:
  – Deriving unobserved latent variables from observed survey question responses
  – Data reduction
• Discussed the steps in conducting factor analysis and the R functions/syntax
• Illustrated the application of factor analysis to survey data