Structural Equation Modeling and Confirmatory Factor Analysis
Transcript of Structural Equation Modeling and Confirmatory Factor Analysis
Structural Equation Modeling and Confirmatory Factor Analysis
Advanced Statistics for Researchers, Session 3
Dr. Chris Rakes
Website: http://csrakes.yolasite.com
Email: [email protected]
@RakesChris
Types of Variables
Nominal: names, categories, ID numbers
Ordinal: ranks
Interval: dichotomous, polytomous (no absolute zero)
Ratio: measurements, scalars (absolute zero)
Describing Data by the Center
Example data set: 50, 10, 1, 7, 1, 25, 20
◦ Mean: the center value, $\bar{x} = \frac{\sum x}{n} = \frac{114}{7} \approx 16.3$
◦ Median: the center term of the ordered data 1, 1, 7, 10, 20, 25, 50, which is 10
◦ Mode: the most often repeated term(s), which is 1
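A quick numeric check of these three centers (a minimal sketch using Python's standard statistics module; the data set is the one above):

    from statistics import mean, median, multimode

    data = [50, 10, 1, 7, 1, 25, 20]

    print(mean(data))       # 114 / 7, about 16.3
    print(median(data))     # middle of 1, 1, 7, 10, 20, 25, 50 -> 10
    print(multimode(data))  # [1], the only repeated value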
Degrees of Freedom
Number of independent observations. Consider a group of 4 observations whose mean is 20 (sum = 80), so we estimate μ = 20. In the next sample, we have already estimated the population mean to be 20, so the 4 data points must sum to 80. The first 3 observations are free to be anything, but the fourth must be fixed to make the sum 80.
____ + ____ + ____ + ____ = 80
(Free)   (Free)   (Free)   (Fixed)
So, we always lose a degree of freedom when we estimate a parameter.
Variance and Standard Deviation
Consider a sample data set: 59, 63, 71, 67, 64, 72 (mean = 66)
$X - \bar{X}$: -7, -3, 5, 1, -2, 6 (sum = 0)
$X - \bar{X}$ is the "mean difference."
If the sum of the differences is 0, how can we compute a meaningful average?
[Figure: two points, P1 and P2, connected by a horizontal leg a, a vertical leg b, and the straight-line distance c. Notice that a + b ≠ c.]
Enter Pythagoras

[Figure: right triangle with legs a and b and hypotenuse c.]

$a^2 + b^2 = c^2$, so $c = \sqrt{a^2 + b^2}$

So, square areas can be used to calculate distance, and they eliminate the 0 sum problem.
Returning to Sample Data
Let's look at a picture of these squares.

X: 59, 63, 71, 67, 64, 72 (mean = 66)
Distance, $X - \bar{X}$: -7, -3, 5, 1, -2, 6
(Distance)², $(X - \bar{X})^2$: 49, 9, 25, 1, 4, 36 (sum = 124)
[Figure: the squared deviations drawn as squares along a number line of the data values, with the horizontal axis running from 55 to 75 and the vertical axis from -10 to 10.]
Why 5 and not 6? Because df = n - 1 = 5: we lose a degree of freedom by estimating the mean.
Variance = 124 / 5 = 24.8
How can we find the side length of this square? $\sqrt{24.8} \approx 4.98$
What does that get us?
The average square of mean distances is referred to as "variance."
This "noise" gives us a measure of how much of the data is not represented by the mean.
Standard deviation: the side length of the variance square; the average distance from the mean.

Variance: $\sigma^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
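A minimal numeric check of these two formulas on the sample data above (plain Python, dividing by n - 1):

    data = [59, 63, 71, 67, 64, 72]

    n = len(data)
    mean = sum(data) / n                            # 66.0
    squared_devs = [(x - mean) ** 2 for x in data]  # 49, 9, 25, 1, 4, 36
    variance = sum(squared_devs) / (n - 1)          # 124 / 5 = 24.8
    std_dev = variance ** 0.5                       # about 4.98

    print(mean, variance, std_dev)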
Two Variables: Lines of Best Fit
"Linear Regression" or "Least Squares Regression." How far do points "regress" from a line? Regress = "deviate from."
For two variables, we begin by considering the dependent variable: the average distance of each Y from $\bar{Y}$.

X: 11, 6, 5, 8, 9, 4, 9, 12
Y: 9, 1, 2, 3, 5, 2, 4, 6
$\bar{Y} = 4$
$Y - \bar{Y}$: 5, -3, -2, -1, 1, -2, 0, 2 (sum = 0)
Converting to Squares

Y: 9, 1, 2, 3, 5, 2, 4, 6
$Y - \bar{Y}$: 5, -3, -2, -1, 1, -2, 0, 2
$(Y - \bar{Y})^2$: 25, 9, 4, 1, 1, 4, 0, 4

$SS_Y = \sum (Y - \bar{Y})^2 = 48$, with N = 8 and df = 7.

So the average-sized square of variance is $s_Y^2 = \frac{\sum (Y - \bar{Y})^2}{df} = \frac{48}{7} \approx 6.9$.

Then the average distance from the mean is $s_Y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{df}} = \sqrt{6.9} \approx 2.62$.
Do the same for X

X: 11, 6, 5, 8, 9, 4, 9, 12
$X - \bar{X}$: 3, -2, -3, 0, 1, -4, 1, 4 (sum = 0)
$(X - \bar{X})^2$: 9, 4, 9, 0, 1, 16, 1, 16

$\bar{X} = 8$
$SS_X = \sum (X - \bar{X})^2 = 56$, with N = 8 and df = 7.
$s_X^2 = \frac{\sum (X - \bar{X})^2}{df} = \frac{56}{7} = 8$
$s_X = \sqrt{\frac{\sum (X - \bar{X})^2}{df}} = \sqrt{8} \approx 2.83$
How strong is the relationship between X and Y?

X: 11, 6, 5, 8, 9, 4, 9, 12
Y: 9, 1, 2, 3, 5, 2, 4, 6
$X - \bar{X}$: 3, -2, -3, 0, 1, -4, 1, 4
$Y - \bar{Y}$: 5, -3, -2, -1, 1, -2, 0, 2
$(X - \bar{X})(Y - \bar{Y})$: 15, 6, 6, 0, 1, 8, 0, 8
$(X - \bar{X})^2$: 9, 4, 9, 0, 1, 16, 1, 16 ($SS_X = 56$)
$(Y - \bar{Y})^2$: 25, 9, 4, 1, 1, 4, 0, 4 ($SS_Y = 48$)

Sum of cross products: $SP_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = 44$

Covariance: $s_{XY} = \frac{SP_{XY}}{df} = \frac{44}{7} \approx 6.29$

Correlation: $r_{XY} = \frac{SP_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{s_{XY}}{s_X s_Y} = \frac{44}{\sqrt{56 \times 48}} \approx .849$
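The same arithmetic in a short Python sketch (a minimal check of the sum of cross products, the covariance, and r for the eight data pairs above):

    X = [11, 6, 5, 8, 9, 4, 9, 12]
    Y = [9, 1, 2, 3, 5, 2, 4, 6]

    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n                      # 8.0 and 4.0
    sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))  # 44
    ss_x = sum((x - x_bar) ** 2 for x in X)                    # 56
    ss_y = sum((y - y_bar) ** 2 for y in Y)                    # 48

    covariance = sp / (n - 1)          # 44 / 7, about 6.29
    r = sp / (ss_x * ss_y) ** 0.5      # about 0.849

    print(covariance, r)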
Estimate Line of Best Fit

$\hat{Y}_i = b_0 + b_1 X_i$

$b_1 = \frac{SP_{XY}}{SS_X} = \frac{44}{56} = 0.786$

$b_0 = \bar{Y} - b_1 \bar{X} = 4 - 0.786(8) = 4 - 6.288 = -2.288$

So, $\hat{Y}_i = -2.288 + 0.786 X_i$.
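The slope and intercept follow directly from the same sums (a minimal Python sketch, kept self-contained for clarity):

    X = [11, 6, 5, 8, 9, 4, 9, 12]
    Y = [9, 1, 2, 3, 5, 2, 4, 6]

    x_bar, y_bar = sum(X) / len(X), sum(Y) / len(Y)            # 8.0 and 4.0
    sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))  # 44
    ss_x = sum((x - x_bar) ** 2 for x in X)                    # 56

    b1 = sp / ss_x             # 44 / 56, about 0.786 (slope)
    b0 = y_bar - b1 * x_bar    # 4 - 0.786 * 8, about -2.288 (intercept)
    print(b0, b1)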
Use Linear Equation Information

X: 11, 6, 5, 8, 9, 4, 9, 12
Y: 9, 1, 2, 3, 5, 2, 4, 6
$X - \bar{X}$: 3, -2, -3, 0, 1, -4, 1, 4
$Y - \bar{Y}$: 5, -3, -2, -1, 1, -2, 0, 2
$(X - \bar{X})(Y - \bar{Y})$: 15, 6, 6, 0, 1, 8, 0, 8
$(X - \bar{X})^2$: 9, 4, 9, 0, 1, 16, 1, 16
$(Y - \bar{Y})^2$: 25, 9, 4, 1, 1, 4, 0, 4 (sum: $SS_Y = 48$)
$\hat{Y}$: 6.358, 2.428, 1.642, 4.000, 4.786, 0.856, 4.786, 7.144
$Y - \hat{Y}$: 2.642, -1.428, 0.358, -1.000, 0.214, 1.144, -0.786, -1.144
$(Y - \hat{Y})^2$: 6.98, 2.04, 0.13, 1.00, 0.05, 1.31, 0.62, 1.31

Using $\hat{Y}_i = -2.288 + 0.786 X_i$, the residual sum of squares is $\sum (Y - \hat{Y})^2 = 13.43$.

$SS_{regression} = SS_Y - \sum (Y - \hat{Y})^2 = 48 - 13.429 = 34.571$

How much of an effect did the regression have?

$r_{XY}^2 = \frac{SS_{regression}}{SS_Y} = \frac{34.571}{48} = 0.72$
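The same partitioning of the variance in a short Python sketch (using the intercept and slope estimated above):

    X = [11, 6, 5, 8, 9, 4, 9, 12]
    Y = [9, 1, 2, 3, 5, 2, 4, 6]
    b0, b1 = -2.288, 0.786                                       # estimated above

    y_hat = [b0 + b1 * x for x in X]                             # predicted Y values
    ss_residual = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))  # about 13.43
    y_bar = sum(Y) / len(Y)
    ss_y = sum((y - y_bar) ** 2 for y in Y)                      # 48
    ss_regression = ss_y - ss_residual                           # about 34.57

    print(ss_regression / ss_y)                                  # r squared, about 0.72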
SEM
*Causal* processes can be represented by structural equations (regression equations: dependent variables being predicted by independent variables).
A model of these structural relations can be generated (and modeled pictorially).
SEM Variables
Observed (or manifest, or measured) variables: X's or Y's.
Latent variables (factors): constructs that cannot be directly observed (or measured). Latent variables are estimated through hypothesized relationships with observed variables.
◦ Exogenous latent variables: independent variables that "cause" changes in other latent variables in the model. These are taken as given by the model under consideration, and any changes in exogenous variables are due to factors outside the model.
◦ Endogenous latent variables: dependent variables that are influenced by exogenous variables in the model. These are the outcomes your SEM model wishes to explain.
[Figure: path diagram showing exogenous and endogenous latent variables, observed X and Y indicators with factor loadings, error terms on the observed variables, and a residual on the endogenous latent variable.]
Factor Analysis
Used to identify the factor structure or model for a set of variables (Stevens, 2012).
Two types: Exploratory (EFA) and Confirmatory (CFA).
Exploratory Factor Analysis
Several methods:
◦ Principal Components Analysis (PCA): Each successive component accounts for the largest amount of remaining unexplained variance (see the sketch after this list).
◦ Principal Axis Factoring: Identical to PCA, except that the factors are extracted from a correlation matrix with "communality estimates" on the main diagonal rather than 1's, as in PCA.
◦ Unweighted Least Squares: Minimizes the sum of squared differences between the observed and model-implied off-diagonal correlation matrices.
◦ Generalized Least Squares: Correlations are weighted by the inverse of their uniqueness; higher uniqueness means less weight.
◦ Alpha: Maximizes the Cronbach's alpha of the factors (i.e., reliability).
◦ Image: Factors are defined by their linear regression on variables not associated with the hypothetical factors.
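As a rough illustration of the PCA extraction idea only (a minimal NumPy sketch, not any particular package's EFA routine; the data here are randomly generated placeholders):

    import numpy as np

    # Placeholder data matrix: rows are cases, columns are observed variables.
    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 4))

    corr = np.corrcoef(data, rowvar=False)  # correlation matrix with 1's on the diagonal
    eigenvalues, eigenvectors = np.linalg.eigh(corr)

    # Sort from largest to smallest: the first component accounts for the most variance,
    # the second for the most of what remains, and so on.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    print(eigenvalues / eigenvalues.sum())  # proportion of variance per component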
Maximum Likelihood Estimation
Attempts to find the population parameter values from which the observed data are most likely to have arisen.
The likelihood function quantifies the discrepancy between the observed data and the model-implied parameters, assuming a multivariate normal distribution.
Closed-form solutions for the parameters usually do not exist, so iterative algorithms are used in practice for parameter estimation.
The Model Fitting Process
Let S = the sample variance/covariance matrix of observed scores from p variables.
Let Σ = the variance/covariance matrix of the population.
Let θ represent the vector of model parameters. Therefore, Σ(θ) represents the restricted variance/covariance matrix implied by the model.
We are testing the hypothesis that the restricted matrix holds in the population. Null hypothesis: Σ = Σ(θ).
SEM computes a minimum discrepancy function, Fmin.
Understanding the Fmin Function

$F_{min} = \log\left|\Sigma(\theta)\right| - \log\left|S\right| + \mathrm{Trace}\left[S\,\Sigma(\theta)^{-1}\right] - p$

◦ As Σ(θ) approaches S, the difference of the two log determinants approaches 0.
◦ Trace: the sum of the diagonal of a matrix.
◦ An inverse matrix times itself equals the identity matrix (I). So, as Σ(θ) approaches S, $\Sigma(\theta)^{-1}S$ approaches I, and as a result the trace of that matrix approaches the number of observed variables, p.
◦ So, as Σ(θ) approaches S, the difference between the trace and p approaches 0.
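A minimal numeric sketch of this function (NumPy; the sample and model-implied matrices below are made-up three-variable examples, not output from any real model):

    import numpy as np

    def f_min(S, sigma_theta):
        # ML discrepancy between a sample covariance matrix S and a
        # model-implied covariance matrix Sigma(theta).
        p = S.shape[0]
        logdet_s = np.linalg.slogdet(S)[1]
        logdet_model = np.linalg.slogdet(sigma_theta)[1]
        trace_term = np.trace(S @ np.linalg.inv(sigma_theta))
        return logdet_model - logdet_s + trace_term - p

    S = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])

    print(f_min(S, S))          # 0.0 when the model reproduces S exactly
    print(f_min(S, np.eye(3)))  # positive when the model misfits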
Maximum Likelihood Estimation (Cont'd.)
The shape of the multivariate normal curve is defined by:

$\ell_i = \frac{1}{(2\pi)^{p/2}\left|\Sigma\right|^{1/2}} \exp\left[-\frac{1}{2}(x_i - \mu)'\,\Sigma^{-1}(x_i - \mu)\right]$

Substituting an individual's vector of scores yields the likelihood of that set of scores given the population mean vector μ and covariance matrix Σ.
Maximum Likelihood Estimation (Cont'd.)
A model's final parameter estimates are those that yield model-implied variances and covariances (and means) that maximize the combined likelihood of all n cases:

$\mathcal{L} = \ell_1 \times \ell_2 \times \ell_3 \times \cdots \times \ell_n$, where $\ell_i = \frac{1}{(2\pi)^{p/2}\left|\Sigma\right|^{1/2}} \exp\left[-\frac{1}{2}(x_i - \mu)'\,\Sigma^{-1}(x_i - \mu)\right]$
Casewise Log Likelihoods
Likelihoods tend to be very small numbers, and hence their products become practically infinitesimal.
Taking the natural log of the likelihood makes things a bit more manageable:

$\log \mathcal{L} = \log(\ell_1 \times \ell_2 \times \cdots \times \ell_n) = \log \ell_1 + \log \ell_2 + \cdots + \log \ell_n$

$\log \ell_i = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log\left|\Sigma\right| - \frac{1}{2}(x_i - \mu)'\,\Sigma^{-1}(x_i - \mu)$
Casewise Log Likelihoods (Cont'd.)
With complete data, each case's contribution to the overall log likelihood (LL) is:

$\log \ell_i = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log\left|\Sigma\right| - \frac{1}{2}(x_i - \mu)'\,\Sigma^{-1}(x_i - \mu)$

In the missing data context, each case's contribution to the log likelihood is:

$\log \ell_i = -\frac{p_i}{2}\log(2\pi) - \frac{1}{2}\log\left|\Sigma_i\right| - \frac{1}{2}(x_i - \mu_i)'\,\Sigma_i^{-1}(x_i - \mu_i)$

The data and parameter arrays can vary for each ith case.
The ith case's contribution to the overall likelihood is based only on those variables for which that case has complete data.
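A minimal sketch of that casewise computation (NumPy; the mean vector, covariance matrix, and case scores are made-up three-variable values, and missingness is handled by simply dropping the unobserved positions):

    import numpy as np

    def casewise_loglik(x, mu, sigma):
        # Log-likelihood contribution of one case under a multivariate normal,
        # using only the variables observed (non-NaN) for that case.
        observed = ~np.isnan(x)
        x_i, mu_i = x[observed], mu[observed]
        sigma_i = sigma[np.ix_(observed, observed)]
        p_i = observed.sum()
        diff = x_i - mu_i
        return (-p_i / 2 * np.log(2 * np.pi)
                - 0.5 * np.linalg.slogdet(sigma_i)[1]
                - 0.5 * diff @ np.linalg.inv(sigma_i) @ diff)

    mu = np.array([0.0, 0.0, 0.0])
    sigma = np.array([[1.0, 0.3, 0.2],
                      [0.3, 1.0, 0.4],
                      [0.2, 0.4, 1.0]])

    print(casewise_loglik(np.array([0.5, -0.2, 1.0]), mu, sigma))    # complete case
    print(casewise_loglik(np.array([0.5, np.nan, 1.0]), mu, sigma))  # variable 2 missing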
Maximum Likelihood in SEM
A model's final parameter estimates are those that yield model-implied variances and covariances (and means) that maximize the aggregated casewise log likelihoods:

$\log \mathcal{L}(\mu, \Sigma) = \sum_{i=1}^{n}\left[-\frac{p_i}{2}\log(2\pi) - \frac{1}{2}\log\left|\Sigma_i\right| - \frac{1}{2}(x_i - \mu_i)'\,\Sigma_i^{-1}(x_i - \mu_i)\right]$

In FIML (full information maximum likelihood), no data are ever imputed. Parameters and their standard errors are estimated directly using all observed data. FIML is the default in many software packages (e.g., Mplus, AMOS).
Confirmatory Factor Analysis
CFA cannot be run easily in basic statistics packages such as SPSS; they do not offer the option to force variables to load on particular factors, only to specify the number of factors.
SEM software easily accommodates CFA models, e.g., Mplus, AMOS, EQS, LISREL.
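For instance, in Python the semopy package accepts a lavaan-style model description in which each latent factor is defined only by the observed variables forced to load on it (a rough sketch; the factor names, item names, and data file here are hypothetical, and the exact semopy calls should be checked against its documentation):

    import pandas as pd
    import semopy  # assumed API; verify against the semopy documentation

    # Two-factor CFA: each observed item is forced to load on one factor only.
    model_desc = """
    Distress =~ item1 + item2 + item3 + item4
    Coping   =~ item5 + item6 + item7
    Distress ~~ Coping
    """

    data = pd.read_csv("survey_items.csv")  # hypothetical data file with the item columns
    model = semopy.Model(model_desc)
    model.fit(data)         # maximum likelihood estimation
    print(model.inspect())  # loadings, variances, and covariances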
Psychological Distress CFA
[Figure: first-order CFA and second-order CFA path diagrams for the psychological distress model.]
Psychological Distress CFA Results

Model  Description                            N    AIC      DF   Chi Square  CFI    RMSEA  RMSEA LO90  RMSEA HI90  SRMR   ECVI
0a1    CFA Caregiver Psychological Distress   227  8298.11  103  283.81      0.901  0.088  0.076       0.100       0.049  36.56
0a2    0a1 with Q030 and Q031 covaried        227  8236.27  102  219.97      0.935  0.071  0.058       0.084       0.044  36.28
0a3    2nd-order CFA built on 0a2             227  8238.27  101  219.97      0.935  0.072  0.059       0.085       0.044  36.29
Fit Index                                          Criterion
Minimum fit χ²                                     Nested model comparison
CFI (Comparative Fit Index)                        > 0.95
AIC (Akaike Information Criterion)                 Model comparison only (models need not be nested); smaller value = better fit
SRMR (Standardized Root Mean Square Residual)      < 0.10 reasonable fit; < 0.08 good fit
RMSEA (Root Mean Square Error of Approximation)    < 0.05 good fit; 0.05 to 0.08 reasonable; 0.08 to 0.10 mediocre; > 0.10 poor fit
ECVI (Expected Cross-Validation Index)             As the model is changed, a smaller value indicates greater likelihood of being generalizable in the population
Reflection on CFA
What is your dissertation/thesis conceptual framework?
Are the constructs in your framework well-defined, and are the definitions well-established?
Could a CFA strengthen your study? Why or why not?
Thank You!
All materials from this workshop series can be downloaded at http://csrakes.yolasite.com/resource.php