Improving Model Selection in Structural Equation Models with New Bayes Factor Approximations
Kenneth A. Bollen∗ Surajit Ray† Jane Zavisca‡ Jeffrey J. Harden§
April 11, 2011
Abstract
Often researchers must choose among two or more structural equation models for a given set of data. Typically, selection is based on having the highest chi-square p-value or the highest fit index such as the CFI or RMSEA. Though a common situation, there is little evidence on the performance of these fit indices in choosing between models. In other statistical applications, Bayes Factor approximations such as the BIC are commonly used to select between models, but these are rarely used in SEMs. This paper examines several new and old Bayes Factor approximations along with some commonly used fit indices to assess their accuracy in choosing the true model among a broad set of false models. The results show that the Bayes Factor approximations outperform the other fit indices. Among these approximations one of the new ones, SPBIC, is particularly promising. The commonly used chi-square p-value and the CFI, IFI, and RMSEA do much worse.
Keywords: Bayes Factor · Structural Equation Models · Empirical Bayes · Information Criterion · Model Selection
∗Department of Sociology, University of North Carolina, Chapel Hill, NC 27599, [email protected]
†Department of Mathematics and Statistics, Boston University, Boston, MA 02446, [email protected]
‡Department of Sociology, University of Arizona, Tucson, AZ 85721, [email protected]
§Department of Political Science, University of North Carolina, Chapel Hill, NC 27599, [email protected]
1 Introduction
Structural equation models (SEMs) comprise a general class of statistical models that incorporates
ANOVA, multiple regression, simultaneous equations, path analysis, factor analysis, growth curve
analysis, and many other common statistical models as special cases. SEMs are well-established
in sociology and the social and behavioral sciences and are beginning to spread to the health and
natural sciences (e.g., Bentler and Stein 1992; Shipley 2000; Schnoll, Fang, and Manne 2004; Beran
and Violato 2010). A key issue in the use of SEMs is determining the fit of the model to
the data and comparing the relative fit of two or more models to the same data. An asymptotic
chi-square distributed test statistic that tests whether a SEM exactly reproduces the covariance
matrix of observed variables was the first test and index of model fit (Jöreskog 1969, 1973). However,
given the approximate nature of all models and the enhanced statistical power that generally
accompanies a large sample size, the chi-square test statistic routinely rejects models in large sam-
ples regardless of their merit as approximations to the underlying process. Starting with Tucker
and Lewis (1973), Bentler and Bonett (1980), and Jöreskog and Sörbom (1979), a variety of fit
indices have been developed to supplement the chi-square test statistic (e.g., see Bollen and Long
1993). Controversy surrounds these alternative fit indices in that often the sampling distribution of
the fit index is not known, sometimes the fit index tends to increase as sample size increases, and
there is disagreement about the optimal cutoff values for a “good fitting” model.
Bayes Factors play a role in comparing the fit of models in multiple regression, generalized
linear models, and several other areas of statistical modeling. Yet despite this common use, Bayes
Factors have received little attention in SEM literature. Cudeck and Browne (1983) and Bollen
(1989) give only brief mention to Schwarz’s (1978) Bayesian Information Criterion (BIC) as an
approximation to the Bayes Factor. Raftery (1993, 1995) provides more discussion of the BIC
in SEMs, as do Haughton, Oud, and Jansen (1997). However, it is rare to see the BIC or Bayes
Factors in applications of SEMs, despite their potential value in comparing the fit of competing
models.
The primary purpose of this paper is to compare several widely used measures of SEM
model fit mentioned above with several different approximations to Bayes Factors, including two
new approximations: the Information Matrix-Based Information Criterion (IBIC) and the Scaled
Unit Information Prior Bayesian Information Criterion (SPBIC). In the course of presenting the
IBIC and SPBIC, we explain the potential use of Bayes Factors in model selection of SEMs. As
evidence of this usefulness, we demonstrate in a simulation study that Bayes Factor approximations
perform much better in SEM model selection than more commonly-used methods, and that IBIC
and SPBIC are among the best-performing fit statistics under several conditions.
The next section introduces the SEM equations and assumptions and reviews the current
methods of assessing SEM model fit. Following this is a section in which we present the Bayes
Factor, the BIC, and the IBIC and SPBIC. We then describe our Monte Carlo simulation study
that examines the behavior of various model selection criteria for a series of models that are either
correct, underspecified, or overspecified. Finally, we conclude with a discussion of the implications
of the results and recommendations for researchers.
2 Structural Equation Models and Fit Indices
SEMs have two primary components: a latent variable model and a measurement model.
We write the latent variable model as:
η = αη + Bη + Γξ + ζ (1)
The η vector is m × 1 and contains the m latent endogenous variables. The intercept terms are in
the m × 1 vector of αη. The m ×m coefficient matrix B gives the effect of the ηs on each other.
The N latent exogenous variables are in the N × 1 vector ξ. The m × N coefficient matrix Γ
contains the coefficients for ξ’s impact on the ηs. An m× 1 vector ζ contains the disturbances for
each latent endogenous variable. We assume that E(ζ) = 0, COV (ζ ′, ξ) = 0, and we assume that
the disturbance for each equation is homoscedastic and nonautocorrelated across cases, although
the variances of ζs from different equations can differ and these ζs can correlate across equations.
The m × m covariance matrix Σζζ has the variances of the ζs down the main diagonal and the
across equation covariances of the ζs on the off-diagonal. The N × N covariance matrix of ξ is
Σξξ.
The measurement model is:
y = αy + Λyη + ε (2)
x = αx + Λxξ + δ (3)
The p × 1 vector y contains the indicators of the ηs. The p × m coefficient matrix Λy (the "factor
loadings") gives the impact of the ηs on the ys. The unique factors or errors are in the p × 1 vector
ε. We assume that E(ε) = 0 and COV (η′, ε) = 0. The covariance matrix for the ε is Σεε. There
are analogous definitions and assumptions for measurement Equation 3 for the q × 1 vector x. We
assume that ζ , ε, and δ are uncorrelated with ξ and in most models these disturbances and errors
are assumed to be uncorrelated among themselves, though the latter assumption is not essential.1
We also assume that the errors are homoscedastic and nonautocorrelated across cases.
Define z′ = [y′ x′]; then the covariance matrix and mean of z as functions of the model
parameters are

Σ(θ) = E[(z − µz)(z − µz)′]                                                        (4)

     = ⎡ Λy(I − B)⁻¹(ΓΣξξΓ′ + Σζζ)(I − B)⁻¹′Λ′y + Σεε    Λy(I − B)⁻¹ΓΣξξΛ′x ⎤
       ⎣ ΛxΣξξΓ′(I − B)⁻¹′Λ′y                             ΛxΣξξΛ′x + Σδδ     ⎦

µ(θ) = E[z]                                                                        (5)

     = ⎡ αy + Λy(I − B)⁻¹(αη + Γκ) ⎤
       ⎣ αx + Λxκ                  ⎦
1We can modify the model to permit such correlations among the errors, but do not do so here.
where θ is a vector that contains all of the parameters (i.e., coefficients, variances, and covariances)
that are estimated in a model and κ is the vector of means of ξ. Σ(θ) and µ(θ) are the implied
covariance matrix and the implied mean vector of the observed variables, z. These implied matrices
are written as functions of the parameters of the SEM. For a valid model, they imply that the
covariance matrix (Σ) and the mean vector (µ) of the observed variables, z, are exactly reproduced
by the corresponding implied matrices so that
Σ = Σ(θ) (6)
and
µ = µ(θ). (7)
The log likelihood of the maximum likelihood estimator, derived under the assumption that z
comes from a multinormal distribution, is
lnL(θ) = K − (1/2) ln|Σ(θ)| − (1/2)[z − µ(θ)]′Σ⁻¹(θ)[z − µ(θ)],                    (8)
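As a concrete numerical illustration (ours, not the paper's), the kernel of Equation 8 can be evaluated for any candidate implied moments; the two-variable mean and covariance below are arbitrary placeholders.

```python
import numpy as np

def mvn_loglik(z, mu, sigma):
    """Log density of a multivariate normal, up to the additive constant K,
    matching the form of Equation 8."""
    diff = z - mu
    sign, logdet = np.linalg.slogdet(sigma)
    assert sign > 0, "implied covariance matrix must be positive definite"
    quad = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * logdet - 0.5 * quad

# Arbitrary illustrative implied moments for two observed variables
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
print(mvn_loglik(np.array([0.0, 0.0]), mu, sigma))
```

The log likelihood is largest when z equals the implied mean, and falls off with the Mahalanobis distance from it, which is exactly the quantity the ML fitting function minimizes in aggregate.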
where K is a constant that has no effect on the values of θ that maximize the log likelihood.
The chi-square test statistic derives from a likelihood ratio test of the hypothesized model
to a saturated model where the covariance matrix and mean vector are exactly reproduced. The
null hypothesis of the chi-square test is that Equations 6 and 7 hold exactly. In large samples,
the likelihood ratio (LR) test statistic follows a chi-square distribution when the null hypothesis is
true. Excess kurtosis can affect the test statistic, though there are various corrections for excess
kurtosis (Satorra and Bentler 1988; Bollen and Stine 1992). In practice, the typical outcome of
this test in large samples is rejection of the null hypothesis that represents the model. This often
occurs even when corrections for excess kurtosis are made. At least part of the explanation is that
the null hypothesis assumes exact fit whereas in practice we anticipate at least some inaccuracies
in our modeling. The significant LR test simply reflects what we know: our model is at best an
approximation.
In response to the common rejection of the null hypothesis of perfect fit, researchers have
proposed and used a variety of fit indices. The Incremental Fit Index (IFI, Bollen 1989), Comparative
Fit Index (CFI), Tucker-Lewis Index (TLI, 1973), Relative Noncentrality Index (RNI, Bentler
and Bonett 1980), and Root Mean Squared Error of Approximation (RMSEA, Steiger and Lind
1980) are just a few of the many indices of fit that have been proposed. Though the popularity
of each has waxed and waned, there are problems characterizing all or most of these. One is that
there is ambiguity as to the proper cutoff value to signify that a model has an acceptable fit. Different
authors recommend different standards of good fit, and in nearly all cases the justification
for cutoff values is not well founded. Second, the distributions of most of these fit indices are
unknown. Thus, with the exception of the RMSEA, confidence intervals for the fit indices are not
provided. Third, the fit indices that are based on comparisons to a baseline model (e.g., TLI, IFI,
RNI) can behave somewhat differently depending on the fitting function used to estimate a model.
This suggests that standards of fit should be adjusted depending on the fitting function employed.
Another issue is that the means of the sampling distributions of some measures tend to be larger
in bigger samples than in smaller ones even if the identical model is fitted. Thus, cutoff values for
such indices might need to differ depending on the size of the sample.
Another set of issues arises when comparing competing models for the same data. If the
competing models are nested, then a LR test that is asymptotically chi-square distributed is available.
However, in such a case, the issue of statistical power in small samples remains. Furthermore,
the LR test is not available for nonnested models. Finally, other factors such as excess kurtosis can
impact the LR test. The situation is not made better by employing other fit indices. Though we
can take the difference in the fit indices across two or more models, there is little guidance on what
to consider as a sufficiently large difference to conclude that one model’s fit is better than another.
In addition, the standards might need to vary depending on the estimator employed and the size
of the sample for some of the fit indices. Nonnested models further complicate these comparisons
since the rationale for direct comparisons of nonnested models for most of these fit indices is not
well-developed.
This brief overview of the LR test and the other fit indices in use in SEMs reveals that there
are drawbacks to the current methods of assessing the fit of models. It would be desirable to have
a method to assess fit that worked for nested and nonnested models and that had a firm rationale in
statistical theory. From a practical point of view, it is desirable to have a measure that is calculable
using standard SEM software. In the next section, we examine the Bayes Factor and methods to
approximate it that address these criteria.
3 Approximating Bayes Factors in SEMs
The Bayes Factor expresses the odds of observing a given set of data under one model
versus an alternative model. For any two models 1 and 2, the Bayes Factor, B12, is the ratio of
P (Y|Mk) for two different models where P (Y|Mk) is the probability of the data Y given the
model, Mk. More explicitly, the Bayes Factor, B12, is

B12 = P(Y|M1) / P(Y|M2).                                                           (9)
Conceptually, the Bayes Factor is a relatively straightforward way to compare models, but
it can be complex to compute. The marginal likelihoods must be calculated for the competing
models. Following Bayes' Theorem, the marginal likelihood (also known as the integrated
likelihood) can be expressed as a weighted average of the likelihood over all possible parameter
values under a given model:

P(Y|Mk) = ∫ P(Y | θk, Mk) P(θk | Mk) dθk,                                          (10)

where θk is the vector of parameters of Mk, P(Y | θk, Mk) is the likelihood under Mk, and P(θk | Mk)
is the prior distribution of θk.
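For intuition, the weighted average in Equation 10 can be approximated by simple Monte Carlo: draw parameter values from the prior and average the likelihood. The toy normal-mean model below is our own illustration, not part of the paper; a conjugate model is used so the answer is checkable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model M_k: Y_i ~ Normal(theta, 1), with prior theta ~ Normal(0, 1)
y = rng.normal(0.3, 1.0, size=50)

def loglik(theta, y):
    # log P(Y | theta, M_k) for the normal-mean model
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

# Monte Carlo version of Equation 10: average the likelihood over prior draws
draws = rng.normal(0.0, 1.0, size=20000)
logliks = np.array([loglik(t, y) for t in draws])
m = logliks.max()  # log-sum-exp trick for numerical stability
log_marginal = m + np.log(np.mean(np.exp(logliks - m)))
print(log_marginal)
```

For this conjugate model the marginal likelihood is also available in closed form, which validates the estimate; for SEMs the integral runs over many parameters, which is why Laplace-type approximations are attractive.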
Closed-form analytic expressions of the marginal likelihoods are difficult to obtain, and
numeric integration is computationally intensive with high dimensional models. Approximation
of this integral via the Laplace method is a common approach to solving this problem. Technical
details of the Laplace method can be found in many sources, including Kass and Raftery (1995).
In brief, a second order Taylor expansion of logP(Y|Mk) around the Maximum Likelihood
(ML) estimator θk leads to the following expression, where l(θk) is the log likelihood function
(i.e., logP(Y|θk, Mk)), IE(θk) is the expected information matrix, dk is the dimension (number of
parameters) of the model, O(n−1/2) is the approximation error, and n is the sample size.
logP(Y|Mk) = l(θk) + logP(θk|Mk) + (dk/2) log(2π) − (dk/2) log(n) − (1/2) log|IE(θk)| + O(n−1/2).   (11)
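The accuracy of this expansion can be checked numerically on a one-parameter toy model where the exact marginal likelihood is known (our example, not the paper's). With Y_i ~ Normal(theta, 1) and a standard normal prior, the per-observation expected information is 1, so the log|IE| term vanishes and Equation 11 reduces to four terms:

```python
import numpy as np

# Toy model: Y_i ~ Normal(theta, 1) with prior theta ~ Normal(0, 1), so d_k = 1
rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=200)
n = len(y)

theta_hat = y.mean()  # ML estimator of theta
loglik_hat = -0.5 * np.sum((y - theta_hat) ** 2) - 0.5 * n * np.log(2 * np.pi)
log_prior_hat = -0.5 * theta_hat ** 2 - 0.5 * np.log(2 * np.pi)

# Per-observation expected information is 1 here, so log|I_E| = 0 and
# Equation 11 reduces to the remaining terms (d_k = 1):
laplace = loglik_hat + log_prior_hat + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(n)

# Exact log marginal likelihood for this conjugate model, for comparison
S = y.sum()
exact = -0.5 * (y @ y - S * S / (n + 1)) - 0.5 * np.log(n + 1) - 0.5 * n * np.log(2 * np.pi)
print(laplace, exact)
```

With n = 200 the two values agree to within a few thousandths of a log unit, consistent with the stated approximation error.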
This equation forms the basis for several approximations to the Bayes Factor. Here we
briefly present the equations for several approximations, including Schwarz’s BIC, Haughton’s
BIC, and our two new approximations, the IBIC and the SPBIC. Derivations and justifications for
these equations can be found in a companion technical paper (reproduced in the appendix). Here
our goal is simply to present the final equations, show their relationships to each other, and explain
how they can be calculated for SEM models using standard software.
The Standard Bayesian Information Criterion (BIC). Schwarz’s (1978) BIC drops the
second, third, and fifth terms of Equation 11, on the basis that in large samples, the first and fourth
terms will dominate in the order of error. Ignoring these terms gives
logP(Y|Mk) = l(θk) − (dk/2) log(n) + O(1).                                         (12)

If we multiply this by −2, the BIC for comparing, say, M1 and M2 is calculated as

BIC = 2[l(θ2) − l(θ1)] − (d2 − d1) log(n).                                         (13)
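In code, Equation 13 needs only the maximized log likelihoods, parameter counts, and sample size from standard output; the numbers below are hypothetical.

```python
import math

def bic_comparison(loglik1, d1, loglik2, d2, n):
    """BIC statistic of Equation 13 for comparing models 1 and 2.

    Because this approximates -2 log B12, negative values favor
    model 1 and positive values favor model 2."""
    return 2 * (loglik2 - loglik1) - (d2 - d1) * math.log(n)

# Hypothetical output from two SEMs fitted to the same n = 500 cases
print(bic_comparison(-1234.5, 20, -1240.2, 18, 500))  # ≈ 1.03, weakly favoring model 2
```

Here model 2 fits slightly worse but saves two parameters, and the penalty term tips the comparison in its favor, though by less than 2 points, which the decision rule used later in the paper would treat as a close call.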
An advantage of the BIC is that it does not require specification of priors and it is readily
calculable from standard output of most statistical packages. However, the approximation does not
always work well. Problematic conditions include small samples, extremely collinear explanatory
variables, or situations where the number of parameters increases with sample size. For other critiques see
Winship (1999) and Weakliem (1999).
Haughton’s BIC (HBIC). A variant on the BIC retains the third term of Equation 11
(Haughton 1988). We call this enhanced approximation the HBIC (Haughton labels it BIC*).
logP(Y|Mk) = l(θk) + (dk/2) log(2π) − (dk/2) log(n) + O(1)                         (14)
           = l(θk) − (dk/2) log(n/2π) + O(1).
In comparing models M1 and M2 and multiplying by −2, this leads to

HBIC = 2[l(θ2) − l(θ1)] − (d2 − d1) log(n/2π).                                     (15)
The HBIC performed better than the BIC in model selection in a simulation study for SEMs (Haughton, Oud,
and Jansen 1997).
The Information Matrix-Based BIC (IBIC). Following Haughton's lead, we see no reason
to throw away information that is easily calculable. The estimated expected information matrix
IE(θk) is output by many SEM packages (it can be computed as the inverse of the covariance matrix
for parameter estimates). Taking advantage of this, we preserve the fifth term in Equation 11.2

logP(Y|Mk) = l(θk) − (dk/2) log(n/2π) − (1/2) log|IE(θk)| + O(1).
For models M1 and M2 and multiplying by −2, the Information Matrix-Based Bayesian Information
Criterion (IBIC) is given by

IBIC = 2[l(θ2) − l(θ1)] − (d2 − d1) log(n/2π) − log|IE(θ2)| + log|IE(θ1)|.         (16)
2Note that this is very similar to Kashyap's (1982) approximation, which uses log(n) rather than log(n/2π).

The Scaled Unit Information Prior BIC (SPBIC). An alternative derivation of Schwarz's
BIC makes use of the prior distribution of θk. It is possible to arrive at Equation 12 by utilizing a unit
information prior, which assumes that the prior has approximately the same information as the
information contained in a single data point. Critics have argued that the unit information prior is
too flat to reflect a realistic prior. This prior also penalizes complex models too heavily, making
BIC overly conservative toward the null hypothesis (Weakliem 1999; Kuha 2004; Berger et al.
2006). Some researchers have proposed centering priors around the MLE (θ) of Mk, which favors
more complex models heavily. We propose to use a scaled information prior that is also centered
around the MLE, but scaled so that the prior extends out to the likelihood, and does not place all
the mass around the complex model. The final analytical expression is as follows, where Io(θk) is
the observed information matrix evaluated at θk.
logP(Y|Mk) = l(θk) − dk/2 + (dk/2)(log dk − log(θk′IO(θk)θk))                      (17)

=⇒ SPBIC = 2(l(θ1) − l(θ2)) − d1(1 − log[d1/(θ1′IO(θ1)θ1)]) + d2(1 − log[d2/(θ2′IO(θ2)θ2)]).   (18)
For more details on the calculation of the scaling factor (which is model dependent and
thus leads to an empirical Bayes prior) and the derivation of this analytical expression, see
the appendix. The essential point for the practitioner is that the prior is implicit in the information
and does not need to be explicitly calculated. All elements of the SPBIC formula are output by
standard SEM software.
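As a sketch of how the four statistics could be assembled from standard SEM output (log likelihoods, parameter counts, information matrices, and estimates), the helpers below implement Equations 13, 15, 16, and 18 as printed; all names and inputs are our own hypothetical placeholders, not part of any SEM package.

```python
import numpy as np

def bic(l1, d1, l2, d2, n):
    # Equation 13
    return 2 * (l2 - l1) - (d2 - d1) * np.log(n)

def hbic(l1, d1, l2, d2, n):
    # Equation 15
    return 2 * (l2 - l1) - (d2 - d1) * np.log(n / (2 * np.pi))

def ibic(l1, d1, I1, l2, d2, I2, n):
    # Equation 16; I1, I2 are estimated expected information matrices
    return (2 * (l2 - l1) - (d2 - d1) * np.log(n / (2 * np.pi))
            - np.linalg.slogdet(I2)[1] + np.linalg.slogdet(I1)[1])

def spbic(l1, theta1, Io1, l2, theta2, Io2):
    # Equation 18; Io1, Io2 are observed information matrices at the ML estimates
    d1, d2 = len(theta1), len(theta2)
    q1 = theta1 @ Io1 @ theta1
    q2 = theta2 @ Io2 @ theta2
    return (2 * (l1 - l2) - d1 * (1 - np.log(d1 / q1))
            + d2 * (1 - np.log(d2 / q2)))
```

Written this way, each statistic is a function only of quantities that standard software reports, so, as noted above, no prior needs to be specified explicitly by the analyst.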
Theoretically, the IBIC and the SPBIC should perform at least as well as the BIC or the
HBIC, and better under certain circumstances. The IBIC incorporates an additional term that
makes it closer to the original Laplace approximation. To the degree that this term contributes to
distinguishing between models, it holds the potential to improve accuracy. This might occur in
smaller samples where this term is not dominated by the other terms in the approximation. The
SPBIC employs a more flexible prior with more realistic assumptions than the unit information
prior on which the standard BIC is based. This matters in the SEM case because such models often
contain many parameters, so a prior that penalizes complexity too heavily can bias selection against
richly specified models.
However, despite this theoretical justification for using Bayes Factor approximations with
SEMs, there is little empirical evidence as to their effectiveness. The only study of Bayes Factor
approximations in selecting SEMs is the Haughton, Oud, and Jansen (1997) research mentioned
above. These authors simulate a three factor model with six indicators and compare minor variants
on the true structure to determine how successful the Bayes Factor approximations and several
fit indices are in selecting the true model. Their simulation results find that the Bayes Factor
approximations as a group are superior to the p-value of the chi-square test and the other fit
indices in correctly selecting the true generating model. They find that HBIC does the best among
the Bayes Factor approximations. Among the other fit indices and the chi-square p-value, the IFI
and RFI have the best correct model selection percentages, though their success is lower than the
Bayes Factor approximations.
The Haughton, Oud, and Jansen (1997) study is valuable in several respects, not the least of
which is its demonstration of the value of the Bayes Factor approximations. As they acknowledge,
it is unclear whether their results extend beyond the relatively simple model that they examined. In
addition, the structures of most of the models that they considered were quite similar to each other.
The performance of the Bayes Factor approximations for models with very different structures is
unknown. Also their study examined the single model that was best fitting according to their fit
measures. The distance of the best fitting model from the second best is not discussed.
Our simulation analysis seeks to test the generality of their results by looking at larger
models with alternative models that range from mildly different to very different structures. In
addition, we provide the first evidence on the performance of the IBIC and SPBIC in the selection
of SEMs. We also include several popular fit indices including the p-value of the chi-square test
statistic.
4 Simulation Study
To test the performance of these various approximations to the Bayes Factor, we fit a variety
of models to two sets of simulated data. We use models designed by Paxton et al. (2001), which
have been the basis for several other simulation studies. This will enable comparisons to results
from other studies that use the same design but different fit indices. Standard SEM assumptions
hold in these models: exogenous variables and disturbances come from normal distributions and
the disturbances follow the assumptions presented in Section 2 above. Graphical depictions of each
model and the specific parameters used to generate the simulated data can be found in Paxton et al.
(2001). The first simulation (hereafter designated SIM1) was based on the following underlying
population model. The structural model consists of three latent variables linked in a chain:

⎡η1⎤   ⎡ 0    0   0⎤ ⎡η1⎤   ⎡ψ11⎤
⎢η2⎥ = ⎢β21   0   0⎥ ⎢η2⎥ + ⎢ψ22⎥
⎣η3⎦   ⎣ 0   β32  0⎦ ⎣η3⎦   ⎣ψ33⎦
The measurement model contains 9 observed variables, 3 of which crossload on more than
one latent variable. Scaling is accomplished by fixing λ = 1 for a single indicator loading on each
latent variable:

⎡y1⎤   ⎡λ11   0    0 ⎤        ⎡δ1⎤
⎢y2⎥   ⎢ 1    0    0 ⎥        ⎢δ2⎥
⎢y3⎥   ⎢λ31   0    0 ⎥        ⎢δ3⎥
⎢y4⎥   ⎢λ41  λ42   0 ⎥ ⎡η1⎤   ⎢δ4⎥
⎢y5⎥ = ⎢ 0    1    0 ⎥ ⎢η2⎥ + ⎢δ5⎥
⎢y6⎥   ⎢ 0   λ62  λ63⎥ ⎣η3⎦   ⎢δ6⎥
⎢y7⎥   ⎢ 0   λ72  λ73⎥        ⎢δ7⎥
⎢y8⎥   ⎢ 0    0    1 ⎥        ⎢δ8⎥
⎣y9⎦   ⎣ 0    0   λ93⎦        ⎣δ9⎦
For each simulated data set, we fit a series of misspecified models, listed in Table 1. This
range of models tests the ability of the different fit statistics to detect a variety of common errors.
Parameters were misspecified so as to introduce error into both the measurement and latent variable
components of the models. For example, factor loadings for indicators of the latent variables are
incorrectly added or dropped in models 2–4, correlated errors between indicators are introduced
in models 5 and 7, and latent variables are artificially added or dropped in models 11–12 (with
corresponding changes in the measurement model).
[Insert Table 1 here]
The second simulation (SIM2) is based on a generating model that is identical to SIM1 in
the structural model, but contains an additional two observed indicators for each latent variable.

⎡η1⎤   ⎡ 0    0   0⎤ ⎡η1⎤   ⎡ψ1⎤
⎢η2⎥ = ⎢β21   0   0⎥ ⎢η2⎥ + ⎢ψ2⎥
⎣η3⎦   ⎣ 0   β32  0⎦ ⎣η3⎦   ⎣ψ3⎦
⎡y1 ⎤   ⎡λ11    0     0  ⎤        ⎡δ1 ⎤
⎢y2 ⎥   ⎢ 1     0     0  ⎥        ⎢δ2 ⎥
⎢y3 ⎥   ⎢λ31    0     0  ⎥        ⎢δ3 ⎥
⎢y4 ⎥   ⎢λ41    0     0  ⎥        ⎢δ4 ⎥
⎢y5 ⎥   ⎢λ51    0     0  ⎥        ⎢δ5 ⎥
⎢y6 ⎥   ⎢λ61   λ62    0  ⎥ ⎡η1⎤   ⎢δ6 ⎥
⎢y7 ⎥   ⎢ 0     1     0  ⎥ ⎢η2⎥   ⎢δ7 ⎥
⎢y8 ⎥ = ⎢ 0    λ82    0  ⎥ ⎣η3⎦ + ⎢δ8 ⎥
⎢y9 ⎥   ⎢ 0    λ92    0  ⎥        ⎢δ9 ⎥
⎢y10⎥   ⎢ 0   λ10,2 λ10,3⎥        ⎢δ10⎥
⎢y11⎥   ⎢ 0   λ11,2 λ11,3⎥        ⎢δ11⎥
⎢y12⎥   ⎢ 0     0     1  ⎥        ⎢δ12⎥
⎢y13⎥   ⎢ 0     0   λ13,3⎥        ⎢δ13⎥
⎢y14⎥   ⎢ 0     0   λ14,3⎥        ⎢δ14⎥
⎣y15⎦   ⎣ 0     0   λ15,3⎦        ⎣δ15⎦
The misspecified models used in SIM2 are given in Table 2.
[Insert Table 2 here]
For each simulation we generated 1000 data sets with each of the following sample sizes:
N = 100, N = 250, N = 500, N = 1000, and N = 5000. Note that, following standard practice,
samples were discarded if they led to non-converging or improper solutions, and new samples were
drawn to replace the “bad” samples until a total of 1000 samples were drawn that had convergent
solutions for each of the 12 fitted models.3
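The discard-and-redraw procedure can be sketched as a loop; the generate and fit_all callables below are hypothetical stand-ins for the data generation and model fitting steps, not actual SEM software.

```python
def draw_valid_samples(generate, fit_all, n_target=1000):
    """Redraw simulated data sets until n_target of them yield convergent,
    proper solutions for every fitted model."""
    kept = []
    while len(kept) < n_target:
        data = generate()
        results = fit_all(data)
        if all(r["converged"] and r["proper"] for r in results):
            kept.append(results)
    return kept

# Stub illustration: every third simulated data set "fails to converge"
counter = {"draws": 0}

def generate():
    counter["draws"] += 1
    return counter["draws"]

def fit_all(data):
    ok = (data % 3 != 0)
    return [{"converged": ok, "proper": True}]

kept = draw_valid_samples(generate, fit_all, n_target=10)
print(len(kept), counter["draws"])  # prints: 10 14
```

As the footnote's counts make clear, the number of extra draws this loop requires falls sharply as the sample size grows, since convergence failures concentrate at small N.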
In these simulations we tested the relative performance of various information criteria as
well as several traditional tests of overall model fit by calculating the percentage of samples (at
each sample size) for which the correct model (M1) was unambiguously selected, ambiguously
contained within a set of well-fitting models, ambiguously rejected, or unambiguously rejected.

3This was most problematic at small sample sizes for SIM1: at N = 100, 715 samples had to be discarded for nonconvergence and 64 for improper solutions. The respective SIM1 numbers at other sample sizes are: N = 250: 236 and 7; N = 500: 96 and 0; N = 1000: 40 and 0; and N = 5000: 27 and 0. The numbers for SIM2 are: N = 100: 85 and 56; N = 250: 9 and 4; N = 500: 5 and 0; N = 1000: 0 and 0; and N = 5000: 0 and 0.
The fit statistics included in the study are the BIC, IBIC, SPBIC, HBIC, AIC, chi-square p-value,
IFI, CFI, TLI, RNI, and RMSEA. To assess the information criteria measures, we used the follow-
ing categories:
• True/Clear: The true model has the smallest value for the information criterion, and is at
least 2 points smaller than the next best model.
• True/Close: The true model has the smallest value for the information criterion, but it is
within 2 points of the next best model.
• False/Close: An incorrect model has the smallest value of the information criterion, but it is
within 2 points of the true model.
• False/Clear: An incorrect model has the smallest value of the information criterion, and is
at least 2 points smaller than the true model.
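This four-way classification can be implemented directly; the function below is a sketch in our own notation, and the model names and criterion values in the example are hypothetical.

```python
def classify_selection(ic_values, true_model, margin=2.0):
    """Classify one simulated draw for an information criterion.

    ic_values maps model name -> criterion value (smaller is better);
    true_model names the data-generating model."""
    ranked = sorted(ic_values, key=ic_values.get)
    best = ranked[0]
    if best == true_model:
        gap = ic_values[ranked[1]] - ic_values[best]
        return "True/Clear" if gap >= margin else "True/Close"
    gap = ic_values[true_model] - ic_values[best]
    return "False/Clear" if gap >= margin else "False/Close"

# Hypothetical criterion values for three fitted models, M1 being true
print(classify_selection({"M1": 100.0, "M2": 104.5, "M3": 107.2}, "M1"))  # True/Clear
```

Tallying the returned labels over the 1000 replications at each sample size yields exactly the percentages reported in the results below.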
We use the value of 2 as a measure of close fit based on the Jeffreys-Raftery recommenda-
tions for Bayes Factor approximations (Raftery 1995). In essence, this suggests that measures that
differ by 2 or less do not draw sharp distinctions between two models. For the chi-square p-value,
IFI, CFI, TLI, and RNI, the best fitting model is chosen as that with the highest value. For the
RMSEA, the model with the smallest value is the best fitting one. For the IFI, CFI, TLI, RNI, and
RMSEA the fit of two models is considered close if the fit is within ± 0.01. This cutoff is based
on the common practice of only considering models that fall between 0.9 and 1 for the IFI, CFI,
TLI, and RNI, and between 0.10 and 0 for the RMSEA. A difference of 0.01 represents 10% of the
typical range of acceptable model fit, so fit indices that differ by less than this amount differ by
less than 10% of that range. The chi-square p-value difference that we use to separate close fit is also 0.01.
Instances in which the p-value differs by less than 0.01 seem too close to justify the conclusion
that the models fit the data differently.4
4Regardless, results remain unchanged if another value, such as ±0.001, is used for all of these cutoffs.
4.1 Results
Figures 1 and 2 present model selection results. The y-axes in these graphs plot the per-
centage of the 1000 simulations in which a given fit statistic clearly or ambiguously selected the
true model (positive bars) and clearly or ambiguously selected an incorrect model (negative bars).
Each graph plots results at a different sample size, increasing from 100 (top-middle graph)
to 5000 (bottom-right graph). A perfect fit statistic would produce a medium gray positive bar that extends up to
100 on the y-axis of every graph, and no bar extending downward. This would indicate that the
statistic always clearly selects the true model and never selects an incorrect model.
Two main points to note in these figures are that the Bayes Factor approximations are
notably superior to the conventional methods with respect to selecting the correct model and that
their performance improves as sample size increases. The graphs show that all of the measures
perform poorly at N = 100, and improve as the sample size increases. However, this improvement
in performance is larger for the various BIC measures compared to the chi-square-based indices.
At N = 1000 in SIM1, three of the four BIC measures clearly select the true model over 90% of the
time. In contrast, the best performing chi-square-based method, RNI, only selects the correct model
ambiguously, and in only about 65% of the simulations. In fact, even at N = 5000 (SIM1), the IFI,
CFI, TLI, RNI, and RMSEA still ambiguously select an incorrect model more than 30% of the
time and the chi-square p-value clearly selects the wrong model in almost 50% of the simulations.
Both SIM1 and SIM2 show similar patterns, though performance of all of the measures
increases slightly in SIM2, which is to be expected given that its model contains an additional two
observed indicators for each latent variable. In short, the Bayes Factor approximations substan-
tially outperform the more common fit measures in selecting the true model. These results hold for
several different cutoff values with the IFI, CFI, TLI, RNI, and RMSEA (see note 4).
[Insert Figure 1 here]
[Insert Figure 2 here]
There are also noticeable differences between the various information criteria we examine
in Figures 1 and 2. At small sample sizes, BIC and IBIC have the highest rate of clearly selecting
an incorrect model (both over 80% at N = 100 for SIM1), while SPBIC and HBIC perform better,
with SPBIC showing the best performance (over 20% True/Clear at N = 250 in SIM1 and 60%
in SIM2). All of the information criteria improve as N increases, with the two new measures
described here—IBIC and SPBIC—performing slightly better than the others in several instances.
BIC, IBIC, and SPBIC perform the best at the larger sample sizes in both SIM1 and SIM2, with
HBIC performing better than the conventional methods, but not quite at the level of the other three.
For instance, in SIM2 at N = 1000, BIC, IBIC, and SPBIC all clearly select the true model more
than 90% of the time (with IBIC at 100%), while HBIC falls into the True/Clear category about
85% of the time. Finally, AIC performance is between that of the various Bayesian information
criteria and the conventional methods, most often falling in one of the two ambiguous categories
(True/Close or False/Close).
4.2 Discussion
This simulation study suggests that there is much to gain from using Bayesian information
criteria—including two new measures shown here—in SEM model selection. The BIC, IBIC,
SPBIC, and HBIC consistently outperform the conventional methods of conducting model selec-
tion. This pattern holds at several sample sizes and in SEM data with relatively less information
(SIM1) and relatively more information (SIM2). While all of the measures examined here perform
poorly in selecting the correct model in small samples, the Bayes Factor approximations drastically
improve as N increases while the conventional methods improve only moderately or not at all.
There is also variation among the various Bayes Factor approximations. In model selection,
the new IBIC and SPBIC shown here as well as the BIC are the top performers, with HBIC not
performing as well. In several cases, either IBIC or SPBIC clearly selects the true model at the
highest frequency of all the fit statistics.
5 Conclusions
As in any statistical analysis, a key issue in the use of SEMs is determining the fit of the
model to the data and comparing the relative fit of two or more models to the same data. However,
current practice with SEMs commonly employs methods that are problematic for several different
reasons. For example, the asymptotic chi-square distributed test statistic that tests whether a SEM
exactly reproduces the covariance matrix of observed variables routinely rejects models in large
samples regardless of their merit as approximations to the underlying process. Several other fit
indices have been developed in response, but those measures also have important problems: their
sampling distributions are not commonly known, they tend to increase as sample size increases,
and there is disagreement about the optimal cutoff values for a “good fitting” model.
In this paper, we have shown that Bayes Factor approximations can serve as a useful al-
ternative to these conventional methods. Though others have noted this usefulness (e.g., Raftery
1993, 1995; Haughton, Oud, and Jansen 1997), it is rare to see the BIC or Bayes Factors in appli-
cations of SEMs. We introduce two new information criteria—the IBIC and SPBIC—to the SEM
literature and show through an extensive set of comparisons that, along with some established ap-
proximations, these fit statistics outperform several conventional measures in selecting the correct
model.
In sum, Bayes Factor approximations are a useful tool for the applied researcher in compar-
ing SEM models. Their significant improvement in model selection compared to the conventional
methods can substantially improve the current practice of SEM analysis. In addition, among the
various options for approximating a Bayes Factor, we show that our new fit indices—the IBIC and
SPBIC—perform well when compared to other information criteria. Both of these measures are
easily calculated from standard SEM software, and thus are easy for the applied researcher to use.
Thus, we recommend use of IBIC and SPBIC to researchers seeking to compare SEM models in
their work.
A Appendix: Bayes Factor Technical Details
This paper is based on research done in an unpublished companion technical paper. Below
we reproduce key details from that paper on the Bayes Factor, methods of approximation, and
specifics of the BIC, SPBIC, and IBIC.
A.1 Bayes Factor
In this section we briefly describe the Bayes Factor and the Laplace approximation. Readers
familiar with this derivation may skip this section. For a strict definition of the Bayes Factor we
have to appeal to the Bayesian model selection literature. Thorough discussions on Bayesian
statistics in general and model selection in particular are in Gelman et al. (1995), Carlin and Louis
(1996), and Kuha (2004). Here we provide a very simple description.
We start with data Y and hypothesize that it was generated by one of two competing models, M1 and M2.5 Moreover, let us assume that we have prior beliefs about the data generating models M1 and M2, so that we can specify the probability of the data coming from model M1, P(M1) = 1 − P(M2). Now, using Bayes' theorem, the ratio of the posterior probabilities is given by

\[
\frac{P(M_1 \mid Y)}{P(M_2 \mid Y)} = \frac{P(Y \mid M_1)}{P(Y \mid M_2)} \cdot \frac{P(M_1)}{P(M_2)}.
\]
The left side of the equation can be interpreted as the posterior odds of M1 vs. M2, and it represents how the prior odds P(M1)/P(M2) are updated by the observed data Y. The factor P(Y|M1)/P(Y|M2) that multiplies the prior odds to obtain the posterior odds is known as the Bayes Factor, which we will denote as

\[
B_{12} = \frac{P(Y \mid M_1)}{P(Y \mid M_2)}. \tag{19}
\]
Typically, we will deal with parametric models Mk, which are described through model parameters, say θk. So the marginal likelihoods P(Y|Mk) are evaluated using

\[
P(Y \mid M_k) = \int P(Y \mid \theta_k, M_k)\, P(\theta_k \mid M_k)\, d\theta_k, \tag{20}
\]

where P(Y|θk,Mk) is the likelihood under Mk, and P(θk|Mk) is the prior distribution of θk. A closed-form analytical expression for the marginal likelihood is difficult to obtain, even with completely specified priors (unless they are conjugate priors). Moreover, in most cases θk will be high-dimensional and direct numerical integration will be computationally intensive if not impossible. For this reason we turn to a way of approximating the integral in (20).

5. Generalization to more than two competing models is achieved by comparing the Bayes Factor of each model to a "base" model and choosing the model with the highest Bayes Factor.
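Before turning to the Laplace method, note that for toy models the integral in (20) can be checked directly by Monte Carlo: draw θk from the prior and average the likelihood over the draws. The sketch below is our own illustration, not part of the paper's method; it uses a conjugate normal model (data Yi ~ N(θ, 1), prior θ ~ N(0, 1)) chosen only because the marginal likelihood is then available in closed form.

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(loc=0.3, scale=1.0, size=20)      # observed data Y
n, S = len(y), y.sum()

# Exact log marginal likelihood for Y_i ~ N(theta, 1) with prior theta ~ N(0, 1),
# obtained by integrating the normal likelihood against the normal prior.
exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum(y**2)
         + S**2 / (2 * (n + 1)) - 0.5 * np.log(n + 1))

# Monte Carlo estimate of (20): average the likelihood over draws from the prior.
draws = rng.normal(size=200_000)                 # theta_k ~ P(theta_k | M_k)
loglik = (-0.5 * n * np.log(2 * np.pi)
          - 0.5 * ((y[None, :] - draws[:, None]) ** 2).sum(axis=1))
mc = np.log(np.mean(np.exp(loglik - loglik.max()))) + loglik.max()

print(exact, mc)
```

With this many prior draws the Monte Carlo error on the log scale is small (well under 0.05 here), so the two printed values essentially coincide; in the high-dimensional SEM setting such brute-force integration is exactly what becomes infeasible.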
A.1.1 Laplace Method of Approximation
The Laplace Method of approximating Bayes Factors is a common device (see Raftery 1993; Weakliem 1999; Kuha 2004). Since we also make use of the Laplace method for our proposed approximations, we will briefly review it here.
The Laplace method of approximating \(\int P(Y \mid \theta_k, M_k) P(\theta_k \mid M_k)\, d\theta_k = P(Y \mid M_k)\) uses a second-order Taylor series expansion of \(\log P(Y \mid M_k)\) around \(\tilde{\theta}_k\), the posterior mode. It can be shown that

\[
\log P(Y \mid M_k) = \log P(Y \mid \tilde{\theta}_k, M_k) + \log P(\tilde{\theta}_k \mid M_k) + \frac{d_k}{2}\log(2\pi) - \frac{1}{2}\log\bigl\lvert -H(\tilde{\theta}_k)\bigr\rvert + O(N^{-1}), \tag{21}
\]

where dk is the number of distinct parameters estimated in model Mk, π is the usual mathematical constant, and \(H(\tilde{\theta}_k)\) is the Hessian matrix

\[
\frac{\partial^2 \log[P(Y \mid \theta_k, M_k)\, P(\theta_k \mid M_k)]}{\partial \theta_k\, \partial \theta_k'}
\]

evaluated at \(\theta_k = \tilde{\theta}_k\); O(N^{-1}) is "big-O" notation indicating that the last term is bounded in probability from above by a constant times N^{-1} (see, for example, Tierney and Kadane 1986; Kass and Vaidyanathan 1992; Kass and Raftery 1995).
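To see (21) at work, consider again a hypothetical conjugate normal example of our own (not from the paper): data Yi ~ N(θ, 1) with prior θ ~ N(0, 1) and dk = 1. Here the log joint is exactly quadratic in θ, so the Laplace approximation reproduces the closed-form marginal likelihood up to rounding error.

```python
import numpy as np

# Toy conjugate model: Y_i ~ N(theta, 1), prior theta ~ N(0, 1); d_k = 1.
rng = np.random.default_rng(1)
y = rng.normal(loc=0.5, size=50)
n, S = len(y), y.sum()

# Posterior mode of the Gaussian log joint and negative Hessian there.
theta_mode = S / (n + 1)            # maximizes log-likelihood + log-prior
neg_hessian = n + 1.0               # -(d^2/dtheta^2) of the log joint

log_lik_at_mode = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - theta_mode) ** 2)
log_prior_at_mode = -0.5 * np.log(2 * np.pi) - 0.5 * theta_mode ** 2

# Laplace approximation (21) with d_k = 1.
laplace = (log_lik_at_mode + log_prior_at_mode
           + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(neg_hessian))

# Exact log marginal likelihood for this conjugate pair.
exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum(y ** 2)
         + S ** 2 / (2 * (n + 1)) - 0.5 * np.log(n + 1))
```

For non-Gaussian posteriors the two quantities differ, but only by the O(N^{-1}) remainder in (21).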
Instead of the Hessian matrix we will be working with different forms of information matrices. Specifically, the observed information matrix is denoted by

\[
I_O(\theta) = -\frac{\partial^2 \log[P(Y \mid \theta_k, M_k)]}{\partial \theta_k\, \partial \theta_k'},
\]

whereas the expected information matrix \(I_E\) is given by

\[
I_E(\theta) = -E\!\left[\frac{\partial^2 \log[P(Y \mid \theta_k, M_k)]}{\partial \theta_k\, \partial \theta_k'}\right].
\]
Let us also define the average information matrices \(\bar{I}_E = I_E/N\) and \(\bar{I}_O = I_O/N\). If the observations come from an i.i.d. distribution we have the identity

\[
\bar{I}_E = -E\!\left[\frac{\partial^2 \log[P(Y_i \mid \theta_k, M_k)]}{\partial \theta_k\, \partial \theta_k'}\right],
\]

the expected information for a single observation. We will make use of this property later.
In large samples the Maximum Likelihood (ML) estimator, \(\hat{\theta}_k\), will generally be a reasonable approximation of the posterior mode, \(\tilde{\theta}_k\). If the prior is not that informative relative to the likelihood, we can write

\[
\log P(Y \mid M_k) = \log P(Y \mid \hat{\theta}_k, M_k) + \log P(\hat{\theta}_k \mid M_k) + \frac{d_k}{2}\log(2\pi) - \frac{1}{2}\log\bigl\lvert I_O(\hat{\theta}_k)\bigr\rvert + O(N^{-1}), \tag{22}
\]

where \(\log P(Y \mid \hat{\theta}_k, M_k)\) is the log likelihood for Mk evaluated at \(\hat{\theta}_k\), \(I_O(\hat{\theta}_k)\) is the observed information matrix evaluated at \(\hat{\theta}_k\) (Kass and Raftery 1995), and O(N^{-1}) again denotes a term bounded in probability from above by a constant times N^{-1}. If the expected information matrix, \(I_E(\hat{\theta}_k)\), is used in place of \(I_O(\hat{\theta}_k)\), then the approximation error is O(N^{-1/2}) and we can write

\[
\log P(Y \mid M_k) = \log P(Y \mid \hat{\theta}_k, M_k) + \log P(\hat{\theta}_k \mid M_k) + \frac{d_k}{2}\log(2\pi) - \frac{1}{2}\log\bigl\lvert I_E(\hat{\theta}_k)\bigr\rvert + O(N^{-1/2}). \tag{23}
\]
If we now make use of the definition \(\bar{I}_E = I_E/N\) and substitute it in (23), we have

\[
\log P(Y \mid M_k) = \log P(Y \mid \hat{\theta}_k, M_k) + \log P(\hat{\theta}_k \mid M_k) + \frac{d_k}{2}\log(2\pi) - \frac{d_k}{2}\log(N) - \frac{1}{2}\log\bigl\lvert \bar{I}_E(\hat{\theta}_k)\bigr\rvert + O(N^{-1/2}). \tag{24}
\]
This equation forms the basis of several approximations to Bayes Factors, which we discuss
below.
A.2 Approximation through Special Prior Distributions
The BIC has been justified by making use of a unit information prior distribution of θk.
Our first new approximation, SPBIC, builds on this rationale using a more flexible prior. We first
derive the BIC, then present our proposed modification.
A.2.1 BIC: The Unit Information Prior Distribution

We assume for model MK (the full model, or the largest model under consideration) that the prior of θK is given by a multivariate normal density

\[
P(\theta_K \mid M_K) \sim N(\theta_K^*, V_K^*), \tag{25}
\]

where \(\theta_K^*\) and \(V_K^*\) are the prior mean and variance of θK. In the case of BIC we take \(V_K^* = \bar{I}_O^{-1}\), the inverse of the average information defined in Section 3. This choice of variance can be interpreted as setting the prior to have approximately the same information as the information contained in a single data point.6 This prior is thus referred to as the Unit Information Prior and is given by

\[
P(\theta_K \mid M_K) \sim N\!\left(\theta_K^*, \left[\bar{I}_O(\hat{\theta}_K)\right]^{-1}\right). \tag{26}
\]
For any model Mk nested in the full model MK, the corresponding prior P(θk|Mk) is simply the marginal distribution obtained by integrating out the parameters that do not appear in Mk. Using this prior in (22), we can show that

\[
\log P(Y \mid M_k) \approx \log(P(Y \mid \hat{\theta}_k, M_k)) - \frac{d_k}{2}\log(N) - \frac{1}{2N}(\hat{\theta} - \theta^*)^T I_O(\hat{\theta}_k)(\hat{\theta} - \theta^*) \approx \log(P(Y \mid \hat{\theta}_k, M_k)) - \frac{d_k}{2}\log(N). \tag{27}
\]

Note that the error of approximation is still of the order O(N^{-1}). Comparing models M1 and M2 and multiplying by −2 gives the usual BIC

\[
\text{BIC} = 2\bigl(l(\hat{\theta}_2) - l(\hat{\theta}_1)\bigr) - (d_2 - d_1)\log(N), \tag{28}
\]

where we use the more compact notation \(l(\hat{\theta}_k) \equiv \log(P(Y \mid \hat{\theta}_k, M_k))\) for the log likelihood function at \(\hat{\theta}_k\).

An advantage of this unit information prior rationale for the BIC is that the error is of order O(N^{-1}) instead of the O(1) that results when we make no explicit assumption about the prior distribution. However, some critics argue that the unit information prior is too flat to reflect a realistic prior (e.g., Weakliem 1999).
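To make (28) concrete, the following sketch of our own (not from the paper) compares two nested linear regression models, where the larger model M2 adds a regressor that is pure noise. Because M2 nests M1, the log likelihood can only improve, but the log(N) penalty typically tips the comparison toward the simpler true model.

```python
import numpy as np

def gaussian_loglik(y, X):
    """Maximized Gaussian log-likelihood of a linear regression,
    with the error variance profiled out at its ML estimate RSS/n."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)

rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)       # x2 is irrelevant

X1 = np.column_stack([np.ones(n), x1])        # M1: d1 = 3 (two betas + sigma^2)
X2 = np.column_stack([np.ones(n), x1, x2])    # M2: d2 = 4 (adds the x2 slope)

l1, l2 = gaussian_loglik(y, X1), gaussian_loglik(y, X2)
d1, d2 = 3, 4
bic = 2 * (l2 - l1) - (d2 - d1) * np.log(n)   # equation (28): positive favors M2
print(bic)
```

Here 2(l(θ̂2) − l(θ̂1)) is approximately a chi-square variate with one degree of freedom under the simpler model, so it rarely exceeds log(500) ≈ 6.2 and the BIC usually (though not always) comes out negative, favoring M1.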
A.2.2 SPBIC: The Scaled Unit Information Prior Distribution

The SPBIC, our alternative approximation to the Bayes Factor, is similar to the BIC derivation except that instead of using the unit information prior, we use a scaled unit information prior that allows the variance to differ and permits more flexibility in the prior probability specification. The scale is chosen by maximizing the marginal likelihood over the class of normal priors. Berger (1994), Fougere (1990), and others have used related concepts of partially informative priors obtained by maximizing entropy. In general, these maximizations involve tedious numerical calculations. We ease the calculation by employing components in the approximation that are available from standard computer output. We first provide a Bayesian interpretation of our choice of the scaled unit information prior and then provide an analytical expression for calculating this empirical Bayes prior.

6. Note that unlike the expected information, even if the observations are i.i.d., the average observed information \(\bar{I}_O(\hat{\theta}_k)\) may not be equal to the observed information of any single observation.
We have already shown how the BIC can be interpreted as using a unit information prior for calculating the marginal probability in (22). This very prior leads to the common criticism that the BIC is too conservative towards the null, as the unit information prior penalizes complex models too heavily. The prior of the BIC is centered at θ* (the parameter value at the null model) and has a variance scaled at the level of unit information. As a result this prior may put extremely low probability around alternative complex models (for discussion, see Weakliem 1999; Kuha 2004; Berger et al. 2006). One way to overcome this conservative nature of the BIC is to use the ML estimate, \(\hat{\theta}\), of the complex model Mk as the center of the prior P(θk|Mk). But this prior specification puts all the mass around the complex model and thus favors the complex model heavily. A scaled unit information prior (henceforth denoted SUIP) is a less aggressive way to be more favorable to a complex model. The prior is still centered around θ*, so that the highest density is at θ*, but we choose its scale, ck, so that the prior flattens out to put more mass on the alternative models. This places higher prior probability than the unit information prior on the space of complex models, but at the same time prevents complete concentration of probability on the complex model. This effectively leads to choosing the prior in our class that is most favorable to the model under consideration, placing the SUIP in the class of Empirical Bayes priors.
To obtain the analytical expression for the appropriate scale, we start from the marginal
probability given in (20). Instead of using the more common form of the Laplace approximation
applied jointly to the likelihood and the prior, we use the approximate normality of the likelihood,
but choose the scale of the variance of the multivariate normal prior so as to maximize the marginal
likelihood. As a result, our prior distribution is not fully specified by the information matrix; rather
the variance is a scaled version of the variance of the unit information prior defined in (26). This
prior has the ability to adjust itself through a scaling factor, which may differ significantly from
unity in several situations (e.g., highly collinear covariates, dependent observations).
The main steps in constructing the SPBIC are as follows. First we define the scaled unit information prior as

\[
P_k(\theta_k) \sim N\!\left(\theta_k^*, \left[\bar{I}_O(\hat{\theta}_k)\right]^{-1}\!\big/\, c_k\right), \tag{29}
\]

where \(\bar{I}_O = \frac{1}{N} I_O\) and ck is the scale factor, which we will evaluate from the data. Note that ck may vary across different models. Using the prior in (29) and the generalized Laplace integral approximation described in Appendix A, we have

\[
\log P(Y \mid M_k) \approx l(\hat{\theta}_k) - \frac{d_k}{2}\log\!\left(1 + \frac{N}{c_k}\right) - \frac{1}{2}\,\frac{c_k}{N + c_k}\left[(\hat{\theta}_k - \theta^*)^T I(\hat{\theta}_k)(\hat{\theta}_k - \theta^*)\right]. \tag{30}
\]
Now we estimate ck. Note that we can view (30) as a likelihood function in ck, and we can write \(L(c_k \mid Y, M_k) = \int L(\theta_k)\,\pi(\theta_k \mid c_k)\, d\theta_k = L(Y \mid M_k)\). So the optimum ck is given by its ML estimate, which can be obtained as follows.

\begin{align*}
2l(c_k) &= -d_k \log\!\left(1 + \frac{N}{c_k}\right) + 2l(\hat{\theta}_k) - \frac{c_k}{N + c_k}\,\hat{\theta}_k^T I_O(\hat{\theta}_k)\hat{\theta}_k \\
\implies 2l'(c_k) &= \frac{d_k N}{c_k(N + c_k)} - \frac{\hat{\theta}_k^T I_O(\hat{\theta}_k)\hat{\theta}_k\, N}{(N + c_k)^2} \\
\implies 0 &= N(N + c_k)d_k - N c_k\, \hat{\theta}_k^T I_O(\hat{\theta}_k)\hat{\theta}_k \\
\implies \frac{\hat{c}_k}{N} &=
\begin{cases}
\dfrac{d_k}{\hat{\theta}_k^T I_O(\hat{\theta}_k)\hat{\theta}_k - d_k} & \text{if } d_k < \hat{\theta}_k^T I_O(\hat{\theta}_k)\hat{\theta}_k, \\[2ex]
\infty & \text{if } d_k \ge \hat{\theta}_k^T I_O(\hat{\theta}_k)\hat{\theta}_k.
\end{cases}
\end{align*}

Using this \(\hat{c}_k\) we arrive at the SPBIC measures under the two cases.
Case 1: When \(d_k < \hat{\theta}_k^T I_O(\hat{\theta}_k)\hat{\theta}_k\), we have

\[
\log P(Y \mid M_k) = l(\hat{\theta}_k) - \frac{d_k}{2} + \frac{d_k}{2}\left(\log d_k - \log\bigl(\hat{\theta}_k^T I_O(\hat{\theta}_k)\hat{\theta}_k\bigr)\right) \tag{31}
\]

\[
\implies \text{SPBIC} = 2\bigl(l(\hat{\theta}_2) - l(\hat{\theta}_1)\bigr) - d_2\left(1 - \log\!\left[\frac{d_2}{\hat{\theta}_2^T I_O(\hat{\theta}_2)\hat{\theta}_2}\right]\right) + d_1\left(1 - \log\!\left[\frac{d_1}{\hat{\theta}_1^T I_O(\hat{\theta}_1)\hat{\theta}_1}\right]\right) \tag{32}
\]

Case 2: Alternatively, when \(d_k \ge \hat{\theta}_k^T I_O(\hat{\theta}_k)\hat{\theta}_k\), we have \(\hat{c}_k = \infty\), so the prior variance goes to 0 and the prior distribution is a point mass at the mean, θ*. Thus we have

\[
\log P(Y \mid M_k) = l(\hat{\theta}_k) - \frac{1}{2}\hat{\theta}_k^T I_O(\hat{\theta}_k)\hat{\theta}_k,
\]

which gives

\[
\text{SPBIC} = 2\bigl(l(\hat{\theta}_2) - l(\hat{\theta}_1)\bigr) - \hat{\theta}_2^T I_O(\hat{\theta}_2)\hat{\theta}_2 + \hat{\theta}_1^T I_O(\hat{\theta}_1)\hat{\theta}_1. \tag{33}
\]
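The two cases fold naturally into a per-model term, 2l(θ̂k) minus the SPBIC penalty, whose difference across models reproduces (32) when both models fall in Case 1 and (33) when both fall in Case 2. The sketch below is our illustration only; the log-likelihoods, parameter counts, and quadratic forms θ̂ᵀI_O(θ̂)θ̂ are hypothetical stand-ins for quantities read off a fitted SEM.

```python
import numpy as np

def spbic_term(loglik, d, quad):
    """Per-model piece of the SPBIC: 2*l(theta_hat) minus the penalty.
    `quad` is the quadratic form theta_hat' I_O(theta_hat) theta_hat."""
    if d < quad:                          # Case 1: finite scale c_k
        return 2 * loglik - d * (1 - np.log(d / quad))
    return 2 * loglik - quad              # Case 2: c_k = infinity (point-mass prior)

# Hypothetical fitted-model summaries: (log-likelihood, # parameters, quadratic form)
l1, d1, q1 = -1432.7, 20, 310.4
l2, d2, q2 = -1429.9, 23, 335.8

spbic = spbic_term(l2, d2, q2) - spbic_term(l1, d1, q1)   # equation (32)
```

Writing the criterion this way also handles the mixed situation where one model falls in Case 1 and the other in Case 2.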
A.3 Approximation through Eliminating O(1) Terms
An alternative derivation of the BIC does not require the specification of priors. Rather,
it is justified by eliminating smaller order terms in the expansion of the marginal likelihood with
arbitrary priors (see (24)). After providing the mathematical steps of deriving the BIC using (24),
we show how approximations of the marginal likelihood with fewer assumptions can be obtained
using readily available software output.
A.3.1 BIC: Elimination of Smaller Order Terms

One justification for the Schwarz (1978) BIC is based on the observation that the first and fourth terms in (24) have orders of approximation O(N) and O(log(N)), respectively, while the second, third, and fifth terms are O(1) or less. This suggests that the former terms will dominate the latter terms when N is large. Ignoring these latter terms, we have

\[
\log P(Y \mid M_k) = \log P(Y \mid \hat{\theta}_k, M_k) - \frac{d_k}{2}\log(N) + O(1). \tag{34}
\]
If we multiply this by −2, the BIC for, say, M1 and M2 is calculated as

\[
\text{BIC} = 2\bigl[\log P(Y \mid \hat{\theta}_2, M_2) - \log P(Y \mid \hat{\theta}_1, M_1)\bigr] - (d_2 - d_1)\log(N). \tag{35}
\]

Equivalently, using \(l(\hat{\theta}_k)\), the log-likelihood function, in place of the \(\log P(Y \mid \hat{\theta}_k, M_k)\) terms in (35), we have

\[
\text{BIC} = 2\bigl[l(\hat{\theta}_2) - l(\hat{\theta}_1)\bigr] - (d_2 - d_1)\log(N). \tag{36}
\]

The relative error of the BIC in approximating a Bayes Factor is O(1). This approximation will not always work well, for example when N is small, but also when the sample size does not accurately summarize the amount of available information. This assumption breaks down, for example, when the explanatory variables are extremely collinear or have little variance, or when the number of parameters increases with sample size (Winship 1999; Weakliem 1999).
A.3.2 IBIC and Related Variants: Retaining Smaller Order Terms
Some of the terms for the approximation of the marginal likelihood that are dropped by
the BIC can be easily calculated and retained. One variant called the HBIC retains the third
term in equation (23) (Haughton 1988; Haughton, Oud, and Jansen 1997). A simulation study
by Haughton, Oud, and Jansen (1997) found that this approximation performs better in model
selection for structural equation models than does the usual BIC.
The second term can also be retained by calculating the estimated expected information matrix, \(\bar{I}_E(\hat{\theta}_k)\), which is often available in statistical software for a variety of statistical models. Taking advantage of this and building on Haughton, Oud, and Jansen (1997), we propose a new approximation for the marginal likelihood, omitting only the second term in (24):

\[
\log P(Y \mid M_k) = \log P(Y \mid \hat{\theta}_k, M_k) - \frac{d_k}{2}\log\!\left(\frac{N}{2\pi}\right) - \frac{1}{2}\log\bigl\lvert \bar{I}_E(\hat{\theta}_k)\bigr\rvert + O(1).
\]
For models M1 and M2, multiplying by −2 leads to a new approximation of a Bayes Factor, the Information matrix-based Bayesian Information Criterion (IBIC).7 This is given by

\[
\text{IBIC} = 2\bigl[l(\hat{\theta}_2) - l(\hat{\theta}_1)\bigr] - (d_2 - d_1)\log\!\left(\frac{N}{2\pi}\right) - \log\bigl\lvert \bar{I}_E(\hat{\theta}_2)\bigr\rvert + \log\bigl\lvert \bar{I}_E(\hat{\theta}_1)\bigr\rvert. \tag{38}
\]

Of the three approximations that we have discussed, the IBIC includes the most terms from equation (23), leaving out just the prior distribution term \(\log P(\hat{\theta}_k \mid M_k)\).
7. Another approximation, due to Kashyap (1982), is given by

\[
2\bigl[l(\hat{\theta}_2) - l(\hat{\theta}_1)\bigr] - (d_2 - d_1)\log(N) - \log\bigl\lvert \bar{I}_E(\hat{\theta}_2)\bigr\rvert + \log\bigl\lvert \bar{I}_E(\hat{\theta}_1)\bigr\rvert \tag{37}
\]

and is very similar to our proposed IBIC. The IBIC incorporates the estimated expected information matrix at the parameter estimates for the two models being compared.
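Equation (38) is straightforward to compute from standard output. The sketch below is our own illustration (the log-likelihoods, parameter counts, and information matrices are made up); it uses a sign-and-log-determinant routine for numerical stability, where `ibar_k` stands in for the average expected information matrix \(\bar{I}_E(\hat{\theta}_k)\) of model Mk.

```python
import numpy as np

def ibic(l2, l1, d2, d1, ibar2, ibar1, n):
    """IBIC of equation (38): 2[l2 - l1] - (d2 - d1) log(N / 2pi)
    - log|I_E-bar(theta_2)| + log|I_E-bar(theta_1)|."""
    sign2, logdet2 = np.linalg.slogdet(ibar2)
    sign1, logdet1 = np.linalg.slogdet(ibar1)
    assert sign2 > 0 and sign1 > 0    # information matrices should be positive definite
    return 2 * (l2 - l1) - (d2 - d1) * np.log(n / (2 * np.pi)) - logdet2 + logdet1

# Hypothetical two-parameter vs. three-parameter comparison
ibar1 = np.diag([2.0, 0.5])
ibar2 = np.diag([2.0, 0.5, 1.5])
value = ibic(l2=-1429.9, l1=-1432.7, d2=3, d1=2, ibar2=ibar2, ibar1=ibar1, n=500)
```

As with the BIC, a positive value is evidence for M2 and a negative value is evidence for M1.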
References
Bentler, P.M. and Douglas G. Bonett. 1980. “Significance Tests and Goodness of Fit in the Analysis
of Covariance Structures.” Psychological Bulletin 88:588–606.
Bentler, P.M. and J.A. Stein. 1992. “Structural Equation Models in Medical Research.” Statistical
Methods in Medical Research 1(2):159–181.
Beran, Tanya N. and Claudio Violato. 2010. “Structural Equation Modeling in Medical Research:
A Primer.” BMC Research Notes 3(1):267–277.
Berger, James O. 1994. “An Overview of Robust Bayesian Analysis.” Test (Madrid) 3(1):5–59.
Berger, James O., Surajit Ray, Ingmar Visser, M. J. Bayarri and W. Jang. 2006. Generalization of BIC. Technical report, University of North Carolina, Duke University, and SAMSI.
Bollen, Kenneth A. 1989. Structural Equations with Latent Variables. New York: John Wiley &
Sons.
Bollen, Kenneth A. and J. Scott Long. 1993. Testing Structural Equation Models. Sage.
Bollen, Kenneth A. and Robert A. Stine. 1992. “Bootstrapping Goodness of Fit Measures in
Structural Equation Models.” Sociological Methods & Research 21:205–229.
Carlin, Bradley P. and Thomas A. Louis. 1996. Bayes and Empirical Bayes Methods for Data
Analysis. New York: Chapman and Hall.
Cudeck, Robert and Michael W. Browne. 1983. "Cross-validation of Covariance Structures." Multivariate Behavioral Research 18(2):147–167.
Fougere, P. 1990. Maximum Entropy and Bayesian Methods. In Maximum Entropy and Bayesian
Methods, ed. P. Fougere. Dordrecht, NL: Kluwer Academic Publishers.
Gelman, Andrew, John B. Carlin, Hal S. Stern and Donald B. Rubin. 1995. Bayesian Data Analysis. London: Chapman & Hall.
Haughton, Dominique M. A. 1988. “On the Choice of a Model to Fit Data From an Exponential
Family.” The Annals of Statistics 16(1):342–355.
Haughton, Dominique M. A., Johan H. L. Oud and Robert A. R. G. Jansen. 1997. "Information and Other Criteria in Structural Equation Model Selection." Communications in Statistics, Part B – Simulation and Computation 26(4):1477–1516.
Joreskog, Karl. 1969. "A General Approach to Confirmatory Maximum Likelihood Factor Analysis." Psychometrika 34(2):183–202.
Joreskog, Karl. 1973. Analysis of Covariance Structures. New York: Academic Press.
Joreskog, Karl and Dag Sorbom. 1979. Advances in Factor Analysis and Structural Equation
Models. Cambridge, MA: Abt Books.
Kashyap, Rangasami L. 1982. "Optimal Choice of AR and MA Parts in Autoregressive Moving Average Models." IEEE Transactions on Pattern Analysis and Machine Intelligence 4(2):99–104.
Kass, Robert E. and Adrian E. Raftery. 1995. “Bayes Factors.” Journal of the American Statistical
Association 90(430):773–795.
Kass, Robert E. and Suresh K. Vaidyanathan. 1992. “Approximate Bayes Factors and Orthogonal
Parameters, with Application to Testing Equality of Two Binomial Proportions.” Journal of the
Royal Statistical Society, Series B: Methodological 54(1):129–144.
Kuha, Jouni. 2004. “AIC and BIC: Comparisons of Assumptions and Performance.” Sociological
Methods & Research 33(2):188–229.
Paxton, Pamela, Patrick J. Curran, Kenneth A. Bollen, Jim Kirby and Feinian Chen. 2001. “Monte
Carlo Experiments: Design and Implementation.” Structural Equation Modeling 8(2):287–312.
Raftery, Adrian E. 1993. Bayesian Model Selection in Structural Equation Models. In Testing Structural Equation Models, ed. Kenneth A. Bollen and J. Scott Long. Newbury Park, CA: Sage, pp. 163–180.
Raftery, Adrian E. 1995. Bayesian Model Selection in Social Research (with Discussion). In Sociological Methodology. Cambridge, MA: Blackwell, pp. 111–163.
Satorra, Albert and P.M. Bentler. 1988. Scaling Corrections for Chi-Square Statistics in Covariance Structure Analysis. In ASA 1988 Proceedings of the Business and Economic Statistics Section. Alexandria, VA: American Statistical Association, pp. 308–313.
Schnoll, R.A., C.Y. Fang and S.L. Manne. 2004. “The Application Of SEM to Behavioral Research
in Oncology: Past Accomplishments and Future Opportunities.” Structural Equation Modeling
11(4):583–614.
Schwarz, Gideon. 1978. “Estimating the Dimension of a Model.” Annals of Statistics 6(2):461–
464.
Shipley, Bill. 2000. Cause and Correlation in Biology: A User’s Guide to Path Analysis, Structural
Equations, and Causal Inference. New York: Cambridge University Press.
Steiger, J. H. and J.C. Lind. 1980. Statistically Based Tests For The Number of Common Factors.
Presented at the annual meeting of the Psychometric Society, Iowa City, IA.
Tierney, Luke and Joseph B. Kadane. 1986. “Accurate Approximations for Posterior Moments and
Marginal Densities.” Journal of the American Statistical Association 81(393):82–86.
Tucker, Ledyard R. and Charles Lewis. 1973. “A Reliability Coefficient for Maximum Likelihood
Factor Analysis.” Psychometrika 38(1):1–10.
Weakliem, David L. 1999. “A Critique of the Bayesian Information Criterion for Model Selection.”
Sociological Methods & Research 27(3):359–397.
Winship, Christopher. 1999. "Editor's Introduction to the Special Issue on the Bayesian Information Criterion." Sociological Methods & Research 27(3):355–358.
Table 1: Fitted Models for SIM1

Model  Description                                     # Par.  Dropped Parameters             Extra Parameters

Correct Model
M1     Correct model                                   23

Misspecified Measurement Models
M2     Drop 1 cross loading                            22      λ41
M3     Drop 2 cross loadings                           21      λ41, λ72
M4     Drop 3 cross loadings                           20      λ41, λ72, λ63
M5     Drop cross-loadings; add correlated errors      22      λ72, λ63                       θ67
M6     Correlated errors                               24                                     θ67
M7     Switch factor loadings                          23      λ21, λ52                       λ51, λ22

Misspecified Structural Models
M8     Add double-lagged effect                        24                                     γ31
M9     Add correlated errors and double-lagged effect  25                                     θ67, β31
M10    No relations among η's                          21      β21, β32
M11    Two latent variables (remove η3)                19      λ42, λ72, λ73, λ93, β32, ψ33   λ82, λ92
M12    Four latent variables (add η4)                  24      λ41, λ62, λ72, λ83, λ93        λ32, λ53, λ74, λ94, β43, ψ44
Table 2: Fitted Models for SIM2

Model  Description                                     # Par.  Dropped Parameters                                  Extra Parameters

Correct Model
M1     Correct model                                   35

Misspecified Measurement Models
M2     Drop 1 cross loading                            34      λ61
M3     Drop 2 cross loadings                           33      λ61, λ11,2
M4     Drop 3 cross loadings                           32      λ61, λ11,2, λ10,3
M5     Drop cross-loadings; add correlated errors      34      λ11,2, λ10,3                                        θ10,11
M6     Correlated errors                               36                                                          θ10,11
M7     Switch factor loadings                          35      λ21, λ72                                            λ71, λ22

Misspecified Structural Models
M8     Add double-lagged effect                        36                                                          γ31
M9     Add correlated errors and double-lagged effect  37                                                          θ10,11, γ31
M10    No relations among η's                          33      β21, β32
M11    Two latent variables (remove η3)                31      λ62, λ10,3, λ11,3, λ13,3, λ14,3, λ15,3, β32, ψ33    λ12,2, λ13,2, λ14,2, λ15,2
M12    Four latent variables (add η4)                  37      λ61, λ92, λ10,2, λ11,2, λ13,3, λ14,3, λ15,3         λ42, λ52, λ83, λ12,3, λ11,4, λ12,4, λ14,4, λ15,4, β43, ψ44
Figure 1: SIM1 Model Selection Results. [Figure: one panel per sample size (N = 100, 250, 500, 1000, 5000); for each criterion (BIC, IBIC, SPBIC, HBIC, AIC, p-value, IFI, CFI, TLI, RNI, RMSEA), paired horizontal bars give the percentage of replications selecting a false model versus the true model, shaded by the True/Clear, True/Close, False/Clear, and False/Close categories.]
Figure 2: SIM2 Model Selection Results. [Figure: same layout as Figure 1, for the SIM2 models.]