
PSYCHOMETRIKA 2010
DOI: 10.1007/s11336-010-9174-4

BAYESIAN SEMIPARAMETRIC STRUCTURAL EQUATION MODELS WITH LATENT VARIABLES

MINGAN YANG
SAINT LOUIS UNIVERSITY

DAVID B. DUNSON
DUKE UNIVERSITY

Structural equation models (SEMs) with latent variables are widely useful for sparse covariance structure modeling and for inferring relationships among latent variables. Bayesian SEMs are appealing in allowing for the incorporation of prior information and in providing exact posterior distributions of unknowns, including the latent variables. In this article, we propose a broad class of semiparametric Bayesian SEMs, which allow mixed categorical and continuous manifest variables while also allowing the latent variables to have unknown distributions. In order to include typical identifiability restrictions on the latent variable distributions, we rely on centered Dirichlet process (CDP) and CDP mixture (CDPM) models. The CDP will induce a latent class model with an unknown number of classes, while the CDPM will induce a latent trait model with unknown densities for the latent traits. A simple and efficient Markov chain Monte Carlo algorithm is developed for posterior computation, and the methods are illustrated using simulated examples and several applications.

Key words: Dirichlet process, factor analysis, latent class, latent trait, mixture model, nonparametric Bayes, parameter expansion.

    1. Introduction

In the social sciences and increasingly in other application areas, it is routine to collect multivariate data, with the individual measurements having a variety of scales (continuous, count, categorical). Often, these measurements are collected specifically with the goal of studying relationships among latent variables, such as life event-induced anxiety, that can only be measured indirectly through multiple manifest variables. In such settings, structural equation models (SEMs) provide a valuable tool for obtaining insight into the relationships between different latent variables and between latent and observed variables (Bollen, 1989). In addition, SEMs provide a flexible class of multivariate models for describing covariance structures in multivariate data.

In recent years, there has been increased interest in Bayesian SEMs due in part to advances in posterior computation that now allow Bayesian approaches to be implemented routinely in complex settings involving multilevel structures, missing data, censoring, and other challenges. Using Markov chain Monte Carlo (MCMC) algorithms, one can obtain samples from the exact posterior distribution of all the unknowns, including the latent variables. These samples can be used to estimate exact posterior distributions, which provide a full probability characterization of uncertainty without needing to appeal to large sample assumptions. Asymptotic arguments may be difficult to justify for SEMs even in large samples, since the data can contain minimal information about certain parameters due to weak identifiability. In such settings, it is particularly important to include outside information and theory into the analysis, which can be accomplished

Requests for reprints should be sent to Mingan Yang, School of Public Health, Saint Louis University, St. Louis, MO 63104, USA. E-mail: [email protected]

© 2010 The Psychometric Society


within a Bayesian approach through an informative prior. For a recent review of Bayesian SEMs, refer to Palomo, Dunson, and Bollen (2007).

Because Bayesian SEMs require a full likelihood specification, a concern is robustness to parametric assumptions, such as normality of the latent variables and measurement errors. Lee and Xia (2006) recently proposed robust maximum likelihood methods for nonlinear SEMs incorporating symmetric heavy-tailed distributions. Fahrmeir and Raach (2007) proposed an alternative semiparametric Bayesian approach, which characterizes the latent variables in a latent factor regression model using an additive model. This approach assumes normally distributed latent variables, while allowing the mean of the latent variable distribution to vary flexibly with continuous predictors and spatial location. An alternative strategy to define flexible latent variable models is to use mixtures of parametric models. Jedidi, Jagpal, and DeSarbo (1997) used a finite mixture of SEMs to allow heterogeneity across subgroups. Zhu and Lee (2001) considered Bayesian inference on finite mixtures of linear structural relations (LISREL) models (Jöreskog & Sörbom, 1986). Fokoué and Titterington (2003) and Fokoué (2005) proposed mixtures of factor analysers, based on a finite mixture of normal factor models. Lubke and Muthén (2005) used factor mixture models to assess population heterogeneity. McLachlan, Bean, and Jones (2007) proposed a robust extension of mixtures of factor analysers to allow heavy-tailed distributions within each component.

Instead of characterizing heterogeneity through finite mixtures of SEMs, our focus is on using Bayesian nonparametric methods to allow the latent variable distributions within an SEM to be unknown. There is a rich literature on the use of Dirichlet process (DP) priors (Ferguson, 1973, 1974) and DP mixtures (DPMs) to allow unknown latent variable distributions in hierarchical models. For example, Bush and MacEachern (1996), Kleinman and Ibrahim (1998), and Brown and Ibrahim (2003) used DPMs for random effects distributions. Ansari and Iyengar (2006) recently used DP components to define a semiparametric dynamic choice model. Dunson (2006) used dynamic mixtures of DPs to allow a latent variable distribution to change nonparametrically across groups. Burr and Doss (2005) used a conditional DP for the random effects distribution within a meta-analysis application. Lee, Lu, and Song (2008) placed a truncated DP on the distribution of the exogenous latent variables within an SEM, while treating the endogenous latent variables as conditionally Gaussian.

Unfortunately, direct application of Dirichlet processes and other priors for unknown distributions is problematic in general SEMs due to the need to incorporate constraints on the latent variable distributions for identifiability and interpretability. For example, it is standard practice to restrict the residual distributions on the latent variable level to have mean zero and variance one. Although this is straightforward when the latent variable distributions are normally distributed, it is quite challenging to incorporate mean and variance constraints on unknown distributions. In fact, although there is a rich literature on incorporation of median and quantile constraints, until recently there were no approaches available for placing mean zero and variance one constraints on unknown latent variable distributions within a hierarchical model. To address this gap, two conceptually related centering approaches were independently developed by Yang, Dunson, and Baird (2010) and Li, Müller, and Lin (in press).

The focus of this article is on applying the centered DP (CDP) and CDP mixture (CDPM) models proposed by Yang et al. (2010) to develop a general class of semiparametric Bayes SEMs that have unknown latent variable distributions. When a CDP is used for a latent variable distribution, one obtains a latent class model with infinitely many classes represented in the population, finitely many of which are represented in the current sample. When a CDPM is used, one instead treats the latent variables as continuous with an unknown smooth density, which can be skewed and multimodal. By centering the prior for this unknown density on the standard normal density, one utilizes Bayesian shrinkage to stabilize estimation of latent variable densities in small samples and weak identifiability situations. In practice, it is straightforward to define


SEMs having both discrete and continuous latent variables. Posterior computation relies on a straightforward data augmentation parameter-expanded Gibbs sampling algorithm, which tends to be highly efficient.

Section 2 describes the class of SEMs to be considered. Section 3 describes the CDP and CDPM, discussing properties in the setting of SEMs. Section 4 describes the algorithm for posterior computation. Section 5 contains the simulation and real data examples, providing a proof of concept that nonnormal latent variable densities can be estimated reliably. Section 6 discusses the results.

    2. Semiparametric Bayes SEMs

SEMs provide a broad framework for modeling of multivariate data having mixed categorical and continuous measurement scales. For subject i (i = 1, ..., n), the observed data consist of z_i = (y_i, x_i), where y_i = (y_{i1}, ..., y_{ip})' is a p × 1 vector of outcome measurements, and x_i = (x_{i1}, ..., x_{iq})' is a q × 1 vector of predictor measurements. Following common practice, we assume that y_{ij} = g_j(y*_{ij}; τ_{y,j}) for j = 1, ..., p and x_{ij} = h_j(x*_{ij}; τ_{x,j}) for j = 1, ..., q, where y_{ij} is linked to an underlying continuous variable y*_{ij} through a link function g_j having parameters τ_{y,j}, and x_{ij} is linked to an underlying continuous variable x*_{ij} through a link function h_j having parameters τ_{x,j}. Typically, for continuous measurements, identity links will be used, so that y_{ij} = y*_{ij} and x_{ij} = x*_{ij}. In contrast, for categorical measurements, threshold links will be used mapping from the real line to {1, ..., C}, with C the number of categories.

After relating the observations to underlying continuous variables, we specify the SEM in two components: (1) the measurement model, which relates the underlying variables to latent variables; and (2) the latent variable or structural model, which describes relationships among the latent variables. For the measurement model, we use a typical normal, linear form as in Bollen (1989):

y*_i = μ_y + Λ_y η_i + ε_{y,i},   ε_{y,i} ∼ N_p(0, Σ_y),
x*_i = μ_x + Λ_x ξ_i + ε_{x,i},   ε_{x,i} ∼ N_q(0, Σ_x),     (1)

where μ_y, μ_x are intercepts, Λ_y is a p × r factor loadings matrix, η_i is an r × 1 vector of latent response variables, ε_{y,i} is a p × 1 vector of idiosyncratic measurement errors, Σ_y is a p × p diagonal covariance matrix, Λ_x is a q × s factor loadings matrix, ξ_i is an s × 1 vector of latent predictors, ε_{x,i} is a q × 1 vector of measurement errors, and Σ_x is a q × q diagonal covariance matrix.

    To complete a specification of the SEM, we then choose a LISREL model:

η_i = B η_i + Γ ξ_i + δ_i,   δ_i ∼ F,   ξ_i ∼ G,     (2)

where B is an r × r matrix with zeros along the diagonal describing relationships among the different latent response variables, Γ is an r × s matrix describing relationships between η_i and ξ_i, and δ_i is a residual. Our contribution relative to previous work on SEMs is to treat F and G as unknown discrete or continuous distributions using nonparametric Bayes methods. In particular, this is accomplished by letting F ∼ P and G ∼ Q, where P and Q are priors with support on the space of distributions on R^r and R^s, respectively, subject to appropriate mean and/or variance constraints. Lee et al. (2008) proposed to use a truncated Dirichlet process for Q without constraints, while treating F as Gaussian.
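To make the specification concrete, the following sketch simulates data from (1)-(2) in the simple special case used later in the application (r = s = 1, B = 0, identity links, standard normal F and G). The sample size and parameter values are illustrative assumptions, not part of the model definition.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2390                      # illustrative sample size (as in the WVS application)

# Illustrative parameter values (r = s = 1, B = 0, identity links)
mu_y, lam_y, sig2_y = np.zeros(3), np.ones(3), np.ones(3)   # measurement model for y (p = 3)
mu_x, lam_x, sig2_x = np.zeros(2), np.ones(2), np.ones(2)   # measurement model for x (q = 2)
gamma = 1.0                                                 # structural coefficient

# Structural model (2): eta_i = gamma * xi_i + delta_i, with xi_i ~ G, delta_i ~ F
xi = rng.standard_normal(n)
delta = rng.standard_normal(n)
eta = gamma * xi + delta

# Measurement model (1): y*_i = mu_y + lambda_y * eta_i + eps_y,i; x*_i analogous
y = mu_y + np.outer(eta, lam_y) + rng.standard_normal((n, 3)) * np.sqrt(sig2_y)
x = mu_x + np.outer(xi, lam_x) + rng.standard_normal((n, 2)) * np.sqrt(sig2_x)
```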


    3. Centered Dirichlet Process Mixtures for Latent Variables

    3.1. Dirichlet Processes and Dirichlet Process Mixtures

In this section, we review some basic properties of the Dirichlet process (DP). First, suppose that Q corresponds to a DP(αG_0) prior, which is a Dirichlet process with precision parameter α and base distribution G_0. For example, G_0 may correspond to a normal distribution. Then, any realization G from Q will be discrete, implying that individuals will be grouped into clusters. This clustering will occur through the DP prediction rule, or Polya urn scheme, which was originally described by Blackwell and MacQueen (1973). Assuming that ξ_i ∼ G, with G ∼ DP(αG_0), the Polya urn scheme implies that ξ_1 ∼ G_0 and

(ξ_i | ξ_1, ..., ξ_{i-1}) ∼ (α / (α + i - 1)) G_0 + (1 / (α + i - 1)) ∑_{h=1}^{i-1} δ_{ξ_h},

where δ_ξ denotes a degenerate distribution with all its mass at ξ. Hence, the first subject has their latent variable value drawn from G_0; and as subjects are added, they are either grouped with one of the existing subjects or allocated to a new cluster, with probability decreasing as the number of subjects increases. This allows there to be infinitely many latent classes represented in the population, with k ≤ n classes represented in the current sample. Allowing the number of latent classes to grow slowly with the sample size is more realistic, in most applications, than assuming a fixed, finite number of classes.
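A minimal sketch of the Polya urn prediction rule just described, assuming a standard normal base distribution G_0; the precision α and number of subjects are illustrative.

```python
import numpy as np

def polya_urn(n, alpha=1.0, rng=np.random.default_rng(1)):
    """Sequentially draw xi_1, ..., xi_n from a DP(alpha * G0) via the Polya urn
    scheme, with G0 = N(0, 1)."""
    draws = []
    for i in range(n):
        if rng.uniform() < alpha / (alpha + i):
            draws.append(rng.standard_normal())   # start a new cluster: draw from G0
        else:
            draws.append(rng.choice(draws))       # join an existing subject's value
    return np.array(draws)

xi = polya_urn(100)
print("number of distinct clusters:", np.unique(xi).size)
```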

Another property of the DP, which we will utilize in describing a modification to incorporate identifiability constraints, is the stick-breaking representation of Sethuraman (1994). In particular, G ∼ DP(αG_0) implies that

G = ∑_{h=1}^∞ V_h ∏_{l<h} (1 - V_l) δ_{Θ_h},   V_h ∼ beta(1, α),   Θ_h ∼ G_0.     (3)


In a DP mixture (DPM), the DP is used as a prior for the mixing distribution over the parameters of a continuous kernel, most commonly the normal, and any smooth density on the real line can be accurately approximated using such an approach. However, in the setting of SEMs, this flexibility raises questions about identifiability of the latent variable distributions.

3.2. Identifiability Issues

Because we do not observe the latent variables, η_i and ξ_i, directly for any of the subjects under study, it is clear that some constraints are needed on the latent variable distributions and/or the parameters in (1) and (2). In the parametric case, such constraints were discussed in Bollen (1989), and there are two common strategies used. The first relies on fixing the diagonal elements of Λ_y and Λ_x as equal to one, in order to identify the scale of the latent variable distributions, while including sufficient numbers of structural zeros in the factor loadings matrices. In the Bayesian setting, such an approach is often unappealing in requiring one to have prior knowledge that a specific measurement is particularly relevant in defining a latent trait.

For example, suppose that y_i = (y_{i1}, y_{i2}, y_{i3})' consists of three measures of sperm concentration for individual i based on different technologies, and we consider the following factor model:

y_{ij} = μ_{y,j} + λ_{y,j} η_i + ε_{y,ij},   ε_{y,ij} ∼ N(0, σ²_{y,j}),   η_i ∼ F.     (4)

Then, if F has an unknown mean and variance, we can still ensure identifiability by letting μ_{y,1} = 0 and λ_{y,1} = 1. However, this implies that the first measure of sperm concentration is treated differently than the other two measures, which is artificial unless we have prior knowledge that the first measure is in some sense a gold standard. An alternative, which would treat the different sperm concentration measures as exchangeable, is to restrict F to have mean zero and variance one, while letting λ_{y,j} ≥ 0 for j = 1, 2, 3 to remove sign ambiguity.

In the parametric case, restricting the mean and variance of F is straightforward, as we can simply let F correspond to the standard normal distribution. However, if we let F be unknown through using a DP or DPM prior, such constraints are not incorporated. One strategy is to choose the base distribution to be constrained. For example, one could let F ∼ DP(αF_0) with F_0 chosen to correspond to the standard normal distribution. In this case, the prior expectation of the mean of F is zero, and the prior expectation of the variance of F is one. However, the posterior expectations of the mean and variance of F can deviate substantially from these prior expectations, leading to substantially biased inferences.
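The point that constraining only the base distribution does not constrain realizations of F can be seen numerically. The sketch below draws truncated stick-breaking realizations of F ∼ DP(αN(0,1)) and reports their means and variances, which scatter around 0 and 1 rather than equaling them; the truncation level and α are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_draw(alpha=1.0, N=200):
    """One truncated stick-breaking realization of F ~ DP(alpha * N(0,1));
    returns the atoms and their weights."""
    V = rng.beta(1.0, alpha, size=N)
    V[-1] = 1.0                                        # truncate: V_N = 1
    w = V * np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    theta = rng.standard_normal(N)                     # atoms drawn from the base G0 = N(0,1)
    return theta, w

for _ in range(3):
    theta, w = dp_draw()
    mean = np.sum(w * theta)
    var = np.sum(w * theta**2) - mean**2
    print(f"mean of F = {mean:+.2f}, variance of F = {var:.2f}")
```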

    3.3. Centered Dirichlet Process

To address this problem, we propose to use a centering idea related to that proposed by Yang et al. (2010). In particular, they proposed a centered Dirichlet process (CDP) prior for an unknown latent variable distribution, which is constrained to have mean zero and identity covariance. There are two common complications in SEMs, which are not directly accommodated by their proposed formulation of the CDP prior. Firstly, it is important to allow the incorporation of structural zeros in Λ_x, Λ_y, and Γ. Secondly, in many cases, we want the latent variable distribution to be constrained to have mean zero and covariance with ones along the diagonal to allow dependence in the latent variables.

We propose two minor modifications of the CDP prior, which we refer to as CDP-1 and CDP-2, with the CDP-1 prior assuming independence in the latent variables and the CDP-2 prior allowing dependence. In both cases, structural zeros can be incorporated, and the latent variable distribution is constrained to have mean zero and covariance with ones along the diagonal, with CDP-1 including the further constraint of zero off-diagonal elements.


We start by describing the Yang et al. (2010) CDP prior. In particular, if ξ_i ∼ G with G ∼ Q, we say that Q corresponds to a CDP(αG_0) prior if

G = ∑_{h=1}^∞ V_h ∏_{l<h} (1 - V_l) δ_{Θ_h},   Θ_h = Σ_{G*}^{-1/2}(Θ*_h - μ_{G*}),   Θ*_h ∼ G_0,   V_h ∼ beta(1, α),     (5)

where μ_{G*} and Σ_{G*} denote the mean vector and covariance matrix of the uncentered measure G* = ∑_{h=1}^∞ V_h ∏_{l<h} (1 - V_l) δ_{Θ*_h}.


    Under this specification, G has mean zero and covariance with ones along the diagonal withprobability one. This again results from a parameter-expanded DP model.
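The centering can be seen constructively in the univariate case: draw a truncated stick-breaking realization G* from an ordinary DP, compute its own mean and standard deviation, and standardize the atoms, so that the resulting discrete G has mean zero and variance one by construction. The sketch below illustrates this; the truncation level, precision α, and base measure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def cdp_draw(alpha=1.0, N=200):
    """Univariate centered DP draw: standardize the atoms of an uncentered
    truncated stick-breaking measure G* by its own mean and standard deviation."""
    V = rng.beta(1.0, alpha, size=N)
    V[-1] = 1.0
    w = V * np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    theta_star = rng.standard_normal(N)              # uncentered atoms from G0 = N(0,1)
    mu_star = np.sum(w * theta_star)                 # mean of G*
    sd_star = np.sqrt(np.sum(w * theta_star**2) - mu_star**2)
    theta = (theta_star - mu_star) / sd_star         # centered and scaled atoms
    return theta, w

theta, w = cdp_draw()
print(np.sum(w * theta), np.sum(w * theta**2))       # ~0 and ~1 up to floating point error
```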

3.4. Centered Dirichlet Process Mixtures

The CDP models proposed in the previous subsection assume that the latent variable distribution is discrete and hence induce latent class models. To accommodate continuous latent traits, we can instead use a CDP mixture (CDPM), which has the following form:

η_i = (Σ_{G*} + Ω)^{-1/2}(η*_i - μ_{G*}),   i = 1, ..., n,

η*_i ∼ N_r(Θ_i, Ω),   i = 1, ..., n,

Θ_i ∼ G*,   Ω ∼ IW(m, Ω_0),     (9)

where each of the terms is as defined in Section 3.3, but we now incorporate an additional level to draw η*_i from a normal distribution centered on Θ_i with covariance matrix Ω, which can be assigned an inverse-Wishart (IW) prior distribution. The CDPM in (9) differs from the CDP in (5) in drawing Θ_i from G* instead of drawing η_i directly. Marginalizing out the latent variables {η*_i}, we obtain

η_i ∼ N_r(Θ_i, (Σ_{G*} + Ω)^{-1/2} Ω (Σ_{G*} + Ω)^{-1/2}),   i = 1, ..., n,

Θ_i ∼ G,   i = 1, ..., n,

G = ∑_{h=1}^∞ V_h ∏_{l<h} (1 - V_l) δ_{Θ_h},   Θ_h = (Σ_{G*} + Ω)^{-1/2}(Θ*_h - μ_{G*}),   Θ*_h ∼ G_0,     (10)


where the Θ_i are subject-specific latent variables in the parameter-expanded working DPM model. As noted, the CDPM-1 prior assumes independent latent variables. To allow dependence, we propose a CDPM-2(αG_0) prior, which is identical to expression (10) with the exception that Θ_{hm} = (σ²_{G*m} + σ²_m)^{-1/2}(Θ*_{hm} - μ_{G*m}), where σ²_{G*m} and σ²_m denote the mth diagonal elements of Σ_{G*} and Ω. The truncated form CDPM_N-2(αG_0) is induced through

ξ_{il} = (σ²_{G*l} + σ²_l)^{-1/2}(ξ*_{il} - μ_{G*l}),   i = 1, ..., n;  l = 1, ..., s,

ξ*_i ∼ N_s(Θ_i, Ω),   i = 1, ..., n,

Θ_i ∼ G,   G = ∑_{h=1}^N π_h δ_{Θ_h},   π_h = V_h ∏_{l<h} (1 - V_l),   i = 1, ..., n.     (12)

For the truncated models, N is interpretable as an upper bound on the number of mixture components, or latent classes, needed to characterize the data. As it is well known that one can obtain an accurate approximation to any smooth density using a mixture of a modest number of normal components (e.g., 5-7), an upper bound of 30 should be sufficient from a practical perspective. This produces an accurate approximation to the CDP and CDPM when α is small (e.g., 1), which is often favored in applications as giving a sparse approximation to the data while allowing considerable flexibility. We recommend choosing a Gamma(1,1) prior for α to let the data inform about its value. To verify that the level of truncation is sufficiently high, one can monitor the maximum index of a mixture component occupied by a subject in the sample across the MCMC iterations. If the probability of occupying the final component is not close to zero, one should increase the truncation level and rerun the analysis, particularly if there is interest in inference on rare subpopulations. Truncation can be avoided using the retrospective sampler of Papaspiliopoulos and Roberts (2008), but we focus on a simpler algorithm in Section 4.
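As a rough guide to the truncation check just described, the sketch below summarizes, across saved MCMC iterations, the largest occupied component index and the proportion of iterations in which the final component is occupied; the allocation array is a placeholder for whatever the sampler produces, and the truncation level N = 30 is the illustrative value mentioned above.

```python
import numpy as np

def truncation_check(S, N):
    """S: integer array of shape (n_iterations, n_subjects) giving, for each saved
    MCMC iteration, the mixture component (1..N) occupied by each subject.
    Returns the maximum occupied index and the fraction of iterations that
    touch the final component N."""
    max_index = S.max(axis=1)
    return max_index.max(), np.mean(max_index == N)

# Hypothetical usage with N = 30 components:
# max_occupied, frac_final = truncation_check(S_draws, N=30)
# If frac_final is not close to zero, increase N and rerun the analysis.
```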

    4. Posterior Computation

    4.1. PX-Blocked Gibbs Sampler

For posterior computation in the semiparametric SEMs proposed in Section 2, with the CDP or CDPM priors of Section 3, we propose a parameter-expansion (PX) blocked Gibbs sampler. PX algorithms have been increasingly widely used to accelerate EM (Liu et al., 1998) and Gibbs sampling convergence (Liu & Wu, 1999). The basic idea of PX data augmentation algorithms is that one can reduce autocorrelation in Gibbs sampling by introducing extra variables, which are not identified but are only incorporated for computational reasons to reduce posterior dependence in the draws of the parameters of interest. As shown by Ghosh and Dunson (2009) for parametric latent factor models, PX Gibbs samplers can have dramatically improved performance relative to typical Gibbs samplers, particularly when such algorithms perform poorly. Unfortunately, such poor performance is standard in latent factor and structural equation models, and autocorrelation in MCMC samples is often very high, making it necessary to collect hundreds of thousands or even millions of samples in many cases to be assured of convergence and obtain estimates with negligible Monte Carlo error.

In addition to improving convergence and mixing rates, parameter expansion can be used to induce new classes of priors having appealing properties (Gelman, 2004). This approach was used by Ghosh and Dunson (2009), in the setting of factor analysis, to induce heavy-tailed default priors for the factor loadings, which provide a robust plug-in specification in the absence of informative prior knowledge. Here, we induce priors for the parameters in SEM-CDP or SEM-CDPM models through a PX specification. In particular, we treat an SEM with a typical DP or DPM prior


on the latent variable distributions as a parameter-expanded version of the SEM-CDP or SEM-CDPM model. Here, the extra or redundant parameters correspond to the mean and covariance of the latent variable distributions. In the SEM-DP or SEM-DPM models, we follow standard practice in PX analyses and do not worry about lack of identifiability due to the redundancy in the parameterization.

We then run a blocked Gibbs sampler, as described in Ishwaran and James (2001), for the SEM-DP or SEM-DPM model. A naive analysis utilizing the results from this blocked Gibbs sampler will perform poorly, with high autocorrelation in the Gibbs samples and unreliable inferences due to the nonidentifiability problem. However, prior to using the results for inferences, we apply a simple postprocessing approach, which consists of transforming the draws from the working model parameterization (corresponding to the SEM-DP or SEM-DPM) to the inferential model parameterization (corresponding to the SEM-CDP or SEM-CDPM). Note that not all of the latent variable distributions need be constrained to have mean zero and identity variance in the SEM-CDP or SEM-CDPM analyses; instead we can incorporate only those constraints that would have been included in a parametric analysis with normally distributed latent traits.

4.2. Outline of Sampling Steps

The blocked Gibbs sampler is a standard approach for implementing posterior computation for DPMs, and the adaptation to structural equation models is straightforward. However, we include the detailed sampling steps here to make it easier to implement the proposed approach without needing to go through the algebra needed to calculate the conditional posterior distributions. Note that the blocked Gibbs sampler relies on truncating the stick-breaking representation of the DP shown in (3) by letting V_N = 1, so that the N + 1, N + 2, ... terms in the sum can be discarded.

Considering the semiparametric SEM specified in expressions (1) and (2), we focus on the case in which

δ_i ∼ N_r(Θ_{δ,i}, Ω_δ),   Θ_{δ,i} ∼ iid G*_δ,   G_δ ∼ CDPM_N-2(α_δ G_{δ0}),   Ω_δ ∼ IW(Σ_1, m_1),

ξ_i ∼ N_s(Θ_{ξ,i}, Ω_ξ),   Θ_{ξ,i} ∼ iid G*_ξ,   G_ξ ∼ CDPM_N-2(α_ξ G_{ξ0}),   Ω_ξ ∼ IW(Σ_2, m_2),     (13)

where the superscript * denotes that these are the unknowns under the working model. This specification implies that the working model assigns F and G DPM of normal priors, with DP priors placed on the subject-specific location parameters. Hence, the latent variables are modeled as continuous variables with unknown distributions.

Using the truncation approximation, we let

G*_δ = ∑_{h=1}^N V_{δ,h} ∏_{l<h} (1 - V_{δ,l}) δ_{Θ_{δ,h}},   G*_ξ = ∑_{h=1}^N V_{ξ,h} ∏_{l<h} (1 - V_{ξ,l}) δ_{Θ_{ξ,h}}.


1. Update the cluster allocation indicators, S_{δ,i} and S_{ξ,i}, for i = 1, ..., n, by sampling from their multinomial full conditional distributions over the components h = 1, ..., N. Note that individuals within a class will vary in the exact values of the latent traits, but will have the same class-specific mean.

2. Update the stick-breaking random variables, V_{δ,h} and V_{ξ,h}, by sampling from their beta full conditional posterior distributions,

V_{δ,h} ∼ beta(1.0 + M_{δ,h}, α_δ + ∑_{l=h+1}^N M_{δ,l}),

V_{ξ,h} ∼ beta(1.0 + M_{ξ,h}, α_ξ + ∑_{l=h+1}^N M_{ξ,l}),

where M_{δ,h} and M_{ξ,h} record the numbers of S_{δ,i} and S_{ξ,i} that equal h.

3. Update the DP precision parameters, α_δ and α_ξ, by sampling from their gamma full conditional posterior distributions, assuming gamma priors Gamma(a_0, b_0) and Gamma(a_1, b_1), respectively:

α_δ ∼ Gamma(a_0 + N - 1, b_0 - ∑_{l=1}^{N-1} log(1 - V_{δ,l})),

α_ξ ∼ Gamma(a_1 + N - 1, b_1 - ∑_{l=1}^{N-1} log(1 - V_{ξ,l})).

(A code sketch of these two updates is given after the list of steps.)

4. Update the component-specific parameters Θ_{δ,h} by sampling from the conjugate multivariate normal obtained by updating the N_r(Θ_{δ0}, Σ_{δ0}) prior with the normal likelihood for δ_i for those subjects with S_{δ,i} = h, h = 1, ..., N,

Θ_{δ,h} ∼ N_r(Θ̂_{δ,h}, Σ̂_{δ,h}),

where the conditional posterior mean and variance are, respectively,

Θ̂_{δ,h} = Σ̂_{δ,h} [Σ_{δ0}^{-1} Θ_{δ0} + ∑_{i: S_{δ,i}=h} Ω_δ^{-1} δ_i],

Σ̂_{δ,h} = [Σ_{δ0}^{-1} + ∑_{i: S_{δ,i}=h} Ω_δ^{-1}]^{-1}.

5. Update the component-specific parameters Θ_{ξ,h} by updating the N_s(Θ_{ξ0}, Σ_{ξ0}) prior with the normal likelihood for ξ_i for those subjects with S_{ξ,i} = h, h = 1, ..., N,

Θ_{ξ,h} ∼ N_s(Θ̂_{ξ,h}, Σ̂_{ξ,h}),

where the conditional posterior mean and variance are, respectively,

Θ̂_{ξ,h} = Σ̂_{ξ,h} [Σ_{ξ0}^{-1} Θ_{ξ0} + ∑_{i: S_{ξ,i}=h} Ω_ξ^{-1} ξ_i],

Σ̂_{ξ,h} = [Σ_{ξ0}^{-1} + ∑_{i: S_{ξ,i}=h} Ω_ξ^{-1}]^{-1}.


6. Update the free elements in B, Γ, μ_y, μ_x, Λ_y, Λ_x, Σ_y, Σ_x using the full conditional posterior distributions derived as for a typical parametric SEM. These steps are all outlined in detail in the Appendix.

These steps are repeated for a large number of iterations, with a burn-in discarded to allow convergence.
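The sketch referenced after step 3: the conjugate updates in steps 2 and 3 for the δ component (the ξ updates are identical in form), assuming the allocations are coded 1, ..., N; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def update_sticks_and_alpha(S, N, alpha, a0=1.0, b0=1.0):
    """Blocked Gibbs updates for the stick-breaking weights V_{delta,h} (step 2)
    and the DP precision alpha_delta (step 3), given current allocations S in 1..N."""
    M = np.array([np.sum(S == h) for h in range(1, N + 1)])       # cluster counts M_{delta,h}
    tail = np.concatenate((np.cumsum(M[::-1])[::-1][1:], [0]))    # sum_{l > h} M_{delta,l}
    V = rng.beta(1.0 + M, alpha + tail)
    V[-1] = 1.0                                                   # truncation: V_N = 1
    # Gamma full conditional: shape a0 + N - 1, rate b0 - sum_{l<N} log(1 - V_l)
    alpha = rng.gamma(a0 + N - 1, 1.0 / (b0 - np.sum(np.log1p(-V[:-1]))))
    return V, alpha
```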

Upon convergence, this algorithm produces draws from the posterior under the SEM-DPM working model. We need to then postprocess the samples to obtain approximate inferences under the SEM-CDPM inferential model. This is accomplished by transforming each of the samples. In describing this transformation, we focus on the simple case in which all the latent variable distributions are constrained to have zero mean and identity covariance, though modifications to constrain a subset of the latent variables and only constrain the mean or variance are straightforward.

1. Calculate μ_{G*δ}, Σ_{G*δ}, μ_{G*ξ}, and Σ_{G*ξ} relying on expression (6), noting that the N + 1, N + 2, ... terms are zero under the truncation approximation.

2. Let δ_i = Σ_{Qδ}^{-1/2}(δ*_i - μ_{G*δ}) and ξ_i = Σ_{Qξ}^{-1/2}(ξ*_i - μ_{G*ξ}).

3. Calculate the parameters for the inferential models:

B = Σ_{Qδ}^{-1/2} B*,   Γ = Σ_{Qδ}^{-1/2} Γ* Σ_{Qξ}^{1/2},

Λ_x = Λ*_x Σ_{Qξ}^{1/2},   Λ_y = Λ*_y Σ_t^{1/2},

Σ_{Qδ} = Diag(Σ_{G*δ} + Ω_δ),   Σ_{Qξ} = Diag(Σ_{G*ξ} + Ω_ξ),

μ_x = μ*_x + Λ*_x μ_{G*ξ},   μ_y = μ*_y + Λ*_y (I - B*)^{-1}(Γ* μ_{G*ξ} + μ_{G*δ}),

Σ_t = (I - B*)^{-1} [Γ* Σ_{G*ξ} Γ*ᵀ + Σ_{G*δ}] [(I - B*)ᵀ]^{-1},

where Diag(A) is a matrix equivalent to A but with zero off-diagonal elements. Although the resulting samples are not drawn from the exact posterior distribution under the SEM-CDPM-2 model, the resulting approach has had excellent performance in all cases we have considered.
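A sketch of the postprocessing transformation for the simple case used in the application (r = s = 1, B = 0), applied to one saved iteration of working-model draws. It follows the formulas in steps 1-3 as reconstructed above; the η-side quantities (Λ_y, μ_y) follow the remaining lines of step 3 in the same way and are omitted for brevity, and all names are illustrative.

```python
import numpy as np

def postprocess_draw(w_xi, theta_xi, omega_xi, xi_star, gamma_star, lam_x_star, mu_x_star,
                     w_d, theta_d, omega_d, delta_star):
    """Transform one saved working-model draw (r = s = 1, B = 0) to the centered,
    inferential parameterization.  w_*, theta_* are the truncated stick-breaking
    weights and atoms of G*_xi and G*_delta; omega_* are the CDPM kernel variances;
    the remaining arguments are that iteration's working-model draws."""
    # Moments of the mixing measures G*_xi and G*_delta (step 1)
    mu_Gxi = np.sum(w_xi * theta_xi)
    var_Gxi = np.sum(w_xi * theta_xi**2) - mu_Gxi**2
    mu_Gd = np.sum(w_d * theta_d)
    var_Gd = np.sum(w_d * theta_d**2) - mu_Gd**2
    s_Qxi = np.sqrt(var_Gxi + omega_xi)        # square root of Sigma_{Q xi}
    s_Qd = np.sqrt(var_Gd + omega_d)           # square root of Sigma_{Q delta}
    # Standardize the latent variable draws (step 2)
    xi = (xi_star - mu_Gxi) / s_Qxi
    delta = (delta_star - mu_Gd) / s_Qd
    # Rescale the structural and x-side measurement parameters (step 3, B = 0 case)
    gamma = gamma_star * s_Qxi / s_Qd
    lam_x = lam_x_star * s_Qxi
    mu_x = mu_x_star + lam_x_star * mu_Gxi
    return xi, delta, gamma, lam_x, mu_x
```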

    5. Application: World Value Survey: Job and Homelife Satisfaction

    5.1. Background and Description

We consider a subset of the Inter-university Consortium for Political and Social Research (ICPSR) data set collected in the World Value Survey (WVS 1981–2004). The survey was conducted by social scientists in over 90 societies, covering almost 90 percent of the world's population. Information such as work, family life, belief, etc. was collected. Our focus is on assessing the relationship between job and home life satisfaction in the United States. Data on the following variables were collected on a 10-point ordinal scale in the US for 2390 individuals.

Job satisfaction:
Overall, how satisfied are you with your job? (x_{i1})
How free are you able to make decisions in your job? (x_{i2})

Home life satisfaction:
How satisfied are you with your home life? (y_{i1})
How satisfied are you with your financial situation? (y_{i2})
How satisfied are you with your life as a whole? (y_{i3})


    5.2. Simulation Experiment

We assessed the performance of the approach through a simulation example designed to mimic the ICPSR data described in Section 5.1. In the application, we use the same sample size as the real data. We assume the structural equation model (1)–(2) with x_{ij} = x*_{ij} for j = 1, 2 and y_{ij} = y*_{ij} for j = 1, 2, 3. The latent response variable η_i measures home life satisfaction of individual i, the latent predictor ξ_i measures job satisfaction, B = 0, and γ measures the relationship between job and home life satisfaction. The true parameters are set as μ_y = (0, 0, 0)', μ_x = (0, 0)', λ_y = (1, 1, 1)', λ_x = (1, 1)', γ = 1.0, Σ_x = Diag(1, 1), and Σ_y = Diag(1, 1, 1).

We conducted two simulations: for the first simulation, both the random errors δ_i and latent variable ξ_i are drawn from standard normal distributions; for the second one, we instead assume the following mixtures of two normals:

δ_i ∼ 0.75 N(-0.51, 0.51²) + 0.25 N(1.54, 0.26²),

ξ_i ∼ 0.63 N(0.66, 0.60²) + 0.37 N(-1.13, 0.30²),

each of which has mean 0 and variance 1. In the simulation, one of our goals was to assess whether the data contain sufficient information to reliably estimate the latent variable densities.
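A quick numerical check of the mean-zero, unit-variance claim for the two mixtures. The signs of the component means shown above are the pattern consistent with zero means (the scanned values had lost their signs), so this check should be read with that caveat.

```python
import numpy as np

def mixture_moments(weights, means, sds):
    """Mean and variance of a finite mixture of normals."""
    weights, means, sds = map(np.asarray, (weights, means, sds))
    m = np.sum(weights * means)
    v = np.sum(weights * (sds**2 + means**2)) - m**2
    return m, v

print(mixture_moments([0.75, 0.25], [-0.51, 1.54], [0.51, 0.26]))  # approximately (0, 1)
print(mixture_moments([0.63, 0.37], [0.66, -1.13], [0.60, 0.30]))  # approximately (0, 1)
```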

We analyzed the simulated data using the CDPM_N-2 prior for G. The DP precision parameter α was treated as unknown using a Gamma(1,1) hyperprior. Conditionally conjugate priors were chosen for the remaining parameters: the diagonal elements of Σ_x^{-1} and Σ_y^{-1} were assigned Gamma(1,1) priors, γ ∼ N(0,1), we chose truncated positive normals N^+(0, 5.0I) for the loadings λ_y and λ_x, and N(0, 5.0I) for the intercepts μ_x and μ_y. The blocked Gibbs sampler was run for 10,000 iterations after a 2,000 iteration burn-in. To assess convergence, we ran several independent chains with different starting values; for sensitivity to prior specification, we also tried priors with the variances halved, doubled, and quadrupled. With all these different trials, the results were essentially the same.

Table 1 provides posterior summaries of the parameters in the case in which the latent variables are normally distributed. Figure 1 plots the estimated and true latent variable and random error distributions. Table 2 and Figure 2 provide the corresponding results when the latent variables and random errors have mixture normal distributions. In both figures, the subplots show the densities of the latent variables ξ_i and η_i and of the residual δ_i. From the results we can see that our proposed nonparametric Bayes method produces better estimates of the parameters and latent variable distributions than a DPM implemented under the same hyperparameter specification.

When normality is true, the CDPM compares favorably to the results under a parametric model that assumes normality, while clearly outperforming the normal model under departures from normality. Generally, when the latent variable distribution is close to the base measure G_0, the results of the DPM model are fairly good. However, when the actual latent variable density deviates substantially from G_0, the estimates of the DPM model deviate greatly from the true values. On the contrary, the CDPM results are more robust to the shape of the latent variable density since the method resolves the identifiability issue.

    5.3. Analysis and Results

We repeated the analysis conducted for the simulation example for the real data. However, to accommodate the 1–10 ordinal scale of the real data, we let y_{ij} = g(y*_{ij}) and x_{ij} = g(x*_{ij}), with g(·) a threshold link function defined as g(z) = 1 for z < 0, g(z) = l for z ∈ [l - 2, l - 1), l = 2, ..., 9, and g(z) = 10 for z ≥ 8. It is standard practice in the literature to treat 1–10 ranking data as continuous to avoid needing to estimate the very many threshold parameters that would typically be incorporated in models for ordered categorical measurements. Estimating


TABLE 1. Parameter estimates in the simulation example with true normal density.

Parameter   True value   DPM Estimate   DPM SD    CDPM Estimate   CDPM SD
γ           1.00         1.04           1.39e-2   1.00            4.64e-3
μ_{1x}      0.00         0.17           1.04e-2   3.18e-3         5.89e-4
μ_{2x}      0.00         0.17           1.07e-2   1.38e-2         5.83e-4
λ_{x,1}     1.00         0.85           6.15e-3   1.01            6.40e-4
λ_{x,2}     1.00         0.84           6.10e-3   0.98            6.44e-4
σ²_{x,1}    1.00         0.99           1.00e-3   0.96            1.00e-3
σ²_{x,2}    1.00         1.01           1.03e-3   1.01            1.03e-3
μ_{1y}      0.00         0.26           1.13e-2   1.08e-2         7.32e-4
μ_{2y}      0.00         0.25           1.16e-2   6.64e-3         7.25e-4
μ_{3y}      0.00         0.28           1.18e-2   1.73e-2         7.02e-4
λ_{y,1}     1.00         0.90           4.52e-3   1.02            6.37e-4
λ_{y,2}     1.00         0.92           4.59e-3   1.03            6.39e-4
λ_{y,3}     1.00         0.88           4.56e-3   0.98            6.18e-4
σ²_{y,1}    1.00         1.04           9.35e-4   1.04            9.35e-4
σ²_{y,2}    1.00         1.05           9.20e-4   1.05            9.20e-4
σ²_{y,3}    1.00         1.00           8.85e-4   1.00            8.85e-4

FIGURE 1. True (normal) and estimated latent variable and residual error densities in the simulation mimicking job and home-life satisfaction. The subplots show the densities of the latent variables ξ_i and η_i and of the residual δ_i. Dotted line: 99.75 percentile; solid line: average density; dot-dash line: 0.025 percentile; dashed line: N(0,1).

these thresholds adds to the computational burden and leads to identifiability problems, since certain values of the 1–10 ranking may be rare for some of the indicators. We address this problem by using prespecified thresholds, while avoiding constraints on the variance of the measurement errors in underlying variables. As our model is semiparametric, the model is still very flexible.
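The prespecified threshold link g(·) described above can be written compactly as a clipped floor function; a sketch, assuming (as stated above) that the top category begins at 8 so the bins partition the real line:

```python
import numpy as np

def threshold_link(z):
    """Prespecified threshold link g(.) mapping an underlying continuous value to
    the 1-10 ordinal scale: g(z) = 1 for z < 0, g(z) = l for z in [l-2, l-1),
    l = 2,...,9, and g(z) = 10 for z >= 8."""
    return np.clip(np.floor(z).astype(int) + 2, 1, 10)

print(threshold_link(np.array([-3.1, 0.4, 3.7, 9.2])))   # -> [ 1  2  5 10]
```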

Convergence was verified from multiple runs at widely dispersed starting values, and the results were again robust to changes in the hyperparameters. Posterior summaries of the parameters are provided in Table 3. As expected, there is a fairly strong positive association between job and home life satisfaction, with the estimated value of γ equal to 0.68 with a low posterior standard deviation. Hence, people who are satisfied with their jobs tend to also be


TABLE 2. Parameter estimates in the simulation example with true nonnormal density.

Parameter   True value   DPM Estimate   DPM SD    CDPM Estimate   CDPM SD
γ           1.00         1.11           1.21e-2   1.01            5.94e-3
μ_{1x}      0.00         0.27           5.81e-3   3.12e-3         9.33e-4
μ_{2x}      0.00         0.25           5.86e-3   4.01e-2         9.45e-3
λ_{x,1}     1.00         0.79           3.20e-3   1.01            8.80e-3
λ_{x,2}     1.00         0.80           3.19e-3   1.01            8.68e-3
σ²_{x,1}    1.00         1.00           1.27e-2   1.00            1.27e-2
σ²_{x,2}    1.00         1.02           1.31e-2   1.02            1.31e-2
μ_{1y}      0.00         0.50           7.59e-3   2.36e-3         1.14e-3
μ_{2y}      0.00         0.52           7.45e-3   3.31e-2         1.15e-3
μ_{3y}      0.00         0.52           7.71e-3   1.31e-2         1.16e-3
λ_{y,1}     1.00         0.77           2.25e-3   0.98            8.34e-3
λ_{y,2}     1.00         0.76           2.24e-3   0.97            8.09e-3
λ_{y,3}     1.00         0.78           2.30e-3   1.00            8.19e-3
σ²_{y,1}    1.00         1.01           1.38e-3   1.01            1.38e-3
σ²_{y,2}    1.00         1.01           1.39e-3   1.01            1.39e-3
σ²_{y,3}    1.00         0.99           1.37e-3   0.99            1.37e-3

FIGURE 2. True (nonnormal) and estimated latent variable and residual error densities in the simulation mimicking job and home-life satisfaction. The subplots show the densities of the latent variables ξ_i and η_i and of the residual δ_i. Dotted line: 99.75 percentile; solid line: average density; dot-dash line: 0.025 percentile; dashed line: mixture of two normals with mean 0 and variance 1.

satisfied with their home life and vice versa. The loadings on the job satisfaction latent variable ξ_i were higher for x_{i1} than x_{i2}, with the coefficient estimates implying a correlation coefficient of

corr(x_{i1}, x_{i2}) = λ_{x,1} λ_{x,2} / [(λ²_{x,1} + σ²_{x,1})(λ²_{x,2} + σ²_{x,2})]^{1/2} = 0.522


TABLE 3. Parameter estimation of the ICPSR example.

Parameter   Estimate   SE
μ_{1x}      6.55       1.34e-3
μ_{2x}      5.80       1.25e-3
γ           0.68       8.00e-4
λ_{x,1}     2.52       2.04e-3
λ_{x,2}     1.53       1.53e-3
σ²_{x,1}    0.20       1.28e-3
σ²_{x,2}    5.99       4.29e-3
μ_{1y}      6.32       8.10e-4
μ_{2y}      5.35       1.14e-3
μ_{3y}      7.11       8.93e-4
λ_{y,1}     1.30       1.12e-3
λ_{y,2}     1.15       1.17e-3
λ_{y,3}     1.20       1.01e-3
σ²_{y,1}    1.11       2.03e-3
σ²_{y,2}    4.84       3.78e-3
σ²_{y,3}    1.96       2.16e-3

FIGURE 3. Estimated latent variable and residual error densities for the job and home-life satisfaction example. The subplots show the densities of the latent variables ξ_i and η_i and of the residual δ_i. Dotted line: 97.5 percentile; solid line: average density; dot-dash line: 2.5 percentile; dashed line: N(0,1).

between the measurement of overall job satisfaction and freedom to make decisions. For the home life latent variable η_i, the induced correlation coefficient estimates were corr(y_{i1}, y_{i2}) = 0.360, corr(y_{i1}, y_{i3}) = 0.506, and corr(y_{i2}, y_{i3}) = 0.301.
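As a quick numerical check, the reported induced correlations can be reproduced from the point estimates in Table 3 by applying the correlation formula displayed above to each pair of indicators (a sketch; the loading and error-variance values are taken directly from Table 3).

```python
import numpy as np

def induced_corr(lam_j, lam_k, sig2_j, sig2_k):
    """Correlation between two indicators loading on the same latent variable:
    lam_j*lam_k / sqrt((lam_j^2 + sig2_j)(lam_k^2 + sig2_k))."""
    return lam_j * lam_k / np.sqrt((lam_j**2 + sig2_j) * (lam_k**2 + sig2_k))

print(round(induced_corr(2.52, 1.53, 0.20, 5.99), 3))   # corr(x_i1, x_i2) ~ 0.522
print(round(induced_corr(1.30, 1.15, 1.11, 4.84), 3))   # corr(y_i1, y_i2) ~ 0.360
print(round(induced_corr(1.30, 1.20, 1.11, 1.96), 3))   # corr(y_i1, y_i3) ~ 0.506
print(round(induced_corr(1.15, 1.20, 4.84, 1.96), 3))   # corr(y_i2, y_i3) ~ 0.301
```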

In addition to assessing the relationships between job and home life satisfaction, it is of interest to examine the distribution of these latent traits among individuals. Figure 3 plots the


densities of the latent variables ξ_i and η_i and of the residual δ_i, shown in the three subplots. Each of the densities is clearly non-Gaussian, providing motivation for a semiparametric model. The density of ξ_i has a bimodal shape, suggesting a subgroup of individuals having higher job satisfaction. The density of η_i shows some suggestion of three modes, with a dominant mode concentrated at zero and secondary modes corresponding to individuals with high and low job satisfaction. The residual density is closer to normal and unimodal but has heavier tails than normal.

Zhu and Lee (2001) provided a similar example using UK data from the World Value Survey, with their method relying on a finite mixture of LISREL models. They also reported a moderate positive association between job and home life satisfaction. Our method differs substantially from theirs in that we are not mixing LISREL models but instead allow the latent trait distributions within one LISREL model to be unknown using novel nonparametric Bayes methods. By using an infinite mixture instead of a finite mixture for the latent trait densities, we avoid the difficult issue of how to select an appropriate number of mixture components. In addition, we allow new components to be introduced automatically as needed as additional individuals are added to the sample. For example, there may be rare subpopulations having very different traits, with such subpopulations not represented in an initial sample. The Fortran source code to implement our approach is available for download at the following link: http://cid-4ed60035efdc32a4.skydrive.live.com/self.aspx/.Public/JobHome.f90.

    6. Discussion

In recent years, there has been increasing emphasis in a wide variety of disciplines on collecting multivariate data and on assessing latent relationships underlying these data. For this reason, structural equation models (SEMs), which have historically been used primarily in the social sciences, have seen increasing use in a broad variety of fields. In disciplines less familiar with latent variables and SEMs, researchers are often concerned about issues of identifiability and robustness to parametric assumptions, given that the latent variables are not observed directly for any individuals under study.

We have proposed a semiparametric Bayes approach to analyses of SEMs, which has provided promising results in that we have been able to accurately estimate latent variable densities and structural parameters in different applications without requiring normality of latent variable distributions. However, in applying semiparametric Bayes methods in such contexts, it is crucial to continue to employ the same identifiability restrictions used in parametric SEMs. We have proposed a simple centering approach to accomplish this and have noted in simulation studies that one can obtain very poor results in using an approach which instead places restrictions only on the base measure in the Dirichlet process.

    Acknowledgements

This research was partially supported by grant number R01 ES017240-01 from the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (NIH).

    Appendix

Here, we present the conditional posterior distributions used in implementing the Gibbs sampler for the model (13). This Gibbs sampler can be easily modified to allow changes in model specifications.


1. Under the prior μ_y ∼ N(μ_{y0}, Σ_{μy}), the conditional posterior is N(μ̂_y, Σ̂_{μy}) with

Σ̂_{μy}^{-1} = Σ_{μy}^{-1} + n Σ_y^{-1},

μ̂_y = Σ̂_{μy} (Σ_{μy}^{-1} μ_{y0} + ∑_{i=1}^n Σ_y^{-1} (y*_i - λ_y η_i)).

2. Under the prior λ_y ∼ N^+(λ_{y0}, Σ_{λy}), the conditional posterior is N^+(λ̂_y, Σ̂_{λy}) with

Σ̂_{λy}^{-1} = Σ_{λy}^{-1} + ∑_{i=1}^n η_i² Σ_y^{-1},

λ̂_y = Σ̂_{λy} (Σ_{λy}^{-1} λ_{y0} + ∑_{i=1}^n η_i Σ_y^{-1} (y*_i - μ_y)).

3. Under the prior σ²_{y,k} ∼ IG(g_0, l_0), the conditional posterior is also inverse gamma with parameters

ĝ_0 = g_0 + n/2,

l̂_0 = l_0 + ∑_{i=1}^n (y*^{(k)}_i - μ_y^{(k)} - λ_y^{(k)} η_i)² / 2,

where m^{(k)} denotes the kth element of a vector m.

4. Under the prior σ²_{x,k} ∼ IG(g_1, l_1), the posterior is also inverse gamma with parameters

ĝ_1 = g_1 + n/2,

l̂_1 = l_1 + ∑_{i=1}^n (x*^{(k)}_i - μ_x^{(k)} - λ_x^{(k)} ξ_i)² / 2.

5. The prior μ_x ∼ N(μ_{x0}, Σ_{μx}) leads to the conditional posterior N(μ̂_x, Σ̂_{μx}) with

Σ̂_{μx}^{-1} = Σ_{μx}^{-1} + n Σ_x^{-1},

μ̂_x = Σ̂_{μx} (Σ_{μx}^{-1} μ_{x0} + ∑_{i=1}^n Σ_x^{-1} (x*_i - λ_x ξ_i)).

6. Let γ_{jk} denote the element in the jth row and kth column of Γ, and let δ_{ji} and σ²_{δ,ji} denote the conditional mean and variance of the jth element of δ_i. We specify the prior γ_{jk} ∼ N(γ_0, σ²_0). The posterior of γ_{jk} is also normal, with parameters

σ̂_{jk}^{-2} = σ_0^{-2} + ∑_{i=1}^n σ_{δ,ji}^{-2} (ξ_i^{(k)})²,

γ̂_{jk} = σ̂²_{jk} (σ_0^{-2} γ_0 + ∑_{i=1}^n σ_{δ,ji}^{-2} ξ_i^{(k)} δ̃_i^{(j)}),

where δ̃_i^{(j)} = η_i^{(j)} - ∑_{l=1, l≠k}^s γ_{jl} ξ_i^{(l)} - Θ_{δ,i}^{(j)}.


7. Under the prior λ_x ∼ N^+(λ_{x0}, Σ_{λx}), the conditional posterior is N^+(λ̂_x, Σ̂_{λx}) with

Σ̂_{λx}^{-1} = Σ_{λx}^{-1} + ∑_{i=1}^n ξ_i² Σ_x^{-1},

λ̂_x = Σ̂_{λx} [Σ_{λx}^{-1} λ_{x0} + ∑_{i=1}^n ξ_i Σ_x^{-1} (x*_i - μ_x)].

8. The conditional posterior of ξ_i is a multivariate normal distribution with parameters

Σ̂_ξ^{-1} = Γᵀ Ω_δ^{-1} Γ + Ω_ξ^{-1} + Λ_xᵀ Σ_x^{-1} Λ_x,

ξ̂_i = Σ̂_ξ {Γᵀ Ω_δ^{-1} (η_i - Θ_{δ,i}) + Ω_ξ^{-1} Θ_{ξ,i} + Λ_xᵀ Σ_x^{-1} (x*_i - μ_x)}.

9. The conditional posterior of η_i is a multivariate normal distribution with parameters

Σ̂_η^{-1} = Λ_yᵀ Σ_y^{-1} Λ_y + Ω_δ^{-1},

η̂_i = Σ̂_η {Λ_yᵀ Σ_y^{-1} (y*_i - μ_y) + Ω_δ^{-1} (Γ ξ_i + Θ_{δ,i})}.

(A code sketch of steps 8 and 9 follows the list.)

10. With the prior Ω_δ ∼ IW(Σ_1, m_1), the posterior of Ω_δ is IW{Σ_1 + ∑_{i=1}^n (δ_i - Θ_{δ,i})(δ_i - Θ_{δ,i})ᵀ, m_1 + n}.

11. With the prior Ω_ξ ∼ IW(Σ_2, m_2), the posterior of Ω_ξ is IW{Σ_2 + ∑_{i=1}^n (ξ_i - Θ_{ξ,i})(ξ_i - Θ_{ξ,i})ᵀ, m_2 + n}.
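The sketch referenced after step 9: multivariate normal draws for steps 8 and 9, written directly from the precision and mean expressions above (so it inherits the same assumptions, including B = 0 in the structural term); all arguments are numpy arrays of conforming dimensions, and the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_mvn_from_precision(prec, linear, rng=rng):
    """Draw from N(prec^{-1} linear, prec^{-1}) given the precision matrix and the
    unscaled mean vector (the bracketed term in steps 8 and 9)."""
    cov = np.linalg.inv(prec)
    return rng.multivariate_normal(cov @ linear, cov)

def draw_xi_i(x_i, mu_x, Lam_x, Sig_x_inv, eta_i, Gamma, Om_d_inv, Th_d_i, Om_xi_inv, Th_xi_i):
    """Step 8: full conditional of xi_i (with B = 0)."""
    prec = Gamma.T @ Om_d_inv @ Gamma + Om_xi_inv + Lam_x.T @ Sig_x_inv @ Lam_x
    lin = (Gamma.T @ Om_d_inv @ (eta_i - Th_d_i)
           + Om_xi_inv @ Th_xi_i
           + Lam_x.T @ Sig_x_inv @ (x_i - mu_x))
    return draw_mvn_from_precision(prec, lin)

def draw_eta_i(y_i, mu_y, Lam_y, Sig_y_inv, xi_i, Gamma, Om_d_inv, Th_d_i):
    """Step 9: full conditional of eta_i (with B = 0)."""
    prec = Lam_y.T @ Sig_y_inv @ Lam_y + Om_d_inv
    lin = Lam_y.T @ Sig_y_inv @ (y_i - mu_y) + Om_d_inv @ (Gamma @ xi_i + Th_d_i)
    return draw_mvn_from_precision(prec, lin)
```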

    References

Ansari, A., & Iyengar, R. (2006). Semiparametric Thurstonian models for recurrent choice: A Bayesian analysis. Psychometrika, 71, 631–657.
Blackwell, D., & MacQueen, J.B. (1973). Ferguson distributions via Polya urn schemes. Annals of Statistics, 2, 353–355.
Bollen, K.A. (1989). Structural equations with latent variables. New York: Wiley.
Brown, E.R., & Ibrahim, J.G. (2003). A Bayesian semiparametric joint hierarchical model for longitudinal and survival data. Biometrics, 59, 221–228.
Burr, D., & Doss, H. (2005). A Bayesian semiparametric model for random-effects meta-analysis. Journal of the American Statistical Association, 100, 242–251.
Bush, C.A., & MacEachern, S.N. (1996). A semiparametric Bayesian model for randomised block designs. Biometrika, 83, 275–285.
Dunson, D.B. (2006). Bayesian dynamic modeling of latent trait distributions. Biostatistics, 7, 551–568.
Escobar, M.D., & West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90, 577–588.
Fahrmeir, L., & Raach, A. (2007). A Bayesian semiparametric latent variable model for mixed responses. Psychometrika, 72, 327–346.
Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics, 2, 209–230.
Ferguson, T.S. (1974). Prior distributions on spaces of probability measures. Annals of Statistics, 2, 615–629.
Fokoué, E. (2005). Mixtures of factor analyzers: An extension with covariates. Journal of Multivariate Analysis, 95, 370–384.
Fokoué, E., & Titterington, D.M. (2003). Mixtures of factor analysers: Bayesian estimation and inference by stochastic simulation. Machine Learning, 50, 73–94.
Gelman, A. (2004). Parameterization and Bayesian modeling. Journal of the American Statistical Association, 99, 537–545.
Ghosh, J., & Dunson, D.B. (2009). Default priors and efficient posterior computation in Bayesian factor analysis. Journal of Computational and Graphical Statistics, 18, 306–320.
Ishwaran, H., & James, L.F. (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96, 161–173.
Ishwaran, H., & Takahara, G. (2002). Independent and identically distributed Monte Carlo algorithms for semiparametric linear mixed models. Journal of the American Statistical Association, 97, 1154–1166.
Jedidi, K., Jagpal, H.S., & DeSarbo, W.S. (1997). Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Marketing Science, 16, 39–59.


Jöreskog, K.G., & Sörbom, D. (1986). LISREL VI: Analysis of linear structural relationships by maximum likelihood, instrumental variables, and least squares methods. Mooresville: Scientific Software.
Kleinman, K.P., & Ibrahim, J.G. (1998). A semiparametric Bayesian approach to the random effects model. Biometrics, 54, 921–938.
Lee, S.Y., & Xia, Y.M. (2006). Maximum likelihood methods in treating outliers and symmetrically heavy-tailed distributions for nonlinear structural equation models with missing data. Psychometrika, 71, 565–585.
Lee, S.Y., Lu, B., & Song, X.Y. (2008). Semiparametric Bayesian analysis of structural equation models with fixed covariates. Statistics in Medicine, 27, 2341–2360.
Li, Y., Müller, P., & Lin, X. (in press). Center-adjusted inference for a nonparametric Bayesian random effect distribution. Statistica Sinica.
Liu, C.H., Rubin, D.B., & Wu, Y.N. (1998). Parameter expansion to accelerate EM: The PX-EM algorithm. Biometrika, 85, 755–770.
Liu, J.S., & Wu, Y.N. (1999). Parameter expansion for data augmentation. Journal of the American Statistical Association, 94, 1264–1274.
Lubke, G.H., & Muthén, B. (2005). Investigating population heterogeneity with factor mixture models. Psychological Methods, 10, 21–39.
McLachlan, G.J., Bean, R.W., & Jones, L.B.T. (2007). Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution. Computational Statistics & Data Analysis, 51, 5327–5338.
Müller, P., & Rosner, G.L. (1997). A Bayesian population model with hierarchical mixture priors applied to blood count data. Journal of the American Statistical Association, 92, 1279–1292.
Ohlssen, D.I., Sharples, L.D., & Spiegelhalter, D.J. (2007). Flexible random-effects models using Bayesian semi-parametric models: Applications to institutional comparisons. Statistics in Medicine, 26, 2088–2112.
Palomo, J., Dunson, D.B., & Bollen, K.A. (2007). Bayesian structural equation modeling. In S.-Y. Lee (Ed.), Handbook of latent variable and related methods (pp. 163–188). Amsterdam: Elsevier.
Papaspiliopoulos, O., & Roberts, G.O. (2008). Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika, 95, 169–186.
Sethuraman, J. (1994). A constructive definition of the Dirichlet process. Statistica Sinica, 4, 639–650.
Walle, F. (1980). Education and the demographic transition in Switzerland. Population and Development Review, 3, 463–472.
Yang, M., Dunson, D.B., & Baird, D. (2010). Semiparametric Bayes hierarchical models with mean and variance constraints. Computational Statistics & Data Analysis, 54, 2172–2186.
Zhu, H.T., & Lee, S.Y. (2001). A Bayesian analysis of finite mixtures in the LISREL model. Psychometrika, 66, 133–152.

Manuscript Received: 28 APR 2008
Final Version Received: 11 FEB 2010
