Semiparametric Bayesian latent variable regression for skewed …dbandyop/SkewPaperCombined.pdf ·...

Biometrics , 1–20

Semiparametric Bayesian latent variable regression for skewed multivariate

data

Apurva Bhingare 1,∗, Debajyoti Sinha2, Debdeep Pati3, Dipankar Bandyopadhyay4

and Stuart R. Lipsitz5

1Eunice Kennedy Shriver National Institute of Child Health & Human Development, Bethesda, MD

2Department of Statistics, Florida State University, Tallahassee, FL

3Department of Statistics, Texas A & M University, College Station, TX

4Department of Biostatistics, Virginia Commonwealth University, Richmond, VA

5Department of Medicine, Brigham and Women’s Hospital, Boston, MA

*email: [email protected]

Summary: For many real-life studies with skewed multivariate responses, the level of skewness and association

structure assumptions are essential for evaluating the covariate effects on the response and its predictive distribution.

We present a novel semiparametric multivariate model and associated Bayesian analysis for multivariate skewed

responses. Similar to multivariate Gaussian densities, this multivariate model is closed under marginalization, allows

a wide class of multivariate associations, and has meaningful physical interpretations of skewness levels and covariate

effects on the marginal density. Other desirable properties of our model include the Markov Chain Monte Carlo com-

putation through available statistical software, and the assurance of consistent Bayesian estimates of the parameters

and the nonparametric error density under a set of plausible prior assumptions. We illustrate the practical advantages

of our methods over existing alternatives via simulation studies and the analysis of a clinical study on periodontal

disease.

Key words: Dirichlet process; Kernel density; Markov Chain Monte Carlo; Periodontal disease; Skewed error

This paper has been submitted for consideration for publication in Biometrics

Bayesian semiparametric multivariate skewed models 1

1 Introduction

In many biomedical and health-care cost studies, we often come across highly skewed

multivariate responses. For example, the preliminary analysis (see Figure 2) of clinical

data from a periodontal disease (PD) study (Fernandes et al., 2009) of Gullah-speaking

African-Americans diabetics (henceforth, GAAD study) clearly reveals that the two major

(correlated) clinical endpoints of subject-level PD status (Greenstein, 1997)– the mean

(average) periodontal pocket depth (PPD) and the mean clinical attachment level (CAL)

of all available teeth of each subject (cluster) are highly non-Gaussian (skewed), with the

skewness of CAL being different from the skewness of PPD. While the CAL is the accepted

‘gold standard’ endpoint for measuring the progression and history of PD, the PPD measures

the current disease activity. Also, CAL can occur in presence, or absence of PPD, and

vice versa. Hence, a holistic approach towards assessing the impact of a covariate (say,

diabetes status) on PD should consider the effect on the joint distribution of CAL and

PPD. A typical multivariate (normal) regression analysis that ignores non-Gaussianity (say,

skewness of CAL) may lead to biased parameter estimates (Azzalini and Capitanio, 2003),

and the corresponding predictive density. Although analysis may proceed transforming the

multivariate skewed responses to multivariate normal, difficulties remain in choosing a suit-

able transformation, and interpreting the covariate effects on the original non-transformed

responses. As alternatives, the quantile regression (QR) methods (Koenker, 2005) and their

multivariate extensions mostly focus on a single pre-specified quantile, without evaluating

the levels of skewness, and the whole joint density of the PPD and CAL given covariates.

This paper addresses a number of major challenges and limitations of existing model-

based regression analysis of multivariate skewed response data in biostatistical applications.

First, many recently developed model-based approaches (Genton, 2004; Sahu et al., 2003;

Bandyopadhyay et al., 2012) centers around the restricted parametric family pioneered

2 Biometrics,

by Azzalini and Dalla Valle (1996). Our less-restrictive semiparametric proposition with

skewed nonparametric marginal error densities will remain attractive under concerns about

the validity and consequence of the parametric assumptions on the error density. Second,

unlike most existing multivariate skewed models, our elegant model class is closed under

marginalization. Specifically, the distribution of any subset of the response vector is within

the same multivariate distribution class of the original vector, however, the smaller vector’s

density involves only the relevant subset of the parameters. So, unlike existing models, where,

say, the marginal variance and skewness of the CAL depend on the parameters of PPD, the

marginal variance and skewness of CAL in our model do not depend on the parameters

of PPD. This property alleviates the major roadblock for component-wise interpretation

of skewness and covariate effects, and also paves way for specifying priors based solely on

available marginal opinion for each response. Third, the accommodation of a flexible class of

multivariate associations (such as antedependence, toeplitz, or autoregressive structures) via

a covariance matrix continues to remain a major challenge. Only a recent proposal (Chang

and Zimmerman, 2016) accommodates the antependence association, however, at the cost

of a strict condition on the skewness parameters. Our Bayesian semiparametric formulation

allows a broader class of multivariate associations (including the aforementioned structures)

within the response vector, without imposing any restriction on skewness parameters. Fourth,

our proposal is computationally scalable, with easy implementation in freeware like R and

JAGS using available Markov Chain Monte Carlo (MCMC) tools. Finally, we also explore

and present some desirable asymptotic justification (Bayesian posterior consistency) of our

method under practically reasonable prior assumptions.

The rest of the paper proceeds as follows. After introducing our semiparametric model in

Section 2, we present the relevant components of Bayesian inference, such as prior choices,

likelihood specifications, posteriors, and the MCMC implementation routines in Section 3.


In Sections 4 and 5, we assess the posterior consistency and the finite sample properties of

the Bayesian estimates (via simulation studies), respectively. In Section 6, we illustrate our

methods via analyzing the motivating GAAD dataset. Finally, some concluding remarks are

presented in Section 7.

2 Semiparametric Model for Multivariate Skewed Response

Let Yi = (Yi1, . . . , Yim) ∈ Rm be a row vector of skewed responses from subject (cluster)

i = 1, . . . , n. For brevity, we use fixed number of m components per subject, which can be

generalized to accommodate unequal mi. Our regression model is

Yij = µij + εij, for j = 1, . . . ,m, (1)

where µij = xijβj, xij = (1, xij1, . . . , xij,p−1) is the row vector of (p−1) covariates for Yij (with

first term for the intercept), βj is the corresponding (p× 1) vector of regression parameters,

and εi = (εi1, . . . , εim) is the row vector of errors following a m-variate multivariate density

fm(ε), with possibly skewed marginal densities. For the PD study, m = 2 for every subject.

We introduce a new class of semiparametric multivariate skew (SMS) density for fm(ε),

denoted by εi ∼ SMS(0, h, Rρ, α), where Rρ is a (m×m) correlation matrix, h = (h1, . . . , hm)

is a vector of m unknown nonparametric (baseline) symmetric univariate densities, and

α = (α1, . . . , αm) ∈ Rm is a vector of skewness parameters. This SMS(0, h, Rρ, α) density

has the following stochastic representation

εTi = Aα|Z1i|+ A∗αZ2i , (2)

where Z1i = (Z1i1, . . . , Z1im)T and Z2i = (Z2i1, . . . , Z2im)T are two independent (m × 1)

vectors of latent variables with hj(·) being the common marginal density for Z1ij and Z2ij,

Aα and A∗α are diagonal matrices with the jth diagonal elements aj = αj(1 + α2j )−1/2 and

a∗j = (1 + α2j )−1/2 respectively. The univariate error εij in (2) can also be expressed as

εij = ajZ2ij + a∗j |Z1ij|, using the “skewing shock” |Z1ij| to the symmetric random noise

Z2ij with weights a∗j and aj, such that a2j + a∗j2 = 1. Only when the skewness parameters

(α1, . . . , αm) are all equal to zero, the εi of (2) equals Z2i with symmetric marginal densities

4 Biometrics,

hj(·). We assume m “skewing shocks” Z1ij, j = 1, . . . ,m to be independent with marginal

densities h1(·), . . . , hm(·), and we model the association within εi via the m-variate Gaussian

Copula (Nelsen, 2007)

Cm(Z2i | h,Rρ) = φm[Φ−11 {H1(z2i1)}, . . . ,Φ−11 {Hm(z2im)} | Rρ

]×

m∏j=1

hj(z2ij)

φ1

[Φ−11 {Hj(z2ij)}

](3)

for Z2i, where φm(. | M) is the m-variate Gaussian density with mean 0 and variance-

covariance matrix M , Rρ is a correlation matrix with unknown parameter vector ρ, Φ−11 (.)

is the inverse cdf of N(0, 1), and Hj(·) is the cdf of the unknown symmetric density hj(·).

The copula structure of (3) ensures that Z2ij has the nonparametric marginal density hj(·)

(same as Z1ij) for j = 1, . . . ,m. We model hj(·) as a scale mixture of symmetric (around 0)

density kernel K(· | σ), given by

hj(u) =

∫ ∞0

K(u | σ)dGj(σ), (4)

where Gj(·) is the unknown nonparametric mixing distribution of scale σ with support in

R+ = (0,∞). In this paper, we use the popular Gaussian kernel N(0, σ2) as the choice for

kernel K(· | σ) in (4). However, other choices are possible. For example, using the result of

Feller (1971) (p. 158), when the kernel K(· | σ) is Uniform(−σ, σ), the nonparametric hj(u)

in (4) represents the class of all unimodal and symmetric around 0 densities. The semipara-

metric Bayesian inference proceeds via specifying a Dirichlet process prior (Ferguson, 1974;

Muller et al., 2015) DP(G0j, C) on Gj(·), with pre-specified prior mean G0j(·) and precision

C. In Section 3, we explain that this DP(G0j, C) prior process on Gj(·) induces a flexible

prior on the nonparametric density hj(·) in (4).

When Z1i and Z2i in (2) are assumed to have parametric multivariate elliptical densities,

the resulting model is similar to the “multiple skewed shocks” model of Sahu et al. (2003).

However, our semiparametric model with nonparametric marginals has two additional crucial

differences: (a) a new parametrization of weights (Aα, A∗α), and (b) a multivariate copula


modeling of Z2i as in (3). Also, our model is applicable even in studies where no actual

latent skewing shock vectors are clinically interpretable in practice. Later, we demonstrate

that the latent vector representation in (2) provides computational convenience.

Now, a parametric subclass of the semiparametric SMS(0, h, Rρ, α) density of (2)-(3) is

obtained when Z1i and Z2i in (2) have parametric mean-zero multivariate normal densities

Nm(0, D2σ) and Nm(0,Σ) respectively with variance-covariance matrices Σ = DσRρDσ, where

Dσ = diag(σ1, . . . , σm), and Rρ is a positive definite correlation matrix. We call this a

(multivariate) parametric Gaussian mixture (PGM) model, denoted by εi ∼ PGM(0, α,Σ).

Despite similar parametrization based on (α,Σ), our parametric PGM model has key dif-

ferences with the location 0 parametric multivariate skew-normal (MSN) model of Azza-

lini and Dalla Valle (1996), denoted by εi ∼ MSN(0, α,Σ), with the corresponding joint

density fm(εi1, . . . , εim) = 2φm(εi|Σ)Φ1(εTi α). The marginal distribution of each compo-

nent εij for PGM(0, α,Σ) density is the univariate skew-normal (SN) density fj(εij) =

2φ1(εij|σj)Φ1(αjεij) denoted by SN(0, αj, σj) and dependent only on the component-specific

parameters (αj, σj), whereas, for the MSN(0, α,Σ) model of Azzalini and Dalla Valle (1996),

the corresponding univariate marginal density is SN(0, αj, σj), however, with parameter αj

being a function of the entire (α,Σ) (Chang and Zimmerman, 2016). See Appendix A for

details on the expression of αj. In general, similar to the parametric formulation of Sahu

et al. (2003), the model class in (2) is closed under marginalization, because the marginal

density of any subset ε(1)i of εi = (ε

(1)i , ε

(2)i ) is from the same class of semiparametric density

SMS(0, h(1), R11, α(1)) indexed by (h(1), R11, α

(1)), where α = (α(1), α(2)), (h1, . . . , hm) =

(h(1), h(2)) and matrix R =

R11 R12

R12 R22

are partitions of (α, h,R). Unlike the multivariate

model of (2) based on multiple “skewing shocks” |Z1i1|, . . . , |Z1im|, the existing multivariate

models, such as MSN(0, α,Σ) are based on univariate “skewing shock” |Z∗1i| (common to

all m components). As a consequence, these models are closed under marginalization only

6 Biometrics,

in a restricted sense because εi ∼ MSN(0, α,Σ) ⇒ ε(1)i ∼ MSN(0, α∗(1),Σ

∗(1)), where the

parameters (α∗(1),Σ∗(1)) are different from (α(1),Σ11), and in fact, they are even functions

of (α(2),Σ12,Σ22) (see Theorem 1 of Chang and Zimmerman (2016) for details). For the

bivariate PD response in GAAD study, the existing MSN models are essentially based on a

subject-specific univariate skewing shock |Z∗1i| common to both CAL and PPD, compared to

the paired skewing shocks (|Z1i|, |Z2i|) in model (2). Consequently, α1, depends on α1, as well

as (α2, σ2, ρ). This complicates the marginal interpretation and Bayesian prior specification

of α1 for the MSN model.

To further demonstrate the lack of interpretation of α as a skewness parameter within

the MSN model compared to the straightforward interpretation within our model, Figure 1

presents the contour plots and marginal densities of the bivariate MSN (Chang and Zim-

merman, 2016) model (right panel) and our bivariate PGM model (left panel), when both

models have common skewness parameter α1 = α2 = α0 = 10, ρ = −0.7 and σ1 = σ2 = 1.

While Figure 1(a) demonstrate substantial positive skewness of both components from the

bivariate PGM model (as expected from the chosen α0 as high as 10), both the (near elliptical)

contour and marginal density plots in Figure 1(b) indicate approximately symmetric marginal

densities from the MSN model. This illustrates the fact that, depending on the value of ρ and

other parameters of the model, the value (even its sign) of the marginal Pearsonian skewness

of the MSN model may be very different from the value of α0. However, for our multivariate

model in (2), the Pearsonian skewness of each marginal density does not depend on ρ and

scale parameters of other components. Further comparisons of contour plots of the bivariate

PGM with the bivariate SN density for various choices of α0 = 0, 2, 5, 10 and σ1 = σ2 = 1

with ρ = 0.7 are presented in Web Appendix B.

The marginal mean of Yi for our SMS model is E (Yij|Xi) = β∗j0 +∑p−1

l=1 xijlβjl with

β∗j0 = βj0 + {αj/(1 + α2j )

1/2}λj. The covariance matrix of Yi is var (Yi|Xi) = V , where


the diagonal elements Vjj = σ2hj − {α2

j/(1 + α2j )}λ2j , and the off-diagonal elements Vjk =

σhjka∗ja∗k, with weight a∗j = 1/(1 + α2

j )1/2 for all j 6= k ∈ {1, . . . ,m}. Here, the common

expectations E(|Z1ij|) = λj and variances V ar(Z1ij) = V ar(Z2ij) = σ2hj are taken with

respect to the nonparametric marginal density hj(·) of Z1ij and Z2ij, given in (4). The

covariance Cov(Z2ij, Z2ik) = E[Z2ijZ2ik] = σhjk involves the expectation with respect to the

bivariate density of the pair (Z2j, Z2k) based on the copula model of (3). For the GAAD

study, the correlation Corr(εi1, εi2) between PPD and CAL responses within subject/cluster

i is σh12a∗1a∗2/[{σ2

h1 − a21λ21}{σ2h2 − a22λ22}]1/2, where σh12 = E[Z1Z2] is the expectation taken

with respect to the bivariate version of (3) for m = 2.

3 Bayesian Inference: Likelihood, Prior and Posterior

In this section, we develop the semiparametric Bayesian inferential framework for the SMS

model. Given observed data D = {yi, Xi; i = 1, . . . , n} from n independent multivariate

responses, the likelihood function of the parameter space Θ = (β, α, ρ, σ,G) is

L(Θ | D) =n∏i=1

fm(yi −Xiβ), (5)

where fm(εi) is the integral∫Cm(A∗−1α {εi −Aα|Z1i|} | h,Rρ)[

∏mj=1 hj(Z1ij)] dZ1i, taken over

the support Rm of latent vectors Z1i in (2), hj(·) is the independent marginal density of

Z1ij, εi = yi − Xiβ, and Cm(· | h,Rρ) is the copula specification in (3). Clearly, fm(·) of

εi = yi −Xiβ in (2) has a complicated analytical expression. Based on the likelihood in (5),

the joint posterior density is

p(Θ | D) ∝ L(Θ | D)π1(β)π2(α)π3(ρ)π4(G) , (6)

with the following priors.

(i) The priors π1(β) and π2(α) for β and α are independent mean zero multivariate normal

with pre-specified covariance matrices Mβ and Mα, respectively. In particular, we choose

these as σ2I, where σ2 = 100.

(ii) For the GAAD study with bivariate response, the prior π3(ρ) of the scalar parameter ρ in

8 Biometrics,

Rρ (3) is Uniform(−1, 1). However, for a vector ρ, this π3(ρ) prior has to be a multivariate

density with an appropriate support.

(iii) The joint prior process π4(G) of G = (G1, . . . , Gm) is the product of m independent Dirichlet

processes DP(G0j, C), for j = 1, . . . ,m (Ferguson, 1974), where the prior mean G0j of Gj is

specified based on the “prior guess” h0j(u) =∫∞0K(u|σ)dG0j(σ) of the density hj(u) of Z1ij

in (4). The user-specified concentration/precision parameter C is the measure of uncertainty

of Gj around G0j; the larger the value of C, the closer the sample-paths of hj(u) are to the

“prior guess” h0j(u), while smaller values of C allow the sample-path of hj(u) to be very

different from h0j(u).

The posterior in (6) is analytically intractable, hence, we proceed via MCMC sampling

from the joint distribution

p(Θ, Z1 | D) ∝ [n∏i=1

φm(Wi | 0, Rρ)m∏j=1

hj(Z1ij)]π1(β)π2(α)π3(ρ)π4(G) , (7)

where φm(u|0, R) with (m × m) covariance matrix R is a mean-zero multivariate normal

density evaluated at u ∈ Rm, Wi = (Wi1, . . . ,Wim) with Wij = Φ−11 [Hj{(1 + α2j )

1/2(yij −

Xijβ)−αj|Z1ij|}], Φ−11 is the inverse-cdf of the standard normal density, and Hj(·) = Hj(·|Gj)

is the cdf corresponding to the density hj(·|Gj). For the parametric case, hj(·|Gj) and π4(G)

are replaced with the parametric density hj(.|σj) and the joint parametric prior of π4(σ).

For example, when hj(·|σj) is N(0, σ2j ) density with unknown σj, we use the parametric

prior distributions σj ∼ gj(·|γj) independently for j = 1, . . . ,m with pre-specified hyperpa-

rameters γj. For this special case, Wij simplifies to {(1 + α2j )

1/2(yij −Xijβ) − αj|Z1ij|}/σj.

For a semiparametric analysis with a particular symmetric density kernel K(z|σ) for the

nonparametric kernel-mixture densities h = (h1, . . . , hm) in (4), we implement the joint

density in (7) by assuming (a) (Wi1, . . . ,Wim) for i = 1, · · · , n are mean-zero multivariate

Nm(0, Rρ), where Wij = Φ−11 [Kij{(1 + α2j )

1/2(yij −Xijβ)− αj|Z1ij|}], and Kij and K−1ij are

the cdf and inverse-cdf of K(·|σij); (b) Z1ij for j = 1, . . . ,m are independent with density


K(·|σij), and (c) σij for j = 1, . . . ,m are independent with nonparametric cdf Gj following

Dirichlet Process (DP) prior DP(G0j, C). For the Gaussian kernel in (4), Kij(u) = Φ(u/σij)

and Wij further simplifies to Wij = {(1 + α2j )

1/2(yij −Xijβ)− αj|Z1ij|}/σij.

For convenient MCMC implementation, we use the constructive definition of the DP

(Sethuraman, 1994), truncated at a user specified finite number (K0) of components for the

prior on Gj. Suppose (δ1, . . . , δK0) are generated independently from G0j, (B1, . . . , BK0−1)

are generated independently from Beta(1, C), ω1 = B1, ωh = Bk

∏l<k(1 − Bl) for k =

2, . . . ,K0 − 1 and ωK0 = 1 −∑K0−1

l=1 ωl. Then, the sample path of Gj is approximated by

Gj =∑K0

l=1 ωlIδl , when K0 is large. These likelihood steps and priors facilitate easy Bayesian

semiparametric implementation for either the Gaussian or uniform kernels using available

freeware, such as WinBUGS/JAGS.

4 Posterior Consistency

For theoretically valid inference from a complex semiparametric Bayesian model, it is

important to assure that the parameter posteriors become increasingly concentrated around

the true parameter values with increasing sample size. This necessitates the investigation of

the asymptotic properties, along with the finite sample properties. In this section, we provide

sufficient conditions for posterior consistency, the most important asymptotic property. We

present our theory for the case where the dimension of the regression parameter, β, is p.

We first investigate if the support of the prior is large enough to cover all possible relevant

densities. Suppose F denotes the class of all univariate asymmetric unimodal residual den-

sities and C{(F)m × (0, 1)} denotes the collection of all m-dimensional residual densities f0

with unknown correlation ρjk ∈ (0, 1) having the form,

f0(e) =

∫Rm+

2m|J |m∏j=1

h0j(zj)h0j(e∗j)

φ1

[Φ−1{H0j(e∗j)}

]φm [Φ−1{H01(e∗1)}, . . . ,Φ−1{H0m(e∗m)}|Rρ

]dz,

with the j-th marginal density given by f0j(ej) =∫R+

2√

1 + α2jh0j(zj)h0j(e

∗j)dzj j =

1, . . . ,m, where h0j(·) is the true (unknown) symmetric around 0 density, H0j(·) is the

10 Biometrics,

corresponding cumulative distribution function, e∗j =√

1 + α2jej − αzj, and the Jacobian

of transformation |J | is |J | =

∣∣∣∣∣∣∣I 0

A∗−1α −A∗−1α Aα

∣∣∣∣∣∣∣. Let Π denote a prior on C{(F)m × (0, 1)},

such that Π = (πF)m ⊗ π3(ρ), where πF is a prior on F defined by Gj ∼ DP(G0j, C)

for j = 1, . . . ,m, α ∼ π2(α) and π3(ρ) is a prior on (0, 1). For ease of exposition, we

will assume that α is known. To show that posterior consistency holds at the true den-

sity f0, we first define the Kullback-Liebler divergence as KL(f0, f) =∫Rm f0 log(f0/f)

and Kullback-Leibler neighborhood of size ε as κε(f0) = {f : KL(f0, f) < ε}. For any

prior Π∗ on the density space F∗, f0 is said to be in the Kullback-Leibler support of

Π∗ denoted by KL(Π∗) if f0 ∈ S, where S = {f0 : Π∗(f : KL(f0, f) < ε)∀ε > 0}. Define

FKL ={f0 ∈ C{(F)m × (0, 1)},

∫f0| log f0| <∞

}. We characterize the Kullback-Leibler

support of Π in the following lemma, with its proof available in Web Appendix C.

Lemma 1: FKL ⊂ KL(Π) if G0j is defined on support R+ for all j = 1, . . . ,m and G0j

is absolutely continuous with respect to Lebesgue measure and supp(π3(ρ)) = (0, 1).

Suppose Dn is the observed data with sample size n, and β0 and f0 are the true values

of the regression parameter vector and the multivariate error density, respectively. We now

present the sufficient conditions to ensure that as the sample size n increases, the posterior

distributions of the parameter β and the error density f are concentrated around a small

neighborhood around their true values. We are essentially interested in the inference on the

regression parameter β; so, we define a strong L1 neighborhood of radius δ > 0 around

the true value β0 as the set S(β0) = {β : ‖β − β0‖ < δ}, and weak neighborhood around

f0 as Wε(f0) ={f : |

∫ϕf −

∫ϕf0| < ε

}for a bounded continuous function ϕ. Consider

U = Wε(f0) × S(β0) for any arbitrary δ > 0. The following theorem presents the result on

posterior consistency, with the proof given in Web Appendix C.

Theorem 1: Suppose (f0, β0) ∈ FKL×Rp. Consider a prior Π = (Π⊗Πβ) on FKL×Rp.


Then, Π{(fm, β) ∈ U c|Dn} → 0 a.s. under the true data generating distribution Pf0,β0 that

generates data Dn.

5 Simulation Study

We conduct three simulation studies to compare the finite sample properties of our Bayesian

estimates to those obtained from competing models under various data generation schemes.

Here, we present results from Simulation 1, where the data is generated from the model in

(2). Details on Simulations 2 and 3 are presented in Web Appendix D, where the data are

generated under the competing parametric skew-t assumptions.

Simulation 1: We consider N = 500 replications of bivariate responses with sample size

n = 50. The bivariate skewed errors (εi1, εi2) were generated from the bivariate PGM model,

where α = (α1, α2) = (2, 2), Z1i ∼ N2(0, I) and Z2i ∼ N2(0, Rρ), Rρ is a (2 × 2) correlation

matrix with R12 = 0.7. The regression structure is Yij = β0 + xij1β1 + xij2β2 + εij, j =

1, 2 where (β0, β1, β2) = (2, 0.5, 0.5), with xij1 and xij2 sampled independently, such that

xij2 ∼ N(0, 1) and xij1 = 1, or −1, each with probability 0.5. This simulation model has

the same regression parameters (common β) for both components (different from separate

β for two responses in GAAD study). We fit the SMS model, where symmetric h1(·) and

h2(·) are expressed as unknown scale mixtures of Gaussian kernels. We also fit the PGM

model, assuming Z1ij ∼ N(0, σ2j ) and Z2i ∼ Nm(0,Σ), with (Σ)11 = σ2

1, (Σ)22 = σ22 and

(Σ)12 = ρσ1σ2. We use independent N(0, 1) priors for β0, β1 and β2, and also for the skewness

parameters α1 and α2. For the DP(G01, C) and DP(G02, C) priors corresponding to G1 and

G2 respectively, we assume the prior guess G01 and G02 to be a Gamma(2, 1) density and

C = 1, implying the prior guess for εij is mean 0 with ranges ±6, with high prior probability

for a symmetric true density. This also implies the skewing shock has range of (0, 6), with

a high probability. Although these prior guesses for G1 and G2 corresponds to a prior guess

12 Biometrics,

of f(εi) far away from the true density of εi, we would like to demonstrate that even this

guess produces posterior estimates with good finite sample properties. For the PGM, we

specify Gamma(2, 1) for σ21 = σ2

2. To test the performances, we use the mean squared error

MSE(θ) =∑N

k=1(θk − θ)2/N , where θk is the posterior estimate of θ from the kth simulated

dataset, k = 1, 2, . . . , N . We also report the relative bias (RB),∑N

k=1(θk − θ)/(Nθ) and the

coverage probability (CP) of the 95% interval estimates from these competing methods. In

conjunction, we also compared a generalized estimating equation (GEE) fit (Liang and Zeger,

1986), given that GEE is equivalent to a mixed model with a sandwich variance estimate,

although producing a biased marginal estimate of the intercept β∗0j = β0−{αj/(1+α2j )}1/2λj.

The results are summarized in Table 1.

We observe that both the SMS and the PGM models yield smaller MSE’s (at least 20%

reduction), and overall better CPs, compared to the GEE estimates. Unlike the GEE, our

methods provides the estimates of skewness, and more precise interval estimates of the

regression parameters. Also, under model misspecification (data simulated from the PGM

model), the posterior estimates from the SMS model are comparable to the PGM in terms of

MSE and RB. Additional simulation studies (see Simulations 2 and 3 in the Web Appendix D)

that avoid stringent assumptions on the true form of the error densities reveal the superiority

of the SMS and PGM models over various existing alternatives, such as the skew-t, and GEE.

6 Analysis of GAAD Study

The GAAD study aims to evaluate the PD status of the Gullah-speaking African-Americans

as specified by the subject-level (mean) CAL and PPD endpoints, and quantify the ef-

fects of subject-level covariates such as age (in years), gender (1=Female, 0=Male), Body

Mass Index or BMI (obese=1 if BMI >= 30, = 0, otherwise), glycemic/HbA1c status (1=

High/uncontrolled, 0 = controlled) and smoking status (1 = smoker, 0 = never smoker),

on these endpoints. For our analysis, we select n = 288 patients (subjects) with complete


covariate information. About 31% of the subjects are smokers, with a mean age of 55 years,

ranging from 26-87 years. Female subjects are predominant in our data (about 76%), which

is not uncommon in Gullah population (Johnson-Spruill et al., 2009). Also, 68% of subjects

are obese (BMI >= 30), and 59% are with uncontrolled HbA1c level.

Here, the bivariate correlated responses, the mean PPD and mean CAL, calculated as

averages of the corresponding measurements across all sites and tooth for that subject, are

non-Gaussian (heavily skewed). Hence, the validity of estimation under a standard linear

mixed model (LMM) framework remains questionable. For further illustration, Figure 2

[panels (a)-(f)] presents the histograms of the responses, the histogram and Q-Q plot of

the empirical Bayes estimates of the random effects, and the histograms of the residuals,

after fitting LMMs separately to the responses using the nlme package in R. These plots

clearly reveal evidence of different degrees of skewness for the two error terms, and also

for the random subject effects. However, to avoid a overly complicated model, we assume

that the skewness of, say, CAL is the same for all subjects/clusters. To accommodate this,

we illustrate the application of our models developed in Sections 2 and 3 on this dataset.

Specifically, we compare the fit of the following 4 competing models:

(1) The proposed SMS model, with unknown symmetric densities h1(·) and h2(·) as nonpara-

metric scale mixtures of Gaussian kernels,

(2) The PGM model, with latent vectors Z1i ∼ N2(0, D2σ) and Z2i ∼ N2(0,Σ), where D2

σ =

diag(σ21, σ

22) and (Σ)11 = σ2

1, (Σ)22 = σ22, (Σ)12 = ρσ1σ2.

(3) The skew-t model (ST), with a Skew-t (Sahu et al., 2003) density for (εi1, εi2), with ν

degrees of freedom,

(4) The Bivariate Normal (BVN) model, with a symmetric N2(0,Σ) density for (εi1, εi2), ignor-

ing the skewness.

We use practically flat independent N(0, 100) priors for all the regression parameters

14 Biometrics,

(the components of β1 and β2), and the skewness parameters α1 and α2. For the unknown

mixing distributions G1 and G2, we choose independent DP(G0j, C) priors, with C = 1, and

assuming same Gamma(2, 1) density for the prior guess of G01 and G02. These prior guesses

matches with the prior choices for σ1 and σ2 in the PGM model, the skew-t and bivariate

normal models. For the degrees of freedom ν of the skew-t density, we choose Exp(0.1)I[2,∞),

the exponential density truncated at 2. For all four models, we use a Uniform(−1, 1) as

the prior for ρ. It is important to note that our choice of priors, although practically flat,

is primarily for illustration of our statistical methods, and may not represent the prior

opinion of clinical investigators. We generated two parallel MCMC chains of size 150,000 and

computed the posterior estimates after discarding the first 100,000 iterations (burn-in). To

guard against potential autocorrelation among successive iterations, we used a thinning of 25.

To assess model convergence, we use the trace plots, autocorrelation plots and the Gelman-

Rubin R. Compared to the SMS model, the PGM required larger number of iterations to

converge. The relevant R and JAGS codes for implementing these models on simulated data

are available at a GitHub repository (see Web Appendix E).

We use the Conditional Predictive Ordinate (cpo) statistic, (cpo)i =∫fm(yi − Xiβ |

Θ)p(Θ | D(−i))dΘ (Gelfand et al., 1992), to compare the model performances, where D(−i)

is the cross-validated data after deleting the ith observation from the full data D, and

p(Θ | D(−i)) is the posterior density of Θ given D(−i). The corresponding summary statistic

using the cpoi is the log of the pseudo-marginal likelihood (lpml), given by lpml =∑ni=1 log (cpoi). We also use the Watanabe-Akaike information criterion (waic) (Vehtari

et al., 2017), considered a state-of-the-art Bayesian model selection tool. The computations

of the waic and lpml are conveniently based on MCMC samples from the full posterior

distribution p(Θ | D). Models with larger lpml and smaller waic indicate more support for

observed data. These values suggest the PGM model to be the most appropriate (lpml=


–127.882 and waic= 441.89), followed by the SMS model (lpml= –134.03 and waic=

516.88). Recall that the PGM model is the parametric version of the SMS model. The lpml

(waic) values for the skew-t and bivariate normal models are –244.546 (618.94) and –300.315

(654.60) respectively, both substantially smaller (larger) than our new semiparametric and

parametric Bayesian methods. These numbers also suggest that compared to existing para-

metric competitors, our proposal is more appropriate for the GAAD dataset.

The posterior estimates, standard deviations, and 95% credible intervals (CIs) of model

parameters obtained from fitting the PGM and SMS models are presented in Table 2, along

with the corresponding estimates (without the 95% CIs) from the skew-t and the BVN

models. The CIs for the skewness parameters α1 (for PPD) and α2 (for CAL) do not contain

0, and are positive under both PGM and SMS models, implying substantial evidence of

right-skewness for both responses, also revealed in Figure 2. However, the CIs for α1 for the

skew-t contains 0, and fail to detect skewness for PPD. Posterior estimates of the effect of

HbA1c on both PPD (not substantial data evidence under SMS) and CAL are positive, with

95% CIs excluding zero, implying higher glycemic status may lead to substantially higher

levels of PD. Among smokers, there is a trend (without strong evidence) of higher PPD and

CAL, compared to non-smokers, with significant evidence of higher CAL only from the PGM

model. Also, strong evidence of higher PPD and CAL are observed in males compared to

females, from both SMS and PGM models. However, under the skew-t and BVN models,

gender does not have enough posterior evidence. In addition, there is also not enough evidence

for the effects of age and BMI on PPD and CAL. On the overall, for most parameters, our

new proposals provide more precise posterior estimates, as revealed by tighter 95% CIs. In

terms of sensitivity analysis, we observe that moderate changes in the choice of priors do

not affect the parameter estimates (both magnitude and sign) and the model comparison

measures (lpml and waic) greatly.

16 Biometrics,

To illustrate the practical usefulness of Bayesian inference using the SMS model, we focus

on the difference in median responses (PPD and CAL) for two future patients with same

age, BMI and smoking status, but different gender and HbA1c values. The 95% CI estimates

of difference in median PPD (DPPD) and difference in median CAL (DCAL) between patient1

(Female, low HbA1c) and patient2 (Male, high HbA1c) is (-0.5562, -0.0897) and (-0.8076,

-0.2570), respectively. The actual PPD differences are wider than the difference in the

estimated medians, however the interval is still way above 0. Under the PGM model, the 95%

CI estimates of DPPD and DCAL are (-0.6496, -0.0204) and (-1.0301, -0.2071) respectively,

both wider than those from the SMS model, but well above 0. However, under the skew-t

model, the 95% interval estimates for DPPD and DCAL are (-0.5712, 0.1156) and (-0.8983,

0.0890), respectively, both covering 0 and indicating lack of posterior evidence in support

of the difference in the median PPD and median CAL values between patient1and patient2.

Figures 3(a) and Figure 3(b) present the marginal density histogram of residuals for PPD

and CAL responses, respectively, obtained after fitting the SMS model, while Figure 3 (c)

presents the contour plot of the joint bivariate density of these residuals. Evidently, the

marginal, as well as the joint bivariate density of the residuals for PPD and CAL is skewed,

with one outlying residual corresponding to a 53 year old non-smoker female patient with

low BMI but with extremely high PPD and CAL values.

7 Discussion

Our methods development follows the recent joint EU/USA Periodontal Epidemiology

Working Group (Holtfreter et al., 2015) recommendations on modeling the full-mouth average

PPD and CAL, and is different from prior published work (Bandyopadhyay et al., 2010; Reich

and Bandyopadhyay, 2010) in terms of the modeling objectives and clinical endpoints con-

sidered. Separate regression modeling of full-mouth average PPD and CAL is not uncommon


(Jentsch et al., 2016). However, to cast new light on the biology of PD, this paper breaks

new ground via joint modeling of the two responses, instead of analyzing them separately.

In biomedical research, a semiparametric model is often preferred, because they avoid

restrictive and hard to verify parametric assumptions, specially on model features that

are not of direct interest, such as the hj(.) function in our model. A major strength is

the results on posterior consistency that establishes the theoretical validity of our model.

Also, our inferential framework is substantially different from the estimating equations (EE)

approach of Ma et al. (2005). EE based models and existing parametric “single shock” models

simultaneously adjust for the skewness and the within-cluster association via essentially

assuming |Z1i1|, . . . , |Z1im| in (2) be equal to a single univariate cluster-specific skewing shock

variable |Z∗i |. This assumption of a single skewing shock per cluster is sometimes biologically

untenable, and impedes the interpretation of the parameters and the link between skewness

and association. In particular, for Bayesian inference, the consequence of this assumption is a

major impediment for prior specifications on the parameters α and ρ, based on available prior

opinions about the marginal skewness and the within-cluster association. For our multiple-

shock model, the marginal Pearsonian coefficient of skewness for each εij is only a function

of (αj, hj). As a consequence of having independent skewing shocks for each component,

our model in (2) has comparatively moderate within-cluster association in tails, and is more

applicable when the association is not entirely driven by the tail-association. To accommodate

strong tail-association, a possible extension of the model in (2) is to assume a multivariate

distribution for Z1i, allowing dependence among Z1i1, . . . , Z1im. However, such a model is

less parsimonious than our current model and lacks advantages of ease of prior specifications

and computational scalability via standard software. Furthermore, our methods can also

estimate any desired quantile function, and are more appropriate compared to available

quantile regression techniques focusing on effects of covariates on certain quantiles.

18 Biometrics,

Our class of models can incorporate a large and flexible within-subject association including

Toeplitz, AR(1) and even higher-dimensional structures for the association matrix Rρ. Also,

for our model, the adopted association structure (form of Rρ) does not affect the marginal

densities. In addition, our latent variable representation facilitates the MCMC-based com-

putation, and allows the joint density class to be closed under marginalization. The later

property helps to extend our methods straightforwardly to studies where the dimensions mi

and association matrix Ri of the response vectors vary over subjects.

8 Supplementary Materials

Web appendices and computer codes referenced in Sections 6 are available with this article

at the Biometrics website on Wiley Online library and at the GitHub link:

https://github.com/bandyopd/Multivariate-Skewed-models

Acknowledgements

The authors thank the editor, the associate editor and two referees, for their constructive

comments. They also acknowledge NIH grants R01-DE024984 and R03-CA205018 and the

Gustavus & Louise Pfeiffer Research Foundation grant for their support.

References

Azzalini, A. and Capitanio, A. (2003). Distributions generated by perturbation of symmetry

with emphasis on a multivariate skew-t distribution. Journal of the Royal Statistical

Society: Series B (Statistical Methodology) 65, 367–389.

Azzalini, A. and Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometri-

ka 83, 715–726.

Bandyopadhyay, D., Lachos, V. H., Abanto-Valle, C. A., and Ghosh, P. (2010). Linear

mixed models for skew-normal/independent bivariate responses with an application to

periodontal disease. Statistics in Medicine 29, 2643–2655.

Bandyopadhyay, D., Lachos, V. H., Castro, L. M., and Dey, D. K. (2012). Skew-



normal/independent linear mixed models for censored responses with applications to

Hiv viral loads. Biometrical Journal 54, 405–425.

Chang, S.-C. and Zimmerman, D. L. (2016). Skew-normal antedependence models for skewed

longitudinal data. Biometrika 103, 363–376.

Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2. Wiley,

New York, NY, second edition.

Ferguson, T. (1974). Prior distribution on spaces of probability measures. The Annals of

Statistics 2, 615–629.

Fernandes, J., Wiegand, R., Salinas, C., Grossi, S., Sanders, J., Lopes-Virella, M., and Slate,

E. (2009). Periodontal disease status in Gullah African Americans with type 2 diabetes

living in South Carolina. Journal of Periodontology 80, 1062–1068.

Gelfand, A. E., Dey, D. K., and Chang, H. (1992). Model determination using predictive

distributions with implementation via sampling-based methods. Technical report, DTIC

Document.

Genton, M. G. (2004). Skew-elliptical distributions and Their Applications: A Journey

Beyond Normality. Chapman and Hall/CRC.

Greenstein, G. (1997). Contemporary interpretation of probing depth assessments: diagnostic

and therapeutic implications. A literature review. Journal of Periodontology 68, 1194–

1205.

Holtfreter, B., Albandar, J. M., Dietrich, T., Dye, B. A., Eaton, K. A., Eke, P. I., Papapanou,

P. N., and Kocher, T. (2015). Standards for reporting chronic periodontitis prevalence

and severity in epidemiologic studies: Proposed standards from the Joint EU/USA

Periodontal Epidemiology Working Group. Journal of Clinical Periodontology 42, 407–

412.

Jentsch, H. F., Buchmann, A., Friedrich, A., and Eick, S. (2016). Nonsurgical

20 Biometrics,

therapy of chronic periodontitis with adjunctive systemic azithromycin or amoxi-

cillin/metronidazole. Clinical Oral Investigations 20, 1765–1773.

Johnson-Spruill, I., Hammond, P., Davis, B., McGee, Z., and Louden, D. (2009). Health

of Gullah families in South Carolina with Type-2 diabetes: Diabetes self-management

analysis from Project SuGar. The Diabetes Educator 35, 117–123.

Koenker, R. (2005). Quantile Regression. Number 38 in Econometric Society Monographs.

Cambridge University Press.

Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear

models. Biometrika 73, 13–22.

Ma, Y., Genton, M. G., and Tsiatis, A. A. (2005). Locally efficient semiparametric esti-

mators for generalized skew-elliptical distributions. Journal of the American Statistical

Association 100, 980–989.

Muller, P., Quintana, F. A., Jara, A., and Hanson, T. (2015). Bayesian Nonparametric Data

Analysis. Springer.

Nelsen, R. B. (2007). An Introduction to Copulas. Springer Science & Business Media.

Reich, B. and Bandyopadhyay, D. (2010). A latent factor model for spatial data with

informative missingness. Annals of Applied Statistics 4, 439–459.

Sahu, S. K., Dey, D. K., and Branco, M. D. (2003). A new class of multivariate skew

distributions with applications to Bayesian regression models. Canadian Journal of

Statistics 31, 129–150.

Sethuraman, J. (1994). A Constructive Definition of Dirichlet Priors. Statistica Sinica 4,

639–650.

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using

leave-one-out cross-validation and WAIC. Statistics and Computing 27, 1413–1432.


(a) (b)

Figure 1: Comparing contour plots, when α1 = α2 = 10, ρ = −0.7 and σ1 = σ2 = 1.Bivariate parametric Gaussian mixture (PGM) model with marginal densities (left panel);bivariate version of the multivariate skew-normal (MSN) model with marginal densities (rightpanel). (This figure appears in color in the electronic version of this article).

22 Biometrics,

(a) (b)

(c) (d)

(e) (f)

Figure 2: GAAD data: Histograms of the Periodontal Pocket Depth (PPD) responses (panel a) and ClinicalAttachment Level (CAL) responses (panel b); Histogram (panel c) and Q-Q plot (panel d) of the empirical Bayesianestimates of the random effects; Histograms of residuals for PPD (panel e) and CAL (panel f), obtained after fittinglinear mixed models.


(a)

(b)

(c)

Figure

3:

GA

AD

dat

a:H

isto

gram

sof

resi

dual

sfo

rP

PD

(pan

ela)

and

CA

L(p

anel

b)

resp

onse

s,ob

tain

edaf

ter

fitt

ing

the

SM

Sm

odel

.C

onto

ur

plo

tof

the

biv

aria

teden

sity

ofth

ere

sidual

sar

epre

sente

din

pan

elc

24 Biometrics,

Tab

le1:

Sim

ula

tion

resu

lts

bas

edon

500

replica

tes

ofth

edat

aco

mpar

ing

the

Mea

nSquar

edE

rror

(MSE

),Sta

ndar

dE

rror

(SE

),R

elat

ive

Bia

s(R

B),

and

Cov

erag

eP

robab

ilit

y(C

P)

for

asso

ciat

ion

par

amet

erρ12

=0.

7sa

mple

size

isn

=50

.T

rue

par

amet

erva

lues

are

(β0,β1,β2,α1,α2)

=(2

,0.

5,0.

5,2,

2).

SP

GM

PG

MG

EE

β0

β1

β2

α1

α2

β0

β1

β2

α1

α2

β0

β1

β2

100×

MSE

1.70

20.

376

0.43

216

.22

19.9

951.

461

0.35

90.

397

18.0

714

.88

56.8

970.

479

0.54

510×

SE

0.94

80.

616

0.63

62.

361

3.03

60.

766

0.60

00.

627

2.96

62.

776

0.68

70.

632

0.64

0R

B0.

045

0.00

4-0

.025

-0.0

16-0

.016

0.04

70.

005

-0.0

16-0

.153

0.13

40.

375

0.01

9-0

.009

CP

0.94

50.

950.

975

0.97

51

0.95

0.96

0.96

10.

980.

20.

910.

9


Tab

le2:

Pos

teri

ores

tim

ates

ofth

ere

gres

sion

par

amet

ers

and

the

skew

nes

spar

amet

ers

(α1

andα2),

corr

esp

ondin

gto

the

resp

onse

s‘p

erio

don

tal

pock

etdep

th’

(PP

D)

and

‘clinic

alat

tach

men

tle

vel’

(CA

L),

obta

ined

afte

rfitt

ing

the

Sem

ipar

amet

ric

Mult

ivar

iate

Ske

w(S

MS)

resp

onse

model

,par

amet

ric

Gau

ssia

nm

ixtu

re(P

GM

),par

amet

ric

skew

-t(S

T),

and

biv

aria

tenor

mal

(BV

N)

model

s.SD

and

CI

den

ote

the

pos

teri

orst

andar

ddev

iati

onan

d95

%cr

edib

lein

terv

als,

resp

ecti

vely

.

SMS

PGM

ST

BVN

PPD

Mea

nSD

CI

Mea

nSD

CI

Mea

nSD

Mea

nSD

Intercep

t2.2624

0.3439

(1.6026,2.9764)

2.1887

0.3881

(1.4807,2.9230)

2.0865

0.6557

2.1169

0.6298

Age

-0.0062

0.0037

(-0.0137,0.0011)

-0.0058

0.0043

(-0.0137,0.0032)

-0.0069

0.0076

-0.0098

0.0073

Gen

der

-0.2005

0.0918

(-0.3793,-0.0313)

-0.2096

0.1219

(-0.4301,-0.0448)

-0.1458

0.1861

-0.0811

0.1753

BMI

-0.0903

0.0797

(-0.2302,0.0759)

-0.1378

0.1021

(-0.3769,0.0505)

-0.2259

0.1402

-0.1056

0.1645

Smoker

0.0544

0.0925

(-0.1244,0.2343)

0.0746

0.0987

(-0.1270,0.2534)

0.1663

0.1767

0.1485

0.1696

HbA1c

0.0801

0.0797

(-0.0704,0.2431)

0.1331

0.0634

(0.0156,0.2618)

0.2191

0.1546

0.1701

0.1501

α1

0.5862

0.1848

(0.1631,0.9053)

0.6968

0.1176

(0.4709,0.9378)

0.0687

0.0843

––

CAL

Mea

nSD

CI

Mea

nSD

CI

Mea

nSD

Mea

nSD

Intercep

t1.2180

0.4328

(0.3866,2.0272)

0.9896

0.4851

(-0.0245,1.8539)

1.4133

0.6813

1.1401

0.6485

Age

0.0062

0.0042

(-0.0018,0.0148)

0.0081

0.0057

(-0.0025,0.0198)

0.0

050

0.0078

0.0073

0.0075

Gen

der

-0.3339

0.1059

(-0.5560,-0.1528)

-0.3556

0.1592

(-0.6681,-0.0743)

-0.2505

0.1968

-0.3094

0.1872

BMI

-0.0392

0.0993

(-0.2249,0.1658)

-0.1022

0.1229

(-0.3778,0.1188)

-0.1936

0.1619

-0.1218

0.1725

Smoker

0.1495

0.1441

(-0.0510,0.3444)

0.2546

0.1145

(0.0259,0.4605)

0.2835

0.1865

0.3581

0.1813

HbA1c

0.1668

0.0979

(0.0047,0.3791)

0.2392

0.0864

(0.0708,0.3981)

0.3005

0.1623

0.3143

0.1573

α2

1.0500

0.1617

(0.7552,1.4063)

1.1646

0.1333

(0.9216,1.4275)

0.5294

0.0919

––

Web-based Supplementary Materials for

‘Semiparametric Bayesian latent variable regression for skewed multivariate data’

by

Apurva Bhingare, Debajyoti Sinha, Debdeep Pati, Dipankar Bandyopadhyay, and Stuart R. Lipsitz

Web Appendix A: Restrictive parameterization in the Az-

zalini & Dalla Valle’s multivariate skew-normal model

Unlike our semiparametric multivariate skew (SMS) model (equation 2 in the paper), pa-

rameters (α∗(1),Σ∗(1)) in the marginal multivariate skew-normal (MSN) specification (Azzalini and

Dalla Valle, 1996) of ε(1)i are functions of (α(2),Σ22,Σ12); see Theorem 1 of Chang and Zimmer-

man (2016) for details. In particular, the scalar component εij has the univariate skew-normal

distribution (Azzalini, 1985) denoted by SN(0; αj, σj) with density

f(εij) = 2φ1(εij|σj)Φ1(αjεij), (A-1)

where φ1(.|σ) is the N(0, σ2) density, (Σ)jj = σ2j , and the marginal skewness parameter αj is

αj =αj + (1/σ2

j )ΣT.j(−j)α(−j)

1 + αT(−j)Σ(−j,−j)α(−j) − (1/σ2

j )(Σ.j(−j)α(−j))2, (A-2)

where Σ.j is the j-th column of the covariance matrix Σ, R(−j) denotes the reduced vector after

deleting j-th element of the vector R, and Q(−j,−k) denotes the reduced matrix after deleting j-th

row and k-th column of matrix Q. It is obvious from (A-1)-(A-2) that the marginal distribution

of εij, including its mean, variance and skewness, clearly depends on all the components of vector

α and matrix Σ via αj, which is a function of αk and σk, ∀ k = 1, . . . ,m.

1

To illustrate the difference between the marginal distribution from the bivariate PGM model

(parametric subclass of SMS model) and the marginal distribution from the bivariate MSN model

of Azallini & Dalla Valle, we consider α = (α1, α2) and Σ =

σ21 ρσ1σ2

ρσ1σ2 σ22

, common to both

bivariate models. Under the MSN model, the univariate SN marginal density (Azzalini, 1985) is

SN(0; α1, σ1), with the marginal skewness parameter α1 = {α1 + ρα2σ2/σ1}/{1 + α22σ

22(1 − ρ2)},

with ρ = Corr(εi1, εi2) = σ12/{σ1σ2}, whereas, the marginal distribution of εi1 under the PGM

model (parametric case of our model) is SN(0;α1, σ1). It is obvious that the magnitudes of α1 and

α1 are different, and even the signs of α1 and α1 are different whenever α21σ1 + ρα1α2σ2 < 0. For

Bayesian analysis, in particular, this complicates the prior specification for, say, (α1, σ1) based on

available prior opinion about the marginal skewness α1 and variability of PPD. This model also

imposes a restriction that, even when α1 = 0, εi1 is skewed if α2, ρ 6= 0. In general, this model

imposes several restrictions on within cluster association structure.

Web Appendix B: Contour plots of our bivariate PGM(0, α,Σ)

density and the bivariate MSN density

Figure F1 compares the contour plot of our parametric bivariate PGM(0, α,Σ) density (right

panel) with the contour plot of the parametric bivariate MSN(0, α,Σ) density of Chang and Zim-

merman (2016) (left panel), for 4 choices of the common skewness parameter α0 = α1 = α2 =

0, 2, 5, 10, the common (for two components) scale parameter σ1 = σ2 = 1 and association param-

eter ρ = 0.7. Even when the common α0 is moderately large, the contours of these two bivariate

densities are noticeably different. Both bivariate densities become increasingly non-elliptical as α0

increases. Although, the effect of an increasing α0 on the skewness of the marginal density of the

bivariate MSN is not as profoundly obvious as this effect on the skewness of the marginal density

of our PGM. For the bivariate MSN densities, there is even a moderately negative association in

the left tail in spite of the association parameter ρ being as large as 0.7.

2

(a) α0 = 0 (b) α0 = 0

(c) α0 = 2 (d) α0 = 2

(e) α0 = 5 (f) α0 = 5

(g) α0 = 10 (h) α0 = 10

Figure F1: Contour plots of the bivariate parametric Gaussian mixture (PGM) model density (rightpanel) and of the bivariate skew-normal (MSN) densities (left panel) for association parameter ρ = 0.7,common scale parameter σ1 = σ2 = 1, and different values of common skewness α0 = α1 = α2.

3

Web Appendix C: Posterior Consistency

Lemma 1 FKL ⊂ KL(Π) if G0j is defined on support R+ for all j = 1, . . . ,m and G0j is absolutely

continuous with respect to Lebesgue measure and supp(π3(ρ)) = (0, 1).

Proof of Lemma 1

Given a density f0 ∈ F , the idea is to construct a sequence of functions f (M) ∈ C{(F)m ×

(0, 1)} M ≥ 1 such that KL(f0, f(M)) → 0 as M → ∞. Let e = (e1, . . . , em), Aα =

diag(α1(1 + α2

1)−1/2, . . . , αm(1 + α2m)−1/2

)and A∗α = diag

((1 + α2

1)−1/2, . . . , (1 + α2m)−1/2

). We

assume α1 = α2 = . . . = αm = α0, and α0 is known. Define,

f (M)(e) =

∫Rm+

2m|J |m∏j=1

hjM(zj)hjM(e∗j)

φ1

[Φ−1{HjM(e∗j)}

]φm [Φ−1{H1M(e∗1), . . . ,Φ−1{HmM(e∗m)}| RρM

]dz,

where e∗j = (1 + α20)1/2ej − α0zj for j = 1, . . . ,m, HjM is the cdf of hjM , ρM → ρ as M →∞, and

|J | =

∣∣∣∣∣∣ I 0

A∗−1α −A∗−1

α Aα

∣∣∣∣∣∣ =∏m

j=1 |αj| with the marginal density given by,

fjM(ej) =

∫ ∞0

2(1 + α20)1/2hjM(zj)hjM(e∗j)dzj.

We now construct hjM . Suppose FjM(·) denotes the cdf of fjM(·), and h0j is continuous and

symmetric (around 0) density. Clearly, h0j is increasing R− and decreasing on R+. We define

weights as in Wu and Ghosal (2008). Suppose t1 > 0 and t2 > 0 such that h0j(t1) = a1 and

h0j(t2) = b1, where 0 < b1 < 1 and b1 < a1 < h0j(0). For given M , let M1 and M2 be such that

M1

M≤ t1 ≤ M1+1

Mand M2

M≤ t2 ≤ M2+1

M. Set

w∗ji =

iM{h0j

(iM

)− h0j

(i+1M

)}, 1 ≤ i < M1,

M1

M{h0j

(M1

M

)− a1}, i = M1,

M+1M{a1 − h0j

(M+1M

)}, i = M1 + 1,

iM{h0j

(i−1M

)− h0j

(iM

)}, M1 + 1 < i ≤M2,

iM{h0j

(i−1M

)− h0j

(iM

)}, i ≥M1 + 1

4

We define h∗jM(e) =∑∞

1 w∗jiK(e; iM

), where K(e; θ) = 12θ

1(−θ≤e≤θ). By the continuity of

h0j, h∗jM(e) converges to h0j(e) pointwise. However,

∑∞1 w∗ji 6= 1 and h∗jM(e) is not a pdf. We

define wji = w∗ji1−

∑M1−11 w∗ji−

∑∞M2+1 w

∗ji∑M2

M1w∗ji

for M1 ≤ i ≤ M2. Then,∑∞

1 wji = 1. Let hjM(e) =∑∞1 wjiK(e; i

M), where K(e; θ) = 1

2θ1(−θ≤e≤θ). Observe that

h∗jM(e)− hjM(e) =

(∞∑1

w∗ji −∞∑1

wji

)

=

(1−

∑M1−11 wji −

∑∞M2+1wji∑M2

M1w∗ji

− 1

)(M2∑M1

w∗jiM

2i

)

≤

(1−

∑M1−11 wji −

∑∞M2+1wji∑M2

M1w∗ji

− 1

)(M2∑M1

w∗ji

)M

2M1

=

(1−

M1−1∑1

wji −∞∑

M2+1

wji −M2∑M1

w∗ji

)M

2M1

=

(1− 1

M

∞∑1

h0j(i/M)− a1

M

)M

2M1

→ 0

as M → ∞, by definition of Riemann integral. Thus hjM(e) converges to h0j. Let M be large

such that the RHS of the equation is less than a2. Define fjM(e) = 2

b

∫∞0hjM(zj)hjM(

ej−azjb

)dzj,

where a = α√1+α2 and b = 1√

1+α2 . Hence,

| log fjM(ej)| = | log

(2

b

)+ log

∫ ∞0

hjM(zj)hjM(e∗j)|

≤ | log

(2

b

)|+ | log

∫ ∞0

hjM(zj)hjM(e∗j)|

Since hjM → h0j pointwise, by construction of hjM , we have c1h0j(zj) ≤ hjM(zj) ≤ c2h0j(zj),

and c1h0j(e∗j) ≤ hjM(e∗j) ≤ c2h0j(e

∗j). Thus we have

c1

∫ ∞0

h0j(zj)h0j(e∗j) ≤

∫ ∞0

hjM(zj)hjM(e∗j) ≤ c2

∫ ∞0

h0j(zj)h0j(e∗j).

Hence, we also have | log fjM(ej)| <∞. If f0 ∈ FKL, we have∫f0j| log f0j| <∞ for all j. Since

| log h0j(ej)| is h0-integrable, using dominated convergence theorem, we have∫h0j log

h0jhjM→ 0 as

M →∞. Also, | log fjM(ej)| is bounded above by an integrable function and hence∫f0j log

f0jfjM→

0 as M → ∞. Below we show that f (M) → f0 pointwise, and construct an f0 integrable upper

bound of gM = log f0f (M) . Since, fjM → f0j pointwise for j = 1, . . . ,m, by Scheffe’s theorem

5

supe∈R|FjM(e) − F0j(e)| → 0 as M → ∞. φ and Φ being continuous in their arguments, we

conclude that fM → f0 pointwise. Observe that

|gM | ≤ | log f0|+ | log f (M)|

and

| log f (M)| ≤ | log(2m|J |)|+ (C-1)

| log

∫Rm+

m∏j=1

hjM(zj)hjM(e∗j)1

|ΣM |1/2exp

[−1

2H∗

′

M(e∗)(Σ−1M − I)H∗M(e∗)

]dz|.

Let, IM = 1|ΣM |1/2

exp[−1

2H∗

′M(e∗)(Σ−1

M − I)H∗M(e∗)]. Then,

log(IM) = −1

2log |ΣM | −

1

2H∗

′

M(e∗)(Σ−1M − I)H∗M(e∗)

= C1 −1

2(H∗M(e∗)−H∗0 (e∗))

′(Σ−1

M − I)(H∗M(e∗)−H∗0 (e∗))

− 1

2H∗

′

0 (e∗)(Σ−1M − Σ−1

0 )H∗0 (e∗)− 1

2H∗

′

0 (e∗)(Σ−10 − I)H∗0 (e∗) (C-2)

We have H∗M = (H∗1M , . . . , H∗mM) and H∗0 = (H∗01, . . . , H

∗0m), where H∗jM(e∗j) = Φ−1{HjM(e∗j)} and

H∗0j(e∗j) = Φ−1{H0j(e

∗j)} for j = 1, . . . ,m. Using the Taylor series expansion of H∗jM around H∗0j,

we have for ζ ∈ [0, 1]

H∗jM = H∗0j +HjM (e∗j )−H0j(e

∗j )

φ1(H0j(e∗j ))+

φ′(ζ){HjM (e∗j )−H0j(e∗j )}

φ2(ζ)

Hence, |H∗jM − H∗0j| → 0 uniformly in e∗j , and the first term in (C-4) is 0 as M → ∞. As

ρM → ρ0 as M →∞, the m eigenvalues of Σ−1M converge to the m eigenvalues of Σ−1

0 . Therefore,

12H∗

′0 (e∗)(Σ−1

M − Σ−10 )H∗0 (e∗) ≤ maxj |λ0j − λMj|H∗

′0 H

∗0 . Thus, there exists constants C1, C2 > 0

such that

exp{−(C1 + C2| log(I0)|)} ≤ IM ≤ exp{C1 + C2| log(I0)|}.

We know that hjM(·)→ h0j(·) for all j as M →∞. Hence

| log f (M)(e)| ≤ | log(2m|J |)|+ | log Ψ|, with

∫f0| log f (M)(e)| <∞.

By dominated convergence theorem, we conclude that∫f0 log f0

f (M) → 0 as M → ∞ and FKL ⊂

KL(Π).

6

Theorem 1 Suppose (f0, β0) ∈ FKL × Rp. Consider a prior Π = (Π ⊗ Πβ) on FKL × Rp. Then

Π{(fm, β) ∈ U c|Dn} → 0 a.s. under the true data generating distribution, Pf0,β0 that generates

data Dn.

Proof of Theorem 1

Suppose for any two densities g1 and g2, K(g1, g2) =∫R g1(w) log{g1(w)/g2(w)}dw and V (g1, g2) =∫

R g1(w)[log+{g1(w)/g2(w)}]2dw, where log+(u) = max(log(u), 0). Set Ki(f, β) = K(f0i, fβi) and

Vi(f, β) = V (f0i, fβi). The proof of Theorem 1 follows from (Pati and Dunson, 2014) and (Tang

et al., 2015) with minor changes under the condition that there exist test functions {Φn}∞n=1, sets

Θn =Wε(f0)×Θβn, n ≥ 1, and constants C1, C2, c1, c2, c > 0 such that

1.∑∞

n=1E∏ni=1 f0i

Φn <∞

2. sup(f,β)∈Ucn∩Θn E∏ni=1 fβi

(1− Φn) ≤ C1e−c1n

3. Πβ(Θcβn) ≤ C2e

−c2T 2n

4. For all δ > 0 and for almost every data sequence {yi, xi}∞i=1,

Π{(f, β) : Ki(f, β) < δ∀i,∑∞

i=1Vi(f,β)i2

<∞} > 0

To verify the above conditions, we construct sieves Θn =Wε(f0)×Θβn, where Θβn = {β : ‖β‖ <

Tn} for sequences Tn to be chosen later.

Our claim is: Πβ(Θcβn) ≤ 4p√

2πTne−T

2n/2, logN(ε,Θβn, ‖.‖) ∼ o(n)

Θβn = {β : ‖β‖ < Tn}, then logN(ε,Θβn, ‖.‖) ≤ log(2Tnε

). Since β follows Gaussian distribution,

Πβ(Θcβn) ≤ 4p√

2πTne−T

2n/2. Choosing Tn = O(

√n), Πβ(Θc

βn) can be made exponentially small and

logN(ε,Θβn, ‖.‖) ∼ o(n). Hence condition (3) is satisfied.

Condition 1 & 2:

We write U = W1n ∪ W2n, where W1n = Wε(f0)c × {β : ‖β − β0‖ < δ} and {W2n = (f, β) :

‖β − β0‖ > δ}. First, we prove the existence of exponentially consistent sequence of tests for

H0 : (f, β) = (f0, β0) against H0 : (f, β) ∈ W1n ∩Θn Without loss of generality,

Wε(f0) = {f :

∫Φ(y)f(y)dy −

∫Φ(y)f0(y)dy < ε}, (C-3)

7

where 0 ≤ Φ ≤ 1 and Φ is Lipschitz continuous. Hence there exists M > 0 such that |Φ(y1) −

Φ(y2)| < M∑m

j=1 |y1j − y2j|. Set f0i = f0(yi − xTi β0) and Φi(y) = Φ(yi − xT

i β0). Hence, Ef0iΦi =

Ef0Φ.

EfβiΦi(y) =

∫Φifβi(y)

=

∫Φ(y)f(y −

{xTi β − xT

i β0

})dy

≥∫

Φ(y −{xTi β − xT

i β0

})f(y −

{xTi β − xT

i β0

})dy

−∫|Φ(y)− Φ(y −

{xTi β − xT

i β0

})|f(y −

{xTi β − xT

i β0

})dy

=

∫Φ(y −

{xTi β − xT

i β0

})f(y −

{xTi β − xT

i β0

})dy

−∫|Φ(y)− Φ(y − xT

i {β − β0}) |f(y − xTi {β − β0})dy

≥∫

Φ(y)f(y)dy −M ‖xi‖ ‖β − β0‖

≥∫

Φ(y)f(y)dy −M1 ‖β − β0‖.

Hence, EfβiΦi(y) ≥ Ef0Φi(y)+ε−M1δ for any f ∈ U c and for some constant M1 > 0 depending

on M .We complete the proof by choosing the δ sufficiently small and applying the Proposition 3.1

of Amewou-Atisso et al. (2003).

The proof of the existence of consistent sequence of tests for H0 : (f, β) = (f0, β0) against

H0 : (f, β) ∈ W2n ∩ Θn follows from the Proposition 3.1 of Amewou-Atisso et al. (2003) (when

applied to ‖β − β0‖ > δ).

Condition 4:

Note that V (f0, f(M)) ≤

∫| log f (M) − log f0|2f0 is finite as | log f (M)|2 | log f (M)| are bounded by

integrable functions. Hence condition 4 trivially follows from the Lemma1

8

Web Appendix D: Additional simulation studies

Simulation 2

For this simulation study, we compare the performance of the regression estimates based on our

SMS and PGM models with the regression estimates based on the competing parametric skew-

t (ST) model. Unlike the simulation study-1, we compare these competing methods when the

true error densities do not follow the SMS model assumption of equation (2). For this purpose,

we simulate N = 100 replications of data sets of bivariate responses for sample size n = 200

and another 100 replications for n = 400. For each replication of dataset, we first generate

µij = β0j + β1jxij1 + β2jxij2 for i = 1, . . . , n and j = 1, 2 with true parameters (β11, β21, β12, β22)

= (0.5, 1, 0.5, 1) and random covariates xij1 ∼ Ber(0.5) and xij2 ∼ Half-Normal(0, 1). Then we

simulate the bivariate response Yi = (Yi1, Yi2) with marginal densities Yi1 ∼ N(µi1, 1) and Yi2 ∼

Exp( 1µi2

) using inverse probability transformation while incorporating the within-pair association

via a Gaussian copula with correlation ρ = 0.7. The resulting marginal regression model is

Yij = β0j + xij1β1j + xij2β2j + εij for i = 1, . . . , n and j = 1, 2.

The comparison of three competing methods of estimation are based on the average (based on

N = 100 replications) mean-squared error (MSE), estimated standard error (SE), percentage of

relative bias (RB%), and approximate coverage probability (CP) of the 95% interval estimates of

the regression parameters (β11, β21; β12, β22). The summary of the results is given in Table T1.

For each method, the MSE, SE as well as the RB of the parameter estimates for n = 400

are lower than corresponding measures for the estimates with sample-size n = 50, 200. Overall,

the SMS method yields better CP for each 95% interval-estimate, compared to the parameteric

methods– PGM and ST. This is not surprising because the estimates obtained from methods

using wrong parametric assumptions are expected to be less robust than the estimates obtained

from semiparametric methods. However, it is important to note that even the estimates of the

regression parameters of Y1 from parametric joint models are performing worse than the corre-

sponding estimates from semiparametric joint models, even when the parametric assumptions are

only erroneous for Y2.

Under this simulation setting, the true distributional form of the component Y2 is not from

the SMS class in equations (2)-(4) of the paper. So, we report the summary results only for α1 in

the last rows of Table T1. Under this simulation model, the true value of the skewness parameter

9

Tab

leT

1:F

orsa

mple

size

sn

=20

0an

d40

0,th

esi

mula

tion

resu

lts

toco

mpar

eth

eap

pro

xim

ate

(bas

edon

100

replica

tion

s)M

ean

Squar

edE

rror

(MSE

),av

erag

eSta

ndar

dE

rror

(SE

),p

erce

nta

geof

Rel

ativ

eB

ias

(RB

%),

and

Cov

erag

eP

robab

ilit

y(C

P)

of95

%in

terv

ales

tim

ates

from

thre

eco

mp

etin

gm

ethods:

SM

S,

PG

Man

dST

.T

he

dat

aset

sar

esi

mula

ted

usi

ng

Gau

ssia

nco

pula

wit

has

soci

atio

nρ

12

=0.

7,tr

ue

par

amet

erva

lues

(β11,β

21,β

12,β

22)

=(0

.5,1

,0.5

,1),

and

univ

aria

teG

auss

ian

asth

em

argi

nal

den

sity

ofY

1

and

the

exp

onen

tial

den

sity

asth

em

argi

nal

den

sity

ofY

2.

SM

SP

GM

ST

n=

50M

ean

MS

ES

ER

B%

CP

Mea

nM

SE

SE

RB

%C

PM

ean

MS

ES

ER

B%

CP

β11

0.52

280.

0418

0.22

212.1

0.9

60.5

106

0.0

372

0.2

246

2.1

0.9

80.5

114

0.0

464

0.2

273

2.3

0.9

6β21

1.05

190.

0362

0.19

634.8

0.9

51.0

172

0.0

283

0.1

932

1.7

0.9

81.0

023

0.0

324

0.1

935

2.1

0.9

5β12

0.40

700.

1083

0.30

879.6

0.9

10.4

349

0.0

924

0.3

372

12.6

0.9

40.4

342

0.0

894

0.3

233

13.2

0.9

4β22

0.92

470.

0963

0.28

087.5

0.9

40.8

616

0.0

943

0.2

924

13.8

0.9

30.8

574

0.0

933

0.2

864

14.3

0.9

4α1

0.25

070.

1899

0.51

62–

0.9

70.4

398

0.2

863

0.5

604

–0.7

00.5

029

0.3

932

0.5

322

–0.8

3

n=

200

Mea

nM

SE

SE

RB

%C

PM

ean

MS

ES

ER

B%

CP

Mea

nM

SE

SE

RB

%C

P

β11

0.51

850.

0105

0.10

503.7

0.9

40.5

191

0.0

103

0.1

075

3.8

0.9

70.5

182

0.0

111

0.1

074

3.6

0.9

7β21

1.00

970.

0069

0.08

92<

10.9

61.0

012

0.0

064

0.0

900

<1

0.9

60.9

996

0.0

065

0.0

892

<1

0.9

8β12

0.42

550.

0392

0.15

0515

0.9

10.4

360

0.0

274

0.1

580

13

0.9

70.4

383

0.0

284

0.1

576

13

0.9

1β22

0.88

200.

0393

0.13

1714

0.9

60.8

850

0.0

350

0.1

344

12

0.8

10.8

919

0.0

354

0.1

360

11

0.7

9α1

0.19

210.

1250

0.37

45–

0.9

30.4

111

0.2

119

0.3

699

–0.8

40.4

905

0.3

115

0.3

309

–0.8

4

n=

400

Mea

nM

SE

SE

RB

%C

PM

ean

MS

ES

ER

B%

CP

Mea

nM

SE

SE

RB

%C

P

β11

0.49

900.

0061

0.07

52<

10.9

70.5

001

0.0

059

0.0

755

<1

0.9

60.5

012

0.0

060

0.0

758

<1

0.9

5β21

0.99

450.

0041

0.06

34<

10.9

20.9

919

0.0

052

0.0

635

<1

0.9

10.9

935

0.0

051

0.0

633

<1

0.7

8β12

0.43

280.

0208

0.10

6312

0.9

10.4

385

0.0

186

0.1

132

12

0.9

00.4

398

0.0

176

0.1

118

12

0.9

0β22

0.85

180.

0323

0.09

2414

0.6

50.8

604

0.0

318

0.0

957

14

0.6

50.8

639

0.0

298

0.0

968

14

0.6

6α1

0.19

450.

1156

0.30

15–

0.9

20.4

523

0.2

356

0.2

956

–0.7

00.5

173

0.3

248

0.2

603

–0.6

5

10

α1 corresponding to the component Y1 is 0 (with Gaussian marginal density). Under the SMS

model, the average (based on 100 replications) estimated value of α1 (0.19 for n = 200, 400 and

0.25 for n = 50) is substantially smaller than the average estimates from the methods based on

parametric models PGM and ST. Even the approximate CP (0.93 and 0.92 respectively for n = 200

and n = 400 and 0.97 for n = 50) of the interval estimates of α1 are better than corresponding

CP of interval estimates from parametric methods. Moreover, these coverage probabilities from

parametric methods seem to get substantially worse for increasing the sample size n!

Under this simulation model, the marginal density of the response component Y2 is exponential

with Pearsonian skewness = 2. Under the SMS model, the average estimated value for the Pear-

sonian skewness parameter for the residuals is 1.88, 1.93 and 1.98 respectively for n = 50,n = 200

and n = 400. This suggests that our SMS model captures the true Pearsonian skewness of the

response even when the true marginal density is different from that of the SMS model. This is a

valuable practical advantage for analysis of many real life biomedical studies, when the goal is to

estimate the covariate effects as well as to account for the level of skewness.

Simulation 3

The goal of this small scale simulation study is to compare the performance of the regression

estimates obtained from our SMS and PGM models to the regression estimates obtained from

the generalized estimating equations (GEE), the popular analysis tool for repeated measures. We

want to make this comparison even when the true distribution of the errors is not a member of the

class of SMS models in equations (2)-(4). For this purpose, we simulate N = 100 replications of

datasets of sample-size n = 50, 200 from a bivariate repeated measures model with true marginal

regression model Yij = β0 + xij1β1 + xij2β2 + εij for i = 1, . . . , n and j = 1, 2 (same regression

parameters (β1, β2) for both components Yi1, Yi2). The bivariate joint density of error (εi1, εi2)

is same as in the Simulation 2 with Gaussian marginal density for Y1 and exponential marginal

density for Y2. The summary of the results is given in Table T2.

Given that the estimates of the skewness parameters α = (α1, α2) are not estimable under

GEE, we do not report them in the summary. Also, the intercepts of SMS and PGM models are

not comparable to intercept β0 under GEE. Hence, following Simulation 2, we report the MSE, SE,

RB% and CP summaries (based on 100 replicates) for only the regression parameters β1 and β2.

11

We observe that the GEE produces slightly biased regression estimates compared to the estimates

obtained using SMS and PGM models. Compared to the GEE estimates, the SMS and PGM

models yield smaller MSE (at least 30% reduction), average SE (at least 20% reduction) as well

as smaller RB% even when the sample size is as small as n = 50. Thus, even when the true error

model is not same as the assumptions of the SMS model, Bayesian estimates of the regression

parameters derived from our semiparametric SMS model are comparable to the estimates from

parametric model, and more precise than those obtained via the GEE method.

12

Tab

leT

2:B

ased

on10

0re

plica

tes

ofth

edat

a-se

tsof

sam

ple

sizen

=50,2

00,

the

sim

ula

tion

resu

lts

com

par

ing

the

appro

xim

ate

Mea

nSquar

edE

rror

(MSE

),Sta

ndar

dE

rror

(SE

),R

elat

ive

Bia

s(R

B),

and

Cov

erag

eP

robab

ilit

y(C

P)

ofre

gres

sion

esti

mat

esfr

omth

ree

com

pet

ing

met

hods

(SM

S,

par

amet

ric

PG

Man

dG

EE

).T

he

true

sim

ula

tion

model

ofbiv

aria

tere

pea

ted

mea

sure

sis

Gau

ssia

nco

pula

wit

has

soci

atio

npar

amet

erρ

12

=0.

7,tr

ue

regr

essi

onpar

amet

ers

(β1,β

2)

=(0

.5,1

),an

dw

ith

Gau

ssan

and

exp

onen

tial

mar

ginal

den

siti

es.

SM

SP

GM

GE

E

n=

50M

ean

MS

ES

ER

B%

CP

Mea

nM

SE

SE

RB

%C

PM

ean

MS

ES

ER

B%

CP

β1

0.48

170.

0307

0.18

553.6

0.9

50.4

935

0.0

286

0.1

900

1.4

0.9

40.4

734

0.0

509

0.2

35.3

0.9

4β2

0.96

460.

0229

0.15

943.5

0.9

70.9

898

0.0

190

0.1

616

10.9

60.9

894

0.0

370

0.2

053

1.1

0.9

5

n=

200

Mea

nM

SE

SE

RB

%C

PM

ean

MS

ES

ER

B%

CP

Mea

nM

SE

SE

RB

%C

P

β1

0.50

150.

0096

0.08

89<

10.9

60.5

102

0.0

082

0.0

911

10.9

60.5

177

0.0

139

0.1

148

3.5

0.9

5β2

0.99

120.

0069

0.07

47<

10.9

41.0

045

0.0

061

0.0

761

<1

0.9

41.0

194

0.0

116

0.1

034

1.9

0.9

4

13

Web Appendix E: Computer codes

R and JAGS codes for fitting the SMS and PGM models to simulated data are available in the

GitHub link:


14

References

Amewou-Atisso, M., Ghosal, S., Ghosh, J. K., Ramamoorthi, R., et al. (2003). Posterior consis-

tency for semi-parametric regression problems. Bernoulli 9, 291–312.

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal

of Statistics 12, 171–178.

Azzalini, A. and Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika

83, 715–726.

Chang, S.-C. and Zimmerman, D. L. (2016). Skew-normal antedependence models for skewed

longitudinal data. Biometrika 103, 363–376.

Pati, D. and Dunson, D. B. (2014). Bayesian nonparametric regression with varying residual

density. Annals of the Institute of Statistical Mathematics 66, 1–31.

Tang, Y., Sinha, D., Pati, D., Lipsitz, S., and Lipshultz, S. (2015). Bayesian partial linear model

for skewed longitudinal data. Biostatistics (Oxford, England) 16, 441–453.

Wu, Y. and Ghosal, S. (2008). Kullback Leibler property of kernel mixture priors in Bayesian

density estimation. Electronic Journal of Statistics 2, 298–331.

15

Semiparametric Bayesian latent variable regression for skewed …dbandyop/SkewPaperCombined.pdf ·...

Documents

Transcript of Semiparametric Bayesian latent variable regression for skewed …dbandyop/SkewPaperCombined.pdf ·...