Semiparametric Bayesian latent variable regression for skewed …dbandyop/SkewPaperCombined.pdf ·...
Transcript of Semiparametric Bayesian latent variable regression for skewed …dbandyop/SkewPaperCombined.pdf ·...
Biometrics , 1–20
Semiparametric Bayesian latent variable regression for skewed multivariate
data
Apurva Bhingare 1,∗, Debajyoti Sinha2, Debdeep Pati3, Dipankar Bandyopadhyay4
and Stuart R. Lipsitz5
1Eunice Kennedy Shriver National Institute of Child Health & Human Development, Bethesda, MD
2Department of Statistics, Florida State University, Tallahassee, FL
3Department of Statistics, Texas A & M University, College Station, TX
4Department of Biostatistics, Virginia Commonwealth University, Richmond, VA
5Department of Medicine, Brigham and Women’s Hospital, Boston, MA
*email: [email protected]
Summary: For many real-life studies with skewed multivariate responses, the level of skewness and association
structure assumptions are essential for evaluating the covariate effects on the response and its predictive distribution.
We present a novel semiparametric multivariate model and associated Bayesian analysis for multivariate skewed
responses. Similar to multivariate Gaussian densities, this multivariate model is closed under marginalization, allows
a wide class of multivariate associations, and has meaningful physical interpretations of skewness levels and covariate
effects on the marginal density. Other desirable properties of our model include the Markov Chain Monte Carlo com-
putation through available statistical software, and the assurance of consistent Bayesian estimates of the parameters
and the nonparametric error density under a set of plausible prior assumptions. We illustrate the practical advantages
of our methods over existing alternatives via simulation studies and the analysis of a clinical study on periodontal
disease.
Key words: Dirichlet process; Kernel density; Markov Chain Monte Carlo; Periodontal disease; Skewed error
This paper has been submitted for consideration for publication in Biometrics
Bayesian semiparametric multivariate skewed models 1
1 Introduction
In many biomedical and health-care cost studies, we often come across highly skewed
multivariate responses. For example, the preliminary analysis (see Figure 2) of clinical
data from a periodontal disease (PD) study (Fernandes et al., 2009) of Gullah-speaking
African-Americans diabetics (henceforth, GAAD study) clearly reveals that the two major
(correlated) clinical endpoints of subject-level PD status (Greenstein, 1997)– the mean
(average) periodontal pocket depth (PPD) and the mean clinical attachment level (CAL)
of all available teeth of each subject (cluster) are highly non-Gaussian (skewed), with the
skewness of CAL being different from the skewness of PPD. While the CAL is the accepted
‘gold standard’ endpoint for measuring the progression and history of PD, the PPD measures
the current disease activity. Also, CAL can occur in presence, or absence of PPD, and
vice versa. Hence, a holistic approach towards assessing the impact of a covariate (say,
diabetes status) on PD should consider the effect on the joint distribution of CAL and
PPD. A typical multivariate (normal) regression analysis that ignores non-Gaussianity (say,
skewness of CAL) may lead to biased parameter estimates (Azzalini and Capitanio, 2003),
and the corresponding predictive density. Although analysis may proceed transforming the
multivariate skewed responses to multivariate normal, difficulties remain in choosing a suit-
able transformation, and interpreting the covariate effects on the original non-transformed
responses. As alternatives, the quantile regression (QR) methods (Koenker, 2005) and their
multivariate extensions mostly focus on a single pre-specified quantile, without evaluating
the levels of skewness, and the whole joint density of the PPD and CAL given covariates.
This paper addresses a number of major challenges and limitations of existing model-
based regression analysis of multivariate skewed response data in biostatistical applications.
First, many recently developed model-based approaches (Genton, 2004; Sahu et al., 2003;
Bandyopadhyay et al., 2012) centers around the restricted parametric family pioneered
2 Biometrics,
by Azzalini and Dalla Valle (1996). Our less-restrictive semiparametric proposition with
skewed nonparametric marginal error densities will remain attractive under concerns about
the validity and consequence of the parametric assumptions on the error density. Second,
unlike most existing multivariate skewed models, our elegant model class is closed under
marginalization. Specifically, the distribution of any subset of the response vector is within
the same multivariate distribution class of the original vector, however, the smaller vector’s
density involves only the relevant subset of the parameters. So, unlike existing models, where,
say, the marginal variance and skewness of the CAL depend on the parameters of PPD, the
marginal variance and skewness of CAL in our model do not depend on the parameters
of PPD. This property alleviates the major roadblock for component-wise interpretation
of skewness and covariate effects, and also paves way for specifying priors based solely on
available marginal opinion for each response. Third, the accommodation of a flexible class of
multivariate associations (such as antedependence, toeplitz, or autoregressive structures) via
a covariance matrix continues to remain a major challenge. Only a recent proposal (Chang
and Zimmerman, 2016) accommodates the antependence association, however, at the cost
of a strict condition on the skewness parameters. Our Bayesian semiparametric formulation
allows a broader class of multivariate associations (including the aforementioned structures)
within the response vector, without imposing any restriction on skewness parameters. Fourth,
our proposal is computationally scalable, with easy implementation in freeware like R and
JAGS using available Markov Chain Monte Carlo (MCMC) tools. Finally, we also explore
and present some desirable asymptotic justification (Bayesian posterior consistency) of our
method under practically reasonable prior assumptions.
The rest of the paper proceeds as follows. After introducing our semiparametric model in
Section 2, we present the relevant components of Bayesian inference, such as prior choices,
likelihood specifications, posteriors, and the MCMC implementation routines in Section 3.
Bayesian semiparametric multivariate skewed models 3
In Sections 4 and 5, we assess the posterior consistency and the finite sample properties of
the Bayesian estimates (via simulation studies), respectively. In Section 6, we illustrate our
methods via analyzing the motivating GAAD dataset. Finally, some concluding remarks are
presented in Section 7.
2 Semiparametric Model for Multivariate Skewed Response
Let Yi = (Yi1, . . . , Yim) ∈ Rm be a row vector of skewed responses from subject (cluster)
i = 1, . . . , n. For brevity, we use fixed number of m components per subject, which can be
generalized to accommodate unequal mi. Our regression model is
Yij = µij + εij, for j = 1, . . . ,m, (1)
where µij = xijβj, xij = (1, xij1, . . . , xij,p−1) is the row vector of (p−1) covariates for Yij (with
first term for the intercept), βj is the corresponding (p× 1) vector of regression parameters,
and εi = (εi1, . . . , εim) is the row vector of errors following a m-variate multivariate density
fm(ε), with possibly skewed marginal densities. For the PD study, m = 2 for every subject.
We introduce a new class of semiparametric multivariate skew (SMS) density for fm(ε),
denoted by εi ∼ SMS(0, h, Rρ, α), where Rρ is a (m×m) correlation matrix, h = (h1, . . . , hm)
is a vector of m unknown nonparametric (baseline) symmetric univariate densities, and
α = (α1, . . . , αm) ∈ Rm is a vector of skewness parameters. This SMS(0, h, Rρ, α) density
has the following stochastic representation
εTi = Aα|Z1i|+ A∗αZ2i , (2)
where Z1i = (Z1i1, . . . , Z1im)T and Z2i = (Z2i1, . . . , Z2im)T are two independent (m × 1)
vectors of latent variables with hj(·) being the common marginal density for Z1ij and Z2ij,
Aα and A∗α are diagonal matrices with the jth diagonal elements aj = αj(1 + α2j )−1/2 and
a∗j = (1 + α2j )−1/2 respectively. The univariate error εij in (2) can also be expressed as
εij = ajZ2ij + a∗j |Z1ij|, using the “skewing shock” |Z1ij| to the symmetric random noise
Z2ij with weights a∗j and aj, such that a2j + a∗j2 = 1. Only when the skewness parameters
(α1, . . . , αm) are all equal to zero, the εi of (2) equals Z2i with symmetric marginal densities
4 Biometrics,
hj(·). We assume m “skewing shocks” Z1ij, j = 1, . . . ,m to be independent with marginal
densities h1(·), . . . , hm(·), and we model the association within εi via the m-variate Gaussian
Copula (Nelsen, 2007)
Cm(Z2i | h,Rρ) = φm[Φ−11 {H1(z2i1)}, . . . ,Φ−11 {Hm(z2im)} | Rρ
]×
m∏j=1
hj(z2ij)
φ1
[Φ−11 {Hj(z2ij)}
](3)
for Z2i, where φm(. | M) is the m-variate Gaussian density with mean 0 and variance-
covariance matrix M , Rρ is a correlation matrix with unknown parameter vector ρ, Φ−11 (.)
is the inverse cdf of N(0, 1), and Hj(·) is the cdf of the unknown symmetric density hj(·).
The copula structure of (3) ensures that Z2ij has the nonparametric marginal density hj(·)
(same as Z1ij) for j = 1, . . . ,m. We model hj(·) as a scale mixture of symmetric (around 0)
density kernel K(· | σ), given by
hj(u) =
∫ ∞0
K(u | σ)dGj(σ), (4)
where Gj(·) is the unknown nonparametric mixing distribution of scale σ with support in
R+ = (0,∞). In this paper, we use the popular Gaussian kernel N(0, σ2) as the choice for
kernel K(· | σ) in (4). However, other choices are possible. For example, using the result of
Feller (1971) (p. 158), when the kernel K(· | σ) is Uniform(−σ, σ), the nonparametric hj(u)
in (4) represents the class of all unimodal and symmetric around 0 densities. The semipara-
metric Bayesian inference proceeds via specifying a Dirichlet process prior (Ferguson, 1974;
Muller et al., 2015) DP(G0j, C) on Gj(·), with pre-specified prior mean G0j(·) and precision
C. In Section 3, we explain that this DP(G0j, C) prior process on Gj(·) induces a flexible
prior on the nonparametric density hj(·) in (4).
When Z1i and Z2i in (2) are assumed to have parametric multivariate elliptical densities,
the resulting model is similar to the “multiple skewed shocks” model of Sahu et al. (2003).
However, our semiparametric model with nonparametric marginals has two additional crucial
differences: (a) a new parametrization of weights (Aα, A∗α), and (b) a multivariate copula
Bayesian semiparametric multivariate skewed models 5
modeling of Z2i as in (3). Also, our model is applicable even in studies where no actual
latent skewing shock vectors are clinically interpretable in practice. Later, we demonstrate
that the latent vector representation in (2) provides computational convenience.
Now, a parametric subclass of the semiparametric SMS(0, h, Rρ, α) density of (2)-(3) is
obtained when Z1i and Z2i in (2) have parametric mean-zero multivariate normal densities
Nm(0, D2σ) and Nm(0,Σ) respectively with variance-covariance matrices Σ = DσRρDσ, where
Dσ = diag(σ1, . . . , σm), and Rρ is a positive definite correlation matrix. We call this a
(multivariate) parametric Gaussian mixture (PGM) model, denoted by εi ∼ PGM(0, α,Σ).
Despite similar parametrization based on (α,Σ), our parametric PGM model has key dif-
ferences with the location 0 parametric multivariate skew-normal (MSN) model of Azza-
lini and Dalla Valle (1996), denoted by εi ∼ MSN(0, α,Σ), with the corresponding joint
density fm(εi1, . . . , εim) = 2φm(εi|Σ)Φ1(εTi α). The marginal distribution of each compo-
nent εij for PGM(0, α,Σ) density is the univariate skew-normal (SN) density fj(εij) =
2φ1(εij|σj)Φ1(αjεij) denoted by SN(0, αj, σj) and dependent only on the component-specific
parameters (αj, σj), whereas, for the MSN(0, α,Σ) model of Azzalini and Dalla Valle (1996),
the corresponding univariate marginal density is SN(0, αj, σj), however, with parameter αj
being a function of the entire (α,Σ) (Chang and Zimmerman, 2016). See Appendix A for
details on the expression of αj. In general, similar to the parametric formulation of Sahu
et al. (2003), the model class in (2) is closed under marginalization, because the marginal
density of any subset ε(1)i of εi = (ε
(1)i , ε
(2)i ) is from the same class of semiparametric density
SMS(0, h(1), R11, α(1)) indexed by (h(1), R11, α
(1)), where α = (α(1), α(2)), (h1, . . . , hm) =
(h(1), h(2)) and matrix R =
R11 R12
R12 R22
are partitions of (α, h,R). Unlike the multivariate
model of (2) based on multiple “skewing shocks” |Z1i1|, . . . , |Z1im|, the existing multivariate
models, such as MSN(0, α,Σ) are based on univariate “skewing shock” |Z∗1i| (common to
all m components). As a consequence, these models are closed under marginalization only
6 Biometrics,
in a restricted sense because εi ∼ MSN(0, α,Σ) ⇒ ε(1)i ∼ MSN(0, α∗(1),Σ
∗(1)), where the
parameters (α∗(1),Σ∗(1)) are different from (α(1),Σ11), and in fact, they are even functions
of (α(2),Σ12,Σ22) (see Theorem 1 of Chang and Zimmerman (2016) for details). For the
bivariate PD response in GAAD study, the existing MSN models are essentially based on a
subject-specific univariate skewing shock |Z∗1i| common to both CAL and PPD, compared to
the paired skewing shocks (|Z1i|, |Z2i|) in model (2). Consequently, α1, depends on α1, as well
as (α2, σ2, ρ). This complicates the marginal interpretation and Bayesian prior specification
of α1 for the MSN model.
To further demonstrate the lack of interpretation of α as a skewness parameter within
the MSN model compared to the straightforward interpretation within our model, Figure 1
presents the contour plots and marginal densities of the bivariate MSN (Chang and Zim-
merman, 2016) model (right panel) and our bivariate PGM model (left panel), when both
models have common skewness parameter α1 = α2 = α0 = 10, ρ = −0.7 and σ1 = σ2 = 1.
While Figure 1(a) demonstrate substantial positive skewness of both components from the
bivariate PGM model (as expected from the chosen α0 as high as 10), both the (near elliptical)
contour and marginal density plots in Figure 1(b) indicate approximately symmetric marginal
densities from the MSN model. This illustrates the fact that, depending on the value of ρ and
other parameters of the model, the value (even its sign) of the marginal Pearsonian skewness
of the MSN model may be very different from the value of α0. However, for our multivariate
model in (2), the Pearsonian skewness of each marginal density does not depend on ρ and
scale parameters of other components. Further comparisons of contour plots of the bivariate
PGM with the bivariate SN density for various choices of α0 = 0, 2, 5, 10 and σ1 = σ2 = 1
with ρ = 0.7 are presented in Web Appendix B.
The marginal mean of Yi for our SMS model is E (Yij|Xi) = β∗j0 +∑p−1
l=1 xijlβjl with
β∗j0 = βj0 + {αj/(1 + α2j )
1/2}λj. The covariance matrix of Yi is var (Yi|Xi) = V , where
Bayesian semiparametric multivariate skewed models 7
the diagonal elements Vjj = σ2hj − {α2
j/(1 + α2j )}λ2j , and the off-diagonal elements Vjk =
σhjka∗ja∗k, with weight a∗j = 1/(1 + α2
j )1/2 for all j 6= k ∈ {1, . . . ,m}. Here, the common
expectations E(|Z1ij|) = λj and variances V ar(Z1ij) = V ar(Z2ij) = σ2hj are taken with
respect to the nonparametric marginal density hj(·) of Z1ij and Z2ij, given in (4). The
covariance Cov(Z2ij, Z2ik) = E[Z2ijZ2ik] = σhjk involves the expectation with respect to the
bivariate density of the pair (Z2j, Z2k) based on the copula model of (3). For the GAAD
study, the correlation Corr(εi1, εi2) between PPD and CAL responses within subject/cluster
i is σh12a∗1a∗2/[{σ2
h1 − a21λ21}{σ2h2 − a22λ22}]1/2, where σh12 = E[Z1Z2] is the expectation taken
with respect to the bivariate version of (3) for m = 2.
3 Bayesian Inference: Likelihood, Prior and Posterior
In this section, we develop the semiparametric Bayesian inferential framework for the SMS
model. Given observed data D = {yi, Xi; i = 1, . . . , n} from n independent multivariate
responses, the likelihood function of the parameter space Θ = (β, α, ρ, σ,G) is
L(Θ | D) =n∏i=1
fm(yi −Xiβ), (5)
where fm(εi) is the integral∫Cm(A∗−1α {εi −Aα|Z1i|} | h,Rρ)[
∏mj=1 hj(Z1ij)] dZ1i, taken over
the support Rm of latent vectors Z1i in (2), hj(·) is the independent marginal density of
Z1ij, εi = yi − Xiβ, and Cm(· | h,Rρ) is the copula specification in (3). Clearly, fm(·) of
εi = yi −Xiβ in (2) has a complicated analytical expression. Based on the likelihood in (5),
the joint posterior density is
p(Θ | D) ∝ L(Θ | D)π1(β)π2(α)π3(ρ)π4(G) , (6)
with the following priors.
(i) The priors π1(β) and π2(α) for β and α are independent mean zero multivariate normal
with pre-specified covariance matrices Mβ and Mα, respectively. In particular, we choose
these as σ2I, where σ2 = 100.
(ii) For the GAAD study with bivariate response, the prior π3(ρ) of the scalar parameter ρ in
8 Biometrics,
Rρ (3) is Uniform(−1, 1). However, for a vector ρ, this π3(ρ) prior has to be a multivariate
density with an appropriate support.
(iii) The joint prior process π4(G) of G = (G1, . . . , Gm) is the product of m independent Dirichlet
processes DP(G0j, C), for j = 1, . . . ,m (Ferguson, 1974), where the prior mean G0j of Gj is
specified based on the “prior guess” h0j(u) =∫∞0K(u|σ)dG0j(σ) of the density hj(u) of Z1ij
in (4). The user-specified concentration/precision parameter C is the measure of uncertainty
of Gj around G0j; the larger the value of C, the closer the sample-paths of hj(u) are to the
“prior guess” h0j(u), while smaller values of C allow the sample-path of hj(u) to be very
different from h0j(u).
The posterior in (6) is analytically intractable, hence, we proceed via MCMC sampling
from the joint distribution
p(Θ, Z1 | D) ∝ [n∏i=1
φm(Wi | 0, Rρ)m∏j=1
hj(Z1ij)]π1(β)π2(α)π3(ρ)π4(G) , (7)
where φm(u|0, R) with (m × m) covariance matrix R is a mean-zero multivariate normal
density evaluated at u ∈ Rm, Wi = (Wi1, . . . ,Wim) with Wij = Φ−11 [Hj{(1 + α2j )
1/2(yij −
Xijβ)−αj|Z1ij|}], Φ−11 is the inverse-cdf of the standard normal density, and Hj(·) = Hj(·|Gj)
is the cdf corresponding to the density hj(·|Gj). For the parametric case, hj(·|Gj) and π4(G)
are replaced with the parametric density hj(.|σj) and the joint parametric prior of π4(σ).
For example, when hj(·|σj) is N(0, σ2j ) density with unknown σj, we use the parametric
prior distributions σj ∼ gj(·|γj) independently for j = 1, . . . ,m with pre-specified hyperpa-
rameters γj. For this special case, Wij simplifies to {(1 + α2j )
1/2(yij −Xijβ) − αj|Z1ij|}/σj.
For a semiparametric analysis with a particular symmetric density kernel K(z|σ) for the
nonparametric kernel-mixture densities h = (h1, . . . , hm) in (4), we implement the joint
density in (7) by assuming (a) (Wi1, . . . ,Wim) for i = 1, · · · , n are mean-zero multivariate
Nm(0, Rρ), where Wij = Φ−11 [Kij{(1 + α2j )
1/2(yij −Xijβ)− αj|Z1ij|}], and Kij and K−1ij are
the cdf and inverse-cdf of K(·|σij); (b) Z1ij for j = 1, . . . ,m are independent with density
Bayesian semiparametric multivariate skewed models 9
K(·|σij), and (c) σij for j = 1, . . . ,m are independent with nonparametric cdf Gj following
Dirichlet Process (DP) prior DP(G0j, C). For the Gaussian kernel in (4), Kij(u) = Φ(u/σij)
and Wij further simplifies to Wij = {(1 + α2j )
1/2(yij −Xijβ)− αj|Z1ij|}/σij.
For convenient MCMC implementation, we use the constructive definition of the DP
(Sethuraman, 1994), truncated at a user specified finite number (K0) of components for the
prior on Gj. Suppose (δ1, . . . , δK0) are generated independently from G0j, (B1, . . . , BK0−1)
are generated independently from Beta(1, C), ω1 = B1, ωh = Bk
∏l<k(1 − Bl) for k =
2, . . . ,K0 − 1 and ωK0 = 1 −∑K0−1
l=1 ωl. Then, the sample path of Gj is approximated by
Gj =∑K0
l=1 ωlIδl , when K0 is large. These likelihood steps and priors facilitate easy Bayesian
semiparametric implementation for either the Gaussian or uniform kernels using available
freeware, such as WinBUGS/JAGS.
4 Posterior Consistency
For theoretically valid inference from a complex semiparametric Bayesian model, it is
important to assure that the parameter posteriors become increasingly concentrated around
the true parameter values with increasing sample size. This necessitates the investigation of
the asymptotic properties, along with the finite sample properties. In this section, we provide
sufficient conditions for posterior consistency, the most important asymptotic property. We
present our theory for the case where the dimension of the regression parameter, β, is p.
We first investigate if the support of the prior is large enough to cover all possible relevant
densities. Suppose F denotes the class of all univariate asymmetric unimodal residual den-
sities and C{(F)m × (0, 1)} denotes the collection of all m-dimensional residual densities f0
with unknown correlation ρjk ∈ (0, 1) having the form,
f0(e) =
∫Rm+
2m|J |m∏j=1
h0j(zj)h0j(e∗j)
φ1
[Φ−1{H0j(e∗j)}
]φm [Φ−1{H01(e∗1)}, . . . ,Φ−1{H0m(e∗m)}|Rρ
]dz,
with the j-th marginal density given by f0j(ej) =∫R+
2√
1 + α2jh0j(zj)h0j(e
∗j)dzj j =
1, . . . ,m, where h0j(·) is the true (unknown) symmetric around 0 density, H0j(·) is the
10 Biometrics,
corresponding cumulative distribution function, e∗j =√
1 + α2jej − αzj, and the Jacobian
of transformation |J | is |J | =
∣∣∣∣∣∣∣I 0
A∗−1α −A∗−1α Aα
∣∣∣∣∣∣∣. Let Π denote a prior on C{(F)m × (0, 1)},
such that Π = (πF)m ⊗ π3(ρ), where πF is a prior on F defined by Gj ∼ DP(G0j, C)
for j = 1, . . . ,m, α ∼ π2(α) and π3(ρ) is a prior on (0, 1). For ease of exposition, we
will assume that α is known. To show that posterior consistency holds at the true den-
sity f0, we first define the Kullback-Liebler divergence as KL(f0, f) =∫Rm f0 log(f0/f)
and Kullback-Leibler neighborhood of size ε as κε(f0) = {f : KL(f0, f) < ε}. For any
prior Π∗ on the density space F∗, f0 is said to be in the Kullback-Leibler support of
Π∗ denoted by KL(Π∗) if f0 ∈ S, where S = {f0 : Π∗(f : KL(f0, f) < ε)∀ε > 0}. Define
FKL ={f0 ∈ C{(F)m × (0, 1)},
∫f0| log f0| <∞
}. We characterize the Kullback-Leibler
support of Π in the following lemma, with its proof available in Web Appendix C.
Lemma 1: FKL ⊂ KL(Π) if G0j is defined on support R+ for all j = 1, . . . ,m and G0j
is absolutely continuous with respect to Lebesgue measure and supp(π3(ρ)) = (0, 1).
Suppose Dn is the observed data with sample size n, and β0 and f0 are the true values
of the regression parameter vector and the multivariate error density, respectively. We now
present the sufficient conditions to ensure that as the sample size n increases, the posterior
distributions of the parameter β and the error density f are concentrated around a small
neighborhood around their true values. We are essentially interested in the inference on the
regression parameter β; so, we define a strong L1 neighborhood of radius δ > 0 around
the true value β0 as the set S(β0) = {β : ‖β − β0‖ < δ}, and weak neighborhood around
f0 as Wε(f0) ={f : |
∫ϕf −
∫ϕf0| < ε
}for a bounded continuous function ϕ. Consider
U = Wε(f0) × S(β0) for any arbitrary δ > 0. The following theorem presents the result on
posterior consistency, with the proof given in Web Appendix C.
Theorem 1: Suppose (f0, β0) ∈ FKL×Rp. Consider a prior Π = (Π⊗Πβ) on FKL×Rp.
Bayesian semiparametric multivariate skewed models 11
Then, Π{(fm, β) ∈ U c|Dn} → 0 a.s. under the true data generating distribution Pf0,β0 that
generates data Dn.
5 Simulation Study
We conduct three simulation studies to compare the finite sample properties of our Bayesian
estimates to those obtained from competing models under various data generation schemes.
Here, we present results from Simulation 1, where the data is generated from the model in
(2). Details on Simulations 2 and 3 are presented in Web Appendix D, where the data are
generated under the competing parametric skew-t assumptions.
Simulation 1: We consider N = 500 replications of bivariate responses with sample size
n = 50. The bivariate skewed errors (εi1, εi2) were generated from the bivariate PGM model,
where α = (α1, α2) = (2, 2), Z1i ∼ N2(0, I) and Z2i ∼ N2(0, Rρ), Rρ is a (2 × 2) correlation
matrix with R12 = 0.7. The regression structure is Yij = β0 + xij1β1 + xij2β2 + εij, j =
1, 2 where (β0, β1, β2) = (2, 0.5, 0.5), with xij1 and xij2 sampled independently, such that
xij2 ∼ N(0, 1) and xij1 = 1, or −1, each with probability 0.5. This simulation model has
the same regression parameters (common β) for both components (different from separate
β for two responses in GAAD study). We fit the SMS model, where symmetric h1(·) and
h2(·) are expressed as unknown scale mixtures of Gaussian kernels. We also fit the PGM
model, assuming Z1ij ∼ N(0, σ2j ) and Z2i ∼ Nm(0,Σ), with (Σ)11 = σ2
1, (Σ)22 = σ22 and
(Σ)12 = ρσ1σ2. We use independent N(0, 1) priors for β0, β1 and β2, and also for the skewness
parameters α1 and α2. For the DP(G01, C) and DP(G02, C) priors corresponding to G1 and
G2 respectively, we assume the prior guess G01 and G02 to be a Gamma(2, 1) density and
C = 1, implying the prior guess for εij is mean 0 with ranges ±6, with high prior probability
for a symmetric true density. This also implies the skewing shock has range of (0, 6), with
a high probability. Although these prior guesses for G1 and G2 corresponds to a prior guess
12 Biometrics,
of f(εi) far away from the true density of εi, we would like to demonstrate that even this
guess produces posterior estimates with good finite sample properties. For the PGM, we
specify Gamma(2, 1) for σ21 = σ2
2. To test the performances, we use the mean squared error
MSE(θ) =∑N
k=1(θk − θ)2/N , where θk is the posterior estimate of θ from the kth simulated
dataset, k = 1, 2, . . . , N . We also report the relative bias (RB),∑N
k=1(θk − θ)/(Nθ) and the
coverage probability (CP) of the 95% interval estimates from these competing methods. In
conjunction, we also compared a generalized estimating equation (GEE) fit (Liang and Zeger,
1986), given that GEE is equivalent to a mixed model with a sandwich variance estimate,
although producing a biased marginal estimate of the intercept β∗0j = β0−{αj/(1+α2j )}1/2λj.
The results are summarized in Table 1.
We observe that both the SMS and the PGM models yield smaller MSE’s (at least 20%
reduction), and overall better CPs, compared to the GEE estimates. Unlike the GEE, our
methods provides the estimates of skewness, and more precise interval estimates of the
regression parameters. Also, under model misspecification (data simulated from the PGM
model), the posterior estimates from the SMS model are comparable to the PGM in terms of
MSE and RB. Additional simulation studies (see Simulations 2 and 3 in the Web Appendix D)
that avoid stringent assumptions on the true form of the error densities reveal the superiority
of the SMS and PGM models over various existing alternatives, such as the skew-t, and GEE.
6 Analysis of GAAD Study
The GAAD study aims to evaluate the PD status of the Gullah-speaking African-Americans
as specified by the subject-level (mean) CAL and PPD endpoints, and quantify the ef-
fects of subject-level covariates such as age (in years), gender (1=Female, 0=Male), Body
Mass Index or BMI (obese=1 if BMI >= 30, = 0, otherwise), glycemic/HbA1c status (1=
High/uncontrolled, 0 = controlled) and smoking status (1 = smoker, 0 = never smoker),
on these endpoints. For our analysis, we select n = 288 patients (subjects) with complete
Bayesian semiparametric multivariate skewed models 13
covariate information. About 31% of the subjects are smokers, with a mean age of 55 years,
ranging from 26-87 years. Female subjects are predominant in our data (about 76%), which
is not uncommon in Gullah population (Johnson-Spruill et al., 2009). Also, 68% of subjects
are obese (BMI >= 30), and 59% are with uncontrolled HbA1c level.
Here, the bivariate correlated responses, the mean PPD and mean CAL, calculated as
averages of the corresponding measurements across all sites and tooth for that subject, are
non-Gaussian (heavily skewed). Hence, the validity of estimation under a standard linear
mixed model (LMM) framework remains questionable. For further illustration, Figure 2
[panels (a)-(f)] presents the histograms of the responses, the histogram and Q-Q plot of
the empirical Bayes estimates of the random effects, and the histograms of the residuals,
after fitting LMMs separately to the responses using the nlme package in R. These plots
clearly reveal evidence of different degrees of skewness for the two error terms, and also
for the random subject effects. However, to avoid a overly complicated model, we assume
that the skewness of, say, CAL is the same for all subjects/clusters. To accommodate this,
we illustrate the application of our models developed in Sections 2 and 3 on this dataset.
Specifically, we compare the fit of the following 4 competing models:
(1) The proposed SMS model, with unknown symmetric densities h1(·) and h2(·) as nonpara-
metric scale mixtures of Gaussian kernels,
(2) The PGM model, with latent vectors Z1i ∼ N2(0, D2σ) and Z2i ∼ N2(0,Σ), where D2
σ =
diag(σ21, σ
22) and (Σ)11 = σ2
1, (Σ)22 = σ22, (Σ)12 = ρσ1σ2.
(3) The skew-t model (ST), with a Skew-t (Sahu et al., 2003) density for (εi1, εi2), with ν
degrees of freedom,
(4) The Bivariate Normal (BVN) model, with a symmetric N2(0,Σ) density for (εi1, εi2), ignor-
ing the skewness.
We use practically flat independent N(0, 100) priors for all the regression parameters
14 Biometrics,
(the components of β1 and β2), and the skewness parameters α1 and α2. For the unknown
mixing distributions G1 and G2, we choose independent DP(G0j, C) priors, with C = 1, and
assuming same Gamma(2, 1) density for the prior guess of G01 and G02. These prior guesses
matches with the prior choices for σ1 and σ2 in the PGM model, the skew-t and bivariate
normal models. For the degrees of freedom ν of the skew-t density, we choose Exp(0.1)I[2,∞),
the exponential density truncated at 2. For all four models, we use a Uniform(−1, 1) as
the prior for ρ. It is important to note that our choice of priors, although practically flat,
is primarily for illustration of our statistical methods, and may not represent the prior
opinion of clinical investigators. We generated two parallel MCMC chains of size 150,000 and
computed the posterior estimates after discarding the first 100,000 iterations (burn-in). To
guard against potential autocorrelation among successive iterations, we used a thinning of 25.
To assess model convergence, we use the trace plots, autocorrelation plots and the Gelman-
Rubin R. Compared to the SMS model, the PGM required larger number of iterations to
converge. The relevant R and JAGS codes for implementing these models on simulated data
are available at a GitHub repository (see Web Appendix E).
We use the Conditional Predictive Ordinate (cpo) statistic, (cpo)i =∫fm(yi − Xiβ |
Θ)p(Θ | D(−i))dΘ (Gelfand et al., 1992), to compare the model performances, where D(−i)
is the cross-validated data after deleting the ith observation from the full data D, and
p(Θ | D(−i)) is the posterior density of Θ given D(−i). The corresponding summary statistic
using the cpoi is the log of the pseudo-marginal likelihood (lpml), given by lpml =∑ni=1 log (cpoi). We also use the Watanabe-Akaike information criterion (waic) (Vehtari
et al., 2017), considered a state-of-the-art Bayesian model selection tool. The computations
of the waic and lpml are conveniently based on MCMC samples from the full posterior
distribution p(Θ | D). Models with larger lpml and smaller waic indicate more support for
observed data. These values suggest the PGM model to be the most appropriate (lpml=
Bayesian semiparametric multivariate skewed models 15
–127.882 and waic= 441.89), followed by the SMS model (lpml= –134.03 and waic=
516.88). Recall that the PGM model is the parametric version of the SMS model. The lpml
(waic) values for the skew-t and bivariate normal models are –244.546 (618.94) and –300.315
(654.60) respectively, both substantially smaller (larger) than our new semiparametric and
parametric Bayesian methods. These numbers also suggest that compared to existing para-
metric competitors, our proposal is more appropriate for the GAAD dataset.
The posterior estimates, standard deviations, and 95% credible intervals (CIs) of model
parameters obtained from fitting the PGM and SMS models are presented in Table 2, along
with the corresponding estimates (without the 95% CIs) from the skew-t and the BVN
models. The CIs for the skewness parameters α1 (for PPD) and α2 (for CAL) do not contain
0, and are positive under both PGM and SMS models, implying substantial evidence of
right-skewness for both responses, also revealed in Figure 2. However, the CIs for α1 for the
skew-t contains 0, and fail to detect skewness for PPD. Posterior estimates of the effect of
HbA1c on both PPD (not substantial data evidence under SMS) and CAL are positive, with
95% CIs excluding zero, implying higher glycemic status may lead to substantially higher
levels of PD. Among smokers, there is a trend (without strong evidence) of higher PPD and
CAL, compared to non-smokers, with significant evidence of higher CAL only from the PGM
model. Also, strong evidence of higher PPD and CAL are observed in males compared to
females, from both SMS and PGM models. However, under the skew-t and BVN models,
gender does not have enough posterior evidence. In addition, there is also not enough evidence
for the effects of age and BMI on PPD and CAL. On the overall, for most parameters, our
new proposals provide more precise posterior estimates, as revealed by tighter 95% CIs. In
terms of sensitivity analysis, we observe that moderate changes in the choice of priors do
not affect the parameter estimates (both magnitude and sign) and the model comparison
measures (lpml and waic) greatly.
16 Biometrics,
To illustrate the practical usefulness of Bayesian inference using the SMS model, we focus
on the difference in median responses (PPD and CAL) for two future patients with same
age, BMI and smoking status, but different gender and HbA1c values. The 95% CI estimates
of difference in median PPD (DPPD) and difference in median CAL (DCAL) between patient1
(Female, low HbA1c) and patient2 (Male, high HbA1c) is (-0.5562, -0.0897) and (-0.8076,
-0.2570), respectively. The actual PPD differences are wider than the difference in the
estimated medians, however the interval is still way above 0. Under the PGM model, the 95%
CI estimates of DPPD and DCAL are (-0.6496, -0.0204) and (-1.0301, -0.2071) respectively,
both wider than those from the SMS model, but well above 0. However, under the skew-t
model, the 95% interval estimates for DPPD and DCAL are (-0.5712, 0.1156) and (-0.8983,
0.0890), respectively, both covering 0 and indicating lack of posterior evidence in support
of the difference in the median PPD and median CAL values between patient1and patient2.
Figures 3(a) and Figure 3(b) present the marginal density histogram of residuals for PPD
and CAL responses, respectively, obtained after fitting the SMS model, while Figure 3 (c)
presents the contour plot of the joint bivariate density of these residuals. Evidently, the
marginal, as well as the joint bivariate density of the residuals for PPD and CAL is skewed,
with one outlying residual corresponding to a 53 year old non-smoker female patient with
low BMI but with extremely high PPD and CAL values.
7 Discussion
Our methods development follows the recent joint EU/USA Periodontal Epidemiology
Working Group (Holtfreter et al., 2015) recommendations on modeling the full-mouth average
PPD and CAL, and is different from prior published work (Bandyopadhyay et al., 2010; Reich
and Bandyopadhyay, 2010) in terms of the modeling objectives and clinical endpoints con-
sidered. Separate regression modeling of full-mouth average PPD and CAL is not uncommon
Bayesian semiparametric multivariate skewed models 17
(Jentsch et al., 2016). However, to cast new light on the biology of PD, this paper breaks
new ground via joint modeling of the two responses, instead of analyzing them separately.
In biomedical research, a semiparametric model is often preferred, because they avoid
restrictive and hard to verify parametric assumptions, specially on model features that
are not of direct interest, such as the hj(.) function in our model. A major strength is
the results on posterior consistency that establishes the theoretical validity of our model.
Also, our inferential framework is substantially different from the estimating equations (EE)
approach of Ma et al. (2005). EE based models and existing parametric “single shock” models
simultaneously adjust for the skewness and the within-cluster association via essentially
assuming |Z1i1|, . . . , |Z1im| in (2) be equal to a single univariate cluster-specific skewing shock
variable |Z∗i |. This assumption of a single skewing shock per cluster is sometimes biologically
untenable, and impedes the interpretation of the parameters and the link between skewness
and association. In particular, for Bayesian inference, the consequence of this assumption is a
major impediment for prior specifications on the parameters α and ρ, based on available prior
opinions about the marginal skewness and the within-cluster association. For our multiple-
shock model, the marginal Pearsonian coefficient of skewness for each εij is only a function
of (αj, hj). As a consequence of having independent skewing shocks for each component,
our model in (2) has comparatively moderate within-cluster association in tails, and is more
applicable when the association is not entirely driven by the tail-association. To accommodate
strong tail-association, a possible extension of the model in (2) is to assume a multivariate
distribution for Z1i, allowing dependence among Z1i1, . . . , Z1im. However, such a model is
less parsimonious than our current model and lacks advantages of ease of prior specifications
and computational scalability via standard software. Furthermore, our methods can also
estimate any desired quantile function, and are more appropriate compared to available
quantile regression techniques focusing on effects of covariates on certain quantiles.
18 Biometrics,
Our class of models can incorporate a large and flexible within-subject association including
Toeplitz, AR(1) and even higher-dimensional structures for the association matrix Rρ. Also,
for our model, the adopted association structure (form of Rρ) does not affect the marginal
densities. In addition, our latent variable representation facilitates the MCMC-based com-
putation, and allows the joint density class to be closed under marginalization. The later
property helps to extend our methods straightforwardly to studies where the dimensions mi
and association matrix Ri of the response vectors vary over subjects.
8 Supplementary Materials
Web appendices and computer codes referenced in Sections 6 are available with this article
at the Biometrics website on Wiley Online library and at the GitHub link:
https://github.com/bandyopd/Multivariate-Skewed-models
Acknowledgements
The authors thank the editor, the associate editor and two referees, for their constructive
comments. They also acknowledge NIH grants R01-DE024984 and R03-CA205018 and the
Gustavus & Louise Pfeiffer Research Foundation grant for their support.
References
Azzalini, A. and Capitanio, A. (2003). Distributions generated by perturbation of symmetry
with emphasis on a multivariate skew-t distribution. Journal of the Royal Statistical
Society: Series B (Statistical Methodology) 65, 367–389.
Azzalini, A. and Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometri-
ka 83, 715–726.
Bandyopadhyay, D., Lachos, V. H., Abanto-Valle, C. A., and Ghosh, P. (2010). Linear
mixed models for skew-normal/independent bivariate responses with an application to
periodontal disease. Statistics in Medicine 29, 2643–2655.
Bandyopadhyay, D., Lachos, V. H., Castro, L. M., and Dey, D. K. (2012). Skew-
Bayesian semiparametric multivariate skewed models 19
normal/independent linear mixed models for censored responses with applications to
Hiv viral loads. Biometrical Journal 54, 405–425.
Chang, S.-C. and Zimmerman, D. L. (2016). Skew-normal antedependence models for skewed
longitudinal data. Biometrika 103, 363–376.
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2. Wiley,
New York, NY, second edition.
Ferguson, T. (1974). Prior distribution on spaces of probability measures. The Annals of
Statistics 2, 615–629.
Fernandes, J., Wiegand, R., Salinas, C., Grossi, S., Sanders, J., Lopes-Virella, M., and Slate,
E. (2009). Periodontal disease status in Gullah African Americans with type 2 diabetes
living in South Carolina. Journal of Periodontology 80, 1062–1068.
Gelfand, A. E., Dey, D. K., and Chang, H. (1992). Model determination using predictive
distributions with implementation via sampling-based methods. Technical report, DTIC
Document.
Genton, M. G. (2004). Skew-elliptical distributions and Their Applications: A Journey
Beyond Normality. Chapman and Hall/CRC.
Greenstein, G. (1997). Contemporary interpretation of probing depth assessments: diagnostic
and therapeutic implications. A literature review. Journal of Periodontology 68, 1194–
1205.
Holtfreter, B., Albandar, J. M., Dietrich, T., Dye, B. A., Eaton, K. A., Eke, P. I., Papapanou,
P. N., and Kocher, T. (2015). Standards for reporting chronic periodontitis prevalence
and severity in epidemiologic studies: Proposed standards from the Joint EU/USA
Periodontal Epidemiology Working Group. Journal of Clinical Periodontology 42, 407–
412.
Jentsch, H. F., Buchmann, A., Friedrich, A., and Eick, S. (2016). Nonsurgical
20 Biometrics,
therapy of chronic periodontitis with adjunctive systemic azithromycin or amoxi-
cillin/metronidazole. Clinical Oral Investigations 20, 1765–1773.
Johnson-Spruill, I., Hammond, P., Davis, B., McGee, Z., and Louden, D. (2009). Health
of Gullah families in South Carolina with Type-2 diabetes: Diabetes self-management
analysis from Project SuGar. The Diabetes Educator 35, 117–123.
Koenker, R. (2005). Quantile Regression. Number 38 in Econometric Society Monographs.
Cambridge University Press.
Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear
models. Biometrika 73, 13–22.
Ma, Y., Genton, M. G., and Tsiatis, A. A. (2005). Locally efficient semiparametric esti-
mators for generalized skew-elliptical distributions. Journal of the American Statistical
Association 100, 980–989.
Muller, P., Quintana, F. A., Jara, A., and Hanson, T. (2015). Bayesian Nonparametric Data
Analysis. Springer.
Nelsen, R. B. (2007). An Introduction to Copulas. Springer Science & Business Media.
Reich, B. and Bandyopadhyay, D. (2010). A latent factor model for spatial data with
informative missingness. Annals of Applied Statistics 4, 439–459.
Sahu, S. K., Dey, D. K., and Branco, M. D. (2003). A new class of multivariate skew
distributions with applications to Bayesian regression models. Canadian Journal of
Statistics 31, 129–150.
Sethuraman, J. (1994). A Constructive Definition of Dirichlet Priors. Statistica Sinica 4,
639–650.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using
leave-one-out cross-validation and WAIC. Statistics and Computing 27, 1413–1432.
Bayesian semiparametric multivariate skewed models 21
(a) (b)
Figure 1: Comparing contour plots, when α1 = α2 = 10, ρ = −0.7 and σ1 = σ2 = 1.Bivariate parametric Gaussian mixture (PGM) model with marginal densities (left panel);bivariate version of the multivariate skew-normal (MSN) model with marginal densities (rightpanel). (This figure appears in color in the electronic version of this article).
22 Biometrics,
(a) (b)
(c) (d)
(e) (f)
Figure 2: GAAD data: Histograms of the Periodontal Pocket Depth (PPD) responses (panel a) and ClinicalAttachment Level (CAL) responses (panel b); Histogram (panel c) and Q-Q plot (panel d) of the empirical Bayesianestimates of the random effects; Histograms of residuals for PPD (panel e) and CAL (panel f), obtained after fittinglinear mixed models.
Bayesian semiparametric multivariate skewed models 23
(a)
(b)
(c)
Figure
3:
GA
AD
dat
a:H
isto
gram
sof
resi
dual
sfo
rP
PD
(pan
ela)
and
CA
L(p
anel
b)
resp
onse
s,ob
tain
edaf
ter
fitt
ing
the
SM
Sm
odel
.C
onto
ur
plo
tof
the
biv
aria
teden
sity
ofth
ere
sidual
sar
epre
sente
din
pan
elc
24 Biometrics,
Tab
le1:
Sim
ula
tion
resu
lts
bas
edon
500
replica
tes
ofth
edat
aco
mpar
ing
the
Mea
nSquar
edE
rror
(MSE
),Sta
ndar
dE
rror
(SE
),R
elat
ive
Bia
s(R
B),
and
Cov
erag
eP
robab
ilit
y(C
P)
for
asso
ciat
ion
par
amet
erρ12
=0.
7sa
mple
size
isn
=50
.T
rue
par
amet
erva
lues
are
(β0,β1,β2,α1,α2)
=(2
,0.
5,0.
5,2,
2).
SP
GM
PG
MG
EE
β0
β1
β2
α1
α2
β0
β1
β2
α1
α2
β0
β1
β2
100×
MSE
1.70
20.
376
0.43
216
.22
19.9
951.
461
0.35
90.
397
18.0
714
.88
56.8
970.
479
0.54
510×
SE
0.94
80.
616
0.63
62.
361
3.03
60.
766
0.60
00.
627
2.96
62.
776
0.68
70.
632
0.64
0R
B0.
045
0.00
4-0
.025
-0.0
16-0
.016
0.04
70.
005
-0.0
16-0
.153
0.13
40.
375
0.01
9-0
.009
CP
0.94
50.
950.
975
0.97
51
0.95
0.96
0.96
10.
980.
20.
910.
9
Bayesian semiparametric multivariate skewed models 25
Tab
le2:
Pos
teri
ores
tim
ates
ofth
ere
gres
sion
par
amet
ers
and
the
skew
nes
spar
amet
ers
(α1
andα2),
corr
esp
ondin
gto
the
resp
onse
s‘p
erio
don
tal
pock
etdep
th’
(PP
D)
and
‘clinic
alat
tach
men
tle
vel’
(CA
L),
obta
ined
afte
rfitt
ing
the
Sem
ipar
amet
ric
Mult
ivar
iate
Ske
w(S
MS)
resp
onse
model
,par
amet
ric
Gau
ssia
nm
ixtu
re(P
GM
),par
amet
ric
skew
-t(S
T),
and
biv
aria
tenor
mal
(BV
N)
model
s.SD
and
CI
den
ote
the
pos
teri
orst
andar
ddev
iati
onan
d95
%cr
edib
lein
terv
als,
resp
ecti
vely
.
SMS
PGM
ST
BVN
PPD
Mea
nSD
CI
Mea
nSD
CI
Mea
nSD
Mea
nSD
Intercep
t2.2624
0.3439
(1.6026,2.9764)
2.1887
0.3881
(1.4807,2.9230)
2.0865
0.6557
2.1169
0.6298
Age
-0.0062
0.0037
(-0.0137,0.0011)
-0.0058
0.0043
(-0.0137,0.0032)
-0.0069
0.0076
-0.0098
0.0073
Gen
der
-0.2005
0.0918
(-0.3793,-0.0313)
-0.2096
0.1219
(-0.4301,-0.0448)
-0.1458
0.1861
-0.0811
0.1753
BMI
-0.0903
0.0797
(-0.2302,0.0759)
-0.1378
0.1021
(-0.3769,0.0505)
-0.2259
0.1402
-0.1056
0.1645
Smoker
0.0544
0.0925
(-0.1244,0.2343)
0.0746
0.0987
(-0.1270,0.2534)
0.1663
0.1767
0.1485
0.1696
HbA1c
0.0801
0.0797
(-0.0704,0.2431)
0.1331
0.0634
(0.0156,0.2618)
0.2191
0.1546
0.1701
0.1501
α1
0.5862
0.1848
(0.1631,0.9053)
0.6968
0.1176
(0.4709,0.9378)
0.0687
0.0843
––
CAL
Mea
nSD
CI
Mea
nSD
CI
Mea
nSD
Mea
nSD
Intercep
t1.2180
0.4328
(0.3866,2.0272)
0.9896
0.4851
(-0.0245,1.8539)
1.4133
0.6813
1.1401
0.6485
Age
0.0062
0.0042
(-0.0018,0.0148)
0.0081
0.0057
(-0.0025,0.0198)
0.0
050
0.0078
0.0073
0.0075
Gen
der
-0.3339
0.1059
(-0.5560,-0.1528)
-0.3556
0.1592
(-0.6681,-0.0743)
-0.2505
0.1968
-0.3094
0.1872
BMI
-0.0392
0.0993
(-0.2249,0.1658)
-0.1022
0.1229
(-0.3778,0.1188)
-0.1936
0.1619
-0.1218
0.1725
Smoker
0.1495
0.1441
(-0.0510,0.3444)
0.2546
0.1145
(0.0259,0.4605)
0.2835
0.1865
0.3581
0.1813
HbA1c
0.1668
0.0979
(0.0047,0.3791)
0.2392
0.0864
(0.0708,0.3981)
0.3005
0.1623
0.3143
0.1573
α2
1.0500
0.1617
(0.7552,1.4063)
1.1646
0.1333
(0.9216,1.4275)
0.5294
0.0919
––
Web-based Supplementary Materials for
‘Semiparametric Bayesian latent variable regression for skewed multivariate data’
by
Apurva Bhingare, Debajyoti Sinha, Debdeep Pati, Dipankar Bandyopadhyay, and Stuart R. Lipsitz
Web Appendix A: Restrictive parameterization in the Az-
zalini & Dalla Valle’s multivariate skew-normal model
Unlike our semiparametric multivariate skew (SMS) model (equation 2 in the paper), pa-
rameters (α∗(1),Σ∗(1)) in the marginal multivariate skew-normal (MSN) specification (Azzalini and
Dalla Valle, 1996) of ε(1)i are functions of (α(2),Σ22,Σ12); see Theorem 1 of Chang and Zimmer-
man (2016) for details. In particular, the scalar component εij has the univariate skew-normal
distribution (Azzalini, 1985) denoted by SN(0; αj, σj) with density
f(εij) = 2φ1(εij|σj)Φ1(αjεij), (A-1)
where φ1(.|σ) is the N(0, σ2) density, (Σ)jj = σ2j , and the marginal skewness parameter αj is
αj =αj + (1/σ2
j )ΣT.j(−j)α(−j)
1 + αT(−j)Σ(−j,−j)α(−j) − (1/σ2
j )(Σ.j(−j)α(−j))2, (A-2)
where Σ.j is the j-th column of the covariance matrix Σ, R(−j) denotes the reduced vector after
deleting j-th element of the vector R, and Q(−j,−k) denotes the reduced matrix after deleting j-th
row and k-th column of matrix Q. It is obvious from (A-1)-(A-2) that the marginal distribution
of εij, including its mean, variance and skewness, clearly depends on all the components of vector
α and matrix Σ via αj, which is a function of αk and σk, ∀ k = 1, . . . ,m.
1
To illustrate the difference between the marginal distribution from the bivariate PGM model
(parametric subclass of SMS model) and the marginal distribution from the bivariate MSN model
of Azallini & Dalla Valle, we consider α = (α1, α2) and Σ =
σ21 ρσ1σ2
ρσ1σ2 σ22
, common to both
bivariate models. Under the MSN model, the univariate SN marginal density (Azzalini, 1985) is
SN(0; α1, σ1), with the marginal skewness parameter α1 = {α1 + ρα2σ2/σ1}/{1 + α22σ
22(1 − ρ2)},
with ρ = Corr(εi1, εi2) = σ12/{σ1σ2}, whereas, the marginal distribution of εi1 under the PGM
model (parametric case of our model) is SN(0;α1, σ1). It is obvious that the magnitudes of α1 and
α1 are different, and even the signs of α1 and α1 are different whenever α21σ1 + ρα1α2σ2 < 0. For
Bayesian analysis, in particular, this complicates the prior specification for, say, (α1, σ1) based on
available prior opinion about the marginal skewness α1 and variability of PPD. This model also
imposes a restriction that, even when α1 = 0, εi1 is skewed if α2, ρ 6= 0. In general, this model
imposes several restrictions on within cluster association structure.
Web Appendix B: Contour plots of our bivariate PGM(0, α,Σ)
density and the bivariate MSN density
Figure F1 compares the contour plot of our parametric bivariate PGM(0, α,Σ) density (right
panel) with the contour plot of the parametric bivariate MSN(0, α,Σ) density of Chang and Zim-
merman (2016) (left panel), for 4 choices of the common skewness parameter α0 = α1 = α2 =
0, 2, 5, 10, the common (for two components) scale parameter σ1 = σ2 = 1 and association param-
eter ρ = 0.7. Even when the common α0 is moderately large, the contours of these two bivariate
densities are noticeably different. Both bivariate densities become increasingly non-elliptical as α0
increases. Although, the effect of an increasing α0 on the skewness of the marginal density of the
bivariate MSN is not as profoundly obvious as this effect on the skewness of the marginal density
of our PGM. For the bivariate MSN densities, there is even a moderately negative association in
the left tail in spite of the association parameter ρ being as large as 0.7.
2
(a) α0 = 0 (b) α0 = 0
(c) α0 = 2 (d) α0 = 2
(e) α0 = 5 (f) α0 = 5
(g) α0 = 10 (h) α0 = 10
Figure F1: Contour plots of the bivariate parametric Gaussian mixture (PGM) model density (rightpanel) and of the bivariate skew-normal (MSN) densities (left panel) for association parameter ρ = 0.7,common scale parameter σ1 = σ2 = 1, and different values of common skewness α0 = α1 = α2.
3
Web Appendix C: Posterior Consistency
Lemma 1 FKL ⊂ KL(Π) if G0j is defined on support R+ for all j = 1, . . . ,m and G0j is absolutely
continuous with respect to Lebesgue measure and supp(π3(ρ)) = (0, 1).
Proof of Lemma 1
Given a density f0 ∈ F , the idea is to construct a sequence of functions f (M) ∈ C{(F)m ×
(0, 1)} M ≥ 1 such that KL(f0, f(M)) → 0 as M → ∞. Let e = (e1, . . . , em), Aα =
diag(α1(1 + α2
1)−1/2, . . . , αm(1 + α2m)−1/2
)and A∗α = diag
((1 + α2
1)−1/2, . . . , (1 + α2m)−1/2
). We
assume α1 = α2 = . . . = αm = α0, and α0 is known. Define,
f (M)(e) =
∫Rm+
2m|J |m∏j=1
hjM(zj)hjM(e∗j)
φ1
[Φ−1{HjM(e∗j)}
]φm [Φ−1{H1M(e∗1), . . . ,Φ−1{HmM(e∗m)}| RρM
]dz,
where e∗j = (1 + α20)1/2ej − α0zj for j = 1, . . . ,m, HjM is the cdf of hjM , ρM → ρ as M →∞, and
|J | =
∣∣∣∣∣∣ I 0
A∗−1α −A∗−1
α Aα
∣∣∣∣∣∣ =∏m
j=1 |αj| with the marginal density given by,
fjM(ej) =
∫ ∞0
2(1 + α20)1/2hjM(zj)hjM(e∗j)dzj.
We now construct hjM . Suppose FjM(·) denotes the cdf of fjM(·), and h0j is continuous and
symmetric (around 0) density. Clearly, h0j is increasing R− and decreasing on R+. We define
weights as in Wu and Ghosal (2008). Suppose t1 > 0 and t2 > 0 such that h0j(t1) = a1 and
h0j(t2) = b1, where 0 < b1 < 1 and b1 < a1 < h0j(0). For given M , let M1 and M2 be such that
M1
M≤ t1 ≤ M1+1
Mand M2
M≤ t2 ≤ M2+1
M. Set
w∗ji =
iM{h0j
(iM
)− h0j
(i+1M
)}, 1 ≤ i < M1,
M1
M{h0j
(M1
M
)− a1}, i = M1,
M+1M{a1 − h0j
(M+1M
)}, i = M1 + 1,
iM{h0j
(i−1M
)− h0j
(iM
)}, M1 + 1 < i ≤M2,
iM{h0j
(i−1M
)− h0j
(iM
)}, i ≥M1 + 1
4
We define h∗jM(e) =∑∞
1 w∗jiK(e; iM
), where K(e; θ) = 12θ
1(−θ≤e≤θ). By the continuity of
h0j, h∗jM(e) converges to h0j(e) pointwise. However,
∑∞1 w∗ji 6= 1 and h∗jM(e) is not a pdf. We
define wji = w∗ji1−
∑M1−11 w∗ji−
∑∞M2+1 w
∗ji∑M2
M1w∗ji
for M1 ≤ i ≤ M2. Then,∑∞
1 wji = 1. Let hjM(e) =∑∞1 wjiK(e; i
M), where K(e; θ) = 1
2θ1(−θ≤e≤θ). Observe that
h∗jM(e)− hjM(e) =
(∞∑1
w∗ji −∞∑1
wji
)
=
(1−
∑M1−11 wji −
∑∞M2+1wji∑M2
M1w∗ji
− 1
)(M2∑M1
w∗jiM
2i
)
≤
(1−
∑M1−11 wji −
∑∞M2+1wji∑M2
M1w∗ji
− 1
)(M2∑M1
w∗ji
)M
2M1
=
(1−
M1−1∑1
wji −∞∑
M2+1
wji −M2∑M1
w∗ji
)M
2M1
=
(1− 1
M
∞∑1
h0j(i/M)− a1
M
)M
2M1
→ 0
as M → ∞, by definition of Riemann integral. Thus hjM(e) converges to h0j. Let M be large
such that the RHS of the equation is less than a2. Define fjM(e) = 2
b
∫∞0hjM(zj)hjM(
ej−azjb
)dzj,
where a = α√1+α2 and b = 1√
1+α2 . Hence,
| log fjM(ej)| = | log
(2
b
)+ log
∫ ∞0
hjM(zj)hjM(e∗j)|
≤ | log
(2
b
)|+ | log
∫ ∞0
hjM(zj)hjM(e∗j)|
Since hjM → h0j pointwise, by construction of hjM , we have c1h0j(zj) ≤ hjM(zj) ≤ c2h0j(zj),
and c1h0j(e∗j) ≤ hjM(e∗j) ≤ c2h0j(e
∗j). Thus we have
c1
∫ ∞0
h0j(zj)h0j(e∗j) ≤
∫ ∞0
hjM(zj)hjM(e∗j) ≤ c2
∫ ∞0
h0j(zj)h0j(e∗j).
Hence, we also have | log fjM(ej)| <∞. If f0 ∈ FKL, we have∫f0j| log f0j| <∞ for all j. Since
| log h0j(ej)| is h0-integrable, using dominated convergence theorem, we have∫h0j log
h0jhjM→ 0 as
M →∞. Also, | log fjM(ej)| is bounded above by an integrable function and hence∫f0j log
f0jfjM→
0 as M → ∞. Below we show that f (M) → f0 pointwise, and construct an f0 integrable upper
bound of gM = log f0f (M) . Since, fjM → f0j pointwise for j = 1, . . . ,m, by Scheffe’s theorem
5
supe∈R|FjM(e) − F0j(e)| → 0 as M → ∞. φ and Φ being continuous in their arguments, we
conclude that fM → f0 pointwise. Observe that
|gM | ≤ | log f0|+ | log f (M)|
and
| log f (M)| ≤ | log(2m|J |)|+ (C-1)
| log
∫Rm+
m∏j=1
hjM(zj)hjM(e∗j)1
|ΣM |1/2exp
[−1
2H∗
′
M(e∗)(Σ−1M − I)H∗M(e∗)
]dz|.
Let, IM = 1|ΣM |1/2
exp[−1
2H∗
′M(e∗)(Σ−1
M − I)H∗M(e∗)]. Then,
log(IM) = −1
2log |ΣM | −
1
2H∗
′
M(e∗)(Σ−1M − I)H∗M(e∗)
= C1 −1
2(H∗M(e∗)−H∗0 (e∗))
′(Σ−1
M − I)(H∗M(e∗)−H∗0 (e∗))
− 1
2H∗
′
0 (e∗)(Σ−1M − Σ−1
0 )H∗0 (e∗)− 1
2H∗
′
0 (e∗)(Σ−10 − I)H∗0 (e∗) (C-2)
We have H∗M = (H∗1M , . . . , H∗mM) and H∗0 = (H∗01, . . . , H
∗0m), where H∗jM(e∗j) = Φ−1{HjM(e∗j)} and
H∗0j(e∗j) = Φ−1{H0j(e
∗j)} for j = 1, . . . ,m. Using the Taylor series expansion of H∗jM around H∗0j,
we have for ζ ∈ [0, 1]
H∗jM = H∗0j +HjM (e∗j )−H0j(e
∗j )
φ1(H0j(e∗j ))+
φ′(ζ){HjM (e∗j )−H0j(e∗j )}
φ2(ζ)
Hence, |H∗jM − H∗0j| → 0 uniformly in e∗j , and the first term in (C-4) is 0 as M → ∞. As
ρM → ρ0 as M →∞, the m eigenvalues of Σ−1M converge to the m eigenvalues of Σ−1
0 . Therefore,
12H∗
′0 (e∗)(Σ−1
M − Σ−10 )H∗0 (e∗) ≤ maxj |λ0j − λMj|H∗
′0 H
∗0 . Thus, there exists constants C1, C2 > 0
such that
exp{−(C1 + C2| log(I0)|)} ≤ IM ≤ exp{C1 + C2| log(I0)|}.
We know that hjM(·)→ h0j(·) for all j as M →∞. Hence
| log f (M)(e)| ≤ | log(2m|J |)|+ | log Ψ|, with
∫f0| log f (M)(e)| <∞.
By dominated convergence theorem, we conclude that∫f0 log f0
f (M) → 0 as M → ∞ and FKL ⊂
KL(Π).
6
Theorem 1 Suppose (f0, β0) ∈ FKL × Rp. Consider a prior Π = (Π ⊗ Πβ) on FKL × Rp. Then
Π{(fm, β) ∈ U c|Dn} → 0 a.s. under the true data generating distribution, Pf0,β0 that generates
data Dn.
Proof of Theorem 1
Suppose for any two densities g1 and g2, K(g1, g2) =∫R g1(w) log{g1(w)/g2(w)}dw and V (g1, g2) =∫
R g1(w)[log+{g1(w)/g2(w)}]2dw, where log+(u) = max(log(u), 0). Set Ki(f, β) = K(f0i, fβi) and
Vi(f, β) = V (f0i, fβi). The proof of Theorem 1 follows from (Pati and Dunson, 2014) and (Tang
et al., 2015) with minor changes under the condition that there exist test functions {Φn}∞n=1, sets
Θn =Wε(f0)×Θβn, n ≥ 1, and constants C1, C2, c1, c2, c > 0 such that
1.∑∞
n=1E∏ni=1 f0i
Φn <∞
2. sup(f,β)∈Ucn∩Θn E∏ni=1 fβi
(1− Φn) ≤ C1e−c1n
3. Πβ(Θcβn) ≤ C2e
−c2T 2n
4. For all δ > 0 and for almost every data sequence {yi, xi}∞i=1,
Π{(f, β) : Ki(f, β) < δ∀i,∑∞
i=1Vi(f,β)i2
<∞} > 0
To verify the above conditions, we construct sieves Θn =Wε(f0)×Θβn, where Θβn = {β : ‖β‖ <
Tn} for sequences Tn to be chosen later.
Our claim is: Πβ(Θcβn) ≤ 4p√
2πTne−T
2n/2, logN(ε,Θβn, ‖.‖) ∼ o(n)
Θβn = {β : ‖β‖ < Tn}, then logN(ε,Θβn, ‖.‖) ≤ log(2Tnε
). Since β follows Gaussian distribution,
Πβ(Θcβn) ≤ 4p√
2πTne−T
2n/2. Choosing Tn = O(
√n), Πβ(Θc
βn) can be made exponentially small and
logN(ε,Θβn, ‖.‖) ∼ o(n). Hence condition (3) is satisfied.
Condition 1 & 2:
We write U = W1n ∪ W2n, where W1n = Wε(f0)c × {β : ‖β − β0‖ < δ} and {W2n = (f, β) :
‖β − β0‖ > δ}. First, we prove the existence of exponentially consistent sequence of tests for
H0 : (f, β) = (f0, β0) against H0 : (f, β) ∈ W1n ∩Θn Without loss of generality,
Wε(f0) = {f :
∫Φ(y)f(y)dy −
∫Φ(y)f0(y)dy < ε}, (C-3)
7
where 0 ≤ Φ ≤ 1 and Φ is Lipschitz continuous. Hence there exists M > 0 such that |Φ(y1) −
Φ(y2)| < M∑m
j=1 |y1j − y2j|. Set f0i = f0(yi − xTi β0) and Φi(y) = Φ(yi − xT
i β0). Hence, Ef0iΦi =
Ef0Φ.
EfβiΦi(y) =
∫Φifβi(y)
=
∫Φ(y)f(y −
{xTi β − xT
i β0
})dy
≥∫
Φ(y −{xTi β − xT
i β0
})f(y −
{xTi β − xT
i β0
})dy
−∫|Φ(y)− Φ(y −
{xTi β − xT
i β0
})|f(y −
{xTi β − xT
i β0
})dy
=
∫Φ(y −
{xTi β − xT
i β0
})f(y −
{xTi β − xT
i β0
})dy
−∫|Φ(y)− Φ(y − xT
i {β − β0}) |f(y − xTi {β − β0})dy
≥∫
Φ(y)f(y)dy −M ‖xi‖ ‖β − β0‖
≥∫
Φ(y)f(y)dy −M1 ‖β − β0‖.
Hence, EfβiΦi(y) ≥ Ef0Φi(y)+ε−M1δ for any f ∈ U c and for some constant M1 > 0 depending
on M .We complete the proof by choosing the δ sufficiently small and applying the Proposition 3.1
of Amewou-Atisso et al. (2003).
The proof of the existence of consistent sequence of tests for H0 : (f, β) = (f0, β0) against
H0 : (f, β) ∈ W2n ∩ Θn follows from the Proposition 3.1 of Amewou-Atisso et al. (2003) (when
applied to ‖β − β0‖ > δ).
Condition 4:
Note that V (f0, f(M)) ≤
∫| log f (M) − log f0|2f0 is finite as | log f (M)|2 | log f (M)| are bounded by
integrable functions. Hence condition 4 trivially follows from the Lemma1
8
Web Appendix D: Additional simulation studies
Simulation 2
For this simulation study, we compare the performance of the regression estimates based on our
SMS and PGM models with the regression estimates based on the competing parametric skew-
t (ST) model. Unlike the simulation study-1, we compare these competing methods when the
true error densities do not follow the SMS model assumption of equation (2). For this purpose,
we simulate N = 100 replications of data sets of bivariate responses for sample size n = 200
and another 100 replications for n = 400. For each replication of dataset, we first generate
µij = β0j + β1jxij1 + β2jxij2 for i = 1, . . . , n and j = 1, 2 with true parameters (β11, β21, β12, β22)
= (0.5, 1, 0.5, 1) and random covariates xij1 ∼ Ber(0.5) and xij2 ∼ Half-Normal(0, 1). Then we
simulate the bivariate response Yi = (Yi1, Yi2) with marginal densities Yi1 ∼ N(µi1, 1) and Yi2 ∼
Exp( 1µi2
) using inverse probability transformation while incorporating the within-pair association
via a Gaussian copula with correlation ρ = 0.7. The resulting marginal regression model is
Yij = β0j + xij1β1j + xij2β2j + εij for i = 1, . . . , n and j = 1, 2.
The comparison of three competing methods of estimation are based on the average (based on
N = 100 replications) mean-squared error (MSE), estimated standard error (SE), percentage of
relative bias (RB%), and approximate coverage probability (CP) of the 95% interval estimates of
the regression parameters (β11, β21; β12, β22). The summary of the results is given in Table T1.
For each method, the MSE, SE as well as the RB of the parameter estimates for n = 400
are lower than corresponding measures for the estimates with sample-size n = 50, 200. Overall,
the SMS method yields better CP for each 95% interval-estimate, compared to the parameteric
methods– PGM and ST. This is not surprising because the estimates obtained from methods
using wrong parametric assumptions are expected to be less robust than the estimates obtained
from semiparametric methods. However, it is important to note that even the estimates of the
regression parameters of Y1 from parametric joint models are performing worse than the corre-
sponding estimates from semiparametric joint models, even when the parametric assumptions are
only erroneous for Y2.
Under this simulation setting, the true distributional form of the component Y2 is not from
the SMS class in equations (2)-(4) of the paper. So, we report the summary results only for α1 in
the last rows of Table T1. Under this simulation model, the true value of the skewness parameter
9
Tab
leT
1:F
orsa
mple
size
sn
=20
0an
d40
0,th
esi
mula
tion
resu
lts
toco
mpar
eth
eap
pro
xim
ate
(bas
edon
100
replica
tion
s)M
ean
Squar
edE
rror
(MSE
),av
erag
eSta
ndar
dE
rror
(SE
),p
erce
nta
geof
Rel
ativ
eB
ias
(RB
%),
and
Cov
erag
eP
robab
ilit
y(C
P)
of95
%in
terv
ales
tim
ates
from
thre
eco
mp
etin
gm
ethods:
SM
S,
PG
Man
dST
.T
he
dat
aset
sar
esi
mula
ted
usi
ng
Gau
ssia
nco
pula
wit
has
soci
atio
nρ
12
=0.
7,tr
ue
par
amet
erva
lues
(β11,β
21,β
12,β
22)
=(0
.5,1
,0.5
,1),
and
univ
aria
teG
auss
ian
asth
em
argi
nal
den
sity
ofY
1
and
the
exp
onen
tial
den
sity
asth
em
argi
nal
den
sity
ofY
2.
SM
SP
GM
ST
n=
50M
ean
MS
ES
ER
B%
CP
Mea
nM
SE
SE
RB
%C
PM
ean
MS
ES
ER
B%
CP
β11
0.52
280.
0418
0.22
212.1
0.9
60.5
106
0.0
372
0.2
246
2.1
0.9
80.5
114
0.0
464
0.2
273
2.3
0.9
6β21
1.05
190.
0362
0.19
634.8
0.9
51.0
172
0.0
283
0.1
932
1.7
0.9
81.0
023
0.0
324
0.1
935
2.1
0.9
5β12
0.40
700.
1083
0.30
879.6
0.9
10.4
349
0.0
924
0.3
372
12.6
0.9
40.4
342
0.0
894
0.3
233
13.2
0.9
4β22
0.92
470.
0963
0.28
087.5
0.9
40.8
616
0.0
943
0.2
924
13.8
0.9
30.8
574
0.0
933
0.2
864
14.3
0.9
4α1
0.25
070.
1899
0.51
62–
0.9
70.4
398
0.2
863
0.5
604
–0.7
00.5
029
0.3
932
0.5
322
–0.8
3
n=
200
Mea
nM
SE
SE
RB
%C
PM
ean
MS
ES
ER
B%
CP
Mea
nM
SE
SE
RB
%C
P
β11
0.51
850.
0105
0.10
503.7
0.9
40.5
191
0.0
103
0.1
075
3.8
0.9
70.5
182
0.0
111
0.1
074
3.6
0.9
7β21
1.00
970.
0069
0.08
92<
10.9
61.0
012
0.0
064
0.0
900
<1
0.9
60.9
996
0.0
065
0.0
892
<1
0.9
8β12
0.42
550.
0392
0.15
0515
0.9
10.4
360
0.0
274
0.1
580
13
0.9
70.4
383
0.0
284
0.1
576
13
0.9
1β22
0.88
200.
0393
0.13
1714
0.9
60.8
850
0.0
350
0.1
344
12
0.8
10.8
919
0.0
354
0.1
360
11
0.7
9α1
0.19
210.
1250
0.37
45–
0.9
30.4
111
0.2
119
0.3
699
–0.8
40.4
905
0.3
115
0.3
309
–0.8
4
n=
400
Mea
nM
SE
SE
RB
%C
PM
ean
MS
ES
ER
B%
CP
Mea
nM
SE
SE
RB
%C
P
β11
0.49
900.
0061
0.07
52<
10.9
70.5
001
0.0
059
0.0
755
<1
0.9
60.5
012
0.0
060
0.0
758
<1
0.9
5β21
0.99
450.
0041
0.06
34<
10.9
20.9
919
0.0
052
0.0
635
<1
0.9
10.9
935
0.0
051
0.0
633
<1
0.7
8β12
0.43
280.
0208
0.10
6312
0.9
10.4
385
0.0
186
0.1
132
12
0.9
00.4
398
0.0
176
0.1
118
12
0.9
0β22
0.85
180.
0323
0.09
2414
0.6
50.8
604
0.0
318
0.0
957
14
0.6
50.8
639
0.0
298
0.0
968
14
0.6
6α1
0.19
450.
1156
0.30
15–
0.9
20.4
523
0.2
356
0.2
956
–0.7
00.5
173
0.3
248
0.2
603
–0.6
5
10
α1 corresponding to the component Y1 is 0 (with Gaussian marginal density). Under the SMS
model, the average (based on 100 replications) estimated value of α1 (0.19 for n = 200, 400 and
0.25 for n = 50) is substantially smaller than the average estimates from the methods based on
parametric models PGM and ST. Even the approximate CP (0.93 and 0.92 respectively for n = 200
and n = 400 and 0.97 for n = 50) of the interval estimates of α1 are better than corresponding
CP of interval estimates from parametric methods. Moreover, these coverage probabilities from
parametric methods seem to get substantially worse for increasing the sample size n!
Under this simulation model, the marginal density of the response component Y2 is exponential
with Pearsonian skewness = 2. Under the SMS model, the average estimated value for the Pear-
sonian skewness parameter for the residuals is 1.88, 1.93 and 1.98 respectively for n = 50,n = 200
and n = 400. This suggests that our SMS model captures the true Pearsonian skewness of the
response even when the true marginal density is different from that of the SMS model. This is a
valuable practical advantage for analysis of many real life biomedical studies, when the goal is to
estimate the covariate effects as well as to account for the level of skewness.
Simulation 3
The goal of this small scale simulation study is to compare the performance of the regression
estimates obtained from our SMS and PGM models to the regression estimates obtained from
the generalized estimating equations (GEE), the popular analysis tool for repeated measures. We
want to make this comparison even when the true distribution of the errors is not a member of the
class of SMS models in equations (2)-(4). For this purpose, we simulate N = 100 replications of
datasets of sample-size n = 50, 200 from a bivariate repeated measures model with true marginal
regression model Yij = β0 + xij1β1 + xij2β2 + εij for i = 1, . . . , n and j = 1, 2 (same regression
parameters (β1, β2) for both components Yi1, Yi2). The bivariate joint density of error (εi1, εi2)
is same as in the Simulation 2 with Gaussian marginal density for Y1 and exponential marginal
density for Y2. The summary of the results is given in Table T2.
Given that the estimates of the skewness parameters α = (α1, α2) are not estimable under
GEE, we do not report them in the summary. Also, the intercepts of SMS and PGM models are
not comparable to intercept β0 under GEE. Hence, following Simulation 2, we report the MSE, SE,
RB% and CP summaries (based on 100 replicates) for only the regression parameters β1 and β2.
11
We observe that the GEE produces slightly biased regression estimates compared to the estimates
obtained using SMS and PGM models. Compared to the GEE estimates, the SMS and PGM
models yield smaller MSE (at least 30% reduction), average SE (at least 20% reduction) as well
as smaller RB% even when the sample size is as small as n = 50. Thus, even when the true error
model is not same as the assumptions of the SMS model, Bayesian estimates of the regression
parameters derived from our semiparametric SMS model are comparable to the estimates from
parametric model, and more precise than those obtained via the GEE method.
12
Tab
leT
2:B
ased
on10
0re
plica
tes
ofth
edat
a-se
tsof
sam
ple
sizen
=50,2
00,
the
sim
ula
tion
resu
lts
com
par
ing
the
appro
xim
ate
Mea
nSquar
edE
rror
(MSE
),Sta
ndar
dE
rror
(SE
),R
elat
ive
Bia
s(R
B),
and
Cov
erag
eP
robab
ilit
y(C
P)
ofre
gres
sion
esti
mat
esfr
omth
ree
com
pet
ing
met
hods
(SM
S,
par
amet
ric
PG
Man
dG
EE
).T
he
true
sim
ula
tion
model
ofbiv
aria
tere
pea
ted
mea
sure
sis
Gau
ssia
nco
pula
wit
has
soci
atio
npar
amet
erρ
12
=0.
7,tr
ue
regr
essi
onpar
amet
ers
(β1,β
2)
=(0
.5,1
),an
dw
ith
Gau
ssan
and
exp
onen
tial
mar
ginal
den
siti
es.
SM
SP
GM
GE
E
n=
50M
ean
MS
ES
ER
B%
CP
Mea
nM
SE
SE
RB
%C
PM
ean
MS
ES
ER
B%
CP
β1
0.48
170.
0307
0.18
553.6
0.9
50.4
935
0.0
286
0.1
900
1.4
0.9
40.4
734
0.0
509
0.2
35.3
0.9
4β2
0.96
460.
0229
0.15
943.5
0.9
70.9
898
0.0
190
0.1
616
10.9
60.9
894
0.0
370
0.2
053
1.1
0.9
5
n=
200
Mea
nM
SE
SE
RB
%C
PM
ean
MS
ES
ER
B%
CP
Mea
nM
SE
SE
RB
%C
P
β1
0.50
150.
0096
0.08
89<
10.9
60.5
102
0.0
082
0.0
911
10.9
60.5
177
0.0
139
0.1
148
3.5
0.9
5β2
0.99
120.
0069
0.07
47<
10.9
41.0
045
0.0
061
0.0
761
<1
0.9
41.0
194
0.0
116
0.1
034
1.9
0.9
4
13
Web Appendix E: Computer codes
R and JAGS codes for fitting the SMS and PGM models to simulated data are available in the
GitHub link:
https://github.com/bandyopd/Multivariate-Skewed-models
14
References
Amewou-Atisso, M., Ghosal, S., Ghosh, J. K., Ramamoorthi, R., et al. (2003). Posterior consis-
tency for semi-parametric regression problems. Bernoulli 9, 291–312.
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal
of Statistics 12, 171–178.
Azzalini, A. and Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika
83, 715–726.
Chang, S.-C. and Zimmerman, D. L. (2016). Skew-normal antedependence models for skewed
longitudinal data. Biometrika 103, 363–376.
Pati, D. and Dunson, D. B. (2014). Bayesian nonparametric regression with varying residual
density. Annals of the Institute of Statistical Mathematics 66, 1–31.
Tang, Y., Sinha, D., Pati, D., Lipsitz, S., and Lipshultz, S. (2015). Bayesian partial linear model
for skewed longitudinal data. Biostatistics (Oxford, England) 16, 441–453.
Wu, Y. and Ghosal, S. (2008). Kullback Leibler property of kernel mixture priors in Bayesian
density estimation. Electronic Journal of Statistics 2, 298–331.
15