BIEMS: A Fortran 90 Program for Calculating Bayes Factors ... · between the model parameters, such...

JSS Journal of Statistical SoftwareJanuary 2012, Volume 46, Issue 2. http://www.jstatsoft.org/

BIEMS: A Fortran 90 Program for Calculating Bayes

Factors for Inequality and Equality Constrained

Models

Joris MulderTilburg University

Herbert HoijtinkUtrecht University

Christiaan de LeeuwUtrecht University

Abstract

This paper discusses a Fortran 90 program referred to as BIEMS (Bayesian inequalityand equality constrained model selection) that can be used for calculating Bayes factorsof multivariate normal linear models with equality and/or inequality constraints betweenthe model parameters versus a model containing no constraints, which is referred to asthe unconstrained model. The prior that is used under the unconstrained model is theconjugate expected-constrained posterior prior and the prior under the constrained modelis proportional to the unconstrained prior truncated in the constrained space. This re-sults in Bayes factors that appropriately balance between model fit and complexity fora broad class of constrained models. When the set of equality and/or inequality con-straints in the model represents a hypothesis that applied researchers have in, for instance,(M)AN(C)OVA, (multivariate) regression, or repeated measurements, the obtained Bayesfactor can be used to determine how much evidence is provided by the data in favor ofthe hypothesis in comparison to the unconstrained model. If several hypotheses are underinvestigation, the Bayes factors between the constrained models can be calculated usingthe obtained Bayes factors from BIEMS. Furthermore, posterior model probabilities ofconstrained models are provided which allows the user to compare the models directlywith each other.

Keywords: Bayes factors, equality and inequality constrained models, Fortran 90, Gibbs sam-pler.

1. Introduction

This paper discusses a program that can be used to determine which model (or hypothesis)receives most evidence from the data given a set of models that contain equality constraintsand/or inequality constraints on the model parameters. This will be done using the Bayes

http://www.jstatsoft.org/

2 BIEMS: Bayes Factors for Inequality and Equality Constrained Models

factor, a Bayesian model selection criterion. The program BIEMS (Bayesian inequality andequality constrained model selection) is written in Fortran 90 using IMSL subroutines (VisualNumerics 2003). A user friendly interface has been developed for practitioners who are notfamiliar with the technical details of statistical modeling. The interface allows one to simplyspecify equality and inequality constraints on the parameters of interest. From the websitehttp://tinyurl.com/informativehypotheses, it is possible to download a compiled versionof the program, including the interface. Source codes are available along with this manuscript.

The program is designed for applied researchers, e.g., social or medical, who have specificexpectations that can be formulated in terms of equality and inequality constraints betweenthe model parameters. Hoijtink et al. (2008) refer to such expectations as informative hy-potheses in contrast to hypotheses that do not contain any information about the relationshipbetween the model parameters, such as the standard null and alternative in classical hypoth-esis testing. For references about the use of Bayes factors to test scientific theories, see Kassand Raftery (1995) and the references therein. The methodology that is used in BIEMS isbased on Mulder et al. (2010) and Mulder et al. (2009). The first reference contains thetechnical details of the method, such as, prior specification. The latter focusses on repeatedmeasurements and is more accessible for applied researchers.

In Section 2.1, the multivariate normal linear model is discussed. Section 2.2 describes howtime-varying covariates can be included in the model. The formulation of model constraintsis addressed in Section 2.3. Section 3 describes the specification of the unconstrained defaultprior which is the conjugate expected-constrained posterior prior (CECPP). In Section 3.1,two basic properties of this prior are discussed. Section 3.2 describes how linear restrictions arespecified to obtain so-called constrained posterior priors. Subsequently, the hyper-parametersof the constrained posterior priors are calculated in Section 3.3 and the construction of theCECPP is given in Section 3.4 including the corresponding posterior distribution. In Sec-tion 3.5, it is discussed how minimal training samples are sampled which are used to con-struct the constrained posterior priors. In Section 4.1, it is described how the Bayes factor isestimated using MCMC (Markov chain Monte Carlo) sampling for an inequality constrainedmodel versus the unconstrained model. Section 4.2 illustrates how the estimation error of theBayes factor is determined. In Section 4.3, the procedure is discussed for estimating Bayesfactors of models containing equality constraints versus the unconstrained model. Section 5provides several short examples of model implementations for analysis of variance (ANOVA),linear regression, multivariate analysis of variance (MANOVA), and repeated measures. InSection 6, a short conclusion is given.

2. Constrained model specification

2.1. The multivariate normal linear model

The multivariate normal linear model can be written as

yi = M>di + A>xi + εi. (1)

In this model, a P -variate dependent variable yi, i = 1, . . . , N , is explained by a vector di of

http://tinyurl.com/informativehypotheses

Journal of Statistical Software 3

size J containing group membership, i.e.,

dij =

{1 if the ith observation belongs to group j0 otherwise,

a K-variate explanatory variable xi, and εi a multivariate Gaussian error with εi ∼ N (0,Σ).Furthermore, M is a J × P matrix with group means where the (j, p)th element µjp denotesthe mean (or intercept) of the pth dependent variable for the jth group and A is a K × Pmatrix with regression coefficients where the (k, p)th element αkp denotes the coefficient ofthe kth covariate for the pth dependent variable.

For N observations, the model can be written as

Y = XB + E, (2)

where

Y =

y>1...

y>N

, X =

d>1 x>1...

...

d>N x>N

and B =

[MA

](3)

with

M =

µ11 · · · µ1P...

. . ....

µJ1 · · · µJP

, and A =

α11 · · · α1P...

. . ....

αK1 · · · αKP

. (4)

The likelihood of the multivariate normal linear model (2) is given by

g(Y|X,B,Σ) ∝ |Σ|−N/2 exp{−1

2tr Σ−1(Y−XB)>(Y−XB)}

= exp{−1

2(β − β)>[Σ⊗ (X>X)−1]−1(β − β)} ·

·|Σ|−N/2 exp{−1

2tr Σ−1S}

∝ Nβ|Σ(β,Σ⊗ (X>X)−1)inv-WΣ(S, N), (5)

where β is a vector of length D = P (J+K), which is the vectorization of B, i.e., vec(B) = β,β is the maximum likelihood estimator of β, with

B = (X>X)−1X>Y,

and

S = (Y−XB)>(Y−XB). (6)

The least squares estimate of Σ, which is provided in the output of BIEMS, equals Σ = S/(N−J −K). Expression (5) shows that the likelihood is proportional to a normal-inverse-Wishartdistribution. For this reason, a multivariate normal-inverse Wishart prior for {β,Σ} is usedin BIEMS which is generalized natural conjugate (Press 2005, p. 253–256), i.e., π0(β,Σ) =π0(β)π0(Σ). The unconstrained prior is discussed in Section 3.


2.2. Modeling time-varying covariates

When modeling repeated measurements using the multivariate normal linear model, time-varying covariates can be included in (2) according to

X =

d>1 x>1 w>11 . . . w>1U...

......

. . ....

d>N x>N w>N1 . . . w>NU

and B =

MAG1...

GU

, (7)

with wiu is a vector of length P containing the scores on the uth time-varying covariates ofthe ith observation, Gu = diag(γu1, . . . , γuP ), where γup is the regression parameter of theuth time-varying covariate measured at the pth time point, for u,= 1, . . . , U . The matricesGu are diagonal to ensure that the uth time-varying covariate measured at time point p isnot regressed on the outcome variable measured at time point p> 6= p. In the vectorizationof B, i.e., β, which is important when specifying the (in)equality constraints of the model in(10), the zeros are omitted (see also Section 5.5). Note here that it is also possible to modelmore than one dependent variable as repeated measurement.

When time-varying covariates are present, the likelihood of the parameters in the structuredparameter matrix B given Σ can be obtained as follows. First, an unstructured parametermatrix is substituted for B, which is denoted by

B∗ =

MAG∗1...

G∗U

,

where G∗u, for u = 1, . . . , U , are unstructured parameter matrices which replace the diagonalmatrices Gu, for u = 1, . . . , U , B in (7). The vector with all off-diagonal parameters of thematrices G∗u will be denoted by γ∗ which contains 1

2UP (P − 1) elements. Therefore, the

vectorization of B∗ is a permutation of the parameter vector (β>,γ∗>)>, where β is thevectorization of B in (7) with zeros omitted. The likelihood of the unstructured parametermatrix B∗ and Σ is given in (5), and therefore, the likelihood of β,γ∗|Σ is multivariatenormal. Subsequently, the likelihood of β|Σ,γ∗ = 0 is also multivariate normal and given by

N (β − cov(γ∗,β)var(γ∗)−1γ∗, var(β) + cov(γ∗,β)var(γ∗)−1cov(β,γ∗)),

where β and γ∗ are the maximum likelihood estimates, and cov(γ∗,β), cov(β,γ∗), var(β), andvar(γ∗) are the appropriate submatrices of the covariance matrix Σ⊗(X>X)−1. Subsequently,the conditional posterior of β|Σ,γ∗ = 0 can be obtained by combining this likelihood withthe multivariate normal prior of β which is discussed in the next section.

An example of a model including a time-varying covariate is given in Section 5.5. More detailsabout modeling time-varying covariates using the multivariate normal linear model can befound in Mulder et al. (2009).


2.3. Equality and inequality constrained multivariate normal linear models

The system of Lt inequality constraints on the parameters β of the inequality constrainedmodel Mt can be written as

Rtβ > rt, (8)

where Rt is a Lt×D matrix containing inequality coefficients and rt is a vector of length Lt ofconstants. The augmented matrix [Rt|rt] should be implemented in the fileinequality_constraints.txt in the stand-alone version of BIEMS (Appendix B). Examplesare provided in Section 5.

For an equality constrained model Mt, the system of equalities is given by

Qtβ = qt, (9)

where Qt is a Ht × D matrix containing Ht equality coefficients and qt is a vector oflength Ht containing constants. The augmented matrix [Qt|qt] should be given in the fileequality_constraints.txt (Appendix B).

When combining (8) and (9), we obtain a constrained model with inequalities and equalities.The system of constraints of model Mt will be denoted by[

Rt

Qt

]β ≥t

[rtqt

]⇔ Rtβ ≥t rt, (10)

where ≥t denotes that each constraint of this system is either a “>” or a “=”, depending onthe model Mt.

3. Default prior specification

In order to compute Bayes factors, priors distributions must be specified for β and Σ. Becauseall constrained models are nested in the unconstrained model M0 : β ∈ RD, one only needsto specify a prior under the unconstrained model and use truncations of this prior under theconstrained models, i.e., πt(β,Σ) = c−1t π0(β,Σ)1Mt(β) with ct =

∫∫β∈Mt

π0(β,Σ)∂β∂Σ,where π0 denotes the prior under M0 and πt denotes the prior under Mt. This prior choicehas two important advantages. First, only a single prior π0 needs to be specified. Second,the Bayes factor Bt0 of a constrained model against the unconstrained model can be obtainedwithout computing the marginal likelihoods of the models. This is discussed in the Section 4.In this section, it will be described how a default prior under the unconstrained model canbe obtained which results in Bayes factors that are effective for evaluating models containingequality and inequality constraints on the parameters.

3.1. Properties of the unconstrained prior

In Mulder et al. (2010) and Mulder et al. (2009), a prior specification was proposed thatresulted in Bayes factors with an appropriate Occam’s razor for testing equality and inequalityconstrained models due to an effective balance between model fit and model complexity. Theimplementation of this method in BIEMS is discussed in this section.

The CECPP is used as unconstrained prior for β (Mulder et al. 2009). The CECPP is basedon constrained posterior priors, which are obtained as follows. For the covariance matrix Σ,


Jeffreys’ prior is used, i.e., π0(Σ) ∝ |Σ|−(P+1)/2. This noninformative improper prior can beused because of arguments of orthogonal transformations of scale and location parameters(Jeffreys 1961).

The CECPP of β has the following three key properties.

1. It is based on minimal training samples which are subsets of the data that containminimal information to obtain a proper posterior prior (e.g., Berger and Pericchi 1996).This property is essential for testing models with equality constraints on the modelparameters. The Bartlett paradox (Bartlett 1957) is then avoided.

2. It is centered on the boundary of the constrained parameter space under investigation.The boundary of the constrained space under investigation is defined as

B = {β|R0β = r0} with R0 =

R1...

RT

and r0 =

r1...

rT

. (11)

This property is essential for testing models containing inequality constraints. Notethat B must be nonempty which implies that the models under investigation must becomparable (Mulder et al. 2010). For instance, M1 : β = 0 and M2 : β > 1 arenot comparable but M1 : β = 0 and M3 : β > 0 are comparable. In this case, theCECPP of β is centered at β = 0. Note that B = {0} in this case. In the interface ofBIEMS (Appendix A), it is automatically checked whether the constrained models arecomparable when using the default prior specification setting.

3. The parameters that are compared with each other in the constrained models haveequal variances in the CECPP. This is essential for incorporating the complexity ofinequality constrained models with unequal weights on the parameters of interest, e.g.,M1 : β1 > 2β2 > 0. This will be elaborated in Section 3.3.

In the remaining part of this section, it will be discussed how the CECPP and the corre-sponding posterior distribution is obtained.

3.2. Linear restrictions for constrained posterior priors

The CECPP can be seen as an average over all constrained posterior priors (CPPs). In con-trast to standard posterior priors which are obtained by updating a noninformative improperprior with a minimal training sample, CPPs are obtained by updating a noninformative im-proper prior with a minimal training sample under certain linear restrictions. The linearrestrictions ensure that the constrained posterior priors are located on the boundary of theconstrained parameter space. Note that standard posterior priors are located around the like-lihood, and therefore, generally will not satisfy the second property in Section 3.1. Mulderet al. (2010) showed that the resulting intrinsic Bayes factors may not be effective for testinginequality constrained models. In this section, it will be discussed how the linear restrictionsare specified to obtain constrained posterior priors that satisfy the second property.

Two sets of linear restrictions will be constructed: One for the CPP means, say [R0,M |r0,M ],and one for the CPP variances, say [R0,V |r0,V ]. The reason that different sets of restrictionsare needed can for instance be seen when testing M1 : β = 0 versus M0 : β ∈ R1. The


restriction for the CPP mean will be β = 0 to ensure that the CPP mean of β equals 0 (tosatisfy the second property), while for the CPP variance of β no restriction in needed. Notethat if β = 0 would be a restriction for the CPP variance, the CPP variance of β wouldbecome equal to zero which is undesirable. On the hand, when testing M2 : β1 > β2 > β3versus M0 : β ∈ R3, the restriction for the CPP means and CPP variances are identical,namely, [

1 −1 00 1 −1

]β = 0, (12)

so that the CPP means are identical and the CPP variances are identical for β1, β2, and β3,for every randomly selected minimal training sample.

The restrictions for the CPP means, [R0,M |r0,M ], are identical to the row echelon form of[R0|r0] where the zero rows are omitted. This ensures that the CPP means are located onthe boundary B. The use of the row echelon form with zero rows omitted is important forobtaining the hyperparameter of CPPs which is discussed in the following section.

The restrictions for the CPP variances are obtained from the row echelon form of [R∗0|0]where the zero rows are removed. Hence, the vector of constants does not play a role whenrestricting the CPP variances, i.e., r0,V = 0. The matrix R∗0 is obtained from R0 as follows.

1. Rows of R0 that are permutations of (wh,1, 0, . . . , 0), with nonzero constants wh,1 ∈ R1,

are removed in R∗0. This is important for avoiding CPP variances of zero.

2. Rows of R0 that are permutations of (wh,1, wh,2, 0, . . . , 0), with nonzero constants wh,1,wh,2 ∈ R1 where h is the row index, are replaced by the same permutations of the form(sign(wh,1), sign(wh,2), 0, . . . , 0). These restrictions ensure that the CPP variances of β’sthat are compared with each other have equal variances.

3. Rows of R0 that are permutations of (wh,1, . . . , wh,V , 0, . . . , 0) with V > 2, i.e., rows thatcontain more than two nonzero elements, are replaced by a set of V −1 rows that ensurethat the corresponding β’s have equal variances. For instance, M1 : β2 − β1 < β3 − β2results in a CPP mean restriction of [1 − 2 1|0] and CPP variance restrictions given by(12).

If time-varying covariates are also incorporated in the model, additional restrictions of theform γ∗ = 0 are automatically added to the mean restrictions [R0,M |r0,M ] and variance re-strictions [R0,V |0]. The mean and variance restrictions that were used to obtain the CPPscan be found at “View/Edit Default Prior” in the interface. Via “Specify restrictions

manually”, a user can also specify the linear restrictions for β manually. However, we recom-mend to use the default setting as specified above.

In the stand-alone version of BIEMS, the restriction matrix for the CPP means can be man-ually specified in the file restriction_matrix.txt. The restrictions for the CPP variancesare based on the rules stated above. By default, the system of restrictions is equal to the rowechelon form of the system of constraints of the model Mt, i.e., [Rt|rt]. Note that the samerestrictions must be used when computing Bt0. This is automatically done when using theBIEMS interface.


3.3. Hyper-parameters of the constrained posterior priors

Based on the linear restrictions for the means and the linear restrictions for the variances,CPPs are obtained from randomly selected minimal training samples, [Yl|Xl] where l is thetraining sample index, using restricted regression (Judge et al. 1980). The CPP of β ismultivariate normally distributed,

πC0 (β|Yl) = Nβ(βR

l , ΨRl ), (13)

where βR

l is the restricted maximum likelihood (RML) estimate that is obtained from the

minimal training sample and the restrictions [R0,M |r0,M ] and ΨRl is the diagonal covariance

matrix that is obtained from the minimal training sample and the linear restrictions [R0,V |0].

The RML estimate of β maximizes the likelihood of [Yl|Xl] under the system of restrictionsR0,Mβ = r0,M , which yields

βR

l = βl + (X>l Xl)

−1R>0,M [R0,M (X>l Xl)

−1R>0,M ]−1(r0,M −R0,M βl), (14)

where Xl = IP ⊗ Xl and βl is the unrestricted ML estimate of β given [Yl|Xl]. This is

explained in more detail in Mulder et al. (2010). Because R0,M βR

l = r0,M holds, for alltraining samples, all CPPs are centered on the boundary under investigation B.

Based on the minimal training sample and the restrictions R0,V β = 0, restricted regressionis used to obtain the covariance matrix, which is given by

ΨRl =

1

N∗lΨRl

[(Φ∗l [X

∗>l X∗l ]

−1(Φ∗l )>)diag

]−1, (15)

with

ΨRl =

[Φl(X

>l Xl)

−1X>l

{[ΣRl

]diag⊗ INl

}Xl(X

>l Xl)

−1Φ>l

]diag

ΣRl =

1

Nl

(Yl −XlB

Rl

)> (Yl −XlB

Rl

)Φl = ID − (X>l Xl)

−1R>0,V [R0,V (X>l Xl)−1R>0,V ]−1R0,V (16)

Φ∗l = ID − [X∗>l X∗l ]

−1R>0,V [R0,V [X∗>l X∗l ]

−1R>0,V ]−1R0,V

Xl = IP ⊗Xl

X∗l = IP ⊗

d>1 1>K+PU...

...

d>N 1>K+PU

,where 1K+PU is a vector of length K + PU with ones, ID is an identity matrix of size

D = P (J + K + PU), ΣRl is the RML estimate of the error covariance matrix Σ, and

vec(BRl ) = β

R

l . See Mulder et al. (2010) for a thorough explanation of (15).

In (15), ΨRl is the covariance matrix that is obtained when using restricted regression under

the assumption of a diagonal covariance matrix Σ and independent variables in β in theconstrained posterior prior. Because the training sample [Yl|Xl] of size Nl = J + K + PU


β1

β2

β1

2β =2

π3

π2

π1

Figure 1: Contours of different priors: π1 represents a uniform prior for (β1, β2), π2 represents

identical normal priors for (β1, β2), and π3 represents normal priors with var(β1) = 4var(β2).The complexity of the inequality constrained model M1 : β1 > 2β2 > 0 corresponds to therelative size of the grey areas. Hence, c11 = 0.0625, c21 = 0.0738, and c31 = 0.125.

is minimal under the unrestricted model, and not under the restricted model, the variances

are rescaled to a single observation by multiplying ΨRl with

[(Φ∗l [X

∗>l X∗l ]

−1(Φ∗l )>)diag

]−1.

Subsequently, when multiplying this with 1/N∗l , the variances are rescaled according to atraining sample size of size N∗l .

By default, the value of N∗l is chosen equal to the minimal training sample size under therestricted model, which is equal to the number of free parameters under the restricted model.This is equal to the number of free parameters in the unrestricted model D minus the numberof pivots in [R0,M |r0,M ], plus the number of diagonal elements of Σ, which equals P . The usercan adjust the scale N∗l to construct more or less informative constrained posterior priors.This can be changed in the interface under “Scale of prior variance”.

In the CPP covariance matrix ΨRl , the variances of parameters are equal which are compared

with each other in the constrained models under investigation, i.e., var(βd1) = var(βd2) underthe restriction v1βd1 = v2βd2 for ∀v1, v2 6= 0. Note that this differs from the CPP varianceproposed by Mulder et al. (2010) where v21var(βd1) = v22var(βd2) under the same restriction.To justify this choice, we consider an example of an inequality constrained model with non-equal weights on the parameters of interest, e.g.,M1 : β1 > 2β2 > 0. In Mulder et al. (2010),the complexity of an inequality constrained model was defined as the prior probability massin the inequality constrained space assuming a uniform prior on (−`, `) for `→∞. For modelM1, this implies that a measure of complexity that is equal to 1/16 = 0.0625 according tothis definition. In the Bayes factor based on CPPs, model complexity of M1 is incorporatedas the CPP probability mass in the inequality constrained space {β|β1 > 2β2 > 0}. Under thelinear restrictions β1 = β2 = 0 for the CPP means and β1 = β2 for the CPP variances, suchthat var(β1) = var(β2) and E(β1) = E(β2) = 0 in the CPP, model complexity is incorporatedas approximately 0.0738. The probabilities 0.0625 and 0.0738 correspond to an appropriatemeasure of complexity, based on, respectively, identical uniform priors or identical normal


priors for β1 and β2 when evaluatingM1 : β1 > 2β2 > 0. This is not the case for the variancerestriction β1 = 2β2 (suggested by Mulder et al. 2010) yielding var(β1) = 4var(β2) and asubstantial different measure for the complexity for M1 : β1 > 2β2 > 0, namely 0.125. Thisis illustrated in Figure 1.

Note that when time-varying covariates are also incorporated in the model, the restrictionsγ∗ = 0, which fix the dummy parameters to zero, are also present in [R0,V |0]. As a result, the

variances of the parameters γ∗ are equal to zero in ΨRl . Therefore, the CECPP covariance

matrix of the parameters β∗, i.e., the full parameter vector β with the dummy parametersγ∗ omitted, can be obtained by removing the rows and columns in ΨR

l that belong to thedummy parameters γ∗.

3.4. CECPP and posterior

The CECPP of β, given byπEC0 (β) = Nβ(βEC0 ,ΨEC

0 ), (17)

is the conjugate form of the empirical expected-posterior prior (Perez and Berger 2002) whenusing constrained posterior priors instead of standard posterior priors. Note here that Jeffreys’prior is used for the covariance matrix Σ. The hyper parameters of the CECPP of β arerobustly estimated from 1,000,000 draws based on 1,000 draws from the constrained posteriorpriors of 1,000 randomly selected training samples according to the scheme in Table 1.

In step (vi), the CECPP means were calculated as the sample means of the RML estimates βR

l

when R0,M contains rows with more than two non-zero elements. The reason is that, althoughthe medians are more robust to outliers, the vector of medians (medianl β

Rl,1, . . . ,medianl β

Rl,D)>

will generally not satisfy the mean restrictions [R0,M |r0,M ] in this case. Consequently, thesecond key property (Section 3.1) would not be satisfied when using the medians. In all othercases, the CECPP means are calculated as the sample medians. As a result, the CECPP willbe located on the boundary of the constrained space of interest.

Due to sampling fluctuations and restrictions on the mean with unequal weights, e.g., β1 =2β2, an additional modification of the CECPP standard deviations was performed at theend of step (vii) which was based on restricted regression as in (14). This ensures that theparameters that are compared with each other have equal CECPP variances. Consequently,the CECPP variances will satisfy the third key property (Section 3.1).

The conditional posterior distributions of β|Σ,Y and Σ|β,Y can be determined using Bayes’law from the CECPP and the likelihood in (5),

π0(β|Σ,Y) = Nβ(βN ,ΨN )

π0(Σ|β,Y) =W−1Σ (ΛN , νN )

(18)

with

βN =[(

ΨEC0

)−1+ Σ−1 ⊗X>X

]−1 [(ΨEC

0

)−1βEC0 + (Σ−1 ⊗X>X)β

]ΨN =

[(ΨEC

0

)−1+ Σ−1 ⊗X>X

]−1ΛN = (Y −XB)>(Y −XB)

νN = N + P + 1.


(i) Initialize the training sample size to Nl = J +K + PU + 1.

(ii) Randomly draw a minimal training sample [Yl|Xl] of size Nl.

(iii) Determine the hyperparameters {βRl , ΨRl } of the lth constrained posterior prior

πC0 (β|Yl).

(iv) Sample 1000 draws β(s)0,l from the constrained posterior prior of β, for s = 1000(l− 1) +

1, . . . , 1000l.

(v) Repeat steps (ii) to (iv) for l = 1, . . . , 1000.

(vi) The CECPP mean of β is calculated as βEC0 = L−1∑

l βR

l if R0,M contains rows with

more than 2 non-zero elements, or as βEC0,d = medianl βR

l,d, for d = 1, . . . , D,elsewhere.

(vii) Robust estimates (Huber 1981) are used for the CECPP covariance matrix of β:

ΨEC0 [d1, d2] =

σ2(βd1 + βd2)− σ2(βd1 − βd2)

σ2(βd1 + βd2) + σ2(βd1 − βd2)σ(βd1)σ(βd2), for d1, d2 = 1, . . . , D

where σ(βd) = medians,l(|β(s)0,l,d − β

EC0,d |)/Φ−1(.75), for s = 1, . . . , 1000

with Φ−1(·) the inverse of the cdf of the standard normal distribution

The standard deviations and the correlation matrix of ΨEC0 are separated according to

ΨEC0 = diag(sEC0 )CEC

0 diag(sEC0 ). Subsequently, ΨEC0 = diag(sEC0 )CEC

0 diag(sEC0 ) withsEC0 = (ID −R>0,V (R0,V R>0,V )−1R0,V )sEC0 .

Table 1: CECPP construction scheme.

3.5. Issues with minimal training samples

Because the β’s are modeled independently in the constrained posterior prior (ΨRl is diagonal),

training samples of size J +K +PU + 1 generally result in appropriate constrained posteriorpriors. However, in some cases, e.g., data with many identical observations, the diagonalelements of these matrices may be (approximately) zero. This problem can be overcomeby increasing the training sample size such that the variance in the training data increases.In BIEMS, the training sample size is increased by 1 after 5 training samples have beensequentially selected for which the estimate of the generalized error variance based on the

training sample, i.e., det(ΣRl /Nl), is smaller than 0.05 times the estimate of the generalized

error variance of the complete data matrix, i.e., det(S/N), where ΣRl is the estimate of Σ

based on the RML estimate of B of the lthe training sample given in (16). Therefore, thescheme in Table 2 is used where the roman numbers refer to the steps in Table 1. Note that theconstrained posterior prior variance of β is not influenced by the size of the minimal trainingsample because the constrained posterior prior variances of the β’s are scaled according tothe number of free parameters under the restriction [R0,M |r0,M ], which remains unchanged.


1. Initialize Nl = J +K + PU , and calculate det(S/N) with S = (Y −XB)>(Y −XB).

2. Set Nl = Nl + 1, V = 0, and the training sample counter to 1.

3. Do steps (ii), and (iii), and determine det(ΣRl ).

4. If det(ΣRl /Nl) < 0.05·det(S/N), then “bad training sample”, set V = V + 1, go to 5.

Else, go to 6.

5. If V < 5, go to 3. Else, discard all constrained posterior prior draws and go to 2.

6. Set V = 0. Do step (v). If l < 1, 000, go to 2, else, do step (vii).

Table 2: Training sample size adjustment scheme.

4. Calculating Bayes factors

When the CECPP is used as prior under the unconstrained model M0 and the prior underthe inequality constrained modelMt is equal to the CECPP truncated in the parameter spaceof the constrained model Mt, denoted by Mt, the Bayes factor of model Mt against modelM0 can be be expressed by

Bt0 =ftct

=

∫β∈Mt

π0(β|Y)∂β∫β∈Mt

πEC0 (β)∂β, (19)

where π0(β|Y) is the marginal posterior of β that is based on the CECPP and the completedata Y. The derivation can be found in Klugkist and Hoijtink (2007). They refer to theunconstrained prior as the encompassing prior.

BIEMS automatically computes all Bayes factors Bt0 of each constrained model Mt againstthe unconstrained model M0. Because the same CECPP is used for calculating the Bayesfactors Bt0, for t = 1, . . . , T , the Bayes factor between two constrained models M1 and M2

can be computed by

B12 =B10

B20.

For an inequality constrained modelMt, BIEMS automatically computes the Bayes factor ofmodel Mt versus its complement, i.e., Mt′ : β 6∈Mt, which equals

Btt′ =ft

1− ft1− ctct

.

In the BIEMS interface (Appendix A), the posterior model probabilities can be computed bypressing the “Compute” button in the “Posterior model probabilities” frame. These arecalculated according to

P (Mt|Y) =Bt0∑Tt=0Bt0

. (20)


4.1. Estimating ft and ct for an inequality constrained model

The quantities ft and ct are estimated using a Gibbs sampler (e.g., Gelman et al. 2004). Thecomplexity ct can be estimated by sampling S draws from the multivariate normal CECPPof β in (17). The estimate is equal to the proportion of draws satisfying the inequalityconstraints of model Mt, i.e.,

ct = S−1I(β(s)0 ∈Mt), (21)

where β(s)0 is the sth draw from the CECPP, for s = 1, . . . , S (S = 20, 000 by default,

modifiable in the interface, see Appendix A, or in the file input_BIEMS.txt, see Appendix B),and I(·) is the identity function. Similarly, ft can be estimated as the proportion of drawsfrom the unconstrained posterior in (18) that satisfy the inequality constraints of modelMt,i.e.,

ft = S−1I(β(s)N ∈Mt), (22)

where β(s)N is the sth draw from the unconstrained posterior. Based on the posterior condi-

tionals of β|Σ,Y and Σ|β,Y in (18), a sample from the unconstrained posterior is obtainedusing the MCMC procedure in Table 3.

In the first step, the initial value of Σ is obtained using the posterior estimate BN , with

vec(BN ) = βN =[(

ΨEC0

)−1+ Σ

−1 ⊗X>X]−1 [(

ΨEC0

)−1βEC0 + (Σ

−1 ⊗X>X)β], which is

a weighted average of the prior mean of BEC0 (with vec(BEC

0 ) = βEC0 ) and the ML estimateB (with vec(B) = β).

More efficient estimation for small ct

Estimating ct and ft using the above described method can be computationally intensive formodels with small values for these probabilities. For example, for c1 = 1/10! ≈ 2.76e−7corresponding to a model M1 : β1 > . . . > β10 based on the CECPP with identical priorsfor each β. A sample of at least 1e9 draws is needed to obtain a fair estimate of c1. For thisreason, ct and ft are estimated in steps of two inequalities. For model M1, the probabilities

1. Set the initial value Σ(0) = N−1(Y −XBN )>(Y −XBN ).

2. Draw β(s)N from π0(β|Σ(s−1),Y) in (18).

3. Draw Σ(s) from π0(Σ|β(s),Y) in (18).

4. Repeat steps 2 and 3 for s = 1, . . . , S.

Table 3: MCMC sampling from the posterior scheme.


ct and ft can be expressed stepwise as

P (β1 < . . . < β10) = P (β1 < β2 < β3)P (β3 < β4 < β5|β1 < β2 < β3) ·P (β5 < β6 < β7|β1 < . . . < β5)P (β7 < β8 < β9|β1 < . . . < β7)

P (β9 < β10|β1 < . . . < β9)

= c1,1 · . . . · c1,5 for the prior (23)

= f1,1 · . . . · f1,5 for the posterior, (24)

where the second index of c and f denotes the step.

Based on the CECPP, these (conditional) probabilities are equal to 16 , 1

20 , 142 , 1

72 , and 110 , re-

spectively. The first probability P (β1 < β2 < β3) can be estimated by directly sampling fromthe unconstrained CECPP and posterior and the proportion of draws satisfying β1 < β2 < β3are the estimates c1 and f1, respectively. The estimates of the conditional probabilities areequal to the proportion of draws satisfying each pair of inequality constraints based on asample from the conditional distributions of βd|β−d,Σ, where β−d is the vector of β’s withβd omitted. If βd is constrained by β−d, the conditional distribution is truncated normal, i.e.,βd|β−d,Σ ∼ N (mβd , s

2βd

;βL, βU ) where mβd and s2βd are the conditional mean and varianceof βd, respectively, and βL and βU are the lower and upper bound of βd, respectively. Ifβd is unconstrained, i.e., βd|β−d,Σ ∼ N (mβd , s

2βd

), the conditional distribution is normal.Sampling from the truncated normal distribution can be done using inverse probability sam-pling (Gelfand et al. 1992). The number of draws to estimate these conditional probabilitiesis 20,000 by default. Thus, the total number of draws to estimate c1 is 100,000. The samemethodology is used to calculate the posterior probability f1.

Burn-in and convergence

BIEMS does not provide explicit MCMC output because sampling the mean parameter vectorβ and the covariance matrix Σ is relatively straightforward in the multivariate normal linearmodel using the Gibbs sampler presented in Table 3. By starting at the posterior expectationof Σ, no burn-in period is needed for obtaining a posterior sample, and consequently, conver-gence is not an issue. The same holds for obtaining a sample from the CECPP of β which ismultivariate normal and independent of Σ.

When sampling under inequality constraints as was discussed above, the starting values forβ are equal to the ML estimates. When the starting values are not located in the allowedparameter space, e.g., the ML estimate equals (β1, . . . , β5) = (1, 1, 1, 1, 0) when calculatingP (β3 < β4 < β5|β1 < β2 < β3), a burn-in period of 1,000 iterations is used to obtain a startingvalue for β and Σ in the highest probability density region under the restrictions at hand.The length of this burn-in period was always sufficient during the debugging of the program,also in extreme situations.

Furthermore, by default different seeds are used for every MCMC run which always resultedin accurate estimates of the Bayes factor during the debugging of the program. If necessary,the user can increase the MCMC sample size to obtain an estimate of the Bayes factor withhigher accuracy. The computation of the sampling error of the Bayes factor is discussed laterin this section.


Inverse probability sampling when the normal density is approximately zero

If the posterior density of βp is approximately zero in the allowed parameter space, say(βL, βU ), inverse probability sampling (Gelfand et al. 1992), according to β(s) = Φ−1(u(s)),with u(s) ∼ U(Φ(βL),Φ(βU )) where Φ is the conditional posterior cdf of β, does not workbecause Φ(βL) ≈ Φ(βU ) ≈ 0 or 1. This occurs for instance when the constraints badly fit thedata.

When the allowed parameter space is an interval (βL, βU ) with approximately zero density,which is the case when the conditional posterior is β ∼ N(mβ, s

2β) with mβ � βL or mβ � βU ,

the conditional posterior of β in the interval (βL, βU ) is approximated using an exponentialdistribution, i.e., π(β) = λ exp(−λβ). If mβ � βL, the parameter λ is obtained by firstdetermining the distance ξ from the lower bound βL for which the posterior density is halved,i.e.,

φ(βL + ξ;mβ, s2β)

φ(βL;mβ, s2β)

=1

2⇒ ξ = mβ − βL +

√(βL −mβ)2 + 2 log(2)s2β.

Because the exponential density is halved at log(2)/λ, it holds that λ = log(2)/ξ with ξgiven above. Based on this λ, a draw is obtained from the exponential distribution, β(e) ∼exp(λ;βU ) truncated in the interval [0, βU − βL]. Subsequently, β(s) = βL + β(e). In the caseof mβ � βU , the procedure is similar.

4.2. Sampling error of the estimate of the Bayes factor

The estimation of the probabilities ft and ct as proportions of, respectively, the posterior andprior draws that satisfy the inequality constraints results in an estimation error of the Bayesfactor. This estimation error is proportional to the reciprocal of the sample size. In BIEMS,this estimation error is reported in the output file. In this section, it is illustrated how thestandard error of the estimate is calculated.

Again, we consider the inequality constrained modelM1 : µ1 < . . . < µ10. Based on (23) and(24), the Bayes factor of M1 versus M0 is estimated as

B10 =f1,1 · . . . · f1,5c1,1 · . . . · c1,5

. (25)

In BIEMS, the estimates of these probabilities are obtained as the proportion of hits in theCECPP and posterior, for c1,h and f1,h, h = 1, . . . , 5, respectively, satisfying the inequalityconstraints. Hence, each probability has a beta distribution given by beta(`h− 1, S − `h− 1),where `h is the number of draws satisfying the inequality constraints in step h from a totalof S draws from CECPP or posterior. Subsequently, a sample of the Bayes factor B10 canbe obtained by sampling from these beta-distributions. The standard error of the estimatedBayes factor B10 is equal to the standard deviation of this sample.

A small simulation study was conducted to determine the relationship between the sample sizeper two inequalities and the estimation error of the Bayes factor. A data set of size N = 10was generated from a population with means β = (0, 0, 0, 1, 1, 1, 2, 2, 2, 2) and covariancematrix Σ = I10. The Bayes factor of M1 : β1 < . . . < β10 versus M0 : β ∈ R10 was equalto 2.0e6. In Figure 2, a graph is displayed where the standard error of the Bayes factor isplotted versus the sample size per two inequalities. Given the large outcome of the Bayes


0 10000 20000 30000 40000 50000

010

020

030

040

0

Sample size

sd(B

F)

Figure 2: Regression line of the sample size per two inequality constraints versus the standarderror of the Bayes factor. The dots are the actual measurements of the standard errors. TheBayes factor of this data set is equal to 2.0e6.

factor, these standard errors are reasonably low. The default sample size is equal to 20,000iterations per two inequalities. The sample size can be adjusted in the interface (Appendix A)and in the file input_BIEMS.txt (Appendix B) if a more accurate estimate is needed.

4.3. Equality constrained models

For constrained models with equality constraints between the model parameters (possibly inaddition to inequalities), ft and ct cannot be estimated by directly using the above describedmethod because sampled values are never exactly equal. In this section, it is described howthe Bayes factor of such models versus the unconstrained model is estimated in BIEMS basedon the methodology given in the previous section. The idea was suggested by (Laudy 2006,p. 115).

First, the equality constraints in model Mt are replaced by approximate equalities, e.g.,

β1 = β2 ⇒ β1 ≈ β2 ⇔{β1 − β2 > −δ0β2 − β1 > −δ0.

The evaluation bound δ0 is chosen equal to the CECPP standard error of β1. Thus, theconstrained modelMt is approximated by an inequality constrained model, which is denotedby Mt,δ0 . The Bayes factor of model Mt,δ0 versus M0, i.e., B(t,δ0)0, can be estimated usingthe method of the previous section.

Subsequently, the Bayes factor is estimated between the inequality constrained model Mt,δ1

and the inequality constrained model Mt,δ0 , with δ1 = δ0/2, as the ratio of proportion ofdraws from the posterior and CECPP truncated at the inequality constraints of modelMt,δ0

that satisfy the inequality constraints of model Mt,δ1 . CECPP and posterior samples under


Figure 3: Draws from the unconstrained posterior of (β1, β2) (black), and subsequent con-strained posteriors by |β1− β2| < δw, with δa+1 = δa/2, δ0 = 0.9. The dashed line is β1 = β2.The data was generated for two groups of size 100 each with means β = (0, 1.5)> and errorvariance σ2 = 1.

the inequality constrained model Mt,δ0 can be obtained using inverse probability sampling(Gelfand et al. 1992) and the method described in Section 4.1.

Then, the Bayes factor is estimated between the inequality constrained model Mt,δa and theinequality constrained model Mt,δa−1 , with δa = δa−1/2, for a = 2, . . . , A until the Bayes

factor is approximately one, i.e., |B(t,δA)(t,δA−1) − 1| < 0.01. The Bayes factor of model Mt

versus M0 is then estimated as

Bt0 = B(t,δ0)0

A∏a=1

B(t,δa)(t,δa−1).

As an example, data was generated of size 100 for each of two groups with means µ = (0, 1.5)>

and covariance matrix Σ = I2. The model M1 : β1 = β2 was tested against M0 : β ∈ R2. InFigure 3, a sample is displayed from the unconstrained posterior of (β1, β2) (black) and fromthe subsequent posteriors of (β1, β2) under the constraint |β1−β2| < δa, for a = 0 and 1. Theresulting Bayes factor of M1 : β1 = β2 against the unconstrained model was equal to 0.

The Savage-Dickey density ratio

Consider the simple selection problem of an equality constrained model M1 : β = m versusthe unconstrained model M0 : β ∈ R1. The Bayes factor in (19) is then equal to the Savage-Dickey density ratio (Dickey, 1971), which is given by

B10 =π0(β = m|x)

π0(β = m). (26)


-1.5 -0.5 0.5 1.5

0.0

0.5

1.5

1.0

2.0

-1.5 -0.5 0.5 1.5

0.0

0.5

1.5

1.0

2.0

! !

Figure 4: Prior (dashed line) and posterior (solid line) sample density plots of a data set ofsize N = 30 and N (0, 1) (left panel) and a data set of size N = 15 and N (.7, 1) (right panel).Using the Savage-Dickey density ratio, the Bayes factors of M1 : β = 0 versus M0 : µ ∈ R1

equal B10 ≈ 2.10.4 = 5.25 (left panel) and B10 ≈ 0.07

0.29 = 0.24 (right panel).

Hence, the Bayes factor is given by the unconstrained posterior density of β evaluated at mdivided by the unconstrained prior density of β evaluated at m. Wetzels et al. (2010) providea proof that (19) converges to the Savage-Dickey density ratio for the selection problemM1,δ : |β| < δ (for some value δ > 0) versus M0 : β ∈ R1 as δ ↓ 0.

Two data sets were generated of size N = 30 with N (0, 1) and of size N = 15 with N (.7, 1).The sample density plots of the CECPP and the posterior are displayed in Figure 4. Further-more, consider the model selection problemM1 : β = 0 versusM0 : µ ∈ R1. From the figure,it can be concluded that the Bayes factor of modelM1 versusM0 is equal to B10 ≈ 2.1

0.4 = 5.25(left panel) and B10 ≈ 0.07

0.29 = 0.24 (right panel) using the Savage-Dickey density ratio in (26).The Bayes factors provided by BIEMS using the above described mechanism were equal toB10 = 5.35 (with sampling error 7.4e−2) for the first data set and B10 = 0.24 (with samplingerror 4.6e−3). Hence, the true Bayes factors lie well within the confidence intervals providedby BIEMS from which it can be concluded that the methodology can also be used for modelscontaining equality constraints between the parameters. The effectiveness of this procedurewas also shown by van Wesel et al. (2011).

5. Short examples

In this section several commonly used models are briefly discussed including two constrainedmodels of interest. In each example, the parameter matrix B, the vector β, and differ-ent constrained models are given. Also the specification of the restriction matrix is dis-cussed in each example. In Section 5.6, five empirical studies are briefly discussed whichwere analyzed using BIEMS. The analyses of these studies can be found on the internet sitehttp://tinyurl.com/informativehypotheses.

5.1. 2 by 3 ANOVA

Consider an ANOVA design with two factors a = 1, 2 and b = 1, 2, 3. The model can be



implemented in BIEMS as

B = β = (µ11, µ21, µ12, µ22, µ13, µ23)>.

Note here that the order of the µ’s in B and β depends on the coding of the groups 1 to 6in data.txt. Subsequently, the following model selection problem tests the presence of aninteraction effect in a specific direction

M1 : µ11 − µ21 > µ12 − µ22 > µ13 − µ23M2 : µ11 − µ21 = µ12 − µ22 = µ13 − µ23.

The constraints of each model needs to be separately specified in the interface. For instancefor M1, the inequality constraints are µ11 − µ21 > µ12 − µ22 and µ12 − µ22 > µ13 − µ23. Theset of model constraints is then

M1 : R1β >t r1 ⇒[

1 −1 −1 1 0 0 00 0 1 −1 −1 1 0

]β >

[00

].

ForM2, the inequality “>” is replaced by an equality sign “=”. The default set of restrictionsfor the specification of the prior distribution are equal to the row echelon forms of

R0,Mβ = r0,M ⇒[

1 −1 −1 1 0 0 00 0 1 −1 −1 1 0

]β =

[00

]for the CPP means and

R0,V β = 0⇒

1 −1 0 0 0 0 00 1 −1 0 0 0 00 0 1 −1 0 0 00 0 0 1 −1 0 00 0 0 0 1 −1 0

β =

00000

for the CPP variances. The resulting CECPP means and variances also satisfy these restric-tions.

5.2. Regression 1: Inequality constrained models

Consider a univariate regression model with five covariates, i.e.,

B = β = (µ, α1, α2, α3, α4, α5)>,

and the following inequality constrained models

M1 : α1 > α2 > α3 > α4 > α5

M2 : α1 > (α2, α3, α4, α5).

The (in)equality constraints in each model must be separately specified in the model, e.g.,α1 > α2, α1 > α3, α1 > α4, and α1 > α5 for model M2. The default set of restrictions areobtained by taking the row echelon form of the combination of the constraints of both models.


This results in an identical set of restrictions for the CPP means and variances which is givenby

0 1 0 0 0 −1 00 0 1 0 0 −1 00 0 0 1 0 −1 00 0 0 0 1 −1 0

β = 0.

Based on these restrictions, the CECPP of α1, . . . , α5 are identical. After obtaining theBayes factors B10 and B20, the Bayes factor between these two models is equal to B12 =B10/B20. The posterior model probabilities can also be computed in the interface of BIEMS(Appendix A).

When the data is not standardized, the constraints of models M1 and M2 are evaluated forunstandardized regression coefficients. Generally this is not recommendable because covari-ates may be measured on different scales. For this reason, it is recommended to standardizethe data which can be specified in Step 2 in the interface (Appendix A) and in the fileinput_biems.txt (Appendix B). As a result, the model constraints are evaluated for stan-dardized regression coefficients.

5.3. Regression 2: Variable selection

Again, consider the regression model above with five exploratory variables and the followingmodel selection problem

M1 : α2 = 0, α3 = 0, α4 = 0, α5 = 0

M2 : α3 = 0, α5 = 0.

This is a variable selection problem where model M1 only incorporates the first covariate toexplain the dependent variable, modelM2 incorporates the first, second, and fourth covariate,and the unconstrained modelM0 incorporates all five covariates. Similarly as in the previoussection, we recommend to standardize the data.

In the interface where both models are simultaneously implemented, the default set of restric-tions for the CPP means are given by

0 0 1 0 0 0 00 0 0 1 0 0 00 0 0 0 1 0 00 0 0 0 0 1 0

β = 0.

In the stand-alone version where each model is separately evaluated, this set of restrictionsmust be manually set for model M2 (for M1, these restrictions correspond with the defaultsetting). There are no restrictions for the CPP variances in the default setting, and thereforethe CECPP variances will generally differ between the α’s.

5.4. MANCOVA

Consider an experiment where three outcome variables are measured for two different groupswith one covariate, i.e.,

B =

µ11 µ12 µ13µ21 µ22 µ23α11 α12 α13

,


and β = (µ11, µ21, α11, . . . , α13)>, and the following models

M1 : µ11 > µ21, µ12 < µ22, µ13 > 2µ23 > 0

M2 : µ 6∈ M1.

Under model M1, the third measurement mean of the first group µ13 is more than twice aslarge as the measurement mean of the second group µ23, which is larger than zero. Suchconstraints can occur in practice when the µ’s are the means of measurement differences.

The default set of restrictions for the CPP means are equivalent with1 −1 0 0 0 0 0 0 0 00 0 0 −1 1 0 0 0 0 00 0 0 0 0 0 1 −2 0 00 0 0 0 0 0 0 1 0 0.

β = 0,

and for the CPP variances, the restrictions yield 1 −1 0 0 0 0 0 0 0 00 0 0 −1 1 0 0 0 0 00 0 0 0 0 0 1 −1 0 0.

β = 0,

We also recommend to standardize the explanatory variables in a MANCOVA. The Bayes fac-tor of modelM1 against its complementM2 is automatically calculated, which can be viewedby pressing the button “Detailed output” after the Bayes factors have been calculated.

5.5. Repeated measurements

A model with five repeated measurements and one time-varying covariate is modeled as

B =

µ1 µ2 µ3 µ4 µ5γ1,1 0 0 0 00 γ1,2 0 0 00 0 γ1,3 0 00 0 0 γ1,4 00 0 0 0 γ1,5

,

and β = (µ1, γ1,1, µ2, γ1,2, µ3, γ1,3, µ4, γ1,4, µ5, γ1,5)>. As an example, we consider two models

with constraints on the measurement adjusted means µ (constraints between the γ’s can bespecified in a similar way)

M1 : µ1 > µ2 > µ3 > µ4 > µ5

M2 : µ1 ≈ µ2 ≈ µ3 ≈ µ4 ≈ µ5 with δ = 0.5.

In model M2, the maximal difference between subsequent measurement means is δ = 0.5,i.e., |µp − µp+1| < 0.5⇔ µp − µp+1 > −0.5, − µp + µp+1 > −0.5, for p = 1, . . . , 4. Therefore,8 inequality constraints need to be specified for M2, namely, µ1 − µ2 > −0.5, µ2 − µ1 >−0.5, . . . , µ4−µ5 > −0.5, µ5−µ4 > −0.5. It is important that pairs of inequality constraints


that form an approximately equality must be specified next to each other. This is essentialin order to obtain proper default restrictions for the CPPs, which, in this case, are given by

1 0 −1 0 0 0 0 0 0 0 00 0 1 0 −1 0 0 0 0 0 00 0 0 0 1 0 −1 0 0 0 00 0 0 0 0 0 1 0 −1 0 0

β = 0,

for both the CPP means and CPP variances.

The time-varying covariate needs to be standardized, each response wi,1,p is standardizedbased on the sample mean w1 = (NP )−1

∑i,pwi1p and sample standard deviation sw1 =√

(NP )−1∑

i,p(wi1p − w1)2 of the time-varying covariate.

5.6. Empirical studies

Here, we briefly discuss five empirical studies which were analyzed using BIEMS. Replicationmaterials are available along with the manuscript and the data sets can also be found on thewebsite http://tinyurl.com/informativehypotheses. The first data set was discussed byKammers et al. (2009), the second data was discussed by Mulder et al. (2009), and the third,fourth, and fifth data sets were discussed by Mulder et al. (2010).

Repeated measurements 1: Rubber hand illusion

In an experiment discussed by Kammers et al. (2009b), the occluded right index finger andthe visible index finger of a rubber hand were stroked either synchronously (illusion condition)or asynchronously (control condition). After this stimulation period, one of five perceptuallocalization responses was collected for N = 14 subjects. The following constrained modelswere of interest

� M1 : µ1 = µ2 = µ3 = µ4 = µ5

� M2 : µ1 = µ2 < µ3 = µ4 = µ5

� M3 : µ1 = µ2 < µ3 = µ4 < µ5,

where µp is the measurement mean of the perceived location of the occluded hand, for p =1, . . . , 5. Under model M1, the location of the occluded hand is unaffected by the amountof new proprioceptive information. Under model M2, the perceived location of the occludedhand is affected when moving the occluded hand, although it is independent of the amount ofmovements. Under modelM3, the perceived location of the hand is influenced by the numberof movements of the occluded hand.

Repeated measurements 2: CONAMORE

The conflict and management of relationships study (CONAMORE; Meeus et al. 2004) is anongoing longitudinal study in the Netherlands. The goal of the study is to asses relationshipsof adolescents with parents and best friend as well as the emotional states of the participants.In this example, the data consisted of N = 340 Dutch participants who were measured once



a year in a period of five years (from the age of 13 until 17). This was a part of a larger dataset described in Selfhout et al. (2009).

The outcome variable was self-reported satisfaction with peer relations. There were twogroups: boys and girls (coded by j = 1 and 2 in data.txt, respectively) and one time-varyingcovariate “generalized anxiety disorder” which was measured at each wave. Four models wereanalyzed,

� M1 : µg13=µb13, µg14=µb14, µg15=µb15, µg16=µb16, µg17=µb17

� M2 : µg13>µb13, µg14>µb14, µg15>µb15, µg16>µb16, µg17>µb17

� M3 : 0 < µg13−µb13 < µg14−µb14 < µg15−µb15 < µg16−µb16 < µg17−µb17

� M4 : µg13=µb13, µg14=µb14, 0<µg15−µb15<µg16−µb16<µg17−µb17,

where µgp and µbp is the mean of self-reported satisfaction of girls and boys at the age ofp adjusted for generalized anxiety disorder, respectively. Model M1 states that there is nodifference in self-reported satisfaction between boys and girls at each wave. ModelM2 statesthat girls score their satisfaction with peer higher than boys at each wave. Model M3 statesthat girls score higher on self-reported satisfaction than boys and that the difference increasesat each wave. Finally, model M4 states that that there is no difference between boys andgirls at the age of 13 and 14, but from the age of 15 girls score higher than boys and thedifference increases at each wave.

Multivariate one-sided testing: Cell counts of HIV-positive newborn infants

In this example, researchers were interested whether ritonavir therapy had a positive effect onthe immunoreconstruction of 36 HIV-positive newborn infants (Sleasman et al. 1999). Theanalysis was limited to the cell counts of CD45RA T and CD45RO T, which were measuredat birth and after 24 weeks of therapy (Larocque & Labarre, 2004). The models underinvestigation were

� M1 : µ = 0

� M2 : µ > 0

where µ = (µ1, µ2)> which contain the mean differences between the cell counts (after 24

weeks minus at birth) of the two cell types, and 0 = (0, 0)>.

Multivariate analysis of variance: Does vinylidene fluoride cause liver damage?

Vinylidene fluoride is suspected of causing liver damage. To investigate this, 4 groups of 10male Fischer-344 rats received different dosages of this substance by inhalation. Three serumenzyme levels (SDH, SGOT, and SGPT), which often indicate liver damage, were measuredfor each rat. The data, printed in (Silvapulle and Sen 2005, p. 10), appeared in a report ofLitton Bionetics Inc. (1984).

The data can be analyzed with a MANOVA model. When denoting µj = (µj1, µj2, µj3)>

with µjp the mean of the pth serum level of the jth group, for j = 1, . . . , 4, where the dosageof vinylidene fluoride increases from group 1 to 4, the following two models were of interest:


� M1 : µ1 = µ2 = µ3 = µ4

� M2 : µ1 < µ2 < µ3 < µ4

ModelM1 assumes that vinylidene fluoride has no influence on the three serum levels. ModelM2 assumes that each serum level increases when increasing the dosage of vinylidene fluoride.

Multivariate regression: Cigarette burn and sugar percentage

The data that was provided by (Anderson and Bancroft 1952, p. 205) consisted of chemicalcomponents of 25 tobacco leaf samples consisting of six explanatory variables: nitrogen (x1),chlorine (x2), potassium (x3), phosphorus (x4), calcium (x5), and magnesium (x6), and twodependent variables: rate of cigarette burn in inches per 1,000 seconds (y1) and per cent sugarin the leaf (y2). After standardizing the dependent variables and covariates, the followingmodels with inequality and equality constraints on the standardized regression coefficientsαkp with k = 1, . . . , 6 and p = 1, 2 (inspired by the results of Bedrick and Tsai 1994) wereanalyzed

� M1 : α11 = . . . = α61, α12 = . . . = α62

� M2 : 0 < (α11, α31,−α41, α51,−α61) < −α21, and0 < (α12, α32, α42, α52,−α62) < α22,

� M3 : 0 = α11 = α41 = α51 = α61 < α31 < −α21,0 = α32 = α42 = α52 = α62 < −α12 < α22.

ModelM1 assumes all standardized regression coefficients corresponding to a dependent vari-able to be equal, modelM2 assumes that chlorine dominates the other explanatory variablesin predicting the two dependent variables and that all standardized regression coefficients areeither smaller than zero or larger than zero, and model M3 assumes that each dependentvariable is explained by only two specific explanatory variables where chlorine is dominantover the other explanatory variable.

6. Conclusions

The program that was discussed in this paper can be used to determine the Bayes factorof a multivariate normal linear model with equality and/or inequality constraints betweenthe parameters against the unconstrained model. The outcome can be interpreted as therelative amount of evidence that is provided by the data in favor of the constrained modelversus the unconstrained model. Due to the use of the CECPP (Mulder et al. 2009), which isbased on certain linear restrictions and minimal training samples, the obtained Bayes factorsappropriately balance between model fit and complexity for a broad class of equality andinequality constrained models.

A user-friendly interface (Appendix A) can be downloaded from the website http://tinyurl.com/informativehypotheses, which will allow researchers who are not familiar with thetechnical details of statistical modeling to use the proposed methodology. Also a stand-aloneversion (Appendix B) can be downloaded from this website. Because the Bayes factor of each




constrained model against the unconstrained model is computed separately in the stand-aloneversion, it is important that the same set of restrictions (Section 3.2, Appendix B), and there-fore the same CECPP, is used for each computation. In the interface, this is automaticallydone. The interface also provides posterior model probabilities of all models under investi-gation. Based on the posterior model probabilities, the constrained models can be compareddirectly with each other concerning the amount of evidence provided by the data for eachconstrained model.

Acknowledgments

This research was supported by grant NWO-VICI-453-05-002 of the Netherlands Organizationfor Scientific Research.

References

Anderson RL, Bancroft TA (1952). Statistical Theory in Research. McGraw-Hill.

Bartlett M (1957). “A Comment on D.V. Lindley’s Statistical Paradox.” Biometrika, 44,533–534.

Bedrick EJ, Tsai CL (1994). “Model Selection for Multivariate Regression in Small Samples.”Biometrics, 50, 226–231.

Berger JO, Pericchi LR (1996). “The Intrinsic Bayes Factor for Model Selection and Predic-tion.” Journal of American Statistical Association, 91, 109–122.

Gelfand AE, Smith AFM, Lee T (1992). “Bayesian Analysis of Constrained Parameter andTruncated Data Problems Using Gibbs Sampling.” Journal of the American StatisticalAssociation, 87, 523–532.

Gelman A, Carlin JB, Stern HS, Rubin DB (2004). Bayesian Data Analysis. 2nd edition.Chapman & Hall, London.

Hoijtink H, Klugkist I, Boelen PA (2008). Bayesian Evaluation of Informative Hypotheses.Springer-Verlag, New York.

Huber P (1981). Robust Statistics. John Wiley & Sons.

Jeffreys H (1961). Theory of Probability. 3rd edition. Oxford University Press, New York.

Judge GG, Griffiths WE, Hill RC, Lee TC (1980). The Theory and Practice of Econometrics.John Wiley & Sons.

Kammers MPM, de Vignemont F, Verhagen L, Dijkerman HC (2009b). “The Rubber HandIllusion in Action.” Neuropsychologica, 47, 204–211.

Kammers MPM, Mulder J, de Vignemont F, Dijkerman HC (2009). “The Weight of Repre-senting the Body: Addressing the Potentially Indefinite Number of Body Representationsin Healthy Individuals.” Experimental Brain Research, 204, 333–342.


Kass RE, Raftery AE (1995). “Bayes Factors.” Journal of American Statistical Association,90, 773–795.

Klugkist I, Hoijtink H (2007). “The Bayes Factor for Inequality and about Equality Con-strained Models.” Computational Statistics & Data Analysis, 51, 6367–6379.

Laudy O (2006). Bayesian Inequality Constrained Models for Categorical Data. Ph.D. thesis,Utrecht University.

Meeus W, Akse J, Branje S, Bogt TT, Engels R, et al CF (2004). “Codebook Conamore:Conflicts and Management of Relationships.” Utrecht University, Faculty of Social Sciences:Unpublished manuscript.

Mulder J, Hoijtink H, Klugkist I (2010). “Equality and Inequality Constrained MultivariateLinear Models: Objective Model Selection Using Constrained Posterior Priors.” Journal ofStatistical Planning and Inference, 140, 887–906.

Mulder J, Klugkist I, van de Schoot A, Meeus W, Selfhout M, Hoijtink H (2009). “BayesianModel Selection of Informative Hypotheses for Repeated Measurements.” Journal of Math-ematical Psychology, 53, 530–546.

Perez JM, Berger JO (2002). “Expected Posterior Prior Distributions for Model Selection.”Biometrika, 89, 491–511.

Press SJ (2005). Applied Multivariate Analysis: Using Bayesian and Frequentist Methods ofInference. 2nd edition. Krieger, Malabar.

Selfhout M, Branje S, Meeus W (2009). “Development Trajectories of Perceived FriendshipIntimacy, Constructive Problem Solving, and Depression from Early to Late Adolescence.”Journal of Abnormal Child Behavior, 37, 251–264.

Silvapulle MJ, Sen PK (2005). Constrained Statistical Inference: Inequality, Order, and ShapeRestrictions. 2nd edition. John Wiley & Sons.

Sleasman JW, Nelson RP, Goodenow MM, Wilfert D, Hutson A, Basslaer M, Zuckerman J,Pizzo PA, Mueller BU (1999). “Immunoreconstitution After Ritonavir Therapy in Childrenwith Human Immunodeficiency Virus Infection Involves Multiple Lymphcyte Lineages.”Journal of Pediatrics, 134, 597–606.

van Wesel F, Hoijtink H, Klugkist I (2011). “Choosing priors for inequality constrained normallinear models: Methods based on training samples.” Scandinavian Journal of Statistics,38(4), 666–690.

Visual Numerics (2003). IMSL Fortran Numerical Library.

Wetzels R, Grasman RPPP, Wagenmakers EJ (2010). “An Encompassing Prior Generalizationof the Savage-Dickey Density Ratio Test.”Computational Statistics & Data Analysis, 54(9),2094–2102.


A. User manual of the BIEMS interface

The BIEMS interface can be obtained from http://tinyurl.com/informativehypotheses.After downloading and unzipping the folder containing this software package, the executableBIEMS.exe will be found after opening the folder. The purpose of this appendix is to describehow to use the interface. Only a data file data.txt needs to be provided, all the otherinformation can be entered using the the BIEMS Windows interface. Execution of BIEMSwill render output that can be viewed within the Windows interface and will be placed in thesubfolder BIEMS_calfBF.

A.1. Data

The data in data.txt should be formatted as whitespace- or tab-separated values withoutheader and with no missing values, where each column corresponds to a variable. BIEMSuses all variables in the data file, so only the variables of interest should be included. Inaddition, the variables in the data file need to be in the correct order. Consider an examplewith three dependent variables, one explanatory variable, one time-varying covariate, andtwo groups (i.e., using the notation introduced in Section 2, P = 3, K = 1, U = 1, J = 2).As was described in Sections 2.1 and 2.2, the observed variables in the data must ordered indata.txt as follows

yi1 yi2 yi3 xi1 wi11 wi12 wi13 group

8.3 8.6 12.2 2.0 2.7 2.5 2.3 13.0 5.4 −1.0 −0.3 3.0 1.3 −0.5 27.0 9.1 9.6 1.7 2.7 2.2 1.4 1...

......

......

......

...3.3 −2.8 1.3 −1.3 3.8 −0.1 1.5 2

that is, first the dependent variables, then the explanatory variables, than the time-varyingexplanatory variables and finally the variable containing group membership. The group col-umn is required, although it does not need to be sorted. If there is no grouping in the data,it should contain all 1’s.

A.2. Step 1: Data selection

After clicking on BIEMS.exe the opening screen (Figure 5) will unfold. Clicking the buttonSelect or Generate that can be found under the label Step 1 in the upper left hand cornerof the screen, renders the choice to generate data (which is discussed in the last sectionof this manual) or to import your own data. After importing your own data, these willbe displayed in a new screen (Figure 6). In this screen you have to provide the Number of

dependent variables, Number of explanatory variables and Number of time varying

explanatory variables. After pressing Ok, a screen (Figure 7) will pop up that allows youto specify the constrained models of interest. This screen will be discussed in the next section.

A.3. Step 2: Specifying models

In the upper left hand corner of the model specification screen (Figure 7) you find a framewhich is entitled 1: Compose. The tool in this section allows you to specify model constraints.



Figure 5: Opening screen.

If a model of interest is, for example, µ(1, 1) > µ(2, 1) > µ(3, 1) > µ(4, 1) with respect to fourmeans in a one-way ANOVA, this can be specified by subsequently entering:

� µ(1,1) > µ(2,1) followed by a click on Add to list,

� µ(2,1) > µ(3,1) followed by a click on Add to list,

� µ(3,1) > µ(4,1) followed by a click on Add to list.

In the bottom left hand corner of the screen displayed in Figure 7, there is a frame 2:

List of Inequalities and Equalities. Clicking on Define as Model places the com-posed model in the top right hand corner in the frame List of Models. Once this has beendone, another model can be specified using the tool in the frame 1: Compose. To providesome examples of inequality and equality constraints one can specify:

� µ(1, 1)− µ(2, 1) = µ(3, 1)− µ(4, 1),

� µ(1, 1)− 2 ∗ µ(2, 1) + µ(3, 1) > 0,

� µ(1, 1)− µ(2, 1) > 2.5,

� 3 ∗ µ(1, 1)− µ(2, 1) + 2 > 0.


Figure 6: Data file selector screen. The model must be specified be setting the appropriatenumber of dependent variables, number of explanatory variables, and number of time-varyingexplanatory variables.

Note that the Edit and Delete buttons in the frames 2: List of Inequalities and

Equalities and 3: List of Models can be used to modify or delete constraints in frame2 and models in frame 3, respectively. As soon as the models of interest are specified thedefault prior distribution can be generated by clicking the button Generate Default Prior

which can be found in the top of the screen. Before discussing this, some additional featuresin the model specification screen will be elaborated.

A.4. Additional features in the model specification screen

In the bottom right hand corner of the model specification screen (Figure 7), the frame 4:

Settings can be found. Here the following default values for the computation of Bayes factorsand posterior model probabilities can be modified:

� Ticking the circle in front of Specify restrictions manually allows manual specifi-cation of the restrictions used to construct the prior distribution using the tool in frame1: Compose that was previously used to compose models. The restrictions that areused to specify the prior distribution are discussed in Section 3.2 of the paper.


Figure 7: Screen where the equality and inequality constrained models can be specified.

� The scale of the prior variance can manually be adjusted by ticking the circle in front ofManually. The desired scale can be entered in the box located on the right of Manually.The default value of the scale of the prior distribution is discussed in Section 3.3.

� The default MCMC sample size can be adjusted. The MCMC sample size is discussedin Section 4.1 and in Section 4.2.

� The default number of Max BF steps can be adjusted. This number of steps is denotedby A in Section 4.3.

The last item, which can be found in the lower right hand corner of this screen, is impor-tant. Here it can be indicated whether or not the dependent variables and the (time-varying)explanatory variables should be standardized. Note that regression coefficients in a (multivari-ate) multiple regression are only comparable if both the dependent and independent variablesare standardized (Section 5.2). Note also that in the context of an (M)ANCOVA, i.e., thereare groups and continuous explanatory variables in the model, the continuous predictors haveto be standardized (Section 5.4).

The button Step Back brings you back to the previous screen and deletes the information inthe current screen. The button Close will end the current session with BIEMS.


Figure 8: Default prior view which displays the properties and hyperparameters of the con-jugate expected constrained posterior prior (CECPP).

A.5. Step 3: Generate the default prior distribution

As soon as the models of interest are specified, the default prior distribution can be generatedby clicking the button Generate Default Prior which can be found in the top of the screenunder Step 3. A screen will open (Figure 8) showing the prior means and covariance matrix,prior scale and training sample size, and matrices containing the restrictions used for thecomputation of the prior means and variances. More information with respect to the contentof this screen can be found in Sections 3.3 and 3.4. The screen can be closed by clicking theOk button. Note however that it is possible to change the prior means and prior covariancematrix by typing new values for the old values.

A.6. Step 4: View/edit default prior

Pressing the button View/Edit Default Prior which can be found in the top of the screenunder Step 4, renders a return to the screen discussed in the previous section.


Figure 9: Screen where Bayes factors and posterior model probabilities are computed.

A.7. Step 4: Calculate Bayes factors

Pressing the button Calculate Bayes factors, which can be found in the top of the screenunder Step 4, leads to a screen (Figure 9) where the Bayes factors are calculated. These willbe displayed when the MCMC computation is done. For each model, detailed output can beinspected by clicking the buttons Detailed output. Furthermore, by pressing (In)equality

constraints, the (in)equality constraints of each model can be viewed. The main featuresof the output will be discussed in the next section. Finally, the posterior model probabilitieswill be computed after pressing Calculate in the appropriate frame. These will be displayedin the frame below.

A.8. Output

The results of an analysis are given in the file output_BIEMS.txt. At the top of the file underthe label ***Bayes factor***, the Bayes factor for the constrained model Mt against theunconstrained model M0 is given (Section 4), along with its standard error (Section 4.2):

Bayes factor of the constrained model M_t versus the unconstrained model M_0:

BF_{t,0} = 69.76646

standard error BF = 5.669907


If equality constraints are used for the specification ofMt, the Bayes factors for the sub steps(Section 4.3) are given as well. The values in the subBFs column are the Bayes factors forthe constrained model in each step relative to the constrained model in the previous step.The current BF values are the estimated Bayes factors Bt0 at that particular step. In theexample below, the subBF values can be seen to gradually approach the number 1.0 whichimplies that convergence is established.

subBFs current BF

15.56523 15.56523

2.633227 40.98678...

...1.064050 69.54861

1.018774 70.85430

0.9846468 69.76646

If a model only contains inequality constraints, the estimated model complexity ct and esti-mated model fit ft (Section 4.1) are provided, as well as the Bayes factor for the model againstits complement (Section 4). Under the label ***hits/BF in substeps***, the details of thecomputation of ft, ct and the Bayes factor in each substep are provided (Section 4.1 underthe label ‘More efficient estimation for small ct’).

In the output under the label ***Descriptive Statistics***, the maximum likelihood es-timate of the parameter matrix B and the least squares estimate of Σ are given (Section 2.1).Under the label ***posterior***, the posterior means and variances of B are displayed.Under the label ***Prior***, the CECPP mean βEC0 is displayed in its matrix form BEC

0 ,followed by CECPP variances which are the diagonal elements of the prior covariance matrixΨEC

0 in (17), and finally the complete covariance matrix ΨEC0 is provided. These hyperpa-

rameters are ordered in essentially the same way as the matrix B, e.g.,

CECPP mean of B

1.294 1.294 1.294 (group 1)1.294 1.294 1.294 (group 2)4.038 4.692 4.557 (covariate 1)1.354 2.134 2.808 (time-varying covariate 1)

in the case of three dependent variables, two groups, one explanatory variable and one timevarying covariate. Note that the row containing the estimates for the regression coefficientsof the time-varying covariate correspond with the estimates of the diagonal elements of thematrix G1 in Section 2.2. Finally, under the label ***Prior Restrictions***, the restric-tions that were used for the computation of the hyperparameters of the constrained posteriorpriors (CPPs) are presented. First, the restrictions for the CPP means are presented, followedby the restrictions for the CPP variances. These restrictions are elaborated in Sections 3.2and 3.3.

A.9. Step 1: Data generation

After clicking on BIEMS.exe, the opening screen (Figure 5) will unfold. Clicking the buttonSelect or Generate renders the choice to generate data or to import your own data. IfGenerate is chosen, a screen will open (Figure 10) that allows the specification of a populationfrom which data will be generated. The population is described by the multivariate normallinear model presented in Equation 1 in this paper, extended with a multivariate normal


Figure 10: Data generator screen.

distribution for the explanatory variables in each group.

In the opening screen, the Number of dependent variabels, Number of Groups and Number

of explanatory variables have to be given. By clicking the button Use Settings, a framewill be opened where the following characteristics must be given:

� The sample size of each group;

� The error covariance matrix Σ which is introduced a few lines below Equation 1;

� The group means M and the regression coefficients A which can be found in Equation 1in this paper;

� The means and covariance matrix of the explanatory variables in each group.

After all parameters are specified, two options are available: simulate a data set from thispopulation or generate a data set from the population that has sample parameters of the errorcovariance matrix, group means, regression coefficients, means and covariance matrix of theexplanatory variables that are equal to the population parameters. If the latter is desired,the box labeled Generate exact data has to be ticked before clicking the button Generate.

Data generated from a population with known properties can be useful to run tests withBIEMS, for example, to get an impression of the sample sizes needed in order to be able toevaluate the models of interest.


B. User manual of BIEMS (stand-alone version)

The purpose of this appendix is to describe the practical details of the stand-aloneversion of BIEMS which can be downloaded from the web site http://tinyurl.com/

informativehypotheses. Unlike the interface, this version only computes the Bayes fac-tor of one constrained model against the unconstrained model. It is not possible to specifymultiple constrained models. For this reason, when more than one model is under investi-gation, say M1 and M2, it is important that the same linear restrictions and therefore thesame CECPP is used when computing the Bayes factors of each constrained model againstthe unconstrained model, B10 and B20. Subsequently, the two constrained models can di-rectly be compared with each other by calculating the Bayes factor of M1 against M2 byB12 = B10/B20.

A running example will be used throughout this appendix for illustration. The BIEMSprogram consists of one executable file biems.exe, a data file data.txt, a settings fileinput_BIEMS.txt and three further input files containing the constraints of the model and therestrictions on the prior necessary for calculating the Bayes factor of the constrained modelagainst the unconstrained model (equality_constraints.txt, inequality_constraints

.txt and restriction_matrix.txt). To calculate the Bayes factors of a model, set themodel specification by modifying these text files with a plain text editor and run biems.exe.This will generate a file output_BIEMS.txt containing the output for the model. Note thatthis will overwrite previous versions of output_BIEMS.txt when present.

B.1. Data

The data in data.txt should be formatted as whitespace- or tab-separated values withoutheader and with no missing values, where each column corresponds to a variable. BIEMSuses all variables in the data file, so only the variables of interest should be included. Inaddition, the variables in the data file need to be in the correct order. Throughout thisappendix we consider an example with three dependent variables, one explanatory variable,one time-varying covariate, and two groups (i.e., P = 3, K = 1, U = 1, J = 2 when usingthe notation of Section 2.1 and 2.2). The observed variables in the data must ordered indata.txt as follows

yi1 yi2 yi3 xi1 wi11 wi12 wi13 group

8.3 8.6 12.2 2.0 2.7 2.5 2.3 13.0 5.4 −1.0 −0.3 3.0 1.3 −0.5 27.0 9.1 9.6 1.7 2.7 2.2 1.4 1...

......

......

......

...3.3 −2.8 1.3 −1.3 3.8 −0.1 1.5 2

The group column is required, although it does not need to be sorted. If there is no groupingin the data, it should contain all 1’s.

B.2. General settings

The general settings for BIEMS are specified in input_BIEMS.txt, which is divided into fiveinput lines. On the first input line, the main properties of the data are defined:




Input 1: #DV #cov #Tcov N iseed

3 1 1 80 629535

These are the number of dependent variables, the number of explanatory variables, the numberof time-varying covariates and the number of observations, P , K, U and N respectively(Sections 2.1 and 2.2). The iseed value is the seed of the random number generator, whichcan be set to any positive integer. It can also be set to -1. In that case a different randomseed is used each time.

The second input line contains the number of constraints and restrictions that are placedon the model. We consider a model with 3 inequality constraints and 4 equality constraints(given in the next section), i.e.,

Input 2: #inequalities #equalities #restrictions

3 4 -1

These values should correspond to the number of equality constraints and inequality con-straints of the model defined in the files equality_constraints.txt andinequality_constraints.txt, respectively. When the value for #restrictions is set to-1, the default set of restrictions on the prior is used which is equivalent to the row ech-elon form of the set of model constraints (see also Section 3.2). When the restrictions aremanually specified in the file restriction_matrix.txt (discussed in the following section),for instance when multiple constrained models are under investigation and the default setsof restrictions are not identical across the models, the number of restrictions should then bestated in #restrictions.

The third input line allows the user to change some default values used by BIEMS:

Input 3: sample size maxBF steps scale of CPP variance

-1 -1 -1

Each of these options are set to -1, in which case the program’s defaults are used. The firstoption specifies the number of draws used to estimate the (conditional) probabilities for eachset of two inequality constraints, which defaults to 20,000 (see also Section 4.1 and 4.2). Thesecond option gives the maximum number of steps used to approximate the Bayes factor forequality constraints (Section 4.3). By default, BIEMS uses as many steps as it takes for theBayes factors between adjacent steps to become approximately one. The third option givesthe scale of the variances of the constrained posterior prior for β. By default this is equal tothe number of free parameters under the restricted model (Section 3.3).

The fourth and fifth input lines allow the user to standardize the dependent and explanatoryvariables:

Input 4: standardize DV/IV

0 2

Input 5: covariates to standardize (if standardize IV = 2)

1 0

The first value on input line 4 indicates whether to standardize the dependent variables, thesecond value whether to standardize the explanatory variables and time-varying covariates


(0 = no, 1 = yes). The second value can also be set to 2, to indicate that only some of theexplanatory variables and/or time-varying covariates should be standardized. This is specifiedon input line 5, which should then contain a line of 0’s and 1’s of length K + U , indicatingfor each explanatory variable and time-varying covariate whether it should be standardizedor not. As specified above for the example, only the explanatory variable is standardized.When testing constraints on the regression coefficients, it is recommendable to standardizethe dependent variable(s) and the (tim-varying) explanatory variable(s), so that actually theconstraints on the standardized regression coefficients are evaluated. Also in the case of a(M)ANCOVA, we recommend to standardize the explanatory variables.

B.3. Constraints and restrictions

The constraints of the model and restrictions on the prior are set in the filesequality_constraints.txt, inequality_constraints.txt and restriction_matrix.txt.If #restrictions is set to −1 in input_BIEMS.txt, the latter file does not need to be spec-ified. The first two of these files contain the matrices [Qt|qt] and [Rt|rt], as defined in Sec-tion 2.3. All three files have the same structure, with each row corresponding to a constraintor restriction.

For the example we have

B =

DV 1 DV 2 DV 3

µ11 µ12 µ13µ21 µ22 µ23α11 α12 α13

γ11 0 00 γ12 00 0 γ13

group 1group 2expl. var. 1}

time-var. cov. 1.

The vectorization of this matrix along its columns results in the following parameter vector

β> = (µ11, µ21, α11, γ11, µ12, µ22, α12, γ12, µ13, µ23, α13, γ13).

The model under evaluation has the following constraints:

µ11 = µ12 = µ13∧ ∧ ∧µ21 = µ22 = µ23.

Thus, it is expected that the group means do not change over time and the mean of group 2is higher than the mean of group 1 at each time point. This is specified in the input files asfollows (parameter names have been added here for convenience):

equality_constraints.txt

µ11 µ21 α11 γ11 µ12 µ22 α12 γ12 µ13 µ23 α13 γ131 0 0 0 -1 0 0 0 0 0 0 0 0

0 0 0 0 1 0 0 0 -1 0 0 0 0

0 1 0 0 0 -1 0 0 0 0 0 0 0

0 0 0 0 0 1 0 0 0 -1 0 0 0


and

inequality_constraints.txt

µ11 µ21 α11 γ11 µ12 µ22 α12 γ12 µ13 µ23 α13 γ13-1 1 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 -1 1 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 -1 1 0 0 0

In this example, the default set of restrictions (i.e., #restrictions = -1 in input_BIEMS.txt)can be used for constructing the construct the conjugate expected-constrained posteriorprior (CECPP). This is equal to the row echelon form of the combination of inequality andequality constraints of the model under evaluation. When the constraints in inequality_

constraints.txt and equality_constraints.txt are combines and row operations are per-formed, the system is containing two rows with zeros. After removing these two rows, thefollowing set of restrictions is used

restriction_matrix.txt

µ11 µ21 α11 γ11 µ12 µ22 α12 γ12 µ13 µ23 α13 γ131 0 0 0 0 0 0 0 0 -1 0 0 0

0 1 0 0 0 0 0 0 0 -1 0 0 0

0 0 0 0 1 0 0 0 0 -1 0 0 0

0 0 0 0 0 1 0 0 0 -1 0 0 0

0 0 0 0 0 0 0 0 1 -1 0 0 0,

which results in equal prior distributions for µ11, . . . , µ23 (Section 3).

B.4. Output

The results of an analysis are given in the file output_BIEMS.txt. At the top of the file theBayes factor for the constrained model against the unconstrained model is given, along withits standard error (see Section 4.2):

Bayes factor of the constrained model M_t versus the unconstrained model M_0:

BF_{t,0} = 69.76646

standard error BF = 5.669907

If equality constraints were used, the Bayes factors for the substeps (see Section 3.3) aregiven as well. The values in the subBF column are the Bayes factors for the model in thatstep, relative to the model in the previous step. The current BF values are the estimatedBayes factors Bt0 for that step. For the example, the subBF values can be seen to graduallyapproach the final Bayes factor of 69.8.

subBFs current BF

15.56523 15.56523

2.633227 40.98678...

...1.064050 69.54861

1.018774 70.85430

0.9846468 69.76646


If only inequality constraints were used, the estimated model complexity ct and estimatedmodel fit ft are provided, as well as the Bayes factor for the model against its complement(Section 4). Following this are some more details for each of the substeps.

At the bottom of the file some descriptive statistics are given: the maximum likelihood (ML)estimate of the parameter matrix B (when time-varying covariates are present the restrictedML estimate of B is provided given that γ∗ = 0 (Section 2.2)), the posterior means andvariances of B, as well as the means, variances and covariances for the conjugate expected-constrained posterior prior. The values given under ***prior*** correspond to the symbolsin equation (5.15) as follows: under CECPP means of B are the values of βEC0 (in matrix formB), CECPP variance of B contains the CECPP variances of the elements in B, and CECPP

covariance matrix of beta represents the prior covariance matrix ΨEC0 of β. Parameters

for these and the earlier descriptive statistics are ordered in essentially the same way as thematrix B, e.g.,

CECPP mean of B

1.294 1.294 1.294 (group 1)1.294 1.294 1.294 (group 2)4.038 4.692 4.557 (covariate 1)1.354 2.134 2.808 (time-varying covariate 1)

where the columns represent the measurements at three different time points. As a result ofthe CECPP restrictions, the prior means of the adjusted measurement means µ are identicalin the above matrix. The complete matrix ΨEC

0 is given as CECPP covariance matrix of

beta, and the order of the elements in B is identical to the order of the elements in β = vec(B).Finally under ***Prior Restrictions***, the restriction matrices are given that were usedto obtain the CECPP means and the CECPP variances of β.

Affiliation:

Joris MulderDepartment of Methodology and StatisticsTilburg University5037 AB Tilburg, The NetherlandsE-mail: [email protected]: http://www.tilburguniversity.edu/webwijs/show/?anr=629387

Journal of Statistical Software http://www.jstatsoft.org/

published by the American Statistical Association http://www.amstat.org/

Volume 46, Issue 2 Submitted: 2010-08-21January 2012 Accepted: 2011-09-26

mailto:[email protected]

http://www.tilburguniversity.edu/webwijs/show/?anr=629387

http://www.jstatsoft.org/

http://www.amstat.org/

BIEMS: A Fortran 90 Program for Calculating Bayes Factors ... · between the model parameters, such...

Documents

Transcript of BIEMS: A Fortran 90 Program for Calculating Bayes Factors ... · between the model parameters, such...