Efficient Bayesian Marginal Likelihood Estimation in Generalised Linear Latent Variable Models
Thesis submitted by Silia Vitoratou
Athens, 2013
Advisors: Ioannis Ntzoufras, Irini Moustaki
ATHENS UNIVERSITY OF ECONOMICS AND BUSINESS, DEPARTMENT OF STATISTICS
2
Thesis structure
Overview
Chapter 1: Latent variable models: classical and Bayesian approaches
Chapter 2: Fully Bayesian latent trait models with binary responses
Chapter 3: The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Chapter 4: Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Chapter 5: Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Chapter 6: Implementation in simulated and real life datasets
Chapter 7: Discussion and future research
3
• Suppose we want to infer about concepts that cannot be measured directly (such as emotions, attitudes, perceptions, proficiency, etc.).
• We assume that they can be measured indirectly through other, observed items.
• The key idea is that all dependencies among the p manifest variables (observed items) are attributed to k latent (unobserved) ones.
• In principle, k << p. Hence, the LVM methodology is at the same time a multivariate analysis technique which aims to reduce dimensionality with as little loss of information as possible.
“...co-relation must be the consequence of the variations of the two organs being partly due to common causes...” — Francis Galton, 1888.
Key ideas and origins of the latent variable models (LVM).
Chapter 1 Latent variable models: Classical and Bayesian approaches.
4
A unified approach: Generalised linear latent variable models (GLLVM).
Chapter 1 Latent variable models: Classical and Bayesian approaches
Generalized linear latent variable model (GLLVM; Bartholomew & Knott, 1999; Skrondal and Rabe-Hesketh, 2004). The model assumes that the response variables are linear combinations of the latent ones, and it consists of three components:
(a) the multivariate random component, where each observed item Yj (j = 1, ..., p) has a distribution from the exponential family (Bernoulli, Multinomial, Normal, Gamma);
(b) the systematic component, where the latent variables Zℓ (ℓ = 1, ..., k) produce the linear predictor ηj for each Yj;
(c) the link function, which connects the previous two components.
5
A unified approach: Generalised linear latent variable models (GLLVM).
Chapter 1 Latent variable models: classical and Bayesian approaches
Special case: the generalized linear latent trait model with binary items (Moustaki & Knott, 2000).
The conditionals are in this case Bernoulli, with success probability the conditional probability of a positive response to the observed item. The logistic model is used for the response probabilities.
• The item parameters are often referred to as the difficulty and the discrimination parameters (respectively) of item j.
All examples considered in this thesis refer to multivariate IRT (2-PL) models. The current findings apply directly, or can be extended, to any type of GLLVM.
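As an illustration (not from the thesis; the item parameters below are made up), the 2-PL response probability under the logit link can be sketched in Python, with beta_j the difficulty and alpha_j the discrimination parameters:

```python
import math

def irt_2pl_prob(z, beta_j, alpha_j):
    # 2-PL response probability P(Y_j = 1 | z) with logit link:
    # eta_j = beta_j + alpha_j' z  (beta_j: difficulty/intercept,
    # alpha_j: discrimination/slopes, z: latent trait values).
    eta = beta_j + sum(a * zl for a, zl in zip(alpha_j, z))
    return 1.0 / (1.0 + math.exp(-eta))

# A two-factor item: higher latent scores raise the response probability.
p_low  = irt_2pl_prob(z=[-1.0, -1.0], beta_j=0.0, alpha_j=[1.2, 0.8])
p_high = irt_2pl_prob(z=[1.0, 1.0],  beta_j=0.0, alpha_j=[1.2, 0.8])
```

With a zero difficulty and mirrored latent scores the two probabilities are symmetric around 0.5, which is a quick sanity check on the link.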
6
A unified approach: Generalised linear latent variable models (GLLVM).
Chapter 1 Latent variable models: classical and Bayesian approaches
As only the p items can be observed, any inference must be based on their joint distribution.
All data dependencies are attributed to the existence of the latent variables. Hence, the observed variables are assumed independent given the latent ones (local independence assumption), and the conditional likelihood is integrated over the prior distribution of the latent variables. A fully Bayesian approach requires that the item parameter vector is also stochastic, associated with a prior distribution.
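Under local independence the conditional likelihood factorises over items, and the probability of a response pattern is its expectation over the latent prior. A minimal Python sketch, assuming a one-dimensional standard normal latent prior and made-up item parameters:

```python
import math, random

def cond_lik(y, z, betas, alphas):
    # f(y | z): under local independence the items are independent given the
    # latent variable, so the conditional likelihood is a product of
    # Bernoulli terms with logistic response probabilities.
    lik = 1.0
    for yj, bj, aj in zip(y, betas, alphas):
        pj = 1.0 / (1.0 + math.exp(-(bj + aj * z)))
        lik *= pj if yj == 1 else (1.0 - pj)
    return lik

def pattern_prob(y, betas, alphas, draws=20000, seed=1):
    # f(y) = E_{z ~ N(0,1)}[ f(y | z) ]: plain Monte Carlo over the
    # (here one-dimensional) standard normal latent prior.
    rng = random.Random(seed)
    return sum(cond_lik(y, rng.gauss(0.0, 1.0), betas, alphas)
               for _ in range(draws)) / draws
```

Because the conditional pattern probabilities sum to one for every latent value, the estimated pattern probabilities sum to one as well, which makes the factorisation easy to verify.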
7
The fully Bayesian analogue: GLLTM with binary items
Chapter 2 Fully Bayesian latent trait models with binary responses
A) Priors
All model parameters are assumed a priori independent.
For a unique solution, we use the Cholesky decomposition on B.
Priors from Ntzoufras et al. (2000) and Fouskakis et al. (2009).
8
The fully Bayesian analogue: GLLTM with binary items
Chapter 2 Fully Bayesian latent trait models with binary responses
B) Sampling from the posterior
• A Metropolis-within-Gibbs algorithm, initially presented for IRT models by Patz and Junker (1999), was used here for the multivariate case (k > 1).
• Each item is updated in one block, as are the latent variables for each person.
C) Model evaluation
• In this thesis, the Bayes factor (BF; Jeffreys, 1961; Kass and Raftery, 1995) was used for model comparison.
• The BF is defined as the ratio of the posterior odds of two competing models (say m1 and m2) to their corresponding prior odds. Provided that the models have equal prior probabilities, it is given by the ratio of the two models' marginal or integrated likelihoods (hereafter Bayesian marginal likelihood; BML).
9
Estimating the Bayesian marginal likelihood
Chapter 2 Fully Bayesian latent trait models with binary responses
The BML (also known as the prior predictive distribution) is defined as the expected model likelihood over the prior of the model parameters. This is quite often a high-dimensional integral, not available in closed form. Monte Carlo integration is often used to estimate it, for instance via the arithmetic mean over prior draws.
This simple estimator does not work adequately in practice, and a plethora of Markov chain Monte Carlo (MCMC) techniques are employed instead in the literature.
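As a hedged illustration of the arithmetic-mean idea (not a GLLVM; a conjugate Beta-Bernoulli model is used only because its BML is known in closed form), the prior arithmetic mean can be checked against the exact answer:

```python
import math, random

def exact_log_bml(s, n):
    # s successes in n Bernoulli trials, theta ~ Beta(1,1) = Uniform(0,1):
    # m(y) = B(s+1, n-s+1) = s! (n-s)! / (n+1)!
    return math.lgamma(s + 1) + math.lgamma(n - s + 1) - math.lgamma(n + 2)

def arithmetic_mean_log_bml(s, n, R=5000, seed=0):
    # Naive estimator: average the likelihood over R draws from the prior.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(R):
        theta = rng.random()                    # theta ~ Uniform(0, 1) prior
        total += theta**s * (1.0 - theta)**(n - s)
    return math.log(total / R)
```

In this low-dimensional example the estimator is fine; the failures referred to above arise in high dimensions, where the prior rarely hits the region where the likelihood is non-negligible.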
10
Estimating the Bayesian marginal likelihood
Chapter 2 Fully Bayesian latent trait models with binary responses
The point-based estimators (PBE) employ the candidate's identity (Besag, 1989) at a point of high density:
• Laplace-Metropolis (LM; Lewis & Raftery, 1997)
• Gaussian copula (GC; Nott et al., 2008)
• Chib & Jeliazkov (CJ; Chib & Jeliazkov, 2001)
The bridge sampling estimators (BSE) employ a bridge function, based on the form of which several BML identities (even pre-existing ones) can be derived:
• Harmonic mean (HM; Newton & Raftery, 1994)
• Reciprocal mean (RM; Gelfand & Dey, 1994)
• Bridge harmonic (BH; Meng & Wong, 1996)
• Bridge geometric (BG; Meng & Wong, 1996)
The path sampling estimators (PSE) employ a continuously differentiable path to link two unnormalised densities and compute the ratio of the corresponding constants:
• Power posteriors (PPT; Friel & Pettitt, 2008; Lartillot & Philippe, 2006)
• Stepping-stone (PPS; Xie et al., 2011)
• Generalised stepping-stone (IPS; Fan et al., 2011)
11
Monte Carlo integration: the case of GLLVM
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
From the early literature, the methods applied for parameter estimation in model settings with latent variables relied on either the joint likelihood (Lord and Novick, 1968; Lord, 1980) or the marginal likelihood (Bock and Aitkin, 1981; Moustaki and Knott, 2000). Under the conditional independence assumptions of the GLLVMs, there are two equivalent formulations of the BML, which lead to different MC estimators, namely the joint BML and the marginal BML.
12
Monte Carlo integration: the case of GLLVM
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
A motivating example
A simulated data set with p = 6 items, N = 600 cases and k = 2 factors was considered. Three popular BSE were computed under both approaches (R = 50,000 posterior observations, after a burn-in period of 10,000 and a thinning interval of 10).
• BH: largest error difference, but rather close estimates...
• BG: largest difference in the estimates, without a large error difference...
Differences are due to Monte Carlo integration under independence assumptions.
13
Monte Carlo integration: the case of GLLVM
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
The joint version of BH comes with a much higher MCE than the RM... but it is the joint version of RM that fails to converge to the true value. Why?
14
Monte Carlo integration under independence
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
• Consider any integral of the form I = ∫ g(θ) h(θ) dθ, where h is a density.
• The corresponding MC estimator is the sample mean Î = (1/R) Σ_{r=1}^{R} g(θ^{(r)}), assuming a random sample of R points θ^{(r)} drawn from h.
• The corresponding Monte Carlo error (MCE) is the standard deviation of Î, estimated by the sample standard deviation of the g(θ^{(r)}) divided by √R.
• Assume independence, that is, h(θ) = Π_i h_i(θ_i) and g(θ) = Π_i g_i(θ_i); hence the integral factorises into a product of univariate integrals, which can be estimated either jointly (averaging the products) or marginally (multiplying the averages).
15
Monte Carlo integration under independence
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
The two estimators are associated with different MCEs. Based on the early results of Goodman (1962) for the variance of a product of N independent variables, the variances of the two estimators can be written term by term. In finite settings, the difference can be substantial.
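The joint/marginal contrast can be reproduced on a toy integral (a product of independent uniform means, not a GLLVM quantity; parameters are illustrative):

```python
import random, statistics
from math import prod

def compare_mc_variances(N=5, R=100, reps=1000, seed=7):
    # Target: I = prod_i E[theta_i] = 0.5**N, with theta_i ~ Uniform(0,1)
    # independent. Two Monte Carlo estimators of the same integral:
    #   joint:    (1/R) sum_r prod_i theta_i^(r)   (average of products)
    #   marginal: prod_i (1/R) sum_r theta_i^(r)   (product of averages)
    # Both are unbiased here, but their Monte Carlo variances differ.
    rng = random.Random(seed)
    joint, marginal = [], []
    for _ in range(reps):
        draws = [[rng.random() for _ in range(R)] for _ in range(N)]
        joint.append(sum(prod(draws[i][r] for i in range(N))
                         for r in range(R)) / R)
        marginal.append(prod(sum(col) / R for col in draws))
    return statistics.variance(joint), statistics.variance(marginal)
```

With these settings, Goodman's (1962) formulas give a joint-to-marginal variance ratio of roughly 1.9, and the simulation reflects it: the joint estimator averages products, so its variance carries the product-of-moments terms in full.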
16
Monte Carlo integration under independence
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
In particular, the difference in the variances naturally depends on R. Note, however, that it also depends on:
• the dimensionality (N), since more positive terms are added, and
• the means and variances of the N variables involved.
At the same time, the difference in the means is given by the total covariation index (TCI), a multivariate extension of the covariance:
• In a finite sample the covariances, no matter how small, are non-zero, leading to a non-zero TCI.
• Under independence the index should be zero (the reverse statement does not hold).
• It also depends on the number of variables (N), their means, and their variation through the covariances.
17
Monte Carlo integration: the case of GLLVM
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
A motivating example, revisited
The total covariance cancels out for the BH. Different variables are being averaged, leading to different variance components.
18
Monte Carlo integration & independence
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Refer to Chapter 3 of the thesis for:
• more results on the error difference,
• properties of the TCI,
• the extension to conditional independence,
• and more illustrative examples.
19
Basic idea
Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Based on the work of Chib & Jeliazkov (2001), it is shown in Chapter 4 that the Metropolis kernel can be used to marginalise out any subset of the parameter vector that otherwise would not be feasible to integrate out.
• Consider the kernel of the Metropolis-Hastings algorithm, which gives the transition probability of sampling a new value given the one already generated; it combines the proposal density with the acceptance probability.
• Then, the latent vector can be marginalised out directly from the Metropolis kernel.
20
Chib & Jeliazkov estimator
Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Let us suppose that the parameter space is divided into p blocks of parameters. Then, using the multiplication rule of probability, the posterior ordinate at a specific point can be decomposed into a product of conditional ordinates.
• If the posterior ordinate is analytically available, use the candidate's formula (Besag, 1989) to compute the BML directly.
• If the full conditionals are known, Chib (1995) uses the output from the Gibbs sampler to estimate them.
• Otherwise, Chib and Jeliazkov (2001) show that each posterior ordinate can be computed from the MH output. This requires p sequential MCMC runs.
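A minimal sketch of the candidate's formula on a conjugate Beta-Bernoulli model (chosen because prior, posterior and marginal likelihood are all available in closed form; in the GLLVM setting the posterior ordinate must instead be estimated from the MH output):

```python
import math

def log_beta_fn(a, b):
    # log of the Beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_beta_pdf(x, a, b):
    # log density of Beta(a, b) at x
    return (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta_fn(a, b)

def candidates_log_ml(s, n, a=1.0, b=1.0, theta_star=0.5):
    # Candidate's formula: log m(y) = log f(y|th*) + log p(th*) - log p(th*|y),
    # for s successes in n Bernoulli trials with a Beta(a, b) prior, whose
    # posterior Beta(a+s, b+n-s) is known in closed form.
    log_lik = s * math.log(theta_star) + (n - s) * math.log(1 - theta_star)
    log_prior = log_beta_pdf(theta_star, a, b)
    log_post = log_beta_pdf(theta_star, a + s, b + n - s)
    return log_lik + log_prior - log_post
```

The identity holds at any evaluation point theta_star, which is itself a useful check: two different points must return the same value.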
21
Chib & Jeliazkov estimator for models with latent vectors
Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
The number of latent variables can be in the hundreds, if not thousands; hence the method is time consuming. Chib & Jeliazkov suggest using the last ordinate to marginalise out the latent vector, provided that it is analytically tractable (often it is not).
In Chapter 4 of the thesis, it is shown that the latent vector can be marginalised out directly from the MH kernel. Hence the dimension of the latent vector is not an issue.
This observation leads to a further result. Assuming local independence, prior independence and a Metropolis-within-Gibbs algorithm, as in the case of the GLLVM, the Chib & Jeliazkov identity is drastically simplified:
• The latent vector is marginalised out as previously.
• Moreover, even though there are p blocks for the model parameters, only the full MCMC run is required. Hence the number of blocks is not an issue either.
• The identity can also be used under data augmentation schemes that produce independence.
22
Independence Chib & Jeliazkov estimator
Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Three simulated data sets, under different scenarios, were used to compare the CJI with the ML estimators: 30 batches, with 1,000, 2,000 and 3,000 iterations per batch.
23
Some results
Chapter 6 Implementation in simulated and real life datasets
k_model = k_true
p = 6 items, N = 600 individuals, k = 1 factor
24
Some results
Chapter 6 Implementation in simulated and real life datasets
k_model = k_true
p = 6 items, N = 600 individuals, k = 2 factors
25
Some results
Chapter 6 Implementation in simulated and real life datasets
k_model = k_true
p = 8 items, N = 700 individuals, k = 3 factors
26
Some results
Chapter 6 Implementation in simulated and real life datasets
k_model < k_true
p = 6 items, N = 600 individuals, k = 1 factor
27
Some results
Chapter 6 Implementation in simulated and real life datasets
k_model > k_true
p = 6 items, N = 600 individuals, k = 2 factors
28
Concluding comments
Chapter 6 Implementation in simulated and real life datasets
More comparisons are presented in Chapter 6 of the thesis, on simulated and real data sets. Some comments:
• The BSE were successful in all examples.
  o The BG estimator was consistently associated with the smallest error.
  o The RM was also well behaved in all cases.
  o The BH was associated with more error than the former two BSE.
• The harmonic mean failed in all cases.
• The PBE are well behaved:
  o LM is very quick and efficient, but might fail if the posterior is not symmetric. Similarly for the GC.
  o CJI is well behaved but time consuming. Since it is distribution-free, it can be used as a benchmark method to get an idea of the BML.
Refer to Chapter 4 of the thesis (or see Vitoratou et al., 2013) for more details on the implementation of the CJI.
29
Thermodynamics and Bayes
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Ideas initially implemented in thermodynamics are currently explored in Bayesian model evaluation. Assume two unnormalised densities (q1 and q0); we are interested in the ratio λ of their normalising constants. For that purpose we use a continuously differentiable geometric path which links the endpoint densities, indexed by a temperature parameter. In the thermodynamic analogy, the intermediate densities are Boltzmann-Gibbs distributions, the normalising constant is the partition function, and log λ corresponds to the Bayes free energy. The ratio λ can then be computed via the thermodynamic integration identity (TI).
30
Thermodynamics and BML: Power Posteriors
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
The first application of the TI to the problem of estimating the BML is the power posteriors (PP) method (Friel and Pettitt, 2008; Lartillot and Philippe, 2006). The prior-posterior path raises the likelihood to a power t, leading to the power posterior and, via the thermodynamic integration, to the Bayesian marginal likelihood. For ts close to 0 we sample from densities close to the prior, where the variability is typically high.
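A sketch of the PP identity on a conjugate normal model (y_i ~ N(theta, 1), theta ~ N(0, 1); illustrative, not the thesis implementation). Here the power posterior is Gaussian and the expected log-likelihood is available in closed form, so only the discretisation error remains:

```python
import math

def exact_log_ml(y):
    # Closed-form log marginal likelihood for y_i ~ N(theta,1), theta ~ N(0,1).
    n, s, ss = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(n + 1)
            - 0.5 * (ss - s * s / (n + 1)))

def power_posterior_log_ml(y, n_temps=201):
    # TI identity: log m(y) = int_0^1 E_{p_t}[log f(y|theta)] dt,
    # approximated with the trapezoidal rule on a uniform temperature grid.
    n, s = len(y), sum(y)
    def mean_loglik(t):
        s2_t = 1.0 / (1.0 + t * n)      # power-posterior variance
        mu_t = t * s * s2_t             # power-posterior mean
        # E[(y_i - theta)^2] = (y_i - mu_t)^2 + s2_t, summed over items
        sq = sum((v - mu_t) ** 2 for v in y) + n * s2_t
        return -0.5 * n * math.log(2 * math.pi) - 0.5 * sq
    ts = [i / (n_temps - 1) for i in range(n_temps)]
    vals = [mean_loglik(t) for t in ts]
    h = ts[1] - ts[0]
    return h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])
```

In a real GLLVM the inner expectation would be a Monte Carlo average over MCMC draws from each power posterior; here it is exact, which isolates the discretisation error discussed later.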
31
Thermodynamics and BML: Importance Posteriors
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Lefebvre et al. (2010) considered options other than the prior for the zero endpoint, keeping the unnormalised posterior at the unit endpoint. Any proper density g(·) will do. An appealing option is to use an importance (envelope) function, that is, a density as close as possible to the posterior. This importance-posterior path leads to the importance posterior. For ts close to 0 we sample from densities close to the importance function, solving the problem of high variability.
32
An alternative approach: stepping-stone identities
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Xie et al. (2011), using the prior and the posterior as endpoint densities, considered a different approach to compute the BML, also related to thermodynamics (Neal, 1993). First, the interval [0,1] is partitioned into n points, and the free energy is computed as a sum of stepping-stone ratios.
• Under the power posteriors path, Xie et al. (2011) showed how the BML is obtained.
• Under the importance posteriors path, Fan et al. (2011) showed how the BML is obtained.
However, the stepping-stone identity (SI) is even more general and can be used under different paths, as an alternative to the TI.
33
Path sampling identities for the BML, revisited
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Hence, there are two general identities to compute a ratio of normalising constants within the path sampling framework, namely the TI and the SI. Different paths lead to different expressions for the BML:

Path                  | TI identity                                   | SI identity
Prior-posterior       | Power posteriors (PPT): Friel and Pettitt,    | Stepping-stone (PPS):
                      | 2008; Lartillot and Philippe, 2006            | Xie et al., 2011
Importance-posterior  | Importance posteriors (IPT): inspired by      | Generalised stepping-stone (IPS):
                      | Lefebvre et al., 2010                         | Fan et al., 2011

Other paths can be used, under both approaches, to derive identities for the BML or any other ratio of normalising constants. Hereafter, the identities will be named by the path employed, with a subscript denoting the method implemented, e.g. IPS.
34
Thermodynamics & direct BF identities: Model switching
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Lartillot and Philippe (2006) considered as endpoint densities the unnormalised posteriors of two competing models, leading to the model switching path and, via the thermodynamic integration, to the Bayes factor, using a bidirectional melting-annealing sampling scheme. It is also easy to derive the SI counterpart expression.
35
Thermodynamics & direct BF identities: Quadrivials
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Based on the idea of Lartillot and Philippe (2006), we may proceed with compound paths, which consist of:
• a hyper, geometric path, which links two competing models, and
• a nested, geometric path for each endpoint function Qi, i = 0, 1.
The two intersecting paths form a quadrivial, which can be used with either the TI or the SI approach. If the ratio of interest is the BF, the two BMLs should be derived at the endpoints of [0,1]. The PP and the IP paths are natural choices for the nested part of the identity.
36
Sources of error in path sampling estimators
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
a) The integral over [0,1] in the TI is typically approximated via numerical approaches, such as the trapezoidal or Simpson's rule (Neal, 1993; Gelman and Meng, 1998), which require an n-point discretisation of [0,1] (the temperature schedule). Note that the temperature schedule is also required for the SI method (it defines the stepping-stone ratios). The discretisation introduces error to the TI and SI estimators, referred to as the discretisation error. It can be reduced by (i) increasing the number of points n and/or (ii) assigning more points closer to the endpoint that is associated with higher variability.
b) At each point of the schedule, a separate MCMC run is performed with the corresponding intermediate density as target distribution. Hence, Monte Carlo error also occurs at each run.
c) A third source of error is the path-related error.
We may gain insight into a) and c) by considering the measures of entropy related to the TI.
37
Performance: Pine data-a simple regression example
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Measurements were taken on 42 specimens. A linear regression model was fitted for each specimen's maximum compressive strength (y), using density (x) as the independent variable.
The objective in this example is to illustrate how each method and path combination responds to prior uncertainty. To do so, we use three different prior schemes.
The ratios of the corresponding BMLs under the three priors were estimated over n1 = 50 and n2 = 100 evenly spaced temperatures. At each temperature, a Gibbs algorithm was implemented and 30,000 posterior observations were generated, after discarding 5,000 as a burn-in period.
38
Performance: Pine data-a simple regression example
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Implementing a uniform temperature schedule:
• Reflects the difference in the discretisation error.
• Reflects the difference in the path-related error.
• All quadrivials come with smaller batch mean error.
Note: PP works just fine under a geometric temperature schedule that samples more points near the prior.
39
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Based on the prior-posterior path, Friel and Pettitt (2008) and Lefebvre et al. (2010) showed that the PP method is connected with the Kullback-Leibler divergence (KL; Kullback & Leibler, 1951). Here we present their findings in a general form, that is, for any geometric path: according to the TI, it holds that the log-ratio of normalising constants can be expressed in terms of the relative entropy (the KL divergence), the differential and cross entropies, and the symmetrised KL divergence.
40
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Graphical representation of the TI
What about the intermediate points?
41
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
The TI minus the free energy at each point: instead of integrating the mean energy over the entire interval [0,1], there is an optimal temperature at which the mean energy equals the free energy.
42
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Graphical representation of the NTI
Functional KL: the difference in the KL distance of the sampling distribution pt from p1 and from p0.
The ratio of interest occurs at the point where the sampling distribution is equidistant from the endpoint densities.
43
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
The normalised thermodynamic integral
The sampling distribution pt is the Boltzmann-Gibbs distribution pertaining to the Hamiltonian (energy function). Therefore, according to the NTI, when geometric paths are employed, the free energy occurs at the point where the Boltzmann-Gibbs distribution is equidistant from the distributions at the endpoint states. Hence:
• According to the PPT method, the BML occurs at the point where the sampling distribution is equidistant from the prior and the posterior.
• According to the QMST method, the BF occurs at the point where the sampling distribution is equidistant from the two posteriors.
44
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Graphical representation of the NTI
What do the areas stand for?
45
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
The normalised thermodynamic integral and probability distribution divergencies
A key observation here is that the sampling distribution embodies the Chernoff coefficient (Chernoff, 1952). Based on that, the NTI can be rewritten so that the areas correspond to the Chernoff t-divergence. At t = t*, we obtain the so-called Chernoff information.
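For two equal-variance normals the Chernoff coefficient has a closed form, which makes a convenient numerical check (illustrative only; the distributions and parameters below are made up):

```python
import math

def chernoff_coefficient(mu0, mu1, sigma, t, lo=-20.0, hi=20.0, n=4001):
    # Numerical Chernoff coefficient c_t = integral of p0(x)^(1-t) p1(x)^t dx
    # for N(mu0, sigma^2) and N(mu1, sigma^2), via the trapezoidal rule.
    def pdf(x, mu):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (
            sigma * math.sqrt(2 * math.pi))
    h = (hi - lo) / (n - 1)
    vals = [pdf(lo + i * h, mu0) ** (1 - t) * pdf(lo + i * h, mu1) ** t
            for i in range(n)]
    return h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])

def chernoff_divergence_closed(mu0, mu1, sigma, t):
    # Equal-variance normals: -log c_t = t(1-t)(mu1-mu0)^2 / (2 sigma^2)
    return t * (1 - t) * (mu1 - mu0) ** 2 / (2 * sigma ** 2)
```

At t = 0.5 the Chernoff t-divergence reduces to the Bhattacharyya distance, one of the f-divergences listed below.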
46
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Using the output from path sampling, the Chernoff divergence can be computed easily (see Chapter 5 of the thesis for a step-by-step algorithm). Along with the Chernoff estimate, a number of other f-divergences can be directly estimated, namely:
• the Bhattacharyya distance (Bhattacharyya, 1943) at t = 0.5,
• the Hellinger distance (Bhattacharyya, 1943; Hellinger, 1909),
• the Rényi t-divergence (Rényi, 1961), and
• the Tsallis t-relative entropy (Tsallis, 2001).
These measures of entropy are commonly used in:
• information theory, pattern recognition, cryptography, machine learning,
• hypothesis testing,
• and, recently, non-equilibrium thermodynamics.
47
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Measures of entropy and the NTI
48
Path selection, temperature schedule and error.
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
These results also provide insight into the error of the path sampling estimators. To begin with, Lefebvre et al. (2010) showed that the total variance is associated with the J-divergence of the endpoint densities, and therefore with the choice of the path. Graphically:
• the J-distance coincides with the slope of the secant defined at the endpoint densities;
• the slope of the tangent at a particular point ti coincides with the local variance;
• the graphical representation of two competing paths provides information about the estimators' variances.
The shape of the curve is a graphical representation of the total variance: local variances are higher at the points where the curve is steeper. Paths with smaller cliffs are easier to take!
49
Path selection, temperature schedule and error.
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Numerical approximation of the TI:
• Different levels of accuracy towards the two endpoints.
• The discretisation error depends primarily on the path.
• Assign more tis at points where the curve is steeper (higher local variances).
50
Future work
• Currently developing an R library for BML estimation in the GLLTM, with Danny Arends; expand the results (and the R library) to account for other types of data.
• Further study of the TCI (Chapter 3).
• Use the ideas in Chapter 4 to construct a better Metropolis algorithm for GLLVMs.
• Proceed further with the ideas presented in Chapter 5, with regard to the quadrivials, the temperature schedule and the optimal t*. Explore applications to information criteria.
51
Bibliography
Bartholomew, D. and Knott, M. (1999). Latent variable models and factor analysis. Kendall's Library of Statistics, 7. Wiley.
Besag, J. (1989). A candidate's formula: A curious result in Bayesian prediction. Biometrika, 76:183.
Bhattacharyya, A. (1943). On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, 35:99-109.
Bock, R. and Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46:443-459.
Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4).
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90:1313-1321.
Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96:270-281.
Fan, Y., Wu, R., Chen, M., Kuo, L., and Lewis, P. (2011). Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution, 28(2):523-532.
Fouskakis, D., Ntzoufras, I., and Draper, D. (2009). Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of healthcare. Annals of Applied Statistics, 3:663-690.
Friel, N. and Pettitt, N. (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society Series B (Statistical Methodology), 70(3):589-607.
Gelfand, A. E. and Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society. Series B (Methodological), 56(3):501-514.
Gelman, A. and Meng, X. (1998). Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163-185.
Goodman, L. A. (1962). The variance of the product of K random variables. Journal of the American Statistical Association, 57:54-60.
Hellinger, E. (1909). Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik, 136:210-271.
Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 186(1007):453-461.
Kass, R. and Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90:773-795.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22:49-86.
Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, 55:195-207.
Lefebvre, G., Steele, R., and Vandal, A. C. (2010). A path sampling identity for computing the Kullback-Leibler and J divergences. Computational Statistics and Data Analysis, 54(7):1719-1731.
Lewis, S. and Raftery, A. (1997). Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator. Journal of the American Statistical Association, 92:648-655.
Lord, F. M. (1980). Applications of Item Response Theory to practical testing problems. Erlbaum Associates, Hillsdale, NJ.
Lord, F. M. and Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley, Oxford, UK.
Meng, X.-L. and Wong, W.-H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica Sinica, 6:831-860.
Moustaki, I. and Knott, M. (2000). Generalized latent trait models. Psychometrika, 65:391-411.
Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, University of Toronto.
Newton, M. and Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, 56:3-48.
Nott, D., Kohn, R., and Fielding, M. (2008). Approximating the marginal likelihood using copula. arXiv:0810.5474v1. Available at http://arxiv.org/abs/0810.5474v1
Ntzoufras, I., Dellaportas, P., and Forster, J. (2000). Bayesian variable and link determination for generalised linear models. Journal of Statistical Planning and Inference, 111(1-2):165-180.
Patz, R. J. and Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2):146-178.
Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128:301-323.
Raftery, A. and Banfield, J. (1991). Stopping the Gibbs sampler, the use of morphology, and other issues in spatial statistics. Annals of the Institute of Statistical Mathematics, 43(430):32-43.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Paedagogiske Institut, Copenhagen.
Rényi, A. (1961). On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pages 547-561.
Tsallis, C. (2001). In Nonextensive Statistical Mechanics and Its Applications, edited by S. Abe and Y. Okamoto. Springer-Verlag, Heidelberg.
Vitoratou, S., Ntzoufras, I., and Moustaki, I. (2013). Marginal likelihood estimation from the Metropolis output: tips and tricks for efficient implementation in generalized linear latent variable models. To appear in: Journal of Statistical Computation and Simulation.
Xie, W., Lewis, P., Fan, Y., Kuo, L., and Chen, M. (2011). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60(2):150-160.
This thesis is dedicated to