Efficient Bayesian Marginal Likelihood Estimation in Generalised Linear Latent Variable Models
Thesis submitted by Silia Vitoratou
Athens, 2013
Advisors: Ioannis Ntzoufras, Irini Moustaki
ATHENS UNIVERSITY OF ECONOMICS AND BUSINESS, DEPARTMENT OF STATISTICS
2
Thesis structure
Overview
Chapter 1: Latent variable models: classical and Bayesian approaches
Chapter 2: Fully Bayesian latent trait models with binary responses
Chapter 3: The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Chapter 4: Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Chapter 5: Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Chapter 6: Implementation in simulated and real life datasets
Chapter 7: Discussion and future research
3
• Suppose we want to infer about concepts that cannot be measured directly (such as emotions, attitudes, perceptions, proficiency, etc.).
• We assume that they can be measured indirectly through other, observed items.
• The key idea is that all dependencies among the p manifest variables (observed items) are attributed to k latent (unobserved) ones.
• In principle, k << p. Hence, the LVM methodology is at the same time a multivariate analysis technique which aims to reduce dimensionality with as little loss of information as possible.
“...co-relation must be the consequence of the variations of the two organs being partly due to common causes...” — Francis Galton, 1888.
Key ideas and origins of the latent variable models (LVM).
Chapter 1 Latent variable models: Classical and Bayesian approaches.
4
A unified approach: Generalised linear latent variable models (GLLVM).
Chapter 1 Latent variable models: Classical and Bayesian approaches
Generalized linear latent variable model (GLLVM; Bartholomew & Knott, 1999; Skrondal and Rabe-Hesketh, 2004). The model assumes that the response variables are linear combinations of the latent ones, and it consists of three components:
(a) the multivariate random component, where each observed item Yj (j = 1, ..., p) has a distribution from the exponential family (Bernoulli, Multinomial, Normal, Gamma);
(b) the systematic component, where the latent variables Zℓ (ℓ = 1, ..., k) produce the linear predictor ηj for each Yj;
(c) the link function, which connects the previous two components.
5
A unified approach: Generalised linear latent variable models (GLLVM).
Chapter 1 Latent variable models: classical and Bayesian approaches
Special case: the generalized linear latent trait model with binary items (Moustaki & Knott, 2000).
The conditionals are in this case Bernoulli, with success probability the conditional probability of a positive response to the observed item. The logistic model is used for the response probabilities.
• The item parameters are often referred to as the difficulty and the discrimination parameters (respectively) of item j.
All examples considered in this thesis refer to multivariate IRT (2-PL) models. The current findings apply directly, or can be extended, to any type of GLLVM.
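As an illustration (not from the thesis; the item parameters below are made up), the 2-PL response probability under the logit link can be sketched in Python, with beta_j the difficulty and alpha_j the discrimination parameters:

```python
import math

def irt_2pl_prob(z, beta_j, alpha_j):
    # 2-PL response probability P(Y_j = 1 | z) with logit link:
    # eta_j = beta_j + alpha_j' z  (beta_j: difficulty/intercept,
    # alpha_j: discrimination/slopes, z: latent trait values).
    eta = beta_j + sum(a * zl for a, zl in zip(alpha_j, z))
    return 1.0 / (1.0 + math.exp(-eta))

# A two-factor item: higher latent scores raise the response probability.
p_low  = irt_2pl_prob(z=[-1.0, -1.0], beta_j=0.0, alpha_j=[1.2, 0.8])
p_high = irt_2pl_prob(z=[1.0, 1.0],  beta_j=0.0, alpha_j=[1.2, 0.8])
```

With a zero difficulty and mirrored latent scores the two probabilities are symmetric around 0.5, which is a quick sanity check on the link.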
6
A unified approach: Generalised linear latent variable models (GLLVM).
Chapter 1 Latent variable models: classical and Bayesian approaches
As only the p items can be observed, any inference must be based on their joint distribution.
All data dependencies are attributed to the existence of the latent variables. Hence, the observed variables are assumed independent given the latent ones (local independence assumption), and the conditional likelihood is integrated over the prior distribution of the latent variables. A fully Bayesian approach requires that the item parameter vector is also stochastic, associated with a prior distribution.
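Under local independence the conditional likelihood factorises over items, and the probability of a response pattern is its expectation over the latent prior. A minimal Python sketch, assuming a one-dimensional standard normal latent prior and made-up item parameters:

```python
import math, random

def cond_lik(y, z, betas, alphas):
    # f(y | z): under local independence the items are independent given the
    # latent variable, so the conditional likelihood is a product of
    # Bernoulli terms with logistic response probabilities.
    lik = 1.0
    for yj, bj, aj in zip(y, betas, alphas):
        pj = 1.0 / (1.0 + math.exp(-(bj + aj * z)))
        lik *= pj if yj == 1 else (1.0 - pj)
    return lik

def pattern_prob(y, betas, alphas, draws=20000, seed=1):
    # f(y) = E_{z ~ N(0,1)}[ f(y | z) ]: plain Monte Carlo over the
    # (here one-dimensional) standard normal latent prior.
    rng = random.Random(seed)
    return sum(cond_lik(y, rng.gauss(0.0, 1.0), betas, alphas)
               for _ in range(draws)) / draws
```

Because the conditional pattern probabilities sum to one for every latent value, the estimated pattern probabilities sum to one as well, which makes the factorisation easy to verify.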
7
The fully Bayesian analogue: GLLTM with binary items
Chapter 2 Fully Bayesian latent trait models with binary responses
A) Priors
All model parameters are assumed a priori independent.
For a unique solution, we use the Cholesky decomposition on B.
Priors from Ntzoufras et al. (2000) and Fouskakis et al. (2009).
8
The fully Bayesian analogue: GLLTM with binary items
Chapter 2 Fully Bayesian latent trait models with binary responses
B) Sampling from the posterior
• A Metropolis-within-Gibbs algorithm, initially presented for IRT models by Patz and Junker (1999), was used here for the multivariate case (k > 1).
• Each item is updated in one block, as are the latent variables for each person.
C) Model evaluation
• In this thesis, the Bayes factor (BF; Jeffreys, 1961; Kass and Raftery, 1995) was used for model comparison.
• The BF is defined as the ratio of the posterior odds of two competing models (say m1 and m2) to their corresponding prior odds. Provided that the models have equal prior probabilities, it is given by the ratio of the two models' marginal or integrated likelihoods (hereafter Bayesian marginal likelihood; BML).
9
Estimating the Bayesian marginal likelihood
Chapter 2 Fully Bayesian latent trait models with binary responses
The BML (also known as the prior predictive distribution) is defined as the expected model likelihood over the prior of the model parameters. This is quite often a high-dimensional integral, not available in closed form. Monte Carlo integration is often used to estimate it, for instance via the arithmetic mean over prior draws.
This simple estimator does not work adequately in practice, and a plethora of Markov chain Monte Carlo (MCMC) techniques are employed instead in the literature.
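As a hedged illustration of the arithmetic-mean idea (not a GLLVM; a conjugate Beta-Bernoulli model is used only because its BML is known in closed form), the prior arithmetic mean can be checked against the exact answer:

```python
import math, random

def exact_log_bml(s, n):
    # s successes in n Bernoulli trials, theta ~ Beta(1,1) = Uniform(0,1):
    # m(y) = B(s+1, n-s+1) = s! (n-s)! / (n+1)!
    return math.lgamma(s + 1) + math.lgamma(n - s + 1) - math.lgamma(n + 2)

def arithmetic_mean_log_bml(s, n, R=5000, seed=0):
    # Naive estimator: average the likelihood over R draws from the prior.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(R):
        theta = rng.random()                    # theta ~ Uniform(0, 1) prior
        total += theta**s * (1.0 - theta)**(n - s)
    return math.log(total / R)
```

In this low-dimensional example the estimator is fine; the failures referred to above arise in high dimensions, where the prior rarely hits the region where the likelihood is non-negligible.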
10
Estimating the Bayesian marginal likelihood
Chapter 2 Fully Bayesian latent trait models with binary responses
The point-based estimators (PBE) employ the candidate's identity (Besag, 1989) at a point of high density:
• Laplace-Metropolis (LM; Lewis & Raftery, 1997)
• Gaussian copula (GC; Nott et al., 2008)
• Chib & Jeliazkov (CJ; Chib & Jeliazkov, 2001)
The bridge sampling estimators (BSE) employ a bridge function, based on the form of which several BML identities (even pre-existing ones) can be derived:
• Harmonic mean (HM; Newton & Raftery, 1994)
• Reciprocal mean (RM; Gelfand & Dey, 1994)
• Bridge harmonic (BH; Meng & Wong, 1996)
• Bridge geometric (BG; Meng & Wong, 1996)
The path sampling estimators (PSE) employ a continuously differentiable path to link two unnormalised densities and compute the ratio of the corresponding constants:
• Power posteriors (PPT; Friel & Pettitt, 2008; Lartillot & Philippe, 2006)
• Stepping-stone (PPS; Xie et al., 2011)
• Generalised stepping-stone (IPS; Fan et al., 2011)
11
Monte Carlo integration: the case of GLLVM
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
From the early literature, the methods applied for parameter estimation in model settings with latent variables relied on either the joint likelihood (Lord and Novick, 1968; Lord, 1980) or the marginal likelihood (Bock and Aitkin, 1981; Moustaki and Knott, 2000). Under the conditional independence assumptions of the GLLVMs, there are two equivalent formulations of the BML, which lead to different MC estimators, namely the joint BML and the marginal BML.
12
Monte Carlo integration: the case of GLLVM
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
A motivating example
A simulated data set with p = 6 items, N = 600 cases and k = 2 factors was considered. Three popular BSE were computed under both approaches (R = 50,000 posterior observations, after a burn-in period of 10,000 and a thinning interval of 10).
• BH: largest error difference, but rather close estimates...
• BG: largest difference in the estimates, without a large error difference...
Differences are due to Monte Carlo integration under independence assumptions.
13
Monte Carlo integration: the case of GLLVM
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
The joint version of BH comes with a much higher MCE than the RM... but it is the joint version of RM that fails to converge to the true value. Why?
14
Monte Carlo integration under independence
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
• Consider any integral of the form I = ∫ g(θ) h(θ) dθ, where h is a density.
• The corresponding MC estimator is the sample mean Î = (1/R) Σ_{r=1}^{R} g(θ^{(r)}), assuming a random sample of R points θ^{(r)} drawn from h.
• The corresponding Monte Carlo error (MCE) is the standard deviation of Î, estimated by the sample standard deviation of the g(θ^{(r)}) divided by √R.
• Assume independence, that is, h(θ) = Π_i h_i(θ_i) and g(θ) = Π_i g_i(θ_i); hence the integral factorises into a product of univariate integrals, which can be estimated either jointly (averaging the products) or marginally (multiplying the averages).
15
Monte Carlo integration under independence
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
The two estimators are associated with different MCEs. Based on the early results of Goodman (1962) for the variance of a product of N independent variables, the variances of the two estimators can be written term by term. In finite settings, the difference can be substantial.
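The joint/marginal contrast can be reproduced on a toy integral (a product of independent uniform means, not a GLLVM quantity; parameters are illustrative):

```python
import random, statistics
from math import prod

def compare_mc_variances(N=5, R=100, reps=1000, seed=7):
    # Target: I = prod_i E[theta_i] = 0.5**N, with theta_i ~ Uniform(0,1)
    # independent. Two Monte Carlo estimators of the same integral:
    #   joint:    (1/R) sum_r prod_i theta_i^(r)   (average of products)
    #   marginal: prod_i (1/R) sum_r theta_i^(r)   (product of averages)
    # Both are unbiased here, but their Monte Carlo variances differ.
    rng = random.Random(seed)
    joint, marginal = [], []
    for _ in range(reps):
        draws = [[rng.random() for _ in range(R)] for _ in range(N)]
        joint.append(sum(prod(draws[i][r] for i in range(N))
                         for r in range(R)) / R)
        marginal.append(prod(sum(col) / R for col in draws))
    return statistics.variance(joint), statistics.variance(marginal)
```

With these settings, Goodman's (1962) formulas give a joint-to-marginal variance ratio of roughly 1.9, and the simulation reflects it: the joint estimator averages products, so its variance carries the product-of-moments terms in full.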
16
Monte Carlo integration under independence
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
In particular, the difference in the variances naturally depends on R. Note, however, that it also depends on:
• the dimensionality (N), since more positive terms are added, and
• the means and variances of the N variables involved.
At the same time, the difference in the means is given by the total covariation index (TCI), a multivariate extension of the covariance:
• In a finite sample the covariances, no matter how small, are non-zero, leading to a non-zero TCI.
• Under independence the index should be zero (the reverse statement does not hold).
• It also depends on the number of variables (N), their means, and their variation through the covariances.
17
Monte Carlo integration: the case of GLLVM
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
A motivating example, revisited
The total covariance cancels out for the BH. Different variables are being averaged, leading to different variance components.
18
Monte Carlo integration & independence
Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Refer to Chapter 3 of the thesis for:
• more results on the error difference,
• properties of the TCI,
• the extension to conditional independence,
• and more illustrative examples.
19
Basic idea
Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Based on the work of Chib & Jeliazkov (2001), it is shown in Chapter 4 that the Metropolis kernel can be used to marginalise out any subset of the parameter vector that otherwise would not be feasible to integrate out.
• Consider the kernel of the Metropolis-Hastings algorithm, which gives the transition probability of sampling a new value given the one already generated; it combines the proposal density with the acceptance probability.
• Then, the latent vector can be marginalised out directly from the Metropolis kernel.
20
Chib & Jeliazkov estimator
Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Let us suppose that the parameter space is divided into p blocks of parameters. Then, using the multiplication rule of probability, the posterior ordinate at a specific point can be decomposed into a product of conditional ordinates.
• If the posterior ordinate is analytically available, use the candidate's formula (Besag, 1989) to compute the BML directly.
• If the full conditionals are known, Chib (1995) uses the output from the Gibbs sampler to estimate them.
• Otherwise, Chib and Jeliazkov (2001) show that each posterior ordinate can be computed from the MH output. This requires p sequential MCMC runs.
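A minimal sketch of the candidate's formula on a conjugate Beta-Bernoulli model (chosen because prior, posterior and marginal likelihood are all available in closed form; in the GLLVM setting the posterior ordinate must instead be estimated from the MH output):

```python
import math

def log_beta_fn(a, b):
    # log of the Beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_beta_pdf(x, a, b):
    # log density of Beta(a, b) at x
    return (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta_fn(a, b)

def candidates_log_ml(s, n, a=1.0, b=1.0, theta_star=0.5):
    # Candidate's formula: log m(y) = log f(y|th*) + log p(th*) - log p(th*|y),
    # for s successes in n Bernoulli trials with a Beta(a, b) prior, whose
    # posterior Beta(a+s, b+n-s) is known in closed form.
    log_lik = s * math.log(theta_star) + (n - s) * math.log(1 - theta_star)
    log_prior = log_beta_pdf(theta_star, a, b)
    log_post = log_beta_pdf(theta_star, a + s, b + n - s)
    return log_lik + log_prior - log_post
```

The identity holds at any evaluation point theta_star, which is itself a useful check: two different points must return the same value.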
21
Chib & Jeliazkov estimator for models with latent vectors
Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
The number of latent variables can be in the hundreds, if not thousands; hence the method is time consuming. Chib & Jeliazkov suggest using the last ordinate to marginalise out the latent vector, provided that it is analytically tractable (often it is not).
In Chapter 4 of the thesis, it is shown that the latent vector can be marginalised out directly from the MH kernel. Hence the dimension of the latent vector is not an issue.
This observation leads to a further result. Assuming local independence, prior independence and a Metropolis-within-Gibbs algorithm, as in the case of the GLLVM, the Chib & Jeliazkov identity is drastically simplified:
• The latent vector is marginalised out as previously.
• Moreover, even though there are p blocks for the model parameters, only the full MCMC run is required. Hence the number of blocks is not an issue either.
• The identity can also be used under data augmentation schemes that produce independence.
22
Independence Chib & Jeliazkov estimator
Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Three simulated data sets, under different scenarios, were used to compare the CJI with the ML estimators: 30 batches, with 1,000, 2,000 and 3,000 iterations per batch.
23
Some results
Chapter 6 Implementation in simulated and real life datasets
k_model = k_true
p = 6 items, N = 600 individuals, k = 1 factor
24
Some results
Chapter 6 Implementation in simulated and real life datasets
k_model = k_true
p = 6 items, N = 600 individuals, k = 2 factors
25
Some results
Chapter 6 Implementation in simulated and real life datasets
k_model = k_true
p = 8 items, N = 700 individuals, k = 3 factors
26
Some results
Chapter 6 Implementation in simulated and real life datasets
k_model < k_true
p = 6 items, N = 600 individuals, k = 1 factor
27
Some results
Chapter 6 Implementation in simulated and real life datasets
k_model > k_true
p = 6 items, N = 600 individuals, k = 2 factors
28
Concluding comments
Chapter 6 Implementation in simulated and real life datasets
More comparisons are presented in Chapter 6 of the thesis, on simulated and real data sets. Some comments:
• The BSE were successful in all examples.
  o The BG estimator was consistently associated with the smallest error.
  o The RM was also well behaved in all cases.
  o The BH was associated with more error than the former two BSE.
• The harmonic mean failed in all cases.
• The PBE are well behaved:
  o LM is very quick and efficient, but might fail if the posterior is not symmetric. Similarly for the GC.
  o CJI is well behaved but time consuming. Since it is distribution-free, it can be used as a benchmark method to get an idea of the BML.
Refer to Chapter 4 of the thesis (or see Vitoratou et al., 2013) for more details on the implementation of the CJI.
29
Thermodynamics and Bayes
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Ideas initially implemented in thermodynamics are currently explored in Bayesian model evaluation. Assume two unnormalised densities (q1 and q0); we are interested in the ratio λ of their normalising constants. For that purpose we use a continuously differentiable geometric path which links the endpoint densities, indexed by a temperature parameter. In the thermodynamic analogy, the intermediate densities are Boltzmann-Gibbs distributions, the normalising constant is the partition function, and log λ corresponds to the Bayes free energy. The ratio λ can then be computed via the thermodynamic integration identity (TI).
30
Thermodynamics and BML: Power Posteriors
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
The first application of the TI to the problem of estimating the BML is the power posteriors (PP) method (Friel and Pettitt, 2008; Lartillot and Philippe, 2006). The prior-posterior path raises the likelihood to a power t, leading to the power posterior and, via the thermodynamic integration, to the Bayesian marginal likelihood. For ts close to 0 we sample from densities close to the prior, where the variability is typically high.
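A sketch of the PP identity on a conjugate normal model (y_i ~ N(theta, 1), theta ~ N(0, 1); illustrative, not the thesis implementation). Here the power posterior is Gaussian and the expected log-likelihood is available in closed form, so only the discretisation error remains:

```python
import math

def exact_log_ml(y):
    # Closed-form log marginal likelihood for y_i ~ N(theta,1), theta ~ N(0,1).
    n, s, ss = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(n + 1)
            - 0.5 * (ss - s * s / (n + 1)))

def power_posterior_log_ml(y, n_temps=201):
    # TI identity: log m(y) = int_0^1 E_{p_t}[log f(y|theta)] dt,
    # approximated with the trapezoidal rule on a uniform temperature grid.
    n, s = len(y), sum(y)
    def mean_loglik(t):
        s2_t = 1.0 / (1.0 + t * n)      # power-posterior variance
        mu_t = t * s * s2_t             # power-posterior mean
        # E[(y_i - theta)^2] = (y_i - mu_t)^2 + s2_t, summed over items
        sq = sum((v - mu_t) ** 2 for v in y) + n * s2_t
        return -0.5 * n * math.log(2 * math.pi) - 0.5 * sq
    ts = [i / (n_temps - 1) for i in range(n_temps)]
    vals = [mean_loglik(t) for t in ts]
    h = ts[1] - ts[0]
    return h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])
```

In a real GLLVM the inner expectation would be a Monte Carlo average over MCMC draws from each power posterior; here it is exact, which isolates the discretisation error discussed later.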
31
Thermodynamics and BML: Importance Posteriors
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Lefebvre et al. (2010) considered options other than the prior for the zero endpoint, keeping the unnormalised posterior at the unit endpoint. Any proper density g(·) will do. An appealing option is to use an importance (envelope) function, that is, a density as close as possible to the posterior. This importance-posterior path leads to the importance posterior. For ts close to 0 we sample from densities close to the importance function, solving the problem of high variability.
32
An alternative approach: stepping-stone identities
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Xie et al. (2011), using the prior and the posterior as endpoint densities, considered a different approach to compute the BML, also related to thermodynamics (Neal, 1993). First, the interval [0,1] is partitioned into n points, and the free energy is computed as a sum of stepping-stone ratios.
• Under the power posteriors path, Xie et al. (2011) showed how the BML is obtained.
• Under the importance posteriors path, Fan et al. (2011) showed how the BML is obtained.
However, the stepping-stone identity (SI) is even more general and can be used under different paths, as an alternative to the TI.
33
Path sampling identities for the BML, revisited
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Hence, there are two general identities to compute a ratio of normalising constants within the path sampling framework, namely the TI and the SI. Different paths lead to different expressions for the BML:

Path                  | TI identity                                   | SI identity
Prior-posterior       | Power posteriors (PPT): Friel and Pettitt,    | Stepping-stone (PPS):
                      | 2008; Lartillot and Philippe, 2006            | Xie et al., 2011
Importance-posterior  | Importance posteriors (IPT): inspired by      | Generalised stepping-stone (IPS):
                      | Lefebvre et al., 2010                         | Fan et al., 2011

Other paths can be used, under both approaches, to derive identities for the BML or any other ratio of normalising constants. Hereafter, the identities will be named by the path employed, with a subscript denoting the method implemented, e.g. IPS.
34
Thermodynamics & direct BF identities: Model switching
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Lartillot and Philippe (2006) considered as endpoint densities the unnormalised posteriors of two competing models, leading to the model switching path and, via the thermodynamic integration, to the Bayes factor, using a bidirectional melting-annealing sampling scheme. It is also easy to derive the SI counterpart expression.
35
Thermodynamics & direct BF identities: Quadrivials
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Based on the idea of Lartillot and Philippe (2006), we may proceed with compound paths, which consist of:
• a hyper, geometric path, which links two competing models, and
• a nested, geometric path for each endpoint function Qi, i = 0, 1.
The two intersecting paths form a quadrivial, which can be used with either the TI or the SI approach. If the ratio of interest is the BF, the two BMLs should be derived at the endpoints of [0,1]. The PP and the IP paths are natural choices for the nested part of the identity.
36
Sources of error in path sampling estimators
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
a) The integral over [0,1] in the TI is typically approximated via numerical approaches, such as the trapezoidal or Simpson's rule (Neal, 1993; Gelman and Meng, 1998), which require an n-point discretisation of [0,1] (the temperature schedule). Note that the temperature schedule is also required for the SI method (it defines the stepping-stone ratios). The discretisation introduces error to the TI and SI estimators, referred to as the discretisation error. It can be reduced by (i) increasing the number of points n and/or (ii) assigning more points closer to the endpoint that is associated with higher variability.
b) At each point of the schedule, a separate MCMC run is performed with the corresponding intermediate density as target distribution. Hence, Monte Carlo error also occurs at each run.
c) A third source of error is the path-related error.
We may gain insight into a) and c) by considering the measures of entropy related to the TI.
37
Performance: Pine data-a simple regression example
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Measurements were taken on 42 specimens. A linear regression model was fitted for each specimen's maximum compressive strength (y), using density (x) as the independent variable.
The objective in this example is to illustrate how each method and path combination responds to prior uncertainty. To do so, we use three different prior schemes.
The ratios of the corresponding BMLs under the three priors were estimated over n1 = 50 and n2 = 100 evenly spaced temperatures. At each temperature, a Gibbs algorithm was implemented and 30,000 posterior observations were generated, after discarding 5,000 as a burn-in period.
38
Performance: Pine data-a simple regression example
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Implementing a uniform temperature schedule:
• Reflects the difference in the discretisation error.
• Reflects the difference in the path-related error.
• All quadrivials come with smaller batch mean error.
Note: PP works just fine under a geometric temperature schedule that samples more points near the prior.
39
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Based on the prior-posterior path, Friel and Pettitt (2008) and Lefebvre et al. (2010) showed that the PP method is connected with the Kullback-Leibler divergence (KL; Kullback & Leibler, 1951). Here we present their findings in a general form, that is, for any geometric path: according to the TI, it holds that the log-ratio of normalising constants can be expressed in terms of the relative entropy (the KL divergence), the differential and cross entropies, and the symmetrised KL divergence.
40
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Graphical representation of the TI
What about the intermediate points?
41
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
The TI minus the free energy at each point: instead of integrating the mean energy over the entire interval [0,1], there is an optimal temperature at which the mean energy equals the free energy.
42
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Graphical representation of the NTI
Functional KL: the difference in the KL distance of the sampling distribution pt from p1 and from p0.
The ratio of interest occurs at the point where the sampling distribution is equidistant from the endpoint densities.
43
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
The normalised thermodynamic integral
The sampling distribution pt is the Boltzmann-Gibbs distribution pertaining to the Hamiltonian (energy function). Therefore, according to the NTI, when geometric paths are employed, the free energy occurs at the point where the Boltzmann-Gibbs distribution is equidistant from the distributions at the endpoint states. Hence:
• According to the PPT method, the BML occurs at the point where the sampling distribution is equidistant from the prior and the posterior.
• According to the QMST method, the BF occurs at the point where the sampling distribution is equidistant from the two posteriors.
44
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Graphical representation of the NTI
What do the areas stand for?
45
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
The normalised thermodynamic integral and probability distribution divergencies
A key observation here is that the sampling distribution embodies the Chernoff coefficient (Chernoff, 1952). Based on that, the NTI can be rewritten so that the areas correspond to the Chernoff t-divergence. At t = t*, we obtain the so-called Chernoff information.
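For two equal-variance normals the Chernoff coefficient has a closed form, which makes a convenient numerical check (illustrative only; the distributions and parameters below are made up):

```python
import math

def chernoff_coefficient(mu0, mu1, sigma, t, lo=-20.0, hi=20.0, n=4001):
    # Numerical Chernoff coefficient c_t = integral of p0(x)^(1-t) p1(x)^t dx
    # for N(mu0, sigma^2) and N(mu1, sigma^2), via the trapezoidal rule.
    def pdf(x, mu):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (
            sigma * math.sqrt(2 * math.pi))
    h = (hi - lo) / (n - 1)
    vals = [pdf(lo + i * h, mu0) ** (1 - t) * pdf(lo + i * h, mu1) ** t
            for i in range(n)]
    return h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])

def chernoff_divergence_closed(mu0, mu1, sigma, t):
    # Equal-variance normals: -log c_t = t(1-t)(mu1-mu0)^2 / (2 sigma^2)
    return t * (1 - t) * (mu1 - mu0) ** 2 / (2 * sigma ** 2)
```

At t = 0.5 the Chernoff t-divergence reduces to the Bhattacharyya distance, one of the f-divergences listed below.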
46
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Using the output from path sampling, the Chernoff divergence can be computed easily (see Chapter 5 of the thesis for a step-by-step algorithm). Along with the Chernoff estimate, a number of other f-divergences can be directly estimated, namely:
• the Bhattacharyya distance (Bhattacharyya, 1943) at t = 0.5,
• the Hellinger distance (Bhattacharyya, 1943; Hellinger, 1909),
• the Rényi t-divergence (Rényi, 1961), and
• the Tsallis t-relative entropy (Tsallis, 2001).
These measures of entropy are commonly used in:
• information theory, pattern recognition, cryptography, machine learning,
• hypothesis testing,
• and, recently, non-equilibrium thermodynamics.
47
Thermodynamic integration & distribution divergencies
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Measures of entropy and the NTI
48
Path selection, temperature schedule and error.
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
These results also provide insight into the error of the path sampling estimators. To begin with, Lefebvre et al. (2010) showed that the total variance is associated with the J-divergence of the endpoint densities, and therefore with the choice of the path. Graphically:
• the J-distance coincides with the slope of the secant defined at the endpoint densities;
• the slope of the tangent at a particular point ti coincides with the local variance;
• the graphical representation of two competing paths provides information about the estimators' variances.
The shape of the curve is a graphical representation of the total variance: local variances are higher at the points where the curve is steeper. Paths with smaller cliffs are easier to take!
49
Path selection, temperature schedule and error.
Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Numerical approximation of the TI:
• Different levels of accuracy towards the two endpoints.
• The discretisation error depends primarily on the path.
• Assign more tis at points where the curve is steeper (higher local variances).
50
Future work
• Currently developing an R library for BML estimation in the GLLTM, with Danny Arends; expand the results (and the R library) to account for other types of data.
• Further study of the TCI (Chapter 3).
• Use the ideas in Chapter 4 to construct a better Metropolis algorithm for GLLVMs.
• Proceed further with the ideas presented in Chapter 5, with regard to the quadrivials, the temperature schedule and the optimal t*. Explore applications to information criteria.
51
Bibliography
Bartholomew, D. and Knott, M. (1999). Latent variable models and factor analysis. Kendall's Library of Statistics, 7. Wiley.
Besag, J. (1989). A candidate's formula: A curious result in Bayesian prediction. Biometrika, 76:183.
Bhattacharyya, A. (1943). On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, 35:99-109.
Bock, R. and Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46:443-459.
Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4).
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90:1313-1321.
Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96:270-281.
Fan, Y., Wu, R., Chen, M., Kuo, L., and Lewis, P. (2011). Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution, 28(2):523-532.
Fouskakis, D., Ntzoufras, I., and Draper, D. (2009). Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of healthcare. Annals of Applied Statistics, 3:663-690.
Friel, N. and Pettitt, N. (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society Series B (Statistical Methodology), 70(3):589-607.
Gelfand, A. E. and Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society. Series B (Methodological), 56(3):501-514.
Gelman, A. and Meng, X. (1998). Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163-185.
Goodman, L. A. (1962). The variance of the product of K random variables. Journal of the American Statistical Association, 57:54-60.
Hellinger, E. (1909). Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik, 136:210-271.
Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 186(1007):453-461.
Kass, R. and Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90:773-795.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22:49-86.
Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, 55:195-207.
Lefebvre, G., Steele, R., and Vandal, A. C. (2010). A path sampling identity for computing the Kullback-Leibler and J divergences. Computational Statistics and Data Analysis, 54(7):1719-1731.
Lewis, S. and Raftery, A. (1997). Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator. Journal of the American Statistical Association, 92:648-655.
Lord, F. M. (1980). Applications of Item Response Theory to practical testing problems. Erlbaum Associates, Hillsdale, NJ.
Lord, F. M. and Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley, Oxford, UK.
Meng, X.-L. and Wong, W.-H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica Sinica, 6:831-860.
Moustaki, I. and Knott, M. (2000). Generalized latent trait models. Psychometrika, 65:391-411.
Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, University of Toronto.
Newton, M. and Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, 56:3-48.
Nott, D., Kohn, R., and Fielding, M. (2008). Approximating the marginal likelihood using copula. arXiv:0810.5474v1. Available at http://arxiv.org/abs/0810.5474v1
Ntzoufras, I., Dellaportas, P., and Forster, J. (2000). Bayesian variable and link determination for generalised linear models. Journal of Statistical Planning and Inference, 111(1-2):165-180.
Patz, R. J. and Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2):146-178.
Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128:301-323.
Raftery, A. and Banfield, J. (1991). Stopping the Gibbs sampler, the use of morphology, and other issues in spatial statistics. Annals of the Institute of Statistical Mathematics, 43(430):32-43.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Paedagogiske Institut, Copenhagen.
Rényi, A. (1961). On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pages 547-561.
Tsallis, C. (2001). In Nonextensive Statistical Mechanics and Its Applications, edited by S. Abe and Y. Okamoto. Springer-Verlag, Heidelberg.
Vitoratou, S., Ntzoufras, I., and Moustaki, I. (2013). Marginal likelihood estimation from the Metropolis output: tips and tricks for efficient implementation in generalized linear latent variable models. To appear in: Journal of Statistical Computation and Simulation.
Xie, W., Lewis, P., Fan, Y., Kuo, L., and Chen, M. (2011). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60(2):150-160.
This thesis is dedicated to