
Fast Bayesian Inference in Dirichlet Process Mixture Models

L. Wang and D. Dunson, Journal of Computational and Graphical Statistics, 2011

Presented by Esther Salazar, Duke University

November 18, 2011


Summary

The authors propose a fast approach for inference in Dirichlet process mixture (DPM) models

They focus on extremely fast alternatives to MCMC which allow accurate approximate Bayes inference and produce marginal likelihood estimates to be used in model comparison

The proposed algorithm is called the sequential updating and greedy search (SUGS) algorithm


Dirichlet process mixture (DPM) models

Consider a DPM of normals (Lo 1984):
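The model formula did not survive the transcript; the standard DPM-of-normals specification it presumably showed is

y_i \mid \mu_i, \tau_i \sim N(\mu_i, \tau_i^{-1}), \qquad (\mu_i, \tau_i) \sim G, \qquad G \sim DP(\alpha G_0)

with base measure G_0 and precision parameter α.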

Sequential application of the DP prediction rule for subjects 1, . . . , n creates a random partition of the integers {1, . . . , n}.
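Here the DP prediction rule is the Blackwell–MacQueen (Pólya urn) rule,

(\mu_{n+1}, \tau_{n+1}) \mid (\mu_1, \tau_1), \dots, (\mu_n, \tau_n) \sim \frac{\alpha}{\alpha + n}\, G_0 + \frac{1}{\alpha + n} \sum_{i=1}^{n} \delta_{(\mu_i, \tau_i)}

and ties among the subject-specific parameters define the random partition.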


Approaches for posterior inference

For DPMs, there is a rich literature on MCMC algorithms

- marginal Gibbs sampling (MacEachern 1994)
- conditional Gibbs sampling (Ishwaran and James 2001)
- split-merge (Jain and Neal 2004)

The previous approaches are practical only for small to moderately sized datasets (computation can take several hours or days)

For DPMs, alternatives to MCMC are:

- predictive recursion (PR) (Newton and Zhang 1999; ...)
- weighted Chinese restaurant (WCR) sampling (Lo, Brunner, and Chan 1996; ...)
- sequential importance sampling (SIS) (MacEachern, Clyde, and Liu 1999; ...)
- variational Bayes (VB) (Blei and Jordan 2006; ...)

Disadvantages of the previous approaches:

- WCR and SIS are computationally intensive (large number of particles)
- PR involves approximating a normalizing constant
- VB tends to underestimate uncertainty in mixture models and is sensitive to starting values


Proposal: general idea

They propose an alternative: the sequential updating and greedy search (SUGS) algorithm

The idea is to factorize the DP prior as the product of (i) a prior on the partition of subjects into clusters and (ii) independent priors on the parameters within each cluster


Product partition models (PPMs)

PPMs assume that items in different partition components are independent. The likelihood for a partition π = {S_1, . . . , S_q} with observations y = (y_1, . . . , y_n) is a product over components

p(y \mid \pi) = \prod_{j=1}^{q} f(y_{S_j})

π is the only parameter under consideration (the other parameters are integrated out). The prior distribution of the partition π is a product over the partition components

p(\pi) \propto \prod_{j=1}^{q} h(S_j)
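This makes precise the factorization described under "Proposal: general idea": marginalizing the cluster-specific parameters, a DP(α G_0) prior induces a PPM whose cohesion depends only on the size of each component,

p(\pi) \propto \prod_{j=1}^{q} \alpha\,(|S_j| - 1)!

so larger values of α favor partitions with more clusters.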


Dirichlet process mixtures and partition models

Assume there is an infinite sequence of clusters, with θ_h the parameter for cluster h, h = 1, . . . , ∞. Let γ_i be the cluster index for subject i, so that γ_i = h means subject i belongs to cluster h.
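The likelihood formula on this slide is not in the transcript; for the normal kernel used throughout the paper it presumably reads

y_i \mid \gamma_i = h \sim N(\mu_h, \tau_h^{-1}), \qquad \theta_h = (\mu_h, \tau_h)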

Priors for the parameters within each of the clusters:
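The priors themselves are also missing from the transcript; with base measure p_0 (the notation used later in the talk), they are presumably independent draws

\theta_h \stackrel{iid}{\sim} p_0, \qquad h = 1, 2, \dots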

Sequential updating and greedy search: proposed algorithm

Conditional posterior probability of allocating subject i to cluster h
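The expression for this probability is not reproduced in the transcript. Writing y^{(i)} = (y_1, . . . , y_i) and γ^{(i-1)} for the allocations of the first i − 1 subjects, the quantity SUGS evaluates has the form below (a reconstruction, so the notation may differ from the paper's):

P(\gamma_i = h \mid y^{(i)}, \gamma^{(i-1)}) \propto \pi_{ih}\, L_{ih}(y_i), \qquad
\pi_{ih} = \frac{n_h^{(i-1)}}{\alpha + i - 1} \ \text{(existing cluster)}, \qquad
\pi_{ih} = \frac{\alpha}{\alpha + i - 1} \ \text{(new cluster)}

where n_h^{(i-1)} is the number of earlier subjects allocated to cluster h and L_{ih}(y_i) = \int f(y_i \mid \theta_h)\, p(\theta_h \mid y^{(i-1)}, \gamma^{(i-1)})\, d\theta_h is the conditional marginal likelihood of y_i under cluster h.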

The joint posterior distribution for the cluster-specific coefficients θ = {θ_h}_{h=1}^{∞} given the data and cluster allocations for all n subjects
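Because the within-cluster priors are independent, this posterior factorizes over clusters; the lost formula is presumably

p(\theta \mid y^{(n)}, \gamma^{(n)}) = \prod_{h=1}^{\infty} p(\theta_h \mid \{y_i : \gamma_i = h\}) \propto \prod_{h=1}^{\infty} p_0(\theta_h) \prod_{i:\, \gamma_i = h} f(y_i \mid \theta_h)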


SUGS algorithm

The algorithm cycles through subjects, i = 1, . . . , n, sequentially allocating them to the cluster that maximizes the conditional posterior allocation probability

This algorithm only requires a single cycle of deterministic calculations and can be implemented within a few seconds. Also, the algorithm is online, so additional subjects can be added
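A minimal sketch of this single deterministic pass, assuming a univariate DPM of normals with a conjugate normal-gamma base measure; the function names, default hyperparameters, and parameterization are illustrative assumptions, not taken from the paper:

import numpy as np
from scipy import stats

# Hypothetical sketch of the SUGS pass for a univariate DPM of normals.
# Assumed base measure: tau_h ~ Gamma(a0, b0), mu_h | tau_h ~ N(m0, 1/(k0*tau_h)).
# Under this prior the predictive density of a new observation in a cluster
# is Student-t, which is what the greedy allocation step evaluates.

def t_predictive(y, m, k, a, b):
    """Student-t predictive density of y under normal-gamma parameters (m, k, a, b)."""
    return stats.t.pdf(y, df=2 * a, loc=m, scale=np.sqrt(b * (k + 1) / (a * k)))

def sugs(y, alpha=1.0, m0=0.0, k0=1.0, a0=1.0, b0=1.0):
    """One pass over the data, greedily allocating each y[i] to the cluster that
    maximizes the conditional posterior allocation probability."""
    clusters = []                        # per-cluster normal-gamma hyperparameters
    counts = []                          # cluster sizes so far
    labels = np.empty(len(y), dtype=int)
    for i, yi in enumerate(y):
        # DP prediction-rule weights: existing clusters, then a new cluster
        weights = [n / (alpha + i) for n in counts] + [alpha / (alpha + i)]
        # conditional marginal likelihood of yi under each candidate cluster
        liks = [t_predictive(yi, *c) for c in clusters] + \
               [t_predictive(yi, m0, k0, a0, b0)]
        h = int(np.argmax(np.array(weights) * np.array(liks)))   # greedy choice
        if h == len(clusters):                                   # open a new cluster
            clusters.append((m0, k0, a0, b0))
            counts.append(0)
        # conjugate normal-gamma update of cluster h with the new observation
        m, k, a, b = clusters[h]
        clusters[h] = ((k * m + yi) / (k + 1), k + 1,
                       a + 0.5, b + 0.5 * k * (yi - m) ** 2 / (k + 1))
        counts[h] += 1
        labels[i] = h
    return labels, clusters

Since each subject is visited exactly once, the cost is linear in n, and new subjects can be appended without revisiting earlier allocations; the allocations do, however, depend on the subject ordering, which is why the application later averages over random permutations.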


Estimation of the DP precision parameter α

To allow unknown α, we choose the prior
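The prior formula itself is not in the transcript. One choice that keeps the single-pass updates analytic (stated here as an assumption rather than the paper's exact prior) is a discrete prior on a grid of candidate values α ∈ {α_1, . . . , α_L}, whose weights are updated after each allocation:

p(\alpha) = \sum_{l=1}^{L} w_l^{(0)}\, \delta_{\alpha_l}(\alpha), \qquad
w_l^{(i)} \propto w_l^{(i-1)}\, \pi_{i h_i}(\alpha_l)

where h_i is the cluster chosen for subject i and π_{ih}(α) is the prior allocation probability defined above.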


DP mixtures of normals and SUGS details

The authors focus on normal mixture models, letting θ_h = (μ_h, τ_h)′ represent the mean and precision parameters for cluster h, h = 1, . . . , ∞. To specify p_0 they choose conjugate normal inverse-gamma priors.

Updating
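The updating formulas were figures in the original slides. Assuming the normal-gamma parameterization τ_h ∼ Ga(a, b), μ_h | τ_h ∼ N(m, (κ τ_h)^{-1}) (illustrative notation, not necessarily the paper's), allocating an observation y to cluster h updates that cluster's hyperparameters conjugately,

m' = \frac{\kappa m + y}{\kappa + 1}, \qquad \kappa' = \kappa + 1, \qquad a' = a + \tfrac{1}{2}, \qquad b' = b + \frac{\kappa (y - m)^2}{2(\kappa + 1)}

and the predictive density needed in the allocation step is Student-t with 2a degrees of freedom, location m, and squared scale b(κ + 1)/(a κ).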

Simulation study

Two models were considered:
(1) a mixture of three normals
(2) a single normal with mean 0 and variance 0.4

In each case, they considered 100 simulated datasets with sample size n = 500


Simulation study: results


Comparison with four other fast nonparametric DPM algorithms


Application

Data: Gestational age at delivery (GAD) from the Collaborative Perinatal Project (an epidemiologic study conducted in the 1960s and 1970s)

The study focuses on 34,178 pregnancies, providing a large-sample-size example

Aim: to study the relationship between GAD and the covariates race, sex, maternal smoking status during pregnancy, and maternal age (denoted by X1, X2, X3, and X4)

Model:
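The model formula is missing from the transcript. Purely for orientation (an assumption, not the paper's stated model), a DPM-of-normals regression of GAD on the covariates would take the form

y_i \mid \gamma_i = h \sim N(x_i^{\prime} \beta_h, \tau_h^{-1}), \qquad x_i = (1, X_{i1}, X_{i2}, X_{i3}, X_{i4})^{\prime}, \qquad (\beta_h, \tau_h) \sim p_0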

Application: results

Twenty random permutations of the subject ordering were run to eliminate the ordering effect (SUGS allocations depend on the order in which subjects are processed)

Computational speed: approximately 2 minutes per permutation

