
Fast Bayesian Inference in Dirichlet Process Mixture Models

L. Wang and D. Dunson, Journal of Computational and Graphical Statistics, 2011

Presented by Esther Salazar, Duke University

November 18, 2011


Summary

The authors propose a fast approach for inference in Dirichlet process mixture (DPM) models

They focus on extremely fast alternatives to MCMC which allow accurate approximate Bayes inference and produce marginal likelihood estimates to be used in model comparison

The proposed algorithm is called the sequential updating and greedy search (SUGS) algorithm


Dirichlet process mixture (DPM) models

Consider a DPM of normals (Lo 1984):
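The model formula did not survive the transcript; the standard DPM-of-normals specification it presumably showed is

y_i \mid \mu_i, \tau_i \sim N(\mu_i, \tau_i^{-1}), \qquad (\mu_i, \tau_i) \sim G, \qquad G \sim DP(\alpha G_0)

with base measure G_0 and precision parameter α.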

Sequential application of the DP prediction rule for subjects 1, . . . , n creates a random partition of the integers {1, . . . , n}.
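Here the DP prediction rule is the Blackwell–MacQueen (Pólya urn) rule,

(\mu_{n+1}, \tau_{n+1}) \mid (\mu_1, \tau_1), \dots, (\mu_n, \tau_n) \sim \frac{\alpha}{\alpha + n}\, G_0 + \frac{1}{\alpha + n} \sum_{i=1}^{n} \delta_{(\mu_i, \tau_i)}

and ties among the subject-specific parameters define the random partition.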


Approaches for posterior inference

For DPMs, there is a rich literature on MCMC algorithms

- marginal Gibbs sampling (MacEachern 1994)
- conditional Gibbs sampling (Ishwaran and James 2001)
- split-merge (Jain and Neal 2004)

The previous approaches are practical only for small to moderately sized datasets (computation can take several hours or days)

For DPMs, alternatives to MCMC are:

- predictive recursion (PR) (Newton and Zhang 1999; ...)
- weighted Chinese restaurant (WCR) sampling (Lo, Brunner, and Chan 1996; ...)
- sequential importance sampling (SIS) (MacEachern, Clyde, and Liu 1999; ...)
- variational Bayes (VB) (Blei and Jordan 2006; ...)

Disadvantages of the previous approaches:

- WCR and SIS are computationally intensive (large number of particles)
- PR involves approximating a normalizing constant
- VB tends to underestimate uncertainty in mixture models and is sensitive to starting values


Proposal: general idea

They propose an alternative: the sequential updating and greedy search (SUGS) algorithm

The idea is to factorize the DP prior as the product of (i) a prior on the partition of subjects into clusters and (ii) independent priors on the parameters within each cluster


Product partition models (PPMs)

PPMs assume that items in different partition components are independent. The likelihood for a partition π = {S_1, . . . , S_q} with observations y = (y_1, . . . , y_n) is a product over components

p(y \mid \pi) = \prod_{j=1}^{q} f(y_{S_j})

π is the only parameter under consideration (the other parameters are integrated out). The prior distribution of the partition π is a product over the partition components

p(\pi) \propto \prod_{j=1}^{q} h(S_j)
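This makes precise the factorization described under "Proposal: general idea": marginalizing the cluster-specific parameters, a DP(α G_0) prior induces a PPM whose cohesion depends only on the size of each component,

p(\pi) \propto \prod_{j=1}^{q} \alpha\,(|S_j| - 1)!

so larger values of α favor partitions with more clusters.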


Dirichlet process mixtures and partition models

Assume there is an infinite sequence of clusters, with θ_h the parameter for cluster h, h = 1, . . . , ∞. Let γ_i be the cluster index for subject i, so that γ_i = h means subject i belongs to cluster h.
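The likelihood formula on this slide is not in the transcript; for the normal kernel used throughout the paper it presumably reads

y_i \mid \gamma_i = h \sim N(\mu_h, \tau_h^{-1}), \qquad \theta_h = (\mu_h, \tau_h)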

Priors for the parameters within each of the clusters:
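The priors themselves are also missing from the transcript; with base measure p_0 (the notation used later in the talk), they are presumably independent draws

\theta_h \stackrel{iid}{\sim} p_0, \qquad h = 1, 2, \dots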

Sequential updating and greedy search: proposed algorithm

Conditional posterior probability of allocating subject i to cluster h
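The expression for this probability is not reproduced in the transcript. Writing y^{(i)} = (y_1, . . . , y_i) and γ^{(i-1)} for the allocations of the first i − 1 subjects, the quantity SUGS evaluates has the form below (a reconstruction, so the notation may differ from the paper's):

P(\gamma_i = h \mid y^{(i)}, \gamma^{(i-1)}) \propto \pi_{ih}\, L_{ih}(y_i), \qquad
\pi_{ih} = \frac{n_h^{(i-1)}}{\alpha + i - 1} \ \text{(existing cluster)}, \qquad
\pi_{ih} = \frac{\alpha}{\alpha + i - 1} \ \text{(new cluster)}

where n_h^{(i-1)} is the number of earlier subjects allocated to cluster h and L_{ih}(y_i) = \int f(y_i \mid \theta_h)\, p(\theta_h \mid y^{(i-1)}, \gamma^{(i-1)})\, d\theta_h is the conditional marginal likelihood of y_i under cluster h.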

The joint posterior distribution for the cluster-specific coefficients θ = {θ_h}_{h=1}^{∞} given the data and cluster allocations for all n subjects
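Because the within-cluster priors are independent, this posterior factorizes over clusters; the lost formula is presumably

p(\theta \mid y^{(n)}, \gamma^{(n)}) = \prod_{h=1}^{\infty} p(\theta_h \mid \{y_i : \gamma_i = h\}) \propto \prod_{h=1}^{\infty} p_0(\theta_h) \prod_{i:\, \gamma_i = h} f(y_i \mid \theta_h)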


SUGS algorithm

The algorithm cycles through subjects, i = 1, . . . , n, sequentially allocating them to the cluster that maximizes the conditional posterior allocation probability

This algorithm only requires a single cycle of deterministic calculations and can be implemented within a few seconds. Also, the algorithm is online, so additional subjects can be added
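A minimal sketch of this single deterministic pass, assuming a univariate DPM of normals with a conjugate normal-gamma base measure; the function names, default hyperparameters, and parameterization are illustrative assumptions, not taken from the paper:

import numpy as np
from scipy import stats

# Hypothetical sketch of the SUGS pass for a univariate DPM of normals.
# Assumed base measure: tau_h ~ Gamma(a0, b0), mu_h | tau_h ~ N(m0, 1/(k0*tau_h)).
# Under this prior the predictive density of a new observation in a cluster
# is Student-t, which is what the greedy allocation step evaluates.

def t_predictive(y, m, k, a, b):
    """Student-t predictive density of y under normal-gamma parameters (m, k, a, b)."""
    return stats.t.pdf(y, df=2 * a, loc=m, scale=np.sqrt(b * (k + 1) / (a * k)))

def sugs(y, alpha=1.0, m0=0.0, k0=1.0, a0=1.0, b0=1.0):
    """One pass over the data, greedily allocating each y[i] to the cluster that
    maximizes the conditional posterior allocation probability."""
    clusters = []                        # per-cluster normal-gamma hyperparameters
    counts = []                          # cluster sizes so far
    labels = np.empty(len(y), dtype=int)
    for i, yi in enumerate(y):
        # DP prediction-rule weights: existing clusters, then a new cluster
        weights = [n / (alpha + i) for n in counts] + [alpha / (alpha + i)]
        # conditional marginal likelihood of yi under each candidate cluster
        liks = [t_predictive(yi, *c) for c in clusters] + \
               [t_predictive(yi, m0, k0, a0, b0)]
        h = int(np.argmax(np.array(weights) * np.array(liks)))   # greedy choice
        if h == len(clusters):                                   # open a new cluster
            clusters.append((m0, k0, a0, b0))
            counts.append(0)
        # conjugate normal-gamma update of cluster h with the new observation
        m, k, a, b = clusters[h]
        clusters[h] = ((k * m + yi) / (k + 1), k + 1,
                       a + 0.5, b + 0.5 * k * (yi - m) ** 2 / (k + 1))
        counts[h] += 1
        labels[i] = h
    return labels, clusters

Since each subject is visited exactly once, the cost is linear in n, and new subjects can be appended without revisiting earlier allocations; the allocations do, however, depend on the subject ordering, which is why the application later averages over random permutations.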


Estimation of the DP precision parameter α

To allow unknown α, we choose the prior
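The prior formula itself is not in the transcript. One choice that keeps the single-pass updates analytic (stated here as an assumption rather than the paper's exact prior) is a discrete prior on a grid of candidate values α ∈ {α_1, . . . , α_L}, whose weights are updated after each allocation:

p(\alpha) = \sum_{l=1}^{L} w_l^{(0)}\, \delta_{\alpha_l}(\alpha), \qquad
w_l^{(i)} \propto w_l^{(i-1)}\, \pi_{i h_i}(\alpha_l)

where h_i is the cluster chosen for subject i and π_{ih}(α) is the prior allocation probability defined above.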


DP mixtures of normals and SUGS details

The authors focus on normal mixture models, letting θ_h = (μ_h, τ_h)′ represent the mean and precision parameters for cluster h, h = 1, . . . , ∞. To specify p_0 they choose conjugate normal inverse-gamma priors.

Updating
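The updating formulas were figures in the original slides. Assuming the normal-gamma parameterization τ_h ∼ Ga(a, b), μ_h | τ_h ∼ N(m, (κ τ_h)^{-1}) (illustrative notation, not necessarily the paper's), allocating an observation y to cluster h updates that cluster's hyperparameters conjugately,

m' = \frac{\kappa m + y}{\kappa + 1}, \qquad \kappa' = \kappa + 1, \qquad a' = a + \tfrac{1}{2}, \qquad b' = b + \frac{\kappa (y - m)^2}{2(\kappa + 1)}

and the predictive density needed in the allocation step is Student-t with 2a degrees of freedom, location m, and squared scale b(κ + 1)/(a κ).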

Simulation study

Two models were considered:
(1) a mixture of three normals
(2) a single normal with mean 0 and variance 0.4

In each case, they considered 100 simulated datasets with sample size n = 500


Simulation study: results


Comparison with four other fast nonparametric DPM algorithms


Application

Data: Gestational age at delivery (GAD) from the Collaborative Perinatal Project (an epidemiologic study conducted in the 1960s and 1970s)

The study focuses on 34,178 pregnancies, providing a large-sample-size example

Aim: to study the relationship between GAD and the covariates race, sex, maternal smoking status during pregnancy, and maternal age (denoted by X1, X2, X3, and X4)

Model:
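The model formula is missing from the transcript. Purely for orientation (an assumption, not the paper's stated model), a DPM-of-normals regression of GAD on the covariates would take the form

y_i \mid \gamma_i = h \sim N(x_i^{\prime} \beta_h, \tau_h^{-1}), \qquad x_i = (1, X_{i1}, X_{i2}, X_{i3}, X_{i4})^{\prime}, \qquad (\beta_h, \tau_h) \sim p_0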

Application: results

Twenty random permutations of the subject ordering were run to eliminate the ordering effect (SUGS allocations depend on the order in which subjects are processed)

Computational speed: approximately 2 minutes per permutation

