
Page 1: Fast Bayesian Inference in Dirichlet Process Mixture Models · people.ee.duke.edu/~lcarin/Esther11.18.2011.pdf · 11/18/2011

Fast Bayesian Inference in Dirichlet Process Mixture Models

L. Wang and D. Dunson
Journal of Computational and Graphical Statistics, 2011

Presented by Esther Salazar
Duke University

November 18, 2011

E. Salazar (Reading group) November 18, 2011 1 / 19


Summary

The authors propose a fast approach for inference in Dirichlet process mixture (DPM) models.

They focus on extremely fast alternatives to MCMC that allow accurate approximate Bayesian inference and produce marginal likelihood estimates for use in model comparison.

The proposed algorithm is called the sequential updating and greedy search (SUGS) algorithm.


Dirichlet process mixture (DPM) models

Consider a DPM of normals (Lo 1984):

Sequential application of the DP prediction rule for subjects 1, . . . , n creates a random partition of the integers {1, . . . , n}.
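The prediction rule can be simulated directly to see how it generates a partition. This is a minimal Python sketch assuming a fixed precision parameter `alpha`; the function name `sample_dp_partition` is illustrative. Subject i joins an existing cluster h with probability n_h/(alpha + i - 1), where n_h is the current size of cluster h, and starts a new cluster with probability alpha/(alpha + i - 1).

```python
import random


def sample_dp_partition(n, alpha, seed=None):
    """Sample cluster labels for subjects 1..n from the DP prediction rule.

    Labels are 0, 1, 2, ... in order of cluster creation, so the returned
    list encodes a random partition of {1, ..., n}.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[h] = number of subjects currently in cluster h
    for i in range(1, n + 1):
        # Unnormalized weights: existing clusters first, then a new cluster.
        # They sum to (i - 1) + alpha; rng.choices normalizes them.
        weights = counts + [alpha]
        h = rng.choices(range(len(weights)), weights=weights)[0]
        if h == len(counts):
            counts.append(1)  # subject i opens a new cluster
        else:
            counts[h] += 1  # subject i joins existing cluster h
        labels.append(h)
    return labels


labels = sample_dp_partition(n=10, alpha=1.0, seed=1)
print(labels)
```

Larger values of `alpha` make new clusters more likely, so the number of occupied clusters grows with `alpha` for fixed n.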


Approaches for posterior inference

For DPMs, there is a rich literature on MCMC algorithms

- marginal Gibbs sampling (MacEachern 1994)
- conditional Gibbs sampling (Ishwaran and James 2001)
- split-merge (Jain and Neal 2004)

These approaches are practical for small to moderate-sized datasets; on larger datasets, computation can take several hours or days.

For DPMs, alternatives to MCMC are:

- predictive recursion (PR) (Newton and Zhang 1999; ...)
- weighted Chinese restaurant (WCR) sampling (Lo, Brunner, and Chan 1996; ...)
- sequential importance sampling (SIS) (MacEachern, Clyde, and Liu 1999; ...)
- variational Bayes (VB) (Blei and Jordan 2006; ...)

Disadvantages of the previous approaches:

- WCR and SIS are computationally intensive (they require a large number of particles)
- PR involves approximating a normalizing constant
- VB tends to underestimate uncertainty in mixture models and is sensitive to the starting values


Proposal: general idea

They propose an alternative: the sequential updating and greedy search (SUGS) algorithm.

The idea is to factorize the DP prior as a product of:
(i) a prior on the partition of subjects into clusters, and
(ii) independent priors on the parameters within each cluster.


Product partition models (PPMs)

PPMs assume that items in different partition components are independent. The likelihood for a partition $\pi = \{S_1, \dots, S_q\}$ with observations $y = (y_1, \dots, y_n)$ is a product over components:

$$p(y \mid \pi) = \prod_{j=1}^{q} f(y_{S_j})$$

The partition $\pi$ is the only parameter under consideration (all other parameters are integrated out). The prior distribution of $\pi$ is also a product over the partition components:

$$p(\pi) \propto \prod_{j=1}^{q} h(S_j)$$
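For the DP specifically, a standard result (not shown on this slide) is that the induced prior on partitions is a PPM with cohesion $h(S_j) = \alpha\,(|S_j| - 1)!$ and normalizing constant $\alpha(\alpha+1)\cdots(\alpha+n-1)$. The Python sketch below, with hypothetical helper names, enumerates every partition of a small set and checks numerically that these probabilities sum to one.

```python
import math


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or into a new singleton block.
        yield [[first]] + partition


def dp_partition_prior(partition, alpha):
    """DP prior of a partition as a PPM: prod_j h(S_j) / (alpha)_n,
    with cohesion h(S) = alpha * (|S| - 1)! and rising factorial (alpha)_n."""
    n = sum(len(block) for block in partition)
    cohesion = math.prod(alpha * math.factorial(len(block) - 1) for block in partition)
    rising = math.prod(alpha + k for k in range(n))  # alpha (alpha+1) ... (alpha+n-1)
    return cohesion / rising


alpha = 0.7
partitions = list(set_partitions([1, 2, 3, 4]))
total = sum(dp_partition_prior(p, alpha) for p in partitions)
print(len(partitions), round(total, 12))  # 15 1.0
```

Because the cohesion depends only on block sizes, the prior is exchangeable: relabeling subjects does not change $p(\pi)$.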


Dirichlet process mixtures and partition models

Assume there is an infinite sequence of clusters, with $\theta_h$ the parameter for cluster $h$, $h = 1, \dots, \infty$. Let $\gamma_i$ be the cluster index for subject $i$, with $\gamma_i = h$ if subject $i$ belongs to cluster $h$; then

Priors for the parameters within each of the clusters:
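The displays for this slide are missing from the transcript. In the notation used elsewhere in these slides, a standard way to write the cluster-indicator form of the DP mixture of normals is sketched below; it is a reconstruction, not a copy of the slide, and the cluster indicators $\gamma_i$ follow the DP prediction rule:

```latex
y_i \mid \gamma_i = h,\ \theta \;\sim\; N\!\left(\mu_h, \tau_h^{-1}\right),
\qquad
\theta_h = (\mu_h, \tau_h)' \;\overset{\text{iid}}{\sim}\; p_0,
\quad h = 1, \dots, \infty.
```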


Sequential updating and greedy search: proposed algorithm

Conditional posterior probability of allocating subject i to cluster h

The joint posterior distribution of the cluster-specific coefficients $\theta = \{\theta_h\}_{h=1}^{\infty}$, given the data and the cluster allocations of all n subjects
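Both displays referred to on this slide are missing from the transcript. A reconstruction in the usual SUGS form, for the normal kernel, is given below; treat the notation as an assumption. Here $y^{(i)} = (y_1, \dots, y_i)$, $k_{i-1}$ is the number of clusters occupied by the first $i-1$ subjects, and $n_h^{(i-1)}$ is the number of those subjects in cluster $h$. For a new cluster, $\pi(\theta_h \mid \cdot)$ is simply $p_0$.

```latex
\Pr\!\left(\gamma_i = h \mid y^{(i)}, \gamma^{(i-1)}\right)
  = \frac{\pi_{ih}\, L_{ih}(y_i)}{\sum_{l=1}^{k_{i-1}+1} \pi_{il}\, L_{il}(y_i)},
\qquad
\pi_{ih} =
\begin{cases}
  \dfrac{n_h^{(i-1)}}{\alpha + i - 1}, & h = 1, \dots, k_{i-1},\\[6pt]
  \dfrac{\alpha}{\alpha + i - 1}, & h = k_{i-1} + 1,
\end{cases}
\\[6pt]
L_{ih}(y_i) = \int N\!\left(y_i; \mu_h, \tau_h^{-1}\right)
  \pi\!\left(\theta_h \mid y^{(i-1)}, \gamma^{(i-1)}\right) d\theta_h,
\\[6pt]
\pi\!\left(\theta \mid y^{(n)}, \gamma^{(n)}\right)
  = \prod_{h=1}^{\infty} \pi\!\left(\theta_h \mid y^{(n)}, \gamma^{(n)}\right),
\qquad
\pi\!\left(\theta_h \mid y^{(n)}, \gamma^{(n)}\right) \propto p_0(\theta_h)
  \prod_{i:\, \gamma_i = h} N\!\left(y_i; \mu_h, \tau_h^{-1}\right).
```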


SUGS algorithm

The algorithm cycles through subjects i = 1, . . . , n, sequentially allocating them to the cluster that maximizes the conditional posterior allocation probability.

The algorithm requires only a single cycle of deterministic calculations and can be run within a few seconds. It is also online: additional subjects can be added by continuing the same sequential pass.
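The greedy pass can be sketched in Python for the univariate normal case. This is a minimal illustration, not the authors' implementation: it assumes a fixed α and a conjugate normal-gamma prior $\mu_h \mid \tau_h \sim N(m, (\kappa\tau_h)^{-1})$, $\tau_h \sim Ga(a, b)$ (shape a, rate b) with hypothetical hyperparameter values, and all function names are hypothetical. Alongside the labels it accumulates the log of the one-step-ahead predictive densities, a sequential approximation to the log marginal likelihood.

```python
import math


def log_student_t(y, df, loc, scale2):
    """Log density at y of a Student-t with `df` degrees of freedom,
    location `loc`, and squared scale `scale2`."""
    z2 = (y - loc) ** 2 / (df * scale2)
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi * scale2)
            - (df + 1) / 2 * math.log1p(z2))


def log_predictive(y, params):
    """log L(y): Student-t predictive under a normal-gamma (m, kappa, a, b)."""
    m, kappa, a, b = params
    return log_student_t(y, 2 * a, m, b * (kappa + 1) / (a * kappa))


def update(y, params):
    """Conjugate normal-gamma update after adding observation y to a cluster."""
    m, kappa, a, b = params
    return ((kappa * m + y) / (kappa + 1), kappa + 1, a + 0.5,
            b + kappa * (y - m) ** 2 / (2 * (kappa + 1)))


def sugs(ys, alpha=1.0, prior=(0.0, 1.0, 1.0, 1.0)):
    """One greedy SUGS pass for a DP mixture of univariate normals (fixed alpha).

    Returns the labels and sum_i log p(y_i | y_1..y_{i-1}, gamma_1..gamma_{i-1}).
    """
    labels, counts, posts = [], [], []  # posts[h] = normal-gamma posterior of cluster h
    log_ml = 0.0
    for i, y in enumerate(ys):  # i = number of subjects allocated so far
        # DP prediction rule: existing clusters, then a new cluster (prior predictive).
        weights = [c / (alpha + i) for c in counts] + [alpha / (alpha + i)]
        log_w = [math.log(w) + log_predictive(y, p)
                 for w, p in zip(weights, posts + [prior])]
        top = max(log_w)
        log_norm = top + math.log(sum(math.exp(v - top) for v in log_w))
        log_ml += log_norm  # log one-step-ahead predictive density of y_i
        h = log_w.index(top)  # greedy: most probable allocation
        if h == len(counts):  # open a new cluster
            counts.append(0)
            posts.append(prior)
        counts[h] += 1
        posts[h] = update(y, posts[h])
        labels.append(h)
    return labels, log_ml


ys = [-5.1, 4.9, -4.8, 5.2, -5.0, 5.1]
labels, log_ml = sugs(ys, alpha=1.0, prior=(0.0, 0.1, 1.0, 1.0))
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because each cluster's posterior is updated in closed form, a new subject can be processed by continuing the loop with the current `counts` and `posts`, which is what makes the pass online. The result depends on the order of `ys`.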


Estimation of the DP precision parameter α

To allow unknown α, we choose the prior
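The prior itself is missing from the transcript. One choice that keeps the sequential updates closed-form, assumed here purely for illustration, is a discrete prior on a grid of candidate values $\alpha^*_1, \dots, \alpha^*_T$. After subject i is allocated, each grid weight is multiplied by the DP prediction-rule probability of that allocation under the corresponding α and then renormalized. The function name `update_alpha_weights` and the grid are hypothetical.

```python
def update_alpha_weights(weights, grid, i, n_h=None):
    """Update discrete posterior weights over a grid of alpha values after
    allocating subject i (1-based).

    weights : current probabilities for each grid value (sum to 1)
    grid    : candidate alpha values
    n_h     : size of the existing cluster before subject i joined it,
              or None if subject i opened a new cluster
    """
    new = []
    for w, a in zip(weights, grid):
        # DP prediction rule: n_h/(a+i-1) for an existing cluster, a/(a+i-1) for a new one.
        like = (a if n_h is None else n_h) / (a + i - 1)
        new.append(w * like)
    total = sum(new)
    return [v / total for v in new]


grid = [0.1, 1.0, 10.0]
w_spread = [1 / 3] * 3
for i in range(1, 6):  # subjects 1..5 each open a new cluster
    w_spread = update_alpha_weights(w_spread, grid, i)
w_single = [1 / 3] * 3
for i in range(1, 6):  # subject 1 opens a cluster, subjects 2..5 join it
    w_single = update_alpha_weights(w_single, grid, i, n_h=None if i == 1 else i - 1)
print([round(w, 3) for w in w_spread], [round(w, 3) for w in w_single])
```

Many new clusters shift the weight toward large α, while a single growing cluster favors small α, as expected from the prediction rule.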


DP mixtures of normals and SUGS details

The authors focus on normal mixture models, letting $\theta_h = (\mu_h, \tau_h)'$ represent the mean and precision parameters for cluster h, h = 1, . . . , ∞. To specify $p_0$, they choose conjugate normal inverse-gamma priors (inverse-gamma on the variance $\tau_h^{-1}$, that is, gamma on the precision $\tau_h$).

Updating
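The updating formulas are missing from the transcript. Assuming, for illustration, the parametrization $\mu_h \mid \tau_h \sim N(m, (\kappa\tau_h)^{-1})$, $\tau_h \sim Ga(a, b)$ with shape a and rate b, adding subject i to cluster h maps that cluster's current hyperparameters $(m, \kappa, a, b)$ to $(m', \kappa', a', b')$ as below, and the predictive density $L_{ih}(y_i)$ used in the allocation step is a Student-t, where $t_\nu(y; m, s^2)$ has $\nu$ degrees of freedom, location $m$, and squared scale $s^2$:

```latex
\kappa' = \kappa + 1, \qquad
m' = \frac{\kappa m + y_i}{\kappa + 1}, \qquad
a' = a + \tfrac{1}{2}, \qquad
b' = b + \frac{\kappa\,(y_i - m)^2}{2(\kappa + 1)},
\\[6pt]
L_{ih}(y_i) = t_{2a}\!\left(y_i;\; m,\; \frac{b\,(\kappa + 1)}{a\,\kappa}\right).
```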


Simulation study

Two models were considered:
(1) a mixture of three normals:

(2) a single normal with mean 0 and variance 0.4

In each case, they considered 100 simulated datasets with sample size n = 500.
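Data for the second model can be generated with the standard library. A minimal Python sketch: the seed is an arbitrary choice, not from the paper, and the three-component mixture is omitted because its parameters are missing from the transcript. Note that the slide gives a variance of 0.4, while `random.gauss` expects a standard deviation, so the scale is $\sqrt{0.4} \approx 0.632$.

```python
import math
import random
import statistics

rng = random.Random(2011)  # arbitrary seed, for reproducibility only
n_datasets, n = 100, 500
sd = math.sqrt(0.4)  # variance 0.4 -> standard deviation for gauss()
datasets = [[rng.gauss(0.0, sd) for _ in range(n)] for _ in range(n_datasets)]

# Sanity check: the pooled sample variance should be close to 0.4.
pooled_var = statistics.pvariance([v for d in datasets for v in d])
print(len(datasets), len(datasets[0]), round(pooled_var, 3))
```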


Simulation study: results


Simulation study: results


Simulation study: results

Comparison with four other fast nonparametric DPM algorithms


Application

Data: gestational age at delivery (GAD) from the Collaborative Perinatal Project (an epidemiologic study conducted in the 1960s and 1970s)

The analysis focuses on 34,178 pregnancies, which provide a large-sample example.

Aim: we are interested in the relationship between GAD and the covariates race, sex, maternal smoking status during pregnancy, and maternal age (denoted by X1, X2, X3, and X4).

Model:


Application: results

SUGS was run on 20 permutations of the subject ordering to reduce the ordering effect.

Computational speed: approximately 2 minutes per permutation.
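Running an order-dependent sequential algorithm over several orderings can be sketched as follows. The slides do not say how the 20 runs were combined, so keeping the ordering with the highest score (for example, a sequential log marginal likelihood) is an assumption, as are the names `run_over_permutations` and `toy_run`; `toy_run` is a hypothetical stand-in for one SUGS pass.

```python
import random


def run_over_permutations(ys, run, n_perm=20, seed=0):
    """Run `run(ordered_ys) -> (labels, score)` on `n_perm` random orderings.

    Labels are mapped back to the original subject order. Returns every
    (order, labels, score) triple and the index of the highest-scoring run.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_perm):
        order = list(range(len(ys)))
        rng.shuffle(order)
        labels, score = run([ys[j] for j in order])
        original = [None] * len(ys)
        for pos, j in enumerate(order):
            original[j] = labels[pos]  # subject j was processed at position pos
        results.append((order, original, score))
    best = max(range(n_perm), key=lambda k: results[k][2])
    return results, best


def toy_run(seq):
    # Hypothetical stand-in: label = 1 if the value is positive; score = first value seen.
    return [int(v > 0) for v in seq], seq[0]


results, best = run_over_permutations([-2.0, 3.0, -1.0, 4.0], toy_run, n_perm=20, seed=0)
print(results[best][1], results[best][2])
```

Mapping labels back to the original order matters: without it, results from different permutations would refer to different subjects and could not be compared.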


Application: results
