Dirichlet Distribution, Dirichlet Process and Dirichlet Process Mixture

Dirichlet Distribution, Dirichlet Process andDirichlet Process Mixture

Leon Gu

CSD, CMU

Binomial and Multinomial

Binomial distribution: the number of successes in a sequence ofindependent yes/no experiments (Bernoulli trials).

P (X = x | n, p) =

)px(1− p)n−x

Multinomial: suppose that each experiment results in one of k possibleoutcomes with probabilities p1, . . . , pk; Multinomial models thedistribution of the histogram vector which indicates how many time eachoutcome was observed over N trials of experiments.

P (x1, . . . , xk | n, p1, . . . , pk) =N !∏k

i=1 xi!pxi

i ,∑

xi = N,xi ≥ 0

Beta Distribution

p(p | α, β) =1

B(α, β)pα−1(1− p)β−1

I p ∈ [0, 1]: considering p as the parameter of a Binomial distribution,we can think of Beta is a “distribution over distributions”(binomials).

I Beta function simply defines binomial coefficient for continuousvariables. (likewise, Gamma function defines factorial in continuousdomain.)

B(α, β) =Γ(α + β)

Γ(α)Γ(β)'

(α− 1

α + β − 2

)I Beta is the conjugate prior of Binomial.

Dirichlet Distribution

p(P = {pi} | αi) =

∏i Γ(αi)

Γ(∑

i αi)

pαi−1i

i pi = 1, pi ≥ 0

I Two parameters: the scale (or concentration) σ =∑

i αi, and thebase measure (α′

1, . . . , α′k), α′

i = αi/σ.

I A generalization of Beta:

I Beta is a distribution over binomials (in an interval p ∈ [0, 1]);I Dirichlet is a distribution over Multinomials (in the so-called

simplex∑

i pi = 1; pi ≥ 0).

I Dirichlet is the conjugate prior of multinomial.

Mean and Variance

I The base measure determines the mean distribution;

I Altering the scale affects the variance.

E(pi) =αi

σ= α′

V ar(pi) =αi(σ − α)

σ2(σ + 1)=

α′i(1− α′

(σ + 1)(2)

Cov(pi, pj) =−αiαj

σ2(σ + 1)(3)

Another Example

I A Dirichlet with small concentration σ favors extreme distributions,but this prior belief is very weak and is easily overwritten by data.

I As σ →∞, the covariance → 0 and the samples → base measure.

I posterior is also a Dirichlet

p(P = {pi} | αi) =

∏i Γ(αi)

Γ(∑

i αi)

pαi−1i (4)

P (x1, . . . , xk | n, p1, . . . , pk) =n!∏k

i=1 xi!pxi

p({pi}|x1, . . . , xk) =

∏i Γ(αi + xi)

Γ(N +∑

i αi)

pαi+xi−1i (6)

I marginalizing over parameters (condition on hyper-parameters only)

p(x1, . . . , xk|α1, . . . , αk) =

∏i αxi

I prediction (conditional density of new data given previous data)

p(new result = j|x1, . . . , xk, alpha1, . . . , αk) =αj + xj

σ + N

Dirichlet ProcessSuppose that we are interested in a simple generative model (monogram)for English words. If asked “what is the next word in a newly-discoveredwork of Shakespeare?”, our model must surely assign non-zero probabilityfor words that Shakespeare never used before. Our model should alsosatisfy a consistency rule called exchangeability: the probability of findinga particular word at a given location in the stream of text should be thesame everywhere in thee stream.Dirichlet process is a model for a stream of symbols that 1) satisfies theexchangeability rule and that 2) allows the vocabulary of symbols to growwithout limit. Suppose that the mode has seen a stream of length Fsymbols. We identify each symbol by an unique integer w ∈ [0,∞) andFw is the counts if the symbol. Dirichlet process models

I the probability that the next symbol is symbol w is

F + α

I the probability that the next symbol is never seen before is

F + α

G ∼ DP (|̇G0, α)

I Dirichlet process generalizes Dirichlet distribution.

I G is a distribution function in a space of infinite but countablenumber of elements.

I G0: base measure; α: concentration

I pretty much the same as Dirichlet distributionI expectation and varianceI the posterior is also a Dirichlet process DP (|̇G0, α)I predictionI integration over G (data conditional on G0 and αonly)I

I equivalent constructionsI Polya Urn SchemeI Chinese Restaurant ProcessI Stick Breaking SchemeI Gamma Process Construction

Dirichlet Process Mixture

I How many clusters?

I Which is better?

Graphical Illustrations

Multinomial-Dirichlet Process

Dirichlet Process Mixture

Dirichlet Distribution, Dirichlet Process and Dirichlet Process Mixture

Documents

Transcript of Dirichlet Distribution, Dirichlet Process and Dirichlet Process Mixture

Dirichlet Process Mixture Models made Scalable and ...€¦ · A Dirichlet Process (DP) is a probability distribution over distribu-tions. In our use-case, a distribution over the

1 Negative Binomial Process Count and Mixture Modelingpeople.ee.duke.edu/~lcarin/Mingyuan_PAMI_6.pdfprocess can be marginalized as a NB process and normalized as a Dirichlet process,

Online Learning of a Dirichlet Process Mixture of Generalized

Hierarchical Double Dirichlet Process Mixture of Gaussian ...ppoupart/publications/mogp/TayalPoupartLi... · stationary and consequently the function is unable to adapt to varying

Dirichlet Process 4 - UMD

Bayesian Nonparametric Mixture Modelling: Methods and ...milovan/bayes.09/NUIG.NonParamBayes.… · Notes 1: Dirichlet process priors (de nitions, properties, and applications); Other

A Dirichlet Process Mixture Model for Spherical Datapeople.csail.mit.edu/jstraub/download/straub2015dptgmm.pdf · models to the spherical domain. Experiments on synthetic data as

Using Proﬁle Regression Mixture Models and Dirichlet ...ucakche/agdank/agdank2013presentations/... · Using Proﬁle Regression Mixture Models and Dirichlet Processes to explore

Dirichlet Distribution, Dirichlet Process and Dirichlet ...

Markov Chain Sampling Methods for Dirichlet Process Mixture ...

Perspective Hierarchical Dirichlet Process forPerspective ...

DIMM-SC: A Dirichlet mixture model for clustering …DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data Zhe Sun1, Ting Wang2, Ke Deng3,

The Nested Dirichlet Process

In nite SVM: a Dirichlet Process Mixture of Large-margin ... · In nite SVM: a Dirichlet Process Mixture of Large-margin Kernel Machines measure G distributed according to a Dirichlet

Fast Collapsed Gibbs Sampler for Dirichlet Process ...rajarshd.github.io/talks/DPGMM_Cholesky.pdfFast Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models using Rank

Dirichlet Process Gaussian Mixture Models: Choice of …mlg.eng.cam.ac.uk/pub/pdf/GoeRas10.pdf · Dirichlet process Gaussian mixture models: ... Dirichlet Process Gaussian Mixture

Nonparametric Bayesian Methods (Dirichlet Process Mixtures)ml.cs.tsinghua.edu.cn/~jun/courses/statml-fall2015/10-DP_mixtures.… · Bayesian Mixture Models –MCMC inference Introduce

Markov Chain Sampling Methods for Dirichlet Process Mixture Models R.M. Neal Summarized by Joon Shik Kim 12.03.15.(Thu) Computational Models of Intelligence.

Background Subtraction with Dirichlet process Gaussian Mixture … · 2018. 9. 27. · 2.1.1 Dirichlet process. In probability theory, are family of stochastic method whose realization

Online Trans-dimensional von Mises-Fisher Mixture Models ... · ONLINE TRANS-DIMENSIONAL VON MISES-FISHER MIXTURE MODELS (NMF) (Lee and Seung,1999), Dirichlet process Gaussian mixture