
Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

Chong Wang and David M. Blei, NIPS 2009

Discussion led by Chunping Wang

ECE, Duke University

March 26, 2010

Outline

• Motivations

• LDA and HDP-LDA

• Sparse Topic Models

• Inference Using Collapsed Gibbs sampling

• Experiments

• Conclusions

1/16

Motivations

2/16

• Topic modeling with the “bag of words” assumption

• An extension of the HDP-LDA model

• In the LDA and the HDP-LDA models, the topics are drawn from an exchangeable Dirichlet distribution with a scale parameter $\lambda$. As $\lambda$ approaches zero, topics will be

o sparse: most probability mass on only a few terms

o less smooth: dominated by the empirical word counts

• Goal: to decouple sparsity and smoothness so that the two properties can be achieved at the same time.

• How: a Bernoulli variable is introduced for each term and each topic.

LDA and HDP-LDA

3/16

LDA

(Graphical model: $\alpha \to \theta \to z \to w \leftarrow \beta \leftarrow u$; plates over $N$ words, $D$ documents, $K$ topics; indices: topic $k$, document $d$, word $i$.)

$\theta_d \sim \text{Dir}(\alpha)$

$z_{di} \sim \text{Mult}(\theta_d)$

$\beta_k \sim \text{Dir}(\lambda u)$

$w_{di} \sim \text{Mult}(\beta_{z_{di}})$

HDP-LDA

Nonparametric form of LDA, with the number of topics unbounded. (Same graphical model, with base-measure weights $\sigma$ added and the topic plate unbounded.)

$\sigma \sim \text{GEM}(\gamma)$ (base-measure weights)

$\theta_d \sim \text{DP}(\alpha, \sigma)$

$\beta_k \sim \text{Dir}(\lambda u)$

$z_{di} \sim \text{Mult}(\theta_d)$

$w_{di} \sim \text{Mult}(\beta_{z_{di}})$

Here $u$ is the $V$-vector of ones, so $\text{Dir}(\lambda u)$ is the exchangeable Dirichlet with scale $\lambda$.
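To make the generative process concrete, here is a minimal Python sketch of the finite LDA version above (illustrative only, not the authors' code; function and variable names are my own, and the nonparametric HDP case would additionally need a truncation or stick-breaking construction):

```python
import numpy as np

def generate_lda_corpus(D, N, K, V, alpha, lam, rng):
    """Toy LDA generator in the slide's notation:
    beta_k ~ Dir(lam * u) with u the vector of ones (exchangeable Dirichlet),
    theta_d ~ Dir(alpha), z_di ~ Mult(theta_d), w_di ~ Mult(beta_{z_di})."""
    beta = rng.dirichlet(np.full(V, lam), size=K)    # K topics over V terms
    docs = []
    for _ in range(D):
        theta = rng.dirichlet(np.full(K, alpha))     # topic proportions for one document
        z = rng.choice(K, size=N, p=theta)           # per-word topic assignments
        w = np.array([rng.choice(V, p=beta[k]) for k in z])
        docs.append(w)
    return beta, docs

beta, docs = generate_lda_corpus(D=5, N=50, K=3, V=100, alpha=0.5, lam=0.1,
                                 rng=np.random.default_rng(0))
```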

Sparse Topic Models

4/16

The size of the vocabulary is $V$.

HDP-LDA: $\beta_k \sim \text{Dir}(\lambda u)$, defined on the $(V-1)$-simplex.

Sparse TM: $\beta_k \sim \text{Dir}(\lambda b_k)$, defined on a sub-simplex specified by $b_k$, a $V$-length binary vector composed of $V$ Bernoulli variables:

$b_{kv} \sim \text{Bern}(\pi_k)$, $\pi_k \sim \text{Beta}(r, s)$ (one selection proportion for each topic)

Sparsity: the pattern of ones in $b_k$, controlled by $\pi_k$.

Smoothness: enforced over terms with non-zero $b_{kv}$'s through $\lambda$.

Decoupled!
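A minimal sketch of drawing one topic under the sparse TM prior (again illustrative, not the authors' code; the names and the guard against an all-zero selector are my own assumptions):

```python
import numpy as np

def draw_sparse_topic(V, r, s, lam, rng):
    """Sparse TM prior for one topic: pi_k ~ Beta(r, s), b_kv ~ Bern(pi_k),
    beta_k ~ Dir(lam * b_k), i.e. a Dirichlet on the sub-simplex selected by b_k."""
    pi_k = rng.beta(r, s)                      # selection proportion for this topic
    b = rng.binomial(1, pi_k, size=V)          # binary term selectors
    if b.sum() == 0:                           # guard: keep at least one term on
        b[rng.integers(V)] = 1
    on = np.flatnonzero(b)
    beta = np.zeros(V)
    beta[on] = rng.dirichlet(np.full(on.size, lam))  # smoothing only on selected terms
    return pi_k, b, beta

pi_k, b, beta = draw_sparse_topic(V=1000, r=1.0, s=20.0, lam=0.5,
                                  rng=np.random.default_rng(0))
```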

Sparse Topic Models

5/16


Inference Using Collapsed Gibbs sampling

6/16

As in the HDP-LDA:

• Topic proportions $\theta$ and topic distributions $\beta$ are integrated out.

• The direct-assignment method based on the Chinese restaurant franchise (CRF) is used for the topic assignments $z$ and an augmented variable, the table counts $m$.

Inference Using Collapsed Gibbs sampling

7/16

Notation:

• $n_{dk}$: # of customers (words) in restaurant $d$ (document) eating dish $k$ (topic)

• $m_{dk}$: # of tables in restaurant $d$ serving dish $k$

• $n_{d\cdot}, m_{\cdot k}, m_{\cdot\cdot}$: marginal counts, represented with dots

• $K$, $u$: current # of topics and the new-topic index, respectively

• $n_k^{(v)}$: # of times that term $v$ has been assigned to topic $k$

• $n_k^{(\cdot)}$: # of times that all the terms have been assigned to topic $k$

• $f_k^{w_{di}}(v) = p(w_{di} = v \mid \{w_{d'i'} : z_{d'i'} = k, (d',i') \neq (d,i)\})$: conditional density of $w_{di}$ under topic $k$ given all data except $w_{di}$

Inference Using Collapsed Gibbs sampling

8/16

Recall the direct-assignment sampling method for the HDP-LDA.

Sampling topic assignments:

$p(z_{di} = k \mid \mathbf{z}^{-di}, \mathbf{m}, \sigma) \propto \begin{cases} (n_{dk}^{-di} + \alpha\sigma_k)\, f_k^{w_{di}}(w_{di}) & \text{if } k \text{ previously used} \\ \alpha\sigma_u\, f_u^{w_{di}}(w_{di}) & \text{if } k = u \end{cases}$

If a new topic is sampled, then sample $\nu \sim \text{Beta}(1, \gamma)$, and let $\sigma_{k^{\text{new}}} = \nu\sigma_u$ and $\sigma_u \leftarrow (1 - \nu)\sigma_u$ and $K \leftarrow K + 1$.

Sampling table counts:

$p(m_{dk} = m \mid \mathbf{z}, \mathbf{m}^{-dk}, \sigma) \propto s(n_{dk}, m)\,(\alpha\sigma_k)^m$

where $s(\cdot, \cdot)$ denotes the unsigned Stirling numbers of the first kind.

Sampling stick lengths:

$(\sigma_1, \ldots, \sigma_K, \sigma_u) \mid \mathbf{m} \sim \text{Dir}(m_{\cdot 1}, \ldots, m_{\cdot K}, \gamma)$
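The table counts can be sampled without evaluating Stirling numbers: the distribution $\propto s(n_{dk}, m)(\alpha\sigma_k)^m$ is exactly that of the number of tables created by a CRP with concentration $\alpha\sigma_k$ and $n_{dk}$ customers. A sketch of that standard simulation trick (the slides do not say which method the authors actually used):

```python
import numpy as np

def sample_table_count(n_dk, a, rng):
    """Sample m_dk with p(m) ∝ s(n_dk, m) * a^m, where a = alpha * sigma_k:
    customer j = 1..n_dk opens a new table with probability a / (a + j - 1)."""
    if n_dk == 0:
        return 0
    j = np.arange(1, n_dk + 1)
    return int(rng.binomial(1, a / (a + j - 1)).sum())
```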

For the HDP-LDA, the conditional word density is straightforward:

$f_k^{w_{di}}(v) = \dfrac{n_k^{(v), -di} + \lambda}{n_k^{(\cdot), -di} + V\lambda}$

For the sparse TM, conditioned on $b_k$, it is equally straightforward:

$f_k^{w_{di}}(v \mid b_k) = \dfrac{b_{kv}\,(n_k^{(v), -di} + \lambda)}{\sum_{v'} b_{kv'}\,(n_k^{(v'), -di} + \lambda)}$

Instead, the authors integrate out $b_k$ for faster convergence:

$f_k^{w_{di}}(v) = p(w_{di} = v \mid \{w_{d'i'} : z_{d'i'} = k, (d',i') \neq (d,i)\}) = \sum_{b_k} \int_{\beta_k} p(v \mid \beta_k)\, p(\beta_k, b_k \mid \cdot)\, d\beta_k$

Since there are $2^V$ possible $b_k$ in total, this sum is the central computational challenge for the sparse TM.
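A sketch of one direct-assignment step for $z_{di}$ in the HDP-LDA case (illustrative names and array layouts are my own; all counts are assumed to already exclude the current word, and the stick-breaking bookkeeping for a newly created topic is omitted):

```python
import numpy as np

def sample_z(d, v, n_dk, n_kv, n_k, sigma, alpha, lam, V, rng):
    """p(z_di = k) ∝ (n_dk + alpha*sigma_k) * f_k(v) for existing topics,
    and alpha*sigma_u * (1/V) for a new topic (f under an empty topic)."""
    K = len(sigma) - 1                               # sigma = (sigma_1..K, sigma_u)
    f = (n_kv[:, v] + lam) / (n_k + V * lam)         # f_k^{w_di}(v) for each topic
    p = np.empty(K + 1)
    p[:K] = (n_dk[d] + alpha * sigma[:K]) * f        # previously used topics
    p[K] = alpha * sigma[K] / V                      # new topic
    p /= p.sum()
    return rng.choice(K + 1, p=p)                    # K means "open a new topic"
```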

Inference Using Collapsed Gibbs sampling

9/16

where $B_k$ is defined as the set of vocabulary terms that have word assignments in topic $k$. Any $b_k$ that switches off a term in $B_k$ has zero posterior probability, so the $2^V$-term sum only runs over selectors with $b_{kv} = 1$ for all $v \in B_k$.

This conditional probability depends on the selection proportions $\pi_k$.

Inference Using Collapsed Gibbs sampling

10/16


Inference Using Collapsed Gibbs sampling

11/16

• Sampling the Bernoulli parameter $\pi_k$ (using $b_k$ as an auxiliary variable; define the set of terms with an “on” $b_{kv}$):

o sample $b_k$ conditioned on $\pi_k$;

o sample $\pi_k$ conditioned on $b_k$.

• Sampling hyper-parameters:

o $\alpha$, $\gamma$: with Gamma(1,1) priors;

o $\lambda$: Metropolis-Hastings using a symmetric Gaussian proposal.

• Estimate topic distributions $\beta$ from any single sample of $z$ and $b$:

$\hat{\beta}_k^{(v)} = \dfrac{b_{kv}\,(n_k^{(v)} + \lambda)}{\sum_{v'} b_{kv'}\,(n_k^{(v')} + \lambda)}$

($b_{kv}$ gives the sparsity; $\lambda$ gives the smoothness on the selected terms.)
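Two of these steps are simple enough to sketch directly: the beta-Bernoulli conjugate update for $\pi_k$ and the point estimate $\hat{\beta}_k$ (illustrative code, assuming $b_k$ and the counts $n_k^{(v)}$ are numpy arrays):

```python
import numpy as np

def sample_pi_k(b_k, r, s, rng):
    """Beta-Bernoulli conjugacy: pi_k | b_k ~ Beta(r + #on, s + #off)."""
    on = int(b_k.sum())
    return rng.beta(r + on, s + (len(b_k) - on))

def estimate_beta_k(n_kv, b_k, lam):
    """hat{beta}_kv ∝ b_kv * (n_kv + lam):
    b_kv gives the sparsity, lam the smoothness on the selected terms."""
    w = b_k * (n_kv + lam)
    return w / w.sum()
```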

Experiments

12/16

Four datasets:

• arXiv: online research abstracts, D = 2500, V = 2873

• Nematode Biology: research abstracts, D = 2500, V = 2944

• NIPS: NIPS articles from 1988-1999, V = 5005; 20% of the words of each paper are used

• Conf. abstracts: abstracts from CIKM, ICML, KDD, NIPS, SIGIR and WWW, 2005-2008, V = 3733

Two predictive quantities: held-out perplexity and topic complexity, where the topic complexity is $\sum_k |B_k|$, the total number of terms switched on across topics.
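For reference, the standard held-out per-word perplexity that such evaluations use (a sketch under the usual definition; the slides do not detail the exact evaluation protocol):

```python
import numpy as np

def perplexity(docs, theta, beta):
    """exp(-(1/N) * sum_di log p(w_di)), with p(w_di) = sum_k theta_dk * beta_kv."""
    log_lik, n_words = 0.0, 0
    for d, words in enumerate(docs):           # words: array of term indices
        p_w = theta[d] @ beta                  # length-V mixture for document d
        log_lik += np.sum(np.log(p_w[words]))
        n_words += len(words)
    return float(np.exp(-log_lik / n_words))
```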

Experiments

13/16

(Results figures.)

• better perplexity, simpler models

• larger $\lambda$: smoother; fewer topics; similar # of terms

Experiments

14/16

Experiments

15/16

HDP-LDA with a small $\lambda$ (< 0.01):

• lack of smoothness: $\hat{\beta}_k^{(v)} = \dfrac{n_k^{(v)} + \lambda}{n_k^{(\cdot)} + V\lambda} \approx \dfrac{n_k^{(v)}}{n_k^{(\cdot)}}$, so the empirical counts dominate;

• more topics are needed to explain all kinds of patterns of empirical word counts;

• infrequent words populate “noise” topics.
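A quick numeric illustration of the coupling (my own toy numbers, not from the paper): with the counts fixed, a small $\lambda$ reproduces the empirical counts (near-sparse but unsmoothed), while a large $\lambda$ smooths but puts mass on every term.

```python
import numpy as np

n_kv = np.array([50.0, 10.0, 0.0, 0.0, 0.0])   # empirical counts for one topic, V = 5
for lam in (0.001, 1.0):
    beta_hat = (n_kv + lam) / (n_kv.sum() + len(n_kv) * lam)
    print(lam, np.round(beta_hat, 4))
# 0.001 -> [0.8333, 0.1667, 0.0,    0.0,    0.0   ]  follows counts, near-sparse
# 1.0   -> [0.7846, 0.1692, 0.0154, 0.0154, 0.0154]  smoothed, but dense
```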

Conclusions

16/16

A new topic model in the HDP-LDA framework, based on the “bag of words” assumption;

Main contributions:

• Decoupling the control of sparsity and smoothness by introducing binary selectors for term assignments in each topic;

• Developing a collapsed Gibbs sampler in the HDP-LDA framework.

Held-out performance is better than that of the HDP-LDA.