
CS 6347

Lecture 18

Topic Models and LDA (some slides by David Blei)

Generative vs. Discriminative Models

• Recall that, in Bayesian networks, there can be many different but equivalent models of the same joint distribution

• Although these two models are equivalent (in the sense that they imply the same independence relations), they can differ significantly when it comes to inference/prediction

[Figure: two two-node Bayesian networks over 𝑋 and 𝑌, one labeled Generative (𝑌 → 𝑋) and one labeled Discriminative (𝑋 → 𝑌)]

Generative vs. Discriminative Models

• Generative models: we can think of the observations as being generated by the latent variables

– Start sampling at the top and work downwards

– Examples?


Generative vs. Discriminative Models

• Generative models: we can think of the observations as being generated by the latent variables

– Start sampling at the top and work downwards

– Examples: HMMs, naïve Bayes, LDA
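To make "start sampling at the top and work downwards" concrete, here is a minimal sketch of ancestral sampling from a two-variable generative model (a naïve Bayes model with a single observed feature); all probabilities below are made up for illustration:

```python
# Illustrative only: ancestral sampling from a tiny generative model
# p(Y) p(X | Y): sample the latent variable Y first ("top"), then the
# observation X given Y ("work downwards").  Probabilities are made up.
import numpy as np

rng = np.random.default_rng(0)

p_y = np.array([0.4, 0.6])                 # prior over the latent class Y
p_x_given_y = np.array([[0.7, 0.2, 0.1],   # p(X | Y = 0)
                        [0.1, 0.3, 0.6]])  # p(X | Y = 1)

def sample_pair():
    y = rng.choice(2, p=p_y)               # sample the latent variable
    x = rng.choice(3, p=p_x_given_y[y])    # sample the observation given Y
    return y, x

print([sample_pair() for _ in range(5)])   # five sampled (y, x) pairs
```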


Generative vs. Discriminative Models

• Discriminative models: most useful for discriminating the values of the latent variables

– Almost always used for supervised learning

– Examples?


Generative vs. Discriminative Models

• Discriminative models: most useful for discriminating the values of the latent variables

– Almost always used for supervised learning

– Examples: CRFs


Generative vs. Discriminative Models

• Suppose we are only interested in the prediction task (i.e., estimating 𝑝(𝑌|𝑋))

– Discriminative model: 𝑝(𝑋, 𝑌) = 𝑝(𝑋) 𝑝(𝑌|𝑋)

– Generative model: 𝑝(𝑋, 𝑌) = 𝑝(𝑌) 𝑝(𝑋|𝑌)
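Both factorizations describe the same joint distribution; the difference is that the generative model answers the prediction query only after an application of Bayes' rule, while the discriminative model parameterizes 𝑝(𝑌|𝑋) directly:

$$
p(Y \mid X) = \frac{p(Y)\, p(X \mid Y)}{\sum_{y} p(y)\, p(X \mid y)}
$$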


Generative Models

• The primary advantage of generative models is that they provide a model of the data generating process

– Could generate “new” data samples by using the model

• Topic models (generative models of documents)

– Methods for discovering themes (topics) from a collection (e.g., books, newspapers, etc.)

– Annotate the collection according to the discovered themes

– Use the annotations to organize, search, summarize, etc.


Topic Models


Models of Text Documents

• Bag-of-words model: assume that the ordering of words in a document does not matter

– This is typically false in practice, as the words of certain phrases only occur together and in a fixed order

• Unigram model: all words in a document are drawn independently at random from a single categorical distribution over the vocabulary

• Mixture of unigrams model: for each document, we first choose a topic 𝑧 and then generate words for the document from the conditional distribution 𝑝(𝑤|𝑧)

– Topics are just probability distributions over words
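As a concrete illustration of the mixture-of-unigrams model above, here is a minimal sketch (vocabulary, topics, and probabilities are all made up) that draws one topic per document and then samples every word from that topic's distribution:

```python
# Illustrative only: sampling a document from a mixture of unigrams.
# Each document has a single topic z; all of its words are then drawn
# i.i.d. from p(w | z), a categorical distribution over the vocabulary.
import numpy as np

rng = np.random.default_rng(1)

vocab = ["goal", "match", "vote", "senate", "budget"]
p_z = np.array([0.5, 0.5])                  # p(z): "sports" vs. "politics"
p_w_given_z = np.array([
    [0.45, 0.45, 0.04, 0.03, 0.03],         # p(w | z = sports)
    [0.03, 0.02, 0.35, 0.30, 0.30],         # p(w | z = politics)
])

def sample_document(n_words=6):
    z = rng.choice(len(p_z), p=p_z)         # one topic for the whole document
    words = rng.choice(len(vocab), size=n_words, p=p_w_given_z[z])
    return z, [vocab[w] for w in words]

print(sample_document())
```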


Latent Dirichlet Allocation (LDA)


Latent Dirichlet Allocation (LDA)

• 𝛼 and 𝜂 are parameters of the prior distributions over 𝜃 and 𝛽

• 𝜃𝑑 is the distribution of topics for document 𝑑 (real vector of length 𝐾)

• 𝛽𝑘 is the distribution of words for topic 𝑘 (real vector of length 𝑉)

• 𝑧𝑑,𝑛 is the topic for the 𝑛th word in the 𝑑th document

• 𝑤𝑑,𝑛 is the 𝑛th word of the 𝑑th document


Latent Dirichlet Allocation (LDA)

• Plate notation

– There are 𝑁 ⋅ 𝐷 different variables that represent the observed words in the different documents

– There are 𝐾 total topics (assumed to be known in advance)

– There are 𝐷 total documents


Latent Dirichlet Allocation (LDA)

• The only observed variables are the words in the documents

– The topic for each word, the distribution over topics for each document, and the distribution of words per topic are all latent variables in this model


Latent Dirichlet Allocation (LDA)

• The model contains both continuous and discrete random variables

– 𝜃𝑑 and 𝛽𝑘 are vectors of probabilities

– 𝑧𝑑,𝑛 is an integer in {1, … , 𝐾} that indicates the topic of the 𝑛th word in the 𝑑th document

– 𝑤𝑑,𝑛 is an integer in {1, … , 𝑉} which indexes over all possible words


Latent Dirichlet Allocation (LDA)

• 𝜃𝑑~𝐷𝑖𝑟(𝛼) where 𝐷𝑖𝑟(𝛼) is the Dirichlet distribution with parameter vector 𝛼 > 0

• 𝛽𝑘~𝐷𝑖𝑟(𝜂) with parameter vector 𝜂 > 0

• Dirichlet distribution over 𝑥1, … , 𝑥𝐾 such that 𝑥1, … , 𝑥𝐾 ≥ 0 and ∑𝑖 𝑥𝑖 = 1:

$$
f(x_1, \ldots, x_K; \alpha_1, \ldots, \alpha_K) \propto \prod_{i} x_i^{\alpha_i - 1}
$$

– The Dirichlet distribution is a distribution over probability distributions over 𝐾 elements
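A quick numerical check of this last point (the 𝛼 values below are arbitrary): every draw from a Dirichlet is itself a probability vector.

```python
# Illustrative only: samples from Dir(alpha) are non-negative vectors
# that sum to 1, i.e. probability distributions over K elements.
import numpy as np

rng = np.random.default_rng(2)

alpha = np.array([0.5, 0.5, 0.5])        # K = 3, arbitrary parameter vector
theta = rng.dirichlet(alpha, size=3)     # three sampled probability vectors
print(theta)                             # rows are non-negative
print(theta.sum(axis=1))                 # each row sums to 1
```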


Latent Dirichlet Allocation (LDA)

• The discrete random variables are distributed according to the corresponding probability vectors:

$$
p(z_{d,n} = k \mid \theta_d) = \theta_{d,k}
$$

$$
p(w_{d,n} = v \mid z_{d,n}, \beta_1, \ldots, \beta_K) = \beta_{z_{d,n},\, v}
$$

– Here, 𝜃𝑑,𝑘 is the 𝑘th element of the vector 𝜃𝑑, i.e., the percentage of document 𝑑 corresponding to topic 𝑘

• The joint distribution is then

$$
p(w, z, \theta, \beta \mid \alpha, \eta) = \prod_{k} p(\beta_k \mid \eta) \prod_{d} p(\theta_d \mid \alpha) \prod_{n} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid z_{d,n}, \beta)
$$
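Putting the pieces together, a minimal sketch of LDA's generative process (all sizes and hyperparameters below are arbitrary) follows the joint distribution above: draw each 𝛽𝑘 and 𝜃𝑑 from their Dirichlet priors, then draw each topic assignment and word.

```python
# Illustrative only: the LDA generative process with arbitrary sizes.
import numpy as np

rng = np.random.default_rng(3)
K, D, V, N = 3, 4, 10, 8                  # topics, documents, vocab size, words/doc
alpha, eta = 0.5, 0.1                     # symmetric Dirichlet hyperparameters

beta = rng.dirichlet(np.full(V, eta), size=K)     # beta_k: word distribution per topic
theta = rng.dirichlet(np.full(K, alpha), size=D)  # theta_d: topic distribution per document

docs = []
for d in range(D):
    words = []
    for n in range(N):
        z = rng.choice(K, p=theta[d])     # z_{d,n} ~ Categorical(theta_d)
        w = rng.choice(V, p=beta[z])      # w_{d,n} ~ Categorical(beta_{z_{d,n}})
        words.append(w)
    docs.append(words)

print(docs[0])                            # word indices of the first document
```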


Latent Dirichlet Allocation (LDA)

• LDA is a generative model

– We can think of the words as being generated by a probabilistic process defined by the model

– How reasonable is the generative model?


Latent Dirichlet Allocation (LDA)

• Inference in this model is NP-hard

• Given the 𝐷 documents, we want to find the parameters that maximize the joint probability

– Can use an EM based approach called variational EM


Variational EM

• Recall that the EM algorithm constructs a lower bound on the log-likelihood using Jensen's inequality:


$$
\begin{aligned}
\ell(\theta) &= \sum_{k=1}^{K} \log \sum_{x_{mis}^{(k)}} p\left(x_{obs}^{(k)}, x_{mis}^{(k)} \mid \theta\right) \\
&= \sum_{k=1}^{K} \log \sum_{x_{mis}^{(k)}} q_k\left(x_{mis}^{(k)}\right) \cdot \frac{p\left(x_{obs}^{(k)}, x_{mis}^{(k)} \mid \theta\right)}{q_k\left(x_{mis}^{(k)}\right)} \\
&\geq \sum_{k=1}^{K} \sum_{x_{mis}^{(k)}} q_k\left(x_{mis}^{(k)}\right) \log \frac{p\left(x_{obs}^{(k)}, x_{mis}^{(k)} \mid \theta\right)}{q_k\left(x_{mis}^{(k)}\right)}
\end{aligned}
$$

Variational EM

• Performing the optimization over 𝑞 is equivalent to computing

$$
p\left(x_{mis}^{(k)} \mid x_{obs}^{(k)}, \theta\right)
$$

• This can be intractable in practice

– Instead, restrict 𝑞 to lie in some smaller set of distributions 𝑄

– For example, could make a mean-field assumption

$$
q\left(x_{mis}^{(k)}\right) = \prod_{i \in mis(k)} q_i\left(x_i^{(k)}\right)
$$

• The resulting algorithm only yields an approximation to the log-likelihood
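Although the slides do not derive it, the standard way to optimize over such a mean-field family is coordinate ascent: holding every other factor fixed, the optimal choice of a single factor has the well-known form

$$
q_i(x_i) \propto \exp\left( \mathbb{E}_{q_{-i}}\left[ \log p\left(x_{obs}^{(k)}, x_{mis}^{(k)} \mid \theta\right) \right] \right)
$$

where the expectation is taken with respect to the remaining factors; iterating these updates gives the approximate E-step.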


EM for Topic Models

$$
p(w \mid \alpha, \eta) = \int\!\!\int \sum_{z} \prod_{k} p(\beta_k \mid \eta) \prod_{d} p(\theta_d \mid \alpha) \prod_{n} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid z_{d,n}, \beta)\; d\theta\, d\beta
$$

• To apply variational EM, we write

$$
\begin{aligned}
\log p(w \mid \alpha, \eta) &= \log \int\!\!\int \sum_{z} p(w, z, \theta, \beta \mid \alpha, \eta)\; d\theta\, d\beta \\
&\geq \int\!\!\int \sum_{z} q(z, \theta, \beta) \log \frac{p(w, z, \theta, \beta \mid \alpha, \eta)}{q(z, \theta, \beta)}\; d\theta\, d\beta
\end{aligned}
$$

where we restrict the distribution 𝑞 to be of the following form

$$
q(z, \theta, \beta) = \prod_{k} q(\beta_k \mid \eta) \prod_{d} q(\theta_d \mid \alpha) \prod_{n} q(z_{d,n})
$$
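In practice this optimization is rarely implemented by hand. As a minimal sketch, scikit-learn's LatentDirichletAllocation, which fits LDA by (batch or online) variational EM, can be applied to a toy corpus as follows; the corpus and hyperparameter values are made up for illustration.

```python
# Illustrative only: fitting LDA with scikit-learn's variational EM
# implementation on a tiny made-up corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "the team won the match",
    "the senate passed the budget",
    "the striker scored a late goal",
    "voters approved the new budget",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)      # bag-of-words count matrix

lda = LatentDirichletAllocation(
    n_components=2,                       # K, the number of topics
    doc_topic_prior=0.5,                  # alpha
    topic_word_prior=0.1,                 # eta
    learning_method="batch",              # batch variational EM
    random_state=0,
).fit(X)

print(lda.transform(X))                   # per-document topic proportions
print(lda.components_.shape)              # (K, V) topic-word weights
```

Here components_ holds the variational parameters for the topic-word distributions; normalizing each row gives an estimate of the corresponding 𝛽𝑘.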


Example of LDA


Extensions of LDA

• Author-Topic model

– 𝑎𝑑 is the group of authors for the 𝑑th document

– 𝑥𝑑,𝑛 is the author of the 𝑛th word of the 𝑑th document

– 𝜃𝑎 is the topic distribution for author 𝑎

– 𝑧𝑑,𝑛 is the topic for the 𝑛th word of the 𝑑th document


(The Author-Topic Model for Authors and Documents, Rosen-Zvi et al.)

Extensions of LDA


• Label 𝑌𝑑 for each document represents a value to be predicted from the document

– E.g., number of stars for each document in a corpus of movie reviews