Topic models

Source: “Topic models”, David Blei, MLSS ’09

Topic modeling - Motivation

Discover topics from a corpus

Model connections between topics

Model the evolution of topics over time

Image annotation

Extensions*

• Malleable: can be quickly extended to data with tags (side information), class labels, etc.

• The (approximate) inference methods can be readily translated in many cases

• Most datasets can be converted to a ‘bag-of-words’ format using a codebook representation, and LDA-style models can then be readily applied (they can work with continuous observations too)

*YMMV
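As a sketch of that codebook conversion (the toy corpus, whitespace tokenization, and helper names here are invented for illustration):

```python
from collections import Counter

def build_codebook(corpus):
    """Map each distinct token to an integer id (the 'codebook')."""
    vocab = sorted({tok for doc in corpus for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def to_bag_of_words(doc, codebook):
    """Represent a document as sparse {word_id: count} pairs."""
    counts = Counter(doc.lower().split())
    return {codebook[t]: c for t, c in counts.items() if t in codebook}

corpus = ["the cat sat", "the dog sat on the mat"]
codebook = build_codebook(corpus)
bow = [to_bag_of_words(d, codebook) for d in corpus]
```

Once documents are in this form, word order is discarded and only the per-document counts matter, which is exactly the exchangeability assumption LDA relies on.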

Connection to ML research

Latent Dirichlet Allocation

LDA

Probabilistic modeling

Intuition behind LDA

Generative model

The posterior distribution

Graphical models (Aside)

LDA model

Dirichlet distribution

Dirichlet Examples

Darker implies lower magnitude

\alpha < 1 leads to sparser topics
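The sparsity claim is easy to check with a quick simulation; this sketch (stdlib only, names invented for the example) draws from a symmetric Dirichlet by normalizing Gamma samples:

```python
import random

def sample_dirichlet(alpha, k, rng):
    """One draw from a symmetric Dirichlet(alpha) over k outcomes,
    obtained by normalizing independent Gamma(alpha, 1) samples."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]

rng = random.Random(0)
sparse = sample_dirichlet(0.1, 10, rng)   # \alpha < 1: mass concentrates on few entries
dense = sample_dirichlet(10.0, 10, rng)   # \alpha > 1: closer to uniform
```

Drawing many such samples and inspecting the largest component shows the small-\alpha draws putting most of their mass on one or two coordinates.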

LDA

Inference in LDA

Example inference

Topics vs words

Explore and browse document collections

Why does LDA “work”?

LDA is modular, general, useful

Approximate inference

• An excellent reference is “On smoothing and inference for topic models” Asuncion et al. (2009).

Posterior distribution for LDA

The only parameters we need to estimate are \alpha, \beta
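In the standard notation (as in Blei et al.'s original LDA paper), the posterior in question for a single document is:

```latex
p(\theta, z \mid w, \alpha, \beta)
  = \frac{p(\theta \mid \alpha)\,\prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)}
         {p(w \mid \alpha, \beta)}
```

The normalizer p(w | \alpha, \beta) requires summing over all K^N topic assignments per document, which is why exact posterior inference is intractable and approximate methods are needed.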

Posterior distribution

Posterior distribution for LDA

• Can integrate out either \theta or z, but not both

• Marginalize \theta => z ~ Polya(\alpha)

• The Polya distribution is also known as the Dirichlet compound multinomial (models “burstiness”)

• Most algorithms marginalize out \theta
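For concreteness, marginalizing \theta under the Dirichlet prior gives the standard Polya form for one document's topic assignments:

```latex
p(z \mid \alpha)
  = \int p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, d\theta
  = \frac{\Gamma\!\big(\textstyle\sum_k \alpha_k\big)}{\Gamma\!\big(N + \sum_k \alpha_k\big)}
    \prod_{k=1}^{K} \frac{\Gamma(n_k + \alpha_k)}{\Gamma(\alpha_k)}
```

where n_k is the number of tokens in the document assigned to topic k. The ratio of Gamma functions is what produces the “rich get richer” burstiness: a topic already used in the document becomes more likely for subsequent tokens.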

MAP inference

• Integrate out z

• Treat \theta as a random variable

• Can use the EM algorithm

• Updates are very similar to those of PLSA (except for additional regularization terms)

Collapsed Gibbs sampling
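As a concrete illustration (not from the slides), here is a minimal collapsed Gibbs sampler in the Griffiths & Steyvers style; the function name, toy corpus, and hyperparameter values are made up for the example:

```python
import random

def collapsed_gibbs_lda(docs, V, K, alpha, beta, iters, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (theta and the topics
    are integrated out; only topic assignments z are sampled).

    docs: list of documents, each a list of word ids in [0, V).
    Returns z, the final topic assignment of every token.
    """
    rng = random.Random(seed)
    ndk = [[0] * K for _ in docs]          # doc-topic counts
    nkw = [[0] * V for _ in range(K)]      # topic-word counts
    nk = [0] * K                           # topic totals
    z = []
    for d, doc in enumerate(docs):         # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                # remove this token's assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # p(z_i = k | z_-i, w), up to a constant
                weights = [(ndk[d][t] + alpha) *
                           (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k                # reassign under the sampled topic
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z

docs = [[0, 0, 1], [2, 3, 3, 0]]  # two toy documents over a 4-word vocabulary
z = collapsed_gibbs_lda(docs, V=4, K=2, alpha=0.5, beta=0.1, iters=20)
```

Note how each per-token update touches only the count tables, which is what makes CGS simple to implement; it is also why the collapsed algorithms discussed later are hard to parallelize, since every update depends on all other assignments.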

Variational inference

Can think of this as an extension of EM where we compute expectations with respect to a “variational distribution” instead of the true posterior
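In symbols, with a variational distribution q(\theta, z), the objective is the standard evidence lower bound:

```latex
\log p(w \mid \alpha, \beta)
  \;\ge\; \mathbb{E}_{q}\big[\log p(\theta, z, w \mid \alpha, \beta)\big]
        - \mathbb{E}_{q}\big[\log q(\theta, z)\big]
```

Ordinary EM is recovered when q equals the exact posterior, which makes the bound tight; variational inference instead restricts q to a tractable family and maximizes the bound within it.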

Mean field variational inference

MFVI and conditional exponential families

Variational inference for LDA

Collapsed variational inference

• MFVI: \theta, z assumed to be independent

• \theta can be marginalized out exactly

• A variational inference algorithm operating on the same “collapsed space” as CGS

• Strictly better lower bound than VB

• Can be thought of as a “soft” CGS where we propagate uncertainty by using probabilities rather than samples
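Written out (this is my transcription of the CVB0 statistic from Asuncion et al. (2009); check the paper for the exact notation), the update for the variational probability \gamma_ijk that token i of document j takes topic k is:

```latex
\gamma_{ijk} \;\propto\;
  \frac{\big(N^{\neg ij}_{w_{ij} k} + \beta\big)\,\big(N^{\neg ij}_{jk} + \alpha\big)}
       {N^{\neg ij}_{k} + V\beta}
```

where the N^{\neg ij} counts are expected counts excluding the current token. The form mirrors the CGS conditional, but with fractional (expected) counts in place of hard sample counts, which is the “soft CGS” view above.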

Estimating the topics

Inference comparison

Comparison of updates

[Table comparing the update equations of MAP, VB, CVB0, and CGS, from “On smoothing and inference for topic models”, Asuncion et al. (2009).]

Choice of inference algorithm

• Depends on the vocabulary size (V) and the number of words per document (say N_i)

• Collapsed algorithms: not parallelizable

• CGS: needs to draw multiple samples of topic assignments for multiple occurrences of the same word (slow when N_i >> V)

• MAP: fast, but performs poorly when N_i << V

• CVB0: good tradeoff between computational complexity and perplexity

Supervised and relational topic models

Supervised LDA

Variational inference in sLDA

ML estimation

Prediction

Example: Movie reviews

Diverse response types with GLMs

Example: Multi-class classification

Supervised topic models

Upstream vs downstream models

Upstream: conditional models

Downstream: the predictor variable is generated based on the actually observed z’s, rather than on \theta, which is E(z)

Relational topic models

Predictive performance of one type given the other

Predicting links from documents

Things we didn’t address

• Model selection: non-parametric Bayesian approaches

• Hyperparameter tuning

• Evaluation can be a bit tricky for LDA (comparing approximate bounds), but supervised versions can use traditional metrics

Thank you!