STATISTICAL TOPIC MODELING (part 1)
Andrea Tagarelli
Univ. of Calabria, Italy
Statistical topic modeling (1/3)
• Key assumption: text data is represented as a mixture of topics, i.e., probability distributions over terms
• Generative model for documents: document features are treated as generated by latent variables
• Topic modeling vs. vector-space text modeling:
  • (Latent) semantic aspects underlying correlations between words
  • Document topical structure
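The mixture-of-topics view can be made concrete with a short generative sketch; the toy vocabulary and topic distributions below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and two topics (probability distributions over terms).
vocab = ["game", "team", "score", "market", "stock", "price"]
topics = np.array([
    [0.5, 0.3, 0.2, 0.0, 0.0, 0.0],   # a "sports" topic
    [0.0, 0.0, 0.1, 0.3, 0.3, 0.3],   # a "finance" topic
])

# A document is a mixture of topics, e.g. 70% sports / 30% finance.
doc_topic_mix = np.array([0.7, 0.3])

def generate_doc(n_words):
    """Generate a document: for each word, pick a latent topic, then a term."""
    words = []
    for _ in range(n_words):
        z = rng.choice(2, p=doc_topic_mix)        # latent topic assignment
        w = rng.choice(len(vocab), p=topics[z])   # observed word
        words.append(vocab[w])
    return words

print(generate_doc(10))
```

Every word is drawn from some topic's term distribution, but which topic produced each word is hidden: only the words are observed.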
Statistical topic modeling (2/3)
• Training on a (large) corpus to learn:
  • Per-topic word distributions
  • Per-document topic distributions
[Blei, CACM, 2012]
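As one concrete way to obtain both kinds of distributions, scikit-learn's `LatentDirichletAllocation` can be fit on a word-count matrix; the example documents below are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the team won the game with a late score",
    "the stock price fell as the market closed",
    "investors watched the market and the stock price",
    "the game ended with a record score for the team",
]

# Bag-of-words counts, then a 2-topic model trained on them.
X = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Per-topic word distributions (K x V), normalized from the learned weights.
per_topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
# Per-document topic distributions (N x K).
per_doc_topic = lda.transform(X)

print(per_topic_word.shape, per_doc_topic.shape)
```

Each row of `per_topic_word` and `per_doc_topic` is a probability distribution, matching the two outputs listed above.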
Statistical topic modeling (3/3)
• Graphical “plate” notation:
  • Standard representation for generative models
  • Rectangles (plates) represent repeated parts of the model; the number on a plate is the number of times the variable(s) inside are repeated
[Hofmann, SIGIR, 1999]
Observed and latent variables
• Observed variable: its value is known
• Latent variable: its state cannot be observed directly
• Estimation problem:
  • Estimate values for a set of distribution parameters that best explain a set of observations
  • Most likely parameter values: maximum likelihood of a model
• The full likelihood is typically impossible to calculate exactly; it is approximated through:
  • Expectation-Maximization (EM): an iterative method that alternates between estimating the unobserved, latent variables and updating the parameters, until a local optimum is reached
  • Gibbs sampling: update parameters sample-wise
  • Variational inference: approximate the model by a simpler, tractable one
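A minimal EM sketch, applied here to a two-component Gaussian mixture with fixed unit variances (a simpler latent-variable model than a topic model, chosen to keep the example short; all numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# Observations from two hidden groups; the group label of each point is latent.
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Initial guesses for the mixture weights and component means.
pi, mu = np.array([0.5, 0.5]), np.array([-1.0, 1.0])

for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    # (unnormalized Gaussian densities; variances fixed at 1 for simplicity).
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2) * pi
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the expected assignments.
    pi = resp.mean(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

print(mu.round(1))  # the means converge toward the true group centers
```

The same alternation, with posteriors over topic assignments instead of mixture components, underlies pLSA training on the following slides.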
Probabilistic LSA
• PLSA [Hofmann, 2001]
• Probabilistic version of LSA, conceived to better handle term polysemy
[Plate diagram: observed document d points to latent topic z, which points to word w; z and w sit in a plate repeated N times (word positions), nested in a plate repeated M times (documents)]
PLSA training (1/2)
• Joint probability model: P(d, w) = P(d) Σz P(z|d) P(w|z)
• Likelihood: L = Σd Σw n(d, w) log P(d, w), where n(d, w) is the number of occurrences of word w in document d
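Assuming the standard pLSA factorization P(d, w) = P(d) Σz P(z|d) P(w|z) and log-likelihood Σd,w n(d, w) log P(d, w), both can be evaluated with basic array operations (toy sizes and random distributions, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
M, V, K = 3, 5, 2                            # documents, vocabulary size, topics

n = rng.integers(1, 5, size=(M, V))          # word counts n(d, w)
P_z_d = rng.dirichlet(np.ones(K), size=M)    # P(z|d): per-document topic dists
P_w_z = rng.dirichlet(np.ones(V), size=K)    # P(w|z): per-topic word dists
P_d = n.sum(axis=1) / n.sum()                # P(d): document marginal

# Joint model: P(d, w) = P(d) * sum_z P(z|d) P(w|z)
P_dw = P_d[:, None] * (P_z_d @ P_w_z)

# Log-likelihood: L = sum_{d,w} n(d, w) log P(d, w)
loglik = (n * np.log(P_dw)).sum()
print(loglik)
```

Note that P(d, w) sums to 1 over all document-word pairs, as a joint distribution should.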
PLSA training (2/2)
• Training with EM:
  • Initialization of the per-topic word distributions and per-document topic distributions
  • E-step: P(z|d, w) = P(z|d) P(w|z) / Σz' P(z'|d) P(w|z')
  • M-step: P(w|z) ∝ Σd n(d, w) P(z|d, w) and P(z|d) ∝ Σw n(d, w) P(z|d, w)
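A compact NumPy sketch of the standard pLSA EM updates, with E-step posterior P(z|d, w) ∝ P(z|d) P(w|z) and M-step re-estimation from expected counts (toy sizes and random initialization, for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
M, V, K = 4, 6, 2                            # documents, vocabulary size, topics
n = rng.integers(1, 6, size=(M, V))          # word-count matrix n(d, w)

# Initialization of per-topic word and per-document topic distributions.
P_w_z = rng.dirichlet(np.ones(V), size=K)    # K x V
P_z_d = rng.dirichlet(np.ones(K), size=M)    # M x K

for _ in range(100):
    # E-step: P(z|d, w) ∝ P(z|d) P(w|z), normalized over topics.
    post = P_z_d[:, :, None] * P_w_z[None, :, :]   # M x K x V
    post /= post.sum(axis=1, keepdims=True)
    # M-step: re-estimate both distributions from expected counts.
    nz = n[:, None, :] * post                      # expected counts, M x K x V
    P_w_z = nz.sum(axis=0)
    P_w_z /= P_w_z.sum(axis=1, keepdims=True)
    P_z_d = nz.sum(axis=2)
    P_z_d /= P_z_d.sum(axis=1, keepdims=True)

print(P_z_d.round(2))
```

After each iteration both parameter sets remain valid probability distributions, and the likelihood is non-decreasing until a local optimum.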
Latent Dirichlet Allocation (1/2)
• LDA [Blei et al., 2003]
• Adds a Dirichlet prior on the per-document topic distribution
• 3-level scheme: corpus, documents, and terms
• Terms are the only observed variables
[Plate diagram annotations]
  • Outer plate: for each doc in a collection of N docs
  • Inner plate: for each word position in a doc of length M
  • zij: topic assignment to the word at position i in doc dj
  • wij: word token at position i in doc dj
  • θ: per-document topic distribution
  • β: per-topic word distribution
[Moens and Vulic, Tutorial @WSDM 2014]
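The LDA generative process described by this scheme can be sketched directly: draw a per-document topic distribution from the Dirichlet prior, then for each word position draw a topic and a word (toy sizes; the α and η values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
K, V = 2, 6                   # topics, vocabulary size
alpha = np.full(K, 0.5)       # Dirichlet prior on per-document topic dists
eta = np.full(V, 0.5)         # Dirichlet prior on per-topic word dists

# Per-topic word distributions beta, drawn once for the whole corpus (K x V).
beta = rng.dirichlet(eta, size=K)

def generate_doc(m_words):
    """One document: theta ~ Dir(alpha), then z ~ Cat(theta), w ~ Cat(beta_z)."""
    theta = rng.dirichlet(alpha)            # per-document topic distribution
    doc = []
    for _ in range(m_words):
        z = rng.choice(K, p=theta)          # topic assignment for this position
        w = rng.choice(V, p=beta[z])        # observed word token
        doc.append(w)
    return doc

print(generate_doc(8))
```

Only the word tokens returned by `generate_doc` are observed; θ, β, and every z are latent and must be inferred.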
Latent Dirichlet Allocation (2/2)
• Meaning of Dirichlet priors:
  • θ ~ Dir(α1, …, αK)
  • Each αk is a prior observation count for the number of times topic zk is sampled in a document, before any word observations
  • Analogously for ηi, with β ~ Dir(η1, …, ηV)
• Inference for a new document: given α, β, η, infer θ
• Exact inference is intractable; training is done through:
  • Gibbs sampling
  • Variational inference
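The effect of the Dirichlet concentration parameters can be illustrated by sampling: small α yields sparse per-document topic mixtures (a few dominant topics), while large α yields smooth, even mixtures (the specific values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
K = 5   # number of topics

# Small alpha: samples concentrate probability mass on few topics.
sparse = rng.dirichlet(np.full(K, 0.1), size=1000)
# Large alpha: samples spread probability mass evenly across topics.
smooth = rng.dirichlet(np.full(K, 10.0), size=1000)

# Average weight of the single largest topic per sampled mixture.
print(sparse.max(axis=1).mean())   # near 1: one topic dominates
print(smooth.max(axis=1).mean())   # much lower: mass is spread out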
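The effect of the Dirichlet concentration parameters can be illustrated by sampling: small α yields sparse per-document topic mixtures (a few dominant topics), while large α yields smooth, even mixtures (the specific values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
K = 5   # number of topics

# Small alpha: samples concentrate probability mass on few topics.
sparse = rng.dirichlet(np.full(K, 0.1), size=1000)
# Large alpha: samples spread probability mass evenly across topics.
smooth = rng.dirichlet(np.full(K, 10.0), size=1000)

# Average weight of the single largest topic per sampled mixture.
print(sparse.max(axis=1).mean())   # near 1: one topic dominates
print(smooth.max(axis=1).mean())   # much lower: mass is spread out
```

This is why α below 1 is a common choice when documents are expected to be about only a few topics.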