Topic Models in Text Processing IR Group Meeting Presented by Qiaozhu Mei.
Topic Models in Text Processing
IR Group Meeting
Presented by Qiaozhu Mei
Topic Models in Text Processing
• Introduction
• Basic Topic Models
– Probabilistic Latent Semantic Indexing
– Latent Dirichlet Allocation
– Correlated Topic Models
• Variations and Applications
Overview
• Motivation:
– Model the topics/subtopics in text collections
• Basic Assumptions:
– There are k topics in the whole collection
– Each topic is represented by a multinomial distribution over the vocabulary (language model)
– Each document can cover multiple topics
• Applications
– Summarizing topics
– Predicting topic coverage for documents
– Modeling the topic correlations
Basic Topic Models
• Unigram model
• Mixture of unigrams
• Probabilistic LSI
• LDA
• Correlated Topic Models
Unigram Model
• There is only one topic in the collection
• Estimation:
– Maximum likelihood estimation
• Application: LMIR,

$$p(\mathbf{w}) = \prod_{n=1}^{N} p(w_n)$$

M: number of documents in the collection
N: number of words in w
w: a document
w_n: a word in w
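The maximum likelihood estimate for the unigram model is simply the relative frequency of each word in the collection. A minimal sketch, assuming documents are given as lists of tokens; the function names `fit_unigram` and `log_prob` and the toy documents are illustrative, not from the slides:

```python
import math
from collections import Counter

def fit_unigram(docs):
    """MLE of p(w): relative frequency of each word over the whole collection."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def log_prob(doc, p):
    """log p(w) = sum_n log p(w_n); -inf if a word was never seen (no smoothing)."""
    return sum(math.log(p[w]) if w in p else float("-inf") for w in doc)

# Toy collection (hypothetical data for illustration only).
docs = [["topic", "model", "text"], ["text", "mining", "text"]]
p = fit_unigram(docs)
```

Because the estimate is unsmoothed, any document containing an unseen word gets probability zero, which is why retrieval applications usually add smoothing.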
Mixture of unigrams
• There are k topics in the collection, but each document covers only one topic
• Estimation: MLE and EM algorithm
• Application: simple document clustering

$$p(\mathbf{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n \mid z)$$
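Given fitted parameters, the mixture-of-unigrams likelihood and the posterior over a document's single topic (the quantity used for clustering) can be computed in log space. This is a sketch under the assumption that `prior` holds p(z) and `topics[z]` maps each word to p(w|z); these names and the toy numbers are hypothetical:

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def doc_log_joint(doc, prior, topics):
    """For every topic z: log p(z) + sum_n log p(w_n | z)."""
    return [math.log(prior[z]) + sum(math.log(topics[z][w]) for w in doc)
            for z in range(len(prior))]

def doc_log_likelihood(doc, prior, topics):
    """log p(w) = log sum_z p(z) prod_n p(w_n | z)."""
    return log_sum_exp(doc_log_joint(doc, prior, topics))

def topic_posterior(doc, prior, topics):
    """p(z | w), i.e. a soft cluster assignment for the document."""
    joint = doc_log_joint(doc, prior, topics)
    norm = log_sum_exp(joint)
    return [math.exp(j - norm) for j in joint]

# Toy parameters (hypothetical): two topics over a two-word vocabulary.
prior = [0.5, 0.5]
topics = [{"ball": 0.8, "vote": 0.2}, {"ball": 0.2, "vote": 0.8}]
doc = ["ball", "ball", "vote"]
post = topic_posterior(doc, prior, topics)
```

In EM, `topic_posterior` would be the E-step; the M-step would re-estimate `prior` and `topics` from posterior-weighted counts.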
Probabilistic Latent Semantic Indexing
• (Thomas Hofmann ’99): each document is generated from more than one topic, with a set of document-specific mixture weights {p(z|d)} over k topics.
• These mixture weights are considered as fixed parameters to be estimated.
• Also known as the aspect model.
• No prior knowledge about topics is required; context and term co-occurrences are exploited.
PLSI (cont.)
• Assume uniform p(d)
• Parameter Estimation:
– Parameters: the mixture weights π = {p(z|d)} and the topic word distributions {p(w|z)}
– Maximizing log-likelihood using EM algorithm

$$p(d, w_n) = p(d) \sum_{z} p(w_n \mid z)\, p(z \mid d)$$

d: document
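The EM updates for PLSI can be sketched in a few lines. This is an illustrative sketch, not the presenter's implementation: it assumes a dense count matrix `counts[d][w]`, non-empty documents, random initialization, and a fixed number of iterations; `plsi_em` and the toy counts are made-up names and data. The constant p(d) term is dropped from the tracked log-likelihood because p(d) is assumed uniform.

```python
import math
import random

def plsi_em(counts, k, iters=50, seed=0):
    """Fit PLSI by EM. counts[d][w] = n(d, w).
    Returns (p_z_d, p_w_z, trace) where trace holds the log-likelihood per iteration."""
    rng = random.Random(seed)
    D, V = len(counts), len(counts[0])

    def normalize(row):
        s = sum(row)
        return [x / s for x in row]

    # Random positive initialization of p(z|d) and p(w|z).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(k)]) for _ in range(D)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(V)]) for _ in range(k)]
    trace = []
    for _ in range(iters):
        new_zd = [[0.0] * k for _ in range(D)]
        new_wz = [[0.0] * V for _ in range(k)]
        ll = 0.0
        for d in range(D):
            for w in range(V):
                if counts[d][w] == 0:
                    continue
                # E-step: p(z | d, w) is proportional to p(w|z) p(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(k)]
                total = sum(joint)
                ll += counts[d][w] * math.log(total)
                for z in range(k):
                    r = counts[d][w] * joint[z] / total
                    new_zd[d][z] += r  # expected count of topic z in document d
                    new_wz[z][w] += r  # expected count of word w under topic z
        # M-step: renormalize the expected counts.
        p_z_d = [normalize(row) for row in new_zd]
        p_w_z = [normalize(row) for row in new_wz]
        trace.append(ll)
    return p_z_d, p_w_z, trace

# Toy counts (hypothetical): 4 documents over a 4-word vocabulary.
counts = [[4, 3, 0, 0], [5, 2, 0, 1], [0, 0, 3, 4], [0, 1, 4, 4]]
p_z_d, p_w_z, trace = plsi_em(counts, k=2)
```

Note that `p_z_d` has one row per training document, so the parameter count grows with the collection and there is no row for an unseen document.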
Problem of PLSI
• Mixture weights are considered as document specific, thus no natural way to assign probability to a previously unseen document.
• The number of parameters to be estimated grows linearly with the size of the training set, so the model tends to overfit the data; EM estimation also suffers from multiple local maxima.
• Not a fully generative model of documents.
Latent Dirichlet Allocation
• (Blei et al ‘03) Treats the topic mixture weights as a k-parameter hidden random variable (a multinomial) and places a Dirichlet prior on the multinomial mixing weights. This is sampled once per document.
• The weights for word multinomial distributions are still considered as fixed parameters to be estimated.
• For a fuller Bayesian approach, one can also place a Dirichlet prior on these word multinomial distributions to smooth the probabilities (like Dirichlet smoothing).
Dirichlet Distribution
• Bayesian view: nothing is fixed; every quantity is treated as random.
• Choosing a prior for the multinomial distribution of the topic mixture weights: the Dirichlet distribution is conjugate to the multinomial distribution, which makes it a natural choice.
• A k-dimensional Dirichlet random variable can take values in the (k-1)-simplex, and has the following probability density on this simplex:
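
$$p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)}\, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1} \qquad (1)$$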
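To make the Dirichlet density concrete, the sketch below evaluates its log-density and draws samples using only the Python standard library. Sampling by normalizing independent Gamma draws is a standard construction; the function names and the example values of α are assumptions for illustration:

```python
import math
import random

def dirichlet_log_pdf(theta, alpha):
    """Log-density of Dirichlet(alpha) at a point theta on the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(t) for a, t in zip(alpha, theta))

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

rng = random.Random(0)
theta = sample_dirichlet([0.5, 0.5, 0.5], rng)  # small alpha favors sparse mixtures
flat = dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])  # uniform over the simplex
```

With all α_i = 1 the density is constant over the simplex; values below 1 push mass toward the corners, so each document concentrates on a few topics.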
The Graph Model of LDA
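
Joint distribution of the topic mixture, topics, and words of one document:

$$p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \qquad (2)$$

Marginal distribution of a document:

$$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta \qquad (3)$$

Probability of the corpus D:

$$p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d$$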
Generating Process of LDA
• Choose N ~ Poisson(ξ)
• Choose θ ~ Dir(α)
• For each of the N words w_n:
– Choose a topic z_n ~ Multinomial(θ)
– Choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n. β is a k by V matrix of word probabilities, where V is the vocabulary size:

$$[\beta]_{k \times V}, \qquad \beta_{ij} = p(w^j = 1 \mid z^i = 1)$$
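The generative process for a single document can be simulated directly. This is a hedged sketch: the document length is passed in as an argument rather than sampled (a simplification), and `generate_document`, the vocabulary, and the values of `alpha` and `beta` are all hypothetical:

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def generate_document(n_words, alpha, beta, vocab, rng):
    """Simulate one document: theta once per document, then (z_n, w_n) per word."""
    theta = sample_dirichlet(alpha, rng)  # topic mixture for this document
    topics, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(len(alpha)), weights=theta)[0]  # z_n ~ Multinomial(theta)
        w = rng.choices(vocab, weights=beta[z])[0]            # w_n ~ p(w | z_n, beta)
        topics.append(z)
        words.append(w)
    return theta, topics, words

# Toy parameters (hypothetical): k = 2 topics, vocabulary of 4 words.
vocab = ["ball", "game", "vote", "law"]
alpha = [0.5, 0.5]
beta = [[0.5, 0.4, 0.05, 0.05],   # row i holds p(w | z = i)
        [0.05, 0.05, 0.5, 0.4]]
rng = random.Random(1)
theta, topics, words = generate_document(5, alpha, beta, vocab, rng)
```

Unlike the mixture of unigrams, the topic is redrawn for every word, so a single document can mix vocabulary from several topics.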
Inference and Parameter Estimation
• Parameters:
– Corpus level: α, β: sampled once in the process of generating the corpus.
– Document level: θ, z: sampled once to generate a document
• Inference: estimation of document-level parameters.
• However, the exact posterior is intractable to compute, so approximate inference is needed.
Variational Inference