Page 1

Hidden Topic Markov Models
Amit Gruber, Michal Rosen-Zvi and Yair Weiss

in AISTATS 2007

Discussion led by Chunping Wang

ECE, Duke University

March 2, 2009

Page 2

Outline

• Motivations

• Related Topic Models

• Hidden Topic Markov Models

• Inference

• Experiments

• Conclusions

Page 3

Motivations

• Feature Reduction

Extensively large text corpora → a small number of variables

• Topical segmentation

Segment a document according to hidden topics

• Word sense disambiguation

Distinguish between different instances of the same word according to the context

Page 4

Related Topic Models

• LDA (JMLR 2003)

1. For $k = 1, \ldots, K$, draw $\beta_k \sim \mathrm{Dirichlet}(\eta)$.

2. For $d = 1, \ldots, D$,

(a) Draw $\theta_d \sim \mathrm{Dirichlet}(\alpha)$.

(b) For $n = 1, \ldots, N_d$, draw $z_n \sim \mathrm{Multinomial}(\theta_d)$.

(c) For $n = 1, \ldots, N_d$, draw $w_n \sim \mathrm{Multinomial}(\beta_{z_n})$.

Words in a document are exchangeable; documents are also exchangeable.
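As a concrete illustration of this generative process (my own sketch, not from the slides; the symmetric hyperparameters `alpha` and `eta` and the toy dimensions are assumptions), a minimal numpy sampler might look like this:

```python
import numpy as np

def sample_lda_corpus(D=5, K=3, J=20, N_d=30, alpha=0.1, eta=0.01, seed=0):
    """Sample a toy corpus from the LDA generative process.

    D: documents, K: topics, J: vocabulary size, N_d: words per document,
    alpha/eta: symmetric Dirichlet hyperparameters.
    """
    rng = np.random.default_rng(seed)
    # 1. For k = 1..K, draw a topic-word distribution beta_k ~ Dirichlet(eta).
    beta = rng.dirichlet(np.full(J, eta), size=K)          # shape (K, J)
    corpus = []
    for d in range(D):
        # 2(a). Draw the document's topic proportions theta_d ~ Dirichlet(alpha).
        theta_d = rng.dirichlet(np.full(K, alpha))
        # 2(b). Draw a topic z_n for every word position, independently.
        z = rng.choice(K, size=N_d, p=theta_d)
        # 2(c). Draw each word from its topic's word distribution.
        w = np.array([rng.choice(J, p=beta[z_n]) for z_n in z])
        corpus.append((z, w))
    return beta, corpus
```

The "bag-of-words" assumption shows up in step 2(b): each $z_n$ is drawn independently of its neighbors, which is exactly what HTMM later relaxes.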


Page 5

Related Topic Models

• Dynamic Topic Models (ICML 2006)

Words in a document are exchangeable; documents are not exchangeable.

Page 6

Related Topic Models

• Topic Modeling: Beyond Bag of Words (ICML 2006)

Words in a document are not exchangeable; documents are exchangeable.


Page 7

Related Topic Models

• Integrating Topics and Syntax (NIPS 2005)

Words in a document are not exchangeable; documents are exchangeable.

[Figure: a composite model in which an HMM generates non-semantic (syntactic) words and one of its states defers to LDA for semantic words.]

Page 8

Hidden Topic Markov Models

No topic transition is allowed within a sentence. Whenever a new sentence starts, either the old topic is kept or a new topic is drawn according to $\theta_d$.
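A minimal sketch of this per-document generative process (my own illustration, not code from the paper; `epsilon` denotes the probability of switching topics at a sentence boundary, and `theta_d`, `beta` are as in LDA):

```python
import numpy as np

def sample_htmm_document(sentence_lengths, theta_d, beta, epsilon, seed=0):
    """Sample one document from the HTMM generative process.

    sentence_lengths: list of sentence lengths (in words),
    theta_d: document topic proportions (K,), beta: topic-word matrix (K, J),
    epsilon: probability of drawing a new topic at a sentence boundary.
    """
    rng = np.random.default_rng(seed)
    K, J = beta.shape
    topics, words = [], []
    z = rng.choice(K, p=theta_d)                # first topic drawn from theta_d
    for s, length in enumerate(sentence_lengths):
        if s > 0 and rng.random() < epsilon:    # at a sentence start, with prob. epsilon,
            z = rng.choice(K, p=theta_d)        # redraw the topic from theta_d
        for _ in range(length):                 # no topic transition within a sentence
            topics.append(z)
            words.append(rng.choice(J, p=beta[z]))
    return np.array(topics), np.array(words)
```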


Page 9

Hidden Topic Markov Models

Viewed as an HMM over topics, the transition matrix at word $n$ depends on the indicator $\psi_n$:

Transition matrices

$\psi_n = 0$ (within a sentence, or no transition between two sentences, with probability $1 - \epsilon$):

$$A_{\psi_n = 0} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}_{K \times K}$$

$\psi_n = 1$ (a transition occurs between two sentences, with probability $\epsilon$): every row equals $\theta_d$,

$$A_{\psi_n = 1} = \begin{pmatrix} \theta_1 & \theta_2 & \cdots & \theta_K \\ \vdots & \vdots & & \vdots \\ \theta_1 & \theta_2 & \cdots & \theta_K \end{pmatrix}_{K \times K}$$

Emission matrix: $\beta$.

Initial state distribution: $\theta_d$.
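As a compact summary (my own rewriting, marginalizing over $\psi_n$; here $\epsilon_n = \epsilon$ at a sentence start and $\epsilon_n = 0$ within a sentence), the effective topic-transition kernel at word $n$ can be written as

$$p(z_n = k \mid z_{n-1} = k') = (1 - \epsilon_n)\,\delta_{k',k} + \epsilon_n\,\theta_{d,k}, \qquad A_n = (1 - \epsilon_n)\, I_K + \epsilon_n\, \mathbf{1}\,\theta_d^{\top}.$$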

Page 10

Inference

EM algorithm:

• E-step

Compute the posterior distribution of the hidden variables $(z_n, \psi_n)$ using the forward-backward algorithm;

• M-step

Re-estimate $\theta_d$, $\beta$ and $\epsilon$ by maximizing the expected complete-data log likelihood.

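To make the E-step concrete, here is a minimal numpy sketch (my illustration, not the authors' code) of scaled forward-backward over the topic chain, using the marginalized transition kernel written above; it returns the posterior topic marginals and the document log likelihood. The full E-step also needs expectations involving $\psi_n$, omitted here for brevity.

```python
import numpy as np

def forward_backward_topics(w, eps_n, theta_d, beta):
    """Posterior topic marginals p(z_n | w_1..N) for one document.

    w: word indices (N,), eps_n: per-word switch probabilities (N,),
    theta_d: topic proportions (K,), beta: topic-word matrix (K, J).
    """
    N, K = len(w), len(theta_d)
    like = beta[:, w].T                       # (N, K) emission likelihoods p(w_n | z_n = k)
    alpha = np.zeros((N, K))                  # scaled forward messages
    c = np.zeros(N)                           # per-word normalization constants
    alpha[0] = theta_d * like[0]              # initial state distribution is theta_d
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for n in range(1, N):
        # stay on the old topic with prob (1 - eps_n), redraw from theta_d with prob eps_n
        pred = (1 - eps_n[n]) * alpha[n - 1] + eps_n[n] * theta_d
        alpha[n] = pred * like[n]
        c[n] = alpha[n].sum(); alpha[n] /= c[n]
    beta_msg = np.ones((N, K))                # scaled backward messages
    for n in range(N - 2, -1, -1):
        tmp = like[n + 1] * beta_msg[n + 1]
        # transpose of the same mixture transition kernel
        beta_msg[n] = ((1 - eps_n[n + 1]) * tmp + eps_n[n + 1] * (theta_d @ tmp)) / c[n + 1]
    post = alpha * beta_msg
    return post / post.sum(axis=1, keepdims=True), np.log(c).sum()
```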

Page 11

Experiments

• NIPS dataset (1740 documents, 1557 for training, 183 for testing)

– Data preprocessing

Extract words in the vocabulary (J = 12113, no stop words);

Divide the text into sentences according to ".?!;".

– Compare LDA, HTMM and VHTMM1 in terms of perplexity

VHTMM1: a variant of HTMM with $\epsilon = 1$, a "bag of sentences";

Ntest: the total length of the test document;

N: the first N words of the document are observed.

Average Ntest=1300

Page 12

Experiments

[Perplexity comparison plots: K = 100; N = 10.]

The lower the perplexity, the better the model predicts unseen words.
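For reference, with $N_{\mathrm{test}}$ and $N$ as defined on the previous slide, the per-word perplexity of the held-out portion of a test document can be written (a standard definition, not a formula copied from the slides) as

$$\mathrm{perplexity} = \exp\!\left(-\frac{1}{N_{\mathrm{test}} - N} \sum_{n = N+1}^{N_{\mathrm{test}}} \log p(w_n \mid w_1, \ldots, w_N)\right).$$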

Page 13

Experiments

– Topical segmentation

[Figures: topical segmentation of a test document by HTMM and by LDA.]

Page 14

Experiments

– Top words of topics

[Tables: top words of topics learned by HTMM and by LDA (math, acknowledgments, reference).]

Page 15

Experiments

As more topics are available, the topics become more specific and topic transitions are more frequent.

Page 16

Experiments

• Two toy datasets, generated using HTMM and LDA.

Goal: to rule out the possibility that HTMM's perplexity is lower than LDA's only because it has fewer degrees of freedom. With toy datasets, other criteria can be used for comparison.

Page 17

Conclusions

• HTMM is another extension of LDA, which relaxes the "bag-of-words" assumption by modeling the topic dynamics with a Markov chain.

• This extension leads to a significant improvement in perplexity, and makes additional inferences possible, such as topical segmentation and word sense disambiguation.

• It requires more storage, since the entire document has to be provided to the algorithm as input.

• It only applies to structured data, where sentences are well defined.