Topic Models Presented by Iulian Pruteanu Friday, July 28 th, 2006.

Topic Models

Presented by Iulian Pruteanu

Friday, July 28th, 2006

Page 2

Outline

1. Introduction

2. Exchangeable topic models (L. Fei-Fei and P. Perona, CVPR 2005)

3. Dynamic topic models (D. Blei and J. Lafferty, ICML 2006)

Page 3

Introduction

Topic models – tools for automatically organizing, searching and browsing large collections (documents, images, etc.)

Topic models – the discovered patterns often reflect the underlying topics that, combined, form the corpus.

Exchangeable (static) topic models – the words (patches) of each document (image) are assumed to be drawn independently from a mixture of multinomials; the mixture components (topics) are shared by all documents.

Dynamic topic models – capture the evolution of topics in a sequentially organized corpus of documents (images).

Page 4

Exchangeable topic models (CVPR 2005)

Used for learning natural scene categories.

A key idea is to use intermediate representations (themes) before classifying scenes.

Avoids the need for manually labeled or segmented training images.

Local regions are first clustered into different intermediate themes, and then into categories. No supervision is needed apart from a single category label for each training image.

• the algorithm provides a principled approach to learning relevant intermediate representations of scenes, without supervision

• the model is able to group categories of images into a sensible hierarchy

Page 5

Exchangeable topic models (CVPR 2005)

Page 6

Exchangeable topic models (CVPR 2005)

Page 7

Exchangeable topic models (CVPR 2005)

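a patch $x$ is the basic unit of an image; it is one of $T$ codewords, written as a $T$-dim unit vector

an image is a sequence of $N$ patches, $\mathbf{x} = (x_1, \ldots, x_N)$

a category is a collection of $I$ images

$K$ is the total number of themes

$z_n$: intermediate themes ($K$-dim unit vectors)

$T$ is the total number of codewords

$\theta$: a $C \times K$ matrix; row $\theta_c$ holds the Dirichlet parameters of the theme mixture for category $c$ ($C$ = number of categories)

$\beta$: a $K \times T$ matrix; row $\beta_k$ is the distribution of theme $k$ over the codewords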
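A minimal sketch of how these quantities fit together, assuming the generative process of Fei-Fei & Perona: pick a category $c$, draw a theme mixture from $\mathrm{Dir}(\theta_c)$, then for each patch draw a theme $z_n$ from that mixture and a codeword from $\beta_{z_n}$. The name `pi` for the theme mixture, the function names and the toy sizes are illustrative assumptions; codewords are returned as indices in `0..T-1` rather than as $T$-dim unit vectors.

```python
import random

def sample_dirichlet(alpha, rng):
    # Dirichlet draw via independent Gamma(alpha_k, 1) variables, then normalize
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def sample_categorical(probs, rng):
    # Index i drawn with probability proportional to probs[i]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_image(c, theta, beta, n_patches, rng):
    """Generate one image of category c (hypothetical helper).

    theta: C x K Dirichlet parameters (row c is used for category c)
    beta:  K x T theme-to-codeword probabilities
    Returns (pi, themes, codewords)."""
    pi = sample_dirichlet(theta[c], rng)          # theme mixture of this image
    themes, codewords = [], []
    for _ in range(n_patches):
        z = sample_categorical(pi, rng)           # theme z_n of patch n
        x = sample_categorical(beta[z], rng)      # codeword of patch n
        themes.append(z)
        codewords.append(x)
    return pi, themes, codewords

if __name__ == "__main__":
    rng = random.Random(0)
    theta = [[1.0, 1.0], [5.0, 0.5]]              # C = 2 categories, K = 2 themes
    beta = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]     # K = 2 themes, T = 3 codewords
    pi, z, x = sample_image(1, theta, beta, n_patches=10, rng=rng)
    print(pi, z, x)
```

Category 1 in the toy setup puts most Dirichlet mass on theme 0, so its images tend to be dominated by codeword 0.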

Page 8

Exchangeable topic models (CVPR 2005)

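Bayesian decision: $c^* = \arg\max_c \, p(c \mid \mathbf{x}, \theta, \beta, \eta) = \arg\max_c \, p(\mathbf{x} \mid c, \theta, \beta)\, p(c \mid \eta)$, where $\eta$ parameterizes the prior over categories.

For convenience, $p(c \mid \eta)$ is always assumed to be a fixed uniform distribution, so the decision reduces to $c^* = \arg\max_c \, p(\mathbf{x} \mid c, \theta, \beta)$.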
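The decision rule can be sketched as follows, assuming the per-category log-likelihoods $\log p(\mathbf{x} \mid c, \theta, \beta)$ have already been computed (in practice they come from an approximation, since the exact likelihood is intractable). The function name and the numbers are hypothetical; the point is that a uniform prior adds the same constant to every category and so leaves the argmax unchanged.

```python
import math

def classify(log_likelihoods, log_prior=None):
    """Return (best category index, posterior over categories).

    log_likelihoods[c] stands for log p(x | c, theta, beta);
    log_prior[c] stands for log p(c | eta). Uniform prior if none is given."""
    C = len(log_likelihoods)
    if log_prior is None:
        log_prior = [-math.log(C)] * C            # uniform p(c | eta) = 1 / C
    scores = [ll + lp for ll, lp in zip(log_likelihoods, log_prior)]
    m = max(scores)                               # shift by the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    posterior = [w / total for w in weights]
    best = max(range(C), key=lambda c: scores[c])
    return best, posterior

best, posterior = classify([-1200.5, -1187.2, -1193.0])
print(best)  # 1
```

Working in log space matters here: likelihoods of hundreds of patches underflow in ordinary floating point.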

Page 9

Exchangeable topic models (CVPR 2005)

Learning: Variational inference:
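The slide's equations did not survive extraction, so as an illustration here is a sketch of the standard per-document mean-field updates of latent Dirichlet allocation (Blei, Ng and Jordan 2003), which this theme model builds on: $\phi_{n,k} \propto \beta_{k,w_n} \exp(\psi(\gamma_k))$ and $\gamma_k = \alpha_k + \sum_n \phi_{n,k}$. Under the assumption that the theme model reuses these updates, $\alpha$ would be the row $\theta_c$ of the category being fit; this is not a reproduction of the paper's derivation. The digamma function $\psi$ is approximated by a hand-written series because the Python standard library does not provide one.

```python
import math

def digamma(x):
    """psi(x) for x > 0: recurrence up to x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:                                # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f * (1/252 - f * (1/240 - f / 132))))
    return result

def e_step(words, alpha, beta, n_iter=50, tol=1e-6):
    """Mean-field updates for one document (beta entries assumed positive).

    words: codeword indices w_1..w_N; alpha: K Dirichlet parameters;
    beta: K x T theme-word probabilities. Returns (gamma, phi)."""
    K, N = len(alpha), len(words)
    gamma = [a + N / K for a in alpha]            # common initialization
    phi = [[1.0 / K] * K for _ in range(N)]
    for _ in range(n_iter):
        # exp(psi(sum(gamma))) is the same for every k, so it cancels on normalizing
        exp_psi = [math.exp(digamma(g)) for g in gamma]
        for n, w in enumerate(words):
            row = [beta[k][w] * exp_psi[k] for k in range(K)]
            s = sum(row)
            phi[n] = [r / s for r in row]
        new_gamma = [alpha[k] + sum(phi[n][k] for n in range(N)) for k in range(K)]
        delta = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if delta < tol:
            break
    return gamma, phi
```

Learning then alternates this step over all training images with a re-estimation of $\beta$ (and of the Dirichlet parameters) from the accumulated $\phi$ statistics.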

Page 10

Exchangeable topic models (CVPR 2005)

Features and codebook:

1. Evenly sampled grid

2. Random sampling

3. Kadir & Brady saliency detector

4. Lowe’s DoG detector
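The two simplest patch detectors in this list can be sketched as below, together with the step that turns each patch into a codeword by nearest-neighbour lookup in the codebook. The function names, grid spacing and toy descriptors are assumptions for illustration; the codebook is taken as given (it would typically be learned beforehand, for example by k-means on training descriptors), and the saliency and DoG detectors are not reproduced.

```python
import random

def grid_centers(width, height, step):
    """Patch centres on an evenly sampled grid (detector 1)."""
    return [(x, y) for y in range(step // 2, height, step)
                   for x in range(step // 2, width, step)]

def random_centers(width, height, n, rng):
    """n patch centres drawn uniformly at random (detector 2)."""
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(n)]

def nearest_codeword(descriptor, codebook):
    """Index t of the codeword closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(descriptor, c))
    return min(range(len(codebook)), key=lambda t: dist2(codebook[t]))

print(grid_centers(40, 20, 10))  # 8 centres, from (5, 5) to (35, 15)
```

Each image then becomes a sequence of codeword indices, which is exactly the input the theme model expects.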

Page 11

Exchangeable topic models (CVPR 2005)

Experimental setup and results:

A model for each category was obtained from the training images.

Page 12

Exchangeable topic models (CVPR 2005)

Experimental setup and results:

Page 13

Exchangeable topic models (CVPR 2005)

Experimental setup and results:

Page 14

Dynamic topic models (ICML 2006)

Topic models – tools for automatically organizing, searching and browsing large collections (documents, images, etc.)

Topic models – the discovered patterns often reflect the underlying topics that, combined, form documents.

Exchangeable (static) topic models – the words (patches) of each document (image) are assumed to be drawn independently from a mixture of multinomials; the mixture components (topics) are shared by all documents.

Dynamic topic models – capture the evolution of topics in a sequentially organized corpus of documents (images).

Page 15

Dynamic topic models (ICML 2006)

Static topic model review:

Each document (image) is assumed to be drawn from the following generative process:

1. choose topic proportions $\theta$ from a distribution over the $(K-1)$-simplex, such as a Dirichlet

2. for each word (patch):

- choose a topic assignment $Z \sim \mathrm{Mult}(\theta)$

- choose a word (patch) $x \sim \mathrm{Mult}(\beta_Z)$, where $\beta_k$ is topic $k$'s distribution over words (patches)

This process assumes that images (documents) are drawn exchangeably from the same set of topics.

In a dynamic topic model, we suppose that the data is divided by time slice, for example by year. The images of each slice are modeled with a K-component topic model, where the topics associated with slice t evolve from the topics associated with slice t-1.
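Given $\theta$, summing out the topic assignment makes each patch an independent draw from a mixture of multinomials, $p(x = t \mid \theta) = \sum_k \theta_k \beta_{k,t}$. The sketch below computes this and the resulting log-probability of a document; because it is a sum over patches, reordering the patches leaves it unchanged, which is the exchangeability assumption in concrete form. The function names and toy numbers are illustrative.

```python
import math

def word_prob(t, theta, beta):
    """p(x = t | theta) = sum_k theta_k * beta[k][t]  (topic Z summed out)."""
    return sum(th * b[t] for th, b in zip(theta, beta))

def doc_log_prob(words, theta, beta):
    """log p(x_1, ..., x_N | theta) for patches drawn independently given theta."""
    return sum(math.log(word_prob(t, theta, beta)) for t in words)

theta = [0.25, 0.75]                       # topic proportions, K = 2
beta = [[0.5, 0.5, 0.0], [0.1, 0.2, 0.7]]  # K = 2 topics over 3 words
print(word_prob(0, theta, beta))           # approximately 0.2
print(doc_log_prob([0, 2, 2, 1], theta, beta), doc_log_prob([2, 1, 0, 2], theta, beta))
```

The two printed log-probabilities agree up to floating-point rounding, since only word counts matter under this model.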

Page 16

Dynamic topic models (ICML 2006)

Dynamic topic models:

Extension of the logistic normal distribution to time-series simplex data
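In the dynamic topic model each topic is a vector of natural parameters $\beta_{t,k}$ that follows a Gaussian random walk, $\beta_{t,k} \mid \beta_{t-1,k} \sim \mathcal{N}(\beta_{t-1,k}, \sigma^2 I)$, and is mapped onto the simplex by the logistic map $\pi(\beta)_w = \exp(\beta_w) / \sum_v \exp(\beta_v)$; the mean $\alpha_t$ of the logistic normal topic proportions drifts in the same way. The sketch below simulates one topic's chain and draws proportions for a single document; $\sigma$, $a$, the vocabulary size and the function names are assumptions for illustration.

```python
import math
import random

def softmax(v):
    """Logistic map pi(v)_w = exp(v_w) / sum_u exp(v_u), shifted by max(v) for stability."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def evolve_topic(beta0, n_slices, sigma, rng):
    """Random walk beta_t = beta_{t-1} + noise, noise ~ N(0, sigma^2 I).

    Starts from natural parameters beta0 at slice 0 and returns the word
    distribution softmax(beta_t) for slices t = 1..n_slices."""
    beta = list(beta0)
    dists = []
    for _ in range(n_slices):
        beta = [b + rng.gauss(0.0, sigma) for b in beta]
        dists.append(softmax(beta))
    return dists

def sample_proportions(alpha_t, a, rng):
    """Logistic normal proportions: eta ~ N(alpha_t, a^2 I), theta = softmax(eta)."""
    return softmax([m + rng.gauss(0.0, a) for m in alpha_t])

if __name__ == "__main__":
    rng = random.Random(0)
    for t, d in enumerate(evolve_topic([0.0] * 5, n_slices=4, sigma=0.3, rng=rng), start=1):
        print(t, [round(p, 3) for p in d])
```

A small $\sigma$ keeps consecutive word distributions close, which is what lets a topic change gradually over the years instead of being re-estimated independently in each slice.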

Page 17

Dynamic topic models (ICML 2006)

Approximate inference:

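In the dynamic topic model, the latent variables are the topics $\{\beta_{t,k}\}_{t=1,k=1}^{T,K}$, the mixture proportions $\{\theta_{t,j}\}_{t=1,j=1}^{T,C}$ and the topic indicators $\{z_{t,j,n}\}_{t=1,j=1,n=1}^{T,C,N}$, where $t$ indexes time slices, $j$ documents (images) within a slice and $n$ words (patches) within a document.

The authors optimize the free parameters of a distribution over the latent variables so that this distribution is close, in Kullback-Leibler (KL) divergence, to the true posterior.

The full derivations are given in the paper.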
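Why minimizing the KL divergence to an intractable posterior is feasible can be checked on a toy model: for any $q(z)$, $\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}(q(z) \,\|\, p(z \mid x))$ with $\mathrm{ELBO}(q) = \sum_z q(z)\,[\log p(x, z) - \log q(z)]$. Since $\log p(x)$ does not depend on $q$, maximizing the ELBO, which needs only the joint, is equivalent to minimizing the KL divergence. The three-state joint below is made up for illustration and is not part of the dynamic topic model itself.

```python
import math

def elbo(q, joint):
    """sum_z q(z) [log p(x, z) - log q(z)] for a fixed observation x."""
    return sum(qz * (math.log(pz) - math.log(qz)) for qz, pz in zip(q, joint) if qz > 0)

def kl(q, p):
    """KL(q || p) for discrete distributions."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)

joint = [0.02, 0.05, 0.03]                 # hypothetical p(x, z) for z in {0, 1, 2}
evidence = sum(joint)                      # p(x)
posterior = [j / evidence for j in joint]  # p(z | x)

for q in ([1/3, 1/3, 1/3], [0.1, 0.6, 0.3], posterior):
    gap = math.log(evidence) - elbo(q, joint)
    # gap equals KL(q || posterior) for every q; both vanish when q is the posterior
    print(f"ELBO={elbo(q, joint):.4f}  KL={kl(q, posterior):.4f}  gap={gap:.4f}")
```

In the dynamic topic model the same identity holds with a structured variational family over the topics, proportions and indicators in place of the three-entry $q$.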

Page 18

Dynamic topic models (ICML 2006)

Experimental setup and results:

A subset of 30,000 articles from the journal “Science”, 250 from each of the 120 years between 1881 and 1999.

The corpus is made up of approximately 7.5 million words. To explore the corpus and its themes, a 20-component dynamic topic model was estimated.

Page 19

Dynamic topic models (ICML 2006)

Page 20

Dynamic topic models (ICML 2006)

Discussion:

A sequential topic model for discrete data was developed by using Gaussian time series on the natural parameters of the multinomial topics and logistic normal topic proportion models.

The most promising extension to the method presented here is to incorporate a model of how new topics in the collection appear or disappear over time, rather than assuming a fixed number of topics.

Page 21

References:

1. Blei, D., Ng, A., and Jordan, M. (JMLR 2003) – “Latent Dirichlet allocation”

2. Blei, D. and Lafferty, J. D. (NIPS 2006) – “Correlated topic models”

3. Fei-Fei, L. and Perona, P. (IEEE CVPR 2005) – “A Bayesian hierarchical model for learning natural scene categories”

4. Blei, D. and Lafferty, J. D. (ICML 2006) – “Dynamic topic models”