
Latent Dirichlet Allocation

D. Blei, A. Ng, M. Jordan

Presented by Nicolas Loeff. Includes some slides adapted from J. Ramos (Rutgers), M. Steyvers and M. Rosen-Zvi (UCI), and L. Fei-Fei (UIUC).

Overview

What is so special about text?
Classification methods
LSI
Unigram / Mixture of Unigrams
Probabilistic LSI (Aspect Model)
LDA model
Geometric interpretation

What is so special about text?

No obvious relation between features.

High dimensionality (the vocabulary size V is often larger than the number of documents!).

Importance of speed.

The need for dimensionality reduction.

Representation: documents as vectors in word space, the 'bag of words' representation (see the sketch below).

It is a sparse representation, V >> |D|.

A need to define conceptual closeness.
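A minimal sketch of the 'bag of words' representation: word order is discarded and each document becomes a sparse vector of word counts. The toy documents and vocabulary are invented for illustration.

```python
from collections import Counter

def bag_of_words(doc, vocab):
    """Map a document to a sparse count vector over a fixed vocabulary."""
    counts = Counter(doc.lower().split())
    return {w: counts[w] for w in vocab if counts[w] > 0}

# Toy corpus (invented for illustration); only counts matter, not word order.
docs = ["the bank raised the loan rate", "the river bank and the stream"]
vocab = sorted({w for d in docs for w in d.lower().split()})
for d in docs:
    print(bag_of_words(d, vocab))
```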

Bag of words

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

Bag of words

The order of words in a document can be ignored; only the counts matter.

Probability theory: exchangeability (includes IID) (Aldous, 1985).

Exchangeable random variables have a representation as a mixture distribution (de Finetti, 1990).

What does this have to do with Vision?

Object → Bag of 'words'

TF-IDF Weighting Scheme (Salton and McGill, 1983)

Given a corpus D, word w, and document d, calculate w_{w,d} = f_{w,d} · log(|D| / f_{w,D}), where f_{w,d} is the count of w in d and f_{w,D} is the number of documents in D that contain w. Many varieties of the basic scheme exist.

Search procedure: scan each d, compute each w_{i,d}, and return the set D' that maximizes Σ_i w_{i,d}.
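A minimal sketch of this weighting scheme in Python, assuming the simple raw-count variant above; the toy corpus is invented for illustration.

```python
import math
from collections import Counter

def tfidf(corpus):
    """Compute w_{w,d} = f_{w,d} * log(|D| / f_{w,D}) for every word in every document."""
    docs = [Counter(d.lower().split()) for d in corpus]
    n_docs = len(docs)
    # f_{w,D}: number of documents that contain word w
    doc_freq = Counter(w for d in docs for w in d)
    return [{w: f * math.log(n_docs / doc_freq[w]) for w, f in d.items()} for d in docs]

corpus = ["the bank raised the loan rate", "the river bank and the stream"]
for weights in tfidf(corpus):
    print(weights)
```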

A Spatial Representation: Latent Semantic Analysis (Deerwester, 1990)

Document/Term count matrix

(Table: word-by-document counts for words such as LOVE, SOUL, RESEARCH, SCIENCE across Doc1, Doc2, Doc3, ..., factorized by the SVD.)

High-dimensional space, though not as high as |V|.

(Figure: words such as SOUL, RESEARCH, LOVE, SCIENCE plotted as points in the reduced semantic space.)

• Each word is a single point in semantic space (dimensionality reduction)
• Similarity is measured by the cosine of the angle between word vectors
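A minimal sketch of this pipeline: a truncated SVD of a toy word-by-document count matrix, then cosine similarity between the resulting word vectors. The counts are invented for illustration.

```python
import numpy as np

# Toy word-by-document count matrix (rows: words, columns: documents); values invented.
words = ["love", "soul", "research", "science"]
counts = np.array([
    [3.0, 0.0, 3.0, 4.0],
    [2.0, 0.0, 1.0, 2.0],
    [0.0, 6.0, 1.0, 0.0],
    [0.0, 5.0, 2.0, 1.0],
])

# Truncated SVD: keep k latent dimensions (k << |V|).
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]  # each word becomes a point in the k-dimensional semantic space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(word_vecs[0], word_vecs[1]))  # love vs soul: similar usage, high cosine expected
print(cosine(word_vecs[0], word_vecs[2]))  # love vs research: different usage, low cosine expected
```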

Feature Vector representation

From: Modeling the Internet and the Web: Probabilistic methods and Algorithms, Pierre Baldi, Paolo Frasconi, Padhraic Smyth

Classification: assigning words to topics

Different models for the data:

Discrete classifier: models the boundaries between different classes of the data; predicts a categorical output (e.g., SVM).

Density estimator: models the distribution of the data points themselves; generative models (e.g., Naive Bayes).

Generative Models – Latent semantic structure

Latent structure → Words

Distribution over words: P(w) = Σ_ℓ P(w, ℓ) = Σ_ℓ P(w | ℓ) P(ℓ)

Inferring latent structure: P(ℓ | w) = P(w | ℓ) P(ℓ) / P(w)

Topic Models

Unsupervised learning of the topics ("gist") of documents: articles/chapters, conversations, emails, ... any verbal context.

Topics are useful latent structures to explain semantic association.

Probabilistic Generative Model

Each document is a probability distribution over topics

Each topic is a probability distribution over words

Generative Process

(Figure: two topics, each a distribution over words, generate two documents; each word is tagged with the topic it was drawn from.)

TOPIC 1: money, loan, bank

TOPIC 2: river, stream, bank

DOCUMENT 1: money1 bank1 bank1 loan1 river2 stream2 bank1 money1 river2 bank1 money1 bank1 loan1 money1 stream2 bank1 money1 bank1 bank1 loan1 river2 stream2 bank1 money1 river2 bank1 money1 bank1 loan1 bank1 money1 stream2 (mixture weights ≈ .8 TOPIC 1, .2 TOPIC 2)

DOCUMENT 2: river2 stream2 bank2 stream2 bank2 money1 loan1 river2 stream2 loan1 bank2 river2 bank2 bank1 stream2 river2 loan1 bank2 stream2 bank2 money1 loan1 river2 stream2 bank2 stream2 bank2 money1 river2 stream2 loan1 bank2 river2 bank2 money1 bank1 stream2 river2 bank2 stream2 bank2 money1 (mixture weights ≈ .3 TOPIC 1, .7 TOPIC 2)

Mixture components: the topics, i.e. distributions over words.

Mixture weights: the per-document distributions over topics.

Bayesian approach: use priors. Mixture weights ~ Dirichlet( α ); mixture components ~ Dirichlet( β ).
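A minimal sketch of this two-topic generative process in Python; the topic distributions and document lengths are illustrative, with the mixture weights taken from the example above.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["money", "loan", "bank", "river", "stream"]
# Mixture components: each topic is a distribution over words (values illustrative).
topics = np.array([
    [0.35, 0.25, 0.40, 0.00, 0.00],   # TOPIC 1: money, loan, bank
    [0.00, 0.00, 0.30, 0.35, 0.35],   # TOPIC 2: river, stream, bank
])

def generate_document(weights, n_words=30):
    """Sample a document: pick a topic per word from the mixture weights, then a word from that topic."""
    z = rng.choice(len(topics), size=n_words, p=weights)       # topic assignment per word
    words = [rng.choice(vocab, p=topics[zi]) for zi in z]      # word drawn from the chosen topic
    return [f"{w}{zi + 1}" for w, zi in zip(words, z)]

print(" ".join(generate_document([0.8, 0.2])))   # mostly TOPIC 1, like DOCUMENT 1
print(" ".join(generate_document([0.3, 0.7])))   # mostly TOPIC 2, like DOCUMENT 2
```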

Vision: Topic = Object categories

Simple Model: Unigram

The words of a document are drawn IID from a single multinomial distribution: P(w) = Π_{n=1..N} P(w_n).

Unigram Mixture Model

First choose a topic z, then generate the words conditionally independently given the topic: P(w) = Σ_z P(z) Π_{n=1..N} P(w_n | z).

Probabilistic Latent Semantic Indexing (Hofmann, 1999)

The document index d in the training set and the word w_n are conditionally independent given the topic.

Not truly generative (d is a dummy random variable). The number of parameters grows with the size of the corpus (overfitting).

A document may contain several topics.
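For reference (not spelled out on the slide), the aspect model underlying pLSI writes each document-word pair as:

P(d, w_n) = P(d) Σ_z P(w_n | z) P(z | d)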

Vision app.: Sivic et al., 2005

(Plate diagram: d → z → w, with N words per document and D documents; the example topic shown is "face".)

LDA

Vision app.: Fei-Fei Li, 2005

(Plate diagram with variables c, π, z, w over N words per image and D images; example scene category: "beach".)

Example: Word density distribution

A geometric interpretation

LDA

Topics are sampled repeatedly within each document (as in pLSI).

But the number of parameters does not grow with the size of the corpus.

Problem: Inference.
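A minimal sketch of the LDA generative process (a Dirichlet-sampled topic mixture per document, then a topic and word per position); the vocabulary size, topic count, and hyperparameter values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, D, N = 5, 2, 3, 20          # vocabulary size, topics, documents, words per document (illustrative)
alpha = np.full(K, 0.5)           # Dirichlet prior on per-document topic mixtures
beta = rng.dirichlet(np.full(V, 0.1), size=K)   # topic-word distributions (sampled once for this sketch)

corpus = []
for _ in range(D):
    theta = rng.dirichlet(alpha)                         # per-document distribution over topics
    z = rng.choice(K, size=N, p=theta)                   # topic assignment for each word position
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # word drawn from the assigned topic
    corpus.append((theta, z, w))

for theta, z, w in corpus:
    print(np.round(theta, 2), w)
```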

LDA - Inference

Coupling between the Dirichlet-distributed variables makes exact inference intractable.

Blei et al., 2001: variational approximation
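For reference, the per-document coordinate-ascent updates of this variational approximation (as derived in the LDA paper), where φ_ni is the variational probability that word n is assigned to topic i and γ is the variational Dirichlet parameter:

```latex
% Mean-field family: q(theta, z | gamma, phi) = q(theta | gamma) \prod_n q(z_n | phi_n)
\phi_{ni} \propto \beta_{i w_n} \exp\!\left(\Psi(\gamma_i)\right),
\qquad
\gamma_i = \alpha_i + \sum_{n=1}^{N} \phi_{ni}
```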

LDA - Inference

Other procedures: Markov chain Monte Carlo (Griffiths et al., 2002); Expectation Propagation (Minka et al., 2002).
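A minimal sketch of one such MCMC procedure, a collapsed Gibbs sampler in the style of Griffiths & Steyvers; a sketch under illustrative hyperparameters and a toy corpus, not the exact algorithm from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_lda(docs, V, K, alpha=0.5, beta=0.01, iters=200):
    """Collapsed Gibbs sampling for LDA: resample each word's topic given all other assignments."""
    z = [rng.integers(K, size=len(d)) for d in docs]          # random initial topic assignments
    ndk = np.zeros((len(docs), K))                            # document-topic counts
    nkw = np.zeros((K, V))                                    # topic-word counts
    nk = np.zeros(K)                                          # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1    # remove the current assignment
                # P(z_i = k | rest) proportional to (n_{k,w} + beta)/(n_k + V*beta) * (n_{d,k} + alpha)
                p = (nkw[:, w] + beta) / (nk + V * beta) * (ndk[d] + alpha)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw

# Toy corpus over a 5-word vocabulary (word ids are illustrative).
docs = [[0, 1, 2, 0, 1, 2, 0], [3, 4, 2, 3, 4, 2, 3], [0, 1, 3, 4, 2, 2, 0]]
ndk, nkw = gibbs_lda(docs, V=5, K=2)
print(np.round(nkw / nkw.sum(axis=1, keepdims=True), 2))      # estimated topic-word distributions
```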

Experiments

Perplexity: the inverse of the geometric mean per-word likelihood (a monotonically decreasing function of the likelihood):

perplexity(D_test) = exp{ -Σ_d log P(w_d) / Σ_d N_d }

Idea: lower perplexity implies better generalization.
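A minimal sketch of this metric, assuming per-document held-out log-likelihoods log P(w_d) have already been computed by some model; the numbers are illustrative.

```python
import numpy as np

def perplexity(log_likelihoods, doc_lengths):
    """exp of the negative total log-likelihood per word."""
    return float(np.exp(-np.sum(log_likelihoods) / np.sum(doc_lengths)))

# Illustrative values: three held-out documents.
print(perplexity(log_likelihoods=[-350.0, -420.0, -130.0], doc_lengths=[100, 120, 40]))
```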

Experiments – Nematode corpus

Experiments – AP corpus

Polysemy

(Table: top words of six example topics, illustrating polysemy: PLAY appears in both the theater and sports topics, COURT in both the sports and law topics, TEST in both the science and study topics.)

Topic 1: PRINTING, PAPER, PRINT, PRINTED, TYPE, PROCESS, INK, PRESS, IMAGE, PRINTER, PRINTS, PRINTERS, COPY, COPIES, FORM, OFFSET, GRAPHIC, SURFACE, PRODUCED, CHARACTERS

Topic 2: PLAY, PLAYS, STAGE, AUDIENCE, THEATER, ACTORS, DRAMA, SHAKESPEARE, ACTOR, THEATRE, PLAYWRIGHT, PERFORMANCE, DRAMATIC, COSTUMES, COMEDY, TRAGEDY, CHARACTERS, SCENES, OPERA, PERFORMED

Topic 3: TEAM, GAME, BASKETBALL, PLAYERS, PLAYER, PLAY, PLAYING, SOCCER, PLAYED, BALL, TEAMS, BASKET, FOOTBALL, SCORE, COURT, GAMES, TRY, COACH, GYM, SHOT

Topic 4: JUDGE, TRIAL, COURT, CASE, JURY, ACCUSED, GUILTY, DEFENDANT, JUSTICE, EVIDENCE, WITNESSES, CRIME, LAWYER, WITNESS, ATTORNEY, HEARING, INNOCENT, DEFENSE, CHARGE, CRIMINAL

Topic 5: HYPOTHESIS, EXPERIMENT, SCIENTIFIC, OBSERVATIONS, SCIENTISTS, EXPERIMENTS, SCIENTIST, EXPERIMENTAL, TEST, METHOD, HYPOTHESES, TESTED, EVIDENCE, BASED, OBSERVATION, SCIENCE, FACTS, DATA, RESULTS, EXPLANATION

Topic 6: STUDY, TEST, STUDYING, HOMEWORK, NEED, CLASS, MATH, TRY, TEACHER, WRITE, PLAN, ARITHMETIC, ASSIGNMENT, PLACE, STUDIED, CAREFULLY, DECIDE, IMPORTANT, NOTEBOOK, REVIEW

Choosing the number of topics

Subjective interpretability

Bayesian model selection: Griffiths & Steyvers (2004)

Generalization test

Non-parametric Bayesian statistics: infinite models, i.e. models that grow with the size of the data; Teh, Jordan, Beal, & Blei (2004); Blei, Griffiths, Jordan, Tenenbaum (2004)