Presentation by Nicolas Loeff
Latent Dirichlet Allocation
D. Blei, A. Ng, M. Jordan
Includes some slides adapted from J. Ramos at Rutgers, M. Steyvers and M. Rosen-Zvi at UCI, and L. Fei-Fei at UIUC.
Overview
- What is so special about text?
- Classification methods
- LSI
- Unigram / Mixture of Unigrams
- Probabilistic LSI (Aspect Model)
- The LDA model
- Geometric interpretation
What is so special about text?
- No obvious relation between the features
- High dimensionality (the vocabulary size V is often larger than the number of training examples!)
- Importance of speed
- The need for dimensionality reduction

Representation:
- Documents are represented as vectors in word space: the 'bag of words' representation
- It is a sparse representation, with V >> |D|
- A need to define conceptual closeness
Bag of words

Example: "Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image."

Extracted words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
Example: "China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value."

Extracted words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
Bag of words
- The order of words in a document can be ignored; only the counts matter.
- Probability theory: exchangeability (which includes IID) (Aldous, 1985).
- Exchangeable random variables have a representation as a mixture distribution (de Finetti, 1990).
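de Finetti's representation theorem can be stated compactly; here is a sketch in the notation used later for topic models, where θ plays the role of the latent mixing variable:

```latex
p(w_1, \dots, w_N) \;=\; \int \left( \prod_{n=1}^{N} p(w_n \mid \theta) \right) dP(\theta)
```

This is exactly the form topic models exploit: conditioned on θ, the words are IID, so word order can be ignored without loss.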
What does this have to do with Vision?
Object → Bag of 'words'
TF-IDF Weighting Scheme (Salton and McGill, 1983)
Given a corpus D, a word w, and a document d, calculate

    w_{w,d} = f_{w,d} · log(|D| / f_{w,D})

where f_{w,d} is the count of w in d and f_{w,D} is the number of documents in D containing w. Many varieties of the basic scheme exist.
Search procedure: scan each document d, compute the weight w_{i,d} for each query word w_i, and return the set D' of documents maximizing Σ_i w_{i,d}.
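As a concrete illustration, here is a minimal sketch of the TF-IDF weighting above in Python (the function name `tfidf` and the toy corpus are my own; the weighting itself follows the formula on this slide):

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute w_{w,d} = f_{w,d} * log(|D| / f_{w,D}) for each word/document pair.

    docs: list of token lists. f_{w,d} is the count of w in d;
    f_{w,D} is the number of documents that contain w.
    """
    n_docs = len(docs)
    df = Counter()                      # document frequency f_{w,D}
    for d in docs:
        df.update(set(d))
    weights = []
    for d in docs:
        tf = Counter(d)                 # term frequency f_{w,d}
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights

docs = [["money", "bank", "loan"],
        ["river", "bank", "stream"],
        ["money", "loan"]]
w = tfidf(docs)
# "river" appears in 1 of 3 documents, "bank" in 2 of 3,
# so "river" gets the higher weight in the second document.
```

Note how words that appear in every document get weight log(1) = 0, which is the intended down-weighting of uninformative terms.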
A Spatial Representation: Latent Semantic Analysis (Deerwester, 1990)
[Figure: a document/term count matrix (rows: terms such as SCIENCE, RESEARCH, SOUL, LOVE; columns: Doc1, Doc2, Doc3, ...) is factored via SVD, projecting terms into a lower-dimensional semantic space (high dimensional, though not as high as |V|).]
- Each word is a single point in semantic space (dimensionality reduction)
- Similarity is measured by the cosine of the angle between word vectors
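A minimal LSA sketch with NumPy, using a made-up count matrix over the slide's example vocabulary (the counts and k = 2 truncation here are illustrative assumptions, not the slide's values):

```python
import numpy as np

# Toy term/document count matrix (rows: words, columns: documents).
words = ["SOUL", "RESEARCH", "LOVE", "SCIENCE"]
X = np.array([[0.0, 3.0, 1.0],    # SOUL
              [5.0, 0.0, 4.0],    # RESEARCH
              [1.0, 4.0, 0.0],    # LOVE
              [6.0, 0.0, 5.0]])   # SCIENCE

# Truncated SVD: keep k = 2 latent dimensions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]       # each word becomes a point in semantic space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# SCIENCE and RESEARCH co-occur in the same documents, so their latent
# vectors should be more similar than SCIENCE and LOVE.
sim_sci_res = cosine(word_vecs[3], word_vecs[1])
sim_sci_love = cosine(word_vecs[3], word_vecs[2])
```

The cosine comparison in the reduced space is exactly the similarity measure the slide describes.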
Feature Vector representation
From: Modeling the Internet and the Web: Probabilistic methods and Algorithms, Pierre Baldi, Paolo Frasconi, Padhraic Smyth
Classification: assigning words to topics
Different models for the data:
- Discriminative classifier: models the boundaries between the different classes of the data; predicts a categorical output (e.g., SVM)
- Density estimator: models the distribution of the data points themselves; generative models (e.g., Naive Bayes)
Generative Models – Latent semantic structure
Latent structure ℓ → words w

Distribution over words:

    P(w) = Σ_ℓ P(w | ℓ) P(ℓ)

Inferring latent structure (Bayes' rule):

    P(ℓ | w) = P(w | ℓ) P(ℓ) / P(w)
Topic Models
Unsupervised learning of the topics ("gist") of documents: articles/chapters, conversations, emails, ... any verbal context.
Topics are useful latent structures to explain semantic association.
Probabilistic Generative Model
Each document is a probability distribution over topics
Each topic is a probability distribution over words
Generative Process
[Figure: two topics, each a probability distribution over words (TOPIC 1: money, loan, bank; TOPIC 2: river, stream, bank), with documents generated as mixtures of the two topics.]

DOCUMENT 1 (mixture weights ≈ .8 on TOPIC 1, .2 on TOPIC 2): money1 bank1 bank1 loan1 river2 stream2 bank1 money1 river2 bank1 money1 bank1 loan1 money1 stream2 bank1 money1 bank1 bank1 loan1 river2 stream2 bank1 money1 river2 bank1 money1 bank1 loan1 bank1 money1 stream2

DOCUMENT 2 (mixture weights ≈ .3 on TOPIC 1, .7 on TOPIC 2): river2 stream2 bank2 stream2 bank2 money1 loan1 river2 stream2 loan1 bank2 river2 bank2 bank1 stream2 river2 loan1 bank2 stream2 bank2 money1 loan1 river2 stream2 bank2 stream2 bank2 money1 river2 stream2 loan1 bank2 river2 bank2 money1 bank1 stream2 river2 bank2 stream2 bank2 money1

(The superscript on each word marks the topic that generated it. The topic distributions are the mixture components; the per-document topic proportions are the mixture weights.)

Bayesian approach: use priors. Mixture weights ~ Dirichlet(α); mixture components ~ Dirichlet(β).
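The two-topic generative process illustrated above can be sketched in Python. The topic word-probabilities and function names below are illustrative assumptions, not values from the slide; only the overall process (draw a topic per word, then a word from that topic) follows the example:

```python
import random

random.seed(0)

# Two topics, each a distribution over words, as in the bank/river example.
topics = {
    1: {"money": 0.4, "loan": 0.3, "bank": 0.3},      # TOPIC 1: finance
    2: {"river": 0.35, "stream": 0.35, "bank": 0.3},  # TOPIC 2: water
}

def sample(dist):
    """Draw one item from a dict mapping items to probabilities."""
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item  # guard against floating-point rounding

def generate_doc(topic_weights, n_words=15):
    """For each word: draw a topic from the document's mixture weights,
    then draw a word from that topic's distribution."""
    doc = []
    for _ in range(n_words):
        z = sample(topic_weights)
        w = sample(topics[z])
        doc.append(f"{w}{z}")          # tag the word with its generating topic
    return doc

doc1 = generate_doc({1: 0.8, 2: 0.2})  # mostly finance words
doc2 = generate_doc({1: 0.3, 2: 0.7})  # mostly water words
```

Running this many times makes the mixture interpretation tangible: roughly 80% of doc1's words come tagged with topic 1.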
Vision: Topic = Object categories
Simple Model: Unigram
Words of document are drawn IID from a single multinomial distribution:
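In symbols, the unigram model assigns a document probability that factorizes as a product over words drawn from a single multinomial:

```latex
p(\mathbf{w}) = \prod_{n=1}^{N} p(w_n)
```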
Unigram Mixture Model
First choose topic z, then generate words conditionally independent given topic.
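The mixture-of-unigrams likelihood marginalizes over the single per-document topic z:

```latex
p(\mathbf{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n \mid z)
```

Note that the whole document is forced to come from one topic, which is the main limitation this model family moves beyond.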
Probabilistic Latent Semantic Indexing (Hofmann, 1999)
The document index d in the training set and the word w_n are conditionally independent given the topic.
Not truly generative (d is a dummy random variable indexing training documents). The number of parameters grows with the size of the corpus (overfitting).
A document may contain several topics.
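Following Hofmann's formulation, the aspect model can be written as:

```latex
p(d, w_n) = p(d) \sum_{z} p(w_n \mid z)\, p(z \mid d)
```

Because p(z | d) is a lookup table over training documents d, the model cannot assign a probability to an unseen document, which is the "not truly generative" objection above.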
Vision app.: Sivic et al., 2005
[Plate diagram: document index d → topic z → word w, with words repeated N times and the model replicated over D documents. Example topic: "face".]
LDA
Vision app.: Fei-Fei Li, 2005
[Plate diagram: category c → topic z → word w, with words repeated N times per image and the model replicated over D images; π denotes the category-specific topic proportions. Example category: "beach".]
Example: Word density distribution
A geometric interpretation
LDA
Topics are sampled repeatedly within each document (as in pLSI), but the number of parameters does not grow with the size of the corpus.
Problem: inference.
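For reference, the LDA joint distribution for one document (as in Blei, Ng & Jordan) is:

```latex
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```

Marginalizing out θ and z couples all the words in a document, which is precisely what makes exact inference hard.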
LDA - Inference
Coupling between the Dirichlet distributions makes exact inference intractable.
Blei, 2001: variational approximation.
Other procedures: Markov chain Monte Carlo (Griffiths et al., 2002); Expectation Propagation (Minka et al., 2002).
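A minimal collapsed Gibbs sampler in the spirit of the MCMC approach mentioned above. This is a sketch with a toy corpus, illustrative hyperparameters, and names of my own choosing; it omits burn-in handling and convergence checks:

```python
import random
from collections import defaultdict

random.seed(1)

def gibbs_lda(docs, n_topics, alpha=0.1, beta=0.01, iters=200):
    """Collapsed Gibbs sampling for LDA: resample each token's topic from
    its full conditional, given all other assignments."""
    V = len({w for d in docs for w in d})
    ndz = [[0] * n_topics for _ in docs]               # doc-topic counts
    nzw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nz = [0] * n_topics                                # topic totals
    z_assign = []
    for di, d in enumerate(docs):                      # random initialization
        zs = []
        for w in d:
            z = random.randrange(n_topics)
            zs.append(z)
            ndz[di][z] += 1; nzw[z][w] += 1; nz[z] += 1
        z_assign.append(zs)
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                z = z_assign[di][wi]                   # remove current assignment
                ndz[di][z] -= 1; nzw[z][w] -= 1; nz[z] -= 1
                # full conditional p(z_i = k | z_-i, w), up to normalization
                probs = [(ndz[di][k] + alpha) * (nzw[k][w] + beta) / (nz[k] + V * beta)
                         for k in range(n_topics)]
                r, acc = random.random() * sum(probs), 0.0
                for k, p in enumerate(probs):
                    acc += p
                    if r <= acc:
                        z = k
                        break
                z_assign[di][wi] = z                   # record new assignment
                ndz[di][z] += 1; nzw[z][w] += 1; nz[z] += 1
    return z_assign

docs = [["money", "loan", "bank", "money", "bank"],
        ["river", "stream", "bank", "river", "stream"],
        ["money", "loan", "loan", "bank", "money"],
        ["stream", "river", "river", "bank", "stream"]]
z = gibbs_lda(docs, n_topics=2)
```

After a few hundred sweeps on corpora with clear structure, tokens of "money"/"loan" tend to settle in one topic and "river"/"stream" in the other, with "bank" split between them.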
Experiments
Perplexity: the inverse of the geometric mean per-word likelihood (a monotonically decreasing function of the likelihood):

    perplexity(D_test) = exp( - Σ_d log p(w_d) / Σ_d N_d )

Idea: lower perplexity implies better generalization.
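A small sketch of the perplexity computation (the function name and toy numbers are illustrative):

```python
import math

def perplexity(log_likelihoods, doc_lengths):
    """Perplexity = exp(-(sum of per-document log-likelihoods) / (total word count)).
    Lower is better; it decreases monotonically as likelihood increases."""
    return math.exp(-sum(log_likelihoods) / sum(doc_lengths))

# A model assigning each of 10 words probability 1/50 in each of 2 documents:
ll = [10 * math.log(1 / 50)] * 2
pp = perplexity(ll, [10, 10])
# pp == 50: the model is as "perplexed" as a uniform choice among 50 words.
```

This interpretation (effective vocabulary size under uniform uncertainty) is why perplexity is a convenient yardstick across models.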
Experiments – Nematode corpus
Experiments – AP corpus
Polysemy
Example topics, each shown by its most probable words:
- Printing: PRINTING, PAPER, PRINT, PRINTED, TYPE, PROCESS, INK, PRESS, IMAGE, PRINTER, PRINTS, PRINTERS, COPY, COPIES, FORM, OFFSET, GRAPHIC, SURFACE, PRODUCED, CHARACTERS
- Theater: PLAY, PLAYS, STAGE, AUDIENCE, THEATER, ACTORS, DRAMA, SHAKESPEARE, ACTOR, THEATRE, PLAYWRIGHT, PERFORMANCE, DRAMATIC, COSTUMES, COMEDY, TRAGEDY, CHARACTERS, SCENES, OPERA, PERFORMED
- Sports: TEAM, GAME, BASKETBALL, PLAYERS, PLAYER, PLAY, PLAYING, SOCCER, PLAYED, BALL, TEAMS, BASKET, FOOTBALL, SCORE, COURT, GAMES, TRY, COACH, GYM, SHOT
- Law: JUDGE, TRIAL, COURT, CASE, JURY, ACCUSED, GUILTY, DEFENDANT, JUSTICE, EVIDENCE, WITNESSES, CRIME, LAWYER, WITNESS, ATTORNEY, HEARING, INNOCENT, DEFENSE, CHARGE, CRIMINAL
- Science: HYPOTHESIS, EXPERIMENT, SCIENTIFIC, OBSERVATIONS, SCIENTISTS, EXPERIMENTS, SCIENTIST, EXPERIMENTAL, TEST, METHOD, HYPOTHESES, TESTED, EVIDENCE, BASED, OBSERVATION, SCIENCE, FACTS, DATA, RESULTS, EXPLANATION
- School: STUDY, TEST, STUDYING, HOMEWORK, NEED, CLASS, MATH, TRY, TEACHER, WRITE, PLAN, ARITHMETIC, ASSIGNMENT, PLACE, STUDIED, CAREFULLY, DECIDE, IMPORTANT, NOTEBOOK, REVIEW

Polysemous words such as PLAY (theater vs. sports), COURT (sports vs. law), and TEST (science vs. school) appear with high probability in more than one topic.
Choosing the number of topics
- Subjective interpretability
- Bayesian model selection: Griffiths & Steyvers (2004)
- Generalization test
- Non-parametric Bayesian statistics: infinite models, i.e., models that grow with the size of the data. Teh, Jordan, Beal & Blei (2004); Blei, Griffiths, Jordan & Tenenbaum (2004)