Topic Model Tutorial Part 1 – The Intuition
23/05/16 2
Conference Dinner
- I sit at a table with a probability proportional to the number of people already sitting there
- If everybody does the same and more and more people enter, the probabilities for choosing the tables converge
- The scheme yields a sample from a Dirichlet distribution
  Parameters: initial number of participants at each table
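The seating scheme can be simulated in a few lines. This is a minimal sketch of a Pólya urn, assuming three tables; the function name `conference_dinner`, the number of guests and the seeds are illustrative choices, not taken from the slides.

```python
import random

def conference_dinner(initial, guests, seed=0):
    """Seat guests one by one; each picks a table with probability
    proportional to the (possibly fractional) count already there."""
    rng = random.Random(seed)
    counts = list(initial)
    for _ in range(guests):
        table = rng.choices(range(len(counts)), weights=counts)[0]
        counts[table] += 1
    total = sum(counts)
    # Final share of guests per table.
    return [c / total for c in counts]

# Same initial seating, different random seeds: every run settles at its
# own proportions, and each result is one draw from the Dirichlet.
for seed in (1, 2):
    print(conference_dinner([1, 1, 1], guests=10_000, seed=seed))
```

Within one run the proportions stabilise as more guests arrive; across runs they differ, which is why the outcome is a sample and not a fixed value.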
Dirichlet Distribution
- The scheme yields a sample from a Dirichlet distribution
  Parameters: initial number of participants at each table
- “Rich get richer”, preferential attachment
- Initial settings of less than one participant at each table produce sparse distributions
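The effect of parameters below 1 can be seen by sampling directly. The sketch below uses the standard construction of a Dirichlet sample from normalised Gamma draws; the helper name `sample_dirichlet`, the five tables and the three parameter values are assumptions made for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    # Standard construction: normalise independent Gamma(alpha_k, 1) draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
for a in (0.1, 1.0, 10.0):
    # One sample over five tables for each symmetric parameter value.
    sample = sample_dirichlet([a] * 5, rng)
    print(a, [round(p, 3) for p in sample])
```

With a parameter of 0.1 most of the mass tends to land on one or two tables (a sparse sample), while a parameter of 10 tends to give nearly even shares.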
In reality, I choose tables based on the number of people AND the topic they talk about!
Topics
Articles are labelled with tags (e.g. politics, economy, sports, ...)
Topics
- Politics: election, party, vote, candidate, ...
- Economy: dollar, crisis, financial, market, ...
- Sports: soccer, basketball, match, score, ...
Topic Modelling
Automatically extract topics from text documents!
Latent Semantic Analysis
Term-document matrix
[Figure: term-document matrix, colour scale from low occurrence to high occurrence]
- Highlighted: the term frequencies of document 4
- Highlighted cell: how often does document 4 contain the word “blood”?
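A term-document matrix can be built by counting words per document. The following is a minimal sketch; the three toy documents are hypothetical and are not the corpus shown on the slide, and the rows-as-terms layout is an assumption.

```python
from collections import Counter

def term_document_matrix(documents):
    """Return (vocabulary, matrix) where matrix[t][d] counts how often
    term t occurs in document d."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix

# Hypothetical toy corpus, not the one shown on the slide.
docs = ["blood pressure and blood cells", "air pollution", "nuclear power and air"]
vocab, matrix = term_document_matrix(docs)

# How often does document 0 contain the word "blood"?
print(matrix[vocab.index("blood")][0])
```

Reading one cell of the matrix answers exactly the kind of question asked on the slide.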
Latent Semantic Analysis (LSA)
- Topic model based on “matrix decomposition”
- Topics are described by “loadings” over the terms
[Figure: loadings of Topic 1 over the terms]
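The slides only say “matrix decomposition”. In the usual formulation of LSA this is a truncated singular value decomposition of the term-document matrix; the symbols below (A, U_k, Σ_k, V_k, m terms, n documents, k topics) are notation assumed here and do not appear in the slides.

```latex
A \approx U_k \Sigma_k V_k^{\top},
\qquad A \in \mathbb{R}^{m \times n},\quad
U_k \in \mathbb{R}^{m \times k},\quad
\Sigma_k \in \mathbb{R}^{k \times k},\quad
V_k \in \mathbb{R}^{n \times k}
```

Under this notation, column j of U_k holds the loadings of topic j over the m terms.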
The Test Dataset
Test dataset
- document 0 to document 6: probabilistic topic model
- document 7 to document 19: famous fashion model
Expected topics
- Topic 1: famous, fashion, model
- Topic 2: model, probabilistic, topic
Test dataset
Term-document matrix
LSA
[Figure: Topic 1 and Topic 2]
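The LSA result on the test dataset can be reproduced approximately without any library. The sketch below assumes 7 copies of "probabilistic topic model" and 13 copies of "famous fashion model", and it swaps a full singular value decomposition for power iteration with deflation on A·Aᵀ, whose eigenvectors are the left singular vectors of A. All function names are illustrative.

```python
import math

VOCAB = ["famous", "fashion", "model", "probabilistic", "topic"]
DOCS = [["probabilistic", "topic", "model"]] * 7 + [["famous", "fashion", "model"]] * 13

def term_document_matrix(docs, vocab):
    # Rows are terms, columns are documents.
    return [[doc.count(term) for doc in docs] for term in vocab]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpair(m, iterations=200):
    # Power iteration from a fixed, non-symmetric start vector.
    v = [1.0 + 0.1 * i for i in range(len(m))]
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    return eigenvalue, v

def lsa_term_loadings(a, k):
    # Gram matrix A A^T over the terms.
    m = [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]
    loadings = []
    for _ in range(k):
        eigenvalue, v = top_eigenpair(m)
        loadings.append(v)
        # Deflation: remove the component just found before the next pass.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    return loadings

topics = lsa_term_loadings(term_document_matrix(DOCS, VOCAB), 2)
for number, topic in enumerate(topics, start=1):
    print(f"Topic {number}:", {t: round(x, 2) for t, x in zip(VOCAB, topic)})
```

The first topic has loadings of one sign on all five terms, and the second mixes positive and negative loadings, so neither matches the two expected topics cleanly.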
LSA – Weaknesses
- Topic loadings can be negative → hard to interpret!
- LSA has problems coping with ambiguous words
Probabilistic LSA
Probabilistic LSA (PLSA)
- Based on categorical distributions
- Probabilistic model that explains
the creation of documents
The PLSA model for the creation of words in documents:
1) Each document has a categorical distribution over the topics
2) Each topic has a categorical distribution over all words
3) Creation of a word in document i:
   1) Draw a topic from the topic distribution of document i
   2) Draw a word from the word distribution of that topic
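The three steps above can be sketched as a small sampler. The probabilities in `TOPICS` and `DOC_TOPICS` are hypothetical values picked by hand for illustration; in PLSA they would be estimated from the data, which this sketch does not do.

```python
import random

# Hypothetical parameters, chosen by hand for illustration only.
TOPICS = {
    "Topic 1": {"famous": 0.4, "fashion": 0.4, "model": 0.2},
    "Topic 2": {"probabilistic": 0.4, "topic": 0.4, "model": 0.2},
}
DOC_TOPICS = {
    0: {"Topic 1": 0.1, "Topic 2": 0.9},
    7: {"Topic 1": 0.9, "Topic 2": 0.1},
}

def draw(distribution, rng):
    # Sample one outcome from a categorical distribution given as a dict.
    outcomes = list(distribution)
    return rng.choices(outcomes, weights=[distribution[o] for o in outcomes])[0]

def generate_word(doc_id, rng):
    topic = draw(DOC_TOPICS[doc_id], rng)  # step 1: topic from the document
    word = draw(TOPICS[topic], rng)        # step 2: word from that topic
    return topic, word

rng = random.Random(0)
print([generate_word(7, rng) for _ in range(5)])
```

Note that "model" has non-zero probability under both topics, so the same word can be produced by either one.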
Probabilistic LSA (PLSA)
[Figure: Topic 1 and Topic 2]
Probabilistic LSA (PLSA)
[Figure: loadings on Topic 1 and Topic 2 (axis from 0 to 0.8) for Document 0 (probabilistic topic model) and Document 7 (famous fashion model)]
PLSA – Strengths & Weaknesses
- Topics are probability distributions and easy to interpret!
- PLSA still has problems coping with ambiguous words
Latent Dirichlet Allocation
Latent Dirichlet Allocation (LDA)
- A word in a document is likely to belong to the same topic
as the other words of that document
document 7: famous (Topic 1), fashion (Topic 1), model (? → Topic 1)
document 0: probabilistic (Topic 2), topic (Topic 2), model (? → Topic 2)
- We need some preference for the topics already assigned in a document
→ Dirichlet distribution!
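How a Dirichlet prior produces this preference can be shown with the same rule as at the conference dinner. The sketch below computes the probability of each topic for the next word from the topics already assigned in the document; the value `alpha=0.1` is a hypothetical sparse setting, and the rule deliberately ignores the word side of the model (how likely "model" is under each topic), which a full LDA inference procedure would also take into account.

```python
def topic_preference(topic_counts, alpha):
    """Probability of each topic for the next word in a document, given the
    topics already assigned there: (n_k + alpha) / (n + K * alpha)."""
    total = sum(topic_counts) + alpha * len(topic_counts)
    return [(n + alpha) / total for n in topic_counts]

# Document 7: "famous" and "fashion" are assigned to Topic 1, "model" is open.
print(topic_preference([2, 0], alpha=0.1))

# Document 0: "probabilistic" and "topic" are assigned to Topic 2.
print(topic_preference([0, 2], alpha=0.1))
```

With a small `alpha` the topic that already has two words in the document gets almost all of the probability, which is the "rich get richer" behaviour applied to topics.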
Dirichlet distribution
Latent Dirichlet Allocation (LDA)
[Figure: Topic 1 and Topic 2]
Probabilistic topic model (with sparse Dirichlet)
[Figure: Document 0 (probabilistic topic model) and Document 7 (famous fashion model)]
LDA – Strengths
- LDA can cope with ambiguous words!
- Most popular topic model
(Human) Evaluation
PLSA
- Topic 1: family, registered, like, hard, members, …
- Topic 2: high, left, planned, organization, story, …
- Topic 3: normal, predicted, first, chief, health, …
- …
LDA
- Topic 1: first, network, time, won, week, third, ...
- Topic 2: two, house, found, police, car, home, ...
- Topic 3: cents, futures, cent, lower, higher, ...
- ...
Topic Model Game
- Tests the semantic coherence of topics
- Given the top-5 words of a topic and an intruder word
from a different topic – find the intruder word!
Example: air, pollution, power, blood, environmental, nuclear
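The game can be sketched as a task generator plus a scoring function. The function names are illustrative, and the code assumes that "blood" is the intruder in the example, which the transcript itself does not state.

```python
import random

def make_task(top_words, intruder, rng):
    # Mix the intruder into the topic's top words in random order.
    words = list(top_words) + [intruder]
    rng.shuffle(words)
    return words

def score(answers, intruders):
    """Fraction of tasks in which the chosen word is the true intruder."""
    hits = sum(a == i for a, i in zip(answers, intruders))
    return hits / len(intruders)

rng = random.Random(0)
# Assumption: "blood" is the intruder in the slide's example.
task = make_task(["air", "pollution", "power", "environmental", "nuclear"], "blood", rng)
print(task)

# Two hypothetical players: one finds the intruder, one picks "power".
print(score(["blood", "power"], ["blood", "blood"]))
```

The easier it is for people to spot the intruder, the more semantically coherent the topic is taken to be.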
https://tinyurl.com/tmt16
Summary
- Dirichlet distribution (Polya urn scheme)
- Latent Semantic Analysis (LSA)
- Probabilistic Latent Semantic Analysis (PLSA)
- Latent Dirichlet Allocation (LDA)
- Human evaluation of topic models