• Date posted: 16-Apr-2017
• Category: Education

### Transcript of LDA Beginner's Tutorial

Latent Dirichlet Allocation (LDA) for ML-IR Discussion Group

Prepared by Wayne Tai Lee, Satpreet Singh

• Unsupervised learning
• Bayesian Statistics
• Mixture Models
• LDA theory and intuition
• LDA practice and applications

• Clustering is a form of unsupervised learning
• Classification is known as supervised learning
• Validation is difficult

How would you cluster?

Documents from Wikipedia

Now try these ones!

2013 LinkedIn Corporation. All Rights Reserved.

Clustering documents is difficult because many words are repeated across documents. Some documents may be similar to one another on different topics, so we might want a clustering that allows a document to belong to more than one cluster.

Bayesian Statistics: A framework to update your beliefs

• Probabilities as beliefs
• Updates your belief as data is observed
• Requires a model that describes the data generation
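The updating idea can be sketched with a tiny two-hypothesis example. This is only an illustration: the hypotheses ("high" and "low" quality), the prior, the likelihood table, and the observed outcomes are all made-up assumptions, not values from the slides.

```python
# Hypothetical example: is a candidate of "high" or "low" quality?
# Every number below is an illustrative assumption.

def update(prior, likelihood, observation):
    """One step of Bayes' rule: posterior is proportional to prior * likelihood."""
    unnormalized = {h: prior[h] * likelihood[h][observation] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}

# Belief before seeing any data.
prior = {"high": 0.3, "low": 0.7}

# Model of data generation: P(observation | hypothesis).
likelihood = {
    "high": {"pass": 0.8, "fail": 0.2},
    "low": {"pass": 0.4, "fail": 0.6},
}

# Update the belief one observation at a time.
belief = prior
for obs in ["pass", "pass", "fail"]:
    belief = update(belief, likelihood, obs)
    print(obs, {h: round(p, 3) for h, p in belief.items()})
```

The posterior after one observation becomes the prior for the next, which is the sense in which the belief is updated as data is observed.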

Example: Evaluating Candidates

Candidate potential

• Schooling
• Experience
• Interview
• Internship

How to update?!

Model for candidates

Model for data generation

An easy way to build hierarchical relationships

Candidate Quality: High, Low

Marginal Distribution of Candidate Performance: ignore quality

[Figures: distribution of candidate performance, shown as a mixture with the mixture weights labeled]
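A minimal sketch of a two-component mixture for candidate performance may help here. It assumes, purely for illustration, that performance is Gaussian within each quality level; the weights, means, and standard deviations are invented numbers.

```python
import math
import random

# Illustrative assumptions: two quality levels with Gaussian performance scores.
weights = {"high": 0.3, "low": 0.7}                    # mixture weights, sum to 1
params = {"high": (80.0, 5.0), "low": (60.0, 10.0)}    # (mean, std dev) per quality

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def marginal_pdf(x):
    """Density of performance when quality is ignored (summed out)."""
    return sum(weights[q] * normal_pdf(x, *params[q]) for q in weights)

def sample_performance(rng):
    """Hierarchical sampling: draw the quality first, then the performance."""
    labels = list(weights)
    quality = rng.choices(labels, weights=[weights[q] for q in labels])[0]
    mean, sd = params[quality]
    return quality, rng.gauss(mean, sd)

rng = random.Random(0)
draws = [sample_performance(rng) for _ in range(10000)]
share_high = sum(quality == "high" for quality, _ in draws) / len(draws)
print("fraction of high-quality draws:", round(share_high, 3))
print("marginal density at 70:", round(marginal_pdf(70.0), 5))
```

The marginal density is the weighted sum of the per-quality densities, and the fraction of "high" draws should land near the "high" mixture weight.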

How are words in a document generated?

Each word comes from different topics

• Mixture weight for topic k
• Multinomial distribution over ALL words, based on topic k

Example: the word "professional" is probably used more often in the topic of professional networks than in the topic of social networks.

Just a mixture model

[Diagram: a word generated from one of Topic 1, ..., Topic K]

1) Pick a topic

2) Pick a word

The chosen Topic: Z

So we really want to know:

• Z (the cluster for the word)
• The mixture weights (document composition)
• The per-topic word distributions (key words)
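The two-step recipe (pick a topic, then pick a word) can be sketched as follows. The two topics, the four-word vocabulary, and all probabilities are hypothetical values chosen to echo the "professional" example; `posterior_topic` is an illustrative helper that applies Bayes' rule to recover a belief about Z from an observed word.

```python
import random

# Hypothetical toy topics and vocabulary; all probabilities are made up.
topic_weights = {"professional network": 0.6, "social network": 0.4}
word_probs = {
    "professional network": {"professional": 0.5, "job": 0.3, "friend": 0.1, "photo": 0.1},
    "social network": {"professional": 0.1, "job": 0.1, "friend": 0.4, "photo": 0.4},
}

def generate_word(rng):
    # 1) Pick a topic Z according to the mixture weights.
    topics = list(topic_weights)
    z = rng.choices(topics, weights=[topic_weights[t] for t in topics])[0]
    # 2) Pick a word W from that topic's multinomial over the vocabulary.
    words = list(word_probs[z])
    w = rng.choices(words, weights=[word_probs[z][v] for v in words])[0]
    return z, w

def posterior_topic(word):
    """P(Z = k | W = word) by Bayes' rule, when weights and topics are known."""
    joint = {k: topic_weights[k] * word_probs[k][word] for k in topic_weights}
    total = sum(joint.values())
    return {k: value / total for k, value in joint.items()}

rng = random.Random(1)
print([generate_word(rng) for _ in range(5)])
print(posterior_topic("professional"))
```

In practice neither the weights nor the topics are known in advance, which is why all three quantities have to be inferred from the words alone.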

Review!

Z → W

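[Plate diagram: $Z_{d,n} \to W_{d,n}$, with plates over $k = 1, \dots, K$; $n = 1, \dots, N_d$; $d = 1, \dots, D$]

• $K$: number of topics
• $N_d$: number of words in document $d$
• $D$: number of documents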

Bayesian: But what about the distributions for the mixture weights and the per-topic word distributions?

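The two prior parameters (one for the per-topic word distributions, one for the per-document mixture weights) control the sparsity of the weights for the multinomials.

Implications: a priori we assume
• Topics have few key words
• Documents only have a small subset of topics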

Dirichlet Distribution with Different Sparsity Parameters
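The effect of the sparsity parameter can be seen by drawing from a symmetric Dirichlet at a few concentration values. This is a sketch using only the standard library (normalized Gamma draws); the concentration values, the dimension, and the helper `mean_largest_weight` are illustrative assumptions.

```python
import random

def sample_dirichlet(alpha, dim, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_largest_weight(alpha, dim, rng, n_samples=2000):
    """Average size of the biggest component: close to 1 means sparse weights."""
    return sum(max(sample_dirichlet(alpha, dim, rng)) for _ in range(n_samples)) / n_samples

rng = random.Random(0)
for alpha in (0.1, 1.0, 10.0):
    sample = sample_dirichlet(alpha, 5, rng)
    print("alpha =", alpha, "sample =", [round(p, 3) for p in sample])
    print("  mean largest weight:", round(mean_largest_weight(alpha, 5, rng), 3))
```

Small concentration values put most of the mass on a few components (sparse weights), while large values push the weights toward uniform.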

Latent Dirichlet Allocation!!!

[Plate diagram: $Z_{d,n} \to W_{d,n}$, with plates over $k = 1, \dots, K$ and $n = 1, \dots, N_d$]
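Putting the pieces together, the generative story for a whole corpus might be sketched as below. The parameter names `alpha` (prior on per-document mixture weights) and `eta` (prior on per-topic word distributions) are assumed names, since the symbols did not survive in this transcript, and using the same number of words `N_d` for every document is a simplification.

```python
import random

def sample_dirichlet(alpha, dim, rng):
    """Symmetric Dirichlet draw via normalized Gamma(alpha, 1) variables."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(vocab, K, D, N_d, alpha, eta, rng):
    """Generate D documents of N_d words each from K topics."""
    # One word distribution per topic, k = 1, ..., K.
    topics = [sample_dirichlet(eta, len(vocab), rng) for _ in range(K)]
    corpus = []
    for _ in range(D):
        # Per-document mixture weights over the K topics.
        doc_weights = sample_dirichlet(alpha, K, rng)
        doc = []
        for _ in range(N_d):
            z = rng.choices(range(K), weights=doc_weights)[0]  # topic Z_{d,n}
            w = rng.choices(vocab, weights=topics[z])[0]        # word W_{d,n}
            doc.append((z, w))
        corpus.append(doc)
    return topics, corpus

vocab = ["professional", "job", "friend", "photo"]
topics, corpus = generate_corpus(vocab, K=2, D=3, N_d=8, alpha=0.5, eta=0.5,
                                 rng=random.Random(0))
for doc in corpus:
    print([w for _, w in doc])
```

Only the words are observed in real data; the topic assignments, the document weights, and the topics themselves are latent.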

How do we fit this model?

Want the posterior:

Worst part of Bayesian analysis, personally speaking.

Two main ways to get the posterior:

• Sampling methods
  • Asymptotically correct
  • Time consuming
  • Lots of black magic in sampling tricks
• Variational methods (practical solution!)
  • An approximation with no guarantees
  • Faster
  • Need math skills
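As one concrete instance of the sampling route, here is a minimal sketch of collapsed Gibbs sampling for LDA, a commonly used sampler and not necessarily the one the presenters had in mind. The names `alpha` and `eta` for the two Dirichlet parameters, the toy corpus of word ids, and the iteration count are all assumptions for illustration.

```python
import random

def gibbs_lda(docs, K, V, alpha, eta, n_iters, rng):
    """Collapsed Gibbs sampling for LDA. docs: lists of word ids in range(V)."""
    n_dk = [[0] * K for _ in docs]      # words in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total words assigned to topic k
    z = []
    # Random initial topic assignment for every word.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][n]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional of each topic given all other assignments.
                probs = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + eta) / (n_k[j] + V * eta)
                    for j in range(K)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(K), weights=probs)[0]
                z[d][n] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw

# Toy corpus: word ids 0-2 dominate two documents, ids 3-5 the other two.
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2, 3], [3, 4, 5, 3, 4], [4, 5, 3, 5, 0]]
z, n_dk, n_kw = gibbs_lda(docs, K=2, V=6, alpha=0.1, eta=0.1, n_iters=200,
                          rng=random.Random(0))
print("document-topic counts:", n_dk)
print("topic-word counts:", n_kw)
```

Each sweep touches every word in the corpus, which gives a feel for why sampling is time consuming on large collections.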

Variational Bayes (specifically mean-field variational Bayes):

• What's crazy?
  • Assumes all the latent variables are independent
• What's not crazy?
  • Finds the best model within this crazy class
  • Best under KL divergence

Empirically, it has shown promising results!

For sufficient details: "Explaining Variational Approximations" by Ormerod and Wand
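To make the mean-field idea concrete, the sketch below fits a fully factorized q(x1, x2) = q1(x1) q2(x2) to a small correlated target by coordinate updates that each minimize KL(q || p). The 2x2 target table `p` and the starting values are made-up assumptions; a real LDA posterior is far larger, but the update has the same form.

```python
import math

# Hypothetical target: a joint distribution over two correlated binary variables.
p = [[0.4, 0.1],
     [0.1, 0.4]]

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def kl_q_p(q1, q2):
    """KL(q || p) for the factorized q(x1, x2) = q1(x1) * q2(x2)."""
    return sum(
        q1[i] * q2[j] * math.log(q1[i] * q2[j] / p[i][j])
        for i in range(2)
        for j in range(2)
    )

def mean_field(q1, q2, n_iters=50):
    """Coordinate updates; each one minimizes KL(q || p) in one factor."""
    history = [kl_q_p(q1, q2)]
    for _ in range(n_iters):
        # log q1(i) = E_{q2}[log p(i, j)] + const
        q1 = normalize([
            math.exp(sum(q2[j] * math.log(p[i][j]) for j in range(2)))
            for i in range(2)
        ])
        # log q2(j) = E_{q1}[log p(i, j)] + const
        q2 = normalize([
            math.exp(sum(q1[i] * math.log(p[i][j]) for i in range(2)))
            for j in range(2)
        ])
        history.append(kl_q_p(q1, q2))
    return q1, q2, history

q1, q2, history = mean_field([0.6, 0.4], [0.7, 0.3])
print("q1:", [round(v, 4) for v in q1], "q2:", [round(v, 4) for v in q2])
print("KL before and after:", round(history[0], 4), round(history[-1], 4))
```

The KL divergence never increases from one sweep to the next, but it stays above zero here: no independent q can reproduce the correlation in `p`, which is the "crazy" part of the assumption.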

LDA Take Home

• An intuitively appealing Bayesian unsupervised learning model
• Training is difficult
  • Lots of packages exist; the main issue is scalability
• Validation is difficult
  • Usually cast into a supervised learning framework
• Presentation is difficult
  • Visualization for the Bayesian model is hard