LDA Beginner's Tutorial

download LDA Beginner's Tutorial

If you can't read please download the document

Transcript of LDA Beginner's Tutorial

Presentation Template Guidelines

Latent Dirichlet Allocation (LDA)- for ML-IR Discussion Group1

Prepared by Wayne Tai Lee, Satpreet Singh

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMELatent Dirichlet Allocation:A Bayesian Unsupervised Learning ModelRoadmap2

Unsupervised learningBayesian StatisticsMixture ModelsLDA theory and intuitionLDA practice and applications

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEUnsupervised LearningLearning patterns with no labels3

Clustering is a form of Unsupervised learning Classification is known as supervised learningValidation is difficult

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME4

How would you cluster?

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMETake home: validation is difficult.no true answer here.4

5Documents of wikipediaNow try these ones!

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEClustering documents is difficult because many repeated words are used. Some documents may be similar to one another on different topics. So we might want to cluster allowing membership.5

Bayesian StatisticsA framework to update your beliefs6

Probabilities as beliefsUpdates your belief as data is observedRequires a model that describes the data generation

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME7

Candidate potentialExample: Evaluating Candidates

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME8

Candidate potentialExample: Evaluating Candidates

Schooling

Experience

Interview

Internship

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME9

Candidate potentialExample: Evaluating Candidates

Schooling

Experience

Interview

Internship

How to update?!

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME10

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME11

Model for candidates

Model for data generation

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEMixture ModelsA popular statistical model12

An easy way to build hierarchical relationships

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEMixture models visualized13

Candidate QualityHighLow

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process13

14

Marginal Distribution of Candidate Performance: ignore quality

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME15

Distribution of Candidate Performance:

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME16

Distribution of Candidate Performance:

Mixture Weights

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME17

Mixture Weights

Distribution of Candidate Performance:

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME18

Distribution of Candidate Performance:

????

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEHow are words in a document generated?19

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEOne possibility:20Each word comes from different topics (bag of words: ignore order)

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME

20

How are words in a document generated?21

Each word comes from different topics

Mixture Weightfor Topic k

Multinomial Distributionover ALL words basedon topic k

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.21

Just a mixture model22

WordTopic 1Topic K

LeadershipBig DataMachine Learning

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process22

Just a mixture model23

WordTopic 1Topic K

LeadershipBig DataMachine Learning

1) Pick a topic

2) Pick a word

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process23

Just a mixture model24

WordTopic 1Topic K

LeadershipBig DataMachine Learning

The chosen Topic: Z

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process24

Just a mixture model25

WordTopic 1Topic K

LeadershipBig DataMachine Learning

So we really want to knowZ__

The chosen Topic: Z

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process25

Just a mixture model26

WordTopic 1Topic K

LeadershipBig DataMachine Learning

So we really want to knowZ (cluster for the word) (document composition) (key words)

The chosen Topic: Z

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process26

Review!27

ZW

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.27

28

Zd,n

k=1KWd,n

n=1,,Ndd=1,,DK: number of topicsNd: number of wordsD: number of documents

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.28

29

Zd,n

k=1KWd,n

n=1,,Ndd=1,,DK: number of topicsNd: number of wordsD: number of documentsBayesian: But what about the distribution for and ??

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.29

30

Zd,n

k=1KWd,n

n=1,,Ndd=1,,DK: number of topicsNd: number of wordsD: number of documentsBayesian: But what about the distribution for and ??

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.30

31

and control the sparsity of the weights for the multinomial.Implications: a priori we assumeTopics have few key words Documents only have a small subset of topics

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.31

Dirichlet Distribution with Different Sparsity Parameters32

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME33

Latent Dirichlet Allocation!!!

Zd,n

k=1KWd,n

n=1,,Nd

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.33

34

How do we fit this model?

Want the posterior:

Worst part of Bayesian Analysis..personally speaking~

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.34

35

Two main ways to get posterior:Sampling methodsAsymtotically correctTime consumingLots of black magic in sampling tricksVariational methods (practical solution!)An approximation with no guaranteesFasterNeed math skills

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.35

36

Variational Bayes (specifically mean field variational bayes):Whats crazy?Assumes all the latent variables are independentWhats not crazy?Finds the best model within this crazy class.Best under KL divergence

Empirically have shown promising results!

For sufficient details:Explaining Variational Approximations by Ormerod and Wand

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.36

LDA Take Home

37An intuitively appealing Bayesian unsupervised learning modelTraining is difficultLots of packages exist, main issue is scalabilityValidation is difficultUsually cast into a supervised learning frameworkPresentation is difficultVisualization for the Bayesian model is hard.

2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME