Visualizing Topic Models

Visualizing Topic Models Ben Mabey @bmabey

Transcript of Visualizing Topic Models

Page 1: Visualizing Topic Models

Visualizing Topic Models

Ben Mabey @bmabey

Page 5: Visualizing Topic Models

Latent Dirichlet Allocation (LDA)

Document-Topic Distributions

        0     1     …  k
doc a   0.25  0.14  …  0.02
doc b   0.01  0.30  …  0.09
…       …     …     …  0.31
doc D   0.13  0.07  …  0.01

Term-Topic Distributions

        0      1      …  k
bird    0.002  0.01   …  0.004
coffee  0.001  0.003  …  0.009
…       …      …      …  0.031
work    0.002  0.006  …  0.021
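To make the two kinds of distribution concrete, here is a minimal sketch of how they combine in LDA: a document's distribution over terms is the mixture of the topics' term distributions, weighted by the document's topic proportions. Every number below is a hypothetical toy value, not taken from the slide, and the term-topic matrix is stored with one row per topic (the transpose of the slide's layout).

```python
# Hypothetical toy model: 2 topics over a 3-term vocabulary.
vocab = ["bird", "coffee", "work"]

# Term-topic distributions, one row per topic: P(term | topic). Each row sums to 1.
topic_term = [
    [0.7, 0.2, 0.1],  # topic 0
    [0.1, 0.3, 0.6],  # topic 1
]

# Document-topic distributions: P(topic | doc). Each row sums to 1.
doc_topic = {
    "doc a": [0.9, 0.1],
    "doc b": [0.25, 0.75],
}


def term_distribution(doc_weights, topic_term):
    """P(term | doc) = sum over topics of P(topic | doc) * P(term | topic)."""
    n_terms = len(topic_term[0])
    return [
        sum(weight * topic[i] for weight, topic in zip(doc_weights, topic_term))
        for i in range(n_terms)
    ]


for name, weights in doc_topic.items():
    probs = term_distribution(weights, topic_term)
    print(name, dict(zip(vocab, (round(p, 3) for p in probs))))
```

Because both inputs are proper probability distributions, each resulting term distribution also sums to 1.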

Page 7: Visualizing Topic Models

250k+ stories July 2007 - May 2014

Page 13: Visualizing Topic Models

Game written by 14 year old passes Angry Birds as the top free iphone app

Topic        P(T|D)
mobile apps  0.19
video games  0.14
programming  0.06
…            …

58 mobile apps  38 video games  16 programming
app             game            language
developer       player          code
application     video game      programming
user            gaming          java
app store       developer       programmer
mobile          play            programming language
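The labeling step on this slide (numbered topics replaced by human-chosen names) can be sketched as a simple lookup over a document's topic distribution. The probabilities and labels are the three shown on the slide; the function name `labeled_top_topics` is a hypothetical helper, and the topics elided by "…" are left out rather than guessed.

```python
# P(T|D) for the example story, limited to the three topics shown on the slide.
doc_topics = {58: 0.19, 38: 0.14, 16: 0.06}

# Labels a person assigned after reading each topic's top terms.
topic_labels = {58: "mobile apps", 38: "video games", 16: "programming"}


def labeled_top_topics(doc_topics, topic_labels, n=3):
    """Return the n most probable topics for a document as (label, probability).

    Topics without a manual label fall back to their numeric id.
    """
    ranked = sorted(doc_topics.items(), key=lambda item: item[1], reverse=True)
    return [(topic_labels.get(topic, str(topic)), prob) for topic, prob in ranked[:n]]


for label, prob in labeled_top_topics(doc_topics, topic_labels):
    print(f"{label:<12} {prob:.2f}")
```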

Page 19: Visualizing Topic Models

Interpreting Topic Models

What is the meaning of each topic?

How prevalent is each topic?

How do the topics relate to each other?

How do the documents relate to each other?

Page 20: Visualizing Topic Models

LDAvis

https://github.com/cpsievert/LDAvis

Page 22: Visualizing Topic Models

pyLDAvis

https://github.com/bmabey/pyLDAvis

BYOM (bring your own model)
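As a rough illustration of "bring your own model": pyLDAvis exposes a `prepare` function that takes the model's distributions and corpus statistics directly, so the model can come from any library. The sketch below assumes pyLDAvis is installed and uses a hypothetical toy model with made-up numbers; `toy_model` and `render` are illustrative names, and `render` is only defined here, not called. A real model would have a far larger vocabulary.

```python
def toy_model():
    """Hypothetical 3-topic model over a 4-term vocabulary (all numbers made up)."""
    return {
        "topic_term_dists": [  # one row per topic; each row sums to 1
            [0.5, 0.3, 0.1, 0.1],
            [0.1, 0.1, 0.4, 0.4],
            [0.25, 0.25, 0.25, 0.25],
        ],
        "doc_topic_dists": [  # one row per document; each row sums to 1
            [0.8, 0.1, 0.1],
            [0.1, 0.7, 0.2],
        ],
        "doc_lengths": [12, 8],  # number of tokens in each document
        "vocab": ["app", "developer", "game", "player"],
        "term_frequency": [6, 5, 5, 4],  # corpus-wide count of each term
    }


def render(model, path="topics.html"):
    """Write the interactive visualization to an HTML file (needs pyLDAvis)."""
    import pyLDAvis  # third-party; imported here so the toy data works without it

    vis = pyLDAvis.prepare(**model)
    pyLDAvis.save_html(vis, path)
    return path
```

In a notebook, `pyLDAvis.display(vis)` can be used in place of `save_html` to show the result inline.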

Page 23: Visualizing Topic Models

Demo Time!

Page 24: Visualizing Topic Models

Distinctiveness & Saliency

Termite: Visualization Techniques for Assessing Textual Topic Models. Jason Chuang, Christopher D. Manning, and Jeffrey Heer. 2012.

measure how much information a term conveys about topics
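Written out, the two measures as defined in the Termite paper take the following form, where w is a term, T ranges over topics, P(T|w) is the probability that an occurrence of w was generated by topic T, P(T) is the marginal probability of topic T, and P(w) is the term's overall frequency:

```latex
\mathrm{distinctiveness}(w) = \sum_{T} P(T \mid w)\,\log\frac{P(T \mid w)}{P(T)}
\qquad
\mathrm{saliency}(w) = P(w)\cdot\mathrm{distinctiveness}(w)
```

Distinctiveness is zero when a term is spread across topics exactly as the corpus as a whole is, and grows as the term concentrates in fewer topics.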

Page 33: Visualizing Topic Models

Distinctiveness & Saliency

term          coding  tech news  video games  distinctiveness  P(w)  saliency
game              10         10           50             0.03  0.28      0.01
apple             20         40           20            -0.16  0.32     -0.05
angry birds        1          1           30             0.25  0.13      0.03
python            50          5           10             0.17  0.26      0.05
TOTAL             81         56          110

                  coding  tech news  video games
P(T|game)           0.14       0.14         0.71
P(T|apple)          0.25       0.50         0.25
P(T|angry birds)    0.03       0.03         0.94
P(T|python)         0.77       0.08         0.15
P(T)                0.33       0.23         0.45

Distinctiveness: computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics

Saliency: distinctiveness weighted by the term's overall frequency
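The two measures can be recomputed from the per-topic counts on this slide. This is a sketch under assumptions: it uses the natural logarithm and treats the four listed terms as the whole vocabulary, which is what the slide's TOTAL row implies. The figures it prints depend on the logarithm base and on rounding, and a KL divergence is never negative, so they should not be expected to reproduce the slide's distinctiveness and saliency columns digit for digit.

```python
import math

topics = ["coding", "tech news", "video games"]

# Term counts per topic, copied from the slide.
counts = {
    "game":        [10, 10, 50],
    "apple":       [20, 40, 20],
    "angry birds": [1, 1, 30],
    "python":      [50, 5, 10],
}

# Column totals (81, 56, 110) and the grand total (247).
topic_totals = [sum(row[i] for row in counts.values()) for i in range(len(topics))]
grand_total = sum(topic_totals)
p_topic = [total / grand_total for total in topic_totals]  # P(T)


def p_term(term):
    """P(w): the term's share of all counted occurrences."""
    return sum(counts[term]) / grand_total


def distinctiveness(term):
    """KL divergence between P(T | term) and P(T), in nats."""
    row = counts[term]
    n = sum(row)
    kl = 0.0
    for count, q in zip(row, p_topic):
        p = count / n  # P(T | term)
        if p > 0:      # a zero-probability topic contributes nothing
            kl += p * math.log(p / q)
    return kl


def saliency(term):
    """Distinctiveness weighted by the term's overall frequency."""
    return p_term(term) * distinctiveness(term)


for term in counts:
    print(f"{term:<12} P(w)={p_term(term):.2f} "
          f"distinctiveness={distinctiveness(term):.3f} "
          f"saliency={saliency(term):.3f}")
```

On these counts, "angry birds" comes out as the most distinctive term, since nearly all of its occurrences fall in a single topic.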

Page 35: Visualizing Topic Models

Distinctiveness & Saliency

measure how much information a term conveys about topics… globally

Page 36: Visualizing Topic Models

Thank you! Learn more at http://github.com/bmabey/pyLDAvis

Ben Mabey @bmabey

http://nbviewer.ipython.org/github/bmabey/hacker_news_topic_modelling/