ChronoSAGE: Diversifying Topic Modeling Chronologically

Tomonari MASADA, NAGASAKI University

Slides for the poster presentation at WAIM 2014.

Page 1: ChronoSAGE: Diversifying Topic Modeling Chronologically

ChronoSAGE: Diversifying Topic Modeling Chronologically

Tomonari MASADA
NAGASAKI University

Page 2: ChronoSAGE: Diversifying Topic Modeling Chronologically

Problem

• Find research trends
• Present them in a readable manner

Solution

• Extract trending words at each epoch
• Display them chronologically

Page 3: ChronoSAGE: Diversifying Topic Modeling Chronologically

Method

• SAGE [Eisenstein+ 11]
– Represent each word probability as a product of factors
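The factorization idea can be illustrated with a minimal sketch. It assumes each factor is the exponential of a per-word log-weight, so that multiplying factors is the same as adding log-weights before normalizing. The function name, the two-factor setup, and the toy numbers are illustrative assumptions, not taken from the slides.

```python
import math

def sage_word_probs(log_background, log_deviation):
    """Word probabilities proportional to a product of two per-word factors.

    log_background: {word: log-weight} for the shared background (assumed).
    log_deviation:  {word: log-weight} for a sparse deviation; words that
                    are absent get weight 0, i.e. a factor of 1 (assumed).
    """
    unnormalized = {
        w: math.exp(log_background[w]) * math.exp(log_deviation.get(w, 0.0))
        for w in log_background
    }
    total = sum(unnormalized.values())
    return {w: v / total for w, v in unnormalized.items()}

# Toy vocabulary with made-up numbers.
background = {"model": math.log(0.5), "topic": math.log(0.3), "gene": math.log(0.2)}
deviation = {"gene": 1.5}  # only "gene" deviates from the background
probs = sage_word_probs(background, deviation)
```

Words without a deviation keep their background ratio to each other ("model" to "topic" stays 5:3), while "gene" is boosted.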

Page 4: ChronoSAGE: Diversifying Topic Modeling Chronologically

ChronoSAGE

• Use SAGE for chronological analysis of time-stamped documents (e.g., academic papers)
• Represent each word probability as a product of four factors

Page 5: ChronoSAGE: Diversifying Topic Modeling Chronologically

• corpus-wide background
• per-topic background

Page 6: ChronoSAGE: Diversifying Topic Modeling Chronologically

• per-epoch background
• per-topic trends
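One way the four factors could combine for a single (topic, epoch) pair is sketched below. The exact parameterization used in ChronoSAGE is not given in this transcript, so the sketch assumes each factor is the exponential of a per-word log-weight. All argument names and toy values are hypothetical.

```python
import math

def chronosage_word_probs(corpus_bg, topic_bg, epoch_bg, topic_trend):
    """Word probabilities for one (topic, epoch) pair as a product of four factors.

    Each argument maps word -> log-weight (an assumed parameterization):
      corpus_bg:   corpus-wide background (defines the vocabulary)
      topic_bg:    per-topic background for the chosen topic
      epoch_bg:    per-epoch background for the chosen epoch
      topic_trend: per-topic trend for the chosen topic at the chosen epoch
    Missing words get log-weight 0, i.e. a neutral factor of 1.
    """
    unnormalized = {}
    for w in corpus_bg:
        factors = [
            math.exp(corpus_bg[w]),
            math.exp(topic_bg.get(w, 0.0)),
            math.exp(epoch_bg.get(w, 0.0)),
            math.exp(topic_trend.get(w, 0.0)),
        ]
        unnormalized[w] = math.prod(factors)
    total = sum(unnormalized.values())
    return {w: v / total for w, v in unnormalized.items()}

# Toy example with made-up weights.
corpus_bg = {"said": math.log(0.6), "election": math.log(0.3), "recount": math.log(0.1)}
topic_bg = {"election": 1.0, "recount": 1.0}
p_flat = chronosage_word_probs(corpus_bg, topic_bg, {}, {})
p_trend = chronosage_word_probs(corpus_bg, topic_bg, {}, {"recount": 2.0})
```

Comparing `p_flat` with `p_trend` shows how a trend weight on "recount" raises its probability for that topic at that epoch only.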

Page 7: ChronoSAGE: Diversifying Topic Modeling Chronologically

Words sorted by per-epoch background probabilities (TDT4)

t=0 edt paralymp lebanon 32nd wild-card u.s china

t=1 kippur 10-13 lebanon china palestinian text join

t=2 10-14 10-16 10-18 10-15 10-19 10-17 10-20

t=3 10-24 10-23 10-22 10-25 10-21 10-26 10-27

t=4 10-29 10-28 10-31 10-30 11-3 leipzig lebanon

t=5 11-10 11-8 11-9 11-6 11-7 11-5 convuls

t=6 11-17 11-16 11-11 11-14 11-15 11-12 11-13

t=7 11-18 11-19 11-24 11-22 11-23 11-20 11-21

t=8 11-25 11-27 11-28 11-26 11-30 11-29 seclus

Page 8: ChronoSAGE: Diversifying Topic Modeling Chronologically

Words sorted by per-epoch background probabilities (TDT4), continued

t=9 12-8 12-6 12-5 12-7 12-3 537-vote 12-4

t=10 12-12 12-15 12-14 12-10 12-13 12-11 12-9

t=11 12-17 12-18 12-21 12-20 12-19 12-22 12-16

t=12 12-24 12-28 12-29 12-23 12-27 12-26 12-25

t=13 309 tabasco 2001 1-5 vy 12-0 free-agent

t=14 presid-elect’s 1-12 1-8 1-11 1-9 1-10 1-7

t=15 1-14 1-13 1-19 1-18 1-17 1-16 1-15

t=16 1-21 1-26 1-25 1-22 1-20 1-23 1-24

t=17 1-28 1-31 1-30 1-27 1-29 dawosi bhuj

Page 9: ChronoSAGE: Diversifying Topic Modeling Chronologically

Evaluation (1)

• SAGE and ChronoSAGE outperform LDA in terms of PMI (pointwise mutual information).
– We used the entire English Wikipedia for PMI computation.

Page 10: ChronoSAGE: Diversifying Topic Modeling Chronologically

PMI

(The formula on this slide was an image and is not preserved in this transcript.)
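For orientation, the sketch below uses the standard definition PMI(a, b) = log( p(a, b) / (p(a) p(b)) ), with probabilities estimated from document-level co-occurrence in a reference corpus. Averaging PMI over all pairs of a topic's top words is a common way to score a topic, but whether the slide used exactly this form is an assumption. The function names and the tiny corpus are hypothetical.

```python
import math
from itertools import combinations

def pmi(word_a, word_b, docs):
    """Standard PMI from document co-occurrence; docs is a list of word sets."""
    n = len(docs)
    count_a = sum(1 for d in docs if word_a in d)
    count_b = sum(1 for d in docs if word_b in d)
    count_ab = sum(1 for d in docs if word_a in d and word_b in d)
    if count_ab == 0:
        # The words never co-occur: PMI is negative infinity.
        return float("-inf")
    return math.log((count_ab / n) / ((count_a / n) * (count_b / n)))

def mean_pairwise_pmi(top_words, docs):
    """Average PMI over all unordered pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(pmi(a, b, docs) for a, b in pairs) / len(pairs)

# Tiny made-up reference corpus (the slides used the entire English Wikipedia).
docs = [
    {"topic", "model", "word"},
    {"topic", "model"},
    {"gene", "protein"},
    {"gene", "word"},
]
score = mean_pairwise_pmi(["topic", "model", "word"], docs)
```

In practice, pairs that never co-occur are often smoothed rather than scored as negative infinity; the sketch leaves that choice open.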

Page 14: ChronoSAGE: Diversifying Topic Modeling Chronologically

Evaluation (2)

• ChronoSAGE can extract chronological trends for each topic as top-K word lists.
– ChronoSAGE can do what SAGE can’t do.
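Extracting top-K word lists per epoch can be sketched as a simple ranking step. The sketch assumes that, for one topic, a per-epoch table of trend weights is already available. The data layout, function name, and toy weights are hypothetical.

```python
def top_k_trend_words(trend_weights, k):
    """Return the k highest-weighted words for each epoch of one topic.

    trend_weights: list indexed by epoch; each entry maps word -> trend weight
                   (an assumed layout). Ties are broken alphabetically.
    """
    return [
        [w for w, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for weights in trend_weights
    ]

# Made-up trend weights for one topic over two epochs.
trend = [
    {"recount": 2.0, "ballot": 1.2, "court": 0.1},
    {"court": 1.8, "recount": 0.4, "ballot": 0.2},
]
lists = top_k_trend_words(trend, k=2)
```

Displaying the resulting lists in epoch order gives the chronological view of a topic.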
