A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Jie Tang (1), Ruoming Jin (2), and Jing Zhang (1)

(1) Knowledge Engineering Group, Dept. of Computer Science and Technology, Tsinghua University

(2) Department of Computer Science, Kent State University

Dec. 25th, 2008


Page 1: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Jie Tang (1), Ruoming Jin (2), and Jing Zhang (1)

(1) Knowledge Engineering Group, Dept. of Computer Science and Technology, Tsinghua University

(2) Department of Computer Science, Kent State University

Dec. 25th, 2008

Page 2: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Motivation

However, the results are still not satisfactory …

“Academic search is treated as document search, but the semantics are ignored.”

Page 3: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Examples – Expertise search

Search with keyword (modeling using VSM)

Query: Data mining

Return:

Principles of Data Mining. DJ Hand - Drug Safety, 2007 - drugsafety.adisonline.com

Advances in Knowledge Discovery and Data Mining. UM Fayyad, G Piatetsky-Shapiro, P Smyth, R…

Data Mining: Concepts and Techniques. J Han, M Kamber - 2001…

[Figure: binary term vectors for the query and for Doc1, Doc3, and Doc4]

Search with semantic modeling (modeling using semantic topics)

Query: Data mining

Topics: Data mining, Association Rules, Database systems, Data management, Web databases, Information systems (weights shown in the figure: 0.4, 0.2, 0.15, 0.1, 0.05, 0.02)

Return: Experts, Expertise conferences, Expertise papers
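The contrast on this slide can be illustrated with a small sketch. It is not taken from the slides: the vocabulary, the example texts, and the two-topic mixtures are all hypothetical, and cosine similarity is assumed as the matching function in both spaces. With binary term vectors, a related paper that shares no query word scores zero, while the same paper can score high when compared in a topic space.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical vocabulary (not from the slides).
vocab = ["data", "mining", "association", "rules", "database"]

def to_binary_vector(text):
    """Binary VSM vector: 1 if the vocabulary term occurs in the text."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocab]

query = to_binary_vector("data mining")
doc_keyword = to_binary_vector("principles of data mining")
doc_related = to_binary_vector("fast algorithms for association rules")

keyword_score = cosine(query, doc_keyword)  # shares both query terms
related_score = cosine(query, doc_related)  # shares no query term

# Hypothetical topic mixtures over two topics (data mining, databases).
query_topics = [0.9, 0.1]
doc_related_topics = [0.8, 0.2]
topic_score = cosine(query_topics, doc_related_topics)

print(keyword_score, related_score, topic_score)
```

The point of the sketch is only the zero keyword score for the related paper; how the topic mixtures are obtained is the subject of the later slides.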

Page 4: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Challenges

1. How to model the heterogeneous academic network?

2. How to capture the link information for ranking objects in the academic network?

[Figure: heterogeneous academic network of papers, authors, and conferences, with edges labeled cite, write, co-write, co-author, PC member, chair, and publish]

Page 5: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Outline

• Previous Work

• Our Approach

– Ranking with Topic Model and Random Walk

• Experimental Results

• Online System—ArnetMiner.org

Page 6: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Previous Work

[Figure: heterogeneous academic network with cite, write, co-write, co-author, PC member, chair, and publish edges]

Search with keyword

• Language Model [Zhai, 01], VSM, etc.

Search with semantic topics

• LSI [Berry, 95], pLSI [Hofmann, 99], LDA [Blei, 03] [Wei, 06], etc.

Ranking

• PageRank [Page, 99], HITS [Kleinberg, 99], PopRank [Nie, 05], Link Fusion [Xi, 04], AuthorRank [Liu, 05], etc.

Combining links and contents

• A Joint Probabilistic Model [Cohn and Hofmann, 01], Topical PageRank [Nie, 06], etc.

Page 7: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Outline

• Previous Work

• Our Approach

– Ranking with Topic Model and Random Walk

• Experimental Results

• Online System—ArnetMiner.org

Page 8: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Modeling the Academic Network using the Author-Conference-Topic Model [Tang et al., 08]

[Figure: plate diagrams of three model variants, ACT1, ACT2, and ACT3, relating authors, topics, words, and conferences. Symbols shown: α, β, μ, θ, Φ, ψ, η, σ2, ad, x, z, w, c; plates A, T, D, Nd, and AC (in ACT2)]

Page 9: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Generative Story of ACT1 Model

• Generative process

[Figure: example paper with its two authors, Shafiei and Milios, each shown with a distribution over four topics (NLP, ML, DM, IR); words such as "clustering" and "inference" and the conference stamps ICDM and NIPS are linked to topics]

Example paper: Latent Dirichlet Co-clustering, Shafiei and Milios

We present a generative model for clustering documents and terms. Our model is a four-level hierarchical Bayesian model. We present efficient inference techniques based on Markov Chain Monte Carlo. We report results in document modeling, document and terms clustering …

Example topic 1

P(c|z): ICDM 0.23, KDD 0.19, …

P(w|z): mining 0.23, clustering 0.19, classification 0.17, …

Example topic 2

P(c|z): ICML 0.23, NIPS 0.19, …

P(w|z): model 0.23, learning 0.19, boost 0.17, …

Page 10: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

ACT Model 1

Generative process:

[Figure: plate diagram of ACT1 relating authors, topics, words, and conferences. Symbols shown: α, β, μ, θ, Φ, ψ, ad, x, z, w, c; plates A, T, D, Nd]
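The generative-process text of this slide did not survive in the transcript. The sketch below is therefore an assumption built from the plate-diagram symbols: for each word position, an author x is picked uniformly from the paper's author set ad, a topic z is drawn from that author's topic distribution θ, and then a word w and a conference stamp c are drawn from the topic's distributions Φ and ψ. The function names and all probability values are hypothetical, and the Dirichlet priors α, β, μ are left out.

```python
import random

def sample_index(probabilities, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]

def generate_paper(authors, n_words, theta, phi, psi, rng):
    """Generate (author, topic, word, conference) tuples for one paper.

    theta[x] : topic distribution of author x      (assumed shape A x T)
    phi[z]   : word distribution of topic z        (assumed shape T x V)
    psi[z]   : conference distribution of topic z  (assumed shape T x C)
    """
    tokens = []
    for _ in range(n_words):
        x = rng.choice(authors)          # author picked uniformly from a_d
        z = sample_index(theta[x], rng)  # topic drawn from the author's mixture
        w = sample_index(phi[z], rng)    # word drawn from the topic
        c = sample_index(psi[z], rng)    # conference stamp drawn from the same topic
        tokens.append((x, z, w, c))
    return tokens

# Hypothetical parameters: 2 topics, 3 words, 2 conferences.
theta = {"Shafiei": [0.7, 0.3], "Milios": [0.2, 0.8]}
phi = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
psi = [[0.9, 0.1], [0.2, 0.8]]

rng = random.Random(0)
paper = generate_paper(["Shafiei", "Milios"], 5, theta, phi, psi, rng)
print(paper)
```

In practice the parameters are unknown and have to be inferred from the observed words, authors, and conferences; the sketch only shows the forward direction.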

Page 11: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Integrating Topic Model into Random Walk

Random walk over the academic network + Modeling academic network with topics = ?

[Figure: heterogeneous academic network with cite, write, co-write, co-author, PC member, chair, and publish edges]

Page 12: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Combination Method 1

Stage 1: Random walk → Ranking score

[Figure: three linked graphs: Paper Graph Gp (e.g. "Tree CRF...", "EOS...", "Association..."), Author Graph Ge (Prof. Wang, Prof. Tang, Jing Zhang), and Conference Graph Gc (ISWC, IJCAI, WWW), with transition weights λde, λed, λcd, λdc, λdd]

Stage 2: Topic-based relevance → Topic-based relevance score

[Figure: the query "Data mining" matched to the papers, authors, and conferences through a topic layer]

Combination by multiplication
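The two stages can be sketched as follows. This is a minimal illustration and not the authors' implementation: a PageRank-style power iteration stands in for the random walk, the toy graph and its node names are hypothetical, the λ values are made up, and reading λde as the weight of paper-to-author edges (and likewise for the other subscripts) is an assumption. The topic-based relevance scores are simply given as inputs here.

```python
def random_walk_scores(nodes, edges, damping=0.85, iterations=50):
    """PageRank-style power iteration on a weighted directed graph.

    edges maps (source, target) -> positive weight; the weight plays the role
    of the lambda of the edge type (e.g. lambda_de for paper -> author).
    """
    out_weight = {n: 0.0 for n in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_scores = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for (src, dst), w in edges.items():
            new_scores[dst] += damping * scores[src] * w / out_weight[src]
        # Spread the mass of nodes without out-links uniformly.
        dangling = sum(scores[n] for n in nodes if out_weight[n] == 0.0)
        for n in nodes:
            new_scores[n] += damping * dangling / len(nodes)
        scores = new_scores
    return scores

def combine_by_multiplication(walk_scores, relevance):
    """Stage 1 score times stage 2 topic-based relevance, per object."""
    return {n: walk_scores[n] * relevance.get(n, 0.0) for n in walk_scores}

# Hypothetical toy network: two papers, two authors, one conference.
nodes = ["paper:p1", "paper:p2", "author:a1", "author:a2", "conf:c1"]
lam = {"dd": 0.4, "de": 0.3, "ed": 1.0, "dc": 0.3, "cd": 1.0}
edges = {
    ("paper:p2", "paper:p1"): lam["dd"],   # p2 cites p1
    ("paper:p1", "author:a1"): lam["de"],
    ("paper:p2", "author:a2"): lam["de"],
    ("author:a1", "paper:p1"): lam["ed"],
    ("author:a2", "paper:p2"): lam["ed"],
    ("paper:p1", "conf:c1"): lam["dc"],
    ("paper:p2", "conf:c1"): lam["dc"],
    ("conf:c1", "paper:p1"): lam["cd"],
    ("conf:c1", "paper:p2"): lam["cd"],
}
# Hypothetical topic-based relevance of each object to the query.
relevance = {"paper:p1": 0.8, "paper:p2": 0.1, "author:a1": 0.7,
             "author:a2": 0.2, "conf:c1": 0.5}

walk = random_walk_scores(nodes, edges)
final = combine_by_multiplication(walk, relevance)
print(sorted(final, key=final.get, reverse=True))
```

Because the walk is query-independent in this reading, stage 1 can be computed once and reused, and only the relevance scores change per query.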

Page 13: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Combination Method 2

Query: ontology alignment

[Figure: Paper Graph Gp (e.g. "Tree CRF...", "EOS...", "Association..."), Author Graph Ge (Prof. Wang, Prof. Tang, Jing Zhang), Conference Graph Gc (ISWC, IJCAI, WWW), and Hidden Theme Graph Gt (e.g. "pos", "owl", "Web service"), with transition weights λde, λed, λcd, λdc, λdd, λtd, λdt, λqt, λtq]

[Equations: ranking score; transition probability]
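The formulas on this slide are not recoverable from the transcript, so the following is only one plausible reading of the figure, not the authors' method. It assumes that the query and the hidden themes are added as nodes to the graph (edges weighted by λqt, λtq, λtd, λdt), that transition probabilities are the edge weights normalized per source node, and that a walk with restart at the query node yields query-dependent ranking scores. All node names and weights are hypothetical.

```python
def normalize_transitions(weights):
    """Turn raw edge weights {src: {dst: w}} into per-source transition probabilities."""
    transitions = {}
    for src, targets in weights.items():
        total = sum(targets.values())
        transitions[src] = {dst: w / total for dst, w in targets.items()}
    return transitions

def walk_with_restart(transitions, restart_node, restart=0.15, iterations=100):
    """Random walk that jumps back to restart_node with probability `restart`.

    Assumes every node has at least one outgoing transition, so that the
    scores keep summing to 1.
    """
    nodes = set(transitions)
    for targets in transitions.values():
        nodes.update(targets)
    scores = {n: 0.0 for n in nodes}
    scores[restart_node] = 1.0
    for _ in range(iterations):
        new_scores = {n: 0.0 for n in nodes}
        new_scores[restart_node] = restart
        for src, targets in transitions.items():
            for dst, p in targets.items():
                new_scores[dst] += (1.0 - restart) * scores[src] * p
        scores = new_scores
    return scores

# Hypothetical augmented graph: query <-> themes <-> papers <-> authors.
weights = {
    "query": {"topic:ontology": 0.9, "topic:web": 0.1},   # lambda_qt-style weights
    "topic:ontology": {"query": 1.0, "paper:a": 1.0},     # lambda_tq, lambda_td
    "topic:web": {"query": 1.0, "paper:b": 1.0},
    "paper:a": {"topic:ontology": 1.0, "author:x": 1.0},  # lambda_dt, lambda_de
    "paper:b": {"topic:web": 1.0, "author:y": 1.0},
    "author:x": {"paper:a": 1.0},                         # lambda_ed
    "author:y": {"paper:b": 1.0},
}

scores = walk_with_restart(normalize_transitions(weights), "query")
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph the two branches are identical except for the query's theme weights, so the paper and author behind the strongly matched theme end up ranked above their counterparts.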

Page 14: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Outline

• Previous Work

• Our Approach

– Ranking with Topic Model and Random Walk

• Experimental Results

• Online System—ArnetMiner.org

Page 15: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Experimental Setting

• Arnetminer data (http://arnetminer.org):

– 14,134 authors, 10,716 papers, 1,434 confs/journals

– and relationships between them

• Evaluation measures:

– pooled relevance + human judgment

– P@5, P@10, P@20, R-pre, MAP

• Baselines:

– Language Model (LM)

– LDA

– Author Topic (AT)
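The evaluation measures listed above can be computed as in the sketch below, which uses their standard definitions; reading "R-pre" as R-precision is an assumption, and the ranked list and relevance judgments are hypothetical examples rather than data from the experiments.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant (P@k)."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return precision_at_k(ranked, relevant, r)

def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where relevant items appear."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(runs):
    """MAP over a list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical result list and judgments for one query.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}

print(precision_at_k(ranked, relevant, 5))
print(r_precision(ranked, relevant))
print(average_precision(ranked, relevant))
```

Note that a relevant item that is never retrieved ("f" here) still counts in the denominator of average precision, which is why pooled judgments matter for MAP.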

Page 16: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Discovered Topics

200 topics have been discovered automatically from the academic network

Page 17: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search


Expertise Search Results

Page 18: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search


Expertise Search Results (cont.)

Page 19: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Online System: ArnetMiner (http://arnetminer.org)

[Screenshot panels: Basic Profile Information, Publication, Social Graph, User Interests and Evolution; search results: Experts, Expertise conferences, Expertise papers]

Page 20: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Outline

• Previous Work

• Our Approach

– Ranking with Topic Model and Random Walk

• Experimental Results

• Conclusion & Future Work

Page 21: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Conclusion & Future Work

• Investigated the problem of modeling a heterogeneous academic network using a unified probabilistic model.

• Proposed two methods to combine topic models with the random walk framework for academic search.

• Experimental results show that our approach can significantly improve the performance of academic search.

• Our approach is general. Variations of the approach can be applied to many other applications such as social search and blog search.

Page 22: A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search

Thanks!

Q&A & Demo

HP: http://keg.cs.tsinghua.edu.cn/persons/tj/

Online URL: http://arnetminer.org