Topic Distributions over Links on Web
Jie Tang 1, Jing Zhang 1, Jeffrey Xu Yu 2, Zi Yang 1, Keke Cai 3, Rui Ma 3, Li Zhang 3, and Zhong Su 3
1 Tsinghua University
2 Chinese University of Hong Kong
3 IBM, China Research Lab
Dec. 7th 2009
Motivation
• Web users create links with significantly different intentions
• Understanding of the category and the influence of each link can benefit many applications, e.g.,
– Expert finding
– Collaborator finding
– New friends recommendation
– …
Examples – Topic distribution analysis over citations
Researcher A: an in-depth understanding of the research field?
[Figure: original citation network vs. semantic citation network. Nodes are papers: "Self-Indexing Inverted Files for Fast Text Retrieval"; "Static Index Pruning for Information Retrieval Systems"; "Signature files: An access Method for Documents and its Analytical Performance Evaluation"; "Filtered Document Retrieval with Frequency-Sorted Indexes"; "Vector-space Ranking with Effective Early Termination"; "Efficient Document Retrieval in Main Memory"; "A Document-centric Approach to Static Index Pruning in Text Retrieval Systems"; "An Inverted Index Implementation"; "Parameterised Compression for Sparse Bitmaps"; "Introduction of Modern Information Retrieval"; "Memory Efficient Ranking"]
Legend, topics: Topic 1: Theory; Topic 21: Framework; Topic 22: Compression; Topic 23: Index method; Topic 27: Information retrieval; Topic 31: Ranking and Inverted Index; Topic 34: Parallel computing; Other
Legend, citation relationship type: Basic theory; Comparable work; Other
Problem: Link Semantic Analysis
Topic modeling over links
Citation context words
Link semantics
Outline
• Previous Work
• Our Approach
– Pairwise Restricted Boltzmann Machines (PRBMs)
• Experimental Results
• Conclusion & Future Work
Previous Work
Link influence analysis
• Citation influence topic [Dietz, 07]
• Social influence analysis [Crandall, 08; Tang, 09]
Graphical model
• Probabilistic LSI [Hofmann, 99]
• Latent Dirichlet Allocation [Blei, 03]
• Restricted Boltzmann machines [Welling, 01]
Social network analysis
• Social network analysis [Wasserman, 94]
• Web community discovery [Newman, 04]
• ‘Small world’ networks [Watts, 98]
Pairwise Restricted Boltzmann Machines (PRBMs)
[Figure: example of a PRBM. Labeled components: link context words, topic distribution, link category]
Latent variables defined over the link to bridge the two pages
Formalization of PRBMs
Obj. Func: [equations not preserved in this transcript]
Model Learning
Generative learning
Discriminative learning
Hybrid learning
Obj. Func: [equation not preserved in this transcript], annotated with two terms:
– Expectation w.r.t. the data distribution
– Expectation w.r.t. the distribution defined by the model
We use Contrastive Divergence to learn the model distribution P_M
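For readers unfamiliar with Contrastive Divergence, the following is a minimal sketch of one-step CD (CD-1) for a plain binary RBM, not the PRBM from these slides, whose equations are not preserved here. The data-driven statistics play the role of the expectation w.r.t. the data distribution, and the statistics after one Gibbs step approximate the expectation w.r.t. the model. All function names, the toy data, and the hyperparameters are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sample_hidden(v, W, b_h, rng):
    """Return (probabilities, binary samples) of hidden units given visible v."""
    probs = [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(b_h))]
    return probs, [1.0 if rng.random() < p else 0.0 for p in probs]

def sample_visible(h, W, b_v, rng):
    """Return (probabilities, binary samples) of visible units given hidden h."""
    probs = [sigmoid(b_v[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b_v))]
    return probs, [1.0 if rng.random() < p else 0.0 for p in probs]

def cd1_update(v0, W, b_v, b_h, lr, rng):
    """One CD-1 step on a single binary vector v0; updates W, b_v, b_h in place."""
    ph0, h0 = sample_hidden(v0, W, b_h, rng)   # positive phase: data statistics
    _, v1 = sample_visible(h0, W, b_v, rng)    # one Gibbs step back to visibles
    ph1, _ = sample_hidden(v1, W, b_h, rng)    # negative phase: model approximation
    for i in range(len(b_v)):
        for j in range(len(b_h)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    for i in range(len(b_v)):
        b_v[i] += lr * (v0[i] - v1[i])
    for j in range(len(b_h)):
        b_h[j] += lr * (ph0[j] - ph1[j])

def train_rbm(data, n_hidden, epochs=50, lr=0.1, seed=0):
    """Train a binary RBM with CD-1; returns (W, b_v, b_h)."""
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b_v = [0.0] * n_visible
    b_h = [0.0] * n_hidden
    for _ in range(epochs):
        for v in data:
            cd1_update(v, W, b_v, b_h, lr, rng)
    return W, b_v, b_h

# Toy usage: two hypothetical binary "context word" patterns.
toy_data = [[1, 0, 1, 0], [0, 1, 0, 1]]
W, b_v, b_h = train_rbm(toy_data, n_hidden=2, epochs=20)
```

CD-1 replaces the intractable model expectation with statistics from a single Gibbs step started at the data, which is why it is cheap enough to run per training example.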
Link Semantic Analysis
• Link category annotation
– First we calculate [formula not preserved in this transcript]
– Then we estimate the probability p(c|e) by a mean field algorithm
• Link influence estimation
– Estimate influence by KL divergence
– An alternative way is to generate the influence score by a Gaussian distribution, thus [formula not preserved in this transcript]
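As a rough illustration of the KL-divergence step, the sketch below computes KL(p || q) between two discrete topic distributions. The slide does not preserve which distributions are compared or in which direction, so treating p and q as the topic distributions of the two linked pages is an assumption, as are the function name and the example values.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps so that
    a zero in q does not cause a division by zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of topics")
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)

# Hypothetical topic distributions of a citing page and a cited page.
citing_topics = [0.7, 0.2, 0.1]
cited_topics = [0.6, 0.3, 0.1]
score = kl_divergence(citing_topics, cited_topics)
```

KL divergence is asymmetric, so swapping the two arguments generally gives a different score; a smaller value means the two topic distributions are closer.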
Experimental Setting
• Data sets
– Arnetminer data: 978,504 papers, 14M citations
– Wikipedia: 14K “article” pages and 25K links
• Evaluation measures
– Link categorization accuracy
– Topical analysis
• Baselines
– SVM+LDA
– SVM+RBM
Accuracy of Link Categorization
gPRBM: our approach with generative learning
dPRBM: our approach with discriminative learning
hPRBM: our approach with hybrid learning
Category-Topic Mixture
Example Analysis
Conclusion & Future Work
• Concluding remarks
– Investigate the problem of quantifying link semantics on the Web
– Propose Pairwise Restricted Boltzmann Machines (PRBMs) to solve this problem
• Future Work
– Semantic analysis over social relationships
– Correlation between the link semantics and the information propagation
Thanks!
Q&A
HP: http://keg.cs.tsinghua.edu.cn/persons/tj/