A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs
Hongbo Deng, Michael R. Lyu and Irwin King
Department of Computer Science and Engineering
The Chinese University of Hong Kong
July 1st, 2009
Introduction
Many kinds of data can be modeled as bipartite graphs
Content: IR models for relevance
- VSM
- Language Model
- etc.
Graph: link analysis for semantic relations
- HITS
- PageRank
- etc.
Incorporate content with graph
- Personalized PageRank (PPR)
- Linear Combination
- etc.
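[Figure: query-URL bipartite graph. Query nodes q1, q2, q3, ..., qi include “map”, “Google map”, “mapquest” and “google”; URL nodes d1, d2, d3, ..., dj include www.maps.com, maps.google.com, www.google.com and www.mapquest.com.]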
An Illustration
Query suggestion for query “map”:
- PPR: mapquest, map quest, united states map, mapquest.com
- HITS: mapquest, google.com, map quest, weather
Problems with these results:
- Noisy link data
- Lack of relevance constraints
More reasonable:
- mapquest
- united states map
- map of florida
- us map
- world map
Outline
- Introduction
- Generalized Co-HITS
  - Preliminaries
  - Iterative Framework
  - Regularization Framework
- Experiments
- Conclusion
Preliminaries
Graph
Hidden links:
Explicit links:
Content
X Y
Generalized Co-HITS
Basic idea
- Incorporate the bipartite graph with the content information from both sides
- Initialize the vertices with the relevance scores x0, y0
- Propagate the scores (mutual reinforcement)
Initial scores
Score propagation
Iterative framework
Generalized Co-HITS
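The iterative framework can be sketched in a few lines of Python. The update equations are not spelled out in the text here, so this sketch assumes each side's score is a convex combination of its initial score and the score propagated from the other side. The names `co_hits`, `lam_u`, `lam_v`, `w_uv` and `w_vu` are illustrative, and the toy numbers are made up.

```python
import numpy as np

def co_hits(w_uv, w_vu, x0, y0, lam_u=0.5, lam_v=0.5, tol=1e-10, max_iter=1000):
    """Propagate scores between the two sides until they stop changing.

    w_uv[i, k]: transition probability from u_i to v_k (rows sum to 1).
    w_vu[k, i]: transition probability from v_k to u_i (rows sum to 1).
    x0, y0:     initial relevance scores for the vertices in U and V.
    lam_u, lam_v: assumed mixing weights in [0, 1]; 0 keeps the initial scores.
    """
    x0 = np.asarray(x0, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    x, y = x0.copy(), y0.copy()
    for _ in range(max_iter):
        # Each side mixes its own initial score with mass received from the other side.
        x_new = (1 - lam_u) * x0 + lam_u * (w_vu.T @ y)
        y_new = (1 - lam_v) * y0 + lam_v * (w_uv.T @ x)
        delta = np.abs(x_new - x).sum() + np.abs(y_new - y).sum()
        x, y = x_new, y_new
        if delta < tol:
            break
    return x, y

# Toy graph with two vertices per side (made-up transition probabilities).
w_uv = np.array([[0.75, 0.25], [0.0, 1.0]])
w_vu = np.array([[1.0, 0.0], [1 / 3, 2 / 3]])
x, y = co_hits(w_uv, w_vu, x0=[1.0, 0.0], y0=[0.5, 0.5])
```

With `lam_u = lam_v = 0` the scores stay at their initial values, so the two weights control how much the graph is allowed to override the content-based relevance.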
Iterative Regularization Framework
Consider the vertices on one side (U, with links Wuu)
- Smoothness
- Fit initial scores
Generalized Co-HITS
Regularization Framework
Cost functions: R1, R2, R3
Links within each side: Wuu, Wvv
Intuition: highly connected vertices are likely to have similar relevance scores.
Generalized Co-HITS
Regularization Framework
The cost function:
Optimization problem:
Solution:
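As a rough illustration of the single-sided case, the sketch below minimizes a smoothness term plus a fit-to-initial-scores term in closed form. It assumes the standard graph-regularization cost from the semi-supervised learning literature, which is not necessarily the exact R1, R2 or R3 used in this work. The matrix `w` is assumed to be a symmetric, nonnegative link matrix among the vertices of one side with no isolated vertices, and `mu` is a hypothetical trade-off weight.

```python
import numpy as np

def normalized_links(w):
    """Return S = D^{-1/2} W D^{-1/2}, where D holds the row sums of W."""
    inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    return w * inv_sqrt[:, None] * inv_sqrt[None, :]

def cost(w, f, f0, mu):
    """Smoothness over linked vertices plus squared deviation from f0."""
    g = f / np.sqrt(w.sum(axis=1))
    smooth = 0.5 * np.sum(w * (g[:, None] - g[None, :]) ** 2)
    fit = mu * np.sum((f - f0) ** 2)
    return smooth + fit

def regularize(w, f0, mu=1.0):
    """Minimize cost() exactly by solving ((1 + mu) I - S) f = mu f0."""
    s = normalized_links(w)
    n = len(f0)
    return np.linalg.solve((1.0 + mu) * np.eye(n) - s, mu * f0)

# Toy example: vertex 0 is linked to vertices 1 and 2 (made-up weights).
w = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
f0 = np.array([1.0, 0.0, 0.0])
f = regularize(w, f0, mu=1.0)
```

A large `mu` keeps the result close to the initial scores, while a small `mu` lets strongly linked vertices pull each other's scores together.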
Application to Query-URL Bipartite Graphs
Bipartite graph construction
- Edge weighted by the click frequency
- Normalize to obtain the transition matrix
Overall Algorithm
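The graph construction step can be sketched as follows, assuming the click log has already been aggregated into a queries-by-URLs count matrix. The function name `transition_matrices` and the click counts are made up for illustration; row normalization is one natural reading of "normalize".

```python
import numpy as np

def row_normalize(m):
    """Divide each row by its sum; all-zero rows are left as zeros."""
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)

def transition_matrices(clicks):
    """clicks[i, k] = number of clicks from query i to URL k.

    Returns (w_uv, w_vu): query-to-URL and URL-to-query transition matrices.
    """
    clicks = np.asarray(clicks, dtype=float)
    return row_normalize(clicks), row_normalize(clicks.T)

# Three queries, two URLs; the third query has no clicks (made-up counts).
clicks = [[3, 1],
          [0, 2],
          [0, 0]]
w_uv, w_vu = transition_matrices(clicks)
```

Normalizing the count matrix and its transpose separately gives two different matrices, because a popular URL spreads its probability mass over many queries while a rare query may concentrate all of its mass on one URL.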
Outline
- Introduction
- Generalized Co-HITS
  - Preliminaries
  - Iterative Framework
  - Regularization Framework
- Experiments
- Conclusion
Experimental Evaluation
Data collection
- AOL query log data
Cleaning the data
- Removing the queries that appear fewer than 2 times
- Combining the near-duplicate queries
- 883,913 queries and 967,174 URLs
- 4,900,387 edges
- 250,127 unique terms
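A minimal sketch of the two cleaning steps follows. How near-duplicates are detected is not stated, so the sketch assumes they differ only in letter case and whitespace. It also merges before filtering, which is a design choice rather than something given above. The names `normalize_query` and `clean_queries` are hypothetical.

```python
from collections import Counter

def normalize_query(query):
    """Assumed near-duplicate rule: lowercase and collapse whitespace."""
    return " ".join(query.lower().split())

def clean_queries(log, min_count=2):
    """Merge near-duplicates, then drop queries seen fewer than min_count times."""
    counts = Counter(normalize_query(q) for q in log)
    return {q: c for q, c in counts.items() if c >= min_count}

# Made-up log entries.
log = ["Map Quest", "map  quest", "weather", "us map", "US map "]
kept = clean_queries(log)
```

Merging first means that two spellings seen once each survive as one query with count 2; filtering first would discard both.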
Evaluation: ODP Similarity
A simple measure of similarity among queries using ODP categories (query → category)
Definition:
Example:
- Q1: “United States” → “Regional > North America > United States”
- Q2: “National Parks” → “Regional > North America > United States > Travel and Tourism > National Parks and Monuments”
- Similarity: 3/5
Precision at rank n (P@n):
300 distinct queries
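The example can be reproduced with a short sketch. It assumes the similarity is the length of the longest common category prefix divided by the length of the longer path, which is consistent with the 3/5 value. The `precision_at_n` helper is only one plausible reading of P@n (the mean ODP similarity of the top n suggestions), and both function names are hypothetical.

```python
def odp_similarity(cat_a, cat_b, sep=" > "):
    """Shared leading categories divided by the length of the longer path."""
    a, b = cat_a.split(sep), cat_b.split(sep)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))

def precision_at_n(query_cat, suggestion_cats, n):
    """Assumed P@n: mean ODP similarity over the top-n suggested queries."""
    if n <= 0:
        raise ValueError("n must be positive")
    top = suggestion_cats[:n]
    return sum(odp_similarity(query_cat, c) for c in top) / n

q1 = "Regional > North America > United States"
q2 = ("Regional > North America > United States > "
      "Travel and Tourism > National Parks and Monuments")
similarity = odp_similarity(q1, q2)
```

Here the two paths share three leading categories and the longer path has five, so `similarity` is 3/5.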
Experimental Results
Comparison of Iterative Framework
Methods: personalized PageRank (PPR), one-step propagation (OSP), general Co-HITS (CoIter)
Result 1: The improvements of OSP and CoIter over the baseline (the dashed line) are promising compared with PPR. The initial relevance scores from both sides provide valuable information.
Experimental Results
Comparison of Regularization Framework
Methods: single-sided regularization (SiRegu), double-sided regularization (CoRegu)
Result 2: SiRegu improves performance over the baseline. CoRegu performs better than SiRegu, owing to the newly developed cost function R3. Moreover, CoRegu is relatively robust.
Experimental Results
Detailed Results
Result 3: CoRegu-0.5 achieves the best performance. Considering the double-sided regularization framework is essential and promising for the bipartite graph.
Conclusions
Propose the Co-HITS algorithm to incorporate the bipartite graph with the content information from both sides.
The Co-HITS algorithm is more general: it includes HITS and personalized PageRank as special cases.
CoRegu is more robust thanks to the newly developed cost function, and it achieves the best performance with consistent and promising improvements.
Q&A
Thanks!