Personalizing Web Page Recommendation via Collaborative Filtering and
Topic-Aware Markov Model
Qingyan Yang, Ju Fan, Jianyong Wang, Lizhu Zhou
Database Research Group, DCS&T, Tsinghua University
Agenda
Motivation
Recommender framework
Experimental evaluation
Conclusions
04/18/23 DB Group, DCS&T, Tsinghua University
Motivation
• The Web is growing explosively
  ▪ By the end of 2009 (source: the 25th Internet Report, 2010)
    ◦ 33,600,000,000 Web pages in China
    ◦ Twice as many as in 2003
• Finding desired information is becoming more difficult
  ▪ Users often wander aimlessly on the Web without reaching pages that interest them
  ▪ Or spend a long time finding the information they expect
Web page recommendation
• Objective
  ▪ To understand users' navigation behavior
  ▪ To show pages that match users' interests at a specific time
• Existing popular solutions
  ▪ Markov model and its variants
  ▪ Temporal relation is important.
  ▪ Example: if the browsing sequence is "A B C … A B C … A B C", then C is recommended when A and B are visited one after another.
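The A B C example can be sketched in a few lines of Python. This is only an illustration of a second-order Markov recommender, not the authors' implementation: the names `train_second_order` and `recommend` and the sample history are hypothetical, and ties between equally frequent successors are broken arbitrarily.

```python
from collections import Counter, defaultdict


def train_second_order(sequence):
    """Count which page follows each pair of consecutively visited pages."""
    successors = defaultdict(Counter)
    for first, second, nxt in zip(sequence, sequence[1:], sequence[2:]):
        successors[(first, second)][nxt] += 1
    return successors


def recommend(successors, first, second):
    """Return the most frequent successor of (first, second), or None if unseen."""
    counts = successors.get((first, second))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical browsing sequence with a repeated A B C pattern.
history = ["A", "B", "C", "X", "A", "B", "C", "Y", "A", "B", "C"]
model = train_second_order(history)
print(recommend(model, "A", "B"))  # C
```

Every user who visits A and then B gets the same answer from this model, which is the limitation the next slide raises.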
Limitations
• No personalized recommendations
  ▪ All users receive the same results
• Topic information of pages is neglected
  ▪ Two pages that are visited one after another may be very different in terms of topics.
PIGEON: our solution
• Personalized Web page recommendation
• Two novel features
  ▪ Personalization
    ◦ Meet the preferences of different users
  ▪ Topical coherence
    ◦ Stay relevant to users' present missions
[Figure callout: "I am a blog about finance"]
Data representation
• Navigation graph
Time        User ID   IP address     Target  Source
(09:44:44)  (0e0c…)   (211.90.-.-)   A       ()
(09:44:58)  (0e0c…)   (211.90.-.-)   B       A
(10:14:29)  (0e0c…)   (211.90.-.-)   G       A
[Figure: navigation graph over Web pages A to M. Node: Web page; edge: jump relation; weight: relation frequency.]
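One way to turn such log records into a weighted navigation graph is sketched below. It assumes directed edges from source to target and skips records with an empty source; the tuple layout and the name `build_navigation_graph` are hypothetical, and the truncated user ID and IP address are kept exactly as they appear in the table.

```python
from collections import Counter

# Log records as (time, user_id, ip, target, source). An empty source means
# no jump was recorded for that visit. Values mirror the rows in the table.
records = [
    ("09:44:44", "0e0c…", "211.90.-.-", "A", ""),
    ("09:44:58", "0e0c…", "211.90.-.-", "B", "A"),
    ("10:14:29", "0e0c…", "211.90.-.-", "G", "A"),
]


def build_navigation_graph(records):
    """Return {(source, target): frequency} over all recorded jumps."""
    edges = Counter()
    for _time, _user, _ip, target, source in records:
        if source:  # skip records with no source page
            edges[(source, target)] += 1
    return edges


graph = build_navigation_graph(records)
print(dict(graph))  # {('A', 'B'): 1, ('A', 'G'): 1}
```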
Topic discovery
• Basic idea
  ▪ We assume that pages with similar URLs, or pages involved in jump relations, are topically relevant.
• URL features
  ▪ Keywords, e.g., http://dblp.uni-trier.de/db/index.html
  ▪ Expanded by manifold-based keyword propagation
• Web page clustering
  ▪ Each cluster represents one topic
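A rough sketch of URL keyword extraction follows. The slide does not specify the tokenization rule, so splitting the host and path on non-alphanumeric characters is an assumption, and `url_keywords` is a hypothetical helper. Keyword propagation and clustering are not shown.

```python
import re
from urllib.parse import urlparse


def url_keywords(url):
    """Split a URL's host and path into lowercase alphanumeric tokens."""
    parsed = urlparse(url)
    text = f"{parsed.netloc} {parsed.path}"
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]


print(url_keywords("http://dblp.uni-trier.de/db/index.html"))
# ['dblp', 'uni', 'trier', 'de', 'db', 'index', 'html']
```

A practical version would probably filter uninformative tokens such as `de`, `index`, and `html` before clustering.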
Example
[Figure: navigation graph over Web pages A to M with weighted edges]
Topic-Aware Markov Model
• Take n-grams as states, e.g., n = 2
  ▪ Browsing sequence: A B C D B C A
  ▪ Temporal states: AB BC CD DB CA
  ▪ Sequence split by topic: A C C A and B D B
  ▪ Topical states: AC CC CA BD DB
• Web page preference score
  ▪ Maximum likelihood estimation
  ▪ e.g., $P(D \mid BC) = f(BCD)/f(BC) = 1/2$
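The states and the preference score for the sequence A B C D B C A can be reproduced with the sketch below. The topic assignment (A and C in one topic, B and D in another) is inferred from the A C C A / B D B split and is an assumption, as are the helper names `ngram_counts` and `preference`.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count every run of n consecutive pages in the sequence."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))


def preference(sequence, state, page):
    """MLE estimate P(page | state) = f(state + page) / f(state)."""
    n = len(state)
    f_state = ngram_counts(sequence, n)[tuple(state)]
    f_next = ngram_counts(sequence, n + 1)[tuple(state) + (page,)]
    return f_next / f_state if f_state else 0.0


sequence = list("ABCDBCA")
# Assumed topic assignment, consistent with the A C C A / B D B split.
topic_of = {"A": 1, "C": 1, "B": 2, "D": 2}

temporal_states = list(ngram_counts(sequence, 2))
topical_states = []
for topic in sorted(set(topic_of.values())):
    subsequence = [p for p in sequence if topic_of[p] == topic]
    topical_states += list(ngram_counts(subsequence, 2))

print(["".join(s) for s in temporal_states])  # ['AB', 'BC', 'CD', 'DB', 'CA']
print(["".join(s) for s in topical_states])   # ['AC', 'CC', 'CA', 'BD', 'DB']
print(preference(sequence, "BC", "D"))        # 0.5
```

BC occurs twice in the sequence and is followed by D once, which gives the 1/2 in the worked example.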
Personalized Recommender
• Collaborative filtering
  ▪ Basic idea
    $\tilde{s}(u, p) = k \sum_{u'} \mathrm{sim}(u, u')\, s(u', p)$
    u: active user; p: Web page
    sim(u, u'): user similarities; s(u', p): Web page preference
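A minimal sketch of this scoring rule is given below. The slide does not define k, so it is assumed here to be the normalizing factor 1 / Σ|sim(u, u')|, and the sum is assumed to run over all users other than the active one. The dictionaries and the name `predict_score` are hypothetical.

```python
def predict_score(active_user, page, similarity, preference):
    """Score page for active_user as a similarity-weighted sum over other users."""
    others = [u for u in preference if u != active_user]
    total_sim = sum(abs(similarity[(active_user, u)]) for u in others)
    if total_sim == 0:
        return 0.0
    k = 1.0 / total_sim  # assumed normalizing factor, not defined on the slide
    return k * sum(
        similarity[(active_user, u)] * preference[u].get(page, 0.0) for u in others
    )


# Hypothetical user similarities and per-user Web page preference scores.
similarity = {("u", "v"): 0.9, ("u", "w"): 0.1}
preference = {"u": {}, "v": {"D": 0.5}, "w": {"D": 1.0}}
print(predict_score("u", "D", similarity, preference))
```

In the full framework, s(u', p) would come from each user's topic-aware Markov model and sim(u, u') from the topic-based user similarity.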
User Similarity
• User profile
  ▪ A set of topics
• Similarity measurement
  ▪ Topic similarity
  ▪ Maximum weight matching
  ▪ e.g., $\mathrm{sim}(u_1, u_2) = \frac{0.9 + 0.8 + 1.0}{3} = 0.9$
[Figure: matching between the topics of u1 and u2 with edge weights 0.9, 0.8, and 1.0]
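The matching-based similarity can be illustrated with a brute-force search over one-to-one assignments. The topic similarity values below are hypothetical, chosen so that the best matching has weights 0.9, 0.8, and 1.0, and dividing by the number of matched pairs is an assumption that agrees with the worked example.

```python
from itertools import permutations


def user_similarity(topic_sim):
    """Average weight of the best one-to-one matching between two topic sets.

    topic_sim[i][j] is the similarity between topic i of the first user and
    topic j of the second user; rows must not outnumber columns.
    """
    rows, cols = len(topic_sim), len(topic_sim[0])
    best = max(
        sum(topic_sim[i][j] for i, j in enumerate(assignment))
        for assignment in permutations(range(cols), rows)
    )
    return best / rows


# Hypothetical topic similarities; the best matching is the diagonal.
topic_sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
    [0.2, 0.5, 1.0],
]
print(round(user_similarity(topic_sim), 2))  # 0.9
```

Enumerating permutations is only practical for small profiles. For larger ones, an assignment solver such as SciPy's `linear_sum_assignment` would be a more suitable choice.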
Experiment settings
• Data set
  ▪ 1,402,371 records of 375 users over 34 days
  ▪ First 30 days for training and the remaining 4 days for testing
• Metrics: precision and recall
• Comparative methods

Method    Temporal  Topical  Personalized
Baseline  Y
TAMM      Y         Y
PIGEON    Y         Y        Y
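The slides do not spell out how precision and recall are computed. The sketch below uses the usual set-based definitions for a single recommendation list, which is an assumption about the evaluation protocol; `precision_recall` and the sample pages are hypothetical.

```python
def precision_recall(recommended, visited):
    """Return (precision, recall) of a recommendation list against visited pages."""
    recommended, visited = set(recommended), set(visited)
    hits = len(recommended & visited)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(visited) if visited else 0.0
    return precision, recall


# Hypothetical test case: 2 of 4 recommended pages were actually visited,
# out of 3 pages visited in total.
print(precision_recall(["A", "B", "C", "D"], ["B", "D", "E"]))
```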
Experimental evaluation
[Figures: results for the 1st-order model and the 2nd-order model]
Conclusions
• By taking user similarities into account, we can recommend Web pages that meet different users' preferences.
• We discover the topics that interest users with an effective graph-based clustering algorithm.
• We devise a topic-aware Markov model to learn navigation patterns that contribute to topically coherent recommendations.