Parallelizing Random Walk with Restart for Large-Scale Query Recommendation
Meng-Fen Chiang, Tsung-Wei Wang and Wen-Chih Peng
Department of Computer Science
National Chiao Tung University (R.O.C.)
Outline
• Introduction
• Related Work
• Problem Definition
• Parallel RWR
  – Temporal following pattern mining
  – Recommendation graph construction
  – Random walk with restart for multiple queries
• Experimental Results
• Conclusion
Introduction
• Yahoo! Asia Knowledge Plus (AKP)
[Figure: an AKP page showing a Question and its Answer]
Introduction (contd.)
• User access log
  – Consider a QA pair as an item
  – A sequence of items clicked by a user
  – Typically, what a user looks for during a short period shares certain topics
• Within 4 min, 18 sec.: “Upload photos to Facebook”
Introduction (contd.)
• Random Walk with Restart (RWR)
  – Compute relevance scores of a set of nodes for a query node
[Figure: 12-node example graph annotated with relevance scores]
Query node: Node 4
Node:   1     2     3     4     5     6     7     8     9     10    11    12
Score:  0.13  0.10  0.13  0.22  0.13  0.05  0.05  0.08  0.04  0.03  0.04  0.02
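For reference, RWR is commonly written as the fixed-point iteration below. This is the standard textbook formulation, not something stated on the slides; the symbols \tilde{W} (column-normalized adjacency matrix), \vec{e}_q (indicator vector of the query node q) and c (restart probability) are assumptions introduced here.

```latex
\vec{r}^{(t+1)} = (1 - c)\,\tilde{W}\,\vec{r}^{(t)} + c\,\vec{e}_q,
\qquad \vec{r}^{(0)} = \vec{e}_q
```

The entries of the converged \vec{r} are the relevance scores of each node with respect to the query node, such as the scores listed for query Node 4.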
Related Work
• Random Walk with Restart (RWR)
  – Off-line mode
    • Pre-compute required information off-line
      – Pros: fast on-line recommendation for a query
      – Cons: prohibitive storage consumption
  – On-line mode
    • Compute recommendation for a query on-line
      – Pros: less storage consumption
      – Cons: longer response time
  – Fast RWR
    • Less storage consumption
    • Fast on-line response time for a query
Related Work (contd.)
• Scalable recommendation
  – SmartMiner
    • Identify user sessions
    • Mine frequent navigation patterns
  – Personalized community recommendation
    • 312 K active users, 109 K popular communities
    • Training time ~ 14 mins (200 nodes)
  – Personalized news recommendation
    • Handle streaming content
    • No explicit runtime analysis of off-line training and on-line recommendation
Problem Definition
• Goal
  – Given user click logs and a query item I
  – Recommend relevant items w.r.t. I
• Requirements
  – Effectiveness
    • Mine frequent navigation patterns from click logs
  – Scalability
    • Efficiently manage large-scale click logs within a few hours
      – Parallelization of RWR
      – Parallelization of RWR for multiple query nodes
Outline
• Introduction
• Related Work
• Problem Definition
• A framework for scalable recommendation
  – Temporal following pattern mining
  – Recommendation graph construction
  – Random walk with restart for multiple queries
• Experimental Results
• Conclusion
System Architecture
[Figure: pipeline diagram with regions labeled Input, Off-Line Computation and Storage]
• Input: User Access Logs
• Temporal Following Pattern Mining
  – Parameters: 1. window size, 2. bin size
  – Stored output: Item ID : <Item List>, . . .
• Recommendation Graph Construction
• Random Walk with Restart
  – Query items: Item 1, Item 2, . . .
  – Stored output: Item ID : <Item List>, . . .
Mining Temporal Following Patterns in Parallel
[Figure: the system architecture diagram, repeated]
Temporal Following Relation
• Frequent QA browsing behaviors of users within a pre-defined time window
  – E.g., window size = 150 sec.
User click stream: Item 1 (t = 0), Item 2 (t = 30), Item 3 (t = 70), Item 4 (t = 160)
Temporal following relations:
<Item 1, Item 2> : dt = 30
<Item 1, Item 3> : dt = 70
. . .
<Item 1, Item 4> : dt = 160
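The relation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `temporal_following_pairs` is hypothetical, the click stream is assumed to be sorted by time, and pairs whose dt exceeds the window are assumed to be discarded (the slide does not say how they are handled).

```python
from typing import List, Tuple

Click = Tuple[str, int]  # (item id, timestamp in seconds)


def temporal_following_pairs(stream: List[Click], window: int) -> List[Tuple[str, str, int]]:
    """Return (item_i, item_j, dt) for every item_j clicked after item_i within `window` seconds."""
    pairs = []
    for i, (src, t_src) in enumerate(stream):
        for dst, t_dst in stream[i + 1:]:
            dt = t_dst - t_src
            if dt > window:
                # Assumption: the stream is time-ordered, so later clicks are even further away.
                break
            pairs.append((src, dst, dt))
    return pairs


# Click stream from the slide, with window size = 150 sec.
stream = [("Item 1", 0), ("Item 2", 30), ("Item 3", 70), ("Item 4", 160)]
pairs = temporal_following_pairs(stream, window=150)
for src, dst, dt in pairs:
    print(f"<{src}, {dst}> : dt = {dt}")
```

Under these assumptions the pair <Item 1, Item 4> (dt = 160) falls outside the 150-second window and is not emitted.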
Temporal Following Pattern Mining
[Figure: MapReduce data flow]
• Input: user click logs, parameters
• Mapper 1 . . . Mapper N: emit temporal following pairs for each item
  – Output (temporal following relations): <Item_i, Item_j:cnt_ij>
• Reducer 1 . . . Reducer N: aggregate temporal following relations for each item
  – Output (temporal following patterns): <Item_i, <Item_j:cnt_ij, …, Item_z:cnt_iz>>
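The map and reduce steps above can be simulated in a single process as follows. This is a minimal sketch, not the authors' implementation: the function names `map_session` and `reduce_patterns` and the sample sessions are hypothetical, and the "bin size" parameter is left out because the slides do not define it.

```python
from collections import defaultdict


def map_session(stream, window):
    """Mapper: emit (item_i, item_j) for each temporal following pair in one user's clicks."""
    for i, (src, t_src) in enumerate(stream):
        for dst, t_dst in stream[i + 1:]:
            if t_dst - t_src > window:
                break  # assumes the stream is sorted by timestamp
            yield src, dst


def reduce_patterns(emitted):
    """Reducer: aggregate the emitted pairs into per-item follower counts."""
    patterns = defaultdict(lambda: defaultdict(int))
    for src, dst in emitted:
        patterns[src][dst] += 1
    return {src: dict(followers) for src, followers in patterns.items()}


# Two hypothetical user sessions: lists of (item id, timestamp in seconds).
sessions = [
    [("A", 0), ("B", 30), ("C", 70)],
    [("A", 0), ("B", 20)],
]
emitted = [pair for s in sessions for pair in map_session(s, window=150)]
patterns = reduce_patterns(emitted)
print(patterns)
```

In a real MapReduce job the framework would group the mapper output by Item_i before it reaches the reducers; the list comprehension stands in for that shuffle here.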
Recommendation Graph Construction
[Figure: the system architecture diagram, repeated]
Recommendation Graph Construction
• Goal
  – Transform discovered temporal following patterns to a recommendation graph
• E.g.,
Temporal following patterns:
<Item 1, <Item 2:cnt12, Item 3:cnt13>>
<Item 4, <Item 3:cnt43>>
Recommendation graph:
nodes n1, n2, n3, n4
edge (n1, n2) with weight cnt12
edge (n1, n3) with weight cnt13
edge (n4, n3) with weight cnt43
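A minimal sketch of this transformation is shown below. The function name `build_graph` and the concrete counts are hypothetical, and two choices are assumptions rather than facts from the slides: edges are treated as directed from an item to its followers, and counts are normalized into transition probabilities so the graph can be fed to a random walk.

```python
def build_graph(patterns):
    """Turn {item: {follower: count}} into {node: {neighbor: transition probability}}."""
    graph = {}
    for src, followers in patterns.items():
        total = sum(followers.values())
        graph.setdefault(src, {})
        for dst, cnt in followers.items():
            # Assumption: edge weight = share of src's following clicks that went to dst.
            graph[src][dst] = cnt / total
            graph.setdefault(dst, {})  # make sure every node appears, even with no out-edges
    return graph


# Hypothetical counts standing in for cnt12 = 3, cnt13 = 1, cnt43 = 2.
patterns = {"n1": {"n2": 3, "n3": 1}, "n4": {"n3": 2}}
graph = build_graph(patterns)
for node, neighbors in graph.items():
    print(node, neighbors)
```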
Parallelizing Random Walk with Restart
[Figure: the system architecture diagram, repeated]
Parallelizing Random Walk with Restart
• With single query
[Figure: 12-node example graph, before and after computing relevance scores]
Query node: Node 4
Node:   1     2     3     4     5     6     7     8     9     10    11    12
Score:  0.13  0.10  0.13  0.22  0.13  0.05  0.05  0.08  0.04  0.03  0.04  0.02
Parallelizing RWR With Single Query
[Figure: flowchart]
• Input: user click logs, parameters, q (an item)
• Initialization: Machine 1 . . . Machine N set the initial score for q
• RWR: Machine 1 . . . Machine N calculate the relevance score for each item
• Convergence: Machine 1 . . . Machine N calculate the difference of relevance score vectors
• Converged?
  – No: repeat the RWR step
  – Yes: finish
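The loop in the flowchart can be sketched as follows, with the N machines simulated by node partitions inside one process. This is an illustration under assumptions, not the authors' code: the function name `rwr_single_query`, the restart probability 0.15, the tolerance, the L1 difference used as the convergence test, and the toy graph are all hypothetical.

```python
def rwr_single_query(graph, q, restart=0.15, tol=1e-6, max_iter=200, n_machines=2):
    """graph: {node: {neighbor: transition probability}}; every node must be a key."""
    nodes = sorted(graph)
    # Initialization: all score mass starts at the query item q.
    scores = {n: 1.0 if n == q else 0.0 for n in nodes}
    # Each simulated machine owns a slice of the nodes.
    partitions = [nodes[i::n_machines] for i in range(n_machines)]
    # Index incoming edges so a machine can update its own nodes independently.
    incoming = {n: [] for n in nodes}
    for src, neighbors in graph.items():
        for dst, p in neighbors.items():
            incoming[dst].append((src, p))

    for iteration in range(1, max_iter + 1):
        new_scores = {}
        diff = 0.0
        for part in partitions:
            # RWR step: walk along incoming edges, then add restart mass at q.
            for n in part:
                walk = sum(scores[src] * p for src, p in incoming[n])
                new_scores[n] = (1.0 - restart) * walk + (restart if n == q else 0.0)
            # Convergence step: each machine reports the change on its own nodes.
            diff += sum(abs(new_scores[n] - scores[n]) for n in part)
        scores = new_scores
        if diff < tol:
            break
    return scores, iteration


toy_graph = {"a": {"b": 0.5, "c": 0.5}, "b": {"a": 1.0}, "c": {"a": 1.0}}
scores, iterations = rwr_single_query(toy_graph, q="a")
print(iterations, scores)
```

On a cluster the per-partition work would run on separate machines and the partial differences would be summed in a final aggregation step; the sequential loop over `partitions` only mimics that split.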
Parallelizing Random Walk with Restart
• With multiple queries
[Figure: the 12-node example graph shown for several query nodes, each annotated with its own relevance scores]
Parallelizing RWR With Multiple Queries
[Figure: MapReduce flowchart]
• Input: user click logs, parameters, Q (items)
• Initialization: Machine 1 . . . Machine N set the initial score for Q
• RWR (repeat until maximum iteration)
  – Mapper 1 . . . Mapper N: calculate the diffusion score for each item w.r.t. each q
  – Reducer 1 . . . Reducer N: sum up the diffusion scores for each item w.r.t. q
• Record format: <Item_i, <q_1:rs_1i, …, q_z:rs_zi>, <adjacent list>>
Parallelizing RWR With Multiple Queries
• Diffusion score for each item w.r.t. q
• Sum up diffusion scores for each item w.r.t. q
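The two steps can be sketched as a map function and a reduce function run for a fixed number of iterations. The formulas on the original slide did not survive extraction, so the update rule below is the standard RWR update and an assumption, as are the names `map_item`, `reduce_scores` and `rwr_multi_query`, the restart probability and the toy graph.

```python
from collections import defaultdict


def map_item(scores, neighbors, restart):
    """Mapper: spread one item's score for every query q along the item's out-edges."""
    for q, score in scores.items():
        for dst, p in neighbors.items():
            yield (dst, q), (1.0 - restart) * score * p


def reduce_scores(emitted, queries, restart):
    """Reducer: sum the diffusion scores per (item, q), then add the restart mass at q."""
    totals = defaultdict(float)
    for key, value in emitted:
        totals[key] += value
    for q in queries:
        totals[(q, q)] += restart
    return totals


def rwr_multi_query(graph, queries, restart=0.15, max_iter=50):
    """graph: {item: {neighbor: transition probability}}; returns {item: {q: score}}."""
    # Initialization: each query q puts all of its score mass on itself.
    records = {item: {q: 1.0 if item == q else 0.0 for q in queries} for item in graph}
    for _ in range(max_iter):  # run until the maximum iteration, as in the flowchart
        emitted = [kv for item, neighbors in graph.items()
                   for kv in map_item(records[item], neighbors, restart)]
        totals = reduce_scores(emitted, queries, restart)
        records = {item: {q: totals.get((item, q), 0.0) for q in queries} for item in graph}
    return records


toy_graph = {"a": {"b": 0.5, "c": 0.5}, "b": {"a": 1.0}, "c": {"a": 1.0}}
records = rwr_multi_query(toy_graph, queries=["a", "b"])
for item, per_query in records.items():
    print(item, per_query)
```

Each record keeps the scores for all queries next to the item's adjacency list, mirroring the <Item, scores, adjacent list> layout, so one pass over the graph advances every query at once.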
Experimental Setup
• Yahoo! Asia Knowledge Plus (AKP)
  – Duration: 1 week in July 2009
  – #clicks: 90 M
  – #items: 4 M
  – #users: 2 M
• Performance evaluation
  – Quality study
  – Scalability study
  – Case study
Quality Study
• User access logs
  – Train: 80%
  – Test: 20%
• Ground truth
  – For each item I clicked by user U
  – The set of items clicked by U after I within T sec.
• Measure the similarity with historical user click logs
  – Item-precision
  – Item-recall
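The slides do not define item-precision and item-recall, so the sketch below assumes the usual set-based definitions: the share of recommended items that appear in the ground truth, and the share of ground-truth items that were recommended. The function name `item_precision_recall` and the item ids are hypothetical.

```python
def item_precision_recall(recommended, ground_truth):
    """Set-based precision and recall of a recommendation list against the ground truth."""
    rec, truth = set(recommended), set(ground_truth)
    hits = len(rec & truth)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical test case: items recommended for I vs. items U clicked after I within T sec.
precision, recall = item_precision_recall(
    recommended=["i2", "i3", "i5", "i9"],
    ground_truth=["i2", "i5", "i7"],
)
print(precision, recall)
```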
Quality Study (contd.)
– Top-k hot items in the category of test item (HC)
– Temporal following pattern (TFP)
– RWR based on temporal following pattern (RWRTFP)
  • Higher precision & recall
Scalability Study
• Temporal following pattern (TFP)
  – 4.1M items
  – 40 sec.
• RWR based on temporal following pattern (RWRTFP)
  – Size of input data
  – Number of computing nodes
Scalability Study (contd.)
• Computational cost is significantly reduced as the number of machines increases
• More queries, more computationally efficient
  – 0.74 sec. (2K queries) vs. 0.49 sec. (10K queries)
Case Study
• Query item
  – “What can I do if I do not have Word?”
Conclusion
• Proposes a parallel RWR for multiple query recommendation
  – Parallelize mining frequent navigation behavior
  – Parallelize RWR
  – Compute RWR for multiple queries in parallel
• The recommender system
  – General
  – Content-agnostic
Q & A
Temporal Following Pattern Mining
[Figure: MapReduce data flow]
• Input: user click logs, parameters
• Mapper 1 . . . Mapper N: emit temporal following pairs for each item
  – Output (temporal following relations): <Item_i, Item_j:dt_ij>
• Reducer 1 . . . Reducer N: aggregate temporal following relations for each item
  – Output (temporal following patterns): <Item_i, <Item_j:dt_ij, …, Item_z:dt_iz>>