Rakuten Viki Data Challenge: Recommendation Metric
Robin M. E. Swezey, Ph.D.
Intelligence Domain Group
Rakuten Institute of Technology
http://rit.rakuten.co.jp
Self-Introduction
• Specs
– Born near Paris in 1985 (27 yrs old)
– Dual citizen
– D.E. @ Nagoya Institute of Technology
– Rakuten / R.I.T. since July 2013
– Currently consulting at Viki
– Previous work at Rakuten:
• Recommendation (next slide)
• Advertisement
– Streaming content matching
– Distributed response prediction
• Others
– Women's health application
– ML evangelism
Self-Introduction
• Work in recommendation
– For golf booking deals: Swezey R., Chung Y.: "Recommending short-lived dynamic packages for golf booking services", CIKM 2015, plus 2 patents
– For travel (an extension of the above)
– Testing of recommender systems with an offline simulator (RecoMiX)
[Figure: RecoMiX architecture. User action logs drive a website simulator, which makes API calls to the recommender system and returns feedback; statistical results are collected. Prior training set: 2.7 million books browsed from May to Nov. 2014.]
Accuracy Metric
• Expected Weighted Average Precision
• EWAP@3 gives the final evaluation score on the leaderboard, and is the expectation of WAP@3 over all users
• S(k) is the importance of user k in terms of viewing
• w(k) is the normalized weight of user k over the set of N users

$$\mathrm{EWAP@3} = \sum_{k=1}^{N} w(k)\,[\mathrm{WAP@3}]_k$$

$$w(k) = \frac{S(k)}{\sum_{k=1}^{N} S(k)} \qquad \text{and} \qquad S(k) = \Big[\sum_{j} y_j\Big]_{@\,\mathrm{user}\ k}$$
Building the Metric
• Basic regression metric
– Score-based regression metric: RMSE
• Disadvantages
– RMSE is generally inadequate for recommendation problems
» Difficult to convey meaning
» We don't care about exact score prediction
– What matters in information retrieval is the final set of results
Building the Metric
• Basic classification metrics
– Precision@n
• Proportion of matches (true positives) among the n recommendations
• How precise my n recommendations are
– Recall@n
• Proportion of matches (true positives) found from the user history (size h)
• How many true positives I am able to recall with my n recommendations
– Advantages
• Easier to convey than RMSE
• Focused on the set of recommendations (both are sketched in code below)
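To make the two definitions concrete, here is a minimal Python sketch (not from the challenge kit; function names are my own, and matching is plain set membership):

```python
def precision_at_n(recommended, history, n=3):
    """Fraction of the top-n recommendations that appear in the user history."""
    hits = sum(1 for item in recommended[:n] if item in history)
    return hits / n

def recall_at_n(recommended, history, n=3):
    """Fraction of the user history (size h) recovered by the top-n recommendations."""
    hits = sum(1 for item in recommended[:n] if item in history)
    return hits / len(history)
```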
Context
• Focusing the problem
– In our setting, n is fixed to 3
– Constraints:
1. User history sizes vary from few to many videos
2. Order of recommendations matters
3. Engagement of users matters
Precision
• Example
– Precision@n: # of true positives / # of recommendations (n)
– [Figure: user history sets vs. recommendations, Precision@3]
– First example: P = 2/3 = 0.66
– Second example: P = 1/3 = 0.33
Recall
• Example
– Recall@n: # of true positives / history size (h)
– [Figure: user history sets vs. recommendations, Recall@3]
– First example: R = 2/3 = 0.66
– Second example: R = 1/4 = 0.25
Back to Context
• Constraints:
1. User history sizes vary from few to many videos
2. Order of recommendations matters
3. Engagement of users matters
Constraint 1: History sizes
• Example 1: a user with a big history
– If user history > n, recall never reaches 1
– [Figure: all 3 recommendations match, but the history holds 6 videos]
– Everything matches but recall is stuck at 50%: R = 3/6 = 0.5
Constraint 1: History sizes
• Example 2: a user with a small history
– If user history < n, precision never reaches 1
– [Figure: the single history video is recommended, but n = 3]
– Everything has been recalled but precision is stuck at 33%: P = 1/3 = 0.33
Constraint 1: History sizes
• Solution
– Use min(h, n) as the denominator
– Big-history user: P = 3/3 = 1
– Small-history user: P = 1/1 = 1
– Precision and recall are the same metric in that case: P = R = |tp| / min(h, n) (see the sketch below)
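A minimal Python sketch of this min(h, n)-adjusted metric (my own naming):

```python
def precision_at_n_adjusted(recommended, history, n=3):
    """Precision@n with min(h, n) as the denominator: a perfect list now
    scores 1 whether the history is larger or smaller than n."""
    hits = sum(1 for item in recommended[:n] if item in history)
    return hits / min(len(history), n)
```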
Back to Context
• Constraints:
1. User history sizes vary from few to many videos
2. Order of recommendations matters
3. Engagement of users matters
Constraint 2: Order
• Example
– We only measure P@3, so for this user:
– [Figure: Algo 1 ranks the single matching video 1st; Algo 2 ranks it 3rd]
– Algo 1: P = 1/3 = 0.33
– Algo 2: P = 1/3 = 0.33
– Wait, what? Both algorithms get the same score, even though only Algo 1 put the match on top
Constraint 2: Order
• Solution
1. Average Precision (AP@n)
• Take precision@k from k=1 to n and average it over n:

$$\mathrm{AP@}n = \frac{1}{n}\sum_{k=1}^{n} \mathrm{P@}k$$

• Note: this AP@n is slightly different from the regular IR AP@n
– P(k) is not set to 0 when the k-th item is not a true positive
– Integration over n, not over recall
– Stronger weight on good ordering
– Later wrong predictions are penalized less
– Done for practical purposes, because of score weighting
Constraint 2: Order
• Solution
– AP@3 for the 2 algos:
– [Figure: user history set vs. each algo's ordered recommendations]
– Algo 1: P@1 = 1/1, P@2 = 1/2, P@3 = 1/3
AP@3 = 1/3 (1/1 + 1/2 + 1/3) = 0.61
– Algo 2: AP@3 = 1/3 (0 + 0 + 1/3) = 0.11
– The computation is sketched in code below
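A minimal Python sketch of this challenge-specific AP@n (my own naming; matching is plain set membership):

```python
def average_precision_at_n(recommended, history, n=3):
    """Challenge-style AP@n: average P@k over k = 1..n, keeping every
    term (unlike standard IR AP, which only scores ranks with hits)."""
    ap = 0.0
    for k in range(1, n + 1):
        hits = sum(1 for item in recommended[:k] if item in history)
        ap += hits / k
    return ap / n

# Reproduces the slide: a single match ranked 1st vs. ranked 3rd
print(average_precision_at_n(["a", "x", "y"], {"a"}))  # ~0.61
print(average_precision_at_n(["x", "y", "a"], {"a"}))  # ~0.11
```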
Averaging over Users
• From AP@n to MAP
1. Average Precision (AP@n)
• Take precision@k from k=1 to n and average it over n
2. MAP
• Take the mean of AP@n over users:

$$\mathrm{MAP} = \frac{1}{N}\sum_{k=1}^{N} [\mathrm{AP@}n]_k$$
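A one-function sketch on top of the average_precision_at_n sketch above (the (recommended, history) pair structure is my assumption):

```python
def mean_average_precision(users, n=3):
    """MAP: unweighted mean of AP@n over all users.
    `users` is an iterable of (recommended, history) pairs."""
    users = list(users)
    return sum(average_precision_at_n(rec, hist, n)
               for rec, hist in users) / len(users)
```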
Back to Context
• Constraints:
1. User history sizes vary from few to many videos
2. Order of recommendations matters
3. Engagement of users matters
Constraint 3: Engagement
• Improving MAP for Viki
– Viewing time is of crucial importance in the Viki funnel
– Each viewed video is scored from 1 to 3 based on viewing time (equidistant binning) to quantify engagement (see the sketch after the figure below)
[Figure: Viki funnel. Unique visitors → # of video starts → # of available ads → # of filled ads → monetize. Labels: frequency, engagement, retention, coverage, main KPI driver.]
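A hypothetical sketch of the 1-to-3 equidistant binning; the deck does not spell out the exact binned quantity, so the viewing-time fraction input here is my assumption:

```python
def engagement_score(watch_fraction):
    """Map a viewing-time fraction in [0, 1] into three equal-width
    bins, yielding an engagement score of 1, 2, or 3 (assumed scheme)."""
    if watch_fraction < 1 / 3:
        return 1
    if watch_fraction < 2 / 3:
        return 2
    return 3
```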
Constraint 3: Engagement
• Solution part 1: from AP@n to WAP@n
1. Sort the user's video scores by engagement score, descending
2. Take the first min(h, n) scores (here: the first 3) → yi; in this example y = (3, 2, 2)
3. Make your ordered list of recommendations
4. Take each prediction's score for this user → pi; here p = (2, 3, 0), where a miss scores 0
5. For each score-weighted precision WP@i:
A) Use the cumulated pi as the numerator
B) Use the cumulated yi as the denominator
WP@1 = 2/3, WP@2 = (2+3)/(3+2), WP@3 = (2+3+0)/(3+2+2)
6. Final step: average over 3
WAP@3 = 1/3 [ 2/3 + (2+3)/(3+2) + (2+3+0)/(3+2+2) ] = 0.79
(A code sketch follows.)
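A minimal sketch of WAP@n for one user (my own naming; `pred_scores` holds the engagement score each recommended video has for this user, 0 for a miss):

```python
def wap_at_n(pred_scores, true_scores, n=3):
    """WAP@n for a single user: cumulated prediction scores over
    cumulated best-possible scores, averaged over ranks 1..n."""
    # Steps 1-2: keep the user's top min(h, n) scores, descending
    y = sorted(true_scores, reverse=True)[:n]
    total = 0.0
    for i in range(1, n + 1):
        num = sum(pred_scores[:i])  # cumulated p_j (step 5A)
        den = sum(y[:i])            # cumulated y_j (step 5B)
        total += num / den
    return total / n

print(wap_at_n([2, 3, 0], [3, 2, 1, 2]))  # ~0.79, as on the slide
```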
Constraint 3: Engagement
• Solution part 1: from AP@n to WAP@n
– Compute WAP from sorted scores:
• Weighted Average Precision, calculated for each user

| User | Videos watched          | Video scores     | Recommended videos |
|------|-------------------------|------------------|--------------------|
| u1   | v1, v2, v3, v4, v5, v6  | 3, 3, 2, 2, 1, 1 | v1, v2, v3         |
| u2   | v1, v2, v3, v4, v5, v6  | 3, 3, 2, 2, 1, 1 | v3, v2, v1         |
| u3   | v1, v2, v3, v4, v5, v6  | 3, 3, 2, 2, 1, 1 | v4, v5, v6         |

$$[\mathrm{WAP@3}]_k = \left[\,\frac{1}{3}\sum_{i=1}^{3}\frac{\sum_{j=1}^{i} p_j}{\sum_{j=1}^{\min(i,n)} y_j}\,\right]_{@\,\mathrm{user}\ k}$$
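Running the wap_at_n sketch above on this table illustrates what the metric rewards (values computed from the sketch, rounded):

```python
history_scores = [3, 3, 2, 2, 1, 1]          # same history for u1, u2, u3
print(wap_at_n([3, 3, 2], history_scores))   # u1, best videos in order: 1.0
print(wap_at_n([2, 3, 3], history_scores))   # u2, same videos, worse order: ~0.83
print(wap_at_n([2, 1, 1], history_scores))   # u3, lower-scored videos: ~0.56
```

u1 recommends the top-scored videos in the right order and scores 1; u2 recommends the same videos in reverse order and is penalized; u3 recommends matching but low-engagement videos and scores lowest.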
Constraint 3: Engagement
• Solution part 2: weigh our user for the final expectation
1. Sum user k's video scores to get his engagement → Sk
• e.g. Sk = 2 + 1 + 2 + 3 = 8
2. Total engagement is the sum of the engagement of all users
• e.g. 200
3. Divide user k's engagement Sk by the total engagement → wk
• e.g. wk = 8 / 200 = 0.04
Constraint 3: Engagement
• Solution part 2: from MAP@n to EWAP@n
• EWAP@3 gives the final evaluation score on the leaderboard, and is the expectation of WAP@3 over all users
• S(k) is the total engagement of user k
• w(k) is the normalized weight of user k over the set of N users

$$\mathrm{EWAP@3} = \sum_{k=1}^{N} w(k)\,[\mathrm{WAP@3}]_k$$

$$w(k) = \frac{S(k)}{\sum_{k=1}^{N} S(k)} \qquad \text{and} \qquad S(k) = \Big[\sum_{j} y_j\Big]_{@\,\mathrm{user}\ k}$$
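Putting the pieces together, a sketch of the full metric, reusing the wap_at_n sketch above (the (pred_scores, true_scores) pair structure is my assumption):

```python
def ewap_at_3(users):
    """EWAP@3: expectation of WAP@3 over users, each user weighted by
    S(k), the sum of that user's video engagement scores."""
    users = list(users)
    engagement = [sum(true) for _, true in users]  # S(k) per user
    total = sum(engagement)                        # sum of S(k) over all users
    return sum((s / total) * wap_at_n(pred, true, n=3)
               for (pred, true), s in zip(users, engagement))
```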
Back to Context
• Constraints:
1. User history sizes vary from few to many videos
2. Order of recommendations matters
3. Engagement of users matters
Conclusion
• Constraints satisfied
– WAP
• Takes values in [0, 1] for each user regardless of user history size
• Measures the ordering of retrieved videos
• Measures the engagement value of videos retrieved by participants
– EWAP
• Weighs users according to engagement in the final integration
– More examples in the Challenge Statement