Post on 15-Jan-2015
description
Implicit Feedback Recommendation via Implicit-to-Explicit OLR Mapping
Denis Parra (Pitt), Alexandros Karatzoglou (TID), Xavier Amatriain (TID), Idil Yavuz (Pitt)
CARS 2011October 23rd 2011
Outline
• Introduction• Datasets• Models• Results• Discussion• Conclusion
A More Clear Outline
• This presentation has, IMHO, 3 sections:1. Good News2. Not that Good News3. Good News
• Which represent, respectively1. Results of first study on last.fm presented in UMAP
2011 2. Initial results of the study we present here 3. Expected Results after analysis of 2. (once I finish my
comps) – and your feedback!
Parra, Amatriain "Walk the Talk" 4
Introduction• Most of recommender system approaches rely on
explicit information of the users, but…• Explicit feedback: scarce (people are not especially
eager to rate or to provide personal info)• Implicit feedback: Is less scarce, but (Hu et al., 2008)
There’s no negative feedback … and if you watch a TV program just once or twice?
Noisy … but explicit feedback is also noisy (Amatriain et al., 2009)
Preference & Confidence … we aim to map the I.F. to preference (our main goal)
Lack of evaluation metrics … if we can map I.F. and E.F., we can have a comparable evaluation
7/12/2011
Section I : Good News
• Last.fm User Study• Linear regression results
Recalling the 1st study (1/5)• Last.fm users (114 in total after filtering)• For each user, we crawled all the albums they
listened to send them a personalized survey
Parra, Amatriain "Walk the Talk" 7
Recalling the 1st study (2/5)• What items should they rate? Item (album) sampling:– Implicit Feedback (IF): playcount for a user on a given album.
Changed to scale [1-3], 3 means being more listened to.– Global Popularity (GP): global playcount for all users on a given
album [1-3]. Changed to scale [1-3], 3 means being more listened to.
– Recentness (R) : time elapsed since user played a given album. Changed to scale [1-3], 3 means being listened to more recently.
7/12/2011
Parra, Amatriain "Walk the Talk" 8
Recalling the 1st study (3/5)
• Demographics Survey + Rating 100 albums
7/12/2011
Parra, Amatriain "Walk the Talk" 9
Recalling the 1st study (4/5)• Gender• Age• Country• Hours per week spent on internet [int_hrs_per_week]• Hours per week listening to music online [msc_hrs_per_week]• Number of concerts per year [conc_per_year]• Do you read specialized music blogs or magazines? [blogs_mag]• Do you have experience evaluating music online? [rate_music]• How frequently do you buy physical music records? [buy_records]• How frequently do you buy music online? [buy_online] • Do you prefer listening to single tracks, whole albums or either way?
[track_or_CD]
7/12/2011
Recalling the 1st study (5/5)
• Prediction of rating by multiple Linear Regression evaluated with RMSE.
• Results showed that Implicit feedback (play count of the album by a specific user) and recentness (how recently an album was listened to) were important factors, global popularity had a weaker effect.
• Results also showed that listening style (if user preferred to listen to single tracks, CDs, or either) was also an important factor, and not the other ones.
... but
• Linear Regression didn’t account for the nested nature of ratings
• And ratings were treated as continuous, when they are actually ordinal.
User 1
1 3 5 3 0 4 5 2 2 1 5 4 3 2
User n
3 2 1 0 4 5 2 5 4 3 2 1 3 5
. . .
So, Ordinal Logistic Regression!
• Actually Mixed-Effects Ordinal Multinomial Logistic Regression
• Mixed-effects: Nested nature of ratings • We obtain a distribution over ratings (ordinal
multinomial) per each pair USER, ITEM -> we predict the rating using the expected value.
• … And we can compare the inferred ratings with a method that directly uses implicit information (playcounts) to recommend ( by Hu, Koren et al. 2007)
Model [to predict rating user x item]
• Model
• Predicted value
Final MELR Model with 4 fixed effects
Section II : Not that Good News
• Datasets I and II• Results measured as MAP and nDCG.
Datasets
• D1: users, albums, if, re, gp, ratings, demographics/consumption
• D2: users, albums, if, re, gp, NO RATINGS.
Experiments• First step: build MELR model using D1• For D1 and D2: split dataset in 5 parts to
perform a 5-fold cross validation
Results
Section III: Expected Good News
• After Analyzing our data/process
Lessons / Challenges (1/2)Problem/ Challenge
1. Ground truth: Playcounts of albums or tracks?
2. Quantization of playcounts (implicit feedback), recentness, and overall number of listeners of an album (global popularity) [1-3] scale v/s raw playcounts
3. Defining Relevancy of recommended elements (to compare with the raw playcounts)
Lessons / Challenges (2/2)Problem/ Challenge
4. Additional/Alternative metrics for evaluation [MAP and nDCG used in the paper]
5. New Survey (In order to deal with the issues of identifying “actual” relevancy in dataset2)
6. Significance of level-2 variables: track_or_CD (study 1 v/s 2, where concerts_per_year was significant)
… so
• Lots of work to do [after my comps]• Questions, Suggestions?
Parra, Amatriain "Walk the Talk" 23
Is this the end?
7/12/2011
Thanks!
• Denis Parra dap89@pitt.edu
Backup slides
Complete Results for D2
AVG (MAP) AVG(nDCG) SD (MAP) SD(nDCG)
Koren 0.101428473 0.271841949 0.001504383 0.001906666
KorenLog 0.123444659 0.295368576 0.002105906 0.002239792
logit_3 0.12225702 0.294351155 0.000868738 0.001100289
popularity 0.01777665 0.136672774 0.000937768 0.00086512
linear_2 0.123417895 0.295026219 0.001830371 0.001554675
linear_3 0.122317675 0.294211465 0.001744534 0.00212961
Distribution of ratings
Actual Distribution of ratings
Some Intuition About the Results
• Distributions of ratings in both datasets
Parra, Amatriain "Walk the Talk" 30
4 Regression Analysis
• Including Recentness increases R2 in more than 10% [ 1 -> 2]• Including GP increases R2, not much compared to RE + IF [ 1 -> 3]• Not Including GP, but including interaction between IF and RE improves
the variance of the DV explained by the regression model. [ 2 -> 4 ]7/12/2011
M1: implicit feedback
M2: implicit feedback & recentness
M4: Interaction of implicit feedback & recentness
M3: implicit feedback, recentness, global popularity
Parra, Amatriain "Walk the Talk" 31
4.1 Regression Analysis
• We tested conclusions of regression analysis by predicting the score, checking RMSE in 10-fold cross validation.
• Results of regression analysis are supported.
7/12/2011
Model RMSE1 RMSE2User average 1.5308 1.1051M1: Implicit feedback 1.4206 1.0402M2: Implicit feedback + recentness 1.4136 1.034M3: Implicit feedback + recentness + global popularity 1.4130 1.0338M4: Interaction of Implicit feedback * recentness 1.4127 1.0332
Parra, Amatriain "Walk the Talk" 32
4.2 Regression Analysis – Track or Album
• Including this variable that seemed to have an effect in the general analysis, helped to improve accuracy of the model
7/12/2011
Model Tracks Tracks/Albums
Albums
User average 1.1833 1.1501 1.1306M1: Implicit feedback 1.0417 1.0579 1.0257M2: Implicit feedback + recentness 1.0383 1.0512 1.0169M3: Implicit feedback + recentness + global popularity
1.0386 1.0507 1.0159
M4: Interaction of Implicit feedback * recentness 1.0384 1.049 1.0159
Parra, Amatriain "Walk the Talk" 33
Is this the end?
7/12/2011