Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS Slides]
Transcript of Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS Slides]
DOCTORAL SYMPOSIUM: Exploring Statistical Language Models for Recommender Systems
RecSys 2015, 16-20 September, Vienna, Austria
Daniel Valcarce (@dvalcarce)
Information Retrieval Lab, University of A Coruña, Spain
Motivation
Information Retrieval vs Information Filtering (1)

Information Retrieval (IR)
# Goal: Retrieve relevant documents according to the information need of a user
# Examples: Search engines (web, multimedia...)
# Input: The user's query (explicit).

Information Filtering (IF)
# Goal: Select relevant items from an information stream for a given user
# Examples: spam filters, recommender systems
# Input: The user's history (implicit).
Information Retrieval vs Information Filtering (2)

Some people consider them different fields:
# U. Hanani, B. Shapira and P. Shoval: Information Filtering: Overview of Issues, Research and Systems in User Modeling and User-Adapted Interaction (2001)

While others consider them the same thing:
# N. J. Belkin and W. B. Croft: Information filtering and information retrieval: two sides of the same coin? in Communications of the ACM (1992)

What is undeniable is that they are closely related:
# Why not apply techniques from one field to the other?
# It has already been done!
Information Retrieval vs Information Filtering (3)

Information Retrieval (IR)
Some retrieval techniques are:
# Vector: Vector Space Model
# MF: Latent Semantic Indexing (LSI)
# Probabilistic: LDA, Language Models (LM)

Information Filtering (IF)
Some CF techniques are:
# Vector: Pairwise similarities (cosine, Pearson)
# MF: SVD, NMF
# Probabilistic: LDA and other PGMs
Language Models for Recommendation: Research goals

Language Models (LM) represented a breakthrough in Information Retrieval:
# State-of-the-art technique for text retrieval
# Solid statistical foundation

Maybe they can also be useful in RecSys:
# Are LM a good framework for Collaborative Filtering?
# Can LM be adapted to deal with temporal (TARS) and/or contextual information (CARS)?
# Is there a principled formulation of LM that combines Content-Based and Collaborative Filtering?
Language Models for Recommendation: Related work

There is little work on using Language Models for CF:
# J. Wang, A. P. de Vries and M. J. Reinders: A User-Item Relevance Model for Log-based Collaborative Filtering in ECIR 2006
# A. Bellogín, J. Wang and P. Castells: Bridging Memory-Based Collaborative Filtering and Text Retrieval in Information Retrieval (2013)
# J. Parapar, A. Bellogín, P. Castells and Á. Barreiro: Relevance-Based Language Modelling for Recommender Systems in Information Processing & Management (2013)
Relevance-Based Language Models for Collaborative Filtering
Relevance-Based Language Models

Relevance-Based Language Models, or Relevance Models (RM), are a pseudo-relevance feedback technique from IR.

Pseudo-relevance feedback is an automatic query expansion technique.

The expanded query is expected to yield better results than the original one.
Pseudo-relevance feedback

[Figure: the user's information need is expressed as a query; the retrieval system returns a ranking; a query expansion step builds an expanded query from the top-ranked results, and the expanded query is submitted to the retrieval system again]
Relevance-Based Language Models for CF Recommendation (1)

[Figure: the analogy between IR and RecSys]

IR                              RecSys
User's query                    User's profile
most^1, populated^1, state^2    Titanic^2, Avatar^3, Shark^5
Documents                       Neighbours
Terms                           Items
Relevance-Based Language Models for CF Recommendation (2)

Parapar et al. (2013):

RM2: p(i|R_u) ∝ p(i) ∏_{j ∈ I_u} ∑_{v ∈ V_u} p(i|v) · p(v)/p(i) · p(j|v)

# I_u is the set of items rated by the user u
# V_u is the neighbourhood of the user u. This is computed using a clustering algorithm
# p(i|u) is computed by smoothing the maximum likelihood estimate with the probability in the collection
# p(i) and p(v) are the item and user priors
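Given the definitions above, RM2 scoring can be sketched in a few lines. This is an illustrative implementation, not the authors' code: `ratings` is a user → item → rating dict, the neighbourhood `neighbours` is assumed precomputed, both priors are uniform, and p(i|v) is estimated with absolute-discounting smoothing (one of the smoothing methods discussed in the slides).

```python
def rm2_scores(ratings, user, neighbours, delta=0.1):
    """Score unseen items for `user` with RM2 (sketch of Parapar et al. 2013).

    ratings:    dict user -> dict item -> rating
    neighbours: iterable of neighbour users V_u (assumed precomputed)
    delta:      absolute-discounting parameter for p(i|v)
    All names here are illustrative, not taken from any published code.
    """
    items = {i for r in ratings.values() for i in r}
    users = list(ratings)
    total = sum(sum(r.values()) for r in ratings.values())
    # collection probability p(i|C): mass of item i over all rating mass
    p_c = {i: sum(ratings[v].get(i, 0.0) for v in users) / total for i in items}
    # uniform priors p(i) and p(v)
    p_i = 1.0 / len(items)
    p_v = 1.0 / len(users)

    def p_cond(i, v):
        # absolute discounting: (max(r_{v,i} - delta, 0) + delta*|I_v|*p(i|C)) / sum_j r_{v,j}
        s = sum(ratings[v].values())
        return (max(ratings[v].get(i, 0.0) - delta, 0.0)
                + delta * len(ratings[v]) * p_c[i]) / s

    seen = set(ratings[user])
    scores = {}
    for i in items - seen:           # only recommend unseen items
        score = p_i
        for j in seen:               # product over the user's rated items I_u
            score *= sum(p_cond(i, v) * (p_v / p_i) * p_cond(j, v)
                         for v in neighbours)
        scores[i] = score
    return scores
```

Ranking the resulting `scores` dict in descending order yields the recommendation list; in practice one would work in log space to avoid underflow on long profiles.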
Smoothing methods
Smoothing in RM2

RM2: p(i|R_u) ∝ p(i) ∏_{j ∈ I_u} ∑_{v ∈ V_u} p(i|v) · p(v)/p(i) · p(j|v)

p(i|u) is computed by smoothing the maximum likelihood estimate:

p_ml(i|u) = r_{u,i} / ∑_{j ∈ I_u} r_{u,j}

with the probability in the collection:

p(i|C) = ∑_{v ∈ U} r_{v,i} / ∑_{j ∈ I, v ∈ U} r_{v,j}
Why use smoothing?

In Information Retrieval, smoothing provides:
# A way to deal with data sparsity
# The inverse document frequency (IDF) role
# Document length normalisation

In RecSys, we have the same problems:
# Data sparsity
# Item popularity vs item specificity
# Profiles with different lengths
Smoothing techniques

Jelinek-Mercer (JM): linear interpolation with parameter λ.

p_λ(i|u) = (1 − λ) p_ml(i|u) + λ p(i|C)

Dirichlet priors (DP): Bayesian analysis with parameter µ.

p_µ(i|u) = (r_{u,i} + µ p(i|C)) / (µ + ∑_{j ∈ I_u} r_{u,j})

Absolute Discounting (AD): subtract a constant δ.

p_δ(i|u) = (max(r_{u,i} − δ, 0) + δ |I_u| p(i|C)) / ∑_{j ∈ I_u} r_{u,j}
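The three estimators above translate directly to code. A minimal sketch, with illustrative names: `r_u` is one user's item → rating dict and `p_c` maps every item to its collection probability p(i|C). Each estimator is a proper distribution over items (it sums to 1 when p(i|C) does, and for AD when every rating is at least δ).

```python
def jelinek_mercer(r_u, p_c, lam, i):
    # p_lambda(i|u) = (1 - lambda) * p_ml(i|u) + lambda * p(i|C)
    total = sum(r_u.values())
    p_ml = r_u.get(i, 0.0) / total
    return (1 - lam) * p_ml + lam * p_c[i]

def dirichlet(r_u, p_c, mu, i):
    # p_mu(i|u) = (r_{u,i} + mu * p(i|C)) / (mu + sum_j r_{u,j})
    total = sum(r_u.values())
    return (r_u.get(i, 0.0) + mu * p_c[i]) / (mu + total)

def absolute_discounting(r_u, p_c, delta, i):
    # p_delta(i|u) = (max(r_{u,i} - delta, 0) + delta * |I_u| * p(i|C)) / sum_j r_{u,j}
    total = sum(r_u.values())
    return (max(r_u.get(i, 0.0) - delta, 0.0)
            + delta * len(r_u) * p_c[i]) / total
```

Note how each method redistributes probability mass from the user's rated items to every item in the collection, which is what lets RM2 assign non-zero probability to unrated items.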
Experiments with smoothing
Smoothing: ranking accuracy

[Figure: nDCG@10 values of RM2 varying the smoothing method (AD, JM, DP) and its parameter (µ for DP; λ, δ for JM, AD), using 400 nearest neighbours according to Pearson's correlation on the MovieLens 100k dataset]
Smoothing: diversity

[Figure: Gini@10 values of RM2 varying the smoothing method (AD, JM, DP) and its parameter, using 400 nearest neighbours according to Pearson's correlation on the MovieLens 100k dataset]
Smoothing: novelty

[Figure: MSI@10 values of RM2 varying the smoothing method (AD, JM, DP) and its parameter, using 400 nearest neighbours according to Pearson's correlation on the MovieLens 100k dataset]
More about smoothing in RM2 for CF

More details about smoothing in:
# D. Valcarce, J. Parapar, Á. Barreiro: A Study of Smoothing Methods for Relevance-Based Language Modelling of Recommender Systems in ECIR 2015
Priors
Priors in RM2

RM2: p(i|R_u) ∝ p(i) ∏_{j ∈ I_u} ∑_{v ∈ V_u} p(i|v) · p(v)/p(i) · p(j|v)

p(i) and p(v) are the item and user priors:
# They enable introducing a priori information into the model
# They provide a principled way of modelling business rules!
Prior estimates

User prior:
# Uniform (U): p_U(u) = 1/|U|
# Linear (L):  p_L(u) = ∑_{i ∈ I_u} r_{u,i} / ∑_{v ∈ U} ∑_{j ∈ I_v} r_{v,j}

Item prior:
# Uniform (U): p_U(i) = 1/|I|
# Linear (L):  p_L(i) = ∑_{u ∈ U_i} r_{u,i} / ∑_{j ∈ I} ∑_{v ∈ U_j} r_{v,j}
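The linear priors above are just each user's (or item's) share of the total rating mass. A minimal sketch, with illustrative names, computing both from the same user → item → rating dict:

```python
def linear_priors(ratings):
    """Linear priors p_L(u) and p_L(i) from a user -> item -> rating dict.

    Illustrative sketch: each prior is the fraction of the total rating
    mass attributable to that user or item.
    """
    total = sum(sum(r.values()) for r in ratings.values())
    # p_L(u): user's rating mass over the whole collection's mass
    p_user = {u: sum(r.values()) / total for u, r in ratings.items()}
    # p_L(i): item's rating mass over the whole collection's mass
    item_sums = {}
    for r in ratings.values():
        for i, x in r.items():
            item_sums[i] = item_sums.get(i, 0.0) + x
    p_item = {i: s / total for i, s in item_sums.items()}
    return p_user, p_item
```

Uniform priors need no code (a constant 1/|U| or 1/|I|); swapping one estimate for the other in RM2 is what produces the four configurations evaluated in the table below.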
Experiments with priors
Priors on MovieLens 100k

User prior   Item prior   nDCG@10   Gini@10   MSI@10
Linear       Linear       0.0922    0.4603    28.4284
Uniform      Linear       0.2453    0.2027    16.4022
Uniform      Uniform      0.3296    0.0256    6.8273
Linear       Uniform      0.3423    0.0264    6.7848

Table: nDCG@10, Gini@10 and MSI@10 values of RM2 varying the prior estimates, using 400 nearest neighbours according to Pearson's correlation on the MovieLens 100k dataset and Absolute Discounting (δ = 0.1)

More priors in:
# D. Valcarce, J. Parapar and Á. Barreiro: A Study of Priors for Relevance-Based Language Modelling of Recommender Systems in RecSys 2015!
Comparison with other CF algorithms
Comparison on MovieLens 100k

Algorithm     nDCG@10   Gini@10   MSI@10
SVD           0.0946    0.0109    14.6129
SVD++         0.1113    0.0126    14.9574
NNCosNgbr     0.1771    0.0344    16.8222
UIR-Item      0.2188    0.0124    5.2337
PureSVD       0.3595    0.1364    11.8841
RM2-JM        0.3175    0.0232    9.1087
RM2-DP        0.3274    0.0251    9.2181
RM2-AD        0.3296    0.0256    9.2409
RM2-AD-L-U    0.3423    0.0264    9.2004

Table: nDCG@10, Gini@10 and MSI@10 values of different CF recommendation algorithms
Conclusions and future directions
Conclusions

IR techniques can be employed in RecSys:
# Not only methods such as SVD...
# but also Language Models!

Language Models provide a principled and interpretable framework for recommendation.

Relevance-Based Language Models are competitive, but there is room for improvement:
# More sophisticated priors
# Neighbourhood computation
  ◦ Different similarity metrics: cosine, Kullback–Leibler divergence
  ◦ Matrix factorisation: NMF, SVD
  ◦ Spectral clustering: NC
Future work

Improve novelty and diversity figures:
# RM2 performance is similar to PureSVD in terms of nDCG,
# but it falls short in terms of diversity and novelty

Introduce more evidence into the LM framework beyond ratings:
# Content-based information (hybrid recommender)
# Temporal and contextual information (TARS & CARS)
Thank you!
@dvalcarce
http://www.dc.fi.udc.es/~dvalcarce
Time and Context in Language Models

Time:
# X. Li and W. B. Croft: Time-based Language Models in CIKM 2003
# K. Berberich, S. Bedathur, O. Alonso and G. Weikum: A language modeling approach for temporal information needs in ECIR 2010

Context:
# H. Rode and D. Hiemstra: Conceptual Language Models for Context-Aware Text Retrieval in TREC 2004
# L. Azzopardi: Incorporating Context within the Language Modeling Approach for ad hoc Information Retrieval. PhD Thesis (2005)