Posted on 06-Jan-2017
2015-12-10
Eliezer de Souza da Silva (State-space models, Dynamic PMF via HDP)
Tomasz Kuśmierczyk (Tensor factorization)
Session 3: Time-variant models
● Tensor factorization
● State-space models
● Dynamic Bayesian PMF (via HDP)
Approximate and Scalable Inference for Complex Probabilistic Models in Recommender Systems
Part 1: Models and Representations
Literature / Sources
● Xiong, L., Chen, X., Huang, T. K., Schneider, J. G., & Carbonell, J. G. (2010). Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. SDM Proceedings.
● Sun, J. Z., Varshney, K. R., & Subbian, K. (2012). Dynamic Matrix Factorization: A State-Space Approach. ICASSP.
● Chatzis, S. P. (2014). Dynamic Bayesian Probabilistic Matrix Factorization. AAAI.
Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization
Matrix Factorization (previous cases)
[Figure: N users x M items matrix of normalized ratings, factorized into a users matrix (N x D) and an items matrix (M x D) over latent dimensions 1 to D]
Tensors: generalization to multi-way data
- P-mode tensor of dimensions M1 x … x Mp (example: observations x measurements x time x equipment).
- Multiple relationships between multidimensional variables.
- Focus on 3-way tensors, using the canonical decomposition / parallel factor analysis (CP).
CP Tensor Factorization (current case: 3-way analysis)
[Figure: N users x M items x K contexts tensor of normalized ratings, factorized into users (N x D), items (M x D) and context values (K x D) over latent dimensions 1 to D]
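A minimal NumPy sketch of the 3-way CP model, assuming (as in the figure) that each rating is modeled as a 3-way inner product of one row from each factor matrix; the sizes and the function name `cp_reconstruct` are illustrative only.

```python
import numpy as np

def cp_reconstruct(U, V, T):
    """3-way CP model: R_hat[i, j, k] = sum_d U[i, d] * V[j, d] * T[k, d]."""
    return np.einsum("id,jd,kd->ijk", U, V, T)

rng = np.random.default_rng(0)
N, M, K, D = 4, 3, 2, 5      # users, items, contexts, latent dimensions
U = rng.normal(size=(N, D))  # user factors (N x D)
V = rng.normal(size=(M, D))  # item factors (M x D)
T = rng.normal(size=(K, D))  # context factors (K x D)

R_hat = cp_reconstruct(U, V, T)
print(R_hat.shape)  # (4, 3, 2)

# Each entry is the 3-way inner product of one row from each factor matrix
i, j, k = 1, 2, 0
assert np.isclose(R_hat[i, j, k], np.sum(U[i] * V[j] * T[k]))

# With a single all-ones context row, CP reduces to ordinary matrix factorization U V^T
R_mf = cp_reconstruct(U, V, np.ones((1, D)))[:, :, 0]
assert np.allclose(R_mf, U @ V.T)
```

The last check makes the link to the previous (matrix) case explicit: the context factors act as a per-dimension reweighting of the user-item inner product.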
Temporal ...
● One additional type of context: time (a 3D tensor instead of the 2D matrix R).
● In practice:
  ○ ECCO sales: two context values per season (early/late season).
  ○ Netflix, MovieLens: one context value per month.
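For the Netflix/MovieLens setting, one context value per month amounts to mapping each timestamp to a month index that becomes the third tensor mode. A small sketch; the helper `month_context` and the sample observations are hypothetical.

```python
from datetime import date

def month_context(d: date, start: date) -> int:
    """Context index k with one context value per month, counted from start's month."""
    return (d.year - start.year) * 12 + (d.month - start.month)

# Hypothetical (user, item, date, rating) observations
ratings = [(0, 5, date(2005, 1, 14), 4.0),
           (1, 2, date(2005, 3, 2), 3.0),
           (0, 2, date(2006, 1, 30), 5.0)]
start = min(d for _, _, d, _ in ratings)

# (user, item, context, rating) entries of the 3D tensor
entries = [(u, j, month_context(d, start), r) for u, j, d, r in ratings]
print(entries)  # [(0, 5, 0, 4.0), (1, 2, 2, 3.0), (0, 2, 12, 5.0)]
```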
MAP Approach: what's new to PMF
Since log p(R) does not depend on the parameters, Bayes' rule gives
$$\arg\max_{U,V,T,T_0} \log p(U,V,T,T_0 \mid R) = \arg\max_{U,V,T,T_0} \big[\log p(R \mid U,V,T,T_0) + \log p(U,V,T,T_0)\big]$$
● Four regularization parameters (λ's)
● Optimization: SGD or block coordinate descent
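A sketch of the MAP objective as a regularized least-squares loss, under assumptions not spelled out on the slide: Gaussian likelihood around the CP prediction, zero-mean Gaussian priors on U and V, a random-walk prior T_k ~ N(T_{k-1}, ·) on the time factors, and a Gaussian prior on T0 centered at a hypothetical mean `mu_t`. The four λ's (`lam_u`, `lam_v`, `lam_t`, `lam_0`) are hypothetical names for the four regularization parameters. Plain full-batch gradient descent stands in here for the SGD / block coordinate descent mentioned above.

```python
import numpy as np

def map_objective_and_grads(R, mask, U, V, T, T0, lam_u, lam_v, lam_t, lam_0, mu_t):
    """Negative log-posterior (up to constants) of the assumed temporal CP model, plus gradients."""
    R_hat = np.einsum("id,jd,kd->ijk", U, V, T)
    E = mask * (R - R_hat)                     # residuals on observed entries only
    dT = T - np.vstack([T0[None, :], T[:-1]])  # T_k - T_{k-1}, with T0 preceding the first slice
    loss = (0.5 * np.sum(E ** 2)
            + 0.5 * lam_u * np.sum(U ** 2)
            + 0.5 * lam_v * np.sum(V ** 2)
            + 0.5 * lam_t * np.sum(dT ** 2)
            + 0.5 * lam_0 * np.sum((T0 - mu_t) ** 2))
    gU = -np.einsum("ijk,jd,kd->id", E, V, T) + lam_u * U
    gV = -np.einsum("ijk,id,kd->jd", E, U, T) + lam_v * V
    gT = -np.einsum("ijk,id,jd->kd", E, U, V) + lam_t * dT
    gT[:-1] -= lam_t * dT[1:]                  # T_k also appears in the (T_{k+1} - T_k) term
    gT0 = -lam_t * dT[0] + lam_0 * (T0 - mu_t)
    return loss, (gU, gV, gT, gT0)

def fit_map(R, mask, D=2, steps=1000, lr=0.005, lams=(0.1, 0.1, 1.0, 0.1), seed=0):
    """Full-batch gradient descent on the MAP objective; returns parameters and loss history."""
    rng = np.random.default_rng(seed)
    N, M, K = R.shape
    params = [0.5 * rng.normal(size=(N, D)), 0.5 * rng.normal(size=(M, D)),
              0.5 * rng.normal(size=(K, D)), np.ones(D)]
    mu_t = np.ones(D)  # hypothetical prior mean for T0
    history = []
    for _ in range(steps):
        loss, grads = map_objective_and_grads(R, mask, *params, *lams, mu_t)
        history.append(loss)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, history

# Synthetic rank-2 tensor with roughly 70% of the entries observed
rng = np.random.default_rng(1)
U_true, V_true, T_true = rng.normal(size=(6, 2)), rng.normal(size=(5, 2)), rng.normal(size=(4, 2))
R = np.einsum("id,jd,kd->ijk", U_true, V_true, T_true)
mask = (rng.random(R.shape) < 0.7).astype(float)
params, history = fit_map(R, mask)
print(history[0], history[-1])
```

The only differences from plain PMF are the extra factor matrix T and the smoothness term tying consecutive time factors together, with T0 anchoring the first one.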
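Bayesian approach
Predictions for an unobserved rating integrate over all parameters, weighted by their a posteriori distribution given the observed evidence R:
$$p(\hat R_{ijk} \mid R) = \int p(\hat R_{ijk} \mid U_i, V_j, T_k)\, p(U, V, T, T_0 \mid R)\, d\{U, V, T, T_0\}$$
Bayesian approach: Expectation over the posterior distribution
$$\mathbb{E}[\hat R_{ijk} \mid R] = \int \langle U_i, V_j, T_k \rangle\, p(U, V, T, T_0 \mid R)\, d\{U, V, T, T_0\}, \qquad \langle U_i, V_j, T_k \rangle = \sum_{d=1}^{D} U_{id} V_{jd} T_{kd}$$
Bayesian approach: MCMC estimate
Draw L samples from the posterior distribution and average the predictions:
$$\mathbb{E}[\hat R_{ijk} \mid R] \approx \frac{1}{L} \sum_{l=1}^{L} \langle U_i^{(l)}, V_j^{(l)}, T_k^{(l)} \rangle$$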
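The MCMC estimate is just an average of CP predictions over posterior samples. A sketch assuming the samples are already available as `(U, V, T)` tuples (in practice they would come from a Gibbs sampler, which is not shown); the random "samples" below only exercise the functions and are not a real posterior.

```python
import numpy as np

def mcmc_predict(samples, i, j, k):
    """Monte Carlo estimate of E[R_hat_ijk | R] from posterior samples [(U, V, T), ...]."""
    preds = [np.sum(U[i] * V[j] * T[k]) for U, V, T in samples]
    return float(np.mean(preds))

def mcmc_predict_all(samples):
    """Same estimate for every (i, j, k): average the CP reconstruction over the samples."""
    return np.mean([np.einsum("id,jd,kd->ijk", U, V, T) for U, V, T in samples], axis=0)

rng = np.random.default_rng(0)
# Stand-in samples: random draws used only to exercise the functions
samples = [(rng.normal(size=(4, 2)), rng.normal(size=(3, 2)), rng.normal(size=(2, 2)))
           for _ in range(50)]
full = mcmc_predict_all(samples)
print(full[1, 2, 0], mcmc_predict(samples, 1, 2, 0))
```

Note that the average is taken over predictions, not over parameters: averaging U, V, T first and then multiplying would not estimate the same quantity.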
Linear state-space approach
- User latent factors are time-dependent: they are the hidden states of a state-space system.
- Gaussian assumptions for the dynamics allow exact inference.
- Item latent factors are stationary.
- Ratings are time-dependent and observed.
[Figure: stationary item factors, time-dependent user features (hidden states) and time-dependent ratings (observations)]
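Kalman filters: combining new information
System dynamics: $x_t = A x_{t-1} + w_t,\ w_t \sim \mathcal{N}(0, Q)$; $\quad y_t = H x_t + v_t,\ v_t \sim \mathcal{N}(0, \Sigma_v)$
Prediction: $\hat x_{t|t-1} = A \hat x_{t-1|t-1}$, $\quad P_{t|t-1} = A P_{t-1|t-1} A^\top + Q$
Kalman gain: $K_t = P_{t|t-1} H^\top \big(H P_{t|t-1} H^\top + \Sigma_v\big)^{-1}$
Update: $\hat x_{t|t} = \hat x_{t|t-1} + K_t \big(y_t - H \hat x_{t|t-1}\big)$, $\quad P_{t|t} = (I - K_t H) P_{t|t-1}$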
PMF meets Kalman
- Parameters are time-independent.
- Initial state: i.i.d. zero-mean Gaussian for all users, with a similar scaling of preferences σU.
- Process noise (time evolution of user preferences) and measurement noise (estimation of a rating from user and item latent factors) are i.i.d. zero-mean Gaussians with scales σQ and σR.
- The transitions (A) and the measurements (item latent factors H) can be estimated by maximizing the log-likelihood.
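A minimal sketch of the filtering step for a single user, assuming the model reads u_{i,0} ~ N(0, σU² I), u_{i,t} = A u_{i,t-1} + w with w ~ N(0, σQ² I), and each observed rating r_{ij,t} = h_j^T u_{i,t} + v with v ~ N(0, σR²), where h_j is item j's (stationary) latent vector. A and the item factors are treated as known here, so this covers inference only, not parameter learning; the function name and data layout are hypothetical.

```python
import numpy as np

def kalman_filter_user(ratings, H_items, A, sigma_u, sigma_q, sigma_r):
    """Filter one user's latent preference vector over time.

    ratings: list over time steps of (item_ids, observed_ratings)
    H_items: (M x D) stationary item latent factors
    Returns the filtered means and covariances, one per time step.
    """
    D = A.shape[0]
    x = np.zeros(D)               # prior mean of the initial state
    P = sigma_u ** 2 * np.eye(D)  # prior covariance of the initial state
    Q = sigma_q ** 2 * np.eye(D)  # process noise covariance
    means, covs = [], []
    for item_ids, y in ratings:
        # Prediction: propagate the state through the user dynamics
        x = A @ x
        P = A @ P @ A.T + Q
        if len(item_ids) > 0:
            H = H_items[item_ids]  # measurement matrix: rows of the items rated at this step
            S = H @ P @ H.T + sigma_r ** 2 * np.eye(len(item_ids))
            K = np.linalg.solve(S, H @ P).T            # Kalman gain P H^T S^{-1}
            x = x + K @ (np.asarray(y) - H @ x)        # Update with the innovation
            P = (np.eye(D) - K @ H) @ P
        means.append(x.copy())
        covs.append(P.copy())
    return means, covs

# Simulate one user whose preferences drift slowly, rating 4 random items per step
rng = np.random.default_rng(0)
M, D, n_steps = 20, 3, 10
H_items = rng.normal(size=(M, D))
A = 0.95 * np.eye(D)
sigma_u, sigma_q, sigma_r = 1.0, 0.1, 0.3
u = rng.normal(scale=sigma_u, size=D)
ratings = []
for t in range(n_steps):
    u = A @ u + rng.normal(scale=sigma_q, size=D)
    items = rng.choice(M, size=4, replace=False)
    ratings.append((items, H_items[items] @ u + rng.normal(scale=sigma_r, size=4)))

means, covs = kalman_filter_user(ratings, H_items, A, sigma_u, sigma_q, sigma_r)
print(np.round(means[-1] - u, 2))  # filtered estimate minus the true final state
```

Time steps where the user rated nothing simply skip the update, so the covariance grows by Q until new ratings arrive.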
PMF meets Kalman: learning the parameters
- EM with expected joint likelihood maximization.
- Other approaches: minimizing the residual prediction error, maximizing the prediction likelihood, maximizing the measurement likelihood, optimizing the performance after smoothing.
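As one concrete piece of the EM route, the M-step for the transition matrix has a closed form in a linear-Gaussian state-space model. Assuming A is shared by all users i and that the E-step supplies smoothed moments from a Rauch-Tung-Striebel smoother (not shown), maximizing the expected joint log-likelihood over A gives the standard update:

$$A^{\text{new}} = \Big(\sum_i \sum_{t=1}^{T} \mathbb{E}\big[u_{i,t}\, u_{i,t-1}^\top \mid r_i\big]\Big)\Big(\sum_i \sum_{t=1}^{T} \mathbb{E}\big[u_{i,t-1}\, u_{i,t-1}^\top \mid r_i\big]\Big)^{-1}, \qquad \mathbb{E}\big[u_{t}\, u_{t-1}^\top \mid r\big] = P_{t,t-1|T} + \hat u_{t|T}\, \hat u_{t-1|T}^\top$$

Here r_i denotes all of user i's ratings and P_{t,t-1|T} is the smoothed lag-one cross-covariance; the updates for H, σQ and σR follow the same pattern but are not written out here.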
Dynamic Bayesian Probabilistic Matrix Factorization
- User patterns change over time.
- Groups of users share latent structure (clustering of user features).
- Capture the dynamics of the generative process of the group structure.
- dHDP: dynamic hierarchical Dirichlet process.
Dirichlet distribution
Dirichlet process
- Distribution over distributions: each draw is a discrete distribution with infinitely many atoms.
- Clustering effect: the rich get richer.
- Chinese Restaurant Process.
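The Chinese Restaurant Process makes the rich-get-richer clustering effect concrete: customer n+1 joins an occupied table k with probability n_k / (n + α) and opens a new table with probability α / (n + α). A short simulation sketch (the function name and the value of α are illustrative):

```python
import numpy as np

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers sequentially; return each customer's table and the table sizes."""
    rng = np.random.default_rng(seed)
    tables = []      # number of customers at each occupied table
    assignment = []  # table index chosen by each customer
    for n in range(n_customers):
        # Existing table k w.p. n_k / (n + alpha), new table w.p. alpha / (n + alpha)
        probs = np.array(tables + [alpha], dtype=float) / (n + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
        assignment.append(k)
    return assignment, tables

assignment, tables = chinese_restaurant_process(1000, alpha=2.0)
print(len(tables), sorted(tables, reverse=True)[:5])
```

Large tables attract new customers in proportion to their size, so a few clusters dominate, while the expected number of tables grows only logarithmically in the number of customers (roughly α log(1 + n/α)).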
Hierarchical Dirichlet Process (HDP)
HDP for time domain
[Figure: model diagram linking Bayesian PMF, the dHDP and groups of users]
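A generic sketch of how a dHDP prior can let the group structure evolve over time, using a finite truncation that is an assumption of this sketch and not taken from the slides: global weights β from truncated stick-breaking (GEM(γ)) over a shared set of atoms, each DP(α, G0) draw approximated as Dirichlet(α β), and the dHDP mixing rule G_t = (1 - w) G_{t-1} + w H_{t-1} with w ~ Beta(a_w, b_w). How Chatzis (2014) couples this prior with Bayesian PMF is not reproduced; all names and hyperparameter values are hypothetical.

```python
import numpy as np

def truncated_gem(gamma, n_atoms, rng):
    """Stick-breaking weights beta ~ GEM(gamma), truncated to n_atoms (last stick takes the rest)."""
    v = rng.beta(1.0, gamma, size=n_atoms)
    v[-1] = 1.0
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

def dhdp_weights(n_steps, gamma=2.0, alpha=5.0, a_w=1.0, b_w=4.0, n_atoms=20, seed=0):
    """Time-varying group weights pi_t over shared atoms (truncated dHDP sketch)."""
    rng = np.random.default_rng(seed)
    beta = truncated_gem(gamma, n_atoms, rng)   # weights of the base measure G0
    pis = [rng.dirichlet(alpha * beta)]         # G_1 ~ DP(alpha, G0), truncated
    for _ in range(n_steps - 1):
        w = rng.beta(a_w, b_w)                  # innovation weight
        eta = rng.dirichlet(alpha * beta)       # innovation H_{t-1} ~ DP(alpha, G0), truncated
        pis.append((1.0 - w) * pis[-1] + w * eta)  # G_t = (1 - w) G_{t-1} + w H_{t-1}
    return beta, np.array(pis)

beta, pis = dhdp_weights(n_steps=6)
# Hypothetical use: group labels for 100 users at the first time step
rng = np.random.default_rng(1)
groups_first = rng.choice(len(beta), size=100, p=pis[0])
print(np.round(pis[:, :5], 3))  # weights of the first 5 groups over time
```

Because each G_t mostly reuses G_{t-1} and adds a fresh DP draw with weight w, groups persist across time while their popularity drifts, and all time steps share the same atoms (groups).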