Restricted Boltzmann Machines for Collaborative Filtering · 22/10/2019 · Presenter: Vijay...
Restricted Boltzmann Machines for Collaborative Filtering
Authors: Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton
Proceedings of the 24th International Conference on Machine Learning. ACM, 2007
Presenter: Vijay Shankar Venkataraman
Facilitators: Omar Nada, Jesse Cresswell
Oct 22, 2019
Netflix Prize Dataset (2006)
● Features
○ <user, movie, date of grade, grade>
○ 480,189 users rated 17,770 movies on a 5-point scale
● Training Data (100,480,507 ratings)
○ Training set: 99,072,112
○ Probe set: 1,408,395
● Qualifying set (2,817,131 ratings)
○ Quiz set: 1,408,342
○ Test set: 1,408,789
Grand Prize of $1M for improving the RMSE by 10% over Netflix’s Cinematch model!
Collaborative Filtering
Key Ideas
● Filtering - predict content likely to get high ratings from given user
● Collaborative - identify users who have rated content similarly
Methods: neighbourhood-based or model-based
(2013). Retrieved from https://en.wikipedia.org/wiki/Collaborative_filtering#/media/File:Collaborative_filtering.gif
Restricted Boltzmann Machines
Energy based unsupervised models
Boltzmann
Conditional Independence
Example: computing energies from weights and biases
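The energy function and the conditional independence property above can be sketched numerically. A minimal binary-RBM example — all sizes, values, and variable names here are illustrative assumptions, not from the paper:

```python
# Minimal binary RBM: energy E(v, h) = -v^T W h - b^T v - c^T h,
# and the factorized conditionals that the "restricted" structure gives us.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # visible-hidden weights
b = np.zeros(n_visible)                                # visible biases
c = np.zeros(n_hidden)                                 # hidden biases

def energy(v, h):
    # Joint energy of a (visible, hidden) configuration
    return -(v @ W @ h) - b @ v - c @ h

v = np.array([1, 0, 1, 0], dtype=float)
h = np.array([1, 1, 0], dtype=float)
e = energy(v, h)

# Conditional independence: with no hidden-hidden or visible-visible edges,
# each unit's conditional is an independent sigmoid given the other layer.
p_h_given_v = sigmoid(c + v @ W)  # vector of p(h_j = 1 | v)
p_v_given_h = sigmoid(b + W @ h)  # vector of p(v_i = 1 | h)
```

Because the conditionals factorize, a whole layer can be sampled in one vectorized step — this is what makes Gibbs sampling in an RBM cheap per step.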
Restricted Boltzmann Machines: Learning Algorithm
- Make observed states more probable
- Maximize the log probability
Expensive! Needs MCMC!
Approximate by
Contrastive Divergence
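Contrastive Divergence replaces the expensive MCMC estimate of the negative-phase statistics with a single reconstruction step. A hedged sketch of the CD-1 update for a binary RBM — learning rate, sizes, and names are assumptions for illustration:

```python
# CD-1 sketch: positive phase from the data, negative phase from one
# Gibbs step (v0 -> h0 -> v1 -> h1), gradient ~ <v h>_data - <v h>_recon.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_v, n_h, lr = 6, 4, 0.1
W = rng.normal(scale=0.01, size=(n_v, n_h))
b, c = np.zeros(n_v), np.zeros(n_h)

def cd1_step(v0):
    global W, b, c
    # Positive phase: hidden probabilities given the observed data
    ph0 = sigmoid(c + v0 @ W)
    h0 = (rng.random(n_h) < ph0).astype(float)
    # Negative phase: one step of Gibbs sampling (the "reconstruction")
    pv1 = sigmoid(b + W @ h0)
    v1 = (rng.random(n_v) < pv1).astype(float)
    ph1 = sigmoid(c + v1 @ W)
    # Approximate gradient ascent on the log probability
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)

v_data = np.array([1, 1, 0, 0, 1, 0], dtype=float)
for _ in range(100):
    cd1_step(v_data)
```

The key saving: instead of running the Markov chain to equilibrium for every gradient step, CD-1 starts the chain at the data and stops after one step.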
Restricted Boltzmann Machines: Intuition for Contrastive Divergence
Restricted Boltzmann Machines for CF
1 RBM per user, shared weights
Marginal Distributions
Learning
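The per-user RBM uses K = 5 softmax visible units for each movie the user rated, with weights shared across all users; unrated movies are simply left out. A sketch of computing the hidden probabilities for one user — sizes and names are illustrative assumptions:

```python
# One RBM per user: the visible layer has a K-way softmax unit per RATED
# movie only, and every user's RBM shares the same weight tensor W.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_movies, K, F = 10, 5, 4                          # movies, rating levels, hidden units
W = rng.normal(scale=0.01, size=(n_movies, K, F))  # shared across all users
c = np.zeros(F)                                    # hidden biases

# One user's ratings: movie index -> rating in {1..5}; unrated movies absent
ratings = {0: 5, 3: 1, 7: 4}

# p(h_j = 1 | V) sums only over the movies this user actually rated:
# the one-hot visible vector selects row k = rating - 1 of each movie's block.
act = c.copy()
for movie, r in ratings.items():
    act += W[movie, r - 1]
p_h = sigmoid(act)
```

Weight sharing is what lets 480,189 per-user RBMs act as a single model: users who rated different subsets of movies just connect different slices of the same W.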
Restricted Boltzmann Machines for CF
Inference
- Use hidden vectors to generate rating probability
- Expected value works well
- Expensive to calculate for many movies
Restricted Boltzmann Machines for CF
Inference
- Paper uses a mean-field approximation
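The single-pass idea can be sketched end to end: compute hidden probabilities from the observed ratings (probabilities in place of samples, in the spirit of the mean-field shortcut), then the distribution over a query movie's K ratings, and take its expected value. All names and sizes here are illustrative assumptions:

```python
# Predict a rating with one bottom-up/top-down pass, no sampling loop.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_movies, K, F = 10, 5, 4
W = rng.normal(scale=0.01, size=(n_movies, K, F))  # shared weights
b = np.zeros((n_movies, K))                        # visible (movie, rating) biases
c = np.zeros(F)                                    # hidden biases
ratings = {0: 5, 3: 1, 7: 4}                       # this user's observed ratings

# Bottom-up: hidden probabilities from the rated movies
act = c + sum(W[m, r - 1] for m, r in ratings.items())
p_h = sigmoid(act)

# Top-down: distribution over the K ratings of an unrated query movie,
# using p_h directly instead of averaging over sampled hidden vectors
query = 5
scores = b[query] + W[query] @ p_h   # shape (K,)
p_rating = softmax(scores)           # p(v_query = k | observed ratings)
expected_rating = (np.arange(1, K + 1) * p_rating).sum()
```

One deterministic pass per user replaces an expensive average over many sampled hidden configurations, which is why this style of inference scales to scoring many movies.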
Restricted Boltzmann Machines for CF: Conditional RBMs
Conditional term
Learning conditional term
Restricted Boltzmann Machines for CF
One last trick: Conditional Factored RBMs
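The factored trick replaces the huge (movies × ratings × hidden) weight tensor with a low-rank product, so each movie only stores C numbers per rating level instead of F. A back-of-envelope sketch with F = 500 and C = 30 from the experiments (the exact factorization layout here is a simplified assumption):

```python
# Parameter count: full weight tensor vs. a rank-C factorization
# W[m, k] ~= A[m, k] @ B, with A: (n_movies, K, C) and B: (C, F).
n_movies, K, F, C = 17770, 5, 500, 30

full_params = n_movies * K * F              # one F-vector per (movie, rating)
factored_params = n_movies * K * C + C * F  # small per-movie factors + shared B

reduction = full_params / factored_params
```

The shared C × F matrix B is tiny, so almost all the savings come from shrinking the per-movie side of the product — roughly an order of magnitude fewer parameters to learn.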
Experiments
Comparison between RBMs
RBM: F = 100
Conditional factored RBM: F = 500, C = 30
Experiments
Comparison with SVDs
Very different errors
Key takeaways
Energy-based models like RBMs can learn good representations even for large datasets
Learning in RBMs does not rely on backpropagation
At the time, world-class performance on the Netflix dataset
Very different errors from matrix factorization techniques
Discussion Points
Why is the mean-field inference faster?
How do we tackle the cold start problem?
How useful are energy based methods in machine learning today?
Why are the errors so different from matrix factorization? Are the latent representations very different?
Do variational autoencoders do better than RBM in generating latent representations?