Netflix Prize Solution: A Matrix Factorization Approach

By Atul S. Kulkarni

kulka053@d.umn.edu
Graduate student

University of Minnesota Duluth

Agenda

• Problem Description
• Netflix Data
• Why is it a tough nut to crack?
• Overview of methods already applied to this problem
• Overview of the Paper
• Details of the method
• How does this method work for the Netflix problem?
• My implementation
• Results
• Q and A

Netflix Prize Problem

• Given a set of users with their previous ratings for a set of movies, can we predict the rating they will assign to a movie they have not previously rated?

• Defined at http://www.netflixprize.com//index

• Seeks to improve the prediction performance of Cinematch (Netflix’s existing movie recommender system) by 10%.

• How is the performance measured? – Root Mean Square Error (RMSE)

• Winner gets a prize of 1 Million USD.
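
As a small illustration of the performance measure named above, here is a minimal sketch of RMSE in Python. The function name `rmse` and the sample numbers are made up for this example; they are not from the competition data.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_error_sum = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error_sum / len(actual))

# Hypothetical predictions and true ratings, for illustration only.
print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))
```

A lower RMSE means the predicted ratings are, on average, closer to the true ratings.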

Problem Description

• Recommender Systems
  – Use knowledge about the preferences of a group of users for certain items to help predict the interest level of other users from the same community. [1]

• Collaborative filtering
  – Widely used method for recommender systems
  – Tries to find traits of shared interest among users in a group to help predict the likes and dislikes of the other users within the group. [1]

Why is this problem interesting?

• Used by almost every recommender system today
  – Amazon
  – Yahoo
  – Google
  – Netflix
  – …

Netflix Data

• Netflix released data for this competition
• Contains nearly 100 Million ratings
• Number of users (Anonymous) = 480,189
• Number of movies rated by them = 17,770
• Training Data is provided per movie
• To verify the model developed without submitting the predictions to Netflix, “probe.txt” is provided
• To submit the predictions for the competition, “qualifying.txt” is used

Netflix Data in Pictures

• These pictures are taken as is from [5]

[Figure: histogram of the number of users by average rating; x-axis: average rating, 1 to 5; y-axis: number of users, 0 to 45,000]

Netflix Data in Pictures Contd.

[Figure: number of movies per year; x-axis: year; y-axis: number of movies, 0 to 1,600]

Netflix Data in Pictures Contd.

[Figure: distribution of ratings over the values 1 to 5; segment labels: 4, 9, 28, 33, 26]

Netflix Data

• Data in the training file is per movie
  – It looks like this

    Movie#:
    Customer#,Rating,Date of Rating
    Customer#,Rating,Date of Rating
    Customer#,Rating,Date of Rating

  – Example

    4:
    1065039,3,2005-09-06
    1544320,1,2004-06-28
    410199,5,2004-10-16
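
The per-movie layout above could be read with a few lines of Python. This is only a sketch under the assumption that each movie header is a line ending in ":" and each rating line has exactly three comma-separated fields; the function name `parse_training_lines` is hypothetical.

```python
from datetime import date

def parse_training_lines(lines):
    """Yield (movie_id, customer_id, rating, rating_date) tuples.

    Assumes the layout shown on the slide: a "MovieID:" header line
    followed by "CustomerID,Rating,YYYY-MM-DD" lines for that movie.
    """
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            movie_id = int(line[:-1])  # new movie block starts here
            continue
        if movie_id is None:
            raise ValueError("rating line found before any movie header")
        customer, rating, day = line.split(",")
        yield movie_id, int(customer), int(rating), date.fromisoformat(day)

# The example lines from the slide.
sample = ["4:", "1065039,3,2005-09-06", "1544320,1,2004-06-28", "410199,5,2004-10-16"]
for record in parse_training_lines(sample):
    print(record)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly.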

Netflix Data

Data points in “probe.txt” look like this (have answers)

    Movie#:
    Customer#
    Customer#

    1:
    30878
    2647871
    1283744

Data in “qualifying.txt” looks like this (no answers)

    Movie#:
    Customer#,Date of Rating
    Customer#,Date of Rating

    1:
    1046323,2005-12-19
    1080030,2005-12-23
    1830096,2005-03-14

Hard Nut to Crack?

• Why is this problem such a difficult one?
  – Total ratings possible = 480,189 (users) * 17,770 (movies) = 8,532,958,530 (8.5 Billion)
  – Total available = 100 Million
  – The User x Movies matrix has 8.4 Billion entries missing
  – Consider the problem as a least squares problem
  – We can approach this problem by representing it as a system of equations in a matrix
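
The arithmetic behind these figures can be checked with a short Python sketch. The rating count uses the slide's round figure of 100 Million, which is an approximation rather than the exact size of the data set.

```python
num_users = 480_189
num_movies = 17_770
num_ratings = 100_000_000  # the slide's round figure ("nearly 100 Million")

total_cells = num_users * num_movies        # size of the full Users x Movies matrix
missing_cells = total_cells - num_ratings   # cells with no rating
density = num_ratings / total_cells         # fraction of cells that are filled

print(f"total cells:   {total_cells:,}")
print(f"missing cells: {missing_cells:,}")
print(f"density:       {density:.2%}")
```

Only a little over 1% of the matrix is filled in, which is why the matrix is treated as sparse.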

Technically tough as well

• Huge memory requirements
• High time requirements
• Because we are using only ~100 Million of the possible 8.5 Billion ratings, the predictors have some error in their weights (small training data)

Various Methods Employed for Netflix Prize Problem

• Nearest Neighbor methods
  – k-NN with variations

• Matrix factorization
  – Probabilistic Latent Semantic Analysis
  – Probabilistic Matrix Factorization
  – Expectation Maximization for Matrix Factorization
  – Singular Value Decomposition
  – Regularized Matrix Factorization

[2]

The Paper

• Title: “Improving regularized singular value decomposition for collaborative filtering” - Arkadiusz Paterek, Proceedings of KDD Cup and Workshop, 2007. [3]

• Uses the algorithm described by Simon Funk (Brandyn Webb) in [4].

• The algorithm revolves around the regularized Singular Value Decomposition (SVD) described in [4] and adds biases to it to improve performance.

• It also proposes some methods for post processing of the features extracted from the SVD.

• It compares the various combinations of methods suggested in the paper for the Netflix Data.

Singular Value Decomposition

• Consider the given problem as a matrix A of Users x Movies, or Movies x Users
• Shown are the two examples (“-” marks a missing rating)
• What do we do with this representation?

      M1  M2  M3  M4  M5  M6
U1     2   4   5   5   -   1
U2     -   -   3   5   1   5
U3     2   -   4   5   -   5

      U1  U2  U3
M1     2   -   2
M2     4   -   -
M3     5   3   4
M4     5   5   5
M5     -   1   -
M6     1   5   5

Singular Value Decomposition

• Method of Matrix Factorization

• Applicable to rectangular matrices and square alike

• Decomposes the matrix in to 3 component matrices whose product approximates the original matrix

• E.g.
• D
$d
[1] 13.218989  4.887761  1.538870

• U
$u
           [,1]       [,2]       [,3]
[1,] -0.5606779  0.8192382 -0.1203705
[2,] -0.5529369 -0.4786352 -0.6820331
[3,] -0.6163612 -0.3158436  0.7213472

• V
$v
            [,1]        [,2]        [,3]
[1,] -0.17808307  0.20598164  0.78106201
[2,] -0.16965834  0.67044040 -0.31288023
[3,] -0.52406769  0.28579770  0.15429276
[4,] -0.65435261  0.02532797 -0.26336364
[5,] -0.04182898 -0.09792523 -0.44320373
[6,] -0.48469427 -0.64511243  0.04951659

Can we recover original Matrix?

• Yes. (Well, almost!) Here is how.
• We multiply the 3 matrices: U * D * V^T
• We get A* ~= A.

              [,1]          [,2] [,3] [,4]          [,5] [,6]
[1,]  2.000000e+00  4.000000e+00    5    5 -1.557185e-17    1
[2,] -8.564655e-16 -1.221706e-15    3    5  1.000000e+00    5
[3,]  2.000000e+00 -1.231356e-15    4    5  1.757492e-16    5

• We can see this is an Approximation of the original matrix.
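
The decomposition and the reconstruction can be reproduced with NumPy. This is a sketch, not the code used for the slides (the output shown there looks like R); it assumes the missing ratings are stored as 0, which matches the near-zero cells in the recovered matrix.

```python
import numpy as np

# The Users x Movies example, with missing ratings stored as 0
# (an assumption based on the near-zero cells in the recovered matrix).
A = np.array([
    [2, 4, 5, 5, 0, 1],
    [0, 0, 3, 5, 1, 5],
    [2, 0, 4, 5, 0, 5],
], dtype=float)

# Thin SVD: U is 3x3, d holds the 3 singular values, Vt is V transposed (3x6).
U, d, Vt = np.linalg.svd(A, full_matrices=False)

# Multiply the three factors back together: U * D * V^T.
A_star = U @ np.diag(d) @ Vt

print(np.round(d, 6))
print(np.round(A_star, 6))
```

The signs of individual columns of U and V may differ between libraries; the product U * D * V^T is unaffected by that.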

How do we use SVD?

• We use the 2 matrices U and V to estimate the original matrix A.
• So what happened to the diagonal matrix D?
• We train our method on the given training set and learn by rolling the diagonal matrix into the two matrices.
• We compute U * V^T and obtain A’.
• Error = Aij – Aij’, for all i, j with a known rating.
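
One way to picture "rolling" D into the two matrices is to split it evenly between them. The sketch below does this explicitly with the square roots of the singular values; this is an illustrative assumption, since in the trained method the two factor matrices are learned directly and no explicit D is ever computed. The names `user_factors` and `movie_factors` are hypothetical.

```python
import numpy as np

A = np.array([
    [2, 4, 5, 5, 0, 1],
    [0, 0, 3, 5, 1, 5],
    [2, 0, 4, 5, 0, 5],
], dtype=float)  # missing ratings stored as 0 for this illustration

U, d, Vt = np.linalg.svd(A, full_matrices=False)

# Scale each feature (column) of U and V by sqrt of its singular value,
# so that D is absorbed half into each factor.
user_factors = U * np.sqrt(d)       # shape (3 users, 3 features)
movie_factors = Vt.T * np.sqrt(d)   # shape (6 movies, 3 features)

# Two-matrix product: no separate diagonal matrix is needed any more.
A_prime = user_factors @ movie_factors.T
error = A - A_prime

print(np.round(A_prime, 6))
print(np.abs(error).max())
```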

Algorithm variations covered in this paper

• Simple Predictors
• Regularized SVD
• Improved Regularized SVD (with Biases)
• Post processing SVD with KNN
• Post processing SVD with kernel ridge regression
• K-means
• Linear model for each item
• Decreasing the number of Parameters

The SVD Algorithm from paper [3,4,6]

• Initialize 2 arrays, movieFeatures (U) and customerFeatures (V), to a small value (0.1)

• For every feature# in features
    Until minimum iterations are done and RMSE is not improving more than minimum improvement
      For every data point in training set   // data point has custID and movieID
        prating = customerFeatures[feature#][custID] * movieFeatures[feature#][movieID]   // Predict the rating (contribution of this feature)
        error = originalrating - prating   // Find the error
        squareerrsum += error * error   // Sum the squared error for RMSE
        cf = customerFeatures[feature#][custID]   // Locally copy current feature value
        mf = movieFeatures[feature#][movieID]   // Locally copy current feature value

Contd.

Algorithm contd.

        customerFeatures[feature#][custID] += learningrate * (error * mf - regularizationfactor * cf)   // Roll the error into the feature
        movieFeatures[feature#][movieID] += learningrate * (error * cf - regularizationfactor * mf)   // Roll the error into the feature
      RMSE = sqrt(squareerrsum / total number of data points)   // Calculate RMSE

• Now we do the testing
• For every test point with custID and movieID
    For every feature# in features
      predictedrating += customerFeatures[feature#][custID] * movieFeatures[feature#][movieID]

• Caveat: clip the predicted rating to the range [1, 5], since it might go out of bounds

• The “regularization factor” is introduced by Brandyn Webb in [4] to reduce overfitting
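
The training and testing loops above can be sketched in Python as follows. This is a simplified illustration, not the author's or the paper's implementation. It makes three assumptions: it runs a fixed number of passes per feature instead of stopping on RMSE improvement, it fits each new feature to what the earlier features left unexplained, and its function names, hyperparameter values and toy ratings are made up.

```python
import numpy as np

def train_regularized_svd(ratings, num_users, num_movies, num_features=2,
                          learning_rate=0.01, regularization=0.02,
                          epochs_per_feature=200, init_value=0.1):
    """Train one feature at a time with stochastic gradient descent.

    ratings: list of (user_index, movie_index, rating) triples.
    Returns (user_features, movie_features), each shaped (num_features, n).
    """
    user_f = np.full((num_features, num_users), init_value)
    movie_f = np.full((num_features, num_movies), init_value)
    # Assumption: keep the prediction of already-trained features per data
    # point, so each new feature is fitted to the remaining error.
    baseline = np.zeros(len(ratings))
    for f in range(num_features):
        for _ in range(epochs_per_feature):  # fixed passes (simplification)
            for n, (u, m, r) in enumerate(ratings):
                cf, mf = user_f[f, u], movie_f[f, m]  # local copies
                error = r - (baseline[n] + cf * mf)
                user_f[f, u] += learning_rate * (error * mf - regularization * cf)
                movie_f[f, m] += learning_rate * (error * cf - regularization * mf)
        for n, (u, m, _) in enumerate(ratings):
            baseline[n] += user_f[f, u] * movie_f[f, m]
    return user_f, movie_f

def predict(user_f, movie_f, u, m):
    """Sum the per-feature products and clip to the valid rating range."""
    return float(np.clip(user_f[:, u] @ movie_f[:, m], 1.0, 5.0))

# Toy data: the known cells of the 3 x 6 example matrix (0-based indices).
ratings = [(0, 0, 2), (0, 1, 4), (0, 2, 5), (0, 3, 5), (0, 5, 1),
           (1, 2, 3), (1, 3, 5), (1, 4, 1), (1, 5, 5),
           (2, 0, 2), (2, 2, 4), (2, 3, 5), (2, 5, 5)]
user_f, movie_f = train_regularized_svd(ratings, num_users=3, num_movies=6)
print(predict(user_f, movie_f, 1, 0))  # a cell with no known rating
```

Only the known ratings are visited during training, so the missing cells never enter the error.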

Variation: Improved Regularized SVD

• That was regularized SVD
• Improved Regularized SVD with Biases
  – Predict the rating with 2 added biases, Ci per customer and Dj per movie
    • Rating = Ci + Dj + customerFeatures[feature#][i] * movieFeatures[feature#][j]
  – During training update the biases as
    • Ci += learningrate * (err - regularization * (Ci + Dj - global_mean))
    • Dj += learningrate * (err - regularization * (Ci + Dj - global_mean))
    • learningrate = .001, regularization = 0.05, global_mean = 3.6033
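
The two bias updates above can be written as one small Python function. This is a sketch using the constants from the slide; it assumes both biases are updated from their old values in the same step, which the slide does not state explicitly, and the function name `update_biases` is hypothetical.

```python
LEARNING_RATE = 0.001
REGULARIZATION = 0.05
GLOBAL_MEAN = 3.6033

def update_biases(c_i, d_j, err):
    """One training step for the customer bias c_i and the movie bias d_j.

    err is the prediction error (original rating minus predicted rating).
    Assumption: both biases are updated from their values before the step.
    """
    shrink = REGULARIZATION * (c_i + d_j - GLOBAL_MEAN)
    step = LEARNING_RATE * (err - shrink)
    return c_i + step, d_j + step

# Hypothetical starting biases and error, for illustration only.
c, d = update_biases(0.0, 0.0, 1.0)
print(c, d)
```

The regularization term pulls the sum Ci + Dj toward the global mean rating.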

Variation: KNN for Movies

• Post processing with KNN
  – On the Regularized SVD movieFeature matrix we run cosine similarity between 2 vectors

    similarity = (movieFeature[movieID1]^T * movieFeature[movieID2]) / (||movieFeature[movieID1]|| * ||movieFeature[movieID2]||)

  – Using this similarity measure we build a neighborhood of the 1 nearest movie and use the rating of that nearest movie as the predicted rating
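
A minimal Python sketch of the cosine similarity and the 1-nearest-movie lookup is given below. It assumes one row of features per movie; the function names and the tiny feature matrix are invented for illustration and are not taken from the paper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_movie(movie_features, movie_id):
    """Return (index, similarity) of the most similar other movie.

    movie_features has one row per movie (an assumed layout).
    """
    best_id, best_sim = None, -np.inf
    for other_id, other in enumerate(movie_features):
        if other_id == movie_id:
            continue  # a movie is not its own neighbor
        sim = cosine_similarity(movie_features[movie_id], other)
        if sim > best_sim:
            best_id, best_sim = other_id, sim
    return best_id, best_sim

# Hypothetical 2-feature vectors for three movies.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(nearest_movie(features, 0))
```

For the full set of 17,770 movies a vectorized version would be preferable; the loop is kept here for readability.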

Experimentation Strategy by author

• Select 1.5% - 15% of the probe.txt as hold-out set or test set.
• Train all models on the rest of the ratings
• All models predict the ratings
• Merge the results using linear regression on the test set
• Combining two methods for initial prediction & then performing linear regression
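
The merging step can be illustrated with an ordinary least-squares fit in NumPy. This is a sketch under assumptions: the intercept column, the function names `fit_blend` and `blend`, and the sample numbers are not taken from the paper.

```python
import numpy as np

def fit_blend(predictions, targets):
    """Least-squares weights (plus an intercept) for combining models.

    predictions: array of shape (n_ratings, n_models) on the hold-out set.
    targets: the true ratings for the same hold-out points.
    """
    X = np.column_stack([predictions, np.ones(len(targets))])
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return weights

def blend(predictions, weights):
    """Apply the fitted weights to a matrix of model predictions."""
    X = np.column_stack([predictions, np.ones(len(predictions))])
    return X @ weights

# Hypothetical hold-out predictions from two models and the true ratings.
preds = np.array([[3.0, 4.0], [2.0, 5.0], [4.0, 1.0], [5.0, 5.0]])
truth = 0.5 * preds[:, 0] + 0.5 * preds[:, 1]
w = fit_blend(preds, truth)
print(np.round(w, 6))
print(np.round(blend(preds, w), 6))
```

In this toy case the true ratings are exactly the average of the two models, so the fit recovers equal weights and a zero intercept.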

Results from the Paper [3]

Predictor          Test RMSE     Test RMSE with     Cumulative
                   with BASIC    BASIC and RSVD2    Test RMSE
BASIC              .9826         .9039              .9826
RSVD               .9024         .9018              .9094
RSVD2              .9039         .9039              .9018
KMEANS             .9410         .9029              .9010
SVD_KNN            .9525         .9013              .8988
SVD_KRR            .9006         .8959              .8933
LM                 .9506         .8995              .8902
NSVD1              .9312         .8986              .8887
NSVD2              .9590         .9032              .8879
SVD_KRR * NSVD1    -             -                  .8879
SVD_KRR * NSVD2    -             -                  .8877

Replicated from the paper as is

My Experiments

• I am trying out the regularized SVD method and Improved Regularized SVD method with qualifying.txt, probe.txt

• Also, going to implement first 3 steps of the author’s experimentation strategy (in my case I will predict with regularized SVD and Improved regularized SVD)

• If time permits, I might try the SVD KNN method
• I am also varying some parameters like learning rate, number of features, etc. to see their effect on the results.
• I shall have all my results posted on the web site soon

Questions?

References

1. Herlocker, J., Konstan, J., Terveen, L., and Riedl, J. Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information Systems 22 (2004), ACM Press, 5-53.
2. Gábor Takács, István Pilászy, Bottyán Németh, Domonkos Tikk. Scalable Collaborative Filtering Approaches for Large Recommender Systems. JMLR Volume 10: 623-656, 2009.
3. Arkadiusz Paterek. Improving regularized singular value decomposition for collaborative filtering. Proceedings of KDD Cup and Workshop, 2007.
4. http://sifter.org/~simon/journal/20061211.html
5. http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/
6. G. Gorrell and B. Webb. Generalized hebbian algorithm for incremental latent semantic analysis. Proceedings of Interspeech, 2006.

Thanks for your time!

Atul S. Kulkarni
kulka053@d.umn.edu