Big, Practical Recommendations with Alternating Least Squares
Sean Owen • Apache Mahout / Myrrix.com
WHERE’S BIG LEARNING?
- The Big Data stack so far: Storage, Database, Processing, Applications
- Next: the application layer, analytics and machine learning
- Like Apache Mahout, a common Big Data app today: clustering, recommenders, classifiers on Hadoop; free and open source, but not mature
- Where’s commercialized Big Learning?
A RECOMMENDER SHOULD …
- Answer in real time: ingest new data now; modify recommendations based on the newest data; no “cold start” for new data
- Scale horizontally, both for queries per second and for size of data set
- Accept diverse input: not just people and products; not just explicit ratings; clicks, views, buys; side information
- Be “pretty accurate”
NEED: 2-TIER ARCHITECTURE
- Real-time Serving Layer: quick results based on a precomputed model; incremental update; partitionable for scale
- Batch Computation Layer: builds the model; scales out (on Hadoop?); asynchronous, occasional, long-lived runs
A PRACTICAL ALGORITHM
MATRIX FACTORIZATION
- Factor the user-item matrix into a user-feature matrix times a feature-item matrix
- Well understood in ML, as: Principal Component Analysis, Latent Semantic Indexing
- Several algorithms, like: Singular Value Decomposition, Alternating Least Squares
BENEFITS
- Models intuition
- Factorization is batch parallelizable
- Reconstruction (recommendations) in low dimension is fast
- Allows projection of new data: a cold-start solution, an approximate-update solution
A PRACTICAL IMPLEMENTATION
ALTERNATING LEAST SQUARES
- Simple factorization: P ≈ X Yᵀ
- Approximate: X and Y are “skinny” (low-rank)
- Faster than the SVD; trivially parallel, iterative
- Dumber than the SVD: no singular values, no orthonormal basis
BENEFITS
- Parallelizable by row, so very Hadoop-friendly
- Iterative: an OK answer fast, refined as long as desired
- Lends itself to a “binary” input model, with ratings as regularization weights instead
- Sparseness and zeros are no longer a problem
ALS ALGORITHM 1
- Input: (user, item, strength) tuples; anything you can quantify is input
- Strength is positive
- Many tuples per user-item pair
- R is the sparse user-item interaction matrix; r_ij = total strength of interaction between user i and item j

Example sparse R (· marks an absent entry; the nonzero pattern matches P on the next slide):

1 4 3 · ·
· · 3 · ·
· 4 · 3 2
5 · 2 · 3
· · · 5 ·
2 4 · · ·
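The tuple-aggregation step above can be sketched in a few lines of NumPy; the tuple values and matrix sizes here are made up for illustration:

```python
import numpy as np

n_users, n_items = 3, 4  # hypothetical sizes

# (user, item, strength) tuples; many tuples per user-item pair are
# allowed, and strengths for the same pair sum into r_ij
tuples = [
    (0, 1, 2.0), (0, 1, 2.0),  # repeated interactions: r_01 = 4
    (1, 3, 3.0),
    (2, 0, 1.0), (2, 2, 5.0),
]

R = np.zeros((n_users, n_items))
for u, i, s in tuples:
    R[u, i] += s  # total strength of interaction between user u and item i
```

In practice R would be kept in a sparse format; a dense array is used here only to keep the sketch short.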
ALS ALGORITHM 2
- Follow “Collaborative Filtering for Implicit Feedback Datasets”: www2.research.att.com/~yifanhu/PUB/cf.pdf
- Construct a “binary” matrix P: 1 where R > 0, 0 where R = 0
- Factor P, not R; R returns in the regularization
- Still sparse; implicit 0s are fine

P:
1 1 1 0 0
0 0 1 0 0
0 1 0 1 1
1 0 1 0 1
0 0 0 1 0
1 1 0 0 0
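Constructing P from R is a one-liner; this sketch uses the sparse example matrix from the previous slide, with absent entries written as zeros:

```python
import numpy as np

# example strength matrix R from the previous slide (zeros = no interaction)
R = np.array([[1, 4, 3, 0, 0],
              [0, 0, 3, 0, 0],
              [0, 4, 0, 3, 2],
              [5, 0, 2, 0, 3],
              [0, 0, 0, 5, 0],
              [2, 4, 0, 0, 0]], dtype=float)

# "binary" matrix: 1 where R > 0, 0 where R = 0
P = (R > 0).astype(float)
```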
ALS ALGORITHM 3
- P is m x n; choose k << m, n
- Factor P as Q = X Yᵀ, with Q ≈ P
- X is m x k; Yᵀ is k x n
- Find the best approximation Q: minimize the L2 norm of the difference, ||P − Q||²
- Minimal squared error: “Least Squares”
- Recommendations are the largest values in Q
ALS ALGORITHM 4
- Optimizing X and Y simultaneously is non-convex: hard
- If X or Y is fixed, it becomes a system of linear equations: convex, easy
- Initialize Y with random values
- Solve for X; fix X, solve for Y; repeat (“Alternating”)
ALS ALGORITHM 5
- Define confidence weights c_ui = 1 + α r_ui
- Minimize:

Σ c_ui (p_ui − x_uᵀ y_i)² + λ (Σ ||x_u||² + Σ ||y_i||²)

- A simple least-squares regression objective, plus:
- Squared-error terms weighted by strength: the penalty for failing to reconstruct 1 at a “strong” association is higher
- A standard L2 regularization term
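The objective above can be evaluated directly in NumPy as a sanity check. This is a dense sketch suitable only for small matrices; the function name and default parameter values are illustrative, not from the slides:

```python
import numpy as np

def als_objective(P, R, X, Y, alpha=40.0, lam=2.0):
    """Weighted least-squares objective:
    sum_ui c_ui (p_ui - x_u . y_i)^2 + lam (||X||_F^2 + ||Y||_F^2),
    with confidence weights c_ui = 1 + alpha * r_ui."""
    C = 1.0 + alpha * R          # confidence weights, elementwise
    err = P - X @ Y.T            # reconstruction error
    return float((C * err ** 2).sum()
                 + lam * ((X ** 2).sum() + (Y ** 2).sum()))
```

A perfect reconstruction with λ = 0 drives the objective to zero, which makes the function easy to spot-check.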
ALS ALGORITHM 6
- With Y fixed, compute the optimal X; each row x_u is independent
- Define C_u as the diagonal matrix of c_u (user strength weights)

x_u = (Yᵀ C_u Y + λI)⁻¹ Yᵀ C_u p_u

- Compare to the simple least-squares regression solution (YᵀY)⁻¹ Yᵀ p_u
- Adds the Tikhonov / ridge-regression regularization term λI
- Attaches the c_u weights to Yᵀ
- See the paper for how Yᵀ C_u Y is computed efficiently; skipping the engineering!
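As a sketch of the half-iteration, each user row can be solved independently with a standard linear solver; the function name is hypothetical, and np.linalg.solve is used instead of an explicit inverse for numerical stability:

```python
import numpy as np

def solve_users(P, R, Y, alpha=40.0, lam=2.0):
    """With Y fixed, solve each user row
    x_u = (Y^T C_u Y + lam I)^-1 Y^T C_u p_u.
    Rows are independent, which is what makes this step trivially parallel."""
    k = Y.shape[1]
    X = np.empty((P.shape[0], k))
    for u in range(P.shape[0]):
        cu = 1.0 + alpha * R[u]                        # confidence weights c_u
        A = Y.T @ (cu[:, None] * Y) + lam * np.eye(k)  # Y^T C_u Y + lam I
        b = Y.T @ (cu * P[u])                          # Y^T C_u p_u
        X[u] = np.linalg.solve(A, b)
    return X
```

The paper’s efficiency trick, writing Yᵀ C_u Y as YᵀY + Yᵀ(C_u − I)Y so the YᵀY part is computed once for all users, is deliberately omitted here to keep the sketch readable.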
EXAMPLE FACTORIZATION
k = 3, λ = 2, α = 40, 10 iterations

P:
1 1 1 0 0
0 0 1 0 0
0 1 0 1 1
1 0 1 0 1
0 0 0 1 0
1 1 0 0 0

Q = X•Yᵀ ≈ P:
0.96 0.99 0.99 0.38 0.93
0.44 0.39 0.98 -0.11 0.39
0.70 0.99 0.42 0.98 0.98
1.00 1.04 0.99 0.44 0.98
0.11 0.51 -0.13 1.00 0.57
0.97 1.00 0.68 0.47 0.91
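A factorization like this one can be reproduced, up to random initialization, with a short dense ALS loop. This is an illustrative sketch, not Myrrix’s implementation, and it assumes R = P, i.e. all observed strengths equal 1:

```python
import numpy as np

P = np.array([[1, 1, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [1, 0, 1, 0, 1],
              [0, 0, 0, 1, 0],
              [1, 1, 0, 0, 0]], dtype=float)
R = P  # assumption: every observed interaction has strength 1

k, lam, alpha, iters = 3, 2.0, 40.0, 10

def solve(P, R, Y, alpha, lam):
    # per-row weighted least-squares solve: (Y^T C_u Y + lam I)^-1 Y^T C_u p_u
    k = Y.shape[1]
    out = np.empty((P.shape[0], k))
    for u in range(P.shape[0]):
        cu = 1.0 + alpha * R[u]
        A = Y.T @ (cu[:, None] * Y) + lam * np.eye(k)
        out[u] = np.linalg.solve(A, Y.T @ (cu * P[u]))
    return out

rng = np.random.default_rng(0)
Y = rng.standard_normal((P.shape[1], k)) * 0.1  # random initialization of Y
for _ in range(iters):
    X = solve(P, R, Y, alpha, lam)       # fix Y, solve for X
    Y = solve(P.T, R.T, X, alpha, lam)   # fix X, solve for Y
Q = X @ Y.T                              # reconstruction; recs = largest values
```

Exact entries of Q depend on the random seed, but the pattern matches the slide: values near 1 where P has a 1, smaller values elsewhere.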
FOLD-IN
- Need immediate, if approximate, updates for new data
- A new user u needs a new row Q_u = X_u Yᵀ, and we have P_u ≈ Q_u
- Compute X_u via a right inverse: X Yᵀ (Yᵀ)⁻¹ = Q (Yᵀ)⁻¹, so X = Q (Yᵀ)⁻¹
- What is (Yᵀ)⁻¹? Note (YᵀY)(YᵀY)⁻¹ = I, which gives Yᵀ’s right inverse: Yᵀ (Y (YᵀY)⁻¹) = I
- So X_u = Q_u Y (YᵀY)⁻¹, and X_u ≈ P_u Y (YᵀY)⁻¹
- Recommend as usual: Q_u = X_u Yᵀ
- For an existing user, instead add to the existing row X_u
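The fold-in projection is one matrix expression; the function name here is hypothetical, and it assumes Y has full column rank so that YᵀY is invertible:

```python
import numpy as np

def fold_in(p_new, Y):
    """Project a new user's interaction row into feature space without
    re-running the factorization: x_new = p_new Y (Y^T Y)^-1,
    using Y (Y^T Y)^-1 as a right inverse of Y^T."""
    return p_new @ Y @ np.linalg.inv(Y.T @ Y)

# recommend as usual: q_new = fold_in(p_new, Y) @ Y.T
```

If p_new happens to lie exactly in the row space of Yᵀ, the projection recovers the underlying feature row exactly; otherwise it is the least-squares approximation.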
THIS IS MYRRIX
- Soft-launched
- Serving Layer available as open source download
- Computation Layer available as beta
- Ready on Amazon EC2 / EMR
- Full launch Q4 2012
- myrrix.com
APPENDIX
EXAMPLES
STACKOVERFLOW TAGS
- Recommend tags to questions
- Tag questions automatically, improve tag coverage
- 3.5M questions x 30K tags
- 4.3 hours x 5 machines on Amazon EMR
- $3.03 ≈ $0.08 per 100,000 recs
WIKIPEDIA LINKS
- Recommend new linked articles from existing links
- Propose missing, related links
- 2.5M articles x 1.8M articles
- 28 hours x 2 PCs on Apache Hadoop 1.0.3