MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
[Slide 1]
Distributing Large-scale ML Algorithms: from GPUs to the Cloud
MMDS 2014, June 2014
Xavier Amatriain, Director - Algorithms Engineering, @xamat
[Slide 2]
Outline
■ Introduction
■ Emmy-winning Algorithms
■ Distributing ML Algorithms in Practice
■ An example: ANN over GPUs & AWS Cloud
[Slide 3]
What we were interested in:
■ High quality recommendations
Proxy question:
■ Accuracy in predicted rating
■ Improve by 10% = $1 million!
Data size:
■ 100M ratings (back then “almost massive”)
[Slide 4]
2006 → 2014
[Slide 5]
Netflix Scale
▪ > 44M members
▪ > 40 countries
▪ > 1000 device types
▪ > 5B hours in Q3 2013
▪ Plays: > 50M/day
▪ Searches: > 3M/day
▪ Ratings: > 5M/day
▪ Logs: 100B events/day
▪ 31.62% of peak US downstream traffic
[Slide 6]
Smart Models
■ Regression models (Logistic, Linear, Elastic nets)
■ GBDT/RF
■ SVD & other MF models
■ Factorization Machines
■ Restricted Boltzmann Machines
■ Markov Chains & other graphical models
■ Clustering (from k-means to modern non-parametric models)
■ Deep ANN
■ LDA
■ Association Rules
■ …
[Slide 7]
Netflix Algorithms
“Emmy Winning”
[Slide 8]
[Slide 9]
Rating Prediction
[Slide 10]
2007 Progress Prize
▪ Top 2 algorithms
  ▪ MF/SVD - Prize RMSE: 0.8914
  ▪ RBM - Prize RMSE: 0.8990
▪ Linear blend - Prize RMSE: 0.88 (a blend sketch follows below)
▪ Currently in use as part of Netflix’s rating prediction component
▪ Limitations
  ▪ Designed for 100M ratings; we now have 5B ratings
  ▪ Not adaptable as users add ratings
  ▪ Performance issues
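The prize-winning result combined the two models with a simple linear blend of their predictions. A minimal sketch of what such a blend looks like (the weights below are illustrative placeholders, not the actual prize coefficients; `linear_blend` and `fit_blend_weights` are hypothetical names):

```python
import numpy as np

def linear_blend(pred_svd, pred_rbm, w=(0.6, 0.4)):
    """Combine two rating predictors linearly.

    The weights are illustrative placeholders; in practice they are
    fit on a held-out probe set, e.g. by least squares as below.
    """
    return w[0] * np.asarray(pred_svd) + w[1] * np.asarray(pred_rbm)

def fit_blend_weights(pred_svd, pred_rbm, true_ratings):
    # Least-squares fit of the blend weights on probe-set predictions.
    X = np.column_stack([pred_svd, pred_rbm])
    w, *_ = np.linalg.lstsq(X, true_ratings, rcond=None)
    return w
```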
[Slide 11]
Ranking
[Slide 12]
Page composition
[Slide 13]
Similarity
[Slide 14]
Search Recommendations
[Slide 15]
Postplay
[Slide 16]
Gamification
[Slide 17]
Distributing ML algorithms in practice
[Slide 18]
1. Do I need all that data?
2. At what level should I distribute/parallelize?
3. What latency can I afford?
[Slide 19]
Do I need all that data?
[Slide 20]
Really?
Anand Rajaraman: Former Stanford Prof. & Senior VP at Walmart
[Slide 21]
Sometimes, it’s not about more data
[Slide 22]
[Banko and Brill, 2001]
Norvig: “Google does not have better Algorithms, only more Data”
Many features / low-bias models
[Slide 23]
Sometimes, it’s not about more data
[Slide 24]
At what level should I parallelize?
[Slide 25]
The three levels of Distribution/Parallelization
1. For each subset of the population (e.g. region)
2. For each combination of the hyperparameters
3. For each subset of the training data
Each level has different requirements
[Slide 26]
Level 1 Distribution
■ We may have subsets of the population for which we need to train an independently optimized model.
■ Training can be fully distributed, requiring no coordination or data communication (a minimal sketch follows below)
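Because level-1 jobs are fully independent, they map directly onto separate processes or machines. A minimal single-machine sketch, assuming a hypothetical `train_model` function and data already split by population subset:

```python
from concurrent.futures import ProcessPoolExecutor

def train_model(region, data):
    """Hypothetical stand-in for a full training run on one
    population subset; returns a fitted model for that region."""
    ...

def train_per_region(datasets):
    # datasets: dict of region name -> training data for that region.
    # No coordination or data communication is needed, so each job
    # could just as well run on a different machine or AWS region.
    with ProcessPoolExecutor() as pool:
        futures = {region: pool.submit(train_model, region, data)
                   for region, data in datasets.items()}
    return {region: f.result() for region, f in futures.items()}
```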
[Slide 27]
Level 2 Distribution
■ For a given subset of the population we need to find the “optimal” model
■ Train several models with different hyperparameter values
■ Worst-case: grid search
■ Can do much better than this (e.g. Bayesian Optimization with Gaussian Process priors)
■ This process *does* require coordination
  ■ Need to decide on the next “step”
  ■ Need to gather the final optimal result
■ Requires data distribution, not sharing (a distributed grid-search sketch follows below)
[Slide 28]
Level 3 Distribution
■ For each combination of hyperparameters, model training may still be expensive
■ Process requires coordination and data sharing/communication
■ Can distribute computation over machines splitting examples or parameters (e.g. ADMM)
■ Or parallelize on a single multicore machine (e.g. Hogwild, sketched below)
■ Or… use GPUs
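Hogwild-style training lets threads update a shared parameter vector without locks, tolerating the occasional race. A minimal sketch for least-squares SGD (real Hogwild implementations use native threads; Python’s GIL makes this an illustration of the idea, not a fast implementation):

```python
import numpy as np
from threading import Thread

def hogwild_sgd(X, y, n_threads=4, lr=0.01, epochs=5):
    """Lock-free parallel SGD for least-squares regression.

    X: (n, d) array of examples; y: (n,) array of targets.
    """
    w = np.zeros(X.shape[1])  # shared parameters, no synchronization

    def worker(rows):
        for _ in range(epochs):
            for i in rows:
                grad = (X[i] @ w - y[i]) * X[i]
                w[:] -= lr * grad  # racy in-place update, no lock

    # Split the example indices across threads (level-3 data split).
    splits = np.array_split(np.random.permutation(len(X)), n_threads)
    threads = [Thread(target=worker, args=(rows,)) for rows in splits]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```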
[Slide 29]
ANN Training over GPUs and AWS
[Slide 30]
ANN Training over GPUs and AWS
■ Level 1 distribution: machines over different AWS regions
■ Level 2 distribution: machines within the same AWS region
  ■ Use coordination tools
    ■ Spearmint or similar for parameter optimization
    ■ Condor, StarCluster, Mesos… for distributed cluster coordination
■ Level 3 parallelization: highly optimized parallel CUDA code on GPUs (a schematic sketch of how the levels nest follows below)
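The three levels compose naturally. A purely schematic sketch of the nesting; all helpers are hypothetical placeholders, not Netflix’s actual tooling, and the cluster-coordination and CUDA layers are elided:

```python
def train_ann_on_gpu(data, params):
    """Level 3 placeholder: parallel CUDA training on one GPU machine."""
    ...

def hyperparameter_search(data, optimizer):
    """Level 2 placeholder: a Spearmint-like loop within one AWS region."""
    while not optimizer.converged():
        params = optimizer.next_candidate()   # decide on the next "step"
        score = train_ann_on_gpu(data, params)
        optimizer.update(params, score)       # gather results
    return optimizer.best()

def train_all_regions(datasets, make_optimizer):
    """Level 1 placeholder: fully independent jobs, one per region."""
    return {region: hyperparameter_search(data, make_optimizer())
            for region, data in datasets.items()}
```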
[Slide 31]
What latency can I afford?
[Slide 32]
3 shades of latency
▪ Blueprint for multiple algorithm services
  ▪ Ranking
  ▪ Row selection
  ▪ Ratings
  ▪ Search
  ▪ …
▪ Multi-layered Machine Learning
[Slide 33]
Matrix Factorization Example
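A minimal sketch of the kind of SGD-based matrix factorization this example refers to (a generic illustration of the technique, not the code from the slides):

```python
import numpy as np

def factorize(ratings, k=20, lr=0.01, reg=0.05, epochs=20):
    """Minimal SGD matrix factorization for rating prediction.

    ratings: iterable of (user, item, rating) triples.
    Returns U, V with predicted rating(u, i) ~= U[u] @ V[i].
    """
    ratings = list(ratings)
    n_users = 1 + max(u for u, _, _ in ratings)
    n_items = 1 + max(i for _, i, _ in ratings)
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]              # prediction error
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V

# Tiny usage example on toy data:
U, V = factorize([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)], k=2)
print(U[1] @ V[1])  # predicted rating for user 1, item 1
```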