Recommender System Experiments with MyMediaLite
Or: Everything you always wanted to know about offline experiments* (*but were afraid to ask)
Zeno Gantner
Nokia Location & Commerce, Berlin
HERE Maps by Nokia … in Berlin
● ca. 800 people
● HERE Maps platform
  – mobile apps
    ● HERE Drive
    ● HERE Maps
    ● HERE Transit (public transport)
  – customers
    ● Yahoo Maps
    ● Bing Maps
    ● major car companies: BMW, VW, Toyota, ...
HERE Maps by Nokia … in Berlin
Maps Search Team
● #bbuzz regulars
● 3 of us contributed to Lucene 4.3.0 ;-)
http://2011.berlinbuzzwords.de/content/improving-search-ranking-through-ab-tests-case-study
http://2012.berlinbuzzwords.de/sessions/efficient-scoring-lucene
http://2012.berlinbuzzwords.de/sessions/introducing-cascalog-functional-data-processing-hadoop
http://2012.berlinbuzzwords.de/sessions/relevance-optimization-check-candidate-lists
https://issues.apache.org/jira/browse/LUCENE-4930
https://issues.apache.org/jira/browse/LUCENE-4571
(C) Paul L. Dineen; license: CC by; source http://www.flickr.com/photos/pauldineen/4529216647/sizes/o/in/photostream/
+ = ?
Data + Software/Algorithms = ???
(c) Joon Han, license: CC by-sa 3.0, source: http://en.wikipedia.org/wiki/File:Groundhog_day_tip_top_bistro.jpg
(c) Diliff; license CC by-3.0
Real-world deployments
Data mining competitions
Research
+ = ?
RecSys Experiments with MyMediaLite
1. Interaction Data
2. Baseline Methods
3. Apples and Oranges
4. Metrics
5. Hyperparameter Tuning
6. Reproducibility
Running Example: MyMediaLite
● RecSys toolkit and evaluation framework
● written in C#/Mono
● C#, Python, Ruby, F#
● 2 Java ports (RapidMiner plugin)
● regular releases (every 2-3 months) since 2010
● simple
● choice
● free
● documented
● tested
http://mymedialite.net/
http://github.com/zenogantner/MyMediaLite
Running Example: MyMediaLite
command-line tools
● rating_prediction
● item_recommendation
Find all examples here:
http://github.com/zenogantner/mml-eval-examples
1. Interaction Data
Explicit feedback
Not always there.
Implicit feedback
● views
● clicks
● purchases
Often positive-only.
1. Interaction Data
User ID Item ID Timestamp
196 242 881250949
186 302 891717742
22 377 878887116
244 51 880606923
... ... ...
item_recommendation --training-file=F1 --test-file=F2
IDs can be (almost) arbitrary strings.
Timestamp: optional. Alternative format: yyyy-mm-dd
Separator: whitespace, tab, comma, ::
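To make the file format concrete, here is a minimal Python sketch of a reader for such lines. It is not MyMediaLite's own parser: the function name `parse_interaction` is hypothetical, and it assumes that the third column, when present, is either a Unix timestamp in seconds or a yyyy-mm-dd date.

```python
import re
from datetime import datetime, timezone

# Separators named on the slide: whitespace, tab, comma, "::"
_SEPARATOR = re.compile(r"::|[,\s]+")
_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_interaction(line):
    """Return (user_id, item_id, timestamp or None) for one line of interaction data."""
    fields = [f for f in _SEPARATOR.split(line.strip()) if f]
    if len(fields) < 2:
        raise ValueError(f"expected at least a user ID and an item ID: {line!r}")
    user_id, item_id = fields[0], fields[1]  # IDs are kept as strings
    timestamp = None
    if len(fields) >= 3:
        raw = fields[2]
        if _DATE.fullmatch(raw):
            # Assumption: dates are interpreted as midnight UTC.
            timestamp = datetime.strptime(raw, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        else:
            # Assumption: anything else is a Unix timestamp in seconds.
            timestamp = datetime.fromtimestamp(int(raw), tz=timezone.utc)
    return user_id, item_id, timestamp


for example in ["196\t242\t881250949", "alice::song-17", "22,377,1997-11-07"]:
    print(parse_interaction(example))
```

Keeping IDs as strings mirrors the note that IDs can be (almost) arbitrary strings.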
Random Splits
item_recommendation … --test-ratio=0.25
Shuffle and split:
Simple, but:
● Does not take temporal trends into account.
● Does not use all data for testing.
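As an illustration of "shuffle and split", here is a short Python sketch. It is an assumption about what such a split looks like in general, not MyMediaLite's implementation; `random_split` and its parameters are hypothetical names.

```python
import random


def random_split(interactions, test_ratio=0.25, seed=1):
    """Shuffle the interactions and hold out a fraction of them for testing."""
    rng = random.Random(seed)  # local generator, so the split is repeatable
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    test, train = shuffled[:n_test], shuffled[n_test:]
    return train, test


train, test = random_split(range(100), test_ratio=0.25)
print(len(train), len(test))
```

Every interaction ends up in exactly one of the two sets, which is why only a quarter of the data is ever used for testing here.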
k-fold Cross-Validation
item_recommendation … --cross-validation=4
Shuffle and split:
● Uses each data point for evaluation.
● Does not take temporal trends into account.
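The following Python sketch shows one way k-fold cross-validation can be set up. It is illustrative only and not taken from MyMediaLite; `k_fold_splits` is a hypothetical name.

```python
import random


def k_fold_splits(interactions, k=4, seed=1):
    """Yield k (train, test) pairs; each interaction is in exactly one test set."""
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    # Deal the shuffled interactions into k folds of (almost) equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


for train, test in k_fold_splits(range(10), k=4):
    print(len(train), len(test))
```

Results are then usually averaged over the k folds.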
Chronological Splits
rating_prediction … --chronological-split=0.25
rating_prediction … --chronological-split=01/01/2002
Sort chronologically and split:
● Use the past to predict the “future”.
● Takes trends in the data into account.
  – time of day, day of week
  – season
  – trending products
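A chronological split can be sketched in Python as follows. The two modes loosely mirror the two command lines above (a ratio or a cut-off time), but the function `chronological_split` and its exact behavior at the boundary are assumptions, not MyMediaLite's code.

```python
def chronological_split(interactions, test_ratio=None, split_time=None):
    """Split (user, item, timestamp) tuples so that the test set is the most recent part.

    Give exactly one of test_ratio (fraction of newest interactions) or
    split_time (interactions at or after this time go to the test set).
    """
    if (test_ratio is None) == (split_time is None):
        raise ValueError("give exactly one of test_ratio and split_time")
    ordered = sorted(interactions, key=lambda x: x[2])
    if split_time is not None:
        train = [x for x in ordered if x[2] < split_time]
        test = [x for x in ordered if x[2] >= split_time]
    else:
        n_test = int(round(len(ordered) * test_ratio))
        cut = len(ordered) - n_test
        train, test = ordered[:cut], ordered[cut:]
    return train, test


data = [
    ("196", "242", 881250949),
    ("186", "302", 891717742),
    ("22", "377", 878887116),
    ("244", "51", 880606923),
]
train, test = chronological_split(data, test_ratio=0.25)
print(train)
print(test)
```

No training interaction is newer than any test interaction, so the model never sees the "future".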
(c) Serolillo, license: CC by 2.5
2. Baseline Methods
Why compare against baselines?
● Absolute numbers have no meaning.
  – … well, at least here.
  – Relative numbers may also have no meaning.
    ● … if you compare to the wrong things.
Good baselines:
● the strongest solution that is still simple
● the existing solution
● standard solutions
  – collaborative filtering: kNN, vanilla matrix factorization
2. Baseline Methods
item_recommendation … --recommender=Random
item_recommendation … --recommender=MostPopular
item_recommendation … --recommender=MostPopularByAttributes --item-attributes=ARTISTS
Item recommendation baselines:
● random
● popular items (by attribute/category)
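To show how little a popularity baseline needs, here is a Python sketch that ranks items by how often they occur in the training data. It illustrates the idea only and is not the code behind MyMediaLite's MostPopular recommender; `most_popular` and the tie-breaking rule are assumptions.

```python
from collections import Counter


def most_popular(train, n=10):
    """Return the n most frequent items in train.

    train: iterable of tuples whose second field is the item ID,
    e.g. (user, item) or (user, item, timestamp).
    """
    counts = Counter(row[1] for row in train)
    # Sort by descending count, then by item ID so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


train = [("u1", "a"), ("u2", "a"), ("u1", "b"), ("u3", "c"), ("u2", "c"), ("u4", "c")]
print(most_popular(train, n=2))
```

The same list is recommended to every user, which is exactly what makes it a useful reference point for personalized methods.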
(c) Michael Collins; license: CC by-2.0
3. Apples and Oranges
Always check if you measure on the same splits.
It happens quite often … e.g. this ICML 2013 paper:
3. Apples and Oranges
● On chronological splits of the Netflix dataset, matrix factorization (“SVD”) models usually do not get below 0.9 RMSE.
● Chronological splits can be much harder than random splits!
Lessons:
● Baselines are important: they can also help us to “debug” experiments.
● Do not compare between simple splits and chronological splits.
(c) Pastorius; license: CC by 3.0; source: http://commons.wikimedia.org/wiki/File:Plastic_tape_measure.jpg
4. Metrics
What is the right metric?
● Know your goal.
  – It always depends on what you want to achieve.
  – What to measure?
● Criticize your metrics.
  – They may ignore important aspects of your problem.
  – They are just approximations of user behavior.
● Eyeball the results.
  – Your metrics may fail to catch WTF results.
http://thenoisychannel.com/2012/08/20/wtf-k-measuring-ineffectiveness/
4. Metrics
item_recommendation ... --measures="prec@5,NDCG"
Precision at k
● number of “correct” items in the top k results, divided by k
● The choice of k is specific to your application.
● very simple
● easy to understand and explain
More ranking measures: NDCG, MAP, ERR
4. Metrics
Precision at k
recommendations    precision at 4
bad                0
good               1
bad                0
bad                0
bad                --
good               --
bad                --
precision at 4 = 1/4
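The example above can be reproduced with a few lines of Python. This is a plain sketch of the metric, not MyMediaLite's evaluation code; the item IDs `i1` to `i7` are made up to stand for the seven recommendations, with the second and sixth being the "good" ones.

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / k


recommendations = ["i1", "i2", "i3", "i4", "i5", "i6", "i7"]
good = {"i2", "i6"}
print(precision_at_k(recommendations, good, k=4))  # 0.25
```

The good item at position 6 does not count, because only the first four positions are inspected.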
5. Hyperparameter Tuning
item_recommendation … --recommender=WRMF --recommender-options="reg=0.01 alpha=2"
● Hyperparameters, e.g.
  – regularization to control overfitting
  – learning rate (for gradient descent methods)
  – stopping criterion
● You have to do it. Also for your baselines.
● Don't get too fancy.
  – Grid search will do it in most cases.
● More advanced:
  – Nelder-Mead/Simplex
  – Particle swarm optimization
5. Hyperparameter Tuning
rating_prediction … --search-hp
Grid search
● simple
● brute force
● embarrassingly parallel
“A practical guide to SVM classification”
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
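Grid search itself fits in a few lines of Python. The sketch below is generic and says nothing about how --search-hp works internally; `grid_search` is a hypothetical helper, and `toy_score` is a made-up stand-in for "train a model with these hyperparameters and return its evaluation score". The parameter names `reg` and `alpha` are borrowed from the WRMF example.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in grid and return (best_params, best_score).

    evaluate: function mapping a dict of hyperparameters to a score (higher is better).
    grid: dict mapping each hyperparameter name to the list of values to try.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # each call is independent of all the others
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Made-up score surface with its maximum at reg=0.01, alpha=2.
    return -((params["reg"] - 0.01) ** 2) - (params["alpha"] - 2) ** 2


grid = {"reg": [0.001, 0.01, 0.1], "alpha": [1, 2, 4]}
print(grid_search(toy_score, grid))
```

Because the evaluations do not depend on each other, they can be distributed over several processes or machines, which is what makes grid search embarrassingly parallel.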
6. Reproducible Experiments
item_recommendation … --random-seed=1
Random seed
● “random” splitting
● training initialization
● debugging
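The effect of a fixed seed can be demonstrated in plain Python. This sketch is independent of MyMediaLite; `shuffled_ids` and `init_factors` are hypothetical helpers that stand for "random" splitting and for the random initialization of a model.

```python
import random


def shuffled_ids(ids, seed):
    """Return the IDs in a pseudo-random order determined only by the seed."""
    rng = random.Random(seed)
    out = list(ids)
    rng.shuffle(out)
    return out


def init_factors(n_rows, n_factors, seed):
    """Small random starting values, e.g. for a matrix factorization model."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_rows)]


# Same seed, same result: the run can be repeated exactly.
print(shuffled_ids(range(10), seed=1) == shuffled_ids(range(10), seed=1))
print(init_factors(3, 2, seed=1) == init_factors(3, 2, seed=1))
```

With the seed fixed, a surprising result can be reproduced and debugged instead of disappearing on the next run.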
6. Reproducible Experiments
item_recommendation … --random-seed=1
Besides random seed:
● Put everything in version control.
  – data, software
  – scripts and configuration
● Use build tools like make for automation.
  – make knows when to re-run your data preprocessing steps.
http://bitaesthetics.com/posts/make-for-data-scientists.html
6. Reproducible Experiments
item_recommendation … --recommender=ExternalItemRecommender --recommender-options="prediction_file=FILE"
Re-use evaluation code.
Create predictions using external software. Use MyMediaLite for evaluation.
6. Reproducible Experiments
item_recommendation … --recommender=ExternalItemRecommender --recommender-options="prediction_file=FILE"
Why re-use evaluation code?
● Evaluation protocols (splitting + candidate selection + metrics) are not easy to get right.
● Ensures comparability.
  – more configuration kept fixed => less risk of accidental differences
● Laziness!
(c) by Caucas; license: CC by-nc-nd 2.0; source: http://www.flickr.com/photos/thecaucas/2597813380/sizes/o/
Summary
1. Split your data appropriately.
2. Do not compare apples and oranges.
3. Compare against simple and strong baselines.
4. Precision at k is a metric that is easy to explain.
5. Grid search is a simple method for hyperparameter tuning.
6. Make your experiments reproducible.
7. MyMediaLite can help you with some of these things ;-). Try it out!
http://github.com/zenogantner/mml-eval-examples
http://mymedialite.net/
http://github.com/zenogantner/MyMediaLite
(c) Michael Sauers; license CC by-nc-sa 2.0