7/31/2019 Talk BedCon Berlin 2012
1/42
How to build a recommender system based on Mahout and
Java EE
Berlin Expert Days 29. 30. March 2012 Manuel Blechschmidt CTO Apaxo GmbH
7/31/2019 Talk BedCon Berlin 2012
2/42
All the web content will be personalized in threeto five years.
Sheryl Sandberg COO Facebook 09.2010
7/31/2019 Talk BedCon Berlin 2012
3/42
Agenda
- What is personalization?
- What algorithms can be used?- Architecture of a recommender system
- How to bundle Mahout into an Java EE
application- Interface with the recommender
- Conclusion
7/31/2019 Talk BedCon Berlin 2012
4/42
What is personalization?
Personalization involves using technology toaccommodate the differences betweenindividuals. Once confined mainly to the Web, itis increasingly becoming a factor in education,health care (i.e. personalized medicine),television, and in both "business to business"and "business to consumer" settings.
Source:https://en.wikipedia.org/wiki/Personalization
7/31/2019 Talk BedCon Berlin 2012
5/42
Amazon.com
7/31/2019 Talk BedCon Berlin 2012
6/42
TripAdvisor.com
7/31/2019 Talk BedCon Berlin 2012
7/42
eBay
7/31/2019 Talk BedCon Berlin 2012
8/42
criteo.com - Retargeting
7/31/2019 Talk BedCon Berlin 2012
9/42
Zalando
7/31/2019 Talk BedCon Berlin 2012
10/42
Plista
7/31/2019 Talk BedCon Berlin 2012
11/42
YouTube
7/31/2019 Talk BedCon Berlin 2012
12/42
Naturideen.de (coming soon)
7/31/2019 Talk BedCon Berlin 2012
13/42
Recommender
This talk will concentrate on recommender technology based on collaborative filtering (cf)to personalize a web site
- a lot of research is going on
- cf has shown great success in movie and musicindustry
- recommenders can collect data silently and useit without manual maintenance
7/31/2019 Talk BedCon Berlin 2012
14/42
What is a recommender?Let U be a set of users of the recommendation system and I be the
set of items from which the users can choose. A recommender r is a function which produces for a user u
ia set of recommended
items R kwith k entries and a binary, transitive, antisymmetric and
total relation prefers_over ui
which can be used for sorting the
recommendations for the user. The recommender r is oftencalled a top-k recommender.
7/31/2019 Talk BedCon Berlin 2012
15/42
WARNING!
The next slides contain math which takes normally3 years to understand so don't be disappointedif you don't get it.
7/31/2019 Talk BedCon Berlin 2012
16/42
What should wolf and sheep eat?Ms. Baumgartner your sun is ignoringagain the food chain.
SON!
Source: http://static.nichtlustig.de/comics/full/081009.jpg
http://static.nichtlustig.de/comics/full/081009.jpghttp://static.nichtlustig.de/comics/full/081009.jpg7/31/2019 Talk BedCon Berlin 2012
17/42
Demo Data
Carrots Grass Pork Beef Corn FishRabbit 10 7 1 2 ? 1
Cow 7 10 ? ? ? ?
Dog ? 1 10 10 ? ?
Pig 5 6 4 ? 7 3Chicken 7 6 2 ? 10 ?
Pinguin 2 2 ? 2 2 10
Bear 2 ? 8 8 2 7
Lion ? ? 9 10 2 ?Tiger ? ? 8 ? ? 5
Antilope 6 10 1 1 ? ?
Wolf 1 ? ? 8 ? 3
Sheep ? 8 ? ? ? 2
7/31/2019 Talk BedCon Berlin 2012
18/42
Characteristics of Demo Data
Ratings from 1 10
Users: 12
Items: 6
Ratings: 43 (unusual normally 100,000 100,000,000)
Matrix filled: ~60% (unusual normally sparse around 0.5-2%)
Average Number of Ratings per User: ~3.58 Average Number of Ratings per Item: ~7.17
Average Rating: ~5.579
https://github.com/ManuelB/facebook-recommender-demo/tree/master/docs/BedConExamples.R
https://github.com/ManuelB/facebook-recommender-demo/tree/master/docs/BedConExamples.Rhttps://github.com/ManuelB/facebook-recommender-demo/tree/master/docs/BedConExamples.R7/31/2019 Talk BedCon Berlin 2012
19/42
7/31/2019 Talk BedCon Berlin 2012
20/42
7/31/2019 Talk BedCon Berlin 2012
21/42
YouTube Rating Distribution
http://techcrunch.com/2009/09/22/youtube-comes-to-a-5-star-realization-its-ratings-are-useless/
http://techcrunch.com/2009/09/22/youtube-comes-to-a-5-star-realization-its-ratings-are-useless/http://techcrunch.com/2009/09/22/youtube-comes-to-a-5-star-realization-its-ratings-are-useless/7/31/2019 Talk BedCon Berlin 2012
22/42
Model and Memory Approaches
- Item- or User- Based Collaborative Filtering
- Idea suggest what similair users did
- Matrix Factorization e.g Singular Value Decomposition- Try to create a profile based of factors for users and items
Main difference:
A model base approach tries to extract the underlying logic fromthe data.
7/31/2019 Talk BedCon Berlin 2012
23/42
User Based Approach
- Find similar or opposite animals like wolf
- Checkout what these other animals like- Recommend this to wolf
7/31/2019 Talk BedCon Berlin 2012
24/42
Find animals which voted for beef,fish and carrots too
Carrots Grass Pork Beef Corn Fish
Wolf 1 ? ? 8 ? 3
Pinguin 2 2 ? 2 2 10
Bear 2 ? 8 8 2 7
Rabbit 10 7 1 2 ? 1
Cow 7 10 ? ? ? ?
Dog ? 1 10 10 ? ?
Pig 5 6 4 ? 7 3
Chicken 7 6 2 ? 10 ?Lion ? ? 9 10 2 ?
Tiger ? ? 8 ? ? 5
Antilope 6 10 1 1 ? ?
Sheep ? 8 ? ? ? ?
7/31/2019 Talk BedCon Berlin 2012
25/42
Pearson Correlation
- 1 = very similar
- (-1) = complete opposite votings
- similarity between wolf and pinguin: -0.2401922
- cor(c(8,3,1),c(2,10,2))
- similarity between wolf and bear: 0.8196562
- cor(c(8,3,1),c(8,7,2))- similarity between wolf and rabbit: -0.6465846
- cor(c(8,3,1),c(2,1,10))
7/31/2019 Talk BedCon Berlin 2012
26/42
Weighted sum of the ratings
Pork (10):
(0.8196 * 8 + (-0.6466) * 1) / (0.8196 + (-0.6465)
= 34 ~ 10
Grass (5.65):
(2*(-0.2401)+7*(-0.6465)) / ((-0.2401) + (-0.6465))= 5,65
Corn (2,0):
2*(-0.2401)+2*(0.8196)) / (-0.2401+0.8196) = 2,0
7/31/2019 Talk BedCon Berlin 2012
27/42
Predicted ratings
- Wolf should eat: Pork Rating: 10.0
- Wolf should eat: Grass Rating: 5.645701
- Wolf should eat: Corn Rating: 2.0
7/31/2019 Talk BedCon Berlin 2012
28/42
SVD
http://public.lanl.gov/mewall/kluwer2002.html
http://public.lanl.gov/mewall/kluwer2002.htmlhttp://public.lanl.gov/mewall/kluwer2002.html7/31/2019 Talk BedCon Berlin 2012
29/42
Factorized Matrixes
7/31/2019 Talk BedCon Berlin 2012
30/42
Predicted Matrix (k = 2)
7/31/2019 Talk BedCon Berlin 2012
31/42
Predicted Ratings
- Sheep should eat: Carrots Rating: 6.01
- Sheep should eat: Corn Rating: 5.83
7/31/2019 Talk BedCon Berlin 2012
32/42
What other algorithms can be used?
Similarity Measures for Item or User based:
- LogLikelihood Similarity
- Cosine Similarity
- Pearson Similarity
- etc.
Estimating algorithms for SVD:- ALSWRFactorizer
- ExpectationMaximizationSVDFactorizer
7/31/2019 Talk BedCon Berlin 2012
33/42
Let's built it
7/31/2019 Talk BedCon Berlin 2012
34/42
Mahout - Taste
Taste is a flexible, fast collaborative filteringengine for Java.
It was primarly written by Sean Owen and is partof the Apache Mahout library.
Another important author is: Sebastian Schelter
It offers all the algorithms introduced here and isOpen Source
http://mahout.apache.org/taste.html
http://mahout.apache.org/taste.htmlhttp://mahout.apache.org/taste.html7/31/2019 Talk BedCon Berlin 2012
35/42
Architecture of the recommender
7/31/2019 Talk BedCon Berlin 2012
36/42
Packaging
7/31/2019 Talk BedCon Berlin 2012
37/42
Eclipse Project
7/31/2019 Talk BedCon Berlin 2012
38/42
Maven pom.xml
7/31/2019 Talk BedCon Berlin 2012
39/42
7/31/2019 Talk BedCon Berlin 2012
40/42
Demo and live Debugging
7/31/2019 Talk BedCon Berlin 2012
41/42
Conclusion
Recommendation is a lot of math
You shouldn't implement the algorithms again
There are a lot of unsanswered questions
- Scalibility, Performance, Usability
You can gain a lot from good personalization
7/31/2019 Talk BedCon Berlin 2012
42/42
More sources
http://www.apaxo.de /
http://mahout.apache.org /
http://research.yahoo.com /
http://www.grouplens.org/
http://recsys.acm.org/
https://github.com/ManuelB/facebook-recommender-demo/
http://docs.oracle.com/javaee/6/tutorial/doc/
http://www.apaxo.de/http://mahout.apache.org/http://research.yahoo.com/http://www.grouplens.org/http://recsys.acm.org/https://github.com/ManuelB/facebook-recommender-demo/http://docs.oracle.com/javaee/6/tutorial/doc/http://docs.oracle.com/javaee/6/tutorial/doc/https://github.com/ManuelB/facebook-recommender-demo/http://recsys.acm.org/http://www.grouplens.org/http://research.yahoo.com/http://mahout.apache.org/http://www.apaxo.de/Top Related