An Introduction to Recommendation Systems
Raj Bandyopadhyay, Damballa
Thursday, December 13, 12
The long tail
• Brick-and-mortar stores: cater to the aggregate population; limited product diversity
• Online retailers: cater to eclectic/niche preferences, including rare and less popular items
• However, users need to be able to discover those rare and niche items!
• Recommendation systems provide a way
The long tail
[Figure: the long-tail demand curve, with traditional retailers covering the popular head and online retailers covering the long tail]
Popularized by Chris Anderson, editor of Wired.
Problem Statement
• For a system with users and items:
• Predict how a user U will rate an item I
• user ‘likes’ a story on Facebook
• user gives a movie 4/5 stars on Netflix
• Find items likely to be rated highly by a user
• Examples of items: movies (Netflix), products (Amazon), stories (Facebook)
Utility (ratings) matrix
Rows are users, columns are movies; blank cells mean the user has not rated that movie.

HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6
A: 4 5 5 1
B: 2 3 2 3
C: 3 5 4 4
D: 3 2 2 4 5 4

HP: Harry Potter, TW: Twilight, SW: Star Wars
A (bipartite) graph view
[Figure: bipartite graph connecting users A-D to movies HP1-HP3, TW1-TW3, SW4-SW6; the weights of the edges represent the ratings]
Recommendation systems
• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items similar intrinsically to previous items highly rated by user
• Latent Factor Models: Use techniques from Linear Algebra to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches
Collaborative Filtering
• For user U, find a neighborhood of users {N} who have preferences similar to U
• Predict U’s ratings based on the ratings of other users in {N}
• How do we find users similar to U?
• We use a similarity metric
Intuition: Similarity metric
[Figure: bipartite graph of users A, C, D and the movies they rated]
C and D are more similar than A and D
Intuition: Similarity metric
• Users C and D should be assigned a higher similarity score if and only if:
• C and D watch many of the same movies
• C and D rate the same movies similarly
• Several similarity metrics meet these conditions. Let’s look at one...
Cosine similarity
• Treat each user as a vector of ratings
• Cosine similarity of u and v: u·v / (|u| |v|)
• Always in the range [0,1] when ratings are non-negative
• Higher value => greater similarity
• Equal to the cosine of the angle θ between u and v
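The formula above can be sketched in a few lines of NumPy. The rating vectors below are illustrative only: the slide's matrix does not say which column each rating sits in, so these cell positions are assumptions.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity u.v / (|u| |v|); returns 0.0 for an all-zero vector."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v) / denom if denom else 0.0

# Illustrative rating vectors over the 9 movies (0 = unrated)
b = np.array([2.0, 0, 0, 3, 2, 0, 3, 0, 0])
c = np.array([3.0, 0, 0, 5, 4, 0, 0, 4, 0])
sim_bc = cosine_sim(b, c)
```

Treating unrated cells as 0 means only co-rated movies contribute to the dot product, which is exactly why the metric rewards both overlap and agreement.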
Cosine similarity

HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6
A: 4 5 5 1
B: 2 3 2 3
C: 3 5 4 4
D: 3 2 2 4 5 4

      A     B     C     D
A   1.00  0.26  0.26  0.04
B   0.26  1.00  0.70  0.57
C   0.26  0.70  1.00  0.44
D   0.04  0.57  0.44  1.00

Takes into account:
i) the number of items in common
ii) the similarity of the ratings
C is more similar to D (0.44) than to A (0.26)
Predicting ratings
• How can we predict the rating given by a user u to an item i?
• Use the similarity metric to find the similarity of other users to u
• Find the top K most similar neighbors of u who have rated item i
• Take a similarity-weighted average of those neighbors’ ratings for i
Predicting ratings

HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6
A: 4 5 5 1
B: 2 3 2 3
C: 3 5 4 4 ???
D: 3 2 2 4 5 4

      A     B     C     D
A   1.00  0.26  0.26  0.04
B   0.26  1.00  0.70  0.57
C   0.26  0.70  1.00  0.44
D   0.04  0.57  0.44  1.00

Example: How would user C rate movie SW4?
Choosing K = 2, the top-K neighbors of C who have rated SW4 are B and D:
(0.70 × 3 + 0.44 × 4) / (0.70 + 0.44) ≈ 3.39
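The worked example can be sketched as a small function. The similarities come from the slide's table; A's SW4 rating of 1 is an assumption included only to show the top-K cutoff in action (the slide's answer uses just B and D).

```python
def predict_rating(user, item, ratings, sim, k=2):
    """Predict `user`'s rating of `item` as a similarity-weighted
    average over the top-k most similar users who rated `item`."""
    neighbors = [(sim[user][v], r[item])
                 for v, r in ratings.items()
                 if v != user and item in r]
    top = sorted(neighbors, reverse=True)[:k]   # keep the k highest similarities
    num = sum(s * r for s, r in top)
    den = sum(s for s, _ in top)
    return num / den if den else None

# Ratings of SW4 (A's rating of 1 is an assumption) and C's similarities
ratings = {"A": {"SW4": 1}, "B": {"SW4": 3}, "C": {}, "D": {"SW4": 4}}
sim = {"C": {"A": 0.26, "B": 0.70, "D": 0.44}}
pred = predict_rating("C", "SW4", ratings, sim, k=2)   # ≈ 3.39
```

With K = 2, A (similarity 0.26) is cut off and only B and D contribute, reproducing the slide's arithmetic.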
Item-based similarity
• Current approach: user-based similarity
• “People who like this also like...”
• Could we use the same approach for items?
• “You may also like...”
• Yes! Item-based similarity: dual of user-based
• Items rated similarly by the same user get higher similarity score
• Predict ratings for a new item based on ratings by a user for similar items
Item-based similarity

Treat the columns of the utility matrix as vectors and calculate similarity between them:

HP1 TW1 SW4
A: 4 1
B: 2 3 3
C: 3 5
D: 3 4

      HP1   TW1   SW4
HP1  1.00  0.70  0.22
TW1  0.70  1.00  0.63
SW4  0.22  0.63  1.00

Example: How would user C rate movie SW4?
As in the user-based method, choose K = 2. The top 2 movies similar to SW4 that C has rated are TW1 and HP1:
(0.63 × 5 + 0.22 × 3) / (0.63 + 0.22) ≈ 4.48
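The same weighted-average machinery works on the transpose of the matrix: weight the user's own ratings by item-item similarity. The similarities and ratings below are the ones given on the slide.

```python
def predict_item_based(user_ratings, target_item, item_sim, k=2):
    """Similarity-weighted average of the user's ratings on the k
    items most similar to the target item."""
    scored = [(item_sim[target_item][i], r) for i, r in user_ratings.items()]
    top = sorted(scored, reverse=True)[:k]
    num = sum(s * r for s, r in top)
    den = sum(s for s, _ in top)
    return num / den if den else None

# Item-item similarities to SW4 and user C's ratings, from the slide
item_sim = {"SW4": {"HP1": 0.22, "TW1": 0.63}}
c_ratings = {"HP1": 3, "TW1": 5}
pred = predict_item_based(c_ratings, "SW4", item_sim)   # ≈ 4.48
```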
User or Item based similarity?
• Theoretically, should be similar
• In practice, item-based works better
• Items are less ‘complicated’ than people
• Easier to categorize items
• Better performance: needs fewer neighbors to get an accurate prediction
Example: a CD similar to Mozart, Beethoven and Bach CDs is probably classical; users rarely fall into such clean categories.
A user typically rates only a small fraction of items, so there are fewer items to iterate over.
Collaborative Filtering: Problems
• Cold Start and First Rater: new item/user
• Fraud/Attacks: fake items/ratings
• Sparsity of utility matrix
• Implicit features: what do users actually like?
• How can we address these problems?
• Use recommendations based on intrinsic properties of users/items (content-based models)
• Use linear algebra to extract useful features and reduce sparsity (latent factor models)
Recommendation systems
• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items similar intrinsically to previous items highly rated by user
• Latent Factor Models: Use techniques from Linear Algebra to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches
Content-based systems
• Create a profile vector for each item/user
• Use machine learning (clustering and classification) to find similar items/users
• Profile vectors composed of features designed to reflect intrinsic properties
• What features?
Profile features
• Tags/categories for items
• Demographic features for users
• Examples:
• Gender, race, income
• Movie genres: Sci-fi, Fantasy, Comedy
• Does movie X cast actor Y?
Examples: features

       Fantasy  Sci-Fi  Magical powers  Supernatural creatures  Spaceships  Alan Rickman  Harrison Ford
HP1       1       0          1                  1                   0            1              0
TW1       1       0          0                  1                   0            0              0
SW4       0       1          1                  0                   1            0              1

• How do we use these feature vectors? Here’s one way:
• Calculate similarity between movie profile vectors
• Similar to item-based CF, use a weighted average
• Many other algorithms can be applied
• Problem with content-based systems: features must be chosen and added manually
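The feature table above can be used directly: each row is a profile vector, and the same cosine similarity from the collaborative-filtering section applies. This is just one way to use the profiles, as the slide notes.

```python
import numpy as np

# Binary profile vectors, one column per feature, from the table above
# (Fantasy, Sci-Fi, Magical powers, Supernatural creatures,
#  Spaceships, Alan Rickman, Harrison Ford)
profiles = {
    "HP1": np.array([1, 0, 1, 1, 0, 1, 0], float),
    "TW1": np.array([1, 0, 0, 1, 0, 0, 0], float),
    "SW4": np.array([0, 1, 1, 0, 1, 0, 1], float),
}

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Rank the other movies by intrinsic similarity to HP1
sims = {m: cosine(profiles["HP1"], p) for m, p in profiles.items() if m != "HP1"}
```

On these profiles TW1 comes out far closer to HP1 than SW4 does, since they share the Fantasy and Supernatural-creatures features.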
So far...
• General approach in recommenders:
• Find similar users/items
• Use a weighted average of neighbors’ ratings to predict unknown rating
• Collaborative filtering: behavioral similarity
• Content-based systems: intrinsic similarity
Recommendation systems
• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items similar (intrinsically) to previous items highly rated by user
• Latent Factor Models: Use techniques from Linear Algebra to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches
Latent Factor Models
• Can we make the utility matrix denser?
• Can we gain some insight into what user preferences actually mean?
• Linear algebra techniques can uncover these latent (hidden) factors
• SVD: singular value decomposition
How does SVD help?
• Find a low-dimensional approximation of the ratings matrix
• Use the low-dimensional vectors to calculate similarity

HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6
A: 4 5 5 1
B: 2 3 2 3
C: 3 5 4 4
D: 3 2 2 4 5 4

SVD reduces this to:

      Dim 1   Dim 2
A      0.90   -0.80
B      0.10    0.30
C      0.24    0.85
D     -0.89    0.40

We have reduced each user from a sparse 9-D vector to a 2-D vector.
How can we interpret the dimensions here? Perhaps as “user tastes” or “movie genres”.
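A minimal sketch of this reduction with NumPy. Unrated cells are filled with 0, one common if crude choice, and the column placement of the slide's ratings is an assumption here, so the resulting 2-D coordinates are illustrative rather than the slide's exact numbers.

```python
import numpy as np

# Utility matrix with unrated cells as 0 (cell positions assumed)
R = np.array([
    [4, 5, 5, 0, 0, 0, 1, 0, 0],
    [2, 0, 0, 3, 2, 0, 3, 0, 0],
    [3, 0, 0, 5, 4, 0, 0, 4, 0],
    [3, 2, 2, 0, 0, 0, 4, 5, 4],
], float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
users_2d = U[:, :k] * s[:k]     # each user compressed to a 2-D vector
R_k = users_2d @ Vt[:k, :]      # best rank-2 approximation of R
```

Similarities can then be computed between the rows of `users_2d` instead of the sparse 9-D rows of `R`.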
Interpreting the SVD

      X = Dim 1   Y = Dim 2
A        0.90      -0.80
B        0.10       0.30
C        0.24       0.85
D       -0.89       0.40

HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6
A: 4 5 5 1
B: 2 3 2 3
C: 3 5 4 4
D: 3 2 2 4 5 4

[Figure: users A-D plotted on the two latent dimensions; the X axis runs from Sci-Fi to Fantasy, the Y axis from Male to Female]

In this case we interpret the SVD approximation as showing:
Dim 1: Sci-Fi to Fantasy
Dim 2: Male to Female
Note: C is closer to D than to A
Recommendation systems
• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items similar (intrinsically) to previous items highly rated by user
• Latent Factor Models: Use mathematical techniques to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches
• Let’s look at a case study
Case Study: Netflix contest

Hybrid system with some important innovations
[Figure: outputs of CF, SVD and other models (~500 features) are blended by decision trees (DT) into the final ratings]
Winner: BellKor’s Pragmatic Chaos
Bellkor’s innovations• Incorporate user biases in model
• Bias parameters for each user & movie
• Ratings are deviations from user/movie-specific bias
• Incorporate temporal shifts in model
• User ratings change based on time
• Algorithms parametrized by time
• Focus on good ways to blend algorithms
• GBDT: Gradient boosted decision trees
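The bias idea can be sketched as a baseline predictor: predicted rating = global mean + user deviation + item deviation. The triples below are toy data, and the biases are computed by simple averaging rather than the regularized least-squares fit used in the actual BellKor solution.

```python
# Toy (user, item, rating) triples
data = [("A", "HP1", 4), ("B", "HP1", 2), ("C", "HP1", 3),
        ("B", "TW1", 3), ("C", "TW1", 5), ("D", "SW4", 4)]

mu = sum(r for _, _, r in data) / len(data)   # global mean rating

def bias(idx, key):
    """Deviation of a user's (idx=0) or movie's (idx=1) mean rating from mu."""
    vals = [t[2] for t in data if t[idx] == key]
    return sum(vals) / len(vals) - mu if vals else 0.0

def baseline(user, item):
    """Baseline predictor: r_hat = mu + b_user + b_item."""
    return mu + bias(0, user) + bias(1, item)
```

A CF or SVD model then only has to predict the residual (rating minus baseline), which is what makes the bias terms so effective.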
Summary
• We have seen several algorithms:
• CF, content-based, latent-factor (SVD)
• Hybrid: combinations of the above
• So which algorithm should you use?
• Depends on your use case and business
Summary
• Are recommendations an absolutely core, critical part of your business? (e.g. Netflix, Amazon)
• You should be using a hybrid system
• Spend a lot of effort tuning based on domain-specific knowledge
• Find innovative ways to combine different kinds of features
Summary
• Is your data set extremely sparse?
• Latent factor (SVD) based models may extract the “essence” of your data
• YMMV on whether they provide usable insights into customer behavior
• Otherwise...
• Collaborative filtering models: fast and easy to implement and test
The road ahead
• Users can be fickle about providing ratings
• (How often do you rate stuff?)
• Collect better data without annoying users
• Games to collect ratings information
• “gamification”, work of Luis von Ahn
• NLP to parse reviews & other sources
• sentiment analysis to infer ratings
The road ahead
• Incorporate social network information
• Mine your Twitter feed for content
• Use your social graph to identify similar users/friends
• Recommend across genres/categories
Further reading
• Machine Learning in Action: Harrington, Manning Publications
• Recommender Systems: Melville and Sindhwani, IBM Research
• Mining of Massive Datasets (Ch. 9): Rajaraman, Ullman and Leskovec, Stanford University
• The BellKor Solution to the Netflix Grand Prize: Bell and Koren, AT&T Labs