An Introduction to Recommendation Systems
Raj Bandyopadhyay, Damballa
Thursday, December 13, 12
The long tail
• Brick-and-mortar stores: cater to the aggregate population; limited product diversity
• Online retailers: cater to eclectic/niche preferences, including rare and less popular items
• However, users need to be able to discover those rare and niche items!
• Recommendation systems provide a way
The long tail
[Figure: the long-tail demand curve, with traditional retailers covering the popular head and online retailers covering the long tail]
Popularized by Chris Anderson, editor of Wired.
Problem Statement
• For a system with users and items:
• Predict how a user U will rate an item I
• user ‘likes’ a story on Facebook
• user gives a movie 4/5 stars on Netflix
• Find items likely to be rated highly by a user
• Examples of items: movies (Netflix), products (Amazon), stories (Facebook)
Utility (ratings) matrix
Rows are users, columns are movies; blank cells mean the user has not rated that movie.

HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6
A: 4 5 5 1
B: 2 3 2 3
C: 3 5 4 4
D: 3 2 2 4 5 4

HP: Harry Potter, TW: Twilight, SW: Star Wars
A (bipartite) graph view
[Figure: bipartite graph connecting users A-D to movies HP1-HP3, TW1-TW3, SW4-SW6; the weights of the edges represent the ratings]
Recommendation systems
• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items similar intrinsically to previous items highly rated by user
• Latent Factor Models: Use techniques from Linear Algebra to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches
Collaborative Filtering
• For user U, find a neighborhood of users {N} who have preferences similar to U
• Predict U’s ratings based on the ratings of other users in {N}
• How do we find users similar to U?
• We use a similarity metric
Intuition: Similarity metric
[Figure: bipartite graph of users A, C, D and the movies they rated]
C and D are more similar than A and D
Intuition: Similarity metric
• Users C and D should be assigned a higher similarity score if and only if:
• C and D watch many of the same movies
• C and D rate the same movies similarly
• Several similarity metrics meet these conditions. Let’s look at one...
Cosine similarity
• Treat each user as a vector of ratings
• Cosine similarity of u and v: u·v / (|u| |v|)
• Always in the range [0,1] when ratings are non-negative
• Higher value => greater similarity
• Equal to the cosine of the angle θ between u and v
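The formula above can be sketched in a few lines of NumPy. The rating vectors below are illustrative only: the slide's matrix does not say which column each rating sits in, so these cell positions are assumptions.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity u.v / (|u| |v|); returns 0.0 for an all-zero vector."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v) / denom if denom else 0.0

# Illustrative rating vectors over the 9 movies (0 = unrated)
b = np.array([2.0, 0, 0, 3, 2, 0, 3, 0, 0])
c = np.array([3.0, 0, 0, 5, 4, 0, 0, 4, 0])
sim_bc = cosine_sim(b, c)
```

Treating unrated cells as 0 means only co-rated movies contribute to the dot product, which is exactly why the metric rewards both overlap and agreement.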
Cosine similarity

HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6
A: 4 5 5 1
B: 2 3 2 3
C: 3 5 4 4
D: 3 2 2 4 5 4

      A     B     C     D
A   1.00  0.26  0.26  0.04
B   0.26  1.00  0.70  0.57
C   0.26  0.70  1.00  0.44
D   0.04  0.57  0.44  1.00

Takes into account:
i) the number of items in common
ii) the similarity of the ratings
C is more similar to D (0.44) than to A (0.26)
Predicting ratings
• How can we predict the rating given by a user u to an item i?
• Use the similarity metric to find the similarity of other users to u
• Find the top K most similar neighbors of u who have rated item i
• Take a similarity-weighted average of those neighbors’ ratings for i
Predicting ratings

HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6
A: 4 5 5 1
B: 2 3 2 3
C: 3 5 4 4 ???
D: 3 2 2 4 5 4

      A     B     C     D
A   1.00  0.26  0.26  0.04
B   0.26  1.00  0.70  0.57
C   0.26  0.70  1.00  0.44
D   0.04  0.57  0.44  1.00

Example: How would user C rate movie SW4?
Choosing K = 2, the top-K neighbors of C who have rated SW4 are B and D:
(0.70 × 3 + 0.44 × 4) / (0.70 + 0.44) ≈ 3.39
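The worked example can be sketched as a small function. The similarities come from the slide's table; A's SW4 rating of 1 is an assumption included only to show the top-K cutoff in action (the slide's answer uses just B and D).

```python
def predict_rating(user, item, ratings, sim, k=2):
    """Predict `user`'s rating of `item` as a similarity-weighted
    average over the top-k most similar users who rated `item`."""
    neighbors = [(sim[user][v], r[item])
                 for v, r in ratings.items()
                 if v != user and item in r]
    top = sorted(neighbors, reverse=True)[:k]   # keep the k highest similarities
    num = sum(s * r for s, r in top)
    den = sum(s for s, _ in top)
    return num / den if den else None

# Ratings of SW4 (A's rating of 1 is an assumption) and C's similarities
ratings = {"A": {"SW4": 1}, "B": {"SW4": 3}, "C": {}, "D": {"SW4": 4}}
sim = {"C": {"A": 0.26, "B": 0.70, "D": 0.44}}
pred = predict_rating("C", "SW4", ratings, sim, k=2)   # ≈ 3.39
```

With K = 2, A (similarity 0.26) is cut off and only B and D contribute, reproducing the slide's arithmetic.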
Item-based similarity
• Current approach: user-based similarity
• “People who like this also like...”
• Could we use the same approach for items?
• “You may also like...”
• Yes! Item-based similarity: dual of user-based
• Items rated similarly by the same user get higher similarity score
• Predict ratings for a new item based on ratings by a user for similar items
Item-based similarity

Treat the columns of the utility matrix as vectors and calculate similarity between them:

HP1 TW1 SW4
A: 4 1
B: 2 3 3
C: 3 5
D: 3 4

      HP1   TW1   SW4
HP1  1.00  0.70  0.22
TW1  0.70  1.00  0.63
SW4  0.22  0.63  1.00

Example: How would user C rate movie SW4?
As in the user-based method, choose K = 2. The top 2 movies similar to SW4 that C has rated are TW1 and HP1:
(0.63 × 5 + 0.22 × 3) / (0.63 + 0.22) ≈ 4.48
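The same weighted-average machinery works on the transpose of the matrix: weight the user's own ratings by item-item similarity. The similarities and ratings below are the ones given on the slide.

```python
def predict_item_based(user_ratings, target_item, item_sim, k=2):
    """Similarity-weighted average of the user's ratings on the k
    items most similar to the target item."""
    scored = [(item_sim[target_item][i], r) for i, r in user_ratings.items()]
    top = sorted(scored, reverse=True)[:k]
    num = sum(s * r for s, r in top)
    den = sum(s for s, _ in top)
    return num / den if den else None

# Item-item similarities to SW4 and user C's ratings, from the slide
item_sim = {"SW4": {"HP1": 0.22, "TW1": 0.63}}
c_ratings = {"HP1": 3, "TW1": 5}
pred = predict_item_based(c_ratings, "SW4", item_sim)   # ≈ 4.48
```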
User or Item based similarity?
• Theoretically, should be similar
• In practice, item-based works better
• Items are less ‘complicated’ than people
• Easier to categorize items
• Better performance: needs fewer neighbors to get an accurate prediction
Example: a CD similar to Mozart, Beethoven and Bach CDs is probably classical; users rarely fall into such clean categories.
A user typically rates only a small fraction of items, so there are fewer items to iterate over.
Collaborative Filtering: Problems
• Cold Start and First Rater: new item/user
• Fraud/Attacks: fake items/ratings
• Sparsity of utility matrix
• Implicit features: what do users actually like?
• How can we address these problems?
• Use recommendations based on intrinsic properties of users/items (content-based models)
• Use linear algebra to extract useful features and reduce sparsity (latent factor models)
Recommendation systems
• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items similar intrinsically to previous items highly rated by user
• Latent Factor Models: Use techniques from Linear Algebra to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches
Content-based systems
• Create a profile vector for each item/user
• Use machine learning (clustering and classification) to find similar items/users
• Profile vectors composed of features designed to reflect intrinsic properties
• What features?
Profile features
• Tags/categories for items
• Demographic features for users
• Examples:
• Gender, race, income
• Movie genres: Sci-fi, Fantasy, Comedy
• Does movie X cast actor Y?
Examples: features

       Fantasy  Sci-Fi  Magical powers  Supernatural creatures  Spaceships  Alan Rickman  Harrison Ford
HP1       1       0          1                  1                   0            1              0
TW1       1       0          0                  1                   0            0              0
SW4       0       1          1                  0                   1            0              1

• How do we use these feature vectors? Here’s one way:
• Calculate similarity between movie profile vectors
• Similar to item-based CF, use a weighted average
• Many other algorithms can be applied
• Problem with content-based systems: features must be chosen and added manually
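The feature table above can be used directly: each row is a profile vector, and the same cosine similarity from the collaborative-filtering section applies. This is just one way to use the profiles, as the slide notes.

```python
import numpy as np

# Binary profile vectors, one column per feature, from the table above
# (Fantasy, Sci-Fi, Magical powers, Supernatural creatures,
#  Spaceships, Alan Rickman, Harrison Ford)
profiles = {
    "HP1": np.array([1, 0, 1, 1, 0, 1, 0], float),
    "TW1": np.array([1, 0, 0, 1, 0, 0, 0], float),
    "SW4": np.array([0, 1, 1, 0, 1, 0, 1], float),
}

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Rank the other movies by intrinsic similarity to HP1
sims = {m: cosine(profiles["HP1"], p) for m, p in profiles.items() if m != "HP1"}
```

On these profiles TW1 comes out far closer to HP1 than SW4 does, since they share the Fantasy and Supernatural-creatures features.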
So far...
• General approach in recommenders:
• Find similar users/items
• Use a weighted average of neighbors’ ratings to predict unknown rating
• Collaborative filtering: behavioral similarity
• Content-based systems: intrinsic similarity
Recommendation systems
• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items similar (intrinsically) to previous items highly rated by user
• Latent Factor Models: Use techniques from Linear Algebra to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches
Latent Factor Models
• Can we make the utility matrix denser?
• Can we gain some insight into what user preferences actually mean?
• Linear algebra techniques can uncover these latent (hidden) factors
• SVD: singular value decomposition
How does SVD help?
• Find a low-dimensional approximation of the ratings matrix
• Use the low-dimensional vectors to calculate similarity

HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6
A: 4 5 5 1
B: 2 3 2 3
C: 3 5 4 4
D: 3 2 2 4 5 4

SVD reduces this to:

      Dim 1   Dim 2
A      0.90   -0.80
B      0.10    0.30
C      0.24    0.85
D     -0.89    0.40

We have reduced each user from a sparse 9-D vector to a 2-D vector.
How can we interpret the dimensions here? Perhaps as “user tastes” or “movie genres”.
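A minimal sketch of this reduction with NumPy. Unrated cells are filled with 0, one common if crude choice, and the column placement of the slide's ratings is an assumption here, so the resulting 2-D coordinates are illustrative rather than the slide's exact numbers.

```python
import numpy as np

# Utility matrix with unrated cells as 0 (cell positions assumed)
R = np.array([
    [4, 5, 5, 0, 0, 0, 1, 0, 0],
    [2, 0, 0, 3, 2, 0, 3, 0, 0],
    [3, 0, 0, 5, 4, 0, 0, 4, 0],
    [3, 2, 2, 0, 0, 0, 4, 5, 4],
], float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
users_2d = U[:, :k] * s[:k]     # each user compressed to a 2-D vector
R_k = users_2d @ Vt[:k, :]      # best rank-2 approximation of R
```

Similarities can then be computed between the rows of `users_2d` instead of the sparse 9-D rows of `R`.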
Interpreting the SVD

      X = Dim 1   Y = Dim 2
A        0.90      -0.80
B        0.10       0.30
C        0.24       0.85
D       -0.89       0.40

HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6
A: 4 5 5 1
B: 2 3 2 3
C: 3 5 4 4
D: 3 2 2 4 5 4

[Figure: users A-D plotted on the two latent dimensions; the X axis runs from Sci-Fi to Fantasy, the Y axis from Male to Female]

In this case we interpret the SVD approximation as showing:
Dim 1: Sci-Fi to Fantasy
Dim 2: Male to Female
Note: C is closer to D than to A
Recommendation systems
• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items similar (intrinsically) to previous items highly rated by user
• Latent Factor Models: Use mathematical techniques to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches
• Let’s look at a case study
Case Study: Netflix contest

Hybrid system with some important innovations
[Figure: outputs of CF, SVD and other models (~500 features) are blended by decision trees (DT) into the final ratings]
Winner: BellKor’s Pragmatic Chaos
Bellkor’s innovations• Incorporate user biases in model
• Bias parameters for each user & movie
• Ratings are deviations from user/movie-specific bias
• Incorporate temporal shifts in model
• User ratings change based on time
• Algorithms parametrized by time
• Focus on good ways to blend algorithms
• GBDT: Gradient boosted decision trees
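The bias idea can be sketched as a baseline predictor: predicted rating = global mean + user deviation + item deviation. The triples below are toy data, and the biases are computed by simple averaging rather than the regularized least-squares fit used in the actual BellKor solution.

```python
# Toy (user, item, rating) triples
data = [("A", "HP1", 4), ("B", "HP1", 2), ("C", "HP1", 3),
        ("B", "TW1", 3), ("C", "TW1", 5), ("D", "SW4", 4)]

mu = sum(r for _, _, r in data) / len(data)   # global mean rating

def bias(idx, key):
    """Deviation of a user's (idx=0) or movie's (idx=1) mean rating from mu."""
    vals = [t[2] for t in data if t[idx] == key]
    return sum(vals) / len(vals) - mu if vals else 0.0

def baseline(user, item):
    """Baseline predictor: r_hat = mu + b_user + b_item."""
    return mu + bias(0, user) + bias(1, item)
```

A CF or SVD model then only has to predict the residual (rating minus baseline), which is what makes the bias terms so effective.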
Summary
• We have seen several algorithms:
• CF, content-based, latent-factor (SVD)
• Hybrid: combinations of the above
• So which algorithm should you use?
• Depends on your use case and business
Summary
• Are recommendations an absolutely core, critical part of your business? (e.g. Netflix, Amazon)
• You should be using a hybrid system
• Spend a lot of effort tuning based on domain-specific knowledge
• Find innovative ways to combine different kinds of features
Summary
• Is your data set extremely sparse?
• Latent factor (SVD) based models may extract the “essence” of your data
• YMMV on whether they provide usable insights into customer behavior
• Otherwise...
• Collaborative filtering models: fast and easy to implement and test
The road ahead
• Users can be fickle about providing ratings
• (How often do you rate stuff?)
• Collect better data without annoying users
• Games to collect ratings information
• “gamification”, work of Luis von Ahn
• NLP to parse reviews & other sources
• sentiment analysis to infer ratings
The road ahead
• Incorporate social network information
• Mine your Twitter feed for content
• Use your social graph to identify similar users/friends
• Recommend across genres/categories
Further reading
• Machine Learning in Action: Harrington, Manning Publications
• Recommender Systems: Melville and Sindhwani, IBM Research
• Mining of Massive Datasets (Ch. 9): Rajaraman, Ullman and Leskovec, Stanford University
• The BellKor Solution to the Netflix Grand Prize: Bell and Koren, AT&T Labs