Recommender Systems


Description: A survey of current recommendation technologies, including the author's latest research at IMPCA, Curtin University of Technology.

Transcript of Recommender Systems

Page 1: Recommender Systems

RecSys: Recommender Systems

Tran The Truyen

http://truyen.vietlabs.com

Page 2: Recommender Systems

The world is an over-crowded place

Page 3: Recommender Systems

They all want to get our attention

Page 4: Recommender Systems

We are overloaded

• Thousands of news articles and blog posts each day

• Millions of movies, books and music tracks online

• In Hanoi, > 50 TV channels and thousands of programs each day

• In New York, several thousand ad messages sent to us per day

Page 5: Recommender Systems

But we really need and consume only a few of them!

Page 6: Recommender Systems

Sometimes, all we need is this

Page 7: Recommender Systems

Or, just this

DON’T DISTURB!

Page 8: Recommender Systems

Help me!

Page 9: Recommender Systems

Can Google help?

• Yes, but only when we really know what we are looking for

• What if I just want some interesting music tracks?

– By the way, what does "interesting" mean?

Page 10: Recommender Systems

Can Facebook help?

• Yes, I tend to find my friends' stuff interesting

• What if I have only a few friends, and what they like does not always interest me?

Page 11: Recommender Systems

Can experts help?

• Yes, but it won't scale well

– Everyone receives exactly the same advice!

• It is what they like, not what I like!

– With movies, for example, expert approval does not guarantee mass appeal

Page 12: Recommender Systems

OK, here is the idea called RecSys:

• To recommend to us something we may like

– It may not be popular

– The world is long-tailed

• How?

– Based on our history of using services

– Based on other people like us

– Ever heard of "collective intelligence"?

I like these bits

Page 13: Recommender Systems

Hang on, what is long-tailed?

• Popularised by Chris Anderson, Wired 2004

[Figure: the short-tailed (bell-shaped) distribution versus the long-tailed distribution]

Page 14: Recommender Systems

Ever heard of

• GroupLens?

• Amazon recommendation?

• Netflix Cinematch?

• Google News personalization?

• Netflix Prize $1mil challenge?

• Strands?

• TiVo?

• Findory?

Page 15: Recommender Systems
Page 16: Recommender Systems

Want some evidence? (Celma & Lamere, ISMIR 2007)

• Netflix

– 2/3 of rented movies come from recommendations

• Google News

– 38% more click-throughs are due to recommendations

• Amazon

– 35% of sales come from recommendations

Page 17: Recommender Systems

What can be recommended?

• Advertising messages

• Investment choices

• Restaurants

• Cafes

• Music tracks

• Movies

• TV programs

• Books

• Clothes

• Supermarket goods

• Tags

• News articles

• Online mates (Dating services)

• Future friends (Social network sites)

• Courses in e-learning

• Drug components

• Research papers

• Citations

• Code modules

• Programmers

Page 18: Recommender Systems

But, what do recommender systems do, exactly?

1. Predict how much you may like a certain

product/service

2. Compose a list of N best items for you

3. Compose a list of N best users for a certain

product/service

4. Explain to you why these items are recommended to

you

5. Adjust the prediction and recommendation based on

your feedback and other people

Page 19: Recommender Systems

Graph representation

[Figure: bipartite user-item graph linking users (Me, My friend, You, Another guy) to items (Titanic, Taken, Panda); a "?" marks the unknown preference to predict]
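A minimal sketch of how such a bipartite graph might be held in code, assuming a plain Python dict mapping each user to the items they have rated; the names and ratings below are illustrative, not taken from the slide:

# Users map to their rated items; the rating is the edge weight.
ratings = {
    "Me":          {"Titanic": 5, "Panda": 4},
    "My friend":   {"Titanic": 4, "Taken": 3},
    "You":         {"Taken": 5, "Panda": 2},
    "Another guy": {"Titanic": 2},
}

def items_of(user):
    """Items connected to a user in the bipartite graph."""
    return set(ratings.get(user, {}))

def users_of(item):
    """Users connected to an item."""
    return {u for u, r in ratings.items() if item in r}

# The "?" in the figure is simply an edge that does not exist yet:
print("Has 'Me' rated 'Taken'?", "Taken" in ratings["Me"])  # False -> something to predict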

Page 20: Recommender Systems

We must also take good care of

• Data normalisation

• Removal or reduction of noise

• Protection of users’ privacy

• Attacks: someone just doesn't like your system

Page 21: Recommender Systems

Task 1: Preference prediction

• Collaborative filtering

– User-based method

– Item-based method

– Matrix Factorization

• Content-based filtering

• Hybrid:

– Linear/sequential/switching combination

– Semi-Restricted Boltzmann Machines

Page 22: Recommender Systems

Collaborative filtering (1)

• User-based method (1994, GroupLens)

– Many people liked "Kung Fu Panda"

– Can you tell how much I would like it?

– The idea is to pick about 20-50 people who share a similar taste with me; how much I will like it then depends on how much THEY liked it.

– In short: you may like it because your "friends" liked it

[Figure: user × item rating matrix (users 1-6, items 1-8) with known ratings and empty cells to predict]
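A minimal sketch of the user-based idea in plain Python. The rating dict R, the cosine similarity choice and the small neighbourhood size are illustrative assumptions, not the GroupLens implementation:

import math

# Hypothetical user -> {item: rating} dict, mirroring the matrix in the figure.
R = {
    1: {1: 5, 2: 4, 4: 5, 6: 3, 7: 4},
    2: {1: 3, 3: 5, 5: 4, 8: 5},
    3: {2: 4, 4: 5, 7: 4},
    4: {1: 5, 3: 4, 5: 3, 8: 5},
    5: {2: 2, 5: 3, 7: 5},
    6: {1: 4, 4: 5, 6: 2},
}

def sim(u, v):
    """Cosine similarity between two users over their co-rated items."""
    common = set(R[u]) & set(R[v])
    if not common:
        return 0.0
    num = sum(R[u][i] * R[v][i] for i in common)
    den = math.sqrt(sum(R[u][i] ** 2 for i in common)) * \
          math.sqrt(sum(R[v][i] ** 2 for i in common))
    return num / den if den else 0.0

def predict(u, item, k=3):
    """Predict user u's rating for an item as a similarity-weighted
    average over the k most similar users who have rated it
    (the slide suggests 20-50 neighbours in practice)."""
    neighbours = sorted(
        ((sim(u, v), v) for v in R if v != u and item in R[v]),
        reverse=True)[:k]
    num = sum(s * R[v][item] for s, v in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return num / den if den else None

print(predict(1, 8))  # how much might user 1 like item 8?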

Page 23: Recommender Systems

Collaborative filtering (2)

• Item-based method (2001, deployed at Amazon)

– I have watched so many good & bad movies

– Would you recommend that I watch "Taken"?

– The idea is to pick from my previous list 20-50 movies that share a similar audience with "Taken"; how much I will like it then depends on how much I liked those earlier movies

– In short: I tend to watch this movie because I have watched those movies … or

– People who have watched those movies also liked this movie (Amazon style)

[Figure: the same user × item rating matrix, now read column-wise to compare items by the audiences they share]
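A minimal sketch of the item-based idea, with an illustrative item -> {user: rating} dict; this is the textbook recipe described above, not Amazon's actual implementation:

import math

# Hypothetical item -> {user: rating} dict (a transpose of the user x item matrix).
C = {
    "Taken":   {1: 4, 2: 5, 4: 3},
    "Titanic": {1: 5, 3: 4, 4: 4},
    "Panda":   {2: 5, 3: 3, 4: 5},
}

def item_sim(a, b):
    """Cosine similarity between two items over the users who rated both."""
    common = set(C[a]) & set(C[b])
    if not common:
        return 0.0
    num = sum(C[a][u] * C[b][u] for u in common)
    den = math.sqrt(sum(C[a][u] ** 2 for u in common)) * \
          math.sqrt(sum(C[b][u] ** 2 for u in common))
    return num / den if den else 0.0

def predict(user, target, k=20):
    """Score the target item from the k most similar items this user has rated."""
    rated = [i for i in C if i != target and user in C[i]]
    neighbours = sorted(((item_sim(target, i), i) for i in rated), reverse=True)[:k]
    num = sum(s * C[i][user] for s, i in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return num / den if den else None

print(predict(3, "Taken"))  # should user 3 watch "Taken"?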

Page 24: Recommender Systems

Collaborative filtering (3)

• Matrix Factorization (2006, Netflix challenge)

– You may have watched thousands of movies

– But perhaps I can tell that these movies belong to 10 groups, like Action, Sci-Fi, Animation, etc.

– So 10 numbers are enough to describe your taste

– Likewise, "Titanic" has been watched by millions of people, but perhaps … 10 numbers are enough to describe its features

– Magic: these hidden aspects can be discovered automatically by Matrix Factorization!

~ [0.1 0.3 0.2 0.9 0.5 0.4 0.7 0.3 0.8 1.5]
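A minimal sketch of matrix factorization trained with stochastic gradient descent, in the spirit of the Netflix-challenge approach sketched above; the rating triples, the number of hidden factors and the learning-rate/regularisation settings are illustrative assumptions:

import random

# (user, item, rating) triples; all values here are made up for illustration.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 5)]
n_users, n_items, k = 3, 3, 10          # k hidden factors per user and per item
lr, reg, epochs = 0.01, 0.05, 200       # learning rate, regularisation, passes

random.seed(0)
P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

for _ in range(epochs):
    for u, i, r in ratings:
        err = r - dot(P[u], Q[i])          # prediction error on this observed rating
        for f in range(k):                 # gradient step on both factor vectors
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)

# Predict an unseen (user, item) pair, e.g. user 0 and item 2:
print(round(dot(P[0], Q[2]), 2))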

Page 25: Recommender Systems

Problems with collaborative filtering

• Scale

– Netflix (2007): 5M users, 50K movies, 1.4B ratings

• Sparse data

– I have rated only one book at Amazon!

• Cold start

– New users and items have no history

• Popularity bias

– Everyone reads "Harry Potter"

• Hacking

– Someone who reads "Harry Potter" also "reads" "Kama Sutra"

Page 26: Recommender Systems

Content-based method

• Web page: words, hyperlinks, images, tags, comments, titles, URL, topic

• Music: genre, rhythm, melody, harmony, lyrics, metadata, artists, bands, press releases, expert reviews, loudness, energy, time, spectrum, duration, frequency, pitch, key, mode, mood, style, tempo

• User: age, sex, job, location, time, income, education, language, family status, hobbies, general interests, Web usage, computer usage, fan club membership, opinions, comments, tags, mobile usage

• Context: time, location, mobility, activity, socializing, emotion

Page 27: Recommender Systems

Content-based method (2)

• Can we acquire those content pieces automatically?

– Fairly easy for text

– Difficult for music and video, beyond the digital signal itself, e.g. music genre classification reaches 60-80% accuracy

– A lot of noise, e.g. misplaced tags

– Attacks

• What can we do with these?

– Compute similarity between items or users

– Query items that are similar to a given item

– Match an item's content against a user's profile

Page 28: Recommender Systems

Content-based method (3)

• Measuring similarity

– Cosine, TF-IDF as in standard Information Retrieval

– KL-divergence for probability-oriented guys

– Euclidean, dimensionality reduction if you want

– Anything you can think of!
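A minimal sketch of the first option, TF-IDF weighting plus cosine similarity as in standard Information Retrieval; the toy item descriptions and whitespace tokenisation are illustrative assumptions:

import math
from collections import Counter

docs = {
    "Titanic": "romantic drama ship disaster ocean love",
    "Taken":   "action thriller kidnap rescue paris",
    "Panda":   "animation comedy kungfu panda action",
}

tokenised = {name: text.split() for name, text in docs.items()}
n_docs = len(tokenised)
df = Counter(w for words in tokenised.values() for w in set(words))  # document frequency

def tfidf(words):
    """Term frequency weighted by inverse document frequency."""
    tf = Counter(words)
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}

def cosine(a, b):
    common = set(a) & set(b)
    num = sum(a[w] * b[w] for w in common)
    den = math.sqrt(sum(x * x for x in a.values())) * \
          math.sqrt(sum(x * x for x in b.values()))
    return num / den if den else 0.0

vecs = {name: tfidf(words) for name, words in tokenised.items()}
print(cosine(vecs["Taken"], vecs["Panda"]))    # share the word "action"
print(cosine(vecs["Taken"], vecs["Titanic"]))  # no overlap -> 0.0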

Page 29: Recommender Systems

Hybrid: Semi-Restricted Boltzmann Machines (2009, IMPCA)

• A probabilistic combination of

– Item-based method

– User-based method

– Matrix Factorization

– (Maybe) content-based method

• It looks like a Neural Network

– But it is not really one ☺

• It really is a type of Markov random field, which is, in turn, a type of Graphical Model

– Self-advertising: I work on this stuff for a living!

[Figure: graphical model linking Item X to Users A, B and C]

Page 30: Recommender Systems

But, what do recommender systems do, exactly?

1. Predict how much you may like a certain

product/service

2. Compose a list of N best items for you

3. Compose a list of N best users for a certain

product/service

4. Explain to you why these items are recommended to

you

5. Adjust the prediction and recommendation based on

your feedback and other people

Page 31: Recommender Systems

Task 2, 3: Top-N recommendation

• Top-N item list:

– Find similar users and collect what they like

– Filter out the items the user has already rated

– Rank the remaining items by considering

• The number of times each item is liked by those users

• The popularity of the item

• The associated ratings

• The similarity between each item in the list and what the user has rated

• Switching the roles of item and user, we get the top-N user list (a code sketch of the item-list recipe follows below)
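A minimal sketch of that recipe, assuming a user -> {item: rating} dict R and a user-similarity function sim like the ones sketched for Page 22; for brevity it ranks only by similarity-weighted votes, one of the signals listed above:

def top_n(user, R, sim, n=10, k=20):
    """Recommend n items: gather items liked by the k most similar users,
    drop items the user has already rated, and rank the rest by
    similarity-weighted votes."""
    neighbours = sorted(
        ((sim(user, v), v) for v in R if v != user), reverse=True)[:k]
    scores = {}
    for s, v in neighbours:
        for item, rating in R[v].items():
            if item in R[user]:            # filter out already-rated items
                continue
            scores[item] = scores.get(item, 0.0) + s * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Usage, with the R and sim from the Page 22 sketch:
# print(top_n(1, R, sim, n=5))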

Page 32: Recommender Systems

But, what do recommender systems do, exactly?

1. Predict how much you may like a certain

product/service

2. Compose a list of N best items for you

3. Compose a list of N best users for a certain

product/service

4. Explain to you why these items are recommended to

you

5. Adjust the prediction and recommendation based on

your feedback and other people

Page 33: Recommender Systems

Task 4: Explanation

• This is a current hit …

• More on this artist …

• Try something from similar artists …

• Someone similar to you also likes this …

• As you listened to that, you may want this …

• These two go together …

• This is most popular in your group …

• This is highly rated …

• Try something new …

Page 34: Recommender Systems

Task 4: Explanation (2)

• Examples from Strands.com

– Welcome back (recently viewed)

– For you today

– New for you

– Hot / Most popular of this type

– Other people also do this …

– Similar or related products

– Complementary accessories

– This goes with this …

– Gift idea

– Shopping assistant

Page 35: Recommender Systems

But, what do recommender systems do, exactly?

1. Predict how much you may like a certain

product/service

2. Compose a list of N best items for you

3. Compose a list of N best users for a certain

product/service

4. Explain to you why these items are recommended to

you

5. Adjust the prediction and recommendation based on

your feedback and other people

Page 36: Recommender Systems

Task 5: Online updating

• New items and users arrive every hour or minute

• The two worlds:

– Most songs and books stay interesting for a long time (the tail is really long)

– Most news articles are read on the day and forgotten the next day

• But tracking back is useful to follow an event or scandal

• Online updating of large-scale neighbour-based systems is NOT easy at all

Page 37: Recommender Systems

Evaluation

• How do we know the recommendation is good?

– How good is good?

– Measures should be automated

• Practice: training/testing split (e.g. 80/20)

• Popular criteria

– Prediction error: ZOE, MAE, RMSE

– Hit recall/precision/F-measure, rank utility, ROC curve
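A minimal sketch of the evaluation protocol above: an 80/20 train/test split plus MAE and RMSE on the held-out ratings. The predict function named in the usage comment is a hypothetical placeholder for whatever model was trained:

import math, random

def split(ratings, test_frac=0.2, seed=42):
    """Shuffle (user, item, rating) triples and split into train/test."""
    data = ratings[:]
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_frac))
    return data[:cut], data[cut:]

def mae(pairs):
    """Mean absolute error over (true, predicted) pairs."""
    return sum(abs(t - p) for t, p in pairs) / len(pairs)

def rmse(pairs):
    """Root mean squared error over (true, predicted) pairs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))

# Usage, with any model trained on the train split:
# train, test = split(all_ratings)
# pairs = [(r, predict(u, i)) for u, i, r in test]
# print("MAE:", mae(pairs), "RMSE:", rmse(pairs))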

Page 38: Recommender Systems

Evaluation (2)

• Yet little on

– Relevance

– Usefulness

– % Increase in purchase

– % Reduction in cost

– Novelty/surprise/long-tails

– Diversity

– Coverage

– Explainability

Page 39: Recommender Systems

A question: Can we make use of these information sources?

• Blogs

• Social Media

• Online comments

• Online stores

• Review sites

• Locations

• Mobility

Page 40: Recommender Systems

A case-study: Strands

• Services for any online retailer

– Retailers send product and purchase information to the Strands server (one retailer per account) through APIs

– Strands returns recommendations for each visitor

• The same logic applies to social media servers

• moneyStrands for personal financial management (e.g. investment recommendation)

• MyStrands for music personalization

Page 41: Recommender Systems

Want more practical hints?

• New books:

–Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007

–Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009

• Check out real deployments at:

– TechCrunch

– ReadWriteWeb

Page 42: Recommender Systems

Want more of the state of the art?

• Research in Recommender Systems is becoming mainstream, as evidenced by the recent ACM RecSys conference.

• Other places:

– ICWSM: Weblogs and Social Media

– WebKDD: Web Knowledge Discovery and Data Mining

– WWW: The original WWW conference

– SIGIR: Information Retrieval

– ACM KDD: Knowledge Discovery and Data Mining

– ICML: Machine Learning

Page 43: Recommender Systems

Questions left to you

• Will you trust such Recommender Systems?

• Will you implement and deploy one here?

• Will you do research?

– PhD scholarships available (as of 19/4/09)

– See http://truyen.vietlabs.com/scholarship.html

– Warning: you are going to waste 3-5 years of your youth!