Recommender Systems


Information Systems M

Prof. Paolo Ciaccia

http://www-db.deis.unibo.it/courses/SI-M/

Recommender Systems

• A recommender system (RS) helps people evaluate the potentially huge number of alternatives offered by a Web site
• In their simplest form, RSs recommend personalized, ranked lists of items to their users
• They provide consumers with information that helps them decide which items to purchase
• Given a set of users and items (documents, products, …), recommend items to a user based on:
  - the past behavior of this and other users
  - additional information on users/items
• A multitude of applications, and a big market!
  - E-commerce
  - Medical applications (e.g., matching patients to doctors)
  - Customer Relationship Management (e.g., matching customer problems to experts)

Recommender Systems, Sistemi Informativi M


What book should I buy?


What movie should I watch?

• The Internet Movie Database (IMDb) provides information about actors, films, television shows, television stars, video games…

• Owned by Amazon.com since 1998
• 796,328 titles and 2,127,371 people
• More than 50M users per month


The Netflix prize (1)

• Netflix is a US online movie rental service
• Over 100K titles and 55 million DVDs in total
• A proprietary recommendation system called “Cinematch”
• Approximately 60% of Netflix members select their movies based on movie recommendations

In October 2006, Netflix announced that it would pay $1 million to whoever created a movie-recommending algorithm 10% better than Cinematch, i.e., one whose rating predictions had a 10% lower root-mean-square error (RMSE)


12.01.2011

The Netflix prize (2)

• Within two weeks, Netflix received 169 submissions, including three that were slightly superior to Cinematch
• After a month, more than a thousand programs had been entered, and the top scorers were almost halfway to the goal
• Three years later, on 21 September 2009, Netflix announced the winner


What news should I read?


Where should I spend my vacation?


Remarkable examples


Site                         Domain
Amazon.com                   Books, movies, music
CDNOW.com                    Music
Ebay.com (feedback forms)    Anything
Reel.com                     Movies
Barnes & Noble               Books

[Table: technologies used by some well-known systems]
Systems: Jinni, TasteKid, Nanocrowd, Clerkdogs, Criticker, IMDb, Flixster, MovieLens, Netflix, Shazam, Pandora, LastFM, YooChoose, ThinkAnalytics, iTunes, Amazon
Technologies, with the number of these 16 systems marked as using each:
• Collaborative Filtering: 12
• Content-Based Techniques: 11
• Knowledge-Based Techniques: 7
• Ontologies and Semantic Web Technologies for Recommender Systems: 3
• Hybrid Techniques: 7
• Context Dependent Recommender Systems: 6


Inputs to an RS

• Behavior of the user in past “transactions”
  - which items were viewed/purchased
  - content/attributes of items
  - pages bookmarked
  - explicit ratings on items
• Context (used in context-based recommendations)
  - what the user appears to be doing now
• Role/domain
  - additional info about users, items, …


Content-Based Recommendation

• In content-based recommendation, the system tries to recommend items that match the user profile
• The profile is based on the items that the user liked in the past or on explicit interests that s/he defines


[Figure: the recommender system matches new books against the user profile]


Implementing content-based RSs

• The basic idea is borrowed from the Vector Space Model
• Each item is characterized by a set of (weighted) features
  - Movie: actors, director, title, …
  - Weights: use tf.idf
  - Also works for “unstructured” data (web pages, documents, etc.)
• The user profile is built from the user's history
  - E.g., a vector representing the relevance of each feature/keyword for that user
  - Either implicit or explicit “rating of features” (or both)
• Cosine similarity can be used to match the user profile against an item vector
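A minimal sketch of this content-based pipeline, assuming items are described by hypothetical feature tokens (e.g. "director:A"). Features are weighted with tf.idf. The user profile is taken to be the average vector of the items the user liked, which is one simple choice among many. Candidate items are ranked by cosine similarity to that profile. The movie data and all function names are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(items):
    """Map each item to a sparse {feature: tf.idf weight} vector."""
    n = len(items)
    df = Counter()                      # number of items containing each feature
    for feats in items.values():
        df.update(set(feats))
    vectors = {}
    for item, feats in items.items():
        tf = Counter(feats)
        vectors[item] = {f: (c / len(feats)) * math.log(n / df[f])
                         for f, c in tf.items()}
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_profile(vectors, liked):
    """Assumed profile: the average vector of the items the user liked."""
    profile = Counter()
    for item in liked:
        for f, w in vectors[item].items():
            profile[f] += w / len(liked)
    return dict(profile)

def recommend(vectors, liked, n=2):
    """Rank the items the user has not liked yet by similarity to the profile."""
    profile = build_profile(vectors, liked)
    scored = [(item, cosine(profile, vec))
              for item, vec in vectors.items() if item not in liked]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:n]

# Hypothetical movie catalogue described by feature tokens.
movies = {
    "m1": ["genre:scifi", "director:A", "actor:P"],
    "m2": ["genre:scifi", "director:A", "actor:Q"],
    "m3": ["genre:romance", "director:B", "actor:R"],
    "m4": ["genre:scifi", "director:C", "actor:P"],
    "m5": ["genre:romance", "director:B", "actor:S"],
}
vectors = tfidf_vectors(movies)
print(recommend(vectors, liked={"m1", "m2"}))
```

With a profile built from m1 and m2, m4 ranks first because it shares the genre and an actor with m1. Items m3 and m5 share no features with the profile and score 0. This also illustrates the overspecialization issue: only items resembling past choices can surface.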


Pros and cons of content-based RS’s

Pros:
• Able to recommend new and unpopular items
• No need for data on other users
• Can provide explanations of recommended items

Cons:
• Limited content analysis
  - It is not always easy to find the appropriate features to use
• Overspecialization
  - Can only recommend items similar to those previously seen/rated
  - Further, items too similar to ones the user already knows might not be of interest (e.g., news articles)
• New users
  - How to build a profile?


Collaborative filtering (CF)

• Unlike content-based recommendation methods, CF recommender systems try to predict the utility of items for a particular user based on the items previously rated by other users
• Two basic variants of CF:
  - User-based: to predict a user’s opinion of an item, use the opinions of similar users, where the similarity between users depends on their opinions of other items
  - Item-based: as in content-based RSs, the assumption is that a user is likely to have the same opinion of similar items; however, the similarity between items now depends on how other users have rated them


User-based CF


          Item 1  Item 2  Item 3  Item 4  Item 5
User 1      8       1       ?       2       7
User 2      2       ?       5       7       5
User 3      5       4       7       4       7
User 4      7       1       7       3       8
User 5      1       7       4       6       5
User 6      8       3       8       3       7


Similarity between users: simple way

• Only consider items both users have rated
• For each such item, compute the difference between the users’ ratings: if Item j has been rated by both User 1 and User 2, take
  | rating(User 1, Item j) − rating(User 2, Item j) |
• Take the average of these differences over all common items: the smaller the average, the more similar the two users


          Item 1  Item 2  Item 3  Item 4  Item 5
User 1      8       1       ?       2       7
User 2      2       ?       5       7       5
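The simple measure can be sketched as follows. The sketch uses the two rows above and encodes a missing rating ("?") as None, an encoding chosen here for illustration. Only Items 1, 4 and 5 are rated by both users, which gives differences of 6, 5 and 2.

```python
def mean_abs_diff(ratings_a, ratings_b):
    """Average |r_a - r_b| over the items rated by both users.

    None marks a missing rating. Lower values mean more similar users.
    Returns None if the users have no item in common.
    """
    diffs = [abs(x - y) for x, y in zip(ratings_a, ratings_b)
             if x is not None and y is not None]
    if not diffs:
        return None
    return sum(diffs) / len(diffs)

# Ratings of User 1 and User 2 for Items 1..5 (None = "?")
user1 = [8, 1, None, 2, 7]
user2 = [2, None, 5, 7, 5]
print(mean_abs_diff(user1, user2))  # (6 + 5 + 2) / 3, about 4.33
```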

Similarity between users: more realistic

• We can use either all items or only those rated by both users
• We have a user-item matrix R of ratings, where $r_{a,i}$ is the rating of user a for item i, and $\bar{r}_a$ is the average rating of user a
• Two major alternatives for measuring the similarity between users:


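Pearson correlation:

$$\mathrm{sim}(a,b) = \frac{\sum_i (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_i (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_i (r_{b,i} - \bar{r}_b)^2}}$$

Cosine:

$$\mathrm{sim}(a,b) = \frac{\sum_i r_{a,i}\, r_{b,i}}{\sqrt{\sum_i r_{a,i}^2}\,\sqrt{\sum_i r_{b,i}^2}}$$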
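Below is a sketch of both measures on the 6×5 example matrix, restricted to the items rated by both users. It makes one assumption beyond the slides: the means $\bar{r}_a$ and $\bar{r}_b$ are computed over those co-rated items only. This is one of several common conventions; the averages could also be taken over all the items each user rated. Missing ratings are encoded as None.

```python
import math

# Example rating matrix: rows = Users 1..6, columns = Items 1..5, None = "?"
R = [
    [8, 1, None, 2, 7],
    [2, None, 5, 7, 5],
    [5, 4, 7, 4, 7],
    [7, 1, 7, 3, 8],
    [1, 7, 4, 6, 5],
    [8, 3, 8, 3, 7],
]

def co_rated(a, b):
    """Rating pairs for the items rated by both users."""
    return [(x, y) for x, y in zip(a, b) if x is not None and y is not None]

def pearson(a, b):
    """Pearson correlation over co-rated items (means taken over those items)."""
    pairs = co_rated(a, b)
    if len(pairs) < 2:
        return 0.0
    mean_a = sum(x for x, _ in pairs) / len(pairs)
    mean_b = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    den = (math.sqrt(sum((x - mean_a) ** 2 for x, _ in pairs))
           * math.sqrt(sum((y - mean_b) ** 2 for _, y in pairs)))
    return num / den if den else 0.0

def cosine(a, b):
    """Cosine similarity of the raw ratings over co-rated items."""
    pairs = co_rated(a, b)
    num = sum(x * y for x, y in pairs)
    den = (math.sqrt(sum(x * x for x, _ in pairs))
           * math.sqrt(sum(y * y for _, y in pairs)))
    return num / den if den else 0.0

print(round(pearson(R[0], R[1]), 2))  # -0.89
print(round(cosine(R[0], R[1]), 2))   # 0.68
```

On Users 1 and 2 the two measures disagree. The users rate the shared items in opposite directions (8 vs 2 on Item 1, 2 vs 7 on Item 4), and Pearson's mean-centering exposes this as a negative correlation. Cosine on raw, all-positive ratings still returns a fairly high value.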


Rating prediction and recommendation

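• To predict the rating $r_{a,i}$ of the (target) user a for item i, a weighted sum of the other users' ratings for i can be used (u ranges over the users who rated item i):

$$r_{a,i} = \sum_{u} \mathrm{sim}(a,u) \times r_{u,i}$$

  In practice this sum is usually divided by $\sum_u |\mathrm{sim}(a,u)|$, so that the prediction stays on the rating scale
• Rather than considering all users, only the k users most similar to user a can be used
• Based on the rating predictions, the top-N items can be recommended to user a

[Figure: the missing rating of User 1 for Item 3 is predicted as a weighted sum of the other users' ratings for Item 3]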
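The following is a sketch of user-based prediction and top-N recommendation on the example matrix. It rests on several assumptions beyond the slides:
- Similarity is the Pearson correlation over co-rated items.
- Only the k most similar users who rated the item and have positive similarity are kept.
- The weighted sum is divided by the sum of their similarities.

The function names are hypothetical.

```python
import math

# Example rating matrix: rows = Users 1..6, columns = Items 1..5, None = "?"
R = [
    [8, 1, None, 2, 7],
    [2, None, 5, 7, 5],
    [5, 4, 7, 4, 7],
    [7, 1, 7, 3, 8],
    [1, 7, 4, 6, 5],
    [8, 3, 8, 3, 7],
]

def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    ma = sum(x for x, _ in pairs) / len(pairs)
    mb = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - ma) * (y - mb) for x, y in pairs)
    den = (math.sqrt(sum((x - ma) ** 2 for x, _ in pairs))
           * math.sqrt(sum((y - mb) ** 2 for _, y in pairs)))
    return num / den if den else 0.0

def predict(R, a, i, k=2):
    """Predict r_{a,i} from the k most similar users who rated item i."""
    neighbors = [(pearson(R[a], R[u]), R[u][i])
                 for u in range(len(R)) if u != a and R[u][i] is not None]
    neighbors = [(s, r) for s, r in neighbors if s > 0]  # positive similarity only
    neighbors.sort(reverse=True)                          # most similar first
    top = neighbors[:k]
    den = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / den if den else None

def recommend(R, a, n=3, k=2):
    """Top-n items not yet rated by user a, ranked by predicted rating."""
    preds = [(i, predict(R, a, i, k)) for i, r in enumerate(R[a]) if r is None]
    preds = [(i, p) for i, p in preds if p is not None]
    return sorted(preds, key=lambda ip: ip[1], reverse=True)[:n]

print(round(predict(R, 0, 2), 1))  # User 1, Item 3: about 7.5
print(recommend(R, 0))
```

Here the two nearest neighbors of User 1 are Users 6 and 4, with similarities of about 0.99 and 0.96. They rated Item 3 as 8 and 7, so the prediction lands between those two values.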

Problems with user-based CF

• User cold-start problem
  - Not enough is known about a new user to decide who is similar
• Sparsity of the rating matrix
  - With large item sets, users will have rated only a few of the items, which makes it hard to find similar users
  - With 2M books, rating 2K of them is only 0.1%
• Scalability
  - With millions of users and items, computations become slow
• Item cold-start problem
  - Ratings cannot be predicted for a new item until some users have rated it
  - Also a problem with “esoteric” items
• Popularity bias
  - Cannot recommend items to a user with unique tastes


Item-based CF

• Pearson correlation (or cosine) is now used to measure the similarity of items
  - Still based on ratings, not on the items’ content!


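Pearson correlation:

$$\mathrm{sim}(i,j) = \frac{\sum_u (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_u (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_u (r_{u,j} - \bar{r}_j)^2}}$$

Cosine:

$$\mathrm{sim}(i,j) = \frac{\sum_u r_{u,i}\, r_{u,j}}{\sqrt{\sum_u r_{u,i}^2}\,\sqrt{\sum_u r_{u,j}^2}}$$

where $\bar{r}_i$ is the average rating of item i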
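A sketch of the item-item cosine similarity on the example matrix follows. It assumes the similarity is computed only over the users who rated both items. The Pearson variant would first subtract the item means $\bar{r}_i$ and $\bar{r}_j$. The function name `item_cosine` is hypothetical.

```python
import math

# Example rating matrix: rows = Users 1..6, columns = Items 1..5, None = "?"
R = [
    [8, 1, None, 2, 7],
    [2, None, 5, 7, 5],
    [5, 4, 7, 4, 7],
    [7, 1, 7, 3, 8],
    [1, 7, 4, 6, 5],
    [8, 3, 8, 3, 7],
]

def item_cosine(R, i, j):
    """Cosine similarity between item columns i and j over users who rated both."""
    pairs = [(row[i], row[j]) for row in R
             if row[i] is not None and row[j] is not None]
    num = sum(x * y for x, y in pairs)
    den = (math.sqrt(sum(x * x for x, _ in pairs))
           * math.sqrt(sum(y * y for _, y in pairs)))
    return num / den if den else 0.0

# Full item-item similarity matrix (symmetric, ones on the diagonal).
n_items = len(R[0])
S = [[item_cosine(R, i, j) for j in range(n_items)] for i in range(n_items)]
print(round(S[0][4], 2))  # Item 1 vs Item 5: about 0.94
```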

Generating predictions

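• As with user-based CF, we can use all items or only the k most similar ones
• The rating of user a for item i is predicted as a weighted sum of a's own ratings for the items j, weighted by their similarity to i (j ranges over the items rated by user a):

$$r_{a,i} = \frac{\sum_j \mathrm{sim}(i,j) \times r_{a,j}}{\sum_j \mathrm{sim}(i,j)}$$

[Figure: the missing rating of User 1 for Item 3 is predicted as a weighted sum of User 1's ratings for Items 1, 2, 4 and 5]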
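A sketch of the item-based prediction for the example (User 1, Item 3) follows. It assumes item-item cosine similarity over co-rating users, and the optional `k` parameter keeps only the most similar items. Both the function names and `k` are hypothetical.

```python
import math

# Example rating matrix: rows = Users 1..6, columns = Items 1..5, None = "?"
R = [
    [8, 1, None, 2, 7],
    [2, None, 5, 7, 5],
    [5, 4, 7, 4, 7],
    [7, 1, 7, 3, 8],
    [1, 7, 4, 6, 5],
    [8, 3, 8, 3, 7],
]

def item_cosine(R, i, j):
    """Cosine similarity between item columns i and j over users who rated both."""
    pairs = [(row[i], row[j]) for row in R
             if row[i] is not None and row[j] is not None]
    num = sum(x * y for x, y in pairs)
    den = (math.sqrt(sum(x * x for x, _ in pairs))
           * math.sqrt(sum(y * y for _, y in pairs)))
    return num / den if den else 0.0

def predict_item_based(R, a, i, k=None):
    """Similarity-weighted average of user a's ratings of items similar to i."""
    neighbors = [(item_cosine(R, i, j), R[a][j])
                 for j in range(len(R[a])) if j != i and R[a][j] is not None]
    neighbors.sort(reverse=True)       # most similar items first
    if k is not None:
        neighbors = neighbors[:k]      # keep only the k most similar items
    den = sum(s for s, _ in neighbors)
    return sum(s * r for s, r in neighbors) / den if den else None

print(round(predict_item_based(R, 0, 2), 1))       # all rated items: about 4.8
print(round(predict_item_based(R, 0, 2, k=2), 1))  # 2 most similar items: about 7.5
```

The two results differ noticeably. Cosine computed on raw, all-positive ratings gives every item a fairly high similarity to Item 3. When all items are used, User 1's low ratings for Items 2 and 4 therefore pull the prediction down. Keeping only the two most similar items (Items 5 and 1, rated 7 and 8) gives a value between 7 and 8.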


Problems with item-based CF

• Item cold-start problem
  - This is a major problem here: a new item has no ratings, so its similarity to other items cannot be computed


Important Issues

• Cold start, implicit/explicit rating, sparsity, portfolio effect (lack of diversity in the recommendations), security, privacy, …
• A lot of work exists on RSs, and many other alternatives have been developed:
  - Hybrid RSs
  - Model-based CF
    - Develop a model of user ratings (probabilistic, based on clustering, etc.)
  - Context-based RSs
    - Vary the predictions depending on the user context
  - …
• See also the survey [AT05] on the web site
