Preliminaries

Data Mining

The art of extracting knowledge from large bodies of structured data.
Let's put it to use!
Recommendations

Basic Recommendations with Collaborative Filtering
Making Recommendations
The Netflix Prize (2006-2009)
What was the Netflix Prize?

• In October 2006, Netflix released a dataset containing 100 million anonymous movie ratings and challenged the data mining, machine learning, and computer science communities to develop systems that could beat the accuracy of its recommendation system, Cinematch.
• Thus began the Netflix Prize, an open competition for the best collaborative filtering algorithm to predict user ratings for films, based solely on previous ratings and without any other information about the users or films.
The Netflix Prize Datasets

• Netflix provided a training dataset of 100,480,507 ratings that 480,189 users gave to 17,770 movies.
– Each training rating (or instance) is of the form ⟨user, movie, date of rating, rating⟩.
– The user and movie fields are integer IDs, while ratings are from 1 to 5 (integral) stars.
The Netflix Prize Datasets

• The qualifying dataset contained 2,817,131 instances of the form ⟨user, movie, date of rating⟩, with the ratings known only to the jury.
• A participating team's algorithm had to predict ratings for the entire qualifying set, which consisted of a validation (quiz) set and a test set.
– During the competition, teams were only informed of their score on the quiz set of 1,408,342 ratings.
– The jury used a test set of 1,408,789 ratings to determine potential prize winners.
The Netflix Prize Data

Movie Ratings

        1    2    …    𝑚
  1     5    2    5    4
  2     2    5         3
  ⋮
        2    2    4    2
  𝑛     5    1    5    ?

Rows are users; each row is an instance (sample, example, observation).
Columns are movies; each column is a feature (attribute, dimension).
The Netflix Prize Goal

Movie Ratings

           Star Wars   Hoop Dreams   Contact   Titanic
Joe            5            2           5         4
John           2            5                     3
Al             2            2           4         2
Everaldo       5            1           5         ?

Goal: Predict ? (a movie rating) for a user.
The Netflix Prize Methods

Bennett, James, and Stan Lanning. "The Netflix Prize." Proceedings of KDD Cup and Workshop, 2007.
Some of these methods we will discuss now; the rest we will cover by the end of the course.
Raw Averages

User average: Simply assign the average rating given by the user:

    \bar{r}_u = \frac{1}{|I_u|} \sum_{i \in I_u} r_{u,i}, \quad u \in U

where U is the set of all users and I_u is the set of items rated by user u.

Item average: Simply assign the average rating given to the item:

    \bar{r}_i = \frac{1}{|U_i|} \sum_{u \in U_i} r_{u,i}, \quad i \in I

where I is the set of all items, U_i is the set of users who have rated item i, and r_{u,i} is the rating given to item i by user u.
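The two baselines above can be sketched in Python as follows. This is a minimal sketch assuming a dict-of-dicts rating store; the example data reuses the Joe/John/Al/Everaldo table from earlier, with blank cells simply absent from the dicts.

```python
# Illustrative rating store: user -> {item: rating}. Missing keys mean
# the user did not rate that item.
ratings = {
    "Joe":      {"Star Wars": 5, "Hoop Dreams": 2, "Contact": 5, "Titanic": 4},
    "John":     {"Star Wars": 2, "Hoop Dreams": 5, "Titanic": 3},
    "Al":       {"Star Wars": 2, "Hoop Dreams": 2, "Contact": 4, "Titanic": 2},
    "Everaldo": {"Star Wars": 5, "Hoop Dreams": 1, "Contact": 5},
}

def user_average(user):
    """Mean rating given by `user` over the items they rated."""
    vals = list(ratings[user].values())
    return sum(vals) / len(vals)

def item_average(item):
    """Mean rating given to `item` over the users who rated it."""
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)

print(user_average("Joe"))       # → 4.0 (mean of 5, 2, 5, 4)
print(item_average("Star Wars")) # → 3.5 (mean of 5, 2, 2, 5)
```

Either average can serve as the prediction for a missing cell; the slide's follow-up question hints at why neither is enough on its own.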
What about universally good or bad movies? Or skewed rating systems?
Bayesian Method

Apply Bayes' Theorem: Of the ratings r ∈ R a user could give for a movie, assign the one with the highest value of

    P(r \mid i) = \frac{P(i \mid r)\, P(r)}{P(i)}

where P(r | i) is the (conditional) probability of rating r given item i, P(i | r) is the (conditional) probability of item i given rating r, P(r) is the (prior) probability of rating r, and P(i) is the (prior) probability of item i.
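A sketch of this rule, assuming each probability is estimated by simple counting over a toy dataset (the triples below are illustrative, not the Netflix data). Since P(i) is constant for a fixed item, the argmax over r reduces to the item's most frequent rating.

```python
from collections import Counter

# Illustrative (user, item, rating) triples.
data = [
    ("u1", "Contact", 5), ("u2", "Contact", 5), ("u3", "Contact", 4),
    ("u1", "Titanic", 3), ("u2", "Titanic", 4), ("u3", "Titanic", 4),
]

n = len(data)
count_ir = Counter((i, r) for _, i, r in data)  # joint counts for P(i, r)

def predict(item):
    """Return the rating r maximizing P(r | item).

    P(i | r) P(r) / P(i) = [count(i,r)/count(r)] * [count(r)/n] / P(i),
    which is proportional to count(i, r), so we maximize that count.
    """
    return max(range(1, 6), key=lambda r: count_ir[(item, r)])

print(predict("Contact"))  # → 5 (two 5s vs. one 4)
```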
But this method still doesn't account for the similarity between users.
Cute Kitten Picture Intermission
Key to Collaborative Filtering

Common insight: personal tastes are correlated.

If Alice and Bob both like X and Alice likes Y, then Bob is more likely to like Y, especially (perhaps) if Bob knows Alice.
Collaborative Filtering

Collaborative filtering (CF) systems work by collecting user feedback in the form of ratings for items in a given domain, and by exploiting similarities in rating behavior among users to determine how to recommend an item.
Collaborative Filtering Dataset

Items

        1    2    …    𝑚
  1     5    2    5    4
  2     2    5         3
  ⋮
        2    2    4    2
  𝑛     5    1    5    ?

Rows are users.

Goal: Predict ? (an item rating) for 𝑛 (a user).
Types of Collaborative Filtering

1. Neighborhood- or Memory-based
2. Model-based
3. Hybrid
We'll talk about the first type, neighborhood- or memory-based CF, now.
Neighborhood-based CF

A subset of users is chosen based on their similarity to the active user, and a weighted combination of their ratings is used to produce predictions for this user.
Neighborhood-based CF

It has three steps:

1. Assign a weight to all users with respect to their similarity to the active user.
2. Select the k users that have the highest similarity with the active user, commonly called the neighborhood.
3. Compute a prediction from a weighted combination of the selected neighbors' ratings.
Neighborhood-based CF

Step 1

In step 1, the weight w_{a,u} is a measure of similarity between the user u and the active user a. The most commonly used measure of similarity is the Pearson correlation coefficient between the ratings of the two users:

    w_{a,u} = \frac{\sum_{i \in I} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i \in I} (r_{a,i} - \bar{r}_a)^2} \sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}}

where I is the set of items rated by both users, r_{u,i} is the rating given to item i by user u, and \bar{r}_u is the mean rating given by user u.
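A minimal Python sketch of this Step 1 weight; the dict-based rating representation and the example users are illustrative. Note one implementation choice: the means here are computed over the co-rated items I only, whereas the slide defines r̄_u as the user's overall mean; both variants appear in practice.

```python
from math import sqrt

def pearson(ra, ru):
    """Pearson correlation between two users over their co-rated items.

    `ra` and `ru` map item -> rating for users a and u.
    """
    common = set(ra) & set(ru)          # I: items rated by both users
    if len(common) < 2:
        return 0.0                      # too little overlap to correlate
    mean_a = sum(ra[i] for i in common) / len(common)
    mean_u = sum(ru[i] for i in common) / len(common)
    num = sum((ra[i] - mean_a) * (ru[i] - mean_u) for i in common)
    den = sqrt(sum((ra[i] - mean_a) ** 2 for i in common)) * \
          sqrt(sum((ru[i] - mean_u) ** 2 for i in common))
    return num / den if den else 0.0

a = {"Star Wars": 5, "Contact": 5, "Titanic": 4}
u = {"Star Wars": 2, "Contact": 4, "Titanic": 2}
print(pearson(a, u))  # → 0.5
```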
Neighborhood-based CF

Step 2

In step 2, some sort of threshold is used on the similarity score to determine the "neighborhood."
Neighborhood-based CF

Step 3

In step 3, predictions are generally computed as the weighted average of deviations from the neighbors' means, as in:

    p_{a,i} = \bar{r}_a + \frac{\sum_{u \in K} (r_{u,i} - \bar{r}_u) \, w_{a,u}}{\sum_{u \in K} w_{a,u}}

where p_{a,i} is the prediction for the active user a for item i, w_{a,u} is the similarity between users a and u, and K is the neighborhood or set of most similar users.
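A sketch of this Step 3 prediction. The neighbor representation is an illustrative assumption, and the denominator here sums |w| rather than w, a common guard when Pearson weights can be negative.

```python
def predict(mean_a, neighbors):
    """p_{a,i}: active user's mean plus weighted mean of neighbor deviations.

    `mean_a` is the active user's mean rating; `neighbors` is a list of
    (mean_u, r_ui, w_au) triples for the neighborhood K: each neighbor's
    mean rating, their rating of item i, and their similarity to a.
    """
    num = sum((r_ui - mean_u) * w for mean_u, r_ui, w in neighbors)
    den = sum(abs(w) for _, _, w in neighbors)
    return mean_a + num / den if den else mean_a

# Active user averages 3.5 stars; both neighbors rated item i above
# their own means, so the prediction lands above 3.5.
print(predict(3.5, [(3.0, 4.0, 0.8), (2.5, 4.0, 0.4)]))
```

Working through the example: the deviations are 1.0 and 1.5, so the prediction is 3.5 + (1.0·0.8 + 1.5·0.4)/(0.8 + 0.4) ≈ 4.67.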
Neighborhood-based CF

• Common Problems:
– The search for similar users has high computational complexity, causing conventional neighborhood-based CF algorithms to not scale well.
– It is common for the active user to have highly correlated neighbors that are based on very few co-rated (overlapping) items, which often results in bad predictors.
– When measuring the similarity between users, items that have been rated by all (and universally liked or disliked) are not as useful as less common items.
Item-to-Item Matching

• An extension to neighborhood-based CF.
• Addresses the high computational complexity of searching for similar users.
• The idea: rather than matching similar users, match a user's rated items to similar items.
Item-to-Item Matching

In this approach, similarities between pairs of items i and j are computed off-line using Pearson correlation, given by:

    w_{i,j} = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_i)^2} \sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_j)^2}}

where U is the set of all users who have rated both items i and j, r_{u,i} is the rating of user u on item i, and \bar{r}_i is the average rating of the i-th item across users.
Item-to-Item Matching

Now, the rating of item i for user a can be predicted using a simple weighted average, as in:

    p_{a,i} = \frac{\sum_{j \in K} r_{a,j} \, w_{i,j}}{\sum_{j \in K} w_{i,j}}

where K is the neighborhood set of the k items rated by a that are most similar to i, and r_{a,j} is the rating of user a on item j.
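A sketch of this item-based prediction, assuming the item-item weights w_{i,j} have already been computed off-line; the item names and similarity values are illustrative. As in the user-based sketch, the denominator uses |w| to guard against negative weights.

```python
def predict_item(rated, sims):
    """Weighted average of the user's own ratings on items similar to i.

    `rated` maps item -> the active user's rating r_{a,j};
    `sims` maps item j in the neighborhood K -> its precomputed
    similarity w_{i,j} to the target item i.
    """
    num = sum(rated[j] * w for j, w in sims.items() if j in rated)
    den = sum(abs(w) for j, w in sims.items() if j in rated)
    return num / den if den else None

# Illustrative neighborhood K of items the user has rated:
rated = {"Contact": 5, "Titanic": 4}
sims = {"Contact": 0.9, "Titanic": 0.3}  # similarity of each item to i
print(predict_item(rated, sims))  # → 4.75
```

The prediction leans toward the Contact rating because that item is far more similar to the target.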
Significance Weighting

• Another extension to neighborhood-based CF.
• Addresses the problem of bad predictors generated when the active user has highly correlated neighbors that are based on very few co-rated (overlapping) items.
• The idea: multiply the similarity weight by a significance weighting factor, which devalues correlations based on few co-rated items.
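One common form of the factor, a sketch with an assumed threshold of 50 co-rated items (the slide does not fix a specific factor; this n/50 form is due to Herlocker et al.):

```python
def significance_weight(w_au, n_common, threshold=50):
    """Scale the similarity w_au down when few items were co-rated.

    The factor min(n_common, threshold) / threshold is 1 once the users
    share at least `threshold` co-rated items, and shrinks linearly below.
    """
    return w_au * min(n_common, threshold) / threshold

print(significance_weight(0.9, 5))   # 5 co-rated items: heavily devalued
print(significance_weight(0.9, 80))  # ample overlap: weight unchanged
```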
Inverse User Frequency

• Yet another extension to neighborhood-based CF.
• Addresses the problem of the dominance of items that have been rated by all (and universally liked or disliked), yet are not as useful as less common items.
• The idea: weight an item's rating by the inverse of the frequency with which the item is rated.
Inverse User Frequency

When measuring the similarity between users, items that have been rated by all (and universally liked or disliked) are not as useful as less common items. To account for this, compute

    f_i = \log \frac{n}{n_i}

where n_i is the number of users who have rated item i out of the total number of n users. To apply inverse user frequency while using similarity-based CF, the original rating for item i is transformed by multiplying it by the factor f_i.
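The factor is easy to compute; this sketch shows its key property: f_i is 0 for an item everyone rated (so it contributes nothing to similarity) and grows for rarer items. The user counts are illustrative.

```python
from math import log

def inverse_user_frequency(n_users, n_rated_item):
    """f_i = log(n / n_i) for an item rated by n_i of n users."""
    return log(n_users / n_rated_item)

n = 1000
print(inverse_user_frequency(n, 1000))  # rated by everyone → 0.0
print(inverse_user_frequency(n, 10))    # rare item: large weight
```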
And Now…

Let's run "the data mining" on some data!
References

• Prem Melville and Vikas Sindhwani. Recommender Systems. In Encyclopedia of Machine Learning, Claude Sammut and Geoffrey Webb (Eds.), Springer, 2010.