Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an...
Date posted: 19-Dec-2015
![Page 1: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/1.jpg)
COLLABORATIVE FILTERING
![Page 2: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/2.jpg)
Rubi’s Motivation for CF
Find a PhD problem
Find “real life” PhD problem
Find an interesting PhD problem
Make Money!
![Page 3: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/3.jpg)
Recommender Systems
Basic implementations: most popular / cheap / etc.
New items
Can they go shopping together?
![Page 4: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/4.jpg)
Live Demonstrations
Amazon
Netflix
XBOX360 usage: http://www.youtube.com/watch?v=IitD0hdOCvA
![Page 5: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/5.jpg)
Netflix Example
![Page 6: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/6.jpg)
Netflix Example
![Page 7: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/7.jpg)
Netflix Prize
Goal: Improve the accuracy of predictions about how much someone is going to love a movie by 10%
Started in 2006 (running until 2011 at the latest)
Prize: $1,000,000
September 2009: 10.06% improvement!! by BellKor's Pragmatic Chaos
![Page 8: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/8.jpg)
Recommender Systems
Personalized Recommendations!!!
Predict user ratings
Provide recommendations
Attempt to profile user preferences
Model interaction between users and product
![Page 9: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/9.jpg)
Recommender Systems
Requirements: Provide good recommendations (duh)
Justify the recommendation
Feasible in Run-Time
![Page 10: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/10.jpg)
Strategies
Content-Based
Collaborative Filtering (CF)
![Page 11: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/11.jpg)
Content-Based
Actors: Will Smith, Martin…
Genre: Action / Comedy
Director: Michael Bay
![Page 12: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/12.jpg)
Content-Based - VSM
Domain of Features (labels shown on the slide): Will Smith, Michael Bay, Action, Comedy, Pamela Anderson
Describing Vector (binary entries as shown on the slide): 0, 1, 0, 0, 1, 0, 1, 1
![Page 13: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/13.jpg)
Comparing Two Vectors
Calculate the angle between the vectors
Easier to calculate the cosine
$$\cos\theta = \frac{v_1 \cdot v_2}{\|v_1\| \, \|v_2\|}$$
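The cosine comparison of two describing vectors can be sketched in a few lines of Python. This is only an illustration: the feature order and the three example movies are hypothetical and are not taken from the slides.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        # Convention chosen for this sketch: an all-zero vector matches nothing.
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical feature order:
# [Will Smith, Michael Bay, Action, Comedy, Pamela Anderson]
movie_a = [1, 1, 1, 1, 0]
movie_b = [1, 0, 0, 1, 0]
movie_c = [0, 0, 0, 0, 1]

print(cosine_similarity(movie_a, movie_b))  # shares two features with movie_a
print(cosine_similarity(movie_a, movie_c))  # shares no features with movie_a
```

A value near 1 means the vectors point in almost the same direction ("near" vectors); a value of 0 means the two items share no features.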
![Page 14: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/14.jpg)
VSM – “near” vectors
- Michael Bay - Action
- Will Smith - Comedy
![Page 15: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/15.jpg)
Content-Based - Disadvantages
Static
Can’t find “special” correlations
Requires gathering external information
![Page 16: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/16.jpg)
Collaborative Filtering
Relies only on users' behavior
No profiles are required
Analyzes the relationships between users and items
![Page 17: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/17.jpg)
CF - Levels
Neighborhood Based (local area)
Factorization Based (regional area)
![Page 18: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/18.jpg)
CF – Neighborhood Based
![Page 19: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/19.jpg)
CF – Neighborhood Based
![Page 20: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/20.jpg)
CF – Neighborhood Based
![Page 21: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/21.jpg)
CF – Neighborhood Based
![Page 22: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/22.jpg)
CF – Neighborhood Based
![Page 23: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/23.jpg)
CF – Neighborhood Based
CF Algorithms
![Page 24: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/24.jpg)
A little more formally
Missing value estimation
User-Item matrix of scores
Predict unknown scores within the matrix
![Page 25: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/25.jpg)
Scores??
According to: Purchases
Rating
Browsing history
…
![Page 26: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/26.jpg)
Formally..
M: the set of users, $|M| = m$
N: the set of items, $|N| = n$
R: an $m \times n$ matrix
$r_{u,i}$: the rating given by user u to item i
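As a rough illustration of this setup, the user-item matrix can be stored sparsely so that only known scores take up space. The user names, item names, and helper functions below are hypothetical and exist only for this sketch.

```python
# Only known entries r[u][i] are stored; everything else is "unknown".
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m3": 2},
    "carol": {"m2": 1, "m3": 5},
}


def known_rating(ratings, user, item):
    """Return r_{u,i} if it is known, otherwise None."""
    return ratings.get(user, {}).get(item)


def missing_entries(ratings):
    """List the (user, item) cells of R that a CF algorithm has to predict."""
    items = sorted({i for row in ratings.values() for i in row})
    return [(u, i) for u in sorted(ratings) for i in items if i not in ratings[u]]


print(known_rating(ratings, "alice", "m1"))
print(missing_entries(ratings))
```

Predicting the scores for the cells returned by `missing_entries` is the "missing value estimation" task.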
![Page 27: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/27.jpg)
More Problems
Massive amount of Data
99% of the matrix R is unknown (sparse matrix)
Data is NOT uniform across users & items
![Page 28: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/28.jpg)
Netflix Real-Life Data
17,700 Movies
480,000 Users
(rating in a scale of 1-5)
Over 100,000,000 Ratings!!
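These numbers can be checked against the earlier claim that about 99% of R is unknown. The sketch below uses exactly 100,000,000 ratings; since the slide says "over", the real density is slightly higher.

```python
n_movies = 17_700
n_users = 480_000
n_ratings = 100_000_000  # the slide says "over 100,000,000", so a lower bound

cells = n_movies * n_users    # total number of entries in R
density = n_ratings / cells   # fraction of entries that are known

print(f"cells in R:       {cells:,}")
print(f"known fraction:   {density:.2%}")
print(f"unknown fraction: {1 - density:.2%}")
```

With these figures R has about 8.5 billion cells, of which a little over 1% are filled in.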
![Page 29: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/29.jpg)
Netflix – How to Win??
Quality is measured by RMSE (which puts more emphasis on large errors)
Predict 1,400,000 unknown ratings and compare them to the real ratings
Improve Netflix’s system (Cinematch) by 10%
![Page 30: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/30.jpg)
Netflix – How to Win??
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i) \in \mathrm{TestSet}} \left(\hat{r}_{u,i} - r_{u,i}\right)^2}{|\mathrm{TestSet}|}}$$
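A minimal sketch of the RMSE computation follows. The test-set layout (a list of `(user, item, true_rating)` triples) and the constant predictor are assumptions made for illustration only.

```python
import math


def rmse(test_set, predict):
    """Root mean squared error of `predict` over a test set.

    test_set: list of (user, item, true_rating) triples
    predict:  callable (user, item) -> predicted rating
    """
    if not test_set:
        raise ValueError("empty test set")
    squared_error = sum((predict(u, i) - r) ** 2 for u, i, r in test_set)
    return math.sqrt(squared_error / len(test_set))


# Hypothetical test set and a trivial predictor that always answers 3.
test_set = [("u1", "a", 4), ("u1", "b", 2), ("u2", "a", 5)]


def always_three(user, item):
    return 3.0


print(rmse(test_set, always_three))
```

Here the errors are 1, 1 and 2; squaring makes the single error of 2 contribute as much as four errors of 1 would, which is the "more emphasis on large errors" property.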
![Page 31: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/31.jpg)
Netflix – Leaderboard
![Page 32: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/32.jpg)
Netflix – Statistics
51,051 contestants, 41,305 teams
186 countries
44,014 valid submissions from 5,169 different teams
![Page 33: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/33.jpg)
OK, so what's the plan?
Find a “good” neighborhood: http://www.youtube.com/watch?v=XOw-ak2aJS8
(p.s. what about YouTube's related videos?)
Take a weighted average of the neighbors' ratings
![Page 34: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/34.jpg)
More Specifically
User-Based: N(u;i) is the set of users who rate similarly to u and actually rated i

$$\hat{r}_{u,i} = \frac{\sum_{v \in N(u;i)} s_{u,v} \, r_{v,i}}{\sum_{v \in N(u;i)} s_{u,v}}$$
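The user-based estimate is a similarity-weighted average of the neighbors' ratings for item i. A minimal sketch, assuming the neighborhood N(u;i) and the similarities to u have already been computed (all names and values below are hypothetical):

```python
def predict_user_based(ratings, sims_to_u, neighbors, item):
    """Similarity-weighted average of the neighbors' ratings for `item`.

    ratings:   dict user -> dict item -> rating
    sims_to_u: dict neighbor v -> similarity between the target user u and v
    neighbors: the set N(u;i); every neighbor must have rated `item`
    """
    numerator = sum(sims_to_u[v] * ratings[v][item] for v in neighbors)
    denominator = sum(sims_to_u[v] for v in neighbors)
    if denominator == 0:
        return None  # no usable neighbors, so no estimate in this sketch
    return numerator / denominator


ratings = {"bob": {"m1": 4}, "carol": {"m1": 2}}
sims_to_u = {"bob": 0.9, "carol": 0.3}

print(predict_user_based(ratings, sims_to_u, ["bob", "carol"], "m1"))
```

Because bob is three times as similar to u as carol, the estimate lands closer to bob's rating of 4 than to carol's rating of 2.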
![Page 35: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/35.jpg)
$s_{u,v}$
Key role! Used for:
Selecting N(u;i)
Weighting
Most popular implementations:
Pearson correlation coefficient
Cosine similarity
![Page 36: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/36.jpg)
Pearson correlation coefficient
I(u,v) – Set of all items rated by both u and v
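$$s_{u,v} = \frac{\sum_{k \in I(u,v)} (r_{u,k} - \bar{r}_u)(r_{v,k} - \bar{r}_v)}{\sqrt{\sum_{k \in I(u,v)} (r_{u,k} - \bar{r}_u)^2} \; \sqrt{\sum_{k \in I(u,v)} (r_{v,k} - \bar{r}_v)^2}}$$

where $\bar{r}_u$ and $\bar{r}_v$ denote the mean ratings of u and v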
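A sketch of the Pearson similarity between two users follows. Two choices here are assumptions of this sketch rather than something the slides specify: the means are taken over the co-rated items I(u,v) only, and the similarity is defined as 0 when there are fewer than two co-rated items or when one user's co-rated ratings are all equal.

```python
import math


def pearson(ratings_u, ratings_v):
    """Pearson correlation of two users over their co-rated items I(u,v)."""
    common = set(ratings_u) & set(ratings_v)  # I(u,v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[k] for k in common) / len(common)
    mean_v = sum(ratings_v[k] for k in common) / len(common)
    num = sum((ratings_u[k] - mean_u) * (ratings_v[k] - mean_v) for k in common)
    den_u = math.sqrt(sum((ratings_u[k] - mean_u) ** 2 for k in common))
    den_v = math.sqrt(sum((ratings_v[k] - mean_v) ** 2 for k in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # constant ratings carry no correlation information
    return num / (den_u * den_v)


u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2, "d": 5}  # same taste as u, narrower scale
w = {"a": 1, "b": 3, "c": 5}          # opposite taste to u

print(pearson(u, v))
print(pearson(u, w))
```

Users u and v agree on the ordering of the shared items even though they use the rating scale differently, so their correlation is close to 1; u and w disagree completely, so theirs is close to -1.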
![Page 37: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/37.jpg)
N(u;i)
Most popular / easiest ways:
Correlation threshold
Best-n neighbors
What about external data?
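The two selection rules named on this slide, correlation threshold and best-n neighbors, can be sketched as follows. The sketch assumes the candidate similarities are already restricted to users who rated item i; the names and values are hypothetical.

```python
def neighbors_by_threshold(sims, threshold):
    """Keep every candidate whose similarity reaches the threshold."""
    return sorted(v for v, s in sims.items() if s >= threshold)


def best_n_neighbors(sims, n):
    """Keep the n most similar candidates, most similar first."""
    return sorted(sims, key=sims.get, reverse=True)[:n]


# Hypothetical similarities between the target user and candidate neighbors.
sims = {"bob": 0.9, "carol": 0.3, "dave": -0.2, "erin": 0.6}

print(neighbors_by_threshold(sims, 0.5))
print(best_n_neighbors(sims, 2))
```

A threshold gives neighborhoods of varying size (possibly empty), while best-n always returns up to n users even if some of them are only weakly similar.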
![Page 38: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/38.jpg)
Social Networks!
![Page 39: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/39.jpg)
Social Networks, Hot Topics
MySpace
Delicious
Flickr
![Page 40: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/40.jpg)
Quick Summary
Two main parameters: How to choose the neighbors
How to choose the weights
![Page 41: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/41.jpg)
What about performance?
Netflix Data: N = 17,700, M = 480,000
Calculating N(u;i) is expensive
M >> N
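One way to see why M >> N matters is to count how many pairwise similarities each approach could need. This is only a back-of-the-envelope illustration; the slides do not state how the cost was measured.

```python
from math import comb

n_items = 17_700   # N
n_users = 480_000  # M

user_pairs = comb(n_users, 2)  # candidate user-user similarities
item_pairs = comb(n_items, 2)  # candidate item-item similarities

print(f"user pairs: {user_pairs:,}")
print(f"item pairs: {item_pairs:,}")
print(f"ratio:      {user_pairs / item_pairs:.0f}x")
```

With the Netflix sizes there are roughly 115 billion user pairs but only about 157 million item pairs, a gap of more than 700x.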
![Page 42: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/42.jpg)
Item-Based
Instead of “user” neighbors, use “item” neighbors
Estimate using the known ratings the user gave to similar items
![Page 43: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/43.jpg)
More Specifically
Item-Based: N(i;u) is the set of items that other users rate similarly to i and that u has actually rated

$$\hat{r}_{u,i} = \frac{\sum_{j \in N(i;u)} s_{i,j} \, r_{u,j}}{\sum_{j \in N(i;u)} s_{i,j}}$$
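The item-based estimate mirrors the user-based one, but the weights are item-item similarities and all ratings come from the target user. A minimal sketch, assuming N(i;u) and the similarities to item i are already available (names and values are hypothetical):

```python
def predict_item_based(user_ratings, sims_to_i, neighbors):
    """Similarity-weighted average of the user's own ratings on similar items.

    user_ratings: dict item j -> rating given by the target user u
    sims_to_i:    dict item j -> similarity between the target item i and j
    neighbors:    the set N(i;u); every item in it must have been rated by u
    """
    numerator = sum(sims_to_i[j] * user_ratings[j] for j in neighbors)
    denominator = sum(sims_to_i[j] for j in neighbors)
    if denominator == 0:
        return None  # no usable neighbors, so no estimate in this sketch
    return numerator / denominator


user_ratings = {"m2": 4, "m3": 2}
sims_to_i = {"m2": 0.8, "m3": 0.2}

print(predict_item_based(user_ratings, sims_to_i, ["m2", "m3"]))
```

The estimate is pulled toward the rating of m2 because m2 is the item most similar to the target item.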
![Page 44: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/44.jpg)
Reminder..
User-Based: N(u;i) is the set of users who rate similarly to u and actually rated i

$$\hat{r}_{u,i} = \frac{\sum_{v \in N(u;i)} s_{u,v} \, r_{v,i}}{\sum_{v \in N(u;i)} s_{u,v}}$$
![Page 45: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/45.jpg)
Why is it better?
Similarities are between items (not users)
Pre-compute all $s_{i,j}$
Provide better recommendations?
Easier Justification
Most industry systems use it (Amazon)
![Page 46: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/46.jpg)
Checkpoint
We know the basics
Can we “Tweak” the basic algorithm?
![Page 47: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/47.jpg)
“Tweaks” - Normalized Data
Some rate 3 and some 5 for movies they liked
Old solution: normalize the dataset
New solution: predict the change from the average rating instead of the rating
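Predicting the change from the average can be sketched as below. The slides do not give a formula, so this uses one common formulation as an assumption: each neighbor contributes its deviation from its own mean rating, the weighted deviation is normalized by the sum of absolute similarities, and the result is added to the target user's mean. All names and values are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predict_mean_centered(ratings, sims_to_u, user, item):
    """Target user's mean plus the weighted average deviation of the neighbors."""
    base = mean(ratings[user].values())
    neighbors = [v for v in sims_to_u if item in ratings.get(v, {})]
    denominator = sum(abs(sims_to_u[v]) for v in neighbors)
    if denominator == 0:
        return base  # fall back to the user's own average
    numerator = sum(
        sims_to_u[v] * (ratings[v][item] - mean(ratings[v].values()))
        for v in neighbors
    )
    return base + numerator / denominator


ratings = {
    "u": {"a": 2, "b": 4},      # mean 3.0
    "bob": {"a": 4, "x": 5},    # generous rater, mean 4.5
    "carol": {"b": 1, "x": 3},  # harsh rater, mean 2.0
}
sims_to_u = {"bob": 0.9, "carol": 0.3}

print(predict_mean_centered(ratings, sims_to_u, "u", "x"))
```

Carol's 3 counts as a strong endorsement (a full point above her average), while bob's 5 is only half a point above his, so the raw scale each neighbor uses no longer distorts the estimate.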
![Page 48: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/48.jpg)
“Tweaks” - Remove Global Effects
A user rates 5 all the time
A user rated 10,000 movies
Remove old ratings?
Using the time variable is not a “tweak”..
![Page 49: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/49.jpg)
TAU’s Current Research
Distributed CF!!!
“Server” level
![Page 50: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/50.jpg)
Distributed CF
![Page 51: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/51.jpg)
Distributed CF
![Page 52: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/52.jpg)
Distributed CF
![Page 53: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/53.jpg)
Distributed CF
![Page 54: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/54.jpg)
Distributed CF
![Page 55: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/55.jpg)
Distributed CF
![Page 56: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/56.jpg)
Distributed CF
![Page 57: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/57.jpg)
Distributed CF
?
?
![Page 58: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/58.jpg)
Shared Users
![Page 59: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/59.jpg)
Shared Users
![Page 60: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/60.jpg)
Shared Items
![Page 61: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/61.jpg)
Shared Items
![Page 62: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/62.jpg)
How To Do It????
Copy all data to one server?
CF algorithms do not scale linearly
Privacy
Bandwidth
![Page 63: Rubi’s Motivation for CF Find a PhD problem Find “real life” PhD problem Find an interesting PhD problem Make Money!](https://reader030.fdocuments.us/reader030/viewer/2022032703/56649d2e5503460f94a058bf/html5/thumbnails/63.jpg)
TAU’s Solution
Join TAU’s DB group for more info