s07 Filtering Class
Transcript of s07 Filtering Class
8/13/2019 s07 Filtering Class
http://slidepdf.com/reader/full/s07-filtering-class 1/30
Filtering and Recommender Systems
Content-based and Collaborative
Personalization
• Recommenders are instances of personalization software.
• Personalization concerns adapting to the individual needs, interests, and preferences of each user.
• Includes:
  – Recommending
  – Filtering
  – Predicting (e.g. form or calendar appt. completion)
• From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
Feedback & Prediction/Recommendation
• Traditional IR has a single user, probably working in single-shot mode
  – Relevance feedback… (you know this one)
• Web search engines have:
  – Working continually
    • User profiling
      – The profile is a "model" of the user (and also relevance feedback)
  – Many users
    • Collaborative filtering
      – Propagate user preferences to other users…
Recommender Systems in Use
• Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.
• Many on-line stores provide recommendations (e.g. Amazon, CDNow).
• Recommenders have been shown to substantially increase sales at on-line stores.
Feedback Detection
Non-intrusive:
– Click certain pages in a certain order while ignoring most pages.
– Read some clicked pages longer than some other clicked pages.
– Save/print certain clicked pages.
– Follow some links in clicked pages to reach more pages.
– Buy items / put them in wish-lists or shopping carts.
Intrusive:
– Explicitly ask users to rate items/pages.
Justifying Recommendation
• Recommendation systems must justify their recommendations
  – Even if the justification is bogus…
  – For search engines, the "justifications" are the page synopses
• Some recommendation algorithms are better at providing human-understandable justifications than others
  – Content-based ones can justify in terms of classifier features…
  – Collaborative ones are hard-pressed to say more than "people like you seem to like this stuff"
• In general, giving good justifications is important…
Content-based vs. Collaborative Recommendation

[Figure: Content/Profile-based — items (Red Mars, Jurassic Park, Lost World, 2001, Foundation, Difference Engine, Neuromancer, 2010) matched against a user profile built by machine learning]

[Figure: Collaborative Filtering — a user database of item-rating vectors over items A…Z; the active user's ratings are correlation-matched against the other users' vectors to extract recommendations]
Collaborative Filtering

[Figure: the same collaborative-filtering diagram — a user database of item-rating vectors (items A…Z), a correlation match against the active user, and extraction of recommendations]
Item-User Matrix
• The input to the collaborative filtering algorithm is an m×n matrix where rows are items and columns are users
  – Sort of like a term-document matrix (items are terms and documents are users)
• Can think of users as vectors in the space of items (or vice versa)
  – Can do vector similarity between users
    • And find which users are most similar…
  – Can do scalar clusters over items etc.
    • And find which items are most correlated
[Margin note: think users ≈ docs, items ≈ keywords]
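The "users as vectors in item space" view can be sketched as follows, with a small hypothetical ratings matrix (0 marks an unrated item) and plain cosine similarity between user columns:

```python
import numpy as np

# Hypothetical item-user matrix: rows = items, columns = users,
# 0 marks an unrated item (the matrix is typically sparse).
R = np.array([
    [9, 0, 5, 0, 6, 10],   # item A
    [3, 0, 3, 0, 4, 4],    # item B
    [0, 9, 0, 8, 0, 8],    # item C
    [5, 10, 7, 0, 0, 1],   # item Z
])

users = R.T  # each user is a vector in the space of items

def cosine(u, v):
    """Cosine similarity between two user vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Similarity of user 0 to every other user
sims = [cosine(users[0], users[j]) for j in range(1, len(users))]
```

The same code transposed (rows as vectors) gives item-item similarity, mirroring the term/keyword analogy on the slide.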
A Collaborative Filtering Method
(think kNN regression)
• Weight all users with respect to similarity with the active user.
  – How to measure similarity?
    • Could use cosine similarity; normally the Pearson coefficient is used
• Select a subset of the users (neighbors) to use as predictors.
• Normalize ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
• Present items with highest predicted ratings as recommendations.
3/27
Homework 2 solutions posted. Midterm on Thursday in class; covers everything covered by the first two homeworks. Questions?
Today:
– Complete filtering
– Discuss Das/Datar paper
Finding User Similarity with the Pearson Correlation Coefficient
• Typically use the Pearson correlation coefficient between ratings for the active user, a, and another user, u:

c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}

\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}

\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}

\bar{r}_x = \frac{\sum_{i=1}^{m} r_{x,i}}{m}

where r_a and r_u are the ratings vectors for the m items rated by both a and u, and r_{i,j} is user i's rating for item j.
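The definitions above translate directly into code; here is a minimal sketch, assuming a hypothetical dict-of-ratings representation (item → rating) for each user:

```python
import math

def pearson(ra, rb):
    """Pearson correlation over the m items co-rated by two users.
    ra, rb: dicts mapping item -> rating (a hypothetical format)."""
    common = [i for i in ra if i in rb]
    m = len(common)
    if m == 0:
        return 0.0
    mean_a = sum(ra[i] for i in common) / m
    mean_b = sum(rb[i] for i in common) / m
    cov = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in common) / m
    sd_a = math.sqrt(sum((ra[i] - mean_a) ** 2 for i in common) / m)
    sd_b = math.sqrt(sum((rb[i] - mean_b) ** 2 for i in common) / m)
    if sd_a == 0 or sd_b == 0:
        return 0.0
    return cov / (sd_a * sd_b)

# Two users with perfectly correlated ratings score ≈ 1.0
print(pearson({"A": 1, "B": 2, "C": 3}, {"A": 2, "B": 4, "C": 6}))
```

Note that, unlike cosine similarity, Pearson centers each user's ratings around their own mean, so a harsh grader and a generous grader with the same taste still correlate strongly.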
Neighbor Selection
• For a given active user, a, select correlated users to serve as the source of predictions.
• The standard approach is to use the most similar k users, u, based on similarity weights w_{a,u}.
• An alternate approach is to include all users whose similarity weight is above a given threshold.
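Both selection strategies are one-liners; a small sketch, assuming a hypothetical dict of user → similarity weight:

```python
def select_neighbors(weights, k=20, threshold=None):
    """weights: dict mapping user id -> similarity weight w_{a,u}.
    Standard approach: top-k most similar; alternate: all above a threshold."""
    if threshold is not None:
        return [u for u, w in weights.items() if w > threshold]
    return sorted(weights, key=weights.get, reverse=True)[:k]

w = {"u1": 0.9, "u2": 0.2, "u3": 0.7, "u4": -0.1}
top2 = select_neighbors(w, k=2)            # the two most similar users
above = select_neighbors(w, threshold=0.5) # everyone above 0.5
```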
Rating Prediction
• Predict a rating, p_{a,i}, for each item i, for the active user, a, by using the k selected neighbor users, u ∈ {1, 2, …, k}.
• To account for users' different rating levels, base predictions on differences from a user's average rating.
• Weight users' ratings contributions by their similarity to the active user:

p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} w_{a,u}\,(r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{n} |w_{a,u}|}

where r_{i,j} is user i's rating for item j and w_{a,u} is derived from the Pearson correlation c_{a,u} = \mathrm{covar}(r_a, r_u) / (\sigma_{r_a} \sigma_{r_u}).
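The prediction formula above can be sketched as follows, assuming a hypothetical neighbor representation of (similarity weight, user mean, ratings dict):

```python
def predict(active_mean, neighbors, item):
    """p_{a,i} = r_bar_a + sum_u w_{a,u}(r_{u,i} - r_bar_u) / sum_u |w_{a,u}|.
    neighbors: list of (w_au, user_mean, ratings_dict); a hypothetical format."""
    num = den = 0.0
    for w, mean_u, ratings in neighbors:
        if item in ratings:  # only neighbors who actually rated the item count
            num += w * (ratings[item] - mean_u)
            den += abs(w)
    # With no usable neighbors, fall back to the active user's own mean
    return active_mean if den == 0 else active_mean + num / den

neighbors = [
    (0.9, 3.0, {"X": 5.0, "Y": 2.0}),  # similar user who liked X above their average
    (0.4, 4.0, {"X": 4.0}),            # less similar user, rated X at their average
]
p = predict(3.5, neighbors, "X")
```

Working in deviations from each user's mean is what makes the normalization step harmless: a neighbor who rates everything high contributes only how much *more* than usual they liked the item.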
Similarity Weighting = User Similarity
• Typically use the Pearson correlation coefficient between ratings for the active user, a, and another user, u:

c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}

\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}

\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}

\bar{r}_x = \frac{\sum_{i=1}^{m} r_{x,i}}{m}

where r_a and r_u are the ratings vectors for the m items rated by both a and u, and r_{i,j} is user i's rating for item j.
Significance Weighting
• Important not to trust correlations based on very few co-rated items.
• Include significance weights, s_{a,u}, based on the number of co-rated items, m:

w_{a,u} = s_{a,u}\, c_{a,u}

s_{a,u} = \begin{cases} 1 & \text{if } m > 50 \\ m/50 & \text{if } m \le 50 \end{cases}

where c_{a,u} = \mathrm{covar}(r_a, r_u) / (\sigma_{r_a} \sigma_{r_u}) is the Pearson correlation.
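As code, the significance weight is a simple clamp (the cutoff of 50 is the value from the slide):

```python
def significance_weight(m, cutoff=50):
    """s_{a,u}: devalue correlations computed from few co-rated items."""
    return 1.0 if m > cutoff else m / cutoff

def combined_weight(c_au, m):
    """w_{a,u} = s_{a,u} * c_{a,u}"""
    return significance_weight(m) * c_au
```

So a perfect correlation computed from only 5 co-rated items is discounted to 0.1 of its face value, while one computed from 100 items is trusted fully.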
Problems with Collaborative Filtering
• Cold Start: there needs to be enough other users already in the system to find a match.
• Sparsity: if there are many items to be recommended, even if there are many users, the user/ratings matrix is sparse, and it is hard to find users that have rated the same items.
• First Rater: cannot recommend an item that has not been previously rated.
  – New items
  – Esoteric items
• Popularity Bias: cannot recommend items to someone with unique tastes.
  – Tends to recommend popular items.
    • WHAT DO YOU MEAN YOU DON'T CARE FOR BRITNEY SPEARS, YOU DUNDERHEAD? #$%$%$&^
Content-Based Recommending
• Recommendations are based on information about the content of items rather than on other users' opinions.
• Uses machine learning algorithms to induce a profile of the user's preferences from examples, based on a featural description of content.
• Lots of systems
Adapting the Naïve Bayes Idea for Book Recommendation
• Vector of Bags model
  – E.g. books have several different fields that are all text
    • Authors, description, …
  – A word appearing in one field is different from the same word appearing in another
  – Want to keep each bag different: a vector of m bags, with conditional probabilities for each word w.r.t. each class and bag
• Can give a profile of a user in terms of the words that are most predictive of what they like
  – Odds ratio: P(rel|example)/P(~rel|example); an example is positive if the odds ratio is > 1
  – Strength of a keyword: log[P(w|rel)/P(w|~rel)]
  – We can summarize a user's profile in terms of the words that have strength above some threshold.

P(c_j \mid \text{Book}) = \frac{P(c_j)}{P(\text{Book})} \prod_{m=1}^{S} \prod_{i=1}^{|d_m|} P(a_{m,i} = s_{m,i} \mid c_j, m)
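The keyword-strength profile can be sketched in a few lines; the per-word probabilities below are hypothetical stand-ins for what a trained Naïve Bayes model would supply:

```python
import math

# Hypothetical per-word probabilities from a learned Naive Bayes profile.
p_rel = {"robot": 0.08, "space": 0.05, "romance": 0.01}
p_notrel = {"robot": 0.01, "space": 0.02, "romance": 0.06}

def strength(w):
    """Strength of keyword w: log[P(w|rel) / P(w|~rel)]."""
    return math.log(p_rel[w] / p_notrel[w])

# The user's profile: words whose strength exceeds some threshold
profile = sorted(w for w in p_rel if strength(w) > 1.0)
```

Positive strength means the word is evidence *for* relevance; negative strength is evidence against; thresholding keeps only the strongly predictive words as the human-readable profile.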
Advantages of Content-Based Approach
• No need for data on other users.
  – No cold-start or sparsity problems.
• Able to recommend to users with unique tastes.
• Able to recommend new and unpopular items.
  – No first-rater problem.
• Can provide explanations of recommended items by listing the content features that caused an item to be recommended.
• Well-known technology: the entire field of classification learning is at (y)our disposal!
Disadvantages of Content-Based Method
• Requires content that can be encoded as meaningful features.
• Users' tastes must be represented as a learnable function of these content features.
• Unable to exploit quality judgments of other users.
  – Unless these are somehow included in the content features.
Content-Boosted CF - I

[Figure: a content-based predictor, trained on the user-rated items (training examples), predicts ratings for the unrated items; the user-ratings vector combined with the items with predicted ratings forms a pseudo user-ratings vector]
Content-Boosted CF - II
• Compute a pseudo user-ratings matrix
  – A full matrix that approximates the actual full user-ratings matrix
• Perform CF
  – Using Pearson correlation between pseudo user-rating vectors
• This works better than either approach alone!

[Figure: user ratings matrix → content-based predictor → pseudo user ratings matrix]
Why can't the pseudo ratings be used to help content-based filtering?
• How about using the pseudo ratings to improve the content-based filter itself? (Or: how access to unlabelled examples improves accuracy…)
  – Learn an NBC classifier C0 using the few items for which we have user ratings
  – Use C0 to predict the ratings for the rest of the items
  – Loop:
    • Learn a new classifier C1 using all the ratings (real and predicted)
    • Use C1 to (re-)predict the ratings for all the unknown items
  – Until no change in ratings
• With a small change, this actually works in finding a better classifier!
  – Change: keep the class posterior prediction (rather than just the max class)
    • This means that each (unlabelled) entity could belong to multiple classes, with fractional membership in each
    • We weight the counts by the membership fractions
      – E.g. P(A=v|c) = sum of class weights of all examples in c that have A=v, divided by the sum of class weights of all examples in c
• This is called expectation maximization
  – Very useful on the web, where you have tons of data but very little of it is labelled
  – Reminds you of k-means, doesn't it?
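The loop above can be sketched as follows. Everything here is hypothetical toy data: items are bags of words, classes are "like"/"dislike", and the key detail from the slide is that unrated items keep their fractional class memberships (the posterior), which then weight the counts when retraining:

```python
import math
from collections import defaultdict

classes = ["like", "dislike"]
rated = [({"robots", "space"}, "like"), ({"romance"}, "dislike")]
unrated = [{"space", "aliens"}, {"romance", "drama"}]
vocab = set().union(*(d for d, _ in rated), *unrated)

def train(weighted_docs):
    """Naive Bayes with fractionally weighted counts (Laplace smoothing)."""
    word_counts = {c: defaultdict(float) for c in classes}
    prior = {c: 0.0 for c in classes}
    for doc, weights in weighted_docs:
        for c, wt in weights.items():
            prior[c] += wt
            for w in doc:
                word_counts[c][w] += wt
    total = sum(prior.values())

    def posterior(doc):
        logp = {}
        for c in classes:
            lp = math.log(prior[c] / total)
            denom = sum(word_counts[c].values()) + len(vocab)
            for w in doc:
                lp += math.log((word_counts[c][w] + 1) / denom)
            logp[c] = lp
        z = max(logp.values())                     # normalize in log space
        exps = {c: math.exp(v - z) for c, v in logp.items()}
        s = sum(exps.values())
        return {c: e / s for c, e in exps.items()}

    return posterior

labeled = [(d, {c: 1.0}) for d, c in rated]   # C0: the few real ratings
posterior = train(labeled)
for _ in range(3):                             # E/M rounds (fixed count here)
    soft = [(d, posterior(d)) for d in unrated]
    posterior = train(labeled + soft)
```

Keeping the full posterior rather than the argmax is exactly the "small change" the slide mentions: each unrated item contributes to both classes' counts, in proportion to its membership.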
[Figure: (boosted) content filtering]
Co-training
• Suppose each instance has two parts:
  – x = [x1, x2]
  – x1, x2 conditionally independent given f(x)
• Suppose each half can be used to classify the instance
  – f1, f2 such that f1(x1) = f2(x2) = f(x)
• Suppose f1, f2 are learnable
  – f1 ∈ H1, f2 ∈ H2, with learning algorithms A1, A2

[Figure: A1 learns a hypothesis f1 from the small set of labeled instances <[x1, x2], f(x)>; f1 then labels unlabeled instances [x1, x2], which A2 uses to learn f2]

Small labeled data needed. "You train me, I train you…"
Observations
• Can apply A1 to generate as much training data as one wants
  – If x1 is conditionally independent of x2 given f(x),
  – then the errors in the labels produced by A1 will look like random noise to A2!
• Thus there is no limit to the quality of the hypothesis A2 can make
Discussion of the Google News Collaborative Filtering Paper