s07 Filtering Class
Transcript of s07 Filtering Class
8/13/2019 s07 Filtering Class
http://slidepdf.com/reader/full/s07-filtering-class 1/30
Filtering and Recommender Systems
Content-based and Collaborative
Personalization
• Recommenders are instances of personalization software.
• Personalization concerns adapting to the individual needs, interests, and preferences of each user.
• Includes:
  – Recommending
  – Filtering
  – Predicting (e.g. form or calendar appt. completion)
• From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
Feedback & Prediction/Recommendation
• Traditional IR has a single user, probably working in single-shot mode
  – Relevance feedback… (you know this one)
• Web search engines have:
  – Working continually
    • User profiling
      – The profile is a "model" of the user (and also relevance feedback)
  – Many users
    • Collaborative filtering
      – Propagate user preferences to other users…
Recommender Systems in Use
• Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.
• Many on-line stores provide recommendations (e.g. Amazon, CDNow).
• Recommenders have been shown to substantially increase sales at on-line stores.
Feedback Detection
Non-intrusive:
– Click certain pages in a certain order while ignoring most pages.
– Read some clicked pages longer than some other clicked pages.
– Save/print certain clicked pages.
– Follow some links in clicked pages to reach more pages.
– Buy items / put them in wish-lists or shopping carts.
Intrusive:
– Explicitly ask users to rate items/pages.
Justifying Recommendation
• Recommendation systems must justify their recommendations
  – Even if the justification is bogus…
  – For search engines, the "justifications" are the page synopses
• Some recommendation algorithms are better at providing human-understandable justifications than others
  – Content-based ones can justify in terms of classifier features…
  – Collaborative ones are hard-pressed to say more than "people like you seem to like this stuff"
• In general, giving good justifications is important…
Content-based vs. Collaborative Recommendation

[Figure: Content/Profile-based — items (Red Mars, Jurassic Park, Lost World, 2001, Foundation, Difference Engine, Neuromancer, 2010) matched against a user profile built by machine learning]

[Figure: Collaborative Filtering — a user database of item-rating vectors over items A…Z; the active user's ratings are correlation-matched against the other users' vectors to extract recommendations]
Collaborative Filtering

[Figure: the same collaborative-filtering diagram — a user database of item-rating vectors (items A…Z), a correlation match against the active user, and extraction of recommendations]
Item-User Matrix
• The input to the collaborative filtering algorithm is an m×n matrix where rows are items and columns are users
  – Sort of like a term-document matrix (items are terms and documents are users)
• Can think of users as vectors in the space of items (or vice versa)
  – Can do vector similarity between users
    • And find which users are most similar…
  – Can do scalar clusters over items etc.
    • And find which items are most correlated
[Margin note: think users ≈ docs, items ≈ keywords]
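The "users as vectors in item space" view can be sketched as follows, with a small hypothetical ratings matrix (0 marks an unrated item) and plain cosine similarity between user columns:

```python
import numpy as np

# Hypothetical item-user matrix: rows = items, columns = users,
# 0 marks an unrated item (the matrix is typically sparse).
R = np.array([
    [9, 0, 5, 0, 6, 10],   # item A
    [3, 0, 3, 0, 4, 4],    # item B
    [0, 9, 0, 8, 0, 8],    # item C
    [5, 10, 7, 0, 0, 1],   # item Z
])

users = R.T  # each user is a vector in the space of items

def cosine(u, v):
    """Cosine similarity between two user vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Similarity of user 0 to every other user
sims = [cosine(users[0], users[j]) for j in range(1, len(users))]
```

The same code transposed (rows as vectors) gives item-item similarity, mirroring the term/keyword analogy on the slide.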
A Collaborative Filtering Method
(think kNN regression)
• Weight all users with respect to similarity with the active user.
  – How to measure similarity?
    • Could use cosine similarity; normally the Pearson coefficient is used
• Select a subset of the users (neighbors) to use as predictors.
• Normalize ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
• Present items with highest predicted ratings as recommendations.
3/27
Homework 2 solutions posted. Midterm on Thursday in class; covers everything covered by the first two homeworks. Questions?
Today:
– Complete filtering
– Discuss Das/Datar paper
Finding User Similarity with the Pearson Correlation Coefficient
• Typically use the Pearson correlation coefficient between ratings for the active user, a, and another user, u:

c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}

\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}

\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}

\bar{r}_x = \frac{\sum_{i=1}^{m} r_{x,i}}{m}

where r_a and r_u are the ratings vectors for the m items rated by both a and u, and r_{i,j} is user i's rating for item j.
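The definitions above translate directly into code; here is a minimal sketch, assuming a hypothetical dict-of-ratings representation (item → rating) for each user:

```python
import math

def pearson(ra, rb):
    """Pearson correlation over the m items co-rated by two users.
    ra, rb: dicts mapping item -> rating (a hypothetical format)."""
    common = [i for i in ra if i in rb]
    m = len(common)
    if m == 0:
        return 0.0
    mean_a = sum(ra[i] for i in common) / m
    mean_b = sum(rb[i] for i in common) / m
    cov = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in common) / m
    sd_a = math.sqrt(sum((ra[i] - mean_a) ** 2 for i in common) / m)
    sd_b = math.sqrt(sum((rb[i] - mean_b) ** 2 for i in common) / m)
    if sd_a == 0 or sd_b == 0:
        return 0.0
    return cov / (sd_a * sd_b)

# Two users with perfectly correlated ratings score ≈ 1.0
print(pearson({"A": 1, "B": 2, "C": 3}, {"A": 2, "B": 4, "C": 6}))
```

Note that, unlike cosine similarity, Pearson centers each user's ratings around their own mean, so a harsh grader and a generous grader with the same taste still correlate strongly.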
Neighbor Selection
• For a given active user, a, select correlated users to serve as the source of predictions.
• The standard approach is to use the most similar k users, u, based on similarity weights w_{a,u}.
• An alternate approach is to include all users whose similarity weight is above a given threshold.
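Both selection strategies are one-liners; a small sketch, assuming a hypothetical dict of user → similarity weight:

```python
def select_neighbors(weights, k=20, threshold=None):
    """weights: dict mapping user id -> similarity weight w_{a,u}.
    Standard approach: top-k most similar; alternate: all above a threshold."""
    if threshold is not None:
        return [u for u, w in weights.items() if w > threshold]
    return sorted(weights, key=weights.get, reverse=True)[:k]

w = {"u1": 0.9, "u2": 0.2, "u3": 0.7, "u4": -0.1}
top2 = select_neighbors(w, k=2)            # the two most similar users
above = select_neighbors(w, threshold=0.5) # everyone above 0.5
```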
Rating Prediction
• Predict a rating, p_{a,i}, for each item i, for the active user, a, by using the k selected neighbor users, u ∈ {1, 2, …, k}.
• To account for users' different rating levels, base predictions on differences from a user's average rating.
• Weight users' ratings contributions by their similarity to the active user:

p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} w_{a,u}\,(r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{n} |w_{a,u}|}

where r_{i,j} is user i's rating for item j and w_{a,u} is derived from the Pearson correlation c_{a,u} = \mathrm{covar}(r_a, r_u) / (\sigma_{r_a} \sigma_{r_u}).
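The prediction formula above can be sketched as follows, assuming a hypothetical neighbor representation of (similarity weight, user mean, ratings dict):

```python
def predict(active_mean, neighbors, item):
    """p_{a,i} = r_bar_a + sum_u w_{a,u}(r_{u,i} - r_bar_u) / sum_u |w_{a,u}|.
    neighbors: list of (w_au, user_mean, ratings_dict); a hypothetical format."""
    num = den = 0.0
    for w, mean_u, ratings in neighbors:
        if item in ratings:  # only neighbors who actually rated the item count
            num += w * (ratings[item] - mean_u)
            den += abs(w)
    # With no usable neighbors, fall back to the active user's own mean
    return active_mean if den == 0 else active_mean + num / den

neighbors = [
    (0.9, 3.0, {"X": 5.0, "Y": 2.0}),  # similar user who liked X above their average
    (0.4, 4.0, {"X": 4.0}),            # less similar user, rated X at their average
]
p = predict(3.5, neighbors, "X")
```

Working in deviations from each user's mean is what makes the normalization step harmless: a neighbor who rates everything high contributes only how much *more* than usual they liked the item.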
Similarity Weighting = User Similarity
• Typically use the Pearson correlation coefficient between ratings for the active user, a, and another user, u:

c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}

\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}

\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}

\bar{r}_x = \frac{\sum_{i=1}^{m} r_{x,i}}{m}

where r_a and r_u are the ratings vectors for the m items rated by both a and u, and r_{i,j} is user i's rating for item j.
Significance Weighting
• Important not to trust correlations based on very few co-rated items.
• Include significance weights, s_{a,u}, based on the number of co-rated items, m:

w_{a,u} = s_{a,u}\, c_{a,u}

s_{a,u} = \begin{cases} 1 & \text{if } m > 50 \\ m/50 & \text{if } m \le 50 \end{cases}

where c_{a,u} = \mathrm{covar}(r_a, r_u) / (\sigma_{r_a} \sigma_{r_u}) is the Pearson correlation.
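As code, the significance weight is a simple clamp (the cutoff of 50 is the value from the slide):

```python
def significance_weight(m, cutoff=50):
    """s_{a,u}: devalue correlations computed from few co-rated items."""
    return 1.0 if m > cutoff else m / cutoff

def combined_weight(c_au, m):
    """w_{a,u} = s_{a,u} * c_{a,u}"""
    return significance_weight(m) * c_au
```

So a perfect correlation computed from only 5 co-rated items is discounted to 0.1 of its face value, while one computed from 100 items is trusted fully.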
Problems with Collaborative Filtering
• Cold Start: there needs to be enough other users already in the system to find a match.
• Sparsity: if there are many items to be recommended, even if there are many users, the user/ratings matrix is sparse, and it is hard to find users that have rated the same items.
• First Rater: cannot recommend an item that has not been previously rated.
  – New items
  – Esoteric items
• Popularity Bias: cannot recommend items to someone with unique tastes.
  – Tends to recommend popular items.
    • WHAT DO YOU MEAN YOU DON'T CARE FOR BRITNEY SPEARS, YOU DUNDERHEAD? #$%$%$&^
Content-Based Recommending
• Recommendations are based on information about the content of items rather than on other users' opinions.
• Uses machine learning algorithms to induce a profile of the user's preferences from examples, based on a featural description of content.
• Lots of systems
Adapting the Naïve Bayes Idea for Book Recommendation
• Vector of Bags model
  – E.g. books have several different fields that are all text
    • Authors, description, …
  – A word appearing in one field is different from the same word appearing in another
  – Want to keep each bag different: a vector of m bags, with conditional probabilities for each word w.r.t. each class and bag
• Can give a profile of a user in terms of the words that are most predictive of what they like
  – Odds ratio: P(rel|example)/P(~rel|example); an example is positive if the odds ratio is > 1
  – Strength of a keyword: log[P(w|rel)/P(w|~rel)]
  – We can summarize a user's profile in terms of the words that have strength above some threshold.

P(c_j \mid \text{Book}) = \frac{P(c_j)}{P(\text{Book})} \prod_{m=1}^{S} \prod_{i=1}^{|d_m|} P(a_{m,i} = s_{m,i} \mid c_j, m)
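The keyword-strength profile can be sketched in a few lines; the per-word probabilities below are hypothetical stand-ins for what a trained Naïve Bayes model would supply:

```python
import math

# Hypothetical per-word probabilities from a learned Naive Bayes profile.
p_rel = {"robot": 0.08, "space": 0.05, "romance": 0.01}
p_notrel = {"robot": 0.01, "space": 0.02, "romance": 0.06}

def strength(w):
    """Strength of keyword w: log[P(w|rel) / P(w|~rel)]."""
    return math.log(p_rel[w] / p_notrel[w])

# The user's profile: words whose strength exceeds some threshold
profile = sorted(w for w in p_rel if strength(w) > 1.0)
```

Positive strength means the word is evidence *for* relevance; negative strength is evidence against; thresholding keeps only the strongly predictive words as the human-readable profile.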
Advantages of Content-Based Approach
• No need for data on other users.
  – No cold-start or sparsity problems.
• Able to recommend to users with unique tastes.
• Able to recommend new and unpopular items.
  – No first-rater problem.
• Can provide explanations of recommended items by listing the content features that caused an item to be recommended.
• Well-known technology: the entire field of classification learning is at (y)our disposal!
Disadvantages of Content-Based Method
• Requires content that can be encoded as meaningful features.
• Users' tastes must be represented as a learnable function of these content features.
• Unable to exploit quality judgments of other users.
  – Unless these are somehow included in the content features.
Content-Boosted CF - I

[Figure: a content-based predictor, trained on the user-rated items (training examples), predicts ratings for the unrated items; the user-ratings vector combined with the items with predicted ratings forms a pseudo user-ratings vector]
Content-Boosted CF - II
• Compute a pseudo user-ratings matrix
  – A full matrix that approximates the actual full user-ratings matrix
• Perform CF
  – Using Pearson correlation between pseudo user-rating vectors
• This works better than either approach alone!

[Figure: user ratings matrix → content-based predictor → pseudo user ratings matrix]
Why can't the pseudo ratings be used to help content-based filtering?
• How about using the pseudo ratings to improve the content-based filter itself? (Or: how access to unlabelled examples improves accuracy…)
  – Learn an NBC classifier C0 using the few items for which we have user ratings
  – Use C0 to predict the ratings for the rest of the items
  – Loop:
    • Learn a new classifier C1 using all the ratings (real and predicted)
    • Use C1 to (re-)predict the ratings for all the unknown items
  – Until no change in ratings
• With a small change, this actually works in finding a better classifier!
  – Change: keep the class posterior prediction (rather than just the max class)
    • This means that each (unlabelled) entity could belong to multiple classes, with fractional membership in each
    • We weight the counts by the membership fractions
      – E.g. P(A=v|c) = sum of class weights of all examples in c that have A=v, divided by the sum of class weights of all examples in c
• This is called expectation maximization
  – Very useful on the web, where you have tons of data but very little of it is labelled
  – Reminds you of k-means, doesn't it?
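The loop above can be sketched as follows. Everything here is hypothetical toy data: items are bags of words, classes are "like"/"dislike", and the key detail from the slide is that unrated items keep their fractional class memberships (the posterior), which then weight the counts when retraining:

```python
import math
from collections import defaultdict

classes = ["like", "dislike"]
rated = [({"robots", "space"}, "like"), ({"romance"}, "dislike")]
unrated = [{"space", "aliens"}, {"romance", "drama"}]
vocab = set().union(*(d for d, _ in rated), *unrated)

def train(weighted_docs):
    """Naive Bayes with fractionally weighted counts (Laplace smoothing)."""
    word_counts = {c: defaultdict(float) for c in classes}
    prior = {c: 0.0 for c in classes}
    for doc, weights in weighted_docs:
        for c, wt in weights.items():
            prior[c] += wt
            for w in doc:
                word_counts[c][w] += wt
    total = sum(prior.values())

    def posterior(doc):
        logp = {}
        for c in classes:
            lp = math.log(prior[c] / total)
            denom = sum(word_counts[c].values()) + len(vocab)
            for w in doc:
                lp += math.log((word_counts[c][w] + 1) / denom)
            logp[c] = lp
        z = max(logp.values())                     # normalize in log space
        exps = {c: math.exp(v - z) for c, v in logp.items()}
        s = sum(exps.values())
        return {c: e / s for c, e in exps.items()}

    return posterior

labeled = [(d, {c: 1.0}) for d, c in rated]   # C0: the few real ratings
posterior = train(labeled)
for _ in range(3):                             # E/M rounds (fixed count here)
    soft = [(d, posterior(d)) for d in unrated]
    posterior = train(labeled + soft)
```

Keeping the full posterior rather than the argmax is exactly the "small change" the slide mentions: each unrated item contributes to both classes' counts, in proportion to its membership.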
[Figure: (boosted) content filtering]
Co-training
• Suppose each instance has two parts:
  – x = [x1, x2]
  – x1, x2 conditionally independent given f(x)
• Suppose each half can be used to classify the instance
  – f1, f2 such that f1(x1) = f2(x2) = f(x)
• Suppose f1, f2 are learnable
  – f1 ∈ H1, f2 ∈ H2, with learning algorithms A1, A2

[Figure: A1 learns a hypothesis f1 from the small set of labeled instances <[x1, x2], f(x)>; f1 then labels unlabeled instances [x1, x2], which A2 uses to learn f2]

Small labeled data needed. "You train me, I train you…"
Observations
• Can apply A1 to generate as much training data as one wants
  – If x1 is conditionally independent of x2 given f(x),
  – then the errors in the labels produced by A1 will look like random noise to A2!
• Thus there is no limit to the quality of the hypothesis A2 can make
Discussion of the Google News Collaborative Filtering Paper