Rachid Guerraoui , EPFL


Page 1

Rachid Guerraoui, EPFL

Page 2

What is a good recommendation system?

Recommendation systems are good

Page 3

A good recommendation system is one that provides good recommendations

What is a good recommendation?

Page 4

You know it when you see it

“I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["hard-core pornography"]; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.”

Justice Potter Stewart, US Supreme Court, 1964

Page 5
Page 6
Page 7

Ideally: Build and deploy your system

Pragmatic: Transform past into future

What is a good recommendation system?

Page 8

Example

• The members of a program committee (20) want to evaluate the submitted papers (200)

• Nobody has enough time to read all the papers

• Each member is assigned a subset of the papers

• A recommendation system uses the scores they give to estimate the opinion of every member about every paper

Page 9

What is a good recommendation?

It depends on the correlation

Theory to the rescue

Page 10

• n users
• k·n objects
• For each user and object: a grade
  – The grades of a user form his preference vector
  – The vectors of the users form the preference matrix
  – Grades may be binary, discrete, or continuous

General recommendation model
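To make the model concrete, here is a minimal Python sketch of a hidden preference matrix and of the partial view the players start from. It assumes binary grades drawn uniformly at random and a random set of revealed entries; the function names `random_preference_matrix` and `partial_view` are hypothetical, not part of the slides.

```python
import random
from typing import List, Optional

def random_preference_matrix(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Hidden n x (k*n) preference matrix; row p is the preference vector v(p).

    Assumption for this sketch: binary grades (0 or 1), drawn uniformly at random.
    """
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(k * n)] for _ in range(n)]

def partial_view(matrix: List[List[int]], known_fraction: float,
                 seed: int = 0) -> List[List[Optional[int]]]:
    """What the players actually know: each grade is revealed with
    probability `known_fraction`; unknown grades are None."""
    rng = random.Random(seed)
    return [[g if rng.random() < known_fraction else None for g in row]
            for row in matrix]

v = random_preference_matrix(n=4, k=3)      # 4 users, 3 * 4 = 12 objects
known = partial_view(v, known_fraction=0.3)
for row in known:
    print(row)
```

Every revealed entry agrees with the hidden matrix; the task is to fill in the `None` entries.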

Page 11

Vectors of grades: v(p) (known partially to the players)

Input?

Vectors of grades: w(p) (seeking to approximate v(p))

Output?

Page 12

Ideal output

Target output

w(p) = v(p)

Minimize $\max_{p} |w(p) - v(p)|$ (Hamming distance)
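The target quantity can be computed directly. This is a minimal sketch assuming each vector is a list of grades over the same objects; `hamming` and `max_error` are illustrative names.

```python
from typing import List, Sequence

def hamming(u: Sequence[int], w: Sequence[int]) -> int:
    """Number of objects on which two grade vectors disagree."""
    if len(u) != len(w):
        raise ValueError("vectors must cover the same objects")
    return sum(1 for a, b in zip(u, w) if a != b)

def max_error(v: List[Sequence[int]], w: List[Sequence[int]]) -> int:
    """Worst error over all players: max over p of |w(p) - v(p)|."""
    return max(hamming(vp, wp) for vp, wp in zip(v, w))

v = [[1, 0, 1, 1], [0, 0, 1, 0]]   # true preferences v(p)
w = [[1, 0, 0, 1], [1, 1, 1, 0]]   # estimates w(p)
print(max_error(v, w))  # 2
```

Player 0 is wrong on one object and player 1 on two, so the worst-case error is 2.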

Page 13

Compare with a perfect on-line algorithm

How to account for the level of correlation?

Page 14

Shared billboard

(1) All players know all partial vectors

The perfect on-line algorithm

Page 15

The perfect on-line algorithm

(2) Chooses which entries of the partial vectors to fill (budget B)

The player is initially indulgent (learning phase)

The algorithm assigns initial papers

Page 16

(3) Knows the level of correlation

Hamming diameter of a set P

$D(P) = \max_{p,q \in P} |v(p) - v(q)|$

The perfect on-line algorithm
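The Hamming diameter is the largest pairwise Hamming distance inside the set. A small sketch, assuming players are keyed by name in a dictionary (an illustrative choice), with a single player having diameter 0:

```python
from itertools import combinations
from typing import Dict, Iterable, Sequence

def hamming(u: Sequence[int], w: Sequence[int]) -> int:
    """Number of objects on which two preference vectors disagree."""
    return sum(1 for a, b in zip(u, w) if a != b)

def diameter(vectors: Dict[str, Sequence[int]], group: Iterable[str]) -> int:
    """D(P): the largest Hamming distance between two players of the group."""
    members = list(group)
    return max((hamming(vectors[p], vectors[q]) for p, q in combinations(members, 2)),
               default=0)

vectors = {"a": [1, 1, 0, 0], "b": [1, 0, 0, 0], "c": [0, 1, 0, 1]}
print(diameter(vectors, ["a", "b"]))        # 1
print(diameter(vectors, ["a", "b", "c"]))   # 3
```

Adding players to a set can only keep or increase its diameter, which is the tension the rest of the talk exploits.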

Page 17

20 PC members; 200 papers

Every member can read 10 papers

All have the same taste

Perfect solution possible?
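Since 20 members × 10 papers = 200 reads, the papers can be split so that each is read exactly once, and with identical tastes the merged scores are exact. The sketch below assumes binary scores and a round-robin split (one possible assignment, not necessarily the one intended on the slide); `assign_papers` is a hypothetical helper.

```python
import random
from typing import Dict, List

def assign_papers(num_members: int, num_papers: int, budget: int) -> Dict[int, List[int]]:
    """Round-robin split so that every paper is read by exactly one member."""
    if num_members * budget < num_papers:
        raise ValueError("the committee cannot cover every paper with this budget")
    assignment: Dict[int, List[int]] = {m: [] for m in range(num_members)}
    for paper in range(num_papers):
        assignment[paper % num_members].append(paper)
    return assignment

rng = random.Random(1)
shared_taste = [rng.randint(0, 1) for _ in range(200)]   # every member has this same vector
assignment = assign_papers(num_members=20, num_papers=200, budget=10)

# Each member reads only its own papers; the shared billboard collects all reports.
billboard: Dict[int, int] = {}
for member, papers in assignment.items():
    for paper in papers:
        billboard[paper] = shared_taste[paper]

estimate = [billboard[paper] for paper in range(200)]
errors = sum(e != t for e, t in zip(estimate, shared_taste))
print(max(len(p) for p in assignment.values()), errors)  # 10 0
```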

Page 18

20 PC members; 200 papers

Two clusters of 10 members; within each cluster, all members have the same taste

Perfect solution possible?

Every member needs to read 20 papers

Page 19

Assume player p can probe B objects

How many other players does p need to collaborate with to fill its vector?

$kn/B - 1$ (the kn objects split into blocks of B, one block per player, minus p itself)
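The count can be checked against the two committee examples (n = 20 members, kn = 200 papers, so k = 10). A minimal sketch; rounding up when B does not divide kn is an addition here, since the slide's formula assumes exact division.

```python
def collaborators_needed(n: int, k: int, budget: int) -> int:
    """Other players p must rely on so that, together with p, probes of
    `budget` objects each cover all k*n objects (integer ceiling, minus p)."""
    return (k * n + budget - 1) // budget - 1

print(collaborators_needed(n=20, k=10, budget=10))  # 19: all members must collaborate
print(collaborators_needed(n=20, k=10, budget=20))  # 9: one cluster of 10 suffices
```

With B = 20 this also shows that a cluster of 5 members (5 × 20 = 100 reads) cannot cover the 200 papers on its own.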

Page 20

20 PC members; 200 papers

4 clusters of 5 members, each cluster with diameter 8

Every member reads 20 papers

What is the minimal error rate?

Page 21

Ideal algorithm (k = 1)

• A player p has to use the grades of (n/B) − 1 other players to estimate her/his preferences

In the worst case, p cannot do better

• The error rate for p depends on the Hamming distance between p and these (n/B) − 1 other players

• This is within a constant factor of the diameter of these n/B players
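A hypothetical collaboration scheme makes the dependence on distance visible: a group of n/B players splits the objects into blocks of B, each member probes its block, and everyone copies the reported grades. Every wrong entry in p's estimate lies in the block of some collaborator q that disagrees with p there, so p's error is at most the sum of |v(p) − v(q)| over its collaborators. This block-splitting scheme is an illustration, not the algorithm from the talk.

```python
from typing import List, Sequence

def hamming(u: Sequence[int], w: Sequence[int]) -> int:
    return sum(1 for a, b in zip(u, w) if a != b)

def estimate_from_group(vectors: List[Sequence[int]], group: List[int],
                        budget: int) -> List[int]:
    """Member i of `group` probes the i-th block of `budget` consecutive objects;
    the shared estimate copies the grade reported for each object."""
    num_objects = len(vectors[group[0]])
    if len(group) * budget < num_objects:
        raise ValueError("the group cannot cover every object")
    estimate = [0] * num_objects
    for i, q in enumerate(group):
        for obj in range(i * budget, min((i + 1) * budget, num_objects)):
            estimate[obj] = vectors[q][obj]   # grade probed by player q
    return estimate

# 4 players, 4 objects (k = 1), budget B = 2, so groups of n/B = 2 players.
V = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
close = estimate_from_group(V, [0, 1], budget=2)   # p = 0 with a similar player
far = estimate_from_group(V, [0, 2], budget=2)     # p = 0 with a dissimilar player
print(hamming(close, V[0]), hamming(far, V[0]))    # 1 2
```

Player 1 is at distance 1 from player 0 and player 2 at distance 4, and the error grows accordingly.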

Page 22

Claim

For every B-algorithm, there is some distribution of preferences such that, with constant probability,

$|w(p) - v(p)| \ge \min_{P \ni p,\ |P| \ge n/B} D(P)/4$

Page 23

Proof (sketch)

Consider a constant D > 2B. Define the preference vectors as follows:

• Let P be a set of n/B players
• Give some p ∈ P a random preference vector
• Assign random preference vectors to all players outside P

Choose a set S of D objects. For every player q in P, v(q) = v(p) except on S, where v(q) is random
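The construction can be generated and checked mechanically. This sketch assumes binary grades, k = 1 (as many objects as players), player 0 in the role of p, and the first D objects as S (an arbitrary choice); `adversarial_instance` is a hypothetical name.

```python
import random
from itertools import combinations
from typing import List, Sequence

def hamming(u: Sequence[int], w: Sequence[int]) -> int:
    return sum(1 for a, b in zip(u, w) if a != b)

def adversarial_instance(n: int, m: int, group_size: int, D: int,
                         seed: int = 0) -> List[List[int]]:
    """Players 0..group_size-1 form P (player 0 plays p); objects 0..D-1 form S."""
    if D > m or group_size > n:
        raise ValueError("need D <= m and |P| <= n")
    rng = random.Random(seed)
    p_vector = [rng.randint(0, 1) for _ in range(m)]     # random v(p)
    vectors = [p_vector]
    for _ in range(1, group_size):                       # q in P: copy v(p), re-draw S
        q_vector = list(p_vector)
        for obj in range(D):
            q_vector[obj] = rng.randint(0, 1)
        vectors.append(q_vector)
    for _ in range(group_size, n):                       # players outside P: fully random
        vectors.append([rng.randint(0, 1) for _ in range(m)])
    return vectors

n, B = 20, 4
D = 2 * B + 1                                            # any constant D > 2B
vectors = adversarial_instance(n=n, m=n, group_size=n // B, D=D)
P = range(n // B)
diam = max(hamming(vectors[a], vectors[b]) for a, b in combinations(P, 2))
print(diam <= D)   # True: members of P can only differ inside S
```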

Page 24

Proof (sketch)

Probes outside P provide no information to p

Probes inside P provide no information to p w.r.t. S

Since p probes at most B objects and S contains D > 2B objects, there are at least D/2 objects of S for which p has no information

No algorithm can do better than guess p's preferences on these objects; with binary grades, each guess is wrong with probability 1/2

The expected error is therefore at least (D/2)·(1/2) = D/4, while the diameter of P is at most D
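A quick Monte Carlo check of the counting step, assuming binary grades and the case most favourable to p, where all B probes land inside S. The expected error is then (D − B)/2, which exceeds D/4 whenever D > 2B. The function name and trial count are illustrative.

```python
import random

def average_error_on_S(D: int, B: int, trials: int = 2000, seed: int = 0) -> float:
    """Estimate p's average error on S when p knows only B of the D grades there.

    The unprobed grades are uniform and independent of anything p can observe,
    so every guessing strategy is wrong with probability 1/2; a random guess is used."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        truth = [rng.randint(0, 1) for _ in range(D)]     # v(p) restricted to S
        guess = [rng.randint(0, 1) for _ in range(D)]
        guess[:B] = truth[:B]                             # the B probed objects are known exactly
        total += sum(g != t for g, t in zip(guess, truth))
    return total / trials

D, B = 40, 10                                 # D > 2B
print(average_error_on_S(D, B), D / 4)        # about (D - B) / 2 = 15, above D / 4 = 10
```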

Page 25

Optimality

An algorithm is (B,c)-optimal if, for every input set of preferences and every player p,

$|w(p) - v(p)| \le c \cdot \min_{P \ni p,\ |P| \ge n/B} D(P)$
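On small instances the benchmark on the right-hand side can be computed by brute force, which makes it possible to test whether an algorithm's error for p stays within c times it. Because enlarging a set never decreases its diameter, the minimum is reached among sets of exactly ⌈n/B⌉ players. This exhaustive search is exponential in n and is only a checking sketch; `optimal_benchmark` is a hypothetical name.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def hamming(u: Sequence[int], w: Sequence[int]) -> int:
    return sum(1 for a, b in zip(u, w) if a != b)

def diameter(vectors: List[Sequence[int]], group: Tuple[int, ...]) -> int:
    return max((hamming(vectors[a], vectors[b]) for a, b in combinations(group, 2)),
               default=0)

def optimal_benchmark(vectors: List[Sequence[int]], p: int, budget: int) -> int:
    """min over sets P containing p with |P| >= n/B of D(P), by exhaustive search."""
    n = len(vectors)
    size = (n + budget - 1) // budget          # ceil(n / B); larger sets cannot do better
    others = [q for q in range(n) if q != p]
    return min(diameter(vectors, (p,) + rest) for rest in combinations(others, size - 1))

V = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
print(optimal_benchmark(V, p=0, budget=2))   # 1: best partner for player 0 is player 1
```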

Page 26

So what?

The best we can do is find clusters of players that are
– small enough (small diameter) to provide “accurate” preferences, and
– big enough to cover all objects

• Practically speaking? Try different sizes of clusters
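One way to try different sizes is to grow a cluster around p from its nearest neighbours and watch the trade-off between diameter and coverage. This sketch assumes the pairwise distances are available (here computed from full vectors, which real players would only have partially), and the greedy nearest-neighbour rule is a hypothetical heuristic that need not find the minimum-diameter set.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def hamming(u: Sequence[int], w: Sequence[int]) -> int:
    return sum(1 for a, b in zip(u, w) if a != b)

def diameter(vectors: List[Sequence[int]], group: List[int]) -> int:
    return max((hamming(vectors[a], vectors[b]) for a, b in combinations(group, 2)),
               default=0)

def nearest_cluster(vectors: List[Sequence[int]], p: int, size: int) -> List[int]:
    """p plus its size-1 closest players (ties broken by index)."""
    others = sorted((q for q in range(len(vectors)) if q != p),
                    key=lambda q: (hamming(vectors[p], vectors[q]), q))
    return [p] + others[:size - 1]

def sweep_cluster_sizes(vectors: List[Sequence[int]], p: int,
                        budget: int) -> List[Tuple[int, int, bool]]:
    """For each size: (size, diameter, whether size * budget probes can cover all objects)."""
    m = len(vectors[p])
    rows = []
    for size in range(1, len(vectors) + 1):
        cluster = nearest_cluster(vectors, p, size)
        rows.append((size, diameter(vectors, cluster), size * budget >= m))
    return rows

V = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
for row in sweep_cluster_sizes(V, p=0, budget=2):
    print(row)
```

The smallest size that covers all objects (here 2) gives the smallest diameter among the covering clusters in this example.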

Page 27

Optimality

• Assume each player can evaluate B objects.

• Given B and the level of correlation among players, there is a minimum error rate that can be achieved.

• There is an algorithm that achieves a constant-factor approximation of this error rate, with each player evaluating O(B · polylog(n)) objects.

Page 28

Definition of Optimality

• An algorithm is asymptotically optimal in terms of error rate if, for every player p:

$|w(p) - v(p)| \le c \cdot \min_{P \ni p,\ |P| \ge n/B} D(P)$

• where c is a constant and D(P) is the diameter of set P; P ranges over all sets of at least n/B players that contain p.