
Algorithms and Incentives for Robust Ranking

Rajat Bhattacharjee

Ashish Goel

Stanford University

Algorithms and incentives for robust ranking. ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007.

Incentive based ranking mechanisms. EC Workshop, Economics of Networked Systems, 2006.

http://www.stanford.edu/%7Erajatb/robustranking.pdf

http://www.stanford.edu/%7Erajatb/ranking.pdf


Outline

Motivation

Model

Incentive Structure

Ranking Algorithm


Content : then and now

Traditional

Content generation was centralized (book publishers, movie production companies, newspapers)

Content distribution was subject to editorial control (paid professionals: reviewers, editors)

Internet

Content generation is mostly decentralized (individuals create webpages, blogs)

No central editorial control on content distribution (instead there are ranking and recommendation systems like Google, Yahoo)


Heuristics Race

PageRank (uses the link structure of the web)

Spammers try to game the system by creating fraudulent link structures

Heuristics race: search engines and spammers have implemented increasingly sophisticated heuristics to counteract each other

New strategies to counter the heuristics [Gyongyi, Garcia-Molina]

Detecting PageRank amplifying structures: sparsest cut problem (NP-hard) [Zhang et al.]


Amplification Ratio [Zhang, Goel, …]

Consider a set S, which is a subset of V

In(S): total weight of edges from V-S to S

Local(S): total weight of edges from S to S

w(S) = Local(S) + In(S)

Amp(S) = w(S)/In(S)

High Amp(S) → S is dishonest

Low Amp(S) → S is honest

Collusion free graph: a graph where all sets are honest
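The definitions above can be sketched in a few lines of Python. This is only an illustration on a hypothetical toy graph: the edge-weight dictionary, the node names, and the choice to return infinity when In(S) is zero are assumptions of the sketch, not part of the talk.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (source, target)


def amplification(weights: Dict[Edge, float], S: Set[str]) -> float:
    """Return Amp(S) = w(S) / In(S), where w(S) = Local(S) + In(S)."""
    # In(S): total weight of edges entering S from V - S.
    in_s = sum(w for (u, v), w in weights.items() if u not in S and v in S)
    # Local(S): total weight of edges with both endpoints in S.
    local_s = sum(w for (u, v), w in weights.items() if u in S and v in S)
    if in_s == 0:
        # Assumption of this sketch: a set with no incoming weight is unbounded.
        return float("inf")
    return (local_s + in_s) / in_s


# Hypothetical toy graph: x and y link heavily to each other, little weight comes in.
weights = {("a", "x"): 1.0, ("x", "y"): 4.0, ("y", "x"): 5.0, ("b", "a"): 2.0}
amp_xy = amplification(weights, {"x", "y"})  # Local = 9, In = 1
amp_a = amplification(weights, {"a"})        # Local = 0, In = 2
```

In this toy graph the tightly interlinked pair {x, y} has a much higher amplification ratio than the single node a, which is the pattern the slide associates with dishonest sets.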


Heuristics Race

Then why do search engines work so well?

Our belief: because heuristics are not in the public domain

Is this “the solution”?

Feedback/click analysis [Anupam et al.] [Metwally et al.]

Suffers from click spam

Problem of entities with little feedback

Too many web pages, can’t put them on top slots to gather feedback


Ranking reversal


Entity A is better than entity B, but B is ranked higher than A

Keyword: Search Engine


Our result

Theorem we would have liked to prove

Here is a reputation system and it is robust, i.e., has no ranking reversals even in the presence of malicious behavior

Theorem we prove

Here is a ranking algorithm and incentive structure, which when applied together imply an arbitrage opportunity for the users of the system whenever there is a ranking reversal (even in the presence of malicious behavior)


Where is the money?

Examples

Amazon.com: better recommendations → more purchases → more revenue

Netflix: better recommendations → increased customer satisfaction → increased registration → more revenue

Google/Yahoo: better ranking → more eyeballs → more revenue through ads

Revenue per entity

Simple for Amazon.com and Netflix

For Google/Yahoo, we can distribute the revenue from a user on the web pages he looks at (other approaches possible)


Why share?

Because they will take it anyway!!!

My precious


Less compelling reasons

Difficulty of eliciting honest feedback is well known [Resnick et al.] [Dellarocas]

Search engine rankings are self-reinforcing [Cho, Roy]

Strong incentive for players to game the system

Ballot stuffing and bad mouthing in reputation systems [Bhattacharjee, Goel] [Dellarocas]

Click spam in web rankings based on clicks [Anupam et al.]

Web structures have been devised to game PageRank [Gyongyi, Garcia-Molina]

Problem of new entities

How should the system discover high quality, new entities in the system?

How should the system discover a web page whose relevance has suddenly changed (maybe due to some current event)?





I-U Model

Inspect (I)

User reads a snippet attached to a search result (Google/Yahoo)

Looks at a recommendation for a book (Amazon.com)

Utilize (U)

User goes to the actual web page (Google/Yahoo)

Buys the book (Amazon.com)


I-U Model

Entities

Web pages (Google/Yahoo), Books (Amazon.com)

Each entity i has an inherent quality q_i (think of it as the probability that a user would utilize entity i, conditioned on the fact that the entity was inspected by the user)

The qualities q_i are unknown, but we wish to rank entities according to their qualities

Feedback

Tokens (positive and negative) placed on an entity by users

Ranking is a function of the relative number of tokens received by entities

Slots

Placeholders for the results of a query


Sheep and Connoisseurs

• Sheep can appreciate a high quality entity when shown

• But wouldn’t go looking for a high quality entity

• Most users are sheep

• Connoisseurs will dig for a high quality entity which is not ranked high enough

• The goal of this scheme is to aggregate the information that the connoisseurs have


User response


I-U Model

User response to a typical query

Chooses to inspect the top j positions

User chooses j at random from an unknown but fixed distribution

Utility generation event for e_i occurs if the user utilizes an entity e_i (assuming e_i is placed among the top j slots)

Formally

Utility generation event is captured by the random variable G_i = I_{r(i)} U_i

r(i): rank of entity e_i

I_{r(i)}, U_i: independent Bernoulli random variables

E[U_i] = q_i (unknown)

E[I_1] ≥ E[I_2] ≥ … ≥ E[I_k] (known)
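Because the inspection and utilization variables are independent, the expected number of utility generation events per query for an entity is the product of the inspection probability of its slot and its quality. The sketch below checks this against a small simulation. The depth distribution, the qualities, and the ranking are hypothetical values chosen for illustration, and the function names are not from the talk.

```python
import random
from typing import List


def inspection_probs(depth_dist: List[float]) -> List[float]:
    """E[I_t] = Pr[j >= t] when the user inspects the top j slots."""
    probs, tail = [], sum(depth_dist)
    for p in depth_dist:
        probs.append(tail)
        tail -= p
    return probs


def expected_utility(inspect: List[float], quality: List[float],
                     ranking: List[int]) -> List[float]:
    """E[G_i] = E[I_{r(i)}] * q_i; ranking[slot] is the entity shown in that slot."""
    out = [0.0] * len(quality)
    for slot, entity in enumerate(ranking):
        out[entity] = inspect[slot] * quality[entity]
    return out


def simulate(depth_dist, quality, ranking, trials=20000, seed=0):
    """Monte Carlo estimate of E[G_i] for every entity."""
    rng = random.Random(seed)
    depths = list(range(1, len(depth_dist) + 1))
    hits = [0] * len(quality)
    for _ in range(trials):
        j = rng.choices(depths, weights=depth_dist)[0]  # how deep this user looks
        for slot in range(j):
            entity = ranking[slot]
            if rng.random() < quality[entity]:  # U_i = 1 for an inspected entity
                hits[entity] += 1
    return [h / trials for h in hits]


# Hypothetical inputs: Pr[j = 1], Pr[j = 2], Pr[j = 3]; qualities of entities 0..2;
# entity 2 is shown in slot 1, entity 0 in slot 2, entity 1 in slot 3.
depth_dist = [0.5, 0.3, 0.2]
quality = [0.9, 0.4, 0.6]
ranking = [2, 0, 1]
exact = expected_utility(inspection_probs(depth_dist), quality, ranking)
estimate = simulate(depth_dist, quality, ranking)
```

Note that entity 0 has the highest quality but sits in slot 2, so it generates less expected utility than entity 2: a ranking reversal in the sense used in this talk.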





Information Markets

View the problem as an information aggregation problem

Float shares of entities and let the market decide their value (ranking) [Hanson] [Pennock]

Rank according to the price set by the market

Work best for predicting outcomes which are objective

Elections (Iowa electronic market)

Distinguishing features of the ranking problem

Fundamental problem: outcome is not objective

Revenue: because of more eyeballs or better quality?

Eyeballs in turn depend on the price set by the market

However, an additional lever: the ranking algorithm


Game theoretic approaches

Example: [Miller et al.]

Framework to incentivize honest feedback

Counter lack of objective outcomes by comparing a user’s reviews to those of his peers

Selfish interests of a user should be in line with the desirable properties of the system

Doesn’t address malicious users

Benefits from the system may come from outside the system as well

Revenue from outcome of these systems might overwhelm the revenue from the system itself


Ranking mechanism: overview

Overview:

Users place tokens (positive and negative) on the entities

Ranking is computed based on the number of tokens on the entities

Whenever a revenue generation event takes place, the revenue is shared among the users

Ranking algorithm

Input: feedback scores of entities

Output: probabilistic distribution over rankings of the entities

Ensures that the number of inspections an entity gets is proportional to the fraction of tokens on it


Incentive structure

A token is a three tuple: (p, u, e)

p: +1 or -1 depending on whether the token is a positive token or a negative token

u: user who placed the token

e: entity on which the token was placed

Net weight of the tokens a user can place is bounded, that is, |Σ_i p_i| is bounded

User cannot keep placing positive tokens without placing a negative token and vice versa
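A minimal sketch of the token tuple and the bounded net weight rule follows. The class name, the function `can_place`, and the numeric `bound` are hypothetical: the slides only say that the net weight per user is bounded, not what the bound is or how it is enforced.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    p: int  # +1 for a positive token, -1 for a negative token
    u: str  # user who placed the token
    e: str  # entity on which the token was placed


def can_place(existing: List[Token], new: Token, bound: int) -> bool:
    """Allow `new` only if the user's net weight |sum of p| stays within `bound`."""
    if new.p not in (1, -1):
        return False
    net = sum(t.p for t in existing if t.u == new.u) + new.p
    return abs(net) <= bound


# Hypothetical state: alice already holds two positive tokens, and bound = 2.
placed = [Token(1, "alice", "A"), Token(1, "alice", "B")]
third_positive = can_place(placed, Token(1, "alice", "C"), bound=2)
one_negative = can_place(placed, Token(-1, "alice", "C"), bound=2)
```

With these values a third positive token from alice is rejected, while a negative token is accepted, which is the "cannot keep placing positive tokens" behaviour described above.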


User account

Each user has an account

Revenue shares are added or deducted from a user’s account

Withdrawal is permitted but deposits are not

Users can make profits from the system but not gain control by paying

If a user’s share goes negative: remove the user from the system for some pre-defined time

Two pre-defined system parameters: a fraction less than 1, and s > 1

The fraction is the share of revenue that the system distributes as incentives to the users

Parameter s will be set later


Revenue share

Suppose a revenue generation event takes place for an entity e at time t

R: revenue generated

For each token i placed on entity e, a_i is the net weight (positive - negative) of tokens placed on entity e before token i was placed on e

The revenue shared by the system with the user who placed token i is proportional to p_i R / a_i^s

Adds up to at most R

Negative token: the revenue share is negative, deduct from the user’s account
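The sharing rule can be illustrated numerically. In this sketch each token is given as a pair (p_i, a_i) and the share is computed as scale * p_i * R / a_i^s. The constant `scale` stands in for the unspecified proportionality factor and is hypothetical, as are the example values. The sketch also assumes every a_i is positive, since the slides do not say how the very first token on an entity is treated.

```python
from typing import List, Tuple


def revenue_shares(tokens: List[Tuple[int, float]], R: float, s: float,
                   scale: float) -> List[float]:
    """Share for token i: scale * p_i * R / a_i**s (negative for negative tokens)."""
    if s <= 1:
        raise ValueError("the scheme requires s > 1")
    shares = []
    for p, a in tokens:
        if a <= 0:
            # Assumption of this sketch: a_i is positive for every token considered.
            raise ValueError("a_i must be positive in this sketch")
        shares.append(scale * p * R / a ** s)
    return shares


# Hypothetical example: three tokens on one entity, revenue R = 100, s = 2.
tokens = [(+1, 1), (+1, 2), (-1, 4)]
shares = revenue_shares(tokens, R=100.0, s=2.0, scale=0.5)
```

Earlier tokens (smaller a_i) receive larger shares, and the negative token produces a deduction, matching the diminishing-rewards behaviour the scheme is designed for.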


Revenue share

Some features

Parameter s controls the relative importance of tokens placed earlier

Tokens placed after token i have no bearing on the revenue share of the user who placed token i

Hence s is strictly greater than 1

Incentive for discovery of high quality entities

Hence the choice of diminishing rewards

Emphasis is on making the process as implicit as possible

Resistance to racing

The system shouldn’t allow a repeated cycle of actions which pushes A above B and then B above A and so on

We can add a more explicit feature by multiplying any negative revenue by one plus an arbitrarily small positive number


Ranking by quality

Either the entities are ranked by quality, or there exists a profitable arbitrage opportunity for the users in correcting the ranking

Ranking reversal: a pair of entities (i, k) such that q_i < q_k and τ_i > τ_k

q_i, q_k: quality of entity i and k resp.

τ_i, τ_k: number of tokens on entity i and k resp.

Revenue/utility generated by the entity: f(r, q)

r: relative number of tokens placed on an entity

q: quality of the entity

For the I-U Model, our ranking algorithm ensures f(r, q) is proportional to qr

Objective: a ranking reversal should present a profitable arbitrage opportunity


Arbitrage

There exists a pair of entities A and B

Placing a positive token on A and placing a negative token on B

The expected profit from A is more than the expected loss from B



Proof (for separable rev fns)

Write τ_i for the number of tokens on entity i, with r_i = τ_i (Σ_l τ_l)^(-1) and r_k = τ_k (Σ_l τ_l)^(-1)

Suppose f(r_i, q_i) τ_i^(-s) < f(r_k, q_k) τ_k^(-s)

Then it is profitable to put a negative token on entity i and a positive token on entity k

Assumption: f is separable, that is, f(r, q) = q^α r^β

Choose parameter s greater than β

f(r_i, q_i) τ_i^(-s) < f(r_i, q_k) τ_i^(-s)   (f is increasing in q)

f(r_i, q_k) τ_i^(-s) = q_k^α r_i^β τ_i^(-s) = q_k^α τ_i^(β-s) (Σ_l τ_l)^(-β)   (definition of separable function)

Similarly, f(r_k, q_k) τ_k^(-s) = q_k^α r_k^β τ_k^(-s) = q_k^α τ_k^(β-s) (Σ_l τ_l)^(-β)

However, q_k^α τ_i^(β-s) (Σ_l τ_l)^(-β) < q_k^α τ_k^(β-s) (Σ_l τ_l)^(-β)   (τ_i > τ_k and s > β)

Hence, f(r_i, q_i) τ_i^(-s) < f(r_k, q_k) τ_k^(-s)


Proof (I-U Model)

The rate at which revenue is generated by entity i (k) is proportional to q_i τ_i (q_k τ_k), where τ_i (τ_k) is the number of tokens on the entity (ensured by our ranking algorithm)

Rate at which incentives are generated by placing a positive token on entity k is q_k τ_k / τ_k^s

Loss due to placing a negative token on entity i is q_i τ_i / τ_i^s

If s > 1, q_k τ_k^(1-s) > q_i τ_i^(1-s), because

q_i < q_k (ranking reversal)

τ_i > τ_k (ranking reversal)

Thus a profitable arbitrage opportunity exists in correcting the system
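The inequality in this proof is easy to check numerically. The sketch below compares the incentive rate from a positive token on the under-ranked entity k with the loss rate from a negative token on the over-ranked entity i, both up to the same proportionality constant. Here tau denotes the number of tokens on an entity, and the qualities and token counts are hypothetical.

```python
def arbitrage_margin(q_i: float, tau_i: float, q_k: float, tau_k: float,
                     s: float) -> float:
    """Gain rate on k minus loss rate on i, up to a common positive constant."""
    gain = q_k * tau_k ** (1 - s)  # positive token on entity k
    loss = q_i * tau_i ** (1 - s)  # negative token on entity i
    return gain - loss


# Hypothetical ranking reversal: entity i has lower quality but more tokens than k.
q_i, tau_i = 0.2, 1000.0
q_k, tau_k = 0.5, 100.0
margins = {s: arbitrage_margin(q_i, tau_i, q_k, tau_k, s) for s in (1.5, 2.0, 3.0)}
```

For every s > 1 tried here the margin is positive, as the proof predicts for a reversal. The sketch only illustrates the direction shown on the slide (a reversal implies a positive margin) and says nothing about the converse.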





Naive approach

Order the entities by the net number of tokens they have

Problem? Incentive for manipulation

Example:

Slot 1: 1,000,000 inspections

Slot 2: 500,000 inspections

Entity 1: 1000 tokens

Entity 2: 999 tokens
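The numbers in this example can be worked through directly. The sketch below contrasts the naive ordering with an allocation in which inspections are proportional to tokens, which is the property the talk's ranking algorithm aims for. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def naive_inspections(tokens: Dict[str, int], slots: List[int]) -> Dict[str, int]:
    """Sort entities by token count and hand out slots in that order."""
    order = sorted(tokens, key=tokens.get, reverse=True)
    return {entity: slots[rank] for rank, entity in enumerate(order)}


def proportional_inspections(tokens: Dict[str, int], slots: List[int]) -> Dict[str, float]:
    """Split the total inspections in proportion to each entity's tokens."""
    total_tokens = sum(tokens.values())
    total_inspections = sum(slots)
    return {e: total_inspections * t / total_tokens for e, t in tokens.items()}


slots = [1_000_000, 500_000]
tokens = {"entity1": 1000, "entity2": 999}
naive = naive_inspections(tokens, slots)
proportional = proportional_inspections(tokens, slots)

# Two extra tokens on entity2 flip the naive ordering completely.
naive_after = naive_inspections({"entity1": 1000, "entity2": 1001}, slots)
```

Under the naive ordering a one-token lead is worth 500,000 extra inspections, so a tiny manipulation pays off hugely. Under the proportional split each entity receives roughly 750,000 inspections and a single token moves the outcome only marginally.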


Ranking Algorithm

Proper ranking

If entity e1 has more positive feedback than entity e2, then if the user chooses to inspect the top t (for any t) slots, then the probability that e1 shows up should be higher than the probability that e2 shows up among the top t slots

Random variable X_e gives the position of entity e

Entity e1 dominates e2 if for all t, Pr[X_{e1} ≤ t] ≥ Pr[X_{e2} ≤ t]

Proper ranking: if the feedback score of e1 is more than the feedback score of e2, then e1 dominates e2

Distribution returned by the algorithm is a proper ranking


Majorized case

p: vector giving the normalized expected inspections of slots

S = E[I_1] + E[I_2] + … + E[I_k]

p = {E[I_1]/S, E[I_2]/S, …, E[I_k]/S}

τ: vector giving the normalized number of tokens on entities

Special case: p majorizes τ

For all i, the sum of the i largest components of p is at least the sum of the i largest components of τ
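The majorization condition can be tested with sorted prefix sums. In this sketch the function name, the zero padding of the shorter vector (there are usually more entities than slots), and the tolerance are choices made for illustration. The example vectors are normalized from hypothetical inspection and token counts.

```python
from typing import Sequence


def majorizes(p: Sequence[float], tau: Sequence[float], tol: float = 1e-9) -> bool:
    """True if p majorizes tau: equal totals, and every prefix sum of p sorted in
    decreasing order is at least the matching prefix sum of tau."""
    n = max(len(p), len(tau))
    ps = sorted(p, reverse=True) + [0.0] * (n - len(p))      # pad with zeros
    ts = sorted(tau, reverse=True) + [0.0] * (n - len(tau))
    sum_p = sum_t = 0.0
    for a, b in zip(ps, ts):
        sum_p += a
        sum_t += b
        if sum_p < sum_t - tol:
            return False
    return abs(sum_p - sum_t) <= tol


# Hypothetical vectors: two slots with inspections in ratio 2:1,
# and two entities holding 1000 and 999 tokens.
p = [2 / 3, 1 / 3]
tau = [1000 / 1999, 999 / 1999]
holds = majorizes(p, tau)
fails = majorizes([0.5, 0.5], [0.7, 0.3])
```

The first pair satisfies the condition because the slot distribution is more concentrated than the token distribution. The second pair fails because the token vector is more concentrated than the slot vector.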


Majorized case

Typically, the importance of top slots in a ranking system is far higher than that of the lower slots

Rapidly decaying tail

The number of entities is orders of magnitude more than the number of significant slots

Heavy tail

Hence for web ranking p majorizes τ (the normalized token vector)

We believe for most applications p majorizes τ

Restrict to the majorized case here

The details of the general case are in the paper


Hardy, Littlewood, Pólya

• Theorem [Hardy, Littlewood, Pólya]

• The following two statements are equivalent: (1) The vector x is majorized by the vector y, (2) There exists a doubly stochastic matrix, D, such that x = Dy

• Interpret D_ij as the probability that entity i shows up at position j

• This ensures that the number of inspections that an entity gets is directly proportional to its feedback score

Doubly stochastic matrix: D_ij ≥ 0, every row sums to 1 (∑_j D_ij = 1) and every column sums to 1 (∑_i D_ij = 1)


Birkhoff von Neumann Theorem

Hardy, Littlewood, Pólya theorem on majorization doesn’t guarantee that the ranking we obtain is proper

We present a version of the theorem which takes care of this

Theorem [Birkhoff, von Neumann]

An n×n matrix is doubly stochastic if and only if it is a convex combination of permutation matrices

Convex combination of permutation matrices: a distribution over rankings

Algorithms for computing Birkhoff von Neumann distribution

O(m^2) [Gonzalez, Sahni]

O(mn log K) [Gabow, Kariv]
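To make the decomposition concrete, here is a small greedy sketch: repeatedly find a permutation supported on the positive entries, subtract as much of it as possible, and record the weight. This is a textbook-style construction and not the Gonzalez-Sahni or Gabow-Kariv algorithm cited above. It also makes no attempt to produce a proper ranking. The function names, the tolerance, and the example matrix are hypothetical.

```python
from typing import List, Tuple


def _perfect_matching(support: List[List[bool]]) -> List[int]:
    """Return perm with support[i][perm[i]] True for all i (augmenting paths)."""
    n = len(support)
    match_col = [-1] * n  # match_col[j] = row currently matched to column j

    def try_row(i: int, seen: List[bool]) -> bool:
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            raise ValueError("no perfect matching: input is not doubly stochastic")
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm


def birkhoff_decomposition(D: List[List[float]],
                           tol: float = 1e-9) -> List[Tuple[float, List[int]]]:
    """Write D as a sum of theta * P over (theta, perm) pairs; perm[i] is the
    position of entity i in that ranking, theta its probability."""
    M = [row[:] for row in D]
    n = len(M)
    terms, remaining = [], 1.0
    while remaining > tol:
        support = [[M[i][j] > tol for j in range(n)] for i in range(n)]
        perm = _perfect_matching(support)
        theta = min(M[i][perm[i]] for i in range(n))
        for i in range(n):
            M[i][perm[i]] -= theta  # zeroes at least one entry per round
        terms.append((theta, perm))
        remaining -= theta
    return terms


# Hypothetical doubly stochastic matrix: D[i][j] = Pr[entity i is shown in slot j].
D = [[0.5, 0.5, 0.0], [0.25, 0.25, 0.5], [0.25, 0.25, 0.5]]
terms = birkhoff_decomposition(D)
```

Sampling a ranking then amounts to drawing one permutation from `terms` with probability equal to its weight, which reproduces the marginal probabilities in D.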


Conclusion

Theorem

Here is a ranking algorithm and incentive structure, which when applied together imply an arbitrage opportunity for the users of the system whenever there is a ranking reversal

Resistance to gaming

We don’t make any assumptions about the source of the error in ranking - benign or malicious

So by the same argument the system is resistant to gaming as well

Resistance to racing


Thank You