
When is it Best to Best-Reply?

Michael Schapira (Yale University and UC Berkeley)

Joint work with Noam Nisan (Hebrew U), Gregory Valiant (UC Berkeley) and Aviv Zohar (Hebrew U)

Motivation: Internet Routing

Establish routes between Autonomous Systems (ASes).

Currently handled by the Border Gateway Protocol (BGP).

[Figure: example ASes: AT&T, Qwest, Comcast, Sprint]

Internet Routing as a Game [Levin-S-Zohar]

• Internet routing is a game!
– players = ASes
– players’ types = preferences over routes
– strategies = routes

• BGP = Best-Response Dynamics
– each AS constantly selects its best available route to each destination
– … until a “stable state” (= pure Nash equilibrium, PNE) is reached.

But…

• Challenge I: No synchronization of players’ actions
– players can best-reply simultaneously.
– players can best-reply based on outdated information.
– When is BGP guaranteed to converge to a stable state?

• Challenge II: Are players incentivized to follow best-response dynamics?
– Can an AS gain from not executing BGP?

Agenda

• Mechanism design approach to best-response dynamics. (main focus of this talk)

• Convergence of best-response dynamics in asynchronous environments. [Jaggard-S-Wright] (if time permits)

Agenda

• Part I: mechanism design approach to best-response dynamics.

• Part II: on the convergence of best-response dynamics in asynchronous environments.

Incentive-Compatible Best-Response Dynamics

Main Questions

• When is myopic best-replying also good in the long run?

• When can stable outcomes be implemented in partial-information settings?

• Can we reason about partial-information settings via complete-information games?

Our Results Have Implications For

• Internet protocols
– Internet routing (BGP), congestion control (TCP)

• Auctions
– 1st-price auctions, unit-demand auctions, GSP

• Matching
– correlated markets, interns and hospitals

• Cost-sharing mechanisms
– Moulin mechanisms, …

1st Price Auction

Each cell is winner:utility. Rows: Bob's bid (vb=2). Columns: Alice's bid (va=4).

Bob \ Alice   0      1      2      3      4      5
0             B:2    A:3    A:2    A:1    A:0    A:-1
1             B:1    B:1    A:2    A:1    A:0    A:-1
2             B:0    B:0    B:0    A:1    A:0    A:-1
3             B:-1   B:-1   B:-1   B:-1   A:0    A:-1
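A small sketch that regenerates the table, assuming integer bids, that ties go to Bob, and that each cell shows only the winner's utility (value minus own bid). The function names are illustrative and not from the talk.

```python
# Assumptions: integer bids, ties go to Bob, a cell shows the winner's utility only.
V_ALICE, V_BOB = 4, 2


def outcome(bob_bid, alice_bid):
    """Winner ("A" or "B") and the winner's utility (value minus own bid)."""
    if alice_bid > bob_bid:
        return "A", V_ALICE - alice_bid
    return "B", V_BOB - bob_bid


def payoff_table(bob_bids=range(4), alice_bids=range(6)):
    """Rows are Bob's bids, columns are Alice's bids."""
    return [[outcome(b, a) for a in alice_bids] for b in bob_bids]


for bob_bid, row in enumerate(payoff_table()):
    print(bob_bid, " ".join(f"{w}:{u}" for w, u in row))
```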

Ascending-Price English Auction

[Same bid table as under “1st Price Auction”: Alice (va=4), Bob (vb=2)]

Best-Reply (with some tie-breaking)

[Same bid table as under “1st Price Auction”: Alice (va=4), Bob (vb=2)]

The Model

• n players

• Player i has
– action set Ai
– (private) type ti ∈ Ti
– utility function ui, where $u_i = u_i(t_i, a)$ for every $t_i \in T_i$ and every action profile $a \in \prod_{j=1}^{n} A_j$

The Model: Dynamic Interaction

• Discrete time steps. Initial action profile a0.

• One player is activated in each time step
– round-robin (cyclic) order
– our results are independent of the order (and also hold for asynchronous environments)

• Players’ strategies specify which actions are selected in each time step.
– can be history-dependent

• Best-response dynamics = the strategy profile in which each player constantly best-replies to others’ actions
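A minimal sketch of round-robin best-response dynamics as described above. The helper names, the rule that a player switches only on a strict improvement (one possible tie-breaking), and the stopping test (a full round with no change) are assumptions made for illustration. The example game is the “Row Player: Type 1” 2x2 matrix that appears later in the talk.

```python
def best_response_dynamics(utility, actions, start, max_steps=1000):
    """Round-robin best-response dynamics.

    Returns the final action profile, or None if no stable profile
    is reached within max_steps activations.
    """
    profile, n, quiet = tuple(start), len(actions), 0
    for step in range(max_steps):
        i = step % n  # cyclic activation order

        def value(action, i=i):
            return utility(i, profile[:i] + (action,) + profile[i + 1:])

        best = max(actions[i], key=value)
        if value(best) > value(profile[i]):  # switch only on strict improvement
            profile = profile[:i] + (best,) + profile[i + 1:]
            quiet = 0
        else:
            quiet += 1
        if quiet == n:  # a full round with no change: a pure Nash equilibrium
            return profile
    return None


# Entry [row][col] is (row player's payoff, column player's payoff).
PAYOFFS = [[(2, 1), (0, 0)],
           [(3, 0), (1, 3)]]
ACTIONS = [(0, 1), (0, 1)]


def utility(player, profile):
    row, col = profile
    return PAYOFFS[row][col][player]


print(best_response_dynamics(utility, ACTIONS, start=(0, 0)))
```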

Two Possible Payoff Models

Cumulative model
– Payoffs are accumulated: $U_i$ is defined through a lim sup over k of the accumulated stage payoffs $u_i(t_i, a^k)$.
– Alternative formulation with discount factors.
– More natural, but sometimes too restrictive.

Payoff at the limit
– If the dynamics converges to a stable outcome a*, then $U_i = u_i(t_i, a^*)$.
– If there is no convergence, the resulting payoff is low.
– Weaker (actively discourages oscillations), with interesting applications.

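Solution Concept

• A strategy profile $\sigma$ is an ex-post Nash equilibrium if no player wishes to deviate from $\sigma$ regardless of the types:

$\forall i,\ \forall \sigma'_i,\ \forall t \in \prod_{j=1}^{n} T_j:\quad U_i(\sigma_i, \sigma_{-i}, t) \ge U_i(\sigma'_i, \sigma_{-i}, t)$

(this is essentially the best possible in a distributed environment [Shneidman-Parkes])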

Best-Replying is Not Always Best

Row Player: Type 1
2,1   0,0
3,0   1,3

Row Player: Type 2
3,1   1,0
2,0   0,3

• dominance-solvable

• potential game

• unique and Pareto optimal PNE
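The point of the two matrices can be checked mechanically. In this sketch (helper names are illustrative) the column player always best-replies and the row player's true type is Type 1. Honest best-replying ends at the unique PNE, which pays the row player 1. Always playing the top row, which is exactly what a best-replying Type 2 row player would do, ends at an outcome that pays the row player 2.

```python
# Entry [row][col] is (row player's payoff, column player's payoff).
TYPE1 = [[(2, 1), (0, 0)],
         [(3, 0), (1, 3)]]


def column_best_reply(game, row):
    return max((0, 1), key=lambda col: game[row][col][1])


def row_best_reply(game, col):
    return max((0, 1), key=lambda row: game[row][col][0])


def limit_outcome(game, row_rule, start=(0, 0), rounds=10):
    """Alternate moves: the row player follows row_rule, the column player best-replies."""
    row, col = start
    for _ in range(rounds):
        row = row_rule(game, col)
        col = column_best_reply(game, row)
    return row, col


honest = limit_outcome(TYPE1, row_best_reply)
stubborn = limit_outcome(TYPE1, lambda game, col: 0)  # always play the top row

print("honest:", honest, "row payoff", TYPE1[honest[0]][honest[1]][0])
print("stubborn:", stubborn, "row payoff", TYPE1[stubborn[0]][stubborn[1]][0])
```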

When is it Good to Best-Reply?

• Goal: identify a class of games in which best-response dynamics is an ex-post Nash equilibrium.
– i.e., best-replying is incentive-compatible
– close in spirit to “learning equilibria” [Brafman-Tennenholtz]

• This class is going to be VERY restricted. Still… a variety of mechanisms/protocols.

• Remark: The best replies are not always unique. Thus, we must handle tie-breaking.

One Class of Games

• Lemma: If each realization of types yields a game in which each player has a single dominant strategy, then best-response dynamics is an ex-post Nash equilibrium.

On the Other Hand…

Row Player: Type 1
9,0    1,1   1,3
10,0   0,2   0,1

Row Player: Type 2
10,0   0,1   0,3
9,0    1,2   1,1

• no player has a dominant strategy (in both realizations).

• best-response dynamics is an ex-post Nash equilibrium.

• This game is blindly solvable.

Blindly-Dominated Strategy Sets

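A strategy set $T \subset S_i$ is blindly dominated if

$\max_{s_i \in T,\ s_{-i}} u_i(s_i, s_{-i}) < \min_{s'_i \in S_i \setminus T,\ s'_{-i}} u_i(s'_i, s'_{-i})$

Example (one player's payoffs; each row is one of that player's strategies, each column an opposing strategy):

8 7 9
5 6 8
3 2 1
3 4 0

The bottom two rows form a blindly-dominated set T: their largest payoff (4) is smaller than the smallest payoff (5) in the remaining rows.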
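A sketch of the blind-domination test on the 4x3 payoff matrix above, reading the definition as: the largest payoff obtainable from any strategy in the set is strictly below the smallest payoff obtainable from any strategy outside it. The strict inequality and the function name are assumptions.

```python
def is_blindly_dominated(payoffs, subset):
    """payoffs[s] lists one player's payoffs from strategy s against each opposing profile.

    Assumed reading: max payoff inside `subset` < min payoff outside it.
    """
    rest = [s for s in range(len(payoffs)) if s not in subset]
    if not subset or not rest:
        return False
    best_inside = max(max(payoffs[s]) for s in subset)
    worst_outside = min(min(payoffs[s]) for s in rest)
    return best_inside < worst_outside


PAYOFFS = [[8, 7, 9],
           [5, 6, 8],
           [3, 2, 1],
           [3, 4, 0]]

print(is_blindly_dominated(PAYOFFS, {2, 3}))  # the bottom two rows together
print(is_blindly_dominated(PAYOFFS, {3}))     # the last row on its own
```

Neither of the bottom rows is blindly dominated on its own under this reading, which is why the definition is stated for sets of strategies.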

Blindly-Solvable Games

• Defn: A game is blindly-solvable if iterated elimination of blindly-dominated strategy sets results in a single strategy profile.
– Observation: the “surviving” strategy profile is the unique PNE of the game.

• Defn: A partial-information game is blindly-solvable if every realization of types yields a blindly-solvable game.
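Iterated elimination can be sketched for two-player games by brute force over candidate strategy sets. This is an illustration that assumes a strict-inequality reading of blind domination (largest payoff inside the set below smallest payoff outside it), not the authors' procedure. The example games are the two 2x3 “Row Player: Type 1 / Type 2” matrices from the talk, and all function names are made up here.

```python
from itertools import combinations


def payoff_rows(game, player, rows, cols):
    """Each surviving strategy of `player` mapped to its payoffs against surviving opponents."""
    if player == 0:
        return {r: [game[r][c][0] for c in cols] for r in rows}
    return {c: [game[r][c][1] for r in rows] for c in cols}


def blindly_dominated_set(table):
    """Some proper subset whose best payoff is below the rest's worst payoff, or None."""
    strategies = list(table)
    for size in range(1, len(strategies)):
        for subset in combinations(strategies, size):
            rest = [s for s in strategies if s not in subset]
            if max(max(table[s]) for s in subset) < min(min(table[s]) for s in rest):
                return set(subset)
    return None


def eliminate(game):
    """Iteratively remove blindly-dominated sets; returns surviving (rows, cols)."""
    rows, cols = list(range(len(game))), list(range(len(game[0])))
    progress = True
    while progress:
        progress = False
        for player in (0, 1):
            dominated = blindly_dominated_set(payoff_rows(game, player, rows, cols))
            if dominated:
                if player == 0:
                    rows = [r for r in rows if r not in dominated]
                else:
                    cols = [c for c in cols if c not in dominated]
                progress = True
    return rows, cols


TYPE1 = [[(9, 0), (1, 1), (1, 3)], [(10, 0), (0, 2), (0, 1)]]
TYPE2 = [[(10, 0), (0, 1), (0, 3)], [(9, 0), (1, 2), (1, 1)]]

print(eliminate(TYPE1))
print(eliminate(TYPE2))
```

In both realizations a single profile survives, so under this reading each realization is blindly solvable even though no player has a dominant strategy.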

1st-Price Auctions Revisited

[Same bid table as under “1st Price Auction”: Alice (va=4), Bob (vb=2)]

Merits of Blindly-Solvable Games

• Thm: Let G be a blindly-solvable partial-information game. Let a* be the surviving strategy profile. Then,

1. Best-response dynamics converges to a* within $n \cdot \sum_j |A_j|$ time steps.

2. In the “payoff at the limit” model, best-response dynamics is incentive-compatible, and even collusion-proof, in ex-post Nash.

Intuition for Proof of (2)

• The first action that was not “eliminated” in the elimination sequence of G must belong to a manipulator.

• The manipulator’s utility from that action is lower than his utility from a*.

Best-Response 1st-Price Auction Mechanism

[Same bid table as under “1st Price Auction”: Alice (va=4), Bob (vb=2)]

Implications for Internet Environments

• Under realistic conditions routing with the Border Gateway Protocol is incentive compatible. [Levin-S-Zohar]

• Convergence and incentive compatibility results for congestion control. [Godfrey-S-Zohar-Shenker]

Mechanism design without money!

BEYOND BLINDLY-SOLVABLE GAMES

Generalized 2nd-Price Auction (GSP)

• Used for selling ads on search engines.

• k slots. Each slot j has its own click-through rate.

• Users submit bids (per click) bi.

• They are ranked in order of bids.

• If ad is clicked: pay next highest bid.

• No dominant strategy equilibrium.

• There exists an equilibrium with VCG payments. [Edelman-Ostrovsky-Schwarz, Varian]

• Best-response dynamics (with tie-breaking) converge with probability 1 to that equilibrium. [Cary et al.]

• Thm (informal): Best-replying in GSP is incentive-compatible.
– Generalizes the English auction of [Edelman-Ostrovsky-Schwarz]
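The GSP rules listed above (rank by bid, pay the next-highest bid per click) can be sketched as follows. The function name, the dictionary layout, the tie-breaking by input order, and the price of 0 for a bidder with no one ranked below are assumptions for illustration; no reserve price is modeled.

```python
def gsp(bids, click_rates):
    """bids: {bidder: bid per click}; click_rates: one rate per slot, best slot first.

    Returns one record per filled slot. A bidder with nobody ranked below
    pays 0 per click (assumption: no reserve price).
    """
    ranking = sorted(bids, key=bids.get, reverse=True)  # highest bid first
    results = []
    for slot, rate in enumerate(click_rates):
        if slot >= len(ranking):
            break  # more slots than bidders
        next_bid = bids[ranking[slot + 1]] if slot + 1 < len(ranking) else 0
        results.append({
            "bidder": ranking[slot],
            "slot": slot,
            "price_per_click": next_bid,
            "expected_payment": rate * next_bid,
        })
    return results


for record in gsp({"x": 5, "y": 3, "z": 1}, click_rates=[0.3, 0.1]):
    print(record)
```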


Auctions With Unit-Demand Bidders

• n bidders. m items.

• Each bidder i has value vi,j for each item j, and is interested in at most one item.

• Thm: There exists a best-response mechanism for auctions with unit-demand bidders that is incentive-compatible in ex-post Nash and converges to the VCG outcome.
– Generalizes the English auction of [Demange-Gale-Sotomayor]

• The proof of incentive-compatibility is simple. The proof of convergence is more complex and is based on Kuhn’s Hungarian method.
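For reference, the VCG outcome that the theorem refers to can be computed by brute force on small instances. This sketch is not the best-response mechanism itself and does not use the Hungarian method; it enumerates assignments directly. The function names and the example values are hypothetical, and values are assumed to be non-negative.

```python
from itertools import permutations


def best_assignment(values, bidders, n_items):
    """Max-welfare assignment of distinct items (or None) to the given bidders."""
    slots = list(range(n_items)) + [None] * len(bidders)
    best_welfare, best = 0, {b: None for b in bidders}
    for choice in permutations(slots, len(bidders)):
        welfare = sum(values[b][j] for b, j in zip(bidders, choice) if j is not None)
        if welfare > best_welfare:
            best_welfare, best = welfare, dict(zip(bidders, choice))
    return best_welfare, best


def vcg(values):
    """values[i][j] is bidder i's value for item j. Returns (assignment, payments)."""
    n, m = len(values), len(values[0])
    everyone = list(range(n))
    welfare, assignment = best_assignment(values, everyone, m)
    payments = {}
    for i in everyone:
        others = [b for b in everyone if b != i]
        welfare_without_i, _ = best_assignment(values, others, m)
        own = values[i][assignment[i]] if assignment[i] is not None else 0
        # Payment = harm that i's presence causes to the other bidders.
        payments[i] = welfare_without_i - (welfare - own)
    return assignment, payments


# Hypothetical instance: 3 bidders, 2 items.
assignment, payments = vcg([[6, 3], [4, 1], [2, 2]])
print(assignment, payments)
```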

A NEW PERSPECTIVE ON SOME CENTRALIZED MECHANISMS

Centralized vs. Distributed

Centralized:
– players declare types
– simulate interaction
– output the outcome
– dominant-strategy implementation in the centralized setting

Distributed:
– players reach a stable outcome in a distributed manner
– ex-post equilibrium in the decentralized setting

The Centralized Setting

• Each player i has an action set Ai, a private type ti, and a utility function ui (as before).

• Wanted: a direct revelation mechanism $M : \prod_{i=1}^{n} T_i \to \prod_{i=1}^{n} A_i$ that outputs a pure Nash equilibrium of the game and incentivizes truthfulness.

Clearly, This is Not Always Possible

Row Player: Type 1
2,1   0,0
3,0   1,3

Row Player: Type 2
3,1   1,0
2,0   0,3

Corollary I

• If every player has a single dominant strategy in every realization, then the direct-revelation mechanism is truthful.
– Give each player his dominant strategy in the reported realization.

Corollary II

• If the game is blindly solvable, then the direct-revelation mechanism is truthful.

Row Player: Type 1
9,0    1,1   1,3
10,0   0,2   0,1

Row Player: Type 2
10,0   0,1   0,3
9,0    1,2   1,1
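A sketch of the centralized check, assuming only the row player has a private type and that the mechanism outputs the unique pure Nash equilibrium of the reported realization. It confirms truthfulness for the 2x3 games above and shows the failure for the 2x2 games from “Clearly, This is Not Always Possible”. Function names are illustrative.

```python
def pure_nash(game):
    """All pure Nash equilibria of a two-player game given as game[row][col] = (u_row, u_col)."""
    rows, cols = range(len(game)), range(len(game[0]))
    return [(r, c) for r in rows for c in cols
            if all(game[r][c][0] >= game[r2][c][0] for r2 in rows)
            and all(game[r][c][1] >= game[r][c2][1] for c2 in cols)]


def mechanism(games, reported_type):
    """Direct revelation: output the (assumed unique) PNE of the reported realization."""
    (equilibrium,) = pure_nash(games[reported_type])
    return equilibrium


def is_truthful(games):
    """True if no row-player type gains by reporting a different type."""
    for true_type, game in games.items():
        r, c = mechanism(games, true_type)
        honest_payoff = game[r][c][0]
        for report in games:
            r2, c2 = mechanism(games, report)
            if game[r2][c2][0] > honest_payoff:
                return False
    return True


BLIND = {1: [[(9, 0), (1, 1), (1, 3)], [(10, 0), (0, 2), (0, 1)]],
         2: [[(10, 0), (0, 1), (0, 3)], [(9, 0), (1, 2), (1, 1)]]}
TWO_BY_TWO = {1: [[(2, 1), (0, 0)], [(3, 0), (1, 3)]],
              2: [[(3, 1), (1, 0)], [(2, 0), (0, 3)]]}

print(is_truthful(BLIND), is_truthful(TWO_BY_TWO))
```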

More Blindly-Solvable Games

• Cost-sharing mechanisms
– Moulin mechanisms [Moulin, Moulin-Shenker]
– Acyclic mechanisms [Mehta-Roughgarden-Sundararajan]

• Matching games
– Interns and Hospitals
– Correlated two-sided markets

Directions for Future Research

• Implementability of other kinds of equilibria (mixed Nash, correlated, …)?

• Incentive-compatibility of other kinds of dynamics (fictitious play, regret minimization)?

Agenda

• Part I: mechanism design approach to best-response dynamics.

• Part II: on the convergence of best-response dynamics in asynchronous environments.

Best-Response Dynamics Out of Sync

Synchronous Environments

• In traditional best-response dynamics players are activated one at a time.

• More generally, the study of game dynamics normally supposes synchrony.

• What if the interaction between players is asynchronous? (Internet, markets)

Illustration

Row Player (rows) vs. Column Player (columns):
2,1   0,0
1,2   0,0

But…

[Same 2x2 game as under “Illustration”: Row Player vs. Column Player]

Model for Analyzing Asynchronous Best-Response Dynamics

• Infinite sequence of discrete time-steps.

• In each time-step a subset of the players best-replies.

• The “schedule” is chosen by an adversarial entity (“the Scheduler”).

• The schedule must be fair (no player is indefinitely “starved” from best-replying).
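A sketch of this model on an assumed 2x2 coordination game with two pure Nash equilibria (chosen for illustration; not necessarily the matrix on the slide). A schedule is a list of sets of activated players, and all activated players best-reply to the same previous profile. Names are illustrative.

```python
# Assumed coordination game: entry [row][col] is (row payoff, column payoff).
GAME = [[(2, 1), (0, 0)],
        [(0, 0), (1, 2)]]


def best_reply(player, profile):
    row, col = profile
    if player == 0:
        return max((0, 1), key=lambda r: GAME[r][col][0])
    return max((0, 1), key=lambda c: GAME[row][c][1])


def run(schedule, start):
    """Apply the schedule; players activated together all react to the old profile."""
    profile = start
    history = [profile]
    for active in schedule:
        replies = {p: best_reply(p, profile) for p in active}
        profile = tuple(replies.get(p, profile[p]) for p in (0, 1))
        history.append(profile)
    return history


simultaneous = run([{0, 1}] * 4, start=(0, 1))     # both players move every step
one_at_a_time = run([{0}, {1}] * 2, start=(0, 1))  # round-robin

print(simultaneous)
print(one_at_a_time)
```

The simultaneous schedule is fair, since both players are activated at every step, yet the profile keeps flipping between the two miscoordinated cells. The round-robin schedule settles on an equilibrium from the same start.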

Result [Jaggard-S-Wright]

• Thm: If two (or more) pure Nash equilibria exist in a game, then asynchronous best-reply dynamics can potentially oscillate.

• Implications for Internet protocols, diffusion of innovations in social networks, and more.

Directions for Future Research

• Characterization of games for which asynchronous best-response dynamics converge.

• More generally, exploring game dynamics in the realm that lies beyond synchronization (fictitious play, regret minimization).

THANK YOU!