When is it Best to Best-Reply?
Michael Schapira (Yale University and UC Berkeley)
Joint work with Noam Nisan (Hebrew U),
Gregory Valiant (UC Berkeley), and Aviv Zohar (Hebrew U)
Motivation: Internet Routing
Establish routes between Autonomous Systems (ASes).
Currently handled by the Border Gateway Protocol (BGP).
[Figure: example ASes: AT&T, Qwest, Comcast, Sprint]
Internet Routing as a Game [Levin-S-Zohar]
• Internet routing is a game!
– players = ASes
– players’ types = preferences over routes
– strategies = routes
• BGP = Best-Response Dynamics
– each AS constantly selects its best available route to each destination
– … until a “stable state” (= PNE) is reached.
But…
• Challenge I: No synchronization of players’ actions
– players can best-reply simultaneously.
– players can best-reply based on outdated information.
– When is BGP guaranteed to converge to a stable state?
• Challenge II: Are players incentivized to follow best-response dynamics?
– Can an AS gain from not executing BGP?
Agenda
• Mechanism design approach to best-response dynamics. (main focus of this talk)
• Convergence of best-response dynamics in asynchronous environments. [Jaggard-S-Wright] (if time permits)
Agenda
• Part I: mechanism design approach to best-response dynamics.
• Part II: on the convergence of best-response dynamics in asynchronous environments.
Incentive-Compatible Best-Response Dynamics
Main Questions
• When is myopic best-replying also good in the long run?
• When can stable outcomes be implemented in partial-information settings?
• Can we reason about partial-information settings via complete-information games?
Our Results Have Implications For
• Internet protocols
– Internet routing (BGP), congestion control (TCP)
• Auctions
– 1st-price auctions, unit-demand auctions, GSP
• Matching
– correlated markets, interns and hospitals
• Cost-sharing mechanisms
– Moulin mechanisms, …
1st-Price Auction
Each cell shows winner:utility. Columns are Alice's bids (va = 4); rows are Bob's bids (vb = 2).
Bids    0      1      2      3      4      5
0       B:2    A:3    A:2    A:1    A:0    A:-1
1       B:1    B:1    A:2    A:1    A:0    A:-1
2       B:0    B:0    B:0    A:1    A:0    A:-1
3       B:-1   B:-1   B:-1   B:-1   A:0    A:-1
Ascending-Price English Auction
Columns are Alice's bids (va = 4); rows are Bob's bids (vb = 2).
Bids    0      1      2      3      4      5
0       B:2    A:3    A:2    A:1    A:0    A:-1
1       B:1    B:1    A:2    A:1    A:0    A:-1
2       B:0    B:0    B:0    A:1    A:0    A:-1
3       B:-1   B:-1   B:-1   B:-1   A:0    A:-1
Best-Reply (with some tie-breaking)
The Model
• n players
• Player i has
– action set Ai
– (private) type ti ∈ Ti
– utility function ui
$u_i(t_i, a)$, where $t_i \in T_i$ and $a \in \prod_{j=1}^{n} A_j$
The Model: Dynamic Interaction
• Discrete time steps. Initial action profile a0.
• One player is activated in each time step
– round-robin (cyclic) order
– our results are independent of the order (and also hold for asynchronous environments)
• Players’ strategies specify which actions are selected in each time step.
– can be history-dependent
• Best-response dynamics = the strategy profile in which each player constantly best-replies to others’ actions
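A minimal sketch of round-robin best-response dynamics, run on the 1st-price auction table shown earlier (Alice's value 4, Bob's value 2, ties go to Bob as in the table). The function names, the starting profile (0, 0), and the tie-breaking rule (pick the highest bid among equally good replies) are assumptions made for illustration; the slides only say "some tie-breaking".

```python
V_ALICE, V_BOB = 4, 2
ALICE_BIDS = range(6)  # Alice may bid 0..5 (table columns)
BOB_BIDS = range(4)    # Bob may bid 0..3 (table rows)

def utilities(a_bid, b_bid):
    """Return (Alice's utility, Bob's utility); ties go to Bob, as in the table."""
    if a_bid > b_bid:
        return V_ALICE - a_bid, 0
    return 0, V_BOB - b_bid

def best_reply(player, a_bid, b_bid):
    """Best reply of player 0 (Alice) or 1 (Bob) to the other's current bid.

    Assumed tie-breaking: among equally good replies, take the highest bid.
    """
    if player == 0:
        options = [(utilities(x, b_bid)[0], x) for x in ALICE_BIDS]
    else:
        options = [(utilities(a_bid, y)[1], y) for y in BOB_BIDS]
    return max(options)[1]  # highest utility first, then highest bid

def run(a_bid=0, b_bid=0, max_steps=100):
    """Activate Alice and Bob alternately until both keep their bids."""
    history = [(a_bid, b_bid)]
    unchanged = 0
    for step in range(max_steps):
        player = step % 2
        reply = best_reply(player, a_bid, b_bid)
        new = (reply, b_bid) if player == 0 else (a_bid, reply)
        unchanged = unchanged + 1 if new == (a_bid, b_bid) else 0
        a_bid, b_bid = new
        history.append(new)
        if unchanged == 2:  # both players passed in a row: stable state
            break
    return history

print(run())
```

Under these assumptions the bids climb the way an ascending auction would and stop once Alice outbids Bob's value. A different tie-breaking rule can change the behavior, which is why tie-breaking has to be handled explicitly.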
Two Possible Payoff Models
Cumulative model
– Payoffs are accumulated: $U_i$ is a $\limsup_{k \to \infty}$ over the utilities $u_i(t_i, a^k)$ collected along the dynamics
– Alternative formulation with discount factors
– More natural, sometimes too restrictive
Payoff at the limit
– If the dynamics converges to a stable outcome a*: $U_i = u_i(t_i, a^*)$
– If no convergence, the resulting payoff is low.
– Weaker (actively discourages oscillations), interesting applications
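Solution Concept
• A strategy profile $\sigma = (\sigma_1, \ldots, \sigma_n)$ is an ex-post Nash equilibrium if no player wishes to deviate from $\sigma$, regardless of the types:
$\forall \sigma'_i,\ \forall t \in \prod_{j=1}^{n} T_j:\quad U_i(t, \sigma_i, \sigma_{-i}) \ge U_i(t, \sigma'_i, \sigma_{-i})$
(this is essentially the best possible in a distributed environment [Shneidman-Parkes])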
Best-Replying is Not Always Best
Each cell shows (row player's utility, column player's utility).
Row Player: Type 1
2,1  0,0
3,0  1,3
Row Player: Type 2
3,1  1,0
2,0  0,3
• dominance-solvable
• potential game
• unique and Pareto optimal PNE
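The gain from deviating can be checked directly on the Type 1 matrix above. The sketch below compares the outcome when both players best-reply with the outcome when the row player stubbornly plays the top row (which is what a Type 2 row player would do). The helper names, the fixed number of rounds standing in for "the limit", and the starting profile are assumptions for illustration.

```python
# Payoffs (row, column) when the row player has Type 1, copied from the slide.
GAME_TYPE1 = [[(2, 1), (0, 0)],
              [(3, 0), (1, 3)]]

def row_best_reply(game, col):
    """Row player's best reply to the column player's current action."""
    return max(range(2), key=lambda r: game[r][col][0])

def col_best_reply(game, row):
    """Column player's best reply to the row player's current action."""
    return max(range(2), key=lambda c: game[row][c][1])

def always_top(game, col):
    """Deviation: ignore the column player's action and always play the top row."""
    return 0

def limit_outcome(game, row_strategy, start=(0, 0), rounds=20):
    """Alternate row and column moves; return the profile after `rounds` rounds."""
    row, col = start
    for _ in range(rounds):
        row = row_strategy(game, col)
        col = col_best_reply(game, row)
    return row, col

honest = limit_outcome(GAME_TYPE1, row_best_reply)
deviant = limit_outcome(GAME_TYPE1, always_top)
print("best-replying:", honest, "row utility", GAME_TYPE1[honest[0]][honest[1]][0])
print("always top:   ", deviant, "row utility", GAME_TYPE1[deviant[0]][deviant[1]][0])
```

In this sketch the row player earns 1 by best-replying and 2 by sticking to the top row, so best-response dynamics is not an equilibrium here even though the game has a unique PNE.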
When is it Good to Best-Reply?
• Goal: identify a class of games in which best-response dynamics is an ex-post Nash equilibrium.
– i.e., best-replying is incentive-compatible
– close in spirit to “learning equilibria” [Brafman-Tennenholtz]
• This class is going to be VERY restricted. Still… a variety of mechanisms/protocols.
• Remark: The best replies are not always unique. Thus, we must handle tie-breaking.
One Class of Games
• Lemma: If each realization of types yields a game in which each player has a single dominant strategy, then best-response dynamics is an ex-post Nash equilibrium.
On the Other Hand…
Each cell shows (row player's utility, column player's utility).
Row Player: Type 1
9,0   1,1  1,3
10,0  0,2  0,1
Row Player: Type 2
10,0  0,1  0,3
9,0   1,2  1,1
• no player has a dominant strategy (in both realizations).
• best-response dynamics is an ex-post Nash equilibrium.
• This game is blindly solvable.
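Blindly-Dominated Strategy Sets
Example (one player's utilities; rows are that player's strategies, columns are the other player's strategies):
8 7 9
5 6 8
3 2 1
3 4 0
The bottom two rows are blindly dominated: their highest utility (4) is below the lowest utility (5) of the top two rows.
A set of strategies $T \subset S_i$ is blindly dominated if
$\max_{s_i \in T,\ s_{-i}} u_i(s_i, s_{-i}) < \min_{s'_i \in S_i \setminus T,\ s'_{-i}} u_i(s'_i, s'_{-i})$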
Blindly-Solvable Games
• Defn: A game is blindly-solvable if iterated elimination of blindly-dominated strategy sets results in a single strategy profile.
– Observation: the “surviving” strategy profile is the unique PNE of the game.
• Defn: A partial-information game is blindly-solvable if every realization of types yields a blindly-solvable game.
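A minimal sketch of iterated elimination of blindly-dominated strategy sets for two-player games, tried on the 4x3 utility example and on the two-type 2x3 game from the earlier slides. The function names and the matrix encoding are assumptions; the domination test follows the max/min condition with a strict inequality, evaluated over the strategies that have survived so far.

```python
def blindly_dominated(payoff, own, other):
    """Return a blindly-dominated subset of `own`, or an empty set.

    payoff[s][o] is the player's utility for own strategy s against
    opponent strategy o. A dominated set must consist of the strategies
    with the smallest maximum utility, so only those prefixes are tested.
    """
    hi = {s: max(payoff[s][o] for o in other) for s in own}
    lo = {s: min(payoff[s][o] for o in other) for s in own}
    order = sorted(own, key=lambda s: hi[s])
    for k in range(1, len(order)):
        weak, rest = order[:k], order[k:]
        if max(hi[s] for s in weak) < min(lo[s] for s in rest):
            return set(weak)
    return set()

def solve(row_u, col_u):
    """Iterated elimination; returns the surviving (row set, column set)."""
    rows = set(range(len(row_u)))
    cols = set(range(len(row_u[0])))
    # Transpose the column player's utilities so they are indexed [own][other].
    col_t = [[col_u[r][c] for r in range(len(row_u))] for c in range(len(row_u[0]))]
    changed = True
    while changed:
        changed = False
        dead = blindly_dominated(row_u, rows, cols)
        if dead:
            rows -= dead
            changed = True
        dead = blindly_dominated(col_t, cols, rows)
        if dead:
            cols -= dead
            changed = True
    return rows, cols

EXAMPLE = [[8, 7, 9], [5, 6, 8], [3, 2, 1], [3, 4, 0]]  # single-player example
ROW_T1 = [[9, 1, 1], [10, 0, 0]]   # row player's utilities, Type 1
ROW_T2 = [[10, 0, 0], [9, 1, 1]]   # row player's utilities, Type 2
COL = [[0, 1, 3], [0, 2, 1]]       # column player's utilities (same for both types)

print(blindly_dominated(EXAMPLE, {0, 1, 2, 3}, {0, 1, 2}))
print(solve(ROW_T1, COL), solve(ROW_T2, COL))
```

In both type realizations the elimination ends with a single strategy profile, which matches the claim that the two-type example is blindly solvable.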
1st-Price Auctions Revisited
Columns are Alice's bids (va = 4); rows are Bob's bids (vb = 2).
Bids    0      1      2      3      4      5
0       B:2    A:3    A:2    A:1    A:0    A:-1
1       B:1    B:1    A:2    A:1    A:0    A:-1
2       B:0    B:0    B:0    A:1    A:0    A:-1
3       B:-1   B:-1   B:-1   B:-1   A:0    A:-1
Merits of Blindly-Solvable Games
• Thm: Let G be a blindly-solvable partial-information game. Let a* be the surviving strategy profile. Then,
1. Best-response dynamics converges to a* within $n \cdot \sum_j |A_j|$ time steps.
2. In the “payoff at the limit” model, best-response dynamics is incentive-compatible, and even collusion-proof, in ex-post Nash.
Intuition for Proof of (2)
• The first action that was not “eliminated” in the elimination sequence of G must belong to a manipulator.
• The manipulator’s utility from that action is lower than his utility from a*.
Best-Response 1st-Price Auction Mechanism
Columns are Alice's bids (va = 4); rows are Bob's bids (vb = 2).
Bids    0      1      2      3      4      5
0       B:2    A:3    A:2    A:1    A:0    A:-1
1       B:1    B:1    A:2    A:1    A:0    A:-1
2       B:0    B:0    B:0    A:1    A:0    A:-1
3       B:-1   B:-1   B:-1   B:-1   A:0    A:-1
Implications for Internet Environments
• Under realistic conditions routing with the Border Gateway Protocol is incentive compatible. [Levin-S-Zohar]
• Convergence and incentive compatibility results for congestion control. [Godfrey-S-Zohar-Shenker]
Mechanism design without money!
BEYOND BLINDLY-SOLVABLE GAMES
Generalized 2nd-Price Auction (GSP)
• Used for selling ads on search engines.
• k slots. Each slot j has its own click-through rate.
• Users submit bids (per click) bi.
• They are ranked in order of bids.
• If ad is clicked: pay next highest bid.
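The ranking and payment rule above can be sketched in a few lines. The function name `gsp`, the example bids and click-through rates, the zero price for a bidder with nobody ranked below, and breaking bid ties by input order are all assumptions made for illustration.

```python
def gsp(bids, ctrs):
    """Allocate slots by bid order and charge the next-highest bid per click.

    bids: dict mapping bidder -> bid per click.
    ctrs: click-through rates of the slots, best slot first.
    Returns a list of (bidder, slot index, price per click, expected payment).
    """
    ranked = sorted(bids, key=bids.get, reverse=True)
    result = []
    for slot, bidder in enumerate(ranked[:len(ctrs)]):
        # Assumption: with no lower bid, the price per click is 0.
        next_bid = bids[ranked[slot + 1]] if slot + 1 < len(ranked) else 0.0
        result.append((bidder, slot, next_bid, ctrs[slot] * next_bid))
    return result

allocation = gsp({"a": 5.0, "b": 3.0, "c": 1.0}, [0.5, 0.25])
for bidder, slot, price, expected in allocation:
    print(bidder, "gets slot", slot, "at", price, "per click; expected payment", expected)
```

A bidder's payment depends only on the bid ranked just below, so changing one's own bid matters only when it changes the ranking.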
• No dominant strategy equilibrium.
• There exists an equilibrium with VCG payments. [Edelman-Ostrovsky-Schwarz, Varian]
• Best-response dynamics (with tie-breaking) converge with probability 1 to that equilibrium. [Cary et al.]
• Thm (informal): Best-replying in GSP is incentive-compatible.
– Generalizes the English auction of [Edelman-Ostrovsky-Schwarz]
Auctions With Unit-Demand Bidders
• n bidders. m items.
• Each bidder i has value vi,j for each item j, and is interested in at most one item.
• Thm: There exists a best-response mechanism for auctions with unit-demand bidders that is incentive-compatible in ex-post Nash and converges to the VCG outcome.
– Generalizes the English auction of [Demange-Gale-Sotomayor]
• The proof of incentive-compatibility is simple. The proof of convergence is more complex and is based on Kuhn’s Hungarian method.
A NEW PERSPECTIVE ON SOME CENTRALIZED MECHANISMS
Centralized vs. Distributed
Distributed: players reach a stable outcome in a distributed manner.
Centralized: players declare types; the mechanism simulates the interaction and outputs the outcome.
An ex-post equilibrium in the decentralized setting gives a dominant-strategy implementation in the centralized setting.
The Centralized Setting
• Each player i has an action set Ai, a private type ti, and a utility function ui (as before).
• Wanted: a direct revelation mechanism $M : \prod_{i=1}^{n} T_i \to \prod_{i=1}^{n} A_i$ that outputs a pure Nash equilibrium of the game and incentivizes truthfulness.
Clearly, This is Not Always Possible
Each cell shows (row player's utility, column player's utility).
Row Player: Type 1
2,1  0,0
3,0  1,3
Row Player: Type 2
3,1  1,0
2,0  0,3
Corollary I
• If every player has a single dominant strategy in every realization, then the direct-revelation mechanism is truthful.
– Give each player his dominant strategy in the reported realization.
Corollary II
• If the game is blindly solvable, then the direct-revelation mechanism is truthful.
Each cell shows (row player's utility, column player's utility).
Row Player: Type 1
9,0   1,1  1,3
10,0  0,2  0,1
Row Player: Type 2
10,0  0,1  0,3
9,0   1,2  1,1
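A minimal sketch that checks truthfulness of the direct-revelation mechanism on the two small examples from the slides. The mechanism modelled here outputs the pure Nash equilibrium of the reported game; only the row player has a type in these examples, so only the row player's reports are checked. The function names and the game encoding are assumptions for illustration.

```python
from itertools import product

def pure_nash(game):
    """All pure Nash equilibria of game[r][c] = (row utility, column utility)."""
    rows, cols = range(len(game)), range(len(game[0]))
    return [(r, c) for r, c in product(rows, cols)
            if all(game[r][c][0] >= game[r2][c][0] for r2 in rows)
            and all(game[r][c][1] >= game[r][c2][1] for c2 in cols)]

def is_truthful(games):
    """games: dict mapping the row player's type -> game under that type.

    The mechanism outputs the (first) PNE of the reported game. It is
    truthful for the row player if no misreport yields an outcome that
    is strictly better under the true type.
    """
    outcome = {t: pure_nash(g)[0] for t, g in games.items()}
    for true_type, g in games.items():
        r, c = outcome[true_type]
        honest = g[r][c][0]
        for reported in games:
            r2, c2 = outcome[reported]
            if g[r2][c2][0] > honest:
                return False
    return True

NOT_POSSIBLE = {
    1: [[(2, 1), (0, 0)], [(3, 0), (1, 3)]],
    2: [[(3, 1), (1, 0)], [(2, 0), (0, 3)]],
}
BLINDLY_SOLVABLE = {
    1: [[(9, 0), (1, 1), (1, 3)], [(10, 0), (0, 2), (0, 1)]],
    2: [[(10, 0), (0, 1), (0, 3)], [(9, 0), (1, 2), (1, 1)]],
}
print(is_truthful(NOT_POSSIBLE), is_truthful(BLINDLY_SOLVABLE))
```

In the first example a Type 1 row player gains by reporting Type 2, while in the blindly-solvable example neither type gains from misreporting, in line with Corollary II.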
More Blindly-Solvable Games
• Cost-sharing mechanisms
– Moulin mechanisms [Moulin, Moulin-Shenker]
– Acyclic mechanisms [Mehta-Roughgarden-Sundararajan]
• Matching games
– Interns and Hospitals
– Correlated two-sided markets
Directions for Future Research
• Implementability of other kinds of equilibria (mixed Nash, correlated, …)?
• Incentive-compatibility of other kinds of dynamics (fictitious play, regret minimization)?
Agenda
• Part I: mechanism design approach to best-response dynamics.
• Part II: on the convergence of best-response dynamics in asynchronous environments.
Best-Response Dynamics Out of Sync
Synchronous Environments
• In traditional best-response dynamics players are activated one at a time.
• More generally, the study of game dynamics normally supposes synchrony.
• What if the interaction between players is asynchronous? (Internet, markets)
Illustration
Rows are the Row Player's actions; columns are the Column Player's actions.
2,1  0,0
1,2  0,0
But…
Rows are the Row Player's actions; columns are the Column Player's actions.
2,1  0,0
1,2  0,0
Model for Analyzing Asynchronous Best-Response Dynamics
• Infinite sequence of discrete time-steps
• In each time-step a subset of the players best-replies.
• The “schedule” is chosen by an adversarial entity (“the Scheduler”).
• The schedule must be fair (no player is indefinitely “starved” from best-replying).
Result [Jaggard-S-Wright]
• Thm: If two (or more) pure Nash equilibria exist in a game, then asynchronous best-reply dynamics can potentially oscillate.
• Implications for Internet protocols, diffusion of innovations in social networks, and more.
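A minimal sketch of how a schedule can make best-reply dynamics oscillate when two pure Nash equilibria exist. The 2x2 coordination game below, with equilibria on the diagonal, is an assumption chosen for illustration, as are the function names and the two schedules. Activating both players at every step is a fair schedule, since nobody is starved.

```python
# Hypothetical coordination game with two PNE, at (0, 0) and (1, 1).
GAME = [[(2, 1), (0, 0)],
        [(0, 0), (1, 2)]]

def best_reply(player, profile):
    """Best reply of player 0 (row) or 1 (column) to the given profile."""
    row, col = profile
    if player == 0:
        return max(range(2), key=lambda r: GAME[r][col][0])
    return max(range(2), key=lambda c: GAME[row][c][1])

def step(profile, active):
    """All players in `active` best-reply to the same, old profile."""
    new = list(profile)
    for p in active:
        new[p] = best_reply(p, profile)
    return tuple(new)

def run(profile, schedule, steps):
    """Apply the schedule for a number of time-steps and record the trace."""
    trace = [profile]
    for t in range(steps):
        profile = step(profile, schedule(t))
        trace.append(profile)
    return trace

def simultaneous(t):
    return (0, 1)      # both players activated at every step

def round_robin(t):
    return (t % 2,)    # one player at a time

print(run((0, 1), simultaneous, 4))
print(run((0, 1), round_robin, 4))
```

With simultaneous activation each player chases the equilibrium the other one just abandoned, so the profile flips forever between the two miscoordinated states; the one-at-a-time schedule settles on an equilibrium after the first move.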
Directions for Future Research
• Characterization of games for which asynchronous best-response dynamics converge.
• More generally, exploring game dynamics in the realm that lies beyond synchronization (fictitious play, regret minimization).
THANK YOU!