Time-Average Replicator and Best-Reply Dynamics
Yannick Viossat (Université Paris-Dauphine)
joint work with
Josef Hofbauer (Universität Wien)
and
Sylvain Sorin (Université Paris 6 and École Polytechnique)
Seminar on Discrete Mathematics and Game Theory, LSE, May 2010
Hofbauer, Sorin, Viossat Replicator and Best-Reply Dynamics
Philosophy
A quote from a famous German philosopher :
Proofs should not be made in public
B. von Stengel
Getting started...
Introduction to evolutionary game theory
Talk’s topic
Traditional vs evolutionary game theory
Standard game theory :
few agents
know the game (common knowledge)
high rationality, use elaborate thinking
Evolutionary game theory :
populations of agents,
need not fully understand the game
low rationality, use rules of thumb (or selection process)
Evolutionary Game Theory : conceptual framework
large population of agents
meet randomly, and play a symmetric game
strategies with good results spread (imitation, selection,...)
changes the average behaviour, hence the “good strategies”, hence the strategies that spread...
Two approaches : static and dynamic
Evolutionary game dynamics : dynamical system modeling such a process
Usual topics
Compare the outcomes of evolutionary game dynamics with standard concepts in game theory :
are dominated strategies eliminated ?
do dynamics lead to Nash equilibria ?
if so, to which equilibrium ?
Here, different topic : relate the two most studied dynamics
Topic of the talk
Major dynamics : replicator (REP) and best-reply dynamics (BRD)
different interpretations
different degrees of rationality
different mathematical formulations
but in many examples, same long-run behaviour for BRD and the time-average of REP (Gaunersdorfer and Hofbauer, 1995)
Aim : find a formal link between BRD and time-average of REP
Outline
1 Framework, notation
2 Dynamics
3 Similarities between BRD and the time-average of REP
4 Main result : theoretical link
5 Intuition
6 Comments
Framework and notation - I
single, large population
randomly drawn agents play a two-player symmetric game
n possible pure strategies : 1, 2, ..., n
$x_i(t)$ : frequency of strategy $i$ at time $t$
$x(t) = (x_1(t), x_2(t), \dots, x_n(t))$ : state variable
state space : $\{(x_1, \dots, x_n) \in \mathbb{R}^n_+ : \sum_i x_i = 1\}$
Evolutionary game dynamics : $\dot{x} = f(x, \text{payoffs})$
Framework and notation - II
Payoffs :
payoff matrix $A = (a_{ij})_{1 \le i,j \le n}$
expected payoff of strategy $i$ against $x$ : $a_i(x) = (Ax)_i$
mean payoff : $x \cdot a(x)$, where $a(x) = (a_1(x), \dots, a_n(x))$
For any quantity $q(t)$, let $\bar{q}(t) = \frac{1}{t} \int_0^t q(s)\,ds$
Replicator Dynamics (REP)
(REP) $\dot{x}_i = x_i\,[a_i(x) - x \cdot a(x)]$ with $x = x(t)$
differential equation due to Taylor and Jonker (1978)
growth rate : difference between own and average payoff
idea : payoff = additional fitness
prototype of biological dynamics
Time-Average of REP (TAREP)
$X(t) = \bar{x}(t) = \frac{1}{t} \int_0^t x(s)\,ds$ with $x(s)$ following (REP)
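To make the two objects concrete, here is a minimal numerical sketch of (REP) and of its time-average X(t). The integrator (classical Runge-Kutta), the step size, the trapezoidal rule for the average, and the choice of the standard rock-paper-scissors matrix are all assumptions made for illustration; none of them comes from the talk.

```python
def payoffs(A, x):
    """Payoff vector a(x) = A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def rep_field(A, x):
    """Right-hand side of (REP): x_i * (a_i(x) - x . a(x))."""
    a = payoffs(A, x)
    mean = sum(x_i * a_i for x_i, a_i in zip(x, a))
    return [x_i * (a_i - mean) for x_i, a_i in zip(x, a)]

def rk4_step(A, x, h):
    """One classical Runge-Kutta step of size h for (REP)."""
    k1 = rep_field(A, x)
    k2 = rep_field(A, [xi + 0.5 * h * k for xi, k in zip(x, k1)])
    k3 = rep_field(A, [xi + 0.5 * h * k for xi, k in zip(x, k2)])
    k4 = rep_field(A, [xi + h * k for xi, k in zip(x, k3)])
    return [xi + h * (a + 2 * b + 2 * c + d) / 6
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def rep_and_time_average(A, x0, horizon, h=0.01):
    """Return (x(T), X(T)); X(T) is approximated by the trapezoidal rule."""
    steps = int(round(horizon / h))
    x = list(x0)
    integral = [0.0] * len(x)
    for _ in range(steps):
        x_next = rk4_step(A, x, h)
        integral = [s + 0.5 * h * (u + v) for s, u, v in zip(integral, x, x_next)]
        x = x_next
    return x, [s / (steps * h) for s in integral]

# Illustrative game (an assumption): standard rock-paper-scissors, a zero-sum game.
RPS = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
x_T, X_T = rep_and_time_average(RPS, [0.5, 0.3, 0.2], horizon=200.0)
print("x(T) =", x_T)
print("X(T) =", X_T)
```

In this zero-sum example the product of the three shares is conserved along (REP), so x(t) keeps cycling, while the printed X(T) should sit close to the barycenter.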
Best Reply Dynamics (BRD)
(BRD) $\dot{x} \in BR(x) - x$
where BR(x) is the set of mixed best-replies to x
differential inclusion ; Gilboa and Matsui (1991), Matsui (1992)
population evolving towards best-reply to current situation
idea : in every time interval, a fraction of the population switches to a current best-response
prototype of rational (but myopic) dynamics
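The "fraction of the population switches" idea can be sketched as an explicit Euler discretization of (BRD). This is only one possible selection from the differential inclusion: the tie-breaking rule (lowest index), the step size `h`, and the 2x2 dominance game are hypothetical choices made for illustration.

```python
def payoffs(A, x):
    """Payoff vector a(x) = A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def pure_best_reply(A, x):
    """Index of one pure best reply to x (ties broken by lowest index)."""
    a = payoffs(A, x)
    return max(range(len(a)), key=lambda i: a[i])

def brd_euler(A, x0, h=0.01, steps=1000):
    """Euler scheme for (BRD): in each step, a fraction h of the population
    switches to a current pure best reply, i.e. x <- (1 - h) x + h e_b."""
    x = list(x0)
    for _ in range(steps):
        b = pure_best_reply(A, x)
        x = [(1 - h) * x_i + (h if i == b else 0.0) for i, x_i in enumerate(x)]
    return x

# Hypothetical 2x2 game in which strategy 0 strictly dominates strategy 1.
DOMINANCE = [[3, 1], [2, 0]]
x_end = brd_euler(DOMINANCE, [0.1, 0.9])
print(x_end)
```

Because strategy 0 is always the best reply here, the share of strategy 1 shrinks by the factor (1 - h) at every step.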
Similarities between TAREP and BRD - I
1. Convergence results that also hold for REP itself :
Example : In so-called potential games and in games with an interior evolutionarily stable strategy, any interior solution converges to a Nash equilibrium (NE).
2. Convergence results that do not hold for REP itself :
Example : In zero-sum games with an interior equilibrium, any interior solution converges to the set of NE.
Similarities between TAREP and BRD - II
3. Divergence “exactly in the same way” :
Example : Generalized Rock-Paper-Scissors game
$\begin{pmatrix} 0 & \epsilon & -1 \\ -1 & 0 & \epsilon \\ \epsilon & -1 & 0 \end{pmatrix}$
If $\epsilon \ge 1$, then any interior solution converges to the unique NE
If $\epsilon < 1$, then any interior solution converges to the “Shapley triangle” (Gaunersdorfer and Hofbauer, 1995).
Main result
(BRD) and (REP) look very different, yet there are striking similarities between the behaviour of (BRD) and that of the time-average of (REP).
Why ?
Main result : formal link
Up to a change in time, any interior solution of the time-average of REP
is a perturbed solution of BRD, with the perturbation vanishing as
t → +∞.
Formal statement
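Define the perturbed best-reply correspondence $BR^{\epsilon}$ by :
$y \in BR^{\epsilon}(x)$ if $\forall i$, $[\max_j a_j(x)] - a_i(x) > \epsilon \Rightarrow y_i < \epsilon$
Thm : if $X(\cdot)$ is the time-average of an interior solution of REP, then
$\dot{X}(t) \in \frac{1}{t} \left( BR^{\epsilon(t)}(X(t)) - X(t) \right)$
with $\epsilon(t) \to 0$ as $t \to +\infty$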
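The definition of the perturbed best-reply correspondence translates directly into a membership test. The sketch below is illustrative only: the function name and the rock-paper-scissors example are assumptions, and the check takes for granted that `y` is already a point of the simplex.

```python
def in_perturbed_best_reply(A, x, y, eps):
    """True if y is in BR^eps(x): every strategy whose payoff against x falls
    short of the best payoff by more than eps has weight below eps in y."""
    a = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    best = max(a)
    return all(y_i < eps for a_i, y_i in zip(a, y) if best - a_i > eps)

# Illustrative game (an assumption): standard rock-paper-scissors.
RPS = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
x = [1.0, 0.0, 0.0]  # against pure strategy 0, the payoffs are (0, -1, 1)
almost_best = in_perturbed_best_reply(RPS, x, [0.05, 0.05, 0.9], eps=0.1)
not_best = in_perturbed_best_reply(RPS, x, [0.5, 0.0, 0.5], eps=0.1)
print(almost_best, not_best)
```

With `eps = 0` no strategy can have weight below 0, so the test then accepts only mixtures of exact best replies that put zero weight... on nothing suboptimal is impossible to certify; in practice the definition is meant for small positive `eps`.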
Corollary
Corollary : the limit set of any interior solution of the time-average of REP “has the same properties” as a true limit set of BRD
That is : it is internally chain transitive under BRD, hence invariant.
Proof : apply results on perturbed differential inclusions due to Benaïm, Hofbauer and Sorin (2005, 2006).
Consequences
(almost) all properties mentioned above
in any zero-sum game (even with no interior equilibrium), every interior solution of TAREP converges to the set of NE
a better understanding
Intuition - I
We want to show that under REP, the time-average of the past evolves towards an approximate best-response to itself.
First idea : past of tomorrow = (past of today) + today
Formally : $\dot{X} = \frac{1}{t}(x - X)$
We want : $\dot{X} \in \frac{1}{t}(BR^{\epsilon(t)}(X) - X)$
We need : $x \in BR^{\epsilon(t)}(X)$
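The "past of tomorrow" identity is just the derivative of the definition of the time-average. A short worked derivation, using only $X(t) = \frac{1}{t}\int_0^t x(s)\,ds$:

```latex
\begin{aligned}
\dot{X}(t) &= \frac{d}{dt}\left[\frac{1}{t}\int_0^t x(s)\,ds\right]
            = -\frac{1}{t^2}\int_0^t x(s)\,ds + \frac{1}{t}\,x(t) \\
           &= \frac{1}{t}\bigl(x(t) - X(t)\bigr).
\end{aligned}
```

So whenever $x(t)$ belongs to $BR^{\epsilon(t)}(X(t))$, the right-hand side belongs to $\frac{1}{t}\bigl(BR^{\epsilon(t)}(X(t)) - X(t)\bigr)$, which is the desired inclusion.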
Intuition - II
We need $x \in BR^{\epsilon(t)}(X)$, that is : eventually, strategies having a high share are almost best replies to the average population of the past.
Idea : REP is a selection process.
Strategies having a high share now are those that had :
- a good average growth rate in the past
- hence (?) a good average payoff in the past
- hence (?) a good payoff against the average population of the past
Problems : justify both “hence” steps, and explain good versus best
Growth rate and payoff
good average growth rate ⇔ good average payoff ?
Yes, because differences in growth rates = differences in payoffs
Recall :
(REP) $\dot{x}_i = x_i\,[a_i(x) - x \cdot a(x)]$ with $x = x(t)$
Let $g_i = \dot{x}_i / x_i$ and $a_i = a_i(x)$. We have :
$g_i - g_j = a_i - a_j$
hence $\bar{g}_i - \bar{g}_j = \bar{a}_i - \bar{a}_j$
Past average payoff and payoff against average past
good average payoff in the past
⇔ good payoff against average population of the past ?
Yes, because the payoffs are linear in the population profile :
$a_i(x) = (Ax)_i$ hence $\bar{a}_i = \overline{(Ax)_i} = (A\bar{x})_i = (AX)_i$
Good versus best
We saw : surviving strategies are good responses to the average population profile of the past
Why almost best responses ?
Answer : over a long period of time, a small difference in selection pressures makes a large difference in shares
→ strategies that are good but not best-responses to the past are eliminated
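The elimination argument can be quantified. The following is a sketch that combines the two facts established earlier (differences in growth rates equal differences in payoffs, and payoffs are linear in the population profile); the constant $C$ and the particular choice of $\epsilon(t)$ at the end are illustrative assumptions, not taken from the talk.

```latex
\begin{aligned}
\ln\frac{x_i(t)}{x_j(t)}
  &= \ln\frac{x_i(0)}{x_j(0)} + \int_0^t (g_i - g_j)\,ds
   = \ln\frac{x_i(0)}{x_j(0)} + t\,\bigl[(AX(t))_i - (AX(t))_j\bigr], \\
\text{so if } (AX(t))_j - (AX(t))_i > \epsilon : \quad
x_i(t) &\le \frac{x_i(t)}{x_j(t)} \le C\,e^{-\epsilon t},
\qquad C = \max_{k,l} \frac{x_k(0)}{x_l(0)}.
\end{aligned}
```

Taking $j$ to be a best reply to $X(t)$, any strategy that is more than $\epsilon$ away from optimal against the average past has a share of at most $C e^{-\epsilon t}$. For instance, with $\epsilon(t) = 2\ln(t)/t$ this bound is $C/t^2$, which is below $\epsilon(t)$ for $t$ large enough, so $x(t) \in BR^{\epsilon(t)}(X(t))$ with $\epsilon(t) \to 0$.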
Link with logit map
BR(x) multivalued, no $C^1$-selection
Logit approximation : $br^{\epsilon}(x) = \operatorname{argmax}_{y \in \Delta} \left( y \cdot a(x) - \epsilon \sum_k y_k \ln y_k \right)$
Unique solution : $br^{\epsilon}(x) = L(a(x)/\epsilon)$ with $L_i(U) = \frac{\exp(U_i)}{\sum_j \exp(U_j)}$
$L$ : logit map, appears in multiplicative weight algorithms
Prop : the solution of (REP) starting at the barycenter satisfies $x(t) = br^{\epsilon}(X(t))$ with $\epsilon = 1/t$.
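Proof. We have : $\frac{\dot{x}_i}{x_i} - \frac{\dot{x}_j}{x_j} = a_i - a_j$
Integrating and assuming $x_i(0) = x_j(0)$, this gives :
$\ln(x_i/x_j) = \int_0^t (a_i - a_j)\,ds = t(\bar{a}_i - \bar{a}_j)$
hence $\frac{x_i}{x_j} = \exp(t[\bar{a}_i - \bar{a}_j]) = \frac{\exp(t\bar{a}_i)}{\exp(t\bar{a}_j)}$
hence letting $\epsilon = 1/t$ and $Z$ be a normalization factor :
$x_i(t) = \frac{\exp(t\bar{a}_i)}{Z} = \frac{\exp(\bar{a}_i/\epsilon)}{Z} = br^{\epsilon}_i(X(t))$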
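The proposition can be checked numerically. The sketch below integrates (REP) from the barycenter together with the running integral S(t) of x, so that t·a(X(t)) = A S(t), and compares x(t) with the logit map applied to A S(t). The payoff matrix, the Runge-Kutta scheme, and the step size are hypothetical choices for illustration.

```python
import math

def logit(u):
    """Logit map L_i(U) = exp(U_i) / sum_j exp(U_j), computed stably."""
    m = max(u)
    w = [math.exp(u_i - m) for u_i in u]
    z = sum(w)
    return [w_i / z for w_i in w]

def matvec(A, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def field(A, state):
    """Time derivative of the augmented state (x, S), where dS/dt = x."""
    n = len(A)
    x = state[:n]
    a = matvec(A, x)
    mean = sum(x_i * a_i for x_i, a_i in zip(x, a))
    return [x_i * (a_i - mean) for x_i, a_i in zip(x, a)] + x

def rk4(A, state, h, steps):
    """Classical Runge-Kutta integration of the augmented system."""
    for _ in range(steps):
        k1 = field(A, state)
        k2 = field(A, [s + 0.5 * h * k for s, k in zip(state, k1)])
        k3 = field(A, [s + 0.5 * h * k for s, k in zip(state, k2)])
        k4 = field(A, [s + h * k for s, k in zip(state, k3)])
        state = [s + h * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state

# Hypothetical payoff matrix for which the barycenter is not a rest point.
A = [[0, 2, -1], [-1, 0, 3], [1, -1, 0]]
n = len(A)
state = rk4(A, [1.0 / n] * n + [0.0] * n, h=0.001, steps=5000)  # t = 5
x_t, S_t = state[:n], state[n:]
predicted = logit(matvec(A, S_t))  # br^eps(X(t)) with eps = 1/t
gap = max(abs(u - v) for u, v in zip(x_t, predicted))
print("max |x_i(t) - br_i(X(t))| =", gap)
```

The printed gap should be at the level of the integration error only.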
Summary
REP and BRD seem very different but actually, the time-average of REP is related to BRD
The link can be made using another standard tool : the logit map.
Shows a link between REP and multiplicative weight algorithms (also noted by Ed Hopkins).
Comments I - What does the result really mean ?
The theorem says : the time-average of REP will lead you to some invariant set of BRD
It does not say : it will lead you to the same outcome as BRD
E.g., (BRD) and (REP) may lead to different equilibria.
Example (Golman and Page, 2010)
Payoff matrix (rows and columns ordered A, B, C) :
$\begin{pmatrix} 1 & -N & 0 \\ -N^2 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$
REP leads to everybody playing A, hence so does its time-average
But BRD leads to everybody playing B.
Precisely : there exists a sequence $\epsilon_N \to 0$ such that more than $1 - \epsilon_N$ of the state space flows to A under REP and to B under BRD.
Sketch of proof
REP : if $x_1 > 1/N$, strategy 1 (A) earns more than average, hence $\dot{x}_1 > 0$, hence REP leads to A.
BRD : for most initial conditions, flows first to C (unique best-response) then to B (best-response to C), hence leads to B.
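Both halves of the sketch can be watched on a single trajectory. The code below assumes the payoff matrix reads, row by row, (1, -N, 0), (-N², 0, 1), (0, 0, 0), which is a reconstruction of the garbled slide; the value N = 10, the initial condition, the Euler schemes, and the step sizes are illustrative choices.

```python
def game(N):
    """Assumed reading of the Golman-Page example (rows A, B, C)."""
    return [[1, -N, 0], [-N * N, 0, 1], [0, 0, 0]]

def payoffs(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def rep_endpoint(A, x0, h=0.001, steps=20000):
    """Explicit Euler integration of (REP) up to time h * steps."""
    x = list(x0)
    for _ in range(steps):
        a = payoffs(A, x)
        mean = sum(x_i * a_i for x_i, a_i in zip(x, a))
        x = [x_i + h * x_i * (a_i - mean) for x_i, a_i in zip(x, a)]
    return x

def brd_endpoint(A, x0, h=0.01, steps=3000):
    """Explicit Euler scheme for (BRD), moving towards one pure best reply
    per step (ties broken by lowest index)."""
    x = list(x0)
    for _ in range(steps):
        a = payoffs(A, x)
        b = max(range(len(a)), key=lambda i: a[i])
        x = [(1 - h) * x_i + (h if i == b else 0.0) for i, x_i in enumerate(x)]
    return x

A = game(10)
x0 = [0.3, 0.3, 0.4]  # x_1 > 1/N, and C is the unique best reply at x0
rep_end = rep_endpoint(A, x0)
brd_end = brd_endpoint(A, x0)
print("REP endpoint:", rep_end)
print("BRD endpoint:", brd_end)
```

From this starting point the replicator trajectory should end near A, while the best-reply trajectory first heads towards C and then ends near B.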
Comments II - extension
The analysis extends to a more general framework : “games against the environment”.
That is : the focal player faces a stream of vector payoffs, does not know if she plays against Nature, one player, several players...
Link then between time-average of REP and Fictitious Play
Comments III - lack of extensions
The proof uses two kinds of linearity :
- the growth rates are linearly related to the payoffs : a property of (REP)
- the payoffs are linear in the population profile : a property of one or two-population settings.
→ The link between BRD and the time-average of REP does not extend to variants of REP nor to three-population dynamics
Comments IV - no-regret vs Nash
The strategies surviving a selection process are optimal against the average past, but not necessarily against the present.
Accordingly, (REP) satisfies a no-regret property (Hofbauer) but not convergence to Nash equilibrium.
The End
Thank you very much