Time-Average Replicator and Best-Reply Dynamics

Yannick Viossat (Université Paris-Dauphine)

joint work with

Josef Hofbauer (Universität Wien)

and

Sylvain Sorin (Université Paris 6 and École polytechnique)

Seminar on Discrete Mathematics and Game Theory, LSE, May 2010

Hofbauer, Sorin, Viossat Replicator and Best-Reply Dynamics

Philosophy

A quote from a famous German philosopher :

Proofs should not be made in public

B. von Stengel


Getting started...

Introduction to evolutionary game theory

Talk’s topic


Traditional vs evolutionary game theory

Standard game theory :

few agents

know the game (common knowledge)

high rationality, use elaborate thinking

Evolutionary game theory :

populations of agents,

need not fully understand the game

low rationality, use rules of thumb (or selection process)


Evolutionary Game Theory : conceptual framework

large population of agents

meet randomly, and play a symmetric game

strategies with good results spread (imitation, selection,...)

changes the average behaviour, hence the “good strategies”, hence the strategies that spread...

Two approaches : static and dynamic

Evolutionary game dynamics : dynamical system modeling such a process


Usual topics

Compare the outcome of evolutionary game dynamics with standard concepts in game theory :

are dominated strategies eliminated ?

do dynamics lead to Nash equilibria ?

if so, to which equilibrium ?

Here, different topic : relate the two most studied dynamics


Topic of the talk

Major dynamics : replicator (REP) and best-reply dynamics (BRD)

different interpretations

different degrees of rationality

different mathematical formulations

but in many examples, same long-run behaviour for BRD and the time-average of REP (Gaunersdorfer and Hofbauer, 1995)

Aim : find a formal link between BRD and time-average of REP


Outline

1 Framework, notation

2 Dynamics

3 Similarities between BRD and the time-average of REP

4 Main result : theoretical link

5 Intuition

6 Comments


Framework and notation - I

single, large population

randomly drawn agents play a two-player symmetric game

n possible pure strategies : 1, 2, ..., n

$x_i(t)$ : frequency of strategy $i$ at time $t$

$x(t) = (x_1(t), x_2(t), \dots, x_n(t))$ : state variable

state space : $\{(x_1, \dots, x_n) \in \mathbb{R}^n_+ : \sum_i x_i = 1\}$

Evolutionary game dynamics : $\dot{x} = f(x, \text{payoffs})$


Framework and notation - II

Payoffs :

payoff matrix $A = (a_{ij})_{1 \le i,j \le n}$

expected payoff of strategy $i$ against $x$ : $a_i(x) = (Ax)_i$

mean payoff : $x \cdot a(x)$, where $a(x) = (a_1(x), \dots, a_n(x))$

For any quantity $q(t)$, let $\bar{q}(t) = \frac{1}{t} \int_0^t q(s)\,ds$


Replicator Dynamics (REP)

(REP) $\dot{x}_i = x_i [a_i(x) - x \cdot a(x)]$ with $x = x(t)$

differential equation due to Taylor and Jonker (78),

growth rate : difference between own and average payoff

idea : payoff = additional fitness

prototype of biological dynamics

Time-Average of REP (TAREP)

$X(t) = \bar{x}(t) = \frac{1}{t} \int_0^t x(s)\,ds$ with $x(s)$ following (REP)
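To make the two definitions concrete, here is a minimal numerical sketch (not from the talk). It integrates (REP) with an explicit Euler scheme and accumulates the time-average X(t) along the way. The function name, the step size and the 2x2 toy payoff matrix (in which strategy 1 strictly dominates) are illustrative assumptions.

```python
import numpy as np

def replicator_with_time_average(A, x0, T, dt=0.01):
    """Euler-integrate (REP) on [0, T]; return x(T) and the time-average X(T)."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    integral = np.zeros_like(x)          # approximates the integral of x(s) over [0, t]
    steps = int(round(T / dt))
    for _ in range(steps):
        integral += dt * x               # left Riemann sum
        a = A @ x                        # a_i(x) = (Ax)_i
        x = x + dt * x * (a - x @ a)     # x_i' = x_i [a_i(x) - x . a(x)]
    return x, integral / (steps * dt)

# Hypothetical toy game: strategy 1 earns 2, strategy 2 earns 1, whatever the opponent does.
A_toy = [[2.0, 2.0], [1.0, 1.0]]
x_T, X_T = replicator_with_time_average(A_toy, [0.5, 0.5], T=50.0)
```

In this toy game x(T) sits numerically at the first vertex while X(T) still lags behind it, because the time-average keeps the memory of the whole past.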


Best Reply Dynamics (BRD)

(BRD) $\dot{x} \in BR(x) - x$

where BR(x) is the set of mixed best-replies to x

differential inclusion ; Matsui (91), Gilboa and Matsui (92)

population evolving towards best-reply to current situation

idea : in every time interval, a fraction of the population switches to a current best-response

prototype of rational (but myopic) dynamics
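A rough way to experiment with (BRD) is an Euler scheme in which, at each step, a fraction dt of the population switches to a current best reply. The sketch below is an assumption-laden illustration, not the talk's construction: (BRD) is a differential inclusion, and picking one pure best reply through `np.argmax` (ties go to the lowest index) selects just one of its possible solutions. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def best_reply_path(A, x0, T, dt=0.01):
    """Euler scheme for (BRD), always moving towards one pure best reply."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(int(round(T / dt))):
        b = np.zeros_like(x)
        b[np.argmax(A @ x)] = 1.0    # a pure best reply to the current state x
        x = x + dt * (b - x)         # a fraction dt of the population switches to b
    return x

# Hypothetical toy game in which strategy 1 strictly dominates strategy 2.
x_T = best_reply_path([[2.0, 2.0], [1.0, 1.0]], [0.5, 0.5], T=10.0)
```

Along such a path the state moves in a straight line towards the current best reply, and the distance to it shrinks roughly like exp(-t).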


Similarities between TAREP and BRD - I

1. Convergence results that also hold for REP :

Example : In so-called potential games and in games with an interior evolutionarily stable strategy, any interior solution converges to a Nash equilibrium (NE).

2. Convergence results that do not hold for REP :

Example : In zero-sum games with an interior equilibrium, any interior solution converges to the set of NE


Similarities between TAREP and BRD - II

3. Divergence “exactly in the same way” :

Example : Generalized Rock-Paper-Scissors game

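$$A = \begin{pmatrix} 0 & \epsilon & -1 \\ -1 & 0 & \epsilon \\ \epsilon & -1 & 0 \end{pmatrix}$$

If $\epsilon \ge 1$, then any interior solution converges to the unique NE

If $\epsilon < 1$, then any interior solution converges to the “Shapley triangle” (Gaunersdorfer and Hofbauer, 1995).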
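The convergent case can be checked numerically. The sketch below (an illustration, with hypothetical function names, step size, horizon and initial condition) builds the generalized Rock-Paper-Scissors matrix, Euler-integrates (REP) and returns the time-average, here for ε = 2.

```python
import numpy as np

def rps_matrix(eps):
    """Generalized Rock-Paper-Scissors payoff matrix: win eps, lose 1."""
    return np.array([[0.0, eps, -1.0],
                     [-1.0, 0.0, eps],
                     [eps, -1.0, 0.0]])

def time_average_of_rep(A, x0, T, dt=0.01):
    """Euler-integrate (REP) and return the time-average X(T)."""
    x = np.asarray(x0, dtype=float).copy()
    integral = np.zeros_like(x)
    steps = int(round(T / dt))
    for _ in range(steps):
        integral += dt * x
        a = A @ x
        x = x + dt * x * (a - x @ a)
    return integral / (steps * dt)

X_T = time_average_of_rep(rps_matrix(2.0), [0.5, 0.3, 0.2], T=300.0)
distance_to_ne = float(np.max(np.abs(X_T - 1.0 / 3.0)))   # NE is (1/3, 1/3, 1/3)
```

The same function can be called with ε < 1 to watch X(t) settle away from the NE, but x(t) then gets extremely close to the boundary over long runs, so a plain Euler scheme may need a smaller step or a more careful integrator.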


Main result

(BRD) and (REP) look very different, yet there are striking similarities between the behaviour of (BRD) and of the time-average of (REP).

Why ?

Main result : formal link

Up to a change in time, any interior solution of the time-average of REP is a perturbed solution of BRD, with the perturbation vanishing as $t \to +\infty$.


Formal statement

Define the perturbed best-reply correspondence $BR^\epsilon$ by :

$y \in BR^\epsilon(x)$ if $\forall i$, $[\max_j a_j(x)] - a_i(x) > \epsilon \Rightarrow y_i < \epsilon$

Thm : if $X(\cdot)$ is the time-average of an interior solution of REP, then

$$\dot{X}(t) \in \frac{1}{t}\left(BR^{\epsilon(t)}(X(t)) - X(t)\right)$$

with $\epsilon(t) \to 0$ as $t \to +\infty$
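The definition of the perturbed best-reply correspondence translates directly into a membership test. The following sketch is only an illustration; the function name and the demo matrix and state are hypothetical.

```python
import numpy as np

def in_perturbed_br(y, x, A, eps):
    """True if y is in BR^eps(x): every strategy whose payoff against x is more
    than eps below the best payoff carries weight less than eps in y."""
    a = np.asarray(A, dtype=float) @ np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    bad = (a.max() - a) > eps            # strategies that are not eps-best replies
    return bool(np.all(y[bad] < eps))

A_demo = np.array([[0.0, 2.0, -1.0],
                   [-1.0, 0.0, 2.0],
                   [2.0, -1.0, 0.0]])
x_demo = np.array([0.5, 0.3, 0.2])       # payoffs against x_demo: (0.4, -0.1, 0.7)

pure_best = in_perturbed_br([0.0, 0.0, 1.0], x_demo, A_demo, eps=0.05)
too_much_on_1 = in_perturbed_br([0.2, 0.0, 0.8], x_demo, A_demo, eps=0.05)
loose_eps = in_perturbed_br([0.2, 0.0, 0.8], x_demo, A_demo, eps=0.5)
```

A larger ε tolerates both larger payoff gaps and larger weights on inferior strategies, which is why the third call accepts a vector that the second call rejects.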


Corollary

Corollary : the limit set of any interior solution of the time-average of REP “has the same properties” as a true limit set of BRD

That is : internally chain transitive under BRD, hence invariant.

Proof : apply results on perturbed differential inclusions due to Benaïm, Hofbauer and Sorin (2005, 2006).


Consequences

(almost) all properties mentioned above

in any zero-sum game (even with no interior equilibrium), every interior solution of TAREP converges to the set of NE

a better understanding


Intuition - I

We want to show that under REP, the time-average of the past evolves towards an approximate best-response to itself.

First idea : past of tomorrow = (past of today) + today

Formally : $\dot{X} = \frac{1}{t}(x - X)$

We want : $\dot{X} \in \frac{1}{t}\left(BR^{\epsilon(t)}(X) - X\right)$

We need : $x \in BR^{\epsilon(t)}(X)$
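The formal version of the “first idea” comes from differentiating the definition of the time-average. Spelled out for $t > 0$:

```latex
X(t) = \frac{1}{t}\int_0^t x(s)\,ds
\quad\Longrightarrow\quad
\dot{X}(t) = -\frac{1}{t^2}\int_0^t x(s)\,ds + \frac{1}{t}\,x(t)
           = \frac{1}{t}\bigl(x(t) - X(t)\bigr).
```

Substituting any $x(t) \in BR^{\epsilon(t)}(X(t))$ into the right-hand side gives exactly the inclusion that is wanted, which is why the whole problem reduces to the last line of the slide.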


Intuition - II

We need $x \in BR^{\epsilon(t)}(X)$, that is : eventually, strategies having a high share are almost best replies to the average population of the past.

Idea : REP is a selection process.

Strategies having a high share now are those that had :

- a good average growth rate in the past

- hence (?) a good average payoff in the past

- hence (?) a good payoff against the average population of the past

Problems : justify both “hence” + good versus best


Growth rate and payoff

good average growth rate ⇔ good average payoff ?

Yes, because differences in growth rates = differences in payoffs

Recall :

(REP) $\dot{x}_i = x_i [a_i(x) - x \cdot a(x)]$ with $x = x(t)$

Let $g_i = \dot{x}_i / x_i$ and $a_i = a_i(x)$. We have :

$g_i - g_j = a_i - a_j$

hence $\bar{g}_i - \bar{g}_j = \bar{a}_i - \bar{a}_j$


Past average payoff and payoff against average past

good average payoff in the past

⇔ good payoff against average population of the past ?

Yes because the payoffs are linear in the population profile :

$a_i(x) = (Ax)_i$ hence $\bar{a}_i = \overline{(Ax)_i} = (A\bar{x})_i = (AX)_i$


Good versus best

We saw : surviving strategies are good responses to the average population profile of the past

Why almost best responses ?

Answer : over a long period of time, a small difference in selection pressures makes a large difference in shares

→ strategies that are good but not best-responses to the past are eliminated


Link with logit map

$BR(x)$ is multivalued, with no $C^1$ selection

Logit approximation : $br^\epsilon(x) = \arg\max_{y \in \Delta} \left( y \cdot a(x) - \epsilon \sum_k y_k \ln y_k \right)$

Unique solution : $br^\epsilon(x) = L(a(x)/\epsilon)$ with $L_i(U) = \frac{\exp(U_i)}{\sum_j \exp(U_j)}$

$L$ : logit map, appears in multiplicative weight algorithms

Prop : the solution of (REP) starting at the barycenter satisfies $x(t) = br^\epsilon(X(t))$ with $\epsilon = 1/t$.


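Proof. We have :

$$\frac{\dot{x}_i}{x_i} - \frac{\dot{x}_j}{x_j} = a_i - a_j$$

Integrating and assuming $x_i(0) = x_j(0)$, this gives :

$$\ln(x_i/x_j) = \int_0^t (a_i - a_j) = t(\bar{a}_i - \bar{a}_j)$$

hence

$$\frac{x_i}{x_j} = \exp(t[\bar{a}_i - \bar{a}_j]) = \frac{\exp(t\bar{a}_i)}{\exp(t\bar{a}_j)}$$

hence, letting $\epsilon = 1/t$ and $Z$ be a normalization factor :

$$x_i(t) = \frac{\exp(t\bar{a}_i)}{Z} = \frac{\exp(\bar{a}_i/\epsilon)}{Z} = br^\epsilon_i(X(t))$$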
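The proposition lends itself to a numerical sanity check. The sketch below is an illustration under stated assumptions: the 3x3 payoff matrix, the horizon and the step size are arbitrary choices, and the classical fourth-order Runge-Kutta scheme is used only for accuracy. It integrates (REP) from the barycenter together with the running integral of x, then compares x(T) with the logit map applied to T·A·X(T), which is br^ε(X(T)) for ε = 1/T.

```python
import numpy as np

def softmax(u):
    """Logit map L, shifted by max(u) for numerical stability."""
    w = np.exp(u - u.max())
    return w / w.sum()

def rep_and_integral(A, T, dt=0.005):
    """RK4 on the joint system (x, I) with x' = REP(x), I' = x, x(0) = barycenter."""
    n = A.shape[0]

    def f(z):
        x = z[:n]
        a = A @ x
        return np.concatenate([x * (a - x @ a), x])

    z = np.concatenate([np.full(n, 1.0 / n), np.zeros(n)])
    for _ in range(int(round(T / dt))):
        k1 = f(z)
        k2 = f(z + 0.5 * dt * k1)
        k3 = f(z + 0.5 * dt * k2)
        k4 = f(z + dt * k3)
        z = z + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return z[:n], z[n:]

# Hypothetical payoff matrix, chosen only so that the barycenter is not an equilibrium.
A_demo = np.array([[0.0, 2.0, -1.0],
                   [1.0, 0.0, 1.0],
                   [-1.0, 3.0, 0.0]])
T = 5.0
x_T, integral = rep_and_integral(A_demo, T)
X_T = integral / T                          # time-average X(T)
logit = softmax(T * (A_demo @ X_T))         # br^eps(X(T)) with eps = 1/T
gap = float(np.max(np.abs(x_T - logit)))
```

The two vectors should agree up to the integration error, so `gap` measures the accuracy of the scheme rather than any defect in the identity.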


Summary

REP and BRD seem very different but actually, the time-average of REP is related to BRD

The link can be made using another standard tool : the logit map.

Shows a link between REP and multiplicative weight algorithms (also noted by Ed Hopkins).


Comments I - What does the result really mean ?

The theorem says the time-average of REP : will lead you to some invariant set of BRD

Does not say : will lead you to the same outcome as BRD

E.g., (BRD) and (REP) may lead to different equilibria.


Example (Golman and Page, 2010)

$$\begin{array}{c|ccc} & A & B & C \\ \hline A & 1 & -N & 0 \\ B & -N^2 & 0 & 1 \\ C & 0 & 0 & 0 \end{array}$$

REP leads to everybody playing A, hence so does its time-average

But BRD leads to everybody playing B.

Precisely : there exists a sequence $\epsilon_N \to 0$ such that more than $1 - \epsilon_N$ of the state space flows to A under REP and to B under BRD.


Sketch of proof

$$\begin{array}{c|ccc} & A & B & C \\ \hline A & 1 & -N & 0 \\ B & -N^2 & 0 & 1 \\ C & 0 & 0 & 0 \end{array}$$

REP : if $x_1 > 1/N$, strategy 1 earns more than average, hence $\dot{x}_1 > 0$, hence leads to A.

BRD : for most initial conditions, flows first to C (unique best-response) then to B (best-response to C), hence leads to B.
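The two halves of this sketch can be reproduced numerically. The code below is an illustration only: it assumes the payoff matrix with rows A = (1, -N, 0), B = (-N², 0, 1), C = (0, 0, 0), takes N = 5 and one arbitrary initial condition with x_1 > 1/N, and uses plain Euler schemes with hypothetical function names. For (BRD), `np.argmax` picks one pure best reply per step, which is one selection among the solutions of the inclusion.

```python
import numpy as np

def golman_page_matrix(N):
    """Payoff matrix of the example, rows and columns ordered A, B, C."""
    return np.array([[1.0, -N, 0.0],
                     [-N**2, 0.0, 1.0],
                     [0.0, 0.0, 0.0]])

def rep_endpoint(A, x0, T, dt=1e-3):
    """State reached at time T by Euler-integrating (REP) from x0."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(int(round(T / dt))):
        a = A @ x
        x = x + dt * x * (a - x @ a)
    return x

def brd_endpoint(A, x0, T, dt=1e-2):
    """State reached at time T by an Euler scheme for (BRD) with pure best replies."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(int(round(T / dt))):
        b = np.zeros_like(x)
        b[np.argmax(A @ x)] = 1.0
        x = x + dt * (b - x)
    return x

A_gp = golman_page_matrix(5.0)
x0 = [0.4, 0.3, 0.3]                     # x_1 = 0.4 > 1/N = 0.2
x_rep = rep_endpoint(A_gp, x0, T=30.0)   # expected to end near vertex A
x_brd = brd_endpoint(A_gp, x0, T=30.0)   # expected to end near vertex B
```

From this initial condition C is at first the unique best reply, so the BRD path heads towards C until B becomes a best reply and then turns towards B, while under REP the share of A grows from the start.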


Comments II - extension

The analysis extends to a more general framework : “games against the environment”.

That is : the focal player faces a stream of vector payoffs, and does not know if she plays against Nature, one player, several players...

Link then between time-average of REP and Fictitious Play


Comments III - lack of extensions

The proof uses two kinds of linearity :

- the growth rates are linearly related to the payoffs : a property of (REP)

- the payoffs are linear in the population profile : a property of one- or two-population settings.

→ The link between BRD and the time-average of REP does not extend to variants of REP nor to three-population dynamics


Comments IV - no-regret vs Nash

The strategies surviving a selection process are optimal against the average past, but not necessarily against the present.

Accordingly, (REP) satisfies a no-regret property (Hofbauer) but not convergence to Nash equilibrium.


The End

Thank you very much

