The Iterated Prisoner's Dilemma

Darwin:

The small strength and speed of man, his want of natural weapons, etc., are more than counterbalanced ... by his social qualities, which led him to give and receive aid from his fellow men.

Mutual aid

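Payoff for player I:

$$
\begin{array}{l|cc}
 & \text{II cooperates} & \text{II defects} \\ \hline
\text{I cooperates} & b - c & -c \\
\text{I defects} & b & 0
\end{array}
$$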

The one-shot PD

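Prisoner's Dilemma Game

players I and II

payoff for player I:

$$
\begin{array}{l|cc}
 & \text{II plays C} & \text{II plays D} \\ \hline
\text{I plays C} & R & S \\
\text{I plays D} & T & P
\end{array}
$$

R Reward, T Temptation, P Punishment, S Sucker's payoff

$T > R > P > S$

Social Dilemma (where the 'invisible hand' fails)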

Adam Smith (1723-1790)

• …by pursuing his own interest, man frequently promotes that of the society more effectually than when he really intends to promote it…

Adam Smith: Man intends only his own gain, and he is in this, as in many other cases, led by an invisible hand to promote an end which was no part of his intention.

Joseph Stiglitz: The reason that the invisible hand often seems invisible is that it is often not there.

Payoff for repeated games

probability for a further round: $w$

average number of rounds: $1/(1-w)$

total payoff $A(w)$: $A(w) = A(0) + wA(1) + w^2A(2) + \dots$

payoff per round: $(1-w)A(w)$

limiting case $w = 1$: payoff per round $\lim_{n\to\infty} \dfrac{A(0) + \dots + A(n)}{n+1}$
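The discounting above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the infinite sum is truncated to a finite list of round payoffs, which is harmless when $w < 1$ and the list is long.

```python
def total_payoff(round_payoffs, w):
    """A(w) = A(0) + w*A(1) + w^2*A(2) + ... for a finite list of round payoffs."""
    return sum((w ** n) * a for n, a in enumerate(round_payoffs))


def payoff_per_round(round_payoffs, w):
    """(1 - w) * A(w): total payoff divided by the average number of rounds."""
    return (1 - w) * total_payoff(round_payoffs, w)


def expected_rounds(w):
    """Average number of rounds when each further round occurs with probability w."""
    return 1 / (1 - w)
```

For a constant stream of payoffs the per-round payoff returns that constant (up to truncation), which is a quick sanity check of the normalisation.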

The Good, the Bad and the Discriminator

• ALLC

• ALLD

• TFT

• frequencies x,y,z (x+y+z=1)

Payoff matrix

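Payoffs up to factor $1/(1-w)$ (i.e. per round):

$$
\begin{array}{l|ccc}
 & \text{ALLC} & \text{ALLD} & \text{TFT} \\ \hline
\text{ALLC} & b - c & -c & b - c \\
\text{ALLD} & b & 0 & b(1-w) \\
\text{TFT} & b - c & -c(1-w) & b - c
\end{array}
$$

$P_x, P_y, P_z$: expected payoff for ALLC, ALLD, TFT

$\bar P$: average payoff in population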
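The per-round entries of the ALLC / ALLD / TFT payoff matrix can be checked by letting the strategies play a discounted donation game. The sketch below is an illustration with hypothetical helper names; it truncates the game after a fixed number of rounds, which assumes $w < 1$.

```python
def allc(co_player_moves):
    return "C"


def alld(co_player_moves):
    return "D"


def tft(co_player_moves):
    # Cooperate first, then repeat the co-player's previous move.
    return co_player_moves[-1] if co_player_moves else "C"


def per_round_payoff(s1, s2, b, c, w, rounds=2000):
    """(1 - w) * sum_n w^n * (payoff of player 1 in round n), donation game."""
    moves1, moves2 = [], []
    total, weight = 0.0, 1.0
    for _ in range(rounds):
        m1, m2 = s1(moves2), s2(moves1)
        total += weight * ((b if m2 == "C" else 0.0) - (c if m1 == "C" else 0.0))
        moves1.append(m1)
        moves2.append(m2)
        weight *= w
    return (1 - w) * total


def payoff_matrix(b, c, w):
    """Rows and columns ordered ALLC, ALLD, TFT; entries are row-player payoffs."""
    strategies = [allc, alld, tft]
    return [[per_round_payoff(r, s, b, c, w) for s in strategies] for r in strategies]
```

With $b = 3$, $c = 1$, $w = 0.9$ (illustrative values), TFT against ALLD earns $-c(1-w) = -0.1$ per round, since it is exploited only in the first round.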

Replicator Dynamics

replicator equation

$\dot x = x(P_x - \bar P)$

$\dot y = y(P_y - \bar P)$

$\dot z = z(P_z - \bar P)$

on unit simplex
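A minimal numerical sketch of the replicator equation for the frequencies $(x, y, z)$ of ALLC, ALLD and TFT, using explicit Euler steps. The step size, the renormalisation and all function names are assumptions made for illustration, not part of the talk.

```python
def payoff_matrix(b, c, w):
    """Per-round payoffs (row player); rows and columns ordered ALLC, ALLD, TFT."""
    return [
        [b - c, -c, b - c],
        [b, 0.0, b * (1 - w)],
        [b - c, -c * (1 - w), b - c],
    ]


def replicator_step(state, M, dt):
    """One explicit Euler step of x_i' = x_i * (P_i - P_bar)."""
    P = [sum(m * s for m, s in zip(row, state)) for row in M]
    P_bar = sum(s * p for s, p in zip(state, P))
    new = [s + dt * s * (p - P_bar) for s, p in zip(state, P)]
    total = sum(new)  # renormalise so rounding errors do not leave the simplex
    return [s / total for s in new]


def simulate(state, M, dt=0.01, steps=1000):
    """Iterate the Euler step and return the final frequencies."""
    for _ in range(steps):
        state = replicator_step(state, M, dt)
    return state
```

Starting without TFT, for example from `[0.5, 0.5, 0.0]`, the ALLD frequency grows; starting with a large share of TFT, it shrinks.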

Replicator Dynamics

middle zone: $\dfrac{(1-w)c}{w(b-c)} < z < \dfrac{c}{wb}$

IPD with errors

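$\varepsilon$: probability to mis-implement a move

stochastic strategies $(f, p, q)$

$f$: prob. to cooperate in initial round

$p$: prob. to cooperate after co-player cooperated

$q$: prob. to cooperate after co-player defected

ALLC: $(1,1,1) \to (1-\varepsilon, 1-\varepsilon, 1-\varepsilon)$

ALLD: $(0,0,0) \to (\varepsilon, \varepsilon, \varepsilon)$

TFT: $(1,1,0) \to (1-\varepsilon, 1-\varepsilon, \varepsilon)$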

IPD with errors

$(f,p,q)$ against $(f',p',q')$

payoff $\dfrac{-c(e + wre') + b(e' + wr'e)}{(1-w)(1-uw^2)}$

where $r := p - q$, $r' := p' - q'$, $u := rr'$,

$e := (1-w)f + wq$, $e' := (1-w)f' + wq'$
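The closed-form payoff for a reactive strategy $(f,p,q)$ against $(f',p',q')$ can be compared with a direct round-by-round computation. The sketch below assumes the donation game with benefit $b$ and cost $c$ and reads the formula as the total discounted payoff; the function names are hypothetical.

```python
def payoff_closed_form(s, s2, b, c, w):
    """Total discounted payoff of s = (f, p, q) against s2 = (f', p', q')."""
    f, p, q = s
    f2, p2, q2 = s2
    r, r2 = p - q, p2 - q2
    u = r * r2
    e = (1 - w) * f + w * q
    e2 = (1 - w) * f2 + w * q2
    numerator = -c * (e + w * r * e2) + b * (e2 + w * r2 * e)
    return numerator / ((1 - w) * (1 - u * w * w))


def payoff_by_iteration(s, s2, b, c, w, rounds=5000):
    """Same quantity, summing w^n * (expected payoff in round n) directly."""
    f, p, q = s
    f2, p2, q2 = s2
    x, y = f, f2  # probabilities that player I / player II cooperates this round
    total, weight = 0.0, 1.0
    for _ in range(rounds):
        total += weight * (-c * x + b * y)
        # Each player reacts only to the co-player's previous move.
        x, y = q + (p - q) * y, q2 + (p2 - q2) * x
        weight *= w
    return total
```

For ALLC against ALLC the formula reduces to $(b-c)/(1-w)$, the payoff of mutual cooperation over the expected number of rounds.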

IPD with errors

In the IPD, the limits $\varepsilon \to 0$ (no error) and $w \to 1$ (infinitely many rounds) do not commute

Evolving Generosity

Reacting on co-player

$(f,p,q)$ strategies, with $f = 1$

$(p,q)$ strategies

$p$: prob. to play C after co-player's C

$q$: prob. to play C after co-player's D

ALLD is $(0,0)$

Tit For Tat is $(1,0)$

assume errors!

The Iterated Prisoner's Dilemma

Adaptive Dynamics

let $x \in \mathbb{R}$ be some trait (sex-ratio, prob. to escalate...)

resident pop. homogeneous, all $x$

mutant minority, $y = x + h$, payoff $A(y,x)$

$h$ small

Adaptive Dynamics

can it invade? iff $W(h,x) := A(y,x) - A(x,x) > 0$

Adaptive Dynamics

trait substitution sequence (mutation-limited)

Adaptive Dynamics

trait substitution sequence (mutation-limited)

$\dot x = \dfrac{\partial W}{\partial h}(0,x) = \lim_{h \to 0} \dfrac{A(x+h,x) - A(x,x)}{h}$

points towards favorable direction

Adaptive Dynamics for the IPD

consider strategies $n = (p,q)$

$p$: prob. to play C after co-player used C

$q$: prob. to play C after co-player used D

$A(n',n)$: payoff for player using $n' = (p',q')$ in population where everyone else plays $n$

Invader's payoff difference, to first order in $n' - n$:

$A(n',n) - A(n,n) = (p'-p)\dfrac{\partial A}{\partial p'}(n',n) + (q'-q)\dfrac{\partial A}{\partial q'}(n',n)$

(partial derivatives evaluated for $n' = n$)


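the vector field $(\dot p, \dot q)$ with

$\dot p = \dfrac{\partial A}{\partial p'}(n',n)$, $\dot q = \dfrac{\partial A}{\partial q'}(n',n)$, evaluated at $n' = n = (p,q)$

points into direction most advantageous for invader $n'$

$n'$ can invade if $n'$ in half-plane defined by $(\dot p, \dot q)$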


For IPD:

$\dot p = \dfrac{q\,\bigl(b(p-q) - c\bigr)}{(1-(p-q))(1-(p-q)^2)}$

$\dot q = \dfrac{(1-p)\bigl(b(p-q) - c\bigr)}{(1-(p-q))(1-(p-q)^2)}$
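The adaptive-dynamics vector field for reactive strategies $(p,q)$ can be cross-checked against numerical derivatives of the invader's payoff. The sketch assumes the limit $w = 1$ with errors (so $|p - q| < 1$) and the donation game; the stationary cooperation levels are obtained here by solving $s' = q' + r's$, $s = q + rs'$, which is an assumption of this example, and the function names are hypothetical.

```python
def invader_payoff(invader, resident, b, c):
    """Payoff per round for invader n' = (p', q') against resident n = (p, q)."""
    p2, q2 = invader
    p, q = resident
    r2, r = p2 - q2, p - q
    s_inv = (q2 + r2 * q) / (1 - r * r2)  # invader's long-run cooperation level
    s_res = (q + r * q2) / (1 - r * r2)   # resident's long-run cooperation level
    return -c * s_inv + b * s_res


def vector_field(p, q, b, c):
    """(dA/dp', dA/dq') evaluated at n' = n = (p, q)."""
    r = p - q
    denom = (1 - r) * (1 - r * r)
    return q * (b * r - c) / denom, (1 - p) * (b * r - c) / denom


def numerical_field(p, q, b, c, h=1e-6):
    """Central finite differences of the invader's payoff around n' = n."""
    dp = (invader_payoff((p + h, q), (p, q), b, c)
          - invader_payoff((p - h, q), (p, q), b, c)) / (2 * h)
    dq = (invader_payoff((p, q + h), (p, q), b, c)
          - invader_payoff((p, q - h), (p, q), b, c)) / (2 * h)
    return dp, dq
```

Both components share the sign of $b(p-q) - c$, so cooperation increases exactly where $p - q > c/b$.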

Reacting on last round

memory-one strategies $(p_R, p_S, p_T, p_P)$

where $p_i$: prob. to play C after outcome $i$

ALLC is $(1,1,1,1)$

ALLD is $(0,0,0,0)$

TFT is $(1,0,1,0)$

Firm but Fair is $(1,0,1,1)$

Bully is $(0,0,0,1)$

16 non-probabilistic strategies

The fearsome four

• Heteroclinic network

• A = Tit for Tat

• B = Firm But Fair

• C = Bully

• D = ALLD

…and the winner is…

Win-Stay, Lose-Shift WSLS

WSLS

WSLS is error-correcting

If WSLS against WSLS:

C C C ... C D D C C

C C C ... C C D C C

If TFT against TFT:

C C C ... C D C D C D ...

C C C ... C C D C D C ...
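The two move sequences can be reproduced with a tiny simulation in which both players use the same deterministic rule and one move of the first player is flipped by mistake. This is a sketch; the function names and the choice of the error round are assumptions for illustration.

```python
def tft(mine, other):
    # Tit For Tat: repeat the co-player's previous move.
    return other


def wsls(mine, other):
    # Win-Stay, Lose-Shift: cooperate exactly when both made the same move.
    return "C" if mine == other else "D"


def play(strategy, rounds, error_round):
    """Both players use `strategy`; player 1's move is flipped in `error_round`."""
    a, b = "C", "C"
    history = []
    for n in range(rounds):
        if n == error_round:
            a = "D" if a == "C" else "C"  # mis-implemented move
        history.append((a, b))
        a, b = strategy(a, b), strategy(b, a)
    return history


def moves(history, player):
    """Move sequence of player 0 or 1 as a string such as 'CCDDCC'."""
    return "".join(pair[player] for pair in history)
```

After a single error, two WSLS players are back to mutual cooperation within two rounds, whereas two TFT players keep alternating.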

WSLS

WSLS is a ''simpleton'' against ALLD

WSLS: C D C D C D C ...

ALLD: D D D D D D D ...

payoff per round: WSLS gets $(P+S)/2$, ALLD gets $(T+P)/2$

WSLS cannot invade ALLD

ALLD cannot invade WSLS if $P + T < 2R$ (i.e. $2c < b$)
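A short sketch of WSLS playing against ALLD without errors, averaging the payoffs over many rounds. The payoff values in the usage note are illustrative donation-game numbers ($b = 3$, $c = 1$), and the function name is hypothetical.

```python
def wsls_vs_alld(rounds, T, R, P, S):
    """Average payoffs per round (WSLS, ALLD) when WSLS opens with C."""
    payoff = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    mine, other = "C", "D"  # WSLS's move, ALLD's move
    total_wsls = total_alld = 0.0
    for _ in range(rounds):
        total_wsls += payoff[(mine, other)]
        total_alld += payoff[(other, mine)]
        # WSLS cooperates next exactly when both moves coincided; ALLD stays at D.
        mine = "C" if mine == other else "D"
    return total_wsls / rounds, total_alld / rounds
```

With $T = 3$, $R = 2$, $P = 0$, $S = -1$ the averages are $(P+S)/2 = -0.5$ for WSLS and $(T+P)/2 = 1.5$ for ALLD, which stays below $R = 2$ because $b > 2c$.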

Win-Stay, Lose-Shift WSLS

• Simple learning rule

• stable, error-correcting

• but needs retaliator to prepare the ground

Memory-one strategies

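If $(p_R, p_S, p_T, p_P)$ against $(q_R, q_S, q_T, q_P)$:

transition matrix between states $R, S, T, P$

$$
Q = \begin{pmatrix}
p_R q_R & p_R(1-q_R) & (1-p_R)q_R & (1-p_R)(1-q_R) \\
p_S q_T & * & * & * \\
* & * & * & * \\
* & * & * & *
\end{pmatrix}
$$

allows computation of average payoff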
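The transition matrix between the states $R, S, T, P$ (seen from player I) and the resulting average payoff can be sketched as follows. The stationary distribution is approximated by repeated multiplication, which assumes strategies with errors (all probabilities strictly between 0 and 1) so that the chain converges; the function names are hypothetical.

```python
def transition_matrix(p, q):
    """Rows and columns ordered R, S, T, P from player I's point of view."""
    swap = (0, 2, 1, 3)  # player II sees outcomes S and T exchanged
    rows = []
    for i in range(4):
        x, y = p[i], q[swap[i]]  # prob. that I resp. II cooperates next
        rows.append([x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y)])
    return rows


def stationary(Q, steps=5000):
    """Approximate stationary distribution v with v = v Q by power iteration."""
    v = [0.25] * 4
    for _ in range(steps):
        v = [sum(v[i] * Q[i][j] for i in range(4)) for j in range(4)]
    return v


def average_payoff(p, q, payoffs):
    """Long-run payoff per round for player I; payoffs = (R, S, T, P)."""
    v = stationary(transition_matrix(p, q))
    return sum(vi * g for vi, g in zip(v, payoffs))
```

For two WSLS players with a small error rate, for example `p = q = (0.99, 0.01, 0.01, 0.99)`, the average payoff stays close to the reward $R$.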

Memory-one strategies

A new breath:

Press and Dyson PNAS 2012

AMS homepage ('Maths in the Media')

'The world of game theory is currently on fire...'

'this is a monumental surprise...'

'the emerging revolution of game theory...'

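Dyson's formula

Define

$$
D(p,q,x) := \det \begin{pmatrix}
p_R q_R - 1 & p_R - 1 & q_R - 1 & x_1 \\
p_S q_T & p_S - 1 & q_T & x_2 \\
p_T q_S & p_T & q_S - 1 & x_3 \\
p_P q_P & p_P & q_P & x_4
\end{pmatrix}
$$

then $P_I = \dfrac{D(p,q,g_I)}{D(p,q,\mathbf{1})}$

(with $g_I = (R,S,T,P)$ the payoffs of player I in the four states and $\mathbf{1} = (1,1,1,1)$)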
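Dyson's determinant formula can be evaluated directly for small examples. The sketch below assumes the state order $R, S, T, P$ with payoff vector $g_I = (R,S,T,P)$ for player I, uses a plain cofactor expansion instead of a linear-algebra library, and all function names are hypothetical.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def dyson_D(p, q, x):
    """D(p, q, x) for memory-one strategies p (player I) and q (player II)."""
    pR, pS, pT, pP = p
    qR, qS, qT, qP = q
    return det([
        [pR * qR - 1, pR - 1, qR - 1, x[0]],
        [pS * qT, pS - 1, qT, x[1]],
        [pT * qS, pT, qS - 1, x[2]],
        [pP * qP, pP, qP, x[3]],
    ])


def payoff(p, q, g):
    """Long-run payoff per round for the payoff vector g over states R, S, T, P."""
    return dyson_D(p, q, g) / dyson_D(p, q, [1.0, 1.0, 1.0, 1.0])
```

Passing $g_{II} = (R,T,S,P)$ instead of $g_I$ gives the co-player's payoff from the same two determinants.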

Zero-determinant (ZD) strategies

Examples of ZD-strategies

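$\alpha P_I + \beta P_{II} + \gamma = 0$

Equalizers: if $\alpha = 0$, then $P_{II} = -\gamma/\beta$ (any value between $P$ and $R$)

Extortioners: if $\chi := -\beta/\alpha > 1$ and $\gamma = -(\alpha+\beta)P$, then $P_I - P = \chi(P_{II} - P)$

(own ''surplus'' $P_I - P$ over maximin payoff $P$ always $\chi$-fold of co-player's surplus)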
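The extortion relation $P_I - P = \chi(P_{II} - P)$ can be illustrated numerically in the donation game, where $P = 0$. The sketch assumes the Press and Dyson parametrisation $\tilde p = \phi\,(g_I - \chi g_{II})$ with $\tilde p = (p_R - 1, p_S - 1, p_T, p_P)$; the scaling factor `phi`, the co-player's strategy and the function names are choices made for this example only.

```python
def extortioner(b, c, chi, phi):
    """Memory-one strategy (p_R, p_S, p_T, p_P) enforcing P_I = chi * P_II."""
    return (
        1 + phi * (1 - chi) * (b - c),
        1 - phi * (c + chi * b),
        phi * (b + chi * c),
        0.0,
    )


def long_run_payoffs(p, q, b, c, steps=5000):
    """Payoffs per round (player I, player II) from the stationary distribution."""
    swap = (0, 2, 1, 3)  # player II sees outcomes S and T exchanged
    v = [0.25] * 4       # distribution over the states R, S, T, P
    for _ in range(steps):
        new = [0.0] * 4
        for i in range(4):
            x, y = p[i], q[swap[i]]
            for j, prob in enumerate((x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y))):
                new[j] += v[i] * prob
        v = new
    g1 = (b - c, -c, b, 0.0)  # payoffs of player I in states R, S, T, P
    g2 = (b - c, b, -c, 0.0)  # payoffs of player II in the same states
    return (sum(vi * g for vi, g in zip(v, g1)),
            sum(vi * g for vi, g in zip(v, g2)))
```

With $b = 3$, $c = 1$, $\chi = 2$ and `phi = 0.1` the strategy is approximately $(0.8, 0.3, 0.5, 0)$, and against any co-player tried here player I earns twice as much as player II.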

Characterizations

ZD-strategies: $p_R + p_P = p_S + p_T$

Equalizers: $(b-c)(p_S - p_T - 1) = (b+c)(p_R - p_P - 1)$

Extortioners: $p_P = 0$ and …

All reactive strategies are ZD

Extortion does not pay

Pairwise comparison of extortion

Neutral against AllD

Stable coexistence with AllD

Weakly dominated by TFT

Dominated by WSLS

If all five: no Nash equilibrium involves extortion

Complier strategies
