The Iterated Prisoner's Dilemma
Darwin:
The small strength and speed of man, his want of natural weapons, etc., are more than counterbalanced ... by his social qualities, which led him to give and receive aid from his fellow men.
Mutual aid

                  II cooperates   II defects
I cooperates      b - c           -c
I defects         b               0
The one-shot PD
Prisoner's Dilemma Game
players I and II
payoff for player I:

              II plays C   II plays D
I plays C     R            S
I plays D     T            P

R Reward, T Temptation, P Punishment, S Sucker's payoff
$T > R > P > S$
Social Dilemma (where the 'invisible hand' fails)
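The dilemma can be made concrete with a minimal sketch. It assumes the mutual-aid payoffs R = b - c, S = -c, T = b, P = 0, with illustrative values b = 3 and c = 1 that are not taken from the slides; `best_reply` is a hypothetical helper name.

```python
# Donation-game payoffs (assumed example values: b = 3, c = 1).
b, c = 3.0, 1.0
R, S, T, P = b - c, -c, b, 0.0

# payoff[my_move][co_move]: payoff for player I
payoff = {"C": {"C": R, "D": S},
          "D": {"C": T, "D": P}}

def best_reply(co_move):
    """Return the move that maximizes player I's payoff against co_move."""
    return max("CD", key=lambda my_move: payoff[my_move][co_move])

# Defection is the best reply to both moves of the co-player ...
print(best_reply("C"), best_reply("D"))
# ... although mutual cooperation pays more than mutual defection.
print(R > P)
```

Under these assumed values, defecting is dominant for each player while both would do better by cooperating, which is the sense in which the 'invisible hand' fails.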
Adam Smith (1723-1790)
• …by pursuing his own interest, man frequently promotes that of the society more effectually than when he really intends to promote it…
Adam Smith: Man intends only his own gain, and he is in this, as in many other cases, led by an invisible hand to promote an end which was no part of his intention.
Joseph Stiglitz: The reason that the invisible hand often seems invisible is that it is often not there.
Payoff for repeated games
probability for a further round: $w$
average number of rounds: $1/(1-w)$
total payoff: $A(w) := A(0) + wA(1) + w^2A(2) + \dots$
payoff per round: $(1-w)A(w)$
limiting case $w = 1$: payoff per round $\lim_{n \to \infty} \frac{A(0) + \dots + A(n)}{n+1}$
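The discounted sum and its per-round normalization can be sketched numerically. The function names and the truncation after a fixed number of rounds are assumptions made for illustration only.

```python
def total_payoff(per_round, w, rounds=10_000):
    """Discounted sum A(w) = sum_n w**n * A(n), truncated after `rounds` terms."""
    return sum(w**n * per_round(n) for n in range(rounds))

def payoff_per_round(per_round, w, rounds=10_000):
    """Normalized payoff (1 - w) * A(w)."""
    return (1 - w) * total_payoff(per_round, w, rounds)

w = 0.9
constant = lambda n: 2.0                      # payoff 2 in every round
first_only = lambda n: 2.0 if n == 0 else 0.0  # payoff 2 in round 0 only

# With payoff 1 in every round, A(w) is the average number of rounds 1/(1-w).
print(total_payoff(lambda n: 1.0, w))
print(payoff_per_round(constant, w), payoff_per_round(first_only, w))
```

A payoff that is earned only in the first round is weighted by (1 - w), so it vanishes from the per-round payoff as w approaches 1.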
The Good, the Bad and the Discriminator
• ALLC
• ALLD
• TFT
• frequencies x,y,z (x+y+z=1)
Payoff matrix
(up to factor $1/(1-w)$, i.e. per round)

         ALLC     ALLD         TFT
ALLC     b - c    -c           b - c
ALLD     b        0            b(1 - w)
TFT      b - c    -c(1 - w)    b - c

$P_x, P_y, P_z$: expected payoff for ALLC, ALLD, TFT
average payoff in population: $\bar{P} = xP_x + yP_y + zP_z$
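The per-round payoffs of ALLC, ALLD and TFT against each other can be checked by playing the strategies round by round. This is a sketch under assumed values b = 3, c = 1, w = 0.8; the helper names are hypothetical and the infinite sum is truncated.

```python
def move(strategy, co_last):
    """Next move of ALLC, ALLD or TFT given the co-player's last move (None in round 0)."""
    if strategy == "ALLC":
        return "C"
    if strategy == "ALLD":
        return "D"
    return "C" if co_last in (None, "C") else "D"   # TFT

def per_round_payoff(s1, s2, b, c, w, rounds=2000):
    """(1 - w) times the discounted payoff of s1 against s2 (benefit b, cost c)."""
    total, last1, last2 = 0.0, None, None
    for n in range(rounds):
        m1, m2 = move(s1, last2), move(s2, last1)
        total += w**n * ((b if m2 == "C" else 0.0) - (c if m1 == "C" else 0.0))
        last1, last2 = m1, m2
    return (1 - w) * total

b, c, w = 3.0, 1.0, 0.8
names = ["ALLC", "ALLD", "TFT"]
matrix = [[per_round_payoff(r, s, b, c, w) for s in names] for r in names]
for name, row in zip(names, matrix):
    print(name, [round(v, 6) for v in row])
```

TFT differs from ALLC only against ALLD, where it pays the cost c once in the first round and then stops cooperating.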
Replicator Dynamics
replicator equation on unit simplex:
$\dot{x} = x(P_x - \bar{P})$
$\dot{y} = y(P_y - \bar{P})$
$\dot{z} = z(P_z - \bar{P})$
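A replicator equation of this form can be integrated with a simple Euler scheme. The sketch below assumes per-round payoffs of ALLC, ALLD and TFT in a game with benefit b and cost c; the step size, the parameter values and the function names are illustrative assumptions.

```python
def payoffs(x, y, z, b, c, w):
    """Per-round expected payoffs of ALLC, ALLD, TFT at frequencies (x, y, z)."""
    px = (b - c) * (x + z) - c * y
    py = b * x + b * (1 - w) * z
    pz = (b - c) * (x + z) - c * (1 - w) * y
    return px, py, pz

def replicator_step(state, b, c, w, dt):
    """One explicit Euler step of the replicator equation."""
    x, y, z = state
    px, py, pz = payoffs(x, y, z, b, c, w)
    avg = x * px + y * py + z * pz
    new = [x + dt * x * (px - avg), y + dt * y * (py - avg), z + dt * z * (pz - avg)]
    total = sum(new)                 # renormalize to absorb round-off
    return [v / total for v in new]

def simulate(state, b=3.0, c=1.0, w=0.9, dt=0.01, steps=20_000):
    for _ in range(steps):
        state = replicator_step(state, b, c, w, dt)
    return state

# Without ALLC, the outcome depends on the initial TFT frequency.
print(simulate([0.0, 0.9, 0.1]))    # TFT starts above its invasion threshold
print(simulate([0.0, 0.97, 0.03]))  # TFT starts below its invasion threshold
```

With these assumed parameters, the ALLD-TFT edge is bistable: TFT takes over only if it is initially frequent enough.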
Replicator Dynamics
middle zone: $\frac{(1-w)c}{w(b-c)} < z < \frac{c}{wb}$
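The two thresholds for the TFT frequency z can be derived from the per-round payoffs of ALLC, ALLD and TFT. The following worked step is a sketch that assumes benefit b, cost c, continuation probability w and x + y + z = 1.

```latex
\begin{align*}
P_x &= (b-c)(x+z) - cy, \\
P_y &= bx + b(1-w)z, \\
P_z &= (b-c)(x+z) - c(1-w)y, \\
P_y - P_x &= c - wbz
  && \text{(ALLD beats ALLC iff } z < \tfrac{c}{wb}\text{)}, \\
P_z - P_y\big|_{x=0} &= w(b-c)z - (1-w)c
  && \text{(TFT beats ALLD iff } z > \tfrac{(1-w)c}{w(b-c)}\text{)}.
\end{align*}
```

The lower threshold lies below the upper one exactly when w > c/b, so the zone between them is non-empty only if a further round is sufficiently likely.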
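IPD with errors
probability to mis-implement a move: $\varepsilon$
stochastic strategies $(f, p, q)$
$f$: prob. to cooperate in initial round
$p$: prob. to cooperate after co-player cooperated
$q$: prob. to cooperate after co-player defected
ALLC: $(1, 1, 1) \to (1-\varepsilon, 1-\varepsilon, 1-\varepsilon)$
ALLD: $(0, 0, 0) \to (\varepsilon, \varepsilon, \varepsilon)$
TFT: $(1, 1, 0) \to (1-\varepsilon, 1-\varepsilon, \varepsilon)$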
IPD with errors
$(f, p, q)$ against $(f', p', q')$
payoff: $\frac{-c(e + wre') + b(e' + wr'e)}{(1-w)(1-uw^2)}$
where $r := p - q$, $r' := p' - q'$, $u := rr'$,
$e := (1-w)f + wq$, $e' := (1-w)f' + wq'$
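A closed-form payoff for one stochastic strategy (f, p, q) against another can be checked against a direct round-by-round computation of the cooperation probabilities. The sketch assumes benefit b, cost c, continuation probability w and an illustrative error rate of 0.05; the function names are hypothetical.

```python
def payoff_closed_form(s, s2, b, c, w):
    """Total discounted payoff of s = (f, p, q) against s2 = (f', p', q')."""
    (f, p, q), (f2, p2, q2) = s, s2
    r, r2 = p - q, p2 - q2
    u = r * r2
    e, e2 = (1 - w) * f + w * q, (1 - w) * f2 + w * q2
    return (-c * (e + w * r * e2) + b * (e2 + w * r2 * e)) / ((1 - w) * (1 - u * w**2))

def payoff_by_iteration(s, s2, b, c, w, rounds=5000):
    """Same payoff, summed round by round (truncated after `rounds` terms)."""
    (f, p, q), (f2, p2, q2) = s, s2
    x, y = f, f2                      # cooperation probabilities in the current round
    total = 0.0
    for n in range(rounds):
        total += w**n * (b * y - c * x)
        # next-round cooperation probability: q + (p - q) * (co-player's current one)
        x, y = q + (p - q) * y, q2 + (p2 - q2) * x
    return total

eps = 0.05
noisy_tft = (1 - eps, 1 - eps, eps)
noisy_alld = (eps, eps, eps)
print(payoff_closed_form(noisy_tft, noisy_alld, 3.0, 1.0, 0.9))
print(payoff_by_iteration(noisy_tft, noisy_alld, 3.0, 1.0, 0.9))
```

The iteration works on probabilities only because the payoff is additive: each player's expected gain depends separately on the own and on the co-player's cooperation probability.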
IPD with errors
In the IPD, the limits $\varepsilon \to 0$ (no error) and $w \to 1$ (infinitely many rounds) do not commute.
Evolving Generosity
Reacting on co-player
$(f, p, q)$ strategies, with $w = 1$
$(p, q)$ strategies
$p$: prob. to play C after co-player's C
$q$: prob. to play C after co-player's D
ALLD is (0,0)
Tit For Tat is (1,0)
assume errors!
The Iterated Prisoner's Dilemma
Adaptive Dynamics
let $x \in \mathbb{R}$ be some trait (sex-ratio, prob. to escalate...)
resident pop. homogeneous, all $x$
mutant minority, $y = x + h$ ($h$ small), payoff $A(y, x)$
can it invade? iff $W(h, x) := A(y, x) - A(x, x) > 0$
trait substitution sequence (mutation-limited)
$\dot{x} = \frac{\partial W}{\partial h}(0, x) = \lim_{h \to 0} \frac{A(x+h, x) - A(x, x)}{h}$
points towards favorable direction
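The invasion condition and the resulting trait dynamics can be sketched for any payoff function A(y, x). The example payoff below is an assumed toy model for the sex ratio (Shaw-Mohler form) that is not taken from the slides; the function names, step size and finite-difference width are also illustrative assumptions.

```python
def invasion_fitness(A, h, x):
    """W(h, x) = A(x + h, x) - A(x, x): advantage of a rare mutant y = x + h."""
    return A(x + h, x) - A(x, x)

def selection_gradient(A, x, h=1e-6):
    """Central-difference estimate of dW/dh at h = 0."""
    return (invasion_fitness(A, h, x) - invasion_fitness(A, -h, x)) / (2 * h)

def sex_ratio_payoff(y, x):
    # Assumed toy payoff: mutant sex ratio y in a resident population with sex ratio x.
    return y / x + (1 - y) / (1 - x)

def trait_substitution(A, x, dt=0.01, steps=5000):
    """Follow the trait along the selection gradient with Euler steps."""
    for _ in range(steps):
        x += dt * selection_gradient(A, x)
    return x

print(invasion_fitness(sex_ratio_payoff, 0.1, 0.2) > 0)  # mutant 0.3 can invade resident 0.2
print(trait_substitution(sex_ratio_payoff, 0.2))          # approaches the even sex ratio
```

In this toy model the gradient vanishes at x = 1/2, so the substitution sequence comes to rest at an even sex ratio.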
Adaptive Dynamics for the IPD
consider strategies $n = (p, q)$
$p$: prob. to play C after co-player used C
$q$: prob. to play C after co-player used D
$A(n', n)$: payoff for player using $n' = (p', q')$ in population where everyone else plays $n$
Invader's payoff difference:
$A(n', n) - A(n, n) \approx (p' - p)\frac{\partial A}{\partial p'}(n', n) + (q' - q)\frac{\partial A}{\partial q'}(n', n)$
(partial derivatives evaluated for $n' = n$)
the vector field $(\dot{p}, \dot{q})$ with
$\dot{p} = \frac{\partial A}{\partial p'}(n', n)$, $\dot{q} = \frac{\partial A}{\partial q'}(n', n)$
points into direction most advantageous for invader $n'$
$n'$ can invade if $n'$ in half-plane defined by $(\dot{p}, \dot{q})$
For IPD:
$\dot{p} = \frac{q\,(b(p-q) - c)}{(1-(p-q))(1-(p-q)^2)}$
$\dot{q} = \frac{(1-p)\,(b(p-q) - c)}{(1-(p-q))(1-(p-q)^2)}$
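The adaptive dynamics of reactive strategies (p, q) can be cross-checked numerically. The sketch assumes w = 1, benefit b and cost c, and the standard expression for the long-run cooperation levels of two reactive strategies; it compares a closed-form vector field with finite-difference derivatives of the invader's payoff. Function names and parameter values are illustrative.

```python
def payoff(n_mut, n_res, b, c):
    """Per-round payoff of reactive strategy n_mut = (p', q') against n_res = (p, q)."""
    (p1, q1), (p2, q2) = n_mut, n_res
    r1, r2 = p1 - q1, p2 - q2
    s1 = (q1 + r1 * q2) / (1 - r1 * r2)   # mutant's long-run cooperation level
    s2 = (q2 + r2 * q1) / (1 - r1 * r2)   # resident's long-run cooperation level
    return b * s2 - c * s1

def vector_field(p, q, b, c):
    """Closed-form selection gradient at the resident strategy (p, q)."""
    r = p - q
    common = (b * r - c) / ((1 - r) * (1 - r**2))
    return q * common, (1 - p) * common

def numerical_gradient(p, q, b, c, h=1e-6):
    """Partial derivatives of the invader's payoff, evaluated at n' = n."""
    dp = (payoff((p + h, q), (p, q), b, c) - payoff((p - h, q), (p, q), b, c)) / (2 * h)
    dq = (payoff((p, q + h), (p, q), b, c) - payoff((p, q - h), (p, q), b, c)) / (2 * h)
    return dp, dq

print(vector_field(0.8, 0.3, 3.0, 1.0))
print(numerical_gradient(0.8, 0.3, 3.0, 1.0))
```

Both components share the sign of b(p - q) - c, so p and q increase together where p - q > c/b and decrease together otherwise.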
Reacting on last round
memory-one strategies $(p_R, p_S, p_T, p_P)$
where $p_i$: prob. to play C after outcome $i$
ALLC is (1,1,1,1)
ALLD is (0,0,0,0)
TFT is (1,0,1,0)
Firm but Fair is (1,0,1,1)
Bully is (0,0,0,1)
16 non-probabilistic strategies
The fearsome four
• Heteroclinic network
• A = Tit for Tat
• B = Firm But Fair
• C = Bully
• D = ALLD
…and the winner is…
Win-Stay, Lose-Shift WSLS
WSLS
WSLS is error-correcting
If WSLS against WSLS:
C C C ... C D D C C
C C C ... C C D C C
If TFT against TFT:
C C C ... C D C D C D ...
C C C ... C C D C D C ...
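The effect of a single mistaken defection can be replayed in a few lines. This is a minimal sketch with deterministic TFT and WSLS rules; the function names are hypothetical, and the history starts at the round in which player 1 defects by mistake.

```python
def tft(my_last, co_last):
    """Tit for Tat: repeat the co-player's last move."""
    return co_last

def wsls(my_last, co_last):
    """Win-Stay, Lose-Shift: keep the move if the co-player cooperated, else switch."""
    if co_last == "C":
        return my_last
    return "D" if my_last == "C" else "C"

def play_after_error(strategy, rounds=8):
    """Both players use `strategy`; player 1 has just defected once by mistake."""
    last1, last2 = "D", "C"             # the round containing the error
    history = [(last1, last2)]
    for _ in range(rounds - 1):
        last1, last2 = strategy(last1, last2), strategy(last2, last1)
        history.append((last1, last2))
    return "".join(a for a, _ in history), "".join(b for _, b in history)

print(play_after_error(wsls))   # one round of mutual defection, then cooperation resumes
print(play_after_error(tft))    # the players alternate defections indefinitely
```

Two WSLS players return to mutual cooperation after two rounds, whereas two TFT players keep retaliating in turn.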
WSLS
WSLS is a ''simpleton'' against ALLD
WSLS: C D C D C D C ...
ALLD: D D D D D D D ...
payoff per round: WSLS gets $(P+S)/2$, ALLD gets $(P+T)/2$
WSLS cannot invade ALLD
ALLD cannot invade WSLS if $P + T < 2R$ (i.e. $c < b/2$)
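The average payoffs of WSLS against ALLD can be obtained by simulating the alternation directly. The sketch assumes error-free play, WSLS starting with C, and illustrative payoff values R = b - c, S = -c, T = b, P = 0 with b = 3, c = 1; the function name is hypothetical.

```python
def average_payoffs_wsls_vs_alld(R, S, T, P, rounds=1000):
    """Simulate WSLS (starting with C) against ALLD; return average payoffs per round."""
    table = {("C", "D"): (S, T), ("D", "D"): (P, P)}   # (WSLS payoff, ALLD payoff)
    move, total_wsls, total_alld = "C", 0.0, 0.0
    for _ in range(rounds):
        gain_wsls, gain_alld = table[(move, "D")]
        total_wsls += gain_wsls
        total_alld += gain_alld
        move = "D" if move == "C" else "C"   # co-player defected, so WSLS shifts
    return total_wsls / rounds, total_alld / rounds

b, c = 3.0, 1.0
R, S, T, P = b - c, -c, b, 0.0
wsls_avg, alld_avg = average_payoffs_wsls_vs_alld(R, S, T, P)
print(wsls_avg, alld_avg)
print(alld_avg < R)   # a rare ALLD player earns less than WSLS residents earn among themselves
```

With these assumed values c < b/2 holds, so the rare defector does worse than the WSLS residents.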
Win-Stay, Lose-Shift WSLS
• Simple learning rule
• stable, error-correcting
• but needs retaliator to prepare the ground
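Memory-one strategies
If $(p_R, p_S, p_T, p_P)$ against $(q_R, q_S, q_T, q_P)$:
transition matrix between states R, S, T, P

$Q = \begin{pmatrix} p_Rq_R & p_R(1-q_R) & (1-p_R)q_R & (1-p_R)(1-q_R) \\ p_Sq_T & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \end{pmatrix}$

allows computation of average payoff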
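The transition matrix between the states R, S, T, P and the resulting average payoff can be sketched as follows. The sketch assumes strictly stochastic strategies, so that the Markov chain has a unique stationary distribution, and uses plain power iteration; function names and example strategies are illustrative.

```python
def transition_matrix(p, q):
    """4x4 Markov matrix for p = (pR, pS, pT, pP) against q = (qR, qS, qT, qP).

    States are ordered R, S, T, P from player I's point of view.
    """
    # Player II sees S and T swapped, so its probabilities are reordered.
    q_seen = (q[0], q[2], q[1], q[3])
    return [[a * d, a * (1 - d), (1 - a) * d, (1 - a) * (1 - d)]
            for a, d in zip(p, q_seen)]

def stationary_distribution(Q, iterations=10_000):
    """Left eigenvector of Q for eigenvalue 1, by power iteration."""
    v = [0.25] * 4
    for _ in range(iterations):
        v = [sum(v[i] * Q[i][j] for i in range(4)) for j in range(4)]
    return v

def average_payoff(p, q, payoffs):
    """Long-run payoff per round for player I; payoffs = (R, S, T, P)."""
    v = stationary_distribution(transition_matrix(p, q))
    return sum(vi * g for vi, g in zip(v, payoffs))

noisy_tft = (0.9, 0.1, 0.9, 0.1)
noisy_wsls = (0.9, 0.1, 0.1, 0.9)
print(average_payoff(noisy_wsls, noisy_wsls, (2.0, -1.0, 3.0, 0.0)))
print(average_payoff(noisy_tft, noisy_tft, (2.0, -1.0, 3.0, 0.0)))
```

Power iteration is chosen here only for brevity; solving the linear system for the stationary vector would serve equally well.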
A new breath:
Press and Dyson PNAS 2012
AMS homepage ("Maths in the Media")
"The world of game theory is currently on fire..."
"this is a monumental surprise..."
"the emerging revolution of game theory..."
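Dyson's formula
Define
$D(p, q, x) := \det \begin{pmatrix} p_Rq_R - 1 & p_R - 1 & q_R - 1 & x_1 \\ p_Sq_T & p_S - 1 & q_T & x_2 \\ p_Tq_S & p_T & q_S - 1 & x_3 \\ p_Pq_P & p_P & q_P & x_4 \end{pmatrix}$
then $P_I = \frac{D(p, q, g_I)}{D(p, q, 1)}$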
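The determinant formula can be evaluated directly. The sketch below assumes that g_I = (R, S, T, P) is player I's payoff vector over the states R, S, T, P, which the slide does not spell out; the determinant routine and the function names are illustrative.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for a 4x4 matrix)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def dyson_D(p, q, x):
    """D(p, q, x) for memory-one strategies p and q, states ordered R, S, T, P."""
    pR, pS, pT, pP = p
    qR, qS, qT, qP = q
    return det([
        [pR * qR - 1, pR - 1, qR - 1, x[0]],
        [pS * qT,     pS - 1, qT,     x[1]],
        [pT * qS,     pT,     qS - 1, x[2]],
        [pP * qP,     pP,     qP,     x[3]],
    ])

def payoff_I(p, q, g):
    """Average payoff of player I; g = (R, S, T, P) is an assumed payoff vector."""
    return dyson_D(p, q, g) / dyson_D(p, q, [1, 1, 1, 1])

p = (0.9, 0.2, 0.7, 0.1)
q = (0.8, 0.3, 0.6, 0.2)
print(payoff_I(p, q, (2.0, -1.0, 3.0, 0.0)))
```

The second and third columns depend only on p and only on q respectively, which is what allows one player to constrain the payoffs unilaterally.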
Zero-determinant (ZD) strategies
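Examples of ZD-strategies
ZD strategies enforce $\alpha P_I + \beta P_{II} + \gamma = 0$
Equalizers: if $\alpha = 0$, then $P_{II} = -\gamma/\beta$ (any value between $P$ and $R$)
Extortioners: if $\chi := -\beta/\alpha \ge 1$ and $\gamma = -(\alpha + \beta)P$, then $P_I - P = \chi(P_{II} - P)$
(own ''surplus'' over maximin payoff always $\chi$-fold of co-player's surplus)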
Characterizations
ZD-strategies: $p_R + p_P = p_S + p_T$
Equalizers: $(b-c)(p_S - p_T - 1) = (b+c)(p_R - p_P - 1)$
Extortioners: $p_P = 0$ and $p_T(c + \chi b) = (1 - p_S)(b + \chi c)$
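An extortion strategy can be constructed and tested numerically. The sketch assumes payoffs R = b - c, S = -c, T = b, P = 0 and an extortion factor chi; the scaling parameter `phi` and all function names are hypothetical, and the long-run payoffs are computed by power iteration on the Markov chain of outcomes.

```python
def extortioner(b, c, chi, phi):
    """Memory-one strategy (pR, pS, pT, pP) with pP = 0; phi is an assumed scale factor."""
    return (1 - phi * (chi - 1) * (b - c),
            1 - phi * (c + chi * b),
            phi * (b + chi * c),
            0.0)

def stationary(p, q, iterations=20_000):
    """Stationary distribution over states R, S, T, P (player I's point of view)."""
    q_seen = (q[0], q[2], q[1], q[3])          # player II sees S and T swapped
    Q = [[a * d, a * (1 - d), (1 - a) * d, (1 - a) * (1 - d)]
         for a, d in zip(p, q_seen)]
    v = [0.25] * 4
    for _ in range(iterations):
        v = [sum(v[i] * Q[i][j] for i in range(4)) for j in range(4)]
    return v

def payoffs(p, q, b, c):
    """Long-run payoffs per round of player I and player II."""
    v = stationary(p, q)
    g1 = (b - c, -c, b, 0.0)     # player I's payoffs in states R, S, T, P
    g2 = (b - c, b, -c, 0.0)     # player II's payoffs in the same states
    return sum(x * g for x, g in zip(v, g1)), sum(x * g for x, g in zip(v, g2))

b, c, chi = 3.0, 1.0, 2.0
p = extortioner(b, c, chi, phi=0.1)
P1, P2 = payoffs(p, (0.7, 0.4, 0.6, 0.3), b, c)
print(p)
print(P1, chi * P2)   # with P = 0, player I's payoff is chi times the co-player's
```

The co-player strategy (0.7, 0.4, 0.6, 0.3) is arbitrary; the payoff relation is enforced by the extortioner alone, whatever memory-one strategy the co-player uses.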
All reactive strategies are ZD
Extortion does not pay
Pairwise comparison of extortion
Neutral against AllD
Stable coexistence with AllD
Weakly dominated by TFT
Dominated by WSLS
If all five: no Nash equilibrium involves extortion
Complier strategies