Dynamics of Cooperation in Spatial Prisoner’s Dilemma of Memory-Based Players
Chenna Reddy Cotla
Department of Computational Social Science
George Mason University
Fairfax, VA, USA
Research Question
What are the emergent dynamics of cooperation in a population of memory-based agents whose interaction is spatially constrained to local neighborhoods on a square lattice?
Prisoner’s Dilemma
The Prisoner’s Dilemma is an extensively studied social dilemma that characterizes the problem of cooperation.
◦ There is not enough evidence to convict two suspects of armed robbery, but enough to convict them of theft of the getaway car.
◦ If both confess, each gets 3 years; if both stay quiet, each gets 1 year; if one confesses and the other does not, the one who confesses gets 0 years and the other gets 5 years.
◦ Stay quiet = cooperate (C); confess = defect (D).
Payoff matrix: T > R > P > S
◦ R is the REWARD for mutual cooperation = 3
◦ S is the SUCKER’s payoff = 0
◦ T is the TEMPTATION to defect = 5
◦ P is the PUNISHMENT for mutual defection = 1
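The payoff values above can be written down as a small lookup table. The following is a minimal sketch; the names `PAYOFF`, `is_prisoners_dilemma`, and `play` are illustrative and do not come from the slides.

```python
# Payoff values from the slide: T = 5, R = 3, P = 1, S = 0.
T, R, P, S = 5, 3, 1, 0

# PAYOFF[(my_move, other_move)] is the payoff to "me".
PAYOFF = {
    ("C", "C"): R,  # mutual cooperation
    ("C", "D"): S,  # I cooperate, the other defects
    ("D", "C"): T,  # I defect, the other cooperates
    ("D", "D"): P,  # mutual defection
}


def is_prisoners_dilemma(t, r, p, s):
    """Check the ordering T > R > P > S stated on the slide."""
    return t > r > p > s


def play(move_a, move_b):
    """Return (payoff to A, payoff to B) for one bilateral game."""
    return PAYOFF[(move_a, move_b)], PAYOFF[(move_b, move_a)]
```

For example, `play("D", "C")` gives the defector the temptation payoff and the cooperator the sucker's payoff.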
Cooperation in a Finite Population: Evolutionary Spatial Prisoner’s Dilemma
Evolutionary dynamics, together with non-random, localized strategic interactions on a spatial structure, can support cooperation.
Players are located on a lattice and play the bilateral PD with their neighbors, as determined by the chosen notion of neighborhood.
The summed payoff is gathered in each generation.
Each player imitates the strategy of the best-scoring neighbor in the next generation.
[Figure: Moore neighbors and Von Neumann neighbors]
Nowak, M. and May, R. (1992). Evolutionary games and spatial chaos. Nature, 359(6398):826–829.
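One generation of the imitation dynamics described above might be sketched as follows. This is only an illustration under stated assumptions: a Von Neumann neighborhood, periodic (wrap-around) boundaries, no self-interaction, and a player keeping its own strategy when it scores at least as well as every neighbor. None of these details are specified on the slide, and all function names are hypothetical.

```python
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def neighbors(i, j, size):
    """Von Neumann neighbors with wrap-around edges (an assumption)."""
    return [((i + di) % size, (j + dj) % size) for di, dj in VON_NEUMANN]


def scores(grid):
    """Summed payoff of each player against its four neighbors."""
    size = len(grid)
    return [
        [
            sum(PAYOFF[(grid[i][j], grid[a][b])] for a, b in neighbors(i, j, size))
            for j in range(size)
        ]
        for i in range(size)
    ]


def next_generation(grid):
    """Each player copies the best scorer among itself and its neighbors."""
    size = len(grid)
    score = scores(grid)
    new_grid = [row[:] for row in grid]
    for i in range(size):
        for j in range(size):
            # The player itself is listed first, so ties keep its own strategy.
            candidates = [(i, j)] + neighbors(i, j, size)
            bi, bj = max(candidates, key=lambda c: score[c[0]][c[1]])
            new_grid[i][j] = grid[bi][bj]
    return new_grid
```

Starting from a 5 × 5 lattice of cooperators with a single defector in the center, the defector scores 4T and is imitated by its four neighbors in the next generation.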
Evolutionary SPD
[Figure: Fraction of cooperators in the evolutionary SPD, independent of the initial configuration, for the Von Neumann neighborhood (Nowak and May, 1992)]
Evolutionary SPD: Significance
Regular lattices represent a limiting case of interaction topologies in the real world.
Emphasizes the importance of localized interaction in the emergence of cooperation.
Why might we need alternative models?
Very simplistic assumptions about players:
◦ No memory or reasoning.
◦ Players are pure imitators.
◦ May be more suitable for a biological context than a social context.
Recent experiments with human subjects have shown that humans do not unconditionally imitate the best-scoring neighbor, as assumed in the evolutionary SPD (Traulsen et al., 2010; Grujic et al., 2010).
Traulsen, A., Semmann, D., Sommerfeld, R., Krambeck, H., and Milinski, M. (2010). Human strategy updating in evolutionary games. Proceedings of the National Academy of Sciences, 107(7):2962.
Grujic, J., Fosco, C., Araujo, L., Cuesta, A., and Sanchez, A. (2010). Social experiments in the meso scale: Humans playing a spatial prisoner’s dilemma. PLOS ONE, 5(11):e13749.
SPD with ACT-R Agents
SPD with agents that make use of the memory model embodied in the ACT-R cognitive architecture.
Why ACT-R?
◦ The ACT-R memory model was able to reproduce important empirical dynamics in the two-person Prisoner’s Dilemma (Lebiere et al., 2000).
Lebiere, C., Wallach, D., and West, R. L. (2000). A memory-based account of the prisoner’s dilemma and other 2×2 games. In Proceedings of the 3rd International Conference on Cognitive Modeling, pages 185–193.
Higher Level Decision Making in ACT-R: Knowledge Representation
In ACT-R, procedural and declarative knowledge interact to produce higher-level cognition.
Procedural Memory:
◦ Implemented as a set of productions.
◦ Basically IF-THEN conditions that specify when a particular production rule applies and what it does.
◦ Example: a production to add two numbers
   IF the goal is to add n1 and n2 and n1 + n2 = n3
   THEN set as a subgoal to write n3 as the result.
Declarative Memory:
◦ Memory of facts represented as a collection of declarative structures called chunks.
◦ Example: an addition fact
   Fact 3+4
      isa additionfact
      addend1 three
      addend2 four
      sum seven
Higher Level Decision Making in ACT-R: Knowledge Deployment
In ACT-R, procedural and declarative knowledge interact to produce higher-level cognition.
Production rules are selected based on their utility, and declarative chunks are retrieved based on subsymbolic quantities called activation levels.
Activation of a chunk: in ACT-R, the activation levels of declarative chunks reflect the frequency and recency of usage and the contextual similarity. Noise is also added to account for the stochastic component of decision making.
The base-level activation B_i reflects the general usefulness of a given chunk, based on how recently and frequently it was used in the past for achieving a goal. It is calculated using the following equation:
   B_i = \ln\left( \sum_j t_j^{-d} \right)
◦ d is the forgetting rate and t_j is the time since the j-th access.
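As a rough illustration of how recency and frequency enter the base-level activation, here is a minimal sketch that assumes the standard ACT-R base-level learning equation B_i = ln(sum_j t_j^(-d)). The function name is hypothetical, and the default d = 0.5 is the conventional ACT-R value rather than a setting reported on the slides.

```python
import math


def base_level_activation(times_since_access, d=0.5):
    """Return ln(sum_j t_j ** -d) for positive times t_j since each access.

    d is the forgetting rate; 0.5 is an assumed default, not taken
    from the slides.
    """
    if not times_since_access:
        raise ValueError("a chunk needs at least one past access")
    return math.log(sum(t ** -d for t in times_since_access))
```

A chunk accessed recently or often gets a higher value than one accessed long ago or rarely, which is the behavior the slide describes.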
Representational Details of Game Playing Mechanism
The basic idea is to store game outcomes as chunks and use a single production rule that captures decision making.
If we consider a neighborhood of size n, there are 2^(n+1) outcomes.
◦ Exponential computational time and space requirements.
◦ We may run out of memory as n increases, and the computation may take years by current computational standards.
To simplify matters, a totalistic representation based on the total payoff is used.
Chunk C-kC represents the outcome in which the given player cooperated and k neighbors also cooperated. D-kC represents the same scenario with the given player defecting.
Only 2n+2 chunks are needed.
Activation is calculated using the computationally efficient approximation of Petrov (2006).
Petrov, A. (2006). Computationally efficient approximation of the base-level learning equation in ACT-R. In Proceedings of the Seventh International Conference on Cognitive Modeling, pages 391–392.
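The slides do not show which form of the approximation was used. As a hedged sketch, the hybrid form usually attributed to Petrov (2006) keeps the k most recent accesses exactly and replaces the older ones with an integral estimate. The function names, the default d = 0.5, and the choice k = 1 are all assumptions made for illustration.

```python
import math


def exact_base_level(times, d=0.5):
    """Exact base-level activation: ln(sum_j t_j ** -d)."""
    return math.log(sum(t ** -d for t in times))


def approx_base_level(times, d=0.5, k=1):
    """Hybrid approximation: k most recent accesses exact, the rest estimated.

    times holds the positive times since each access; d must not equal 1.
    """
    ts = sorted(times)          # ts[0] is the most recent access
    n = len(ts)
    k = min(k, n)
    total = sum(t ** -d for t in ts[:k])
    if k < n:
        t_k, t_n = ts[k - 1], ts[-1]
        if t_n == t_k:
            # All older accesses coincide with t_k; no integral is needed.
            total += (n - k) * t_k ** -d
        else:
            # Average of t ** -d over [t_k, t_n], times the number of old accesses.
            total += (n - k) * (t_n ** (1 - d) - t_k ** (1 - d)) / (
                (1 - d) * (t_n - t_k)
            )
    return math.log(total)
```

The appeal of this form is that only the k most recent access times, the oldest access time, and the access count need to be stored per chunk, instead of the full access history.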
Chunks and Production Rule
Declarative memory when a neighborhood of size n is considered: (2n+2) chunks
(C-nC isa outcome p-action C N-config nC payoff nR)
(C-(n − 1)C isa outcome p-action C N-config (n − 1)C payoff (n − 1)R + S)
…
(C-kC isa outcome p-action C N-config kC payoff kR + (n − k)S)
…
(C-1C isa outcome p-action C N-config 1C payoff R + (n − 1)S)
(C-0C isa outcome p-action C N-config 0C payoff nS)
(D-nC isa outcome p-action D N-config nC payoff nT)
(D-(n − 1)C isa outcome p-action D N-config (n − 1)C payoff (n − 1)T + P)
…
(D-kC isa outcome p-action D N-config kC payoff kT + (n − k)P)
…
(D-1C isa outcome p-action D N-config 1C payoff T + (n − 1)P)
(D-0C isa outcome p-action D N-config 0C payoff nP)
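The chunk payoffs follow directly from the formulas kR + (n − k)S and kT + (n − k)P. A minimal sketch that enumerates the 2n+2 chunks for a given neighborhood size is shown below; the function name and the dictionary representation are illustrative choices, not the author's data structures.

```python
def totalistic_chunks(n, T=5, R=3, P=1, S=0):
    """Map each chunk name to its total payoff for a neighborhood of size n.

    C-kC: the player cooperated and k neighbors cooperated.
    D-kC: the player defected and k neighbors cooperated.
    """
    chunks = {}
    for k in range(n, -1, -1):
        chunks[f"C-{k}C"] = k * R + (n - k) * S
        chunks[f"D-{k}C"] = k * T + (n - k) * P
    return chunks
```

With n = 4 (Von Neumann) this yields 10 chunks, and with n = 8 (Moore) it yields 18, compared with 2^(n+1) = 32 and 512 full outcome configurations respectively.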
Chunks and Production Rule
Production Rule:
IF the goal is to play Spatial Prisoner’s Dilemma
   and the most likely outcome of C is OutcomeC
   and the most likely outcome of D is OutcomeD
THEN make the move associated with the larger payoff of OutcomeC and OutcomeD.
     Observe the opponents’ moves and push a new goal to make the next play.
Illustrative Scenario: Von Neumann Neighborhood
Declarative memory when the V-N neighborhood is considered: 10 chunks
(C-4C isa outcome p-action C N-config 4C payoff 4R)
(C-3C isa outcome p-action C N-config 3C payoff 3R + S)
(C-2C isa outcome p-action C N-config 2C payoff 2R + 2S)
(C-1C isa outcome p-action C N-config 1C payoff R + 3S)
(C-0C isa outcome p-action C N-config 0C payoff 4S)
(D-4C isa outcome p-action D N-config 4C payoff 4T)
(D-3C isa outcome p-action D N-config 3C payoff 3T + P)
(D-2C isa outcome p-action D N-config 2C payoff 2T + 2P)
(D-1C isa outcome p-action D N-config 1C payoff T + 3P)
(D-0C isa outcome p-action D N-config 0C payoff 4P)
Production Rule:
IF the goal is to play Spatial Prisoner’s Dilemma
   and the most likely outcome of C is OutcomeC
   and the most likely outcome of D is OutcomeD
THEN make the move associated with the larger payoff of OutcomeC and OutcomeD.
     Observe the opponents’ moves and push a new goal to make the next play.
Illustrative Scenario: Von Neumann Neighborhood
Chunks associated with C:
Chunk   Activation   Payoff
C-4C    0.10         12
C-3C    0.31         9
C-2C    0.03         6
C-1C    0.11         3
C-0C    0.05         0
Chunks associated with D:
Chunk   Activation   Payoff
D-4C    0.14         20
D-3C    0.04         16
D-2C    0.23         12
D-1C    0.29         8
D-0C    0.05         4
Most likely outcomes: C-3C with payoff 9; D-1C with payoff 8
Move chosen: C
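The decision in this scenario can be traced with a short sketch of the production rule: retrieve the most active chunk for each move, then pick the move whose retrieved chunk carries the larger payoff. The function names are hypothetical, activation noise is omitted, and breaking payoff ties in favor of C is an assumption.

```python
# Activations and payoffs as listed in the illustrative scenario.
ACTIVATION = {
    "C-4C": 0.10, "C-3C": 0.31, "C-2C": 0.03, "C-1C": 0.11, "C-0C": 0.05,
    "D-4C": 0.14, "D-3C": 0.04, "D-2C": 0.23, "D-1C": 0.29, "D-0C": 0.05,
}
CHUNK_PAYOFF = {
    "C-4C": 12, "C-3C": 9, "C-2C": 6, "C-1C": 3, "C-0C": 0,
    "D-4C": 20, "D-3C": 16, "D-2C": 12, "D-1C": 8, "D-0C": 4,
}


def most_active(action, activation):
    """Return the chunk for the given action with the highest activation."""
    candidates = [name for name in activation if name.startswith(action + "-")]
    return max(candidates, key=activation.get)


def choose_move(activation, payoff):
    """Pick the move whose most active chunk has the larger payoff."""
    outcome_c = most_active("C", activation)
    outcome_d = most_active("D", activation)
    # Ties go to C here; the slides do not say how ties are resolved.
    return "C" if payoff[outcome_c] >= payoff[outcome_d] else "D"
```

With the values above, the retrieved chunks are C-3C (payoff 9) and D-1C (payoff 8), so the player cooperates even though defection has the larger payoff against any fixed neighborhood configuration.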
Global Cooperation in SPD with Memory-Based Agents (Synchronous Updating)
Lattice size: 100 × 100
Payoff matrix considered: {T = 5, R = 3, P = 1, S = 0}
Cooperation levels fluctuate around 0.3214.
Effect of Neighborhood Size on Global Cooperation Levels
In the SPD with memory-based agents, spatial structure may not always support higher cooperation levels than the well-mixed scenario.
Asymptotic cooperation level in the well-mixed case: 0.2447
Different Activation Regimes and Size of the Lattice
Asynchronous updating (Axtell, 2001)
◦ Asymptotic cooperation level for Uniform Activation with V-N neighbors: 0.3238
◦ Asymptotic cooperation level for Random Activation with V-N neighbors: 0.3220
Size of the lattice
◦ The global cooperation level is almost invariant with the size of the lattice.
Axtell, R. (2001). Effects of interaction topology and activation regime in several multi-agent systems. Multi-Agent-Based Simulation, pages 33–48.
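The two asynchronous regimes differ only in how the order of agent updates within a period is drawn. The sketch below assumes the usual reading of Axtell (2001): under uniform activation every agent acts exactly once per period in random order, while under random activation agents are drawn with replacement, so some may act several times and others not at all. The function names are hypothetical.

```python
import random


def uniform_activation(agents, rng):
    """One period in which each agent is activated exactly once, in random order."""
    order = list(agents)
    rng.shuffle(order)
    return order


def random_activation(agents, rng):
    """One period of len(agents) activations drawn with replacement."""
    pool = list(agents)
    return [rng.choice(pool) for _ in pool]
```

In either case an activated agent would observe its neighbors' current moves and update immediately, in contrast to synchronous updating, where all agents move at the same time.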
Conclusion and Further Research
Partial cooperation levels are sustained in an SPD with memory-based agents.
Spatial structure may not always support higher cooperation levels than the well-mixed case, as it does in the evolutionary framework.
A decision-making mechanism grounded in a cognitive architecture may present a convincing middle way between the strict rationality assumptions of behavior and the overly simplistic characterization of behavior in evolutionary game theory.
Future Directions:
◦ Validation of model output using experimental results.
◦ Effect of interaction topology on emergent cooperation levels. The properties of real-world interaction topologies lie between those of random networks and regular lattices.