Iterated Prisoner’s Dilemma Game in Evolutionary Computation 2003. 10. 2 Seung-Ryong Yang.
Agenda
Motivation
Iterated Prisoner’s Dilemma Game
Related Works
Strategic Coalition
Improving Generalization Ability
Experimental Results
Conclusion
Motivation
Evolutionary approach
• Understands complex behaviors by analyzing the results of simulations driven by an evolutionary process
• Offers a way to find optimal strategies in a dynamic environment
IPD game
• Models complex phenomena such as social and economic behavior
• Provides a testbed for modeling dynamic environments
Objectives
• Obtain multiple good strategies
• Form coalitions to improve generalization ability
Iterated Prisoner’s Dilemma Game (1/2)
Overview
• Each prisoner’s possible choices: defection or cooperation
Characteristics
• Non-cooperative
• Non-zero-sum
Types of game
• 2IPD (2-player Iterated Prisoner’s Dilemma) game
• NIPD (N-player Iterated Prisoner’s Dilemma) game
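Symbolic payoffs (each cell: row player / column player):

| | Cooperate | Defect |
|---|---|---|
| Cooperate | R / R | S / T |
| Defect | T / S | P / P |

Conditions: $T > R > P > S$ and $2R > T + S$

Payoff matrix of the 2IPD game by Axelrod, R. (1984):

| | Cooperate | Defect |
|---|---|---|
| Cooperate | 3 / 3 | 0 / 5 |
| Defect | 5 / 0 | 1 / 1 |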
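The two conditions and the numeric matrix can be checked with a small Python sketch. The 0 = cooperate / 1 = defect encoding is an assumption (it is consistent with the AllD example strategy, which is all 1s), and the names `PAYOFF`, `is_prisoners_dilemma` and `play_round` are illustrative.

```python
# Axelrod's payoff values: temptation, reward, punishment, sucker
T, R, P, S = 5, 3, 1, 0

COOPERATE, DEFECT = 0, 1  # assumed bit encoding of a move

# (row move, column move) -> (row payoff, column payoff)
PAYOFF = {
    (COOPERATE, COOPERATE): (R, R),
    (COOPERATE, DEFECT): (S, T),
    (DEFECT, COOPERATE): (T, S),
    (DEFECT, DEFECT): (P, P),
}


def is_prisoners_dilemma(t, r, p, s):
    """True if the payoffs satisfy T > R > P > S and 2R > T + S."""
    return t > r > p > s and 2 * r > t + s


def play_round(row_move, column_move):
    """Payoffs of one round as (row player, column player)."""
    return PAYOFF[(row_move, column_move)]


print(is_prisoners_dilemma(T, R, P, S))  # True
print(play_round(COOPERATE, DEFECT))     # (0, 5)
```

With Axelrod's values, 2R = 6 > T + S = 5, so two players taking turns exploiting each other earn less than steady mutual cooperation.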
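Iterated Prisoner’s Dilemma Game (2/2)
Representation of Strategy
• History table: own history (recent action ∙∙∙ last action) followed by the opponent’s history (recent action ∙∙∙ last action), one bit per action, e.g. 0 1 0 ∙∙∙ 1
• Example history for l = 2: 11 01
• Combined history: 2N bits (own and opponent’s histories together)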
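With l = 2 the combined history has four bits, so a strategy can be stored as a lookup table of 2^4 = 16 moves. The sketch below assumes that the own history is placed before the opponent's and that the most recent move comes first; under that assumption the example history 11 01 maps to table index 13. `history_index` and `next_move` are hypothetical names.

```python
def history_index(own_history, opponent_history):
    """Pack both histories into one integer index into the strategy table.

    Histories are lists of bits (0 = cooperate, 1 = defect), most recent move
    first; the own history is placed before the opponent's (assumed order).
    """
    if len(own_history) != len(opponent_history):
        raise ValueError("both histories must have the same length l")
    index = 0
    for bit in own_history + opponent_history:
        index = (index << 1) | bit
    return index


def next_move(strategy_table, own_history, opponent_history):
    """Look up the next move in a table with one entry per possible history."""
    l = len(own_history)
    if len(strategy_table) != 2 ** (2 * l):
        raise ValueError("a table for history length l needs 2 ** (2 * l) entries")
    return strategy_table[history_index(own_history, opponent_history)]


# l = 2: own history 11, opponent history 01 -> bits 1101 -> index 13
table = [0] * 16
table[13] = 1
print(history_index([1, 1], [0, 1]))     # 13
print(next_move(table, [1, 1], [0, 1]))  # 1
```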
Related Works
Previous studies
• Paul J. Darwen and Xin Yao (1997): Speciation as Automatic Categorical Modularization
• Onn M. Shehory, et al. (1998): Multi-agent Coordination through Coalition Formation
• Y. G. Seo and S. B. Cho (1999): Exploiting Coalition in Co-Evolutionary Learning
Issues
• Work on coalition formation in multi-agent environments covers a broad range of topics
• Darwen and Yao studied coalitions in the IPD game, but with a different focus
• Prior work focused on cooperation, the number of players, payoff variances, etc.
What is Different?
Co-evolutionary learning
• Selection methods: rank-based, roulette wheel, tournament
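The three selection methods named above are standard genetic-algorithm operators. The sketch below shows common textbook forms; the tournament size `k`, the linear rank weights and the function names are assumptions, since the slides do not specify them.

```python
import random


def roulette_wheel(population, fitness, rng=random):
    """Select one individual with probability proportional to its fitness (>= 0)."""
    pick = rng.uniform(0, sum(fitness))
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if running >= pick:
            return individual
    return population[-1]  # guards against floating-point round-off


def tournament(population, fitness, k=2, rng=random):
    """Sample k distinct individuals and return the fittest of them."""
    contestants = rng.sample(range(len(population)), k)
    return population[max(contestants, key=lambda i: fitness[i])]


def rank_based(population, fitness, rng=random):
    """Linear ranking: selection weight 1 for the worst, n for the best."""
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    weights = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    return rng.choices(population, weights=weights, k=1)[0]


rng = random.Random(0)
agents, scores = ["A1", "A2", "A3"], [1.0, 2.5, 4.0]
print(tournament(agents, scores, k=3, rng=rng))  # A3: all three compete
```

Rank-based selection uses only the ordering of fitness values, so one agent with an unusually high payoff cannot dominate selection the way it can under roulette-wheel selection.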
Coalition formation
• Coalitions survive into the next generation
• The condition for forming a coalition is flexible
Decision making in coalition
• Several decision-making methods are adapted to coalitions: Borda function, Condorcet function, average payoff, highest payoff, weighted voting
Evolving Strategy
To evolve strategies, we use:
• Genetic algorithm
• Co-evolutionary learning
• Strategic coalition
Figure: Evolutionary process
Evolution of Agents (1/2)
Figure: coalitions carried from the before population (C1, Ci, Ck) through the current population (C1, Ci, Ck, Cj) to the next population (C1, Ci, Ck, Cj, Cl)
Evolution of agents
• Agents develop their strategies through co-evolutionary learning
• Weak agents are removed from the population
Evolution of coalitions
• A formed coalition survives into the next generation
• Agents can join a coalition generation by generation
• A coalition survives or grows
Evolution of Agents (2/2)
Problem: the population may end up evolving only from weak agents
• Cause: the better agents, which belong to coalitions, are removed from the population
• Solution: make new agents by mixing the better agents within a coalition
Figure: two agents A1 and A2 are randomly extracted from a coalition (Ci, Cj, Ck, … in the population), mixed, and mutated to produce a new agent Ai; this is repeated as many times as there are agents in the coalition
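The repair step can be sketched as follows: two members are extracted at random from the coalition, mixed, and mutated, and this is repeated once per member. The slides do not say how the two agents are mixed, so one-point crossover is an assumption here; the 0.001 mutation rate is taken from the test environment, and `refill_from_coalition` is a hypothetical name.

```python
import random


def one_point_crossover(a, b, rng):
    """Assumed mixing operator: head of `a` joined to the tail of `b`."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def refill_from_coalition(coalition, mutation_rate=0.001, rng=None):
    """Create one new agent per coalition member (hypothetical helper).

    `coalition` holds at least two strategies, each a list of 0/1 bits.
    """
    rng = rng or random.Random()
    new_agents = []
    for _ in range(len(coalition)):          # repeat once per member
        a1, a2 = rng.sample(coalition, 2)    # random extraction of two members
        child = one_point_crossover(a1, a2, rng)
        new_agents.append(mutate(child, mutation_rate, rng))
    return new_agents


coalition = [[0] * 20, [1] * 20, [0, 1] * 10]
offspring = refill_from_coalition(coalition, rng=random.Random(0))
print(len(offspring))  # 3: one new agent per member
```

Because the parents are drawn only from coalition members, the new agents inherit material from the stronger strategies instead of from the weak agents left in the population.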
Strategic Coalition (1/2)
What is a coalition?
• In a cooperative game over a set A of agents, each subset of A is called a coalition (Matthias Klusch and Andreas Gerber, 2002)
• A group of agents that work jointly in order to accomplish their tasks (Onn M. Shehory, 1995)
Coalition in the IPD game
• Coalitions are formed through a round-robin game
• Coalitions pursue higher payoffs by exploiting generalization ability
• Coalitions form autonomously, without supervision
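A round-robin game, in which every player meets every other player once, can be sketched in Python as below. Strategies are modelled as functions from (own history, opponent history) to a move; the number of rounds, the exclusion of self-play and the function names are assumptions.

```python
from itertools import combinations

T, R, P, S = 5, 3, 1, 0
# (move a, move b) -> (payoff a, payoff b); 0 = cooperate, 1 = defect
PAYOFF = {(0, 0): (R, R), (0, 1): (S, T), (1, 0): (T, S), (1, 1): (P, P)}


def play(strategy_a, strategy_b, rounds):
    """Play one iterated game and return both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(strategies, rounds=10):
    """Every strategy plays every other strategy once; returns total scores."""
    totals = [0] * len(strategies)
    for i, j in combinations(range(len(strategies)), 2):
        score_i, score_j = play(strategies[i], strategies[j], rounds)
        totals[i] += score_i
        totals[j] += score_j
    return totals


def all_c(own, opp):
    return 0


def all_d(own, opp):
    return 1


def tft(own, opp):
    return opp[-1] if opp else 0


print(round_robin([all_c, all_d, tft], rounds=10))  # [30, 64, 39]
```

The resulting totals give the ranking used when agents form or join coalitions.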
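Strategic Coalition (2/2)
Definitions
Definition 1 (Coalition Value): the value $C_p$ of coalition $C$ is computed from the payoff $p_i$ and the weight $w_i$ of each of its $|C|$ member agents
Definition 2 (Payoff Function): the payoffs satisfy $T > R > P > S$ and $2R > T + S$ (1)
Definition 3 (Coalition Identification)
Definition 4 (Decision Making): coalition $C_i$ chooses its move $D$ by weighted voting,
$$D = \begin{cases} \text{Cooperate} & \text{if } C_i^{C} \ge C_i^{D} \\ \text{Defect} & \text{if } C_i^{C} < C_i^{D} \end{cases} \qquad (2)$$
where $C_i^{C}$ and $C_i^{D}$ are the summed weights $w$ of the members voting for cooperation and for defection, respectively
Definition 5 (Payoff Distribution): the coalition's payoff is distributed among its members according to their weights $w_i$, which depend on each member's rank $Rank_i$ within the coalition (3)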
Coalition Formation (1/2)
Figure: agents A1, A2, …, An in the initial population play the 2IPD game and form coalitions C1, C2, …, Ci, giving a population that includes coalitions alongside the remaining agents
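Coalition Formation (2/2)
Algorithm
1. Play the 2IPD game.
2. While the number of iterations per generation has not been exceeded, check the game type (agent vs. agent, agent vs. coalition, or coalition vs. coalition). If the players satisfy the condition for forming a coalition, they form a new coalition or join an existing one; then return to step 1.
3. Otherwise, apply the genetic operations.
4. Stop if the termination condition is satisfied; otherwise return to step 1.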
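Conditions for forming a coalition: the payoffs of both sides must exceed $\frac{S+T}{2}$, using an agent's payoff $p$ or a coalition's value $C_p$ depending on the game type (agent vs. agent, agent vs. coalition, coalition vs. coalition)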
Forming a coalition
1. Play a round-robin 2IPD game
2. Obtain each agent's rank
3. Determine each agent's confidence according to its rank
Joining a coalition
1. Play a round-robin 2IPD game
2. Obtain each agent's rank
3. If the number of agents exceeds the maximum number of agents allowed in a coalition, remove the weakest agent
4. Determine the confidence of each agent
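The joining steps above can be sketched as a small function. The slides only say that rank determines confidence, so the linear rank weights (n, n-1, …, 1, normalised to sum to 1) are an assumption, as are the names `join_coalition` and `score` and the agent labels in the usage lines.

```python
def join_coalition(members, newcomer, score, max_size):
    """Sketch of the joining steps for a hypothetical coalition.

    members:  agents already in the coalition
    newcomer: the agent that satisfied the joining condition
    score:    callable giving each agent's round-robin 2IPD payoff (steps 1-2)
    Returns the members ranked best-first and a confidence per member.
    """
    # Steps 1-2: rank all candidates by their round-robin payoff.
    ranked = sorted(members + [newcomer], key=score, reverse=True)
    # Step 3: if the coalition is over capacity, drop the weakest agent.
    ranked = ranked[:max_size]
    # Step 4: confidence from rank. Linear weights n, n-1, ..., 1 normalised to
    # sum to 1 are an assumption; the slides only say that rank sets confidence.
    n = len(ranked)
    total = n * (n + 1) / 2
    confidence = {agent: (n - r) / total for r, agent in enumerate(ranked)}
    return ranked, confidence


payoffs = {"A1": 30, "A2": 20, "A5": 25}
ranked, confidence = join_coalition(["A1", "A2"], "A5", payoffs.get, max_size=2)
print(ranked)  # ['A1', 'A5']: A2 is the weakest and is removed
```

Forming a new coalition follows the same ranking and confidence steps, just without the capacity check.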
Coalition Decision Making
Decision making
• Determines the coalition’s collective choice
• Uses a weighted voting method
Sharing profits
• The payoff is distributed according to each agent’s confidence
• Rank influences each agent’s weight
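Determining the next action of a coalition
• $C_i^{C}$: weight for cooperation of coalition $C_i$
• $C_i^{D}$: weight for defection of coalition $C_i$
Figure: the previous actions ($C_i$, $C_j$, $C_k$, $C_l$ in the figure) are summed (∑) with their weights into $C_i^{C}$ and $C_i^{D}$, which select the next action, C or D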
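The weighted vote can be sketched as follows: members' weights are summed separately for cooperation and defection, and the larger sum decides the coalition's next move. How ties are broken is not stated, so sending a tie to cooperation is an assumption, as is the 0 = cooperate / 1 = defect encoding.

```python
COOPERATE, DEFECT = 0, 1  # assumed move encoding


def coalition_decision(votes, weights):
    """Weighted vote over the members' proposed moves.

    votes:   each member's proposed next move
    weights: each member's weight (confidence)
    A tie goes to cooperation, which is an assumption.
    """
    weight_c = sum(w for v, w in zip(votes, weights) if v == COOPERATE)
    weight_d = sum(w for v, w in zip(votes, weights) if v == DEFECT)
    return COOPERATE if weight_c >= weight_d else DEFECT


# one heavily weighted cooperator outvotes two light defectors
print(coalition_decision([0, 1, 1], [3, 1, 1]))  # 0 (cooperate)
print(coalition_decision([0, 1, 1], [1, 2, 2]))  # 1 (defect)
```

Unlike a simple majority vote, a single high-confidence member can outweigh several low-confidence ones, which is how rank feeds into the coalition's behavior.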
Weight of Agents
Adjusting weight
• Gives an incentive to agents in the coalition
• Reflects the decision making of the coalition
Figure: the same weighted vote ($C_i^{C}$ vs. $C_i^{D}$) from the previous actions to the next action (C or D), followed by adjusting the agents' weights
Improving Generalization Ability (1/2)
Problem of a single good strategy: it does not adapt to a dynamic environment
• Goal: obtain multiple good strategies for a specific environment
• Example: the biological immune system
Methods
• Fitness sharing
• Adjusting the confidences of multiple strategies through evolution
• Co-evolution
• Coalition formation
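Fitness sharing is a standard niching technique: an individual's fitness is divided by a niche count that grows with the number of similar individuals, so several distinct good strategies can coexist. The slides do not give the sharing function; the sketch below uses the common triangular form with Hamming distance between bit-string strategies, and `sigma_share` and `alpha` are assumed parameters.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(population, fitness, sigma_share, alpha=1.0):
    """Fitness sharing: f'_i = f_i / sum_j sh(d_ij).

    sh(d) = 1 - (d / sigma_share) ** alpha if d < sigma_share, else 0.
    The niche count includes the individual itself (sh(0) = 1).
    """
    shared = []
    for a, f in zip(population, fitness):
        niche = 0.0
        for b in population:
            d = hamming(a, b)
            if d < sigma_share:
                niche += 1 - (d / sigma_share) ** alpha
        shared.append(f / niche)
    return shared


population = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
print(shared_fitness(population, [10, 10, 10], sigma_share=2))  # [5.0, 5.0, 10.0]
```

The two identical strategies split their niche, so the distinct third strategy keeps its full fitness and is less likely to be crowded out.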
Improving Generalization Ability (2/2)
Generalization ability: how well a player performs against unknown players
Evaluation (figure): 100 strategies are generated at random and evolved genetically through the 2IPD game; the top 10 strategies in the population (e.g. 1: 0001110..., 2: 0000100..., 3: 0100100..., 4: 0001100..., 5: 0010010..., …, 10: 0000010...) are then extracted and play the IPD game against the test strategies
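The evaluation loop can be sketched as follows: take the top strategies of the evolved population and measure each one's average payoff per move against an unseen test strategy. The 100-move game length, `k = 10` and the function names are assumptions, and strategies are modelled as functions of the two histories rather than as the bit-string tables the slides use.

```python
T, R, P, S = 5, 3, 1, 0
# (own move, opponent move) -> (own payoff, opponent payoff); 0 = C, 1 = D
PAYOFF = {(0, 0): (R, R), (0, 1): (S, T), (1, 0): (T, S), (1, 1): (P, P)}


def top_k(population, fitness, k=10):
    """Return the k fittest strategies, best first."""
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    return [population[i] for i in order[:k]]


def average_payoff(strategy, opponent, rounds=100):
    """Average per-move payoff of `strategy` against `opponent`."""
    own, opp, total = [], [], 0
    for _ in range(rounds):
        a = strategy(own, opp)
        b = opponent(opp, own)
        total += PAYOFF[(a, b)][0]
        own.append(a)
        opp.append(b)
    return total / rounds


def evaluate(population, fitness, opponent, k=10, rounds=100):
    """Score the k fittest evolved strategies against one test strategy."""
    return [average_payoff(s, opponent, rounds) for s in top_k(population, fitness, k)]


def all_c(own, opp):
    return 0


def all_d(own, opp):
    return 1


print(evaluate([all_c, all_d], [1.0, 2.0], opponent=all_c, k=2))  # [5.0, 3.0]
```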
Test Strategies

| Strategy | Characteristics |
|---|---|
| Tit-for-Tat | Cooperates initially, then follows the opponent |
| Trigger | Cooperates initially; once the opponent defects, defects continuously |
| AllD | Always defects |
| CDCD | Alternates cooperation and defection |
| CCD | Repeats cooperate, cooperate, defect |
| Random | Moves randomly |

Example moves (0 = cooperation, 1 = defection):

| Strategy | Example |
|---|---|
| Tit-for-Tat | 0 0 1 0 1 1 0 0 |
| Trigger | 0 0 0 1 1 1 1 1 |
| AllD | 1 1 1 1 1 1 1 1 |
| CDCD | 0 1 0 1 0 1 0 1 |
| CCD | 0 0 1 0 0 1 0 0 |
| Random | 1 1 0 1 0 0 1 1 |
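The six test strategies can be written as functions of the move histories. Random uses Python's `random.Random`; the helper `moves_against` and the opponent sequences in the usage lines are hypothetical, chosen so that the replayed moves reproduce the example rows for Tit-for-Tat, Trigger and CCD.

```python
import random

# Each test strategy maps (own_history, opponent_history) -> next move,
# with 0 = cooperate and 1 = defect.


def tit_for_tat(own, opp):
    return opp[-1] if opp else 0           # cooperate first, then copy


def trigger(own, opp):
    return 1 if 1 in opp else 0            # defect forever after any defection


def all_d(own, opp):
    return 1                               # always defect


def cdcd(own, opp):
    return len(own) % 2                    # C, D, C, D, ...


def ccd(own, opp):
    return 1 if len(own) % 3 == 2 else 0   # C, C, D, C, C, D, ...


def make_random(seed=None):
    rng = random.Random(seed)
    return lambda own, opp: rng.randint(0, 1)


def moves_against(strategy, opponent_moves):
    """Replay `strategy` against a fixed sequence of opponent moves."""
    own = []
    for t in range(len(opponent_moves)):
        own.append(strategy(own, opponent_moves[:t]))
    return own


print(moves_against(tit_for_tat, [0, 1, 0, 1, 1, 0, 0, 0]))  # [0, 0, 1, 0, 1, 1, 0, 0]
print(moves_against(trigger, [0, 0, 1, 0, 0, 0, 0, 0]))      # [0, 0, 0, 1, 1, 1, 1, 1]
print(moves_against(ccd, [0] * 8))                           # [0, 0, 1, 0, 0, 1, 0, 0]
```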
Example of Game
Figure: Tit-for-Tat vs. an evolved strategy. Each strategy is a lookup table over the 16 possible histories (0 to 15); the figure traces the first five moves. Payoffs per move: evolved strategy 3, 5, 1, 1, 1; Tit-for-Tat 3, 0, 1, 1, 1
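The payoff pattern of such a game can be reproduced with a short simulation. The evolved strategy's actual table is not recoverable here, so a hypothetical stand-in (`cooperate_then_defect`) that cooperates on the first move and defects afterwards is used; against Tit-for-Tat it earns 3, 5, 1, 1, 1 per move while Tit-for-Tat earns 3, 0, 1, 1, 1.

```python
T, R, P, S = 5, 3, 1, 0
# (move a, move b) -> (payoff a, payoff b); 0 = cooperate, 1 = defect
PAYOFF = {(0, 0): (R, R), (0, 1): (S, T), (1, 0): (T, S), (1, 1): (P, P)}


def tit_for_tat(own, opp):
    return opp[-1] if opp else 0


def cooperate_then_defect(own, opp):
    """Hypothetical stand-in for an evolved strategy: C on move 1, then D."""
    return 0 if not own else 1


def payoff_trace(strategy_a, strategy_b, rounds):
    """Per-move payoffs of both players."""
    hist_a, hist_b, trace_a, trace_b = [], [], [], []
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        trace_a.append(pay_a)
        trace_b.append(pay_b)
        hist_a.append(a)
        hist_b.append(b)
    return trace_a, trace_b


print(payoff_trace(cooperate_then_defect, tit_for_tat, 5))
# ([3, 5, 1, 1, 1], [3, 0, 1, 1, 1])
```

Tit-for-Tat can never earn T = 5 on move 2 after mutual cooperation on move 1, because it copies its opponent's cooperative first move; the 5 must therefore go to the strategy that defects first.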
Experimental Result
Test Environment
Population size: 100
Crossover rate: 0.3
Mutation rate: 0.001
Number of generations: 200
Number of iterations: one third of the population size
Training set: six well-known strategies
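The genetic operations can be sketched with these settings. The slides give only the rates, so one-point crossover, bit-flip mutation, reading "a third of population" as integer division, and the 20-bit illustrative genomes are all assumptions; `breed` is a hypothetical name and takes an already-selected parent list.

```python
import random

POPULATION_SIZE = 100
CROSSOVER_RATE = 0.3
MUTATION_RATE = 0.001


def iterations_per_generation(population_size=POPULATION_SIZE):
    """'A third of population', read as integer division (an assumption)."""
    return population_size // 3


def breed(parents, rng):
    """One-point crossover (assumed operator) at CROSSOVER_RATE, then bit-flip
    mutation at MUTATION_RATE. `parents` is an already-selected list of bit lists."""
    children = []
    for i in range(0, len(parents) - 1, 2):
        a, b = parents[i][:], parents[i + 1][:]
        if rng.random() < CROSSOVER_RATE:
            cut = rng.randrange(1, len(a))
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        children.extend([a, b])
    if len(parents) % 2:  # odd number of parents: copy the last one
        children.append(parents[-1][:])
    return [[1 - g if rng.random() < MUTATION_RATE else g for g in child]
            for child in children]


rng = random.Random(0)
parents = [[rng.randint(0, 1) for _ in range(20)] for _ in range(POPULATION_SIZE)]
children = breed(parents, rng)
print(len(children), iterations_per_generation())  # 100 33
```

With a 0.001 mutation rate, a 20-bit genome is mutated in only about 2% of offspring, so most variation comes from crossover and selection.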
Evolved Strategy vs. Random
Figure: average payoff and S.D. of the superior 10 strategies (coalition) and of Random

| Rank | Genotype of evolved strategy | Evolved avg. payoff | Evolved S.D. | Random avg. payoff | Random S.D. |
|---|---|---|---|---|---|
| 1 | 10111001111010111101 | 3.080000 | 1.998399 | 0.480000 | 0.499600 |
| 2 | 00111001111010111101 | 2.800000 | 1.989975 | 0.550000 | 0.497494 |
| 3 | 10111011111111111111 | 2.920000 | 1.998399 | 0.520000 | 0.499600 |
| 4 | 00111011111111111101 | 2.880000 | 1.996397 | 0.570000 | 0.667158 |
| 5 | 00111011111011111011 | 2.940000 | 1.989070 | 0.540000 | 0.555338 |
| 6 | 00110000111111111111 | 2.680000 | 1.690444 | 2.350000 | 1.996873 |
| 7 | 00111011111111111011 | 3.040000 | 1.999600 | 0.490000 | 0.499900 |
| 8 | 00111011111011111011 | 3.160000 | 1.993590 | 0.500000 | 0.670820 |
| 9 | 10111111111111111111 | 3.480000 | 1.941546 | 0.380000 | 0.485386 |
| 10 | 10111001111111111101 | 2.760000 | 1.985548 | 0.560000 | 0.496387 |
Random is one of the weakest strategies in the 2IPD game. Against it, the evolved strategies perform well: all ten beat the Random test strategy with high payoffs.
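The S.D. columns appear consistent with the population standard deviation of per-move payoffs over a 100-move game. For example, an always-defecting strategy facing a Random player that cooperates 52 times and defects 48 times earns 52 payoffs of T = 5 and 48 of P = 1, while Random earns 52 of S = 0 and 48 of P = 1; these give exactly the rank-1 figures. This is an illustrative reconstruction under that assumption, not the authors' evaluation code.

```python
from statistics import mean, pstdev


def summarize(payoffs):
    """Average payoff per move and population standard deviation."""
    return mean(payoffs), pstdev(payoffs)


evolved = [5] * 52 + [1] * 48   # defects: T when Random cooperates, P otherwise
random_ = [0] * 52 + [1] * 48   # Random: S when it cooperates, P when it defects

for name, payoffs in (("evolved", evolved), ("Random", random_)):
    avg, sd = summarize(payoffs)
    print(f"{name}: {avg:.6f} {sd:.6f}")
# evolved: 3.080000 1.998399
# Random: 0.480000 0.499600
```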
Evolved Strategy vs. Tit-for-Tat
Figure: average payoff and S.D. of the superior 10 strategies (coalition) and of Tit-for-Tat

| Rank | Genotype of evolved strategy | Evolved avg. payoff | Evolved S.D. | Tit-for-Tat avg. payoff | Tit-for-Tat S.D. |
|---|---|---|---|---|---|
| 1 | 11000100001011011100 | 3.020000 | 1.636948 | 2.640000 | 2.061650 |
| 2 | 01101100001010011100 | 3.000000 | 0.000000 | 3.000000 | 0.000000 |
| 3 | 10001000001011011100 | 1.040000 | 0.397995 | 0.990000 | 0.099499 |
| 4 | 00000100001010011100 | 1.080000 | 0.560000 | 1.020000 | 0.423792 |
| 5 | 10001000001011011100 | 2.980000 | 0.345832 | 2.970000 | 0.411218 |
| 6 | 01010100001011011100 | 3.000000 | 1.624808 | 2.670000 | 2.044774 |
| 7 | 11001000001010011100 | 1.040000 | 0.397995 | 0.990000 | 0.099499 |
| 8 | 11001100001011011110 | 3.000000 | 0.000000 | 3.000000 | 0.000000 |
| 9 | 01110100001011011100 | 3.020000 | 1.636948 | 2.640000 | 2.061650 |
| 10 | 01010100011011011100 | 3.000000 | 0.000000 | 3.000000 | 0.000000 |
Tit-for-Tat is a mimic strategy that cooperates on the first move of the 2IPD game. The evolved strategies respond appropriately and never lose to it: each scores at least as much as Tit-for-Tat. This suggests that the evolved strategies generalize well.
Evolved Strategy vs. Trigger
Figure: average payoff and S.D. of the superior 10 strategies (coalition) and of Trigger

| Rank | Genotype of evolved strategy | Evolved avg. payoff | Evolved S.D. | Trigger avg. payoff | Trigger S.D. |
|---|---|---|---|---|---|
| 1 | 10111011110011101000 | 1.040000 | 0.397995 | 0.990000 | 0.099499 |
| 2 | 10111011110011101001 | 1.040000 | 0.397995 | 0.990000 | 0.099499 |
| 3 | 00111011110011111000 | 1.060000 | 0.443170 | 1.010000 | 0.223383 |
| 4 | 10111011110011111001 | 1.040000 | 0.397995 | 0.990000 | 0.099499 |
| 5 | 10111011110011111001 | 1.080000 | 0.483322 | 1.030000 | 0.298496 |
| 6 | 10111011110011111001 | 1.040000 | 0.397995 | 0.990000 | 0.099499 |
| 7 | 10111111110010111000 | 1.040000 | 0.397995 | 0.990000 | 0.099499 |
| 8 | 00111011110011111001 | 1.040000 | 0.397995 | 0.990000 | 0.099499 |
| 9 | 10111011110011111001 | 1.060000 | 0.443170 | 1.010000 | 0.223383 |
| 10 | 00111011110011111001 | 1.040000 | 0.397995 | 0.990000 | 0.099499 |
Trigger never forgives an opponent’s defection, so the way to win against Trigger is to keep choosing “defection” as well.
Evolved Strategy vs. AllD
Figure: average payoff and S.D. of the superior 10 strategies (coalition) and of AllD

| Rank | Genotype of evolved strategy | Evolved avg. payoff | Evolved S.D. | AllD avg. payoff | AllD S.D. |
|---|---|---|---|---|---|
| 1 | 00111111111110101111 | 1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 2 | 00111111111110101111 | 1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 3 | 00111111111110101111 | 1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 4 | 00111011111110101111 | 1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 5 | 10111111111110101111 | 1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 6 | 00111111111110101111 | 1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 7 | 10111011111110101111 | 1.000000 | 0.000000 | 1.040000 | 0.397995 |
| 8 | 00111111111110101111 | 1.000000 | 0.000000 | 1.040000 | 0.397995 |
| 9 | 00111111111110101011 | 1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 10 | 00111111111110101111 | 1.000000 | 0.000000 | 1.000000 | 0.000000 |
The only way not to lose against AllD is to choose “defection” on every move; cooperation is never rewarded in this game.
Number of Coalitions
Figure: number of coalitions (0 to 30) over generations 0 to 100
Coalitions survive into the next generation. Most coalitions are formed early in the evolutionary process, which keeps genetic diversity high and leads to better choices against opponents. A coalition can grow if agents satisfy the conditions for joining it.
Comparing the Results
The evolved strategies obtain higher payoffs against Random, CCD and CDCD than against Tit-for-Tat, Trigger and AllD, which indicates that they exploit their opponents’ actions well.
Bias of the Strategy
Figure: bias (0.4 to 1.1) over generations 0 to 200 against Random, TFT, Trigger, AllD, CDCD and CCD
Bias shows how a strategy selects its next choice against its opponents. A higher bias means that the strategy chooses “cooperation” more often than “defection”, and a lower bias means the reverse.
Conclusions
Conclusion
• Strategic coalition may be a robust method for adapting to a dynamic environment
• Decision-making methods influence the results, but not seriously
• The strategies evolved through coalition generalize well against various opponents
Discussion
• Can strategic coalition be adapted to the n-IPD game?
• Which parameters of the IPD game influence generalization ability?
• How can opponent strategies for testing be constructed?
• How can this approach be applied to real-world problems?