Multiagent Reinforcement Learning
Marc Lanctot
RLSS @ Lille, July 11th 2019
Multi-Agent and AI
Joint work with many great collaborators!
General Artificial Intelligence

Talk plan
1. What is Multiagent Reinforcement Learning (MARL)?
2. Foundations & Background
3. Basic Formalisms & Algorithms
4. Advanced Topics
Part 1: What is MARL?
Multiagent Reinforcement Learning
Examples: pommerman.com, Laser Tag
Traditional (Single-Agent) RL
Source: Wikipedia
Multiagent Reinforcement Learning
Source: Nowe, Vrancx & De Hauwere 2012
Motivations: Research in Multiagent RL

[Figure: a 2×2 map of the field. Columns: Single Agent vs. Multiple (e.g. 2) Agents; rows: Small Problems (Tabular Solution Methods) vs. Large Problems (Approximate Solution Methods).]

Annotations added to this map across successive slides: Sutton & Barto '98, '18; the first era of multiagent RL; the Multiagent Deep RL era ('16 - now); the talk focus; and my 10-year mission.
Important Historical Note
Artificial Intelligence, Volume 171, Issue 7
Some Specific Axes of MARL

Centralized:
● One brain / algorithm deployed across many agents
Decentralized:
● All agents learn individually
● Communication limitations defined by environment
Some Specific Axes of MARL

Prescriptive:
● Suggests how agents should behave
Descriptive:
● Forecasts how agents will behave
Some Specific Axes of MARL

Cooperative: Agents cooperate to achieve a goal
Competitive: Agents compete against each other
Neither: Agents maximize their own utility, which may require cooperating and/or competing
Our Focus Today

1. Centralized training for decentralized execution (very common)
2. Mostly prescriptive
3. Mostly competitive; sprinkle of cooperative and neither
Part 2: Foundations & Background
Shoham & Leyton-Brown ‘09
masfoundations.org
Foundations of Multiagent RL

[Figure: the same 2×2 map (Single Agent vs. Multiple Agents; Small vs. Large Problems), now labeled with fields: Approximate Dynamic Programming and Reinforcement Learning on the single-agent side, Game Theory on the small-problem multiagent side, and Multiagent Reinforcement Learning on the large-problem multiagent side.]
Biscuits vs Cookies: A Note on Terminology

Game theory term ↔ RL term
Player ↔ Agent
Game ↔ Environment
Strategy ↔ Policy
Best Response ↔ Greedy Policy
Utility ↔ Reward
State ↔ (Information) State
Normal-form “One-Shot” Games

● Set of players
● Each player has a set of actions
● Set of joint actions
● A utility function
Example: (Bi-)Matrix Games (n = 2)

              column player
                 A       B
row player  a   0, 0    1, -1
            b  -1, 1    0, 0

Rows a, b and columns A, B are the players' actions. Each cell lists (utility to row player, utility to column player); e.g. the cell 1, -1 is the utility pair for joint action (a, B).
Normal-form “One-Shot” Games

● Set of players
● Each player has a set of actions
● Set of joint actions
● A utility function

Each player: maximizes its own expected utility.
Problem! This expectation depends on the joint policy.
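The ingredients above can be sketched concretely. A minimal example, assuming numpy and using the bimatrix game from these slides; the function name `expected_utility` is just an illustrative choice:

```python
# A minimal sketch of a two-player normal-form game, assuming numpy.
import numpy as np

U1 = np.array([[0, 1],
               [-1, 0]])   # utility to the row player
U2 = -U1                   # this example is zero-sum; in general U2 is separate

def expected_utility(U, pi_row, pi_col):
    """Expected utility under (possibly stochastic) row/column policies."""
    return pi_row @ U @ pi_col

pi1 = np.array([0.5, 0.5])   # row player mixes uniformly over {a, b}
pi2 = np.array([1.0, 0.0])   # column player plays A
print(expected_utility(U1, pi1, pi2))  # -0.5
```

Note the "problem" from the slide is visible here: the value to player 1 is a function of both pi1 and pi2, not of pi1 alone.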
Best Response

Suppose we are player i and we fix the policies of the other players (π₋ᵢ).
Then πᵢ is a best response to π₋ᵢ if no other policy for player i achieves higher expected utility against π₋ᵢ.
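Computing a best response in a matrix game reduces to an argmax over action values once the opponent is fixed. A sketch, assuming numpy; `best_response` is an illustrative name, and the payoffs are the matching pennies matrix used later in the talk:

```python
# Sketch: a deterministic best response for the row player against a
# fixed (possibly stochastic) column-player policy, assuming numpy.
import numpy as np

def best_response(U, pi_opponent):
    """Return a deterministic best response (one-hot) to pi_opponent."""
    action_values = U @ pi_opponent          # value of each row action
    br = np.zeros(len(action_values))
    br[np.argmax(action_values)] = 1.0       # full weight on a maximizer
    return br

U1 = np.array([[1, -1],
               [-1, 1]])                     # matching pennies, row payoffs
print(best_response(U1, np.array([0.9, 0.1])))  # [1. 0.]: play a
```

Putting all the weight on one maximizing action is exactly the deterministic-best-response fact stated later in the talk.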
Solving a Matrix Game

              column player
                 A       B
row player  a   0, 0    1, -1
            b  -1, 1    0, 0

Let's start in some cell. If either player has an incentive to deviate (assuming the opponent stays fixed), move to the deviating cell and repeat. (a,A) is a fixed point of this process.
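The fixed-point check above is easy to mechanize: a pure joint action is a fixed point exactly when neither player can improve by deviating alone. A sketch, assuming numpy; `is_pure_nash` is an illustrative name:

```python
# Sketch: check whether a pure joint action is a fixed point of
# iterated best response (no profitable unilateral deviation).
import numpy as np

def is_pure_nash(U1, U2, i, j):
    """True if joint action (row i, column j) admits no unilateral improvement."""
    row_ok = U1[i, j] >= U1[:, j].max()   # row cannot do better within column j
    col_ok = U2[i, j] >= U2[i, :].max()   # column cannot do better within row i
    return bool(row_ok and col_ok)

U1 = np.array([[0, 1], [-1, 0]])          # the game from the slide
U2 = np.array([[0, -1], [1, 0]])
print(is_pure_nash(U1, U2, 0, 0))   # (a, A): True
print(is_pure_nash(U1, U2, 0, 1))   # (a, B): False
```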
Let's Try Another….

              column player
                 A       B
row player  a   1, -1   -1, 1
            b  -1, 1    1, -1

Now iterating unilateral deviations never settles: in every cell some player has an incentive to deviate, so no pure joint action is a fixed point.
Nash Equilibrium: A Solution Concept

A Nash equilibrium is a joint policy such that no player has an incentive to deviate unilaterally.
Some Facts

● A Nash equilibrium always exists in finite games
● Computing a Nash equilibrium is PPAD-complete
  ○ One solution is to focus on tractable subproblems
  ○ Another is to compute approximations
● Assumes players are (unboundedly) rational
● Assumes knowledge:
  ○ Utility / value functions
  ○ The rationality assumption is common knowledge
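One way to "compute approximations" is to measure how far a joint policy is from equilibrium: sum each player's gain from unilaterally switching to a best response (a gap of this kind is sometimes called NashConv in the literature). A sketch, assuming numpy; `deviation_gain` is an illustrative name:

```python
# Sketch: equilibrium gap of a joint policy = sum over players of
# (best-response value minus current value), assuming numpy.
import numpy as np

def deviation_gain(U, pi_self, pi_other):
    """Best-response value minus current value, from the row perspective."""
    action_values = U @ pi_other
    return action_values.max() - pi_self @ action_values

U1 = np.array([[1, -1], [-1, 1]])   # matching pennies, row payoffs
U2 = -U1
uniform = np.array([0.5, 0.5])

gap = (deviation_gain(U1, uniform, uniform)        # row player's gain
       + deviation_gain(U2.T, uniform, uniform))   # column player's gain
print(gap)   # 0.0: the uniform joint policy is an exact Nash equilibrium
```

A gap of at most ε means the joint policy is an ε-Nash equilibrium.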
Matching Pennies: Two-Player Zero-Sum Games

              column player
                 A       B
row player  a   1, -1   -1, 1
            b  -1, 1    1, -1
Best Response Condition

For any (possibly stochastic) joint policy of the other players, there exists a deterministic best response.

Proof sketch: take any (possibly stochastic) best response. By definition of best response, every action it puts positive weight on must achieve the same, maximal, value. So we can put full weight on any one of those actions and still have a best response.
This is a Linear Program!

● Solvable in polynomial time (!)
  ○ Easy to apply off-the-shelf solvers
● Will find one solution
● Matching Pennies: the equilibrium mixes uniformly, playing each action with probability ½
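As an off-the-shelf example, the max-min problem can be handed to scipy's `linprog` (assumed available): variables are the row policy plus a scalar value v, maximize v subject to the policy guaranteeing at least v against every opponent column. This is a sketch under those assumptions, not the talk's own implementation:

```python
# Sketch: solving matching pennies as a linear program with scipy.
import numpy as np
from scipy.optimize import linprog

U = np.array([[1, -1],
              [-1, 1]])            # row player's payoffs in matching pennies
n = U.shape[0]

# maximize v  <=>  minimize -v; unknowns z = (x_1, ..., x_n, v)
c = np.zeros(n + 1); c[-1] = -1.0
# for every opponent column j:  v - sum_i x_i U[i, j] <= 0
A_ub = np.hstack([-U.T, np.ones((U.shape[1], 1))])
b_ub = np.zeros(U.shape[1])
A_eq = np.array([[1.0] * n + [0.0]])   # the policy sums to 1
b_eq = np.array([1.0])
bounds = [(0, 1)] * n + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:n], res.x[-1])   # policy ~(0.5, 0.5), game value ~0
```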
Minimax

Max-min: P1 looks for a policy achieving max over π₁ of min over π₂ of u₁(π₁, π₂)
Min-max: P1 looks for a policy achieving min over π₂ of max over π₁ of u₁(π₁, π₂)
In two-player, zero-sum games these are the same!
John von Neumann 1928 ---> The Minimax Theorem
Consequences of Minimax

The optima:
● These exist! (They sometimes might be stochastic.)
● Called a minimax-optimal joint policy. Also, a Nash equilibrium.
● They are interchangeable: pairing any minimax-optimal policy for one player with any minimax-optimal policy for the other is also minimax-optimal.
● Each policy is a best response to the other.
Multi-Agent Learning Tutorial
Normal Form Games: Algorithms

● Fictitious Play:
  ○ Start with an arbitrary policy per player (π₁⁰, π₂⁰).
  ○ Then, play best response against a uniform distribution over the past policies of the opponent (BR₁ⁿ, BR₂ⁿ).

[Figure: sequences π₁⁰, BR₁¹, BR₁², BR₁³ and π₂⁰, BR₂¹, BR₂², BR₂³, generating returns R1, R2; e.g. the average policy π₂² is the uniform (⅓, ⅓, ⅓) mixture over π₂⁰, BR₂¹, BR₂², and similarly for π₁².]
Normal Form Games: Algorithms

● Fictitious Play in Rock-Paper-Scissors:
● Start with (R, P, S) = (1, 0, 0), (1, 0, 0)
● Iteration 1:
  ○ BR₁¹, BR₂¹ = P, P
  ○ Averages: (½, ½, 0), (½, ½, 0)
● Iteration 2:
  ○ BR₁², BR₂² = P, P
  ○ Averages: (⅓, ⅔, 0), (⅓, ⅔, 0)
● Iteration 3:
  ○ BR₁³, BR₂³ = S, S
  ○ Averages: (¼, ½, ¼), (¼, ½, ¼)

Payoff table accumulated over the actions played so far (rows and columns R, P, P, S):

      R   P   P   S
  R   0   1   1  -1
  P  -1   0   0   1
  P  -1   0   0   1
  S   1  -1  -1   0
![Page 75: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/75.jpg)
Multi-Agent Learning Tutorial
Normal Form Games: Algorithms● Fictitious Play:
R P P S S
R 0 1 1 -1 -1
P -1 0 0 1 1
P -1 0 0 1 1
S 1 -1 -1 0 0
S 1 -1 -1 0 0
● Start with (R, P, S)= (1, 0, 0), (1, 0, 0)
● Iteration 1:
○ BR11,BR2
1 = P, P
○ (½, ½, 0), (½, ½, 0)
● Iteration 2:
○ BR12,BR2
2 = P, P
○ (⅓, ⅔, 0), (⅓, ⅔, 0)
● Iteration 3:
○ BR13,BR2
3 = S, S
○ (¼,½,¼), (¼,½,¼)
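The trace above can be reproduced in a few lines of Python. This is an illustrative sketch, not the tutorial's code: `PAYOFF[(mine, theirs)]` is the payoff to the acting player (so signs may differ from the slide's table), and ties are broken toward the later action to match the slides' choice of S at iteration 3.

```python
ACTIONS = ["R", "P", "S"]
# PAYOFF[(mine, theirs)]: payoff for playing `mine` against `theirs`.
PAYOFF = {
    ("R", "R"): 0, ("R", "P"): -1, ("R", "S"): 1,
    ("P", "R"): 1, ("P", "P"): 0, ("P", "S"): -1,
    ("S", "R"): -1, ("S", "P"): 1, ("S", "S"): 0,
}

def best_response(opponent_counts):
    """Best pure action vs. the opponent's empirical (average) policy.
    Ties are broken toward the later action, matching the slides' trace."""
    def value(a):
        return sum(PAYOFF[(a, b)] * n for b, n in opponent_counts.items())
    return max(reversed(ACTIONS), key=value)

def fictitious_play(iterations):
    # Start with (R, P, S) = (1, 0, 0) for both players, as on the slides.
    counts = [{"R": 1, "P": 0, "S": 0}, {"R": 1, "P": 0, "S": 0}]
    for _ in range(iterations):
        # Both players best-respond to each other's empirical averages.
        brs = [best_response(counts[1]), best_response(counts[0])]
        counts[0][brs[0]] += 1
        counts[1][brs[1]] += 1
    total = iterations + 1
    return [{a: n / total for a, n in c.items()} for c in counts]
```

After three iterations this reproduces the slides' (¼, ½, ¼); as the number of iterations grows, the empirical averages approach the uniform equilibrium.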
Multi-Agent Learning Tutorial
Normal Form Games: Algorithms
● Double Oracle [H.B. McMahan et al., 2003]:
○ Start with an arbitrary policy per player (π1_0, π2_0)
○ At iteration n, compute (p_n, q_n) by solving the restricted game
○ Then best respond against (p_n, q_n) to get new best responses (BR1_n, BR2_n)
[Diagram: rows π1_0, BR1_1, BR1_2, BR1_3 vs. columns π2_0, BR2_1, BR2_2, BR2_3, with payoffs R1, R2; after two expansions the restricted-game solution mixes p_2 = (p2_0, p2_1, p2_2) over player 1's policies and q_2 = (q2_0, q2_1, q2_2) over player 2's.]
● Double Oracle in Rock-Paper-Scissors:
● Start with (R, P, S) = (1, 0, 0), (1, 0, 0)
● Iteration 1:
○ BR1_1, BR2_1 = P, P
○ Solve the restricted game: (0, 1, 0), (0, 1, 0)
● Iteration 2:
○ BR1_2, BR2_2 = S, S
○ Solve the restricted game: (⅓, ⅓, ⅓), (⅓, ⅓, ⅓)

The restricted game after iteration 2:

      R   P   S
  R   0   1  -1
  P  -1   0   1
  S   1  -1   0
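The loop above can be sketched compactly. An illustrative sketch, not the tutorial's code: a real double oracle solves each restricted game exactly (e.g. with a linear program); here an inner fictitious-play loop approximates that solve, and `PAYOFF` is written from the row player's point of view.

```python
ACTIONS = ["R", "P", "S"]
# PAYOFF[(row, col)]: zero-sum payoff to the row player.
PAYOFF = {
    ("R", "R"): 0, ("R", "P"): -1, ("R", "S"): 1,
    ("P", "R"): 1, ("P", "P"): 0, ("P", "S"): -1,
    ("S", "R"): -1, ("S", "P"): 1, ("S", "S"): 0,
}

def solve_restricted(rows, cols, iters=20000):
    """Approximate equilibrium of the restricted zero-sum game.
    A real implementation solves an LP exactly; fictitious play is a
    dependency-free stand-in here."""
    rn = {a: 1 for a in rows}
    cn = {b: 1 for b in cols}
    for _ in range(iters):
        rbr = max(rows, key=lambda a: sum(PAYOFF[(a, b)] * n for b, n in cn.items()))
        cbr = min(cols, key=lambda b: sum(PAYOFF[(a, b)] * n for a, n in rn.items()))
        rn[rbr] += 1
        cn[cbr] += 1
    return ({a: n / sum(rn.values()) for a, n in rn.items()},
            {b: n / sum(cn.values()) for b, n in cn.items()})

def double_oracle():
    rows, cols = ["R"], ["R"]  # start from (R, P, S) = (1, 0, 0) for both
    while True:
        p, q = solve_restricted(rows, cols)
        # Best respond to the restricted-game solution over ALL actions.
        rbr = max(ACTIONS, key=lambda a: sum(PAYOFF[(a, b)] * w for b, w in q.items()))
        cbr = min(ACTIONS, key=lambda b: sum(PAYOFF[(a, b)] * w for a, w in p.items()))
        if rbr in rows and cbr in cols:
            return p, q  # no profitable deviations: approximately solved
        if rbr not in rows:
            rows.append(rbr)
        if cbr not in cols:
            cols.append(cbr)
```

On RPS this expands {R} to {R, P} to {R, P, S} exactly as in the trace above, then terminates near the uniform equilibrium.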
General Artificial Intelligence
Cooperative Games

               column player
                A     B     C
row player  a  1, 1  0, 0  0, 0
            b  0, 0  2, 2  0, 0
            c  0, 0  0, 0  5, 5

The three diagonal outcomes (a, A), (b, B), (c, C) are all Nash equilibria!
General-Sum Games
No constraints on utilities!

               column player
                A     B
row player  a  3, 2  0, 0
            b  0, 0  2, 3
What about sequential games…?
The Sequential Setting: Extensive-Form Games

Perfect Information Games

(Finite) Perfect Information Games: Model
● Start with an episodic MDP
● Add a player identity function: … (a simultaneous move node is one where many players play simultaneously)
● Define rewards per player: …
● (Similarly for returns: … is the return to player i from …)
Part 3: Basic Formalisms & Algorithms
![Page 90: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/90.jpg)
Presentation Title — SPEAKER
Foundations of RL
Reinforcement Learning
Multiagent Reinforcement
Learning
Approximate Dynamic Programming
Game Theory
Single Agent Multiple (e.g. 2) Agents
Sm
all
Pro
blem
sLa
rge
Pro
blem
s
![Page 91: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/91.jpg)
Backward Induction: solving a turn-taking perfect information game
[Game tree: P1 to move at the root, P2 at the two nodes below, P1 at the nodes below those; leaf payoffs (3, -3), (-1, 1), (6, -6), (-2, 2), (-4, 4), (4, -4), (0, 0). Backing up: the bottom P1 nodes return (6, -6) and (-2, 2); the P2 nodes then return (4, -4) and (-2, 2); the root returns (4, -4).]
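The backup shown on the slides can be sketched as a short recursion. The tuple-based tree encoding is an assumption for illustration; the tree below is one reading of the slides' figure that is consistent with all the backed-up values.

```python
def backward_induction(node):
    """Backed-up minimax value (from player 1's view) of a zero-sum,
    turn-taking, perfect-information game tree.  A node is either
    ("leaf", v), with v the payoff to player 1 (player 2 gets -v),
    or (player, [children])."""
    if node[0] == "leaf":
        return node[1]
    player, children = node
    values = [backward_induction(c) for c in children]
    # Player 1 maximizes v; player 2 maximizes -v, i.e. minimizes v.
    return max(values) if player == 1 else min(values)

# One consistent reading of the slides' tree (an assumption):
tree = (1, [
    (2, [(1, [("leaf", 3), ("leaf", -1), ("leaf", 6)]), ("leaf", 4)]),
    (2, [(1, [("leaf", -2), ("leaf", -4)]), ("leaf", 0)]),
])
root_value = backward_induction(tree)  # 4, matching the slides' root (4, -4)
```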
Intro to RL: Tabular Approximate Dyn. Prog.

Turn-Taking 2P Zero-Sum Perfect Info. Games
● Player to play at s: …
● Reward to player i: …
● Subset of legal actions: …
● Often assume episodic and …
● Values of a state to player i: … Identities: …

2P Zero-Sum Perfect Info. Value Iteration

Minimax
A.K.A. Alpha-Beta, Backward Induction, Retrograde Analysis, etc…
Start from a search state, compute a depth-limited approximation → Minimax Search
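The depth-limited search named here is commonly implemented with alpha-beta pruning. A minimal sketch over a hypothetical tuple-based tree encoding (not the tutorial's code); `evaluate` stands in for a heuristic evaluation function at the depth limit.

```python
import math

def alphabeta(node, depth, evaluate, alpha=-math.inf, beta=math.inf):
    """Depth-limited minimax with alpha-beta pruning, values from player 1's
    view.  `node` is ("leaf", v) or (player, [children]); `evaluate` is the
    heuristic used when the depth limit cuts off the search."""
    if node[0] == "leaf":
        return node[1]
    if depth == 0:
        return evaluate(node)
    player, children = node
    if player == 1:  # maximizer
        v = -math.inf
        for child in children:
            v = max(v, alphabeta(child, depth - 1, evaluate, alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:
                break  # the minimizer will never allow this branch
        return v
    v = math.inf     # player 2: minimizer
    for child in children:
        v = min(v, alphabeta(child, depth - 1, evaluate, alpha, beta))
        beta = min(beta, v)
        if alpha >= beta:
            break
    return v
```

With a large enough depth it reduces to exact backward induction; with a small depth the heuristic takes over at the frontier.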
Two-Player Zero-Sum Policy Iteration
● Analogous adaptation of value iteration
● Foundation of AlphaGo, AlphaGo Zero, AlphaZero
○ Better policy improvement via MCTS
○ Deep network function approximation
■ Policy prior (move probabilities) cuts down the breadth
■ Value network (evaluation) cuts the depth
2P Zero-Sum Games with Simultaneous Moves
Image from Bozansky et al. 2016
Markov Games
"Markov Soccer" (Littman '94; He et al. '16)
Also: Lagoudakis & Parr '02, Uther & Veloso '03, Collins '07
Value Iteration for Zero-Sum Markov Games
First MARL Algorithm: Minimax-Q (Littman '94)
1. Start with arbitrary joint value functions Q(s, a, o), over my action a and the opponent's action o (each state induces a matrix of values)
2. Define the policy as in value iteration (by solving an LP)
3. Generate trajectories of tuples … using a behavior policy
4. Update …
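Steps 2 and 4 can be sketched in a few lines. A hedged sketch, not Littman's implementation: minimax-Q computes the state value by solving a linear program over mixed strategies; to stay dependency-free, the sketch below approximates that solve with a fictitious-play loop, and all names are illustrative.

```python
def matrix_game_value(M, iters=20000):
    """Approximate maximin value of a zero-sum matrix M[a][o] (payoff to
    the maximizer).  Minimax-Q solves this exactly with an LP; fictitious
    play is used here as a dependency-free stand-in."""
    nA, nO = len(M), len(M[0])
    an, on = [1] * nA, [1] * nO
    for _ in range(iters):
        a = max(range(nA), key=lambda a: sum(M[a][o] * on[o] for o in range(nO)))
        o = min(range(nO), key=lambda o: sum(M[a][o] * an[a] for a in range(nA)))
        an[a] += 1
        on[o] += 1
    pa = [n / sum(an) for n in an]
    po = [n / sum(on) for n in on]
    return sum(pa[a] * po[o] * M[a][o] for a in range(nA) for o in range(nO))

def minimax_q_update(Q, V, s, a, o, r, s2, alpha=0.1, gamma=0.99):
    """One tabular minimax-Q step on transition (s, a, o, r, s2):
    Q(s,a,o) <- Q + alpha * (r + gamma * V(s2) - Q), then refresh V(s)."""
    Q[s][a][o] += alpha * (r + gamma * V[s2] - Q[s][a][o])
    V[s] = matrix_game_value(Q[s])
```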
First Era of MARL
Follow-ups to Minimax-Q:
● Friend-or-Foe Q-Learning (Littman '01)
● Correlated Q-Learning (Greenwald & Hall '03)
● Nash Q-Learning (Hu & Wellman '03)
● Coco-Q (Sodomka et al. '13)
Function approximation:
● LSPI for Markov Games (Lagoudakis & Parr '02)
First Era of MARL
Infinitesimal Gradient Ascent (IGA): Singh, Kearns & Mansour '03
First Era of MARL
Formalize the optimization (policy gradients) as a dynamical system, then analyze it using well-established techniques.
Image from Singh, Kearns & Mansour '03
First Era of MARL
→ Evolutionary Game Theory: replicator dynamics

  ẋ_a = x_a [ u(a, π) − ū(π) ]

where ẋ_a is the time derivative of the probability of action a, u(a, π) is the utility of action a against the joint policy / population of the other players, and ū(π) is the expected / average utility of the joint policy / population.
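The dynamics can be integrated numerically. A minimal single-population sketch on Rock-Paper-Scissors using Euler steps (illustrative, not the tutorial's code):

```python
def replicator_step(x, payoff, dt=0.01):
    """One Euler step of single-population replicator dynamics:
    dx_a/dt = x_a * (u(a, x) - ubar(x)), where payoff[a][b] is the
    utility of action a against action b."""
    n = len(x)
    u = [sum(payoff[a][b] * x[b] for b in range(n)) for a in range(n)]
    ubar = sum(x[a] * u[a] for a in range(n))
    return [x[a] + dt * x[a] * (u[a] - ubar) for a in range(n)]

# Rock-Paper-Scissors: trajectories orbit the uniform fixed point.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
x = [0.5, 0.3, 0.2]
for _ in range(1000):
    x = replicator_step(x, RPS)
```

Each step preserves the probability simplex (the increments sum to zero), and the uniform policy is a rest point, which is what the cycling plots for RPS in this literature illustrate.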
First Era of MARL
WoLF: Win or Learn Fast (Bowling & Veloso '01)
IGA is rational but not convergent!
● Rational: if the opponents converge to a fixed joint policy, the learning agent converges to a best response to that joint policy
● Convergent: the learner necessarily converges to a fixed policy
WoLF uses a specific variable learning rate to ensure convergence (in 2x2 games).
First Era of MARL
Follow-ups to policy gradient and replicator dynamics:
● WoLF-IGA, WoLF-PHC
● WoLF-GIGA (Bowling '05)
● Weighted Policy Learner (Abdallah & Lesser '08)
● Infinitesimal Q-learning (Wunder et al. '10)
● Frequency-Adjusted Q-Learning (Kaisers et al. '10, Bloembergen et al. '11)
● Policy Gradient Ascent with Policy Prediction (Zhang & Lesser '10)
● Evolutionary Dynamics of Multiagent Learning (Bloembergen et al. '15)
So…... why call it "the first era"?
Scalability was a major problem.
Multi-Agent and AI
Second Era: Deep Learning meets Multiagent RL
Deep Q-Networks (DQN), Mnih et al. 2015
"Human-level control through deep reinforcement learning"
Agent ⇄ Atari Emulator: { pixels, reward } → { action }
● Represent the action-value (Q) function using a convolutional neural network.
● Train using end-to-end Q-learning.
● Can we do this in a stable way?
Independent Q-Learning Approaches
● Independent Q-learning [Tan, 1993]
● Independent Deep Q-Networks [Tampuu et al., 2015]
Learning to Communicate
● Foerster et al. '16
● Sukhbaatar et al. '16
Foerster et al. ‘18
BIC-Net (Peng et al.’17)
Cooperative Multiagent Tasks
![Page 125: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/125.jpg)
Sequential Social Dilemmas
● Leibo et al. '17
● Lerer & Peysakhovich '18
Centralized Critic, Decentralized Actor Approaches
● Idea: reduce nonstationarity & credit assignment issues using a central critic
● Examples: MADDPG [Lowe et al., 2017] & COMA [Foerster et al., 2017]
● Applies to both cooperative and competitive games
[Diagram: a centralized critic receives the state s, reward r, observations o1, o2 and actions a1, a2, and outputs Q values; decentralized actors 1 and 2 emit a1, a2 into the environment.]
● Decentralized actors trained via policy gradient: …
● Centralized critic trained to minimize a loss: …
AlphaGo vs. Lee Sedol
Lee Sedol (9p): winner of 18 world titles
Match was played in Seoul, March 2016
AlphaGo won the match 4-1
![Page 130: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/130.jpg)
AlphaGo Zero
Mastering Go without Human Knowledge
AlphaZero: One Algorithm, Three Games
Go · Shogi · Chess
Emergent Coordination Through Competition
Liu et al. '19 and http://git.io/dm_soccer
Capture-the-Flag (Jaderberg et al. ‘19)
https://deepmind.com/blog/capture-the-flag-science/
https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/
AlphaStar (Vinyals et al. ‘19)
https://openai.com/blog/openai-five-finals/
Dota 2: OpenAI Five
Part 4: Partial Observability
Foundations of Multiagent RL
[Quadrant diagram: Approximate Dynamic Programming (single agent, small problems), Game Theory (multiple, e.g. 2, agents, small problems), Reinforcement Learning (single agent, large problems), Multiagent Reinforcement Learning (multiple agents, large problems).]
Independent Deep Q-Networks (Mnih et al. '15)
● "small2" map
● "small3" map
Fictitious Self-Play [Heinrich et al. '15, Heinrich & Silver 2016]
● Idea: fictitious self-play (FSP) + reinforcement learning
● The update rule in the sequential setting is equivalent to standard fictitious play in the matrix game
● Approximate a Nash equilibrium via two neural networks:
1. Best response net (BR):
○ Estimates a best response
○ Trained via RL, from a circular buffer
2. Average policy net (AVG):
○ Estimates the time-average policy
○ Trained via supervised learning, from a reservoir buffer
● The behavior policy mixes the two nets according to a policy mixing parameter
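The reservoir buffer keeps the AVG net's training data uniform over the whole history (which is what a time-average policy needs), in bounded memory. A minimal sketch of such a buffer (Vitter's Algorithm R; illustrative, not NFSP's actual implementation):

```python
import random

class ReservoirBuffer:
    """Keeps a uniform random sample of everything ever added,
    in O(capacity) memory (Vitter's Algorithm R)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.num_seen = 0

    def add(self, item):
        self.num_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(item)
        else:
            # Keep the new item with probability capacity / num_seen.
            j = random.randrange(self.num_seen)
            if j < self.capacity:
                self.data[j] = item

buf = ReservoirBuffer(capacity=100)
for t in range(10_000):
    buf.add(t)  # e.g. (observation, best-response action) pairs in NFSP
```

By contrast, the BR net's circular buffer keeps only recent experience, which is what RL toward a best response wants.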
Neural Fictitious Self-Play [Heinrich & Silver 2016]
● Leduc Hold'em poker experiments: "closeness" to Nash over training
● 1st scalable end-to-end approach to learn approximate Nash equilibria without prior domain knowledge
○ Competitive with superhuman computer poker programs when it was released
Policy-Space Response Oracles (Lanctot et al. ‘17)

[Diagram: a PSRO meta-agent holds a policy set (Random, DQN #1, DQN #2, …, DQN #K) and a meta-solver that computes a meta-strategy over it; the active agent trains a new best response against an opponent policy sampled from the meta-strategy.]

The empirical payoff table (the meta-game) grows as new oracles are added — first over {Random}, then {Random, DQN #1}, and so on:

|        | Random | DQN #1 | DQN #2 |
|--------|--------|--------|--------|
| Random | 0.5    | 0.45   | 0.4    |
| DQN #1 | 0.6    | 0.55   | 0.45   |
| DQN #2 | 0.7    | 0.6    | 0.56   |
![Page 146: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/146.jpg)
Quantifying “Joint Policy Correlation” (JPC)

In RL:
● Each player optimizes independently
● After many steps, a joint policy (𝜋1, 𝜋2) is co-learned for players 1 & 2

Computing JPC: start 5 separate instances of the same experiment, with
○ the same hyper-parameter values
○ differing only by seed (!)
● Reload all 25 combinations and play 𝜋1ⁱ with 𝜋2ʲ for instances i, j
![Page 147: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/147.jpg)
Joint Policy Correlation in Independent RL

[Plots: InRL in small2 (first) map; InRL in small4 map]
![Page 148: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/148.jpg)
JPC Results - Laser Tag

| Game            | Diag   | Off-Diag | Exp. Loss |
|-----------------|--------|----------|-----------|
| LT small2       | 30.44  | 20.03    | 34.2 %    |
| LT small3       | 23.06  | 9.06     | 62.5 %    |
| LT small4       | 20.15  | 5.71     | 71.7 %    |
| Gathering field | 147.34 | 146.89   | none      |
| Pathfind merge  | 108.73 | 106.32   | none      |
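The expected-loss column can be reproduced from the diagonal (same-instance) and off-diagonal (cross-instance) average returns. Assuming the definition from Lanctot et al. ‘17, the loss is (D − O)/D; the remaining rows match approximately, up to rounding of the reported averages:

```python
# Expected loss from joint policy correlation: the proportion of return lost
# when agents trained in different instances are paired (off-diagonal) rather
# than with their co-trained partner (diagonal).
def jpc_expected_loss(diag_avg, off_diag_avg):
    return (diag_avg - off_diag_avg) / diag_avg

# Laser Tag small2 row from the table above:
loss = jpc_expected_loss(30.44, 20.03)
print(f"{100 * loss:.1f} %")  # → 34.2 %
```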
![Page 149: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/149.jpg)
Exploitability Descent (Lockhart et al. ‘19)

● An FP-like algorithm that converges without averaging!
● Amenable to function approximation
![Page 150: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/150.jpg)
A simple MDP

[Figure: root state A with actions a and b; stochastic transitions
Pr(B | A, a) = 0.75, Pr(C | A, a) = 0.25, Pr(D | A, b) = 0.4, Pr(E | A, b) = 0.6;
from B, C, D, E, leaf actions c, d, e, f, g, h, i, j yield rewards 3, -1, 0, 2, 4, 2, -3, 2.]
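Assuming the leaf rewards pair with the actions in the order shown (c, d under B; e, f under C; g, h under D; i, j under E — an assumption about the figure, not stated in the text), the optimal value at A is a one-step expectimax:

```python
# Expectimax on the (assumed) tree above: pick the action at A that maximizes
# the expected value of the best leaf action at each successor state.
transitions = {            # action -> [(probability, successor state), ...]
    "a": [(0.75, "B"), (0.25, "C")],
    "b": [(0.4, "D"), (0.6, "E")],
}
leaf_rewards = {           # successor state -> rewards of its two leaf actions
    "B": [3, -1], "C": [0, 2], "D": [4, 2], "E": [-3, 2],
}

def value_of(action):
    return sum(p * max(leaf_rewards[s]) for p, s in transitions[action])

best = max(transitions, key=value_of)
print(best, round(value_of(best), 2))  # → b 2.8
```

Under this reading, action a is worth 0.75·3 + 0.25·2 = 2.75 and action b is worth 0.4·4 + 0.6·2 = 2.8, so b is (barely) optimal.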
![Page 151: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/151.jpg)
A simple MDP → Multiagent System

[Figure: the same tree (A; actions a, b; successor states B–E; leaf actions c–j with rewards 3, -1, 0, 2, 4, 2, -3, 2), now viewed as a multiagent system.]
![Page 152: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/152.jpg)
Terminal history, a.k.a. Episode

(A, a, F, 1, B, c) is a terminal history.

[Figure: the example tree from the previous slides.]
![Page 153: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/153.jpg)
Terminal history, a.k.a. Episode

(A, a, F, 1, B, c) is a terminal history. (A, b, G, 3, D, g) is another terminal history.

[Figure: the example tree from the previous slides.]
![Page 154: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/154.jpg)
Prefix (Non-Terminal) Histories

(A, a, F, 2, C) is a history. It is a prefix of (A, a, F, 2, C, e) and (A, a, F, 2, C, f).

[Figure: the example tree from the previous slides.]
![Page 155: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/155.jpg)
Another simple MDP: Perfect Recall of Actions and Observations

[Figure: an MDP with state A and actions e, p; the rewards 0, -0.05, and 100 appear on its transitions.]
Another simple MDP / A different MDP: Perfect Recall of Actions and Observations

[Figure: left, the MDP above; right, an MDP with a chain of states A, B, C, …, each with action e; the rewards 0, -0.05, and 100 appear on the transitions.]
![Page 157: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/157.jpg)
Partially Observable Environment

An information state is a set of histories consistent with an agent’s observations.

3-card Poker deck: Jack, Queen, King
![Page 158: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/158.jpg)
Terminology: What is a “state”?

● An information state s corresponds to a sequence of observations
○ with respect to the player to act at s
● The environment is in one of many world/ground states

Example information state in Poker:
Ante: 1 chip per player, [card image], P1 bets 1 chip, P2 calls, [card image]
Recall: Turn-Taking Perfect Information Games
→ Exactly one history per information state!
![Page 160: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/160.jpg)
{Q,V}-values and Counterfactual Values

What is a counterfactual value?

[Figure: an information state s containing several histories, each with actions a, b, c.]

The portion of the expected return (under the joint policy) from the start state, given that player i plays to reach information state s (and then plays a).
![Page 161: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/161.jpg)
Q-values in Partially Observable Environments

What is a q-value?

[Figure: an information state s_t containing several histories, each with actions a, b, c.]
![Page 162: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/162.jpg)
Q-values in Partially Observable Environments

[Equation image lost; its annotated parts:]
● All terminal histories z reachable from s, paired with their prefix histories ha, where h is in s
● Reach probabilities: the product of all policies’ state-action probabilities along the portion of the history between ha and z
● Return achieved over terminal history z
![Page 163: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/163.jpg)
Q-values in Partially Observable Environments
By Bayes rule
![Page 164: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/164.jpg)
Q-values in Partially Observable Environments
Since h is in st and h is unique to st
![Page 165: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/165.jpg)
Q-values in Partially Observable Environments
![Page 166: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/166.jpg)
Q-values in Partially Observable Environments
Only player i’s reach probabilities Player i’s opponent’s probabilities (inc. chance!)
Similarly here and here
![Page 167: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/167.jpg)
Q-values in Partially Observable Environments
Due to perfect recall (!!)
![Page 168: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/168.jpg)
Q-values in Partially Observable Environments
![Page 169: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/169.jpg)
Q-values in Partially Observable Environments
This is a counterfactual value!
![Page 170: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/170.jpg)
Q-values in Partially Observable Environments
For full derivation, see Sec 3.2 of Srinivasan et al. ‘18
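The equation images on these slides did not survive extraction. A sketch of the derivation they walk through, reconstructed from Sec 3.2 of Srinivasan et al. ‘18 (notation assumed), is:

```latex
q_i^{\pi}(s,a)
  \;=\; \sum_{h \in s} \Pr(h \mid s) \sum_{z \,\sqsupseteq\, ha} \eta^{\pi}(ha, z)\, u_i(z),
\qquad
\Pr(h \mid s) \;=\; \frac{\eta^{\pi}(h)}{\sum_{h' \in s} \eta^{\pi}(h')}
  \;\;\text{(Bayes rule)}.
```

Splitting each reach probability into player i’s part and the opponents’ part (including chance), η^π(h) = η_i^π(h) η_{−i}^π(h), and using that η_i^π(h) is identical for every h ∈ s by perfect recall, the player-i factors cancel:

```latex
q_i^{\pi}(s,a)
  \;=\; \frac{\sum_{h \in s} \eta_{-i}^{\pi}(h) \sum_{z} \eta^{\pi}(ha, z)\, u_i(z)}
             {\sum_{h' \in s} \eta_{-i}^{\pi}(h')}
  \;=\; \frac{v_i^{c}(s,a)}{\eta_{-i}^{\pi}(s)},
```

i.e. the q-value is the counterfactual value v_i^c(s,a) normalized by the opponents’ total reach probability of s.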
![Page 171: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/171.jpg)
¯\_(ツ)_/¯
Yeah.. so.... ?
![Page 172: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/172.jpg)
Counterfactual Regret Minimization (CFR) (Zinkevich et al. ‘08)

● Algorithm to compute approximate Nash equilibria in 2P zero-sum games
● Hugely successful in Poker AI
● Size traditionally reduced a priori based on expert knowledge
● Key innovation: counterfactual values

Image from Sandholm ‘10
![Page 173: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/173.jpg)
● Policy evaluation is analogous
● Policy improvement: use regret minimization algorithms
○ Average strategies converge to Nash in self-play
● Convergence guarantees are on the average policies
CFR is policy iteration!
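A toy instance of “regret minimization in self-play, with the average converging to Nash” is regret matching on a biased rock-paper-scissors matrix game (payoffs invented for illustration; this game’s equilibrium is (1/4, 1/2, 1/4)):

```python
# Regret matching in self-play on a biased rock-paper-scissors game where
# winning with rock counts double. Average strategies converge to equilibrium.
A = [[0, -1, 2],
     [1, 0, -1],
     [-2, 1, 0]]  # row player's payoff; the column player gets the negative

def regret_matching(regrets):
    """Current policy: play actions in proportion to positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

def self_play(A, iters=100_000):
    n, m = len(A), len(A[0])
    regrets1, regrets2 = [0.0] * n, [0.0] * m
    avg1, avg2 = [0.0] * n, [0.0] * m  # cumulative strategies (averages converge)
    for _ in range(iters):
        p1, p2 = regret_matching(regrets1), regret_matching(regrets2)
        for a in range(n):
            avg1[a] += p1[a]
        for b in range(m):
            avg2[b] += p2[b]
        # Payoff of each pure action against the opponent's current policy.
        u1 = [sum(A[a][b] * p2[b] for b in range(m)) for a in range(n)]
        u2 = [-sum(p1[a] * A[a][b] for a in range(n)) for b in range(m)]
        ev1 = sum(p1[a] * u1[a] for a in range(n))
        ev2 = sum(p2[b] * u2[b] for b in range(m))
        for a in range(n):
            regrets1[a] += u1[a] - ev1
        for b in range(m):
            regrets2[b] += u2[b] - ev2
    return [x / iters for x in avg1], [x / iters for x in avg2]
```

This is the matrix-game core: full CFR applies the same regret update at every information state, using counterfactual values in place of the matrix payoffs.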
![Page 176: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/176.jpg)
Libratus (Brown & Sandholm ‘18)
![Page 177: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/177.jpg)
Policy Gradient Algorithms

Parameterized policy 𝜋_θ with parameters θ (e.g. a neural network)

Define a score function J(θ)

Main idea: do gradient ascent on J.

1. REINFORCE (Williams ‘92, see RL book ch. 13) + PG theorem: you can do this via estimates from sample trajectories.
2. Advantage Actor-Critic (A2C) (Mnih et al ‘16): you can use deep networks to estimate the policy and baseline value v(s)
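REINFORCE itself fits in a few lines. Here is a sketch on a toy two-armed bandit (the bandit and its payoffs are invented for illustration), with a running-average baseline standing in for the learned v(s) of A2C:

```python
import math
import random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical two-armed bandit: arm 0 pays 1.0, arm 1 pays 0.2.
REWARDS = [1.0, 0.2]

def reinforce(steps=5000, lr=0.1):
    logits = [0.0, 0.0]  # policy parameters theta: pi = softmax(logits)
    baseline = 0.0       # running-average return, reduces gradient variance
    for t in range(steps):
        probs = softmax(logits)
        a = random.choices(range(2), weights=probs)[0]  # sample a (one-step) trajectory
        r = REWARDS[a]
        baseline += (r - baseline) / (t + 1)
        advantage = r - baseline
        # Gradient of log pi(a) w.r.t. logit i is 1[i == a] - probs[i]; ascend J.
        for i in range(2):
            logits[i] += lr * advantage * ((1.0 if i == a else 0.0) - probs[i])
    return softmax(logits)
```

After training, the policy concentrates on the better arm.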
![Page 178: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/178.jpg)
Regret Policy Gradients (Srinivasan et al. ‘18)

● Policy gradient is doing a form of counterfactual regret minimization!
● Several new policy gradient variants inspired by the connection to regret
![Page 179: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/179.jpg)
Neural Replicator Dynamics (NeuRD) (Omidshafiei, Hennes, Morrill et al. ‘19)

● Replicator dynamics → time-discretize
● Parameterized policy: update the policy parameters to minimize the distance to the time-discretized RD
● The update acts on the logits (where the policy is their softmax), driven by the advantage q(s,a) − v(s)
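The update the slide describes can be written out in the tabular case (notation assumed, following Omidshafiei, Hennes, Morrill et al. ‘19). The replicator dynamics on the policy are

```latex
\dot{\pi}_t(s,a) \;=\; \pi_t(s,a)\,\big(q_t(s,a) - v_t(s)\big),
\qquad v_t(s) \;=\; \textstyle\sum_b \pi_t(s,b)\, q_t(s,b),
```

and time-discretizing in the logit space y, where π = softmax(y), nudges the logits by the advantage:

```latex
y_{t+1}(s,a) \;=\; y_t(s,a) + \eta\,\big(q_t(s,a) - v_t(s)\big),
\qquad \pi_{t+1}(s,\cdot) \;=\; \mathrm{softmax}\big(y_{t+1}(s,\cdot)\big).
```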
![Page 180: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/180.jpg)
NeuRD: Results

[Plots: Biased Rock-Paper-Scissors (left) and Leduc Poker (right)]
![Page 181: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/181.jpg)
Where to Go From Here?
![Page 182: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/182.jpg)
Shoham & Leyton-Brown ‘09
masfoundations.org
![Page 183: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/183.jpg)
● If multi-agent learning is the answer, what is the question?
○ Shoham et al. ‘06
○ Hernandez-Leal et al. ‘19
● A comprehensive survey of MARL (Busoniu et al. ‘08)
● Game Theory and Multiagent RL (Nowé et al. ‘12)
● Study of Learning in Multiagent Envs (Hernandez-Leal et al. ‘17)
Surveys and Food for Thought
![Page 184: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/184.jpg)
Bard et al. ‘19 Also Competition at IEEE Cog (ieee-cog.org)
The Hanabi Challenge
![Page 185: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/185.jpg)
OpenSpiel: Coming Soon! (→ August 2019)

● Open source framework for research in RL & Games
● C++, Python, and Swift impl’s
● 25+ games
● 10+ algorithms
● Tell all your friends! (Seriously!)
![Page 186: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/186.jpg)
AAAI 2020 Workshop on RL in Games?
http://aaai-rlg.mlanctot.info/
AAAI19-RLG Summary:
● 39 accepted papers
○ 4 oral presentations
○ 35 posters
● 1 “Mini-Tutorial”
● 3 Invited Talks
● Panel & Discussion
![Page 187: Multiagent Reinforcement Learning · Multi-Agent Learning Tutorial Normal Form Games: Algorithms Fictitious Play: π1 0 BR1 1 BR1 2 BR1 3 π2 0 BR2 1 BR2 2 BR2 3 R1,R2 Start with](https://reader034.fdocuments.us/reader034/viewer/2022042419/5f3677433c84f50569432be1/html5/thumbnails/187.jpg)
Marc Lanctot
mlanctot.info/
Questions?