Learning in Multiagent Systems
Jose M Vidal
Department of Computer Science and Engineering, University of South Carolina
February 11, 2010
Abstract
We introduce the topic of learning in multiagent systems and present recent results.
Outline
1 Introduction
2 Cooperative Learning
3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm
4 Stochastic Games: Reinforcement Learning
5 General Theories for Learning Agents: CLRI Theory, N-Level Agents
6 Collective Intelligence
Introduction

The Learning Problem

[Figure: positive (+) and negative (−) training examples plotted on Weight vs. Speed axes.]
The Multiagent Learning Problem

[Figure: the same positive (+) and negative (−) examples on the Weight vs. Speed axes.]
Cooperative Learning
Sharing Learned Knowledge
Fairly easy with identical agent abilities.
Largely unexplored for heterogeneous agents.
Generalizing learned knowledge seems very domain-specific (the induction problem again).
Learning in Games
           j
           C      D
i    A    0,0    5,1
     B   -1,6    1,5
Fictitious Play
Weight Function
k_i^t(s_j) = k_i^{t-1}(s_j) + \begin{cases} 1 & \text{if } s_j^{t-1} = s_j, \\ 0 & \text{if } s_j^{t-1} \neq s_j. \end{cases}
Model of Opponent
\Pr_i^t[s_j] = \frac{k_i^t(s_j)}{\sum_{s_j' \in S_j} k_i^t(s_j')}.
Best Response
Player i then determines the strategy that will give it the highest expected utility given that j will play each of its s_j ∈ S_j with probability Pr_i^t[s_j].
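To make the update concrete, here is a minimal sketch of two-player fictitious play in Python. It is an illustration under assumed conventions (numpy arrays for the payoff matrices, one pseudo-count per action as the initial weights), not code from the slides.

import numpy as np

def fictitious_play(payoff_i, payoff_j, rounds=100):
    # Each player best-responds to the empirical frequencies (the weight
    # function k, normalized) of the opponent's past actions.
    n_i, n_j = payoff_i.shape
    k_i = np.ones(n_j)  # i's counts of j's actions (one pseudo-count each)
    k_j = np.ones(n_i)  # j's counts of i's actions
    for _ in range(rounds):
        pr_j = k_i / k_i.sum()  # i's model of j: Pr_i[s_j]
        pr_i = k_j / k_j.sum()  # j's model of i
        a_i = int(np.argmax(payoff_i @ pr_j))    # i's best response
        a_j = int(np.argmax(payoff_j.T @ pr_i))  # j's best response
        k_i[a_j] += 1  # update the weight functions with observed play
        k_j[a_i] += 1
    return k_i / k_i.sum(), k_j / k_j.sum()

# The 2x2 game from the example below: i picks rows A,B; j picks columns C,D.
u_i = np.array([[0, 1], [1, 0]])
u_j = np.array([[0, 2], [2, 0]])
print(fictitious_play(u_i, u_j))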
Example

           j
           C      D
i    A    0,0    1,2
     B    1,2    0,0

s_i   s_j   k_i(C)   k_i(D)   Pr_i[C]   Pr_i[D]
 A     C      1        0         1         0
 B     D      1        1        1/2       1/2
 A     D      1        2        1/3       2/3
 A     D      1        3        1/4       3/4
 A     D      1        4        1/5       4/5
Theorem (Nash Equilibrium is Attractor to Fictitious Play)

If s is a strict Nash equilibrium and it is played at time t then it will be played at all times greater than t.
Theorem (Fictitious Play Converges to Nash)

If fictitious play converges to a pure strategy then that strategy must be a Nash equilibrium.
Infinite Cycle Example

           j
           C      D
i    A    0,0    1,1
     B    1,1    0,0

s_i   s_j   k_i(C)   k_i(D)   k_j(A)   k_j(B)
              1       1.5       1       1.5
 A     C      2       1.5       2       1.5
 B     D      2       2.5       2       2.5
 A     C      3       2.5       3       2.5
 B     D      3       3.5       3       3.5
Replicator Dynamics
Fraction of Agents Playing s
Let φ^t(s) be the number of agents using strategy s at time t. We can then define
\theta^t(s) = \frac{\varphi^t(s)}{\sum_{s' \in S} \varphi^t(s')}
Expected Utility for Playing s
u^t(s) \equiv \sum_{s' \in S} \theta^t(s') \, u(s, s'),
Reproduction Rate
\varphi^{t+1}(s) = \varphi^t(s) \left(1 + u^t(s)\right).
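The three equations above translate directly into a short simulation. The following sketch is illustrative: the game matrix is the symmetric a/b/c game shown later in this section, and the initial population counts are made up.

import numpy as np

def replicator_step(phi, u):
    # phi^{t+1}(s) = phi^t(s) * (1 + u^t(s)), where
    # u^t(s) = sum_{s'} theta^t(s') u(s, s') and theta are population fractions.
    theta = phi / phi.sum()
    expected = u @ theta
    return phi * (1 + expected)

u = np.array([[1, 2, 0],
              [0, 1, 2],
              [2, 0, 1]])            # symmetric a/b/c game from this section
phi = np.array([60.0, 25.0, 15.0])   # illustrative initial counts
for _ in range(100):
    phi = replicator_step(phi, u)
    phi *= 100.0 / phi.sum()         # scale population size back, as noted below
print(phi / phi.sum())               # fractions playing a, b, c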
Population Dynamics
Population size could change, but we scale it back or ignore the change.
The game must be symmetric.
A stable population of more than one strategy corresponds to a mixed strategy.
Theorem (Nash Equilibrium is a Steady State)

Every Nash equilibrium is a steady state for the replicator dynamics.

Proof.
By contradiction. If an agent had a pure strategy that would return a higher utility than any other strategy then this strategy would be a best response to the Nash equilibrium. If this strategy was different from the Nash equilibrium then we would have a best response to the equilibrium which is not the equilibrium, so the system could not be at a Nash equilibrium.
Theorem (Stable Steady State is a Nash Equilibrium)

A stable steady state of the replicator dynamics is a Nash equilibrium. A stable steady state is one that, after suffering a small perturbation, is pushed back to the same steady state by the system's dynamics.
Theorem (Asymptotically Stable is Trembling-Hand Nash)

An asymptotically stable steady state corresponds to a Nash equilibrium that is trembling-hand perfect and isolated. That is, the stable steady states are a refinement of Nash equilibria: only a few Nash equilibria are stable steady states.
Definition (Evolutionary Stable Strategy)

An ESS is an equilibrium strategy that can overcome the presence of a small number of invaders. That is, if the equilibrium strategy profile is ω and a small number ε of invaders start playing ω′, then ESS requires that the existing population get a higher payoff against the new mixture (εω′ + (1 − ε)ω) than the invaders get.
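The definition can be checked numerically. Below is a minimal sketch; the Hawk-Dove payoff matrix and the invader strategies are assumptions for illustration.

import numpy as np

def is_ess_against(u, w, w_inv, eps=0.01):
    # ESS condition: the incumbent w earns more against the post-invasion
    # mixture eps*w' + (1-eps)*w than the invader w' does.
    mix = eps * w_inv + (1 - eps) * w
    return w @ u @ mix > w_inv @ u @ mix

# Hawk-Dove with V=2, C=4 (assumed): mixed ESS plays Hawk with prob V/C = 1/2.
u = np.array([[-1.0, 2.0],
              [ 0.0, 1.0]])
ess = np.array([0.5, 0.5])
print(is_ess_against(u, ess, np.array([1.0, 0.0])))  # all-Hawk invader: True
print(is_ess_against(u, ess, np.array([0.0, 1.0])))  # all-Dove invader: True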
Theorem (ESS is Steady State of Replicator Dynamics)

An ESS is an asymptotically stable steady state of the replicator dynamics. However, the converse need not be true: a stable state in the replicator dynamics does not need to be an ESS.
           j
           a      b      c
i    a    1,1    2,2    0,0
     b    0,0    1,1    2,2
     c    2,2    0,0    1,1

[Figure: simplex over the strategies a, b, c illustrating the dynamics.]
The AWESOME Algorithm
AWESOME
1  play-eq, play-sta ← true; eq-rej ← false; φ ← π_i; t ← 0
2  while play-sta
3    do play φ for N times in a row (an epoch)
4       ∀j update s_j given what they played in these N rounds
5       if play-eq
6         then if some player j has max_a |s_j(a) − π_j(a)| > ε_e
7                then eq-rej ← true; φ ← random action
8         else if ¬eq-rej ∧ ∃j max_a |s_j^old(a) − s_j(a)| > ε_s
9                then play-sta ← false
10      eq-rej ← false
11      b ← arg max_a u_i(a, s_−i)
12      if u_i(b, s_−i) > u_i(φ, s_−i) + n|A_i| ε_s^{t+1} µ
13        then φ ← b
14      ∀j s_j^old ← s_j
15      t ← t + 1
16 goto 1
The Schedule
In order for the algorithm to always converge, ε_e and ε_s must be decreased and N must be increased over time using a schedule where

1 ε_s and ε_e decrease monotonically to 0,
2 N increases to infinity,
3 \prod_{t=1}^{\infty} \left(1 - \sum_i \frac{|A_i|}{N^t (\epsilon_s^t)^2}\right) > 0,
4 \prod_{t=1}^{\infty} \left(1 - \sum_i \frac{|A_i|}{N^t (\epsilon_e^t)^2}\right) > 0.
It Converges
Theorem (AWESOME converges)

With a valid schedule, the AWESOME algorithm converges to a best response if all the other players play fixed strategies, and to a Nash equilibrium if all the other players are AWESOME players.
Stochastic Games
What is a Stochastic Game?
One where the agents do not know the payoff they might get.
That is, an unexplored MDP.
Reinforcement Learning
Reinforcement Learning Problem Definition

s_t is a state, taken from S,
a_t is an action, taken from A,
P(s_{t+1} | s_t, a_t) is the state transition function,
r(s_t, a_t) → ℝ is the reward function.

The problem is to find the policy π(s) → a which maximizes the discounted successive rewards r_t the agent receives when using π. That is, find

\pi^* = \arg\max_{\pi} \sum_{i=0}^{\infty} \gamma^i r_i
Q-learning
1  ∀s ∀a Q(s,a) ← 0; λ ← 1; ε ← 1
2  s ← current state
3  if rand() < ε                      (ε is the exploration rate)
4    then a ← random action
5    else a ← arg max_a Q(s,a)
6  Take action a
7  Receive reward r
8  s′ ← current state
9  Q(s,a) ← λ(r + γ max_{a′} Q(s′,a′)) + (1 − λ)Q(s,a)
10 λ ← .99λ
11 ε ← .98ε
12 goto 2
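The pseudocode maps almost line-for-line onto runnable Python. The sketch below runs tabular Q-learning on a toy 5-state chain; the environment and decay constants are invented for illustration, while the update rule is the one above.

import random

N_STATES, ACTIONS, GAMMA = 5, (-1, +1), 0.9   # toy chain MDP (invented)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
lam, eps = 1.0, 1.0                           # learning and exploration rates
s = 0
for _ in range(5000):
    if random.random() < eps:                 # explore
        a = random.choice(ACTIONS)
    else:                                     # exploit: a <- argmax_a Q(s,a)
        a = max(ACTIONS, key=lambda x: Q[(s, x)])
    s2 = min(max(s + a, 0), N_STATES - 1)     # take action a
    r = 1.0 if s2 == N_STATES - 1 else 0.0    # reward only at the right end
    best_next = max(Q[(s2, x)] for x in ACTIONS)
    # Q(s,a) <- lam*(r + gamma*max_a' Q(s',a')) + (1-lam)*Q(s,a)
    Q[(s, a)] = lam * (r + GAMMA * best_next) + (1 - lam) * Q[(s, a)]
    lam *= 0.999                              # decay both rates slowly
    eps *= 0.999
    s = 0 if s2 == N_STATES - 1 else s2       # restart episode at the goal
print([max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(N_STATES)])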
Theorem (Q-learning Converges)

Given that the learning and exploration rates decrease slowly enough, Q-learning is guaranteed to converge to the optimal policy.
Definition (Nash Equilibrium Point)

A Nash equilibrium point is a tuple of n strategies (π_1^*, ..., π_n^*) such that for all s ∈ S and i = 1, ..., n,

\forall \pi_i \in \Pi_i \quad v_i(s, \pi_1^*, \ldots, \pi_n^*) \geq v_i(s, \pi_1^*, \ldots, \pi_{i-1}^*, \pi_i, \pi_{i+1}^*, \ldots, \pi_n^*)
Theorem (Nash Equilibrium Point Exists)

Every n-player discounted stochastic game possesses at least one Nash equilibrium point in stationary strategies.
NashQ-learning
1  t ← 0
2  s^0 ← current state
3  ∀s∈S ∀j∈{1,...,n} ∀a_j∈A_j : Q_j^t(s, a_1, ..., a_n) ← 0
4  Choose action a_i^t
5  Observe r_1^t, ..., r_n^t; a_1^t, ..., a_n^t; s^{t+1} = s′
6  for j ← 1, ..., n
7    do Q_j^{t+1}(s, a_1, ..., a_n) ← (1 − λ^t) Q_j^t(s, a_1, ..., a_n) + λ^t (r_j^t + γ NashQ_j^t(s′))
       where NashQ_j^t(s′) = Q_j^t(s′, π_1(s′) ⋯ π_n(s′))
       and π_1(s′), ..., π_n(s′) are a Nash equilibrium point calculated from the Q values
8  t ← t + 1
9  goto 4
Assumption

There exists an adversarial equilibrium for the entire game and for every game defined by the Q functions encountered during learning.

Assumption

There exists a coordination equilibrium for the entire game and for every game defined by the Q functions encountered during learning.

Theorem (NashQ-learning Converges)

Under these assumptions NashQ-learning converges to a Nash equilibrium as long as all the equilibria encountered during the game are unique.
friend-or-foe
1  t ← 0
2  s^0 ← current state
3  ∀s∈S ∀a_j∈A_j : Q_i^t(s, a_1, ..., a_n) ← 0
4  Choose action a_i^t
5  Observe r_1^t, ..., r_n^t; a_1^t, ..., a_n^t; s^{t+1} = s′
6  Q_i^{t+1}(s, a_1, ..., a_n) ← (1 − λ^t) Q_i^t(s, a_1, ..., a_n) + λ^t (r_i^t + γ NashQ_i^t(s′))
   where NashQ_i^t(s′) = max_{π∈Π(X_1×⋯×X_k)} min_{y_1,...,y_l ∈ Y_1×⋯×Y_l} ∑_{x_1,...,x_k ∈ X_1×⋯×X_k} π(x_1) ⋯ π(x_k) Q_i(s, x_1, ..., x_k, y_1, ..., y_l)
   and the X are the actions of i's friends while the Y are those of its foes.
7  t ← t + 1
8  goto 4
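For the all-friends special case (friend-q), the NashQ_i value reduces to a plain maximum of Q_i over joint actions, which makes a compact sketch possible; the dictionary representation and parameter values below are illustrative assumptions.

from collections import defaultdict

def friend_q_update(Q, s, joint_a, r_i, s2, joint_actions, lam=0.1, gamma=0.9):
    # friend-q backup: with only friends, NashQ_i(s') = max over joint
    # actions of Q_i(s', .), i.e. there is no minimization over foes.
    nash_q = max(Q[(s2, a)] for a in joint_actions)
    Q[(s, joint_a)] = (1 - lam) * Q[(s, joint_a)] + lam * (r_i + gamma * nash_q)

# Illustrative usage: two agents, two actions each.
joint_actions = [(x, y) for x in (0, 1) for y in (0, 1)]
Q = defaultdict(float)
friend_q_update(Q, s=0, joint_a=(1, 0), r_i=1.0, s2=1, joint_actions=joint_actions)
print(Q[(0, (1, 0))])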
Theorem (friend-or-foe converges)

friend-or-foe converges.

However, in general the values it converges to do not correspond to a Nash equilibrium point. Still, we can show:

Theorem

foe-q learns values for a Nash equilibrium policy if the game has an adversarial equilibrium, and friend-q learns values for a Nash equilibrium policy if the game has a coordination equilibrium. This is true regardless of opponent behavior.
General Theories for Learning Agents
Moving Target Function Problem
As the other agents change their behavior, what you need to do also changes.
CLRI Theory
CLRI Notation
δ_i^t : W → A is agent i's decision function.
Δ_i^t(w) is the target function.
e(δ_i^t) = Pr[δ_i^t(w) ≠ Δ_i^t(w) | w ∈ D(W)] is the error function.
[Figure: the moving target function problem. The agent learns, moving δ_i^t to δ_i^{t+1}, while the target function moves from Δ_i^t to Δ_i^{t+1}, yielding a new error e(δ_i^{t+1}).]
CLRI Parameters

Change rate (c) is the probability that an agent will change at least one of its incorrect mappings in δ^t(w).
Learning rate (l) is the probability that the agent changes an incorrect mapping to the correct one.
Retention rate (r) is the probability that the agent will retain its correct mappings.
Impact (I_ij) is the impact that i's learning has on j's target function. Specifically, it is the probability that Δ_j^t(w) will change given that δ_i^{t+1}(w) ≠ δ_i^t(w).
CLRI Equation
E[e(\delta_i^{t+1})] = 1 - r_i + v_i \left( \frac{|A_i| r_i - 1}{|A_i| - 1} \right) + e(\delta_i^t) \left( r_i - l_i + v_i \left( \frac{|A_i|(l_i - r_i) + l_i - c_i}{|A_i| - 1} \right) \right)   (1)

where v_i is the volatility, the probability that i's target function changes.
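Iterating Equation 1 shows how the expected error settles at a fixed point determined by the rates. The parameter values in this sketch are invented for illustration.

def clri_step(e, c, l, r, v, n_actions):
    # One step of the CLRI expected-error equation (Equation 1).
    A = n_actions
    return (1 - r + v * (A * r - 1) / (A - 1)
            + e * (r - l + v * (A * (l - r) + l - c) / (A - 1)))

e = 0.9                       # start with a high expected error
for t in range(50):
    e = clri_step(e, c=0.9, l=0.6, r=0.95, v=0.1, n_actions=5)
print(round(e, 3))            # settles near a fixed point around 0.2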
N-Level Agents
I Think that You Think that I Think that...

A 0-level agent is one that does not recognize the existence of other agents in the world.
A 1-level agent recognizes that there are other agents in the world whose actions affect its payoff. It also has some knowledge that tells it the utility it will receive given any set of joint actions.
A 2-level agent believes that all other agents are 1-level agents.
Decreasing Returns of Thinking

An n-level agent always beats (n−1)-level agents.
Marginal utility gains grow smaller with each extra level.
Computational costs grow exponentially with each extra level.
Many times, it does not pay to think about your opponent.
Collective Intelligence
COllective INtelligence
Idea: start with the global utility function U(s, \vec{a}) and determine from it the individual utilities.
Define Preferences
We define i ’s preference over s,~a as
Pi (s,~a) =∑~a′∈~A Θ[ri (s,~a)− ri (s,~a′)]
|~A|,
where Θ(x) is the Heaviside function which is 1 if x is greater thanor equal to 0, otherwise it is 0. SSimilarly, we define the global preference function as
P(s,~a) =∑~a′∈~A Θ[U(s,~a)−U(s,~a′)]
|~A|.
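Since Θ[r_i(s, \vec{a}) − r_i(s, \vec{a}')] is 1 exactly when \vec{a} does at least as well as \vec{a}', the preference is just the fraction of joint actions that \vec{a} weakly beats. A minimal sketch, with the state fixed and an invented reward function:

import itertools

def preference(reward, a, joint_actions):
    # Fraction of joint actions a' with reward(a) >= reward(a'),
    # i.e. the sum of Heaviside terms divided by |A|.
    return sum(reward(a) >= reward(a2) for a2 in joint_actions) / len(joint_actions)

joint_actions = list(itertools.product((0, 1), repeat=2))
r_i = lambda a: a[0] + 2 * a[1]       # invented reward for agent i
print({a: preference(r_i, a, joint_actions) for a in joint_actions})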
The Easier Case
A system where, for all agents i, it is true that P_i(s, \vec{a}) = P(s, \vec{a}) is called factored.
But it might still not converge, because agents' actions change other agents' target functions.
Opacity
We define the opacity Ω_i for agent i as

\Omega_i(s, \vec{a}) = \sum_{\vec{a}' \in \vec{A}} \Pr[\vec{a}'] \, \frac{\left| u_i(s, \vec{a}) - u_i(s, \vec{a}'_{-i}, \vec{a}_i) \right|}{\left| u_i(s, \vec{a}) - u_i(s, \vec{a}_{-i}, \vec{a}'_i) \right|}.
System Categorization
If the system is factored and has zero opacity then it is easy to solve, but such systems are rare.
If it has zero opacity then it amounts to multiple parallel learning problems.
Goal: find low-opacity reward functions that are highly factored.
Wonderful Life
The wonderful life utility function gives each agent:

u_i(s, \vec{a}) = U(s, \vec{a}) - U(s, \vec{a}_{-i}, 0),

where 0 stands for replacing agent i's action with a fixed null action.
Aristocrat Utility
Another solution is the aristocrat utility
u_i(s, \vec{a}) = U(s, \vec{a}) - \sum_{\vec{a}' \in \vec{A}} \Pr[\vec{a}'] \, U(s, \vec{a}_{-i}, \vec{a}'_i),

where Pr[\vec{a}'] is the probability that \vec{a}' happens.
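To compare the two shaped utilities concretely, the sketch below computes the wonderful life utility (clamping agent i's action to action 0 as the null action) and the aristocrat utility for a tiny two-agent system; the global utility U and the action distribution are invented for illustration.

import itertools

ACTIONS = (0, 1)                      # action 0 doubles as the null action

def U(a):
    # Invented global utility over the joint action a = (a_0, a_1).
    return a[0] + a[1] - 0.5 * a[0] * a[1]

def wonderful_life(i, a):
    # u_i = U(a) - U(a with agent i's action clamped to the null action).
    clamped = list(a); clamped[i] = 0
    return U(a) - U(tuple(clamped))

def aristocrat(i, a, pr):
    # u_i = U(a) minus the expectation of U with agent i's action resampled;
    # pr marginalizes the slide's Pr[a'] down to agent i's own action.
    return U(a) - sum(p * U(a[:i] + (ai2,) + a[i+1:])
                      for ai2, p in zip(ACTIONS, pr))

pr = (0.5, 0.5)                       # assumed action distribution
for a in itertools.product(ACTIONS, repeat=2):
    print(a, [wonderful_life(i, a) for i in range(2)],
             [round(aristocrat(i, a, pr), 2) for i in range(2)])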
COIN Tests
Both utility functions have been shown to perform better than u_i = U and other hand-tailored utility functions.