Learning in Multiagent Systems

Jose M Vidal

Department of Computer Science and Engineering, University of South Carolina

February 11, 2010

Abstract

We introduce the topic of learning in multiagent systems and present recent results.

Learning in Multiagent Systems

Introduction

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Learning in Multiagent Systems

Introduction

The Learning Problem

[Figure: training examples plotted on Speed vs. Weight axes, labeled + and −.]


Learning in Multiagent Systems

Introduction

The Multiagent Learning Problem

[Figure: the same + and − examples plotted on Speed vs. Weight axes.]


Learning in Multiagent Systems

Cooperative Learning

Sharing Learned Knowledge

Fairly easy with identical agent abilities.

Largely unexplored for heterogeneous agents.

Generalizing learned knowledge seems very domain-specific (the induction problem again).


Learning in Multiagent Systems

Learning in Games

             j
           C        D
  i   A   0,0      5,1
      B   -1,6     1,5


Learning in Multiagent Systems

Learning in Games

Fictitious Play

Weight Function

k_i^t(s_j) = k_i^{t-1}(s_j) +
  \begin{cases}
    1 & \text{if } s_j^{t-1} = s_j, \\
    0 & \text{if } s_j^{t-1} \neq s_j.
  \end{cases}

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Model of Opponent

\Pr_i^t[s_j] = \frac{k_i^t(s_j)}{\sum_{s_j' \in S_j} k_i^t(s_j')}.

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Best Response

Player i then determines the strategy that will give it the highest expected utility, given that j will play each of its s_j ∈ S_j with probability Pr_i^t[s_j].
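Putting the three pieces together, here is a minimal Python sketch of one fictitious-play update for player i. The payoff matrix, the prior counts, and the assumption that j's previous action is directly observable are illustrative choices, not part of the slides.

import numpy as np

# Illustrative payoff matrix for player i: rows are i's actions, columns are j's.
payoffs_i = np.array([[0.0, 1.0],
                      [1.0, 0.0]])

def fictitious_play_step(counts, observed_j):
    """One update: bump the weight k_i^t(s_j) of j's observed action, rebuild the
    empirical model Pr_i^t[s_j], and return i's best response to it."""
    counts = counts.copy()
    counts[observed_j] += 1                      # weight function
    pr_j = counts / counts.sum()                 # model of opponent
    expected_utility = payoffs_i @ pr_j          # expected utility of each of i's actions
    return counts, int(np.argmax(expected_utility))   # best response

counts = np.ones(2)                              # illustrative prior counts over j's actions
for _ in range(5):
    counts, best = fictitious_play_step(counts, observed_j=1)
    print(counts, best)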

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Example

             j
           C        D
  i   A   0,0      1,2
      B   1,2      0,0

  si  sj  ki(C)  ki(D)  Pri[C]  Pri[D]
  A   C   1      0      1       0
  B   D   1      1      .5      .5
  A   D   1      2      1/3     2/3
  A   D   1      3      1/4     3/4
  A   D   1      4      1/5     4/5


Learning in Multiagent Systems

Learning in Games

Fictitious Play

Theorem (Nash Equilibrium is Attractor to Fictitious Play)

If s is a strict Nash equilibrium and it is played at time t, then it will be played at all times greater than t.

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Theorem (Fictitious Play Converges to Nash)

If fictitious play converges to a pure strategy, then that strategy must be a Nash equilibrium.

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Infinite Cycle Example

             j
           C        D
  i   A   0,0      1,1
      B   1,1      0,0

  si  sj  ki(C)  ki(D)  kj(A)  kj(B)
          1      1.5    1      1.5
  A   C   2      1.5    2      1.5
  B   D   2      2.5    2      2.5
  A   C   3      2.5    3      2.5
  B   D   3      3.5    3      3.5



Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Fraction of Agents Playing s

Let φ^t(s) be the number of agents using strategy s at time t. We can then define

\theta^t(s) = \frac{\phi^t(s)}{\sum_{s' \in S} \phi^t(s')}

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Expected Utility for Playing s

u^t(s) \equiv \sum_{s' \in S} \theta^t(s')\, u(s, s'),

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Reproduction Rate

\phi^{t+1}(s) = \phi^t(s)\,(1 + u^t(s)).
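These three definitions give one generation of the replicator dynamics. A small Python sketch follows; the symmetric payoff matrix is illustrative, and the population is rescaled each generation so its size stays constant.

import numpy as np

# Illustrative symmetric game: u[s, s'] is the payoff for playing s against s'.
u = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0],
              [2.0, 0.0, 1.0]])

def replicator_generation(phi):
    """phi[s] is the number of agents playing strategy s at time t."""
    theta = phi / phi.sum()                 # θ^t(s): fraction playing each strategy
    util = u @ theta                        # u^t(s): expected utility against the population
    phi_next = phi * (1.0 + util)           # φ^{t+1}(s) = φ^t(s)(1 + u^t(s))
    return phi_next * (phi.sum() / phi_next.sum())   # scale back to the original population size

phi = np.array([60.0, 30.0, 10.0])          # arbitrary starting population
for _ in range(100):
    phi = replicator_generation(phi)
print(phi / phi.sum())                      # strategy shares after 100 generations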

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Population Dynamics

Population size could change, but we either scale it back or ignore the change.

Game must be symmetric.

A stable population of more than one strategy corresponds toa mixed strategy.

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Theorem (Nash equilibrium is a Steady State)

Every Nash equilibrium is a steady state for the replicator dynamics.

Proof.

By contradiction. If some agent had a pure strategy that returned a higher utility than every other strategy, then that strategy would be a best response to the Nash equilibrium. If it differed from the equilibrium strategy, we would have a best response to the equilibrium that is not the equilibrium itself, so the system could not have been at a Nash equilibrium.

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Theorem (Stable Steady State is a Nash Equilibrium)

A stable steady state of the replicator dynamics is a Nash equilibrium. A stable steady state is one that, after suffering a small perturbation, is pushed back to the same steady state by the system's dynamics.

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Theorem (Asymptotically Stable is Trembling-Hand Nash)

An asymptotically stable steady state corresponds to a Nash equilibrium that is trembling-hand perfect and isolated. That is, the asymptotically stable steady states are a refinement of the Nash equilibria: only a few Nash equilibria are stable steady states.

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Definition (Evolutionary Stable Strategy)

An ESS is an equilibrium strategy that can overcome the presence of a small number of invaders. That is, if the equilibrium strategy profile is ω and a small fraction ε of invaders start playing ω', then the ESS condition states that the existing population gets a higher payoff against the new mixture εω' + (1 − ε)ω than the invaders do.

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Theorem (ESS is Steady State of Replicator Dynamics)

An ESS is an asymptotically stable steady state of the replicator dynamics. However, the converse need not be true: a stable state of the replicator dynamics need not be an ESS.

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

             j
           a      b      c
  i   a   1,1    2,2    0,0
      b   0,0    1,1    2,2
      c   2,2    0,0    1,1

[Figure: the strategy simplex with vertices a, b, and c.]


Learning in Multiagent Systems

Learning in Games

The AWESOME Algorithm

AWESOME

1  play-eq, play-sta ← true; eq-rej ← false; φ ← π_i; t ← 0
2  while play-sta
3     do play φ for N times in a row (an epoch)
4        ∀j update s_j given what they played in these N rounds
5        if play-eq
6           then if some player j has max_a(s_j(a), π_j(a)) > ε_e
7                   then eq-rej ← true; φ ← random action
8           else if ¬eq-rej ∧ ∃j max_a(s_j^old(a), s_j(a)) > ε_s
9                   then play-sta ← false
10       eq-rej ← false
11       b ← arg max_a u_i(a, s_{−i})
12       if u_i(b, s_{−i}) > u_i(φ, s_{−i}) + n|A_i| ε_s^{t+1} µ
13          then φ ← b
14       ∀j s_j^old ← s_j
15       t ← t + 1
16 goto 1

Learning in Multiagent Systems

Learning in Games

The AWESOME Algorithm

The Schedule

In order for the algorithm to always converge, ε_e and ε_s must be decreased and N must be increased over time using a schedule where

1 εs and εe decrease monotonically to 0,

2 N increases to infinity,

3 \prod_{t=1}^{\infty} \left( 1 - \frac{\sum_i |A_i|}{N^t (\epsilon_s^t)^2} \right) > 0,

4 \prod_{t=1}^{\infty} \left( 1 - \frac{\sum_i |A_i|}{N^t (\epsilon_e^t)^2} \right) > 0.

Learning in Multiagent Systems

Learning in Games

The AWESOME Algorithm

It Converges

Theorem (AWESOME converges)

With a valid schedule, the AWESOME algorithm converges to a best response if all the other players play fixed strategies, and to a Nash equilibrium if all the other players are AWESOME players.


Learning in Multiagent Systems

Stochastic Games

What is a Stochastic Game?

One where the agents do not know the payoff they might get.

That is, an unexplored MDP.


Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Reinforcement Learning Problem Definition

s_t is a state, taken from S,

a_t is an action, taken from A,

P(s_{t+1} | s_t, a_t) is the state transition function,

r(s_t, a_t) → ℜ is the reward function.

The problem is to find the policy π(s) → a which maximizes the discounted successive rewards r_t the agent receives when using π. That is, find

\pi^* = \arg\max_{\pi} \sum_{i=0}^{\infty} \gamma^i r_i


Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Q-learning

1  ∀s ∀a Q(s,a) ← 0; λ ← 1; ε ← 1
2  s ← current state
3  if rand() < ε   (ε is the exploration rate)
4     then a ← random action
5     else a ← arg max_a Q(s,a)
6  Take action a
7  Receive reward r
8  s' ← current state
9  Q(s,a) ← λ(r + γ max_{a'} Q(s',a')) + (1 − λ)Q(s,a)
10 λ ← .99λ
11 ε ← .98ε
12 goto 2
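The same loop as runnable Python. This is a minimal tabular sketch: the environment object with reset(), step(), and an actions list is a hypothetical interface, not something defined in the slides.

import random
from collections import defaultdict

def q_learning(env, gamma=0.9, episodes=500):
    """Tabular Q-learning. `env` is a hypothetical environment exposing
    reset() -> state, step(action) -> (next_state, reward, done), and .actions."""
    Q = defaultdict(float)                # Q(s, a), implicitly zero everywhere
    lam, eps = 1.0, 1.0                   # learning rate λ and exploration rate ε
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:                                # explore
                a = random.choice(env.actions)
            else:                                                    # exploit
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            best_next = max(Q[(s2, act)] for act in env.actions)
            Q[(s, a)] = lam * (r + gamma * best_next) + (1 - lam) * Q[(s, a)]
            s = s2
            lam *= 0.99                                              # decay learning rate
            eps *= 0.98                                              # decay exploration rate
    return Q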

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Theorem (Q-learning Converges)

Given that the learning and exploration rates decrease slowly enough, Q-learning is guaranteed to converge to the optimal policy.

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Definition (Nash Equilibrium Point)

A Nash equilibrium point is a tuple of n strategies (π_1^*, ..., π_n^*) such that for all s ∈ S and i = 1, ..., n,

v_i(s, π_1^*, ..., π_n^*) ≥ v_i(s, π_1^*, ..., π_{i−1}^*, π_i, π_{i+1}^*, ..., π_n^*)   for all π_i ∈ Π_i.

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Theorem (Nash Equilibrium Point Exists)

Every n-player discounted stochastic game possesses at least one Nash equilibrium point in stationary strategies.

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

NashQ-learning

1  t ← 0
2  s^0 ← current state
3  ∀s ∈ S, ∀j ← 1,...,n, ∀a_j ∈ A_j : Q_j^t(s, a_1, ..., a_n) ← 0
4  Choose action a_i^t
5  Observe r_1^t, ..., r_n^t; a_1^t, ..., a_n^t; s^{t+1} = s'
6  for j ← 1, ..., n
7     do Q_j^{t+1}(s, a_1, ..., a_n) ← (1 − λ^t) Q_j^t(s, a_1, ..., a_n) + λ^t (r_j^t + γ NashQ_j^t(s'))
         where NashQ_j^t(s') = Q_j^t(s', π_1(s') ··· π_n(s'))
         and π_1(s'), ..., π_n(s') are a Nash equilibrium point calculated from the Q values
8  t ← t + 1
9  goto 4
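For intuition, a sketch of the two-player case in Python. The pure-strategy stage-game solver and all names here are illustrative simplifications; NashQ in general needs mixed equilibria, and the convergence result that follows assumes the equilibria encountered are unique.

import numpy as np

def pure_nash(Q1, Q2):
    """Return a pure-strategy Nash equilibrium (a1, a2) of the stage game whose
    payoff matrices are Q1 (player 1) and Q2 (player 2), or None if none exists."""
    for a1 in range(Q1.shape[0]):
        for a2 in range(Q1.shape[1]):
            if Q1[a1, a2] >= Q1[:, a2].max() and Q2[a1, a2] >= Q2[a1, :].max():
                return a1, a2
    return None

def nashq_update(Q1, Q2, s, actions, rewards, s2, lam, gamma):
    """One NashQ update of both players' tables. Qk[s] is a payoff matrix over
    joint actions; `actions` = (a1, a2), `rewards` = (r1, r2)."""
    eq = pure_nash(Q1[s2], Q2[s2])               # Nash equilibrium point of the next-state game
    for Q, r, nash_value in ((Q1, rewards[0], Q1[s2][eq]),
                             (Q2, rewards[1], Q2[s2][eq])):
        Q[s][actions] = (1 - lam) * Q[s][actions] + lam * (r + gamma * nash_value)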

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Assumption

There exists an adversarial equilibrium for the entire game and for every game defined by the Q functions encountered during learning.

Assumption

There exists a coordination equilibrium for the entire game and for every game defined by the Q functions encountered during learning.

Theorem (NashQ-learning Converges)

Under these assumptions, NashQ-learning converges to a Nash equilibrium as long as all the equilibria encountered during the game are unique.


Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

friend-or-foe

1  t ← 0
2  s^0 ← current state
3  ∀s ∈ S, ∀a_j ∈ A_j : Q_i^t(s, a_1, ..., a_n) ← 0
4  Choose action a_i^t
5  Observe r_1^t, ..., r_n^t; a_1^t, ..., a_n^t; s^{t+1} = s'
6  Q_i^{t+1}(s, a_1, ..., a_n) ← (1 − λ^t) Q_i^t(s, a_1, ..., a_n) + λ^t (r_i^t + γ NashQ_i^t(s'))
      where NashQ_i^t(s') = max_{π ∈ Π(X_1×···×X_k)} min_{y_1,...,y_l ∈ Y_1×···×Y_l}
                            ∑_{x_1,...,x_k ∈ X_1×···×X_k} π(x_1) ··· π(x_k) Q_i(s', x_1, ..., x_k, y_1, ..., y_l)
      and the X are the actions of i's friends and the Y those of its foes.
7  t ← t + 1
8  goto 4

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Theorem (friend-or-foe converges)

friend-or-foe converges.

However, in general these do not correspond to a Nash equilibrium point.

Still, we can show

Theorem

foe-q learns values for a Nash equilibrium policy if the game has an adversarial equilibrium, and friend-q learns values for a Nash equilibrium policy if the game has a coordination equilibrium. This is true regardless of opponent behavior.



Learning in Multiagent Systems

General Theories for Learning Agents

Moving Target Function Problem

As the other agents change their behavior, what you need to do also changes.


Learning in Multiagent Systems

General Theories for Learning Agents

CLRI Theory

CLRI Notation

δ_i^t(w) : W → A: the decision function.

∆_i^t(w): the target function.

e(δ_i^t) = Pr[δ_i^t(w) ≠ ∆_i^t(w) | w ∈ D(W)]: the error function.

Learning in Multiagent Systems

General Theories for Learning Agents

CLRI Theory

[Figure: δ_i^t is compared against ∆_i^t to give the error e(δ_i^t); the agent then learns, producing δ_i^{t+1}, while the target function moves to ∆_i^{t+1}, giving a new error e(δ_i^{t+1}).]

Figure: The moving target function problem.


Learning in Multiagent Systems

General Theories for Learning Agents

CLRI Theory

CLRI Parameter

Change rate (c) is the probability that an agent will change at least one of its incorrect mappings in δ^t(w).

Learning rate (l) is the probability that the agent changes an incorrect mapping to the correct one.

Retention rate (r) is the probability that the agent will retain its correct mappings.

Impact (I_ij) is the impact that i's learning has on j's target function. Specifically, it is the probability that ∆_j^t(w) will change given that δ_i^{t+1}(w) ≠ δ_i^t(w).

Learning in Multiagent Systems

General Theories for Learning Agents

CLRI Theory

CLRI Equation

E[e(\delta_i^{t+1})] = 1 - r_i + v_i \left( \frac{|A_i| r_i - 1}{|A_i| - 1} \right)
  + e(\delta_i^t) \left( r_i - l_i + v_i \left( \frac{|A_i|(l_i - r_i) + l_i - c_i}{|A_i| - 1} \right) \right)    (1)

where v_i is agent i's volatility: the probability that its target function changes from time t to t + 1.
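Equation (1) is a difference equation and can be iterated directly to predict how an agent's expected error evolves. A small Python sketch with illustrative parameter values (the argument names are mine):

def expected_error(e_t, c, l, r, v, num_actions):
    """One step of the CLRI difference equation (1). c, l, r, v are the change,
    learning, retention, and volatility rates; num_actions is |A_i|."""
    A = num_actions
    return (1 - r + v * (A * r - 1) / (A - 1)
            + e_t * (r - l + v * (A * (l - r) + l - c) / (A - 1)))

e = 1.0                                   # start with a completely wrong decision function
for t in range(25):
    e = expected_error(e, c=0.9, l=0.7, r=0.95, v=0.1, num_actions=5)
print(e)                                  # expected error after 25 steps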


Learning in Multiagent Systems

General Theories for Learning Agents

N-Level Agents

I Think that You Think that I Think that. . .

0-level agent is one that does not recognize the existence of other agents in the world.

1-level agent recognizes that there are other agents in the world whose actions affect its payoff. It also has some knowledge that tells it the utility it will receive given any set of joint actions.

2-level agent believes that all other agents are 1-level agents.

Learning in Multiagent Systems

General Theories for Learning Agents

N-Level Agents

Decreasing Returns of Thinking

n-level always beats (n−1)-level agents.

Marginal utility gains grow smaller with each extra level.

Computational costs grow exponentially with each extra level.

Many times, it doesn’t pay to think about your opponent.



Learning in Multiagent Systems

Collective Intelligence

COllective INtelligence

Idea: start with the global utility function U(s, \vec{a}) and determine from it the individual utilities.

Learning in Multiagent Systems

Collective Intelligence

Define Preferences

We define i's preference over s, \vec{a} as

P_i(s, \vec{a}) = \frac{\sum_{\vec{a}' \in \vec{A}} \Theta[r_i(s, \vec{a}) - r_i(s, \vec{a}')]}{|\vec{A}|},

where Θ(x) is the Heaviside function, which is 1 if x is greater than or equal to 0 and 0 otherwise. Similarly, we define the global preference function as

P(s, \vec{a}) = \frac{\sum_{\vec{a}' \in \vec{A}} \Theta[U(s, \vec{a}) - U(s, \vec{a}')]}{|\vec{A}|}.
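As a concrete reading of the formula, a small Python sketch; the function and argument names are mine. Passing an agent's reward r_i gives P_i, while passing the global utility U gives P.

def preference(reward, s, a, joint_actions):
    """P(s, a): the fraction of joint actions a' whose reward the pair (s, a)
    weakly beats. `reward` can be an agent's r_i or the global utility U."""
    theta = lambda x: 1.0 if x >= 0 else 0.0        # Heaviside step function
    return sum(theta(reward(s, a) - reward(s, ap)) for ap in joint_actions) / len(joint_actions)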

Learning in Multiagent Systems

Collective Intelligence

The Easier Case

A system where, for all agents i, P_i(s, \vec{a}) = P(s, \vec{a}) is called factored.

But it might still not converge, because agents' actions change other agents' target functions.


Learning in Multiagent Systems

Collective Intelligence

Opacity

We define the opacity Ω_i for agent i as

\Omega_i(s, \vec{a}) = \sum_{\vec{a}' \in \vec{A}} \Pr[\vec{a}'] \, \frac{|u_i(s, \vec{a}) - u_i(s, \vec{a}'_{-i}, \vec{a}_i)|}{|u_i(s, \vec{a}) - u_i(s, \vec{a}_{-i}, \vec{a}'_i)|}.

Learning in Multiagent Systems

Collective Intelligence

System Categorization

If the system is factored and has zero opacity then it is easy to solve, but such systems are rare.

If it has zero opacity then it amounts to multiple parallel learning problems.

Goal: find reward functions with low opacity that are highly factored.


Learning in Multiagent Systems

Collective Intelligence

Wonderful Life

The wonderful life utility function gives each agent:

u_i(s, \vec{a}) = U(s, \vec{a}) - U(s, \vec{a}_{-i}, 0).

Learning in Multiagent Systems

Collective Intelligence

Aristocrat Utility

Another solution is the aristocrat utility

u_i(s, \vec{a}) = U(s, \vec{a}) - \sum_{\vec{a}' \in \vec{A}} \Pr[\vec{a}'] \, U(s, \vec{a}_{-i}, \vec{a}'_i),

where \Pr[\vec{a}'] is the probability that \vec{a}' happens.
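A minimal Python sketch of both reward shapes, assuming a global utility function U(s, a) over joint actions is available. The helper names, the tuple representation of joint actions, and the collapse of the sum over \vec{a}' to a distribution over agent i's own action are illustrative choices.

def wonderful_life_utility(U, s, a, i, null_action=0):
    """WLU: the global utility minus what it would have been had agent i
    taken a null action instead (everything else held fixed)."""
    a_null = a[:i] + (null_action,) + a[i + 1:]
    return U(s, a) - U(s, a_null)

def aristocrat_utility(U, s, a, i, actions_i, pr_i):
    """AU: subtract the expected global utility when agent i's action is
    drawn from the distribution pr_i, with the other agents held fixed."""
    expected = sum(pr_i[ai] * U(s, a[:i] + (ai,) + a[i + 1:]) for ai in actions_i)
    return U(s, a) - expected

In both cases the subtracted term does not depend on agent i's own action, which is what lowers the opacity relative to simply using u_i = U.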

Learning in Multiagent Systems

Collective Intelligence

COIN Tests

Both utility functions have been shown to perform better than u_i = U and other hand-tailored utility functions.