Learning in Multiagent Systems - University of South Carolina


Page 1: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Multiagent Systems

José M. Vidal

Department of Computer Science and Engineering, University of South Carolina

February 11, 2010

Abstract

We introduce the topic of learning in multiagent systems and present recent results.

Page 2: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Introduction

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 3: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Introduction

The Learning Problem

[Figure: training examples labeled + and − plotted against the attributes Weight and Speed.]

Page 6: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Introduction

The Multiagent Learning Problem

[Figure: + and − examples plotted against Weight and Speed.]

Page 7: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Cooperative Learning

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 8: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Cooperative Learning

Sharing Learned Knowledge

Fairly easy with identical agent abilities.

Largely unexplored for heterogeneous agents.

Generalizing learned knowledge seems very domain-specific (the induction problem again).

Page 10: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 11: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

             j
             C      D
    i   A   0,0    5,1
        B  -1,6    1,5

Page 12: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 13: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Weight Function

k^t_i(s_j) = k^{t-1}_i(s_j) +
\begin{cases}
1 & \text{if } s^{t-1}_j = s_j,\\
0 & \text{if } s^{t-1}_j \neq s_j.
\end{cases}

Page 14: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Model of Opponent

\Pr{}^t_i[s_j] = \frac{k^t_i(s_j)}{\sum_{s'_j \in S_j} k^t_i(s'_j)}.

Page 15: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Best Response

Player i then determines the strategy that will give it the highest expected utility given that j will play each of its s_j ∈ S_j with probability Pr^t_i[s_j].
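A minimal sketch of this step, assuming the 2x2 game used in the example that follows; the function and variable names (best_response, payoffs_i, counts) are my own illustrative choices.

```python
# Minimal fictitious-play sketch: keep counts k_i(s_j) of the opponent's past
# actions, turn them into the empirical model Pr_i[s_j], and best-respond to it.
from collections import Counter

def best_response(payoffs_i, counts):
    """payoffs_i[(a_i, a_j)]: utility of i; counts[a_j]: weight k_i(a_j)."""
    total = sum(counts.values())
    pr = {a_j: k / total for a_j, k in counts.items()}          # model of opponent
    my_actions = {a_i for (a_i, _) in payoffs_i}
    expected = {a_i: sum(pr[a_j] * payoffs_i[(a_i, a_j)] for a_j in pr)
                for a_i in my_actions}
    return max(expected, key=expected.get)                      # highest expected utility

# The 2x2 example game that follows (payoffs to player i only).
payoffs_i = {("A", "C"): 0, ("A", "D"): 1, ("B", "C"): 1, ("B", "D"): 0}
counts = Counter({"C": 1, "D": 0})                              # initial weights k_i
print(best_response(payoffs_i, counts))                         # prints "B"
```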

Page 16: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Example

             j
             C      D
    i   A   0,0    1,2
        B   1,2    0,0

    s_i   s_j   k_i(C)   k_i(D)   Pr_i[C]   Pr_i[D]
    A     C     1        0        1         0
    B     D     1        1        1/2       1/2
    A     D     1        2        1/3       2/3
    A     D     1        3        1/4       3/4
    A     D     1        4        1/5       4/5

Page 21: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Theorem (Nash Equilibrium is Attractor to Fictitious Play)

If s is a strict Nash equilibrium and it is played at time t then it will be played at all times greater than t.

Page 22: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Theorem (Fictitious Play Converges to Nash)

If fictitious play converges to a pure strategy then that strategy must be a Nash equilibrium.

Page 23: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Fictitious Play

Infinite Cycle Example

             j
             C      D
    i   A   0,0    1,1
        B   1,1    0,0

    s_i   s_j   k_i(C)   k_i(D)   k_j(A)   k_j(B)
    -     -     1        1.5      1        1.5
    A     C     2        1.5      2        1.5
    B     D     2        2.5      2        2.5
    A     C     3        2.5      3        2.5
    B     D     3        3.5      3        3.5

Page 28: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 29: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Fraction of Agents Playing s

Let φ^t(s) be the number of agents using strategy s at time t. We can then define

\theta^t(s) = \frac{\varphi^t(s)}{\sum_{s' \in S} \varphi^t(s')}

Page 30: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Expected Utility for Playing s

u^t(s) \equiv \sum_{s' \in S} \theta^t(s')\, u(s, s'),

Page 31: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Reproduction Rate

\varphi^{t+1}(s) = \varphi^t(s)\left(1 + u^t(s)\right).
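As an illustration, here is a sketch of a few generations of this update for a small symmetric game; the game, the initial population counts, and the names replicator_step, phi, and theta are illustrative choices of mine.

```python
# Replicator-dynamics sketch: phi[s] is the count of agents playing s; each
# generation it is scaled by (1 + expected utility of s against the population).
def replicator_step(phi, u):
    """phi: strategy -> count; u: (s, s') -> payoff of s against s'."""
    total = sum(phi.values())
    theta = {s: n / total for s, n in phi.items()}              # population shares
    fitness = {s: sum(theta[s2] * u[(s, s2)] for s2 in theta) for s in theta}
    return {s: phi[s] * (1 + fitness[s]) for s in phi}          # phi^{t+1}

# A symmetric anti-coordination game: payoff 1 for playing the opposite strategy.
u = {("C", "C"): 0, ("C", "D"): 1, ("D", "C"): 1, ("D", "D"): 0}
phi = {"C": 90.0, "D": 10.0}
for _ in range(20):
    phi = replicator_step(phi, u)
total = sum(phi.values())
print({s: round(n / total, 3) for s, n in phi.items()})         # shares approach 1/2, 1/2
```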

Page 32: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Population Dynamics

The population size could change, but we either scale it back or ignore the change.

Game must be symmetric.

A stable population of more than one strategy corresponds toa mixed strategy.

Page 33: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Theorem (Nash equilibrium is a Steady State)

Every Nash equilibrium is a steady state for the replicator dynamics.

Proof.

By contradiction. Suppose an agent had a pure strategy that would return a higher utility than the strategy it is playing; that strategy would be a best response to the Nash equilibrium. If this best response were different from the Nash equilibrium strategy, we would have a best response to the equilibrium which is not the equilibrium, so the system could not have been at a Nash equilibrium.

Page 34: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Theorem (Stable Steady State is a Nash Equilibrium)

A stable steady state of the replicator dynamics is a Nash equilibrium. A stable steady state is one that, after suffering from a small perturbation, is pushed back to the same steady state by the system's dynamics.

Page 35: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Theorem (Asymptotically Stable is Trembling-Hand Nash)

An asymptotically stable steady state corresponds to a Nash equilibrium that is trembling-hand perfect and isolated. That is, the stable steady states are a refinement on Nash equilibria: only a few Nash equilibria are stable steady states.

Page 36: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Definition (Evolutionary Stable Strategy)

An ESS is an equilibrium strategy that can overcome the presence of a small number of invaders. That is, if the equilibrium strategy profile is ω and a small number ε of invaders start playing ω′, then ESS requires that the existing population get a higher payoff against the new mixture (εω′ + (1 − ε)ω) than the invaders do.
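A small sketch of this check for a symmetric 2x2 game with mixed strategies; the hawk-dove style payoff matrix, the value of ε, and the names payoff and is_ess_against are illustrative choices of mine.

```python
# Sketch of the ESS condition: omega must earn more than the invader omega'
# against the post-invasion mixture eps*omega' + (1 - eps)*omega.
import numpy as np

U = np.array([[-1.0, 2.0],   # row 0 = hawk, row 1 = dove: U[x, y] is the payoff
              [ 0.0, 1.0]])  # of playing x against y in a hawk-dove style game

def payoff(x, y):
    """Expected payoff of mixed strategy x against mixed strategy y."""
    return float(x @ U @ y)

def is_ess_against(omega, omega_prime, eps=0.01):
    mix = eps * omega_prime + (1 - eps) * omega
    return payoff(omega, mix) > payoff(omega_prime, mix)

omega = np.array([0.5, 0.5])                         # candidate equilibrium strategy
for invader in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(invader, is_ess_against(omega, invader))   # True for both pure invaders
```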

Page 37: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

Theorem (ESS is Steady State of Replicator Dynamics)

ESS is an asymptotically stable steady state of the replicator dynamics. However, the converse need not be true: a stable state in the replicator dynamics does not need to be an ESS.

Page 38: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

Replicator Dynamics

             j
             a      b      c
    i   a   1,1    2,2    0,0
        b   0,0    1,1    2,2
        c   2,2    0,0    1,1

[Figure: simplex diagram over the three strategies a, b, and c.]

Page 39: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

The AWESOME Algorithm

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 40: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

The AWESOME Algorithm

AWESOME

1  play-eq, play-sta ← true; eq-rej ← false; φ ← π_i; t ← 0
2  while play-sta
3      do play φ for N times in a row (an epoch)
4         ∀j update s_j given what they played in these N rounds
5         if play-eq
6             then if some player j has max_a |s_j(a) − π_j(a)| > ε_e
7                 then eq-rej ← true; φ ← random action
8             else if ¬eq-rej ∧ ∃j max_a |s_j^old(a) − s_j(a)| > ε_s
9                 then play-sta ← false
10        eq-rej ← false
11        b ← arg max_a u_i(a, s_{−i})
12        if u_i(b, s_{−i}) > u_i(φ, s_{−i}) + n |A_i| ε_s^{t+1} μ
13            then φ ← b
14        ∀j: s_j^old ← s_j
15        t ← t + 1
16 goto 1

Page 41: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

The AWESOME Algorithm

The Schedule

In order for the algorithm to always converge, ε_e and ε_s must be decreased and N must be increased over time using a schedule where

1 ε_s and ε_e decrease monotonically to 0,

2 N increases to infinity,

3 \prod_{t=1,\ldots,\infty} \left(1 - \sum_i |A_i| N^t (\epsilon^t_s)^2\right) > 0, and

4 \prod_{t=1,\ldots,\infty} \left(1 - \sum_i |A_i| N^t (\epsilon^t_e)^2\right) > 0.
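A rough numeric sanity check of conditions 3 and 4 for a candidate schedule, truncating the infinite product; the particular schedules and constants below are guesses of mine for illustration, not schedules taken from the AWESOME work.

```python
# Accumulate a truncated version of prod_t (1 - sum_i |A_i| * N^t * (eps^t)^2).
def truncated_product(eps, N, action_sizes, T=10_000):
    prod = 1.0
    for t in range(1, T + 1):
        factor = 1.0 - sum(action_sizes) * N(t) * eps(t) ** 2
        if factor <= 0:                  # one non-positive factor kills the product
            return 0.0
        prod *= factor
    return prod

sizes = [2, 2]                           # |A_i| for two players with two actions each
good = truncated_product(lambda t: 0.1 / t ** 1.1, lambda t: t, sizes)
bad = truncated_product(lambda t: 1.0 / t, lambda t: t, sizes)
print(good, bad)                         # good stays bounded away from 0; bad collapses to 0
```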

Page 42: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Learning in Games

The AWESOME Algorithm

It Converges

Theorem (AWESOME converges)

With a valid schedule, the AWESOME algorithm converges to a best response if all the other players play fixed strategies, and to a Nash equilibrium if all the other players are AWESOME players.

Page 43: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 44: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

What is a Stochastic Game?

One where the agents do not know the payoff they might get.

That is, an unexplored MDP.

Page 45: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 46: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Reinforcement Learning Problem Definition

s^t is a state, taken from S,

a^t is an action, taken from A,

P(s^{t+1} | s^t, a^t) is the state transition function,

r(s^t, a^t) → ℜ is the reward function.

The problem is to find the policy π(s) → a which maximizes the discounted successive rewards r^t the agent receives when using π. That is, find

\pi^* = \arg\max_{\pi} \sum_{i=0}^{\infty} \gamma^i r_i

Page 48: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Q-learning

1  ∀s ∀a: Q(s, a) ← 0; λ ← 1; ε ← 1
2  s ← current state
3  if rand() < ε    (exploration rate)
4      then a ← random action
5      else a ← arg max_a Q(s, a)
6  Take action a
7  Receive reward r
8  s′ ← current state
9  Q(s, a) ← λ (r + γ max_{a′} Q(s′, a′)) + (1 − λ) Q(s, a)
10 λ ← .99 λ
11 ε ← .98 ε
12 goto 2
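A hedged sketch of this loop in Python on a toy two-state MDP; the MDP, the decay constants, and all names (step, Q, lam, eps) are illustrative choices of mine rather than anything from the slides.

```python
import random
from collections import defaultdict

STATES, ACTIONS = [0, 1], ["stay", "move"]

def step(s, a):
    """Toy dynamics: 'move' flips the state; reward 1 only for staying in state 1."""
    s2 = 1 - s if a == "move" else s
    return s2, (1.0 if (s == 1 and a == "stay") else 0.0)

Q = defaultdict(float)                         # Q(s, a), initialized to 0
lam, eps, gamma = 1.0, 1.0, 0.9                # learning rate, exploration rate, discount
s = 0
for _ in range(5000):
    if random.random() < eps:
        a = random.choice(ACTIONS)                                 # explore
    else:
        a = max(ACTIONS, key=lambda act: Q[(s, act)])              # exploit
    s2, r = step(s, a)
    target = r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS)
    Q[(s, a)] = lam * target + (1 - lam) * Q[(s, a)]               # the update on line 9
    lam, eps = 0.999 * lam, 0.999 * eps                            # slow decay of the rates
    s = s2

print({st: max(ACTIONS, key=lambda act: Q[(st, act)]) for st in STATES})   # greedy policy
```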

Page 49: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Theorem (Q-learning Converges)

Given that the learning and exploration rates decrease slowly enough, Q-learning is guaranteed to converge to the optimal policy.

Page 50: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Definition (Nash Equilibrium Point)

A Nash equilibrium point is a tuple of n strategies (π^*_1, …, π^*_n) such that for all s ∈ S and i = 1, …, n,

v_i(s, \pi^*_1, \ldots, \pi^*_n) \ge v_i(s, \pi^*_1, \ldots, \pi^*_{i-1}, \pi_i, \pi^*_{i+1}, \ldots, \pi^*_n) \quad \text{for all } \pi_i \in \Pi_i.

Page 51: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Theorem (Nash Equilibrium Point Exists)

Every n-player discounted stochastic game possesses at least one Nash equilibrium point in stationary strategies.

Page 52: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

NashQ-learning

1  t ← 0
2  s^0 ← current state
3  ∀s∈S, ∀j = 1,…,n, ∀a_j∈A_j: Q^t_j(s, a_1, …, a_n) ← 0
4  Choose action a^t_i
5  Observe r^t_1, …, r^t_n; a^t_1, …, a^t_n; s^{t+1} = s′
6  for j ← 1, …, n
7      do Q^{t+1}_j(s, a_1, …, a_n) ← (1 − λ^t) Q^t_j(s, a_1, …, a_n) + λ^t (r^t_j + γ NashQ^t_j(s′))
          where NashQ^t_j(s′) = Q^t_j(s′, π_1(s′) ⋯ π_n(s′))
          and π_1(s′) ⋯ π_n(s′) are a Nash equilibrium point calculated from the Q values
8  t ← t + 1
9  goto 4

Page 53: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Assumption

There exists an adversarial equilibrium for the entire game and for every game defined by the Q functions encountered during learning.

Assumption

There exists a coordination equilibrium for the entire game and for every game defined by the Q functions encountered during learning.

Theorem (NashQ-learning Converges)

Under these assumptions NashQ-learning converges to a Nash equilibrium as long as all the equilibria encountered during the game are unique.

Page 55: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

friend-or-foe

1  t ← 0
2  s^0 ← current state
3  ∀s∈S, ∀a_j∈A_j: Q^t_i(s, a_1, …, a_n) ← 0
4  Choose action a^t_i
5  Observe r^t_1, …, r^t_n; a^t_1, …, a^t_n; s^{t+1} = s′
6  Q^{t+1}_i(s, a_1, …, a_n) ← (1 − λ^t) Q^t_i(s, a_1, …, a_n) + λ^t (r^t_i + γ NashQ^t_i(s′))
       where NashQ^t_i(s′) = max_{π ∈ Π(X_1 × ⋯ × X_k)} min_{y_1,…,y_l ∈ Y_1 × ⋯ × Y_l} ∑_{x_1,…,x_k ∈ X_1 × ⋯ × X_k} π(x_1) ⋯ π(x_k) Q_i(s′, x_1, …, x_k, y_1, …, y_l)
       and the X are the actions of i's friends and the Y those of its foes.
7  t ← t + 1
8  goto 4

Page 56: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Stochastic Games

Reinforcement Learning

Theorem (friend-or-foe converges)

friend-or-foe converges.

However, in general the values it converges to do not correspond to a Nash equilibrium point.

Still, we can show

Theorem

foe-q learns values for a Nash equilibrium policy if the game has an adversarial equilibrium, and friend-q learns values for a Nash equilibrium policy if the game has a coordination equilibrium. This is true regardless of opponent behavior.

Page 58: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

General Theories for Learning Agents

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 59: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

General Theories for Learning Agents

Moving Target Function Problem

As the other agents change their behavior, what you need to do also changes.

Page 60: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

General Theories for Learning Agents

CLRI Theory

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 61: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

General Theories for Learning Agents

CLRI Theory

CLRI Notation

δ^t_i(w) : W → A is agent i's decision function.

Δ^t_i(w) is agent i's target function.

e(\delta^t_i) = \Pr[\delta^t_i(w) \neq \Delta^t_i(w) \mid w \in D(W)] is the error function.

Page 62: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

General Theories for Learning Agents

CLRI Theory

[Figure: the moving target function problem. Agent i learns δ^{t+1}_i from δ^t_i (with error e(δ^t_i)) while the target function moves from Δ^t_i to Δ^{t+1}_i, giving a new error e(δ^{t+1}_i).]

Page 66: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

General Theories for Learning Agents

CLRI Theory

CLRI Parameter

Change rate (c) is the probability that an agent will change at least one of its incorrect mappings in δ^t(w).

Learning rate (l) is the probability that the agent changes an incorrect mapping to the correct one.

Retention rate (r) represents the probability that the agent will retain its correct mappings.

Impact (I_ij) is the impact that i's learning has on j's target function. Specifically, it is the probability that Δ^t_j(w) will change given that δ^{t+1}_i(w) ≠ δ^t_i(w).

Page 67: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

General Theories for Learning Agents

CLRI Theory

CLRI Equation

E[e(\delta^{t+1}_i)] = 1 - r_i + v_i \left( \frac{|A_i| r_i - 1}{|A_i| - 1} \right) + e(\delta^t_i) \left( r_i - l_i + v_i \left( \frac{|A_i|(l_i - r_i) + l_i - c_i}{|A_i| - 1} \right) \right) \qquad (1)
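A small sketch that iterates equation (1) for one choice of parameters; I read v_i as the volatility of i's target function (the probability that it moves between steps), which is my interpretation and is not defined on this slide.

```python
# Evaluate the CLRI expectation E[e(delta_i^{t+1})] from the current error e_t.
def expected_error(e_t, c, l, r, v, num_actions):
    A = num_actions
    return (1 - r + v * (A * r - 1) / (A - 1)
            + e_t * (r - l + v * (A * (l - r) + l - c) / (A - 1)))

# Iterate the expectation to see where the error settles for illustrative values.
e = 0.5
for _ in range(50):
    e = expected_error(e, c=0.9, l=0.7, r=0.95, v=0.1, num_actions=5)
print(round(e, 3))        # approximate fixed point of the expected error
```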

Page 68: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

General Theories for Learning Agents

N-Level Agents

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 69: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

General Theories for Learning Agents

N-Level Agents

I Think that You Think that I Think that. . .

0-level agent is one that does not recognize the existence of other agents in the world.

1-level agent recognizes that there are other agents in the world whose actions affect its payoff. It also has some knowledge that tells it the utility it will receive given any set of joint actions.

2-level agent believes that all other agents are 1-level agents.

Page 70: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

General Theories for Learning Agents

N-Level Agents

Decreasing Returns of Thinking

n-level always beats (n−1)-level agents.

Marginal utility gains grow smaller with each extra level.

Computational costs grow exponentially with each extra level.

Many times, it doesn’t pay to think about your opponent.

Page 72: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Collective Intelligence

Outline

1 Introduction

2 Cooperative Learning

3 Learning in Games: Fictitious Play, Replicator Dynamics, The AWESOME Algorithm

4 Stochastic Games: Reinforcement Learning

5 General Theories for Learning Agents: CLRI Theory, N-Level Agents

6 Collective Intelligence

Page 73: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Collective Intelligence

COllective INtelligence

Idea: start with the global utility function U(s, \vec{a}) and determine from it the individual utilities.

Page 74: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Collective Intelligence

Define Preferences

We define i's preference over s, \vec{a} as

P_i(s, \vec{a}) = \frac{\sum_{\vec{a}' \in \vec{A}} \Theta[r_i(s, \vec{a}) - r_i(s, \vec{a}')]}{|\vec{A}|},

where Θ(x) is the Heaviside function, which is 1 if x is greater than or equal to 0 and 0 otherwise. Similarly, we define the global preference function as

P(s, \vec{a}) = \frac{\sum_{\vec{a}' \in \vec{A}} \Theta[U(s, \vec{a}) - U(s, \vec{a}')]}{|\vec{A}|}.
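A tiny sketch computing these preference functions and checking whether a reward assignment is factored (P_i = P for all agents i); the two-agent system, the rewards, and all names are illustrative assumptions of mine.

```python
from itertools import product

ACTIONS = ["x", "y"]
JOINT = list(product(ACTIONS, repeat=2))              # all joint actions ~a

def U(a):                                             # global utility of the joint action
    return 1.0 if a == ("x", "x") else 0.0

def r(i, a):                                          # here every agent is simply given U
    return U(a)

def preference(value, a):
    """Fraction of joint actions a' with value(a) - value(a') >= 0 (the Theta sum)."""
    return sum(value(a) >= value(a2) for a2 in JOINT) / len(JOINT)

factored = all(preference(lambda joint: r(i, joint), a) == preference(U, a)
               for i in range(2) for a in JOINT)
print(factored)                                       # True: r_i = U is trivially factored
```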

Page 75: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Collective Intelligence

The Easier Case

A system where, for all agents i, it is true that P_i(s, \vec{a}) = P(s, \vec{a}) is called factored.

But it might still not converge, because agents' actions change other agents' target functions.

Page 77: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Collective Intelligence

Opacity

We define the opacity Ωi for agent i as

\Omega_i(s, \vec{a}) = \sum_{\vec{a}' \in \vec{A}} \Pr[\vec{a}']\, \frac{|u_i(s, \vec{a}) - u_i(s, \vec{a}'_{-i}, \vec{a}_i)|}{|u_i(s, \vec{a}) - u_i(s, \vec{a}_{-i}, \vec{a}'_i)|}.

Page 78: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Collective Intelligence

System Categorization

If factored and zero opacity then it is easy to solve, but this case is rare.

If zero opacity then it amounts to multiple parallel learning problems.

Goal: Find low-opacity reward functions that are highly factored.

Page 81: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Collective Intelligence

Wonderful Life

The wonderful life utility function gives each agent:

u_i(s, \vec{a}) = U(s, \vec{a}) - U(s, \vec{a}_{-i}, 0),

Page 82: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Collective Intelligence

Aristocrat Utility

Another solution is the aristocrat utility

u_i(s, \vec{a}) = U(s, \vec{a}) - \sum_{\vec{a}' \in \vec{A}} \Pr[\vec{a}']\, U(s, \vec{a}_{-i}, \vec{a}'_i),

where \Pr[\vec{a}'] is the probability that \vec{a}' happens.
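A sketch computing both the wonderful-life and aristocrat utilities for one agent in a toy three-agent team problem; the global utility, the uniform Pr[\vec{a}'], and the choice of 0 as the null action are illustrative assumptions of mine.

```python
from itertools import product

ACTIONS = [0, 1]                                      # action 0 doubles as the null action
JOINT = list(product(ACTIONS, repeat=3))              # joint actions of three agents

def U(a):
    """Global utility: the team is rewarded for every agent that chooses 1."""
    return float(sum(a))

def wlu(i, a):
    """Wonderful-life utility: U minus U with agent i clamped to the null action."""
    clamped = a[:i] + (0,) + a[i + 1:]
    return U(a) - U(clamped)

def au(i, a):
    """Aristocrat utility: U minus the Pr-weighted average of U over a' (uniform here)."""
    pr = 1.0 / len(JOINT)
    baseline = sum(pr * U(a[:i] + (ap[i],) + a[i + 1:]) for ap in JOINT)
    return U(a) - baseline

a = (1, 0, 1)
print(wlu(0, a), au(0, a))                            # 1.0 and 0.5 for this joint action
```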

Page 83: Learning in Multiagent Systems - University of South Carolina

Learning in Multiagent Systems

Collective Intelligence

COIN Tests

Both utility functions have been shown to perform better than u_i = U and other hand-tailored utility functions.