Learning in Multiagent systems


Page 1: Learning in Multiagent systems

Learning in Multiagent systems

Prepared by: Jarosław Szymczak

Based on: „Fundamentals of Multiagent Systems with NetLogo Examples” by José M. Vidal

Page 2: Learning in Multiagent systems

Scenarios of learning

cooperative learning – e.g. each agent has its own map, and together with the other agents aggregates a global view

competitive learning – e.g. each selfish agent tries to maximize its own utility by learning about the behaviors and weaknesses of the other agents

agents learn because:
they don’t know everything about the environment
they don’t know how the other agents behave

Page 3: Learning in Multiagent systems

The Machine Learning Problem

The goal of machine learning research is the development of algorithms that increase the ability of an agent to match a set of inputs to their corresponding outputs (Mitchell, 1997).

The input here could be e.g. a set of photos depicting people, and the output would be the set {man, woman}; the machine learning algorithm then has to learn to recognize the photos correctly.

Page 4: Learning in Multiagent systems

The Machine Learning Problem

The input set is usually divided into a training set and a testing set; the two can also be interleaved.
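As a minimal sketch of this input-to-output matching and the train/test split (the feature vectors, labels, and the nearest-neighbour rule here are invented purely for illustration):

```python
import random

# Toy stand-ins for photos: (feature_vector, label) pairs.
# Features and labels are invented purely for illustration.
data = [((0.2, 0.9), "man"), ((0.8, 0.1), "woman"),
        ((0.3, 0.8), "man"), ((0.7, 0.2), "woman"),
        ((0.1, 0.7), "man"), ((0.9, 0.3), "woman")]

random.shuffle(data)
train, test = data[:4], data[4:]  # divide the input set into training and testing sets

def classify(x):
    """1-nearest-neighbour: return the label of the closest training example."""
    closest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    return closest[1]

accuracy = sum(classify(x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```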

Graphical representation of the machine learning problem:

Page 5: Learning in Multiagent systems

The Machine Learning Problem

Inductive bias – some learning algorithms appear to perform better than others in certain domains (e.g. two algorithms can both learn to classify + and − examples perfectly, yet still arrive at different functions).

No free lunch theorem – averaged over all possible learning problems, there is no learning algorithm that outperforms all others.

In the multiagent scenario some of the fundamental assumptions of machine learning are violated: the input is no longer fixed, it keeps changing because the other agents are also learning.

Page 6: Learning in Multiagent systems

Cooperative learning

We are given two robots able to communicate; they can share their knowledge (their capabilities, knowledge about the terrain, etc.).

Sharing the information is easy if the robots are identical; if not, we need to somehow model their capabilities to decide which information would be useful for the other robot.

Most systems that share learned knowledge among agents, such as (Stone, 2000), simply assume that all agents have the same capabilities.
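A minimal sketch of the map-sharing example from the first slide, under that same simplifying assumption of identical agents (the grid cells, terrain labels, and names are illustrative):

```python
# Each robot's local map: grid cell -> observed terrain.
robot_a = {(0, 0): "clear", (0, 1): "wall"}
robot_b = {(1, 0): "clear", (1, 1): "clear"}

def aggregate(*local_maps):
    """Merge local maps into a global view. Because the agents are assumed
    identical, observations can be pooled directly, with no capability model."""
    global_map = {}
    for m in local_maps:
        global_map.update(m)
    return global_map

print(aggregate(robot_a, robot_b))
# {(0, 0): 'clear', (0, 1): 'wall', (1, 0): 'clear', (1, 1): 'clear'}
```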

Page 7: Learning in Multiagent systems

Repeated games

Nash equilibrium – I choose what is best for me given what you are doing, and you choose what is best for you given what I am doing.

In repeated games two players face each other repeatedly, as in e.g. the prisoner’s dilemma.

Nash equilibrium is based on the assumption of perfectly rational players. In learning in games, the assumption is instead that agents use some kind of learning algorithm; the theory determines the equilibrium strategy that the various learning mechanisms will arrive at, and maps these equilibria to the standard solution concepts, if possible.
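The Nash condition above can be checked mechanically for a concrete game such as the prisoner’s dilemma; a small sketch with the usual textbook payoffs (the values are assumed, not from the slides):

```python
# Prisoner's dilemma payoffs: (row player, column player).
# Actions: "C" = cooperate, "D" = defect.
payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
actions = ["C", "D"]

def is_nash(a1, a2):
    """Neither player can improve by deviating unilaterally."""
    u1, u2 = payoff[(a1, a2)]
    best1 = all(payoff[(d, a2)][0] <= u1 for d in actions)
    best2 = all(payoff[(a1, d)][1] <= u2 for d in actions)
    return best1 and best2

print([p for p in payoff if is_nash(*p)])  # -> [('D', 'D')]
```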

Page 8: Learning in Multiagent systems

Fictitious play

The agent remembers everything the other agents have done. E.g.:
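The worked example itself did not survive the transcript. A minimal sketch of the mechanism for a two-player matrix game, assuming the agent keeps counts of the opponent’s past actions and best-responds to the resulting empirical mixed strategy (the matching-pennies payoffs are illustrative):

```python
from collections import Counter

def best_response(payoff, opponent_counts, my_actions, opp_actions):
    """Best reply to the empirical distribution of the opponent's past play."""
    total = sum(opponent_counts.values()) or 1
    def expected(a):
        return sum(opponent_counts.get(b, 0) / total * payoff[(a, b)]
                   for b in opp_actions)
    return max(my_actions, key=expected)

# Matching-pennies payoffs for the row player (illustrative).
actions = ["H", "T"]
row_payoff = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}

history = Counter({"H": 3, "T": 1})  # opponent has played H three times, T once
print(best_response(row_payoff, history, actions, actions))  # -> H
```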

Page 9: Learning in Multiagent systems

Fictitious playFictitious play

Let’s have a look at some theorems:

(Nash Equilibrium is Attractor to Fictitious Play). If s is a strict Nash equilibrium and it is played at time t, then it will be played at all times greater than t (Fudenberg and Kreps, 1990).

(Fictitious Play Converges to Nash). If fictitious play converges to a pure strategy, then that strategy must be a Nash equilibrium (Fudenberg and Kreps, 1990).

Page 10: Learning in Multiagent systems

Fictitious play

The infinite cycles problem – we can avoid it by using randomness. Here is the example of an infinite cycle in fictitious play:
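The cycle example itself did not survive the transcript. One simple version of the randomness fix, modifying the best_response sketch above: when several actions are (nearly) tied as best responses, break the tie at random, so two deterministic fictitious players cannot lock into the same loop forever:

```python
import random

def randomized_best_response(payoff, opponent_counts, my_actions, opp_actions,
                             tolerance=1e-9):
    """Like best_response above, but ties between (near-)optimal actions are
    broken uniformly at random, which breaks deterministic cycles."""
    total = sum(opponent_counts.values()) or 1
    expected = {a: sum(opponent_counts.get(b, 0) / total * payoff[(a, b)]
                       for b in opp_actions)
                for a in my_actions}
    best = max(expected.values())
    tied = [a for a, u in expected.items() if u >= best - tolerance]
    return random.choice(tied)
```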

Page 11: Learning in Multiagent systems

Replicator dynamics

This model assumes that the fraction of agents playing a particular strategy will grow in proportion to how well that strategy performs in the population.

A homogeneous population of agents is assumed. The agents are randomly paired in order to play a symmetric game (same strategies and payoffs). It is inspired by biological evolution.

Let φ_t(s) be the number of agents using strategy s at time t, u_t(s) be the expected utility for an agent playing strategy s at time t, and u(s, s′) be the utility that an agent playing s receives against an agent playing s′. We can define:
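The defining equations did not survive the transcript; a reconstruction in the standard discrete-time form, consistent with the definitions above (my formulation, so treat the exact normalization as an assumption):

```latex
% Expected utility of strategy s against the current population:
u_t(s) = \sum_{s'} \varphi_t(s') \, u(s, s')

% Replicator update: a strategy's share grows in proportion
% to how well it performs:
\varphi_{t+1}(s) = \varphi_t(s) \, u_t(s)

% Fraction of the population playing s at time t:
\theta_t(s) = \frac{\varphi_t(s)}{\sum_{s'} \varphi_t(s')}
```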

Page 12: Learning in Multiagent systems

Replicator dynamics

Let’s have a look at some theorems:

(Nash Equilibrium is a Steady State). Every Nash equilibrium is a steady state for the replicator dynamics (Fudenberg and Levine, 1998).

(Stable Steady State is a Nash Equilibrium). A stable steady state of the replicator dynamics is a Nash equilibrium. A stable steady state is one that, after suffering a small perturbation, is pushed back to the same steady state by the system’s dynamics (Fudenberg and Levine, 1998).

(Asymptotically Stable is Trembling-Hand Nash). An asymptotically stable steady state corresponds to a Nash equilibrium that is trembling-hand perfect and isolated. That is, the stable steady states are a refinement of Nash equilibria: only a few Nash equilibria are stable steady states (Bomze, 1986).

Page 13: Learning in Multiagent systems

Evolutionarily stable strategy

An ESS is an equilibrium strategy that can overcome the presence of a small number of invaders. That is, if the equilibrium strategy profile is ω and a small number ε of invaders start playing ω′, then ESS states that the existing population should get a higher payoff against the new mixture (εω′ + (1−ε)ω) than the invaders do.
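A numerical check of this invasion condition for a two-strategy symmetric game; the Hawk–Dove payoffs and the ε value below are a standard illustration, not from the slides:

```python
def mix_utility(payoff, p, q):
    """Expected payoff of mixed strategy p against mixed strategy q."""
    return sum(p[a] * q[b] * payoff[(a, b)] for a in p for b in q)

def resists_invasion(payoff, incumbent, invader, eps=0.01):
    """ESS condition: the incumbent earns more than the invader
    against the post-invasion mixture eps*invader + (1-eps)*incumbent."""
    mixture = {a: eps * invader[a] + (1 - eps) * incumbent[a] for a in incumbent}
    return mix_utility(payoff, incumbent, mixture) > mix_utility(payoff, invader, mixture)

# Hawk-Dove payoffs (row player) with V=2, C=4: the ESS mixes half hawks.
payoff = {("H", "H"): -1, ("H", "D"): 2, ("D", "H"): 0, ("D", "D"): 1}
ess = {"H": 0.5, "D": 0.5}
print(resists_invasion(payoff, ess, {"H": 1.0, "D": 0.0}))  # -> True
```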

(ESS is a Steady State of Replicator Dynamics). An ESS is an asymptotically stable steady state of the replicator dynamics. However, the converse need not be true — a stable state in the replicator dynamics does not need to be an ESS (Taylor and Jonker, 1978).

Page 14: Learning in Multiagent systems

Replicator dynamics
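This slide’s content (presumably a plot of the dynamics) did not survive the transcript. A minimal simulation sketch of the update from page 11, tracking population fractions directly and normalizing by the mean utility; the Hawk–Dove payoffs (shifted by +2 to keep all utilities positive) are illustrative:

```python
# Replicator dynamics for a symmetric two-strategy game.
# Hawk-Dove payoffs shifted by +2 so every utility is positive.
payoff = {("H", "H"): 1, ("H", "D"): 4, ("D", "H"): 2, ("D", "D"): 3}
strategies = ["H", "D"]
x = {"H": 0.9, "D": 0.1}  # initial population fractions

for t in range(200):
    # u_t(s): expected utility of s against the current population mix
    u = {s: sum(x[sp] * payoff[(s, sp)] for sp in strategies) for s in strategies}
    mean = sum(x[s] * u[s] for s in strategies)
    # each strategy's share grows in proportion to how well it performs
    x = {s: x[s] * u[s] / mean for s in strategies}

print({s: round(f, 3) for s, f in x.items()})  # -> close to {'H': 0.5, 'D': 0.5}
```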

Page 15: Learning in Multiagent systems

AWESOME algorithm

The abbreviation stands for:

Adapt When Everybody is Stationary, Otherwise Move to Equilibrium

Page 16: Learning in Multiagent systems
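This page’s content (presumably the algorithm itself) did not survive the transcript. As a rough outline of the idea from Conitzer and Sandholm: each agent starts out playing a precomputed equilibrium strategy, watches whether the others’ empirical play still looks like that equilibrium, switches to best-responding when the others instead look stationary, and restarts from the equilibrium when both hypotheses are rejected. The skeleton below is a strongly simplified sketch of that control flow only (the real algorithm’s epoch schedule and shrinking tolerances are omitted; all names are mine):

```python
def frequencies_close(p, q, tol=0.05):
    """True if two action-frequency dicts agree within tol."""
    return all(abs(p.get(a, 0.0) - q.get(a, 0.0)) <= tol for a in set(p) | set(q))

def awesome_step(state, opponent_freqs, equilibrium_freqs, my_equilibrium,
                 best_response):
    """One schematic epoch of AWESOME (simplified; not the full algorithm)."""
    if state["playing_equilibrium"]:
        if frequencies_close(opponent_freqs, equilibrium_freqs):
            return my_equilibrium              # others still look like equilibrium play
        state["playing_equilibrium"] = False   # equilibrium hypothesis rejected
        state["previous"] = opponent_freqs
        return best_response(opponent_freqs)
    if frequencies_close(opponent_freqs, state["previous"]):
        state["previous"] = opponent_freqs
        return best_response(opponent_freqs)   # others look stationary: keep adapting
    state["playing_equilibrium"] = True        # stationarity rejected too: restart
    return my_equilibrium

# Tiny illustration: opponent clearly not at the 50/50 equilibrium, so adapt.
state = {"playing_equilibrium": True, "previous": {}}
eq = {"H": 0.5, "T": 0.5}
print(awesome_step(state, {"H": 0.9, "T": 0.1}, eq, eq,
                   lambda f: max(f, key=f.get)))  # -> H
```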
Page 17: Learning in Multiagent systems

Stochastic games

COMING SOON (THIS AUTUMN)