RoboCup: A Case Study in Multiagent System

Vahid Mokhtari

Contents

1. Trends
2. What is an Agent?
3. Multiagent System
4. Case study in RoboCup

ERBASE2011

Trends in History of Computing

• Ubiquity: As processing capability spreads, sophistication becomes ubiquitous.
• Interconnection: Computer systems no longer stand alone, but are networked into large distributed systems.
• Intelligence: The complexity of tasks that we are capable of automating and delegating to computers has grown steadily.
• Delegation: We are giving control to computers, even in safety-critical tasks.
• Human-orientation: Programmers conceptualize and implement software in terms of ever higher-level – more human-oriented – abstractions.

From Programming Perspective

Programming has progressed through

• machine code
• assembly language
• machine-independent programming languages
• procedures & functions
• abstract data types
• objects
• to agents

What is an Agent?

“An agent is an encapsulated computer system, situated in some environment, and capable of autonomous action in that environment in order to meet its design objectives”. Mike Wooldridge

Agent and Environment

Environment

Accessible vs. Inaccessible - can the agent “see” everything?

Deterministic vs. Non-deterministic - do actions have a guaranteed effect?

Static vs. Dynamic - does the environment change on its own?

Discrete vs. Continuous - is the number of actions and percepts finite?

What is an Intelligent Agent?

An intelligent agent is a piece of software that is:

• Reactive – responds to changes in its environment
• Proactive – persistently pursues its goals
• Social – interacts with other agents through:
  • Cooperation
  • Coordination
  • Negotiation

Examples of Intelligent Agents

Assistant agent in MS Office

Trading agents

Web spiders

Computer viruses

Characters in computer games

Agents vs. Objects

Agent
• Autonomous - decides for itself whether to act on a request it receives
• Active - can decide when to act and how
• Each agent has its own thread of control

Object
• Non-autonomous - acts whenever one of its methods is invoked
• Passive - has no control over the execution of its methods
• There is a single thread of control in the system
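The agent/object contrast can be sketched in Python. This is only an illustration under assumptions: the class names, the `limit` rule the agent uses to refuse requests, and the message-queue design are hypothetical, not taken from the slides.

```python
import queue
import threading


class CounterObject:
    """Passive object: the caller decides when increment() runs."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1


class CounterAgent(threading.Thread):
    """Hypothetical agent: owns a thread of control and decides which requests to honour."""

    def __init__(self, limit):
        super().__init__(daemon=True)
        self.value = 0
        self.limit = limit          # assumed design objective: never exceed this value
        self.refused = 0
        self.inbox = queue.Queue()  # requests arrive as messages, not method calls

    def request_increment(self):
        self.inbox.put("increment")

    def stop(self):
        self.inbox.put(None)

    def run(self):
        while True:
            message = self.inbox.get()
            if message is None:
                break
            if self.value < self.limit:   # the agent, not the caller, makes the decision
                self.value += 1
            else:
                self.refused += 1


def demo():
    obj = CounterObject()
    agent = CounterAgent(limit=2)
    agent.start()
    for _ in range(3):
        obj.increment()            # always executed
        agent.request_increment()  # only a request
    agent.stop()
    agent.join()
    return obj.value, agent.value, agent.refused
```

Here `demo()` returns `(3, 2, 1)`: the object performs all three increments, while the agent honours two requests and refuses the third.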

Agent Oriented Architectures

First Classification
• Reactive – receives input, processes it and produces an output.
• Deliberative – has an internal view of its environment and is able to follow its own plans.
• Hybrid – mixture of reactive and deliberative, that follows its own plans, but also sometimes directly reacts to external events without deliberation.

BDI Agents Architecture
• Beliefs: represent the informational state of the agent.
• Desires: represent objectives or situations that the agent would like to accomplish, e.g. find the best price, go to the party or become rich.
• Intentions: represent the specific goal (or set of goals) the agent has committed to.
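A minimal sketch of the belief/desire/intention split, assuming a very simplified deliberation step (pick the highest-priority desire that the current beliefs make achievable). The `priority` field, the `is_achievable` test and the belief keys are hypothetical; full BDI architectures also include plans and commitment strategies, which are omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Desire:
    name: str
    priority: int                                    # assumed ranking among desires
    is_achievable: Callable[[Dict[str, bool]], bool]  # checked against current beliefs


@dataclass
class BDIAgent:
    beliefs: Dict[str, bool] = field(default_factory=dict)
    desires: List[Desire] = field(default_factory=list)
    intention: Optional[Desire] = None

    def perceive(self, percept: Dict[str, bool]) -> None:
        """Update the informational state with new observations."""
        self.beliefs.update(percept)

    def deliberate(self) -> Optional[str]:
        """Commit to the best desire that the beliefs currently allow."""
        options = [d for d in self.desires if d.is_achievable(self.beliefs)]
        self.intention = max(options, key=lambda d: d.priority, default=None)
        return self.intention.name if self.intention else None


agent = BDIAgent(
    beliefs={"invited": False, "shop_open": True},
    desires=[
        Desire("go to the party", 2, lambda b: b.get("invited", False)),
        Desire("find the best price", 1, lambda b: b.get("shop_open", False)),
    ],
)
first = agent.deliberate()         # not invited yet, so the party is not an option
agent.perceive({"invited": True})
second = agent.deliberate()        # beliefs changed, so the intention changes too
```

With these beliefs, `first` is "find the best price" and `second` is "go to the party".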

Agent Types (1)

Collaborative Agents

• emphasise autonomy and cooperation with other agents
• negotiate with their peers to reach mutually acceptable agreements during co-operative problem solving

Mobile Agents

• roam wide area networks
• interact with foreign hosts
• perform tasks on behalf of their owners
• return ‘home’
• collaborative agents that move across networks

Agent Types (2)

Interface Agents

• emphasise autonomy and learning in order to perform tasks for their owners
• support and provide proactive assistance to a user using a particular application
• limited cooperation with other agents
• interface agent = personal assistant = personal digital assistant = personal agent

Information Agents

• manage, manipulate or collate information from many distributed sources on WANs

• information agents = internet agents

Multiagent System (MAS)

A Multiagent System is one that consists of a number of agents, which interact with one another. To interact successfully, they require the ability to cooperate, coordinate, and negotiate with each other.

The Fully General Multiagent System

Agents model each other’s goals, actions, and domain knowledge, which may differ, and they may also interact directly.

Why MAS?

• Some domains require it
• Parallelism
• Robustness
• Scalability
• Simpler programming
• To study intelligence
• Geographic distribution
• Cost effectiveness

MAS Research Area

Distributed Computing: Processors share data, but not control. Focus on low-level parallelization, synchronization.

Distributed AI: Control as well as data is distributed. Focus on problem solving, communication, and coordination.
• Distributed Problem Solving (DPS): Task decomposition and/or solution synthesis.
• Multiagent Systems (MAS): Behavior coordination or behavior management.

MAS Taxonomy

Degree of Heterogeneity and Communication [Peter Stone, 2000]

• The degree to which different agents play different roles is certainly an important MAS issue

• The degree to which the agents communicate

Issues in Building MAS

Agent’s Issues

• Reactive vs. Deliberative agents
• Local vs. Global perspective
• Benevolence vs. Competitiveness

Homogeneous Non-Communicating Multiagent Systems

• Several different agents with identical structure (sensors, effectors, domain knowledge, and decision functions).
• Different sensor input and effector output.
• Situated differently in the environment, and they make their own decisions regarding which actions to take.

Heterogeneous Non-Communicating Multiagent Systems

• Agents are situated differently in the environment
• Different sensory inputs and different actions

Homogeneous Communicating Multiagent Systems

Agents are identical except that they are situated differently in the environment

Agents can communicate with each other directly

Heterogeneous Communicating Multiagent Systems

Different sensory data, goals, actions, and domain knowledge

Learning Opportunities

Problems and Issues

• Modeling other agents
• Agent Cooperation
• Collective intelligence
• Agent interaction
• Teamwork, formation, coordination
• Negotiation
• Reactive vs. deliberative approaches
• Agreement Technologies
• Multiagent Reasoning, Planning, Adaptation
• Resource Management

Importance of MAS

Research in “Distributed AI” started over 30 years ago, but only since the mid-1990s has it become a major research trend in AI.

The main conference (AAMAS) now attracts around 800 submissions each year, of which 20-25% are accepted. In addition, there are dozens of smaller workshops and conferences.

It is a large, young and dynamic research community.

RoboCup

Case study in Multiagent System

What is RoboCup?

RoboCup is an international research and education initiative, attempting to foster Artificial Intelligence and Robotics research by providing a standard problem where a wide range of technologies can be integrated and examined.

Real-time sensor fusion

Reactive behavior

Strategy acquisition

Learning

Real-time planning

Multi-agent systems

Context recognition

Vision

Strategic decision-making

Motor control

Intelligent robot control

and many more

Domain Characteristics

• Environment: Dynamic
• State Change: Real time
• Info. Accessibility: Incomplete
• Sensor Readings: Non-symbolic
• Control: Distributed

The Standard Problem

RoboCup Soccer

The game of football, where the research goals concern cooperative multi-robot and multi-agent systems in dynamic adversarial environments. All robots in this league are fully autonomous.

RoboCup Rescue

A project to promote research and development in disaster rescue at various levels, involving multi-agent teamwork coordination and physical robotic agents for search and rescue.

RoboCup Soccer

• Distributed multiagent domain
• Teammates and adversaries
• Partial world view
• Noisy sensors and actuators
• Real-time

Applied Machine Learning

The general aim of Machine Learning is to produce intelligent programs, often called agents, through a process of learning and evolving.

Algorithm types

• Supervised learning
• Unsupervised learning
• Reinforcement learning

Reinforcement Learning

Reinforcement Learning (RL) is an area of machine learning, concerned with how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward.

• RL lends itself wonderfully to agent-oriented systems, where agents “have explicit goals, can sense aspects of their environments, and can choose actions to influence their environments”.

The Agent-Environment Interface

Agent and environment interact at discrete time steps: $t = 0, 1, 2, \ldots$

• Agent observes state at step $t$: $s_t \in S$
• Produces action at step $t$: $a_t \in A(s_t)$
• Gets resulting reward: $r_{t+1} \in \mathbb{R}$
• And resulting next state: $s_{t+1}$
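The interaction can be sketched as a loop that records one (state, action, reward, next state) tuple per time step. The corridor environment, its reward of 1 for reaching the goal, and the `always_right` policy are hypothetical choices made only to keep the example small.

```python
GOAL = 4  # hypothetical corridor with states 0..4; reaching state 4 ends the episode


def available_actions(state):
    """A(s): the agent may step left (-1) or right (+1) from any state."""
    return (-1, +1)


def environment_step(state, action):
    """Return the reward and next state that result from taking `action` in `state`."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return reward, next_state


def run_episode(policy, start_state=0, max_steps=20):
    """Run the agent-environment loop and record one tuple per time step."""
    trajectory = []
    state = start_state
    for _ in range(max_steps):
        action = policy(state)                                # agent produces an action
        if action not in available_actions(state):
            raise ValueError("policy chose an unavailable action")
        reward, next_state = environment_step(state, action)  # environment responds
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if state == GOAL:
            break
    return trajectory


def always_right(state):
    """A fixed policy used only for illustration."""
    return +1
```

Calling `run_episode(always_right)` yields four steps; only the last one, which enters the goal state, carries a reward of 1.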

General Process of RL

The "cause and effect" idea in RL

• The agent observes an input state.
• An action is determined by a decision making function (policy).
• The action is performed.
• The agent receives a scalar reward or reinforcement from the environment.
• Information about the reward given for that state / action pair is recorded.

Action Selection Policies

ε-greedy

• Most of the time the action with the highest estimated reward is chosen, called the greediest action. An action is selected at random with a small probability ε.

ε-soft

• The best action is selected with probability 1 - ε and the rest of the time a random action is chosen uniformly.

softmax

• Softmax assigns a rank or weight to each of the actions, according to their action-value estimate. A random action is selected with regard to the weight associated with each action, meaning the worst actions are unlikely to be chosen.
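A minimal sketch of two of these policies over a list of action-value estimates. The function names are hypothetical, and the softmax weights assume the common Boltzmann form exp(Q / temperature); the slides do not say which weighting was intended.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probabilities(q_values, temperature=1.0):
    """Turn action-value estimates into selection probabilities (assumed Boltzmann weights)."""
    highest = max(q_values)  # subtracted for numerical stability; probabilities are unchanged
    weights = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(q_values, temperature=1.0, rng=random):
    """Sample an action index in proportion to its softmax probability."""
    probabilities = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probabilities, k=1)[0]
```

With `epsilon=0` the first function is purely greedy, and with `epsilon=1` it is purely random. In the softmax variant, a lower `temperature` concentrates probability on the best-looking actions, while a higher one moves the choice towards uniform.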

Exploration and Exploitation

Exploiting

• The action selection policy is so strict that it always chooses the action that has given the most reward previously.

Exploring

• The policy is loose enough that it sometimes tries other actions, which may produce a better reward.

Subtask of RoboCup Soccer

Keepaway Soccer [UT Austin Villa, 2005]

• Keepaway, in which one team, the keepers, tries to maintain possession of the ball within a limited region, while the opposing team, the takers, attempts to gain possession.

SARSA (State-Action-Reward-State-Action)

SARSA is a learning algorithm in the reinforcement learning area of machine learning. It is an on-policy learning method that learns state-action values (Q-values).

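$$Q_{t+1}(s_t, a_t) \leftarrow Q_t(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t) \right]$$

• $Q_t(s_t, a_t)$: old value
• $\alpha$ ($0 \le \alpha \le 1$): learning rate
• $\gamma$ ($0 \le \gamma \le 1$): discount factor
• $r_{t+1}$: reward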
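A single SARSA update can be worked through numerically. The function name and the sample numbers below are illustrative assumptions; `alpha` is the learning rate and `gamma` the discount factor listed above.

```python
def sarsa_update(q_old, reward, q_next, alpha, gamma):
    """Move the old Q-value towards the one-step target reward + gamma * q_next."""
    td_error = reward + gamma * q_next - q_old
    return q_old + alpha * td_error


# Old value 2.0, reward 1.0, value of the next state-action pair 3.0:
# target = 1.0 + 0.9 * 3.0 = 3.7, error = 3.7 - 2.0 = 1.7, step = 0.5 * 1.7 = 0.85
new_q = sarsa_update(q_old=2.0, reward=1.0, q_next=3.0, alpha=0.5, gamma=0.9)
```

The result is approximately 2.85. With `alpha=0` the old value is kept unchanged, and with `alpha=1` it is replaced by the target outright.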

SARSA Algorithm

Procedural steps

• Initialize the Q-values table, Q(s, a).
• Observe the current state, s.
• Choose an action, a, for that state based on one of the action selection policies.
• Take the action, and observe the reward, r, as well as the new state, s'.
• Choose the next action, a', for the new state using the same action selection policy.
• Update the Q-value for the state-action pair (s, a) using the observed reward and the Q-value of (s', a'), according to the formula.
• Set the state to the new state and the action to the new action, and repeat the process until a terminal state is reached.
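The procedure can be sketched as tabular SARSA on a toy problem. Everything about the environment is a hypothetical stand-in (a five-state corridor with a reward of -1 per move), not the Keepaway task. The update uses the Q-value of the action actually selected for the next state, as the name State-Action-Reward-State-Action suggests.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4
GOAL = N_STATES - 1
ACTIONS = (0, 1)    # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: every move costs -1 and the episode ends at GOAL."""
    delta = 1 if action == 1 else -1
    next_state = min(max(state + delta, 0), GOAL)
    return next_state, -1.0


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the Q-table row for `state`."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def train_sarsa(episodes=200, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]  # initialize Q(s, a)
    for _ in range(episodes):
        state = 0                                           # observe the current state
        action = choose_action(q, state, epsilon, rng)      # choose a
        while state != GOAL:
            next_state, reward = step(state, action)        # take a, observe r and s'
            next_action = choose_action(q, next_state, epsilon, rng)  # choose a'
            target = reward + gamma * q[next_state][next_action]
            q[state][action] += alpha * (target - q[state][action])
            state, action = next_state, next_action         # move on to (s', a')
    return q
```

In this toy setting the value of stepping right from state 3 approaches -1, since that move always ends the episode at a cost of one step, and the terminal state's row stays at zero because it is never updated.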

Mapping Keepaway to SARSA

Episode

• An episode begins when the player is first asked to make a decision and ends when possession of the ball is lost by the keepers.

Actions

• HoldBall()
• PassBall(k)
• GetOpen()
• GoToBall()
• BlockPass(k)

Reward

• The reward $r_i$ is the number of primitive time steps that elapsed while following action $a_{i-1}$: $r_i = t_i - t_{i-1}$.
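A small sketch of this reward definition, assuming the keeper records the time step of each decision point in a list (the list and its values are hypothetical):

```python
def keepaway_rewards(decision_times):
    """Reward for each decision is the time elapsed since the previous decision."""
    return [later - earlier for earlier, later in zip(decision_times, decision_times[1:])]


# Hypothetical episode: decisions at steps 0, 4 and 9, possession lost at step 15.
times = [0, 4, 9, 15]
rewards = keepaway_rewards(times)
episode_length = sum(rewards)
```

Here `rewards` is `[4, 5, 6]` and the rewards sum to 15, the episode's duration. Without discounting, maximizing total reward is therefore the same as keeping possession for as long as possible.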

Hand-coded Algorithm

The keepers’ policy space

Result

• Learned Agents: keepers hold the ball for about 12 seconds on average
• Hand-coded Agents: keepers hold the ball for about 8.2 seconds on average

Summary

Programmers would like to develop software in more human-oriented (agent-oriented) terms

An agent is an intelligent software program which is situated, autonomous, flexible and robust

A Multiagent System comprises a number of agents that are able to cooperate, coordinate and negotiate with each other

RoboCup is an international initiative that fosters artificial intelligence and robotics research through a standard problem

Reinforcement Learning is a machine learning approach that is well suited to agent-oriented environments

References

An Introduction to MultiAgent Systems

• Michael Wooldridge (Author)

Reinforcement Learning: An Introduction

• Richard S. Sutton and Andrew G. Barto

Other Articles

RoboCup Publications

• http://www.robocup.org/category/papers-publications/

Peter Stone

• http://www.cs.utexas.edu/~pstone/

Thanks For Your Attention
