RoboCup: A Case Study in Multiagent System

Page 1: RoboCup: A Case Study in Multiagent System

RoboCup: A Case Study in Multiagent System

Vahid Mokhtari

Page 2: RoboCup: A Case Study in Multiagent System

Contents

1. Trends
2. What is an Agent?
3. Multiagent System
4. Case study in RoboCup

ERBASE2011

Page 3: RoboCup: A Case Study in Multiagent System

Trends in History of Computing

• Ubiquity: as processing capability spreads, sophistication becomes ubiquitous.
• Interconnection: computer systems no longer stand alone, but are networked into large distributed systems.
• Intelligence: the complexity of tasks that we are capable of automating and delegating to computers has grown steadily.
• Delegation: we are giving control to computers, even in safety-critical tasks.
• Human-orientation: programmers conceptualize and implement software in terms of ever higher-level, more human-oriented abstractions.

Page 4: RoboCup: A Case Study in Multiagent System

From Programming Perspective

Programming has progressed through:

• machine code
• assembly language
• machine-independent programming languages
• procedures & functions
• abstract data types
• objects
• to agents

Page 5: RoboCup: A Case Study in Multiagent System

What is an Agent?

“An agent is an encapsulated computer system, situated in some environment, and capable of autonomous action in that environment in order to meet its design objectives”. Mike Wooldridge

Page 6: RoboCup: A Case Study in Multiagent System

Agent and Environment

Page 7: RoboCup: A Case Study in Multiagent System

Environment

Accessible vs. Inaccessible - can the agent “see” everything?

Deterministic vs. Non-deterministic - do actions have guaranteed effect?

Static vs. Dynamic - does the environment change on its own?

Discrete vs. Continuous - is the number of actions and percepts finite?

Page 8: RoboCup: A Case Study in Multiagent System

What is an Intelligent Agent?

An intelligent agent is a piece of software that is:

• Reactive – responds to changes in its environment
• Proactive – persistently pursues its goals
• Social – interacts with other agents:
  • Cooperation
  • Coordination
  • Negotiation

Page 9: RoboCup: A Case Study in Multiagent System

Examples of Intelligent Agents

• Assistant agent in MS Office
• Trading agents
• Web spiders
• Computer viruses
• Characters in computer games

Page 10: RoboCup: A Case Study in Multiagent System

Agents vs. Objects

Agent
• Autonomous - decides for itself whether to act on a request it receives
• Active - can decide when to act and how
• Each agent has its own thread of control

Object
• Non-autonomous - acts whenever one of its methods is invoked
• Passive - has no control over the execution of its methods
• There is a single thread of control in the system
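The contrast between an object and an agent can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Counter` and `CounterAgent`, the `limit` objective, and the method names are hypothetical and not from the slides, and the sketch leaves out the separate thread of control that each agent would have.

```python
class Counter:
    """Object: passive, it does whatever the caller invokes."""

    def __init__(self):
        self.value = 0

    def increment(self):
        # The caller has decided; the object has no say in the matter.
        self.value += 1
        return self.value


class CounterAgent:
    """Agent: receives a request and decides whether to honor it."""

    def __init__(self, limit):
        self.value = 0
        self.limit = limit  # hypothetical design objective: never exceed limit

    def request_increment(self):
        # The agent checks the request against its own objective first.
        if self.value >= self.limit:
            return False  # request refused
        self.value += 1
        return True  # request accepted
```

Calling `Counter().increment()` always succeeds, while a `CounterAgent(limit=1)` accepts the first request and refuses the second.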

Page 11: RoboCup: A Case Study in Multiagent System

Agent Oriented Architectures

First Classification
• Reactive – receives input, processes it and produces an output.
• Deliberative – has an internal view of its environment and is able to follow its own plans.
• Hybrid – mixture of reactive and deliberative, that follows its own plans, but also sometimes directly reacts to external events without deliberation.

BDI Agents Architecture
• Beliefs: represent the informational state of the agent.
• Desires: represent objectives or situations that the agent would like to accomplish, for example find the best price, go to the party or become rich.
• Intentions: represent the specific goal (or set of goals) that the agent commits to.
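A minimal sketch of how beliefs, desires and intentions might fit together, assuming a very simple deliberation rule: commit to the first desire whose precondition is currently believed to hold. The class `BDIAgent`, its fields, and the precondition keys are hypothetical; real BDI systems use much richer plan libraries and belief revision.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BDIAgent:
    beliefs: dict = field(default_factory=dict)   # informational state
    desires: list = field(default_factory=list)   # (goal, precondition) pairs, by priority
    intention: Optional[str] = None               # goal currently committed to

    def perceive(self, percept):
        # Simplistic belief revision: newer percepts overwrite older beliefs.
        self.beliefs.update(percept)

    def deliberate(self):
        # Commit to the first desire whose precondition holds in the beliefs.
        for goal, precondition in self.desires:
            if self.beliefs.get(precondition, False):
                self.intention = goal
                return goal
        self.intention = None
        return None
```

For example, an agent with desires `[("go to the party", "invited"), ("find the best price", "shop_open")]` that perceives `{"shop_open": True}` commits to finding the best price, and switches to the party once it also perceives `{"invited": True}`.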

Page 12: RoboCup: A Case Study in Multiagent System

Agent Types (1)

Collaborative Agents

• emphasise autonomy and cooperation with other agents
• negotiate with their peers to reach mutually acceptable agreements during co-operative problem solving

Mobile Agents

• roam wide area networks
• interact with foreign hosts
• perform tasks on behalf of their owners
• return ‘home’
• collaborative agents that move across networks

Page 13: RoboCup: A Case Study in Multiagent System

Agent Types (2)

Interface Agents

• emphasise autonomy and learning in order to perform tasks for their owners
• support and provide proactive assistance to a user using a particular application
• limited cooperation with other agents
• interface agent = personal assistant = personal digital assistant = personal agent

Information Agents

• manage, manipulate or collate information from many distributed sources on WANs
• information agents = internet agents

Page 14: RoboCup: A Case Study in Multiagent System

Multiagent System (MAS)

A Multiagent System is one that consists of a number of agents, which interact with one another. To interact successfully, they require the ability to cooperate, coordinate, and negotiate with each other.

Page 15: RoboCup: A Case Study in Multiagent System

The Fully General Multiagent System

Agents model each other’s goals, actions, and domain knowledge, which may differ. The agents may also interact directly.

Page 16: RoboCup: A Case Study in Multiagent System

Why MAS?

• Some domains require it
• Parallelism
• Robustness
• Scalability
• Simpler programming
• To study intelligence
• Geographic distribution
• Cost effectiveness

Page 17: RoboCup: A Case Study in Multiagent System

MAS Research Area

Distributed Computing: Processors share data, but not control. Focus on low-level parallelization, synchronization.

Distributed AI: Control as well as data is distributed. Focus on problem solving, communication, and coordination.

• Distributed Problem Solving (DPS): Task decomposition and/or solution synthesis.
• Multiagent Systems (MAS): Behavior coordination or behavior management.

Page 18: RoboCup: A Case Study in Multiagent System

MAS Taxonomy

Degree of Heterogeneity and Communication [Peter Stone, 2000]

• Heterogeneity: the degree to which different agents play different roles is an important MAS issue
• Communication: the degree to which the agents communicate

Page 19: RoboCup: A Case Study in Multiagent System

Issues in Building MAS

Agent’s Issues

• Reactive vs. Deliberative agents
• Local vs. Global perspective
• Benevolence vs. Competitiveness

Page 20: RoboCup: A Case Study in Multiagent System

Homogeneous Non-Communicating Multiagent Systems

• Several different agents with identical structure (sensors, effectors, domain knowledge, and decision functions).
• Different sensor inputs and effector outputs.
• Agents are situated differently in the environment and make their own decisions regarding which actions to take.

Page 21: RoboCup: A Case Study in Multiagent System

Heterogeneous Non-Communicating Multiagent Systems

• Agents are situated differently in the environment
• Different sensory inputs and different actions

Page 22: RoboCup: A Case Study in Multiagent System

Homogeneous Communicating Multiagent Systems

• Agents are identical except that they are situated differently in the environment
• Agents can communicate with each other directly

Page 23: RoboCup: A Case Study in Multiagent System

Heterogeneous Communicating Multiagent Systems

Different sensory data, goals, actions, and domain knowledge

Page 24: RoboCup: A Case Study in Multiagent System

Learning Opportunities

Problems and Issues

• Modeling other agents
• Agent Cooperation
• Collective intelligence
• Agent interaction
• Teamwork, formation, coordination
• Negotiation
• Reactive vs. deliberative approaches
• Agreement Technologies
• Multiagent Reasoning, Planning, Adaptation
• Resource Management

Page 25: RoboCup: A Case Study in Multiagent System

Importance of MAS

• Research in “Distributed AI” started over 30 years ago, but only in the mid-1990s did it become a major research trend in AI.
• Now the main conference (AAMAS) attracts around 800 submissions each year, of which 20-25% are accepted.
• In addition, there are dozens of smaller workshops and conferences.
• It is a large, young and dynamic research community.

Page 26: RoboCup: A Case Study in Multiagent System

RoboCup

Case study in Multiagent System

Page 27: RoboCup: A Case Study in Multiagent System

What is RoboCup?

RoboCup is an international research and education initiative, attempting to foster Artificial Intelligence and Robotics research by providing a standard problem where a wide range of technologies can be integrated and examined:

• Real-time sensor fusion
• Reactive behavior
• Strategy acquisition
• Learning
• Real-time planning
• Multi-agent systems
• Context recognition
• Vision
• Strategic decision-making
• Motor control
• Intelligent robot control
• and many more

Page 28: RoboCup: A Case Study in Multiagent System

Domain Characteristics

• Environment: Dynamic
• State Change: Real time
• Info. Accessibility: Incomplete
• Sensor Readings: Non-symbolic
• Control: Distributed

Page 29: RoboCup: A Case Study in Multiagent System

The Standard Problem

RoboCup Soccer

The game of football, where the research goals concern cooperative multi-robot and multi-agent systems in dynamic adversarial environments. All robots in this league are fully autonomous.

RoboCup Rescue

A project to promote research and development in disaster rescue at various levels, involving multi-agent teamwork coordination and physical robotic agents for search and rescue.

Page 30: RoboCup: A Case Study in Multiagent System

RoboCup Soccer

• Distributed multiagent domain
• Teammates and adversaries
• Partial world view
• Noisy sensors and actuators
• Real-time

Page 31: RoboCup: A Case Study in Multiagent System

Applied Machine Learning

The general aim of Machine Learning is to produce intelligent programs, often called agents, through a process of learning and evolving.

Algorithm types

• Supervised learning
• Unsupervised learning
• Reinforcement learning

Page 32: RoboCup: A Case Study in Multiagent System

Reinforcement Learning

Reinforcement Learning (RL) is an area of machine learning, concerned with how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward.

• RL lends itself wonderfully to agent-oriented systems, where agents “have explicit goals, can sense aspects of their environments, and can choose actions to influence their environments”.
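One common way to make "cumulative reward" concrete is the discounted return, where a reward received k steps in the future is weighted by the k-th power of a discount factor. The sketch below assumes that form; the function name `discounted_return` and the default value of `gamma` are illustrative choices, not taken from the slides.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**k * rewards[k], computed from the last reward backwards."""
    total = 0.0
    for reward in reversed(rewards):
        # Each step back in time discounts everything that comes later.
        total = reward + gamma * total
    return total
```

With `gamma` close to 1 the agent values distant rewards almost as much as immediate ones; with `gamma` close to 0 it becomes short-sighted.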

Page 33: RoboCup: A Case Study in Multiagent System

The Agent-Environment Interface

Agent and environment interact at discrete time steps: $t = 0, 1, 2, \ldots$

• Agent observes state at step $t$: $s_t \in S$
• Produces action at step $t$: $a_t \in A(s_t)$
• Gets resulting reward: $r_{t+1} \in \mathbb{R}$
• And resulting next state: $s_{t+1}$
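The interaction loop can be sketched as follows. The `Corridor` environment (cells 0 to 4 with a goal at cell 4), its method names, and the `run` helper are hypothetical and exist only to make the loop concrete.

```python
class Corridor:
    """Hypothetical toy environment: cells 0..4, goal at cell 4."""

    def __init__(self):
        self.state = 0

    def actions(self, state):
        return [-1, 1]  # A(s): step left or right, the same in every state

    def step(self, action):
        self.state = min(4, max(0, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        return reward, self.state  # r_{t+1}, s_{t+1}


def run(env, policy, steps):
    """Record (t, s_t, a_t, r_{t+1}, s_{t+1}) for a fixed number of steps."""
    trace = []
    s = env.state
    for t in range(steps):
        a = policy(s, env.actions(s))   # agent chooses a_t from A(s_t)
        r, s_next = env.step(a)         # environment answers with reward and next state
        trace.append((t, s, a, r, s_next))
        s = s_next
    return trace
```

Running `run(Corridor(), lambda s, acts: 1, 4)` walks the agent straight to the goal, and the last recorded tuple carries the only non-zero reward.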

Page 34: RoboCup: A Case Study in Multiagent System

General Process of RL

The "cause and effect" idea in RL

• The agent observes an input state.
• An action is determined by a decision making function (policy).
• The action is performed.
• The agent receives a scalar reward or reinforcement from the environment.
• Information about the reward given for that state / action pair is recorded.

Page 35: RoboCup: A Case Study in Multiagent System

Action Selection Policies

ε-greedy

• Most of the time the action with the highest estimated reward is chosen, called the greediest action. With a small probability ε, an action is selected at random instead.

ε-soft

• The best action is selected with probability 1 - ε and the rest of the time a random action is chosen uniformly.

softmax

• Softmax assigns a rank or weight to each of the actions, according to their action-value estimate. A random action is selected with regard to the weight associated with each action, meaning the worst actions are unlikely to be chosen.
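The policies above can be sketched in Python as follows. The function names are illustrative, and the `temperature` parameter of the softmax (the Boltzmann form, which is one common choice of weighting) is an assumption not mentioned in the slides.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature=1.0):
    """Turn action-value estimates into selection probabilities."""
    m = max(q_values)  # subtracting the max keeps exp() numerically stable
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(q_values, temperature=1.0, rng=random):
    """Sample an action index in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A lower temperature makes softmax behave more greedily, while a higher one moves it toward uniform random selection.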

Page 36: RoboCup: A Case Study in Multiagent System

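Exploration and Exploitation

Exploiting

• The action selection policy is strict: it always chooses the action that has given the most reward so far.

Exploring

• The policy also tries other actions, because they may produce a better reward.

On-policy and off-policy describe a different distinction: an on-policy method such as SARSA learns the value of the policy it is following, including its exploratory actions, while an off-policy method learns about a policy other than the one used to select actions.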

Page 37: RoboCup: A Case Study in Multiagent System

Subtask of RoboCup Soccer

Keepaway Soccer [UT Austin Villa, 2005]

• In Keepaway, one team, the keepers, tries to maintain possession of the ball within a limited region, while the opposing team, the takers, attempts to gain possession.

Page 38: RoboCup: A Case Study in Multiagent System

SARSA (State-Action-Reward-State-Action)

SARSA is a reinforcement learning algorithm. It is an on-policy learning method that learns state-action values (Q values).

$$Q_{t+1}(s_t, a_t) \leftarrow Q_t(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma\, Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t) \right]$$

• $Q_t(s_t, a_t)$: old value
• $\alpha$ ($0 \le \alpha \le 1$): learning rate
• $\gamma$ ($0 \le \gamma \le 1$): discount factor
• $r_{t+1}$: reward
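A single application of the update rule can be written as a small function. The name `sarsa_update` and the numbers in the usage note are made up for illustration.

```python
def sarsa_update(q_old, reward, q_next, alpha, gamma):
    """Return the new Q(s_t, a_t) after one SARSA update.

    q_old  -- Q_t(s_t, a_t), the old value
    reward -- r_{t+1}
    q_next -- Q_t(s_{t+1}, a_{t+1}), value of the next state-action pair
    """
    td_error = reward + gamma * q_next - q_old
    return q_old + alpha * td_error
```

For instance, with an old value of 0.5, a reward of 1, a next value of 2.0, a learning rate of 0.1 and a discount factor of 0.9, the bracketed term is 1 + 1.8 - 0.5 = 2.3, so the new value is 0.5 + 0.23 = 0.73.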

Page 39: RoboCup: A Case Study in Multiagent System

SARSA Algorithm

Procedural steps

• Initialize the Q-values table, Q(s, a).
• Observe the current state, s.
• Choose an action, a, for that state based on one of the action selection policies.
• Take the action, and observe the reward, r, as well as the new state, s'.
• Choose the next action, a', for the new state using the same action selection policy.
• Update the Q-value for the pair (s, a) using the observed reward and the Q-value of the next pair, Q(s', a'), according to the update formula. SARSA uses the action actually chosen, not the maximum over the actions of the next state (that would be Q-learning).
• Set the state to the new state and the action to the next action, and repeat the process until a terminal state is reached.
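A compact sketch of tabular SARSA follows. It assumes a hypothetical corridor task (cells 0 to 4, reward 1 on reaching cell 4) and ε-greedy action selection; the function names and all hyperparameter values are illustrative, and this is not the Keepaway setup, which uses a much larger state space.

```python
import random
from collections import defaultdict


def step(state, action):
    """Hypothetical corridor: returns (reward, next_state, done)."""
    nxt = min(4, max(0, state + action))
    return (1.0 if nxt == 4 else 0.0), nxt, nxt == 4


def choose(Q, state, actions, epsilon, rng):
    """epsilon-greedy selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(state, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(state, a)] == best])


def sarsa(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = [-1, 1]
    Q = defaultdict(float)                   # initialize Q(s, a) to 0
    for _ in range(episodes):
        s = 0                                # observe the start state
        a = choose(Q, s, actions, epsilon, rng)
        done = False
        while not done:
            r, s2, done = step(s, a)         # act, observe r and s'
            a2 = choose(Q, s2, actions, epsilon, rng)  # next action, same policy
            # A terminal next state contributes no future value.
            target = r if done else r + gamma * Q[(s2, a2)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s2, a2
    return Q
```

After training, `Q[(3, 1)]` (stepping right from the cell next to the goal) approaches 1 and exceeds `Q[(3, -1)]`, which is what one would expect from the reward structure.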

Page 40: RoboCup: A Case Study in Multiagent System

Mapping Keepaway to SARSA

Episode

• An episode begins when the player is first asked to make a decision and ends when possession of the ball is lost by the keepers.

Actions

• HoldBall()
• PassBall(k)
• GetOpen()
• GoToBall()
• BlockPass(k)

Reward

• The reward $r_i$ is the number of primitive time steps that elapsed while following action $a_{i-1}$: $r_i = t_i - t_{i-1}$.
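The reward definition can be illustrated with a short function that turns a list of decision times into rewards. The function name `keepaway_rewards` and the sample times in the usage note are made up for the example.

```python
def keepaway_rewards(decision_times):
    """Compute r_i = t_i - t_{i-1} for consecutive decision times t_0, t_1, ..."""
    return [t1 - t0 for t0, t1 in zip(decision_times, decision_times[1:])]
```

For decision times `[0, 4, 9, 15]` the rewards are `[4, 5, 6]`. Their sum equals the time elapsed between the first and last decision, so maximizing cumulative reward amounts to keeping possession for as long as possible.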

Page 41: RoboCup: A Case Study in Multiagent System

Hand-coded Algorithm

The keepers’ policy space

Page 42: RoboCup: A Case Study in Multiagent System

Result

• Learned agents: keepers hold the ball for about 12 seconds on average
• Hand-coded agents: keepers hold the ball for about 8.2 seconds on average

Page 43: RoboCup: A Case Study in Multiagent System

Summary

• Programmers would like to develop software in more human-oriented (agent-oriented) terms.
• An agent is an intelligent software program which is situated, autonomous, flexible and robust.
• A Multiagent System comprises a number of agents that are able to cooperate, coordinate and negotiate with each other.
• RoboCup is an international initiative that fosters artificial intelligence research by providing a standard problem.
• Reinforcement Learning is a machine learning approach that is well suited to agent-oriented environments.

Page 44: RoboCup: A Case Study in Multiagent System

References

An Introduction to MultiAgent Systems

• Michael Wooldridge (Author)

Reinforcement Learning: An Introduction

• Richard S. Sutton and Andrew G. Barto

Page 45: RoboCup: A Case Study in Multiagent System

Other Articles

RoboCup Publications

• http://www.robocup.org/category/papers-publications/

Peter Stone

• http://www.cs.utexas.edu/~pstone/

Page 46: RoboCup: A Case Study in Multiagent System

Thanks For Your Attention