Building Practical Agent Teams: A Hybrid Perspective
Milind Tambe
[email protected]
Computer Science Department
University of Southern California
Joint work with the Teamcore Group: http://teamcore.usc.edu
SBIA 2004
Long-Term Research Goal
Building large-scale heterogeneous teams
Types of entities: agents, people, sensors, resources, robots, …
Scale: 1000s or more
Domains: Highly uncertain, real-time, dynamic
Activities: Form teams, persist for long durations, coordinate, adapt…
Some applications:
- Large-scale disaster rescue
- Agent-facilitated human organizations
- Large-area security
- WWW agent proxies for people
- Information agents
- Ontology-based matchmakers
Domains and Motivations
[Figure: application domains plotted against team scale & complexity (small-scale homogeneous, small-scale heterogeneous, large-scale heterogeneous) and task & domain complexity (low, medium, high); examples include WWW agent proxies for people, information agents, and ontology-based matchmakers]
Motivation: BDI+POMDP Hybrids
[Figure: Teamcore proxies executing a TOP of team plans, organizations, and agents, e.g., Execute Rescue with sub-plans Extinguish Fires [Fire company], Rescue Civilians [Ambulance team], and Clear Roads [RAP team]; optimal policies computed using distributed partially observable Markov decision processes (POMDPs)]
• BDI approach
  Frameworks: Teamcore/Machinetta, GPGP, …
  +ve: Ease of use for human developers; coordinates large-scale teams
  -ve: Quantitative team evaluations difficult (given uncertainty/cost)
• Distributed POMDP approach
  Frameworks: MTDP, DEC-MDP/DEC-POMDP, POIPSG, …
  +ve: Quantitative evaluation of team performance easy (with uncertainty)
  -ve: Scale-up difficult; difficult for human developers to program policies
BDI + POMDP Synergy
[Figure: Teamcore proxies and the Execute Rescue team-oriented program again (Extinguish Fires [Fire company], Rescue Civilians [Ambulance team], Clear Roads [RAP team])]
Distributed POMDPs for TOP & proxy analysis and refinement
Combine “traditional” TOP approaches with distributed POMDPs:
- POMDPs improve TOP/proxies (e.g., role allocation and communication algorithms)
- TOP constrains POMDP policy search: orders-of-magnitude speedup
Overall Research Framework
Teamwork proxy infrastructure:

Coordination key algorithms (offline optimal | on-line approximate):
- Agent-agent: Adopt, asynchronous complete distributed constraint optimization (DCOP) (Modi et al. 03; Maheswaran et al. 04) | Equilibrium/threshold approaches (Okamoto 03; Maheswaran et al. 04)
- Agent-human (adjustable autonomy): Optimal transfer-of-control strategies via MDPs/POMDPs (Scerri et al. 02) | ?

Monitoring (explicit | implicit, via plan recognition):
- Agent-agent: BDI teamwork theories + decision-theoretic filter (Pynadath/Tambe 03) | Socially attentive monitoring (Kaminka et al. 00)
- Agent-human: ? | Monitoring by overhearing (Kaminka et al. 02)
Distributed POMDP analysis: Multiagent Team Decision Problem (MTDP) (Nair et al. 03b; Nair et al. 04; Paruchuri et al. 04)
Electric Elves: 24/7 from 6/00 to 12/00 (Chalupsky et al, IAAI’2001)
“More and more computers are ordering food … we need to think about marketing [to these computers]” (a local Subway owner)
[Figure: Electric Elves architecture: Teamcore proxies for each user and for agents (Meeting Maker, Scheduler agent, Interest Matcher, Papers), used to reschedule meetings, decide presenters, and order meals]
Modules within the Proxies: AA (Scerri, Pynadath and Tambe, JAIR’2002)
[Figure: each Teamcore proxy runs modules for communication, role allocation, and adjustable autonomy (MDPs for transfer-of-control policies) against a team-oriented program, e.g., a meeting with the role “user arrives on time” and the task “reschedule meetings”]
Autonomous and User Delays
[Figure: number of users' delayed meetings (0-16), comparing user delay vs. autonomous delay]
MDP policies: planned sequences of transfers of control and coordination changes, e.g., ADAH: ask, delay, ask, cancel
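The transfer-of-control idea can be sketched as a tiny finite-horizon MDP solved by backward induction. All of the states, probabilities, and rewards below are invented for illustration and are not the model from the paper; `solve` is a hypothetical helper.

```python
# Hypothetical finite-horizon MDP for transfer-of-control strategies:
# at each step before a deadline the proxy can Ask the user (who answers
# with probability p_respond), Act autonomously, or Delay the meeting.
# All numbers here are illustrative only.
T = 4            # decision points before the meeting
p_respond = 0.4  # chance the user answers when asked
R_USER, R_AGENT, C_DELAY = 10.0, 6.0, -1.0

def solve():
    """Backward induction: V[t] = best expected reward with t steps left."""
    V = [0.0] * (T + 1)
    policy = [None] * (T + 1)
    V[0], policy[0] = R_AGENT, "act"   # no time left: act autonomously
    for t in range(1, T + 1):
        ask = p_respond * R_USER + (1 - p_respond) * V[t - 1]
        act = R_AGENT
        delay = C_DELAY + V[t - 1]
        V[t], policy[t] = max((ask, "ask"), (act, "act"), (delay, "delay"))
    return V, policy

V, policy = solve()
# Read off the planned sequence of transfers of control, earliest step first.
strategy = [policy[t] for t in range(T, 0, -1)] + [policy[0]]
print(strategy, V[T])
```

With these numbers the induction yields an “ask repeatedly, then act autonomously at the deadline” strategy, the same flavor as the ADAH-style strategies above.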
Back to Hybrid BDI-POMDP Frameworks
Motivation: Communication in Proxies
Proxy's heuristic “BDI” communication rules, for example:
RULE 1 (“joint intentions”, Levesque et al. 90): If (fact F ∈ agent's private state) AND (F matches a goal of the team's plan) AND (F ∉ team state), Then create possible communicative goal CG to communicate F
RULE 2: If (possible communicative goal CG) AND (miscoordination cost > communication cost), Then communicate CG
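A minimal sketch of the two rules, assuming a simple set-based representation of facts, goals, and team state (the proxies' real representation is richer); all names and values are illustrative.

```python
# Minimal sketch of the two heuristic "BDI" communication rules above.
def communicative_goals(private_state, team_state, team_plan_goals):
    """RULE 1 (joint intentions): a privately known fact that matches a
    team-plan goal but is absent from the mutual team state becomes a
    candidate communicative goal CG."""
    return [f for f in private_state
            if f in team_plan_goals and f not in team_state]

def should_communicate(miscoordination_cost, communication_cost):
    """RULE 2: communicate CG only when the expected miscoordination
    cost outweighs the cost of communicating."""
    return miscoordination_cost > communication_cost

# Illustrative facts/goals for a rescue team.
private = {"fire-extinguished", "low-fuel"}
team = {"low-fuel"}
goals = {"fire-extinguished", "civilians-rescued"}
cgs = communicative_goals(private, team, goals)
sent = [cg for cg in cgs if should_communicate(5.0, 1.0)]
print(sent)  # ['fire-extinguished']
```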
Motivation: Earlier BDI Evaluation
                     Comm     NoComm
ISIS97-CMUnited97    3.27      1.73
ISIS97-Andhill97    -3.38     -4.36
ISIS98-CMUnited97    4.04      3.91
ISIS98-Andhill97    -1.53     -2.13
[Figure: helicopter domain results (x-axis 2-8, y-axis 0-600)]
Testing Communication Selectivity
(Pynadath & Tambe, JAAMAS’03)
Testing teamwork in RoboCup
(Tambe et al, IJCAI’99)
• Quantitative analysis of optimality, or of the complexity of an optimal response, is difficult
• A challenge in domains with significant uncertainty and costs
Distributed POMDPs
COM-MTDP (Pynadath & Tambe 02); RMTDP (Nair, Tambe & Marsella 03)
S: states of the world (e.g., helicopter position, enemy position)
Ai: actions of agent i (communication actions, domain actions)
P: state transition probabilities
R: reward, sub-divided based on action types
COM-MTDP: Analysis of Communication
Ωi: observations (e.g., E = enemy-on-radar, NE = enemy-not-on-radar)
O: probability of an observation given destination state & past action
Belief state: each Bi is a history of observations and messages
Individual policies: πiA : Bi → (domain action), πiΣ : Bi → (communication)
Goal: find joint policies πA and πΣ that maximize total expected reward
Example observation table (one per state and previous action; states Landmark1, Landmark2, …):

  Joint observation   (E,E)   (E,NE)   (NE,NE)   (NE,E)
  Probability          0.1     0.4      0.1       0.4
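A toy encoding of these ingredients, using the example joint-observation row above; the state and action names are illustrative, not from the model.

```python
# Toy encoding of the COM-MTDP observation function: one probability
# table per (state, previous action) pair over joint observations.
S = ["Landmark1", "Landmark2"]     # world states (illustrative)
OBS = ["E", "NE"]                  # E = enemy-on-radar

O = {("Landmark1", "move"): {("E", "E"): 0.1, ("E", "NE"): 0.4,
                             ("NE", "NE"): 0.1, ("NE", "E"): 0.4}}

row = O[("Landmark1", "move")]
# Each table row must be a probability distribution over joint observations.
assert abs(sum(row.values()) - 1.0) < 1e-9

# Marginal chance that agent 1 sees the enemy in this row:
p_e1 = sum(p for (o1, _), p in row.items() if o1 == "E")
print(p_e1)  # 0.5
```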
Complexity Results in COM-MTDP
                       Individual     Collective     Collective       No
                       observability  observability  partial obs.     observability
No communication       P-complete     NEXP-complete  NEXP-complete    NP-complete
General communication  P-complete     NEXP-complete  NEXP-complete    NP-complete
Full communication     P-complete     P-complete     PSPACE-complete  NP-complete
Addressing this complexity:
I. Locally optimal solution (no global team optimality)
II. Hybrid approach: POMDP + BDI
Approach I: Locally Optimal Policy (Nair et al 03)
Repeat until convergence to a local equilibrium, for each agent K:
- Fix the policies of all agents except K
- Find the optimal response policy for agent K
Finding the optimal response policy for agent K, given the fixed policies of the others:
- The problem becomes finding an optimal policy for a single-agent POMDP
- The “extended” state is the world state plus the other agents' observation histories, not the world state alone
- Define a new transition function and a new observation function over extended states
- Define a multiagent belief state
- Dynamic programming over belief states
- Significant speedup over exhaustive search, but problem size limited
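A minimal sketch of the alternating best-response loop on a toy one-shot joint-reward game; the actual approach iterates over full POMDP policies, and the payoff numbers here are invented.

```python
# Alternating best response: fix all agents but K, compute K's best
# response, repeat until no agent can improve (a local equilibrium).
R = {("scout", "scout"): 1, ("scout", "transport"): 5,
     ("transport", "scout"): 4, ("transport", "transport"): 0}
ACTIONS = ["scout", "transport"]

def best_response(k, joint):
    """Optimal choice for agent k given the other agent's fixed choice."""
    other = joint[1 - k]
    pair = lambda a: (a, other) if k == 0 else (other, a)
    return max(ACTIONS, key=lambda a: R[pair(a)])

joint = ["scout", "scout"]      # arbitrary initial joint policy
changed = True
while changed:                  # repeat until local equilibrium
    changed = False
    for k in (0, 1):
        br = best_response(k, joint)
        if br != joint[k]:
            joint[k], changed = br, True
print(joint, R[tuple(joint)])
```

Note that the loop settles on (transport, scout) with reward 4 even though (scout, transport) yields 5: a local equilibrium without global team optimality, exactly the caveat above.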
II: Hybrid BDI + POMDP
[Figure: hybrid loop: the domain and team-oriented program (communication, role allocation, adjustable-autonomy proxy algorithms) feed a distributed POMDP model that exploits the TOP; COM-MTDP evaluates alternate communication policies (adjustable autonomy held to a fixed action policy), derives locally and globally optimal communication policies, and feeds the results back to modify the proxy communication algorithms]
Compare Communication Policies over Different Domains
Given a domain, for different observability conditions & communication costs: evaluate Teamcore (RULE 1 + RULE 2), Jennings, and others; compare with the optimal policy
Teamcore: O(|S||Ω|)T
Distributed POMDPs to Analyze Role Allocations: RMTDP
Role Allocation: Illustration
Task: move cargo from X to Y; large reward for cargo at the destination
- Three routes with varying length and failure rates
- Scouts make a route safe for transports
Uncertainty: in actions and observations
- Scouts may fail along a route (and transports may replace scouts)
- Scouts' failure rate decreases if more scouts go to a route
- Scouts' failure may not be observable to transports
Team-Oriented Program
Organization hierarchy
Plan hierarchy
Best initial role allocation: how many helicopters in ScoutTeam A, B, C & Transport?
TOP: Almost entire RMTDP policy is completely fixed
Policy gap only on step 1: Best role allocation in initial state for each agent
Assume six helicopter agents: 84 combinations (84 RMTDP policies)
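The “84 combinations” count can be sanity-checked by enumerating the ways to split six helicopters among the four roles; the role names are from the slide, the enumeration itself is illustrative.

```python
# Enumerate allocations of 6 helicopters among ScoutTeam A, B, C and
# Transport: all (a, b, c, transport) with a + b + c + transport = 6.
from itertools import product

allocations = [(a, b, c, 6 - a - b - c)
               for a, b, c in product(range(7), repeat=3)
               if a + b + c <= 6]
print(len(allocations))  # 84
```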
Analyzing Role Allocation in Teamwork
[Figure: the domain and team-oriented program (role allocation, communication, adjustable-autonomy proxy algorithms) feed a distributed POMDP model; RMTDP evaluates alternate role-taking policies, searches the policy space for the optimal role-taking policy (filling in the gaps left in otherwise-fixed role-taking and role-execution policies at states S1, S2, S3, S4, S5, …), and feeds back a specific role allocation to the TOP]
RMTDP Policy Search: Efficiency Improvements
- Belief-based policy evaluation: not entire observation histories, only the beliefs required by the TOP
- Hierarchical policy groups for branch-&-bound search:
  - Obtain an upper bound on the values of the policies within a policy group
  - If individual policies are higher valued than a group, prune the group
  - Exploit the TOP for generating policy groups and for upper bounds
Policy evaluation recursion over joint observation histories (two agents shown):

V_t^{\pi}\big(s_t, \langle \vec{\omega}_t^1, \vec{\omega}_t^2 \rangle\big)
  = R\big(s_t, \pi^1(\vec{\omega}_t^1), \pi^2(\vec{\omega}_t^2)\big)
  + \sum_{s_{t+1}} P\big(s_{t+1} \mid s_t, \pi^1(\vec{\omega}_t^1), \pi^2(\vec{\omega}_t^2)\big)
    \sum_{\omega_{t+1}} O\big(s_{t+1}, \pi^1(\vec{\omega}_t^1), \pi^2(\vec{\omega}_t^2), \omega_{t+1}\big)
    \, V_{t+1}^{\pi}\big(s_{t+1}, \langle \vec{\omega}_t^1 \cdot \omega_{t+1}^1, \vec{\omega}_t^2 \cdot \omega_{t+1}^2 \rangle\big)

With beliefs, equivalent observation histories collapse: e.g., T=1: <Scout1 okay, Scout2 fail>; T=2: <Scout1 fail, Scout2 fail> and T=1: <Scout1 okay, Scout2 okay>; T=2: <Scout1 fail, Scout2 fail> both map to the TOP belief T=2: <CriticalFailure>
MaxExp: Hierarchical Policy Groups
[Figure: hierarchical policy-group tree for six helicopters: the root branches on the scout/transport split (1-5, 0-6, 2-4, 3-3, 4-2, …), and each split branches again on how the scouts are assigned to routes (e.g., 1-5 into 1-0-0 or 0-0-1; 2-4 into 1-1-0 or 0-0-2, …), with upper-bound values attached to the groups]
MaxExp: Upper-Bound Policy Group Value
- Obtain the max for each component over all start states & observation histories
- If each component were independent, each could be evaluated separately
- Dependence: the start of the next component depends on the end state of the previous one
Why the speedup:
- No duplicate start states: multiple paths of the previous component merge
- No duplicate observation histories
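The branch-and-bound search over policy groups can be sketched as follows; `maxexp_bound` stands in for the MaxExp component-wise upper bound, and the policy groups and values below are made-up numbers.

```python
# Branch-and-bound over hierarchical policy groups: keep the best fully
# evaluated policy, and prune any group whose upper bound cannot beat it.
def branch_and_bound(groups, evaluate, maxexp_bound):
    """groups: iterable of policy groups, each a list of concrete policies."""
    best_value, best_policy = float("-inf"), None
    for group in sorted(groups, key=maxexp_bound, reverse=True):
        if maxexp_bound(group) <= best_value:
            continue                      # prune the whole group
        for policy in group:              # expand surviving groups
            v = evaluate(policy)
            if v > best_value:
                best_value, best_policy = v, policy
    return best_policy, best_value

# Toy instance: policies are numbers, value = the number itself, and a
# group's upper bound is its true maximum (an admissible bound).
groups = [[3, 7], [10, 12], [5, 1]]
best, value = branch_and_bound(groups, evaluate=lambda p: p,
                               maxexp_bound=max)
print(best, value)  # 12 12
```

Sorting groups by upper bound first makes the strongest incumbent appear early, so the remaining groups (here both of them) are pruned without evaluating any of their policies.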
[Figure: component decomposition for the 2-scout/4-transport group: DoScouting [Scout=2; Transport=4] → DoTransport [Transports from previous] → RemainScouts [Scouts from previous], with 84, 3300, and 36 candidate policies per component; DoScouting allocations include Team-A=2/B=0/C=0/Transport=4 and Team-A=1/B=1/C=0/Transport=4, …; DoTransport allocations include SafeRoute=1/Transport=3 and SafeRoute=2/Transport=4, …]
Helicopter Domain: Computational Savings
[Figure: number of nodes evaluated (left) and run time in secs on a log scale (right) vs. number of agents (3-10) in the helicopter domain, comparing NOPRUNE-OBS, NOPRUNE, MAXEXP, and NOFAIL]
NOPRUNE-OBS: no pruning, maintain full observation histories
NOPRUNE: no pruning, maintain beliefs rather than observation histories
MAXEXP: pruning using the MAXEXP heuristic, using beliefs
NOFAIL: MAXEXP enhanced with a “no failure” assumption for a quicker upper bound
Does RMTDP Improve Role Allocation?
[Figure: number of transports in the chosen allocation vs. number of agents (4-8), comparing RMTDP, COP, and MDP]
RoboCup Rescue: Computational Savings
[Figure: number of nodes evaluated and run time in secs on a log scale vs. number of ambulances (2-7), comparing MAXEXP and NOPRUNE]
RoboCup Rescue: RMTDP Improves Role Allocation
[Figure: civilian casualties and building damage under uniform and skewed civilian distributions]
SUMMARY
COM-MTDP & RMTDP: distributed POMDPs for analysis of TOPs (team plans, organizations, agents) and Teamcore proxies
- Combine “traditional” TOP approaches with distributed POMDPs
- Exploit POMDPs to improve TOPs/Teamcore proxies
- Exploit TOPs to constrain POMDP policy search
- Key policy-evaluation complexity results
Future Work
[Figure: agent-based simulation technology and visualization for a trainee]
Thank You
Contact:
Milind Tambe
http://teamcore.usc.edu/tambe
http://teamcore.usc.edu
Key Papers cited in this Presentation
Rajiv T. Maheswaran, Jonathan P. Pearce, and Milind Tambe. Distributed Algorithms for DCOP: A Graphical Game-Based Approach. Proceedings of the 17th International Conference on Parallel and Distributed Computing Systems (PDCS-2004).
Praveen Paruchuri, Milind Tambe, Fernando Ordonez, Sarit Kraus, Towards a formalization of teamwork with resource constraints, International Joint Conference on Autonomous Agents and Multiagent Systems, 2004.
Ranjit Nair, Maayan Roth, Makoto Yokoo and Milind Tambe. "Communication for Improving Policy Computation in Distributed POMDPs". In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-04), 2004.
Rajiv T. Maheswaran, Milind Tambe, Emma Bowring, Jonathan P. Pearce, Pradeep Varakantham "Taking DCOP to the Real World : Efficient Complete Solutions for Distributed Event Scheduling". In Proceedings of the third International Joint Conference on Agents and Multi Agent Systems, AAMAS-2004.
Modi, P.J., Shen, W., Tambe, M., Yokoo, M. “Solving Distributed Constraint Optimization Problems Optimally, Efficiently and Asynchronously” Artificial Intelligence Journal (accepted)
D. V. Pynadath and M. Tambe. Automated teamwork among heterogeneous software agents and humans. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), 7:71-100, 2003.
Nair, R., Tambe, M., Yokoo, M., Pynadath, D. and Marsella, S. Taming Decentralized POMDPs: Towards efficient policy computation for multiagent settings Proceedings of the International Joint conference on Artificial Intelligence (IJCAI), 2003
Nair, R., Tambe, M., and Marsella, S. Role allocation and reallocation in multiagent teams: Towards a practical analysis Proceedings of the second International Joint conference on agents and multiagent systems (AAMAS), 2003
Scerri, P., Johnson, L., Pynadath, D., Rosenbloom, P. Si, M., Schurr, N. and Tambe, M. A prototype infrastructure for distributed robot, agent, person teams Proceedings of the second International Joint conference on agents and multiagent systems (AAMAS), 2003
Scerri, P., Pynadath, D. and Tambe, M. Towards adjustable autonomy for the real world. Journal of AI Research (JAIR), Volume 17, Pages 171-228, 2002.
Pynadath, D. and Tambe, M. The communicative multiagent team decision problem: Analyzing teamwork theories and models Journal of AI Research (JAIR), 2002
Kaminka, G., Pynadath, D. and Tambe, M. Monitoring teams by overhearing: A multiagent plan-recognition approach. Journal of AI Research (JAIR), 2002.
All the Co-authors