Building Practical Agent Teams: A Hybrid Perspective
Milind Tambe
[email protected]
Computer Science Department
University of Southern California
Joint work with the Teamcore Group: http://teamcore.usc.edu
SBIA 2004
Long-Term Research Goal
Building large-scale heterogeneous teams
Types of entities: agents, people, sensors, resources, robots, …
Scale: 1000s or more
Domains: Highly uncertain, real-time, dynamic
Activities: Form teams, persist for long durations, coordinate, adapt…
Some applications:
- Large-scale disaster rescue
- Agent-facilitated human organizations
- Large-area security
- WWW agent proxies for people
- Information agents
- Ontology-based matchmakers
Domains and Motivations
[Figure: application domains plotted against team scale & complexity (small-scale homogeneous, small-scale heterogeneous, large-scale heterogeneous) and task & domain complexity (low, medium, high); examples include WWW agent proxies for people, information agents, and ontology-based matchmakers]
Motivation: BDI+POMDP Hybrids
[Figure: Teamcore proxies executing a TOP of team plans, organizations, and agents, e.g., Execute Rescue with sub-plans Extinguish Fires [Fire company], Rescue Civilians [Ambulance team], and Clear Roads [RAP team]; optimal policies computed using distributed partially observable Markov decision processes (POMDPs)]
• BDI approach
  Frameworks: Teamcore/Machinetta, GPGP, …
  +ve: Ease of use for human developers; coordinates large-scale teams
  -ve: Quantitative team evaluations difficult (given uncertainty/cost)
• Distributed POMDP approach
  Frameworks: MTDP, DEC-MDP/DEC-POMDP, POIPSG, …
  +ve: Quantitative evaluation of team performance easy (with uncertainty)
  -ve: Scale-up difficult; difficult for human developers to program policies
BDI + POMDP Synergy
[Figure: Teamcore proxies and the Execute Rescue team-oriented program again (Extinguish Fires [Fire company], Rescue Civilians [Ambulance team], Clear Roads [RAP team])]
Distributed POMDPs for TOP & proxy analysis and refinement
Combine “traditional” TOP approaches with distributed POMDPs:
- POMDPs improve TOP/proxies (e.g., role allocation and communication algorithms)
- TOP constrains POMDP policy search: orders-of-magnitude speedup
Overall Research Framework
Teamwork proxy infrastructure:

Coordination key algorithms (offline optimal | on-line approximate):
- Agent-agent: Adopt, asynchronous complete distributed constraint optimization (DCOP) (Modi et al. 03; Maheswaran et al. 04) | Equilibrium/threshold approaches (Okamoto 03; Maheswaran et al. 04)
- Agent-human (adjustable autonomy): Optimal transfer-of-control strategies via MDPs/POMDPs (Scerri et al. 02) | ?

Monitoring (explicit | implicit, via plan recognition):
- Agent-agent: BDI teamwork theories + decision-theoretic filter (Pynadath/Tambe 03) | Socially attentive monitoring (Kaminka et al. 00)
- Agent-human: ? | Monitoring by overhearing (Kaminka et al. 02)
Distributed POMDP analysis: Multiagent Team Decision Problem (MTDP) (Nair et al. 03b; Nair et al. 04; Paruchuri et al. 04)
Electric Elves: 24/7 from 6/00 to 12/00 (Chalupsky et al, IAAI’2001)
“More and more computers are ordering food … we need to think about marketing [to these computers]” (a local Subway owner)
[Figure: Electric Elves architecture: Teamcore proxies for each user and for agents (Meeting Maker, Scheduler agent, Interest Matcher, Papers), used to reschedule meetings, decide presenters, and order meals]
Modules within the Proxies: AA (Scerri, Pynadath and Tambe, JAIR’2002)
[Figure: each Teamcore proxy runs modules for communication, role allocation, and adjustable autonomy (MDPs for transfer-of-control policies) against a team-oriented program, e.g., a meeting with the role “user arrives on time” and the task “reschedule meetings”]
Autonomous and User Delays
[Figure: number of users' delayed meetings (0-16), comparing user delay vs. autonomous delay]
MDP policies: planned sequences of transfers of control and coordination changes, e.g., ADAH: ask, delay, ask, cancel
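The transfer-of-control idea can be sketched as a tiny finite-horizon MDP solved by backward induction. All of the states, probabilities, and rewards below are invented for illustration and are not the model from the paper; `solve` is a hypothetical helper.

```python
# Hypothetical finite-horizon MDP for transfer-of-control strategies:
# at each step before a deadline the proxy can Ask the user (who answers
# with probability p_respond), Act autonomously, or Delay the meeting.
# All numbers here are illustrative only.
T = 4            # decision points before the meeting
p_respond = 0.4  # chance the user answers when asked
R_USER, R_AGENT, C_DELAY = 10.0, 6.0, -1.0

def solve():
    """Backward induction: V[t] = best expected reward with t steps left."""
    V = [0.0] * (T + 1)
    policy = [None] * (T + 1)
    V[0], policy[0] = R_AGENT, "act"   # no time left: act autonomously
    for t in range(1, T + 1):
        ask = p_respond * R_USER + (1 - p_respond) * V[t - 1]
        act = R_AGENT
        delay = C_DELAY + V[t - 1]
        V[t], policy[t] = max((ask, "ask"), (act, "act"), (delay, "delay"))
    return V, policy

V, policy = solve()
# Read off the planned sequence of transfers of control, earliest step first.
strategy = [policy[t] for t in range(T, 0, -1)] + [policy[0]]
print(strategy, V[T])
```

With these numbers the induction yields an “ask repeatedly, then act autonomously at the deadline” strategy, the same flavor as the ADAH-style strategies above.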
Back to Hybrid BDI-POMDP Frameworks
Motivation: Communication in Proxies
Proxy's heuristic “BDI” communication rules, for example:
RULE 1 (“joint intentions”, Levesque et al. 90): If (fact F ∈ agent's private state) AND (F matches a goal of the team's plan) AND (F ∉ team state), Then create possible communicative goal CG to communicate F
RULE 2: If (possible communicative goal CG) AND (miscoordination cost > communication cost), Then communicate CG
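A minimal sketch of the two rules, assuming a simple set-based representation of facts, goals, and team state (the proxies' real representation is richer); all names and values are illustrative.

```python
# Minimal sketch of the two heuristic "BDI" communication rules above.
def communicative_goals(private_state, team_state, team_plan_goals):
    """RULE 1 (joint intentions): a privately known fact that matches a
    team-plan goal but is absent from the mutual team state becomes a
    candidate communicative goal CG."""
    return [f for f in private_state
            if f in team_plan_goals and f not in team_state]

def should_communicate(miscoordination_cost, communication_cost):
    """RULE 2: communicate CG only when the expected miscoordination
    cost outweighs the cost of communicating."""
    return miscoordination_cost > communication_cost

# Illustrative facts/goals for a rescue team.
private = {"fire-extinguished", "low-fuel"}
team = {"low-fuel"}
goals = {"fire-extinguished", "civilians-rescued"}
cgs = communicative_goals(private, team, goals)
sent = [cg for cg in cgs if should_communicate(5.0, 1.0)]
print(sent)  # ['fire-extinguished']
```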
Motivation: Earlier BDI Evaluation
                     Comm     NoComm
ISIS97-CMUnited97    3.27      1.73
ISIS97-Andhill97    -3.38     -4.36
ISIS98-CMUnited97    4.04      3.91
ISIS98-Andhill97    -1.53     -2.13
[Figure: helicopter domain results (x-axis 2-8, y-axis 0-600)]
Testing Communication Selectivity
(Pynadath & Tambe, JAAMAS’03)
Testing teamwork in RoboCup
(Tambe et al, IJCAI’99)
• Quantitative analysis of optimality, or of the complexity of an optimal response, is difficult
• A challenge in domains with significant uncertainty and costs
Distributed POMDPs
COM-MTDP (Pynadath & Tambe 02); RMTDP (Nair, Tambe & Marsella 03)
S: states of the world (e.g., helicopter position, enemy position)
Ai: actions of agent i (communication actions, domain actions)
P: state transition probabilities
R: reward, sub-divided based on action types
COM-MTDP: Analysis of Communication
Ωi: observations (e.g., E = enemy-on-radar, NE = enemy-not-on-radar)
O: probability of an observation given destination state & past action
Belief state: each Bi is a history of observations and messages
Individual policies: πiA : Bi → (domain action), πiΣ : Bi → (communication)
Goal: find joint policies πA and πΣ that maximize total expected reward
Example observation table (one per state and previous action; states Landmark1, Landmark2, …):

  Joint observation   (E,E)   (E,NE)   (NE,NE)   (NE,E)
  Probability          0.1     0.4      0.1       0.4
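A toy encoding of these ingredients, using the example joint-observation row above; the state and action names are illustrative, not from the model.

```python
# Toy encoding of the COM-MTDP observation function: one probability
# table per (state, previous action) pair over joint observations.
S = ["Landmark1", "Landmark2"]     # world states (illustrative)
OBS = ["E", "NE"]                  # E = enemy-on-radar

O = {("Landmark1", "move"): {("E", "E"): 0.1, ("E", "NE"): 0.4,
                             ("NE", "NE"): 0.1, ("NE", "E"): 0.4}}

row = O[("Landmark1", "move")]
# Each table row must be a probability distribution over joint observations.
assert abs(sum(row.values()) - 1.0) < 1e-9

# Marginal chance that agent 1 sees the enemy in this row:
p_e1 = sum(p for (o1, _), p in row.items() if o1 == "E")
print(p_e1)  # 0.5
```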
Complexity Results in COM-MTDP
                       Individual     Collective     Collective       No
                       observability  observability  partial obs.     observability
No communication       P-complete     NEXP-complete  NEXP-complete    NP-complete
General communication  P-complete     NEXP-complete  NEXP-complete    NP-complete
Full communication     P-complete     P-complete     PSPACE-complete  NP-complete
Addressing this complexity:
I. Locally optimal solution (no global team optimality)
II. Hybrid approach: POMDP + BDI
Approach I: Locally Optimal Policy (Nair et al 03)
Repeat until convergence to a local equilibrium, for each agent K:
- Fix the policies of all agents except K
- Find the optimal response policy for agent K
Finding the optimal response policy for agent K, given the fixed policies of the others:
- The problem becomes finding an optimal policy for a single-agent POMDP
- The “extended” state is the world state plus the other agents' observation histories, not the world state alone
- Define a new transition function and a new observation function over extended states
- Define a multiagent belief state
- Dynamic programming over belief states
- Significant speedup over exhaustive search, but problem size limited
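A minimal sketch of the alternating best-response loop on a toy one-shot joint-reward game; the actual approach iterates over full POMDP policies, and the payoff numbers here are invented.

```python
# Alternating best response: fix all agents but K, compute K's best
# response, repeat until no agent can improve (a local equilibrium).
R = {("scout", "scout"): 1, ("scout", "transport"): 5,
     ("transport", "scout"): 4, ("transport", "transport"): 0}
ACTIONS = ["scout", "transport"]

def best_response(k, joint):
    """Optimal choice for agent k given the other agent's fixed choice."""
    other = joint[1 - k]
    pair = lambda a: (a, other) if k == 0 else (other, a)
    return max(ACTIONS, key=lambda a: R[pair(a)])

joint = ["scout", "scout"]      # arbitrary initial joint policy
changed = True
while changed:                  # repeat until local equilibrium
    changed = False
    for k in (0, 1):
        br = best_response(k, joint)
        if br != joint[k]:
            joint[k], changed = br, True
print(joint, R[tuple(joint)])
```

Note that the loop settles on (transport, scout) with reward 4 even though (scout, transport) yields 5: a local equilibrium without global team optimality, exactly the caveat above.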
II: Hybrid BDI + POMDP
[Figure: hybrid loop: the domain and team-oriented program (communication, role allocation, adjustable-autonomy proxy algorithms) feed a distributed POMDP model that exploits the TOP; COM-MTDP evaluates alternate communication policies (adjustable autonomy held to a fixed action policy), derives locally and globally optimal communication policies, and feeds the results back to modify the proxy communication algorithms]
Compare Communication Policies over Different Domains
Given a domain, for different observability conditions & communication costs: evaluate Teamcore (RULE 1 + RULE 2), Jennings, and others; compare with the optimal policy
Teamcore: O(|S||Ω|)T
Distributed POMDPs to Analyze Role Allocations: RMTDP
Role Allocation: Illustration
Task: move cargo from X to Y; large reward for cargo at the destination
- Three routes with varying length and failure rates
- Scouts make a route safe for transports
Uncertainty: in actions and observations
- Scouts may fail along a route (and transports may replace scouts)
- Scouts' failure rate decreases if more scouts go to a route
- Scouts' failure may not be observable to transports
Team-Oriented Program
Organization hierarchy
Plan hierarchy
Best initial role allocation: how many helicopters in ScoutTeam A, B, C & Transport?
TOP: Almost entire RMTDP policy is completely fixed
Policy gap only on step 1: Best role allocation in initial state for each agent
Assume six helicopter agents: 84 combinations (84 RMTDP policies)
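The “84 combinations” count can be sanity-checked by enumerating the ways to split six helicopters among the four roles; the role names are from the slide, the enumeration itself is illustrative.

```python
# Enumerate allocations of 6 helicopters among ScoutTeam A, B, C and
# Transport: all (a, b, c, transport) with a + b + c + transport = 6.
from itertools import product

allocations = [(a, b, c, 6 - a - b - c)
               for a, b, c in product(range(7), repeat=3)
               if a + b + c <= 6]
print(len(allocations))  # 84
```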
Analyzing Role Allocation in Teamwork
[Figure: the domain and team-oriented program (role allocation, communication, adjustable-autonomy proxy algorithms) feed a distributed POMDP model; RMTDP evaluates alternate role-taking policies, searches the policy space for the optimal role-taking policy (filling in the gaps left in otherwise-fixed role-taking and role-execution policies at states S1, S2, S3, S4, S5, …), and feeds back a specific role allocation to the TOP]
RMTDP Policy Search: Efficiency Improvements
- Belief-based policy evaluation: not entire observation histories, only the beliefs required by the TOP
- Hierarchical policy groups for branch-&-bound search:
  - Obtain an upper bound on the values of the policies within a policy group
  - If individual policies are higher valued than a group, prune the group
  - Exploit the TOP for generating policy groups and for upper bounds
Policy evaluation recursion over joint observation histories (two agents shown):

V_t^{\pi}\big(s_t, \langle \vec{\omega}_t^1, \vec{\omega}_t^2 \rangle\big)
  = R\big(s_t, \pi^1(\vec{\omega}_t^1), \pi^2(\vec{\omega}_t^2)\big)
  + \sum_{s_{t+1}} P\big(s_{t+1} \mid s_t, \pi^1(\vec{\omega}_t^1), \pi^2(\vec{\omega}_t^2)\big)
    \sum_{\omega_{t+1}} O\big(s_{t+1}, \pi^1(\vec{\omega}_t^1), \pi^2(\vec{\omega}_t^2), \omega_{t+1}\big)
    \, V_{t+1}^{\pi}\big(s_{t+1}, \langle \vec{\omega}_t^1 \cdot \omega_{t+1}^1, \vec{\omega}_t^2 \cdot \omega_{t+1}^2 \rangle\big)

With beliefs, equivalent observation histories collapse: e.g., T=1: <Scout1 okay, Scout2 fail>; T=2: <Scout1 fail, Scout2 fail> and T=1: <Scout1 okay, Scout2 okay>; T=2: <Scout1 fail, Scout2 fail> both map to the TOP belief T=2: <CriticalFailure>
MaxExp: Hierarchical Policy Groups
[Figure: hierarchical policy-group tree for six helicopters: the root branches on the scout/transport split (1-5, 0-6, 2-4, 3-3, 4-2, …), and each split branches again on how the scouts are assigned to routes (e.g., 1-5 into 1-0-0 or 0-0-1; 2-4 into 1-1-0 or 0-0-2, …), with upper-bound values attached to the groups]
MaxExp: Upper-Bound Policy Group Value
- Obtain the max for each component over all start states & observation histories
- If each component were independent, each could be evaluated separately
- Dependence: the start of the next component depends on the end state of the previous one
Why the speedup:
- No duplicate start states: multiple paths of the previous component merge
- No duplicate observation histories
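The branch-and-bound search over policy groups can be sketched as follows; `maxexp_bound` stands in for the MaxExp component-wise upper bound, and the policy groups and values below are made-up numbers.

```python
# Branch-and-bound over hierarchical policy groups: keep the best fully
# evaluated policy, and prune any group whose upper bound cannot beat it.
def branch_and_bound(groups, evaluate, maxexp_bound):
    """groups: iterable of policy groups, each a list of concrete policies."""
    best_value, best_policy = float("-inf"), None
    for group in sorted(groups, key=maxexp_bound, reverse=True):
        if maxexp_bound(group) <= best_value:
            continue                      # prune the whole group
        for policy in group:              # expand surviving groups
            v = evaluate(policy)
            if v > best_value:
                best_value, best_policy = v, policy
    return best_policy, best_value

# Toy instance: policies are numbers, value = the number itself, and a
# group's upper bound is its true maximum (an admissible bound).
groups = [[3, 7], [10, 12], [5, 1]]
best, value = branch_and_bound(groups, evaluate=lambda p: p,
                               maxexp_bound=max)
print(best, value)  # 12 12
```

Sorting groups by upper bound first makes the strongest incumbent appear early, so the remaining groups (here both of them) are pruned without evaluating any of their policies.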
[Figure: component decomposition for the 2-scout/4-transport group: DoScouting [Scout=2; Transport=4] → DoTransport [Transports from previous] → RemainScouts [Scouts from previous], with 84, 3300, and 36 candidate policies per component; DoScouting allocations include Team-A=2/B=0/C=0/Transport=4 and Team-A=1/B=1/C=0/Transport=4, …; DoTransport allocations include SafeRoute=1/Transport=3 and SafeRoute=2/Transport=4, …]
Helicopter Domain: Computational Savings
[Figure: number of nodes evaluated (left) and run time in secs on a log scale (right) vs. number of agents (3-10) in the helicopter domain, comparing NOPRUNE-OBS, NOPRUNE, MAXEXP, and NOFAIL]
NOPRUNE-OBS: no pruning, maintain full observation histories
NOPRUNE: no pruning, maintain beliefs rather than observation histories
MAXEXP: pruning using the MAXEXP heuristic, using beliefs
NOFAIL: MAXEXP enhanced with a “no failure” assumption for a quicker upper bound
Does RMTDP Improve Role Allocation?
[Figure: number of transports in the chosen allocation vs. number of agents (4-8), comparing RMTDP, COP, and MDP]
RoboCup Rescue: Computational Savings
[Figure: number of nodes evaluated and run time in secs on a log scale vs. number of ambulances (2-7), comparing MAXEXP and NOPRUNE]
RoboCup Rescue: RMTDP Improves Role Allocation
[Figure: civilian casualties and building damage under uniform and skewed civilian distributions]
SUMMARY
COM-MTDP & RMTDP: distributed POMDPs for analysis of TOPs (team plans, organizations, agents) and Teamcore proxies
- Combine “traditional” TOP approaches with distributed POMDPs
- Exploit POMDPs to improve TOPs/Teamcore proxies
- Exploit TOPs to constrain POMDP policy search
- Key policy-evaluation complexity results
Future Work
[Figure: agent-based simulation technology and visualization for a trainee]
Thank You
Contact:
Milind Tambe
http://teamcore.usc.edu/tambe
http://teamcore.usc.edu
Key Papers cited in this Presentation
Rajiv T. Maheswaran, Jonathan P. Pearce, and Milind Tambe. Distributed Algorithms for DCOP: A Graphical Game-Based Approach. Proceedings of the 17th International Conference on Parallel and Distributed Computing Systems (PDCS-2004).
Praveen Paruchuri, Milind Tambe, Fernando Ordonez, Sarit Kraus, Towards a formalization of teamwork with resource constraints, International Joint Conference on Autonomous Agents and Multiagent Systems, 2004.
Ranjit Nair, Maayan Roth, Makoto Yokoo and Milind Tambe. "Communication for Improving Policy Computation in Distributed POMDPs". In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-04), 2004.
Rajiv T. Maheswaran, Milind Tambe, Emma Bowring, Jonathan P. Pearce, Pradeep Varakantham "Taking DCOP to the Real World : Efficient Complete Solutions for Distributed Event Scheduling". In Proceedings of the third International Joint Conference on Agents and Multi Agent Systems, AAMAS-2004.
Modi, P.J., Shen, W., Tambe, M., Yokoo, M. “Solving Distributed Constraint Optimization Problems Optimally, Efficiently and Asynchronously” Artificial Intelligence Journal (accepted)
D. V. Pynadath and M. Tambe. Automated teamwork among heterogeneous software agents and humans. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), 7:71-100, 2003.
Nair, R., Tambe, M., Yokoo, M., Pynadath, D. and Marsella, S. Taming Decentralized POMDPs: Towards efficient policy computation for multiagent settings Proceedings of the International Joint conference on Artificial Intelligence (IJCAI), 2003
Nair, R., Tambe, M., and Marsella, S. Role allocation and reallocation in multiagent teams: Towards a practical analysis Proceedings of the second International Joint conference on agents and multiagent systems (AAMAS), 2003
Scerri, P., Johnson, L., Pynadath, D., Rosenbloom, P. Si, M., Schurr, N. and Tambe, M. A prototype infrastructure for distributed robot, agent, person teams Proceedings of the second International Joint conference on agents and multiagent systems (AAMAS), 2003
Scerri, P., Pynadath, D. and Tambe, M. Towards adjustable autonomy for the real world. Journal of AI Research (JAIR), Volume 17, Pages 171-228, 2002.
Pynadath, D. and Tambe, M. The communicative multiagent team decision problem: Analyzing teamwork theories and models Journal of AI Research (JAIR), 2002
Kaminka, G., Pynadath, D. and Tambe, M. Monitoring teams by overhearing: A multiagent plan-recognition approach. Journal of AI Research (JAIR), 2002.
All the Co-authors