Towards A Formalization Of Teamwork With Resource Constraints

Praveen Paruchuri, Milind Tambe, Fernando Ordonez, University of Southern California. Sarit Kraus, Bar-Ilan University, Israel. University of Maryland, College Park. December 2003.


Transcript of Towards A Formalization Of Teamwork With Resource Constraints

Page 1: Towards A Formalization Of Teamwork With Resource Constraints

University of Southern California

Towards A Formalization Of Teamwork With Resource Constraints

Praveen Paruchuri, Milind Tambe, Fernando Ordonez, University of Southern California

Sarit Kraus, Bar-Ilan University, Israel

University of Maryland, College Park

December 2003

Page 2

Motivation: Teamwork with Resource Constraints

Agent teams: agents maximize team reward while also keeping resource consumption limited.

E.g., limited communication bandwidth, limited battery power, etc.

Example domains:

Sensor net agents: limited replenishable energy.

Mars rovers: limited energy for each daily activity.

Page 3

Framework & Context

Framework for agent teams with resource constraints in complex, dynamic environments.

Resource constraints are soft, not “hard”:

It is acceptable for a sensor to exceed its energy threshold when needed.

It is acceptable for a Mars rover to exceed its allocated energy once in a while for a regular activity.

Context:

                           Single agent    Multi agent
No resource constraints    MDP, POMDP      MTDP
With resource constraints  CMDP            ???

Page 4

Our Contributions

Extended MTDP (EMTDP): a distributed MDP framework.

EMTDP ≠ CMDP with many agents: policy randomization in a CMDP causes miscoordination in teams.

Algorithm for transforming the conjoined EMTDP (the initial formulation, dealing with joint actions) into the actual EMTDP (reasoning about individual actions).

Proof of equivalence between different transformations.

Solution algorithm for the actual EMTDP.

Maximize expected team reward while bounding expected resource consumption.

Page 5

EMTDP: Formally Defined

An EMTDP (for the two-agent case) is a tuple $\langle S, A, P, R, C^1, C^2, T^1, T^2, N, Q \rangle$, where:

$S, A, P, R$: as defined in the MTDP.

$C^1 = [c^1_{i\hat{a}k}]$: vector of costs of resource $k$ for joint action $\hat{a}$ in state $i$, for agent 1 ($C^2$ is defined analogously for agent 2).

$T^1 = [t^1_k]$: threshold on expected consumption of resource $k$ ($T^2$ analogously for agent 2).

$N = [n_{i\hat{a}}]$: vector of joint communication costs for joint action $\hat{a}$ in state $i$.

$Q$: threshold on communication costs.

Simplifying assumptions: individual observability (no POMDPs); two agents.
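To make the tuple concrete, here is a minimal Python sketch of a two-agent EMTDP container. The names `TwoAgentEMTDP`, `expected_totals`, and `satisfies_thresholds` are hypothetical, and the sketch assumes expected consumption is computed from occupation measures `x[(i, a)]`, the expected number of times joint action a is executed in state i (the variables of the linear program used to solve the conjoined EMTDP). The thresholds and the zero communication cost in the example are made up; the reward and costs for (S1, a2b2) are taken from the conjoined EMTDP example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
JointAction = str  # e.g. "a1b2": agent 1 plays a1, agent 2 plays b2
Key = Tuple[State, JointAction]


@dataclass
class TwoAgentEMTDP:
    """Hypothetical container mirroring <S, A, P, R, C1, C2, T1, T2, N, Q>."""
    S: List[State]
    A: List[JointAction]
    P: Dict[Key, Dict[State, float]]  # P[(i, a)][j] = p^a_ij (not used by the checks)
    R: Dict[Key, float]               # R[(i, a)] = r_ia
    C1: Dict[Key, List[float]]        # C1[(i, a)][k] = c^1_iak
    C2: Dict[Key, List[float]]        # C2[(i, a)][k] = c^2_iak
    T1: List[float]                   # T1[k] = t^1_k
    T2: List[float]                   # T2[k] = t^2_k
    N: Dict[Key, float]               # N[(i, a)] = n_ia
    Q: float

    def expected_totals(self, x: Dict[Key, float]):
        """Expected reward, per-resource costs of each agent, and communication
        cost, given x[(i, a)] = expected executions of joint action a in state i."""
        reward = sum(v * self.R[key] for key, v in x.items())
        cost1 = [sum(v * self.C1[key][k] for key, v in x.items()) for k in range(len(self.T1))]
        cost2 = [sum(v * self.C2[key][k] for key, v in x.items()) for k in range(len(self.T2))]
        comm = sum(v * self.N[key] for key, v in x.items())
        return reward, cost1, cost2, comm

    def satisfies_thresholds(self, x: Dict[Key, float]) -> bool:
        """True if every expected cost is within its threshold (bounds hold in expectation)."""
        _, cost1, cost2, comm = self.expected_totals(x)
        return (all(c <= t for c, t in zip(cost1, self.T1))
                and all(c <= t for c, t in zip(cost2, self.T2))
                and comm <= self.Q)


# Hypothetical thresholds T1, T2, Q and communication cost N.
example = TwoAgentEMTDP(
    S=["S1"], A=["a2b2"], P={},
    R={("S1", "a2b2"): 9.0},
    C1={("S1", "a2b2"): [7.0]}, C2={("S1", "a2b2"): [7.0]},
    T1=[5.0], T2=[5.0], N={("S1", "a2b2"): 0.0}, Q=1.0,
)
```

Executing a2b2 once in expectation (x = 1.0) gives an expected cost of 7 per agent, above the hypothetical threshold of 5; with x = 0.5 the expected cost drops to 3.5 and the expected reward to 4.5, which satisfies the bounds.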

Page 6

Conjoined EMTDP – Simple example

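Two-agent case

[Figure: a conjoined EMTDP with states S1 to S7. Edges are labeled with joint actions and their transition probabilities (e.g., a1b2 = 0.9, a2b1 = 0.7, a1b1 = 1, a2b2 = 1). For joint action a2b2 in state S1: R(S1, a2b2) = 9, C1(S1, a2b2) = 7, C2(S1, a2b2) = 7.]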

Page 7

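Linear Program: Solving the Conjoined EMTDP

$$\max \sum_{i}\sum_{\hat{a}} x_{i\hat{a}}\, r_{i\hat{a}}$$

subject to

$$\sum_{\hat{a}} x_{j\hat{a}} - \sum_{i}\sum_{\hat{a}} x_{i\hat{a}}\, p^{\hat{a}}_{ij} = \alpha_j \quad \forall j$$

$$x_{i\hat{a}} \ge 0 \quad \forall i, \hat{a}$$

$$\sum_{i}\sum_{\hat{a}} x_{i\hat{a}}\, c^1_{i\hat{a}k} \le t^1_k \quad \forall k$$

$$\sum_{i}\sum_{\hat{a}} x_{i\hat{a}}\, c^2_{i\hat{a}k} \le t^2_k \quad \forall k$$

$$\sum_{i}\sum_{\hat{a}} x_{i\hat{a}}\, n_{i\hat{a}} \le Q$$

Here $x_{i\hat{a}}$ is the expected number of times joint action $\hat{a}$ is executed in state $i$, and $\alpha_j$ is the probability of starting in state $j$. The objective and the first two constraints form the standard LP for solving an MDP (maximizing reward); the last three handle the resource and communication constraints. For example, the third constraint says that the expected cost of resource $k$ for agent 1, summed over all states and joint actions, is at most $t^1_k$.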
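The $x_{i\hat{a}}$ variables are occupation measures. As a minimal sketch (the two-state example, its numbers, and the function names are hypothetical, not from the slides), the Python below computes $x$ for a fixed randomized policy in an episodic MDP where any probability mass not assigned to a next state ends the episode, checks the flow-conservation constraint of the LP, and reads the policy back out as $\pi(\hat{a} \mid i) = x_{i\hat{a}} / \sum_{\hat{a}'} x_{i\hat{a}'}$.

```python
def occupation_measures(alpha, policy, P, sweeps=100):
    """x[(i, a)]: expected number of times joint action a is executed in state i.

    alpha:  start distribution {state: prob}
    policy: stationary randomized policy {state: {joint_action: prob}}
    P:      transitions {(i, a): {j: prob}}; mass not listed ends the episode
    Repeats visits_j = alpha_j + sum_{i,a} visits_i * pi(a|i) * p^a_ij, which
    converges when the episode ends with probability 1 (exactly, for acyclic chains).
    """
    visits = dict(alpha)
    for _ in range(sweeps):
        new = {i: alpha.get(i, 0.0) for i in policy}
        for i, acts in policy.items():
            for a, pa in acts.items():
                for j, pij in P.get((i, a), {}).items():
                    new[j] += visits.get(i, 0.0) * pa * pij
        visits = new
    return {(i, a): visits.get(i, 0.0) * pa
            for i, acts in policy.items() for a, pa in acts.items()}


def flow_residuals(x, alpha, P, states):
    """LHS minus RHS of sum_a x_ja - sum_{i,a} x_ia p^a_ij = alpha_j, for each state j."""
    res = {}
    for j in states:
        out_j = sum(v for (i, _), v in x.items() if i == j)
        in_j = sum(v * P.get(key, {}).get(j, 0.0) for key, v in x.items())
        res[j] = out_j - in_j - alpha.get(j, 0.0)
    return res


def policy_from_x(x):
    """Recover pi(a|i) = x_ia / sum_a' x_ia' for states with positive flow."""
    totals = {}
    for (i, _), v in x.items():
        totals[i] = totals.get(i, 0.0) + v
    return {(i, a): v / totals[i] for (i, a), v in x.items() if totals[i] > 0}


# Hypothetical two-state example (numbers are illustrative, not from the slides).
alpha = {"S1": 1.0}
policy = {"S1": {"a1b2": 0.4, "a2b1": 0.6}, "S2": {"a1b1": 1.0}}
P = {("S1", "a1b2"): {"S2": 0.5}, ("S1", "a2b1"): {"S2": 1.0}}  # S2 then ends the episode
x = occupation_measures(alpha, policy, P)
```

Here a1b2 reaches S2 half the time and a2b1 always does, so S2 is visited 0.4 · 0.5 + 0.6 = 0.8 times in expectation. The residuals are zero, so this x satisfies the flow constraints, and `policy_from_x` returns the 0.4/0.6 split in S1.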

Page 8

Sample LP solution

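LP solution:

VISITED(X11) = 0.000000: a1b1 executed 0% of the time
VISITED(X12) = 0.3653846: a1b2 executed about 36% of the time (≈ 9/25)
VISITED(X13) = 0.6346154: a2b1 executed about 64% of the time (≈ 16/25)
VISITED(X14) = 0.000000: a2b2 executed 0% of the time

If each agent randomizes independently over its own actions (agent A: a1 with 9/25, a2 with 16/25; agent B: b1 with 16/25, b2 with 9/25), the joint actions are actually executed with these probabilities:

              b1 (16/25)          b2 (9/25)
a1 (9/25)     144/625 ≈ 0.23      81/625 ≈ 0.13
a2 (16/25)    256/625 ≈ 0.41      144/625 ≈ 0.23

a1b1 and a2b2 should have had probability 0 (miscoordination).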
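The miscoordination table can be reproduced exactly with Python's `fractions` module. This minimal sketch uses the slide's rounded values 9/25 and 16/25 and assumes, as the slide does, that each agent samples its individual action independently from its own marginal; `marginals` and `independent_execution` are illustrative names.

```python
from fractions import Fraction
from itertools import product

# LP solution for the joint actions, as rounded on the slide.
joint = {("a1", "b1"): Fraction(0), ("a1", "b2"): Fraction(9, 25),
         ("a2", "b1"): Fraction(16, 25), ("a2", "b2"): Fraction(0)}


def marginals(joint):
    """Each agent's probability of its own individual actions."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return pa, pb


def independent_execution(joint):
    """Joint distribution when each agent samples only from its own marginal."""
    pa, pb = marginals(joint)
    return {(a, b): pa[a] * pb[b] for a, b in product(pa, pb)}


executed = independent_execution(joint)
for (a, b), p in sorted(executed.items()):
    print(f"{a}{b}: {p} (LP intended {joint[(a, b)]})")
```

Running it prints:

```
a1b1: 144/625 (LP intended 0)
a1b2: 81/625 (LP intended 9/25)
a2b1: 256/625 (LP intended 16/25)
a2b2: 144/625 (LP intended 0)
```

so 288/625 ≈ 0.46 of the probability mass lands on joint actions the LP never chose.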

Page 9

Conjoined to Actual EMTDP: Transformation

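[Figure: transformation of original state S1, whose joint actions are $a_1b_1, \dots, a_1b_n, \dots, a_mb_1, \dots, a_mb_n$. For each individual action $a_i$ of agent A ($i = 1, \dots, m$), S1 gets a communication action leading to a new state $A^i_c$ (cost $N + C(a_i)$) and a no-communication action leading to a new state $A^i_o$ (cost $C(a_i)$). From each new state, agent B's actions $b_1, \dots, b_n$ lead on to the original target states (transition probabilities $P_f$); the corresponding LP variables are $X^c_{i1}, \dots, X^c_{in}$ and $X^o_{i1}, \dots, X^o_{in}$.]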

For each state, for each joint action:

Introduce a communication action and a no-communication action for each distinct individual action of agent A, and add the corresponding new states ($A^1_c, A^1_o, \dots, A^m_c, A^m_o$).

Introduce transitions between the original state and the new states.

Introduce transitions between the new states and the original target states.
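The steps above can be sketched for a single state in Python. This is a hypothetical encoding, not the authors' algorithm: `transform_state`, the `"S1/a1/c"` naming scheme, and the example costs are assumptions, and the cost assignment (communication edge $N + C(a_i)$, no-communication edge $C(a_i)$) follows the figure's edge labels.

```python
def transform_state(state, joint, comm_cost, cost_a):
    """Expand one original state of a conjoined EMTDP (hypothetical encoding).

    joint:     {(a, b): {target_state: prob}} for the joint actions available in `state`
    comm_cost: communication cost N
    cost_a:    {a: C(a)}, resource cost of agent A's individual action a
    Returns (from_state, from_new):
      from_state[(a, mode)] = (new_state, cost); mode "c" = communicate, "o" = no communication
      from_new[new_state][b] = {target_state: prob} for agent B's actions in the new state
    """
    from_state, from_new = {}, {}
    for a in sorted({a for (a, _) in joint}):
        for mode, extra in (("c", comm_cost), ("o", 0.0)):
            new_state = f"{state}/{a}/{mode}"
            from_state[(a, mode)] = (new_state, cost_a[a] + extra)
            # B's action b in the new state reproduces the original joint transition for (a, b).
            from_new[new_state] = {b: dict(t) for (a2, b), t in joint.items() if a2 == a}
    return from_state, from_new


# Hypothetical 2x2 example: each joint action in S1 leads deterministically to a target.
joint_S1 = {("a1", "b1"): {"S2": 1.0}, ("a1", "b2"): {"S3": 1.0},
            ("a2", "b1"): {"S4": 1.0}, ("a2", "b2"): {"S5": 1.0}}
from_state, from_new = transform_state("S1", joint_S1, comm_cost=2.0,
                                       cost_a={"a1": 1.0, "a2": 3.0})
```

With m = 2 actions for agent A this yields four new states. The sketch only builds the graph: the fact that agent B can tell which $A^i_c$ it is in but cannot distinguish the $A^i_o$ states is not encoded here, and that gap is what the non-linear constraints address.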

Page 10

Non-linear Constraints

Need to introduce non-linear constraints:

For each original state, for each new state introduced by a no-communication action, set the conditional probabilities of corresponding actions equal. For example:

$$P(b_1 \mid A^1_o) = P(b_1 \mid A^2_o) = \dots = P(b_1 \mid A^m_o) \;\wedge\; \dots \;\wedge\; P(b_n \mid A^1_o) = P(b_n \mid A^2_o) = \dots = P(b_n \mid A^m_o)$$

$A^1_c, A^2_c, \dots, A^m_c$: observable, reached by communication actions.

$A^1_o, A^2_o, \dots, A^m_o$: unobservable, reached by no-communication actions.
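Why these constraints are non-linear: in terms of the LP variables, $P(b \mid A^i_o) = x_{A^i_o b} / \sum_{b'} x_{A^i_o b'}$, so clearing denominators turns each equality into $x_{A^1_o b} \sum_{b'} x_{A^2_o b'} = x_{A^2_o b} \sum_{b'} x_{A^1_o b'}$, a product of LP variables (bilinear). The Python below is a hypothetical checker for that cross-multiplied form, not the authors' solver; the state names and x values are made up.

```python
from itertools import combinations


def equal_conditionals_violation(x, nocomm_states, b_actions):
    """Largest |x[s1,b] * sum_b' x[s2,b'] - x[s2,b] * sum_b' x[s1,b']| over all pairs
    of no-communication states (s1, s2) and all of agent B's actions b.
    Zero means P(b | s) is the same in every no-communication state."""
    totals = {s: sum(x.get((s, b), 0.0) for b in b_actions) for s in nocomm_states}
    worst = 0.0
    for s1, s2 in combinations(nocomm_states, 2):
        for b in b_actions:
            lhs = x.get((s1, b), 0.0) * totals[s2]
            rhs = x.get((s2, b), 0.0) * totals[s1]
            worst = max(worst, abs(lhs - rhs))
    return worst


states = ["A1o", "A2o"]
actions = ["b1", "b2"]
# B plays b1 with probability 0.3 in both states: constraints satisfied.
x_ok = {("A1o", "b1"): 0.09, ("A1o", "b2"): 0.21, ("A2o", "b1"): 0.21, ("A2o", "b2"): 0.49}
# B's action depends on A's unobserved choice: constraints violated.
x_bad = {("A1o", "b1"): 0.3, ("A1o", "b2"): 0.0, ("A2o", "b1"): 0.0, ("A2o", "b2"): 0.7}
```

`x_bad` is the coordinated plan the conjoined LP would like (a1 with b1, a2 with b2), but B cannot carry it out without knowing A's choice; the checker reports a violation of about 0.21 for it and essentially zero for `x_ok`. A state with zero incoming flow satisfies the cross-multiplied form trivially.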

Page 11

Reason for non-linear constraints

If no-communication (NC) actions are taken, agent B has no hint of which state it is in, so its actions must be independent of the source state: the probability of action b1 from state $A^1_o$ should equal the probability of the same action b1 from $A^2_o$. Making the actions independent of the state avoids miscoordination.

Transformation example: (figure showing states $A^1_o$ and $A^2_o$; not reproduced).

Page 12

Experimental Results

Fig 1 (figure not reproduced; its labels include S1 - 0 and S9 - 8).

Page 13

Experiments: Example Domain 2

Domain 1: comparing expected rewards

Comm threshold | Conjoined | Deterministic | Miscoordination | EMTDP
0              | 10.55     | 0             | No reward       | 6.99
3              | 10.55     | 0             | No              | 8.91
6              | 10.55     | 0             | No              | 10.55

(Miscoordination resulted in violating resource constraints.)

Domain 2:

A team of two rovers and several scientists using them.

Each scientist has a daily routine of observations.

A rover can use a limited amount of energy in serving a scientist.

Experiment conducted: observe Martian rocks. The rovers maximize observation output within the energy budget provided.

Soft constraint: exceeding the energy budget on a given day is not catastrophic, but overutilizing frequently affects other scientists' work.

Uncertainty: only a 0.75 chance of succeeding in an observation.

The EMTDP had about 180 states, 1500 variables, and 40 non-linear constraints; problems of this size were solved in under 20 seconds.

Page 14

Summary and Future Work

Novel formalization of teamwork with resource constraints: maximize expected team reward while bounding expected resource consumption.

Provided an EMTDP formulation in which agents avoid miscoordination even with randomized policies.

Proved the equivalence of different EMTDP transformation strategies (see paper for details).

Introduced non-linear constraints.

Future work: determine the complexity of the approach; experiment with the n-agent case; extend the work to partially observable domains.

Page 15

Thank You

Any Questions ???