Increasing Security through Communication and Policy Randomization in Multiagent Systems

Praveen Paruchuri, Milind Tambe, Fernando Ordonez (University of Southern California)
Sarit Kraus (Bar-Ilan University, Israel; University of Maryland, College Park)

Page 1:

Increasing Security through Communication and Policy Randomization in Multiagent Systems

Praveen Paruchuri, Milind Tambe, Fernando Ordonez

University of Southern California

Sarit Kraus

Bar-Ilan University, Israel

University of Maryland, College Park

Page 2:

Motivation: The Prediction Game

A UAV (Unmanned Aerial Vehicle) flies between the 4 regions.

Can you predict the UAV flight pattern?

Pattern 1: 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, ...
Pattern 2: 1, 4, 3, 1, 1, 4, 2, 4, 2, 3, 4, 3, ... (as generated by a 4-sided die)
Can you predict pattern 2 even if 100 numbers are given?

Randomization decreases predictability, which increases security.

[Figure: the four patrol regions, Region 1 through Region 4]
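The prediction game can be made concrete by measuring how uncertain the next region is given the current one. A minimal sketch (the sequences, the seed, and the helper names are illustrative assumptions, not from the slides):

```python
import random
from collections import Counter
from math import log2

def empirical_entropy(seq):
    """Shannon entropy (bits) of the empirical symbol distribution."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def next_region_entropy(seq):
    """Average entropy of the next region given the current region."""
    by_prev = {}
    for prev, nxt in zip(seq, seq[1:]):
        by_prev.setdefault(prev, []).append(nxt)
    n = len(seq) - 1
    return sum(len(v) / n * empirical_entropy(v) for v in by_prev.values())

# Pattern 1: the fixed cycle 1,2,3,4 repeated -- fully predictable.
pattern1 = [(i % 4) + 1 for i in range(100)]

# Pattern 2: each region drawn uniformly, like rolling a 4-sided die.
rng = random.Random(0)
pattern2 = [rng.randint(1, 4) for _ in range(100)]

print(next_region_entropy(pattern1))  # 0.0 bits: the next region is certain
print(next_region_entropy(pattern2))  # close to 2 bits: maximally uncertain
```

Even after seeing 100 numbers of pattern 2, an observer gains no ability to predict the next region; for pattern 1, one observation suffices.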

Page 3:

Problem Definition

Problem: increase security by decreasing predictability for an agent team acting in adversarial environments.

– Even if the policy is given to the adversary, it remains secure
– The environment is stochastic and observable (MDP-based)
– Communication is a limited resource
– Goal: efficient algorithms for the reward/randomization/communication tradeoff

Page 4:

Assumptions

Assumptions for the agent team: the adversary is unobservable

– The adversary's actions, capabilities, and payoffs are unknown
– Communication is encrypted (safe)

Assumptions for the adversary:

– Knows the agents' plan/policy
– Exploits action predictability
– Can see the agents' state

Page 5:

Solution Technique

Technique developed: intentional policy randomization in a CMDP-based framework:

– Sequential decision making
– Limited communication resources
– CMDP: Constrained Markov Decision Process

Increasing security means solving a multi-criteria problem for the agents:

– Maximize action unpredictability (policy randomization)
– Maintain reward above a threshold (quality constraints)
– Keep communication usage below a threshold (resource constraints)

Page 6:

Domains

Scheduled activities at airports (security checks, refueling, etc.) can be observed by adversaries; randomizing the schedules is helpful.

A UAV team patrols a humanitarian mission, and an adversary tries to disrupt the mission:

– can disrupt food supplies, harm refugees, shoot down UAVs, etc.

Randomize the UAV patrol policy.

Page 7:

Our Contributions

1. Randomized policies for multiagent CMDPs (MCMDPs)

2. Solving miscoordination caused by randomized policies in team settings

– Naively randomized policies are not implementable!
(the reward constraint gets violated)

[Diagram: maximize policy randomization, subject to expected team reward > threshold and communication resource usage < threshold]

Page 8:

Miscoordination: Effect of Randomization

Meeting tomorrow: 9am with probability 40%, 10am with probability 60%.

The agents communicate to coordinate, but communication is limited. If both randomize independently, the joint probabilities are:

Agent 1 \ Agent 2   9am (.4)   10am (.6)
9am (.4)            .16        .24
10am (.6)           .24        .36

The off-diagonal entries (miscoordination) should have been 0; they violate the reward threshold.
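The table above is just the product of the two marginals; a quick sketch makes the miscoordination mass explicit (the dictionary names are illustrative):

```python
# Marginal meeting-time policies from the slide: 9am with .4, 10am with .6.
p_agent1 = {"9am": 0.4, "10am": 0.6}
p_agent2 = {"9am": 0.4, "10am": 0.6}

# Without communication the agents randomize independently, so each joint
# outcome has probability equal to the product of the marginals.
joint = {(t1, t2): p1 * p2
         for t1, p1 in p_agent1.items()
         for t2, p2 in p_agent2.items()}

# Probability that the agents pick different times (the off-diagonal mass).
miscoordination = sum(p for (t1, t2), p in joint.items() if t1 != t2)

print(joint[("9am", "9am")])  # 0.16 (up to float rounding)
print(miscoordination)        # 0.48 -- should be 0 for a coordinated team
```

Nearly half of the probability mass lands on miscoordinated outcomes, which is why independent randomization violates the team's reward threshold.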

Page 9:

Communication Issue

Goal: generate randomized yet implementable policies under limited communication.

The communication problem: M coordination points and N units of communication. Generate the best communication policy; the communication policy itself can also be randomized.

Approach: transform the MCMDP into an implementable MCMDP, then apply a solution algorithm to the transformed MCMDP.

Page 10:

MCMDP: Formally Defined

An MCMDP (for the 2-agent case) is a tuple <S, A, P, R, C1, C2, T1, T2, N, Q> where:

– S, A, R: joint states, actions, rewards
– P: transition function
– Ck: cost vector for resource k
– Tk: threshold on expected consumption of resource k
– N: joint communication cost vector
– Q: threshold on communication costs

Basic terms used:

– x(s,a): expected number of times action a is taken in state s
– Policy (as a function of x):

\hat{\pi}(s,a) = \frac{x(s,a)}{\sum_{a' \in A} x(s,a')}
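The policy-recovery formula can be stated directly in code. A small sketch, with made-up flow values x(s,a) for a hypothetical 2-state, 2-action example:

```python
def policy_from_flow(x):
    """pi_hat(s, a) = x(s, a) / sum over a' of x(s, a')."""
    pi = {}
    for s, flows in x.items():
        total = sum(flows.values())
        if total > 0:  # states with zero flow have no defined policy
            pi[s] = {a: v / total for a, v in flows.items()}
    return pi

# Hypothetical expected visit counts x(s, a); not from the slides.
x = {"s1": {"a1": 0.3, "a2": 0.1},
     "s2": {"a1": 0.25, "a2": 0.35}}

pi = policy_from_flow(x)
print(pi["s1"])  # a1 taken 75% of the time in s1, a2 25%
```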

Page 11:

Entropy: Measure of Randomness

Randomness (information content) is quantified using entropy (Shannon, 1948). Two entropy measures for a CMDP:

Additive entropy: add the entropies of the individual states:

H_A(x) = -\sum_{s \in S} \sum_{a \in A} \hat{\pi}(s,a) \log \hat{\pi}(s,a)

Weighted entropy: weigh each state's entropy by its contribution to the total flow, where \alpha_j is the initial flow of the system:

H_W(x) = -\sum_{s \in S} \frac{\sum_{a \in A} x(s,a)}{\sum_{j \in S} \alpha_j} \sum_{a \in A} \hat{\pi}(s,a) \log \hat{\pi}(s,a)
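Both measures can be sketched directly from the definitions (the toy flows and the initial-flow values alpha below are invented for illustration):

```python
from math import log

def policy_from_flow(x):
    """pi_hat(s, a) = x(s, a) / sum over a' of x(s, a')."""
    return {s: {a: v / sum(f.values()) for a, v in f.items()}
            for s, f in x.items() if sum(f.values()) > 0}

def state_entropy(dist):
    """Entropy (nats) of one state's action distribution."""
    return -sum(p * log(p) for p in dist.values() if p > 0)

def additive_entropy(x):
    """H_A(x): add the entropies of the individual states."""
    pi = policy_from_flow(x)
    return sum(state_entropy(d) for d in pi.values())

def weighted_entropy(x, alpha):
    """H_W(x): weigh each state's entropy by its share of the total flow."""
    pi = policy_from_flow(x)
    total_alpha = sum(alpha.values())
    return sum(sum(x[s].values()) / total_alpha * state_entropy(pi[s])
               for s in pi)

# Toy flows: s1 is uniform over its two actions, s2 is deterministic.
x = {"s1": {"a1": 0.5, "a2": 0.5}, "s2": {"a1": 1.0, "a2": 0.0}}
alpha = {"s1": 1.0, "s2": 1.0}  # hypothetical initial flows

print(additive_entropy(x))         # log(2): only the uniform state contributes
print(weighted_entropy(x, alpha))  # 0.5 * log(2): s1 carries half the flow
```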

Page 12:

Issue 1: Randomized Policy Generation

Non-linear program: maximize entropy, keep reward above threshold, keep communication below threshold:

max H_W(x)
s.t. Ax = \alpha
     Rx \ge R_{min}
     Nx \le Q
     x \ge 0

This obtains the required randomization, but it appends communication to every action.

Issue 2: Generate the Communication Policy
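The structure of this program can be illustrated on a one-state toy instance, where a coarse grid search stands in for a real nonlinear solver (the rewards and the threshold below are invented):

```python
from math import log

# One state, two actions; choose action probabilities p = (p1, 1 - p1)
# to maximize entropy subject to an expected-reward threshold.
rewards = (10.0, 1.0)  # hypothetical per-action rewards
R_MIN = 7.0            # hypothetical reward threshold

def entropy(p):
    return -sum(q * log(q) for q in p if q > 0)

best_p, best_h = None, -1.0
for i in range(1001):  # grid search in place of an NLP solver
    p = (i / 1000, 1 - i / 1000)
    expected_reward = sum(pi * r for pi, r in zip(p, rewards))
    if expected_reward >= R_MIN and entropy(p) > best_h:
        best_p, best_h = p, entropy(p)

# The unconstrained entropy maximizer (0.5, 0.5) violates the reward
# constraint, so the optimum sits where the constraint binds: p1 = 2/3.
print(best_p)
```

The tradeoff is visible directly: lowering R_MIN lets the solution move closer to uniform, raising entropy, exactly the reward/randomization tradeoff the slides describe.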

Page 13:

Issue 2: Transformed MCMDP

[Figure: transformation of a state S1 with joint actions a1b1, a1b2, a2b1, a2b2, inserting new communication states a1C, a1o, a2C, a2o]

For each state and each joint action:

– transitions between the original state and the new states
– transitions between the new states and the original target states

Introduce C (communication) and NC (no communication) variants of each individual action (C(a1), NC(a1), C(a2), NC(a2)), and add the corresponding new states.

Page 14:

Non-linear Constraints

Need to introduce non-linear constraints: for each original state, and for each new state introduced by a no-communication action, the conditional probabilities of corresponding actions must be equal.

Ex: P(b1 | A1^o) = P(b1 | A2^o) and P(b2 | A1^o) = P(b2 | A2^o)

A1^c, A2^c: observable, reached by a communication action
A1^o, A2^o: unobservable, reached by a no-communication action
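The constraint can be checked mechanically on a candidate policy for agent B (the state names follow the slide; the probability values are made up):

```python
# Candidate policy for agent B over the new states. A1o/A2o are reached by
# no-communication actions (B cannot tell them apart); A1c/A2c are reached
# after communication (B knows the state).
policy_B = {
    "A1o": {"b1": 0.2, "b2": 0.8},
    "A2o": {"b1": 0.2, "b2": 0.8},  # must equal the A1o row
    "A1c": {"b1": 0.9, "b2": 0.1},  # may depend freely on the state
    "A2c": {"b1": 0.3, "b2": 0.7},
}

def satisfies_nc_constraints(policy, nc_states=("A1o", "A2o")):
    """True iff every no-communication state uses the same action distribution."""
    reference = policy[nc_states[0]]
    return all(policy[s] == reference for s in nc_states[1:])

print(satisfies_nc_constraints(policy_B))  # True
```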

Page 15:

Non-Linear constraints: Handling Miscoordination

If only NC (no-communication) actions are taken, agent B has no hint of the state, so its actions must be made independent of the source state: the probability of action b1 from state A1^o must equal the probability of the same action b1 from A2^o.

Meeting scenario: irrespective of agent A's plan, if agent B's plan is 20% 9am and 80% 10am, then B is independent of A.

Miscoordination is avoided because actions are independent of the state.

Page 16:

Experimental Results

[Figure: 3-D plot of entropy vs. reward vs. communication. X-axis: communication threshold (0 to 6); Y-axis: reward threshold (>1 to >11); Z-axis: weighted entropy (0 to 2).]

Page 17:

Experimental Conclusions

As the reward threshold decreases, entropy increases.

As communication increases, the agents coordinate better; this coordination is invisible to the adversary, so the agents can better fool the adversary.

Increased communication leads to higher entropy!

Page 18:

Summary

Randomized Policies in Multiagent MDP settings

Developed NLP to maximize weighted entropy with reward and communication constraints.

Provided transformation algorithm to explicitly reason about communication actions.

Showed that communication increases security.

Page 19:

Thank You

Any questions?