Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic...

Deep Reinforcement Learning for Green Security Games with Real-time Information

Yufei Wang1, Ryan Shi2, Lantao Yu3, Yi Wu4,

Rohit Singh5, Lucas Joppa6, Fei Fang2

Peking University1, Carnegie Mellon University2

Stanford University3 , University of California, Berkeley4

World Wide Fund for Nature5, Microsoft Research6

AAAI 19

Green Security Challenges

4/25/2019 3

Motivation

122 rhinos in 2009 1215 rhinos in 2014

Green Security Games

44/25/2019

• Green Security Games model the strategic interaction between law enforcement agencies (defenders) and their opponents (attackers). [Fang, Stone, and Tambe 2015; Fang et al. 2016; Xu et al. 2017]

• However, previous work mainly focuses on computing patrol strategy offline.

Motivation

• Design patrol routes with a limited number of patrol resources.

The effect of Real-time Information

54/25/2019

Motivation

Our contribution

• We propose a game model GSG-I for green security games, that incorporates the vital real-time information.

• We design an efficient algorithm DeDOL that combines deep reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I.

• DeDOL is built upon the Policy Space Response Oracle framework, with two important domain-specific enhancements that improves the training efficiency.

4/25/2019 6

Contribution

Our proposed GSG-I Model • GSG-I: Green Security Games with Real-Time Information

Snares placed by the attacker

Attacker’s footprints

Defender’s footprints

Animal catching prob 𝑷𝒊𝒋.

74/25/2019

GSG-I Model

Attacker’s view

Defender’s view

Approximating Best Response against a fixed opponent with DQN

Up Down Left Right Still

84/25/2019

DQN Strategy

[Mnih et al. 2015]

Double Oracle & Policy Space Response Oracle

4/25/2019 9

DO & PSRO

Parameterized policy Approximate Best Response using Deep RL

PSRO

[McMahan, Gordon, and Blum 2003; Lanctot et al. 2017]

Initial heuristic strategy

• Attacking Tool Placement

104/25/2019

Enhancement on PSRO

Attacker: parameterized heuristic Defender: random sweeping

• Moving Direction

DQN strategy performance

4/25/2019 11

DQN Experiments

DQN strategy demo

DQN patroller against a heuristic poacher on 7x7 Gaussian grid.

DQN poacher against a random sweeping patroller on 7x7 random grid.

124/25/2019

DQN Experiments

Local modes: restrict attacker’s entry point

Ease training

Provide good building block strategy

Global mode with 4 entry points local mode with 1 top-left entry point

134/25/2019

Enhancement on PSRO

The DeDOL algorithm workflow

• Deep-Q Network based Double Oracle enhanced with Local modes

144/25/2019

DeDOL

Generate initial Strategies 𝐺0 ={𝑅𝑆,𝐻𝑒𝑢𝑟𝑖𝑠𝑡𝑖𝑐}

PSRO(Local, Entry Point 1, 𝐺0)

PSRO(Local, Entry Point 2, 𝐺0)

PSRO(Local, Entry Point 𝑛, 𝐺0)

…

Combine all subgames to 𝐺𝑇

𝐺𝑇1

𝐺𝑇2

𝐺𝑇𝑛

PSRO(Global, 𝐺𝑇)

Output optimal defender strategy 𝜎𝑇+𝑘

𝑑

DeDOL Performance

• Vanilla PSRO: no heuristic initial strategy, no local mode• CFR: counter-factual regret minimization• Metric: highest expected utility against a best-response poacher

across all iterations• DeDOL with local mode performs the best!

154/25/2019

Experiments

Summary

• We incorporates the real-time information in Green Security Games, and proposed a new game model.

• We design an efficient algorithm DeDOL to compute the optimal patrol strategy in GSG-I, which is built upon the Double Oracle / Policy Space Response Oracle framework with domain-specific enhancements.

164/25/2019

Motivation

Thank you!

Lantao Yu

[email protected]

Fei [email protected]

Yi [email protected]

Rohit [email protected]

Lucas [email protected]

Yufei Wang

[email protected]

Zheyuan Ryan Shi

[email protected]

174/25/2019The Azure resources are provided by Microsoft for Research AI for Earth award program.

Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic...

Documents

Transcript of Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic...