Deep Reinforcement Learning for Green Security Games with Real-time Information
Yufei Wang¹, Ryan Shi², Lantao Yu³, Yi Wu⁴,
Rohit Singh⁵, Lucas Joppa⁶, Fei Fang²
¹Peking University, ²Carnegie Mellon University,
³Stanford University, ⁴University of California, Berkeley,
⁵World Wide Fund for Nature, ⁶Microsoft Research
AAAI 19
Green Security Challenges
4/25/2019 3
Motivation
122 rhinos in 2009; 1215 rhinos in 2014
Green Security Games
• Green Security Games model the strategic interaction between law enforcement agencies (defenders) and their opponents (attackers). [Fang, Stone, and Tambe 2015; Fang et al. 2016; Xu et al. 2017]
• However, previous work mainly focuses on computing patrol strategies offline.
• Design patrol routes with a limited number of patrol resources.
The effect of Real-time Information
Our contribution
• We propose a game model, GSG-I, for green security games that incorporates vital real-time information.
• We design an efficient algorithm, DeDOL, that combines deep reinforcement learning with the classic Double Oracle framework in security games to solve zero-sum GSG-I.
• DeDOL is built upon the Policy Space Response Oracle framework, with two important domain-specific enhancements that improve training efficiency.
Contribution
Our proposed GSG-I Model
• GSG-I: Green Security Games with Real-Time Information
Snares placed by the attacker
Attacker’s footprints
Defender’s footprints
Animal catching prob 𝑷𝒊𝒋.
GSG-I Model
Attacker’s view
Defender’s view
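The elements labelled on this slide (snares, both players' footprints, and the per-cell animal-catching probability) could be held in a small state object. The following is a minimal sketch, assuming a square grid and purely local observations. The class name `GridState`, its field names, and the contents of each player's view are hypothetical choices for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field

Cell = tuple[int, int]  # (row, column)


@dataclass
class GridState:
    """Hypothetical container for one game state on an n x n grid."""
    n: int
    catch_prob: dict[Cell, float]  # P_ij: animal-catching probability per cell
    snares: set[Cell] = field(default_factory=set)
    attacker_footprints: set[Cell] = field(default_factory=set)
    defender_footprints: set[Cell] = field(default_factory=set)

    def defender_view(self, pos: Cell) -> dict[str, bool]:
        """Assumed local observation: what the defender sees in its own cell."""
        return {
            "attacker_footprint": pos in self.attacker_footprints,
            "snare": pos in self.snares,
        }

    def attacker_view(self, pos: Cell) -> dict[str, bool]:
        """Assumed local observation: what the attacker sees in its own cell."""
        return {"defender_footprint": pos in self.defender_footprints}
```

Keeping the two views as separate methods makes the information asymmetry explicit: each player is shown only the opponent's traces, never the full state.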
Approximating Best Response against a fixed opponent with DQN
Actions: Up, Down, Left, Right, Still
DQN Strategy
[Mnih et al. 2015]
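Two ingredients of Q-learning over the five moves can be sketched without any deep-learning library: epsilon-greedy action selection and the one-step learning target. In a DQN the Q-values would come from a neural network; here they are plain lists, and the function names and parameters are assumptions made for illustration.

```python
import random

ACTIONS = ["up", "down", "left", "right", "still"]  # the five moves on the slide


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma, done):
    """One-step target: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Greedy choice (epsilon = 0) over illustrative Q-values for one state.
chosen = ACTIONS[epsilon_greedy([0.1, 0.9, 0.3, 0.0, 0.2], epsilon=0.0)]
```

Training against a fixed opponent then amounts to regressing the network's Q-value for the taken action toward `td_target`, with the opponent treated as part of the environment.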
Double Oracle & Policy Space Response Oracle
DO & PSRO
PSRO: parameterized policy; approximate best response using deep RL
[McMahan, Gordon, and Blum 2003; Lanctot et al. 2017]
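The Double Oracle loop can be illustrated on a small zero-sum matrix game. This is a sketch under stated assumptions: `payoff[i][j]` is the row player's (defender's) utility, best responses are found by exact enumeration (PSRO replaces this step with a deep-RL approximation), and the restricted game is solved approximately by fictitious play rather than by a linear program. All function names are hypothetical.

```python
def solve_restricted(payoff, rows, cols, iters=2000):
    """Approximate an equilibrium of the restricted game by fictitious play."""
    row_counts = {r: 0 for r in rows}
    col_counts = {c: 0 for c in cols}
    row_counts[rows[0]] = 1
    col_counts[cols[0]] = 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture.
        r = max(rows, key=lambda i: sum(payoff[i][j] * n for j, n in col_counts.items()))
        c = min(cols, key=lambda j: sum(payoff[i][j] * n for i, n in row_counts.items()))
        row_counts[r] += 1
        col_counts[c] += 1
    row_total = sum(row_counts.values())
    col_total = sum(col_counts.values())
    row_mix = {i: n / row_total for i, n in row_counts.items()}
    col_mix = {j: n / col_total for j, n in col_counts.items()}
    return row_mix, col_mix


def double_oracle(payoff, max_iters=50):
    """Grow both strategy sets until neither best response is new."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    rows, cols = [0], [0]
    for _ in range(max_iters):
        row_mix, col_mix = solve_restricted(payoff, rows, cols)
        # Best-response oracles search the full strategy space.
        br_row = max(range(n_rows),
                     key=lambda i: sum(payoff[i][j] * p for j, p in col_mix.items()))
        br_col = min(range(n_cols),
                     key=lambda j: sum(payoff[i][j] * p for i, p in row_mix.items()))
        if br_row in rows and br_col in cols:
            break
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)
    return row_mix, col_mix, rows, cols
```

On rock-paper-scissors, for example, the loop starts from a single strategy per player and adds best responses until all three strategies are in both sets.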
Initial heuristic strategy
• Attacking Tool Placement
Enhancement on PSRO
Attacker: parameterized heuristic
Defender: random sweeping
• Moving Direction
DQN strategy performance
DQN Experiments
DQN strategy demo
DQN patroller against a heuristic poacher on 7x7 Gaussian grid.
DQN poacher against a random sweeping patroller on 7x7 random grid.
Local modes: restrict attacker’s entry point
Ease training
Provide good building block strategy
Global mode with 4 entry points; local mode with 1 top-left entry point
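The difference between the two modes comes down to which cells the attacker may start from. The sketch below assumes the four global entry points are the grid corners, which the slide does not state; the function names are also hypothetical.

```python
import random


def entry_points(n, mode, local_entry=(0, 0)):
    """Return the cells where the attacker may enter an n x n grid."""
    # Assumption: the four global entry points are the grid corners.
    corners = [(0, 0), (0, n - 1), (n - 1, 0), (n - 1, n - 1)]
    if mode == "global":
        return corners
    if mode == "local":
        return [local_entry]  # default is the top-left cell
    raise ValueError(f"unknown mode: {mode}")


def sample_entry(n, mode, rng=random, local_entry=(0, 0)):
    """Draw the attacker's starting cell for one episode."""
    return rng.choice(entry_points(n, mode, local_entry))
```

With a single entry point the opponent's behaviour varies less from episode to episode, which is one way to read the slide's claim that local modes ease training.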
The DeDOL algorithm workflow
• Deep-Q Network based Double Oracle enhanced with Local modes
DeDOL
• Generate initial strategies $G_0 = \{\mathit{RS}, \mathit{Heuristic}\}$
• Run PSRO(Local, Entry Point 1, $G_0$), PSRO(Local, Entry Point 2, $G_0$), …, PSRO(Local, Entry Point $n$, $G_0$), yielding $G_T^1, G_T^2, \dots, G_T^n$
• Combine all subgames to $G_T$
• Run PSRO(Global, $G_T$)
• Output optimal defender strategy $\sigma^d_{T+k}$
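The workflow on this slide (local PSRO runs per entry point, a merge, then one global PSRO run) can be sketched as a short orchestration function. This is an illustrative skeleton only: `psro` is a caller-supplied callable, `toy_psro` is a placeholder that does no learning, and a single flat strategy list stands in for the separate defender and attacker strategy sets a real implementation would keep.

```python
def dedol(entry_points, initial_strategies, psro):
    """Run local PSRO per entry point, merge the results, then run global PSRO."""
    local_results = []
    for entry in entry_points:
        # Every local run starts from its own copy of the initial set G_0.
        local_results.append(
            psro(mode="local", entry=entry, strategies=list(initial_strategies))
        )
    combined = []
    for strategies in local_results:
        for s in strategies:
            if s not in combined:  # drop duplicates such as the shared initial strategies
                combined.append(s)
    return psro(mode="global", entry=None, strategies=combined)


def toy_psro(mode, entry, strategies):
    """Placeholder for PSRO: appends one labelled dummy strategy per call."""
    label = f"br-{mode}" if entry is None else f"br-{mode}-{entry}"
    return strategies + [label]


result = dedol([1, 2], ["RS", "Heuristic"], toy_psro)
```

Because the local runs do not depend on one another in this sketch, they could be dispatched independently before the merge step.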
DeDOL Performance
• Vanilla PSRO: no heuristic initial strategy, no local mode
• CFR: counterfactual regret minimization
• Metric: highest expected utility against a best-response poacher across all iterations
• DeDOL with local mode performs the best!
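The metric can be made concrete on a toy zero-sum matrix game. The sketch assumes `payoff[i][j]` is the defender's utility when the defender plays pure strategy `i` and the poacher plays pure strategy `j`, so a best-response poacher simply picks the column that minimises the defender's expected utility. The function names are hypothetical.

```python
def worst_case_utility(payoff, defender_mix):
    """Defender's expected utility against a best-responding poacher."""
    n_cols = len(payoff[0])
    return min(
        sum(p * payoff[i][j] for i, p in enumerate(defender_mix))
        for j in range(n_cols)
    )


def best_over_iterations(payoff, mixes):
    """Highest worst-case utility among the defender strategies from all iterations."""
    return max(worst_case_utility(payoff, m) for m in mixes)


# Matching-pennies style example: the uniform mix is harder to exploit.
toy_payoff = [[1.0, -1.0], [-1.0, 1.0]]
score = best_over_iterations(toy_payoff, [[1.0, 0.0], [0.5, 0.5]])
```

Scoring each strategy against its own best-responding opponent measures exploitability, which is a stricter test than performance against any fixed poacher.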
Experiments
Summary
• We incorporate real-time information into Green Security Games and propose a new game model, GSG-I.
• We design an efficient algorithm, DeDOL, to compute the optimal patrol strategy in GSG-I. It is built upon the Double Oracle / Policy Space Response Oracle framework with domain-specific enhancements.
Thank you!
Lantao Yu
Rohit Singh
Lucas Joppa
Yufei Wang
Zheyuan Ryan Shi
The Azure resources are provided by Microsoft for Research AI for Earth award program.