GPU Accelerated Markov Decision Processes in Crowd...

GPU Accelerated Markov Decision Processes in Crowd Simulation Sergio Ruiz Computer Science Department Tecnológico de Monterrey, CCM Mexico City, México [email protected] Benjamín Hernández National Center for Computational Sciences Oak Ridge National Laboratory Tennessee, USA [email protected]


Page 1: GPU Accelerated Markov Decision Processes in Crowd Simulationon-demand.gputechconf.com/gtc/2016/presentation/s6268... · 2016. 4. 20. · Process in Crowd Simulation Sergio Ruiz Benjamín

GPU Accelerated Markov Decision

Processes in Crowd Simulation

Sergio Ruiz

Computer Science Department

Tecnológico de Monterrey, CCM

Mexico City, México

[email protected]

Benjamín Hernández

National Center for Computational Sciences

Oak Ridge National Laboratory

Tennessee, USA

[email protected]


Contents

• Introduction

• Optimization Approaches

• Problem solving strategy

• A simple example

• Algorithm description

• Results

• Conclusions & future work


Crowd Simulation

Path Planning

Local Collision Avoidance (LCA)


Optimization Approaches

• According to (Reyes et al. 2009, Foka and Trahanias 2003), Markov Decision Processes (MDPs) are computationally inefficient: as the state space grows, the problem becomes intractable.

• Decomposition offers the possibility of solving large MDPs (Sucar 2007, Meuleau et al. 1998, Singh and Cohn 1998), either by state-space decomposition or by process decomposition.

• (Mausam and Weld 2004) follow the idea of concurrency to solve MDPs, generating near-optimal solutions by extending the Labeled Real-Time Dynamic Programming method.


Optimization Approaches

• (Sucar 2007) proposes a parallel implementation of weakly coupled MDPs.

• (Jóhansson 2009) presents a dynamic programming framework that implements the Value Iteration algorithm to solve MDPs using CUDA.

• (Noer 2013) explores the design and implementation of a point-based Value Iteration algorithm for Partially Observable MDPs (POMDPs) with approximate solutions. The GPU implementation supports belief-state pruning, which avoids unnecessary calculations.


Problem Solving Strategy

• We propose a parallel Value Iteration algorithm for solving MDPs that guides groups of agents toward assigned goals while avoiding obstacles interactively. For optimal performance, the algorithm runs over a hexagonal grid in the context of a Fully Observable MDP.


Problem Solving Strategy

• A Markov Decision Process is a tuple $M = \langle S, A, T, R \rangle$.

• $S$ is a finite set of states. In our case, 2D cells.

• $A$ is a finite set of actions. In our case, 6 directions.

• $T$ is a transition model $T(s, a, s')$.

• $R$ is a reward function $R(s)$.

• A policy $\pi$ is a solution that specifies the action for an agent at a given state.

• $\pi^*$ is the optimal policy.

[Figure: Transition]


Problem Solving Strategy

[Figure: States]

Value Iteration

$$\pi^*_t(s) = \arg\max_a Q_t(s, a)$$

$$Q_t(s, a) = R(s, a) + \gamma \sum_{j=0}^{5} T^a_{sj} V_{t-1}(j)$$

$$V_t(s) = Q_t(s, \pi^*(s)); \quad V_0(s) = 0$$
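The recurrence above can be tried out sequentially before thinking about the GPU. The following is a minimal sketch, not the authors' implementation: it assumes dense nested lists `T[a][s][j]` and `R[s][a]`, sums over all states rather than only the six hexagonal neighbors, and uses a hypothetical two-state MDP invented purely for illustration.

```python
def value_iteration(T, R, gamma, tol=1e-9, max_iters=10_000):
    """Sequential Value Iteration.

    T[a][s][j] is the probability of reaching state j from state s under
    action a; R[s][a] is the reward. Returns (V, policy).
    """
    n_actions, n_states = len(T), len(R)
    V = [0.0] * n_states  # V_0(s) = 0
    for _ in range(max_iters):
        # Q_t(s, a) = R(s, a) + gamma * sum_j T[a][s][j] * V_{t-1}(j)
        Q = [[R[s][a] + gamma * sum(T[a][s][j] * V[j] for j in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q_row) for q_row in Q]  # V_t(s) = max_a Q_t(s, a)
        done = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    policy = [q_row.index(max(q_row)) for q_row in Q]  # argmax_a Q_t(s, a)
    return V, policy


# Hypothetical two-state MDP (illustrative values only).
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in place
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: move to the other state
]
R = [[0.0, 1.0], [2.0, 0.0]]
V, policy = value_iteration(T, R, gamma=0.5)
```

In this toy case the values approach V = [3, 4]: staying in state 1 earns 2 per step, which discounts to 4, and moving there from state 0 earns 1 + 0.5 · 4 = 3.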


Problem Solving Strategy

• We propose to temporarily override the optimal policy when the agent density in a cell exceeds a certain threshold $\sigma$.


A simplified example

1 2 3 4

a -3 -3 -3 +100

b -3 -3 -100

c -3 -3 -3 -3

A = { N, W, E }

𝛾 = 1 (for simplicity)

Transitions:

p = 0.8 (probability of taking the chosen action)

q = 0.1 (probability of taking each of the other actions)

$$\pi^*_t(s) = \arg\max_a Q_t(s, a)$$

$$Q_t(s, a) = R(s, a) + \gamma \sum_{j=0}^{2} T^a_{sj} V_{t-1}(j)$$

What is $\pi$ for cell a3? $\pi(a3) = \arg\max_a \{Q(a3, W), Q(a3, N), Q(a3, E)\}$

$$Q(a3, \mathbf{E}) = 100 + 1.0\,(0.8(100) + 0.1(-3) + 0.1(0)) = 179.7$$

$$Q(a3, \mathbf{W}) = -3 + 1.0\,(0.1(100) + 0.8(-3) + 0.1(0)) = 4.6$$

$$Q(a3, \mathbf{N}) = 0 + 1.0\,(0.1(100) + 0.1(-3) + 0.8(0)) = 9.7$$

=> max is $Q(a3, \mathbf{E})$

In each line, the first term is $R(s, a)$, the factor 1.0 is $\gamma$, and the parenthesized sum is $\sum_{j=0}^{2} T^a_{sj} V(j)$.
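The three Q-values for cell a3 can be checked numerically. This is a small sketch that mirrors the slide's arithmetic; it assumes, as the slide does, that the neighbor rewards (100 to the east, -3 to the west, 0 for the out-of-bounds north) stand in for the values V(j). The names `reward`, `q_value` and `Q_OTHER` are illustrative.

```python
GAMMA = 1.0
P = 0.8        # probability of taking the chosen action
Q_OTHER = 0.1  # probability of taking each of the other actions

# Rewards seen from a3 in each direction; north is out of bounds, hence 0.
reward = {"E": 100.0, "W": -3.0, "N": 0.0}
actions = ["E", "W", "N"]


def q_value(action):
    # Expected value over the three possible moves, weighted by p or q.
    expected = sum((P if other == action else Q_OTHER) * reward[other]
                   for other in actions)
    return reward[action] + GAMMA * expected


q = {a: q_value(a) for a in actions}
best = max(q, key=q.get)  # action with the largest Q-value
```

Up to floating-point rounding, `q` holds 179.7 for E, 4.6 for W and 9.7 for N, so `best` is `"E"`.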


Algorithm

– Data collection: each cell needs to know the rewards of its neighboring cells and the out-of-bounds values.

– Input generation: build $T^a_{sj}$ and $R(s, a) = \mathit{RW}$.

– Value Iteration: the optimal policy is computed using parallel transformations and a parallel reduction by key.



Algorithm: input generation

• Transition matrix requirements:

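$$T^P = \begin{pmatrix} p & \cdots & p \\ \vdots & \ddots & \vdots \\ p & \cdots & p \end{pmatrix} \qquad T^Q_{r,c} = \begin{pmatrix} q_i & \cdots & q_i \\ \vdots & \ddots & \vdots \\ q_i & \cdots & q_i \end{pmatrix}$$

$$D_A = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix} \qquad D_B = \begin{pmatrix} 0 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 0 \end{pmatrix}$$

Dimensions: $|A| \times |A|$, i.e. each cell can compute its neighboring information.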

$$r \in [1, \mathit{MDP}_{rows}] \qquad c \in [1, \mathit{MDP}_{columns}] \qquad q_i = \frac{q}{RE_i - 1}$$


Algorithm: input generation

where $T_{r,c} = T^P \circ D_A + T^Q_{r,c} \circ D_B = \begin{pmatrix} p & q & q \\ q & p & q \\ q & q & p \end{pmatrix}$


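Transition matrix $T^a_{sj}$ computation:

$$T^a_{sj} = \begin{pmatrix} T_{1,1} & \cdots & T_{1,\mathit{MDP}_{columns}} \\ \vdots & \ddots & \vdots \\ T_{\mathit{MDP}_{rows},1} & \cdots & T_{\mathit{MDP}_{rows},\mathit{MDP}_{columns}} \end{pmatrix}$$

Each block $T_{r,c}$ represents a cell.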
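The per-cell composition $T^P \circ D_A + T^Q_{r,c} \circ D_B$ is a pair of element-wise (Hadamard) products followed by a sum. The sketch below shows this for one cell and then tiles it over a grid. It is only an illustration: the helper names are invented, the 2 x 4 grid is hypothetical, and every cell is assumed to share the same `q_i`.

```python
def hadamard(X, Y):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]


def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]


def cell_transition(p, q_i, n_actions):
    """Build the |A| x |A| block T_{r,c} = TP o DA + TQ o DB for one cell."""
    n = n_actions
    TP = [[p] * n for _ in range(n)]      # all entries p
    TQ = [[q_i] * n for _ in range(n)]    # all entries q_i
    DA = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # diagonal mask
    DB = [[0 if i == j else 1 for j in range(n)] for i in range(n)]  # off-diagonal mask
    return add(hadamard(TP, DA), hadamard(TQ, DB))


T_cell = cell_transition(p=0.8, q_i=0.1, n_actions=3)

# One block per grid cell (hypothetical 2 x 4 grid with a shared q_i).
rows, cols = 2, 4
T_grid = [[cell_transition(0.8, 0.1, 3) for _ in range(cols)] for _ in range(rows)]
```

With p = 0.8 and q_i = 0.1, `T_cell` has p on the diagonal and q elsewhere, and each row sums to 1.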


Algorithm: Parallel Value Iteration

1. Computation of Q-values.

$$\pi_t = \mathit{RW} + \gamma\, T^a_{sj} V$$

Consecutive parallel transformations (multiply, multiply, sum) produce a matrix Q that stores, for each cell, an |A|-tuple with one value per action.
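One possible reading of the multiply, multiply, sum sequence is sketched below on flat arrays, with Python's `map` standing in for a parallel transformation. The flat layout (one row of |A| transition weights per cell-action pair, with matching gathered neighbor values) is an assumption made for illustration; the slides do not spell out the memory layout. The data reuses cell a3 from the simplified example.

```python
from operator import mul


def q_values(T_flat, V_neighbors, RW, gamma, n_actions):
    """Compute one Q-value per (cell, action) pair from flat arrays.

    T_flat and V_neighbors hold n_actions entries per (cell, action) pair;
    RW holds one reward per pair.
    """
    # Transformation 1 (multiply): T * V, element by element.
    weighted = list(map(mul, T_flat, V_neighbors))
    # Transformation 2 (multiply): scale by the discount factor.
    discounted = [gamma * w for w in weighted]
    # Sum: add up each row of n_actions terms, then add the reward.
    row_sums = [sum(discounted[i:i + n_actions])
                for i in range(0, len(discounted), n_actions)]
    return [r + s for r, s in zip(RW, row_sums)]


# Cell a3 with actions ordered (E, W, N); neighbors in the same order.
T_flat = [0.8, 0.1, 0.1,   # action E
          0.1, 0.8, 0.1,   # action W
          0.1, 0.1, 0.8]   # action N
V_neighbors = [100.0, -3.0, 0.0] * 3
RW = [100.0, -3.0, 0.0]
Q = q_values(T_flat, V_neighbors, RW, gamma=1.0, n_actions=3)
```

Every element of each step is independent of the others, which is what makes the same pattern a natural fit for data-parallel primitives.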


Algorithm: Parallel Value Iteration

2. Selection of best Q-values.

– Parallel reduction: in every consecutive |A|-tuple of $\pi_t$, the index of the largest value indicates the current best action for that cell.

3. Check for convergence.

– If $\pi_t - \pi_{t-1} = [0, \dots, 0]$, the policy has stopped changing and the iteration ends.
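Steps 2 and 3 can be sketched as a segmented argmax followed by a comparison of consecutive policies. Here `itertools.groupby` plays the role of a reduction by key, where the key is the cell index (flat position divided by |A|). This is a sequential illustration under those assumptions, not the authors' GPU code, and the sample Q-values are made up.

```python
from itertools import groupby


def best_actions(Q_flat, n_actions):
    """Return, for each cell, the index of its largest Q-value."""
    policy = []
    keyed = enumerate(Q_flat)  # (flat position, Q-value) pairs
    for _, group in groupby(keyed, key=lambda kv: kv[0] // n_actions):
        position, _ = max(group, key=lambda kv: kv[1])
        policy.append(position % n_actions)  # action index within the tuple
    return policy


def converged(policy, previous):
    """True when policy - previous is the zero vector."""
    return all(a - b == 0 for a, b in zip(policy, previous))


# Two cells, three actions each (illustrative values).
Q_flat = [179.7, 4.6, 9.7,
          1.0, 5.0, 2.0]
policy = best_actions(Q_flat, n_actions=3)
```

For the sample data the first cell selects action 0 and the second selects action 1; calling `converged(policy, previous_policy)` after each sweep decides whether to stop.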


Crowd Navigation

Video

https://www.youtube.com/watch?v=369td2O8dxY


Results: test scenarios

Office (1,584 cells)

Maze (100x100 cells)

Champ de Mars (100x100 cells)

Implementation: CUDA Thrust, with OpenMP and CUDA backends.

CPU: Intel Core i7 running at 3.40 GHz.

ARM (Jetson TK1): 32-bit quad-core ARM Cortex-A15 CPU running at 2.32 GHz.

GPUs: Tegra K1 (192 CUDA cores), Tesla K40c (2880 CUDA cores), GeForce GTX TITAN (2688 CUDA cores).


Results: GPU performance


Results: GPU speedup

Intel CPU baseline: 8 threads

ARM CPU baseline: 4 threads


Conclusion

• Parallelization of the proposed algorithm was made possible by formulating it in terms of matrix operations, leveraging the “massive” data parallelism in GPU computing to reduce the MDP solution time.

• We demonstrated that standard parallel transformation and reduction operations provide the means to solve MDPs via Value Iteration with optimal performance.


Conclusion

• Taking advantage of the proposed hexagonal grid partitioning method, our implementation provides a good level of space discretization and performance.

• We obtained a 90x speedup using GPUs, enabling us to simulate crowd behavior interactively.

• We found that the Jetson TK1 GPU delivers remarkable performance, opening many possibilities for incorporating real-time MDP solvers in mobile robotics.


Future Work

• Reinforcement learning. Evaluate different parameter values to obtain policy convergence in the least number of iterations without losing precision in the generated paths.

• Couple the MDP solver with a Local Collision Avoidance method to obtain more precise simulation results at microscopic level.

• Investigate further applications of our MDP solver beyond the context of crowd simulation.



Thank you!

This research was partially supported by CONACyT SNI-54067, CONACyT PhD scholarship 375247, an NVIDIA Hardware Grant, and the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, under DOE Contract No. DE-AC05-00OR22725.

Further reading: Ruiz, S. and Hernández, B. "A Parallel Solver for Markov Decision Process in Crowd Simulation." MICAI 2015, 14th Mexican International Conference on Artificial Intelligence, Cuernavaca, Mexico. IEEE, ISBN 978-1-5090-0323-5.


Additional Results: Intel CPU


Additional Results: ARM CPU