Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is...

33
Scalable Reinforcement Leaning on Cray Systems Benjamin Robbins, Director - AI & Advanced Productivity Ananda Kommaraju, Kristyn Maschhoff, Mike Ringenburg

Transcript of Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is...

Page 1: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.

S ca lab le Re in fo r cemen t

Lean ing on C ray Sys tems

Benjamin Robbins, Director - AI & Advanced ProductivityAnanda Kommaraju, Kristyn Maschhoff, Mike Ringenburg

Page 2: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

Purpose: Demonstrate steps to reproduce a working distributed reinforcement learning configuration on a Cray system. Platform for further exploration.

Relevance: Reinforcement learning approaches have produced many of the recent state of the art results for Machine Learning. Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take advantage of high-performance-computing.

Results: Cray systems’ ability to support mixed node types (GPU, CPUs) and resource configuration flexibility make for an ideal platform to explore and push limits of reinforcement learning.

2

Overview

Page 3: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

3

Recent Advances in AI

Page 4: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

4

Reinforcement Learning

Learning from the rewards of a given action.

Page 5: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

5

Reinforcement Learning (RL)

Image: geekshumor.com

Page 6: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

6

Reinforcement Learning (RL) Basics

Reinforcement Learning Illustration (https://i.stack.imgur.com/eoeSq.png)

Page 7: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

Find the optimal Policy.

7

The Goal of RL?

The policy tells the agent how to act given a particular state.

Page 8: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

8

Many RL Methods

https://www.intel.ai/introducing-reinforcement-learning-coach-0-10-0/#gs.91i90q

Page 9: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

9

Distributed Reinforcement Learning

https://cdn-images-1.medium.com/max/1600/1*tYxWuyksovxA1Thu8PggPQ.jpeg

Page 10: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

• Explore Reinforcement Learning algorithms • Understand if possible to leverage existing open source frameworks• Study nested parallelism and complex workflows • Distributed RL is very active area of research. Participate in research and make

available for other Cray end-users• Scalability is a requirement for many real-world problems. How can distributed RL

at the scale of Cray help solve these problems?

10

Motivation for Study of Distributed RL on Cray Systems

Page 11: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

11

Environments for Study

Page 12: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

12

Distributed Reinforcement Learning Generalized

Trained Policy

Actions

Experiences

Target Policy

Samples

WeightsTrained Policy

Actions

ExperiencesSamples

Weights

Centralized Learner/Policy Individual EnvironmentIndividual Environment

Head Node Worker Node Worker Node

Page 13: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

Setup and Configuration

13

Page 14: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

• Nested Parallelism• Resource Scaling• Multiple algorithms can be

mapped

14

Mapping RL to Cray XC

Update weights

Policy gradients/experiences

GPUGlobal Policy ModelPolicy Optimizer

Policy gradients/experiences

CPU Cores

Head Node

GPU

CPU Cores *

GPU

Worker Node

Worker Node

Update weights

* Multiple Environments per CPUMultiple CPUs per Worker

CPU Cores *

Policy EvaluatorPolicy Model

Policy EvaluatorPolicy Model

Page 15: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

• Ray is a flexible, high-performance distributed execution framework.• Developed by RISELab at UC Berkeley• Very active open-source project – https://ray.readthedocs.io/en/latest/index.html• Ray has libraries for tuning and reinforcement learning• RLlib is Ray’s reinforcement learning library• RLlib supports many RL methods.

15

What is Ray?

Page 16: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

• Allocate nodes through SLURM workload manager

• ccmlogin (Cluster compatible mode)

• Start a Ray head node with initial settings

16

Deploy a Ray Cluster on Cray XC

Page 17: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

• Independently start Ray workers with initial settings• Resource scaling - Increase or reduce workers/resources

17

Deploy a Ray Cluster on Cray XC

Page 18: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

18

Execute Rllib algorithmsimport rayfrom ray.rllib.agents.ppo import PPOAgentfrom ray.tune import run_experiments

def train_fn(config, reporter):agent1 = PPOAgent(env="CartPole-v0", config=config)for _ in range(100000):

result = agent1.train()result["phase"] = 1reporter(**result)phase1_time = result["timesteps_total"]

state = agent1.save()agent1.stop()

if __name__ == "__main__":ray.init(redis_address="10.128.0.225:6380")run_experiments({

"demo": {"run": train_fn,"local_dir": "/lus/scratch/user/ray_results/custom/","config": {

"lr": 0.01,"num_workers": 64,

},},

})

Page 19: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

RL Methods and Results

19

Page 20: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

20

RL Method – Actor/Critic

https://www.cleaninginstitute.org/sites/default/files/assets/1/Photos/700x700/child-coloring-wall.jpg

Actor

Critic

Page 21: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

21

Distributed RL Method – A3C

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2

Asynchronous Advantage Actor Critic

Page 22: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

22

Distributed RL Method – A3C vs A2C

https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#a2c

Page 23: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

23

Qbert_16W_PPO

SpaceInvader_16W_A2C

Breakout_7W_PPO

BeamRider_7W_A2C

Page 24: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

24

Multi-Node Scaling

Page 25: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

25

Distributed RL Method – Ape-X

https://openreview.net/pdf?id=H1Dy---0Z

Page 26: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

26

Distributed RL Method - IMPALA

https://arxiv.org/pdf/1802.01561.pdf

Importance Weighted Actor Learner Architecture

Observations are trajectories of experience (sequence of states, actions, and rewards) not gradients.

Page 27: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

27

IMPALA and Ape-X

Page 28: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

• 8 GPUs total• Assign 1 GPU as Head

Node• 7 GPUs as Worker Nodes• 9 CPU cores per Worker

28

Mapping RL to Cray CS

Head NodeGlobal Policy Graph

Policy Optimizer

Workers

Single GPU Node on CS Storm

CPU Cores

GPU GPU GPU GPU

GPU GPU GPU GPU

Workers

Page 29: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

29

XC vs Dense GPU Node on CS

Page 30: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

• Explored Reinforcement Learning as a HPC workload

• Deploy a UC Berkeley’s RISElab’s Ray cluster on XC system

• Trained state-of-art RL agents• UC Berkeley’s RLlib using Ray’s distributed

execution• IMPALA, Ape-X, PPO, A2C/A3C on Atari games

(from gym library)• Variety of resource configurations• Scaled training on multi-node and single-node

XC with mixed CPU and GPU

30

Key Results

0

0.2

0.4

0.6

0.8

1

1.2

Qbert_A2C Qbert_PPO SpaceInvaders_A2C SpaceInvaders_PPO

Scalingacrossnodes

1 2 4 8 16 nodes

Page 31: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

• Explore Optimization of Ray Libraries• Identify problems sets that map well to Distributed RL• Comparative studies against other published results• Explore architecture needs for computational node support and future

network requirements

31

What’s Next?

Page 32: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

© 2018 Cray Inc.PROPRIETARY & CONFIDENTIAL

S A F E H A R B O R S TAT E M E N T

This presentation may contain forward-looking statements that are based on our current expectations. Forward looking statements may include statements about our financial guidance and expected operating results, our opportunities and future potential, our product development and new product introduction plans, our ability to expand and penetrate our addressable markets and other statements that are not historical facts.

These statements are only predictions and actual results may materially vary from those projected. Please refer to Cray's documents filed with the SEC from time to time concerning factors that could affect the Company and these forward-looking statements.

32

Page 33: Scalable Reinforcement Leaning on Cray Systems · 2019. 10. 11. · Reinforcement learning is resource and communication intensive and is, therefore, an excellent candidate to take

THANK YOUQ U E S T I O N S ?

[email protected]

@cray_inc

linkedin.com/company/cray-inc-

cray.com