Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving...

39
Reinforcement Learning Workflows for AI

Transcript of Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving...

Page 1: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reinforcement Learning Workflows for AI

Page 2: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Key Takeaways

2

▪ What is reinforcement learning and why should I care about it?

▪ How do I set up and solve a reinforcement learning problem?

▪ What are some common challenges?

Page 3: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Why Should You Care About Reinforcement Learning?

3

Page 4: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

State estimationJoint trajectory generation

(e.g. MPC)

Low-level control

(e.g. PID)

Motor torquesMeasurements

One Approach Could Be…

4

Page 5: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Motor torquesMeasurements Black Box

Controller

Any Alternatives?

5

Page 6: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Applications of Reinforcement Learning

6

Page 7: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

7

What is reinforcement learning?

Type of machine learning that trains an ‘agent’ through trial & error

interactions with an environment

Page 8: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reinforcement Learning vs Machine Learning vs Deep Learning

8

Page 9: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reinforcement Learning vs Machine Learning vs Deep Learning

9

Page 10: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reinforcement Learning vs Machine Learning vs Deep Learning

10

Page 11: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reinforcement Learning vs Machine Learning vs Deep Learning

11

Page 12: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Deep Learning

Complex reinforcement learning problems typically need deep neural networks

[Deep Reinforcement Learning]

What about deep learning?

Reinforcement Learning vs Machine Learning vs Deep Learning

12

Page 13: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Fetch!

Let’s try

this…

Yay!

Here is

your treat!

AGENT

Reinforcement

Learning

Algorithm

Policy

ENVIRONMENT

ACTIONS

REWARD

OBSERVATIONS

Policy update

Analogies with pet training

How does reinforcement learning training work?

13

Page 14: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reinforcement Learning ConceptsTraining a self-driving car

▪ Vehicle’s computer…

(agent)

▪ is reading sensor measurements from LIDAR, cameras,…

(observations)

▪ that represent road conditions, vehicle position,…

(environment)

▪ and generates steering, braking, throttle commands,…

(action)

▪ based on an internal state-to-action mapping…

(policy)

▪ that tries to optimize, e.g., lap time & fuel efficiency…

(reward).

▪ The policy is updated through repeated trial-and-error by a

reinforcement learning algorithm

AGENT

Reinforcement

Learning

Algorithm

Policy

ENVIRONMENT

ACTIONS

REWARD

OBSERVATIONS

Policy update

14

Page 15: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reinforcement Learning ConceptsTraining a self-driving car

▪ Vehicle’s computer uses the final state-to-action

mapping… (policy)

▪ to generate steering, braking, throttle commands,…

(action)

▪ based on sensor readings from LIDAR, cameras,…

(observations)

▪ that represent road conditions, vehicle position,…

(environment).

POLICY

ENVIRONMENT

ACTIONSOBSERVATIONS

By definition, this trained policy is

optimizing lap time & fuel efficiency

After training, only trained

policy is needed

15

Page 16: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Control system Reinforcement learning system

PLANTCONTROLLER

REFERENCE

MEASUREMENT

MANIPULATED

VARIABLE

+

-

ERROR

Reinforcement learning has parallels to control system design

Controller Policy

Plant Environment

Measurement Observation

Manipulated variable

Error/Cost function

Adaptation mechanism

Action

Reward

RL Algorithm

AGENT

Reinforcement

Learning

Algorithm

Policy

ENVIRONMENT

ACTION

REWARD

OBSERVATIONS

Policy update

Reinforcement Learning vs Controls

16

Page 17: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Policy Representation and Deep Learning

Observations Next action

Representation options

▪ Look-up table

▪ Polynomials

A

5

4

3

2

1

1 2 3 4 5

17

Look-up tables do not scale well

Page 18: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Policy Representation and Deep Learning

Observations

(camera frame, sensors, …)

Next action

Representation options

▪ Look-up table

▪ Polynomials

▪ (Deep) neural networks Deep neural network policy

Neural networks allow representation of complex policies

18

Page 19: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

19

How do I set up and solve

a reinforcement learning problem?

Page 20: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reinforcement Learning Workflow

Environment Reward Policy

representation

Training DeploymentAgent

▪ Large number of simulations needed

▪ Parallel & GPU computing can speed up training

▪ Training could still take hours or days

▪ Select training algorithm

▪ Tune hyperparameters

▪ Deep network? Table? Polynomial?

▪ Numerical value that evaluates goodness of policy

▪ Reward shaping can be challenging

▪ Simulation models or real hardware

▪ Virtual models are safer and cheaper

▪ Trained policy is a standalone

function

20

Page 21: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reinforcement Learning Workflow

Environment Reward Policy

representation

Training DeploymentAgent

21

Page 22: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reinforcement Learning Toolbox

Introduced in

▪ Built-in and custom reinforcement learning algorithms

▪ Environment modeling in MATLAB and Simulink

– Existing scripts and models can be reused

▪ Deep Learning Toolbox support for representing policies

▪ Training acceleration with Parallel Computing Toolbox and

MATLAB Parallel Server

▪ Deployment of trained policies with GPU Coder and

MATLAB Coder

▪ Reference examples for getting started

22

Page 23: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Example: Walking Robot

Environment Reward Policy Training DeploymentAgent

Motor torquesMeasurements

Control objective: Walk

on a straight line

23

Page 24: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Creating the Environment

24

Page 25: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reward Shaping

25

Page 26: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

26

Page 27: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Creating the Agent

27

Page 28: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Training the Agent

28

Page 29: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

29

Page 30: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Applications of Reinforcement Learning

30

Page 31: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Autonomous Driving Example

Environment

Traditional

Controller

RL Agent

Image (Observation)

Car Position (Observation)

Objective: Augment traditional controller with

reinforcement learning to improve lap time

Start/Goal

+

Steer, throttle,

brake

31

Page 32: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reward

RL Agent

Vehicle Model

Traditional

Controller

32

Page 33: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

33

Page 34: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Results

Traditional controller +

reinforcement learning

34

Lap time (s)

Baselin

e

RL+

Baselin

e

30% performance improvement

Page 35: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Reference Applications in Documentation

▪ Controller Design

▪ Robotic Locomotion

▪ Lane Keep Assist

▪ Adaptive Cruise Control

▪ Imitation Learning

35

Page 36: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Pros & Cons of Reinforcement Learning

Pros

▪ No data required before training

▪ New possibilities with AI for

hard-to-solve problems

▪ Complex end-to-end solutions can

be developed

▪ Uncertain, nonlinear environments

can be used

▪ Trained policies are hard to verify

(no performance guarantees)

▪ Many trials/data points required

(sample inefficient)

– Training with real hardware can be

expensive and dangerous

▪ Large number of design parameters

– Reward signal

– Network architectures

– Training Hyperparameters

Cons

Simulations are key in Reinforcement Learning

36

Page 37: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

▪ Reuse existing code and models for

environments

▪ Use simulations for policy

verification

– Simulate extreme scenarios

▪ Run simulation trials in parallel to

accelerate training

▪ Consult Reinforcement Learning

Toolbox examples

– Iterative tuning with simulations

Challenges

▪ Trained policies are hard to verify

(no performance guarantees)

▪ Many trials/data points required

(sample inefficient)

– Training with real hardware can be

expensive and dangerous

▪ Large number of design parameters

– Reward signal

– Network architectures

– Training Hyperparameters

How Can MATLAB and Simulink Help?

37

Page 38: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Key Takeaways

38

▪ What is reinforcement learning and why should I care about it?

▪ How do I set up and solve a reinforcement learning problem?

▪ What are some common challenges?

Page 39: Reinforcement Learning Workflows for AI · Reinforcement Learning Concepts Training a self-driving car Vehicle’s computer uses the final state-to-action mapping… (policy) to generate

Learn More

▪ Reference examples for controls,

robotics, and autonomous system

applications

▪ Documentation written for

engineers and domain experts

▪ Tech Talk video series on

Reinforcement Learning concepts

▪ Reinforcement Learning ebooks

available at mathworks.com

39