An Introduction to Reinforcement Learning (RL)

Posted on 01-Dec-2014


Slides from Aditya Tarigoppula's talk at NYC Machine Learning on December 13th.

Transcript of An Introduction to Reinforcement Learning (RL)

An Introduction to Reinforcement Learning (RL) and RL Brain Machine Interface (RL-BMI)

Aditya Tarigoppula www.joefrancislab.com

SUNY Downstate Medical Center

Outline

RL Examples

Environment

Value functions

Optimality

Methods for attaining optimality

DP MC TD

BMI & RL-BMI

Eligibility Traces

START / END

RL Examples

Stanford Autonomous Helicopter http://heli.stanford.edu/

Reinforcement Learning Brain Machine Interface (Joe Francis Lab)

Environment model - Markov decision process

1) States 'S'

2) Actions 'A'

3) State transition probabilities

4) Reward

5) Deterministic, non-stationary policy

RL Problem: The decision maker (the 'agent') needs to learn the optimal policy in an 'environment' so as to maximize the total amount of reward it receives over the long term.

State transition probabilities: $P^{a}_{ss'} = \Pr\{ s_{t+1} = s' \mid s_t = s, a_t = a \}$

Discount factor: $0 \le \gamma \le 1$

Discounted return: $R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$

Finite-horizon (episodic) return: $R_t = r_{t+1} + r_{t+2} + r_{t+3} + \dots + r_T$

Policy: $\pi : s \to a$

• Agent performs the action under the policy being followed.

• Environment is everything else other than the agent.

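To make these pieces concrete, here is a minimal sketch (not from the slides) of a finite MDP in Python; the two states, two actions, transition probabilities, rewards, discount factor, and policy below are all made-up illustrative values.

```python
import random

# A toy finite MDP: every number and name here is hypothetical.
states = ["s1", "s2"]
actions = ["a1", "a2"]
gamma = 0.9  # discount factor, 0 <= gamma <= 1

# P[s][a] is a list of (next_state, probability); each row sums to 1.
P = {
    "s1": {"a1": [("s1", 0.2), ("s2", 0.8)], "a2": [("s1", 0.9), ("s2", 0.1)]},
    "s2": {"a1": [("s1", 0.7), ("s2", 0.3)], "a2": [("s1", 0.0), ("s2", 1.0)]},
}

# R[s][a] is the expected immediate reward for taking action a in state s.
R = {"s1": {"a1": 0.0, "a2": 1.0}, "s2": {"a1": 5.0, "a2": -1.0}}

def step(s, a):
    """Sample the environment: return (next_state, reward)."""
    next_states, probs = zip(*P[s][a])
    s_next = random.choices(next_states, weights=probs, k=1)[0]
    return s_next, R[s][a]

# A deterministic policy is just a mapping from states to actions.
policy = {"s1": "a2", "s2": "a1"}
print(step("s1", policy["s1"]))
```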

Value Functions:

State Value Function:
$V^{\pi}(s) = E_{\pi}\{ R_t \mid s_t = s \} = E_{\pi}\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s \Big\} = \sum_{a} \pi(s,a) \sum_{s'} P^{a}_{ss'} \big[ R^{a}_{ss'} + \gamma V^{\pi}(s') \big]$

State-Action Value Function:
$Q^{\pi}(s,a) = E_{\pi}\{ R_t \mid s_t = s, a_t = a \} = E_{\pi}\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s, a_t = a \Big\}$
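Both value functions are expectations of the same discounted return. A tiny sketch (illustrative only, with a hypothetical reward sequence) of computing that return from sampled rewards:

```python
def discounted_return(rewards, gamma=0.9):
    """R_t = r_{t+1} + gamma*r_{t+2} + gamma^2*r_{t+3} + ..."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Hypothetical reward sequence observed after time t.
print(discounted_return([0.0, 0.0, 1.0, 5.0], gamma=0.9))  # 0.81 + 3.645 = 4.455
```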

Optimal Value Function:

Optimal Policy: a policy that is better than or equal to all other policies (in the sense of maximizing expected reward) is called an optimal policy.

Optimal state value function: $V^{*}(s) = \max_{\pi} V^{\pi}(s)$

Optimal state-action value function: $Q^{*}(s,a) = \max_{\pi} Q^{\pi}(s,a)$

Bellman optimality equations:
$V^{*}(s) = \max_{a} E\{ r_{t+1} + \gamma V^{*}(s_{t+1}) \mid s_t = s, a_t = a \}$
$Q^{*}(s,a) = E\{ r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \mid s_t = s, a_t = a \}$
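One practical consequence worth spelling out (a sketch with hypothetical Q-values, not from the slides): once Q* is available, an optimal policy is obtained simply by acting greedily with respect to it, i.e. pi*(s) = argmax_a Q*(s, a).

```python
# Hypothetical optimal Q-values for two states and two actions.
Q_star = {
    "s1": {"a1": 2.0, "a2": 3.5},
    "s2": {"a1": 1.0, "a2": 0.5},
}

def greedy_policy(Q, s):
    """pi*(s) = argmax_a Q*(s, a)."""
    return max(Q[s], key=Q[s].get)

print(greedy_policy(Q_star, "s1"))  # -> 'a2'
print(greedy_policy(Q_star, "s2"))  # -> 'a1'
```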

At time = t: acquire brain state, pass it to the decoder, select an action (trying to execute an optimum action), and execute the action.

At time = t+1: observe the reward and update the decoder.

EXAMPLE

Transition probabilities under the chosen action: Pr = 0.8, Pr = 0.1, Pr = 0.1, Pr = 0.

$V^{\pi}(s_1) = 0.8\,[R(s_1,a_1) + \gamma V^{\pi}(s_2)] + 0.1\,[R(s_1,a_1) + \gamma V^{\pi}(s_3)] + 0.1\,[R(s_1,a_1) + \gamma V^{\pi}(s_4)]$

(Gridworld states $s_1, s_2, s_3, s_4$.)

Prof. Andrew Ng, Lecture 16, Machine Learning
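A small numeric sketch of the expansion above; the discount factor, reward, and neighbouring value estimates are made-up placeholders, used only to show how the probability-weighted backup is computed:

```python
gamma = 0.99      # hypothetical discount factor
R_s1_a1 = -0.02   # hypothetical immediate reward for taking a1 in s1
V = {"s2": 0.9, "s3": 0.6, "s4": 0.3}  # hypothetical current value estimates

# V(s1) = 0.8*[R + gamma*V(s2)] + 0.1*[R + gamma*V(s3)] + 0.1*[R + gamma*V(s4)]
V_s1 = (0.8 * (R_s1_a1 + gamma * V["s2"])
        + 0.1 * (R_s1_a1 + gamma * V["s3"])
        + 0.1 * (R_s1_a1 + gamma * V["s4"]))
print(round(V_s1, 4))
```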

Outline

Environment

Value functions

Optimality

Methods for attaining optimality

DP MC TD

BMI & RL-BMI

Eligibility Traces

START / END

We're here !

RL Examples

Solution Methods for the RL Problem

◦ Dynamic Programming (DP): a method for optimizing problems that exhibit overlapping subproblems and optimal substructure.

◦ Monte Carlo method (MC): requires only experience, i.e. sample sequences of states, actions, and rewards from interaction with an environment.

◦ Temporal Difference learning (TD): combines the better aspects of DP (bootstrapped estimation) and MC (learning from experience) without incorporating the 'troublesome' aspects of either.

DYNAMIC PROGRAMMING

Dynamic Programming: Policy Evaluation

Dynamic Programming: Policy Improvement
$Q^{\pi}(s, \pi'(s)) \ge V^{\pi}(s)$

Policy Iteration:
$\pi_0 \xrightarrow{E} V^{\pi_0} \xrightarrow{I} \pi_1 \xrightarrow{E} V^{\pi_1} \xrightarrow{I} \pi_2 \xrightarrow{E} \dots \xrightarrow{I} \pi^{*} \xrightarrow{E} V^{*}$
(E: policy evaluation, I: policy improvement)

Value Iteration: replace the entire policy-evaluation sweep with a single max backup:
$V(s) \leftarrow \max_{a} \sum_{s'} P^{a}_{ss'} \big[ R^{a}_{ss'} + \gamma V(s') \big]$
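A minimal value-iteration sketch on a hypothetical two-state MDP (the transition probabilities, rewards, and gamma are invented for illustration); each sweep applies the max backup above, and a greedy policy is read off the converged values:

```python
gamma = 0.9
states = ["s1", "s2"]
actions = ["a1", "a2"]
# P[(s, a)] = list of (next_state, probability); R[(s, a, s_next)] = reward.
P = {("s1", "a1"): [("s1", 0.5), ("s2", 0.5)], ("s1", "a2"): [("s2", 1.0)],
     ("s2", "a1"): [("s1", 1.0)],              ("s2", "a2"): [("s2", 1.0)]}
R = {("s1", "a1", "s1"): 0.0, ("s1", "a1", "s2"): 1.0, ("s1", "a2", "s2"): 0.0,
     ("s2", "a1", "s1"): 2.0, ("s2", "a2", "s2"): 0.5}

V = {s: 0.0 for s in states}
for sweep in range(100):
    # Synchronous sweep: V(s) <- max_a sum_s' P[R + gamma*V(s')]
    V = {s: max(sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in P[(s, a)])
                for a in actions)
         for s in states}

# Greedy (optimal) policy extracted from the converged values.
policy = {s: max(actions,
                 key=lambda a: sum(p * (R[(s, a, s2)] + gamma * V[s2])
                                   for s2, p in P[(s, a)]))
          for s in states}
print(V, policy)
```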

Monte Carlo vs. DP

◦ The estimates for each state are independent; in other words, MC methods do not "bootstrap".

◦ The DP backup diagram includes only one-step transitions, whereas the MC backup diagram goes all the way to the end of the episode.

◦ The computational expense of estimating the value of a single state is independent of the number of states, which is attractive when one requires the value of only a subset of the states.

Monte Carlo Policy Evaluation

Every-visit MC / First-visit MC

-> Without a model, we need Q-value estimates.
-> All state-action pairs should be visited.
-> Exploration techniques: 1) Exploring starts 2) ε-greedy policy (see the sketch below)
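A minimal first-visit MC policy-evaluation sketch; the episodes below are hypothetical (state, reward) sequences assumed to come from some fixed policy, and returns are averaged over the first visit to each state:

```python
from collections import defaultdict

gamma = 0.9
returns = defaultdict(list)   # state -> list of first-visit returns
V = {}

def evaluate_first_visit_mc(episodes):
    """episodes: list of [(state, reward received after leaving that state), ...]."""
    for episode in episodes:
        G = 0.0
        first_visit_returns = {}
        # Walk the episode backwards, accumulating the discounted return.
        for s, r in reversed(episode):
            G = r + gamma * G
            first_visit_returns[s] = G   # overwritten until the earliest visit remains
        for s, G_s in first_visit_returns.items():
            returns[s].append(G_s)
            V[s] = sum(returns[s]) / len(returns[s])

# Two hypothetical episodes generated under some fixed policy.
evaluate_first_visit_mc([
    [("s1", 0.0), ("s2", 1.0), ("s1", 5.0)],
    [("s2", 0.0), ("s1", 2.0)],
])
print(V)
```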

Next Slide: MONTE CARLO

As promised, this is the "NEXT SLIDE"!

MONTE CARLO

Temporal Difference Methods
◦ Like MC, TD methods can learn directly from raw experience without a model of the environment's dynamics. Like DP, TD methods update estimates based in part on other learned estimates, without waiting for a final outcome (they bootstrap).

$V(s_t) \leftarrow V(s_t) + \alpha \big[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \big]$
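A tabular TD(0) sketch of the update above; the step size alpha, gamma, and the sample transition are illustrative:

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9
V = defaultdict(float)

def td0_update(s, r_next, s_next):
    """V(s_t) <- V(s_t) + alpha * [r_{t+1} + gamma*V(s_{t+1}) - V(s_t)]."""
    td_error = r_next + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

# One hypothetical observed transition.
print(td0_update("s1", 1.0, "s2"), V["s1"])
```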

TD(λ)

λ: trace-decay parameter

As λ increases, bias decreases and variance increases: a bias-variance tradeoff.

Intuition: start with a large λ and then decrease it over time.
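A sketch of how the trace-decay parameter is commonly used in backward-view tabular TD(λ) with accumulating eligibility traces; all constants and the short trajectory are hypothetical:

```python
from collections import defaultdict

alpha, gamma, lam = 0.1, 0.9, 0.8   # lam is the trace-decay parameter lambda
V = defaultdict(float)
e = defaultdict(float)              # eligibility traces

def td_lambda_update(s, r_next, s_next):
    """Backward-view TD(lambda): every state is updated in proportion to its trace."""
    td_error = r_next + gamma * V[s_next] - V[s]
    e[s] += 1.0                      # accumulating trace for the visited state
    for state in list(e.keys()):
        V[state] += alpha * td_error * e[state]
        e[state] *= gamma * lam      # traces decay by gamma*lambda each step

# A short hypothetical trajectory s1 -> s2 -> s3 with rewards.
for s, r, s_next in [("s1", 0.0, "s2"), ("s2", 1.0, "s3")]:
    td_lambda_update(s, r, s_next)
print(dict(V))
```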

SARSA

Q Learning

Difference
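The difference is in the bootstrapped target: SARSA (on-policy) uses the Q-value of the action actually taken next, while Q-learning (off-policy) uses the greedy max over next actions. A tabular sketch with hypothetical states, actions, and constants:

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9
Q = defaultdict(float)            # Q[(state, action)]
actions = ["a1", "a2"]

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy target: r + gamma * Q(s', a') for the action a' actually selected."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(s, a, r, s_next):
    """Off-policy target: r + gamma * max_a' Q(s', a'), regardless of the next action taken."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# One hypothetical transition updated both ways (in practice, on separate Q-tables).
sarsa_update("s1", "a1", 1.0, "s2", "a2")
q_learning_update("s1", "a1", 1.0, "s2")
print(dict(Q))
```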

Outline

Environment

Value functions

Optimality

Methods for attaining optimality

DP MC TD

BMI & RL-BMI

Eligibility Traces

START / END

We're here !

RL Examples

Eligibility Traces


Outline

Environment

Value functions

Optimality

Methods for attaining optimality

DP MC TD

BMI & RL-BMI

Eligibility Traces

START / END

We're here !

RL Examples

Online/closed loop RL-BMI architecture

action_output = index of $\max_i [\, Q_i(s_t) \,]$, so $Q(s_t, a_t) = Q_{a_t}(s_t)$

Output nonlinearity: tanh(·)

Reward: $r_{t+1}$

$TD\_err = r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)$

$delta = TD\_err \times e\_trace$

‘delta’ used for updating the weights through back-propagation

NEURAL SIGNAL
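A rough sketch of the kind of update loop the architecture above describes; this is not the lab's actual decoder. It uses a single tanh layer (one output per action), ε-greedy selection over the outputs, a SARSA-style TD error, and an eligibility trace on the weights; the network size, learning rate, ε, and the random 'neural' inputs are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_actions = 32, 4          # hypothetical sizes: neural features in, one Q output per action
alpha, gamma, lam, epsilon = 0.01, 0.9, 0.8, 0.1

# Single-layer decoder with tanh outputs, one output per action (a simplification).
W = rng.normal(scale=0.1, size=(n_actions, n_neurons))
e_trace = np.zeros_like(W)            # eligibility trace on the weights

def q_values(x):
    return np.tanh(W @ x)             # Q_i(s_t) for each action i

def select_action(x):
    """Epsilon-greedy over the decoder outputs: action = index of max_i Q_i(s_t)."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_values(x)))

def update(x_t, a_t, r_t1, x_t1, a_t1):
    """TD error on Q(s_t, a_t), scaled by the eligibility trace, used to update W."""
    global e_trace
    q_t, q_t1 = q_values(x_t), q_values(x_t1)
    td_err = r_t1 + gamma * q_t1[a_t1] - q_t[a_t]
    # Gradient of the tanh output for the chosen action with respect to its weight row.
    grad = np.zeros_like(W)
    grad[a_t] = (1.0 - q_t[a_t] ** 2) * x_t
    e_trace = gamma * lam * e_trace + grad
    W += alpha * td_err * e_trace     # delta = TD_err * e_trace drives the weight update

# One hypothetical closed-loop step: brain state in, action out, reward observed, decoder updated.
x_t = rng.normal(size=n_neurons); a_t = select_action(x_t)
x_t1 = rng.normal(size=n_neurons); a_t1 = select_action(x_t1)
update(x_t, a_t, r_t1=1.0, x_t1=x_t1, a_t1=a_t1)
```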

Scott, S. H. (1999). "Apparatus for measuring and perturbing shoulder and elbow joint positions and torques during reaching." J Neurosci Methods 89(2): 119-27.

BMI SETUP

Autonomous Helicopter (Stanford University) http://heli.stanford.edu/papers/iser04-invertedflight.pdf

State: position, orientation, velocity and angular velocity
$(x, y, z, \phi, \theta, \omega, \dot{x}, \dot{y}, \dot{z}, \dot{\phi}, \dot{\theta}, \dot{\omega})$

(Diagram: states S1, S2 with actions a1, a2 and rewards R1, R2, generated by the dynamics model plus a random generator.)

Actor-Critic Model

http://drugabuse.gov/researchreports/methamph/meth04.gif

References

Richard S. Sutton & Andrew G. Barto, Reinforcement Learning: An Introduction

Prof. Andrew Ng's Machine Learning lectures, http://heli.stanford.edu

Dr. Joseph T. Francis, www.joefrancislab.com

Prof. Peter Dayan

Dr. Justin Sanchez Group, http://www.bme.miami.edu/nrg/