
MVA-RL Course

A Quick Look at the “Reinforcement Learning” course

A. LAZARIC (SequeL Team @INRIA-Lille)
ENS Cachan - Master 2 MVA

SequeL – INRIA Lille

Why

A. LAZARIC – Introduction to Reinforcement Learning, Fall 2015

Why: Important Problems



- Autonomous robotics
  - Elder care
  - Exploration of unknown/dangerous environments
  - Robotics for entertainment
- Financial applications
  - Trading execution algorithms
  - Portfolio management
  - Option pricing
- Energy management
  - Energy grid integration
  - Maintenance scheduling
  - Energy market regulation
  - Energy production management
- Recommender systems
  - Web advertising
  - Product recommendation
  - Date matching
- Social applications
  - Bike sharing optimization
  - Election campaign
  - ER service optimization
  - Intelligent Tutoring Systems


What


What: Decision-Making under Uncertainty

[Figure: agent-environment loop. The agent sends an action (actuation) to the environment and receives its state (perception).]


How: Reinforcement Learning

Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal in an unknown uncertain environment. The learner is not told which actions to take, as in most forms of machine learning, but she must discover which actions yield the most reward by trying them (trial–and–error). In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards (delayed reward).

“Reinforcement Learning: An Introduction”, Sutton and Barto (1998).
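To make trial-and-error and delayed reward concrete, here is a minimal Python sketch of the agent-environment interaction loop on a toy problem. The chain environment `ChainEnv`, the `run_episode` helper and the discount factor `gamma` are illustrative assumptions, not material from the course. The reward is zero everywhere except at the last state, so the value of an early action only becomes visible several steps later.

```python
class ChainEnv:
    """Hypothetical toy environment: states 0..n_states-1 on a line, goal at the right end."""

    def __init__(self, n_states: int = 5) -> None:
        self.n_states = n_states
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, float, bool]:
        """Action 1 moves right, action 0 moves left; returns (next_state, reward, done)."""
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # the only reward comes at the goal: it is delayed
        return self.state, reward, done


def run_episode(env: ChainEnv, policy, gamma: float = 0.9, max_steps: int = 100) -> float:
    """Interact with env using policy(state) -> action; return the discounted return."""
    state = env.reset()
    ret, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                   # agent: perception -> action
        state, reward, done = env.step(action)   # environment: action -> new state, reward
        ret += discount * reward
        discount *= gamma
        if done:
            break
    return ret


def always_right(state: int) -> int:
    """A fixed (non-learning) policy that always moves right."""
    return 1


print(run_episode(ChainEnv(5), always_right))  # ≈ 0.729 = 0.9**3: goal reached on the 4th step
```

A learning agent would replace the fixed `always_right` policy with one that improves from the observed rewards.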


How: the Course


A formal and rigorous approach to the RL way of decision-making under uncertainty


What: the Highlights of the Course

How to model an RL problem
- What: Markov decision process
- Tools: probability, processes, Markov chain

How to solve an RL problem exactly

How to solve an RL problem incrementally

How to explore efficiently in an RL problem

How to solve an RL problem approximately

With examples from resource optimization, trade execution, (computer) games, and recommendation systems.
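For the first highlight, a finite Markov decision process can be stored as a transition table `P[s][a][s2]` and a reward table `R[s][a]`. Fixing a policy turns it into a Markov chain, which is where the probability and Markov-chain tools come in. The sketch below assumes finite state and action spaces and a deterministic policy; the `FiniteMDP` class, the `induced_chain` helper and the two-state example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FiniteMDP:
    """Hypothetical container for a finite MDP with tabular dynamics and rewards."""

    P: list[list[list[float]]]  # P[s][a][s2]: probability of s -> s2 under action a
    R: list[list[float]]        # R[s][a]: expected immediate reward of action a in s

    def __post_init__(self) -> None:
        # Every row P[s][a] must be a probability distribution over next states.
        for s, rows in enumerate(self.P):
            for a, row in enumerate(rows):
                if any(p < 0 for p in row) or abs(sum(row) - 1.0) > 1e-9:
                    raise ValueError(f"P[{s}][{a}] is not a probability distribution")

    @property
    def n_states(self) -> int:
        return len(self.P)


def induced_chain(mdp: FiniteMDP, policy: list[int]) -> tuple[list[list[float]], list[float]]:
    """For a deterministic policy (policy[s] = action), return (P_pi, r_pi).

    P_pi[s][s2] = P[s][policy[s]][s2] is a Markov-chain transition matrix and
    r_pi[s] = R[s][policy[s]] is the reward collected in each state.
    """
    P_pi = [list(mdp.P[s][policy[s]]) for s in range(mdp.n_states)]
    r_pi = [mdp.R[s][policy[s]] for s in range(mdp.n_states)]
    return P_pi, r_pi


# Hypothetical 2-state, 2-action example: action 0 = "stay", action 1 = "switch".
mdp = FiniteMDP(
    P=[[[0.9, 0.1], [0.2, 0.8]],
       [[0.1, 0.9], [0.7, 0.3]]],
    R=[[0.0, -1.0],
       [1.0, 0.0]],
)
P_pi, r_pi = induced_chain(mdp, policy=[1, 0])  # switch in state 0, stay in state 1
print(P_pi, r_pi)
```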



How to solve an RL problem exactly
- What: Dynamic programming
- Tools: fixed point, operators
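The fixed-point view of dynamic programming can be illustrated with value iteration: the Bellman optimality operator T is a γ-contraction in the sup-norm, so iterating V ← TV from any starting point converges to its unique fixed point V*. The pure-Python sketch below assumes a tabular layout `P[s][a][s2]`, `R[s][a]` and a discount γ < 1; the function names and the example MDP are hypothetical.

```python
def bellman_optimality(V, P, R, gamma):
    """Apply T: (T V)(s) = max_a [ R[s][a] + gamma * sum_s2 P[s][a][s2] * V[s2] ]."""
    return [
        max(R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
            for a in range(len(R[s])))
        for s in range(len(V))
    ]


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate V_{k+1} = T V_k from V_0 = 0 until successive iterates are tol-close.

    Because T is a gamma-contraction in the sup-norm (gamma < 1), the Banach
    fixed-point theorem guarantees convergence to the unique fixed point V*.
    """
    V = [0.0] * len(P)
    for _ in range(max_iter):
        V_new = bellman_optimality(V, P, R, gamma)
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical 2-state example: state 1 is absorbing with reward 1 per step;
# in state 0 the agent can stay (action 0) or move to state 1 (action 1), reward 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0]]]
R = [[0.0, 0.0],
     [1.0]]
V_star = value_iteration(P, R, gamma=0.9)
print(V_star)  # approximately [9.0, 10.0]
```

Here V*(1) = 1/(1 - 0.9) = 10 and V*(0) = 0.9 · V*(1) = 9, which the iteration recovers up to the tolerance.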








How to solve an RL problem incrementally
- What: temporal difference, Q-learning
- Tools: stochastic approximation
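In the incremental setting, tabular Q-learning replaces the exact Bellman backup with a sampled one and moves Q(s, a) a small step α toward it, which is a stochastic-approximation update. The sketch below is a minimal version under assumptions of our own: a simulator `step(s, a) -> (s2, r, done)`, ε-greedy exploration with random tie-breaking, a constant step size, and a hypothetical five-state chain (`chain_step`) as the test problem.

```python
import random


def q_learning(step, n_states, n_actions, start, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning; step(s, a) -> (s2, r, done) is an assumed simulator."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # epsilon-greedy exploration, breaking ties between greedy actions at random
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([b for b in range(n_actions) if Q[s][b] == best])
            s2, r, done = step(s, a)
            # sampled Bellman target: no bootstrapping from a terminal state
            target = r if done else r + gamma * max(Q[s2])
            # stochastic-approximation step: move Q(s, a) a fraction alpha toward the target
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q


def chain_step(s, a, n=5):
    """Hypothetical deterministic chain: action 1 moves right, action 0 moves left,
    reward 1 only when the rightmost state n - 1 is reached (the episode ends there)."""
    s2 = min(max(s + (1 if a == 1 else -1), 0), n - 1)
    done = s2 == n - 1
    return s2, (1.0 if done else 0.0), done


Q = q_learning(chain_step, n_states=5, n_actions=2, start=0)
# Greedy action in each non-terminal state (1 = move right toward the reward).
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(greedy)
```

Unlike value iteration, this never reads the transition probabilities: it only uses sampled transitions from `step`.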








How to explore efficiently in an RL problem
- What: multi-armed bandit problem
- Tools: concentration inequalities
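The UCB1 strategy for stochastic multi-armed bandits shows how a concentration inequality becomes an exploration rule: each arm's empirical mean is inflated by a Hoeffding-style confidence width, so rarely pulled arms keep being tried. UCB1 is one standard bandit algorithm chosen here for illustration, not necessarily the one covered in the lectures; the Bernoulli arm means and the `pull` callback are hypothetical.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; pull(i) returns a reward in [0, 1].

    Returns how many times each arm was pulled.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # empirical mean + confidence width derived from Hoeffding's inequality
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


rng = random.Random(42)
means = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arms


def bernoulli_pull(i):
    return 1.0 if rng.random() < means[i] else 0.0


counts = ucb1(bernoulli_pull, n_arms=3, horizon=2000)
print(counts)  # most pulls should go to the best arm (index 2)
```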








How to solve an RL problem approximately
- What: approximate dynamic programming
- Tools: statistical learning theory










When/What/Where

- 7 lectures
- 4 practical sessions (and homework) [1 point each]
- 1 final project (report and oral presentation) [16 points]

Opportunities for spring internships and Ph.D. positions.


researchers.lille.inria.fr/~lazaric/Webpage/Teaching.html

Date     Topic                            Classroom
29/09    Intro/MDP                        Conference
06/10    Dynamic Programming              Condorcet
13/10    RL Algorithms                    Condorcet
20/10    TP on DP and RL                  Condorcet
27/10    Multi-arm Bandit (1)             Condorcet
03/11    TP on Bandit                     Amphi Curie
10/11    Multi-arm Bandit (2) [projects]  Amphi Curie
17/11    TP on Bandit                     Condorcet
24/11    Approximate DP                   Condorcet
01/12    TP on ADP                        Condorcet
08/12    Sample Complexity of ADP         Condorcet
15/12    Guest lecture (TBD)
mid-Jan  Evaluation (TBD)

Lectures run from 11am to 1pm; practical sessions (TP) run from 11am to 1:15pm.


Reinforcement Learning

Alessandro Lazaric

sequel.lille.inria.fr
