
Introduction to Real-Life Reinforcement Learning

Michael L. Littman

Rutgers University

Department of Computer Science

Brief History

Idea for the symposium came out of a discussion I had with Satinder Singh @ ICML 2003 (DC).

Both were starting new labs. Wanted to highlight an important challenge in RL.

Felt we could help create some momentum by bringing together like-minded researchers.

Attendees (Part I)

• ABRAMSON, MYRIAM
• BAGNELL, JAMES
• BENTIVEGNA, DARRIN
• BLANK, DOUGLAS
• BOOKER, LASHON
• DIUK, CARLOS
• FAGG, ANDREW
• FIDELMAN, PEGGY
• FOX, DIETER
• GORDON, GEOFFREY
• GREENWALD, LLOYD
• GROUNDS, MATTHEW
• JONG, NICHOLAS
• LANE, TERRAN
• LANGFORD, JOHN
• LEROUX, DAVE
• LITTMAN, MICHAEL
• MCGLOHON, MARY
• MCGOVERN, AMY
• MEEDEN, LISA

Attendees (More)

• MIIKKULAINEN, RISTO
• MUSLINER, DAVID
• PETERS, JAN
• PINEAU, JOELLE
• PROPER, SCOTT

Definitions

What is “reinforcement learning”?

• Decision making driven to maximize a measurable performance objective.

What is “real life”?

• “Measured” experience. Data doesn’t come from a model with known or pre-defined properties/assumptions.

Multiple Lives

• Real-life learning (us): use real data, possibly small (even toy) problems

• Life-sized learning (Kaelbling): large state spaces, possibly artificial problems

• Life-long learning (Thrun): same learning system, different problems (somewhat orthogonal)

Find The Ball

Learn:

• which way to turn

• to minimize steps

• to see goal (ball)

• from camera input

• given experience.

The RL Problem

Input: <s_1, a_1, s_2, r_1>, <s_2, a_2, s_3, r_2>, …, s_t

Output: a_t's to maximize the discounted sum of the r_i's.

(Slide example, camera images omitted: <image, right, image, +1>.)
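
Written out (one standard way to state the objective; the indexing convention is my own, following the r_i notation above), the a_t's are chosen to maximize the expected discounted return with discount factor γ, 0 ≤ γ < 1:

```latex
% Expected discounted return from time t; gamma is the discount factor.
G_t \;=\; \mathbb{E}\!\left[\,\sum_{i \ge t} \gamma^{\,i-t}\, r_i \right],
\qquad 0 \le \gamma < 1 .
```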

Problem Formalization: MDP

Most popular formalization: Markov decision process

Assume:

• States/sensations, actions discrete.

• Transitions, rewards stationary and Markov.

• Transition function: Pr(s’|s,a) = T(s,a,s’).

• Reward function: E[r|s,a] = R(s,a).

Then:

• Optimal policy: π*(s) = argmax_a Q*(s,a)

• where Q*(s,a) = R(s,a) + γ Σ_{s'} T(s,a,s') max_{a'} Q*(s',a')

Find the Ball: MDP Version

• Actions: rotate left/right

• States: orientation

• Reward: +1 for facing ball, 0 otherwise
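
To make the Q* recursion from the previous slide concrete on this toy problem, here is a minimal sketch (mine, not from the talk): it assumes 8 discrete orientations, deterministic rotate-left/rotate-right transitions, and reward +1 in the orientation facing the ball, and computes Q* by repeatedly applying the recursion (value iteration).

```python
# Minimal sketch (illustrative, not from the talk): the "Find the Ball" MDP
# with 8 discrete orientations. Orientation 0 faces the ball. Actions rotate
# one step left or right; reward is +1 whenever the current orientation is 0.
import numpy as np

N = 8                      # number of orientations (an assumption; any N works)
ACTIONS = (-1, +1)         # rotate left, rotate right
GAMMA = 0.9                # discount factor

def T(s, a, s2):
    """Transition function Pr(s2 | s, a): deterministic rotation."""
    return 1.0 if s2 == (s + a) % N else 0.0

def R(s, a):
    """Reward function E[r | s, a]: +1 for facing the ball, 0 otherwise."""
    return 1.0 if s == 0 else 0.0

# Value iteration: repeatedly apply
#   Q(s,a) <- R(s,a) + gamma * sum_s' T(s,a,s') * max_a' Q(s',a')
Q = np.zeros((N, len(ACTIONS)))
for _ in range(1000):
    Q = np.array([[R(s, a) + GAMMA * sum(T(s, a, s2) * Q[s2].max() for s2 in range(N))
                   for a in ACTIONS]
                  for s in range(N)])

# Greedy policy: which way to turn from each orientation.
policy = [ACTIONS[int(np.argmax(Q[s]))] for s in range(N)]
print("greedy rotation per orientation:", policy)
```

The printed greedy policy rotates each orientation toward the ball along the shorter direction (with ties at the orientation directly opposite the ball and at the ball itself).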

It Can Be Done: Q-learning

Since the optimal Q function is sufficient, use experience to estimate it (Watkins & Dayan, 1992).

Given <s, a, s', r>:
Q(s,a) ← Q(s,a) + α_t ( r + γ max_{a'} Q(s',a') − Q(s,a) )

If:

• all (s,a) pairs updated infinitely often

• Pr(s’|s,a) = T(s,a,s’), E[r|s,a] = R(s,a)

• Σ_t α_t = ∞, Σ_t α_t² < ∞

Then: Q(s,a) → Q*(s,a)
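
And a minimal tabular Q-learning sketch (again mine, on the same assumed toy orientation task) showing exactly this update, with epsilon-greedy exploration and a 1/n(s,a) step size that satisfies the conditions above:

```python
# Minimal sketch of tabular Q-learning (Watkins & Dayan, 1992), illustrated on
# the same assumed toy task: 8 orientations, rotate left/right, reward +1 when
# facing the ball (orientation 0). Not the speaker's code; just the update rule.
import random

N, ACTIONS, GAMMA = 8, (-1, +1), 0.9
Q = [[0.0, 0.0] for _ in range(N)]        # Q[s][action_index]
count = [[0, 0] for _ in range(N)]        # visit counts, used for alpha_t = 1/n(s,a)

def step(s, a):
    """Environment: deterministic rotation; reward +1 when facing the ball."""
    r = 1.0 if s == 0 else 0.0
    return (s + a) % N, r

s = random.randrange(N)
for _ in range(100_000):
    # epsilon-greedy exploration keeps every (s,a) pair updated infinitely often
    i = random.randrange(2) if random.random() < 0.1 else max((0, 1), key=lambda j: Q[s][j])
    s2, r = step(s, ACTIONS[i])
    count[s][i] += 1
    alpha = 1.0 / count[s][i]             # sum(alpha) = inf, sum(alpha^2) < inf
    Q[s][i] += alpha * (r + GAMMA * max(Q[s2]) - Q[s][i])
    s = s2

print("learned greedy rotation per orientation:",
      [ACTIONS[max((0, 1), key=lambda j: Q[s][j])] for s in range(N)])
```

After enough steps, the greedy policy it learns should match the value-iteration result shown earlier.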

Real-Life Reinforcement Learning

Emphasize learning with real* data.

Q-learning good, but might not be right here…

Mismatches to “Find the Ball” MDP:

• Efficient exploration: data is expensive

• Rich sensors: never see the same thing twice

• Aliasing: different states can look similar

• Non-stationarity: details change over time

* Or, if simulated, from simulators developed outside the AI community

RL2: A Spectrum

(Spectrum diagram. Points along the spectrum: unmodified physical world, controlled physical world, electronic-only world, detailed simulation, lab-created simulation, pure math world. The real-world end is labeled RLRL, the math end RL, with an "RLRL gray zone" between them.)

Unmodified Physical World

weight loss (BodyMedia)

helicopter (Bagnell)

Controlled Physical World

Mahadevan and Connell, 1990

Electronic-only World

Recovery from corrupted network interface configuration.

Java/Windows XP: Minimize time to repair.

Littman, Ravi, Fenson, Howard, 2004

After 95 failure episodes

Learning to sort fast (Littman & Lagoudakis)

Pure Math World

backgammon (Tesauro)

Detailed Simulation

• Independently developed

elevator control (Crites, Barto)

RARS video game

Robocup Simulator

Lab-created Simulation

Car on the Hill

Taxi World

The Plan

Talks, Panels

Talk slot: 30 minutes, shoot for 25 minutes to leave time for switchover, questions, etc.

Try plugging in during a break.

Panel slot: 5 minutes per panelist (slides optional), will use the discussion time

Friday, October 22nd, AM

9:00 Michael Littman, Introduction to Real-Life Reinforcement Learning

9:30 Darrin Bentivegna, Learning From Observation and Practice Using Primitives

10:00 Jan Peters, Learning Motor Primitives with Reinforcement Learning

10:30 break

11:00 Dave LeRoux, Instance-Based Reinforcement Learning on the Sony Aibo Robot

11:30 Bill Smart, Applying Reinforcement Learning to Real Robots: Problems and Possible Solutions

12:00 HUMAN-LEVEL AI PANEL, Roy

12:30 lunch break

Friday, October 22nd, PM

2:00 Andy Fagg, Learning Dexterous Manipulation Skills Using the Control Basis

2:30 Dan Stronger, Simultaneous Calibration of Action and Sensor Models on a Mobile Robot

3:00 Dieter Fox, Reinforcement Learning for Sensing Strategies

3:30 break

4:00 Roberto Santiago, What is Real Life? Using Simulation to Mature Reinforcement Learning

4:30 OTHER MODELS PANEL, Diuk, Greenwald, Lane

5:00 Gerry Tesauro, RL-Based Online Resource Allocation in Multi-Workload Computing Systems

5:30 session ends

Saturday, October 23rd, AM

9:00 Drew Bagnell, Practical Policy Search

9:30 John Moody, Learning to Trade via Direct Reinforcement

10:00 Risto Miikkulainen, Learning Robust Control and Complex Behavior Through Neuroevolution

10:30 break

11:00 Michael Littman, Real Life Multiagent Reinforcement Learning

11:30 MULTIAGENT PANEL, Stone, Riedmiller, Moody, Bowling

12:00 HIERARCHY/STRUCTURED REPRESENTATIONS PANEL, Tadepalli, McGovern, Jong, Grounds

12:30 lunch break

(Joint with Artificial Multi-Agent Learning)

Saturday, October 23rd, PM

2:00 Lisa Meeden, Self-Motivated, Task-Independent Reinforcement Learning for Robots

2:30 Marge Skubic and David Noelle, A Biologically Inspired Adaptive Working Memory for Robots

3:00 COGNITIVE ROBOTICS PANEL, Blank, Noelle, Booksbaum

3:30 break

4:00 Peggy Fidelman, Learning Ball Acquisition and Fast Quadrupedal Locomotion on a Physical Robot

4:30 John Langford, Real World Reinforcement Learning Theory

5:00 OTHER TOPICS PANEL, Abramson, Proper, Pineau

5:30 session ends

(Joint with Cognitive Robotics)

Sunday, October 24th, AM

9:00 Satinder Singh, RL for Human Level AI

9:30 Geoff Gordon, Learning Valid Predictive Representations

10:00 Yasutake Takahashi, Abstraction of State/Action based on State Value Function

10:30 break

11:00 Martin Riedmiller/Stephan Timmer, RL for technical process control

11:30 Matthew Taylor, Speeding Up Reinforcement Learning with Behavior Transfer

12:00 Discussion: Wrap Up, Future Plans

12:30 symposium ends

Plenary

Saturday (tomorrow) night

6pm-7:30pm Plenary

Each symposium gets a 10-minute slot

Ours: Video. I need today’s speakers to join me for lunch and also immediately after the session today.

Darrin’s Summary

• extract features

• domain knowledge

• function approximators

• bootstrap learning/behavior transfer

• improve current skill

• learn skill initially using other methods

• start with low-level skills

What Next?

• Collect successes to point to

– Contribute to newly created page:

http://neuromancer.eecs.umich.edu/cgi-bin/twiki/view/Main/SuccessesOfRL

– We’re already succeeding (ideas are spreading)

– rejoice: control theorists are scared of us

• Sources of information

– This workshop web site:

http://www.cs.rutgers.edu/~mlittman/rl3/rl2/

– Will include pointers to slides, papers

– Can include twiki links or a pointer from RL repository.

– Michael requesting slides / URLs / videos (up front).

– Newly created Myth Page: http://neuromancer.eecs.umich.edu/cgi-bin/twiki/view/Main/MythsofRL

Other Activities

• Possible Publication Activities

– special issue of a journal (JMLR? JAIR?)

– edited book

– other workshops

– guidebook for newbies

– textbook?

• Benchmarks

– Upcoming NIPS workshop on benchmarks

– We need to push for including real-life examples

– greater set of domains, make an effort to widen applications

Future Challenges

• How can we better talk about the inherent problem difficulty? Problem classes?

• Can we clarify the distinction between control theory and AI problems?

• Stress making sequential decisions (outside robotics as well).

• What about structure? Can we say more?

• Need to encourage a fresh perspective.

• Help convey how to see problems as RL problems.