Reinforcement Learning for CPS Safety Engineering
Sam Green, Çetin Kaya Koç, Jieliang Luo
University of California, Santa Barbara
Source: koclab.cs.ucsb.edu/cpsed/files/Green1.pdf


Motivations

Safety-critical duties desired by CPS?

• Autonomous vehicle control: UAV, passenger vehicles, delivery trucks
• Automatically responding to, or preventing, damage
• Industrial robot control for use around humans
• Large process automation
  • E.g., optimization of factory

Reinforcement Learning

Georgia Tech, https://www.youtube.com/watch?v=f2at-cqaJMM

DeepMind, https://arxiv.org/abs/1707.02286

Machine Learning

• Supervised
• Unsupervised
• Reinforcement

Introduction to RL

• A computational approach to learning from interaction
• Established in the 1980s
• Objective is to take actions to maximize a reward (or minimize a cost)
• Seen as a path toward Artificial General Intelligence
• RL is at the intersection of
  • Psychology
  • Control Theory
  • Computer Science/AI
• Resurgence with the advent of deep learning methods

Advances in RL since 2015

[Figure omitted; entries are labeled by year, 2015 and 2016]

[Mnih et al. Asynchronous Methods for Deep Reinforcement Learning, 2016]

Terminology

• Agent: the thing we are learning to control
• Environment: all the factors affecting the agent
• Action: performed by the agent in an attempt to effect change on the environment
• Reward: returned by the environment to the agent after the agent takes an action. Used to help the agent learn.
  • AKA the negative cost
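The four terms above fit together in a single interaction loop. The following is a minimal sketch under stated assumptions: `CorridorEnv`, `run_episode`, and the two agents are hypothetical names invented for illustration, and the corridor world is a toy stand-in, not an environment from the slides.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the goal cell."""

    def __init__(self, length=5):
        self.goal = length - 1
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.position = max(0, min(self.goal, self.position + action))
        done = self.position == self.goal
        reward = -1.0  # the environment returns a reward after every action
        return self.position, reward, done


def run_episode(env, agent, max_steps=1000):
    """Run one agent-environment loop and return the total reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = agent(state)                   # the agent picks an action
        state, reward, done = env.step(action)  # the environment responds
        total += reward
        if done:
            break
    return total


def random_agent(state):
    """Ignores the state and moves left or right at random."""
    return random.choice((-1, 1))


def always_right(state):
    """Always moves toward the goal."""
    return 1
```

An agent that always moves right needs four steps in a five-cell corridor, so `run_episode(CorridorEnv(5), always_right)` totals four rewards of -1; the random agent can only do as well or worse.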

[R. Sutton and A. Barto. Reinforcement Learning: An Introduction. 2016]

Markov Decision Process

• What RL solves
• Environments where the agent's decisions depend only on the present state
  • An object in flight
  • Self-driving car
  • Manufacturing process
  • Robot control
• It's not that the past doesn't matter, but the laws of physics guarantee certain things, e.g. momentum
• Methods also exist to solve approximate MDPs
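The "only dependent on the present" condition is usually written as the Markov property. The form below is the standard textbook statement rather than a formula from the slides; $S_t$ and $A_t$ denote the state and action at time $t$.

```latex
\[
P(S_{t+1} \mid S_t, A_t) = P(S_{t+1} \mid S_1, A_1, \ldots, S_t, A_t)
\]
```

In the momentum example, this holds as long as velocity is part of the state: once position and velocity are both known, earlier positions add no further information about the next state.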

Example: Student Markov Chain

[Figure omitted; annotation: "Start here at the beginning of each episode"]

[http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf]
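Sampling episodes from a Markov chain like this one takes only a few lines. In the sketch below, the state names and transition probabilities are assumed values patterned on the cited lecture's example and should be checked against the original figure; `TRANSITIONS` and `sample_episode` are hypothetical names.

```python
import random

# Assumed transition probabilities (verify against the cited lecture notes).
TRANSITIONS = {
    "Class 1": {"Class 2": 0.5, "Facebook": 0.5},
    "Class 2": {"Class 3": 0.8, "Sleep": 0.2},
    "Class 3": {"Pass": 0.6, "Pub": 0.4},
    "Pass": {"Sleep": 1.0},
    "Pub": {"Class 1": 0.2, "Class 2": 0.4, "Class 3": 0.4},
    "Facebook": {"Facebook": 0.9, "Class 1": 0.1},
}
START, TERMINAL = "Class 1", "Sleep"


def sample_episode(rng=random):
    """Walk the chain from the start state until the terminal state."""
    state = START
    episode = [state]
    while state != TERMINAL:
        successors = TRANSITIONS[state]
        # The next state depends only on the current state (Markov property).
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
        episode.append(state)
    return episode
```

Calling `sample_episode()` repeatedly yields different paths, but every one starts in the start state and ends in the terminal state.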

RL for CPS Safety Engineering

• Its interdisciplinary nature makes RL interesting for CPS engineering
  • AI, ML (Math, Statistics)
  • Mechanics design and simulation (ME, Physics, CS)
  • Programming and implementation (CS, EE)

Mountain Car Example

Page 16: Reinforcement Learning for CPS Safety Engineeringkoclab.cs.ucsb.edu/cpsed/files/Green1.pdf · Reinforcement Learning for CPS Safety Engineering Sam Green, ÇetinKaya Koç, JieliangLuo

• Agentisanunderpoweredcarwith3actions:• Backward,Neutral,Forward

• Reward:=-1pertimestep• Implicitgoal:=Reachtheflagasfastaspossible

• State:=x-pos andvelocity

Canonicalexample:MountainCar

[R.Sutton,andA.Barto.ReinforcementLearning:AnIntroduction.2016]
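The Mountain Car task can be simulated in a few lines. This sketch assumes the dynamics and bounds of the standard textbook formulation (the constants 0.001 and 0.0025, position in [-1.2, 0.5], speed capped at 0.07); the slides do not give these values, and the `step` function and its action encoding are hypothetical.

```python
import math

MIN_POS, MAX_POS = -1.2, 0.5  # assumed track bounds; the flag is at MAX_POS
MAX_SPEED = 0.07              # assumed speed limit


def step(position, velocity, action):
    """Advance one time step; action is -1 (Backward), 0 (Neutral) or +1 (Forward)."""
    # Engine thrust is weaker than gravity on the steep part of the hill.
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POS, min(MAX_POS, position))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # hitting the left wall stops the car
    done = position >= MAX_POS
    reward = -1.0  # -1 per time step, so faster episodes score higher
    return position, velocity, reward, done
```

Because the thrust term is smaller than the gravity term on the slope, driving forward from the valley floor is not enough; the car has to rock back and forth to build momentum, which is what makes the task a useful toy problem.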

Model-Free Control via Policy-Based RL

• A simple physics model determines the behavior of the car
  • Captures position of the car on the hill
  • Captures effect of limited engine power
• Using a physics model simplifies the approach
  • Use an efficient traditional controller
• But in many scenarios the model is not available or is too complex
  • Amazon package delivery drone
• Solve mountain car using a sophisticated method, as a toy example
  • Directly train a neural network-based policy

RL Terminology and Notation

• $S_t$: state of the environment at time $t$
  • x-axis position and velocity
• $A_t$: action taken by the agent at time $t$
  • Backward, Neutral, Forward
• $\pi$: the policy function; returns the next action to take. Stochastic in this example
• $\theta$: a parameter vector for the policy, i.e. the weights learned in a neural network

Putting everything together: $A_t \sim \pi_\theta(A_t \mid S_t) = P(A_t \mid S_t, \theta)$

The policy $\pi_\theta$

• $\pi_\theta$ is often approximated
• Deep neural networks are powerful function approximators
• We will use gradient ascent to optimize the DNN

The policy function $\pi_\theta$, approximated by NN

• State information at time $t$:
  • Position and Velocity
• Action options at time $t$:
  • Forward acceleration
  • Neutral
  • Backward acceleration

[Figure: network $\pi_\theta$ with inputs Position and Velocity and outputs Prob(F), Prob(N), Prob(B)]
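The input/output shape of such a policy network can be sketched in plain Python. The architecture here is an assumption for illustration (one hidden layer of 8 tanh units with random, untrained weights); the slides do not specify layer sizes, and `init_policy`, `action_probs`, and `sample_action` are hypothetical names.

```python
import math
import random


def init_policy(n_hidden=8, seed=0):
    """Create random parameters theta for a 2 -> n_hidden -> 3 network."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.5) for _ in range(2)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(3)]
    b2 = [0.0] * 3
    return w1, b1, w2, b2


def action_probs(theta, position, velocity):
    """Forward pass: returns [Prob(F), Prob(N), Prob(B)]."""
    w1, b1, w2, b2 = theta
    x = (position, velocity)
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    # Softmax turns the three scores into probabilities that sum to 1.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(theta, position, velocity, rng=random):
    """Draw F, N, or B according to the policy's probabilities."""
    probs = action_probs(theta, position, velocity)
    return rng.choices(("F", "N", "B"), weights=probs)[0]
```

Sampling from the output distribution, rather than always taking the most probable action, is what makes the policy stochastic and gives the agent a way to explore.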

Reward function

• At every time step take an action
  • Forward, neutral, backward
• Each action has a reward of -1
• Train the agent to reach the flag in the minimum number of time steps
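With a reward of -1 per action, the total reward of an episode is just the negated episode length. Assuming an undiscounted return and writing $T$ for the number of steps taken before the flag is reached (notation not used in the slides):

```latex
\[
G = \sum_{t=1}^{T} R_t = \sum_{t=1}^{T} (-1) = -T
\]
```

Maximizing $G$ is therefore the same as minimizing $T$, which is why reaching the flag quickly is only an implicit goal of this reward function.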

Example: Markov Reward Process

[Figure omitted; annotation: "Start here at the beginning of each episode"]

[http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf]

How to train the NN?

• Small networks can be effectively trained with genetic algorithms
• Genetic algorithms work poorly with large networks (parameter space is too large)
• Gradient-ascent optimization works with large parameter spaces

[Figure: network $\pi_\theta$ with inputs Position and Velocity and outputs Prob(F), Prob(N), Prob(B)]

Monte-Carlo Policy Gradient (REINFORCE)

• Find DNN parameter vector $\theta$ such that $\pi_\theta$ maximizes the reward
• For every episode, until the flag is reached
  • Get state information (position & velocity) from the environment
  • Feed the NN with the state information
  • NN will output a probability for (F)orward, (N)eutral, and (B)ackward
  • Randomly select action F, N, or B (using the above probabilities)
  • Store the state information and action taken
• Once the flag is reached
  • Assign the most reward to the last action … least reward to the first action
  • Update $\theta$ s.t. actions made at the end are more probable

[http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html]
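The update performed once the flag is reached can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: it swaps the neural network for a linear softmax policy to stay short, and it uses discounted returns with a mean baseline as one common way to give later actions more credit than earlier ones. All names (`policy_probs`, `reinforce_update`, `gamma`, `lr`) are hypothetical.

```python
import math
import random

ACTIONS = ("B", "N", "F")  # Backward, Neutral, Forward


def policy_probs(theta, state):
    """Linear softmax policy; theta has one [w_pos, w_vel, bias] row per action."""
    feats = (state[0], state[1], 1.0)
    logits = [sum(w * f for w, f in zip(row, feats)) for row in theta]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(theta, state, rng=random):
    """Randomly select an action index using the policy's probabilities."""
    return rng.choices(range(len(ACTIONS)), weights=policy_probs(theta, state))[0]


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of the episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    out.reverse()
    return out


def reinforce_update(theta, episode, gamma=0.99, lr=0.01):
    """One gradient-ascent step; episode is a list of (state, action, reward)."""
    returns = discounted_returns([r for _, _, r in episode], gamma)
    baseline = sum(returns) / len(returns)  # mean baseline reduces variance
    grad = [[0.0] * 3 for _ in theta]
    for (state, action, _), g in zip(episode, returns):
        probs = policy_probs(theta, state)
        feats = (state[0], state[1], 1.0)
        for k in range(len(theta)):
            # d log pi(action | state) / d logit_k = 1[k == action] - pi(k | state)
            dlogit = (1.0 if k == action else 0.0) - probs[k]
            for j, f in enumerate(feats):
                grad[k][j] += (g - baseline) * dlogit * f
    return [[w + lr * gw for w, gw in zip(row, grow)]
            for row, grow in zip(theta, grad)]
```

With rewards of -1 per step, the discounted return is least negative for the final actions, so after subtracting the baseline the late actions receive positive weight and become more probable, while the early ones are weakened.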

Monte-Carlo Policy Gradient

• The method leverages techniques created for supervised learning
  • Inputs := the state information (position, velocity)
  • Predictions := forward, neutral, or backward action taken
  • Labels ("ground truth") := After the episode is over, assign the most value to the last actions and the least value to the first actions
• Run many episodes; after each episode finishes (flag is reached), strengthen the network such that the last moves become more probable

[http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html]

Gradient-ascent

• Gradient algorithms find a local extremum
• At the end of each episode, adjust each parameter in $\theta$ s.t. actions made near the end are strengthened
• How much and in which direction to move each parameter is determined by the backpropagation method

[Figure: episode rewards plotted over parameters $\theta_1$ and $\theta_2$]
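Gradient ascent itself is easy to see on a two-parameter toy problem. In this sketch, `toy_objective` is a hypothetical stand-in for the episode-reward surface, with a single peak at (1, -2), and the gradient is estimated by finite differences; a real DNN would obtain the same quantity analytically through backpropagation.

```python
def numerical_gradient(f, theta, eps=1e-6):
    """Central-difference estimate of the gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        down = list(theta)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def gradient_ascent(f, theta, learning_rate=0.1, steps=200):
    """Repeatedly move theta a small step in the uphill direction."""
    theta = list(theta)
    for _ in range(steps):
        grad = numerical_gradient(f, theta)
        theta = [t + learning_rate * g for t, g in zip(theta, grad)]
    return theta


def toy_objective(theta):
    """Hypothetical reward surface with one maximum at theta = (1, -2)."""
    return -((theta[0] - 1.0) ** 2) - (theta[1] + 2.0) ** 2
```

Starting from `[0.0, 0.0]`, `gradient_ascent(toy_objective, [0.0, 0.0])` converges toward the peak. On a surface with several peaks the same procedure would stop at whichever local maximum is uphill from the starting point.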

Caveats

• Deep RL is usually slow to learn
• Transferring knowledge from one problem to another is difficult
• Reward function can be complex

Safety and Security Considerations

• DNNs are black-box models
  • Possible to give an input which causes the DNN to provide wild output
• Efforts to mitigate this limitation
  • E.g. Constrained Policy Optimization

Constrained Policy Optimization

• Textbook RL specifies only the reward function
• Problem: when an agent is learning, it may try anything
  • Potentially unsafe when training is in a physical environment
• Constraints can be added to the objective function

[Achiam et al. "Constrained Policy Optimization", 2017]
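Adding constraints turns the usual reward maximization into a constrained problem. The form below is the general constrained-MDP statement, written as a sketch; the symbols $J$, $J_{C_i}$, $d_i$, and $m$ do not appear in the slides.

```latex
\[
\max_{\theta} \; J(\pi_\theta)
\quad \text{subject to} \quad
J_{C_i}(\pi_\theta) \le d_i, \quad i = 1, \ldots, m
\]
```

Here $J(\pi_\theta)$ is the expected return of the policy, $J_{C_i}(\pi_\theta)$ is its expected cumulative cost under the $i$-th safety cost function, and $d_i$ is the limit that cost must stay below. For a physical robot, one such cost could count collisions, with $d_i$ set close to zero.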

Current Efforts

Developing RL for Quadcopter Control

• Good case study for complex autonomous CPS
  • Collision avoidance
  • Target tracking
  • Package delivery
• Using open source firmware and hardware

Using Microsoft AirSim for 1st-order learning

[S. Shah et al. AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. 2017.]

Conclusions

• RL is a generalizable method to tackle many CPS decision-making problems
  • High-capacity models can make sophisticated decisions
• Good approach for CPS education, because of its interdisciplinary nature
• Open problems when using black-box functions for safety applications


Questions?