Efficient Policy Gradient Optimization/Learning of Feedback Controllers


Chris Atkeson

Punchlines

• Optimize and learn policies.

• Switch from “value iteration” to “policy iteration”.

• This is a big switch from optimizing and learning value functions.

• Use gradient-based policy optimization.

Motivations

• Efficiently design nonlinear policies.

• Make policy-gradient reinforcement learning practical.

Model-Based Policy Optimization

• Simulate the policy u = π(x, p) from some initial states x_0 to find the policy cost.

• Use your favorite local or global optimizer to optimize the simulated policy cost.

• If gradients are used, they are typically numerically estimated.

• Δp = -ε ∑_{x_0} w(x_0) V_p    (1st-order gradient)

• Δp = -(∑_{x_0} w(x_0) V_pp)^{-1} ∑_{x_0} w(x_0) V_p    (2nd-order)
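A minimal sketch of this loop in Python, assuming a hypothetical simulate(p, x0) that rolls out u = π(x, p) from x0 and returns the trajectory cost (the function and variable names are assumptions, not from the slides). V_p and V_pp are estimated by finite differences, matching the two update rules above:

```python
import numpy as np

def numerical_grad(simulate, p, x0, h=1e-5):
    """Finite-difference estimate of V_p = dV(x0, p)/dp. p is a 1-D float vector."""
    base = simulate(p, x0)
    g = np.zeros_like(p)
    for i in range(p.size):
        dp = np.zeros_like(p)
        dp[i] = h
        g[i] = (simulate(p + dp, x0) - base) / h
    return g

def numerical_hess(simulate, p, x0, h=1e-4):
    """Finite-difference estimate of V_pp from gradient differences."""
    g0 = numerical_grad(simulate, p, x0, h)
    H = np.zeros((p.size, p.size))
    for i in range(p.size):
        dp = np.zeros_like(p)
        dp[i] = h
        H[:, i] = (numerical_grad(simulate, p + dp, x0, h) - g0) / h
    return 0.5 * (H + H.T)  # symmetrize

def policy_gradient_step(simulate, p, starts, weights,
                         eps=1e-2, second_order=False):
    """One parameter update over a weighted set of start states x0."""
    g = sum(w * numerical_grad(simulate, p, x0)
            for w, x0 in zip(weights, starts))
    if not second_order:
        return p - eps * g                     # dp = -eps * sum w * V_p
    H = sum(w * numerical_hess(simulate, p, x0)
            for w, x0 in zip(weights, starts))
    # dp = -(sum w * V_pp)^{-1} sum w * V_p, with a small regularizer
    return p - np.linalg.solve(H + 1e-8 * np.eye(p.size), g)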

Can we make model-based policy gradient more efficient?

Analytic Gradients

• Deterministic policy: u = π(x, p)

• Policy iteration (Bellman equation):

V^{k-1}(x, p) = L(x, π(x, p)) + V(f(x, π(x, p)), p)

• Linear models:

f(x, u) = f_0 + f_x Δx + f_u Δu

L(x, u) = L_0 + L_x Δx + L_u Δu

π(x, p) = π_0 + π_x Δx + π_p Δp

V(x, p) = V_0 + V_x Δx + V_p Δp

• Policy Gradient:

V_x^{k-1} = L_x + L_u π_x + V_x (f_x + f_u π_x)

V_p^{k-1} = (L_u + V_x f_u) π_p + V_p
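A sketch of the backward pass these recursions define, assuming per-timestep local model and policy Jacobians along a nominal trajectory are already available; the array names mirror the symbols above, and the zero terminal conditions are an assumption:

```python
import numpy as np

def analytic_policy_gradient(fx, fu, Lx, Lu, pix, pip):
    """Backward recursion for V_x and V_p along an N-step trajectory.

    fx[k]: df/dx (n x n),  fu[k]: df/du (n x m),
    Lx[k]: dL/dx (1 x n),  Lu[k]: dL/du (1 x m),
    pix[k]: dpi/dx (m x n), pip[k]: dpi/dp (m x P).
    Returns V_p at the start of the trajectory (1 x P).
    """
    N = len(fx)
    n = fx[0].shape[0]
    P = pip[0].shape[1]
    Vx = np.zeros((1, n))   # zero terminal cost assumed
    Vp = np.zeros((1, P))
    for k in reversed(range(N)):
        # V_p^{k-1} = (L_u + V_x f_u) pi_p + V_p
        Vp = (Lu[k] + Vx @ fu[k]) @ pip[k] + Vp
        # V_x^{k-1} = L_x + L_u pi_x + V_x (f_x + f_u pi_x)
        Vx = Lx[k] + Lu[k] @ pix[k] + Vx @ (fx[k] + fu[k] @ pix[k])
    return Vp
```

Note that V_p is updated before V_x at each step, since V_p^{k-1} uses V_x of the value function one step ahead. One backward sweep yields the whole gradient, versus dim(p) + 1 rollouts for the finite-difference estimate.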

Handling Constraints

• Lagrange multiplier approach, with constraint violation value function.
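The slide gives no equations; one plausible reading (an assumption) is a min–max Lagrangian in which constraint violations are accumulated by their own value function V^c, computed by the same backward recursion as V:

```latex
\min_p \max_{\lambda \ge 0} \; V(x_0, p) + \lambda^{\top} V^{c}(x_0, p),
\qquad
V^{c}(x_0, p) = \sum_k \max\!\big(0,\; g(x_k, u_k)\big)
```

with the multipliers updated by dual ascent, λ ← max(0, λ + α V^c).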

V_pp: Second-Order Models
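No equations survive for this slide. Differentiating the V_p recursion once more and dropping the second-derivative tensors of f and π (a Gauss–Newton-style approximation; the exact form is an assumption) gives a recursion of the form:

```latex
V_{pp}^{k-1} = \pi_p^{\top}\,(L_{uu} + f_u^{\top} V_{xx} f_u)\,\pi_p
  + \pi_p^{\top} f_u^{\top} V_{xp} + V_{px} f_u \pi_p + V_{pp}
```

This requires companion backward recursions for V_xx and V_xp; for policies linear in p (as in LQBR below), the dropped π_pp term is exactly zero.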

Regularization

LQBR: Linear (dynamics) Quadratic (cost) Bilinear (policy) Regulator
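Reading the name together with the conditions on the “When Will LQBR Work?” slide below, a plausible setup (an assumption; the slide itself shows no equations) is linear dynamics, quadratic cost, and static linear output feedback, which makes the policy bilinear, i.e. jointly linear in the state x and the parameters p:

```latex
x_{k+1} = f_x\, x_k + f_u\, u_k, \qquad
L(x, u) = \tfrac{1}{2}\big(x^{\top} L_{xx}\, x + u^{\top} L_{uu}\, u\big),
```
```latex
y = C x, \qquad u = \pi(x, p) = P\, y = P\, C\, x, \qquad p = \operatorname{vec}(P).
```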

Timing Test

Antecedents

• Optimizing control “parameters” in DDP: Dyer and McReynolds 1970.

• Optimal output feedback design (1960s-1970s)

• Multiple model adaptive control (MMAC)

• Policy gradient reinforcement learning

• Adaptive critics, Werbos: HDP, DHP, GDHP, ADHDP, ADDHP

When Will LQBR Work?

• An initial stabilizing policy is known (“output stabilizable”).

• L_uu is positive definite.

• L_xx is positive semi-definite and (√L_xx, f_x) is detectable.

• The measurement matrix C has full row rank.
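A sketch of how the checkable conditions could be verified numerically, assuming discrete-time dynamics with state matrix f_x; the PBH detectability test and all names here are assumptions:

```python
import numpy as np
from scipy.linalg import sqrtm

def is_pos_def(M, tol=1e-10):
    """L_uu positive definite: all eigenvalues strictly positive."""
    return bool(np.all(np.linalg.eigvalsh(M) > tol))

def is_psd(M, tol=1e-10):
    """L_xx positive semi-definite."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

def is_detectable(C, A, tol=1e-9):
    """PBH test: (C, A) is detectable iff rank([A - s I; C]) = n for
    every eigenvalue s of A with |s| >= 1 (discrete-time instability)."""
    n = A.shape[0]
    for s in np.linalg.eigvals(A):
        if abs(s) >= 1.0 - tol:
            M = np.vstack([A - s * np.eye(n), C])
            if np.linalg.matrix_rank(M, tol) < n:
                return False
    return True

def lqbr_conditions(Luu, Lxx, fx, C):
    """Check all but the first condition (an initial stabilizing
    policy depends on the policy itself and is not checked here)."""
    return (is_pos_def(Luu)
            and is_psd(Lxx)
            and is_detectable(np.real_if_close(sqrtm(Lxx)), fx)
            and np.linalg.matrix_rank(C) == C.shape[0])  # full row rank
```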

Locally Linear Policies

Local Policies

GOAL

Cost Of One Gradient Calculation

Continuous Time

Other Issues

• Model Following

• Stochastic Plants

• Receding Horizon Control/MPC

• Adaptive RHC/MPC

• Combine with Dynamic Programming

• Dynamic Policies -> Learn State Estimator

Optimize Policies

• Policy iteration, with a gradient-based policy improvement step.

• Analytic gradients are easy.

• Non-overlapping sub-policies make second-order gradient calculations fast.

• Big problem: how to choose the policy structure?