Iterative LQR & Model Predictive Control1 Sanjiban Choudhury Iterative LQR & Model Predictive...

1

Sanjiban Choudhury

Iterative LQR & Model Predictive Control

TAs: Matthew Rockett, Gilwoo Lee, Matt Schmittle

Content from Drew Bagnell, Pieter Abeel

Table of Controllers

2

Control Law Uses model Stability Guarantee

Minimize Cost

PID No No No

Pure Pursuit Circular arcs Yes - with assumptions No

Lyapunov Non-linear Yes No

LQR Linear Yes Quadratic

Can we use LQR to swing up a pendulum?

3✓ = ⇡

✓ = 0


3✓ = ⇡

✓ = 0

No! (Large angles imply

large linearization error)


3✓ = ⇡

✓ = 0

No! (Large angles imply

large linearization error)

But we can track a reference swing up trajectory

(small linearization error)

4

But, first we need to talk about time-varying systems

Today’s objectives

5

1. LQR for time-varying systems

2. Trajectory following with iLQR

3. General nonlinear trajectory optimization with iLQR

4. Model predictive control (MPC)

LQR for Time-Varying Dynamical Systems

6

xt+1 = Atxt +Btut


6

xt+1 = Atxt +Btut

c(xt, ut) = x

Tt Qtxt + u

Tt Rtut


6

xt+1 = Atxt +Btut

c(xt, ut) = x

Tt Qtxt + u

Tt Rtut

Straight forward to get LQR equations

Kt = �(Rt +BTt Vt+1Bt)

�1BTt Vt+1At

Vt = Qt +KTt RtKt + (At +BtKt)

TVt+1(At +BtKt)

7

Why do we care about time-varying?

Ans: Linearization about a trajectory

Trajectory tracking for stationary rolls?

8

−10

1 −2

0

2

−3

−2

−1

0

1

2

3

Easting (m)Northing (m)

Dow

n (

m)

9

How do we get such behaviors?

−10

−5

0

5

−6−4

−20

24

6

−2

−1

0

1

2

Easting (m)Northing (m)

Dow

n (m

)

Nose-in funnel Stationary rolls

Figure 1: (Best viewed in color.) (a) Series of snapshots throughout an autonomous flip. (b) Series of snapshotsthroughout an autonomous roll. (c) Overlay of snapshots of the helicopter throughout a tail-in funnel. (d)Overlay of snapshots of the helicopter throughout a nose-in funnel. (See text for details.)

Figure 1: (Best viewed in color.) (a) Series of snapshots throughout an autonomous flip. (b) Series of snapshotsthroughout an autonomous roll. (c) Overlay of snapshots of the helicopter throughout a tail-in funnel. (d)Overlay of snapshots of the helicopter throughout a nose-in funnel. (See text for details.)

Task: Minimize tracking error

10

minu0,u1,...,uT�1

T�1X

t=0

c(xt, ut)

subject to xt+1 = f(xt, ut) 8t

In this scenario, cost is simply a quadratic tracking cost

Why is this a hard optimization problem?

11

Iterative LQR (iLQR)

i=0 i=10 i=100

Start by guessing a control sequence, Forward simulate dynamics, Linearize about trajectory, Solve for new control sequence

and repeat!

Step 1: Get a reference trajectory

12

6

4

22

1

5 0

0

-1

-2

-2

Down (m)

0 -4

-6-5

Note: Simply executing open loop trajectory wont work!

x

ref0 , u

ref0 , x

ref1 , u

ref1 , . . . , x

refT�1, u

refT�1

Step 2: Initialize your algorithm

13

Choose initial trajectory at iteration 0 to linearize about

x

0(t), u0(t) = {x00, u

00, x

01, u

01, . . . , x

0T�1, u

0T�1}


13


x

0(t), u0(t) = {x00, u

00, x

01, u

01, . . . , x

0T�1, u

0T�1}

It’s a good idea to choose the reference trajectory


13


x

0(t), u0(t) = {x00, u

00, x

01, u

01, . . . , x

0T�1, u

0T�1}

It’s a good idea to choose the reference trajectory

Initialization is very important! We will be perturbing this initial trajectory

Step 3: Linearize your dynamics!

14

At a given iteration i, we are going to linearize about

x

i0, u

i0, x

i1, . . .


14

At a given iteration i, we are going to linearize about

x

i0, u

i0, x

i1, . . .

Change of variable - we will track the delta perturbations

�xt = xt � x

it

�ut = ut � uit


15

Reference Trajectory

Linearization Trajectory

x

it

16

Step 3: Linearize your dynamics!�xt = xt � x

it �ut = ut � ui

t

16



t

�xt+1 = At�xt +Bt�ut + (f(xit, u

it)� x

it+1)

A

t

=@f

@x

��x

it

Bt =@f

@u

��uit

16



t

�xt+1 = At�xt +Bt�ut + (f(xit, u

it)� x

it+1)

A

t

=@f

@x

��x

it

Bt =@f

@u

��uit

This is an affine system, not linear

17


Homogenous coordinate system

�xt

1

�

�xt+1

1

�=

At+1 f(xi

t, uit)� x

it+1

0 1

� �xt

1

�+

Bt+1

0

� �ut

0

�

Affine dynamics is now linear!

At Bt

�ut

18

Step 4: Quadricize cost about trajectoryOur cost function is already quadratic, otherwise we would apply

Taylor expansion

c(xt, ut) = (xt � x

reft )TQ(xt � x

reft ) + (ut � u

reft )TR(ut � u

reft )

18

Step 4: Quadricize cost about trajectoryOur cost function is already quadratic, otherwise we would apply

Taylor expansion

c(xt, ut) = (xt � x

reft )TQ(xt � x

reft ) + (ut � u

reft )TR(ut � u

reft )

=

�xt

1

�T Q Q(xi

t � x

reft )

(xit � x

reft )TQ (xi

t � x

reft )T (xi

t � x

reft )

� �xt

1

�

+

�ut

1

�T R R(ui

t � ureft )

(uit � uref

t )TR (uit � uref

t )T (uit � uref

t )

� �ut

1

�

Qt

Rt

We have all the ingredients to call LQR!

19

Kt = �(Rt + BTt Vt+1Bt)

�1BTt Vt+1At

similarly calculate the value function …

20

Step 5: Do a backward pass

KT�1, VT�1

KT�2, VT�2

KT�3, VT�3

K0, V0

Calculate controller gains for all time steps

21

Step 6: Do a forward pass to get new trajectory

x

i+1t = f(xi+1

t , u

i+1t )

Compute control action

u

i+1t = u

it + Kt

x

i+1t � x

it

1

� Apply dynamics

21


x

i+1t = f(xi+1

t , u

i+1t )

x

i+10 = x

i0

ui+10 = ui

0 + Kt

01

�


u

i+1t = u

it + Kt

x

i+1t � x

it

1

� Apply dynamics

22



x

i+1t = f(xi+1

t , u

i+1t )

Apply dynamicsu

i+1t = u

it + Kt

x

i+1t � x

it

1

�

(xi+11 , u

i+11 )

(xi1, u

i1)

23



x

i+1t = f(xi+1

t , u

i+1t )

Apply dynamicsu

i+1t = u

it + Kt

x

i+1t � x

it

1

�

(xi2, u

i2)

(xi+12 , u

i+12 )

24



x

i+1t = f(xi+1

t , u

i+1t )

Apply dynamicsu

i+1t = u

it + Kt

x

i+1t � x

it

1

�

25

Problem: Forward pass will go bonkersWhy?

25


Linearization error gets bigger and bigger and bigger

25


Linearization error gets bigger and bigger and bigger

Remedies: Change cost function to penalize deviation from linearization

Questions

26

Questions

26

1. Can we solve LQR for continuous time dynamics?

Questions

26


Yes! Refer to Continuous Algebraic Ricatti Equations (CARE)

Questions

26



2. Can LQR handle arbitrary costs (not just tracking)?

Questions

26




Yes! Just quadricize the cost

Questions

26





3. What if I want to penalize control derivatives?

Questions

26






No problem! Add control as part of state space

Questions

26







4. Can we handle noisy dynamics?

Questions

26







4. Can we handle noisy dynamics?

Yes! Gaussian noise does not change the answer

Table of Controllers

27

Control Law Uses model Stability Guarantee

Minimize Cost

PID No No No

Pure Pursuit Circular arcs Yes - with assumptions No

Lyapunov Non-linear Yes No

LQR Linear Yes Quadratic

iLQR Non-linear Yes Yes

28

iLQR is just one technique

It’s far from perfect - can’t deal with model errors / constraints …

Model Predictive Control (MPC)

iLQR

DDP

Shooting

CMAESTrajLib

MPPI

Recap: Feedback control framework

29

Look at current state error and compute control actions

Reference


29


xt

x

reft

ut = ⇡(xt, xreft )

Reference


29


xt

x

reft


ut+1 = ⇡(xt+1, xreft+1)

x

reft+1

xt+1

Reference


29


Goal: To drive error to 0 … to optimally drive it to 0

xt

x

reft


ut+1 = ⇡(xt+1, xreft+1)

x

reft+1

xt+1

Reference

Limitations of this framework

30


A fixed control law that looks at instantaneous feedback

Fixed Reference

Why is it so difficult to create a magic control law?

Problem 1: What if we have constraints?

31

Simple scenario: Car tracking a straight line


31


Small error, control within

steering constraints


31




Large error, control violates



31




Large error, control violates


We could “clamp control command” … but what are the implications?

General problem: Complex models

32

xt+1 = f(xt, ut)

g(xt, ut) 0

Dynamics

Constraints

Such complex models imply we need to:

1. Predict the implications of control actions

2. Do corrections NOW that would affect the future

3. It may not be possible to find one law - might need to predict

Example: Rough terrain mobility

33

Problem 2: What if some errors are worse than others?

34

We need a cost function that penalizes states non-uniformly

35

Key Idea:

Frame control as an optimization problem

Model predictive control (MPC)

36

1. Plan a sequence of control actions

2. Predict the set of next states unto a horizon H

3. Evaluate the cost / constraint of the states and controls

4. Optimize the cost


37

xk+1 = f(xk, uk+1)

minut+1,...ut+H

t+H�1X

k=t

J(xk, uk+1)

g(xk, uk+1) 0

(Predict next state with dynamics)

(Constraints)

(Cost)(plan till horizon H)


38

xk+1 = f(xk, uk+1)

minut+1,...ut+H

t+H�1X

k=t

J(xk, uk+1)

g(xk, uk+1) 0

ut

ut+1

ut+2

ut+3ut+4

xt

xt+1

xt+2

xt+3

xt+4

Jt Jt+1

Jt+2Jt+3

Jt+4

How are the controls executed?

39

Step 1: Solve optimization problem to a horizon

xt

xt+1

xt+2

xt+3

xt+4


40

xt xt+1

Step 2: Execute the first control



41

xt xt+1

Step 3: Repeat!

xt+2

xt+3

xt+4

xt+5



MPC is a framework

42

1 1. Concepts 1.1 Main Idea

Model Predictive Control

P(s)%

Objectives Model Constraints

Plant Optimizer

Measurements

Output Input Reference

Objectives Model Constraints

Plan Do

Plan Do

Plan Do Time

Receding horizon strategy introduces feedback.

MPC Part I – Introduction F. Borrelli, C. Jones, M. Morari - Fall Semester 2014 (revised August 2014) 1-4

1 1. Concepts 1.2 Classical Control vs MPC

Table of Contents

1. Concepts1.1 Main Idea1.2 Classical Control vs MPC1.3 Mathematical Formulation

MPC Part I – Introduction F. Borrelli, C. Jones, M. Morari - Fall Semester 2014 (revised August 2014)

Step 3: Repeat!



Why do we need to replan?

43

What happens if the controls are planned once and executed?

Why do we need to replan?

44

What happens if the controls are planned once and executed?


45

xk+1 = f(xk, uk+1)

minut+1,...ut+H

t+H�1X

k=t

J(xk, uk+1)

g(xk, uk+1) 0

(Predict next state with dynamics)

(Constraints)

(Cost)(plan till horizon H)

Iterative LQR & Model Predictive Control1 Sanjiban Choudhury Iterative LQR & Model Predictive...

Documents

Transcript of Iterative LQR & Model Predictive Control1 Sanjiban Choudhury Iterative LQR & Model Predictive...