50 th Anniversary of The Curse of Dimensionality Continuous States: Storage cost: resolution dx...

50th Anniversary of The Curse of Dimensionality

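• Continuous States:

– Storage cost: $\text{resolution}^{d_x}$

– Computational cost: $\text{resolution}^{d_x}$

• Continuous Actions:

– Computational cost: $\text{resolution}^{d_u}$

Here $d_x$ and $d_u$ are the numbers of state and action dimensions.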
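
To make the exponential growth concrete, here is a minimal Python sketch that counts cells in a uniform grid. The function names and the example numbers (100 points per dimension, 8 bytes per stored value) are illustrative assumptions, not taken from the slides.

```python
def grid_cells(resolution: int, dims: int) -> int:
    """Cells in a uniform grid with `resolution` points along each of `dims` axes."""
    return resolution ** dims


def storage_bytes(resolution: int, state_dims: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store one value per state-grid cell (assumed 8-byte floats)."""
    return grid_cells(resolution, state_dims) * bytes_per_value


def sweep_evaluations(resolution: int, state_dims: int, action_dims: int) -> int:
    """Q(x,u) evaluations in one exhaustive sweep over all grid states and actions."""
    return grid_cells(resolution, state_dims) * grid_cells(resolution, action_dims)


if __name__ == "__main__":
    for d in (2, 4, 8):
        print(d, "state dims:", storage_bytes(100, d), "bytes")
```

At 100 points per dimension, each added state dimension multiplies the storage by 100.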

Beating The Curse Of Dimensionality

• Reduce dimensionality (biped examples)

• Use primitives (Poincaré section)

• Parameterize V, policy (future lecture)

• Reduce volume of state space explored

• Use greater depth search

• Adaptive/Problem-specific grid/sampling

– Split where needed

– Random sampling: add where needed

• Random action search

• Random state search

• Hybrid Approaches: combine local and global optimization.

Use Brute Force

• Deal with computational cost by using cluster supercomputer.

• Main issue is minimizing communication between nodes.

Cluster Supercomputing

• (8) Cores w/ small local memory (cache)

• (100) Nodes w/ shared memory (16GB)

• (4-16Gb/s) Network

• (100T) Disks

Q(x,u) = L(x,u) + V(f(x,u))

• c = L(x,u): as in desktop case

• x_next = f(x,u): as in desktop case

• V(x_next):

– Uniform grid

– Multilinear interpolation if all values are available, distance-weighted averaging if some values are bad
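
The lookup of V(x_next) described above can be sketched in Python for a 2D grid with unit spacing. This is a sketch under stated assumptions: bad values are marked with `None`, the fallback uses inverse-distance weights over the valid corners, and the names `interp_value` and `q_value` are hypothetical.

```python
import math


def interp_value(V, x, y):
    """Interpolate V (list of lists, unit grid spacing) at continuous point (x, y)."""
    # Clamp the lower-left corner so all four corners stay inside the grid.
    i0 = min(max(int(math.floor(x)), 0), len(V) - 2)
    j0 = min(max(int(math.floor(y)), 0), len(V[0]) - 2)
    fx, fy = x - i0, y - j0
    corners = [
        (i0, j0, (1 - fx) * (1 - fy)),
        (i0 + 1, j0, fx * (1 - fy)),
        (i0, j0 + 1, (1 - fx) * fy),
        (i0 + 1, j0 + 1, fx * fy),
    ]
    if all(V[i][j] is not None for i, j, _ in corners):
        # All values available: bilinear (multilinear in 2D) interpolation.
        return sum(w * V[i][j] for i, j, w in corners)
    # Some values are bad: inverse-distance weighted average of the valid corners.
    num = den = 0.0
    for i, j, _ in corners:
        if V[i][j] is None:
            continue
        dist = math.hypot(x - i, y - j)
        if dist == 0.0:
            return V[i][j]
        num += V[i][j] / dist
        den += 1.0 / dist
    return num / den if den > 0.0 else None


def q_value(V, x, u, L, f):
    """Q(x,u) = L(x,u) + V(f(x,u)); returns None if no valid neighbor exists."""
    v_next = interp_value(V, *f(x, u))
    return None if v_next is None else L(x, u) + v_next
```

Points outside the grid are extrapolated from the nearest cell in this sketch; a real solver would more likely treat them as out of bounds.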

Allocate grid to cores/nodes

Handle Overlap

Push Updated V’s To Users

So what does this all mean for programming?

• On a node, split grid cells among threads, which execute on cores.

• Share updates of V(x) and u(x) within node almost for free using shared memory.

• Pushing updated V(x) and u(x) to other nodes uses the network, which is relatively slow.

Dealing with the slow network

• Organize grid cells into packet-sized blocks. Send them as a unit.

• Threshold updates: if an update is too small, don’t send it.

• Only do 1/N updates for each block (maximum skip time).

• Tolerate packet loss (UDP) vs. verification (TCP/MPI)
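
The block-level rules above can be sketched as a send filter in Python. This is a sketch under stated assumptions: "maximum skip time" is read as "resend a block once it has gone `max_skip` steps without being sent", which also repairs lost packets, and the names `select_blocks` and `mark_sent` and the default values are hypothetical.

```python
def select_blocks(sent, current, last_sent_step, step, threshold=1e-3, max_skip=10):
    """Return ids of blocks worth sending at this step.

    sent: block id -> values as last sent to other nodes
    current: block id -> values as currently held on this node
    last_sent_step: block id -> step at which the block was last sent
    """
    to_send = []
    for block_id, values in current.items():
        # Largest change of any cell in the block since it was last sent.
        change = max(abs(v - s) for v, s in zip(values, sent[block_id]))
        # Assumed reading of "maximum skip time": force a resend after max_skip steps.
        overdue = step - last_sent_step[block_id] >= max_skip
        if change > threshold or overdue:
            to_send.append(block_id)
    return to_send


def mark_sent(sent, current, last_sent_step, step, block_ids):
    """Record that the given blocks were sent at `step`."""
    for block_id in block_ids:
        sent[block_id] = list(current[block_id])
        last_sent_step[block_id] = step
```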

Use Adaptive Grid

• Reduce computational and storage costs by using adaptive grid.

• Generate adaptive grid using random sampling.

Trajectory-Based Dynamic Programming

Full Trajectories Help Reduce Resolution Needed

[Figure panels: SIDP; Trajectory Based]

Reducing the Volume Explored

An Adaptive Grid Approach

Global Planning: Propagate Value Function Across Trajectories in Adaptive Grid

Growing the Explored Region: Adaptive Grids

Bidirectional Search

Bidirectional Search Closeup

Spine Representation

Growing the Explored Region: Spine Representation

Comparison

One Link Swing Up Needed Only 63 Points

Trajectories For Each Point

Random Sampling of States

• Initialize with a point at the goal with local models based on LQR.

• Choose a random new state x.

• Use the nearest stored point’s local model of the value function to predict the value of the new point ($V_P$).

• Optimize a trajectory from x to the goal. At each step use the nearest stored point’s local model of the policy to create an action. Use DDP to refine this trajectory. $V_T$ is the cost of the trajectory starting from x.

• Store the point at the start of the trajectory if $|V_T - V_P| > \lambda$ (surprise), $V_T < V_{limit}$ and $V_P < V_{limit}$; otherwise discard.

• Interleave re-optimization of all stored points. Only update if $V_{new} < V$ (V is an upper bound on the value).

• Gradually increase $V_{limit}$.
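
The prediction and acceptance steps above can be sketched in Python. This is a sketch under stated assumptions: each stored point carries a first-order local value model (the slides use LQR-based local models, which are quadratic), and the names `nearest`, `predict_value`, `should_store`, and `reoptimized_value` are hypothetical.

```python
import math


def nearest(points, x):
    """Stored point closest to state x (Euclidean distance)."""
    return min(points, key=lambda p: math.dist(p["x"], x))


def predict_value(point, x):
    """V_P: value at x predicted by the point's local model (first-order here)."""
    dx = [a - b for a, b in zip(x, point["x"])]
    return point["V"] + sum(g * d for g, d in zip(point["Vx"], dx))


def should_store(v_traj, v_pred, surprise, v_limit):
    """Store if |V_T - V_P| > lambda and both values are below V_limit."""
    return abs(v_traj - v_pred) > surprise and v_traj < v_limit and v_pred < v_limit


def reoptimized_value(v_old, v_new):
    """Keep the smaller value, since each stored V is an upper bound."""
    return min(v_old, v_new)
```

A point whose trajectory cost matches the neighbor's prediction adds no information, so it is discarded.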

Two Link Pendulum

• Criterion:

[Figure panels: Ankle Angle, Hip Angle, Ankle Torque, Hip Torque]

Four Links

Four Links: 8 dimensional system

Convergence?

• Because we create trajectories to the goal, each value function estimate at a point is an upper bound for the value at that point.

• Eventually all value function entries will be consistent with their nearest neighbor’s local model, and no new points can be added.

• We are using more aggressive acceptance tests for new points: $V_B < \lambda V_P$, $\lambda < 1$, and $V_P < V_{limit}$ vs. $|V_B - V_P| < \epsilon$ and $V_B < V_{limit}$

• Not clear if needed new points can be blocked.

Use Local Models

• Try to achieve a sparse representation using local models.

Linear Quadratic Regulators

Learning From Observation

Regulator tasks

• Examples: balance a pole, move at a constant velocity

• A reasonable starting point is a Linear Quadratic Regulator (LQR controller)

• Might have nonlinear dynamics $x_{k+1} = f(x_k, u_k)$, but since the state stays near $x_d$, can locally linearize: $x_{k+1} = A x_k + B u_k$

• Might have complex scoring function c(x,u), but can locally approximate with a quadratic model $c \approx x^T Q x + u^T R u$

• dlqr() in MATLAB

Linearization Example

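• $I\ddot{\theta} = -mgl\sin(\theta) - \mu\dot{\theta} + \tau$

• Linearize

• Discretize time (time step $T$)

• Vectorize

• $\begin{pmatrix} \theta \\ \dot{\theta} \end{pmatrix}_{k+1} = \begin{pmatrix} 1 & T \\ -mglT/I & 1 - \mu T/I \end{pmatrix} \begin{pmatrix} \theta \\ \dot{\theta} \end{pmatrix}_k + \begin{pmatrix} 0 \\ T/I \end{pmatrix} \tau_k$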
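
The discrete-time matrices above can be built directly from the pendulum parameters. A minimal Python sketch follows; the function names and the numbers in the demo are illustrative assumptions.

```python
def pendulum_discrete(m, g, l, I, mu, T):
    """Return (A, B) for the linearized, Euler-discretized pendulum.

    State is (theta, theta_dot), control is the torque tau, T is the time step.
    """
    A = [[1.0, T],
         [-m * g * l * T / I, 1.0 - mu * T / I]]
    B = [0.0, T / I]
    return A, B


def step(A, B, x, tau):
    """One step of x_{k+1} = A x_k + B tau_k."""
    return [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * tau,
            A[1][0] * x[0] + A[1][1] * x[1] + B[1] * tau]


if __name__ == "__main__":
    A, B = pendulum_discrete(m=1.0, g=9.81, l=1.0, I=1.0, mu=0.1, T=0.01)
    print(step(A, B, [0.1, 0.0], tau=0.0))
```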

LQR Derivation

• Assume V() quadratic: $V_{k+1}(x) = x^T V_{xx:k+1} x$

• $C(x,u) = x^T Q x + u^T R u + (Ax+Bu)^T V_{xx:k+1} (Ax+Bu)$

• Want $\partial C/\partial u = 0$

• $B^T V_{xx:k+1} A x = -(B^T V_{xx:k+1} B + R) u$

• $u = Kx$ (linear controller)

• $K = -(B^T V_{xx:k+1} B + R)^{-1} B^T V_{xx:k+1} A$

• $V_{xx:k} = A^T V_{xx:k+1} A + Q + A^T V_{xx:k+1} B K$
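
The backward recursion above can be iterated to a fixed point. The Python sketch below is restricted to a single control input, so that $B^T V B + R$ is a scalar and no matrix inverse is needed; that restriction, the function names, and the iteration count are assumptions made for brevity. It keeps the slides' convention $u = Kx$; MATLAB's dlqr() returns a gain for $u = -Kx$, so the sign differs.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def lqr_step(A, B, Q, R, V):
    """One backward step. A, Q, V are n x n; B is n x 1; R is a scalar."""
    At, Bt = transpose(A), transpose(B)
    BtV = matmul(Bt, V)                    # 1 x n
    s = matmul(BtV, B)[0][0] + R           # scalar B^T V B + R
    BtVA = matmul(BtV, A)                  # 1 x n
    K = [[-v / s for v in BtVA[0]]]        # K = -(B^T V B + R)^-1 B^T V A
    AtV = matmul(At, V)
    AtVA = matmul(AtV, A)
    AtVBK = matmul(matmul(AtV, B), K)      # n x n
    n = len(A)
    V_new = [[AtVA[i][j] + Q[i][j] + AtVBK[i][j] for j in range(n)]
             for i in range(n)]
    return K, V_new


def lqr(A, B, Q, R, iters=500):
    """Iterate the recursion from V = Q for a fixed number of steps."""
    V = [row[:] for row in Q]
    K = None
    for _ in range(iters):
        K, V = lqr_step(A, B, Q, R, V)
    return K, V
```

For the scalar system A = B = Q = R = 1 the fixed point satisfies $V^2 - V - 1 = 0$, which gives a quick sanity check of the recursion.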

Trajectory Optimization (closed loop)

• Differential Dynamic Programming (local approach to DP).

Learning Trajectories

Q function

• x: state, u: control or action

• Dynamics: $x_{k+1} = f(x_k, u_k)$

• Cost function: $L(x,u)$

• Value function: $V(x) = \sum L(x,u)$

• Q function: $Q(x,u) = L(x,u) + V(f(x,u))$

• Bellman’s Equation: $V(x) = \min_u Q(x,u)$

• Policy/control law: $u(x) = \arg\min_u Q(x,u)$
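
These definitions translate directly into value iteration on a small discrete problem. The Python sketch below assumes a finite state set, deterministic dynamics, and a zero-cost goal state; the function name and the chain example are illustrative, not from the slides.

```python
def value_iteration(states, actions, f, L, goal, sweeps=100):
    """Repeatedly apply V(x) = min_u [L(x,u) + V(f(x,u))] over all states."""
    V = {x: (0.0 if x == goal else float("inf")) for x in states}
    policy = {}
    for _ in range(sweeps):
        for x in states:
            if x == goal:
                continue
            # Q(x,u) for every action, then the Bellman minimum.
            q = {u: L(x, u) + V[f(x, u)] for u in actions}
            u_best = min(q, key=q.get)
            V[x], policy[x] = q[u_best], u_best
    return V, policy


if __name__ == "__main__":
    # Five states on a line; each move costs 1; the goal is state 0.
    states = range(5)
    f = lambda x, u: min(max(x + u, 0), 4)
    V, policy = value_iteration(states, (-1, 1), f, lambda x, u: 1.0, goal=0)
    print(V, policy)
```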

Local Models About

Propagating Local Models Along a Trajectory: Differential Dynamic Programming

Gradient version

• $V_{x:k-1} = Q_x = L_x + V_x f_x$

• $\Delta u = Q_u = L_u + V_x f_u$
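
The gradient recursion above can be sketched for a scalar linear system with quadratic cost. This is a sketch under stated assumptions: dynamics $x_{k+1} = a x_k + b u_k$, stage cost $L = (q x^2 + r u^2)/2$, terminal cost $q x^2/2$, and a descent update $u \leftarrow u - \alpha Q_u$ with a hypothetical step size `alpha` (the slide does not state a sign or step size). All names are hypothetical.

```python
def rollout(x0, us, a, b):
    """Simulate x_{k+1} = a x_k + b u_k and return all states."""
    xs = [x0]
    for u in us:
        xs.append(a * xs[-1] + b * u)
    return xs


def total_cost(xs, us, q, r):
    """Sum of stage costs plus the terminal cost."""
    stage = sum(0.5 * (q * x * x + r * u * u) for x, u in zip(xs, us))
    return stage + 0.5 * q * xs[-1] ** 2


def control_gradients(xs, us, a, b, q, r):
    """Backward pass: Q_u = L_u + V_x f_u, then V_x <- Q_x = L_x + V_x f_x."""
    Vx = q * xs[-1]                      # gradient of the terminal cost
    Qu = [0.0] * len(us)
    for k in reversed(range(len(us))):
        Qu[k] = r * us[k] + Vx * b
        Vx = q * xs[k] + Vx * a
    return Qu


def gradient_ddp(x0, us, a, b, q, r, alpha=0.05, iters=200):
    """Alternate forward rollouts with descent steps on the controls."""
    for _ in range(iters):
        xs = rollout(x0, us, a, b)
        Qu = control_gradients(xs, us, a, b, q, r)
        us = [u - alpha * g for u, g in zip(us, Qu)]
    return us
```

The `Qu` values equal the gradient of the total trajectory cost with respect to each control, which can be checked against finite differences.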

Differential Dynamic Programming (DDP)

[McReynolds 70, Jacobson 70]

[Figure: nominal and improved trajectories along time t, from Initial to Terminal, with arrows labeled "Value function (update)" and "Execution". Labels: V(T); Q(T-1): action value function; u'(T-1): new control output; V(T-1): state value function; Q(T-2), u'(T-2), V(T-2). Requires: dynamics model, penalty function.]

Propagating Local Models Along a Trajectory: Differential Dynamic Programming

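Levenberg Marquardt

• $y = f(x)$

• $\min_x \, (s = y^T y / 2)$

• gradient: $\partial s/\partial x = (\partial f/\partial x)^T y = J^T y$

• Hessian: $\partial^2 s/\partial x^2 = H = (\partial^2 f/\partial x^2)\, y + J^T J$

• 2nd order gradient descent: $\Delta x = H^{-1} J^T y$

• Problem: H may not be positive definite

• Solution: $\Delta x = (H + \lambda I)^{-1} J^T y$

• $\lambda$ small: 2nd order approach

• $\lambda$ large: 1st order approach, $\Delta x = J^T y / \lambda$

• Trick 2: $H \approx J^T J$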
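
The damped step above can be sketched for a single scalar parameter, where $J^T J$ and $J^T y$ are plain sums. This is a sketch under stated assumptions: it uses $H \approx J^T J$, applies the step as $x \leftarrow x - \Delta x$, and adapts $\lambda$ with a common accept/reject rule (halve on success, multiply by 10 on failure) that is not in the slides. The names are hypothetical.

```python
import math


def lm_fit(residuals, jacobian, x, lam=1.0, iters=50):
    """Minimize s(x) = y^T y / 2 for scalar x, where y = residuals(x)."""
    def s(value):
        return 0.5 * sum(r * r for r in residuals(value))

    for _ in range(iters):
        y = residuals(x)
        J = jacobian(x)
        g = sum(j * r for j, r in zip(J, y))   # J^T y
        H = sum(j * j for j in J)              # J^T J (Trick 2)
        dx = g / (H + lam)                     # (H + lambda I)^-1 J^T y
        if s(x - dx) < s(x):
            x, lam = x - dx, lam * 0.5         # accept: act more like 2nd order
        else:
            lam *= 10.0                        # reject: act more like 1st order
    return x


if __name__ == "__main__":
    # Fit the rate in exp(rate * t) to noise-free data generated with rate 0.7.
    t = [0.0, 0.5, 1.0, 1.5]
    d = [math.exp(0.7 * ti) for ti in t]
    res = lambda x: [math.exp(x * ti) - di for ti, di in zip(t, d)]
    jac = lambda x: [ti * math.exp(x * ti) for ti in t]
    print(lm_fit(res, jac, 0.0))
```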

Levenberg Marquardt-like DDP

• $\Delta u = (Q_{uu} + \lambda I)^{-1} Q_u$

• $K = (Q_{uu} + \lambda I)^{-1} Q_{ux}$

• Drop $f_{xx}$, $f_{xu}$, $f_{ux}$, and $f_{uu}$ terms

Other tricks

• If Δu fails, try ε Δu

• Just optimize last part of trajectory.

• Regularize $Q_{xx}$

Neighboring Optimal Control

What Changes When Task Periodic?

• Discount factor means V() might increase along trajectory. V() cannot always decrease in periodic tasks.

Robot Hopper Example

Dimensionality Reduction

• Use of simple models (for example LIPM)

• Poincaré section

Inverted Pendulum Model

• Massless legs

• State: pitch angular velocity at TOP

• Controls: ankle torque, step length

[Figure label: ø]

Optimization Criterion

• $T$ is step duration; $T_a$ is ankle torque; ø is leg swing angle; $V_d$ is desired velocity.

• Ankle torque: $\sum (T_a^2)$

• Swing leg acceleration: $(ø/T^2)^2$

• Match desired velocity: $(2\sin(ø/2)/T - V_d)^2$

• Criterion is weighted sum of above terms.
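
The weighted criterion above can be written as a short Python function. The function name, argument names, and unit default weights are assumptions, since the slides do not give the weights. The velocity term takes $2\sin(ø/2)$ as the step length, which appears to assume unit leg length.

```python
import math


def step_cost(ankle_torques, phi, T, v_desired, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-step penalty terms.

    ankle_torques: ankle torque samples over the step
    phi: leg swing angle in radians
    T: step duration
    v_desired: desired velocity V_d
    weights: (torque, swing, velocity) weights; hypothetical defaults
    """
    w_torque, w_swing, w_velocity = weights
    torque_term = sum(t * t for t in ankle_torques)
    swing_term = (phi / T ** 2) ** 2
    velocity_term = (2.0 * math.sin(phi / 2.0) / T - v_desired) ** 2
    return w_torque * torque_term + w_swing * swing_term + w_velocity * velocity_term
```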

Poincaré Section

[Figure labels: Top, Transition, Poincaré section]

Optimal Controller For Sagittal Plane Only (Vd=1)

[Figure panels: Foot Placement Policy, Ankle Torque Policy, Value Function, Return Map; horizontal axes: velocity]

Return Map (Vd=1)

Optimal Controller For Sagittal Plane Only (Vd=1)

[Figure panels: Foot Placement Policy, Ankle Torque Policy, Value Function, Return Map; horizontal axes: velocity]

Foot Placement Policies

Ankle Torque Policies

Return Maps

Add Torso

[Figure labels: ø, ø, h]

Optimization Criteria

• Ankle torque: $\sum (T_a^2)$

• Swing leg acceleration: $(ø/T^2)^2$

• Match desired velocity: $(2\sin(ø/2)/T - V_d)^2$

• Desired torso angle: $\sum (\psi_d)^2$

Simulation

Simulation

Commands

Torso

What difference does a torso make?

3D version: Add roll

• State: pitch velocity, roll, roll velocity at TOP

• Action 1: Sagittal foot placement

• Action 2: Sagittal ankle torque

• Action 3: Lateral foot placement

Roll Optimization Criteria

• T = step duration.

• Ankle torque: $\text{torque}^2$

• Swing leg acceleration: $(ø/T^2)^2$

• Match desired velocity: $(2\sin(ø/2)/T - V_d)^2$

• Roll leg acceleration: $(ø_{roll}/T^2)^2$

Lateral foot placement at fixed roll