
Page 1: Learning to Control an Octopus Arm with Gaussian Process ...ppoupart/ICML-07-tutorial-slides/icml07-brl... · The Octopus Arm Can bend and twist at any point Can do this in any direction

Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods

Yaakov Engel

Collaborators: Peter Szabo and Dmitry Volkinshtein


Bayesian RL Tutorial 2/16


The Octopus Arm

Can bend and twist at any point

Can do this in any direction

Can be elongated and shortened

Can change cross section

Can grab using any part of the arm

Virtually infinitely many degrees of freedom (DOF)


Octopus Arm Anatomy 101


Our Arm Model

[Figure: the arm model, with compartments C1 to CN between the arm base and the arm tip, point-mass pairs #1 to #N+1, longitudinal muscles on the dorsal and ventral sides, and transverse muscles.]


The Muscle Model

$$f(a) = \big(k_0 + a\,(k_{\max} - k_0)\big)(\ell - \ell_0) + \beta\,\frac{d\ell}{dt}, \qquad a \in [0, 1]$$
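As a minimal sketch, the force law can be evaluated directly. We read $\ell$ as the current muscle length, $\ell_0$ as its rest length, $k_0$ and $k_{\max}$ as the stiffness at zero and full activation, and $\beta$ as a damping coefficient. These readings, the function name `muscle_force`, and the sample constants are our assumptions, not values given in the slides.

```python
def muscle_force(a, length, length_rate, k0, kmax, rest_length, beta):
    """Force of one muscle under f(a) = (k0 + a*(kmax - k0)) * (l - l0) + beta * dl/dt.

    Hypothetical helper: parameter names are illustrative, not from the original model code.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("activation a must lie in [0, 1]")
    # Stiffness interpolates linearly from k0 (a = 0) to kmax (a = 1).
    stiffness = k0 + a * (kmax - k0)
    # Spring term on the length deviation plus a damping term on the length rate.
    return stiffness * (length - rest_length) + beta * length_rate


# Half activation, muscle stretched 0.2 past rest and lengthening at 0.1 per unit time.
print(muscle_force(0.5, 1.2, 0.1, k0=1.0, kmax=3.0, rest_length=1.0, beta=0.5))  # ≈ 0.45
```

Activation only scales the stiffness term, so a fully relaxed muscle ($a = 0$) still behaves as a passive spring with damping.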


Other Forces

• Gravity

• Buoyancy

• Water drag

• Internal pressures (maintain constant compartmental volume)

Dimensionality

10 compartments ⇒ 22 point masses × (x, y, ẋ, ẏ) = 88 state variables
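The count follows from the arm model: N compartments are bounded by N + 1 point-mass pairs (pair #1 through pair #N+1), each mass carrying a 2-D position and velocity. A small sketch of that arithmetic, where the function name `state_dimension` is hypothetical:

```python
def state_dimension(num_compartments, dims=2):
    """Count state variables for the planar arm model.

    Assumes N compartments are bounded by N + 1 pairs of point masses
    (one dorsal, one ventral per pair), each with a position and a velocity.
    """
    num_pairs = num_compartments + 1
    num_masses = 2 * num_pairs
    per_mass = 2 * dims  # (x, y) position plus (x_dot, y_dot) velocity
    return num_masses * per_mass


print(state_dimension(10))  # 88
```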


The Control Problem

Starting from a random position, bring {any part, the tip} of the arm into contact with a goal region, optimally.

Optimality criteria:

Time, energy, obstacle avoidance

Constraint:

We only have access to sampled trajectories

Our approach:

Define the problem as an MDP

Apply reinforcement learning algorithms
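The title names Gaussian Process Temporal Difference (GPTD) learning, but the slides do not spell out the algorithm. As a minimal, hedged sketch (not the authors' implementation), the batch GPTD posterior mean of the value function along one trajectory $x_0, \dots, x_T$ follows from the model $r_t = V(x_t) - \gamma V(x_{t+1}) + \text{noise}$: with kernel matrix $K$ and the $T \times (T+1)$ difference matrix $H$, the posterior mean is $\hat V(x) = k(x)^\top H^\top (H K H^\top + \Sigma)^{-1} r$. We assume a Gaussian kernel, independent noise $\Sigma = \sigma^2 I$ (one of several noise models used with GPTD), and toy one-dimensional states; all function names are hypothetical. Controlling the 88-dimensional arm would also need action values and policy improvement, which this sketch omits.

```python
import math


def rbf(x, y, length_scale=1.0):
    # Gaussian (squared-exponential) kernel on state vectors.
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def gptd_fit(states, rewards, gamma=0.95, noise=0.1, length_scale=1.0):
    """Return alpha = H^T (H K H^T + noise^2 I)^{-1} r, so V(x) ~= sum_i alpha[i] k(x_i, x)."""
    T = len(rewards)
    assert len(states) == T + 1, "need one more state than rewards"
    K = [[rbf(a, b, length_scale) for b in states] for a in states]
    # Row t of H encodes V(x_t) - gamma * V(x_{t+1}).
    H = [[0.0] * (T + 1) for _ in range(T)]
    for t in range(T):
        H[t][t] = 1.0
        H[t][t + 1] = -gamma
    HK = [[sum(H[i][k] * K[k][j] for k in range(T + 1)) for j in range(T + 1)] for i in range(T)]
    G = [[sum(HK[i][k] * H[j][k] for k in range(T + 1)) for j in range(T)] for i in range(T)]
    for i in range(T):
        G[i][i] += noise ** 2
    beta = solve(G, rewards)
    return [sum(H[t][i] * beta[t] for t in range(T)) for i in range(T + 1)]


def gptd_value(x, states, alpha, length_scale=1.0):
    # Posterior mean of V at x: k(x)^T alpha.
    return sum(a * rbf(s, x, length_scale) for a, s in zip(alpha, states))


states = [[0.0], [2.0], [4.0], [6.0]]
rewards = [-1.0, -1.0, 10.0]
alpha = gptd_fit(states, rewards, gamma=0.9, noise=1e-4)
values = [gptd_value(s, states, alpha) for s in states]
# With small noise, each TD residual V(x_t) - gamma * V(x_{t+1}) - r_t should be close to zero.
print([values[t] - 0.9 * values[t + 1] - rewards[t] for t in range(3)])
```

The dense pure-Python algebra keeps the sketch dependency-free; an online GPTD learner would instead update these quantities incrementally and sparsify the set of stored states.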


The Task

[Figure: task snapshot at t = 1.38; x and y axes span −0.1 to 0.15.]


Actions

Each action specifies a set of fixed activations, one for each muscle in the arm.

[Figure: activation patterns of actions #1 to #6.]

Base rotation adds duplicates of actions 1, 2, 4 and 5, with positive and negative torques applied to the base.
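The slides show the six activation patterns only as figures, so this sketch represents each action by its pattern number plus a base-torque sign. We assume that each of actions 1, 2, 4 and 5 gains one positive-torque and one negative-torque copy when the base can rotate; that reading, the names `action_set` and `ROTATABLE`, and the torque encoding are all hypothetical.

```python
BASE_ACTIONS = [1, 2, 3, 4, 5, 6]  # the six fixed activation patterns on the slide
ROTATABLE = [1, 2, 4, 5]           # patterns duplicated with base torques (assumed reading)


def action_set(rotating_base):
    """Return (pattern_id, torque_sign) pairs; torque_sign 0 means no base torque."""
    actions = [(a, 0) for a in BASE_ACTIONS]
    if rotating_base:
        # One positive-torque and one negative-torque copy of each rotatable pattern.
        actions += [(a, s) for a in ROTATABLE for s in (+1, -1)]
    return actions


print(len(action_set(False)), len(action_set(True)))  # 6 14
```

Under this reading the fixed-base tasks use 6 actions and the rotating-base tasks 14.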


Rewards

Deterministic rewards:

+10 for a goal state,

a large negative value for hitting an obstacle,

-1 otherwise.

Energy economy:

A constant multiple of the energy expended by the muscles in each action interval was deducted from the reward.
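A minimal sketch of the per-step reward described above. The slides give +10 and -1 but not the obstacle penalty or the energy coefficient, so `obstacle_penalty=-100.0` and `energy_cost=0.1` are placeholder values. The energy input and the choice to let an obstacle hit override reaching the goal are also our assumptions.

```python
GOAL_REWARD = 10.0
STEP_REWARD = -1.0


def reward(reached_goal, hit_obstacle, energy, obstacle_penalty=-100.0, energy_cost=0.1):
    """Deterministic reward minus a constant multiple of the muscle energy spent in the interval.

    obstacle_penalty and energy_cost are hypothetical placeholders.
    """
    if hit_obstacle:
        base = obstacle_penalty  # assumed to take precedence over reaching the goal
    elif reached_goal:
        base = GOAL_REWARD
    else:
        base = STEP_REWARD
    return base - energy_cost * energy


print(reward(False, False, 2.0, energy_cost=0.5))  # -2.0
```

The -1 per step pushes the learner toward fast solutions, while the energy term trades speed against muscle effort, matching the time and energy optimality criteria.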


Now, to the Movies...


Fixed Base Task I


Fixed Base Task II


Rotating Base Task I


Rotating Base Task II
