Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2)...
-
Upload
stewart-rich -
Category
Documents
-
view
219 -
download
0
Transcript of Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2)...
![Page 1: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/1.jpg)
Value Function Approximationon Non-linear Manifolds for Robot Motor Control
Masashi Sugiyama1)2) Hirotaka Hachiya1)2) Christopher Towell2) Sethu Vijayakumar2)
1) Computer Science, Tokyo Institute of Technology2) School of Informatics, University of Edinburgh
![Page 2: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/2.jpg)
2Maze Problem: Guide Robot to Goal
Robot knows its position but doesn’t know which direction to go.
We don’t teach the best action to take at each position but give a reward at the goal.
Task: make the robot select the optimal action.
Up
RightLeft
Down
Possible actions
Position (x,y)
Goal
reward
![Page 3: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/3.jpg)
3Markov Decision Process (MDP)
An MDP consists of : set of states, : set of actions, : transition probability, : reward,
An action the robot takes at state is specified by policy .
Goal: make the robot learn optimal policy
RPAS ,,,SAPR
is right left, down, up,
)sΡ(s,a,
)(sa
),( asR
a s
![Page 4: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/4.jpg)
4Definition of Optimal PolicyAction-value function:
discounted sum of future rewards when taking in and following thereafter
Optimal value:
Optimal policy: is computed if is given.Question: How to compute ?
aassrEasQt
tt
000
,),(
),(maxarg),( asQasQ
),(maxarg),( asQasa
a s
Q
Q
![Page 5: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/5.jpg)
5Policy Iteration
Starting from some initial policy iterate Steps 1 and 2 until convergence.Step 1. Compute for current
Step 2. Update by
Policy iteration always converges to if in step 1 can be computed.
Question: How to compute ?
(Sutton & Barto, 1998)
),(maxarg)( asQsa
aassrEasQt
tt
000
,|),(
),( asQ
),( asQ
),( asQ
![Page 6: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/6.jpg)
6
can be recursively expressed by
can be computed by solving Bellman equation
Drawback: dimensionality of Bellman equation becomes huge in large state and action spaces
Bellman Equation
high computational cost
))'(,'()',,(),(),('
ssQsasPasRasQs
),( asQ
),( asQ
as ,
AS
![Page 7: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/7.jpg)
7Least-Squares Policy Iteration
Linear architecture:
is learned so as to optimally approximate Bellman equation in the least-squares sense
# of parameters is only :
LSPI works well if we choose appropriateQuestion: How to choose ?
(Lagoudakis and Parr, 2003)
: fixed basis functions : parameters: # of basis functions
),(),(ˆ1
aswasQ i
K
ii
),( asiiwK
KASK
Kii 1}{
Kii 1}{
Kiiw 1}{
![Page 8: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/8.jpg)
8Popular Choice: Gaussian Kernel (GK)
Smooth Gaussian tail goes over
partitions
: Euclidean distance: Centre state
2
2
2
),(exp)(
ssED
sk c
ED
cscs
cs
Partitions
![Page 9: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/9.jpg)
9Approximated Value Function by GK
Values around the partitions are not approximated well.
Approximated by GKOptimal value function
Log scale
20 randomly located Gaussians
![Page 10: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/10.jpg)
10Policy Obtained by GK
GK provides an undesired policy around the partition.
GK-based policyOptimal policy
![Page 11: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/11.jpg)
11Aim of This Research
Gaussian tails go over the partition.Not suited for approximating
discontinuous value functions.
We propose new Gaussian kernel to overcome this problem.
![Page 12: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/12.jpg)
12State Space as a GraphOrdinary Gaussian uses Euclidean distance.
Euclidean distance does not incorporate state space structure, so tail problems occur.
We represent state space structure by a graph, and use it for defining Gaussian kernels.
(Mahadevan, ICML 2005)
2
2
2
),(exp)(
ssED
sk c
![Page 13: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/13.jpg)
13
Natural distance on graph is shortest path.
We use shortest path in Gaussian function.
We call this kernel geodesic Gaussian.SP can be efficiently computed by Dijkstra.
Geodesic Gaussian Kernels
2
2
2
),(exp)(
ssSP
sk cEuclidean distance
Shortest path
![Page 14: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/14.jpg)
14Example of Kernels
Tails do not go across the partition.Values smoothly decrease along the maze.
Geodesic GaussianOrdinary Gaussiancs
cs cs
![Page 15: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/15.jpg)
15
Values near the partition are well approximated.Discontinuity across the partition is preserved.
Ordinary Gaussian
Optimal
Comparison of Value Functions
Geodesic Gaussian
![Page 16: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/16.jpg)
16Comparison of Policies
GGKs provide good policies near the partition.
Geodesic GaussianOrdinary Gaussian
![Page 17: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/17.jpg)
17
Average over 100 runs
Geodesic
Ordinary
Ordinary Gaussian: tail problemGeodesic Gaussian: no tail problem
Experimental Result
Number of kernels
Fra
ctio
n of
opt
imal
st
ates
![Page 18: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/18.jpg)
18Robot Arm Reaching
2-DOF robot arm State space
Joint 1
Joint 2
Endeffector
Object
Obstacle
Joint 1 (degree)
Join
t 2 (
degr
ee)
0
180
-1801000-100Reward:
+1 reach the object0 otherwise
Task: move the end effector to reach the object
![Page 19: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/19.jpg)
19Robot Arm Reaching
Successfully avoids the obstacle and can reach the object.
Moves directly towards the object without
avoiding the obstacle.
Ordinary Gaussian Geodesic Gaussian
![Page 20: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/20.jpg)
20Khepera Robot Navigation
Khepera has 8 IR sensors measuring the distance to obstacles.
Task: explore unknown maze without collision
Reward: +1 (forward)-2 (collision)0 (others)
Sensor value: 0 - 1030
![Page 21: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/21.jpg)
21State Space and Graph
Discretize 8D state space by self-organizing map.
Partitions
2D visualization
![Page 22: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/22.jpg)
22Khepera Robot Navigation
When facing obstacle, goes backward (and goes
forward again).
When facing obstacle, makes a turn
(and go forward).
Ordinary Gaussian Geodesic Gaussian
![Page 23: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/23.jpg)
23Experimental Results
Average over 30 runs
Geodesic outperforms ordinary Gaussian.
Geodesic
Ordinary
![Page 24: Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama 1)2) Hirotaka Hachiya 1)2) Christopher Towell 2) Sethu Vijayakumar.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f2f5503460f94c48e88/html5/thumbnails/24.jpg)
24Conclusion
Value function approximation:
good basis function neededOrdinary Gaussian kernel:
tail goes over discontinuitiesGeodesic Gaussian kernel:
smooth along the state spaceThrough the experiments, we showed
geodesic Gaussian is promising in high-dimensional continuous problems!