Goal-Directed Feature Learning, Cornelius Weber and Jochen Triesch
On Linking Reinforcement Learning with Unsupervised Learning Cornelius Weber, FIAS
[Page 1]
On Linking Reinforcement Learning with Unsupervised Learning
Cornelius Weber, FIAS
presented at Honda HRI, Offenbach, 17th March 2009
[Page 2]
for taking action, we need only the relevant features
[Figure: scene with features x, y, z]
[Page 3]
unsupervised learning in cortex
reinforcement learning in basal ganglia
state space → actor
(Doya, 1999)
[Page 4]
actor
state space
A 1-layer RL model of the basal ganglia ("go left?" / "go right?") is too simple to handle complex input.
[Page 5]
complex input (cortex)
we need one or more additional layers to pre-process the complex data
feature detection
action selection
actor
state space
[Page 6]
models' background:
- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- reward-modulated Hebbian learning: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)
- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19, 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)
[Page 7]
sensory input
reward
action
scenario: bars controlled by the actions 'up', 'down', 'left', 'right';
reward is given if the horizontal bar is at a specific position
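The bars scenario can be sketched as a toy environment. This is a hedged reconstruction: the grid size, the exact bar dynamics, and the goal position are illustrative assumptions, not the precise setup from the slides. The input image contains one horizontal bar (task-relevant) and one vertical bar (distractor); actions move the bars, and reward arrives when the horizontal bar reaches a target row.

```python
import numpy as np

class BarsWorld:
    """Toy 'bars' environment (illustrative sketch, not the exact slide setup).
    The observation is a flattened image with one horizontal and one vertical
    bar; reward is given when the horizontal bar sits at the goal row."""

    def __init__(self, size=12, goal_row=0, rng=None):
        self.size = size
        self.goal_row = goal_row
        self.rng = rng or np.random.default_rng(0)
        self.reset()

    def reset(self):
        self.h_row = int(self.rng.integers(self.size))  # horizontal bar (relevant)
        self.v_col = int(self.rng.integers(self.size))  # vertical bar (distractor)
        return self.observe()

    def observe(self):
        img = np.zeros((self.size, self.size))
        img[self.h_row, :] = 1.0  # draw horizontal bar
        img[:, self.v_col] = 1.0  # draw vertical bar
        return img.ravel()

    def step(self, action):
        # actions: 0 = up, 1 = down (horizontal bar); 2 = left, 3 = right (vertical bar)
        if action == 0:
            self.h_row = max(self.h_row - 1, 0)
        elif action == 1:
            self.h_row = min(self.h_row + 1, self.size - 1)
        elif action == 2:
            self.v_col = max(self.v_col - 1, 0)
        else:
            self.v_col = min(self.v_col + 1, self.size - 1)
        reward = 1.0 if self.h_row == self.goal_row else 0.0
        return self.observe(), reward
```

Only the horizontal bar's position determines the reward, so a goal-directed learner should devote its features to horizontal bars and ignore the vertical distractor.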
[Page 8]
model that learns the relevant features
top layer: SARSA RL
lower layer: winner-take-all feature learning
both layers: modulate learning by δ
RL weights
feature weights
input
action
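The two-layer architecture can be sketched as follows. This is a minimal sketch under stated assumptions: layer sizes, learning rates, and the exact form of the δ-modulated feature update are illustrative choices, not the paper's derived rule. What it shows is the structure named on the slide: a winner-take-all lower layer, a SARSA top layer, and the same TD error δ modulating learning in both layers, with non-negative weights throughout.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_feat, n_act = 144, 20, 4  # sizes are illustrative assumptions
W = np.abs(rng.normal(0, 0.1, (n_feat, n_in)))   # feature weights (non-negative)
Q = np.abs(rng.normal(0, 0.1, (n_act, n_feat)))  # RL action weights (non-negative)
alpha, gamma = 0.05, 0.9

def forward(x):
    """Lower layer: winner-take-all feature detection (one active unit)."""
    h = np.zeros(n_feat)
    h[np.argmax(W @ x)] = 1.0
    return h

def select_action(h, eps=0.1):
    """Epsilon-greedy action selection on the top layer's state-action values."""
    if rng.random() < eps:
        return int(rng.integers(n_act))
    return int(np.argmax(Q @ h))

def learn(x, h, a, r, h_next, a_next):
    """Both layers modulated by the same SARSA TD error delta.
    The feature rule (move the winner toward the input, scaled by delta)
    is an assumed stand-in for the paper's derived update."""
    global W, Q
    delta = r + gamma * float(Q[a_next] @ h_next) - float(Q[a] @ h)
    Q[a] += alpha * delta * h                # RL weight update
    j = int(np.argmax(h))                    # winning feature unit
    W[j] += alpha * delta * (x - W[j])       # delta-modulated feature update (assumed)
    W = np.maximum(W, 0.0)                   # non-negativity constraint
    Q = np.maximum(Q, 0.0)
    return delta
```

Because δ gates the feature layer as well, features that never contribute to predicting reward are never reinforced, which is how the model ends up learning only the task-relevant features.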
[Page 9]
SARSA with WTA input layer
[Page 10]
note: non-negativity constraint on weights
Energy function: estimation error of state-action value
identities used:
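The equations themselves did not survive extraction, but a standard form consistent with the slide's description (SARSA, squared estimation error of the state-action value) can be written down; symbols here are assumed, not transcribed:

```latex
% Energy: squared estimation error of the state-action value (SARSA target).
% Assumed symbols: h_j WTA feature activities, q_{aj} RL weights,
% w_{ji} feature weights, x_i input, \gamma discount factor, r reward.
E = \tfrac{1}{2}\bigl( r + \gamma\, Q(s',a') - Q(s,a) \bigr)^2,
\qquad Q(s,a) = \sum_j q_{aj}\, h_j .

% Gradient descent on E yields updates in both layers gated by the
% same TD error (the WTA derivative is handled approximately):
\delta = r + \gamma\, Q(s',a') - Q(s,a),
\qquad \Delta q_{aj} \propto \delta\, h_j,
\qquad \Delta w_{ji} \propto \delta\, q_{aj}\, \frac{\partial h_j}{\partial w_{ji}} .
```

This makes the slide's point concrete: the δ-feedback to the feature layer is not ad hoc but falls out of descending the same value-estimation error that drives the top layer.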
[Page 11]
RL action weights
feature weights
data
learning the ‘short bars’ data
reward
action
[Page 12]
short bars in a 12×12 grid; average number of steps to goal: 11
[Page 13]
RL action weights
feature weights
input, reward, 2 actions (not shown)
data
learning the 'long bars' data
[Page 14]
WTA, non-negative weights
SoftMax, non-negative weights
SoftMax, no weight constraints
[Page 15]
Discussion
- simple model: SARSA on a winner-take-all network with δ-feedback
- learns only the features that are relevant for the action strategy
- theory behind it: derived from (approximate) state-action value estimation
- non-negative coding aids feature extraction
- a link between unsupervised and reinforcement learning
- a demonstration with more realistic data is still needed
Sponsors:
Bernstein Focus Neurotechnology, BMBF grant 01GQ0840
EU project 231722 "IM-CLeVeR", call FP7-ICT-2007-3
Frankfurt Institute for Advanced Studies (FIAS)
[Pages 16-18]
thank you ...