Presentation next week: cerebellum and supervised learning
Kitazawa S, Kimura T, Yin PB. Cerebellar complex spikes encode both destinations and errors in arm movements. Nature. 1998;392(6675):494-497.
Motor learning: learning algorithms
- network and distributed representations
- supervised learning
  - perceptrons and LMS
  - backpropagation
- reinforcement learning
- unsupervised learning
  - Hebbian networks
Motor learning
- supervised learning: knowledge of the desired behavior is specified, i.e. for every input x, we know the corresponding desired output y
Motor learning - supervised learning
e.g. learning the mapping between joint configuration and end point. Vision gives you information about both values (or proprioception could be used for the joint angles).
Motor learning - supervised learning
- limited feedback from the periphery: we just get a 'good' or 'bad' evaluation, and have to adjust behavior to maximize the 'good' evaluation
=> reinforcement learning
e.g. maze learning: a sequence of actions leads to a reward - how do we learn the appropriate sequence?
Motor learning - unsupervised learning
- no feedback from the periphery: rely on statistics of the inputs (or outputs) to find structure in the data
e.g. clustering of data points in the (x1, x2) plane
Develop representations based on properties of the data
Motor learning
- supervised motor learning
  - parameterized models
  - non-parametric, 'neural network' models
  - reinforcement learning
- unsupervised learning
  - Hebbian learning
  - principal components analysis
  - independent components analysis
Supervised motor learning - learning parameterized models

Linear regression: we know the general structure of the model,

y = a*x + b

but we don't know the parameters a or b. We want to estimate a and b based on paired data sets {xi} and {yi}.

[figure: scatter plot of y against x, for x from -10 to 10 and y from -80 to 80]
Parameterized models

[figure: the (x, y) scatter with the fitted regression line]

Linear regression, y = a*x + b, has an analytical solution (intro stats):

a = Σ(xi - <x>)(yi - <y>) / Σ(xi - <x>)²
b = <y> - a<x>

where <x> is the expected value of x, i.e. the mean. This is a single step of calculation across all the data.
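A minimal numpy sketch of this one-step solution (the data, noise level, and true parameters are illustrative assumptions):

```python
import numpy as np

# One-step least-squares fit of y = a*x + b (illustrative data).
rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, 100)
y = 6.0 * x + 2.0 + rng.normal(0, 5, size=x.shape)  # noisy observations

# Slope from mean-centered data, intercept from the means:
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()
print(f"a = {a:.3f}, b = {b:.3f}")  # close to the true 6 and 2
```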
Parameterized models - linear regression using iterative gradient descent

y* = a*·x + b*; a*, b* are the correct parameters, and y* is the observed data

Assume initial parameters a and b, and define an error term:

E = ½(y - y*)²; y is the value predicted by the current parameters, y* is the target value

We want to find parameters which minimize this error - move the parameters to reduce the error:

a = a + da; da is the change in a that reduces the error
b = b + db; db is the change in b that reduces the error

choose da, db along the gradient of the error
Parameterized models

y* = a*·x + b*; E = ½(y - y*)²

Find the gradient of the error with respect to the parameters:

dE/da = (y - y*)·dy/da = (y - y*)·x
dE/db = (y - y*)

choose

a = a - u·(y - y*)·x
b = b - u·(y - y*)

with 0 < u < 1 to control the speed of learning.
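A minimal sketch of this update loop, assuming illustrative data and learning rate u:

```python
import numpy as np

# Online gradient descent for y = a*x + b, using the updates above:
# a <- a - u*(y - y*)*x and b <- b - u*(y - y*).
rng = np.random.default_rng(1)
xs = rng.uniform(-10, 10, 500)
ys = 6.0 * xs + 2.0 + rng.normal(0, 5, size=xs.shape)  # observed y*

a, b, u = 0.0, 0.0, 0.01
for x, y_star in zip(xs, ys):
    y = a * x + b              # prediction from the current parameters
    a -= u * (y - y_star) * x  # dE/da = (y - y*)*x
    b -= u * (y - y_star)      # dE/db = (y - y*)
print(f"a = {a:.3f}, b = {b:.3f}")  # approaches 6 and 2
```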
Parameterized models
e.g. iterative gradient descent for linear regression
[figure: successive fits of the line to the (x, y) data]
Parameterized models

Learn limb parameters for a 2-DOF arm: the end point (x, y) follows from the joint angles θ1, θ2 and link lengths l1, l2:

x = l1·cos(θ1) + l2·cos(θ1 + θ2)
y = l1·sin(θ1) + l2·sin(θ1 + θ2)
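As a sketch of fitting such a parameterized model, one could estimate the link lengths l1 and l2 by gradient descent on the squared endpoint error; the true lengths, data, and learning rate below are illustrative assumptions, and the gradients follow directly from the kinematic equations above:

```python
import numpy as np

# Gradient descent on E = 1/2*((x - x*)^2 + (y - y*)^2) to estimate
# the link lengths l1, l2 of the 2-DOF arm (illustrative setup).
rng = np.random.default_rng(2)
l1_true, l2_true = 0.30, 0.25
thetas = rng.uniform(0, np.pi / 2, size=(200, 2))

def endpoint(l1, l2, t1, t2):
    x = l1 * np.cos(t1) + l2 * np.cos(t1 + t2)
    y = l1 * np.sin(t1) + l2 * np.sin(t1 + t2)
    return x, y

l1, l2, u = 0.1, 0.1, 0.05  # initial guesses and learning rate
for t1, t2 in thetas:
    x_star, y_star = endpoint(l1_true, l2_true, t1, t2)  # "observed" endpoint
    x, y = endpoint(l1, l2, t1, t2)                      # current prediction
    ex, ey = x - x_star, y - y_star
    # dE/dl1 = ex*cos(t1) + ey*sin(t1); dE/dl2 uses the angle (t1 + t2)
    l1 -= u * (ex * np.cos(t1) + ey * np.sin(t1))
    l2 -= u * (ex * np.cos(t1 + t2) + ey * np.sin(t1 + t2))
print(f"l1 = {l1:.3f}, l2 = {l2:.3f}")  # approaches 0.30 and 0.25
```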
Motor learning and representations - how are properties of the limb represented by the CNS?

Distributed representations - parameters are not explicitly fit; both the parameters and the model structure are identified.

Learn the parameters and the model within a distributed network (mapping joint angle → end position).
Distributed models - network architecture

Two input units x1, x2 connect to two output units y1, y2 through weights W (w11, w12, w21, w22):

y1 = w11·x1 + w21·x2
y2 = w12·x1 + w22·x2

=> y = Wx

As shown here, this is just linear regression.
Distributed network models

Simple network: one layer of linear units; inputs x (2 units) map through weights W to outputs y (3 units):

y = Wx

From inputs x and corresponding outputs y*, find the W that best approximates the function.
Distributed network models

To fit the network parameters:

define the error: E = ½(y - y*)²

take the derivative with respect to the weights: dE/dW = (y - y*)xᵀ

update the weights: W = W - u·(y - y*)xᵀ

or, changing weight by weight: wij = wij - u·(yj - yj*)·xi

i.e. similar to the rule for linear regression. This is the Widrow-Hoff / adaline / LMS (least mean squares) rule.
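A minimal sketch of the LMS rule in action; the target mapping W_true, data distribution, and learning rate are illustrative assumptions:

```python
import numpy as np

# Online LMS/Widrow-Hoff learning for a single-layer linear network
# y = Wx, using W <- W - u*(y - y*) x^T.
rng = np.random.default_rng(3)
W_true = np.array([[1.0, -2.0], [0.5, 3.0], [2.0, 0.0]])  # 3 outputs x 2 inputs

W = np.zeros((3, 2))
u = 0.05
for _ in range(2000):
    x = rng.normal(size=2)
    y_star = W_true @ x               # desired output for this input
    y = W @ x                         # current network output
    W -= u * np.outer(y - y_star, x)  # LMS update
print(np.round(W, 2))                 # converges to W_true
```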
Distributed network models - linear units, single-layer networks

batch mode: learn from all the data at once
W = W - u·dE/dW

online mode: learn from one data point at a time
W = W - u·dEi/dW; for {xi, yi}, the i-th data point
Distributed network models

linear units, single-layer networks:
- essentially linear regression
- the gradient descent learning rule leads to the LMS update rule to change the weights iteratively
Distributed network models
- more complicated computations - classification: learn to assign data points to the correct category
[figure: two clusters of points in the (x1, x2) plane]
Distributed network models
- more complicated computations - classification: learn to assign data points to the correct category

We want to classify the inputs x to outputs of either y = {-1, 1}, i.e. categorize the data.
[figure: the two clusters labeled y = 1 and y = -1]
Distributed network models
- more complicated computations - classification

The weight vector w acts to project the inputs to produce the outputs: y = w·x > 0 on one side of the boundary and y = w·x < 0 on the other. If we take y = sign(w·x), we can do classification.
Distributed network models - categorization (non-linear transformation)

Learning in nonlinear networks - the outputs are a non-linear function of their inputs, via a sigmoidal 'squashing' function:

g(Wx) = 1/(1 + exp(-Wx))

The network maps patterns x through weights W to a category output y in (0, 1).

[figure: g(Wx) rises smoothly from 0 to 1 as Wx goes from -10 to 10]

This works like a 'bistable' categorization unit; we can also use g(x) = sign(x) (perceptrons).
Distributed network models - categorization (non-linear transformation)

Learning in nonlinear networks:

y = g(Wx) = 1/(1 + exp(-Wx))

Find the gradient:

E = ½(y - y*)²
dE/dW = (y - y*)·g'(Wx)·x

note that: g'(z) = g(z)(1 - g(z))

This is the basic neural network learning rule.
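A minimal sketch of this rule for a single sigmoid unit; the toy two-cluster data are an illustrative assumption:

```python
import numpy as np

# A single sigmoid unit y = g(w.x) trained by gradient descent,
# using dE/dw = (y - y*) g'(w.x) x with g'(z) = g(z)(1 - g(z)).
rng = np.random.default_rng(4)

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

labels = rng.random(200) < 0.5
X = rng.normal(size=(200, 2)) + np.where(labels, 1.5, -1.5)[:, None]
t = labels.astype(float)  # targets y* in {0, 1}

w, u = np.zeros(2), 0.5
for _ in range(100):
    for x, y_star in zip(X, t):
        y = g(w @ x)
        w -= u * (y - y_star) * y * (1 - y) * x  # gradient step
acc = np.mean((g(X @ w) > 0.5) == t)
print(f"training accuracy: {acc:.2f}")
```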
Distributed network models

non-linear units, single-layer networks:
- 'logistic', non-linear regression
- allows learning of categorization problems
Distributed network models - single layer, classification networks

Find a network to perform the logical AND function:

x1 x2 | y
 0  0 | 0
 0  1 | 0
 1  0 | 0
 1  1 | 1
Distributed network models - single layer, classification networks

logical AND:
- choose input weights W = [1 1]
- we need an offset to the inputs to shift the origin: add -0.6 to each input (equivalently, a bias of -1.2)

x1 x2 | Wx   | threshold(y)
 0  0 | -1.2 | 0
 0  1 | -0.2 | 0
 1  0 | -0.2 | 0
 1  1 |  0.8 | 1
Distributed network models - single layer, classification networks

Find a network to perform the logical XOR function:

x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0

What weights will make this work? There are none: single-layer networks are computationally limited.
Distributed network models - multiple layer networks

- more complicated computations can be performed with multiple layer networks
- we can characterize problems which are not linearly separable

XOR can be solved with a multi-layered network: hidden units h1 (bias -0.5, an OR unit) and h2 (bias -1.5, an AND unit) each receive weights [1 1] from the inputs; the output unit receives weights [1 -2] from (h1, h2) with bias -0.5.

x1 x2 | h1 h2 | Wh | y
 0  0 |  0  0 |  0 | 0
 0  1 |  1  0 |  1 | 1
 1  0 |  1  0 |  1 | 1
 1  1 |  1  1 | -1 | 0
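A minimal sketch checking this network; the step units and weights are exactly those in the table above:

```python
# Check the XOR network: threshold units with the weights and biases
# from the table (h1 = OR, h2 = AND, output combines them as h1 - 2*h2).
def step(z):
    return 1 if z > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        h1 = step(x1 + x2 - 0.5)     # OR unit (bias -0.5)
        h2 = step(x1 + x2 - 1.5)     # AND unit (bias -1.5)
        y = step(h1 - 2 * h2 - 0.5)  # output weights [1, -2], bias -0.5
        print(x1, x2, "->", y)       # prints the XOR truth table
```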
Distributed network models - learning in multiple layer networks

inputs x → weights W → hidden units h → weights V → outputs y

Consider a linear network:

h = Wx
y = Vh

NB: there's not much point to multiple layers with linear units, since it can all be re-expressed as a single linear network: y = VWx = W'x; i.e. just redefine your weight matrix.
Distributed network models - learning in multiple layer networks

linear network: h = Wx, y = Vh

Form the error: E = ½(y - y*)²

To update the weights V, from h to y:

dE/dV = (y - y*)·dy/dV = (y - y*)·h

i.e. the same rule as for the single-layer network.
Distributed network models - learning in multiple layer networks

linear network: h = Wx, y = Vh

To update the weights W, from x to h, use the chain rule:

dE/dW = (y - y*)·dy/dW = (y - y*)·dy/dh·dh/dW = (y - y*)·V·x

This is the gradient for the 'hidden' layer.
Distributed network models - learning in multiple layer networks

non-linear network:

h = g(Wx)
y = g(Vh)

Updating the weights V is the same as before:

dE/dV = (y - y*)·g'(Vh)·h
Distributed network models - learning in multiple layer networks

To update the weights W, use the chain rule:

dE/dW = (y - y*)·dy/dh·dh/dW = (y - y*)·g'(Vh)·V·g'(Wx)·x

Essentially, we're propagating the error backwards through the network, changing weights according to how much they affect the output.

=> Backpropagation learning
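A minimal backpropagation sketch under these update rules, training a small sigmoid network on XOR; the hidden layer size, learning rate, iteration count, and explicit bias units are illustrative choices, not from the slides:

```python
import numpy as np

# Backpropagation for h = g(Wx), y = g(Vh), trained online on XOR.
# (A different seed may be needed if training lands in a local minimum.)
rng = np.random.default_rng(5)

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])  # XOR targets

W = rng.normal(size=(4, 3))  # 4 hidden units, 2 inputs + bias
V = rng.normal(size=5)       # 4 hidden + bias -> 1 output
u = 0.5
for _ in range(20000):
    i = rng.integers(4)
    x = np.append(X[i], 1.0)               # input with bias term
    h = np.append(g(W @ x), 1.0)           # hidden activity with bias
    y = g(V @ h)
    dy = (y - T[i]) * y * (1 - y)          # (y - y*) g'(Vh)
    dh = dy * V[:4] * h[:4] * (1 - h[:4])  # error backpropagated through V
    V -= u * dy * h                        # dE/dV = (y - y*) g'(Vh) h
    W -= u * np.outer(dh, x)               # dE/dW, the hidden-layer gradient
for x in X:
    h = np.append(g(W @ np.append(x, 1.0)), 1.0)
    print(x, "->", round(float(g(V @ h)), 2))  # approx 0, 1, 1, 0
```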
Distributed network models - backpropagation learning in multiple layer networks

linear network: h = Wx, y = Vh

1. Find out how much of the error in the output is due to V. The responsibility will be due to the activity of h: dE/dV = (y - y*)·h. Change V according to this responsibility.

2. Find out how much of the error is due to W. Units in h which have a large output weight V will be more responsible for the error (i.e. weight the error by V): (y - y*)·V. Values in h will be due to activities in x (i.e. weight h's responsibility by x): dE/dW = (y - y*)·V·x. Change W according to this 'accumulated' responsibility.
Learning in multi-layer neural networks - backpropagation learning

- allows for simple learning of arbitrarily complex input/output mappings
- with enough 'neurons', almost any mapping is possible
- results in 'distributed' representations
  - knowledge of the mapping is distributed across neuronal populations, not individual cells
  - changes in restricted regions of the input state space will result in restricted changes of the output
Learning in multi-layer neural networks - backpropagation learning

- much slower than parameterized models: the network needs to estimate the parameters and the model structure from scratch, and convergence can be slow, especially if the error surface is shallow
- speed can be increased by altering the learning rate (annealing), by using conjugate gradient descent, or with 'momentum': W = W - u·dE/dW + m·ΔWprev, where ΔWprev is the change in W from the previous step

[figure: shallow error surface plotted over the parameters]
Learning in multi-layer neural networks - backpropagation learning

- local minima: the error surface might have small 'basins' which can trap the network

[figure: error surface over the parameters, showing a local minimum next to the global minimum]

Start the network from different initial conditions to find the global minimum.
Learning in multi-layer neural networks - backpropagation learning

- choosing the learning rate: small values of u can make the network take a long time to converge; large values can lead to instability

[figure: descent trajectories with learning rate too high vs. learning rate ok]
Motor learning: learning algorithms

- gradient descent: change model parameters to reduce the error in prediction
  - parameterized models
  - non-parametric models
    - single layer, linear and non-linear networks: LMS/adaline learning rules
    - multi-layer, non-linear networks: backpropagation learning
- in all of the above, we knew the correct answer and tried to match it, i.e. 'supervised learning'
- but what if our knowledge of the outcome is limited?

=> reinforcement learning
Reinforcement learning - supervised learning, but with limited feedback

The network (inputs → outputs) interacts with the environment, which sends back only a global evaluation signal saying good or bad (1 or -1) depending on system performance, e.g. move the limb and bump into things (pain as a reinforcer).
Reinforcement learning - supervised learning, but with limited feedback

Using a global reinforcement signal to train a network. Basic idea:
- start with an initial network
- produce an output based on a given input, but add noise to the network to explore
- evaluate the output
- find those units with large activity
- change the weights so that they'll be large the next time the input is given
Reinforcement learning - supervised learning, but with limited feedback

Using a global reinforcement signal to train a network: the associative reward-penalty algorithm (AR-P).

Consider probabilistic outputs y ∈ {-1, 1} with p(y = 1) = 1/(1 + exp(-Wx)).

The output produced on any given trial is therefore stochastic, with expected value determined by the sigmoid: <y> = tanh(Wx/2).

We then use gradient descent to get the update rule:

dW = u+·(y - <y>)·x, if r is reward
dW = u-·(-y - <y>)·x, if r is penalty
Reinforcement learning - supervised learning, but with limited feedback

Associative reward-penalty algorithm (AR-P):

dW = u+·(y - <y>)·x, if r is reward
dW = u-·(-y - <y>)·x, if r is penalty

1) if the expected value is close to what the unit actually did, then don't change things (nothing new)
2) if the expected value is different from what it did, and it was rewarded, then change W so that it will do it again
3) if the expected value is different from what it did, and it was penalized, then change W so that it won't do it again

=> trial and error learning
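A minimal AR-P sketch along these lines, with a single stochastic unit learning AND from a binary reward; the task, the reward criterion, and the asymmetric learning rates u_plus >> u_minus are illustrative assumptions:

```python
import numpy as np

# Toy AR-P unit: stochastic output y in {-1, 1} with p(y=1) = g(w.x),
# trained by a global reward signal to produce logical AND.
rng = np.random.default_rng(6)

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

patterns = [np.array([x1, x2, 1.0]) for x1 in (0, 1) for x2 in (0, 1)]
target = {(0, 0): -1, (0, 1): -1, (1, 0): -1, (1, 1): 1}  # AND in {-1, 1}

w = np.zeros(3)
u_plus, u_minus = 0.5, 0.05
for _ in range(5000):
    x = patterns[rng.integers(4)]
    p = g(w @ x)
    y = 1 if rng.random() < p else -1        # stochastic output
    y_mean = 2 * p - 1                       # <y> = tanh(w.x / 2)
    rewarded = y == target[(int(x[0]), int(x[1]))]  # global evaluation
    if rewarded:
        w += u_plus * (y - y_mean) * x       # reward: do it again
    else:
        w += u_minus * (-y - y_mean) * x     # penalty: try the opposite
for x in patterns:
    print(int(x[0]), int(x[1]), "-> p(y=1) =", round(float(g(w @ x)), 2))
```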
Reinforcement learning - supervised learning, but with limited feedback

Using a global reinforcement signal to train a network:
- much slower than supervised gradient descent
- more biologically plausible: how would the error be backpropagated in supervised learning?
- more directly ethologically plausible: based on direct reward/penalty feedback, i.e. information about survival
Motor learning: learning algorithms

- gradient descent: change model parameters to reduce the error in prediction
  - parameterized models
  - non-parametric models
    - single layer, linear and non-linear networks: LMS/adaline learning rules
    - multi-layer, non-linear networks: backpropagation learning
- reinforcement learning
  - AR-P networks
  - Q learning, TD learning, dynamic programming
- unsupervised learning