Sigmoidal Neurons

Upload
kylangaines 
Category
Documents

view
69 
download
3
Embed Size (px)
description
Transcript of Sigmoidal Neurons
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
1
Sigmoidal NeuronsSigmoidal Neurons
In backpropagation networks, we typically choose In backpropagation networks, we typically choose = 1 and = 1 and = 0. = 0.
11
00
11
ffii(net(netii(t))(t))
netnetii(t)(t)11
= = 11
= = 0.10.1
/))(net(1
1))(net(
tii ietf
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
2
Sigmoidal NeuronsSigmoidal Neurons
This leads to a simplified form of the sigmoid function: This leads to a simplified form of the sigmoid function:
We do not need a modifiable threshold We do not need a modifiable threshold , because , because we will use “dummy” inputs as we did for perceptrons. we will use “dummy” inputs as we did for perceptrons.
The choice = 1 works well in most situations and The choice = 1 works well in most situations and results in a very simple derivative of S(net).results in a very simple derivative of S(net).
)(1
1)(
netenetS
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
3
Sigmoidal NeuronsSigmoidal Neurons
This result will be very useful when we develop the This result will be very useful when we develop the backpropagation algorithm.backpropagation algorithm.
xexS
1
1)(
2)1(
)()('
x
x
e
e
dx
xdSxS
22 )1(
1
1
1
)1(
11xxx
x
eee
e
))(1)(( xSxS
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
4
Backpropagation LearningBackpropagation Learning
Similar to the Adaline, the goal of the Similar to the Adaline, the goal of the Backpropagation learning algorithm is to modify the Backpropagation learning algorithm is to modify the network’s weights so that its output vectornetwork’s weights so that its output vector
oopp = (o = (op,1p,1, o, op,2p,2, …, o, …, op,Kp,K))
is as close as possible to the desired output vector is as close as possible to the desired output vector
ddpp = (d = (dp,1p,1, d, dp,2p,2, …, d, …, dp,Kp,K))
for K output neurons and input patterns p = 1, …, P.for K output neurons and input patterns p = 1, …, P.
The set of inputoutput pairs (exemplars) The set of inputoutput pairs (exemplars) {({(xxpp, , ddpp)  p = 1, …, P} constitutes the training set.)  p = 1, …, P} constitutes the training set.
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
5
Backpropagation LearningBackpropagation LearningAlso similar to the Adaline, we need a cumulative Also similar to the Adaline, we need a cumulative error function that is to be minimized:error function that is to be minimized:
P
pppErr
1
),(Error do
We can choose the mean square error (MSE) once We can choose the mean square error (MSE) once again (but the 1/P factor does not matter):again (but the 1/P factor does not matter):
P
p
K
jjplP 1 1
2, )(
1MSE
wherewhere
jpjpjp dol ,,,
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
6
TerminologyTerminologyExample: Example: Network function f: Network function f: RR3 3 RR22
output layeroutput layer
hidden layerhidden layer
input layerinput layer
input vectorinput vector
output vectoroutput vector
xx11 xx22
oo22oo11
xx33
)1,2(1,1w
)1,2(4,2w
)0,1(1,1w
)0,1(3,4w
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
7
Backpropagation LearningBackpropagation LearningFor input pattern p, the ith input layer node holds xFor input pattern p, the ith input layer node holds xp,ip,i. .
Net input to jth node in hidden layer:Net input to jth node in hidden layer:
n
iipijj xwnet
0,
)0,1(,
)1(
K
kkpkp
K
kkpp odlE
1
2,,
1
2, )()(Network error for p: Network error for p:
jjpjkkp xwSo )1(
,)1,2(
,,Output of kth node in output layer: Output of kth node in output layer:
j
jpjkk xwnet )1(,
)1,2(,
)2(Net input to kth node in output layer: Net input to kth node in output layer:
n
iipijjp xwSx
0,
)0,1(,
)1(,Output of jth node in hidden layer: Output of jth node in hidden layer:
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
8
Backpropagation LearningBackpropagation LearningAs E is a function of the network weights, we can use As E is a function of the network weights, we can use gradient descent to find those weights that result in gradient descent to find those weights that result in minimal error.minimal error.
For individual weights in the hidden and output layers, we For individual weights in the hidden and output layers, we should move against the error gradient (omitting index p):should move against the error gradient (omitting index p):
)1,2(,
)1,2(,
jkjk w
Ew
)0,1(,
)0,1(,
ijij w
Ew
Output layer: Output layer: Derivative easy to calculateDerivative easy to calculate
Hidden layer: Hidden layer: Derivative difficult to calculateDerivative difficult to calculate
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
9
Backpropagation LearningBackpropagation LearningWhen computing the derivative with regard to wWhen computing the derivative with regard to wk,jk,j
(2,1)(2,1), we , we can disregard any output units except ocan disregard any output units except okk::
ki
kki odlE 22 )(
)(2 kkk
odo
E
Remember that oRemember that okk is obtained by applying the sigmoid is obtained by applying the sigmoid function S to netfunction S to netkk
(2)(2), which is computed by: , which is computed by:
j
jjkk xwnet )1()1,2(,
)2(
Therefore, we need to apply the chain rule twice. Therefore, we need to apply the chain rule twice.
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
10
Backpropagation LearningBackpropagation Learning
)1.2(,
)2(
)2()1.2(, jk
k
k
k
kjk w
net
net
o
o
E
w
E
Since Since j
jjkk xwnet )1()1,2(,
)2(
)1()1.2(
,
)2(
jjk
k xw
net
We have: We have:
We know that:We know that: )2()2(
' kk
k netSnet
o
Which gives us:Which gives us: )1()2()1.2(
,
')(2 jkkkjk
xnetSodw
E
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
11
Backpropagation LearningBackpropagation LearningFor the derivative with regard to wFor the derivative with regard to wj,ij,i
(1,0)(1,0), notice that E depends , notice that E depends on it through neton it through netjj
(1)(1), which influences each o, which influences each okk with k = 1, …, K: with k = 1, …, K:
Using the chain rule of derivatives again: Using the chain rule of derivatives again:
, )2(kk netSo
iiijj xwnet )0,1(
,)1( , )1()1(
jj netSx
)0,1(,
)1(
)1(
)1(
1)1(
)2(
)2()0,1(, ij
j
j
jK
k j
k
k
k
kij w
net
net
x
x
net
net
o
o
E
w
E
K
kijjkkkk
ij
xnetSwnetSodw
E
1
)1()1,2(,
)2()0,1(
,
'')(2
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
12
Backpropagation LearningBackpropagation Learning
This gives us the following weight changes at the This gives us the following weight changes at the output layer:output layer:
… … and at the inner layer: and at the inner layer:
with)1()1.2(, jkjk xw
)2(')( kkkk netSod
with)0,1(, ijij xw
)1(
1
)1,2(, ' j
K
kjkkj netSw
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
13
Backpropagation LearningBackpropagation LearningAs you surely remember from a few minutes ago:As you surely remember from a few minutes ago:
Then we can simplify the generalized error terms:Then we can simplify the generalized error terms:))(1)(()(' xSxSxS
)2(')( kkkk netSod )1()( kkkkk oood
And:And:
)1(
1
)1,2(, ' j
K
kjkkj netSw
)1()1(
1
)1,2(, 1 jj
K
kjkkj xxw
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
14
Backpropagation LearningBackpropagation LearningThe simplified error terms The simplified error terms kk and and jj use variables that are calculated in the feedforward use variables that are calculated in the feedforward phase of the network and can thus be calculated very efficiently.phase of the network and can thus be calculated very efficiently.
Now let us state the final equations again and reintroduce the subscript p for the pth pattern:Now let us state the final equations again and reintroduce the subscript p for the pth pattern:
with)1(,,
)1.2(, jpkpjk xw
)1()( ,,,,, kpkpkpkpkp oood
with,,)0,1(
, ipjpij xw
)1(,
)1(,
1
)1,2(,,, 1 jpjp
K
kjkkpjp xxw
September 30, 2010 Neural Networks Lecture 8: Backpropagation Learning
15
Backpropagation LearningBackpropagation LearningAlgorithmAlgorithm Backpropagation; Backpropagation; Start with randomly chosen weights;Start with randomly chosen weights; whilewhile MSE is above desired threshold and computational MSE is above desired threshold and computational bounds are not exceeded, bounds are not exceeded, dodo
forfor each input pattern each input pattern xxpp, 1 , 1 p p P, P, Compute hidden node inputs;Compute hidden node inputs; Compute hidden node outputs;Compute hidden node outputs; Compute inputs to the output nodes;Compute inputs to the output nodes; Compute the network outputs;Compute the network outputs; Compute the error between output and desired output;Compute the error between output and desired output; Modify the weights between hidden and output nodes;Modify the weights between hidden and output nodes; Modify the weights between input and hidden nodes;Modify the weights between input and hidden nodes; endforendfor endwhile.endwhile.