11 Backpropagation
$\mathbf{s}^m = \dot{\mathbf{F}}^m(\mathbf{n}^m)(\mathbf{W}^{m+1})^T \mathbf{s}^{m+1}$, for $m = M-1, \dots, 2, 1$. (11.45)
Finally, the weights and biases are updated using the approximate steepest descent rule:
$\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha \mathbf{s}^m (\mathbf{a}^{m-1})^T$, (11.46)
$\mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha \mathbf{s}^m$. (11.47)
Example

To illustrate the backpropagation algorithm, let's choose a network and apply it to a particular problem. To begin, we will use the 1-2-1 network that we discussed earlier in this chapter. For convenience we have reproduced the network in Figure 11.8.
Next we want to define a problem for the network to solve. Suppose that we want to use the network to approximate the function
$g(p) = 1 + \sin\!\left(\frac{\pi}{4}p\right)$, for $-2 \le p \le 2$. (11.48)
To obtain our training set we will evaluate this function at several values of $p$.
Figure 11.8 Example Function Approximation Network
Before we begin the backpropagation algorithm we need to choose some initial values for the network weights and biases. Generally these are chosen to be small random values. In the next chapter we will discuss some reasons for this. For now let's choose the values
$\mathbf{W}^1(0) = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}$, $\mathbf{b}^1(0) = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}$, $\mathbf{W}^2(0) = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix}$, $\mathbf{b}^2(0) = \begin{bmatrix} 0.48 \end{bmatrix}$.
The response of the network for these initial values is illustrated in Figure 11.9, along with the sine function we wish to approximate.
Figure 11.9 Initial Network Response
Next, we need to select a training set $\{(p_1, t_1), (p_2, t_2), \dots, (p_Q, t_Q)\}$. In this case, we will sample the function at 21 points in the range $[-2, 2]$ at equally spaced intervals of 0.2. The training points are indicated by the circles in Figure 11.9.
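The sampling just described can be sketched in a few lines of Python (a sketch with our own names, not part of the book's MATLAB demonstrations):

```python
import math

# Target function g(p) = 1 + sin(pi*p/4) from Eq. (11.48).
def g(p):
    return 1 + math.sin(math.pi * p / 4)

# Sample 21 equally spaced points on [-2, 2] with interval 0.2.
training_set = [(-2 + 0.2 * q, g(-2 + 0.2 * q)) for q in range(21)]
```

The 16th point (index 15) is $p = 1$ with target $t = 1 + \sin(\pi/4) \approx 1.707$, which is the input used in the iteration that follows.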
Now we are ready to start the algorithm. The training points can be presented in any order, but they are often chosen randomly. For our initial input we will choose $p = 1$, which is the 16th training point:
$\mathbf{a}^0 = p = 1$.
The output of the first layer is then

$\mathbf{a}^1 = \mathbf{f}^1(\mathbf{W}^1\mathbf{a}^0 + \mathbf{b}^1) = \text{logsig}\!\left( \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} \right) = \text{logsig}\!\left( \begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix} \right) = \begin{bmatrix} \dfrac{1}{1+e^{0.75}} \\ \dfrac{1}{1+e^{0.54}} \end{bmatrix} = \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}.$
The second layer output is
$\mathbf{a}^2 = \mathbf{f}^2(\mathbf{W}^2\mathbf{a}^1 + \mathbf{b}^2) = \text{purelin}\!\left( \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix} + \begin{bmatrix} 0.48 \end{bmatrix} \right) = \begin{bmatrix} 0.446 \end{bmatrix}.$
The error would then be
$e = t - a = \left\{ 1 + \sin\!\left(\frac{\pi}{4}p\right) \right\} - a^2 = \left\{ 1 + \sin\!\left(\frac{\pi}{4}\right) \right\} - 0.446 = 1.261.$
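These forward-pass numbers can be verified with a short Python sketch (variable names such as `W1` and `a1` are our own; the initial weights are the ones chosen above):

```python
import math

def logsig(n):
    # Log-sigmoid transfer function used in the first layer.
    return 1.0 / (1.0 + math.exp(-n))

# Initial weights and biases from the example.
W1 = [-0.27, -0.41]   # first-layer weight vector (2x1)
b1 = [-0.48, -0.13]   # first-layer biases
W2 = [0.09, -0.17]    # second-layer weight vector (1x2)
b2 = 0.48             # second-layer bias

p = 1.0
a0 = p                                        # network input
n1 = [W1[i] * a0 + b1[i] for i in range(2)]   # net inputs [-0.75, -0.54]
a1 = [logsig(n) for n in n1]                  # first-layer output
a2 = W2[0] * a1[0] + W2[1] * a1[1] + b2       # purelin second layer
t = 1 + math.sin(math.pi * p / 4)             # target from Eq. (11.48)
e = t - a2                                    # error, about 1.261
```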
The next stage of the algorithm is to backpropagate the sensitivities. Before we begin the backpropagation, recall that we will need the derivatives of the transfer functions, $\dot{f}^1(n)$ and $\dot{f}^2(n)$. For the first layer
$\dot{f}^1(n) = \frac{d}{dn}\!\left( \frac{1}{1+e^{-n}} \right) = \frac{e^{-n}}{(1+e^{-n})^2} = \left( 1 - \frac{1}{1+e^{-n}} \right)\!\left( \frac{1}{1+e^{-n}} \right) = (1 - a^1)(a^1).$
For the second layer we have
$\dot{f}^2(n) = \frac{d}{dn}(n) = 1.$
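A quick way to sanity-check the identity $\dot{f}^1(n) = (1 - a^1)(a^1)$ is to compare it against a central finite difference (a Python sketch with our own function names):

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def dlogsig(n):
    # Analytic derivative expressed through the layer output: (1 - a) * a.
    a = logsig(n)
    return (1 - a) * a

# The central finite difference agrees to high accuracy at several points,
# including the net inputs -0.75 and -0.54 from the example.
h = 1e-6
for n in (-0.75, -0.54, 0.0, 1.3):
    numeric = (logsig(n + h) - logsig(n - h)) / (2 * h)
    assert abs(numeric - dlogsig(n)) < 1e-8
```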
We can now perform the backpropagation. The starting point is found at the second layer, using Eq. (11.44):
$\mathbf{s}^2 = -2\dot{\mathbf{F}}^2(\mathbf{n}^2)(\mathbf{t} - \mathbf{a}) = -2\!\begin{bmatrix} \dot{f}^2(n^2) \end{bmatrix}\!(1.261) = -2\begin{bmatrix} 1 \end{bmatrix}(1.261) = -2.522.$
The first layer sensitivity is then computed by backpropagating the sensitivity from the second layer, using Eq. (11.45):

$\mathbf{s}^1 = \dot{\mathbf{F}}^1(\mathbf{n}^1)(\mathbf{W}^2)^T \mathbf{s}^2 = \begin{bmatrix} (1-a_1^1)(a_1^1) & 0 \\ 0 & (1-a_2^1)(a_2^1) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} \begin{bmatrix} -2.522 \end{bmatrix}$

$= \begin{bmatrix} 0.218 & 0 \\ 0 & 0.233 \end{bmatrix} \begin{bmatrix} -0.227 \\ 0.429 \end{bmatrix} = \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}.$
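Continuing the numerical check in Python (the values of `a1`, `e`, and `W2` are carried over from the forward pass; the names are our own):

```python
# Values from the forward pass of the example.
a1 = [0.321, 0.368]     # first-layer outputs
e = 1.261               # error t - a
W2 = [0.09, -0.17]      # second-layer weights

# Eq. (11.44): s2 = -2 * f2'(n2) * (t - a); the purelin derivative is 1.
s2 = -2 * 1.0 * e

# Eq. (11.45): s1 = diag((1 - a1) * a1) (W2)^T s2, written element-wise.
s1 = [(1 - a1[i]) * a1[i] * W2[i] * s2 for i in range(2)]
```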
The final stage of the algorithm is to update the weights. For simplicity, we will use a learning rate $\alpha = 0.1$. (In Chapter 12 the choice of learning rate will be discussed in more detail.) From Eq. (11.46) and Eq. (11.47) we have
$\mathbf{W}^2(1) = \mathbf{W}^2(0) - \alpha \mathbf{s}^2 (\mathbf{a}^1)^T = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - 0.1\begin{bmatrix} -2.522 \end{bmatrix}\begin{bmatrix} 0.321 & 0.368 \end{bmatrix} = \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix},$

$\mathbf{b}^2(1) = \mathbf{b}^2(0) - \alpha \mathbf{s}^2 = \begin{bmatrix} 0.48 \end{bmatrix} - 0.1\begin{bmatrix} -2.522 \end{bmatrix} = \begin{bmatrix} 0.732 \end{bmatrix},$

$\mathbf{W}^1(1) = \mathbf{W}^1(0) - \alpha \mathbf{s}^1 (\mathbf{a}^0)^T = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - 0.1\begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}\begin{bmatrix} 1 \end{bmatrix} = \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix},$

$\mathbf{b}^1(1) = \mathbf{b}^1(0) - \alpha \mathbf{s}^1 = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - 0.1\begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} = \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}.$
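The update step can likewise be checked numerically (a Python sketch; all values are carried over from earlier in the example and the names are our own):

```python
alpha = 0.1   # learning rate

# Initial parameters and intermediate results from the example.
W1 = [-0.27, -0.41]; b1 = [-0.48, -0.13]
W2 = [0.09, -0.17];  b2 = 0.48
a0 = 1.0
a1 = [0.321, 0.368]
s1 = [-0.0495, 0.0997]
s2 = -2.522

# Eq. (11.46): W(k+1) = W(k) - alpha * s * a_prev^T, written element-wise.
W2_new = [W2[i] - alpha * s2 * a1[i] for i in range(2)]
W1_new = [W1[i] - alpha * s1[i] * a0 for i in range(2)]

# Eq. (11.47): b(k+1) = b(k) - alpha * s.
b2_new = b2 - alpha * s2
b1_new = [b1[i] - alpha * s1[i] for i in range(2)]
```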
This completes the first iteration of the backpropagation algorithm. We next proceed to randomly choose another input from the training set and perform another iteration of the algorithm. We continue to iterate until the difference between the network response and the target function reaches some acceptable level. (Note that this will generally take many passes through the entire training set.) We will discuss convergence criteria in more detail in Chapter 12.
To experiment with the backpropagation calculation for this two-layer net-work, use the MATLAB® Neural Network Design Demonstration Backprop-agation Calculation (nnd11bc).
Batch vs. Incremental Training

The algorithm described above is the stochastic gradient descent algorithm, which involves "on-line" or incremental training, in which the network weights and biases are updated after each input is presented (as with the LMS algorithm of Chapter 10). It is also possible to perform batch training, in which the complete gradient is computed (after all inputs are applied to the network) before the weights and biases are updated. For example, if each input occurs with equal probability, the mean square error performance index can be written
$F(\mathbf{x}) = E[\mathbf{e}^T\mathbf{e}] = E[(\mathbf{t} - \mathbf{a})^T(\mathbf{t} - \mathbf{a})] = \frac{1}{Q}\sum_{q=1}^{Q}(\mathbf{t}_q - \mathbf{a}_q)^T(\mathbf{t}_q - \mathbf{a}_q).$ (11.49)
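The distinction can be sketched in Python: in batch mode the squared errors (and, in a full implementation, the gradients) are accumulated over all $Q$ training points before any update is made. This evaluates Eq. (11.49) for the initial network of the example (function and variable names are our own):

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

# Initial network from the example.
W1 = [-0.27, -0.41]; b1 = [-0.48, -0.13]
W2 = [0.09, -0.17];  b2 = 0.48

def forward(p):
    # 1-2-1 network: logsig first layer, purelin second layer.
    a1 = [logsig(W1[i] * p + b1[i]) for i in range(2)]
    return W2[0] * a1[0] + W2[1] * a1[1] + b2

# Mean square error over the full 21-point training set (Eq. 11.49):
# every input is applied before F is formed, in contrast to the
# per-sample updates of the incremental algorithm above.
points = [-2 + 0.2 * q for q in range(21)]
errors = [(1 + math.sin(math.pi * p / 4)) - forward(p) for p in points]
F = sum(e * e for e in errors) / len(points)
```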
The total gradient of this performance index is