[IEEE ICNN'95 - International Conference on Neural Networks - Perth, WA, Australia, 27 Nov.-1 Dec. 1995]

LEARNING WITH EASE: Smart Neural Nets
B. W. DAHANAYAKE and A. R. M. UPTON

Division of Neurology, HSC, McMaster University, Hamilton, Ontario, Canada L8N 3Z5; e-mail: dahanaya@mcmaster.ca

ABSTRACT - We introduce smart neural nets that learn fast with ease by regular backpropagation. This is achieved by avoiding the use of the sigmoid non-linear function driven conventional or Socratic neurons, and choosing the neurons of the hidden layers and the output layer appropriately. To develop the smart neural nets, we introduce what we call 'the smart neurons' and 'the intelligent neurons' that have the underpinning of 'parallel thinking' or 'deBono thinking'. The intelligent neurons are obtained by introducing the non-emotional innovation feedback into the smart neurons. The intelligent neurons asymptotically become the same as the smart neurons. The smart neural nets are constructed by using the smart neurons and intelligent neurons. The smart neurons alone are employed to form the hidden layer (or layers) of the smart neural net. The output layer of the smart neural net is constructed by using the intelligent neurons alone.

We compare the performance of the smart neural nets against that of the conventional neural nets toward the regular innovation backpropagation learning. Unlike the conventional neural nets, the smart neural nets seem to learn fast and smoothly by the regular innovation backpropagation learning. Further, the sigmoid non-linear function driven conventional or Socratic neurons are not essential to build feed forward neural nets. In fact, much more efficient and fast learning neural nets can be built by avoiding the conventional or Socratic neurons.

" ... a new idea is delicate; it can be killed by a sneer or a yawn; it can be stabbed to de& by a quip and worried to de& by a

frown on the right person 's brow. " - Charlie Brower -

1. WHY ARE WE WRITING THIS PAPER?

The conventional neurons consist of the weighted sum of the input (or adaline) followed by a sigmoid non-linearity that is differentiable and of a saturating nature. The conventional feed forward neural nets are obtained by forming layers of conventional neurons and interconnecting the neurons of adjacent layers. The conventional neural nets are widely used for practical applications in various disciplines. However, conventional neural nets have been very slow learners, and therefore, it is important to improve the learning capabilities of the neural nets to make them more attractive for practical applications. How can we improve the learning capabilities of the neural nets?

To answer this question, let us assume that we are teaching a class that consists of a group of students. Each student in the class consists of a distinct biological neural net. We have the ability to change neither the neuron nor the neural net directly. Therefore, as teachers, we have to develop teaching methodologies so that each student in the class learns fast and efficiently.

When we consider teaching a group of students, we have no choice but to develop efficient learning methodologies for making the students learn efficiently. However, when we consider artificial neural nets, we have two clear choices to make the neural nets fast learners. One is the development of fast learning algorithms, as teachers do when they are teaching students. The second alternative is to design the neurons appropriately so that the neural nets learn fast. Here, we are going to use the second alternative.

The slow learning characteristic of the conventional neural nets is a direct result of the inherent weakness in the learning capabilities of the sigmoid non-linear function driven conventional neurons that are used to construct the neural nets. Therefore, if we want to construct neural nets that learn fast, we should avoid the use of the conventional neurons. In other words, we have to use the second approach. We are going to avoid the sigmoid non-linear function driven conventional neurons, and introduce smart neurons in their place to form neural nets that learn fast in a well behaved manner. We also completely avoid the use of the threshold (hard limiter) non-linear function driven neurons.

2. WHAT IS SOCRATIC LEARNING?

To understand why we have modeled the conventional neuron the way it is, it may be useful to consider the system of thinking by which our actions are governed. What is our system of thinking? To a great extent, it is Socratic thinking, which develops as we grow through the educational system. In general, society is pervaded by Socratic thinking. Let us consider the fundamental characteristics of Socratic thinking.

The Socratic system is based on:
- Critical questioning: it is the foundation of Socratic thinking.
- ROCK-LOGIC or HARD-EDGED LOGIC.
- Dichotomies or polarities (+/-, true/false, all/none).

Question: If we group together several ROCK-LOGIC people and ask them to come up with a solution to a particular problem, what would the most probable outcome be?
Answer:
- Every individual argues to defend his/her own position and rejects other views.
- No solution would come out.
- The whole discussion would become paralysed.
- If a group of ROCK-LOGIC people came out with a solution, it would tend to be after a prolonged period of unnecessary bickering. An example of this phenomenon is very apparent in peace negotiations among warring factions.

Socratic Thinking: Is it adequate? Let's see what deBono has to say about critical thinking [1]:
"- We need to realize that critical thinking is totally inadequate by itself.
- We need to reduce our obsession with critical thinking.
- We need to question the high esteem in which we hold critical thinking. (Critical thinking is a legitimate and useful cover for jealousy, as is often the case in the academic world. Personal attacks and power plays can simply be given the cloak of genuine criticism.)
- We need to understand that the easy use of critical thinking makes the emergence of new ideas very difficult. This is particularly so when a new idea needs to be judged within a new paradigm, not within the old paradigm which, by definition, it does not fit!"

Having reviewed the basics of Socratic learning, we are now going to see how the underpinnings of the conventional neurons are related to the Socratic system of learning. Justifiably, we can call the conventional neurons 'Socratic neurons'.

"Sysremanlics: All !he system are inlinile(y complex. The illusion of sinylliciry comes from focusing attention on one or few variables. "

3. CONVENTIONAL NEURONS AND CONVENTIONAL NEURAL NETS

Conventional neurons and conventional neural nets can also be called Socratic neurons and Socratic neural nets respectively.

3.1 CONVENTIONAL NEURONS (Socratic Neurons): The conventional or Socratic neurons usually consist of the weighted sum of the input vector followed by a sigmoid non-linear function, which is differentiable and saturating in nature. The hyperbolic tanh(.) function is widely used as the non-linear function. The configuration of the conventional neuron is given in Figure 1.1.

[Figure: adaline output y(t) followed by f_CN(y) = tanh(y), with innovation backpropagation.]
Fig. 1.1: Conventional Neuron (Socratic Neuron)

The innovation backpropagation learning algorithm is given by [4,5,6]:

w(t) = w(t−1) + (1−λ)αδ(t)x(t) + (1−1/t)λΔw(t−1)    (1)

Fig. 1.2: The non-linear function of the conventional neuron and its derivative

Since f_CN(y) = tanh(y), the derivative f′_CN(y) is given by f′_CN(y) = d[tanh(y)]/dy = 1 − tanh²(y). The sigmoid non-linear function tanh(y) and its derivative d[tanh(y)]/dy are shown in Figure 1.2. The innovation ε(t) is given by the relationship ε(t) = d(t) − tanh[wᵀ(t−1)x(t)]. The innovation at the output of the adaline, or the backpropagated innovation δ(t), can be obtained as δ(t) = f′_CN(y)ε(t). The innovation surface E[ε²(t)] is usually used in deriving learning algorithms; E[.] denotes the expectation operator.

The terms ε(t), δ(t), and E[ε²(t)] are incorrectly referred to in the literature as the error, the backpropagated error, and the error surface respectively; the correct terms are the innovation, the backpropagated innovation, and the innovation surface.

The error term e(t) can be written as e(t) = d(t) − tanh[wᵀ(t)x(t)]. The vector w(t) is used in the calculation of the error, and it is not available at time t until the weights at time t−1 are updated. Further, the error does not carry any information, and therefore, there is no reason to use it in updating the weights.

However, w(t−1) is used in the calculation of the innovation, and hence the innovation ε(t) is available at time t to update the weight vector from the past value w(t−1) to the present value w(t). The error can be calculated only after the innovation is used to update the weight vector from w(t−1) to w(t). Therefore, in practice, the weight update or learning is actually done by using the innovation backpropagation.
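To make the learning rule concrete, here is a minimal sketch (ours, not the authors' code) of one innovation-backpropagation update of eqn. (1) for a single conventional neuron; NumPy and the helper name `socratic_neuron_update` are our assumptions.

```python
import numpy as np

def socratic_neuron_update(w, w_prev, x, d, t, alpha=0.02, lam=0.5):
    """One step of eqn. (1):
    w(t) = w(t-1) + (1-lam)*alpha*delta(t)*x(t) + (1-1/t)*lam*dw(t-1)."""
    y = w @ x                              # adaline output y(t) = w^T(t-1) x(t)
    eps = d - np.tanh(y)                   # innovation eps(t), uses w(t-1)
    delta = (1.0 - np.tanh(y) ** 2) * eps  # backpropagated innovation delta(t)
    w_new = w + (1.0 - lam) * alpha * delta * x \
              + (1.0 - 1.0 / t) * lam * (w - w_prev)
    # The "error" d - tanh(w_new @ x) only becomes computable after this
    # update, which is why learning actually runs on the innovation.
    return w_new, eps
```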

3.2 CONVENTIONAL (or SOCRATIC) NEURAL NETS: The conventional neural nets are formed by using layers of conventional neurons, and interconnecting the neurons of adjacent layers.

"Systemantics: A complex system that works is invariably found to have evolved from a simple system that works."

3.3 WHAT IS WRONG WITH CONVENTIONAL NEURONS?
Conventional neurons make use of:
- dichotomies or polarities (+/-, all/nothing) for learning. (Does the biological neuron function on an all or nothing paradigm? We doubt it. Biological beings are too smart to take an all or nothing paradigm.)
- ROCK-LOGIC or Hard-Edged Logic.
In other words, conventional neurons are based on Socratic thinking. Hence, it is a Socratic system. The Socratic system leads to:
- very, very slow learning
- paralysis.

The HARD-EDGED dichotomy pushes things one way or the other with its rigid inflexibility when they might better be somewhere in between. Fast learning requires intellectual freedom. Conventional neurons and networks are denied that very essential intellectual freedom, which is an indispensable factor for fast learning. Think for a while why University students and teachers alike are given intellectual freedom. Therefore, our first task should be to set the neurons free; to give neurons intellectual freedom.

Question: How can we make neural nets fast learners?
Answer: We know the use of HARD-LOGIC or Socratic thinking leads to:
- slow learning
- a tendency to paralyse.
We also know that the conventional neurons are autocratically denied intellectual freedom by imposing dichotomies.

Therefore, to make neural nets paralysis free, efficient, and fast learning, we should avoid the HARD-LOGIC or Socratic thinking.

Question: If we should avoid Socratic thinking, what alternative system of thinking should we go for?
Answer: The answer lies in parallel thinking. What is parallel thinking?

3.4 PARALLEL THINKING (deBono Thinking):
"... Parallel thinking does not need the harsh HARD-EDGED judgements that are required by the 'truth merchant' in the Socratic system. The true/false (all/none, +/-) dichotomy is softened by 'possibility' overlap and 'fuzzy edges'. Alternative views can lie alongside each other - in parallel."
- Edward deBono (Parallel Thinking) -

Question (revisited): How can we make neural nets fast learners?
Answer: Smart Neural Nets.
- Set the neurons free! Give them intellectual freedom!
- Use parallel thinking in place of Socratic thinking.
- Use SOFT-LOGIC in place of HARD-LOGIC.
- Use a non-saturating non-linear function in place of the saturating non-linear function.

"Systemntics: Every new idea - in Science, Politics. Art or whatever - evokes three stages of reaction. l l ~ e y may be summed up by the three phases: (a) 'It is impossible - don 'I waste my time. ' (b) 'It is possible - bur it is not worth doing. * (c) 'I said it was a good idea all along. ' @

4. SMART NEURONS AND INTELLIGENT NEURONS

"Systemantics: A complex neural system that does not learn satisfactorily cannot be patched up to make it learn efficiently. You have to start over, beginning with an efficient neuron."

4.1 SMART NEURONS: We introduce what we call 'the smart neurons' by using a bounded and differentiable, but non-saturating, non-linear function. The configuration of the smart neuron is given in Fig. 2.1.

[Figure: x(t) → adaline w(t−1) → y(t) → f_SN(y) = y/(1 + y²), compared against the desired signal 0.5d(t), d(t) = ±1, with innovation backpropagation.]
Fig. 2.1: Smart Neuron

Fig. 2.2: The non-linear function of the smart neuron and its derivative

The output of the smart neuron, y_o(t), is given by y_o(t) = y(t)/(1 + y²(t)), where y(t) = wᵀ(t−1)x(t). The derivative of the non-linear function, f′_SN(y), is given by f′_SN(y) = d[f_SN(y)]/dy = (1 − y²)/(1 + y²)². The non-linear function of the smart neuron and its derivative are shown in Figure 2.2.

Note that we have softened the edges of the non-linear function of the Socratic neuron to obtain the smart neuron. In other words, we have used SOFT-LOGIC in place of HARD-LOGIC. It is noteworthy that learning does not require a monotonic non-linearity. When it comes to learning, the smart neuron shown in Figure 2.1 provides well thought out decisions, and hence the name smart neuron. The term 'smart' is used to reflect its fast learning capabilities. If a student learns fast, we call the student 'smart'. The same analogy is used to name the smart neuron.
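As a quick numerical check (our own NumPy sketch, not from the paper), the smart non-linearity and its derivative can be tabulated to confirm the peaks of ±0.5 at y = ±1 and the sign change of the gradient discussed next:

```python
import numpy as np

def f_sn(y):
    """Smart-neuron non-linearity: bounded, differentiable, non-saturating."""
    return y / (1.0 + y**2)

def df_sn(y):
    """Its derivative (1 - y^2)/(1 + y^2)^2: zero only at y = +/-1."""
    return (1.0 - y**2) / (1.0 + y**2)**2

for y in [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]:
    print(f"y={y:5.1f}  f_SN={f_sn(y):6.3f}  f_SN'={df_sn(y):6.3f}")
# f_SN peaks at +0.5 (y = 1) and troughs at -0.5 (y = -1); the gradient is
# positive for |y| < 1 and negative for |y| > 1, unlike the tanh neuron.
```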

Let us consider the characteristics of the smart neuron for learning. The characteristics of the neuron are mainly determined by the nature of the non-linear function used as the driving function. The non-linear function y/(1 + y²) that is used in the smart neuron has the following characteristics:
i. It has only one positive peak for positive values of y, and only one trough for negative values of y.
ii. The non-linear function f_SN(y) is bounded so that −0.5 ≤ f_SN(y) ≤ 0.5.
iii. For finite y, the gradient of the non-linear function becomes zero only at two distinct points.
iv. The gradient of the non-linear function takes both positive values as well as negative values. (This is quite a contrast to the conventional neurons with sigmoid non-linear functions, where the gradient takes only positive values.)
v. When the gradient is zero, the value of the non-linear function is ±0.5.

Since the maximum and minimum values of the non-linear function are ±0.5, the desired signal to the neuron should be ±0.5. If the desired output signal d(t) is binary and takes the values ±1, then 0.5d(t) should be chosen as the desired signal to the smart neuron (Fig. 2.1).

Let us consider how the smart neuron learns by the learning algorithm given in eqn. (1). Consider the situation where we have a large adaline output y, i.e. y > 1. In that case, f_SN(y) will be small, and hence the innovation ε(t) = 0.5d(t) − f_SN(y) will be large. Also, the gradient f′_SN(y) at that point will be negative. Therefore, the coefficient of the second term of the learning eqn. (1), i.e. α(1−λ)ε(t)f′_SN(y), will be negative and large, and in effect decreases the value of the weight vector, causing the output y to roll back to a smaller value. When the output of the adaline is smaller (0 < y < 1), the opposite takes place; i.e. the coefficient of the second term of the learning eqn. (1) is positive and, in effect, increases the weight vector, leading to a higher value of y. This process continues until the peak or the trough of the non-linear function, ±0.5, is reached. When the desired value 0.5 (or −0.5 for y < 0) is reached, the gradient f′_SN(y) of the non-linear function, as well as the innovation ε(t), become zero, causing the learning to terminate. At this point, the neuron has learned all the information that could be gained from the input, and therefore the innovation will be zero or negligible.
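A minimal simulation of this roll-back behaviour (our sketch; for clarity the momentum term of eqn. (1) is omitted, which is our simplification, not the paper's setting):

```python
import numpy as np

f_sn = lambda y: y / (1.0 + y**2)
df_sn = lambda y: (1.0 - y**2) / (1.0 + y**2)**2

def smart_step(w, x, d, alpha=1.0):
    """Simplified smart-neuron update: w += alpha * delta * x (no momentum)."""
    y = w @ x
    eps = 0.5 * d - f_sn(y)       # innovation against the scaled target 0.5*d
    delta = df_sn(y) * eps        # negative for y > 1, positive for 0 < y < 1
    return w + alpha * delta * x, y

w, x = np.array([2.0, 2.0]), np.array([1.0, 1.0])   # deliberately large: y = 4
for t in range(1, 1201):
    w, y = smart_step(w, x, d=1.0)
    if t % 300 == 0:
        print(f"t={t:4d}  y={y:.3f}")
# y rolls back toward 1, where f_SN(y) reaches the target 0.5; there both the
# gradient and the innovation vanish and learning terminates by itself.
```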

In the case of the smart neuron, the gradient of the non-linear function is zero (for finite y) only at the desired signal ±0.5. This is one of the unique features of the smart neuron. This itself contributes greatly to the paralysis free, increased speed of learning of the smart neuron.

Q: What do we have now? We have smart neurons that are paralysis free, efficient learners.
Q: How can we use them?
i. We can use the smart neurons alone to construct fast learning smart neural nets [2].
ii. We can employ smart neurons as the hidden layers and the conventional neurons as the output layer and construct fast learning smart neural nets [3,7].
However, we are going to do something different here. We also introduce another neuron called the 'intelligent neuron'.

4.2 INTELLIGENT NEURONS:

"What is intelligence: In our view, intelligence is the ability to be aware of and to utilize the present and the past innovations non-emotionally for future decisions."

The intelligent neuron is formed by introducing an M-th order memory into the smart neuron. To form the intelligent neuron, we take the smart neuron, and then add M feedback terms that are taken out of a tap delay line fed by a non-emotional non-linear function of the innovation. The non-emotional non-linear function f_FB(ε) is chosen to be f_FB(ε) = ε/(1 + ε²). The non-emotional non-linear function can also be written approximately as f_FB(ε) ≈ ε for |ε| ≪ 1, and f_FB(ε) ≈ 1/ε for |ε| ≫ 1. The function f_FB(ε) prevents the feedback loop from responding to the innovations directly or emotionally. The nature of the non-linear function f_FB(ε) is such that it responds directly to smaller innovations while responding reciprocally to the larger innovations. It is suspicious about the larger innovations. It does not allow larger innovations to spoil what the system has learned so far. In other words, f_FB(ε) acts as a non-emotional non-linear function. The configuration of the M-th order intelligent neuron is given in Fig. 3.1.

[Figure: f_SN(q) = q/(1 + q²), f_FB(ε) = ε(t)/(1 + ε²(t)), M ≥ 1, z⁻¹ = unit delay.]
Fig. 3.1: Intelligent Neuron

Once the neuron has learned, ε(t) = 0, and hence the learned intelligent neuron is the same as the smart neuron, since we have chosen our non-emotional non-linear function as f_FB(ε) = ε/(1 + ε²). In other words, once the neuron has learned, we can discard the feedback branch from the neuron without altering the information the neural net has received. The intelligent neurons are used only in the output layer. The graphical representation of the non-emotional non-linear function f_FB(ε) is the same as the non-linear function f_SN of the smart neuron given in Fig. 2.2. The input vector to the intelligent neuron, p(t), is given by

p(t) = [p₁(t), p₂(t), ..., p_{L−1}(t), 1, ε(t−1)/(1+ε²(t−1)), ε(t−2)/(1+ε²(t−2)), ..., ε(t−M)/(1+ε²(t−M))]ᵀ.

Once the neuron has learned, p(t) is given by

p(t) = [p₁(t), p₂(t), ..., p_{L−1}(t), 1, 0, 0, ..., 0]ᵀ.
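A sketch of how the intelligent neuron's augmented input p(t) could be assembled (our construction from the definitions above; `p_base` and `eps_hist` are assumed names, with `eps_hist` holding the M most recent innovations, newest first):

```python
import numpy as np

def f_fb(eps):
    """Non-emotional non-linearity: ~eps for small |eps|, ~1/eps for large."""
    return eps / (1.0 + eps**2)

def intelligent_input(p_base, eps_hist):
    """p(t) = [p_1, ..., p_{L-1}, 1, f_fb(eps(t-1)), ..., f_fb(eps(t-M))]^T."""
    return np.concatenate([p_base, [1.0], f_fb(np.asarray(eps_hist))])

p_base = np.array([0.3, -0.7])   # e.g. hidden-layer outputs feeding this neuron
eps_hist = [0.05, 2.0, -0.4]     # M = 3 past innovations, newest first
print(intelligent_input(p_base, eps_hist))
# The large innovation 2.0 enters only as 0.4: the feedback is "suspicious" of
# large innovations. Once learned (all eps -> 0), the feedback entries vanish
# and the intelligent neuron reduces to a smart neuron.
```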

"Systemantics: A large system, produced by expanding the dimensions of a smaller system, does not behave like the smaller system."

5. SMART NEURAL NETS

We construct what we call 'the smart neural nets' by using the smart neurons and the intelligent neurons. The smart neurons are used to form the hidden layer (or layers) of the net. The output layer of the smart neural net is formed using the intelligent neurons alone. The configuration of a fully connected feed forward smart neural net is shown in Fig. 4.1.

[Figure: hidden layer of smart neurons; output layer of intelligent neurons; d_i = ±1, i = 1, 2; x(t) = input to the hidden layer; p_i(t) = input to the i-th neuron of the output layer.]
Fig. 4.1: Smart neural net (before and during the learning)

For the network shown in Fig. 4.1, the same input vector is applied to each smart neuron in the hidden layer. The input vector x(t) is given by x(t) = [x₁(t), x₂(t), x₃(t), 1]ᵀ. However, the inputs to the output layer intelligent neurons will not be the same during the learning. The input vector to the i-th neuron of the output layer, p_i(t), is given by

p_i(t) = [p₁(t), p₂(t), p₃(t), 1, ε_i(t−1)/(1+ε_i²(t−1)), ε_i(t−2)/(1+ε_i²(t−2)), ..., ε_i(t−M)/(1+ε_i²(t−M))]ᵀ, i = 1, 2.

Once the smart neural net has learned, ε_i(t) = 0 for all i, and hence the feedback loops of the intelligent neurons of the output layer cease to exist. In a learned smart neural net, the output layer of intelligent neurons becomes equivalent to a layer of smart neurons. Therefore, the learned smart neural net will be a neural net of smart neurons alone. Once it has learned, the smart net given in Fig. 4.1 becomes equivalent to the net of smart neurons given in Fig. 4.2.

[Figure: hidden layer of smart neurons; the output layer of intelligent neurons becomes a layer of smart neurons after learning; x(t) = input to the hidden layer; p(t) = input to the output layer, p(t) = [p₁(t), p₂(t), p₃(t), 1]ᵀ.]
Fig. 4.2: Smart neural net after the learning

We have considered a two-layer neural net that consists of a hidden layer of smart neurons and an output layer of intelligent neurons. If the net has more than two layers, then all the hidden layers are formed using the smart neurons alone. The intelligent neurons are used only in the output layer.

Innovation Backpropagation: The innovation vector of the output layer, ε(t), is given by ε_i(t) = 0.5d_i(t) − y_i(t), i = 1, 2, ..., M_O, where M_O = number of neurons in the output layer. The backpropagated innovation vector δ(t) of the output layer can be derived as δ_i(t) = [(1 − q_i²)/(1 + q_i²)²]ε_i(t), i = 1, 2, ..., M_O, where δ_i(t) is the i-th element of the vector δ(t) and q_i is the adaline output of the i-th output neuron. Let M_H be the number of neurons in the hidden layer. Then the innovation ε^(h)(t) of the hidden layer is given by

ε_i^(h)(t) = Σ_{j=1}^{M_O} w_ij(t−1)δ_j(t), i = 1, 2, ..., M_H,

where w_ij denotes the i-th weight of the j-th neuron in the output layer, and ε_i^(h)(t) is the i-th element of ε^(h)(t). Then the backpropagated innovation δ^(h)(t) is given by δ_i^(h)(t) = [(1 − r_i²)/(1 + r_i²)²]ε_i^(h)(t), i = 1, 2, ..., M_H, where δ_i^(h)(t) is the i-th element of the vector δ^(h)(t) and r_i is the adaline output of the i-th hidden neuron. Now we have the innovations at the adalines of the hidden layer and the adalines of the output layer. We update the weight vectors using the modified momentum-LMS learning given in [3,4,5].

Weight Updates:
Output layer:
w_i(t) = w_i(t−1) + (1−λ)αδ_i(t)p_i(t) + (1−1/t)λΔw_i(t−1),
Δw_i(t−1) = w_i(t−1) − w_i(t−2), i = 1, 2, ..., M_O,
where p_i(t) = [p₁(t), p₂(t), ..., p_{M_H}(t), 1, ε_i(t−1)/(1+ε_i²(t−1)), ε_i(t−2)/(1+ε_i²(t−2)), ..., ε_i(t−M)/(1+ε_i²(t−M))]ᵀ.

Hidden layer:
v_i(t) = v_i(t−1) + (1−λ)αδ_i^(h)(t)x(t) + (1−1/t)λΔv_i(t−1),
Δv_i(t−1) = v_i(t−1) − v_i(t−2), i = 1, 2, ..., M_H,
where x(t) = [x₁(t), x₂(t), ..., x_{L−1}(t), 1]ᵀ.
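Putting the pieces together, here is a compact sketch of one innovation-backpropagation update for the two-layer smart net (ours, not the authors' code; the array shapes, the tap-delay bookkeeping, and the helper names are our assumptions, while the formulas follow the equations above):

```python
import numpy as np

f_sn  = lambda y: y / (1.0 + y**2)
df_sn = lambda y: (1.0 - y**2) / (1.0 + y**2)**2
f_fb  = lambda e: e / (1.0 + e**2)

def smart_net_step(V, V_prev, W, W_prev, x, d, eps_hist, t, alpha=1.0, lam=0.5):
    """V: (M_H, L) hidden smart-neuron weights; W: (M_O, M_H+1+M) intelligent
    output-neuron weights; eps_hist: (M_O, M) past innovations, newest first."""
    M_H, M_O = V.shape[0], W.shape[0]
    r = V @ x                                   # hidden adaline outputs r_i
    h = f_sn(r)                                 # hidden-layer outputs
    # Augmented inputs p_i(t): hidden outputs, bias, M non-emotional feedbacks.
    P = np.stack([np.concatenate([h, [1.0], f_fb(eps_hist[i])])
                  for i in range(M_O)])
    q = np.einsum('ij,ij->i', W, P)             # output adaline outputs q_i
    eps = 0.5 * d - f_sn(q)                     # output-layer innovation
    delta = df_sn(q) * eps                      # backpropagated, output layer
    eps_h = W[:, :M_H].T @ delta                # eps_i^(h) = sum_j w_ij delta_j
    delta_h = df_sn(r) * eps_h                  # backpropagated, hidden layer
    mom = (1.0 - 1.0 / t) * lam
    W_new = W + (1 - lam) * alpha * delta[:, None] * P + mom * (W - W_prev)
    V_new = V + (1 - lam) * alpha * delta_h[:, None] * x + mom * (V - V_prev)
    eps_hist = np.column_stack([eps, eps_hist[:, :-1]])   # shift the delay line
    return V_new, W_new, eps, eps_hist
```

The caller keeps the previous weights so that Δw(t−1) = w(t−1) − w(t−2) and Δv(t−1) can be supplied on the next call.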

6. APPLICATION TO EXCLUSIVE-OR GATE (whipping boy)

The truth table of the two-input ex-OR gate gives the training set:

x₁:   −1  −1   1   1   (input)
x₂:   −1   1  −1   1   (input)
bias:  1   1   1   1   (input data set)
d:    −1   1   1  −1   (desired output)
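In array form (a direct NumPy transcription of the table above):

```python
import numpy as np

# Rows: the four training vectors [x1, x2, bias]; d is the desired output.
X = np.array([[-1.0, -1.0, 1.0],
              [-1.0,  1.0, 1.0],
              [ 1.0, -1.0, 1.0],
              [ 1.0,  1.0, 1.0]])
d = np.array([-1.0, 1.0, 1.0, -1.0])   # exclusive-OR in the +/-1 coding
```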

To implement the exclusive-OR gate, we use a neural net that has a two-neuron hidden layer and a single-neuron output layer. We construct both the smart neural net and the conventional neural net as given in Figures 5.1 and 5.2 respectively.

[Figure: v_i(t−1), i = 1, 2 denote the hidden layer weight vectors; w(t−1) denotes the output neuron weight vector; +1 bias input; innovation backpropagation. Learning curve: E[ε²(t)] vs. t, where E denotes the expectation operator.]
Fig. 5.1: Conventional or Socratic neural net for ex-OR gate

[Figure: v_i(t−1), i = 1, 2 denote the hidden layer weight vectors; w(t−1) denotes the output layer neuron weight vector. Learning curve: E[{2ε(t)}²] vs. t.]
Fig. 5.2: Smart neural net for exclusive-OR gate

In order to compare the learning curve of the smart neural net with the learning curve of the conventional neural net, we have to bring the innovations to the same scale. Therefore, we use E[{2ε(t)}²] as the learning curve of the smart neural net.

PARAMETERS FOR LEARNING
(i) Conventional Neural Net: One typically chooses a smaller α value and smaller random initial weights. Therefore, we choose the initial weight vectors so that ||w(0)|| = 0.5 and ||v_i(0)|| = 0.5, i = 1, 2. Parameters α and λ are chosen to be α = 0.02 (typical for conventional neural nets), λ = 0.5.
(ii) Smart Neural Net: Weight vectors are initialized using random values so that the initial weight vectors of the hidden layer neurons satisfy ||v_i(0)|| = 1, i = 1, 2, and the initial weight vector of the output layer neuron satisfies ||w(0)|| = 0.5. Parameters α and λ are chosen to be α = 1 (unlike the conventional neural nets, the smart neural net has the ability to handle higher learning rates; also note that α(1−λ) < 1 even when α = 1, which prevents the exaggeration of the information, or 'learning by gossip') and λ = 0.5. Number of non-emotional non-linear feedback terms: M = 15. See [3] for the choice of λ.
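These settings might be wired up as follows (our assumed initialization helper; `smart_net_step` is the update sketched in Section 5, and only the stated norms and parameter values come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_with_norm(shape, norm, rng):
    """Random weight rows rescaled so each has the requested Euclidean norm."""
    W = rng.standard_normal(shape)
    return W / np.linalg.norm(W, axis=-1, keepdims=True) * norm

M = 15                                    # non-emotional feedback taps
alpha, lam = 1.0, 0.5                     # the smart net tolerates alpha = 1
V = init_with_norm((2, 3), 1.0, rng)      # 2 hidden smart neurons, ||v_i(0)|| = 1
W = init_with_norm((1, 2 + 1 + M), 0.5, rng)  # output neuron, ||w(0)|| = 0.5
eps_hist = np.zeros((1, M))               # innovation delay line, initially empty
```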

7. PERFORMANCE ANALYSIS

To analyze the performance, the neural nets described above for the two-input exclusive-OR gate are used. We repeat the experiment for 100 sets of independent initial weight values. The average of the squared innovation, i.e. E[ε²(t)] for the conventional neural net and E[{2ε(t)}²] for the smart neural net, is used to obtain the learning curves. The learning curves for the conventional neural net and the smart neural net are shown in Fig. 6.1 for the regular innovation backpropagation learning algorithm. Fig. 6.2 illustrates the first 200 points of the learning curves given in Fig. 6.1.

Fig. 6.1: Learning curves of the conventional and smart nets toward the regular innovation backpropagation learning (M = 15); x-axis: time (sample number).
Fig. 6.2: First 200 points of Figure 6.1; x-axis: time (sample number).

The conventional neural net and the smart neural net have responded completely differently to the regular backpropagation learning algorithm. The smart neural net was able to learn almost absolutely within approximately 8 iterations, or 2 training data set repetitions. Our training set contains 4 data vectors, since we have used the two-input exclusive-OR gate.

The conventional neural net did not learn completely even after 1500 updates, or 375 training set repetitions. In fact, the conventional neural net did not show any sign of learning absolutely even after a fairly large number of updates. However, after 375 training set repetitions, the conventional net learns enough to represent the exclusive-OR gate.

It is also clear from the results that the conventional neural net exhibits a lot of variability in the learning curve. In other words, the conventional neural net did not behave well during the training period. In fact, the conventional neural net displayed very rowdy behaviour during the learning. However, the smart neural net has produced a fairly smooth and extremely fast decaying learning curve, showing its well behaved and fast learning capabilities. Unlike the conventional neural nets, due to the very nature of the non-linear function used in the smart neurons, the smart neural nets have the inherent ability to learn without ever becoming paralysed. Paralysis free learning is one of the unique properties of the smart nets.

Question: Why are the smart neural nets preferable to conventional or Socratic neural nets?
Answer:
i. They are extremely fast learners.
ii. They are well behaved during the learning.
iii. They are completely paralysis free.
iv. Initialization has no effect on their ability to learn.
v. No additional computational complexity.
vi. Existing conventional or Socratic neural nets can be easily converted to smart neural nets simply by:
(a) replacing the non-linear function of the hidden layer (or layers) by y/(1 + y²), or SOFT-LOGIC;
(b) applying an M-th order memory driven by ε/(1 + ε²) into the output layer neurons. The larger the M, the faster the learning.

Once the net has learned, the feedback loops of the output layer cease to exist, and hence we can simply discard the feedback loops and put the net into practice. Any accelerated backpropagation learning algorithm that is available for conventional neural nets can also be used with the smart neural nets for improved performance.

"Systemantics: A career in research is simply a myth."

8. WHAT HAVE WE DONE SO FAR?

We have introduced the concept of designing neural nets so that they learn fast in a well behaved manner by the regular innovation backpropagation learning algorithm. We have shown that it is not necessary to use the sigmoid non-linear function driven conventional or Socratic neurons to build feed forward neural nets. In fact, fast learning neural nets can be built by avoiding the use of the conventional neurons that are driven by HARD-LOGIC such as sigmoid or hard limiter non-linearities.

We have introduced the smart neurons as well as the intelligent neurons based on SOFT-LOGIC. They seem to have the underpinning of 'parallel thinking' or 'deBono thinking'. It is shown that fast learning feed forward neural nets can be built by using the smart neurons as the hidden layer (or layers), and the intelligent neurons as the output layer. Once it has learned, the smart neural net becomes a net of smart neurons alone. Simulation results show that the smart neural nets not only learn extremely fast by the regular backpropagation learning algorithm, but also behave well during the learning. The smart neural net is 'a new (smart) kid' on the block.

References
[1] deBono, Edward (1994), 'Parallel Thinking'.
[2] Dahanayake, B. W. and A. R. M. Upton (1994), 'Smart neural nets for fast learning', IEEE Symposium on Emerging Technologies & Factory Automation, Tokyo, Japan.
[3] Dahanayake, B. W. and A. R. M. Upton (1994), 'A novel approach for fast learning: smart neural nets', IEEE World Congress on Computational Intelligence, ICNN, Orlando, USA.
[4] Widrow, B. and M. E. Hoff, Jr. (1960), 'Adaptive switching circuits', IRE WESCON Convention Record, part 4, Aug. 23.
[5] Rumelhart, D. E., G. E. Hinton, and R. J. Williams (1986), 'Learning by error backpropagation', in Parallel Distributed Processing, vol. 1, ch. 8, D. E. Rumelhart and J. L. McClelland, eds., Cambridge, MA: MIT Press.
[6] Dahanayake, B. W. and A. R. M. Upton (1993), 'Exponentially weighted least accumulated innovation squares learning', ISCA International Conference on Computer Applications in Industry and Engineering, Honolulu, Hawaii, USA.
[7] Dahanayake, B. W. and A. R. M. Upton (1994), 'Learning with discipline: smart neural nets', IEEE Workshop on Nonlinear Signal & Image Processing, Halkidiki, Greece.
[8] Gall, John (1977), 'Systemantics'.
[9] Bloch, Arthur (1994), 'Murphy's Law: all the reasons why everything goes wrong'.

"Researcher's Dilemma: - no matter how much you do, you will never do enough; - what you don't do is always more important than what you do."