8  Competitive Learning Neural Networks

Consider the pattern recognition tasks that the following network can perform.

The network consists of an input layer of linear units. The output of each of these units is given to all the units in the second layer (the output layer) with adaptive (adjustable) feedforward weights. The output functions of the units in the second layer are either linear or nonlinear, depending on the task for which the network is to be designed. The output of each unit in the second layer is fed back to itself in a self-excitatory manner and to the other units in the layer in an excitatory or inhibitory manner, depending on the task.

Generally, the weights on the connections in the feedback layer are nonadaptive or fixed. Such a combination of both feedforward and feedback connection layers results in some kind of competition among the activations of the units in the output layer. Hence such networks are called competitive learning neural networks.

Different choices of the output functions and interconnections in the feedback layer of the network can be used to perform different pattern recognition tasks. For example, if the output functions are linear and the feedback connections are made in an on-centre off-surround fashion, the network performs the task of storing an input pattern temporarily. In an on-centre off-surround connection there is an excitatory connection from a unit to itself and inhibitory connections to the other units in the layer.

On the other hand, if the output functions of the units in the feedback layer are made nonlinear, with fixed-weight on-centre off-surround connections, the network can be used for pattern clustering.

The objective in pattern clustering is to group the given input patterns in an unsupervised manner, and the group for a pattern is indicated by the output unit that has a nonzero output at equilibrium. The network is called a pattern clustering network, and the feedback layer is called a competitive layer. The unit that gives the nonzero output at equilibrium is said to be the winner. Learning in a pattern clustering network involves adjustment of the weights in the feedforward path so as to orient the weights leading to the winning unit towards the input pattern.

If the output functions of the units in the feedback layer are nonlinear and the units are connected in such a way that the connections to the neighbouring units are all excitatory and those to the farther units inhibitory, the network can perform the task of feature mapping. The resulting network is called a self-organization network. In self-organization, at equilibrium the output signals from nearby units in the feedback layer indicate the proximity of the corresponding input patterns in the feature space. A self-organization network can be used to obtain a mapping of the features in the input patterns onto a one-dimensional or a two-dimensional feature space.
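As a rough illustration of the feature-mapping idea, the sketch below updates not only the winning unit but also its neighbours on a one-dimensional grid of output units, so that nearby units come to respond to nearby inputs. The Gaussian neighbourhood function and the parameter values are illustrative assumptions, not prescribed by the text.

    import numpy as np

    def som_1d_step(W, x, eta=0.1, sigma=1.0):
        """One update of a 1-D self-organizing (feature-mapping) network.

        W     : (K, M) weight matrix, one row per output unit on a 1-D grid
        x     : (M,) input pattern
        eta   : learning rate (assumed value)
        sigma : neighbourhood width (assumed value)
        """
        # Winner = unit whose weight vector is closest to the input.
        k = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        # Nearby units are excited strongly, farther units weakly (assumed Gaussian neighbourhood).
        grid = np.arange(W.shape[0])
        h = np.exp(-((grid - k) ** 2) / (2 * sigma ** 2))
        # Move each unit's weights towards the input, scaled by its neighbourhood value.
        W += eta * h[:, None] * (x - W)
        return k

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        W = rng.random((10, 2))          # 10 output units, 2-D inputs (assumed sizes)
        for x in rng.random((500, 2)):   # unlabelled input patterns
            som_1d_step(W, x)
        print(np.round(W, 2))            # nearby units end up with nearby weight vectors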

The three pattern recognition tasks of the competitive learning network are summarized below.

Pattern Storage (STM)
  Architecture:  Two layers (input and competitive), linear processing units
  Learning:      No learning in the FF stage, fixed weights in the FB layer
  Recall:        Not relevant
  Limitation:    STM, no application, of theoretical interest
  To overcome:   Nonlinear output function in the FB stage, learning in the FF stage

Pattern Clustering (Grouping)
  Architecture:  Two layers (input and competitive), nonlinear processing units in the competitive layer
  Learning:      Only in the FF stage, competitive learning
  Recall:        Direct in the FF stage, activation dynamics until a stable state is reached in the FB layer
  Limitation:    Fixed (rigid) grouping of patterns
  To overcome:   Train neighbourhood units in the competitive layer

Feature Map
  Architecture:  Self-organization network, two layers, nonlinear processing units, excitatory neighbourhood units
  Learning:      Weights leading to the neighbourhood units in the competitive layer
  Recall:        Apply input, determine the winner
  Limitation:    Only visual features, not quantitative
  To overcome:   More complex architecture


Analysis of Pattern Clustering Networks

A competitive learning network with nonlinear output functions for the units in the feedback layer can produce at equilibrium a large activation on a single unit and small activations on the other units. This behaviour leads to a winner-take-all situation: when the input pattern is removed, only one unit in the feedback layer will have a nonzero activation. That unit may be designated as the winner for the input pattern.

If the feedforward weights are suitably adjusted, each of the units in the feedback layer can be made to win for a group of similar input patterns. The corresponding learning is called competitive learning.

The units in the feedback layer have nonlinear output functions of the form f(x) = x^n, n > 1. Other nonlinear output functions, such as the hard-limiting threshold function or the semilinear sigmoid function, can also be used. These units are connected among themselves with fixed weights in an on-centre off-surround manner. Such networks are called competitive learning networks. Since they are used for clustering or grouping of input patterns, they can also be called pattern clustering networks.

In the pattern clustering task, the pattern classes are formed on unlabelled input data, and hence the corresponding learning is unsupervised. In competitive learning, the weights in the feedforward path are adjusted only after the winner unit in the feedback layer is identified for a given input pattern.

There are three different methods of implementing competitive learning, as illustrated below.


In the figure, it is assumed that the input is a binary {0, 1} vector. The activation of the ith unit in the feedback layer for an input vector x = (x_1, x_2, ..., x_M)^T is given by

\[ y_i = \sum_{j=1}^{M} w_{ij} x_j \]

where w_{ij} is the (i, j)th element of the weight matrix W, connecting the jth input to the ith unit. Let i = k be the unit in the feedback layer such that

\[ y_k = \max_i (y_i). \]

Then

\[ \mathbf{w}_k^T \mathbf{x} \ge \mathbf{w}_i^T \mathbf{x}, \quad \text{for all } i. \]

Assume that the weight vectors to all the units are normalized, i.e., ||w_i|| = 1 for all i. Geometrically, the above result means that the input vector x is closest to the weight vector w_k among all the w_i. That is,

\[ \| \mathbf{x} - \mathbf{w}_k \| \le \| \mathbf{x} - \mathbf{w}_i \|, \quad \text{for all } i. \]
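The equivalence stated above, that for unit-norm weight vectors the unit with the largest activation w_i^T x is also the one whose weight vector is closest to x in Euclidean distance, can be checked numerically. The sketch below is only an illustration; the array sizes and random data are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    M, num_units = 5, 4

    # Feedforward weight matrix with each row (weight vector w_i) normalized to unit length.
    W = rng.normal(size=(num_units, M))
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    x = rng.normal(size=M)                                     # an input pattern

    k_by_activation = np.argmax(W @ x)                         # unit with the largest y_i = w_i^T x
    k_by_distance = np.argmin(np.linalg.norm(x - W, axis=1))   # unit with the closest weight vector

    print(k_by_activation == k_by_distance)                    # True: both criteria pick the same winner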

Start with some initial random values for the weights. The input vectors are applied one by one in a random order. For each input, the winner unit in the feedback layer is identified, and the weights leading to that unit are adjusted in such a way that the weight vector w_k moves towards the input vector x by a small amount, determined by a learning rate parameter η.

A straightforward implementation of the weight adjustment is to make

\[ \Delta w_{kj} = \eta (x_j - w_{kj}), \]

so that

\[ \mathbf{w}_k(m+1) = \mathbf{w}_k(m) + \Delta \mathbf{w}_k(m) = \mathbf{w}_k(m) + \eta (\mathbf{x} - \mathbf{w}_k(m)). \]

This looks more like Hebbian learning with a decay term if the output of the winning unit is assumed to be 1. It works best when the input vectors are normalized. This is called standard competitive learning.
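A minimal sketch of the standard competitive learning update described above: the winner is the unit with the largest activation, and only its weight vector is moved towards the input by a fraction η. The learning rate value and the data are assumptions for illustration.

    import numpy as np

    def standard_competitive_step(W, x, eta=0.05):
        """One presentation: winner-take-all followed by Delta w_k = eta * (x - w_k)."""
        y = W @ x                    # activations y_i = sum_j w_ij * x_j
        k = int(np.argmax(y))        # winner unit k with the largest activation
        W[k] += eta * (x - W[k])     # move w_k a small amount towards x
        return k

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        W = rng.random((3, 4))               # 3 competitive units, 4-dimensional inputs (assumed sizes)
        for x in rng.random((200, 4)):       # inputs applied one by one
            standard_competitive_step(W, x)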


If the input vectors are not normalized, then they are normalized within the weight adjustment formula as follows:

\[ \Delta w_{kj} = \eta \left( \frac{x_j}{\sum_i x_i} - w_{kj} \right), \]

only for those j for which x_j = 1. This can be called minimal learning.
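A small sketch of the minimal learning rule just stated: for a binary input, only the winner's weights on the active lines (x_j = 1) are moved towards x_j / Σ_i x_i. The learning rate and the initial weights are assumptions.

    import numpy as np

    def minimal_learning_step(W, x, eta=0.1):
        """Minimal learning for a binary {0, 1} input vector x (assumes at least one active input)."""
        k = int(np.argmax(W @ x))            # winner unit
        active = x == 1                      # only lines with nonzero input are adjusted
        W[k, active] += eta * (x[active] / x.sum() - W[k, active])
        return k

    if __name__ == "__main__":
        W = np.full((2, 4), 0.25)                        # assumed initial weights
        minimal_learning_step(W, np.array([1., 0., 1., 0.]))
        print(W)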

In the case of binary input vectors, for the winner unit only the weights which have a nonzero input are adjusted. That is,

\[ \Delta w_{kj} = \eta \left( \frac{x_j}{\sum_i x_i} - w_{kj} \right), \quad \text{for } x_j = 1, \]

\[ \Delta w_{kj} = 0, \quad \text{for } x_j = 0. \]

In minimal learning there is no automatic normalization of the weights after each adjustment. That is,

\[ \sum_{j=1}^{M} w_{kj} \ne 1. \]

In order to overcome this problem, Malsburg suggested the following learning law, in which all the weights leading to the winner unit are adjusted:

\[ \Delta w_{kj} = \eta \left( \frac{x_j}{\sum_i x_i} - w_{kj} \right), \quad \text{for all } j. \]

In this law, if a unit wins the competition, then each of its input lines gives up some portion of its weight, and that weight is distributed equally among the active connections, i.e., those for which x_j = 1.
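A sketch of Malsburg's rule as stated above. A useful consequence is that if the winner's weights sum to 1 before the update, they still sum to 1 after it, which is the normalization that minimal learning lacks. The learning rate, initial weights, and example input are assumptions.

    import numpy as np

    def malsburg_step(W, x, eta=0.1):
        """Malsburg learning: adjust every weight leading to the winner unit."""
        k = int(np.argmax(W @ x))
        W[k] += eta * (x / x.sum() - W[k])   # applied for all j, active or not
        return k

    if __name__ == "__main__":
        rng = np.random.default_rng(3)
        W = rng.random((3, 6))
        W /= W.sum(axis=1, keepdims=True)            # start with rows summing to 1
        x = np.array([1., 0., 1., 1., 0., 0.])       # assumed binary input
        malsburg_step(W, x)
        print(np.round(W.sum(axis=1), 6))            # the winner's row still sums to 1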

A unit i with an initial weight vector w_i far from any input vector may never win the competition. Since a unit will never learn unless it wins the competition, another method, called the leaky learning law, is proposed. In this case, the weights leading to the units which do not win are also adjusted for each update, as follows:


\[ \Delta w_{ij} = \eta_w \left( \frac{x_j}{\sum_m x_m} - w_{ij} \right), \quad \text{for all } j, \]

if unit i wins the competition, i.e., i = k, and

\[ \Delta w_{ij} = \eta_l \left( \frac{x_j}{\sum_m x_m} - w_{ij} \right), \quad \text{for all } j, \]

if unit i loses the competition, i.e., i ≠ k, where η_w and η_l are the learning rate parameters for the winning and losing units, respectively (η_w > η_l). In this case the weights of the losing units are also slightly moved for each presentation of an input.
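A sketch of the leaky learning law just given: the winner moves with the larger rate η_w while every other unit leaks towards the (normalized) input with the much smaller rate η_l, so a unit with an unlucky initial weight vector still drifts towards the data and can eventually win. The rate values are assumptions.

    import numpy as np

    def leaky_learning_step(W, x, eta_w=0.1, eta_l=0.001):
        """Leaky learning: losing units are also pulled (slightly) towards the input."""
        k = int(np.argmax(W @ x))
        target = x / x.sum()                       # normalized input, as in the update rule
        rates = np.full(W.shape[0], eta_l)         # small rate eta_l for the losing units
        rates[k] = eta_w                           # larger rate eta_w for the winner (eta_w > eta_l)
        W += rates[:, None] * (target - W)
        return k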

Basic competitive learning and its variations are used for adaptive vector quantization, in which the input vectors are grouped based on the Euclidean distance between vectors.

Associative Memory

Pattern storage is an obvious pattern recognition task that one would like to realize using an artificial neural network. This is a memory function, where the network is expected to store the pattern information (not the data) for later recall. The patterns to be stored may be of the spatial type or of the spatio-temporal (pattern sequence) type.

Typically, an artificial neural network behaves like an associative memory, in which a pattern is associated with another pattern or with itself. This is in contrast with a random access memory, which maps an address to data. An artificial neural network can also function as a content addressable memory, where data is mapped onto an address. The pattern information is stored in the weight matrix of a feedback neural network. The stable states of the network represent the stored patterns, which can be recalled by providing an external stimulus in the form of a partial input.

If the weight matrix stores the given patterns, then the network becomes an autoassociative memory. If the weight matrix stores the association between a pair of patterns, the network becomes a bidirectional associative memory; this is called heteroassociation between the two patterns. If the weight matrix stores multiple associations among several (> 2) patterns, then the network becomes a multidirectional associative memory. If the weights store the associations between adjacent pairs of patterns in a sequence of patterns, then the network is called a temporal associative memory.

Some desirable characteristics of associative memories are:

1. The network should have a large capacity, i.e., the ability to store a large number of patterns or pattern associations.

2. The network should be fault tolerant, in the sense that damage to a few units or connections should not significantly affect the performance in recall.

3. The network should be able to recall the stored pattern or the desired associated pattern even if the input pattern is distorted or noisy.

4. The network's performance as an associative memory should degrade only gracefully due to damage to some units or connections, or due to noise or distortion in the input.

5. The network should be flexible enough to accommodate new patterns or associations (within the limits of its capacity) and to be able to eliminate unnecessary patterns or associations.

Bidirectional Associative Memory (BAM)

The objective is to store a set of pattern pairs in such a way that any stored pattern pair can be recalled by giving either of the patterns as input. The network is a two-layer heteroassociative neural network, as shown below.

The network encodes binary or bipolar pattern pairs (a_l, b_l) using Hebbian learning. It can learn on-line, and it operates in discrete time steps.


The BAM weight matrix from the first layer to the second layer is given by

\[ W = \sum_{l=1}^{L} \mathbf{a}_l \mathbf{b}_l^T \]

where a_l ∈ {-1, +1}^M and b_l ∈ {-1, +1}^N for bipolar patterns, and L is the number of training patterns.

For binary patterns p_l ∈ {0, 1}^M and q_l ∈ {0, 1}^N, the bipolar values a_{li} = 2p_{li} - 1 and b_{li} = 2q_{li} - 1, corresponding to the binary elements p_{li} and q_{li} respectively, are used in the computation of the weight matrix.

The weight matrix from the second layer to the first layer is given by

\[ W^T = \sum_{l=1}^{L} \mathbf{b}_l \mathbf{a}_l^T \]
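A concrete sketch of the Hebbian (outer-product) encoding above, including the binary-to-bipolar conversion: each pair is converted to bipolar values and the outer products are summed. The example pattern pairs are illustrative assumptions.

    import numpy as np

    def bam_weights(pairs_binary):
        """Encode binary pattern pairs (p_l, q_l) into the BAM weight matrix W = sum_l a_l b_l^T."""
        W = None
        for p, q in pairs_binary:
            a = 2 * np.asarray(p) - 1          # binary {0, 1} -> bipolar {-1, +1}
            b = 2 * np.asarray(q) - 1
            W = np.outer(a, b) if W is None else W + np.outer(a, b)
        return W                               # shape (M, N); its transpose maps the second layer back

    if __name__ == "__main__":
        pairs = [([1, 0, 1, 0], [1, 1, 0]),    # two assumed pattern pairs (L = 2)
                 ([0, 1, 1, 1], [0, 1, 1])]
        print(bam_weights(pairs))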

 

The activation equations for the bipolar case are as follows:

\[ b_j(m+1) = \begin{cases} +1, & \text{if } y_j(m) > 0 \\ b_j(m), & \text{if } y_j(m) = 0 \\ -1, & \text{if } y_j(m) < 0 \end{cases} \]

where

\[ y_j(m) = \sum_{i=1}^{M} w_{ij} \, a_i(m), \]

and

\[ a_i(m+1) = \begin{cases} +1, & \text{if } x_i(m) > 0 \\ a_i(m), & \text{if } x_i(m) = 0 \\ -1, & \text{if } x_i(m) < 0 \end{cases} \]

where

\[ x_i(m) = \sum_{j=1}^{N} w_{ij} \, b_j(m). \]


In the above equations, a(m) = [a_1(m), a_2(m), ..., a_M(m)]^T is the output of the first layer at the mth iteration, and b(m) = [b_1(m), b_2(m), ..., b_N(m)]^T is the output of the second layer at the mth iteration.

The updates in the BAM are synchronous, in the sense that the units in each layer are updated simultaneously.
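A sketch of bipolar BAM recall using the activation equations above: the layers are updated alternately, with all units in a layer updated at once, until a stable pattern pair is reached. The initial second-layer state, the iteration cap, and the example patterns are assumptions.

    import numpy as np

    def bipolar_threshold(u, prev):
        """+1 for positive net input, -1 for negative, keep the previous output when it is exactly 0."""
        return np.where(u > 0, 1, np.where(u < 0, -1, prev))

    def bam_recall(W, a, max_iters=20):
        """Recall from the first-layer bipolar pattern a; returns the stable (a, b) pair."""
        a = np.asarray(a)
        b = np.ones(W.shape[1], dtype=int)           # assumed initial second-layer state
        for _ in range(max_iters):                   # assumed iteration cap
            b_new = bipolar_threshold(a @ W, b)      # y_j = sum_i w_ij a_i(m), update second layer
            a_new = bipolar_threshold(W @ b_new, a)  # x_i = sum_j w_ij b_j(m), update first layer
            if np.array_equal(a_new, a) and np.array_equal(b_new, b):
                break                                # stable pattern pair reached
            a, b = a_new, b_new
        return a, b

    if __name__ == "__main__":
        pairs = [([1, 0, 1, 0], [1, 1, 0]), ([0, 1, 1, 1], [0, 1, 1])]
        W = sum(np.outer(2 * np.array(p) - 1, 2 * np.array(q) - 1) for p, q in pairs)
        print(bam_recall(W, 2 * np.array([1, 0, 1, 0]) - 1))   # recovers the first stored pair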

The BAM is limited to binary or bipolar valued pattern pairs. The upper limit on the number L of pattern pairs that can be stored is min(M, N).

The performance of the BAM depends on the nature of the pattern pairs and on their number. As the number of pattern pairs increases, the probability of error in recall also increases.