Hopfield Network – Recurrent Networks
[Figure: Hopfield network with n processing elements P.E.1 … P.E.n, fully interconnected through weights w_ij; each output y_i(k) is fed back to all other units.]
Auto-Associative Memory: given an initial n-bit pattern, the network returns the closest stored (associated) pattern. No P.E. self-feedback!
Dynamics: $s_j(k) = \sum_{i=1}^{n} w_{ij}\, y_i(k)$
$y_j(k+1) = f\bigl(s_j(k)\bigr)$

Network Initialization: $y(0) = x$
Output Vector: $y(k) = \left[\, y_i(k) \,\right]$

Binary activation function:
$$f(s_j) = \begin{cases} 1, & \text{if } s_j > L_j \\ 0, & \text{if } s_j < L_j \\ \text{hold previous value}, & \text{if } s_j = L_j \end{cases}$$
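A minimal sketch of this update rule in MATLAB (the function name hopfield_update and the threshold vector L are illustrative, not from the course material):

    function y = hopfield_update(W, y, j, L)
    % Fire processing element j: s_j = sum_i w_ij * y_i,
    % then apply the three-way binary activation.
    s = W(:, j)' * y;        % weighted sum of the current outputs
    if s > L(j)
        y(j) = 1;
    elseif s < L(j)
        y(j) = 0;
    end                      % s == L(j): hold the previous value
    end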
Hopfield Network...
- Fast training and fast data recovery
- IIR system with no input (only initial conditions)
- Guaranteed stability
- Well suited for VLSI implementation
- Operating forms (firing order): asynchronous, synchronous, sequential
Possible Hopfield network states (8) with 3 processing elements. Illustration of a typical recovery: the state evolves from the initial condition to a stable end value.
[Figure: transition diagram over the 8 binary states 000 … 111, showing a path from the initial condition to the stable end value.]
Hopfield Network...

Learning: the patterns to be stored in the associative memory are chosen a priori.
m distinct patterns, each of the form:
$$A_p = \left[\, a_1^p \;\; a_2^p \;\; \cdots \;\; a_n^p \,\right], \quad a_i^p \in \{0, 1\} \quad (L_j = 0 \text{ usually})$$
$$w_{ij} = \sum_{p=1}^{m} \left(2a_i^p - 1\right)\left(2a_j^p - 1\right)$$
Obs.: the factor $(2a_i^p - 1)$ converts 0/1 to –1/+1.
$w_{ij}$ is incremented by 1 if $a_i^p = a_j^p$; otherwise it is decremented.
The procedure is repeated for each $i, j$ and for every $A_p$.
Learning is analogous to reinforcement learning: connections between units that agree are reinforced, those between units that disagree are weakened.
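A minimal MATLAB sketch of this storage rule, using the L/T/+ patterns from the example that follows:

    % Store m binary patterns (rows of A, entries 0/1) in the weight matrix.
    A = [1 0 0 1 0 0 1 1 1;    % "L"
         1 1 1 0 1 0 0 1 0;    % "T"
         0 1 0 1 1 1 0 1 0];   % "+"
    B = 2*A - 1;               % convert 0/1 to -1/+1
    W = B' * B;                % w_ij = sum_p (2a_i^p - 1)(2a_j^p - 1)
    W(1:size(W,1)+1:end) = 0;  % zero the diagonal: no self-feedback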
Hopfield Network - Example

Symbol   Training vector
L        A1 = [1 0 0 1 0 0 1 1 1]
T        A2 = [1 1 1 0 1 0 0 1 0]
+        A3 = [0 1 0 1 1 1 0 1 0]

Patterns to be stored as 3×3 matrices (pixels $a_1 a_2 a_3$ / $a_4 a_5 a_6$ / $a_7 a_8 a_9$):
[Figure: the three symbols drawn on 3×3 pixel grids]
Weight matrix: $w_{ij} = \sum_{p=1}^{m} (2a_i^p - 1)(2a_j^p - 1)$

$$W = \begin{bmatrix}
 0 & -1 &  1 & -1 & -1 & -3 &  1 &  1 &  1 \\
-1 &  0 &  1 & -1 &  3 &  1 & -3 &  1 & -3 \\
 1 &  1 &  0 & -3 &  1 & -1 & -1 & -1 & -1 \\
-1 & -1 & -3 &  0 & -1 &  1 &  1 &  1 &  1 \\
-1 &  3 &  1 & -1 &  0 &  1 & -3 &  1 & -3 \\
-3 &  1 & -1 &  1 &  1 &  0 & -1 & -1 & -1 \\
 1 & -3 & -1 &  1 & -3 & -1 &  0 & -1 &  3 \\
 1 &  1 & -1 &  1 &  1 & -1 & -1 &  0 & -1 \\
 1 & -3 & -1 &  1 & -3 & -1 &  3 & -1 &  0
\end{bmatrix}$$
Hopfield Network - Example...

New pattern presented to the trained network:
$$x = y(0) = [\,1\; 0\; 1\; 1\; 0\; 0\; 0\; 1\; 1\,]$$
[Figure: the corrupted input pattern on the 3×3 grid]
Sequential operation of the network:

Fired P.E.   P.E. sum   P.E. output   New output vector
1              2          1           1 0 1 1 0 0 0 1 1
2             -3          0           1 0 1 1 0 0 0 1 1
3             -4          0           1 0 0 1 0 0 0 1 1
4              1          1           1 0 0 1 0 0 0 1 1
5             -4          0           1 0 0 1 0 0 0 1 1
6             -4          0           1 0 0 1 0 0 0 1 1
7              4          1           1 0 0 1 0 0 1 1 1
8              0          1           1 0 0 1 0 0 1 1 1
9              4          1           1 0 0 1 0 0 1 1 1
1              2          1           1 0 0 1 0 0 1 1 1
2             -7          0           1 0 0 1 0 0 1 1 1
Remember the binary activation function, with $L_j = 0$:
$$f(s_j) = \begin{cases} 1, & \text{if } s_j > 0 \\ 0, & \text{if } s_j < 0 \\ \text{hold previous value}, & \text{if } s_j = 0 \end{cases}$$

Convergence to the "L" pattern.
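A minimal sketch reproducing this sequential recall in MATLAB, continuing from the weight matrix W built in the sketch above (loop bounds and variable names are illustrative):

    % Recover the stored "L" pattern from a corrupted input by
    % sequential (asynchronous) updates with thresholds L_j = 0.
    y = [1 0 1 1 0 0 0 1 1]';   % corrupted "L" (bits 3 and 7 flipped)
    for sweep = 1:3             % a few passes over all units
        for j = 1:numel(y)      % fire P.E. j
            s = W(:, j)' * y;   % s_j = sum_i w_ij * y_i
            if s > 0
                y(j) = 1;
            elseif s < 0
                y(j) = 0;
            end                 % s == 0: hold the previous value
        end
    end
    disp(y')                    % converges to [1 0 0 1 0 0 1 1 1] = "L"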
Hopfield Network – Java demos: demonstrations are available on the web.
Hopfield Network – Final Considerations

Stability proof: Cohen and Grossberg, 1983. With W symmetric and with zero diagonal, the "energy function" always decreases.
E – "energy" of the network:
$$E = -\frac{1}{2}\sum_i \sum_j w_{ij}\, y_i\, y_j \;-\; \sum_j L_j\, y_j$$
[Figure: energy landscape; from the initial state, E decreases until a stored or a spurious pattern is reached.]
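A one-line MATLAB check of this quantity (a sketch for the $L_j = 0$ case of the example; evaluating it after each firing shows the monotone decrease):

    E = -0.5 * (y' * W * y);   % never increases under the update rule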
[Figure: typical energy over the states for Hopfield networks; the stored patterns appear as minima, alongside a spurious pattern, and a path runs from an initial state to a recovered pattern.]
Hopfield Network limitations:
• The returned pattern is not necessarily the closest one.
• Differences between patterns: not all patterns have equal emphasis (the attraction basins differ in size).
• Spurious patterns, i.e., evoked patterns that are not part of the stored set.
• The maximum number of stored patterns is limited:
$$m \le 0.5\, \frac{n}{\log n}, \quad m \text{ patterns}, \; n \text{ bits (network size)}$$
Radial Basis Functions
- Moody & Darken, 1989, ...
- Function approximators
- Inspiration: overlapping sensory receptive fields in the cortex
- Localized activity of the processing elements
$$a_i = e^{-\frac{\|x - \mu_i\|^2}{\sigma_i^2}} \qquad \text{Gaussian (mean } \mu_i,\ \text{variance } \sigma_i^2)$$

In MATLAB notation: $a_i = e^{-\left(\|p - w_i\|\, b_i\right)^2}$, where $w$ is the weight (center) and $b$ is the bias.
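A minimal sketch of this activation in plain MATLAB (equivalent in spirit to the toolbox call a = radbas(dist(w,p)*b) shown below; all names are illustrative):

    % One Gaussian RBF unit: center mu, width sigma (bias b = 1/sigma).
    rbf = @(x, mu, sigma) exp(-(norm(x - mu) / sigma)^2);

    % Localized activities of three units for a scalar input:
    centers = [-1 0 1];
    sigma   = 0.8;
    x       = 0.3;
    a = arrayfun(@(mu) rbf(x, mu, sigma), centers)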
[Figure: RBF network structure; the input p is compared with each center, giving the distances ||p − w_1||, ||p − w_2||, …, ||p − w_n||.]
[Plot: weighted sum of radial basis transfer functions, output a over input p.]
Radial Basis Functions...
[Plot: weighted sum of radial basis transfer functions, output a over input p.]
MatLab implementation:

    [net,tr] = newrb(P,T,GOAL,SPREAD,MN)

P – R×Q matrix, Q input vectors ("pattern")
T – S×Q matrix, Q objective vectors ("target")
GOAL – desired mean squared error, default = 0.0
SPREAD – radial basis function spread, default = 1.0
MN – maximum number of neurons, default = Q
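A usage sketch on the benchmark curve used in the comparison slides below (assumes the Neural Network Toolbox; the sample spacing and SPREAD value are illustrative choices that respect the heuristic given later):

    % Fit 21 samples of the benchmark function with at most 20 neurons.
    ti = 0:0.1:100;
    yi = exp(-0.03*ti) .* sin(0.003*ti.*ti);
    P = ti(1:50:end);                % 21 training inputs ("pattern")
    T = yi(1:50:end);                % 21 targets
    net = newrb(P, T, 0.0, 10, 20);  % GOAL, SPREAD, max. neurons
    a = sim(net, ti);                % evaluate over the whole range
    mse_full = mean((a - yi).^2)     % error over the full curve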
[Plot: a single radial basis transfer function over w − p; the output a falls to about 0.5 at w − p ≈ ±0.83.]
$$a = \mathrm{radbas}\bigl(\mathrm{dist}(w,p) \cdot b\bigr)$$
Radial Basis Functions...
spread → too low:
- Good fitting (at the training points)!
- Bad interpolation!
Learning: add neurons incrementally. Where? Wherever the largest quadratic error reduction is obtained at each step: $\min \sum e^2 \to 0$.
spread → too high:
- Good fitting at low frequencies
- Good interpolation in some ranges!
Radial Basis Functions...
spread → OK:
- Good fitting!
- Good interpolation!
- Bad extrapolation
Heuristics: $x_{i+1} - x_i < \mathrm{SPREAD} < x_{\max} - x_{\min}$
[Plot: 26 function samples, the target function, and the RBF approximation with 20 neurons.]
- Bad extrapolation (extrapolation is a very difficult task)
Conclusions
- Faster training, but uses more neurons than an MLP.
- Incremental training: new points can be learned without losing prior knowledge.
- A priori knowledge can be used to place the neurons (which is not possible in an MLP).
- Fixed spread with incremental training → suboptimal solution!!
Comparison: RBF x MLP

MLP comparison – MSE over 21 samples of ti=0:0.1:100, yi=exp(-.03*ti).*sin(0.003*ti.*ti):
 5 MLPs   0.0672
10 MLPs   0.0268
15 MLPs   0.0244
20 MLPs   0.0416
RBF comparison – MSE over 21 samples of the same function:
 5 RBFs   0.0602
10 RBFs   0.0325
15 RBFs   0.0208
20 RBFs   0.00901
RBF – more neurons, better fitting → best solution: newrbe (exact fitting!)
MLP – too many neurons → worse fitting (bad interpolation)
Unsupervised Learning
Competitive Layer
Find code vectors that describe the data distribution.
Used in data compression. There are no known desired code vectors.
Example: coding symbols that will be transmitted over a communication channel.
In the compression, the fine variability of the signal is considered "noise" and is therefore discarded.
The code vectors should reflect the inner statistical distribution of the process.
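A minimal sketch of competitive (winner-take-all) learning in plain MATLAB (a Kohonen-style update rather than the toolbox's competitive layer; data, counts and the learning rate are illustrative):

    % Move the winning code vector toward each presented input.
    X = rand(2, 500) * 2 - 1;    % training vectors (columns) in [-1, 1]^2
    C = rand(2, 4) * 2 - 1;      % 4 initial code vectors
    eta = 0.1;                   % learning rate
    for epoch = 1:20
        for k = 1:size(X, 2)
            x = X(:, k);
            d = sum((C - x).^2, 1);        % squared Euclidean distances
            [~, w] = min(d);               % winner (bias = 0)
            C(:, w) = C(:, w) + eta * (x - C(:, w));
        end
    end

Note: C - x relies on implicit expansion (MATLAB R2016b or later; use bsxfun on older versions).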
Competitive Layer...
[Plot: code vectors (o), training vectors (+), test vectors (x), and the resulting borders.]
[Plot: example classification (Voronoi diagram) with code vectors (o), training vectors (+), test vectors (x), and borders.]
Bias: a bias adjustment helps "weak" neurons (those that rarely win). With bias = 0, only the Euclidean distance decides the winner.
Learning Vector Quantization
There are no known desired code vectors. The data points belong to classes; the code vectors should reflect the inner statistical distribution of the process.
[Figure: input vectors of classes A and B with their code vectors.]
"Enhanced algorithms": LVQ1, LVQ2.1, LVQ3, OLVQ
- Dead neurons
- Neighborhood definition
[Figure: the current input X among input vectors and code vectors; X is assigned to the nearest code vector W_c, with Class(X) = Class(W_c), e.g. class C.]
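A minimal sketch of the basic LVQ1 update in plain MATLAB (attract the winning code vector when its class matches the input's class, repel it otherwise; data and names are illustrative):

    % Toy data: two labeled Gaussian clouds.
    X = [randn(2,100) - 1, randn(2,100) + 1];   % inputs (columns)
    labels = [ones(1,100), 2*ones(1,100)];      % class of each input
    C = [-1 -0.5 0.5 1; -1 -0.5 0.5 1];         % 4 initial code vectors
    cls = [1 1 2 2];                            % class of each code vector
    eta = 0.05;
    for k = 1:size(X, 2)
        x = X(:, k);
        [~, c] = min(sum((C - x).^2, 1));       % nearest code vector W_c
        if cls(c) == labels(k)                  % Class(X) == Class(W_c)?
            C(:, c) = C(:, c) + eta * (x - C(:, c));   % attract
        else
            C(:, c) = C(:, c) - eta * (x - C(:, c));   % repel
        end
    end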
Self-Organizing Maps
Kohonen, 1982 – unsupervised learning.
One active layer with neighborhood constraints.
The code vectors should reflect the inner statistical distribution of the process.
[Figure: training data drawn from a triangular distribution.]
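A minimal sketch of a 1-D Kohonen SOM update in plain MATLAB (this spells out the neighborhood rule directly rather than using the toolbox; sizes, schedules and names are illustrative):

    % Chain of code vectors; neighbors of the winner are dragged along.
    X = rand(2, 1000);                 % training vectors (columns)
    n = 10;                            % number of map units
    C = rand(2, n);                    % initial code vectors
    T = 5000;                          % training steps
    for t = 1:T
        x = X(:, randi(size(X, 2)));
        [~, w] = min(sum((C - x).^2, 1));       % best-matching unit
        r   = max(1, round(n/2 * (1 - t/T)));   % shrinking radius
        eta = 0.5 * (1 - t/T) + 0.01;           % decaying learning rate
        for j = max(1, w-r) : min(n, w+r)       % update the neighborhood
            C(:, j) = C(:, j) + eta * (x - C(:, j));
        end
    end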
[Plots: SOM map weight vectors W(i,1) vs. W(i,2) trained on the triangular distribution; one example shows a weird, unsuccessful training run.]
ANN General Characteristics
Positive:
• Learning
• Parallelism
• Distributed knowledge
• Fault tolerant
• Associative memory
• Robust against noise
• No exhaustive modelling

Negative:
• Knowledge acquisition only by learning ("E.g., which topology suits best?")
• Introspection is not possible ("What is the contribution of this neuron?")
• Logical inference is hard to obtain ("Why this output for this situation?")
• Learning is slow
• Very sensitive to initial conditions

To obtain a successful ANN, good process knowledge is recommended in order to design experiments that produce useful data sets!
There is no free lunch!