
1

Chapter 6: Artificial Neural Networks, Part 2 of 3 (Sections 6.4 – 6.6)

Asst. Prof. Dr. Sukanya Pongsuparb, Dr. Srisupa Palakvangsa Na Ayudhya, Dr. Benjarath Pupacdi

SCCS451 Artificial Intelligence, Week 12

2

Agenda

Multi-layer Neural Networks
Hopfield Network

3

Multilayer Neural Networks

A multilayer perceptron is a feedforward neural network with one or more hidden layers.

Single-layer vs. Multi-layer Neural Networks

[Figure: a single-layer perceptron (inputs x1 and x2 with weights w1 and w2, a linear combiner, a threshold, and a hard limiter producing output Y) contrasted with a multilayer network (input layer, first hidden layer, second hidden layer, output layer; input signals enter on the left and output signals leave on the right).]

4

Roles of Layers

Input Layer
- Accepts input signals from the outside world
- Distributes the signals to the neurons in the hidden layer
- Usually does not do any computation

Output Layer (computational neurons)
- Accepts output signals from the previous hidden layer
- Outputs to the outside world
- Knows the desired outputs

Hidden Layer (computational neurons)
- Determines its own desired outputs

5

Hidden (Middle) Layers

Neurons in the hidden layers are unobservable through the input and output of the network. Their desired outputs are unknown (hidden) from the outside and are determined by the layer itself.
- One hidden layer is enough for continuous functions
- Two hidden layers are enough for discontinuous functions
- Practical applications mostly use three layers
- More layers are possible, but each additional layer increases the computing load exponentially

6

How do multilayer neural networks learn?

More than a hundred different learning algorithms are available for multilayer ANNs. The most popular method is back-propagation.

7

Back-propagation Algorithm

In a back-propagation neural network, the learning algorithm has two phases:
1. Forward propagation of inputs
2. Backward propagation of errors

The algorithm loops over the two phases until the errors obtained are lower than a certain threshold. Learning is done in a similar manner as in a perceptron:

- A set of training inputs is presented to the network.
- The network computes the outputs.
- The weights are adjusted to reduce errors.

The activation function used is a sigmoid function.

Y_sigmoid = 1 / (1 + e^(-X))

8

Common Activation Functions

Step function: Y = 1 if X ≥ 0, Y = 0 if X < 0
Sign function: Y = +1 if X ≥ 0, Y = -1 if X < 0
Sigmoid function: Y = 1 / (1 + e^(-X))
Linear function: Y = X

[Figure: plots of the step, sign, sigmoid, and linear activation functions.]

The hard-limit (step and sign) functions are often used for decision-making neurons in classification and pattern recognition. The sigmoid function is popular in back-propagation networks; its output is a real number in the [0, 1] range. The linear function is often used for linear approximation.
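As an aside, here is a minimal Python sketch of these four activation functions (the function names are mine; only the formulas come from the slide):

```python
import math

def step(x):
    # Hard limiter: 1 if the net input is non-negative, otherwise 0
    return 1 if x >= 0 else 0

def sign(x):
    # Bipolar hard limiter: +1 if the net input is non-negative, otherwise -1
    return 1 if x >= 0 else -1

def sigmoid(x):
    # Smooth squashing function; the output is a real number in [0, 1]
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    # Identity: the output equals the net weighted input
    return x

for x in (-2.0, 0.0, 3.9):
    print(x, step(x), sign(x), round(sigmoid(x), 4), linear(x))
```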

9

3-layer Back-propagation Neural Network

[Figure: a three-layer back-propagation network. Input layer neurons 1..n receive inputs x1..xn; hidden layer neurons 1..m are connected to the inputs by weights wij; output layer neurons 1..l are connected to the hidden layer by weights wjk and produce outputs y1..yl. Input signals propagate forward (left to right); error signals propagate backward (right to left).]

10

How a neuron determines its output

Very similar to the perceptron:

1. Compute the net weighted input: X = Σ xi wi - θ (sum over the neuron's n inputs, with threshold θ)
2. Pass the result to the activation function: Y = 1 / (1 + e^(-X))

[Figure: a hidden-layer neuron j inside the three-layer network, with inputs 2, 5, 1, 8 arriving through weights 0.1, 0.2, 0.5, 0.3.]

Example: let the inputs be (2, 5, 1, 8), the weights (0.1, 0.2, 0.5, 0.3), and the threshold θ = 0.2.
X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) - 0.2 = 3.9
Y = 1 / (1 + e^(-3.9)) = 0.98
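A minimal Python sketch of this two-step computation, using the numbers from the slide (the helper name is mine):

```python
import math

def neuron_output(inputs, weights, theta):
    # Step 1: net weighted input X = sum(x_i * w_i) - theta
    x = sum(xi * wi for xi, wi in zip(inputs, weights)) - theta
    # Step 2: pass the result to the sigmoid activation function
    return 1.0 / (1.0 + math.exp(-x))

y = neuron_output(inputs=[2, 5, 1, 8], weights=[0.1, 0.2, 0.5, 0.3], theta=0.2)
print(round(y, 2))  # 0.98
```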

11

How the errors propagate backward

The errors are computed in a similar manner to the errors in the perceptron: Error = the output we want - the output we get.

ek(p) = yd,k(p) - yk(p)   (the error at output neuron k at iteration p)

[Figure: the error signal ek(p) is computed at output neuron k and propagated backward through the network.]

Example: suppose the desired output at iteration p is 1 and the actual output is 0.98. Then ek(p) = 1 - 0.98 = 0.02.

12

Back-Propagation Training Algorithm
Step 1: Initialization

Randomly define the weights and thresholds θ such that the numbers are within a small range:

( -2.4/Fi , +2.4/Fi )

where Fi is the total number of inputs of neuron i. The weight initialization is done on a neuron-by-neuron basis.

[Figure: the random weight range ±2.4/Fi shrinks as the number of inputs Fi grows.]
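A minimal NumPy sketch of this initialization rule (the layer sizes and array layout are my own choices; the ±2.4/Fi range comes from the slide):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def init_layer(n_inputs, n_neurons):
    # Every neuron in the layer has Fi = n_inputs incoming connections, so its
    # weights and threshold are drawn uniformly from (-2.4/Fi, +2.4/Fi).
    limit = 2.4 / n_inputs
    weights = rng.uniform(-limit, limit, size=(n_inputs, n_neurons))
    thresholds = rng.uniform(-limit, limit, size=n_neurons)
    return weights, thresholds

w_ih, theta_h = init_layer(n_inputs=2, n_neurons=2)  # input -> hidden
w_ho, theta_o = init_layer(n_inputs=2, n_neurons=1)  # hidden -> output
print(w_ih, theta_h, w_ho, theta_o)
```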

13

Back-Propagation Training Algorithm
Step 2: Activation

Propagate the input signals forward from the input layer to the output layer.

X = Σ xi wi - θ   (sum over the neuron's n inputs)
Y_sigmoid = 1 / (1 + e^(-X)),  Y ∈ [0, 1]

[Figure: the three-layer network again; each neuron computes its output as the input signals move forward from the input layer toward the output layer.]

Example (as on the previous slides), with θ = 0.2:
X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) - 0.2 = 3.9
Y = 1 / (1 + e^(-3.9)) = 0.98
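A minimal NumPy sketch of the forward pass through one hidden layer and one output layer (the array shapes are my own; the weights below happen to match the XOR walk-through that starts a few slides later, so the printed values are 0.5250, 0.8808 and 0.5097):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_ih, theta_h, w_ho, theta_o):
    # Hidden layer: X_j = sum_i x_i * w_ij - theta_j, then the sigmoid
    y_hidden = sigmoid(x @ w_ih - theta_h)
    # Output layer: the same rule, with the hidden outputs as its inputs
    y_out = sigmoid(y_hidden @ w_ho - theta_o)
    return y_hidden, y_out

x = np.array([1.0, 1.0])
w_ih = np.array([[0.5, 0.9],
                 [0.4, 1.0]])
theta_h = np.array([0.8, -0.1])
w_ho = np.array([[-1.2],
                 [1.1]])
theta_o = np.array([0.3])
y_hidden, y_out = forward(x, w_ih, theta_h, w_ho, theta_o)
print(np.round(y_hidden, 4), np.round(y_out, 4))
```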

14

Back-Propagation Training Algorithm
Step 3: Weight Training

There are two types of weight training:
1. For the output layer neurons
2. For the hidden layer neurons

*** It is important to understand that first the input signals propagate forward, and then the errors propagate backward to help train the weights. ***

In each iteration (p + 1), the weights are updated based on the weights from the previous iteration p. The signals keep flowing forward and backward until the errors are below some preset threshold value.

15

3.1 Weight Training (Output layer neurons)

These formulas are used to perform weight corrections.

wj,k(p+1) = wj,k(p) + Δwj,k(p)

Δwj,k(p) = α × yj(p) × δk(p)

δk(p) = yk(p) × [1 - yk(p)] × ek(p)

[Figure: output neuron k at iteration p. It receives y1(p)..ym(p) from the hidden neurons through weights w1,k..wm,k and produces yk(p).]

ek(p) = yd,k(p) - yk(p)
δ = error gradient

16

wj,k(p+1) = wj,k(p) + Δwj,k(p)        (we want to compute wj,k(p+1); we know wj,k(p))
Δwj,k(p) = α × yj(p) × δk(p)          (α is predefined; we know how to compute yj(p) and δk(p))
δk(p) = yk(p) × [1 - yk(p)] × ek(p)   (we know how to compute yk(p) and ek(p))


We do the above for each of the weights of the output-layer neurons.
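A minimal sketch of one output-layer weight update, checked against the XOR numbers that appear later in these slides (variable names are mine):

```python
def update_output_weight(w_jk, y_j, y_k, y_desired, alpha):
    e_k = y_desired - y_k                 # error at output neuron k
    delta_k = y_k * (1.0 - y_k) * e_k     # error gradient delta_k
    w_new = w_jk + alpha * y_j * delta_k  # w_jk(p+1) = w_jk(p) + alpha*y_j*delta_k
    return w_new, delta_k

w_new, delta = update_output_weight(w_jk=-1.2, y_j=0.5250, y_k=0.5097,
                                    y_desired=0.0, alpha=0.1)
print(round(delta, 4), round(w_new, 4))  # -0.1274  -1.2067
```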

17

3.2 Weight Training (Hidden layer neurons)

These formulas are used to perform weight corrections.

wi,j(p+1) = wi,j(p) + Δwi,j(p)

Δwi,j(p) = α × xi(p) × δj(p)

δj(p) = yj(p) × [1 - yj(p)] × Σk=1..l [ δk(p) × wj,k(p) ]

[Figure: hidden neuron j at iteration p. It receives x1(p)..xn(p) through weights w1,j..wn,j and sends its output to the output neurons k = 1..l through weights wj,k.]

18

δj(p) = yj(p) × [1 - yj(p)] × Σk=1..l [ δk(p) × wj,k(p) ]   (we want to compute δj(p); we know yj(p); the δk(p) propagate back from the output layer, and we know the wj,k(p))

Δwi,j(p) = α × xi(p) × δj(p)   (α is predefined; xi(p) is the input)

wi,j(p+1) = wi,j(p) + Δwi,j(p)   (we know how to compute these)

We do the above for each of the weights of the hidden-layer neurons.

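A minimal sketch of one hidden-layer weight update, again checked against the XOR numbers used later in these slides (variable names are mine):

```python
def update_hidden_weight(w_ij, x_i, y_j, deltas_k, w_jk_list, alpha):
    # delta_j = y_j * (1 - y_j) * sum_k delta_k * w_jk  (propagated from the output layer)
    delta_j = y_j * (1.0 - y_j) * sum(d * w for d, w in zip(deltas_k, w_jk_list))
    # w_ij(p+1) = w_ij(p) + alpha * x_i * delta_j
    return w_ij + alpha * x_i * delta_j, delta_j

w_new, delta_3 = update_hidden_weight(w_ij=0.5, x_i=1.0, y_j=0.5250,
                                      deltas_k=[-0.1274], w_jk_list=[-1.2],
                                      alpha=0.1)
print(round(delta_3, 4), round(w_new, 4))  # 0.0381  0.5038
```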

19

[Figure: iteration p = 1. The input signals propagate forward through the network and the error signals propagate backward.]

20

[Figure: iteration p = 1. The hidden-layer and output-layer weights have been trained.]

21

After the weights are trained in p = 1, we go back to Step 2 (Activation) and compute the outputs for the new weights.

If the errors obtained with the updated weights are still above the error threshold, we start weight training for p = 2.

Otherwise, we stop.
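Putting the three steps together, here is a minimal NumPy sketch of the whole loop for one hidden layer (initialize, activate, train the weights, repeat until the sum of squared errors drops below a threshold). It is only an illustration under these assumptions, not the exact program behind the numbers on the slides; plain back-propagation on XOR can need thousands of epochs or a different random seed to converge:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, Yd, n_hidden=2, alpha=0.1, err_threshold=0.001, max_epochs=50000, seed=0):
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Yd.shape[1]
    lim_h, lim_o = 2.4 / n_in, 2.4 / n_hidden
    w_ih = rng.uniform(-lim_h, lim_h, (n_in, n_hidden))   # Step 1: initialization
    th_h = rng.uniform(-lim_h, lim_h, n_hidden)
    w_ho = rng.uniform(-lim_o, lim_o, (n_hidden, n_out))
    th_o = rng.uniform(-lim_o, lim_o, n_out)
    sse = np.inf
    for epoch in range(1, max_epochs + 1):
        sse = 0.0
        for x, yd in zip(X, Yd):
            y_h = sigmoid(x @ w_ih - th_h)                # Step 2: activation
            y_o = sigmoid(y_h @ w_ho - th_o)
            e = yd - y_o
            sse += float(e @ e)
            delta_o = y_o * (1 - y_o) * e                 # Step 3: weight training
            delta_h = y_h * (1 - y_h) * (w_ho @ delta_o)
            w_ho += alpha * np.outer(y_h, delta_o)
            th_o += alpha * (-1) * delta_o                # threshold input is fixed at -1
            w_ih += alpha * np.outer(x, delta_h)
            th_h += alpha * (-1) * delta_h
        if sse < err_threshold:
            break
    return w_ih, th_h, w_ho, th_o, epoch, sse

X = np.array([[1, 1], [0, 1], [1, 0], [0, 0]], dtype=float)   # XOR training set
Yd = np.array([[0], [1], [1], [0]], dtype=float)
*_, epochs, sse = train(X, Yd)
print(epochs, round(sse, 6))
```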

22

[Figure: iteration p = 2. The signals propagate forward and backward again with the updated weights.]

23

[Figure: iteration p = 2. The weights are trained again.]

24

Example: 3-layer ANN for XOR

[Figure: the four XOR inputs (0, 0), (0, 1), (1, 0), (1, 1) plotted in the x1-x2 plane; no single straight line separates the two output classes.]

XOR is not a linearly separable function.

A single-layer ANN or the perceptron cannot deal with problems that are not linearly separable. We cope with such problems using multi-layer neural networks.

25

Example: 3-layer ANN for XOR

[Figure: the XOR network. Input neurons 1 and 2 (non-computing) pass x1 and x2 to hidden neurons 3 and 4 through weights w13, w23, w14, w24; hidden neurons 3 and 4 feed output neuron 5 through weights w35 and w45, and neuron 5 produces y5. Each computing neuron also has a threshold θ, fed by a fixed input of -1.]

X = Σ xi wi - θ
Let α = 0.1

26

Example: 3-layer ANN for XOR

Training set: x1 = x2 = 1 and yd,5 = 0. Let α = 0.1.

Initial weights and thresholds:
w13 = 0.5, w23 = 0.4, w14 = 0.9, w24 = 1.0, w35 = -1.2, w45 = 1.1
θ3 = 0.8, θ4 = -0.1, θ5 = 0.3

Forward pass:
y3 = sigmoid(1(0.5) + 1(0.4) - 0.8) = sigmoid(0.1) = 0.5250
y4 = sigmoid(1(0.9) + 1(1.0) + 0.1) = sigmoid(2.0) = 0.8808
y5 = sigmoid(0.5250(-1.2) + 0.8808(1.1) - 0.3) = sigmoid(0.0389) = 0.5097
e = 0 - 0.5097 = -0.5097
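A minimal Python check of this forward pass (all the numbers come from the slide):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x1, x2 = 1.0, 1.0
w13, w23, w14, w24 = 0.5, 0.4, 0.9, 1.0   # input -> hidden weights
w35, w45 = -1.2, 1.1                      # hidden -> output weights
t3, t4, t5 = 0.8, -0.1, 0.3               # thresholds

y3 = sigmoid(x1 * w13 + x2 * w23 - t3)    # 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)    # 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)    # 0.5097
e = 0.0 - y5                              # -0.5097
print(round(y3, 4), round(y4, 4), round(y5, 4), round(e, 4))
```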

27

Example: 3-layer ANN for XOR (2)

Back-propagation of error (p = 1, output layer):

δ5 = y5 × (1 - y5) × e = 0.5097 × (1 - 0.5097) × (-0.5097) = -0.1274

Δwj,k(p) = α × yj(p) × δk(p)
Δw3,5(1) = 0.1 × 0.5250 × (-0.1274) = -0.0067

wj,k(p+1) = wj,k(p) + Δwj,k(p)
w3,5(2) = -1.2 - 0.0067 = -1.2067

28

Example: 3-layer ANN for XOR (3)

Back-propagation of error (p = 1, output layer):

Δw4,5(1) = 0.1 × 0.8808 × (-0.1274) = -0.0112
w4,5(2) = 1.1 - 0.0112 = 1.0888

29

Example: 3-layer ANN for XOR (4)

Back-propagation of error (p = 1, output layer):

Δθk(p) = α × y(p) × δk(p), where y(p) = -1 is the fixed threshold input
Δθ5(1) = 0.1 × (-1) × (-0.1274) = 0.0127
θ5(2) = 0.3 + 0.0127 = 0.3127

30

Example: 3-layer ANN for XOR (5)

Back-propagation of error (p = 1, hidden layer):

δj(p) = yj(p) × (1 - yj(p)) × Σk [ δk(p) × wj,k(p) ]
δ3(p) = 0.5250 × (1 - 0.5250) × (-0.1274 × (-1.2)) = 0.0381

Δwi,j(p) = α × xi(p) × δj(p)
Δw1,3(1) = 0.1 × 1 × 0.0381 = 0.0038

wi,j(p+1) = wi,j(p) + Δwi,j(p)
w1,3(2) = 0.5 + 0.0038 = 0.5038

31

Example: 3-layer ANN for XOR (6)

Back-propagation of error (p = 1, hidden layer):

δ4(p) = 0.8808 × (1 - 0.8808) × (-0.1274 × 1.1) = -0.0147

Δw1,4(1) = 0.1 × 1 × (-0.0147) = -0.0015
w1,4(2) = 0.9 - 0.0015 = 0.8985

32

Example: 3-layer ANN for XOR (7)

Back-propagation of error (p = 1, hidden layer):

Δw2,3(1) = 0.1 × 1 × 0.0381 = 0.0038
w2,3(2) = 0.4 + 0.0038 = 0.4038

33

Example: 3-layer ANN for XOR (8)

Back-propagation of error (p = 1, hidden layer):

Δw2,4(1) = 0.1 × 1 × (-0.0147) = -0.0015
w2,4(2) = 1.0 - 0.0015 = 0.9985

34

Example: 3-layer ANN for XOR (9)

Back-propagation of error (p = 1, hidden layer):

Δθ3(1) = 0.1 × (-1) × 0.0381 = -0.0038
θ3(2) = 0.8 - 0.0038 = 0.7962

35

Example: 3-layer ANN for XOR (10)

Back-propagation of error (p = 1, hidden layer):

Δθ4(1) = 0.1 × (-1) × (-0.0147) = 0.0015
θ4(2) = -0.1 + 0.0015 = -0.0985

36

Example: 3-layer ANN for XOR (11)

Now the first iteration (p = 1) is finished. Updated weights and thresholds:
w1,3 = 0.5038, w2,3 = 0.4038, w1,4 = 0.8985, w2,4 = 0.9985, w3,5 = -1.2067, w4,5 = 1.0888
θ3 = 0.7962, θ4 = -0.0985, θ5 = 0.3127

The weight training process is repeated until the sum of squared errors is less than 0.001 (the error threshold).

37

Learning Curve for XOR

[Figure: sum-squared network error (log scale, 10^-4 to 10^1) versus epoch, for 224 epochs of training.]

The curve shows the ANN learning speed. 224 epochs, or 896 iterations, were required.

38

Final Results

[Figure: the XOR network with its final weights and thresholds after training (the trained values shown in the figure include numbers such as 4.6, 4.7, 4.8, 6.4, 6.4, 7.3, 9.8, 2.8, and -10.4).]

Training again with different initial values may give different final weights. The result is acceptable so long as the sum of squared errors is below the preset error threshold.

39

Final Results

Inputs        Desired output   Actual output   Error     Sum of squared errors
x1   x2       yd               y5              e
1    1        0                0.0155          -0.0155   0.0010
0    1        1                0.9849           0.0151
1    0        1                0.9849           0.0151
0    0        0                0.0175          -0.0175

A different result is possible with different initial weights, but the result always satisfies the error criterion.

40

McCulloch-Pitts Model: XOR Operation

[Figure: a hand-designed network for XOR. Hidden neurons 3 and 4 receive x1 and x2 through weights of +1.0, with thresholds +1.5 and +0.5; output neuron 5 receives y3 through a weight of -2.0 and y4 through a weight of +1.0, with threshold +0.5.]

Activation function: sign function

41

Decision Boundary

(a) Decision boundary constructed by hidden neuron 3; (b) decision boundary constructed by hidden neuron 4; (c) decision boundaries constructed by the complete three-layer network.

[Figure: (a) the line x1 + x2 - 1.5 = 0; (b) the line x1 + x2 - 0.5 = 0; (c) both lines together, enclosing the region of the x1-x2 plane where the network output is 1.]

42

Problems of Back-Propagation

- Not similar to the process of a biological neuron
- Heavy computing load

43

Accelerated Learning in Multi-layer NN (1)

Represent sigmoid function by hyperbolic tangent:

Y_tanh = 2a / (1 + e^(-bX)) - a

where a and b are constants. Suitable values: a = 1.716 and b = 0.667.
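A minimal sketch of this activation with the suggested constants (a = 1.716 and b = 0.667 are the values from the slide):

```python
import math

def tanh_activation(x, a=1.716, b=0.667):
    # Y = 2a / (1 + e^(-b*x)) - a: a sigmoid rescaled to the range (-a, +a)
    return 2.0 * a / (1.0 + math.exp(-b * x)) - a

for x in (-5.0, 0.0, 5.0):
    print(x, round(tanh_activation(x), 4))
```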

44

Accelerated Learning in Multi-layer NN (2)

Include a momentum term in the delta rule:

Δwj,k(p) = β × Δwj,k(p - 1) + α × yj(p) × δk(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.

This equation is called the generalized delta rule.
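A minimal sketch of one weight update under the generalized delta rule; the previous weight change has to be remembered between iterations (names are mine):

```python
def momentum_update(w_jk, prev_delta_w, y_j, delta_k, alpha=0.1, beta=0.95):
    # Generalized delta rule: add beta times the previous weight change
    delta_w = beta * prev_delta_w + alpha * y_j * delta_k
    return w_jk + delta_w, delta_w  # keep delta_w for the next iteration

w, dw = -1.2, 0.0                   # first iteration: no previous change yet
w, dw = momentum_update(w, dw, y_j=0.5250, delta_k=-0.1274)
print(round(w, 4), round(dw, 4))    # -1.2067  -0.0067
```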

45

Learning with Momentum

[Figure: sum-squared error (log scale) versus epoch, training for 126 epochs with momentum, together with the learning-rate curve.]

Reduced from 224 to 126 epochs.

46

Accelerated Learning in Multi-layer NN (3)

Adaptive learning rate. The idea: a small learning rate gives a smooth learning curve; a large learning rate gives fast learning, but possibly unstable behaviour.

Heuristic rule: increase the learning rate when the change of the sum of squared errors has the same algebraic sign for several consecutive epochs; decrease the learning rate when the sign alternates for several consecutive epochs.
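A minimal sketch of one way this heuristic could be coded; the window length and the scaling factors (1.05 up, 0.7 down) are my own choices, not values from the slide:

```python
def adapt_learning_rate(alpha, sse_history, window=3, up=1.05, down=0.7):
    # Look at the sign of the change of the sum of squared errors
    # over the last `window` epochs.
    if len(sse_history) < window + 1:
        return alpha
    changes = [sse_history[i + 1] - sse_history[i] for i in range(-window - 1, -1)]
    increasing = [c > 0 for c in changes]
    if all(increasing) or not any(increasing):
        return alpha * up    # same sign for several consecutive epochs: speed up
    if all(increasing[i] != increasing[i + 1] for i in range(len(increasing) - 1)):
        return alpha * down  # sign alternates every epoch: slow down
    return alpha

print(adapt_learning_rate(0.1, [1.0, 0.8, 0.6, 0.5]))  # error keeps falling -> 0.105
```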

47

Effect of Adaptive Learning Rate

[Figure: sum-squared error (log scale) and learning rate versus epoch, training for 103 epochs with an adaptive learning rate.]

48

Momentum + Adaptive Learning Rate

[Figure: sum-squared error (log scale) and learning rate versus epoch, training for 85 epochs with momentum and an adaptive learning rate combined.]

49

The Hopfield Network

Neural networks were designed on an analogy with the brain, which has associative memory: we can recognize a familiar face in an unfamiliar environment, and our brain can recognize certain patterns even though some information about them differs from what we have remembered. Multilayer ANNs are not intrinsically intelligent. Recurrent neural networks (RNNs) are used to emulate the human associative memory. The Hopfield network is an RNN.

50

The Hopfield Network: Goal

To recognize a pattern even if some parts are not the same as what the network was trained to remember. The Hopfield network is a single-layer network, and it is recurrent: the network outputs are calculated and then fed back to adjust the inputs. The process continues until the outputs become constant. Let's see how it works.

51

Single-layer n-neuron Hopfield Network

[Figure: a single-layer n-neuron Hopfield network. Each neuron i receives input signal xi and produces output signal yi, and every neuron's output is fed back as input to all the other neurons.]

52

Activation Function

- If the neuron's weighted input is greater than zero, the output is +1.
- If the neuron's weighted input is less than zero, the output is -1.
- If the neuron's weighted input is zero, the output remains in its previous state.

Y_sign = +1 if X > 0
Y_sign = -1 if X < 0
Y_sign = Y (previous state) if X = 0
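A minimal sketch of this activation, with the neuron's previous output passed in explicitly (the function name is mine):

```python
def hopfield_sign(x, y_previous):
    # +1 if the weighted input is positive, -1 if negative,
    # otherwise keep the neuron's previous state.
    if x > 0:
        return 1
    if x < 0:
        return -1
    return y_previous

print(hopfield_sign(4, -1), hopfield_sign(-4, 1), hopfield_sign(0, -1))  # 1 -1 -1
```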

53

Hopfield Network Current State

The current state of the network is determined by the current outputs, i.e. the state vector.

[Figure: the Hopfield network with its current outputs collected into the state vector Y = (y1, y2, ..., yn)^T.]

54

What can it recognize?

n = the number of neurons = the number of inputs. Each input can be +1 or -1, so there are 2^n possible sets of input/output, i.e. patterns. M = the total number of patterns that the network was trained with, i.e. the total number of patterns that we want the network to be able to recognize.

55

Example: n = 3, 2^3 = 8 possible states

[Figure: the 8 possible states of a 3-neuron network, shown as the corners (±1, ±1, ±1) of a cube in y1-y2-y3 space.]

56

Weights

Weights between neurons are usually represented in matrix form:

W = Σ (m = 1..M) Ym Ym^T - M I

where I is the identity matrix.

For example, let's train the 3-neuron network to recognize the following two patterns (M = 2, n = 3):

Y1 = (+1, +1, +1)^T,  Y2 = (-1, -1, -1)^T

Once the weights are calculated, they remain fixed.

57

Weights (2)

M = 2

Thus we can determine the weight matrix as follows:

Y1 = (+1, +1, +1)^T,  Y2 = (-1, -1, -1)^T
Y1^T = (+1 +1 +1),    Y2^T = (-1 -1 -1)
I = the 3×3 identity matrix

W = Y1 Y1^T + Y2 Y2^T - 2 I
  = [1 1 1; 1 1 1; 1 1 1] + [1 1 1; 1 1 1; 1 1 1] - 2 [1 0 0; 0 1 0; 0 0 1]
  = [0 2 2; 2 0 2; 2 2 0]

58

How is the Hopfield network tested?

Given an input vector X, we calculate the output in a similar manner to what we have seen before:

Ym = sign(W Xm – θ ), m = 1, 2, …, M

Y1 = sign( [0 2 2; 2 0 2; 2 2 0] · (+1, +1, +1)^T - (0, 0, 0)^T ) = (+1, +1, +1)^T

Y2 = sign( [0 2 2; 2 0 2; 2 2 0] · (-1, -1, -1)^T - (0, 0, 0)^T ) = (-1, -1, -1)^T

θ is the vector of thresholds; in this case all thresholds are set to zero.
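A minimal NumPy sketch that builds W for the two patterns above and then tests the network, including on a corrupted probe (function names are mine; the update is applied to all neurons at once for simplicity):

```python
import numpy as np

def hopfield_weights(patterns):
    # W = sum_m Ym Ym^T - M*I  (symmetric, zero diagonal)
    n = len(patterns[0])
    return sum(np.outer(p, p) for p in patterns) - len(patterns) * np.eye(n)

def recall(W, x, theta=None, max_iters=10):
    # Repeatedly apply Y = sign(W X - theta), keeping the previous state where the
    # net input is exactly zero, until the state vector stops changing.
    y = np.array(x, dtype=float)
    theta = np.zeros(len(y)) if theta is None else theta
    for _ in range(max_iters):
        net = W @ y - theta
        new_y = np.where(net > 0, 1.0, np.where(net < 0, -1.0, y))
        if np.array_equal(new_y, y):
            break
        y = new_y
    return y

W = hopfield_weights([np.array([1, 1, 1]), np.array([-1, -1, -1])])
print(W)                        # rows: 0 2 2 / 2 0 2 / 2 2 0
print(recall(W, [1, 1, 1]))     # stable: [ 1.  1.  1.]
print(recall(W, [-1, -1, -1]))  # stable: [-1. -1. -1.]
print(recall(W, [-1, 1, 1]))    # corrected to [ 1.  1.  1.]
```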

59

Stable States

As we see, Y1 = X1 and Y2 = X2. Thus both states are said to be stable (also called fundamental states).


60

Unstable States

With 3 neurons in the network, there are 8 possible states. The remaining 6 states are unstable.

Possible state (x1, x2, x3)    Output after one iteration    Fundamental memory
(+1, +1, +1)                   (+1, +1, +1), stable          (+1, +1, +1)
(-1, +1, +1)                   (+1, +1, +1)                  (+1, +1, +1)
(+1, -1, +1)                   (+1, +1, +1)                  (+1, +1, +1)
(+1, +1, -1)                   (+1, +1, +1)                  (+1, +1, +1)
(-1, -1, -1)                   (-1, -1, -1), stable          (-1, -1, -1)
(-1, -1, +1)                   (-1, -1, -1)                  (-1, -1, -1)
(-1, +1, -1)                   (-1, -1, -1)                  (-1, -1, -1)
(+1, -1, -1)                   (-1, -1, -1)                  (-1, -1, -1)

61

Error Correction Network

Each of the unstable states represents a single error, compared to the fundamental memory. The Hopfield network can act as an error correction network.

62

The Hopfield Network

The Hopfield network can store a set of fundamental memories. It can recall those fundamental memories when presented with inputs that may be exactly those memories or slightly different. However, it may not always recall correctly. Let's see an example.

63

Example: when the Hopfield network cannot recall

X1 = (+1, +1, +1, +1, +1)
X2 = (+1, -1, +1, -1, +1)
X3 = (-1, +1, -1, +1, -1)

Let the probe vector be X = (+1, +1, -1, +1, +1). It is very similar to X1, but the network recalls it as X3. This is a problem with the Hopfield network.

64

Storage capacity of the Hopfield Network

Storage capacity is the largest number of fundamental memories that can be stored and retrieved correctly. The maximum number of fundamental memories Mmax that can be stored in the n-neuron recurrent network is limited by:

Mmax = 0.15 n