Transcript of "Chapter 7: Introduction to Back Propagation Neural Networks (BPNN)", KH Wong, Neural Networks ver. 4h.

Page 1:

Chapter 7: Introduction to Back Propagation Neural Networks (BPNN)
KH Wong

Page 2:

Introduction

• Very popular
• A high-performance multi-class classifier
• Successful in handwritten optical character recognition (OCR), speech recognition, image noise removal, etc.
• Easy to implement
  – Slow in learning
  – Fast in classification

http://www.ninds.nih.gov/disorders/brain_basics/ninds_neuron.htm
http://yann.lecun.com/exdb/mnist/

Page 3:

Overview

• Back Propagation Neural Networks (BPNN)
  – Part 1: Feed-forward processing (classification, or recognition)
  – Part 2: Feed-backward processing (training the network), which also includes forward processing
• Appendix: a MATLAB example is explained
  – Source: http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial

Page 4:

Theory of Back Propagation Neural Net (BPNN)

• Use many samples to train the weights (W) and biases (b), so the network can classify an unknown input into one of several classes
• Will explain:
  – How to use the network after training: the forward pass (classify, or recognize, an input)
  – How to train it: how to learn the weights and biases (using forward and backward passes)

Page 5:

Motivation

• Biological findings inspire the development of neural nets
  – Input → weights → logic function → output
• Biological relation
  – Input: dendrites
  – Output

[Figure: X = inputs, W = weights, neuron (logic function), output]

Page 6:

Optical character recognition (OCR) example

• Training: train the system first by presenting many samples to the network
• Recognition: when an image is input to the system, it will tell what character it is

[Figure: during recognition the neural net sets Output3 = '1' and all other outputs = '0'; training determines the network's weights (W) and biases (b)]

Page 7:

Part 1: classification in action (also called the recognition process)

Forward pass of Back Propagation Neural Net (BPNN)

Assume the weights (W) and biases (b) have already been found by training (to be discussed in Part 2).

Page 8:

Recognition: assume the weights (W) and biases (b) were found earlier

[Figure: each pixel of the input image is X(u,v); the image feeds the network and the outputs are Output0 = 0, Output1 = 0, Output2 = 0, Output3 = 1]

Page 9:

Neurons in BPNN

• Inside each neuron:

For one neuron in layer l, let X_l = [x_l(1), x_l(2), x_l(3), ..., x_l(K)] be the set of inputs and W_l = [w_l(1), w_l(2), w_l(3), ..., w_l(K)] be the set of weights for that layer, with bias b_l. Then

    u_l = sum_{k=1}^{K} w_l(k) x_l(k) + b_l
    y_l = f(u_l)

Typically f is the logistic (sigmoid) function, i.e.

    f(u) = 1 / (1 + e^{-u}),  therefore  y_l = f(u_l) = 1 / (1 + e^{-(sum_{k=1}^{K} w_l(k) x_l(k) + b_l)})
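The computation inside one neuron can be sketched in a few lines of MATLAB. This is only an illustration; the input, weight and bias values below are made up and are not from the slides:

    % Minimal sketch of one neuron in layer l (illustrative values only)
    x = [0.2; 0.7; 0.1];      % inputs x_l(1..K), here K = 3
    w = [0.5; -0.3; 0.8];     % weights w_l(1..K) of this neuron
    b = 0.1;                  % bias b_l
    u = w'*x + b;             % u_l = sum_k w_l(k)*x_l(k) + b_l
    y = 1/(1 + exp(-u))       % y_l = f(u_l), logistic (sigmoid) activation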

Page 10:

Multi-layer structure of a BP neural network

A BP neural network is organised as an input layer, one or more hidden layers (..., hidden layer l-1, hidden layer l, ...) and an output layer. A layer has multiple neurons, and each neuron has its own weights W_l, bias b_l and activation function f().

For each neuron: Y_l = set of outputs, X_l = set of inputs, W_l = set of weights, b_l = set of biases, such that

    y_l = f(u_l),  with u_l formed from X_l, W_l and b_l as on the previous slide.

The outputs of layer l (X_{l+1}) become the inputs of layer l+1.

Page 11:

Neurons in the multi-layer structure
• Between any two neighboring layers, a set of neurons can be found.

For each neuron, with inputs x_l and weights W_l at layer l:

    y_l = f(u_l),  with  u_l = sum_{k=1}^{K} w_l(k) x_l(k) + b_l

The input at layer l+1 is the output of layer l:

    x_{l+1} = y_l

[Figure: each neuron receives x_l(1), x_l(2), ..., x_l(K), forms u_l, applies f(u_l) and passes the result on as x_{l+1}]

Page 12:

BPNN forward pass
• The forward pass finds the output when an input is given. For example:
• Assume we have used N = 60,000 images to train a network to recognize c = 10 numerals.
• When an unknown image is given to the input, the output neuron that corresponds to the correct answer will give the highest output level.

[Figure: an input image feeds the network; there are 10 output neurons for the digits 0, 1, 2, ..., 9, and ideally only the neuron for the correct digit outputs '1' (e.g. an output pattern such as 0 0 0 1 0 0 ...)]
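As a minimal sketch of this forward pass (using the small 9-input, 5-hidden-neuron, 3-output architecture of the appendix example rather than the MNIST-sized network above, and random placeholder weights):

    % Forward-pass sketch (illustrative sizes and random weights, not a trained network)
    P  = rand(9,1);                        % one 3x3 input image reshaped to 9x1
    W1 = 0.1*randn(5,9); b1 = zeros(5,1);  % hidden layer: 5 neurons
    W2 = 0.1*randn(3,5); b2 = zeros(3,1);  % output layer: 3 neurons (3 classes)
    f  = @(u) 1./(1 + exp(-u));            % logistic (sigmoid) activation
    A1 = f(W1*P + b1);                     % hidden-layer outputs
    A2 = f(W2*A1 + b2);                    % output-layer outputs
    [~, predicted_class] = max(A2)         % the largest output gives the class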

Page 13:

Architecture (exercise: write the formulas for A1(i=4) and A2(k=3)). How many inputs, hidden neurons, outputs and weights are there in each layer?

• Input: P = 9x1, indexed by j
• A1: hidden layer 1 = 5 neurons, indexed by i; W_{l=1} = 9x5, b_{l=1} = 5x1
• A2: layer 2 = 3 output neurons, indexed by k; W_{l=2} = 5x3, b_{l=2} = 3x1

For hidden neuron i = 1 (bias b1(i=1)):

    A1(i=1) = 1 / (1 + exp(-(W_{l=1}(j=1,i=1) P(j=1) + W_{l=1}(j=2,i=1) P(j=2) + ... + W_{l=1}(j=9,i=1) P(j=9) + b1(i=1))))

For output neuron k = 1 (bias b2(k=1)):

    A2(k=1) = 1 / (1 + exp(-(W_{l=2}(i=1,k=1) A1(i=1) + W_{l=2}(i=2,k=1) A1(i=2) + ... + W_{l=2}(i=5,k=1) A1(i=5) + b2(k=1))))

[Figure: layer l=1 (hidden) generates the sensitivities S1; layer l=2 (output) generates S2]

Page 14:

Answer (exercise: write the values for A1(i=4) and A2(k=3))

• P = [0.7656 0.7344 0.9609 0.9961 0.9141 0.9063 0.0977 0.0938 0.0859]
• W_{l=1} (the weights used for hidden neuron i=4) = [0.2112 0.1540 -0.0687 -0.0289 0.0720 -0.1666 0.2938 -0.0169 -0.1127]
• b_{l=1} = 0.1441
• %Find A1(i=4)
• A1(i=4) = 1 / (1 + exp(-(W_{l=1}*P + b_{l=1}))) = 0.49
• How many inputs, hidden neurons, outputs, weights and biases are there in each layer?
• Answer: inputs = 9, hidden neurons = 5, outputs = 3, weights in the hidden layer (layer 1) = 9x5, weights in the output layer (layer 2) = 5x3, 5 biases in the hidden layer (layer 1), 3 biases in the output layer (layer 2)

In full:

    A1(i=4) = 1 / (1 + exp(-(W_{l=1}(j=1,i=4) P(j=1) + W_{l=1}(j=2,i=4) P(j=2) + ... + W_{l=1}(j=9,i=4) P(j=9) + b1(i=4))))

Page 15:

Numerical example: architecture of the example

• Input layer: 9x1 pixels
• Hidden layer l: W_l is 5 neurons x 9 inputs per neuron; b_l is 5 neurons x 1 (one bias for each neuron)
• Output layer: 3x1
• Each neuron has weights W_l, a bias b_l and an activation function f()

Page 16:

Part 2: feed backward processing (Training the network)

Backward pass of Back Propagation Neural Net (BPNN) (Training)

Ref: http://en.wikipedia.org/wiki/Backpropagation

Page 17:

Feed backward stage

[Figure: at layer l, Part 1 (feed-forward, studied before) computes x_{l+1} = f(W_l x_l + b_l); Part 2 (feed-backward) propagates the error back from layer l+1 to layer l]

• For training we need to find ∂E/∂w. Why? We will explain why, and prove the equations, in the following slides.

Page 18:

The criteria to train a network
• Based on the overall error function; there are 'N' samples and 'c' classes to be learned.

Overall error:

    E = (1/2) sum_{n=1}^{N} sum_{k=1}^{c} (t_k^n - y_k^n)^2

Error of the n-th training sample:

    E_n = (1/2) sum_{k=1}^{c} (t_k^n - y_k^n)^2 = (1/2) ||t^n - y^n||^2   (squared 2-norm)

where t^n is the given true class of the n-th training sample (the teacher) and y^n is the output class of the n-th training sample at the output of the feed-forward network.
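A small MATLAB sketch of this error measure, with made-up targets and outputs (note that the program in the appendix uses the mean of the squared errors, 0.5*mean(mean(e.*e)), rather than the sum):

    % Sketch: overall squared error for N samples and c classes (illustrative data)
    T = [1 0; 0 1; 0 0];             % targets t: c = 3 classes x N = 2 samples (one-hot columns)
    Y = [0.8 0.3; 0.2 0.6; 0.1 0.2]; % network outputs y for the same samples
    e = T - Y;
    E = 0.5 * sum(sum(e.^2))         % E = (1/2) sum_n sum_k (t_k^n - y_k^n)^2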

Page 19:

Theory

Consider a neuron j with inputs x_k (k = 1, 2, ..., n) arriving through weights w_kj, a bias b_j and output y_j.

Overall squared error at the output (t = target or teacher, y = actual output):

    E = (1/2) (t_j - y_j)^2

By definition, for neuron j the output is

    y_j = f(u_j) = f( sum_{k=1}^{n} w_kj x_k + b_j ),   with   u_j = sum_{k=1}^{n} w_kj x_k + b_j

We want to find ∂E/∂w_ij. By the chain rule,

    ∂E/∂w_ij = (∂E/∂y_j) (∂y_j/∂u_j) (∂u_j/∂w_ij)   --------- (1)

Page 20:

Learning by gradient descent

In each learning cycle (epoch), a new w is calculated from the old one using

    w_new = w_old + Δw,   with   Δw = -η ∂E/∂w

If we want E to decrease in every learning cycle, make Δw = -η ∂E/∂w (learning by gradient descent). To do it slowly, use a small +ve learning factor (learning rate) η ≈ 0.1. (Why the gradient descent method works will be explained in the next slide.) That is why we need ∂E/∂w.

For the same argument, the biases are updated as

    b_new = b_old - η ∂E/∂b
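A toy MATLAB sketch of gradient descent on a one-dimensional error function (the function E(w) = (w - 2)^2 is made up for illustration; it is not the network's error):

    % Gradient descent sketch on a toy error E(w) = (w-2)^2 (illustrative only)
    eta = 0.1;                 % learning rate (small +ve factor)
    w   = 0;                   % initial weight
    for epoch = 1:50
        dEdw = 2*(w - 2);      % dE/dw for this toy error
        w = w - eta*dEdw;      % w_new = w_old - eta * dE/dw
    end
    w                          % approaches 2, the minimizer of E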

Page 21:

We need to find ∂E/∂w. Why?

By the Taylor series (to first order),

    E(w_new) = E(w_old) + (∂E/∂w)(w_new - w_old) + ...   ----- (*)

Here Δw = w_new - w_old. We set

    Δw = -η ∂E/∂w   ----- (**)

where η is a small +ve term that sets the learning rate. Putting (**) into (*) gives

    E(w_new) - E(w_old) = (∂E/∂w)(-η ∂E/∂w) = -η (∂E/∂w)^2 ≤ 0

since (∂E/∂w)^2 is always +ve. Conclusion: setting Δw = -η ∂E/∂w will decrease E.

Using Taylor series:
http://www.fepress.org/files/math_primer_fe_taylor.pdf
http://en.wikipedia.org/wiki/Taylor's_theorem

Page 22:

Theory

An input x_i is connected to neuron j through the weight w_ij. We want to see how w_ij affects E. From (1), by the chain rule,

    ∂E/∂w_ij = (∂E/∂y_j) (∂y_j/∂u_j) (∂u_j/∂w_ij) = term1 · term2 · term3

with, as before,

    u_j = sum_{k=1}^{n} w_kj x_k + b_j,   y_j = f(u_j)

[Figure: input x_i feeds neuron j through weight w_ij; the neuron forms u_j and outputs y_j]

Page 23:

Case 1: if neuron j is at the output layer

We want to find ∂E/∂w_ij = term1 · term2 · term3, where

    term1 = ∂E/∂y_j,   term2 = ∂y_j/∂u_j,   term3 = ∂u_j/∂w_ij

term1: E = (1/2)(t_j - y_j)^2 is measured at the output, so

    ∂E/∂y_j = -(t_j - y_j) = (y_j - t_j)

term2: see the appendix,

    ∂y_j/∂u_j = f'(u_j) = f(u_j)(1 - f(u_j))

term3: since u_j = sum_i x_i w_ij + b_j and b_j is a constant,

    ∂u_j/∂w_ij = x_i

Hence

    ∂E/∂w_ij = term1 · term2 · term3 = (y_j - t_j) f(u_j)(1 - f(u_j)) x_i

Note: the sensitivity of output neuron j is

    δ_j = term1 · term2 = (y_j - t_j) f(u_j)(1 - f(u_j))   ----- (2)

[Figure: output neuron j receives x_i through w_ij, forms u_j and outputs y_j; the true target class is t_j]
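The following MATLAB sketch applies eq.(2) and the gradient descent update to a single output neuron; the values are illustrative (in the appendix program the corresponding quantities are s2, df2 and the W2/b2 updates):

    % Sketch: sensitivity and weight update for one output neuron (illustrative values)
    x   = [0.3; 0.9; 0.1];        % inputs x_i to the output neuron (hidden-layer outputs)
    w   = [0.2; -0.4; 0.1];       % weights w_ij into output neuron j
    b   = 0.05; t = 1; eta = 0.1; % bias, target (teacher) and learning rate
    u   = w'*x + b;
    y   = 1/(1 + exp(-u));        % y_j = f(u_j)
    delta = (y - t) * y*(1 - y);  % eq.(2): delta_j = (y_j - t_j) f(u_j)(1 - f(u_j))
    w   = w - eta*delta*x;        % w_new = w_old - eta * dE/dw = w_old - eta*delta_j*x_i
    b   = b - eta*delta;          % same form of update for the bias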

Page 24:

Case 2: if neuron j is at the hidden layer. Its output y_j affects all the neurons connected to it in the next layer.

As before,

    ∂E/∂w_ij = term1 · term2 · term3,   term1 = ∂E/∂y_j,   term2 = ∂y_j/∂u_j,   term3 = ∂u_j/∂w_ij

term1: because y_j affects all L neurons (indexed by l) in the next (output) layer,

    ∂E/∂y_j = sum_{l=1}^{L} (∂E/∂u_l)(∂u_l/∂y_j),   i.e. term1A · term1B summed over l

    term1A: ∂E/∂u_l = δ_l   (see eq.(2))
    term1B: since u_l = sum_j w_{j,l} y_j + b_l, we have ∂u_l/∂y_j = w_{j,l}

term2 and term3 are the same as in the previous slide. Hence

    ∂E/∂w_ij = [ sum_{l=1}^{L} δ_l w_{j,l} ] f(u_j)(1 - f(u_j)) x_i   ----- (3)

Relation to the MATLAB program in the appendix: w_ij is W1, w_{j,l} is W2, the input x_i to the hidden neuron j is P(:,i), and f'(u_j) for this hidden neuron is df1.

[Figure: hidden neuron j feeds the L output neurons (u_{l=1}, y_{l=1}), ..., (u_{l=L}, y_{l=L}) through the weights w_{j,l}]
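A minimal MATLAB sketch of the hidden-layer sensitivities, written to match the variable names used in the appendix program (A1, W2, s2, df1, s1); the numbers here are random placeholders:

    % Sketch: hidden-layer sensitivities from the output-layer sensitivities
    A1  = rand(5,1);            % hidden-layer outputs y_j = f(u_j) (5 hidden neurons)
    W2  = 0.1*randn(3,5);       % weights w_{j,l} between hidden and output layers
    s2  = randn(3,1);           % output-layer sensitivities delta_l (from eq.(2))
    df1 = A1.*(1 - A1);         % f'(u_j) = f(u_j)(1 - f(u_j)) for each hidden neuron
    s1  = diag(df1) * W2' * s2  % eq.(3): delta_j = f'(u_j) * sum_l w_{j,l} delta_l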

Page 25:

After all the sensitivities δ are found

We can use this step to update all the weights w_ij so that the overall error E is minimized using the gradient descent method:

    w_new = w_old + Δw,   with   Δw = -η ∂E/∂w

and, by the same argument, the biases b.

Page 26:

Training
• How to train the neurons: how to train the weights (W) and biases (b) (using forward and backward passes)
• Initialize W and b randomly
• For iter = 1 : all_epochs (or break when E is very small)
  – For all training samples {
    • Forward pass (same as the recognition process in Part 1) for each output neuron:
      – Use the training samples X with class labels t; feed forward to find y
      – E = error_function(y - t)
    • Backward pass:
      – From the output layer, find Δw and Δb to reduce the error E; find all the δ (sensitivities) of the output layer
      – Calculate Δw and Δb of all hidden layers
  }

Page 27:

Summary

• Learn what Back Propagation Neural Networks (BPNN) are
• Learn the forward pass
• Learn the backward pass and the training of the BPNN network

Page 28:

References
• Wiki
  – http://en.wikipedia.org/wiki/Backpropagation
  – http://en.wikipedia.org/wiki/Convolutional_neural_network
• MATLAB programs
  – Neural Network for pattern recognition - Tutorial: http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
  – CNN MATLAB example: http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox

Page 29:

Appendices


Page 30:

Appendix 1: Sigmoid function f(u) and its derivative f'(u)

    f(u) = 1 / (1 + e^{-βu});  for simplicity set β = 1, so  f(u) = 1 / (1 + e^{-u})

    f'(u) = df(u)/du = d/du (1 + e^{-u})^{-1}
          = -(1 + e^{-u})^{-2} · d(1 + e^{-u})/du        (using the chain rule)
          = -(1 + e^{-u})^{-2} · (-e^{-u})
          = e^{-u} / (1 + e^{-u})^2
          = [1 / (1 + e^{-u})] · [e^{-u} / (1 + e^{-u})]
          = f(u) · [(1 + e^{-u} - 1) / (1 + e^{-u})]
          = f(u) (1 - f(u))

Hence

    f'(u) = df(u)/du = f(u) (1 - f(u))

http://link.springer.com/chapter/10.1007%2F3-540-59497-3_175#page-1

http://mathworld.wolfram.com/SigmoidFunction.html
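A quick MATLAB check (a sketch, not from the slides) that this derivative identity holds numerically:

    % Numerical check that f'(u) = f(u)(1 - f(u)) for the sigmoid
    f        = @(u) 1./(1 + exp(-u));
    u        = -5:0.1:5;
    analytic = f(u).*(1 - f(u));       % f(u)(1 - f(u))
    numeric  = gradient(f(u), u);      % numerical derivative df/du
    max(abs(analytic - numeric))       % small (limited only by the step size)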

Page 31:

Alternative derivation (for the output layer, in each neuron)

Because y_n = f(u_n) is the current output and t_n is the truth or target (teacher), for the n-th sample

    E = (1/2)(t_n - y_n)^2 = (1/2)(t_n - f(u_n))^2   ----- (i)

Since u_n = sum_k x_k w_k + b, we have ∂u_n/∂b = 1   ----- (ii)

From (i) and (ii),

    ∂E/∂b = (∂E/∂u_n)(∂u_n/∂b) = -(t_n - y_n) f'(u_n) = (y_n - t_n) f'(u_n) = δ (the sensitivity)   ----- (iii)

At the output layer (the last layer, l = L):

    δ_{l=L} = f'(u)(y - t)

where t = target (teacher) and y = output of the last layer; this δ is fed back to the previous layer.

Page 32:

Derivation (continued)

Also, from (iii) and E = (1/2)(t_n - y_n)^2,

    ∂E/∂y = -(t_n - y_n) = (y_n - t_n)

and since u = sum x w + b gives ∂u/∂w = x (for each input x and weight w),

    ∂E/∂w = (y_n - t_n) f'(u) x = δ x   ----- (iv)

For each learning phase, a new w is calculated:

    w_new = w_old - η ∂E/∂w   ----- (v)

If we want E to decrease in every learning cycle, make Δw = -η ∂E/∂w; to do it slowly, use a small +ve η (learning factor). (This is the gradient descent method discussed earlier.) Hence use eqs. (iv) and (v).

For the same argument, for the bias (see eq. (iii)):

    b_new = b_old - η ∂E/∂b = b_old - η δ

Page 33:

BPNN example in MATLAB

Based on "Neural Network for pattern recognition - Tutorial":
http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial

Page 34:

Example: a simple BPNN

• Number of classes (no. of output neurons) = 3
• Input 9 pixels: each input is a 3x3 image
• Training samples = 3 for each class
• Number of hidden layers = 1
• Number of neurons in the hidden layer = 5

Page 35:

Display of testing patterns


Page 36:

Architecture

Same architecture as on Page 13: input P = 9x1 indexed by j; hidden layer A1 with 5 neurons indexed by i, W_{l=1} = 9x5, b_{l=1} = 5x1; output layer A2 with 3 neurons indexed by k, W_{l=2} = 5x3, b_{l=2} = 3x1. The sensitivities S1 and S2 are generated at layers l=1 and l=2 respectively.

Page 37:

% source: http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
% clear memory   % comment added by kh wong
clear all
clc
nump=3;   % number of classes
n=3;      % number of images per class
% training images reshaped into columns in P
% image size (3x3) reshaped to (1x9)

% training images
P=[196 35 234 232 59 244 243 57 226; ...
   188 15 236 244 44 228 251 48 230; ... % class 1
   246 48 222 225 40 226 208 35 234; ...
   255 223 224 255 0 255 249 255 235; ...
   234 255 205 251 0 251 238 253 240; ... % class 2
   232 255 231 247 38 246 190 236 250; ...
   25 53 224 255 15 25 249 55 235; ...
   24 25 205 251 10 25 238 53 240; ... % class 3
   22 35 231 247 38 24 190 36 250]';

% testing images
N=[208 16 235 255 44 229 236 34 247; ...
   245 21 213 254 55 252 215 51 249; ... % class 1
   248 22 225 252 30 240 242 27 244; ...
   255 241 208 255 28 255 194 234 188; ...
   237 243 237 237 19 251 227 225 237; ... % class 2
   224 251 215 245 31 222 233 255 254; ...
   25 21 208 255 28 25 194 34 188; ...
   27 23 237 237 19 21 227 25 237; ... % class 3
   24 49 215 245 31 22 233 55 254]';

% Normalization
P=P/256;
N=N/256;

Page 38:

% display the training images
figure(1),
for i=1:n*nump
    im=reshape(P(:,i), [3 3]);
    % remove the line below to reflect the true data input
    % im=imresize(im,20); % resize the image to make it clear
    subplot(nump,n,i),imshow(im);
    title(strcat('Train image/Class #', int2str(ceil(i/n))))
end
% display the testing images
figure,
for i=1:n*nump
    im=reshape(N(:,i), [3 3]);
    % remove the line below to reflect the true data input
    % im=imresize(im,20); % resize the image to make it clear
    subplot(nump,n,i),imshow(im);title(strcat('test image #', int2str(i)))
end

Page 39:

% targets
T=[ 1 1 1 0 0 0 0 0 0
    0 0 0 1 1 1 0 0 0
    0 0 0 0 0 0 1 1 1 ];

S1=5; % number of neurons in the hidden layer
S2=3; % number of output neurons (= number of classes)

[R,Q]=size(P);
epochs = 10000;    % number of iterations
goal_err = 10e-5;  % goal error
a=0.3;             % define the range of the random initial values
b=-0.3;
W1=a + (b-a) *rand(S1,R);  % weights between input and hidden neurons
W2=a + (b-a) *rand(S2,S1); % weights between hidden and output neurons
b1=a + (b-a) *rand(S1,1);  % biases of the hidden neurons
b2=a + (b-a) *rand(S2,1);  % biases of the output neurons
n1=W1*P;
A1=logsig(n1);  % feedforward the first time
n2=W2*A1;
A2=logsig(n2);  % feedforward the first time
e=A2-T;         % actually e=T-A2 in the main loop
error =0.5* mean(mean(e.*e)); % better to say e=T-A2, but no harm to error here
nntwarn off

Page 40:

for itr =1:epochs
    if error <= goal_err
        break
    else
        for i=1:Q % i is the index to a column in P (9x9); each column P(:,i)
            % is a training sample image; 9 training samples, 3 for each class
            % A1=5x9, A1 = outputs of the hidden layer and inputs to the output layer
            % A2=3x9, A2 = outputs of the output layer
            % T = true class; each column in T is for 1 training sample
            % hidden_layer = 1, output_layer = 2
            df1=dlogsig(n1,A1(:,i)); % df1 is 5x1 for the 5 neurons in the hidden layer
            df2=dlogsig(n2,A2(:,i)); % df2 is 3x1 for the output neurons
            % s2 is sigma2 = sensitivity2 from the output layer, equation (2)
            s2 = -1*diag(df2) * e(:,i); % e=T-A2; df2 = f' = f(1-f) of layer 2

Page 41:

            % s1 = 5x1
            s1 = diag(df1)* W2'* s2; % eq.(3), feedback from s2 to s1
            % dW = -eta*s2*df(u)*x in the slides, eta=0.1; s2 is found, x is A1

            % W2 is 3x5: each output neuron receives
            % 5 inputs from the 5 hidden neurons in the hidden layer; update W2
            % sigma2 = s2 = -1*diag(df2) * e(:,i); % e=T-A2; df2 = f' = f(1-f) of layer 2
            % delta_W2 = -learning_rate*sigma2*input_to_output_layer
            % delta_W2 = -0.1*sigma2*A1
            W2 = W2-0.1*s2*A1(:,i)'; % learning rate = 0.1, eq.(2) output case
            % 3x5 = 3x5 - (3x1*1x5)
            % A1 = 5 hidden neuron outputs (5 hidden neurons)
            % A1(:,i)' = 1x5 = outputs of the hidden layer

            b2 = b2-0.1*s2; % bias (threshold) update
            % 3x1 = 3x1 - 3x1
            % P(:,i) = 9x1 = input to the hidden layer
            % s1 = 5x1 because each hidden node has 1 sensitivity (sigma)
            W1 = W1-0.1*s1*P(:,i)'; % update W1 in layer 1, see eq.(3) hidden case
            % 5x9 = 5x9 - (5x1*1x9), since P is 9x9 and for a given i, P(:,i)' = 1x9

Page 42:

            b1 = b1-0.1*s1; % bias (threshold) update
            % 5x1 = 5x1 - 5x1

            A1(:,i)=logsig(W1*P(:,i)+b1);  % forward
            % 5x1 = 5x1
            A2(:,i)=logsig(W2*A1(:,i)+b2); % forward
            % 3x1 = 3x1
        end
        e = T - A2; % for this e, put a -ve sign when finding s2
        error =0.5*mean(mean(e.*e));
        disp(sprintf('Iteration :%5d  mse :%12.6f %%',itr,error));
        mse(itr)=error;
    end
end

Page 43:

threshold=0.9; % threshold of the system (higher threshold = more accuracy)

% training images result
%TrnOutput=real(A2)
TrnOutput=real(A2>threshold)

% applying test images to NN, TESTING BEGINS HERE
n1=W1*N;
A1=logsig(n1);
n2=W2*A1;
A2test=logsig(n2);

% testing images result
%TstOutput=real(A2test)
TstOutput=real(A2test>threshold)

% recognition rate
wrong=size(find(TstOutput-T),1);
recognition_rate=100*(size(N,2)-wrong)/size(N,2)
% end of code