Transcript of "Introduction to Neural Networks: Structure and Training", Professor Q.J. Zhang, Carleton University (ANN_Course/ANN_Structure.pdf)

Page 1:


Introduction to Neural Networks: Structure and Training

Professor Q.J. Zhang

Department of Electronics

Carleton University, Ottawa, Canada

www.doe.carleton.ca/~qjz, [email protected]

Page 2:

A Quick Illustration Example:
Neural Network Model for Delay Estimation in a High-Speed Interconnect Network

Page 3:

High-Speed VLSI Interconnect Network

[Figure: an interconnect network connecting Driver 1, Driver 2, and Driver 3 to Receiver 1, Receiver 2, Receiver 3, and Receiver 4.]

Page 4:

Circuit Representation of the Interconnect Network

[Circuit schematic: a source (Vp, Tr) with source resistance Rs drives four interconnect branches 1-4, each modeled by an inductance Li, resistance Ri, and capacitance Ci (i = 1, ..., 4).]

Page 5:

Massive Analysis of Signal Delay

Page 6:

Need for a Neural Network Model

• A PCB contains a large number of interconnect networks, each with different interconnect lengths, terminations, and topologies, leading to the need for massive analysis of interconnect networks

• During PCB design/optimization, the interconnect networks need to be adjusted in terms of interconnect lengths, receiver-pin load characteristics, etc., leading to the need for repetitive analysis of interconnect networks

• This necessitates fast and accurate interconnect network models, and the neural network model is a good candidate

Page 7:

Neural Network Model for Delay Analysis

[Figure: a neural network whose inputs are L1-L4, R1-R4, C1-C4, Rs, Vp, Tr, e1, e2, e3, and whose outputs are the delays at receivers 1-4.]

Page 8:

3 Layer MLP: Feedforward Computation

[Figure: inputs x1, x2, x3 feed hidden neurons Z1-Z4 through weights Wki; the hidden neurons feed outputs y1, y2 through weights W'jk.]

Hidden neuron values:  Zk = tanh(Σi Wki xi)

Outputs:  yj = Σk W'jk Zk

Page 9:

Neural Net Training

Training data (by simulation/measurement):  d = d(x)

Neural network:  y = y(x)

Objective: to adjust W, V such that Σx (y - d)² is minimized over W, V
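This objective can be sketched in code: a small NumPy gradient-descent loop for a 3-layer MLP like the one on Page 8 (an illustrative sketch only; the target function, layer sizes, and learning rate are made-up values, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data d = d(x) "by simulation/measurement" --
# here just a made-up smooth target for illustration.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
d = np.sin(X[:, 0]) * X[:, 1]

# 3-layer MLP: y = sum_k w'_k tanh(sum_i w_ki x_i)
W = rng.normal(scale=0.5, size=(8, 2))    # hidden weights w_ki
Wp = rng.normal(scale=0.5, size=8)        # output weights w'_k

def forward(X):
    Z = np.tanh(X @ W.T)                  # hidden neuron values z_k
    return Z, Z @ Wp                      # outputs y

def loss(X, d):
    return np.mean((forward(X)[1] - d) ** 2)

loss_before = loss(X, d)
lr = 0.1
for _ in range(500):                      # adjust weights to minimize (y - d)^2
    Z, y = forward(X)
    e = 2.0 * (y - d) / len(d)            # d(loss)/dy
    delta = np.outer(e, Wp) * (1.0 - Z ** 2)
    Wp -= lr * (Z.T @ e)                  # output-weight gradient step
    W -= lr * (delta.T @ X)               # hidden-weight gradient step
loss_after = loss(X, d)
```

After the loop, `loss_after` is smaller than `loss_before`, i.e., the network has moved toward the training data.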

Page 10:

Simulation Time for 20,000 Interconnect Configurations

Method                    | CPU
--------------------------|------------
Circuit Simulator (NILT)  | 34.43 hours
AWE                       | 9.56 hours
Neural Network Approach   | 6.67 minutes

Page 11:

Important Features of Neural Networks

• Neural networks have the ability to model multi-dimensional nonlinear relationships

• Neural models are simple and the model computation is fast

• Neural networks can learn and generalize from available data, thus making model development possible even when component formulae are unavailable

• The neural network approach is generic, i.e., the same modeling technique can be re-used for passive/active devices/circuits

• It is easier to update neural models whenever device or component technology changes

Page 12:

Inspiration

[Figure: the human brain recognizing spoken words ("Stop", "Start", "Help") and speakers (Mary, Lisa, John).]

Page 13:

A Biological Neuron

Figure from Reference [L.H. Tsoukalas and R.E. Uhrig, Fuzzy and Neural Approaches in Engineering, Wiley, 1997.]

Page 14:

Neural Network Structures

Page 15:

Neural Network Structures

• A neural network contains
  • neurons (processing elements)
  • connections (links between neurons)

• A neural network structure defines
  • how information is processed inside a neuron
  • how the neurons are connected

• Examples of neural network structures
  • multi-layer perceptrons (MLP)
  • radial basis function (RBF) networks
  • wavelet networks
  • recurrent neural networks
  • knowledge based neural networks

• MLP is the basic and most frequently used structure

Page 16:

MLP Structure

[Figure: a layered network. Inputs x1, x2, x3, ..., xn enter (Input) Layer 1 with neurons 1, 2, 3, ..., N1; hidden layers 2 through L-1 have N2, ..., NL-1 neurons; (Output) Layer L with neurons 1, 2, ..., NL produces outputs y1, y2, ..., ym.]

Page 17:

Information Processing In a Neuron

[Figure: neuron i in layer l receives the outputs of layer l-1 (z_0^(l-1) = 1 for the bias, and z_1^(l-1), ..., z_(N(l-1))^(l-1)), weights them by w_i0^l, w_i1^l, w_i2^l, ..., w_iN(l-1)^l, sums them, and applies the activation σ(·) to produce z_i^l.]

Page 18:

Neuron Activation Functions

• Input layer neurons simply relay the external inputs to the neural network

• Hidden layer neurons have smooth switch-type activation functions

• Output layer neurons can have simple linear activation functions

Page 19:

Forms of Activation Functions: z = σ(γ)

[Plots of the sigmoid, arctangent, and hyperbolic tangent activation functions.]

Page 20:

Forms of Activation Functions: z = σ(γ)

Sigmoid function:  σ(γ) = 1 / (1 + e^(-γ))

Arctangent function:  σ(γ) = (2/π) arctan(γ)

Hyperbolic tangent function:  σ(γ) = (e^γ - e^(-γ)) / (e^γ + e^(-γ))
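These three activation functions translate directly into code (a minimal NumPy sketch; the function names are chosen for the example):

```python
import numpy as np

def sigmoid(g):
    """Sigmoid: sigma(g) = 1 / (1 + e^-g), values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-g))

def arctangent(g):
    """Scaled arctangent: sigma(g) = (2/pi) arctan(g), values in (-1, 1)."""
    return (2.0 / np.pi) * np.arctan(g)

def tanh_act(g):
    """Hyperbolic tangent: sigma(g) = (e^g - e^-g)/(e^g + e^-g), values in (-1, 1)."""
    return (np.exp(g) - np.exp(-g)) / (np.exp(g) + np.exp(-g))
```

All three are smooth, monotonic "switch" functions; they differ only in output range and slope.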

Page 21:

Multilayer Perceptrons (MLP):

Structure:

[Figure: (Input) layer 1 with neurons 1, 2, 3, ..., N1 receives x1, x2, x3, ..., xn; (Hidden) layers 2 through l have N2, ..., Nl neurons; (Output) layer L has neurons 1, 2, ..., NL.]

Page 22:

where: L = total number of layers

Nl = number of neurons in layer l, l = 1, 2, 3, ..., L

w_ij^l = link (weight) between neuron i in layer l and neuron j in layer l-1

NN inputs = x1, x2, ..., xn (where n = N1)

NN outputs = y1, y2, ..., ym (where m = NL)

Let the neuron output value be represented by z:

z_i^l = output of neuron i in layer l

Each neuron will have an activation function σ(·).

Page 23:

Neural Network Feedforward:

Problem statement:

Given: x = [x1, x2, ..., xn]^T, get y = [y1, y2, ..., ym]^T from the NN.

Solution: feed x to layer 1, and feed the outputs of layer l-1 to layer l.

For l = 1:  z_i^1 = x_i,  i = 1, 2, ..., n,  n = N1

For l = 2, 3, ..., L:  z_i^l = σ( Σ_{j=0}^{N(l-1)} w_ij^l z_j^(l-1) )

and the solution is:  y_i = z_i^L,  i = 1, 2, ..., m,  m = NL
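The feedforward recursion can be transcribed directly (an illustrative sketch; the helper name `mlp_feedforward`, the example layer sizes, and the random weights are assumptions, with z_0 = 1 carrying the bias as on the neuron-processing slide):

```python
import numpy as np

def mlp_feedforward(x, weights, sigma=np.tanh):
    """weights[l] has shape (N_l, N_{l-1} + 1); column 0 multiplies
    the constant z_0 = 1 (the bias term)."""
    z = np.asarray(x, dtype=float)            # layer 1 relays the inputs
    for l, W in enumerate(weights):
        z_aug = np.concatenate(([1.0], z))    # prepend z_0^(l-1) = 1
        gamma = W @ z_aug                     # gamma_i^l = sum_j w_ij^l z_j^(l-1)
        # hidden layers use the smooth activation; the output layer is linear
        z = sigma(gamma) if l < len(weights) - 1 else gamma
    return z

# Example: 3-layer MLP with n = 3 inputs, 4 hidden neurons, m = 2 outputs.
rng = np.random.default_rng(1)
Ws = [rng.normal(size=(4, 4)), rng.normal(size=(2, 5))]
y = mlp_feedforward([0.1, -0.2, 0.3], Ws)
```

The loop is exactly the slide's recursion: each layer's outputs become the next layer's inputs.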

Page 24:

Question: How can an NN represent an arbitrary nonlinear input-output relationship?

Summary in plain words:

Given enough hidden neurons, a 3-layer perceptron can approximate an arbitrary continuous multidimensional function to any required accuracy.

Page 25:

Theorem (Cybenko, 1989):

Let σ be any continuous sigmoid function. Then the finite sums of the form:

  f~_k(x) = Σ_{j=1}^{N2} w_kj σ( Σ_{i=1}^{n} w_ji x_i + θ_j ),  k = 1, 2, ..., m

are dense in C(In). In other words, given any f ∈ C(In) and ε > 0, there is a sum, f~(x), of the above form, for which

  |f~(x) - f(x)| < ε  for all x ∈ In

where:

In -- the n-dimensional unit cube [0,1]^n, i.e., the space {x : 0 ≤ x_i ≤ 1, i = 1, 2, ..., n}

C(In) -- the space of continuous functions on In.

e.g., if the original problem is y = f(x), where f ∈ C(In), then f~(x) in the above form is a 3-layer perceptron NN.

Page 26:

Illustration of the Effect of Neural Network Weights

Standard sigmoid function:  1/(1 + e^(-x))

Hidden neuron with input weight 0.5, bias -2.5, and output weight 2.0:  2.0/(1 + e^(-(0.5x - 2.5)))

[Plots of the two functions.]

Suppose this neuron is z_i^l. Weights w_ij^l (or w_ki^(l+1)) affect the figure horizontally (or vertically). Values of the w's are not unique, e.g., by changing signs of the 3 values in the example above.

Page 27:

Illustration of the Effect of Neural Network Weights
- Example with 1 input

[Figure: a 1-input network whose hidden neurons have biases -20, -2.5, and 1, input weights 2.0, 0.5, and 2.0, output weights 0.8, -4.5, and 1.0, and output bias -26; the plot shows the resulting output as a sum of shifted sigmoids.]

Page 28:

Illustration of the Effect of Neural Network Weights
- Example with 2 inputs

[Figure: a 2-input network with hidden neurons z1-z4 (biases -20, -0.25, 1, 0; input weights 0.0, 1.0, 1.0, 1.0, 0.0, ...) feeding output y through weights 0.7, -1.5, 2.3, 1.0, -1.0 with output bias -16.]

z1 as a function of x1 and x2:  z1 = 1/(1 + exp(-x1))

[Surface plot of z1 over the (x1, x2) plane.]

Page 29:

Illustration of the Effect of Neural Network Weights
- Example with 2 inputs

Assuming arctan activation function

[Figure: a 2-input network, all biases 0; hidden neurons z1-z4 receive x1 and x2 through weights 4.0 and 0.1 in different combinations, and feed the output y.]

x1  x2  z1  z2  z3  z4
-1  -1  -1  -1  -1   0
+1  -1  +1  -1   0   0
-1  +1  -1  +1   0   0
+1  +1  +1  +1  +1   0

Page 30:

Illustration of the Effect of Neural Network Weights
- Example with 2 inputs

Assuming arctan activation function

[Figure: a 2-input network, all biases 0; hidden neurons z1-z4 receive x1 and x2 through weights +4 and -4 in different combinations, and feed the output y.]

x1  x2  z1  z2  z3  z4
-1  -1  -1  +1   0   0
+1  -1   0   0  +1  -1
-1  +1   0   0  -1  +1
+1  +1  +1  -1   0   0

Page 31:

Illustration of the Effect of Neural Network Bias Parameters
- Example with 2 inputs

Assuming arctan activation function

[Figure: a 2-input network; hidden neurons z1-z4 have biases -4, 10, -12, and -4, and receive x1 and x2 through weights 0.1 and 4 in different combinations, feeding the output y.]

x1   x2   z1  z2  z3  z4
-1   -1   -1  +1  -1  -1
-1   +1   -1  +1  -1  -1
+1   -1   -1  +1  -1  -1
+1   +1   -1  +1  -1  +1
40   40   +1  +1  +1  +1
-70  -70  -1  -1  -1  -1

Page 32:

Effect of Neural Network Weights and Inputs on Neural Network Outputs

As the values of the neural network inputs x change, different neurons will respond differently, resulting in y being a "rich" function of x. In this way, y becomes a nonlinear function of x.

When the connection weights and biases change, y will become a different function of x:

y = f(x, w)

[Figure: the 2-input example network from Page 28.]

Page 33:

Question: How many neurons are needed?

Essence: the degree of non-linearity in the original problem. Highly nonlinear problems need more neurons; smoother problems need fewer neurons.

Too many neurons -- may lead to over-learning

Too few neurons -- will not represent the problem well enough

Solution: experience, trial/error, adaptive schemes

Page 34:

Question: How many layers are needed?

3 or more layers are necessary and sufficient for arbitrary nonlinear approximation

3 or 4 layers are used more frequently

more layers allow more effective representation of hierarchical information in the original problem

Page 35:

Radial Basis Function Network (RBF)

Structure:

[Figure: inputs x1, ..., xn feed RBF neurons Z1, Z2, ..., ZN, parameterized by centers cij and width factors λij; the neurons feed outputs y1, ..., ym through weights wij.]

Page 36:

RBF Function (multi-quadratic)

σ(γ) = (γ² + c²)^(1/2),  c > 0

[Plot of σ(γ) for a = 0.4 and a = 0.9.]

Page 37:

RBF Function (Gaussian)

σ(γ) = exp(-(γ/λ)²)

[Plot of σ(γ) for λ = 1 and λ = 0.2.]

Page 38:

RBF Feedforward:

  y_k = Σ_{i=0}^{N} w_ki z_i,  k = 1, 2, ..., m

where z_i = σ(γ_i), z_0 = 1, and

  γ_i = sqrt( Σ_{j=1}^{n} ((x_j - c_ij)/λ_ij)² ),  i = 1, 2, ..., N

c_ij is the center of the radial basis function, λ_ij is the width factor. So the set of parameters c_ij (i = 1, 2, ..., N, j = 1, ..., n) represents the centers of the RBF. The parameters λ_ij (i = 1, 2, ..., N, j = 1, ..., n) represent the "standard deviation" of the RBF.

Universal Approximation Theorem (RBF) -- (Krzyzak, Linder & Lugosi, 1996): An RBF network exists such that it will approximate an arbitrary continuous y = f(x) to any accuracy required.
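A sketch of this feedforward for a Gaussian RBF network (the function name and example sizes are made up; the width factors λ_ij are already folded into γ_i, so the Gaussian is applied as σ(γ) = exp(-γ²) here):

```python
import numpy as np

def rbf_feedforward(x, c, lam, w):
    """Gaussian RBF network. c, lam: (N, n) centers c_ij and widths
    lambda_ij; w: (m, N + 1) output weights, column 0 multiplying z_0 = 1."""
    x = np.asarray(x, dtype=float)
    gamma = np.sqrt((((x - c) / lam) ** 2).sum(axis=1))  # gamma_i, per the slide
    z = np.exp(-gamma ** 2)                              # z_i = sigma(gamma_i)
    z = np.concatenate(([1.0], z))                       # z_0 = 1 (bias term)
    return w @ z                                         # y_k = sum_i w_ki z_i

# Example with N = 5 neurons, n = 3 inputs, m = 2 outputs.
rng = np.random.default_rng(2)
N, n, m = 5, 3, 2
c = rng.uniform(-1, 1, size=(N, n))
lam = np.full((N, n), 0.5)
w = rng.normal(size=(m, N + 1))
y = rbf_feedforward([0.2, -0.1, 0.4], c, lam, w)
```

Each hidden neuron responds strongly (z_i = 1) when x sits at its center c_i and fades as x moves away, at a rate set by λ_ij.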

Page 39:

Wavelet Neural Network Structure:

[Figure: inputs x1, ..., xn feed wavelet neurons Z1, Z2, ..., ZN with dilation factors a1, a2, ..., aN and translations tij; the neurons feed outputs y1, ..., ym through weights wij.]

For hidden neuron j:  z_j = Ψ((x - t_j)/a_j) = ψ(γ_j)

where t_j = [t_j1, t_j2, ..., t_jn]^T is the translation vector, a_j is the dilation factor, and Ψ is a wavelet function:

  ψ(γ_j) = (n - γ_j²) e^(-γ_j²/2),  where  γ_j = ||(x - t_j)/a_j|| = sqrt( Σ_{i=1}^{n} ((x_i - t_ji)/a_j)² )
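A single wavelet hidden neuron can be sketched as follows (assuming the radial "Mexican hat" form above; the function name is hypothetical):

```python
import numpy as np

def wavelet_neuron(x, t_j, a_j):
    """z_j = psi(gamma_j) with psi(g) = (n - g^2) * exp(-g^2 / 2),
    gamma_j = ||(x - t_j)/a_j|| as on the slide."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gamma = np.linalg.norm((x - t_j) / a_j)   # gamma_j
    return (n - gamma ** 2) * np.exp(-gamma ** 2 / 2.0)
```

At the translation center x = t_j the neuron peaks at value n, and it decays (with a sign change) as x moves away; the dilation a_j stretches or compresses this localized bump.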

Page 40:

Wavelet Function

[Plot of the wavelet z = ψ(x) for dilation factors a1 = 1 and a1 = 5.]

Page 41:

Wavelet Transform:

R -- space of a real variable

R^n -- n-dimensional space, i.e., space of vectors of n real variables

A function f : R^n → R is radial if a function g : R → R exists such that f(x) = g(||x||), x ∈ R^n.

If ψ(x) is radial, its Fourier transform ψ^(ω) is also radial. Let ψ^(ω) = η(||ω||). Then ψ(x) is a wavelet function if

  C_ψ = (2π)^n ∫_0^∞ |η(ξ)|² ξ^(-1) dξ < ∞

The wavelet transform of f(x) is:

  w(a, t) = ∫_{R^n} a^(-n/2) f(x) ψ((x - t)/a) dx

The inverse transform is:

  f(x) = (1/C_ψ) ∫_0^∞ ∫_{R^n} a^(-n/2) w(a, t) ψ((x - t)/a) a^(-(n+1)) dt da

Page 42:

Feedforward Neural Networks

The neural network accepts the input information sent to input neurons, and proceeds to produce the response at the output neurons. There is no feedback from neurons at layer l back to neurons at layer k, k ≤ l.

Examples of feedforward neural networks:

Multilayer Perceptrons (MLP)

Radial Basis Function Networks (RBF)

Wavelet Networks

Page 43:

Recurrent Neural Network (RNN):
Discrete Time Domain

[Figure: a feedforward neural network whose inputs are x(t), the delayed inputs x(t-τ), x(t-2τ), and the delayed outputs y(t-τ), y(t-2τ), y(t-3τ), and whose output is y(t).]

The neural network output is a function of its present input, and a history of its input and output.
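This arrangement can be sketched by wrapping any feedforward map in buffers of past inputs and outputs (the feedforward core here is an arbitrary random one-hidden-layer MLP, and the particular delay taps are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
W1 = rng.normal(scale=0.3, size=(6, 6))   # hidden weights (arbitrary demo values)
W2 = rng.normal(scale=0.3, size=6)        # output weights

def feedforward_core(v):
    """Any feedforward NN would do here; a tiny tanh MLP for the sketch."""
    return np.tanh(W1 @ v) @ W2

def run_rnn(x_seq):
    """y(t) = NN(x(t), x(t-1), x(t-2), y(t-1), y(t-2), y(t-3))."""
    xs, ys = [0.0, 0.0], [0.0, 0.0, 0.0]  # zero initial history
    out = []
    for x in x_seq:
        v = np.array([x, xs[-1], xs[-2], ys[-1], ys[-2], ys[-3]])
        y = feedforward_core(v)           # feedforward step
        out.append(y)
        xs.append(x)                      # shift the input history
        ys.append(y)                      # feed the output back
    return out

ys = run_rnn(np.sin(np.linspace(0, 3, 20)))
```

The recurrence lives entirely in the buffers; the network itself stays a plain feedforward block.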

Page 44:

Dynamic Neural Networks (DNN)
(continuous time domain)

[Figure: a feedforward neural network whose inputs are x(t), x'(t), x''(t) and y'(t), y''(t), y'''(t), and whose output is y(t).]

The neural network directly represents the dynamic input-output relationship of the problem. The input-output signals and their time derivatives are related through a feedforward network.

Page 45:

Self-Organizing Maps (SOM), (Kohonen, 1984)

Clustering Problem:

Given training data x_k, k = 1, 2, ..., P, find cluster centers c_i, i = 1, 2, ..., N.

Basic Clustering Algorithm:

For each cluster i (i = 1, 2, ..., N), initialize an index set Ri = {empty}, and set the center c_i to an initial guess.

For x_k, k = 1, 2, ..., P, find the cluster that is closest to x_k, i.e., find c_i such that

  ||c_i - x_k|| ≤ ||c_j - x_k||,  ∀ j, j ≠ i

Let Ri = Ri ∪ {k}. Then the center is adjusted:

  c_i = (1/size(Ri)) Σ_{k∈Ri} x_k

and the process continues until c_i does not move any more.
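A sketch of this basic clustering algorithm (k-means style; the function name, the spread-out initialization, and the test data are made up for illustration):

```python
import numpy as np

def cluster(X, N, iters=20):
    """Assign each x_k to its nearest center c_i, then move each center
    to the mean of its assigned samples, as on the slide."""
    idx = np.linspace(0, len(X) - 1, N).astype(int)
    c = X[idx].copy()                                  # initial guess: spread over the data
    for _ in range(iters):
        # find the nearest center c_i for every x_k
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        R = d2.argmin(axis=1)                          # index sets R_i
        for i in range(N):
            if (R == i).any():
                c[i] = X[R == i].mean(axis=0)          # c_i = mean over R_i
    return c

# Two well-separated blobs -> centers should land near (0,0) and (5,5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centers = cluster(X, N=2)
```

The fixed iteration count stands in for the slide's "until c_i does not move any more" stopping rule.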

Page 46:

Self Organizing Maps (SOM)

SOM is a one or two dimensional array of neurons where the neighboring neurons in the map correspond to the neighboring cluster centers in the input data space.

Principle of Topographic Map Information (Kohonen, 1990): The spatial location of an output neuron in the topographic map corresponds to a particular domain or features of the input data.

[Figure: input feeding a two-dimensional array of neurons.]

Page 47:

Training of SOM:

For each training sample x_k, k = 1, 2, ..., P, find the nearest cluster c_ij, such that

  ||c_ij - x_k|| ≤ ||c_pq - x_k||,  ∀ p, q, p ≠ i, q ≠ j

Then update c_ij and the neighboring centers:

  c_pq = c_pq + α(t)(x_k - c_pq),  |p - i| ≤ Nc,  |q - j| ≤ Nc

where:

Nc -- size of neighborhood

α(t) -- a positive value decaying as training proceeds
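One SOM training step can be sketched as follows (a minimal illustration; the grid size, the learning rate held constant instead of decaying, and the square neighborhood are assumptions):

```python
import numpy as np

def som_step(C, x, eta, Nc):
    """One SOM update: C is a (rows, cols, n) grid of centers c_pq;
    move the winner c_ij and its neighbors within Nc toward x."""
    d2 = ((C - x) ** 2).sum(axis=2)
    i, j = np.unravel_index(d2.argmin(), d2.shape)     # nearest center c_ij
    for p in range(max(0, i - Nc), min(C.shape[0], i + Nc + 1)):
        for q in range(max(0, j - Nc), min(C.shape[1], j + Nc + 1)):
            C[p, q] += eta * (x - C[p, q])             # c_pq += eta(t)(x - c_pq)
    return i, j

rng = np.random.default_rng(4)
C = rng.uniform(size=(5, 5, 2))                        # 5x5 map, 2-D inputs
winner = som_step(C, np.array([0.9, 0.9]), eta=0.5, Nc=1)
```

Because neighbors of the winner move too, nearby neurons in the map end up with nearby centers in the input space, which is the topographic-map property stated above.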

Page 48:

Filter Clustering Example (Burrascano et al)

[Figure: an SOM control block selects, for each input x, either a general MLP or one of four specific MLPs (for groups 1-4) to produce the output y.]

Page 49:

Other Advanced Structures:

Knowledge Based Neural Networks -- embedding application specific knowledge into networks

[Figure: pure neural networks combined with knowledge in the form of empirical formulas or equivalent circuits.]