2101INT – Principles of Intelligent Systems Lecture 10.
-
Upload
kevin-willis -
Category
Documents
-
view
214 -
download
0
Transcript of 2101INT – Principles of Intelligent Systems Lecture 10.
2101INT – Principles of Intelligent Systems
Lecture 10
Neural Networks
Since we are interested in creating artificial intelligence in systems, it is reasonable that we would attempt to mimic the human brain
The concept of an artificial neuron has been around since at least 1943
This type of field is usually described as artificial neural networks (ANNs), connectionism, parallel distributed processing or neural computation
This field is also of interested to cognitive psychologists who seek to better understand the human brain
Neurons
A neuron is a cell in the brain that collects, processes and disseminates electric signals
On their own, neurons are not particularly complex Much of the brain’s information-processing capacity is
thought to stem from the number of and inter-relationships between the neurons.
As such is an emergent property of the neurons, since each of its own does not have the power of the whole
The human brain contains about 1010 neurons, each on average connected to about 10,000 others
Neurons
Signals in a brain are noisy “spike trains” of electrical energy
Neurons
Neurons
The axon endings almost touch the dendrites or cell body of the next neuron – termed a synapse
Electrical signals are transferred with the aid of neurotransmitters – chemicals which are released from one neuron and which bind to another
Signal transmission depends on:– quantity of neurotransmitter– number and arrangement of receptors– neurotransmitter re-absorption– etc.
Carbon vs Silicon
Elements: 1014 synapses vs 108 transistors Size: 10-6m vs 10-6m Energy: 30W vs 30W (CPU) Speed: 100Hz vs 109Hz Architecture: Parallel/Distributed vs Serial/Centralised Fault Tolerant: Yes vs a Little Learns: Yes vs Maybe Intelligent: Usually vs Not Yet
Mathematical Model of a Neuron
McCulloch and Pitts (1943)
737
Mathematical Model of a Neuron
ANNs are composed of many units
A link from unit j to unit i propagates the activation of unit j (aj) to unit i
Each link also has weight Wj,i which determines the strength and sign of the link
Each unit computes the weighted sum of its inputs, ini:
And then applies an activation function to derive the output ai:
A bias weight is also present
n
jjiji aWin
0,
n
jjijii aWginga
0,)(
Activation Functions
The activation function should create a “high” output (say 1) when the correct inputs are given and a “low” output (say 0) otherwise
The function also needs to be non-linear (not of the form y = mx + c), otherwise the network as a whole would be a simple linear function, which isn’t particularly powerful
Two choices are the threshold (unit step) function and the sigmoid function = 1/(1+e-x)
738
Neurons as Logic Gates
A single neuron can implement the three most basic Boolean logic functions (and also NAND and NOR)
A single unit cannot represent XOR
738
Usefulness of ANNs
So neurons and ANNs can be designed to exhibit particular behaviours
More often, they are used to learn to recognise/classify particular patterns.
Rosenblatt’s work (1958) explicitly considered this problem when a teacher is providing advice to the ANN
From this we derive the term supervised classification or supervised learning
It was he who introduced perceptrons: ANNs that change their link weights when they make incorrect decisions/classifications
Networks of Neurons
Two main categories: feed-forward networks and recurrent networks
Feed-forward networks are acyclic: all links feed forward in the network. A feed forward network is simply a function of its current input. It has no internal state.
Recurrent networks are cyclic: links can feed back into themselves. Thus, the activation levels of the network form a dynamic system, and can exhibit either stable, oscillatory or even chaotic behaviour.
A recurrent network’s response will depend on its initial state, which depends on prior inputs
738
A Single Layer Perceptron
A network with all inputs connected directly to the outputs is called a single-layer neural network or a perceptron network
740
What can single layer networks do?
Already seen that they can implement simple Boolean logic functions
Can also represent other more complex functions, like the majority function: returns T iff more than half of its inputs are T
Decision tree representation would require O(2n) nodes
So why can’t we represent XOR?
Linear Seperability
The output of a threshold perceptron can be described as:
or as vectors:
W•x > 0
This equation defines a hyperplane in the input space
740
00
n
jjj xW
Linear Seperability cont.
Functions that can be divided by such a hyperplane are termed linear seperable
In general, threshold perceptrons can represent only linearly seperable functions
There are times when such networks are sufficiently appropriate however
741
How does the brain learn?
Brains learn by altering the strength of connections between neurons
They learn online, often without the benefit of explicit training examples
Hebb’s Postulate:"When an axon of cell A... excites[s] cell B and repeatedly or
persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells so that A's efficiency as one of the cells firing B is increased."
Learning
What is learning? Rosenblatt (1958) provided a learning scheme with the
property that: “if the patterns of the training set can be seperated by some choice of weights and threshold, then the scheme will eventually yield a satisfactory setting of the weights”
1. Pick a “representative set of patterns” – training set
2. Expose network to this set to adjust synaptic weights using a learning rule – such as minimise the error
741
Learning
Minimise the error, in this case sum of square error SSE:
Use gradient descent, to minimise the error wrt each particular link weight. Use chain rule:
741
y
u
u
x
y
x
22 )(2
1
2
1xWhyErrE
Learning cont.
Using the chain rule:
First term reduces to:
Second term:
741
22 )(2
1
2
1xWhyErrE
jj W
Err
Err
E
W
E
Err
Err
Err
Err
E
221
n
jjj
jj
xWygWW
Err
0
Learning cont.
Combining these:
Definitions of terms– g’() is the derivative of the activation function– ini is the weighted sum of inputs
741
jj
jjj
xingErrW
E
xingErrW
Err
Err
E
W
E
)('
)('
Updating weights
Having calculated the impact of each weight on the overall error, can now adjust each Wj accordingly:
Note that the minus has been dropped from the previous equation: +ve error requires increased output
is called the learning rate The network is shown each training example, and the
weights are updated Exposure to a complete set of training examples is
termed an epoch The process is repeated until convergence occurs
742
jjj xingErrWW )('
Learning Examples
743
Learning Examples
748
Graph shows total error on a training set of 100 examples
Multilayer Feed Forward Networks
Will now consider networks where the inputs do not connect directly to the outputs
Introduce some intervening units between input and output which are termed hidden units
Why?
744
Effect of Hidden Units
Call these networks multilayer perceptron networks Hidden layers remove the restriction to linearly
seperable functions Using a sigmoid activation function, two hidden units
can classify a ridge, 4 a bump, >4 more bumps, etc
744
Weight learning in MLPs
Similar procedure as for a single layer, except that the error must be propagated back through the hidden layers
Gives rise to back propagation learning The equations for back propagation learning are
derived on pages 745-747 of the text
745pp
Performance comparison MLP vs Single
Learning in ANNs
Readings for Week 10
1. Russell and Norvig, Chapter 1
2. http://www.willamette.edu/~gorr/classes/cs449/brain.html
25pp