Tools of AI
Marcin Sydow
Repetition: Decision Table, Learning
Evaluation and Overfitting
Neural Networks: Perceptron, Limitations, Learning
Summary
Tools of AI
1. Evaluation and Overfitting
2. Introduction to Neural Networks: Perceptron
Marcin Sydow
12.03.09
Topics covered by this lecture:
(Repetition: Decision Table, Supervised Learning)
Evaluation, Overfitting
Neuron, Perceptron, How Perceptron Learns
Repetition: Decision Table for a Mysterious Outdoor Game
outlook temperature humidity windy PLAY?
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
Decision Table: Cases and Attributes
Knowledge can be built on previously observed cases
Each case is described by some attributes of a specified type
(nominal or numeric)
For a given case, each of its attributes has some value (usually)
Decision Table:
cases = rows
attributes = columns
(a basic concept in data mining)
Machine Learning
Task: to learn the relationships between the values of attributes
This knowledge is to be discovered automatically by the machine
There are two main paradigms:
1 Supervised Learning
2 Unsupervised Learning
Supervised Learning
1 given a new case (row): its attribute values are known,
except the decision attribute
2 Task: "predict" the correct value of the decision attribute
3 Learn it on an available training set, for which all
attribute values are known, including the decision attribute
It is called:
classification, when the decision attribute is nominal
regression, when the decision attribute is numeric
Some practical problems: e.g. the training set can contain:
noisy, erroneous or missing attribute values
inconsistent rows (a different decision for the same attribute values)
Outdoor Game - a new case
outlook temperature humidity windy PLAY?
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
overcast cool high true ???
General Scheme of Supervised Learning
1 Data acquisition
2 Data cleaning and pre-processing
3 Division into training set and evaluation set
4 Learning on the training set and evaluating on the evaluation
set (iterative)
5 Using the system
Evaluation and the Problem of Overfitting
After each phase of learning, the system has to be evaluated.
How do we measure how much the system has learnt?
We have the training set. We could count the fraction of
training cases on which the system gives an incorrect answer
(the training error rate).
However, the training set is finite, so the system can often learn it
exactly. Learning the training data strictly is known as the
overfitting problem.
This is not the goal, since an overfitted classifier is unable to
generalise its knowledge to unknown cases. (This is similar to
a student learning mathematics by heart, from examples,
without understanding the general rules.)
Overcoming the Problem of Overfitting
If there is enough training data available, it is possible to keep
some part of it as an evaluation set.
Important: No training is done on the evaluation set. It is
used exclusively for evaluating how well the system can
"generalise" the gained knowledge (i.e. to unseen cases).
In practice, training data is often too expensive or too scarce to
set aside part of it as a fixed evaluation set.
In such cases, other techniques are applied:
cross-validation (most popular)
leave-one-out
bootstrap
(These techniques will be discussed in another lecture)
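The hold-out scheme above can be sketched in a few lines of Python (a minimal illustration: the 70/30 split ratio, the fixed seed and the use of row indices are choices made for this example, not part of the lecture):

```python
import random

def train_eval_split(cases, eval_fraction=0.3, seed=0):
    """Randomly hold out a fraction of the cases as an evaluation set;
    the remaining cases form the training set. No training is ever
    done on the evaluation part."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - eval_fraction))
    return shuffled[:cut], shuffled[cut:]

# e.g. split the 14 rows of the game table into training and evaluation parts
rows = list(range(14))
train, evaluation = train_eval_split(rows)
```

Cross-validation generalises this idea by rotating which part of the data is held out.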
Neural Networks:
Introduction
Neuron
The human neural system has been a natural source of inspiration
for artificial intelligence researchers. Hence the interest in
the neuron, the fundamental unit of the neural system.
Behaviour of a Neuron
Transmits the neural signal from the "inputs" (dendrites:
the endings of a neural cell in contact with other cells) to
the "output" (the neurite, often a very long ending,
transmitting the signal to further neurons)
Non-linear signal processing: the output state is not a
simple sum of the input signals
Dynamically modifies its connections with other neurons via
synapses (connecting elements), which makes it possible to
strengthen or weaken the signal received from other
neurons according to the current task
Perceptron: an Artificial Neuron
The perceptron is a simple mathematical model of a neuron.
Historically, the goal of the work on neural networks was to
gain the ability of generalisation (approximation) and learning,
specific to the human brain (according to one of the definitions
of artificial intelligence).
Currently, artificial neural networks focus on less "ambitious",
but more realistic tasks.
A single perceptron can serve as a classifier or regressor.
The perceptron is a building block in more complex artificial neural
network structures that can solve practical problems:
supervised or unsupervised learning
controlling complex mechanical devices (e.g. robotics)
Perceptron: a simple model of a natural neuron
A perceptron consists of:
n inputs x1, ..., xn, corresponding to dendrites
n weights w1, ..., wn, corresponding to synapses
(each weight wi is attached to the i-th input xi)
a threshold Θ
one output y
(All the variables are real numbers.)
The output y is computed as follows:
y = 1 if ∑_{i=1}^{n} wi·xi = W^T X ≥ Θ (perceptron "activated")
y = 0 otherwise (not "activated")
W, X ∈ R^n denote the vectors of weights and inputs, respectively
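The definition above translates directly into code. A minimal sketch of a discrete perceptron (the function name and the example weights and threshold are illustrative choices):

```python
def perceptron_output(weights, inputs, theta):
    """Discrete perceptron: output 1 ("activated") iff the weighted
    sum of the inputs (the net value W^T X) reaches the threshold."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

# a perceptron with weights (1, 1) and threshold 1.5
y_low = perceptron_output([1.0, 1.0], [1.0, 0.0], 1.5)   # net = 1.0 < 1.5
y_high = perceptron_output([1.0, 1.0], [1.0, 1.0], 1.5)  # net = 2.0 >= 1.5
```

With this choice the perceptron fires only when both inputs are active.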
The perceptron is activated (y=1) only when the dot product
W^T X (sometimes called the "net") reaches the specified
threshold Θ.
We call the perceptron:
discrete if y ∈ {0, 1} (or {−1, 1})
continuous if y ∈ [0, 1] (or [−1, 1])
Perceptron: Geometrical Interpretation
The computation of the perceptron's output has a simple
geometrical interpretation.
Consider the n-dimensional input space (each point here is a
potential input vector X ∈ R^n).
The weight vector W ∈ R^n is the normal vector of the
decision hyperplane.
The perceptron is activated (outputs 1) only if the input vector
X is on the same side of the decision hyperplane as the
weight vector W.
Moreover, the net value (W^T X) is maximal for X close in
direction to W, is 0 if they are orthogonal, and is minimal
(negative) if they point in opposite directions.
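These three regimes of the net value can be checked numerically (a small sketch with made-up vectors):

```python
def net(w, x):
    """Dot product W^T X: the perceptron's net value."""
    return sum(wi * xi for wi, xi in zip(w, x))

w = [1.0, 0.0]           # weight vector pointing along the first axis
x_aligned = [2.0, 0.0]   # same direction as w
x_orthogonal = [0.0, 2.0]
x_opposite = [-2.0, 0.0]

# positive when aligned, zero when orthogonal, negative when opposite
assert net(w, x_aligned) > 0
assert net(w, x_orthogonal) == 0
assert net(w, x_opposite) < 0
```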
Geometrical Interpretation of Perceptron, cont.
The required behaviour of the perceptron can be obtained by
adjusting its weights and threshold appropriately.
The weight vector W determines the "direction" of the decision
hyperplane. The threshold Θ determines how far the decision
hyperplane is moved from the origin (the 0 point).
Example: Perceptrons can simulate logical circuits
A single perceptron with appropriately set weights and threshold
can easily simulate basic logical gates:
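For illustration, one possible choice of weights and thresholds for the basic gates (other choices work as well; the helper names are ours):

```python
def perceptron(weights, inputs, theta):
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

def AND(x1, x2):
    # weights (1, 1), threshold 1.5: activated only when both inputs are 1
    return perceptron([1, 1], [x1, x2], 1.5)

def OR(x1, x2):
    # weights (1, 1), threshold 0.5: activated when at least one input is 1
    return perceptron([1, 1], [x1, x2], 0.5)

def NOT(x1):
    # weight -1, threshold -0.5: activated only when the input is 0
    return perceptron([-1], [x1], -0.5)
```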
Limitations of single perceptron
A single perceptron can "distinguish" (by the value of its
output) only sets of inputs which are linearly separable in
the input space (i.e. there exists an (n−1)-dimensional hyperplane
separating the positive and negative cases).
One of the simplest examples of linearly non-separable sets is
the logical function XOR (exclusive or).
Limitations of a single perceptron
One can see in the pictures below that the functions AND and OR
correspond to linearly separable sets, so each of them can be
modeled by a single perceptron (as shown in the previous
section), while XOR cannot be modeled by any single
perceptron.
Network of Perceptrons
The output of one perceptron can be connected to the input of
other perceptron(s) (as in the neural system). This makes it
possible to extend the computational capabilities of a single
perceptron.
For example, XOR can be simulated by joining 2 perceptrons
and appropriately setting their weights and thresholds.
Remark: by �perceptron� or �multilayer perceptron� one can
also mean a network of connected entities (neurons).
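One such construction, sketched in Python: the first perceptron computes AND of the inputs, and the second receives both inputs plus the AND output with a strong negative weight. The particular weights are one illustrative choice among many:

```python
def perceptron(weights, inputs, theta):
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

def XOR(x1, x2):
    # first perceptron: an AND gate on the two inputs
    h = perceptron([1, 1], [x1, x2], 1.5)
    # second perceptron: sees both inputs and the AND output;
    # the -2 weight on h suppresses activation in the (1, 1) case
    return perceptron([1, 1, -2], [x1, x2, h], 0.5)
```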
Example: Iris classi�cation
A single perceptron can distinguish Iris-setosa from the 2 other
sub-species.
However, it cannot exactly recognise either of the other 2
sub-species.
Perceptron: Overcoming the limitations
The discovery of the above limitations (1969) blocked further
development of neural networks for years.
An obvious method to cope with this problem is to join
perceptrons together (e.g. two perceptrons are enough to
model XOR) to form artificial neural networks.
However, it is mathematically far more difficult to adjust more
complex networks to our needs than in the case of a single
perceptron.
Fortunately, the development of efficient techniques for learning
perceptron networks (in the 1980s), i.e. automatic
tuning of their weights on the basis of positive and negative
examples, caused the "renaissance" of artificial neural networks.
Single Perceptron as a Classi�er
A single perceptron can be used as a tool in supervised machine
learning, as a binary classifier (output equal to 0 or 1).
To achieve this, we present the perceptron with a training set of
pairs:
input vector
correct answer (0 or 1)
The perceptron can "learn" the correct answers by appropriately
setting its weight vector.
Learning Perceptron
We apply the "training examples" one by one. If the current
output is the same as the desired one, we pass to the next example.
If it is incorrect, we apply the following "perceptron learning
rule" to the vector of its weights:
W′ = W + (d − y)αX
d - the desired (correct) output
y - the actual output
0 < α < 1 - a parameter (the learning rate), tuned experimentally
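The rule is a one-line update in code (a sketch; the function name and the value of α are illustrative):

```python
def learning_rule(w, x, d, y, alpha=0.1):
    """One application of the rule W' = W + (d - y) * alpha * X."""
    return [wi + (d - y) * alpha * xi for wi, xi in zip(w, x)]

# output was 0 but 1 was desired: W moves towards X
w_new = learning_rule([0.5, -0.5], [1.0, 1.0], d=1, y=0, alpha=0.1)
# output was correct: W is unchanged, since d - y = 0
w_same = learning_rule([0.5, -0.5], [1.0, 1.0], d=1, y=1, alpha=0.1)
```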
Interpretation of Perceptron Learning Rule
To "force" the perceptron to give the desired outputs, its weight
vector should be maximally "close" to the positive (y=1) cases.
Hence the formula:
W′ = W + (d − y)αX
move W towards a "positive" X if the perceptron outputs 0
instead of 1 ("too weak activation")
move W away from a "negative" X if it outputs 1 instead of
0 ("too strong activation")
Usually, the whole training set has to be passed several times to
obtain the desired weights of the perceptron.
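The whole procedure, several passes of the learning rule over the training set, can be sketched as follows (an illustrative implementation: it already absorbs the threshold into an extra weight on a constant −1 input, a trick described later in the lecture, and the α and epoch values are arbitrary choices):

```python
def train_perceptron(samples, n_inputs, alpha=0.2, epochs=50):
    """Sweep the training set repeatedly, applying
    W' = W + (d - y) * alpha * X whenever the output is wrong.
    The threshold is carried as the last weight, attached to a
    constant -1 input, so the activation test is simply net >= 0."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for x, d in samples:
            xe = list(x) + [-1.0]  # append the fake -1 input
            y = 1 if sum(wi * xi for wi, xi in zip(w, xe)) >= 0 else 0
            if y != d:
                w = [wi + (d - y) * alpha * xi for wi, xi in zip(w, xe)]
    return w

def predict(w, x):
    xe = list(x) + [-1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xe)) >= 0 else 0

# learn the (linearly separable) AND function from examples
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(and_samples, n_inputs=2)
```

Since AND is linearly separable, the perceptron convergence theorem guarantees that the loop stops making mistakes after finitely many updates.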
The Role of Threshold
If the activation threshold Θ is set to 0, the perceptron can
"distinguish" only classes which are separable by a decision
hyperplane containing the origin of the input space (the 0 vector).
To "move" the decision hyperplane away from the origin, the
threshold has to be set to some non-zero value.
Incorporating Threshold into Learning
W^T X ≥ Θ
W^T X − Θ ≥ 0
Hence, X can be extended by appending −1 (a fake (n+1)-th
input) and W can be extended by appending Θ. Denote by W′
and X′ the extended (n+1)-dimensional vectors. Now we
have:
W′^T X′ ≥ 0, the same form as "without" the threshold.
Now the learning rule can be applied to the extended vectors
W′ and X′.
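The extension can be checked numerically (a small sketch with made-up weights, inputs and threshold):

```python
def extend(x, w, theta):
    """Absorb the threshold: W^T X >= theta  <=>  W'^T X' >= 0,
    where X' = (x1, ..., xn, -1) and W' = (w1, ..., wn, theta)."""
    return list(x) + [-1.0], list(w) + [theta]

x_ext, w_ext = extend([1.0, 2.0], [0.5, 0.5], theta=1.0)
net_extended = sum(wi * xi for wi, xi in zip(w_ext, x_ext))
# equals the original net (0.5*1 + 0.5*2 = 1.5) minus theta (1.0)
```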
Questions/Problems:
the problem of over�tting in supervised learning
training set and evaluation set
desired properties of arti�cial neuron
mathematical formulation of perceptron
how the perceptron computes its output
geometric interpretation of perceptron's computation
mathematical limitations of perceptron
learning rule for perceptron
geometric interpretation of perceptron's learning rule
Thank you for your attention