Classification - Artificial Neural Network (ANN)
Transcript of Classification - Artificial Neural Network (ANN)
Perceptron Model Multilayer ANN Discussions
Huiping Cao
Huiping Cao, Slide 1/21
Outline
1 Perceptron Model
2 Multilayer ANN
3 Discussions
Motivation
Inspired by biological neural systems
We are interested in replicating the biological function. The first step is to replicate the biological structure.
From the bird to the plane:
Neurons: nerve cells
Axons: strands of fibers that link neurons
Dendrites: extensions from the cell body of the neuron; connect neurons and axons
Synapse: the contact point between a dendrite and an axon
Perceptron Model – Example
x1 x2 x3 y
1 0 0 -1
1 0 1 1
1 1 0 1
1 1 1 1
0 0 1 -1
0 1 0 -1
0 1 1 1
0 0 0 -1
[Figure: a perceptron with input nodes x1, x2, x3 connected by weighted links (weights 0.3, 0.3, 0.3) to a weighted-sum output node with threshold t = 0.4, producing output y]
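The example above can be checked directly. The following is a minimal sketch (in Python, whereas the slides later use R) of the slide's perceptron: weights 0.3 on each input and threshold t = 0.4, applied to every row of the truth table.

```python
# Hypothetical sketch of the perceptron on this slide: weights 0.3 on each
# input and threshold t = 0.4, checked against all eight rows of the table.
def perceptron(x1, x2, x3, w=0.3, t=0.4):
    s = w * x1 + w * x2 + w * x3       # weighted sum of the inputs
    return 1 if s - t > 0 else -1      # threshold activation

table = [
    ((1, 0, 0), -1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),
    ((0, 0, 1), -1), ((0, 1, 0), -1), ((0, 1, 1), 1), ((0, 0, 0), -1),
]
for (x1, x2, x3), y in table:
    assert perceptron(x1, x2, x3) == y  # the model reproduces every row
```

In effect, the output is +1 exactly when at least two of the three inputs are 1, since only then does the weighted sum (at least 0.6) exceed the threshold 0.4.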
Perceptron Model – Concepts
A perceptron is a single processing unit of a neural net.
Nodes in a neural network architecture are commonly known as neurons or units.
Input nodes: represent the input attributes.
Output node: represents the model output.
Weighted links: emulate the strength of the synaptic connection between neurons.
For the example above:
y = { 1,  if 0.3x1 + 0.3x2 + 0.3x3 − 0.4 > 0
    { −1, if 0.3x1 + 0.3x2 + 0.3x3 − 0.4 < 0
Activation functions
y = sign[wd xd + wd−1 xd−1 + · · · + w1 x1 + w0 x0] = sign[w · x]
where w0 = −t and x0 = 1.
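The compact form y = sign[w · x] folds the threshold into the weight vector. A small sketch, assuming the weights 0.3 and t = 0.4 from the example slide (the helper names are our own):

```python
# Sketch of y = sign(w . x): the threshold t is absorbed into the weight
# vector as w0 = -t, with a constant input x0 = 1 prepended to each instance.
def sign(v):
    return 1 if v > 0 else -1

def predict(x, w):                      # x and w both include position 0
    return sign(sum(wj * xj for wj, xj in zip(w, x)))

w = [-0.4, 0.3, 0.3, 0.3]               # w0 = -t = -0.4, then w1..w3 = 0.3
assert predict([1, 1, 0, 1], w) == 1    # x0 = 1 prepended to (1, 0, 1)
assert predict([1, 1, 0, 0], w) == -1   # x0 = 1 prepended to (1, 0, 0)
```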
Perceptron Model – General Formalization
x1 ------ w1 ------\
                    \
xj ------ wj ------- > in ----> g ---> output
                    /
xn ------ wn ------/
A perceptron takes n inputs x1 to xn. Each input xj has an associated weight wj.
The output of the perceptron is produced by a two-stage process.
The first stage computes the weighted sum of the inputs: in = ∑_{j=1}^n wj xj.
The second stage applies an activation function g to in. For perceptrons, the activation function used is the threshold activation function:
y = g(in) = { 1 if in > threshold σ
            { −1 otherwise
Perceptron Model – Activation Function
The threshold σ is a parameter of the activation function.
y = g(in) =
{1 if in > threshold σ−1 otherwise
An alternative way of encoding a threshold activation function is to create a special input x0.
This input is always fixed at −1 for every instance. The weight w0 associated with this input can then be used in place of the threshold σ (i.e., w0 = σ).
Thus,
in = ∑_{j=0}^n wj xj, where x0 = −1 and w0 = σ
y = g(in) = { 1 if in > 0
            { −1 otherwise
Activation functions
Sigmoid function: sigmoid(s) = θ(s) = e^s / (1 + e^s). Its output is between 0 and 1. Its shape looks like a flattened-out 's'. It is also called a logistic function.
[Figure: plot of the sigmoid function over s ∈ [−6, 6], rising from 0 to 1]
Hyperbolic tangent function: tanh(s) = (e^s − e^{−s}) / (e^s + e^{−s})
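Both activation functions above can be written down directly from their definitions; a small numeric sketch:

```python
import math

# The two activation functions defined above, with sanity checks on their
# ranges and symmetry. Implemented from the formulas on this slide.
def sigmoid(s):
    return math.exp(s) / (1.0 + math.exp(s))

def tanh(s):
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

assert sigmoid(0.0) == 0.5                       # midpoint of the 'S' shape
assert 0.0 < sigmoid(-6.0) < 0.01                # saturates toward 0
assert 0.99 < sigmoid(6.0) < 1.0                 # saturates toward 1
assert abs(tanh(0.0)) < 1e-12                    # tanh is centered at 0
assert abs(tanh(1.0) - math.tanh(1.0)) < 1e-12   # matches the library tanh
```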
Learning Perceptron Model
Perceptron Learning Algorithm (PLA)
Training a perceptron model: adapting the weights of the links until they fit the input-output relationships of the underlying data.
Input: D = {(xi, yi) | i = 1, 2, ..., N}, the set of training examples.
Initialize the weight vector with random values w^(0).
repeat (iteration number k)
    for each training point (xi, yi) ∈ D do
        Compute the predicted output yi^(k)
        for each weight wj do
            Update wj^(k+1) = wj^(k) + λ(yi − yi^(k)) xij
until stopping condition is met
Perceptron Learning Algorithm (PLA)
Weight update formula: the new weight is a combination of the old weight wj^(k) and a term proportional to the prediction error (yi − yi^(k)):
wj^(k+1) = wj^(k) + λ(yi − yi^(k)) xij
λ: the learning rate, ∈ [0, 1]; it can be fixed or adaptive.
The model is linear in its parameters w and attributes x.
Perceptron Model – Linearly Separable
A perceptron is a classifier: it takes n continuous inputs and returns a Boolean classification (+1 or −1).
The set of all points s.t. in(x) = 0 defines a hyperplane in the space of inputs.
On one side of this hyperplane, all points are classified as +1.
On the other side, they are classified as −1.
Decision boundary: a surface that divides the input space into regions of different classes.
Perceptrons always have a linear decision boundary.
Linearly separable: a classification problem whose classes can be separated by a linear decision boundary.
Perceptron examples – Linearly Separable
Perceptrons can represent many (but not ALL) Boolean functions on two inputs.
A perceptron can represent the AND function by setting the weights to +1 and the threshold to 1.5.
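A quick check of this claim (a sketch assuming 0/1 inputs and ±1 outputs, as in the earlier examples):

```python
# Weights +1, +1 and threshold 1.5 implement AND on 0/1 inputs: only when
# both inputs are 1 does the weighted sum (2) exceed the threshold (1.5).
def and_perceptron(x1, x2):
    return 1 if (1.0 * x1 + 1.0 * x2) > 1.5 else -1

assert and_perceptron(1, 1) == 1
assert and_perceptron(1, 0) == -1
assert and_perceptron(0, 1) == -1
assert and_perceptron(0, 0) == -1
```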
Perceptron examples – Non-linearly Separable
XOR function on two inputs?
Intuitively: we cannot find a line that separates the four points so that the points on each side have the same class label.
Formally:
Suppose that we can represent XOR using weights w = (w0, w1, w2).
(1) (x1 = −1) ∧ (x2 = −1), output is −1: −w0 − w1 − w2 ≤ 0
(2) (x1 = −1) ∧ (x2 = 1), output is 1: −w0 − w1 + w2 > 0
Subtracting (1) from (2) gives 2w2 > 0, so w2 is positive.
(3) (x1 = 1) ∧ (x2 = 1), output is −1: −w0 + w1 + w2 ≤ 0
(4) (x1 = 1) ∧ (x2 = −1), output is 1: −w0 + w1 − w2 > 0
Subtracting (3) from (4) gives −2w2 > 0, so w2 is negative.
Contradiction!
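A brute-force companion to the proof: no weight vector on a coarse grid classifies all four XOR points correctly. The grid is only illustrative (it cannot check all real weights), but the algebraic argument above shows that no weights exist at all.

```python
import itertools

# XOR over inputs in {-1, +1} (label +1 when the inputs differ), with the
# x0 = -1 encoding from the earlier slide: in = -w0 + w1*x1 + w2*x2.
def predict(w0, w1, w2, x1, x2):
    return 1 if (-w0 + w1 * x1 + w2 * x2) > 0 else -1

xor_points = [((-1, -1), -1), ((-1, 1), 1), ((1, 1), -1), ((1, -1), 1)]
grid = [i / 4.0 for i in range(-8, 9)]           # -2.0 .. 2.0, step 0.25
found = any(all(predict(w0, w1, w2, *x) == y for x, y in xor_points)
            for w0, w1, w2 in itertools.product(grid, repeat=3))
assert not found                                  # no linear separator found
```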
Perceptron Properties
If the problem is linearly separable, PLA is guaranteed toconverge to an optimal solution.
If the problem is not linearly separable, the algorithm fails to converge.
The neural network
[Figure: a multilayer network with an input layer (x1–x5), a hidden layer, and an output layer producing y]
Hidden layer, hidden nodes
Feed-forward NN: the nodes in one layer are connected only to the nodes in the next layer.
Recurrent NN: the links may connect nodes within the same layer or nodes from one layer to previous layers.
Activation functions: sign, linear, sigmoid (logistic), hyperbolic tangent.
Learning the ANN model
Target function: the goal of the ANN learning algorithm is to determine a set of weights w that minimizes the total sum of squared errors:
E(w) = (1/2) ∑_{i=1}^N (yi − ŷi)^2
When ŷ is a linear function of its parameters, we can substitute ŷ = w · x into the above equation; the error function then becomes quadratic in its parameters.
Gradient descent
In most cases, the output of an ANN is a nonlinear function of its parameters because of the choice of its activation functions (e.g., the sigmoid or tanh function).
It is no longer straightforward to derive a solution for w that is guaranteed to be globally optimal.
Greedy algorithms (e.g., based on gradient descent methods) are typically developed.
Update formula:
wj = wj − λ ∂E(w)/∂wj
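The update formula can be sketched for the squared-error target E(w) = (1/2) ∑_i (yi − w · xi)^2 with a linear model, where ∂E/∂wj = −∑_i (yi − w · xi) xij. The toy data (y = 2x) and learning rate below are our own illustration:

```python
# One gradient-descent step: w_j <- w_j - lambda * dE/dw_j.
def gradient_step(w, data, lam):
    grad = [0.0] * len(w)
    for x, y in data:
        err = y - sum(wj * xj for wj, xj in zip(w, x))   # (y_i - y_hat_i)
        for j, xj in enumerate(x):
            grad[j] += -err * xj                         # dE/dw_j contribution
    return [wj - lam * gj for wj, gj in zip(w, grad)]

data = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]     # y = 2x exactly
w = [0.0]
for _ in range(200):
    w = gradient_step(w, data, lam=0.05)
assert abs(w[0] - 2.0) < 1e-6        # converges to the exact solution w = 2
```

Because this error function is quadratic in w, gradient descent converges to the global minimum here; with nonlinear activations that guarantee is lost, as the slide notes.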
Gradient descent
For hidden nodes, the computation is not trivial because it is difficult to assess their error term ∂E(w)/∂wj without knowing what their output values should be.
Back-propagation technique: two phases in each iteration.
Forward phase: at the ith iteration, the weights w^(i−1) are used to calculate the output value of each neuron in the network. Outputs of neurons at level k are computed before computing the outputs at level k + 1 (i.e., k → k + 1).
Backward phase: the update of w^(i) is applied in the reverse direction. Weights at level k + 1 are updated before the weights at level k are updated (i.e., k + 1 → k).
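The two phases can be sketched on a tiny 2-2-1 sigmoid network; the architecture, weights, and helper names below are our own illustration, with a finite-difference check that the backward phase computes the correct gradient.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Forward phase: hidden outputs (level k) before the output node (level k+1).
def forward(x, W1, W2):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    o = sigmoid(sum(w * hi for w, hi in zip(W2, h)))
    return h, o

# Backward phase: output-layer gradient before the hidden-layer gradient.
def backward(x, y, h, o, W2):
    # For E = 1/2 * (y - o)^2 and sigmoid'(s) = o * (1 - o):
    delta_o = (o - y) * o * (1 - o)
    grad_W2 = [delta_o * hi for hi in h]
    # Hidden nodes: error propagated back through W2, then the chain rule.
    delta_h = [delta_o * w * hi * (1 - hi) for w, hi in zip(W2, h)]
    grad_W1 = [[dh * xi for xi in x] for dh in delta_h]
    return grad_W1, grad_W2

x, y = [1.0, -1.0], 1.0
W1 = [[0.2, -0.3], [0.4, 0.1]]
W2 = [0.5, -0.6]
h, o = forward(x, W1, W2)
gW1, gW2 = backward(x, y, h, o, W2)

# Numerical check of one output-layer gradient via central differences.
eps = 1e-6
_, op = forward(x, W1, [W2[0] + eps, W2[1]])
_, om = forward(x, W1, [W2[0] - eps, W2[1]])
num = (0.5 * (y - op) ** 2 - 0.5 * (y - om) ** 2) / (2 * eps)
assert abs(num - gW2[0]) < 1e-7
```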
Neural network R examples
> install.packages("neuralnet")
> require(neuralnet)
> XOR <- c(0,1,1,0)
> xor.data <- data.frame(expand.grid(c(0,1), c(0,1)), XOR)
> xor.data
  Var1 Var2 XOR
1    0    0   0
2    1    0   1
3    0    1   1
4    1    1   0
> net.xor <- neuralnet(XOR~Var1+Var2, xor.data, hidden=2, rep=5)
> print(net.xor)
> plot(net.xor, rep="best")
For usage and more examples, see https://cran.r-project.org/web/packages/neuralnet/neuralnet.pdf
Function expand.grid {base}: creates a data frame from all combinations of the supplied vectors or factors.
Design issues in ANN learning
Assign an input node to each numerical or binary input variable.
Output nodes: for a two-class problem, one output node; for a k-class problem, k output nodes.
Target function representation factors: (1) weights of the links, (2) the number of hidden nodes and hidden layers, (3) biases in the nodes, and (4) the type of activation function.
Finding the right topology is not easy.
The initial weights and biases can come from random assignments.
Handle missing values before training.
Characteristics of ANN
Universal approximators: they can be used to approximate any target function.
Can handle redundant features.
Sensitive to the presence of noise.
The gradient descent method often converges to a local minimum. One mitigation: add a momentum term to the weight update formula.
Training is time consuming.
Testing is fast.