Chapter 4
Artificial Neural Networks
Questions:
• What are ANNs?
• How to learn an ANN? (algorithm)
• The representational power of ANNs (advantages and disadvantages)
What are ANNs? ------ Background

Consider the human brain:
• Neuron switching time: ~0.001 second
• Number of neurons: ~10^10
• Connections per neuron: ~10^4 to 10^5
• Scene recognition time: ~0.1 second
→ much parallel computation
• Property of a neuron: thresholded unit

One motivation for ANN systems is to capture this kind of highly parallel computation based on distributed representation.
What are ANNs? ----- Problems related to ANNs
• Classification
• Voice recognition
• Others
Properties of artificial neural nets (ANNs)
• Many neuron-like threshold switching units
• Many weighted interconnections among units
• Highly parallel, distributed processing
• Emphasis on tuning weights automatically
4.1 Perceptrons

$$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\ -1 & \text{otherwise} \end{cases}$$
To simplify notation, set $x_0 = 1$. Then

$$o(\vec{x}) = \operatorname{sgn}(\vec{w}\cdot\vec{x}) = \operatorname{sgn}\Big(\sum_{i=0}^{n} w_i x_i\Big)$$

where $\vec{w} = (w_0, w_1, \ldots, w_n)$ and $\vec{x} = (x_0, x_1, \ldots, x_n)$.

Learning a perceptron involves choosing values for the weights $w_0, \ldots, w_n$. Therefore, the space H of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors:

$$H = \{\vec{w} \mid \vec{w} \in \mathbb{R}^{n+1}\}$$
We can view the perceptron as representing a hyperplane decision surface in the n-dimensional space of instances:

$$o(\vec{x}) = \operatorname{sgn}(\vec{w}\cdot\vec{x})$$

Two ways to train a perceptron: the Perceptron Training Rule and Gradient Descent.
(1). Perceptron Training Rule

$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta\,(t - o)\,x_i$$

where
• $t = t(\vec{x})$ is the target value
• $o$ is the perceptron output
• $\eta$ is a small constant called the learning rate

Procedure:
• Initialize the $w_i$ with random values in the given interval
• Update the value of each $w_i$ according to the training examples
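As a concrete sketch, the rule above can be coded in a few lines of Python. The function name `train_perceptron`, the initialization interval, and the epoch count are illustrative assumptions, not part of the slides:

```python
import random

def train_perceptron(examples, n_features, eta=0.1, epochs=50):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.

    `examples` is a list of (x, t) pairs, where x is a feature tuple
    (without the constant x0 = 1, which is prepended here) and t is +1 or -1.
    """
    # Initialize weights (including the bias weight w0) to small random values.
    w = [random.uniform(-0.05, 0.05) for _ in range(n_features + 1)]
    for _ in range(epochs):
        for x, t in examples:
            xs = [1.0] + list(x)                        # prepend x0 = 1
            o = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else -1
            for i in range(len(w)):                     # update every weight
                w[i] += eta * (t - o) * xs[i]
    return w

# Learn AND on {-1, +1} inputs (linearly separable, so the rule converges).
and_examples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w = train_perceptron(and_examples, n_features=2)
```

Note that a weight only changes when the perceptron misclassifies an example, since then $t - o = \pm 2$ and otherwise $t - o = 0$.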
Representation Power of Perceptrons

• A single perceptron can represent many boolean functions, such as AND, OR, NAND, and NOR, but it fails to represent XOR.
• E.g.: g(x1, x2) = AND(x1, x2) is represented by
  o(x1, x2) = sgn(-0.8 + 0.5 x1 + 0.5 x2)

x1 | x2 | -0.8 + 0.5 x1 + 0.5 x2 |  o
-1 | -1 |          -1.8          | -1
-1 |  1 |          -0.8          | -1
 1 | -1 |          -0.8          | -1
 1 |  1 |           0.2          |  1
Representation Power of Perceptrons

(a) The training rule can be proved to converge
    • if the training data are linearly separable
    • and η is sufficiently small.
(b) But some functions are not representable, e.g., those that are not linearly separable.
(c) Every boolean function can be represented by some network of perceptrons only two levels deep.
(2). Gradient Descent

Key idea: search the hypothesis space to find the weights that best fit the training examples.

Best fit: minimize the squared error

$$E(\vec{w}) = \frac{1}{2}\sum_{d \in D}(t_d - o_d)^2$$

where D is the set of training examples, $t_d$ is the target output for example d, and $o_d$ is the output of the linear unit for example d.
Gradient Descent

Gradient:

$$\nabla E(\vec{w}) = \left(\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n}\right)$$

Training rule:

$$\Delta \vec{w} = -\eta\,\nabla E(\vec{w})$$

or, component-wise,

$$\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i}$$
Gradient Descent

$$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{d \in D}(t_d - o_d)^2 = \sum_{d \in D}(t_d - o_d)\,\frac{\partial}{\partial w_i}\,(t_d - \vec{w}\cdot\vec{x}_d) = \sum_{d \in D}(t_d - o_d)(-x_{id})$$

Hence the training rule:

$$\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\sum_{d \in D}(t_d - o_d)\,x_{id}$$

$$w_i \leftarrow w_i + \Delta w_i$$
Gradient Descent Algorithm

• Initialize each $w_i$ to some small random value
• Until the termination condition is met, Do
  – Initialize each $\Delta w_i$ to zero.
  – For each <x, t> in the training examples, Do
    • Input the instance x to the unit and compute the output o
    • For each linear unit weight $w_i$, Do
        $\Delta w_i \leftarrow \Delta w_i + \eta\,(t - o)\,x_i$
  – For each linear unit weight $w_i$, Do
        $w_i \leftarrow w_i + \Delta w_i$
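The batch algorithm above can be sketched as follows. The helper name `gradient_descent` and the demo data are illustrative assumptions; the update logic follows the slide's pseudocode:

```python
def gradient_descent(examples, n_features, eta=0.01, epochs=500):
    """Batch gradient descent for a linear unit o = w . x.

    Accumulates Delta w_i = eta * sum_d (t_d - o_d) * x_id over all
    training examples before updating the weights (standard/batch mode).
    """
    w = [0.0] * (n_features + 1)                # weights, incl. bias w0
    for _ in range(epochs):
        delta = [0.0] * len(w)                  # initialize each Delta w_i to zero
        for x, t in examples:
            xs = [1.0] + list(x)                # x0 = 1
            o = sum(wi * xi for wi, xi in zip(w, xs))   # unthresholded output
            for i in range(len(w)):
                delta[i] += eta * (t - o) * xs[i]
        for i in range(len(w)):                 # one update per pass through D
            w[i] += delta[i]
    return w

# Fit the linear target t = 1 + 2*x1; gradient descent converges toward it.
data = [((x,), 1.0 + 2.0 * x) for x in (-1.0, 0.0, 0.5, 1.0, 2.0)]
w = gradient_descent(data, n_features=1)
```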
When to use gradient descent

• The hypothesis is continuously parameterized
• The error can be differentiated with respect to the parameters
Advantages vs. Disadvantages

Advantages
• Guaranteed to converge to the hypothesis with locally minimum error, given a sufficiently small learning rate η;
• even when the training data contain noise;
• even when the training data are not linearly separable;
• for a single linear unit, the error surface has a single global minimum, so gradient descent converges to it.

Disadvantages
• Convergence can sometimes be very slow;
• No guarantee of converging to the global minimum in cases where there are multiple local minima.
Incremental (Stochastic) Gradient Descent

Standard Gradient Descent:
Do until satisfied
• Compute the gradient $\nabla E_D(\vec{w})$
• $\vec{w} \leftarrow \vec{w} - \eta\,\nabla E_D(\vec{w})$

Stochastic Gradient Descent:
Do until satisfied
For each training example d in D
• Compute the gradient $\nabla E_d(\vec{w})$
• $\vec{w} \leftarrow \vec{w} - \eta\,\nabla E_d(\vec{w})$

where

$$E_D(\vec{w}) = \frac{1}{2}\sum_{d \in D}(t_d - o_d)^2 \qquad \text{vs.} \qquad E_d(\vec{w}) = \frac{1}{2}(t_d - o_d)^2$$
Standard Gradient Descent vs. Stochastic Gradient Descent

• Stochastic gradient descent can approximate standard gradient descent arbitrarily closely if η is made small enough;
• Stochastic mode can converge faster, since it updates the weights after each example rather than after each pass through D;
• Stochastic gradient descent can sometimes avoid falling into local minima.
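For contrast with the batch version, here is a minimal stochastic sketch; the function name and demo data are illustrative assumptions. The only structural change is that the weights are updated immediately after each example:

```python
def stochastic_gradient_descent(examples, n_features, eta=0.01, epochs=500):
    """Incremental (stochastic) gradient descent for a linear unit.

    Unlike the batch version, the weights are updated after *each*
    training example, using the per-example error E_d = (t - o)^2 / 2.
    """
    w = [0.0] * (n_features + 1)                # weights, incl. bias w0
    for _ in range(epochs):
        for x, t in examples:
            xs = [1.0] + list(x)                # x0 = 1
            o = sum(wi * xi for wi, xi in zip(w, xs))
            for i in range(len(w)):
                w[i] += eta * (t - o) * xs[i]   # update immediately
    return w

# Same linear target as before: t = 1 + 2*x1.
data = [((x,), 1.0 + 2.0 * x) for x in (-1.0, 0.0, 0.5, 1.0, 2.0)]
w = stochastic_gradient_descent(data, n_features=1)
```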
(3). Perceptron Training Rule vs. Gradient Descent

Perceptron training rule
• Uses the thresholded perceptron output: $o(\vec{x}) = \operatorname{sgn}(\vec{w}\cdot\vec{x})$
• Provided the examples are linearly separable,
• converges to a hypothesis that perfectly classifies the training data.

Gradient descent
• Uses the unthresholded linear output: $o(\vec{x}) = \vec{w}\cdot\vec{x}$
• Regardless of whether the training data are linearly separable,
• converges asymptotically toward the minimum-error hypothesis.
4.2 Multilayer Networks

[Figures: a single perceptron vs. a multilayer network]

Perceptrons can only express linear decision surfaces; we need to express a rich variety of nonlinear decision surfaces.
Sigmoid unit – a differentiable threshold unit

Sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-kx}} \quad (\text{here } k = 1)$$

Property:

$$\frac{d\sigma(x)}{dx} = \sigma(x)\,(1 - \sigma(x))$$

Output:

$$o = \sigma(net) = \frac{1}{1 + e^{-net}}, \qquad net = \vec{w}\cdot\vec{x}$$

Why do we use the sigmoid instead of a linear unit or sgn(x)? Unlike sgn(x), the sigmoid is differentiable, so gradient descent applies; unlike a linear unit, it allows multilayer networks to represent nonlinear functions.
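The derivative property above is what makes backpropagation cheap: once $\sigma(x)$ is computed, its derivative costs one multiplication. A small sketch (the helper names are illustrative), with a numerical check of the identity:

```python
import math

def sigmoid(x):
    """The sigmoid (logistic) function: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """d(sigma)/dx = sigma(x) * (1 - sigma(x)) -- the property backprop uses."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Check the derivative identity against a central finite difference.
h = 1e-6
for x in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(numeric - sigmoid_derivative(x)) < 1e-6
```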
The Backpropagation Algorithm

The main idea of the backpropagation algorithm:
• compute the input and output of each unit in a forward pass;
• modify the weights of the units backward, layer by layer, with respect to the errors.
Error definition:

Batch mode:

$$E_D(\vec{w}) = \frac{1}{2}\sum_{d \in D}\sum_{k \in outputs}(t_{kd} - o_{kd})^2$$

Individual (per-example) mode:

$$E_d(\vec{w}) = \frac{1}{2}\sum_{k \in outputs}(t_{kd} - o_{kd})^2$$
Notation:

• $x_{ji}$ = the ith input to unit j
• $w_{ji}$ = the weight associated with the ith input to unit j
• $net_j = \sum_i w_{ji}\,x_{ji}$ (the weighted sum of inputs for unit j)
• $o_j$ = the output computed by unit j
• $t_j$ = the target output for unit j
• $outputs$ = the set of units in the final layer
• $Ds(j)$ = the set of units whose immediate inputs include the output of unit j
• $o_i = x_{ji}$ (the output of unit i is the ith input to unit j)
Training Rule for Output Unit Weights

$$\frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\,x_{ji}, \qquad \frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j}\,\frac{\partial o_j}{\partial net_j}$$

$$\frac{\partial E_d}{\partial o_j} = \frac{\partial}{\partial o_j}\,\frac{1}{2}\sum_{k \in outputs}(t_k - o_k)^2 = -(t_j - o_j)$$

$$\frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j\,(1 - o_j)$$

$$\frac{\partial E_d}{\partial net_j} = -(t_j - o_j)\,o_j\,(1 - o_j)$$

$$\Delta w_{ji} = -\eta\,\frac{\partial E_d}{\partial w_{ji}} = \eta\,(t_j - o_j)\,o_j\,(1 - o_j)\,x_{ji}$$
Training Rule for Hidden Unit Weights

Denote the error term of unit j as

$$\delta_j = -\frac{\partial E_d}{\partial net_j}$$

For a hidden unit j, the error propagates through the downstream units Ds(j):

$$\frac{\partial E_d}{\partial net_j} = \sum_{k \in Ds(j)} \frac{\partial E_d}{\partial net_k}\,\frac{\partial net_k}{\partial net_j} = \sum_{k \in Ds(j)} (-\delta_k)\,\frac{\partial net_k}{\partial o_j}\,\frac{\partial o_j}{\partial net_j} = \sum_{k \in Ds(j)} (-\delta_k)\,w_{kj}\,o_j\,(1 - o_j)$$

so we have

$$\delta_j = o_j\,(1 - o_j)\sum_{k \in Ds(j)} \delta_k\,w_{kj}$$

and

$$\Delta w_{ji} = \eta\,\delta_j\,x_{ji}$$
Backpropagation Algorithm

• Initialize all weights to small random numbers
• Until the termination condition is met, Do
  For each training example, Do
    // Propagate the input forward
    1. Input the training example to the network and compute the network outputs
    // Propagate the errors backward
    2. For each output unit k:
       $$\delta_k = (t_k - o_k)\,o_k\,(1 - o_k)$$
    3. For each hidden unit h:
       $$\delta_h = o_h\,(1 - o_h)\sum_{k \in outputs} w_{kh}\,\delta_k$$
    4. Update each network weight:
       $$w_{ji} \leftarrow w_{ji} + \Delta w_{ji}, \qquad \text{where } \Delta w_{ji} = \eta\,\delta_j\,x_{ji}$$
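The four steps above can be sketched for a small network with one hidden layer and a single sigmoid output. This is a minimal illustration, not a library-grade implementation; the function names, network size, and the XOR demo are assumptions added here:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_epoch(examples, w_hidden, w_out, eta=0.5):
    """One pass of the backpropagation algorithm over the training examples.

    w_hidden[h] is the weight list (incl. bias) of hidden unit h;
    w_out is the weight list (incl. bias) of the single output unit.
    Returns the summed squared error accumulated over the pass.
    """
    error = 0.0
    for x, t in examples:
        xs = [1.0] + list(x)                             # x0 = 1
        # 1. Propagate the input forward.
        h_out = [sigmoid(sum(w * xi for w, xi in zip(wh, xs))) for wh in w_hidden]
        hs = [1.0] + h_out                               # bias input for output unit
        o = sigmoid(sum(w * hi for w, hi in zip(w_out, hs)))
        error += 0.5 * (t - o) ** 2
        # 2. delta for the output unit: (t - o) * o * (1 - o).
        delta_o = (t - o) * o * (1 - o)
        # 3. delta for each hidden unit: o_h * (1 - o_h) * w_kh * delta_k.
        delta_h = [h_out[h] * (1 - h_out[h]) * w_out[h + 1] * delta_o
                   for h in range(len(h_out))]
        # 4. Update each weight: w_ji <- w_ji + eta * delta_j * x_ji.
        for i in range(len(w_out)):
            w_out[i] += eta * delta_o * hs[i]
        for h in range(len(w_hidden)):
            for i in range(len(xs)):
                w_hidden[h][i] += eta * delta_h[h] * xs[i]
    return error

# Demo: train a 2-2-1 network on XOR (targets in {0,1}, since the
# sigmoid output lies in (0,1)).
random.seed(0)
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]
for _ in range(1000):
    err = backprop_epoch(xor, w_hidden, w_out)
```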
Hidden layer Representations
Convergence and local minima

• Backpropagation converges to some local minimum of the error, not necessarily to the global minimum.
• Use stochastic gradient descent rather than standard gradient descent.
• Initialization influences convergence: train multiple networks with different random initial weights over the same data, then select the best one.
• Training can take thousands of iterations --> slow.
• Initialize weights near zero, so the initial network is nearly linear; increasingly nonlinear functions become possible as training progresses.
• Add a momentum term to speed convergence:

$$\Delta w_{ji}(n) = \eta\,\delta_j\,x_{ji} + \alpha\,\Delta w_{ji}(n-1), \qquad 0 \le \alpha < 1$$
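The momentum update can be sketched on a simple 1-D objective, where its effect is easy to check; the function name and the quadratic objective are illustrative assumptions. The `-eta * grad(w)` term plays the role of $\eta\,\delta_j\,x_{ji}$ in the network setting:

```python
def minimize_with_momentum(grad, w0, eta=0.1, alpha=0.9, steps=300):
    """Gradient descent with a momentum term:
    Delta w(n) = -eta * grad(w) + alpha * Delta w(n-1).
    """
    w, delta = w0, 0.0
    for _ in range(steps):
        delta = -eta * grad(w) + alpha * delta   # momentum update rule
        w += delta
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3); converges to w = 3.
w_star = minimize_with_momentum(lambda w: 2 * (w - 3.0), w0=0.0)
```

The previous weight change keeps the search moving in the same direction, which can carry it through flat regions and damp oscillation across narrow valleys.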
Expressive Capabilities of ANNs

• Every boolean function can be represented by a network with a single hidden layer
• Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer
• Any function can be approximated to arbitrary accuracy by a network with two hidden layers
• A network with more hidden layers may achieve higher precision; however, the possibility of converging to a local minimum increases as well.
When to Consider Neural Networks

• Input is high-dimensional, discrete or real-valued
• Output is discrete or real-valued
• Output is a vector of values
• Possibly noisy data
• Form of the target function is unknown
• Human readability of the result is unimportant
Overfitting in ANNs
Strategies applied to avoid overfitting

• Poor strategy: continue training until the error on the training set falls below some threshold
• A good indicator: the number of iterations that produces the lowest error over a separate validation set
• Store the weights with the lowest validation error; once the currently trained weights reach a significantly higher error over the validation set than the stored weights, terminate!
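This early-stopping criterion can be sketched as a small helper operating on a recorded validation-error curve; the function name, the `patience` parameter, and the sample curve are illustrative assumptions:

```python
def early_stopping(val_errors, patience=3):
    """Return the index of the iteration whose weights should be kept:
    the best stored validation error so far, stopping once the error has
    failed to improve for `patience` consecutive iterations."""
    best_i, best_err, waited = 0, float("inf"), 0
    for i, err in enumerate(val_errors):
        if err < best_err:
            best_i, best_err, waited = i, err, 0   # store these weights
        else:
            waited += 1
            if waited >= patience:                  # significantly worse: terminate
                break
    return best_i

# A typical U-shaped validation curve: improves, then overfits.
curve = [0.9, 0.7, 0.5, 0.4, 0.45, 0.5, 0.6, 0.7]
best = early_stopping(curve)  # index 3 (error 0.4)
```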
Alternative Error Functions

Recurrent Networks

Thank you!