Neural Networks
10701/15781 Recitation
February 12, 2008
Parts of the slides are from previous years’ 10701 recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.
Recall Linear Regression
Prediction of continuous variables
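Learn the mapping f: X → Y
Model is linear in the parameters w (+ some noise):
$$f(\mathbf{x}) = \sum_i w_i x_i \quad \text{(or } f(\mathbf{x}) = \sum_i w_i \phi_i(\mathbf{x})\text{)}$$
Assume Gaussian noise
Learn MLE:
$$\mathbf{w} = (X^T X)^{-1} X^T Y$$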
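As a quick check of the closed-form solution, the sketch below fits y ≈ w0 + w1·x by solving the 2×2 normal equations (XᵀX) w = XᵀY directly. The function name `fit_line` and the toy data are illustrative assumptions, not part of the recitation.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ w0 + w1 * x via the normal equations.

    With design-matrix rows (1, x), X^T X = [[n, sx], [sx, sxx]] and
    X^T Y = [sy, sxy]; the 2x2 system is inverted in closed form.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of X^T X
    w0 = (sxx * sy - sx * sxy) / det
    w1 = (n * sxy - sx * sy) / det
    return w0, w1

# Noise-free toy data generated from y = 1 + 2x.
w0, w1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w0, w1)
```

With noisy targets the same formula returns the maximum-likelihood weights under the Gaussian noise assumption.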
Neural Network
Neural nets are also models with parameters w; these are now called weights.
As before, we want to compute the weights that minimize the sum of squared residuals, which, under the “Gaussian i.i.d. noise” assumption, turns out to be maximum likelihood.
Instead of explicitly solving for the maximum-likelihood weights, we use gradient descent.
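The link between squared error and maximum likelihood can be spelled out in a few lines. This is a short derivation, assuming each target equals the model output plus independent Gaussian noise; the noise variance is written $\sigma_\varepsilon^2$ here (a symbol chosen for this sketch, to avoid a clash with the sigmoid).

```latex
t^{(l)} = o(\mathbf{x}^{(l)}) + \varepsilon^{(l)}, \qquad
\varepsilon^{(l)} \sim \mathcal{N}(0, \sigma_\varepsilon^2) \ \text{i.i.d.}

\ln p\bigl(t^{(1)}, \dots, t^{(L)} \mid \mathbf{w}\bigr)
  = -\frac{L}{2} \ln\bigl(2\pi\sigma_\varepsilon^2\bigr)
    - \frac{1}{2\sigma_\varepsilon^2} \sum_{l=1}^{L} \bigl(t^{(l)} - o(\mathbf{x}^{(l)})\bigr)^2

\arg\max_{\mathbf{w}} \ln p
  = \arg\min_{\mathbf{w}} \frac{1}{2} \sum_{l=1}^{L} \bigl(t^{(l)} - o(\mathbf{x}^{(l)})\bigr)^2
```

The first term of the log-likelihood does not depend on w, so maximizing the likelihood and minimizing the sum of squared residuals pick out the same weights.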
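Perceptrons
Input x = (x_1, …, x_n) and target value t:
$$\text{Output: } o(\mathbf{x}) = f\Bigl(w_0 + \sum_{i=1}^{n} w_i x_i\Bigr) = f(net), \quad \text{where } net = w_0 + \sum_{i=1}^{n} w_i x_i$$
e.g.
$$o(\mathbf{x}) = \operatorname{sign}(net) = \begin{cases} 1 & \text{if } net > 0 \\ -1 & \text{otherwise} \end{cases}$$
or
$$o(\mathbf{x}) = \sigma(net) = \frac{1}{1 + \exp(-net)} \quad (\sigma\text{: sigmoid})$$
Given training data {(x^(l), t^(l))}, l = 1, …, L, find w which minimizes
$$E = \frac{1}{2} \sum_{l=1}^{L} \bigl(t^{(l)} - o(\mathbf{x}^{(l)})\bigr)^2$$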
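The two output functions and the squared-error objective on this slide can be written down in a few lines. This is a minimal sketch; the function names, the weight values, and the two-example data set are assumptions made for illustration.

```python
import math

def net_input(w, x):
    """net = w0 + sum_i w_i * x_i, where w[0] is the bias weight w0."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def sign_output(w, x):
    """Threshold unit: +1 if net > 0, otherwise -1."""
    return 1 if net_input(w, x) > 0 else -1

def sigmoid_output(w, x):
    """Sigmoid unit: 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + math.exp(-net_input(w, x)))

def squared_error(w, data, output=sigmoid_output):
    """E = 1/2 * sum_l (t_l - o(x_l))^2 over (x, t) pairs."""
    return 0.5 * sum((t - output(w, x)) ** 2 for x, t in data)

w = [0.0, 1.0, -1.0]                                  # hypothetical (w0, w1, w2)
data = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0)]         # hypothetical training pairs
print(sign_output(w, (1.0, 0.0)), sigmoid_output(w, (0.0, 0.0)), squared_error(w, data))
```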
Gradient descent
General framework for finding a minimum of a continuous (differentiable) function f(w)
Start with some initial value $\mathbf{w}^{(1)}$ and compute the gradient vector $\nabla f(\mathbf{w}^{(1)})$
The next value $\mathbf{w}^{(2)}$ is obtained by moving some distance from $\mathbf{w}^{(1)}$ in the direction of steepest descent, i.e., along the negative of the gradient:
$$\mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - \eta^{(k)} \nabla f(\mathbf{w}^{(k)})$$
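The iteration can be sketched on a toy problem. The example below assumes a fixed step size eta (the slide allows a step size that changes with k) and a hypothetical quadratic objective whose minimum is known in advance.

```python
def gradient_descent(grad, w, eta=0.1, steps=100):
    """Repeat w <- w - eta * grad(w) for a fixed number of steps."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Toy objective f(w) = (w1 - 3)^2 + (w2 + 1)^2, minimized at (3, -1).
def grad_f(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w_min = gradient_descent(grad_f, [0.0, 0.0])
print(w_min)
```

For this objective each step shrinks the distance to the minimum by a constant factor, so the iterates approach (3, -1) geometrically; a step size that is too large would make them diverge instead.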
Gradient Descent on a Perceptron
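The sigmoid perceptron update rule:
$$w_j \leftarrow w_j + \eta \sum_{l=1}^{L} \delta^{(l)}\, o^{(l)} \bigl(1 - o^{(l)}\bigr)\, x_j^{(l)}$$
where
$$o^{(l)} = \sigma\Bigl(w_0 + \sum_{j=1}^{n} w_j x_j^{(l)}\Bigr), \qquad \delta^{(l)} = t^{(l)} - o^{(l)}$$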
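A batch version of this update can be tried on a small data set. The sketch below assumes 0/1 targets, zero initial weights, a fixed step size, and the convention x_0 = 1 so that the bias w_0 is updated by the same rule; the data set (Boolean OR) and all names are illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output(w, x):
    """o = sigmoid(w0 + sum_j w_j * x_j)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def batch_update(w, data, eta):
    """One step of w_j <- w_j + eta * sum_l (t - o) * o * (1 - o) * x_j."""
    step = [0.0] * len(w)
    for x, t in data:
        o = output(w, x)
        common = (t - o) * o * (1.0 - o)
        step[0] += common                    # bias term, input x_0 = 1
        for j, xj in enumerate(x, start=1):
            step[j] += common * xj
    return [wj + eta * sj for wj, sj in zip(w, step)]

# Toy data: Boolean OR with 0/1 targets.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [0.0, 0.0, 0.0]
for _ in range(5000):
    w = batch_update(w, data, eta=0.5)
predictions = [round(output(w, x)) for x, _ in data]
print(w, predictions)
```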
Boolean Functions
e.g. using a step activation function with threshold 0, can we learn the function
X1 AND X2?
X1 OR X2?
X1 AND NOT X2?
X1 XOR X2?
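One way to explore these questions is a brute-force search for weights of a single step unit. The sketch below assumes 0/1 inputs and outputs, a unit that fires when w0 + w1·X1 + w2·X2 > 0, and a coarse weight grid; the grid and the helper names are illustrative. Finding no weights on a grid is not a proof, but for XOR the constraints w0 ≤ 0, w0 + w1 > 0, w0 + w2 > 0 and w0 + w1 + w2 ≤ 0 cannot all hold, so no single unit represents it.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def step_unit(w, x):
    """Output 1 if w0 + w1*x1 + w2*x2 > 0, else 0."""
    return 1 if w[0] + w[1] * x[0] + w[2] * x[1] > 0 else 0

def truth_table(w):
    return [step_unit(w, x) for x in INPUTS]

def search(target, grid):
    """Return the first (w0, w1, w2) on the grid matching target, or None."""
    for w in product(grid, repeat=3):
        if truth_table(w) == target:
            return w
    return None

grid = [k / 2 for k in range(-4, 5)]         # -2.0, -1.5, ..., 2.0
targets = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "AND NOT": [0, 0, 1, 0],
    "XOR": [0, 1, 1, 0],
}
found = {name: search(t, grid) for name, t in targets.items()}
print(found)
```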
Multilayer Networks
The class of functions representable by a perceptron is limited
Think of nonlinear functions:
$$o(\mathbf{x}) = h\Bigl(\sum_j W_j\, f\Bigl(\sum_i w_{ji} x_i\Bigr)\Bigr)$$
A 1-Hidden layer Net
N_input = 2, N_hidden = 3, N_output = 1
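A forward pass through a network of this shape (2 inputs, 3 hidden units, 1 output) can be sketched as follows. The sigmoid is assumed for both f and h, bias terms are left out, and the weight values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, W_out, f=sigmoid, h=sigmoid):
    """o(x) = h(sum_j W_j * f(sum_i w_ji * x_i)); returns (hidden outputs, o)."""
    y = [f(sum(w_ji * x_i for w_ji, x_i in zip(row, x))) for row in w_hidden]
    o = h(sum(W_j * y_j for W_j, y_j in zip(W_out, y)))
    return y, o

# Hypothetical weights: w_hidden[j][i] for 3 hidden units x 2 inputs,
# W_out[j] for the 3 hidden-to-output connections.
w_hidden = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
W_out = [1.0, 1.0, -2.0]
y, o = forward((1.0, 0.0), w_hidden, W_out)
print(y, o)
```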
Backpropagation
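HW2 – Problem 2
Output in k-th output unit from input x:
$$o_k(\mathbf{x}) = f\Bigl(\sum_j W_{kj}\, f\Bigl(\sum_i w_{ji} x_i\Bigr)\Bigr)$$
With bias: add a constant term for every non-input unit
Learn w to minimize
$$E = \frac{1}{2} \sum_{k=1}^{K} \bigl(t_k - o_k(\mathbf{x})\bigr)^2$$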
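The gradients needed to minimize this error follow from the chain rule. The derivation below is a sketch without bias terms; the names $net_j$, $net_k$, $y_j$, $\delta_k$ and $\delta_j$ denote the intermediate quantities of the network.

```latex
net_j = \sum_i w_{ji} x_i, \quad y_j = f(net_j), \quad
net_k = \sum_j W_{kj} y_j, \quad o_k = f(net_k)

\frac{\partial E}{\partial W_{kj}}
  = -(t_k - o_k)\, f'(net_k)\, y_j
  = -\delta_k\, y_j,
  \qquad \delta_k = (t_k - o_k)\, f'(net_k)

\frac{\partial E}{\partial w_{ji}}
  = -\sum_k (t_k - o_k)\, f'(net_k)\, W_{kj}\, f'(net_j)\, x_i
  = -\delta_j\, x_i,
  \qquad \delta_j = f'(net_j) \sum_k W_{kj}\, \delta_k
```

Moving each weight against its gradient with step size $\eta$ gives updates of the form $W_{kj} \leftarrow W_{kj} + \eta\, \delta_k\, y_j$ and $w_{ji} \leftarrow w_{ji} + \eta\, \delta_j\, x_i$.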
Backpropagation
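Initialize all weights
Do until convergence:
1. Input a training example to the network and compute the output o_k
2. Update each hidden-to-output weight w_kj (W_kj in the output formula) by
$$w_{kj} \leftarrow w_{kj} + \eta\, \delta_k\, y_j, \quad \text{where } \delta_k = (t_k - o_k)\, f'(net_k)$$
   y_j: output from hidden unit j
3. Update each input-to-hidden weight w_ji by
$$w_{ji} \leftarrow w_{ji} + \eta\, \delta_j\, x_i, \quad \text{where } \delta_j = f'(net_j) \sum_k w_{kj}\, \delta_k$$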
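The loop above can be sketched for a single training example. The code assumes a sigmoid activation (so f'(net) = f(net)(1 − f(net))), no bias terms, a fixed step size, and a hypothetical 2-3-1 network; `w[j][i]` holds the input-to-hidden weights and `W[k][j]` the hidden-to-output weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, W):
    """Return hidden outputs y_j and network outputs o_k."""
    y = [sigmoid(sum(wji * xi for wji, xi in zip(row, x))) for row in w]
    o = [sigmoid(sum(Wkj * yj for Wkj, yj in zip(row, y))) for row in W]
    return y, o

def error(x, t, w, W):
    """E = 1/2 * sum_k (t_k - o_k)^2 for one example."""
    _, o = forward(x, w, W)
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o))

def backprop_step(x, t, w, W, eta):
    """One update on a single example; returns new (w, W) without mutating inputs."""
    y, o = forward(x, w, W)
    delta_k = [(tk - ok) * ok * (1.0 - ok) for tk, ok in zip(t, o)]
    # delta_j uses the hidden-to-output weights from before the update.
    delta_j = [yj * (1.0 - yj) * sum(W[k][j] * delta_k[k] for k in range(len(W)))
               for j, yj in enumerate(y)]
    new_W = [[W[k][j] + eta * delta_k[k] * y[j] for j in range(len(y))]
             for k in range(len(W))]
    new_w = [[w[j][i] + eta * delta_j[j] * x[i] for i in range(len(x))]
             for j in range(len(w))]
    return new_w, new_W

# Hypothetical 2-3-1 network and one training example.
w = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
W = [[0.3, -0.1, 0.2]]
x, t = (1.0, 0.5), (1.0,)
before = error(x, t, w, W)
for _ in range(200):
    w, W = backprop_step(x, t, w, W, eta=0.5)
after = error(x, t, w, W)
print(before, after)
```

In practice the loop would cycle over all training examples and stop when the error no longer decreases; repeating one example here only keeps the sketch short.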