Neural Networks


Page 1: Neural Networks

Neural Networks

10701/15781 Recitation

February 12, 2008

Parts of the slides are from previous years’ 10701 recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.

Page 2: Neural Networks

Recall Linear Regression

Prediction of continuous variables

Learn the mapping f: X → Y

Model is linear in the parameters w (+ some noise):

$f(\mathbf{x}) = \sum_i w_i x_i$  (or $\sum_i w_i h_i(\mathbf{x})$)

Assume Gaussian noise

Learn the MLE $\mathbf{w} = (X^T X)^{-1} X^T Y$
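As a quick illustration of the closed-form solution above, here is a minimal NumPy sketch; the toy data, variable names, and the prepended column of ones (so that w includes an intercept) are choices made for this example, not part of the slides:

```python
import numpy as np

# Hypothetical toy data: 5 examples, 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0],
              [5.0, 2.5]])
y = np.array([3.1, 2.4, 4.6, 7.2, 7.5])

# Prepend a column of ones so w[0] acts as the intercept term
X1 = np.hstack([np.ones((X.shape[0], 1)), X])

# MLE under Gaussian noise: w = (X^T X)^{-1} X^T y
# (solve is numerically preferable to forming the explicit inverse)
w = np.linalg.solve(X1.T @ X1, X1.T @ y)

predictions = X1 @ w
print("w =", w)
print("residual sum of squares =", np.sum((y - predictions) ** 2))
```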

Page 3: Neural Networks

Neural Network

Neural nets are also models with parameters w, which are now called weights.

As before, we want to compute the weights that minimize the sum of squared residuals, which, under a "Gaussian i.i.d. noise" assumption, turns out to give the maximum-likelihood weights.

Instead of explicitly solving for the maximum-likelihood weights, we use gradient descent.

Page 4: Neural Networks

Perceptrons

Input $\mathbf{x} = (x_1, \ldots, x_n)$ and target value $t$:

Output: $o(\mathbf{x}) = f\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right) = f(net)$, where $net = w_0 + \sum_{i=1}^{n} w_i x_i$

e.g. $f(net) = \mathrm{sign}(net) = 1$ if $net \ge 0$, $-1$ otherwise (step function)

or $f(net) = \mathrm{sigmoid}(net) = \dfrac{1}{1 + \exp\!\left(-(w_0 + \sum_{i=1}^{n} w_i x_i)\right)}$

Given training data $\{(\mathbf{x}^{(l)}, t^{(l)})\}_{l=1}^{L}$, find $\mathbf{w}$ which minimizes

$E(\mathbf{w}) = \dfrac{1}{2} \sum_{l=1}^{L} \left(t^{(l)} - o(\mathbf{x}^{(l)})\right)^2$
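A minimal sketch of the quantities on this slide, assuming NumPy; the helper names and the small data set are hypothetical:

```python
import numpy as np

def net(w0, w, x):
    """Weighted sum: net = w0 + sum_i w_i x_i."""
    return w0 + w @ x

def step_output(w0, w, x):
    """Step (sign) activation with threshold 0: 1 if net >= 0, -1 otherwise."""
    return 1.0 if net(w0, w, x) >= 0 else -1.0

def sigmoid_output(w0, w, x):
    """Sigmoid activation: 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + np.exp(-net(w0, w, x)))

def squared_error(w0, w, X, t):
    """E(w) = 1/2 * sum_l (t^(l) - o(x^(l)))^2, using the sigmoid output."""
    outputs = np.array([sigmoid_output(w0, w, x) for x in X])
    return 0.5 * np.sum((t - outputs) ** 2)

# Hypothetical training set: 3 examples with 2 inputs each
X = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
t = np.array([0.0, 1.0, 0.0])
print(squared_error(0.1, np.array([0.5, -0.3]), X, t))
```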

Page 5: Neural Networks

Gradient descent

General framework for finding a minimum of a continuous (differentiable) function f(w).

Start with some initial value $\mathbf{w}^{(1)}$ and compute the gradient vector $\nabla f(\mathbf{w}^{(1)})$.

The next value $\mathbf{w}^{(2)}$ is obtained by moving some distance from $\mathbf{w}^{(1)}$ in the direction of steepest descent, i.e., along the negative of the gradient:

$\mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - \eta\, \nabla f(\mathbf{w}^{(k)})$
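A minimal sketch of this generic update rule; the quadratic objective, step size, iteration count, and starting point below are chosen only for illustration and are not from the slides:

```python
import numpy as np

def f(w):
    # Example objective (not from the slides): a simple bowl with minimum at (1, -2)
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def grad_f(w):
    # Gradient vector of f at w
    return np.array([2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)])

w = np.array([5.0, 5.0])   # initial value w^(1)
eta = 0.1                  # step size (learning rate)

for k in range(100):
    w = w - eta * grad_f(w)   # move along the negative gradient

print("final w =", w, "f(w) =", f(w))
```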

Page 6: Neural Networks

Gradient Descent on a Perceptron

The sigmoid perceptron update rule

$w_j \leftarrow w_j + \eta \sum_{l=1}^{L} \left(t^{(l)} - o^{(l)}\right) o^{(l)} \left(1 - o^{(l)}\right) x_j^{(l)}$

where $o^{(l)} = \mathrm{sigmoid}\!\left(w_0 + \sum_{j=1}^{n} w_j x_j^{(l)}\right)$
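The batch update above can be written almost line for line in NumPy. This is only a sketch: the function name, learning rate, epoch count, and OR-style toy data are arbitrary choices for the example.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_sigmoid_perceptron(X, t, eta=0.5, epochs=1000):
    """Batch gradient descent on a sigmoid perceptron.
    X: (L, n) inputs, t: (L,) targets in [0, 1]."""
    L, n = X.shape
    w0, w = 0.0, np.zeros(n)
    for _ in range(epochs):
        o = sigmoid(w0 + X @ w)                # o^(l) for every example
        delta = (t - o) * o * (1.0 - o)        # (t - o) * o * (1 - o)
        w0 += eta * delta.sum()                # bias weight sees a constant input of 1
        w += eta * (delta @ X)                 # w_j += eta * sum_l delta^(l) * x_j^(l)
    return w0, w

# Hypothetical data: learn the OR of two binary inputs
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 1.])
w0, w = train_sigmoid_perceptron(X, t)
print(np.round(sigmoid(w0 + X @ w), 2))
```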

Page 7: Neural Networks

Boolean Functions

e.g. using a step activation function with threshold 0, can we learn the function X1 AND X2?

X1 OR X2?

X1 AND NOT X2?

X1 XOR X2?
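As a sketch of the linearly separable cases, here are hand-picked weights for a threshold-0 step perceptron. The weight values are illustrative choices (many others work); the closing comment notes why XOR is the odd one out.

```python
def step_perceptron(w0, w1, w2, x1, x2):
    """Step activation with threshold 0: output 1 if w0 + w1*x1 + w2*x2 >= 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Illustrative weight choices (not unique):
for name, (w0, w1, w2) in [("X1 AND X2",     (-1.5, 1.0,  1.0)),
                           ("X1 OR X2",      (-0.5, 1.0,  1.0)),
                           ("X1 AND NOT X2", (-0.5, 1.0, -1.0))]:
    print(name, [step_perceptron(w0, w1, w2, x1, x2) for x1, x2 in inputs])

# X1 XOR X2: no single linear threshold unit can produce [0, 1, 1, 0],
# because the positive and negative examples are not linearly separable.
```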

Page 8: Neural Networks

Multilayer Networks

The class of functions representable by a single perceptron is limited

Think of nonlinear functions:

$o(\mathbf{x}) = h\!\left(\sum_j W_j\, f\!\left(\sum_i w_{ji}\, x_i\right)\right)$

Page 9: Neural Networks

A 1-Hidden layer Net

N_input = 2, N_hidden = 3, N_output = 1

Page 10: Neural Networks

Backpropagation

HW2 – Problem 2

Output of the k-th output unit from input x:

With bias: add a constant term for every non-input unit

Learn w to minimize

$o_k(\mathbf{x}) = f\!\left(\sum_j W_{kj}\, f\!\left(\sum_i w_{ji}\, x_i\right)\right)$

$E = \dfrac{1}{2} \sum_{k=1}^{K} \left(t_k - o_k(\mathbf{x})\right)^2$
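A sketch of the forward pass and error for the 2–3–1 network of the previous slide, assuming sigmoid activations in both layers; the function names and the randomly drawn example weights are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, w_output):
    """o_k(x) = f( sum_j W_kj * f( sum_i w_ji * x_i ) ), with a bias term for
    every non-input unit added by prepending a constant input of 1."""
    x_b = np.append(1.0, x)            # bias input for the hidden units
    y = sigmoid(w_hidden @ x_b)        # hidden-unit outputs y_j
    y_b = np.append(1.0, y)            # bias input for the output units
    o = sigmoid(w_output @ y_b)        # network outputs o_k
    return y, o

def error(o, t):
    """E = 1/2 * sum_k (t_k - o_k)^2"""
    return 0.5 * np.sum((t - o) ** 2)

# N_input = 2, N_hidden = 3, N_output = 1; example weights drawn at random
rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(3, 3))   # 3 hidden units, each sees bias + 2 inputs
w_output = rng.normal(size=(1, 4))   # 1 output unit, sees bias + 3 hidden outputs

y, o = forward(np.array([0.5, -1.0]), w_hidden, w_output)
print("hidden:", y, "output:", o, "error vs t=1:", error(o, np.array([1.0])))
```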

Page 11: Neural Networks

Backpropagation

Initialize all weights.

Do until convergence:

1. Input a training example to the network and compute the output $o_k$

2. Update each hidden-to-output weight $w_{kj}$ by
$w_{kj} \leftarrow w_{kj} + \eta\, \delta_k\, y_j$, where $\delta_k = (t_k - o_k)\, f'(net_k)$ and $y_j$ is the output from hidden unit $j$

3. Update each input-to-hidden weight $w_{ji}$ by
$w_{ji} \leftarrow w_{ji} + \eta\, \delta_j\, x_i$, where $\delta_j = f'(net_j) \sum_k w_{kj}\, \delta_k$
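Putting steps 1–3 together, a sketch of stochastic backpropagation for a 1-hidden-layer sigmoid network. The learning rate, random initialization, epoch count, and XOR toy data are arbitrary choices for illustration; for the sigmoid, f'(net) = o * (1 - o) is used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, w_hidden, w_output, eta=0.5):
    """One training example: forward pass, then update both weight layers in place."""
    x_b = np.append(1.0, x)                     # input plus bias
    y = sigmoid(w_hidden @ x_b)                 # hidden outputs y_j
    y_b = np.append(1.0, y)
    o = sigmoid(w_output @ y_b)                 # network outputs o_k

    delta_k = (t - o) * o * (1.0 - o)           # delta_k = (t_k - o_k) * f'(net_k)
    # delta_j = f'(net_j) * sum_k w_kj * delta_k  (skip the bias column of w_output)
    delta_j = y * (1.0 - y) * (w_output[:, 1:].T @ delta_k)

    w_output += eta * np.outer(delta_k, y_b)    # w_kj += eta * delta_k * y_j
    w_hidden += eta * np.outer(delta_j, x_b)    # w_ji += eta * delta_j * x_i
    return o

# A few thousand stochastic passes over XOR data with a 2-3-1 net
# (may or may not reach a good solution, depending on the initialization)
rng = np.random.default_rng(1)
w_hidden, w_output = rng.normal(size=(3, 3)), rng.normal(size=(1, 4))
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])
for epoch in range(5000):
    for x, t in zip(X, T):
        backprop_step(x, t, w_hidden, w_output)

# eta=0.0 leaves the weights unchanged, so this just reports the outputs
print([round(float(backprop_step(x, t, w_hidden, w_output, eta=0.0)[0]), 2)
       for x, t in zip(X, T)])
```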