Neural Networks


Page 1: Neural Networks

Neural Networks

10701/15781 Recitation

February 12, 2008

Parts of the slides are from previous years’ 10701 recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.

Page 2: Neural Networks

Recall Linear Regression

Prediction of continuous variables

Learn the mapping f: X → Y

Model is linear in the parameters w (+ some noise):

$f(\mathbf{x}) = \sum_i w_i x_i$  (or $\sum_i w_i h_i(\mathbf{x})$)

Assume Gaussian noise

Learn the MLE $\mathbf{w} = (X^T X)^{-1} X^T Y$
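As a quick illustration of the closed-form solution above, here is a minimal NumPy sketch; the toy data, variable names, and the prepended column of ones (so that w includes an intercept) are choices made for this example, not part of the slides:

```python
import numpy as np

# Hypothetical toy data: 5 examples, 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0],
              [5.0, 2.5]])
y = np.array([3.1, 2.4, 4.6, 7.2, 7.5])

# Prepend a column of ones so w[0] acts as the intercept term
X1 = np.hstack([np.ones((X.shape[0], 1)), X])

# MLE under Gaussian noise: w = (X^T X)^{-1} X^T y
# (solve is numerically preferable to forming the explicit inverse)
w = np.linalg.solve(X1.T @ X1, X1.T @ y)

predictions = X1 @ w
print("w =", w)
print("residual sum of squares =", np.sum((y - predictions) ** 2))
```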

Page 3: Neural Networks

Neural Network

Neural nets are also models with parameters w, which are now called weights.

As before, we want to compute the weights that minimize the sum of squared residuals, which, under a "Gaussian i.i.d. noise" assumption, turns out to give the maximum-likelihood weights.

Instead of explicitly solving for the maximum-likelihood weights, we use gradient descent.

Page 4: Neural Networks

Perceptrons

Input $\mathbf{x} = (x_1, \ldots, x_n)$ and target value $t$:

Output: $o(\mathbf{x}) = f\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right) = f(net)$, where $net = w_0 + \sum_{i=1}^{n} w_i x_i$

e.g. $f(net) = \mathrm{sign}(net) = 1$ if $net \ge 0$, $-1$ otherwise (step function)

or $f(net) = \mathrm{sigmoid}(net) = \dfrac{1}{1 + \exp\!\left(-(w_0 + \sum_{i=1}^{n} w_i x_i)\right)}$

Given training data $\{(\mathbf{x}^{(l)}, t^{(l)})\}_{l=1}^{L}$, find $\mathbf{w}$ which minimizes

$E(\mathbf{w}) = \dfrac{1}{2} \sum_{l=1}^{L} \left(t^{(l)} - o(\mathbf{x}^{(l)})\right)^2$
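A minimal sketch of the quantities on this slide, assuming NumPy; the helper names and the small data set are hypothetical:

```python
import numpy as np

def net(w0, w, x):
    """Weighted sum: net = w0 + sum_i w_i x_i."""
    return w0 + w @ x

def step_output(w0, w, x):
    """Step (sign) activation with threshold 0: 1 if net >= 0, -1 otherwise."""
    return 1.0 if net(w0, w, x) >= 0 else -1.0

def sigmoid_output(w0, w, x):
    """Sigmoid activation: 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + np.exp(-net(w0, w, x)))

def squared_error(w0, w, X, t):
    """E(w) = 1/2 * sum_l (t^(l) - o(x^(l)))^2, using the sigmoid output."""
    outputs = np.array([sigmoid_output(w0, w, x) for x in X])
    return 0.5 * np.sum((t - outputs) ** 2)

# Hypothetical training set: 3 examples with 2 inputs each
X = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
t = np.array([0.0, 1.0, 0.0])
print(squared_error(0.1, np.array([0.5, -0.3]), X, t))
```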

Page 5: Neural Networks

Gradient descent

General framework for finding a minimum of a continuous (differentiable) function f(w).

Start with some initial value $\mathbf{w}^{(1)}$ and compute the gradient vector $\nabla f(\mathbf{w}^{(1)})$.

The next value $\mathbf{w}^{(2)}$ is obtained by moving some distance from $\mathbf{w}^{(1)}$ in the direction of steepest descent, i.e., along the negative of the gradient:

$\mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - \eta\, \nabla f(\mathbf{w}^{(k)})$
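A minimal sketch of this generic update rule; the quadratic objective, step size, iteration count, and starting point below are chosen only for illustration and are not from the slides:

```python
import numpy as np

def f(w):
    # Example objective (not from the slides): a simple bowl with minimum at (1, -2)
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def grad_f(w):
    # Gradient vector of f at w
    return np.array([2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)])

w = np.array([5.0, 5.0])   # initial value w^(1)
eta = 0.1                  # step size (learning rate)

for k in range(100):
    w = w - eta * grad_f(w)   # move along the negative gradient

print("final w =", w, "f(w) =", f(w))
```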

Page 6: Neural Networks

Gradient Descent on a Perceptron

The sigmoid perceptron update rule

$w_j \leftarrow w_j + \eta \sum_{l=1}^{L} \left(t^{(l)} - o^{(l)}\right) o^{(l)} \left(1 - o^{(l)}\right) x_j^{(l)}$

where $o^{(l)} = \mathrm{sigmoid}\!\left(w_0 + \sum_{j=1}^{n} w_j x_j^{(l)}\right)$
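The batch update above can be written almost line for line in NumPy. This is only a sketch: the function name, learning rate, epoch count, and OR-style toy data are arbitrary choices for the example.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_sigmoid_perceptron(X, t, eta=0.5, epochs=1000):
    """Batch gradient descent on a sigmoid perceptron.
    X: (L, n) inputs, t: (L,) targets in [0, 1]."""
    L, n = X.shape
    w0, w = 0.0, np.zeros(n)
    for _ in range(epochs):
        o = sigmoid(w0 + X @ w)                # o^(l) for every example
        delta = (t - o) * o * (1.0 - o)        # (t - o) * o * (1 - o)
        w0 += eta * delta.sum()                # bias weight sees a constant input of 1
        w += eta * (delta @ X)                 # w_j += eta * sum_l delta^(l) * x_j^(l)
    return w0, w

# Hypothetical data: learn the OR of two binary inputs
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 1.])
w0, w = train_sigmoid_perceptron(X, t)
print(np.round(sigmoid(w0 + X @ w), 2))
```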

Page 7: Neural Networks

Boolean Functions

e.g. using a step activation function with threshold 0, can we learn the function X1 AND X2?

X1 OR X2?

X1 AND NOT X2?

X1 XOR X2?
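As a sketch of the linearly separable cases, here are hand-picked weights for a threshold-0 step perceptron. The weight values are illustrative choices (many others work); the closing comment notes why XOR is the odd one out.

```python
def step_perceptron(w0, w1, w2, x1, x2):
    """Step activation with threshold 0: output 1 if w0 + w1*x1 + w2*x2 >= 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Illustrative weight choices (not unique):
for name, (w0, w1, w2) in [("X1 AND X2",     (-1.5, 1.0,  1.0)),
                           ("X1 OR X2",      (-0.5, 1.0,  1.0)),
                           ("X1 AND NOT X2", (-0.5, 1.0, -1.0))]:
    print(name, [step_perceptron(w0, w1, w2, x1, x2) for x1, x2 in inputs])

# X1 XOR X2: no single linear threshold unit can produce [0, 1, 1, 0],
# because the positive and negative examples are not linearly separable.
```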

Page 8: Neural Networks

Multilayer Networks

The class of functions representable by a single perceptron is limited

Think of nonlinear functions:

$o(\mathbf{x}) = h\!\left(\sum_j W_j\, f\!\left(\sum_i w_{ji}\, x_i\right)\right)$

Page 9: Neural Networks

A 1-Hidden layer Net

N_input = 2, N_hidden = 3, N_output = 1

Page 10: Neural Networks

Backpropagation

HW2 – Problem 2

Output of the k-th output unit from input x:

With bias: add a constant term for every non-input unit

Learn w to minimize

$o_k(\mathbf{x}) = f\!\left(\sum_j W_{kj}\, f\!\left(\sum_i w_{ji}\, x_i\right)\right)$

$E = \dfrac{1}{2} \sum_{k=1}^{K} \left(t_k - o_k(\mathbf{x})\right)^2$
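A sketch of the forward pass and error for the 2–3–1 network of the previous slide, assuming sigmoid activations in both layers; the function names and the randomly drawn example weights are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, w_output):
    """o_k(x) = f( sum_j W_kj * f( sum_i w_ji * x_i ) ), with a bias term for
    every non-input unit added by prepending a constant input of 1."""
    x_b = np.append(1.0, x)            # bias input for the hidden units
    y = sigmoid(w_hidden @ x_b)        # hidden-unit outputs y_j
    y_b = np.append(1.0, y)            # bias input for the output units
    o = sigmoid(w_output @ y_b)        # network outputs o_k
    return y, o

def error(o, t):
    """E = 1/2 * sum_k (t_k - o_k)^2"""
    return 0.5 * np.sum((t - o) ** 2)

# N_input = 2, N_hidden = 3, N_output = 1; example weights drawn at random
rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(3, 3))   # 3 hidden units, each sees bias + 2 inputs
w_output = rng.normal(size=(1, 4))   # 1 output unit, sees bias + 3 hidden outputs

y, o = forward(np.array([0.5, -1.0]), w_hidden, w_output)
print("hidden:", y, "output:", o, "error vs t=1:", error(o, np.array([1.0])))
```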

Page 11: Neural Networks

Backpropagation

Initialize all weights.

Do until convergence:

1. Input a training example to the network and compute the output $o_k$

2. Update each hidden-to-output weight $w_{kj}$ by
$w_{kj} \leftarrow w_{kj} + \eta\, \delta_k\, y_j$, where $\delta_k = (t_k - o_k)\, f'(net_k)$ and $y_j$ is the output from hidden unit $j$

3. Update each input-to-hidden weight $w_{ji}$ by
$w_{ji} \leftarrow w_{ji} + \eta\, \delta_j\, x_i$, where $\delta_j = f'(net_j) \sum_k w_{kj}\, \delta_k$
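Putting steps 1–3 together, a sketch of stochastic backpropagation for a 1-hidden-layer sigmoid network. The learning rate, random initialization, epoch count, and XOR toy data are arbitrary choices for illustration; for the sigmoid, f'(net) = o * (1 - o) is used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, w_hidden, w_output, eta=0.5):
    """One training example: forward pass, then update both weight layers in place."""
    x_b = np.append(1.0, x)                     # input plus bias
    y = sigmoid(w_hidden @ x_b)                 # hidden outputs y_j
    y_b = np.append(1.0, y)
    o = sigmoid(w_output @ y_b)                 # network outputs o_k

    delta_k = (t - o) * o * (1.0 - o)           # delta_k = (t_k - o_k) * f'(net_k)
    # delta_j = f'(net_j) * sum_k w_kj * delta_k  (skip the bias column of w_output)
    delta_j = y * (1.0 - y) * (w_output[:, 1:].T @ delta_k)

    w_output += eta * np.outer(delta_k, y_b)    # w_kj += eta * delta_k * y_j
    w_hidden += eta * np.outer(delta_j, x_b)    # w_ji += eta * delta_j * x_i
    return o

# A few thousand stochastic passes over XOR data with a 2-3-1 net
# (may or may not reach a good solution, depending on the initialization)
rng = np.random.default_rng(1)
w_hidden, w_output = rng.normal(size=(3, 3)), rng.normal(size=(1, 4))
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])
for epoch in range(5000):
    for x, t in zip(X, T):
        backprop_step(x, t, w_hidden, w_output)

# eta=0.0 leaves the weights unchanged, so this just reports the outputs
print([round(float(backprop_step(x, t, w_hidden, w_output, eta=0.0)[0]), 2)
       for x, t in zip(X, T)])
```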