Lecture 3 Perceptrons


Transcript of Lecture 3 Perceptrons

Page 1: Lecture 3 Perceptrons

7/21/2019 Lecture 3 Perceptrons

http://slidepdf.com/reader/full/lecture-3-perceptrons 1/46

The Linear Classifier,

also known as the “Perceptron”

COMP24111 lecture 3

Page 2: Lecture 3 Perceptrons


LAST WEEK: our first “machine learning” algorithm

Testing point x

For each training datapoint x’

 measure distance(x,x’)

End 

Sort distances

Select K nearest

 Assign most common class 

The K-Nearest Neighbour Classifier

Make your own notes on its advantages / disadvantages.

I'll ask for volunteers next time we meet…
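The k-NN steps listed above can be sketched as runnable code. This is an illustrative sketch, not from the slides: the function name `knn_classify`, the Euclidean distance choice, and the toy height/weight data are all my assumptions.

```python
from collections import Counter

def knn_classify(x, training_data, k):
    """K-Nearest Neighbour: classify test point x by majority vote
    among its K nearest training points."""
    distances = []
    for xp, label in training_data:
        # measure distance(x, x') -- Euclidean distance here
        d = sum((a - b) ** 2 for a, b in zip(x, xp)) ** 0.5
        distances.append((d, label))
    distances.sort(key=lambda pair: pair[0])   # sort distances
    nearest = distances[:k]                    # select K nearest
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]          # assign most common class

# Toy data in the lecture's height/weight spirit (values invented)
data = [((180, 95), "player"), ((185, 100), "player"),
        ((170, 55), "dancer"), ((165, 50), "dancer")]
print(knn_classify((178, 90), data, k=3))  # -> player
```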

Page 3: Lecture 3 Perceptrons


Model (memorize the training data)

Testing Data (no labels)

Training data

Predicted Labels

Learning algorithm (do nothing)

Supervised Learning Pipeline for Nearest Neighbour

Page 4: Lecture 3 Perceptrons


The most important concept in Machine Learning

Page 5: Lecture 3 Perceptrons


Looks good so far…

The most important concept in Machine Learning

Page 6: Lecture 3 Perceptrons


Looks good so far…

Oh no! Mistakes! 

What happened?

The most important concept in Machine Learning

Page 7: Lecture 3 Perceptrons


Looks good so far…

Oh no! Mistakes! 

What happened?

We didn’t have all the data.

We can never assume that we do.

This is called “OVER-FITTING” to the small dataset.

The most important concept in Machine Learning

Page 8: Lecture 3 Perceptrons


The Linear Classifier

COMP24111 lecture 3

Page 9: Lecture 3 Perceptrons


Model (equation)

Testing Data (no labels)

Training data

Predicted Labels

Learning algorithm (search for good parameters)

Supervised Learning Pipeline for Linear Classifiers

Page 10: Lecture 3 Perceptrons


A simpler, more compact model?

height

weight (threshold t marked on the axis)

if (weight > t) then "player" else "dancer"

Page 11: Lecture 3 Perceptrons


What’s an algorithm to find a good threshold?

height

weight

if (weight > t) then "player" else "dancer"

t=40

while ( numMistakes != 0 ){

t = t + 1

numMistakes = testRule(t)

}

Page 12: Lecture 3 Perceptrons


We have our second Machine Learning procedure.

if (weight > t) then "player" else "dancer"

t = 40
while ( numMistakes != 0 )
{
    t = t + 1
    numMistakes = testRule(t)
}

The threshold classifier (also known as a “Decision Stump”)

Page 13: Lecture 3 Perceptrons


Three “ingredients” of a Machine Learning procedure

“Model”: The final product, the thing you have to package up and send to a customer. A piece of code with some parameters that need to be set.

“Error function”: The performance criterion, the function you use to judge how well the parameters of the model are set.

“Learning algorithm”: The algorithm that optimises the model parameters, using the error function to judge how well it is doing.

Page 14: Lecture 3 Perceptrons


Three “ingredients” of a Threshold Classifier

Model

if (x > t) then "player" else "dancer"

Error function

Learning algorithm

t = 40
while ( numMistakes != 0 )
{
    t = t + 1
    numMistakes = testRule(t)
}

General case: we're not just talking about the weight of a rugby player; a threshold can be put on any feature ‘x’.

Page 15: Lecture 3 Perceptrons


Page 16: Lecture 3 Perceptrons


Model (memorize the training data)

Testing Data (no labels)

Training data

Predicted Labels

Learning algorithm (do nothing)

Supervised Learning Pipeline for Nearest Neighbour

Page 17: Lecture 3 Perceptrons


What's the “model” for the Nearest Neighbour classifier?

height

weight

For the k-NN, the model is the training data itself!
- very good accuracy
- very computationally intensive!

Testing point x

For each training datapoint x’

 measure distance(x,x’)

End 

Sort distances

Select K nearest

 Assign most common class 

Page 18: Lecture 3 Perceptrons


height

weight

New data: what’s an algorithm to find a good threshold?

Our model does not match the problem!

1 mistake…

if (weight > t) then "player" else "dancer"

Page 19: Lecture 3 Perceptrons


height

weight

New data: what’s an algorithm to find a good threshold?

But our current model cannot represent this…

if (weight > t) then "player" else "dancer"

Page 20: Lecture 3 Perceptrons


We need a more sophisticated model…

if (x > t) then "player" else "dancer"

Page 21: Lecture 3 Perceptrons


Page 22: Lecture 3 Perceptrons


Input signals sent from other neurons.

If enough signals accumulate, the neuron fires a signal.

Connection strengths determine how the signals are accumulated.

Page 23: Lecture 3 Perceptrons


Inputs x1, x2, x3 arrive with connection weights w1, w2, w3; the weighted signals are added:

a = Σ_{i=1..M} x_i w_i

if (a > t) output = 1, else output = 0

• input signals ‘x’ and coefficients ‘w’ are multiplied
• weights correspond to connection strengths
• signals are added up – if they are enough, FIRE!

(diagram labels: incoming signal, connection strength, activation level, output signal)

Page 24: Lecture 3 Perceptrons


Sum notation (just like a loop from 1 to M):

a = Σ_{i=1..M} x_i w_i

double[] x = ...
double[] w = ...

Multiply corresponding elements and add them up.

if (activation > threshold) FIRE!

Calculation…
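The sum notation really is just a loop from 1 to M. A small sketch of that calculation (the numbers here are invented for illustration):

```python
def activation(x, w):
    """a = sum of x_i * w_i, looping i from 1 to M."""
    a = 0.0
    for xi, wi in zip(x, w):
        a += xi * wi   # multiply corresponding elements and add them up
    return a

a = activation([2.0, 1.0], [0.5, 0.3])
threshold = 1.0
if a > threshold:
    print("FIRE!")     # a = 1.3 > 1.0, so the neuron fires
```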

Page 25: Lecture 3 Perceptrons


if Σ_{i=1..M} x_i w_i > t then output = 1, else output = 0

The Perceptron Decision Rule

Page 26: Lecture 3 Perceptrons


output = 1 / output = 0 (the two regions of the plot)

if Σ_{i=1..M} x_i w_i > t then output = 1, else output = 0

Rugby player = 1, Ballet dancer = 0

Page 27: Lecture 3 Perceptrons


Is this a good decision boundary?

if Σ_{i=1..M} x_i w_i > t then output = 1, else output = 0

Page 28: Lecture 3 Perceptrons


w1 = 1.0
w2 = 0.2
t = 0.05

if Σ_{i=1..M} x_i w_i > t then output = 1, else output = 0

Page 29: Lecture 3 Perceptrons


w1 = 2.1
w2 = 0.2
t = 0.05

if Σ_{i=1..M} x_i w_i > t then output = 1, else output = 0

Page 30: Lecture 3 Perceptrons


w1 = 1.4
w2 = 0.02
t = 0.05

if Σ_{i=1..M} x_i w_i > t then output = 1, else output = 0

Page 31: Lecture 3 Perceptrons


Changing the weights/threshold makes the decision boundary move.

Pointless/impossible to do it by hand; only OK for the simple 2-D case.

We need an algorithm…

w1 = -0.8
w2 = 0.03
t = 0.05

Page 32: Lecture 3 Perceptrons


w = (0.2, 0.5, 0.5)
x = (1.0, 0.5, 2.0)
t = 1.0

a = Σ_{i=1..M} x_i w_i

Q1. What is the activation, a, of the neuron?
Q2. Does the neuron fire?
Q3. What if we set the threshold at 0.5 and weight w3 to zero?

Take a 20 minute break and think about this.

Page 33: Lecture 3 Perceptrons


20 minute break

Page 34: Lecture 3 Perceptrons


w = (0.2, 0.5, 0.5)
x = (1.0, 0.5, 2.0)
t = 1.0

Q1. What is the activation, a, of the neuron?

a = Σ_{i=1..M} x_i w_i = (1.0 × 0.2) + (0.5 × 0.5) + (2.0 × 0.5) = 1.45

Q2. Does the neuron fire?

if (activation > threshold) output = 1, else output = 0

… So yes, it fires!

Page 35: Lecture 3 Perceptrons


w = (0.2, 0.5, 0.5)
x = (1.0, 0.5, 2.0)
t = 1.0

Q3. What if we set the threshold at 0.5 and weight w3 to zero?

a = Σ_{i=1..M} x_i w_i = (1.0 × 0.2) + (0.5 × 0.5) + (2.0 × 0.0) = 0.45

if (activation > threshold) output = 1, else output = 0

… So no, it does not fire!
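Both answers can be checked in code. The values of w, x and t below are reconstructed from the garbled slide text, so treat them as assumptions:

```python
w = [0.2, 0.5, 0.5]   # reconstructed weights
x = [1.0, 0.5, 2.0]   # reconstructed inputs
t = 1.0               # reconstructed threshold

# Q1/Q2: activation and firing
a = sum(xi * wi for xi, wi in zip(x, w))
print(round(a, 2), a > t)   # -> 1.45 True (it fires)

# Q3: threshold at 0.5 and weight w3 set to zero
w[2] = 0.0
t = 0.5
a = sum(xi * wi for xi, wi in zip(x, w))
print(round(a, 2), a > t)   # -> 0.45 False (it does not fire)
```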

Page 36: Lecture 3 Perceptrons


We need a more sophisticated model…

height

weight

if (weight > t) then "player" else "dancer"

if (f(x) > t) then "player" else "dancer"

x1 = height (cm)
x2 = weight (kg)

f(x) = Σ_{i=1..2} w_i x_i = (w1 × x1) + (w2 × x2)

The Perceptron

Page 37: Lecture 3 Perceptrons


The Perceptron

f(x) = Σ_{i=1..2} w_i x_i = (w1 × x1) + (w2 × x2)

height

weight

height

weight

if (f(x) > t) then "player" else "dancer"

w1, w2 and t change the position of the DECISION BOUNDARY.

Decision boundary

Page 38: Lecture 3 Perceptrons


The Perceptron

Error function: number of mistakes (a.k.a. classification error)

height
weight

Model:
if Σ_{i=1..M} w_i x_i > t then ŷ = 1 ("player") else ŷ = 0 ("dancer")

Learning algo: we need to optimise the w and t values…

Page 39: Lecture 3 Perceptrons


Perceptron Learning Rule

new weight = old weight + 0.1 × ( trueLabel − output ) × input

if ( target = 1, output = 1 ), then update = ?
if ( target = 1, output = 0 ), then update = ?
if ( target = 0, output = 1 ), then update = ?
if ( target = 0, output = 0 ), then update = ?

What weight updates do these cases produce?
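Working the four cases through the rule answers the question: correct classifications produce zero update, a missed 1 pushes the weight up, and a false 1 pushes it down (for a positive input). A sketch using the 0.1 learning rate from the slide; the function name and sample input are mine:

```python
RATE = 0.1

def weight_update(target, output, inp):
    """new weight = old weight + 0.1 * (trueLabel - output) * input"""
    return RATE * (target - output) * inp

inp = 1.0
for target in (1, 0):
    for output in (1, 0):
        print(f"target={target} output={output} "
              f"update={weight_update(target, output, inp):+.1f}")
# target=1 output=1 -> +0.0 (correct: no change)
# target=1 output=0 -> +0.1 (fire more easily next time)
# target=0 output=1 -> -0.1 (fire less easily next time)
# target=0 output=0 -> +0.0 (correct: no change)
```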

Page 40: Lecture 3 Perceptrons


initialise weights to random numbers in range -1 to +1

for n = 1 to NUM_ITERATIONS
    for each training example (x, y)
        calculate activation
        for each weight
            update weight by learning rule
        end
    end
end

Perceptron convergence theorem:

 If the data is linearly separable, then application of the

 Perceptron learning rule will find a separating decision boundary,

within a finite number of iterations

Learning algorithm for the Perceptron
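The pseudocode above can be fleshed out into a runnable sketch. The toy data, iteration count, and the trick of learning the threshold t as a bias weight on a constant +1 input are my choices, not the slides':

```python
import random

def train_perceptron(data, num_iterations=500, rate=0.1, seed=0):
    """Perceptron training loop following the slide's pseudocode.
    data: list of (x, label) pairs with labels 0/1. The threshold is
    folded in as a bias weight on a constant +1 input (my choice;
    the slides keep t separate)."""
    rng = random.Random(seed)
    m = len(data[0][0])
    # initialise weights to random numbers in range -1 to +1
    w = [rng.uniform(-1, 1) for _ in range(m + 1)]
    for _ in range(num_iterations):
        for x, label in data:
            xb = list(x) + [1.0]                       # append bias input
            a = sum(xi * wi for xi, wi in zip(xb, w))  # calculate activation
            output = 1 if a > 0 else 0
            for i in range(len(w)):                    # update each weight
                w[i] += rate * (label - output) * xb[i]
    return w

def predict(w, x):
    xb = list(x) + [1.0]
    return 1 if sum(xi * wi for xi, wi in zip(xb, w)) > 0 else 0

# Linearly separable toy data (invented): player = 1, dancer = 0
data = [((1.8, 0.95), 1), ((1.85, 1.0), 1), ((1.7, 0.55), 0), ((1.65, 0.5), 0)]
w = train_perceptron(data)
print([predict(w, x) for x, _ in data])  # -> [1, 1, 0, 0]
```

Because this toy data is linearly separable, the convergence theorem guarantees the loop reaches zero training mistakes within finitely many updates.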

Page 41: Lecture 3 Perceptrons


Model (if… then…)

Testing Data (no labels)

Training data

Predicted Labels

Learning algorithm (search for good parameters)

Supervised Learning Pipeline for Perceptron


Page 42: Lecture 3 Perceptrons


New data… non-linearly separable

height

weight

Our model does not match the problem!

(AGAIN!)

Many mistakes!

if Σ_{i=1..M} w_i x_i > t then "player" else "dancer"


Page 43: Lecture 3 Perceptrons


Multilayer Perceptron

x1

x2

x3

x4

x5

Page 44: Lecture 3 Perceptrons



Page 45: Lecture 3 Perceptrons


MLP decision boundary – nonlinear problems, solved!

height

weight


Page 46: Lecture 3 Perceptrons


Neural Networks - summary

Perceptrons are a (simple) emulation of a neuron.

Layering perceptrons gives you… a multilayer perceptron. An MLP is one type of neural network; there are others.

An MLP with sigmoid activation functions can solve highly nonlinear problems.

Downside – we cannot use the simple perceptron learning algorithm.

Instead we have the backpropagation algorithm.

This is outside the scope of this introductory course.