
Transcript of net09-23


Neural Networks, Lecture 6: Perceptron Learning (September 23, 2010)


    Refresher: Perceptron Training Algorithm

Algorithm Perceptron;

Start with a randomly chosen weight vector w0;
Let k = 1;
while there exist input vectors that are misclassified by wk-1, do
    Let ij be a misclassified input vector;
    Let xk = class(ij)·ij, implying that wk-1·xk < 0;
    Update the weight vector to wk = wk-1 + η·xk (η: learning rate);
    Increment k;
end-while;
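
For readers who prefer working code, here is a minimal Python sketch of the loop above. The function name train_perceptron, the use of NumPy, the max_iters cap, and the choice to treat w·i = 0 as misclassified are assumptions of this sketch, not part of the lecture.

import numpy as np

def train_perceptron(inputs, classes, w0, max_iters=1000):
    """Minimal sketch of the perceptron training loop described above.

    inputs:  input vectors i_j, each with the offset 1 already prepended
    classes: class labels (+1 or -1), one per input vector
    w0:      randomly chosen initial weight vector
    """
    w = np.asarray(w0, dtype=float)
    for k in range(1, max_iters + 1):
        # An input i_j is misclassified when class(i_j) * (w . i_j) <= 0.
        misclassified = [(i, c) for i, c in zip(inputs, classes)
                         if c * np.dot(w, i) <= 0]
        if not misclassified:
            return w, k - 1                      # every vector classified correctly
        i_j, c = misclassified[0]                # pick any misclassified vector
        x_k = c * np.asarray(i_j, dtype=float)   # x_k = class(i_j) * i_j
        w = w + x_k                              # update with learning rate eta = 1
    return w, max_iters                          # give up after max_iters passes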


    Another Refresher: Linear Algebra

How can we visualize a straight line defined by an equation such as w0 + w1i1 + w2i2 = 0? One possibility is to determine the points where the line crosses the coordinate axes:

i1 = 0:  w0 + w2i2 = 0  ⇒  w2i2 = -w0  ⇒  i2 = -w0/w2

i2 = 0:  w0 + w1i1 = 0  ⇒  w1i1 = -w0  ⇒  i1 = -w0/w1

Thus, the line crosses the axes at (0, -w0/w2)T and (-w0/w1, 0)T.

If w1 or w2 is 0, it just means that the line is horizontal or vertical, respectively.

If w0 is 0, the line passes through the origin, and its slope i2/i1 is given by:
w1i1 + w2i2 = 0  ⇒  w2i2 = -w1i1  ⇒  i2/i1 = -w1/w2
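
The same algebra can be checked mechanically; the helper below (axis_crossings is a hypothetical name) simply implements the three formulas above and assumes w1 and w2 are nonzero.

def axis_crossings(w0, w1, w2):
    """Axis crossings and slope of the line w0 + w1*i1 + w2*i2 = 0.
    Assumes w1 != 0 and w2 != 0 so that both crossings and the slope exist."""
    i2_crossing = (0.0, -w0 / w2)   # set i1 = 0  ->  i2 = -w0/w2
    i1_crossing = (-w0 / w1, 0.0)   # set i2 = 0  ->  i1 = -w0/w1
    slope = -w1 / w2                # change in i2 per unit change in i1
    return i1_crossing, i2_crossing, slope

# With the weights used on the next slide, w = (2, 1, -2), the line crosses
# at (-2, 0) and (0, 1) and has slope 0.5:
print(axis_crossings(2, 1, -2))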


    Perceptron Learning Example

[Figure: the training points plotted in the (i1, i2) plane, with both axes running from -3 to 3, labeled class -1 and class 1, together with the dividing line.]

We would like our perceptron to correctly classify the five 2-dimensional data points shown in the figure. Let the random initial weight vector be w0 = (2, 1, -2)T.

Then the dividing line crosses the axes at (0, 1)T and (-2, 0)T.


Let us pick the misclassified point (-2, -1)T for learning:

i = (1, -2, -1)T (include offset 1)

x1 = (-1)·(1, -2, -1)T (i is in class -1)

x1 = (-1, 2, 1)T
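
A quick numerical check of this slide, sketched in Python with variable names of my own choosing:

import numpy as np

w0 = np.array([2.0, 1.0, -2.0])      # initial weight vector
i  = np.array([1.0, -2.0, -1.0])     # the point (-2, -1) with offset 1 prepended
print(np.dot(w0, i))                 # 2 - 2 + 2 = 2 > 0, so the perceptron says
                                     # class +1, but the point is in class -1
x1 = -1 * i                          # x1 = class(i) * i = (-1, 2, 1)
print(x1, np.dot(w0, x1))            # w0 · x1 = -2 < 0, as required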


    Perceptron Learning Example

[Figure: the same points in the (i1, i2) plane, labeled class -1 and class 1, with the dividing line given by w1.]

w1 = w0 + η·x1 (let us set η = 1 for simplicity)

    w1 = (2, 1, -2)T + (-1, 2, 1)T = (1, 3, -1)T

    The new dividing line crosses at (0, 1)T and (-1/3, 0)T.

Let us pick the next misclassified point (0, 2)T for learning:

    i = (1, 0, 2)T (include offset 1)

x2 = (1, 0, 2)T (i is in class 1)

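
The same kind of check for this update step (again a sketch; all names are illustrative):

import numpy as np

w1 = np.array([2.0, 1.0, -2.0]) + np.array([-1.0, 2.0, 1.0])   # w0 + x1 = (1, 3, -1)
print(w1)
print(-w1[0] / w1[2], -w1[0] / w1[1])       # crossings: i2 = 1, i1 = -1/3

i_next = np.array([1.0, 0.0, 2.0])          # the point (0, 2) with offset 1 prepended
print(np.dot(w1, i_next))                   # 1 + 0 - 2 = -1 < 0, but the point is
                                            # in class +1, so it is still misclassified
x2 = 1 * i_next                             # x2 = class(i) * i = (1, 0, 2)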


    Perceptron Learning Example

[Figure: the same points in the (i1, i2) plane, labeled class -1 and class 1, with the dividing line given by w2.]

w2 = w1 + η·x2

    w2 = (1, 3, -1)T + (1, 0, 2)T = (2, 3, 1)T

    Now the line crosses at (0, -2)T and (-2/3, 0)T.

With this weight vector, the perceptron achieves perfect classification!

    The learning process terminates.

In most cases, many more iterations are necessary than in this example.
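
The final update can be verified the same way. Only the two points picked above are given numerically in this transcript, so the sketch below checks just those:

import numpy as np

w2 = np.array([1.0, 3.0, -1.0]) + np.array([1.0, 0.0, 2.0])    # w1 + x2 = (2, 3, 1)
print(w2)
print(-w2[0] / w2[2], -w2[0] / w2[1])       # crossings: i2 = -2, i1 = -2/3

# The two points named on these slides now lie on the correct side of the line:
print(np.dot(w2, [1.0, -2.0, -1.0]))        # 2 - 6 - 1 = -5 < 0  ->  class -1, correct
print(np.dot(w2, [1.0,  0.0,  2.0]))        # 2 + 0 + 2 =  4 > 0  ->  class +1, correct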


    Perceptron Learning Results

We proved that the perceptron learning algorithm is guaranteed to find a solution to a classification problem if it is linearly separable.

    But are those solutions optimal?

One of the reasons why we are interested in neural networks is that they are able to generalize, i.e., give plausible output for new (untrained) inputs.

    How well does a perceptron deal with new inputs?


Perceptron Learning Results

Perfect classification of training samples, but may not generalize well to new (untrained) samples.


Perceptron Learning Results

This function is likely to perform better classification on new samples.


Adalines

Idea behind adaptive linear elements (Adalines):

Compute a continuous, differentiable error function between the net input and the desired output (before applying the threshold function).

For example, compute the mean squared error (MSE) between the net input for every training vector and its class (1 or -1). Then find those weights for which the error is minimal.

With a differentiable error function, we can use the gradient descent technique to find this absolute minimum of the error function.
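
A minimal sketch of such an error function in Python, assuming the conventions of the perceptron example (offset 1 prepended to each input, target classes +1 and -1); the name mse_error is hypothetical:

import numpy as np

def mse_error(w, inputs, targets):
    """Mean squared error between the net input w . i and the target class,
    computed before any threshold function is applied."""
    inputs = np.asarray(inputs, dtype=float)    # one row per training vector
    targets = np.asarray(targets, dtype=float)  # +1 or -1 per training vector
    net = inputs @ w                            # net input for every vector
    return np.mean((targets - net) ** 2)        # differentiable in w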


    Gradient Descent

Gradient descent is a very common technique for finding the absolute minimum of a function. It is especially useful for high-dimensional functions.

We will use it to iteratively minimize the network's (or neuron's) error by finding the gradient of the error surface in weight space and adjusting the weights in the opposite direction.
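
For the MSE sketch above, the gradient has a simple closed form, so one descent step might look like this (the learning rate eta, the function name, and its default value are illustrative assumptions):

import numpy as np

def gradient_descent_step(w, inputs, targets, eta=0.01):
    """One gradient descent step on the MSE error from the previous sketch.
    The gradient of mean((t - w.i)^2) with respect to w is
    -2/n * sum((t - w.i) * i); we move the weights against it."""
    inputs = np.asarray(inputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    residual = targets - inputs @ w
    grad = -2.0 * inputs.T @ residual / len(targets)
    return w - eta * grad                       # step opposite to the gradient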


    Gradient Descent

Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x):

[Figure: curve f(x) with the tangent line at x0; the slope of the tangent is f'(x0).]

x1 = x0 - η·f'(x0)

Repeat this iteratively until, for some xi, f'(xi) is sufficiently close to 0.
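
A sketch of this one-dimensional procedure; the example function f(x) = (x - 3)^2 and the learning rate are made up for illustration:

def descend_1d(df, x0, eta=0.1, tol=1e-6, max_iters=10000):
    """Iterate x_{k+1} = x_k - eta * f'(x_k) until the slope is close to 0."""
    x = x0
    for _ in range(max_iters):
        slope = df(x)                 # f'(x_k)
        if abs(slope) < tol:          # stop when the slope is near zero
            break
        x = x - eta * slope
    return x

# f(x) = (x - 3)^2 has derivative f'(x) = 2*(x - 3) and its minimum at x = 3:
print(descend_1d(lambda x: 2 * (x - 3), x0=0.0))   # converges to roughly 3.0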


Gradient Descent

Gradients of two-dimensional functions:

The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient always points in the direction of the steepest increase of the function. In order to find the function's minimum, we should always move against the gradient.
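
The same idea in two dimensions, with the gradient estimated numerically; the example function g and all constants below are illustrative assumptions:

import numpy as np

def numerical_gradient(f, p, h=1e-6):
    """Estimate the gradient of a two-dimensional function f at point p by
    central differences; it points in the direction of steepest increase."""
    x, y = p
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return np.array([dfdx, dfdy])

# Example function with its minimum at (1, -2); stepping against the gradient
# moves us toward that minimum:
g = lambda x, y: (x - 1) ** 2 + (y + 2) ** 2
p = np.array([0.0, 0.0])
for _ in range(200):
    p = p - 0.1 * numerical_gradient(g, p)
print(p)   # approaches (1, -2)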