
Transcript of net09-23


Neural Networks, Lecture 6: Perceptron Learning (September 23, 2010)


    Refresher: Perceptron Training Algorithm

Algorithm Perceptron;

Start with a randomly chosen weight vector w0;
Let k = 1;
while there exist input vectors that are misclassified by wk-1, do
    Let ij be a misclassified input vector;
    Let xk = class(ij)·ij, implying that wk-1·xk < 0;
    Update the weight vector to wk = wk-1 + η·xk (η: learning rate);
    Increment k;
end-while;
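
For readers who prefer working code, here is a minimal Python sketch of the loop above. The function name train_perceptron, the use of NumPy, the max_iters cap, and the choice to treat w·i = 0 as misclassified are assumptions of this sketch, not part of the lecture.

import numpy as np

def train_perceptron(inputs, classes, w0, max_iters=1000):
    """Minimal sketch of the perceptron training loop described above.

    inputs:  input vectors i_j, each with the offset 1 already prepended
    classes: class labels (+1 or -1), one per input vector
    w0:      randomly chosen initial weight vector
    """
    w = np.asarray(w0, dtype=float)
    for k in range(1, max_iters + 1):
        # An input i_j is misclassified when class(i_j) * (w . i_j) <= 0.
        misclassified = [(i, c) for i, c in zip(inputs, classes)
                         if c * np.dot(w, i) <= 0]
        if not misclassified:
            return w, k - 1                      # every vector classified correctly
        i_j, c = misclassified[0]                # pick any misclassified vector
        x_k = c * np.asarray(i_j, dtype=float)   # x_k = class(i_j) * i_j
        w = w + x_k                              # update with learning rate eta = 1
    return w, max_iters                          # give up after max_iters passes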


    Another Refresher: Linear Algebra

How can we visualize a straight line defined by an equation such as w0 + w1i1 + w2i2 = 0? One possibility is to determine the points where the line crosses the coordinate axes:

i1 = 0:  w0 + w2i2 = 0  ⇒  w2i2 = -w0  ⇒  i2 = -w0/w2

i2 = 0:  w0 + w1i1 = 0  ⇒  w1i1 = -w0  ⇒  i1 = -w0/w1

Thus, the line crosses the axes at (0, -w0/w2)T and (-w0/w1, 0)T.

If w1 or w2 is 0, it just means that the line is horizontal or vertical, respectively.

If w0 is 0, the line passes through the origin, and its slope i2/i1 is given by:
w1i1 + w2i2 = 0  ⇒  w2i2 = -w1i1  ⇒  i2/i1 = -w1/w2
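
The same algebra can be checked mechanically; the helper below (axis_crossings is a hypothetical name) simply implements the three formulas above and assumes w1 and w2 are nonzero.

def axis_crossings(w0, w1, w2):
    """Axis crossings and slope of the line w0 + w1*i1 + w2*i2 = 0.
    Assumes w1 != 0 and w2 != 0 so that both crossings and the slope exist."""
    i2_crossing = (0.0, -w0 / w2)   # set i1 = 0  ->  i2 = -w0/w2
    i1_crossing = (-w0 / w1, 0.0)   # set i2 = 0  ->  i1 = -w0/w1
    slope = -w1 / w2                # change in i2 per unit change in i1
    return i1_crossing, i2_crossing, slope

# With the weights used on the next slide, w = (2, 1, -2), the line crosses
# at (-2, 0) and (0, 1) and has slope 0.5:
print(axis_crossings(2, 1, -2))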


    Perceptron Learning Example

[Figure: the training points plotted in the (i1, i2) plane, with both axes running from -3 to 3, labeled class -1 and class 1, together with the dividing line.]

We would like our perceptron to correctly classify the five 2-dimensional data points shown in the figure. Let the random initial weight vector be w0 = (2, 1, -2)T.

Then the dividing line crosses the axes at (0, 1)T and (-2, 0)T.


Let us pick the misclassified point (-2, -1)T for learning:

i = (1, -2, -1)T (include offset 1)

x1 = (-1)·(1, -2, -1)T (i is in class -1)

x1 = (-1, 2, 1)T
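
A quick numerical check of this slide, sketched in Python with variable names of my own choosing:

import numpy as np

w0 = np.array([2.0, 1.0, -2.0])      # initial weight vector
i  = np.array([1.0, -2.0, -1.0])     # the point (-2, -1) with offset 1 prepended
print(np.dot(w0, i))                 # 2 - 2 + 2 = 2 > 0, so the perceptron says
                                     # class +1, but the point is in class -1
x1 = -1 * i                          # x1 = class(i) * i = (-1, 2, 1)
print(x1, np.dot(w0, x1))            # w0 · x1 = -2 < 0, as required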


    Perceptron Learning Example

[Figure: the same points in the (i1, i2) plane, labeled class -1 and class 1, with the dividing line given by w1.]

w1 = w0 + η·x1 (let us set η = 1 for simplicity)

    w1 = (2, 1, -2)T + (-1, 2, 1)T = (1, 3, -1)T

    The new dividing line crosses at (0, 1)T and (-1/3, 0)T.

Let us pick the next misclassified point (0, 2)T for learning:

    i = (1, 0, 2)T (include offset 1)

x2 = (1, 0, 2)T (i is in class 1)

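
The same kind of check for this update step (again a sketch; all names are illustrative):

import numpy as np

w1 = np.array([2.0, 1.0, -2.0]) + np.array([-1.0, 2.0, 1.0])   # w0 + x1 = (1, 3, -1)
print(w1)
print(-w1[0] / w1[2], -w1[0] / w1[1])       # crossings: i2 = 1, i1 = -1/3

i_next = np.array([1.0, 0.0, 2.0])          # the point (0, 2) with offset 1 prepended
print(np.dot(w1, i_next))                   # 1 + 0 - 2 = -1 < 0, but the point is
                                            # in class +1, so it is still misclassified
x2 = 1 * i_next                             # x2 = class(i) * i = (1, 0, 2)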


    Perceptron Learning Example

[Figure: the same points in the (i1, i2) plane, labeled class -1 and class 1, with the dividing line given by w2.]

w2 = w1 + η·x2

    w2 = (1, 3, -1)T + (1, 0, 2)T = (2, 3, 1)T

    Now the line crosses at (0, -2)T and (-2/3, 0)T.

With this weight vector, the perceptron achieves perfect classification!

    The learning process terminates.

In most cases, many more iterations are necessary than in this example.
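
The final update can be verified the same way. Only the two points picked above are given numerically in this transcript, so the sketch below checks just those:

import numpy as np

w2 = np.array([1.0, 3.0, -1.0]) + np.array([1.0, 0.0, 2.0])    # w1 + x2 = (2, 3, 1)
print(w2)
print(-w2[0] / w2[2], -w2[0] / w2[1])       # crossings: i2 = -2, i1 = -2/3

# The two points named on these slides now lie on the correct side of the line:
print(np.dot(w2, [1.0, -2.0, -1.0]))        # 2 - 6 - 1 = -5 < 0  ->  class -1, correct
print(np.dot(w2, [1.0,  0.0,  2.0]))        # 2 + 0 + 2 =  4 > 0  ->  class +1, correct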


    Perceptron Learning Results

We proved that the perceptron learning algorithm is guaranteed to find a solution to a classification problem if it is linearly separable.

    But are those solutions optimal?

One of the reasons why we are interested in neural networks is that they are able to generalize, i.e., give plausible output for new (untrained) inputs.

    How well does a perceptron deal with new inputs?


Perceptron Learning Results

Perfect classification of training samples, but may not generalize well to new (untrained) samples.


Perceptron Learning Results

This function is likely to perform better classification on new samples.


Adalines

Idea behind adaptive linear elements (Adalines):

Compute a continuous, differentiable error function between the net input and the desired output (before applying the threshold function).

For example, compute the mean squared error (MSE) between the net input for every training vector and its class (1 or -1). Then find those weights for which the error is minimal.

With a differentiable error function, we can use the gradient descent technique to find this absolute minimum of the error function.
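
A minimal sketch of such an error function in Python, assuming the conventions of the perceptron example (offset 1 prepended to each input, target classes +1 and -1); the name mse_error is hypothetical:

import numpy as np

def mse_error(w, inputs, targets):
    """Mean squared error between the net input w . i and the target class,
    computed before any threshold function is applied."""
    inputs = np.asarray(inputs, dtype=float)    # one row per training vector
    targets = np.asarray(targets, dtype=float)  # +1 or -1 per training vector
    net = inputs @ w                            # net input for every vector
    return np.mean((targets - net) ** 2)        # differentiable in w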


    Gradient Descent

Gradient descent is a very common technique for finding the absolute minimum of a function. It is especially useful for high-dimensional functions.

We will use it to iteratively minimize the network's (or neuron's) error by finding the gradient of the error surface in weight space and adjusting the weights in the opposite direction.
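
For the MSE sketch above, the gradient has a simple closed form, so one descent step might look like this (the learning rate eta, the function name, and its default value are illustrative assumptions):

import numpy as np

def gradient_descent_step(w, inputs, targets, eta=0.01):
    """One gradient descent step on the MSE error from the previous sketch.
    The gradient of mean((t - w.i)^2) with respect to w is
    -2/n * sum((t - w.i) * i); we move the weights against it."""
    inputs = np.asarray(inputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    residual = targets - inputs @ w
    grad = -2.0 * inputs.T @ residual / len(targets)
    return w - eta * grad                       # step opposite to the gradient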


    Gradient Descent

Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x):

[Figure: curve f(x) with the tangent line at x0; the slope of the tangent is f'(x0).]

x1 = x0 - η·f'(x0)

Repeat this iteratively until, for some xi, f'(xi) is sufficiently close to 0.
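
A sketch of this one-dimensional procedure; the example function f(x) = (x - 3)^2 and the learning rate are made up for illustration:

def descend_1d(df, x0, eta=0.1, tol=1e-6, max_iters=10000):
    """Iterate x_{k+1} = x_k - eta * f'(x_k) until the slope is close to 0."""
    x = x0
    for _ in range(max_iters):
        slope = df(x)                 # f'(x_k)
        if abs(slope) < tol:          # stop when the slope is near zero
            break
        x = x - eta * slope
    return x

# f(x) = (x - 3)^2 has derivative f'(x) = 2*(x - 3) and its minimum at x = 3:
print(descend_1d(lambda x: 2 * (x - 3), x0=0.0))   # converges to roughly 3.0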


Gradient Descent

Gradients of two-dimensional functions:

The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient always points in the direction of the steepest increase of the function. In order to find the function's minimum, we should always move against the gradient.
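
The same idea in two dimensions, with the gradient estimated numerically; the example function g and all constants below are illustrative assumptions:

import numpy as np

def numerical_gradient(f, p, h=1e-6):
    """Estimate the gradient of a two-dimensional function f at point p by
    central differences; it points in the direction of steepest increase."""
    x, y = p
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return np.array([dfdx, dfdy])

# Example function with its minimum at (1, -2); stepping against the gradient
# moves us toward that minimum:
g = lambda x, y: (x - 1) ** 2 + (y + 2) ** 2
p = np.array([0.0, 0.0])
for _ in range(200):
    p = p - 0.1 * numerical_gradient(g, p)
print(p)   # approaches (1, -2)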