WK1 - Introduction


Transcript of WK1 - Introduction

Page 1: WK1 - Introduction

CS 476: Networks of Neural Computation

WK2 – Perceptron

Dr. Stathis Kasderidis

Dept. of Computer Science

University of Crete

Spring Semester, 2009

Page 2: WK1 - Introduction


Contents

•Elements of Optimisation Theory

  •Definitions

  •Properties of Quadratic Functions

  •Model Algorithm for Smooth Functions

  •Classical Optimisation Methods

    •1st Derivative Methods

    •2nd Derivative Methods

    •Methods for Quadratic functions

  •Other Methods

Page 3: WK1 - Introduction


Contents II

•Perceptron Model

•Convergence Theorem of Perceptron

•Conclusions

Page 4: WK1 - Introduction


Definitions

•Types of Optimisation Problems:

•Unconstrained

•Linear Constraints

•Non-linear Constraints

•General nonlinear constrained optimisation problem definition:

•NCP: minimise F(x), x ∈ ℝᵐ

subject to ci(x) = 0, i = 1,…,m′

ci(x) ≥ 0, i = m′+1,…,m
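•A minimal numerical sketch of posing an NCP of this form (not from the slides): the objective F and the constraints c_eq, c_ineq below are made-up examples, and SciPy's SLSQP solver is assumed to be available:

# Hypothetical example NCP, solved with SciPy's SLSQP (assumed available).
import numpy as np
from scipy.optimize import minimize

def F(x):                          # objective F(x)
    return (x[0] - 1.0)**2 + (x[1] - 2.5)**2

def c_eq(x):                       # equality constraint  c1(x) = 0
    return x[0] + x[1] - 3.0

def c_ineq(x):                     # inequality constraint c2(x) >= 0
    return x[0]

res = minimize(F, x0=np.zeros(2), method="SLSQP",
               constraints=[{"type": "eq",   "fun": c_eq},
                            {"type": "ineq", "fun": c_ineq}])
print(res.x, res.fun)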

Page 5: WK1 - Introduction


Definitions II

•Strong local minimum: A point x* is a SLM of NCP if there exists δ > 0 such that:

•A1: F(x) is defined in N(x*, δ); and

•A2: F(x*) < F(y) for all y ∈ N(x*, δ), y ≠ x*

•Weak local minimum: A point x* is a WLM of NCP if there exists δ > 0 such that:

•B1: F(x) is defined in N(x*, δ);

•B2: F(x*) ≤ F(y) for all y ∈ N(x*, δ); and

•B3: x* is not a strong local minimum

Page 6: WK1 - Introduction


Definitions III

•UCP: minimise F(x), x ∈ ℝᵐ

•Necessary conditions for a minimum of UCP:

•C1: ||g(x*)|| = 0, i.e. x* is a stationary point (g denotes the gradient of F);

•C2: G(x*) is positive semi-definite (G denotes the Hessian of F).

•Sufficient conditions for a minimum of UCP:

•D1: ||g(x*)|| =0;

•D2: G(x*) is positive definite.
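•A quick numerical illustration (an assumed example function, not from the slides) of checking the sufficient conditions D1 and D2 at a candidate point, using the gradient norm and the eigenvalues of the Hessian:

import numpy as np

def grad_F(x):                               # gradient g(x) of F(x) = (x0-1)^2 + 2*(x1+0.5)^2
    return np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)])

def hess_F(x):                               # Hessian G(x) (constant for this F)
    return np.array([[2.0, 0.0], [0.0, 4.0]])

x_star = np.array([1.0, -0.5])               # candidate minimiser
print("D1: ||g(x*)|| =", np.linalg.norm(grad_F(x_star)))        # ~0  => stationary point
print("D2: eig(G(x*)) =", np.linalg.eigvalsh(hess_F(x_star)))   # all > 0 => positive definite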

Page 7: WK1 - Introduction


Definitions IV

•Assume that we expand F in its Taylor series about x*:

F(x*+εp) = F(x*) + ε g(x*)Tp + (1/2)ε² pTG(x*+εθp)p

where x, p ∈ ℝᵐ, 0 ≤ θ ≤ 1, and ε is a positive scalar.

•Any vector p which satisfies:

g(x*)Tp < 0

is called a descent direction at x*.

Page 8: WK1 - Introduction


Properties of Quadratic Functions

•Assume that a quadratic function Φ is given by:

Φ(x) = cTx + (1/2)xTGx

for some constant vector c and a constant symmetric matrix G (the Hessian matrix of Φ).

•The definition of Φ implies the following relation between Φ(x+αp) and Φ(x), for any vectors x, p and scalar α:

Φ(x+αp) = Φ(x) + α(Gx + c)Tp + (1/2)α²pTGp

Page 9: WK1 - Introduction


Properties of Quadratic Functions I

•The function Φ has a stationary point when:

∇Φ(x*) = Gx* + c = 0

•Consequently a stationary point must satisfy the system of linear equations:

Gx* = -c

•The system might have:

•No solutions (c is not a linear combination of columns of G)

•Many solutions (if G is singular)

•A unique solution (if G is non-singular)
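•A minimal sketch, with assumed numbers for G and c, of locating the stationary point by solving Gx* = -c and classifying it through the eigenvalues of G:

import numpy as np

G = np.array([[3.0, 1.0],
              [1.0, 2.0]])                 # assumed constant symmetric Hessian
c = np.array([1.0, -1.0])                  # assumed constant vector

if abs(np.linalg.det(G)) > 1e-12:          # G non-singular => unique stationary point
    x_star = np.linalg.solve(G, -c)        # solve G x* = -c
    print("x* =", x_star)
    print("eigenvalues of G:", np.linalg.eigvalsh(G))   # all > 0 here => global minimum of Phi
else:
    print("G singular: no stationary point or infinitely many")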

Page 10: WK1 - Introduction


Properties of Quadratic Functions II

•If x* is a stationary point it follows that:

Φ(x*+αp) = Φ(x*) + (1/2)α²pTGp

•Hence the behaviour of Φ in a neighbourhood of x* is determined by the matrix G. Let λj and uj denote the j-th eigenvalue and eigenvector of G. By definition:

G uj = λj uj

•The symmetry of G implies that the set {uj}, j=1,…,m, is orthonormal. So, when p is equal to uj:

Φ(x*+αuj) = Φ(x*) + (1/2)α²λj

Page 11: WK1 - Introduction


Properties of Quadratic Functions III

•Thus the change in Φ when moving away from x* along the direction of uj depends on the sign of λj:

•If λj > 0, Φ strictly increases as |α| increases

•If λj < 0, Φ is monotonically decreasing as |α| increases

•If λj = 0, the value of Φ remains constant when moving along any direction parallel to uj

•If G is positive definite, x* is the global minimum of Φ

Page 12: WK1 - Introduction


Properties of Quadratic Functions IV

•If G is positive definite, x* is the global minimum of Φ

•If G is positive semi-definite, a stationary point (if it exists) is a weak local minimum.

•If G is indefinite and non-singular, x* is a saddle point and Φ is unbounded above and below

Page 13: WK1 - Introduction


Model Algorithm for Smooth Functions

•Algorithm U (Model algorithm for m-dimensional unconstrained minimisation)

•Let xk be the current estimate of x*

•U1: [Test for convergence] If the conditions are satisfied, the algorithm terminates with xk as the solution;

•U2: [Compute a search direction] Compute a non-zero m-vector pk, the direction of search

Page 14: WK1 - Introduction


Model Algorithm for Smooth Functions I

•U3: [Compute a step length] Compute a positive scalar αk, the step length, for which it holds that:

F(xk + αkpk) < F(xk)

•U4: [Update the estimate of the minimum] Set:

xk+1 ← xk + αkpk,

k ← k+1

and go back to step U1.

•To satisfy the descent condition (Fk+1 < Fk), pk should be a descent direction:

gkTpk < 0
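•A minimal sketch of Algorithm U, assuming the steepest-descent choice pk = -gk for U2 and a simple halving backtracking for U3 (practical codes use a sufficient-decrease test); the quadratic test function is an assumption for illustration:

import numpy as np

def algorithm_U(F, grad, x0, tol=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # U1: test for convergence
            break
        p = -g                                   # U2: search direction (steepest descent here)
        alpha = 1.0                              # U3: halve until F(x + alpha*p) < F(x)
        while F(x + alpha * p) >= F(x) and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * p                        # U4: update the estimate, k <- k+1
    return x

# Illustrative quadratic F(x) = 0.5 x^T G x + c^T x with assumed G, c:
G = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])
F = lambda x: 0.5 * x @ G @ x + c @ x
grad = lambda x: G @ x + c
print(algorithm_U(F, grad, x0=[0.0, 0.0]))       # approaches the solution of G x = -c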

Page 15: WK1 - Introduction


Classical Optimisation Methods: 1st derivative

•1st Derivative Methods:

•A linear approximation to F about xk is:

F(xk+p) = F(xk) + gkTp

•Steepest Descent Method: Select pk as:

pk = -gk

•It can be shown that:

F(xk+1) - F(x*) ≤ ((λmax - λmin) / (λmax + λmin))² (F(xk) - F(x*))

where λmax and λmin are the largest and smallest eigenvalues of the Hessian G.

Page 16: WK1 - Introduction


Classical Optimisation Methods I: 1st derivative

•κ = λmax/λmin is the spectral condition number of G

•The steepest descent method can be very slow if κ is large!

•Other 1st derivative methods include:

•Discrete Newton method: Approximates the Hessian, G, with finite differences of the gradient g

•Quasi-Newton methods: Approximate the curvature:

skTGsk ≈ (g(xk+sk) - g(xk))Tsk
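•Since ((λmax - λmin)/(λmax + λmin))² = ((κ - 1)/(κ + 1))², the contraction factor of the bound on the previous slide depends only on κ. A quick numeric illustration with arbitrary κ values:

# Contraction factor of the steepest-descent bound for a few assumed kappa values.
for kappa in (2.0, 10.0, 100.0, 1000.0):
    rho = ((kappa - 1.0) / (kappa + 1.0))**2
    print(f"kappa = {kappa:7.1f}  per-iteration factor ~= {rho:.4f}")
# kappa = 1000 gives ~0.996: the error shrinks by only ~0.4% per iteration.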

Page 17: WK1 - Introduction


Classical Optimisation Methods II: 2nd derivative

•2nd Derivative Methods:

•A quadratic approximation to F about xk is:

F(xk+p) = F(xk) + gkTp+(1/2) pTGkp

•The function is minimised by finding the minimum of the quadratic model Φ(p):

Φ(p) = gkTp + (1/2)pTGkp

•This has a solution that satisfies the linear system:

Gkpk = -gk

Page 18: WK1 - Introduction


Classical Optimisation Methods III: 2nd derivative

•The vector pk in the previous equation is called the Newton direction and the method is called the Newton method

•Conjugate Gradient Descent: Another 2nd derivative method
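•A minimal sketch of the resulting Newton iteration (no line search or Hessian modification, and an assumed example function):

import numpy as np

def newton(grad, hess, x0, tol=1e-8, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, G = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(G, -g)     # Newton direction: G_k p_k = -g_k
        x = x + p                      # unit step along the Newton direction
    return x

# Illustrative use on F(x) = (x0 - 1)^2 + 2*(x1 + 0.5)^2:
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton(grad, hess, x0=[5.0, 5.0]))   # reaches (1, -0.5) in one step for this quadratic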

Page 19: WK1 - Introduction


Classical Optimisation Methods IV: Methods for Quadratics

•In many problems the function F(x) is a sum of squares:

F(x) = (1/2) Σi=1..m fi(x)² = (1/2)||f(x)||²

•The i-th component of the m-vector f(x) is the function fi(x), and ||f(x)|| is called the residual at x.

•Problems of this type appear in nonlinear parameter estimation. Assume that x ∈ ℝⁿ is a parameter vector and {ti} is the set of independent variables. Then the least-squares problem is:

minimise (1/2) Σi=1..m (Φ(x,ti) - F(ti))², x ∈ ℝⁿ

Page 20: WK1 - Introduction


Classical Optimisation Methods V: Methods for Quadratics

•Where:

fi(x) = Φ(x,ti) - yi, and

y = F(t) is the "true" function

• yi are the desired responses

•In the Least Squares Problem the gradient, g, and the Hessian, G, have a special structure.

•Assume that the Jacobian matrix of f(x) is denoted by J(x) (an m×n matrix) and let the matrix Gi(x) denote the Hessian of fi(x). Then:

Page 21: WK1 - Introduction


Classical Optimisation Methods VI: Methods for Quadratics

g(x)=J(x)Tf(x) and

G(x)=J(x)TJ(x)+Q(x)

• where Q is:

•We observe that the Hessian is a special combination of first and second order information.

•Least-square methods are based on the premise that eventually the first order term J(x)TJ(x) dominates the second order one, Q(x)

Q(x) = Σi=1..m fi(x)Gi(x)

Page 22: WK1 - Introduction


Classical Optimisation Methods VII: Methods for Quadratics

•The Gauss-Newton Method:

•Let xk denote the current estimate of the solution; a quantity subscripted by k will denote that quantity evaluated at xk. From the Newton direction we get:

(JkTJk + Qk)pk = -JkTfk

•Let the vector pN denote the Newton direction. If ||fk|| → 0 as xk → x*, the matrix Qk → 0. Thus the Newton direction can be approximated by the solution of the equations:

JkTJkpk = -JkTfk
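•A minimal sketch of the Gauss-Newton iteration, taking unit steps along the Gauss-Newton direction; the exponential-fit data and starting point are assumptions for illustration:

import numpy as np

def gauss_newton(f, jac, x0, n_iter=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        fk, Jk = f(x), jac(x)
        # Gauss-Newton direction: minimise (1/2)||J_k p + f_k||^2
        p, *_ = np.linalg.lstsq(Jk, -fk, rcond=None)
        x = x + p
    return x

# Illustrative data-fitting example: model Phi(x, t) = x0 * exp(x1 * t)
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)                         # "desired responses" y_i
f = lambda x: x[0] * np.exp(x[1] * t) - y          # residuals f_i(x) = Phi(x, t_i) - y_i
jac = lambda x: np.column_stack([np.exp(x[1] * t),
                                 x[0] * t * np.exp(x[1] * t)])
print(gauss_newton(f, jac, x0=[1.5, -1.0]))        # approaches (2.0, -1.5) on this noise-free data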

Page 23: WK1 - Introduction


Classical Optimisation Methods VIII: Methods for Quadratics

•The solution of the above problem is given by the solution to the linear least-squares problem:

minimise (1/2)||Jkp + fk||², p ∈ ℝⁿ

•The solution is unique if Jk has full rank. The vector pGN which solves the linear problem is called the Gauss-Newton direction. This vector approximates the Newton direction pN as ||Qk|| → 0

Page 24: WK1 - Introduction


Classical Optimisation Methods IX: Methods for Quadratics

•The Levenberg-Marquardt Method:

•In this method the search direction is defined as the solution to the equations:

(JkTJk + λkI)pk = -JkTfk

•Where λk is a non-negative scalar. A unit step is taken along pk, i.e.

xk+1 ← xk + pk

•It can be shown that for some scalar Δ, related to λk, the vector pk is the solution to the constrained subproblem:

Page 25: WK1 - Introduction


Classical Optimisation Methods X: Methods for Quadratics

minimise (1/2)||Jkp + fk||², p ∈ ℝⁿ

subject to ||p||2 ≤ Δ

•If λk = 0, pk is the Gauss-Newton direction;

•If λk → ∞, ||pk|| → 0 and pk becomes parallel to the steepest descent direction
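•A minimal sketch of the Levenberg-Marquardt iteration; the simple halving/doubling rule for λk and the reuse of the exponential-fit example are assumptions for illustration:

import numpy as np

def lm_step(Jk, fk, lam):
    # Solve (J_k^T J_k + lambda_k I) p_k = -J_k^T f_k for the LM direction.
    n = Jk.shape[1]
    A = Jk.T @ Jk + lam * np.eye(n)
    return np.linalg.solve(A, -Jk.T @ fk)

def levenberg_marquardt(f, jac, x0, lam=1e-2, n_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        fk, Jk = f(x), jac(x)
        p = lm_step(Jk, fk, lam)
        if np.sum(f(x + p)**2) < np.sum(fk**2):   # step reduced the residual
            x, lam = x + p, lam * 0.5             # accept and damp less (toward Gauss-Newton)
        else:
            lam *= 2.0                            # reject and damp more (toward steepest descent)
    return x

# Reusing the exponential-fit example from the Gauss-Newton sketch:
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)
f = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
print(levenberg_marquardt(f, jac, x0=[1.0, 0.0]))   # approaches (2.0, -1.5)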

Page 26: WK1 - Introduction


Other Methods

•Other methods are based on function values only. This category includes methods such as:

•Genetic Algorithms

•Simulated Annealing

•Tabu Search

•Guided Local Search

•etc

Page 27: WK1 - Introduction


Additional References

•Practical Optimisation, P. Gill, W. Murray, M. Wright, Academic Press, 1981.

•Numerical Recipes in C/C++, Press et al., Cambridge University Press, 1988

Page 28: WK1 - Introduction


Perceptron

•The model was created by Rosenblatt

•It uses the nonlinear neuron of McCulloch-Pitts (which uses the Sgn as transfer function)

Page 29: WK1 - Introduction


Perceptron Output

•The output y is calculated by:

y = Sgn(Σi=1..m wixi + b) = Sgn(wTx + b)

•Where Sgn(•) is defined as:

Sgn(v) = +1 if v > 0; -1 if v ≤ 0

•The perceptron classifies an input vector x ∈ ℝᵐ to one of two classes C1 or C2

Page 30: WK1 - Introduction


Decision Boundary

•The decision boundary is the hyperplane wTx + b = 0: inputs with wTx + b > 0 are assigned to class C1 and the rest to class C2

•The case presented above is called linearly separable classes
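•A minimal sketch of the output computation from the previous slide and the decision rule above; the weight values are assumed, and Sgn(0) is mapped to -1 to match the convention (used later) that wTx ≤ 0 is assigned to class C2:

import numpy as np

def sgn(v):
    # Sgn as used by the perceptron: +1 for v > 0, -1 otherwise (v = 0 goes to class C2)
    return 1 if v > 0 else -1

def perceptron_output(w, b, x):
    # y = Sgn(w^T x + b); +1 is read as class C1, -1 as class C2
    return sgn(np.dot(w, x) + b)

# Illustrative (assumed) weights; the decision boundary is the line w^T x + b = 0.
w = np.array([1.0, -2.0])
b = 0.5
print(perceptron_output(w, b, np.array([2.0, 0.5])))    # +1 -> class C1
print(perceptron_output(w, b, np.array([-1.0, 1.0])))   # -1 -> class C2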

Page 31: WK1 - Introduction


Learning Rule

•Assume that vectors are drawn from two classes C1 and C2, i.e.

•x1(1), x1(2), x1(3),…. belong to C1; and

•x2(1), x2(2), x2(3),…. belong to C2

•Assume also that we redefine vectors x(n) and w(n) such as to include the bias, i.e.

•x(n)=[+1,x1(n),…,xm(n)]T; and

•w(n)=[b(n),w1(n),…,wm(n)]T, with x, w ∈ ℝᵐ⁺¹

Page 32: WK1 - Introduction


Learning Rule II

•Then there should exist a weight vector w such that:

•wTx > 0, when x belongs to class C1; and

•wTx ≤ 0, when x belongs to class C2;

•We have arbitrarily chosen to assign the case wTx = 0 to class C2

•The algorithm for adapting the weights of the perceptron may be formulated as follows:

Page 33: WK1 - Introduction


Learning Rule III

1. If the nth member of the training set, x(n), is correctly classified by the current weight vector w(n), no correction is needed, i.e.

• w(n+1)=w(n), if wTx(n) > 0 and x(n) belongs to class C1;

• w(n+1)=w(n), if wTx(n) ≤ 0 and x(n) belongs to class C2;

Page 34: WK1 - Introduction


Learning Rule IV

2. Otherwise, the weight vector is updated according to the rule:

• w(n+1)=w(n) - η(n)x(n),

if wTx(n) > 0 and x(n) belongs to class C2;

• w(n+1)=w(n) + η(n)x(n),

if wTx(n) ≤ 0 and x(n) belongs to class C1;

• The parameter η(n) is called the learning rate and controls the adjustment applied to the weight vector at iteration n.

Page 35: WK1 - Introduction


Learning Rule V

•If we assume that the desired response is given by:

d(n) = +1 if x(n) belongs to class C1; -1 if x(n) belongs to class C2

•Then we can re-write the adaptation rule in the form of an error-correction learning rule:

w(n+1)=w(n) + η[d(n)-y(n)]x(n)

•Where e(n) = d(n)-y(n) is the error signal

•The learning rate η is a positive constant in the range 0 < η ≤ 1.

Page 36: WK1 - Introduction


Learning Rule VI

•When we assign a value to η in the range (0,1] we must keep in mind two conflicting requirements:

•Averaging of past inputs to provide stable weight estimates, which requires a small η

•Fast adaptation with respect to real changes in the underlying distributions of the process responsible for the generation of the input vector x, which requires a large η

Page 37: WK1 - Introduction


Summary of the Perceptron Algorithm

•Variables and Parameters:

•x(n)=(m+1)-by-1 input vector

= [+1, x1(n),…, xm(n)]T

•w(n)=(m+1)-by-1 weight vector

= [b(n), w1(n),…, wm(n)]T

•b(n)=bias

•y(n)=actual response

•d(n)=desired response

•η = learning rate in (0,1]

Page 38: WK1 - Introduction


Summary of the Perceptron Algorithm I

1. Initialisation: Set w(0)=0. Then perform the following computations for time steps n=1,2,….

2. Activation: At time step n, activate the perceptron by applying the input vector x(n) and desired response d(n).

3. Computation of Actual Response: Compute the actual response of the perceptron by using:

y(n)=Sgn(wT(n)x(n) )

where Sgn(•) is the signum function.

Page 39: WK1 - Introduction


Summary of the Perceptron Algorithm II

4. Adaptation of Weight Vector: Update the weight vector of the perceptron by using:

w(n+1)=w(n) + η[d(n)-y(n)]x(n)

where:

d(n) = +1 if x(n) belongs to class C1; -1 if x(n) belongs to class C2

5. Continuation: Increment the time step n by one and go back to step 2.
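•A minimal sketch of steps 1-5 above, using the augmented notation x(n) = [+1, x1(n),…, xm(n)]T and w(n) = [b(n), w1(n),…, wm(n)]T; the toy data set and the choice η = 1 are assumptions for illustration:

import numpy as np

def sgn(v):
    return 1 if v > 0 else -1

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])      # prepend +1 for the bias
    w = np.zeros(X_aug.shape[1])                          # step 1: w(0) = 0
    for _ in range(max_epochs):
        errors = 0
        for x_n, d_n in zip(X_aug, d):                    # step 2: present x(n), d(n)
            y_n = sgn(np.dot(w, x_n))                     # step 3: actual response
            if y_n != d_n:
                w = w + eta * (d_n - y_n) * x_n           # step 4: error-correction update
                errors += 1
        if errors == 0:                                   # all patterns classified correctly
            break                                         # (step 5 otherwise continues)
    return w

# Illustrative linearly separable data (class C1: d=+1, class C2: d=-1):
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1, 1, -1, -1])
print(train_perceptron(X, d))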

Page 40: WK1 - Introduction


Perceptron Convergence Theorem

•We present a proof of the fact that the perceptron needs only a finite number of steps in order to converge (i.e. to find the correct weight vector, if this exists)

•We assume that w(0)=0. If this is not the case the proof still stands, but the number of steps needed for convergence is increased or decreased

•Assume that the vectors drawn from the two classes C1 and C2 form two subsets, i.e.

•H1={x1(1), x1(2), x1(3),….} belong to C1; and

•H2={ x2(1), x2(2), x2(3),….} belong to C2

Page 41: WK1 - Introduction


Perceptron Convergence Theorem I

•Suppose that w(n)Tx(n) < 0 for n=1,2,… and the input vector belongs to H1

•Thus, for η(n)=1, we can write the weight update equation for these (incorrectly classified) inputs as:

w(n+1)=w(n)+ x(n),

for x(n) belonging to class C1;

•Given the initial condition w(0)=0, we can solve iteratively the above equation and obtain the result:

w(n+1)=x(1)+x(2)+…+x(n) (E.1)

Page 42: WK1 - Introduction


Perceptron Convergence Theorem II

•Since classes C1 and C2 are linearly separable, there exists w0 such that w0Tx(n) > 0 for all vectors x(1), x(2), …, x(n) belonging to H1. For a fixed solution w0 we can define a positive number α as:

α = min{ w0Tx(n) : x(n) ∈ H1 }

•Multiplying both sides of E.1 by w0T we get:

w0Tw(n+1) = w0Tx(1) + w0Tx(2) + … + w0Tx(n)

•So we have finally:

w0Tw(n+1) ≥ nα    (E.2)

Page 43: WK1 - Introduction


Perceptron Convergence Theorem III

•We use the Cauchy-Schwarz inequality for two vectors which states that:

||w0||² ||w(n+1)||² ≥ [w0Tw(n+1)]²

•Where ||•|| denotes the Euclidean norm of the vector and the inner product w0Tw(n+1) is a scalar. Then from E.2 we get:

||w0||² ||w(n+1)||² ≥ [w0Tw(n+1)]² ≥ n²α²

or alternatively:

||w(n+1)||² ≥ n²α² / ||w0||²    (E.3)

Page 44: WK1 - Introduction


Perceptron Convergence Theorem IV

•Now using:

w(k+1)=w(k)+ x(k),

for x(k) belonging to class C1; k=1,..,n

•And taking the Euclidean norm we get:

||w(k+1)||² = ||w(k)||² + ||x(k)||² + 2w(k)Tx(k)

•Which under the assumption of wrong classification (i.e. w(k)Tx(k) < 0) leads to:

||w(k+1)||² ≤ ||w(k)||² + ||x(k)||²

Page 45: WK1 - Introduction


Perceptron Convergence Theorem V

•Or finally to:

||w(k+1)||² - ||w(k)||² ≤ ||x(k)||²

•Adding all these inequalities for k=1,…,n and using the initial condition w(0)=0 we get:

||w(n+1)||² ≤ Σk=1..n ||x(k)||² ≤ nβ    (E.4)

•Where β is a positive number defined as:

β = max{ ||x(k)||² : x(k) ∈ H1 }    (E.5)

Page 46: WK1 - Introduction


Perceptron Convergence Theorem VI

•E.4 states that the squared Euclidean norm of the vector w(n+1) grows at most linearly with the number of iterations n.

•But this result is in conflict with E.3 for large enough n.

•Thus we can state that n cannot be larger than some value nmax for which both E.3 and E.4 are simultaneously satisfied with the equality sign. That is, nmax is the solution of the equation:

nmax²α² / ||w0||² = nmax β

Page 47: WK1 - Introduction


Perceptron Convergence Theorem VII

•Solving for nmax we get:

nmax = β ||w0||² / α²

•This proves that the perceptron algorithm will terminate after a finite number of steps. However, observe that there exists no unique solution for nmax due to the non-uniqueness of w0
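•A minimal numerical check of this bound on a toy data set (the data and the particular separating vector w0 are assumptions; a different valid w0 gives a different, equally valid nmax). It counts the updates made by the η = 1 rule starting from w(0)=0 and compares them with nmax = β||w0||²/α²:

import numpy as np

X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1, 1, -1, -1])
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])      # augmented inputs [+1, x1, x2]

w0 = np.array([0.0, 1.0, 1.0])                        # an assumed separating solution
H1 = d[:, None] * X_aug                               # flip C2 vectors so w0^T x > 0 for all
alpha = np.min(H1 @ w0)                               # alpha = min over H1 of w0^T x
beta = np.max(np.sum(H1**2, axis=1))                  # beta  = max over H1 of ||x||^2
n_max = beta * np.dot(w0, w0) / alpha**2

w, updates = np.zeros(3), 0                           # w(0) = 0
for _ in range(100):
    changed = False
    for x_n, d_n in zip(X_aug, d):
        y_n = 1 if np.dot(w, x_n) > 0 else -1
        if y_n != d_n:                                # misclassified: add +/- x(n) (eta = 1)
            w = w + d_n * x_n
            updates += 1
            changed = True
    if not changed:
        break
print("updates:", updates, "theoretical n_max:", n_max)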

Page 48: WK1 - Introduction


Conclusions

•There are many optimisation methods. In order of decreasing power, they are:

•2nd derivative methods: e.g. Newton

•1st derivative methods: e.g. Quasi-Newton

•Function value based methods: e.g. Genetic algorithms

•The perceptron is a model which classifies an input vector to one of two exclusive classes C1 and C2

•The perceptron uses an error-correction style rule for weight update

Page 49: WK1 - Introduction


Conclusions I

•The perceptron learning rule converges in a finite number of steps