
Neural Networks

EFREI 2010

Laurent Orseau (laurent.orseau@agroparistech.fr)

AgroParisTech

based on slides by Antoine Cornuejols


Plan

1. Introduction

2. The perceptron

3. The multi-layer perceptron (MLP)

4. Learning in MLP

5. Computational aspects

6. Methodological aspects of learning

7. Applications

8. Developments and perspectives

9. Conclusions



Introduction: Why neural networks?

• Biological inspiration

Natural brain: a very appealing model

– Robust and fault tolerant

– Flexible. Easily adaptable

– Can work with incomplete, uncertain, noisy data ...

– Massively parallel

– Can learn

Neurons

– ≈ 10^11 neurons in the human brain

– ≈ 10^4 connections (synapses + axons) per neuron

– Action potential / refractory period / neurotransmitters

– Excitatory / inhibitory signals


Introduction: Why neural networks?

• Some properties

Parallel computation

Directly implementable on dedicated circuits

Robust and fault tolerant (distributed representation)

Simple algorithms

Very general

• Some defects

Opacity of acquired knowledge


Historical notes (briefly)

Premises

– McCulloch & Pitts (1943): first formal neuron model.

the neuron and logical calculus: a basis of artificial intelligence.

– Hebb rule (1949): learning by reinforcing synaptic coupling

First realizations

– ADALINE (Widrow-Hoff, 1960)

– PERCEPTRON (Rosenblatt, 1958-1962)

– Analysis of Minsky & Papert (1969)

New models

– Kohonen (competitive learning), ...

– Hopfield (1982) (recurrent net)

– Multi-layer perceptron (1985)

Analysis and developments

– Control theory, generalization (Vapnik), ...


The perceptron

Rosenblatt (1958-1962)


Linear discrimination: the perceptron

[Rosenblatt, 1957, 1962]

Decision function: $y(\mathbf{x}) = g\big(w_0 + \sum_{i=1}^{d} w_i x_i\big)$, with $g$ a threshold function

(Figure: input nodes, bias node, output node)


• Geometry - 2 classes

Linear discrimination: the perceptron (one class against all the others)

• Geometry - multiclass

Ambiguous region

Linear discrimination: the perceptron (discrimination between each pair of classes)

• Geometry – multiclass

• N(N-1)/2 discriminant functions (one per pair of classes)

The perceptron: performance criterion

• Optimization criterion (error function): total number of classification errors? No: it is piecewise constant, so its gradient carries no information.

Perceptron criterion:

For every training pattern we want: $\mathbf{w}^\top \mathbf{x}_l \, u_l > 0$ (with desired output $u_l \in \{-1, +1\}$)

$E_P(\mathbf{w}) = -\sum_{l \in \mathcal{M}} \mathbf{w}^\top \mathbf{x}_l \, u_l$, where $\mathcal{M}$ is the set of wrongly classified examples

Proportional to the distance to the decision surface (for all wrongly classified examples)

Piecewise linear and continuous function

Decision rule: $\mathbf{w}^\top \mathbf{x} \ge 0 \Rightarrow$ class 1, $\mathbf{w}^\top \mathbf{x} < 0 \Rightarrow$ class 2


Direct learning: pseudo-inverse method

• Direct solution (pseudo-inverse method) requires:

Knowledge of all pairs (xi,yi)

Matrix inversion (often ill-conditioned)

(only applicable to a linear network with a quadratic error function)

• In general we therefore need an iterative method that avoids matrix inversion:

Gradient descent
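As an illustration, here is a minimal NumPy sketch of the direct (pseudo-inverse) solution for a linear network with quadratic error; the data and variable names are illustrative, not taken from the slides.

```python
import numpy as np

# Inputs: one row per example, with a leading 1 acting as the bias input.
X = np.array([[1.0, 0.2, 0.7],
              [1.0, 0.9, 0.1],
              [1.0, 0.4, 0.5],
              [1.0, 0.8, 0.8]])
# Desired outputs, one per example.
u = np.array([1.0, -1.0, 1.0, -1.0])

# Least-squares solution of X w = u via the Moore-Penrose pseudo-inverse.
# Equivalent to w = (X^T X)^{-1} X^T u when X^T X is invertible, but
# np.linalg.pinv stays defined even when that matrix is singular (ill-conditioned).
w = np.linalg.pinv(X) @ u

print("weights (bias first):", w)
print("predictions:", X @ w)
```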

The perceptron: algorithm

• Exploration of the hypothesis space H: gradient search

– Minimization of error function

– Principle: in the spirit of the Hebb rule:

modify each connection proportionally to its input and to the output

– Learn only when there is a classification error

Algorithm:

if example $\mathbf{x}_i$ is correctly classified: do nothing

otherwise: $\mathbf{w}(t+1) = \mathbf{w}(t) + \eta \, u_i \, \mathbf{x}_i$

Loop over all training examples until a stopping criterion is met

Convergence?
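A minimal sketch of this update rule in NumPy; it assumes inputs carry a leading 1 for the bias and targets u_i in {-1, +1}, and the learning rate eta and the toy data are illustrative, not from the slides.

```python
import numpy as np

def train_perceptron(X, u, eta=1.0, max_epochs=100):
    """X: (m, d) inputs with a leading 1 for the bias; u: targets in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_i, u_i in zip(X, u):
            # Misclassified if the example does not satisfy w.x * u > 0.
            if u_i * (w @ x_i) <= 0:
                w = w + eta * u_i * x_i   # Hebb-like correction
                errors += 1
        if errors == 0:                    # stopping criterion: no error on a full pass
            break
    return w

# Logical AND, a linearly separable problem.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
u = np.array([-1, -1, -1, 1], dtype=float)
print(train_perceptron(X, u))
```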

The perceptron: convergence, memory capacity

• Questions:

What can be learned?

– Result from [Minsky & Papert, 1969]: linear separators only

Convergence guarantees?

– Perceptron convergence theorem [Rosenblatt, 1962]

Reliability of learning and number of examples

– How many examples do we need to have some guarantee about what is learned?


Expressive power: Linear separations




The multi-layer perceptron

• Usual topology

Signal flow: input layer → hidden layer → output layer

Input: x_k; Output: y_k; Desired output: u_k


The multi-layer perceptron: propagation

• For each neuron:

w_jk: weight of the connection from node j to node k

a_k: activation of node k, $a_k = \sum_{j=0,\dots,d} w_{jk} \, z_j$ (with $z_0 = 1$ for the bias)

g: activation function, giving the node output $z_k = g(a_k)$

Sigmoidal activation: $g(a) = \dfrac{1}{1 + e^{-a}}$, with derivative $g'(a) = g(a)\,(1 - g(a))$

(Figure: examples of activation functions: threshold, ramp, sigmoid, radial basis function; horizontal axis: activation a_i, vertical axis: output z_i)
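A small NumPy sketch of this propagation rule for one layer of sigmoid units; the weight values and shapes below are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(a):
    """Sigmoidal activation g(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_prime(a):
    """Its derivative g'(a) = g(a) (1 - g(a)), used later by back-propagation."""
    g = sigmoid(a)
    return g * (1.0 - g)

def layer_forward(z_prev, W, b):
    """z_prev: outputs of the previous layer; W[j, k]: weight from node j to node k."""
    a = z_prev @ W + b          # activations a_k = sum_j w_jk z_j + bias
    return sigmoid(a), a        # node outputs z_k = g(a_k), plus a_k kept for backprop

z_prev = np.array([0.5, -1.0])
W = np.array([[0.1, 0.4],
              [-0.3, 0.2]])
z, a = layer_forward(z_prev, W, np.zeros(2))
print(z, sigmoid_prime(a))
```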


The multi-layer perceptron: the XOR example

(Figure: two inputs x1 and x2, two hidden threshold units A and B, and an output unit C computing y = x1 XOR x2; biases: -0.5 for A, -1.5 for B, -0.5 for C; weights of 1 from each input to A and to B, and weights +1 from A and -1 from B to C)
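A hedged check, with threshold units, that the weights listed above (read as: A acting as an OR gate, B as an AND gate and C computing A AND NOT B) indeed realize XOR; this reading of the figure is an assumption.

```python
def step(a):
    """Threshold activation: 1 if a >= 0, else 0."""
    return 1.0 if a >= 0 else 0.0

def xor_net(x1, x2):
    A = step(1.0 * x1 + 1.0 * x2 - 0.5)   # fires if x1 OR x2        (bias -0.5)
    B = step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires if x1 AND x2       (bias -1.5)
    C = step(1.0 * A - 1.0 * B - 0.5)     # fires if A AND NOT B ->  XOR (bias -0.5)
    return C

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        print(int(x1), "XOR", int(x2), "=", int(xor_net(x1, x2)))
```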


Example of network (JavaNNS)



The MLP: learning

• Find weights such that the network produces an input-output mapping consistent with the given examples

(same old generalization problem)

• Learning:

Minimize the loss function E(w, {x_l, u_l}) with respect to w

Use a gradient descent method

(gradient back-propagation algorithm)

Inductive principle: we suppose that what works on the training examples (empirical risk minimization) should also work on test (unseen) examples (real risk minimization)

$\Delta w_{ij} = -\eta \, \dfrac{\partial E}{\partial w_{ij}}$


Learning: gradient descent

• Learning = search in the multidimensional parameter space (the synaptic weights) to minimize the loss function

• Almost all learning rules

= gradient descent method

Optimal solution $\mathbf{w}^*$ such that $\nabla E(\mathbf{w}^*) = 0$, where $\nabla_\mathbf{w} E = \left(\dfrac{\partial E}{\partial w_1}, \dfrac{\partial E}{\partial w_2}, \dots, \dfrac{\partial E}{\partial w_N}\right)^{\!\top}$

Gradient step: $w_{ij}(\tau+1) = w_{ij}(\tau) - \eta \, \dfrac{\partial E}{\partial w_{ij}}\bigg|_{\mathbf{w}(\tau)}$, so that to first order $E(\tau+1) \approx E(\tau) + \Delta\mathbf{w} \cdot \nabla_\mathbf{w} E \le E(\tau)$


The multi-layer perceptron: learning

Goal: $\mathbf{w}^* = \arg\min_{\mathbf{w}} \dfrac{1}{m} \sum_{l=1}^{m} \big[\, y(\mathbf{x}_l; \mathbf{w}) - u(\mathbf{x}_l) \,\big]^2$

Algorithm (gradient back-propagation): gradient descent

Iterative algorithm: $\mathbf{w}(t) = \mathbf{w}(t-1) - \eta(t)\, \nabla E\big(\mathbf{w}(t-1)\big)$

Off-line case (total gradient): $w_{ij}(t) = w_{ij}(t-1) - \eta(t)\, \dfrac{1}{m} \sum_{k=1}^{m} \dfrac{\partial R_E(\mathbf{x}_k, \mathbf{w})}{\partial w_{ij}}$

where $R_E(\mathbf{x}_k, \mathbf{w}) = \big[\, t_k - f(\mathbf{x}_k, \mathbf{w}) \,\big]^2$ ($t_k$: desired output for example $\mathbf{x}_k$)

On-line case (stochastic gradient): $w_{ij}(t) = w_{ij}(t-1) - \eta(t)\, \dfrac{\partial R_E(\mathbf{x}_k, \mathbf{w})}{\partial w_{ij}}$


The multi-layer perceptron: learning

1. Take one example from training set

2. Compute output state of network

3. Compute the error as a function of (output - desired output) (e.g. (y_l - u_l)^2)

4. Compute gradients

With gradient back-propagation algorithm

5. Modify synaptic weights

6. Stopping criterion

Based on global error, number of examples, etc.

7. Go back to 1


MLP: gradient back-propagation

• The problem: determine responsibilities (the “credit assignment problem”): which connection is responsible for the error E, and by how much?

• Principle: compute the error on a connection as a function of the error on the next layer

• Two steps:

1. Evaluation of the derivatives of the error with respect to the weights

2. Use of these derivatives to compute the modification of each weight


MLP: gradient back-propagation

1. Evaluation of the error $E^l$ (for training example $l$) due to each connection:

Idea: compute the error on connection $w_{ij}$ as a function of the error after node $j$:

$\dfrac{\partial E^l}{\partial w_{ij}} = \dfrac{\partial E^l}{\partial a_j} \, \dfrac{\partial a_j}{\partial w_{ij}} = \delta_j \, z_i$

For nodes in the output layer: $\delta_k = \dfrac{\partial E^l}{\partial a_k} = g'(a_k)\, \dfrac{\partial E^l}{\partial y_k} = g'(a_k)\, \big[\, u_k(\mathbf{x}_l) - y_k \,\big]$

For nodes in the hidden layer: $\delta_j = \dfrac{\partial E^l}{\partial a_j} = \sum_k \dfrac{\partial E^l}{\partial a_k}\, \dfrac{\partial a_k}{\partial z_j}\, \dfrac{\partial z_j}{\partial a_j} = g'(a_j) \sum_k w_{jk} \, \delta_k$


MLP: gradient back-propagation

a_i: activation of node i

z_i: output of node i

δ_i: error attached to node i

(Figure: node i feeds hidden node j through weight w_ij; node j feeds output node k through weight w_jk; activations a_j and a_k, outputs z_i, z_j and y_k, errors δ_j and δ_k)


MLP: gradient back-propagation

• 2. Modification of the weights

We assume a gradient step (constant or not): $\eta(t)$

If stochastic learning (after the presentation of each example): $\Delta w_{ji}(t) = \eta \, \delta_j \, a_i$

If batch learning (after the presentation of the whole set of examples): $\Delta w_{ji}(t) = \eta \sum_n \delta_j^{(n)} \, a_i^{(n)}$


MLP: forward and backward passes (summary)

Forward pass, for an input $\mathbf{x} = (x_1, \dots, x_d)$ with bias input $x_0 = 1$:

hidden layer ($k$ neurons): $a_i(\mathbf{x}) = \sum_{j=1}^{d} w_j x_j + w_0$ and $y_i(\mathbf{x}) = g\big(a_i(\mathbf{x})\big)$

output node: $y_s(\mathbf{x}) = \sum_{j=1}^{k} w_{js} \, y_j(\mathbf{x})$ (weights $w_{is}$ from hidden node $i$ to output $s$)

(Figure: inputs $x_1, \dots, x_d$ and bias $x_0$, hidden layer of $k$ neurons with outputs $y_i(\mathbf{x})$, output $y_s(\mathbf{x})$)


MLP: forward and backward passes (summary)

Backward pass, for input $\mathbf{x}$ with desired output $u_s$:

output node: $\delta_s = g'(a_s)\,(u_s - y_s)$

hidden node: $\delta_j = g'(a_j) \sum_{s \,\in\, \text{nodes of next layer}} w_{js} \, \delta_s$

weight updates: $w_{is}(t+1) = w_{is}(t) + \eta(t)\, \delta_s \, a_i$ and $w_{ei}(t+1) = w_{ei}(t) + \eta(t)\, \delta_i \, a_e$

(Figure: same network, with the errors δ propagated backwards from the output towards the inputs)
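A compact NumPy sketch of these two passes for a network with one hidden layer of sigmoid units, using the stochastic update rule above; the network size, learning rate and XOR data are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(a):                       # sigmoid activation
    return 1.0 / (1.0 + np.exp(-a))

# XOR training set; the leading 1 in each input acts as the bias input.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
U = np.array([0.0, 1.0, 1.0, 0.0])

n_hidden = 4
W1 = rng.normal(scale=0.5, size=(3, n_hidden))     # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(n_hidden + 1,))   # hidden (+ bias) -> output weights
eta = 0.5

for epoch in range(10000):
    for x, u in zip(X, U):
        # Forward pass.
        a_h = x @ W1
        z_h = np.concatenate(([1.0], g(a_h)))       # hidden outputs, bias unit prepended
        y = g(z_h @ W2)

        # Backward pass: deltas as on the slide, delta = g'(a) * (propagated error).
        delta_out = y * (1 - y) * (u - y)
        delta_h = g(a_h) * (1 - g(a_h)) * (W2[1:] * delta_out)

        # Stochastic weight updates: w <- w + eta * delta * upstream activity.
        W2 += eta * delta_out * z_h
        W1 += eta * np.outer(x, delta_h)

for x in X:
    z_h = np.concatenate(([1.0], g(x @ W1)))
    print(x[1:], "->", round(float(g(z_h @ W2)), 2))
```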


MLP: gradient back-propagation

• Learning efficiency

O(|w|) for each learning pass, |w| = # weights

Usually several hundreds of passes (see below)

And learning must typically be repeated several dozen times with different initial random weights

• Recognition efficiency

Possibility of real time


Applications: multi-objective optimization

• cf [Tom Mitchell]

Predict both class and color

Instead of class only


Role of the hidden layer



MLP: Applications

• Control: identification and control of processes

(e.g. Robot control)

• Signal Processing (filtering, data compression, speech processing (recognition, prediction, production),…)

• Pattern recognition, image processing (hand-writing recognition, automated postal code recognition (Zip codes, USA), face recognition...)

• Prediction (water, electricity consumption, meteorology, stock market, ...)

• Diagnostic (industry, medical, science, ...)


Application to postal Zip codes

• [Le Cun et al., 1989, ...] (AT&T Bell Labs: very smart team)

• ≈ 10000 examples of handwritten digits

• Segmented and rescaled on a 16 x 16 matrix

• Weight sharing

• Optimal brain damage

• 99% correct recognition (on training set)

• 9% reject (delegated to human recognition)


The database


Application to postal Zip codes

(Figure: network architecture: 16 x 16 input matrix, 12 segment detectors (8 x 8), 12 segment detectors (4 x 4), 30 hidden nodes, 10 output nodes, one per digit 0-9)


Some mistakes made by the network


Regression


A failure: QSAR

• Quantitative Structure Activity Relations

Predict certain properties of molecules (for example biological activity) from descriptions: chemical, geometrical, electrical


MLP: Practical view (1)

• Technical problems: how to improve the algorithm's performance?

MLP as an optimization method: variants

• Momentum

• Second order methods

• Hessian

• Conjugate gradient

Heuristics

• Sequential learning vs batch learning

• Choice of activation function

• Normalization of inputs

• Weight initialization

• Learning gain (rate)


MLP: gradient back-propagation (variants)

• Momentum

$\Delta w_{ji}(t+1) = -\eta \, \dfrac{\partial E}{\partial w_{ji}} + \alpha \, \Delta w_{ji}(t)$  ($\alpha$: momentum coefficient)
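A minimal sketch of this update with a momentum (inertia) term; eta and alpha are illustrative values.

```python
import numpy as np

def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """One update: new step = -eta * gradient + alpha * previous step."""
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

w = np.zeros(3)
delta = np.zeros(3)
for _ in range(5):
    grad = np.array([1.0, -2.0, 0.5])   # dummy constant gradient for illustration
    w, delta = momentum_step(w, grad, delta)
print(w)
```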


Convergence

• Learning step tweaking:


MLP: Convergence problems

• Local minima

Add momentum (inertia)

Conditioning of the parameters

Adding noise to the learning data

Online algorithm (stochastic vs. total gradient)

Variable gradient step (in time and per node)

Use of second derivatives (Hessian). Conjugate gradient


MLP: Convergence problems (variable gradient step)

• Adaptive gain

Increase the gain if the gradient does not change sign, decrease it otherwise

Much lower gain for stochastic than for total gradient

Specific gain for each layer (e.g. 1 / sqrt(# input nodes))

• More complex algorithms

Conjugate gradients

– Idea: try to minimize independently along each direction, using a momentum-like term in the search direction

Second order methods (Hessian)

– Faster convergence but slower computations



Overfitting

(Figure: curves of real risk and empirical risk against data quantity; their divergence corresponds to overfitting)


Preventing overfitting: regularization

• Principle: limit expressiveness of H

• New empirical risk:

• Some useful regularizers:

– Control of NN architecture

– Parameter control

• Soft-weight sharing

• Weight decay

• Convolution network

– Noisy examples

$R_{emp}(\mathbf{w}) = \dfrac{1}{m} \sum_{l=1}^{m} L\big(h(\mathbf{x}_l, \mathbf{w}),\, u_l\big) + \lambda\, \Omega\big[h(\cdot, \mathbf{w})\big]$, where $\lambda\, \Omega[\,\cdot\,]$ is the penalization term
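A hedged sketch of the most common penalization term, weight decay (an L2 penalty on the weights), added to a squared-error empirical risk; the name lam and its value are illustrative.

```python
import numpy as np

def regularized_risk(y_pred, u, weight_matrices, lam=1e-3):
    """Empirical risk plus a weight-decay penalty lam * sum of squared weights."""
    data_term = np.mean((y_pred - u) ** 2)
    penalty = lam * sum(np.sum(W ** 2) for W in weight_matrices)
    return data_term + penalty

# The gradient of the penalty is 2 * lam * W for each weight matrix, so every
# update shrinks ("decays") the weights slightly towards zero.
y_pred = np.array([0.9, 0.1, 0.8])
u = np.array([1.0, 0.0, 1.0])
print(regularized_risk(y_pred, u, [np.ones((2, 2)), np.ones(3)]))
```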


Control by limiting the exploration of H

• Early stopping

• Weight decay


Generalization: optimize the network structure

• Progressive growth

Cascade correlation [Fahlman,1990]

• Pruning

Optimal brain damage [Le Cun,1990]

Optimal brain surgeon [Hassibi,1993]


Introduction of prior knowledge

Invariances

• Symmetries in the example space

Translation / rotation / dilatation

• Cost functions involving derivatives



ANN Application Areas

• Classification

• Clustering

• Associative memory

• Control

• Function approximation


Applications for ANN Classifiers

• Pattern recognition

Industrial inspection

Fault diagnosis

Image recognition

Target recognition

Speech recognition

Natural language processing

• Character recognition

Handwriting recognition

Automatic text-to-speech conversion


Presented by Martin Ho, Eddy Li, Eric Wong and Kitty Wong - Copyright© 2000

Neural Network Approaches: ALVINN - Autonomous Land Vehicle In a Neural Network

ALVINN


- Developed in 1993.

- Performs driving with Neural Networks.

- An intelligent VLSI image sensor for road following.

- Learns to filter out image details not relevant to driving.

(Figure: the ALVINN network: input units, a hidden layer and output units)



MLP with Radial Basis Functions (RBF)

• Definition

Hidden layer uses radial basis activation function (e.g. Gaussian)

– Idea: “pave” the input space with “receptive fields”

Output layer: linear combination upon the hidden layer

• Properties

Still universal approximator ([Hartman et al.,90], ...)

But not parsimonious (combinatorial explosion with the input dimension)

Only for small input dimension problems

Strong links with fuzzy inference systems and neuro-fuzzy systems


• Parameters to tune:

# hidden nodes

Initial positions of the receptive fields

Diameter of receptive fields

Output weights

• Methods

Adaptation of back-propagation

Determination of each type of parameters with a specific method (usually more effective)

– Centers determined by “clustering” methods (k-means, ...)

– Diameters determined by optimizing the covering rate (nearest neighbors, ...)

– Output weights by linear optimization (pseudo-inverse computation, ...)

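A hedged sketch of an RBF network along these lines: Gaussian receptive fields with fixed centers and width, and output weights obtained by linear optimization (pseudo-inverse); the centers, width and toy data are illustrative.

```python
import numpy as np

def rbf_features(X, centers, sigma):
    """One Gaussian receptive field per center: exp(-||x - c||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy 1-D regression problem.
X = np.linspace(0.0, 1.0, 50)[:, None]
u = np.sin(2 * np.pi * X[:, 0])

centers = np.linspace(0.0, 1.0, 10)[:, None]   # could instead come from k-means
Phi = rbf_features(X, centers, sigma=0.1)
Phi = np.hstack([Phi, np.ones((len(X), 1))])   # bias column for the linear output layer

w_out = np.linalg.pinv(Phi) @ u                # output weights by pseudo-inverse
print("training MSE:", float(np.mean((Phi @ w_out - u) ** 2)))
```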


Neural Networks for sequence processing

• Tasks : Take the Time dimension into account

Sequence recognition

E.g. recognize the word corresponding to a speech signal

Reproduction of sequence

E.g. predict next values of the sequence (ex: electricity consumption prediction)

Temporal association

Production of a sequence in response to the recognition of another sequence

Time Delay Neural Networks (TDNNs)

Duplicate inputs for several past time steps

Recurrent Neural Networks


Recurrent ANN Architectures

• Feedback connections

• Dynamic memory: y(t+1) = f(x(τ), y(τ), s(τ)), τ ∈ {t, t-1, ...} (see the sketch after this list)

• Models: Jordan/Elman ANNs

Hopfield

Adaptive Resonance Theory (ART)
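A minimal sketch of such a dynamic memory in the style of a Jordan/Elman network, where the hidden state s(t) is fed back at the next time step; all sizes, weights and the input sequence are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 2, 4, 1

W_in = rng.normal(size=(n_hidden, n_in))        # input -> hidden
W_rec = rng.normal(size=(n_hidden, n_hidden))   # hidden(t-1) -> hidden(t): feedback
W_out = rng.normal(size=(n_out, n_hidden))      # hidden -> output

def step(x_t, s_prev):
    """One time step: the new state depends on the current input and the previous state."""
    s_t = np.tanh(W_in @ x_t + W_rec @ s_prev)
    y_t = W_out @ s_t
    return y_t, s_t

s = np.zeros(n_hidden)
for t, x_t in enumerate(rng.normal(size=(5, n_in))):   # a short input sequence
    y, s = step(x_t, s)
    print(f"t={t}: y={y}")
```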


Recurrent Neural Networks

• Can learn regular grammars

Finite State Machines

Back Propagation Through Time

• Can even model full computers with 11 neurons (!)

Very special use of RNNs…

Uses the property that a weight can be any real number, i.e. it provides unlimited memory

+ Chaotic dynamics

No learning algorithm for this


Recurrent Neural Networks

• Problems

Complex trajectories

– Chaotic dynamics

Limited memory of past

Learning is very difficult!

– Exponential decay of error signal in time


Long Short-Term Memory (Hochreiter & Schmidhuber, 1997)

• Idea:

Only some nodes are recurrent

Only self-recurrence

Linear activation function

– Error decays linearly, not exponentially

• Can learn

Regular languages (FSM)

Some Context-free (stack machine) and Context-sensitive grammars

– a^n b^n, a^n b^n c^n


Reservoir computing

• Idea:

Random recurrent neural network,

Learn only output layer weights

• Many internal dynamics

• Output layer selects interesting ones

• And combinations thereof

(Figure: input layer feeding a random recurrent reservoir, followed by the trained output layer)



Conclusions

• Limits

Learning is slow and difficult

Result is opaque

– Difficult to extract knowledge

– Difficult to use prior knowledge (but see KBANN)

Incremental learning of new concepts is difficult: catastrophic forgetting

• Advantages

Can learn a wide variety of problems

Bibliography

• Books / articles

Bishop C. (1995): Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.

Haykin S. (1998): Neural Networks. Prentice Hall, 1998.

Hertz, Krogh & Palmer (1991): Introduction to the Theory of Neural Computation. Addison Wesley, 1991.

Thiria, Gascuel, Lechevallier & Canu (1997): Statistiques et méthodes neuronales. Dunod, 1997.

Vapnik V. (1995): The Nature of Statistical Learning Theory. Springer Verlag, 1995.

• Web sites

http://www.lps.ens.fr/~nadal/