
Marco Gori - IEEE Expert Now Course

An Introduction to Multilayered Neural Networks

Introduction to Multilayer Perceptrons

Marco Gori, University of Siena


Outline of the course

• Motivations and biological inspiration
• Multilayer perceptrons: architectural issues
• Learning as function optimization
• Backpropagation
• The applicative perspective


A difficult problem for knowledge-based solutions

An “A” perceived by a webcam

How can I provide a satisfactory statement to associate the picture with an “A”?


Emulation of the brain?

... Or simply inspiration?

“I just want to point out that the componentry used in the memory may be entirely different from the one that underlies the basic active organs.” (John von Neumann, 1958)

... Inspiration at the level of neurons

... Hard to go beyond!


Artificial neurons

Sigmoidal units

Radial basis units
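As a concrete illustration (not part of the original slides), here is a minimal sketch of the two unit types in Python/NumPy; the weight, bias, center and width values are arbitrary placeholders.

```python
import numpy as np

def sigmoidal_unit(x, w, b):
    """Sigmoidal unit: squashes the weighted sum w.x + b into (0, 1)."""
    a = np.dot(w, x) + b          # activation (weighted sum of inputs)
    return 1.0 / (1.0 + np.exp(-a))

def radial_basis_unit(x, c, sigma):
    """Radial basis unit: response depends on the distance from the center c."""
    r2 = np.sum((x - c) ** 2)     # squared Euclidean distance from the center
    return np.exp(-r2 / (2.0 * sigma ** 2))

x = np.array([0.5, -1.2])
print(sigmoidal_unit(x, w=np.array([1.0, 0.3]), b=0.1))
print(radial_basis_unit(x, c=np.array([0.0, -1.0]), sigma=1.0))
```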


Supervised and reinforcement learning

Reinforcement info: reward/punishment
Supervised info: specific target


Directed Acyclic Graph Architecture

Feedforward architecture Multilayer architecture

Partial ordering on the nodes


Forward Propagation
Let $v_1, \dots, v_n$ be any topological sorting of the nodes and let $\mathrm{pa}(i)$ be the parents of node $i$. Each output can then be computed from already-computed values, e.g. $x_i = \sigma\big(\sum_{j \in \mathrm{pa}(i)} w_{ij}\, x_j + b_i\big)$ for sigmoidal units.
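A minimal sketch of forward propagation over a DAG, assuming sigmoidal units and a weight dictionary `w[(i, j)]` from parent `j` to node `i`; the node numbering and values are illustrative, not taken from the slides.

```python
import numpy as np

def forward(topo_order, parents, w, b, inputs):
    """Forward propagation: visit nodes in a topological order and compute
    each output from the already-computed outputs of its parents."""
    x = dict(inputs)                      # x[i] = output of node i; inputs are given
    for i in topo_order:
        if i in x:                        # input node: value already known
            continue
        a = b[i] + sum(w[(i, j)] * x[j] for j in parents[i])
        x[i] = 1.0 / (1.0 + np.exp(-a))   # sigmoidal squashing
    return x

# Tiny example: nodes 0, 1 are inputs, node 2 is hidden, node 3 is the output.
parents = {2: [0, 1], 3: [1, 2]}
w = {(2, 0): 0.5, (2, 1): -1.0, (3, 1): 0.3, (3, 2): 2.0}
b = {2: 0.1, 3: -0.2}
print(forward([0, 1, 2, 3], parents, w, b, inputs={0: 1.0, 1: 0.0}))
```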


Boolean Functions


Boolean Functions by MLP

• Every Boolean function can be expressed in the first canonical form
• Every minterm is a linearly-separable function (one on a hypercube’s vertex)
• OR is linearly separable
• Similar conclusions hold using the second canonical form.

Every Boolean function can be represented by an MLP with two layers: minterms at the first layer, OR at the second one
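A hedged sketch of this construction for XOR (first canonical form: two minterms ORed together), using hard-threshold units; the specific weights and thresholds below are just one possible choice.

```python
def step(a):
    """Hard-threshold (perceptron) unit."""
    return 1 if a > 0 else 0

def xor_mlp(x1, x2):
    # First layer: one unit per minterm (each minterm is linearly separable).
    m1 = step(x1 - x2 - 0.5)      # minterm x1 AND (NOT x2)
    m2 = step(x2 - x1 - 0.5)      # minterm (NOT x1) AND x2
    # Second layer: OR of the minterms (also linearly separable).
    return step(m1 + m2 - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))
```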


Membership functions

A set (membership) function for $S \subseteq \mathbb{R}^n$ is defined by $\chi_S(x) = 1$ if $x \in S$ and $\chi_S(x) = 0$ otherwise, for all $x \in \mathbb{R}^n$.

Convex set by MLP


Membership for complex domains
• non-connected domains
• non-convex domains


Membership functions (Lippmann, IEEE ASSP Magazine 1987)

• Every hidden unit is associated with a hyperplane
• Every convex set is associated with units in the first hidden layer
• Every non-connected or non-convex set can be represented by a proper combination (at the second hidden layer) of units representing convex sets in the first hidden layer

Basic statement: two hidden layers suffice to approximate any set function
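A minimal sketch of the Lippmann-style construction for a convex region: first-layer threshold units encode the bounding hyperplanes and a second unit ANDs them (a non-convex or non-connected set would add a further layer ORing several such convex pieces). The triangle used here is an arbitrary example.

```python
import numpy as np

def step(a):
    return 1 if a >= 0 else 0

# Half-planes (w, b) whose intersection is the triangle with vertices
# (0,0), (1,0), (0,1): a point is inside iff w.x + b >= 0 for all of them.
halfplanes = [(np.array([ 1.0,  0.0]), 0.0),   # x >= 0
              (np.array([ 0.0,  1.0]), 0.0),   # y >= 0
              (np.array([-1.0, -1.0]), 1.0)]   # x + y <= 1

def convex_membership(x):
    # First hidden layer: one threshold unit per hyperplane.
    h = [step(np.dot(w, x) + b) for w, b in halfplanes]
    # Second layer: AND of the half-plane indicators.
    return step(sum(h) - len(h) + 0.5)

print(convex_membership(np.array([0.2, 0.2])))   # inside  -> 1
print(convex_membership(np.array([0.9, 0.9])))   # outside -> 0
```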


Universal Approximation
Given a continuous target function $f$ on a compact domain and a tolerance $\varepsilon > 0$, find network parameters such that the network output $\hat f$ satisfies $|f(x) - \hat f(x)| < \varepsilon$ for every $x$ in the domain.

One hidden layer (with “enough” hidden units) suffices!
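As an illustrative, non-authoritative sketch of the "one hidden layer with enough units" statement: fix random sigmoidal hidden units and fit only the linear output layer by least squares; the error on a 1-D target typically shrinks as the number of hidden units grows. The target function, scales and seed below are assumptions made for the demo, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200).reshape(-1, 1)
target = np.sin(2 * x).ravel()                 # the function to approximate

def max_error_one_hidden_layer(n_hidden):
    # Random first-layer weights; only the output layer is fitted (least squares).
    W = rng.normal(scale=2.0, size=(1, n_hidden))
    b = rng.normal(scale=2.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(x @ W + b)))     # hidden-layer outputs
    v, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.max(np.abs(H @ v - target))      # worst-case error on the grid

for n in (2, 10, 50, 200):
    print(n, "hidden units -> max error", max_error_one_hidden_layer(n))
```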


Supervised Learning
Consider the triple $(x_p, t_p, y_p)$, where $x_p$ is the input of example $p$, $t_p$ its target, and $y_p = f(w, x_p)$ the network output.

The error is due to the mismatch between $y_p$ and $t_p$, e.g. the squared error $E_p(w) = \tfrac{1}{2}\,\|y_p - t_p\|^2$.


Gradient Descent

• The optimization may involve a huge number of parameters, even one million (Bourlard, 1997)
• The gradient heuristic is the only one that remains meaningful in such huge spaces
• The trajectory may end up in local minima of the error function
• How is the gradient calculated? (a very important issue; see the sketch below)
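A minimal sketch of the update rule behind these remarks; the learning rate and the toy error function are placeholders.

```python
import numpy as np

def gradient_descent_step(w, grad_E, eta=0.1):
    """One gradient-descent step: move against the gradient of the error."""
    return w - eta * grad_E(w)

# Toy quadratic error E(w) = ||w||^2 / 2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(5):
    w = gradient_descent_step(w, grad_E=lambda w: w)
print(w)   # shrinks toward the (global) minimum at the origin
```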


Backpropagation
Bryson & Ho (1969), Werbos (1974), le Cun (1985), Rumelhart, Hinton & Williams (1986)

Error accumulation: the total error is the sum of the per-example errors, $E(w) = \sum_p E_p(w)$

DAG hypothesis: the network graph is a directed acyclic graph, so outputs and gradients can be computed by visiting the nodes in topological order


Backpropagation (con’t)

For each unit $i$, with activation $a_i$ and output $x_i = \sigma(a_i)$, let $\delta_i = \partial E_p / \partial a_i$:
if $i$ is an output unit then $\delta_i = \sigma'(a_i)\,(y_i - t_i)$,
else $\delta_i = \sigma'(a_i) \sum_{k \in \mathrm{ch}(i)} w_{ki}\, \delta_k$,
so that $\partial E_p / \partial w_{ij} = \delta_i\, x_j$.


Backpropagation (con’t)
The forward pass can follow any topological sorting of the DAG; the backward pass (the propagation of the $\delta$'s) follows a topological sorting induced by the inverse (reversed) graph.
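A minimal sketch of backpropagation on a DAG that combines the forward and backward passes just described; it assumes sigmoidal units, a squared-error loss, and the same illustrative `parents`/`w` naming used in the forward-propagation sketch above (bias gradients are omitted for brevity).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop(topo, parents, w, b, inputs, targets):
    """Forward pass in topological order, backward pass in reverse order.
    Returns the gradient dE/dw for one example under a squared-error loss."""
    # Forward pass.
    x = dict(inputs)
    for i in topo:
        if i not in x:
            x[i] = sigmoid(b[i] + sum(w[(i, j)] * x[j] for j in parents[i]))
    # Children lists (the reversed graph used by the backward pass).
    children = {i: [k for k in parents if i in parents[k]] for i in topo}
    # Backward pass: deltas in reverse topological order.
    delta = {}
    for i in reversed(topo):
        if i in inputs:
            continue
        if i in targets:                      # output unit
            err = x[i] - targets[i]
        else:                                 # hidden unit: collect from children
            err = sum(w[(k, i)] * delta[k] for k in children[i])
        delta[i] = err * x[i] * (1.0 - x[i])  # sigmoid derivative
    # Gradient of the error w.r.t. each weight.
    return {(i, j): delta[i] * x[j] for (i, j) in w}

parents = {2: [0, 1], 3: [1, 2]}
w = {(2, 0): 0.5, (2, 1): -1.0, (3, 1): 0.3, (3, 2): 2.0}
b = {2: 0.1, 3: -0.2}
print(backprop([0, 1, 2, 3], parents, w, b, inputs={0: 1.0, 1: 0.0}, targets={3: 1.0}))
```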


Batch learning
We calculate the “true gradient” over the whole training set, $\nabla E(w) = \sum_p \nabla E_p(w)$.

The weight updating then takes place within the classic framework of numerical optimization.


On-line learning

Weight updating takes place after presenting each example: $\Delta w = -\eta\, \nabla E_p(w)$ …

A momentum term filters out abrupt changes: $\Delta w(t) = -\eta\, \nabla E_p(w) + \alpha\, \Delta w(t-1)$ …
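A minimal sketch of the on-line update with a momentum term; the values of eta and alpha are illustrative.

```python
import numpy as np

def online_update(w, velocity, grad_Ep, eta=0.05, alpha=0.9):
    """Per-example update with momentum: the new step blends the current
    gradient with the previous step, filtering out abrupt changes."""
    velocity = alpha * velocity - eta * grad_Ep
    return w + velocity, velocity

w = np.zeros(3)
v = np.zeros(3)
for grad in (np.array([1.0, 0.0, -1.0]), np.array([0.5, 0.2, -0.4])):
    w, v = online_update(w, v, grad)
print(w)
```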


Backprop Heuristics

• Weight initialization
  – Avoid too-small weights (the delta errors are not backpropagated effectively)
  – Avoid too-large weights (neuron saturation)
• Learning rate
  – The learning rate can change during learning (higher when the gradient is small)
  – There is no “magic solution”!


Backprop Heuristics (con’t)

• Activation function: symmetric vs. asymmetric
• Target values chosen to avoid saturation
• Input normalization
• The input variables (coordinates) should be uncorrelated
• Normalization w.r.t. the fan-in of the first hidden layer (see the sketch below)
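A hedged sketch of two of these heuristics: standardizing the inputs and scaling the initial weights with the fan-in of each unit. The 1/sqrt(fan-in) scale is a common choice, not a prescription taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def standardize(X):
    """Input normalization: zero mean and unit variance per coordinate."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def init_weights(fan_in, fan_out):
    """Fan-in-scaled initialization: keeps initial activations in the
    non-saturated region of the sigmoid (neither too small nor too large)."""
    scale = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-scale, scale, size=(fan_out, fan_in))

X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
Xn = standardize(X)
print(Xn.mean(axis=0).round(3), Xn.std(axis=0).round(3))
print(init_weights(fan_in=4, fan_out=3))
```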


The problem of local minima

A simple example of sub-optimal learning: local minima as symmetric configurations ...


Basic results

• No local minima in the case of linearly-separable patterns (Gori & Tesi, IEEE-PAMI 1992)
• No local minima if the number of hidden units is equal to the number of examples (Yu et al., IEEE-TNN 1995)
• A general comment: the theoretical investigations of this problem have not produced results that are directly relevant to the design of networks.


Complexity issues: Backprop is optimal

Classic numerical algorithms require $O(W)$ floating-point operations for the computation of a single partial derivative, and therefore $O(W^2)$ for the whole gradient computation ($W$ = number of parameters).

Backprop requires $O(W)$ for the whole gradient computation.

E.g., in the case of 10,000 parameters, we need 10,000 FPO versus 100 million FPO! This is one of the main reasons for the success of Backpropagation.


Penalties to limit large weights

Cross-validation

[Figure courtesy of MathWorks]
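A minimal sketch of a weight-decay penalty added to the error, which is one common way to realize "penalties to limit large weights"; the value of lambda is illustrative.

```python
import numpy as np

def penalized_error(E, w, lam=1e-3):
    """Total cost = data error + lambda * squared weight norm."""
    return E + lam * np.sum(w ** 2)

def penalized_gradient(grad_E, w, lam=1e-3):
    """The penalty adds 2*lambda*w to the gradient, shrinking large weights."""
    return grad_E + 2.0 * lam * w

w = np.array([3.0, -0.5])
print(penalized_error(E=0.42, w=w))
print(penalized_gradient(grad_E=np.array([0.1, -0.2]), w=w))
```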


The applicative perspective: pattern recognition


MLP as classifiers: a simple pre-processing

input → pre-processed input → output


The cascade


Autoassociator-based classifiers
Motivation: better behavior for pattern verification (Gori & Scarselli, IEEE-PAMI 1998)
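A rough sketch of the idea, not the paper's exact method: train one autoassociator per class to reproduce its own patterns, then verify or classify a pattern by its reconstruction error under each class model. The linear "autoassociator" below (a PCA-style projection) is only a stand-in for a trained MLP autoassociator, and the data are synthetic.

```python
import numpy as np

class LinearAutoassociator:
    """Stand-in autoassociator: project onto the top-k principal directions
    of one class and reconstruct; a trained MLP autoassociator plays this
    role in the actual approach."""
    def fit(self, X, k=2):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.V = Vt[:k].T
        return self
    def reconstruction_error(self, x):
        z = (x - self.mean) @ self.V
        return float(np.sum((self.mean + z @ self.V.T - x) ** 2))

rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 1.0, size=(50, 4))
class_b = rng.normal(3.0, 1.0, size=(50, 4))
models = {"A": LinearAutoassociator().fit(class_a),
          "B": LinearAutoassociator().fit(class_b)}
x = class_b[0]
print(min(models, key=lambda c: models[c].reconstruction_error(x)))  # expected "B"
```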


Some successful applications

• Airline market assistance - BehavHeuristics Inc.
• Automated real-estate appraisal systems - HNC Software
• OCR - Caere, Audre Recognition System
• Path planning (NeuroRoute) - Protel
• Electronic nose - AromaScan Inc.
• Quality control - Anheuser-Busch, Dunlop
• Banknote acceptor (BANK) - DF Elettronica, Florence