Bez tytułu slajdu (Untitled slide)
kslot.iis.p.lodz.pl/AI/nn1.pdf · 2018-11-07 · 34 pages

Transcript of the slide deck.

Page 1:

Artificial Intelligence Krzysztof Ślot, 2008

Artificial Intelligence

Krzysztof Ślot
Institute of Applied Computer Science, Technical University of Lodz, Poland

Introduction to computing with neural networks – feedforward nets

Page 2:

Introduction

Motivation:
• To solve "hard" problems: computationally intensive, with unclear rules, etc.
• To mimic "higher" brain functions: recognition, classification, etc.

(Figure: a sample recognition task)

Page 3:

Neurons are living cells that consume energy to operate

Page 4:

Researching neural networks

• Another illusion, same mechanism: it reveals the three-channel visual input

Page 5:

Neural Networks: background

Biological reference: architecture of a neuron
• Dendrites, the cell body with its nucleus, an axon wrapped in a myelin sheath ("shield") interrupted by nodes of Ranvier, and synapses at both ends.

Neuron's operation
• (Figure: membrane potential vs. time t; resting level -70 mV, a roughly 1 ms spike up to 0 V, an activity phase followed by refraction.)
• Firing frequency is proportional to the total excitation.
• Functionally, a neuron is a simple multi-input, single-output unit.

Page 6:

Modeling a neuron

• Physical modeling
  – Pulse propagation phenomenon: the Hodgkin–Huxley model (Nobel prize)
• Functional modeling
  – Of interest to AI: provides the means for simulating/emulating neural nets

Page 7:

Perceptron

McCulloch–Pitts model
• Inputs x_1 ... x_n with weights w_1 ... w_n, plus a constant bias input whose weight w_0 encodes the threshold T.
• The unit sums its weighted inputs, s = Σ w_i x_i, and applies an activation function f:

  y = f( Σ_{i=0..n} w_i x_i ) = f(w^T x')

Activation functions
• Linear: f(s) = s
• Non-linear, differentiable:
  – Sigmoid: f(s) = 1 / (1 + e^{-s}), f(s) ∈ (0 ... 1)
  – Hyperbolic tangent: f(s) = th(s), f(s) ∈ (-1 ... 1)
• Non-linear, non-differentiable:
  – Step: f(s) = 1(s), f(s) ∈ (0 ... 1)
  – Sign: f(s) = sgn(s), f(s) ∈ (-1 ... 1)
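The McCulloch–Pitts unit above can be sketched in a few lines. This is a minimal illustration, not code from the slides; the sigmoid/tanh steepness is fixed at β = 1, and the sample inputs and weights are made up for the demo.

```python
import math

# Activation functions from the slide (steepness beta = 1 assumed here).
def activations(s):
    return {
        "linear":  s,
        "sigmoid": 1.0 / (1.0 + math.exp(-s)),   # range (0 ... 1)
        "tanh":    math.tanh(s),                 # range (-1 ... 1)
        "step":    1.0 if s >= 0 else 0.0,       # non-differentiable
        "sign":    1.0 if s >= 0 else -1.0,      # non-differentiable
    }

def neuron(x, w, f):
    # x' = [1, x1, ..., xn]; w = [w0, w1, ..., wn], w0 being the bias weight
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return activations(s)[f]

# Illustrative inputs and weights: s = 0.1 + 2*0.5 + 1*(-1.0) = 0.1
print(neuron([0.5, -1.0], [0.1, 2.0, 1.0], "step"))     # prints 1.0
print(neuron([0.5, -1.0], [0.1, 2.0, 1.0], "sigmoid"))
```

The only structural difference between the variants is the function applied to the same weighted sum s.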

Page 8:

Neuron's function

• How to interpret a neuron's outcome?
  – The outcome is an assessment of the activation level (a dot product).
  – Assume only two inputs; the activation is a linear expression: s = w_1 x_1 + w_2 x_2 + w_0
  – Sample parameters: w_1 = 1, w_2 = 1, w_0 = -2.5, so f(x_1, x_2): x_1 + x_2 - 2.5 = 0 is a line.

• With a step activation, y = f(s) = { 0 for s < 0; 1 for s ≥ 0 }:
  – Point A = (3, 1.5): s > 0, so y = 1
  – Point B = (0.2, 0.3): s < 0, so y = 0

• Interpretation
  – The outcome is a decision (e.g. hunt or not)
  – Data classification

(Figure: "Frog's world", points A and B in a plane with axes Size and Distance, separated by the line.)
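The worked example above fits in a few lines of code. The weights and the two points are the slide's sample values; the "hunt / do not hunt" reading of the output follows the slide's frog interpretation.

```python
# A step neuron with w1 = w2 = 1 and bias w0 = -2.5 tests on which side of
# the line x1 + x2 = 2.5 a point lies.
def neuron(x1, x2):
    s = 1.0 * x1 + 1.0 * x2 - 2.5      # activation: a linear expression
    return 1 if s >= 0 else 0          # step activation -> a decision

A = (3.0, 1.5)    # s =  2.0 > 0 -> y = 1 (e.g. "hunt")
B = (0.2, 0.3)    # s = -2.0 < 0 -> y = 0 (e.g. "do not hunt")
print(neuron(*A), neuron(*B))          # prints 1 0
```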

Page 9:

Training neural networks

Supervised learning algorithms

• Objective: determine the weights which provide the desired network operation.
• Given:
  – Training vector set: { x^i }
  – Desired network responses: { d^i }
  – Actual neuron outputs: { y^i }, where y = f( Σ_{k=0..n} x_k w_k )
• Error for the i-th training vector: e^i = c (d^i - y^i)^2; since y depends on the weights, the error is a function of the weights: e^i = f(w)

(Figure: a neuron with inputs x_0 ... x_n and weights w_0 ... w_n; for the shown training vector the actual output is y^i = 1 while the expected output is d^i = 0.)

Page 10:

Supervised learning

• Basic idea: adjust the weights to minimize the error.
• Gradient-descent methods (differentiable error function):

  e^i = c (d^i - y^i)^2,   y = f(s) = f(w^T x)

  Δw_k = -η ∂e^i/∂w_k

• Chain rule:

  Δw_k = -η (∂e^i/∂y^i)(∂y^i/∂s^i)(∂s^i/∂w_k) = η (d^i - y^i) f'(s^i) x_k^i

  (the constant c is absorbed into the learning rate η)

Page 11:

Delta rule

• Linear activation function: y = Σ_l w_l x_l

  Δw_k = -η ∂e^i/∂w_k = η (d^i - y^i) ∂( Σ_l w_l x_l^i )/∂w_k = η (d^i - y^i) x_k^i

• Non-linear activation functions, e.g. the sigmoid: f(s) = 1 / (1 + e^{-2βs})

  f'(s) = 2β f (1 - f)

  Δw_k = 2βη (d^i - y^i) f (1 - f) x_k^i

• Step and sign functions: Δw_k = η (d^i - y^i) x_k^i – no differentiation required!
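The step-function form of the delta rule, Δw_k = η (d^i - y^i) x_k^i, can be turned into a complete training loop. The data set (logical AND), learning rate, and epoch count below are illustrative choices, not from the slides.

```python
# Delta-rule training of a single step neuron: dw_k = eta * (d - y) * x_k.
def train(samples, eta=0.1, epochs=50):
    w = [0.0, 0.0, 0.0]                      # [w0 (bias), w1, w2]
    for _ in range(epochs):
        for x, d in samples:
            s = w[0] + w[1] * x[0] + w[2] * x[1]
            y = 1.0 if s >= 0 else 0.0       # step activation
            err = d - y
            w[0] += eta * err                # bias input x0 = 1
            w[1] += eta * err * x[0]
            w[2] += eta * err * x[1]
    return w

# Logical AND: a linearly separable task, so the rule converges.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train(samples)
print("learned weights:", w)
```

Each misclassified sample nudges w along (or against) x^i, exactly the geometrical picture on the next slide.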

Page 12:

Delta-rule learning: geometrical interpretation

  Δw = η (d^i - y^i) x^i

• The direction of change is along the vector x^i.
• (Figure: in the (x_0, x_1) plane the weight vector turns from w^T(i-1) to w^T(i); in the example shown d^i - y^i is negative, so w moves against x^i.)

Page 13:

Neuron function

• Application domain: data classification, but only for linearly separable tasks.
• Minsky and Papert (1969) exposed this limitation, causing a recession in ANN research.

(Figure: a classification task that is not linearly separable: "?")

Page 14:

Multi-layer ANN

• Two first-layer neurons compute y_0 and y_1, each cutting the (x_0, x_1) plane with a line; a second-layer neuron combines them into the output y.
• (Figure: the two half-planes and the table of (y_0, y_1) values combine, "+ =", into a single decision region.)

Page 15:

Multi-layer ANN

• Each first-layer neuron computes a linearly separable function of the input.
• The output neuron Y has binary inputs y_0 ... y_n, so its function is linearly separable as well, e.g. logical OR or AND.
• (Figure: lines in the (x_0, x_1) plane with 0/1 labels bounding the resulting decision region.)
• Decision regions for a two-layer NN are convex.

Page 16:

Multi-layer ANN

• ML ANN decision regions in classification tasks
• Source: Lippmann, "An Introduction to Computing with Neural Nets"

Page 17:

Data processing in multi-layer ANNs

• Input vector: X = (x_0 ... x_M)
• Input-layer neurons, i = 1 ... N_1:

  y_i^1 = f(s_i^1) = f( Σ_{k=0..M} w_ik^1 x_k )

• Output-layer neurons (layer N):

  Y_i = y_i^N = f(s_i^N) = f( Σ_j w_ij^N y_j^{N-1} ) = f( Σ_j w_ij^N f( Σ_k w_jk^{N-1} f( ... ) ) )

• The network X → Y is therefore a nested composition of weighted sums and activation functions.
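The nested composition above can be written as a loop over layers. This is a sketch of the idea, not code from the slides; the two-layer weights below are arbitrary illustrative numbers.

```python
import math

# Forward pass through a multi-layer feedforward net: each layer computes
# y_i = f( w_i0 + sum_k w_ik * x_k ); the whole net is the nested
# composition f(W^N f(W^(N-1) ... f(W^1 x))).
def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer(x, weights):
    # weights: one row [w0, w1, ..., wM] per neuron (w0 is the bias)
    return [sigmoid(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
            for row in weights]

def forward(x, layers):
    for weights in layers:
        x = layer(x, weights)   # output of one layer feeds the next
    return x

layers = [
    [[0.0, 1.0, -1.0], [0.5, -1.0, 1.0]],   # layer 1: 2 neurons, 2 inputs
    [[-0.5, 1.0, 1.0]],                      # output layer: 1 neuron
]
print(forward([1.0, 0.0], layers))
```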

Page 18:

Learning in multi-layer ANNs: notation

(Figure: a two-layer network. Inputs x_1 ... x_k ... x_O plus a constant bias input -1 feed the input layer; its neurons, with weights w_jk, produce the outputs y_1 ... y_j ... y_N. These, plus another bias input -1, feed the output layer, whose neurons, with weights W_ij, produce Y_1 ... Y_M. Lowercase symbols (w_jk, y_j) refer to the input layer, uppercase (W_ij, Y_i) to the output layer.)

Page 19:

MLP Training

• Supervised setup: for an input x = (x_1 ... x_k ...), the desired output is D = (t_1 ... t_m).
• Criterion: mean-squared error (MSE)

  E = ½ Σ_{j=1..m} (t_j - Y_j)^2

  with Y_i = f( Σ_j W_i,j y_j ) and y_i = f( Σ_k w_i,k x_k )

• Weight update for output neurons: the delta rule

  ΔW_j,i = η (t_j - Y_j) f'(S_j) y_i

• For hidden units the error cannot be directly estimated; the update Δw_i,k = -η ∂E/∂w_i,k requires expressing E as a compound function of w_i,k: E = g(w_i,k).
• Solution: basic calculus.

Page 20:

MLP training

• Derivative of a compound function: the chain rule

  ∂E/∂w_i,k = (∂E/∂Y)(∂Y/∂y_i)(∂y_i/∂s_i)(∂s_i/∂w_i,k)

  with E = ½ Σ_l (t_l - Y_l)^2,  Y_l = f( Σ_j W_l,j y_j ),  y_i = f( Σ_k w_i,k x_k ):

  ∂E/∂Y_l = -(t_l - Y_l),   ∂Y_l/∂y_i = f'(S_l) W_l,i,   ∂y_i/∂w_i,k = f'(s_i) x_k

  Δw_i,k = η [ Σ_{l=1..m} (t_l - Y_l) f'(S_l) W_l,i ] f'(s_i) x_k

• Weight-update interpretation
  – Analogous to the delta rule, with the error back-projected from the upper layer:

    δ_i = Σ_{l=1..m} (t_l - Y_l) f'(S_l) W_l,i

  – (Figure: the output-layer errors E_1 ... E_l ... E_m flow back through the weights W_l,i and the derivatives f'(.) to hidden unit i.)
• This is the Error Back-Propagation (BP) algorithm.
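The update rules above can be assembled into a complete BP training loop. This is a minimal sketch: a two-layer net with sigmoid units trained on XOR; the task, network size, learning rate, seed, and epoch count are illustrative choices, not taken from the slides.

```python
import math
import random

# Error Back-Propagation for a two-layer net: sigmoid hidden units
# (lowercase w, y) and one sigmoid output (uppercase W, Y), trained on XOR.
random.seed(0)
f = lambda s: 1.0 / (1.0 + math.exp(-s))
df = lambda out: out * (1.0 - out)     # for the sigmoid, f'(s) = f(1 - f)

n_hid = 4
w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n_hid)]
W = [random.uniform(-1, 1) for _ in range(n_hid + 1)]   # [bias, W1 ... Wn]
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR

eta = 0.5
for _ in range(20000):
    for x, t in data:
        y = [f(r[0] + r[1] * x[0] + r[2] * x[1]) for r in w]    # hidden
        Y = f(W[0] + sum(Wj * yj for Wj, yj in zip(W[1:], y)))  # output
        dY = (t - Y) * df(Y)                     # output delta: (t - Y) f'(S)
        dh = [dY * W[j + 1] * df(y[j]) for j in range(n_hid)]   # back-projected
        W[0] += eta * dY
        for j in range(n_hid):
            W[j + 1] += eta * dY * y[j]
            w[j][0] += eta * dh[j]
            w[j][1] += eta * dh[j] * x[0]
            w[j][2] += eta * dh[j] * x[1]

def predict(x):
    y = [f(r[0] + r[1] * x[0] + r[2] * x[1]) for r in w]
    return f(W[0] + sum(Wj * yj for Wj, yj in zip(W[1:], y)))

print([round(predict(x), 2) for x, _ in data])
```

Note the hidden-unit delta `dh` is exactly the back-projected error δ_i of the slide: the output error weighted by W and scaled by f'.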

Page 21:

Networks with radial-basis units

• Radial functions: the output depends on a distance, y = f( r(x, p) )
• Typical example: the Gaussian

  y = C e^{ -(x-m)^T S^{-1} (x-m) }

• Net's architecture: the input vector x feeds a layer of RBF units; a linear output unit with weights W_1 ... W_N combines their responses:

  Y = Σ_i W_i y_i

Page 22:

Feed-forward ANN applications

• Regression (function approximation): learn a mapping x → y from sample points.
• (Figure: sample points "+" in the (x, y) plane approximated by three hidden neurons (1, 2, 3) feeding a linear output neuron.)
  – RBF hidden units: each neuron contributes a localized bump; their weighted sum follows the data.
  – Sigmoid hidden units: each neuron contributes a smooth step; their weighted sum follows the data as well.

Page 23:

Network training

• Parameters to be determined
  – Number of hidden neurons, i.e. the number of approximating functions
  – RBF function parameters (means, covariance matrices), if RBF neurons are used in the hidden layer
  – Sigmoid parameters, if sigmoid units are used
  – Weights of the output neuron
• Learning strategies
  – Supervised
  – Mixed: unsupervised learning of the hidden-layer units' parameters, supervised learning of the output weights

Page 24:

RBF supervised training

• Training criterion: minimize the approximation error

  E = Σ_i Σ_{k=1..N} (d_k^i - Y_k^i)^2,   Y_k = Σ_{j=1..M} w_j φ_j( x - μ_j )

  where N is the number of output units, M the number of hidden units, and i the training-sample index.

• Sample network: 1D RBF units (e.g. Gaussian) and one output unit:

  Y = Σ_{j=1..M} w_j e^{ -(x-μ_j)^2 / (2σ_j^2) }

• Gradient-descent approach
  – Output weights:

    Δw_s = -η ∂E/∂w_s = η Σ_i (d^i - Y^i) e^{ -(x^i-μ_s)^2 / (2σ_s^2) }

    (constants absorbed into η)
  – RBF parameters: μ_s and σ_s are updated analogously, via ∂Y/∂μ_s and ∂Y/∂σ_s.
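The output-weight update above is enough to fit a small 1-D RBF net. In this sketch the centers and width are fixed (in the mixed strategy they would come from unsupervised learning) and only the output weights are trained; the target function, η, and all other numbers are illustrative. The target is itself a sum of two Gaussians, so an exact fit exists.

```python
import math

# 1-D RBF network: Y(x) = sum_j w_j * exp(-(x - mu_j)^2 / (2 sigma^2)).
def phi(x, mu, sigma=0.3):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

mus = [0.0, 0.5, 1.0]            # fixed centers
w = [0.0, 0.0, 0.0]              # trainable output weights

xs = [i / 10 for i in range(11)]
ds = [0.5 * phi(x, 0.0) + 1.0 * phi(x, 1.0) for x in xs]   # target samples

def Y(x):
    return sum(wj * phi(x, mj) for wj, mj in zip(w, mus))

# Gradient descent on the output weights: dw_s = eta * sum_i (d - Y) phi_s(x)
eta = 0.1
for _ in range(5000):
    for s, mu_s in enumerate(mus):
        w[s] += eta * sum((d - Y(x)) * phi(x, mu_s)
                          for x, d in zip(xs, ds)) / len(xs)

err = max(abs(d - Y(x)) for x, d in zip(xs, ds))
print("max approximation error:", err)
```

Because E is quadratic in the output weights, this part of the training is a convex problem and gradient descent converges reliably.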

Page 25:

Feedforward NNs: problems

• Overfitting
  – For an overly complex net and an insufficient amount of data, the model learns the training samples, not the rule. A model should generalize well.
  – (Figure: 20 training samples fitted with 5 hidden RBF units vs. 50 hidden RBF units; the larger net overfits.)

Page 26:

Limitations of ML-FF ANNs

• Local minima
  – Only if the error function is convex can one expect a correct training outcome (gradient descent takes us to the minimum). Unfortunately, error functions for multiple-layer feedforward ANNs are rarely convex …
  – Possible ways to alleviate the problem:
    • Boltzmann machines
    • Multiple initial points
    • Regularization
• Overfitting
  – If the learning set is not significantly larger than the parameter set, the network learns the examples, not the rule (there are many well-fitting units).
• The capabilities of multiple-layer networks trained with the BP algorithm and its descendants for solving real-life problems are limited.

(Figure: face-detection examples labelled "… Face".)

Page 27:

Summary of multilayer feed-forward ANNs

• Drawbacks
  – Learning is a challenge: local minima of the error function result in non-optimal solutions, as gradient-descent methods cannot find the global minima of non-convex functions. Possible solution: stochastic methods (simulated annealing for global-minimum search).
  – Slow convergence of the BP algorithm. Possible solution: consider second-order derivatives in the error approximation (Levenberg–Marquardt).
  – Fundamental difficulties with VLSI implementations of nets.
  – ANNs are hard to analyze (feedback nets).
• Advantages
  – Theoretically capable of solving hard problems.
  – Extremely fast execution (if implemented in hardware, but also when simulated).
  – Can constantly learn and improve, even after deployment.
• Practical applications
  – Rare …
  – Until recently …

Page 28:

Deep Neural Networks and Deep Learning

• Deep neural networks: a breakthrough in the performance of intelligent data processing
  – Recognition of the contents of Rn data: images (object recognition, scene analysis, image classification)
  – Recognition of the contents of Rn data sequences: video (action recognition), speech (recognition, transcription, translation), NLP (document classification, analysis)
  – Generation of Rn data: image objects, textures
  – Generation of Rn data sequences: control, description, speech

Page 29:

Recognition

• Classification of image objects: DNNs perform better than humans.

  Task 1 (40 categories, 30 000 examples):   Humans 96%,  CNN 99.6%
  Task 2 (100 categories, 400 000 examples): Humans 82%,  CNN 86.1%

Page 30:

Recognition and generation

Application: autonomous vehicles

Scene understanding, vehicle control

Nvidia: https://www.youtube.com/watch?v=qhUvQiKec2U

Page 31:

Generation

Robot motion control

Boston Dynamics: https://www.youtube.com/watch?v=-e9QzIkP5qI

Page 32:

Painting style

• Learning abstract concepts: DCGAN creations.
• (Figure: an input image repainted in the style of Van Gogh and in the style of Munch.)
• Source: http://www.boredpanda.com/computer-deep-learning-algorithm-painting-masters/

Page 33:

Convolutional Neural Networks

• Automated image annotation

Page 34:

Deep Learning and Convolutional Neural Networks

• Deep
  – Multiple layers (dozens, hundreds, thousands)
  – Huge numbers of parameters
  – Appropriate measures needed for training
• Convolutional neural networks
  – Typical pipeline: Data → [Conv 1 (filters Si, Sj) → ReLU → Pooling 1 (MAX)] → [Conv 2 → ReLU → Pooling 2 (MAX)] → … → [Conv n → ReLU] → Fully-connected ANN → Output
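One stage of the Conv → ReLU → MAX-pooling pipeline can be written out with plain lists. The 4×4 "image" and the 3×3 vertical-edge filter below are illustrative, not from the slides.

```python
# Conv -> ReLU -> 2x2 MAX pooling on a tiny 2-D input.
def conv2d(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1   # valid convolution
    return [[sum(kernel[i][j] * img[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w)] for r in range(h)]

def relu(fm):
    return [[max(0.0, v) for v in row] for row in fm]

def maxpool2(fm):
    # non-overlapping 2x2 windows, keeping the maximum of each
    return [[max(fm[r][c], fm[r][c + 1], fm[r + 1][c], fm[r + 1][c + 1])
             for c in range(0, len(fm[0]) - 1, 2)]
            for r in range(0, len(fm) - 1, 2)]

img = [[1, 2, 0, 0],
       [3, 1, 0, 0],
       [0, 0, 2, 1],
       [0, 0, 1, 3]]
edge = [[1, 0, -1],          # 3x3 vertical-edge (horizontal-gradient) filter
        [1, 0, -1],
        [1, 0, -1]]
out = maxpool2(relu(conv2d(img, edge)))
print(out)                   # prints [[2]]
```

A deep CNN simply stacks many such stages, with learned filter values, before the final fully-connected layers.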