
Lecture 13

Artificial Neural Networks

The Cognitive Inversion

• Computers can do some things very well that are difficult for people
  – e.g., arithmetic calculations
  – playing chess & other board games
  – doing proofs in formal logic & mathematics
  – handling large amounts of data precisely

• But computers are very bad at some things that are easy for people (and even some animals)
  – e.g., face recognition & general object recognition
  – autonomous locomotion
  – sensory-motor coordination

• Conclusion: brains work very differently from Von Neumann computers

The 100-Step Rule

• Typical recognition tasks take less than one second
• Neurons take several milliseconds to fire
• Therefore there can be at most about 100 sequential processing steps (roughly 1000 ms / 10 ms per firing ≈ 100 steps)

Two Different Approaches to Computing

Von Neumann: Narrow but Deep

Neural Computation: Shallow but Wide

How Wide?

Retina: 100 million receptors
Optic nerve: one million nerve fibers

Neurons are Not Logic Gates

• Speed
  – electronic logic gates are very fast (nanoseconds)
  – neurons are comparatively slow (milliseconds)
• Precision
  – logic gates are highly reliable digital devices
  – neurons are imprecise analog devices
• Connections
  – logic gates have few inputs (usually 1 to 3)
  – many neurons have >100,000 inputs


Artificial Neural Networks

Typical Artificial Neuron

(figure: inputs, connection weights, threshold, output)

Typical Artificial Neuron

(figure: summation, net input, activation function)

Operation of Artificial Neuron

(figure: threshold 2; connection weights 2, –1, 4; inputs 0, 1, 1)

0 × 2 = 0
1 × (–1) = –1
1 × 4 = 4

net input = 3; 3 > 2, so output = 1

Operation of Artificial Neuron

(figure: the same neuron, threshold 2 and connection weights 2, –1, 4, now with inputs 1, 1, 0)

1 × 2 = 2
1 × (–1) = –1
0 × 4 = 0

net input = 1; 1 < 2, so output = 0
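These two worked examples are easy to reproduce in code. The following is a minimal sketch (the function and variable names are mine, not from the lecture) of a threshold unit that sums the weighted inputs and fires when the sum exceeds the threshold:

    def threshold_neuron(inputs, weights, threshold):
        # Classic threshold unit: fire (output 1) when the weighted sum exceeds the threshold
        net = sum(x * w for x, w in zip(inputs, weights))
        return 1 if net > threshold else 0

    weights = [2, -1, 4]
    theta = 2
    print(threshold_neuron([0, 1, 1], weights, theta))   # net = 3 > 2, so 1
    print(threshold_neuron([1, 1, 0], weights, theta))   # net = 1 < 2, so 0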

Feedforward Network

(figure: layered network with an input layer, several hidden layers, and an output layer)

Feedforward Network in Operation (1)–(8)

(figure sequence: an input is presented to the input layer and activation flows layer by layer through the hidden layers until the output layer produces the output)
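To make the layer-by-layer flow in that figure sequence concrete, here is a rough sketch of a forward pass for a fully connected feedforward net; the layer sizes, random weights, and the choice of a logistic activation are assumptions made for the example, not values from the lecture.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(x, layers):
        # Propagate an input vector through a list of (weight matrix, bias vector) pairs
        a = x
        for W, b in layers:
            a = sigmoid(W @ a + b)   # weighted sums in one layer, then the activation function
        return a

    rng = np.random.default_rng(0)
    sizes = [4, 5, 5, 3]             # input layer, two hidden layers, output layer (arbitrary sizes)
    layers = [(rng.normal(size=(m, n)), rng.normal(size=m))
              for n, m in zip(sizes[:-1], sizes[1:])]
    print(forward(rng.normal(size=4), layers))   # activations of the output layer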

Comparison with Non-Neural Net Approaches

• Non-NN approaches typically decide output from a small number of dominant factors
• NNs typically look at a large number of factors, each of which weakly influences output
• NNs permit:
  – subtle discriminations
  – holistic judgments
  – context sensitivity

Connectionist Architectures

• The knowledge is implicit in the connection weights between the neurons
• Items of knowledge are not stored in dedicated memory locations, as in a Von Neumann computer
• “Holographic” knowledge representation:
  – each knowledge item is distributed over many connections
  – each connection encodes many knowledge items
• Memory & processing is robust in the face of damage, errors, inaccuracy, noise, …

Differences from Digital Calculation

• Information represented in continuous images (rather than language-like structures)
• Information processing by continuous image processing (rather than explicit rules applied in individual steps)
• Indefiniteness is inevitable (rather than definiteness assumed)

Supervised Learning

• Produce desired outputs for training inputs
• Generalize reasonably & appropriately to other inputs
• Good example: pattern recognition
• Neural nets are trained rather than programmed
  – another difference from Von Neumann computation

Learning for Output Neuron (1)

(figure: threshold 2; weights 2, –1, 4; inputs 0, 1, 1; net input = 3; 3 > 2, so output = 1)

Desired output: 1

No change!

Learning for Output Neuron (2)

(figure: threshold 2; weights 2, –1, 4; inputs 0, 1, 1; net input = 3; 3 > 2, so output = 1)

Desired output: 0

Output is too large, so the weights on the active inputs are reduced: –1 becomes –1.5 and 4 becomes 3.25

New net input = 1.75; 1.75 < 2, so output = 0

Learning for Output Neuron (3)

(figure: threshold 2; weights 2, –1, 4; inputs 1, 1, 0; net input = 1; 1 < 2, so output = 0)

Desired output: 0

No change!

Learning for Output Neuron (4)

(figure: threshold 2; weights 2, –1, 4; inputs 1, 1, 0; net input = 1; 1 < 2, so output = 0)

Desired output: 1

Output is too small, so the weights on the active inputs are increased: 2 becomes 2.75 and –1 becomes –0.5

New net input = 2.25; 2.25 > 2, so output = 1
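The pattern across these four slides is the usual perceptron-style rule: leave the weights alone when the output matches the desired output, increase the weights on active inputs when the output is too small, and decrease them when it is too large. Below is a minimal sketch of that idea with a hypothetical fixed learning rate (the exact step sizes on the slides differ):

    def train_step(weights, threshold, inputs, desired, rate=0.5):
        # One perceptron-style correction for a single threshold output neuron
        net = sum(x * w for x, w in zip(inputs, weights))
        output = 1 if net > threshold else 0
        error = desired - output     # +1: output too small, -1: too large, 0: correct (no change)
        if error != 0:
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        return weights

    w = [2, -1, 4]
    print(train_step(w, 2, [0, 1, 1], desired=0))   # output 1 was too large: active-input weights drop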

Credit Assignment Problem

(figure: multilayer network with an input layer, hidden layers, an output layer, and a desired output)

How do we adjust the weights of the hidden layers?

Back-Propagation

(figure sequence: forward pass from the input; compare actual vs. desired output; correct the output neuron weights; correct the last hidden layer; correct all hidden layers, working backward; finally correct the first hidden layer)
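The figure sequence compresses the whole procedure into pictures; a compact numerical sketch may make the steps concrete. The network below (one hidden layer, logistic activations, squared-error loss) and all of its parameters are assumptions for illustration; only the overall procedure (forward pass, output error, backward correction of output then hidden weights) comes from the slides.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def backprop_step(W1, W2, x, target, rate=0.5):
        # Forward pass
        h = sigmoid(W1 @ x)                            # hidden-layer activations
        y = sigmoid(W2 @ h)                            # output-layer activations
        # Actual vs. desired output (squared-error loss)
        delta_out = (y - target) * y * (1 - y)         # error signal at the output layer
        delta_hid = (W2.T @ delta_out) * h * (1 - h)   # error propagated back to the hidden layer
        # Correct the output-neuron weights, then the hidden-layer weights
        W2 -= rate * np.outer(delta_out, h)
        W1 -= rate * np.outer(delta_hid, x)
        return W1, W2, y

    rng = np.random.default_rng(1)
    W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
    x, target = np.array([0.0, 1.0]), np.array([1.0])
    for _ in range(200):
        W1, W2, y = backprop_step(W1, W2, x, target)
    print(y)   # the output should now be close to the desired value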

Use of Back-Propagation (BP)

• Typically the weights are changed slowly
• Typically the net will not give correct outputs for all training inputs after one adjustment
• Each input/output pair is used repeatedly for training
• BP may be slow
• But there are many better ANN learning algorithms

ANN Training Procedures

• Supervised training: we show the net the output it should produce for each training input (e.g., BP)
• Reinforcement training: we tell the net if its output is right or wrong, but not what the correct output is
• Unsupervised training: the net attempts to find patterns in its environment without external guidance

Applications of ANNs

• “Neural nets are the second-best way of doing everything”
• If you really understand a problem, you can design a special purpose algorithm for it, which will beat a NN
• However, if you don’t understand your problem very well, you can generally train a NN to do it well enough


The Hopfield Network

(and constraint satisfaction)

Hopfield Network

• Symmetric weights: wij = wji
• No self-action: wii = 0
• Zero threshold: θ = 0
• Bipolar states: si ∈ {–1, +1}
• Discontinuous bipolar activation function:
  σ(h) = sgn(h) = –1 if h < 0, +1 if h > 0
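A minimal asynchronous update loop for such a network might look like the sketch below; the weights and initial state are made up for illustration, but the update rule itself, si ← sgn(Σj wij sj), follows the definition above.

    import numpy as np

    def hopfield_run(W, s, steps=100, rng=None):
        # Asynchronous dynamics: repeatedly pick one unit and set it to the sign of its local field
        rng = rng or np.random.default_rng()
        s = s.copy()
        for _ in range(steps):
            i = rng.integers(len(s))
            h = W[i] @ s             # local field of unit i (w_ii = 0, threshold = 0)
            if h != 0:
                s[i] = 1 if h > 0 else -1
        return s

    W = np.array([[ 0.,  1., -2.],   # symmetric weights, zero diagonal
                  [ 1.,  0.,  1.],
                  [-2.,  1.,  0.]])
    print(hopfield_run(W, np.array([1, -1, 1])))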


Positive Coupling

• Positive sense (sign)
• Large strength

Negative Coupling

• Negative sense (sign)
• Large strength

Weak Coupling

• Either sense (sign)
• Little strength

State = –1 & Local Field < 0 (h < 0)

State = –1 & Local Field > 0 (h > 0): State Reverses

State = +1 & Local Field > 0 (h > 0)

State = +1 & Local Field < 0 (h < 0): State Reverses

Hopfield Net as Soft Constraint Satisfaction System

• States of neurons as yes/no decisions
• Weights represent soft constraints between decisions
  – hard constraints must be respected
  – soft constraints have degrees of importance
• Decisions change to better respect constraints
• Is there an optimal set of decisions that best respects all constraints?


Demonstration of Hopfield Net

Run Hopfield Demo

Convergence

• Does such a system converge to a stable state?
• Under what conditions does it converge?
• There is a sense in which each step relaxes the “tension” in the system (or increases its “harmony”)
• But could a relaxation of one neuron lead to greater tension in other places?

Quantifying Harmony

(figure: harmony scale for example couplings, running from H = 10 at the top, increasing harmony, down through H = 9, 2, 1, –1, –2, –9 to H = –10, decreasing harmony)

Energy

• “Energy” (or “tension”) is the opposite of harmony
• E = –H

(figure: harmony and energy scales)

Energy Does Not Increase

• In each step in which a neuron is considered for update:
  E{s(t + 1)} – E{s(t)} ≤ 0
• Energy cannot increase
• Energy decreases if any neuron changes
• Must it stop? (Yes)
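The slide states the inequality without writing out the energy itself; the standard choice for a zero-threshold Hopfield net is E = –(1/2) Σij wij si sj. The small check below, with made-up weights, verifies that single-unit updates never raise this energy.

    import numpy as np

    def energy(W, s):
        # Standard Hopfield energy for zero thresholds: E = -1/2 * s^T W s
        return -0.5 * s @ W @ s

    def update_unit(W, s, i):
        s = s.copy()
        h = W[i] @ s
        if h != 0:
            s[i] = 1 if h > 0 else -1
        return s

    W = np.array([[ 0.,  1., -2.],
                  [ 1.,  0.,  1.],
                  [-2.,  1.,  0.]])
    s = np.array([1., -1., 1.])
    for i in range(len(s)):
        s_next = update_unit(W, s, i)
        assert energy(W, s_next) <= energy(W, s)   # E{s(t+1)} - E{s(t)} <= 0
        s = s_next
    print(s, energy(W, s))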

Conclusion

• If we do asynchronous updating, the Hopfield net must reach a stable, minimum energy state in a finite number of updates
• This does not imply that it is a global minimum

Conceptual Picture of Descent on Energy Surface (fig. from Solé & Goodwin)

Energy Surface (fig. from Haykin Neur. Netw.)

Energy Surface + Flow Lines (fig. from Haykin Neur. Netw.)

Flow Lines: Basins of Attraction (fig. from Haykin Neur. Netw.)

Storing Memories as Attractors (fig. from Solé & Goodwin)

Demonstration of Hopfield Net

Run Hopfield Demo

Example of Pattern Restoration (figure sequence, fig. from Arbib 1995)

Example of Pattern Completion (figure sequence, fig. from Arbib 1995)

Example of Association (figure sequence, fig. from Arbib 1995)

Applications of Hopfield Memory

• Pattern restoration
• Pattern completion
• Pattern generalization
• Pattern association

Hopfield Net for Optimization and for Associative Memory

• For optimization:
  – we know the weights (couplings)
  – we want to know the minima (solutions)
• For associative memory:
  – we know the minima (retrieval states)
  – we want to know the weights

Hebb’s Rule

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

—Donald Hebb (The Organization of Behavior, 1949, p. 62)

Example of Hebbian Learning: Pattern Imprinted (figure)

Example of Hebbian Learning: Partial Pattern Reconstruction (figure)
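One standard concrete reading of Hebb’s rule for a bipolar Hopfield memory is outer-product imprinting: couplings between units that are active together in a stored pattern are strengthened. The sketch below uses invented patterns, not the ones shown in the figures.

    import numpy as np

    def imprint(patterns):
        # Hebbian (outer-product) imprinting of bipolar patterns into the weight matrix
        n = len(patterns[0])
        W = np.zeros((n, n))
        for p in patterns:
            p = np.asarray(p, dtype=float)
            W += np.outer(p, p) / n    # co-active units get a stronger positive coupling
        np.fill_diagonal(W, 0.0)       # no self-action (w_ii = 0)
        return W

    W = imprint([[1, -1, 1, -1, 1],
                 [1,  1, -1, -1, 1]])
    # Handing a partial or corrupted pattern to the asynchronous update loop sketched
    # earlier should then reconstruct the nearest imprinted pattern.
    print(W)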


Stochastic Neural Networks

(in particular, the stochastic Hopfield network)

Trapping in Local Minimum (figure)

Escape from Local Minimum (figure sequence)

Motivation

• Idea: with low probability, go against the local field
  – move up the energy surface
  – make the “wrong” microdecision
• Potential value for optimization: escape from local optima
• Potential value for associative memory: escape from spurious states
  – because they have higher energy than imprinted states

The Stochastic Neuron

Deterministic neuron: s′i = sgn(hi)

Stochastic neuron:
  Pr{s′i = +1} = σ(hi)
  Pr{s′i = –1} = 1 – σ(hi)

Logistic sigmoid: σ(h) = 1 / (1 + exp(–2h/T))

(figure: plot of σ(h) against h)
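Read literally, the rule is: compute the local field, squash it with the logistic sigmoid at pseudo-temperature T, and set the state to +1 with that probability (otherwise –1). A tiny sketch, with an arbitrary field value and T:

    import math, random

    def logistic(h, T=1.0):
        # Logistic sigmoid with pseudo-temperature T
        return 1.0 / (1.0 + math.exp(-2.0 * h / T))

    def stochastic_update(h, T=1.0):
        # Stochastic neuron: +1 with probability sigma(h), otherwise -1
        return 1 if random.random() < logistic(h, T) else -1

    samples = [stochastic_update(h=0.5, T=1.0) for _ in range(10000)]
    print(sum(1 for s in samples if s == 1) / len(samples))   # about sigma(0.5) = 0.73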

Properties of Logistic Sigmoid

σ(h) = 1 / (1 + e^(–2h/T))

• As h → +∞, σ(h) → 1
• As h → –∞, σ(h) → 0
• σ(0) = 1/2

Logistic Sigmoid with Varying T

(figure sequence: plots of σ(h) for T = 0.01, 0.1, 0.5, 1, 10, 100; slope at origin = 1/(2T))

Pseudo-Temperature

• Temperature = measure of thermal energy (heat)
• Thermal energy = vibrational energy of molecules
• A source of random motion
• Pseudo-temperature = a measure of nondirected (random) change
• Logistic sigmoid gives same equilibrium probabilities as Boltzmann-Gibbs distribution


Simulated Annealing

(Kirkpatrick, Gelatt & Vecchi, 1983)

Dilemma

• In the early stages of search, we want a high temperature, so that we will explore the space and find the basins of the global minimum
• In the later stages we want a low temperature, so that we will relax into the global minimum and not wander away from it
• Solution: decrease the temperature gradually during search

Quenching vs. Annealing

• Quenching:
  – rapid cooling of a hot material
  – may result in defects & brittleness
  – local order but global disorder
  – locally low-energy, globally frustrated
• Annealing:
  – slow cooling (or alternate heating & cooling)
  – reaches equilibrium at each temperature
  – allows global order to emerge
  – achieves global low-energy state

Multiple Domains (figure: local coherence, global incoherence)

Moving Domain Boundaries (figure)

Effect of Moderate Temperature (fig. from Anderson Intr. Neur. Comp.)

Effect of High Temperature (fig. from Anderson Intr. Neur. Comp.)

Effect of Low Temperature (fig. from Anderson Intr. Neur. Comp.)

Annealing Schedule

• Controlled decrease of temperature
• Should be sufficiently slow to allow equilibrium to be reached at each temperature
• With sufficiently slow annealing, the global minimum will be found with probability 1
• Design of schedules is a topic of research
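One simple and widely used family of schedules is geometric cooling: start at a high pseudo-temperature, run the stochastic updates for a while at each temperature, then multiply T by a factor slightly below 1 and repeat. The starting temperature, cooling factor, and sweep count below are illustrative choices, not recommendations from the lecture.

    import numpy as np

    def anneal(W, s, T0=10.0, cooling=0.95, sweeps=50, T_min=0.01, rng=None):
        # Simulated annealing of a stochastic Hopfield net with a geometric cooling schedule
        rng = rng or np.random.default_rng()
        s, T, n = s.astype(float).copy(), T0, len(s)
        while T > T_min:
            for _ in range(sweeps * n):              # roughly equilibrate at this temperature
                i = rng.integers(n)
                h = W[i] @ s
                p = 1.0 / (1.0 + np.exp(-2.0 * h / T))
                s[i] = 1.0 if rng.random() < p else -1.0
            T *= cooling                             # controlled, gradual decrease of temperature
        return s

    W = np.array([[ 0.,  1., -2.],
                  [ 1.,  0.,  1.],
                  [-2.,  1.,  0.]])
    print(anneal(W, np.array([1., 1., 1.])))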


Demonstration of Boltzmann Machine & Necker Cube Example

Run ~mclennan/pub/cube/cubedemo

Necker Cube (figure)

Biased Necker Cube (figure)

Summary

• Non-directed change (random motion) permits escape from local optima and spurious states
• Pseudo-temperature can be controlled to adjust relative degree of exploration and exploitation