Feedback Networks and Hopfield Networks - KTH
Feedback Networks and Hopfield Networks
Erik Fransén, Daniel Gillblad
CB, KTH
Outline
1 Reducing the Boltzmann machine
2 Hopfield Networks
3 Hebbian Learning
4 Beyond the Boltzmann machine
Some examples and images are from MacKay (2003): Information Theory, Inference, and Learning Algorithms, a good introduction to machine learning and information theory.
Reducing the Boltzmann machine
Problem of the Boltzmann machine: the number of computations grows exponentially with problem size.
Reducing the topology: the restricted Boltzmann machine (no connections between hidden units).
Replacing the stochastic variables with their mean: the Hopfield network.
Replacing stochastic variables
The activation (input to a neuron) from other neurons is stochastic. Replace a_i = ∑_j w_ij x_j with <a_i>.
The state of a neuron is stochastic. Replace x_j with <x_j>.
To do this, mean field theory is used. Simply stated: the average of the function value is approximated by the function value of the average.
+ Much faster.
− Only first-order (and, with some tricks, second-order) correlations can be handled.
Hopfield Networks
A fully connected feedback network.
The weights are constrained to be symmetric.
Can be used:
As nonlinear associative memories or content-addressable memories.
To solve optimization problems.
Discrete Hopfield Networks, Notation
We will denote the weight from neuron i to neuron j as w_ij.
The network consists of I neurons, fully connected through symmetric connections, i.e. w_ij = w_ji.
There are no self-connections, thus w_ii = 0.
Biases w_i0 may be included.
The activity of a neuron is written as x_i.
Discrete Hopfield Networks, Activities
A Hopfield network’s activity rule is for each neuron to update its state according to a thresholding activation function,

x(a) = Θ(a) ≡ { 1 if a ≥ 0, −1 if a < 0 }

As there is feedback, we need to define an order for the updates:

Synchronous updates. All neurons compute their activations a_i = ∑_j w_ij x_j, then update their states simultaneously using x_i = Θ(a_i).
Asynchronous updates. One neuron at a time updates its activation and state. The sequence can be fixed or random.

The properties of the network may be sensitive to the update strategy.
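The two update schemes can be sketched in a few lines of NumPy (an illustrative sketch; the function names are ours, not from the slides):

```python
import numpy as np

def theta(a):
    """Thresholding activation: +1 where a >= 0, -1 otherwise."""
    return np.where(a >= 0, 1, -1)

def synchronous_update(W, x):
    """All neurons compute a_i from the same old state, then update at once."""
    return theta(W @ x)

def asynchronous_update(W, x, order=None):
    """One neuron at a time; later updates see the states set by earlier ones."""
    x = x.copy()
    for i in (order if order is not None else range(len(x))):
        x[i] = 1 if W[i] @ x >= 0 else -1
    return x
```

For a single stored pattern both schemes agree; with several stored patterns the trajectories, and sometimes the final states, can differ.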
Discrete Hopfield Networks, Convergence
Let us update one neuron at a time according to the update rule. The network state will converge within a finite number of steps if

w_ij = w_ji and w_ii = 0

Define an energy measure,

E = −(1/2) ∑_{i,j} w_ij x_i x_j

The change in energy when neuron k updates from x_k to x*_k is

ΔE(x_k → x*_k) = −(∑_i w_ki x_i x*_k − ∑_i w_ki x_i x_k) = −(x*_k − x_k) ∑_i w_ki x_i ≤ 0
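The monotone decrease of E is easy to check numerically; the sketch below (illustrative, with randomly chosen symmetric weights) verifies that no single-neuron update ever raises the energy:

```python
import numpy as np

def energy(W, x):
    """E = -(1/2) * sum_ij w_ij x_i x_j for a state x in {-1, +1}^I."""
    return -0.5 * x @ W @ x

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
W = (A + A.T) / 2          # symmetric weights, w_ij = w_ji
np.fill_diagonal(W, 0)     # no self-connections, w_ii = 0

x = rng.choice([-1, 1], size=6)
e0 = energy(W, x)
for _ in range(10):                      # asynchronous sweeps
    for k in range(6):
        e_before = energy(W, x)
        x[k] = 1 if W[k] @ x >= 0 else -1
        assert energy(W, x) <= e_before + 1e-12
```

Since E is bounded below and strictly decreases on every state change, the dynamics must reach a fixed point in finitely many steps.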
Hebbian Learning
Hebb’s postulate of learning is the oldest and most famous of all learning rules (Hebb, 1949):

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

We can reformulate this into:
1 If two neurons on either side of a connection are activated simultaneously, the strength of the connection is increased.
2 If two neurons on either side of a connection are activated asynchronously, the connection is weakened or eliminated.
Hebbian Connections
Four key properties:
1 Time-dependent mechanism. Modification depends on the time of occurrence.
2 Local mechanism. Only uses locally available information.
3 Interactive mechanism. A change depends on the activity levels on both sides of the connection.
4 Correlational mechanism. The connection change depends on the correlation between activities.
Hebbian Learning and Correlation
Hebbian learning can be described in terms of correlation: weights between positively correlated activities are increased,

dw_ij/dt ∼ Correlation(x_i, x_j)

For example,

dw_ij/dt = η cov(x_i, x_j) = η E[(x_i − x̄_i)(x_j − x̄_j)]

If two stimuli co-occur, the Hebbian learning rule will increase the weights.
Unsupervised learning.
Can be used to provide pattern completion.
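The covariance form of the rule can be sketched on sampled activities (illustrative; η and the batch estimate of the covariance are our choices, not from the slides):

```python
import numpy as np

eta = 0.1  # learning rate, an illustrative value

def hebbian_step(w, xi, xj):
    """One covariance-based Hebbian update estimated from paired samples."""
    return w + eta * np.mean((xi - xi.mean()) * (xj - xj.mean()))

rng = np.random.default_rng(1)
xi = rng.normal(size=1000)
xj = xi + 0.1 * rng.normal(size=1000)   # strongly positively correlated

w = hebbian_step(0.0, xi, xj)
assert w > 0   # positively correlated activities strengthen the connection
```

With anti-correlated activities the same step weakens the connection, matching the second half of the reformulated postulate.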
Associative Networks
Heteroassociation: mapping from one pattern to another, e.g.
y = sign(Wx) (thresholding)
y = Wx (linear mapping)
Autoassociation: mapping to the same pattern, e.g.
x = sign(Wx)
x = Wx
Can be performed with a recurrent network.
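For orthogonal patterns, a Hebbian outer-product weight matrix realizes the thresholded heteroassociative map y = sign(Wx) exactly; a small sketch with hypothetical patterns of our choosing:

```python
import numpy as np

sign = lambda a: np.where(a >= 0, 1, -1)

# two orthogonal input patterns and their associated output patterns
x1 = np.array([ 1, -1,  1, -1]); y1 = np.array([ 1,  1, -1])
x2 = np.array([-1, -1,  1,  1]); y2 = np.array([-1,  1,  1])

# Hebbian outer-product weights mapping x-patterns to y-patterns
W = np.outer(y1, x1) + np.outer(y2, x2)

assert np.array_equal(sign(W @ x1), y1)   # heteroassociation: x -> y
assert np.array_equal(sign(W @ x2), y2)
```

Because x1 · x2 = 0 here, the cross-talk term vanishes and W x1 = (x1 · x1) y1, so the thresholded output is exact.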
Discrete Hopfield Networks, Learning
The learning rule is intended to make a set of desired memories {x^(n)} be stable states of the activity rule. Each memory is a binary pattern, x_i ∈ {−1, 1}.
The weights are set using the sum of outer products (Hebbian learning),

w_ij = η ∑_n x_i^(n) x_j^(n)

where η is an unimportant constant. To prevent the weights from growing with the number of patterns, η is often set to 1/N.
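Putting the learning rule and the activity rule together gives a complete store-and-recall sketch (illustrative; the network size and the number of corrupted bits are our choices):

```python
import numpy as np

def store(patterns):
    """Sum of outer products with eta = 1/N and zero self-connections."""
    N, I = patterns.shape
    W = (patterns.T @ patterns) / N
    np.fill_diagonal(W, 0)
    return W

def recall(W, x, sweeps=5):
    """Repeated asynchronous updates from a (possibly corrupted) state."""
    x = x.copy()
    for _ in range(sweeps):
        for i in range(len(x)):
            x[i] = 1 if W[i] @ x >= 0 else -1
    return x

rng = np.random.default_rng(2)
memories = rng.choice([-1, 1], size=(3, 64))   # N = 3 patterns, I = 64 neurons
W = store(memories)

noisy = memories[0].copy()
noisy[:5] *= -1                                # corrupt 5 of the 64 bits
recovered = recall(W, noisy)
```

With N/I ≈ 0.05, far below capacity, the corrupted pattern is almost always restored exactly.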
Continuous Hopfield Networks
Using the same architecture and learning rule as in the discrete Hopfield network, we can define a continuous Hopfield network.
Activities are real numbers between −1 and 1.
Update activities as single neurons with sigmoid activation functions, synchronously or asynchronously. Activations are again calculated as a_i = ∑_j w_ij x_j, but neurons use the activation function x_i = tanh(a_i).
Although the learning rule is the same as before, the value of η now becomes important.
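A sketch of the continuous dynamics (illustrative; the single stored pattern and η = 2/I are our choices) shows why η matters: it sets the gain of the tanh loop, and too small a gain makes all activities decay to zero instead of settling on a memory:

```python
import numpy as np

def continuous_update(W, x, sweeps=50):
    """Synchronous tanh updates; activities are real numbers in (-1, 1)."""
    for _ in range(sweeps):
        x = np.tanh(W @ x)
    return x

p = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
eta = 2.0 / len(p)                  # gain large enough for a nonzero attractor
W = eta * np.outer(p, p)
np.fill_diagonal(W, 0)

x = continuous_update(W, 0.1 * p)   # start weakly aligned with the memory
assert np.all(np.sign(x) == p)      # the signs settle on the stored pattern

W_small = 0.1 * np.outer(p, p)      # eta too small: loop gain below 1
np.fill_diagonal(W_small, 0)
x0 = continuous_update(W_small, 0.1 * p)
assert np.max(np.abs(x0)) < 1e-3    # activities collapse towards zero
```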
Attractors
Use local minima to store patterns.
The input of the network is the initial state.
The output of the network is the closest local minimum.
There is a basin of attraction around each local minimum.
Attractors, Example 1
Attractors, Example 2
Attractor Types
“Normal” attractors:
Point attractors.
Limit cycles.
Strange attractors (the system itself is chaotic):
Sensitive dependence on initial conditions.
Deterministic, but can exhibit behaviour so complicated that it looks random.
Associative Memories
For simple Hopfield networks, it often takes just one iteration to converge from a randomly perturbed stored pattern.
The network often has more stable states in addition to the desired memories:
The inverse of a stable state.
Mixtures of the memories.
Introducing “brain damage” by setting a subset of the learned weights to zero often still allows the network to complete the patterns.
Patterns can usually be added up to a certain point, at which the network fails catastrophically.
Network properties are not robust when changing from asynchronous to synchronous updates.
Examples, auto associative memory
[Figure: binary image patterns stored in an autoassociative memory and recalled from corrupted versions.]
Examples, hetero associative memory
moscow------russia
lima----------peru
london-----england
tokyo--------japan
edinburgh-scotland
ottawa------canada
oslo--------norway
stockholm---sweden
paris-------france
moscow---????????? ⇒ moscow------russia
?????????---canada ⇒ ottawa------canada
otowaa------canada ⇒ ottawa------canada
egindurrh-scotland ⇒ edinburgh-scotland
Erik Fransén, Daniel Gillblad Feedback networks
Reducing the Boltzmann machineHopfield NetworksHebbian Learning
Beyond the Boltzmann machine
Hopfield Networks and Optimization
Since a Hopfield network minimizes an energy function, we can map some optimization problems onto Hopfield networks.
Travelling Salesman:

[Figure: candidate tours through cities A–D encoded as network states over tour positions 1–4.]
Capacity of Hopfield Networks
The standard learning method gives us local minima in the correct places. However, retrieving patterns can fail in numerous ways:
1 Individual bits in some memories might be corrupted. A stable state of the network is displaced a little from the desired memory.
2 Entire memories might be absent from the set of attractors.
3 Spurious additional memories unrelated to the desired memories might be present.
4 Spurious additional memories derived from the desired memories through operations such as mixing and inversion may be present.
These are in general all undesirable, although failure type 4 may be regarded as generalization.
Examples on capacity
moscow------russia
lima----------peru
london-----england
tokyo--------japan
edinburgh-scotland
ottawa------canada
oslo--------norway
stockholm---sweden
paris-------france
→W→
moscow------russia
lima----------peru
londog-----englard
tonco--------japan
edinburgh-scotland
lostoslo--------norway
stockholm---sweden
paris-------france
wrpkmh---xqpqwqxpq
paris-------sweden
ecnarf-------sirap
More Examples on capacity, 1
[Figure: recall of stored binary image patterns degrading as the number of stored memories approaches the network’s capacity.]
More examples on capacity, 2
[Figure: further examples of degraded recall near the capacity limit.]
Consequences of Capacity Calculations, 1
If we try to store N ≈ 0.18I patterns, then about 1% of the bits will be unstable after the first iteration, starting from a stored pattern.
When N/I is large, unstable bits may cause an avalanche effect of bits becoming unstable during iteration.
There is a sharp discontinuity at

N_critical = 0.138 I

When N/I exceeds 0.138, the system only has spurious states.
Consequences of Capacity Calculations, 2
[Figure: fraction of correct bits in the recalled patterns as a function of N/I, over the full range 0–0.16 and zoomed to 0.09–0.15, dropping sharply near N/I = 0.138.]
Transition Properties
For all N/I, stable states uncorrelated with the desired memories exist.
For N/I ∈ (0,0.138) there are stable states close tothe desired memories.
For N/I ∈ (0,0.05) the desired memories have lowerenergy than the uncorrelated states.
For N/I ∈ (0.05,0.138) the uncorrelated statesdominate.
For N/I ∈ (0,0.03) there are additional mixture states.
Capacity of Random Patterns, 1
Assume that the patterns we want to store are random binary patterns.
Let us study the stability of a single bit, assuming that the state of the network is set to the desired pattern x^(n).
The activation of a particular neuron is

a_i = ∑_j w_ij x_j^(n)

and the weights are (for i ≠ j)

w_ij = x_i^(n) x_j^(n) + ∑_{m≠n} x_i^(m) x_j^(m)
Capacity of Random Patterns, 2
We split W into two terms: one representing a “signal” reinforcing the desired memory, and a second representing “noise”. The activation becomes

a_i = ∑_{j≠i} x_i^(n) x_j^(n) x_j^(n) + ∑_{j≠i} ∑_{m≠n} x_i^(m) x_j^(m) x_j^(n)
    = (I − 1) x_i^(n) + ∑_{j≠i} ∑_{m≠n} x_i^(m) x_j^(m) x_j^(n)

The first term is (I − 1) times the desired state x_i^(n).
The second term is a sum of (I − 1)(N − 1) random quantities x_i^(m) x_j^(m) x_j^(n). These are independent random variables with mean 0 and variance 1.
Capacity of Random Patterns, 3
We can conclude that a_i has mean (I − 1) x_i^(n) and variance (I − 1)(N − 1).
Assume that I and N are large enough that the distinction between I and (I − 1), and between N and (N − 1), is negligible.
This means that a_i is approximately Gaussian distributed with mean I x_i^(n) and variance IN.
The probability that bit i will flip is

P(i flips) = Φ(−I / √(IN)) = Φ(−1 / √(N/I))

where

Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−z²/2} dz
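The flip probability depends only on the ratio N/I and is easy to evaluate with the error function (a sketch; `phi` is our name for the standard normal CDF):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_flip(ratio):
    """Probability that a bit flips on the first update, for ratio = N/I."""
    return phi(-1.0 / sqrt(ratio))

# At N/I = 0.18 roughly 1% of the bits are unstable,
# matching the figure quoted earlier for N ≈ 0.18I.
assert abs(p_flip(0.18) - 0.01) < 0.003
```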
Increasing Capacity in Hopfield Networks
We can increase the capacity of the network if we abandon the Hebbian learning rule and instead define an objective function that measures how well the patterns are stored, and minimize it. For all patterns x^(n), if all the other neurons are set correctly, the activation of neuron i should be such that x_i = x_i^(n):

Cost(W) = −∑_i ∑_n [ t_i^(n) ln y_i^(n) + (1 − t_i^(n)) ln(1 − y_i^(n)) ]

where

t_i^(n) = 1 if x_i^(n) = 1, and 0 if x_i^(n) = −1

y_i^(n) = 1 / (1 + e^{−a_i^(n)})
Parameters can be found using gradient descent.
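A sketch of this approach (illustrative sizes and learning rate; projecting W back to a symmetric, zero-diagonal matrix after each step is our choice of how to respect the Hopfield constraints):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cost(W, X, T):
    """Cross-entropy storage cost summed over neurons i and patterns n."""
    Y = sigmoid(X @ W.T)
    return -np.sum(T * np.log(Y) + (1 - T) * np.log(1 - Y))

def train(X, lr=0.1, steps=500):
    """Gradient descent, projecting W back to symmetric / zero-diagonal."""
    N, I = X.shape
    T = (X + 1) / 2                    # t = 1 when x = +1, t = 0 when x = -1
    W = np.zeros((I, I))
    for _ in range(steps):
        Y = sigmoid(X @ W.T)           # y_i^(n) for every neuron and pattern
        G = (Y - T).T @ X / N          # gradient of the cost w.r.t. W
        W -= lr * (G + G.T) / 2        # symmetrized step
        np.fill_diagonal(W, 0)
    return W, T

rng = np.random.default_rng(3)
X = rng.choice([-1, 1], size=(4, 16)).astype(float)
W, T = train(X)
assert cost(W, X, T) < cost(np.zeros((16, 16)), X, T)   # the cost decreased
```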
Beyond the Boltzmann machine
The Boltzmann machine is a special case of an undirected graphical model; it belongs to a class called Markov random fields.
More about (statistical) belief networks in a later lecture.