The free-energy principle: a rough guide to the brain?
K Friston
Summarized by Joon Shik Kim, 12.05.03 (Thu)
Computational Models of Intelligence
Sufficient Statistics
• Quantities which are sufficient to parameterise a probability density (e.g., mean and covariance of a Gaussian density).
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
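As a rough illustration of the idea, the sketch below estimates the two sufficient statistics of a one-dimensional Gaussian from data and then evaluates the density using nothing else. The sample values and the function names are hypothetical, chosen only for this example.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Gaussian density, fully specified by its mean and variance."""
    norm = math.sqrt(2.0 * math.pi * variance)
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / norm

def sufficient_statistics(samples):
    """Return the sample mean and the maximum-likelihood variance."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    return mean, variance

samples = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
mean, variance = sufficient_statistics(samples)
peak_density = gaussian_pdf(mean, mean, variance)
```

Once `mean` and `variance` are known, the individual samples are no longer needed to evaluate the density anywhere.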
Surprise
• or self-information is the negative log-probability of an outcome. An improbable outcome is therefore surprising.
$$-\ln p(\tilde{y} \mid \alpha)$$
$\tilde{y}$: sensory input
$\alpha$: action
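A minimal sketch of surprise for a single outcome whose probability is already known. The function name is an assumption made for this example, and the conditioning on action is left implicit.

```python
import math

def surprise(probability):
    """Self-information (in nats) of an outcome with the given probability."""
    return -math.log(probability)

likely = surprise(0.5)     # a common outcome
unlikely = surprise(0.01)  # a rare outcome is more surprising
```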
Kullback-Leibler Divergence
• Information divergence, information gain, cross or relative entropy is a non-commutative measure of the difference between two probability distributions.
$$D_{KL}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$
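The non-commutativity can be seen in a small discrete example. This sketch replaces the integral with a sum over a finite set of outcomes (an assumption made for illustration) and requires q to be non-zero wherever p is non-zero.

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence D(P || Q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]  # hypothetical distributions
q = [0.9, 0.1]
forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)
```

Here `forward` and `reverse` differ, which is why the divergence is not a distance in the usual sense.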
Conditional Density
• or posterior density is the probability distribution of causes or model parameters, given some data; i.e., a probabilistic mapping from observed data to causes.
$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}$$
$$P(E) = \sum_i p(E \mid H_i)\, p(H_i)$$
$E$: evidence
$H$: hypothesis
$p(H)$: prior
$p(H \mid E)$: posterior
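Bayes' rule over a finite set of hypotheses can be sketched as follows. The prior and likelihood values are made up for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(H_i | E) for each hypothesis H_i."""
    # P(E) = sum_i p(E | H_i) p(H_i)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

priors = [0.5, 0.5]       # p(H_1), p(H_2), hypothetical
likelihoods = [0.8, 0.2]  # p(E | H_1), p(E | H_2), hypothetical
post = posterior(priors, likelihoods)
```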
Generative Model
• or forward model is a probabilistic mapping from causes to observed consequences (data). It is usually specified in terms of the likelihood of getting some data given their causes (parameters of a model) and priors on the parameters.
$$p(w \mid D) \propto p(D \mid w)\, p(w)$$
Prior
• The probability distribution or density on the causes of data that encode beliefs about those causes prior to observing the data.
$p(H)$ or $p(w)$
Bayesian Surprise
• A measure of salience based on the divergence between the recognition and prior densities. It measures the information in the data that can be recognised.
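A small sketch of this measure, under two assumptions that are not stated on the slide: the recognition density is taken to be the exact posterior, and the divergence is computed as KL(posterior || prior). All numbers are hypothetical.

```python
import math

def posterior(prior, likelihood):
    """Exact posterior over a finite set of causes."""
    evidence = sum(l * p for l, p in zip(likelihood, prior))
    return [l * p / evidence for l, p in zip(likelihood, prior)]

def bayesian_surprise(prior, likelihood):
    """KL divergence from prior to posterior, in nats."""
    post = posterior(prior, likelihood)
    return sum(q * math.log(q / p) for q, p in zip(post, prior) if q > 0)

prior = [0.5, 0.5]
rare_but_uninformative = [0.01, 0.01]  # improbable under both causes
informative = [0.9, 0.1]               # favours the first cause

low = bayesian_surprise(prior, rare_but_uninformative)
high = bayesian_surprise(prior, informative)
```

The first observation is rare in Shannon's sense but leaves beliefs unchanged, so its Bayesian surprise is zero; the second shifts beliefs and is therefore salient.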
Entropy
• The average surprise of outcomes sampled from a probability distribution or density. A density with low entropy means, on average, the outcome is relatively predictable.
$$S = -\int p(x) \ln p(x) \, dx$$
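For a discrete distribution the integral becomes a sum. The sketch below, with hypothetical distributions, shows that a peaked distribution has lower entropy than a flat one.

```python
import math

def entropy(p):
    """Average surprise (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

flat = entropy([0.5, 0.5])      # maximally unpredictable coin
peaked = entropy([0.99, 0.01])  # almost always the same outcome
```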
Ergodic
• A process is ergodic if its long-term time-average converges to its ensemble average. Ergodic processes that evolve for a long time forget their initial states.
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x) = \frac{1}{\mu(X)} \int f \, d\mu$$
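One way to see the time average and the ensemble average agree is a toy system. The sketch below uses a rotation of the unit circle by an irrational amount as the map T, and cos(2πx) as the observable f; both are choices made for this example, not taken from the slides. The ensemble average of f under the uniform measure is 0.

```python
import math

def time_average(f, x0, shift, n):
    """Average f along the orbit x0, T(x0), T(T(x0)), ... of length n."""
    total, x = 0.0, x0
    for _ in range(n):
        total += f(x)
        x = (x + shift) % 1.0  # T: rotate the unit circle by `shift`
    return total / n

def observable(x):
    return math.cos(2.0 * math.pi * x)

SHIFT = math.sqrt(2.0) - 1.0  # irrational, so the rotation is ergodic
ENSEMBLE_AVERAGE = 0.0        # integral of cos(2*pi*x) over [0, 1)

avg_a = time_average(observable, 0.0, SHIFT, 10_000)
avg_b = time_average(observable, 0.3, SHIFT, 10_000)
```

Both starting points give a time average close to the ensemble average, which illustrates how the process forgets its initial state.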
Free-energy
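• An information theory measure that bounds (is greater than) the surprise on sampling some data, given a generative model.
$$F(\tilde{y}, \mu \mid m) = E - TS$$
$$= -\langle \ln p(\tilde{y}, \vartheta \mid m) \rangle_q + \langle \ln q(\vartheta; \mu) \rangle_q$$
$$F = D(q(\vartheta; \mu) \,\|\, p(\vartheta \mid \tilde{y})) - \ln p(\tilde{y} \mid m)$$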
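The bound can be checked numerically in a toy setting. The sketch below assumes a hidden cause with two discrete states and one fixed observation; the prior, the likelihood and the function name are hypothetical. Free energy is computed as expected energy minus entropy, and it equals the surprise only when the recognition density q is the exact posterior.

```python
import math

def free_energy(q, joint):
    """F = -<ln p(y, cause)>_q + <ln q(cause)>_q for a discrete cause."""
    return sum(qi * (math.log(qi) - math.log(ji))
               for qi, ji in zip(q, joint) if qi > 0)

prior = [0.5, 0.5]       # p(cause), hypothetical
likelihood = [0.8, 0.2]  # p(y | cause) for the observed y, hypothetical
joint = [l * p for l, p in zip(likelihood, prior)]

evidence = sum(joint)                      # p(y | m)
surprise = -math.log(evidence)             # -ln p(y | m)
posterior = [j / evidence for j in joint]  # p(cause | y)

f_exact = free_energy(posterior, joint)    # the divergence term vanishes
f_approx = free_energy([0.5, 0.5], joint)  # a poorer recognition density
```

The gap `f_approx - surprise` is the divergence between the chosen q and the posterior, so minimising free energy with respect to q tightens the bound.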
Generalised Coordinates
• of motion cover the value of a variable, its motion, acceleration, jerk and higher orders of motion. A point in generalised coordinates corresponds to a path or trajectory over time.
$$\tilde{u} = (u, u', u'', \ldots)$$
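One way to read the correspondence between a point and a trajectory is as a Taylor expansion around the current time. The sketch below makes that assumption explicit; the function name and the choice of a sine wave are illustrative only.

```python
import math

def trajectory(generalised_coords, t):
    """Local path implied by (u, u', u'', ...) at time offset t."""
    return sum(c * t ** k / math.factorial(k)
               for k, c in enumerate(generalised_coords))

# Value and first five time derivatives of sin(t) at t = 0.
sine_at_zero = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
approx = trajectory(sine_at_zero, 0.5)
```

With only six coordinates the reconstructed path already tracks `math.sin` closely near the expansion point.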
Gradient Descent
• An optimization scheme that finds a minimum of a function by changing its arguments in proportion to the negative of the gradient of the function at the current value.
$$w(t+1) = w(t) - \eta \frac{\partial E}{\partial w}$$
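A minimal sketch of the update rule on a one-dimensional quadratic. The objective E(w) = (w - 3)^2, the learning rate and the number of steps are all assumptions chosen for illustration.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)  # w(t+1) = w(t) - eta * dE/dw
    return w

def grad_e(w):
    """Gradient of the hypothetical objective E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w_final = gradient_descent(grad_e, w0=0.0, learning_rate=0.1, steps=100)
```

Each step shrinks the distance to the minimum at w = 3 by a constant factor, so `w_final` ends up very close to 3.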
Helmholtz Machine
• Device or scheme that uses a generative model to furnish a recognition density. It learns hidden structure in data by optimising the parameters of generative models.
Stochastic
• The successive states of stochastic processes are governed by random effects.
Free Energy
Dynamic Model of World and Recognition
Neuronal Architecture
What is the computational role of neuromodulation?
• Previous treatments suggest that modulatory neurotransmitters have distinct roles; for example, ‘dopamine signals the error in reward prediction, serotonin controls the time scale of reward prediction, noradrenalin controls the randomness in action selection, and acetylcholine controls the speed of memory update’. This contrasts with a single role in encoding precision. Can the apparently diverse functions of these neurotransmitters be understood in terms of one role (encoding precision) in different parts of the brain?
Can we entertain ambiguous percepts?
• Although not an integral part of the free-energy principle, we claim the brain uses unimodal recognition densities to represent one thing at a time. Although there is compelling evidence for bimodal ‘priors’ in sensorimotor learning, people usually assume the ‘recognition’ density collapses to a single percept when sensory information becomes available. The implicit challenge here is to find any electrophysiological or psychological evidence for multimodal recognition densities.
Does avoiding surprise suppress salient information?
• No; a careful analysis of visual search and attention suggests that: ‘only data observations which substantially affect the observer’s beliefs yield (Bayesian) surprise, irrespectively of how rare or informative in Shannon’s sense these observations are.’ This is consistent with active sampling of things we recognize (to reduce free-energy). However, it remains an interesting challenge to formally relate Bayesian surprise to the free-energy bound on (Shannon) surprise. A key issue here is whether saliency can be shown to depend on top-down perceptual expectations.
Which optimisation schemes does the brain use?
• We have assumed that the brain uses a deterministic gradient descent on free-energy to optimise action and perception. However, it might also use stochastic searches; sampling the sensorium randomly for a percept with low free-energy. Indeed, there is compelling evidence that our eye movements implement an optimal stochastic strategy. This raises interesting questions about the role of stochastic searches; from visual search to foraging, in both perception and action.