The free-energy principle: a rough guide to the brain? K Friston Summarized by Joon Shik Kim...

The free-energy principle: a rough guide to the brain?

K Friston

Summarized by Joon Shik Kim, 12.05.03 (Thu)

Computational Models of Intelligence

Sufficient Statistics

• Quantities which are sufficient to parameterise a probability density (e.g., mean and covariance of a Gaussian density).

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Surprise

• or self-information is the negative log-probability of an outcome. An improbable outcome is therefore surprising.

$$-\ln p(\tilde{y} \mid \alpha)$$

$\tilde{y}$: sensory input

$\alpha$: action
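As a minimal sketch of the definition above, the snippet below computes surprise in nats for a likely and an unlikely outcome. The function name `surprise` and the two example probabilities are illustrative assumptions, not taken from the slides.

```python
import math


def surprise(probability: float) -> float:
    """Self-information in nats: the negative log-probability of an outcome."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log(probability)


# Hypothetical outcomes: one expected, one improbable.
likely = surprise(0.9)
unlikely = surprise(0.01)
print(f"surprise of likely outcome:   {likely:.3f} nats")
print(f"surprise of unlikely outcome: {unlikely:.3f} nats")
```

The improbable outcome yields the larger value, which is the sense in which it is "surprising".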

Kullback-Leibler Divergence

• Information divergence, information gain, cross or relative entropy is a non-commutative measure of the difference between two probability distributions.

$$D_{KL}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$
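The non-commutativity can be checked numerically. The sketch below uses a discrete analogue of the divergence (a sum instead of an integral); the function name `kl_divergence` and the two example distributions are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions over the same outcomes.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Two hypothetical distributions over three outcomes.
p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
print(f"D(P||Q) = {d_pq:.4f}, D(Q||P) = {d_qp:.4f}")
```

Swapping the arguments changes the value, so the divergence is not a distance in the usual symmetric sense.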

Conditional Density

• or posterior density is the probability distribution of causes or model parameters, given some data; i.e., a probabilistic mapping from observed data to causes.

$$P(H \mid E) = \frac{P(E \mid H) \, P(H)}{P(E)}$$

$$P(E) = \sum_i p(E \mid H_i) \, p(H_i)$$

$E$: evidence

$H$: hypothesis

$p(H)$: prior

$p(H \mid E)$: posterior
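A small worked example of this mapping from data to causes, assuming two competing hypotheses. The function name `posterior` and the numbers are hypothetical; the evidence term is the sum over hypotheses of likelihood times prior.

```python
def posterior(priors, likelihoods):
    """Return p(H_i | E) for each hypothesis, given p(H_i) and p(E | H_i)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))  # P(E)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]


# Hypothetical numbers: H_1 is favoured a priori, but the data favour H_2.
priors = [0.7, 0.3]        # p(H_1), p(H_2)
likelihoods = [0.2, 0.9]   # p(E | H_1), p(E | H_2)

post = posterior(priors, likelihoods)
print([round(p, 4) for p in post])
```

Here the evidence outweighs the prior, so the posterior probability of the second hypothesis exceeds that of the first.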

Generative Model

• or forward model is a probabilistic mapping from causes to observed consequences (data). It is usually specified in terms of the likelihood of getting some data given their causes (parameters of a model) and priors on the parameters.

$$p(w \mid D) \propto p(D \mid w) \, p(w)$$

Prior

• The probability distribution or density on the causes of data that encode beliefs about those causes prior to observing the data.

$p(H)$ or $p(w)$

Bayesian Surprise

• A measure of salience based on the divergence between the recognition and prior densities. It measures the information in the data that can be recognised.
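A minimal sketch of this measure, assuming a discrete set of causes and taking the recognition density to be the exact posterior (an assumption made here for simplicity; in general it is only an approximation). All names and numbers are illustrative.

```python
import math


def bayes_update(prior, likelihood):
    """Exact posterior over discrete causes, used here as the recognition density."""
    joint = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


def bayesian_surprise(prior, likelihood):
    """KL divergence (in nats) from the prior to the recognition density."""
    post = bayes_update(prior, likelihood)
    return sum(q * math.log(q / p) for q, p in zip(post, prior) if q > 0)


prior = [0.5, 0.5]
# A rare observation that is equally (un)likely under both causes: beliefs do not move.
rare_but_uninformative = bayesian_surprise(prior, [0.01, 0.01])
# An observation that discriminates between the causes: beliefs shift.
informative = bayesian_surprise(prior, [0.9, 0.1])
print(rare_but_uninformative, informative)
```

The first observation is improbable, so it has high Shannon surprise, yet it carries essentially no Bayesian surprise because it leaves the beliefs unchanged.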

Entropy

• The average surprise of outcomes sampled from a probability distribution or density. A density with low entropy means, on average, the outcome is relatively predictable.

$$S = -\int p(x) \ln p(x) \, dx$$
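To illustrate the link between low entropy and predictability, the sketch below compares a peaked and a flat distribution, using a discrete sum in place of the integral. The function name `entropy` and both distributions are assumptions for illustration.

```python
import math


def entropy(p):
    """Average surprise in nats of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


peaked = entropy([0.97, 0.01, 0.01, 0.01])  # one outcome dominates
flat = entropy([0.25, 0.25, 0.25, 0.25])    # all outcomes equally likely
print(f"peaked: {peaked:.3f} nats, flat: {flat:.3f} nats")
```

The flat distribution attains the maximum possible entropy for four outcomes, ln 4, while the peaked one is far more predictable.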

Ergodic

• A process is ergodic if its long-term time-average converges to its ensemble average. Ergodic processes that evolve for a long time forget their initial states.

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x) = \frac{1}{\mu(X)} \int f \, d\mu$$
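Ergodicity can be illustrated with a toy simulation. The sketch below assumes a two-state Markov chain with hypothetical switching probabilities (0.1 and 0.3), chosen only because its stationary (ensemble) average is easy to compute by hand; none of this comes from the slides.

```python
import random

P_0_TO_1 = 0.1  # assumed probability of switching from state 0 to state 1
P_1_TO_0 = 0.3  # assumed probability of switching from state 1 to state 0


def time_average(initial_state, steps, seed):
    """Fraction of time a single long trajectory spends in state 1."""
    rng = random.Random(seed)
    state = initial_state
    total = 0
    for _ in range(steps):
        if state == 0:
            if rng.random() < P_0_TO_1:
                state = 1
        elif rng.random() < P_1_TO_0:
            state = 0
        total += state
    return total / steps


# Ensemble average: stationary probability of being in state 1.
ensemble_average = P_0_TO_1 / (P_0_TO_1 + P_1_TO_0)

from_state_0 = time_average(0, 200_000, seed=1)
from_state_1 = time_average(1, 200_000, seed=2)
print(ensemble_average, from_state_0, from_state_1)
```

Both trajectories approach the same ensemble average regardless of where they started, which is the sense in which a long-running ergodic process forgets its initial state.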

Free-energy

• An information theory measure that bounds (is greater than) the surprise on sampling some data, given a generative model.

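$$F(\tilde{y}, \mu \mid m) = E - TS = -\langle \ln p(\tilde{y}, \vartheta \mid m) \rangle_q + \langle \ln q(\vartheta; \mu) \rangle_q$$

$$F = D(q(\vartheta; \mu) \,\|\, p(\vartheta \mid \tilde{y})) - \ln p(\tilde{y} \mid m)$$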
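The bound can be verified on a toy problem. The sketch below assumes a binary hidden cause with a hypothetical prior and likelihood, and evaluates free-energy as the expected log recognition density minus the expected log joint. It is an illustration of the inequality, not the scheme used in the paper.

```python
import math


def free_energy(q, joint):
    """F = <ln q(theta)>_q - <ln p(y, theta)>_q for a discrete hidden cause theta."""
    return sum(qi * (math.log(qi) - math.log(ji)) for qi, ji in zip(q, joint) if qi > 0)


# Hypothetical generative model for one observed datum y.
prior = [0.7, 0.3]        # p(theta)
likelihood = [0.2, 0.9]   # p(y | theta) for the observed y
joint = [l * p for l, p in zip(likelihood, prior)]  # p(y, theta)

evidence = sum(joint)                 # p(y)
surprise = -math.log(evidence)        # -ln p(y)
posterior = [j / evidence for j in joint]

f_guess = free_energy([0.5, 0.5], joint)     # arbitrary recognition density
f_posterior = free_energy(posterior, joint)  # recognition density equals posterior
print(surprise, f_guess, f_posterior)
```

With an arbitrary recognition density the free-energy exceeds the surprise by the divergence term; when the recognition density equals the posterior, the divergence vanishes and the bound is tight.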

Generalised Coordinates

• of motion cover the value of a variable, its motion, acceleration, jerk and higher orders of motion. A point in generalised coordinates corresponds to a path or trajectory over time.

$$\tilde{u} = u, u', u'', \ldots$$
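One way to see why a single point in generalised coordinates corresponds to a path is a Taylor expansion: the value and its temporal derivatives at one instant determine the nearby trajectory. The sketch below assumes the underlying signal is sin(t), chosen only because its derivatives at t = 0 are known exactly; the function name `extrapolate` is hypothetical.

```python
import math


def extrapolate(generalised_point, dt):
    """Taylor expansion: u(t + dt) is approximately sum_n u^(n)(t) * dt**n / n!."""
    return sum(d * dt**n / math.factorial(n) for n, d in enumerate(generalised_point))


# u(t) = sin(t) at t = 0: value, motion, acceleration, jerk, and two higher orders.
point = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]

predicted = extrapolate(point, 0.5)
actual = math.sin(0.5)
print(predicted, actual)
```

The more orders of motion the point carries, the longer the stretch of trajectory it pins down accurately.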

Gradient Descent

• An optimization scheme that finds a minimum of a function by changing its arguments in proportion to the negative of the gradient of the function at the current value.

$$w(t+1) = w(t) - \eta \frac{\partial E}{\partial w}$$
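A minimal sketch of this update, assuming a one-dimensional quadratic error E(w) = (w - 3)^2 and a fixed step size. The error function, the step size of 0.1 and the function name are all illustrative assumptions.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly move w against the gradient, in proportion to its size."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Assumed error function E(w) = (w - 3)**2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)
```

Each step shrinks the distance to the minimum at w = 3 by a constant factor, so the iterate converges geometrically for this choice of step size.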

Helmholtz Machine

• Devices or schemes that use a generative model to furnish a recognition density. They learn hidden structure in data by optimising the parameters of generative models.

Stochastic

• The successive states of stochastic processes are governed by random effects.

Free Energy

Dynamic Model of World and Recognition

Neuronal Architecture

What is the computational role of neuromodulation?

• Previous treatments suggest that modulatory neurotransmitters have distinct roles; for example, ‘dopamine signals the error in reward prediction, serotonin controls the time scale of reward prediction, noradrenalin controls the randomness in action selection, and acetylcholine controls the speed of memory update’. This contrasts with a single role in encoding precision. Can the apparently diverse functions of these neurotransmitters be understood in terms of one role (encoding precision) in different parts of the brain?

Can we entertain ambiguous percepts?

• Although not an integral part of the free-energy principle, we claim the brain uses unimodal recognition densities to represent one thing at a time. Although there is compelling evidence for bimodal ‘priors’ in sensorimotor learning, people usually assume the ‘recognition’ density collapses to a single percept when sensory information becomes available. The implicit challenge here is to find any electrophysiological or psychological evidence for multimodal recognition densities.

Does avoiding surprise suppress salient information?

• No; a careful analysis of visual search and attention suggests that: ‘only data observations which substantially affect the observer’s beliefs yield (Bayesian) surprise, irrespectively of how rare or informative in Shannon’s sense these observations are.’ This is consistent with active sampling of things we recognize (to reduce free-energy). However, it remains an interesting challenge to formally relate Bayesian surprise to the free-energy bound on (Shannon) surprise. A key issue here is whether saliency can be shown to depend on top-down perceptual expectations.

Which optimisation schemes does the brain use?

• We have assumed that the brain uses a deterministic gradient descent on free-energy to optimise action and perception. However, it might also use stochastic searches, sampling the sensorium randomly for a percept with low free-energy. Indeed, there is compelling evidence that our eye movements implement an optimal stochastic strategy. This raises interesting questions about the role of stochastic searches, from visual search to foraging, in both perception and action.
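The contrast between the two kinds of scheme can be sketched on a toy objective. The example below assumes a one-dimensional function with two minima as a stand-in for free-energy; the function, step size, search interval and sample count are all hypothetical, and nothing here is a claim about how the brain actually searches.

```python
import random


def objective(x):
    """Assumed stand-in for free-energy: a curve with a local and a global minimum."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, rate=0.01, steps=2000):
    """Deterministic scheme: follow the negative gradient from a starting point."""
    for _ in range(steps):
        x -= rate * gradient(x)
    return x


def random_search(low, high, samples=2000, seed=0):
    """Stochastic scheme: sample candidates at random and keep the best one."""
    rng = random.Random(seed)
    candidates = (rng.uniform(low, high) for _ in range(samples))
    return min(candidates, key=objective)


x_gd = gradient_descent(2.0)      # starts in the basin of the local minimum
x_rs = random_search(-2.0, 2.0)   # samples the whole interval
print(x_gd, objective(x_gd))
print(x_rs, objective(x_rs))
```

From this starting point the deterministic descent settles in the nearer, shallower minimum, whereas the random search lands near the deeper one. The reverse trade-off also holds: random sampling scales poorly with dimension, which is one reason the choice of scheme is an open question.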
