
2012 Course:

The Statistician Brain: The Bayesian Revolution in Cognitive Sciences

Stanislas Dehaene

Chair of Experimental Cognitive Psychology

Lecture n°6

The neuronal implementation of Bayesian mechanisms

Lecture material translated from the French version by CG Traduction & Interprétation

What are the characteristics of a "Bayesian" decision?

Bayesian Inference:
• All sensory cues provide probabilistic evidence which constrains the interpretation of the physical world.
• These fragments of evidence combine with each other according to Bayes' rule…
• … and with the priors we may have on the state of the world.

Gain Function:
• Motor tasks apply a gain or loss function.
• We try to choose the action which maximizes expected gain.
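As a minimal sketch of this decision rule (an illustration added here; the states, actions, and payoff values are invented), the following Python snippet chooses the action that maximizes expected gain under a discrete posterior over world states:

```python
import numpy as np

# Hypothetical posterior over three world states, and an invented gain
# matrix: gain[a, s] = payoff of action a if the true state is s.
p_state = np.array([0.2, 0.5, 0.3])
gain = np.array([
    [10.0, -5.0, -5.0],   # action 0: aim left
    [-5.0, 10.0, -5.0],   # action 1: aim center
    [-5.0, -5.0, 10.0],   # action 2: aim right
])

expected_gain = gain @ p_state          # expected gain of each action
best = int(np.argmax(expected_gain))
print(expected_gain, "-> choose action", best)
```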

Do our actions reflect Bayes-optimal integration?

Körding & Wolpert (2004): subjects attempt to reach a target while their finger is laterally shifted by an unknown distance.
Two sources of movement uncertainty are introduced:
- The starting point is randomly shifted (mean = 1 cm, standard deviation = 0.5 cm).
- Midway through the movement, subjects are given feedback on the position of their finger, which can be precise, uncertain, or absent.
Kording, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427(6971), 244-247.

Do our actions reflect Bayes-optimal integration? What does the Bayesian model predict?
- The a priori distribution of shifts is a Gaussian centered on +1 cm.
- Example of feedback received in the case of an initial shift of 2 cm: Bayes' rule predicts the combination of the two distributions according to the product rule. Depending on the level of uncertainty of the midpoint feedback, subjects should compensate only in part for the perceived position.

Kording, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427(6971), 244-247.

[Figure: a priori distribution, feedback likelihood, and a posteriori distribution]
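The following Python sketch (an added illustration; only the prior parameters come from the experiment, the feedback values and uncertainty levels are invented) applies this product rule numerically:

```python
import numpy as np

shift = np.linspace(-1, 3, 401)    # grid over possible lateral shifts (cm)

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Prior on the shift: mean 1 cm, SD 0.5 cm (the experiment's distribution).
prior = gaussian(shift, 1.0, 0.5)

# Feedback likelihood for a true shift of 2 cm; sigma_fb (invented values)
# models the sensory uncertainty: small = precise, large = blurred feedback.
for sigma_fb in (0.1, 0.5, 2.0):
    likelihood = gaussian(shift, 2.0, sigma_fb)
    posterior = prior * likelihood             # Bayes' product rule
    posterior /= posterior.sum()
    estimate = shift[np.argmax(posterior)]     # MAP estimate of the shift
    print(f"sigma_fb = {sigma_fb}: estimated shift ≈ {estimate:.2f} cm")
# Precise feedback pulls the estimate toward 2 cm; vague feedback leaves it
# near the prior mean of 1 cm, so compensation is only partial.
```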

Do our actions reflect Bayes-optimal integration?

Predictions of the Bayesian model:
- When perception is precise, subjects compensate adequately for the initial lateral shift, and the final deviation from the target is minimal (purple curve).
- As sensory uncertainty grows, subjects take the information on the midpoint less and less into account, and fall back on the prior (1 cm of deviation on average).

[Figure: final deviation from the target as a function of the true lateral shift]

Two other possible models:
- Full compensation: subjects use all the available information on the midpoint. The response is accurate on average, but the mean error increases.
- Learning from the final position: since it is given only in the "precise" condition, subjects should behave in the same way whatever the level of uncertainty.

[Figure: predictions of the full-compensation model and of the learning model]


The experimental results match the predictions of the Bayesian model.

Do our actions reflect Bayes-optimal integration?

Experiment 2: a non-Gaussian a priori distribution. The optimal response is harder to compute: it depends non-linearly on the perceived position. This is precisely what was observed. The brain may thus be able to carry out Bayesian computations with full distributions, not just their means (?)

[Figure: one subject vs. the group]
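To see why a non-Gaussian prior makes the optimal response non-linear, here is an added sketch with an invented bimodal prior (a mixture of two Gaussians, not the distribution used in the experiment):

```python
import numpy as np

shift = np.linspace(-2, 4, 601)    # grid over possible shifts (cm)

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Invented bimodal prior: a mixture of two Gaussians.
prior = gaussian(shift, 0.0, 0.3) + gaussian(shift, 2.0, 0.3)

# Posterior-mean estimate as a function of the perceived (fed-back) position.
for perceived in (0.0, 0.5, 1.0, 1.5, 2.0):
    likelihood = gaussian(shift, perceived, 0.6)   # fixed sensory noise
    posterior = prior * likelihood
    posterior /= posterior.sum()
    estimate = np.dot(shift, posterior)
    print(f"perceived = {perceived:.1f} cm -> estimate = {estimate:.2f} cm")
# The estimate is drawn toward the nearer prior mode, so it varies
# non-linearly with the perceived position.
```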


All decisions imply Bayesian inference: Evidence accumulated during a single trial

[Figure: sequential-stage model of a single trial, with a perceptual stage (P), a central integration stage (C), and a motor stage (M) between stimulus and response, together with the resulting RT distribution]

• Many Bayesian problems can therefore be reduced to an internal random walk:
- each sample shifts the accumulated evidence positively or negatively in favor of the different response options;
- optimal decisions can be made when a threshold on the total accumulated evidence is reached.

• If there are several successive and independent samples X1, X2, X3…, then Bayes' theorem gives an easy way to accumulate these data:

P(R | X1, X2, X3, …) ∝ P(R) P(X1 | R) P(X2 | R) P(X3 | R) …

• Taking logarithms, the evidence provided by each sample simply adds up (see the sketch below).

• Even the simplest decision (such as: are the dots moving to the left or to the right?) may require the combination of several samples according to Bayes' rule.

• In fact, the occurrence of fluctuations (in the stimulus or in the nervous system) makes decisions ambiguous, and requires computing the most plausible response.
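A minimal sketch of such a random walk (an added illustration; the means, noise level, and decision bound are invented): each sample adds its log-likelihood ratio to a running total, and the decision is taken when the total crosses the bound, as in the sequential probability ratio test:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypotheses about each noisy sample: R1 -> mean +0.1, R2 -> mean -0.1.
mu, sigma = 0.1, 1.0
threshold = 3.0                    # invented bound on the log-likelihood ratio

def norm_logpdf(x, m, s):
    return -0.5 * np.log(2 * np.pi * s**2) - 0.5 * ((x - m) / s) ** 2

log_lr = 0.0                       # accumulated log P(x | R1) / P(x | R2)
n = 0
while abs(log_lr) < threshold:     # random walk until a bound is crossed
    x = rng.normal(mu, sigma)      # the true state here is R1
    log_lr += norm_logpdf(x, mu, sigma) - norm_logpdf(x, -mu, sigma)
    n += 1

print("decision:", "R1" if log_lr > 0 else "R2", "after", n, "samples")
```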

The monkey must decide in which direction the dots are moving. Motion coherence modulates the firing rate of neural responses during the decision-making process.

[Figure: evidence accumulates at a variable speed toward a fixed decision threshold]

Prefrontal and parietal neurons compile statistics relevant to decision-making

Kim & Shadlen, Nature Neuroscience 1999

Yang, T., & Shadlen, M. N. (2007). Probabilistic reasoning by neurons. Nature, 447(7148), 1075-1080.

Do neurons in the LIP area of the cortex compute a genuine Bayesian inference? Can they combine several sources of probabilistic information?

Task inspired by the human "weather prediction" task. On each trial, the monkey is shown 4 shapes sequentially, selected from a set of 10 possible shapes. The monkey then makes a saccadic eye movement toward the red or the green target. Reinforcement is not certain, but is delivered according to a probabilistic rule that depends on the sum of the cues provided by the shapes.

The evidence (weight of evidence [WOE], or log posterior odds) is equal to the sum of the weights:

WOE = log10 [ P(red | shapes) / P(green | shapes) ] = w1 + w2 + w3 + w4

The number of shape combinations (10^4 permutations, 715 combinations) encourages the inference of a general rule of behavior.

Yang, T., & Shadlen, M. N. (2007). Probabilistic reasoning by neurons. Nature, 447(7148), 1075-1080.


After several thousand training trials, the monkeys respond in a probabilistic and regular manner.

Their response rate is a sigmoid function of the objective evidence (WOE).

It can be modeled by logistic regression.

The subjective weights assigned to each shape are closely correlated with the objective weights (see the sketch below).
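The sketch below (an added illustration with invented weights, not the monkeys' data) reproduces this logic: choices are simulated from a sigmoid of the summed weights, and logistic regression recovers "subjective" weights that correlate almost perfectly with the objective ones:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented objective weights for the 10 shapes (log10 likelihood ratios for "red").
true_w = np.linspace(-0.9, 0.9, 10)

# Simulate trials: 4 shapes per trial; choice probability is a sigmoid of the WOE.
n_trials = 20000
X = np.zeros((n_trials, 10))           # X[t, i] = how often shape i appeared
for t in range(n_trials):
    for shape in rng.integers(0, 10, size=4):
        X[t, shape] += 1
woe = X @ true_w                       # weight of evidence, in log10 odds
p_red = 1 / (1 + 10.0 ** (-woe))
y = rng.random(n_trials) < p_red       # simulated choices of the red target

# Recover "subjective" weights by logistic regression (simple gradient ascent).
w = np.zeros(10)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w)))
    w += 1.0 / n_trials * X.T @ (y - p)   # gradient of the mean log-likelihood

# w converges to ln(10) * true_w; the correlation should be close to 1.
print("correlation:", np.corrcoef(true_w, w)[0, 1])
```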

When neurons implement probabilistic reasoning

Yang, T., & Shadlen, M. N. (2007). Probabilistic reasoning by neurons. Nature, 447(7148), 1075-1080.

Recording of neurons in area LIP whose receptive field includes one of the targets of the saccade.

Assumption: the neural firing rate reflects the evidence accumulated at each moment: after n shapes it is approximately equal to the sum of the n cues received (exactly for n = 4). Indeed:

1. Neural firing reflects whether the selected target is inside the receptive field or not.

When neurons implement probabilistic reasoning

Yang, T., & Shadlen, M. N. (2007). Probabilistic reasoning by neurons. Nature, 447(7148), 1075-1080.

2. Neural firing tracks the value of the evidence at each moment (divided into quintiles).

When neurons implement probabilistic reasoning

3. The firing rate of this neuron varies linearly as a function of the evidence.

Yang, T., & Shadlen, M. N. (2007). Probabilistic reasoning by neurons. Nature, 447(7148), 1075-1080.

When neurons implement probabilistic reasoning

Movies of neural activity during several trials

Yang, T., & Shadlen, M. N. (2007). Probabilistic reasoning by neurons. Nature, 447(7148), 1075-1080.


When neurons implement probabilistic reasoning

More in-depth analyses demonstrate that:
- each animal has learned a subjective weight for each shape, deviating slightly from the true value;
- identifying these subjective weights improves the prediction of the final choice;
- and the predictions are further improved by the fluctuations in the neural firing rate: "a variation of 1 spike per second from a single neuron was equivalent to 0.1 ban of evidence."

Conclusions:
- Simple decisions are indeed made on the basis of accumulated evidence.
- The firing rate of LIP neurons approximates the "random walk" process postulated in many decision-making models (see Ratcliff).
- Neural firing rates appear to be proportional to the log likelihood ratio.
- The results do not exclude simpler or complementary interpretations:
  - coding of expected reinforcement;
  - "naïve" addition of the probabilities that each shape leads to reinforcement.

How do neurons code and handle probability distributions?

In primates (and certainly in other species), neuronal circuits must enable:
1. The representation of several probability distributions.
2. The computation, on the basis of these distributions, of:
- the product of two distributions, p(H | t, v) ∝ p(H | t) p(H | v),
- or, equivalently, the addition of their logarithms,
- between sensory modalities or over time.
3. The incorporation of a prior: p(H | D) ∝ p(D | H) p(H).
4. The identification of the maximum a posteriori (MAP) of the distribution (see the sketch below).
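All four operations are easy to state on discretized distributions. The following added sketch (with invented Gaussian distributions) computes the product of two cues as a sum of logarithms, incorporates a prior, and reads out the MAP:

```python
import numpy as np

# Discretized distributions over a one-dimensional hypothesis space H (invented).
h = np.linspace(-45, 45, 181)

def gaussian(x, mu, sigma):
    g = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return g / g.sum()

p_h_given_t = gaussian(h, 10, 8)    # evidence from touch (invented numbers)
p_h_given_v = gaussian(h, 0, 5)     # evidence from vision (invented numbers)
prior = gaussian(h, -5, 20)         # prior on the state of the world (invented)

# 2. Product of the two distributions, computed as a sum of logarithms,
# 3. with the prior incorporated as one more additive term.
log_post = np.log(p_h_given_t) + np.log(p_h_given_v) + np.log(prior)
post = np.exp(log_post - log_post.max())
post /= post.sum()

# 4. Identification of the maximum a posteriori (MAP).
print("MAP estimate:", h[np.argmax(post)])
```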

Probability coding by a population of neurons

Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., et al. (2008). Probabilistic population codes for Bayesian decision making. Neuron, 60(6), 1142-1152.

An external stimulus (such as the speed of a group of dots) is represented by a vector of firing rates:

r = {r1, r2, …, rn}

[Figure: average activity, and activity during a given trial, as a function of each neuron's preferred stimulus (tuning curves over stimuli from −45° to +45°)]

Poisson’s law applied to the cortex

[Figure: spike trains of the same neuron across four trials]

For an isolated neuron, the fluctuations in firing rate follow a "Poisson-like" law (belonging to the "exponential family" of distributions): p(r | s), where r is the number of action potentials observed over a given time interval, follows an asymmetrical bell-shaped curve whose variance is proportional to its mean. The proportionality factor, known as the Fano factor, varies between 0.3 and 1.8.

Courtesy of Alex Pouget
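A quick simulation of this variance-to-mean relationship (an added illustration): for ideal Poisson spike counts the Fano factor equals 1 by construction, which lies within the observed 0.3 to 1.8 range:

```python
import numpy as np

rng = np.random.default_rng(2)

# Spike counts of one simulated neuron over many trials, at several mean rates.
for rate in (2.0, 10.0, 40.0):
    counts = rng.poisson(rate, size=100000)
    fano = counts.var() / counts.mean()   # Fano factor = variance / mean
    print(f"rate = {rate}: mean = {counts.mean():.2f}, Fano = {fano:.2f}")
# For an ideal Poisson process the Fano factor is 1, within the 0.3-1.8
# range observed in cortex.
```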

Probabilistic neuronal population codes

[Figure: the population activity r recorded on a given trial (one point per neuron, plotted at its preferred stimulus) is passed through a Bayesian decoder, p(s | r) ∝ p(r | s), yielding a probability distribution over the inferred stimulus]

Assumption: via Bayes' rule, the firing rates of a neuronal population constitute a representation of a probability distribution over the space of stimuli.

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nat Neurosci, 9(11), 1432-1438.

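Here is an added sketch of such a decoder (the Gaussian tuning curves, gain, and grid are invented): independent Poisson neurons are decoded by Bayes' rule into a full posterior over the stimulus:

```python
import numpy as np

rng = np.random.default_rng(3)
prefs = np.linspace(-45, 45, 61)       # preferred stimuli of the population
s_grid = np.linspace(-45, 45, 181)     # hypotheses about the stimulus

def tuning(s, gain=50.0, width=15.0):
    """Mean firing rates of the whole population for stimulus s."""
    return gain * (np.exp(-0.5 * ((prefs - s) / width) ** 2) + 0.02)

# One trial: Poisson spike counts evoked by a true stimulus of +10.
r = rng.poisson(tuning(10.0))

# Bayesian decoder for independent Poisson neurons (flat prior):
# log p(s | r) = sum_i [ r_i log f_i(s) - f_i(s) ] + constant
log_post = np.array([np.sum(r * np.log(tuning(s)) - tuning(s)) for s in s_grid])
post = np.exp(log_post - log_post.max())
post /= post.sum()

print("decoded stimulus:", s_grid[np.argmax(post)])   # close to +10
```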

Probabilistic neuronal population codes

[Figure: the same Bayesian decoder applied to population activity with reduced gain (low certainty: broad p(s | r)) and with high gain (high certainty: narrow p(s | r))]

The gain (or intensity) of neural firing rates automatically encodes the width of the Gaussian curve that plots the plausibility of stimuli: g is proportional to 1/σ²

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nat Neurosci, 9(11), 1432-1438.
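The following added check (reusing the invented Gaussian-tuning population from the sketch above) verifies this relation numerically: multiplying all firing rates by a gain g shrinks the decoded posterior's standard deviation roughly as 1/√g, i.e. σ² ∝ 1/g:

```python
import numpy as np

rng = np.random.default_rng(4)
prefs = np.linspace(-45, 45, 61)
s_grid = np.linspace(-45, 45, 181)

def tuning(s, gain, width=15.0):
    # The whole tuning curve scales multiplicatively with the gain.
    return gain * (np.exp(-0.5 * ((prefs - s) / width) ** 2) + 0.02)

def posterior_sd(gain, n_rep=200):
    """Average width of the decoded posterior at a given gain."""
    sds = []
    for _ in range(n_rep):
        r = rng.poisson(tuning(10.0, gain))
        log_post = np.array([np.sum(r * np.log(tuning(s, gain)) - tuning(s, gain))
                             for s in s_grid])
        p = np.exp(log_post - log_post.max())
        p /= p.sum()
        m = np.dot(s_grid, p)
        sds.append(np.sqrt(np.dot((s_grid - m) ** 2, p)))
    return np.mean(sds)

for g in (10, 40, 160):
    print(f"gain={g}: posterior SD ≈ {posterior_sd(g):.2f}")
# Each 4-fold increase in gain roughly halves the SD: sigma^2 ∝ 1/g.
```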

Automatic combination of two cues

[Figure: adding the population activity evoked by vision (gain g1) and by touch (gain g2) yields a population pattern with gain g = g1 + g2]

Since g ∝ 1/σ², the combined gain g = g1 + g2 corresponds to 1/σ² = 1/σ1² + 1/σ2²: the Bayes-optimal combination of the two cues.

If neural firing rate variability is part of the exponential family, then the arithmetic sum of the firing rates of two neuronal populations can automatically compute the product of the distributions they represent.

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nat Neurosci, 9(11), 1432-1438.
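An added numerical sketch of this sum-equals-product property (again with an invented Poisson population whose tuning curves scale multiplicatively with gain): decoding the summed activity r1 + r2 gives the same posterior as multiplying the posteriors decoded from r1 and r2 separately:

```python
import numpy as np

rng = np.random.default_rng(5)
prefs = np.linspace(-60, 60, 121)      # dense tiling of preferred stimuli
s_grid = np.linspace(-45, 45, 181)

def tuning(s, gain, width=15.0):
    # Multiplicative gain: tuning(s, g1) + tuning(s, g2) = tuning(s, g1 + g2)
    return gain * (np.exp(-0.5 * ((prefs - s) / width) ** 2) + 0.02)

def decode(r, gain):
    """Posterior over s for Poisson neurons with the given gain (flat prior)."""
    log_post = np.array([np.sum(r * np.log(tuning(s, gain)) - tuning(s, gain))
                         for s in s_grid])
    p = np.exp(log_post - log_post.max())
    return p / p.sum()

# Two cues about the same stimulus (+5) with different reliabilities (gains).
r_vision = rng.poisson(tuning(5.0, 40.0))
r_touch = rng.poisson(tuning(5.0, 10.0))

# Product of the two posteriors decoded separately...
prod = decode(r_vision, 40.0) * decode(r_touch, 10.0)
prod /= prod.sum()

# ...versus direct decoding of the summed activity (combined gain g1 + g2).
combined = decode(r_vision + r_touch, 50.0)

print("max difference:", np.abs(prod - combined).max())   # essentially zero
```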

Evidence Accumulation and Selection of the Optimal Response Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., et al. (2008). Probabilistic population codes for Bayesian decision making. Neuron, 60(6), 1142-1152.

MT provides an instantaneous estimation of motion, with a firing rate that is contrast-dependent.

We would like the LIP area to accumulate this evidence and code the probability distribution over the possible motions.

We would like the superior colliculus to only represent the maximum of this distribution.

An Additional Difficulty: Marginalization

The firing of MT neurons depends on the direction of motion (s) as well as on "nuisance" parameters such as contrast (c). We would like to compute the a posteriori probability of the motion, given the contrast and the firing rates observed. Better yet, we want the distribution of s in which the nuisance variable c has been marginalized out. This would be ideal if other brain areas are to use MT signals as a motion indicator, without having to factor in the level of contrast c.

Beck et al. observe that all these issues are solved when the distribution p(r | s, c) belongs to the exponential family. Poisson's law, characteristic of the neural discharges observed experimentally, belongs to this family.

Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., et al. (2008). Probabilistic population codes for Bayesian decision making. Neuron, 60(6), 1142-1152.

An Additional Difficulty: Marginalization

In this case, the dependency on contrast disappears and the posterior takes the form p(s | r) ∝ exp(h(s)·r), where the constant of proportionality depends on MT activity, but not on contrast c.

For the LIP area to represent the inferred distribution of the motion, given the samples provided in succession by MT, it suffices that LIP activity accumulate MT activity over time: rLIP(t) ≈ Σt'≤t rMT(t').

To prevent the saturation implied by this equation, it is possible to use global recurrent inhibition, which constantly subtracts the total activity of the LIP area (without loss of information).

And to select the global maximum, it suffices that the superior colliculus (1) receives no information before the threshold is crossed, (2) is initialized with the LIP firing rates at that moment, and (3) relaxes via lateral connections carefully chosen to implement a winner-take-all (a toy sketch of the whole pipeline follows below).
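Here is that toy sketch (an added illustration; all rates, thresholds, and constants are invented, and the dynamics are greatly simplified): MT-like Poisson samples are integrated by an LIP-like population under global subtractive inhibition, and once a unit crosses the bound, a collicular winner-take-all selects the maximum:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 61                                   # units tiling the possible motions
true_idx = 40                            # index of the true motion (invented)

# Mean MT rates: baseline plus a bump around the true motion.
mean_mt = 0.5 + 4.0 * np.exp(-0.5 * ((np.arange(n) - true_idx) / 6.0) ** 2)

lip = np.zeros(n)
for t in range(200):                     # LIP integrates successive MT samples
    lip += rng.poisson(mean_mt)
    lip -= lip.mean()                    # global recurrent inhibition
    if lip.max() > 150.0:                # invented decision threshold
        break

# Superior colliculus: winner-take-all relaxation initialized with LIP rates.
sc = lip - lip.min()
for _ in range(100):
    sc += 0.2 * (sc - sc.mean())         # self-excitation + lateral inhibition
    sc = np.clip(sc, 0.0, None)

print("chosen motion index:", int(np.argmax(sc)), "after", t + 1, "samples")
```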

Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., et al. (2008). Probabilistic population codes for Bayesian decision making. Neuron, 60(6), 1142-1152.

Simulation of the Proposed Network Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., et al. (2008). Probabilistic population codes for Bayesian decision making. Neuron, 60(6), 1142-1152.

[Figure: simulated firing rates compared with actual recordings]

Conclusion: Points to Remember Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., et al. (2008). Probabilistic population codes for Bayesian decision making. Neuron, 60(6), 1142-1152.

The type of neural code and network proposed by Alex Pouget and colleagues computes exactly, and at every moment, the a posteriori probability distribution of the stimulus:
- even if the contrast varies constantly,
- without needing to know or estimate this contrast,
- without knowing how much time has elapsed since the start of the trial.

Moreno-Bote et al. (PNAS, 2011) show that similar principles apply to sampling when perception is multi-stable.

Xiao-Jing Wang and colleagues propose networks that are more realistic at the neurobiological level and that approximate a similar function.

This holds only if the variability of the firing rates r of a neuronal population follows a stimulus-dependent distribution belonging to the "exponential family".

Appendix: How to Model Bayesian sampling? Moreno-Bote, R., Knill, D. C., & Pouget, A. (2011). Bayesian sampling in visual perception. Proc Natl Acad Sci U S A, 108(30), 12491-12496.

For the model to sample the a posteriori distribution according to Bayes' rule, it is necessary and sufficient that the variability of the input be related to the stimulus s by a distribution of the form:

p(r | s) = φ(r) exp(h(s)·r)

where h(s) is a function of the stimulus s that depends only on the tuning curves of the neurons and on the covariance matrix of their firing rates. This family of distributions is an excellent approximation of actual neural firing (see Ma et al., Nat Neurosci 2006).
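A short derivation (added here for clarity, not part of the slides) of why this form is so convenient: with a flat prior, Bayes' rule gives

p(s | r) ∝ p(r | s) ∝ exp(h(s)·r), hence log p(s | r) = h(s)·r + constant.

The log posterior is linear in the firing rates, so for two populations r1 and r2 sharing the same h(s), exp(h(s)·(r1 + r2)) ∝ p(s | r1) p(s | r2): adding the firing rates multiplies the posteriors, which is exactly the cue combination described above.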

Appendix: How to Model Bayesian sampling? Moreno-Bote, R., Knill, D. C., & Pouget, A. (2011). Bayesian sampling in visual perception. Proc Natl Acad Sci U S A, 108(30), 12491-12496.

The exponential family of distributions has two advantages:
- it enables Bayesian sampling,
- as well as the Bayes-optimal combination of two cues.

Alex Pouget's assumption (Ma, Beck, Latham & Pouget, Nature Neuroscience, 2006): the variability of neural firing rates automatically codes for probability distributions. Furthermore, this variability is in the exact format required to simplify the calculations:
- the addition of two neuronal populations corresponds to the computation of the product of their probability distributions;
- the competition between populations (winner-take-all) helps to select, approximately, the maximum-probability stimulus (Denève et al., Nature Neuroscience, 1999).