Transcript of Rouault sfn2014
INTEGRATION OF VALUES AND INFORMATION
IN DECISION-MAKING
Marion ROUAULT, Jan DRUGOWITSCH and Étienne KOECHLIN
Laboratoire de Neurosciences Cognitives INSERM U960,
Ecole Normale Supérieure, Paris
Neural bases of action outcome evaluation
Fronto-striatal loops
• Executive control of behavior relies on evaluation of action outcomes to adjust subsequent action
Atlas Yelnik and Bardinet
Striatum
Dopaminergic system: reward processing
Ventromedial prefrontal cortex
Action outcomes may convey two types of value signals:
- “Rewarding” value: valuation of the action outcome along an axis of subjective preferences
- “Informational” value: information transmitted by the action outcome about choice reliability (the probability that, in the current situation, the chosen action was the most appropriate)
Working hypothesis
Reinforcement learning: simple, rapid, phylogenetically old
Bayesian inference: sophisticated, but rapidly saturated
How are the rewarding and informational aspects of action outcomes processed? What are their neural and functional interactions?
Probabilistic reversal learning task
The correct state is rewarded 80% of the time, plus reversals
• States:
• Values: 2, 4, 6, 8, 10 € before decision; range 1–11 € after decision
• Minimal instructions
3 conditions manipulate values and information separately:
- RANDOM: values provide no information about the most frequently rewarded state
- CORRELATED: higher values are correlated with the most frequently rewarded state
- ANTI-CORRELATED: higher values are correlated with the less frequently rewarded state
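The task structure above can be sketched in code (a minimal simulation; the 80/20 reward schedule comes from the slide, while the reversal interval and the function names are illustrative assumptions):

```python
import random

def simulate_trial(correct_state, chosen_state, p_reward=0.8):
    """Reward the chosen target with probability 0.8 if it matches the
    correct state, 0.2 otherwise (the 80/20 schedule from the task)."""
    p = p_reward if chosen_state == correct_state else 1 - p_reward
    return random.random() < p

def generate_states(n_trials=100, reversal_every=(15, 25)):
    """Yield the correct state per trial, reversing between states 0 and 1
    at random intervals (the interval range is an illustrative assumption)."""
    state = 0
    t_next = random.randint(*reversal_every)
    for t in range(n_trials):
        if t == t_next:
            state = 1 - state
            t_next += random.randint(*reversal_every)
        yield state
```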
[Figure: per-state reward probabilities (80% / 20%) in the three conditions: CORRELATED, RANDOM, ANTI-CORRELATED]
Behavior
Subjects favor accuracy, “being correct”, over simply maximizing reward
[Figure, 22 subjects: % choice of the target with best expected value and % choice of the most frequently rewarded target, by trial number after contingency reversal, in the ANTI-CORRELATED, CORRELATED and RANDOM conditions]
Which variables contribute to choice? Logistic regressions
[Figure: contribution to choice (beta weight) of the regressors p, r1, r2, EV1, EV2, x_{t-1}, in the CORRELATED, ANTI-CORRELATED and RANDOM conditions]
• Differential processing of rewards depending on the experimental condition: informational value
• No computation of expected value
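The kind of regression reported here can be sketched as follows (a minimal hand-rolled logistic fit on synthetic data; the regressor layout mirrors the slide's p, r1, r2, but all data, coefficients and names are illustrative, not the study's):

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Plain gradient-ascent logistic regression; returns beta weights,
    i.e. each regressor's contribution to choice (first entry: intercept)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += lr * Xb.T @ (y - p) / len(y)
    return beta

# Synthetic choices driven mostly by the first regressor (think: p),
# weakly by the second (r1), not at all by the third (r2).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_logits = 2.0 * X[:, 0] + 0.2 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)
betas = fit_logistic(X, y)
```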
Choice models
• Optimal choice would be a rational combination of probabilities and rewards: Probability × Reward
• However, people's behavior is usually suboptimal
• To explain this suboptimality, it is assumed that subjects have distortions in their representations of probabilities and rewards
Kahneman and Tversky 1979; Zhang and Maloney 2012
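A common form of probability distortion from this literature is the linear-in-log-odds weighting function (the family used by Zhang and Maloney 2012); the sketch below uses illustrative parameter values, not the study's fits:

```python
import math

def distort_probability(p, gamma=0.6, p0=0.37):
    """Linear-in-log-odds probability distortion: the distorted log odds
    are a linear function of the true log odds. gamma < 1 overweights
    small and underweights large probabilities; gamma and the crossover
    point p0 here are illustrative, not fitted values."""
    log_odds = math.log(p / (1.0 - p))
    distorted = gamma * log_odds + (1.0 - gamma) * math.log(p0 / (1.0 - p0))
    return 1.0 / (1.0 + math.exp(-distorted))
```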
Distortions model (1000 simulations)
[Figure: SUBJECTS vs DISTORTIONS MODEL — % choice of the target with best expected value and % choice of the most frequently rewarded target, by trial number after contingency reversal, in the ANTI-CORRELATED, CORRELATED and RANDOM conditions]
Mixed model: integration of 2 concurrent systems for decision-making
Particularity of the protocol: the possible rewards to gain are presented before choice.
RL:
- Learning: Q_{t+1} = Q_t + α (R_t − Q_t)
- Revision of Qs before choice: (1 − w) Q_t + w R_t, with w biasing current expected returns
Bayesian inference:
- Revision of beliefs before choice given the reward distributions
Choice over the combination of beliefs and reinforcement: 0.75 Belief_Bay + 0.25 Q_RL
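These updates can be written down directly (a minimal sketch; the learning rate α and revision weight w are illustrative, and the Bayesian belief is taken as a given input rather than computed):

```python
def rl_update(q, reward, alpha=0.3):
    """Delta-rule learning: Q_{t+1} = Q_t + alpha * (R_t - Q_t).
    alpha is an illustrative learning rate, not a fitted value."""
    return q + alpha * (reward - q)

def revise_q(q, presented_reward, w=0.4):
    """Pre-choice revision given the rewards shown before choice:
    (1 - w) * Q_t + w * R_t, with w biasing current expected returns.
    w is illustrative."""
    return (1.0 - w) * q + w * presented_reward

def choice_value(belief_bayes, q_rl):
    """Quantity driving choice on the slide: 0.75 * Belief_Bay + 0.25 * Q_RL
    (assumes both inputs are on comparable scales)."""
    return 0.75 * belief_bayes + 0.25 * q_rl
```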
Mixed model (1000 simulations)
[Figure: SUBJECTS vs MIXED MODEL — % choice of the target with best expected value and % choice of the most frequently rewarded target, by trial number after contingency reversal, in the ANTI-CORRELATED, CORRELATED and RANDOM conditions]
Model comparison
Relative gain over a Bayesian model solely monitoring beliefs (LLH, BIC, AIC), DISTORTIONS vs MIXED models
[Figure: significance levels p = .057, p < .005 and p < .05 across the three criteria]
Distortions might be better explained by a mixed model integrating two systems for decision-making
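The comparison relies on the standard information criteria, which penalize log-likelihood by model complexity (lower is better); the formulas below are textbook definitions, and any numbers used with them are illustrative:

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 LLH (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 LLH (lower is better);
    penalizes parameters more heavily than AIC once n exceeds ~7."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```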
Mixed model without informational value
[Figure: SUBJECTS vs MIXED MODEL WITHOUT INFORMATIONAL VALUE — % choice of the target with best expected value and % choice of the most frequently rewarded target, by trial number after contingency reversal, in the CORRELATED, RANDOM and ANTI-CORRELATED conditions]
Informational value processing
p < .005 unc., c > 10 voxels, z = 40.
Small but significant positive correlation with informational value within dlPFC regions
Redo the extraction of betas in GLM36
p < 0.005 unc. c > 10.
linear
quadratic
Neuroimaging results: Belief system / RL system
Neuroimaging results
Belief system: RL system:
Neural activations are coherent with a mixed model involving two systems for decision-making
Summary
• The product of the distortions is actually explained by an integration of two systems for decision-making
• Rewarding value: network involving
• Informational value: network involving dlPFC,
Reinforcement learning: simple, rapid, phylogenetically old
Bayesian inference: sophisticated, but rapidly saturated
Acknowledgments
Frontal lobe functions team
Choice given reward presented
[Figure: % choice of a reward amount (euros) when presented, in the CORRELATED, RANDOM and ANTI-CORRELATED conditions]
How often is the 10 € option chosen over the 4 € option, independently of the belief about the current state?
[Axis: presented reward 2, 4, 6, 8, 10 €]
A residual effect of rewarding value remains visible in the RANDOM condition
Choices are instead mostly related to states
Reinforcement learning model
Computations associated with the RL model: Q_{t+1} = Q_t + α (R_t − Q_t); pre-choice revision (1 − w) Q_t + w R_t, with w biasing current expected returns
Action selection:
Generative model of the task: z = state of the world (not observed)
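A belief update consistent with this generative model (hidden state z, 80/20 reward contingency, occasional reversals) can be sketched as follows; the hazard rate and the function name are illustrative assumptions:

```python
def update_belief(belief, chosen, rewarded, p_reward=0.8, p_reversal=0.1):
    """Bayes-optimal update of P(z = 0), the belief that target 0 is the
    currently correct state, after observing one trial's outcome.
    p_reversal (the hazard rate) is an illustrative assumption."""
    # Likelihood of the observed outcome under each hidden state
    like0 = p_reward if (chosen == 0) == rewarded else 1.0 - p_reward
    like1 = p_reward if (chosen == 1) == rewarded else 1.0 - p_reward
    posterior = like0 * belief / (like0 * belief + like1 * (1.0 - belief))
    # Propagate through a possible reversal before the next trial
    return (1.0 - p_reversal) * posterior + p_reversal * (1.0 - posterior)
```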
Which variables contribute to choice? Logistic regressions
[Figure: contribution to choice (beta weight) of the regressors p, r1, r2, x_{t-1}, per condition, for the DISTORTIONS model (22 subjects) and the MIXED model (19 subjects)]