Transcript of Rouault sfn2014
INTEGRATION OF VALUES AND INFORMATION
IN DECISION-MAKING
Marion ROUAULT, Jan DRUGOWITSCH and Étienne KOECHLIN
Laboratoire de Neurosciences Cognitives INSERM U960,
Ecole Normale Supérieure, Paris
Neural bases of action outcome evaluation
Fronto-striatal loops
• Executive control of behavior relies on evaluation of action outcomes to adjust subsequent action
Atlas Yelnik and Bardinet
Striatum
Dopaminergic system: reward processing
Ventromedial prefrontal cortex
Action outcomes may convey two types of value signals:
- “Rewarding” value: valuation of the action outcome along an axis of subjective preferences
- “Informational” value: information transmitted by the action outcome about choice reliability (the probability that, in the current situation, the chosen action was the most appropriate)
Working hypothesis
Reinforcement learning: simple, rapid, phylogenetically old
Bayesian inference: sophisticated, but rapidly saturated
How are the rewarding and informational aspects of action outcomes processed? What are their neural and functional interactions?
Probabilistic reversal learning task
The correct state is rewarded 80% of the time, plus reversals
• States:
• Values: 2, 4, 6, 8, 10 € before decision; range 1–11 € after decision
• Minimal instructions
3 conditions manipulate values and information separately:
- RANDOM: values provide no information about the most frequently rewarded state
- CORRELATED: higher values are correlated with the most frequently rewarded state
- ANTI-CORRELATED: higher values are correlated with the less frequently rewarded state
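The task structure above can be sketched in code (a minimal simulation; the 80/20 reward schedule comes from the slide, while the reversal interval and the function names are illustrative assumptions):

```python
import random

def simulate_trial(correct_state, chosen_state, p_reward=0.8):
    """Reward the chosen target with probability 0.8 if it matches the
    correct state, 0.2 otherwise (the 80/20 schedule from the task)."""
    p = p_reward if chosen_state == correct_state else 1 - p_reward
    return random.random() < p

def generate_states(n_trials=100, reversal_every=(15, 25)):
    """Yield the correct state per trial, reversing between states 0 and 1
    at random intervals (the interval range is an illustrative assumption)."""
    state = 0
    t_next = random.randint(*reversal_every)
    for t in range(n_trials):
        if t == t_next:
            state = 1 - state
            t_next += random.randint(*reversal_every)
        yield state
```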
[Figure: per-state reward probabilities (80% / 20%) in the three conditions: CORRELATED, RANDOM, ANTI-CORRELATED]
Behavior
Subjects favor accuracy, “being correct”, over simply maximizing reward
[Figure, 22 subjects: % choice of the target with best expected value and % choice of the most frequently rewarded target, by trial number after contingency reversal, in the ANTI-CORRELATED, CORRELATED and RANDOM conditions]
Which variables contribute to choice? Logistic regressions
[Figure: contribution to choice (beta weight) of the regressors p, r1, r2, EV1, EV2, x_{t-1}, in the CORRELATED, ANTI-CORRELATED and RANDOM conditions]
• Differential processing of rewards depending on the experimental condition: informational value
• No computation of expected value
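The kind of regression reported here can be sketched as follows (a minimal hand-rolled logistic fit on synthetic data; the regressor layout mirrors the slide's p, r1, r2, but all data, coefficients and names are illustrative, not the study's):

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Plain gradient-ascent logistic regression; returns beta weights,
    i.e. each regressor's contribution to choice (first entry: intercept)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += lr * Xb.T @ (y - p) / len(y)
    return beta

# Synthetic choices driven mostly by the first regressor (think: p),
# weakly by the second (r1), not at all by the third (r2).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_logits = 2.0 * X[:, 0] + 0.2 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)
betas = fit_logistic(X, y)
```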
Choice models
• Optimal choice would be a rational combination of probabilities and rewards: Probability × Reward
• However, people's behavior is usually suboptimal
• To explain this suboptimality, it is assumed that subjects have distortions in their representations of probabilities and rewards
Kahneman and Tversky 1979; Zhang and Maloney 2012
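A common form of probability distortion from this literature is the linear-in-log-odds weighting function (the family used by Zhang and Maloney 2012); the sketch below uses illustrative parameter values, not the study's fits:

```python
import math

def distort_probability(p, gamma=0.6, p0=0.37):
    """Linear-in-log-odds probability distortion: the distorted log odds
    are a linear function of the true log odds. gamma < 1 overweights
    small and underweights large probabilities; gamma and the crossover
    point p0 here are illustrative, not fitted values."""
    log_odds = math.log(p / (1.0 - p))
    distorted = gamma * log_odds + (1.0 - gamma) * math.log(p0 / (1.0 - p0))
    return 1.0 / (1.0 + math.exp(-distorted))
```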
Distortions model (1000 simulations)
[Figure: SUBJECTS vs DISTORTIONS MODEL — % choice of the target with best expected value and % choice of the most frequently rewarded target, by trial number after contingency reversal, in the ANTI-CORRELATED, CORRELATED and RANDOM conditions]
Mixed model: integration of 2 concurrent systems for decision-making
Particularity of the protocol: the possible rewards to gain are presented before choice.
RL:
- Learning: Q_{t+1} = Q_t + α (R_t − Q_t)
- Revision of Qs before choice: (1 − w) Q_t + w R_t, with w biasing current expected returns
Bayesian inference:
- Revision of beliefs before choice given the reward distributions
Choice over the combination of beliefs and reinforcement: 0.75 Belief_Bay + 0.25 Q_RL
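These updates can be written down directly (a minimal sketch; the learning rate α and revision weight w are illustrative, and the Bayesian belief is taken as a given input rather than computed):

```python
def rl_update(q, reward, alpha=0.3):
    """Delta-rule learning: Q_{t+1} = Q_t + alpha * (R_t - Q_t).
    alpha is an illustrative learning rate, not a fitted value."""
    return q + alpha * (reward - q)

def revise_q(q, presented_reward, w=0.4):
    """Pre-choice revision given the rewards shown before choice:
    (1 - w) * Q_t + w * R_t, with w biasing current expected returns.
    w is illustrative."""
    return (1.0 - w) * q + w * presented_reward

def choice_value(belief_bayes, q_rl):
    """Quantity driving choice on the slide: 0.75 * Belief_Bay + 0.25 * Q_RL
    (assumes both inputs are on comparable scales)."""
    return 0.75 * belief_bayes + 0.25 * q_rl
```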
Mixed model (1000 simulations)
[Figure: SUBJECTS vs MIXED MODEL — % choice of the target with best expected value and % choice of the most frequently rewarded target, by trial number after contingency reversal, in the ANTI-CORRELATED, CORRELATED and RANDOM conditions]
Model comparison
Relative gain over a Bayesian model solely monitoring beliefs (LLH, BIC, AIC), DISTORTIONS vs MIXED models
[Figure: significance levels p = .057, p < .005 and p < .05 across the three criteria]
Distortions might be better explained by a mixed model integrating two systems for decision-making
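The comparison relies on the standard information criteria, which penalize log-likelihood by model complexity (lower is better); the formulas below are textbook definitions, and any numbers used with them are illustrative:

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 LLH (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 LLH (lower is better);
    penalizes parameters more heavily than AIC once n exceeds ~7."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```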
Mixed model without informational value
[Figure: SUBJECTS vs MIXED MODEL WITHOUT INFORMATIONAL VALUE — % choice of the target with best expected value and % choice of the most frequently rewarded target, by trial number after contingency reversal, in the CORRELATED, RANDOM and ANTI-CORRELATED conditions]
Informational value processing
p < .005 unc., c > 10 voxels, z = 40.
Small but significant positive correlation with informational value within dlPFC regions
Redo the extraction of betas in GLM36
p < 0.005 unc. c > 10.
linear
quadratic
Neuroimaging results: Belief system / RL system
Neuroimaging results
Belief system: RL system:
Neural activations are coherent with a mixed model involving two systems for decision-making
Summary
• The product of the distortions is actually explained by an integration of two systems for decision-making
• Rewarding value: network involving
• Informational value: network involving dlPFC,
Reinforcement learning: simple, rapid, phylogenetically old
Bayesian inference: sophisticated, but rapidly saturated
Acknowledgments
Frontal lobe functions team
Choice given reward presented
[Figure: % choice of a reward amount (euros) when presented, in the CORRELATED, RANDOM and ANTI-CORRELATED conditions]
How often is the 10 € option chosen over the 4 € option, independently of the belief about the current state?
[Axis: presented reward 2, 4, 6, 8, 10 €]
A residual effect of rewarding value remains visible in the RANDOM condition
Choices are instead mostly related to states
Reinforcement learning model
Computations associated with the RL model: Q_{t+1} = Q_t + α (R_t − Q_t); pre-choice revision (1 − w) Q_t + w R_t, with w biasing current expected returns
Action selection:
Generative model of the task: z = state of the world (not observed)
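A belief update consistent with this generative model (hidden state z, 80/20 reward contingency, occasional reversals) can be sketched as follows; the hazard rate and the function name are illustrative assumptions:

```python
def update_belief(belief, chosen, rewarded, p_reward=0.8, p_reversal=0.1):
    """Bayes-optimal update of P(z = 0), the belief that target 0 is the
    currently correct state, after observing one trial's outcome.
    p_reversal (the hazard rate) is an illustrative assumption."""
    # Likelihood of the observed outcome under each hidden state
    like0 = p_reward if (chosen == 0) == rewarded else 1.0 - p_reward
    like1 = p_reward if (chosen == 1) == rewarded else 1.0 - p_reward
    posterior = like0 * belief / (like0 * belief + like1 * (1.0 - belief))
    # Propagate through a possible reversal before the next trial
    return (1.0 - p_reversal) * posterior + p_reversal * (1.0 - posterior)
```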
Which variables contribute to choice? Logistic regressions
[Figure: contribution to choice (beta weight) of the regressors p, r1, r2, x_{t-1}, per condition, for the DISTORTIONS model (22 subjects) and the MIXED model (19 subjects)]