Prediction, Control and Decisions
Kenji Doya
Initial Research Project, OIST
ATR Computational Neuroscience Laboratories
CREST, Japan Science and Technology Agency
Nara Institute of Science and Technology
Outline
Introduction
Cerebellum, basal ganglia, and cortex
Meta-learning and neuromodulators
Prediction time scale and serotonin
Learning to Walk (Doya & Nakano, 1985)
Action: cycle of 4 postures
Reward: speed sensor output
Multiple solutions: creeping, jumping, …
[video]
Learning to Stand Up (Morimoto & Doya, 2001)
[videos: early trials / after learning]
Reward: height of the head
No desired trajectory
Reinforcement Learning (RL)
Framework for learning a state-action mapping (policy) by exploration and reward feedback
  Critic: reward prediction
  Actor: action selection
  Learning: external reward r; internal reward δ: difference from prediction
[Diagram: the agent (critic and actor) receives state s and reward r from the environment and sends back action a]
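To make the loop concrete, here is a minimal actor-critic sketch of the framework; the chain environment, its size, and all parameter values are illustrative assumptions, not the original setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
gamma, alpha, beta = 0.9, 0.1, 2.0       # discount, learning rate, inverse temperature

V = np.zeros(n_states)                   # critic: reward prediction
pref = np.zeros((n_states, n_actions))   # actor: action preferences

def step(s, a):
    """Assumed toy environment: action 1 moves right, action 0 resets;
    reward on reaching the last state."""
    s2 = min(s + 1, n_states - 1) if a == 1 else 0
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

s = 0
for _ in range(5000):
    p = np.exp(beta * (pref[s] - pref[s].max()))   # softmax (Boltzmann) selection
    p /= p.sum()
    a = rng.choice(n_actions, p=p)
    s2, r = step(s, a)
    delta = r + gamma * V[s2] - V[s]     # internal reward: TD error
    V[s] += alpha * delta                # critic: update reward prediction
    pref[s, a] += alpha * delta          # actor: reinforce the chosen action
    s = 0 if r > 0 else s2               # restart the episode after reward

print(np.round(V, 2))                    # V rises along the path to the reward
```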
Reinforcement Learning Methods
Model-free methods
  Episode-based: parameterize the policy P(a|s; θ)
  Temporal difference: state value function V(s); (state-)action value function Q(s,a)
Model-based methods
  Dynamic Programming: forward model P(s'|s,a)
Temporal Difference Learning
Predict reward: value function
  V(s) = E[r(t) + γr(t+1) + γ²r(t+2) + … | s(t)=s]
  Q(s,a) = E[r(t) + γr(t+1) + γ²r(t+2) + … | s(t)=s, a(t)=a]
Select action
  greedy: a = argmax_a Q(s,a)
  Boltzmann: P(a|s) ∝ exp[β Q(s,a)]
Update prediction: TD error
  δ(t) = r(t) + γV(s(t+1)) − V(s(t))
  ΔV(s(t)) = α δ(t)
  ΔQ(s(t),a(t)) = α δ(t)
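As a check of the three formulas, here is a tabular SARSA-style sketch with Boltzmann selection; the three-state environment and the parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, alpha, beta = 0.9, 0.1, 2.0
Q = np.zeros((3, 2))                          # Q(s,a): 3 states x 2 actions

def select(q_row):
    p = np.exp(beta * (q_row - q_row.max()))  # P(a|s) ∝ exp[beta Q(s,a)], stabilized
    return rng.choice(len(q_row), p=p / p.sum())

s = 0
a = select(Q[s])
for _ in range(3000):
    s2 = (s + a) % 3                          # assumed toy transitions
    r = 1.0 if (s == 2 and a == 1) else 0.0   # reward for one particular move
    a2 = select(Q[s2])
    delta = r + gamma * Q[s2, a2] - Q[s, a]   # TD error
    Q[s, a] += alpha * delta                  # update prediction
    s, a = s2, a2

print(np.round(Q, 2))                         # action 1 dominates in every state
```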
Dynamic Programming and RL
Dynamic Programming: model-based, off-line
  solve the Bellman equation
  V(s) = max_a Σ_s' P(s'|s,a) [r(s,a,s') + γV(s')]
Reinforcement Learning: model-free, on-line
  learn by the TD error
  δ(t) = r(t) + γV(s(t+1)) − V(s(t)); ΔV(s(t)) = α δ(t); ΔQ(s(t),a(t)) = α δ(t)
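For contrast, here is a minimal value-iteration sketch that solves the Bellman equation offline; it assumes a random 3-state, 2-action MDP with reward r(s,a) rather than r(s,a,s'), a common simplification.

```python
import numpy as np

rng = np.random.default_rng(2)
gamma, n_s, n_a = 0.9, 3, 2
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))   # P[a, s, s'] = P(s'|s,a)
R = rng.uniform(0, 1, size=(n_s, n_a))             # r(s,a), known (model-based)

V = np.zeros(n_s)
for _ in range(1000):
    # V(s) = max_a sum_s' P(s'|s,a) [ r(s,a) + gamma V(s') ]
    Q = R + gamma * np.einsum('asn,n->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print(np.round(V, 3))        # fixed point of the Bellman equation
```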
Discrete vs. Continuous RL (Doya, 2000)
Discrete time:
  V(x) = E[r(t) + γr(t+Δt) + γ²r(t+2Δt) + …]
  δ(t) = r(t) + γV(t+Δt) − V(t)
Continuous time:
  V(x) = ∫ₜ^∞ e^(−(s−t)/τ) r(s) ds
  δ(t) = r(t) + V̇(t) − (1/τ)V(t)
Correspondence: τ = Δt/(1−γ), i.e., γ = 1 − Δt/τ
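A quick numeric check of this correspondence: with γ = 1 − Δt/τ, the discrete discounted sum approaches the continuous exponentially discounted integral as Δt shrinks. The reward profile r(t) below is an arbitrary assumption.

```python
import numpy as np

tau, T = 2.0, 40.0
r = lambda t: np.cos(0.3 * t) + 1.0            # assumed reward profile

s = np.linspace(0.0, T, 400001)                # fine grid for the integral
V_cont = np.sum(np.exp(-s / tau) * r(s)) * (s[1] - s[0])

for dt in (1.0, 0.1, 0.01):
    gamma = 1.0 - dt / tau                     # i.e., tau = dt / (1 - gamma)
    k = np.arange(int(T / dt))
    V_disc = np.sum(gamma ** k * r(k * dt)) * dt
    print(f"dt={dt:5.2f}  gamma={gamma:.3f}  discrete={V_disc:.3f}  continuous={V_cont:.3f}")
```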
Questions
Computational questions
  How to learn:
    direct policy P(a|s)
    value functions V(s), Q(s,a)
    forward models P(s'|s,a)
  When to use which method?
Biological questions
  Where in the brain?
  How are they represented/updated?
  How are they selected/coordinated?
Brain Hierarchy
Forebrain
  Cerebral cortex (a): neocortex; paleocortex: olfactory cortex; archicortex: basal forebrain, hippocampus
  Basal nuclei (b): neostriatum: caudate, putamen; paleostriatum: globus pallidus; archistriatum: amygdala
  Diencephalon: thalamus (c), hypothalamus (d)
Brain stem & cerebellum
  Midbrain (e)
  Hindbrain: pons (f), cerebellum (g)
  Medulla (h)
Spinal cord (i)
Just for Motor Control? (Middleton & Strick 1994)
Basal ganglia (globus pallidus)
Prefrontal cortex (area 46)
Cerebellum (dentate nucleus)
Specialization by Learning Algorithms (Doya, 1999)
  Cerebellum: supervised learning; input→output mapping trained by an error signal (target − output) via the inferior olive (IO)
  Basal ganglia: reinforcement learning; input→output mapping trained by a reward signal via the substantia nigra (SN)
  Cerebral cortex: unsupervised learning on its input→output transformation
[Diagram: cortex, basal ganglia, and cerebellum loops interconnected through the thalamus]
Cerebellum
Purkinje cells
  ~10⁵ parallel fibers
  single climbing fiber
  long-term depression
Supervised learning
  perceptron hypothesis
  internal models
Internal Models in the Cerebellum (Imamizu et al., 2000)
Learning to use a 'rotated' mouse
[fMRI: cerebellar activity in early learning vs. after learning]
Motor Imagery (Luft et al. 1998)
[Panels: finger movement vs. imagery of movement]
Basal Ganglia
Striatum
  striosome & matrix
  dopamine-dependent plasticity
Dopamine neurons
  reward-predictive response
  TD learning
Dopamine Neurons and TD Error (Schultz et al. 1997)
δ(t) = r(t) + γV(s(t+1)) − V(s(t))
[Recordings of reward r, dopamine-cell firing, and reward prediction V: (a) before learning, (b) after learning, (c) reward omitted]
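This response pattern falls out of the TD error itself. Below is a minimal tabular simulation: states index time after a cue that arrives unpredictably (so the pre-cue prediction is 0); trial length, reward timing, and learning parameters are assumptions.

```python
import numpy as np

K, rew_k = 10, 8             # steps per trial and reward time after the cue (assumed)
gamma, alpha = 0.98, 0.2
V = np.zeros(K + 1)          # V[k]: reward prediction k steps after the cue

def trial(V, reward=True, learn=True):
    delta = np.zeros(K + 1)
    delta[0] = gamma * V[0]  # cue onset: prediction jumps from 0 to V[0]
    for k in range(K):
        r = 1.0 if (reward and k == rew_k) else 0.0
        d = r + gamma * V[k + 1] - V[k]
        if learn:
            V[k] += alpha * d
        delta[k + 1] = d
    return delta

print("before learning:", np.round(trial(V.copy(), learn=False), 2))  # burst at reward
for _ in range(500):
    trial(V)
print("after learning: ", np.round(trial(V, learn=False), 2))         # burst at cue
print("reward omitted: ", np.round(trial(V, reward=False, learn=False), 2))  # dip at reward time
```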
Reward-predicting Activities of Striatal Neurons
Delayed saccade task (Kawagoe et al., 1998)
Not just actions, but resulting rewards
[Rasters: activity for each target direction (Right, Up, Left, Down) under each reward condition (Right, Up, Left, Down, All)]
Cerebral Cortex
Recurrent connections
Hebbian plasticity
Unsupervised learning, e.g., PCA, ICA
Replicating V1 Receptive Fields (Olshausen & Field, 1996)
Infomax and sparseness
Hebbian plasticity and recurrent inhibition
Specialization by Learning?
Cerebellum: supervised learning
  error signal by climbing fibers
  forward model s' = f(s,a) and policy a = g(s)
Basal ganglia: reinforcement learning
  reward signal by dopamine fibers
  value functions V(s) and Q(s,a)
Cerebral cortex: unsupervised learning
  Hebbian plasticity and recurrent inhibition
  representation of state s and action a
But how are they recruited and combined?
Multiple Action Selection Schemes
  Model-free: a = argmax_a Q(s,a)
  Model-based: a = argmax_a [r + V(f(s,a))], using the forward model f(s,a)
  Encapsulation: a = g(s)
[Diagrams: s → Q → a; (s,a) → f → s' → V; s → g → a]
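A toy sketch contrasting the three schemes; Q, V, the forward model f, the immediate-reward estimate r, and the cached policy g are all assumed as small random tables.

```python
import numpy as np

rng = np.random.default_rng(3)
n_s, n_a = 4, 2
Q = rng.uniform(size=(n_s, n_a))            # learned action values
V = rng.uniform(size=n_s)                   # learned state values
f = rng.integers(0, n_s, size=(n_s, n_a))   # deterministic forward model s' = f(s,a)
r = rng.uniform(size=(n_s, n_a))            # predicted immediate reward
g = Q.argmax(axis=1)                        # encapsulated (cached) policy

s = 1
a_model_free = int(np.argmax(Q[s]))              # a = argmax_a Q(s,a)
a_model_based = int(np.argmax(r[s] + V[f[s]]))   # a = argmax_a [r + V(f(s,a))]
a_encapsulated = int(g[s])                       # a = g(s)
print(a_model_free, a_model_based, a_encapsulated)
```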
Lectures at OCNC 2005
  Internal models / Cerebellum: Reza Shadmehr, Stefan Schaal, Mitsuo Kawato
  Reward / Basal ganglia: Andrew G. Barto, Bernard Balleine, Peter Dayan, John O'Doherty, Minoru Kimura, Wolfram Schultz
  State coding / Cortex: Nathaniel Daw, Leo Sugrue, Daeyeol Lee, Jun Tanji, Anitha Pasupathy, Masamichi Sakagami
Outline
Introduction
Cerebellum, basal ganglia, and cortex
Meta-learning and neuromodulators
Prediction time scale and serotonin
Reinforcement Learning (RL)
Framework for learning a state-action mapping (policy) by exploration and reward feedback
  Critic: reward prediction
  Actor: action selection
  Learning: external reward r; internal reward δ: difference from prediction
[Diagram: the agent (critic and actor) receives state s and reward r from the environment and sends back action a]
Reinforcement Learning
Predict reward: value function
  V(s) = E[r(t) + γr(t+1) + γ²r(t+2) + … | s(t)=s]
  Q(s,a) = E[r(t) + γr(t+1) + γ²r(t+2) + … | s(t)=s, a(t)=a]
Select action
  greedy: a = argmax_a Q(s,a)
  Boltzmann: P(a|s) ∝ exp[β Q(s,a)]
Update prediction: TD error
  δ(t) = r(t) + γV(s(t+1)) − V(s(t))
  ΔV(s(t)) = α δ(t)
  ΔQ(s(t),a(t)) = α δ(t)
Cyber Rodent Project
Robots with the same constraints as biological agents
  What is the origin of rewards?
  What should be learned, and what should be evolved?
Self-preservation: capture batteries
Self-reproduction: exchange programs through IR ports
Cyber Rodent: Hardware
  camera, range sensor, proximity sensors, gyro
  battery latch, two wheels
  IR port, speaker, microphones, R/G/B LED
Evolving Robot Colony
Survival: catch battery packs
Reproduction: copy 'genes' through IR ports
[videos]
Discounting Future Reward
[Videos: behavior with a large vs. a small discount factor γ]
Setting of Reward Function
Reward r = r_main + r_supp − r_cost
e.g., reward for the vision of a battery
[video]
Reinforcement Learning of Reinforcement Learning (Schweighofer & Doya, 2003)
Fluctuations in the metaparameters correlate with the average reward
[Plot: metaparameter and reward traces over time]
Randomness Control by Battery Level
[Plot: inverse temperature β (0–14) as a function of battery level (0–1)]
Greedier action at both extremes
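One way to read the 'reinforcement learning of reinforcement learning' idea above is stochastic metaparameter tuning: perturb β and retain the fluctuations that correlate with above-average reward. The sketch below assumes a hypothetical performance curve and is not the original algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)

def avg_reward(beta):
    """Assumed performance curve: too little or too much exploration hurts."""
    return -(np.log(beta) - 1.0) ** 2 + rng.normal(0, 0.05)

beta, eta = 0.5, 0.5
r_bar = avg_reward(beta)                       # running baseline reward
for _ in range(500):
    noise = rng.normal(0, 0.1)
    r = avg_reward(beta * np.exp(noise))       # try a perturbed beta
    beta *= np.exp(eta * (r - r_bar) * noise)  # keep reward-correlated fluctuations
    r_bar += 0.1 * (r - r_bar)

print(round(beta, 2))                          # drifts toward the optimum (e ≈ 2.72 here)
```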
Neuromodulators for Metalearning (Doya, 2002)
Metaparameter tuning is critical in RL. How does the brain tune the metaparameters?
  Dopamine: TD error δ
  Acetylcholine: learning rate α
  Noradrenaline: inverse temperature β
  Serotonin: discount factor γ
Learning Rate
ΔV(s(t−1)) = α δ(t)
ΔQ(s(t−1),a(t−1)) = α δ(t)
  small α: slow learning
  large α: unstable learning
Acetylcholine (basal forebrain)
  regulates memory update and retention (Hasselmo et al.)
  LTP in cortex and hippocampus
  top-down vs. bottom-up information flow
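A quick demonstration of the trade-off, estimating a noisy reward with the one-state update ΔV = α(r − V); the reward statistics and the α values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
rewards = 1.0 + rng.normal(0, 0.5, size=100)   # noisy reward with true mean 1.0

for alpha in (0.01, 0.3, 0.9):
    V = 0.0
    for r in rewards:
        V += alpha * (r - V)                   # delta = r - V
    print(f"alpha={alpha:4.2f}  final V={V:5.2f}")
# small alpha: V is still far below 1.0 (slow); large alpha: V tracks sample noise (unstable)
```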
Inverse Temperature
Greediness in action selection
  P(aᵢ|s) ∝ exp[β Q(s,aᵢ)]
  small β: exploration
  large β: exploitation
Noradrenaline (locus coeruleus)
  correlation with performance accuracy (Aston-Jones et al.)
  modulation of cellular I/O gain (Cohen et al.)
[Plot: P(a₁) as a function of Q(s,a₁) − Q(s,a₂) for β = 0, 1, 10]
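The plot is straightforward to reconstruct: for two actions the Boltzmann rule reduces to a logistic function of the value difference. The sampled grid of differences is an assumption.

```python
import numpy as np

dq = np.linspace(-4, 4, 9)                    # Q(s,a1) - Q(s,a2)
for beta in (0, 1, 10):
    p1 = 1.0 / (1.0 + np.exp(-beta * dq))     # P(a1) under the Boltzmann rule
    print(f"beta={beta:2d}:", np.round(p1, 2))
# beta=0: always 0.5 (pure exploration); beta=10: nearly a step function (exploitation)
```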
Discount Factor
V(s(t)) = E[r(t+1) + γr(t+2) + γ²r(t+3) + …]
Balance between short- and long-term results
[Plots: a reward sequence over 10 time steps and its value V: V = −0.093 for γ = 0.5, V = +0.062 for γ = 0.9]
Serotonin (dorsal raphe)
  low activity is associated with impulsivity
  depression, bipolar disorder
  aggression, eating disorders
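A worked example of this balance, with a hypothetical reward sequence containing an early cost and a later payoff (the slide's exact sequence and values differ):

```python
import numpy as np

r = np.array([0, -0.5, 0, 0, 0, 0, 0, 1.0, 0, 0])   # assumed rewards at future steps
for gamma in (0.5, 0.9):
    V = np.sum(gamma ** np.arange(len(r)) * r)
    print(f"gamma={gamma}:  V={V:+.3f}")
# gamma=0.5 weighs the early cost more (V < 0); gamma=0.9 favors the later reward (V > 0)
```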
TD Error
δ(t) = r(t) + γV(s(t)) − V(s(t−1))
Global learning signal
  reward prediction: ΔV(s(t−1)) = α δ(t)
  reinforcement: ΔQ(s(t−1),a(t−1)) = α δ(t)
Dopamine (substantia nigra, VTA)
  responds to errors in reward prediction
  reinforcement of actions
  addiction
TD Model of Basal Ganglia (Houk et al. 1995, Montague et al. 1996, Schultz et al. 1997, …)
  Cerebral cortex: state representation s (from sensory input)
  Striatum: evaluation and action selection; striosome: state value V(s); matrix: action value Q(s,a)
  Dopamine neurons: TD error δ(t) (TD signal, driven by reward r)
  SNr/GPi: action selection, Q(s,a) → a (action output via the thalamus)
  Candidate modulators: NA? ACh? 5-HT?
Possible Control of Discount Factor
δ(t) = r(t) + γV(s(t+1)) − V(s(t))
  Modulation of the TD error δ(t) computed by dopamine neurons from V(s(t)) and V(s(t+1))
  Selection/weighting of parallel striatal networks V₁, V₂, V₃ with different discount factors γ₁, γ₂, γ₃
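A sketch of the parallel-networks idea: several tabular value learners with different γ run on the same experience and are combined at readout; the corridor task and the readout weights are assumptions.

```python
import numpy as np

gammas = np.array([0.6, 0.9, 0.99])          # one discount factor per network
alpha, n_states = 0.1, 12
V = np.zeros((len(gammas), n_states))        # V_i(s), one row per gamma_i

for _ in range(300):                         # walk a corridor rewarded at the far end
    for s in range(n_states - 1):
        r = 1.0 if s == n_states - 2 else 0.0
        delta = r + gammas * V[:, s + 1] - V[:, s]   # one TD error per network
        V[:, s] += alpha * delta

w = np.array([0.2, 0.3, 0.5])                # assumed weighting of the networks
print(np.round(w @ V, 2))                    # combined value along the corridor
```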
Markov Decision Task (Tanaka et al., 2004)
State transition and reward functions
Stimulus and response
Behavior Results
All subjects successfully learned optimal behavior
Block-Design Analysis
  SHORT vs. NO (p < 0.001 uncorrected): OFC, insula, striatum, cerebellum
  LONG vs. SHORT (p < 0.0001 uncorrected): cerebellum, striatum, dorsal raphe, DLPFC, VLPFC, IPC, PMd
Different brain areas are involved in immediate and future reward prediction
Ventro-Dorsal Difference
Lateral PFC, insula, striatum
Model-based Regressor Analysis
Estimate V(t) and δ(t) from the subjects' performance data; regression analysis of the fMRI data
[Diagram: the agent (value function V(s), policy, TD error δ(t)) interacts with the environment via state s(t), action a(t), and reward r(t) (20 yen); the estimated V(t) and δ(t) serve as regressors for the fMRI data]
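In outline, the analysis can be sketched as below: build V(t) for several γ from the reward record, then correlate the measured signal with each regressor. The simulated rewards, the synthetic signal, and the omission of hemodynamic convolution are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 312                                          # trials, as in the next slide
r = rng.binomial(1, 0.3, size=T).astype(float)   # simulated reward sequence

gammas = [0, 0.3, 0.6, 0.8, 0.9, 0.99]
V = np.zeros((len(gammas), T))
for i, g in enumerate(gammas):                   # V(t) = r(t+1) + g V(t+1), backward
    for t in range(T - 2, -1, -1):
        V[i, t] = r[t + 1] + g * V[i, t + 1]

bold = 2.0 * V[4] + rng.normal(0, 1.0, size=T)   # synthetic signal tied to gamma = 0.9
for i, g in enumerate(gammas):
    c = np.corrcoef(V[i], bold)[0, 1]
    print(f"gamma={g:4.2f}  correlation={c:5.2f}")
# the correlation peaks near the gamma that generated the signal
```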
Explanatory Variables (subject NS)
[Time courses over trials 1–312 of the reward prediction V(t) and the reward prediction error δ(t), each for γ = 0, 0.3, 0.6, 0.8, 0.9, 0.99]
Regression Analysis
[Maps: reward prediction V in mPFC (x = −2 mm) and insula (x = −42 mm); reward prediction error δ in the striatum (z = 2)]
Tryptophan Depletion/Loading
Tryptophan: precursor of serotonin; depletion/loading affects central serotonin levels (e.g., Bjork et al. 2001, Luciana et al. 2001)
100 g of amino acid drink; experiments after 6 hours
  Day 1: Tr− (no tryptophan: depletion)
  Day 2: Tr0 (2.3 g of tryptophan: control)
  Day 3: Tr+ (10.3 g of tryptophan: loading)
Blood Tryptophan Levels
[Plot; in the Tr− condition: N.D. (< 3.9 µg/ml)]
Delayed Reward Choice Task
yellow: large reward with a long delay
white: small reward with a short delay
Session parameters (values listed as Yellow, White, Yellow, White):
  Sessions 1, 2, 7, 8: initial black patches 72, 24, 18, 9; patches/step 8, 2, 6, 2
  Session 3: initial black patches 72, 24, 18, 9; patches/step 8, 2, 14, 2
  Session 4: initial black patches 72, 24, 18, 9; patches/step 16, 2, 14, 2
  Sessions 5, 6: initial black patches 72, 24, 18, 9; patches/step 16, 2, 6, 2
Choice Behaviors
The shift of the indifference line was not consistent across the 12 subjects
Modulation of Striatal Response
[Maps: striatal correlation with V for γ = 0.6, 0.7, 0.8, 0.9, 0.99 under the Tr−, Tr0, and Tr+ conditions]
Modulation by Tr Levels
Changes in Correlation Coefficient
ROI (region of interest) analysis of the regression slope for V:
  γ = 0.6 (28, 0, −4): Tr− > Tr+, i.e., stronger correlation with V at small γ in the ventral putamen
  γ = 0.99 (16, 2, 28): Tr− < Tr+, i.e., stronger correlation with V at large γ in the dorsal putamen
Summary
  Immediate reward: lateral OFC
  Future reward: parietal cortex, PMd, DLPFC; lateral cerebellum; dorsal raphe
  Ventro-dorsal gradient: insula, striatum
  Serotonergic modulation
Outline
Introduction
Cerebellum, basal ganglia, and cortex
Meta-learning and neuromodulators
Prediction time scale and serotonin
Collaborators
  Kyoto PUM: Minoru Kimura, Yasumasa Ueda
  Hiroshima U: Shigeto Yamawaki, Yasumasa Okamoto, Go Okada, Kazutaka Ueda, Shuji Asahi, Kazuhiro Shishida
  ATR: Jun Morimoto, Kazuyuki Samejima
  CREST: Nicolas Schweighofer, Genci Capi
  NAIST: Saori Tanaka
  OIST: Eiji Uchibe, Stefan Elfwing