Actor-Critic models: from ventral striatal reward-related activity to robotics simulations.
Intro | Electrophysiology | Modelling | Discussion
slide # 1 / 59
Actor-Critic models: from ventral striatal reward-related activity to robotics simulations.
Dr. Mehdi Khamassi (1, 2)
(1) LPPA, UMR CNRS 7152, Collège de France, Paris
(2) AnimatLab-LIP6 / SIMA-ISIR, Université Pierre et Marie Curie, Paris 6
slide # 2 / 59
OBJECTIVE
Help to understand how mammals can adapt their behavior in order to maximize reward obtained from the environment.
Help to understand brain mechanisms underlying these cognitive processes.
slide # 3 / 59
OBJECTIVE
Challenging goal: different levels of decision, different learning processes, different types of representation
Multidisciplinary approach: Behavioral Neurophysiology, Computational Modelling, Autonomous Robotics
slide # 4 / 59
ACTOR-CRITIC MODEL
CRITIC: learns to predict reward
ACTOR: learns to select actions
• Developed in the AI community (RL)
• Explains some reward-seeking behaviors
• Resembles some parts of the brain (dopaminergic neurons & striatum)
slide # 5 / 59
Outline
1. Introduction: How does an Actor-Critic model work?
2. Electrophysiology: Reward predictions in the rat ventral striatum
3. Computational modelling: An Actor-Critic model in a simulated robot
4. Discussion
slide # 6 / 59
The Actor-Critic model
• Learning from reward
[Figure: a sequence of actions 1 to 5 ending in a reward.]
slide # 7 / 59
The Actor-Critic model
• Learning from reward
[Figure: a sequence of actions 1 to 5 ending in a reward; labels: actions, reward, reinforcement.]
slide # 8 / 59
The Actor-Critic model
• Learning from reward
[Figure: a sequence of actions 1 to 5 ending in a reward; labels: actions, reward, reward prediction $P_{t-1}$, reinforcement.]
Rescorla and Wagner (1972).
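The reward-prediction idea on this slide can be sketched in a few lines. This is a minimal sketch, assuming the standard Rescorla-Wagner form in which the reinforcement is the difference between the reward received and the current prediction P, and the prediction moves a fraction `alpha` of the way toward the reward on each trial. The function name, the learning rate `alpha` and the number of trials are illustrative choices, not values taken from the talk.

```python
def rescorla_wagner(rewards, alpha=0.1, p0=0.0):
    """Return per-trial predictions and reinforcements for a list of rewards."""
    prediction = p0
    predictions, reinforcements = [], []
    for reward in rewards:
        reinforcement = reward - prediction    # prediction error on this trial
        prediction += alpha * reinforcement    # move the prediction toward the reward
        predictions.append(prediction)
        reinforcements.append(reinforcement)
    return predictions, reinforcements


if __name__ == "__main__":
    preds, errors = rescorla_wagner([1.0] * 50)
    print(f"first reinforcement: {errors[0]:.2f}")
    print(f"last reinforcement:  {errors[-1]:.4f}")
    print(f"final prediction:    {preds[-1]:.4f}")
```

With a constant reward, the reinforcement starts at the full reward value and shrinks toward zero as the prediction catches up.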
slide # 9 / 59
The Actor-Critic model
• Temporal-Difference (TD) learning
[Figure: a sequence of actions 1 to 5 ending in a reward; labels: actions, reward, reward predictions $P_{t-1}$ and $P_t$, reinforcement $\hat{r}$.]
Sutton and Barto (1998).
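The TD rule named on this slide can be illustrated on the five-step sequence. This is a minimal sketch, assuming the textbook TD(0) form in which the reinforcement is `r_hat = reward + gamma * P_t - P_{t-1}` and only the previous prediction is updated. The chain layout, `alpha`, `gamma` and the episode count are assumptions made for illustration.

```python
def td_learning(n_steps=5, episodes=200, alpha=0.2, gamma=1.0):
    """Learn reward predictions along a fixed chain of n_steps states.

    A reward of 1 is delivered after the last state. Returns the learned
    predictions plus the reinforcements of the first and last episodes.
    """
    P = [0.0] * (n_steps + 1)          # P[n_steps] is the terminal state, kept at 0
    first, last = None, None
    for episode in range(episodes):
        errors = []
        for t in range(1, n_steps + 1):
            reward = 1.0 if t == n_steps else 0.0
            # reinforcement: reward + gamma * P_t - P_{t-1}
            r_hat = reward + gamma * P[t] - P[t - 1]
            P[t - 1] += alpha * r_hat  # only the previous prediction is updated
            errors.append(r_hat)
        if episode == 0:
            first = errors
        last = errors
    return P[:n_steps], first, last


if __name__ == "__main__":
    predictions, first, last = td_learning()
    print("first episode reinforcements:", first)
    print("last episode reinforcements: ", [round(e, 3) for e in last])
    print("learned predictions:         ", [round(p, 3) for p in predictions])
```

In the first episode the only non-zero reinforcement occurs at reward delivery; over episodes the predictions propagate back to the earlier steps and the reinforcements vanish.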
slide # 10 / 59
The Actor-Critic model
• Analogy with dopaminergic neurons
[Figure: dopaminergic neuron activity; labels: reward, reinforcement, R, S; value shown: +1.]
Romo & Schultz (1990); Houk et al. (1995); Schultz et al. (1997).
slide # 11 / 59
The Actor-Critic model
Analogy with dopaminergic neurons
[Figure: dopaminergic neuron activity; labels: reward, reinforcement, R, S; value shown: +1.]
Romo & Schultz (1990); Houk et al. (1995); Schultz et al. (1997).
slide # 12 / 59
The Actor-Critic model
Analogy with dopaminergic neurons
[Figure: dopaminergic neuron activity; labels: reward, reinforcement, R, S; value shown: 0.]
Romo & Schultz (1990); Houk et al. (1995); Schultz et al. (1997).
slide # 13 / 59
The Actor-Critic model
Analogy with dopaminergic neurons
[Figure: dopaminergic neuron activity; labels: reward, reinforcement, R, S; value shown: -1.]
Romo & Schultz (1990); Houk et al. (1995); Schultz et al. (1997).
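The values +1, +1, 0 and -1 shown across these slides can be reproduced as a short worked computation. This is a sketch under the assumption that they correspond to the usual reading of the dopamine recordings (Schultz et al., 1997): a burst for an unpredicted reward, a burst for a stimulus that has come to predict reward, no response to a fully predicted reward, and a dip when a predicted reward is omitted. The helper name and the idealised predictions of 0 or 1 are illustrative.

```python
def reinforcement(reward, p_prev, p_now=0.0, gamma=1.0):
    """TD reinforcement: reward + gamma * P_t - P_{t-1}."""
    return reward + gamma * p_now - p_prev


cases = {
    "unpredicted reward (no prediction yet)": reinforcement(1.0, p_prev=0.0),
    "stimulus that has come to predict reward": reinforcement(0.0, p_prev=0.0, p_now=1.0),
    "fully predicted reward": reinforcement(1.0, p_prev=1.0),
    "predicted reward omitted": reinforcement(0.0, p_prev=1.0),
}

for label, value in cases.items():
    print(f"{label}: {value:+.0f}")
```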
slide # 14 / 59
The Actor-Critic model
Actor-Critic models
Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002); see Joel et al. (2002) for a review.
Dopaminergic neuron
slide # 15 / 59
The Actor-Critic model
Actor-Critic models
Dopaminergic neuron
[Figure: Actor-Critic diagram; reward predictions P = 0, P = 0, P = 0, P = 0; rewards r = 0 and r = 1; labels: L, E.]
slide # 16 / 59
The Actor-Critic model
Actor-Critic models
Dopaminergic neuron
[Figure: Actor-Critic diagram; reward predictions P = 0, P = 0, P = 0, P = 1; rewards r = 0 and r = 1; labels: L, E; annotation: "11".]
slide # 17 / 59
The Actor-Critic model
Actor-Critic models
Dopaminergic neuron
[Figure: Actor-Critic diagram; reward predictions P = 1, P = 0, P = 0, P = 1; rewards r = 0 and r = 1; labels: L, E; annotations: "11", "11".]
slide # 18 / 59
The rat brain
[Figure adapted from Tierney (2006).]
slide # 19 / 59
The striatum
[Figure adapted from Voorn et al. (2004).]
slide # 20 / 59
The striatum
[Figure: ventral striatum as CRITIC and dorsal striatum as ACTOR (driving actions), with dopaminergic neurons (VTA / SNc).]
(Barto, 1995; Houk et al., 1995; Montague et al., 1996; Schultz et al., 1997; Doya et al., 2002; O'Doherty et al., 2004)
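The division of labour on this slide can be sketched as a tabular Actor-Critic in which a single reinforcement signal trains both parts. This is a minimal sketch under stated assumptions: a five-position corridor with a reward at the right end, a softmax Actor over two actions, and hypothetical parameters `alpha`, `beta` and `gamma`. It is not the model used in the talk. In the comments, the Critic plays the role attributed to the ventral striatum, the Actor that of the dorsal striatum, and `delta` that of the dopaminergic signal.

```python
import math
import random


def train_actor_critic(n_states=5, episodes=500, alpha=0.1, beta=0.1,
                       gamma=0.9, seed=0):
    """Train a tabular Actor-Critic on a corridor with reward at the right end."""
    rng = random.Random(seed)
    P = [0.0] * (n_states + 1)                     # Critic: reward predictions (goal = last entry)
    prefs = [[0.0, 0.0] for _ in range(n_states)]  # Actor: preferences for [left, right]

    for _ in range(episodes):
        s = 0
        for _ in range(200):                       # step cap per episode
            # Actor: softmax choice between left (0) and right (1)
            e_left, e_right = math.exp(prefs[s][0]), math.exp(prefs[s][1])
            a = 1 if rng.random() < e_right / (e_left + e_right) else 0
            s_next = s + 1 if a == 1 else max(s - 1, 0)
            reward = 1.0 if s_next == n_states else 0.0
            # Shared reinforcement signal (dopamine-like TD error)
            delta = reward + gamma * P[s_next] - P[s]
            P[s] += alpha * delta                  # Critic update
            prefs[s][a] += beta * delta            # Actor update with the same signal
            s = s_next
            if s == n_states:
                break
    return P[:n_states], prefs


if __name__ == "__main__":
    predictions, preferences = train_actor_critic()
    print("Critic predictions:", [round(p, 2) for p in predictions])
    print("Actor prefers right:", [right > left for left, right in preferences])
```

The design choice illustrated here is that the Actor never sees the reward directly; it learns only from the Critic's prediction error.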
slide # 21 / 59
The striatum
Learning based on reward prediction in VS...
... on dopamine reinforcements.
... modelled by Temporal Difference (TD)-learning
In the monkey: (Hikosaka et al., 1989; Hollerman et al., 1998; Kawagoe et al., 1998; Hassani et al., 2001; Cromwell and Schultz, 2003)
In the rat: (Carelli et al., 2000; Daw et al., 2002; Setlow et al., 2003; Nicola et al., 2004; Wilson and Bowman, 2005)
(Barto, 1995; Houk et al., 1995; Schultz et al., 1997; Doya et al., 2002)
(Schultz et al., 1992; Satoh et al., 2003; Nakahara et al., 2004)
slide # 22 / 59
The striatum
... using precise timing of reward prediction in TD-learning
[Figure adapted from Suri and Schultz (2001): simulation of a TD-learning model alongside activity recorded from the monkey striatum.]
(Montague et al., 1996; Suri and Schultz, 2001; Perez-Uribe, 2001; Alexander and Sporns, 2002)
slide # 23 / 59
Electrophysiology
Methods
Recording in the rat VS
Simple electrodes
slide # 24 / 59
Electrophysiology
Behavioral methods
The plus-maze task
slide # 25 / 59
Electrophysiology
Behavioral methods
The plus-maze task
[Figure: trial timeline (time axis) marking center departure and box arrival, with running and immobile periods.]
slide # 26 / 59
Electrophysiology
Results
170 neurons; 91 neurons with behavioral correlates
[Figure: time axis with departure, center and arrival events.]
slide # 27 / 59
Electrophysiology
Results: Reward anticipation
Ventral striatal neuron.
Activity anticipating each reward droplet.
Independent of locomotor behavior.
Khamassi, Mulder et al. (in revision) J Neurophysiol.
slide # 29 / 59
Electrophysiology
Results: Reward anticipation
Ventral striatal neuron.
Activity anticipating each reward droplet.
Independent of locomotor behavior.
Anticipation of an extra reward.
Khamassi, Mulder et al. (in revision) J Neurophysiol.
slide # 30 / 59
Modelling with TD-learning
Results
[Figure: TD-learning simulations for 7, 5, 3 and 1 droplets. Panels: temporal representation of stimuli (Montague et al., 1996); incomplete temporal representation; ambiguous visual input; no spatial information.]
slide # 31 / 59
Modelling with TD-learning
Results
[Figure: TD-learning simulations for 7, 5, 3 and 1 droplets. Panels: temporal representation of stimuli (Montague et al., 1996); incomplete temporal representation; same context after the last drop as during droplet delivery; no spatial information.]
slide # 34 / 59
TD-learning could reproduce neural anticipatory activity.
Can it reproduce the rat's locomotor behavior in the same task?
Khamassi, Mulder et al. (in revision) J Neurophysiol.
slide # 35 / 59
Autonomous robotics
Methods
Virtual plus-maze
[Figure: virtual plus-maze; labels: visual perceptions, actions, reward.]
slide # 36 / 59
Autonomous robotics
Methods
Virtual plus-maze
[Figure: virtual plus-maze; labels: visual perceptions, actions, reward; numbered steps 1 to 5.]
slide # 37 / 59
Autonomous robotics
Methods
Expected results
[Figure: numbered steps 1 to 5 ending in a reward.]
slide # 38 / 59
Autonomous robotics
Methods
Actor-Critic models
Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002); see Joel et al. (2002) for a review.
Simplistic Actor. Most often: discrete environments.
Dopaminergic neuron
slide # 39 / 59
Autonomous robotics
Methods
Actor-Critic models
Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002); see Joel et al. (2002) for a review.
Simplistic Actor. Most often: discrete environments.
Continuous environments: coordination of modules.
• gating network: Baldassarre (2002); Doya et al. (2002).
• hand-tuned (independent of modules' performance): Suri and Schultz (2001).
Dopaminergic neuron
slide # 40 / 59
Autonomous robotics
Methods
Actor-Critic models
Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002); see Joel et al. (2002) for a review.
Simplistic Actor. Most often: discrete environments.
Continuous environments: coordination of modules.
• gating network: Baldassarre (2002); Doya et al. (2002).
• hand-tuned (independent of modules' performance): Suri and Schultz (2001).
Test principles within a common framework
Dopaminergic neuron
slide # 41 / 59
Autonomous robotics
Methods
Implemented framework
slide # 42 / 59
Autonomous robotics
Methods
Gurney, Prescott & Redgrave (2001). Adapted by Girard et al. (2002; 2003).
slide # 43 / 59
Autonomous robotics
Methods
module coordination
slide # 44 / 59
Autonomous robotics
Methods
1. gating network (tests modules' capacity for state prediction)
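One way to picture a gating network that tests each module's capacity for state prediction is to weight modules by how well they predicted the observed state. This is a hedged sketch only: the Gaussian scoring of squared prediction error, the width `sigma`, and the function name are assumptions chosen for illustration, in the spirit of mixture-of-experts gating, and are not taken from the talk.

```python
import math


def responsibilities(predictions, observed, sigma=0.5):
    """Weight each module by how well it predicted the observed state."""
    scores = []
    for predicted in predictions:
        squared_error = sum((p - o) ** 2 for p, o in zip(predicted, observed))
        scores.append(math.exp(-squared_error / (2.0 * sigma ** 2)))
    total = sum(scores)
    if total == 0.0:                      # every module was far off: fall back to uniform
        return [1.0 / len(scores)] * len(scores)
    return [score / total for score in scores]


if __name__ == "__main__":
    observed_state = [1.0, 0.0]
    module_predictions = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
    weights = responsibilities(module_predictions, observed_state)
    for index, weight in enumerate(weights, start=1):
        print(f"module {index}: responsibility {weight:.3f}")
```

The module with the smallest prediction error receives the largest share of control, and the weights sum to one.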
slide # 45 / 59
Autonomous robotics
Methods
2. hand-tuned (independent of modules' performance)
[Figure labels: visual perceptions, categorization, reward.]
slide # 46 / 59
Autonomous robotics
Methods
3. unsupervised categorization (Self-Organizing Maps)
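Unsupervised categorization with a Self-Organizing Map can be sketched as follows. This is a minimal illustration, assuming a one-dimensional map of a few units trained on toy two-dimensional inputs with a shrinking Gaussian neighbourhood. The map size, decay schedule, toy data and function names are all hypothetical, and how the talk's model feeds visual perceptions into the map or assigns categories to modules is not specified here.

```python
import math
import random


def best_unit(weights, x):
    """Index of the map unit whose weight vector is closest to input x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(samples, n_units=4, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a 1-D Self-Organizing Map and return its unit weight vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs                 # decays from 1 toward 0
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.1)
        for x in rng.sample(samples, len(samples)): # shuffled pass over the data
            winner = best_unit(weights, x)
            for i, w in enumerate(weights):
                # units near the winner on the map are pulled along with it
                h = math.exp(-((i - winner) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


if __name__ == "__main__":
    data_rng = random.Random(1)
    samples = [[data_rng.gauss(c, 0.05), data_rng.gauss(c, 0.05)]
               for c in (0.0, 1.0) for _ in range(20)]
    som = train_som(samples)
    print("category of [0, 0]:", best_unit(som, [0.0, 0.0]))
    print("category of [1, 1]:", best_unit(som, [1.0, 1.0]))
```

Each input is assigned the index of its best-matching unit, which could then serve as the category used to select a module.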
slide # 47 / 59
Autonomous robotics
Methods
4. random robot
slide # 48 / 59
Autonomous robotics
Results
[Figure label: average.]
slide # 49 / 59
Autonomous robotics
Results
Nb of iterations required (average performance during the second half of the experiment):
1. gating network: 3,500
2. hand-tuned: 94
3. unsupervised categorization (SOM): 404
4. random robot: 30,000
slide # 51 / 59
Discussion
Contributions
• Critic-like reward anticipation in the ventral striatum
• Coordinating multiple modules with SOM
slide # 52 / 59
Discussion
Contributions
• Critic-like reward anticipation in the ventral striatum
• Coordinating multiple modules with SOM
• Prediction: dopamine signal for missing final drop
slide # 53 / 59
Discussion
Contributions
• Critic-like reward anticipation in the ventral striatum
• Coordinating multiple modules with SOM
• Prediction: dopamine signal for missing final drop
Perspectives
• Vary intervals between droplet rewards
• Integrate action values (Samejima et al., 2005)
• Improve the model based on other robotics multi-module reinforcement learning methods (Uchibe et al., 2004; Brunskill et al., 2006)
slide # 54 / 59
The Actor-Critic model
Actor-Critic models
Dopaminergic neuron
[Figure: Actor-Critic diagram; reward predictions P = 1, P = 0, P = 0, P = 1; rewards r = 0 and r = 1; labels: L, E; annotations: "11", "11".]
slide # 55 / 59
Model-based reinforcement learning
[Figure: reward predictions P = 1, P = 0, P = 0, P = 1; rewards r = 0 and r = 1.]
slide # 56 / 59
General discussion
[Figure: taxonomy of navigation strategies along two axes.]
Strategy dimension: Visual (cue-guided strategy) vs. Place (place strategy)
Action selection process (Daw et al., 2005): Model-free vs. Model-based
Model-based: flexible, rapidly learned (cognitive map; Action-outcome contingencies)
Model-free: inflexible, slow to acquire (Stimulus-Response associations)
Place recognition-triggered response: Trullier et al. (1997)
Cue-guided strategy: Dickinson and Balleine (1998)
slide # 57 / 59
General discussion
Reinterpret inconsistent behavioral results:
• spatial more rapidly acquired than cue-guided (Packard and McGaugh, 1996)
• cue-guided more rapidly acquired than spatial (Pych et al., 2005)
Evidence for involvement of the prefronto-striatal system in model-based strategies:
• In mPFC: A-O contingencies (Mulder et al., 2003), spatial goals (Hok et al., 2005)
• Lesions of the striatum impair model-based strategies (Kelley et al., 1997; Corbit et al., 2001; Yin et al., 2005)
slide # 58 / 59
Perspective
EC Project ICEA (Integrating Cognition, Emotion and Autonomy)
Bioinspired interfaces for assessing new hypotheses
Neurophysiological experiments, LPPA
Autonomous robotics, LIP6/ISIR
Webots software, (c) Wany Robotics
Klusters software, (c) L. Hazan in Buzsáki's lab
slide # 59 / 59
Collaborators
Thesis advisors: Agnès Guillot, Sidney I. Wiener
LPPA Collège de France: Alain Berthoz, Benoît Girard, Adrien Peyrache, Karim Benchenane
IDIAP Research Institute: Ricardo Chavarriaga
ISIR, Université Paris 6: Jean-Arcady Meyer, Laurent Dollé, Louis-Emmanuel Martinet, Olivier Sigaud
Universiteit van Amsterdam: Francesco P. Battaglia, Antonius B. Mulder
Toyama Faculty of Food nutrition: Eichi Tabuchi