3G3: Introduction to neuroscience — Classical conditioning, 11 March 2008 http://www.gatsby.ucl.ac.uk/~lmate/teaching
PAVLOVIAN CONDITIONING
Ivan Pavlov, Nobel Prize 1904
before training: CS → no response; US → response
training: CS+US
after training: CS → response
CS: bell; US: food; response: salivation
prediction!
THE RESCORLA-WAGNER RULE
$r = \sum_i s_i w_i$
r: response (prediction of US)
s_i: CS_i (0: absent, 1: present)
w_i: ‘weight’ (degree of association of CS_i with US)
$E = (u - r)^2 = \left(u - \sum_i s_i w_i\right)^2$
E: error
u: US (0: absent, 1: present)
minimise error wrt. weights (‘stochastic gradient descent’):
$-\frac{\partial E}{\partial w_i} \propto \underbrace{(u - r)}_{\delta} \, s_i$
δ: signed prediction error
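The proportionality follows from one application of the chain rule. The steps are written out here as a sketch; the symbol δ for the prediction error is a notational assumption, and the constant factor of 2 is dropped because it can be absorbed into the learning speed.

```latex
\begin{align*}
\frac{\partial E}{\partial w_i}
  &= \frac{\partial}{\partial w_i}\Bigl(u - \sum_j s_j w_j\Bigr)^2
   = -2\,\Bigl(u - \sum_j s_j w_j\Bigr)\, s_i
   = -2\,(u - r)\, s_i, \\
-\frac{\partial E}{\partial w_i}
  &= 2\,(u - r)\, s_i \;\propto\; \delta\, s_i,
  \qquad \delta = u - r .
\end{align*}
```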
on each trial, update:
$w_i \to w_i + \epsilon \, \delta \, s_i$, with $\delta = u - r$
ε: learning speed, ≪ 1
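The per-trial update can be sketched in a few lines of Python. This is an illustration rather than code from the lecture: the helper name `rw_update` and the learning speed of 0.1 are assumptions chosen for the example.

```python
def rw_update(w, s, u, eps=0.1):
    """One Rescorla-Wagner trial (hypothetical helper; eps = 0.1 is an assumed value).

    w: list of weights w_i, s: list of 0/1 stimulus indicators s_i,
    u: 0/1 indicator of the US. Returns (new weights, r, delta).
    """
    r = sum(si * wi for si, wi in zip(s, w))   # prediction of the US
    delta = u - r                               # signed prediction error
    w_new = [wi + eps * delta * si for si, wi in zip(s, w)]
    return w_new, r, delta


# One trial with CS1 present, CS2 absent, and the US delivered.
w, r, delta = rw_update([0.0, 0.0], s=[1, 0], u=1)
print(w, r, delta)
```

Only the weight of a stimulus that was present on the trial changes, because the update is multiplied by s_i.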
PAVLOVIAN CONDITIONING REVISITED
before training: CS → no response; US → response
training: CS+US
after training: CS → response
[Figure: US, CS1, r, δ (prediction error) and w1 against trial (0 to 50); vertical axis 0 to 1]
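A minimal simulation of acquisition under the Rescorla-Wagner rule, offered as a sketch: the function name `simulate`, the learning speed of 0.1 and the zero initial weight are assumptions, not values taken from the slide.

```python
def simulate(trials, n_cs=1, eps=0.1):
    """Run the Rescorla-Wagner rule over (s, u) trials; eps = 0.1 is an assumed value."""
    w = [0.0] * n_cs
    history = []  # (r, delta, weights after the update) for each trial
    for s, u in trials:
        r = sum(si * wi for si, wi in zip(s, w))
        delta = u - r
        w = [wi + eps * delta * si for si, wi in zip(s, w)]
        history.append((r, delta, list(w)))
    return history


acquisition = [((1,), 1)] * 50      # CS and US paired on every trial
history = simulate(acquisition)
w1_final = history[-1][2][0]
print(round(w1_final, 4))
```

With these assumptions the weight after n paired trials is 1 − (1 − ε)^n, so it approaches 1 while the prediction error decays towards 0.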
PAVLOVIAN EXTINCTION
before training: CS → no response; US → response
training, phase 1: CS+US
training, phase 2: CS
after training: CS → no response
[Figure: US, CS1, r, δ (prediction error) and w1 against trial (0 to 100); vertical axis −1 to 1]
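Extinction can be sketched by following paired trials with CS-only trials. The phase lengths of 50 trials and the learning speed of 0.1 are assumptions for illustration, as is the helper name `simulate`.

```python
def simulate(trials, n_cs=1, eps=0.1):
    """Run the Rescorla-Wagner rule over (s, u) trials; eps = 0.1 is an assumed value."""
    w = [0.0] * n_cs
    history = []  # (r, delta, weights after the update) for each trial
    for s, u in trials:
        r = sum(si * wi for si, wi in zip(s, w))
        delta = u - r
        w = [wi + eps * delta * si for si, wi in zip(s, w)]
        history.append((r, delta, list(w)))
    return history


phase1 = [((1,), 1)] * 50           # CS+US
phase2 = [((1,), 0)] * 50           # CS alone
history = simulate(phase1 + phase2)
w_after_phase1 = history[49][2][0]
first_extinction_delta = history[50][1]
w_final = history[-1][2][0]
print(round(w_after_phase1, 4), round(first_extinction_delta, 4), round(w_final, 4))
```

The prediction error turns negative as soon as the US is omitted, and the weight decays back towards 0.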
PARTIAL REINFORCEMENT
before training: CS → no response; US → response
training: CS, CS+US
after training: CS → weak response
[Figure: US, CS1, r, δ (prediction error) and w1 against trial (0 to 50); vertical axis −1 to 1]
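A sketch of partial reinforcement, in which the US follows the CS on only some trials. The reinforcement probability of 0.5, the learning speed of 0.05, the seed and the number of trials (far more than the 50 in the figure, to make the average stable) are all assumptions for illustration.

```python
import random


def simulate(trials, n_cs=1, eps=0.1):
    """Run the Rescorla-Wagner rule over (s, u) trials; eps is an assumed value."""
    w = [0.0] * n_cs
    history = []  # (r, delta, weights after the update) for each trial
    for s, u in trials:
        r = sum(si * wi for si, wi in zip(s, w))
        delta = u - r
        w = [wi + eps * delta * si for si, wi in zip(s, w)]
        history.append((r, delta, list(w)))
    return history


rng = random.Random(0)              # fixed seed, for reproducibility only
p_us = 0.5                          # assumed probability that the US follows the CS
trials = [((1,), 1 if rng.random() < p_us else 0) for _ in range(2000)]
history = simulate(trials, eps=0.05)
late_w = [h[2][0] for h in history[1000:]]
mean_w = sum(late_w) / len(late_w)
print(round(mean_w, 2))
```

The weight does not settle on a single value: it fluctuates around the reinforcement probability, which corresponds to the weak response after training.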
OVERSHADOWING
before training: CS1 → no response; CS2 → no response; US → response
training: CS1+CS2+US
after training: CS1 → weak response; CS2 → weak response
[Figure: US, CS1, CS2, r, δ (prediction error), w1 and w2 against trial (0 to 50); vertical axis 0 to 1]
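Overshadowing can be sketched by presenting two stimuli together on every trial. The sketch assumes both stimuli share the same learning speed (0.1, an assumed value), so they end up sharing the prediction equally; the helper name `simulate` is hypothetical.

```python
def simulate(trials, n_cs=1, eps=0.1):
    """Run the Rescorla-Wagner rule over (s, u) trials; eps = 0.1 is an assumed value."""
    w = [0.0] * n_cs
    history = []  # (r, delta, weights after the update) for each trial
    for s, u in trials:
        r = sum(si * wi for si, wi in zip(s, w))
        delta = u - r
        w = [wi + eps * delta * si for si, wi in zip(s, w)]
        history.append((r, delta, list(w)))
    return history


training = [((1, 1), 1)] * 50       # CS1+CS2+US on every trial
history = simulate(training, n_cs=2)
w1, w2 = history[-1][2]
print(round(w1, 3), round(w2, 3))
```

The combined prediction w1 + w2 approaches 1, so either stimulus presented alone predicts only about half of the US.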
BLOCKING
before training: CS1 → no response; CS2 → no response; US → response
training, phase 1: CS1+US
training, phase 2: CS1+CS2+US
after training: CS1 → response; CS2 → no response
[Figure: US, CS1, CS2, r, δ (prediction error), w1 and w2 against trial (0 to 100); vertical axis 0 to 1]
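A sketch of blocking: CS1 is trained alone first, and CS2 is added only once CS1 already predicts the US. Phase lengths of 50 trials and a learning speed of 0.1 are assumptions; the helper name `simulate` is hypothetical.

```python
def simulate(trials, n_cs=1, eps=0.1):
    """Run the Rescorla-Wagner rule over (s, u) trials; eps = 0.1 is an assumed value."""
    w = [0.0] * n_cs
    history = []  # (r, delta, weights after the update) for each trial
    for s, u in trials:
        r = sum(si * wi for si, wi in zip(s, w))
        delta = u - r
        w = [wi + eps * delta * si for si, wi in zip(s, w)]
        history.append((r, delta, list(w)))
    return history


phase1 = [((1, 0), 1)] * 50         # CS1+US
phase2 = [((1, 1), 1)] * 50         # CS1+CS2+US
history = simulate(phase1 + phase2, n_cs=2)
w1, w2 = history[-1][2]
print(round(w1, 4), round(w2, 4))
```

Because the prediction error is already close to 0 when CS2 first appears, there is almost nothing left to drive learning about CS2.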
SECONDARY CONDITIONING
before training: CS1 → no response; CS2 → no response; US → response
training, phase 1: CS1+US
training, phase 2: CS1+CS2
after training: CS2 → response
[Figure: US, CS1, CS2, r, δ (prediction error), w1 and w2 against trial (0 to 100); vertical axis −1 to 1]
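Running the same protocol through the Rescorla-Wagner rule gives a result that differs from the behavioural outcome listed above. In phase 2 the US is absent while CS1 predicts it, so the prediction error is negative and the weight of CS2 is pushed below zero. The sketch assumes 50 trials per phase and a learning speed of 0.1; the helper name `simulate` is hypothetical.

```python
def simulate(trials, n_cs=1, eps=0.1):
    """Run the Rescorla-Wagner rule over (s, u) trials; eps = 0.1 is an assumed value."""
    w = [0.0] * n_cs
    history = []  # (r, delta, weights after the update) for each trial
    for s, u in trials:
        r = sum(si * wi for si, wi in zip(s, w))
        delta = u - r
        w = [wi + eps * delta * si for si, wi in zip(s, w)]
        history.append((r, delta, list(w)))
    return history


phase1 = [((1, 0), 1)] * 50         # CS1+US
phase2 = [((1, 1), 0)] * 50         # CS1+CS2, no US
history = simulate(phase1 + phase2, n_cs=2)
w1, w2 = history[-1][2]
print(round(w1, 4), round(w2, 4))
```

Under these assumptions the two weights end up roughly equal and opposite, so the rule as simulated here predicts inhibition by CS2 rather than a conditioned response to it.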
NEURAL SUBSTRATE: DOPAMINE
only metabotropic receptors (e.g. acting on adenylyl cyclase)
• drugs: cocaine, amphetamine → high dopamine levels
• disorders: schizophrenia, Parkinson’s disease, ADHD
• implicated in self-stimulation, addiction
• modulates LTP
DOPAMINE = PREDICTION ERROR ?
[Figure: activity of dopaminergic cells in the ventral tegmental area. Panel labels: before learning (US); after learning (US, no US).]
NATURE REVIEWS | NEUROSCIENCE, VOLUME 1, DECEMBER 2000, p. 201
The phasic activation of dopamine neurons has a time course of tens of milliseconds. It is possible that this facet of the physiology of these neurons describes only one aspect of dopamine function in the brain. Feeding, drinking, punishment, stress and social behaviour result in a slower modification of the central dopamine level, which occurs over seconds and minutes as measured by electrophysiology, in vivo dialysis and voltammetry [5,7,37,38]. So, the dopamine system might act at several different timescales in the brain from the fast, restricted signalling of reward and some attention-inducing stimuli to the slower processing of a range of positive and negative motivational events. The tonic gating of a large variety of motor, cognitive and motivational processes that are disrupted in Parkinson’s disease are also mediated by central dopamine systems.
Neurons that respond to the delivery of rewards are also found in brain structures other than the dopamine system described above. These include the striatum (caudate nucleus, putamen, ventral striatum including the nucleus accumbens) [39–44], subthalamic nucleus [45], pars reticulata of the substantia nigra [46], dorsolateral and orbital prefrontal cortex [47–51], anterior cingulate cortex [52], amygdala [53,54], and lateral hypothalamus [55]. Some reward-detecting neurons can discriminate between liquid and solid food rewards (orbitofrontal cortex [56]), determine the magnitude of rewards (amygdala [57]) or distinguish between rewards and punishers (orbitofrontal cortex [58]). Neurons that detect rewards are more common in the ventral striatum than in the caudate nucleus and putamen [40]. Reward-discriminating neurons in the lateral hypothalamus and the secondary taste area of the orbitofrontal cortex decrease their response to a particular food upon satiation [59,60]. By contrast, neurons in the primary taste area of the orbitofrontal cortex continue to respond during satiety and thus seem to encode taste identity rather than reward value [61].
Most of the reward responses described above occur in well-trained monkeys performing familiar tasks, regardless of the predictive status of the reward. Some neurons in the dorsolateral and orbital prefrontal cortex respond preferentially to rewards that occur unpredictably outside the context of the behavioural task [50,51,62,63] or during the reversal of reward associations to visual stimuli [58]. However, neurons in these structures do not project in a widespread, divergent fashion to multiple postsynaptic targets and thus do not seem to be able to exert a global reinforcing influence similar to that described for the dopamine neurons [29].
Other neurons in the cortical and subcortical structures mentioned above respond to conditioned reward-predicting visual or auditory stimuli [41,51,53,58,64–66], and discriminate between reward-predicting and non-reward-predicting stimuli [27,51]. Neurons within the orbitofrontal cortex discriminate between visual stimuli that predict different liquid or food rewards but show few relationships to spatial and visual stimulus features [56]. Neurons in the amygdala differentiate between the visual aspects of foods and their responses decreasing with selective satiety [53].
[Figure 2, panels a to c: perievent time histograms and raster displays. Labels: movement onset, food box, resting key, touch food/wire, pictures on, lever touch, reward; time axes in seconds.]
Figure 2 | Primate dopamine neurons respond to rewards and reward-predicting stimuli.
The food is invisible to the monkey but the monkey can touch the food by placing its hand
underneath the protective cover. The perievent time histogram of the neuronal impulses is
shown above the raster display, in which each dot denotes the time of a neuronal impulse in
reference to movement onset (release of resting key). Each horizontal line represents the activity
of the same neuron on successive trials, with the first trials presented at the top and the last
trials at the bottom of the raster display. a | Touching food reward in the absence of stimuli that
predict reward produces a brief increase in firing rate within 0.5 s of movement initiation.
b | Touching a piece of apple (top) enhances the firing rate but touching the bare wire or an
inedible object that the monkey had previously encountered does not. The traces are aligned to
a temporal reference point provided by touching the hidden object (vertical line). (Modified from
REF. 18.) c | Dopamine neurons encode an error in the temporal prediction of reward. The firing
rate is depressed when the reward is delayed beyond the expected time-point (1 s after lever
touch). The firing rate is enhanced at the new time of reward delivery whether it is delayed (1.5 s)
or precocious (0.5 s). The three arrows indicate, from left to right, the time of precocious,
habitual and delayed reward delivery. The original trial sequence is from top to bottom. Data are
from a two-picture discrimination task. (Figure modified with permission from REF. 27 © (1998)
Macmillan Magazines Ltd.).
© 2000 Macmillan Magazines Ltd