
3G3: Introduction to neuroscience — Classical conditioning, 11 March 2008 http://www.gatsby.ucl.ac.uk/~lmate/teaching

PAVLOVIAN CONDITIONING

Ivan Pavlov, Nobel Prize 1904

before training: CS → no response; US → response
training: CS+US
after training: CS → response





CS: bell
US: food
response: salivation

prediction!




THE RESCORLA-WAGNER RULE

$$r = \sum_i s_i w_i$$

• $r$: response (prediction of US)
• $s_i$: CS$_i$ (0: absent, 1: present)
• $w_i$: ‘weight’ of CS$_i$ (degree of association of CS$_i$ with US)

$$E = (u - r)^2 = \Big(u - \sum_i s_i w_i\Big)^2$$

• $E$: error
• $u$: US (0: absent, 1: present)

minimise error wrt. weights (‘stochastic gradient descent’):

$$-\frac{\partial E}{\partial w_i} \propto \underbrace{(u - r)}_{\delta}\, s_i$$

• $\delta$: signed prediction error
• $\epsilon \ll 1$: learning speed

on each trial, update:

$$w_i \to w_i + \epsilon\, \delta\, s_i$$
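The per-trial update above can be sketched in a few lines. This is a minimal illustration, not the lecturer's code: the function name `rw_trial` and the learning speed `eps = 0.1` are assumptions chosen for the example.

```python
def rw_trial(w, s, u, eps=0.1):
    """Apply one Rescorla-Wagner update; return (new_w, r, delta).

    w: list of weights, s: list of 0/1 CS indicators, u: 0/1 US indicator.
    eps is the learning speed (0.1 is an assumed value).
    """
    r = sum(si * wi for si, wi in zip(s, w))   # prediction r = sum_i s_i w_i
    delta = u - r                              # signed prediction error
    new_w = [wi + eps * delta * si for wi, si in zip(w, s)]
    return new_w, r, delta

# One CS, present (s = 1), paired with the US (u = 1), starting from w = 0.
w, r, delta = rw_trial([0.0], [1], 1)
print(w, r, delta)   # prints: [0.1] 0.0 1.0
```

Only the weights of stimuli that are present on the trial change, because the update is multiplied by the 0/1 indicator of each CS.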




PAVLOVIAN CONDITIONING REVISITED

[Figure: x-axis: Trial (0 to 50); y-axis: 0 to 1; legend: US, CS1, r, δ, w1]
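A hedged sketch of how such an acquisition curve could be generated with the Rescorla-Wagner rule. The helper `simulate` and the learning speed `eps = 0.1` are assumptions for illustration; the 50 trials match the trial axis of the figure.

```python
def simulate(trials, n_cs, eps=0.1):
    """Run the Rescorla-Wagner rule over a list of (s, u) trials.

    Returns the weight vector recorded after every trial.
    eps = 0.1 is an assumed learning speed.
    """
    w = [0.0] * n_cs
    history = []
    for s, u in trials:
        r = sum(si * wi for si, wi in zip(s, w))             # prediction
        delta = u - r                                        # prediction error
        w = [wi + eps * delta * si for wi, si in zip(w, s)]  # weight update
        history.append(list(w))
    return history

trials = [([1], 1)] * 50              # CS1 + US on every trial
w1 = [h[0] for h in simulate(trials, n_cs=1)]
print(round(w1[0], 3), round(w1[-1], 3))
```

With a single CS that is always paired with the US, the recursion has the closed form w1 = 1 - (1 - eps)^n after n trials, so the weight rises monotonically towards 1 (about 0.995 after 50 trials with eps = 0.1).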




PAVLOVIAN EXTINCTION

before training: CS → no response; US → response
phase 1: CS+US
phase 2: CS
after training: CS → no response

[Figure: x-axis: Trial (0 to 100); y-axis: -1 to 1; legend: US, CS1, r, δ, w1]
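Extinction under the Rescorla-Wagner rule can be sketched as follows. The split into 50 trials per phase and the learning speed `eps = 0.1` are assumptions (the figure only shows a 0 to 100 trial axis); `simulate` is a hypothetical helper.

```python
def simulate(trials, n_cs, eps=0.1):
    """Rescorla-Wagner rule; returns (weights per trial, errors per trial)."""
    w = [0.0] * n_cs
    weights, errors = [], []
    for s, u in trials:
        r = sum(si * wi for si, wi in zip(s, w))
        delta = u - r                       # negative when the US is omitted
        w = [wi + eps * delta * si for wi, si in zip(w, s)]
        weights.append(list(w))
        errors.append(delta)
    return weights, errors

phase1 = [([1], 1)] * 50    # CS + US (assumed 50 trials)
phase2 = [([1], 0)] * 50    # CS alone (assumed 50 trials)
weights, errors = simulate(phase1 + phase2, n_cs=1)
print(round(weights[49][0], 3), round(weights[99][0], 3), round(errors[50], 3))
```

In phase 2 the US is absent but still predicted, so the prediction error is negative and the weight decays back towards 0. This is consistent with the figure's vertical axis extending down to -1.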




PARTIAL REINFORCEMENT

before training: CS → no response; US → response
training: CS, CS+US
after training: CS → weak response

[Figure: x-axis: Trial (0 to 50); y-axis: -1 to 1; legend: US, CS1, r, δ, w1]
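A minimal sketch of partial reinforcement, assuming the US follows the CS at random on half of the trials. The slide does not state the probability, so `p_us = 0.5` is an assumption, as are the seed and the learning speed `eps = 0.1`.

```python
import random

def rw_update(w, u, eps=0.1):
    """One trial with a single CS that is always present (s = 1)."""
    return w + eps * (u - w)

rng = random.Random(0)    # fixed seed so the sampled run is repeatable
p_us = 0.5                # assumed probability that the US follows the CS
w, w_mean = 0.0, 0.0
for _ in range(50):
    u = 1 if rng.random() < p_us else 0
    w = rw_update(w, u)                 # one sampled run
    w_mean = rw_update(w_mean, p_us)    # expected weight (update is linear in u)
print(round(w, 3), round(w_mean, 3))
```

Because the update is linear in u, the expected weight follows the same recursion with u replaced by its mean, and it converges to `p_us` rather than to 1. A single run fluctuates around that value, which corresponds to the weak response after training.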




OVERSHADOWING

before training: CS1 → no response; CS2 → no response; US → response
training: CS1+CS2+US
after training: CS1 → weak response; CS2 → weak response

[Figure: x-axis: Trial (0 to 50); y-axis: 0 to 1; legend: US, CS1, CS2, r, δ, w1, w2]
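Overshadowing can be sketched by presenting two stimuli together under the Rescorla-Wagner rule. The helper `train` and the learning speed `eps = 0.1` are assumptions for illustration, and both stimuli are given the same learning speed.

```python
def train(w, s, u, n_trials, eps=0.1):
    """Repeat the same (s, u) trial n_trials times; return updated weights."""
    w = list(w)
    for _ in range(n_trials):
        delta = u - sum(si * wi for si, wi in zip(s, w))   # shared error
        w = [wi + eps * delta * si for wi, si in zip(w, s)]
    return w

compound = train([0.0, 0.0], s=[1, 1], u=1, n_trials=50)   # CS1 + CS2 + US
alone = train([0.0], s=[1], u=1, n_trials=50)              # single CS + US
print([round(x, 3) for x in compound], round(alone[0], 3))
```

Both stimuli share one prediction error, so learning stops once w1 + w2 reaches 1. With equal learning speeds each weight ends near 0.5, compared with nearly 1 for a stimulus trained alone, which corresponds to the weak response to each CS.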




BLOCKING

phase 1: CS1+US
phase 2: CS1+CS2+US
after training: CS1 → response; CS2 → no response

[Figure: x-axis: Trial (0 to 100); y-axis: 0 to 1; legend: US, CS1, CS2, r, δ, w1, w2]
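Blocking follows from the same rule, as this sketch illustrates. The 50 trials per phase and the learning speed `eps = 0.1` are assumed values, and `train` is a hypothetical helper.

```python
def train(w, s, u, n_trials, eps=0.1):
    """Repeat the same (s, u) trial n_trials times; return updated weights."""
    w = list(w)
    for _ in range(n_trials):
        delta = u - sum(si * wi for si, wi in zip(s, w))
        w = [wi + eps * delta * si for wi, si in zip(w, s)]
    return w

w = train([0.0, 0.0], s=[1, 0], u=1, n_trials=50)   # phase 1: CS1 + US
w = train(w, s=[1, 1], u=1, n_trials=50)            # phase 2: CS1 + CS2 + US
w1, w2 = w
print(round(w1, 3), round(w2, 3))
```

After phase 1, CS1 already predicts the US almost perfectly, so the prediction error in phase 2 is close to zero and CS2 gains almost no weight.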




SECONDARY CONDITIONING

phase 1: CS1+US
phase 2: CS1+CS2
after training: CS2 → response

[Figure: x-axis: Trial (0 to 100); y-axis: -1 to 1; legend: US, CS1, CS2, r, δ, w1, w2]
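Running the Rescorla-Wagner rule on this two-phase protocol is instructive. The sketch below assumes 50 trials per phase and a learning speed `eps = 0.1`; `train` is a hypothetical helper.

```python
def train(w, s, u, n_trials, eps=0.1):
    """Repeat the same (s, u) trial n_trials times; return updated weights."""
    w = list(w)
    for _ in range(n_trials):
        delta = u - sum(si * wi for si, wi in zip(s, w))
        w = [wi + eps * delta * si for wi, si in zip(w, s)]
    return w

w = train([0.0, 0.0], s=[1, 0], u=1, n_trials=50)   # phase 1: CS1 + US
w = train(w, s=[1, 1], u=0, n_trials=50)            # phase 2: CS1 + CS2, no US
w1, w2 = w
print(round(w1, 3), round(w2, 3))
```

In phase 2 the US is absent while CS1 predicts it, so the prediction error is negative and both weights are pushed down until w1 + w2 is about 0. Under these assumptions w2 ends near -0.5, so the rule in this form does not reproduce the CS2 → response outcome listed above. A negative weight is consistent with the figure's vertical axis extending down to -1.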




NEURAL SUBSTRATE: DOPAMINE

only metabotropic receptors (e.g. acting on adenylyl cyclase)

• drugs: cocaine, amphetamine → high dopamine levels

• disorders: schizophrenia, Parkinson’s disease, ADHD

• implicated in self-stimulation, addiction

• modulates LTP




DOPAMINE = PREDICTION ERROR ?

[Figure: activity of dopaminergic cells in the ventral tegmental area. Panels: before learning (US); after learning (US; no US)]

NATURE REVIEWS | NEUROSCIENCE VOLUME 1 | DECEMBER 2000 | 201


The phasic activation of dopamine neurons has a time course of tens of milliseconds. It is possible that this facet of the physiology of these neurons describes only one aspect of dopamine function in the brain. Feeding, drinking, punishment, stress and social behaviour result in a slower modification of the central dopamine level, which occurs over seconds and minutes as measured by electrophysiology, in vivo dialysis and voltammetry [5,7,37,38]. So, the dopamine system might act at several different timescales in the brain, from the fast, restricted signalling of reward and some attention-inducing stimuli to the slower processing of a range of positive and negative motivational events. The tonic gating of a large variety of motor, cognitive and motivational processes that are disrupted in Parkinson’s disease is also mediated by central dopamine systems.

Neurons that respond to the delivery of rewards are also found in brain structures other than the dopamine system described above. These include the striatum (caudate nucleus, putamen, ventral striatum including the nucleus accumbens) [39–44], subthalamic nucleus [45], pars reticulata of the substantia nigra [46], dorsolateral and orbital prefrontal cortex [47–51], anterior cingulate cortex [52], amygdala [53,54], and lateral hypothalamus [55]. Some reward-detecting neurons can discriminate between liquid and solid food rewards (orbitofrontal cortex [56]), determine the magnitude of rewards (amygdala [57]) or distinguish between rewards and punishers (orbitofrontal cortex [58]). Neurons that detect rewards are more common in the ventral striatum than in the caudate nucleus and putamen [40]. Reward-discriminating neurons in the lateral hypothalamus and the secondary taste area of the orbitofrontal cortex decrease their response to a particular food upon satiation [59,60]. By contrast, neurons in the primary taste area of the orbitofrontal cortex continue to respond during satiety and thus seem to encode taste identity rather than reward value [61].

Most of the reward responses described above occur in well-trained monkeys performing familiar tasks, regardless of the predictive status of the reward. Some neurons in the dorsolateral and orbital prefrontal cortex respond preferentially to rewards that occur unpredictably outside the context of the behavioural task [50,51,62,63] or during the reversal of reward associations to visual stimuli [58]. However, neurons in these structures do not project in a widespread, divergent fashion to multiple postsynaptic targets and thus do not seem to be able to exert a global reinforcing influence similar to that described for the dopamine neurons [29].

Other neurons in the cortical and subcortical structures mentioned above respond to conditioned reward-predicting visual or auditory stimuli [41,51,53,58,64–66], and discriminate between reward-predicting and non-reward-predicting stimuli [27,51]. Neurons within the orbitofrontal cortex discriminate between visual stimuli that predict different liquid or food rewards but show few relationships to spatial and visual stimulus features [56]. Neurons in the amygdala differentiate between the visual aspects of foods, and their responses decrease with selective satiety [53].

[Figure 2, panels a to c. Labels: movement onset; food box; resting key; touch food/wire; pictures on; lever touch; reward. Time axes in seconds; scale bar 200 ms.]

Figure 2 | Primate dopamine neurons respond to rewards and reward-predicting stimuli. The food is invisible to the monkey but the monkey can touch the food by placing its hand underneath the protective cover. The perievent time histogram of the neuronal impulses is shown above the raster display, in which each dot denotes the time of a neuronal impulse in reference to movement onset (release of resting key). Each horizontal line represents the activity of the same neuron on successive trials, with the first trials presented at the top and the last trials at the bottom of the raster display. a | Touching food reward in the absence of stimuli that predict reward produces a brief increase in firing rate within 0.5 s of movement initiation. b | Touching a piece of apple (top) enhances the firing rate but touching the bare wire or an inedible object that the monkey had previously encountered does not. The traces are aligned to a temporal reference point provided by touching the hidden object (vertical line). (Modified from REF. 18.) c | Dopamine neurons encode an error in the temporal prediction of reward. The firing rate is depressed when the reward is delayed beyond the expected time-point (1 s after lever touch). The firing rate is enhanced at the new time of reward delivery whether it is delayed (1.5 s) or precocious (0.5 s). The three arrows indicate, from left to right, the time of precocious, habitual and delayed reward delivery. The original trial sequence is from top to bottom. Data are from a two-picture discrimination task. (Figure modified with permission from REF. 27 © (1998) Macmillan Magazines Ltd.).

© 2000 Macmillan Magazines Ltd