Human Reward / Stimulus / Response Signal Experiment: Data and Analysis
Draws on:
• Alan and Bill’s experiment
• Usher & McClelland model and experiments
• Patrick Simen’s model
• Sam and Phil’s analysis
• Juan’s further analysis
Human experiment examining the reward bias effect, with a response signal given at different times after target onset
• Target stimuli are rectangles shifted 1, 3, or 5 pixels L or R of fixation.
• Reward cue occurs 750 msec before stimulus.
– Small arrowhead pointing L or R, visible for 250 msec.
– Only biased reward conditions (2 vs 1 and 1 vs 2) are used.
• Response signal occurs at different times after target onset (msec):
0, 75, 150, 225, 300, 450, 600, 900, 1200, 2000
• Participant receives reward only if the response is correct and occurs within 250 msec of the response signal.
• Participants were run for 15-25 sessions to provide stable data.
• Data shown are from later sessions in which effects were all stable.
A participant with very little reward bias
• Top panel shows the probability of choosing the response that gives the larger reward, as a function of actual response time, for combinations of:
– Stimulus shift (1, 3, or 5 pixels)
– Reward-stimulus compatibility
• Lower panel shows the data transformed to z scores, and corresponds to the theoretical construct:
$$ z(t) = \frac{\mathrm{mean}(x_1(t) - x_2(t)) + \mathrm{bias}(t)}{\mathrm{sd}(x_1(t) - x_2(t))} $$
where x1 represents the state of the accumulator associated with the greater reward, x2 the same for the lesser reward, and the subject (S) is thought to choose the larger reward if x1(t) - x2(t) + bias(t) > 0.
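The z-score transform used for the lower panel can be sketched as follows. This is a minimal illustration with Python's standard-library `statistics.NormalDist`; the function name `z_score` and the optional clipping of extreme proportions are assumptions added here, not part of the original analysis.

```python
from statistics import NormalDist

def z_score(p_high, n_trials=None):
    """Inverse-normal transform of P(choose larger reward)."""
    if n_trials is not None:
        # Assumed safeguard: keep proportions of exactly 0 or 1 finite
        lo = 0.5 / n_trials
        p_high = min(max(p_high, lo), 1.0 - lo)
    return NormalDist().inv_cdf(p_high)

# Chance performance maps to z = 0; higher probabilities map to larger z.
for p in (0.5, 0.75, 0.95):
    print(p, round(z_score(p), 3))
```

On this scale a constant reward bias appears as a vertical offset between curves, which is easier to read than the same effect on the bounded probability scale.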
Participants Showing Reward Bias
Analysis Assumptions
• Decision variable x varies as a function of t.
• Choice is made at some time t = signal lag + rt.
• At the time the choice is made:
– For a single difficulty level, there are two distributions, with means +μ and -μ and equal sd σ set to 1. Choose high reward if decision variable x > -Xc.
– For three difficulty levels, fixed σ = 1, means ±μi (i = 1, 2, 3); assume the same Xc for all difficulty levels.
– Xc can be regarded as a positive increment to the state of the decision variable; high reward is chosen if x > 0 in this case.
[Figure: distributions of the decision variable centered at -μ and +μ, with the criterion marked at -Xc.]
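Only one diff level:
$$ Z_1 = \mathrm{invNorm}(P(H \mid H)) = \frac{\mu + X_c}{\sigma}, \qquad Z_2 = \mathrm{invNorm}(P(H \mid L)) = \frac{-\mu + X_c}{\sigma} $$
$$ \frac{\mu}{\sigma} = \frac{Z_1 - Z_2}{2}, \qquad \frac{X_c}{\sigma} = \frac{Z_1 + Z_2}{2} $$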
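For a single difficulty level, the mean and the criterion can be recovered from two choice probabilities. The sketch below assumes σ = 1 and that P(H|H) and P(H|L) are the probabilities of choosing the high-reward response when the stimulus favors the high- or low-reward side; the function name is hypothetical.

```python
from statistics import NormalDist

def bias_and_mean(p_h_given_h, p_h_given_l):
    """Return (mu, Xc) in units of sigma from the two choice probabilities."""
    inv = NormalDist().inv_cdf
    z1 = inv(p_h_given_h)   # Z1 = mu + Xc
    z2 = inv(p_h_given_l)   # Z2 = -mu + Xc
    mu = (z1 - z2) / 2.0
    xc = (z1 + z2) / 2.0
    return mu, xc

# Illustrative probabilities only, not data from the experiment
mu, xc = bias_and_mean(0.93, 0.31)
print(round(mu, 3), round(xc, 3))
```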
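Three diff levels:
$$ Z_{i1} = \mathrm{invNorm}(P(H \mid H_i)) = \frac{\mu_i + X_c}{\sigma}, \qquad Z_{i2} = \mathrm{invNorm}(P(H \mid L_i)) = \frac{-\mu_i + X_c}{\sigma} $$
$$ \frac{\mu_i}{\sigma} = \frac{Z_{i1} - Z_{i2}}{2}, \qquad \frac{X_c}{\sigma} = \frac{\sum_i (Z_{i1} + Z_{i2})}{2 \cdot 3} $$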
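With several difficulty levels, each level gives its own mean while the criterion is pooled across levels. A minimal sketch, again assuming σ = 1; the function name and the list-based interface are assumptions.

```python
from statistics import NormalDist

def bias_and_means_multi(p_h_given_h, p_h_given_l):
    """Per-level means mu_i and one pooled criterion Xc (units of sigma).

    p_h_given_h[i], p_h_given_l[i]: P(choose high reward) at difficulty
    level i when the stimulus favors the high / low reward side.
    """
    inv = NormalDist().inv_cdf
    z1 = [inv(p) for p in p_h_given_h]
    z2 = [inv(p) for p in p_h_given_l]
    mus = [(a - b) / 2.0 for a, b in zip(z1, z2)]
    # Average of (Z_i1 + Z_i2) / 2 over levels, i.e. sum / (2 * n_levels)
    xc = sum(a + b for a, b in zip(z1, z2)) / (2.0 * len(z1))
    return mus, xc

# Illustrative probabilities for three levels, not data from the experiment
mus, xc = bias_and_means_multi([0.75, 0.88, 0.96], [0.55, 0.35, 0.16])
print([round(m, 3) for m in mus], round(xc, 3))
```

Pooling the criterion reflects the stated assumption that the same Xc applies at every difficulty level.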
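Subject’s sensitivity, as defined in the theory of signal detectability:
$$ d'_i = \frac{\mu_i - (-\mu_i)}{\sigma} = \frac{2\mu_i}{\sigma} $$
When response signal delay varies, sensitivity becomes a function of time, $d'_i(t)$.
For each subject, fit with the function from UM’01:
$$ d'_{i,\mathrm{fit}}(t) = d'_{i,\mathrm{asym}} \left(1 - e^{-(t - t_0)/\tau}\right) $$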
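The fitted sensitivity function can be evaluated as below. This sketch assumes the shifted-exponential form with an asymptote, an onset time `t0`, and a time constant `tau`; setting d' to 0 before `t0` is an additional assumption, and the parameter values are illustrative only.

```python
import math

def d_prime_fit(t, d_asym, t0, tau):
    """Shifted-exponential approach of d' to its asymptote."""
    if t <= t0:
        return 0.0   # assumed: no sensitivity before the onset time
    return d_asym * (1.0 - math.exp(-(t - t0) / tau))

# Hypothetical parameters: asymptote 2.0, onset 0.3 s, time constant 0.3 s
for t in (0.2, 0.4, 0.8, 2.0):
    print(t, round(d_prime_fit(t, 2.0, 0.3, 0.3), 3))
```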
Subject Sensitivity
[Figure: three panels, one per participant (cm, ja, sl), showing d prime as a function of RT + response cue delay; data and fits for diff = 5, 3, and 1.]
[Figure: three panels showing fitted parameters, including d'asym, as a function of stimulus (diff) level for participants cm, ja, and sl.]
Optimal “bias” Xc/σ, based on observed sensitivity data.
Observed “bias”, treated as a positive offset favoring the response associated with high reward:
$$ \frac{X_c}{\sigma} = \frac{\sum_i (Z_{i1} + Z_{i2})}{2 \cdot 3} $$
[Figure: decision-variable plot with the criterion marked at -Xc/σ.]
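One way to obtain an optimal normalized criterion from observed sensitivities is to maximize expected reward numerically. The sketch below is an illustration under stated assumptions, not the analysis actually used: rewards of 2 and 1, both stimulus directions and all difficulty levels equally likely, σ = 1, and a simple grid search. All function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

def expected_reward(xc, d_primes, r_high=2.0, r_low=1.0):
    """Expected reward per trial for criterion xc (choose high if x > -xc)."""
    total = 0.0
    for d in d_primes:
        mu = d / 2.0
        p_correct_high = PHI(mu + xc)          # stimulus favors high reward
        p_correct_low = 1.0 - PHI(-mu + xc)    # stimulus favors low reward
        total += 0.5 * r_high * p_correct_high + 0.5 * r_low * p_correct_low
    return total / len(d_primes)

def optimal_xc(d_primes, upper=10.0, steps=10000):
    """Grid search for the reward-maximizing criterion on [0, upper]."""
    grid = (upper * k / steps for k in range(steps + 1))
    return max(grid, key=lambda xc: expected_reward(xc, d_primes))

# With a single level the optimum should be close to log(2) / d'.
print(optimal_xc([2.0]), math.log(2.0) / 2.0)
# Hypothetical sensitivities for three difficulty levels
print(optimal_xc([0.5, 1.0, 2.0]))
```

The single-level check follows from setting the derivative of expected reward to zero, which gives exp(d' Xc) = 2. Lower sensitivity therefore calls for a larger optimal bias.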
[Figure: three panels, one per participant (cm, ja, sl), showing normalized threshold xc/σ as a function of RT + response cue delay; real vs. optimal.]
Some possible models
• OU process (λ < 0, σ0 = 0) following F&H, with reward bias effect implemented as:
1. An alteration in initial condition, subject to decay
2. Optimal time-varying decision boundary outside of the OU process
3. An input ‘current’ starting at presentation of reward signal
3.1. Noise from reward onset
3.2. Noise from stimulus onset
4. A constant offset or criterion shift unaffected by time
1. Reward as a change in initial condition, subject to decay
Note:
1. Effect of the bias decays away for λ < 0.
2. There is a dip at $t = \frac{1}{\lambda} \log \frac{aC + \lambda \mu_0}{aC}$.
3. At t = 0, p = 1.
[Figure: P of choice toward larger reward vs. time (s), for RSC 1 and RSC 0 at diff 5, 3, and 1.]
Feng & Holmes notes
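$$ \mu(C,t) = \mu_0 e^{\lambda t} + \frac{aC}{\lambda}\left(e^{\lambda t} - 1\right); \qquad v(t) = \frac{\sigma^2}{2\lambda}\left(e^{2\lambda t} - 1\right) $$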
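The behavior listed in the notes for model 1 can be checked numerically. The sketch below assumes an OU process with leak λ < 0, stimulus input aC (positive when the stimulus favors the high-reward side), noise strength σ, initial condition μ0 set by the reward cue, and no starting point variability; the symbol names and parameter values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def ou_mean(t, aC, lam, mu0):
    """Mean of the decision variable when reward sets the initial condition."""
    return mu0 * math.exp(lam * t) + (aC / lam) * (math.exp(lam * t) - 1.0)

def ou_var(t, lam, sigma):
    """Variance of the decision variable with no starting point variability."""
    return sigma ** 2 / (2.0 * lam) * (math.exp(2.0 * lam * t) - 1.0)

def p_high(t, aC, lam, mu0, sigma):
    """P(choose larger reward) at time t > 0, i.e. P(x > 0)."""
    z = ou_mean(t, aC, lam, mu0) / math.sqrt(ou_var(t, lam, sigma))
    return NormalDist().cdf(z)

def dip_time(aC, lam, mu0):
    """Time at which mean / sd is smallest, from d/dt (mean / sd) = 0."""
    return math.log((aC + lam * mu0) / aC) / lam

# Hypothetical parameter values
aC, lam, mu0, sigma = 1.0, -2.0, 0.3, 1.0
t_dip = dip_time(aC, lam, mu0)
for t in (0.05, t_dip, 2.0):
    print(f"t = {t:.3f} s, P = {p_high(t, aC, lam, mu0, sigma):.3f}")
```

Because the variance is zero at t = 0 while the mean is positive, the choice probability starts at 1, falls to its minimum at the dip time, and then recovers toward the stimulus-driven asymptote.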
2. Time-varying optimal bias (Outside of OU process)
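Note:
1. Effect of the bias persists.
2. There is a dip at $t = \frac{1}{\lambda} \log \frac{4a^2C^2 + \lambda\sigma^2 \log 2}{4a^2C^2 - \lambda\sigma^2 \log 2}$.
3. At t = 0, p = 1.
4. The smaller the stimulus effect, the larger the bias.
5. The harder the stimulus condition, the later the dip.
$$ b(t) = \frac{\sigma^2 \log 2}{4aC}\left(1 + e^{\lambda t}\right) $$
$$ \mu(C,t) = b(t) + \frac{aC}{\lambda}\left(e^{\lambda t} - 1\right); \qquad v(t) = \frac{\sigma^2}{2\lambda}\left(e^{2\lambda t} - 1\right) $$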
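The time-varying bias of model 2 can be related to the usual likelihood-ratio criterion for a 2:1 reward ratio, which is variance times log 2 divided by twice the stimulus-driven mean. The sketch below checks that relationship and the dip time numerically. Symbol names (aC, λ, σ) and parameter values are assumptions for illustration.

```python
import math

def stim_mean(t, aC, lam):
    """Stimulus-driven mean of the OU process (zero initial condition)."""
    return (aC / lam) * (math.exp(lam * t) - 1.0)

def ou_var(t, lam, sigma):
    """Variance of the OU process with no starting point variability."""
    return sigma ** 2 / (2.0 * lam) * (math.exp(2.0 * lam * t) - 1.0)

def optimal_bias(t, aC, lam, sigma):
    """Time-varying offset b(t) for a 2:1 reward ratio."""
    return sigma ** 2 * math.log(2.0) / (4.0 * aC) * (1.0 + math.exp(lam * t))

def z_value(t, aC, lam, sigma):
    """(b(t) + stimulus mean) / sd, the z score of choosing high reward."""
    m = optimal_bias(t, aC, lam, sigma) + stim_mean(t, aC, lam)
    return m / math.sqrt(ou_var(t, lam, sigma))

def dip_time(aC, lam, sigma):
    k = lam * sigma ** 2 * math.log(2.0)
    return math.log((4.0 * aC ** 2 + k) / (4.0 * aC ** 2 - k)) / lam

# Hypothetical parameter values; a smaller aC stands for a harder stimulus
lam, sigma = -2.0, 1.0
for aC in (1.0, 0.6):
    print(aC, round(dip_time(aC, lam, sigma), 3),
          round(optimal_bias(0.5, aC, lam, sigma), 3))
```

In this sketch the smaller stimulus input gives both a larger bias and a later dip, in line with notes 4 and 5.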
[Figure: P of choice toward larger reward vs. time (s), for RSC 1 and RSC 0 at diff 5, 3, and 1.]
3.1. Reward acts as input “current”, stays on from reward signal to end of trial, noise starts at reward onset
Reward signal comes T seconds before stimulus.
Note:
1. Effect of the bias persists.
2. There is no dip.
3. At t = 0, p < 1.
Feng & Holmes notes
[Figure: P of choice toward larger reward vs. time (s), for RSC 1 and RSC 0 at diff 5, 3, and 1.]
3.2. Same as 3.1 but variability is introduced only at stimulus onset
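Note:
1. Effect of the bias persists.
2. There is a dip at $t = \frac{1}{\lambda} \log \frac{aC + b e^{\lambda T}}{aC + b}$, where b is the reward input and T is the interval between reward signal and stimulus onset.
3. At t = 0, p = 1, since all accumulators have no variance.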
[Figure: P of choice toward larger reward vs. time (s), for RSC 1 and RSC 0 at diff 5, 3, and 1.]
4. Reward as a constant offset
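Note:
1. Equivalent to 3.2 for a large interval T between reward signal and stimulus onset.
2. There is a dip at $t = \frac{1}{\lambda} \log \frac{aC}{aC - \lambda \mu_0}$.
3. At t = 0, p = 1.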
[Figure: P of choice toward larger reward vs. time (s), for RSC 1 and RSC 0 at diff 5, 3, and 1.]
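$$ \mu(C,t) = \mu_0 + \frac{aC}{\lambda}\left(e^{\lambda t} - 1\right); \qquad v(t) = \frac{\sigma^2}{2\lambda}\left(e^{2\lambda t} - 1\right) $$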
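The stated equivalence between the constant offset (model 4) and the reward input current with noise from stimulus onset (model 3.2) can be illustrated by comparing the two mean trajectories. The sketch assumes linear OU dynamics, a reward input b that is switched on T seconds before the stimulus and stays on, and deterministic integration before stimulus onset. Function names and parameter values are hypothetical.

```python
import math

def mean_constant_offset(t, aC, lam, mu0):
    """Model 4: constant offset mu0 plus the stimulus-driven mean."""
    return mu0 + (aC / lam) * (math.exp(lam * t) - 1.0)

def mean_input_current(t, aC, b, lam, T):
    """Model 3.2: reward input b on from T seconds before stimulus onset."""
    x0 = (b / lam) * (math.exp(lam * T) - 1.0)   # state reached at stimulus onset
    return x0 * math.exp(lam * t) + ((aC + b) / lam) * (math.exp(lam * t) - 1.0)

# Hypothetical parameter values
aC, b, lam = 1.0, 0.4, -2.0
mu0 = -b / lam   # offset that the input current settles to when lam < 0
for T in (0.75, 50.0):
    gap = max(abs(mean_input_current(t, aC, b, lam, T)
                  - mean_constant_offset(t, aC, lam, mu0))
              for t in (0.1, 0.5, 2.0))
    print(f"T = {T}: largest difference in mean = {gap:.6f}")
```

With a long interval the reward input has already settled at -b/λ when the stimulus appears, so it behaves like a fixed offset. With a short interval the two trajectories differ early in the trial.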
Some possible models
• OU models (λ < 0, σ0 = 0) following F&H, with reward bias effect implemented as:
1. An alteration in initial condition, subject to decay
2. Optimal time-varying decision boundary outside of the OU process
3. An input ‘current’ starting at presentation of reward signal
3.1. Noise from reward onset
3.2. Noise from stimulus onset
4. A constant offset or criterion shift unaffected by time
• While none fit perfectly, starting point variability (σ0 > 0) would potentially improve 3.2 and 4.
Jay’s favorite mechanistic story (draws from Simen’s model)
• Participant learns to inject waves of activation that prime response accumulators; waves peak just after stimulus onset and have a residual.
– Wave is higher for the high-reward response.
• Stimulus activation accumulates as in LCAM.
• Response signal initiates added drive to both accumulators equally.
• First accumulator to reach a fixed threshold initiates the response.