RUNNING HEAD: Reinforcement learning and aversive emotion

Title: Reinforcement learning in the context of aversive emotional psychophysiological stimuli

Author: Isabela Lara Uquillas

Collaboration: Chih-Chung Ting

Supervisor: Jan Engelmann

Ethics approval: Economics & Business Ethics Committee (University of Amsterdam)

EC20170314120328

Sponsor: Center for Research in Experimental Economics and political Decision-making (CREED) at the University of Amsterdam

Disclosure of interest: The author reports no conflicts of interest

Word count: 5759

Reinforcement learning and anxiety

Decision making is a complex process that takes place in every individual; decisions are ubiquitous and can significantly influence one's life. Decision making has been widely studied because of its relevance to everyday life and its prevalence in humans and other organisms. The importance of decision making and its mechanisms has been discussed in the literature since at least 1954 (Edwards, 1954). Despite this long-standing interest, however, little is known about the effect of incidental emotions on the decision-making process.

Consequentialist economic models assume that incidental emotions do not influence choices; recently, however, there has been growing interest in the effect of incidental emotions on the decision-making process (Loewenstein & Lerner, 2003; Blanchette, 2010).

One of the most popular models of decision making, proposed by Rangel and colleagues (2008), describes it as a continuous multi-step process consisting of actions and their evaluation. Only recently have experiments in neuroeconomics begun to identify the neural mechanisms underlying emotional distortions of choice processes (Fehr & Rangel, 2011). Evidence suggests that multiple areas and pathways are involved (Basten et al., 2010; Wyart et al., 2012; Rushworth et al., 2012; Phelps et al., 2014) and that they are further affected by multiple emotional and cognitive components (Heilman et al., 2010; Starcke & Brand, 2012; Payzan-LeNestour et al., 2013; Mitchell, 2011). Furthermore, these areas and processes are affected by the individual's pre-existing values and sensitivity to reward and punishment, that is, their capacity to distinguish between these choice domains and learn from them (Hee Kim et al., 2015).

Many paradigms have been used to observe decision making in action, each of which reveals not only the decision itself but also the different processes and components that come into play at different stages of that process (Yu, 2015). To focus on one particular aspect of the choice process, namely learning from the outcomes of our decisions to update future expectations, a reinforcement learning task was selected and adapted from Palmintieri and colleagues (2006). In this task, subjects learn to associate neutral stimuli with a specific (high or low) probability of monetary reward or punishment. This is important because learning through rewards and learning through punishments have been shown to be differently affected by emotional manipulations (Engelmann et al., 2015; Cavanagh et al., 2011); this task therefore allows us to compare decisions made in a gain (reward) domain and in a loss (punishment) domain.

Additionally, we aim to examine the effect of incidental emotions on decision making. Researchers have implemented emotion induction techniques in a variety of ways (Lench et al., 2011); however, many of these can only be quantified through self-report and may induce several co-existing emotions. Restricting research to a specific, validated induced emotion can help to unravel the different variables at play in this complex process. Here, incidental emotion is therefore operationalized as incidental anticipatory anxiety evoked by the threat of electrical stimulation to the forearm through a threat-of-shock protocol (Schmitz & Grillon, 2012). To evaluate the effect of this manipulation, skin conductance will be measured and modeled.

Based on this literature, the proposed study aims to determine to what extent incidental emotions affect specific decision-making processes. To better understand the distorting influence of emotion on the cognitive processes involved in decision making, this research examines the effect of incidental anxiety on learning by combining methods from experimental economics and psychology. Learning will be evaluated through an instrumental learning task involving decisions over cues associated with monetary gains and losses, performed under incidental anxiety induction as in previous work (Engelmann et al., 2015). This combination of task and emotion induction paradigm allows us to measure not only the effect of incidental anxiety on decision making, but also how people evaluate options under different conditions and how each outcome updates subsequent choices, thereby improving the chances of maximizing gains.

For this purpose, participants will attend a lab session after completing a battery of questionnaires. During the lab session, they will perform several tasks while we record their choices, their reaction times for each choice and their skin conductance throughout the experiment, which will allow us to observe the effect of induced anxiety and its potential effect on decision-making processes. Additionally, assessments will take place after the task has been completed to determine whether participants were able to deduce the outcomes associated with the options presented to them during the task.


Ultimately, participants are expected to deduce the associations between stimuli and their beneficial or detrimental outcomes, to improve their decisions during the task, and to report their preference for each option both implicitly and explicitly after the task. Furthermore, anxiety, operationalized as threat of shock, is expected to interact with domain to influence learning. We do not hypothesize about the direction of this interaction: there is evidence that anxiety could enhance performance in an emotion-congruent manner (Robinson et al., 2013), as well as findings suggesting that the attentional bias associated with anxiety would be detrimental to performance (White et al., 2015; Petzold et al., 2010). Moreover, emotion-congruent learning has been observed after positive affect manipulations, so an effect of similar magnitude but opposite direction could be observed in negative affect conditions such as anxiety (Carpenter et al., 2013). Behaviorally, participants are expected to maximize gains more successfully in the control (baseline) condition than in the induced anxiety condition. We further expect skin conductance response modeling to validate our anxiogenic manipulation, such that participants will show higher skin conductance responses in the incidental anxiety blocks than in the neutral affect blocks (Bach & Friston, 2013). Finally, we expect participants' implicit and explicit preference reports for particular stimuli to be less accurate in the incidental anxiety condition and more negative towards stimuli with negative outcomes; because these reports should reflect the predictions used during the task itself, they should correlate with the gains and losses experienced.

Method

Participants

Forty-two students from the participant pool of the Center for Research in Experimental Economics and political Decision-making (CREED) at the University of Amsterdam participated in the study in exchange for monetary compensation. All participants were right-handed, with no history of psychiatric disorders and no electronic implants; the sample consisted of 20 men (47.6%) and 22 women (52.4%) with an average age of 24.21 years (SD=3.14). One participant was excluded from all further analyses because their session was interrupted by a Windows 10 update reminder.


Materials

All tasks were presented on an LED screen (1366x768 px) using Cogent 2000. Participants responded with the keyboard according to each task's instructions, using the spacebar, Enter and arrow keys. Shocks for the emotion induction paradigm (i.e., threat of shock) were delivered with a DS5 bipolar constant current stimulator through a wrist electrode attached with Velcro, and were triggered by the accompanying MATLAB code. Skin conductance was recorded with a custom-made amplifier and a pair of sintered Ag/AgCl finger electrodes purchased from the University of Amsterdam's Technical Support Social & Behavioural Sciences (TOP) department. Skin conductance recording settings and specifications were handled through a Vsrrp98 V10.0 XML driver, which converted the data to MATLAB format. Due to time constraints and the scope of this report, skin conductance results are not discussed further here. Symbols used in the learning task were drawn from the Agathodaimon font, as in the original task (Palmintieri, 2006). For further details regarding the materials and methods, refer to the Supplementary Materials section.

Measures

Questionnaires were sent to participants before the lab session so that they could complete them in their own time. The instructions encouraged them to respond to all questionnaires in a single sitting and without distractions. In return for completing all parts of the questionnaire, participants received a 10-euro endowment for the behavioral task in the second part of the experiment, which was performed in the lab.

Demographics. Participants were asked a few basic questions, including age, gender, study background, handedness, history of mental illness and presence of any implanted electronic devices. The latter three were grounds for exclusion from further participation; in that case, the questionnaire ended without presenting the remaining inventories.

PANAS. The Positive and Negative Affect Schedule (PANAS; Watson, Clark, & Tellegen, 1988) measures positive and negative affect over a given time frame. Participants rate a series of adjectives from 1 (very slightly or not at all) to 5 (extremely) according to the extent to which they have felt each one in the given time frame (“At this moment / Last week I felt…”). Specific items are scored to calculate the positive and negative affect subscales.

BDI. The Beck Depression Inventory (BDI; Beck, Steer, & Brown, 1996) is used to determine the presence and severity of depression through a 21-item inventory. For each item, participants select, from four statements of increasing severity, the one with which they most identify. Higher scores indicate more severe depression.

BAI. The Beck Anxiety Inventory (BAI; Beck et al., 1988) assesses anxiety based on the symptoms the participant experiences. It lists 21 common symptoms of anxiety, and participants rate each symptom over the past month on a 4-point Likert-type scale ranging from “Not at all” to “Severely - it bothered me a lot”, scored 0 and 3 respectively; more severe symptoms yield higher scores.

ERQ. The Emotion Regulation Questionnaire (ERQ; Gross & John, 2003) assesses emotion regulation through 10 statements describing how people manage their emotions. Participants indicate their agreement with each statement on a 7-point Likert-type scale from 1 (strongly disagree) to 7 (strongly agree). A higher score corresponds to higher emotion regulation.
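Assuming each instrument is scored as a simple sum of its items (with BDI and BAI items scored 0 to 3), the scoring of the four questionnaires can be sketched as follows. The PANAS adjective lists are the standard ones from Watson et al. (1988); the ERQ is scored here as a single total, matching how it is reported in this document, although it is more commonly split into reappraisal and suppression subscales. The function names are hypothetical and not part of any library.

```python
# Minimal scoring sketch: every instrument is assumed to be a simple item sum.
# Function names are hypothetical.

# Standard PANAS adjective keys (Watson, Clark, & Tellegen, 1988).
PANAS_POSITIVE = ("interested", "excited", "strong", "enthusiastic", "proud",
                  "alert", "inspired", "determined", "attentive", "active")
PANAS_NEGATIVE = ("distressed", "upset", "guilty", "scared", "hostile",
                  "irritable", "ashamed", "nervous", "jittery", "afraid")


def _sum_items(responses, n_items, lo, hi):
    """Validate item count and response range, then return the sum."""
    responses = list(responses)
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(responses)}")
    if any(not lo <= r <= hi for r in responses):
        raise ValueError(f"responses must lie between {lo} and {hi}")
    return sum(responses)


def score_panas(ratings):
    """ratings: dict adjective -> 1..5 for one time frame.
    Returns (positive affect, negative affect), each between 10 and 50."""
    pa = _sum_items((ratings[w] for w in PANAS_POSITIVE), 10, 1, 5)
    na = _sum_items((ratings[w] for w in PANAS_NEGATIVE), 10, 1, 5)
    return pa, na


def score_bdi(items):
    """21 items, each assumed scored 0..3; total between 0 and 63."""
    return _sum_items(items, 21, 0, 3)


def score_bai(items):
    """21 symptoms scored 0 ('Not at all') to 3 ('Severely'); total 0 to 63."""
    return _sum_items(items, 21, 0, 3)


def score_erq_total(items):
    """10 statements rated 1..7; single total between 10 and 70."""
    return _sum_items(items, 10, 1, 7)
```

For example, a participant who rates every PANAS adjective as 3 obtains 30 on both subscales, and out-of-range responses raise a `ValueError` rather than being silently summed.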

Tasks

Several tasks were used during the experiment; they are described below (Figures 1 and 2).

Calibration Task. To implement the threat-of-shock paradigm, a calibration round was performed for each participant. The script prompted participants to press a key to receive a shock, which they then rated on an on-screen scale from 1 (not painful at all) to 10 (extremely painful). Shock intensity was adjusted according to the participant's subjective ratings and ranged from 2.5mA to 25mA in steps of 2.5mA. After a participant had rated three consecutive shocks as 7 or higher, the screen closed and their responses were recorded. The last shock intensity rated as 7 or higher was used in the subsequent learning task.
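The stopping rule of the calibration can be expressed as a short function. This is a sketch under stated assumptions, not the study's MATLAB script: it takes the sequence of (intensity in mA, rating) pairs a participant produced, checks them against the 2.5 to 25 mA range and 2.5 mA step described above, and returns the index of the shock that completed three consecutive ratings of 7 or higher together with its intensity. How intensity was escalated between shocks is not specified in the text, so the function does not model it; the name `calibration_result` is hypothetical.

```python
def calibration_result(shocks, threshold=7, n_consecutive=3,
                       min_ma=2.5, max_ma=25.0, step_ma=2.5):
    """Apply the calibration stopping rule to one participant's shocks.

    shocks: sequence of (intensity_mA, rating) pairs in delivery order,
            with ratings on the 1-10 scale.
    Returns (index_of_stopping_shock, intensity_mA) once `n_consecutive`
    shocks in a row are rated >= threshold, or None if that never happens.
    """
    run = 0
    for i, (intensity, rating) in enumerate(shocks):
        steps = intensity / step_ma
        if not (min_ma <= intensity <= max_ma and abs(steps - round(steps)) < 1e-9):
            raise ValueError(f"intensity {intensity} mA is outside "
                             f"{min_ma}-{max_ma} mA or off the {step_ma} mA grid")
        if not 1 <= rating <= 10:
            raise ValueError(f"rating {rating} is outside the 1-10 scale")
        if rating >= threshold:
            run += 1
            if run == n_consecutive:
                # Last shock rated >= threshold: the intensity carried forward.
                return i, intensity
        else:
            run = 0  # a rating below threshold breaks the streak
    return None


shocks = [(2.5, 2), (5.0, 4), (7.5, 6), (10.0, 7), (10.0, 8), (12.5, 7)]
print(calibration_result(shocks))  # (5, 12.5)
```

Returning `None` rather than raising when the criterion is never met leaves it to the caller to decide how an incomplete calibration should be handled.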


Practice Task. Participants performed a short practice version of the main experimental task, which allowed them to become familiar with the task setup and to ask any questions before the main task began. Because the practice only served to familiarize participants with the learning task itself, no shocks were delivered and no monetary gains or losses were accrued. Moreover, the symbols differed from those in the main task so that there would be no carryover effects.

Learning Task. The main experimental task follows a reinforcement learning paradigm in which two symbols are presented to the left and right of a fixation cross. Participants had 2.5 seconds to choose one side using the arrow keys, after which they received feedback corresponding to the chosen symbol. If no symbol was chosen within the time limit, they received the outcome of the detrimental option without being shown which symbol it was associated with. By accumulating evidence during the task, participants could learn that some symbols generally lead to beneficial outcomes and others to detrimental ones (Figure 3). Trials were divided into gain-domain and loss-domain blocks. In the gain domain, the neutral outcome is unfavorable because it represents no gain, whereas in the loss domain the neutral outcome is optimal because it avoids a monetary loss. In addition, some blocks included shocks whereas others did not; this was signaled by a distinctive green or blue frame surrounding the task display throughout the experiment and by a text message before each block. Each symbol pair was assigned to either the gain or the loss domain and to either the shock or the safe condition, resulting in four symbol pairs. Before the task started, participants were informed only about the shock and safe conditions. Each block consisted of 3 trials, with 8 blocks per condition presented in pseudorandomized order. In total, a full round of the reinforcement learning task included 96 trials, 24 trials per condition. Participants' responses, reaction times and money earned or lost were recorded.
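The block and trial structure described above can be made concrete with a short simulation sketch. It assumes, hypothetically, that pseudorandomization means shuffling the 32 blocks (8 per condition) so that no condition occurs in more than two consecutive blocks; the actual constraint is not reported. It also uses the 75%/25% probabilities of success and the 0.50 euro stakes reported in the post-task analyses and payment scheme, treating the "better" symbol of a pair as the one that succeeds 75% of the time. Missed responses are not modeled, and all function and field names are illustrative.

```python
import random
from itertools import product

DOMAINS = ("gain", "loss")
EMOTIONS = ("shock", "safe")
BLOCKS_PER_CONDITION = 8
TRIALS_PER_BLOCK = 3
STAKE = 0.50                                   # euros at stake per trial
P_SUCCESS = {"better": 0.75, "worse": 0.25}    # success probability per symbol


def make_block_order(rng, max_run=2):
    """Return 32 (domain, emotion) blocks, 8 per condition, in shuffled order.

    Hypothetical pseudorandomization rule: no condition appears in more than
    `max_run` consecutive blocks (the study's actual constraint is not reported).
    """
    blocks = [cond for cond in product(DOMAINS, EMOTIONS)
              for _ in range(BLOCKS_PER_CONDITION)]
    while True:  # rejection sampling: reshuffle until the rule holds
        rng.shuffle(blocks)
        if all(len(set(blocks[i:i + max_run + 1])) > 1
               for i in range(len(blocks) - max_run)):
            return list(blocks)


def make_schedule(seed=None):
    """Expand the block order into one round of trial records."""
    rng = random.Random(seed)
    schedule = []
    for block_idx, (domain, emotion) in enumerate(make_block_order(rng)):
        for trial_in_block in range(TRIALS_PER_BLOCK):
            schedule.append({"block": block_idx, "trial_in_block": trial_in_block,
                             "domain": domain, "emotion": emotion})
    return schedule


def draw_outcome(domain, symbol, rng):
    """Outcome in euros of choosing the 'better' or 'worse' symbol of a pair.

    Success means winning +0.50 in the gain domain, or avoiding -0.50 in the
    loss domain; failure yields 0 or -0.50, respectively.
    """
    success = rng.random() < P_SUCCESS[symbol]
    if domain == "gain":
        return STAKE if success else 0.0
    return 0.0 if success else -STAKE


schedule = make_schedule(seed=1)
print(len(schedule))  # 96 trials: 4 conditions x 8 blocks x 3 trials
```

Rejection sampling keeps the constraint easy to read; with only 32 blocks and 4 conditions, a valid order is usually found within a few reshuffles.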

Preference Task. Participants were shown pairs of symbols and had to pick the one they preferred in each pair; their responses were recorded. This task involved no shocks, feedback or monetary gains or losses.

Valence Rating Task. Individual symbols were shown and participants rated each one from 1 (very negative) to 10 (very positive). Each symbol was shown 4 times in randomized order, and participants' ratings were recorded. As in the preference task, there were no shocks, feedback or monetary gains or losses.

Exit Questionnaire

Participants completed a questionnaire about the strategies they used in the tasks, along with manipulation checks regarding their emotional state during the task. It also included a recognition task containing symbols that had not been present in the task itself, for which participants indicated whether or not they had seen each symbol before.

Procedure

All procedures were approved by the Economics & Business Ethics Committee at the University of Amsterdam prior to recruitment and data collection. Participants were recruited through the participant pool of the Center for Research in Experimental Economics and political Decision-making (CREED). Before the experimental session, participants filled in a questionnaire battery consisting of screening questions, demographic questions and the questionnaire measures described above (i.e., PANAS, BAI, BDI, ERQ). Additionally, all the symbols to be used in the main experimental task were shown for at least 60 seconds so that participants would be familiar with them. Participants received 10 euros for completing the questionnaires, which served as their endowment and starting balance for the learning task performed during the lab session.

After completing the questionnaire, participants attended individually scheduled sessions. On arrival, they were asked to read and give written consent for participation in the experiment. They were thanked for completing the questionnaire and reminded that doing so had earned them a 10-euro starting balance for the task. Finally, the recording and stimulation electrodes were attached: skin conductance electrodes were placed on the little and ring fingers of the left hand and secured with tape, and the shock electrode was secured to the left wrist with a Velcro band. Participants were given a pillow on which to rest their arm comfortably. To improve conductivity and reduce impedance, all electrodes were applied with conductive gel to skin cleaned with alcohol. Participants were further informed that shocks would be delivered only to the wrist, that the finger electrodes would not deliver any shocks, and that all task responses were to be made with the right hand.

First, participants underwent the threat-of-shock calibration and a practice round of the learning task. Once these were completed, they were reminded that the following task would include shocks, that their final payout would depend on their performance, and that there would be a short self-paced break halfway through the trials. Skin conductance recording then started and participants began the main experimental learning task.

After the task ended, a second calibration round was run to account for habituation and sensitization effects, and the resulting intensity was used for the second round of the learning task. This round used the same task, but with symbols that were new to the participant; participants were told this in the task instructions and again verbally before the task started. They were also reminded that their performance would determine their final payment. As in the previous round, there was one self-paced break halfway through the task. After this round of the learning task, a third calibration was run to detect any further habituation or sensitization.

Once the last calibration was completed, the shock electrode was removed and skin conductance recording was stopped. Participants then completed two assessments of the symbols presented in the second round of the learning task: first a preference task and then a valence rating task. Afterwards, the skin conductance electrodes were removed and participants completed an exit questionnaire, including the recognition task, on a separate laptop while the researcher calculated their payout. Finally, participants were paid and their questions were answered before they left.

Payment Scheme

Payment was calculated from the choices participants made, with each choice outcome contributing directly to the final payout:

Final Payout = 10 euro endowment + gain amount - loss amount + recognition bonus

Here, the gain amount is the sum of all +0.50 euro outcomes the participant obtained, and the loss amount is the total of all -0.50 euro outcomes, counted as a positive amount so that it reduces the payout. In addition, each item correctly recognized in the exit questionnaire earned 0.50 euros, for a maximum bonus of 2 euros from the 4 items presented.
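The payment formula can be written as a small function. In this sketch, `outcomes` is a hypothetical list of per-trial outcomes in euros (+0.50, 0 or -0.50) pooled over both rounds of the learning task, and the loss amount is taken as the total magnitude of the -0.50 outcomes so that subtracting it lowers the payout, as the formula intends. The names are illustrative, not taken from the study's code.

```python
ENDOWMENT = 10.0        # euros earned by completing the online questionnaires
STAKE = 0.50            # size of each monetary gain or loss in the learning task
BONUS_PER_ITEM = 0.50   # recognition bonus per correctly recognized item
N_RECOGNITION_ITEMS = 4


def final_payout(outcomes, n_recognized):
    """Final payout = endowment + gain amount - loss amount + recognition bonus.

    outcomes: per-trial outcomes in euros (+0.50, 0.0 or -0.50), all rounds pooled.
    n_recognized: number of recognition items answered correctly (0-4).
    """
    outcomes = list(outcomes)
    if any(o not in (STAKE, 0.0, -STAKE) for o in outcomes):
        raise ValueError("outcomes must be +0.50, 0.0 or -0.50 euros")
    if not 0 <= n_recognized <= N_RECOGNITION_ITEMS:
        raise ValueError("n_recognized must be between 0 and 4")
    gain_amount = sum(o for o in outcomes if o > 0)
    loss_amount = -sum(o for o in outcomes if o < 0)  # magnitude of losses
    bonus = BONUS_PER_ITEM * n_recognized             # at most 2 euros
    return ENDOWMENT + gain_amount - loss_amount + bonus


print(final_payout([0.5] * 30 + [0.0] * 50 + [-0.5] * 16, n_recognized=4))  # 19.0
```

For this hypothetical participant, 30 gains, 50 neutral outcomes and 16 losses with all four recognition items correct give 10 + 15 - 8 + 2 = 19 euros.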

Results

Results from the main task were analyzed using MATLAB's statistics package, whereas results from the questionnaires and post-task assessments were analyzed using IBM SPSS 24.

Calibration Testing

Participants underwent three calibration rounds. Because of misplaced data, this analysis includes 40 of the 41 participants. The intensity selected in the first and second rounds was used in the subsequent learning task to induce anxiety, whereas the third round served to check for further habituation or sensitization. On average, participants rated their last shock as 7.70 (SD=1.14), 7.83 (SD=0.93) and 7.85 (SD=0.83) in the first, second and third calibration, respectively (Figure 4). A one-way repeated measures ANOVA showed no significant differences in the subjective ratings of the shocks across rounds; F(2,78)=0.471, p=0.471, ηp²=0.012. These ratings corresponded to average intensities of 7.31mA (SD=5.26), 8.31mA (SD=5.29) and 10.06mA (SD=5.70) in the first, second and third calibration, respectively (Figure 4). A second one-way repeated measures ANOVA comparing these intensities revealed a significant difference in this objective measure across rounds; F(2,78)=16.91, p<0.001, ηp²=0.302. This suggests that habituation took place, as the average intensity increased progressively across rounds. Recalibration appears to have compensated for it, however: subjective ratings did not differ significantly across rounds, which suggests that perceived discomfort, and with it the strength of the manipulation, remained stable.
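The one-way repeated measures ANOVA used above partitions variance into condition, subject and residual (error) terms; with n = 40 participants and k = 3 rounds this yields the reported degrees of freedom, (k - 1, (k - 1)(n - 1)) = (2, 78). The minimal pure-Python sketch below illustrates that computation, assuming a complete subjects-by-rounds table with no missing cells and no sphericity correction. It is not the software used for the analysis (the text reports MATLAB and SPSS), and it returns F and partial eta squared but not the p value.

```python
def rm_anova_oneway(data):
    """One-way repeated measures ANOVA on a complete subjects x conditions table.

    data: list of rows, one per subject, each holding k condition values.
    Returns (F, df_conditions, df_error, partial_eta_squared).
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete table with >= 2 subjects and conditions")
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)   # removed from error
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj                   # condition x subject

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    eta_p2 = ss_cond / (ss_cond + ss_error)
    return f_stat, df_cond, df_error, eta_p2


example = [[1, 2, 4], [2, 4, 5], [3, 3, 6]]  # toy data: 3 subjects x 3 rounds
print(rm_anova_oneway(example))
```

On the toy table this gives F = 21 on (2, 4) degrees of freedom, up to floating-point rounding. If SciPy is available, the p value can be obtained from the F distribution with `scipy.stats.f.sf(F, df_cond, df_error)`.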

Questionnaires

Questionnaire results showed that participants reported higher positive affect (M=33.88, SD=7.59) than negative affect (M=21.71, SD=7.04) on the PANAS. On average, BDI scores indicated minimal depression (M=9.20, SD=7.84) and BAI scores indicated a tendency towards minimal state anxiety (M=9.88, SD=8.66). Finally, ERQ scores suggest that participants were, on average, good emotion regulators (M=42.49, SD=6.84).

Reinforcement Learning task

Before analyzing the main experimental task, trials with reaction times that were too fast (<50ms) or too slow (>3s) were removed. A 2x2 repeated measures ANOVA was then used to test the effect of both experimental manipulations on the two measures of the reinforcement learning task, performance and reaction time (RT). The independent variables were the two manipulations defined previously: emotion induction (safe or anxiety) and the decision domain (gain or loss) in which the decision took place. For RT, there was a marginally significant main effect of decision domain (F(1,40)=3.56, p=0.067), with participants responding faster in the gain domain (M=895.05ms, SD=20.42) than in the loss domain (M=1028.50ms, SD=25.14). Emotion induction, on the other hand, had a highly significant main effect on RT (F(1,40)=72.9, p<0.001), with participants responding faster in the anxiety condition (M=950.5ms, SD=44.68) than in the safe condition (M=973.05ms, SD=23.22). There was also a significant main effect of subject (F(40,40)=5.62, p<0.001), indicating high inter-subject variability, as well as a significant interaction between emotion induction and subject on RT (F(40,40)=2.47, p=0.003). All other interactions were not significant (p>0.100). Finally, there were no significant main effects or interactions on participants' performance in the reinforcement learning task (p>0.200).
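The exclusion rule and the 2x2 cell structure above can be sketched as follows. This is a hypothetical illustration: the trial-record field names (`subject`, `domain`, `emotion`, `rt`, `correct`) and the coding of missed responses as `rt=None` are assumptions, not the study's data format. The sketch removes trials faster than 50 ms or slower than 3 s and then averages the remaining trials within each subject x domain x emotion cell, which is the long-format input a repeated measures ANOVA expects.

```python
from collections import defaultdict
from statistics import mean

MIN_RT_MS = 50      # trials faster than this were removed
MAX_RT_MS = 3000    # trials slower than this were removed


def trim_trials(trials):
    """Drop missed responses (rt None, assumed coding) and RTs outside 50-3000 ms."""
    return [t for t in trials
            if t["rt"] is not None and MIN_RT_MS <= t["rt"] <= MAX_RT_MS]


def cell_means(trials, dv="rt"):
    """Mean of `dv` per (subject, domain, emotion) cell: long-format ANOVA input."""
    cells = defaultdict(list)
    for t in trials:
        cells[(t["subject"], t["domain"], t["emotion"])].append(t[dv])
    return {cell: mean(values) for cell, values in cells.items()}


trials = [
    {"subject": 1, "domain": "gain", "emotion": "safe", "rt": 40, "correct": 1},
    {"subject": 1, "domain": "gain", "emotion": "safe", "rt": 900, "correct": 1},
    {"subject": 1, "domain": "gain", "emotion": "safe", "rt": 1100, "correct": 0},
    {"subject": 1, "domain": "loss", "emotion": "shock", "rt": None, "correct": 0},
]
kept = trim_trials(trials)
print(cell_means(kept))                # {(1, 'gain', 'safe'): 1000}
print(cell_means(kept, dv="correct"))  # {(1, 'gain', 'safe'): 0.5}
```

The resulting cell means could then be passed to a repeated measures ANOVA routine, for example `AnovaRM` from `statsmodels.stats.anova`, with domain and emotion as within-subject factors.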

To better understand these relationships, the data were further analyzed with a 2x2x24 ANOVA including emotion induction, domain and trial number as factors. This approach takes trial-by-trial learning, and therefore its variance, into account, while still allowing the effects of the experimental manipulations to be measured and contrasted at different points in the task.

Performance. First, participants' performance was assessed across the emotion induction manipulation, the domain manipulation and trials (Figure 5A). There were no main effects of emotion induction (F(1,7572)=0.36, p=0.55) or domain (F(1,7572)=0.17, p=0.68) on this measure. There was, however, a main effect of trial number (F(40, 7572)=145.11, p<0.001). No pairwise analyses were performed, but visual inspection of the data suggests that performance increased over trials, indicating that learning took place, particularly in the anxiety and gain condition (Figure 5A). Several interactions were significant. The most notable was the interaction between the two experimental manipulations, emotion and domain (F(1,7572)=6.62, p=0.01): anxiety appeared to improve learning in the gain domain (M=0.7713, SD=0.03) but not in the loss domain (M=0.7176, SD=0.02). In addition, inter-individual differences appear to play an important role: although the main effect of subject was non-significant (F(40, 7572)=0.59, p=0.9689), all of its interactions with the tested variables were significant, namely with domain (F(40, 7572)=8.46, p<0.001), emotion induction (F(40,7572)=8.31, p<0.001) and trial number (F(40,7572)=1.5, p=0.022).

Reaction Times. Reaction times were assessed across the same independent variables, namely the emotion, domain and trial number in which the decisions took place (Figure 5B). There was no significant main effect of domain (F(1,7572)=0.92, p=0.343) or of emotion induction (F(1,7572)=0.00, p=0.953). However, there was a highly significant main effect of trial number (F(1,7572)=99.89, p<0.001). As with performance, pairwise comparisons were not performed for this effect; however, visual inspection of the data suggests that reaction times decreased over trials, consistent with learning, particularly in the anxiety and gain condition (Figure 5B). There was also a significant interaction between domain and trial number (F(1, 7572)=14.95, p<0.001). Finally, as for performance, inter-subject differences had a marked effect on this dependent variable: there was a highly significant main effect of subject on reaction time (F(40,7572)=6.18, p<0.001), and all interactions of subject with the independent variables were highly significant as well (Domain: F(40,7572)=3.59, p<0.001; Emotion: F(40,7572)=2.15, p<0.001; Trial: F(40,7572)=3.76, p<0.001).

After the learning task


To analyze the effects of the independent variables on the post-task preference and rating tasks, a 2x2x2 repeated measures MANOVA was used. The independent variables were those defining each symbol: emotion induced (safe or anxiety), decision domain (gain or loss) and probability of success (75% or 25%). Success was operationalized as earning 50 cents in the gain domain or not losing 50 cents in the loss domain. This analysis showed highly significant main effects of domain (F(2,39)=43.64, p<0.001, ηp²=0.69) and of probability of success (F(2,39)=42.396, p<0.001, ηp²=0.685) across dependent measures. Emotion induction did not have a significant main effect overall (F(2,39)=1.25, p=0.30, ηp²=0.06). All other interactions were non-significant (p>0.150). These results are described below for each dependent measure.

Preference Task. For each symbol, the probability of it being chosen during the preference task was compared across the contexts in which it had been presented, namely emotion induction (safe or anxious), decision domain (gain or loss) and probability of success (75% or 25%) (Figure 6). There was a significant main effect of domain (F(1,40)=68.187, p<0.001, ηp²=0.630), with participants more likely to select symbols from the gain domain (M=0.609, SE=0.013) than from the loss domain (M=0.391, SE=0.013). The main effect of probability of success was also significant (F(1,40)=81.057, p<0.001, ηp²=0.670): on average, participants selected symbols with a 75% probability of success more often (M=0.650, SE=0.017) than those with only a 25% probability (M=0.350, SE=0.017). By contrast, the main effect of emotion induction was non-significant (F(1,40)=0.039, p=0.845, ηp²=0.001), as were all interactions (p>0.100).

Rating Task. Similarly, the average rating each participant gave to each symbol was compared across the contexts in which the symbol had been presented during the main task: emotion induction, decision domain and probability of success (Figure 6). There was a highly significant main effect of domain (F(1,40)=82.579, p<0.001, ηp²=0.674), such that symbols presented in the gain domain (M=6.602, SE=0.162) were on average rated more favorably than those presented in the loss domain (M=4.30, SE=0.158). There was also a significant main effect of probability of success (F(1,40)=70.044, p<0.001, ηp²=0.637), with symbols with a 75% probability of success (M=6.541, SE=0.133) rated more favorably than those with only a 25% probability of success (M=4.361, SE=0.188). Once again, the main effect of emotion was non-significant (F(1,40)=1.019, p=0.319, ηp²=0.025). Finally, there were no significant interactions (p>0.100).
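The two post-task measures can be summarized per symbol and then per emotion x domain x probability cell, which is the structure entering the 2x2x2 analyses above. The sketch below assumes hypothetical record formats for one participant: preference trials as `(symbol_a, symbol_b, chosen)` tuples, rating trials as `(symbol, rating)` tuples, and a mapping from each symbol to its (emotion, domain, probability of success) context. The preference score is the proportion of presentations on which a symbol was chosen; the rating score is the mean of its repeated ratings.

```python
from collections import defaultdict
from statistics import mean


def choice_proportions(pair_trials):
    """Proportion of presentations on which each symbol was chosen.

    pair_trials: iterable of (symbol_a, symbol_b, chosen) for one participant.
    """
    shown, chosen = defaultdict(int), defaultdict(int)
    for a, b, pick in pair_trials:
        if pick not in (a, b):
            raise ValueError(f"choice {pick!r} was not in the pair ({a!r}, {b!r})")
        shown[a] += 1
        shown[b] += 1
        chosen[pick] += 1
    return {s: chosen[s] / shown[s] for s in shown}


def mean_ratings(rating_trials):
    """Mean valence rating (1-10) per symbol over its repeated presentations."""
    ratings = defaultdict(list)
    for symbol, rating in rating_trials:
        ratings[symbol].append(rating)
    return {s: mean(r) for s, r in ratings.items()}


def condition_means(per_symbol, symbol_info):
    """Average a per-symbol score within each (emotion, domain, p_success) cell.

    symbol_info: hypothetical mapping symbol -> (emotion, domain, p_success).
    """
    cells = defaultdict(list)
    for symbol, score in per_symbol.items():
        cells[symbol_info[symbol]].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


# Toy example with three symbols, two of which share a cell.
info = {"A": ("safe", "gain", 0.75), "B": ("safe", "gain", 0.25),
        "C": ("safe", "gain", 0.75)}
prefs = choice_proportions([("A", "B", "A"), ("A", "C", "A"), ("B", "C", "C")])
print(prefs)                         # {'A': 1.0, 'B': 0.0, 'C': 0.5}
print(condition_means(prefs, info))  # {('safe', 'gain', 0.75): 0.75, ('safe', 'gain', 0.25): 0.0}
```

The same `condition_means` helper can be applied to the output of `mean_ratings`, so both dependent measures share one aggregation step.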

Exit Questionnaire

The last measure recorded was participants' responses to the exit questionnaire, in which they were asked about their subjective experience during the tasks. Because the questionnaire was implemented late, data are available and analyzed for only 35 of the 41 participants. A 2x7 repeated measures ANOVA compared seven feelings (anxiety, fear, surprise, disgust, sadness, happiness, anger) experienced during the shocks themselves and during the blocks in which the shocks were presented. There was a highly significant main effect of emotion (F(6,204)=12.932, p<0.001, ηp2=0.276). Pairwise comparisons show that the most common emotion was surprise (M=4.26, SD=0.30), followed by anxiety (M=3.91, SD=0.35) and fear (M=3.50, SD=0.35). Anxiety did not differ significantly from surprise or fear, but it was felt significantly more (p<0.05) than disgust (M=2.61, SD=0.33), sadness (M=2.18, SD=0.25), happiness (M=2.01, SD=0.18) and anger (M=2.77, SD=0.32).

There was also a main effect of context, with emotions rated differently for the blocks than for the shocks themselves (F(1,34)=23.74, p<0.001, ηp2=0.41). Furthermore, there was a significant interaction between emotion and context (F(6,204)=3.06, p=0.007, ηp2=0.083), indicating that the size of the block-versus-shock difference varied across emotions; descriptively, all emotions were felt more throughout the block than during the shock itself.
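As a quick consistency check, the partial eta squared values reported in this section can be recovered from each F statistic and its degrees of freedom, since ηp2 = (F × df_effect) / (F × df_effect + df_error). The minimal Python sketch below applies this identity to several of the effects reported in the Results; the function name is hypothetical and not part of the original analysis pipeline.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Uses eta_p^2 = F*df_effect / (F*df_effect + df_error), which follows from
    eta_p^2 = SS_effect / (SS_effect + SS_error) and F = (SS_effect/df_effect) / (SS_error/df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


if __name__ == "__main__":
    # Effects as reported in the Results section
    checks = {
        "Rating task, domain F(1,40)=82.579": (82.579, 1, 40),
        "Exit questionnaire, emotion F(6,204)=12.932": (12.932, 6, 204),
        "Exit questionnaire, interaction F(6,204)=3.06": (3.06, 6, 204),
    }
    for label, (f, df1, df2) in checks.items():
        # prints 0.674, 0.276 and 0.083 respectively
        print(f"{label}: eta_p^2 = {partial_eta_squared(f, df1, df2):.3f}")
```

The same identity can be used to spot reporting errors: if a reported ηp2 does not match the reported F and degrees of freedom, one of the three values is likely mistyped.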

In separate additional questions, participants rated their valence during shock blocks as leaning negative (M=6.88, SD=3.707) and their arousal as leaning towards excited (M=4.59, SD=1.258), using manikin rating scales. They also completed a symbol recognition task, scoring on average 3.86 (SD=0.35) out of 4 possible points. Finally, participants were paid an average of 20.95 euros (SD=4.79) for their participation.

Discussion

Several conclusions can be drawn from these results. First, learning does appear to take place, as evinced by the significant main effect of trial number on both reaction time and accuracy during the learning task. This suggests that participants developed a strategy and responded more efficiently with practice over the course of the task. Furthermore, the assessments after the learning task showed a clear pattern of preference and valence towards the symbols based on their outcomes during the task, which participants were able to report both implicitly (binary preference choice) and explicitly (rating task).

Second, our experimental manipulations appear to have no main effects on the learning task, in either reaction time or accuracy. Preliminary skin conductance response (SCR) analyses (not reported) suggest a significant difference between the anxious and safe conditions. Furthermore, habituation was controlled for through multiple calibrations, none of which showed significant differences in the discomfort participants reported for the shock used throughout the task. Together, these results suggest that our threat of shock protocol did affect participants' alertness and arousal; however, this sympathetic change does not seem to affect learning, or is not strong enough to do so in the current manipulation. Additionally, the lack of a main effect of domain on learning suggests that learning occurs equally well in the gain and loss domains, as previously reported by Guitart-Masip and colleagues (2012).

Despite the lack of main effects of our experimental manipulations, two highly significant interactions are worth noting. In reaction time, there was a highly significant interaction between domain and trial, suggesting that domain affects the trial-by-trial learning taking place during the task. Since there was no main effect of domain on reaction time, this interaction could reflect different learning rates in the two domains; if so, this should become apparent when pairwise comparisons for this interaction are tested. Additionally, performance showed a highly significant interaction between the two manipulations, suggesting that anxiety improves reward learning but not punishment learning. This does not match the emotion-congruent interaction hypothesized earlier, under which anxiety was expected to improve punishment learning rather than reward learning. An alternative explanation is that learning is enhanced by the autonomic arousal changes induced by the anxiety manipulation. Arousal and stress hormones have previously been shown to contribute to improved encoding, which could in turn improve learning (Cahill & Alkire, 2003). However, further testing is required to determine whether this explains the present results.

Further explanations of why our emotion induction paradigm may not have produced the expected results in the decision-making task can be found in the post-test tasks and questionnaire. As the exit questionnaire showed, participants were not particularly anxious during the threat of shock. It would also be worth understanding why some effects were observed in reaction time but not in performance, or vice versa. There is evidence of motor differences between reward- and punishment-related processing and responses (Wrase et al., 2007). Additionally, without further testing it is hard to determine whether shorter reaction times are due to arousal or to learning and practice effects. Notably, reaction times showed an unexpected spike at the beginning of each set of trials. If this spike reflects an attentional artifact of the task, accounting for it could reveal main effects of both domain and emotion. Such an artifact could be further exacerbated by task-switching demands arising from the different experimental manipulations and the strategies used by the participants.

Moreover, further analyses are needed that were not conducted or reported because of the scope of this report, such as skin conductance modeling and the pending pairwise comparisons. Additional analyses such as a median split would make it possible to account for inter-subject variability, which had highly significant main effects and interactions in the main task. Reducing this noise in the data would give a clearer picture of the interaction between the manipulated variables. Another option would be to enter some of the questionnaire scores as covariates, or to divide the questionnaires into their component subscales, for instance the ERQ into its reappraisal and suppression subscales; this in turn would make it easier to choose covariates or regressors for further analyses.

The behavioral evidence collected points towards learning taking place and being affected differently by anxiety and domain in both reaction time and performance. These findings could inform approaches to learning and teaching that make these processes more efficient. They are also consistent with evidence that stress hormones can enhance memory encoding (Cahill & Alkire, 2003), as may be the case in the current study. Furthermore, if extended to imaging studies, this task paradigm could help clarify how the two manipulations interact, both at the level of their neural correlates and at the level of the corresponding processes in reinforcement learning models.
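To make the reinforcement learning framing concrete, the Python sketch below simulates a simple Q-learning (Rescorla-Wagner) learner with softmax choice on one symbol pair of the task, in the spirit of the model used by Pessiglione et al. (2006). It is an illustration under stated assumptions, not the analysis used in this report: the learning rate (alpha), inverse temperature (beta), the 0.50 euro loss magnitude in the loss domain and the zero outcome on non-rewarded or non-punished trials are all hypothetical choices. Only the 75%/25% outcome probabilities, the 0.50 euro gain and the 24 trials per condition come from the task description.

```python
import math
import random
from typing import List, Optional


def softmax_prob_first(q_first: float, q_second: float, beta: float) -> float:
    """Probability of choosing the first of two options under a softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (q_first - q_second)))


def rw_update(q: float, outcome: float, alpha: float) -> float:
    """Rescorla-Wagner update: move q toward the outcome by alpha times the prediction error."""
    prediction_error = outcome - q
    return q + alpha * prediction_error


def simulate_pair(domain: str, n_trials: int = 24, alpha: float = 0.3,
                  beta: float = 10.0, magnitude: float = 0.5,
                  rng: Optional[random.Random] = None) -> List[int]:
    """Simulate one hypothetical learner on one symbol pair.

    Returns a list with 1 for each choice of the good symbol and 0 otherwise.
    """
    if domain not in ("gain", "loss"):
        raise ValueError("domain must be 'gain' or 'loss'")
    if rng is None:
        rng = random.Random()
    q = [0.0, 0.0]                # index 0 = good symbol, index 1 = bad symbol
    p_preferable = [0.75, 0.25]   # probability of the preferable outcome per symbol
    choices: List[int] = []
    for _ in range(n_trials):
        chose_good = rng.random() < softmax_prob_first(q[0], q[1], beta)
        idx = 0 if chose_good else 1
        preferable = rng.random() < p_preferable[idx]
        if domain == "gain":
            outcome = magnitude if preferable else 0.0   # win 0.50 euro or nothing
        else:
            outcome = 0.0 if preferable else -magnitude  # no loss, or lose 0.50 euro (assumed)
        q[idx] = rw_update(q[idx], outcome, alpha)       # only the chosen symbol is updated
        choices.append(1 if chose_good else 0)
    return choices


if __name__ == "__main__":
    rng = random.Random(1)
    n_agents = 1000
    for domain in ("gain", "loss"):
        runs = [simulate_pair(domain, rng=rng) for _ in range(n_agents)]
        accuracy = sum(sum(run) for run in runs) / (n_agents * 24)
        print(f"{domain}: proportion of good-symbol choices = {accuracy:.2f}")
```

In such a model, an effect of anxiety or domain would show up as a change in the fitted alpha or beta for the corresponding condition, which is one way the interactions discussed above could be linked to specific learning parameters.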



References

Bach, D. R., & Friston, K. J. (2013). Model‐based analysis of skin conductance responses:

Towards causal models in psychophysiology. Psychophysiology, 50(1), 15-22.

Basten, U., Biele, G., Heekeren, H. R., & Fiebach, C. J. (2010). How the brain integrates

costs and benefits during decision making. Proceedings of the National Academy of

Sciences, 107(50), 21767-21772.

Blanchette, I., & Richards, A. (2010). The influence of affect on higher level cognition: A

review of research on interpretation, judgement, decision making and

reasoning. Cognition & Emotion, 24(4), 561-595.

Cahill, L., & Alkire, M. T. (2003). Epinephrine enhancement of human memory

consolidation: interaction with arousal at encoding. Neurobiology of Learning and Memory, 79(2), 194-198.

Carpenter, S. M., Peters, E., Västfjäll, D., & Isen, A. M. (2013). Positive feelings facilitate

working memory and complex decision making among older adults. Cognition & Emotion, 27(1), 184-192.

Edwards, W. (1954). The theory of decision making. Psychological Bulletin, 51(4), 380.

Engelmann, J. B., Meyer, F., Fehr, E., & Ruff, C. C. (2015). Anticipatory anxiety disrupts

neural valuation during risky choice. Journal of Neuroscience, 35(7), 3085-3099.

Fehr, E., & Rangel, A. (2011). Neuroeconomic foundations of economic choice—recent

advances. The Journal of Economic Perspectives, 25(4), 3-30.

Guitart-Masip, M., Huys, Q. J., Fuentemilla, L., Dayan, P., Duzel, E., & Dolan, R. J. (2012).

Go and no-go learning in reward and punishment: interactions between affect and

effect. NeuroImage, 62(1), 154-166.

Heilman, R. M., Crişan, L. G., Houser, D., Miclea, M., & Miu, A. C. (2010). Emotion

regulation and decision making under risk and uncertainty. Emotion, 10(2), 257.



Kim, S. H., Yoon, H., Kim, H., & Hamann, S. (2015). Individual differences in sensitivity to

reward and punishment and neural activity during reward and avoidance learning.

Social Cognitive and Affective Neuroscience, 10(9), 1219-1227.

Lench, H. C., Flores, S. A., & Bench, S. W. (2011). Discrete emotions predict changes in

cognition, judgment, experience, behavior, and physiology: a meta-analysis of

experimental emotion elicitations.

Loewenstein, G., & Lerner, J. S. (2003). The role of affect in decision making. In Handbook of Affective Sciences (pp. 619-642).

Mitchell, D. G. (2011). The nexus between decision making and emotion regulation: a review

of convergent neurocognitive substrates. Behavioural Brain Research, 217(1), 215-231.

Payzan-LeNestour, E., Dunne, S., Bossaerts, P., & O’Doherty, J. P. (2013). The neural

representation of unexpected uncertainty during value-based decision making. Neuron,

79(1), 191-201.

Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J., & Frith, C. D. (2006). Dopamine-

dependent prediction errors underpin reward-seeking behaviour in humans. Nature,

442(7106), 1042-1045.

Petzold, A., Plessow, F., Goschke, T., & Kirschbaum, C. (2010). Stress reduces use of

negative feedback in a feedback-based learning task. Behavioral Neuroscience, 124(2),

248.

Robinson, O. J., Vytal, K., Cornwell, B. R., & Grillon, C. (2013). The impact of anxiety upon

cognition: perspectives from human threat of shock studies. Frontiers in Human

Neuroscience, 7.

Rushworth, M. F., Kolling, N., Sallet, J., & Mars, R. B. (2012). Valuation and decision-making in frontal cortex: one or many serial or parallel systems? Current Opinion in Neurobiology, 22(6), 946-955.

Schmitz, A., & Grillon, C. (2012). Assessing fear and anxiety in humans using the threat of

predictable and unpredictable aversive events (the NPU-threat test). Nature Protocols,

7(3), 527-532.



Starcke, K., & Brand, M. (2012). Decision making under stress: a selective review.

Neuroscience & Biobehavioral Reviews, 36(4), 1228-1248.

White, S. F., Geraci, M., Lewis, E., Leshin, J., Teng, C., Averbeck, B., ... & Blair, K. S.

(2016). Prediction error representation in individuals with generalized anxiety

disorder during passive avoidance. American Journal of Psychiatry, 174(2), 110-117.

Wrase, J., Kahnt, T., Schlagenhauf, F., Beck, A., Cohen, M. X., Knutson, B., & Heinz, A.

(2007). Different neural systems adjust motor behavior in response to reward and

punishment. NeuroImage, 36(4), 1253-1262.

Wyart, V., De Gardelle, V., Scholl, J., & Summerfield, C. (2012). Rhythmic fluctuations in

evidence accumulation during decision making in the human brain. Neuron, 76(4),

847-858.

Yu, A. J. (2015). Decision-making tasks. In Encyclopedia of Computational Neuroscience (pp. 931-937).



Figures

Figure 1. Procedural overview. Calibration rounds are marked in purple, whereas the main experimental task is marked in red. Set-up and practice tasks are marked in green, whereas post-testing is highlighted in blue. Questionnaires completed before arrival at the lab are not colored.



Figure 2. Tasks used throughout the experiment: the calibration task (A), practice task (B), reinforcement learning task (C), preference task (D) and rating task (E). Items marked in blue show the participant's actions.



Figure 3. Example stimuli and their corresponding probabilities. Each round had 8 symbols in total. Each symbol pair corresponded to one combination of domain (gain or loss) and emotion (anxious or safe). Each pair consisted of a good symbol, which yielded the preferable outcome 75% of the time, and a bad symbol, which yielded it only 25% of the time. In the gain domain, the good symbol resulted in gaining 0.50 euro 75% of the time; in the loss domain, it was the symbol that incurred a loss only 25% of the time.



Figure 4. Averaged calibration data. Discomfort ratings are shown as blue bars (SD as blue glow bars) and show no significant differences across calibrations. Actual shock intensity is plotted in orange (error bars as orange glow) and shows a significant increasing trend.



Figure 5. Raw data traces averaged across participants, showing trial-by-trial changes in performance and reaction time by emotion and domain. Different lines represent the different conditions. Although no pairwise comparisons were performed for the 2x2x24 ANOVAs on reaction time and performance, trends appear to be visible in the data.



Figure 6. Average post-test results. Ratings are plotted in orange, with their standard deviation shown as an orange glow. Preference choices are plotted in blue, with their standard deviation shown as a blue glow. The two measures overlap, and both the implicit (preference) and explicit (rating) measures show learning of the preferable symbol. Different conditions are separated by a dotted line, and each marker represents the average measure for that condition's symbol.



Supplementary Materials

Table of contents

Appendix - Hardware

DS5 - Bipolar constant current stimulator

Custom-made amplifier

PC

Other (alcohol swabs, gel, tape)

Appendix - Software

Matlab

ToS_Calibration

ToS_Learning_v3

ToS_PostTestBehaveGraded

rating_task

Vsrrp

NI_Controller

Appendix - Supporting Documents

Instructions and Consent Form.

Researcher's Checklist.

Payment Slip.

Appendix - Participant Communication

Questionnaires.

Invitation.

Reminder.

Appendix - Questionnaires

Before arrival.

PANAS.

BDI.

BAI.

Exit Questionnaire.

Appendix - Stimuli



Appendix - Hardware

DS5 - Bipolar constant current stimulator

Device used to administer the shocks. The input voltage and output current can be set as desired, and shocks are delivered through a wrist electrode attached with Velcro. Device settings, including backlight, current output and input among others, were controlled through the accompanying Matlab code. To reduce impedance, the skin where the electrodes are applied can be cleaned with alcohol, and impedance can be reduced further with electrode gel.

Custom-made amplifier

Amplifier purchased from the University of Amsterdam's Technical Support for Social and Behavioral Sciences (TOP) department. It included a pair of sintered Ag/AgCl EMG electrodes connected to a custom-made amplifier with an input resistance of 1GΩ and a bandwidth of 5-1000Hz (6dB/oct). Electrodermal activity (Skin Conductance Level; SCL) was measured with a sine-wave excitation voltage (1V pk-pk, 50Hz). The SCL circuit measures the current flowing through the skin from the output electrode to a ground (GND) electrode and converts this current to a conductance value. Settings, recording and specifications were applied through an xml file loaded directly into the recording program as a driver.
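The current-to-conductance conversion described above rests on Ohm's law: with a known excitation voltage across the electrodes, conductance is the measured current divided by that voltage (G = I / V), conventionally expressed in microsiemens (µS). The Python sketch below illustrates this arithmetic for the 1V pk-pk excitation. It treats the skin as a purely resistive load and ignores the capacitive component as well as any filtering or calibration performed by the actual amplifier and driver, so it is a simplification rather than a description of the device's internals; the function name and the example current are hypothetical.

```python
def skin_conductance_us(current_pp_amps: float, voltage_pp_volts: float = 1.0) -> float:
    """Return conductance in microsiemens from peak-to-peak current and voltage.

    Simplification (assumed): the skin is treated as a pure resistance, so G = I / V.
    """
    if voltage_pp_volts <= 0:
        raise ValueError("excitation voltage must be positive")
    conductance_siemens = current_pp_amps / voltage_pp_volts
    return conductance_siemens * 1e6   # siemens -> microsiemens


if __name__ == "__main__":
    # Hypothetical reading: 10 microamps peak-to-peak with the 1 V pk-pk excitation
    print(f"{skin_conductance_us(10e-6):.1f} uS")   # prints 10.0 uS
```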

PC

A computer was required to run the experiment and its software (Matlab), present the experiment (screen), record responses and interface with the researcher (mouse and keyboard). It also had to allow for the connection, control and recording of all devices involved.

Other (alcohol swabs, gel, tape)

Practical issues also required preparation and additional materials. The recording electrodes required conducting gel to improve the signal and tape to hold the skin conductance electrodes in place. Alcohol swabs were used to clean the electrodes and the participant's skin surface, also to improve the signal.



Appendix - Software

Matlab

Matlab 2017 was used with a purchased personal student license. Running the code requires the Data Acquisition Toolbox for interfacing with the shocker. All stimuli and tasks were presented using Cogent 2000, developed by the Cogent 2000 team at the FIL and the ICN, and Cogent Graphics, developed by John Romaya at the LON at the Wellcome Department of Imaging Neuroscience (www.vislab.ucl.ac.uk/cogent.php).

Code used in the task consists of the following:

ToS_Calibration

o Used to calibrate the shock to each participant, since pain perception and skin resistance vary across participants. This is done by calling the functions shock_in_block and shock_setup, which set the input parameters for the shocker device; in this experiment, the input voltage was set to 5V and the maximum output to 25mA for all participants. The screen shows the instruction “Press enter to be shocked”, after which the participant presses enter and receives a shock. After each shock, participants rate how painful it was on a scale from 1 (Not painful) to 10 (Extremely painful). Shock intensity is increased or decreased in steps of 10% of the maximum output. After the second shock, if the participant rates a shock as less than 7, the intensity is increased by one step. Conversely, if after the first shock the participant rates a shock as higher than 9, the intensity is decreased by one step. The minimum intensity is 2.5mA, i.e. 10% of the maximum intensity of 25mA. Once two consecutive shocks are rated higher than 7, the intensity of the last shock administered is used throughout the subsequent task.

o Requires:

Data Acquisition Toolbox

Input parameters provided by shock_in_block and shock_setup

o Output: A mat file with the intensity and subjective rating of the participant.

The name of this file is Sub[participant number]_calib_[iteration]_times.mat
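The calibration rules above can be expressed as a simple staircase. The Python sketch below is a hypothetical re-implementation for illustration only, not the ToS_Calibration Matlab code: the starting intensity (the 2.5mA minimum), the safety cap on the number of shocks, the order in which the stopping rule and the step rules are checked, and the simulated rating function are all assumptions. For simplicity, the step rules are applied from the first shock onwards rather than distinguishing between the first and second shock.

```python
from typing import Callable, List, Tuple

MAX_MA = 25.0                 # maximum output used for all participants
N_STEPS = 10                  # steps of 10% of the maximum output
STEP_MA = MAX_MA / N_STEPS    # 2.5 mA, also the minimum intensity


def calibrate(rate: Callable[[float], int], start_step: int = 1,
              max_shocks: int = 20) -> Tuple[float, List[Tuple[float, int]]]:
    """Run a hypothetical calibration staircase.

    rate: callable returning the participant's 1-10 pain rating for an intensity in mA.
    Returns the selected intensity and the (intensity, rating) history.
    """
    step = start_step
    history: List[Tuple[float, int]] = []
    for _ in range(max_shocks):
        intensity = step * STEP_MA
        rating = rate(intensity)
        history.append((intensity, rating))
        # Stop once two consecutive shocks are rated higher than 7
        if len(history) >= 2 and history[-1][1] > 7 and history[-2][1] > 7:
            return intensity, history
        if rating < 7:
            step = min(step + 1, N_STEPS)   # too weak: one step up, capped at 25 mA
        elif rating > 9:
            step = max(step - 1, 1)         # too painful: one step down, floor at 2.5 mA
    # Safety cap reached (assumption): keep the last intensity administered
    return history[-1][0], history


if __name__ == "__main__":
    def simulated_rating(intensity_ma: float) -> int:
        """Hypothetical participant whose rating grows linearly with intensity."""
        return min(10, int(2 * intensity_ma / STEP_MA))

    chosen, shocks = calibrate(simulated_rating)
    print(f"Selected {chosen} mA after {len(shocks)} shocks")  # Selected 10.0 mA after 5 shocks
```

The safety cap matters because a participant who keeps rating shocks at exactly 7 triggers neither a step change nor the stopping rule.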

ToS_Learning_v3

o The task was adapted from code used and provided by Stefano Palminteri (2006). This script presents the learning task itself and records the participant's responses and results. The task presents a pair of symbols to the left and right of a fixation cross; the symbols are randomly assigned for each participant to represent an advantageous or a disadvantageous option in either the loss or the gain domain. The task consists of 96 trials, grouped in anxiogenic and anxiolytic blocks of 3 trials each, with a pause halfway through the session, after 48 trials have been completed. The inter-trial interval is jittered between 1 and 6 seconds. The script also sends triggers to the shocking device in anxiogenic blocks, as well as markers to the skin conductance amplifier for later analysis.

o Requires:

Data Acquisition Toolbox

o Output:

Two mat files (Sub[participant #]_Session[session #].mat) listing the symbols used in 2 separate iterations of the task

Two mat files containing the participant's responses, reaction times and accumulated monetary reward for each run (first and second half of the session). These files are called Sub[participant#]_ToS_Session[session#]run_[run#].mat
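The block structure of the learning task can be sketched as a trial schedule generator. This Python sketch is a hypothetical illustration, not the ToS_Learning_v3 code: it assumes that threat (anxiogenic) and safe (anxiolytic) blocks occur equally often in random order, that each emotion condition contains 24 gain and 24 loss trials (consistent with the 24 trials per condition in the 2x2x24 analyses), and that inter-trial intervals are drawn uniformly between 1 and 6 seconds. Shock timing within threat blocks is not modeled.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple

Trial = Dict[str, object]


def build_schedule(n_trials: int = 96, block_len: int = 3,
                   iti_range: Tuple[float, float] = (1.0, 6.0),
                   seed: Optional[int] = None) -> Tuple[List[Trial], List[Trial]]:
    """Build a hypothetical trial schedule split into two runs around the mid-session pause."""
    if n_trials % (2 * block_len) or n_trials % 4:
        raise ValueError("n_trials must split evenly into blocks, runs and conditions")
    rng = random.Random(seed)
    n_blocks = n_trials // block_len                     # 32 blocks of 3 trials
    blocks = ["threat"] * (n_blocks // 2) + ["safe"] * (n_blocks // 2)
    rng.shuffle(blocks)                                  # assumed: random block order
    # Assumed: gain and loss trials balanced within each emotion condition
    domains: Dict[str, List[str]] = {}
    for emotion in ("threat", "safe"):
        pool = ["gain"] * (n_trials // 4) + ["loss"] * (n_trials // 4)
        rng.shuffle(pool)
        domains[emotion] = pool
    trials: List[Trial] = []
    for block_idx, emotion in enumerate(blocks):
        for _ in range(block_len):
            trials.append({
                "block": block_idx,
                "emotion": emotion,                      # threat = anxiogenic, safe = anxiolytic
                "domain": domains[emotion].pop(),
                "iti_s": rng.uniform(*iti_range),        # jittered inter-trial interval
                "shock_trigger_enabled": emotion == "threat",  # triggers only in threat blocks
            })
    half = n_trials // 2                                 # pause after 48 trials
    return trials[:half], trials[half:]


if __name__ == "__main__":
    run1, run2 = build_schedule(seed=42)
    counts = Counter((t["emotion"], t["domain"]) for t in run1 + run2)
    print(len(run1), len(run2))   # prints: 48 48
    print(sorted(counts.items()))
```

Because 48 is a multiple of the block length, the mid-session pause never splits a block in this sketch.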

ToS_PostTestBehaveGraded

o This script presents the participant with a series of binary choices, without anxiety or domain manipulations, in order to obtain their implicit internal ratings of the symbols shown in a specified session. Pairs of symbols are shown and the participant selects their preferred symbol. There are no monetary gains or losses and no feedback in this task.

o Requires:

Mat file with stimuli presented during the learning task (created by

ToS_SCR)

o Output:

A mat file (PostTest_[participant#]) with the participant's preference in each of the binary preference choices.

rating_task

o This script asks the participant to rate the valence of each symbol, presented individually. Each symbol is presented and rated on 4 different occasions in randomized order, on a scale ranging from 1 (very negative) to 10 (very positive). The symbols presented are those from a specified session.

o Requires:

Mat file with stimuli presented during the learning task (created by

ToS_SCR)

o Output

A mat file with the ratings given to each symbol in each iteration. The

file name is RatingData_Sub[participant#].

Vsrrp

Vsrrp98 V10.0 was provided by the UvA with the purchase of a pair of sintered Ag/AgCl electrodes connected to a custom-made bipolar amplifier with an input resistance of 1GΩ and a bandwidth of 5-1000Hz (6dB/oct). The software uses an xml driver for recording, analysis and conversion of data from vsrrp files to mat format for analysis in Matlab. The xml code used was:

SCL - Debug

o Used to record Skin Conductance Level from a pair of Ag/AgCl electrodes taped to the ring and index fingers of the participant, creating a single file with all the data and allowing for its division into blocked segments according to the desired design and the markers sent during the task. This driver also allows for conversion to mat files after recording.

o Output:

Vsrrp and mat files with the name specified at the beginning of recording

NI_Controller

Software used to interface the shocker (DS5) with the main experimental PC. It requires no inputs besides proper hardware connections.



Appendix - Supporting Documents

Documents used during the lab session with the participants. All documents were retrieved by

the researcher for archiving.

Instructions and Consent Form.

This form is given to the participants at the start of the experiment and includes basic

information about the research conducted as well as instructions to complete the task and the

participant’s payout information; finally, it includes the informed consent for participants to

sign if they agree to take part in the experiment.



Researcher’s Checklist.

This document is for the sole use of the researcher and helps keep to protocol and standardize the procedure used with each participant.



Payment Slip.

Once all tasks are completed, participants are handed their payment as calculated in the instructions and consent form. They sign this payment slip to acknowledge receipt of the payment.



Appendix - Participant Communication

In order to standardize communication with participants, templates were used for the three points at which communication took place. Other communication and questions posed by the participants (e.g. location, time clarification) were answered at the researcher's discretion.

Questionnaires.

Participants who signed up for sessions were contacted individually and asked to complete the questionnaire via an attached link. Participants who accessed the questionnaire through a mass email were contacted individually once they had completed it, to schedule their sessions. Participants who did not complete the questionnaire were not eligible to take part in the main experimental task.

Invitation.

After the participants completed the questionnaire, they were invited to the lab at their selected time via the CREED system, or asked for their preferred slot among the time slots available when the email was sent.

Dear participant,

Thanks for completing the questionnaire for experiment 1711 - "Decision making and electric shock". You are now eligible to participate in the second half of the experiment. As this experiment is done individually, we arrange time slots ourselves so it better suits your schedule. Currently, we have all time slots open during this week (10:00-18:00). The experiment takes approximately 1:30 hours, so please let me know what time and date would suit you best and I will confirm its availability and your attendance.

All best,

Isabela L.

Reminder.

Approximately 6 hours before the experiment took place, participants were sent an email reminder with the date, time and location of the experiment.

Dear [Participant Name],

You have signed up for the session on [day], [month] [date] at [time] for experiment 1711 - "Decision making and electric shock"

It will take place in E7.20 in the E2 building 7th floor (Not in the PPLE side).

Please let me know if you have any questions, want to reschedule or aren't sure about the room location.

All best,

Isabela L.



Appendix - Questionnaires

Before arrival.

These questionnaires were sent to participants before their scheduled lab session. Participants who did not complete the questionnaires were not eligible to take part in the second half of the experiment in the lab. This battery includes demographic questions as well as the PANAS, BDI, BAI and ERQ, and was administered online via Qualtrics. The questionnaires are listed below in their original formats.

Exclusion Questions

Introduction to the battery



PANAS.



BDI.



BAI.



ERQ.



Exit Questionnaire.

This questionnaire was completed by participants before leaving the lab. It includes manipulation checks, questions about the strategies used and a symbol recognition task that included symbols not presented in the experiment.



Appendix - Stimuli

The symbols used during the task are part of the Agathodaimon font. The whole font is shown below. Unused symbols are shown in light gray, symbols used in the main task in black, and symbols used in the recognition task in red.

[Font chart: Agathodaimon characters A-Z and a-z]
