Thesis Draft Mar 30_b&w
8/8/2019 Thesis Draft Mar 30_b&w
http://slidepdf.com/reader/full/thesis-draft-mar-30bw 1/167
Mental Workload Measurement Using the Intersaccadic Interval
by
Eldon Todd Pierce
A thesis submitted in conformity with the requirements for the degree of Master’s of Applied Science
Institute of Biomaterials and Biomedical Engineering
University of Toronto
© Copyright by Eldon Todd Pierce (2009)
Abstract
“Mental Workload Measurement Using the Intersaccadic Interval”
Eldon Todd Pierce, 2009
Master’s of Applied Science
Institute of Biomaterials and Biomedical Engineering, University of Toronto
Mental workload is commonly defined as the proportion of a person's total mental capacity in use
at a given moment. A measure of mental workload would have utility in a number of rehabilitation
medicine applications, but no method has been adequately examined for these purposes. A
candidate measure is the intersaccadic interval (ISI), which is the duration between two successive
saccades. Previous studies indicate that ISI length may be linked to mental workload, but this link
is poorly understood for tasks that are not primarily visual. Therefore, the current study was an
investigation of ISI and workload intensity in three non-visual tasks: mental arithmetic, verbal
fluency, and audio perception. Workload was manipulated through changes in task difficulty as
well as study participant motivation level. An analysis of eye movements and other experimental
workload measures indicated a significant association between audio perceptual workload and ISI
length.
Acknowledgments
This thesis could not have been possible without the help of my two supervisors: Dr. Robin Green
and Dr. Geoff Fernie. Thank you Robin, for teaching me everything I ever wanted to know about
experimental psychology (and perhaps more). Geoff, thank you for working tirelessly to give all
researchers at Toronto Rehab a roof and a vision. Collaborating with both of you has been extremely
educational.
To the others on my committee, Dr. Jay Pratt and Dr. Anthony Easty, and my external reviewer, Dr.
Luc Tremblay: thank you very much for your insights and encouraging words.
I am also grateful for the financial support of the Toronto Rehabilitation Hospital, the University of
Toronto, and Ontario Centres of Excellence.
Finally, a debt of gratitude is owed to my friends, family, and colleagues for their encouragement and
assistance. In particular, thank you Jessica for being there to support me through the final throes. It
is fitting that the thesis begins and ends with some of your handiwork.
Table of Contents
List of Tables vii
List of Figures viii
List of Appendices ix
Chapter 1. Introduction and Rationale 1
Chapter 2. Literature Review 2
2.1 Workload Measurement in Rehabilitation Medicine 2
2.1.1 Application: Rehabilitation Treatment 2
2.1.2 Application: Neuropsychological Assessment 4
2.2 Theory of Mental Workload 5
2.3 Workload Measurement 12
2.3.1 Subjective Measures 13
2.3.2 Performance-Based Measures 17
2.3.3 Physiological Measures 19
2.3.3.1 Electroencephalographic Activity (EEG) 20
2.3.3.2 Surface Electromyography (EMG) 21
2.3.3.3 Electrocardiogram (ECG) 22
2.3.3.4 Electrodermal Activity (EDA) 26
2.3.3.5 Respiration 28
2.3.3.6 Eye Blinks 29
2.3.3.7 Pupil Size 32
2.3.4 Saccadic Eye Movements 33
2.3.4.1 Definition of Saccades 33
2.3.4.2 Definition of Intersaccadic Interval (ISI) 35
2.3.4.3 Eye Movements and Mental Workload 37
2.3.4.4 Eye Movements and Non-Visual Tasks 40
Chapter 3. Objectives and Hypotheses 47
Chapter 4. Pilot Study 51
4.1 Methods 51
4.1.1 Participants 51
4.1.2 Materials 52
4.1.2.1 Eye Tracking 52
4.1.2.2 Physiological Measures 54
4.1.2.3 Subjective Workload Ratings 55
4.1.2.4 Neuropsychological Tasks 56
4.1.3 Design 59
4.1.4 Procedures 60
4.1.5 Analysis 64
4.2 Results and Discussion 65
4.2.1 Subjective Workload Data 65
4.2.2 Task Performance Data 69
4.2.3 Eye Movement Data 72
4.3 Conclusions 74
Chapter 5. Full-Scale Study 76
5.1 Restatement of Hypotheses 76
5.2 Methods 77
5.2.1 Participants 77
5.2.2 Materials 78
5.2.2.1 Eye Tracking 78
5.2.2.2 Physiological Measures 79
5.2.2.3 Neuropsychological Tasks 80
5.2.2.4 Motivation Manipulation 81
5.2.2.5 Post-Experiment Questionnaire 82
5.2.3 Design 83
5.2.4 Procedures 84
5.2.5 Analysis 85
5.2.5.1 Parametric Model Assumptions 86
5.2.5.2 Hypothesis Testing Methods 88
5.2.5.3 Multiple Comparisons Correction 91
5.3 Results and Discussion 95
5.3.1 Hypothesis Testing 95
5.3.1.1 Average Heart Rate 95
5.3.1.2 Average Saccade Rate 97
5.3.1.3 Trial Proportion of Long ISI 98
5.3.1.4 Holm-Bonferroni Correction 100
5.3.2 Post-Experiment Questionnaire 101
5.3.3 Convergence Measures 104
5.3.4 Agreement of Eye Movement Results with Previous Literature 107
5.3.5 Eye movements and Mental Workload 109
5.4 Conclusions 110
Chapter 6. Limitations 113
Chapter 7. Extensions 116
References 121
Appendices
Appendix A – Eye Tracker Specifications 132
Appendix B – Pre-Test Questionnaire 133
Appendix C – Pilot Study ISI Histogram 135
Appendix D – Full-Scale Study Post-Test Questionnaire 136
Appendix E – Full-Scale Study Change Score Box Plots 140
Appendix F – Full Hypothesis Testing Battery 149
List of Tables
Table 1. Saccade Detection Thresholds in Previous Literature
Table 2. Descriptions of Experimental Tasks
Table 3. Participant Demographic Summary
Table 4. Summary of Trends Subjected to Statistical Analysis
Table 5. Summary of Relevant Previous Literature
List of Figures
Figure 1. Illustration of the resource allocation principle wherein resources allocated to a primary
task lead to a decrease in secondary task performance, unless in a data-limited state.
Figure 2. Illustration of the division of mental resources in Wickens’ multiple resource theory (1984).
Figure 3. Sanders’ (1983) cognitive-energetical model of a choice reaction task execution.
Figure 4. Robert & Hockey’s (1997) control system model of mental effort, including control loops
for automatic processes (Loop A) and for effortful processes (Loop B).
Figure 5. Change in task difficulty ratings from the nominal to high difficulty versions of each task.
Figure 6. Change in subjective effort ratings from nominal to high difficulty version of each task.
Figure 7. Change in subjective effort ratings from 1st to 2nd half of experiment, by experimental
group; data is pooled over all task types.
Figure 8. Mean and standard deviations of (percent) change in number of correct responses from
nominal to high difficulty task versions.
Figure 9a. Portion of trials in which performance improves from 1st to 2nd half of experiment
(pooled over both difficulty levels).
Figure 9b. Portion in which performance degrades.
Figure 10. Mean and standard deviations of change in task performance from 1st to 2nd half (pooled
over difficulty levels), using data from only those participants who improved.
Figure 11a. Illustrating error resulting from residual change score method on unmatched groups.
Figure 11b. ANCOVA method is preferred in this case because it does not pool data for regression.
Figure 12. Plot of median ISI versus average saccade rate for all trials reveals a very consistent (r² =
0.86) inverse relationship; note logarithmic axes.
Figure 13. Individual observations of average heart rate in 1st half versus 2nd half for high difficulty
auditory task suggests no effect of motivation.
Figure 14. Individual observations of Trial Portion Long ISI in 1st half versus 2nd half for high
difficulty auditory task suggests a weak motivation effect.
Figure 15. Tally of participants’ starting letter perceived difficulty classifications.
Figure 16. Tally of participants’ starting letter perceived effort classifications.
List of Appendices
Appendix A – Eye Tracker Specifications
Appendix B – Pre-Test Questionnaire
Appendix C – Pilot Study ISI Histogram
Appendix D – Full-Scale Study Post-Test Questionnaire
Appendix E – Full-Scale Study Change Score Box Plots
Appendix F – Full Hypothesis Testing Battery
Chapter 1. Introduction and Rationale
The objective of this thesis is to investigate the relationship between mental workload and the length
of people’s intersaccadic intervals during non-visual tasks. If a correlation can be found, the
intersaccadic interval may be a candidate for workload measurement during neuropsychological
assessments and cognitive rehabilitation treatments. It will be demonstrated not only that these
assessments and treatments would benefit from a measure of workload, but that there is currently no
valid and reliable measure available. An eye tracker-based measurement would be particularly well-
suited to these applications because it has the potential to be non-invasive (using a remote eye
tracker) and would have the temporal resolution necessary for the short task periods that are
sometimes involved.
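To make the measure concrete, the following is a minimal sketch of how intersaccadic intervals could be derived from a list of detected saccades. It is an illustration only, not the analysis procedure of this thesis: the function name, the event format, and the sample timestamps are hypothetical, and it assumes that an interval runs from the end of one saccade to the start of the next.

```python
from statistics import median


def intersaccadic_intervals(saccades):
    """Return the gaps (in ms) between consecutive saccades.

    `saccades` is a list of (onset_ms, offset_ms) pairs sorted by onset.
    Assumption: each interval is measured from the offset of one saccade
    to the onset of the next one.
    """
    return [
        next_onset - prev_offset
        for (_, prev_offset), (next_onset, _) in zip(saccades, saccades[1:])
    ]


# Hypothetical saccade events (onset, offset) in milliseconds.
events = [(100, 130), (400, 425), (900, 940), (1000, 1030)]
isis = intersaccadic_intervals(events)
print(isis)          # [270, 475, 60]
print(median(isis))  # 270
```

Summary statistics such as the median can then be computed per trial; which statistic is most informative is an empirical question.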
The first task of any mental workload research is to define the subject of investigation; though its
manifestations are well known, they are also poorly understood. Everyone has experienced feelings
of mental taxation that accompany very difficult tasks, skill learning, and working under high
environmental demands. However, reviews of some 40 years of theoretical and experimental work
indicate that there is no universal definition for workload (Xie et al., 2000; Cain, 2007), let alone one
with any empirical validation.
It remains that there is some regulatory factor or group of factors whose measurement would be
useful in a variety of applications. Studies concerned with the measurement or prediction of “mental
workload” and its aliases, such as mental effort and cognitive load, tend to define it in operational
terms. For example, a study interested in the relationship between job satisfaction and workload
might define it as a worker’s perceived (subjective) workload over the course of a day. In a similar
fashion, the current study will define mental workload operationally, as relevant to the intended
rehabilitation medicine applications.
Chapter 2. Literature Review
This literature review is divided into three primary sections. The first section addresses the intended
applications for a mental workload measurement tool in rehabilitation medicine. The second
describes various theories of mental workload that address specific aspects of these clinical
applications as well as its general conceptualization in the engineering psychology literature. The
third section is a compilation of experimental evidence for the success and limitations of various
proposed workload measurement methods, including the intersaccadic interval. Those measures
that have been included in the current investigation receive special attention with respect to practical
measurement issues.
2.1 Workload Measurement in Rehabilitation Medicine
2.1.1 Application: Rehabilitation Treatment
Does the degree to which a client “tries” during a cognitive rehabilitation task affect clinical
outcomes? Although the concept of measuring patients’ involvement in rehabilitation tasks has been
put forward (Papadelis, 2007), there are no studies that directly link cognitive rehabilitation
outcomes to patients’ effort levels. The reason for this omission may be the lack of a gold standard
effort measure (Salthouse, 2006). Certainly, a lack of empirical evidence for the link between effort
and outcome has not hampered its implementation by clinicians, who tend to set task difficulties
such that a patient is challenged but not discouragingly so. However, there is plenty of indirect
evidence in support of this practice.
Studies of learning are one source, as it has been suggested that rehabilitation treatment should
follow the same “biological rules” as the deliberate practice involved with active skill-acquisition
(Kwakkel, 2006). The results of two comprehensive studies of skill-acquisition strategies suggest that
“participants’ motivation to attend to the task and exert effort to improve their performance” is the
factor most frequently associated with performance improvement (Ericsson et al., 1993). Mulder
(1986) also noted that the speed of skill acquisition has been correlated with arousal, which is, in
turn, thought to accompany effortful activities. An issue of particular importance to rehabilitation
medicine is the disambiguation of improvements in (compensatory) behavioural strategies versus
those in more general, physiological functionality (i.e., neuroplasticity). In this regard, there is
evidence from animal research that supports a “top-down” driven model of adult sensory map
plasticity (Polley et al., 2006). This result speaks to a potential role for effort in neuroplastic change
because effort is essentially a means of consciously regulating goal-oriented behaviour.
The importance of effortful activities in rehabilitation is also stressed in the environment enrichment
literature, which studies the effect of social interaction and other enrichment variables on cognitive
ability in aging as well as brain injured people. Schooler (1987) theorized that the benefits of an
enriched environment are the result of a feedback loop between increased involvement in cognitively
challenging activities and resultant improvements in cognitive functioning (which subsequently
encourages further involvement). In response, a recent pilot study by Green et al. (2006) has tested
the hypothesis that participation in cognitively challenging activities leads to an increase in cognitive
functioning. This study found that healthy participants who carried out mentally challenging tasks
daily for two weeks showed a greater improvement (from baseline) on a test battery for general
cognitive ability than participants who engaged in light reading for an equivalent treatment period;
improvement did not appear to be attributable to acquisition of new strategies. Evidence of general
cognitive improvements – which have potential for far-transfer – suggests that the development of
behavioural strategies is not the sole beneficiary of effortful mental activity.
Mental effort measurement could clearly contribute to ongoing research on the relationship between
exertion during rehabilitation treatment and outcomes. It may also be useful to clinicians that
currently rely on intuition alone in determining whether patients are being challenged too much or
too little. Another related application is in feedback systems for computer-administered treatments,
an area of considerable research interest.
2.1.2 Application: Neuropsychological Assessments
Effort (motivation) is an important consideration in neuropsychological testing because it can
confound the relationship between task performance and cognitive ability in some circumstances. It
has been demonstrated that another motivational factor, goal setting, can affect brain-injured
participants’ performance on math problems (Gauggel, 2002a; Gauggel, 2002b) and the Purdue
Pegboard Test (see Cardall, 1943, for a description of the test; Gauggel, 2001). Healthy participants’
performance during a word association task was improved through feedback in a study by Podsakoff
& Farh (1989). Most compellingly, in a comparison of athletes’ baseline and post-concussion
neuropsychological assessments, Bailey et al. (2006) concluded that the will to continue playing
motivated them to significantly improve their test performance. However, there are also studies that
had mixed success in motivating participants to perform better on tests. Richards & Ruff (1989)
reported that San Diego Neuropsychological Battery (Ruff, 1985) outcomes in both healthy and
depressed participants were unaffected by the opportunity for a monetary reward. The use of
another motivational technique, performance feedback, was also shown to have no effect on the
response times of healthy participants, although it did affect brain-injured patients (Gauggel et al.,
2000). Therefore, it appears that it is not a matter of whether motivation can affect
neuropsychological test outcomes, but under which circumstances and to what degree.
One such circumstance is the extremely low effort that is often discussed in the malingering
literature. In this context, “low effort” is generally used to refer to wilful deception, and although
some strategies may involve a state of low mental activity, others may actually require added mental
work to produce incorrect answers. However, some of the many effort tests introduced as tools for
malingering detection (comprehensive test reviews include: Bianchini et al., 2001; Vickery et al.,
2001; Lynch, 2004) can also be used to detect involuntarily low effort, though the sensitivity of
individual tests may vary. For example, the hypothesis that (involuntarily) low effort could account
for schizophrenic participants’ poor performance on some neuropsychological tests was tested using
the Victoria Symptoms Validity Test (Egeland et al., 2003) and the Working Memory Test (Gorisson
et al., 2005), but low effort was only reported by the latter. Furthermore, Kessels et al. (2007) tested a
similar hypothesis for participants reporting symptoms of depression, and while they conclude that
there is evidence of an inability to allocate effort, neither the Test of Memory Malingering (Rees et al.,
1998) nor the Amsterdam Short-term Memory Test (Shagen et al., 1997) could detect it.
Another circumstance may be in cases of milder cognitive impairment. For example, Van Zandvoort
et al. (1998) reported on the insensitivity of some neuropsychological tests to cognitive deficits in
lacunar stroke patients, and theorized that the patients were able to compensate during these
tests through the exertion of greater effort. Although it may be possible to reveal these deficits by
administering more difficult tests, the ecological validity of testing under extremely high effort is
itself questionable, as the vast majority of day-to-day tasks do not involve high effort. Furthermore, it
has been suggested that low motivation may be a primary contributor to dysfunction in
schizophrenic (Egeland et al., 2003; Gorisson et al., 2005), depressive (Kessels et al., 2007; Layne,
1980), and brain injured (Riese et al., 1999; Oddy et al., 2008) patients. However, there is currently
no generally accepted tool for the measurement of patient motivation (Oddy et al., 2008). Therefore,
the potential roles of mental effort measurements in neuropsychological assessments not only
include standardizing conventional tests, but also the development of more ecologically valid tests.
2.2 Theory of Mental Workload
As previously mentioned, mental workload is a wide and elusive concept. Therefore, it is important to
remain grounded in those elements that can be observed. Tsang & Vidulich (2006) neatly summarize
the three manifestations of mental workload: “subjective experience, performance, and physiological
manifestations.” The following discussion will summarize some of the more enduring and/or
applicable models that have attempted to tie these phenomena together in a unified theory of
workload. As previously alluded to, none of these models are directly applicable to the empirical
goals of this thesis, but their inclusion will serve to align this thesis with the perspective of decades of
similar research.
In Kahneman’s seminal book (1973), he described a task performance feedback loop wherein mental
resources are allocated in response to task demands. He also described a mechanism of allocation
involving arousal, which was supported by his empirical research on pupil diameter and task
difficulty. In this context, workload would be defined as the proportion of an individual’s total
resource capacity allocated at a given moment. Though his theory of resource allocation, the
“Resource Model,” was originally conceived in response to observations of dual-task (multitasking)
performance, it is also very commonly referenced in discussions of single task scenarios. A
discussion of the validity of the resource model is beyond the scope of this review, but critiques have
been written by Navon (1984) and Sanders (1997).
As mentioned, the resource model was a response to dual-tasking behaviours. According to the
model, there is a limited amount of resources that must be distributed amongst simultaneous tasks.
Therefore, when we engage in one task, less capacity is left for the processing of another task, with
the ultimate result of degraded task performance. Norman & Bobrow (1975) investigated this
phenomenon using a dual-task paradigm, wherein primary task performance was measured while
participants simultaneously carried out a secondary task of variable difficulty. They found that
primary task performance was generally negatively correlated with secondary task difficulty (Figure
1), which was presented as evidence for the shared allocation of resources between tasks. Notably,
they also attempted to clarify the meaning of resources as, “...such things as processing effort,
various forms of memory capacity, and communication channels” (Norman & Bobrow, 1975, p. 45).
Where a correlation between primary task performance and secondary task difficulty did not occur,
Norman & Bobrow defined performance as being data-limited, otherwise it was resource-limited.
“Data-limited” means that primary task performance is limited by the quality of the data, and
therefore independent of any further resource allocation. For example, in detecting an auditory
stimulus amongst noise, a person might perform better if they are not occupied with a difficult
secondary task, but if the stimulus signal to noise ratio is poor enough, then no increase in attention
or auditory acuity will improve performance.
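The distinction between resource-limited and data-limited performance can be sketched as a toy performance-resource function. This is only an illustration under stated assumptions: the linear rise, the `gain` parameter, and the numeric values are hypothetical and are not taken from Norman & Bobrow (1975), who do not prescribe a particular functional form here.

```python
def primary_performance(resources, data_limit, gain=1.0):
    """Toy performance-resource function.

    Assumption: performance rises linearly with allocated resources
    (resource-limited) until it reaches a ceiling set by data quality
    (data-limited), after which extra resources change nothing.
    """
    return min(gain * resources, data_limit)


def regime(resources, data_limit, gain=1.0):
    """Label the operating region for a given resource allocation."""
    if gain * resources < data_limit:
        return "resource-limited"
    return "data-limited"


# Hypothetical allocations: a harder secondary task leaves fewer
# resources for the primary task.
for allocated in (0.2, 0.5, 0.8):
    print(allocated,
          primary_performance(allocated, data_limit=0.6),
          regime(allocated, data_limit=0.6))
```

In the noisy auditory detection example, a poor signal-to-noise ratio corresponds to a low `data_limit`: once that ceiling is reached, freeing up more resources no longer improves detection.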
Figure 1. Illustration of the resource allocation principle wherein resources allocated to a primary task lead
to a decrease in secondary task performance, unless in a data-limited state.
A widely accepted amendment to the original resource model is the multiple resource model
introduced by Wickens (1984). This theory is intended to account for the observation that the extent
of task interference depends on the similarity of their processes, rather than their difficulty alone.
Wickens (2002) illustrates this concept with an anecdotal example: a driver’s performance would be
expected to suffer much more while following written directions compared to spoken directions
because reading would take the driver’s eyes off the road. Such interference is formalized in
Wickens’ predictive model, which describes pools of resources that are defined on four dimensions
(Figure 2): perception modality, processing code, processing stage, and response type. It is asserted
that multitask interference is somewhat predicted by the number of resource pools that are
commonly shared by the tasks.
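The idea that interference grows with the number of shared resource pools can be illustrated with a small sketch. The task descriptions below are hypothetical simplifications of the driving example, and counting matching dimensions is only a rough stand-in for Wickens' predictive model, not a reproduction of it.

```python
DIMENSIONS = ("modality", "code", "stage", "response")


def shared_pools(task_a, task_b):
    """List the dimensions on which two tasks draw on the same pool.

    Only dimensions that both tasks specify are compared.
    """
    return [
        dim for dim in DIMENSIONS
        if dim in task_a and dim in task_b and task_a[dim] == task_b[dim]
    ]


# Hypothetical, simplified task profiles.
driving = {"modality": "visual", "code": "spatial",
           "stage": "perception", "response": "manual"}
reading_directions = {"modality": "visual", "code": "verbal",
                      "stage": "perception"}
hearing_directions = {"modality": "auditory", "code": "verbal",
                      "stage": "perception"}

print(shared_pools(driving, reading_directions))  # ['modality', 'stage']
print(shared_pools(driving, hearing_directions))  # ['stage']
```

Under these assumed profiles, written directions share more pools with driving than spoken directions do, which is consistent with the greater interference expected in the anecdotal example.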
[Figure 1 plot: Primary Task Performance (vertical axis) versus Resources Allocated to Primary Task (horizontal axis), showing a resource-limited region followed by a data-limited region; annotation: secondary task made easier.]
Figure 2. Illustration of the division of mental resources in Wickens’ multiple resource theory (1984).
[Figure 2 diagram labels: Stages (Perception, Cognition, Responding); Modalities (Visual, Auditory); Codes (Spatial, Verbal); Responses (Manual/Spatial, Vocal/Verbal).]
Although originally conceived as a model of dual-task performance, the performance resource
function in Figure 1 is also commonly used to conceptualize the effect of wilful resource allocation
on performance during a single (resource-limited) task. Such wilful resource allocation was
originally challenged by Kahneman (1973). He cited a number of studies in which task performance
was not affected by participants’ motivation levels, concluding that “...allocating less effort than the
standard probably will cause a deterioration of performance [but] allocating more than the standard
seems to be beyond our ability” (Kahneman, 1973, p. 15). However, evidence from the previous
discussion of motivational effects on neuropsychological test performance in concussed athletes
seems to contradict this statement. Wickens (1991) also provides contrasting examples of studies in
which task performance is affected by instructions to “try harder,” by the imposition of performance
criteria, and also by motivational incentives. Further evidence for wilful resource allocation is
implicit in the practices of dual-task researchers, who must control participants’ priority between the
primary and secondary tasks, so that variability in performance is minimized (Damos, 1991). The
results of Washburn & Putney (2001), who observed that task performance improved when a task
was made more difficult, would also be difficult to explain using Kahneman’s (1973) perspective.
These various viewpoints may be reconciled by concluding that people may often choose to achieve
maximum (standard) performance without being instructed to do so, but there are circumstances in
which they will allocate fewer resources, leaving room for performance improvement.
This compromise would not satisfy Mitchell & Hunt (1989), who question the more fundamental
assumption that maximum effort is limited by task demands by suggesting that a sufficiently
motivated person may invest themselves completely in any task, no matter its difficulty. However,
the “effort” they are referring to is conceptually different from Kahneman’s (1973) effort, not
necessarily being captured in measures of task performance, but perhaps in measures of perceived or
“subjective” workload. Pashler (1989) emphasizes the importance of this aspect of workload,
criticizing research in which workload is manipulated through task demands, but a subjective
workload measure is not used to confirm the success of this manipulation. He argues that these
studies “might better be described as revealing physiological concomitants of information-processing demands rather than effort” (Pashler, 1989, p. 382). While an information-processing
focus could be considered valid in some applications, it is inappropriate in the context of cognitive
rehabilitation for two reasons. Firstly, considering that people commonly achieve their maximal
performance by default according to Kahneman, the emphasis placed by Ericsson et al. (1993) on
effort for successful learning (see Section 2.1.1) indicates that they are referring to something that is
not necessarily indicated by task performance. Secondly, subjective effort reports are logically a more
direct measure than task performance for determining and maintaining workload tolerance
thresholds in patients.
The “cognitive-energetical” model of information processing (Sanders, 1983) distinguishes between
computational resource allocation, as it was originally conceived by Kahneman (1973), and an
energetical concept of effort. “Energetics” refers to the “intensity of behaviours” from drive, arousal,
and activation to stress, fatigue, and strain (Unema, 1995). The model was first introduced by
Sanders (1983) as a means of mapping the cognitive stages involved with choice reaction tasks while
accounting for stressors such as sleep deprivation (Figure 3). Information is channelled through
various processing stages, each having a limited, but variable capacity. Although Sanders uses only
four stages to model the choice reaction task, a different task may potentially involve others. The
processing capacity of the stages is thought to be controlled by a three-part mechanism that is based
on the work of Pribram & McGuinness (1975): arousal, activation, and effort. Arousal and activation
respectively affect the physiological response to stimuli and the readiness to respond, while effort
may act in two ways: as a regulator of basal state (arousal and activation) or in the regulation of
“controlled” executive processes (“response choice” in Figure 3). Other stages are referred to as
“automatic,” which require minimal resources, do not interfere with other processes, and do not
become more efficient with practice (Hasher & Zacks, 1979). Furthermore, it is thought that only
controlled processes are intentional and therefore affect our perception of subjective effort (de
Waard, 1996).
Figure 3. Sanders’ (1983) cognitive-energetical model of a choice reaction task execution.
[Figure 3 diagram labels: (stimulus), Stimulus Preprocessing, Feature Extraction, Response Choice, Motor Adjustment, (response); Arousal, Effort, Activation; Evaluation.]
Robert & Hockey’s (1997) model (Figure 4) focuses on the mechanisms of effortful control. Automatic
processes are regulated in loop A, where they demand no appreciable level of effort. However, if a process or environmental
stressors are great enough that task performance becomes a concern, then control shifts to loop B,
wherein either additional effort can be exerted to improve performance or the performance goal can
be downgraded. A key observation is that when additional effort is exerted, it incurs a physiological
cost: an increase in “sympathetic and musculo-skeletal responses as well as neuroendocrine stress
patterns.” These costs can eventually lead to fatigue, which they liken to the adoption of a low-effort
strategy. In this state, less effortful behaviours are chosen though they are detrimental to task
performance.
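The two-loop behaviour described above can be sketched as a single decision step. This is a loose illustration rather than the published model: the numeric scales, the fixed effort budget, and the rule for choosing between more effort and a lower goal are all assumptions introduced here.

```python
def regulate(performance, goal, effort, max_effort, step=1):
    """One pass of a toy two-loop controller.

    Returns (loop, effort, goal). Assumptions: performance and goal share
    one numeric scale, effort rises in fixed steps, and the goal is
    lowered only once the effort budget is exhausted.
    """
    if performance >= goal:
        # Loop A: performance meets the goal, no additional effort needed.
        return "A", effort, goal
    if effort + step <= max_effort:
        # Loop B, option 1: exert additional effort (at a physiological cost).
        return "B", effort + step, goal
    # Loop B, option 2: downgrade the performance goal.
    return "B", effort, performance


print(regulate(performance=10, goal=8, effort=2, max_effort=5))  # ('A', 2, 8)
print(regulate(performance=6, goal=8, effort=2, max_effort=5))   # ('B', 3, 8)
print(regulate(performance=6, goal=8, effort=5, max_effort=5))   # ('B', 5, 6)
```

The last call corresponds to the low-effort strategy associated with fatigue: no further effort is available, so the goal is lowered to match current performance.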
Figure 4. Robert & Hockey’s (1997) control system model of mental effort, including control
loops for automatic processes (Loop A) and for effortful processes (Loop B).
[Figure 4 diagram labels: Supervisory Controller, Task Goals, Effort Monitor, Action Monitor, Loop A, Loop B, external load, overt performance.]
Along the lines of Sanders’ (1983) dual-functionality of effort, Mulder (1986) draws a distinction
between two types of effort: compensatory effort and processing complexity. The basis for this
distinction is the observation that experimental manipulations of stressors, such as time constraint
or adverse environments, tend to result in different physiological responses compared to
manipulations of task demands. This concept was employed by Unema (1995), who found that one
measure of eye movement was responsive to manipulations of Sternberg memory task (Sternberg,
1969) difficulty (i.e., target set size), while another was responsive to both difficulty and a monetary
motivation. The duality of effort is interpreted by Robert & Hockey (1997) as set-points in their
model. They describe the effort set-point of loop A as being equivalent to the computational effort,
which is determined by task demands, and the loop B set-point as being driven by motivation due to
the perceived value of performance goals. It follows that the latter set-point would be more
susceptible to modification under stressful conditions.
2.3 Workload Measurement
This discussion will not consider those tools that are obviously impractical for use in the clinical
applications that have been previously outlined. Practical limitations include equipment
requirements (such as in the case of functional brain imaging of metabolic activity), intrusiveness
(concurrent, secondary task performance, e.g. random number generation while driving), and very
poor temporal resolution (body temperature or endocrine indices of urine, saliva, or blood plasma).
The literature from the 1970’s through the early 1990’s is very well summarized in the reviews of
Moray (1979, 1988), O’Donnell and Eggemeier (1986), Kramer (1991), Eggemeier et al. (1991),
Wilson and Eggemeier (1991), and de Waard (1996). In describing other, more current reviews,
Cain (2007) points out that the works of “rather few” researchers are repeatedly cited, indicating that
the growth of the field has slowed. Nonetheless, work has continued to some extent, some of
which is described in Tsang & Vidulich’s recent chapter in the Handbook of Human Factors and
Ergonomics (2006). Although the following discussion will reflect the fact that the bulk of the
workload measurement literature is over a decade old and frequently surveyed, references to newer
research will elaborate on these results wherever possible.
The various measures are organized into the three conventional categories: subjective measures,
performance-based measures, and physiological measures. Although saccadic eye movements are
considered a physiological measure, they are discussed in a separate sub-section, as their designation
as either a physiological or performance measure is debatable, depending on their particular
application (de Waard, 1996).
2.3.1 Subjective Measures
Asking participants to rate their effort is the most direct way to measure the phenomenal experience
of workload, and therefore subjective ratings are said to have high “face validity” (Cain, 2007).
However, their validity depends on the needs of the researcher. Where workload is conceptualized as
processing resource allocation, a subjective measure is not useful because it generally does not
correlate with task performance (Gopher & Braune, 1984). Furthermore, it has been found that
subjective effort is primarily affected by the demands of processes that are consciously “well-
defined,” such as those involving the working memory (O’Donnell & Eggemeier, 1986; Tsang &
Vidulich, 2005). There is little more known about the relationship between perceived effort and
other underlying cognitive processes. Pashler (1998) approaches the question analytically,
considering possible functions for it: sensations of effort may be an evolutionary advantage in that they
conserve mental energy, or they may serve to warn of the impending failure of an overloaded brain.
However, he concludes that empirical evidence has not lent itself to one or the other. A case study by
Naccache et al. (2005) of a brain-injured subject indicates that although subjective effort may
normally be linked to conscious control, it is not a prerequisite. Between a Stroop task that was
congruent (ink colour matched word) and incongruent, the participant did not perceive any change
in mental effort nor did they exhibit an increased skin conductance (arousal) response.
However, there are scenarios in which subjective effort ratings are appropriate. In the
aforementioned rehabilitation treatment application, an indication of effort sensation would be
suitable as a predictor of frustration. Nonetheless, care must be taken in designing the rating system,
to ensure that it actually reflects participants’ experiences. Referring to the close correlation between
subjective ratings and task difficulty, Gopher & Donchin (1986) assert that subjective ratings of
effort are no more than thinly disguised task analyses. That is, they suggest that it is naive to assume
that participants cannot recognize experimental manipulations of task difficulty, so that subsequent
ratings are more indicative of participants’ observations of these manipulations rather than their
“feelings” of effort. According to O’Donnell & Eggemeier (1986), the solution is careful wording of
the subjective rating system to ensure that participants understand what they are being asked to
report. An interesting footnote in the previously discussed study of Naccache et al. (2005) suggests
that it is reasonable to expect people to differentiate between imposed demands and perceptions of
effort. Although the brain injured subject said that they did not feel any change in effort, they were
able to distinguish a change in the objective difficulty of the task.
There are a number of subjective workload rating systems with different instructions and
administration methods. They can all generally be classed as unidimensional or multidimensional
scales. For the unidimensional type, participants rate their workload on a single numeric or
geometric scale. For the multidimensional type, more than one scale is used to differentiate between
different aspects of workload (e.g., time load versus stress load), then the scales are combined into a
single measure.
Two variations of unidimensional systems are guided scales and visual analog scales. The Modified
Cooper-Harper and Bedford scales are guided, meaning that they use a flowchart to assist the
participants in choosing a workload rating. From the samples published by Eggemeier & Wilson
(1991), it is clear that the utility of these particular scales is limited to the evaluation of engineered
systems. There are three unidimensional rating systems that have been validated against the more
widely researched multidimensional systems (discussed below): the 21-point Overall Workload Scale
(OWS, Vidulich & Tsang, 1987), the 15-point Rating Scale Mental Effort (RSME, de Waard, 1996),
and the open-ended magnitude estimation method of Gopher & Braune (1984). Whereas the OWS
involves a scale with the title “Overall Workload” and two scale labels at either end (“high” and
“low”), the RSME asks “how much effort” the task required and includes nine descriptive labels
(from “absolutely no effort” to “extreme effort”). Gopher & Braune’s (1984) scale uses a different
approach, giving participants a reference task to which a rating (number) is assigned. Participants
are then asked to assign a rating to subsequent tasks without any further limitations. There are also
examples of unidimensional scales that have been used without any documented attempt at
validation. Kennedy (2000) used a “mental demand” visual analog scale to successfully differentiate
between counting by 7’s and 3’s from a 3-digit number. Bergeur (2001) asked surgeons to use a 7-
point analog scale of “mental concentration and mental stress” to rate their mental effort during 2-
minute procedures.
The NASA-Task Load Index (NASA-TLX) and the Subjective Workload Assessment Technique
(SWAT) are the two most common multidimensional rating systems, and they are described in some
detail by Eggemeier & Wilson (1991). The NASA-TLX consists of six, 21-point scales (mental
demand, physical demand, temporal demand, performance, effort, and frustration) and a set of 15
paired comparisons to establish the relative contribution of each scale to the subject’s perceived
workload. The results of the paired comparison section are used to create a weighted average of all
the scales: the overall workload rating. The SWAT possesses only three, 3-point scales (time load,
mental effort load, and psychological stress load). For the purpose of establishing scale weights, 27
cards containing each possible selection on each scale are sorted into an order that reflects the
subject’s perception of increasing workload. Unlike the paired comparison in the NASA-TLX, the
SWAT sorting procedure takes place once: before the subject is asked to carry out the experimental
task.
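The two weighting procedures described above can be sketched in Python as follows. This is a minimal illustration only. It assumes that each NASA-TLX scale's weight is the number of paired comparisons in which it was chosen and that ratings are coded from 0 to 20; the function names and the example ratings are hypothetical and are not taken from the cited sources. The second function simply enumerates the level combinations that would make up the SWAT card deck.

```python
from itertools import combinations, product

# The six NASA-TLX scales named in the text.
TLX_SCALES = ["mental demand", "physical demand", "temporal demand",
              "performance", "effort", "frustration"]


def tlx_overall(ratings, winners):
    """Weighted average of the six scale ratings.

    ratings: dict mapping scale name -> rating (assumed coding: 0..20).
    winners: for each of the 15 scale pairs (in itertools.combinations
             order), the scale the participant judged more important.
    Assumption: a scale's weight is the number of comparisons it wins.
    """
    pairs = list(combinations(TLX_SCALES, 2))  # 6 choose 2 = 15 pairs
    if len(winners) != len(pairs):
        raise ValueError("expected one winner per paired comparison")
    weights = dict.fromkeys(TLX_SCALES, 0)
    for pair, winner in zip(pairs, winners):
        if winner not in pair:
            raise ValueError(f"{winner!r} is not a member of pair {pair}")
        weights[winner] += 1
    return sum(weights[s] * ratings[s] for s in TLX_SCALES) / len(pairs)


def swat_cards():
    """All combinations of three levels on three scales: 3**3 = 27 cards."""
    scales = ("time load", "mental effort load", "psychological stress load")
    return [dict(zip(scales, levels))
            for levels in product((1, 2, 3), repeat=3)]


# Hypothetical participant: the first-listed scale of every pair is chosen.
example_ratings = {"mental demand": 15, "physical demand": 3,
                   "temporal demand": 12, "performance": 8,
                   "effort": 14, "frustration": 6}
example_winners = [first for first, _ in combinations(TLX_SCALES, 2)]
overall = tlx_overall(example_ratings, example_winners)
```

For these hypothetical values the weighted rating is 10.2, whereas the unweighted mean of the six ratings is about 9.7; the difference arises because the most heavily weighted scale (mental demand) also received the highest rating.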
The incorporation of conjoint scaling (card sorting task) is intended to impart interval properties to
the rating, but it limits the practical scale length and, therefore, theoretical sensitivity of the SWAT,
for which convergent validity has been confirmed in comparisons with the TLX (Rubio et al., 2004).
Reid & Nygren (1988) have shown that SWAT ratings can indeed possess interval properties (i.e.,
two different ratings can be compared not only in terms of which one is higher, but by how much).
However, Annett (2002) argues that only scales for which there have been norms established for the
intended subject populations and tasks should truly be considered to have interval properties.
Fortunately, he says that only ordinal properties, which are implicit in any subjective scale, are
required for the experimental comparison of conditions, as opposed to establishing general design
standards. As a side note, Cain (2007) recommends the use of non-parametric analysis methods for
subjective rating data due to their ordinal nature.
The theoretical advantage of multidimensional ratings systems over unidimensional systems is the
ability to discern specific aspects of workload (“diagnosticity”). However, because there is no
apparent advantage in terms of sensitivity (Tsang & Vidulich, 1987; Hill et al., 1992; Zijlstra, 1993), it
is recommended that unidimensional ratings be used wherever diagnosticity is not needed (Hendy,
1993). Furthermore, Rubio et al. (2004) suggested that the diagnostic value of the NASA-TLX and SWAT
is questionable because these scales do not actually refer to the resource types in Wickens’ (1987)
multiple resource model. The obvious advantage of a unidimensional rating is its simplicity, which
can be reflected in the time to complete a rating. The SWAT is particularly problematic in this
regard, considering that the card sorting task can take as long as an hour to complete (Wierwille &
Eggemeier, 1993). Although the NASA-TLX has been reported to require only 60 seconds (on
average) for trained participants (Hill et al., 1992), its complexity has caused confusion in pilots
(Veltman, 1993). Furthermore, administration of the TLX to untrained (civilian) participants was
reported by Rubio et al. (2004) to take an average of 7.5 minutes, amounting to 60 minutes for the
eight different ratings.
A notable method of reducing the complexity of multidimensional rating systems is to discard the
scale weighting procedure (i.e., SWAT card sorting and TLX paired comparison set), instead
calculating overall workload as an equally weighted average. Following upon Hendy’s (1993)
successful demonstration of this method for TLX ratings, Goonetilleke & Luximon (2001) have
introduced a continuous SWAT (C-SWAT) system, which uses a visual analog scale in the place of
the original, discrete (three point) scale. In their evaluation of the C-SWAT, they report that the highest
sensitivity was achieved using this continuous scale and equal scale weights, as opposed to using
weights assigned by paired comparison sets (as are used by the TLX) or the usual card-sorting
method.
2.3.2 Performance-Based Measures
In a previous section, Theory of Mental Workload, two behavioural manifestations of mental
workload were identified: task performance and the perception of effort. It has previously been
discussed that subjective effort ratings have been shown to diverge from performance measures due
to limitations in our ability to self-monitor. Performance similarly cannot be considered an
uncomplicated measure of workload.
Indeed, the relationship between task demands and effort is complex and, therefore, not always
predictable, which leads Vidulich & Wickens (1986) to warn researchers against the careless
interpretation of performance and subjective effort ratings. To illustrate with an example: in the face
of increasing task demands participants’ performance scores generally decrease, but it is certainly
plausible in many scenarios that participants might work harder in order to maintain or even
increase task performance, such as was observed by Washburn & Putney (2001). Thus, when a
researcher observes an increase in task performance scores, additional measures are required to
clarify whether the subject is finding the task easier (i.e., learning of some type has occurred; anxiety
has decreased) or working harder.
Performance measures are also insensitive to workload changes in data-limited task scenarios, which
is often a characteristic of low demand tasks (Eggemeier et al., 1982). When task performance is
already at ceiling, any further exertion will obviously not affect performance. Furthermore, the
addition of stressors may not affect performance on a low demand task. Take, for example, a
memory span task wherein five digits are presented, and then they must be recited back after a short
pause. It is clearly impossible to recall more than five digits, no matter the effort expended. Also, if
the task was made more difficult through the additional stress of noise or uncomfortable heat, the
subjective effort rating would likely be affected, while performance in healthy participants most
likely would not. Therefore, while performance-based measures may have utility in more demanding
conditions, which is most often the case with dual task experiments, they can be insensitive to both
internally and externally driven changes in workload in low demand, single-task conditions. Notice
that this limitation is the primary impetus behind incorporating another measure of mental
workload into neuropsychological assessments.
There is also a possibility that different performance measures of a single task may diverge (i.e., show
speed-error trade-off). For example, most self-paced, discrete response tasks require participants to
strike some balance between accuracy and response latency, leaving them vulnerable to a speed-error
trade-off. Farmer & Brownson (2003) emphasize the importance of measuring both variables,
though an effort interpretation may not be possible in the event that they diverge. However, they
state that divergence is the exception rather than the rule, so measurement of the second variable is
usually only necessary as verification. It is also conceivable that participants may concentrate their
efforts on a performance variable that the experimenters are not measuring or even aware of.
Experimental design, including the choice of test instructions, appropriate control tasks and post-
experimental interview (to obtain a verbal report of subjects’ strategies) must be employed to avert
this problem.
As previously discussed, task performance is sometimes affected by wilful changes in effort for a
given task. Although an increase in performance may indicate a shift to a more efficient but also
effortful mental processing state (or strategy), it could also indicate task learning, which results in
higher efficiency but with constant or reduced effort. It is also plausible that participants may initiate
gross shifts in task execution strategy that would not generally be classified as “learning.” An
example might be the employment of sub-vocal rehearsal during the memory span task. Although
such gross shifts in strategy may be controlled through task training, detailed instructions, and
comprehensive interviewing, more subtle shifts in mental processing strategy may be undetectable.
To this end, performance measures must be corroborated with subjective ratings and validated
physiological measures.
2.3.3 Physiological Measures
In the search for more sensitive, objective, and instantaneous measures of workload, researchers
have turned to physiological signals. Whereas early studies might have concentrated on a single
measure in their analysis of task workload, more contemporary studies tend to employ multiple
physiological measures as well as performance and subjective rating indices (Kramer, 1991). The
rationale is that different measures tend to be more or less sensitive in various contexts as well as to
types and intensities of workload. A recent trend in workload research is to combine measures into a
predictive model that can be adapted from empirical results using, for example, neural network
training (Van Orden et al., 2001; Noel et al., 2005) or multiple regression analysis (Myung & Ryu,
2005).
Although these approaches may yet achieve success in their practical aims, they may be premature
when the underlying bases of the individual measures require further investigation. In 1988, Moray
wrote that progress in physiological measurement has been hampered by undeveloped theoretical
foundations, and with little examination of these underlying mechanisms, it is arguably still true
today. In a similar vein, it has been observed that measurement research has moved to complex,
applied environments such as aviation and driving, but conflicting results indicate that there is a
great deal of more controlled, laboratory work to be done. Thus, research on physiological measures
can still be considered in its infancy, having demonstrated a number of interesting phenomena, but
demonstrating little by way of synthesizing them (Kramer, 1991).
2.3.3.1 Electroencephalographic Activity (EEG)
EEG-based measurements of workload have been attempted in two forms: frequency band power
and event related potentials (ERP). Both forms are subject to the same practical difficulties,
especially in clinical applications: low signal-to-noise ratio (environmental as well as muscle movement
and cardiac artefacts), complex signal interpretation, and intrusiveness. These difficulties may limit
the practicality of this measure for some clinical applications.
EEG spectral analysis is a more controversial workload measurement technique. Although variations
in alpha (8-13 Hz) and theta (4-7 Hz) band power have been correlated with task difficulty in some
studies, others have revealed serious complications arising from individual differences and the
confounding factor of overall arousal level (Kramer, 1991). Furthermore, O’Donnell and Eggemeier
discounted the utility of this measure on the grounds of its insensitivity in their widely cited review
(1986). However, more recent studies of relatively fine gradations in one-dimensional tracking task
difficulty (verified with subjective workload ratings) revealed a significant effect on a combined
alpha band and blink rate measure (Myung & Ryu, 2005) as well as on a multi-band composite
measure (Berka et al., 2007). However, field observations of pilot workload by Noel et al. (2005)
revealed no pattern in any frequency bands.
Researchers have identified a number of characteristic EEG signal peaks that occur in response to a
subject’s active processing of a discrete event. Experimentally, ERPs are elicited using auditory or
visual stimuli that are either embedded in the task or presented as a secondary task. One of these
ERPs in particular, the P300 (positive potential occurring between 250 and 500 ms after stimulus
presentation), is the most commonly studied in the workload literature. Kramer (1991) cites a
number of studies in which P300 amplitude correlates to changes in both task demands and priority
shifts between the primary and secondary task in dual-task experiments. The effects of task difficulty
manipulations on the P300 suggest an added complexity to the measure. In reaction time
experiments, response selection difficulty affected only P300 amplitude while manipulations of
perceptual difficulty affected both latency and amplitude (Wickens & Hollands, 2000).
2.3.3.2 Surface Electromyography (EMG)
Tonic tension in task-irrelevant muscles has been associated with various components of mental
workload. The theoretical basis for this observation is that muscle tension is a component of
activation, which is a concomitant of effort and response performance. In a review, Unema
(1995) cites studies in which changes in tonic forearm EMG amplitude are linked to monetary
motivation, task learning, subjective effort ratings, and reaction time performance. In the reaction
time experiments, muscle tension effects were observed just before the actual response was made.
However, the results of Unema’s own experiments, which were composed of Sternberg memory
tasks (Sternberg, 1969), did not consistently support these studies.
According to de Waard (1996), EMG studies of workload are more recently favouring facial sites
rather than forearm sites, specifically the “lateral frontalis muscle, the corrugator supercilii and
orbicularis oris inferior.” They suggest that the frontalis muscle is especially suited to mental
workload measurement, because the others have been found to respond to emotionally charged
stimuli. However, forearm EMG sites have continued to be used, such as in the study of Papadelis et
al. (2007), which reported a significant effect on forearm extensor muscle EMG activity from flight
simulation task difficulty.
As with EEG-based measures, the clinical use of EMG is limited by practical issues of signal-to-noise ratio
and, to a lesser extent, intrusiveness. Furthermore, physiological and anatomical variability limit its
use to within-subjects designs. Contradictory results have led O’Donnell & Eggemeier (1986) to
question the simplicity of its interpretation, asserting that the EMG signal may indicate not only
sympathetic activity, but also somatic efforts to counteract poor motor performance as a result of
sub-optimal sympathetic activity.
2.3.3.3 Electrocardiogram (ECG)
Heart Rate (HR)
Perspectives on HR as a mental workload measure are mixed. Wilson (1991) is a cautious proponent
of HR, which was a U.S. government certified measure of aviation workload at the time of writing.
While studies of flight simulation did not report reliable effects, HR correlates with difficulty ratings
by telemarketers. He suggests that HR reflects a degree of psychological stress that only occurs where
there are perceived consequences. This view is corroborated in Wilson’s more recent (2002) study,
wherein HR is found to be more sensitive than HRV to manipulations of flight demands. Papadelis
et al. (2007) also report a correlation between HR and the perceived difference in the difficulty of
passive versus active learning of a simulated dual tracking/vigilance task. However, there were no
special consequences to poor performance in this task.
In contrast to Wilson, O’Donnell & Eggemeier (1986) mention HR only briefly, suggesting that its
global sensitivity limits its usefulness. de Waard (1996) echoes this concern, specifying speech,
emotion, time-on-task, and physical exercise as important confounds. He cites the study of
Wierwille et al. (1985), which suggests that heart rate variability (HRV) is a more sensitive measure
of workload than HR. There have been attempts to interpret the contradictory HR findings of
previous studies in terms of various workload types, but the complexity of this interpretation has led
most to look toward HRV instead (Kramer, 1991).
Heart Rate Variability
HRV data have been analyzed using the time domain or the frequency domain, but the latter
dominates the workload literature. The advantage of spectral analysis is that changes in the spectral
power density of various frequency bands are thought to correspond to specific physiological
phenomena. Mulder (1985) suggested that spectral data should be categorized into low (0.02 - 0.06
Hz), medium (0.07 - 0.14 Hz), and high (0.15 – 0.50 Hz) bands, which each contain peaks caused by
HR oscillations due to temperature regulation, blood pressure regulation, and respiratory influences,
respectively. Jorna (1992) clarifies this explanation by describing the two primary causes of HRV: the
baroreflex and respiratory sinus arrhythmia (RSA). The baroreflex is a blood pressure control system
that can regulate sympathetic nervous system responses in peripheral resistance, venous tone,
ventricular contractility, and blood volume, as well as both sympathetic and parasympathetic (vagal)
responses in HR. RSA refers to the spontaneous speeding and slowing of the heartbeat that
accompanies each breathing cycle, a pattern that is predominantly caused by the down- and up-
regulation of parasympathetic activity at the sinoatrial node. Due to the differential effect of the
sympathetic and parasympathetic systems on the various frequency components of HRV, Jorna
argues that low frequency power (< 0.10 Hz) is indicative of sympathetic activity, high frequency
(RSA) power of parasympathetic activity, and their ratio of sympathovagal balance. However, only
the parasympathetic/high frequency correlation is supported in the review of Berntson et al. (1997),
with the caveat that respiratory frequency must not be abnormally low.
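The band decomposition described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the procedure of any cited study: the heart period series is assumed to have already been resampled at an even rate, a plain discrete Fourier transform is used in place of the FFT or autoregressive methods found in the literature, and the function name, sampling rate, and synthetic test signal are all hypothetical. Only the band limits are taken from the text (Mulder, 1985).

```python
import cmath
import math

# Band limits in Hz as quoted from Mulder (1985); the small gaps between
# bands (e.g. 0.06 to 0.07 Hz) are left unassigned, as in the text.
BANDS_HZ = {"low": (0.02, 0.06), "mid": (0.07, 0.14), "high": (0.15, 0.50)}


def band_powers(samples, fs):
    """Sum the spectral power of an evenly sampled series within each band.

    samples: heart period values (ms), assumed evenly resampled at fs Hz.
    Uses a direct DFT, which is slow but needs no external libraries.
    """
    n = len(samples)
    mean = sum(samples) / n
    centred = [x - mean for x in samples]  # remove the constant (DC) level
    powers = dict.fromkeys(BANDS_HZ, 0.0)
    for k in range(1, n // 2 + 1):  # positive frequencies only
        freq = k * fs / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centred))
        power = abs(coeff) ** 2 / n ** 2
        for name, (lo, hi) in BANDS_HZ.items():
            if lo <= freq <= hi:
                powers[name] += power
    return powers


# Synthetic 128 s record sampled at 2 Hz: an 800 ms mean heart period with a
# 30 ms oscillation in the middle band and a 10 ms oscillation in the high band.
FS = 2.0
N = 256
series = [800.0
          + 30.0 * math.sin(2 * math.pi * 0.09375 * i / FS)
          + 10.0 * math.sin(2 * math.pi * 0.25 * i / FS)
          for i in range(N)]
powers = band_powers(series, FS)
```

In this synthetic case nearly all of the power falls in the middle band, with a smaller contribution in the high band and essentially none in the low band. The oscillation frequencies were chosen to coincide with DFT bins so that no spectral leakage blurs the band totals; real heart period data would not be so tidy.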
Most studies of HRV and workload follow in the footsteps of Mulder’s early work (1979) by focusing
on spectral power reductions in the middle frequency band, although there is some evidence for low
and high frequency effects as well (Kramer, 1991; Mulder, 1992). Jorna (1992) presents a critical
review of these various investigations and concludes that HRV is not sensitive to fine gradations in
task difficulty but responds consistently to stressors and correlates well with perceived effort ratings.
Task difficulty manipulations such as pacing, visual stimulus quality, number of response options,
stimulus-response compatibility, stimulus timing uncertainty, and tracking task complexity did not
significantly affect HRV or in some cases the effects could be explained by concomitant changes in
physical activity or breathing. However, significant decreases in HRV were observed after coarse
manipulations of difficulty such as task/rest comparisons or the introduction of a secondary task as
well as gross changes in working memory demands that exceeded participants’ capacities.
Furthermore, HRV has been shown to correlate with between-subject effort ratings on a tracking
task, and is sensitive to task stressors such as time-load, inexperience, supervision/observation
during tasks, public speaking, and driving complexity. The influence of stressors in realistic task
scenarios may help to explain the correlation between HRV and piloting/driving difficulty in the
field studies described by Wilson & Eggemeier (1991).
Turning toward more recent studies of mid-band HRV power, Paas et al. (1994) found that HRV
was insensitive to changes in the difficulty of a learning strategy, but responded to the presence of
the task (versus a resting state). Unema (1995) discovered that during the search phase of a
numerical Sternberg Task (Sternberg, 1969), HRV responded to working memory load (1 versus 4
digits) and the introduction of a monetary incentive. Hilburn (1997) reported a significant effect for
changes in air traffic control automation level and traffic volume. Veltman & Gaillard (1998) found
correlations between HRV, “large” changes in flight simulation task difficulty, and subjective effort
ratings. In a study of finer difficulty gradations in a dual task, Myung & Ryu (2005) found that HRV
was significantly correlated with tracking task difficulty but not with concurrent arithmetic problem
difficulty, although both reliably affected subjective effort ratings. Thus, HRV may correspond to
changes in objective task difficulty even while subjective effort ratings are unresponsive.
Comparisons between studies are often complicated by the multitude of analysis options, but a
movement toward standardization has been underway through the 1990s (Cain, 2007). Notably, it is
common to see two types of spectral analysis methods: fast Fourier transformation and
autoregressive modelling. Berntson et al. (1997) explain that because the latter technique is designed
to exclude noise, it generally produces cleaner (although possibly simplified) spectra. However,
similarities between the two methods lead to essentially equivalent results.
HRV is not a robust measure in the sense that the effect of a single false-positive or false-negative R-wave
can potentially outstrip that caused by the manipulation of workload. Real-time, clinical HRV
measurements may be limited by the necessity of detecting and correcting such artefacts, a process
that is dealt with at length by Mulder (1992). Also, HRV is not an instantaneous measure, with
contemporary analysis techniques requiring a minimum window size of 30 – 40 seconds for middle
band power measurements (Mulder, 1992). Additionally, very lengthy windows are also problematic,
because the average heart rate should optimally remain constant.
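The sensitivity of HRV to a single detection error can be illustrated with a small Python sketch. As a simplifying assumption, it uses the standard deviation of the interbeat intervals (a time-domain index) as a stand-in for the spectral measures discussed above; the interval values and the helper function are hypothetical.

```python
from statistics import pstdev


def merge_missed_beat(rr_ms, index):
    """Simulate a false-negative R-wave.

    When the beat ending interval `index` is missed, that interval and the
    next one are recorded as a single, roughly doubled interval.
    """
    return rr_ms[:index] + [rr_ms[index] + rr_ms[index + 1]] + rr_ms[index + 2:]


# Hypothetical interbeat intervals in milliseconds.
clean_rr = [800, 810, 790, 805, 795, 800, 810, 790]
corrupted_rr = merge_missed_beat(clean_rr, 3)

clean_sd = pstdev(clean_rr)          # variability of the clean record
corrupted_sd = pstdev(corrupted_rr)  # variability with one missed beat
```

In this example one missed beat inflates the variability estimate by more than an order of magnitude, which is far larger than the changes typically attributed to workload manipulations. This is why artefact detection and correction must precede any HRV analysis.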
Furthermore, it is generally recognized that HRV is affected by age, physical fitness level, body
position, muscle activity, and respiration patterns (Jorna, 1992). Respiration is especially important
because it is affected by speech, which is a part of many mental tasks. Breathing becomes more
erratic during speech, which shifts the spectral power due to RSA from the high frequency band into
lower frequencies. As a result, the effect of effort on middle frequency power may be concealed or
otherwise affected. Porges & Byrne (1992) argue that the effect of speech patterns is negligible where
the task involves short, command-like verbalizations (< 10 s), which would suggest that most
psychological tasks would not require any special consideration in this regard. However, a recent
study by Beda et al. (2007) reports that while middle band power decreases between silent serial
subtraction and the resting state (as predicted), the effect is reversed when the task is done aloud.
Due to the effects of speech as well as spontaneous changes in breathing patterns, it is recommended
that respiratory rate measurements always accompany those of HRV, in order to assess the
frequency distribution of RSA (Mulder, 1992). Mulder also proposes an improved measure of
baroreflex gain that is resistant to the effects of respiration: the “modulus,” which is the blood
pressure variability divided by the HRV. Veltman & Gaillard (1998) confirm the effectiveness of this
measure in their study of a flight simulator task that involves sub-audible vocalizations.
2.3.3.4 Electrodermal Activity (EDA)
In the workload literature, EDA generally refers to the measurement of the skin’s conductance
through the application of a small current, although there is also a much less common technique that
does not involve an external current source. Skin conductance is thought to indicate sympathetic
activity because the sympathetic nervous system influences eccrine sweat gland secretions, although the possibility of
parasympathetic involvement has been raised (Unema, 1995). In a more general sense, EDA is linked
to the concept of arousal, supported by studies of stimulant/depressant injection, EEG desynchrony,
and habituation (Prokasy & Raskin, 1973). It follows that the theoretical justification for EDA in
workload measurement is that a more aroused state is associated with greater engagement in the task.
EDA measures are classified as phasic (“response”) or tonic (“level”), and phasic measures are
furthermore divided between specific and non-specific (“spontaneous”) responses. Phasic portions
of the EDA signal are temporary increases in conductance from the baseline (tonic) level of
conductance, which is an average taken across the task period. Specific responses are distinguished
from non-specific ones because they are identified as being caused by the presentation of an
experimental stimulus.
Tonic EDA is limited as a measurement tool because it is highly susceptible to inter-subject
physiological differences and electrode interface conditions. However, there have been relevant
investigations of this measure. Malmo (1965) concluded that tonic EDA was not significantly
correlated with participants’ motivation level during an auditory tracking task, although finger sweat
blot readings were correlated. In Malmo’s experiment, participants were instructed to exert more
effort during “important” trials versus “practice” trials. On the other hand, Berguer et al. (2001)
found that tonic EDA level and subjective effort covaried between two different surgical techniques.
The presentation of stimuli is generally too intrusive for most practical workload applications, so
specific EDA is also of limited utility. However, in laboratory studies of dual task conditions, larger
specific responses have coincided with secondary task performance decrements (Klinger, 1991). The
magnitude of participants’ responses has also been linked to their performance in correctly
perceiving (in a signal detection task) or memorizing (in a learning task) the stimuli, according to a
review by Andreassi (2000).
Non-specific responses are arguably the most practical EDA-related workload measure, and
empirical data suggest at least a weak link between workload and non-specific response rate and
amplitude (Klinger, 1991). This link is best supported by evidence of a correlation between response
rate and reaction time during vigilance tasks (Surwillo & Quilter, 1965; Andreassi, 2000). In
particular, Andreassi describes a between-subjects study in which reaction times were significantly
different between those participants who exhibited a high non-specific response rate (“labiles”) and
those who had low rates (“stabiles”). After failing to find a link between motivational incentives and
response rate, Fowles (1988) argued that non-specific EDA accompanies only negative feedback,
which results in behavioural inhibition. Given the pervasive role of inhibition in any mental task and
arguably, in the function of working memory itself (Hasher, 2007), it is difficult to directly test this
position. More recent within-subjects studies report significant effects of various workload
manipulations on non-specific responses. Gendolla & Richter (2005) reported that non-specific
response rate was significantly higher when participants were told that a visual detection task was a
“Concentration and Achievement Test for Students” versus a “filler” activity. Subject motivation and
short-term memory load (i.e., target list length) were also correlated with the standard deviation of
the raw EDA signal in Unema’s study (1995), which employed a numerical Sternberg Task
(Sternberg, 1969). In this study, the effect was especially apparent in the target list memorization
stage of the task.
Non-specific responses are commonly differentiated from signal noise by their peak-to-peak
amplitude. Amplitude thresholds can vary between 0.002 and 0.05 microsiemens (Doctor et al., 1964;
Vossel & Zimmer, 1990; Kettunen & Ravaja, 2000; Storm et al., 2000), as they are dependent on the
precise placement of the electrodes as well as their preparation and type. It appears that the
threshold is best determined through visual inspection of the signal, this being the gold standard
method employed by Storm et al. (2000) in their evaluation of an automated response detection
algorithm. Aperiodicity is another condition that has been used to distinguish non-specific
responses (Doctor et al., 1964). This condition was enforced in Storm’s algorithm through a
minimum (one second) wave width, which was defined as the time between a response’s valley and
subsequent peak. This threshold effectively eliminated “responses” due to a sinusoidal signal
component, and would also be effective in disregarding movement artefact noise.
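The two criteria described above, a valley-to-peak amplitude threshold and a minimum valley-to-peak time, can be illustrated with a minimal sketch. This is not the algorithm of Storm et al. (2000); the function name, the default amplitude of 0.02 microsiemens (an arbitrary value inside the range cited above), and the assumption of an already smoothed signal are illustrative choices only.

```python
def detect_nonspecific_responses(signal, fs, min_amplitude=0.02, min_rise_time=1.0):
    """Return (valley_index, peak_index, amplitude) for each candidate response.

    signal        -- skin conductance samples in microsiemens (assumed pre-smoothed)
    fs            -- sampling rate in Hz
    min_amplitude -- valley-to-peak amplitude threshold (assumed value)
    min_rise_time -- minimum valley-to-peak time in seconds
    """
    responses = []
    valley = 0       # index of the most recent local minimum
    rising = False   # True while the signal is climbing from a valley
    for i in range(1, len(signal)):
        if signal[i] > signal[i - 1]:
            if not rising:
                valley = i - 1
                rising = True
        elif signal[i] < signal[i - 1] and rising:
            peak = i - 1
            amplitude = signal[peak] - signal[valley]
            rise_time = (peak - valley) / fs
            # Keep only deflections that are both large enough and slow enough.
            if amplitude >= min_amplitude and rise_time >= min_rise_time:
                responses.append((valley, peak, amplitude))
            rising = False
    return responses
```

In this sketch, a rise that is still in progress at the end of the recording is not counted, and fast deflections such as movement artefacts are rejected by the rise-time criterion even when their amplitude exceeds the threshold.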
According to de Waard (1996), the most serious methodological issue with EDA measurement is its
“global sensitivity.” Not only is eccrine gland activity influenced by energetical responses to
emotional and workload-related factors, it is also subject to temperature, humidity, age, sex, time of
day, season, and respiration irregularities.
2.3.3.5 Respiration
de Waard (1996) and Wilson & Eggemeier (1991) cite laboratory studies in which respiratory rate is
positively correlated with both task difficulty and compensatory effort level. Much of this research is in
applied studies of aviation workload, although there are also more recent examples of unsuccessful
attempts in this area (Noel et al., 2005; Papadelis et al., 2007). Wientjes (1992) has argued that such
equivocal results necessitate the measurement of both respiratory rate and tidal volume, as their
work reveals a more complex pulmonary response involving both variables. While the introduction
of their experimental task led to the expected increase in respiratory rate and decrease in tidal
volume, subsequent manipulations of motivation through performance feedback led to an increase
in tidal volume with no change in respiratory rate. Wientjes (1992) presents a model in which
pulmonary control is affected by the metabolic demands of cognitive work; that is, as more effort is
expended, more oxygen is required. Although there is some evidence for the effect of cognitive work
on oxygen consumption (Backs & Seljos, 1994), the predominant pattern of fast, shallow breathing
indicates a strong influence from arousal and emotive stress responses (Roscoe, 1992).
In addition to being potentially difficult to interpret, respiratory measures are also subject to
significant confounding effects from physical activity and speech. As such, Cain (2007) advises that
respiration cannot be used alone to indicate workload. de Waard (1996) also expresses concerns with
the methodological difficulties in measuring tidal volume, which either involves a relatively invasive
flow meter or an indirect method (such as plethysmographic bands on the chest and abdomen) that
is less accurate and may require frequent calibration.
2.3.3.6 Eye Blinks
Blinks are classified as endogenous when they are not caused by any apparent external stimulus and
as reflexive when they occur in response to sudden, task irrelevant stimuli (Stern et al., 1984).
Examples of experimental stimuli are flashes of light and tapping on the forehead. The occurrence,
duration, and (reflexive blink) stimulus latency of a blink can be measured using photographic,
video scanning, infrared corneal reflection, electrooculography, and electromyography techniques.
Neumann & Lipp (2002) conducted experiments showing that the extent of subjects’ engagement in
a task may be indicated in the magnitude and latency of reflexive blinks. Although Kramer (1991)
recommends blink latency in his review of workload measurement techniques, the majority of the
literature concentrates on endogenous blinking, as its measurement is suited to a wider variety of
applications.
According to a review of the cognition literature by Stern et al. (1984), endogenous blink rate is
responsive not only to visual task workload, being inhibited until task-relevant information has been
processed, but also to non-visual workload, with the “...magnitude of blink inhibition being
proportional to attentional demands” (Stern et al., 1984, p. 26). The latter relationship is supported
by the findings of Holland & Tarlow (1972), in which blink rate is negatively correlated with the
number of digits in a memory span task and the difficulty of a paced arithmetic task. Further
supporting evidence was reported by Bagley & Manelis (1979), who used an arithmetic task of
variable difficulty. Finally, Ohira (1996) showed that the relationship extends to lexical workload by
using a word-naming task with variable target word difficulty. Stern et al. (1984) caution that
vocalization during the task can actually cause the reverse effect, as with the increase in blink rate
that occurs when participants are asked to carry out arithmetic problems aloud. A study by Holland
& Tarlow (1975) seems to indicate that vocalization may not be the only confounding factor in the
relationship between workload and blink rate. Their research suggests that endogenous blinks are
related to shifts in thought, termed “cognitive change.” This conclusion has been based on the
observations that blinks tend to punctuate individual solutions during a serial arithmetic task and
sentences during verbal conversation, while being subject to inhibition during mental imagery.
Blink rate has been used as an applied workload measurement tool with mixed success. At the time
of Kramer’s (1991) and Wilson & Eggemeier’s (1991) writing, there were a number of conflicting
blink rate studies involving both visual and non-visual task demand manipulations. Whereas these
results led Kramer to suggest that blink rate was not yet ready for application, Wilson & Eggemeier
explained that the conflicting findings followed a pattern: they were contaminated by qualitative changes in
visual information demands. An example was a study in which blink rate was shown to increase
between ground and flight segments, even though workload had clearly increased. It was explained that
the “richness and variety” of task-relevant visual information in the flight segment simply required a
greater number of fixations from the pilots. They claimed that these problems could be overcome
through better methodological controls.
The prospect of restrictive controls led O’Donnell & Eggemeier (1986) to recommend blink duration
over blink rate. Kramer (1991) shared this preference, citing studies in which average blink duration
was negatively correlated with workload in comparisons of simulated versus actual flight, co-pilots
taking over command, and single versus multitasking. However, with this method as with all eye
blink parameters, he warned that there is a significant fatigue effect wherein increasing time on task
leads to a greater number of blinks.
More recent workload studies have continued to concentrate on blink frequency rather than
duration, possibly because it requires less temporal resolution and robustness of measurement. A
significant negative correlation between blink rate and subjective workload was observed in a study
of actual flight manoeuvres (Noel et al., 2005), although the authors note that this finding was
inconsistent between different pilots and even test days. In studies of flight simulator tasks, Veltman
& Gaillard (1998) as well as Papadelis et al. (2007) reported a more consistent relationship between blink
interval (time between blinks) and simulated flight difficulty. However, in the case of Veltman &
Gaillard, who also introduced a concurrent working memory (WM) task, increasing working
memory load actually led to an increase in the number of blinks. They concluded that facial muscle
activity involved with a sub-vocal rehearsal strategy for the WM task may have encouraged blinking.
Berguer et al. (2001) found that eye blink rate followed subjective ratings in surgeons between open
versus arthroscopic techniques. Myung & Ryu (2005) observed that manipulations of target speed
during a tracking task led to a significant effect on blink interval, but no significant effect was
observed between single and double digit multiplication.
In the few, more recent studies of blink duration, the results have paralleled those of concurrent
blink rate measures. In the aforementioned flight simulation experiments of Veltman & Gaillard
(1998), blink duration decreased from easy to hard flight sections, but did not appear to be affected
by working memory load. Similarly, Papadelis et al. (2007) reported a significant negative correlation
between duration and simulated flight task difficulty.
2.3.3.7 Pupil Size
Pupil size is affected by two muscle groups: the dilator group is innervated by the sympathetic
nervous system and the constrictor group by the parasympathetic system. Empirical observations of
an apparent association between pupil diameter and arousal were pivotal to Kahneman’s theories on
attention and effort (1973). According to Kramer’s (1991) review, a positive correlation between
pupil diameter and workload has subsequently been reported for a variety of tasks with
manipulations of cognitive, perceptual, and response-related demands. He echoes Kahneman’s
assertion that these results are likely caused by cortical influence on the reticular core.
In more recent studies, pupil diameter has been generally successful in predicting workload.
Washburn & Putney (2001) reported a correlation between pupil diameter and visual recognition
task difficulty (stimulus presentation time) as well as task performance (response accuracy and
latency) on individual trials. Recarte & Nunes (2003) found that pupil diameter correlated well with
subjective effort ratings in an experiment involving driving alone or in conjunction with an auditory
perception, verbal production, or mental arithmetic task. However, they also reported an unexpected
divergence of pupil and subjective data with the introduction of a long-term memory recall task as
well as an apparent insensitivity of the measure during some dual-task conditions. The latter
exception could be explained by the tendency of the pupillary response to plateau or even reverse at
very high workload levels (Cain, 2007). Although most workload research aggregates pupil diameter
measurements over the course of a task or condition, Klingner et al. (2008) adopted a technique that
is more common in cognitive research. They reported the change in diameter upon presentation of a
task-related stimulus, which was a multiplicand in their case. The magnitude of pupil diameter
change was observed to be correlated with the difficulty of the multiplication problem.
It appears that the pupil diameter is primarily restricted as a workload measure due to
methodological issues. The effects of illumination, reflexive responses to vergence (between near and
long distance fixation), and emotion are known to be larger than those typically caused by cognitive
factors (O’Donnell & Eggemeier, 1986; Kramer, 1991). Furthermore, measurements of the pupil on
the order of 0.1 mm are required, although advances in remote, infrared corneal reflection systems
now allow such precise measurements on participants without the use of a chin rest or head gear
(Klingner et al., 2008).
2.3.4 Saccadic Eye Movements
2.3.4.1 Definition of Saccades
Carpenter (1988) divided eye movements into two functional categories: catching (fast) and holding
(slow) movements. Catching movements include saccades and the quick phase of nystagmus.
Holding movements include vergence as well as vestibular and smooth pursuit movements, which
both include the slow phase of nystagmus. Carpenter also describes three types of micro-movements
(i.e., median amplitude less than 5 minutes of arc), which are involuntary and have not been shown
to serve any functional purpose: tremor, microsaccades, and drift.
Saccades are commonly differentiated from other movements by their higher peak velocities. Yarbus
(1967), who has been cited as an important source for empirical saccade data (Duchowski, 2007),
reports that the duration, peak velocity, and acceleration of a saccade are a function of its amplitude:
for 5° to 20° saccades, these values typically range from 40 to 70 ms, 200 to 450 °/s, and 15,000 to
20,000 °/s², respectively. The relationship between these various characteristics of saccades can be
described by a series of mathematical functions called the “main sequence” (Bahill et al., 1975;
Carpenter, 1991). Yarbus (1967) does not discuss saccades beyond 20° because they naturally occur
as composites of smaller saccades and head movements, but larger saccades have been observed in
laboratory experiments. For example, Bahill et al. (1975) detailed the (typical) results from a single
subject, whose 50° saccades reached 900 °/s and lasted 100 ms. Importantly, very small saccades may
have peak velocities that are less than those observed during “slow” holding movements. Bahill et al.
(1975) reported that the peak velocity of a 0.5° saccade was approximately 45 °/s in their test subject,
while Hood (1975) demonstrated that the slow phase of optokinetic nystagmus can surpass 50 °/s.
However, such high speed holding movements do not occur in the absence of appropriately fast
moving stimuli, so the velocity of saccadic movements can generally be considered greater than
other types of movements.
Saccades can generally be considered ballistic movements (Carpenter, 1988), meaning that a second
saccadic movement cannot be instigated until the first has already been completed as it was
originally programmed. Thus, saccades follow stereotyped trajectories, so that their occurrence,
onset, and offset can be defined by velocity thresholds. A survey of the literature reveals that for
studies including saccades smaller than 2°, a detection (peak velocity) threshold in the
neighbourhood of 30 °/s is most common. In the summary of this survey (Table 1), note that
acceleration and jerk thresholds may also be used.
Table 1. Saccade Detection Thresholds in Previous Literature
Citation                  Velocity Threshold (°/s)              Other Criteria
Viire et al., 1987        75 (onset and offset)                 -
Oohira et al., 1991       20/30 (onset/offset)                  -
Ignace et al., 1997       100 (peak); 25 (onset and offset)     2.1° maximum amplitude (to omit corrective saccades)
Fischer et al., 1997      30 (peak); 20 (onset and offset)      -
Hooge & Erkelens, 1998    100 (peak); 25 (onset and offset)     2.1° maximum amplitude
Wyatt, 1998               -                                     2 × 10⁵ °/s³ minimum jerk (onset)
Walker et al., 2000       30 (peak); 20 (onset and offset)      -
Greene & Rayner, 2001     35 (peak)                             9500 °/s² minimum acceleration (peak)
Harbluk & Noy, 2002       30 (peak)                             zero velocity crossing (onset and offset)
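As an illustration of how velocity thresholds of the kind listed in Table 1 can define the occurrence, onset, and offset of a saccade, a minimal sketch follows. It assumes a one-dimensional gaze position trace in degrees, a fixed sampling rate, and no noise filtering; the default values of 30 °/s (peak) and 20 °/s (onset and offset) are borrowed from the Fischer et al. (1997) and Walker et al. (2000) rows, and the function name is hypothetical.

```python
def detect_saccades(position_deg, fs, peak_threshold=30.0, edge_threshold=20.0):
    """Return (onset_s, offset_s) pairs for movements that qualify as saccades.

    position_deg   -- gaze position samples in degrees (one axis, assumed)
    fs             -- sampling rate in Hz
    peak_threshold -- minimum peak speed in deg/s for a movement to count
    edge_threshold -- speed in deg/s that marks onset and offset
    """
    # Speed between consecutive samples, in degrees per second.
    speed = [abs(b - a) * fs for a, b in zip(position_deg, position_deg[1:])]
    saccades = []
    i = 0
    while i < len(speed):
        if speed[i] >= edge_threshold:
            start = i
            while i < len(speed) and speed[i] >= edge_threshold:
                i += 1
            end = i  # first sample at which the eye has slowed below the edge threshold
            # Reject slow movements that never reach the peak threshold.
            if max(speed[start:end]) >= peak_threshold:
                saccades.append((start / fs, end / fs))
        else:
            i += 1
    return saccades
```

A movement that exceeds the onset threshold but never reaches the peak threshold, such as a slow drift or pursuit segment, is discarded in this sketch, which mirrors the two-level criteria used in several of the cited studies.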
2.3.4.2 Definition of Intersaccadic Interval (ISI)
The ISI is defined as the time lapse between the end of one saccade and the start of the next. A very
similar measure is the fixation interval or duration, which is the length of time that an external target
is foveated. In the absence of vestibular and smooth pursuit movements, these two intervals are
identical. Although the term “fixation” implies that our visual attention is aligned with our gaze, it
has been shown that this need not be the case (Posner, 1980; Sigman & Coles, 1980). Nonetheless,
fixation duration is a more common term in studies of visual information processing, which must
often assume some link between gaze and attention. In a practical sense, a fixation is generally
defined using a spatial threshold, but the ISI is defined by the occurrence of consecutive saccades,
which are detected using dynamic property thresholds.
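The definition above translates directly into a calculation on a list of detected saccades. The sketch below assumes that each saccade is represented as an (onset, offset) pair in milliseconds; that representation and the function name are illustrative assumptions.

```python
def intersaccadic_intervals(saccades_ms):
    """Return the time from the end of each saccade to the start of the next.

    saccades_ms -- iterable of (onset_ms, offset_ms) pairs, one per saccade
    """
    ordered = sorted(saccades_ms)  # order by onset time
    return [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]


# Three saccades yield two intervals.
print(intersaccadic_intervals([(100, 140), (400, 430), (500, 520)]))  # [260, 70]
```

Because the interval is bounded only by saccade offsets and onsets, no spatial criterion is involved, which is the practical difference from a fixation defined by a spatial threshold.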
It has been mentioned previously that saccades are generally considered ballistic movements, which
consequently implies some minimum ISI that is a refractory period. Although it has been shown that
this assumption cannot be made under all circumstances, there is most often the practical limitation
of saccade programming time. For example, corrective saccades, which are a response to
over/undershooting a target, occur after the primary saccade has been completed, with a latency of
about 130 ms (Becker, 1969). Even in response time tasks where the temporal and spatial uncertainty
of stimulus presentation is removed, saccadic latency is at least 150-175 ms (Rayner, 1998). Leigh &
Zee (1999) claim that approximately 70 ms is required for visual information to be absorbed and
begin to affect the eye movement centers in the brainstem. The existence of express saccades, which
are low latency saccades (> 80 ms) that occur when the primary fixation point disappears shortly
before presentation of a target stimulus (Fischer et al., 1997), suggests that some additional time may
be required to disengage the current point of fixation. Of course, the theory of an obligatory
refractory period caused by saccade programming and fixation disengagement implies that these
operations cannot occur in advance of or in parallel with saccadic movements. On the contrary,
Leigh & Zee (1999) argue that the apparent inability of eye movements to be modified during the
execution of a saccade is really only a consequence of their short duration. In other words, if there
were enough time to program a second saccade signal during the execution of the first, there would
appear to be no refractory period. They support this view with evidence of seamless changes in the
trajectory of “slow” saccades, which occur due to certain neurological disorders, as well as in healthy
participants when a saccade is executed toward a target that unexpectedly moves on a two
dimensional plane. Naturally, this movement must occur just after the first saccade movement has
been programmed, but early enough that the second saccade signal can influence the movement
before it has been completed.
Citing the infrequency of these conditions in everyday life and our apparent inability to process
visual information during very short intersaccadic intervals, some researchers exclude fixation
durations below a threshold of 70 to 100 ms (Unema, 1995; Falkmer & Gregersen, 1999). In their
latter argument, they are referring to saccadic suppression, which is a partial visual impairment
accompanying the execution of a saccade. This impairment is thought to be almost complete during
a period lasting from 20 ms before the start of a saccade to 50 ms after the completion of a saccade
(Stark et al., 1976). Due to saccadic suppression, it could be argued that a fixation shorter than 70 ms
cannot possibly contribute to visual information processing, and therefore should be disregarded.
Naturally, this argument does not apply to studies that are not concerned with eye movements related to
visual information encoding. Their former argument, on the infrequency of very short fixations, is
contentious. Whereas Cohen (1977) called the number of fixations less than 100 ms long negligible,
Velichkovsky et al. (2000) report that 7% of their participants’ fixation durations were in the
neighbourhood of 60 ms. Furthermore, they found that the frequency of these short durations was
especially sensitive to changes in driving simulator task conditions. Clearly, the use of a minimum
fixation duration or intersaccadic interval threshold is not appropriate without some justification
regarding information uptake.
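One way to make the consequences of such a threshold explicit is to report how many intervals it would remove instead of discarding them silently. The following sketch assumes interval durations in milliseconds and uses 70 ms as the default, taken from the lower end of the range cited above; the function name and return format are hypothetical.

```python
def summarize_short_intervals(intervals_ms, threshold_ms=70):
    """Split intervals at a minimum-duration threshold and report the share below it.

    Returns (retained, short, proportion_short).
    """
    short = [d for d in intervals_ms if d < threshold_ms]
    retained = [d for d in intervals_ms if d >= threshold_ms]
    proportion_short = len(short) / len(intervals_ms) if intervals_ms else 0.0
    return retained, short, proportion_short
```

Reporting the proportion of short intervals alongside the retained data allows a reader to judge whether the excluded intervals are negligible or a potentially informative part of the distribution.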
2.3.4.3 Eye Movements and Mental Workload
Rayner’s (1998) review presents ample evidence for a correlation between fixation duration
and visual task (e.g., reading) difficulty in the laboratory. More recently, a similar effect has been
sought in research on applied workload measurement, leading Zhang et al. (2004) to comment that
fixation duration is one of the most commonly studied estimators of driver workload. However, the
results of applied workload studies have been inconsistent and difficult to interpret because they
commonly involve complex visual and non-visual task components. In order to understand these
results better, research concerning the effect of non-visual tasks on eye movements will be reviewed.
Fixation durations are correlated with the difficulty of a variety of tasks involving visual information.
In matching comparison characters with target characters, the legibility (Unema, 1995) and number
(Gould, 1973; Unema, 1995) of the comparison characters was found to positively correlate with
average fixation duration. In a visual search of dot clusters where participants were asked to find a
specified cluster size, fixation durations of individual clusters positively correlated with the number
of dots they contained (Findlay & Kapoula, 1992). Furthermore, Moffitt’s (1980) review cited many
other studies in which fixation duration is linked to visual search difficulty. In reading
research, fixations on text are longer when the text is less legible or more difficult, whether
lexically or, in the case of math word problems, computationally (Rayner, 1998).
These results may seem trivial because it is logical to assume that we fixate a target for as long as is
necessary to process it. However, one caveat is that our ability to extend fixations “online” seems
limited. Using the example of reading, while it is true that more difficult text is fixated for longer
periods, it is also refixated more often (Rayner, 1998). The occurrence of refixations indicates that
fixation durations are, at least partially, resistant to online modification in response to processing
demands. Refixations are also present during visual search tasks, leading Hooge et al. (1998) to
suggest a model wherein fixation duration is pre-programmed based on information from the visual
periphery. It has even been argued that all saccade production is a rhythmic phenomenon that is
independent of “any internal or external stimuli” (Filin, 2002, p. 181). The practical implication of this
issue is that the number of refixations may be just as important as fixation duration statistics
in predicting visual workload. These two measures can be taken separately, as was recommended by
Blanchard (1985), or in a combined measure such as dwell time, which is the cumulative time that
any given target is fixated. Through their studies on mental rotation of graphical figures, Carpenter
& Just (1978) concluded that this measure is more strongly correlated with the difficulty of the rotation
task than average fixation duration alone. In the mental workload literature, there have been studies
incorporating dwell time or a similar measure (Tole et al., 1982; Harbluk & Noy, 2002; Matessa &
Remington, 2005).
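The distinction between average fixation duration and dwell time can be made concrete with a short sketch. It assumes that each fixation has already been assigned to a named target and is recorded as a (target, duration in ms) pair; the data format and function name are illustrative assumptions and do not describe the procedure of any cited study.

```python
from collections import defaultdict


def dwell_summary(fixations):
    """Summarize fixations per target.

    fixations -- iterable of (target, duration_ms) pairs in viewing order
    Returns {target: {"dwell_ms", "fixations", "mean_ms"}}.
    """
    totals = defaultdict(lambda: {"dwell_ms": 0, "fixations": 0})
    for target, duration_ms in fixations:
        totals[target]["dwell_ms"] += duration_ms   # cumulative time on the target
        totals[target]["fixations"] += 1            # includes any refixations
    return {
        target: {**stats, "mean_ms": stats["dwell_ms"] / stats["fixations"]}
        for target, stats in totals.items()
    }
```

In this sketch a target that is fixated twice for 200 ms and 250 ms has a dwell time of 450 ms but a mean fixation duration of only 225 ms, so dwell time captures the additional processing reflected in the refixation.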
The relatively consistent correlation between fixation duration and the difficulty of (laboratory)
visual tasks has been successfully applied to workload measurement, as most common applications,
such as aviation and driving, involve high visual loads. For example, a very early study concerning
fixations of flight gauges found that durations were longer where the gauge was more difficult to
read or interpret (Fitts et al., 1950). However, more complex and varied visual stimuli have led to an
array of results and interpretations. One major consideration is whether cognitive/perceptual load is
manipulated at the level of individual stimuli, as with the visual tasks previously discussed, or rather
through increasing the number of visual stimuli to be processed. Where drivers are subjected to the
latter manipulation such as in busy versus quiet intersections (Rahimi et al., 1990) and rural versus
urban areas (Chapman & Underwood, 1998), average fixation durations have decreased rather than
increased. Conversely, a net zero effect may result in a task in which there is a significant cognitive
load involved with integrating information from various stimuli. Brookings et al. (1996) reported no
significant response of saccade rate to either the traffic density or the complexity of air traffic control
simulations. Similarly, Van Orden et al. (2001) found no significant effect for target number on
fixation frequency in an air warfare simulation, but there was a notable increase in the number of
long fixations (> 500 ms). Unfortunately, this measure was not reported by Brookings et al. Moray’s
(1986) inconsistent findings of fixation durations under varying flight phase difficulty suggest that
the effects of complex visual/cognitive workload manipulations may also be highly dependent on
individual study participants’ coping strategies.
Dual-task studies are a method of conducting a more closely controlled investigation of eye
movement responses to cognitive load during applied tasks. Tole et al. (1982) presented pilots with
recordings of number pairs and asked them to indicate whether they were in ascending or
descending order. The addition of this secondary task, which required auditory-verbal working
memory and auditory attention, led to a rightward shift in the histograms of flight instrument
fixation durations, indicating that increasing workload leads to longer fixations. Callan (1998)
similarly reported that a secondary computational task led to increases in average fixation duration
and the number of long fixations (> 500 ms) in pilots. Indicating a related pattern, saccade frequency
was found to decrease in drivers asked to carry out a secondary arithmetic task, decreasing further
for double digit versus single digit addition (Harbluk & Noy, 2002). However, it should be noted that
no significant effect was found in drivers asked to do paced addition in a study by Tsai et al. (2007).
Secondary tasks other than auditory-verbal working memory / auditory attention tasks involving
mental arithmetic seem to affect eye movements differently. Recarte & Nunes (2000) studied the
effects of driving with a concurrent verbal fluency task or mental imagery task. They found that
while the imagery task (requiring visuospatial working memory) caused an increase in average
fixation durations, the verbal fluency task did not. They also observed that the former effect was due
to the occurrence of a few “very long” fixations, while the majority of fixations did not seem to be
affected by the presence of the task.
If the secondary task simply caused an interruption of the habitual scanning involved with the
primary task, the introduction of any secondary task would consistently lead to longer fixation
durations. However, since the resultant effects appear to depend on secondary task type, this
raises the question of whether and how different non-visual tasks affect eye movements on their own.
2.3.4.4 Eye-Movements and Non-Visual Tasks
In general, it can be said that a non-visual task tends to increase the rate of eye movements
compared to a baseline, resting state. This has been shown to be true for self-paced multiplication,
whether performed with eyes closed (Lorens & Darrow, 1962) or even under hypnosis (Amadeo & Shagrass, 1963);
imagining a scene (Amadeo & Shagrass, 1963); creating mental anagrams (Andreassi, 1973); naming
words beginning with a letter (Ruth & Giambra, 1974); and in auditory vigilance (Amadeo &
Shagrass, 1963; Amadeo & Gomez, 1966) and tone detection (Antrobus, 1973) tasks. Experimental
control of the baseline state is an obvious issue for these studies. Some researchers instructed
participants to “relax as though going to sleep” (Andreassi, 1973), “keep your mind blank” (Lorens &
Darrow, 1962), or simply “relax” (Amadeo & Shagrass, 1963), while others used the pre-task period
as a baseline, with no special instructions (Antrobus, 1973; Ruth & Giambra 1974). It is also
important to note that these authors did not report on saccadic eye movements as they are currently
defined, but on more general, sometimes undefined (Andreassi, 1973), “eye movements.” Most
defined these movements in terms of electrooculogram thresholds, whether in microvolts (Lorens
& Darrow, 1962) or mm of trace deflection (Amadeo & Shagrass, 1963; Ruth & Giambra, 1974;
Amadeo & Gomez, 1966). Antrobus (1973) used “ocular quiescence” intervals, in which no eye movement
was greater than 3°. Although these various definitions may seem arbitrary, they likely reflect the
technical limitations of the time, such that saccade detection came down to visual inspection of output
traces.
In subsequent experiments, researchers have avoided using a baseline, resting state by comparing eye
movements between different task conditions. Bergstrom & Hiscock (1988) found that more
movements (detected manually from video footage) were made by participants who answered
verbally administered questions a) of a lexical rather than a visuospatial nature, b) that involved a
higher degree of mental imagery, and c) that were less constrained. Their use of “constraint”
refers to the extent of the environmental support required to answer the question; in other words,
a task with recall demands versus a task with only perceptual and recognition demands. An example
of their high imagery, unconstrained questions is “Name three printed capital letters that contain
four straight lines.” A moderate imagery, constrained example is “How many vowels are present in
the word: ‘directly’?” Using a similar methodology, Weiner & Ehrlichman (1976) and Ehrlichman &
Barrett (1983) each found that visuospatial questions elicited fewer eye movements than lexical or
otherwise non-visuospatial questions, as determined through visual inspection of video recordings
and electrooculograms, respectively.
It is difficult to interpret these results in terms of mental workload; although it could be argued that
a recall task is generally more demanding than a perception and recognition task, they involve very
different processes, which may confound the effect of workload. However, this line of research has also
given rise to more applicable generalizations about eye movements and cognition. Some researchers
discussed the possibility of a positive correlation between eye movement rates and general attention
level (Lorens & Darrow, 1962; Amadeo & Shagrass, 1963), due to a concomitant increase in arousal.
More recent research tends to favour an alternate theory, “interference avoidance” (Antrobus, 1973;
Weiner & Ehrlichman, 1976), in which eye movements are suspended to avoid interference between
visual information processing and the non-visual task.
Although the justification for each of these theories, which will be discussed later on, is compelling,
there is a scarcity of empirical data from studies in which effort is manipulated while the cognitive
process/strategy itself is held as constant as possible. This type of experimental control is important
to workload research, which must differentiate between the effects of effort and those of mental
operations themselves. Thus, inconsistent cognitive activity complicates the interpretation, from
the perspective of mental workload, not only of task versus baseline rest state studies but also of
experiments in which disparate tasks are compared. An example of the latter is that of Klinger et al.
(1973), who observed more frequent eye movements during tasks designated as “high
concentration” (Wechsler Adult Intelligence Scale [WAIS; Wechsler, 1955] arithmetic problems,
word generation based on starting letter, and creation of mental anagrams) versus “low
concentration” tasks (paced and self-paced counting by 2’s). However, there have been studies in
which task difficulty was more carefully manipulated. Singer et al. (1971) reported that the frequency
of quick optokinetic nystagmus (OKN) movements (while viewing a rotating drum) was positively
correlated to arithmetic task difficulty. Participants were asked to carry out three types of
transformations on a given number, N : N +1, +1, +1... (low difficulty); N +1, +2, +3... (medium
difficulty); N +1, -1, +2, -1... (high difficulty). Although saccadic and quick OKN movements were
referred to as separate phenomena, they are believed to share the same neurological pathways (Fuchs
et al., 1985), to the point that optokinetic stimulation has been proposed as a clinical means of
saccade production (Garbutt et al., 2001). Nevertheless, their disparate behavioural functions should
be noted. In another study of task difficulty, Antrobus (1973) described an experiment in which eye
movements became less frequent as working memory load was increased in an auditory (tone)
perception task. Working memory load was manipulated in terms of the length of tone patterns that
participants were asked to recognize amongst random tones. Rather than manipulate task difficulty,
Ruth & Giambra (1974) attempted to manipulate participants’ concentration levels with a constant
task condition, which could be considered novel even in current workload literature. After randomly
presented letter prompts, participants were asked to name words that started with that letter. The
high and low concentration conditions were effected through the task instructions, which either
“emphasized alertness and maximum output” or did not. The study found that eye movement rates
were higher during the high concentration condition.
A positive correlation between eye movement rate and attention may be due to an associated
increase in arousal. The basis for a link between eye movements and arousal lies in the neurology of
saccadic programming. As described by Leigh & Zee (1999), the programming process involves
many different areas in the brain and possibly different pathways, depending on the type of saccade.
For example, whereas the parietal eye field has been implicated in triggering reflexive saccades,
which occur in response to unexpected stimuli, voluntary saccades are thought to be more heavily
influenced by activity in the frontal eye fields (Leigh & Kennard, 2004). However, all of these
pathways end in the coordination of burst neurons, which discharge at a very high rate for a short
period of time, omnipause neurons, which inhibit burst neuron activity during fixations, and neural
integrator cells, whose tonic output is equivalent to the integral of the burst neuron output. The final
saccadic signal that travels through the ocular motorneurons is a combination of the burst signal,
which moves the eye, and the neural integrator signal, which subsequently holds the eye in place
against elastic centering forces from supporting tissues. The location of these burst and omnipause
neurons within the diffuse reticular formation suggests an influence of arousal on saccadic
production. Unema (1995) argued that the relationship between saccade rate and arousal is also
supported behaviourally, as heightened arousal is marked by a greater sensitivity to sensory stimuli,
leading to more frequent saccade production. Early theories of attention and eye movements often
relied on a link between arousal/activation and saccadic rate (Antrobus et al., 1964; Singer et al.,
1971; Andreassi, 1973), but this association was poorly substantiated. Although a positive correlation
between eye movement rate and alertness arguably exists in the extreme case of wakefulness versus
hypnosis (Amadeo & Shagrass, 1963), no significant correlation was found for more subtle (but also
more ecologically valid) manipulations in arousal: from resting to task conditions (Lorens &
Darrow, 1962; Singer & Antrobus, 1965) and in response to emotionally affective images (Hinton,
1982). However, in more recent years, manipulations of arousal through time on task have been
more successful, correlating saccade rate and mean fixation duration with fatigue (via performance
decrements and/or subjective ratings) during extended air traffic control simulation tasks (Stern et
al., 1994; McGregor & Morris, 1996; Stern et al., 1996), tracking tasks (Van Orden et al., 2000), and
flight simulations (Morris & Miller, 1996). There are also divergent results from studies of similar
phenomena; for example, Mousseau (2004) found no correlation between fixation duration and
fatigue in hockey players. In their 2000 review, Sirevaag & Stern conclude that the effects are best
observed in saccades whose performance is task-irrelevant. An example would be a return saccade,
which follows the presentation and retraction of an experimental stimulus.
It could also be said that saccades are inhibited during cognitive processes; this link would suggest a
negative correlation between eye movement frequency and workload. A common indication of this
phenomenon is that fixations often last longer than would be required to simply encode visual
information without further processing it. For example, it is believed that readers’ perception of text
is finished in the first 50-70 ms or so of a fixation, but average fixation durations are on the order of
300 ms (Rayner, 1998). Although there is an obvious reason for eye movements to keep pace with
cognitive processing in the case of reading and other visual tasks, why should this theory apply to
non-visual tasks? Certainly, the habitual inhibition of a visual scanning reflex would be an
evolutionary disadvantage in the face of natural dangers. On the other hand, it might enhance our
powers of mental concentration by discouraging interference between task-relevant cognitive
activity and shifts in visual attention, which necessarily accompany saccades according to most
research on the matter (Sigman & Coles, 1980; Shepherd et al., 1986; Findlay & Gilchrist 1998; Van
der Stigchel & Theeuwes, 2005; but see Stelmach & Herdman, 1997 for alternate view). In the non-
visual task literature, two explanations have been given for the apparent dependence of the
interference effect on task type. Weiner & Ehrlichman (1976) suggested that the strength of the
interference depends on the similarity of the task to the process of visual encoding/perception, which
would explain why the effect of a verbal task differs from that of a spatial task. Antrobus (1973)
theorized that because eye movements appear to be suspended during execution phases of mental
tasks, the rate of eye movements should be correlated to that of “cognitive change,” which would be
lower during an auditory perception task than during an arithmetic task. However, this explanation
is incompatible with the results of Singer et al. (1971), wherein more difficult (and therefore less
frequent) arithmetic operations led to more eye movements. In response, Antrobus (1973) suggested
the possibility of a variable strength effect, similar to that proposed by Weiner & Ehrlichman (1976),
as well as the possibility of a competing arousal effect. The combined effect of arousal and cognitive
interference on eye movement rate is conceptualized by Unema’s (1995) model, which divides a
fixation into two (possibly overlapping) stages: “cognitive elaboration,” in which eye movements are
inhibited due to some cognitive process(es) and “saccadic latency,” in which an eye movement is
allowed, but a target has yet to be chosen. The effect of increasing arousal is to lower the target
criterion threshold, decreasing the length of the saccadic latency stage. Unema’s two-stage model is
an extension of that presented by Fischer & Breitmeyer (1987), who concluded that saccades are
inhibited (rather than simply absent) during the engagement of visual attention.
In summary, it has been proposed that non-visual task eye movements may be influenced by mental
workload, but also by factors related to specific mental processes such as memory search constraint
or involvement of mental imagery. In order to evaluate the effect of workload on eye movements, it
is necessary to parse out the effect of process-specific factors by manipulating effort at different
difficulty levels of the same mental process. It is also important to bear in mind that the effect of
workload may be complex, due to the possibility of competing phenomena linked to arousal and
“cognitive interference.”
Chapter 3. Objectives and Hypotheses
One purpose of the previous discussion was to establish that rehabilitation treatments and
neuropsychological assessments stand to benefit from a clinical measure of mental effort.
Furthermore, the practicality of eye movement-based measures and their prevalence in the applied
research literature seem to recommend them to this purpose. However, evidence from some applied
studies as well as early investigations of eye movements during non-visual tasks suggests that the
relationship between intersaccadic interval lengths and workload may not be as robust for primarily
non-visual tasks as for visual ones.
It follows that the objective of the current study is to investigate the intersaccadic interval length
response to mental workload during non-visual tasks. More specifically, the tasks and experimental
conditions were chosen to address the practical question of whether eye movements could be used to
measure workload during non-visual neuropsychological assessment and cognitive rehabilitation
tasks. This focus serves two purposes: 1) to help fill a gap in the literature regarding the
potential for a general relationship between mental processes and eye movements during non-visual
tasks, and 2) to model the experiment as closely as possible after the intended applications of this
research. The latter is especially important in workload research because a universal measure of
workload is generally considered unfeasible.
This study will measure the occurrence of saccades in participants while they carry out the following
three experimental tasks at two different levels of difficulty: serial subtraction, verbal fluency, and
words in noise recognition. As there is no gold standard for workload measurement, a variety of
other indices will also be recorded in order to confirm that the manipulations of mental workload
were successful: heart rate (HR), spontaneous skin conductance response rate (a type of electrodermal activity),
self-reported (subjective) workload ratings, and task performance level.
Objective (1):
To demonstrate that average saccade rate changes in response to increased difficulty of the
experimental tasks. This will be achieved by administering each of the three neuropsychological
tests to participants at two levels of difficulty, while concurrently recording eye movements.
Note: Although the intersaccadic interval was initially the primary measure, preliminary data
collection indicated the average saccade rate to be a superior measure of the same phenomenon.
Full details of this decision can be found in the Full-Scale Study section.
Hypothesis:
Based on the results of the pilot study, which are discussed in the Pilot Study section, the response
of average saccade rate to difficulty level will differ between task types. In the auditory task, it is
expected that the average saccade rate will be lower for the high difficulty condition than in the
nominal difficulty condition. In the math and fluency tasks, the presence of a significant change is
hypothesized, but its direction is not.
Objective (2):
To demonstrate that a decrease in participants’ motivation level will be accompanied by a change
in average saccade rate. This will be achieved by measuring average saccade rate in a standard
condition where participants are asked to do “as best as they can” versus a second experimental
condition in which participants are asked to relax and disregard their performance.
Note: The initial aim of the study was to demonstrate an eye movement response to an increase in
motivation from a standard, baseline level. However, interviews conducted in pilot research
indicated that many participants were maximally motivated at the standard level. Therefore, in
order to have two levels of motivation, it was necessary to compare standard conditions to those
in which participants were asked to deliberately lower their motivation levels.
Hypothesis:
Based on the results of the pilot study, the response of average saccade rate to motivation level
will differ between task types. In the auditory task, it is expected that the average saccade rate will
be lower in the standard motivation condition than in the low motivation condition. In the math
and fluency tasks, the presence of a significant change is hypothesized, but its direction is not.
Objective (3):
To demonstrate that the average saccade rate response to changes in task difficulty and
motivation level converges with (i) electrophysiological measure findings (i.e., average
spontaneous skin conductance response frequency and heart rate), (ii) task performance, and (iii)
self-reports of “mental effort.”
Hypothesis:
For all three task types, average saccade rate findings will converge with electrophysiological
findings, task performance, and self-reports of subjective effort, for both task difficulty and
motivation manipulations. Specifically, it is expected that task performance will be negatively
impacted by task difficulty, but positively impacted by an increase in motivation level, while heart
rate and the rate of spontaneous skin conductance responses will be positively correlated to effort
for both experimental manipulations.
Objective (4):
Investigate whether another eye movement statistic, the occurrence of long ISI, responds more
consistently to experimental manipulations of workload than the average saccade rate measure.
Hypothesis:
Based on the results of the pilot study, long ISI, which are defined as being longer than 1500 ms,
will be significantly more prevalent where either the difficulty or motivation level during the
auditory task is increased. For the math and fluency tasks, the presence of a significant effect is
hypothesized, but its direction is not.
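As a rough illustration of the eye movement statistics named in these objectives, the following Python sketch computes intersaccadic intervals, the average saccade rate, and the proportion of long ISI from a list of saccade onset and offset times. The function name, input format, and example values are hypothetical; only the 1500 ms criterion comes from the text, and an ISI is taken here to be the time from the end of one saccade to the start of the next.

```python
def isi_statistics(saccades, recording_s, long_isi_ms=1500.0):
    """Summarize saccade timing for one recording.

    saccades: list of (onset_ms, offset_ms) tuples sorted by onset (assumed format).
    recording_s: recording duration in seconds.
    """
    # ISI: end of one saccade to the start of the next.
    isis = [nxt[0] - prev[1] for prev, nxt in zip(saccades, saccades[1:])]
    rate_per_min = 60.0 * len(saccades) / recording_s
    n_long = sum(1 for isi in isis if isi > long_isi_ms)
    prop_long = n_long / len(isis) if isis else 0.0
    return {
        "isi_ms": isis,
        "rate_per_min": rate_per_min,
        "n_long": n_long,
        "prop_long": prop_long,
    }


# Hypothetical saccade times (ms) within a 4 s recording.
example = [(100, 140), (600, 650), (2400, 2440), (2900, 2930)]
stats = isi_statistics(example, recording_s=4.0)
```

Reporting the rate and the long ISI proportion side by side makes it possible to see when a few very long intervals, rather than a general slowing, account for a change in the average.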
Chapter 4. Pilot Study
In the pilot study, eye movements were tracked in naive participants. Here, the aims were to 1)
identify significant technical and methodological complications that would influence signal quality,
2) descriptively characterize eye-movement data, and 3) refine experimental procedures. A
noteworthy procedural change made part-way through the pilot work was the method of motivating
participants, as discussed later on. These and other details of the pilot work have been documented
to illustrate and support the evolution of the experimental method for the formal study, which is
presented in the Full-Scale Study section.
4.1 Methods
4.1.1 Participants
The required sample size was estimated through a power analysis using data from a previous study
(Ruth & Giambra, 1974) of mean eye movement rates (i.e., roughly the inverse of mean interval
length), in which the wording of a verbal fluency test was manipulated in order to differentially
motivate two groups of participants. From these results, a within-participants standard deviation of
33 movements/minute and an effect size of 0.7 (half of that which they report) gives a power of 0.6
for n = 12 (alpha = 0.05). Furthermore, a previous dual task study (Harbluk & Noy, 2002) reports
almost double the effect size for the manipulation of task difficulty (single versus double digit
addition) during driving. Although a power of 0.6 is on the threshold of acceptability, it was
deemed adequate considering its conservative estimation. Therefore, 12 participants were planned
per experimental group (i.e., 24 in total).
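The sample size reasoning above can be checked approximately with a short calculation. The sketch below assumes a two-sided, paired (within-participants) comparison, a significance level of 0.05, and a normal approximation to the test statistic; these are assumptions made for illustration, and the function name is hypothetical. Because the normal approximation ignores the uncertainty in estimating the standard deviation from only 12 participants, it yields a somewhat higher power (about 0.68) than the value of 0.6 reported above.

```python
from math import sqrt
from statistics import NormalDist


def paired_power_normal_approx(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided paired test (normal approximation).

    effect_size: standardized mean difference (difference / SD of differences).
    n: number of participants.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * sqrt(n)  # expected value of the test statistic
    # Probability that the statistic falls beyond either critical value.
    upper = 1.0 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


power_n12 = paired_power_normal_approx(effect_size=0.7, n=12)
```

An exact calculation would use the noncentral t distribution instead, which is available in most statistical packages.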
Because this was pilot work, and because unanticipated methodological issues warranted an early
reappraisal of the experimental method, fewer participants were recruited than planned: only 20
participants were recruited through posters on the University of Toronto campus.
Inclusion criteria:
o healthy, young adults between the ages of 18 and 55
o able to speak and read English fluently
o normal hearing and vision (with or without any corrective lens except bifocal glasses)
Exclusion criteria:
o known diagnosis of developmental disorder (e.g. attention deficit hyperactivity disorder;
dyslexia)
o history of brain injury or other neurological disorder
o history of psychotic disorder
4.1.2 Materials
Overview: Participants completed three types of neuropsychological tasks. During the tasks, eye
movements were recorded using a video eye tracker and three other physiological measures were
also recorded. In addition, task performance (number of correct responses and errors) as well as a
subjective workload rating were recorded for each task. Participants’ mental effort was manipulated
through the difficulty of the tasks as well as their motivation level.
4.1.2.1 Eye Tracking
Eye movements were recorded using two monocular, video tracking “EyeLink” systems (SR Research,
Mississauga ON): a 500 Hz remote tracking system and a 1000 Hz “tower mount” system.
Commercial images and specifications of these eye trackers can be found in Appendix A. The remote
tracker did not require any head support, but relied on an adhesive forehead marker to
accommodate head movements within a 30 x 22 x 30 cm envelope. The tower mount tracker required
the use of a forehead rest and involved a dichroic reflector in front of the participant’s view, which
reflected infrared light, but passed visible light. Infrared light was thereby reflected into the eye and
then back into the video camera. Both systems utilized the dark pupil, corneal reflection (CR)
method. This method calculates the gaze position by measuring the position of the CR with respect
to the center of the pupil, which is detected as an area of low infrared reflectance because the
illuminator is not coaxial with the optical path. Although the remote tracking system is
advantageous from a clinical and experimental point of view, as it can be made less conspicuous, the
tower mount system was chosen after some initial testing with naive participants. Despite
instructions to be still, gross head and upper body movements resulted in very high signal noise
levels. However, when a forehead rest was used, it was revealed that the range of eye movements was
too great for the remote system; thus, the tower mount system, with roughly double the range in
both the vertical and horizontal directions, was the final choice.
Even with the head supported, signal noise was also caused by minute movements during
verbalization, and more importantly, pupil occlusion by the eyelids and eyelashes. The resultant high
frequency noise confused saccade detection algorithms based on velocity thresholding, so an
algorithm was developed to distinguish saccades from noise. This was accomplished using the
assumption of the main sequence relationships (Bahill, 1975), which describe the functional
relationship between peak saccade velocity, duration, and amplitude. If the amplitude of a “spike” in
the signal was not within an acceptable range of that predicted by its peak velocity (via the main
sequence functions), then it was considered noise. The use of the main sequence in detecting
saccades is an uncommon technique, but has appeared in the literature at least once before (Giolma,
1984). After detection using the algorithm, the data were inspected visually, sometimes requiring
correction in very noisy sections or where saccade parameters were in gross disagreement with main
sequence relationships. However, less manual correction was necessary after processing with the
main sequence detection algorithm compared to the proprietary Eyelink algorithm.
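The screening step described above can be sketched as follows. This is not the algorithm used in the study: it assumes one commonly used saturating exponential form of the main sequence, V = v_max * (1 - exp(-A / c)), and the parameter values, the tolerance, and the function names are all placeholders chosen for illustration. A velocity spike is accepted as a saccade only if its measured amplitude is reasonably close to the amplitude that its peak velocity would imply.

```python
from math import log


def predicted_amplitude(peak_velocity, v_max=500.0, c=14.0):
    """Amplitude (deg) implied by a peak velocity (deg/s) under the assumed
    model V = v_max * (1 - exp(-A / c)); v_max and c are placeholder values."""
    if peak_velocity <= 0 or peak_velocity >= v_max:
        return None  # outside the range the assumed model can explain
    return -c * log(1.0 - peak_velocity / v_max)


def is_saccade(amplitude, peak_velocity, tolerance=0.5):
    """Accept a velocity spike as a saccade if its measured amplitude lies
    within +/- tolerance (as a fraction) of the model prediction."""
    expected = predicted_amplitude(peak_velocity)
    if expected is None:
        return False
    return abs(amplitude - expected) <= tolerance * expected


# Hypothetical candidates: (amplitude in deg, peak velocity in deg/s).
candidates = [(5.0, 150.0), (0.3, 400.0)]
labels = [is_saccade(a, v) for a, v in candidates]
```

In this example the second candidate is rejected because a very fast spike with almost no net displacement is more typical of eyelid or eyelash occlusion noise than of a saccade. In practice the model parameters and tolerance would need to be fitted to clean data from the same eye tracker.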
4.1.2.2 Physiological Measures
Three other physiological measures were recorded. Although time constraints precluded their analysis
in the pilot study, they were planned for the full-scale study; their purpose in the pilot study was
to anticipate any technical measurement issues.
Palmar electrodermal activity (EDA) was measured using reusable electrodes and a conductive gel
(GEL100, Biopac Systems, Chicago IL) that were applied to the index and middle fingers. The EDA
signal was conditioned and recorded on computer using a Biopac MP100 data acquisition system. A
GSR100C bioamplifier conditioned the signal using a gain of 5 microSiemens per output volt, a 10
Hz low pass filter, and no high pass filter (not de-trended), and the signal was sampled at 125 Hz. The
tonic EDA level varied widely between participants and conditions, but was generally between 3 and
10 microSiemens.
A two-lead electrocardiogram (ECG) was also used to record heart period data. To this end,
disposable electrodes (EL503, BioPac Systems) were applied on the wrists or ipsilateral wrist and leg,
depending on which gave a stronger signal in each individual. As with EDA, this signal was also
recorded using the MP100 data acquisition system. The ECG amplifier (ECG100C) applied a gain of
2000 mV/output volts, a 60 Hz notch filter, and a 0.5 Hz high pass filter, and the signal was sampled
at 250 Hz. The ECG signal was also processed using a hardware “R-wave detector,” which enhanced
the R-wave and filtered out other peaks in the data for ease of heart period determination.
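Once R-wave times have been extracted, heart period and heart rate follow directly. The sketch below is a minimal illustration with hypothetical function names and made-up R-wave times; it assumes that R-wave detection has already been done and that the times are expressed in seconds. At the 250 Hz sampling rate mentioned above, each R-wave time is resolved to within 4 ms.

```python
def heart_periods(r_times_s):
    """Heart periods (s) between successive R-waves, given their times (s)."""
    return [b - a for a, b in zip(r_times_s, r_times_s[1:])]


def mean_heart_rate(r_times_s):
    """Mean heart rate in beats per minute; None if fewer than two R-waves."""
    periods = heart_periods(r_times_s)
    if not periods:
        return None
    return 60.0 * len(periods) / sum(periods)


r_times = [0.0, 0.8, 1.6, 2.4]  # hypothetical R-wave times (s)
mean_hr = mean_heart_rate(r_times)
```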
Respiratory rate was collected via inductive respiratory effort belts around the chest and abdomen.
These belts were integrated into a “Lifeshirt” vest (Vivometrics, Ventura CA) that was worn on the outside of
participants’ clothing. Fluctuations in chest and abdomen diameter were logged in a
battery powered data acquisition/logging unit that accompanied the Lifeshirt, and the data was later
downloaded to the computer.
4.1.2.3 Subjective Workload Ratings
After each task, participants were asked to rate their mental workload. The most popular workload
rating systems, SWAT and NASA-TLX, were unfeasible for this purpose because they were too time
consuming to be administered over twelve times during the course of an experiment. A brief
experiment was of the utmost importance, as participants were required to maintain some level of
motivation.
In pre-pilot testing with 10 participants, a computerized adaptation of the C-SWAT was used, as per
the recommendation of Goonetilleke & Luximon (1993). It began by asking participants to rate the
relative importance of three workload dimensions, “mental effort load,” “time load,” and
“psychological stress load,” and then to rate each dimension on a scale from 1 to 9. However, many
participants reported that this rating system seemed unnecessarily lengthy and open to various
interpretations. Interestingly, the concerns of O’Donnell & Eggemeier (1986) were also echoed: that
the rating system did not differentiate between (extrinsically imposed) task difficulty and
(intrinsically controlled) “effort.”
Because no rating system could be found that addressed this difference, a new system was devised for
the pilot study. Participants were instructed that they would be asked to rate the “task difficulty” and
“mental effort” of every task. They were also instructed that the former should be thought of as an
objective quantity (out of their control), while the latter is something that they could control, but
that a more difficult task generally requires more effort for best performance. This phrasing was
developed through interviews with pre-pilot experiment participants, with the intention of creating
definitions that address the preconceptions of most people while clearly distinguishing between the
two constructs. After the terms were explained, participants were asked to rate their perceptions of
each on a visual analog scale from 1 to 9 (lowest to highest perceived difficulty/effort). Their verbal
responses were recorded and later transcribed. Participants’ approval of and compliance with this
system were much higher than with the computerized C-SWAT system used in the pre-pilot study.
4.1.2.4 Neuropsychological Tasks
The tasks were programmed in E-Prime (Psychology Software Tools Inc., Pittsburgh PA), so that
they could be automatically administered. E-Prime was also capable of synchronizing the start and
end of each task with the acquisition of other measurements through digital trigger signals. All text
prompts appeared at the center of the screen, serving the purpose of re-centering the participant’s
gaze at the start of each experimental task. Auditory prompts were broadcast via headphones in
stereo, so as not to encourage the participant to look toward their source. Verbal responses were
registered by a hidden microphone and later transcribed.
The three experimental, non-visual tasks were adapted from conventional neuropsychological tests,
and were selected with the aim of addressing those types of tasks that were most prevalent in the eye
movement literature. Each task had two versions: nominal and high difficulty. On all trials, the
participant was asked to carry out the task as accurately and (where applicable) as quickly as possible
without making errors.
The three tasks were verbal fluency, identification of words in background noise, and serial
subtraction.
Verbal fluency is a widely used clinical neuropsychological test of word generation. The task entailed
speeded retrieval of words beginning with a given letter, for a fixed period of time. Participants were not
permitted to use proper nouns or to repeat words. Words were generated aloud and recorded for
later transcription. The dependent variable for performance was the total number of words
generated in a fixed amount of time. Two levels of difficulty were generated by selecting letters for
which the number of words starting with that letter is high (nominal difficulty) or low (high
difficulty), based on Borkowski’s (1967) data. The selection of the letters was also confirmed in
pre-pilot testing of 27 participants. Because both the nominal and high difficulty versions of the task
were repeated twice during the experiment, two letters were selected for each difficulty level: ‘t’ and
‘m’ (nominal) and ‘j’ and ‘q’ (high).
Identification of words in background noise is an experimental test of auditory attention and
auditory perception. It entailed the presentation of single words in background noise, where the
background noise was unintelligible crowd “babble.” Participants were asked to say the words aloud
as soon as they identified them. The dependent performance variable was the number of correct
responses. The background noise level was consistent across the two levels of difficulty, but the
volume of the words was either high (nominal difficulty) or low (high difficulty).
The serial subtraction task is a test of mental control with auditory-verbal working memory and
computational skill demands. A variant of the task was used, in which participants were asked to
subtract numbers, alternating by 1’s and 2’s (nominal difficulty version) or 8’s and 9’s (high
difficulty version). Participants were assigned a start number and then asked to subtract the numbers
aloud. The dependent variable was the number of correct responses.
The three task types and their difficulty versions are summarized in Table 2. For the purpose of
brevity, the three tasks will be generally referred to as the fluency task, auditory task, and math task,
respectively.
Table 2. Descriptions of Experimental Tasks
Task Type | Nominal Difficulty | High Difficulty
Verbal Fluency | Generate words starting with ‘t’ and ‘m’ | Generate words starting with ‘j’ and ‘q’
Identification of Words in Background Noise | Words are spoken loudly. | Words are spoken quietly.
Serial Subtraction | Alternating subtraction by 1’s and 2’s | Alternating subtraction by 8’s and 9’s
The tasks each involved three stages:
1. Instruction Period: instructions for the task are presented at the center of the screen
2. Observation Period: the screen is blank; participant carries out the task for 60 seconds;
outcome measures and participant responses are recorded
3. Rest Period: participant is asked to relax before pressing a key to continue
There were two approaches to motivating participants. In the first approach, a contest with cash
prizes was introduced at the midpoint of the experiment as an incentive for participants to exert more
effort towards improving their performance in the second half. In the second approach, the same contest was
introduced, but participants were also instructed to try “as little as possible” during the first half of
the experiment. The purpose of these instructions was to accentuate the difference in motivation
level between the first and second half (in which alternate forms of the same tasks were presented). It
was also a response to interviews with the initial participants, who felt that they had automatically
given their best effort on the first half, and therefore could not improve themselves in the second half.
The experimental groups who received these two manipulation approaches will be henceforth
referred to as the “motivated group” and “de/motivated group,” with those in the de/motivated group being
asked to try as little as possible in the first half of the experiment.
The contest involved three cash prizes ($100, $50, and $25), which were offered on the basis of task
performance improvement (from the first to second half of the experiment). Task performance was
said to be averaged over all three task types, so that participants would exert additional effort on all
types, rather than focusing on just one. Improvement in performance was used rather than absolute
performance so that participants would believe they had a chance at winning, regardless of their
perceived skill level. Three cash prizes, rather than one, were similarly offered to enhance the
participants’ perceptions of the contest’s odds.
4.1.3 Design
This study was originally conceived as a mixed between- and within-participants repeated measures
design, with two experimental groups: 1) with motivation manipulation (motivated group) and 2)
without motivation manipulation (controls). As has already been discussed, the perceived failure of
the initial motivation technique prompted the development of a second approach, and therefore a
third experimental group was created: the de/motivated group described earlier. All participants
repeated each task and difficulty level three times, over a practice block and two experimental blocks
(18 trials in total). Completion of the entire task battery took 30-40 minutes, which pre-pilot
experiments (n = 10) suggested as the maximum duration before participants tended to report
moderate fatigue levels. All participants in the motivated and de/motivated groups were given
notice of a contest for cash prizes after the first block. However, those in the de/motivated group
were also asked to try as little as possible during the first block, in order to accentuate the effect of
the contest on motivation levels. The control group was not told of the contest, and their only
instruction was to concentrate on the tasks while avoiding frustration. In sum, the control group
received two blocks of identical trials, while the motivated and de/motivated groups received two
blocks of trials wherein the first block was considered “unmotivated” and the second block was
“motivated.” The purpose of having a control group was to determine whether any changes in
outcome measures were due to factors such as fatigue or practice, rather than the presence of either
motivation manipulation.
The order of task presentation was counterbalanced by trials using a balanced Latin square design to
control for carry-over effects. Furthermore, the order of the specific prompts in each trial (i.e.,
letters, starting numbers, words-in-noise stimuli) was reversed for half of the participants in order
to verify that any apparent effect was not simply due to the order of the prompts. For example, while
half of participants were prompted with ‘q’ for the difficult fluency task in the first half of the
experiment and ‘j’ in the second half, the other half of participants were prompted with ‘j’ in the first
half.
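The counterbalancing scheme can be illustrated with a short sketch. It assumes that the six task/difficulty combinations were the units being ordered, which the text does not state explicitly; the function name and condition labels are hypothetical. It uses the standard construction for an even number of conditions, in which each condition appears once in every position and immediately follows every other condition exactly once.

```python
def balanced_latin_square(n):
    """Return n orderings of n conditions (n must be even).

    Each condition occurs once per position, and each ordered pair of
    adjacent conditions occurs exactly once across the n orderings.
    """
    if n % 2:
        raise ValueError("this construction requires an even number of conditions")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Every later row shifts each entry of the first row by one (mod n).
    return [[(c + shift) % n for c in first] for shift in range(n)]


# Hypothetical condition labels: three task types at two difficulty levels.
conditions = ["math/nominal", "math/high", "fluency/nominal",
              "fluency/high", "auditory/nominal", "auditory/high"]
orders = [[conditions[i] for i in row]
          for row in balanced_latin_square(len(conditions))]
```

Participants would then be assigned to the six orderings in rotation, so that first-order carry-over effects are spread evenly over conditions.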
The independent variables for this study were:
1. Task Type
2. Task Difficulty Level
3. Presence of Monetary Incentive (Motivation)
The outcome measures were:
1. Intersaccadic Interval (ISI) Length
2. Task Performance: number of correct and incorrect responses
3. Subjective Effort Ratings (1-9) for “Task Difficulty” and “Mental Effort”
4.1.4 Procedures
The inclusion and exclusion criteria described previously were indicated on the recruitment poster.
When participants contacted the study coordinator to discuss participation, these criteria were
confirmed. At that time, participants were also requested to follow some guidelines designed to
avoid excessive fatigue effects during the experiment:
On the night before the study,
o avoid excessive alcohol consumption
o avoid illicit drug use
o get a good night’s rest
On the day of the study,
o avoid heavy exercise
o avoid eating a heavy meal before testing
o avoid abnormal caffeine consumption
A consent form and pre-test questionnaire (see Appendix B), which gathered basic demographic
information and verified that the preceding guidelines were met, were electronically provided to the
participant at least 24 hours in advance of the scheduled experimental session. Upon arrival at the
session, the participant was asked to read the form if they had not already done so. They were then asked
whether they had any questions, and subsequently, to sign the form. The ethics review board of the
Toronto Rehabilitation Institute and the University of Toronto approved all procedures.
The experimental room was maintained at a comfortable temperature and isolated from outside
noise. Overhead fluorescent lighting did not interfere with the operation of the tower mount system.
The participant was seated at a desk that was spare except for the eye tracking system and a
computer monitor placed approximately 40 cm from the participant’s face. The monitor was as close
as comfortable viewing allowed, so that the angular span of the eye tracker calibration targets was as
large as possible. A stationary chair was chosen to discourage large movements during the
experiment.
The experimenter placed electrodes on the fingers of the participant’s non-dominant hand, as the dominant
hand would press a button in order to start the experimental tasks. Electrodes were also placed on
wrists and/or lower leg of the participant, depending on which location resulted in the best signal for
each individual. A vest containing inductance respiratory effort belts was also worn by the
participant, on the outside of their clothing. Where the remote eye tracker was used (versus the
tower mount system), an adhesive “target” was also applied to participants’ foreheads. To allow time
for electrode gel to absorb into the skin and for the participant to become accustomed to their
environment, they were now asked to fill out the pre-test questionnaire.
When the questionnaire was completed, the participant was then situated for optimal eye tracker
performance. The desk height was adjustable for this purpose. In the case of the remote eye tracker,
optimal performance in terms of measurement range was achieved with the tracker approximately
10-15 cm below the height of their eyes. However, this position tended to draw attention to the
tracker itself and also obstructed the view of the monitor, which had to be located directly in front of
the participant for calibration purposes. Thus, the remote tracker was positioned at about 20 cm
below the eyes. In the case of the tower mount eye tracker, desk height and participant position were
adjusted so that their forehead rested comfortably against the forehead rest. The eye tracker was then
calibrated using a 9-point calibration procedure (spanning approximately ± 20°).
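The trade-off between viewing distance and the angular span of the calibration targets can be made concrete with a small geometric sketch. It assumes a flat screen centred on the participant's line of sight; the function names are hypothetical, and the 60 cm comparison distance is an illustrative value that does not come from the study.

```python
import math


def screen_extent_cm(viewing_distance_cm, half_angle_deg):
    """Width on a flat screen subtended by +/- half_angle_deg about the line of sight."""
    return 2.0 * viewing_distance_cm * math.tan(math.radians(half_angle_deg))


def visual_angle_deg(extent_cm, viewing_distance_cm):
    """Total visual angle subtended by a centred extent at a given distance."""
    return 2.0 * math.degrees(math.atan(extent_cm / (2.0 * viewing_distance_cm)))


# At the reported 40 cm viewing distance, targets spanning +/- 20 degrees
# need roughly 29 cm of screen width.
span_at_40 = screen_extent_cm(40.0, 20.0)

# At a hypothetical 60 cm distance the same angular span would need about 44 cm,
# which is why a closer monitor allows a larger calibrated range on a fixed screen.
span_at_60 = screen_extent_cm(60.0, 20.0)
```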
Before starting the experiment, the participant was told:
1. the experimenter would be outside of the room during the experiment except after the
practice session and after the experiment halfway point, when they would re-enter the room
to answer questions
2. the experimenter would be monitoring the progress of the experiment at all times, and
would step in if there was an unforeseen problem
3. to avoid excessive body movements, especially turning around
4. (in the case of the tower mount system) they need only have their forehead in the rest during
the actual (60 s) task periods
5. it was important to us that they concentrate on the tasks but avoid becoming frustrated with
their performance
6. it was important to keep their eyes open (the experimenter monitored the eye tracker output
and reminded the participant whenever necessary)
7. (for the de/motivated group) they should try “as little as possible” during the tasks; in
particular they should try to relax and pay little attention to their performance
Participants were also instructed that the start of each task was self-paced. Therefore, they were able
to take as much or as little time as they chose between tasks.
If the participant belonged to either the motivated or the de/motivated group, then
they would receive notice of the motivating contest at the halfway point through the computer monitor. The
experimenter would also enter the room to answer any questions and to ensure that the contest
notice was taken seriously. If the participant belonged to the control group, they would only receive
notice that they were halfway through. The experimenter would also enter the room for the purposes
of experimental symmetry and to verify the comfort of the participant.
At the end of the session, the various instruments and electrodes were removed, and the participant
was debriefed on the full details of the study, the reasons for any previous nondisclosures, and the
study’s potential scientific and clinical impact. Control participants would be informed of the
contest and told that their results would be entered alongside all other participants.
The experimental session lasted 1 to 1.5 hours, including setup and debriefing.
4.1.5 Analysis
For each dependent measure, only a small number of datasets were useable (n = 8, see Results section,
below). Therefore, data were examined descriptively and graphically, as the sample size was
insufficient for inferential statistics.
The first step in the analysis was to use subjective effort and task performance data to verify that the
experimental manipulations of task difficulty and participant motivation level actually resulted in a
change in mental workload. An increase in workload would have been indicated by the following
outcomes:
o increase in subjective effort ratings
o increase in task performance from low motivation condition to high motivation condition
o decrease in task performance from nominal to high task difficulty conditions
Changes in subjective workload ratings and task performance were aggregated and plotted in terms
of the presence of change as well as magnitude, between experimental conditions. These plots were
visually analyzed with the intention of investigating the effects of task difficulty manipulations; the
effect of motivation versus the control group; and the relative effectiveness of the two motivation
groups: motivated group and de/motivated group.
Eye movement data from eight participants were used in a visual inspection of ISI lengths. Two of the
participants were from the control group and six from the de/motivated group (none from the
motivated group). Eye movement data were presented in comparative histograms of ISI length for
each independent variable (task type, task difficulty, and participant group membership). That is, the
data from each participant were represented by 12 figures, each figure containing two normalized
histograms for the purposes of comparing ISI length distributions between two difficulty levels of a
task or halves of the experiment. The effect of motivation was investigated by comparing the
consistency and magnitude of any response in the de/motivated group participants to those of the
control group.
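The histogram comparison can be sketched as follows. This is a minimal illustration, assuming that saccades have already been detected and are available as onset/offset times in seconds, and that the ISI is measured from the end of one saccade to the start of the next; the function names, bin edges, and example values are hypothetical and are not data from the study.

```python
def intersaccadic_intervals(saccades):
    """Gaps between successive saccades.

    saccades: list of (onset_s, offset_s) tuples sorted by onset time.
    Returns the time from each saccade's offset to the next saccade's onset.
    """
    return [nxt[0] - cur[1] for cur, nxt in zip(saccades, saccades[1:])]


def normalized_histogram(values, edges):
    """Fraction of values in each bin [edges[i], edges[i+1]).

    Values outside the outermost edges are ignored, so the returned
    fractions sum to 1 over the values that were actually binned.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


# Hypothetical saccade times for one 60 s observation period.
saccades = [(0.00, 0.05), (0.30, 0.34), (0.90, 0.95), (3.95, 4.00)]
isis = intersaccadic_intervals(saccades)
hist = normalized_histogram(isis, edges=[0.0, 0.5, 1.0, 10.0])
```

Normalizing each histogram allows two conditions with different numbers of saccades to be compared on the same vertical scale.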
As previously mentioned, heart rate, electrodermal activity, and respiratory rate were also collected
as an investigation of potential technical issues, but were not subject to analysis due to time
constraints.
4.2 Results and Discussion
4.2.1 Subjective Workload Data
Participants were asked to rate their perceived difficulty of and effort expended on each 60 second
task immediately after they completed it. There are two important trends exhibited by these data: 1)
subjective effort and task difficulty ratings generally corresponded to both motivation and task
difficulty manipulations; 2) those participants in the de/motivated group, who were told to “try less”
in the first half of the experiment, reported a larger change in effort and task difficulty from the first
to second half, with respect to both the motivated group and the control group.
Figure 5 presents the change in participants’ perceived task difficulty ratings when presented with a
more difficult version of each task. This figure represents the pooled observations of all experimental
groups and at both halves of the experiment, as similar trends were observed regardless of group
membership or experiment half. The consistency of the association between imposed and perceived
task difficulty is remarkable considering that the participants could not reference their ratings of
previous tasks, except by memory. Variations in the consistency of this association between task
types could be explained by relative face validity of each task’s difficulty manipulation. For example,
whereas it was obvious to the participant when the spoken words were made more audible in some
trials, it was perhaps less obvious when a more difficult letter was presented for the fluency task. In
the latter case, it could be that participants were able to rely upon their perceived (in)frequency of
the starting letters in the English language, or they were able to recognize a change in the number of
responses that they produced for each letter.
[Figure 5 chart. x-axis: Task Type (Math, Fluency, Auditory); y-axis: Portion of Reports (0% to 100%); legend: Reported No Change, Reported a Negative Change, Reported a Positive Change.]
Figure 5. Change in task difficulty ratings from the nominal to high difficulty versions of each task.
Because the manipulation of task difficulty was only a means of affecting mental effort, the
association between task difficulty and subjective effort (Figure 6) is more important. It is to be
expected that subjective effort ratings should roughly associate with difficulty, not only because
more difficult tasks theoretically demand more effort, but also because of the connection between
difficulty and effort that was described in the instructions to the participant. Another explanation is
that participants’ mental effort ratings are influenced by their ability to recognize the intent of the
experiment, rather than their actual “feelings” of effort. Comparing Figures 5 and 6 does indicate
that at least some participants seem to draw a distinction between perceived difficulty and effort, as
per the assertions of Naccache et al. (2005). However, their relative similarity may either indicate the
success of the difficulty manipulation in affecting effort, or simply cast doubt on the ability of
participants to distinguish between subjective effort and imposed demands. Unfortunately, this is an
enduring problem with the interpretation of subjective effort ratings.
[Figure 6 chart. x-axis: Task Type (Math, Fluency, Auditory); y-axis: Portion of Reports (0% to 100%); legend: Reported No Change, Reported a Negative Change, Reported a Positive Change.]
Figure 6. Change in subjective effort ratings from nominal to high difficulty version of each task.
The relative success of the two approaches to motivating participants, used with the motivated group
and de/motivated group, can be inferred from Figure 7. Clearly, those participants that were asked to
“try less” in the first half of the experiment (de/motivated group) more consistently report a change
in subjective effort after the introduction of the contest. However, as with the ratings of perceived
task difficulty, it is conceivable that this effect may have been more a consequence of preconceptions
than the relative presence of effortful feelings. That is, perhaps participants were more apt to indicate
a change in effort when they had been expressly asked to “try less” in the first half of the experiment.
Looking to performance data may lend further evidence in favour of one approach over the other.
Figure 7. Change in subjective effort ratings from 1st to 2nd half of experiment, by experimental group; data is pooled over all task types.
It should also be noted that reports were excluded from the data of Figure 7 if extreme scale values
(either 1 or 9) were assigned to both halves of the experiment. It is conceivable that participants may
have assigned a higher or lower rating in these cases, had they a longer scale or previous knowledge
of the motivation manipulation. Approximately 10% of observations were thus disregarded. This
stipulation was not made for the rating datasets in Figures 5 and 6, because all tasks/difficulty levels
had been previously presented to the participant in the practice block.
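The classification and exclusion rule behind these rating plots can be sketched as follows. This is one reading of the rule, assuming that a report was dropped only when the same extreme value (1 or 9) was given in both halves; the function names and example ratings are hypothetical.

```python
from collections import Counter


def classify_rating_change(first, second, scale_min=1, scale_max=9):
    """Label the change in a rating between experiment halves.

    Returns None when both ratings sit at the same end of the scale, since a
    floor or ceiling rating cannot show a further decrease or increase.
    """
    if first == second and first in (scale_min, scale_max):
        return None
    if second > first:
        return "positive"
    if second < first:
        return "negative"
    return "no change"


def portions(pairs):
    """Share of retained reports in each category, as plotted per group."""
    labels = [classify_rating_change(a, b) for a, b in pairs]
    kept = [label for label in labels if label is not None]
    counts = Counter(kept)
    categories = ("positive", "negative", "no change")
    if not kept:
        return {c: 0.0 for c in categories}
    return {c: counts[c] / len(kept) for c in categories}


# Hypothetical (first half, second half) effort ratings for one group.
example = portions([(3, 6), (5, 5), (9, 9), (7, 4), (2, 5)])
```

In this example the (9, 9) report is excluded, leaving four reports: two positive changes, one negative change, and one report of no change.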
In post-test interviews concerning the contest, it was common for participants to mention that the
contest would have been more effective had they the guarantee of receiving prize money
immediately after the experiment. The option of compensating motivation group participants based
on their performance (e.g., 50 cents per correct response) was considered, but would have required
performance scores to be calculated during the experiment, rather than from responses transcribed
afterwards. Because the eye tracking equipment required the experimenter’s attention during the
experiment, this would have posed a feasibility issue.
[Figure 7 chart. x-axis: experimental group (Motivated Group, Control Group, De/Motivated Group); y-axis: Portion of Reports (0% to 100%); legend: Reported No Change, Reported a Negative Change, Reported a Positive Change.]
4.2.2 Task Performance Data
Task performance data were used to verify that the experimental manipulations of task difficulty (i.e.,
more demanding math problems, more uncommon starting letters, quieter words in noise) actually
resulted in lower task performance. The concept of a control loop that allocates mental effort in
response to our perceptions of task performance is very common in the literature, dating from the early
work of Kahneman (1973). Therefore, a distinct change in task performance bodes well for the
success of the manipulation in affecting mental workload. Figure 8 illustrates that task selection was
successful in this respect.
[Figure 8 chart. x-axis: Task Type (Math, Fluency, Auditory); y-axis: Mean Change in Number of Correct Responses (%), from -90 to 0.]
Figure 8. Mean and standard deviations of (percent) change in number of correct responses
from nominal to high difficulty task versions
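The quantity plotted in Figure 8 can be computed along the following lines. This is a sketch under the assumption that percent change is taken relative to the nominal-difficulty score and that the sample standard deviation is reported; the function name and the response counts are hypothetical, not values from the study.

```python
from statistics import mean, stdev


def percent_change(nominal_correct, high_correct):
    """Percent change in correct responses, relative to the nominal version."""
    return 100.0 * (high_correct - nominal_correct) / nominal_correct


# Hypothetical (nominal, high difficulty) correct-response counts for one task type.
pairs = [(20, 10), (30, 12), (25, 15)]
changes = [percent_change(nominal, high) for nominal, high in pairs]

mean_change = mean(changes)   # negative when the harder version lowers performance
sd_change = stdev(changes)    # sample standard deviation across observations
```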
To answer the question of whether either of the motivation manipulations affected task performance,
it is necessary to view the data for each task because some participants reported that the effect of
motivation on performance differed between tasks. These data are presented in Figures 9a and 9b, as
the proportion of observations where performance improved and degraded, respectively, between
the first and second halves of the experiment. The figures clearly suggest that the second approach to
motivation, which included instructions to exert less effort in the first half, was more successful than
the first approach, which did not have any special instructions for the first half. The superiority of
the second approach is particularly evident in Figure 9b, which shows that the performance of the
de/motivated group participants less frequently degraded from the 1st to 2nd halves than that of
motivated group participants. However, the general success of the motivation manipulations was
fairly poor relative to the control group. For example, Figure 9a shows that roughly the
same proportion of control participant trials as motivated participant trials improved their
performance on the math task. The most obvious interpretation of this result is that any
improvements in performance on the math task were due to practice effects. However, Figure 9b
shows that in many cases, control participants did not benefit from practice, and perhaps even
experienced fatigue, leading to a low effort strategy. Because all participants reported that the contest
increased their motivation to do well, loss of interest is not likely responsible for any degradation in
motivated participants’ performance. A more plausible explanation is that the contest led to above
optimal arousal in some cases, leading to “mental blocks,” and generally poor composure. Subjective
assessments of participants’ voice recordings support this conclusion.
Figure 9a (left). Portion of trials in which performance improves from 1st to 2nd half of experiment (pooled over
both difficulty levels); Figure 9b (right). Portion in which performance degrades.
[Figures 9a and 9b charts. x-axis: Task Type (Math, Fluency, Auditory); y-axis: Portion of Trials (0% to 100%); legend: Motivated Group, Control Group, De/Motivated Group.]
Another way to analyze the effect of motivation is to examine the magnitude of performance change,
with the purposes of 1) determining whether there was an effect of motivation compounded with a
presumed practice effect, and 2) further evaluating the two motivation techniques. Note that
observations of no improvement or negative change were excluded from these data in case they had
been confounded by excessive anxiety due to the contest or (in controls) lack of interest due to
fatigue. Comparing the magnitudes of change in motivated group observations versus control group
observations (Figure 10), it seems that the effect of the contest was actually detrimental, rather than
additional to the effect of practice. With the exception of the auditory task, the larger magnitudes
exhibited by the de/motivated group seem to suggest that task performance was affected by
instructions to exert less effort in the first half of the experiment. However, the large standard
deviations reflect the variability between individual trials as well as the small sample sizes from which
these means were calculated. In particular, the control group/auditory task data had a very small
sample size (n = 2) as most participants did not improve their performance. The other group/task
combinations contained between 5 and 7 trial cases.
[Figure 10 chart. x-axis: Task Type (Math, Fluency, Auditory); y-axis: Mean Performance Increase (%), from 0 to 120; legend: Motivated Group, Control Group, De/Motivated Group.]
Figure 10. Mean and standard deviations of change in task performance from 1st to 2nd half (pooled
over difficulty levels), using data from only those participants who improved.
Most importantly with respect to the manipulation of motivation, the performance data corroborate
the subjective rating data as well as informal post-experiment interviews with participants. That is,
participants were generally sceptical of the effectiveness of the contest as a motivation tool unless
they were part of the group that was explicitly instructed to exert low effort in the first half of the
experiment. Otherwise, participants often reported that they had performed optimally in the first
half and therefore did not feel as if it was possible to improve in the second half.
A further observation is that the introduction of the contest seems to cause a high stress condition in
many people, possibly leading to a detriment rather than enhancement of performance. Though
stress could be thought of as a component of workload (see Robert & Hockey, 1997), this condition
may lack ecological validity as a replication of clinical conditions. It is unlikely that a clinician or
psychometrist would allow a client to reach such a high level of stress that it is detrimental to their
performance on a rehabilitation treatment task or neuropsychological test.
4.2.3 Eye Movement Data
As previously mentioned, only eight participants’ eye movement datasets were suitable for
intersaccadic interval (ISI) length analysis. Data from the other participants were deemed too noisy,
due to the prevalence of pupil occlusions or head movements. The greatest improvement to signal
quality resulted from switching from the remote eye tracker to the tower mount system; all eight of
the “suitable data” participants were tested on the latter. Six of these participants belonged to the
de/motivated group (informed of contest and asked to try less in the first half), while two belonged
to the control group.
Histograms of ISI lengths for each participant and each experimental condition (experiment half,
task type, and difficulty level) were compared visually. An inspection of histograms comparing
difficulty levels for each task type revealed no obvious effect resulting from a change in difficulty.
Although the distribution of participants’ ISI lengths would vary widely between conditions, the
direction of the shift was not consistent for any of the task types. Histograms comparing ISI lengths
from the first to the second half of the experiment similarly did not reveal any consistent effect for
either the math task or the fluency task.
However, an effect of motivation is suggested by the auditory task data, particularly in the high
difficulty (quiet words in noise) version. That is, the frequency of very long ISI (on the order of
several seconds long) appears to be higher on the auditory task in de/motivated group participants,
relative to control group participants. This conclusion was drawn from the histograms in Appendix
C, which are a compilation of each participant’s ISI data for the auditory task, comparing ISI
distributions between the first and second half of the experiment. Note that the distributions appear
to be either bimodal or unimodal with a strong positive skew. Therefore, the histograms have a non-
linearly scaled x-axis to enhance the visualization of both short and long ISI lengths.
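One way to build such an axis, and to quantify the frequency of very long ISIs, is sketched below. The text does not say which non-linear scaling was used, so logarithmic bin spacing is an assumption here, as is the 2 s threshold for a "very long" ISI; the function names and example values are hypothetical.

```python
def log_spaced_edges(lo, hi, n_bins):
    """Bin edges spaced evenly on a logarithmic axis between lo and hi (both > 0)."""
    ratio = (hi / lo) ** (1.0 / n_bins)
    return [lo * ratio ** i for i in range(n_bins + 1)]


def long_isi_fraction(isis, threshold_s=2.0):
    """Share of intersaccadic intervals at or above an assumed threshold."""
    if not isis:
        return 0.0
    return sum(1 for x in isis if x >= threshold_s) / len(isis)


# Two bins per decade from 0.1 s to 10 s resolves short ISIs finely while
# still showing intervals several seconds long on the same axis.
edges = log_spaced_edges(0.1, 10.0, 4)

# Hypothetical ISI lengths (seconds) for one participant and condition.
fraction_long = long_isi_fraction([0.2, 0.3, 2.5, 4.0])
```

Comparing the long-ISI fraction between experiment halves gives a single number per participant that complements the visual inspection of the histograms.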
Again, post-experiment interviews were valuable in understanding participants’ eye movement
behaviour during the experiment. In particular, some participants assumed that they were required
to stare straight ahead during the tasks. This misunderstanding is important because it would
obscure any treatment effect due to ISI lengths being generally very long. Further, a reverse effect
may occur, wherein effortful tasks cause a lapse in participants’ conscious efforts to stare straight
ahead, thus leading to more frequent eye movements, not fewer. Participants reported that this
assumption was mainly a result of having their head movements constrained by the forehead rest.
Another factor identified was the proximity of the monitor to the face (40 cm), which was intended
to improve calibration accuracy. Participants also reported that the calibration procedure itself drew
attention to their eye movements, as the purpose of the eye tracker became obvious to them. An
additional finding from the interviews was that eye closures may have actually been encouraged by
the request that participants avoid them. One participant spontaneously revealed that the thought of
this requirement led to a psychosomatic irritation and a subsequent urge to blink. Subsequently,
other participants reported the same experience when specifically prompted about it.
In addition to frequent blinking, there were a number of other contributors to eye movement signal
quality. In some participants, it was not possible for the tracker to properly detect the corneal
reflection and pupil center through a large enough range of movements. Reasons include anatomical
characteristics (size and shape of the eyes) and squinting. Because the tracker detected the pupil as
an area of sub-threshold infrared reflectance, it was also important that the level of infrared (IR)
light reflected through the pupil was lower than that reflected by surrounding tissues. In some
participants, this issue arose due to mascara use, which readily absorbs IR light, while in others, the
IR reflectance of their pupils was inexplicably high. Contact lens use was not identified as a factor.
Within individual participants, signal quality was also periodically degraded by eyelid occlusion of
the pupil when looking downward and (much less so) by gross head movements. Slight head
movements due to verbalization resulted in only low amplitude noise levels (< 0.3° visual angle).
Finally, although the eye tracker was specified to function with non-bifocal corrective lenses, all
corrective lens use restricted the visual angle range substantially. Further, they necessitated very
precise adjustments to the dichroic mirror angle, so that if the participant removed their forehead
from the rest and then returned to a slightly different position, readjustment would be necessary.
Soft contact lenses also did not pose any problems in this regard, while tracking with hard contact
lenses was not attempted.
4.3 Conclusions
The results of the pilot study do not suggest a general relationship between ISI length and effort on
non-visual tasks. However, the possibility of a correlation between motivation in the auditory task
(identification of words in noise) and ISI length is worthy of further investigation, considering that it
has not been previously reported. Furthermore, the very low sample size and unresolved
methodological issues require that these results be verified with an improved experiment involving
more participants. Improvements should address the following issues:
o efficacy of the motivation manipulation
o conscious control of eye movements by participants
o generally poor eye tracker data quality due to:
   o prolonged blinks/closures
   o pupil occlusion when looking downwards
   o easily controlled factors: corrective lens and mascara use
o the eventuality of low eye signal quality on some trials due to head movements or
uncontrollable pupil occlusions
Chapter 5. Full-Scale Study
5.1 Restatement of Hypotheses
Hypothesis (1):
Based on the results of the pilot study, which are discussed in the Pilot Study section, the response
of average saccade rate to difficulty level will differ between task types. In the auditory task, it is
expected that the average saccade rate will be lower for the high difficulty condition than in the
nominal difficulty condition. In the math and fluency tasks, the presence of a significant change is
hypothesized, but its direction is not.
Hypothesis (2):
Based on the results of the pilot study, the response of average saccade rate to motivation level
will differ between task types. In the auditory task, it is expected that the average saccade rate will
be lower in the standard motivation condition than in the low motivation condition. In the math
and fluency tasks, the presence of a significant change is hypothesized, but its direction is not.
Hypothesis (3):
For all three task types, average saccade rate will correlate with the electrophysiological findings,
task performance, and self-reports of subjective effort, for both task difficulty and motivation
manipulations. Specifically, it is expected that task performance will be negatively impacted by
task difficulty, but positively impacted by an increase in motivation level, while heart rate and the
rate of spontaneous skin conductance responses will be positively correlated to effort for both
experimental manipulations.
Hypothesis (4):
Based on the results of the pilot study, long ISI, which are defined as being longer than 1500 ms,
will be significantly more prevalent where either the difficulty or motivation level during the
auditory task is increased. For the math and fluency tasks, the presence of a significant effect is
hypothesized, but its direction is not.
5.2 Methods
Many of the methods for the full-scale study were identical to those of the pilot study; therefore, this
section will only highlight differences between the two.
5.2.1 Participants
37 participants were recruited, but the “healthy vision” inclusion criterion was changed to:
o Have normal hearing and vision (may wear glasses with a low prescription or any soft
contact lens)
The purpose of this modified criterion was to ensure that participants could remove their glasses for
the duration of the experiment while still being able to read the task instructions. This ability was
confirmed in the screening process, via email or telephone.
Of these 37 participants, the data of 13 were excluded from the analysis on the basis of poor eye
movement signal quality (11 participants) or other technical difficulties (2 participants). The
remaining 24 participants comprised an equal number of motivation manipulation and control
group participants, and represented each counterbalanced trial combination of the Latin square
design described earlier. According to the demographic summary in Table 3, the groups are well
matched in terms of participant age, but not in terms of gender.
Table 3. Participant Demographic Summary
Group Membership          Median Age (Standard Deviation)   Number of Females (Males)
Control                   27 (8)                            2 (10)
Motivation Manipulation   29 (13)                           8 (4)
5.2.2 Materials
5.2.2.1 Eye Tracking
Eye movements were tracked using the tower mount system that is described in the Pilot Study
section. Data quality was substantially improved by fixing participants’ eyebrows in a slightly raised
position using a piece of medical tape, a technique documented by Johansson et al. (2001). Although
participants could blink freely, the eyelid was less likely to occlude the pupil when they looked
downward. This technique also had the unexpected benefit of reducing blink frequency in most
participants, presumably because it served as an implicit reminder to avoid closing their eyes.
Eye movement data collected under these conditions were of a high enough quality that the
proprietary DataViewer software (SR Research, Mississauga ON) was able to detect the
majority of saccades correctly. Therefore, processing with the main sequence-based saccade
detection algorithm developed during the pilot study was not necessary. However, all of the raw data
were again visually inspected and any problem areas corrected manually. As in the pilot study,
saccade data were also post-processed for minimum amplitude and detection of saccades during short
periods of pupil occlusion.
ISI/Saccade data were summarized using three variables: “trial proportion of long ISI” (percent),
median ISI length, and average saccade rate. While long ISI have been previously recorded in terms
of their frequency (Callan, 1998), this measure would have been misleading during the short tasks
involved with this study. There were several cases wherein only one or two saccades were executed
during the 30 second trial. Instead, the cumulative duration of all long ISI during a trial was
expressed as a percentage of the total trial duration. In this way, trials with very few, extremely long
ISI would be more adequately compared with those in which several moderately long ISI
occurred. A long ISI was defined as being at least 1500 ms in duration. This threshold was found to
best distinguish a task difficulty and motivation treatment effect. Finally, note that eye movement
data from the first 2000 ms of each trial were disregarded in the calculation of these summary
variables. It was assumed that this time was necessary for most participants to reach a steady
state in their execution of the tasks.
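The three summary variables described above can be illustrated with a short sketch. This is not the processing code used in the study; it assumes a hypothetical input format (a list of saccade onset/offset times in milliseconds), assumes that an ISI runs from the end of one saccade to the start of the next, and assumes that the denominator for the long ISI percentage is the analysed portion of the trial (trial length minus the disregarded 2000 ms). Only the 1500 ms threshold, the 2000 ms exclusion, and the 30 second trial length are taken from the text.

```python
from statistics import median


def summarize_isi(saccades, trial_ms=30_000, skip_ms=2_000, long_ms=1_500):
    """Summarize one trial.

    saccades: list of (onset_ms, offset_ms) pairs sorted by onset
    (a hypothetical format chosen for this illustration).
    """
    # Disregard saccades that begin within the initial skip period.
    kept = [(on, off) for on, off in saccades if on >= skip_ms]
    # Assumed ISI definition: end of one saccade to the start of the next.
    isis = [nxt_on - prev_off
            for (_, prev_off), (nxt_on, _) in zip(kept, kept[1:])]
    # Assumed denominator: the analysed part of the trial.
    window_ms = trial_ms - skip_ms
    # Cumulative duration of all ISI that meet the "long" threshold.
    long_total = sum(d for d in isis if d >= long_ms)
    return {
        "long_isi_percent": 100.0 * long_total / window_ms,
        "median_isi_ms": median(isis) if isis else None,
        "saccade_rate_per_s": len(kept) / (window_ms / 1000.0),
    }


# Made-up trial: the first saccade falls inside the disregarded 2000 ms.
trial = [(500, 540), (2500, 2540), (3000, 3040), (6040, 6080)]
summary = summarize_isi(trial)
```

In this made-up trial, a single 3000 ms ISI dominates the long ISI percentage even though only three saccades are retained, which is the situation the duration-based measure was intended to handle.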
The eye tracker also recorded blink events, defined by the loss of the corneal reflection and/or pupil.
This loss was presumed to be caused by occlusion of the eyelid, but observations of the participants
revealed that it could also be caused by eye positions outside the range of the tracker. At these
extreme visual angles, the tracker had problems for two reasons: 1) reflectance threshold of the pupil
was too high (i.e., apparent pupil reflectance began to approach that of surrounding tissues), and/or
2) the curvature of the eyeball was such that the corneal reflection was not present. With the use of
the eyebrow lifting technique described previously, actual blink rates were very low in some
participants, so that the majority of their “blink” events were actually due to out of range looking.
Therefore, blink rate was not used as a measure in this study.
5.2.2.2 Electrodermal Activity and Heart Rate
EDA and heart rate information were collected as in the pilot study, with the only notable change
being the use of a high impedance ground strap on the participant’s ankle. Through the course of the
experiment, static electricity became an issue as humidity levels dropped with the change of season.
In participants tested before the issue was identified, lifting of the heel resulted in an EDA signal
artefact, much like a movement artefact. The raw EDA waveform of all participants was visually
inspected and regions containing either type of artefact were excluded from analysis.
The EDA signal was characterized by the frequency of spontaneous skin conductance responses
observed. The signal was digitally filtered with a 1 Hz low-pass FIR filter (700 coefficients), which
effectively enforced the one second minimum wave period limit used by Storm et al. (2000). An
algorithm was created to process the filtered signal, searching for local maxima and minima and
determining the amplitude of individual responses. As per the recommendation of Storm et al., a
minimum 0.02 microSiemens amplitude threshold was used.
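The response-counting step can be sketched as follows. This is an illustrative reconstruction rather than the algorithm written for the study: it assumes the signal has already been low-pass filtered, takes the amplitude of a response to be the rise from a local minimum to the following local maximum, and uses hypothetical function names. Only the 0.02 microsiemens threshold comes from the text.

```python
def count_scrs(filtered, min_amp=0.02):
    """Count trough-to-peak rises of at least min_amp (microsiemens).

    filtered: sequence of conductance samples, assumed already low-pass filtered.
    """
    if not filtered:
        return 0
    count = 0
    trough = filtered[0]
    rising = False
    for prev, cur in zip(filtered, filtered[1:]):
        if cur > prev and not rising:
            # Local minimum: a rise starts at the previous sample.
            trough, rising = prev, True
        elif cur < prev and rising:
            # Local maximum: the previous sample was the peak of the rise.
            if prev - trough >= min_amp:
                count += 1
            rising = False
    return count


def scr_rate_per_min(filtered, sample_rate_hz, min_amp=0.02):
    """Responses per minute over the whole trace."""
    duration_min = len(filtered) / sample_rate_hz / 60.0
    return count_scrs(filtered, min_amp) / duration_min


# Made-up trace: two rises exceed the threshold, one (0.005) does not.
trace = [1.00, 1.00, 1.05, 1.03, 1.035, 1.03, 1.10, 1.10, 1.02]
n_responses = count_scrs(trace)
```

Flat segments leave the rising/falling state unchanged, so a plateau at a peak is counted once. A rise that is still in progress when the trace ends is not counted.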
Average heart rate was calculated using the “AcqKnowledge” data acquisition/analysis software
(BioPac Systems, Chicago IL) following a visual inspection of the R-wave signal for movement
artefacts. Although the use of heart rate variability as a dependent measure was considered, the trial time was
deemed too short for reliable estimates of medium range variability (0.07 - 0.14 Hz) favoured by
previous studies of mental workload. Although it has been stated that measurement windows as
narrow as 30 seconds can be used (Mulder, 1992), preliminary testing suggested that conventional
analysis methods were not suitable. Therefore, heart rate variability measurement was considered
outside the scope of this study.
As with eye movements, heart rate and EDA data were ignored for the first 2000 ms of each trial.
5.2.2.3 Neuropsychological Tasks
The task battery was identical to that administered in the pilot study, but the length of each trial was
decreased from 60 to 30 seconds. The reason for this change was to allow for a single repetition of
each task condition without substantially increasing the overall length of the experiment. Trial
repetition not only improved the reliability of the experiment, but also served as a safeguard against
data loss due to eye closure, misunderstanding the instructions, or other unforeseen circumstances.
It was also thought that participants would better avoid distraction during a shorter task.
As in the Pilot Study section, the three neuropsychological tasks, properly called serial subtraction,
verbal fluency, and identification of words in noise, will be referred to as “math task,” “fluency task,” and
“auditory task” in this section.
Having two repetitions of each task required double the number of fluency task starting letters, so a
new set was chosen using the results of a previous investigation of this task by Borkowski (1967).
Tests in naive participants (n = 5) verified that letters in the nominal difficulty set (b, m, t, and f)
and the high difficulty set (k, q, y, and j) resulted in response counts that were consistent within each
set but substantially different between difficulty levels.
The task battery script was also modified to use a white background instead of a black background.
This modification caused participants’ pupils to be more constricted, which improved the eye
tracker response by helping to prevent pupil occlusion during partial eyelid closure or downward
looking.
5.2.2.4 Motivation Manipulation
The “motivation” of one group of participants was manipulated by instructing them to try less hard
on the second half of the experiment. Unlike the pilot study, there was no mention of a contest. The
new method was adopted because it had been demonstrated that participants were generally able to
self-regulate their effort, but the previous “second approach” had two important problems: 1)
possible ecological invalidity due to extremely high stress levels in some participants, and 2)
experimental asymmetry. The asymmetry arose because control group and motivation group
participants were not given equivalent treatment in the first half of the experiment. Considering the
discussion of learning and effort in the Literature Review section, it stands to reason that the effect of
practice would have been higher in the control group. It is furthermore conceivable that the control
group was potentially more susceptible to fatigue effects. These possibilities would have unduly
complicated interpretation of the results.
One drawback to the new motivation manipulation approach is that it requires participants to be
sufficiently motivated in the first half of the experiment to create some contrast with the second half.
However, pilot experimentation had suggested that the majority of participants were intrinsically
motivated to do well, and this inclination was augmented by clearly instructing participants to “try
their best” in the first half.
5.2.2.5 Post-Experiment Questionnaire
Subjective workload ratings were not interspersed within the task battery session, as they were in the
pilot study, being instead replaced by a post-experiment questionnaire. The primary reason for this
change was to save time, as doubling the number of trials would have otherwise resulted in an
unfeasibly long session duration. Furthermore, the results of the pilot study indicated that the
relationship between subjective ratings and experimental conditions was only uncertain for the
manipulation of verbal fluency difficulty as well as that of motivation for all tasks. Participants’
perceptions of the verbal fluency task are addressed in Parts 1(c) and 1(f) of the questionnaire
(Appendix D), where they are asked to sort the various starting letters according to their associated
difficulty and effort levels. Participants assess the effects of the motivation manipulation in Part 4 of
the questionnaire, which asks whether they were successful in down-regulating their effort level on
the second half of the experiment. Part 1(g) is also important, as it verifies that participants entered
the experiment with a high level of motivation.
The questionnaire also includes questions on the participants’ awareness of their eye movements
(Part 3), or lack thereof. They were asked whether they consciously stared, whether they felt as if they
were supposed to be looking anywhere in particular, and whether they otherwise thought about their
eye movements during the experiment.
Questions regarding changes in mental strategy were also included, to identify any gross changes in
strategy that would cause misleading mental effort or task performance effects.
5.2.3 Design
The design of the experiment was similar to that of the pilot study, but with an additional block (two in
total) of practice trials to minimize practice effects and four blocks of experimental trials (rather
than two blocks, as in the pilot study). Because the trials were half as long as in the pilot study (30
seconds) and subjective ratings were queried after task battery completion, the session still lasted
30-40 minutes. Control and motivation manipulation groups were treated identically for the
practice blocks and the first two blocks of the experiment, but only the motivation group was asked
to exert less effort for blocks three and four. As with the pilot study, the presentation order of the
three task types and two difficulty versions (six different tasks in total) in each block was
counterbalanced between participants using a balanced Latin square design. Task prompting stimuli
(e.g., starting numbers for serial subtraction or starting letters for verbal fluency) were also
counterbalanced so that half of participants received a given stimulus set in blocks one and two, while
the other half received that set in blocks three and four. Thus, there were 24 group, task order, and
stimuli order combinations in total, which were each administered to a single participant.
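For illustration, one standard way to build a balanced Latin square for an even number of conditions is a Williams-type construction, sketched below. It is not known whether this particular construction was the one used in the study. The task labels, group labels, and stimulus order labels are hypothetical. The count of 24 is reproduced as 2 groups × 6 task orders × 2 stimulus set orders.

```python
from itertools import product


def balanced_latin_square(n):
    """Return n presentation orders for an even number n of conditions.

    Each condition occupies every serial position once, and each condition
    immediately follows every other condition exactly once across the rows.
    """
    if n % 2:
        raise ValueError("this simple construction needs an even n")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    lo, hi = 1, n - 1
    while lo <= hi:
        first.append(lo)
        if lo != hi:
            first.append(hi)
        lo, hi = lo + 1, hi - 1
    # Remaining rows are cyclic shifts of the first row.
    return [[(c + r) % n for c in first] for r in range(n)]


# Hypothetical labels for the six tasks (three task types x two difficulties).
tasks = [f"{t}/{d}" for t in ("math", "fluency", "auditory")
         for d in ("nominal", "high")]
orders = [[tasks[i] for i in row] for row in balanced_latin_square(len(tasks))]

# One participant per group x task order x stimulus set order combination.
assignments = list(product(("control", "motivation"), range(len(orders)),
                           ("set A first", "set B first")))
```

With six conditions, the six rows balance first-order carryover: every ordered pair of distinct tasks appears as immediate neighbours exactly once.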
The independent variables for this study were:
1. Task Type
2. Task Difficulty Level
3. Presence of Demotivation Instructions
The outcome measures were:
1. Subjective Reports of Effort and Perceived Task Difficulty
2. Number of Correct Responses
3. Number of Incorrect Responses
4. Trial Proportion of Long ISI
5. Median ISI Length
6. Average Saccade Rate
7. Average Heart Rate
8. Spontaneous Skin Conductance Rate
5.2.4 Procedures
The testing procedures were identical to those in the pilot study, but for the following changes:
o in the guidelines included with the consent form, participants were asked to avoid wearing
mascara to the study; makeup removal pads and the option of rescheduling were both offered
to any participants who had not complied
o the monitor was placed as far away from the participant as was comfortable (60 cm), in order
to address concerns that the proximity of the monitor may cause participants to assume that
they are required to stare straight ahead
o a high impedance grounding strap was attached to participants’ ankles, to prevent static
electricity-related artefacts in electrophysiological signals
o participants were told that the eye tracker was being used to measure pupil diameter (which it
is also capable of recording), and they were told they did not need to remember to stare
straight ahead because it could make this measurement regardless of gaze direction
o a piece of medical tape was used to affix the eyebrow in a slightly raised position, ensuring that
the participant could still blink freely
o participants were not asked to avoid closing their eyes unless eye closure was observed during
the practice session or later in the experiment; this request was only necessary in a few
participants
o after the practice blocks, all participants were told that it was necessary for them to give their
best effort on the tasks, while avoiding frustration on those that are more difficult
o participants belonging to the motivation manipulation group were given instructions midway
through the experiment to “try as little as possible” on the last half of the experiment; the
research coordinator entered the testing room at this point for all participants, to answer any
questions and ensure their comfort
o calibration was not performed on each participant, but on the research coordinator before the
participant arrived, in order to avoid drawing attention to the actual purpose of the eye
tracker; the accuracy of this calibration was verified for each participant after the task battery
was completed, using a four-point validation pattern; these validations revealed less than 20%
gaze position error over a visual angle of approximately 30°, which is adequate for the current
study because accurate gaze position and velocity estimates were unnecessary
o at the end of the experiment, participants were asked to fill out a post-test questionnaire
(Appendix D), which included questions about subjective effort, task difficulty, any conscious
control of eye movements, and (if applicable) perceived success in down-regulating their effort
during the last half of the experiment
5.2.5 Analysis
The analysis was completed using a condensed dataset, wherein repeated observations from blocks 1
and 2 were combined into an average “1st half” measurement, and those from blocks 3 and 4 into a
“2nd half” measurement. Where one of the observations was deemed unusable due to artefacts,
equipment failure, or less commonly, misinterpretation of task instructions, its equivalent trial
repetition was used alone.
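The condensing rule can be written as a small sketch. The function name and the use of `None` to mark an unusable observation are assumptions made for illustration only.

```python
def condense(first_rep, second_rep):
    """Combine two repetitions of a trial into one half-experiment value.

    None marks an observation judged unusable (hypothetical convention).
    """
    usable = [v for v in (first_rep, second_rep) if v is not None]
    if not usable:
        return None  # neither repetition could be used
    # Average when both are usable; otherwise the surviving one stands alone.
    return sum(usable) / len(usable)


first_half = condense(2.0, 4.0)     # both repetitions usable
fallback = condense(None, 4.0)      # one repetition unusable
```

The source does not say how a condition was handled if both repetitions were unusable; returning `None` here is only a placeholder.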
As in the pilot study, the goals of the analysis were to seek dependent variable responses to
experimental manipulations of task difficulty and motivation level. These responses were sought for
each dependent variable and for each task type (math, fluency, and auditory). The hypotheses of this
study were:
1. Average saccade rate will be significantly, negatively correlated with task difficulty on the
auditory task
2. Average saccade rate will be lower where participants are motivated to do well on the
auditory task
3. For both manipulations of motivation and difficulty, average saccade rate findings will
converge with those in heart rate, spontaneous skin conductance response rate, self-reports
of subjective effort, and task performance
4. The trial proportion of long ISI will be greater where either the difficulty or motivation level
during the auditory task is increased
A preliminary inspection of the data was completed using box and whisker plots of raw change
scores resulting from each dependent and independent variable combination (Appendix E). There
were three conclusions: 1) observed effects were very small and varied widely between measure and
group membership, 2) the distributions of observation groups (i.e., individual conditions,
represented by a single box and whiskers) were often not normal, and 3) the distributions of
observation groups within each independent variable combination varied widely (i.e., different
skews and variances). An independent variable “combination” refers to a set of observation
groups that would be compared to test an effect hypothesis.
5.2.5.1 Parametric Model Assumptions
The third conclusion of the box plot inspections is particularly critical to the model choice for
hypothesis testing. The obvious choice of analysis for the design of this study is a mixed (repeated
and between subjects) analysis of variance (ANOVA). However, ANOVA are parametric analyses
that model populations using assumptions that should be reflected in the observed samples. One of
those assumptions is that all populations are normally distributed¹. It is often possible to transform
data that do not initially meet this criterion, but this may not be the case where different observation
groups exhibit characteristics that require different transformations. Simply put, it can occur that a
transformation may increase normality for one observation group while decreasing it in another.
This problem was confirmed in some subsets of the data through the (failed) use of these common
transformations:
$X_i^t = \log(X_i + 1)$ (1)
$X_i^t = \dfrac{1}{(X_{\max} + 1) - X_i}$ (2)
$X_i^t = \sqrt{X_i + 1}$ (3)
where $X_i^t$ is the transformation of observation $X_i$. Note that in Eqn. 1 and 2, 1 was only added
to $X_i$ where it was required to transform a null measurement. Eqn. 2 is a form of a more common
transformation wherein each observation is subtracted from the maximum observation (+1) in order
to preserve the direction of change scores, as recommended by Field (2005). Normality between
observation groups was tested using the Shapiro-Wilk test, which was administered, as with all
statistical methods, using SPSS v17 (SPSS Inc., Chicago IL).
Another criterion for the parametric analysis of variance (ANOVA) is that of homoscedasticity (equal
variances). A mixed ANOVA requires equal variance of changes between treatment levels for each
factor (versus equal variances of sample populations in a between subjects ANOVA).
¹ ANOVA for independent sample groups (i.e., between subjects) requires normality of observation group
residuals, which are the differences between each observation and its group mean. However, a mixed
ANOVA is an analysis of change scores to determine the significance of change between treatments as well as
whether the average change seen in one participant group is different from that in another group. Thus, this
model assumed the normality of change score residuals. It can be shown that normality of observation group
samples is mathematically equivalent to normality of change score residuals.
Heteroscedasticity is of greatest concern where sample sizes are unbalanced between groups, as the
variance of the larger group will bias the total (pooled) variance calculation. Therefore, although
Levene’s test of equal variances was generally used to determine whether this was a concern for a
given dataset, a more conservative criterion was used for unbalanced datasets (e.g., where
observations are missing). This threshold was taken from Field (2005), who recommends that the
ratio of the maximum to minimum observed sample variance should not exceed two.
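The variance ratio criterion is simple enough to state as a sketch. The function names and the example values are made up, and the use of the sample (n − 1) variance is an assumption, since the text does not specify it.

```python
from statistics import variance


def variance_ratio(*groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]  # sample variance (n - 1), assumed
    return max(variances) / min(variances)


def passes_ratio_criterion(*groups, limit=2.0):
    """True where the max/min variance ratio does not exceed the limit."""
    return variance_ratio(*groups) <= limit


# Made-up observation groups: variances 1 and 4, so the ratio is 4.
ratio = variance_ratio([1, 2, 3], [2, 4, 6])
```

A group with zero variance would make the ratio undefined, so such a case would need separate handling.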
Although outlier removal is another option for improvement of both homoscedasticity and normality
of the comparison groups, none were omitted in this analysis. There were certainly outliers present
in the data by any definition of the term (e.g., 1.5 inter-quartile range criterion), but there was
simply no practical justification for their removal. The validity of individual observations could be
confirmed with some certainty because each trial was documented through audio recordings.
Participants’ behaviour on outlier trials was thereby verified as being consistent with that on other
trials.
5.2.5.2 Hypothesis Testing Methods
Where datasets exhibit problematic heteroscedasticity or non-normality, non-parametric hypothesis
tests must be used. Unfortunately, no non-parametric test of mixed design effects has gained any
widespread acceptance, so any non-parametric analysis of the data must be somewhat more
piecemeal, addressing each experimental manipulation separately. The effect of difficulty level was
assessed where necessary using a Wilcoxon’s signed-rank test (analog to paired t-test) between levels.
The effect of motivation could have been assessed using two possible approaches: 1) use a Wilcoxon’s
test to determine 1st to 2nd half effect sizes in both the control and motivation group, then compare
effect sizes, and 2) compute change scores for each group, then use a Kolmogorov-Smirnov (K-S)
test (analog to unpaired t-test) to indicate whether the two groups experienced the same effect. The
K-S test is similar to the more common Mann-Whitney test, but has been recommended for small
sample sizes (Field, 2005). The second approach, involving change scores, was preferred because it
did not depend on the reliability of the effect size calculation. However, it will be shown that both
of these methods lack an important feature, which is the ability to account for the correlation
between pre- and post-treatment measures. Therefore, the results had to be interpreted with caution.
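The second approach can be sketched as follows: change scores are computed per participant, and the two-sample K-S statistic is the largest vertical gap between the empirical cumulative distributions of the two groups' change scores. This sketch computes only the statistic, not its significance (the study used SPSS for all tests), and the function names are hypothetical.

```python
def change_scores(first_half, second_half):
    """Per-participant change from the 1st to the 2nd half."""
    return [b - a for a, b in zip(first_half, second_half)]


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    d = 0.0
    for x in sorted(set(sample_a) | set(sample_b)):
        # Proportion of each sample at or below x.
        cdf_a = sum(v <= x for v in sample_a) / len(sample_a)
        cdf_b = sum(v <= x for v in sample_b) / len(sample_b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up change scores for two groups of four participants.
d_overlap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

Because each participant contributes only a difference, the statistic is blind to where that participant started, which is the limitation noted in the paragraph above.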
Where data fulfilled the assumptions of a parametric test, these tests were favoured, as they are more
powerful under these circumstances. As previously mentioned, a mixed ANOVA was one approach
to analyzing the data, but it was not deemed the most appropriate parametric test under all
circumstances. In many cases, its interpretation would have been problematic because 1st half
observations were highly correlated with the magnitude of change scores, and they tended to be
higher in one of the control or motivation groups compared to the other. The correlation of change
with pre-treatment levels can be the result of two very common phenomena: 1) regression to the
mean, and 2) the law of initial values (LIV). LIV refers to a property of many psychophysiological
measures wherein the baseline level limits the amount of change that is possible (Stern, 2001). In
general, these phenomena do not pose a problem to the use of the mixed ANOVA because mean
pre-treatment levels are roughly equivalent in the control and experimental groups, by virtue of
random assignment. However, if pre-test observations happen to be higher in the experimental
group and the pre-test level is strongly correlated with change scores, as was encountered, then ANOVA will
rightly report a difference between the groups in their average change scores. Although this
interpretation is valid, Hedeker’s (2006) interpretation of Lord’s Paradox illustrates that it is not
the only one available. Another, more appropriate interpretation would answer the question of
whether a motivation group participant tends to exhibit more or less change compared to a control
participant, given that they started at the same pre-treatment level. It should be noted that this is not
the goal of the study, as a mental effort measurement tool should practically be capable of detecting a
difference regardless of starting level (to a point). However, this approach was simply a means of
rectifying the problem of experimental groups exhibiting non-equivalent pre-test conditions for a
measure that was correlated to change scores.
There are two common methods of distinguishing pre-treatment correlation effects from treatment
effects, and they are both based on regressing the change scores (or equivalently, the post-treatment
measure) on the pre-treatment measure. In one approach, the residualized change score, the
regression is performed by pooling the data from both experimental and control groups together
and fitting a regression line $x_{2,i} = m x_{1,i} + C$, where $x_{1,i}$ are the pre-treatment and $x_{2,i}$ the post-
treatment observations. The residuals for each observation $i$ are then taken as the residualized
change scores. Another approach is to perform an analysis of covariance (ANCOVA) on the post-
treatment measure versus group, but identify the pre-treatment measure as a covariate, thus
accounting for its effect. Although the difference between these methods is generally small, the
ANCOVA method is preferred where the groups have very disparate mean pre-treatment levels
Forbes (2005). Figure 11a illustrates the issue with its representation of the residual change score
method being applied to a data set with a wide spread between control and experimental groups.
Because the regression line does not actually reflect the relationship between pre- and post-
treatment measures, x1,i and x2,i, the mean difference in residual change scores clearly underestimates
the actual treatment effect ( A-B). However, the ANCOVA (Figure 11b) method more accurately
interprets the effect ( A-B) by assigning individual regression lines to each group and identifying the
difference the two regression lines’ intercepts. The significance of this difference is calculated with
respect to the variance of residuals for each group. Note that any linear regression technique requires
a parametric model, with assumptions of homoscedasticity and normality of residuals. Furthermore,
these methods require equivalence of within-groups regression slopes; otherwise the
treatment effect cannot be distinguished (Cronbach & Furby, 1970). Where slopes agree and the
correlation between pre- and post-treatment measures is high (r > 0.4), Cronbach and Furby recommend the use of
the ANCOVA method described above. It will be seen that these conditions applied in a number of
cases.
Figure 11a (left). Illustrating error resulting from residual change score method on unmatched groups;
Figure 11b (right). ANCOVA method is preferred in this case because it does not pool data for regression.
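The contrast between the two methods in Figure 11 can be sketched numerically. The following Python fragment is a minimal illustration only, not the analysis code used in this study. The data are invented and noise-free, with a common within-groups slope of 1 and a true treatment effect of 2, and all function names are hypothetical. The ANCOVA estimate is computed here as the group difference adjusted by the pooled within-groups slope, which assumes equal slopes in the two groups.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = m*x + c."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    m = sxy / sxx
    return m, my - m * mx


def residualized_change_effect(x_ctrl, y_ctrl, x_exp, y_exp):
    """Difference in mean residuals about a single pooled regression line."""
    m, c = fit_line(x_ctrl + x_exp, y_ctrl + y_exp)
    res_ctrl = mean(y - (m * x + c) for x, y in zip(x_ctrl, y_ctrl))
    res_exp = mean(y - (m * x + c) for x, y in zip(x_exp, y_exp))
    return res_exp - res_ctrl


def ancova_effect(x_ctrl, y_ctrl, x_exp, y_exp):
    """Group difference adjusted with the pooled within-groups slope."""
    sxx = sxy = 0.0
    for x, y in ((x_ctrl, y_ctrl), (x_exp, y_exp)):
        mx, my = mean(x), mean(y)
        sxx += sum((a - mx) ** 2 for a in x)
        sxy += sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return (mean(y_exp) - mean(y_ctrl)) - slope * (mean(x_exp) - mean(x_ctrl))


# Invented data: the experimental group starts higher (x) and gains 2 units.
x_ctrl = [1.0, 2.0, 3.0, 4.0, 5.0]
y_ctrl = [1.0, 2.0, 3.0, 4.0, 5.0]
x_exp = [6.0, 7.0, 8.0, 9.0, 10.0]
y_exp = [8.0, 9.0, 10.0, 11.0, 12.0]

pooled = residualized_change_effect(x_ctrl, y_ctrl, x_exp, y_exp)
adjusted = ancova_effect(x_ctrl, y_ctrl, x_exp, y_exp)
print(f"residualized change estimate: {pooled:.2f}")
print(f"ANCOVA estimate: {adjusted:.2f}")
```

With these invented values the pooled regression line is steeper than either within-groups line, so the residualized change estimate (roughly 0.48) falls well short of the built-in effect of 2, whereas the slope-adjusted estimate recovers it.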
5.2.5.3 Multiple Comparisons Correction
An important consideration in this study was the correction for multiple comparisons, which is
necessary to control the chance of a type I error; that is, the greater the number of hypothesis tests
performed, the greater the likelihood of a false positive error occurring. This effect can be balanced using
the Bonferroni correction, which essentially involves decreasing the significance threshold (α) by a
factor of the number of tests applied (k). However, the Holm-Bonferroni method, which is a less
conservative alternative, was used. After all hypothesis tests were completed, their associated
p-values were ordered and the lowest p-value tested against the Bonferroni adjusted criterion (α/k). If
the null hypothesis was rejected, then the next p-value was compared with α/(k-1). This process
continued until a null hypothesis could not be rejected, at which point it and all remaining
hypotheses were retained. Note that this correction process was
completed separately for each experimental manipulation, the rationale being that the
manipulations could be considered relatively independent of one another. That is, making k
comparisons toward the hypothesis that one manipulation had some effect should not
have affected the type I error likelihood for subsequent tests of the other manipulation. By the
same token, it could be argued that each outcome measure was investigating an independent
phenomenon, and that each task represented an independent (sub-) experiment. However, it was
determined that these variables were each too closely inter-related due to their common reliance on the
effectiveness of the manipulations, which must always be considered an uncertainty in studies of
mental workload. Thus, considering the small effect sizes predicted by the visual inspection, it was
necessary to control type II error (false negative) by limiting the number of comparisons for
each experimental manipulation.
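The step-down procedure described above can be sketched in a few lines of Python. This is an illustrative implementation rather than the software used in the study; the function name is hypothetical and the p-values in the example are invented for demonstration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    The smallest p-value is compared with alpha/k, the next with alpha/(k-1),
    and so on; testing stops at the first p-value that fails its criterion.
    """
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    rejected = [False] * k
    for step, idx in enumerate(order):
        criterion = alpha / (k - step)
        if p_values[idx] <= criterion:
            rejected[idx] = True
        else:
            break  # this and all larger p-values are retained
    return rejected


# Invented p-values for five tests of a single manipulation.
example_p = [0.010, 0.001, 0.300, 0.002, 0.004]
decisions = holm_bonferroni(example_p)
print(decisions)
```

In this invented example the fourth-smallest p-value (0.010) is compared with α/2 = 0.025 and is therefore rejected, although it would be borderline against the plain Bonferroni criterion of α/5 = 0.010 applied to every test.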
To this end, four of the original nine dependent variables were excluded from analysis (subjective
effort data is not suitable for hypothesis testing). The excluded variables were performance (correct
and incorrect responses), median ISI length, and spontaneous SCR rate. Exclusion of performance
data was simply a consequence of having not identified any promising/interesting trends during
visual inspection. The exclusion of median ISI length and SCR rate requires more explanation.
Median ISI length was effectively replaced by the average saccade rate, as it was found that these
two measures correlated very closely, which is illustrated in Figure 12 (note logarithmic axes). The
inverse relationship is predicted when the sum of all saccade durations during a trial is negligible
with respect to the observation period. The average saccade rate offers two advantages over the median
ISI length: 1) confidence in measure, and 2) resistance to outliers. With reference to the first
advantage, recall that it was necessary to detect the presence of saccades during “blink” periods in
which the eye position was unknown. If detected, the precise moment of the saccade had to then be
estimated at the centre of the blink period. This practice lends some uncertainty to
subsequent ISI length calculations, while the mere presence of a saccade is more certain. With
reference to outliers, it was found that median ISI length data exhibited extreme positive skew and were
resistant to common transformations toward a normal distribution. Saccade rate data were much less
problematic in this regard.
Figure 12. Plot of median ISI (ms, horizontal axis) versus average saccade rate (1/s, vertical axis) for all trials reveals a very
consistent (R² = 0.86) inverse relationship, with fitted trend line y = 19000x^(-0.9); note logarithmic axes.
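A trend line of the form y = a·x^b, such as the one reported in Figure 12, is a straight line on logarithmic axes and can be fitted by ordinary least squares on log-transformed data. The sketch below assumes this is how such a line could be obtained; it is not the procedure used to produce the figure, the function name is hypothetical, and the data points are synthetic values generated from the reported trend line rather than study observations. An exponent of exactly -1 would correspond to a perfect inverse relationship.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log10-transformed data."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx              # slope on log-log axes = exponent
    a = 10 ** (my - b * mx)    # intercept on log-log axes = log10(a)
    return a, b


# Synthetic points lying exactly on the trend line reported in Figure 12.
median_isi_ms = [100.0, 1000.0, 10000.0, 100000.0]
saccade_rate = [19000.0 * v ** -0.9 for v in median_isi_ms]

a, b = fit_power_law(median_isi_ms, saccade_rate)
print(f"a = {a:.0f}, b = {b:.2f}")
```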
Spontaneous SCR data was primarily disregarded because it was sparse. Large sections of the EDA
signal had to be excluded because they exhibited movement artefacts, which masked any actual
response occurrences. As a result, almost 20% of observations were missing values. In some cases,
comparison group sizes were thereby highly unbalanced, which can be problematic with variance
violations in parametric tests, as previously discussed. Furthermore, interpretation of the data would
have been complicated by the inclusion of several EDA “non-responders,” defined by
Kettunen (2000) as persons in whom fewer than one spontaneous response is elicited per two minutes.
A less conservative approach to the Holm-Bonferroni correction was also realized by limiting the
number of comparisons to those identified as “promising” through a visual inspection. The most
promising trends were identified from the aforementioned change score box plots and subsequently
tested for statistical significance using the method most recommended by the discussion in the
preceding section. Table 4 is a summary of these trends, with references to locate their positions on
the box plots of Appendix E.
Table 4. Summary of Trends Subjected to Statistical Analysis
Measure | Task Type | Description of Trend | Appendix E Reference
--- | --- | --- | ---
Average Heart Rate | Fluency | (-) difficulty effect in 1st half | plot group: 1, reference: A
Average Heart Rate | Fluency | (+) motivation effect for high difficulty version of task | 1B
Average Heart Rate | Auditory | (-) difficulty effect in 1st half | 1C
Average Heart Rate | Auditory | (+) motivation effect for high difficulty version of task | 1D
Average Saccade Rate | Math | (+) difficulty effect in 1st half | 2E
Average Saccade Rate | Auditory | (-) difficulty effect in 1st half | 2F
Average Saccade Rate | Auditory | (+) motivation effect for high difficulty version of task | 2G
Trial Proportion of Long ISI | Auditory | (+) difficulty effect in 1st half | 3H
Trial Proportion of Long ISI | Auditory | (-) motivation effect for high difficulty version of task | 3I
It should be noted that the problem of multiple comparisons is not conventionally alleviated through
visual inspection of the data and subsequent omission of dependent variables and individual
conditions from analysis. Arguably, the act of visual inspection is a form of hypothesis test, so
that the resultant analysis must be corrected for the total number of possible comparisons, rather
than just the number of “promising” trends. However, this approach was chosen because no
alternative method of reducing type II error likelihood was found. In short, the conventional Holm-
Bonferroni method was deemed too conservative.
To be thorough, the results of a post-hoc, full hypothesis test battery are also included (Appendix F).
This analysis included comparisons between all experimental conditions, for all dependent variables
not excluded on the grounds of technical difficulties. It can be shown retrospectively that the
findings of the conventional approach do not differ to any consequential extent from those
presented in the Results and Discussion.
5.3 Results and Discussion
5.3.1 Hypothesis Testing
Hypothesis tests were conducted on each of the trends noted in a visual inspection of the data, which
were presented in Table 4. The method of testing was dependent on the characteristics of each
dataset, so they are detailed below. Where p-values meet the criterion of significance (α = 0.05 for
two-tailed tests), they are reported without reference to their significance, as this determination
depends on the final results of the Holm-Bonferroni correction method.
5.3.1.1 Average Heart Rate
Fluency Task
The comparison groups in this dataset all passed the Shapiro-Wilk test for normality and Levene’s
test for homoscedasticity. Because the sample groups were unbalanced due to a technical issue with
one of the control participants, the ratio of maximum to minimum variance was calculated: 1.9.
Since this value was less than the threshold of 2 suggested by Field (2005), parametric analyses were
deemed appropriate. A mixed ANOVA reported p = 0.010 for the effect of task difficulty (F (1,21) =
7.94). The effect size was r = 0.52. Although it is common to report ω² for ANOVA effect sizes, Field
(2005) recommends the more focused contrast effect size:
\[ r = \sqrt{\frac{F(1,\, df_R)}{F(1,\, df_R) + df_R}} \tag{4} \]
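As a check on the arithmetic, Eqn. 4 can be evaluated directly from a reported F-ratio and its residual degrees of freedom. The short Python sketch below uses a hypothetical function name and takes the fluency task values reported above (F(1,21) = 7.94) as input.

```python
import math


def contrast_effect_size(f_value, df_residual):
    """Effect size r for an F-ratio with one numerator degree of freedom (Eqn. 4)."""
    return math.sqrt(f_value / (f_value + df_residual))


# Fluency task, effect of task difficulty on average heart rate.
r_fluency_hr = contrast_effect_size(7.94, 21)
print(round(r_fluency_hr, 2))
```

The value obtained rounds to the reported effect size of r = 0.52.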
Looking to the effect of motivation during the high difficulty version of the task, it was found that
the correlation between 1st half and 2nd half measures was very high, with ρ = 0.92. As per the
recommendations of Cronbach & Furby (1970), the effect of motivation level was tested using the
ANCOVA method rather than mixed ANOVA, with 1st half measures as a covariate. The test
confirmed a very strong relationship between 1st half and 2nd half measures with p < 0.001 (F (1,20) =
365.4, r = 1.00). It reported p = 0.002 (F (1,20) = 12.03, r = 0.88) for the effect of motivation during
the high difficulty version of the task. Note that the effect size, r , was calculated based on the
regression parameter t -statistic, as recommended by Field (2005) for unbalanced ANCOVA:
\[ r = \sqrt{\frac{t^2}{t^2 + N - 2}} \tag{5} \]
where N is equivalent to the total number of observations, including repeated measures.
Homogeneity of 1st versus 2nd half observation regression slopes was confirmed post-hoc, as per the
instructions of Field (2005).
Auditory Task
Although the data were tested to be sufficiently normal and homoscedastic, they did not meet Field’s
recommended criterion of max/min variance, with a ratio of 2.1. Furthermore, attempts to correct
the data using conventional data transformations (Eqns. 1, 2, and 3) were unsuccessful.
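The max/min variance criterion applied here and in the fluency task analysis is simple to compute. The following Python sketch is illustrative only: the function name is hypothetical, the two groups are invented values rather than heart rate observations, and the threshold of 2 is the one attributed to Field (2005) above.

```python
from statistics import variance


def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# Invented comparison groups for demonstration.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]

ratio = variance_ratio([group_a, group_b])
use_nonparametric = ratio > 2.0  # threshold of 2 for unbalanced groups
print(ratio, use_nonparametric)
```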
Therefore, non-parametric analyses were chosen to test the effect of task difficulty. Two Wilcoxon
signed-rank tests were used, one for each experiment half, over all participants’ data (n = 23). The test
returned p < 0.001 (z = -4.04, r = -0.60) for the 1st half. The effect size was calculated as per Field:
\[ r = \frac{z}{\sqrt{N}} \tag{6} \]
where N is equivalent to the total number of observations, including repeated measures.
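Eqn. 6 can be checked against the reported values in the same way. In the Python sketch below, the function name is hypothetical and N = 46 is an assumption: it takes the 23 participants to contribute one observation at each of the two difficulty levels, which is consistent with the definition of N given above but is not stated explicitly in the text.

```python
import math


def wilcoxon_effect_size(z, n_observations):
    """Effect size r for a Wilcoxon signed-rank z-score (Eqn. 6)."""
    return z / math.sqrt(n_observations)


# Assumed N: 23 participants x 2 difficulty levels = 46 observations.
r_auditory_hr = wilcoxon_effect_size(-4.04, 46)
print(round(r_auditory_hr, 2))
```

Under this assumption the result rounds to the reported r = -0.60.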
Non-parametric analyses were also necessary to test the effect of motivation for the high difficulty
task version. A Kolmogorov-Smirnov test of change scores reported no significant effect, but this
result must be more closely scrutinized because it does not account for the very strong correlation of
1st and 2nd half measures (ρ = 0.96). However, a plot of observations from the two experimental
groups (Figure 13) indicates that there was likely no effect of motivation on HR.
Figure 13. Individual observations of average heart rate (bpm) in 1st half (horizontal axis) versus 2nd half (vertical axis) for the high difficulty
auditory task, plotted separately for the control and motivation groups, suggest no effect of motivation.
5.3.1.2 Average Saccade Rate
Math Task
The comparison groups of this dataset exhibited a high degree of heteroscedasticity (max/min variance
ratio = 6.3), and their characteristics were not improved through the aforementioned
transformation techniques. Therefore, the effect of task difficulty in the first half of the experiment
was tested using a Wilcoxon signed-rank test over all participants’ data (n = 24). The effect was
not significant.
Auditory Task
All comparison groups passed the Shapiro-Wilk and Levene’s tests of normality and homoscedasticity.
Therefore, parametric hypothesis tests were used.
The effect of difficulty was tested with a mixed ANOVA, returning p = 0.002 (F (1,22) = 14.91, r =
0.60) for the general effect of task difficulty and p = 0.036 (F (1,22) = 5.01, r = 0.4) for the combined
effect of difficulty and group. Effect sizes were calculated using the contrast effect method (Eqn. 4).
The latter effect refers to the tendency of motivated participants to exhibit a larger response to
task difficulty manipulations. It is difficult to determine whether this result reflects an interaction effect of
motivation or simply an imbalance in the groups’ baseline observations, as difficulty
change scores are positively correlated to observations at the nominal level (ρ = 0.64/0.72 for 1st/2nd
half).
The effect of motivation for the high difficulty version of the task was tested using the ANCOVA
method because of the high correlation between 1st and 2nd half observations ( ρ = 0.6). However, the
test did not find a significant effect for motivation, even for a one-tailed criterion (α = 0.1). A post-
hoc test for homogeneity of regression slopes verified that the model criteria were met.
5.3.1.3 Trial Proportion of Long ISI
Auditory Task
Without transformation, the comparison groups in this dataset passed normality and
homoscedasticity tests.
The data were tested using a mixed ANOVA, which reported p = 0.004 (F (1,22) = 10.07, r = 0.56) for the
effect of task difficulty. The effect of motivation was below a one-tailed significance criterion (α =
0.1), having p = 0.057 (F (1,22) = 4.02, r = 0.39).
However, because the 1st and 2nd half observations were highly correlated (ρ = 0.61), the ANCOVA
method of testing the effect of motivation is the preferred method of analysis. Contrary to the mixed
ANOVA results, it reported a non-significant effect for motivation.
In view of this disagreement, a scatter plot of change scores versus 1st half observations for the high
difficulty version of the auditory task is presented in Figure 14. A distinction between the two groups
is evident, though their within-groups variance may preclude statistical significance. Note that when
two motivation group outliers (marked in Figure 14 with “****”) were removed from the dataset, the
ANCOVA method returned p = 0.029 (F (1,19) = 5.58, r = 0.48) for the effect of motivation. However,
outlier removal was not justified, as these two participants exhibited this behaviour consistently
(considering both trial repetitions) and were furthermore not found to differ from the other
participants in any other aspect.
Figure 14. Individual observations of Trial Portion Long ISI (%) in 1st half (horizontal axis) versus 2nd half (vertical axis) for the high
difficulty auditory task, plotted separately for the control and motivation groups, suggest a weak motivation effect. Two motivation group outliers are marked with asterisks.
5.3.1.4 Holm-Bonferroni Correction
Following the multiple comparisons correction method described in the Analysis section, the
significances of the preceding results were determined. In total, 5 hypothesis tests were performed
with regards to the effect of task difficulty and 4 tests regarding the effect of motivation. Therefore,
the minimum (starting), corrected significance criteria were 0.010 and 0.013 for difficulty and
motivation manipulations, respectively. The significance of interaction effects (involving changes in both the
difficulty and motivation conditions) was evaluated using whichever of the two criteria was
smaller (more stringent).
As a result, only the effects of task difficulty were found to have significant effects on any of the
measures. Specifically:
o An increase in verbal fluency task difficulty was associated with a decrease in average heart
rate (effect size, r = 0.52)
o A decrease in motivation during the high difficulty version of the verbal fluency task
was associated with an increase in average heart rate (effect size, r = 0.52)
o An increase in auditory task difficulty was associated with a decrease in average heart rate
(effect size, r = 0.60 /0.42 for 1st and 2nd halves of experiment)
o An increase in auditory task difficulty was associated with a decrease in average saccade rate
(effect size, r = 0.60); this finding confirms Hypothesis (1), where it refers to the auditory
task
o An increase in auditory task difficulty was associated with an increase in the trial proportion
of long ISI, which are defined as exceeding 1500 ms in duration (effect size, r = 0.56); this
finding confirms Hypothesis (4), where it refers to the auditory task
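The two eye movement measures named in these findings can be illustrated with a short Python sketch. It is a simplified illustration and not the processing pipeline used in the study. Several points are assumptions: each ISI is approximated as the time between successive saccade onsets (saccade durations are ignored), the trial proportion of long ISI is taken to be the share of trial time occupied by ISIs exceeding 1500 ms, and the function names and onset times are invented.

```python
LONG_ISI_MS = 1500.0  # threshold for a "long" ISI, as defined in the text


def intersaccadic_intervals(saccade_onsets_ms):
    """Intervals between successive saccade onsets (saccade durations ignored)."""
    return [b - a for a, b in zip(saccade_onsets_ms, saccade_onsets_ms[1:])]


def long_isi_portion(saccade_onsets_ms, trial_duration_ms, threshold_ms=LONG_ISI_MS):
    """Assumed definition: fraction of trial time spent in ISIs above the threshold."""
    intervals = intersaccadic_intervals(saccade_onsets_ms)
    long_time = sum(isi for isi in intervals if isi > threshold_ms)
    return long_time / trial_duration_ms


def average_saccade_rate(saccade_onsets_ms, trial_duration_ms):
    """Saccades per second over the whole trial."""
    return len(saccade_onsets_ms) / (trial_duration_ms / 1000.0)


# Invented saccade onset times (ms) for a single 10-second trial.
onsets = [0.0, 400.0, 2400.0, 2900.0, 6000.0, 6300.0]
portion = long_isi_portion(onsets, 10000.0)
rate = average_saccade_rate(onsets, 10000.0)
print(f"long ISI portion: {portion:.2f}, saccade rate: {rate:.1f} per second")
```

In this invented trial, two of the five intervals exceed the threshold and together occupy about half of the trial, which shows how fewer saccades and a larger long ISI portion tend to move together.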
Regarding Hypothesis (4), a non-significant, but strong trend was also noted: a decrease in
motivation during the high difficulty version of the auditory task was associated with a decrease in
the trial proportion of long ISI. This result is consistent with the observations of the Pilot Study,
wherein more motivated participants exhibited more frequent long ISI compared to less motivated
participants.
Findings with respect to Hypothesis (3) will be discussed in a later section, Convergence of Measures.
5.3.2 Post-Experiment Questionnaire
With respect to participants’ ratings of task difficulty and effort, in Part 1 of the questionnaire
(Appendix D), the results were similar to those gathered in the pilot study, but there was clearer
evidence that participants were able to differentiate between perceived effort and difficulty. Whereas
participants unanimously identified manipulations of math and auditory task difficulty, they less
consistently equated increased task difficulty with increased effort. 21% (5 participants) and 12% (3)
reported no change in effort between math and auditory task difficulty levels, respectively. The
observation that some participants made a blatant distinction between effort and difficulty lends
further support to the capacity of subjective effort ratings to differentiate between them. Not only is
this important because there have been proposals otherwise (see Gopher & Donchin, 1986), but it is
arguably counterintuitive to the everyday experience where effort is presumably very closely coupled
to task difficulty. This coupling is reflected in the “control-system” model of mental workload
wherein our perceptions of task performance serve as feedback in effort allocation decisions (Robert
& Hockey, 1997). However, in the context of the experiment, where the goal was to “try” rather than
perform, it is conceivable that effort could have been independent of task difficulty.
Subjective ratings of the fluency task were of particular interest because those ratings taken during
the pilot study were less consistent in supporting the success of the difficulty manipulation than
those concerning the math and auditory tasks. Furthermore, the length of the trials had been
shortened compared to the pilot study, which may have narrowed the performance gap between
difficulty levels. Participants were given a list of the stimulus letters and asked to sort them into three
categories (Appendix D, Part 1c): “Less Difficult,” “More Difficult,” “Neutral or Don’t Remember.”
This procedure was repeated for the relative effort levels (e.g. “Tried Less”) associated with the letters.
Participants’ reports were compiled into Figures 15 and 16, which illustrate that ratings of
starting letter difficulty were more consistent than those for effort. However, in both cases, the trend
clearly distinguishes between those letters defined as nominally difficult and those defined as highly
difficult. Furthermore, box plots of fluency task change scores between difficulty levels (Appendix E,
plot group 9, reference J) show that the manipulations of fluency task difficulty were successful in
affecting task performance.
Figure 15. Tally of participants’ starting letter perceived difficulty classifications (“Less Difficult,” “More Difficult,” “Neutral or Don’t Remember”), shown as report count for each starting letter (k, b, f, t, j, m, y, q).
Figure 16. Tally of participants’ starting letter perceived effort classifications (“Tried Less,” “Tried More,” “Neutral or Don’t Remember”), shown as report count for each starting letter (k, b, f, t, j, m, y, q).
Parts 1g and 1h asked participants to rate their effort level on the 1st half of the experiment and
whether their effort level changed from the 1st to 2nd halves. The number of participants who
reported trying their “hardest” on the 1st half of the experiment was roughly equivalent between the
motivation and control groups (9 and 7 participants, respectively), while all others reported trying
“somewhat.” This result indicates that, in general, the assumption of intrinsic motivation was valid.
Participants’ ratings of relative effort between the 1st and 2nd half were also divided between the
motivation and control groups, with the number of participants who reported trying harder in the 1st
half being 9 in the motivation group but only 1 in the control group. This difference is not surprising
considering that motivation group participants were explicitly given instructions to try less in the 2nd
half, but it is interesting that so few control group participants tried harder in the second half. This
result seems to indicate that the effect of practice may have been offset by that of fatigue.
Another section of the post-experiment questionnaire (Part 2) queried participants on their
perceptions of their eye movement behaviours. Half of participants reported having consciously
fixated for the purposes of concentration in general. This result could either be interpreted as a
failure of the experiment to avoid participants’ conscious control of their eye movements, or an
explanation of the underlying cause of any trend between task difficulty and eye movements, as will
be discussed later on. A single participant reported having assumed that they were required to look
straight ahead, despite instructions otherwise, but also that the assumption did not preoccupy them.
Furthermore, this participant’s eye movement observations did not exhibit outlier behaviour.
References to mental visualization were notably present in descriptions of mental strategy (Part 3).
17% (4 participants) reported visualizing the numbers during the math task; 8% (2) visualized
objects during the fluency task; and 4% (1) visualized the speaker during the auditory task. In
retrospect, a question specifically targeting mental visualization strategies should have been included
in the questionnaire. It is conceivable that many more participants used visualization strategies, but
would not have realized it unless asked.
The last section of the questionnaire (Part 4) queried motivation group participants on whether they
felt that a) they actually tried less in the 2nd half, and b) their task performance was subsequently
lower. The results of this section contrast starkly with those of Part 1h (above) in assessing the
effectiveness of the motivation manipulation. Roughly half of participants felt that they were
generally able to try less, and almost all of them did not feel their performance decreased, although
the majority acknowledged the effect of practice. Three participants mentioned that it was
particularly difficult to exert lower effort during the math tasks, as they repeatedly “caught”
themselves working harder.
5.3.3 Convergence of Measures
Because there is no gold standard for mental workload measurement, this study was designed to test
for a relationship between eye movements and workload through a convergence of several measures.
This approach has the goals of demonstrating not only that eye movements were affected by
experimental manipulations, but that these manipulations were actually successful in changing effort.
However, the current study did not yield significant effects or strong trends for each measure,
manipulation, and each task type. In terms of eye movement measures, the auditory task was the
only case in which effects were exhibited with some confidence. At face value, the effects of the
motivation and task difficulty manipulations were as predicted, suggesting a negative correlation
between saccade occurrence and effort during the auditory task. But was effort actually manipulated?
For the manipulation of task difficulty during the auditory task, convergence of measures is
acknowledged, but not without caution. Subjective ratings appear to indicate a change in effort, but
these data must be interpreted with the understanding that subjective effort measures may only be
an indication of participants’ perceptions of experimental intent (see Gopher & Donchin, 1986),
rather than actual effortful feelings. The results of the pilot study should be more reliable in this
regard, as the presentation of a numerical scale is intuitively better suited to a qualitative assessment
than a categorical one. Looking back at the pilot study data, participants were almost unanimous in
rating their effort higher for the higher difficulty task. Turning to performance data, there is a clear
distinction in the number of correct responses. If effort regulation is viewed as a control system with
perceived task performance as a feedback variable, as in Robert & Hockey’s (1997) model, then a
performance effect is at least indicative of the potential for a change in effort. That said, it is entirely
possible that at least some participants could try equally hard on both tasks and still achieve different
scores. However, the model speaks to the most common strategy that people employ, which is to
increase their effort in response to performance decrements. There was also a significant effect of
task difficulty on average heart rate. However, this effect was the opposite of what is predicted by the
literature and therefore requires further discussion later on.
There are fewer indications that the manipulation of motivation was successful in changing effort
levels. Half of post-experiment questionnaires from the motivation group did not indicate that the
manipulation was successful in affecting effort. Furthermore, these results should be considered
optimistic, due to participants’ perceptions of the experimental intent. Performance measures also
show very little effect of motivation, but although this result does not bode well, it is very possible
that a change in effort could have no effect on performance, as the auditory task was very likely data-
limited for most people (see Literature Review for explanation of “data-limited”). In other words,
any effort expended beyond a very nominal level would not affect participants’ ability to hear and
understand the words. That said, looking to the lack of any motivation effect on math and fluency
task performance, which should be resource-limited, may indicate that the manipulation was
generally unsuccessful in changing participants’ motivation levels. It could also indicate that the
theorized relationship between effort and performance is simply false. Recall that this view was
expressed by Kahneman (1973).
As with task difficulty, the motivation manipulation produced the opposite effect in heart rate compared to
that predicted by previous literature. Heart rate is thought to be linked to effort through a stressor
response that can accompany mental workload, especially where there are perceived consequences to
participants’ performance (see Wilson 1991). However, this link would suggest a positive correlation
of heart rate and effort, rather than the negative correlation observed here. At face value, these
seemingly spurious results lend further support to the consideration of heart rate as an unreliable,
easily confounded measure of effort. However, an alternate explanation is that the decreases in heart
rate are a result of breath-holding, which is known to cause bradycardia (Smith, 1977). This
explanation lends itself best to the effect of auditory task difficulty, as breath-holding could have
been employed as a strategy to minimize breathing noise during the high difficulty version of the
task. In the fluency task, where significant heart rate effects were also observed, breath-holding may
have corresponded with responses, leading to bradycardia where fewer responses were given (high
difficulty level) or the participant was concentrating more intently on the task (high motivation
level). Conceivably, breath-holding occurred during memory searches between each response, just as
one pauses mid-step when they are trying to remember where they left their keys. Although tenable,
confirmation of this theory requires respiratory event data. Therefore, average heart rate trends
cannot be considered to have indicated any associated change in effort level due to motivation or
difficulty.
To sum the results of this section and previous sections regarding task performance and subjective
effort ratings regarding Hypothesis (3): A convergence of measures was strongly demonstrated for
subjective effort ratings, task performance, and eye movements (average saccade rate and portion
long ISI) for difficulty manipulations during the auditory task. Further, a somewhat weaker
convergence was shown for these measures for motivation manipulation during the auditory task. In
the math and fluency tasks, a weak convergence was also demonstrated between subjective
effort ratings and task performance in the manipulation of task difficulty. However, convergence was
not otherwise demonstrated.
5.3.4 Agreement of Eye Movement Results with Previous Literature
As highlighted in the Literature Review, there have been very few studies that manipulated task
difficulty and/or motivation level while recording eye movements during non-visual tasks. Only
three applicable studies were uncovered. Although they have been discussed more generally in earlier
sections, the details of their methodology and findings are summarized in Table 5.
Table 5. Summary of Relevant Previous Literature
Citation | Experimental Manipulation or Comparison | Eye Movement (EM) Measure | Mean Results | Sample Size (# Trials)
--- | --- | --- | --- | ---
Klinger et al., 1973 | Math task: Single Digit Addition vs. “Moderately Difficult Problems” | Number of seconds in a 30 second trial that contained at least one EM with a “deflection” > 65° | Increase in number of seconds (does not specify) | 21 (3-8)
Ruth & Giambra, 1974 | Verbal fluency task: High Motivation vs. Low Motivation Instructions | Frequency of EM causing deflection > 1 mm (on EOG polygraph readout) | 70-95 EM/minute vs. 45-55 EM/minute | 24 (4)
Antrobus, 1973 | Identification of Audio Tones: Single Tone vs. Two-Tone vs. Three-Tone Sequence | Frequency of EM with amplitude > 3° | Decrease in frequency (does not specify) | Not Disclosed
Although the observations of the current study generally agree with the previous research on
auditory task workload (Antrobus, 1973), they are discordant with those on the math (Klinger et al.,
1973) and fluency (Ruth & Giambra, 1974) tasks. In both of those studies, a positive correlation of eye
movement rates and workload is implied, whereas in the current study there was no trend observed.
Although this disagreement is certainly cause for concern, the credibility of these previous studies is
subject to some scrutiny.
A very important feature of Ruth & Giambra’s (1974) study is that participants in the high
motivation group were instructed to make their responses aloud to an observer, while those in the
low motivation group only thought about their responses. This difference may have had a large
impact on the observed effect, considering that the act of verbalization could have prompted the
occurrence of saccades, just as it is associated with blinking (Stern et al., 1984). However, their
technique should also be noted as a clever manipulation of participants’ motivation because
participants whose performance is observed will be much more motivated than those who are not.
Such an approach might be considered in future research.
Klinger et al.’s (1973) results are unfortunately very difficult to interpret due to their unusual choice
of summary measure and eye movement amplitude threshold. It is also notable that there was a
much broader range of math problem difficulties presented by Klinger et al. than the current study.
Whereas Klinger et al. appear to have presented participants with two very different math tasks, the
two difficulty versions in the current study (counting backwards by 1’s and 2’s versus 8’s and 9’s)
were chosen because they are expected to involve very similar cognitive processes and strategies.
However, due to the authors’ brevity in describing the moderate difficulty tasks, it is not possible to
speculate on their specific differences with the single digit addition task.
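One possible reason a "seconds containing an eye movement" count is hard to compare with a rate measure can be sketched as follows. This is only an illustration under stated assumptions: the binning into whole seconds from trial onset, the function names, and the event times are all hypothetical and are not taken from Klinger et al. (1973); the amplitude threshold is ignored entirely.

```python
def seconds_with_movement(event_times_s, trial_length_s=30):
    """Count the one-second bins of a trial that contain at least one eye movement.

    Assumption: bins are whole seconds counted from trial onset.
    """
    occupied_bins = {int(t) for t in event_times_s if 0 <= t < trial_length_s}
    return len(occupied_bins)


def movements_per_minute(event_times_s, trial_length_s=30):
    """Conventional rate measure: eye movements per minute over the same trial."""
    n_events = sum(1 for t in event_times_s if 0 <= t < trial_length_s)
    return n_events * 60.0 / trial_length_s


# Hypothetical event times (seconds from trial onset), made up for illustration.
clustered = [0.1, 0.2, 0.3, 5.1, 5.5, 5.9]  # six movements in two bursts
sparse = [0.1, 5.1]                          # two isolated movements

print(seconds_with_movement(clustered), movements_per_minute(clustered))
print(seconds_with_movement(sparse), movements_per_minute(sparse))
```

In this made-up case both trials occupy the same number of one-second bins even though their movement rates differ threefold, so the bin count cannot be converted into a frequency without knowing how the movements were clustered.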
5.3.5 Eye Movements and Mental Workload
As previously discussed, mental effort has been inferred from the occurrence of saccadic eye
movements in operators engaging in a secondary, non-visual task, though the relationship between
non-visual task workload and eye movements is poorly characterized. The results of the current
study confirm suspicions raised by very early research, indicating that a variety of eye movement
responses can be expected, depending on task characteristics. As has been the approach of previous
studies of non-visual tasks, these various responses can be conceptualized in a model of eye
movement control and mental workload, such as the arousal/interference avoidance model
introduced in the Literature Review section.
It is beyond the scope of the current study to prove or disprove any theorized links between effort
and eye movements. However, the general approach of such theories may be problematic because they tend to
imply a “hard-wired” link between eye movements and effort, when such an implication may be
unnecessary and misleading. In looking at the results of the current study alone, without the
influence of any preconceived model or previous results (of which there are precious few), an
alternate, more pragmatic perspective is equally viable.
Earlier in this discussion, it was suggested that the large number of participants reporting the use of
a conscious fixation strategy may be an indication of bias because it is the goal of the study to record
“natural” eye movements rather than affected ones. However, another perspective is that the
commonality of this strategy is actually the root cause of any observed link between eye movements
and workload. That is, people consciously or habitually stare when they are listening, and this
behaviour is more prominent when they either have difficulties with hearing a stimulus or when they
are more motivated to hear better. This explanation is not to say that the behaviour is not linked to
actual or perceived interference avoidance, but regarding it as a learned, rather than an intrinsic
response is of practical consequence. For example, with regards to the use of eye movements as a
clinical mental effort tool, this distinction is important because it means that it could never be a
sensitive measure in people that do not exhibit this behavioural pattern. If this is the case, then an
eye movement based measure of effort could join the ranks of many other physiological measures,
which may reliably demonstrate an effect over a sample group, but not necessarily in each individual.
5.4 Conclusions
The current study was successful in reinforcing the results of previous work in which eye movements
were linked to audio-perceptual task workload. In particular, average saccade rate and the
occurrence of long ISI (> 1500 ms) were correlated with task difficulty, and there was a non-
significant relationship suggested between long ISI and participant motivation level. The effect of
motivation appeared to be stronger in the more difficult version of the task, indicating an interaction
effect between difficulty and motivation.
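The two eye movement measures named above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: it assumes ISI is taken onset-to-onset between successive saccades, that "long ISI" is summarized as the proportion of intervals (by count) exceeding the 1500 ms threshold, and that saccade onset times have already been detected. The function names and sample values are hypothetical.

```python
LONG_ISI_MS = 1500.0  # "long ISI" threshold quoted in the text


def intersaccadic_intervals(saccade_onsets_ms):
    """Return the intervals between successive saccade onsets, in milliseconds.

    Assumption: ISI is measured onset-to-onset.
    """
    onsets = sorted(saccade_onsets_ms)
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def summarize_trial(saccade_onsets_ms, trial_duration_ms, long_isi_ms=LONG_ISI_MS):
    """Compute average saccade rate and the proportion of long ISIs for one trial."""
    isis = intersaccadic_intervals(saccade_onsets_ms)
    n_long = sum(1 for isi in isis if isi > long_isi_ms)
    return {
        "saccade_rate_per_s": len(saccade_onsets_ms) / (trial_duration_ms / 1000.0),
        "proportion_long_isi": n_long / len(isis) if isis else 0.0,
    }


# Hypothetical saccade onsets within a 10 s trial, made up for illustration.
summary = summarize_trial([0, 500, 2500, 3000, 6000], trial_duration_ms=10_000)
print(summary)
```

A trial with fewer saccades and longer pauses between them yields a lower rate and a higher proportion of long ISIs, which is the direction of change associated here with greater auditory task difficulty.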
Eye movement measures were not shown to correlate with effort manipulations during math and
verbal fluency tasks, which is at odds with previously documented findings (Antrobus, 1973; Klinger
et al., 1973; Ruth & Giambra, 1974). This disagreement casts some doubt on the success of the
motivation manipulation in changing participants’ effort levels, at least during the fluency task, as
Ruth and Giambra demonstrated a very strong motivation effect.
For all task types, a high degree of variance, both within- and between-participants, was observed.
However, this variability cannot necessarily be attributed to unreliability of the eye movement
measures themselves, as variation in individual participants’ responses to experimental
manipulations could not be ruled out through convergence of measures. In particular, post-
experiment questionnaire results suggested that the manipulation of motivation may have been
unsuccessful in affecting many participants’ effort levels.
At face value, the results of this preliminary study indicate that an ISI length- or average saccade-
based effort measurement tool would find only narrow application in neuropsychological testing
and rehabilitation treatment. Firstly, between-subjects variance in eye movement effects was very
high, so the universality of the relationship between eye movements and effort may be questionable.
Secondly, there was only one task in which the response appeared to be significant with respect to
this variance: the auditory task. Thirdly, it was found that accurate detection of saccades during non-
visual tasks through current video tracking methods is not a trivial endeavour, especially in people
that have smaller eyes and/or tend to squint.
However, these apparent roadblocks to clinical implementation are only overwhelming when the
needs of all clinical applications are considered together. Although eye movements may not hold
promise as a global measure of workload, one that is appropriate for all applications, there may well
be a niche for it. For example, the findings of the current study offer promise for the use of eye
movements as a diagnostic tool for mild traumatic brain injury. In this capacity, even an effort
measure that is effective for a single domain, such as auditory perception, would provide valuable
information on a client’s cognitive functioning.
Chapter 6. Limitations
The limitations of this study are expressed in terms of its ability to demonstrate the utility of eye
movements as a clinical measure of mental effort.
Use of Non-Clinical Population
It is likely that special considerations and findings may be associated with certain clinical
populations. Therefore, these populations will require specific investigation before the tool could be
employed with them.
Experimental Manipulations of Effort
Although the experimental manipulations of effort through task difficulty and motivation were
carefully developed through the use of pre- and pilot study observations, it is clear that they can be
further improved.
Only Two Levels per Experimental Manipulation
In order to conserve the total experiment duration, only two levels of each experimental
manipulation were made. The demonstration of an effect through three or more levels would not
only be a more convincing demonstration of the phenomenon, but the characteristics of the
response may speak to the resolution of the tool and the presence of any ceiling/floor effects.
Limited Effort Range
This limitation is particularly relevant to the math and fluency tasks, where the manipulation of
effort through difficulty was less strongly perceived by participants than with the auditory task. It
has been argued that the subtlety of difficulty manipulations in these tasks may have been
responsible for the lack of any observed effect, a finding that does not agree with the (limited)
previous literature.
Ecological Validity
Whereas the experiment was modelled after neuropsychological testing, there were also a variety of
departures that were deemed necessary for the sake of isolating particular phenomena. Once these
fundamental relationships have been demonstrated, a more ecologically valid experimental method
should be adopted.
Limited Number of Task Types
The tasks were chosen in the current study because they had been previously associated with very
different eye movement responses to effort. The variety of responses observed in the current study
suggests that any generalization of these results must be approached with caution. Therefore, testing
with a wider variety of task characteristics is recommended.
Inadequate Assessment of Underlying Mechanisms
This study was designed to demonstrate a relationship between eye movements and effort in the
context of the intended applications, rather than explain it. Although a full understanding of such a
complex phenomenon may not be possible, the clinical use of an eye movement based tool would
require identification of at least its primary factors. Otherwise, it would be very difficult to anticipate
confounding factors.
Small Sample Size
Although any viable clinical effort measurement tool should demonstrate a reliable effect regardless
of sample size, the subtlety of any effort manipulation dictates that even the best planned experiment
will not have an equivalent effect across all participants. A general divergence of measures
indicates that this may have been the case in the current study. Thus, these
experiments can exhibit very high variance, even if the measurement itself is sound in principle.
Repeated Measures Design
Although a repeated measures design is advantageous from an experimental point of view, in many
circumstances involving the clinical applications of interest it would be necessary to gauge variations
in effort across subjects. For example, during neuropsychological testing it may be useful to gauge a
client’s effort against some standard. Though indicative, a within-subjects effect does not
immediately confirm a reliable between-subjects effect.
Chapter 7. Extensions
Clinical Applications
It is important to reinforce one of the conclusions of the Literature Review: that researchers have
generally turned from seeking a global measure of workload to seeking measures that are application
specific. Therefore, in further evaluations of eye movements as a clinical effort measure, the first step
should be to determine which neuropsychological and rehabilitation medicine applications are likely
to be compatible with the method. Although the results of the current study recommend tasks
involving auditory perception, it is necessary that future research confirms and builds upon these
findings. In particular, a logical extension would be the exploration of other perceptual domains,
such as touch and smell.
After this preliminary work has been completed, there will still be a great deal of research necessary
to characterize an eye-based effort measurement tool’s capabilities. Inter- and intra-participant
reliability, sensitivity, ceiling/floor effects, and resolution are all basic measurement tool
characteristics that will need to be assessed. In addition, it will be necessary to investigate whether
certain patient populations are unsuited to the tool, whether because tracking their eyes poses a
technical issue, or because they do not exhibit the same eye movement responses that healthy people
do.
Additionally, the use of mental effort as a diagnostic tool requires research into the validity of the
endeavour itself. As the Literature Review suggests, there have been indications that effort
measurements would be an effective extension to conventional neuropsychological tests, especially
in the detection of very mild impairments. However, the consequences of false diagnoses call for a
very rigorous investigation of any new methodologies.
Eye Movements and General Workload Measurement
The results of the current study suggest that the most promising application of an eye movement
based effort measurement tool is in tasks involving audio perceptual workload. Therefore, it is
predicted that the most gainful research would expand on this particular finding. Considering that
the end goal of this research is to develop a tool that is sensitive to changes in individuals’ effort
rather than trends in a sample population, it would be prudent to adopt an approach that better
controls for individuals’ various responses to experimental manipulations. As it should be expected
that any experimental manipulation, no matter how cleverly posed, will not be successful in affecting
effort in each participant/trial, there is a need to 1) better gauge the success of experimental
manipulations, and 2) better account for any variation in their success.
The current study has demonstrated that subjective effort ratings are one tool for the former
objective. However, where the focus of the experiment is narrowed to audio perceptual workload, it
may be possible to seek other, more specialized measures that suit this particular application. For
example, pupil diameter has been shown to be effective in a previous study of driving and auditory
perception (Recarte & Nunes, 2003).
To the purpose of accounting for variations in experimental manipulation success: a secondary
measure may be used as a covariate in the analysis of any effects. A similar approach may be to set a
secondary measure threshold for the success of an individual trial, then either omit unsuccessful
trials or carry out a comparison of “unsuccessful” versus “successful” outcomes. Again, the purpose
of this technique would be to distinguish the reliability of the measure from that of the experiment.
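The thresholding variant of this idea can be sketched as follows. This is a hedged illustration only: it assumes the secondary measure is the change in subjective effort rating between the low and high conditions, that the outcome is the corresponding change in proportion of long ISI, and that a fixed cut-off defines a "successful" manipulation. The field names, the cut-off, and the data are hypothetical and do not come from the study.

```python
from statistics import mean


def split_by_manipulation_success(trials, min_rating_change):
    """Separate trials by whether the secondary measure met the success threshold."""
    successful = [t for t in trials if t["rating_change"] >= min_rating_change]
    unsuccessful = [t for t in trials if t["rating_change"] < min_rating_change]
    return successful, unsuccessful


def mean_outcome(trials, key="long_isi_change"):
    """Mean change in the eye movement outcome, or None if no trials qualify."""
    return mean(t[key] for t in trials) if trials else None


# Hypothetical per-participant changes (high minus low condition), made up for illustration.
trials = [
    {"rating_change": 3, "long_isi_change": 0.2},
    {"rating_change": 0, "long_isi_change": 0.0},
    {"rating_change": 2, "long_isi_change": 0.1},
]

successful, unsuccessful = split_by_manipulation_success(trials, min_rating_change=2)
print(mean_outcome(successful), mean_outcome(unsuccessful))
```

Comparing the two groups, rather than simply discarding the unsuccessful trials, keeps the question of whether the manipulation worked separate from the question of whether the eye movement measure tracks effort.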
Improvements to the Experimental Manipulations of Effort
To the same end, it is also necessary to take what has been learned from the current study and
improve the experimental manipulations themselves. The recommendations of incorporating more
levels of manipulation and, in some cases, a greater range of effort have already been outlined in the
Limitations section. Looking specifically to the motivation manipulation, which is a particularly
difficult experimental goal, there are a number of other suggestions.
First, the method of Ruth & Giambra (1974) has already been cited as an interesting approach. They
affected participants’ motivation levels not only through the wording of instructions but also in
having motivated participants give their responses verbally, thus introducing the pressure of “being
observed.” Although this method may have had the disadvantage of an unbalanced effect of
verbalization on eye movements, the general concept could also have been implemented with a non-
verbal response such as a conspicuous button press response. Alternately, participants may be asked
to verbalize in both groups, but only the motivation group is accompanied by an observer.
With respect to the monetary incentive technique, it was noted that some participants did not report
a strong change in motivation because the reward was neither guaranteed nor immediate. In light of
this observation, an alternate method is recommended, wherein participants are rewarded for each
correct response they give, and the money is promised immediately after the study. In the current
study, this approach was considered but not used because of the technical issues involved with
response counting during the experiment versus being transcribed afterwards. If the tallying of
responses is possible during future experiments, then this method would be highly recommended
where very high motivation levels are desired. Recall that where a repeated measures design is used, this
method would not preclude the issue of intrinsically high motivation during the pre-treatment phase
and in the control group. Simply asking participants to try very little has been shown to be effective
in some participants in the current study, but may pose a problem for others, especially during very
high difficulty tasks. It should also be noted that this motivation technique introduces an ethical
concern because participants are potentially being compensated different amounts for the same time
volunteered. This issue could be easily circumvented by “topping up” everyone to a pre-determined
level when the experiment has finished.
Considering the complexity of manipulating motivation, it is tempting to abandon it in future
investigations when it has been implied that the same variable (effort) is affected through a
change in task difficulty. However, it must be clearly recognized that the concept of effort as a
unifying construct for a number of possibly independent phenomena should be considered a matter
of convenience rather than fact. The most prudent recommendation is that the best experimental
manipulation most closely mirrors the intended application, rather than representing some general
effort condition. This practical point emphasizes the importance of carefully defining the application
in question.
Underlying Basis for Eye Movement – Audio Perceptual Workload Relationship
In the Literature Review, the approach of some previous literature was criticised for not considering
the mechanisms behind the various phenomena that were documented. In the current study, a
relationship between saccadic eye movement occurrence and audio perceptual workload has been
observed, and it is recommended that a better understanding of this relationship be
sought. In the Results and Discussion section of the full-scale study, a pragmatic explanation for the
observed effect was introduced. Briefly, it was suggested that the observation may simply reflect a
learned behavioural pattern wherein people tend to stare when they are straining to listen.
Importantly, this approach is at odds with theories of a “hard-wired” relationship between eye
movement inhibition and cognitive load. Not only is the behavioural explanation less general in
terms of predicting responses to various task types, it also implies that the effect could vary more
widely between people. Certainly, a wide variety of responses were observed in the current study,
both between task characteristics and individual participants. Based on the results of the current
study, future research should test this perspective.
Eye Movements and Aging
The relationship between eye movements and auditory perception could also stimulate an entirely
different line of research into perceptual deficits. If the effects observed in the current study are
taken to suggest an interference avoidance strategy during auditory perception, it would be
interesting to investigate any bearing that this effect may have on listeners’ performance. Put simply, is a
person’s ability to concentrate on auditory perception reflected in their eye movements? In
particular, the study of aging populations may provide insights into theories of inhibition and aging.
Hasher (2007) has suggested that a failure to eliminate task-irrelevant information, a process termed
deletion, is an important component of decline. Therefore, eye movements may be a meaningful
measure by which to gauge the failure of inhibitory mechanisms.
References
Amadeo, M., & Shagass, C. (1963). Eye-movements, attention, and hypnosis. Journal of
Nervous and Mental Disease, 136(2), 139-145.
Andreassi, J. L. (1973). Alpha and problem solving: A demonstration. Perceptual and Motor Skills,
46, 905-906.
Andreassi, J. L. (2000). Human behaviour and physiological response (4th ed.). Mahwah, NJ:
Lawrence Erlbaum Associates Inc.
Annett, J. (2002). Subjective rating scales: Science or art? Ergonomics, 45(14), 966.
Antrobus, J. S., Antrobus, J. S., & Singer, J. L. (1964). Eye movements accompanying daydreaming,
visual imagery, and thought suppression. Journal of Abnormal Psychology, 69, 244-252.
Antrobus, J. S. (1973). Eye movements and nonvisual cognitive tasks. In Zikmund, V. (Ed.), The
oculomotor system and brain functions: Proceedings of the international symposium held at
Smolenice 19-22 October, 1970. London: Butterworth.
Backs, R. W., & Seljos, K. A. (1994). Metabolic and cardiorespiratory measures of mental effort: The
effects of level of difficulty in a working memory task. International Journal of Psychophysiology,
16(1), 57-68.
Bagley, J., & Manelis, L. (1979). Effect of awareness on an indicator of cognitive load. Perceptual
and Motor Skills, 49(2), 591-594.
Bahill, A. T., Clark, M. R., & Stark, L. (1975). The main sequence, a tool for studying human eye
movements. Mathematical Biosciences, 24(3-4), 191-204.
Bailey, C. M., Echemendia, R. J., & Arnett, P. A. (2006). The impact of motivation on
neuropsychological performance in sports-related mild traumatic brain injury. Journal of the
International Neuropsychological Society, 12(4), 475-484.
Becker, W., & Fuchs, A. F. (1969). Further properties of the human saccadic system: Eye
movements and correction saccades with and without visual fixation points. Vision Research,
9(10), 1247-1258.
Beda, A., Jandre, F. C., Phillips, D. I. W., Giannella-Neto, A., & Simpson, D. M. (2007). Heart-rate
and blood-pressure variability during psychophysiological tasks involving speech: Influence of
respiration. Psychophysiology, 44(5), 767-778.
Bergstrom, K. J., & Hiscock, M. (1988). Factors influencing ocular motility during the performance of
cognitive tasks. Canadian Journal of Psychology, 42(1), 1-23.
Berguer, R., Smith, W. D., & Chung, Y. H. (2001). Performing laparoscopic surgery is significantly
more stressful for the surgeon than open surgery. Surgical Endoscopy, 15(10), 1204.
Berka, C., Levendowski, D. J., Lumicao, M. N., Yau, A., Davis, G., Zivkovic, V. T., Olmstead, R.E.,
Tremoulet, P.D., & Craven, P.L. (2007). EEG correlates of task engagement and mental workload
in vigilance, learning, and memory tasks. Aviation Space and Environmental Medicine, 78(5),
B231-B244.
Bianchini, K. J., Mathias, C. W., & Greve, K. W. (2001). Symptom validity testing: A critical review.
Clinical Neuropsychologist, 15(1), 19-45.
Blanchard, H. E. (1985). A comparison of some processing time measures based on eye
movements. Acta Psychologica, 58(1), 1-15.
Borkowski, J. G., Benton, A. L., & Spreen, O. (1967). Word fluency and brain damage.
Neuropsychologia, 5(2), 135-140.
Brookings, J. B., & Damos, D. L. (1991). Individual differences in multiple-task performance. In D. L.
Damos (Ed.), Multiple-task performance (pp. 363-386). London: Taylor & Francis.
Brookings, J. B., Wilson, G. F., & Swain, C. R. (1996). Psychophysiological responses to changes in
workload during simulated air traffic control. Biol Psychol, 42(3), 361-77.
Cain, B. (2007). Review of the mental workload literature. Report #RTO-TR-HFM-121-Part-II.
Defense Research and Development Canada, Toronto.
Callan, D. J. (1998). Eye movement relationships to excessive performance error in aviation.
Proceedings of the Human Factors and Ergonomics Society, 2, 1132-1136.
Cardall, A. J. (1943). Purdue pegboard. Oxford, England: Science Research Associates.
Carpenter, P. A., & Just, M. A. (1978). Eye fixation during mental rotation. In J. W. Senders, D. F.
Fisher & R. A. Monty (Eds.), Eye movements and the higher psychological functions (pp. 115-
133). L. Erlbaum Associates.
Carpenter, R. H. S. (1988). Movements of the eyes. (2nd ed.). London: Pion.
Carpenter, R. H. S. (1991). Eye movements. Boca Raton: CRC Press.
Chapman, P. R., & Underwood, G. (1998). Visual search of dynamic scenes: Event types and the
role of experience in viewing driving situations. In G. Underwood (Ed.), Eye guidance in reading
and scene perception (pp. 369-393). Amsterdam: Elsevier.
Cronbach, L. J., & Furby, L. (1970). How we should measure "change" - or should we?
Psychological Bulletin, 74(1), 68-80.
Cohen, A. (1977). Is the duration of an eye fixation a sufficient criterion referring to information input.
Perceptual and Motor Skills, 45, 766.
de Waard, D. (1996). The measurement of drivers' mental workload. Traffic Research Centre,
University of Groningen.
Doctor, R. F., Kaswan, J. W., & Nakamura, C. Y. (1964). Spontaneous heart rate and GSR changes
as related to motor performance. Psychophysiology, 1(1), 73-78.
Duchowski, A. T. (2007). In ebrary Inc. (Ed.), Eye tracking methodology: Theory and practice (2nd
ed.). London: Springer.
Egeland, J., Sundet, K., Rund, B. R., Asbjornsen, A., Hugdahl, K., Landro, N. I., Lund, A., Roness,
A., & Stordal, K.I. (2003). Sensitivity and specificity of memory dysfunction in schizophrenia: A
comparison with major depression. Journal of Clinical and Experimental Neuropsychology, 25(1),
79-93.
Eggemeier, F. T., Crabtree, M. S., Zingg, J. J., Reid, G. B., & Shingledecker, C. A. (1982).
Subjective workload assessment in a memory update task. Proceedings of the 26th Human
Factors Society Annual Meeting.
Eggemeier, F. T., & Wilson, G. F. (1991). Performance-based and subjective assessment of
workload in multi-task environments. In D. L. Damos (Ed.), Multiple-task performance. (pp. 217-
278). London: Taylor & Francis.
Eggemeier, F. T., Wilson, G. F., Kramer, A. F., & Damos, D. L. (1991). Workload assessment in
multi-task environments. In D. L. Damos (Ed.), Multiple-task performance (pp. 207-216). London:
Taylor & Francis.
Ehrlichman, H., & Barrett, J. (1983). ‘Random’ saccadic eye movements during verbal-linguistic and
visual-imaginal tasks. Acta Psychologica, 53(1), 9-26.
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the
acquisition of expert performance. Psychological Review, 100(3), 363-406.
Falkmer, T., & Gregersen, N. P. (1999). System for driver training and assessment using interactive
evaluation tools and reliable methodologies. Report #GRD1-1999-10024. Transport Resource
Knowledge Centre.
Farmer, E., & Brownson, A. (2003). Review of workload measurement, analysis and interpretation
methods. Report #CARE-Integra-TRS-130-02-WP2. Brussels: European Organization for the
Safety of Air Navigation (Eurocontrol).
Field, A. P. (2005). Discovering statistics using SPSS (2nd ed.). London: SAGE.
Filin, V. A. (2002). Saccade automaticity and pursuing eye movement. Twenty-Fifth European
Conference on Visual Perception, Glasgow, Scotland, 31(Supplement).
Findlay, J. M., & Gilchrist, I. D. (1998). Eye guidance and visual search. In G. Underwood (Ed.), Eye
guidance in reading and scene perception (pp. 295-312). Oxford, UK: Elsevier Science Ltd.
Findlay, J. M., & Kapoula, Z. (1992). Scrutinization, spatial attention, and the spatial programming of
saccadic eye movements. Quarterly Journal of Experimental Psychology A, 45(4), 633-47.
Fischer, B., & Breitmeyer, B. (1987). Mechanisms of visual attention revealed by saccadic eye
movements. Neuropsychologia, 25(1A), 73-83.
Fischer, B., Gezeck, S., & Hartnegg, K. (1997). The analysis of saccadic eye movements from gap
and overlap paradigms. Brain Research Protocols, 2(1), 47-52.
Fitts, P. M., Jones, R. E., & Milton, J. L. (1950). Eye movements of aircraft pilots during
instrument-landing approaches. Aeronautical Engineering Review, 9(2), 24-29.
Fowles, D. C. (1988). Psychophysiology and psychopathology: A motivational approach.
Psychophysiology, 25(4), 373-391.
Fuchs, A. F., Kaneko, C. R., & Scudder, C. A. (1985). Brainstem control of saccadic eye movements.
Annual Review of Neuroscience, 8, 307-337.
Garbutt, S., Harwood, M. R., & Harris, C. M. (2001). Comparison of the main sequence of reflexive
saccades and the quick phases of optokinetic nystagmus. The British Journal of Ophthalmology,
85(12), 1477-1483.
Gauggel, S., & Billino, J. (2002). The effects of goal setting on the arithmetic performance of brain-
damaged patients. Archives of Clinical Neuropsychology, 17(3), 283-294.
Gauggel, S., & Fischer, S. (2001). The effect of goal setting on motor performance and motor
learning in brain-damaged patients. Neuropsychological Rehabilitation, 11(1), 33-44.
Gauggel, S., Hoop, M., & Werner, K. (2002). Assigned versus self-set goals and their impact on the
performance of brain-damaged patients. Journal of Clinical and Experimental Neuropsychology,
24(8), 1070-1080.
Gauggel, S., Wietasch, A., Bayer, C., & Rolko, C. (2000). The impact of positive and negative
feedback on reaction time in brain-damaged patients. Neuropsychology, 14(1), 125-133.
Gendolla, G. H. E., & Richter, M. (2005). Ego involvement and effort: Cardiovascular, electrodermal,
and performance effects. Psychophysiology, 42(5), 595-603.
Giolma, J. P., & Lyne, J. E. (1984). Identification and characterization of rapid eye movements by
computer. Midwest Symposium on Circuits & Systems, St. Louis, Missouri.
Goonetilleke, R. S., & Luximon, A. (2001). Simplified subjective workload assessment technique.
Ergonomics, 44(3), 229.
Gopher, D., & Braune, R. (1984). On the psychophysics of workload: Why bother with subjective
measures? Human Factors, 26(5), 519-532.
Gopher, D., & Donchin, E. (1986). Workload: An examination of the concept. In K. R. Boff, & L.
Kaufman (Eds.), Handbook of perception and human performance. Oxford, England: John Wiley
& Sons.
Gopher, D. (1973). Eye-movement patterns in selective listening tasks of focused attention.
Perception & Psychophysics, 14(2), 259-264.
Gorissen, M., Sanz, J. C., & Schmand, B. (2005). Effort and cognition in schizophrenia patients.
Schizophrenia Research, 78(2-3), 199-208.
Gould, J. D. (1973). Eye movements during visual search and memory search. J Exp Psychol, 98(1),
184-95.
Green, R. E., Melo, B., Christensen, B., Ngo, L., & Skene, C. (2006). Evidence of transient
enhancement to cognitive functioning in healthy young adults through environmental enrichment:
Implications for rehabilitation after brain injury. Brain and Cognition, 60(2), 201-203.
Greene, H. H., & Rayner, K. (2001). Eye movements and familiarity effects in visual search. Vision
Research, 41(27), 3763.
Harbluk, J. L., & Noy, Y. I. (2002). The impact of cognitive distraction on driver visual behaviour and
vehicle control. Report #1388E. Transport Canada.
Hasher, L., Lustig, C., & Zacks, R. T. (2007). Inhibitory mechanisms and the control of attention. In A. R. Conway, C. Jarrold, M. J. Kane, A. Miyake, & J. N. Towse (Eds.), Variation in working memory
(pp. 227-249). New York, NY: Oxford University Press.
Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of
Experimental Psychology: General, 108(3), 356-388.
Hedeker, D. R. (2006). In Gibbons R. D. (Ed.), Longitudinal data analysis. Hoboken, NJ: Wiley-
Interscience.
Hendy, K. C., Hamilton, K. M., & Landry, L. N. (1993). Measuring subjective workload: When is one
scale better than many? Human Factors, 35(4), 579-602.
Hill, S. G., Iavecchia, H. P., Byers, J. C., Bittner, A. C., Zaklad, A. L., & Christ, R. E. (1992).
Comparison of four subjective workload rating scales. Human Factors, 34(4), 429-439.
Hilburn, B. G. (1997). Free flight and air traffic controller mental workload. Ninth International
Symposium on Aviation Psychology. Columbus, Ohio.
Hinton, J. W. (1982). Ocular responses to meaningful visual stimuli and their psychological
significance. In R. Groner, & P. Fraisse (Eds.) (pp. 204-212). Amsterdam: North Holland.
Holland, M. K., & Tarlow, G. (1972). Blinking and mental load. Psychological Reports, 31(1), 119-
127.
Holland, M. K., & Tarlow, G. (1975). Blinking and thinking. Perceptual and Motor Skills, 41(2), 403-
406.
Hood, J. D. (1975). Observations upon role of peripheral retina in execution of eye-movements.
ORL-Journal for OTO-Rhino-Laryngology and its Related Specialties, 37(2), 65-73.
Hooge, I. T. C., & Erkelens, C. J. (1998). Adjustment of fixation duration in visual search. Vision
Research, 38(9), 1295-1302.
Johansson, R. S., Westling, G., & Backstrom, A. (2001). Eye-hand coordination in object
manipulation. Journal of Neuroscience, 21(17), 6917-6932.
Jorna, P. G. A. M. (1992). Spectral analysis of heart rate and psychological state: A review of its
validity as a workload index. Biological Psychology, 34, 237-257.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.
Kennedy, D. O., & Scholey, A. B. (2000). Glucose administration, heart rate and cognitive
performance: Effects of increasing mental effort. Psychopharmacology, 149(1), 0063.
Kessels, R. P. C., Ruis, C., & Kappelle, L. J. (2007). The impact of self-reported depressive
symptoms on memory function in neurological outpatients. Clinical Neurology and Neurosurgery, 109(4), 323-326.
Kettunen, J., & Ravaja, N. (2000). A comparison of different time series techniques to analyze
phasic coupling: A case study of cardiac and electrodermal activity. Psychophysiology, 37(4), 395.
Klinger, E., Gregoire, K. C., & Barta, S. G. (1973). Physiological correlates of mental activity: Eye
movements, alpha, and heart rate during imagining, suppression, concentration, search, and
choice. Psychophysiology, 10(5), 471-477.
Klingner, J., Kumar, R., & Hanrahan, P. (2008). Measuring the task-evoked pupillary response with a
remote eye tracker. Proceedings of the 2008 Symposium on Eye Tracking Research &
Applications. New York, NY: ACM.
Kramer, A. F. (1991). Physiological metrics of mental workload: A review of recent progress. In D. L. Damos (Ed.), Multiple-task performance (pp. 279-328). London: Taylor & Francis.
Kwakkel, G. (2006). Impact of intensity of practice after stroke: Issues for consideration. Disability
and Rehabilitation, 28(13-14), 823-830.
Lawrence, S., Robert, K., Susan, S., Derek, H., & Bruce, B. (1976). Saccadic suppression of image
displacement. Vision Research, 16(10), 1185-1187.
Layne, C. (1980). Motivational deficit in depression: People's expectations × outcomes' impacts.
Journal of Clinical Psychology, 36(3), 647-652.
Leigh, R. J., & Kennard, C. (2004). Using saccades as a research tool in the clinical neurosciences.
Brain: A Journal of Neurology, 127(3), 460-477.
Leigh, R. J. (1999). In Zee D. S. (Ed.), The neurology of eye movements (3rd ed.). New York:
Oxford University Press.
Lorens, S. A., & Darrow, C. W. (1962). Eye movements, EEG, GSR, and EKG during mental
multiplication. Electroencephalography and Clinical Neurophysiology, 14, 739-746.
Lynch, W. J. (2004). Determination of effort level, exaggeration, and malingering in neurocognitive
assessment. The Journal of Head Trauma Rehabilitation, 19(3), 277-283.
Malmo, R. B. (1965). Finger-sweat prints in the differentiation of high and low incentive.
Psychophysiology, 1(3), 231-240.
Matessa, M., & Remington, R. (2005). Eye movements in human performance modeling of space
shuttle operations. 49th Annual Meeting of the Human Factors and Ergonomics Society. Human
Factors and Ergonomics Society Inc.
Mitchell, D. B., & Hunt, R. R. (1989). How much "effort" should be devoted to memory? Memory &
Cognition, 17(3), 337-348.
Moffitt, K. (1980). Evaluation of the fixation duration in visual search. Perception and Psychophysics,
27(4), 370-372.
Moray, N. (1986). Monitoring behavior and supervisory control. In K. R. Boff, L. Kaufman, & J. P.
Thomas (Eds.), Handbook of perception and human performance. Wiley-Interscience.
Moray, N. (1988). Mental workload since 1979. In D. J. Oborne (Ed.), International review of
ergonomics (pp. 123-150). London: Taylor & Francis.
Mousseau, M. B. (2004). The onset and effect of cognitive fatigue on simulated sport performance.
Kinesiology Abstracts, 17(2), 33-34.
Mulder, G. (1979). Sinus arrhythmia and mental workload. In N. Moray (Ed.), Mental workload: Its
theory and measurement (pp. 327-343). New York: Plenum Press.
Mulder, G. (1986). The concept and measurement of mental effort. In G. M. Hockey, A. W. K.
Gaillard, & M. G. H. Coles (Eds.), Energetics and human information processing (pp. 175-198).
Dordrecht: Martinus Nijhoff.
Mulder, L. J. M. (1992). Measurement and analysis methods of heart rate and respiration for use in
applied environments. Biological Psychology, 34(2-3), 205-236.
Myung, R., & Ryu, K. (2005). Evaluation of mental workload with a combined measure based on
physiological indices during a dual task of tracking and mental arithmetic. International Journal of
Industrial Ergonomics, 35(11), 991-1009.
Naccache, L., Dehaene, S., Cohen, L., Habert, M. O., Guichart-Gomez, E., Galanaud, D., & Willer,
J. C. (2005). Effortless control: Executive attention and conscious feeling of mental effort are dissociable. Neuropsychologia, 43(9), 1318-1328.
Navon, D. (1984). Resources--a theoretical soup stone? Psychological Review, 91, 216-234.
Neumann, D., & Lipp, O. (2002). Spontaneous and reflexive eye activity measures of mental
workload. Australian Journal of Psychology, 54(3), 174.
Noel, J. B., Bauer, K. W., & Lanning, J. W. (2005). Improving pilot mental workload classification
through feature exploitation and combination: A feasibility study. Computers & Operations
Research, 32(10), 2713-2730.
Norman, D. A., & Bobrow, D. G. (1975). Data-limited and resource-limited processes. Cognitive
Psychology, 7(1), 44-64.
Oddy, M., Cattran, C., & Wood, R. (2008). The development of a measure of motivational changes
following acquired brain injury. Journal of Clinical and Experimental Neuropsychology, 30(5), 568-
575.
O'Donnell, R., & Eggemeier, F. T. (1986). Workload assessment methodology. In K. R. Boff, L.
Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance. Wiley-
Interscience.
Ohira, H. (1996). Eyeblink activity in a word-naming task as a function of semantic priming and
cognitive load. Perceptual and Motor Skills, 82(3), 835.
Oohira, A., Zee, D. S., & Guyton, D. L. (1991). Disconjugate adaptation to long-standing, large-
amplitude, spectacle-corrected anisometropia. Investigative Ophthalmology & Visual Science,
32(5), 1693-1703.
Paas, F. G. W. C., Van Merrienboer, J. J. G., & Adam, J. J. (1994). Measurement of cognitive load in
instructional research. Perceptual and Motor Skills, 79(1), 419.
Papadelis, C., Kourtidou-Papadeli, C., Bamidis, P., & Albani, M. (2007). Effects of imagery training
on cognitive performance and use of physiological measures as an assessment tool of mental
effort. Brain and Cognition, 64(1), 74-85.
Pashler, H. E. (1998). The psychology of attention. Cambridge, MA: MIT Press.
Podsakoff, P. M., & Farh, J. L. (1989). Effects of feedback sign and credibility on goal setting and
task performance. Organizational Behavior & Human Decision Processes, 44(1), 45-68.
Polley, D. B., Steinberg, E. E., & Merzenich, M. M. (2006). Perceptual learning directs auditory
cortical map reorganization through top-down influences. Journal of Neuroscience, 26(18), 4970-
4982.
Porges, S. W., & Byrne, E. A. (1992). Research methods for measurement of heart-rate and
respiration. Biological Psychology, 34(2-3), 93-130.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32(1), 3-
25.
Pribram, K. H., & McGuiness, D. (1975). Arousal, activation, and effort in control of attention.
Psychological Review, 82(2), 116-149.
Prokasy, W. F., & Raskin, D. C. (Eds.). (1973). Electrodermal activity in physiological research. New
York: Academic Press.
Rahimi, M., Briggs, R. P., & Thom, D. R. (1990). A field evaluation of driver eye and head movement
strategies toward environmental targets and distracters. Applied Ergonomics, 21(4), 267-274.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372-422.
Recarte, M. A., & Nunes, L. M. (2000). Effects of verbal and spatial-imagery tasks on eye fixations
while driving. Journal of Experimental Psychology: Applied, 6(1), 31-43.
Recarte, M. A., & Nunes, L. M. (2003). Mental workload while driving: Effects on visual search,
discrimination, and decision making. Journal of Experimental Psychology: Applied, 9(2), 119-137.
Rees, L. M., Tombaugh, T. N., Gansler, D. A., & Moczynski, N. P. (1998). Five validation
experiments of the test of memory malingering (TOMM). Psychological Assessment, 10(1), 10-20.
Richards, P. M., & Ruff, R. M. (1989). Motivational effects on neuropsychological functioning:
Comparison of depressed versus nondepressed individuals. Journal of Consulting and Clinical
Psychology, 57(3), 396-403.
Riese, H. (1999). Mental fatigue after very severe closed head injury: Sustained performance,
mental effort, and distress at two levels of workload in a driving simulator. Neuropsychological
Rehabilitation, 9(2), 189.
Robert, G., & Hockey, J. (1997). Compensatory control in the regulation of human performance
under stress and high workload: A cognitive-energetical framework. Biological Psychology, 45(1-
3), 73-93.
Roscoe, A. H. (1992). Assessing pilot workload - why measure heart-rate, HRV and respiration.
Biological Psychology, 34(2-3), 259-287.
Rubio, S., Díaz, E., & Martín, J. (2004). Evaluation of subjective mental workload: A comparison of
SWAT, NASA-TLX, and workload profile methods. Applied Psychology: An International Review,
53(1), 61-86.
Ruff, R. M. (1985). San Diego neuropsychological test battery (manual). San Diego: San Diego
University.
Ruth, J. S., & Giambra, L. M. (1974). Eye movements as a function of attention and rate of change in
thought content. Perceptual and Motor Skills, 39, 475-480.
Salthouse, T. A. (2006). Mental exercise and mental aging: Evaluating the validity of the "use it or
lose it" hypothesis. Perspectives on Psychological Science, 1(1), 68-87.
Sanders, A. F. (1997). A summary of resource theories from a behavioral perspective. Biological
Psychology, 45(1-3), 5-18.
Sanders, A. F. (1983). Towards a model of stress and human performance. Acta Psychologica,
53(1), 61-97.
Schagen, S., Schmand, B., deSterke, S., & Lindeboom, J. (1997). Amsterdam short-term memory
test: A new procedure for the detection of feigned memory deficits. Journal of Clinical and
Experimental Neuropsychology, 19(1), 43-51.
Schooler, C. (1987). Cognitive effects of complex environments during the life-span: A review and
theory. In C. Schooler, & S. K. Warner (Eds.), Cognitive functioning and social structure over the
life course (pp. 29-49). Ablex.
Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye movements and
spatial attention. The Quarterly Journal of Experimental Psychology, 38(3), 475-491.
Sigman, M., & Coles, P. (1980). Visual scanning during pattern recognition in children and adults.
Journal of Experimental Psychology, 30, 267-276.
Singer, J. L., & Antrobus, J. S. (1965). Eye movements during fantasies. Archives of General
Psychiatry, 12, 71-76.
Singer, J. L., Greenberg, S., & Antrobus, J. S. (1971). Looking with the mind's eye. Transactions of
the New York Academy of Science, 33(2), 694-709.
Sirevaag, E. J., & Stern, J. A. (1999). Ocular measures of fatigue and cognitive factors. In R. W.
Backs, & W. Boucsein (Eds.), Engineering psychophysiology: Issues and applications (pp. 267-
287). Mahwah, NJ.: Lawrence Erlbaum Associates, Inc.
Smith, R. M., & Hong, S. K. (1977). Heart rate response to breath holding at 18.6 ATA. Respiration
Physiology, 30(1-2), 69-79.
Stelmach, L. B., Campsall, J. M., & Herdman, C. M. (1997). Attentional and ocular movements.
Journal of Experimental Psychology: Human Perception and Performance, 23(3), 823-844.
Stern, J. A., Boyer, D., Schroeder, D., Touchstone, M., & Stoliarov, N. (1994). Blinks, saccades, and
fixation pauses during vigilance task performance. 1: Time on task. Report # DOTFAAAM9426.
St. Louis, MO. Dept. of Civil Engineering; United States: Washington Univ.
Stern, J. A., Boyer, D. J., Schroeder, D. J., Touchstone, R. M., & Stoliarov, N. (1996). Blinks,
saccades, and fixation pauses during vigilance task performance: 2: Gender and time of day.
Report # DOTFAAAM969. St. Louis, MO. Dept. of Psychology; United States: Washington Univ.
Stern, J. A., Walrath, L. C., & Goldstein, R. (1984). The endogenous eyeblink. Psychophysiology,
21(1), 22-33.
Stern, R. M. (2001). In Ray W. J., & Quigley K. S. (Eds.), Psychophysiological recording (2nd ed.).
New York: Oxford University Press.
Sternberg, S. (1969). Memory-scanning: Mental processes revealed by reaction-time experiments.
American Scientist, 57, 421-457.
Storm, H., Fremming, A., Ødegaard, S., Martinsen, Ø G., & Mørkrid, L. (2000). The development of
a software program for analyzing spontaneous and externally elicited skin conductance changes
in infants and adults. Clinical Neurophysiology, 111(10), 1889-1898.
Surwillo, W. W., & Quilter, R. E. (1965). The relation of frequency of spontaneous skin potential
responses to vigilance and to age. Psychophysiology, 1, 272-276.
Tole, J. R., Stephens, A. T., Harris, R. L., Sr., & Ephrath, A. R. (1982). Visual scanning behavior and
mental work load in aircraft pilots. Aviation, Space, and Environmental Medicine, 53(1), 54-61.
Tsai, Y., Viirre, E., Strychacz, C., Chase, B., & Jung, T. (2007). Task performance and eye activity:
Predicting behavior relating to cognitive workload. Aviation, Space, and Environmental Medicine,
78(5), B176-185.
Tsang, P. S., & Vidulich, M. A. (1987). Absolute magnitude estimation and relative judgment
approaches to subjective workload assessment. Proceedings of the Human Factors Society 31st
Annual Meeting.
Tsang, P. S., & Vidulich, M. A. (2006). Mental workload and situation awareness. In G. Salvendy
(Ed.), Handbook of human factors and ergonomics (3rd ed.). New York: Wiley.
Unema, P. J. A. (1995). Eye movements and mental effort. (Ph.D., Verlag Shaker).
Van der Stigchel, S., & Theeuwes, J. (2005). The influence of attending to multiple locations on eye
movements. Vision Research, 45(15), 1921-1927.
Van Orden, K. F., Jung, T. P., & Makeig, S. (2000). Combined eye activity measures accurately estimate changes in sustained visual task performance. Biological Psychology, 52(3), 221-240.
Van Orden, K. F., Limbert, W., Makeig, S., & Jung, T. P. (2001). Eye activity correlates of workload
during a visuospatial memory task. Human Factors, 43(1), 111-121.
Van Zandvoort, M. J. E., Kappelle, L. J., Algra, A., & De Haan, E. H. F. (1998). Decreased capacity
for mental effort after single supratentorial lacunar infarct may affect performance in everyday life.
Journal of Neurology, Neurosurgery, and Psychiatry, 65(5), 697-702.
Velichkovsky, B. M., Domhoefer, S. M., Pannasch, S., & Unema, P. J. A. (2000).
Visual fixations and level of attentional processing. Proceedings of the International Conference
Eye Tracking Research and Applications, Palm Beach Gardens, FL.
Veltman, J. A., & Gaillard, A. W. K. (1993). Measurement of pilot workload with subjective and
physiological techniques. Proceedings of Workload Assessment and Aviation Safety 3.1-3.13.
Veltman, J. A., & Gaillard, A. W. K. (1996). Physiological indices of workload in a simulated flight
task. Biological Psychology, 42(3), 323-342.
Veltman, J. A., & Gaillard, A. W. K. (1998). Physiological workload reactions to increasing levels of
task difficulty. Ergonomics, 41(5), 656.
Vickery, C. D., Berry, D. T. R., Inman, T. H., Harris, M. J., & Orey, S. A. (2001). Detection of
inadequate effort on neuropsychological testing: A meta-analytic review of selected procedures.
Archives of Clinical Neuropsychology, 16(1), 45-73.
Vidulich, M. A., & Wickens, C. D. (1986). Causes of dissociation between subjective workload
measures and performance - caveats for the use of subjective assessments. Applied Ergonomics,
17(4), 291-296.
Viirre, E., Cadera, W., & Vilis, T. (1987). The pattern of changes produced in the saccadic system
and vestibuloocular reflex by visually patching one eye. Journal of Neurophysiology, 57(1), 92-
103.
Vossel, G., & Zimmer, H. (1990). Psychometric properties of non-specific electrodermal response
frequency for a sample of male students. International Journal of Psychophysiology, 10(1), 69-73.
Walker, R., Walker, D. G., Husain, M., & Kennard, C. (2000). Control of voluntary and reflexive
saccades. Experimental Brain Research, 130(4), 540.
Washburn, D. A., & Putney, R. T. (2001). Attention and task difficulty: When is performance
facilitated? Learning and Motivation, 32, 36-47.
Wechsler, D. (1955). Wechsler adult intelligence scale. New York: Psychological Corp.
Weiner, J. M., & Ehrlichman, H. (1976). Ocular motility and cognitive process. Cognition, 4(1), 31-43.
Wickens, C. D. (1991). Processing resources and attention. In D. L. Damos (Ed.), Multiple task
performance. London: Taylor & Francis.
Wickens, C. D., & Hollands, J. G. (2000). Engineering psychology and human performance (3rd ed.).
Upper Saddle River, NJ: Prentice Hall.
Wickens, C. D. (1984). Processing resources in attention. In R. Parasuraman, & D. R. Davies (Eds.),
Varieties of attention (pp. 82-85). Orlando: Academic Press.
Wickens, C. D. (2002). Multiple resources and performance prediction. Theoretical Issues in
Ergonomics Science, 3(2), 159.
Wientjes, C. J. E. (1992). Respiration in psychophysiology: Methods and applications. Biological
Psychology, 34, 179-203.
Wierwille, W. W., & Eggemeier, F. T. (1993). Recommendations for mental workload measurement
in a test and evaluation environment. Human Factors, 35(2), 263(19)-282.
Wierwille, W. W., Rahimi, M., & Casali, J. G. (1985). Evaluation of 16 measures of mental workload
using a simulated flight task emphasizing mediational activity. Human Factors, 27(5), 489-502.
Wilson, G. F. (1991). Progress in the psychophysiological assessment of workload. Interim Report
#1. Armstrong Lab Wright-Patterson AFB, OH.
Wilson, G. F. (2002). An analysis of mental workload in pilots during flight using multiple psycho-
physiological measures. The International Journal of Aviation Psychology, 1, 3-18.
Wilson, G. F., & Eggemeier, F. T. (1991). Psychophysical assessment of workload in multi-task
environments. In D. L. Damos (Ed.), Multiple task performance (pp. 329-360). London: Taylor &
Francis.
Wu, C., Szymanski, C., & Cain, Z. (2007). Conjugated polymer dots for multiphoton fluorescence
imaging. Journal of the American Chemical Society, 129(43), 12904-12905.
Wyatt, H. J. (1998). Detecting saccades with jerk. Vision Research, 38(14), 2147-2153.
Xie, B., & Salvendy, G. (2000). Review and reappraisal of modeling and predicting mental workload in
single- and multi-task environments. Work & Stress, 14(1), 74.
Yarbus, A. L. (1973). Eye movements and vision. New York, NY: Plenum Press.
Zhang, Y., Owechko, Y., & Zhang, J. (2004). Driver cognitive workload estimation: A data-driven
perspective. Intelligent Transportation Systems Conference. Washington, D. C.
Zijlstra, F. R. H. (1993). Efficiency in work behavior: A design approach for modern tools. Delft,
Netherlands: Delft University of Technology.
Appendix A – Tower and Remote (Desktop) Eye Tracker Descriptions
Note: Adapted from SR Research website: http://www.eyelinkinfo.com/fixed_main.php [Retrieved 11/26/08]
Appendix B - Pre-Test Questionnaire (Pilot and Full-Scale Study)
Participant #: ______
Date: _____________
1. What is your gender?
Male
Female
2. What is your age? ____
3. How do you feel today?
Better than normal
Normal
Worse than normal
Much worse than normal
4. Do you have any aches or pains or other medical problems that are bothering you today?
Better than normal
Normal
Worse than normal
Much worse than normal
If yes, describe: ______________________________________________________
5. How did you sleep last night?
Better than normal
Normal
Worse than normal
Much worse than normal
6. Have you had any of the following today or yesterday?
Flu or cold □ Yes □ No
Vomiting □ Yes □ No
Fever □ Yes □ No
Diarrhea □ Yes □ No
7. Did you participate in any physical activity today or yesterday?
More than normal amount of physical activity
Normal amount of physical activity
Less than normal amount of physical activity
If yes, describe the activity: ______________________________________________
8. How has your last meal left you feeling?
Uncomfortably full or lethargic
Satisfied
Hungry
When did you eat your last meal?__________________________________________
9. Did you consume any alcohol within the last 12 hours?
No
Yes
i. What? □ wine □ beer □ other
ii. When? __________________________________________________
iii. How much? ______________________________________________
10. Did you consume more or less caffeine today than you normally do?
No
Yes
i. What? □ tea □ coffee □ energy drink □ other
ii. When? __________________________________________________
iii. How much more/less than normal?____________________________
11. Do you take any psychoactive medication, such as those that make you feel tired or affect
your mood (e.g., anti-depressant, anti-anxiety, anti-seizure medications)?
No
Yes
i. Please specify: ____________________________________________
12. Have you taken any illicit drugs or new medications today or yesterday (you do not need
to specify)?
No
Yes
13. Are you wearing contact lenses today?
No
Yes
Appendix C – Pilot Study ISI Length Histograms for Auditory Task
Appendix D – Full-Scale Study Post-Test Questionnaire
Participant #: ______
Date: _____________
1. You will be asked to describe your perception of the various tasks.
First, you will be asked about the difficulty of the tasks. Difficulty refers to the demands
placed on you by the task, and you have no control over them.
Next you will be asked about "how hard you tried" during the various tasks. Unlike task
difficulty, you can control your effort. However, some people feel that they have to try
harder to complete a more difficult task successfully.
Some of these questions may seem obvious or intuitive, but we are interested in hearing about your actual experience during the experiment.
a) Which task did you find more difficult (circle one):
1’s & 2’s Subtraction 8’s & 9’s Subtraction Neither (the same)
b) Which task did you find more difficult (circle one):
Quiet Words in Noise Loud Words in Noise Neither (the same)
c) Please group the following letters into one of three groups, as you perceived them during
the word generation task: (k, b, f, t, j, m, y, q)
____________________ __________________ _________________
Less Difficult More Difficult Neutral or Don’t Remember
d) On which task did you try harder (circle one):
1’s & 2’s Subtraction 8’s & 9’s Subtraction Neither (the same)
e) On which task did you try harder (circle one):
Quiet Words in Noise Loud Words in Noise Neither (the same)
f) Please sort the following letters into three groups, as you perceived them during the
word generation task: (k, b, f, t, j, m, y, q)
____________________ __________________ _________________
Less Difficult More Difficult Neutral or Don’t Remember
g) How would you describe your effort level on the first half of the experiment (circle one):
I Tried My Hardest I Tried Somewhat I Tried Very Little
h) On which did you generally try harder (circle one):
First Half of Experiment Second Half of Experiment Neither (the same)
2. During the tasks, did you consciously look or stare anywhere in particular, such as straight
ahead?
Yes No Don’t Remember
If so, why?
Did you think that you were supposed to look anywhere in particular, such as straight ahead?
In other words, did you feel like we wanted you to do this?
Yes No Don’t Remember
Did you otherwise think about your eye movements during the experiment?
Yes No Don’t Remember
3. As the experiment went on, did you notice a change in mental strategy for any of the
following types of tasks? Please explain.
a) Subtraction Task:
b) Word Generation Task:
c) Words in Noise Recognition Task:
4. (only answer these last questions if you were asked to regulate your effort between the first
and second half of the experiment)
a) Did you feel that you were able to try less during the second half of the experiment? If
you feel that your answer to this question is different for some tasks versus others, please
mention how it differs.
b) Do you think that your performance was any lower on the second half of the experiment
compared to the first? If you feel that your answer to this question is different for some
tasks versus others, please mention how it differs.
Appendix E – Full-Scale Study Change Score Box Plots
Plot Group 1. Change Score Box Plots for Average Heart Rate (bpm)
[Box plots not reproduced. Panels: Nominal to High Difficulty Change (1st Half, All Subjects); 1st to 2nd Half Change (Nominal Difficulty Tasks); 1st to 2nd Half Change (High Difficulty Tasks). Tasks on the horizontal axis: Math, Fluency, Auditory. Legend: motivation group, control group. Plot annotations: A, B, C, D.]
2. Change Score Box Plots for Saccade Rate (saccades/minute)
[Box plots not reproduced. Panels: Nominal to High Difficulty Change (1st Half, All Subjects); 1st to 2nd Half Change (Nominal Difficulty Tasks); 1st to 2nd Half Change (High Difficulty Tasks). Tasks on the horizontal axis: Math, Fluency, Auditory. Legend: motivation group, control group. Plot annotations: E, F, G.]
3. Change Score Box Plots for Trial Proportion of Long ISI (percent)
[Box plots not reproduced. Panels: Nominal to High Difficulty Change (1st Half, All Subjects); 1st to 2nd Half Change (Nominal Difficulty Tasks); 1st to 2nd Half Change (High Difficulty Tasks). Tasks on the horizontal axis: Math, Fluency, Auditory. Legend: motivation group, control group. Plot annotations: H, I.]
4. Change Score Box Plots for Median ISI (ms) with Outliers (|X| > 1000 ms)
[Box plots not reproduced. Panels: Nominal to High Difficulty Change (1st Half, All Subjects); 1st to 2nd Half Change (Nominal Difficulty Tasks); 1st to 2nd Half Change (High Difficulty Tasks). Tasks on the horizontal axis: Math, Fluency, Auditory. Legend: motivation group, control group.]
5. Change Score Box Plots for Median ISI (ms) with Outliers Present
[Box plots omitted. Panels: Nominal to High Difficulty Change (1st Half, All Subjects); 1st to 2nd Half Change (Nominal Difficulty Tasks); 1st to 2nd Half Change (High Difficulty Tasks). Task axis: Math, Fluency, Auditory. Legend: motivation group, control group.]
7. Change Score Box Plots for Blink Rate (blinks/minute)
[Box plots omitted. Panels: Nominal to High Difficulty Change (1st Half, All Subjects); 1st to 2nd Half Change (Nominal Difficulty Tasks); 1st to 2nd Half Change (High Difficulty Tasks). Task axis: Math, Fluency, Auditory. Legend: motivation group, control group.]
8. Change Score Box Plots for Average Spontaneous SCR Rate (responses/minute)
[Box plots omitted. Panels: Nominal to High Difficulty Change (1st Half, All Subjects); 1st to 2nd Half Change (Nominal Difficulty Tasks); 1st to 2nd Half Change (High Difficulty Tasks). Task axis: Math, Fluency, Auditory. Legend: motivation group, control group.]
9. Change Score Box Plots for Number of Correct Responses (/trial**)
**Auditory task scores are multiplied by a factor of four so that they can be visualized on a common
scale with the other tasks
[Box plots omitted. Panels: Nominal to High Difficulty Change (1st Half, All Subjects); 1st to 2nd Half Change (Nominal Difficulty Tasks); 1st to 2nd Half Change (High Difficulty Tasks). Task axis: Math, Fluency, Auditory. Legend: motivation group, control group. Panel annotation: J.]
10. Change Score Box Plots for Number of Incorrect Responses (/trial)
[Box plots omitted. Panels: Nominal to High Difficulty Change (1st Half, All Subjects); 1st to 2nd Half Change (Nominal Difficulty Tasks); 1st to 2nd Half Change (High Difficulty Tasks). Task axis: Math, Fluency, Auditory. Legend: motivation group, control group.]
Appendix F – Full Hypothesis Testing Battery
Measures: Average Heart Rate; Saccade Rate

Math Task
  Difficulty
    Average Heart Rate: mixed ANOVA: p > 0.1
    Saccade Rate: failed Levene's test… Wilcoxon: p > 0.1
  Motivation, Nom.
    Average Heart Rate: ANCOVA: p > 0.1
    Saccade Rate: ANCOVA: p > 0.1; Kolmogorov-Smirnov: p > 0.1
  Motivation, High
    Average Heart Rate: ANCOVA: p > 0.1
    Saccade Rate: ANCOVA: p > 0.1; Kolmogorov-Smirnov: p > 0.1

Fluency Task
  Difficulty
    Average Heart Rate: mixed ANOVA: p = 0.010 (half*diff*group p = 0.044; diff effect stronger in 2nd half, controls)
    Saccade Rate: failed Levene's test… Wilcoxon: p = 0.10
  Motivation, Nom.
    Average Heart Rate: ANCOVA: p > 0.1
    Saccade Rate: failed Levene's test… Kolmogorov-Smirnov: p > 0.1; ANCOVA: p > 0.1
  Motivation, High
    Average Heart Rate: ANCOVA: p = 0.002
    Saccade Rate: failed Levene's test… Kolmogorov-Smirnov: p > 0.1; ANCOVA: p > 0.1

Auditory Task
  Difficulty
    Average Heart Rate: var. ratio > 2; unbalanced groups… Wilcoxon: p < 0.001; mixed ANOVA: p < 0.001
    Saccade Rate: mixed ANOVA: p = 0.004 (diff*group p = 0.036; larger effect in motivation group); Wilcoxon: p = 0.014
  Motivation, Nom.
    Average Heart Rate: var. ratio > 2; unbalanced groups… Kolmogorov-Smirnov: p > 0.1; ANCOVA: p > 0.1
    Saccade Rate: ANCOVA: p > 0.1; Kolmogorov-Smirnov: p > 0.1
  Motivation, High
    Average Heart Rate: var. ratio > 2; unbalanced groups… Kolmogorov-Smirnov: p > 0.1; ANCOVA: p > 0.1
    Saccade Rate: ANCOVA: p > 0.1; Kolmogorov-Smirnov: p = 0.10
Notes:
• bold font denotes significant findings (before any multiple comparisons correction)
• unless otherwise noted, comparison groups all passed tests of assumptions for parametric
tests used
• parametric and non-parametric test results given where data is suspected to not fit
parametric assumptions, though it does not fail appropriate tests
• difficulty effects calculated based on 1st half data only (from all subjects, regardless of
group); did not test for interaction effects of difficulty*group or difficulty*half, unless
automatically tested through the mixed ANOVA method
• explicitly tested for motivation*difficulty effect (by dividing comparisons into high and
nominal difficulty level) because splitting data into nominal and high difficulty levels was
necessary for non-parametric analysis methods
• total number of task difficulty comparisons: 12; starting Holm-Bonferroni significance
criterion, α = 0.004 (two-tailed)
Measures: Trial Portion Long ISI; Correct Response Count

Math Task
  Difficulty
    Trial Portion Long ISI: mixed ANOVA: p > 0.1; Wilcoxon: p > 0.1
    Correct Response Count: failed Shapiro-Wilk… Wilcoxon: p < 0.001; mixed ANOVA: p < 0.001
  Motivation, Nom.
    Trial Portion Long ISI: ANCOVA: p > 0.1; Wilcoxon: p > 0.1
    Correct Response Count: failed Shapiro-Wilk… Kolmogorov-Smirnov: p > 0.1; ANCOVA: p = 0.070
  Motivation, High
    Trial Portion Long ISI: ANCOVA: p > 0.1; Wilcoxon: p > 0.1
    Correct Response Count: failed Shapiro-Wilk… Kolmogorov-Smirnov: p > 0.1; ANCOVA: p > 0.1

Fluency Task
  Difficulty
    Trial Portion Long ISI: failed Levene's and Shapiro-Wilk… Wilcoxon: p > 0.1
    Correct Response Count: failed Shapiro-Wilk… Wilcoxon: p < 0.001; mixed ANOVA: p < 0.001
  Motivation, Nom.
    Trial Portion Long ISI: failed Levene's and Shapiro-Wilk… Kolmogorov-Smirnov: p > 0.1
    Correct Response Count: failed Shapiro-Wilk & unequal slopes (ANCOVA assumption)… Kolmogorov-Smirnov: p = 0.10
  Motivation, High
    Trial Portion Long ISI: failed Levene's and Shapiro-Wilk… Kolmogorov-Smirnov: p > 0.1
    Correct Response Count: failed Shapiro-Wilk… Kolmogorov-Smirnov: p > 0.1; ANCOVA: p > 0.1

Auditory Task
  Difficulty
    Trial Portion Long ISI: mixed ANOVA: p = 0.004; Wilcoxon: p = 0.018
    Correct Response Count: failed Shapiro-Wilk… Wilcoxon: p < 0.001; mixed ANOVA: p < 0.001
  Motivation, Nom.
    Trial Portion Long ISI: ANCOVA: p > 0.1; Kolmogorov-Smirnov: p > 0.1
    Correct Response Count: failed Shapiro-Wilk… Kolmogorov-Smirnov: p > 0.1; ANCOVA: p > 0.1
  Motivation, High
    Trial Portion Long ISI: ANCOVA: p > 0.1; Kolmogorov-Smirnov: p = 0.10
    Correct Response Count: failed Shapiro-Wilk… Kolmogorov-Smirnov: p > 0.1; ANCOVA: p > 0.1
• total number of motivation comparisons: 24; starting Holm-Bonferroni significance
criterion, α = 0.002 (two-tailed)
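
The starting Holm-Bonferroni criteria quoted in the notes (α = 0.004 for 12 comparisons, α = 0.002 for 24) are consistent with dividing a family-wise α of 0.05 by the number of comparisons. The family-wise level of 0.05 is an assumption here, as it is not stated in these notes. The following Python sketch illustrates the step-down procedure under that assumption; the function names and the example p-values are hypothetical and are not taken from the thesis data.

```python
def starting_criterion(m, alpha=0.05):
    """Strictest Holm-Bonferroni threshold, applied to the smallest p-value.

    alpha=0.05 is an assumed family-wise level, not stated in the notes.
    """
    return alpha / m


def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    P-values are tested from smallest to largest against alpha / (m - rank);
    testing stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)
        if p_values[idx] <= threshold:
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Starting criteria for the two families of comparisons named in the notes
print(round(starting_criterion(12), 3))  # 0.004
print(round(starting_criterion(24), 3))  # 0.002

# Hypothetical p-values, for illustration only
print(holm_bonferroni([0.001, 0.04, 0.03]))  # [True, False, False]
```

Because the threshold relaxes after each rejection, a result that misses the starting criterion can still survive correction if enough smaller p-values in the same family are rejected first.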