Thesis Draft Mar 30_b&w


Mental Workload Measurement Using the Intersaccadic Interval

by

Eldon Todd Pierce

A thesis submitted in conformity with the requirements for the degree of Master’s of Applied Science

Institute of Biomaterials and Biomedical Engineering

University of Toronto

© Copyright by Eldon Todd Pierce (2009)




Abstract

“Mental Workload Measurement Using the Intersaccadic Interval”

Eldon Todd Pierce, 2009

Master’s of Applied Science

Institute of Biomaterials and Biomedical Engineering, University of Toronto

Mental workload is commonly defined as the proportion of a person's total mental capacity in use

at a given moment. A measure of mental workload would have utility in a number of rehabilitation

medicine applications, but no method has been adequately examined for these purposes. A

candidate measure is the intersaccadic interval (ISI), which is the duration between two successive

saccades. Previous studies indicate that ISI length may be linked to mental workload, but this link

is poorly understood for tasks that are not primarily visual. Therefore, the current study was an

investigation of ISI and workload intensity in three non-visual tasks: mental arithmetic, verbal

fluency, and audio perception. Workload was manipulated through changes in task difficulty as

well as study participant motivation level. An analysis of eye movements and other experimental

workload measures indicated a significant association between audio perceptual workload and ISI

length.




Acknowledgments

This thesis could not have been possible without the help of my two supervisors: Dr. Robin Green

and Dr. Geoff Fernie. Thank you Robin, for teaching me everything I ever wanted to know about

experimental psychology (and perhaps more). Geoff, thank you for working tirelessly to give all

researchers at Toronto Rehab a roof and a vision. Collaborating with both of you has been extremely

educational.

To the others on my committee, Dr. Jay Pratt and Dr. Anthony Easty, and my external reviewer, Dr.

Luc Tremblay: thank you very much for your insights and encouraging words.

I am also grateful for the financial support of the Toronto Rehabilitation Hospital, the University of

Toronto, and Ontario Centres of Excellence.

Finally, a debt of gratitude is owed to my friends, family, and colleagues for their encouragement and

assistance. In particular, thank you Jessica for being there to support me through the final throes. It

is fitting that the thesis begins and ends with some of your handiwork.




Table of Contents

List of Tables

List of Figures

List of Appendices

Chapter 1. Introduction and Rationale

Chapter 2. Literature Review

2.1 Workload Measurement in Rehabilitation Medicine

2.1.1 Application: Rehabilitation Treatment

2.1.2 Application: Neuropsychological Assessment

2.2 Theory of Mental Workload

2.3 Workload Measurement

2.3.1 Subjective Measures

2.3.2 Performance-Based Measures

2.3.3 Physiological Measures
    2.3.3.1 Electroencephalographic Activity (EEG)
    2.3.3.2 Surface Electromyography (EMG)
    2.3.3.3 Electrocardiogram (ECG)
    2.3.3.4 Electrodermal Activity (EDA)
    2.3.3.5 Respiration
    2.3.3.6 Eye Blinks
    2.3.3.7 Pupil Size

2.3.4 Saccadic Eye Movements
    2.3.4.1 Definition of Saccades
    2.3.4.2 Definition of Intersaccadic Interval (ISI)
    2.3.4.3 Eye Movements and Mental Workload
    2.3.4.4 Eye Movements and Non-Visual Tasks

Chapter 3. Objectives and Hypotheses

Chapter 4. Pilot Study

4.1 Methods

4.1.1 Participants

4.1.2 Materials
    4.1.2.1 Eye Tracking
    4.1.2.2 Physiological Measures
    4.1.2.3 Subjective Workload Ratings
    4.1.2.4 Neuropsychological Tasks

4.1.3 Design

4.1.4 Procedures

4.1.5 Analysis

4.2 Results and Discussion

4.2.1 Subjective Workload Data

4.2.2 Task Performance Data

4.2.3 Eye Movement Data

4.3 Conclusions

Chapter 5. Full-Scale Study

5.1 Restatement of Hypotheses

5.2 Methods

5.2.1 Participants

5.2.2 Materials
    5.2.2.1 Eye Tracking
    5.2.2.2 Physiological Measures
    5.2.2.3 Neuropsychological Tasks
    5.2.2.4 Motivation Manipulation
    5.2.2.5 Post-Experiment Questionnaire

5.2.3 Design

5.2.4 Procedures

5.2.5 Analysis
    5.2.5.1 Parametric Model Assumptions
    5.2.5.2 Hypothesis Testing Methods
    5.2.5.3 Multiple Comparisons Correction

5.3 Results and Discussion

5.3.1 Hypothesis Testing
    5.3.1.1 Average Heart Rate
    5.3.1.2 Average Saccade Rate
    5.3.1.3 Trial Proportion of Long ISI
    5.3.1.4 Holm-Bonferroni Correction

5.3.2 Post-Experiment Questionnaire

5.3.3 Convergence Measures

5.3.4 Agreement of Eye Movement Results with Previous Literature

5.3.5 Eye Movements and Mental Workload

5.4 Conclusions

Chapter 6. Limitations

Chapter 7. Extensions

References

Appendices

Appendix A – Eye Tracker Specifications

Appendix B – Pre-Test Questionnaire

Appendix C – Pilot Study ISI Histogram

Appendix D – Full-Scale Study Post-Test Questionnaire

Appendix E – Full-Scale Study Change Score Box Plots

Appendix F – Full Hypothesis Testing Battery




List of Tables

Table 1. Saccade Detection Thresholds in Previous Literature

Table 2. Descriptions of Experimental Tasks

Table 3. Participant Demographic Summary

Table 4. Summary of Trends Subjected to Statistical Analysis

Table 5. Summary of Relevant Previous Literature




List of Figures

Figure 1. Illustration of the resource allocation principle wherein resources allocated to a primary

task lead to a decrease in secondary task performance, unless in a data-limited state.

Figure 2. Illustration of the division of mental resources in Wickens’ multiple resource theory (1984).

Figure 3. Sanders’ (1993) cognitive-energetical model of a choice reaction task execution.

Figure 4. Robert & Hockey’s (1997) control system model of mental effort, including control loops

for automatic processes (Loop A) and for effortful processes (Loop B).

Figure 5. Change in task difficulty ratings from the nominal to high difficulty versions of each task.

Figure 6. Change in subjective effort ratings from nominal to high difficulty version of each task.

Figure 7. Change in subjective effort ratings from 1st to 2nd half of experiment, by experimental

group; data is pooled over all task types.

Figure 8. Mean and standard deviations of (percent) change in number of correct responses from

nominal to high difficulty task versions.

Figure 9a. Portion of trials in which performance improves from 1st to 2nd half of experiment

(pooled over both difficulty levels).

Figure 9b. Portion in which performance degrades.

Figure 10. Mean and standard deviations of change in task performance from 1st to 2nd half (pooled

over difficulty levels), using data from only those participants who improved.

Figure 11a. Illustrating error resulting from residual change score method on unmatched groups.

Figure 11b. ANCOVA method is preferred in this case because it does not pool data for regression.

Figure 12. Plot of median ISI versus average saccade rate for all trials reveals a very consistent (r² =

0.86) inverse relationship; note logarithmic axes.

Figure 13. Individual observations of average heart rate in 1st half versus 2nd half for high difficulty

auditory task suggests no effect of motivation.

Figure 14. Individual observations of Trial Portion Long ISI in 1st half versus 2nd half for high

difficulty auditory task suggests a weak motivation effect.




Figure 15. Tally of participants’ starting letter perceived difficulty classifications.

Figure 16. Tally of participants’ starting letter perceived effort classifications.




List of Appendices

Appendix A – Eye Tracker Specifications

Appendix B – Pre-Test Questionnaire

Appendix C – Pilot Study ISI Histogram

Appendix D – Full-Scale Study Post-Test Questionnaire

Appendix E – Full-Scale Study Change Score Box Plots

Appendix F – Full Hypothesis Testing Battery




Chapter 1. Introduction and Rationale

The objective of this thesis is to investigate the relationship between mental workload and the length

of people’s intersaccadic intervals during non-visual tasks. If a correlation can be found, the

intersaccadic interval may be a candidate for workload measurement during neuropsychological

assessments and cognitive rehabilitation treatments. It will be demonstrated not only that these

assessments and treatments would benefit from a measure of workload, but that there is currently no

valid and reliable measure available. An eye tracker-based measurement would be particularly well-

suited to these applications because it has the potential to be non-invasive (using a remote eye

tracker) and would have the temporal resolution necessary for the short task periods that are

sometimes involved.
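As a concrete illustration of the quantity under study, the following minimal sketch computes intersaccadic intervals, their median, and the average saccade rate from a list of detected saccades. It assumes that each saccade is available as an (onset, offset) pair in milliseconds and that an interval runs from the offset of one saccade to the onset of the next; the function names and sample values are hypothetical and do not reproduce the analysis software used in this thesis.

```python
from statistics import median


def intersaccadic_intervals(saccades):
    """Return ISIs (ms) from a time-ordered list of (onset_ms, offset_ms) pairs.

    Assumption: an ISI is measured from the offset of one saccade
    to the onset of the next saccade.
    """
    return [next_onset - prev_offset
            for (_, prev_offset), (next_onset, _) in zip(saccades, saccades[1:])]


def saccade_rate(saccades, trial_duration_ms):
    """Average number of saccades per second over a trial."""
    return len(saccades) / (trial_duration_ms / 1000.0)


# Hypothetical saccades detected during a 2-second trial.
saccades = [(100, 140), (400, 435), (900, 950), (1700, 1730)]
isis = intersaccadic_intervals(saccades)

print(isis)                          # [260, 465, 750]
print(median(isis))                  # 465
print(saccade_rate(saccades, 2000))  # 2.0
```

An onset-to-onset definition would be an equally plausible reading of "the duration between two successive saccades"; only the subtraction inside the list comprehension would change.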

The first task of any mental workload research is to define the subject of investigation; though its

manifestations are well known, they are also poorly understood. Everyone has experienced feelings

of mental taxation that accompany very difficult tasks, skill learning, and working under high

environmental demands. However, reviews of some 40 years of theoretical and experimental work

indicate that there is no universal definition for workload (Xie et al., 2000; Cain, 2007), let alone one

with any empirical validation.

It remains that there is some regulatory factor or group of factors whose measurement would be

useful in a variety of applications. Studies concerned with the measurement or prediction of “mental

workload” and its aliases, such as mental effort and cognitive load, tend to define it in operational

terms. For example, a study interested in the relationship between job satisfaction and workload

might define it as a worker’s perceived (subjective) workload over the course of a day. In a similar




fashion, the current study will define mental workload operationally, as relevant to the intended

rehabilitation medicine applications.

Chapter 2. Literature Review

This literature review is divided into three primary sections. The first section addresses the intended

applications for a mental workload measurement tool in rehabilitation medicine. The second

describes various theories of mental workload that address specific aspects of these clinical

applications as well as its general conceptualization in the engineering psychology literature. The

third section is a compilation of experimental evidence for the success and limitations of various

proposed workload measurement methods, including the intersaccadic interval. Those measures

that have been included in the current investigation receive special attention with respect to practical

measurement issues.

2.1 Workload Measurement in Rehabilitation Medicine

2.1.1 Application: Rehabilitation Treatment

Does the degree to which a client “tries” during a cognitive rehabilitation task affect clinical

outcomes? Although the concept of measuring patients’ involvement in rehabilitation tasks has been

put forward (Papadelis, 2007), there are no studies that directly link cognitive rehabilitation

outcomes to patients’ effort levels. The reason for this omission may be the lack of a gold standard

effort measure (Salthouse, 2006). Certainly, a lack of empirical evidence for the link between effort

and outcome has not hampered its implementation by clinicians, who tend to set task difficulties

such that a patient is challenged but not discouragingly so. However, there is plenty of indirect

evidence in support of this practice.




Studies of learning are one source, as it has been suggested that rehabilitation treatment should

follow the same “biological rules” as the deliberate practice involved with active skill-acquisition

(Kwakkel, 2006). The results of two comprehensive studies of skill-acquisition strategies suggest that

“participants’ motivation to attend to the task and exert effort to improve their performance” is the

factor most frequently associated with performance improvement (Ericsson et al., 1993). Mulder

(1986) also noted that the speed of skill acquisition has been correlated with arousal, which is, in

turn, thought to accompany effortful activities. An issue of particular importance to rehabilitation

medicine is the disambiguation of improvements in (compensatory) behavioural strategies versus

those in more general, physiological functionality (i.e., neuroplasticity). In this regard, there is

evidence from animal research that supports a “top-down” driven model of adult sensory map

plasticity (Polley et al., 2006). This result speaks to a potential role for effort in neuroplastic change

because effort is essentially a means of consciously regulating goal-oriented behaviour.

The importance of effortful activities in rehabilitation is also stressed in the environment enrichment

literature, which studies the effect of social interaction and other enrichment variables on cognitive

ability in aging as well as brain injured people. Schooler (1987) theorized that the benefits of an

enriched environment are the result of a feedback loop between increased involvement in cognitively

challenging activities and resultant improvements in cognitive functioning (which subsequently

encourages further involvement). In response, a recent pilot study by Green et al. (2006) has tested

the hypothesis that participation in cognitively challenging activities leads to an increase in cognitive

functioning. This study found that healthy participants who carried out mentally challenging tasks

daily for two weeks showed a greater improvement (from baseline) on a test battery for general

cognitive ability than participants who engaged in light reading for an equivalent treatment period;

improvement did not appear to be attributable to acquisition of new strategies. Evidence of general




cognitive improvements – which have potential for far-transfer – suggests that the development of

behavioural strategies is not the sole beneficiary of effortful mental activity.

Mental effort measurement could clearly contribute to ongoing research on the relationship between

exertion during rehabilitation treatment and outcomes. It may also be useful to clinicians who

currently rely on intuition alone in determining whether patients are being challenged too much or

too little. Another related application is in feedback systems for computer-administered treatments,

an area of considerable research interest.

2.1.2 Application: Neuropsychological Assessments

Effort (motivation) is an important consideration in neuropsychological testing because it can

confound the relationship between task performance and cognitive ability in some circumstances. It

has been demonstrated that another motivational factor, goal setting, can affect brain-injured

participants’ performance on math problems (Gauggel, 2002a; Gauggel, 2002b) and the Purdue

Pegboard Test (Cardall, 1943 for description of test; Gauggel, 2001). Healthy participants’

performance during a word association task was improved through feedback in a study by Podsakoff

& Farh (1989). Most compellingly, in a comparison of athletes’ baseline and post-concussion

neuropsychological assessments, Bailey et al. (2006) concluded that the will to continue playing

motivated them to significantly improve their test performance. However, there are also studies that

had mixed success in motivating participants to perform better on tests. Richards & Ruff (1989)

reported that San Diego Neuropsychological Battery (Ruff, 1985) outcomes in both healthy and

depressed participants were unaffected by the opportunity for a monetary reward. The use of

another motivational technique, performance feedback, was also shown to have no effect on the

response times of healthy participants, although it did affect brain-injured patients (Gauggel et al.,




2000). Therefore, it appears that it is not a matter of whether motivation can affect

neuropsychological test outcomes, but under which circumstances and to what degree.

One such circumstance is the extremely low effort that is often discussed in the malingering

literature. In this context, “low effort” is generally used to refer to wilful deception, and although

some strategies may involve a state of low mental activity, others may actually require added mental

work to produce incorrect answers. However, some of the many effort tests introduced as tools for

malingering detection (comprehensive test reviews include: Bianchini et al., 2001; Vickery et al.,

2001; Lynch, 2004) can also be used to detect involuntarily low effort, though the sensitivity of

individual tests may vary. For example, the hypothesis that (involuntarily) low effort could account

for schizophrenic participants’ poor performance on some neuropsychological tests was tested using

the Victoria Symptoms Validity Test (Egeland et al., 2003) and the Working Memory Test (Gorisson

et al., 2005), but low effort was only reported by the latter. Furthermore, Kessels et al. (2007) tested a

similar hypothesis for participants reporting symptoms of depression, and while they conclude that

there is evidence of an inability to allocate effort, neither the Test of Memory Malingering (Rees et al.,

1998) nor the Amsterdam Short-term Memory Test (Shagen et al., 1997) could detect it.

Another circumstance may be in cases of milder cognitive impairment. For example, Van Zandvoort

et al. (1998) reported on the insensitivity of some neuropsychological tests to cognitive deficits in

lacunar stroke patients, and theorized that the patients were able to compensate during these

tests through the exertion of greater effort. Although it may be possible to reveal these deficits by

administering more difficult tests, the ecological validity of testing under extremely high effort is

itself questionable, as the vast majority of day-to-day tasks do not involve high effort. Furthermore, it

has been suggested that low motivation may be a primary contributor to dysfunction in




schizophrenic (Egeland et al., 2003; Gorisson et al., 2005), depressive (Kessels et al., 2007; Layne,

1980), and brain injured (Riese et al., 1999; Oddy et al., 2008) patients. However, there is currently

no generally accepted tool for the measurement of patient motivation (Oddy et al., 2008). Therefore,

the potential roles of mental effort measurements in neuropsychological assessments not only

include standardizing conventional tests, but also the development of more ecologically valid tests.

2.2 Theory of Mental Workload

As previously mentioned, mental workload is a wide and elusive concept. Therefore, it is important to

remain grounded in those elements that can be observed. Tsang & Vidulich (2006) neatly summarize

the three manifestations of mental workload: “subjective experience, performance, and physiological

manifestations.” The following discussion will summarize some of the more enduring and/or

applicable models that have attempted to tie these phenomena together in a unified theory of

workload. As previously alluded to, none of these models are directly applicable to the empirical

goals of this thesis, but their inclusion will serve to align this thesis with the perspective of decades of

similar research.

In Kahneman’s seminal book (1973), he described a task performance feedback loop wherein mental

resources are allocated in response to task demands. He also described a mechanism of allocation

involving arousal, which was supported by his empirical research on pupil diameter and task

difficulty. In this context, workload would be defined as the proportion of an individual’s total

resource capacity allocated at a given moment. Though his theory of resource allocation, the

“Resource Model,” was originally conceived in response to observations of dual-task (multitasking)

performance, it is also very commonly referenced in discussions of single task scenarios. A




discussion of the validity of the resource model is beyond the scope of this review, but critiques have

been written by Navon (1984) and Sanders (1997).

As mentioned, the resource model was a response to dual-tasking behaviours. According to the

model, there is a limited amount of resources that must be distributed amongst simultaneous tasks.

Therefore, when we engage in one task, less capacity is left for the processing of another task, with

the ultimate result of degraded task performance. Norman & Bobrow (1975) investigated this

phenomenon using a dual-task paradigm, wherein primary task performance was measured while

participants simultaneously carried out a secondary task of variable difficulty. They found that

primary task performance was generally negatively correlated with secondary task difficulty (Figure

1), which was presented as evidence for the shared allocation of resources between tasks. Notably,

they also attempted to clarify the meaning of resources as, “...such things as processing effort,

various forms of memory capacity, and communication channels” (Norman & Bobrow, 1975, p. 45).

Where a correlation between primary task performance and secondary task difficulty did not occur,

Norman & Bobrow defined performance as being data-limited; otherwise, it was resource-limited.

“Data-limited” means that primary task performance is limited by the quality of the data, and

therefore independent of any further resource allocation. For example, in detecting an auditory

stimulus amongst noise, a person might perform better if they are not occupied with a difficult

secondary task, but if the stimulus signal-to-noise ratio is poor enough, then no increase in attention

or auditory acuity will improve performance.
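The distinction between the two regimes can be sketched with a toy performance-resource function. The piecewise-linear form, the `gain` parameter, and the numerical values below are illustrative assumptions only; Norman & Bobrow did not specify this particular function.

```python
def performance(resources, data_limit, gain=1.0):
    """Hypothetical piecewise-linear performance-resource function.

    Performance rises with allocated resources (resource-limited region)
    until it reaches a ceiling set by data quality (data-limited region).
    """
    return min(gain * resources, data_limit)


def is_data_limited(resources, data_limit, gain=1.0):
    """True when additional resources can no longer improve performance."""
    return gain * resources >= data_limit


# Clear stimulus: the ceiling is high, so freeing resources
# (an easier secondary task) improves primary task performance.
print(performance(0.4, data_limit=0.9), performance(0.6, data_limit=0.9))  # 0.4 0.6

# Poor signal-to-noise ratio: the ceiling is low, so the same
# increase in allocated resources changes nothing.
print(performance(0.4, data_limit=0.3), performance(0.6, data_limit=0.3))  # 0.3 0.3
```

In this sketch, a lack of correlation between secondary task difficulty and primary task performance corresponds to operating on the flat portion of the curve.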




Figure 1. Illustration of the resource allocation principle wherein resources allocated to a primary task lead

to a decrease in secondary task performance, unless in a data-limited state.

[Figure 1 plot: Primary Task Performance versus Resources Allocated to Primary Task, with resource-limited and data-limited regions marked; annotation: secondary task made easier.]

A widely accepted amendment to the original resource model is the multiple resource model

introduced by Wickens (1984). This theory is intended to account for the observation that the extent

of task interference depends on the similarity of their processes, rather than their difficulty alone.

Wickens (2002) illustrates this concept with an anecdotal example: a driver’s performance would be

expected to suffer much more while following written directions compared to spoken directions

because reading would take the driver’s eyes off the road. Such interference is formalized in

Wickens’ predictive model, which describes pools of resources that are defined on four dimensions

(Figure 2): perception modality, processing code, processing stage, and response type. It is asserted

that multitask interference is somewhat predicted by the number of resource pools that are

commonly shared by the tasks.
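The idea that interference grows with the number of shared resource pools can be illustrated with a simple count over the four dimensions named above. This is a minimal sketch rather than Wickens’ computational model: the dictionary coding of each task, the dimension keys, and the assigned values are hypothetical and chosen only to mirror the driving example.

```python
DIMENSIONS = ("modality", "code", "stage", "response")


def shared_pools(task_a, task_b):
    """Return the dimensions on which two tasks draw on the same resource pool."""
    return [d for d in DIMENSIONS
            if task_a.get(d) is not None and task_a.get(d) == task_b.get(d)]


# Hypothetical codings of the driving example; values are illustrative only.
driving = {"modality": "visual", "code": "spatial",
           "stage": "perception", "response": "manual"}
reading_directions = {"modality": "visual", "code": "verbal",
                      "stage": "perception", "response": None}
hearing_directions = {"modality": "auditory", "code": "verbal",
                      "stage": "perception", "response": None}

print(shared_pools(driving, reading_directions))  # ['modality', 'stage']
print(shared_pools(driving, hearing_directions))  # ['stage']
```

Under these assumed codings, written directions share more pools with driving than spoken directions do, which is consistent with the greater interference expected in the anecdote.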

Figure 2. Illustration of the division of mental resources in Wickens’ multiple resource theory (1984).

[Figure 2 diagram labels: Stages (Perception, Cognition, Responding); Codes (Spatial, Verbal); Modalities (Visual, Auditory); Responses (Manual (Spatial), Vocal (Verbal)).]

Although originally conceived as a model of dual-task performance, the performance resource

function in Figure 1 is also commonly used to conceptualize the effect of wilful resource allocation

on performance during a single (resource-limited) task. Such wilful resource allocation was

originally challenged by Kahneman (1973). He cited a number of studies in which task performance

was not affected by participants’ motivation levels, concluding that “...allocating less effort than the

standard probably will cause a deterioration of performance [but] allocating more than the standard

seems to be beyond our ability” (Kahneman, 1973, p. 15). However, evidence from the previous

discussion of motivational effects on neuropsychological test performance in concussed athletes

seems to contradict this statement. Wickens (1991) also provides contrasting examples of studies in

which task performance is affected by instructions to “try harder,” by the imposition of performance

criteria, and also by motivational incentives. Further evidence for wilful resource allocation is

implicit in the practices of dual-task researchers, who must control participants’ priority between the

primary and secondary tasks, so that variability in performance is minimized (Damos, 1991). The

results of Washburn & Putney (2001), who observed that task performance improved when a task

was made more difficult, would also be difficult to explain using Kahneman’s (1973) perspective.

These various viewpoints may be reconciled by concluding that people may often choose to achieve

maximum (standard) performance without being instructed to do so, but there are circumstances in

which they will allocate fewer resources, leaving room for performance improvement.

This compromise would not satisfy Mitchell & Hunt (1989), who question the more fundamental

assumption that maximum effort is limited by task demands by suggesting that a sufficiently

motivated person may invest themselves completely in any task, no matter its difficulty. However,

the “effort” they are referring to is conceptually different from Kahneman’s (1973) effort, not

necessarily being captured in measures of task performance, but perhaps in measures of perceived or

“subjective” workload. Pashler (1989) emphasizes the importance of this aspect of workload,

criticizing research in which workload is manipulated through task demands, but a subjective

workload measure is not used to confirm the success of this manipulation. He argues that these

studies “might better be described as revealing physiological concomitants of information-processing demands rather than effort” (Pashler, 1989, p. 382). While an information-processing

focus could be considered valid in some applications, it is inappropriate in the context of cognitive

rehabilitation for two reasons. Firstly, considering that people commonly achieve their maximal

performance by default according to Kahneman, the emphasis placed by Ericsson et al. (1993) on

effort for successful learning (see Section 2.1.1) indicates that they are referring to something that is

not necessarily indicated by task performance. Secondly, subjective effort reports are logically a more

direct measure than task performance for determining and maintaining workload tolerance

thresholds in patients.

The “cognitive-energetical” model of information processing (Sanders, 1983) distinguishes between

computational resource allocation, as it was originally conceived by Kahneman (1973), and an




energetical concept of effort. “Energetics” refers to the “intensity of behaviours” from drive, arousal,

and activation to stress, fatigue, and strain (Unema, 1995). The model was first introduced by

Sanders (1983) as a means of mapping the cognitive stages involved with choice reaction tasks while

accounting for stressors such as sleep deprivation (Figure 3). Information is channelled through

various processing stages, each having a limited, but variable capacity. Although Sanders uses only

four stages to model the choice reaction task, a different task may potentially involve others. The

processing capacity of the stages is thought to be controlled by a three-part mechanism that is based

on the work of Pribram & McGuinness (1975): arousal, activation, and effort. Arousal and activation

respectively affect the physiological response to stimuli and the readiness to respond, while effort

may act in two ways: as a regulator of basal state (arousal and activation) or in the regulation of

“controlled” executive processes (“response choice” in Figure 3). Other stages are referred to as

“automatic,” which require minimal resources, do not interfere with other processes, and do not

become more efficient with practice (Hasher & Zacks, 1979). Furthermore, it is thought that only

controlled processes are intentional and therefore affect our perception of subjective effort (de

Waard, 1996).

Figure 3. Sanders’ (1993) cognitive-energetical model of a choice reaction task execution.

[Figure 3 diagram labels: processing stages from (stimulus) to (response): Stimulus Preprocessing, Feature Extraction, Response Choice, Motor Adjustment; energetical mechanisms: Arousal, Effort, Activation; Evaluation.]




Robert & Hockey’s (1997) model (Figure 4) focuses on the mechanisms of effortful control. Automatic processes are

regulated in loop A, where they demand no appreciable level of effort. However, if a process or environmental

stressors are great enough that task performance becomes a concern, then control shifts to loop B,

wherein either additional effort can be exerted to improve performance or the performance goal can

be downgraded. A key observation is that when additional effort is exerted, it incurs a physiological

cost: an increase in “sympathetic and musculo-skeletal responses as well as neuroendocrine stress

patterns.” These costs can eventually lead to fatigue, which they liken to the adoption of a low-effort

strategy. In this state, less effortful behaviours are chosen even though they are detrimental to task

performance.

Figure 4. Robert & Hockey’s (1997) control system model of mental effort, including control

loops for automatic processes (Loop A) and for effortful processes (Loop B).

[Figure 4 diagram labels: Supervisory Controller, Task Goals, Effort Monitor, Action Monitor, Loop A, Loop B, external load, overt performance.]

Along the lines of Sanders’ (1983) dual-functionality of effort, Mulder (1986) draws a distinction

between two types of effort: compensatory effort and processing complexity. The basis for this

distinction is the observation that experimental manipulations of stressors, such as time constraint

or adverse environments, tend to result in different physiological responses compared to

manipulations of task demands. This concept was employed by Unema (1995), who found that one

measure of eye movement was responsive to manipulations of Sternberg memory task (Sternberg,

1969) difficulty (i.e., target set size), while another was responsive to both difficulty and a monetary

motivation. The duality of effort is interpreted by Robert & Hockey (1997) as set-points in their

model. They describe the effort set-point of loop A as being equivalent to the computational effort,

which is determined by task demands, and the loop B set-point as being driven by motivation due to

the perceived value of performance goals. It follows that the latter set-point would be more

susceptible to modification under stressful conditions.

2.3 Workload Measurement

This discussion will not consider those tools that are obviously impractical for use in the clinical

applications that have been previously outlined. Practical limitations include equipment

requirements (such as in the case of functional brain imaging of metabolic activity), intrusiveness

(concurrent, secondary task performance, e.g. random number generation while driving), and very

poor temporal resolution (body temperature or endocrine indices of urine, saliva, or blood plasma).

The literature from the 1970’s through the early 1990’s is very well summarized in the reviews of

Moray (1979, 1988), O’Donnell and Eggemeier (1986), Kramer (1991), Eggemeier et al. (1991),

Wilson and Eggemeier (1991), and de Waard (1996). In describing other, more current reviews,

Cain (2007) points out that the works of “rather few” researchers are repeatedly cited, indicating that

the growth of the field has slowed. Nonetheless, work has continued to some extent, some of

which is described in Tsang & Vidulich’s recent chapter in the Handbook of Human Factors and

Ergonomics (2006). Although the following discussion will reflect the fact that the bulk of the

workload measurement literature is over a decade old and frequently surveyed, references to newer

research will elaborate on these results wherever possible.




The various measures are organized into the three conventional categories: subjective measures,

performance-based measures, and physiological measures. Although saccadic eye movements are

considered a physiological measure, they are discussed in a separate sub-section, as their designation

as either a physiological or performance measure is debatable, depending on their particular

application (de Waard, 1996).

2.3.1 Subjective Measures

Asking participants to rate their effort is the most direct way to measure the phenomenal experience

of workload, and therefore subjective ratings are said to have high “face validity” (Cain, 2007).

However, their validity depends on the needs of the researcher. Where workload is conceptualized as

processing resource allocation, a subjective measure is not useful because it generally does not

correlate with task performance (Gopher & Braune, 1984). Furthermore, it has been found that

subjective effort is primarily affected by the demands of processes that are consciously “well-

defined,” such as those involving the working memory (O’Donnell & Eggemeier, 1986; Tsang &

Vidulich, 2005). There is little more known about the relationship between perceived effort and

other underlying cognitive processes. Pashler (1998) approaches the question analytically,

considering possible functions for it: sensations of effort may be an evolutionary advantage in that they

conserve mental energy, or they may serve to warn of the impending failure of an overloaded brain.

However, he concludes that empirical evidence has not lent itself to one or the other. A case study by

Naccache et al. (2005) of a brain-injured subject indicates that although subjective effort may

normally be linked to conscious control, it is not a prerequisite. Between a Stroop task that was

congruent (ink colour matched word) and incongruent, the participant did not perceive any change

in mental effort nor did they exhibit an increased skin conductance (arousal) response.




However, there are scenarios in which subjective effort ratings are appropriate. In the

aforementioned rehabilitation treatment application, an indication of effort sensation would be

suitable as a predictor of frustration. Nonetheless, care must be taken in designing the rating system,

to ensure that it actually reflects participants’ experiences. Referring to the close correlation between

subjective ratings and task difficulty, Gopher & Donchin (1986) assert that subjective ratings of

effort are no more than thinly disguised task analyses. That is, they suggest that it is naive to assume

that participants cannot recognize experimental manipulations of task difficulty, so that subsequent

ratings are more indicative of participants’ observations of these manipulations rather than their

“feelings” of effort. According to O’Donnell & Eggemeier (1986), the solution is careful wording of

the subjective rating system to ensure that participants understand what they are being asked to

report. An interesting footnote in the previously discussed study of Naccache et al. (2005) suggests

that it is reasonable to expect people to differentiate between imposed demands and perceptions of

effort. Although the brain injured subject said that they did not feel any change in effort, they were

able to distinguish a change in the objective difficulty of the task.

There are a number of subjective workload rating systems with different instructions and

administration methods. They can all generally be classed as unidimensional or multidimensional

scales. For the unidimensional type, participants rate their workload on a single numeric or

geometric scale. For the multidimensional type, more than one scale is used to differentiate between

different aspects of workload (e.g., time load versus stress load), and then the scales are combined into a

single measure.

Two variations of unidimensional systems are guided scales and visual analog scales. The Modified

Cooper-Harper and Bedford scales are guided, meaning that they use a flowchart to assist the




participants in choosing a workload rating. From the samples published by Eggemeier & Wilson

(1991), it is clear that the utility of these particular scales is limited to the evaluation of engineered

systems. There are three unidimensional rating systems that have been validated against the more

widely researched multidimensional systems (discussed below): the 21-point Overall Workload Scale

(OWS, Vidulich & Tsang, 1987), the 15-point Rating Scale Mental Effort (RSME, de Waard, 1996),

and the open-ended magnitude estimation method of Gopher & Braune (1984). Whereas the OWS

involves a scale with the title “Overall Workload” and two scale labels at either end (“high” and

“low”), the RSME asks “how much effort” the task required and includes nine descriptive labels

(from “absolutely no effort” to “extreme effort”). Gopher & Braune’s (1984) scale uses a different

approach, giving participants a reference task to which a rating (number) is assigned. Participants

are then asked to assign a rating to subsequent tasks without any further limitations. There are also

examples of unidimensional scales that have been used without any documented attempt at

validation. Kennedy (2000) used a “mental demand” visual analog scale to successfully differentiate

between counting by 7’s and 3’s from a 3-digit number. Bergeur (2001) asked surgeons to use a 7-

point analog scale of “mental concentration and mental stress” to rate their mental effort during 2-

minute procedures.

The NASA-Task Load Index (NASA-TLX) and the Subjective Workload Assessment Technique

(SWAT) are the two most common multidimensional rating systems, and they are described in some

detail by Eggemeier & Wilson (1991). The NASA-TLX consists of six 21-point scales (mental

demand, physical demand, temporal demand, performance, effort, and frustration) and a set of 15

paired comparisons to establish the relative contribution of each scale to the subject’s perceived

workload. The results of the paired comparison section are used to create a weighted average of all

the scales: the overall workload rating. The SWAT possesses only three 3-point scales (time load,




mental effort load, and psychological stress load). For the purpose of establishing scale weights, 27

cards, one for each possible combination of levels on the three scales, are sorted into an order that reflects the

subject’s perception of increasing workload. Unlike the paired comparison in the NASA-TLX, the

SWAT sorting procedure takes place once: before the subject is asked to carry out the experimental

task.
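
The weighting arithmetic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the official scoring procedure of either instrument: the function name `tlx_overall`, the rating values, and the paired-comparison tallies are all hypothetical, and the ratings are assumed to share one numeric scale.

```python
from itertools import combinations, product

TLX_SCALES = ["mental demand", "physical demand", "temporal demand",
              "performance", "effort", "frustration"]


def tlx_overall(ratings, tally):
    """Weighted average of the six scale ratings.

    ratings: scale name -> rating (all on the same numeric scale)
    tally:   scale name -> number of paired comparisons in which that
             scale was chosen as the larger contributor to workload
    """
    n_pairs = len(list(combinations(TLX_SCALES, 2)))  # 6 choose 2 = 15
    if sum(tally.values()) != n_pairs:
        raise ValueError("tallies must sum to the number of pairs")
    return sum(ratings[s] * tally[s] for s in TLX_SCALES) / n_pairs


# Hypothetical participant data (illustrative values only).
ratings = {"mental demand": 70, "physical demand": 10, "temporal demand": 50,
           "performance": 40, "effort": 60, "frustration": 30}
tally = {"mental demand": 5, "physical demand": 0, "temporal demand": 3,
         "performance": 2, "effort": 4, "frustration": 1}

overall = tlx_overall(ratings, tally)

# The 27 SWAT cards: every combination of three levels on three scales.
swat_cards = list(product((1, 2, 3), repeat=3))
```

Setting every tally to the same value reduces the weighted average to a plain mean of the six ratings.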

The incorporation of conjoint scaling (card sorting task) is intended to impart interval properties to

the rating, but it limits the practical scale length and, therefore, theoretical sensitivity of the SWAT,

for which convergent validity has been confirmed in comparisons with the TLX (Rubio et al., 2004).

Reid & Nygren (1988) have shown that SWAT ratings can indeed possess interval properties (i.e.,

two different ratings can be compared not only in terms of which one is higher, but by how much).

However, Annett (2002) argues that only scales for which there have been norms established for the

intended subject populations and tasks should truly be considered to have interval properties.

Fortunately, he says that only ordinal properties, which are implicit in any subjective scale, are

required for the experimental comparison of conditions, as opposed to establishing general design

standards. As a side note, Cain (2007) recommends the use of non-parametric analysis methods for

subjective rating data due to their ordinal nature.
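
As an illustration of an analysis that relies only on ordinal information, the following sketch applies an exact sign test to paired ratings from two conditions. The sign test is chosen here for brevity and is not a method named by Cain (2007); the function name and the ratings are hypothetical.

```python
from math import comb


def sign_test(ratings_a, ratings_b):
    """Exact two-sided sign test for paired ordinal ratings.

    Only the direction of each within-participant difference is used,
    so no interval properties are assumed. Ties are dropped.
    Returns (n_positive, n_negative, p_value).
    """
    diffs = [b - a for a, b in zip(ratings_a, ratings_b) if a != b]
    n = len(diffs)
    pos = sum(d > 0 for d in diffs)
    neg = n - pos
    k = min(pos, neg)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)


# Hypothetical effort ratings from eight participants in two conditions.
easy = [3, 5, 4, 6, 2, 5, 4, 3]
hard = [6, 7, 4, 9, 5, 8, 6, 7]
pos, neg, p = sign_test(easy, hard)
```

Because the test discards the size of each difference, it remains valid even if the rating scale has no interval properties, at the cost of some statistical power.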

The theoretical advantage of multidimensional ratings systems over unidimensional systems is the

ability to discern specific aspects of workload (“diagnosticity”). However, because there is no

apparent advantage in terms of sensitivity (Tsang & Vidulich, 1987; Hill et al., 1992; Zijlstra, 1993), it

is recommended that unidimensional ratings be used wherever diagnosticity is not needed (Hendy,

1993). Furthermore, Rubio et al. (2004) suggested that the diagnostic value of the NASA-TLX and SWAT

is questionable because these scales do not actually refer to the resource types in Wickens’ (1987)




multiple resource model. The obvious advantage of a unidimensional rating is its simplicity, which

can be reflected in the time to complete a rating. The SWAT is particularly problematic in this

regard, considering that the card sorting task can take as long as an hour to complete (Wierwille &

Eggemeier, 1993). Although the NASA-TLX has been reported to require only 60 seconds (on

average) for trained participants (Hill et al., 1992), its complexity has caused confusion in pilots

(Veltman, 1993). Furthermore, administration of the TLX to untrained (civilian) participants was

reported by Rubio et al. (2004) to take an average of 7.5 minutes, amounting to 60 minutes for the

eight different ratings.

A notable method of reducing the complexity of multidimensional rating systems is to discard the

scale weighting procedure (i.e., SWAT card sorting and TLX paired comparison set), instead

calculating overall workload as an equally weighted average. Following Hendy’s (1993)

successful demonstration of this method for TLX ratings, Goonetilleke & Luximon (2001) have

introduced a continuous SWAT (C-SWAT) system, which uses a visual analog scale in the place of

the original, discrete (three point) scale. In their evaluation of the C-SWAT, they report that the highest

sensitivity was achieved using this continuous scale and equal scale weights, as opposed to using

weights assigned by paired comparison sets (as are used by the TLX) or the usual card-sorting

method.

2.3.2 Performance-Based Measures

In a previous section, Theory of Mental Workload, two behavioural manifestations of mental

workload were identified: task performance and the perception of effort. It has previously been

discussed that subjective effort ratings have been shown to diverge from performance measures due




to limitations in our ability to self-monitor. Performance similarly cannot be considered an

uncomplicated measure of workload.

Indeed, the relationship between task demands and effort is complex and, therefore, not always

predictable, which leads Vidulich & Wickens (1986) to warn researchers against the careless

interpretation of performance and subjective effort ratings. To illustrate with an example: in the face

of increasing task demands participants’ performance scores generally decrease, but it is certainly

plausible in many scenarios that participants might work harder in order to maintain or even

increase task performance, such as was observed by Washburn & Putney (2001). Thus, when a

researcher observes an increase in task performance scores, additional measures are required to

clarify whether the subject is finding the task easier (i.e., learning of some type has occurred; anxiety

has decreased) or working harder.

Performance measures are also insensitive to workload changes in data-limited task scenarios, which

are often characteristic of low demand tasks (Eggemeier et al., 1982). When task performance is

already at ceiling, any further exertion will obviously not affect performance. Furthermore, the

addition of stressors may not affect performance on a low demand task. Take, for example, a

memory span task wherein five digits are presented, and then they must be recited back after a short

pause. It is clearly impossible to recall more than five digits, no matter the effort expended. Also, if

the task was made more difficult through the additional stress of noise or uncomfortable heat, the

subjective effort rating would likely be affected, while performance in healthy participants most

likely would not. Therefore, while performance-based measures may have utility in more demanding

conditions, which is most often the case with dual task experiments, they can be insensitive to both

internally and externally driven changes in workload in low demand, single-task conditions. Notice




that this limitation is the primary impetus behind incorporating another measure of mental

workload into neuropsychological assessments.

There is also a possibility that different performance measures of a single task may diverge (i.e., show

a speed-error trade-off). For example, most self-paced, discrete response tasks require participants to

strike some balance between accuracy and response latency, leaving them vulnerable to a speed-error

trade-off. Farmer & Brownson (2003) emphasize the importance of measuring both variables,

though an effort interpretation may not be possible in the event that they diverge. However, they

state that divergence is the exception rather than the rule, so measurement of the second variable is

usually only necessary as verification. It is also conceivable that participants may concentrate their

efforts on a performance variable that the experimenters are not measuring or even aware of.

Experimental design, including the choice of test instructions, appropriate control tasks and post-

experimental interview (to obtain a verbal report of subjects’ strategies) must be employed to avert

this problem.

As previously discussed, task performance is sometimes affected by wilful changes in effort for a

given task. Although an increase in performance may indicate a shift to a more efficient but also

effortful mental processing state (or strategy), it could also indicate task learning, which results in

higher efficiency but with constant or reduced effort. It is also plausible that participants may initiate

gross shifts in task execution strategy that would not generally be classified as “learning.” An

example might be the employment of sub-vocal rehearsal during the memory span task. Although

such gross shifts in strategy may be controlled through task training, detailed instructions, and

comprehensive interviewing, more subtle shifts in mental processing strategy may be undetectable.




For this reason, performance measures must be corroborated with subjective ratings and validated

physiological measures.

2.3.3 Physiological Measures

In the search for more sensitive, objective, and instantaneous measures of workload, researchers

have turned to physiological signals. Whereas early studies might have concentrated on a single

measure in their analysis of task workload, more contemporary studies tend to employ multiple

physiological measures as well as performance and subjective rating indices (Kramer, 1991). The

rationale is that the sensitivity of each measure tends to vary with context as well as with the

type and intensity of workload. A recent trend in workload research is to combine measures into a

predictive model that can be adapted from empirical results using, for example, neural network

training (Van Orden et al., 2001; Noel et al., 2005) or multiple regression analysis (Myung & Ryu,

2005).

Although these approaches may yet achieve success in their practical aims, they may be premature

when the underlying bases of the individual measures require further investigation. In 1988, Moray

wrote that progress in physiological measurement has been hampered by undeveloped theoretical

foundations; with little examination of these underlying mechanisms since, this is arguably still true

today. In a similar vein, it has been observed that measurement research has moved to complex,

applied environments such as aviation and driving, but conflicting results indicate that there is a

great deal more controlled laboratory work to be done. Thus, research on physiological measures

can still be considered in its infancy, having demonstrated a number of interesting phenomena, but

demonstrating little by way of synthesizing them (Kramer, 1991).




2.3.3.1 Electroencephalographic Activity (EEG)

EEG-based measurements of workload have been attempted in two forms: frequency band power

and event related potentials (ERP). Both forms are subject to the same practical difficulties,

especially in clinical applications: low signal-to-noise ratio (environmental as well as muscle movement

and cardiac artefacts), complex signal interpretation, and intrusiveness. These difficulties may limit

the practicality of this measure for some clinical applications.

EEG spectral analysis is the more controversial of the two workload measurement techniques. Although variations

in alpha (8-13 Hz) and theta (4-7 Hz) band power have been correlated with task difficulty in some

studies, others have revealed serious complications arising from individual differences and the

confounding factor of overall arousal level (Kramer, 1991). Furthermore, O’Donnell and Eggemeier

discounted the utility of this measure on the grounds of its insensitivity in their widely cited review

(1986). However, more recent studies of relatively fine gradations in one-dimensional tracking task

difficulty (verified with subjective workload ratings) revealed a significant effect on a combined

alpha band and blink rate measure (Myung & Ryu, 2005) as well as on a multi-band composite

measure (Berka et al., 2007). In contrast, field observations of pilot workload by Noel et al. (2005)

revealed no pattern in any frequency bands.

Researchers have identified a number of characteristic EEG signal peaks that occur in response to

a subject’s active processing of a discrete event. Experimentally, ERPs are elicited using auditory or

visual stimuli that are either embedded in the task or presented as a secondary task. One of these

ERPs in particular, the P300 (a positive potential occurring between 250 and 500 ms after stimulus

presentation), is the most commonly studied in the workload literature. Kramer (1991) cites a

number of studies in which P300 amplitude correlates with changes in both task demands and priority




shifts between the primary and secondary task in dual-task experiments. The effects of task difficulty

manipulations on the P300 suggest an added complexity to the measure. In reaction time

experiments, response selection difficulty affected only P300 amplitude while manipulations of

perceptual difficulty affected both latency and amplitude (Wickens & Hollands, 2000).
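
The two P300 parameters discussed above, amplitude and latency, can be read from an averaged waveform as the largest positive value in the 250 to 500 ms post-stimulus window. The sketch below is illustrative only: the function name, the sampling rate, and the synthetic waveform are assumptions, and it presumes that averaging and baseline correction have already been done and that sample 0 coincides with stimulus onset.

```python
import math


def p300_peak(erp_uv, fs_hz, window_ms=(250, 500)):
    """Return (latency_ms, amplitude_uv) of the largest positive value
    within the post-stimulus window. Sample 0 is stimulus onset."""
    start = int(round(window_ms[0] * fs_hz / 1000))
    stop = int(round(window_ms[1] * fs_hz / 1000))
    segment = erp_uv[start:stop + 1]
    idx = max(range(len(segment)), key=segment.__getitem__)
    return (start + idx) * 1000 / fs_hz, segment[idx]


# Synthetic averaged ERP: an 8 microvolt Gaussian bump centred at 350 ms.
fs = 500  # Hz, hypothetical sampling rate
erp = [8.0 * math.exp(-((i * 1000 / fs - 350) / 50) ** 2) for i in range(400)]
latency, amplitude = p300_peak(erp, fs)
```

Reporting latency and amplitude separately matters here because, as noted above, the two parameters do not always respond to the same manipulations.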

2.3.3.2 Surface Electromyography (EMG)

Tonic tension in task irrelevant muscles has been associated with various components of mental

workload. The theoretical basis for this observation is that muscle tension is a component of

activation, which is a concomitant of effort and response performance. In a review, Unema

(1995) cites studies in which changes in tonic forearm EMG amplitude are linked to monetary

motivation, task learning, subjective effort ratings, and reaction time performance. In the reaction

time experiments, muscle tension effects were observed just before the actual response was made.

However, the results of Unema’s own experiments, which were composed of Sternberg memory

tasks (Sternberg, 1969), did not consistently support these studies.

According to de Waard (1996), EMG studies of workload have more recently favoured facial sites

rather than forearm sites, specifically the “lateral frontalis muscle, the corrugator supercilii and

orbicularis oris inferior.” De Waard suggests that the frontalis muscle is especially suited to mental

workload measurement, because the others have been found to respond to emotionally charged

stimuli. However, forearm EMG sites have continued to be used, such as in the study of Papadelis et

al. (2007), which reported a significant effect on forearm extensor muscle EMG activity from flight

simulation task difficulty.

As with EEG-based measures, the clinical use of EMG is limited by practical issues of signal-to-noise ratio

and, to a lesser extent, intrusiveness. Furthermore, physiological and anatomical variability limit its




use to within-subjects designs. Contradictory results have led O’Donnell & Eggemeier (1986) to

question the simplicity of its interpretation, asserting that the EMG signal may indicate not only

sympathetic activity, but also somatic efforts to counteract poor motor performance as a result of

sub-optimal sympathetic activity.

2.3.3.3 Electrocardiogram (ECG)

Heart Rate (HR)

Perspectives on HR as a mental workload measure are mixed. Wilson (1991) is a cautious proponent

of HR, which was a U.S. government certified measure of aviation workload at the time of writing.

While studies of flight simulation did not report reliable effects, HR correlates with difficulty ratings

by telemarketers. He suggests that HR reflects a degree of psychological stress that only occurs where

there are perceived consequences. This view is corroborated in Wilson’s more recent (2002) study,

wherein HR is found to be more sensitive than HRV to manipulations of flight demands. Papadelis

et al. (2007) also report a correlation between HR and the perceived difference in the difficulty of

passive versus active learning of a simulated dual tracking/vigilance task. However, there were no

special consequences to poor performance in this task.

In contrast to Wilson, O’Donnell & Eggemeier (1986) mention HR only briefly, suggesting that its

global sensitivity limits its usefulness. de Waard (1996) echoes this concern, specifying speech,

emotion, time-on-task, and physical exercise as important confounds. He cites the study of

Wierwille et al. (1985), which suggests that heart rate variability (HRV) is a more sensitive measure

of workload than HR. There have been attempts to interpret the contradictory HR findings of

previous studies in terms of various workload types, but the complexity of this interpretation has led

most to look toward HRV instead (Kramer, 1991).




Heart Rate Variability

HRV data have been analyzed using the time domain or the frequency domain, but the latter

dominates the workload literature. The advantage of spectral analysis is that changes in the spectral

power density of the various frequency bands are thought to correspond to specific physiological

phenomena. Mulder (1985) suggested that spectral data should be categorized into low (0.02 - 0.06

Hz), medium (0.07 - 0.14 Hz), and high (0.15 - 0.50 Hz) bands, which each contain peaks caused by

HR oscillations due to temperature regulation, blood pressure regulation, and respiratory influences,

respectively. Jorna (1992) clarifies this explanation by describing the two primary causes of HRV: the

baroreflex and respiratory sinus arrhythmia (RSA). The baroreflex is a blood pressure control system

that can regulate sympathetic nervous system responses in peripheral resistance, venous tone,

ventricular contractility, and blood volume, as well as both sympathetic and parasympathetic (vagal)

responses in HR. RSA refers to the spontaneous speeding and slowing of the heartbeat that

accompanies each breathing cycle, a pattern that is predominantly caused by the down- and up-

regulation of parasympathetic activity at the sinoatrial node. Due to the differential effect of the

sympathetic and parasympathetic systems on the various frequency components of HRV, Jorna

argues that low frequency power (< 0.10 Hz) is indicative of sympathetic activity, high frequency

(RSA) power of parasympathetic activity, and their ratio of sympathovagal balance. However, only

the parasympathetic/high frequency correlation is supported in the review of Berntson et al. (1997),

with the caveat that respiratory frequency must not be abnormally low.
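
To make the band definitions concrete, the following sketch sums spectral power within the three bands attributed to Mulder (1985) above. It is a simplified illustration under stated assumptions: the heart rate series is taken to be already resampled at a fixed rate, the signal is synthetic, the function name is hypothetical, and a naive discrete Fourier transform stands in for the detrending, windowing, and FFT routines that a real analysis would use.

```python
import cmath
import math

# Band limits in Hz, as given in the text for Mulder (1985).
BANDS_HZ = {"low": (0.02, 0.06), "mid": (0.07, 0.14), "high": (0.15, 0.50)}


def band_powers(series, fs_hz):
    """Sum periodogram power within each band (naive DFT, mean removed)."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    powers = dict.fromkeys(BANDS_HZ, 0.0)
    for k in range(1, n // 2 + 1):
        freq = k * fs_hz / n
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power = abs(coeff) ** 2 / n ** 2
        for name, (lo, hi) in BANDS_HZ.items():
            if lo <= freq <= hi:
                powers[name] += power
    return powers


# Synthetic heart rate (beats per minute) sampled at 2 Hz for 100 s:
# a 0.10 Hz oscillation (mid band) plus a weaker 0.25 Hz one (high band).
fs = 2.0
n = 200
hr = [70 + 3 * math.sin(2 * math.pi * 0.10 * t / fs)
      + 1 * math.sin(2 * math.pi * 0.25 * t / fs) for t in range(n)]
powers = band_powers(hr, fs)
```

With this synthetic input, nearly all of the power falls in the mid and high bands, mirroring the blood pressure and respiratory components described above.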

Most studies of HRV and workload follow in the footsteps of Mulder’s early work (1979) by focusing

on spectral power reductions in the middle frequency band, although there is some evidence for low

and high frequency effects as well (Kramer, 1991; Mulder, 1992). Jorna (1992) presents a critical

review of these various investigations and concludes that HRV is not sensitive to fine gradations in




task difficulty but responds consistently to stressors and correlates well with perceived effort ratings.

Task difficulty manipulations such as pacing, visual stimulus quality, number of response options,

stimulus-response compatibility, stimulus timing uncertainty, and tracking task complexity did not

significantly affect HRV, or in some cases the effects could be explained by concomitant changes in

physical activity or breathing. However, significant decreases in HRV were observed after coarse

manipulations of difficulty such as task/rest comparisons or the introduction of a secondary task as

well as gross changes in working memory demands that exceeded participants’ capacities.

Furthermore, HRV has been shown to correlate with between-subject effort ratings on a tracking

task, and is sensitive to task stressors such as time-load, inexperience, supervision/observation

during tasks, public speaking, and driving complexity. The influence of stressors in realistic task

scenarios may help to explain the correlation between HRV and piloting/driving difficulty in the

field studies described by Wilson & Eggemeier (1991).

Turning toward more recent studies of mid-band HRV power, Paas et al. (1994) found that HRV

was insensitive to changes in the difficulty of a learning strategy, but responded to the presence of

the task (versus a resting state). Unema (1995) discovered that during the search phase of a

numerical Sternberg Task (Sternberg, 1969), HRV responded to working memory load (1 versus 4

digits) and the introduction of a monetary incentive. Hilburn (1997) reported a significant effect for

changes in air traffic control automation level and traffic volume. Veltman & Gaillard (1998) found

correlations between HRV, “large” changes in flight simulation task difficulty, and subjective effort

ratings. In a study of finer difficulty gradations in a dual task, Myung & Ryu (2005) found that HRV

was significantly correlated with tracking task difficulty but not with concurrent arithmetic problem

difficulty, although both reliably affected subjective effort ratings. Thus, HRV may fail to respond to

changes in objective task difficulty even while subjective effort ratings are responsive.




Comparisons between studies are often complicated by the multitude of analysis options, but a

movement toward standardization has been underway through the 1990s (Cain, 2007). Notably, it is

common to see two types of spectral analysis methods: the fast Fourier transform and

autoregressive modelling. Berntson et al. (1997) explain that because the latter technique is designed

to exclude noise, it generally produces cleaner (although possibly simplified) spectra. However,

similarities between the two methods lead to essentially equivalent results.

HRV is not a robust measure in the sense that the effect of a single false-positive or false-negative R-wave

detection can potentially outstrip that caused by the manipulation of workload. Real-time, clinical HRV

measurements may be limited by the necessity of detecting and correcting such artefacts, a process

that is dealt with at length by Mulder (1992). Also, HRV is not an instantaneous measure, with

contemporary analysis techniques requiring a minimum window size of 30 – 40 seconds for middle

band power measurements (Mulder, 1992). Very lengthy windows are also problematic,

because the average heart rate should optimally remain constant.
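
The sensitivity of HRV to a single detection error can be illustrated numerically. The sketch below simulates one missed (false-negative) R-wave in a synthetic series of interbeat intervals and compares a simple time-domain variability statistic before and after. The standard deviation is used only for brevity and is not the middle band power measure discussed above; the function name and the interval values are hypothetical.

```python
from statistics import pstdev


def merge_missed_beat(ibis_ms, index):
    """Simulate a false-negative R-wave: the beat that ends interval
    `index` is missed, so that interval fuses with the next one."""
    fused = ibis_ms[index] + ibis_ms[index + 1]
    return ibis_ms[:index] + [fused] + ibis_ms[index + 2:]


# Synthetic interbeat intervals (ms) with a small, regular fluctuation.
clean = [800 + 20 * ((i % 4) - 1.5) for i in range(60)]
corrupted = merge_missed_beat(clean, 30)

sd_clean = pstdev(clean)          # variability of the intact series
sd_corrupted = pstdev(corrupted)  # variability with one missed beat
```

In this synthetic case a single missed beat out of sixty inflates the standard deviation severalfold, which is why artefact detection and correction precede any workload interpretation.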

Furthermore, it is generally recognized that HRV is affected by age, physical fitness level, body

position, muscle activity, and respiration patterns (Jorna, 1992). Respiration is especially important

because it is affected by speech, which is a part of many mental tasks. Breathing becomes more

erratic during speech, which shifts the spectral power due to RSA from the high frequency band into

lower frequencies. As a result, the effect of effort on middle frequency power may be concealed or

otherwise affected. Porges & Byrne (1992) argue that the effect of speech is negligible where

the task involves short, command-like verbalizations (< 10 s), which would suggest that most

psychological tasks would not require any special consideration in this regard. However, a recent

study by Beda et al. (2007) reports that while middle band power decreases between silent serial




subtraction and the resting state (as predicted), the effect is reversed when the task is done aloud.

Due to the effects of speech as well as spontaneous changes in breathing patterns, it is recommended

that respiratory rate measurements always accompany those of HRV, in order to assess the

frequency distribution of RSA (Mulder, 1992). Mulder also proposes an improved measure of

baroreflex gain that is resistant to the effects of respiration: the “modulus,” which is the blood

pressure variability divided by the HRV. Veltman & Gaillard (1998) confirm the effectiveness of this

measure in their study of a flight simulator task that involves sub-audible vocalizations.

2.3.3.4 Electrodermal Activity (EDA)

In the workload literature, EDA generally refers to the measurement of the skin’s conductance

through the application of a small current, although there is also a much less common technique that

does not involve an external current source. Skin conductance is thought to indicate sympathetic

activity due to its influence on eccrine sweat gland secretions, although the possibility of

parasympathetic involvement has been raised (Unema, 1995). In a more general sense, EDA is linked

to the concept of arousal, supported by studies of stimulant/depressant injection, EEG desynchrony,

and habituation (Prokasy & Raskin, 1973). It follows that the theoretical justification for EDA in

workload measurement is that a more aroused state is associated with greater engagement in the task.

EDA measures are classified as phasic (“response”) or tonic (“level”), and phasic measures are

furthermore divided between specific and non-specific (“spontaneous”) responses. Phasic portions

of the EDA signal are temporary increases in conductance from the baseline (tonic) level of

conductance, which is an average taken across the task period. Specific responses are distinguished

from non-specific ones because they are identified as being caused by the presentation of some

experimental stimuli.
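
The distinction between tonic level and phasic responses can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the conductance trace, and the 0.05 microSiemens amplitude criterion are hypothetical, and the sketch does not separate specific from non-specific responses, which would additionally require the stimulus presentation times.

```python
def tonic_level(conductance_us):
    """Tonic level: mean conductance across the task period."""
    return sum(conductance_us) / len(conductance_us)


def count_responses(conductance_us, min_rise_us=0.05):
    """Count phasic responses: rises from a trough to the following
    peak whose amplitude is at least `min_rise_us`."""
    count = 0
    trough = conductance_us[0]
    prev = conductance_us[0]
    rising = False
    for value in conductance_us[1:]:
        if value > prev:
            if not rising:
                trough = prev  # a new rise starts from the previous sample
                rising = True
        elif value < prev and rising:
            # The previous sample was a peak; test its amplitude.
            if prev - trough >= min_rise_us:
                count += 1
            rising = False
        prev = value
    # A rise still in progress at the end of the trace also counts.
    if rising and prev - trough >= min_rise_us:
        count += 1
    return count


# Hypothetical skin conductance trace in microSiemens.
signal = [2.00, 2.00, 2.01, 2.00, 2.00, 2.10, 2.20, 2.15, 2.05, 2.00,
          2.00, 2.08, 2.02, 2.00]
level = tonic_level(signal)
n_responses = count_responses(signal)
```

In this trace the 0.01 microSiemens fluctuation is treated as noise, while the two larger rises are counted as responses.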




Tonic EDA is limited as a measurement tool because it is highly susceptible to inter-subject

physiological differences and electrode interface conditions. However, there have been relevant

investigations of this measure. Malmo (1965) concluded that tonic EDA was not significantly

correlated with participants’ motivation level during an auditory tracking task, although finger sweat

blot readings were correlated. In Malmo’s experiment, participants were instructed to exert more

effort during “important” trials versus “practice” trials. On the other hand, Bergeur et al. (2001)

found that tonic EDA level and subjective effort covaried between two different surgical techniques.

The presentation of stimuli is generally too intrusive for most practical workload applications, so

specific EDA is also of limited utility. However, in laboratory studies of dual task conditions, larger

specific responses have coincided with secondary task performance decrements (Klinger, 1991). The

magnitude of participants’ responses has also been linked to their performance in correctly

perceiving (in a signal detection task) or memorizing (in a learning task) the stimuli, according to a

review by Andreassi (2000).

Non-specific responses are arguably the most practical EDA-related workload measure, and

empirical data suggests at least a weak link between workload and non-specific response rate and

amplitude (Klinger, 1991). This link is best supported by evidence of a correlation between response

rate and reaction time during vigilance tasks (Surwillo & Quilter, 1965; Andreassi, 2000). In

particular, Andreassi describes a between-subjects study in which reaction times were significantly

different between those participants that exhibited high non-specific response rate (“labiles”) and

those that had low rates (“stabiles”). After failing to find a link between motivational incentives and

response rate, Fowles (1988) argued that non-specific EDA accompanies only negative feedback,

which results in behavioural inhibition. Given the pervasive role of inhibition in any mental task and




arguably, in the function of working memory itself (Hasher, 2007), it is difficult to directly test this

position. More recent within-subjects studies report significant effects of various workload

manipulations on non-specific responses. Gendolla & Richter (2005) reported that non-specific

response rate was significantly higher when participants were told that a visual detection task was a

“Concentration and Achievement Test for Students” versus a “filler” activity. Subject motivation and

short-term memory load (i.e., target list length) were also correlated with the standard deviation of

the raw EDA signal in Unema’s study (1995), which employed a numerical Sternberg Task

(Sternberg, 1969). In this study, the effect was especially apparent in the target list memorization

stage of the task.

Non-specific responses are commonly differentiated from signal noise by their peak-to-peak

amplitude. Amplitude thresholds can vary between 0.002 and 0.05 microSiemens (Doctor et al., 1964;

Vossel & Zimmer, 1990; Kettunen & Ravaja, 2000; Storm et al., 2000), as they are dependent on the

precise placement of the electrodes as well as their preparation and type. It appears that the

threshold is best determined through visual inspection of the signal, this being the gold standard

method employed by Storm et al. (2000) in their evaluation of an automated response detection

algorithm. Aperiodicity is another condition that has been used to distinguish non-specific

responses (Doctor et al., 1964). This condition was enforced in Storm’s algorithm through a

minimum (one second) wave width, which was defined as the time between a response’s valley and

subsequent peak. This threshold effectively eliminated “responses” due to a sinusoidal signal

component, and would also be effective in disregarding movement artefact noise.
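
The two criteria described above (an amplitude threshold and a minimum valley-to-peak width) can be illustrated with a minimal sketch. This is not Storm et al.'s algorithm; the function name, the default threshold of 0.02 microSiemens (an arbitrary value inside the 0.002 to 0.05 range quoted above), and the simple valley and peak search are assumptions made for illustration only.

```python
from typing import List, Tuple


def detect_nonspecific_responses(
    conductance: List[float],
    sample_rate_hz: float,
    min_amplitude_us: float = 0.02,  # assumed value within the 0.002-0.05 range
    min_rise_time_s: float = 1.0,    # minimum valley-to-peak width
) -> List[Tuple[int, int, float]]:
    """Return (valley_index, peak_index, amplitude) for each accepted response."""
    responses = []
    n = len(conductance)
    i = 0
    while i < n - 1:
        # Walk forward to a local valley: the last sample before the signal rises.
        while i < n - 1 and conductance[i + 1] <= conductance[i]:
            i += 1
        valley = i
        # Climb to the following local peak.
        while i < n - 1 and conductance[i + 1] > conductance[i]:
            i += 1
        peak = i
        if peak == valley:
            break  # reached the end of the signal without another rise
        amplitude = conductance[peak] - conductance[valley]
        rise_time_s = (peak - valley) / sample_rate_hz
        # Keep only rises that are both large enough and slow enough.
        if amplitude >= min_amplitude_us and rise_time_s >= min_rise_time_s:
            responses.append((valley, peak, amplitude))
    return responses
```

In this sketch a rapid deflection, such as a movement artefact lasting a few tenths of a second, is rejected by the width criterion even when its amplitude exceeds the threshold. Real recordings would normally be low-pass filtered first, since sample-level noise would otherwise break a single rise into many small ones.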

According to de Waard (1996), the most serious methodological issue with EDA measurement is its

“global sensitivity.” Not only is eccrine gland activity influenced by energetical responses to




emotional and workload-related factors, it is also subject to temperature, humidity, age, sex, time of

day, season, and respiration irregularities.

2.3.3.5 Respiration

de Waard (1996) and Wilson & Eggemeier (1991) cite laboratory studies in which respiratory rate is

positively correlated with both task difficulty and compensatory effort level. Much of this research is in

applied studies of aviation workload, although there are also more recent examples of unsuccessful

attempts in this area (Noel et al., 2005; Papadelis et al., 2007). Wientjes (1992) has argued that such

equivocal results necessitate the measurement of both respiratory rate and tidal volume, as their

work reveals a more complex pulmonary response involving both variables. While the introduction

of their experimental task led to the expected increase in respiratory rate and decrease in tidal

volume, subsequent manipulations of motivation through performance feedback led to an increase

in tidal volume with no change in respiratory rate. Wientjes (1992) presents a model in which

pulmonary control is affected by the metabolic demands of cognitive work; that is, as more effort is

expended, more oxygen is required. Although there is some evidence for the effect of cognitive work

on oxygen consumption (Backs & Seljos, 1994), the predominant pattern of fast, shallow breathing

indicates a strong influence from arousal and emotive stress responses (Roscoe, 1992).

In addition to being potentially difficult to interpret, respiratory measures are also subject to

significant confounding effects from physical activity and speech. As such, Cain (2007) advises that

respiration cannot be used alone to indicate workload. de Waard (1996) also expresses concerns with

the methodological difficulties in measuring tidal volume, which either involves a relatively invasive

flow meter or an indirect method (such as plethysmographic bands on the chest and abdomen) that

is less accurate and may require frequent calibration.




2.3.3.6 Eye Blinks

Blinks are classified as endogenous when they are not caused by any apparent external stimulus and

as reflexive when they occur in response to sudden, task irrelevant stimuli (Stern et al., 1984).

Examples of experimental stimuli are flashes of light and tapping on the forehead. The occurrence,

duration, and (reflexive blink) stimulus latency of a blink can be measured using photographic,

video scanning, infrared corneal reflection, electrooculography, and electromyography techniques.

Neumann & Lipp (2002) conducted experiments showing that the extent of subjects’ engagement in

a task may be indicated in the magnitude and latency of reflexive blinks. Although Kramer (1991)

recommends blink latency in his review of workload measurement techniques, the majority of the

literature concentrates on endogenous blinking, as its measurement is suited to a wider variety of

applications.

According to a review of the cognition literature by Stern et al. (1984), endogenous blink rate is

responsive not only to visual task workload, being inhibited until task-relevant information has been

processed, but also to non-visual workload, with the “...magnitude of blink inhibition being

proportional to attentional demands” (Stern et al., 1984, p. 26). The latter relationship is supported

by the findings of Holland & Tarlow (1972), in which blink rate is negatively correlated to the

number of digits in a memory span task and the difficulty of a paced arithmetic task. Further

supporting evidence was reported by Bagley & Manelis (1979), who used an arithmetic task of

variable difficulty. Finally, Ohira (1996) showed that the relationship extends to lexical workload by

using a word-naming task with variable target word difficulty. Stern et al. (1984) caution that

vocalization during the task can actually cause the reverse effect, as with the increase in blink rate

that occurs when participants are asked to carry out arithmetic problems aloud. A study by Holland

& Tarlow (1975) seems to indicate that vocalization may not be the only confounding factor in the




relationship between workload and blink rate. Their research suggests that endogenous blinks are

related to shifts in thought, termed “cognitive change.” This conclusion has been based on the

observations that blinks tend to punctuate individual solutions during a serial arithmetic task and

sentences during verbal conversation, while being subject to inhibition during mental imagery.

Blink rate has been used as an applied workload measurement tool with mixed success. At the time

of Kramer’s (1991) and Wilson & Eggemeier’s writing (1991), there were a number of conflicting

blink rate studies involving both visual and non-visual task demand manipulations. Whereas these

results led Kramer to suggest that blink rate was not yet ready for application, Wilson & Eggemeier

explained that the conflicting findings followed a pattern: blink rate was contaminated by qualitative changes in

visual information demands. An example was a study in which blink rate was shown to increase

between ground and flight segments, though workload had clearly increased. It was explained that

the “richness and variety” of task-relevant visual information in the flight segment simply required a

greater number of fixations from the pilots. They claimed that these problems could be overcome

through better methodological controls.

The prospect of restrictive controls led O’Donnell & Eggemeier (1986) to recommend blink duration

over blink rate. Kramer (1991) shared this preference, citing studies in which average blink duration

was negatively correlated with workload in comparisons of simulated versus actual flight, co-pilots

taking over command, and single versus multitasking. However, with this method as with all eye

blink parameters, he warned that there is a significant fatigue effect wherein increasing time on task

leads to a greater number of blinks.




More recent workload studies have continued to concentrate on blink frequency rather than

duration, possibly because it requires less temporal resolution and robustness of measurement. A

significant negative correlation between blink rate and subjective workload was observed in a study

of actual flight manoeuvres (Noel et al., 2005), although the authors note that this finding was

inconsistent between different pilots and even test days. In studies of flight simulator tasks, Veltman

& Guillard (1998) as well as Papadelis et al. (2007) reported a more consistent relationship between blink

interval (time between blinks) and simulated flight difficulty. However, in the case of Veltman &

Guillard, who also introduced a concurrent working memory (WM) task, increasing working

memory load actually led to an increase in the number of blinks. They concluded that facial muscle

activity involved with a sub-vocal rehearsal strategy for the WM task may have encouraged blinking.

Bergeur et al. (2001) found that eye blink rate followed subjective ratings in surgeons between open

versus arthroscopic techniques. Myung & Ryu (2005) observed that manipulations of target speed

during a tracking task led to a significant effect on blink interval, but no significant effect was

observed between single and double digit multiplication.
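
Because the studies above report either blink rate or blink interval, it may help to show how the two are derived from the same list of blink times. The following is a minimal sketch; the function names and the choice of units (seconds in, blinks per minute out) are assumptions for illustration and are not taken from any of the cited studies.

```python
from typing import List, Optional


def blink_rate_per_min(blink_times_s: List[float], task_duration_s: float) -> float:
    """Number of blinks per minute over the whole task period."""
    return 60.0 * len(blink_times_s) / task_duration_s


def mean_blink_interval_s(blink_times_s: List[float]) -> Optional[float]:
    """Mean time between successive blinks, or None with fewer than two blinks."""
    gaps = [later - earlier for earlier, later in zip(blink_times_s, blink_times_s[1:])]
    return sum(gaps) / len(gaps) if gaps else None
```

The two measures are not exact reciprocals: the rate depends on the full task duration, whereas the mean interval ignores the time before the first blink and after the last one.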

In the few, more recent studies of blink duration, the results have paralleled those of concurrent

blink rate measures. In the aforementioned flight simulation experiments of Veltman & Guillard

(1998), blink duration decreased from easy to hard flight sections, but did not appear to be affected

by working memory load. Similarly, Papadelis et al. (2007) reported a significant negative correlation

between duration and simulated flight task difficulty.

2.3.3.7 Pupil Size

Pupil size is affected by two muscle groups: the dilator group is innervated by the sympathetic

nervous system and the constrictor group by the parasympathetic system. Empirical observations of

an apparent association between pupil diameter and arousal were pivotal to Kahneman’s theories on




attention and effort (1973). According to Kramer’s (1991) review, a positive correlation between

pupil diameter and workload has subsequently been reported for a variety of tasks with

manipulations of cognitive, perceptual, and response-related demands. He echoes Kahneman’s

assertion that these results are likely caused by cortical influence on the reticular core.

In more recent studies, pupil diameter has been generally successful in predicting workload.

Washburn & Putney (2001) reported a correlation between pupil diameter and visual recognition

task difficulty (stimulus presentation time) as well as task performance (response accuracy and

latency) on individual trials. Recarte & Nunes (2003) found that pupil diameter correlated well with

subjective effort ratings in an experiment involving only driving or in conjunction with an auditory

perception, verbal production, or mental arithmetic task. However, they also reported an unexpected

divergence of pupil and subjective data with the introduction of a long-term memory recall task as

well as an apparent insensitivity of the measure during some dual-task conditions. The latter

exception could be explained by the tendency of the pupillary response to plateau or even reverse at

very high workload levels (Cain, 2007). Although most workload research aggregates pupil diameter

measurements over the course of a task or condition, Klingner et al. (2008) adopted a technique that

is more common in cognitive research. They reported the change in diameter upon presentation of a

task-related stimulus, which was a multiplicand in their case. The magnitude of pupil diameter

change was observed to be correlated with the difficulty of the multiplication problem.
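
The stimulus-locked approach described above can be sketched as a baseline-corrected measurement. The sketch below is an illustration only and not the procedure of Klingner et al. (2008): the use of a pre-stimulus mean as the baseline, the use of the post-stimulus maximum as the response, and all parameter names are assumptions.

```python
from typing import Sequence


def evoked_pupil_change(
    diameter_mm: Sequence[float],
    stimulus_index: int,
    baseline_samples: int,
    response_samples: int,
) -> float:
    """Peak diameter after the stimulus minus the mean diameter just before it."""
    if stimulus_index < baseline_samples:
        raise ValueError("not enough samples before the stimulus for a baseline")
    baseline_window = diameter_mm[stimulus_index - baseline_samples:stimulus_index]
    response_window = diameter_mm[stimulus_index:stimulus_index + response_samples]
    if len(baseline_window) == 0 or len(response_window) == 0:
        raise ValueError("baseline and response windows must not be empty")
    baseline = sum(baseline_window) / len(baseline_window)
    return max(response_window) - baseline
```

Subtracting a local baseline removes slow changes in diameter, such as those caused by illumination drift, which an average taken over the whole task would retain.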

It appears that the pupil diameter is primarily restricted as a workload measure due to

methodological issues. The effects of illumination, reflexive responses to vergence (between near and

long distance fixation), and emotion are known to be larger than those typically caused by cognitive

factors (O’Donnell & Eggemeier, 1986; Kramer, 1991). Furthermore, measurements of the pupil on




the order of 0.1 mm are required, although advances in remote, infrared corneal reflection systems

now allow such precise measurements on participants without the use of a chin rest or head gear

(Klingner et al., 2008).

2.3.4 Saccadic Eye Movements

2.3.4.1 Definition of Saccades

Carpenter (1988) divided eye movements into two functional categories: catching (fast) and holding

(slow) movements. Catching movements include saccades and the quick phase of nystagmus.

Holding movements include vergence as well as vestibular and smooth pursuit movements, which

both include the slow phase of nystagmus. Carpenter also describes three types of micro-movements

(i.e., median amplitude less than 5 minutes of arc), which are involuntary and have not been shown

to serve any functional purpose: tremor, microsaccades, and drift.

Saccades are commonly differentiated from other movements by their higher peak velocities. Yarbus

(1967), who has been cited as an important source for empirical saccade data (Duchowski, 2007),

reports that the duration, peak velocity, and acceleration of a saccade are a function of its amplitude:

for 5° to 20° saccades, these values typically range from 40 to 70 ms, 200 to 450 °/s, and 15,000 to

20,000 °/s², respectively. The relationship between these various characteristics of saccades can be

described by a series of mathematical functions called the “main sequence” (Bahill et al., 1975;

Carpenter, 1991). Yarbus (1967) does not discuss saccades beyond 20° because they naturally occur

as composites of smaller saccades and head movements, but larger saccades have been observed in

laboratory experiments. For example, Bahill et al. (1975) detailed the (typical) results from a single

subject, whose 50° saccades reached 900 °/s and lasted 100 ms. Importantly, very small saccades may

have peak velocities that are less than those observed during “slow” holding movements. Bahill et al.

(1975) reported that the peak velocity of a 0.5° saccade was approximately 45 °/s in their test subject,




while Hood (1975) demonstrated that the slow phase of optokinetic nystagmus can surpass 50 °/s.

However, such high speed holding movements do not occur in the absence of appropriately fast

moving stimuli, so the velocity of saccadic movements can generally be considered greater than

other types of movements.

Saccades can generally be considered ballistic movements (Carpenter, 1988), meaning that a second

saccadic movement cannot be instigated until the first has already been completed as it was

originally programmed. Thus, saccades follow stereotyped trajectories, so that their occurrence,

onset, and offset can be defined by velocity thresholds. A survey of the literature reveals that for

studies including saccades smaller than 2°, a detection (peak velocity) threshold in the

neighbourhood of 30 °/s is most common. In the summary of this survey (Table 1), notice that

acceleration and jerk thresholds may also be used.

Table 1. Saccade Detection Thresholds in Previous Literature

Citation | Velocity Threshold (°/s) | Other Criteria

Viire et al., 1987 | 75 (onset and offset) |

Oohira et al., 1991 | 20/30 (onset/offset) |

Ignace et al., 1997 | 100 (peak); 25 (onset and offset) | 2.1° maximum amplitude (to omit corrective saccades)

Fischer et al., 1997 | 30 (peak); 20 (onset and offset) |

Hooge & Erkelens, 1998 | 100 (peak); 25 (onset and offset) | 2.1° maximum amplitude

Wyatt, 1998 | | 2 × 10⁵ °/s³ minimum jerk (onset)

Walker et al., 2000 | 30 (peak); 20 (onset and offset) |

Greene & Rayner, 2001 | 35 (peak) | 9500 °/s² minimum acceleration (peak)

Harbluk & Noy, 2002 | 30 (peak) | zero velocity crossing (onset and offset)
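
The threshold scheme summarized in Table 1 can be illustrated with a short sketch. It assumes a one-dimensional gaze trace in degrees sampled at a fixed rate, and uses the 30 °/s peak and 20 °/s onset/offset values listed for Fischer et al. (1997) and Walker et al. (2000) as defaults. The function name and the sample-to-sample velocity estimate are assumptions for illustration; none of the cited studies is known to have used this exact procedure.

```python
from typing import List, Tuple


def detect_saccades(
    gaze_deg: List[float],
    sample_rate_hz: float,
    peak_threshold: float = 30.0,  # deg/s, must be reached at least once
    edge_threshold: float = 20.0,  # deg/s, marks onset and offset
) -> List[Tuple[int, int]]:
    """Return (onset, offset) index pairs into the velocity trace."""
    # Absolute sample-to-sample velocity; speed[k] lies between samples k and k + 1.
    speed = [abs(b - a) * sample_rate_hz for a, b in zip(gaze_deg, gaze_deg[1:])]
    saccades = []
    i = 0
    while i < len(speed):
        if speed[i] < peak_threshold:
            i += 1
            continue
        # Extend in both directions while speed stays at or above the edge threshold.
        onset = i
        while onset > 0 and speed[onset - 1] >= edge_threshold:
            onset -= 1
        offset = i
        while offset < len(speed) - 1 and speed[offset + 1] >= edge_threshold:
            offset += 1
        saccades.append((onset, offset))
        i = offset + 1
    return saccades
```

Using a lower threshold for the edges than for the peak means that a movement must first qualify as a saccade before its slower onset and offset portions are attributed to it. A uniform drift below the peak threshold is never reported, however long it lasts.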




2.3.4.2 Definition of Intersaccadic Interval (ISI)

The ISI is defined as the time lapse between the end of one saccade and the start of the next. A very

similar measure is the fixation interval or duration, which is the length of time that an external target

is foveated. In the absence of vestibular and smooth pursuit movements, these two intervals are

identical. Although the term “fixation” implies that our visual attention is aligned with our gaze, it

has been shown that this need not be the case (Posner, 1980; Sigman & Coles, 1980). Nonetheless,

fixation duration is a more common term in studies of visual information processing, which must

often assume some link between gaze and attention. In a practical sense, a fixation is generally

defined using a spatial threshold, but the ISI is defined by the occurrence of consecutive saccades,

which are detected using dynamic property thresholds.
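
Given the definition above, the ISI follows directly from a list of detected saccades. The sketch below assumes each saccade is already available as an (onset, offset) pair in milliseconds, sorted by onset; the function name is hypothetical.

```python
from typing import List, Tuple


def intersaccadic_intervals(saccades_ms: List[Tuple[float, float]]) -> List[float]:
    """Time from the end of each saccade to the start of the next one."""
    return [
        next_onset - previous_offset
        for (_, previous_offset), (next_onset, _) in zip(saccades_ms, saccades_ms[1:])
    ]
```

For example, saccades at (100, 140), (400, 450), and (520, 560) ms yield intervals of 260 ms and 70 ms. No spatial information is needed, which is the practical difference from a fixation defined by a spatial threshold.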

It has been mentioned previously that saccades are generally considered ballistic movements, which

consequently implies some minimum ISI that acts as a refractory period. Although it has been shown that

this assumption cannot be made under all circumstances, there is most often the practical limitation

of saccade programming time. For example, corrective saccades, which are a response to

over/undershooting a target, occur after the primary saccade has been completed, with a latency of

about 130 ms (Becker, 1969). Even in response time tasks where the temporal and spatial uncertainty

of stimulus presentation is removed, saccadic latency is at least 150-175 ms (Rayner, 1998). Leigh &

Zee (1999) claim that approximately 70 ms is required for visual information to be absorbed and

begin to affect the eye movement centers in the brainstem. The existence of express saccades, which

are low latency saccades (> 80 ms) that occur when the primary fixation point disappears shortly

before presentation of a target stimulus (Fischer et al., 1997), suggests that some additional time may

be required to disengage the current point of fixation. Of course, the theory of an obligatory

refractory period caused by saccade programming and fixation disengagement implies that these




operations cannot occur in advance of or in parallel with saccadic movements. On the contrary,

Leigh & Zee (1999) argue that the apparent inability of eye movements to be modified during the

execution of a saccade is really only a consequence of their short duration. In other words, if there

were enough time to program a second saccade signal during the execution of the first, there would

appear to be no refractory period. They support this view with evidence of seamless changes in the

trajectory of “slow” saccades, which occur due to certain neurological disorders, as well as in healthy

participants when a saccade is executed toward a target that unexpectedly moves on a

two-dimensional plane. Naturally, this movement must occur just after the first saccade movement has

been programmed, but early enough that the second saccade signal can influence the movement

before it has been completed.

Citing the infrequency of these conditions in everyday life and our apparent inability to process

visual information during very short intersaccadic intervals, some researchers exclude fixation

durations below a threshold of 70 to 100 ms (Unema, 1995; Falkmer & Gregersen, 1999). In their

latter argument, they are referring to saccadic suppression, which is a partial visual impairment

accompanying the execution of a saccade. This impairment is thought to be almost complete during

a period lasting from 20 ms before the start of a saccade to 50 ms after the completion of a saccade

(Stark et al., 1976). Due to saccadic suppression, it could be argued that a fixation shorter than 70 ms

cannot possibly contribute to visual information processing, and therefore should be disregarded.

Naturally, this argument does not apply to studies that are not concerned with eye movements related to

visual information encoding. Their former argument, on the infrequency of very short fixations, is

contentious. Whereas Cohen (1977) called the number of fixations less than 100 ms long negligible,

Velichovsky et al. (2000) report that 7% of their participants’ fixation durations were in the

neighbourhood of 60 ms. Furthermore, they found that the frequency of these short durations was




especially sensitive to changes in driving simulator task conditions. Clearly, the use of a minimum

fixation duration or intersaccadic interval threshold is not appropriate without some justification

regarding information uptake.
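
The practical consequence of a minimum duration threshold can be made concrete by reporting how much data it discards. The following sketch is illustrative only; the function name is hypothetical, and the threshold is left as a required argument rather than given a default, since the text above argues that no single value is justified in general.

```python
from typing import List, Tuple


def apply_minimum_threshold(
    intervals_ms: List[float], minimum_ms: float
) -> Tuple[List[float], float]:
    """Return the retained intervals and the fraction of intervals excluded."""
    kept = [d for d in intervals_ms if d >= minimum_ms]
    excluded = len(intervals_ms) - len(kept)
    fraction_excluded = excluded / len(intervals_ms) if intervals_ms else 0.0
    return kept, fraction_excluded
```

Reporting the excluded fraction alongside the retained data makes it possible to check whether short intervals are negligible in a given data set, or whether they form a sizeable and potentially task-sensitive group.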

2.3.4.3 Eye Movements and Mental Workload

Rayner’s (1998) review presents plenty of evidence for a correlation between fixation duration length

and visual task (e.g., reading) difficulty in the laboratory. More recently, a similar effect has been

sought in research on applied workload measurement, leading Zhang et al. (2004) to comment that

fixation duration is one of the most commonly studied estimators of driver workload. However, the

results of applied workload studies have been inconsistent and difficult to interpret because they

commonly involve complex visual and non-visual task components. In order to understand these

results better, research concerning the effect of non-visual tasks on eye movements will be reviewed.

Fixation durations are correlated with the difficulty of a variety of tasks involving visual information.

In matching comparison characters with target characters, the legibility (Unema, 1995) and number

(Gould, 1973; Unema, 1995) of the comparison characters was found to positively correlate with

average fixation duration. In a visual search of dot clusters where participants were asked to find a

specified cluster size, fixation durations of individual clusters positively correlated with the number

of dots they contained (Findlay & Kapoula, 1992). Furthermore, Moffitt’s (1980) review cited many

other studies in which fixation duration is linked to visual search difficulty. In studies of reading

research, fixations of text are longer when the text is less legible or more difficult, whether

lexically or, in the case of math word problems, computationally (Rayner, 1998).

These results may seem trivial because it is logical to assume that we fixate a target for as long as is

necessary to process it. However, one caveat is that our ability to extend fixations “online” seems




limited. Using the example of reading, while it is true that more difficult text is fixated for longer

periods, it is also refixated more often (Rayner, 1998). The occurrence of refixations indicates that

fixation durations are, at least partially, resistant to online modification in response to processing

demands. Refixations are also present during visual search tasks, leading Hooge et al. (1998) to

suggest a model wherein duration length is pre-programmed based on information from the visual

periphery. It has even been argued that all saccade production is a rhythmic phenomenon that is

irrespective of “any internal or external stimuli” (Filin, 2002, p. 181). The practical implication of this

issue is that the number of refixations may be just as important as fixation duration length statistics

in predicting visual workload. These two measures can be taken separately, as was recommended by

Blanchard (1985), or in a combined measure such as dwell time, which is the cumulative time that

any given target is fixated. Through their studies on mental rotation of graphical figures, Carpenter

& Just (1978) concluded that this measure is more strongly correlated to the difficulty of the rotation

task than average fixation duration alone. In the mental workload literature, there have been studies

incorporating dwell time or a similar measure (Tole et al., 1982; Harbluk & Noy, 2002; Matessa &

Remington, 2005).
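
The distinction between dwell time and refixation count can be shown with a small sketch. It assumes that each fixation has already been assigned to a labelled target, and it counts every fixation on a target after the first as a refixation; both the data layout and that counting rule are assumptions for illustration, as the cited studies define these measures in their own ways.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def dwell_and_refixations(
    fixations: Iterable[Tuple[str, float]]
) -> Tuple[Dict[str, float], Dict[str, int]]:
    """Return per-target dwell time (ms) and refixation count.

    fixations: chronological (target_label, duration_ms) pairs.
    """
    dwell_ms: Dict[str, float] = defaultdict(float)
    fixation_count: Dict[str, int] = defaultdict(int)
    for target, duration_ms in fixations:
        dwell_ms[target] += duration_ms  # cumulative time on this target
        fixation_count[target] += 1
    # Every fixation after the first on a target counts as a refixation.
    refixations = {target: count - 1 for target, count in fixation_count.items()}
    return dict(dwell_ms), refixations
```

A target fixated for 200 ms and later for 250 ms has a dwell time of 450 ms and one refixation, whereas its average fixation duration of 225 ms would not reveal that it was revisited.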

The relatively consistent correlation between fixation duration and the difficulty of (laboratory)

visual tasks has been successfully applied to workload measurement, as most common applications,

such as aviation and driving, involve high visual loads. For example, a very early study concerning

fixations of flight gauges found that durations were longer where the gauge was more difficult to

read or interpret (Fitts et al., 1950). However, more complex and varied visual stimuli have led to an

array of results and interpretations. One major consideration is whether cognitive/perceptual load is

manipulated at the level of individual stimuli, as with the visual tasks previously discussed, or rather

through increasing the number of visual stimuli to be processed. Where drivers are subjected to the




latter manipulation such as in busy versus quiet intersections (Rahimi et al., 1990) and rural versus

urban areas (Chapman & Underwood, 1998), average fixation durations have decreased rather than

increased. Conversely, a net zero effect may result in a task in which there is a significant cognitive

load involved with integrating information from various stimuli. Brookings et al. (1996) reported no

significant response for saccade rate to both traffic density and complexity of air traffic control

simulations. Similarly, Van Orden et al. (2001) found no significant effect for target number on

fixation frequency in an air warfare simulation, but there was a notable increase in the number of

long fixations (> 500 ms). Unfortunately, this measure was not reported by Brookings et al. Moray’s

(1986) inconsistent findings of fixation durations under varying flight phase difficulty suggest that

the effects of complex visual/cognitive workload manipulations may also be highly dependent on

individual study participants’ coping strategies.

Dual-task studies are a method of conducting a more closely controlled investigation of eye

movement responses to cognitive load during applied tasks. Tole et al. (1982) presented pilots with

recordings of number pairs and asked them to indicate whether they were in ascending or

descending order. The addition of this secondary task, which required auditory-verbal working

memory and auditory attention, led to a rightward shift in the histograms of flight instrument

fixation durations, indicating that increasing workload leads to longer fixations. Callan (1998)

similarly reported that a secondary computational task led to increases in average fixation duration

and the number of long fixations (> 500 ms) in pilots. Indicating a related pattern, saccade frequency

was found to decrease in drivers asked to carry out a secondary arithmetic task, decreasing further

for double digit versus single digit addition (Harbluk & Noy, 2002). However, it should be noted that

no significant effect was found in drivers asked to do paced addition in a study by Tsai et al. (2007).




Secondary tasks other than auditory-verbal working memory / auditory attention tasks involving

mental arithmetic seem to affect eye movements differently. Recarte & Nunes (2000) studied the

effects of driving with a concurrent verbal fluency task or mental imagery task. They found that

while the imagery task (requiring visuospatial working memory) caused an increase in average

fixation durations, the verbal fluency task did not. They also observed that the former effect was due

to the occurrence of a few “very long” fixations, while the majority of fixations did not seem to be

affected by the presence of the task.

If the secondary task simply caused an interruption of the habitual scanning involved with the

primary task, the introduction of any secondary task would consistently lead to longer fixation

durations. However, since the resultant effects appear to be dependent on secondary task type, this

raises the question of whether and how different non-visual tasks affect eye movements on their own.

2.3.4.4 Eye-Movements and Non-Visual Tasks

In general, it can be said that a non-visual task tends to increase the rate of eye movements

compared to a baseline, resting state. This has been shown to be true for self-paced multiplication,

whether with eyes closed (Lorens & Darrow, 1962) or even under hypnosis (Amadeo & Shagrass, 1963);

imagining a scene (Amadeo & Shagrass, 1963); creating mental anagrams (Andreassi, 1973); naming

words beginning with a letter (Ruth & Giambra, 1974); and in auditory vigilance (Amadeo &

Shagrass, 1963; Amadeo & Gomez, 1966) and tone detection (Antrobus, 1973) tasks. Experimental

control of the baseline state is an obvious issue for these studies. Some researchers instructed

participants to “relax as though going to sleep” (Andreassi, 1973), “keep your mind blank” (Lorens &

Darrow, 1962), or simply “relax” (Amadeo & Shagrass, 1963), while others used the pre-task period

as a baseline, with no special instructions (Antrobus, 1973; Ruth & Giambra, 1974). It is also

important to note that these authors did not report on saccadic eye movements as they are currently




defined, but on more general, sometimes undefined (Andreassi, 1973), “eye movements.” Most

defined these movements in terms of electrooculogram thresholds, whether in microvolts (Lorens

& Darrow, 1962) or mm of trace deflection (Amadeo & Shagrass, 1963; Ruth & Giambra, 1974;

Amadeo & Gomez, 1966). Antrobus used “ocular quiescence” intervals, in which no eye movement

was greater than 3°. Although these various definitions may seem arbitrary, they likely speak to the

technical limitations of the time, such that saccade detection came down to visual inspection of output

traces.

In subsequent experiments, researchers have avoided using a baseline, resting state by comparing eye

movements between different task conditions. Bergstrom & Hiscock (1988) found that more

movements (detected manually from video footage) were made by participants that answered

verbally administered questions a) of a lexical rather than a visuospatial nature, b) that involved a

higher degree of mental imagery, and c) that were less constrained. Their use of “constraint” is

referring to the extent of the environmental support required to answer the question, in other words,

a task with recall demands versus a task with only perceptual and recognition demands. An example

of their high imagery, unconstrained questions is “Name three printed capital letters that contain

four straight lines.” A moderate imagery, constrained example is “How many vowels are present in

the word: ‘directly’?” Using a similar methodology, Weiner & Ehrlichman (1976) and Ehrlichman &

Barrett (1983) each found that visuospatial questions elicited fewer eye movements than lexical or

otherwise non-visuospatial questions, as determined through visual inspection of video recordings

and electrooculograms, respectively.

It is difficult to interpret these results in terms of mental workload; although it could be argued that

a recall task is generally more demanding than a perception and recognition task, they involve very




different processes, which may confound the effect of workload. However, this line of research has also

given rise to more applicable generalizations about eye movements and cognition. Some researchers

discussed the possibility of a positive correlation between eye movement rates and general attention

level (Lorens & Darrow, 1962; Amadeo & Shagrass, 1963), due to a concomitant increase in arousal.

More recent research tends to favour an alternative theory, “interference avoidance” (Antrobus, 1973;

Weiner & Ehrlichman, 1976), in which eye movements are suspended to avoid interference between

visual information processing and the non-visual task.

Although the justification for each of these theories, which will be discussed later on, is compelling,

there is a scarcity of empirical data from studies in which effort is manipulated while the cognitive

process/strategy itself is held as constant as possible. This type of experimental control is important

to workload research, which must differentiate between the effects of effort and those of mental

operations themselves. Thus, from the perspective of mental workload, inconsistent cognitive activity

complicates the interpretation not only of studies comparing a task with a baseline rest state, but also of

experiments in which disparate tasks are compared. An example of the latter is that of Klinger et al.

(1973), who observed more frequent eye movements during tasks designated as “high

concentration” (Wechsler Adult Intelligence Scale [WAIS; Wechsler, 1955] arithmetic problems,

word generation based on starting letter, and creation of mental anagrams) versus “low

concentration” tasks (paced and self-paced counting by 2’s). However, there have been studies in

which task difficulty was more carefully manipulated. Singer et al. (1971) reported that the frequency

of quick optokinetic nystagmus (OKN) movements (while viewing a rotating drum) was positively

correlated to arithmetic task difficulty. Participants were asked to carry out three types of

transformations on a given number, N : N +1, +1, +1... (low difficulty); N +1, +2, +3... (medium

difficulty); N +1, -1, +2, -1... (high difficulty). Although saccadic and quick OKN movements were




referred to as separate phenomena, they are believed to share the same neurological pathways (Fuchs

et al., 1985), to the point that optokinetic stimulation has been proposed as a clinical means of

saccade production (Garbutt et al., 2001). Nevertheless, their disparate behavioural functions should

be noted. In another study of task difficulty, Antrobus (1973) described an experiment in which eye

movements became less frequent as working memory load was increased in an auditory (tone)

perception task. Working memory load was manipulated in terms of the length of tone patterns that

participants were asked to recognize amongst random tones. Rather than manipulate task difficulty,

Ruth & Giambra (1974) attempted to manipulate participants’ concentration levels with a constant

task condition, which could be considered novel even in current workload literature. After randomly

presented letter prompts, participants were asked to name words that started with that letter. The

high and low concentration conditions were effected through the task instructions, which either

“emphasized alertness and maximum output” or did not. The study found that eye movement rates

were higher during the high concentration condition.

A positive correlation between eye movement rate and attention may be due to an associated

increase in arousal. The basis for a link between eye movements and arousal lies in the neurology of

saccadic programming. As described by Leigh & Zee (1999), the programming process involves

many different areas in the brain and possibly different pathways, depending on the type of saccade.

For example, whereas the parietal eye field has been implicated in triggering reflexive saccades,

which occur in response to unexpected stimuli, voluntary saccades are thought to be more heavily

influenced by activity in the frontal eye fields (Leigh & Kennard, 2004). However, all of these

pathways end in the coordination of burst neurons, which discharge at a very high rate for a short

period of time, omnipause neurons, which inhibit burst neuron activity during fixations, and neural

integrator cells, whose tonic output is equivalent to the integral of the burst neuron output. The final




saccadic signal that travels through the ocular motor neurons is a combination of the burst signal,

which moves the eye, and the neural integrator signal, which subsequently holds the eye in place

against elastic centering forces from supporting tissues. The location of these burst and omnipause

neurons within the diffuse reticular formation suggests an influence of arousal on saccadic

production. Unema (1995) argued that the relationship between saccade rate and arousal is also

supported behaviourally, as heightened arousal is marked by a greater sensitivity to sensory stimuli,

leading to more frequent saccade production. Early theories of attention and eye movements often

relied on a link between arousal/activation and saccadic rate (Antrobus et al., 1964; Singer et al.,

1971; Andreassi, 1973), but this association was poorly substantiated. Although a positive correlation

between eye movement rate and alertness arguably exists in the extreme case of wakefulness versus

hypnosis (Amadeo & Shagrass, 1963), no significant correlation was found for more subtle (but also

more ecologically valid) manipulations of arousal: from resting to task conditions (Lorens &

Darrow, 1962; Singer & Antrobus, 1965) and in response to emotionally affective images (Hinton,

1982). However, in more recent years, manipulations of arousal through time on task have been

more successful, correlating saccade rate and mean fixation duration with fatigue (via performance

decrements and/or subjective ratings) during extended air traffic control simulation tasks (Stern et

al., 1994; McGregor & Morris, 1996; Stern et al., 1996), tracking tasks (Van Orden et al., 2000), and

flight simulations (Morris & Miller, 1996). There are also divergent results from studies of similar

phenomena; for example, Mousseau (2004) found no correlation between fixation duration and

fatigue in hockey players. In their 2000 review, Sirevaag & Stern conclude that the effects are best

observed in saccades whose performance is task-irrelevant. An example would be a return saccade,

which follows the presentation and retraction of an experimental stimulus.




It could also be said that saccades are inhibited during cognitive processes; this link would suggest a

negative correlation between eye movement frequency and workload. A common indication of this

phenomenon is that fixations often last longer than would be required to simply encode visual

information without further processing it. For example, it is believed that readers’ perception of text

is finished in the first 50-70 ms or so of a fixation, but average fixation durations are on the order of

300 ms (Rayner, 1998). Although there is an obvious reason for eye movements to keep pace with

cognitive processing in the case of reading and other visual tasks, why should this theory apply to

non-visual tasks? Certainly, the habitual inhibition of a visual scanning reflex would be an

evolutionary disadvantage in the face of natural dangers. On the other hand, it might enhance our

powers of mental concentration by discouraging interference between task-relevant cognitive

activity and shifts in visual attention, which necessarily accompany saccades according to most

research on the matter (Sigman & Coles, 1980; Shepherd et al., 1986; Findlay & Gilchrist, 1998; Van

der Stigchel & Theeuwes, 2005; but see Stelmach & Herdman, 1997 for an alternate view). In the

non-visual task literature, two explanations have been given for the apparent dependence of the

interference effect on task type. Weiner & Ehrlichman (1976) suggested that the strength of the

interference depends on the similarity of the task to the process of visual encoding/perception. This

would explain why the effect of a verbal task differs from that of a spatial task. Antrobus (1973)

theorized that because eye movements appear to be suspended during execution phases of mental

tasks, the rate of eye movements should be correlated to that of “cognitive change,” which would be

lower during an auditory perception task than during an arithmetic task. However, this explanation

is incompatible with the results of Singer et al. (1971), wherein more difficult (and therefore less

frequent) arithmetic operations led to more eye movements. In response, Antrobus (1973) suggested

the possibility of a variable strength effect, similar to that proposed by Weiner & Ehrlichman (1976),

as well as the possibility of a competing arousal effect. The combined effect of arousal and cognitive




interference on eye movement rate is conceptualized by Unema’s (1995) model, which divides a

fixation into two (possibly overlapping) stages: “cognitive elaboration,” in which eye movements are

inhibited due to some cognitive process(es) and “saccadic latency,” in which an eye movement is

allowed, but a target has yet to be chosen. The effect of increasing arousal is to lower the target

criterion threshold, decreasing the length of the saccadic latency stage. Unema’s two-stage model is

an extension of that presented by Fischer & Breitmeyer (1987), who concluded that saccades are

inhibited (rather than simply absent) during the engagement of visual attention.

In summary, it has been proposed that non-visual task eye movements may be influenced by mental

workload, but also by factors related to specific mental processes such as memory search constraint

or involvement of mental imagery. In order to evaluate the effect of workload on eye movements, it

is necessary to parse out the effect of process-specific factors by manipulating effort at different

difficulty levels of the same mental process. It is also important to bear in mind that the effect of

workload may be complex, due to the possibility of competing phenomena linked to arousal and

“cognitive interference.”




Chapter 3. Objectives and Hypotheses

One purpose of the previous discussion was to establish that rehabilitation treatments and

neuropsychological assessments stand to benefit from a clinical measure of mental effort.

Furthermore, the practicality of eye movement-based measures and their prevalence in the applied

research literature seem to recommend them to this purpose. However, evidence from some applied

studies as well as early investigations of eye movements during non-visual tasks suggests that the

relationship between intersaccadic interval lengths and workload may not be as robust for primarily

non-visual tasks as for visual ones.

It follows that the objective of the current study is to investigate the intersaccadic interval length

response to mental workload during non-visual tasks. More specifically, the tasks and experimental

conditions were chosen to address the practical question of whether eye movements could be used to

measure workload during non-visual neuropsychological assessment and cognitive rehabilitation

tasks. This focus serves two purposes: 1) to help fill a gap in the literature regarding the

potential for a general relationship between mental processes and eye movements during non-visual

tasks, and 2) to model the experiment as closely as possible after the intended applications of this

research. The latter is especially important in workload research because a universal measure of

workload is generally considered unfeasible.

This study will measure the occurrence of saccades in participants while they carry out the following

three experimental tasks at two different levels of difficulty: serial subtraction, verbal fluency, and

words in noise recognition. As there is no gold standard for workload measurement, a variety of

other indices will also be recorded in order to confirm that the manipulations of mental workload




were successful: heart rate (HR), spontaneous skin conductance response rate (a type of electrodermal activity),

self-reported (subjective) workload ratings, and task performance level.

Objective (1):

To demonstrate that average saccade rate changes in response to increased difficulty of the

experimental tasks. This will be achieved by administering each of the three neuropsychological

tests to participants at two levels of difficulty, while concurrently recording eye movements.

Note: Although the intersaccadic interval was initially the primary measure, preliminary data

collection indicated the average saccade rate to be a superior measure of the same phenomenon.

Full details of this decision can be found in the Full-Scale Study section.
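To make the relationship between the two candidate measures concrete, the following is a minimal sketch of how the average saccade rate, the mean ISI, and a count of long ISIs might be derived from the same list of detected saccades. It is an illustration only: the function name, the representation of a saccade as an (onset, offset) pair in seconds, and the choice to measure an ISI from the end of one saccade to the start of the next are assumptions, not details taken from the analysis software used in this study. The default long-ISI threshold of 1.5 s follows the 1500 ms definition given under Objective (4).

```python
def isi_statistics(saccades, duration_s, long_isi_s=1.5):
    """Summarize saccade timing over one observation period.

    saccades: list of (onset_s, offset_s) tuples, sorted by onset time.
    duration_s: length of the observation period in seconds.
    long_isi_s: threshold above which an interval counts as "long".
    """
    # Assumed definition: an ISI runs from the end of one saccade
    # to the start of the next.
    intervals = [nxt[0] - cur[1] for cur, nxt in zip(saccades, saccades[1:])]
    mean_isi = sum(intervals) / len(intervals) if intervals else None
    return {
        "saccades_per_min": 60.0 * len(saccades) / duration_s,
        "mean_isi_s": mean_isi,
        "long_isi_count": sum(1 for isi in intervals if isi > long_isi_s),
    }
```

Because saccades themselves are brief, the saccade rate is roughly the inverse of the mean ISI, whereas the long-ISI count isolates the extended pauses that an average can mask.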

Hypothesis:

Based on the results of the pilot study, which are discussed in the Pilot Study section, the response

of average saccade rate to difficulty level will differ between task types. In the auditory task, it is

expected that the average saccade rate will be lower for the high difficulty condition than in the

nominal difficulty condition. In the math and fluency tasks, the presence of a significant change is

hypothesized, but its direction is not.

Objective (2):

To demonstrate that a decrease in participants’ motivation level will be accompanied by a change

in average saccade rate. This will be achieved by measuring average saccade rate in a standard

condition where participants are asked to do “as best as they can” versus a second experimental

condition in which participants are asked to relax and disregard their performance.




Note: The initial aim of the study was to demonstrate an eye movement response to an increase in

motivation from a standard, baseline level. However, interviews conducted in pilot research

indicated that many participants were maximally motivated at the standard level. Therefore, in

order to have two levels of motivation, it was necessary to compare standard conditions to those

in which participants were asked to deliberately lower their motivation levels.

Hypothesis:

Based on the results of the pilot study, the response of average saccade rate to motivation level

will differ between task types. In the auditory task, it is expected that the average saccade rate will

be lower in the standard motivation condition than in the low motivation condition. In the math

and fluency tasks, the presence of a significant change is hypothesized, but its direction is not.

Objective (3):

To demonstrate that the average saccade rate response to changes in task difficulty and

motivation level converges with (i) electrophysiological measure findings (i.e., average

spontaneous skin conductance response frequency and heart rate), (ii) task performance, and (iii)

self-reports of “mental effort.”

Hypothesis:

For all three task types, average saccade rate findings will converge with electrophysiological

findings, task performance, and self-reports of subjective effort, for both task difficulty and

motivation manipulations. Specifically, it is expected that task performance will be negatively

impacted by task difficulty, but positively impacted by an increase in motivation level, while heart




rate and the rate of spontaneous skin conductance responses will be positively correlated to effort

for both experimental manipulations.

Objective (4):

Investigate whether another eye movement statistic, the occurrence of long ISI, responds more

consistently to experimental manipulations of workload than the average saccade rate measure.

Hypothesis:

Based on the results of the pilot study, long ISIs, which are defined as being longer than 1500 ms,

will be significantly more prevalent where either the difficulty or motivation level during the

auditory task is increased. For the math and fluency tasks, the presence of a significant effect is

hypothesized, but its direction is not.





Chapter 4. Pilot Study

In the pilot study, eye movements were tracked in naive participants. Here, the aims were to 1)

identify significant technical and methodological complications that would influence signal quality,

2) descriptively characterize eye-movement data, and 3) refine experimental procedures. A

noteworthy procedural change made part-way through the pilot work was the method of motivating

participants, as discussed later on. These and other details of the pilot work have been documented

to illustrate and support the evolution of the experimental method for the formal study, which is

presented in the Full-Scale Study section.

4.1 Methods

4.1.1 Participants

The required sample size was estimated through a power analysis using data from a previous study

(Ruth & Giambra, 1974) of mean eye movement rates (i.e., roughly the inverse of mean interval

length), in which the wording of a verbal fluency test was manipulated in order to differentially

motivate two groups of participants. From these results, a within-participants standard deviation of

33 movements/minute and an effect size of 0.7 (half of that which they report) gives a power of 0.6

for n = 12 (alpha = 0.05). Furthermore, a previous dual task study (Harbluk & Noy, 2002) reports

almost double the effect size for the manipulation of task difficulty (single versus double digit

multiplication) during driving. Although a power of 0.6 is on the threshold of acceptability, it was

deemed adequate considering its conservative estimation. Therefore, 12 participants were planned

per experimental group (i.e., 24 in total).
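The power figure quoted above can be checked approximately with a short simulation. The sketch below is not the procedure used for the original estimate; it assumes a two-sided, within-participants (paired) t-test on a standardized effect size of 0.7 with n = 12, and it hard-codes the critical value t(0.975, 11) ≈ 2.201 so that only the Python standard library is needed. The function name and the number of simulated experiments are illustrative choices.

```python
import math
import random


def simulated_power(effect_size, n, t_critical, n_sims=20000, seed=1):
    """Estimate the power of a two-sided paired t-test by simulation.

    Paired differences are drawn from a normal distribution with mean
    `effect_size` and unit standard deviation (a standardized effect).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        mean = sum(diffs) / n
        var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        t_stat = mean / math.sqrt(var / n)
        if abs(t_stat) > t_critical:
            rejections += 1
    return rejections / n_sims


# Two-sided critical value for alpha = 0.05 with n - 1 = 11 degrees of freedom.
power = simulated_power(effect_size=0.7, n=12, t_critical=2.201)
```

Under these assumptions the estimate comes out close to 0.6, in line with the value reported above; a different test or a one-sided alternative would give a different figure.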




Because this was pilot work, and unanticipated methodological issues warranted an early reappraisal of

the experimental method, fewer participants were recruited than planned. In total, 20

participants were recruited through posters on the University of Toronto campus.

Inclusion criteria:

o healthy, young adults between the ages of 18 and 55

o able to speak and read English fluently

o normal hearing and vision (with or without any corrective lens except bifocal glasses)

Exclusion criteria:

o known diagnosis of developmental disorder (e.g. attention deficit hyperactivity disorder;

dyslexia)

o history of brain injury or other neurological disorder

o history of psychotic disorder

4.1.2 Materials

Overview: Participants completed three types of neuropsychological tasks. During the tasks, eye

movements were recorded using a video eye tracker and three other physiological measures were

also recorded. In addition, task performance (number of correct responses and errors) as well as a

subjective workload rating were recorded for each task. Participants’ mental effort was manipulated

through the difficulty of the tasks as well as their motivation level.

4.1.2.1 Eye tracking

Eye movements were recorded using two monocular, video tracking “EyeLink” systems (SR Research,

Mississauga ON): a 500 Hz remote tracking system and a 1000 Hz “tower mount” system.




Commercial images and specifications of these eye trackers can be found in Appendix A. The remote

tracker did not require any head support, but relied on an adhesive forehead marker to

accommodate head movements within a 30 × 22 × 30 cm envelope. The tower mount tracker required

the use of a forehead rest and involved a dichroic reflector in front of the participant’s view, which

reflected infrared light, but passed visible light. Infrared light was thereby reflected into the eye and

then back into the video camera. Both systems utilized the dark pupil, corneal reflection (CR)

method. This method calculates the gaze position by measuring the position of the CR with respect

to the center of the pupil, which is detected as an area of low infrared reflectance because the

illuminator is not coaxial with the optical path. Although the remote tracking system is

advantageous from a clinical and experimental point of view, as it can be made less conspicuous, the

tower mount system was chosen after some initial testing with naive participants. With the remote

system, despite instructions to remain still, gross head and upper body movements resulted in very high signal noise

levels. However, when a forehead rest was used, it was revealed that the range of eye movements was

too great for the remote system; thus, the tower mount system, with roughly double the range in

both the vertical and horizontal directions, was the final choice.

Even with the head supported, signal noise was also caused by minute movements during

verbalization, and more importantly, pupil occlusion by the eyelids and eyelashes. The resultant high

frequency noise confused saccade detection algorithms based on velocity thresholding, so an

algorithm was developed to distinguish saccades from noise. This was accomplished using the

assumption of the main sequence relationships (Bahill, 1975), which describe the functional

relationship between peak saccade velocity, duration, and amplitude. If the amplitude of a “spike” in

the signal was not within an acceptable range of that predicted by its peak velocity (via the main

sequence functions), then it was considered noise. The use of the main sequence in detecting




saccades is an uncommon technique, but has appeared in the literature at least once before (Giolma,

1984). After detection using the algorithm, the data were inspected visually, sometimes requiring

correction in very noisy sections or where saccade parameters were in gross disagreement with main

sequence relationships. However, less manual correction was necessary after processing with the

main sequence detection algorithm compared to the proprietary EyeLink algorithm.
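The general idea of screening velocity-threshold candidates against the main sequence can be sketched as follows. This is an illustrative reconstruction, not the algorithm actually used in the study: the exponential form of the main sequence, the constants V_MAX and C, the velocity threshold, and the tolerance that defines the “acceptable range” are all assumed values chosen for the example, and the function names are hypothetical.

```python
import math

V_MAX = 500.0  # deg/s, assumed asymptotic peak velocity (illustrative)
C = 14.0       # deg, assumed amplitude constant (illustrative)


def predicted_amplitude(peak_velocity):
    """Invert the assumed main sequence V = V_MAX * (1 - exp(-A / C))."""
    if peak_velocity >= V_MAX:
        return math.inf  # faster than any saccade the assumed model allows
    return -C * math.log(1.0 - peak_velocity / V_MAX)


def detect_saccades(position, rate_hz, vel_threshold=30.0, tolerance=0.5):
    """Return (start, end) sample indices of spikes that fit the main sequence.

    position: gaze position in degrees along one axis, sampled at rate_hz.
    tolerance: allowed fractional deviation of measured from predicted amplitude.
    """
    # Sample-to-sample velocity in deg/s.
    velocity = [(b - a) * rate_hz for a, b in zip(position, position[1:])]
    saccades = []
    i = 0
    while i < len(velocity):
        if abs(velocity[i]) > vel_threshold:
            start = i
            while i < len(velocity) and abs(velocity[i]) > vel_threshold:
                i += 1
            end = i  # first position sample after the spike
            amplitude = abs(position[end] - position[start])
            peak = max(abs(v) for v in velocity[start:end])
            expected = predicted_amplitude(peak)
            # Keep the spike only if its amplitude is close to the prediction.
            if math.isfinite(expected) and abs(amplitude - expected) <= tolerance * expected:
                saccades.append((start, end))
        else:
            i += 1
    return saccades
```

A brief occlusion artifact typically produces a high peak velocity with almost no net displacement, so its amplitude falls far below the main sequence prediction and the spike is discarded, whereas a genuine saccade of comparable velocity is retained.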

4.1.2.2 Physiological Measures

Three other physiological measures were recorded. Although time constraints precluded their analysis

in the pilot study, they were planned for the full-scale study. Their purpose in the pilot study was

to anticipate any technical measurement issues.

Palmar electrodermal activity (EDA) was measured using reusable electrodes and a conductive gel

(GEL100, Biopac Systems, Chicago IL) that were applied to the index and middle fingers. The EDA

signal was conditioned and recorded on computer using a Biopac MP100 data acquisition system. A

GSR100C bioamplifier conditioned the signal using a gain of 5 microSiemens/output volts and a 10

Hz low pass filter, no high pass filter (not de-trended), and the signal was sampled at 125 Hz. The

tonic EDA level varied widely between participants and conditions, but was generally between 3 and

10 microSiemens.

A two-lead electrocardiogram (ECG) was also used to record heart period data. To this end,

disposable electrodes (EL503, Biopac Systems) were applied on the wrists or ipsilateral wrist and leg,

depending on which gave a stronger signal in each individual. As with EDA, this signal was also

recorded using the MP100 data acquisition system. The ECG amplifier (ECG100C) applied a gain of

2000 mV/output volts, a 60 Hz notch filter, and a 0.5 Hz high pass filter, and the signal was sampled




at 250 Hz. The ECG signal was also processed using a hardware “R-wave detector,” which enhanced

the R-wave and filtered out other peaks in the data for ease of heart period determination.
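As an illustration of how heart period follows from the detected R-waves, the sketch below converts a list of R-wave times into successive heart periods and a mean heart rate. It assumes that R-wave times in seconds have already been extracted from the conditioned signal; the function names are hypothetical and do not correspond to the acquisition software.

```python
def heart_periods(r_wave_times_s):
    """Return the intervals (in seconds) between successive R-waves."""
    return [b - a for a, b in zip(r_wave_times_s, r_wave_times_s[1:])]


def mean_heart_rate_bpm(r_wave_times_s):
    """Return mean heart rate in beats per minute, or None if undefined."""
    periods = heart_periods(r_wave_times_s)
    if not periods:
        return None
    return 60.0 / (sum(periods) / len(periods))
```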

Respiratory rate was collected via inductive respiratory effort belts around the chest and abdomen.

These belts were integrated into a “Lifeshirt” vest that was worn on the outside of participants’

clothing (Vivometrics, Ventura CA). Fluctuations in chest and abdomen diameter were logged in a

battery powered data acquisition/logging unit that accompanied the Lifeshirt, and the data were later

downloaded to the computer.

4.1.2.3 Subjective Workload Ratings

After each task, participants were asked to rate their mental workload. The most popular workload

ratings systems, SWAT and NASA-TLX, were unfeasible for this purpose because they were too

time-consuming to be administered more than twelve times during the course of an experiment. A brief

experiment was of the utmost importance, as participants were required to maintain some level of

motivation.

In pre-pilot testing with 10 participants, a computerized adaptation of the C-SWAT was used, as per

the recommendation of Goonetilleke & Luximon (1993). It began by asking participants to rate the

relative importance of three workload dimensions, “mental effort load,” “time load,” and

“psychological stress load,” and then to rate each dimension on a scale from 1 to 9. However, many

participants reported that this rating system seemed unnecessarily lengthy and open to various

interpretations. Interestingly, the concerns of O’Donnell & Eggemeier (1986) were also echoed: that

the rating system did not differentiate between (extrinsically imposed) task difficulty and

(intrinsically controlled) “effort.”




Because no rating system could be found that addressed this difference, a new system was devised for

the pilot study. Participants were instructed that they would be asked to rate the “task difficulty” and

“mental effort” of every task. They were also instructed that the former should be thought of as an

objective quantity (out of their control), while the latter is something that they could control, but

that a more difficult task generally requires more effort for best performance. This phrasing was

developed through interviews with pre-pilot experiment participants, with the intention of creating

definitions that address the preconceptions of most people while clearly distinguishing between the

two constructs. After the terms were explained, participants were asked to rate their perceptions of

each on a visual analog scale from 1 to 9 (lowest to highest perceived difficulty/effort). Their verbal

responses were recorded and later transcribed. Participants’ approval of and compliance with this

system were much higher than with the computerized C-SWAT system used in the pre-pilot study.

4.1.2.4 Neuropsychological Tasks

The tasks were programmed in E-Prime (Psychology Software Tools Inc., Pittsburgh PA), so that

they could be automatically administered. E-Prime was also capable of synchronizing the start and

end of each task with the acquisition of other measurements through digital trigger signals. All text

prompts appeared at the center of the screen, serving the purpose of re-centering the participant’s

gaze at the start of each experimental task. Auditory prompts were broadcast via headphones in

stereo, so as not to encourage the participant to look toward their source. Verbal responses were

registered by a hidden microphone and later transcribed.

The three experimental, non-visual tasks were adapted from conventional neuropsychological tests,

and were selected with the aim of addressing those types of tasks that were most prevalent in the eye

movement literature. Each task had two versions: nominal and high difficulty. On all trials, the




participant was asked to carry out the task as accurately and (where applicable) as quickly as possible

without making errors.

The three tasks were verbal fluency, identification of words in background noise, and serial

subtraction.

Verbal fluency is a widely used clinical neuropsychological test of word generation. The task entailed

speeded retrieval of words beginning with a given letter, for a fixed period of time. Participants were not

permitted to use proper nouns or to repeat words. Words were generated aloud and recorded for

later transcription. The dependent variable for performance was the total number of words

generated in a fixed amount of time. Two levels of difficulty were generated by selecting letters for

which the number of words starting with that letter is high (nominal difficulty) or low (high

difficulty), based on Borkowski’s (1967) data. The selection of the letters was also confirmed in

pre-pilot testing of 27 participants. Because both the nominal and high difficulty versions of the task

were repeated twice during the experiment, two letters were selected for each difficulty level: ‘t’ and

‘m’ (nominal) and ‘j’ and ‘q’ (high).

Identification of words in background noise is an experimental test of auditory attention and

auditory perception. It entailed the presentation of single words in background noise, where the

background noise was unintelligible crowd “babble.” Participants were asked to say the words aloud

as soon as they identified them. The dependent performance variable was the number of correct

responses. The background noise level was consistent across the two levels of difficulty, but the

volume of the words was either high (nominal difficulty) or low (high difficulty).




The serial subtraction task is a test of mental control with auditory-verbal working memory and

computational skill demands. A variant of the task was used, in which participants were asked to

subtract numbers, alternating by 1’s and 2’s (nominal difficulty version) or 8’s and 9’s (high

difficulty version). Participants were assigned a start number and then asked to subtract the numbers

aloud. The dependent variable was the number of correct responses.

The three task types and their difficulty versions are summarized in Table 2. For the purpose of

brevity, the three tasks will be generally referred to as the fluency task, auditory task, and math task,

respectively.

Table 2. Descriptions of Experimental Tasks

Task Type | Nominal Difficulty | High Difficulty

Verbal Fluency | Generate words starting with ‘t’ and ‘m’ | Generate words starting with ‘j’ and ‘q’

Identification of Words in Background Noise | Words are spoken loudly. | Words are spoken quietly.

Serial Subtraction | Alternating subtraction by 1’s and 2’s | Alternating subtraction by 8’s and 9’s

The tasks each involved three stages:

1. Instruction Period: instructions for the task are presented at the center of the screen

2. Observation Period: the screen is blank; participant carries out the task for 60 seconds;

outcome measures and participant responses are recorded

3. Rest Period: participant is asked to relax before pressing a key to continue




There were two approaches to motivating participants. In the first approach, a contest with cash

prizes was introduced at the midpoint of the experiment as an incentive for participants to exert more

effort towards improving their performance in the second half. In the second approach, the same contest was

introduced, but participants were also instructed to try “as little as possible” during the first half of

the experiment. The purpose of these instructions was to accentuate the difference in motivation

level between the first and second half (in which alternate forms of the same tasks were presented). It

was also a response to interviews with the initial participants, who felt that they had automatically

given their best effort on the first half, and therefore could not improve themselves in the second half.

The experimental groups who received these two manipulation approaches will be henceforth

referred to as the “motivated group” and “de/motivated group,” with those in de/motivated being

asked to try as little as possible in the first half of the experiment.

The contest involved three cash prizes ($100, $50, and $25), which were offered on the basis of task

performance improvement (from the first to second half of the experiment). Task performance was

said to be averaged over all three task types, so that participants would exert additional effort on all

types, rather than focusing on just one. Improvement in performance was used rather than absolute

performance so that participants would believe they had a chance at winning, regardless of their

perceived skill level. Three cash prizes, rather than one, were similarly offered to enhance the

participants’ perceptions of the contest’s odds.

4.1.3 Design

This study was originally conceived as a mixed between- and within-participants repeated measures

design, with two experimental groups: 1) with motivation manipulation (motivated group) and 2)

without motivation manipulation (controls). As has already been discussed, the perceived failure of

the initial motivation technique prompted the development of a second approach, and therefore a




third experimental group was created: the de/motivated group described earlier. All participants

repeated each task and difficulty level three times, over a practice block and two experimental blocks

(18 trials in total). Completion of the entire task battery took 30-40 minutes, which pre-pilot

experiments (n = 10) suggested as the maximum duration before participants tended to report

moderate fatigue levels. All participants in the motivation and de/motivated groups were given

notice of a contest for cash prizes after the first block. However, those in the de/motivated group

were also asked to try as little as possible during the first block, in order to accentuate the effect of

the contest on motivation levels. The control group was not told of the contest, and their only

instruction was to concentrate on the tasks while avoiding frustration. In sum, the control group

received two blocks of identical trials, while the motivated and de/motivated groups received two

blocks of trials wherein the first block was considered “unmotivated” and the second block was

“motivated.” The purpose of having a control group was to determine whether any changes in

outcome measures were due to factors such as fatigue or practice, rather than the presence of either

motivation manipulation.

The order of task presentation was counterbalanced across trials using a balanced Latin square design to

control for carry-over effects. Furthermore, the order of the specific prompts in each trial (i.e.,

letters, starting numbers, words in noise stimuli) was reversed for half of the participants in order

to verify that any apparent effect was not simply due to the order of the prompts. For example, while

half of participants were prompted with ‘q’ for the difficult fluency task in the first half of the

experiment and ‘j’ in the second half, the other half of participants were prompted with ‘j’ in the first

half.
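
The counterbalancing described above can be sketched in code. The following is a minimal illustration only, not the procedure actually used in the study: it assumes six conditions per block (three task types at two difficulty levels), and the condition labels and the function name balanced_latin_square are hypothetical. For an even number of conditions, this standard construction gives orderings in which every condition immediately precedes every other condition exactly once, which is the property that controls first-order carry-over effects.

```python
# Hypothetical condition labels: 3 task types x 2 difficulty levels (assumed).
CONDITIONS = [
    ("math", "nominal"), ("math", "high"),
    ("fluency", "nominal"), ("fluency", "high"),
    ("auditory", "nominal"), ("auditory", "high"),
]


def balanced_latin_square(n):
    """Return n orderings of the indices 0..n-1 (n must be even).

    Every index appears once in each row and each column, and every
    ordered pair of distinct indices is adjacent in exactly one row.
    """
    if n % 2:
        raise ValueError("this simple construction requires an even n")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every entry of the first row by one (mod n).
    return [[(c + r) % n for c in first] for r in range(n)]


if __name__ == "__main__":
    for row in balanced_latin_square(len(CONDITIONS)):
        print([CONDITIONS[i] for i in row])
```

Each participant would be assigned one row of the square, so that a multiple of six participants is needed to use every ordering equally often.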




The independent variables for this study were:

1. Task Type

2. Task Difficulty Level

3. Presence of Monetary Incentive (Motivation)

The outcome measures were:

1. Intersaccadic Interval (ISI) Length

2. Task Performance: number of correct and incorrect responses

3. Subjective Effort Ratings (1-9) for “Task Difficulty” and “Mental Effort”

4.1.4 Procedures

The inclusion and exclusion criteria described previously were indicated on the recruitment poster.

When participants contacted the study coordinator to discuss participation, these criteria were

confirmed. At that time, participants were also requested to follow some guidelines designed to

avoid excessive fatigue effects during the experiment:

On the night before the study,

o avoid excessive alcohol consumption

o avoid illicit drug use

o get a good night's rest

On the day of the study,

o avoid heavy exercise

o avoid eating a heavy meal before testing

o avoid abnormal caffeine consumption




A consent form and pre-test questionnaire (see Appendix B), which gathered basic demographic

information and verified that the preceding guidelines were met, were electronically provided to the

participant at least 24 hours in advance of the scheduled experimental session. Upon arrival at the

session, the participant was asked to read the form if they had not already. They were then asked

whether they had any questions, and subsequently, to sign the form. The ethics review board of the

Toronto Rehabilitation Institute and the University of Toronto approved all procedures.

The experimental room was maintained at a comfortable temperature and isolated from outside

noise. Overhead fluorescent lighting did not interfere with the operation of the tower mount system.

The participant was seated at a desk that was spare except for the eye tracking system and a

computer monitor placed approximately 40 cm from the participant’s face. The monitor was as close

as comfortable viewing allowed, so that the angular span of the eye tracker calibration targets was as

large as possible. A stationary chair was chosen to discourage large movements during the

experiment.

The experimenter placed electrodes on the fingers of the participant's non-dominant hand, as the dominant

hand would press a button in order to start the experimental tasks. Electrodes were also placed on

wrists and/or lower leg of the participant, depending on which location resulted in the best signal for

each individual. A vest containing inductance respiratory effort belts was also worn by the

participant, on the outside of their clothing. Where the remote eye tracker was used (versus the

tower mount system), an adhesive “target” was also applied to participants’ foreheads. To allow time

for electrode gel to absorb into the skin and for the participant to become accustomed to their

environment, they were now asked to fill out the pre-test questionnaire.




When the questionnaire was completed, the participant was then situated for optimal eye tracker

performance. The desk height was adjustable for this purpose. In the case of the remote eye tracker,

optimal performance in terms of measurement range was achieved with the tracker approximately

10-15 cm below the height of their eyes. However, this position tended to draw attention to the

tracker itself and also obstructed the view of the monitor, which had to be located directly in front of

the participant for calibration purposes. Thus, the remote tracker was positioned at about 20 cm

below the eyes. In the case of the tower mount eye tracker, desk height and participant position were

adjusted so that their forehead rested comfortably against the rest. The eye tracker was then

calibrated using a 9-point calibration procedure (spanning approximately ± 20°).
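
The relationship between viewing distance and the angular span of the calibration targets can be illustrated with a short calculation. This is a sketch under simplifying assumptions that are not stated in the text: a flat screen, with the eye centred on and perpendicular to it. The function names are illustrative only. Under these assumptions, targets spanning ± 20° at a 40 cm viewing distance would sit roughly 14.6 cm either side of the screen centre.

```python
import math


def half_extent_cm(distance_cm, half_angle_deg):
    """On-screen distance from centre subtended by half_angle_deg.

    Assumes a flat screen viewed head-on from distance_cm.
    """
    return distance_cm * math.tan(math.radians(half_angle_deg))


def visual_angle_deg(extent_cm, distance_cm):
    """Total visual angle subtended by a centred extent of extent_cm."""
    return 2 * math.degrees(math.atan((extent_cm / 2) / distance_cm))


if __name__ == "__main__":
    half = half_extent_cm(40, 20)
    print(f"half extent: {half:.1f} cm, full span: {2 * half:.1f} cm")
```

The same functions show why a closer monitor helps: for a fixed screen size, reducing the viewing distance increases the visual angle that the calibration targets can cover.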

Before starting the experiment, the participant was told:

1. the experimenter would be outside of the room during the experiment except after the

practice session and after the experiment halfway point, when they would re-enter the room

to answer questions

2. the experimenter would be monitoring the progress of the experiment at all times, and

would step in if there was an unforeseen problem

3. to avoid excessive body movements, especially turning around

4. (in the case of the tower mount system) they need only have their forehead in the rest during

the actual (60 s) task periods

5. it was important to us that they concentrate on the tasks but avoid becoming frustrated with

their performance

6. it was important to keep their eyes open (the experimenter monitored the eye tracker output

and reminded the participant whenever necessary)




7. (for the de/motivated group) they should try “as little as possible” during the tasks; in

particular they should try to relax and pay little attention to their performance

Participants were also instructed that the start of each task was self-paced. Therefore, they were able

to take as much or as little time as they chose between tasks.

If the participant belonged to either the motivated or de/motivated manipulation group, then

they would receive notice of the motivating contest through the computer monitor. The

experimenter would also enter the room to answer any questions and to ensure that the contest

notice was taken seriously. If the participant belonged to the control group, they would only receive

notice that they were halfway through. The experimenter would also enter the room for the purposes

of experimental symmetry and to verify the comfort of the participant.

At the end of the session, the various instruments and electrodes were removed, and the participant

was debriefed on the full details of the study, the reasons for any previous nondisclosures, and the

study’s potential scientific and clinical impact. Control participants were informed of the

contest and told that their results would be entered alongside those of all other participants.

The experimental session lasted 1 to 1.5 hours, including setup and debriefing.




4.1.5 Analysis

For each dependent measure, only a small number of datasets were useable (n = 8, see Results section,

below). Therefore, data were examined descriptively and graphically, as the sample size was

insufficient to conduct inferential statistics.

The first step in the analysis was to use subjective effort and task performance data to verify that the

experimental manipulations of task difficulty and participant motivation level actually resulted in a

change in mental workload. An increase in workload would have been indicated by the following

outcomes:

o increase in subjective effort ratings

o increase in task performance from low motivation condition to high motivation condition

o decrease in task performance from nominal to high task difficulty conditions

Changes in subjective workload ratings and task performance were aggregated and plotted in terms

of the presence of change as well as magnitude, between experimental conditions. These plots were

visually analyzed with the intention of investigating the effects of task difficulty manipulations; the

effect of motivation versus the control group; and the relative effectiveness of the two motivation

groups: motivated group and de/motivated group.

Eye movement data from eight participants were used in a visual inspection of ISI lengths. Two of the

participants were from the control group and six from the de/motivated group (none from the

motivated group). Eye movement data were presented in comparative histograms of ISI length for

each independent variable (task type, task difficulty, and participant group membership). That is, the

data from each participant were represented by 12 figures, each figure containing two normalized




histograms for the purposes of comparing ISI length distributions between two difficulty levels of a

task or halves of the experiment. The effect of motivation was investigated by comparing the

consistency and magnitude of any response in the de/motivated group participants to those of the

control group.

As previously mentioned, heart rate, electrodermal activity, and respiratory rate were also collected

as an investigation of potential technical issues, but were not subject to analysis due to time

constraints.

4.2 Results and Discussion

4.2.1 Subjective Workload Data

Participants were asked to rate their perceived difficulty of and effort expended on each 60 second

task immediately after they completed it. Two important trends are exhibited by these data: 1)

subjective effort and task difficulty ratings generally corresponded to both motivation and task

difficulty manipulations; 2) those participants in the de/motivated group, who were told to “try less”

in the first half of the experiment, reported a larger change in effort and task difficulty from the first

to second half, with respect to both the motivated group and the control group.

Figure 5 presents the change in participants’ perceived task difficulty ratings when presented with a

more difficult version of each task. This figure represents the pooled observations of all experimental

groups and at both halves of the experiment, as similar trends were observed regardless of group

membership or experiment half. The consistency of the association between imposed and perceived

task difficulty is remarkable considering that the participants could not reference their ratings of

previous tasks, except by memory. Variations in the consistency of this association between task




types could be explained by relative face validity of each task’s difficulty manipulation. For example,

whereas it was obvious to the participant when the spoken words were made more audible in some

trials, it was perhaps less obvious when a more difficult letter was presented for the fluency task. In

the latter case, it could be that participants were able to rely upon their perceived (in)frequency of

the starting letters in the English language, or they were able to recognize a change in the number of

responses that they produced for each letter.

[Figure 5 graphic: Portion of Reports (0% to 100%) by Task Type (Math, Fluency, Auditory). Legend: Reported No Change; Reported a Negative Change; Reported a Positive Change.]

Figure 5. Change in task difficulty ratings from the nominal to high difficulty versions of each task.

Because the manipulation of task difficulty was only a means of affecting mental effort, the

association between task difficulty and subjective effort (Figure 6) is more important. It is to be

expected that subjective effort ratings should roughly associate with difficulty, not only because

more difficult tasks theoretically demand more effort, but also because of the connection between

difficulty and effort that was described in the instructions to the participant. Another explanation is

that participants’ mental effort ratings are influenced by their ability to recognize the intent of the

experiment, rather than their actual “feelings” of effort. Comparing Figures 5 and 6 does indicate

that at least some participants seem to draw a distinction between perceived difficulty and effort, as




per the assertions of Naccache et al. (2005). However, their relative similarity may either suggest the

success of the difficulty manipulation in affecting effort or simply cast doubt on the ability of

participants to distinguish between subjective effort and imposed demands. Unfortunately, this is an

enduring problem with the interpretation of subjective effort ratings.

[Figure 6 graphic: Portion of Reports (0% to 100%) by Task Type; legend includes Reported No Change.]


Figure 6. Change in subjective effort ratings from nominal to high difficulty version of each task.

The relative success of the two approaches to motivating participants, used with the motivated group

and de/motivated group, can be inferred from Figure 7. Clearly, those participants that were asked to

“try less” in the first half of the experiment (de/motivated group) more consistently report a change

in subjective effort after the introduction of the contest. However, as with the ratings of perceived

task difficulty, it is conceivable that this effect may have been more a consequence of preconceptions

than the relative presence of effortful feelings. That is, perhaps participants were more apt to indicate

a change in effort when they had been expressly asked to “try less” in the first half of the experiment.

Looking to performance data may lend further evidence in favour of one approach over the other.




Figure 7. Change in subjective effort ratings from 1st to 2nd half of experiment, by experimental group; data is pooled over all task types.

It should also be noted that reports were excluded from the data of Figure 7 if extreme scale values

(either 1 or 9) were assigned to both halves of the experiment. It is conceivable that participants may

have assigned a higher or lower rating in these cases, had they a longer scale or previous knowledge

of the motivation manipulation. Approximately 10% of observations were thus disregarded. This

stipulation was not made for the rating datasets in Figures 5 and 6, because all tasks/difficulty levels

had been previously presented to the participant in the practice block.
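
The classification of rating changes and the exclusion rule described above can be sketched as follows. This is an illustrative reading rather than the analysis code used in the study: it assumes that a report is excluded only when the same extreme value (1 or 9) was given in both halves, and the function names are hypothetical.

```python
from collections import Counter

SCALE_MIN, SCALE_MAX = 1, 9  # endpoints of the 1-9 subjective rating scale


def classify_change(first, second):
    """Label the change in a rating from the first to the second half.

    Returns None when the same extreme value was given in both halves
    (assumed exclusion rule: a floor or ceiling rating cannot show change).
    """
    if first == second and first in (SCALE_MIN, SCALE_MAX):
        return None
    if second > first:
        return "positive"
    if second < first:
        return "negative"
    return "no change"


def change_proportions(pairs):
    """Return (proportions of each label among kept reports, number excluded)."""
    labels = [classify_change(a, b) for a, b in pairs]
    kept = [label for label in labels if label is not None]
    counts = Counter(kept)
    keys = ("positive", "negative", "no change")
    if not kept:
        return {k: 0.0 for k in keys}, len(labels)
    return {k: counts[k] / len(kept) for k in keys}, len(labels) - len(kept)


if __name__ == "__main__":
    example = [(3, 5), (9, 9), (4, 4), (6, 2), (1, 1)]  # made-up ratings
    print(change_proportions(example))
```

Proportions are computed over the retained reports only, so the three categories always sum to one within a group.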

In post-test interviews concerning the contest, it was common for participants to mention that the

contest would have been more effective had they the guarantee of receiving prize money

immediately after the experiment. The option of compensating motivation group participants based

on their performance (e.g., 50 cents per correct response) was considered, but would have required

the calculation of performance scores during the experiment, rather than being transcribed

afterwards. Because the eye tracking equipment required the experimenter’s attention during the

experiment, this would have posed a feasibility issue.

[Figure 7 graphic: Portion of Reports (0% to 100%) by experimental group (Motivated Group, Control Group, De/Motivated Group); legend includes Reported No Change.]



4.2.2 Task Performance Data

Task performance data were used to verify that the experimental manipulations of task difficulty (i.e.,

more demanding math problems, more uncommon starting letters, quieter words in noise) actually

resulted in lower task performance. The concept of a control loop that allocates mental effort in

response to our perceptions of task performance is very common in the literature, from the early

work of Kahneman (1973). Therefore, a distinct change in task performance bodes well for the

success of the manipulation in affecting mental workload. Figure 8 illustrates that task selection was

successful in this respect.

[Figure 8 graphic: Mean Change in Number of Correct Responses (%), from -90 to 0, by Task Type (Math, Fluency, Auditory).]

Figure 8. Mean and standard deviations of (percent) change in number of correct responses

from nominal to high difficulty task versions

To answer the question of whether either of the motivation manipulations affected task performance,

it is necessary to view the data for each task because some participants reported that the effect of

motivation on performance differed between them. These data are presented in Figures 9a and 9b, as

the proportion of observations where performance improved and degraded, respectively, between

the first and second halves of the experiment. The figures clearly suggest that the second approach to




motivation, which included instructions to exert less effort in the first half, was more successful than

the first approach, which did not have any special instructions for the first half. The superiority of

the second approach is particularly evident in Figure 9b, which shows that the performance of the

de/motivated group participants less frequently degraded from the 1st to 2nd halves than that of

motivated group participants. However, the general success of the motivation manipulations was

fairly poor relative to the control group. For example, Figure 9a shows that roughly the

same proportion of control participant trials as motivated participant trials improved their

performance on the math task. The most obvious interpretation of this result is that any

improvements in performance on the math task were due to practice effects. However, Figure 9b

shows that in many cases, control participants did not benefit from practice, and perhaps even

experienced fatigue, leading to a low effort strategy. Because all participants reported that the contest

increased their motivation to do well, loss of interest is not likely responsible for any degradation in

motivated participants’ performance. A more plausible explanation is that the contest led to above

optimal arousal in some cases, leading to “mental blocks,” and generally poor composure. Subjective

assessments of participants’ voice recordings support this conclusion.

Figure 9a (left). Portion of trials in which performance improves from 1st to 2nd half of experiment (pooled over

both difficulty levels); Figure 9b (right). Portion in which performance degrades.

[Figures 9a and 9b graphic: Portion of Trials (0% to 100%) by Task Type, with separate series for the Motivated Group, Control Group, and De/Motivated Group.]




Another way to analyze the effect of motivation is to examine the magnitude of performance change,

with the purposes of 1) determining whether there was an effect of motivation compounded with a

presumed practice effect, and 2) further evaluating the two motivation techniques. Note that

observations of no improvement or negative change were excluded from these data in case they had

been confounded by excessive anxiety due to the contest or (in controls) lack of interest due to

fatigue. Comparing the magnitudes of change in motivated group observations versus control group

observations (Figure 10), it seems that the effect of the contest was actually detrimental, rather than

additional to the effect of practice. With the exception of the auditory task, the larger magnitudes

exhibited by the de/motivated group seem to suggest that task performance was affected by

instructions to exert less effort in the first half of the experiment. However, the large standard

deviations reflect the variability between individual trials as well as the small sample sizes from which

these means were calculated. In particular, the control group/auditory task data had a very small

sample size (n = 2) as most participants did not improve their performance. The other group/task

combinations contained between 5 and 7 trial cases.
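
The magnitude analysis described above can be sketched as a small calculation. This is a minimal illustration, not the study's analysis code: it assumes that performance is the number of correct responses, that change is expressed as a percentage of the first-half score, and that the sample standard deviation is reported. The function names and example scores are hypothetical.

```python
from statistics import mean, stdev


def percent_change(first, second):
    """Percent change in score from the first to the second half."""
    return 100.0 * (second - first) / first


def improvement_summary(score_pairs):
    """Summarize percent gains using only observations that improved.

    Observations with no improvement or a decline are excluded, as are
    zero first-half scores (assumed, since percent change is undefined).
    Returns None if no observation qualifies.
    """
    gains = [percent_change(a, b) for a, b in score_pairs if a > 0 and b > a]
    if not gains:
        return None
    # Sample SD needs at least two values; report 0.0 for a single case.
    sd = stdev(gains) if len(gains) > 1 else 0.0
    return {"n": len(gains), "mean": mean(gains), "sd": sd}


if __name__ == "__main__":
    example = [(10, 15), (8, 8), (5, 10), (6, 3)]  # made-up (1st, 2nd) scores
    print(improvement_summary(example))
```

Reporting n alongside each mean makes the dependence on very small subsets (such as the two control observations for the auditory task) explicit.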

[Figure 10 graphic: Mean Performance Increase (%), from 0 to 120, by Task Type, with separate series for the Motivated Group, Control Group, and De/Motivated Group.]

Figure 10. Mean and standard deviations of change in task performance from 1st to 2nd half (pooled

over difficulty levels), using data from only those participants who improved.




Most importantly with respect to the manipulation of motivation, the performance data corroborate

the subjective rating data as well as informal post-experiment interviews with participants. That is,

participants were generally sceptical of the effectiveness of the contest as a motivation tool unless

they were part of the group that was explicitly instructed to exert low effort in the first half of the

experiment. Otherwise, participants often reported that they had performed optimally in the first

half and therefore did not feel as if it was possible to improve in the second half.

A further observation is that the introduction of the contest seems to cause a high stress condition in

many people, possibly leading to a detriment rather than enhancement of performance. Though

stress could be thought of as a component of workload (see Robert & Hockey, 1997), this condition

may lack ecological validity as a replication of clinical conditions. It is unlikely that a clinician or

psychometrist would allow a client to reach such a high level of stress that it is detrimental to their

performance on a rehabilitation treatment task or neuropsychological test.

4.2.3 Eye Movement Data

As previously mentioned, only eight participants’ eye movement datasets were suitable for

intersaccadic interval (ISI) length analysis. Data from the other participants were deemed too noisy,

due to the prevalence of pupil occlusions or head movements. The greatest improvement to signal

quality resulted from switching from the remote eye tracker to the tower mount system; all eight of

the “suitable data” participants were tested on the latter. Six of these participants belonged to the

de/motivated group (informed of contest and asked to try less in the first half), while two belonged

to the control group.

Histograms of ISI lengths for each participant and each experimental condition (experiment half,

task type, and difficulty level) were compared visually. An inspection of histograms comparing




difficulty levels for each task type revealed no obvious effect resulting from a change in difficulty.

Although the distribution of participants’ ISI lengths would vary widely between conditions, the

direction of the shift was not consistent for any of the task types. Histograms comparing ISI lengths

from the first to the second half of the experiment similarly did not reveal any consistent effect for

either the math task or the fluency task.

However, an effect of motivation is suggested by the auditory task data, particularly in the high

difficulty (quiet words in noise) version. That is, the frequency of very long ISI (on the order of

several seconds long) appears to be higher on the auditory task in de/motivated group participants,

relative to control group participants. This conclusion was drawn from the histograms in Appendix

C, which are a compilation of each participant’s ISI data for the auditory task, comparing ISI

distributions between the first and second half of the experiment. Note that the distributions appear

to be either bimodal or unimodal with a strong positive skew. Therefore, the histograms have a non-

linearly scaled x-axis to enhance the visualization of both short and long ISI lengths.
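
The kind of comparison described above can be sketched in code. The following is an illustration only, not the processing used for Appendix C: it assumes that an ISI is measured from the offset of one saccade to the onset of the next, and it uses logarithmically spaced bins as one possible non-linear x-axis scaling. The function names, bin limits, and example saccade times are hypothetical.

```python
def intersaccadic_intervals(saccades):
    """Return ISI lengths (ms) from a list of (onset_ms, offset_ms) saccades.

    Saccades must be sorted by onset; each ISI runs from the offset of one
    saccade to the onset of the next (assumed definition).
    """
    return [next_on - prev_off
            for (_, prev_off), (next_on, _) in zip(saccades, saccades[1:])]


def log_bin_edges(lo_ms, hi_ms, n_bins):
    """Bin edges spaced evenly on a logarithmic scale between lo_ms and hi_ms."""
    return [lo_ms * (hi_ms / lo_ms) ** (i / n_bins) for i in range(n_bins + 1)]


def normalized_histogram(values, edges):
    """Proportion of values in each bin; values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


if __name__ == "__main__":
    # Made-up saccade times for one trial, in milliseconds.
    saccades = [(0, 40), (240, 280), (1780, 1820), (6820, 6860)]
    isis = intersaccadic_intervals(saccades)
    print(isis, normalized_histogram(isis, log_bin_edges(100, 10000, 2)))
```

Normalizing each histogram to proportions allows two conditions with different numbers of saccades to be overlaid, while the logarithmic bins keep both the short-ISI mode and the long tail visible.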

Again, post-experiment interviews were valuable in understanding participants’ eye movement

behaviour during the experiment. In particular, some participants assumed that they were required

to stare straight ahead during the tasks. This misunderstanding is important because it would

obscure any treatment effect due to ISI lengths being generally very long. Further, a reverse effect

may occur, wherein effortful tasks cause a lapse in participants’ conscious efforts to stare straight

ahead, thus leading to more frequent eye movements, not fewer. Participants reported that this

assumption was mainly a result of having their head movements constrained by the forehead rest.

Another factor identified was the proximity of the monitor to the face (40 cm), which was intended

to improve calibration accuracy. Participants also reported that the calibration procedure itself drew




attention to their eye movements, as the purpose of the eye tracker became obvious to them. An

additional finding from the interviews was that eye closures may have actually been encouraged by

the request that participants avoid them. One participant spontaneously revealed that the thought of

this requirement led to a psychosomatic irritation and a subsequent urge to blink. Subsequently,

other participants reported the same experience when specifically prompted about it.

In addition to frequent blinking, there were a number of other contributors to eye movement signal

quality. In some participants, it was not possible for the tracker to properly detect the corneal

reflection and pupil center through a large enough range of movements. Reasons include anatomical

characteristics (size and shape of the eyes) and squinting. Because the tracker detected the pupil as

an area of sub-threshold infrared reflectance, it was also important that the level of infrared (IR)

light reflected through the pupil was lower than that reflected by surrounding tissues. In some

participants, this issue arose due to mascara use, which readily absorbs IR light, while in others, the

IR reflectance of their pupils was inexplicably high. Contact lens use was not identified as a factor.

Within individual participants, signal quality was also periodically degraded by eyelid occlusion of

the pupil when looking downward and (much less so) by gross head movements. Slight head

movements due to verbalization resulted in only low amplitude noise levels (< 0.3° of visual angle).

Finally, although the eye tracker was specified to function with non-bifocal corrective lenses, all

corrective lens use restricted the visual angle range substantially. Further, they necessitated very

precise adjustments to the dichroic mirror angle, so that if the participant removed their forehead

from the rest and then returned to a slightly different position, readjustment would be necessary.

Soft contact lenses also did not pose any problems in this regard, while tracking with hard contact

lenses was not attempted.




4.3 Conclusions

The results of the pilot study do not suggest a general relationship between ISI length and effort on

non-visual tasks. However, the possibility of a correlation between motivation in the auditory task

(identification of words in noise) and ISI length is worthy of further investigation, considering that it

has not been previously reported. Furthermore, the very low sample size and unresolved

methodological issues require that these results be verified with an improved experiment involving

more participants. Improvements should address the following issues:

o efficacy of the motivation manipulation

o conscious control of eye movements by participants

o generally poor eye tracker data quality due to:

    o prolonged blinks/closures

    o pupil occlusion when looking downwards

    o easily controlled factors: corrective lens and mascara use

o the eventuality of low eye signal quality on some trials due to head movements or

uncontrollable pupil occlusions




Chapter 5. Full-Scale Study

5.1 Restatement of Hypotheses

Hypothesis (1):

Based on the results of the pilot study, which are discussed in the Pilot Study section, the response

of average saccade rate to difficulty level will differ between task types. In the auditory task, it is

expected that the average saccade rate will be lower for the high difficulty condition than in the

nominal difficulty condition. In the math and fluency tasks, the presence of a significant change is

hypothesized, but its direction is not.

Hypothesis (2):

Based on the results of the pilot study, the response of average saccade rate to motivation level

will differ between task types. In the auditory task, it is expected that the average saccade rate will

be lower in the standard motivation condition than in the low motivation condition. In the math

and fluency tasks, the presence of a significant change is hypothesized, but its direction is not.

Hypothesis (3):

For all three task types, average saccade rate will correlate with the electrophysiological findings,

task performance, and self-reports of subjective effort, for both task difficulty and motivation

manipulations. Specifically, it is expected that task performance will be negatively impacted by

task difficulty, but positively impacted by an increase in motivation level, while heart rate and the

rate of spontaneous skin conductance responses will be positively correlated to effort for both

experimental manipulations.




Hypothesis (4):

Based on the results of the pilot study, long ISI, which are defined as being longer than 1500 ms,

will be significantly more prevalent where either the difficulty or motivation level during the

auditory task is increased. For the math and fluency tasks, the presence of a significant effect is

hypothesized, but its direction is not.

5.2 Methods

Many of the methods for the full-scale study were identical to those of the pilot study; therefore, this

section will only highlight differences between the two.

5.2.1 Participants

Thirty-seven participants were recruited, but the “healthy vision” inclusion criterion was changed to:

o Have normal hearing and vision (may wear glasses with a low prescription or any soft

contact lens)

The purpose of this modified criterion was to ensure that participants could remove their glasses for

the duration of the experiment while still being able to read the task instructions. This ability was

confirmed in the screening process, via email or telephone.

Of these 37 participants, the data of 13 were excluded from the analysis on the basis of poor eye

movement signal quality (11 participants) or other technical difficulties (2 participants). The

remaining 24 participants comprised an equal number of motivation manipulation and control

group participants, and represented each counterbalanced trial combination of the Latin square

design described earlier. According to the demographic summary in Table 3, the groups are well

matched in terms of participant age, but not in terms of gender.




Table 3. Participant Demographic Summary

Group Membership          Median Age (Standard Deviation)   Number of Females (Males)

Control                   27 (8)                            2 (10)

Motivation Manipulation   29 (13)                           8 (4)

5.2.2 Materials

5.2.2.1 Eye Tracking

Eye movements were tracked using the tower mount system that is described in the Pilot Study

section. Data quality was substantially improved by fixing participants’ eyebrows in a slightly raised

position using a piece of medical tape, a technique documented by Johansson et al. (2001). Although

participants could blink freely, the eyelid was less likely to occlude the pupil when they looked

downward. This technique also had the unexpected benefit of reducing blink frequency in most

participants, presumably because it served as an implicit reminder to avoid closing their eyes.

Eye movement data collected under these conditions were of a high enough quality that the

proprietary DataViewer software (SR Research, Mississauga ON) was able to correctly detect the

majority of saccades. Therefore, processing with the main sequence-based saccade

detection algorithm developed during the pilot study was not necessary. However, all of the raw data

was again visually inspected and any problem areas corrected manually. As in the pilot study,

saccade data was also post-processed for minimum amplitude and detection of saccades during short

periods of pupil occlusion.

ISI/Saccade data were summarized using three variables: “trial proportion of long ISI” (percent),

median ISI length, and average saccade rate. While long ISI have been previously recorded in terms

of their frequency (Callan, 1998), this measure would have been misleading during the short tasks




involved with this study. There were several cases wherein only one or two saccades were executed

during the 30 second trial. Instead, the cumulative duration of all long ISI during a trial was

expressed as a percentage of the total trial duration. In this way, trials with very few, extremely long

ISI would be more adequately compared with those in which several moderately long ISI were

observed. A long ISI was defined as being at least 1500 ms in duration. This threshold was found to

best distinguish a task difficulty and motivation treatment effect. Finally, note that eye movement

data from the first 2000 ms of each trial were disregarded in the calculation of these summary

variables. It was assumed that this time was necessary for most participants to reach a steady

state in their execution of the tasks.
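The three summary variables can be illustrated with a short sketch. This is not the processing code used in the study. It assumes that each saccade is available as an (onset, offset) pair in milliseconds from trial start, that an ISI runs from the offset of one saccade to the onset of the next, and that the gaps before the first and after the last saccade are counted as intervals bounded by the analysis window. The function and variable names are hypothetical.

```python
from statistics import median

LONG_ISI_MS = 1500   # long-ISI threshold stated in the text
DISCARD_MS = 2000    # initial portion of each trial that is ignored
TRIAL_MS = 30_000    # trial length in the full-scale study


def isi_summary(saccades, trial_ms=TRIAL_MS):
    """Summarize one trial from a list of (onset_ms, offset_ms) saccades.

    Assumption: the gaps before the first and after the last saccade are
    treated as intervals, bounded by the analysis window.
    """
    kept = sorted(s for s in saccades if s[0] >= DISCARD_MS and s[1] <= trial_ms)
    # Each interval starts at the window start or a saccade offset and
    # ends at the next saccade onset or the end of the trial.
    starts = [DISCARD_MS] + [offset for _, offset in kept]
    ends = [onset for onset, _ in kept] + [trial_ms]
    intervals = [end - start for start, end in zip(starts, ends)]

    long_total = sum(d for d in intervals if d >= LONG_ISI_MS)
    return {
        # Cumulative long-ISI time as a percentage of total trial duration.
        "long_isi_percent": 100.0 * long_total / trial_ms,
        "median_isi_ms": median(intervals),
        # Saccades per second over the analysed window.
        "saccade_rate_per_s": len(kept) / ((trial_ms - DISCARD_MS) / 1000.0),
    }


# Hypothetical trial in which only two saccades were executed.
summary = isi_summary([(5000, 5040), (5900, 5950)])
```

With only two saccades, a frequency count would report just two long ISI, whereas the duration-based proportion reflects that almost the entire trial was spent without saccades.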

The eye tracker also recorded blink events, defined by the loss of the corneal reflection and/or pupil.

This loss was presumed to be caused by occlusion of the eyelid, but observations of the participants

revealed that it could also be caused by eye positions outside the range of the tracker. At these

extreme visual angles, the tracker had problems for two reasons: 1) reflectance threshold of the pupil

was too high (i.e., apparent pupil reflectance began to approach that of surrounding tissues), and/or

2) the curvature of the eyeball was such that the corneal reflection was not present. With the use of

the eyebrow lifting technique described previously, actual blink rates were very low in some

participants, so that the majority of their “blink” events were actually due to out-of-range looking.

Therefore, blink rate was not used as a measure in this study.

5.2.2.2 Electrodermal Activity and Heart Rate

EDA and heart rate information were collected as in the pilot study, with the only notable change

being the use of a high impedance ground strap on the participant’s ankle. Through the course of the

experiment, static electricity became an issue as humidity levels dropped with the change of season.




In participants tested before the issue was identified, lifting of the heel resulted in an EDA signal

artefact, much like a movement artefact. The raw EDA waveform of all participants was visually

inspected and regions with either type of artefact were excluded from analysis.

The EDA signal was characterized by the frequency of spontaneous skin conductance responses

observed. The signal was digitally filtered with a 1 Hz low-pass FIR filter (700 coefficients), which

effectively enforced the one second minimum wave period limit used by Storm et al. (2000). An

algorithm was created to process the filtered signal, searching for local max/minima and

determining the amplitude of individual responses. As per the recommendation of Storm et al., a

minimum 0.02 microSiemens amplitude threshold was used.
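The response-counting step might be sketched as follows. This is only an illustration of a local minimum/maximum search with an amplitude threshold, not the algorithm written for the study. It assumes the signal has already been low-pass filtered (the FIR filter is not shown) and that the amplitude of a response is the rise from a trough to the next peak. All names are hypothetical.

```python
MIN_AMPLITUDE_US = 0.02  # microsiemens threshold cited from Storm et al.


def count_scr(filtered, min_amplitude=MIN_AMPLITUDE_US):
    """Count trough-to-peak rises of at least min_amplitude.

    `filtered` is a list of skin conductance samples (microsiemens) that
    has already been low-pass filtered.
    """
    count = 0
    trough = None
    for prev, cur, nxt in zip(filtered, filtered[1:], filtered[2:]):
        if cur <= prev and cur < nxt:        # local minimum
            trough = cur
        elif cur > prev and cur >= nxt:      # local maximum
            if trough is not None and cur - trough >= min_amplitude:
                count += 1
            trough = None                    # wait for the next trough
    return count


def scr_rate_per_minute(filtered, sample_rate_hz):
    """Convert the response count into responses per minute."""
    minutes = len(filtered) / sample_rate_hz / 60.0
    return count_scr(filtered) / minutes


# Hypothetical filtered signal: one 0.06 uS response, one 0.005 uS ripple.
signal = [1.00, 0.99, 1.00, 1.03, 1.05, 1.04, 1.03, 1.035, 1.03, 1.02]
responses = count_scr(signal)
```

Only the first rise exceeds the threshold, so the small ripple that follows it is not counted as a response.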

Average heart rate was calculated using the “AcqKnowledge” data acquisition/analysis software

(BioPac Systems, Chicago IL) following a visual inspection of the R-wave signal for movement

artefacts. The use of heart rate variability as a dependent measure was considered, but the trial time was

deemed too short for reliable estimates of medium range variability (0.07 - 0.14 Hz) favoured by

previous studies of mental workload. Although it has been stated that measurement windows as

narrow as 30 seconds can be used (Mulder, 1992), preliminary testing suggested that conventional

analysis methods were not suitable. Therefore, heart rate variability measurement was considered

outside the scope of this study.

As with eye movements, heart rate and EDA data were ignored for the first 2000 ms of each trial.




5.2.2.3 Neuropsychological Tasks

The task battery was identical to that administered in the pilot study, but the length of each trial was

decreased from 60 to 30 seconds. The reason for this change was to allow for a single repetition of

each task condition without substantially increasing the overall length of the experiment. Trial

repetition not only improved the reliability of the experiment, but also served as a safeguard against

data loss due to eye closure, misunderstanding the instructions, or other unforeseen circumstances.

It was also thought that participants would better avoid distraction during a shorter task.

As in the Pilot Study section, the three neuropsychological tasks, properly called serial subtraction,

verbal fluency, and identification of words in noise, will be referred to as “math task,” “fluency task,” and

“auditory task” in this section.

Having two repetitions of each task required double the number of fluency task starting letters, so a

new set was chosen using the results of a previous investigation of this task by Borkowski (1967).

Tests in naive participants (n = 5) verified that letters in the nominal difficulty set (b, m, t, and f)

and the high difficulty set (k, q, y, and j) resulted in response counts that were consistent within each

set but substantially different between difficulty levels.

The task battery script was also modified to use a white background instead of a black background.

This modification caused participants’ pupils to be more constricted, which improved the eye

tracker response by helping to prevent pupil occlusion during partial eyelid closure or downward

looking.




5.2.2.4 Motivation Manipulation

The “motivation” of one group of participants was manipulated by instructing them to try less hard

on the second half of the experiment. Unlike the pilot study, there was no mention of a contest. The

new method was adopted because it had been demonstrated that participants were generally able to

self-regulate their effort, but the previous “second approach” had two important problems: 1)

possible ecological invalidity due to extremely high stress levels in some participants, and 2)

experimental asymmetry. The asymmetry arose because control group and motivation group

participants were not given equivalent treatment in the first half of the experiment. Considering the

discussion of learning and effort in the Literature Review section, it stands to reason that the effect of

practice would have been higher in the control group. It is furthermore conceivable that the control

group was potentially more susceptible to fatigue effects. These possibilities would have unduly

complicated interpretation of the results.

One drawback to the new motivation manipulation approach is that it requires participants to be

sufficiently motivated in the first half of the experiment to create some contrast with the second half.

However, pilot experimentation had suggested that the majority of participants were intrinsically

motivated to do well, and this inclination was augmented by clearly instructing participants to “try

their best” in the first half.

5.2.2.5 Post-Experiment Questionnaire

Subjective workload ratings were not interspersed within the task battery session, as they were in the

pilot study, being instead replaced by a post-experiment questionnaire. The primary reason for this

change was to save time, as doubling the number of trials would have otherwise resulted in an

unfeasibly long session duration. Furthermore, the results of the pilot study indicated that the




relationship between subjective ratings and experimental conditions was uncertain only for the

manipulation of verbal fluency difficulty and for that of motivation in all tasks. Participants’

perceptions of the verbal fluency task are addressed in Parts 1(c) and 1(f) of the questionnaire

(Appendix D), where they are asked to sort the various starting letters according to their associated

difficulty and effort levels. Participants assess the effects of the motivation manipulation in Part 4 of

the questionnaire, which asks whether they were successful in down-regulating their effort level on

the second half of the experiment. Part 1(g) is also important, as it verifies that participants entered

the experiment with a high level of motivation.

The questionnaire also includes questions on the participants’ awareness of their eye movements

(Part 3), or lack thereof. They were asked whether they consciously stared, whether they felt as if they

were supposed to be looking anywhere in particular, and whether they otherwise thought about their

eye movements during the experiment.

Questions regarding changes in mental strategy were also included, to identify any gross changes in

strategy that would cause misleading mental effort or task performance effects.

5.2.3 Design

The design of the experiment is similar to that of the pilot study, but with an additional block (two in

total) of practice trials to minimize practice effects and four blocks of experimental trials (rather

than two blocks, as in the pilot study). Because the trials were half as long as in the pilot study (30

seconds) and subjective ratings were queried after task battery completion, the session still lasted 30 to

40 minutes. Control and motivation manipulation groups were treated identically for the

practice blocks and the first two blocks of the experiment, but only the motivation group was asked




to exert less effort for blocks three and four. As with the pilot study, the presentation order of the

three task types and two difficulty versions (six different tasks in total) in each block was

counterbalanced between participants using a balanced Latin square design. Task prompting stimuli

(e.g., starting numbers for serial subtraction or starting letters for verbal fluency) were also

counterbalanced so that half of participants received a given stimulus set in blocks one and two, while

the other half received that set in blocks three and four. Thus, there were 24 group, task order, and

stimulus order combinations in total, which were each administered to a single participant.
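The count of 24 combinations follows from 2 groups, 6 task orders, and 2 stimulus orders. The sketch below shows one way such a schedule could be generated. It assumes the standard Williams construction for a balanced Latin square with an even number of conditions; the construction actually used is not stated here, and the task labels and stimulus-order codes are hypothetical.

```python
from itertools import product

# Hypothetical labels for the six tasks (three types x two difficulty levels).
TASKS = ["math-nominal", "math-high", "fluency-nominal",
         "fluency-high", "auditory-nominal", "auditory-high"]


def balanced_latin_square(n):
    """Williams design for even n: every condition appears once in each
    position, and every ordered pair of neighbours occurs exactly once."""
    if n % 2:
        raise ValueError("this construction requires an even n")
    first, low, high = [0], 1, n - 1
    for k in range(1, n):
        if k % 2:            # odd steps take the next low index
            first.append(low)
            low += 1
        else:                # even steps take the next high index
            first.append(high)
            high -= 1
    # Remaining rows are cyclic shifts of the first row.
    return [[(x + shift) % n for x in first] for shift in range(n)]


orders = balanced_latin_square(len(TASKS))
task_orders = [[TASKS[i] for i in row] for row in orders]

# One participant per (group, task order, stimulus order) combination.
combinations = list(product(["control", "motivation"],
                            range(len(task_orders)),
                            ["AB", "BA"]))
```

Because each ordered pair of adjacent tasks occurs exactly once across the six rows, first-order carryover from one task to the next is balanced across participants.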

The independent variables for this study were:

1. Task Type

2. Task Difficulty Level

3. Presence of Demotivation Instructions

The outcome measures were:

1. Subjective Reports of Effort and Perceived Task Difficulty

2. Number of Correct Responses

3. Number of Incorrect Responses

4. Trial Proportion of Long ISI

5. Median ISI Length

6. Average Saccade Rate

7. Average Heart Rate

8. Spontaneous Skin Conductance Rate




5.2.4 Procedures

The testing procedures were identical to those in the pilot study, but for the following changes:

o in the guidelines included with the consent form, participants were asked to avoid wearing

mascara to the study; makeup removal pads and the option of rescheduling were both offered

to any participants who had not complied

o the monitor was placed as far away from the participant as was comfortable (60 cm), in order

to address concerns that the proximity of the monitor may cause participants to assume that

they are required to stare straight ahead

o a high impedance grounding strap was attached to participants’ ankles, to prevent static

electricity-related artefacts in electrophysiological signals

o participants were told that the eye tracker was being used to measure pupil diameter (which it

is also capable of recording), and they were told they did not need to remember to stare

straight ahead because it could make this measurement regardless of gaze direction

o a piece of medical tape was used to affix the eyebrow in a slightly raised position, ensuring that

the participant could still blink freely

o participants were not asked to avoid closing their eyes unless eye closure was observed during

the practice session or later in the experiment; this request was only necessary in a few

participants

o after the practice blocks, all participants were told that it was necessary for them to give their

best effort on the tasks, while avoiding frustration on those that are more difficult

o participants belonging to the motivation manipulation group were given instructions midway

through the experiment to “try as little as possible” on the last half of the experiment; the




research coordinator entered the testing room at this point for all participants, to answer any

questions and ensure their comfort

o calibration was not performed on each participant, but on the research coordinator before the

participant arrived, in order to avoid drawing attention to the actual purpose of the eye

tracker; the accuracy of this calibration was verified for each participant after the task battery

was completed, using a four-point validation pattern; these validations revealed less than 20%

gaze position error over a visual angle of approximately 30°, which is adequate for the current

study because accurate gaze position and velocity estimates were unnecessary

o at the end of the experiment, participants were asked to fill out a post-test questionnaire

(Appendix D), which included questions about subjective effort, task difficulty, any conscious

control of eye movements, and (if applicable) perceived success in down-regulating their effort

during the last half of the experiment

5.2.5 Analysis

The analysis was completed using a condensed dataset, wherein repeated observations from blocks 1

and 2 were combined into an average “1st half” measurement, and those from blocks 3 and 4 into a

“2nd half” measurement. Where one of the observations was deemed unusable due to artefacts,

equipment failure, or less commonly, misinterpretation of task instructions, its equivalent trial

repetition was used alone.
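The condensing rule can be sketched in a few lines. This is an illustration only; it assumes that an unusable observation is marked as None, and the heart rate values and names are hypothetical.

```python
def condense(first_rep, second_rep):
    """Average two trial repetitions, or fall back to the usable one.

    None marks an observation rejected for artefacts, equipment failure,
    or misinterpreted instructions.
    """
    usable = [v for v in (first_rep, second_rep) if v is not None]
    if not usable:
        return None          # neither repetition can be used
    return sum(usable) / len(usable)


# Hypothetical average heart rates (beats/min) for one participant and
# one task condition, keyed by block number.
blocks = {1: 72.0, 2: 76.0, 3: None, 4: 70.0}

first_half = condense(blocks[1], blocks[2])    # mean of blocks 1 and 2
second_half = condense(blocks[3], blocks[4])   # block 4 used alone
change_score = second_half - first_half
```

The change score computed here is the quantity plotted in the box and whisker plots and carried into the hypothesis tests.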

As in the pilot study, the goals of the analysis were to seek dependent variable responses to

experimental manipulations of task difficulty and motivation level. These responses were sought for

each dependent variable and for each task type (math, fluency, and auditory). The hypotheses of this

study were:




1. Average saccade rate will be significantly, positively correlated with task difficulty on the

auditory task

2. Average saccade rate will be lower where participants are motivated to do well on the

auditory task

3. For both manipulations of motivation and difficulty, average saccade rate findings will

converge with those in heart rate, spontaneous skin conductance response rate, self-reports

of subjective effort, and task performance

4. The trial proportion of long ISI will be greater where either the difficulty or motivation level

during the auditory task is increased

A preliminary inspection of the data was completed using box and whisker plots of raw change

scores resulting from each dependent and independent variable combination (Appendix E). There

were three conclusions: 1) observed effects were very small and varied widely between measures and

group membership, 2) the distributions of observation groups (i.e., individual conditions,

represented by a single box and whiskers) were often not normal, and 3) the distributions of

observation groups within each independent variable combination varied widely (i.e., different

skews and variances). An independent variable “combination” refers to a set of observation

groups that would be compared to test an effect hypothesis.

5.2.5.1 Parametric Model Assumptions

The third conclusion of the box plot inspections is particularly critical to the model choice for

hypothesis testing. The obvious choice of analysis for the design of this study is a mixed (repeated

and between subjects) analysis of variance (ANOVA). However, ANOVA are parametric analyses

that model populations using assumptions that should be reflected in the observed samples. One of




those assumptions is that all populations are normally distributed¹. It is often possible to transform

data that do not initially meet this criterion, but this may not be the case where different observation

groups exhibit characteristics that require different transformations. Simply put, it can occur that a

transformation may increase normality for one observation group while decreasing it in another.

This problem was confirmed in some subsets of the data through the (failed) use of these common

transformations:

$$X_i^t = \log(X_i + 1) \qquad (1)$$

$$X_i^t = \frac{1}{(X_{\max,i} - X_i) + 1} \qquad (2)$$

$$X_i^t = \sqrt{X_i + 1} \qquad (3)$$

where $X_i^t$ is the transformation of observation $X_i$. Note that in Eqn. 1 and 2, 1 was only added

to $X_i$ where it was required to transform a null measurement. Eqn. 2 is a form of a more common

transformation wherein each observation is subtracted from the maximum observation (+1) in order

to preserve the direction of change scores, as recommended by Field (2005). Normality of

observation groups was tested using the Shapiro-Wilk test, which was administered, as with all

statistical methods, using SPSS v17 (SPSS Inc., Chicago IL).
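For illustration, the three transformations might be written as below. This sketch assumes the usual logarithmic, reversed reciprocal, and square root forms, and it always adds 1, whereas in the study 1 was added only where needed. The normality test itself is not reproduced because it requires a statistics package. Function names and data are hypothetical.

```python
import math


def log_transform(xs):
    """Eqn. 1: logarithm of the shifted observation."""
    return [math.log(x + 1) for x in xs]


def reversed_reciprocal(xs):
    """Eqn. 2: reciprocal of the distance from the maximum observation.

    Subtracting from the maximum keeps larger observations larger after
    the transformation, so the direction of change scores is preserved.
    """
    top = max(xs)
    return [1.0 / ((top - x) + 1) for x in xs]


def sqrt_transform(xs):
    """Eqn. 3: square root of the shifted observation."""
    return [math.sqrt(x + 1) for x in xs]


# Hypothetical positively skewed observation group containing a zero.
group = [0, 3, 8, 24]
transformed = {
    "log": log_transform(group),
    "reciprocal": reversed_reciprocal(group),
    "sqrt": sqrt_transform(group),
}
```

All three functions are monotonically increasing, so the rank order of observations is unchanged and only the shape of the distribution is altered.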

Another criterion for the parametric analysis of variance (ANOVA) is that of homoscedasticity (equal

variances). A mixed ANOVA requires equal variance of changes between treatment levels for each

factor (versus equal variances of sample populations in a between subjects ANOVA).

¹ ANOVA for independent sample groups (i.e., between subjects) requires normality of observation group

residuals, which are the differences between each observation and its group mean. However, a mixed

ANOVA is an analysis of change scores to determine the significance of change between treatments as well as

whether the average change seen in one participant group is different to that in another group. Thus, this

model assumed the normality of change score residuals. It can be shown that normality of observation group

samples is mathematically equivalent to normality of change score residuals.




Heteroscedasticity is of greatest concern where sample sizes are unbalanced between groups, as the

variance of the larger group will bias the total (pooled) variance calculation. Therefore, although

Levene’s test of equal variances was generally used to determine whether this was a concern for a

given dataset, a more conservative criterion was used for unbalanced datasets (e.g., where

observations are missing). This threshold was taken from Field (2005), who recommends that the

ratio of the maximum to minimum observed sample variance should not exceed two.
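The variance ratio criterion is simple enough to state as code. The sketch below is illustrative only; the group values and function names are hypothetical, and the sample (n − 1) variance is assumed.

```python
from statistics import variance

MAX_VARIANCE_RATIO = 2.0  # limit attributed to Field (2005) in the text


def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


def variances_acceptable(groups, limit=MAX_VARIANCE_RATIO):
    """True when the ratio does not exceed the limit."""
    return variance_ratio(groups) <= limit


# Hypothetical unbalanced comparison groups (one observation missing).
control = [1.0, 2.0, 3.0, 4.0, 5.0]      # sample variance 2.5
motivation = [2.0, 4.0, 6.0, 8.0]        # sample variance 20/3

ratio = variance_ratio([control, motivation])
acceptable = variances_acceptable([control, motivation])
```

In this hypothetical case the ratio exceeds two, so the dataset would be treated as heteroscedastic and referred to a non-parametric test.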

Although outlier removal is another option for improvement of both homoscedasticity and normality

of the comparison groups, none were omitted in this analysis. There were certainly outliers present

in the data by any definition of the term (e.g., 1.5 inter-quartile range criterion), but there was

simply no practical justification for their removal. The validity of individual observations could be

confirmed with some certainty because each trial was documented through audio recordings.

Participants’ behaviour on outlier trials was thereby verified as being consistent with that on other

trials.

5.2.5.2 Hypothesis Testing Methods

Where datasets exhibit problematic heteroscedasticity or non-normality, non-parametric hypothesis

tests must be used. Unfortunately, no non-parametric test of mixed design effects has gained any

widespread acceptance, so any non-parametric analysis of the data must be somewhat more

piecemeal, addressing each experimental manipulation separately. The effect of difficulty level was

assessed where necessary using a Wilcoxon’s signed-rank test (analog to paired t-test) between levels.

The effect of motivation could have been assessed using two possible approaches: 1) use a Wilcoxon’s

test to determine 1st to 2nd half effect sizes in both the control and motivation group, then compare

effect sizes, or 2) compute change scores for each group, then use a Kolmogorov-Smirnov (K-S)




test (analog to unpaired t-test) to indicate whether the two groups experienced the same effect. The

K-S test is similar to the more common Mann-Whitney test, but has been recommended for small

sample sizes (Field, 2005). The second approach, involving change scores, was preferred because it

did not depend on the reliability of the effect size calculation. However, it will be shown that both

of these methods lack an important feature, which is the ability to account for the correlation

between pre- and post-treatment measures. Therefore, the results had to be interpreted with caution.
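The second approach can be sketched as follows. Only the two-sample K-S statistic D (the largest gap between the two empirical distribution functions) is computed; obtaining a p-value requires tables or a statistics package and is omitted. The data and names are hypothetical.

```python
from bisect import bisect_right


def change_scores(first_half, second_half):
    """Per-participant change from the 1st to the 2nd half."""
    return [post - pre for pre, post in zip(first_half, second_half)]


def ks_statistic(a, b):
    """Two-sample K-S statistic: maximum distance between the empirical
    cumulative distribution functions of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical saccade counts per trial for four participants per group.
control_change = change_scores([10, 12, 11, 13], [10, 13, 11, 12])
motivation_change = change_scores([11, 12, 10, 14], [8, 9, 8, 10])

d_stat = ks_statistic(control_change, motivation_change)
```

Here every motivation group change lies below every control group change, so D takes its maximum value of 1. Note that the comparison uses change scores alone and therefore ignores how those changes relate to 1st half levels.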

Where data fulfilled the assumptions of a parametric test, these tests were favoured, as they are more

powerful under these circumstances. As previously mentioned, a mixed ANOVA was one approach

to analyzing the data, but it was not deemed the most appropriate parametric test under all

circumstances. In many cases, its interpretation would have been problematic because 1st half

observations were highly correlated with the magnitude of change scores, and they tended to be

higher in one of the control or motivation groups compared to the other. The correlation of change

with pre-treatment levels can be the result of two very common phenomena: 1) regression to the

mean, and 2) the law of initial values (LIV). LIV refers to a property of many psychophysiological

measures wherein the baseline level limits the amount of change that is possible (Stern, 2001). In

general, these phenomena do not pose a problem to the use of the mixed ANOVA because mean

pre-treatment levels are roughly equivalent in the control and experimental groups, by virtue of

random assignment. However, if pre-test observations happen to be higher in the experimental

group and the pre-test level is strongly correlated with change scores, as was encountered, then ANOVA will

rightly report a difference between the groups in their average change scores. Although this

interpretation is valid, Hedeker’s (2006) interpretation of Lord’s Paradox illustrates that it is not

the only one available. Another, more appropriate interpretation would answer the question of

whether a motivation group participant tends to exhibit more or less change compared to a control




participant, given that they started at the same pre-treatment level. It should be noted that this is not

the goal of the study, as a mental effort measurement tool should practically be capable of detecting a

difference regardless of starting level (to a point). However, this approach was simply a means of

rectifying the problem of experimental groups exhibiting non-equivalent pre-test conditions for a

measure that was correlated to change scores.

There are two common methods of distinguishing pre-treatment correlation effects from treatment

effects, and they are both based on regressing the change scores (or equivalently, the post-treatment

measure) on the pre-treatment measure. In one approach, the residualized change score, the

regression is performed by pooling the data from both experimental and control groups together

and fitting a regression line $x_{2,i} = m\,x_{1,i} + C$, where $x_{1,i}$ are the pre-treatment and $x_{2,i}$ the

post-treatment observations. The residuals for each observation i are then taken as the residualized

change scores. Another approach is to perform an analysis of covariance (ANCOVA) on the post-

treatment measure versus group, but identify the pre-treatment measure as a covariate, thus

accounting for its effect. Although the difference between these methods is generally small, the

ANCOVA method is preferred where the groups have very disparate mean pre-treatment levels

(Forbes, 2005). Figure 11a illustrates the issue with its representation of the residual change score

method being applied to a data set with a wide spread between control and experimental groups.

Because the regression line does not actually reflect the relationship between pre- and post-treatment

measures, $x_{1,i}$ and $x_{2,i}$, the mean difference in residual change scores clearly underestimates

the actual treatment effect (A-B). However, the ANCOVA method (Figure 11b) more accurately

interprets the effect (A-B) by assigning individual regression lines to each group and identifying the

difference between the two regression lines’ intercepts. The significance of this difference is calculated with

respect to the variance of residuals for each group. Note that any linear regression technique requires




a parametric model, with assumptions of homoscedasticity and normality of residuals. Furthermore,

these methods obviously require equivalence of within-groups regression slopes, otherwise the

treatment effect cannot be distinguished (Cronbach & Furby, 1970). Where slopes agree and the

correlation between pre- and post-treatment measures is high (r > 0.4), they recommend the use of

the ANCOVA method described above. It will be seen that these conditions applied in a number of

cases.
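The difference between the two methods can be demonstrated numerically. The sketch below uses hypothetical, noise-free data in which both groups share a within-group slope of 0.5 and the experimental group carries a true treatment effect of 2. It assumes ordinary least squares and a common within-group slope for the ANCOVA estimate; significance testing is omitted and all names are hypothetical.

```python
from statistics import mean


def sums(x, y):
    """Means plus the cross-product and squared-deviation sums for OLS."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return mx, my, sxy, sxx


def residualized_effect(ctrl, expt):
    """Group difference in mean residuals from one pooled regression."""
    mx, my, sxy, sxx = sums(ctrl[0] + expt[0], ctrl[1] + expt[1])
    slope = sxy / sxx
    intercept = my - slope * mx

    def mean_residual(group):
        return mean([y - (slope * x + intercept) for x, y in zip(*group)])

    return mean_residual(expt) - mean_residual(ctrl)


def ancova_effect(ctrl, expt):
    """Intercept difference between parallel within-group regression lines."""
    cmx, cmy, csxy, csxx = sums(*ctrl)
    emx, emy, esxy, esxx = sums(*expt)
    slope = (csxy + esxy) / (csxx + esxx)   # common within-group slope
    return (emy - slope * emx) - (cmy - slope * cmx)


# (pre-treatment, post-treatment) observations; groups differ widely at pre-test.
control = ([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 3.0])
experimental = ([7.0, 8.0, 9.0, 10.0], [6.5, 7.0, 7.5, 8.0])

pooled_estimate = residualized_effect(control, experimental)
ancova_estimate = ancova_effect(control, experimental)
```

Because the pooled line is steepened by the gap between the groups, the residualized estimate is roughly 0.24, while the within-groups estimate recovers the effect of 2 built into the data.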

Figure 11a (left). Illustrating error resulting from residual change score method on unmatched groups;

Figure 11b (right). ANCOVA method is preferred in this case because it does not pool data for regression.

5.2.5.3 Multiple Comparisons Correction

An important consideration in this study was the correction for multiple comparisons, which is

necessary to correct for the chance of a type I error; that is, the greater the number of hypothesis tests

conducted, the greater the likelihood of a false positive error occurring. This effect can be balanced using

the Bonferroni correction, which essentially involves decreasing the significance threshold (α) by a


factor of the number of tests applied (k). However, the Holm-Bonferroni method, which is a less

conservative alternative, was used. After all hypothesis tests were completed, their associated p-values

were ordered and the lowest p-value tested against the Bonferroni adjusted criterion (α/k). If

the null hypothesis was rejected, then the next p-value was compared with α/(k−1). This process

continued until a null hypothesis could not be rejected. Note that this correction process was

completed separately for each experimental manipulation, the rationale being that the

manipulations could be considered relatively independent of one another. That is, making k

comparisons toward the hypothesis that the motivation manipulation had some effect should not

have affected the type I error likelihood for subsequent tests of the difficulty manipulation. By the

same token, it could be argued that each outcome measure was investigating an independent

phenomenon, and that each task represented an independent (sub-) experiment. However, it was

determined that these variables were each too closely inter-related due to their common reliance on the

effectiveness of the manipulations, which must always be considered an uncertainty in studies of

mental workload. Thus, considering the small effect sizes predicted by the visual inspection, it was

clearly necessary to control type II error (false negative) by limiting the number of comparisons for

each experimental manipulation.
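The step-down procedure can be sketched as below and compared with the plain Bonferroni correction. This illustrates the standard Holm-Bonferroni method rather than the SPSS workflow used in the study; the p-values and function names are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject/retain flag for each p-value, in the input order."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    rejected = [False] * k
    for rank, index in enumerate(order):
        # Criterion relaxes from alpha/k to alpha/(k-1), alpha/(k-2), ...
        if p_values[index] <= alpha / (k - rank):
            rejected[index] = True
        else:
            break   # stop at the first null hypothesis that is retained
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Plain Bonferroni: every test uses the same criterion alpha/k."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]


# Hypothetical p-values from four tests of one experimental manipulation.
p_values = [0.015, 0.04, 0.03, 0.005]

holm_result = holm_bonferroni(p_values)
bonferroni_result = bonferroni(p_values)
```

For these hypothetical values, Holm-Bonferroni rejects two null hypotheses (p = 0.005 and p = 0.015) while plain Bonferroni rejects only one, which shows why the former is described as less conservative.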

To this end, four of the original nine dependent variables were excluded from analysis (subjective

effort data is not suitable for hypothesis testing). The excluded variables were performance (correct

and incorrect responses), median ISI length, and spontaneous SCR rate. Exclusion of performance

data was simply a consequence of having not identified any promising/interesting trends during

visual inspection. The exclusion of median ISI length and SCR rate requires more explanation.




Median ISI length was effectively replaced by the average saccade rate, as it was found that these

two measures correlated very closely, which is illustrated in Figure 12 (note logarithmic axes). The

inverse relationship is predicted when the sum of all saccade durations during a trial is negligible

with respect to the observation period. The average saccade rate offers two advantages over the median

ISI length: 1) confidence in measure, and 2) resistance to outliers. With reference to the first

advantage, recall that it was necessary to detect the presence of saccades during “blink” periods in

which the eye position was unknown. If detected, the precise moment of the saccade had to then be

estimated at the centre of the blink period. This practice obviously lends some uncertainty to

subsequent ISI length calculations, while the mere presence of a saccade is more certain. With

reference to outliers, it was found that median ISI length data exhibited extreme positive skew and were

resistant to common transformations toward a normal distribution. Saccade rate data were much less

problematic in this regard.

[Figure 12: scatter plot on logarithmic axes. Horizontal axis: Median ISI (ms). Vertical axis: Saccade Rate (1/s). Fitted curve: y = 19000x^(-0.9), R² = 0.86.]

Figure 12. Plot of median ISI versus average saccade rate for all trials reveals a very

consistent (R² = 0.86) inverse relationship; note logarithmic axes.

Spontaneous SCR data was primarily disregarded because it was sparse. Large sections of the EDA

signal had to be excluded because they exhibited movement artefacts, which masked any actual




response occurrences. As a result, almost 20% of observations were missing values. In some cases,

comparison group sizes were thereby highly unbalanced, which can be problematic with variance

violations in parametric tests, as previously discussed. Furthermore, interpretation of the data would

have been complicated by the inclusion of several EDA “non-responders,” defined by

Kettunen (2000) as persons in whom fewer than one spontaneous response is elicited per two minutes.

A less conservative approach to the Holm-Bonferroni correction was also realized by limiting the

number of comparisons to those identified as “promising” through a visual inspection. The most

promising trends were identified from the aforementioned change score box plots and subsequently

tested for statistical significance using the method most recommended by the discussion in the

preceding section. Table 4 is a summary of these trends, with references to locate their positions on

the box plots of Appendix E.

Table 4. Summary of Trends Subjected to Statistical Analysis

Measure                        Task Type   Description of Trend                                        Appendix E Reference
Average Heart Rate             Fluency     (-) difficulty effect in 1st half                           plot group: 1, reference: A
Average Heart Rate             Fluency     (+) motivation effect for high difficulty version of task   1B
Average Heart Rate             Auditory    […] half                                                    1C
Average Heart Rate             Auditory    (+) motivation effect for high difficulty version of task   1D
Average Saccade Rate           Math        (+) difficulty effect in 1st half                           2E
Average Saccade Rate           Auditory    […] half                                                    2F
Average Saccade Rate           Auditory    (+) motivation effect for high difficulty version of task   2G
Trial Proportion of Long ISI   Auditory    (+) difficulty effect in 1st half                           3H
Trial Proportion of Long ISI   Auditory    (-) motivation effect for high difficulty version of task   3I




It should be noted that the problem of multiple comparisons is not conventionally alleviated through

visual inspection of the data and subsequently omitting dependent variables and individual

conditions from analysis. Arguably, the act of visual inspection is itself a form of hypothesis test, so

that the resultant analysis must be corrected for the total number of possible comparisons, rather

than just the number of “promising” trends. However, this approach was chosen because no

alternative method of reducing type II error likelihood was found. In short, the conventional Holm-

Bonferroni method was deemed too conservative.

To be thorough, the results of a post-hoc, full hypothesis test battery are also included (Appendix F).

This analysis included comparisons between all experimental conditions, for all dependent variables

not excluded on the grounds of technical difficulties. It can be shown retrospectively that the

findings of the conventional approach do not differ to any consequential extent from those

presented in the Results and Discussion.

5.3 Results and Discussion

5.3.1 Hypothesis Testing

Hypothesis tests were conducted on each of the trends noted in a visual inspection of the data, which

were presented in Table 4. The method of testing was dependent on the characteristics of each

dataset, so the methods are detailed below. Where p-values meet the criterion of significance (α = 0.05 for

two-tailed tests), they are reported without reference to their significance, as this determination

depends on the final results of the Holm-Bonferroni correction method.




5.3.1.1 Average Heart Rate

Fluency Task

The comparison groups in this dataset all passed the Shapiro-Wilk test for normality and Levene’s

test for homoscedasticity. Because the sample groups were unbalanced due to a technical issue with

one of the control participants, the ratio of maximum to minimum variance was calculated: 1.9.

Since this value was less than the threshold of 2 suggested by Field (2005), parametric analyses were

deemed appropriate. A mixed ANOVA reported p = 0.010 for the effect of task difficulty (F (1,21) =

7.94). The effect size was r = 0.52. Although it is common to report ω² for ANOVA effect sizes, Field

(2005) recommends the more focused contrast effect size:

$$r = \sqrt{\frac{F(1, df_R)}{F(1, df_R) + df_R}} \qquad (4)$$
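As a numerical check on the contrast effect size, Equation 4 takes the square root of F divided by the sum of F and the residual degrees of freedom. The sketch below applies that formula to the fluency task result reported above (F(1,21) = 7.94). The function name `contrast_effect_size` is hypothetical and is introduced only for illustration.

```python
import math


def contrast_effect_size(f_value, df_residual):
    """Effect size r for a one-degree-of-freedom contrast (Equation 4).

    r = sqrt(F / (F + df_R)); the function name is illustrative only.
    """
    if f_value < 0 or df_residual <= 0:
        raise ValueError("F must be non-negative and df_R must be positive")
    return math.sqrt(f_value / (f_value + df_residual))


# Fluency task, effect of difficulty on average heart rate: F(1,21) = 7.94
r_fluency = contrast_effect_size(7.94, 21)
print(f"r = {r_fluency:.2f}")
```

Rounded to two decimal places, the value agrees with the reported effect size of r = 0.52.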

Looking to the effect of motivation during the high difficulty version of the task, it was found that

the correlation between 1st half and 2nd half measures was very high, with ρ = 0.92. As per the

recommendations of Cronbach & Furby (1970), the effect of motivation level was tested using the

ANCOVA method rather than mixed ANOVA, with 1st half measures as a covariate. The test

confirmed a very strong relationship between 1st half and 2nd half measures with p < 0.001 (F (1,20) =

365.4, r = 1.00). It reported p = 0.002 (F (1,20) = 12.03, r = 0.88) for the effect of motivation during

the high difficulty version of the task. Note that the effect size, r , was calculated based on the

regression parameter t -statistic, as recommended by Field (2005) for unbalanced ANCOVA:

$$r = \sqrt{\frac{t^2}{t^2 + N - 2}} \qquad (5)$$

where N is equivalent to the total number of observations, including repeated measures.

Homogeneity of 1st versus 2nd half observation regression slopes was confirmed post-hoc, as per the

instructions of Field (2005).




Auditory Task

Although the data were tested to be sufficiently normal and homoscedastic, they did not meet Field’s

recommended criterion of max/min variance, with a ratio of 2.1. Furthermore, attempts to correct

the data using conventional data transformations (Eqns. 1, 2, and 3) were unsuccessful.
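The max/min variance criterion used to choose between parametric and non-parametric tests can be sketched as follows. This is a minimal illustration, assuming that the ratio is taken between the sample variances of the comparison groups and that a ratio below 2 (the threshold attributed to Field, 2005) is acceptable. The function names and the example data are hypothetical.

```python
from statistics import variance

VARIANCE_RATIO_THRESHOLD = 2.0  # threshold attributed to Field (2005) in the text


def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(group) for group in groups]
    if min(variances) == 0:
        raise ValueError("a group with zero variance makes the ratio undefined")
    return max(variances) / min(variances)


def parametric_tests_appropriate(groups, threshold=VARIANCE_RATIO_THRESHOLD):
    """True when the max/min variance ratio falls below the threshold."""
    return variance_ratio(groups) < threshold


# Illustrative heart rate samples (bpm) for two unbalanced groups.
control = [62.0, 70.0, 75.0, 81.0, 66.0]
motivation = [64.0, 72.0, 90.0, 58.0, 77.0, 69.0]
print(variance_ratio([control, motivation]),
      parametric_tests_appropriate([control, motivation]))
```

Under this rule, the ratio of 1.9 reported for the fluency task passes, whereas the ratio of 2.1 reported for the auditory task does not.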

Therefore, non-parametric analyses were chosen to test the effect of task difficulty. Two Wilcoxon

signed-rank tests were used, one for each experiment half, over all participants’ data (n = 23). The test

for the 1st half returned p < 0.001 (z = -4.04, r = -0.60). The effect size was calculated as per Field:

$$r = \frac{z}{\sqrt{N}} \qquad (6)$$

where N is equivalent to the total number of observations, including repeated measures.
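The effect size in Equation 6 is the z-statistic divided by the square root of N, and it can be checked against the reported 1st half result. The sketch assumes N = 46, that is, two observations (one per difficulty level) for each of the 23 participants. That value is an inference from the description of N above, not a figure stated in the text, and the function name is hypothetical.

```python
import math


def wilcoxon_effect_size(z_value, n_observations):
    """Effect size r = z / sqrt(N) for a Wilcoxon signed-rank test (Equation 6)."""
    if n_observations <= 0:
        raise ValueError("N must be positive")
    return z_value / math.sqrt(n_observations)


# Assumed N: 23 participants x 2 difficulty levels = 46 observations.
r_auditory = wilcoxon_effect_size(-4.04, 2 * 23)
print(f"r = {r_auditory:.2f}")
```

Under this assumption, the result rounds to the reported effect size of r = -0.60.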

Non-parametric analyses were also necessary to test the effect of motivation for the high difficulty

task version. A Kolmogorov-Smirnov test of change scores reported no significant effect, but this

result must be more closely scrutinized because it does not account for the very strong correlation of

1st and 2nd half measures (ρ = 0.96). However, a plot of observations from the two experimental

groups (Figure 13) indicates that there was likely no effect of motivation on HR.




[Figure 13: scatter plot of 2nd half average HR (bpm) against 1st half average HR (bpm), both axes spanning 50 to 110 bpm; series: Control Group, Motivation Group.]

Figure 13. Individual observations of average heart rate in 1st half versus 2nd half for the high difficulty auditory task suggest no effect of motivation.

5.3.1.2 Average Saccade Rate

Math Task

The comparison groups of this dataset exhibited a high degree of heteroscedasticity (max/min variance

ratio = 6.3), which was not improved through the aforementioned

transformation techniques. Therefore, the effect of task difficulty in the first half of the experiment

was tested using a Wilcoxon signed-rank test over all participants’ data (n = 24). The effect was

not significant.

Auditory Task

All comparison groups passed the Shapiro-Wilk and Levene’s tests of normality and homoscedasticity.

Therefore, parametric hypothesis tests were used.




The effect of difficulty was tested with a mixed ANOVA, returning p = 0.002 (F (1,22) = 14.91, r =

0.60) for the general effect of task difficulty and p = 0.036 (F (1,22) = 5.01, r = 0.4) for the combined

effect of difficulty and group. Effect sizes were calculated using the contrast effect method (Eqn. 4).

The latter effect refers to the tendency of motivated participants to exhibit a larger response to

task difficulty manipulations. It is difficult to determine whether this result reflects an interaction effect of

motivation or simply an imbalance in the groups’ baseline observations, as difficulty

change scores are positively correlated with observations at the nominal level (ρ = 0.64/0.72 for 1st/2nd

half).

The effect of motivation for the high difficulty version of the task was tested using the ANCOVA

method because of the high correlation between 1st and 2nd half observations ( ρ = 0.6). However, the

test did not find a significant effect for motivation, even for a one-tailed criterion (α = 0.1). A post-

hoc test for homogeneity of regression slopes verified that the model criteria were met.

5.3.1.3 Trial Proportion of Long ISI

Auditory Task

Without transformation, the comparison groups in this dataset passed normality and

homoscedasticity tests.

The data were tested using a mixed ANOVA, which reported p = 0.004 (F (1,22) = 10.07, r = 0.56) for the

effect of task difficulty. The effect of motivation met a one-tailed significance criterion (α =

0.1), with p = 0.057 (F (1,22) = 4.02, r = 0.39).




However, because the 1st and 2nd half observations were highly correlated (ρ = 0.61), the ANCOVA

method of testing the effect of motivation is the preferred method of analysis. Contrary to the mixed

ANOVA results, it reported a non-significant effect for motivation.

In view of this disagreement, a scatter plot of change scores versus 1st half observations for the high

difficulty version of the auditory task is presented in Figure 14. A distinction between the two groups

is evident, though their within-groups variance may preclude statistical significance. Note that when

two motivation group outliers (marked in Figure 14 with “****”) were removed from the dataset, the

ANCOVA method returned p = 0.029 (F (1,19) = 5.58, r = 0.48) for the effect of motivation. However,

outlier removal was not justified, as these two participants exhibited this behaviour consistently

(considering both trial repetitions) and were furthermore not found to differ from the other

participants in any other aspect.

[Figure 14: scatter plot of 2nd half long ISI portion (%) against 1st half long ISI portion (%), both axes spanning 0 to 120%; series: Control Group, Motivation Group; two motivation group outliers are marked with asterisks.]

Figure 14. Individual observations of Trial Portion Long ISI in 1st half versus 2nd half for high difficulty auditory task suggest a weak motivation effect.




5.3.1.4 Holm-Bonferroni Correction

Following the multiple comparisons correction method described in the Analysis section, the

significance of each of the preceding results was determined. In total, 5 hypothesis tests were performed

with regard to the effect of task difficulty and 4 tests with regard to the effect of motivation. Therefore,

the minimum (starting) corrected significance criteria were 0.010 (0.05/5) and 0.013 (0.05/4) for difficulty and

motivation manipulations, respectively. The significance of interaction effects (involving changes in both the

difficulty and motivation conditions) was evaluated using whichever of the two criteria was

smaller (more stringent).

As a result, only the effects of task difficulty were found to have significant effects on any of the

measures. Specifically:

o An increase in verbal fluency task difficulty was associated with a decrease in average heart

rate (effect size, r = 0.52)

o A decrease in motivation during the high difficulty version of the verbal fluency task

was associated with an increase in average heart rate (effect size, r = 0.52)

o An increase in auditory task difficulty was associated with a decrease in average heart rate

(effect size, r = 0.60 /0.42 for 1st and 2nd halves of experiment)

o An increase in auditory task difficulty was associated with a decrease in average saccade rate

(effect size, r = 0.60); this finding confirms Hypothesis (1), where it refers to the auditory

task

o An increase in auditory task difficulty was associated with an increase in the trial proportion

of long ISI, which are defined as exceeding 1500 ms in duration (effect size, r = 0.56); this

finding confirms Hypothesis (4), where it refers to the auditory task
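The step-down logic of the Holm-Bonferroni correction behind these determinations can be sketched as follows. The sketch assumes the standard form of the procedure: p-values are sorted in ascending order, and the k-th smallest is compared with α/(m − k + 1), stopping at the first failure. The function names are hypothetical and the p-values are illustrative placeholders, not the full set of values obtained in this study.

```python
def starting_criterion(n_tests, alpha=0.05):
    """Most stringent (first) Holm-Bonferroni criterion: alpha / m."""
    return alpha / n_tests


def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking which tests remain significant.

    Standard Holm step-down procedure: the k-th smallest p-value
    (k = 1..m) is compared with alpha / (m - k + 1); once one
    comparison fails, all larger p-values are declared non-significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (m - rank):
            significant[index] = True
        else:
            break  # every remaining (larger) p-value also fails
    return significant


# Illustrative p-values for a family of five difficulty comparisons.
difficulty_p = [0.0005, 0.002, 0.004, 0.010, 0.30]
print(starting_criterion(5), starting_criterion(4))
print(holm_bonferroni(difficulty_p))
```

With five difficulty tests and four motivation tests, the starting criteria are 0.05/5 = 0.010 and 0.05/4 = 0.0125, the latter reported above as 0.013.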




Regarding Hypothesis (4), a non-significant but strong trend was also noted: a decrease in

motivation during the high difficulty version of the auditory task was associated with a decrease in

the trial proportion of long ISI. This result is consistent with the observations of the Pilot Study,

wherein more motivated participants exhibited more frequent long ISI compared to less motivated

participants.

Findings with respect to Hypothesis (3) will be discussed in a later section, Convergence of Measures.

5.3.2 Post-Experiment Questionnaire

With respect to participants’ ratings of task difficulty and effort in Part 1 of the questionnaire

(Appendix D), the results were similar to those gathered in the pilot study, but there was clearer

evidence that participants were able to differentiate between perceived effort and difficulty. Whereas

participants unanimously identified manipulations of math and auditory task difficulty, they less

consistently equated increased task difficulty with increased effort: 21% (5 participants) and 12% (3)

reported no change in effort between math and auditory task difficulty levels, respectively. The

observation that some participants made a blatant distinction between effort and difficulty lends

further support to the capacity of subjective effort ratings to differentiate between them. Not only is

this important because there have been proposals otherwise (see Gopher & Donchin, 1986), but it is

arguably counterintuitive to the everyday experience where effort is presumably very closely coupled

to task difficulty. This coupling is reflected in the “control-system” model of mental workload

wherein our perceptions of task performance serve as feedback in effort allocation decisions (Robert

& Hockey, 1997). However, in the context of the experiment, where the goal was to “try” rather than

perform, it is conceivable that effort could have been independent of task difficulty.




Subjective ratings of the fluency task were of particular interest because those taken during

the pilot study were less consistent in supporting the success of the difficulty manipulation than

those concerning the math and auditory tasks. Furthermore, the length of the trials had been

shortened compared to the pilot study, which may have narrowed the performance gap between

difficulty levels. Participants were given a list of the stimulus letters and asked to sort them into three

categories (Appendix D, Part 1c): “Less Difficult,” “More Difficult,” “Neutral or Don’t Remember.”

This procedure was repeated for the relative effort levels (e.g. “Tried Less”) associated with the letters.

Each participant’s reports were compiled into Figures 15 and 16, which illustrate that ratings of

starting letter difficulty were more consistent than those for effort. However, in both cases, the trend

clearly distinguishes between those letters defined as nominally difficult and those defined as highly

difficult. Furthermore, box plots of fluency task change scores between difficulty levels (Appendix E,

plot group 9, reference J) show that the manipulations of fluency task difficulty were successful in

affecting task performance.

[Figure 15: bar chart of report count (axis 0 to 25) by starting letter (k, b, f, t, j, m, y, q); categories: “Neutral or Don't Remember”, “Less Difficult”, “More Difficult”.]

Figure 15. Tally of participants’ starting letter perceived difficulty classifications.




[Figure 16: bar chart of report count (axis 0 to 18) by starting letter (k, b, f, t, j, m, y, q); categories: “Neutral or Don't Remember”, “Tried Less”, “Tried More”.]

Figure 16. Tally of participants’ starting letter perceived effort classifications.

Parts 1g and 1h asked participants to rate their effort level on the 1st half of the experiment and

whether their effort level changed from the 1st to the 2nd half. The number of participants who

reported trying their “hardest” on the 1st half of the experiment was roughly equivalent between the

motivation and control groups (9 and 7 participants, respectively), while all others reported trying

“somewhat.” This result indicates that, in general, the assumption of intrinsic motivation was valid.

Participants’ ratings of relative effort between the 1st and 2nd half also differed between the

motivation and control groups, with the number of participants who reported trying harder in the 1st

half being 9 in the motivation group but only 1 in the control group. This difference is not surprising

considering that motivation group participants were explicitly given instructions to try less in the 2nd

half, but it is interesting that so few control group participants tried harder in the second half. This

result seems to indicate that the effect of practice may have been offset by that of fatigue.

Another section of the post-experiment questionnaire (Part 2) queried participants on their

perceptions of their eye movement behaviours. Half of participants reported having consciously

fixated for the purposes of concentration in general. This result could either be interpreted as a




failure of the experiment to avoid participants’ conscious control of their eye movements, or an

explanation of the underlying cause of any trend between task difficulty and eye movements, as will

be discussed later on. A single participant reported having assumed that they were required to look

straight ahead, despite instructions otherwise, but also that the assumption did not preoccupy them.

Furthermore, this participant’s eye movement observations did not exhibit outlier behaviour.

References to mental visualization were notably present in descriptions of mental strategy (Part 3).

17% (4 participants) reported visualizing the numbers during the math task; 8% (2) visualized

objects during the fluency task; and 4% (1) visualized the speaker during the auditory task. In

retrospect, a question specifically targeting mental visualization strategies should have been included

in the questionnaire. It is conceivable that many more participants used visualization strategies, but

would not have realized it unless asked.

The last section of the questionnaire (Part 4) queried motivation group participants on whether they

felt that a) they actually tried less in the 2nd half, and b) their task performance was subsequently

lower. The results of this section contrast starkly with those of Part 1h (above) in assessing the

effectiveness of the motivation manipulation. Roughly half of participants felt that they were

generally able to try less, and almost all of them did not feel their performance decreased, although

the majority acknowledged the effect of practice. Three participants mentioned that it was

particularly difficult to exert lower effort during the math tasks, as they repeatedly “caught”

themselves working harder.

5.3.3 Convergence of Measures

Because there is no gold standard for mental workload measurement, this study was designed to test

for a relationship between eye movements and workload through a convergence of several measures.




This approach has the goals of demonstrating not only that eye movements were affected by

experimental manipulations, but that these manipulations were actually successful in changing effort.

However, the current study did not yield significant effects or strong trends for every measure,

manipulation, and task type. In terms of eye movement measures, the auditory task was the

only case in which effects were exhibited with some confidence. At face value, the effects of the

motivation and task difficulty manipulations were as predicted, suggesting a negative correlation

between saccade occurrence and effort during the auditory task. But was effort actually manipulated?

For the manipulation of task difficulty during the auditory task, convergence of measures is

acknowledged, but not without caution. Subjective ratings appear to indicate a change in effort, but

these data must be interpreted with the understanding that subjective effort measures may only be

an indication of participants’ perceptions of experimental intent (see Gopher & Donchin, 1986),

rather than actual effortful feelings. The results of the pilot study should be more reliable in this

regard, as the presentation of a numerical scale is intuitively better suited to a qualitative assessment

than a categorical one. Looking back at the pilot study data, participants were almost unanimous in

rating their effort higher for the higher difficulty task. Turning to performance data, there is a clear

distinction in the number of correct responses. If effort regulation is viewed as a control system with

perceived task performance as a feedback variable, as in Robert & Hockey’s (1997) model, then a

performance effect is at least indicative of the potential for a change in effort. That said, it is entirely

possible that at least some participants could try equally hard on both tasks and still achieve different

scores. However, the model speaks to the most common strategy that people employ, which is to

increase their effort in response to performance decrements. There was also a significant effect of

task difficulty on average heart rate. However, this effect was the opposite of what is predicted by the

literature and therefore requires further discussion later on.




There are fewer indications that the manipulation of motivation was successful in changing effort

levels. Half of post-experiment questionnaires from the motivation group did not indicate that the

manipulation was successful in affecting effort. Furthermore, these results should be considered

optimistic, due to participants’ perceptions of the experimental intent. Performance measures also

show very little effect of motivation, but although this result does not bode well, it is very possible

that a change in effort could have no effect on performance, as the auditory task was very likely data-

limited for most people (see Literature Review for explanation of “data-limited”). In other words,

any effort expended beyond a very nominal level would not affect participants’ ability to hear and

understand the words. That said, the lack of any motivation effect on math and fluency

task performance, which should be resource-limited, may indicate that the manipulation was

generally unsuccessful in changing participants’ motivation levels. It could also indicate that the

theorized relationship between effort and performance is simply false. Recall that this view was

expressed by Kahneman (1973).

As with task difficulty, the effect of motivation caused the opposite effect in heart rate compared to

that predicted by previous literature. Heart rate is thought to be linked to effort through a stressor

response that can accompany mental workload, especially where there are perceived consequences to

participants’ performance (see Wilson 1991). However, this link would suggest a positive correlation

of heart rate and effort, rather than the negative correlation observed here. At face value, these

seemingly spurious results lend further support to the consideration of heart rate as an unreliable,

easily confounded measure of effort. However, an alternate explanation is that the decreases in heart

rate are a result of breath-holding, which is known to cause bradycardia (Smith, 1977). This

explanation lends itself best to the effect of auditory task difficulty, as breath-holding could have

been employed as a strategy to minimize breathing noise during the high difficulty version of the




task. In the fluency task, where significant heart rate effects were also observed, breath-holding may

have corresponded with responses, leading to bradycardia where fewer responses were given (high

difficulty level) or the participant was concentrating more intently on the task (high motivation

level). Conceivably, breath-holding occurred during memory searches between each response, just as

one pauses mid-step when they are trying to remember where they left their keys. Although tenable,

confirmation of this theory requires respiratory event data. Therefore, average heart rate trends

cannot be considered to have indicated any associated change in effort level due to motivation or

difficulty.

To summarize the results of this section and previous sections regarding task performance and subjective

effort ratings, with respect to Hypothesis (3): a convergence of measures was strongly demonstrated for

subjective effort ratings, task performance, and eye movements (average saccade rate and portion

long ISI) for difficulty manipulations during the auditory task. Further, a somewhat weaker

convergence was shown for these measures for the motivation manipulation during the auditory task. In

the math and fluency tasks, a weak convergence was also demonstrated between subjective

effort ratings and task performance in the manipulation of task difficulty. However, convergence was

not otherwise demonstrated.

5.3.4 Agreement of Eye Movement Results with Previous Literature

As highlighted in the Literature Review, there have been very few studies that manipulated task

difficulty and/or motivation level while recording eye movements during non-visual tasks. Only

three applicable studies were uncovered. Although they have been discussed more generally above,

the details of their methodology and findings are summarized in Table 5.

Table 5. Summary of Relevant Previous Literature

| Citation | Experimental Manipulation or Comparison | Eye Movement (EM) Measure | Mean Results | Sample Size (# Trials) |
|---|---|---|---|---|
| Klinger et al., 1973 | Math task: Single Digit Addition vs. “Moderately Difficult Problems” | Number of seconds in a 30 second trial that contained at least one EM with a “deflection” > 65° | Increase in number of seconds (does not specify) | 21 (3-8) |
| Ruth & Giambra, 1974 | Verbal fluency task: High Motivation vs. Low Motivation Instructions | Frequency of EM causing deflection > 1 mm (on EOG polygraph readout) | 70-95 EM/minute vs. 45-55 EM/minute | 24 (4) |
| Antrobus, 1973 | Identification of Audio Tones: Single Tone vs. Two-Tone vs. Three-Tone Sequence | Frequency of EM with amplitude > 3° | Decrease in frequency (does not specify) | Not Disclosed |

Although the observations of the current study generally agree with the previous research on

auditory task workload (Antrobus, 1973), they are discordant with those on the math (Klinger et al.,

1973) and fluency (Ruth & Giambra, 1974) tasks. Both of those studies imply a positive correlation of eye

movement rates and workload, whereas no such trend was observed in the current study.

Although this disagreement is certainly cause for concern, the credibility of these previous studies is

subject to some scrutiny.

A very important feature of Ruth & Giambra’s (1974) study is that participants in the high

motivation group were instructed to make their responses aloud to an observer, while those in the

low motivation group only thought about their responses. This difference may have had a large

impact on the observed effect, considering that the act of verbalization could have prompted the

occurrence of saccades, just as it is associated with blinking (Stern et al., 1984). However, their

technique should also be noted as a clever manipulation of participants’ motivation because




participants whose performance is observed will be much more motivated than those who are not.

Such an approach might be considered in future research.

Klinger et al.’s (1973) results are unfortunately very difficult to interpret due to their unusual choice

of summary measure and eye movement amplitude threshold. It is also notable that there was a

much broader range of math problem difficulties presented by Klinger et al. than in the current study.

Whereas Klinger et al. appear to have presented participants with two very different math tasks, the

two difficulty versions in the current study (counting backwards by 1’s and 2’s versus 8’s and 9’s)

were chosen because they are expected to involve very similar cognitive processes and strategies.

However, due to the authors’ brevity in describing the moderate difficulty tasks, it is not possible to

speculate on their specific differences from the single digit addition task.

5.3.5 Eye Movements and Mental Workload

As previously discussed, mental effort has been inferred from the occurrence of saccadic eye

movements in operators engaging in a secondary, non-visual task, though the relationship between

non-visual task workload and eye movements is poorly characterized. The results of the current

study confirm suspicions raised by very early research, indicating that a variety of eye movement

responses can be expected, depending on task characteristics. As has been the approach of previous

studies of non-visual tasks, these various responses can be conceptualized in a model of eye

movement control and mental workload, such as the arousal/interference avoidance model

introduced in the Literature Review section.

It is beyond the scope of the current study to prove or disprove any theorized links between effort

and eye movements. However, their general approach may be problematic because they tend to




imply a “hard-wired” link between eye movements and effort, when such an implication may be

unnecessary and misleading. In looking at the results of the current study alone, without the

influence of any preconceived model or previous results (of which there are precious few), an

alternate, more pragmatic perspective is equally viable.

Earlier in this discussion, it was suggested that the large number of participants reporting the use of

a conscious fixation strategy may be an indication of bias because the goal of the study was to record

“natural” eye movements rather than affected ones. However, another perspective is that the

commonality of this strategy is actually the root cause of any observed link between eye movements

and workload. That is, people consciously or habitually stare when they are listening, and this

behaviour is more prominent when they either have difficulties with hearing a stimulus or when they

are more motivated to hear better. This explanation is not to say that the behaviour is not linked to

actual or perceived interference avoidance, but regarding it as a learned, rather than an intrinsic

response is of practical consequence. For example, with regards to the use of eye movements as a

clinical mental effort tool, this distinction is important because it means that it could never be a

sensitive measure in people who do not exhibit this behavioural pattern. If this is the case, then an

eye movement based measure of effort could join the ranks of many other physiological measures,

which may reliably demonstrate an effect over a sample group, but not necessarily in each individual.

5.4 Conclusions

The current study was successful in reinforcing the results of previous work in which eye movements

were linked to audio-perceptual task workload. In particular, average saccade rate and the

occurrence of long ISI (> 1500 ms) were correlated with task difficulty, and there was a non-

significant relationship suggested between long ISI and participant motivation level. The effect of




motivation appeared to be stronger in the more difficult version of the task, indicating an interaction

effect between difficulty and motivation.

Eye movement measures were not shown to correlate with effort manipulations during math and

verbal fluency tasks, which is at odds with previously documented findings (Antrobus, 1973; Klinger

et al., 1973; Ruth & Giambra, 1974). This disagreement casts some doubt on the success of the

motivation manipulation in changing participants’ effort levels, at least during the fluency task, as

Ruth and Giambra demonstrated a very strong motivation effect.

For all task types, a high degree of variance, both within- and between-participants, was observed.

However, this variability cannot necessarily be attributed to unreliability of the eye movement

measures themselves, as variation in individual participants’ responses to experimental

manipulations could not be ruled out through convergence of measures. In particular, post-

experiment questionnaire results suggested that the manipulation of motivation may have been

unsuccessful in affecting many participants’ effort levels.

At face value, the results of this preliminary study indicate that an effort measurement tool based on

ISI length or average saccade rate would find only narrow application in neuropsychological testing

and rehabilitation treatment. Firstly, between-subjects variance in eye movement effects was very

high, so the universality of the relationship between eye movements and effort may be questionable.

Secondly, there was only one task in which the response appeared to be significant with respect to

this variance: the auditory task. Thirdly, it was found that accurate detection of saccades during non-

visual tasks through current video tracking methods is not a trivial endeavour, especially in people

that have smaller eyes and/or tend to squint.




However, these apparent roadblocks to clinical implementation are only overwhelming when the

needs of all clinical applications are considered together. Although eye movements may not hold

promise as a global measure of workload, one that is appropriate for all applications, there may well

be a niche for it. For example, the findings of the current study offer promise for the use of eye

movements as a diagnostic tool for mild traumatic brain injury. In this capacity, even an effort

measure that is effective for a single domain, such as auditory perception, would provide valuable

information on a client’s cognitive functioning.




Chapter 6. Limitations

The limitations of this study are expressed in terms of its ability to demonstrate the utility of eye

movements as a clinical measure of mental effort.

Use of Non-Clinical Population

It is likely that special considerations and findings may be associated with certain clinical

populations. Therefore, these populations will require specific investigation before the tool could be

employed with them.

Experimental Manipulations of Effort

Although the experimental manipulations of effort through task difficulty and motivation were

carefully developed through the use of pre- and pilot study observations, it is clear that they can be

further improved.

Only Two Levels per Experimental Manipulation

To limit the total experiment duration, only two levels of each experimental

manipulation were used. Demonstrating an effect across three or more levels would not only be

more convincing, but the shape of the response could also speak to the resolution of the tool

and the presence of any ceiling/floor effects.

Limited Effort Range

This limitation is particularly relevant to the math and fluency tasks, where the manipulation of

effort through difficulty was less strongly perceived by participants than with the auditory task. It

has been argued that the subtlety of difficulty manipulations in these tasks may have been




responsible for the lack of any observed effect, a finding that does not agree with the (limited)

previous literature.

Ecological Validity

Whereas the experiment was modelled after neuropsychological testing, there were also a variety of

departures that were deemed necessary for the sake of isolating particular phenomena. Once these

fundamental relationships have been demonstrated, a more ecologically valid experimental method

should

Limited Number of Task Types

The tasks were chosen in the current study because they had been previously associated with very

different eye movement responses to effort. The variety of responses observed in the current study

suggests that any generalization of these results must be approached with caution. Therefore, testing

with a variety of task characteristics is recommended.

Inadequate Assessment of Underlying Mechanisms

This study was designed to demonstrate a relationship between eye movements and effort in the

context of the intended applications, rather than explain it. Although a full understanding of such a

complex phenomenon may not be possible, the clinical use of an eye movement based tool would

require identification of at least its primary factors. Otherwise, it would be very difficult to anticipate

confounding factors.

Small Sample Size

Although any viable clinical effort measurement tool should demonstrate a reliable effect regardless

of sample size, the subtlety of any effort manipulation dictates that even the best planned experiment




will not have an equivalent effect across all participants. A general divergence of measures

indicates that this may have been the case in the current study. Thus, these

experiments can exhibit very high variance, though the measurement itself is sound in principle.

Repeated Measures Design

Although a repeated measures design is advantageous from an experimental point of view, in many

circumstances involving the clinical applications of interest it would be necessary to gauge variations

in effort across subjects. For example, during neuropsychological testing it may be useful to gauge a

client’s effort against some standard. Though indicative, a within-subjects effect does not

immediately confirm a reliable between-subjects effect.




Chapter 7. Extensions

Clinical Applications

It is important to reinforce one of the conclusions of the Literature Review: that researchers have

generally turned from seeking a global measure of workload to seeking measures that are application

specific. Therefore, in further evaluations of eye movements as a clinical effort measure, the first step

should be to determine which neuropsychological and rehabilitation medicine applications are likely

to be compatible with the method. Although the results of the current study recommend tasks

involving auditory perception, it is necessary that future research confirm and build upon these

findings. In particular, a logical extension would be the exploration of other perceptual domains,

such as touch and smell.

After this preliminary work has been completed, a great deal of research will still be necessary

to characterize an eye-based effort measurement tool’s capabilities. Inter- and intra-participant

reliability, sensitivity, ceiling/floor effects, and resolution are all basic measurement tool

characteristics that will need to be assessed. In addition, it will be necessary to investigate whether

certain patient populations are unsuited to the tool, whether because tracking their eyes poses a

technical issue, or because they do not exhibit the same eye movement responses that healthy people

do.

Additionally, the use of mental effort as a diagnostic tool requires research into the validity of the

endeavour itself. As the Literature Review suggests, there have been indications that effort

measurements would be an effective extension to conventional neuropsychological tests, especially

in the detection of very mild impairments. However, the consequences of false diagnoses call for a

very rigorous investigation of any new methodologies.




Eye Movements and General Workload Measurement

The results of the current study suggest that the most promising application of an eye movement

based effort measurement tool is in tasks involving audio perceptual workload. Therefore, it is

predicted that the most gainful research would expand on this particular finding. Considering that

the end goal of this research is to develop a tool that is sensitive to changes in individuals’ effort

rather than trends in a sample population, it would be prudent to adopt an approach that better

controls for individuals’ various responses to experimental manipulations. As it should be expected

that any experimental manipulation, no matter how cleverly posed, will not be successful in affecting

effort in each participant/trial, there is a need to 1) better gauge the success of experimental

manipulations, and 2) better account for any variation in their success.

The current study has demonstrated that subjective effort ratings are one tool for the former

objective. However, where the focus of the experiment is narrowed to audio perceptual workload, it

may be possible to seek other, more specialized measures that suit this particular application. For

example, pupil diameter has been shown to be effective in a previous study of driving and auditory

perception (Recarte & Nunes, 2003).

To account for variations in the success of experimental manipulations, a secondary

measure may be used as a covariate in the analysis of any effects. A similar approach may be to set a

secondary measure threshold for the success of an individual trial, then either omit unsuccessful

trials or carry out a comparison of “unsuccessful” versus “successful” outcomes. Again, the purpose

of this technique would be to distinguish the reliability of the measure from that of the experiment.
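
The thresholding approach described above might be sketched as follows. This is only an illustrative outline, not the analysis used in the current study: the field names (rating_change, isi_change_ms), the threshold value, and the trial data are all hypothetical. Each trial is classified as "successful" or "unsuccessful" according to whether a secondary measure (here, the change in subjective effort rating) reached the threshold, and the mean change in ISI length is then compared between the two groups.

```python
from statistics import mean


def split_by_manipulation_check(trials, threshold):
    """Partition trials by whether the secondary measure reached the threshold."""
    successful, unsuccessful = [], []
    for trial in trials:
        # 'rating_change' is a hypothetical secondary measure: the change in
        # subjective effort rating between the low and high effort conditions.
        if trial["rating_change"] >= threshold:
            successful.append(trial)
        else:
            unsuccessful.append(trial)
    return successful, unsuccessful


def mean_isi_change(trials):
    """Mean change in ISI length (ms) across trials, or None if there are none."""
    if not trials:
        return None
    return mean(t["isi_change_ms"] for t in trials)


# Hypothetical data, invented purely for illustration.
trials = [
    {"participant": "P01", "rating_change": 3, "isi_change_ms": 120.0},
    {"participant": "P02", "rating_change": 0, "isi_change_ms": -10.0},
    {"participant": "P03", "rating_change": 2, "isi_change_ms": 80.0},
    {"participant": "P04", "rating_change": 1, "isi_change_ms": 10.0},
]

successful, unsuccessful = split_by_manipulation_check(trials, threshold=2)
print("successful:", mean_isi_change(successful))
print("unsuccessful:", mean_isi_change(unsuccessful))
```

A large ISI change in the "successful" group alongside a negligible change in the "unsuccessful" group would suggest that weak overall effects reflect the manipulation rather than the measure. The choice of threshold is itself an assumption that would need to be justified before any data are examined.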




Improvements to the Experimental Manipulations of Effort

To the same end, it is also necessary to take what has been learned from the current study and

improve the experimental manipulations themselves. The recommendations of incorporating more

levels of manipulation and, in some cases, a greater range of effort have already been outlined in the

Limitations section. Looking specifically to the motivation manipulation, which is a particularly

difficult experimental goal, there are a number of other suggestions.

First, the method of Ruth and Giambra (1974) has already been cited as an interesting approach. They

affected participants’ motivation levels not only through the wording of instructions but also in

having motivated participants give their responses verbally, thus introducing the pressure of “being

observed.” Although this method may have had the disadvantage of an unbalanced effect of

verbalization on eye movements, the general concept could also have been implemented with a

non-verbal response such as a conspicuous button press. Alternatively, participants may be asked

to verbalize in both groups, but only the motivation group is accompanied by an observer.

With respect to the monetary incentive technique, it was noted that some participants did not report

a strong change in motivation because the reward was neither guaranteed nor immediate. In light of

this observation, an alternative method is recommended, wherein participants are rewarded for each

correct response they give, and the money is promised immediately after the study. In the current

study, this approach was considered but not used because of the technical issues involved in

counting responses during the experiment rather than transcribing them afterwards. If the tallying of

responses is possible during future experiments, then this method would be highly recommended

where very high motivation levels are desired. Recall that where a repeated measures design is used, this

method would not preclude the issue of intrinsically high motivation during the pre-treatment phase




and in the control group. Simply asking participants to try very little has been shown to be effective

in some participants in the current study, but may pose a problem for others, especially during very

high difficulty tasks. It should also be noted that this motivation technique introduces an ethical

concern because participants are potentially being compensated different amounts for the same time

volunteered. This issue could be easily circumvented by “topping up” everyone to a pre-determined

level when the experiment has finished.
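
The per-response reward with a final "top-up" can be illustrated with a short sketch. The reward rate, the target amount, and the participant tallies below are hypothetical values chosen only for illustration; amounts are held in whole cents to avoid rounding error.

```python
def compute_payments(correct_counts, rate_cents, target_cents):
    """Return per-participant earnings, top-up, and total, all in cents."""
    payments = {}
    for participant, n_correct in correct_counts.items():
        earned = n_correct * rate_cents
        # Top up only when performance-based earnings fall short of the target.
        top_up = max(0, target_cents - earned)
        payments[participant] = {
            "earned": earned,
            "top_up": top_up,
            "total": earned + top_up,
        }
    return payments


# Hypothetical tallies of correct responses per participant.
correct_counts = {"P01": 30, "P02": 12}
payments = compute_payments(correct_counts, rate_cents=50, target_cents=2000)
for participant, amounts in payments.items():
    print(participant, amounts)
```

For every participant to receive the same total, the pre-determined level would need to be set at or above the maximum amount that could be earned through correct responses; otherwise a high-scoring participant would still finish above the others.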

Considering the complexity of manipulating motivation, it is tempting to abandon it in future

investigations when it has been implied that the same variable (effort) is affected through a

change in task difficulty. However, it must be clearly recognized that the concept of effort as a

unifying construct for a number of possibly independent phenomena should be considered a matter

of convenience rather than fact. The most prudent recommendation is that the experimental

manipulation should closely mirror the intended application, rather than represent some general

effort condition. This practical point emphasizes the importance of carefully defining the application

in question.

Underlying Basis for Eye Movement – Audio Perceptual Workload Relationship

In the Literature Review, the approach of some previous literature was criticised for not considering

the mechanisms behind the various phenomena that were documented. In the current study, a

relationship between saccadic eye movement occurrence and audio perceptual workload has been

observed, and it is recommended that a better understanding of this relationship be

sought. In the Results and Discussion section of the full-scale study, a pragmatic explanation for the

observed effect was introduced. Briefly, it was suggested that the observation may simply reflect a

learned behavioural pattern wherein people tend to stare when they are straining to listen.




Importantly, this approach is at odds with theories of a “hard-wired” relationship between eye

movement inhibition and cognitive load. Not only is the behavioural explanation less general in

terms of predicting responses to various task types, it also implies that the effect could vary more

widely between people. Certainly, a wide variety of responses were observed in the current study,

both between task characteristics and individual participants. Based on the results of the current

study, future research should test this perspective.

Eye Movements and Aging

The relationship between eye movements and auditory perception could also stimulate an entirely

different line of research into perceptual deficits. If the effects observed in the current study are

taken to suggest an interference avoidance strategy during auditory perception, it would be

interesting to investigate any bearing that this effect may have on listeners’ performance. Put simply, is a

person’s ability to concentrate on auditory perception reflected in their eye movements? In

particular, the study of aging populations may provide insights into theories of inhibition and aging.

Hasher et al. (2007) have suggested that a failure to eliminate task-irrelevant information, a process termed

deletion, is an important component of decline. Therefore, eye movements may be a meaningful

measure by which to gauge the failure of inhibitory mechanisms.




References

Amadeo, M., & Shagrass, M. D. (1963). Eye-movements, attention, and hypnosis. Journal of

Nervous and Mental Disease, 136(2), 139-145.

Andreassi, J. L. (1973). Alpha and problem solving: A demonstration. Perceptual and Motor Skills,

46, 905-906.

Andreassi, J. L. (2000). Human behaviour and physiological response (4th ed.). Mahwah, NJ:

Lawrence Erlbaum Associates Inc.

Annett, J. (2002). Subjective rating scales: Science or art? Ergonomics, 45(14), 966.

Antrobus, J. S., Antrobus, J. S., & Singer, J. L. (1964). Eye movements accompanying daydreaming,

visual imagery, and thought suppression. Journal of Abnormal Psychology, 69, 244-252.

Antrobus, J. S. (1973). Eye movements and nonvisual cognitive tasks. In Zikmund, V. (Ed.), The

oculomotor system and brain functions: Proceedings of the international symposium held at

Smolenice 19-22 October, 1970. London: Butterworth.

Backs, R. W., & Seljos, K. A. (1994). Metabolic and cardiorespiratory measures of mental effort: The

effects of level of difficulty in a working memory task. International Journal of Psychophysiology,

16(1), 57-68.

Bagley, J., & Manelis, L. (1979). Effect of awareness on an indicator of cognitive load. Perceptual

and Motor Skills, 49(2), 591-594.

Bahill, A. T., Clark, M. R., & Stark, L. (1975). The main sequence, a tool for studying human eye

movements. Mathematical Biosciences, 24(3-4), 191-204.

Bailey, C. M., Echemendia, R. J., & Arnett, P. A. (2006). The impact of motivation on

neuropsychological performance in sports-related mild traumatic brain injury. Journal of the

International Neuropsychological Society, 12(4), 475-484.

Becker, W., & Fuchs, A. F. (1969). Further properties of the human saccadic system: Eye

movements and correction saccades with and without visual fixation points. Vision Research,

9(10), 1247-1258.

Beda, A., Jandre, F. C., Phillips, D. I. W., Giannella-Neto, A., & Simpson, D. M. (2007). Heart-rate

and blood-pressure variability during psychophysiological tasks involving speech: Influence of

respiration. Psychophysiology, 44(5), 767-778.

Bergstrom, K. J., & Hiscock, M. (1988). Factors influencing ocular motility during the performance of

cognitive tasks. Canadian Journal of Psychology, 42(1), 1-23.

Berguer, R., Smith, W. D., & Chung, Y. H. (2001). Performing laparoscopic surgery is significantly

more stressful for the surgeon than open surgery. Surgical Endoscopy, 15(10), 1204.

Berka, C., Levendowski, D. J., Lumicao, M. N., Yau, A., Davis, G., Zivkovic, V. T., Olmstead, R.E.,

Tremoulet, P.D., & Craven, P.L. (2007). EEG correlates of task engagement and mental workload

in vigilance, learning, and memory tasks. Aviation Space and Environmental Medicine, 78(5),

B231-B244.

Bianchini, K. J., Mathias, C. W., & Greve, K. W. (2001). Symptom validity testing: A critical review.

Clinical Neuropsychologist, 15(1), 19-45.




Blanchard, H. E. (1985). A comparison of some processing time measures based on eye

movements. Acta Psychologica, 58(1), 1-15.

Borkowski, J. G., Benton, A. L., & Spreen, O. (1967). Word fluency and brain damage.

Neuropsychologia, 5(2), 135-140.

Brookings, J. B., & Damos, D. L. (1991). Individual differences in multiple-task performance. In D. L.

Damos (Ed.), Multiple-task performance (pp. 363-386). London: Taylor & Francis.

Brookings, J. B., Wilson, G. F., & Swain, C. R. (1996). Psychophysiological responses to changes in

workload during simulated air traffic control. Biological Psychology, 42(3), 361-377.

Cain, B. (2007). Review of the mental workload literature. Report #RTO-TR-HFM-121-Part-II.

Defense Research and Development Canada, Toronto.

Callan, D. J. (1998). Eye movement relationships to excessive performance error in aviation.

Proceedings of the Human Factors and Ergonomics Society, 2, 1132-1136.

Cardall, A. J. (1943). Purdue pegboard. Oxford, England: Science Research Associates.

Carpenter, P. A., & Just, M. A. (1978). Eye fixation during mental rotation. In J. W. Senders, D. F.

Fisher & R. A. Monty (Eds.), Eye movements and the higher psychological functions (pp. 115-133). L. Erlbaum Associates.

Carpenter, R. H. S. (1988). Movements of the eyes. (2nd ed.). London: Pion.

Carpenter, R. H. S. (1991). Eye movements. Boca Raton: CRC Press.

Chapman, P. R., & Underwood, G. (1998). Visual search of dynamic scenes: Event types and the

role of experience in viewing driving situations. In G. Underwood (Ed.), Eye guidance in reading

and scene perception (pp. 369-393). Amsterdam: Elsevier.

Cronbach, L. J., & Furby, L. (1970). How we should measure "change" - or should we?

Psychological Bulletin, 74(1), 68-80.

Cohen, A. (1977). Is the duration of an eye fixation a sufficient criterion referring to information input.

Perceptual and Motor Skills, 45, 766.

de Waard, D. (1996). The measurement of drivers' mental workload. Traffic Research Centre,

University of Groningen.

Doctor, R. F., Kaswan, J. W., & Nakamura, C. Y. (1964). Spontaneous heart rate and GSR changes

as related to motor performance. Psychophysiology, 1(1), 73-78.

Duchowski, A. T. (2007). Eye tracking methodology: Theory and practice (2nd

ed.). London: Springer.

Egeland, J., Sundet, K., Rund, B. R., Asbjornsen, A., Hugdahl, K., Landro, N. I., Lund, A., Roness,

A., & Stordal, K.I. (2003). Sensitivity and specificity of memory dysfunction in schizophrenia: A

comparison with major depression. Journal of Clinical and Experimental Neuropsychology, 25(1),

79-93.

Eggemeier, F. T., Crabtree, M. S., Zingg, J. J., Reid, G. B., & Shingledecker, C. A. (1982).

Subjective workload assessment in a memory update task. Proceedings of the 26th Human

Factors Society Annual Meeting.

Eggemeier, F. T., & Wilson, G. F. (1991). Performance-based and subjective assessment of

workload in multi-task environments. In D. L. Damos (Ed.), Multiple-task performance. (pp. 217-

278). London: Taylor & Francis.




Eggemeier, F. T., Wilson, G. F., Kramer, A. F., & Damos, D. L. (1991). Workload assessment in

multi-task environments. In D. L. Damos (Ed.), Multiple-task performance (pp. 207-216). London:

Taylor & Francis.

Ehrlichman, H., & Barrett, J. (1983). ‘Random’ saccadic eye movements during verbal-linguistic and

visual-imaginal tasks. Acta Psychologica, 53(1), 9-26.

Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the

acquisition of expert performance. Psychological Review, 100(3), 363-406.

Falkmer, T., & Gregersen, N. P. (1999). System for driver training and assessment using interactive

evaluation tools and reliable methodologies. Report #GRD1-1999-10024. Transport Resource

Knowledge Centre.

Farmer, E., & Brownson, A. (2003). Review of workload measurement, analysis and interpretation

methods. Report #CARE-Integra-TRS-130-02-WP2. Brussels: European Organization for the

Safety of Air Navigation (Eurocontrol).

Field, A. P. (2005). Discovering statistics using SPSS (2nd ed.). London: SAGE.

Filin, V. A. (2002). Saccade automaticity and pursuing eye movement.

Twenty-Fifth European Conference on Visual Perception, Glasgow, Scotland, 31(Supplement).

Findlay, J. M., & Gilchrist, I. D. (1998). Eye guidance and visual search. In G. Underwood (Ed.), Eye

guidance in reading and scene perception (pp. 295-312). Oxford, UK: Elsevier Science Ltd.

Findlay, J. M., & Kapoula, Z. (1992). Scrutinization, spatial attention, and the spatial programming of

saccadic eye movements. Quarterly Journal of Experimental Psychology A, 45(4), 633-647.

Fischer, B., & Breitmeyer, B. (1987). Mechanisms of visual attention revealed by saccadic eye

movements. Neuropsychologia, 25(1A), 73-83.

Fischer, B., Gezeck, S., & Hartnegg, K. (1997). The analysis of saccadic eye movements from gap

and overlap paradigms. Brain Research Protocols, 2(1), 47-52.

Fitts, P. M., Jones, R. E., & Milton, J. L. (1950). Eye movements of aircraft pilots during instrument-landing approaches. Aeronautical Engineering Review, 9(2), 24–29.

Fowles, D. C. (1988). Psychophysiology and psychopathology: A motivational approach.

Psychophysiology, 25(4), 373-391.

Fuchs, A. F., Kaneko, C. R., & Scudder, C. A. (1985). Brainstem control of saccadic eye movements.

Annual Review of Neuroscience, 8, 307-337.

Garbutt, S., Harwood, M. R., & Harris, C. M. (2001). Comparison of the main sequence of reflexive

saccades and the quick phases of optokinetic nystagmus. The British Journal of Ophthalmology,

85(12), 1477-1483.

Gauggel, S., & Billino, J. (2002). The effects of goal setting on the arithmetic performance of brain-

damaged patients. Archives of Clinical Neuropsychology, 17(3), 283-294.

Gauggel, S., & Fischer, S. (2001). The effect of goal setting on motor performance and motor

learning in brain-damaged patients. Neuropsychological Rehabilitation, 11(1), 33-44.

Gauggel, S., Hoop, M., & Werner, K. (2002). Assigned versus self-set goals and their impact on the

performance of brain-damaged patients. Journal of Clinical and Experimental Neuropsychology,

24(8), 1070-1080.

Gauggel, S., Wietasch, A., Bayer, C., & Rolko, C. (2000). The impact of positive and negative

feedback on reaction time in brain-damaged patients. Neuropsychology, 14(1), 125-133.




Gendolla, G. H. E., & Richter, M. (2005). Ego involvement and effort: Cardiovascular, electrodermal,

and performance effects. Psychophysiology, 42(5), 595-603.

Giolma, J. P., & Lyne, J. E. (1984). Identification and characterization of rapid eye movements by

computer. Midwest Symposium on Circuits & Systems, St. Louis, Missouri.

Goonetilleke, R.S., & Luximon, A. (2001). Simplified subjective workload assessment technique.

Ergonomics, 44(3), 229.

Gopher, D., & Braune, R. (1984). On the psychophysics of workload: Why bother with subjective

measures? Human Factors, 26(5), 519-532.

Gopher, D., & Donchin, E. (1986). Workload: An examination of the concept. In K. R. Boff, & L.

Kaufman (Eds.), Handbook of perception and human performance. Oxford, England: John Wiley

& Sons.

Gopher, D. (1973). Eye-movement patterns in selective listening tasks of focused attention.

Perception & Psychophysics, 14(2), 259-264.

Gorissen, M., Sanz, J. C., & Schmand, B. (2005). Effort and cognition in schizophrenia patients.

Schizophrenia Research, 78(2-3), 199-208.

Gould, J. D. (1973). Eye movements during visual search and memory search. Journal of

Experimental Psychology, 98(1), 184-195.

Green, R. E., Melo, B., Christensen, B., Ngo, L., & Skene, C. (2006). Evidence of transient

enhancement to cognitive functioning in healthy young adults through environmental enrichment:

Implications for rehabilitation after brain injury. Brain and Cognition, 60(2), 201-203.

Greene, H. H., & Rayner, K. (2001). Eye movements and familiarity effects in visual search. Vision

Research, 41(27), 3763.

Harbluk, J. L., & Noy, Y. I. (2002). The impact of cognitive distraction on driver visual behaviour and

vehicle control. Report #1388E. Transport Canada.

Hasher, L., Lustig, C., & Zacks, R. T. (2007). Inhibitory mechanisms and the control of attention. In A.R. Conway, C. Jarrold, M. J. Kane, A. Miyake, & J. N. Towse (Eds.), Variation in working memory

(pp. 227-249). New York, NY: Oxford University Press.

Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of

Experimental Psychology: General, 108(3), 356-388.

Hedeker, D. R., & Gibbons, R. D. (2006). Longitudinal data analysis. Hoboken, NJ: Wiley-

Interscience.

Hendy, K. C., Hamilton, K. M., & Landry, L. N. (1993). Measuring subjective workload: When is one

scale better than many? Human Factors, 35(4), 579-602.

Hill, S. G., Iavecchia, H. P., Byers, J. C., Bittner, A. C., Zaklad, A. L., & Christ, R. E. (1992).

Comparison of four subjective workload rating scales. Human Factors, 34(4), 429-439.

Hillburn, B. G. (1997). Free flight and air traffic controller mental workload. Ninth International

Symposium on Aviation Psychology. Columbus, Ohio.

Hinton, J. W. (1982). Ocular responses to meaningful visual stimuli and their psychological

significance. In R. Groner, & P. Fraisse (Eds.) (pp. 204-212). Amsterdam: North Holland.

Holland, M. K., & Tarlow, G. (1972). Blinking and mental load. Psychological Reports, 31(1), 119-

127.




Holland, M. K., & Tarlow, G. (1975). Blinking and thinking. Perceptual and Motor Skills, 41(2), 403-

406.

Hood, J. D. (1975). Observations upon role of peripheral retina in execution of eye-movements.

ORL: Journal for Oto-Rhino-Laryngology and its Related Specialties, 37(2), 65-73.

Hooge, I. T. C., & Erkelens, C. J. (1998). Adjustment of fixation duration in visual search. Vision

Research, 38(9), 1295-1302.

Johansson, R. S., Westling, G., & Backstrom, A. (2001). Eye-hand coordination in object

manipulation. Journal of Neuroscience, 21(17), 6917-6932.

Jorna, P. G. A. M. (1992). Spectral analysis of heart rate and psychological state: A review of its

validity as a workload index. Biological Psychology, 34, 237-257.

Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.

Kennedy, D. O., & Scholey, A. B. (2000). Glucose administration, heart rate and cognitive

performance: Effects of increasing mental effort. Psychopharmacology, 149(1), 63.

Kessels, R. P. C., Ruis, C., & Kappelle, L. J. (2007). The impact of self-reported depressive

symptoms on memory function in neurological outpatients. Clinical Neurology and Neurosurgery, 109(4), 323-326.

Kettunen, J., & Ravaja, N. (2000). A comparison of different time series techniques to analyze

phasic coupling: A case study of cardiac and electrodermal activity. Psychophysiology, 37(4), 395.

Klinger, E., Gregoire, K. C., & Barta, S. G. (1973). Physiological correlates of mental activity: Eye

movements, alpha, and heart rate during imagining, suppression, concentration, search, and

choice. Psychophysiology, 10(5), 471-477.

Klingner, J., Kumar, R., & Hanrahan, P. (2008). Measuring the task-evoked pupillary response with a

remote eye tracker. Proceedings of the 2008 Symposium on Eye Tracking Research &

Applications. New York, NY: ACM.

Kramer, A. F. (1991). Physiological metrics of mental workload: A review of recent progress. In D. L. Damos (Ed.), Multiple-task performance (pp. 279-328). London: Taylor & Francis.

Kwakkel, G. (2006). Impact of intensity of practice after stroke: Issues for consideration. Disability

and Rehabilitation, 28(13-14), 823-830.

Lawrence, S., Robert, K., Susan, S., Derek, H., & Bruce, B. (1976). Saccadic suppression of image

displacement. Vision Research, 16(10), 1185-1187.

Layne, C. (1980). Motivational deficit in depression: People's expectations × outcomes' impacts.

Journal of Clinical Psychology, 36(3), 647-652.

Leigh, R. J., & Kennard, C. (2004). Using saccades as a research tool in the clinical neurosciences.

Brain: A Journal of Neurology, 127(3), 460-477.

Leigh, R. J., & Zee, D. S. (1999). The neurology of eye movements (3rd ed.). New York:

Oxford University Press.

Lorens, S. A., & Darrow, C. W. (1962). Eye movements, EEG, GSR, and EKG during mental

multiplication. Electroencephalography and Clinical Neurophysiology, 14, 739-746.

Lynch, W. J. (2004). Determination of effort level, exaggeration, and malingering in neurocognitive

assessment. The Journal of Head Trauma Rehabilitation, 19(3), 277-283.

Malmo, R. B. (1965). Finger-sweat prints in the differentiation of high and low incentive.

Psychophysiology, 1(3), 231-240.




Matessa, M., & Remington, R. (2005). Eye movements in human performance modeling of space

shuttle operations. 49th Annual Meeting of the Human Factors and Ergonomics Society. Human

Factors and Ergonomics Society Inc.

Mitchell, D. B., & Hunt, R. R. (1989). How much "effort" should be devoted to memory? Memory &

Cognition, 17(3), 337-348.

Moffitt, K. (1980). Evaluation of the fixation duration in visual search. Perception and Psychophysics,

27(4), 370-372.

Moray, N. (1986). Monitoring behavior and supervisory control. In K. R. Boff, L. Kaufman, & J. P.

Thomas (Eds.), Handbook of perception and human performance. Wiley-Interscience.

Moray, N. (1988). Mental workload since 1979. In D. J. Oborne (Ed.), International review of

ergonomics (pp. 123-150). London: Taylor & Francis.

Mousseau, M. B. (2004). The onset and effect of cognitive fatigue on simulated sport performance.

Kinesiology Abstracts, 17(2), 33-34.

Mulder, G. (1979). Sinus arrhythmia and mental workload. In N. Moray (Ed.), Mental workload: Its

theory and measurement (pp. 327-343). New York: Plenum Press.

Mulder, G. (1986). The concept and measurement of mental effort. In G. M. Hockey, A. W. K.

Gaillard, & M. G. H. Coles (Eds.), Energetics and human information processing (pp. 175-198).

Dordrecht: Martinus Nijhoff.

Mulder, L. J. M. (1992). Measurement and analysis methods of heart rate and respiration for use in

applied environments. Biological Psychology, 34(2-3), 205-236.

Myung, R., & Ryu, K. (2005). Evaluation of mental workload with a combined measure based on

physiological indices during a dual task of tracking and mental arithmetic. International Journal of

Industrial Ergonomics, 35(11), 991-1009.

Naccache, L., Dehaene, S., Cohen, L., Habert, M. O., Guichart-Gomez, E., Galanaud, D., & Willer,

J.C. (2005). Effortless control: Executive attention and conscious feeling of mental effort are dissociable. Neuropsychologia, 43(9), 1318-1328.

Navon, D. (1984). Resources--a theoretical soup stone? Psychological Review, 91, 216-234.

Neumann, D., & Lipp, O. (2002). Spontaneous and reflexive eye activity measures of mental

workload. Australian Journal of Psychology, 54(3), 174.

Noel, J. B., Bauer, K. W., & Lanning, J. W. (2005). Improving pilot mental workload classification

through feature exploitation and combination: A feasibility study. Computers & Operations

Research, 32(10), 2713-2730.

Norman, D. A., & Bobrow, D. G. (1975). Data-limited and resource-limited processes. Cognitive

Psychology, 7(1), 44-64.

Oddy, M., Cattran, C., & Wood, R. (2008). The development of a measure of motivational changes

following acquired brain injury. Journal of Clinical and Experimental Neuropsychology, 30(5), 568-

575.

O'Donnell, R., & Eggemeier, F. T. (1986). Workload assessment methodology. In K. R. Boff, L.

Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance. Wiley-

Interscience.

Ohira, H. (1996). Eyeblink activity in a word-naming task as a function of semantic priming and

cognitive load. Perceptual and Motor Skills, 82(3), 835.




Oohira, A., Zee, D. S., & Guyton, D. L. (1991). Disconjugate adaptation to long-standing, large-

amplitude, spectacle-corrected anisometropia. Investigative Ophthalmology & Visual Science,

32(5), 1693-1703.

Paas, F. G. W. C., Van Merrienboer, J. J. G., & Adam, J. J. (1994). Measurement of cognitive load in

instructional research. Perceptual and Motor Skills, 79(1), 419.

Papadelis, C., Kourtidou-Papadeli, C., Bamidis, P., & Albani, M. (2007). Effects of imagery training

on cognitive performance and use of physiological measures as an assessment tool of mental

effort. Brain and Cognition, 64(1), 74-85.

Pashler, H. E. (1998). The psychology of attention. Cambridge, MA: MIT Press.

Podsakoff, P. M., & Farh, J. L. (1989). Effects of feedback sign and credibility on goal setting and

task performance. Organizational Behavior & Human Decision Processes, 44(1), 45-68.

Polley, D. B., Steinberg, E. E., & Merzenich, M. M. (2006). Perceptual learning directs auditory

cortical map reorganization through top-down influences. Journal of Neuroscience, 26(18), 4970-

4982.

Porges, S. W., & Byrne, E. A. (1992). Research methods for measurement of heart-rate and

respiration. Biological Psychology, 34(2-3), 93-130.

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32(1), 3-

25.

Pribram, K. H., & McGuinness, D. (1975). Arousal, activation, and effort in control of attention.

Psychological Review, 82(2), 116-149.

Prokasy, W. F., & Raskin, D. C. (Eds.). (1973). Electrodermal activity in physiological research. New

York: Academic Press.

Rahimi, M., Big, R. P., & Thom, D. R. (1990). A field evaluation of driver eye and head movement

strategies toward environmental targets and distracters. Applied Ergonomics, 21(4), 267-274.

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372-422.

Recarte, M. A., & Nunes, L. M. (2000). Effects of verbal and spatial-imagery tasks on eye fixations

while driving. Journal of Experimental Psychology: Applied, 6(1), 31-43.

Recarte, M. A., & Nunes, L. M. (2003). Mental workload while driving: Effects on visual search,

discrimination, and decision making. Journal of Experimental Psychology: Applied, 9(2), 119-137.

Rees, L. M., Tombaugh, T. N., Gansler, D. A., & Moczynski, N. P. (1998). Five validation

experiments of the test of memory malingering (TOMM). Psychological Assessment, 10(1), 10-20.

Richards, P. M., & Ruff, R. M. (1989). Motivational effects on neuropsychological functioning:

Comparison of depressed versus nondepressed individuals. Journal of Consulting and Clinical

Psychology, 57(3), 396-403.

Riese, H. (1999). Mental fatigue after very severe closed head injury: Sustained performance,

mental effort, and distress at two levels of workload in a driving simulator. Neuropsychological

Rehabilitation, 9(2), 189.

Robert, G., & Hockey, J. (1997). Compensatory control in the regulation of human performance

under stress and high workload: A cognitive-energetical framework. Biological Psychology, 45(1-

3), 73-93.




Roscoe, A. H. (1992). Assessing pilot workload - why measure heart-rate, HRV and respiration.

Biological Psychology, 34(2-3), 259-287.

Rubio, S., Díaz, E., & Martín, J. (2004). Evaluation of subjective mental workload: A comparison of

SWAT, NASA-TLX, and workload profile methods. Applied Psychology: An International Review,

53(1), 61-86.

Ruff, R. M. (1985). San Diego neuropsychological test battery (manual). San Diego: San Diego

University.

Ruth, J. S., & Giambra, L. M. (1974). Eye movements as a function of attention and rate of change in

thought content. Perceptual and Motor Skills, 39, 475-480.

Salthouse, T. A. (2006). Mental exercise and mental aging: Evaluating the validity of the "use it or

lose it" hypothesis. Perspectives on Psychological Science, 1(1), 68-87.

Sanders, A. F. (1997). A summary of resource theories from a behavioral perspective. Biological

Psychology, 45(1-3), 5-18.

Sanders, A. F. (1983). Towards a model of stress and human performance. Acta Psychologica,

53(1), 61-97.

Schagen, S., Schmand, B., deSterke, S., & Lindeboom, J. (1997). Amsterdam short-term memory

test: A new procedure for the detection of feigned memory deficits. Journal of Clinical and

Experimental Neuropsychology, 19(1), 43-51.

Schooler, C. (1987). Cognitive effects of complex environments during the life-span: A review and

theory. In C. Schooler, & S. K. Warner (Eds.), Cognitive functioning and social structure over the

life course (pp. 29-49). Ablex.

Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye movements and

spatial attention. The Quarterly Journal of Experimental Psychology, 38(3), 475-491.

Sigman, M., & Coles, P. (1980). Visual scanning during pattern recognition in children and adults.

Journal of Experimental Psychology, 30, 267-276.

Singer, J. L., & Antrobus, J. S. (1965). Eye movements during fantasies. Archives of General

Psychiatry, 12, 71-76.

Singer, J. L., Greenberg, S., & Antrobus, J. S. (1971). Looking with the mind's eye. Transactions of

the New York Academy of Science, 33(2), 694-709.

Sirevaag, E. J., & Stern, J. A. (1999). Ocular measures of fatigue and cognitive factors. In R. W.

Backs, & W. Boucsein (Eds.), Engineering psychophysiology: Issues and applications (pp. 267-

287). Mahwah, NJ.: Lawrence Erlbaum Associates, Inc.

Smith, R. M., & Hong, S. K. (1977). Heart rate response to breath holding at 18.6 ATA. Respiration

Physiology, 30(1-2), 69-79.

Stelmach, L. B., Campsall, J. M., & Herdman, C. M. (1997). Attentional and ocular movements.

Journal of Experimental Psychology: Human Perception and Performance, 23(3), 823-844.

Stern, J. A., Boyer, D., Schroeder, D., Touchstone, M., & Stoliarov, N. (1994). Blinks, saccades, and

fixation pauses during vigilance task performance. 1: Time on task. Report # DOTFAAAM9426.

St. Louis, MO. Dept. of Civil Engineering; United States: Washington Univ.

Stern, J. A., Boyer, D. J., Schroeder, D. J., Touchstone, R. M., & Stoliarov, N. (1996). Blinks,

saccades, and fixation pauses during vigilance task performance. 2: Gender and time of day.

Report # DOTFAAAM969. St. Louis, MO. Dept. of Psychology; United States: Washington Univ.




Stern, J. A., Walrath, L. C., & Goldstein, R. (1984). The endogenous eyeblink. Psychophysiology,

21(1), 22-33.

Stern, R. M. (2001). In Ray W. J., & Quigley K. S. (Eds.), Psychophysiological recording (2nd ed.).

New York: Oxford University Press.

Sternberg, S. (1969). Memory-scanning: Mental processes revealed by reaction-time experiments.

American Scientist, 57, 421-457.

Storm, H., Fremming, A., Ødegaard, S., Martinsen, Ø G., & Mørkrid, L. (2000). The development of

a software program for analyzing spontaneous and externally elicited skin conductance changes

in infants and adults. Clinical Neurophysiology, 111(10), 1889-1898.

Surwillo, W. W., & Quilter, R. E. (1965). The relation of frequency of spontaneous skin potential

responses to vigilance and to age. Psychophysiology, 1, 272-276.

Tole, J. R., Stephens, A. T., Harris, R. L., Sr., & Ephrath, A. R. (1982). Visual scanning behavior and

mental work load in aircraft pilots. Aviation, Space, and Environmental Medicine, 53(1), 54-61.

Tsai, Y., Viirre, E., Strychacz, C., Chase, B., & Jung, T. (2007). Task performance and eye activity:

Predicting behavior relating to cognitive workload. Aviation, Space, and Environmental Medicine,

78(5), B176-185.

Tsang, P. S., & Vidulich, M. A. (1987). Absolute magnitude estimation and relative judgment

approaches to subjective workload assessment. Proceedings of the Human Factors Society 31st

Annual Meeting.

Tsang, P. S., & Vidulich, M. A. (2006). Mental workload and situation awareness. In G. Salvendy

(Ed.), Handbook of human factors and ergonomics (3rd ed.). New York: Wiley.

Unema, P. J. A. (1995). Eye movements and mental effort. (Ph.D., Verlag Shaker).

Van der Stigchel, S., & Theeuwes, J. (2005). The influence of attending to multiple locations on eye

movements. Vision Research, 45(15), 1921-1927.

Van Orden, K. F., Jung, T. P., & Makeig, S. (2000). Combined eye activity measures accurately estimate changes in sustained visual task performance. Biological Psychology, 52(3), 221-240.

Van Orden, K. F., Limbert, W., Makeig, S., & Jung, T. P. (2001). Eye activity correlates of workload

during a visuospatial memory task. Human Factors, 43(1), 111-121.

Van Zandvoort, M. J. E., Kappelle, L. J., Algra, A., & De Haan, E. H. F. (1998). Decreased capacity

for mental effort after single supratentorial lacunar infarct may affect performance in everyday life.

Journal of Neurology, Neurosurgery, and Psychiatry, 65(5), 697-702.

Velichkovsky, B. M., Domhoefer, S. M., Pannasch, S., & Unema, P. J. A. (2000).

Visual fixations and level of attentional processing. Proceedings of the International Conference

Eye Tracking Research and Applications, Palm Beach Gardens, FL.

Veltman, J. A., & Gaillard, A. W. K. (1993). Measurement of pilot workload with subjective and

physiological techniques. Proceedings of Workload Assessment and Aviation Safety 3.1-3.13.

Veltman, J. A., & Gaillard, A. W. K. (1996). Physiological indices of workload in a simulated flight

task. Biological Psychology, 42(3), 323-342.

Veltman, J. A., & Gaillard, A. W. K. (1998). Physiological workload reactions to increasing levels of

task difficulty. Ergonomics, 41(5), 656.




Vickery, C. D., Berry, D. T. R., Inman, T. H., Harris, M. J., & Orey, S. A. (2001). Detection of

inadequate effort on neuropsychological testing: A meta-analytic review of selected procedures.

Archives of Clinical Neuropsychology, 16(1), 45-73.

Vidulich, M. A., & Wickens, C. D. (1986). Causes of dissociation between subjective workload

measures and performance - caveats for the use of subjective assessments. Applied Ergonomics,

17(4), 291-296.

Viirre, E., Cadera, W., & Vilis, T. (1987). The pattern of changes produced in the saccadic system

and vestibuloocular reflex by visually patching one eye. Journal of Neurophysiology, 57(1), 92-

103.

Vossel, G., & Zimmer, H. (1990). Psychometric properties of non-specific electrodermal response

frequency for a sample of male students. International Journal of Psychophysiology, 10(1), 69-73.

Walker, R., Walker, D. G., Husain, M., & Kennard, C. (2000). Control of voluntary and reflexive

saccades. Experimental Brain Research, 130(4), 540.

Washburn, D. A., & Putney, R. T. (2001). Attention and task difficulty: When is performance

facilitated? Learning and Motivation, 32, 36-47.

Wechsler, D. (1955). Wechsler adult intelligence scale. New York: Psychological Corp.

Weiner, J. M., & Ehrlichman, H. (1976). Ocular motility and cognitive process. Cognition, 4(1), 31-43.

Wickens, C. D. (1991). Processing resources and attention. In D. L. Damos (Ed.), Multiple task

performance. London: Taylor & Francis.

Wickens, C. D., & Hollands, J. G. (2000). Engineering psychology and human performance (3rd ed.).

Upper Saddle River, NJ: Prentice Hall.

Wickens, C. D. (1984). Processing resources in attention. In R. Parasuraman, & D. R. Davies (Eds.),

Varieties of attention (pp. 82-85). Orlando: Academic Press.

Wickens, C. D. (2002). Multiple resources and performance prediction. Theoretical Issues in

Ergonomics Science, 3(2), 159.

Wientjes, C. J. E. (1992). Respiration in psychophysiology: Methods and applications. Biological

Psychology, 34, 179-203.

Wierwille, W. W., & Eggemeier, F. T. (1993). Recommendations for mental workload measurement

in a test and evaluation environment. Human Factors, 35(2), 263-282.

Wierwille, W. W., Rahimi, M., & Casali, J. G. (1985). Evaluation of 16 measures of mental workload

using a simulated flight task emphasizing mediational activity. Human Factors, 27(5), 489-502.

Wilson, G. F. (1991). Progress in the psychophysiological assessment of workload. Interim Report

#1. Armstrong Lab Wright-Patterson AFB, OH.

Wilson, G. F. (2002). An analysis of mental workload in pilots during flight using multiple psycho-

physiological measures. The International Journal of Aviation Psychology, 1, 3-18.

Wilson, G. F., & Eggemeier, F. T. (1991). Psychophysical assessment of workload in multi-task

environments. In D. L. Damos (Ed.), Multiple task performance (pp. 329-360). London: Taylor &

Francis.

Wu, C., Szymanski, C., & Cain, Z. (2007). Conjugated polymer dots for multiphoton fluorescence

imaging. Journal of the American Chemical Society, 129(43), 12904-12905.

Wyatt, H. J. (1998). Detecting saccades with jerk. Vision Research, 38(14), 2147-2153.




Xie, B., & Salvendy, G. (2000). Review and reappraisal of modeling and predicting mental workload in

single- and multi-task environments. Work & Stress, 14(1), 74.

Yarbus, A. L. (1973). Eye movements and vision. New York, NY: Plenum Press.

Zhang, Y., Owechko, Y., & Zhang, J. (2004). Driver cognitive workload estimation: A data-driven

perspective. Intelligent Transportation Systems Conference. Washington, D. C.

Zijlstra, F. R. H. (1993). Efficiency in work behavior: A design approach for modern tools. Delft,

Netherlands: Delft University of Technology.




Appendix A – Tower and Remote (Desktop) Eye Tracker Descriptions

Note: Adapted from SR Research website: http://www.eyelinkinfo.com/fixed_main.php [Retrieved 11/26/08]




Appendix B - Pre-Test Questionnaire (Pilot and Full-Scale Study)

Participant #: ______

Date: _____________

1. What is your gender?

Male

Female

2. What is your age? ____

3. How do you feel today?

Better than normal

Normal

Worse than normal

Much worse than normal

4. Do you have any aches or pains or other medical problems that are bothering you today?

Better than normal

Normal

Worse than normal

Much worse than normal

If yes, describe: ______________________________________________________

5. How did you sleep last night?

Better than normal

Normal

Worse than normal

Much worse than normal

6. Have you had any of the following today or yesterday?

Flu or cold □ Yes □ No

Vomiting □ Yes □ No

Fever □ Yes □ No

Diarrhea □ Yes □ No

7. Did you participate in any physical activity today or yesterday?

More than normal amount of physical activity

Normal amount of physical activity

Less than normal amount of physical activity

If yes, describe the activity: ______________________________________________




8. How has your last meal left you feeling?

Uncomfortably full or lethargic

Satisfied

Hungry

When did you eat your last meal?__________________________________________

9. Did you consume any alcohol within the last 12 hours?

No

Yes

i. What? □ wine □ beer □ other

ii. When? __________________________________________________

iii. How much? ______________________________________________

10. Did you consume more or less caffeine today than you normally do?

No

Yes

i. What? □ tea □ coffee □ energy drink □ other

ii. When? __________________________________________________

iii. How much more/less than normal?____________________________

11. Do you take any psychoactive medication, such as those that make you feel tired or affect

your mood (e.g., anti-depressant, anti-anxiety, anti-seizure medications)?

No

Yes

i. Please specify: ____________________________________________

12. Have you taken any illicit drugs or new medications today or yesterday (you do not need

to specify)?

No

Yes

13. Are you wearing contact lenses today?

No

Yes




Appendix C – Pilot Study ISI Length Histograms for Auditory Task




Appendix D – Full-Scale Study Post-Test Questionnaire

Participant #: ______

Date: _____________

1. You will be asked to describe your perception of the various tasks.

First, you will be asked about the difficulty of the tasks. Difficulty refers to the demands

placed on you by the task, and you have no control over them.

Next you will be asked about "how hard you tried" during the various tasks. Unlike task

difficulty, you can control your effort. However, some people feel that they have to try

harder to complete a more difficult task successfully.

Some of these questions may seem obvious or intuitive, but we are interested in hearing about your actual experience during the experiment.

a) Which task did you find more difficult (circle one):

1’s & 2’s Subtraction 8’s & 9’s Subtraction Neither (the same)

b) Which task did you find more difficult (circle one):

Quiet Words in Noise Loud Words in Noise Neither (the same)

c) Please group the following letters into one of three groups, as you perceived them during

the word generation task: (k, b, f, t, j, m, y, q)

____________________ __________________ _________________

Less Difficult More Difficult Neutral or Don’t Remember

d) On which task did you try harder (circle one):

1’s & 2’s Subtraction 8’s & 9’s Subtraction Neither (the same)

e) On which task did you try harder (circle one):

Quiet Words in Noise Loud Words in Noise Neither (the same)




f) Please sort the following letters into three groups, as you perceived them during the

word generation task: (k, b, f, t, j, m, y, q)

____________________ __________________ _________________

Less Difficult More Difficult Neutral or Don’t Remember

g) How would you describe your effort level on the first half of the experiment (circle one):

I Tried My Hardest I Tried Somewhat I Tried Very Little

h) On which did you generally try harder (circle one):

First Half of Experiment Second Half of Experiment Neither (the same)

2. During the tasks, did you consciously look or stare anywhere in particular, such as straight

ahead?

Yes No Don’t Remember

If so, why?

Did you think that you were supposed to look anywhere in particular, such as straight ahead?

In other words, did you feel like we wanted you to do this?

Yes No Don’t Remember

Did you otherwise think about your eye movements during the experiment?

Yes No Don’t Remember

3. As the experiment went on, did you notice a change in mental strategy for any of the

following types of tasks? Please explain.

a) Subtraction Task:




b) Word Generation Task:

c) Words in Noise Recognition Task:




4. (only answer these last questions if you were asked to regulate your effort between the first

and second half of the experiment)

a) Did you feel that you were able to try less during the second half of the experiment? If

you feel that your answer to this question is different for some tasks versus others, please

mention how it differs.

b) Do you think that your performance was any lower on the second half of the experiment

compared to the first? If you feel that your answer to this question is different for some

tasks versus others, please mention how it differs.




Appendix E – Full-Scale Study Change Score Box Plots

Plot Group 1. Change Score Box Plots for Average Heart Rate (bpm)

[Figure: box plots in three panels: Nominal to High Difficulty Change (1st Half, All Subjects); 1st to 2nd Half Change (Nominal Difficulty Tasks); 1st to 2nd Half Change (High Difficulty Tasks). Tasks: Math, Fluency, Auditory. Legend: motivation group, control group. Plot annotations: A, B, C, D.]

2. Change Score Box Plots for Saccade Rate (saccades/minute)

[Figure: box plots in three panels; recoverable panel titles: Change (1st Half, All Subjects); 1st to 2nd Half Change (High Difficulty Tasks). Legend: motivation group, control group. Plot annotations: E, F, G.]

3. Change Score Box Plots for Trial Proportion of Long ISI (percent)

[Figure: box plots in three panels. Legend: motivation group, control group. Plot annotations: H, I.]

4. Change Score Box Plots for Median ISI (ms) with Outliers (|X| > 1000 ms)

[Figure: box plots in three panels. Legend: motivation group, control group.]

5. Change Score Box Plots for Median ISI (ms) with Outliers Present

[Figure: box plots in three panels. Legend: motivation group, control group.]

7. Change Score Box Plots for Blink Rate (blinks/minute)

[Figure: box plots in three panels. Legend: motivation group, control group.]

8. Change Score Box Plots for Average Spontaneous SCR Rate (responses/minute)

[Figure: box plots in three panels. Legend: motivation group, control group.]

9. Change Score Box Plots for Number of Correct Responses (/trial**)

**Auditory task scores are multiplied by a factor of four so that they can be visualized on a common

scale with the other tasks

[Figure: box plots in three panels; recoverable panel title: 1st to 2nd Half Change (High Difficulty Tasks). Legend: motivation group, control group. Plot annotation: J.]

10. Change Score Box Plots for Number of Incorrect Responses (/trial)

[Figure: box plots in three panels; recoverable panel title: Change (1st Half, All Subjects). Legend: motivation group, control group.]

Appendix F – Full Hypothesis Testing Battery

Measures: Average Heart Rate, Saccade Rate

Math Task
  Difficulty:
    mixed ANOVA: p > 0.1
    failed Levene’s test…
    Wilcoxon: p > 0.1
  Motivation, Nom.:
    ANCOVA: p > 0.1
    ANCOVA: p > 0.1
    Kolmogorov-Smirnov: p > 0.1
  Motivation, High:
    ANCOVA: p > 0.1
    ANCOVA: p > 0.1

Fluency Task
  Difficulty:
    mixed ANOVA: p = 0.010 (half*diff*group p = 0.044; diff effect stronger in 2nd half, controls)
    failed Levene’s test…
    Wilcoxon: p = 0.10
  Motivation, Nom.:
    ANCOVA: p > 0.1
    failed Levene’s test…
    Kolmogorov-Smirnov: p > 0.1
    ANCOVA: p > 0.1
  Motivation, High:
    ANCOVA: p = 0.002
    failed Levene’s test…

Auditory Task
  Difficulty:
    var. ratio > 2; unbalanced groups…
    Wilcoxon: p < 0.001
    mixed ANOVA: p < 0.001
    mixed ANOVA: p = 0.004 (diff*group p = 0.036; larger effect in motivation group)
    Wilcoxon: p = 0.014
  Motivation, Nom.:
    var. ratio > 2; unbalanced groups…
    ANCOVA: p > 0.1
    Kolmogorov-Smirnov: p > 0.1
  Motivation, High:
    var. ratio > 2; unbalanced groups…
    ANCOVA: p > 0.1
    Kolmogorov-Smirnov: p = 0.10




Notes:

• bold font denotes significant findings (before any multiple comparisons correction)

• unless otherwise noted, comparison groups all passed tests of assumptions for parametric

tests used

• parametric and non-parametric test results given where data is suspected to not fit

parametric assumptions, though it does not fail appropriate tests

• difficulty effects calculated based on 1st half data only (from all subjects, regardless of

group); did not test for interaction effects of difficulty*group or difficulty*half, unless automatically tested through the mixed ANOVA method

• explicitly tested for motivation*difficulty effect (by dividing comparisons into high and

nominal difficulty level) because splitting data into nominal and high difficulty levels was

necessary for non-parametric analysis methods

• total number of task difficulty comparisons: 12; starting Holm-Bonferroni significance

criterion, α = 0.004 (two-tailed)
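
The starting criterion quoted in the note above is consistent with dividing a family-wise α of 0.05 by the number of comparisons (0.05 / 12 ≈ 0.004). The family-wise level of 0.05 is an assumption inferred from that arithmetic; it is not stated in this appendix. The Python sketch below illustrates the Holm-Bonferroni step-down procedure under that assumption. The function names and the example p-values are hypothetical and are not taken from the study data.

```python
def starting_criterion(m, alpha=0.05):
    """First (strictest) Holm-Bonferroni threshold for m comparisons.

    alpha=0.05 is an assumed family-wise level, not a value from the thesis.
    """
    return alpha / m


def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected."""
    m = len(p_values)
    # Visit the hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared
        # with alpha / (m - k + 1), which equals alpha / (m - rank).
        threshold = alpha / (m - rank)
        if p_values[idx] <= threshold:
            reject[idx] = True
        else:
            # Step-down rule: once one test fails, all larger p-values fail too.
            break
    return reject


if __name__ == "__main__":
    print(round(starting_criterion(12), 3))  # 0.004
    # Hypothetical p-values, for illustration only.
    print(holm_bonferroni([0.001, 0.04, 0.03, 0.5]))
```

With 24 comparisons, the same rule gives a starting criterion of 0.05 / 24 ≈ 0.002.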

Measures: Trial Portion Long ISI, Correct Response Count

Math Task
  Difficulty:
    mixed ANOVA: p > 0.1
    Wilcoxon: p > 0.1
    failed Shapiro-Wilk…
    Wilcoxon: p < 0.001
    mixed ANOVA: p < 0.001
  Motivation, Nom.:
    ANCOVA: p > 0.1
    Wilcoxon: p > 0.1
    failed Shapiro-Wilk…
    Kolmogorov-Smirnov: p > 0.1
    ANCOVA: p = 0.070
  Motivation, High:
    ANCOVA: p > 0.1
    Wilcoxon: p > 0.1
    failed Shapiro-Wilk…
    Kolmogorov-Smirnov: p > 0.1
    ANCOVA: p > 0.1

Fluency Task
  Difficulty:
    failed Levene’s and Shapiro-Wilk…
    Wilcoxon: p > 0.1
  Motivation, Nom.:
    failed Levene’s and Shapiro-Wilk…
    failed Shapiro-Wilk & unequal slopes (ANCOVA assumption)…
    Kolmogorov-Smirnov: p = 0.10
  Motivation, High:
    failed Levene’s and Shapiro-Wilk…
    Kolmogorov-Smirnov: p > 0.1
    failed Shapiro-Wilk…

Auditory Task
  Difficulty:
    mixed ANOVA: p = 0.004
    Wilcoxon: p = 0.018
  Motivation, Nom.:
    ANCOVA: p > 0.1
    ANCOVA: p > 0.1
  Motivation, High:
    ANCOVA: p > 0.1
    Kolmogorov-Smirnov: p = 0.10
    ANCOVA: p > 0.1



• total number of motivation comparisons: 24; starting Holm-Bonferroni significance

criterion, α = 0.002 (two-tailed)
