
  • Extensions and applications of

    stochastic accumulator models in

    attention and decision making

    Samuel F. Feng

    A Dissertation

    Presented to the Faculty

    of Princeton University

    in Candidacy for the Degree

    of Doctor of Philosophy

    Recommended for Acceptance

    by the Program in

    Applied and Computational Mathematics

    Adviser: Philip J. Holmes

    November 2012

  • © Copyright by Samuel F. Feng, 2012.

    All Rights Reserved

  • Abstract

    The research presented in this thesis is a collection of applications and extensions of

    stochastic accumulator models to various areas of decision making and attention in

    neuroscience.

    Ch. 1 introduces the major techniques and experimental results that guide us

    throughout the rest of the thesis. In particular, we introduce and define the leaky,

    competing accumulator, drift diffusion, and Ornstein-Uhlenbeck models.

    In Ch. 2, we adopt an Ornstein-Uhlenbeck (OU) process to fit a generalized version

    of the motion dots task in which monkeys are now faced with biased rewards. We

    demonstrate that monkeys shift their behaviors in a systematic way, and that they

    do so in a near optimal manner. We also fit the OU model to neural data and find

    that the OU model behaves almost like a pure drift diffusion process. This gives further

    evidence that the DDM is a good model for both the behavior and neural activity

    related to perceptual choice.

    In Ch. 3, we construct a multi-area model for a covert search task. We discover

    some new trends in the data and systematically construct a model which explains

    the key findings in the data. Our model proposes that the lateral intraparietal area

    (LIP) plays an attentional role in this covert search task, and suggests that the two

    monkeys used in this study adopted different strategies for performing the task.

    In Ch. 4, we extend the model of noise in the popular drift diffusion model (DDM)

    to a more general Lévy process. The jumps introduced into the noise increments

    dramatically affect the reaction times predicted by the DDM, and they allow the

    pure DDM to reproduce fast error trials given unbiased initial data, a feature which

    other models require additional parameters to capture. The model is fit to human

    subject data and is shown to outperform the extended DDM in data containing fast

    error reaction times.

    In Ch. 5, we construct a model for studying capacity constraints on cognitive


    control using the DDM as a generalized model for a task. After studying various

    aspects of the constructed model, large scale simulations demonstrate that a severe

    capacity constraint does indeed arise out of the need for optimizing overall rewards.

    The thesis concludes with some summarizing remarks in Ch. 6.


  • Acknowledgements

    First I would like to thank my thesis adviser Philip Holmes for his intellectual and

    personal guidance over these past years. Your candor, integrity, and approach to life

    have inspired me more than you realize. I am blessed to have had the opportunity to

    be your student.

    It has also been a pleasure to work with my collaborators Alan Rorie, William

    Newsome, Sam Gershman, and Jonathan Cohen. I am particularly grateful to Alan

    Rorie and William Newsome – when we collaborated several years ago I had no idea

    how fortunate I was to work with you.

    In a category by himself is Michael Schwemmer, both as a friend and a colleague.

    Fitting that data was a pain! Your drive and work ethic are infectious, and I will

    miss your jokes. I wish you blessings as you move on in your career.

    I would also like to thank Carlos Brody and Eric Shea-Brown for taking time to

    read this thesis.

    I am most thankful for the various friends and family who have made Princeton

    my new home. I thank my parents Pen and Janet Feng for their unending love and

    support. I thank Philip Eckhoff, Adam Hincks, Arie Israel, Richard Jordan, Jun

    Kitagawa, and Ross Willford for their friendship as roommates over the years. I

    thank Westerly Road Church for their Christ-centered prayers and encouragement. I

    thank my new wife Siyi for being at my side during the highs and lows of graduate

    study, and for focusing me on the more important things of life. You remind me of

    Christ’s sacrificial love more than anyone else I know.

    And finally, all thanks be to God. You are good, your love endures forever.


  • Contents

    Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

    Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

    1 Introduction to modeling evidence accumulation in two alternative forced choice (2AFC) tasks 1

    1.1 Experimental background of 2AFC perceptual tasks . . . . . . . . . . 2

    1.1.1 A crash course in visual processing . . . . . . . . . . . . . . . 3

    1.1.2 LIP’s role in perception and attention . . . . . . . . . . . . . . 6

    1.2 Building up: constructing drift diffusion (DD) processes via statistical inference . . . . . . 12

    1.2.1 A simple 2AFC statistical inference task . . . . . . . . . . . . 12

    1.2.2 The Neyman-Pearson lemma and the sequential probability ratio test . . . . . . 18

    1.2.3 The continuum limit of the SPRT . . . . . . . . . . . . . . . . 20

    1.2.4 Relevant experimental results concerning drift diffusion processes and optimality . . . . . . 22

    1.3 Working down: Reducing a biologically plausible model to an OU process 24

    1.3.1 From spiking neurons to the Leaky Competing Accumulator model (LCA) . . . . . . 25

    1.3.2 Reduction of LCA to an Ornstein-Uhlenbeck (OU) process . . 27

    1.3.3 Working with the OU and DDM models . . . . . . . . . . . . 29


  • 1.4 Thesis overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

    1.5 Appendix: Two popular models for decision making . . . . . . . . . . 31

    1.5.1 The Extended DDM of Ratcliff . . . . . . . . . . . . . . . . . 32

    1.5.2 The Linear Ballistic Accumulator Model . . . . . . . . . . . . 33

    2 Can monkeys choose optimally when faced with noisy stimuli and unequal rewards? 35

    2.1 Unequal rewards in the motion dots task . . . . . . . . . . . . . . . . 36

    2.2 Predicting psychometric functions (PMFs) with an accumulator model 38

    2.2.1 Two approaches for modeling biased rewards: shifted initial conditions vs. persistent reward signals . . . . . . 41

    2.2.2 Fixed time interrogation reaction times cannot distinguish the form for integrated drift and noise . . . . . . 43

    2.2.3 Examples of psychometric functions . . . . . . . . . . . . . . . 45

    2.3 Optimality analysis of a two parameter psychometric function . . . . 47

    2.3.1 A motivating example . . . . . . . . . . . . . . . . . . . . . . 47

    2.3.2 Blocks with mixed stimuli: a continuum of coherences . . . . . 49

    2.3.3 Blocks with mixed stimuli: finite sets of coherences . . . . . . 51

    2.4 Applying the model to experimental data . . . . . . . . . . . . . . . . 53

    2.4.1 PMF fits to data averaged over multiple sessions . . . . . . . . 53

    2.4.2 How close are the animals, on average, to optimal performance? 57

    2.4.3 Variability of behaviors in individual sessions . . . . . . . . . . 61

    2.5 Fitting the OU process to the LIP neural data . . . . . . . . . . . . . 63

    2.6 Discussion of results and future directions . . . . . . . . . . . . . . . 67

    3 Modeling a covert visual search task with a multi-area stochastic model 71

    3.1 Data analysis of the covert search task . . . . . . . . . . . . . . . . . 72


  • 3.1.1 LIP encodes target location, limb preference, set-size effect, and cue-hemifield congruence . . . . . . 75

    3.1.2 Accuracy vs reaction time . . . . . . . . . . . . . . . . . . . . 81

    3.1.3 Searching for a target . . . . . . . . . . . . . . . . . . . . . . . 82

    3.2 A multi-area model for the covert search task . . . . . . . . . . . . . 85

    3.3 Fitting methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

    3.4 Model fits and results . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

    3.5 Discussion and concluding remarks . . . . . . . . . . . . . . . . . . . 100

    3.6 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

    3.6.1 Search analysis computations . . . . . . . . . . . . . . . . . . 102

    3.6.2 Model equations . . . . . . . . . . . . . . . . . . . . . . . . . 103

    4 Changing the noise in the DDM: jumpy noise can be good 109

    4.1 Lévy processes as models for 2AFC reaction times . . . . . . . . . . . 111

    4.2 Fitting the Simen et al. behavioral data . . . . . . . . . . . . . . . . 115

    4.3 Closing discussion on the jump DDM . . . . . . . . . . . . . . . . . . 120

    5 Multitasking vs. Multiplexing: Uncovering capacity constraints on cognitive control 122

    5.1 Stroop as a model for cognitive control . . . . . . . . . . . . . . . . . 124

    5.2 Defining many parallel tasks with drift diffusion processes . . . . . . . 127

    5.3 Multitasking model equations and key simplifications . . . . . . . . . 128

    5.4 Control as reward maximization . . . . . . . . . . . . . . . . . . . . . 133

    5.4.1 Free response: maximizing reward rate and optimizing thresholds . . . . . . 134

    5.4.2 Interrogation: reward rates based solely on accuracy . . . . . . 135

    5.4.3 Scaling the overall reward rates . . . . . . . . . . . . . . . . . 136

    5.5 Methods for studying capacity constraints . . . . . . . . . . . . . . . 137

    5.5.1 Input values . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138


  • 5.5.2 Incongruency . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

    5.5.3 Network connectivity and fan out . . . . . . . . . . . . . . . . 139

    5.5.4 Simulating capacity constraints on cognitive control . . . . . . 141

    5.6 Model behavior in simple cases . . . . . . . . . . . . . . . . . . . . . 142

    5.6.1 Two inputs, one output . . . . . . . . . . . . . . . . . . . . . 143

    5.6.2 Two inputs, two outputs . . . . . . . . . . . . . . . . . . . . . 143

    5.6.3 Maximizing overall drift rate . . . . . . . . . . . . . . . . . . . 147

    5.7 Model simulation results . . . . . . . . . . . . . . . . . . . . . . . . . 150

    5.7.1 Full model simulation, 10 pathways . . . . . . . . . . . . . . . 152

    5.7.2 Choosing values for peripheral parameters . . . . . . . . . . . 153

    5.7.3 Capacity constraints for larger networks . . . . . . . . . . . . 157

    5.8 Key conclusions and experimental remarks . . . . . . . . . . . . . . . 159

    5.9 Appendix: implicit threshold optimization . . . . . . . . . . . . . . . 162

    6 Closing Remarks 164

    7 Bibliography 169


  • 1 Introduction to modeling evidence accumulation in two alternative forced choice (2AFC) tasks

    The common thread throughout this entire thesis is the use of a simple stochastic

    model for decision making called the drift diffusion model (DDM). This model has

    grown in popularity over the past several decades, and dozens of scientific papers

    have been published using it to model and explain various phenomena in the world

    of perceptual decision making. The contributions of this thesis are to generalize and

    apply the DDM to new situations, and to show that using the DDM (and closely

    related accumulator models) is still a valuable technique in understanding simple

    decision processes. The DDM provides the right amount of complexity to reproduce

    desired behavioral properties and illuminate neural mechanisms while at the same

    time remaining tractable enough for mathematical analysis and efficient computation.

    The overarching message of the research presented in chapters 2-5 is simple: models

    based on the drift diffusion process are useful tools for computational neuroscientists

    interested in modeling perceptual decisions, and their potential is still not fully realized.

    Each chapter of this thesis carries the DDM (or a close relative) in a new direction by

    either generalizing its dynamics or using it as a critical element of a modeling effort

    to describe some neural or psychological phenomenon.
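    In its simplest form, the pure DDM accumulates noisy evidence x(t) according to dx = A dt + c dW, starting from x(0) = 0, until x reaches one of two absorbing thresholds at ±z; which threshold is reached determines the choice, and the first-passage time gives the decision time. The following is a minimal illustrative sketch (not code from the thesis; the function name and all parameter values are arbitrary) of an Euler-Maruyama simulation of this process:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ddm(drift=0.2, noise=1.0, threshold=1.0, dt=1e-3, max_t=10.0):
    """One trial of the pure DDM: dx = drift*dt + noise*dW, x(0) = 0,
    with absorbing boundaries at +/- threshold. Returns (choice, rt),
    where choice is +1 (upper) or -1 (lower), or 0 on timeout."""
    x, t, sqrt_dt = 0.0, 0.0, np.sqrt(dt)
    while t < max_t:
        x += drift * dt + noise * sqrt_dt * rng.standard_normal()
        t += dt
        if abs(x) >= threshold:
            return (1 if x > 0 else -1), t
    return 0, t

# With positive drift, hitting the upper boundary counts as "correct".
trials = [simulate_ddm() for _ in range(2000)]
acc = np.mean([c == 1 for c, _ in trials])
mean_rt = np.mean([t for c, t in trials if c != 0])
```

    For these illustrative parameters the classical first-passage formulas give accuracy 1/(1 + e^{-2Az/c²}) ≈ 0.60 and mean decision time (z/A) tanh(Az/c²) ≈ 0.99, which the simulation reproduces up to sampling error; this tractability is precisely what makes the DDM attractive for analysis.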

    The main purpose of this chapter is to survey the mathematical and experimental


    background which has contributed to the success of drift diffusion as a stochastic

    model for perceptual decisions. We have been careful to cite articles and resources

    whenever appropriate – in some sense, this introduction represents the type of docu-

    ment that might aid a new student beginning research in modeling perceptual decision

    making.

    In §1.1 we recap the body of experimental work which leads to the experimental

    data analyzed in chapters 2, 3, and 4. In §1.2 we present a systematic construction of

    the drift diffusion model starting from a basic statistical inference task. This section

    is written in a more pedagogical manner, although the reader is referred to outside

    sources for proofs of the longer results. In §1.3 we give a second account of the DDM

    by reviewing a series of computational results over the past decade which demonstrate

    how biologically-based models of spiking neurons can be reduced to the DDM. Finally

    in §1.4 we give a more detailed description of the rest of this thesis. In the appendix

    after this chapter we review two other popular models for decision making.
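    As a concrete preview of the construction in §1.2, the sequential probability ratio test accumulates the log-likelihood ratio of each incoming observation and stops when the running sum leaves a fixed interval; in the continuum limit this accumulated sum becomes a drift diffusion process. A hedged sketch (illustrative only, not from the thesis; the Gaussian hypotheses, threshold value, and function names are our own):

```python
import numpy as np

rng = np.random.default_rng(1)

def sprt_gaussian(draw, mu0=-0.5, mu1=0.5, sigma=1.0, log_thresh=2.0):
    """SPRT for H1: N(mu1, sigma^2) vs H0: N(mu0, sigma^2) on i.i.d.
    samples from draw(). Accumulates the log-likelihood ratio until it
    leaves (-log_thresh, +log_thresh); returns (decision, n_samples),
    where decision is +1 for H1 and -1 for H0."""
    llr, n = 0.0, 0
    while abs(llr) < log_thresh:
        x = draw()
        n += 1
        # log[p(x|H1) / p(x|H0)] for equal-variance Gaussians
        llr += (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma**2
    return (1 if llr > 0 else -1), n

# Samples drawn under H1: most runs should decide in favor of H1.
runs = [sprt_gaussian(lambda: rng.normal(0.5, 1.0)) for _ in range(500)]
acc = np.mean([d == 1 for d, _ in runs])
```

    By Wald's approximation, symmetric log-thresholds at ±Θ give an error probability of roughly 1/(1 + e^Θ), about 12% for Θ = 2, so most runs on H1-generated data decide for H1 after only a handful of samples.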

    1.1 Experimental background of 2AFC perceptual

    tasks

    In this section we present a basic overview of the structures of the mammalian brain

    that are implicated in decisions, actions, and choice. The experimental background

    and history presented here should communicate a feel for the major lines of thought

    that comprise the current understanding of our visual system and how our brains make

    simple perceptual decisions. Such an exposition is included not only for completeness,

    but also in order to properly frame the modeling efforts which form the backbone

    of this thesis. We have taken particular care in each chapter to connect our results

    with experimental data, and all models have been methodically constructed with the

    experimental literature in mind. Our hope is that this survey paints a broad picture


    of this literature and the various experimental phenomena that motivate the rest of

    this thesis.

    We begin with a brief overview of the visual processing system in §1.1.1. The

    contents of §1.1.1 can be found in any good neuroscience textbook (e.g., [KSJ00,

    BCP06]) or in the expository sections of several papers, including [Sch01, MC01,

    DD95]. A more detailed overview of the first several decades of results concerning

    visual processing in monkeys can be found in [MN87]. §1.1.2 then recounts the more

    specific history of the lateral intraparietal area (LIP) as an area involved in perceptual

    decision making. We trace two main lines of experiments: LIP as an area which

    correlates with planned eye movements and LIP as an area which carries attentional

    signals.

    1.1.1 A crash course in visual processing

    Some of the most complex tasks performed by humans involve decisions of varying

    timescales and complexity. Not surprisingly, neural correlates of deciding, choosing,

    and acting occur in numerous areas of the mammalian brain. We focus mainly on

    decisions made on the basis of visual evidence – we call these perceptual decisions.

    Vision begins at the photoreceptor cells in the retina, and the majority of the con-

    sequent signals travel along the optic nerve to the lateral geniculate nucleus (LGN)

    in the thalamus. The optic nerve also carries signals from the retinal ganglion cells

    to the suprachiasmatic nucleus and pretectal nucleus, areas which are involved with

    sleep and reflective eye movements, respectively. The LGN serves as a sensory relay

    station by passing the visual stimulus to the primary visual cortex (V1), which is

    located at the back of the brain in the occipital lobe1. The lateral geniculate nucleus

    also serves as a processing station for input from several other sections of the cortex.

    1 Each hemisphere of the cerebral cortex is divided into four major lobes (frontal, parietal, temporal, occipital) by the folds and bumps which are characteristic of the surface of the brain. See Fig. 1.1.


  • [Figure 1.1: schematic of the brain with areas V1/V2, V4, MT/MST, IT, PPC, and FEF labeled]

    Figure 1.1: Figure reproduced and adapted from [Gra18]. The four major lobes of the brain are color coded, with the frontal lobe in blue, the parietal lobe in yellow, the temporal lobe in green, and the occipital lobe in pink. Signals from the retina travel, via the lateral geniculate nucleus, into area V1 of the visual cortex, and then travel along one of two streams. Along the ventral stream, signals travel from V1 to V2 and V4 into the inferior temporal cortex (IT). The dorsal stream carries signals from V1 to V2 into area MT and into the posterior parietal cortex (PPC). Both the ventral and dorsal streams contain projections to and from the prefrontal cortex (PFC). Also note the frontal eye field (FEF), which is a region of the anterior part of the PFC.

    Area V1, within the visual cortex, is generally considered the first stage of visual

    processing2, consisting of neurons with small receptive fields which form a precise

    topographic covering of the entire field of view. The neurons in V1 may be tuned

    to specific visual elements such as orientation, stereoscopic depth, and color. Out-

    puts from the primary visual cortex project to secondary and tertiary areas which

    themselves project to other visual areas in the parietal and temporal lobes.

    2 Actually the retina itself has been shown to perform a considerable amount of processing and adaptation to visual stimuli. For example, ambient light levels may vary several orders of magnitude, but the retinal ganglion cells have been shown to adapt to both image contrast (the range of light intensities) and spatial correlations even when mean intensity is fixed [SBW+97]. Retinal ganglion cells have also been shown to adapt to moving stimuli and anticipate trajectories of moving objects [BBJM99]. See [GSBH09, SSSB11] for some more recent results and modeling work on processing in retinal ganglion cells.


  • From here, visual processing is organized into two main streams, called the dorsal

    stream and ventral stream. In the ventral stream, signals pass from V1 through visual

    areas V2 and V4 into the inferior temporal (IT) cortex. Neurons in the posterior IT

    are tuned for stimulus features such as color or shape, whereas anterior IT neurons

    are tuned for complex features like faces. In the dorsal stream, signals pass from V1

    through the middle temporal (MT) area into the posterior parietal cortex. Neurons in

    area MT respond to stimuli moving in specific directions, and lesion studies support

    the idea that the signals carried by MT are important in planning eye movements

    in perceptual tasks ([BB05] and see §1.1.2 below). Neurons in posterior parietal

    cortex modulate visual responses related to the orientation of stimuli, and play an

    important role in producing planned movements. The posterior parietal cortex has

    also been shown to contain neurons which correlate with saccadic and limb movements

    [DHP12]. Much of the output from the posterior parietal cortex then projects to the

    frontal motor cortex, which is indicative of its role in response planning.

    The posterior parietal cortex will appear in several places in this thesis. The

    experiments studied in chapters 2 and 3 examine the lateral intraparietal area (LIP),

    which is located within the intraparietal sulcus within the posterior parietal cortex,

    which itself lies along the dorsal stream [RGMN10, OSBG06]. LIP’s role in attention

    and planned saccadic eye movements is an area of active research, as we will see in

    §1.1.2 below.

    The two streams at this point converge by passing to the prefrontal cortex (PFC),

    which has been linked to complex cognitive elements such as personality and task

    representations [MC01]. We will study these task representations in Ch. 5, and give

    an account for how overlap between them may explain certain limitations on cognitive

    control capacity. One particular structure to note within the PFC is the frontal eye

    field (FEF), which lies at the top of the PFC. The FEF is intimately linked with the

    location of salient stimuli. Indeed, studies have shown that electrical stimulation of


    the FEF evokes saccades, and that signals in FEF correlate strongly with saccades

    to visual targets and may reflect the outcome of an automatic visual selection pro-

    cess [Sch04, Sch02]. Numerous studies have also linked FEF to covert attention and

    visual salience3 [KMK97, TB05, MTT08], although there is some contrary evidence

    which shows that certain FEF neurons do not respond when monkeys perform tasks

    that require attention with no eye movement [GB81]. These are still areas of active

    research.

    This description of the visual system as two cortical pathways was first presented

    by Mishkin et al [MUM83] and is almost certainly a gross oversimplification of the

    actual neural architecture behind how mammals formulate decisions and produce

    choices based on visual information. The illustration is, however, very useful in

    demonstrating that the areas with which we are concerned are only components of

    the entire apparatus at hand. Our focus primarily lies along the dorsal stream which

    contains the regions specifically linked to the formation of perceptual decisions and

    attention. Among these regions, we are particularly interested in area LIP.

    1.1.2 LIP’s role in perception and attention

    In this subsection we trace the lineage of results which led to the experiments analyzed

    in Ch. 2 and Ch. 3 [RGMN10, OSBG06]. This section is entirely focused on 2AFC

    tasks, most of which are also aimed towards uncovering LIP’s role in decision making

    and attentional tasks. We will first discuss a well known random dots motion task,

    and the idea that LIP is responsible for the planning of eye movements. Subsequently

    we discuss a more recent series of experiments focused on LIP’s role in attention.

    3 Overt attention occurs when a sensory system, such as the eye, orients itself to a target (e.g. saccade to a visual target). Covert attention occurs when we mentally focus on one of several possible sensory stimuli.


  • LIP and the planning of eye movements in motion dots tasks

    In this section, we hope to communicate a feel for why the random dots motion

    stimulus is so useful for perceptual decision tasks, and how the study of intended

    saccades (eye movements) has become a primary area of study in visual perception

    and decision making. A good but older review of experiments implicating LIP as a

    center for processing eye movements can be found in [ABM92]. It all begins with

    the posterior parietal cortex, which contains LIP and has long been connected to the

    processing of eye movements. In the early 1900s, [B0́9] showed that bilateral damage

    to the posterior parietal cortex resulted in human subjects being unable to fixate their

    eyes and unable to reach/grab objects. Later on in the 60s and 70s it was demon-

    strated that electrical stimulation of the posterior parietal cortex produced saccades

    [FC55, Ben64], and that lesions resulted in deficits in saccades [Sun79, LMTY77], al-

    though the effects of lesions were not properly quantified until the 80s [LM89, KG88].

    In the 1980s, Andersen et al. [AAC85] first identified and named area LIP after retrograde tracers in the frontal eye field and dorsolateral prefrontal cortex revealed strong

    connections predominantly within the lateral bank of the intraparietal cortex. This

    major finding was performed following key observations from Mountcastle and Lynch

    [MLG+75, LMTY77], who first reported cells selective for saccades in the inferior

    parietal lobule. Following this, several electrophysiological experiments showed that

    most LIP cells were related to eye movements, with many responding before saccades

    [BBF+91, ABB+90, GA88]. These experiments established LIP as a key region where

    we might find signals regarding response preparation, intention, and attention. LIP’s

    established position between the input sensory processing regions and the output mo-

    tor execution (i.e. saccade) regions is a primary reason why it is so heavily studied

    today.

    The random dots motion task used in the experiment analyzed in Ch. 2 also takes

    advantage of a series of results related to the visual processing of motion stimuli


    along the dorsal stream. It has been established that area MT and the medial superior temporal area (MST) contain the predominant signals relevant to the cortical

    analysis of visual motion. These regions lie along the dorsal stream, and within the

    dorsal stream there seems to be a more specialized pathway of areas that are used

    to discriminate visual motion information. The key feature shared by areas along

    this specialized pathway is a high proportion of neurons selective to motion direction,

    which we will henceforth call directionally sensitive for brevity. This pathway begins

    in a sub-region of V1 called layer 4B, where experiments in the 70s and 80s discovered

    a higher percentage of directionally sensitive cells relative to other sub-regions of V1

    [Dow74, BF84, LH84, Mic85]. This region projects to area MT where over 80% of the

    neurons are directionally selective [DZ71, MVE83, Alb84]. Another region considered

    to lie along the dorsal stream, area V3, has also been found to consist of about 40%

    directionally selective neurons [FVE87]. Area MT also receives some indirect

    input from V1 through V2 and V3 [Mv83]. From MT there are many neuronal pro-

    jections to area MST, which also contains a high percentage of directionally sensitive

    neurons [VEMB81, DU86, THS+86].

    In [NP88], Newsome and Pare established the connection between a dynamic

    random dot display and the relevant signals in MT which represent the dynamic

    perception of motion. Over the next few years, [BSNM92, BNS+96, SN96, SMBN92,

    CA99, CN95] established that MT and MST do indeed contain the relevant signals for

    formulating a decision based on visual information collected from the random dots

    motion stimulus used in experiments like that of Rorie et al [RGMN10] which we

    study in Ch. 2. These neurons are tuned so that if one computes averaged firing rates

    over many trials, this averaged firing rate smoothly varies according to the amount

    of energy in a band of velocities. This enabled experimenters to control the level of

    difficulty of the perceptual tasks, and established the neural correlate which represents

    this change in the amount of perceived motion. Still, there is ongoing research aimed


    towards uncovering the complex and intertwined interactions among V1, MT, and

    MST, and how signals from these areas are combined to form inputs to LIP, FEF and

    other downstream visual areas. In addition to anatomical and electrophysiological

    studies carrying on the work cited above [BW10, PHP+11], some studies like [PB00]

    use cleverly constructed visual illusions to probe relationships among these areas.

    The 2AFC random dots motion task was first used in its current form in [SBNM96,

    SN01], where Shadlen et al. recorded from LIP during a 2AFC motion dots discrimination task. Here monkeys had to indicate the direction of motion of coherently

    moving dots embedded within a field of randomly moving dots via a saccade to a

    fixed target along the direction of motion. LIP neurons were first identified using

    the memory saccade task [HW83], in which monkeys are first required to fixate on a

    certain point for about 100 msec. Then a second stimulus flashes pseudo-randomly

    for about 300 msec at one of several locations in the monkey’s peripheral vision, while

    the monkey maintains fixation. After a short delay period (∼400-700 msec), the monkey is required to saccade from the fixation point to the remembered location of the

    flashed target, and the cell is identified with that particular receptive field if it demon-

    strated persistent activity during the delay period. This procedure is commonly used

    to identify certain cells to be studied in LIP.

    Shadlen et al. identified cells using this memory saccade task, and oriented the

    visual stimulus of their task so that one of the saccade targets was situated within the

    recorded cell’s receptive field. They discovered that these LIP neurons exhibited firing

    rates which predicted the saccadic eye movement that the monkey would make at the

    end of the trial. We display averaged LIP recordings from Shadlen et al. [SN01] in

    Fig. 1.2 as an illustration of this remarkable phenomenon. The experiment performed

    by Rorie et al. which we have modeled and analyzed in Ch. 2 is an extension to the

    experiment of Shadlen et al [SN01].


  • Figure 1.2: The average firing rates from 104 LIP neurons during the direction discrimination task of Shadlen et al. [SN01]. Solid and dashed curves are from trials in which the monkey judged direction toward and away from the receptive field, respectively. Only correct trials are displayed. The various colors represent different strengths of the random-dot motion, and we can see that the time course and magnitude of response are affected by stimulus strength. Figure taken from Fig. 8 of [SN01].

    LIP and the control of attention

    The results in the previous subsection have established that LIP plays a role in the

    execution of saccades during perceptual decision tasks, and in particular the random

    dots motion task. However, the exact nature of LIP’s contribution to the processing

    of visual information is still not fully understood. In the late 1970s it was discovered

    that parietal neurons can also exhibit activity during fixation when a visual stimulus

    appeared in a neuron’s receptive field, which indicated that these neurons may have a

    role beyond the planning of saccades [RGS78]. Furthermore, when the visual stimulus

    was made behaviorally relevant, these visual neurons exhibited even further elevated

    activity, suggesting an attentional role for parietal cortex [RGS78, GB81].

    Since then, several studies have demonstrated neural correlates for attention in

    LIP [CG99, CDG96, RBK95, GG99, GKG98, PG00, BG03, GBP+02, Mes99]. In

    particular, Gottlieb et al. demonstrated that LIP cells can exhibit little or no increased

    response to stimuli in their receptive fields unless the stimuli are behaviorally relevant

    [GKG98]. Gottlieb and Goldberg also discovered that in an anti-saccade task, where

    monkeys were required to saccade away from a visual target, the majority of LIP

    neurons encoded visual stimulus location instead of the target of the intended saccade

    [GG99]. It has also been established that chemical inactivation of LIP produces

    deficits in both saccade target selection and covert attention [WOD02, WOD04], and

    that a similar chemical inactivation can also produce deficits in attention [BG09], but

    without entirely compromising task performance.

    These results implicating LIP in attention and visual salience led Oristaglio et al. to construct a task which involved a covert search for a visual target and a nonsaccadic motor response [OSBG06]. Here both covert attention and a response were required, and furthermore the motor response was dependent upon successfully detecting one of two possible targets in an array of distractors, while maintaining fixation. The

    recorded signals from LIP demonstrate interesting and puzzling activity which has

    not been fully understood. In Ch. 3 we attempt to construct a model which fits this

    data (along with that from [BOSG08]) in order to determine if this relatively complex

    experiment helps us better understand LIP’s role in attention and perception.

    Much of the work presented earlier in this section suggests that parietal neurons

    in area LIP encode saccade motor decisions [SN01, NP88, SBNM96]. There is also

    evidence that LIP carries signals of attention, or perceptual selection, which are

    independent of the metrics, modality and reward of the required response [GB10].

    In fact, Gottlieb et al. suggest that the attentional responses seen in LIP represent

    a distinct type of decision that assigns value to sources of information rather than

    specific actions.

    This thesis is not primarily concerned with stating the precise role of LIP as either a center of attention or of saccade motor decisions. Rather, we are interested

    in how accumulator models may shed light on this discussion, as well as in other

    areas of cognitive psychology and neuroscience. We demonstrate in several cases

    that the computational and mathematical tools surrounding drift diffusion processes

    (and closely related OU processes) can and should be used in understanding neural

    activity and animal/human behaviors in these decision tasks. Next, we will begin

    our presentation of these tools by deriving the drift diffusion process from a simple

    statistical inference task.

    1.2 Building up: constructing drift diffusion (DD) processes via statistical inference

    Probably the most common way to understand the diffusion equation (Eq. 1.19 below) is encountered in undergraduate physics and chemistry courses, where one meets the molecular diffusion of a chemical compound or gas. There one imagines particles in an animated and irregular state of motion, caused by frequent impacts with other particles and/or with the surrounding medium, normally a fluid. Here

    we consider a less physical basis, and instead derive the diffusion equation by first

    trying to solve a very simple statistical inference task. Following this approach, we

    build an alternative understanding of diffusion processes based upon the probabilistic

    accumulation of evidence for two available choices. Much of the work presented in

    this section and the next derives from the presentations in [BBM+06] and [GS07],

    although a few of the mathematical details have been reworked.

    1.2.1 A simple 2AFC statistical inference task

    The most natural language with which to precisely describe decision making is that

    of probability and statistics. This thesis only explicitly deals with decision processes

    which solve two alternative forced choice tasks, or 2AFC for short4. These tasks

    require a choice between two hypotheses h1 and h2, each of which represents a property

    of the world that may be true or false (e.g. the array of moving dots mentioned in

    §1.1.2 are moving left or right). 2AFC tasks are also considered forced in that the

    subject cannot elect to not respond (this is typically enforced by penalizing a non-response more severely than an incorrect response, e.g. inflicting a time out in addition to withholding reward). In performing these tasks, we

    also consider two distinct paradigms for how the subject must collect evidence and

    respond: the free response and interrogation paradigms (in this context we will at

    times use the words “paradigm” and “protocol” interchangeably). In the interrogation

    paradigm, subjects are required to respond at a fixed deadline. This implies that

    subjects have a fixed time epoch for which they may accumulate evidence, and that

    their decisions must be made based on whatever evidence is collected during this

    fixed amount of time. In the free response paradigm, subjects are allowed to respond

    whenever they please. This implies that subjects need to determine the proper speed-

    accuracy trade-off, because faster response times will increase the potential rate of

    rewards but also typically decrease accuracy.

    Before any evidence accumulation for either h1 or h2 takes place, however, one

    must first consider the prior probabilities P(h1),P(h2). These represent the probabil-

    ities that either hypothesis is true before obtaining any evidence. For the sensory-

    motor tasks described above (e.g. moving dots task), the prior typically represents

    the predicted probability of seeing a particular stimulus on the upcoming trial (e.g.

    probability that subject is presented with left-moving dots), which may be explicitly

    communicated to the subject or inferred from the relative frequency of a particular

    hypothesis on previous trials. In Ch. 2, we analyze a case where the amount of reward

    for a particular hypothesis is modulated trial-by-trial; indeed we find that manipulating rewards in this way affects monkey behavior in a manner similar to shifting

    4Others often write NAFC for the N-alternative version

    prior probabilities. Also, in Ch. 4 we fit some behavioral data collected by Simen et

    al. [SCB+09] where these prior probabilities were manipulated and human subjects

    were required to take this into account.

    The subject, given his priors, now collects evidence from a stimulus and combines

    all of his information (priors, evidence, expected rewards) into a quantity

    which we call the decision variable. Using this, the subject formulates a decision rule

    which determines from the decision variable when and how to respond.

    Throughout the past several decades, the most successful conceptual framework

    with which one may study 2AFC tasks is signal detection theory [GS66]. In its

    simplest form, the subject obtains one unit of noisy evidence e which is extracted

    from an experimentally controlled stimulus. That is, the experimenter presents a

    stimulus corresponding to either h1 or h2, and the subject obtains some noisy evidence

    e by observing the experimental event. The subject now needs to determine which of

    two conditional probability distributions P(e|h1) or P(e|h2) gave rise to the observed

    evidence e.

    In the case of two alternatives studied here, a simple choice for the decision variable

    is the ratio between the two relevant likelihoods P(e|h1) or P(e|h2):

    l12(e) = P(e|h1) / P(e|h2) . (1.1)

    The quantity l12 is often referred to as the likelihood ratio. The subject now bases his

    decision on the likelihood ratio l12 by choosing some threshold β, and elects to choose

    h1 if l12 ≥ β and h2 if l12 < β (if l12 = β, we say that the subject arbitrarily chooses

    h1).

    The critical question is: how do we choose β? Suppose we wish to maximize the

    overall accuracy and that P(h1) = P(h2). The overall accuracy is written as

    P(H1|h1) + P(H2|h2) , (1.2)

    where H1 (H2) is the event that the subject chooses hypothesis h1 (h2). Since subjects

    are forced to respond, P(H1|h2)+P(H2|h2) = 1 (also P(H1|h1)+P(H2|h1) = 1), which

    means that maximizing Eq. 1.2 is equivalent to maximizing

    P(H1|h1)− P(H1|h2). (1.3)

    If we designate a region Λ as all of the evidences (point events) which lead to the

    acceptance of h1, then the probability that h1 is accepted when h1 is true can be

    written as

    P(H1|h1) = ∑_{e∈Λ} P(e|h1) , (1.4)

    and the probability of an incorrect acceptance of hypothesis h1 is

    P(H1|h2) = ∑_{e∈Λ} P(e|h2) . (1.5)

    The task of maximizing Eq. 1.2 is now the task of selecting a set of points Λ so that

    we maximize

    ∑_{e∈Λ} [P(e|h1) − P(e|h2)] . (1.6)

    Now consider some sample of evidence ē. The point event ē should be included in

    Λ if and only if its inclusion contributes positively to the sum in Eq. 1.6, i.e.

    P(ē|h1)− P(ē|h2) ≥ 0. (1.7)

    This implies we want to include in Λ only the observations where this inequality holds,

    which means that a point event ē should be included in Λ if and only if we have

    P(ē|h1) / P(ē|h2) ≥ 1 . (1.8)

    This defines our acceptance region Λ for hypothesis h1 as all sample evidences e for

    which the likelihood ratio l12(e) ≥ 1, which means β should equal 1. To summarize,

    in order to maximize the overall accuracy Eq. 1.2 given a sample of noisy evidence

    e, we should choose hypothesis h1 if and only if l12(e) ≥ β, where β = 1.
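    To make this rule concrete, here is a minimal Python sketch assuming unit-variance Gaussian likelihoods with means ±1 (the distributions and function names are illustrative, not taken from the experiments above):

    ```python
    import math

    def likelihood_ratio(e, mu1=1.0, mu2=-1.0, sigma=1.0):
        """l12(e) = P(e|h1)/P(e|h2) for Gaussian likelihoods N(mu_i, sigma^2).
        The shared normalizing constants cancel in the ratio."""
        p1 = math.exp(-(e - mu1) ** 2 / (2 * sigma ** 2))
        p2 = math.exp(-(e - mu2) ** 2 / (2 * sigma ** 2))
        return p1 / p2

    def decide(e, beta=1.0):
        """Choose h1 iff l12(e) >= beta; beta = 1 maximizes overall accuracy."""
        return "h1" if likelihood_ratio(e) >= beta else "h2"
    ```

    With these symmetric likelihoods the rule reduces to choosing h1 exactly when the evidence e is nonnegative.
    
    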

    The above argument easily generalizes to the case where one wishes to maximize

    the weighted accuracy

    P(H1|h1) + aP(H2|h2) . (1.9)

    Again we can rewrite this as trying to maximize P(H1|h1) − aP(H1|h2) and observe

    again that we want to select a region Λ so that ∑_{e∈Λ} [P(e|h1) − aP(e|h2)] is as large

    as possible. The only point events which contribute positively to this sum are those

    events ē for which we have

    P(ē|h1) − aP(ē|h2) ≥ 0, or P(ē|h1) / P(ē|h2) ≥ a . (1.10)

    This means the proper acceptance region for the hypothesis h1 is all of the events

    whose likelihood ratios equal or exceed a, which shows that in order to maximize the

    weighted accuracy Eq. 1.9 we simply set β = a. Knowing this, we will now write the

    weighted accuracy as

    P(H1|h1) + βP(H2|h2) , (1.11)

    since maximizing this form immediately tells us the value of β that should be set.

    This result actually shows us how to choose β for many different useful cases,

    most notably the case where one wishes to maximize expected value. Suppose we are

    given costs and rewards for all four possible trial outcomes:

    r11 : amount of reward for correctly choosing h1,

    r21 : amount of penalty for incorrectly choosing h1,

    r22 : amount of reward for correctly choosing h2,

    r12 : amount of penalty for incorrectly choosing h2.

    The expected reward is

    E(R) = r11P(h1)P(H1|h1) + r22P(h2)P(H2|h2)

    − r21P(h2)P(H1|h2) − r12P(h1)P(H2|h1) . (1.12)

    Note that there are no assumptions on the priors P(h1),P(h2). We can rework Eq.

    1.12 into a form of Eq. 1.9:

    E(R) = r11P(h1)(1 − P(H2|h1)) + r22P(h2)P(H2|h2) − r21P(h2)(1 − P(H2|h2)) − r12P(h1)P(H2|h1)

    ≃ (r22 + r21)P(h2)P(H2|h2) − (r11 + r12)P(h1)P(H2|h1)

    ≃ (r22 + r21)P(h2)P(H2|h2) + (r11 + r12)P(h1)P(H1|h1)

    ≃ [(r22 + r21)P(h2) / ((r11 + r12)P(h1))] P(H2|h2) + P(H1|h1) , (1.13)

    where ≃ indicates equivalence up to a constant factor (i.e. equivalent maximization

    problems). We see that in order to find the acceptance region for hypothesis h1

    which maximizes the expected reward, we need to choose hypothesis h1 whenever the

    likelihood ratio l12(e) exceeds

    β = (r22 + r21)P(h2) / ((r11 + r12)P(h1)) . (1.14)

    In particular, when r22 + r21 = r11 + r12, this reduces to the case where we wish to

    maximize overall accuracy, which means we set

    β = P(h2) / P(h1) ; (1.15)

    so, if hypothesis h1 is more likely, it takes a smaller value of l12 to decide upon choosing

    h1. Finally, in the case where both hypotheses are equally likely we recover β = 1 as

    demonstrated above in Eq. 1.8.

    The formulation presented in this section can be thought of as how a subject may

    compute a decision in a 2AFC task given only one unit of evidence. However, more

    realistic tasks usually demand that the subject incorporate multiple pieces of evidence over some time epoch. How does one properly adjust the decision variable to accommodate multiple pieces of evidence? This question is answered in the

    next section by the Neyman-Pearson lemma and the sequential probability ratio test.

    1.2.2 The Neyman-Pearson lemma and the sequential probability ratio test

    We now study the situation in which a subject is successively presented with pieces

    of evidence. First, we study the interrogation paradigm, in which subjects are given

    a fixed amount of evidence {e1, . . . , eN} and are required to identify if these are

    sampled from the distribution corresponding to hypothesis h1 or that corresponding

    to hypothesis h2. The natural adjustment to our decision variable from above is to

    consider the likelihood ratio of all of the data,

    log [LR12] = log [ P(e1, e2, . . . , eN |h1) / P(e1, e2, . . . , eN |h2) ] = ∑_{j=1}^{N} log [ P(ej|h1) / P(ej|h2) ] , (1.16)

    where we have taken a logarithm to let us work with sums instead of products.

    We have also assumed that the data are independent, an assumption which is usually

    enforced by the experimentalist. Eq. 1.16 will often be referred to as the log-likelihood

    or log-likelihood ratio. The decision rule is then essentially equivalent to that from

    above: if logLR12 ≥ Θ for some threshold Θ ∈ R, then accept hypothesis h1, and if

    logLR12 < Θ, accept hypothesis h2.
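    For concreteness, the interrogation rule based on Eq. 1.16 can be sketched in Python, again assuming i.i.d. Gaussian evidence (the means ±0.5 are illustrative):

    ```python
    def log_lr(samples, mu1=0.5, mu2=-0.5, sigma=1.0):
        """Eq. 1.16: summed log-likelihood ratio for i.i.d. Gaussian evidence
        e_j ~ N(mu_i, sigma^2) under hypothesis h_i. For Gaussians each term
        reduces to [(e - mu2)^2 - (e - mu1)^2] / (2 sigma^2)."""
        return sum(((e - mu2) ** 2 - (e - mu1) ** 2) / (2 * sigma ** 2)
                   for e in samples)

    def interrogate(samples, theta=0.0):
        """Accept h1 iff log LR12 >= Theta; Theta = 0 maximizes accuracy."""
        return "h1" if log_lr(samples) >= theta else "h2"
    ```
    
    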

    This widely used likelihood-ratio test was proven to be optimal by Neyman and

    Pearson in 1933 [NP33]. The Neyman-Pearson lemma states that the likelihood-ratio

    test which accepts h1 if and only if logLR12 ≥ Θ is the most powerful statistical test

    of its size. Size here refers to the false alarm probability or error rate: α = P(H2|h1),

    where H2 refers to the event that hypothesis h2 is chosen according to this decision

    rule. There are several more recent proofs of the Neyman-Pearson lemma in the

    statistical literature [GS66, Gho70, Leh59].

    The Neyman-Pearson lemma implies that if Θ = log[1] = 0, then the likelihood-

    ratio test above maximizes accuracy by delivering the most likely hypothesis and

    minimizing the total error probability. Thus, the likelihood-ratio test is optimal

    for the interrogation paradigm.

    The case of the free response paradigm is less straightforward. These 2AFC tasks

    require the subject to consider two elements: the decision between the hypotheses h1

    and h2, and the decision about whether to either respond or continue accumulating

    evidence5. More precisely, after collecting the nth piece of evidence en, the subject

    determines whether or not to immediately execute his decision based on e1, . . . , en,

    or to wait and collect en+1. After collecting the nth piece of evidence, our decision

    variable is essentially the same as before, but now the log-likelihood ratio needs to be

    updated for each new piece of evidence so that after n observations,

    log [LR12(n)] = log [ P(e1, e2, . . . , en|h1) / P(e1, e2, . . . , en|h2) ] = ∑_{j=1}^{n} log [ P(ej|h1) / P(ej|h2) ] . (1.17)

    A sensible choice of decision rule is to update logLR12 after each new unit of evidence

    5Such language makes us think of the trade-off between exploration and exploitation. In our case, the subject is faced with a similar explore vs. exploit decision at each moment of evidence accumulation. Indeed, the subject may choose to explore by waiting and collecting more evidence, or the subject may exploit the collected evidence by choosing either h1 or h2 and responding accordingly. In general the trade-off between exploitation and exploration is poorly understood: even when the objective functions are well specified there may still not be a known optimal policy for trading off between explore vs. exploit. Here we have a simple situation in which we may derive the solution for a nontrivial explore-exploit task.

    is collected, and to ask whether or not logLR12 has crossed some positive or negative

    number. Hypothesis h1 is accepted if logLR12 is greater than some positive threshold

    Z0, and h2 is accepted if logLR12 drops below some negative threshold Z1. This

    choice of decision rule applied to LR12 from Eq. 1.17 is what comprises the sequential

    probability ratio test (SPRT).

    Barnard [Bar46] and Wald [WW48, Wal04] independently showed that the SPRT

    is optimal for the free response paradigm in the sense that given a fixed level of

    accuracy that must be attained on average, the SPRT requires the smallest number

    of samples to settle on a decision. Other proofs of the optimality of the SPRT may

    be found in [GS66, Gho70, Leh59].
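    The SPRT itself is only a few lines of code. The sketch below assumes Gaussian likelihoods and symmetric thresholds ±z; all parameter values are illustrative:

    ```python
    import random

    def sprt(sample, mu1=0.3, mu2=-0.3, sigma=1.0, z=3.0, max_n=100_000):
        """Sequential probability ratio test with symmetric thresholds +/- z.
        `sample()` returns one noisy piece of evidence per call; the running
        log-likelihood ratio is updated as in Eq. 1.18 (Gaussian likelihoods
        assumed)."""
        x = 0.0
        for n in range(1, max_n + 1):
            e = sample()
            x += ((e - mu2) ** 2 - (e - mu1) ** 2) / (2 * sigma ** 2)
            if x >= z:
                return "h1", n   # accept h1
            if x <= -z:
                return "h2", n   # accept h2
        return None, max_n       # no decision within max_n samples

    # Example: evidence generated under h1 (mean mu1 = 0.3)
    rng = random.Random(0)
    choice, n_used = sprt(lambda: rng.gauss(0.3, 1.0))
    ```

    Returning the sample count alongside the choice makes it easy to check Wald's optimality claim empirically: for a given error rate, the SPRT's average n is smaller than that of any fixed-sample test achieving the same accuracy.
    
    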

    1.2.3 The continuum limit of the SPRT

    When considering the SPRT, one often thinks of the decision variable as a discrete

    random walk, where the log likelihood ratios of independent samples of evidence

    produce independent increments at each time step. Writing log[LR12(n)] = xn for

    simplicity, we have

    xn = xn−1 + log [ P(en|h1) / P(en|h2) ] . (1.18)

    The SPRT is performed by choosing some initial value for x0 and incrementing ac-

    cording to Eq. 1.18 until xn crosses some positive threshold Z0 or negative threshold

    Z1. The three values x0, Z0, and Z1, encode the priors for the hypotheses h1 and

    h2, as well as any reward biases that are present. Because the SPRT is unchanged

    if we shift x0, Z0 and Z1 by a constant value, we may assume that the thresholds

    are symmetric, that is Z0 = Z and Z1 = −Z for some number Z. The way in which

    thresholds and initial conditions are to be chosen is carefully studied in [BBM+06]. In

    the case of unbiased stimuli and equal rewards, the SPRT can be seen as a discrete

    random walk with x0 = 0 and continuing according to Eq. 1.18 until it crosses some

    positive threshold +Z or negative threshold −Z.

    As discrete samples are taken more and more rapidly, the discrete random walk

    Eq. 1.18 approaches a continuous time variable X(t) after proper technical consider-

    ations are made. The details of this limiting procedure are covered in appendix A of

    [BBM+06], where it is shown that the continuum limit of SPRT then produces the

    drift diffusion model (DDM)

    dX = Adt+ σdW, X(0) = x0, (1.19)

    where X is the state of accumulated evidence for a particular choice, Adt is the aver-

    age increase in evidence per unit time (drift), and the diffusion term σdW represents

    Gaussian distributed white noise with mean 0 and variance σ2dt. It is important

    to note that A and σ depend upon specific properties (i.e. means and variances) of

    the distributions from which the samples are drawn, and that in our form we have

    implicitly assumed we are sampling from Gaussian distributions. When X(t) crosses

    either of the thresholds ±Z, accumulation halts and a decision is made (accepting h1 if +Z is crossed and h2 if −Z is crossed). Often we use another parameter called the

    non-decision time T0 to account for any sensory delays before any evidence accumu-

    lation is made, and motor delays prior to the response. This is introduced to avoid

    arbitrarily small reaction times which are impossible in real data because of the time

    it takes for signals from the retina to travel to the relevant decision areas of the brain,

    and for the execution of motor actions.
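    A simple Euler-Maruyama discretization of Eq. 1.19 illustrates the free response paradigm; all parameter values below are illustrative rather than fits:

    ```python
    import random

    def ddm_trial(A=0.2, sigma=1.0, z=1.0, x0=0.0, T0=0.3,
                  dt=0.001, max_t=60.0, rng=None):
        """Euler-Maruyama simulation of the pure DDM, Eq. 1.19:
        dX = A dt + sigma dW, absorbed at thresholds +/- z.
        Returns (choice, reaction_time), with the non-decision time T0
        added to the first-passage time."""
        rng = rng or random.Random()
        x, t = x0, 0.0
        step_sd = sigma * dt ** 0.5
        while t < max_t:
            x += A * dt + step_sd * rng.gauss(0.0, 1.0)
            t += dt
            if x >= z:
                return "h1", t + T0
            if x <= -z:
                return "h2", t + T0
        return None, t + T0  # effectively unreachable for these parameters
    ```

    Since the pure DDM has closed-form expressions for accuracy and mean decision time (restated in §1.3.3), simulations like this one are easy to sanity-check analytically.
    
    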

    The DDM was introduced into the psychological literature in [Rat78] as a proposed

    theory for memory retrieval. We sometimes refer to the DDM as presented in this

    section as the pure DDM to distinguish it from the extended DDM that we summarize

    in §1.5.1. In Ch. 4 we construct a version of the DDM involving non-Gaussian noise,

    and compare its data fitting power with that of the extended DDM.

    The pure DDM can also be used to solve a continuous time version of the interro-

    gation paradigm by supposing that at the interrogation time T , the subject responds

    according to the sign of X(T ). If X(T ) > 0, then the subject accepts hypothesis h1,

    and if X(T ) < 0, the subject accepts hypothesis h2.

    When working with the DDM, we can write down explicit formulae for the mean

    decision times and error rates which can be used to describe overall model behavior.

    These details are presented in §1.3.3 when we restate some relevant formulas for the

    Ornstein-Uhlenbeck model, of which the DDM is a special case.

    The optimality of the DDM in both the interrogation and free response paradigms

    can be heuristically justified by recalling that it is the continuous time limit of either

    the SPRT or likelihood ratio test from §1.2.2, both of which have been shown to

    be optimal. Furthermore, [BBM+06] gives other direct arguments establishing the

    optimality of the DDM, and also provides formulas for how to optimally set the

    thresholds as well as how to modify the pure DDM presented here for biased priors.

    Although these results are indeed very useful, we choose to not review them here.

    The reader is encouraged to refer to appendix A of [BBM+06] for a clear exposition

    of these results.

    1.2.4 Relevant experimental results concerning drift diffusion processes and optimality

    The various ratio tests and the DDM presented above provide us with a good set

    of tools for describing and fitting data for various 2AFC tasks. One then wonders

    if these statistical tests represent how animal and human brains actually behave.

    Not surprisingly, animals do not always behave optimally. One famous example is

    the matching law first noted by Herrnstein in 1961 [Her61, Her70] which is reviewed

    in [Her97]. Herrnstein showed that pigeons would often choose at only twice the

    frequency a button which yielded twice as much reward, where an optimal pigeon

    22

  • would choose the higher rewarded button 100% of the time. There are also several

    cases of humans performing suboptimally (e.g. [LS91, LT89]).

    Yet in tasks where subjects perform suitably constrained 2AFC tasks with suf-

    ficient training, the DDM seems to provide good fits to both monkey and human

    behavioral data [BHHC10, BBM+06, SCB+09, RM07, GP08, RGMN10,

    BSN+11, RHH+07, RCS03]. Even more impressive are the neurophysiological con-

    nections that have linked the DDM process from Eq. 1.19 to neural activity. For

    example, Hanes and Schall demonstrated that neurons in the frontal eye field exhibit

    activity which resembles the drift rate of a diffusion process [HS96], and others have

    contributed further evidence from in vivo recordings in monkeys that oculomotor

    decision making in the brain mimics a DDM, with neural activity rising to a thresh-

    old before movement initiation [Sch01, MRDS03, GS01, SR04]. In the case of the

    random dots motion task, LIP recordings are believed to represent accumulating evi-

    dence in neurons having receptive fields containing the response target corresponding

    to a particular alternative, as shown in Fig. 1.2. Differences between the firing rates

    corresponding to the two different directions of motion then appear to behave like

    sample paths of the DDM [RS02]. It should be noted, however, that the DDM has

    only been shown to fit the evidence accumulation period during these dots motion

    tasks. For other phases of the trial where evidence is not being accumulated, the

    DDM does not necessarily fit the neural data well. An example of this is in the neu-

    ral data shown at the end of Ch. 2 – the LIP recordings during the reward period

    before the accumulation phase do not look at all like DDM processes. Furthermore,

    the neural data for a different task in Ch. 3 shows many features that a straightfor-

    ward application of the DDM would not be able to fit. There, we construct a model

    which incorporates integrate-to-threshold units with other types of units in order to

    fit the neural data.

    These various experimental results show that we can find signals in the brain which

    represent the decision variables in a statistical test. But how does the brain assemble

    these signals? How might neurons actually perform the computations required for

    evidence accumulation and hypothesis testing? In the next section we summarize

    a series of biophysically-motivated modeling results which begin to systematically

    connect single cell spiking models to the DDM.

    1.3 Working down: Reducing a biologically plausible model to an OU process

    Throughout the animal kingdom, brains can contain up to billions of neurons (with

    humans having ≈ 10^11 neurons), each of which is nontrivial to model. Perhaps the

    most detailed dynamical models we have of neurons are those inspired by Hodgkin and Huxley, where one may construct large systems of ODEs and PDEs which approx-

    imate the specific ionic currents at specific locations along a neuron’s axon, cell body,

    or dendritic tree ([HH52b, HHK52, HH52a, HH52d, HH52e, HH52c, GC10, DA05]).

    These models have been very successful at reproducing the key spatiotemporal prop-

    erties of single cells, most important of which is that of an action potential. However,

    such models are analytically intractable and relatively expensive to simulate, and

    if one wishes to model networks of even hundreds of neurons, the systems quickly

    become computationally infeasible.

    A common simplification of these multiunit compartmental models is the integrate

    and fire model. Here, one does not explicitly simulate the time course of an action

    potential, but rather only models the subthreshold voltage potential up until some

    threshold. When the threshold is reached, a delta function spike occurs and the

    voltage is reset to its resting potential. Various flavors of these integrate and fire

    neurons have been used to simulate models of tens of thousands of neurons [IE08,

    Izh03], and one simulation by Izhikevich explicitly modeled 10^11 neurons and almost 10^15 synapses (although this simulation of 1 second of real time took 50 days on 27 3 GHz processors).

    The key to all of these approaches is that they aim to directly model neurons,

    which are the basic building blocks of the brain. Any results about the overall network

    behavior then need to emerge from the explicit modeling of single neurons and their

    connections. We say that models constructed in this way are biophysically-based or

    biophysically-motivated.

    In this section we wish to present how the DDM can also be viewed as a reduction

    from biophysically-based models for perceptual decision making. We begin in §1.3.1

    with an overview of some biophysically-based models and illustrate their connection

    with a popular model called the Leaky Competing Accumulator (LCA) model. Then

    in §1.3.2 we show how the LCA can be reduced, given certain parameter ranges and

    assumptions, to an OU model, of which the DDM is a specific case.

    Throughout this development, the key point is to see the DDM in a different way,

    this time as a simplified version of a biophysically motivated model. This gives us

    the justification for considering the DDM as a plausible model for neural activity.

    Furthermore, this reduction from realistic models also allows us to further justify the

    link between the DDM and the related neural activity that was reviewed in §1.2.4.

    1.3.1 From spiking neurons to the Leaky Competing Accumulator model (LCA)

    Over the past decade, researchers have been able to demonstrate in computer sim-

    ulation that biophysically realistic cortical networks of spiking neurons are able to

    reproduce the behavioral dynamics of perceptual decisions. In 2002, Wang presented

    a model of area LIP by simulating spiking neurons and drawing neural and synap-

    tic properties of the model from anatomical and physiological observations [Wan02].

    Subsequent studies have further demonstrated how this biophysically-based spiking

    model reproduces experimentally observed behaviors in perceptual decision tasks.

    These studies have been able to connect many of these observations with biological

    properties of the spiking neurons, although rigorous reduction theorems are lacking,

    except for local bifurcations [WW06, WHMNW07, EWLH11, RL08, GH83].

    A looser connection between Wang’s model and accumulator models was estab-

    lished by Bogacz et al. in [BBM+06]. It is still not completely understood how one

    may reduce a detailed neural network model to a noisy connectionist unit, but by

    assuming such a reduction Bogacz et al. showed that an averaged version of Wang’s

    model in [Wan02] may be viewed as a biologically realistic implementation of the

    Leaky Competing Accumulator model of Usher and McClelland [UM01]. We also

    note that another biologically inspired model of perceptual choice which has been

    reduced to the LCA [BUZM07, BBM+06] is that of Mazurek et al. [MRDS03].

    The LCA model is a two-dimensional system of stochastic differential equations

    whose state variables (x1(t), x2(t)) describe the activities of two mutually-inhibiting

    populations of neurons, each of which receives noisy sensory input [UM01, McC79]:

    dx1 = [−γx1 − βf(x2) + I1(t)]dt + σdW1 ,

    dx2 = [−γx2 − βf(x1) + I2(t)]dt + σdW2 , (1.20)

    where f(·) is a sigmoidal type activation function, γ and β, respectively, denote the

    strengths of leak and inhibition, and σdWj are independent white noise (Wiener)

    increments of r.m.s. strength σ. The inputs I1(t), I2(t) are generally time dependent

    signals which vary over the course of a single decision process (i.e. trial).

    Under the interrogation paradigm the choice is determined by the difference x(t) :=

    x1(t) − x2(t): at interrogation time T if x(T ) ≥ 0 the hypothesis corresponding to

    stimulus I1 is chosen, and vice versa for x(T ) < 0. Similarly, for the free response

    paradigm we set positive and negative thresholds for the decision variable x(t), and

    when a threshold is crossed the corresponding decision is made.
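    The LCA and its interrogation rule can likewise be simulated directly. In the sketch below the logistic form of f and all parameter values are illustrative assumptions, not fits:

    ```python
    import math
    import random

    def lca_trial(I1=1.2, I2=1.0, gamma=0.2, beta=0.4, sigma=0.3,
                  T=2.0, dt=0.001, rng=None):
        """Euler-Maruyama simulation of the LCA, Eq. 1.20, under the
        interrogation paradigm: at time T the choice follows the sign of
        x1 - x2."""
        rng = rng or random.Random()
        f = lambda u: 1.0 / (1.0 + math.exp(-4.0 * (u - 0.5)))  # sigmoidal gain
        x1 = x2 = 0.0
        step_sd = sigma * dt ** 0.5
        for _ in range(int(T / dt)):
            dx1 = (-gamma * x1 - beta * f(x2) + I1) * dt + step_sd * rng.gauss(0, 1)
            dx2 = (-gamma * x2 - beta * f(x1) + I2) * dt + step_sd * rng.gauss(0, 1)
            x1, x2 = x1 + dx1, x2 + dx2
        return "h1" if x1 - x2 >= 0 else "h2"
    ```

    With I1 > I2 as here, the population receiving the stronger input wins well over half the time, as expected.
    
    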

    1.3.2 Reduction of LCA to an Ornstein-Uhlenbeck (OU) process

It has been shown that, under suitable conditions, the LCA model from above

    can be reduced to a one-dimensional Ornstein-Uhlenbeck model, which is a simple

    generalization of the DDM [BBM+06, BGH+05, BH01, FHRN09]. Here we present

    the reduction from [FHRN09], which demonstrates how the noiseless version of the

    LCA reduces to an OU process in the presence of constant inputs. We present this

    noise-free reduction for clarity, so that the technical tools involving stochastic center

manifolds [Box89, Box91, AJM+95] need not be developed here.

Figure 1.3: Illustration showing nullclines and fixed points for the LCA, Eq. 1.20. The two curved lines represent the nullclines, and their intersections are the three fixed points. The two outermost fixed points are stable, and the middle fixed point is a saddle. The two stable fixed points shown represent relatively high activity in one population versus the other, whereas the saddle point represents moderate activity in both populations. The dashed diagonal line represents the slow attracting manifold.

When the inputs I1, I2 are constant and σ = 0, equilibrium solutions of Eq. 1.20 lie at the intersections of the nullclines given by γx1 = −βf(x2) + I1 and γx2 = −βf(x1) + I2. Depending


upon the precise form of f(·) and the parameter values I1, I2, β, γ, there may be

    one, two, or three equilibrium points. Each equilibrium point corresponds to either

    moderate activity in both populations or relatively high activity for one population

versus the other, as illustrated in Fig. 1.3. As the figure also shows, if these nullclines lie

    sufficiently close to each other over the activity range that encompasses the equilibria,

    it follows that a one-dimensional, attracting, slow manifold exists that contains both

    stable and unstable equilibrium points, and the solutions that connect them [GH83,

    BH01].

    To demonstrate these ideas more clearly, we first linearize the sigmoidal activation

    function in Eq. 1.20 at the central equilibrium point (x̄, x̄) in the case of equal inputs

I = I1 = I2, where x̄ = (1/γ)[−βf(x̄) + I]. After parameterizing the sigmoid f(·) so that df/dx(x̄) = 1, the linearized system can be written as

dx1 = [−γx1 − βx2 + I1(t)] dt + σ dW1 ,

dx2 = [−γx2 − βx1 + I2(t)] dt + σ dW2 ,    (1.21)

    and subtracting these equations yields a single scalar SDE for the differenced activity

    x(t) = x1(t)− x2(t):

dx = [λx + A(t)] dt + σ dW ,    (1.22)

where λ = β − γ, A(t) = I1(t) − I2(t), and dW = dW1 − dW2 is itself a white noise increment. This is an Ornstein-Uhlenbeck (OU) process.

    Eq. 1.22 describes the decision variable x(t) for our OU model. In the same way

    as before, in the interrogation paradigm the sign of x is used to determine the choice

    of response at interrogation time T . In the free response paradigm, a choice is made

    when x crosses either a positive or negative threshold.
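The interrogation rule can be simulated directly from Eq. 1.22. The sketch below uses an Euler-Maruyama discretization with illustrative parameter values; for λ = 0 and constant A the simulated choice probability should match the Gaussian prediction P(x(T) ≥ 0).

```python
import numpy as np

def ou_interrogation_choice(lam, A, sigma, T, x0=0.0, dt=1e-3, rng=None):
    """Simulate the OU decision variable of Eq. 1.22 up to interrogation
    time T, and report hypothesis 1 if x(T) >= 0, else hypothesis 2."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    sqdt = np.sqrt(dt)
    for _ in range(int(T / dt)):
        x += (lam * x + A) * dt + sigma * sqdt * rng.normal()
    return 1 if x >= 0 else 2
```

For example, with λ = 0, A = σ = T = 1 the decision variable at time T is distributed N(1, 1), so roughly 84% of simulated trials should choose hypothesis 1.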

The stimulus difference A(t) represents the sensory evidence for a particular

    hypothesis. For example, if a stimulus corresponding to hypothesis 1 is displayed,


this means that A(t) = I1 − I2 > 0, and a correct response will occur if the top

boundary is hit before the bottom boundary. Errors occur when noise pushes x to the bottom (incorrect) boundary before the top (correct) one is reached, and they

    will be more likely when I1 and I2 are close, i.e. when the inputs for the alternatives

    are hard to distinguish in the presence of noise.

    A particularly important case is when the leak γ and inhibition β are perfectly

    balanced, so that λ = β − γ = 0. In this case, Eq. 1.22 is equivalent to the drift

    diffusion process of Eq. 1.19, and we recover the DDM from §1.2.3.

    1.3.3 Working with the OU and DDM models

    Here we present known expressions for error rates and reaction times which describe

    overall OU (and DDM) model behavior. These quantities can then be compared with

    experimental data, as is done in Ch. 2, or they can be used to compute reward rates,

    as done in Ch. 5.

    The first quantity we note is the probability distribution of solutions for Eq. 1.22,

which is governed by the forward Kolmogorov or Fokker-Planck equation [Gar85]:

∂p/∂t = −∂/∂x [(λx + A(t)) p] + (σ²/2) ∂²p/∂x² .    (1.23)

    A variety of insightful derivations of the Fokker-Planck equation can be found in

    several good sets of lecture notes and textbooks [Gar85, Ris96, Fel66, Cro06]. Given

    initial conditions, we can solve Eq. 1.23 analytically or numerically to compute the

    probability distribution of solutions of Eq. 1.22. In Ch. 2 we study solutions of Eq.

    1.23 for the interrogation paradigm.
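As an illustration of the numerical route, Eq. 1.23 can be integrated with a simple explicit finite-difference scheme. The sketch below is one such discretization under assumptions of my own choosing (a uniform grid, p = 0 at the domain edges, and a time step small enough for stability, roughly dt < dx²/σ²); it is not the solver used in this thesis.

```python
import numpy as np

def evolve_fokker_planck(p0, x, lam, A, sigma, dt, n_steps):
    """Explicit finite-difference integration of Eq. 1.23 on grid x.

    Flux form dp/dt = -d/dx[(lam*x + A) p] + (sigma^2/2) d2p/dx2, with
    central differences in space, forward Euler in time, and p = 0 at
    the boundaries (the grid should extend far beyond the bulk of p).
    """
    p = p0.astype(float).copy()
    dx = x[1] - x[0]
    drift = lam * x + A
    for _ in range(n_steps):
        flux = drift * p
        dflux = (flux[2:] - flux[:-2]) / (2 * dx)       # central d/dx of flux
        diff = (p[2:] - 2 * p[1:-1] + p[:-2]) / dx**2   # central d2/dx2
        p[1:-1] += dt * (-dflux + 0.5 * sigma**2 * diff)
        p[0] = p[-1] = 0.0
    return p
```

In the drift-diffusion limit λ = 0 with constant A, the mean of the evolved density should translate at rate A while total probability is conserved, which gives a quick sanity check on the scheme.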

    In [BBM+06], one may find closed form expressions for the mean error rate and

    decision time for the OU model with constant drift rate A. The reader is referred

to Appendix A of [BBM+06] for these expressions; they are too cumbersome to be included here and are not explicitly used in this thesis.

    For the DDM, however, the expressions for the mean error rate and decision time

    take on simpler forms which we state here for reference. Using techniques presented

    in [BT92, BT93, Gar85, BBM+06] for computing first passage times, it can be shown

    that the mean error rate 〈ER〉 and mean decision time 〈DT 〉 can be written as

〈ER〉 = 1/(1 + e^{2z̃ã}) − (1 − e^{−2x0ã}) / (e^{2z̃ã} − e^{−2z̃ã}) ,    (1.24)

〈DT〉 = z̃ tanh(z̃ã) + 2z̃(1 − e^{−2x0ã}) / (e^{2z̃ã} − e^{−2z̃ã}) − x0 .    (1.25)

These expressions will be used in Chs. 2 and 5.
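In code, the error rate and decision time expressions read as follows; this is a direct transcription of Eqs. 1.24-1.25 as written above, with z̃, ã, and x0 spelled z_tilde, a_tilde, and x0.

```python
import numpy as np

def ddm_er_dt(z_tilde, a_tilde, x0=0.0):
    """Mean error rate and decision time for the pure DDM (Eqs. 1.24-1.25).

    z_tilde and a_tilde are the normalized threshold and drift parameters
    of [BBM+06]; x0 is the starting point (x0 = 0 for unbiased trials).
    """
    e2za = np.exp(2 * z_tilde * a_tilde)
    bias = 1.0 - np.exp(-2 * x0 * a_tilde)           # vanishes when x0 = 0
    er = 1.0 / (1.0 + e2za) - bias / (e2za - 1.0 / e2za)
    dt = (z_tilde * np.tanh(z_tilde * a_tilde)
          + 2 * z_tilde * bias / (e2za - 1.0 / e2za) - x0)
    return er, dt
```

For x0 = 0 these reduce to the familiar unbiased forms 〈ER〉 = 1/(1 + e^{2z̃ã}) and 〈DT〉 = z̃ tanh(z̃ã), and raising the threshold z̃ trades longer decision times for lower error rates.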

    1.4 Thesis overview

    The OU Model and the particular case of the DDM are the crucial building blocks of

    the modeling efforts in this thesis. In essence, this thesis is a further demonstration

    of how these accumulator models are useful for studying a variety of psychological

phenomena, including, but not limited to, perceptual decision making.

    In Ch. 2, we adapt the OU model to fit a more generalized version of the motion

    dots task where monkeys are now faced with biased rewards. By doing so, we demon-

    strate that monkeys shift their behaviors in a systematic way, and that they do so in

    a near optimal manner. We also show some fits of the OU model to electrophysiolog-

    ical data and find that λ ≈ 0, giving some further evidence that the DDM is a good

    model for both the behavior and neural activity related to perceptual choice.

    In Ch. 3, we construct a multi-unit noisy connectionist model for the covert search

    task mentioned in §1.1.2 [OSBG06]. We first reanalyze the data and discover some

    new trends that were not fully established before, and then systematically construct a

model which explains the key behavioral and electrophysiological phenomena demonstrated in the experimental data. Our model proposes that LIP plays more of an

    attentional role in this covert search task – in fact LIP is not even necessary for the

task, although its inclusion aids performance. Our model comes close to the data in some cases, although work remains to be done.

    In Ch. 4 we study a different generalization of the DDM, this time assuming that

the noise can have jumps in addition to Wiener increments. We demonstrate that

    this simple change in the distribution of noise allows the pure DDM to reproduce fast

    error trials given unbiased initial data, something which other models require more

    parameters to reproduce. We fit our model to human subject data and compare it

    with the extended DDM presented in §1.5.1.

    In Ch. 5 we address the more general question of why humans, despite having a

prefrontal cortex with billions of neurons, can perform only a few tasks at

    once. We construct an abstract model for studying capacity constraints on cognitive

    control and use the DDM as a generalized model for a task. After studying various

    aspects of the constructed model, large scale simulations demonstrate that a capacity

    constraint does indeed arise out of the need for optimizing overall rewards.

    We conclude the thesis with a short summary essay in Ch. 6.

1.5 Appendix: Two popular models for decision making

    In this section we review two models which deserve mention. First we will review

    a useful extension of the pure DDM, namely the extended DDM, first presented by

    Ratcliff and Rouder in [RR98] as a model for 2AFC data. Then we summarize the

    Linear Ballistic Accumulator Model of Brown et al., which is a model of rapid choice

    which does not use stochastic accumulation like the DDM (and extended DDM).


1.5.1 The Extended DDM of Ratcliff

    In many situations, the simple pure DDM as presented in the main text is not sufficient

    to fit reaction times. For example, one key feature often seen in behavioral 2AFC data

    is that error trials and correct trials demonstrate significantly different reaction time

distributions. In [RR98], Ratcliff and Rouder presented the extended DDM, which adds four new parameters to the pure DDM in order to fit four different sets of behavioral data.

Recall that the pure DDM from above already contains five parameters: x0, Z, A, σ, and

    T0. Over the years, many reports have established that the extended DDM is able to

    fit behavioral 2AFC data well [RT02, RS04, RM07, SCB+09].

Figure 1.4: Extended DDM illustration from [SCB+09], showing a sample path between thresholds ±z, the mean drift A with trial-to-trial drift variability sA, starting-point variability sx about x0, non-decision-time variability st about T0, the growth (∝ √t) of the standard deviation of sample-path positions, and the DDM first-passage time density conditioned on an upper-boundary (z) crossing. The notation here differs from that used in the original extended DDM paper [RR98].

    The four additional parameters to the pure DDM which were introduced in [RR98]

    are variability in starting point σx, variability in drift rate σA, variability in non-

    decision time σT , and a proportion of contaminant reaction times p0. For each trial,

the initial condition x(0) was then sampled from a uniform distribution, U[x0 − σx, x0 + σx], and the drift rate A was sampled from a Gaussian distribution with

    mean A and standard deviation σA. The non-decision time T0 was also sampled from

its own uniform distribution U[T0 − σT, T0 + σT]. Finally, each trial had a probability

    p0 of being a contaminant trial. On these contaminant trials, a random response was

    made, and the reaction time was not generated by the diffusion process but instead

    was drawn from a uniform distribution spanning the observed RT range from the

data. The extended DDM is illustrated in Fig. 1.4.
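The sampling scheme just described can be sketched as follows. Parameter values are illustrative, and `rt_range` is a hypothetical stand-in for the observed RT range from which contaminant responses are drawn.

```python
import numpy as np

def extended_ddm_trial(A, sigma, z, x0, T0, s_x, s_A, s_T, p0,
                       rt_range=(0.2, 1.5), dt=1e-3, rng=None):
    """Sample one (choice, RT) pair from the extended DDM described above.

    s_x, s_A, s_T are the trial-to-trial variabilities of starting point,
    drift, and non-decision time; p0 is the contaminant proportion.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p0:                        # contaminant trial:
        return rng.integers(1, 3), rng.uniform(*rt_range)  # random response
    x = rng.uniform(x0 - s_x, x0 + s_x)          # variable starting point
    drift = rng.normal(A, s_A)                   # variable drift rate
    t0 = rng.uniform(T0 - s_T, T0 + s_T)         # variable non-decision time
    t, sqdt = 0.0, np.sqrt(dt)
    while abs(x) < z:                            # Euler-Maruyama diffusion
        x += drift * dt + sigma * sqdt * rng.normal()
        t += dt
    choice = 1 if x >= z else 2
    return choice, t0 + t
```

With a clearly positive mean drift, most simulated trials choose alternative 1, and every reaction time is bounded below by the smallest non-decision (or contaminant) time.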

    1.5.2 The Linear Ballistic Accumulator Model

    One recent model for response times is the Linear Ballistic Accumulator [BH05, BH08]

    which has seen several applications as a model of rapid choice [DABH09, FDB+08,

HBS09, LFEG09]. The basic idea is shown in Fig. 1.5. Here, two accumulators are

    used, one for response A and another for response B. The activity of an accumulator

    is initialized to a value uniformly sampled from [0, A]. Then, the drift rate for the

    first accumulator is drawn from a normal distribution with mean dA and standard

    deviation σ. The drift rate for the second accumulator is also drawn from a normal

    distribution with mean dB and standard deviation σ. Then for each accumulator,

    evidence accumulates in a noiseless (ballistic) and linear fashion according to their

    respective drift rates. The various accumulators race to their respective thresholds,

    and whichever hits threshold first executes the corresponding response. Like the

    diffusion models, a non-decision time T0 is typically included. In [DABH09], Donkin

    et al. performed a detailed numerical comparison between the Extended DDM and

    the LBA and conclude that inferences about psychological processes made from real

    data are unlikely to depend on the model that is used. The LBA has the advantage

    of having one less parameter when compared to the extended DDM, whereas the

extended DDM has the principled derivations as presented in sections 1.2 and 1.3 (although the LBA is derived in [BH05] from a simplified deterministic version of the LCA of [UM01]).

Figure 1.5: Simplified representation of the Linear Ballistic Accumulator model of Brown and Heathcote. Reprinted from [BH08], with permission from Elsevier.

Furthermore, Goldfarb and Caicedo (personal communication)

    have preliminary evidence that the LBA has difficulty staying close to the optimal

    performance curve, which represents the relationship between error rate and decision

    time that holds under the optimal threshold set in the DDM.
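A single LBA trial, as described above, can be sketched as below. All parameter values are illustrative, and one simplifying assumption is mine: non-positive drift samples are redrawn so that every race finishes, whereas the full model handles negative drifts via the competing accumulator.

```python
import numpy as np

def lba_trial(dA, dB, b, A_start, sigma, T0, rng=None):
    """One trial of a Linear Ballistic Accumulator race.

    b is the response threshold, A_start the upper bound of the uniform
    start-point distribution, and dA, dB the mean drifts of the two
    accumulators (both with standard deviation sigma).
    """
    rng = np.random.default_rng() if rng is None else rng
    finish_times = []
    for d in (dA, dB):
        k = rng.uniform(0.0, A_start)            # start point in [0, A_start]
        v = rng.normal(d, sigma)
        while v <= 0:                            # simplification: redraw
            v = rng.normal(d, sigma)             # non-positive drifts
        finish_times.append((b - k) / v)         # noiseless linear rise to b
    winner = int(np.argmin(finish_times)) + 1    # 1 = response A, 2 = response B
    return winner, T0 + min(finish_times)
```

The accumulator with the larger mean drift should win most races, and every reaction time exceeds the non-decision time T0.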


2 Can monkeys choose optimally when faced with noisy stimuli and unequal rewards?¹

    This chapter focuses on modeling the behavior of monkeys performing an extended

    version of the motion dots discrimination task in which the amount of reward for one of

    two alternatives may be doubled (if correctly selected) on certain trials. The monkeys

    are informed of a particular trial’s reward structure by a visual cue, illustrated in the

    “Reward” column of Fig. 2.1 and described below. We propose extensions of the

    OU process presented in §1.3.3 to account for the influence of unequal rewards, and

    make use of the derived psychometric functions (PMFs) which link model predictions

    to monkey behavior. The PMFs are characterized by two parameters: midpoint

    slope, which quantifies the subject’s ability to extract signal from noise, and shift,

    which measures the bias applied to account for unequal rewards. We fit these PMFs

    to data collected from two adult rhesus monkeys and find that, when behavior is

    averaged over multiple sessions, the monkeys shift their PMFs in a nearly optimal

    manner; remarkably, monkeys respectively garner greater than 98% and 99% of their

    maximum possible rewards. We also present a simple fit to the electrophysiological

    data during the accumulation period of the task in order to estimate some of the OU

¹This chapter shares its title with a paper written by Samuel Feng, Alan Rorie, Philip Holmes, and William T. Newsome [FHRN09]. Most notable is Alan Rorie, who was primarily responsible for performing these extensive and thorough monkey experiments.


parameters.

    We describe the experimental task in §2.1. §2.2 briefly reviews material from

    §1.3, reducing the LCA model to an OU process and demonstrating how the resulting

    PMFs change with experimental parameters. In §2.3 we compute the optimal shifts

    of these PMFs due to biased reward conditions and mixed coherences between trials.

In §2.4 we demonstrate that the predicted PMFs from the OU model fit the data when

    averaged over multiple sessions, and we compute how close the monkeys are to optimal

    performance. We also present some work with the individual session data. In §2.5

    we present some fits of the OU model to the averaged firing rate data during the

    accumulation period and find that our OU process is approximately a drift diffusion

    process. The chapter concludes with some future problems and discussion in §2.6.

    2.1 Unequal rewards in the motion dots task

Figure 2.1: Diagram indicating the time course of cues during the biased rewards motion dots task. See text for details.

    The full details of the behavioral study including equipment used, training of

    monkeys, setup of experimental apparatus, and recording of electrophysiological data

    can be found in [FHRN09]. The experiment was performed at Stanford University

    by Alan Rorie, under the supervision of W.T. Newsome. We are indebted to them


for sharing their data.

    Here we focus on the details presented in Fig. 2.1, which illustrates the sequence

    of events which formed a typical trial. First, a small, yellow dot appears and the

    monkey is required to fixate upon it for 150 msec. Next, two saccade targets appear

    (open gray circles) on opposite sides of the fixation point and aligned with the axis of

    motion to be discriminated. By convention, target 1 (T1) corresponds to a positive

    coherence stimulus (right-going motion), and target 2 (T2) to negative coherence

    stimulus (left-going motion). After 250 msec the targets change color, indicating the

    magnitude of reward available for correctly choosing either target. A blue target

    indicates a low magnitude (L) reward of 1 unit (1 drop) of juice, while a red target

    indicates a high magnitude (H) reward of 2 units of juice. We denote by r1 ∈ {1, 2}

    the magnitude of reward available for correctly choosing T1, and similarly define r2.

    This yields four reward conditions overall, shown in the 4 panels of the “Reward”

    column of Fig. 2.1: (1) LL (r1 = r2 = 1), in which both targets are blue, (2) HH

    (r1 = r2 = 2), in which both are red, (3) LH (r1 = 1, r2 = 2), in which T1 is blue

    and T2 is red, and (4) HL (r1 = 2, r2 = 1), in which T1 is red and T2 is blue. These

    colored targets are displayed 250 msec after the open gray circles appear, and remain

    on throughout the trial. After another 250 msec, the motion stimulus begins.

    The motion stimulus consists of a field of randomly moving dots, a certain per-

    centage of which were coherently moving either towards T1 or T2. The proportion

    of dots moving towards the correct target is called the coherence – a trial with −3%

    coherence indicates that 3% of the dots coherently move towards T2, while all of the

    others move randomly. The motion stimulus remains on for 500 msec. Following mo-

    tion stimulus offset, the monkey is required to maintain fixation for a variable delay

    period of 300-550 msec (varied uniformly across trials within each session). After this

    the fixation point disappears, cueing the monkey to report his decision with a sac-

    cade to the target corresponding to the perceived direction of motion. Monkeys must


respond within 1000 msec; otherwise the trial is discarded. If he chooses the correct

    direction, he is rewarded according to the color of the chosen target. Until the Go!

    period when the monkey can respond, fixation is enforced throughout the trial, and

    breaks of fixation are penalized by aborting the trial and enforcing a time-out period

    before the next trial.

    Rorie et al. collected data from two monkeys which we call monkey A and monkey

    T. Trials were presented in block-randomized order. For monkey A, they employed

    13 possible signed coherences:

{0, ±1.5%, ±3%, ±6%, ±12%, ±24%, ±48%}

    which, along with the four reward conditions, yields 52 conditions overall. For mon-

    key T, the two lowest motion coherences ±1.5%,±3% were eliminated because this

    animal’s psychophysical thresholds were somewhat higher than those of monkey A,

    giving 36 conditions overall. The behavioral data analyzed here consists of 35 sessions

    from monkey A (totaling 66933 trials) and 25 sessions from monkey T (totaling 32751

    trials).

    2.2 Predicting psychometric functions (PMFs) with

    an accumulator model

    We begin our model construction with the LCA from §1.3.1. More precisely, we

    suppose that there are two states (x1(t), x2(t)) which represent short-term averaged

    firing rates of two mutually-inhibiting pools of LIP neurons, which are sensitive to

    alternatives 1 and 2, respectively. We understand that decisions are almost certainly

    formulated through interactions among several oculomotor areas, but note that the

    causal role of LIP has been demonstrated in [HDS06]. In the development here, each


population receives noisy sensory input from the stimulus along with input derived

    from reward expectations. For clarity, we restate the LCA model equations here:

dx1 = [−γx1 − βf(x2) + I1(t)] dt + σ dW1 ,

dx2 = [−γx2 − βf(x1) + I2(t)] dt + σ dW2 ,    (2.1)

where the meanings of parameters are stated in §1.3.1.

    From here we reduce the linearized LCA to an OU process as presented in §1.3,

    which yields a single scalar SDE for the activity difference x := x1 − x2:

dx = [λx + A(t)] dt + σ dW ,    (2.2)

    where λ = β − γ is the difference between leak and inhibition, A(t) = I1(t) − I2(t),

and dW = dW1 − dW2 is itself a white noise increment. To complete the

    model formulation we observe that our experiment is performed according to the

    interrogation protocol with interrogation time T , as the motion stimulus stays on for

    a fixed duration, after which monkeys are required to make a response (an eye saccade

to T1 or T2) which reports the direction of motion². Note from the definition of x that

    evidence for alternative 1 manifests itself as positive drift or A = I1 − I2 > 0, and for

    alternative 2 vice versa, and that we have a correct response if sign(x(T )) = sign(A)

    (since coherence is fixed during a trial, A(t) will be assumed constant during the

    stimulus period, although others have assumed variable drift rates for OU processes

    [EHL+08]).

    The discussion from §1.3.3 reminds us that if λ = 0, Eq. 2.2 describes a drift

    diffusion (DD) process, which is a continuum limit of the sequential probability ratio

    test (§1.2.3) and is optimal for 2AFC tasks in that, given a fixed decision time, it

    maximizes accuracy. Since we have no reason for assuming that λ = 0, we elect

²We choose not to explicitly model the delay and response periods, instead electing to assume that the monkeys’ responses are locked in as soon as the stimulus terminates at time T.


to use the Ornstein-Uhlenbeck (OU) model and proceed to derive expressions with

    which we can study the monkeys’ behavior in our biased rewards 2AFC task. In

    particular, we wish to study the monkeys’ psychometric functions (PMFs), which

represent how coherence affects accuracy for an individual. In doing so, we can

    analyze the optimality of each monkey’s behavior and determine if the monkeys are

    properly incorporating the additional biased reward information.

    From §1.3.3 we know we can compute the probability of choosing alternative 1

    under the interrogation protocol by computing the probability distribution of solu-

    tions p(x, t) from the Fokker-Planck equation (1.23). If we suppose that the initial

    data p(x, 0) are Gaussian with mean µ0 and variance ν0,

p(x, 0) = (1/√(2πν0)) exp[−(x − µ0)² / (2ν0)] ,    (2.3)

    then the distribution of solutions of Eq. 2.2 are themselves Gaussian as time evolves:

p(x, t) = (1/√(2πν(t))) exp[−(x − µ(t))² / (2ν(t))] ,    (2.4)

    where

µ(t) = µ0 e^{λt} + ∫₀ᵗ e^{λ(t−s)} A(s) ds   and   ν(t) = ν0 e^{2λt} + (σ²/2λ)(e^{2λt} − 1)    (2.5)

    define the integrated stimulus and integrated noise. Eq. 2.4 can be verified by directly

    computing its partial derivatives. Eqs. 2.5 are now the central focus of our study,

    as they are directly linked to the distribution of solutions p(x, t) through Eq. 2.4,

    and, as we shall see below, are directly responsible for the shapes of the psychometric

    functions. It is useful to note that in the DD limit of λ = 0, Eqs. 2.5 simplify to

    µ(t) = µ0 +

    ∫ t

    0