
  • Extensions and applications of

    stochastic accumulator models in

    attention and decision making

    Samuel F. Feng

    A Dissertation

    Presented to the Faculty

    of Princeton University

    in Candidacy for the Degree

    of Doctor of Philosophy

    Recommended for Acceptance

    by the Program in

    Applied and Computational Mathematics

    Adviser: Philip J. Holmes

    November 2012

  • © Copyright by Samuel F. Feng, 2012.

    All Rights Reserved

  • Abstract

    The research presented in this thesis is a collection of applications and extensions of

    stochastic accumulator models to various areas of decision making and attention in

    neuroscience.

    Ch. 1 introduces the major techniques and experimental results that guide us

    throughout the rest of the thesis. In particular, we introduce and define the leaky,

    competing accumulator, drift diffusion, and Ornstein-Uhlenbeck models.

    In Ch. 2, we adopt an Ornstein-Uhlenbeck (OU) process to fit a generalized version

    of the motion dots task in which monkeys are now faced with biased rewards. We

    demonstrate that monkeys shift their behaviors in a systematic way, and that they

    do so in a near optimal manner. We also fit the OU model to neural data and find

    that the OU model behaves almost like a pure drift diffusion process. This gives further

    evidence that the DDM is a good model for both the behavior and neural activity

    related to perceptual choice.

    In Ch. 3, we construct a multi-area model for a covert search task. We discover

    some new trends in the data and systematically construct a model which explains

    the key findings in the data. Our model proposes that the lateral intraparietal area

    (LIP) plays an attentional role in this covert search task, and suggests that the two

    monkeys used in this study adopted different strategies for performing the task.

    In Ch. 4, we extend the model of noise in the popular drift diffusion model (DDM)

    to a more general Lévy process. The jumps introduced into the noise increments

    dramatically affect the reaction times predicted by the DDM, and they allow the

    pure DDM to reproduce fast error trials given unbiased initial data, a feature which

    other models require additional parameters to capture. The model is fit to human

    subject data and is shown to outperform the extended DDM in data containing fast

    error reaction times.

    In Ch. 5, we construct a model for studying capacity constraints on cognitive


    control using the DDM as a generalized model for a task. After studying various

    aspects of the constructed model, large scale simulations demonstrate that a severe

    capacity constraint does indeed arise out of the need for optimizing overall rewards.

    The thesis concludes with some summarizing remarks in Ch. 6.


  • Acknowledgements

    First I would like to thank my thesis adviser Philip Holmes for his intellectual and

    personal guidance over these past years. Your candor, integrity, and approach to life

    have inspired me more than you realize. I am blessed to have had the opportunity to

    be your student.

    It has also been a pleasure to work with my collaborators Alan Rorie, William

    Newsome, Sam Gershman, and Jonathan Cohen. I am particularly grateful to Alan

    Rorie and William Newsome – when we collaborated several years ago I had no idea

    how fortunate I was to work with you.

    In a category by himself is Michael Schwemmer, both as a friend and a colleague.

    Fitting that data was a pain! Your drive and work ethic are infectious, and I will

    miss your jokes. I wish you blessings as you move on in your career.

    I would also like to thank Carlos Brody and Eric Shea-Brown for taking time to

    read this thesis.

    I am most thankful for the various friends and family who have made Princeton

    my new home. I thank my parents Pen and Janet Feng for their unending love and

    support. I thank Philip Eckhoff, Adam Hincks, Arie Israel, Richard Jordan, Jun

    Kitagawa, and Ross Willford for their friendship as roommates over the years. I

    thank Westerly Road Church for their Christ-centered prayers and encouragement. I

    thank my new wife Siyi for being at my side during the highs and lows of graduate

    study, and for focusing me on the more important things of life. You remind me of

    Christ’s sacrificial love more than anyone else I know.

    And finally, all thanks be to God. You are good, your love endures forever.


  • Contents

    Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

    Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

    1 Introduction to modeling evidence accumulation in two alternative forced choice (2AFC) tasks 1

    1.1 Experimental background of 2AFC perceptual tasks . . . . . . . . . . 2

    1.1.1 A crash course in visual processing . . . . . . . . . . . . . . . 3

    1.1.2 LIP’s role in perception and attention . . . . . . . . . . . . . . 6

    1.2 Building up: constructing drift diffusion (DD) processes via statistical inference . . . . . . 12

    1.2.1 A simple 2AFC statistical inference task . . . . . . . . . . . . 12

    1.2.2 The Neyman-Pearson lemma and the sequential probability ratio test . . . . . . 18

    1.2.3 The continuum limit of the SPRT . . . . . . . . . . . . . . . . 20

    1.2.4 Relevant experimental results concerning drift diffusion processes and optimality . . . . . . 22

    1.3 Working down: Reducing a biologically plausible model to an OU process 24

    1.3.1 From spiking neurons to the Leaky Competing Accumulator model (LCA) . . . . . . 25

    1.3.2 Reduction of LCA to an Ornstein-Uhlenbeck (OU) process . . 27

    1.3.3 Working with the OU and DDM models . . . . . . . . . . . . 29


  • 1.4 Thesis overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

    1.5 Appendix: Two popular models for decision making . . . . . . . . . . 31

    1.5.1 The Extended DDM of Ratcliff . . . . . . . . . . . . . . . . . 32

    1.5.2 The Linear Ballistic Accumulator Model . . . . . . . . . . . . 33

    2 Can monkeys choose optimally when faced with noisy stimuli and unequal rewards? 35

    2.1 Unequal rewards in the motion dots task . . . . . . . . . . . . . . . . 36

    2.2 Predicting psychometric functions (PMFs) with an accumulator model 38

    2.2.1 Two approaches for modeling biased rewards: shifted initial conditions vs. persistent reward signals . . . . . . 41

    2.2.2 Fixed time interrogation reaction times cannot distinguish the form for integrated drift and noise . . . . . . 43

    2.2.3 Examples of psychometric functions . . . . . . . . . . . . . . . 45

    2.3 Optimality analysis of a two parameter psychometric function . . . . 47

    2.3.1 A motivating example . . . . . . . . . . . . . . . . . . . . . . 47

    2.3.2 Blocks with mixed stimuli: a continuum of coherences . . . . . 49

    2.3.3 Blocks with mixed stimuli: finite sets of coherences . . . . . . 51

    2.4 Applying the model to experimental data . . . . . . . . . . . . . . . . 53

    2.4.1 PMF fits to data averaged over multiple sessions . . . . . . . . 53

    2.4.2 How close are the animals, on average, to optimal performance? 57

    2.4.3 Variability of behaviors in individual sessions . . . . . . . . . . 61

    2.5 Fitting the OU process to the LIP neural data . . . . . . . . . . . . . 63

    2.6 Discussion of results and future directions . . . . . . . . . . . . . . . 67

    3 Modeling a covert visual search task with a multi-area stochastic model 71

    3.1 Data analysis of the covert search task . . . . . . . . . . . . . . . . . 72


  • 3.1.1 LIP encodes target location, limb preference, set-size effect, and cue-hemifield congruence . . . . . . 75

    3.1.2 Accuracy vs reaction time . . . . . . . . . . . . . . . . . . . . 81

    3.1.3 Searching for a target . . . . . . . . . . . . . . . . . . . . . . . 82

    3.2 A multi-area model for the covert search task . . . . . . . . . . . . . 85

    3.3 Fitting methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

    3.4 Model fits and results . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

    3.5 Discussion and concluding remarks . . . . . . . . . . . . . . . . . . . 100

    3.6 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

    3.6.1 Search analysis computations . . . . . . . . . . . . . . . . . . 102

    3.6.2 Model equations . . . . . . . . . . . . . . . . . . . . . . . . . 103

    4 Changing the noise in the DDM: jumpy noise can be good 109

    4.1 Lévy processes as models for 2AFC reaction times . . . . . . . . . . . 111

    4.2 Fitting the Simen et al. behavioral data . . . . . . . . . . . . . . . . 115

    4.3 Closing discussion on the jump DDM . . . . . . . . . . . . . . . . . . 120

    5 Multitasking vs. Multiplexing: Uncovering capacity constraints on cognitive control 122

    5.1 Stroop as a model for cognitive control . . . . . . . . . . . . . . . . . 124

    5.2 Defining many parallel tasks with drift diffusion processes . . . . . . . 127

    5.3 Multitasking model equations and key simplifications . . . . . . . . . 128

    5.4 Control as reward maximization . . . . . . . . . . . . . . . . . . . . . 133

    5.4.1 Free response: maximizing reward rate and optimizing thresholds . . . . . . 134

    5.4.2 Interrogation: reward rates based solely on accuracy . . . . . . 135

    5.4.3 Scaling the overall reward rates . . . . . . . . . . . . . . . . . 136

    5.5 Methods for studying capacity constraints . . . . . . . . . . . . . . . 137

    5.5.1 Input values . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138


  • 5.5.2 Incongruency . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

    5.5.3 Network connectivity and fan out . . . . . . . . . . . . . . . . 139

    5.5.4 Simulating capacity constraints on cognitive control . . . . . . 141

    5.6 Model behavior in simple cases . . . . . . . . . . . . . . . . . . . . . 142

    5.6.1 Two inputs, one output . . . . . . . . . . . . . . . . . . . . . 143

    5.6.2 Two inputs, two outputs . . . . . . . . . . . . . . . . . . . . . 143

    5.6.3 Maximizing overall drift rate . . . . . . . . . . . . . . . . . . . 147

    5.7 Model simulation results . . . . . . . . . . . . . . . . . . . . . . . . . 150

    5.7.1 Full model simulation, 10 pathways . . . . . . . . . . . . . . . 152

    5.7.2 Choosing values for peripheral parameters . . . . . . . . . . . 153

    5.7.3 Capacity constraints for larger networks . . . . . . . . . . . . 157

    5.8 Key conclusions and experimental remarks . . . . . . . . . . . . . . . 159

    5.9 Appendix: implicit threshold optimization . . . . . . . . . . . . . . . 162

    6 Closing Remarks 164

    7 Bibliography 169


  • 1 Introduction to modeling evidence accumulation in two alternative forced choice (2AFC) tasks

    The common thread throughout this entire thesis is the use of a simple stochastic

    model for decision making called the drift diffusion model (DDM). This model has

    grown in popularity over the past several decades, and dozens of scientific papers

    have been published using it to model and explain various phenomena in the world

    of perceptual decision making. The contributions of this thesis are to generalize and

    apply the DDM to new situations, and to show that using the DDM (and closely

    related accumulator models) is still a valuable technique in understanding simple

    decision processes. The DDM provides the right amount of complexity to reproduce

    desired behavioral properties and illuminate neural mechanisms while at the same

    time remaining tractable enough for mathematical analysis and efficient computation.

    The overarching message of the research presented in chapters 2-5 is simple: models

    based on the drift diffusion process are useful tools for computational neuroscientists

    interested in modeling perceptual decisions, and their potential is still not fully realized.

    Each chapter of this thesis carries the DDM (or a close relative) in a new direction by

    either generalizing its dynamics or using it as a critical element of a modeling effort

    to describe some neural or psychological phenomenon.
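    In its simplest form, the pure DDM accumulates noisy evidence x(t) according to dx = A dt + c dW, starting from x(0) = 0, until x reaches one of two absorbing thresholds at ±z; which threshold is reached determines the choice, and the first-passage time gives the decision time. The following is a minimal illustrative sketch (not code from the thesis; the function name and all parameter values are arbitrary) of an Euler-Maruyama simulation of this process:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ddm(drift=0.2, noise=1.0, threshold=1.0, dt=1e-3, max_t=10.0):
    """One trial of the pure DDM: dx = drift*dt + noise*dW, x(0) = 0,
    with absorbing boundaries at +/- threshold. Returns (choice, rt),
    where choice is +1 (upper) or -1 (lower), or 0 on timeout."""
    x, t, sqrt_dt = 0.0, 0.0, np.sqrt(dt)
    while t < max_t:
        x += drift * dt + noise * sqrt_dt * rng.standard_normal()
        t += dt
        if abs(x) >= threshold:
            return (1 if x > 0 else -1), t
    return 0, t

# With positive drift, hitting the upper boundary counts as "correct".
trials = [simulate_ddm() for _ in range(2000)]
acc = np.mean([c == 1 for c, _ in trials])
mean_rt = np.mean([t for c, t in trials if c != 0])
```

    For these illustrative parameters the classical first-passage formulas give accuracy 1/(1 + e^{-2Az/c²}) ≈ 0.60 and mean decision time (z/A) tanh(Az/c²) ≈ 0.99, which the simulation reproduces up to sampling error; this tractability is precisely what makes the DDM attractive for analysis.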

    The main purpose of this chapter is to survey the mathematical and experimental


    background which has contributed to the success of drift diffusion as a stochastic

    model for perceptual decisions. We have been careful to cite articles and resources

    whenever appropriate – in some sense, this introduction represents the type of docu-

    ment that might aid a new student beginning research in modeling perceptual decision

    making.

    In §1.1 we recap the body of experimental work which leads to the experimental

    data analyzed in chapters 2, 3, and 4. In §1.2 we present a systematic construction of

    the drift diffusion model starting from a basic statistical inference task. This section

    is written in a more pedagogical manner, although the reader is referred to outside

    sources for proofs of the longer results. In §1.3 we give a second account of the DDM

    by reviewing a series of computational results over the past decade which demonstrate

    how biologically-based models of spiking neurons can be reduced to the DDM. Finally

    in §1.4 we give a more detailed description of the rest of this thesis. In the appendix

    after this chapter we review two other popular models for decision making.
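    As a concrete preview of the construction in §1.2, the sequential probability ratio test accumulates the log-likelihood ratio of each incoming observation and stops when the running sum leaves a fixed interval; in the continuum limit this accumulated sum becomes a drift diffusion process. A hedged sketch (illustrative only, not from the thesis; the Gaussian hypotheses, threshold value, and function names are our own):

```python
import numpy as np

rng = np.random.default_rng(1)

def sprt_gaussian(draw, mu0=-0.5, mu1=0.5, sigma=1.0, log_thresh=2.0):
    """SPRT for H1: N(mu1, sigma^2) vs H0: N(mu0, sigma^2) on i.i.d.
    samples from draw(). Accumulates the log-likelihood ratio until it
    leaves (-log_thresh, +log_thresh); returns (decision, n_samples),
    where decision is +1 for H1 and -1 for H0."""
    llr, n = 0.0, 0
    while abs(llr) < log_thresh:
        x = draw()
        n += 1
        # log[p(x|H1) / p(x|H0)] for equal-variance Gaussians
        llr += (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma**2
    return (1 if llr > 0 else -1), n

# Samples drawn under H1: most runs should decide in favor of H1.
runs = [sprt_gaussian(lambda: rng.normal(0.5, 1.0)) for _ in range(500)]
acc = np.mean([d == 1 for d, _ in runs])
```

    By Wald's approximation, symmetric log-thresholds at ±Θ give an error probability of roughly 1/(1 + e^Θ), about 12% for Θ = 2, so most runs on H1-generated data decide for H1 after only a handful of samples.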

    1.1 Experimental background of 2AFC perceptual

    tasks

    In this section we present a basic overview of the structures of the mammalian brain

    that are implicated in decisions, actions, and choice. The experimental background

    and history presented here should communicate a feel for the major lines of thought

    that comprise the current understanding of our visual system and how our brains make

    simple perceptual decisions. Such an exposition is included not only for completeness,

    but also in order to properly frame the modeling efforts which form the backbone

    of this thesis. We have taken particular care in each chapter to connect our results

    with experimental data, and all models have been methodically constructed with the

    experimental literature in mind. Our hope is that this survey paints a broad picture


    of this literature and the various experimental phenomena that motivate the rest of

    this thesis.

    We begin with a brief overview of the visual processing system in §1.1.1. The

    contents of §1.1.1 can be found in any good neuroscience textbook (e.g., [KSJ00,

    BCP06]) or in the expository sections of several papers, including [Sch01, MC01,

    DD95]. A more detailed overview of the first several decades of results concerning

    visual processing in monkeys can be found in [MN87]. §1.1.2 then recounts the more

    specific history of the lateral intraparietal area (LIP) as an area involved in perceptual

    decision making. We trace two main lines of experiments: LIP as an area which

    correlates with planned eye movements and LIP as an area which carries attentional

    signals.

    1.1.1 A crash course in visual processing

    Some of the most complex tasks performed by humans involve decisions of varying

    timescales and complexity. Not surprisingly, neural correlates of deciding, choosing,

    and acting occur in numerous areas of the mammalian brain. We focus mainly on

    decisions made on the basis of visual evidence – we call these perceptual decisions.

    Vision begins at the photoreceptor cells in the retina, and the majority of the con-

    sequent signals travel along the optic nerve to the lateral geniculate nucleus (LGN)

    in the thalamus. The optic nerve also carries signals from the retinal ganglion cells

    to the suprachiasmatic nucleus and pretectal nucleus, areas which are involved with

    sleep and reflective eye movements, respectively. The LGN serves as a sensory relay

    station by passing the visual stimulus to the primary visual cortex (V1), which is

    located at the back of the brain in the occipital lobe1. The lateral geniculate nucleus

    also serves as a processing station for input from several other sections of the cortex.

    1 Each hemisphere of the cerebral cortex is divided into four major lobes (frontal, parietal, temporal, occipital) by the folds and bumps which are characteristic of the surface of the brain. See Fig. 1.1.


  • [Figure 1.1: schematic of the brain with areas V1/V2, V4, MT/MST, IT, PPC, and FEF labeled]

    Figure 1.1: Figure reproduced and adapted from [Gra18]. The four major lobes of the brain are color coded, with the frontal lobe in blue, the parietal lobe in yellow, the temporal lobe in green, and the occipital lobe in pink. Signals from the retina travel, via the lateral geniculate nucleus, into area V1 of the visual cortex, and then travel along one of two streams. Along the ventral stream, signals travel from V1 to V2 and V4 into the inferior temporal cortex (IT). The dorsal stream carries signals from V1 to V2 into area MT and into the posterior parietal cortex (PPC). Both the ventral and dorsal streams contain projections to and from the prefrontal cortex (PFC). Also note the frontal eye field (FEF), which is a region of the anterior part of the PFC.

    Area V1, within the visual cortex, is generally considered the first stage of visual

    processing2, consisting of neurons with small receptive fields which form a precise

    topographic covering of the entire field of view. The neurons in V1 may be tuned

    to specific visual elements such as orientation, stereoscopic depth, and color. Out-

    puts from the primary visual cortex project to secondary and tertiary areas which

    themselves project to other visual areas in the parietal and temporal lobes.

    2 Actually the retina itself has been shown to perform a considerable amount of processing and adaptation to visual stimuli. For example, ambient light levels may vary several orders of magnitude, but the retinal ganglion cells have been shown to adapt to both image contrast (the range of light intensities) and spatial correlations even when mean intensity is fixed [SBW+97]. Retinal ganglion cells have also been shown to adapt to moving stimuli and anticipate trajectories of moving objects [BBJM99]. See [GSBH09, SSSB11] for some more recent results and modeling work on processing in retinal ganglion cells.


  • From here, visual processing is organized into two main streams, called the dorsal

    stream and ventral stream. In the ventral stream, signals pass from V1 through visual

    areas V2 and V4 into the inferior temporal (IT) cortex. Neurons in the posterior IT

    are tuned for stimulus features such as color or shape, whereas anterior IT neurons

    are tuned for complex features like faces. In the dorsal stream, signals pass from V1

    through the middle temporal (MT) area into the posterior parietal cortex. Neurons in

    area MT respond to stimuli moving in specific directions, and lesion studies support

    the idea that the signals carried by MT are important in planning eye movements

    in perceptual tasks ([BB05] and see §1.1.2 below). Neurons in posterior parietal

    cortex modulate visual responses related to the orientation of stimuli, and play an

    important role in producing planned movements. The posterior parietal cortex has

    also been shown to contain neurons which correlate with saccadic and limb movements

    [DHP12]. Much of the output from the posterior parietal cortex then projects to the

    frontal motor cortex, which is indicative of its role in response planning.

    The posterior parietal cortex will appear in several places in this thesis. The

    experiments studied in chapters 2 and 3 examine the lateral intraparietal area (LIP),

    which is located within the intraparietal sulcus within the posterior parietal cortex,

    which itself lies along the dorsal stream [RGMN10, OSBG06]. LIP’s role in attention

    and planned saccadic eye movements is an area of active research, as we will see in

    §1.1.2 below.

    The two streams at this point converge by passing to the prefrontal cortex (PFC),

    which has been linked to complex cognitive elements such as personality and task

    representations [MC01]. We will study these task representations in Ch. 5, and give

    an account for how overlap between them may explain certain limitations on cognitive

    control capacity. One particular structure to note within the PFC is the frontal eye

    field (FEF), which lies at the top of the PFC. The FEF is intimately linked with the

    location of salient stimuli. Indeed, studies have shown that electrical stimulation of


    the FEF evokes saccades, and that signals in FEF correlate strongly with saccades

    to visual targets and may reflect the outcome of an automatic visual selection pro-

    cess [Sch04, Sch02]. Numerous studies have also linked FEF to covert attention and

    visual salience3 [KMK97, TB05, MTT08], although there is some contrary evidence

    which shows that certain FEF neurons do not respond when monkeys perform tasks

    that require attention with no eye movement [GB81]. These are still areas of active

    research.

    This description of the visual system as two cortical pathways was first presented

    by Mishkin et al [MUM83] and is almost certainly a gross oversimplification of the

    actual neural architecture behind how mammals formulate decisions and produce

    choices based on visual information. The illustration is, however, very useful in

    demonstrating that the areas with which we are concerned are only components of

    the entire apparatus at hand. Our focus primarily lies along the dorsal stream which

    contains the regions specifically linked to the formation of perceptual decisions and

    attention. Among these regions, we are particularly interested in area LIP.

    1.1.2 LIP’s role in perception and attention

    In this subsection we trace the lineage of results which led to the experiments analyzed

    in Ch. 2 and Ch. 3 [RGMN10, OSBG06]. This section is entirely focused on 2AFC

    tasks, most of which are also aimed towards uncovering LIP’s role in decision making

    and attentional tasks. We will first discuss a well known random dots motion task,

    and the idea that LIP is responsible for the planning of eye movements. Subsequently

    we discuss a more recent series of experiments focused on LIP’s role in attention.

    3 Overt attention occurs when a sensory system, such as the eye, orients itself to a target (e.g. saccade to a visual target). Covert attention occurs when we mentally focus on one of several possible sensory stimuli.


  • LIP and the planning of eye movements in motion dots tasks

    In this section, we hope to communicate a feel for why the random dots motion

    stimulus is so useful for perceptual decision tasks, and how the study of intended

    saccades (eye movements) has become a primary area of study in visual perception

    and decision making. A good but older review of experiments implicating LIP as a

    center for processing eye movements can be found in [ABM92]. It all begins with

    the posterior parietal cortex, which contains LIP and has long been connected to the

    processing of eye movements. In the early 1900s, [B0́9] showed that bilateral damage

    to the posterior parietal cortex resulted in human subjects being unable to fixate their

    eyes and unable to reach/grab objects. Later on in the 60s and 70s it was demon-

    strated that electrical stimulation of the posterior parietal cortex produced saccades

    [FC55, Ben64], and that lesions resulted in deficits in saccades [Sun79, LMTY77], al-

    though the effects of lesions were not properly quantified until the 80s [LM89, KG88].

    In the 1980s, Andersen et al. [AAC85] first identified and named area LIP after retrograde tracers in the frontal eye field and dorsolateral prefrontal cortex revealed strong

    connections predominantly within the lateral bank of the intraparietal cortex. This

    major finding was performed following key observations from Mountcastle and Lynch

    [MLG+75, LMTY77], who first reported cells selective for saccades in the inferior

    parietal lobule. Following this, several electrophysiological experiments showed that

    most LIP cells were related to eye movements, with many responding before saccades

    [BBF+91, ABB+90, GA88]. These experiments established LIP as a key region where

    we might find signals regarding response preparation, intention, and attention. LIP’s

    established position between the input sensory processing regions and the output mo-

    tor execution (i.e. saccade) regions is a primary reason why it is so heavily studied

    today.

    The random dots motion task used in the experiment analyzed in Ch. 2 also takes

    advantage of a series of results related to the visual processing of motion stimuli


    along the dorsal stream. It has been established that area MT and the medial superior temporal area (MST) contain the predominant signals relevant to the cortical

    analysis of visual motion. These regions lie along the dorsal stream, and within the

    dorsal stream there seems to be a more specialized pathway of areas that are used

    to discriminate visual motion information. The key feature shared by areas along

    this specialized pathway is a high proportion of neurons selective to motion direction,

    which we will henceforth call directionally sensitive for brevity. This pathway begins

    in a sub-region of V1 called layer 4B, where experiments in the 70s and 80s discovered

    a higher percentage of directionally sensitive cells relative to other sub-regions of V1

    [Dow74, BF84, LH84, Mic85]. This region projects to area MT where over 80% of the

    neurons are directionally selective [DZ71, MVE83, Alb84]. Another region considered

    to lie along the dorsal stream, area V3, has also been found to consist of about 40%

    directionally selective neurons [FVE87]. Area MT also receives some indirect

    input from V1 through V2 and V3 [Mv83]. From MT there are many neuronal pro-

    jections to area MST, which also contains a high percentage of directionally sensitive

    neurons [VEMB81, DU86, THS+86].

    In [NP88], Newsome and Pare established the connection between a dynamic

    random dot display and the relevant signals in MT which represent the dynamic

    perception of motion. Over the next few years, [BSNM92, BNS+96, SN96, SMBN92,

    CA99, CN95] established that MT and MST do indeed contain the relevant signals for

    formulating a decision based on visual information collected from the random dots

    motion stimulus used in experiments like that of Rorie et al [RGMN10] which we

    study in Ch. 2. These neurons are tuned so that if one computes averaged firing rates

    over many trials, this averaged firing rate smoothly varies according to the amount

    of energy in a band of velocities. This enabled experimenters to control the level of

    difficulty of the perceptual tasks, and established the neural correlate which represents

    this change in the amount of perceived motion. Still, there is ongoing research aimed


    towards uncovering the complex and intertwined interactions among V1, MT, and

    MST, and how signals from these areas are combined to form inputs to LIP, FEF and

    other downstream visual areas. In addition to anatomical and electrophysiological

    studies carrying on the work cited above [BW10, PHP+11], some studies like [PB00]

    use cleverly constructed visual illusions to probe relationships among these areas.

    The 2AFC random dots motion task was first used in its current form in [SBNM96,

    SN01], where Shadlen et al. recorded from LIP during a 2AFC motion dots discrimination task. Here monkeys had to indicate the direction of motion of coherently

    moving dots embedded within a field of randomly moving dots via a saccade to a

    fixed target along the direction of motion. LIP neurons were first identified using

    the memory saccade task [HW83], in which monkeys are first required to fixate on a

    certain point for about 100 msec. Then a second stimulus flashes pseudo-randomly

    for about 300 msec at one of several locations in the monkey’s peripheral vision, while

    the monkey maintains fixation. After a short delay period (∼400-700 msec), the monkey is required to saccade from the fixation point to the remembered location of the

    flashed target, and the cell is identified with that particular receptive field if it demon-

    strated persistent activity during the delay period. This procedure is commonly used

    to identify certain cells to be studied in LIP.

    Shadlen et al. identified cells using this memory saccade task, and oriented the

    visual stimulus of their task so that one of the saccade targets was situated within the

    recorded cell’s receptive field. They discovered that these LIP neurons exhibited firing

    rates which predicted the saccadic eye movement that the monkey would make at the

    end of the trial. We display averaged LIP recordings from Shadlen et al. [SN01] in

    Fig. 1.2 as an illustration of this remarkable phenomenon. The experiment performed

    by Rorie et al. which we have modeled and analyzed in Ch. 2 is an extension to the

    experiment of Shadlen et al [SN01].


  • Figure 1.2: The average firing rates from 104 LIP neurons during the direction discrimination task of Shadlen et al. [SN01]. Solid and dashed curves are from trials in which the monkey judged direction toward and away from the receptive field, respectively. Only correct trials are displayed. The various colors represent different strengths of the random-dot motion, and we can see that the time course and magnitude of response are affected by stimulus strength. Figure taken from Fig. 8 of [SN01].

    LIP and the control of attention

    The results in the previous subsection have established that LIP plays a role in the

    execution of saccades during perceptual decision tasks, and in particular the random

    dots motion task. However, the exact nature of LIP’s contribution to the processing

    of visual information is still not fully understood. In the late 1970s it was discovered

    that parietal neurons can also exhibit activity during fixation when a visual stimulus

    appeared in a neuron’s receptive field, which indicated that these neurons may have a

    role beyond the planning of saccades [RGS78]. Furthermore, when the visual stimulus

    was made behaviorally relevant, these visual neurons exhibited even further elevated

    activity, suggesting an attentional role for parietal cortex [RGS78, GB81].

    Since then, several studies have demonstrated neural correlates for attention in

    LIP [CG99, CDG96, RBK95, GG99, GKG98, PG00, BG03, GBP+02, Mes99]. In

    particular, Gottlieb et al. demonstrated that LIP cells can exhibit little or no increased

    response to stimuli in their receptive fields unless the stimuli are behaviorally relevant

    [GKG98]. Gottlieb and Goldberg also discovered that in an anti-saccade task, where

    monkeys were required to saccade away from a visual target, the majority of LIP

    neurons encoded visual stimulus location instead of the target of the intended saccade

    [GG99]. It has also been established that chemical inactivation of LIP produces

    deficits in both saccade target selection and covert attention [WOD02, WOD04], and

    that a similar chemical inactivation can also produce deficits in attention [BG09], but

    without entirely compromising task performance.

    These results implicating LIP in attention and visual salience led Oristaglio et al. to construct a task which involved a covert search for a visual target and a nonsaccadic motor response [OSBG06]. Here both covert attention and a response were required, and furthermore the motor response was dependent upon successfully detecting one of two possible targets in an array of distractors, while maintaining fixation. The

    recorded signals from LIP demonstrate interesting and puzzling activity which has

    not been fully understood. In Ch. 3 we attempt to construct a model which fits this

    data (along with that from [BOSG08]) in order to determine if this relatively complex

    experiment helps us better understand LIP’s role in attention and perception.

    Much of the work presented earlier in this section suggests that parietal neurons

    in area LIP encode saccade motor decisions [SN01, NP88, SBNM96]. There is also

    evidence that LIP carries signals of attention, or perceptual selection, which are

    independent of the metrics, modality and reward of the required response [GB10].

    In fact, Gottlieb et al. suggest that the attentional responses seen in LIP represent

    a distinct type of decision that assigns value to sources of information rather than

    specific actions.

    This thesis is not primarily concerned with stating the precise role of LIP as either a center of attention or of saccade motor decisions. Rather, we are interested

    in how accumulator models may shed light on this discussion, as well as in other

    areas of cognitive psychology and neuroscience. We demonstrate in several cases

    that the computational and mathematical tools surrounding drift diffusion processes

    (and closely related OU processes) can and should be used in understanding neural

    activity and animal/human behaviors in these decision tasks. Next, we will begin

    our presentation of these tools by deriving the drift diffusion process from a simple

    statistical inference task.

    1.2 Building up: constructing drift diffusion (DD) processes via statistical inference

    Probably the most common way to understand the diffusion equation (Eq. 1.19 below) is encountered in undergraduate physics and chemistry courses, where one meets the molecular diffusion of a chemical compound or gas. There one imagines particles in an animated and irregular state of motion, caused by frequent impacts with other particles and/or with the surrounding medium, normally a fluid. Here

    we consider a less physical basis, and instead derive the diffusion equation by first

    trying to solve a very simple statistical inference task. Following this approach, we

    build an alternative understanding of diffusion processes based upon the probabilistic

    accumulation of evidence for two available choices. Much of the work presented in

    this section and the next derives from the presentations in [BBM+06] and [GS07],

    although a few of the mathematical details have been reworked.

    1.2.1 A simple 2AFC statistical inference task

    The most natural language with which to precisely describe decision making is that

    of probability and statistics. This thesis only explicitly deals with decision processes

    which solve two alternative forced choice tasks, or 2AFC for short4. These tasks

    require a choice between two hypotheses h1 and h2, each of which represents a property

    of the world that may be true or false (e.g. the array of moving dots mentioned in

    §1.1.2 are moving left or right). 2AFC tasks are also considered forced in that the

    subject cannot elect to not respond (this is typically enforced by penalizing a non-response more severely than an incorrect response, e.g. inflicting a time out in addition to withholding reward). In performing these tasks, we

    also consider two distinct paradigms for how the subject must collect evidence and

    respond: the free response and interrogation paradigms (in this context we will at

    times use the words “paradigm” and “protocol” interchangeably). In the interrogation

    paradigm, subjects are required to respond at a fixed deadline. This implies that

    subjects have a fixed time epoch for which they may accumulate evidence, and that

    their decisions must be made based on whatever evidence is collected during this

    fixed amount of time. In the free response paradigm, subjects are allowed to respond

    whenever they please. This implies that subjects need to determine the proper speed-

    accuracy trade-off, because faster response times will increase the potential rate of

    rewards but also typically decrease accuracy.

    Before any evidence accumulation for either h1 or h2 takes place, however, one

    must first consider the prior probabilities P(h1),P(h2). These represent the probabil-

    ities that either hypothesis is true before obtaining any evidence. For the sensory-

    motor tasks described above (e.g. moving dots task), the prior typically represents

    the predicted probability of seeing a particular stimulus on the upcoming trial (e.g.

    probability that subject is presented with left-moving dots), which may be explicitly

    communicated to the subject or inferred from the relative frequency of a particular

    hypothesis on previous trials. In Ch. 2, we analyze a case where the amount of reward

    for a particular hypothesis is modulated trial-by-trial; indeed we find that manipulating rewards in this way affects monkey behavior in a manner similar to shifting

    4Others often write NAFC for the N-alternative version

    prior probabilities. Also, in Ch. 4 we fit some behavioral data collected by Simen et

    al. [SCB+09] where these prior probabilities were manipulated and human subjects

    were required to take this into account.

    The subject, given his priors, now collects evidence from a stimulus and combines

    all of his information (priors, evidence, expected rewards) into a quantity

    which we call the decision variable. Using this, the subject formulates a decision rule

    which determines from the decision variable when and how to respond.

    Throughout the past several decades, the most successful conceptual framework

    with which one may study 2AFC tasks is signal detection theory [GS66]. In its

    simplest form, the subject obtains one unit of noisy evidence e which is extracted

    from an experimentally controlled stimulus. That is, the experimenter presents a

    stimulus corresponding to either h1 or h2, and the subject obtains some noisy evidence

    e by observing the experimental event. The subject now needs to determine which of

    two conditional probability distributions P(e|h1) or P(e|h2) gave rise to the observed

    evidence e.

    In the case of two alternatives studied here, a simple choice for the decision variable

    is the ratio between the two relevant likelihoods P(e|h1) or P(e|h2):

    l12(e) = P(e|h1) / P(e|h2) . (1.1)

    The quantity l12 is often referred to as the likelihood ratio. The subject now bases his

    decision on the likelihood ratio l12 by choosing some threshold β, and elects to choose

    h1 if l12 ≥ β and h2 if l12 < β (if l12 = β, we say that the subject arbitrarily chooses

    h1).

    The critical question is: how do we choose β? Suppose we wish to maximize the

    overall accuracy and that P(h1) = P(h2). The overall accuracy is written as

    P(H1|h1) + P(H2|h2) , (1.2)

    where H1 (H2) is the event that the subject chooses hypothesis h1 (h2). Since subjects

    are forced to respond, P(H1|h2)+P(H2|h2) = 1 (also P(H1|h1)+P(H2|h1) = 1), which

    means that maximizing Eq. 1.2 is equivalent to maximizing

    P(H1|h1)− P(H1|h2). (1.3)

    If we designate a region Λ as all of the evidences (point events) which lead to the

    acceptance of h1, then the probability that h1 is accepted when h1 is true can be

    written as

    P(H1|h1) = ∑_{e∈Λ} P(e|h1) , (1.4)

    and the probability of an incorrect acceptance of hypothesis h1 is

    P(H1|h2) = ∑_{e∈Λ} P(e|h2) . (1.5)

    The task of maximizing Eq. 1.2 is now the task of selecting a set of points Λ so that

    we maximize

    ∑_{e∈Λ} [P(e|h1) − P(e|h2)] . (1.6)

    Now consider some sample of evidence ē. The point event ē should be included in

    Λ if and only if its inclusion contributes positively to the sum in Eq. 1.6, i.e.

    P(ē|h1)− P(ē|h2) ≥ 0. (1.7)

    This implies we want to include in Λ only the observations where this inequality holds,

    which means that a point event ē should be included in Λ if and only if we have

    P(ē|h1) / P(ē|h2) ≥ 1 . (1.8)

    This defines our acceptance region Λ for hypothesis h1 as all sample evidences e for

    which the likelihood ratio l12(e) ≥ 1, which means β should equal 1. To summarize,

    in order to maximize the overall accuracy Eq. 1.2 given a sample of noisy evidence

    e, we should choose hypothesis h1 if and only if l12(e) ≥ β, where β = 1.
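    To make this rule concrete, here is a minimal Python sketch assuming unit-variance Gaussian likelihoods with means ±1 (the distributions and function names are illustrative, not taken from the experiments above):

    ```python
    import math

    def likelihood_ratio(e, mu1=1.0, mu2=-1.0, sigma=1.0):
        """l12(e) = P(e|h1)/P(e|h2) for Gaussian likelihoods N(mu_i, sigma^2).
        The shared normalizing constants cancel in the ratio."""
        p1 = math.exp(-(e - mu1) ** 2 / (2 * sigma ** 2))
        p2 = math.exp(-(e - mu2) ** 2 / (2 * sigma ** 2))
        return p1 / p2

    def decide(e, beta=1.0):
        """Choose h1 iff l12(e) >= beta; beta = 1 maximizes overall accuracy."""
        return "h1" if likelihood_ratio(e) >= beta else "h2"
    ```

    With these symmetric likelihoods the rule reduces to choosing h1 exactly when the evidence e is nonnegative.
    
    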

    The above argument easily generalizes to the case where one wishes to maximize

    the weighted accuracy

    P(H1|h1) + aP(H2|h2) . (1.9)

    Again we can rewrite this as trying to maximize P(H1|h1) − aP(H1|h2) and observe

    again that we want to select a region Λ so that ∑_{e∈Λ} [P(e|h1) − aP(e|h2)] is as large

    as possible. The only point events which contribute positively to this sum are those

    events ē for which we have

    P(ē|h1) − aP(ē|h2) ≥ 0, or P(ē|h1) / P(ē|h2) ≥ a . (1.10)

    This means the proper acceptance region for the hypothesis h1 is all of the events

    whose likelihood ratios equal or exceed a, which shows that in order to maximize the

    weighted accuracy Eq. 1.9 we simply set β = a. Knowing this, we will now write the

    weighted accuracy as

    P(H1|h1) + βP(H2|h2) , (1.11)

    since maximizing this form immediately tells us the value of β that should be set.

    This result actually shows us how to choose β for many different useful cases,

    most notably the case where one wishes to maximize expected value. Suppose we are

    given costs and rewards for all four possible trial outcomes:

    r11 : amount of reward for correctly choosing h1,

    r21 : amount of penalty for incorrectly choosing h1,

    r22 : amount of reward for correctly choosing h2,

    r12 : amount of penalty for incorrectly choosing h2.

    The expected reward is

    E(R) = r11P(h1)P(H1|h1) + r22P(h2)P(H2|h2)

    − r21P(h2)P(H1|h2) − r12P(h1)P(H2|h1) . (1.12)

    Note that there are no assumptions on the priors P(h1),P(h2). We can rework Eq.

    1.12 into a form of Eq. 1.9:

    E(R) = r11P(h1)(1 − P(H2|h1)) + r22P(h2)P(H2|h2) − r21P(h2)(1 − P(H2|h2)) − r12P(h1)P(H2|h1)

    ≃ (r22 + r21)P(h2)P(H2|h2) − (r11 + r12)P(h1)P(H2|h1)

    ≃ (r22 + r21)P(h2)P(H2|h2) + (r11 + r12)P(h1)P(H1|h1)

    ≃ [(r22 + r21)P(h2) / ((r11 + r12)P(h1))] P(H2|h2) + P(H1|h1) , (1.13)

    where ≃ indicates equivalence up to a constant factor (i.e. equivalent maximization

    problems). We see that in order to find the acceptance region for hypothesis h1

    which maximizes the expected reward, we need to choose hypothesis h1 whenever the

    likelihood ratio l12(e) exceeds

    β = (r22 + r21)P(h2) / ((r11 + r12)P(h1)) . (1.14)

    In particular, when r22 + r21 = r11 + r12, this reduces to the case where we wish to

    maximize overall accuracy, which means we set

    β = P(h2) / P(h1) ; (1.15)

    so, if hypothesis h1 is more likely, it takes a smaller value of l12 to decide upon choosing

    h1. Finally, in the case where both hypotheses are equally likely we recover β = 1 as

    demonstrated above in Eq. 1.8.

    The formulation presented in this section can be thought of as how a subject may

    compute a decision in a 2AFC task given only one unit of evidence. However, more

    realistic tasks usually demand that the subject incorporate multiple pieces of evidence over some time epoch. How does one properly adjust the decision variable to accommodate multiple pieces of evidence? This question is answered in the

    next section by the Neyman-Pearson lemma and the sequential probability ratio test.

    1.2.2 The Neyman-Pearson lemma and the sequential probability ratio test

    We now study the situation in which a subject is successively presented with pieces

    of evidence. First, we study the interrogation paradigm, in which subjects are given

    a fixed amount of evidence {e1, . . . , eN} and are required to identify if these are

    sampled from the distribution corresponding to hypothesis h1 or that corresponding

    to hypothesis h2. The natural adjustment to our decision variable from above is to

    consider the likelihood ratio of all of the data,

    log [LR12] = log [ P(e1, e2, . . . , eN |h1) / P(e1, e2, . . . , eN |h2) ] = ∑_{j=1}^{N} log [ P(ej|h1) / P(ej|h2) ] , (1.16)

    where we have taken a logarithm to let us work with sums instead of products.

    We have also assumed that the data are independent, an assumption which is usually

    enforced by the experimentalist. Eq. 1.16 will often be referred to as the log-likelihood

    or log-likelihood ratio. The decision rule is then essentially equivalent to that from

    above: if logLR12 ≥ Θ for some threshold Θ ∈ R, then accept hypothesis h1, and if

    logLR12 < Θ, accept hypothesis h2.
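    For concreteness, the interrogation rule based on Eq. 1.16 can be sketched in Python, again assuming i.i.d. Gaussian evidence (the means ±0.5 are illustrative):

    ```python
    def log_lr(samples, mu1=0.5, mu2=-0.5, sigma=1.0):
        """Eq. 1.16: summed log-likelihood ratio for i.i.d. Gaussian evidence
        e_j ~ N(mu_i, sigma^2) under hypothesis h_i. For Gaussians each term
        reduces to [(e - mu2)^2 - (e - mu1)^2] / (2 sigma^2)."""
        return sum(((e - mu2) ** 2 - (e - mu1) ** 2) / (2 * sigma ** 2)
                   for e in samples)

    def interrogate(samples, theta=0.0):
        """Accept h1 iff log LR12 >= Theta; Theta = 0 maximizes accuracy."""
        return "h1" if log_lr(samples) >= theta else "h2"
    ```
    
    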

    This widely used likelihood-ratio test was proven to be optimal by Neyman and

    Pearson in 1933 [NP33]. The Neyman-Pearson lemma states that the likelihood-ratio

    test which accepts h1 if and only if logLR12 ≥ Θ is the most powerful statistical test

    of its size. Size here refers to the false alarm probability or error rate: α = P(H2|h1),

    where H2 refers to the event that hypothesis h2 is chosen according to this decision

    rule. There are several more recent proofs of the Neyman-Pearson lemma in the

    statistical literature [GS66, Gho70, Leh59].

    The Neyman-Pearson lemma implies that if Θ = log[1] = 0, then the likelihood-

    ratio test above maximizes accuracy by delivering the most likely hypothesis and

    minimizing the total error probability. Thus, the likelihood-ratio test is optimal

    for the interrogation paradigm.

    The case of the free response paradigm is less straightforward. These 2AFC tasks

    require the subject to consider two elements: the decision between the hypotheses h1

    and h2, and the decision about whether to either respond or continue accumulating

    evidence5. More precisely, after collecting the nth piece of evidence en, the subject

    determines whether or not to immediately execute his decision based on e1, . . . , en,

    or to wait and collect en+1. After collecting the nth piece of evidence, our decision

    variable is essentially the same as before, but now the log-likelihood ratio needs to be

    updated for each new piece of evidence so that after n observations,

    log [LR12(n)] = log [ P(e1, e2, . . . , en|h1) / P(e1, e2, . . . , en|h2) ] = ∑_{j=1}^{n} log [ P(ej|h1) / P(ej|h2) ] . (1.17)

    A sensible choice of decision rule is to update logLR12 after each new unit of evidence

    5Such language makes us think of the trade-off between exploration and exploitation. In our case, the subject is faced with a similar explore vs. exploit decision at each moment of evidence accumulation. Indeed, the subject may choose to explore by waiting and collecting more evidence, or the subject may exploit the collected evidence by choosing either h1 or h2 and responding accordingly. In general the trade-off between exploitation and exploration is poorly understood: even when the objective functions are well specified there may still not be a known optimal policy for trading off between explore vs. exploit. Here we have a simple situation in which we may derive the solution for a nontrivial explore-exploit task.

    is collected, and to ask whether or not logLR12 has crossed some positive or negative

    number. Hypothesis h1 is accepted if logLR12 is greater than some positive threshold

    Z0, and h2 is accepted if logLR12 drops below some negative threshold Z1. This

    choice of decision rule applied to LR12 from Eq. 1.17 is what comprises the sequential

    probability ratio test (SPRT).

    Barnard [Bar46] and Wald [WW48, Wal04] independently showed that the SPRT

    is optimal for the free response paradigm in the sense that given a fixed level of

    accuracy that must be attained on average, the SPRT requires the smallest number

    of samples to settle on a decision. Other proofs of the optimality of the SPRT may

    be found in [GS66, Gho70, Leh59].
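    The SPRT itself is only a few lines of code. The sketch below assumes Gaussian likelihoods and symmetric thresholds ±z; all parameter values are illustrative:

    ```python
    import random

    def sprt(sample, mu1=0.3, mu2=-0.3, sigma=1.0, z=3.0, max_n=100_000):
        """Sequential probability ratio test with symmetric thresholds +/- z.
        `sample()` returns one noisy piece of evidence per call; the running
        log-likelihood ratio is updated as in Eq. 1.18 (Gaussian likelihoods
        assumed)."""
        x = 0.0
        for n in range(1, max_n + 1):
            e = sample()
            x += ((e - mu2) ** 2 - (e - mu1) ** 2) / (2 * sigma ** 2)
            if x >= z:
                return "h1", n   # accept h1
            if x <= -z:
                return "h2", n   # accept h2
        return None, max_n       # no decision within max_n samples

    # Example: evidence generated under h1 (mean mu1 = 0.3)
    rng = random.Random(0)
    choice, n_used = sprt(lambda: rng.gauss(0.3, 1.0))
    ```

    Returning the sample count alongside the choice makes it easy to check Wald's optimality claim empirically: for a given error rate, the SPRT's average n is smaller than that of any fixed-sample test achieving the same accuracy.
    
    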

    1.2.3 The continuum limit of the SPRT

    When considering the SPRT, one often thinks of the decision variable as a discrete

    random walk, where the log likelihood ratios of independent samples of evidence

    produce independent increments at each time step. Writing log[LR12(n)] = xn for

    simplicity, we have

    xn = xn−1 + log [ P(en|h1) / P(en|h2) ] . (1.18)

    The SPRT is performed by choosing some initial value for x0 and incrementing ac-

    cording to Eq. 1.18 until xn crosses some positive threshold Z0 or negative threshold

    Z1. The three values x0, Z0, and Z1, encode the priors for the hypotheses h1 and

    h2, as well as any reward biases that are present. Because the SPRT is unchanged

    if we shift x0, Z0 and Z1 by a constant value, we may assume that the thresholds

    are symmetric, that is Z0 = Z and Z1 = −Z for some number Z. The way in which

    thresholds and initial conditions are to be chosen is carefully studied in [BBM+06]. In

    the case of unbiased stimuli and equal rewards, the SPRT can be seen as a discrete

    random walk with x0 = 0 and continuing according to Eq. 1.18 until it crosses some

    positive threshold +Z or negative threshold −Z.

    As discrete samples are taken more and more rapidly, the discrete random walk

    Eq. 1.18 approaches a continuous time variable X(t) after proper technical consider-

    ations are made. The details of this limiting procedure are covered in appendix A of

    [BBM+06], where it is shown that the continuum limit of SPRT then produces the

    drift diffusion model (DDM)

    dX = Adt+ σdW, X(0) = x0, (1.19)

    where X is the state of accumulated evidence for a particular choice, Adt is the aver-

    age increase in evidence per unit time (drift), and the diffusion term σdW represents

    Gaussian distributed white noise with mean 0 and variance σ2dt. It is important

    to note that A and σ depend upon specific properties (i.e. means and variances) of

    the distributions from which the samples are drawn, and that in our form we have

    implicitly assumed we are sampling from Gaussian distributions. When X(t) crosses

    either of the thresholds ±Z, accumulation halts and a decision is made (accepting h1 if +Z is crossed and h2 if −Z is crossed). Often we use another parameter called the

    non-decision time T0 to account for any sensory delays before any evidence accumu-

    lation is made, and motor delays prior to the response. This is introduced to avoid

    arbitrarily small reaction times which are impossible in real data because of the time

    it takes for signals from the retina to travel to the relevant decision areas of the brain,

    and for the execution of motor actions.
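    A simple Euler-Maruyama discretization of Eq. 1.19 illustrates the free response paradigm; all parameter values below are illustrative rather than fits:

    ```python
    import random

    def ddm_trial(A=0.2, sigma=1.0, z=1.0, x0=0.0, T0=0.3,
                  dt=0.001, max_t=60.0, rng=None):
        """Euler-Maruyama simulation of the pure DDM, Eq. 1.19:
        dX = A dt + sigma dW, absorbed at thresholds +/- z.
        Returns (choice, reaction_time), with the non-decision time T0
        added to the first-passage time."""
        rng = rng or random.Random()
        x, t = x0, 0.0
        step_sd = sigma * dt ** 0.5
        while t < max_t:
            x += A * dt + step_sd * rng.gauss(0.0, 1.0)
            t += dt
            if x >= z:
                return "h1", t + T0
            if x <= -z:
                return "h2", t + T0
        return None, t + T0  # effectively unreachable for these parameters
    ```

    Since the pure DDM has closed-form expressions for accuracy and mean decision time (restated in §1.3.3), simulations like this one are easy to sanity-check analytically.
    
    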

    The DDM was introduced into the psychological literature in [Rat78] as a proposed

    theory for memory retrieval. We sometimes refer to the DDM as presented in this

    section as the pure DDM to distinguish it from the extended DDM that we summarize

    in §1.5.1. In Ch. 4 we construct a version of the DDM involving non-Gaussian noise,

    and compare its data fitting power with that of the extended DDM.

    The pure DDM can also be used to solve a continuous time version of the interro-

    gation paradigm by supposing that at the interrogation time T , the subject responds

    according to the sign of X(T ). If X(T ) > 0, then the subject accepts hypothesis h1,

    and if X(T ) < 0, the subject accepts hypothesis h2.

    When working with the DDM, we can write down explicit formulae for the mean

    decision times and error rates which can be used to describe overall model behavior.

    These details are presented in §1.3.3 when we restate some relevant formulas for the

    Ornstein-Uhlenbeck model, of which the DDM is a special case.

    The optimality of the DDM in both the interrogation and free response paradigms

    can be heuristically justified by recalling that it is the continuous time limit of either

    the SPRT or likelihood ratio test from §1.2.2, both of which have been shown to

    be optimal. Furthermore, [BBM+06] gives other direct arguments establishing the

    optimality of the DDM, and also provides formulas for how to optimally set the

    thresholds as well as how to modify the pure DDM presented here for biased priors.

    Although these results are indeed very useful, we choose to not review them here.

    The reader is encouraged to refer to appendix A of [BBM+06] for a clear exposition

    of these results.

    1.2.4 Relevant experimental results concerning drift diffusion processes and optimality

    The various ratio tests and the DDM presented above provide us with a good set

    of tools for describing and fitting data for various 2AFC tasks. One then wonders

    if these statistical tests represent how animal and human brains actually behave.

    Not surprisingly, animals do not always behave optimally. One famous example is

    the matching law first noted by Herrnstein in 1961 [Her61, Her70] which is reviewed

    in [Her97]. Herrnstein showed that pigeons would often choose at only twice the

    frequency a button which yielded twice as much reward, where an optimal pigeon

    22

  • would choose the higher rewarded button 100% of the time. There are also several

    cases of humans performing suboptimally (e.g. [LS91, LT89]).

    Yet in tasks where subjects perform suitably constrained 2AFC tasks with suf-

    ficient training, the DDM seems to provide good fits to both monkey and human

    behavioral data [BHHC10, BBM+06, SCB+09, RM07, GP08, RGMN10,

    BSN+11, RHH+07, RCS03]. Even more impressive are the neurophysiological con-

    nections that have linked the DDM process from Eq. 1.19 to neural activity. For

    example, Hanes and Schall demonstrated that neurons in the frontal eye field exhibit

    activity which resembles the drift rate of a diffusion process [HS96], and others have

    contributed further evidence from in vivo recordings in monkeys that oculomotor

    decision making in the brain mimics a DDM, with neural activity rising to a thresh-

    old before movement initiation [Sch01, MRDS03, GS01, SR04]. In the case of the

    random dots motion task, LIP recordings are believed to represent accumulating evi-

    dence in neurons having receptive fields containing the response target corresponding

    to a particular alternative, as shown in Fig. 1.2. Differences between the firing rates

    corresponding to the two different directions of motion then appear to behave like

    sample paths of the DDM [RS02]. It should be noted, however, that the DDM has

    only been shown to fit the evidence accumulation period during these dots motion

    tasks. For other phases of the trial where evidence is not being accumulated, the

    DDM does not necessarily fit the neural data well. An example of this is in the neu-

    ral data shown at the end of Ch. 2 – the LIP recordings during the reward period

    before the accumulation phase do not look at all like DDM processes. Furthermore,

    the neural data for a different task in Ch. 3 shows many features that a straightfor-

    ward application of the DDM would not be able to fit. There, we construct a model

    which incorporates integrate-to-threshold units with other types of units in order to

    fit the neural data.

    These various experimental results show that we can find signals in the brain which

    represent the decision variables in a statistical test. But how does the brain assemble

    these signals? How might neurons actually perform the computations required for

    evidence accumulation and hypothesis testing? In the next section we summarize

    a series of biophysically-motivated modeling results which begin to systematically

    connect single cell spiking models to the DDM.

    1.3 Working down: Reducing a biologically plausible model to an OU process

    Throughout the animal kingdom, brains can contain up to billions of neurons (with

    humans having ≈ 10^11 neurons), each of which is nontrivial to model. Perhaps the

    most detailed dynamical models we have of neurons are those inspired by Hodgkin and Huxley, where one may construct large systems of ODEs and PDEs which approx-

    imate the specific ionic currents at specific locations along a neuron’s axon, cell body,

    or dendritic tree ([HH52b, HHK52, HH52a, HH52d, HH52e, HH52c, GC10, DA05]).

    These models have been very successful at reproducing the key spatiotemporal prop-

    erties of single cells, most important of which is that of an action potential. However,

    such models are analytically intractable and relatively expensive to simulate, and

    if one wishes to model networks of even hundreds of neurons, the systems quickly

    become computationally infeasible.

    A common simplification of these multiunit compartmental models is the integrate

    and fire model. Here, one does not explicitly simulate the time course of an action

    potential, but rather only models the subthreshold voltage potential up until some

    threshold. When the threshold is reached, a delta function spike occurs and the

    voltage is reset to its resting potential. Various flavors of these integrate and fire

    neurons have been used to simulate models of tens of thousands of neurons [IE08,

    Izh03], and one simulation by Izhikevich explicitly modeled 10^11 neurons and almost 10^15 synapses (although this simulation of 1 second of real time took 50 days on 27 3 GHz processors).

    The key to all of these approaches is that they aim to directly model neurons,

    which are the basic building blocks of the brain. Any results about the overall network

    behavior then need to emerge from the explicit modeling of single neurons and their

    connections. We say that models constructed in this way are biophysically-based or

    biophysically-motivated.

    In this section we wish to present how the DDM can also be viewed as a reduction

    from biophysically-based models for perceptual decision making. We begin in §1.3.1

    with an overview of some biophysically-based models and illustrate their connection

    with a popular model called the Leaky Competing Accumulator (LCA) model. Then

    in §1.3.2 we show how the LCA can be reduced, given certain parameter ranges and

    assumptions, to an OU model, of which the DDM is a specific case.

    Throughout this development, the key point is to see the DDM in a different way,

    this time as a simplified version of a biophysically motivated model. This gives us

    the justification for considering the DDM as a plausible model for neural activity.

    Furthermore, this reduction from realistic models also allows us to further justify the

    link between the DDM and the related neural activity that was reviewed in §1.2.4.

    1.3.1 From spiking neurons to the Leaky Competing Accumulator model (LCA)

    Over the past decade, researchers have been able to demonstrate in computer sim-

    ulation that biophysically realistic cortical networks of spiking neurons are able to

    reproduce the behavioral dynamics of perceptual decisions. In 2002, Wang presented

    a model of area LIP by simulating spiking neurons and drawing neural and synap-

    tic properties of the model from anatomical and physiological observations [Wan02].

    Subsequent studies have further demonstrated how this biophysically-based spiking

    model reproduces experimentally observed behaviors in perceptual decision tasks.

    These studies have been able to connect many of these observations with biological

    properties of the spiking neurons, although rigorous reduction theorems are lacking,

    except for local bifurcations [WW06, WHMNW07, EWLH11, RL08, GH83].

    A looser connection between Wang’s model and accumulator models was estab-

    lished by Bogacz et al. in [BBM+06]. It is still not completely understood how one

    may reduce a detailed neural network model to a noisy connectionist unit, but by

    assuming such a reduction Bogacz et al. showed that an averaged version of Wang’s

    model in [Wan02] may be viewed as a biologically realistic implementation of the

    Leaky Competing Accumulator model of Usher and McClelland [UM01]. We also

    note that another biologically inspired model of perceptual choice which has been

    reduced to the LCA [BUZM07, BBM+06] is that of Mazurek et al. [MRDS03].

    The LCA model is a two-dimensional system of stochastic differential equations

    whose state variables (x1(t), x2(t)) describe the activities of two mutually-inhibiting

    populations of neurons, each of which receives noisy sensory input [UM01, McC79]:

    dx1 = [−γx1 − βf(x2) + I1(t)]dt + σdW1 ,

    dx2 = [−γx2 − βf(x1) + I2(t)]dt + σdW2 , (1.20)

    where f(·) is a sigmoidal type activation function, γ and β, respectively, denote the

    strengths of leak and inhibition, and σdWj are independent white noise (Wiener)

    increments of r.m.s. strength σ. The inputs I1(t), I2(t) are generally time dependent

    signals which vary over the course of a single decision process (i.e. trial).

    Under the interrogation paradigm the choice is determined by the difference x(t) :=

    x1(t) − x2(t): at interrogation time T if x(T ) ≥ 0 the hypothesis corresponding to

    stimulus I1 is chosen, and vice versa for x(T ) < 0. Similarly, for the free response

    paradigm we set positive and negative thresholds for the decision variable x(t), and

    when a threshold is crossed the corresponding decision is made.
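    The LCA and its interrogation rule can likewise be simulated directly. In the sketch below the logistic form of f and all parameter values are illustrative assumptions, not fits:

    ```python
    import math
    import random

    def lca_trial(I1=1.2, I2=1.0, gamma=0.2, beta=0.4, sigma=0.3,
                  T=2.0, dt=0.001, rng=None):
        """Euler-Maruyama simulation of the LCA, Eq. 1.20, under the
        interrogation paradigm: at time T the choice follows the sign of
        x1 - x2."""
        rng = rng or random.Random()
        f = lambda u: 1.0 / (1.0 + math.exp(-4.0 * (u - 0.5)))  # sigmoidal gain
        x1 = x2 = 0.0
        step_sd = sigma * dt ** 0.5
        for _ in range(int(T / dt)):
            dx1 = (-gamma * x1 - beta * f(x2) + I1) * dt + step_sd * rng.gauss(0, 1)
            dx2 = (-gamma * x2 - beta * f(x1) + I2) * dt + step_sd * rng.gauss(0, 1)
            x1, x2 = x1 + dx1, x2 + dx2
        return "h1" if x1 - x2 >= 0 else "h2"
    ```

    With I1 > I2 as here, the population receiving the stronger input wins well over half the time, as expected.
    
    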

    1.3.2 Reduction of LCA to an Ornstein-Uhlenbeck (OU) process

It has been shown that, under suitable conditions, the LCA model from above

    can be reduced to a one-dimensional Ornstein-Uhlenbeck model, which is a simple

    generalization of the DDM [BBM+06, BGH+05, BH01, FHRN09]. Here we present

    the reduction from [FHRN09], which demonstrates how the noiseless version of the

    LCA reduces to an OU process in the presence of constant inputs. We present this

    noise-free reduction for clarity, so that the technical tools involving stochastic center

manifolds [Box89, Box91, AJM+95] need not be developed here.

Figure 1.3: Illustration showing nullclines and fixed points for the LCA, Eq. 1.20. The two curved lines represent the nullclines, and their intersections are the three fixed points. The two outermost fixed points are stable, and the middle fixed point is a saddle. The two stable fixed points shown represent relatively high activity in one population versus the other, whereas the saddle point represents moderate activity in both populations. The dashed diagonal line represents the slow attracting manifold.

When the inputs I1, I2 are constant and σ = 0, equilibrium solutions of Eq. 1.20 lie at the intersections of the nullclines given by γx1 = −βf(x2) + I1 and γx2 = −βf(x1) + I2. Depending


upon the precise form of f(·) and the parameter values I1, I2, β, γ, there may be

    one, two, or three equilibrium points. Each equilibrium point corresponds to either

    moderate activity in both populations or relatively high activity for one population

versus the other, as illustrated in Fig. 1.3. As the figure also shows, if these nullclines lie

    sufficiently close to each other over the activity range that encompasses the equilibria,

    it follows that a one-dimensional, attracting, slow manifold exists that contains both

    stable and unstable equilibrium points, and the solutions that connect them [GH83,

    BH01].

    To demonstrate these ideas more clearly, we first linearize the sigmoidal activation

    function in Eq. 1.20 at the central equilibrium point (x̄, x̄) in the case of equal inputs

I = I1 = I2, where x̄ = (1/γ)[−βf(x̄) + I]. After parameterizing the sigmoid f(·) so that df/dx(x̄) = 1, the linearized system can be written as

dx1 = [−γx1 − βx2 + I1(t)] dt + σ dW1 ,

dx2 = [−γx2 − βx1 + I2(t)] dt + σ dW2 ,    (1.21)

    and subtracting these equations yields a single scalar SDE for the differenced activity

    x(t) = x1(t)− x2(t):

dx = [λx + A(t)] dt + σ dW ,    (1.22)

where λ = β − γ, A(t) = I1(t) − I2(t), and dW = dW1 − dW2 is itself a white noise increment. This is an Ornstein-Uhlenbeck (OU) process.

    Eq. 1.22 describes the decision variable x(t) for our OU model. In the same way

    as before, in the interrogation paradigm the sign of x is used to determine the choice

    of response at interrogation time T . In the free response paradigm, a choice is made

    when x crosses either a positive or negative threshold.
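The interrogation rule can be simulated directly from Eq. 1.22. The sketch below uses an Euler-Maruyama discretization with illustrative parameter values; for λ = 0 and constant A the simulated choice probability should match the Gaussian prediction P(x(T) ≥ 0).

```python
import numpy as np

def ou_interrogation_choice(lam, A, sigma, T, x0=0.0, dt=1e-3, rng=None):
    """Simulate the OU decision variable of Eq. 1.22 up to interrogation
    time T, and report hypothesis 1 if x(T) >= 0, else hypothesis 2."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    sqdt = np.sqrt(dt)
    for _ in range(int(T / dt)):
        x += (lam * x + A) * dt + sigma * sqdt * rng.normal()
    return 1 if x >= 0 else 2
```

For example, with λ = 0, A = σ = T = 1 the decision variable at time T is distributed N(1, 1), so roughly 84% of simulated trials should choose hypothesis 1.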

The stimulus difference A(t) represents the sensory evidence for a particular

    hypothesis. For example, if a stimulus corresponding to hypothesis 1 is displayed,


this means that A(t) = I1 − I2 > 0, and a correct response will occur if the top

boundary is hit before the bottom boundary. Errors occur when noise pushes x to the bottom (incorrect) boundary before the top (correct) one is reached, and they

    will be more likely when I1 and I2 are close, i.e. when the inputs for the alternatives

    are hard to distinguish in the presence of noise.

    A particularly important case is when the leak γ and inhibition β are perfectly

    balanced, so that λ = β − γ = 0. In this case, Eq. 1.22 is equivalent to the drift

    diffusion process of Eq. 1.19, and we recover the DDM from §1.2.3.

    1.3.3 Working with the OU and DDM models

    Here we present known expressions for error rates and reaction times which describe

    overall OU (and DDM) model behavior. These quantities can then be compared with

    experimental data, as is done in Ch. 2, or they can be used to compute reward rates,

    as done in Ch. 5.

    The first quantity we note is the probability distribution of solutions for Eq. 1.22,

which is governed by the forward Kolmogorov or Fokker-Planck equation [Gar85]:

∂p/∂t = −∂/∂x [(λx + A(t)) p] + (σ²/2) ∂²p/∂x² .    (1.23)

    A variety of insightful derivations of the Fokker-Planck equation can be found in

    several good sets of lecture notes and textbooks [Gar85, Ris96, Fel66, Cro06]. Given

    initial conditions, we can solve Eq. 1.23 analytically or numerically to compute the

    probability distribution of solutions of Eq. 1.22. In Ch. 2 we study solutions of Eq.

    1.23 for the interrogation paradigm.
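As an illustration of the numerical route, Eq. 1.23 can be integrated with a simple explicit finite-difference scheme. The sketch below is one such discretization under assumptions of my own choosing (a uniform grid, p = 0 at the domain edges, and a time step small enough for stability, roughly dt < dx²/σ²); it is not the solver used in this thesis.

```python
import numpy as np

def evolve_fokker_planck(p0, x, lam, A, sigma, dt, n_steps):
    """Explicit finite-difference integration of Eq. 1.23 on grid x.

    Flux form dp/dt = -d/dx[(lam*x + A) p] + (sigma^2/2) d2p/dx2, with
    central differences in space, forward Euler in time, and p = 0 at
    the boundaries (the grid should extend far beyond the bulk of p).
    """
    p = p0.astype(float).copy()
    dx = x[1] - x[0]
    drift = lam * x + A
    for _ in range(n_steps):
        flux = drift * p
        dflux = (flux[2:] - flux[:-2]) / (2 * dx)       # central d/dx of flux
        diff = (p[2:] - 2 * p[1:-1] + p[:-2]) / dx**2   # central d2/dx2
        p[1:-1] += dt * (-dflux + 0.5 * sigma**2 * diff)
        p[0] = p[-1] = 0.0
    return p
```

In the drift-diffusion limit λ = 0 with constant A, the mean of the evolved density should translate at rate A while total probability is conserved, which gives a quick sanity check on the scheme.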

    In [BBM+06], one may find closed form expressions for the mean error rate and

    decision time for the OU model with constant drift rate A. The reader is referred

to Appendix A of [BBM+06] for these expressions; they are too cumbersome to be included here and are not explicitly used in this thesis.

    For the DDM, however, the expressions for the mean error rate and decision time

    take on simpler forms which we state here for reference. Using techniques presented

    in [BT92, BT93, Gar85, BBM+06] for computing first passage times, it can be shown

    that the mean error rate 〈ER〉 and mean decision time 〈DT 〉 can be written as

〈ER〉 = 1/(1 + e^{2z̃ã}) − (1 − e^{−2x0ã}) / (e^{2z̃ã} − e^{−2z̃ã}) ,    (1.24)

〈DT〉 = z̃ tanh(z̃ã) + 2z̃(1 − e^{−2x0ã}) / (e^{2z̃ã} − e^{−2z̃ã}) − x0 .    (1.25)

These expressions will be used in Chs. 2 and 5.
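In code, the error rate and decision time expressions read as follows; this is a direct transcription of Eqs. 1.24-1.25 as written above, with z̃, ã, and x0 spelled z_tilde, a_tilde, and x0.

```python
import numpy as np

def ddm_er_dt(z_tilde, a_tilde, x0=0.0):
    """Mean error rate and decision time for the pure DDM (Eqs. 1.24-1.25).

    z_tilde and a_tilde are the normalized threshold and drift parameters
    of [BBM+06]; x0 is the starting point (x0 = 0 for unbiased trials).
    """
    e2za = np.exp(2 * z_tilde * a_tilde)
    bias = 1.0 - np.exp(-2 * x0 * a_tilde)           # vanishes when x0 = 0
    er = 1.0 / (1.0 + e2za) - bias / (e2za - 1.0 / e2za)
    dt = (z_tilde * np.tanh(z_tilde * a_tilde)
          + 2 * z_tilde * bias / (e2za - 1.0 / e2za) - x0)
    return er, dt
```

For x0 = 0 these reduce to the familiar unbiased forms 〈ER〉 = 1/(1 + e^{2z̃ã}) and 〈DT〉 = z̃ tanh(z̃ã), and raising the threshold z̃ trades longer decision times for lower error rates.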

    1.4 Thesis overview

    The OU Model and the particular case of the DDM are the crucial building blocks of

    the modeling efforts in this thesis. In essence, this thesis is a further demonstration

    of how these accumulator models are useful for studying a variety of psychological

phenomena, including, but not limited to, perceptual decision making.

    In Ch. 2, we adapt the OU model to fit a more generalized version of the motion

    dots task where monkeys are now faced with biased rewards. By doing so, we demon-

    strate that monkeys shift their behaviors in a systematic way, and that they do so in

    a near optimal manner. We also show some fits of the OU model to electrophysiolog-

    ical data and find that λ ≈ 0, giving some further evidence that the DDM is a good

    model for both the behavior and neural activity related to perceptual choice.

    In Ch. 3, we construct a multi-unit noisy connectionist model for the covert search

    task mentioned in §1.1.2 [OSBG06]. We first reanalyze the data and discover some

    new trends that were not fully established before, and then systematically construct a

model which explains the key behavioral and electrophysiological phenomena demonstrated in the experimental data. Our model proposes that LIP plays more of an

    attentional role in this covert search task – in fact LIP is not even necessary for the

task, although its inclusion aids performance. Our model comes close to the data in some cases, although work remains to be done.

    In Ch. 4 we study a different generalization of the DDM, this time assuming that

the noise can have jumps in addition to Wiener increments. We demonstrate that

    this simple change in the distribution of noise allows the pure DDM to reproduce fast

    error trials given unbiased initial data, something which other models require more

    parameters to reproduce. We fit our model to human subject data and compare it

    with the extended DDM presented in §1.5.1.

    In Ch. 5 we address the more general question of why humans, despite having a

prefrontal cortex with billions of neurons, can perform only a few tasks at

    once. We construct an abstract model for studying capacity constraints on cognitive

    control and use the DDM as a generalized model for a task. After studying various

    aspects of the constructed model, large scale simulations demonstrate that a capacity

    constraint does indeed arise out of the need for optimizing overall rewards.

    We conclude the thesis with a short summary essay in Ch. 6.

1.5 Appendix: Two popular models for decision making

    In this section we review two models which deserve mention. First we will review

    a useful extension of the pure DDM, namely the extended DDM, first presented by

    Ratcliff and Rouder in [RR98] as a model for 2AFC data. Then we summarize the

    Linear Ballistic Accumulator Model of Brown et al., which is a model of rapid choice

    which does not use stochastic accumulation like the DDM (and extended DDM).


1.5.1 The Extended DDM of Ratcliff

    In many situations, the simple pure DDM as presented in the main text is not sufficient

    to fit reaction times. For example, one key feature often seen in behavioral 2AFC data

    is that error trials and correct trials demonstrate significantly different reaction time

distributions. In [RR98], Ratcliff and Rouder presented the extended DDM, which adds four new parameters to the pure DDM in order to fit four different sets of behavioral data.

Recall that the pure DDM from above already contains five parameters: x0, Z, A, σ, and

    T0. Over the years, many reports have established that the extended DDM is able to

    fit behavioral 2AFC data well [RT02, RS04, RM07, SCB+09].

Figure 1.4: Extended DDM illustration from [SCB+09], showing a sample path between thresholds ±z, the mean drift A with trial-to-trial drift variability sA, starting-point variability sx about x0, non-decision-time variability st about T0, the growth (∝ √t) of the standard deviation of sample-path positions, and the DDM first-passage time density conditioned on an upper-boundary (z) crossing. The notation here differs from that used in the original extended DDM paper [RR98].

    The four additional parameters to the pure DDM which were introduced in [RR98]

    are variability in starting point σx, variability in drift rate σA, variability in non-

    decision time σT , and a proportion of contaminant reaction times p0. For each trial,

the initial condition x(0) was then sampled from a uniform distribution, U[x0 − σx, x0 + σx], and the drift rate A was sampled from a Gaussian distribution with

    mean A and standard deviation σA. The non-decision time T0 was also sampled from

its own uniform distribution U[T0 − σT, T0 + σT]. Finally, each trial had a probability

    p0 of being a contaminant trial. On these contaminant trials, a random response was

    made, and the reaction time was not generated by the diffusion process but instead

    was drawn from a uniform distribution spanning the observed RT range from the

data. The extended DDM is illustrated in Fig. 1.4.
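The sampling scheme just described can be sketched as follows. Parameter values are illustrative, and `rt_range` is a hypothetical stand-in for the observed RT range from which contaminant responses are drawn.

```python
import numpy as np

def extended_ddm_trial(A, sigma, z, x0, T0, s_x, s_A, s_T, p0,
                       rt_range=(0.2, 1.5), dt=1e-3, rng=None):
    """Sample one (choice, RT) pair from the extended DDM described above.

    s_x, s_A, s_T are the trial-to-trial variabilities of starting point,
    drift, and non-decision time; p0 is the contaminant proportion.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p0:                        # contaminant trial:
        return rng.integers(1, 3), rng.uniform(*rt_range)  # random response
    x = rng.uniform(x0 - s_x, x0 + s_x)          # variable starting point
    drift = rng.normal(A, s_A)                   # variable drift rate
    t0 = rng.uniform(T0 - s_T, T0 + s_T)         # variable non-decision time
    t, sqdt = 0.0, np.sqrt(dt)
    while abs(x) < z:                            # Euler-Maruyama diffusion
        x += drift * dt + sigma * sqdt * rng.normal()
        t += dt
    choice = 1 if x >= z else 2
    return choice, t0 + t
```

With a clearly positive mean drift, most simulated trials choose alternative 1, and every reaction time is bounded below by the smallest non-decision (or contaminant) time.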

    1.5.2 The Linear Ballistic Accumulator Model

    One recent model for response times is the Linear Ballistic Accumulator [BH05, BH08]

    which has seen several applications as a model of rapid choice [DABH09, FDB+08,

HBS09, LFEG09]. The basic idea is shown in Fig. 1.5. Here, two accumulators are

    used, one for response A and another for response B. The activity of an accumulator

    is initialized to a value uniformly sampled from [0, A]. Then, the drift rate for the

    first accumulator is drawn from a normal distribution with mean dA and standard

    deviation σ. The drift rate for the second accumulator is also drawn from a normal

    distribution with mean dB and standard deviation σ. Then for each accumulator,

    evidence accumulates in a noiseless (ballistic) and linear fashion according to their

    respective drift rates. The various accumulators race to their respective thresholds,

    and whichever hits threshold first executes the corresponding response. Like the

    diffusion models, a non-decision time T0 is typically included. In [DABH09], Donkin

    et al. performed a detailed numerical comparison between the Extended DDM and

    the LBA and conclude that inferences about psychological processes made from real

    data are unlikely to depend on the model that is used. The LBA has the advantage

    of having one less parameter when compared to the extended DDM, whereas the

extended DDM has the principled derivations as presented in sections 1.2 and 1.3 (although the LBA is derived in [BH05] from a simplified deterministic version of the LCA of [UM01]).

Figure 1.5: Simplified representation of the Linear Ballistic Accumulator model of Brown and Heathcote. Reprinted from [BH08], with permission from Elsevier.

Furthermore, Goldfarb and Caicedo (personal communication)

    have preliminary evidence that the LBA has difficulty staying close to the optimal

    performance curve, which represents the relationship between error rate and decision

    time that holds under the optimal threshold set in the DDM.
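A single LBA trial, as described above, can be sketched as below. All parameter values are illustrative, and one simplifying assumption is mine: non-positive drift samples are redrawn so that every race finishes, whereas the full model handles negative drifts via the competing accumulator.

```python
import numpy as np

def lba_trial(dA, dB, b, A_start, sigma, T0, rng=None):
    """One trial of a Linear Ballistic Accumulator race.

    b is the response threshold, A_start the upper bound of the uniform
    start-point distribution, and dA, dB the mean drifts of the two
    accumulators (both with standard deviation sigma).
    """
    rng = np.random.default_rng() if rng is None else rng
    finish_times = []
    for d in (dA, dB):
        k = rng.uniform(0.0, A_start)            # start point in [0, A_start]
        v = rng.normal(d, sigma)
        while v <= 0:                            # simplification: redraw
            v = rng.normal(d, sigma)             # non-positive drifts
        finish_times.append((b - k) / v)         # noiseless linear rise to b
    winner = int(np.argmin(finish_times)) + 1    # 1 = response A, 2 = response B
    return winner, T0 + min(finish_times)
```

The accumulator with the larger mean drift should win most races, and every reaction time exceeds the non-decision time T0.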


2 Can monkeys choose optimally when faced with noisy stimuli and unequal rewards?¹

    This chapter focuses on modeling the behavior of monkeys performing an extended

    version of the motion dots discrimination task in which the amount of reward for one of

    two alternatives may be doubled (if correctly selected) on certain trials. The monkeys

    are informed of a particular trial’s reward structure by a visual cue, illustrated in the

    “Reward” column of Fig. 2.1 and described below. We propose extensions of the

    OU process presented in §1.3.3 to account for the influence of unequal rewards, and

    make use of the derived psychometric functions (PMFs) which link model predictions

    to monkey behavior. The PMFs are characterized by two parameters: midpoint

    slope, which quantifies the subject’s ability to extract signal from noise, and shift,

    which measures the bias applied to account for unequal rewards. We fit these PMFs

    to data collected from two adult rhesus monkeys and find that, when behavior is

    averaged over multiple sessions, the monkeys shift their PMFs in a nearly optimal

    manner; remarkably, monkeys respectively garner greater than 98% and 99% of their

    maximum possible rewards. We also present a simple fit to the electrophysiological

    data during the accumulation period of the task in order to estimate some of the OU

¹This chapter shares its title with a paper written by Samuel Feng, Alan Rorie, Philip Holmes, and William T. Newsome [FHRN09]. Most notable is Alan Rorie, who was primarily responsible for performing these extensive and thorough monkey experiments.


parameters.

    We describe the experimental task in §2.1. §2.2 briefly reviews material from

    §1.3, reducing the LCA model to an OU process and demonstrating how the resulting

    PMFs change with experimental parameters. In §2.3 we compute the optimal shifts

    of these PMFs due to biased reward conditions and mixed coherences between trials.

In §2.4 we demonstrate that the predicted PMFs from the OU model fit the data when

    averaged over multiple sessions, and we compute how close the monkeys are to optimal

    performance. We also present some work with the individual session data. In §2.5

    we present some fits of the OU model to the averaged firing rate data during the

    accumulation period and find that our OU process is approximately a drift diffusion

    process. The chapter concludes with some future problems and discussion in §2.6.

    2.1 Unequal rewards in the motion dots task

Figure 2.1: Diagram indicating the time course of cues during the biased rewards motion dots task. See text for details.

    The full details of the behavioral study including equipment used, training of

    monkeys, setup of experimental apparatus, and recording of electrophysiological data

    can be found in [FHRN09]. The experiment was performed at Stanford University

    by Alan Rorie, under the supervision of W.T. Newsome. We are indebted to them


for sharing their data.

    Here we focus on the details presented in Fig. 2.1, which illustrates the sequence

    of events which formed a typical trial. First, a small, yellow dot appears and the

    monkey is required to fixate upon it for 150 msec. Next, two saccade targets appear

    (open gray circles) on opposite sides of the fixation point and aligned with the axis of

    motion to be discriminated. By convention, target 1 (T1) corresponds to a positive

    coherence stimulus (right-going motion), and target 2 (T2) to negative coherence

    stimulus (left-going motion). After 250 msec the targets change color, indicating the

    magnitude of reward available for correctly choosing either target. A blue target

    indicates a low magnitude (L) reward of 1 unit (1 drop) of juice, while a red target

    indicates a high magnitude (H) reward of 2 units of juice. We denote by r1 ∈ {1, 2}

    the magnitude of reward available for correctly choosing T1, and similarly define r2.

    This yields four reward conditions overall, shown in the 4 panels of the “Reward”

    column of Fig. 2.1: (1) LL (r1 = r2 = 1), in which both targets are blue, (2) HH

    (r1 = r2 = 2), in which both are red, (3) LH (r1 = 1, r2 = 2), in which T1 is blue

    and T2 is red, and (4) HL (r1 = 2, r2 = 1), in which T1 is red and T2 is blue. These

    colored targets are displayed 250 msec after the open gray circles appear, and remain

    on throughout the trial. After another 250 msec, the motion stimulus begins.

    The motion stimulus consists of a field of randomly moving dots, a certain per-

    centage of which were coherently moving either towards T1 or T2. The proportion

    of dots moving towards the correct target is called the coherence – a trial with −3%

    coherence indicates that 3% of the dots coherently move towards T2, while all of the

    others move randomly. The motion stimulus remains on for 500 msec. Following mo-

    tion stimulus offset, the monkey is required to maintain fixation for a variable delay

    period of 300-550 msec (varied uniformly across trials within each session). After this

    the fixation point disappears, cueing the monkey to report his decision with a sac-

    cade to the target corresponding to the perceived direction of motion. Monkeys must


respond within 1000 msec; otherwise the trial is discarded. If he chooses the correct

    direction, he is rewarded according to the color of the chosen target. Until the Go!

    period when the monkey can respond, fixation is enforced throughout the trial, and

    breaks of fixation are penalized by aborting the trial and enforcing a time-out period

    before the next trial.

    Rorie et al. collected data from two monkeys which we call monkey A and monkey

    T. Trials were presented in block-randomized order. For monkey A, they employed

    13 possible signed coherences:

{0, ±1.5%, ±3%, ±6%, ±12%, ±24%, ±48%}

    which, along with the four reward conditions, yields 52 conditions overall. For mon-

    key T, the two lowest motion coherences ±1.5%,±3% were eliminated because this

    animal’s psychophysical thresholds were somewhat higher than those of monkey A,

    giving 36 conditions overall. The behavioral data analyzed here consists of 35 sessions

    from monkey A (totaling 66933 trials) and 25 sessions from monkey T (totaling 32751

    trials).

    2.2 Predicting psychometric functions (PMFs) with

    an accumulator model

    We begin our model construction with the LCA from §1.3.1. More precisely, we

    suppose that there are two states (x1(t), x2(t)) which represent short-term averaged

    firing rates of two mutually-inhibiting pools of LIP neurons, which are sensitive to

    alternatives 1 and 2, respectively. We understand that decisions are almost certainly

    formulated through interactions among several oculomotor areas, but note that the

    causal role of LIP has been demonstrated in [HDS06]. In the development here, each


population receives noisy sensory input from the stimulus along with input derived

    from reward expectations. For clarity, we restate the LCA model equations here:

dx1 = [−γx1 − βf(x2) + I1(t)] dt + σ dW1 ,

dx2 = [−γx2 − βf(x1) + I2(t)] dt + σ dW2 ,    (2.1)

where the meanings of parameters are stated in §1.3.1.

    From here we reduce the linearized LCA to an OU process as presented in §1.3,

    which yields a single scalar SDE for the activity difference x := x1 − x2:

dx = [λx + A(t)] dt + σ dW ,    (2.2)

    where λ = β − γ is the difference between leak and inhibition, A(t) = I1(t) − I2(t),

and dW = dW1 − dW2 is itself a white noise increment. To complete the

    model formulation we observe that our experiment is performed according to the

    interrogation protocol with interrogation time T , as the motion stimulus stays on for

    a fixed duration, after which monkeys are required to make a response (an eye saccade

to T1 or T2) which reports the direction of motion². Note from the definition of x that

    evidence for alternative 1 manifests itself as positive drift or A = I1 − I2 > 0, and for

    alternative 2 vice versa, and that we have a correct response if sign(x(T )) = sign(A)

    (since coherence is fixed during a trial, A(t) will be assumed constant during the

    stimulus period, although others have assumed variable drift rates for OU processes

    [EHL+08]).

    The discussion from §1.3.3 reminds us that if λ = 0, Eq. 2.2 describes a drift

    diffusion (DD) process, which is a continuum limit of the sequential probability ratio

    test (§1.2.3) and is optimal for 2AFC tasks in that, given a fixed decision time, it

    maximizes accuracy. Since we have no reason for assuming that λ = 0, we elect

²We choose not to explicitly model the delay and response periods, instead electing to assume that the monkeys’ responses are locked in as soon as the stimulus terminates at time T.


to use the Ornstein-Uhlenbeck (OU) model and proceed to derive expressions with

    which we can study the monkeys’ behavior in our biased rewards 2AFC task. In

    particular, we wish to study the monkeys’ psychometric functions (PMFs), which

represent how coherence affects accuracy for an individual. In doing so, we can

    analyze the optimality of each monkey’s behavior and determine if the monkeys are

    properly incorporating the additional biased reward information.

    From §1.3.3 we know we can compute the probability of choosing alternative 1

    under the interrogation protocol by computing the probability distribution of solu-

    tions p(x, t) from the Fokker-Planck equation (1.23). If we suppose that the initial

    data p(x, 0) are Gaussian with mean µ0 and variance ν0,

p(x, 0) = (1/√(2πν0)) exp[−(x − µ0)² / (2ν0)] ,    (2.3)

    then the distribution of solutions of Eq. 2.2 are themselves Gaussian as time evolves:

p(x, t) = (1/√(2πν(t))) exp[−(x − µ(t))² / (2ν(t))] ,    (2.4)

    where

µ(t) = µ0 e^{λt} + ∫₀ᵗ e^{λ(t−s)} A(s) ds   and   ν(t) = ν0 e^{2λt} + (σ²/2λ)(e^{2λt} − 1)    (2.5)

    define the integrated stimulus and integrated noise. Eq. 2.4 can be verified by directly

    computing its partial derivatives. Eqs. 2.5 are now the central focus of our study,

    as they are directly linked to the distribution of solutions p(x, t) through Eq. 2.4,

    and, as we shall see below, are directly responsible for the shapes of the psychometric

    functions. It is useful to note that in the DD limit of λ = 0, Eqs. 2.5 simplify to

    µ(t) = µ0 +

    ∫ t

    0