
Distinguishing between parallel and serial processing in visual attention from neurobiological data

Kang Li1,2, Mikiko Kadohisa, Makoto Kusunoki, John Duncan, Claus Bundesen2, Susanne Ditlevsen1*

1 Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark
2 Department of Psychology, University of Copenhagen, Copenhagen, Denmark

* [email protected]

Abstract

Serial and parallel processing in visual search have long been debated in psychology, but the processing mechanism remains an open issue. Serial processing allows only one object at a time to be processed, whereas parallel processing assumes that various objects are processed simultaneously. Here we present novel neural models for the two types of processing mechanisms based on analysis of simultaneously recorded spike trains using electrophysiological data from prefrontal cortex of rhesus monkeys while processing task-relevant visual displays. We combine mathematical models describing neuronal attention and point process models for spike trains. The same model can explain both serial and parallel processing by adopting different parameter regimes. We present statistical methods to distinguish between serial and parallel processing based on both maximum likelihood estimates and decoding the momentary focus of attention when two stimuli are presented simultaneously. Results show that both processing mechanisms are in play for the simultaneously recorded neurons, but neurons tend to follow parallel processing in the beginning after the onset of the stimulus pair, whereas they tend to serial processing later on. This could be explained by parallel processing being related to sensory bottom-up signals or feedforward processing, which typically occur in the beginning after stimulus onset, whereas top-down signals related to cognitive modulatory influences guiding attentional effects in recurrent feedback connections occur after a small delay and are related to serial processing, where all processing capacities are directed towards the attended object.

Author summary

A fundamental question concerning processing of visual objects in our brain is how a population of cortical cells responds when presented with more than a single object in their receptive fields. Is one object processed at a time (serial processing), or are all objects processed simultaneously (parallel processing)? Inferring the dynamics of attentional states in simultaneously recorded spike trains from sensory neurons while being exposed to a pair of visual stimuli is key to advancing our understanding of visual cognition. We propose novel statistical models and measures to quantify and follow the time evolution of the visual cognition processes right after stimulus onset. We find that in the beginning processing appears to be predominantly parallel, which develops into serial processing 150–200 ms after stimulus onset in prefrontal cortex of rhesus monkeys.

July 20, 2018 1/31

bioRxiv preprint doi: https://doi.org/10.1101/383596; this version posted August 2, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY 4.0 International license.

https://doi.org/10.1101/383596

http://creativecommons.org/licenses/by/4.0/

1 Introduction

A fundamental question in theories of visual search is whether the process is serial or parallel for given types of stimulus material (for comprehensive reviews, see [1–3]). In serial search, only one stimulus is attended at a time, whereas in parallel search, several stimuli are attended at the same time. The question of serial versus parallel search has been extensively investigated by behavioral methods in cognitive psychology, but it is still highly controversial. In this article, we briefly review extant empirical methods and their results and then present and exemplify a new method for distinguishing between serial and parallel visual search. The method is based on analysis of spike trains measured in prefrontal cortex of rhesus monkeys while they were exposed to a pair of stimuli, which the animal had to detect and later respond to with a saccade towards a target object; the data were first presented in [4]. A spike train is a sequence of recorded times at which a neuron fires an action potential, and it is believed that spike times are the primary way to transmit information in the nervous system. Point process modelling is a natural mathematical framework for addressing such phenomena, and we embed this into models of visual attention, which provides a means to quantify parallel versus serial visual processing.

1.1 Behavioral methods for distinguishing between serial and parallel visual search

In typical experiments on visual search, the task of the observer is to indicate as quickly as possible if a certain type of target is present in a display. Positive (target present) and negative (target absent) mean response times are analyzed as functions of the display set size N (the number of items in the display). The method of analysis was laid out by [5–7] and further developed by [8]. The foundation is as follows.

In a simple serial model, the N items are scanned one at a time. When an item is scanned, it is classified as a target or a distractor. The order in which items are scanned is independent of their status as targets versus distractors. A negative response is made when all items have been scanned and classified as distractors. Thus, the number of items processed before a negative response is made equals N. Furthermore, the rate of increase in mean negative response time as a function of N equals the mean time taken to process one item, $\Delta t$. A positive response is made as soon as a target is found. Because the order in which items are scanned is independent of their status as targets or distractors, the number of items processed before a positive response is made varies at random between 1 and N with a mean of $(1 + N)/2$. Thus, the rate of increase in mean positive response time as a function of N equals $\Delta t/2$ (see [9] for experimental evidence of serial processing in a behavioral task).
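The two slopes predicted by the simple serial model can be checked with a few lines of code. The sketch below is only an illustration: the function names, the per-item time of 50 ms, and the optional base time are hypothetical choices, not quantities taken from the cited studies.

```python
from fractions import Fraction


def mean_items_scanned(n_items: int, target_present: bool) -> Fraction:
    """Expected number of items scanned before responding (simple serial model)."""
    if not target_present:
        # Every item must be classified as a distractor before a negative response.
        return Fraction(n_items)
    # The single target is equally likely to sit at any scan position 1..N.
    return Fraction(sum(range(1, n_items + 1)), n_items)


def mean_response_time(n_items, target_present, dt, base=0.0):
    """Mean response time: a base time plus dt per scanned item (both hypothetical)."""
    return base + dt * float(mean_items_scanned(n_items, target_present))


dt = 0.05  # assumed processing time per item in seconds, for illustration only
negative_slope = mean_response_time(9, False, dt) - mean_response_time(8, False, dt)
positive_slope = mean_response_time(9, True, dt) - mean_response_time(8, True, dt)
print(negative_slope, positive_slope)
```

Adding one item to the display raises the mean negative response time by the full per-item time and the mean positive response time by half of it, which is the 2:1 slope ratio used as a behavioral signature of serial search.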

In a parallel model of attention, several stimuli can be attended at the same time. The first detailed parallel model of visual processing of multi-element displays was the independent channels model proposed by Eriksen and his colleagues (e.g., [10, 11]). It was based on the assumption that display items presented to separated foveal areas are processed in parallel and independently up to and including the stage of pattern recognition. The independent channels model has been used to account for effects of N on error rates. The linear relations between mean response time and N predicted by simple serial models are difficult to explain by parallel models with independent channels. However, the linear relations can be explained by parallel models with limited processing capacity [12, 13].

A multi-feature whole-report paradigm for investigating serial versus parallel processing was introduced in [14]: Suppose two features must be processed from each of two stimuli (i.e., a total of four features). Let processing be interrupted before all of the four features have completed processing. If, and only if, processing is parallel, there will be cases in which just one feature from each of the two stimuli completes processing before the interruption. This event, in which the observer has only partially encoded each of the two stimuli, should never happen when processing is serial. Thus, states with partial information from more than one stimulus are strong indicators of parallel processing. In the experiment of [14] (see [15] for replications and extensions), observers were presented with brief exposures of pairs of colored letters and asked to report both the color and the identity of each letter. The results showed strong evidence of states of partial information from each of the two stimuli (e.g., information of just the identity of one of the letters and just the color of the other one), and the results were fitted strikingly well by a simple parallel-processing model assuming mutually independent processing of the four features.

1.2 Method based on neurobiological data

As exemplified above, previous methods for distinguishing between serial and parallel visual search have been based on behavioral data, and the evidence obtained by these methods has been somewhat indirect. Moreover, these methods are all based on the assumption that processing is either serial or parallel, and that it stays the same throughout the trial. In this article, we present a new method for distinguishing between serial and parallel visual search, a method based on analysis of electrophysiological data. Furthermore, we propose measures to quantify the processing mechanism in a continuum between serial and parallel processing. The method relies on the probability-mixing model for single neuron processing [16], derived from the Neural Theory of Visual Attention [1, 17], which states that when presented with a plurality of stimuli a neuron only responds to one stimulus at any given time. By probabilistic modeling and statistical inference using multiple simultaneously recorded spike trains, we infer and decode what the recorded neurons are responding to, providing a way to distinguish between parallel processing and serial processing on a neuronal level. The new method appears more direct than previous methods, and it makes it possible to analyze the time evolution of the processing mechanism over the course of a trial.

Consider an experiment in which we record the action potentials or spikes from each of a number of neurons of the same type, e.g., a set of functionally similar neurons in visual cortex with overlapping receptive fields (see, e.g., [16]), or neurons in prefrontal cortex that are believed to be dynamically allocated to process task-relevant information (see, e.g., [4]), which are the neurons we analyze in this paper. Suppose two stimuli (Stimulus 1 and 2) are both within the classical receptive fields of all of the recorded neurons, but otherwise the receptive fields are empty. In this situation, we may test whether processing is parallel in the sense that on any given trial, some of the recorded neurons represent Stimulus 1 throughout the trial, while others represent Stimulus 2 throughout the trial. We may assume that a neuron represents Stimulus 1 rather than Stimulus 2 if the likelihood of the observed spike trains becomes higher by assuming that the neuron represents Stimulus 1. We may also test whether processing is strictly serial (i.e., one stimulus at a time) by testing, for example, whether there is a time interval $\Delta_1$ in which all of the neurons represent Stimulus 1 and a time interval $\Delta_2$, nonoverlapping with $\Delta_1$, in which all of the neurons represent Stimulus 2.
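The likelihood comparison for a single neuron can be illustrated with a toy example. The sketch below assumes, purely for illustration, that the neuron fires as a homogeneous Poisson process whose rate depends only on the stimulus it represents; this spike-generation model, the rates, and the function names are hypothetical and are not the models developed later in the paper.

```python
from math import lgamma, log


def poisson_loglik(n_spikes: int, rate_hz: float, duration_s: float) -> float:
    """Log-likelihood of n_spikes under a homogeneous Poisson process (assumed model)."""
    mean = rate_hz * duration_s
    return n_spikes * log(mean) - mean - lgamma(n_spikes + 1)


def more_likely_stimulus(n_spikes, rate1_hz, rate2_hz, duration_s):
    """Return 1 if the spike count is more likely under Stimulus 1, otherwise 2."""
    ll1 = poisson_loglik(n_spikes, rate1_hz, duration_s)
    ll2 = poisson_loglik(n_spikes, rate2_hz, duration_s)
    return 1 if ll1 > ll2 else 2  # ties are assigned to Stimulus 2 by convention


# Hypothetical single-stimulus rates: 30 spikes/s for Stimulus 1, 10 spikes/s for Stimulus 2.
many_spikes = more_likely_stimulus(12, 30.0, 10.0, 0.4)
few_spikes = more_likely_stimulus(3, 30.0, 10.0, 0.4)
print(many_spikes, few_spikes)
```

Applying such a rule to every simultaneously recorded neuron in a trial gives a crude picture of how the population splits between the two stimuli, which is the quantity the parallel-versus-serial tests are concerned with.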

Strictly parallel or strictly serial processing of two or more stimuli can hardly be expected in a biological system, and both must be regarded as idealizations. However, we will show how to measure the goodness of approximation of search processes in the brain to simple serial and parallel search, as well as the time evolution of the processing mechanism.

In the following sections, we first present the statistical methods and probabilistic models that we employ to distinguish between parallel and serial processing, and then show and discuss the results obtained using both simulated data and experimental data. In section 2, we introduce the experimental data that we rely on to measure the degree of parallel and serial processing in a realistic biological situation, and explain our proposed statistical criteria and measures to distinguish between parallel and serial processing. We propose two models, the hidden Markov model (HMM) and the correlated binomial model (CBM), to account for the spike train data in an attention framework, and calculate their likelihood functions. The maximum likelihood estimates (MLEs) provide a prior measurement of parallel versus serial processing. We then describe methods to decode the momentary focus of attention given the fitted models, which provides a posterior measurement. In section 3 we present the results of the analysis conducted on the experimental data.

2 Materials and Methods

We present two models that relate the theories of visual attention to neuronal behavior, providing a tool to distinguish between parallel and serial processing, and to quantify their degree, through spike train analysis. Under the assumption of serial processing, the neurons are correlated, acting together as a population. This dependence can arise through two different pathways: 1) There exists an underlying variable driving the neurons towards attending to the same stimulus, creating a dependence, even if the neurons are conditionally independent given the state of this underlying variable. 2) The neurons are directly positively correlated, driving them to synchronize their attention.

The first pathway is naturally described by an HMM, where the hidden Markov chain switches between different states influencing the neuronal attention. If time is discretized and there are two stimuli, this leads to a mixture of binomials at each discretized time step, where the number of components in the mixture distribution equals the number of states of the Markov chain. The binomial distributions provide probabilities of the number of neurons attending to each stimulus, depending on the hidden state of the Markov chain. The second pathway can be represented by a CBM, a mixture of an ordinary binomial and a modified Bernoulli [18], which is used independently at each discretized time step. For both models, the attended stimulus for each neuron is unobserved, and the inference is based on spike train data. We estimate parameters by maximum likelihood, marginalizing out the unobserved attention variables. The estimated parameters in either model describe neuronal properties and are used to obtain a prior measurement of the degree of parallel or serial processing. For both models, we also decode the hidden states from the posterior probabilities of the latent attention variables, i.e., an estimate of the stimulus the neurons were most probably attending to given their observed spike trains. The decoding of attentional behavior provides a posterior measurement of the degree of parallel or serial processing. The diagram in Fig 1 summarizes the flow of the analysis including parameter estimation, decoding and interpretation.

Fig 1. Flow diagram of the analysis. [Diagram elements: spike train data; parameter estimation by MLE (encoding); fitted model (general neuronal properties; prior measurement of parallel vs. serial processing); apply model to future spike train data; obtain posterior distribution of latent attention states (decoding); latent attention states (neuron-specific attentional behavior; posterior measurement of parallel vs. serial processing).]

2.1 Experimental Data

To distinguish between parallel and serial processing, we use the neural spike train data from [4], recorded from neurons in prefrontal cortex of two rhesus monkeys presented with two visual stimuli. That study examined dynamic attentional construction and found that in the early stage after stimulus onset, when processing competing stimuli, the global attention is distributed among all objects, with each neuron having a tendency towards its contralateral hemifield. In the late stage, the global attention is reallocated and neurons are redirected to the target stimulus. The data contain multiple simultaneously recorded neurons responding to two competing stimuli. The data are organized in daily sessions, and each session consists of a different set of recorded neurons. We only analyze the sessions in which at least five neurons were recorded, to have enough data to distinguish between parallel and serial processing, yielding a total of 48 sessions.

The monkey fixated on a central red dot on a computer screen, then each trial began with a central cue indicating the target object of the specific trial. Each of two cues was paired with one of the two alternative targets. After a brief delay, a choice display was presented for 500 ms containing two objects, one to the right and one to the left of the fixation point. The objects could be either the cued target (T), an inconsistent non-target (NI) because it was used as a target on other trials, a consistent non-target (NC) never serving as a target, or nothing but a gray dot (NO). After a brief delay, the monkey was rewarded with a drop of liquid for a saccade to the T location if a T had been shown, or, if no T had been presented, for maintaining fixation (no-go response) for later reward. In the following we call a combination of two stimuli a condition. Table 1 shows the 12 possible conditions. The stimulus locations were denoted by whether they were contra- or ipsilateral with respect to the recorded neuron. S1 Fig a shows an example of the structure of the data in one session. To get an overall idea of the sample sizes, histograms in S1 Fig b and c show the average number of trials per condition over the 48 sessions, and the average number of simultaneously recorded neurons per trial over sessions, respectively.

Table 1. The 12 conditions used in the trials (combinations of stimuli).

Condition   1   2   3   |  4   5   6   |  7   8   9   10  11  12
con         T   T   T   |  NI  NC  NO  |  NI  NI  NC  NC  NO  NO
ipsi        NI  NC  NO  |  T   T   T   |  NC  NO  NO  NI  NI  NC

Conditions can be merged into three groups, separated by vertical bars: target in the contralateral side (conditions 1–3), target in the ipsilateral side (conditions 4–6), and all combinations with no target (conditions 7–12). Contra- and ipsilateral sides are with respect to the recorded neuron. T: Target; NI: Inconsistent non-target; NC: Consistent non-target; NO: No display.

We analyze the choice phase where the two stimuli are shown. Fig 2 shows the recorded spike trains of an example cell during this phase and 100 ms before and after. The red curves are kernel smoothing estimates of firing rates over time, plotted on top of the spike trains. The 12 subplots show the 12 conditions with the titles indicating stimulus on the contra- (left) and ipsilateral (right) sides with respect to the recorded neuron. The neuron seems to favor the target T with a higher firing rate, and its attention starts from the contralateral stimulus and is later redirected to the target stimulus, following the overall tendency of most neurons reported by [4]. In S2 Fig we show a complementary example neuron, which shows a tendency to the ipsilateral stimulus in the early stage, and later the attention is redirected to the target stimulus. Furthermore, for this neuron there is more variability between trials under the same condition.

Fig 2. Raster plots of measured spike trains recorded from an example cell (MN110411task 3 0). The 12 conditions are indicated in the titles of the subplots. Kernel smoothing estimates of the firing rates are shown in red. The stimulus on the left of the title indicates the stimulus on the contralateral side, and the right indicates the stimulus on the ipsilateral side with respect to the recorded neuron. The dashed lines indicate the interval of the choice phase where two stimuli are shown. [Figure: 12 panels titled T − NI, NI − T, NI − NC, NC − NI, T − NC, NC − T, NI − NO, NO − NI, T − NO, NO − T, NC − NO, NO − NC; x-axis: Time / s (−0.1 to 0.5); y-axis: Firing rate (spikes/s), 0 to 40.]
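For readers unfamiliar with kernel smoothing of spike trains, the following sketch shows one common way such a firing rate curve can be computed. It assumes a Gaussian kernel with a 20 ms bandwidth and averaging over trials; the kernel, the bandwidth, and the function name are assumptions for illustration, since the text does not specify how the red curves were obtained.

```python
from math import exp, pi, sqrt


def kernel_rate(spike_trains, t, bandwidth=0.02):
    """Trial-averaged Gaussian kernel estimate of the firing rate (spikes/s) at time t.

    spike_trains: list of trials, each a list of spike times in seconds.
    bandwidth: kernel standard deviation in seconds (assumed value).
    """
    norm = 1.0 / (bandwidth * sqrt(2.0 * pi))
    total = sum(
        norm * exp(-0.5 * ((t - s) / bandwidth) ** 2)
        for train in spike_trains
        for s in train
    )
    return total / len(spike_trains)


# Two made-up trials with spike times in seconds.
example_trials = [[0.10, 0.20], [0.30]]
rate_at_200ms = kernel_rate(example_trials, 0.2)
print(rate_at_200ms)
```

Because each kernel integrates to one, the area under the estimated rate curve equals the mean number of spikes per trial, so the curve can be read directly in spikes per second.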

These figures present repeated trials of a single neuron. In S3 Fig, we show simultaneously recorded neurons in two conditions of an example session. The comparison of serial and parallel processing captures the difference among simultaneously recorded neurons within one trial in terms of their attended stimulus, which is hard or impossible to analyze with traditional methods that average across neurons and trials.

We thus develop a new methodology modeling each single spike train and the correlation between spike trains. Serial and parallel processing can be distinguished using the estimated parameters. To account for neuronal response times, we discard the first 100 ms after stimulus onset, using the interval from 100 to 500 ms in the choice phase when estimating the parameters of the two models.

2.2 Measures for parallel versus serial processing

Here we define different measures of the degree of serial and parallel processing based on the estimated parameters of the models when a population of n neurons is presented with two non-overlapping stimuli in their receptive fields. These measures will vary with time, i.e., depend on the time since stimulus onset, but for ease of notation, we suppress time here and introduce the time dependency later. We assume a homogeneous situation where all neurons follow the same distribution and are exchangeable, except for individual firing rates as responses to single stimuli. These measures are based on the basic probability-mixing model for the attention of single neurons employed in [16], where a neuron responds to a stimulus mixture with certain probabilities, such that the single neuron at any given time represents only one of the stimuli in the mixture.

First, we consider the marginal distribution of the attended stimulus for each neuron. Let p denote the marginal probability of attending to one of the stimuli, say stimulus 1, such that the probability of attending stimulus 2 is $1 - p$, where $0 < p < 1$. If the neurons are independent, then the probability that all neurons attend the same stimulus is $p^n + (1 - p)^n$, and if the neurons are positively correlated, this is a lower bound of the probability that all neurons attend the same stimulus. Thus, p provides a measure of the tendency of serial or parallel processing. A narrow distribution (extreme probability, p either close to 0 or 1) favors serial processing, since in this case most neurons will attend the same stimulus. A wide distribution (non-extreme probability, p close to 0.5) favors parallel processing, since in this case neuronal attention will tend to split between the two stimuli.

Second, we consider correlations between neurons. Since the neurons are exchangeable, the correlation coefficient, denoted by ρ, between any two neurons (pairwise correlation) is identical. Stronger positive correlation implies more tendency to serial processing, no matter what p is. Thus, if either the correlation is strong (ρ close to 1) or p is close to 0 or 1, serial processing is favored, while if both the correlation is weak and the probability is not extreme, parallel processing is favored. We summarize the different cases in Table 2.

Table 2. Effects of neural attentional probability and correlation on serial and parallel processing.

                     Extreme probability   Non-extreme probability
Strong correlation   Serial                Serial
Weak correlation     Serial                Parallel

Extreme probability implies a probability close to 0 or 1, and strong correlation implies a correlation close to 1.
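The role of the marginal probability p in the independent case can be made concrete with a short calculation of the probability that all n neurons attend the same stimulus. The function name and the example values of p are illustrative choices.

```python
def prob_all_same(p: float, n: int) -> float:
    """P(all n independent neurons attend the same stimulus) = p^n + (1 - p)^n."""
    return p ** n + (1.0 - p) ** n


# Example with n = 10 neurons: a non-extreme and two increasingly extreme probabilities.
for p in (0.5, 0.9, 0.99):
    print(p, prob_all_same(p, 10))
```

With p = 0.5 the neurons almost never all agree, whereas for p close to 1 they usually do, even without any correlation. Positive correlation can only raise these values, which is why either an extreme p or a strong correlation points towards serial processing.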

We now propose a single statistic as an alternative measure to distinguish between serial and parallel processing. Again, suppose we have a stimulus mixture of two components and a population of n neurons attending to the mixture. The number of neurons, Z, attending to the first stimulus follows a distribution with probability mass function (PMF) $f(z)$ for $z \in \{0, 1, \ldots, n\}$, such that $P(Z = z) = f(z)$, which depends on the specific model. A distribution centered around $n/2$ indicates apparent parallel processing, and a distribution centered at 0 and/or n indicates apparent serial processing. Note that this population distribution incorporates both the marginal probability of attention of the single neurons and the correlation between neuron pairs. We define a statistic $D_n$ as a measure of the degree of serial or parallel processing, given by

\begin{equation}
D_n = \frac{\sum_{z=0}^{n} |z - n/2| \, f(z)}{n/2}. \tag{1}
\end{equation}

The statistic $D_n$ can be interpreted as a normalized expected deviation between the number of neurons attending to one stimulus and half of the total number of neurons. If we split the neuron population according to which stimulus they attend, giving two proportions (summing to 1), then $D_n$ is the expected absolute difference between the two proportions, and it can take values between 0 and 1. The smaller $D_n$ is, the more parallel processing is favored. The $D_n$ statistic depends on the total number of recorded neurons n. However, if we consider specific models for the PMF, for example the binomial models introduced below, the dependence on n can be removed by using the asymptotic version

\begin{equation*}
D^* = \lim_{n \to \infty} D_n,
\end{equation*}

which provides a measure for the entire neuronal population relevant for the given task, not only the measured ones.
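The deviation statistic of Eq (1) is easy to evaluate for any PMF on {0, ..., n}. The sketch below computes it for three made-up distributions that mark the two extremes and an intermediate case; the function name and the example PMFs are illustrative and do not come from the fitted models.

```python
from math import comb


def deviation_statistic(pmf):
    """D_n of Eq (1), where pmf[z] = P(Z = z) for z = 0, ..., n."""
    n = len(pmf) - 1
    return sum(abs(z - n / 2) * f for z, f in enumerate(pmf)) / (n / 2)


n = 10
# All neurons attend the same stimulus, either one with equal chance (serial extreme).
serial_pmf = [0.5] + [0.0] * (n - 1) + [0.5]
# The neurons always split evenly between the two stimuli (parallel extreme).
parallel_pmf = [1.0 if z == n // 2 else 0.0 for z in range(n + 1)]
# Independent neurons with p = 0.5, i.e., an ordinary binomial distribution.
binomial_pmf = [comb(n, z) * 0.5 ** n for z in range(n + 1)]

d_serial = deviation_statistic(serial_pmf)
d_parallel = deviation_statistic(parallel_pmf)
d_binomial = deviation_statistic(binomial_pmf)
print(d_serial, d_parallel, d_binomial)
```

The two extremes give 1 and 0, respectively. The independent binomial with p = 0.5 gives a small positive value for n = 10, which reflects the finite-sample dependence on n that the asymptotic version removes.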



To summarize, to measure the degree of serial and parallel processing, we can use the attentional probability p, the correlation of neuronal attention ρ, and the deviation statistics $D_n$ or $D^*$.

2.3 Models

In this section we present two models to explain the spike train data in an attention framework. We discretize the 400 ms of the trial where both stimuli are presented, and which we use for the analysis, into T smaller intervals and let the models evolve dynamically over these intervals. Within any of these small time intervals, we assume that the attention of each neuron is not changing. Within a trial, let $X_t^i \in \{0, 1\}$ denote the attended stimulus of neuron i at time t for $i = 1, \ldots, n$, $t = 1, \ldots, T$, and let $Y_t^i$ denote the spike train of neuron i in the t'th interval. We set $X_t^i = 1$ when neuron i attends stimulus 1 at time t, and $X_t^i = 0$ when attending stimulus 2.
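The discretization step can be pictured as cutting each spike train into T pieces. The sketch below assumes equal-width intervals over the window from 100 to 500 ms after stimulus onset and uses T = 4 only as an example; the interval widths, the value of T, and the function name are assumptions, not values given in this section.

```python
def split_into_intervals(spike_times_ms, start_ms=100.0, end_ms=500.0, n_intervals=4):
    """Split one spike train into n_intervals equal-width segments (assumed layout).

    Returns a list of length n_intervals; entry t holds the spike times that fall
    in the t'th interval. Spikes outside [start_ms, end_ms) are discarded.
    """
    width = (end_ms - start_ms) / n_intervals
    segments = [[] for _ in range(n_intervals)]
    for s in spike_times_ms:
        if start_ms <= s < end_ms:
            # Guard against floating-point rounding at the upper edge.
            t = min(int((s - start_ms) // width), n_intervals - 1)
            segments[t].append(s)
    return segments


# A made-up spike train in milliseconds after stimulus onset.
spikes = [50.0, 120.0, 130.0, 310.0, 499.0, 600.0]
segments = split_into_intervals(spikes)
print(segments)
```

Each segment plays the role of the spike train of one neuron in one interval, during which the attended stimulus of that neuron is assumed constant.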

2.3.1 Hidden Markov Model and a Mixture of Binomial Distributions

To combine the visual attention hypotheses with neuronal dynamics, we adopt an HMM. The HMM assumes some underlying unobserved variable that drives the attention of the neurons. The HMM is defined over the T time steps. We let the probabilities p of the single neurons, which can be interpreted as attentional weights, depend on the state of the underlying HMM, which introduces correlation between neurons, even if they are conditionally independent given the hidden state, and the probabilities evolve over time following the dynamics of the HMM. Note that this implies that within each of the T intervals, model parameters governing the stochastic neuronal activity (the spike train generation) are constant. We use three hidden states, which can describe three attentional regimes: attention directed to the contralateral side, to the ipsilateral side, or equal attention to both sides. A transition between hidden states introduces a weight reassignment of the attention to the stimuli, and thus, new laws for the generation of spike trains. Let $C_t \in \{1, 2, 3\}$ denote the hidden state at time t. Fig 3 shows a diagram of the HMM for $T = 3$. Conditional on $C_t$, the $\{X_t^i\}_{i=1,\ldots,n}$ are independent.

Fig 3. Diagram of the Hidden Markov Model. The HMM and attentional states from a group of n neurons to a mixture of two stimuli, using $T = 3$ discretized time steps.



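Let the initial distribution of the Markov chain be given by $\boldsymbol{\lambda}$ and the transition probability matrix (TPM) by $\boldsymbol{\Gamma}$:

\begin{equation}
\boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 & \lambda_2 & \lambda_3 \end{bmatrix}, \qquad
\boldsymbol{\Gamma} = \begin{bmatrix}
\gamma_{11} & \gamma_{12} & \gamma_{13} \\
\gamma_{21} & \gamma_{22} & \gamma_{23} \\
\gamma_{31} & \gamma_{32} & \gamma_{33}
\end{bmatrix}, \tag{2}
\end{equation}

where $\sum_{k=1}^{3} \lambda_k = 1$, $\sum_{l=1}^{3} \gamma_{kl} = 1$ for $k = 1, 2, 3$, and $\lambda_k, \gamma_{kl} \geq 0$ for $k, l = 1, 2, 3$. Here, $\lambda_k = P(C_1 = k)$ and $\gamma_{kl} = P(C_{t+1} = l \mid C_t = k)$ for all t. Let the vector $\pi_t = \boldsymbol{\lambda}\boldsymbol{\Gamma}^{t-1}$ denote the distribution of $C_t$; thus, $\pi_{t,k} = P(C_t = k)$. The TPM $\boldsymbol{\Gamma}$ depends on the stimulus pair, but the initial distribution $\boldsymbol{\lambda}$ is only related to the location of the attended stimulus, since this is the initiation of the processing mechanism before the specific stimuli are perceived, and is thus the same for all stimulus pairs. However, we relax this assumption in Section 3.2.4. We denote by $\boldsymbol{\Gamma}_m$ the TPM of condition m.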
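The distribution of the hidden state at each time step follows from repeated multiplication of the initial distribution by the TPM. The sketch below does this with plain Python lists; the numerical values of the initial distribution and the TPM are made up for illustration and are not estimates from the data.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[k] * m[k][l] for k in range(len(v))) for l in range(len(m[0]))]


def state_distributions(lam, gamma, n_steps):
    """Return [pi_1, ..., pi_T], where pi_t = lam * gamma^(t-1)."""
    dists = [list(lam)]
    for _ in range(n_steps - 1):
        dists.append(vec_mat(dists[-1], gamma))
    return dists


# Hypothetical initial distribution and transition probability matrix (rows sum to 1).
lam = [0.2, 0.6, 0.2]
gamma = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
]
pis = state_distributions(lam, gamma, n_steps=3)
for t, pi_t in enumerate(pis, start=1):
    print(t, pi_t)
```

In this made-up example the probability mass drifts from the second state towards the first and third states over time, which is how the HMM lets the attentional regime change over the course of a trial.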

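Conditional on $C_t$, neurons are assumed independent. Denote the probability of attending to stimulus 1 given state c by $\alpha_c = P(X_t^i = 1 \mid C_t = c)$, yielding the matrix:

\begin{equation}
\boldsymbol{A} = \begin{bmatrix}
\alpha_1 & 1 - \alpha_1 \\
\alpha_2 & 1 - \alpha_2 \\
\alpha_3 & 1 - \alpha_3
\end{bmatrix}. \tag{3}
\end{equation}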

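Attention probabilities and correlations. Calculating the probability distribution of $X_t^i$ is straightforward following the HMM. The vector

\begin{equation}
\boldsymbol{P}_t = \boldsymbol{\lambda}\boldsymbol{\Gamma}^{t-1}\boldsymbol{A} = \pi_t \boldsymbol{A} = \left( P(X_t^i = 1),\; P(X_t^i = 0) \right) \tag{4}
\end{equation}

gives the vector of probabilities of attention to either stimulus. Straightforward calculations yield the moments and the correlation $\rho_t$ between $X_t^i$ and $X_t^j$ for $i \neq j$:

\begin{align}
E(X_t^i) &= P(X_t^i = 1); \tag{5} \\
\mathrm{Var}(X_t^i) &= P(X_t^i = 1)\left(1 - P(X_t^i = 1)\right); \tag{6} \\
E(X_t^i X_t^j) &= \boldsymbol{\lambda}\boldsymbol{\Gamma}^{t-1} \left[ \alpha_1^2, \alpha_2^2, \alpha_3^2 \right]^{T}; \tag{7} \\
\mathrm{Cov}(X_t^i, X_t^j) &= E(X_t^i X_t^j) - E(X_t^i) E(X_t^j); \tag{8} \\
\rho_t &= \frac{\mathrm{Cov}(X_t^i, X_t^j)}{\mathrm{Var}(X_t^i)}, \tag{9}
\end{align}

where the superscript $T$ denotes transposition. The values $P(X_t^i = 1)$ and $\rho_t$ can be used to measure the degree of serial and parallel processing as indicated in Table 2.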
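Eqs (4) to (9) translate directly into code once the state distribution at time t is known. The following sketch takes the state distribution and the state-specific attention probabilities as inputs; the function name is illustrative, and the example values are the first two HMM parameter settings listed in Table 3.

```python
def attention_moments(pi_t, alpha):
    """Marginal attention probability and pairwise correlation at one time step.

    pi_t: distribution of the hidden state C_t (length 3).
    alpha: probabilities of attending stimulus 1 given each hidden state.
    Returns (P(X = 1), rho_t).
    """
    p1 = sum(w * a for w, a in zip(pi_t, alpha))       # Eqs (4)-(5)
    var = p1 * (1.0 - p1)                              # Eq (6)
    e_xx = sum(w * a * a for w, a in zip(pi_t, alpha)) # Eq (7)
    cov = e_xx - p1 * p1                               # Eq (8)
    return p1, cov / var                               # Eq (9)


alpha = (0.95, 0.45, 0.1)
p_case1, rho_case1 = attention_moments((0.9, 0.1, 0.0), alpha)
p_case2, rho_case2 = attention_moments((0.5, 0.05, 0.45), alpha)
print(p_case1, rho_case1)
print(p_case2, rho_case2)
```

The first setting gives an extreme probability with a weak correlation, and the second a non-extreme probability with a strong correlation. Both fall in the serial cells of Table 2, for different reasons.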

A mixture of three binomials. By marginalizing out the hidden state $C_t$, the HMM structure implies that at each time point t the neuronal attention behavior for the n neurons follows a mixture of three binomial distributions, $\mathrm{Bin}_3(\pi_t, \alpha, n)$. Here, $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ are the probability parameters of the three binomials, and the weights are given by $\pi_t$. The number of binomial trials equals the number of simultaneously recorded neurons n. The PMF for the mixture of three binomials is

\begin{equation}
f_{\mathrm{Bin}_3}(z \mid \pi_t, \alpha, n) = \pi_{t,1} f(z \mid \alpha_1, n) + \pi_{t,2} f(z \mid \alpha_2, n) + (1 - \pi_{t,1} - \pi_{t,2}) f(z \mid \alpha_3, n), \tag{10}
\end{equation}

where $f(z \mid \alpha_k, n) = \binom{n}{z} \alpha_k^{z} (1 - \alpha_k)^{n-z}$ is the PMF of the binomial distribution.

The $D_n$ statistic is calculated using Eq (1). For the mixture of three binomials in (10), the asymptotic version is given by

\begin{equation}
D^* = \lim_{n \to \infty} D_n = 2 \left( \pi_{t,1} |\alpha_1 - 0.5| + \pi_{t,2} |\alpha_2 - 0.5| + (1 - \pi_{t,1} - \pi_{t,2}) |\alpha_3 - 0.5| \right). \tag{11}
\end{equation}
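As a check of Eqs (1), (10) and (11), the sketch below evaluates the mixture PMF and both deviation statistics for the HMM parameter settings listed in Table 3. The function names are illustrative.

```python
from math import comb


def binom_pmf(z, a, n):
    """PMF of an ordinary binomial distribution."""
    return comb(n, z) * a ** z * (1.0 - a) ** (n - z)


def mixture_pmf(z, pi_t, alpha, n):
    """PMF of the mixture of three binomials, Eq (10)."""
    return sum(w * binom_pmf(z, a, n) for w, a in zip(pi_t, alpha))


def d_n(pi_t, alpha, n):
    """Deviation statistic of Eq (1) for the mixture of binomials."""
    return sum(abs(z - n / 2) * mixture_pmf(z, pi_t, alpha, n)
               for z in range(n + 1)) / (n / 2)


def d_star(pi_t, alpha):
    """Asymptotic deviation statistic, Eq (11)."""
    return 2.0 * sum(w * abs(a - 0.5) for w, a in zip(pi_t, alpha))


alpha = (0.95, 0.45, 0.1)
cases = {
    1: (0.9, 0.1, 0.0),
    2: (0.5, 0.05, 0.45),
    3: (0.3, 0.45, 0.25),
    4: (0.05, 0.7, 0.25),
}
for case, pi_t in cases.items():
    print(case, round(d_n(pi_t, alpha, 10), 3), round(d_star(pi_t, alpha), 3))
```

Both statistics decrease from Case 1 to Case 4, so the four settings are ordered from most serial to most parallel.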



Fig 4a illustrates how the probability and the correlation affect serial and parallel processing for the HMM using $n = 10$ neurons for four different parameter settings. The parameter settings are shown in Table 3, together with the corresponding calculated attention probabilities, correlations, and $D_{10}$ and $D^*$ values. Only when p is not close to 0 or 1 and the correlation is weak do the 10 neurons tend to split between the two stimuli, indicating parallel processing. Otherwise, a majority of the neurons attend to the same stimulus, suggesting serial processing. The four cases from Case 1 to 4 show an increasing degree of parallel processing. In all cases $D^* < D_n$, so if more neurons are involved, we expect clearer parallel processing for the given parameters.

[Fig 4 graphic. Panel a: Hidden Markov model; panel b: Correlated binomial model. x-axis: number of neurons attending stimulus 1 (0 to 10); y-axis: probability; one curve each for Cases 1 to 4.]

Fig 4. The probability mass function of the number of neurons attending to stimulus 1. The four sets of parameter values are given in Table 3. a) HMM leading to a mixture of three binomials. b) CBM.

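Table 3. Parameter values, probabilities of attention, correlation, and the deviation statistics $D_{10}$ and $D^*$ for the HMM and the CBM.

| | $\pi_{t,1}$ | $\pi_{t,2}$ | $\pi_{t,3}$ | $P(X^i_t = 1)$ | $\rho$ | $D_{10}$ | $D^*$ |
|---|---|---|---|---|---|---|---|
| Hidden Markov model, $\alpha = (0.95, 0.45, 0.1)$ | | | | | | | |
| Case 1 | 0.9 | 0.1 | 0.0 | 0.9 | 0.25 | 0.836 | 0.82 |
| Case 2 | 0.5 | 0.05 | 0.45 | 0.543 | 0.691 | 0.823 | 0.815 |
| Case 3 | 0.3 | 0.45 | 0.25 | 0.513 | 0.407 | 0.586 | 0.515 |
| Case 4 | 0.05 | 0.7 | 0.25 | 0.388 | 0.165 | 0.426 | 0.315 |
| Correlated binomial model | | | | | | | |
| Case 1 | - | - | - | 0.1 | 0.1 | 0.82 | 0.82 |
| Case 2 | - | - | - | 0.1 | 0.9 | 0.98 | 0.98 |
| Case 3 | - | - | - | 0.45 | 0.1 | 0.33 | 0.19 |
| Case 4 | - | - | - | 0.45 | 0.9 | 0.93 | 0.91 |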

2.3.2 Correlated Binomial model

In the CBM the neurons are assumed to be directly correlated. The model was studied in [18, 19] and is denoted by $\mathrm{CBin}(n, p, \rho)$, where $n$ is the number of correlated Bernoulli trials (simultaneously recorded neurons in our model setting), $0 < p < 1$ is the probability $P(X^i_t = 1)$, and $\rho$ is the correlation coefficient. In this model the number of neurons $z$ attending stimulus 1 follows a mixture of two distributions. One is an ordinary binomial distribution with parameters $n$ and $p$. The other is a fully correlated distribution where $z \in \{0, n\}$, which can be viewed as a modified Bernoulli distribution with support $\{0, n\}$ and parameter $p$. The weight of the Bernoulli component is the correlation coefficient $\rho$. The probability mass function is given by

$$f_{\mathrm{CBin}}(z|n, p, \rho) = (1 - \rho) f(z|n, p) + \rho\, p^{z/n} (1 - p)^{(n-z)/n}\, I_{\{0,n\}}(z), \tag{12}$$

where $I_{\{0,n\}}(z)$ is the indicator function, which equals 1 for $z \in \{0, n\}$ and 0 otherwise.

As before, we assume that the distribution at the first step is identical for all stimulus pairs, whereas at all later steps the distribution depends on the stimulus pair. Thus, at $t = 1$ the simultaneously recorded neurons follow $\mathrm{CBin}(n, p_1, \rho_1)$, and at $t > 1$ they follow $\mathrm{CBin}(n, p_{t,m}, \rho_{t,m})$ for stimulus pair $m$. Unlike in the HMM, we do not assume a dependence structure over time: the behavior at each time step is independent of the behavior at other time steps. Instead, the correlation between simultaneously recorded neurons is modeled directly by the parameter $\rho$. Compared with the HMM, where the correlation is described through the attentional reassignment with a Markov chain, the CBM is more direct.

Denote by $C_t$ the hidden index indicating either the binomial ($C_t = 1$) or the Bernoulli ($C_t = 2$) component in the mixture. The probability of attention is obtained directly from the parameter $p_{t,m}$, and the correlation from $\rho_{t,m}$. The asymptotic version of the deviation statistic $D^*$, which measures the degree of serial and parallel processing, is given by

$$D^* = 2(1 - \rho)|p - 0.5| + \rho. \tag{13}$$

Fig 4b shows the PMF of the correlated binomial distribution for four different parameter settings, shown in Table 3, together with the $D_{10}$ and $D^*$ values.
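A minimal sketch of the correlated binomial PMF in Eq (12) and of $D^*$ in Eq (13) is given below. The fully correlated component is written out explicitly for $z = 0$ and $z = n$, where the factor $p^{z/n}(1-p)^{(n-z)/n}$ reduces to $1 - p$ and $p$, respectively. The function names are hypothetical.

```python
from math import comb


def cbin_pmf(z, n, p, rho):
    """Eq (12): (1 - rho) * Binomial(n, p) + rho * point masses on {0, n}."""
    binom = comb(n, z) * p**z * (1.0 - p) ** (n - z)
    if z == 0:
        extreme = 1.0 - p      # p^(0/n) * (1 - p)^(n/n)
    elif z == n:
        extreme = p            # p^(n/n) * (1 - p)^(0/n)
    else:
        extreme = 0.0          # indicator I_{0,n}(z) is zero
    return (1.0 - rho) * binom + rho * extreme


def d_star_cbin(p, rho):
    """Eq (13): asymptotic deviation statistic of the CBM."""
    return 2.0 * (1.0 - rho) * abs(p - 0.5) + rho
```

For the four CBM cases in Table 3, `d_star_cbin` evaluates to 0.82, 0.98, 0.19 and 0.91.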

2.4 Likelihood functions and model fitting

The spike trains are modelled by point processes using conditional intensity functions (CIF) [20,21], see also [16]. Suppose a spike train $y$ in the interval $[T_s, T_e]$ contains the spike times $y = \{t_1, t_2, \dots\}$ with $T_s \le t_1 < t_2 < \cdots \le T_e$, and that the neuron attends to the same stimulus during the entire interval. The probability of observing $y$ given the attended stimulus $x_t$ is given by [21,22]

$$P(y|x_t) = \left[\prod_{\tau \in y} h(\tau|H_\tau; x_t)\right] \exp\left\{-\int_{T_s}^{T_e} h(s|H_s; x_t)\,ds\right\}, \tag{14}$$

where $H_s$ is the spike history up to time $s$, and $h(s|H_s; x_t)$ is the conditional intensity function, which we model using

$$h(s|H_s; x_t) = r \exp\left(\sum_{j=1}^{10} \beta_j \Delta N_{s-ju}\right). \tag{15}$$

The base firing rate $r := r_i$ is neuron specific and a function of the attended stimulus and its location (contra- or ipsilateral). Note that only the attended stimulus is relevant, not the condition. For each neuron, there are therefore 7 rate parameters, representing T, NI and NC at either side, and a parameter for NO. The exponential term models the influence of the spikes during the previous 10 ms on the neuronal activity. The constant $u = 1$ ms is the discretization unit determined by the experiment, and $\Delta N_t$ denotes whether there is a spike ($\Delta N_t = 1$) or not ($\Delta N_t = 0$) in the time interval $[t, t + u)$. For simplicity, we assume that only past spikes of the neuron itself have an effect. All neurons are assumed to share the same set of parameters $\beta_j$, $j = 1, 2, \dots, 10$.
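To make Eqs (14) and (15) concrete, the sketch below evaluates a discretized log-likelihood of one spike train on a grid of bins of width $u$. It is an approximation under stated assumptions: the integral in Eq (14) is replaced by a Riemann sum, the rate $r$ is taken in spikes per second with $u$ in seconds, and bins before the start of the train are treated as empty. The function name is hypothetical.

```python
from math import exp, log


def cif_log_likelihood(spikes, r, beta, u=0.001):
    """Discretized log of Eq (14) with the CIF of Eq (15).

    spikes: list of 0/1 values, one per bin of width u seconds.
    r: base firing rate (spikes per second) for the attended stimulus.
    beta: history weights beta_1..beta_J (J = 10 in the paper).
    """
    loglik = 0.0
    for s, dn in enumerate(spikes):
        # History term: weighted spikes in the J preceding bins (missing bins count as 0).
        hist = sum(b * spikes[s - j] for j, b in enumerate(beta, start=1) if s - j >= 0)
        h = r * exp(hist)                  # Eq (15)
        loglik += dn * log(h) - h * u      # spike term and integral term of Eq (14)
    return loglik
```

With all weights set to zero the model reduces to a homogeneous Poisson process, so a silent train of 100 bins at $r = 50$ has log-likelihood $-50 \times 0.1 = -5$.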

Let $\mathcal{M}$ denote the considered conditions (stimulus pairs) and let $|\mathcal{M}|$ denote the number of conditions. For simplicity, we do not always distinguish between all 12 conditions shown in Table 1, but sometimes merge them into classes so that there are fewer parameters to estimate. In particular, we will consider the three classes of conditions indicated in the table, defined by whether there is a target in the stimulus pair and, if there is, whether it is contra- or ipsilateral. Under condition $m$, let the set $\mathcal{K}_m$ contain all the conducted trials. In trial $k$, let the set $\mathcal{N}_k$ contain all the simultaneously recorded neurons, let $y^{\mathcal{N}_k}_t$ denote the spike trains from these neurons in the $t$'th interval, and likewise let $X^{\mathcal{N}_k}_t$ denote the hidden attentional states. Each $\mathcal{N}_k$ is a subset of the set $\mathcal{N}$ of all neurons used in the session, $\mathcal{N}_k \subseteq \mathcal{N}$, because not all neurons are used in all trials (S1 Fig A).

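2.4.1 Hidden Markov Model

We denote the conditional probability of the $\mathcal{N}_k$ spike trains at time $t$ given $C_t$ by a diagonal matrix:

$$\mathbf{P}(y^{\mathcal{N}_k}_t|C_t) = \begin{pmatrix} P(y^{\mathcal{N}_k}_t|C_t = 1) & 0 & 0 \\ 0 & P(y^{\mathcal{N}_k}_t|C_t = 2) & 0 \\ 0 & 0 & P(y^{\mathcal{N}_k}_t|C_t = 3) \end{pmatrix}. \tag{16}$$

The likelihood function of all spike trains in one session is then given by

$$L = \prod_{m \in \mathcal{M}} \prod_{k \in \mathcal{K}_m} \left\{ \boldsymbol{\lambda}\, \mathbf{P}(y^{\mathcal{N}_k}_1|C_1) \prod_{t=2}^{T} \left[\boldsymbol{\Gamma}_m \mathbf{P}(y^{\mathcal{N}_k}_t|C_t)\right] \right\}. \tag{17}$$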

By conditioning on the hidden attentional states $X^{\mathcal{N}_k}_t$, we obtain

$$\begin{aligned} P(y^{\mathcal{N}_k}_t|C_t) &= \prod_{i \in \mathcal{N}_k} P(y^i_t|C_t) \\ &= \prod_{i \in \mathcal{N}_k} \left[ P(X^i_t = 1|C_t) P(y^i_t|X^i_t = 1) + P(X^i_t = 0|C_t) P(y^i_t|X^i_t = 0) \right] \\ &= \prod_{i \in \mathcal{N}_k} \left[ \alpha_{C_t} P(y^i_t|X^i_t = 1) + (1 - \alpha_{C_t}) P(y^i_t|X^i_t = 0) \right], \end{aligned} \tag{18}$$

where $P(y^i_t|x^i_t)$ is given in Eq. (14). We obtain MLEs of the parameters by maximizing the likelihood function. The parameters to be inferred are summarized in Table 4.
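The inner term of Eq (17) for a single trial can be sketched with a forward recursion that uses the state-conditional likelihoods of Eq (18). This is an illustration rather than the authors' implementation: the final sum over hidden states is an assumption made here to obtain a scalar likelihood, and a practical version would rescale or work on the log scale to avoid numerical underflow. All names are hypothetical.

```python
def emission(lik1, lik0, alpha):
    """Eq (18): P(y_t | C_t = c) for each hidden state c.

    lik1[i], lik0[i]: P(y_t^i | X_t^i = 1) and P(y_t^i | X_t^i = 0) for neuron i.
    """
    out = []
    for a in alpha:
        prod = 1.0
        for l1, l0 in zip(lik1, lik0):
            prod *= a * l1 + (1.0 - a) * l0
        out.append(prod)
    return out


def hmm_trial_likelihood(lam, Gamma, alpha, lik1_by_t, lik0_by_t):
    """Inner term of Eq (17) for one trial, summed over the final hidden state."""
    e = emission(lik1_by_t[0], lik0_by_t[0], alpha)
    fwd = [l * x for l, x in zip(lam, e)]                 # lambda P(y_1 | C_1)
    n_states = len(fwd)
    for l1, l0 in zip(lik1_by_t[1:], lik0_by_t[1:]):
        e = emission(l1, l0, alpha)
        # Multiply by Gamma_m, then by the diagonal matrix P(y_t | C_t).
        fwd = [sum(fwd[k] * Gamma[k][c] for k in range(n_states)) * e[c]
               for c in range(n_states)]
    return sum(fwd)
```

As a sanity check, when all per-neuron likelihoods equal 1 (uninformative data) the trial likelihood equals 1 for any valid $\boldsymbol{\lambda}$ and $\boldsymbol{\Gamma}_m$.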

2.4.2 Correlated binomial model

Under the CBM, the attention of the simultaneously recorded neurons follows a mixture of a binomial and a modified Bernoulli. The likelihood of the spike trains in condition $m$ at time $t$ in trial $k$, $y^{\mathcal{N}_k}_t$, is given by

$$P_m(y^{\mathcal{N}_k}_t) = (1 - \rho_{t,m}) \underbrace{\prod_{i \in \mathcal{N}_k} \left[ P(y^i_t|X^i_t = 1)\, p_{t,m} + P(y^i_t|X^i_t = 0)(1 - p_{t,m}) \right]}_{\text{binomial}} + \rho_{t,m} \underbrace{\left\{ p_{t,m} \prod_{i \in \mathcal{N}_k} P(y^i_t|X^i_t = 1) + (1 - p_{t,m}) \prod_{i \in \mathcal{N}_k} P(y^i_t|X^i_t = 0) \right\}}_{\text{modified Bernoulli}}, \tag{19}$$


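Table 4. Parameters to be estimated for each session in the HMM and the CB models.

| Name | Explanation | Dimension |
|---|---|---|
| Hidden Markov Model | | |
| $\boldsymbol{\lambda}_m = [\lambda_1 \;\; \lambda_2 \;\; 1 - \lambda_1 - \lambda_2]$ | Initial distribution, the same for all conditions $\mathcal{M}$ | 2 |
| $\boldsymbol{\Gamma}_m = \begin{pmatrix} \gamma^m_{11} & \gamma^m_{12} & 1 - \gamma^m_{11} - \gamma^m_{12} \\ \gamma^m_{21} & \gamma^m_{22} & 1 - \gamma^m_{21} - \gamma^m_{22} \\ \gamma^m_{31} & \gamma^m_{32} & 1 - \gamma^m_{31} - \gamma^m_{32} \end{pmatrix}$ | Transition probability matrix for each condition $m \in \mathcal{M}$ | $6\lvert\mathcal{M}\rvert$ |
| $\mathbf{A} = \begin{pmatrix} \alpha_1 & 1 - \alpha_1 \\ \alpha_2 & 1 - \alpha_2 \\ \alpha_3 & 1 - \alpha_3 \end{pmatrix}$ | Conditional probability of neuronal attention | 3 |
| $r_i$ | Base firing rates, different for each neuron in $\mathcal{N}$ | $7\lvert\mathcal{N}\rvert$ |
| $\beta$ | Weights in the CIF model, the same for all neurons $\mathcal{N}$ | 10 |
| Correlated Binomial Model | | |
| $\rho_{t,m}$ | Correlation coefficients for $m \in \mathcal{M}$ and $t = 1, \dots, T$ | $\lvert\mathcal{M}\rvert \cdot (T - 1) + 1$ |
| $p_{t,m}$ | Probability parameter for $m \in \mathcal{M}$ and $t = 1, \dots, T$ | $\lvert\mathcal{M}\rvert \cdot (T - 1) + 1$ |
| $r_i$ | Base firing rates, one for each neuron in $\mathcal{N}$ | $7\lvert\mathcal{N}\rvert$ |
| $\beta$ | Weights in the CIF model, the same for all neurons $\mathcal{N}$ | 10 |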

where $P(y^i_t|x^i_t)$ is given in Eq. (14). The likelihood of the data of an entire session is

$$L = \prod_{m \in \mathcal{M}} \prod_{k \in \mathcal{K}_m} \prod_{t=1}^{T} P_m(y^{\mathcal{N}_k}_t). \tag{20}$$

The parameters are summarized in Table 4.
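The CBM likelihood of Eqs (19) and (20) factorizes over trials and time steps, which the following sketch makes explicit. The data layout (a list of trials, each holding per-interval pairs of per-neuron likelihoods) and the function names are assumptions made for illustration only.

```python
from math import log


def cbm_step_likelihood(lik1, lik0, p, rho):
    """Eq (19): likelihood of the spike trains of one trial in one interval.

    lik1[i], lik0[i]: P(y_t^i | X_t^i = 1) and P(y_t^i | X_t^i = 0) for neuron i.
    """
    prod_mix, prod1, prod0 = 1.0, 1.0, 1.0
    for l1, l0 in zip(lik1, lik0):
        prod_mix *= l1 * p + l0 * (1.0 - p)    # binomial component
        prod1 *= l1                            # all neurons attend stimulus 1
        prod0 *= l0                            # all neurons attend stimulus 2
    return (1.0 - rho) * prod_mix + rho * (p * prod1 + (1.0 - p) * prod0)


def cbm_log_likelihood(trials, p, rho):
    """Log of Eq (20).

    trials: list of (m, steps), where steps[t] = (lik1, lik0) for interval t.
    p[m][t], rho[m][t]: CBM parameters of condition m at interval t.
    """
    total = 0.0
    for m, steps in trials:
        for t, (lik1, lik0) in enumerate(steps):
            total += log(cbm_step_likelihood(lik1, lik0, p[m][t], rho[m][t]))
    return total
```

For two neurons with likelihood pairs $(2, 1)$ and $(3, 1)$, $p = 0.5$ and $\rho = 0.5$, the binomial component is $1.5 \times 2 = 3$ and the Bernoulli component is $0.5 \times 6 + 0.5 \times 1 = 3.5$, giving a step likelihood of 3.25.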

We summarize the differences between the HMM and the CBM in Table 5. In both models, it is assumed that in the early stage, i.e., the first discretized interval from 100 ms to $100 + 400/T$ ms, neuronal attention is only affected by the position of the stimuli (ipsi- or contralateral) and not by the stimulus types (T, NI, NC or NO). This assumption is supported by the empirical finding, based on firing rate averaging, that attention is reallocated over time [4]. It is also assumed that under the same stimulus types the attentional parameters are identical, implying that in all trials of one condition the neurons follow the same distribution, and differences from trial to trial are due to randomness.

Table 5. Differences between the Hidden Markov model and the correlated binomial model.

| | HMM | Correlated binomial |
|---|---|---|
| Motivation | Extends the probability-mixing model with dynamic weight reassignment. | Treats neuronal attention as correlated binomial variables. |
| Neuronal correlation | Described through the Markov chain. | Modeled directly by parameters. |
| Parameter dimension | $13 + 2\lvert\mathcal{M}\rvert + 7\lvert\mathcal{N}\rvert$ | $12 + 2\lvert\mathcal{M}\rvert(T - 1) + 7\lvert\mathcal{N}\rvert$ |
| Meaning of $C$ | Hidden state of the Markov chain, each state giving different stimulus weights. Implies top-down control of attention. | State of neurons being either completely independent or fully positively correlated. |

2.5 Decoding the attentional state

Decoding means inferring the attended stimulus from the observations and the estimated parameters. To show the main idea, we suppress the time and neuron indices from the notation for the moment, denoting the hidden state by $C$, the attended stimulus by $X$ and the spike train data by $Y$. The posterior of $X$ given $Y = y$ is

$$P(X|Y = y) = \sum_c P(X|C = c, Y = y)\, P(C = c|Y = y). \tag{21}$$

The strategy is to first estimate $P(C = c|Y = y)$ and then $P(X|C = c, Y = y)$ conditional on $C = c$. We are particularly interested in the PMF and the deviation statistic of the attended stimuli, which we can calculate using $P(X|C = c, Y = y)$ for the different states $C$. In the following, the decoding is explained in more detail for the two models.

Decoding in the Hidden Markov model. First we decode the hidden states $C_t$ in the HMM. This is done at each discretized time step by the forward-backward algorithm. Let $y^{\mathcal{N}_k}_{s:t}$ denote the spike trains in intervals $s$ to $t$, for $1 \le s < t \le T$ in trial $k$, where $\mathcal{N}_k$ denotes the simultaneously recorded neurons in the $k$'th trial. The probability of $C_t$ conditional on the observed spike trains at all time intervals 1 to $T$ can be expressed as

$$P(C_t|y^{\mathcal{N}_k}_{1:T}) \propto P(y^{\mathcal{N}_k}_{t+1:T}|C_t)\, P(C_t|y^{\mathcal{N}_k}_{1:t}), \tag{22}$$

where

$$P(C_t|y^{\mathcal{N}_k}_{1:t}) \propto P(y^{\mathcal{N}_k}_t|C_t) \sum_{c_{t-1}} P(C_t|c_{t-1})\, P(c_{t-1}|y^{\mathcal{N}_k}_{1:t-1}) \tag{23}$$

is the forward probability, calculated recursively by a forward sweep from 1 to $T$, and

$$P(y^{\mathcal{N}_k}_{t+1:T}|C_t) = \sum_{c_{t+1}} P(y^{\mathcal{N}_k}_{t+2:T}|c_{t+1})\, P(y^{\mathcal{N}_k}_{t+1}|c_{t+1})\, P(c_{t+1}|C_t) \tag{24}$$

is the backward probability, calculated recursively by a backward sweep from $T$ to 1. When calculating the forward and backward probabilities, the likelihood conditional on the hidden state, $P(y^{\mathcal{N}_k}_t|C_t)$, is obtained by conditioning on the neuronal attention $\{x^i_t\}_{i \in \mathcal{N}_k}$:

$$P(y^{\mathcal{N}_k}_t|C_t) = \prod_{i \in \mathcal{N}_k} \sum_{x^i_t \in \{0,1\}} P(y^i_t|x^i_t)\, P(x^i_t|C_t). \tag{25}$$
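The recursions in Eqs (22) to (24) can be sketched as a standard forward-backward pass over the state-conditional likelihoods of Eq (25). The version below is a generic textbook formulation, not the authors' code. It renormalizes at every step, which leaves the posteriors unchanged because Eqs (22) and (23) are only defined up to proportionality, and it assumes that at least one state has positive likelihood at each step. The function names are hypothetical.

```python
def normalize(v):
    """Scale a list of non-negative numbers to sum to 1."""
    s = sum(v)
    return [x / s for x in v]


def forward_backward(lam, Gamma, emis):
    """Posterior P(C_t | y_{1:T}) for every t, following Eqs (22)-(24).

    emis[t][c] = P(y_t | C_t = c), e.g. computed with Eq (25).
    """
    T, K = len(emis), len(lam)
    # Forward sweep from 1 to T, Eq (23).
    fwd = [normalize([lam[c] * emis[0][c] for c in range(K)])]
    for t in range(1, T):
        prev = fwd[-1]
        fwd.append(normalize(
            [emis[t][c] * sum(Gamma[k][c] * prev[k] for k in range(K)) for c in range(K)]))
    # Backward sweep from T to 1, Eq (24); the last step has no future data.
    bwd = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        bwd[t] = normalize(
            [sum(bwd[t + 1][c] * emis[t + 1][c] * Gamma[k][c] for c in range(K))
             for k in range(K)])
    # Combine the two sweeps, Eq (22).
    return [normalize([f * b for f, b in zip(fwd[t], bwd[t])]) for t in range(T)]
```

With uninformative data (all entries of `emis` equal), the posterior at the first step equals $\boldsymbol{\lambda}$ and the posterior at the second step equals $\boldsymbol{\lambda}\boldsymbol{\Gamma}$, which gives a quick check of the indexing.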


After decoding the hidden state, $P(C_t|y^{\mathcal{N}_k}_{1:T})$, the next step is to decode $\{X^i_t\}_{i \in \mathcal{N}_k}$ conditional on $C_t$:

$$P(x^i_t|y^{\mathcal{N}_k}_t, C_t) = P(x^i_t|y^i_t, C_t) \propto P(y^i_t|x^i_t, C_t)\, P(x^i_t|C_t). \tag{26}$$

For all spike trains in trial $k$, $y^{\mathcal{N}_k}_{1:T}$, we have thus obtained the discrete posterior distributions of the hidden states, $P(C_t|y^{\mathcal{N}_k}_{1:T})$, and of the attended stimulus of each spike train, $P(X^i_t|y^i_t, C_t)$, at all time steps $t = 1, \dots, T$. This yields the marginal posterior $P(X^i_t|y^{\mathcal{N}_k}_{1:T}) = \sum_{C_t \in \{1,2,3\}} P(X^i_t|y^i_t, C_t)\, P(C_t|y^{\mathcal{N}_k}_{1:T})$.

At each time step $t$, conditional on $C_t$, spike trains are independent and the posterior probabilities $P(X^i_t|y^i_t, C_t)$ differ from spike train to spike train. Thus, the attended stimuli of all neurons follow a Poisson binomial distribution, a generalization of the ordinary binomial distribution in which each Bernoulli trial has a distinct success probability [23]. The PMF of the Poisson binomial distribution is calculated numerically using methods from [24]. Marginalizing out $C_t$, at each time step $t$ we then have a mixture of three Poisson binomial distributions. The PMF of this mixture distribution gives the probabilities of the number of neurons that have attended stimulus 1, conditional on their observed spike trains. The deviation statistic $D_n$ can also be obtained from the PMF.
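For a small number of neurons, the Poisson binomial PMF and the mixture over hidden states can be computed with a simple dynamic-programming recursion, sketched below. This is a swapped-in technique chosen for brevity and is not necessarily the method of [24]. The function names and argument layout are hypothetical.

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli(probs[i]) trials.

    Adds one trial at a time: each trial either keeps the count or raises it by 1.
    """
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for z, mass in enumerate(pmf):
            nxt[z] += mass * (1.0 - p)
            nxt[z + 1] += mass * p
        pmf = nxt
    return pmf


def mixture_pmf(state_post, probs_by_state):
    """Mixture over hidden states c: sum_c P(C_t = c | y) * PoissonBinomial(probs_by_state[c]).

    probs_by_state[c][i] is the posterior P(X_t^i = 1 | y_t^i, C_t = c) of neuron i.
    """
    n = len(probs_by_state[0])
    mix = [0.0] * (n + 1)
    for w, probs in zip(state_post, probs_by_state):
        for z, mass in enumerate(poisson_binomial_pmf(probs)):
            mix[z] += w * mass
    return mix
```

When all success probabilities are equal the recursion reduces to the ordinary binomial, e.g. `poisson_binomial_pmf([0.5, 0.5])` returns `[0.25, 0.5, 0.25]`.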

Decoding in the correlated binomial model. In the CBM, spike trains at different time steps and in different trials are independent (except for the memory component, the exponential term in Eq. (15)). Thus, decoding can simply be done independently for each discretized time step in each trial. Now, let $C_t$ be an index indicating either the binomial or the Bernoulli component in the mixture. As previously, we first decode $C_t$ by calculating $P(C_t|y^{\mathcal{N}_k}_t)$, then find the PMF by calculating $P(X^i_t|y^i_t, C_t)$. We have

$$P(C_t|y^{\mathcal{N}_k}_t) \propto P(y^{\mathcal{N}_k}_t|C_t)\, P(C_t), \tag{27}$$

where the two cases $C_t = 1$ and $C_t = 2$ are given by the two components in Eq. (19). Then for each case of $C_t$ we decode the attended stimulus $X^i_t$. When $C_t = 1$, i.e., the binomial case, $X^i_t$ is obtained for each spike train independently with $P(x^i_t|y^i_t, C_t = 1) \propto P(y^i_t|x^i_t, C_t = 1)\, P(x^i_t|C_t = 1)$, resulting in a Poisson binomial distribution. When $C_t = 2$, i.e., the fully correlated Bernoulli case, the attended stimuli of all neurons are the same, which is obtained by $P(x_t|y^{\mathcal{N}_k}_t, C_t = 2) \propto P(y^{\mathcal{N}_k}_t|x_t, C_t = 2)\, P(x_t|C_t = 2)$, and the result is still a modified Bernoulli. Finally, the PMF is a mixture of a Poisson binomial and a modified Bernoulli.
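The decoding step for the CBM can be sketched for a single interval as follows. The code assumes that the prior component weights are $P(C_t = 1) = 1 - \rho$ and $P(C_t = 2) = \rho$, as in the mixture of Eq (19). The function name and return layout are hypothetical.

```python
def cbm_decode(lik1, lik0, p, rho):
    """Decode one time step of the CBM, following Eq (27) and the two cases of C_t.

    lik1[i], lik0[i]: P(y_t^i | X_t^i = 1) and P(y_t^i | X_t^i = 0) for neuron i.
    Returns (posterior of C_t, per-neuron posteriors given C_t = 1,
             posterior that all neurons attend stimulus 1 given C_t = 2).
    """
    prod_mix, prod1, prod0 = 1.0, 1.0, 1.0
    for l1, l0 in zip(lik1, lik0):
        prod_mix *= p * l1 + (1.0 - p) * l0
        prod1 *= l1
        prod0 *= l0
    w_bin = (1.0 - rho) * prod_mix                      # P(y | C_t = 1) P(C_t = 1)
    w_ber = rho * (p * prod1 + (1.0 - p) * prod0)       # P(y | C_t = 2) P(C_t = 2)
    total = w_bin + w_ber
    post_c = (w_bin / total, w_ber / total)
    # C_t = 1: independent per-neuron posteriors (a Poisson binomial over neurons).
    post_x_bin = [p * l1 / (p * l1 + (1.0 - p) * l0) for l1, l0 in zip(lik1, lik0)]
    # C_t = 2: all neurons share one stimulus (a modified Bernoulli).
    post_x_ber = p * prod1 / (p * prod1 + (1.0 - p) * prod0)
    return post_c, post_x_bin, post_x_ber
```

For two neurons with likelihood pairs $(2, 1)$ and $(3, 1)$, $p = 0.5$ and $\rho = 0.5$, the component posteriors are $1.5/3.25$ and $1.75/3.25$, the per-neuron posteriors in the binomial case are $2/3$ and $3/4$, and the shared posterior in the Bernoulli case is $6/7$.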

3 Results

3.1 Simulated data

We first simulate spike train data and check whether our models and methods work properly on the simulated data. For both the HMM and the CBM, we consider three parameter settings. In all cases, we use 10 simultaneously recorded neurons, repeated for 20 trials. The base rates and response weights are the same for the three cases. We consider only one stimulus condition, such that each neuron has only two base rate parameters, one for the contralateral and one for the ipsilateral side.

The parameter values used in the simulations are shown in Table 6. For the HMM, we use a time step of 0.1 s and a total of 10 time steps. For the CBM, we use a time step of 0.1 s and a total of 5 time steps. S4 Fig shows the probabilities, correlations, $D^*$ and $D_n$ values as functions of time. Simulated example spike trains are also shown.
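One possible way to generate such data for a single CBM time step is sketched below. It is not the authors' simulation code. It assumes that attention is drawn from $\mathrm{CBin}(n, p, \rho)$ by first choosing the mixture component, and that spikes are generated bin by bin with probability $\min(1, h u)$, a Bernoulli approximation of the point process with the CIF of Eq (15), with rates in spikes per second and $u$ in seconds. The function names are hypothetical.

```python
import random
from math import exp


def sample_cbm_attention(n, p, rho, rng):
    """Draw attention indicators (1 = stimulus 1) for n neurons from CBin(n, p, rho)."""
    if rng.random() < rho:                       # fully correlated component
        x = 1 if rng.random() < p else 0
        return [x] * n
    return [1 if rng.random() < p else 0 for _ in range(n)]   # binomial component


def simulate_spike_train(rate, beta, n_bins, rng, u=0.001):
    """Simulate 0/1 spikes per bin; spike probability in a bin is min(1, h * u)."""
    spikes = []
    for s in range(n_bins):
        # History term of Eq (15); bins before the start count as empty.
        hist = sum(b * spikes[s - j] for j, b in enumerate(beta, start=1) if s - j >= 0)
        h = rate * exp(hist)
        spikes.append(1 if rng.random() < min(1.0, h * u) else 0)
    return spikes
```

A usage example with values taken from Table 6 (first neuron, first CBM case and time step): `rng = random.Random(0)`, `x = sample_cbm_attention(10, 0.7, 0.6, rng)`, and then `simulate_spike_train(80.0 if x[0] == 1 else 20.0, beta, 100, rng)` with `beta = [-2, -0.5, 0, 0.5, 0.2, 0.1, 0.1, 0.1, 0.05, 0.05]`.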


Table 6. Parameters used to simulate data from the HMM and the CBM. Different $\rho$ and $p$ parameter values are used at different time steps $t$. Firing rates and weight values for the 10 simulated neurons are the same in the two models. The base firing rate is denoted by $r_{i,j}$ for stimulus $i$ and neuron $j$. The contralateral stimulus is represented by $i = 1$ and the ipsilateral stimulus by $i = 2$. The memory weights are the same for all neurons, denoted by $\beta$.

Hidden Markov Model, $\alpha = (0.91, 0.45, 0.15)$

| | $\boldsymbol{\lambda}$ | $\boldsymbol{\Gamma}$ |
|---|---|---|
| Case 1 | $[0.3 \;\; 0.65 \;\; 0.5]^T$ | $\begin{pmatrix} 0.95 & 0.02 & 0.03 \\ 0.1 & 0.8 & 0.1 \\ 0.5 & 0.2 & 0.3 \end{pmatrix}$ |
| Case 2 | $[0.5 \;\; 0.4 \;\; 0.1]^T$ | $\begin{pmatrix} 0.3 & 0.6 & 0.1 \\ 0.1 & 0.3 & 0.6 \\ 0.01 & 0.01 & 0.98 \end{pmatrix}$ |
| Case 3 | $[0.7 \;\; 0.15 \;\; 0.15]^T$ | $\begin{pmatrix} 0.2 & 0.1 & 0.7 \\ 0.05 & 0.9 & 0.05 \\ 0.7 & 0.1 & 0.2 \end{pmatrix}$ |

Correlated Binomial Model

| | $\rho_t$ ($t = 1, \dots, 5$) | $p_t$ ($t = 1, \dots, 5$) |
|---|---|---|
| Case 1 | (0.6, 0.6, 0.3, 0.15, 0.1) | (0.7, 0.65, 0.72, 0.75, 0.75) |
| Case 2 | (0.2, 0.2, 0.3, 0.4, 0.4) | (0.65, 0.7, 0.75, 0.8, 0.8) |
| Case 3 | (0.3, 0.3, 0.4, 0.45, 0.5) | (0.6, 0.55, 0.4, 0.3, 0.3) |

Base rates $r_{i,j}$ ($i = 1, 2$ and $j = 1, \dots, 10$)

| Stimulus | $j = 1$ | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| contralateral ($i = 1$) | 80 | 80 | 80 | 90 | 60 | 70 | 80 | 70 | 50 | 40 |
| ipsilateral ($i = 2$) | 20 | 10 | 30 | 10 | 30 | 20 | 10 | 20 | 10 | 20 |

Memory weights

$\beta = (-2, -0.5, 0, 0.5, 0.2, 0.1, 0.1, 0.1, 0.05, 0.05)$

We apply the model fitting to the simulated data. The MLEs are shown in Figs 5 and 6, together with the true parameters, for the HMM and the CBM, respectively. The simulation and model fitting procedure is repeated 100 times.

The $D^*$ and $D_n$ values are computed from the parameter estimates, and are shown in Fig 7 together with the true $D^*$ and $D_n$ values.

Finally, we also perform decoding analysis using the estimated parameters for each trial. The $D_n$ values from the decoding are plotted in S5 Fig together with the $D_n$ values computed directly from the parameter estimates.

The conclusion from this simulation study is that the parameters can be successfully estimated, and that the $D_n$ and $D^*$ values computed from the parameter estimates are close to the true values. The $D_n$ values from the decoding analysis have large variances, due to the small sample size of 10 neurons. However, the median $D_n$ values from the 100 decoding repetitions are often close to the encoding results.


[Fig 5 graphic. Three rows of panels, one row per case; in each row the panels show the estimates of Lambda, Gamma, A, Rates and Weights. Legend: quantiles of estimates; true values.]

Fig 5. Parameter estimates of the hidden Markov model from the simulation study. The estimates are shown as quantiles of the 100 repetitions. The dashed lines represent the full 0−100% quantiles, and the solid lines represent the 25%−75% quantiles. The central dot is the median. The red lines are the true values used in the simulation. The upper, middle and lower panels represent the three cases.

[Fig 6 graphic. Three rows of panels, one row per case; in each row the panels show the estimates of Correlation, Probability, Rates and Weights. Legend: quantiles of estimates; true values.]

Fig 6. Parameter estimates of the correlated binomial model from the simulation study. The estimates are shown as quantiles of the 100 repetitions. The dashed lines represent the full 0−100% quantiles, and the solid lines represent the 25%−75% quantiles. The central dot is the median. The red lines are the true values used in the simulation. The upper, middle and lower panels represent the three cases.

[Fig 7 graphic. Panel a: HMM, Cases 1 to 3 (time steps 1 to 10); panel b: CBM, Cases 1 to 3 (time steps 1 to 5). x-axis: time step; y-axis: deviation statistic (0 to 1). Legend: $D_n$, $D^*$, true values.]

Fig 7. Deviation statistics values computed from parameter estimates and true parameters. The estimates of $D_n$ are shown in blue and the estimates of $D^*$ are shown in green as quantiles of the 100 repetitions. The dashed lines represent the full 0−100% quantiles, and the solid lines represent the 25%−75% quantiles. The blue dots are the medians. The red dots are the true values used in the simulation. a) HMM. b) CBM.

3.2 Experimental data

Both models were fitted to the experimental spike train data from [4]. For a discretization with $T$ steps, an equal length of $400/T$ ms was assigned to all time intervals. Three different discretizations, $T = 3$, 5 or 10, were used, together with two different groupings of conditions: either all 12 conditions, or only 3 classes determined by whether there is a target in the stimulus pair and, in that case, whether it is contra- or ipsilateral (see Table 1). The models were fitted to each of the 48 sessions independently.
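The discretization can be sketched as follows, assuming the analysis window starts 100 ms after stimulus onset and lasts 400 ms, as stated for the first interval in Section 2.4, and that intervals are half-open $[a, b)$ (an assumption made here). The function names are hypothetical.

```python
def interval_edges(T, start_ms=100.0, duration_ms=400.0):
    """Edges (ms after stimulus onset) of the T equal-length analysis intervals."""
    width = duration_ms / T
    return [start_ms + k * width for k in range(T + 1)]


def bin_spike_times(spike_times_ms, T, start_ms=100.0, duration_ms=400.0):
    """Assign spike times to the T intervals; spikes outside the window are dropped."""
    width = duration_ms / T
    bins = [[] for _ in range(T)]
    for s in spike_times_ms:
        k = int((s - start_ms) // width)     # index of the half-open interval containing s
        if 0 <= k < T:
            bins[k].append(s)
    return bins
```

For $T = 5$ the edges are 100, 180, 260, 340, 420 and 500 ms, so each interval is 80 ms long.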

3.2.1 Parameter estimation in HMM

Fig 8 illustrates parameter estimates for the HMM using different condition and step number settings. Fig 8a shows the probability of attending to the stimulus at the contralateral side, $p_t = P(X_t = 1)$, as kernel density plots of all 48 estimates. Three line types (solid, dashed and dotted) indicate the three time steps, and four colors represent four types of conditions. At $t = 1$ all conditions follow the same distribution, so there is a single black curve. For the subsequent time steps, the condition types are: stimulus pairs with T on the ipsilateral side; stimulus pairs with T on the contralateral side; stimulus pairs with NO on the ipsilateral side; and stimulus pairs with NO on the contralateral side. The figure illustrates that neuronal attention slightly prefers the contralateral stimulus right after stimulus onset (the black density curve is centered slightly above 0.5), and later on tends to follow T and avoid NO. Note that here we conduct model inference using all 12 conditions, and only combine similar conditions for presentation.

In Fig 8b, the estimates of the correlation $\rho_t$ are plotted against the estimates of $|p_t - 0.5|$ (the distance of the probability of attending the contralateral stimulus from 0.5, or probability "extremeness") for each time step $t = 1, \dots, 5$, on top of a two-dimensional kernel density estimate (bandwidth: 0.25) of the points shown as heatmaps. There are 48 estimates in the leftmost panel at $t = 1$ (no difference between conditions), and $48 \times 12$ estimates in each of the remaining panels, from 12 conditions in 48 sessions. A straight line is plotted on the anti-diagonal for easier reading. The lower left region of the heatmap represents a tendency towards parallel processing, and all other regions represent a tendency towards serial processing. In the leftmost panel, corresponding to the first time step, a large portion of the estimates fall in the lower left region. At later time steps, the estimates tend to move to the right and upper regions. This implies that stimuli tend to be processed in parallel at an early stage, whereas later on more and more neurons share the same attended stimulus, in the form of serial processing. Despite this shift, there are points lying on both sides of the straight line at all time steps. This is evidence supporting both processing mechanisms at all time steps throughout the entire spike train.

In Fig 8c we investigate the asymptotic deviation statistic $D^*$. The average $D^*$ is calculated over the 48 session estimates for each condition. The left panel shows the $D^*$ values obtained from parameter estimates using 3 classes of merged conditions with $T = 5$. The remaining panels show results using all 12 conditions, with $T = 3$, 5 and 10 for the middle left, the middle right and the right panels, respectively. In all cases, $D^*$ grows over time, implying stronger serial processing. Further, different settings of discretization and condition merging give different results. The differences caused by using a larger $T$ may be due to smaller sample sizes (shorter spike trains with only a few spikes).

3.2.2 Parameter estimation in correlated binomial

The estimates of the CBM are shown in Fig 9. The results are similar to those of the HMM. In Fig 9b, we see apparent parallel processing at $t = 1$, while later on the correlation for most estimates goes to either 1 or 0, and the probability becomes more extreme. For $t > 1$, in most of the $48 \times 12$ estimates the weight parameter of the mixture (the correlation coefficient) is close to either 1 or 0, meaning that one component dominates the other. This is because of the small number of simultaneously recorded neurons in most trials (see S1 Fig), which is insufficient for obtaining good estimates in a mixture model. This is a weakness of the CBM, since it only contains two extreme components representing either full independence or full correlation. Fitting the CBM on limited sample sizes can bias the correlation parameter. To check this suspicion, we looked at the estimates from session "MN110411", the right-most neuron in S1 Fig c with the largest number of simultaneously recorded neurons, and found that the estimates of the correlation are spread almost uniformly between 0 and 1, indicating that the estimates of either 0 or 1 of the correlation in other sessions can be an artefact of small sample sizes.

3.2.3 Decoding

Here we decode the attended stimulus of each neuron conditional on the observed spike trains. The parameters used in the decoding algorithms are the estimates obtained by MLE. For the HMM we show results using $T = 3$, $T = 5$ and $T = 10$, and for the CBM using only $T = 3$.

Fig 10 shows the decoding of the attended stimulus for an example trial containing 10 simultaneously recorded spike trains in session "MN110411", condition NO-T. The same data set is decoded using the HMM with $T = 3$, 5 and 10, and the CBM with $T = 3$. The values of the $D_n$ statistic are calculated from the decoded probabilities of a single trial containing simultaneously recorded neurons.

In Fig 11 we show box-plots of the $D_n$ values from multiple trials across all sessions as a function of time. Again we include four cases: HMM with $T = 3$, 5 and 10, and CBM with $T = 3$. If a trial has too few simultaneously recorded spike trains, the $D_n$ values will be biased, and there will be large variance across trials. For example, in the simulation study in Fig 4, we see wide quantile ranges for the decoding results even with very good parameter estimates. For this reason, we only consider trials with at least 10 simultaneously recorded spike trains. Note that the minimum number in a trial here is different from the number of simultaneously recorded neurons in a session, because in many trials not all simultaneously recorded neurons are used. We pre-selected data such that the number of simultaneously recorded neurons in a session is at least 5, but in most trials there can be fewer simultaneously recorded spike trains. We see a similar trend as in the encoding results: the $D_n$ values increase with time, indicating stronger serial processing. The degree of serial processing becomes maximal starting from around the middle of the spike train. In particular, for the HMM with $T = 5$ (top right), the $D_n$ box-plot at $t = 2$ shows a smaller median and larger variance than at $t = 3$. For the HMM with $T = 10$, the box-plots at $t = 2, 3, 4$ show large variance (reaching low values) compared with $t \ge 5$. Note that the similar results in Figs 8c and 9c are prior measures based on estimated parameters, whereas the plots in Fig 11 are posterior measures based on the decoded attended stimulus for specific spike train data. Finally, in all models and at all time steps, there is evidence of both parallel and serial processing, implied by the wide box-plots.

[Fig 8 graphic. Panel a: kernel density of $P(X = 1)$ (x-axis: $P(X = 1)$; y-axis: density) for condition types {NO,NI,NC} − T, T − {NO,NI,NC}, {T,NI,NC} − NO and NO − {T,NI,NC} at t = 1, 2, 3. Panel b: correlation vs $|P(X = 1) - 0.5|$, one panel per time step; legend: T − {NI,NC,NO}, {NI,NC,NO} − T, other conditions. Panel c: $D^*$ vs time step for the conditions NO − T, T − NO, NI − T, NC − T, T − NI, T − NC, NO − NI, NO − NC, NI − NO, NC − NO, NC − NI, NI − NC.]

Fig 8. Experimental data: Results for the HMM. a) Kernel density representation of the estimates of $P(X = 1)$, i.e., the probability of a neuron attending to the contralateral stimulus, obtained using $T = 3$ and all 12 conditions. b) Correlation estimates vs probability extremeness estimates at the different time steps, on top of a two-dimensional kernel density estimate as heatmaps, obtained using $T = 5$ and all 12 conditions. c) Estimates of $D^*$ using 3 merged conditions with $T = 5$ (left), all 12 conditions with $T = 3$ (middle left), $T = 5$ (middle right), and $T = 10$ (right).

[Fig 9 graphic. Panel a: kernel density of $P(X = 1)$ at t = 1, 2, 3 for the same condition types as in Fig 8a. Panel b: correlation vs $|P(X = 1) - 0.5|$, one panel per time step. Panel c: $D^*$ vs time step.]

Fig 9. Experimental data: Results for the CBM. Fig a and b use all 12 conditions with $T = 3$. Fig c uses 12 conditions with $T = 3$ (left), 3 merged conditions with $T = 5$ (middle), and 12 conditions with $T = 5$ (right). See caption of Fig 8 for explanation.

3.2.4 Specific initial probabilities for each condition

Previously we have used the same initial probabilities for all conditions, i.e., neuronal attention right after stimulus onset is only affected by stimulus locations and not by stimulus types, which is supported by the original study [4]. Here we conduct a further analysis that discards this assumption and allows each condition to have its own initial probabilities. Doing so greatly increases the number of parameters to estimate, and the estimation and inference results of the mixture models


[Figure graphic. Repeated sets of panels: "Data" (spike rasters; x-axis: Time/s, 0.0 to 0.5), "Decode" (decoded $P(X = 1|d)$ per neuron and $P(C = c_1|d)$, $P(C = c_2|d)$, $P(C = c_3|d)$ per time step), and "PMF" (x-axis: Num; y-axis: Prob). Legend of the first set: t = 1, $D_n$ = 0.34; t = 2, $D_n$ = 0.89; t = 3, $D_n$ = 0.86. Legend of the second set: t = 1, $D_n$ = 0.25; t = 2, $D_n$ = 0.73; t = 3, $D_n$ = 0.89; t = 4, $D_n$ = 0.93; t = 5, $D_n$ = 0.92.]

0.4

0.3

0.4

0.1

0.1

0.4

0.4

0.1

0.3

0.2

0.3

0.3

0.3

0.1

0.2

0.3

0.1

0.3

0.2

0.0

0.3

0.2

0.2

0.3

0.2

0.1

0.0

0.1

0.2

0.3

0.2

0.2

0.2

0.1

0.2

0.3

0.3

0.0

0.2

0.3

0.1

0.2

0.2

0.2

0.0

0.0

0.0

0.2

0.1

0.1

0.1

0.1

0.1

0.0

0.1

0.0

0.0

0.1

0.1

0.0

0.0

0.0

0.1

0.0

0.1

0.0

0.0

0.1

0.1

0.0

0.0

0.1

0.1

0.0

0.1

0.0

0.0

0.2

0.1

0.0

0.1

0.1

0.1


0.0 0.4 0.5 0.6 0.7 0.7 0.9 0.9 0.9 0.9

0.2 0.6 0.5 0.4 0.3 0.3 0.1 0.1 0.1 0.1

0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

0 5 10 15 20

0.0

0.2

0.4

0.6

PMF

Num

Pro

b

t = 1 , Dn = 0.84t = 2 , Dn = 0.47t = 3 , Dn = 0.56t = 4 , Dn = 0.65t = 5 , Dn = 0.69t = 6 , Dn = 0.66t = 7 , Dn = 0.84t = 8 , Dn = 0.90t = 9 , Dn = 0.90t = 10 , Dn = 0.84

0.0 0.1 0.2 0.3 0.4 0.5

Data

Time/s

| | | | || | | | | | || | | | | || | | | | | | || | | | | | |

| || | | | || | | | || | | | | |||| | | | ||| | | | | | | || | |||| | | | || | | | | ||| | || ||| | | || | ||| | | || |

| | || ||| | | | | | | | | | || | || || | | | | |

| || || | | | ||| | | | | | | | | | | | | || | || | | || | | | |

| || | || | | | || | | || | | ||| | | | | | | | | || ||| | | | | | | |

Decode

P(X=1|d)

0.020.350.800.580.150.590.150.680.460.50

0.010.040.010.000.000.120.140.040.120.06

0.000.140.000.000.350.110.000.000.040.07

P(C=c1|d)P(C=c2|d)

0.82 0.48 1.00

0.18 0.52 0.00

0 5 10 15 20

0.0

0.2

0.4

0.6

PMF

Num

Pro

b

t = 1 , Dn = 0.34t = 2 , Dn = 0.89t = 3 , Dn = 0.86

a b c d

Fig 10. Decoding of an example trial using different models. All models useall 12 conditions. The top three panels show the results of the HMM model using T = 3,5 and 10. The bottom panel shows the CBM with T = 3. a) Ten simultaneouslyrecorded spike trains from a trial in session ”MN110411”. The dashed lines indicate thediscretization. b) The posterior probability of each spike train attending thecontralateral stimulus at each time step, with the dashed lines indicating the time stepscorresponding to the discretization in a. Estimates in red color indicate higherprobability of attending the contralateral stimulus and blue color indicates higherprobability of attending the ipsilateral stimulus. Note that the target is located in theipsilateral side. c) The posterior probability of the hidden state C responsible fortop-down control. For the HMM, the hidden state indicates the index of the binomialcomponent, and for the CBM, the first hidden state is the independent binomialcomponent and the second is the fully correlated Bernoulli component. d) The PMF ofthe number of neurons attending to the contralateral stimulus conditional on the spiketrain data for each time step, with the Dn values shown in the legend. These values arecalculated by Eq. (1) using the estimated PMFs.

July 20, 2018 22/31


https://doi.org/10.1101/383596


0.2

0.4

0.6

0.8

1.0

HMM, T=3

1 2 3

0.2

0.4

0.6

0.8

1.0

HMM, T=5

1 2 3 4 5

0.2

0.4

0.6

0.8

1.0

HMM, T=10

Time step

Dn

1 2 3 4 5 6 7 8 9

0.2

0.4

0.6

0.8

1.0

CBM, T=3

1 2 3

Fig 11. Decoding results of Dn. Only trials with more than 10 simultaneouslyrecorded neurons are included. Results are from the HMM using T = 3, 5 and 10, andthe CBM using T = 3, respectively, indicated by the titles of the figures.

become increasingly unreliable given limited data size and large noise. We only analyze 514

the most simple example for the HMM using three time steps with three merged 515

conditions. In Fig S6 Fig are shown the D∗ statistics obtained using parameter 516

estimates, similar to Figs 8c and 9c, but for the two settings: fixing the same initial 517

probabilities or assuming different initial probabilities for each condition. Though the 518

D∗ results are different in the two plots in Fig S6 Fig, the conclusion remains the same; 519

neuronal attention is more parallel right after stimulus onset and becomes more serial 520

later on. 521

4 Discussion

In this study we combine point process neuron models describing spike trains with neural interpretations of the serial and parallel processing hypotheses in visual search. We propose an HMM and a CBM to describe neuronal attention in neurophysiological measurements from prefrontal cortex in rhesus monkeys. Results show that parallel processing is favored in some sessions while serial processing is favored in other sessions, and there is evidence for both parallel and serial processing at all time steps. Overall, we see a tendency towards parallel processing in the early stage after stimulus onset, and serial processing in the late stage. This means that, right after stimulus onset, neurons tend to split to attend different stimuli, and later neurons become more synchronized, sharing the same attended stimulus. Furthermore, at the early stage neurons prefer the contralateral stimulus, while in the late stage neurons favor the T and avoid NO, which agrees with the study conducted by averaging across spike trains [4].

The early stage of parallel processing can be related to feedforward or bottom-up processing, where the sensory inputs are processed before the higher-level cognitive modulatory influences of recurrent feedback or top-down processing have begun [25,26]. In the later stage, where top-down signals have had time to modulate the attention, the neural activity tends to synchronize around the attended object, resembling serial processing. Similar results have been observed in event-related potentials in electroencephalography (EEG) measurements [27], where forward connections were found to be sufficient to explain the data in early periods after stimulus onset, whereas backward connections become essential after around 220 ms. Even if the exact timing of the switch between bottom-up and top-down signals is not clear, there is evidence that back projections play a prominent role after 200 ms, even though selective responses are elicited as early as 100 ms after stimulus onset (see [26] and references therein). The relative contribution of feedforward and feedback signals to visual perception has yet to be quantified, and the concepts of parallel and serial processing together with our suggested analysis tools provide a useful means of addressing this question.

Decoding analysis provides posterior probabilities of neuronal attention, yielding an estimate of the PMF and therefore also of Dn. This can be used to analyze attentional behavior for any given set of simultaneously recorded spike trains in future trials. The conclusions regarding parallel and serial processing drawn from the overall distribution of Dn over all trials and sessions in the decoding analysis are the same as in the prior analysis using only parameter estimates. Note that although the prior and posterior analyses provide similar results, the conclusions regarding neuronal attentional properties should be drawn from the prior analysis based on the MLE. The MLE gives the optimal estimation of the neuronal properties based on all the available data. The decoding analysis, on the other hand, estimates what a neuron's attention could have been during a specific trial based on the data from this trial, and the uncertainty of the decoding is represented by posterior distributions.
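To illustrate how decoded posterior probabilities can translate into a PMF, the following minimal sketch computes the distribution of the number of neurons attending the contralateral stimulus from per-neuron probabilities P(X=1|d). It assumes that the neurons are conditionally independent given the data and the hidden state, so that the count follows a Poisson binomial distribution [23,24]. Mixing over the hidden state C and computing Dn via Eq. (1) are not shown. The function name `poisson_binomial_pmf` and the input values are hypothetical and are not taken from the data.

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli trials.

    probs[i] is the (hypothetical) posterior probability that neuron i
    attends the contralateral stimulus. Returns a list pmf where pmf[k]
    is the probability that exactly k neurons do so.
    """
    pmf = [1.0]  # with zero neurons, the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this neuron attends the ipsilateral stimulus
            nxt[k + 1] += mass * p      # this neuron attends the contralateral stimulus
        pmf = nxt
    return pmf


# Illustrative probabilities only (assumption, not experimental values).
example_probs = [0.9, 0.8, 0.1, 0.5]
example_pmf = poisson_binomial_pmf(example_probs)
```

A PMF concentrated at 0 or at the total number of neurons would correspond to neurons sharing one attended stimulus, whereas mass on intermediate counts corresponds to neurons splitting between the two stimuli.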

In [4], parallel processing in the early stage was reported. The same conclusion is drawn from our analysis, where we find that the neurons prefer the contralateral stimulus in the early stage, and integrating both hemispheres gives simultaneous parallel processing. Furthermore, there exists not only such parallel processing considering the whole brain, but also parallel processing among neurons in a single recording site, as supported by our findings. Though the simultaneously recorded neurons in one location show a tendency towards the contralateral stimulus in the early stage, there is strong evidence that they split their attention between stimuli located on both sides in a parallel way.

The models here are fitted to the specific data set from [4] and the model structure contains the experimental conditions specific for this data set. However, with trivial adjustments, the models also apply to generic neurophysiological data that consist of simultaneously recorded spike trains. Currently the models and methods only support two stimuli, and a future extension is the generalization to an arbitrary number of stimuli.

The two models, the HMM and the CBM, yield different results regarding the degree of serial and parallel processing. This is partly because the two models are based on different assumptions. The biological reality of attention, which we try to describe with these simple models, is complicated, and the two models approximate this reality and explain neural attention from different perspectives. Further, the experimental data are noisy with limited sample size, and the models contain a large number of parameters, which leads to a large variance of the estimators. For one trial or session, the difference between the two models could be large, but the overall results of the two models over a large number of sessions lead to similar conclusions. However, it makes more sense to make comparisons under the same model. For example, we compare different conditions or different time steps only under the same model.
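As a rough illustration of the kind of assumption built into the CBM, the sketch below evaluates the PMF of the number of neurons attending one stimulus under a two-component mixture: an independent binomial component and a fully correlated Bernoulli component, in the spirit of the correlated binomial models in [18,19]. It is an assumption that this simplified form matches the CBM as specified earlier in the paper; the function name `cbm_count_pmf` and the parameter names `n`, `p` and `rho` are hypothetical.

```python
from math import comb


def cbm_count_pmf(n, p, rho):
    """PMF of the number of neurons (out of n) attending stimulus 1.

    With probability 1 - rho the neurons choose independently
    (binomial component); with probability rho they all share one
    Bernoulli(p) choice, so the count is either 0 or n.
    """
    pmf = [(1.0 - rho) * comb(n, k) * p**k * (1.0 - p)**(n - k)
           for k in range(n + 1)]
    pmf[0] += rho * (1.0 - p)  # all neurons attend stimulus 2
    pmf[n] += rho * p          # all neurons attend stimulus 1
    return pmf


# Two extreme, purely illustrative parameter settings.
independent = cbm_count_pmf(n=10, p=0.5, rho=0.0)   # binomial component only
synchronized = cbm_count_pmf(n=10, p=0.5, rho=1.0)  # all-or-none component only
```

With `rho = 0` the mass spreads over intermediate counts, which resembles parallel processing, whereas with `rho = 1` all mass sits on 0 and n, which resembles serial processing.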

Another issue is the variability between sessions for the same model. We assume that the whole prefrontal area follows a probabilistic model, and we want to estimate the model parameters. However, in each session we only have a small subset of 5 to 20 simultaneously recorded neurons from a recording site, and the number is even smaller for single trials (S1 Fig), with each neuron having its distinct firing rate and attentional pattern (Fig 2 and S2 Fig). Thus, there is a large variance of the estimates from session to session, and we obtain the overall result by averaging and applying kernel density estimation. To obtain more stable and accurate results, it would be beneficial to use a larger simultaneously recorded population of neurons.
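The pooling of session-wise estimates can be sketched as follows, under stated assumptions: a Gaussian kernel with a hand-chosen bandwidth is used, and the per-session values are made up for illustration. The kernel and bandwidth actually used in the analysis are not specified here, and the names `gaussian_kde` and `session_estimates` are hypothetical.

```python
from math import exp, pi, sqrt


def gaussian_kde(samples, bandwidth):
    """Return a function x -> Gaussian kernel density estimate at x.

    samples: per-session estimates (hypothetical values below).
    bandwidth: standard deviation of the Gaussian kernel, chosen by hand.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * sqrt(2.0 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density


# Hypothetical per-session estimates of P(X=1); not taken from the data.
session_estimates = [0.42, 0.55, 0.61, 0.48, 0.70, 0.52]
mean_estimate = sum(session_estimates) / len(session_estimates)

kde = gaussian_kde(session_estimates, bandwidth=0.05)
grid = [i / 100 for i in range(101)]        # evaluation points on [0, 1]
density_on_grid = [kde(x) for x in grid]
```

With few sessions the shape of the estimated density depends strongly on the bandwidth, which is one reason why a larger recorded population would give more stable results.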


Supporting information

S1 Fig. Data structure and sample sizes. a) Example of recorded neurons and condition within each trial in a daily session. The symbol × indicates that the neuron is recorded in the given trial. In this session, five neurons are recorded. Condition 1 was used in 23 trials, and in trials 1 and 2 three neurons are recorded, but not the same ones. b) Average number of trials per condition in 48 sessions. c) Average number of neurons per trial in 48 sessions. In all sessions, at least 5 neurons are recorded; however, not all enter in each trial. Histograms are based on 48 numbers (one for each session).

[S1 Fig panels: a) table with columns Condition, Trial and Neuron (1 to 5), with × marking the neurons recorded in each trial; b) histogram of the average number of trials per condition against frequency; c) histogram of the average number of neurons per trial against frequency.]

S2 Fig. Raster plots of measured spike trains recorded from an example cell (mj081029a 8 0). The 12 conditions are indicated in the titles of the subplots. Kernel smoothing estimates of the firing rates are shown in red. The stimulus on the left of the title is the stimulus on the contralateral side, and the one on the right is the stimulus on the ipsilateral side with respect to the recorded neuron. The dashed lines indicate the interval of the choice phase where two stimuli are shown.

[S2 Fig panels: T − NI, NI − T, NI − NC, NC − NI, T − NC, NC − T, NI − NO, NO − NI, T − NO, NO − T, NC − NO, NO − NC; x-axis: Time / s; y-axis: Firing rate (spikes/s).]



S3 Fig. Spike trains of simultaneously recorded neurons. Session "MN110411" for two of the conditions. Each point in the figure denotes a spike at the time indicated by the x-axis. Different trials are presented alternately using the red and blue colors, and the simultaneously recorded spike trains within one trial are shown in the same color. The left and right panels show two different conditions. Dashed lines indicate the interval where the two stimuli are shown on the screen.

[S3 Fig panels: T − NO and NO − T; x-axis: Time/s; y-axis: Spike trains.]



S4 Fig. Examples of simulated data for the HMM and the CBM. a) The probabilities of attending the contralateral stimulus (left), the pairwise correlation coefficients (middle), and the deviation statistic values (right) are shown as functions of time. Different colors represent the three parameter settings. b) Example spike trains are shown for the corresponding case and model. In each sub-figure, 10 trials are shown, separated by horizontal white space lines. In each trial, 10 simultaneous spike trains are plotted.

[S4 Fig panels: a) P(X=1), Correlation, and D∗ and Dn against time step, for the HMM and for the CBM, with legend entries Case 1, Case 2, Case 3, Dn and D∗; b) spike trains against Time/s for HMM, Case 1 to 3, and CBM, Case 1 to 3.]



S5 Fig. Deviation statistic values Dn obtained from decoding or encoding analysis. Values are calculated from estimates (encoding, blue) or from decoding analysis (green) and are shown as quantiles of the 100 repetitions. The dashed lines represent the full 0% to 100% quantiles, and the solid lines represent the 25% to 75% quantiles. The dots are the medians.

[S5 Fig panels: Dn against time step, three panels titled Dn, HMM and three panels titled Dn, CBM; legend: Estimates (Encoding), Decoding.]

S6 Fig. The D∗ values allowing varying initial probabilities. These are from parameter estimates using two settings: assuming different initial probabilities for each condition (a), or fixing the same initial probabilities for all conditions as before (b).

[S6 Fig panels a and b: D∗ (axis from 0.50 to 0.80) against time step (1 to 3).]



References

1. Bundesen C, Habekost T. Principles of visual attention: Linking mind and brain. 2008.

2. Nobre K, Kastner S. The Oxford handbook of attention. Oxford University Press; 2013.

3. Townsend JT, Ashby FG. Stochastic modeling of elementary psychological processes. CUP Archive; 1983.

4. Kadohisa M, Petrov P, Stokes M, Sigala N, Buckley M, Gaffan D, et al. Dynamic construction of a coherent attentional state in a prefrontal cell population. Neuron. 2013;80(1):235–246.

5. Sternberg S. High-speed scanning in human memory. Science. 1966;153(3736):652–654.

6. Sternberg S. The discovery of processing stages: Extensions of Donders' method. Acta psychologica. 1969;30:276–315.

7. Sternberg S. Memory-scanning: Mental processes revealed by reaction-time experiments. American scientist. 1969;57(4):421–457.

8. Schneider W, Shiffrin RM. Controlled and automatic human information processing: I. Detection, search, and attention. Psychological review. 1977;84(1):1.

9. Bricolo E, Gianesini T, Fanini A, Bundesen C, Chelazzi L. Serial attention mechanisms in visual search: A direct behavioral demonstration. Journal of cognitive neuroscience. 2002;14(7):980–993.

10. Eriksen CW, Lappin JS. Internal perceptual system noise and redundancy in simultaneous inputs in form identification. Psychonomic Science. 1965;2(1-12):351–352.

11. Eriksen CW, Spencer T. Rate of information processing in visual perception: Some results and methodological considerations. Journal of Experimental Psychology. 1969;79(2p2):1.

12. Atkinson R, Holmgren J, Juola J. Processing time as influenced by the number of elements in a visual display. Perception & Psychophysics. 1969;6(6):321–326.

13. Townsend JT. Mock parallel and serial models and experimental detection of these. In: Purdue centennial symposium on information processing; 1969. p. 617–628.

14. Bundesen C, Kyllingsbæk S, Larsen A. Independent encoding of colors and shapes from two stimuli. Psychonomic Bulletin & Review. 2003;10(2):474–479.

15. Kyllingsbæk S, Bundesen C. Parallel processing in a multifeature whole-report paradigm. Journal of Experimental Psychology: Human Perception and Performance. 2007;33(1):64.

16. Li K, Vozyrev V, Kyllingsbæk S, Treue S, Ditlevsen S, Bundesen C. Neurons in primate visual cortex alternate between responses to competing stimuli in their receptive field. Frontiers in Computational Neuroscience. 2016;10:141. doi:10.3389/fncom.2016.00141.

17. Bundesen C. A theory of visual attention. Psychological review. 1990;97(4):523.

18. Luceno A. A family of partially correlated Poisson models for overdispersion. Computational statistics & data analysis. 1995;20(5):511–520.

19. Diniz CA, Tutia MH, Leite JG, et al. Bayesian analysis of a correlated binomial model. Brazilian Journal of Probability and Statistics. 2010;24(1):68–77.

20. Daley D, Vere-Jones D. An Introduction to the Theory of Point Processes, volume I: Elementary Theory and Methods of Probability and its Applications; 2003.

21. Kass RE, Eden UT, Brown EN. Analysis of neural data. Springer; 2014.

22. Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of neurophysiology. 2005;93(2):1074–1089.

23. Hodges JL, Le Cam L. The Poisson approximation to the Poisson binomial distribution. The Annals of Mathematical Statistics. 1960;31(3):737–740.

24. Hong Y. On computing the distribution function for the Poisson binomial distribution. Computational Statistics & Data Analysis. 2013;59:41–51.

25. Treue S. Visual attention: the where, what, how and why of saliency. Current Opinion in Neurobiology. 2003;13(4):428–432.

26. Liu H, Agam Y, Madsen JR, Kreiman G. Timing, Timing, Timing: Fast Decoding of Object Information from Intracranial Field Potentials in Human Visual Cortex. Neuron. 2009;62(2):281–290.

27. Garrido MI, Kilner JM, Kiebel SJ, Friston KJ. Evoked brain responses are generated by feedback loops. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(52):20961–20966.
