Analysis of large scale spiking networks dynamics with spatio-temporal constraints: application to...

Analyzing large scale spike trains with spatio-temporal constraints: application to retinal data. Supervised by Prof. Bruno Cessac. Hassan Nasser.

Transcript of Analysis of large scale spiking networks dynamics with spatio-temporal constraints: application to...


Analyzing large scale spike trains with spatio-temporal constraints: application to retinal data. Supervised by Prof. Bruno Cessac.

Hassan Nasser

Thank you, Pierre, for this kind introduction. Hello everybody.

It is a great pleasure for me to present my work, entitled Analyzing large scale spike trains with spatio-temporal constraints, with application to retinal data.

This thesis was supervised by Prof. Bruno Cessac in the Neuromathcomp team at INRIA.

I would like first to thank the examiners for reviewing my thesis and all the jury members for attending this presentation.


Response variability

Biological neural network: stimulus (S) → spike response (R). Neural prosthetics; bio-inspired technologies.

[Figure: spike rasters over repeated trials; x axis: time (ms).]

Let me first present the context. Biological neural networks have been studied by researchers for many decades. One way to understand their functioning is to study the relation between the stimulus they receive and the spike response they produce.

One important characteristic of biological neural networks is response variability: if we repeat the same stimulus several times on the same neuron, the response will be slightly different from one trial to another. For that reason, we study this relation in a probabilistic framework.

Now, we want to understand this functioning not only to add new information to our encyclopedias, but for practical reasons: for instance, to develop neural prosthetics such as artificial retinas, and bio-inspired technologies such as image compression.

Stimulus (S) → spike response (R)

Spike train statistics

One intensively studied biological neural network is the retina. Experimentalists acquire the retinal activity at the level of ganglion cells using multi-electrode arrays. After spike sorting, we get the spike response, which we represent as a spike train: on the x axis we have time, and on the y axis the neuron labels. Each dash represents the action potential of a neuron at a given moment in time. We are concerned neither with acquisition nor with spike sorting; we are developing tools to study the spike train statistics.

Memory

4

Probabilistic models — Maximum entropy: Ising (Schneidman et al. 2006), Triplets (Ganmor et al. 2009); Point process. Spatial models, no memory.

There are two main directions here: maximum entropy models and point process models. There are of course other approaches, but I cannot present all of them due to time limitations.

The first maximum entropy models were purely spatial, such as Ising and triplets. The disadvantage of those models is that they do not take memory effects into account, which means that successive patterns are treated as independent.

Probabilistic models — Maximum entropy: spatial models with no memory, Ising (Schneidman et al. 2006) and Triplets (Ganmor et al. 2009); spatio-temporal models, 1 time-step memory (Marre et al. 2009) and a general framework (Vasquez et al. 2012), limited respectively to one time step of memory and to small scale. Point process: Generalized Linear Model, Linear-Nonlinear model, Hawkes, where neurons are considered conditionally independent given the past. The number of recorded neurons doubles every 8 years!

Later on, Marre and colleagues introduced a framework with one time step of memory, and Vasquez and colleagues introduced a framework for general spatio-temporal models.

Concerning the point process framework, the main examples are the Generalized Linear Models and Linear-Nonlinear models. Although those models are biologically plausible, statistically speaking they consider that neurons are conditionally independent given the past, which contradicts the population activity view of neural behavior.

This drawback can be avoided using maximum entropy models with memory. But the problem with the existing frameworks is that they are either limited to one time step of memory or to small scale. In parallel, with the advent of new MEA techniques, we are able to acquire the activity of more and more neurons; however, there is no framework that allows studying maximum entropy models on networks of larger size.

Outline: Goal. Definitions: basic concepts, maximum entropy principle (spatial and spatio-temporal). Monte Carlo in the service of large neural spike trains. Fitting parameters: tests on synthetic data, application on real data. The EnaS software. Discussion.

Goal: develop a framework to fit spatio-temporal maximum entropy models on large scale spike trains.

And this is actually the goal of my thesis: developing a framework to perform statistics on large scale neural data using the maximum entropy framework.


Spike objects

Empirical probability

So let me first tell you about the notation we use to describe spike objects. Omega is the whole spike train. Each event is denoted omega_i(t), where i is the index of the neuron and t the time of the event. A spike pattern, omega(t), represents the activity of the whole network at one time step. Finally, a spike block represents the activity of the whole network between two time steps t1 and t2.

We want to compute the empirical probability of spike events, namely their frequency in the spike train, which we denote pi^(T)(omega). I would like to stress that it depends on the spike train length T, which is finite, and on the experimental sample we are observing.

Since T is finite, the empirical probability of a given event fluctuates from sample to sample. These fluctuations are ruled by the central limit theorem, and this fact can be represented in confidence plots.
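To make this concrete, here is a minimal sketch (in Python, with illustrative names; this is not the EnaS API) of how the empirical probability of a spatio-temporal block can be estimated from a binary raster:

```python
import numpy as np

def empirical_block_probability(spikes, block):
    """Estimate pi_T for a spatio-temporal block (illustrative sketch).

    spikes : (N, T) binary array, spikes[i, t] = omega_i(t)
    block  : (N, D) binary array, the block whose frequency we want
    """
    T = spikes.shape[1]
    D = block.shape[1]
    hits = sum(np.array_equal(spikes[:, t:t + D], block)
               for t in range(T - D + 1))
    # finite T: the estimate fluctuates around the true probability (CLT)
    return hits / (T - D + 1)
```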

9

Confidence plot

[Figure: example binary spike patterns (10010, 10001, 10101, ...) plotted as observed probability (x axis) versus predicted probability (y axis).]

A confidence plot is a tool to evaluate the model while taking these fluctuations into account. Basically, we compare the probability predicted by the model with the probability measured from the data. So we take a certain pattern, say this one, and plug the corresponding point into the graph, where the empirical probability is the abscissa and the predicted probability the ordinate. Ideally, this point should lie on the equality line, but due to the finite size of the sample it is allowed to lie within the central limit bounds. We have chosen bounds of plus or minus 3 sigma, which means that the points lie within the confidence bounds with a probability of about 99.7 percent.

Those blue points represent purely spatial patterns, but since we are interested in evaluating the prediction of spatio-temporal activity, we will also add blocks of larger memory depth.

This helps us evaluate the quality of the model. Here is a real example; I explain it in detail because I will show such plots frequently in the following slides.
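A minimal sketch of the corresponding bound computation, assuming a Gaussian (central-limit) approximation of the fluctuations; the helper name is illustrative:

```python
import numpy as np

def confidence_bounds(p_emp, T, k=3.0):
    """Plus/minus k sigma bounds around an empirical probability (CLT sketch).

    p_emp : empirical probability of the block
    T     : number of observations used to estimate it
    With k = 3, a correctly modelled point falls inside the bounds
    with probability close to 99.7 %.
    """
    sigma = np.sqrt(p_emp * (1.0 - p_emp) / T)
    return max(p_emp - k * sigma, 0.0), min(p_emp + k * sigma, 1.0)
```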

10

Monomials

Pairwise; pairwise with 1 time-step delay; triplet

So a monomial is defined as the product of a selected group of events. Its value is 1 if all its events are equal to 1, and 0 if at least one of them is 0. This monomial, for example, represents the instantaneous activity of two neurons, and this one represents a delayed activity. We can also represent triplet activity, like the red object where the 3 neurons fire at the same time, and finally a monomial that represents a pairwise correlation with a 2-time-step delay between neurons 2 and 4. We compute the empirical probability of those monomials from the spike train, assuming stationarity.
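As an illustration, a hypothetical helper (not the EnaS API) that computes the empirical average of a monomial defined by a list of (neuron, delay) events:

```python
import numpy as np

def monomial_average(spikes, events):
    """Empirical average of a monomial (illustrative sketch).

    spikes : (N, T) binary array
    events : list of (neuron, delay) pairs, e.g. [(2, 0), (4, 2)] for a
             pairwise correlation with a 2-time-step delay between
             neurons 2 and 4
    """
    T = spikes.shape[1]
    D = max(delay for _, delay in events)
    # product of events: 1 only when every selected event equals 1
    values = [int(all(spikes[i, t + delay] for i, delay in events))
              for t in range(T - D)]
    return float(np.mean(values))
```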

Maximum entropy, spatial case: given some measured averages (N individual activity monomials and the pairwise monomials) as constraints.

Now imagine we are in the spatial case, and we consider the individual activity monomials (N monomials) and the instantaneous pairwise activity monomials (of order N squared), and we have information about all their averages. On the other hand, there are 2 to the power N possible network states, and 2^N is obviously far bigger than N plus N squared.

Now imagine we want to formalize a probability distribution for the patterns. We can ask many questions about this distribution; one of them is: can we predict the probability of any pattern omega(t) given the information about rates and pairwise correlations?

And the answer is yes. Although there is no unique mapping from the monomial averages to the pattern probabilities, among all the distributions satisfying these constraints there is one worth looking for seriously: the one with the largest entropy. The reason is simple: the maximum entropy distribution is the one with the least structure; it describes the system as randomly as possible given what we have measured from the data.

The entropy is a well known quantity in information theory, and the constraints are defined on the features of our model, the monomials: the rates and pairwise correlations predicted by the model probability mu must equal those measured by the empirical probability pi.

12

Spatial models: sought distribution, statistical entropy, normalization, parameters, empirical measure, predicted measure; partition function; after fitting the parameters: Ising model.

Our goal is to find a distribution mu that maximizes the entropy subject to the constraints. Mu belongs to the space M of stationary probability distributions. This variational principle defines a unique potential H_lambda that depends only on the pattern at a given time; since we are in the stationary case, that time can be taken to be 0. The variational principle also defines a unique probability distribution mu_lambda(omega(0)).

This case was studied by Schneidman and collaborators in 2006, where the constraints are the individual neuron activities and the instantaneous pairwise correlations.
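For concreteness, the spatial (Ising) case just mentioned can be written as follows; this is a standard formulation consistent with the slide, with Z(lambda) the partition function:

```latex
\[
H_\lambda(\omega(0)) = \sum_{i} \lambda_i\,\omega_i(0)
  + \sum_{i<j} \lambda_{ij}\,\omega_i(0)\,\omega_j(0),
\qquad
\mu_\lambda(\omega(0)) = \frac{e^{H_\lambda(\omega(0))}}{Z(\lambda)},
\qquad
Z(\lambda) = \sum_{\omega(0)} e^{H_\lambda(\omega(0))},
\]
\[
\text{subject to } \mu_\lambda[m_l] = \pi^{(T)}[m_l] \text{ for every constrained monomial } m_l.
\]
```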

Prediction with a spatial model

Spatial patterns: observed probability versus predicted probability.

With this spatial model we can correctly predict the patterns of size one.

Prediction with a spatial model

Spatio temporal pattern of memory depth 1

Observed probability versus predicted probability.

However, the model fails to predict the probabilities of patterns of size 2, because with respect to this model the probability of a pattern of size 2 is equal to the product of the probabilities of its patterns of size one.

Prediction with a spatial model

Spatio temporal pattern of memory depth 2

Observed probability versus predicted probability.

The same applies to patterns of size 3. The reason is that those models do not take memory into account, which means that they do not consider a transition probability from the past to the present. Now let me tell you how we use a Markov chain to avoid this problem.


The spike train as a Markov chain (time versus neuron #); the present pattern.

The idea is to represent a spike train of n time steps as a Markov chain: its probability is the probability of the transition from the memory block of depth D to the present, multiplied by the previous transition probability,

18

The spike train as a Markov chain — Chapman-Kolmogorov equation.

and so on: we keep multiplying all those transition probabilities down to the first memory block, and we finally multiply by the initial condition, the probability of the first memory block. This is the Chapman-Kolmogorov equation, and it allows us to formalize a probability distribution for the spike train that takes memory into account. Now let us see how we characterize those transition probabilities in the Markov chain formalism.
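In symbols, the factorization described above reads (one standard way to write it, with D the memory depth and n the number of time steps):

```latex
\[
\mu\!\left(\omega_0^{\,n}\right)
  = \mu\!\left(\omega_0^{\,D-1}\right)
    \prod_{t=D}^{n} P\!\left(\omega(t)\,\middle|\,\omega_{t-D}^{\,t-1}\right).
\]
```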

Markov states with memory

[Figure: example Markov states shown as binary blocks of depth D, with legal and illegal transitions.]

As we are using a Markov chain, we have Markov states, and the idea is to characterize the transitions between those states, which are blocks of depth D. In the context of spike trains, some transitions are legal and some are illegal: here, for example, the transition is legal because the two blocks share a common part of depth D-1, whereas for an illegal transition this common part does not exist.

Since illegal transitions cannot occur, their probability is 0. Legal transitions, on the other hand, are weighted by the exponential of H(omega_0^D), where omega_0^D is the purple block that corresponds to the transition.
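A minimal sketch of the legality test, assuming Markov states are stored as N x D binary arrays (illustrative helper, not the EnaS implementation):

```python
import numpy as np

def is_legal_transition(block_from, block_to):
    """A transition between two Markov states (blocks of depth D) is legal
    when the last D-1 patterns of the first block equal the first D-1
    patterns of the second one.
    """
    return np.array_equal(block_from[:, 1:], block_to[:, :-1])
```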


Transfer matrix (non-normalized); Perron-Frobenius theorem: largest eigenvalue, right and left eigenvectors; using the Chapman-Kolmogorov equation, compute the averages of monomials; pressure, entropy, empirical probability of the potential.

So we create a matrix containing all the transitions. L is a square matrix of size 2^(ND). Thanks to the Perron-Frobenius theorem, we have access to the statistical properties of our system: we compute the largest eigenvalue and the corresponding right and left eigenvectors. This largest eigenvalue is related to a quantity called the free energy, or topological pressure; in the spatial case, this pressure equals the log of the partition function.

This quantity is very useful because it is associated with the monomial averages: the average of the monomial m_l is equal to the derivative of the pressure with respect to lambda_l. It also allows us to compute the Kullback-Leibler divergence, which equals the pressure, minus the empirical average of the potential, minus the entropy of the empirical probability. Note that this entropy is constant, because it depends only on the data, which is very useful because it cancels when we minimize the divergence in order to fit the parameters.

Using the Chapman-Kolmogorov equation, the right and left eigenvectors and the pressure are exactly what we need to specify the probability of an arbitrary spike block of size n, which is given by this formula.
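Written out, and hedged as one standard way to express these quantities (s the largest eigenvalue, L and R the left and right eigenvectors, P the pressure, pi^(T) the empirical measure and S its entropy):

```latex
\[
P(\lambda) = \log s(\lambda), \qquad
\mu_\lambda[m_l] = \frac{\partial P(\lambda)}{\partial \lambda_l}, \qquad
D_{KL}\!\left(\pi^{(T)} \,\middle\|\, \mu_\lambda\right)
  = P(\lambda) - \pi^{(T)}\!\left[H_\lambda\right] - S\!\left[\pi^{(T)}\right],
\]
\[
\mu_\lambda\!\left(\omega_0^{\,n}\right)
  \propto L\!\left(\omega_0^{\,D-1}\right)
    \left[\prod_{t=D}^{n} \frac{e^{H_\lambda(\omega_{t-D}^{\,t})}}{s(\lambda)}\right]
    R\!\left(\omega_{n-D+1}^{\,n}\right).
\]
```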

Fitting pipeline: setting the constraints → random set of parameters → update the parameters → final set of parameters → predicted distribution (transfer matrix) → comparison.

24

24

Memory need versus neuron number; range R = D+1 = 3.

Now imagine we want to store this transfer matrix in computer memory. Each entry needs 1 byte, so the whole matrix needs 2^(2ND) bytes: the memory requirement explodes exponentially with the number of neurons. Here is an example with range R = 3, which corresponds to two memory steps: for 5 neurons we need 1 MB, and for 10 neurons we need 1 terabyte. That may still be feasible with big machines. However, for 20 neurons we go far beyond any realistic limit, about a million million terabytes; even the companies with the biggest storage capacity could not satisfy this need.
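As a quick check of these numbers (1 byte per entry, D = 2 memory steps):

```latex
\[
\underbrace{2^{ND}\times 2^{ND}}_{\text{entries}} = 2^{2ND}\ \text{bytes}:\quad
N=5:\ 2^{20}\,\text{B} \approx 1\ \text{MB},\quad
N=10:\ 2^{40}\,\text{B} \approx 1\ \text{TB},\quad
N=20:\ 2^{80}\,\text{B} \approx 1.2\times 10^{24}\ \text{B}.
\]
```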

For that reason we developed a new method, combining Monte Carlo sampling with the Chapman-Kolmogorov framework, that scales to larger networks.

Small scale: transfer matrix. Large scale: Monte Carlo.

We take NR = 20 as the boundary between small and large scale, since NR = 20 is the practical limit of the transfer matrix.

Fitting pipeline: setting the constraints → random set of parameters → update the parameters → final set of parameters → predicted distribution (?) → comparison.

27

27

Outline: Goal — develop a framework to fit spatio-temporal maximum entropy models on large scale spike trains. Definitions: basic concepts, maximum entropy principle (spatial and spatio-temporal). Monte Carlo in the service of large neural spike trains. Fitting parameters: tests on synthetic data, application on real data. The EnaS software. Discussion.

28

Metropolis-Hastings (1970); Monte Carlo states; proposal function.

Our goal is to sample a probability distribution mu that reflects the parameters lambda. We begin with an initial Monte Carlo state and change it over iterations, accepting or rejecting each change according to some criterion. The Monte Carlo state is a spike train with N neurons and n time steps.

The acceptance/rejection policy is determined by this formula, where Q is a proposal matrix. Since we are using the Metropolis algorithm, Q is symmetric and cancels. We still have to compute the ratio of mu values, but we do not want to compute s, R and L, because we want to avoid the transfer matrix.
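In symbols, this is the standard Metropolis-Hastings acceptance rule:

```latex
\[
A(\omega \to \omega')
  = \min\!\left(1,\;
      \frac{\mu(\omega')\,Q(\omega' \to \omega)}{\mu(\omega)\,Q(\omega \to \omega')}\right)
  = \min\!\left(1,\; \frac{\mu(\omega')}{\mu(\omega)}\right)
  \quad \text{for symmetric } Q.
\]
```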

Metropolis-Hastings

So I pick a random event, flip it from 0 to 1 or from 1 to 0, and compute the ratio of mu values.

Metropolis-Hastings

Now, s is the same in both cases, so it cancels. There remains the problem of R and L, which we want to avoid.

31

In order to avoid them, we fix the boundaries, from 0 to D-1 and from n-D to n, so that R and L do not change from one transition to another and therefore cancel. We take a large enough Monte Carlo sample so that these few fixed time steps do not affect the results.

Finally, we end up computing a ratio of exponentials, which is very easy to compute.

Algorithm review: choose a random event and flip it; accept or reject the change; updated Monte Carlo spike train.

So let us review the corresponding algorithm. We begin with a random spike train and a given set of parameters. We choose a random event between D and Ntimes - D and compute the exponential. We test whether this exponential is bigger than a random number between 0 and 1 in order to accept or reject the change, and we generate Nflip samples. We can adjust n and Nflip in our simulations.
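A minimal Python sketch of this flip loop (illustrative names, not the EnaS implementation; the callback H(spikes, t) is assumed to return the potential of the block ending at time t):

```python
import numpy as np

def metropolis_flips(spikes, H, D, n_flip, rng=None):
    """One Monte Carlo pass: flip random events, accept with prob min(1, e^{dH}).

    spikes : (N, n) binary integer array, the Monte Carlo state
    H      : callable, H(spikes, t) -> potential of the block omega_{t-D}^{t}
    The first and last D time steps are never flipped, so the boundary
    (eigenvector) terms cancel in the acceptance ratio.
    """
    rng = rng or np.random.default_rng()
    N, n = spikes.shape
    for _ in range(n_flip):
        i = rng.integers(N)
        t = rng.integers(D, n - D)                   # keep boundaries fixed
        window = range(t, min(t + D, n - 1) + 1)     # blocks containing time t
        h_old = sum(H(spikes, u) for u in window)
        spikes[i, t] ^= 1                            # propose the flip
        h_new = sum(H(spikes, u) for u in window)
        if rng.random() >= np.exp(h_new - h_old):    # Metropolis rule
            spikes[i, t] ^= 1                        # reject: undo the flip
    return spikes
```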

34

Hassan Nasser, Olivier Marre, and Bruno Cessac. Spike trains analysis using Gibbs distributions and Monte Carlo method. Journal of Statistical Mechanics: Theory and Experiment, 2013.

35

Fitting pipeline: setting the constraints → random set of parameters → update the parameters (?) → final set of parameters → predicted distribution (Monte Carlo) → comparison.

36

36

Outline: Goal — develop a framework to fit spatio-temporal maximum entropy models on large scale spike trains. Definitions: basic concepts, maximum entropy principle (spatial and spatio-temporal). Monte Carlo in the service of large neural spike trains. Fitting parameters: tests on synthetic data, application on real data. The EnaS software. Discussion.

37

Fitting parameters / concept. Dudík, M., Phillips, S., and Schapire, R. (2004). Performance guarantees for regularized maximum entropy density estimation. Proceedings of the 17th Annual Conference on Computational Learning Theory. Small scale: divergence easy to compute; large scale: hard to compute. Bounding the negative log likelihood; relaxation; iterations.

In practice, maximizing the entropy directly is intractable, so we look for an equivalent, tractable process: minimizing the Kullback-Leibler divergence. The idea is to build an iterative algorithm that decreases the divergence at each iteration by updating the parameters.

But the situation differs between small and large scale. At small scale we can compute the divergence thanks to the transfer matrix. At large scale, since we avoid computing this matrix, we can no longer compute the divergence, so we need another method equivalent to minimizing it.

We found the answer in Dudík et al. 2004. In that paper, the authors minimize the divergence by bounding the negative log likelihood of the model given the data. They also use relaxation, which means they allow some fluctuation on the empirical averages, which is exactly what we have with finite data.

However, the limitation is that their method handles only spatial constraints, that is, systems without memory.


Fitting parameters / concept. Hassan Nasser and Bruno Cessac. Parameter fitting for spatio-temporal maximum entropy distributions: application to neural spike trains. Submitted to Entropy. Bounding the divergence, with relaxation and spatio-temporal constraints.

40

Cost function; relaxation epsilon > 0; computed using Monte Carlo; parallel and sequential parameter updates; number of parameters.

The idea is to minimize a cost function given by the difference between the Kullback-Leibler divergences corresponding to two sets of parameters, lambda and lambda prime, where lambda prime equals lambda plus delta, delta being the parameter update. As you notice, we still have the problem of the pressure. But we can also compute the cost function in this other way, where the limit to infinity is precisely imposed by the spatio-temporal constraints; this limit does not appear in the spatial case. Thanks to this formula, we avoid computing the pressure.

In fact, we do not compute the cost function itself; we bound it by a quantity that also converges to zero. For this we used the same strategy as Dudík and collaborators, extended to the spatio-temporal case, and obtained a formula that takes into account the fact that a real sample is always finite.

We actually consider two cost functions, one for a sequential update and one for a parallel update. Epsilon is the relaxation we allow on the monomial averages. The monomial averages can be computed by Monte Carlo instead of the transfer matrix, which allows us to fit large scale models.

Fitting pipeline: setting the constraints → random set of parameters → update the parameters (fitting) → final set of parameters → predicted distribution (Monte Carlo) → comparison.

42

42

Updating the target distribution: Monte Carlo versus a Taylor expansion around the previous distribution; exponential decay of correlations; in practice n is finite.

As I just said, whenever we update the parameters we must compute a new target distribution using Monte Carlo. We wanted to see whether this new distribution could be computed in another way, taking less time than Monte Carlo.

Since the monomial averages are functions of the parameters, we can make a Taylor expansion, where mu is the previous distribution and the second order term is the second derivative of the pressure with respect to the parameters. It is easy to compute, because it is nothing but the sum over n of the correlation functions between monomials j and k at time lag n, and n can be taken finite because of the exponential decay of those correlation functions. This series contains a summation over n because of memory: without memory, we only have the correlation at a single time step and n = 0.

Likewise, the third order term involves correlations between sets of 3 monomials, but it is very heavy to compute numerically. So we limited ourselves to the second order expansion, and we apply the Taylor expansion only for small delta.
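Schematically, and hedged as a transcription of what was just described (C_{lj}(n) denoting the correlation between monomials l and j at time lag n, the sum being truncated thanks to the exponential decay of correlations):

```latex
\[
\mu_{\lambda+\delta}[m_l]
  \approx \mu_{\lambda}[m_l]
  + \sum_{j} \delta_j\,
    \frac{\partial^2 P(\lambda)}{\partial\lambda_l\,\partial\lambda_j},
\qquad
\frac{\partial^2 P(\lambda)}{\partial\lambda_l\,\partial\lambda_j}
  = \sum_{n} C_{lj}(n).
\]
```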

Demo

44

Now I am going to show you a screen recording of the parameter fitting. I use an example of 5 neurons because it is fast, so that you can see how the quantities evolve along the iterations.

What we see here is the spike train of five neurons and the model we want to fit; here we fit a pairwise model with 1 time step of memory.

Error evolution; observed probability versus predicted probability.

In the video you will see 3 graphs that change over the iterations. This graph shows the parameter values on the y axis, with the parameter index on the x axis. This graph shows the comparison between the monomial averages predicted by the model and measured in the data. Finally, here we show the Kullback-Leibler divergence. Please ignore this part of the video.

Parameter fitting demo

At the beginning we see the parallel update: observe that all parameters and monomials are updated at once and the divergence decreases. Then we see the sequential update: only one parameter is updated at a time, and the divergence vanishes over time.

Synthetic data sets: sparse versus dense; rate parameters; higher order parameters.

If we generate data from parameters drawn from a Gaussian distribution, we obtain dense data sets. However, we want to test the algorithm on data that look as close as possible to real data, which are sparse.

To do so, the rate parameters are drawn from a Gaussian with a negative mean, so that they are essentially all negative, whereas the higher order parameters are drawn from a Gaussian with mean zero.

In this way we obtain two data sets with different configurations, one of which looks like real data. I am going to show parameter fitting tests on both.

Dense and sparse synthetic spike trains; N = 20, R = 3, NR = 60; random (known) parameters/monomials; try to recover the parameters; errors on the parameters.

Now I present the tests we performed to validate the robustness of the algorithm at large scale. We generate synthetic data with known parameters and monomials, then try to recover those parameters, and we compute the L1 error between the known and the recovered parameters. We tested on dense and sparse data, with models of several dimensions, and we observe good performance, with small errors whatever the dimension.

48

Comparing block probabilities: N = 40, spatial (NR = 40) and spatio-temporal (NR = 80).

We also did another test. We generate synthetic data from an Ising distribution, so the distribution is known, and as in the previous experiment we recover the parameters with the algorithm. We see that the model predicts well the monomial averages and the probabilities of patterns of size 1. We do the same for a pairwise spatio-temporal example, and we see good prediction of the probabilities of patterns of size 1, 2 and 3.

Data Courtesy: Michael J. Berry II (Princeton University) and Olivier Marre (Institut de la vision, Paris).

Application on retinal data: purely spatial pairwise and pairwise with 1 time-step memory models, binned at 20 ms.

Schneidman et al 2006

Now I am going to talk about analyzing another set of retinal data, provided by our collaborators Olivier Marre and Michael Berry. It consists of a 40-neuron data set. We binned it at 20 ms and fitted spatial and spatio-temporal models on 20 and 40 neurons from this data set.

Real data, 20 neurons: spatial pairwise versus spatio-temporal pairwise.

Here I show the comparison between empirical and predicted probabilities of patterns. The fact that all the points are aligned on the diagonal means that the fitting process worked well and the parameters are well estimated. The confidence plots show that patterns of size 1 are well predicted, while those of size 2 and 3 are not. Fitting the same data with a spatio-temporal model shows that patterns of size 1, 2 and 3 are all well predicted, which highlights the advantage of adopting spatio-temporal models for spike trains.

Real data, 40 neurons: pairwise spatial versus pairwise spatio-temporal.

In the case of 40 neurons, we see the same phenomenon with an Ising model. However, with a spatio-temporal model, looking at the monomials graph, we could not reach a good fit, since the monomial averages are not well predicted; this is presumably due to the finite size of the data set. I will discuss this point a few slides later.

Outline: Goal — develop a framework to fit spatio-temporal maximum entropy models on large scale spike trains. Definitions: basic concepts, maximum entropy principle (spatial and spatio-temporal). Monte Carlo in the service of large neural spike trains. Fitting parameters: tests on synthetic data, application on real data. The EnaS software. Discussion.

53

Event neural assembly Simulation (EnaS): V1 2007, V2 2010, V3 2014. Thierry Viéville, Bruno Cessac; Juan-Carlos Vasquez, Horacio Rostro-Gonzalez, Hassan Nasser; Selim Kraria (+ graphical user interface). Goal: analyzing spike trains and sharing research advances with the community. C++ & Qt (interfaces for Java, Matlab, Python).

EnaS was created as an open source library by Prof. Bruno Cessac and Thierry Viéville with the aim of analyzing spike trains and sharing recent tools for spike train analysis with the community.

We are now at the third version, thanks to team work. Version three also includes a graphical user interface developed jointly with Selim Kraria, a software engineer. Selim also works on the development of the library itself; his role is to maintain a professional architecture and good coding practices. The library is developed in C++ and the graphical user interface in Qt (a C++ library). It also allows interfacing with other environments such as Java, Python and Matlab.

Architecture of EnaS: RasterBlock (data management, formats, empirical statistics, grammar); Gibbs potential (defining models, generating artificial spike trains, fitting, Monte Carlo process and its parallelization); graphical user interface (interactive environment, simultaneous visualization of stimulus and response, demo). Contributions.

From an architectural point of view, the library is divided into three main parts. RasterBlock is responsible for data management (loading files in several formats, for instance), for the grammar, which I will explain in the next slides, and for empirical statistics. The second part, Gibbs potential, is responsible for defining models, for the fitting process and for generating artificial spike trains; this is the part I mainly contributed to, and I will present how we parallelized the Monte Carlo process. Finally, the graphical user interface, developed by Selim and tested by Bruno and me, offers an interactive environment for visualizing spike trains, stimuli, or both simultaneously; empirical statistics and modelling can also be performed through it.

Those parts interact with each other by sharing information or by jointly computing quantities such as the divergence and the confidence plots, for which we need to know about both the data and the model. The graphical user interface also uses the algorithms implemented in the library.

Now let us talk about the grammar.

Grammar: needed for computing the empirical distribution, Monte Carlo sampling, divergence and entropy computation, and confidence bounds. Only observed transitions are stored.

Since we are working with spatio-temporal features, we need the transitions several times during the computations: their occurrences are used to compute the empirical distribution, the divergence, the Monte Carlo sampling, the confidence plots and many other things. For that reason we created a data structure called the grammar, which we compute only once when analyzing a spike train and then reuse whenever needed. Here I would also like to emphasize again the advantage of Monte Carlo over the transfer matrix: with Monte Carlo we only need to store the observed transitions, at most T - D of them, whereas the transfer matrix requires storing 2^(NR) transitions. So, how do we build the grammar data structure?

Grammar data structure, example with N = 3 and D = 2: each observed transition is stored as a prefix (the memory block) and a suffix (the pattern that follows).

Let us take this spike train as an example, with 3 neurons and a model of memory depth 2. We scan the spike train: in the first step we take the memory block, which we call the prefix, then we take the following pattern and put it under that memory block. This means that we observed the red pattern after this blue block.

We iterate and repeat the same procedure, adding each observed prefix-suffix pair.

When the same prefix-suffix pair is encountered again, its count is incremented: this transition appears 2 times.

A new suffix: if a pattern has a prefix that already exists in the tree, the new suffix is added under that existing prefix.

Map: a C++ data container, sorted in a chosen order.

C++ provides a data container called map, which keeps its entries sorted according to an order we specify. We store the transitions in this map, with prefixes and suffixes in increasing order, which simplifies the search when we look for the occurrence count of a particular transition.
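A minimal Python sketch of the same idea (EnaS itself uses a sorted C++ std::map; the dictionary and helper name below are illustrative):

```python
from collections import defaultdict

def build_grammar(spikes, D):
    """Count only the transitions observed in the raster (illustrative sketch).

    At most T - D (prefix, suffix) pairs are stored, instead of the
    2^(NR) entries a transfer matrix would require.
    spikes : (N, T) binary NumPy array
    """
    T = spikes.shape[1]
    grammar = defaultdict(int)
    for t in range(D, T):
        prefix = spikes[:, t - D:t].tobytes()   # memory block of depth D
        suffix = spikes[:, t].tobytes()         # pattern observed right after
        grammar[(prefix, suffix)] += 1          # occurrence count
    return grammar

# Looking up how many times a particular transition occurred:
#   count = grammar[(prefix_bytes, suffix_bytes)]
```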

And now, if I need to know how many times a given transition appeared in the spike train, I search for its prefix, then its suffix, and read the count: it appeared two times.

Now I am going to explain the parallelization of the Monte Carlo process.

Parallelization of the Monte Carlo process

Personal multi-processor computers: 2 to 8 processors; clusters (64-processor machines at INRIA); OpenMP. MPI: more processors, but more time consuming in our case.

The idea is that instead of having one processor flipping events in the spike train and computing Monte Carlo states, several processors can work on the same Monte Carlo spike train. The more processors we have, the less time we need to compute a target distribution.

Thanks to the OpenMP framework, we can do this parallelization on personal computers, which have 2 to 8 processors, or on clusters such as the INRIA cluster, with 64 processors per machine.

Example with 2 processors (processor 1 / processor 2).

Let me explain how this works with an example of 2 processors. First I divide the Monte Carlo spike train into two parts and assign one part to each processor. As I promised at the beginning of the presentation, we do not touch the boundaries, in order to get rid of R and L.

Boundaries between processors

With 1 processor: 14 months (plus February)!

Now, since the two processors execute at the same instant, it might happen that both of them pick events within D time steps of their common boundary. When we compute the exponential of delta H, we compute it over the window from -D to +D around the flipped event. The event on the left needs all the events around it to be stable in order to decide whether to accept or reject the flip; since the event on the right is moving, the left event may not take the right decision. As a consequence, simultaneous flips in this region may introduce errors, but they are very small since 2D is much smaller than Ntimes.

Finally, let me show you a demo of EnaS.

Data courtesy: Gerrit Hilgen, Newcastle University, Institute of Neuroscience, United Kingdom

Interface design: Selim Kraria

EnaS Demo

The graphical user interface contains 4 windows; I will show a demo of the first two now. The data shown here were provided by Gerrit Hilgen from Newcastle University, and the interface design credit goes to Selim Kraria.

Modelling window

Now I am going to show you the fourth window, dedicated to modelling. First of all, we specify the model to use: we can choose among a set of canonical models widely found in the literature, such as Ising, pairwise and triplets. We also specify whether to compute with Monte Carlo or with the transfer matrix, and we can set the parameters of the Monte Carlo algorithm: the number of Monte Carlo states, the length of the Monte Carlo sample, and the number of sequential and parallel iterations. We can also choose to compute the pattern probabilities with Monte Carlo or with the transfer matrix, and specify the maximum size of the blocks we want to compare; in this example it is 3.

Finally, we choose what to compute; here I chose the parameters, the divergence and the confidence plots. We run the simulation and, after some hours, we get the results. The first thing that appears is the coefficients, then the comparison between monomial averages, which is an excellent visual indicator of whether we have reached convergence. If you remember, I said that we may need around 500 iterations to reach convergence, but in this case we converged after 100 parallel iterations and 40 sequential iterations. This graph shows the evolution of the error, and finally here we see the confidence plots and the divergence.

Hassan Nasser, Selim Kraria, Bruno Cessac. EnaS: a new software for analyzing large scale spike trains. In preparation.

70

Conclusion: a framework to fit spatio-temporal maximum entropy models on large scale spike trains — Monte Carlo, fitting, the EnaS software; perspectives on models, parameters and the software.

I have presented a parameter fitting method, combined with Monte Carlo techniques, that allows modelling spike trains within the maximum entropy framework. The main problem we encountered is the model hypothesis, which imposes limitations that can certainly be resolved, and I will propose perspectives on this point. After obtaining the parameters, the natural question is how to make sense of them; I will also discuss this and give some perspectives. Finally, I will suggest perspectives concerning the EnaS software.

Synthetic data versus real data. Synthetic data: the potential shape is known (the monomials are known), fitting only. Real data: the potential shape is unknown (the monomials are unknown), guessing the shape plus fitting.

Let me first highlight a particular point: performing tests on synthetic data and analyzing real acquisitions are two completely different worlds, even though we use the same methods and analyze spike trains in both cases. With synthetic data, we know the distribution that generated the data, so when we fit a model we know which model to use.

With real data, however, we have no a priori idea about the distribution; we fit under a model hypothesis. In this case we have one additional task: guessing the shape of the potential, that is, the model.

72

Monomials and model shape: canonical forms (Ising, pairwise with delay, triplets); small scale versus large scale; large computation time; non-observed monomials; estimation errors. Pre-selection (Rodrigo Cofré & Bruno Cessac); 40 neurons.

Since the monomials determine the shape of the model, and we use canonical forms of the potential, the number of parameters explodes at large scale, unlike at small scale. For instance, for a 40-neuron data set we have to fit around 4000 parameters, as the graph shows. This results in a large computation time and in fluctuations on the parameter estimates. These disadvantages could be reduced by rethinking the model.

This could be done by naive thresholding, but we are working on a better performing method; this issue is ongoing work in our lab by another PhD student.

Making sense of parameters: evaluate the importance of particular types of correlations; possibility of generalizing the model prediction to a new stimulus.

We fit the model and obtain the parameters; what is their real value? The point is not simply to be happy looking at the numbers; there are two possible applications. First, as we have shown, we used the parameters to evaluate whether the model can predict the probability of patterns of size bigger than one, and we succeeded in showing that temporal interactions must be taken into account in the statistics. A second application is to ask the model to predict the probability of patterns given another stimulus.

Stationary maximum entropy model + new stimulus: statistics, but no new response. Stimulus-dependent maximum entropy models (Granot-Atedgi et al. 2013): new stimulus → new spike response.

So let me get back to my first slide. The answer is no, because we assume that the spike train is stationary: although we compute a conditional probability distribution given the stimulus, the stimulus does not enter the model explicitly.

Recent work by Granot-Atedgi and collaborators in 2013 has shown that it is possible to build maximum entropy models where the stimulus is taken into account in the constraints. Their work improves on the LN model in that it takes correlations between cells into account, avoiding the conditional independence between neurons, and they showed that their model outperforms LN models.

75

EnaS in the processing chain, from stimulus to retina, spike sorting and spike train. Now: visualization, empirical analysis, maximum entropy modelling. Future: stimulus design, feature extraction, receptive fields, neuron selection and type identification, spike sorting, more empirical observation packages, more neural coding functionality, retina models (VirtualRetina).

Let me discuss a perspective concerning the EnaS software. First, I would like to mention that over its 7 years of development, which mixed research, implementation and software engineering, EnaS has advanced a lot, and I suggest that this ambitious project should not stop here. In the whole chain of spike train analysis, from designing the stimulus to obtaining the spike trains, EnaS currently intervenes for visualizing the stimulus and the response and for empirical statistics on the response, plus maximum entropy modelling. The question I ask now is: how can we make EnaS better and get more people to use it?

The answer is not to tell people to think differently, like Steve Jobs; it is to add functionality that lets people do more with their data. For example, we can add stimulus design and stimulus feature extraction, for instance detecting object speed in natural images in order to see how cells respond to several speeds. We can add receptive field computation and ganglion cell type identification, which would help in selecting subsets of cells and studying how cells of different types respond to stimulation. We can of course add spike sorting, and finally more empirical statistics packages and more neural coding functionality.

76

Next: starting a company in IT / data analytics. First prize in an innovative project competition (UNICE Foundation). Current project: orientation in education using real surveys. EnaS is a perspective, in collaboration with INRIA.

Caty Conraux & Vincent Tricard

Just a short slide about what I will do after my PhD. It may sound a bit strange, but I am going to start a company in IT for data analytics. The idea is to develop applications for specific data analytics needs. My early adopter is the foundation of the University of Nice, represented by its director Caty Conraux and by Vincent Tricard; the foundation awarded me the first prize in the last competition they organized, among around 30 candidates. The first application I will develop is a web service for university students that helps them choose their specialization, based on information extracted from real surveys of graduates who found a job. The EnaS software is also a perspective, in collaboration with INRIA.

Thanks to collaborators: Adrian Palacios, Olivier Marre, Michael J. Berry II, Gašper Tkačik, Thierry Mora.

78

79

79

Appendix: Tuning Ntimes. Tuning Nflip. Validating the Monte Carlo algorithm. Tuning delta. MPI vs. OpenMP, memory. Why MPI is not better than OpenMP. Computational complexity of the Monte Carlo algorithm. Review of Monte Carlo / Nflip. Number of iterations for fitting. Fluctuations on parameters / non-observed monomials. Epsilon in parameter fitting. Binning. Tests with several stimuli. Granot-Atedgi et al. 2013.

80

80

Transfer matrix versus Monte Carlo

81

Dense and sparse cases.

We did the same experiment, this time setting Nflip to 100 and varying Ntimes. In both the dense and sparse cases, the transfer matrix and Monte Carlo perform closely, which means the performance is quite similar.

Monte Carlo versus transfer matrix (exact).

To test Nflip in the dense case, we generate random parameters with a dense profile and compute the target distribution for those parameters with Monte Carlo and with the transfer matrix. We know that the transfer matrix gives the exact result. With Monte Carlo, we want to test the effect of k, so we compute the target distribution as a function of k, compare, and compute the error: it vanishes as k increases. We do the same for the sparse case, and we see that we need to flip more in the sparse case than in the dense case.

Dense and sparse cases.

84

Parallelization. Multiprocessor computers: personal computer (2-8 processors), cluster (64-processor machines at INRIA). Parallel programming frameworks: OpenMP, where the processors of a single machine divide the tasks and share live memory (RAM); MPI, where processors spread over several machines share the task without shared memory. 4 processors → time / 4; 64 processors → time / 64.

85

85

MPI: OpenMP is limited to the number of processors on a single machine; with MPI, 64 processors × 10 machines gives 640 processors. Although we thought it would take less time with MPI, it did not.

Master computer plus several clusters of 64 processors each, sharing the whole Monte Carlo spike train.

At each change in memory there is a communication between the clusters and the master; at each flip, more time is lost in communication than is spent computing.

86

Computing complexity

87

87

Computational complexity — time taken to run this algorithm. Start: random spike train; choose a random event and flip it; accept or reject the change; updated Monte Carlo spike train. On a cluster of 64 processors: 40 neurons, Ising: 10 min; 40 neurons, pairwise: 20 min.

88

Algorithm review and tuning: choose a random event and flip it; accept or reject the change; updated Monte Carlo spike train.

The algorithm is standard, but the question is how to determine Nflip and Ntimes. A logical assumption is that the larger Ntimes and the number of neurons, the more we should flip, so we write Nflip as the product of N, Ntimes and an integer k to be determined. One way of tuning these variables is to generate a known synthetic distribution and run Monte Carlo for several values of Nflip and Ntimes; the advantage of synthetic data is that we already know the answer. However, synthetic data might not imitate real data, so we would not know whether the tuning is robust across data sets. To verify that the algorithm is robust, we generate two data types, which I present before showing the tuning results.

How many iterations do we need?

90

90

91

91

Problem of non-observed monomials; convexity.

The true expected average equals the derivative of the pressure, which equals the approximated average, and the estimated average depends on the parameters. After a Taylor expansion, we see that the fluctuation eta on the monomial averages and the fluctuation epsilon on the parameters are related through the inverse of the matrix of second derivatives of the pressure. The matrix shown at right has plenty of near-zero entries, which means its inverse has very large values at some points. As a consequence, a small fluctuation in the monomial averages can damage the parameter estimates: since we multiply this inverse matrix by the fluctuation, one small entry can produce abnormal results, not only for one parameter but for the others as well.

92

Binning changes the statistics completely: 700% more new patterns appear when we bin at 20 ms; it should be studied rigorously.

Another point I would like to address is the binning value. Researchers have been using a binning of 20 ms. I did a small empirical test on only 5 neurons to show how binning changes the histogram of observed patterns. At right we see the histogram of patterns of size one for a binning of 1 at the top, then 5, 10, and finally 20 at the bottom. The number of patterns appearing in the empirical distribution with 20 ms binning increases by 700%; this completely changes the information we get about the data. Binning has been used to create a form of stationarity compatible with the maximum entropy formalism, but its effect should be studied rigorously.
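For reference, a sketch of the usual binning convention assumed here (a binned entry is 1 if the neuron fired at least once inside the bin; names are illustrative):

```python
import numpy as np

def bin_spike_train(spikes, bin_size):
    """Re-bin a binary spike train (illustrative sketch).

    Output[i, b] = 1 if neuron i fired at least once in bin b.
    spikes   : (N, T) binary array at the native resolution
    bin_size : native time steps per bin (e.g. 20 for 20 ms bins at 1 ms)
    """
    N, T = spikes.shape
    n_bins = T // bin_size
    trimmed = spikes[:, :n_bins * bin_size].reshape(N, n_bins, bin_size)
    return (trimmed.max(axis=2) > 0).astype(int)
```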

[Figure: example of 2 neurons binned at 5 time steps.]

Effects of binning: loss of information, loss of the biological time scale, a denser spike train, fewer non-observed monomials. Why have spike trains been binned in the literature? There is no clear answer; the argument that binning is a substitute for memory is not convincing. It might be because binning yields more observed monomials, which is less dangerous for convexity, so convergence is better guaranteed.

Making sense of parameters

Stimulus 1 / Stimulus 2 / Stimulus 3 / Stimulus 4

I fitted an Ising model on a set of 5 neurons that were given 4 different stimuli. The results show that the rate parameters are almost the same, whereas the correlation parameters are very different. Does that make sense? Are we really at the point of saying that we have obtained the famous P[R|S]?

P[S|R]. Einat Granot-Atedgi, Gašper Tkačik, Ronen Segev, Elad Schneidman. Stimulus-dependent Maximum Entropy Models of Neural Population Codes. PLoS Computational Biology, 2013.

Schneidman et al. 2006; LNL.

97

Cross validation on small scale (Vasquez et al. 2013)

Relaxation

Schneidman et al. 2006 stimulus

Confidence bounds in linear scale

Confidence bounds in log scale

103