Neuroscience and Information Theory

Johnson Jonaris GadElkarim

University of Illinois at Chicago
E-mail: [email protected]

Abstract

Information theory has been introduced as a mathematical framework that can be used in many fields that involve statistical and random data analysis. One important field of interest in the biological sciences is neuroscience. This paper gives a brief introduction to some uses of information theory in solving problems encountered in neuroscience research.

I. INTRODUCTION

In 1948, a paper published by Claude Shannon laid the foundations of one of the most important sciences in the history of mankind. The paper, titled "A Mathematical Theory of Communication" [1], created the mathematical science of "Information Theory" and was conceived at first as a framework for quantifying information. Since its inception, information theory has found applications in many areas beyond its original target, communication, such as quantum computing, cryptography, molecular biology, and many others. This paper focuses on the mutual relation between information theory and neuroscience.

An ongoing debate about the usefulness of information theory for biological phenomena is still present [2,4]. One of the most famous arguments in its favor is the observation that "The Founder of Information Theory Used Biology to Formulate the Channel Capacity" [2]. An important result of Shannon's work was the channel capacity, which describes the maximum number of bits one can send over a channel per transmission. Engineers tried from the 1940s onward to reach the limits set by Shannon without success; only recently have new coding schemes managed to approach those bounds. It is remarkable that channel capacity applies so well to living organisms and their products, as in the "channel capacity of molecular machines" [2]. Surprisingly, codes discovered for communication can teach us new biology when the same codes are found in a biological system.

In addition, there is a vast number of applications in neuroscience where information theory has proved very useful. This paper focuses only on the following applications: first, neural coding; second, Magnetic Resonance Imaging (MRI), including functional MRI (fMRI) and Diffusion Weighted Imaging (DWI) techniques. A simple introduction to each area is given to explain briefly its basic idea and the way information theory has been used to address and solve some of the problems found there. The paper assumes good knowledge of information theory definitions and vocabulary, such as entropy, mutual information, and K-L divergence; for full definitions please review Shannon's paper [1].

II. NEUROSCIENCE

Neuroscience is the science that studies the nervous system. The nervous system is one of the most important organs in any living animal; it contains a network of specialized cells called neurons. Its main task is to coordinate the actions of an animal, allowing the transmission of signals between different parts of its body. Due to advancements in technology and biological equipment, the scope of neuroscience has broadened to include different approaches to studying the molecular, structural, functional, computational, and medical aspects of the nervous system. Lately, neuroscientists have been using newer techniques in their studies, ranging from studying individual nerve cells at the molecular level and characterizing their output signals (neural coding) to imaging the whole brain with Magnetic Resonance Imaging and reconstructing brain fibers (tractography) in order to map the complex network that interconnects different parts of the brain (brain connectivity) using functional MRI and Diffusion Tensor Imaging techniques.

A. Neuron:


Fig. 1. Neuron and Synaptic space.

The neuron is the building block of the nervous system. Neurons receive signals from other neurons through input terminals called dendrites and emit signals to other neurons as electrochemical waves or pulses (action potentials) travelling along thin, cable-like fibers called axons. These pulses cause chemicals called neurotransmitters to be released at junctions called synapses, and the signal is received at the other end of the synapse. The synapse is a narrow space between a neuron and another neuron or a muscle; it can be seen as a communication endpoint [10]. There are three main types of neurons:

• Motor neurons: specialized to send messages away from the Central Nervous System to produce body movement.

• Sensory neurons: specialized for the senses of taste, touch, hearing, smell, and sight. They send messages from the sensory receptors to the Central Nervous System.

• Interneurons: usually found only in the Central Nervous System; they relay messages between motor and sensory neurons.

A neuron can have over 1,000 dendritic branches and around 10,000 synaptic connections, linking it to tens of thousands of other neurons. There are about 50 to 100 billion neurons in the human brain and roughly 100 trillion synapses, forming a complex network of connections. The interactions of all these types of neurons form neural circuits that generate an organism's perception of the world and determine its behaviour.

III. NEURAL CODING

Neural coding is a field within neuroscience concerned with how information is represented in the brain by networks of neurons. The information can be of many types, such as sensory information (e.g., vision and hearing) and memory. The main goal of neural coding is to describe and determine the relationship between the stimulus (the external agent, action, or condition that elicits a physiological or psychological activity or response) and the individual or multiple neuronal responses. In neural coding research there are two main activities:

• Encoding: recording neural activity and trying to understand how neurons respond to different types of stimuli; moreover, we want to build models that can predict the responses to other stimuli.

• Decoding: the challenge of reconstructing the stimulus that gave rise to a certain type of response.

A. Information Theory and Neural Signal Processing

Three of the most important questions usually asked in neural coding are: How is information encoded and decoded? What does a response tell us about a stimulus? What is the fidelity with which information is represented by neural signals? These all push toward the main question: what is the neural code? Information theory has been fundamental in helping answer part of these questions; more information can be found in [3,6,10].

A response made of a train of spikes may carry information under different coding schemes. It was shown by E. D. Adrian [16] that in motor neurons the strength at which a contracted muscle relaxes depends only on the 'firing rate' (the average number of spikes per unit time); this can be viewed as a 'rate code', the oldest (about 80 years old) type of code. More recently, more complex codes have been considered, such as the 'temporal code', which is based on the precise timing of single spikes; such timing may be synchronized to an external stimulus, as in the auditory system, or generated intrinsically by the brain's neural circuitry. A big debate is still ongoing in the neuroscience community about whether neurons use rate coding or temporal coding.

We begin by exploring the relationship between the entities of the communication channel model used in information theory and their neuroscience counterparts, shown in figure (2).


Fig. 2. Neuron Coding Diagram

The mapping from neuroscience to information theory is as follows: the "Stimulus Generator" is the "Source" that creates a stimulus from a specific input, whether natural (such as watching a picture) or provoked in experiments; the "Neural Coding" is the "Encoder" that uses different schemes, such as a visual coding scheme that codes visually sensed information from the eye; the "Neuron" is the "Channel" through which the data travel; and finally the "Stimulus Decoder" is the "Decoder" at the receiver side that decodes the spikes or pulses and causes a certain response, such as interpreting a scene in the case of the visual system.

We can represent the stimulus (e.g., intensity or velocity) as a random variable S with probability p(s_j) that the stimulus condition takes the value s_j; the response (e.g., firing rate) is represented by the random variable R with probability p(r_i) that the neural response takes the value r_i. We consider only quantized, discrete-valued amplitudes, since a continuous value carries an infinite amount of information, which is practically impossible to process. We start with some definitions: H(S) represents the maximum information that can be encoded, while H(R) represents the uncertainty about the response. Neuronal noise is defined as the conditional entropy

H(R|S) = -\sum_{s}\sum_{r} p(s)\, p(r|s) \log p(r|s)

which measures the uncertainty remaining in the neural response when the stimulus conditions are known. The stimulus equivocation is defined analogously, with the roles of stimulus and response exchanged:

H(S|R) = -\sum_{r}\sum_{s} p(r)\, p(s|r) \log p(s|r)

Fig. 3. The mutual information for a binary encoding of a binary stimulus. Pe is the probability of an incorrect response being evoked. The plot only shows Pe ≤ 1/2 because values of Pe > 1/2 correspond to an encoding in which the relationship between the two responses and the two stimuli is reversed and the error probability is 1 − Pe.

We can now see how information theory can quantify how significantly neural responses vary with different stimuli; moreover, it can determine how much information the neural responses contain about the values of a stimulus parameter. This is done by computing the mutual information between R and S, which is the difference between the total response entropy and the average response entropy on trials involving repeated presentations of the same stimulus:

I(R;S) = H(R) - H(R|S)

Subtracting the entropy obtained when the stimulus does not change (i.e., H(R|S)) removes from the total entropy the contribution of response variability that is not associated with the identity of the stimulus. This answers the second question and gives a more robust answer than using the response variance, which is appropriate only if the assumed distributions are Gaussian and therefore fully described by their mean and variance.

To clarify this we can state three examples:

• If the response R of the neuron is not affected by the stimulus S (i.e., p(r|s) = p(r)), the response contains no information about the stimulus: when R is statistically independent of S, the mutual information is zero.

• If the neuron produces a distinct response for each stimulus (i.e., p(r|s) = 1 if r = r_s), the response R contains all the information about the stimulus S (i.e., I(R;S) = H(S) = H(R)).

• Finally, assume that the stimulus S takes two possible values, s+ and s−, and that the neuron responds with two spike rates, r+ and r−. Assume that s+ is associated with r+ and s− with r−, and that the


probability of an incorrect response is Pe, meaning that for the correct responses P[r+|s+] = P[r−|s−] = 1 − Pe, and for the incorrect responses P[r+|s−] = P[r−|s+] = Pe. Assume that the two stimuli are presented with equal probability, so that P[r+] = P[r−] = 1/2. We can calculate the noise entropy as:

H(R|S) = -(1 - P_e)\log(1 - P_e) - P_e \log P_e

Thus, the mutual information is

I_m = 1 + (1 - P_e)\log(1 - P_e) + P_e \log P_e

Hence we can see that when the encoding is error-free (Pe = 0), the mutual information is one bit, which is equal to both the full response entropy and the stimulus entropy. When the encoding is random (Pe = 1/2), the mutual information goes to zero. See figure (3).
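As a quick numerical check of the expressions above, the following minimal Python sketch (our own notation, using base-2 logarithms so that entropies are in bits) evaluates the noise entropy and the mutual information of this binary encoding for a few error probabilities:

import numpy as np

def binary_mutual_information(pe):
    """Mutual information (bits) of the binary stimulus/response example:
    I = H(R) - H(R|S), with equiprobable stimuli so that H(R) = 1 bit."""
    if pe in (0.0, 1.0):
        return 1.0  # error-free (or fully reversed) encoding carries 1 bit
    h_noise = -(1 - pe) * np.log2(1 - pe) - pe * np.log2(pe)  # H(R|S)
    return 1.0 - h_noise

for pe in [0.0, 0.1, 0.25, 0.5]:
    print(f"Pe = {pe:4.2f}  ->  I(R;S) = {binary_mutual_information(pe):.3f} bits")

For Pe = 0 the sketch reproduces the one-bit result, and for Pe = 1/2 the mutual information vanishes, matching figure (3).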

Fig. 4. Flow chart of how to measure the channel capacity of a neuron. The same stimulus is presented n times while the responses Ri are measured (left). These responses are averaged to obtain the average response Ravg. The differences between each Ri and Ravg become the noise traces Ni (middle). These are Fourier-transformed to the noise power spectra Ni(f) (right), which can be averaged as well. Bottom left: power spectrum of the mean response (red) together with the mean power spectrum of the noise (yellow). Bottom right: the ratio of these two functions, the so-called signal-to-noise ratio or SNR, together with the cumulative information rate.

One of the most interesting problems for neuroscientists is to determine the information contained in a neuron or a neuronal system. Many complications arise from the vast number of conditions on the time-varying stimuli, the type of the system of neurons (parallel, serial, etc.), and the fact that responses might depend on the history of the stimulus (time delay). A method called the upper bound method, described in [3,12], addresses this problem using information theory. The steps of this method are as follows (see figure (4)):

• Repetition: the same stimulus is presented n times and the response to each presentation is measured; the responses are then averaged.

• The individual noise traces are calculated by subtracting the average response from each individual response.

• The individual noise power spectra are computed by means of the Fourier transform of the individual noise traces.

• The average noise power spectrum and the average response power spectrum are computed.

• The signal-to-noise ratio, defined as the ratio of the response spectrum to the noise spectrum, is computed.

• The mutual information rate is calculated as follows:

I(S;R) = \int_{0}^{k} \log\left[1 + \mathrm{SNR}(f)\right] df
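The following Python sketch illustrates this procedure on repeated-trial data (a minimal sketch under our own assumptions about the data layout and sampling; it is not the exact pipeline of [3,12]):

import numpy as np

def upper_bound_info_rate(trials, dt):
    """Upper-bound estimate of the information rate (bits/s) from n repeated
    responses to the same stimulus.
    trials: array of shape (n_trials, n_samples); dt: sample interval in seconds."""
    mean_resp = trials.mean(axis=0)                     # average response
    noise = trials - mean_resp                          # individual noise traces
    freqs = np.fft.rfftfreq(trials.shape[1], dt)
    signal_psd = np.abs(np.fft.rfft(mean_resp)) ** 2    # response power spectrum
    noise_psd = np.mean(np.abs(np.fft.rfft(noise, axis=1)) ** 2, axis=0)  # avg noise spectrum
    snr = signal_psd / (noise_psd + 1e-12)              # signal-to-noise ratio per frequency
    return np.trapz(np.log2(1.0 + snr), freqs)          # integrate log2(1 + SNR) over frequency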

This method has many advantages:

• It is efficient because it relies on strong assumptions, such as the signals being Gaussian; however, enough data are needed to validate those assumptions.

• It calculates an upper bound on the transferred information.

• If the response and the neuronal noise have white-noise characteristics, then the stimuli are optimally encoded.

In the above discussion we portrayed how information theory may help answer one of the important questions in neural coding. However, we need to understand the limits of an information-theoretic approach in neural coding; in fact, there are two main limitations. One important concern is the difference between neural decoding algorithms and information theory. The main intention of decoding algorithms in neural coding is to predict which stimulus S produced a particular response R. An example is Bayesian decoding, which calculates the probability that stimulus s was present given a response r, p(s|r) = p(r|s)p(s)/p(r), and which may cause a loss of information. Information theory, in contrast, aims to quantify the overall knowledge about the stimulus under study, which requires a huge amount of data and is difficult to apply to large neuron populations such as the brain's complex circuitry.


Fig. 5. (A) Detected decaying signal in the xy plane from the receiver coil. (B) Proton spinning in the presence of a static B0 field.

Fig. 6. (A) and (B) Gradient fields in the X and Y directions. (C) Proton returning to thermal equilibrium parallel to the Z axis after the RF pulse.

IV. MAGNETIC RESONANCE IMAGING

In MRI we try to see a bigger picture by imaging the anatomical structure of a specified part of the body. In this paper more focus is given to brain applications, since the brain constitutes the main part of the Central Nervous System. Magnetic Resonance Imaging, or MRI, is considered a non-invasive (non-ionizing) technique for imaging the human body. MRI is based on the phenomenon of Nuclear Magnetic Resonance (NMR): in the presence of a static magnetic field, the hydrogen proton in the nucleus of the atom will precess around the axis of the applied static field with a frequency ω called the Larmor frequency, ω = γB0, where γ is a property of the hydrogen atom called the gyromagnetic ratio and B0 is the intensity of the static field, figure (5-B). Due to the spinning of a huge number of hydrogen atoms in each cubic millimetre, a net magnetization moment M0 is generated in the direction of the static field.
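As a rough worked example (textbook values assumed for illustration), the proton gyromagnetic ratio gives γ/2π ≈ 42.58 MHz/T, so at a typical clinical field strength of B0 = 3 T the Larmor frequency is f = γB0/2π ≈ 127.7 MHz, i.e., in the radio-frequency range; this is why radio-frequency pulses are used for excitation, as described next.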

However, we cannot measure this frequency along the longitudinal axis (the axis parallel to the applied static field); we can only measure it when the proton spins in the plane transverse to the static field. To achieve this, we apply a radio-frequency (RF) pulse with a frequency equal to the Larmor frequency. The RF pulse perturbs the hydrogen atom and tips the proton into the transverse plane; in this position we can detect a signal from the spinning proton by simply placing a coil in this plane, figure (5-A). Since the hydrogen proton is spinning, it induces a voltage in the coil (Faraday's law), and we can measure the resulting signal. The proton's perturbation does not last forever; the proton soon returns to its thermal equilibrium, spinning parallel to the longitudinal axis, which causes the detected signal from the coil to fade, figure (6-C). The frequency of the induced signal can be identified by means of the Fourier transform, which reveals the frequency components of the captured signal. Since the Larmor frequency depends on the applied field intensity, we can determine the location of any point in space by applying a gradient field that varies linearly in the required direction, superimposed on the main static field, figure (6-A,B). Hence the Larmor frequency of every point in space differs depending on its location. By transmitting RF pulses in specific directions and time intervals with specific frequencies, we can specify which location in space to excite or perturb, and hence we can detect signals coming from that specific location.

It is well known that about 70% of the human body's mass is water, which contains a lot of hydrogen. If we


Fig. 7. Slice selection by means of an RF pulse in the required direction.

want to image a certain volume (for example the human brain), we first have to choose our slice orientation. As shown in figure (7), an RF pulse determines the slice of our choice. The chosen slice is then divided into a two-dimensional grid and scanned voxel by voxel (a voxel is the volume unit in 3D space, analogous to a pixel in 2D) using repeated RF pulses in the chosen slice's plane. The scanning process simply gathers the signal generated by each voxel. It has been shown that the detected signals form the two-dimensional Fourier transform of the selected slice image. Hence, to reconstruct the required image, we need to compute the inverse Fourier transform of the scanned slice space (also called k-space); for more information please review [21]. The successive application of RF pulses and gradient fields in different directions and time periods is called a "pulse sequence"; several pulse sequences have been developed to reveal different characteristics of human tissues and chemical structures.
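As a toy illustration of this reconstruction step (a minimal sketch only; a real scanner pipeline also involves coil combination, filtering, and various corrections), a fully sampled k-space matrix can be turned into a magnitude image with an inverse 2D FFT:

import numpy as np

def reconstruct_slice(kspace):
    """Reconstruct a 2D image slice from fully sampled k-space data.
    kspace: complex 2D array with the DC component at the centre."""
    image = np.fft.ifft2(np.fft.ifftshift(kspace))   # inverse 2D Fourier transform
    return np.abs(np.fft.fftshift(image))             # magnitude image, re-centred

# toy usage with random data standing in for acquired k-space samples
kspace = np.random.randn(256, 256) + 1j * np.random.randn(256, 256)
img = reconstruct_slice(kspace)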

A. Information Theory and MRI:

Unlike neural coding, where information theory plays a fundamental part, in MRI and its various techniques information theory is used only to solve specific parts of different problems. One of these issues in MRI is image registration, the process of transforming the acquired images into a new coordinate system. The process is important for surgical applications. For example, in neurosurgery it is currently useful to identify tumours with magnetic resonance images, yet the established stereotaxy technology (a method for locating points within the brain using an external three-dimensional frame of reference, usually based on a Cartesian coordinate system) uses computed tomography (CT) images. The ability to register images between the two different imaging techniques allows one to transfer the coordinates of tumours from the MR images into the CT stereotaxy.

In this problem we want to register the MR and CT images taken of the same individual. Although both images come from the same anatomical source (the same person), they are very different; in fact, MR and CT are useful in conjunction precisely because they are different. We need to construct a function F() that approximately predicts the CT value from the corresponding MR value; using F() we could evaluate registrations by computing F(MR). A method to find F is presented in [26]; it works by maximizing the mutual information that one volumetric image provides about the other. The method defines a reference volume denoted by u(x), where x is the coordinate of a voxel, and a test volume denoted by v(x). T is a transformation from the coordinate frame of the reference volume to the test


Fig. 8. Three orthogonal central slices of the MRI data are shown with the edges from the registered and reformatted CT data overlaid.

volume, so v(T(x)) is the test-volume voxel associated with the reference-volume voxel u(x). The problem is to find the transformation \hat{T} that maximizes the mutual information between u(x) and v(T(x)):

\hat{T} = \arg\max_{T}\, I\big(u(x);\, v(T(x))\big)

where x is treated as a random variable over coordinate locations in the reference volume. In this problem the mutual information is defined in terms of entropies:

I\big(u(x); v(T(x))\big) = h\big(u(x)\big) + h\big(v(T(x))\big) - h\big(u(x), v(T(x))\big)

where h(u(x)) is the entropy of the reference volume and is not a function of T; h(v(T(x))) is the entropy of the part of the test volume into which the reference volume projects, and it encourages transformations that project u into complex parts of v. The third term, the joint entropy h(u(x), v(T(x))), contributes when u and v are functionally related; it encourages transformations for which u explains v well.

However, in reality we do not have access to the probability density functions of the random variables u(x) and v(T(x)). In practice, when registering medical image data, the entropy of a random variable is estimated by approximating the probability densities as a superposition of functions centred on the elements of a sample. This technique has many advantages over previous methods used to solve the same problem. First, it works directly with the image intensity data, so no pre-processing or segmentation is required, whereas other techniques depend on the a priori quality of a segmentation process and on many assumptions about the nature of the signals. It is also more flexible and robust than intensity-based methods such as the correlation technique, which minimizes a summed quadratic penalty on the difference between the two images' intensity data. Unlike the correlation method, which is sensitive to negating the intensity of one of the two signals being compared, mutual information is not affected by the negation of either signal. Moreover, since the approach is based on stochastic approximation, it can be implemented efficiently.
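To make the objective concrete, the following Python sketch evaluates the mutual information between a reference image and a transformed test image using a simple joint histogram (our own illustrative version; [26] uses density estimates built from samples and stochastic approximation rather than histograms):

import numpy as np

def mutual_information(ref, test, bins=32):
    """Histogram-based estimate of I(u; v) between two images of equal shape."""
    joint, _, _ = np.histogram2d(ref.ravel(), test.ravel(), bins=bins)
    pxy = joint / joint.sum()          # joint distribution p(u, v)
    px = pxy.sum(axis=1)               # marginal p(u)
    py = pxy.sum(axis=0)               # marginal p(v)
    nz = pxy > 0                       # avoid log(0)
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz]))

# a registration search would evaluate mutual_information(u, v_transformed)
# for candidate transformations T and keep the one with the highest value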

B. Diffusion Tensor Imaging:

Diffusion Tensor Magnetic Resonance Imaging (DT-MRI or DTI) is a relatively recent imaging technique (introduced around 1990) based on conventional MRI. Diffusion refers to the random (Brownian) motion of molecules in a fluid (liquid or gas). Water diffusion in an isotropic medium was first described by Einstein and modelled with a Gaussian displacement model that is equally likely in every direction. For a better understanding of the phenomenon, we can imagine dropping a water droplet into a glass of water; Einstein introduced the "displacement distribution", based on a probabilistic framework (a Gaussian distribution), to describe how far an ensemble of particles (the droplet) undergoing diffusion will travel within a particular time frame, or equivalently, the likelihood that a single given particle will undergo that displacement. Provided the number of particles is large and they are free to diffuse, the mean squared displacement of the molecules, ⟨r²⟩, from their starting point


Fig. 9. (A) Spherical model in the isotropic case. (B) Ellipsoid model in the anisotropic case, with the 3 eigenvectors as the main axes and the square roots of the eigenvalues as scaling factors. (C) Examples of tensors: a) a sphere, b) a horizontally oriented ellipsoid, c) an inclined ellipsoid.

over a time t, averaged over all the molecules in the sample, is directly proportional to the observation time t. The constant of proportionality is the self-diffusion coefficient D; the relationship is ⟨r²⟩ = 6Dt, and the units of D are distance²/time (e.g., m²/s). The displacement distribution takes a Gaussian form, peaked at zero displacement and with equal probability of displacing a given distance in any direction from the origin, see figure (9-A).
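As a rough worked example (with an assumed textbook value), free water at body temperature has D ≈ 3 × 10⁻⁹ m²/s, so over a typical measurement time of t = 50 ms the root-mean-square displacement is √(6Dt) ≈ 30 μm, which is comparable to cellular dimensions; this is what makes diffusion measurements sensitive to tissue microstructure.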

However, in human tissues the medium is anisotropic; hence we can no longer model the diffusion as a spherical Gaussian distribution, since the diffusion coefficient is not the same in every direction. Tissues and fibers act as barriers to water molecules and constrain their diffusion to preferred directions. The diffusion is now represented by an ellipsoid in 3D (not a sphere) and described by a mathematical object called a tensor. The diffusion tensor D is represented by a symmetric 3 × 3 matrix that describes the diffusion in different directions:

D = \begin{bmatrix} D_{XX} & D_{XY} & D_{XZ} \\ D_{YX} & D_{YY} & D_{YZ} \\ D_{ZX} & D_{ZY} & D_{ZZ} \end{bmatrix}

The diagonal elements of D correspond to the diffusion along three orthogonal axes, while the off-diagonal elements represent correlations between displacements along those orthogonal axes. Note that the matrix contains only 6 unknowns since it is symmetric, and it can be considered the 3D covariance matrix of the displacements in a given time. Since the ellipsoid's axes might not be exactly parallel to the measurement axes, we usually describe it by the eigenvalues and eigenvectors of D: the ellipsoid's principal axes are given by the eigenvectors, and their lengths are given by the diffusion distance in a given time t, so the ellipsoid axes are scaled according to the square roots of the eigenvalues, figure (9-B). Since human tissues, and especially the brain, contain a lot of water molecules, tracking their movement allows us to infer neural connectivity and directionality in the brain's white matter. Each voxel is characterized by a preferred direction of diffusion described by a diffusion tensor D. The diffusion data are collected after applying a specific pulse sequence developed by Stejskal and Tanner in 1965 that reveals the diffusive behaviour of water molecules in tissues [29]. The data must be collected along at least 6 different directions to allow computing the 6 unknowns of D from the following equation:

S = S_0\, e^{-b\, g^{T} D\, g}

where S is the acquired signal, S0 is the ordinary MRI signal with no diffusion weighting, b is a parameter that depends on the experiment and the acquisition direction, g is a unit vector describing the direction of the diffusion-encoding gradient (magnetic field), T denotes transpose, and D is the diffusion tensor. For more information about DTI please review [20].
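As an illustrative sketch of this fit (our own simplified least-squares version with hypothetical inputs; practical DTI fitting must also deal with noise, outliers, and enforcing positive definiteness), the six unknowns can be estimated by noting that log(S/S0) = −b gᵀDg is linear in the tensor elements:

import numpy as np

def fit_diffusion_tensor(signals, s0, bvals, gradients):
    """Least-squares fit of the 6 unique elements of D.
    signals: (m,) diffusion-weighted signals; s0: unweighted signal;
    bvals: (m,) b-values; gradients: (m, 3) unit gradient directions."""
    g = np.asarray(gradients, dtype=float)
    # design matrix rows: [gx^2, gy^2, gz^2, 2*gx*gy, 2*gx*gz, 2*gy*gz]
    A = np.column_stack([g[:, 0] ** 2, g[:, 1] ** 2, g[:, 2] ** 2,
                         2 * g[:, 0] * g[:, 1],
                         2 * g[:, 0] * g[:, 2],
                         2 * g[:, 1] * g[:, 2]])
    y = -np.log(np.asarray(signals) / s0) / np.asarray(bvals)
    dxx, dyy, dzz, dxy, dxz, dyz = np.linalg.lstsq(A, y, rcond=None)[0]
    D = np.array([[dxx, dxy, dxz],
                  [dxy, dyy, dyz],
                  [dxz, dyz, dzz]])
    eigvals, eigvecs = np.linalg.eigh(D)   # ellipsoid axis lengths and directions
    return D, eigvals, eigvecs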


C. Information Theory and DTI:

One of the main applications of DTI is tractography, the reconstruction of brain fibers using the diffusion tensor at each voxel, which gives information about the circuitry of the brain. Disease and injury can change this circuitry and may cause cuts and deterioration. A recent study by A. D. Leow, S. Zhu, and L. Zhan [7] introduced the concept of the "Tensor Distribution Function" or TDF. So far we have described the diffusion model as a single Gaussian distribution, assuming that fibers do not cross and only flow in parallel bundles; in reality, however, fibers do cross in certain regions. The TDF is a distribution function over tensor space that models fiber crossing and non-Gaussianity in diffusion MR images; to resolve the fiber orientations, we find the peaks of this distribution.

In the same paper, "exponential isotropy" was also introduced as a measure of the degree of anisotropy of the diffusion process based on the TDF; it quantifies the overall anisotropy per voxel. The Shannon entropy H of a probability distribution is a measure of its randomness, which for the TDF is inversely related to the certainty of the estimated dominant fiber directions, i.e., it is a measure of the isotropy of the diffusion at the specified voxel:

H[p(D)] = -\int_{D \in \mathcal{D}} p(D) \log p(D)\, dD

where p(D) is the TDF defined per voxel and \mathcal{D} is the six-parameter space of diffusion tensors. The "exponential isotropy" or EI is then proposed as follows:

EI[p(D)] = \exp\left(-\int_{D \in \mathcal{D}} p(D) \log p(D)\, dD\right) = e^{H[p(D)]}

EI is a measure of isotropy rather than anisotropy. In the ideal single-fiber case, where p(D) is zero everywhere except at one point in the tensor space \mathcal{D}, the entropy is zero and EI = 1. In the ideal case of two crossing fibers with equal weights, p(D) takes the value 0.5 at exactly two points in tensor space, so EI = 2. Thus, in general, the EI is expected to take a value approximately proportional to the number of dominant fibers.
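The following toy Python sketch (our own discrete approximation; in [7] the TDF itself is estimated from the diffusion-weighted signals) computes the exponential isotropy of a TDF represented as weights over a finite set of candidate tensors:

import numpy as np

def exponential_isotropy(weights):
    """EI = exp(H) for a discretized tensor distribution function.
    weights: probabilities assigned to a finite set of candidate tensors."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    nz = p > 0
    entropy = -np.sum(p[nz] * np.log(p[nz]))  # natural-log Shannon entropy
    return np.exp(entropy)

print(exponential_isotropy([1.0]))        # single dominant fiber      -> 1.0
print(exponential_isotropy([0.5, 0.5]))   # two equal crossing fibers  -> 2.0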

An application based on tractography is DTI segmentation, where we want to identify well-known fiber bundles and structures in the brain, such as the corpus callosum, figure (10). This identification helps in diagnosing many diseases, such as stroke, tumours, Alzheimer's disease, and schizophrenia. Many methods have been used to define a distance measure between diffusion tensors for this problem. A paper by Zhizhou Wang and Baba C. Vemuri [19] used the KL divergence to measure the distance between tensors. One of the characteristics of D is that it is symmetric positive definite (SPD) at each voxel and can be considered the covariance matrix of a local Gaussian distribution, so the Kullback-Leibler (KL) divergence provides a measure of dissimilarity between the SPD tensors of neighbouring voxels. From Einstein's equation for water diffusion, given a diffusion tensor D, the displacement r of a water molecule from a given location after time t is a random variable with the following zero-mean probability density function:

p(r \mid t, D) = \frac{1}{\sqrt{(4\pi t)^{3}\, |D|}} \exp\left(-\frac{r^{T} D^{-1} r}{4t}\right)

so the covariance matrix of r is 2tD. A distance between diffusion tensors can then be derived from an information-theoretic distance between Gaussian distributions. The Kullback-Leibler (KL) divergence between a pair of probability density functions p(x) and q(x) is given by:

KL(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)}\, dx

However, KL divergence is not symmetric; we use the J-divergence to symmetrize it:

J(p, q) = \tfrac{1}{2}\big(KL(p \,\|\, q) + KL(q \,\|\, p)\big)

The proposed diffusion tensor "distance" is the square root of the J-divergence of the corresponding Gaussian distributions:


Fig. 10. Corpus Callosum fibers reconstructed using Tractography.

d(D_1, D_2) = \sqrt{J\big(p(r \mid t, D_1),\, p(r \mid t, D_2)\big)}

This distance is used as a novel discriminant measure in the "region-based active contour model" [22] approach to image segmentation, which assumes the existence of different regions in a given image and tries to find contours that act as boundaries surrounding those regions. Other distances have also been used, such as the Euclidean tensor distance obtained from the Frobenius norm; although much simpler, they lack affine invariance (invariance of geometric properties under affine transformations, i.e., non-singular linear transformations and translations), which the J-divergence distance possesses and which is strongly desirable for image segmentation.
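A small Python sketch of this distance is given below (our own direct implementation; it uses the closed form of the KL divergence between zero-mean Gaussians, in which the factor 2t cancels, so the distance depends only on the tensors themselves):

import numpy as np

def tensor_distance(D1, D2):
    """Square root of the J-divergence between the zero-mean Gaussians
    whose covariances are proportional to the SPD tensors D1 and D2."""
    n = D1.shape[0]
    a = np.trace(np.linalg.solve(D2, D1))   # tr(D2^{-1} D1)
    b = np.trace(np.linalg.solve(D1, D2))   # tr(D1^{-1} D2)
    J = 0.25 * (a + b - 2 * n)              # log-determinant terms cancel
    return np.sqrt(J)

# identical tensors give distance zero
print(tensor_distance(np.eye(3), np.eye(3)))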

D. Functional MRI:

Functional Magnetic Resonance Imaging, or fMRI, is a type of MRI scan developed in the 1990s that is concerned with localizing the currently active areas in the brain (whether of an animal or a human) during a specific task (or at rest). The main idea in fMRI is to measure the change in oxygenated blood and its flow inside the brain: active areas need more oxygenated blood than inactive areas, and fMRI measures the blood-oxygen-level-dependent (BOLD) signal through a physiological process called the hemodynamic response; more information can be found at [27,28].

E. Information Theory and fMRI:

Functional MRI is now widely used in diagnosing diseases such as Alzheimer's disease; however, it is even more widely used in research aimed at detecting correlations between activated brain regions and the tasks performed by the subject under study. Recent research has tried to use fMRI data to understand brain connectivity, and many types of statistical analysis have been applied with the goal of finding similarity in active regions. One study [23] used mutual information to understand how the brain categorizes natural scenes. Mutual information was computed between brain voxels to select those that are highly informative for each


experiment. Among thousands of brain voxels, selecting the relevant voxels that help predict the viewing condition from the neural response in the brain is a critical step. In each experiment, fMRI was used to locate the active brain regions when a specific task was requested of the subject (viewing different natural scene images). The mutual information is computed as follows:

I(V; L) = -\frac{1}{n}\sum_{i=1}^{n} \log p_k(V_i) \;-\; \frac{1}{n}\sum_{i=1}^{n} \log p_k(L_i) \;+\; \frac{1}{n}\sum_{i=1}^{n} \log p_k(V_i, L_i)

where n is the number of data points observed, L_i is a random variable corresponding to the experiment label for the ith data point, V_i is a 7-dimensional random variable, V_i = (v_i1, v_i2, ..., v_i7), with each entry corresponding to one of 7 voxels' values at data point i, and p_k(x_i) is the probability density estimated at x_i (x can be V or L) using the k-nearest-neighbour method (kNN is a method for classifying objects based on the closest training examples in the feature space [30]). Selecting the locations with the highest mutual information helps choose relevant voxels that can later be used in machine learning algorithms, giving a spatial map of which voxels are informative with respect to the required task.
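A rough Python sketch of this kind of estimator is shown below (our own simplified version: it uses a plain k-nearest-neighbour density estimate and treats all variables as continuous, which only approximates the procedure in [23]; V and L are arrays with one row per data point):

import numpy as np
from math import gamma, pi

def knn_entropy(X, k=5):
    """Plug-in entropy estimate (nats) using the k-nearest-neighbour density
    estimate p_k(x_i) = k / (n * V_d * r_k(x_i)^d)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, d = X.shape
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(dists, np.inf)
    r_k = np.sort(dists, axis=1)[:, k - 1]        # distance to the k-th neighbour
    v_d = pi ** (d / 2) / gamma(d / 2 + 1)        # volume of the unit d-ball
    log_p = np.log(k) - np.log(n) - np.log(v_d) - d * np.log(r_k)
    return -log_p.mean()

def mi_knn(V, L, k=5):
    """I(V; L) = H(V) + H(L) - H(V, L) from kNN entropy estimates."""
    return knn_entropy(V, k) + knn_entropy(L, k) - knn_entropy(np.hstack([V, L]), k)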

After choosing the relevant voxels, the study uses them and models the region lying between any two distributed brain regions as a communication channel. The mutual information across those two regions is then measured, which gives a measure of their relationship from a functional point of view. A multivariate information analysis is computed to estimate the shared information between two different sets of voxels as follows:

I(V; S \mid L) = H(V \mid L) + H(S \mid L) - H(V, S \mid L)

where V and S are random variables for two sets of 7 voxels and L is the experiment label.

Not only did this method confirm the ordinarily expected connectivity, but new connections were also revealed, especially for areas that have multiple connections with various parts of the brain, particularly those associated with language processing. This shows the strong effect of using mutual information in a problem such as brain connectivity. Perhaps a key aspect of this method is that it treats sets of voxels (7 at a time) in a multivariate information analysis, unlike older methods that treated individual voxels as independent.

V. CONCLUSION

In this paper we have given a brief description of an important application area for information theory, namely neuroscience. We have seen how fundamental its role is in neural coding, providing basic definitions for the information embedded in neural responses. We have also discussed how information theory appears in other applications related to neuroscience, such as MRI, DTI, and fMRI, and how efficient and effective the results can be when it is used. Many other researchers [18,17,24] have used information-theoretic analysis techniques to solve problems in neuroscience. However, we need to know our limits when using this powerful mathematical tool, which was initially built to describe communication problems; conclusions must be studied carefully before confirming a specific result. Many problems remain unsolved and many questions unanswered, especially in the field of neural coding, where information theory can play a great part in answering them, but care must be taken to identify where such a powerful tool can appropriately be used.

VI. REFERENCES:

1) C. E. Shannon: A mathematical theory of communication. Bell System Technical Journal, 27:379-423, 1948.

2) Thomas D. Schneider: The Founder of Information Theory Used Biology to Formulate the Channel Capacity, IEEE Engineering in Medicine and Biology Magazine, 25(1):30-33, 2006.

3) Alexander Borst and Frédéric E. Theunissen: Information theory and neural coding, Nature Neuroscience, Vol. 2, No. 11, Nov. 1999.

4) Don H. Johnson: Information Theory and Neuroscience: Why is the intersection so small? Information Theory Workshop, ITW '08, IEEE, 2008.

5) Ralph Linsker: Perceptual neural organization: some approaches based on network models and information theory, Annu. Rev. Neurosci., 13:257-281, 1990.


6) W. Bialek, F. Rieke, R. R. de Ruyter van Steveninck, and D. Warland: Reading a neural code. Science (N.Y.), 252(5014):1854-1857, June 1991.

7) A. D. Leow, S. Zhu, L. Zhan, K. McMahon, G. I. de Zubicaray, M. Meredith, M. J. Wright, A. W. Toga, and P. M. Thompson: The Tensor Distribution Function, Magnetic Resonance in Medicine, 61:205-214, 2009.

8) Tournier, J. D., Calamante, F., Connelly, A.: Improved characterization of crossing fibers: optimization of spherical deconvolution parameters using a minimum entropy principle. Proceedings of the 13th Annual Meeting of the ISMRM, Miami, USA, p. 384, 2005.

9) R. Salvador, A. Martínez, E. Pomarol-Clotet, S. Sarró, J. Suckling, and E. Bullmore: Frequency based mutual information measures between clusters of brain regions in functional magnetic resonance imaging. NeuroImage, 35(1):83-88, March 2007.

10) Peter Dayan, L. F. Abbott: Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, MIT Press, 1st edition, Dec. 2001.

11) Pillow, J. W. and Simoncelli, E. P.: Dimensionality reduction in neural models: an information-theoretic generalization of spike-triggered average and covariance analysis. Journal of Vision, 6(4):414-428, 2006.

12) R. Q. Quiroga, S. Panzeri: Extracting Information from Neuronal Populations: Information Theory and Decoding Approaches, Nature Reviews Neuroscience, (10), pp. 173-185, 2009.

13) Koepsell, Kilian; Sommer, Friedrich T.: Information transmission in oscillatory neural activity. Biological Cybernetics, Vol. 99, Issue 4/5, pp. 403-416, Nov. 2008.

14) S. Haykin: Neural Networks - A Comprehensive Foundation, Chapter 10, Prentice Hall, 1999.

15) A. C. C. Coolen, R. Kühn, P. Sollich: Theory of Neural Information Processing Systems, Part III, Oxford University Press, 2005.

16) Adrian, E. D. and Zotterman, Y.: The impulses produced by sensory nerve endings: Part II: The response of a single end organ. Journal of Physiology, 61:151-171, 1926.

17) Niki, K., Hatou, J., Tahara, I.: Structure analysis for fMRI brain data by using mutual information and interaction, Vol. 3, pp. 928-933, 1999.

18) W. Senn, K. Wyler, H. P. Clamann, J. Kleinle, H.-R. Lüscher, L. Müller: Size principle and information theory, Biological Cybernetics, 76:11-22, 1997.

19) Zhizhou Wang and Baba C. Vemuri: DTI Segmentation Using an Information Theoretic Tensor Dissimilarity Measure, IEEE Transactions on Medical Imaging, Vol. 24, No. 10, pp. 1267-1277, October 2005.

20) Heidi Johansen-Berg, Timothy E. J. Behrens: Diffusion MRI: From Quantitative Measurement to In-vivo Neuroanatomy, Academic Press, 2009.

21) Vadim Kuperman: Magnetic Resonance Imaging: Physical Principles and Applications, Academic Press, 2000.

22) Rémi Ronfard: Region-Based Strategies for Active Contour Models, International Journal of Computer Vision, 13:2, 229-251, 1994.

23) Barry Chai, Dirk B. Walther, Diane M. Beck, Li Fei-Fei: Exploring Functional Connectivity of the Human Brain using Multivariate Information Analysis, NIPS, 2009.

24) Jack W. Judy, Allan J. Tobin: NeuroEngineering - The Integration of Neuroscience with Engineering, International Conference on Engineering Education, 2003.

25) Giovanni de Marco: Effective Connectivity and Brain Modeling by fMRI, Advanced Studies in Biology, Vol. 1, No. 3, 139-144, 2009.

26) William M. Wells, Paul Viola, Hideki Atsumi, Shin Nakajima: Multi-Modal Volume Registration by Maximization of Mutual Information, Medical Image Analysis, Volume 1, Issue 1, Pages 35-51, March 1996.

27) P. Jezzard, P. M. Matthews, S. M. Smith: Functional MRI: An Introduction to Methods, Oxford University Press, 2003.

28) Functional magnetic resonance imaging. http://en.wikipedia.org/wiki/Fmri

29) Stejskal, E. O. and Tanner, J. E.: Spin diffusion measurements: spin echoes in the presence of a time-dependent field gradient, J. Chem. Phys., 42:288-292, 1965.

30) K-nearest neighbour algorithm. http://en.wikipedia.org/wiki/KNN