
Bayesian inference of structural brain networks

Max Hinne a,b, Tom Heskes a, Christian F. Beckmann b, Marcel A. J. van Gerven b

a Radboud University Nijmegen, Institute for Computing and Information Sciences, Nijmegen, The Netherlands
b Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Abstract

Structural brain networks are used to model white-matter connectivity between spatially segregated brain regions. The presence, location and orientation of these white matter tracts can be derived using diffusion-weighted magnetic resonance imaging in combination with probabilistic tractography. Unfortunately, as of yet, none of the existing approaches provide an undisputed way of inferring brain networks from the streamline distributions which tractography produces. State-of-the-art methods rely on an arbitrary threshold or, alternatively, yield weighted results that are difficult to interpret. In this paper, we provide a generative model that explicitly describes how structural brain networks lead to observed streamline distributions. This allows us to draw principled conclusions about brain networks, which we validate using simultaneously acquired resting-state functional MRI data. Inference may be further informed by means of a prior which combines connectivity estimates from multiple subjects. Based on this prior, we obtain networks that significantly improve on the conventional approach.

Keywords: structural connectivity, probabilistic tractography, hierarchical Bayesian model

1. Introduction

Human behavior ultimately arises through the interactions between multiple brain regions that together form networks that can be characterized in terms of structural, functional and effective connectivity (Penny et al., 2006). Structural connectivity presupposes the existence of white-matter tracts that connect spatially segregated brain regions which constrain the functional and effective connectivity between these regions. Hence, structural connectivity provides the scaffolding that is required to shape neuronal dynamics. Changes in structural brain networks have been related to various neurological disorders. For this reason, optimal inference of structural brain networks is of major importance in clinical neuroscience (Catani, 2007). Inference of these networks entails two steps. First is the estimation of the white matter tracts. The second step consists of obtaining the network that captures which regions are connected, based on the earlier identified fibre tracts. In this paper, we focus on the latter step.

For the first step, we use diffusion-weighted imaging (DWI), which is a prominent way to estimate structural connectivity of whole-brain networks in vivo. It is a variant of magnetic resonance imaging (MRI) which measures the restricted diffusion of water molecules, thereby providing an indirect measure of the presence and orientation of

∗ Corresponding author at Radboud University Nijmegen, Faculty of Science, Institute for Computing and Information Sciences, Postbus 9010, 6500 GL Nijmegen, The Netherlands. Tel.: +31 24 365 21 72, Fax: +31 24 365 27 28

Email address: [email protected] (Max Hinne)

white-matter tracts. By following the principal diffusion direction in individual voxels, streamlines can be drawn that represent the structure of fibre bundles, connecting separate regions of grey matter. This process is known as deterministic tractography (Conturo et al., 1999; Chung et al., 2010; Shu et al., 2011). Alternatively, fibres may be estimated using probabilistic tractography (Behrens et al., 2003, 2007; Friman et al., 2006; Jbabdi et al., 2007). This comprises a model for the principal diffusion direction that is then used to sample distributions of streamlines. Ultimately, the procedure results in a measure of uncertainty about where a hypothesized connection will terminate. A benefit of the probabilistic approach is that it explicitly takes uncertainty in the streamlining process into account.

Apart from studies focusing on particular tracts, much research has been devoted to the derivation of macroscopic connectivity properties, that is, whole-brain structural connectivity. Several approaches have been suggested to extract whole-brain networks from probabilistic tractography results (Robinson et al., 2008; Hagmann et al., 2007; Gong et al., 2009). Unfortunately, inference of whole-brain networks from probabilistic tractography estimates remains somewhat ad hoc. Typically the underlying brain network is derived by thresholding the streamline distribution such that counts above or below threshold are taken to reflect the presence or absence of tracts, respectively. This approach is easy to implement but it has a number of issues. First, the threshold is arbitrarily chosen to have a particular value. In a substantial part of the literature, the threshold that is used to transform the streamline distribution into a network is actually set to zero (Hagmann et al., 2007, 2008; Zalesky et al., 2010; Vaessen et al., 2010; Chung et al., 2011). However, probabilistic streamlining depends on the arbitrary number of samples that are drawn per voxel. This implies that, as more samples are drawn, more brain regions are likely to eventually become connected given a threshold at zero. Alternatively, the number of streamlines can be interpreted as connection weight (Bassett et al., 2011; Zalesky et al., 2010; Robinson et al., 2010), or a relative threshold can be applied (Kaden et al., 2007). This way, the relative differences between connections remain respected. Unfortunately, the connection weights do not have a straightforward (probabilistic) interpretation. Simply normalizing these weights does not yield a true notion of connection probability. At most, it can be regarded as the conditional probability that a streamline ends in a particular voxel given the starting point of the streamline. In the case of a streamline distribution with, say, half of the streamlines starting at node A ending in node B, and the other half ending in node C, normalized streamline counts cannot distinguish between one edge with an uncertain end point, or two edges with definite end points. Finally, several graph-theoretical measures such as characteristic path length and clustering coefficient are ill-defined for non-binary networks.

Preprint submitted to NeuroImage November 29, 2012

In general, it is problematic to use thresholding since it ignores the relative differences between streamline counts. Intuitively, one would expect that if, say, ninety percent of the streamlines connect from voxel A to voxel B, and ten percent connect voxel A to voxel C, then at the least the former has a higher probability of having a corresponding edge in the network than the latter, but both edges are possible as well. This is related to the burstiness phenomenon of words in document retrieval, where the occurrence of a rare word in a document makes its repeated occurrence more likely (Xu and Akella, 2010). Summarizing, the issue with thresholding approaches is that they consider each tract in isolation. This ignores the information that can be gained from the possible symmetry in streamline counts, as well as from the relative differences within a streamline distribution.

Another important observation is that the mentioned approaches do not easily support the integration of probabilistic streamlining data with other sources of information. Data is often not collected in isolation but rather acquired for multiple subjects, potentially using a multitude of imaging techniques. Multi-modal data fusion is needed in order to provide a coherent picture of brain function (Horwitz and Poeppel, 2002; Groves et al., 2011). The integration of multi-subject data is required for group-level inference, where the interest is in estimating a network that characterizes a particular population, for example, when comparing patients with controls in a clinical setting (Simpson et al., 2011).

In the following, we provide a Bayesian framework for the inference of whole-brain networks from streamline distributions. In our approach, we consider the distribution of (binary) networks that are supported by our data, instead of generating a single network based on an arbitrary threshold. Our approach relies on defining a generative model for whole-brain networks which extends recent work on network inference in systems biology (Mukherjee and Speed, 2008) and consists of two ingredients. First, a network prior is defined in terms of the classical Erdős-Rényi model (Erdős and Rényi, 1960). This prior is later extended to handle multi-subject data, capturing the notion that different subjects' brains tend to be similar. Second, we propose a forward model based on a Dirichlet compound multinomial distribution which views the streamline distributions produced by probabilistic tractography as noisy data, thus completing the generative model.

In order to validate our Bayesian framework we make use of the often reported observation that resting-state functional connectivity reflects structural connectivity (Koch et al., 2002; Greicius et al., 2009; Honey et al., 2009; Lv et al., 2010; Skudlarski et al., 2008; Park et al., 2008; Damoiseaux and Greicius, 2009). We show that structural networks that derive from our generative model informed by the connectivity for other subjects provide a better fit to the (in)dependencies in resting-state functional MRI (rs-fMRI) data than the standard thresholding approach.

2. Material and methods

2.1. Data acquisition

Twenty healthy volunteers were scanned after giving informed written consent in accordance with the guidelines of the local ethics committee. A T1 structural scan, resting-state functional data and diffusion-weighted images were obtained using a Siemens Magnetom Trio 3T system at the Donders Centre for Cognitive Neuroimaging, Radboud University Nijmegen, The Netherlands. The rs-fMRI data were acquired at 3 Tesla using a multi-echo echo planar imaging (ME-EPI) sequence (voxel size 3.5 mm isotropic, matrix size 64×64, TR = 2000 ms, TEs = 6.9, 16.2, 25, 35 and 45 ms, 39 slices, GRAPPA factor 3, 6/8 partial Fourier). A total of 1030 volumes were obtained. An optimized acquisition order described by Cook et al. (2006) was used in the DWI protocol (voxel size 2.0 mm isotropic, matrix size 110×110, TR = 13000 ms, TE = 101 ms, 70 slices, 256 directions at b = 1500 s/mm² and 24 directions at b = 0).

2.2. Preprocessing of resting-state data

The multi-echo images obtained using the rs-fMRI acquisition protocol were combined using a custom Matlab script (MATLAB 7.7, The MathWorks Inc., Natick, MA, USA) which implements the procedure described by Poser et al. (2006) and also incorporates motion correction using functions from the SPM5 software package (Wellcome Department of Imaging Neuroscience, University College London, UK). Of the 1030 combined volumes, the first six were discarded to allow the system to reach a steady state.


Figure 1: a) Covariance matrices for the resting-state data for three randomly selected subjects. b) Axial view of RGB-FA maps for the diffusion weighted images, again for three randomly selected subjects. The nodes in the matrices are shown in the order they appear in the AAL atlas.

Tools from the Oxford FMRIB Software Library (FSL, FMRIB, Oxford, UK) were used for further processing. Brain extraction was performed using FSL BET (Smith, 2002). For each subject, probabilistic brain tissue maps were obtained using FSL FAST (Zhang et al., 2001). A zero-lag 6th order Butterworth bandpass filter was applied to the functional data to retain only frequencies between 0.01 and 0.08 Hz. After preprocessing, the fMRI data were parcellated according to the Automated Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002). Regions without voxels with gray-matter probability ≥ 0.5 were discarded. This resulted in an average region count of 115.7 ± 0.1. For these regions the functional data was summed and then standardized to have zero mean and unit standard deviation. The resulting data were used to compute the empirical covariance matrix Σ. Example covariance matrices are shown in Fig. 1a.
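The filtering and covariance steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it operates on region time series rather than voxel data, the function and variable names are hypothetical, and we read "zero-lag 6th order" as a 3rd-order Butterworth design applied forwards and backwards with `filtfilt` (which doubles the effective order and cancels the phase lag); other readings are possible.

```python
import numpy as np
from scipy.signal import butter, filtfilt

TR = 2.0            # seconds, from the acquisition protocol
fs = 1.0 / TR       # sampling frequency (0.5 Hz)
nyq = fs / 2.0

# 3rd-order design; filtfilt runs it forwards and backwards, giving a
# zero-phase response of effective order 6 (an assumption, see lead-in).
b, a = butter(3, [0.01 / nyq, 0.08 / nyq], btype="band")

def preprocess(ts):
    """ts: (T, K) array of region time series (hypothetical input).

    Returns the standardized series z and the empirical covariance Sigma.
    """
    filtered = filtfilt(b, a, ts, axis=0)
    z = (filtered - filtered.mean(axis=0)) / filtered.std(axis=0)
    T = z.shape[0]
    Sigma = z.T @ z / T   # empirical covariance of the standardized data
    return z, Sigma
```

Because the series are standardized to unit variance, the diagonal of Sigma is 1, i.e. Sigma is a correlation matrix.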

2.3. Preprocessing of diffusion imaging data

The preprocessing steps for the diffusion data were conducted using FSL FDT (Behrens et al., 2003) and consisted of correction for eddy currents and estimation of the diffusion parameters. Raw color-coded fractional anisotropy maps are shown in Fig. 1b. To obtain a measure of white-matter connectivity, we used FDT Probtrackx 2.0 (Behrens et al., 2003, 2007). As seed voxels for tractography we used those voxels that lie on the boundary between white matter and gray matter. For each of these voxels 5000 streamlines were drawn, with a maximum length of 2000 steps. The streamlines were restricted by the fractional anisotropy to prevent them from wandering around in gray matter. Streamlines in which a sharp angle (>80 degrees) occurred or that had a length less than 2 mm were discarded. The output thus obtained is a matrix N with n_ij the number of streamlines drawn from voxel i to voxel j. To transform this into the parcellated scheme as dictated by the AAL atlas, the streamlines were summed over all voxels per region, resulting in an aggregated connectivity matrix which ranges over regions instead of voxels. Regions that had been removed after preprocessing the fMRI data were removed from the aggregated connectivity matrix as well.
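The voxel-to-region aggregation described above can be sketched as a one-hot projection; the variable names (`N_voxel`, `labels`) are hypothetical and the sketch assumes each seed voxel carries a single region label.

```python
import numpy as np

def aggregate(N_voxel, labels, K):
    """Sum voxel-level streamline counts into a K x K region matrix.

    N_voxel : (V, V) streamline counts between seed voxels
    labels  : (V,) region index in 0..K-1 for each voxel (hypothetical input)
    """
    V = len(labels)
    M = np.zeros((V, K))
    M[np.arange(V), labels] = 1.0   # one-hot region membership per voxel
    # Entry (r, s) sums n_ij over all voxels i in region r and j in region s.
    return M.T @ N_voxel @ M
```

The matrix product form avoids an explicit double loop over voxel pairs and scales to whole-brain seed masks.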

2.4. Framework for structural connectivity estimation

In this section we derive our Bayesian approach to the inference of whole-brain structural networks. The quantity of interest in our framework is the posterior over structural networks represented by the adjacency matrix A given observed probabilistic streamlining data N and hyperparameters ξ. An element a_ij ∈ {0, 1} represents the absence or presence of an edge between brain regions i and j. A is taken to be a simple graph, such that a_ij = a_ji and a_ii = 0. A brain region can either be interpreted as a voxel or as an aggregation of voxels as defined by a gray matter parcellation. The posterior expresses our knowledge on structural connectivity given the data and background knowledge and is given by

    P(A | N, ξ) ∝ P(N | A, a+, a−) P(A | p),    (1)

with hyperparameters ξ = (a+, a−, p), for which an interpretation will be given later on. In the following, for convenience, we will sometimes suppress the dependence on the hyperparameters. To infer the posterior distribution, we must specify a prior P(A) and a forward model P(N | A) which together define a generative model of probabilistic streamlining data. Given these components, the posterior can be approximated using a Markov chain Monte Carlo algorithm, as described in detail in Section 2.5. We now proceed to formally define the components of the generative model as shown in Fig. 2.

2.4.1. Forward model

We begin with a specification of the forward model P(N | A). Here, we describe how the observed streamline distributions N depend on the underlying network A through latent streamline probabilities X.

Figure 2: The generative model that describes how the observed streamline distribution N depends on the (hidden) connectivity probabilities X. These in turn depend on the hyperparameters a+ and a− as well as the connectivity A, which is determined by hyperparameter p of the prior.

Assume there are K brain regions for which we want to estimate the structural connectivity. We start by considering one region i and the possible targets in which a postulated tract may terminate. Let n_ik denote the number of streamlines which start in region i and terminate in region k. We assume that n_ii = 0. Probabilistic tractography produces a distribution over target vertices n_i = (n_i1, ..., n_iK)^T by drawing S streamlines, N_i = ∑_{k=1}^K n_ik ≤ S of them ending up in a target region.¹ A particular distribution n_i depends on the streamline probabilities. That is, we expect many streamlines between two regions when there is a high streamline probability and vice versa. This is captured by expressing the probability of a distribution n_i in terms of a multinomial distribution

    P(n_i | x_i) ∝ ∏_{j=1}^K x_ij^{n_ij},

in which x_i = (x_i1, ..., x_iK) is a probability vector with ∑_j x_ij = 1. Each x_ij represents the probability of drawing a streamline from region i to region j. This streamlining probability itself depends on whether or not there actually exists a physical tract between region i and region j.

Let a_i denote the i-th row of A indicating the connectivity between region i and all other regions. Intuitively, we expect a high streamline probability when there is an edge in the network. Conversely, we expect a low probability when two regions are disconnected. Thus, the streamline probabilities depend on the actual white-matter connectivity as modeled by A. This is captured by modeling the distribution of streamline probabilities using a Dirichlet distribution

    P(x_i | a_i, a+, a−) ∝ ∏_{j=1}^K x_ij^{b_ij − 1},

where the shorthand notation b_ij ≡ a_ij a+ + (1 − a_ij) a− is used. The b_ij can be interpreted as the parameters that determine the probability of streamlining from region i to region j when an edge a_ij is either present (a+) or absent (a−).

To obtain a single expression for the likelihood of an adjacency matrix, let N = (n_1; ...; n_K) represent the combined probabilistic tractography data, i.e. for each of the K nodes a distribution of streamlines to all other nodes. Similarly, let X = (x_1; ...; x_K) denote the combined hidden connection probabilities and A = (a_1; ...; a_K) the adjacency matrix for all brain regions. The likelihood of the network A is expressed as

    P(N | A, a+, a−) = ∫ P(N | X) P(X | A, a+, a−) dX.    (2)

By recognizing that the Dirichlet distribution is the conjugate prior for the multinomial distribution, it follows that Eq. (2) is a product of Dirichlet compound multinomial (DCM) distributions (Madsen et al., 2005; Xu and Akella, 2010; Minka, 2000). The DCM distribution assumes that, given a network, a probability vector can be drawn with large values where the network has edges and small values where the network is disconnected. This probability vector, in turn, can be used to sample from a multinomial distribution that reflects the probabilistic tractography outcome.

¹ It is possible that streamlines end up in voxels outside any region of the parcellation, hence the inequality.

For sufficiently small choices of the hyperparameters of the DCM, sampling from this multinomial reflects the burstiness behavior we observe in the streamline distributions, where some pairs of nodes are connected by many streamlines, while most pairs have few or even zero streamlines.
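The two-stage sampling process of the forward model (Dirichlet over streamline probabilities, then a multinomial over streamline counts) can be sketched as follows. The adjacency matrix, the number of regions and the random seed are illustrative; the hyperparameter values 1 and 0.1 are those used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, S = 5, 5000                 # regions; streamlines per seed region
a_plus, a_minus = 1.0, 0.1     # Dirichlet parameters for edge / no edge

# A small illustrative adjacency matrix (symmetric, no self-loops).
A = np.zeros((K, K), dtype=int)
for i, j in [(0, 1), (1, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1

def sample_row(i):
    """Draw counts n_i for seed region i: x_i ~ Dir(b_i), n_i ~ Mult(S, x_i)."""
    targets = np.array([j for j in range(K) if j != i])   # enforce n_ii = 0
    b = np.where(A[i, targets] == 1, a_plus, a_minus)
    x = rng.dirichlet(b)             # latent streamline probabilities
    counts = rng.multinomial(S, x)   # observed streamline distribution
    n = np.zeros(K, dtype=int)
    n[targets] = counts
    return n

N = np.vstack([sample_row(i) for i in range(K)])
```

With a small a−, the Dirichlet concentrates nearly all probability mass on the edges of A, so the sampled rows show the bursty pattern described above: a few large counts and many near-zero ones.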

2.4.2. Network prior

In order to define a prior on adjacency matrices, we adopt the Erdős-Rényi model which states that the probability of an edge between regions i and j is given by parameter p (Erdős and Rényi, 1960). This allows the prior to be expressed in terms of a product of binomial distributions:

    P(A | p) = ∏_{i<j} p^{a_ij} (1 − p)^{1 − a_ij}.

Recall that a_ji ≡ a_ij by definition, such that choosing p = 0.5 gives rise to a flat prior on simple graphs.

2.4.3. Hierarchical model

So far, we assumed that data for each subject is analyzed independently. However, in practice, data for multiple subjects may be available and data for one subject might inform the inference for another subject. The intuition is that brain connectivity will, to a certain extent, be similar across subjects. Therefore, borrowing statistical strength from other subjects should lower the susceptibility to noise and artifacts in a single subject. This can be achieved by formulating a hierarchical model, where subject-dependent parameters at the first level are tied by subject-independent parameters at the second level. Figure 3 depicts this hierarchical model.

Figure 3: The hierarchical model describes how the connectivity for a subject depends on its streamline distribution but also on the connectivity in other subjects as mediated through parent network A.

Suppose streamline data N = (N^(1), ..., N^(M)) is acquired for M subjects. Let A = (A^(1), ..., A^(M)) denote a vector whose element A^(m) refers to the connectivity matrix for subject m. In the hierarchical model, we assume that the different subjects are related through parent connectivity A. The different A^(m) are conditionally independent given this parent network. For a new subject M + 1, the quantity of interest is the posterior marginal

    P(A^(M+1) | N, N^(M+1), ξ) ∝ P(N^(M+1) | A^(M+1), a+, a−) P(A^(M+1) | N, ξ).

We could approximate this marginal by sampling from the hierarchical model. However, this is a computationally demanding task as it requires the simultaneous estimation of all of the adjacency matrices belonging to each of the subjects, as well as the parent network A. Instead, we specify a prior based on the connectivity obtained for other subjects. This improvement over the Erdős-Rényi model defines a separate connection probability for each individual edge instead of using a single parameter p to specify the connection probability for complete networks. This multi-subject prior is derived from the hierarchical model in Appendix A and is equal to

    P(A^(M+1) | N, ξ) = ∏_{i<j} p_ij^{a_ij^(M+1)} (1 − p_ij)^{1 − a_ij^(M+1)},    (3)

where p_ij ≡ (∑_{m=1}^M a_ij^(m) + 1)/(M + 2), with a_ij^(m) the maximum likelihood (ML) estimate for subject m. Hence, we derive a prior for subject M + 1 from the ML estimates for subjects 1, ..., M. These estimates can be obtained by running the single-subject models together with a flat prior. The multi-subject prior can subsequently be plugged into Eq. (1) to produce the posterior for subject M + 1.
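The edge probabilities p_ij of Eq. (3) are a Laplace-smoothed edge frequency across subjects and can be computed in one line; the function name and the stacked-array input are hypothetical.

```python
import numpy as np

def multi_subject_edge_probs(A_ml):
    """Edge probabilities p_ij = (sum_m a_ij^(m) + 1) / (M + 2).

    A_ml : (M, K, K) array of binary ML adjacency estimates, one per
           subject (hypothetical input layout).
    """
    M = A_ml.shape[0]
    return (A_ml.sum(axis=0) + 1.0) / (M + 2.0)
```

Note the limiting behavior: with M = 0 every p_ij is 0.5, recovering the flat prior, while an edge present in all M subjects gets probability (M + 1)/(M + 2), never exactly 1.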

2.5. Approximate inference

Since the posterior (1) cannot be calculated analytically, we resort to an MCMC scheme to sample from this distribution (Mukherjee and Speed, 2008). We always start the sampling chain with a random symmetric adjacency matrix without self-loops. A new sample is proposed based on a previous network A by flipping an edge, resulting in a network A′ (which, because of the symmetry of A, implies a′_ij = 1 − a_ij and a′_ji = 1 − a_ji). The acceptance of the proposed sample is determined by the ratio

    γ = P(A′ | N, ξ) / P(A | N, ξ).

A proposed network becomes a new sample with probability min(1, γ), with log γ = ∆L_kl + ∆P_kl. Here, ∆L_kl and ∆P_kl define the change in log-likelihood and log-prior, respectively, after flipping edge a_kl. A complete derivation of these terms is given in Appendix B.

The sample distributions were obtained for each subject by drawing ten parallel chains of 300,000 samples (discarding the first 60,000 samples as burn-in phase and keeping only each 600th sample to assure independence). The collection of T accepted samples {A^(1), ..., A^(T)} forms an approximation of the posterior P(A | N, ξ). The samples can be used to estimate posterior probabilities of network features, such as the probability of a specific connection. Assuming the Markov chain has converged, the posterior probability of a single connection is given by E[a_ij | N] = (1/T) ∑_{t=1}^T a_ij^(t). Other summary statistics for the distribution may be estimated in a similar manner.
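A single edge-flip Metropolis step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it recomputes the full DCM log-likelihood on every step instead of the incremental ∆L_kl and ∆P_kl of Appendix B, drops terms constant in A, and all function names are hypothetical.

```python
import numpy as np
from scipy.special import gammaln

def dcm_loglik(N, A, a_plus=1.0, a_minus=0.1):
    """log P(N | A): one Dirichlet compound multinomial per seed region,
    up to an additive constant that does not depend on A."""
    K = A.shape[0]
    ll = 0.0
    for i in range(K):
        m = np.arange(K) != i                       # skip the diagonal (n_ii = 0)
        b = np.where(A[i, m] == 1, a_plus, a_minus)
        n = N[i, m]
        ll += gammaln(b.sum()) - gammaln(b.sum() + n.sum())
        ll += np.sum(gammaln(b + n) - gammaln(b))
    return ll

def log_prior(A, p=0.5):
    """Erdos-Renyi log-prior over the upper triangle of A."""
    a = A[np.triu_indices_from(A, k=1)]
    return np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))

def mh_step(A, N, rng, p=0.5):
    """Flip edge a_kl (and a_lk) and accept with probability min(1, gamma)."""
    K = A.shape[0]
    k, l = rng.choice(K, size=2, replace=False)
    A_new = A.copy()
    A_new[k, l] = A_new[l, k] = 1 - A[k, l]
    log_gamma = (dcm_loglik(N, A_new) + log_prior(A_new, p)
                 - dcm_loglik(N, A) - log_prior(A, p))
    return A_new if np.log(rng.uniform()) < log_gamma else A
```

Running a long chain of such steps, discarding burn-in and thinning, and then averaging a_ij over the retained samples yields the edge posterior E[a_ij | N] described above.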

2.6. Validation of structural connectivity estimates

Functional connectivity is constrained by structural connectivity (Honey et al., 2010; Cabral et al., 2012). In other words, when there is functional connectivity, there is often structural connectivity, although structural connectivity is not a necessary requirement for functional connectivity (Honey et al., 2009). We exploit this relationship in the validation of structural connectivity estimates. This is achieved by constraining the conditional independence structure of functional activity by structural connectivity (Marrelec et al., 2006; Smith et al., 2010; Varoquaux et al., 2010; Deligianni et al., 2011). Assume that a K × 1 vector of BOLD responses y can be modeled by a zero-mean Gaussian density with inverse covariance matrix Q. That is,

    P(y | Q) = (2π)^{−K/2} |Q|^{1/2} exp{−(1/2) y^T Q y}.    (4)

Then, given acquired resting-state data D = (y_1; ...; y_T) for T time points, model estimation reduces to finding the maximum likelihood solution Q = arg max_Q ∏_t P(y_t | Q). However, in general for fMRI data, K > T, which implies that the covariance matrix is not full rank. Hence, finding its inverse requires suboptimal solutions such as the generalized inverse or pseudo-inverse (Ryali et al., 2011). As a solution to this problem, regularization approaches have been suggested to find sparse approximations of the inverse covariance matrix (Friedman et al., 2008; Huang et al., 2010). In our setup, the sparsity structure is readily available in the form of structural connectivity A. In order to use A as a constraint when estimating Q, we can make use of the fact that variables y_i and y_j are conditionally independent if and only if q_ij = 0 (Dempster, 1972). That is, we can interpret Eq. (4) as a Gaussian Markov random field with respect to network A such that a_ij = 0 ⇔ q_ij = 0 for all i ≠ j (Whittaker, 1990). We will use notation Q ∼ A to


denote that the independence structure in Q is dictated by A.

Let Σ = 1T

∑Tt=1 yt(yt)⊤ denote the empirical covari-

ance matrix. As shown by Dahl et al. (2008), the MLestimate can be formulated as the following convex opti-mization problem:

Q = arg maxQ∼A

ℓ (Q) s.t. {qij = 0 ⇔ aij = 0} ,

where ℓ (Q) = (T/2)(

log detQ − trace(QΣ))

is, up to aconstant, the log-likelihood function of Q. We made use ofa standard convex solver to find this constrained maximumlikelihood estimate (Schmidt et al., 2007) and use it to de-

fine the score for a particular matrix A: S(A) ≡ ℓ(Q).By comparing scores for different structural connectivityestimates, we are able to quantify the performance of astructural network in terms of how well it fits the func-tional data.
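The constrained ML step can be sketched with a simple projected-gradient ascent on ℓ(Q): at a stationary point, (Q^{−1})_ij equals Σ̂_ij on the diagonal and on allowed edges, while q_ij = 0 elsewhere (Dempster, 1972). The sketch below is a hedged stand-in for the dedicated convex solver of Schmidt et al. (2007) used in the paper; function names are ours:

```python
import numpy as np

def fit_precision(Sigma, A, eta=0.05, iters=20000):
    """Maximize log det Q - trace(Q Sigma) subject to q_ij = 0 where a_ij = 0.

    A simple projected-gradient sketch; the paper uses a dedicated convex solver.
    """
    K = Sigma.shape[0]
    mask = (A + np.eye(K)) > 0               # allowed entries: edges plus the diagonal
    Q = np.eye(K)
    for _ in range(iters):
        grad = np.linalg.inv(Q) - Sigma      # gradient of log det Q - trace(Q Sigma)
        Q = Q + eta * np.where(mask, grad, 0.0)  # ascend only on the allowed entries
    return Q

def score(Q, Sigma, T):
    """Log-likelihood score S(A) = (T/2)(log det Q - trace(Q Sigma)), up to a constant."""
    sign, logdet = np.linalg.slogdet(Q)
    return (T / 2.0) * (logdet - np.trace(Q @ Sigma))
```

Because the objective acts as a log-barrier against singular Q, small steps keep the iterates positive definite for well-conditioned problems; disallowed entries start at zero and are never updated, so the sparsity constraint holds exactly.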

Since we compare different models, we have to take model complexity into account. We could opt for a penalty term such as the Bayesian information criterion. Here, however, we use a more stringent approach, in which we enforce constant model complexity. This is implemented by constraining the number of edges of all networks from one subject to equal that of the maximum likelihood (ML) solution A_ML = arg max_A P(N | A, a+, a−) of that particular subject. Recall that this maximum likelihood solution is equivalent to the solution obtained when using a flat prior in our generative model. For the multi-subject prior, the constraint on edge count is achieved by starting out with the converged ML solution and, subsequently, drawing new samples by simultaneously adding and removing an edge. For the thresholded networks, we choose a threshold such that the resulting number of edges is the same as that of the ML solution. Note that this approach is only a way to obtain a fair comparison between different structural networks and not a requirement of the model itself. The threshold was applied to the asymmetric streamline data, normalized according to the number of streamlines emanating from each node. Note that all added edges were symmetric.
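The matched-edge-count thresholding baseline can be sketched as follows (a minimal illustration; function names and the default choice of combination function are ours):

```python
import numpy as np

def threshold_network(N, n_edges, combine=np.minimum):
    """Binarize a streamline count matrix N so that exactly n_edges edges remain.

    Rows are normalized by the number of outgoing streamlines, the asymmetric
    values n_ij and n_ji are combined (e.g. max, mean or min), and the threshold
    is chosen to match a target edge count.
    """
    W = N / N.sum(axis=1, keepdims=True)   # normalize by outgoing streamlines
    S = combine(W, W.T)                    # symmetrize the normalized counts
    iu = np.triu_indices_from(S, k=1)      # consider each node pair once
    order = np.argsort(S[iu])[::-1]        # strongest pairs first
    A = np.zeros_like(S, dtype=int)
    keep = (iu[0][order[:n_edges]], iu[1][order[:n_edges]])
    A[keep] = 1
    return A + A.T                         # symmetric binary adjacency matrix
```

Passing combine=np.maximum, or an averaging function such as lambda a, b: (a + b) / 2, reproduces the max and mean threshold variants compared below.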

3. Results

In order to validate our framework, we made use of resting-state functional data which was acquired in conjunction with the diffusion imaging data. Specifically, we compared the fit to the functional data for structural networks either obtained by the standard thresholded approach or obtained using the developed generative model. The fit to the functional data is quantified in terms of the score S(A). We performed a comparison using either a flat prior (by choosing p = 0.5) or the multi-subject prior. For simplicity, the hyperparameters a+ and a− were manually set to 1 and 0.1, respectively, as small values for the hyperparameters capture the burstiness phenomenon described in Section 1. For the thresholded approaches, we have one structural network estimate, denoted by A_T. In contrast, for our generative model, we have a posterior over structural networks, which gives rise to a distribution of scores S(A^(t)), where t denotes sample index.

3.1. Comparing ML estimates with thresholded networks

The sparsity of the maximum likelihood estimates A_ML, as obtained with the flat prior, was fairly constant (1019.2 ± 39.4 out of 6670 possible edges). As an example, Fig. 4 shows connectivity results for one subject.

Although thresholding of streamline distributions is common practice, how exactly the threshold is applied varies between studies. To have a fair comparison, we investigated the impact of different thresholding approaches. We considered applying the threshold to the maximum, the mean and the minimum of n_ij and n_ji, respectively. To compare our generative model with these approaches, we computed for each subject the fraction of samples of the posterior network distributions that scored higher than thresholding. Let f_F-T be the fraction of samples where the generative model with a flat prior scored higher than the thresholded network. The results for the distribution of f_F-T over 20 subjects, given the different threshold methods, are shown in Table 1. When the threshold is applied to the maximum of n_ij and n_ji, the generative model outperforms thresholding. However, when either the mean or the minimum of n_ij and n_ji is used, samples obtained from the posterior with a flat prior score the same as thresholded networks, on average. To explain this behavior, it is instructive to consider Eq. (B.3) in Appendix B. Given hyperparameters a+ and a− very small compared to elements of N, the change in log-likelihood after flipping edge a_ij from absent to present boils down to

∆L_ij ≈ (a+ − a−) [ log( n_ij / ∑_k n_ik ) + log( n_ji / ∑_k n_jk ) ] .

This expression nicely summarizes the ramifications of our model. When sampling over networks, the generative model takes symmetry between streamlines into account (which follows from the sum) and it considers the relative distribution of streamlines (which follows from the fractions). Note that the latter is equivalent to normalizing the streamlines; a required step for thresholding. Thresholding approaches can imitate the behavior of the DCM by thresholding on either the mean or the minimum of n_ij and n_ji and by normalizing the streamline distribution by the number of outgoing streamlines.
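As a minimal illustration (variable names are ours), the approximation can be evaluated directly from the streamline count matrix:

```python
import numpy as np

def delta_L_approx(N, i, j, a_plus=1.0, a_minus=0.1):
    """Approximate change in log-likelihood when edge (i, j) flips from absent
    to present, valid when the hyperparameters are much smaller than the counts."""
    return (a_plus - a_minus) * (
        np.log(N[i, j] / N[i].sum()) + np.log(N[j, i] / N[j].sum())
    )
```

A pair with many streamlines in both directions (relative to the outgoing totals of both nodes) receives a larger ∆L than a pair supported in only one direction, which is exactly the symmetry argument made above.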

3.2. Multi-subject prior

With optimal threshold settings, it is possible to have thresholded networks that perform similarly to the networks we infer through the posterior distribution with a flat prior. However, our model is capable of incorporating additional constraints, such as the multi-subject prior. Let f_M-T be the fraction of samples where the DCM with the



[Figure 4: six 100 × 100 connectivity matrices (panels a–f), nodes ordered according to the AAL atlas; see the caption below.]

Figure 4: (a–e) Connectivity results for the subject for which sampling in conjunction with the multi-subject prior showed the largest improvement. Shown are (a) the network that is obtained through the thresholding approach, (b) the posterior connection probabilities according to the flat model, (c) the posterior connection probabilities according to the multi-subject model, (d) the streamline distribution on a log scale and (e) the multi-subject prior based on the other subjects as used in the multi-subject model. Panel (f) shows the most salient differences in connectivity between the maximum a posteriori networks and the thresholding approach, across all subjects. The edges are color-coded. White edges indicate those connections that were present in at least 6 subjects whereas these edges would not be part of the thresholded network. Orange edges show converse findings. All matrices are ordered according to the order of the AAL atlas.

Table 1: The fraction of samples that have a higher score than thresholded networks. The fractions of samples from the distribution with a flat and a multi-subject prior are represented by f_F-T and f_M-T, respectively. The different threshold approaches are max, mean and min. The p-values were obtained using a one-sample t-test with µ0 = 0.5.

          f_F-T         p      f_M-T         p
  max     0.60 ± 0.06   0.07   0.76 ± 0.07   <0.001
  mean    0.50 ± 0.06   0.47   0.67 ± 0.07   0.02
  min     0.49 ± 0.06   0.45   0.66 ± 0.08   0.03

multi-subject prior scored higher than the thresholded network. The results for the distribution of f_M-T over 20 subjects, given the different threshold methods, are shown in Table 1. In addition, we compared the fraction of samples obtained with the multi-subject prior that scored higher than samples with the flat prior, f_M-F. We found that this distribution had a mean of 0.64 ± 0.04 (p < 10^−3).
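The per-subject fractions and the group-level test behind Table 1 can be sketched as follows (a hedged illustration with our own function names; the paper reports the p-value of a one-sample t-test with µ0 = 0.5, which follows from the Student t distribution with n − 1 degrees of freedom):

```python
import numpy as np

def outperform_fraction(sample_scores, threshold_score):
    """Fraction of posterior network samples whose score S(A) exceeds the
    score of the thresholded network (computed once per subject)."""
    return np.mean(np.asarray(sample_scores) > threshold_score)

def group_tstat(fractions, mu0=0.5):
    """One-sample t statistic of the per-subject fractions against chance (0.5);
    the p-value follows from the t distribution with len(fractions) - 1 dof."""
    f = np.asarray(fractions, dtype=float)
    return (f.mean() - mu0) / (f.std(ddof=1) / np.sqrt(len(f)))
```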

The likelihood scores estimated for the distributions over samples, obtained using our approach in the presence of either the flat prior or the multi-subject prior, are shown in Fig. 5. In addition, the figure shows the score for the thresholded network, with a threshold applied to the minimum of n_ij and n_ji. The distributions obtained using the multi-subject prior are narrower and therefore more consistent than those obtained with the flat prior. Moreover, likelihood scores obtained using the multi-subject prior tend to be of higher magnitude than those obtained using the flat prior. From these results we can conclude that our model is on par with the best threshold approaches, and that it is capable of surpassing thresholded networks by using informative priors.

Lastly, Fig. 6 shows the connections for which our multi-subject approach differs most from those of thresholding, across all subjects. The edges correspond with those shown in Fig. 4(f). The figure shows edges that are present in the maximum a posteriori networks while being absent in the corresponding thresholded networks for at least 6 subjects, and vice versa. The edges consistently and exclusively included by either of the approaches do not differ much in length. In fact, the mean edge lengths are very close: 17.6 ± 1.6 mm for threshold-favored edges and 17.0 ± 1.4 mm for DCM-favored edges. However, we do observe that when using the multi-subject prior, consistency for cerebellar and anterior cortical tracts is increased.



[Figure 5: per-subject histograms (panels 1–20) of the scores S(A) for the multi-subject prior, the flat prior and the thresholded network; see the caption below.]

   #    f_F-T   f_M-T   f_M-F
   1    0.01    0.01    0.52
   2    0.35    0.03    0.26
   3    0.35    0.12    0.39
   4    0.09    0.23    0.78
   5    0.10    0.24    0.72
   6    0.31    0.56    0.69
   7    0.40    0.62    0.63
   8    0.24    0.67    0.80
   9    0.42    0.69    0.68
  10    0.47    0.70    0.65
  11    0.43    0.71    0.65
  12    0.68    0.84    0.57
  13    0.56    0.91    0.71
  14    0.89    0.91    0.40
  15    0.46    0.93    0.85
  16    0.75    0.93    0.61
  17    0.88    0.99    0.52
  18    0.99    1.00    0.59
  19    0.78    1.00    0.86
  20    0.64    1.00    0.92

Figure 5: The scores S(A) (horizontal axis) for the thresholded network A_T, samples from the generative model given a flat prior (lower histogram, red) and given the multi-subject prior (upper histogram, green). The fraction of each bin that has a bright color corresponds with the fraction of the other distribution that is outperformed by this bin. These fractions are also shown in the accompanying table; f_F-T is the fraction of samples with a flat prior that outperform thresholding, f_M-T is the fraction of samples with the multi-subject prior that outperform thresholding and f_M-F is the fraction of samples with the multi-subject prior that outperform samples with the flat prior. The subjects are ordered according to the performance of the multi-subject prior approach relative to the thresholded network.

4. Discussion

Standard thresholding approaches for the inference of whole-brain structural networks suffer from the fact that they rely on arbitrary thresholds while assuming independence between tracts and ignoring prior knowledge. In order to overcome these problems, we have put forward a Bayesian framework for inference of structural brain networks from diffusion-weighted imaging. Our approach makes use of a Dirichlet compound multinomial distribution to model the streamline distribution obtained by probabilistic tractography. In addition, we defined a simple prior on node degrees as well as a multi-subject prior that uses connectivity estimates from other subjects as an additional source of information.

The proposed methodology was validated using simultaneously acquired resting-state functional MRI data. The outcome of our experiments revealed that the generative model combined with a flat prior performs as well as the best thresholded network. The use of an informative multi-subject prior instead created networks that significantly outperformed the thresholding approach. A comparison between the networks obtained with the multi-subject or flat prior showed that the former improved on the latter, thereby motivating the use of the multi-subject prior.

In our setup, the hyperparameters a+ and a− were set by hand and the edge probability p was chosen to result in a flat prior. Instead, these parameters could have been estimated from the streamline data using empirical Bayes, they could have been integrated out entirely in a fully Bayesian sampling approach, or they could be optimized according to the resting-state functional data. Note further that a fair comparison between networks required model complexity to be controlled. This was achieved via the constraint that networks obtained with either the multi-subject prior or with the thresholding approach had the same number of edges as the most probable network with a flat prior. While this is to the advantage of the thresholding approach, since no arbitrary threshold needs to be chosen, it can only impede networks obtained using the multi-subject prior, since that prior might support a different number of tracts.

Even given optimal settings for the thresholding approach, our approach shows clear benefits. Foremost, the DCM model intuitively assigns probabilities to the existence of edges in the inferred networks, providing a mechanism to cope with the uncertainty in the data. Moreover, the proposed generative model allows for intuitive and well-founded priors, such as the described multi-subject prior. The hierarchical model in Fig. 3 also allows for group-level inference (Robinson et al., 2010). This means that,



Figure 6: The most salient differences in connectivity between the maximum a posteriori networks with the multi-subject prior and the thresholded networks, across all subjects. The edges are color-coded. Blue, thick edges indicate those connections that were present in at least 6 subjects whereas these edges would not be part of the thresholded network. Red, thin edges show converse findings. Nodes that are not adjacent to any of these edges are omitted.

given streamline data for multiple subjects, the generative model can be used to infer individual subject connectivity A as well as the group-level parent network Ā. This allows one to get a handle on group differences, for instance, in the context of clinical neuroscience. The current work focused mainly on the empirical validation of our theoretical framework using functional data. In future work, we will focus more on the interpretation of the obtained structural connectivity estimates.

In this paper we used resting-state fMRI data as a means to validate whole-brain structural networks derived from diffusion-weighted imaging. A logical extension of our work is to derive connectivity based on the integration of these two imaging modalities. This example of Bayesian data fusion requires that we extend the generative model to take functional data into account as well (Rykhlevskaia et al., 2008; Sui et al., 2011). We can then use structural networks as an informed prior for inference of functional connectivity, or infer structural connectivity from both modalities simultaneously.

An additional benefit of our framework is that the network sparsity follows directly from optimizing Eq. (1). In the thresholding approach, the network sparsity is a consequence of the specific threshold setting. As a byproduct of our study, we have observed that thresholding of streamlines benefits from considering the mean or minimum of the number of streamlines connecting A to B and vice versa. This in itself may lead to improvements in the analysis of structural connectivity.

Summarizing, we proposed a Bayesian framework which lays the foundations for a theoretically sound approach to the inference of whole-brain structural networks. This framework does not suffer from the issues which plague current thresholding approaches to structural connectivity estimation and has been shown to give rise to substantially improved structural connectivity estimates. The proposed generative model is easily modified to incorporate other sources of information, thereby further facilitating the estimation of whole-brain structural networks in vivo.

Acknowledgements

The authors gratefully acknowledge the support of the BrainGain Smart Mix Programme of the Netherlands Ministry of Economic Affairs and the Netherlands Ministry of Education, Culture and Science. The authors thank Erik van Oort and David Norris for the acquisition of the rs-fMRI and DWI data, and the anonymous reviewers for their valuable suggestions and comments to improve the quality of the paper.

Appendix A. Derivation of the multi-subject prior

We describe here the derivation of the multi-subject prior based on the maximum likelihood estimates for previously seen subjects, as described in Section 2.4.3. The



prior on A′ ≡ A^(M+1) is given by

P(A′ | 𝒩, ξ) = ∑_Ā P(A′ | Ā) P(Ā | 𝒩, ξ)
             ∝ ∑_Ā P(A′ | Ā) P(Ā | p) ∑_𝒜 P(𝒩 | 𝒜, a+, a−) P(𝒜 | Ā)
             ∝ ∑_Ā P(A′ | Ā) P(Ā | p) ∏_{m=1}^{M} ∑_{A^(m)} P(N^(m) | A^(m), a+, a−) P(A^(m) | Ā) .

We approximate this quantity by assuming that the main contribution in the sum over A^(m) is due to the ML solution

Â^(m) = arg max_{A^(m)} P(N^(m) | A^(m), a+, a−) .

Following the Erdős–Rényi model with p = 0.5 for P(Ā | p) gives a flat prior on simple graphs. Up to irrelevant constants, and keeping in mind that Â^(m) depends on N^(m), the prior is rewritten as

P(A′ | 𝒩, ξ) ≈ ∑_Ā P(A′ | Ā) ∏_{m=1}^{M} P(Â^(m) | Ā) ,

with 𝒜̂ = {Â^(1), . . . , Â^(M)} the different ML solutions. We assume that the prior factorizes into

P(A^(M+1) | 𝒩, ξ) = ∏_{i<j} P(a^(M+1)_ij | 𝒩, ξ) .    (A.1)

Next, we define the probability that a^(m)_ij, m = 1, . . . , M + 1, inherits the connectivity from the parent network ā_ij by

P(a^(m)_ij = 1 | ā_ij = 1) = P(a^(m)_ij = 0 | ā_ij = 0) ≡ q_ij ,

with q_ij close to 1. That is, each a^(m)_ij is a copy of ā_ij with unknown probability q_ij. The copying probabilities are assumed to be independent and have a flat prior. Estimating the prior probability for each edge is then nothing but an instance of Laplace's rule of succession. This says that, if we repeat an experiment that we know can result in a success (presence of an edge) or failure (absence of an edge) M times independently, and get ∑_{m=1}^{M} â^(m)_ij successes, then our best estimate of the probability that the next repetition a^(M+1)_ij will be a success is

P(a^(M+1)_ij = 1 | â^(1)_ij, . . . , â^(M)_ij) = ( ∑_{m=1}^{M} â^(m)_ij + 1 ) / (M + 2) ≡ p_ij .

Plugging this into Eq. (A.1), we obtain the prior

P(A^(M+1) | 𝒩, ξ) = ∏_{i<j} p_ij^{a^(M+1)_ij} (1 − p_ij)^{1 − a^(M+1)_ij} .
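Under the assumptions above, the multi-subject prior reduces to simple per-edge counts over the ML adjacency matrices. A minimal sketch (function names are ours):

```python
import numpy as np

def multisubject_prior(A_stack):
    """Per-edge prior probabilities p_ij from M subjects' ML adjacency matrices,
    via Laplace's rule of succession: p_ij = (sum_m a_ij^(m) + 1) / (M + 2)."""
    A_stack = np.asarray(A_stack)          # shape (M, K, K)
    M = A_stack.shape[0]
    return (A_stack.sum(axis=0) + 1.0) / (M + 2.0)

def log_prior(A, P):
    """Log of the factorized prior over the upper triangle of candidate network A."""
    iu = np.triu_indices_from(P, k=1)
    a, p = A[iu], P[iu]
    return np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))
```

Note that p_ij never reaches 0 or 1, so an edge seen in no (or every) previous subject is discouraged (or encouraged) but never forced, keeping the log-prior finite for every candidate network.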

Appendix B. MCMC sampling

We derive here the acceptance rate γ of a sample A′ in the sampling chain as a function of one edge flip in A (see Section 2.5). Note that each of the 2^{K(K−1)/2} possible networks A has a probability greater than zero of being constructed, which guarantees that the Markov chain is irreducible. The log acceptance rate of a suggested sample can be calculated as log γ = ∆L_kl + ∆P_kl, with ∆L_kl and ∆P_kl the change in log-likelihood and log-prior, respectively, after flipping edge a_kl. The sampling approach requires that we can efficiently update both the likelihood and the prior for new samples in the Markov chain. The log-likelihood is given by

L ≡ ∑_i [ log( N_i! / ∏_j n_ij! ) + log( Γ(∑_j b_ij) / Γ(∑_j (b_ij + n_ij)) ) + ∑_j log( Γ(b_ij + n_ij) / Γ(b_ij) ) ]    (B.1)

with b_ij ≡ a_ij a+ + (1 − a_ij) a−. The change in log-likelihood as a consequence of flipping an edge a_kl is defined as

∆L_kl = log P(N | A′, a+, a−) − log P(N | A, a+, a−) ,    (B.2)

with the sole difference that a′_kl = a′_lk = 1 − a_kl. Plugging (B.1) into (B.2) yields

∆L_kl = log[ Γ(b′_kl + n_kl) / Γ(b_kl + n_kl) ] + log[ Γ(b′_lk + n_lk) / Γ(b_lk + n_lk) ]
      + log[ Γ(∑_j b′_kj) / Γ(∑_j b_kj) ] + log[ Γ(∑_j b′_lj) / Γ(∑_j b_lj) ]
      − log[ Γ(∑_j (b′_kj + n_kj)) / Γ(∑_j (b_kj + n_kj)) ] − log[ Γ(∑_j (b′_lj + n_lj)) / Γ(∑_j (b_lj + n_lj)) ]
      − 2 log[ Γ(b′_kl) / Γ(b_kl) ] .    (B.3)

The change in the log-prior as a consequence of flipping a_kl to 1 − a_kl follows from its definition in Eq. (3):

∆P_kl = log P(A′ | 𝒩, ξ) − log P(A | 𝒩, ξ)
      = (4a_kl − 2) [ log( p_kl / (1 − p_kl) ) + log( p_lk / (1 − p_lk) ) ] .

Here the edge probability p_kl is the same for all edges in the case of the Erdős–Rényi model, and is estimated separately per edge in the case of the multi-subject prior.
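A minimal sampler along these lines can be sketched as follows (illustrative; for clarity it recomputes the full log-likelihood of Eq. (B.1) via lgamma rather than applying the incremental update of Eq. (B.3), and it uses a flat prior, so that ∆P_kl vanishes; function names are ours):

```python
import math
import numpy as np

def dcm_loglik(N, A, a_plus=1.0, a_minus=0.1):
    """Dirichlet compound multinomial log-likelihood L of Eq. (B.1),
    computed from scratch with log-gamma functions."""
    K = N.shape[0]
    L = 0.0
    for i in range(K):
        js = [j for j in range(K) if j != i]
        b = [a_plus if A[i, j] else a_minus for j in js]   # b_ij from a_ij
        n = [N[i, j] for j in js]
        L += math.lgamma(sum(n) + 1) - sum(math.lgamma(x + 1) for x in n)
        L += math.lgamma(sum(b)) - math.lgamma(sum(b) + sum(n))
        L += sum(math.lgamma(bi + ni) - math.lgamma(bi) for bi, ni in zip(b, n))
    return L

def sample_networks(N, steps=200, seed=0):
    """Metropolis-Hastings over symmetric networks with single edge flips."""
    rng = np.random.default_rng(seed)
    K = N.shape[0]
    A = np.zeros((K, K), dtype=int)
    L = dcm_loglik(N, A)
    for _ in range(steps):
        k, l = rng.choice(K, size=2, replace=False)
        A[k, l] = A[l, k] = 1 - A[k, l]            # propose flipping one edge
        L_new = dcm_loglik(N, A)
        if math.log(rng.random()) < L_new - L:     # accept with prob. min(1, gamma)
            L = L_new
        else:
            A[k, l] = A[l, k] = 1 - A[k, l]        # reject: undo the flip
    return A
```

In the full model the prior term ∆P_kl is simply added to the log acceptance rate, and the O(K) incremental update of Eq. (B.3) replaces the full recomputation.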



References

Bassett, D.S., Brown, J.A., Deshpande, V., Carlson, J.M., Grafton, S.T., 2011. Conserved and variable architecture of human white matter connectivity. NeuroImage 54, 1262–1279.

Behrens, T.E.J., Johansen-Berg, H., Jbabdi, S., Rushworth, M.F.S., Woolrich, M.W., 2007. Probabilistic diffusion tractography with multiple fibre orientations: What can we gain? NeuroImage 34, 144–155.

Behrens, T.E.J., Woolrich, M.W., Jenkinson, M., Johansen-Berg, H., Nunes, R.G., Clare, S., Matthews, P.M., Brady, J.M., Smith, S.M., 2003. Characterization and propagation of uncertainty in diffusion-weighted MR imaging. Magnet. Reson. Med. 50, 1077–1088.

Cabral, J., Hugues, E., Kringelbach, M.L., Deco, G., 2012. Modeling the outcome of structural disconnection on resting-state functional connectivity. NeuroImage 62, 1342–1353.

Catani, M., 2007. From hodology to function. Brain 130, 602–605.

Chung, H.W., Chou, M.C., Chen, C.Y., 2010. Principles and limitations of computational algorithms in clinical diffusion tensor MR tractography. Am. J. Neuroradiol. 32, 3–13.

Chung, M.K., Adluru, N., Dalton, K.M., Alexander, A.L., Davidson, R.J., 2011. Scalable brain network construction on white matter fibers, in: SPIE Medical Imaging, pp. 1–6.

Conturo, T.E., Lori, N.F., Cull, T.S., Akbudak, E., Snyder, A.Z., Shimony, J.S., McKinstry, R.C., Burton, H., Raichle, M.E., 1999. Tracking neuronal fiber pathways in the living human brain. Proc. Natl. Acad. Sci. USA 96, 10422–10427.

Cook, P.A., Bai, Y., Seunarine, K.K., Hall, M.G., Parker, G.J., Alexander, D.C., 2006. Camino: open-source diffusion-MRI reconstruction and processing, in: 14th Scientific Meeting of the International Society for Magnetic Resonance in Medicine, Seattle, WA, USA. p. 2759.

Dahl, J., Vandenberghe, L., Roychowdhury, V., 2008. Covariance selection for non-chordal graphs via chordal embedding. Optim. Method. Softw. 23, 501–520.

Damoiseaux, J.S., Greicius, M.D., 2009. Greater than the sum of its parts: a review of studies combining structural connectivity and resting-state functional connectivity. Brain Struct. Funct. 213, 525–533.

Deligianni, F., Varoquaux, G., Thirion, B., Robinson, E., Sharp, D.J., Edwards, A.D., Rueckert, D., 2011. A probabilistic framework to infer brain functional connectivity from anatomical connections, in: Information Processing in Medical Imaging, Springer, Kaufbeuren, Germany. pp. 296–307.

Dempster, A.P., 1972. Covariance selection. Biometrics 28, 157–175.

Erdős, P., Rényi, A., 1960. On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17–61.

Friedman, J., Hastie, T., Tibshirani, R., 2008. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.

Friman, O., Farneback, G., Westin, C.F., 2006. A Bayesian approach for stochastic white matter tractography. IEEE Trans. Med. Imag. 25, 965–978.

Gong, G., Rosa-Neto, P., Carbonell, F., Chen, Z.J., He, Y., Evans, A., 2009. Age- and gender-related differences in the cortical anatomical network. J. Neurosci. 29.

Greicius, M.D., Supekar, K., Menon, V., Dougherty, R.F., 2009. Resting-state functional connectivity reflects structural connectivity in the default mode network. Cereb. Cortex 19, 72–78.

Groves, A.R., Beckmann, C.F., Smith, S.M., Woolrich, M.W., 2011. Linked independent component analysis for multimodal data fusion. NeuroImage 54, 2198–2217.

Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C.J., Wedeen, V., Sporns, O., 2008. Mapping the structural core of human cerebral cortex. PLoS Biol. 6, e159.

Hagmann, P., Kurant, M., Gigandet, X., Thiran, P., Wedeen, V., Meuli, R., Thiran, J.P., 2007. Mapping human whole-brain structural networks with diffusion MRI. PLoS ONE 2, e597.

Honey, C.J., Thivierge, J.-P., Sporns, O., 2010. Can structure predict function in the human brain? NeuroImage 52, 766–776.

Honey, C.J., Sporns, O., Cammoun, L., Gigandet, X., Thiran, J.P., Meuli, R., Hagmann, P., 2009. Predicting human resting-state functional connectivity from structural connectivity. Proc. Natl. Acad. Sci. USA 106, 2035–2040.

Horwitz, B., Poeppel, D., 2002. How can EEG/MEG and fMRI/PET data be combined? Hum. Brain Mapp. 17, 1–3.

Huang, S., Li, J., Sun, L., Ye, J., Fleisher, A., Wu, T., Chen, K., Reiman, E., 2010. Learning brain connectivity of Alzheimer's disease by sparse inverse covariance estimation. NeuroImage 50, 935–949.

Jbabdi, S., Woolrich, M.W., Andersson, J.L.R., Behrens, T.E.J., 2007. A Bayesian framework for global tractography. NeuroImage 37, 116–129.

Kaden, E., Knosche, T.R., Anwander, A., 2007. Parametric spherical deconvolution: Inferring anatomical connectivity using diffusion MR imaging. NeuroImage 37, 474–488.

Koch, M., Norris, D.G., Hund-Georgiadis, M., 2002. An investigation of functional and anatomical connectivity using magnetic resonance imaging. NeuroImage 16, 241–250.

Lv, J., Guo, L., Hu, X., Zhang, T., Li, K., Zhang, D., Yang, J., Liu, T., 2010. Fiber-centered analysis of brain connectivities using DTI and resting state fMRI data. Med. Image Comput. Comput. Assist. Interv. 13, 143–150.

Madsen, R.E., Kauchak, D., Elkan, C., 2005. Modeling word burstiness using the Dirichlet distribution, in: Proceedings of the 22nd International Conference on Machine Learning, ACM, New York, NY, USA. pp. 545–552.

Marrelec, G., Krainik, A., Duffau, H., Pelegrini-Issac, M., Lehericy, S., Doyon, J., Benali, H., 2006. Partial correlation for functional brain interactivity investigation in functional MRI. NeuroImage 32, 228–237.

Minka, T.P., 2000. Estimating a Dirichlet distribution. Technical Report. MIT.

Mukherjee, S., Speed, T.P., 2008. Network inference using informative priors. Proc. Natl. Acad. Sci. USA 105, 14313–14318.

Park, C., Kim, S., Kim, Y., Kim, K., 2008. Comparison of the small-world topology between anatomical and functional connectivity in the human brain. Physica A 387, 5958–5962.

Penny, W.D., Friston, K.J., Ashburner, J.T., Kiebel, S.J., Nichols, T.E. (Eds.), 2006. Statistical Parametric Mapping: The Analysis of Functional Brain Images. Academic Press Inc. 1st edition.

Robinson, E.C., Hammers, A., Ericsson, A., Edwards, A.D., Rueckert, D., 2010. Identifying population differences in whole-brain structural networks: a machine learning approach. NeuroImage 50, 910–919.

Robinson, E.C., Valstar, M., Hammers, A., Ericsson, A., Edwards, A.D., Rueckert, D., 2008. Multivariate statistical analysis of whole brain structural networks obtained using probabilistic tractography. Med. Image Comput. Comput. Assist. Interv. 11, 486–493.

Ryali, S., Chen, T., Supekar, K., Menon, V., 2011. Estimation of functional connectivity in fMRI data using stability selection-based sparse partial correlation with elastic net penalty. NeuroImage 59, 3852–3861.

Rykhlevskaia, E., Gratton, G., Fabiani, M., 2008. Combining structural and functional neuroimaging data for studying brain connectivity: A review. Psychophysiology 45, 173–187.

Schmidt, M., Fung, G., Rosales, R., 2007. Fast optimization methods for L1 regularization: A comparative study and two new approaches. Machine Learning: ECML 2007, 286–297.

Shu, N., Liu, Y., Li, K., Duan, Y., Wang, J., Yu, C., Dong, H., Ye, J., He, Y., 2011. Diffusion tensor tractography reveals disrupted topological efficiency in white matter structural networks in multiple sclerosis. Cereb. Cortex 21.

Simpson, S.L., Hayasaka, S., Laurienti, P.J., 2011. Exponential Random Graph Modeling for complex brain networks. PLoS ONE 6, e20039.

Skudlarski, P., Jagannathan, K., Calhoun, V., Hampson, M., Skudlarska, B.A., Pearlson, G., 2008. Measuring brain connectivity: diffusion tensor imaging validates resting state temporal correlations. NeuroImage 43, 554–561.

Smith, S.M., 2002. Fast robust automated brain extraction. Hum. Brain Mapp. 17, 143–155.

Smith, S.M., Miller, K.L., Salimi-Khorshidi, G., Webster, M., Beckmann, C.F., Nichols, T.E., Ramsey, J.D., Woolrich, M.W., 2010. Network modelling methods for fMRI. NeuroImage 54, 875–891.

Sui, J., Pearlson, G., Caprihan, A., Adali, T., Kiehl, K.A., Liu, J., Yamamoto, J., Calhoun, V.D., 2011. Discriminating schizophrenia and bipolar disorder by fusing fMRI and DTI in a multimodal CCA+ joint ICA model. NeuroImage 57, 839–855.

Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., Joliot, M., 2002. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage 15, 273–289.

Vaessen, M.J., Hofman, P.A.M., Tijssen, H.N., Aldenkamp, A.P., Jansen, J.F.A., Backes, W.H., 2010. The effect and reproducibility of different clinical DTI gradient sets on small world brain connectivity measures. NeuroImage 51, 1106–1116.

Varoquaux, G., Gramfort, A., Poline, J.B., Thirion, B., 2010. Brain covariance selection: better individual functional connectivity models using population prior, in: Zemel, R., Shawe-Taylor, J. (Eds.), Advances in Neural Information Processing Systems, Vancouver, Canada. pp. 2334–2342.

Whittaker, J., 1990. Graphical Models in Applied Multivariate Statistics. Wiley Publishing. 1st edition.

Xu, Z., Akella, R., 2010. Improving probabilistic information retrieval by modeling burstiness of words. Information Processing & Management 46, 143–158.

Zalesky, A., Fornito, A., Harding, I.H., Cocchi, L., Yucel, M., Pantelis, C., Bullmore, E.T., 2010. Whole-brain anatomical networks: does the choice of nodes matter? NeuroImage 50, 970–983.

Zhang, Y., Brady, M., Smith, S., 2001. Segmentation of brain MR images through a hidden Markov random field model and the expectation maximization algorithm. IEEE Trans. Med. Imag. 20, 45–57.
