Sparse Optimization in Exploring Brain Networks

Transcript of the presentation slides (29 pages).

Page 1:

Sparse Optimization in Exploring Brain Networks

Jitkomut Songsiri

Chula Engineering

Joint China-Cuba Laboratory for Frontier Research in Translational Neurotechnology

Chengdu, China

June 2-22, 2016

Page 2:

Outline

• Brain connectivity

• Granger graphical models

– Learning a single Granger graph
– Learning multiple Granger graphs having a common structure
– Learning multiple Granger graphs having a partially common structure
– Numerical examples

• Structural Equation Modeling

Page 3:

Brain connectivity

• a brain connectivity, or brain network, is represented by a graph

• nodes represent voxels (or ROIs)

• the brain connectivity is characterized by the graph topology

• the graph topology is described by a statistical dependence measure of interest


Page 4:

Dependence Measures

• data are treated as independent samples (no temporal consideration)

– correlation (covariance matrix)
– partial correlation (inverse of covariance matrix)
– structural equation modeling (path coefficient matrix)

• data are treated as time series

– cross coherence function (normalized correlation function)
– partial coherence function (normalized inverse of the correlation function) – often computed in the frequency domain (inverse spectrum)
– dynamical causal modeling (coupling matrices)
– Granger causality (autoregressive coefficients)


Page 5:

Inference methods

estimation: covariance matrix, path coefficients of SEM, correlation function, spectrum, matrices in DCM, AR coefficients, etc.

inference methods can be roughly divided into:

• statistical tests (test whether each of these measures is zero with statistical significance)

• sparse estimation (an estimation formulation that promotes sparsity in these measures)

we pursue the latter approach by incorporating ℓ1-minimization in the formulation

goal: learn the zero pattern in a dependence measure


Page 6:

Outline

• Brain connectivity

• Granger graphical models

– Learning a single Granger graph
– Learning multiple Granger graphs having a common structure
– Learning multiple Granger graphs having a partially common structure
– Numerical examples

• Structural Equation Modeling

Page 7:

Granger causality (Granger 1969)

sparsity in the coefficients Ak,

$$(A_k)_{ij} = 0, \quad k = 1, 2, \ldots, p,$$

characterizes Granger causality in the AR model

$$y(t) = A_1 y(t-1) + A_2 y(t-2) + \cdots + A_p y(t-p) + \nu(t)$$

• yi is not Granger-caused by yj

• knowing yj does not help to improve the prediction of yi

applications in neuroscience and systems biology

(Salvador et al. 2005, Valdes-Sosa et al. 2005, Fujita et al.2007, ...)
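As a quick illustration (not from the slides), here is a minimal Python sketch that simulates a sparse VAR and reads the Granger graph off the common zero pattern of its coefficients; all names, sizes, and the sparsity pattern are our own choices.

```python
# A minimal sketch (not from the talk): simulate a sparse VAR(p) and read
# the Granger graph off the common zero pattern of its coefficients.
import numpy as np

rng = np.random.default_rng(0)
n, p, T = 4, 2, 500                      # dimensions are arbitrary choices

# Sparse AR coefficients with a common zero pattern across lags:
# (A_k)_ij = 0 for all k  <=>  y_j does not Granger-cause y_i.
A = 0.2 * rng.standard_normal((p, n, n))
mask = rng.random((n, n)) < 0.5          # hypothetical sparsity pattern
np.fill_diagonal(mask, True)             # keep each series' own past
A *= mask                                # same zero pattern at every lag

# y(t) = A_1 y(t-1) + ... + A_p y(t-p) + nu(t)   (stability not enforced)
y = np.zeros((T, n))
for t in range(p, T):
    y[t] = sum(A[k] @ y[t - 1 - k] for k in range(p)) + 0.1 * rng.standard_normal(n)

# Granger graph: edge j -> i is present iff some lag coefficient is nonzero
print(np.any(A != 0, axis=0).astype(int))
```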


Page 8:

Learning a single Granger Graphical Model (J. Songsiri 2013)

Problem: find Ak’s that minimize the mean-squared error and

• Ak’s contain many zeros

• common zero locations in A1, A2, . . . , Ap

Formulation: least-squares with sum-of-ℓ2-norm regularization

$$\min_{A} \;\; (1/2)\|Y - AH\|_2^2 + \lambda \sum_{i \neq j} \left\| \begin{bmatrix} (A_1)_{ij} & (A_2)_{ij} & \cdots & (A_p)_{ij} \end{bmatrix} \right\|_2$$

the problem falls into the framework of Group Lasso
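A minimal CVXPY sketch of this group-lasso formulation follows; the function name and the data layout (Y of size n x N, H the stacked lagged regressors of size np x N) are our assumptions, not the talk's code.

```python
# A sketch of the group-lasso formulation above (our names, not the talk's
# code): minimize (1/2)||Y - A H||^2 + lam * sum_{i != j} ||B_ij||_2,
# where B_ij stacks (A_1)_ij, ..., (A_p)_ij across the p lags.
import cvxpy as cp

def fit_granger_graph(Y, H, n, p, lam):
    A = cp.Variable((n, n * p))          # A = [A_1 A_2 ... A_p]
    fit = 0.5 * cp.sum_squares(Y - A @ H)
    # one 2-norm group per off-diagonal edge, collecting all p lags
    groups = [cp.norm(cp.hstack([A[i, k * n + j] for k in range(p)]))
              for i in range(n) for j in range(n) if i != j]
    prob = cp.Problem(cp.Minimize(fit + lam * sum(groups)))
    prob.solve()
    return A.value
```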


Page 9:

Default Mode Network (A. Pongrattanakul, P. Lertkultanon and J. Songsiri 2013)

• many active nodes in vACC and the MTLs, and a few in MPFC and PCC/RSC

• strong connections between the MTLs and PCC; vACC has a significant connectivity with PCC

• strong connections between the left and right medial temporal lobes


Page 10:

Outline

• Brain connectivity

• Granger graphical models

– Learning a single Granger graph
– Learning multiple Granger graphs having a common structure
– Learning multiple Granger graphs having a partially common structure
– Numerical examples

• Structural Equation Modeling

Page 11:

Application on learning a common brain structure

[figure: brain of subject A vs. brain of subject B]

• brains of subjects under the same condition from a homogeneous group are assumed to have a common brain connectivity

• the connection strength of a pair of nodes can differ across subjects in the group


Page 12:

Common Granger causality of multiple AR models

[figure: AR model #1 and AR model #2, each with a common sparsity pattern in its Ak's, defining one vector of lag coefficients per edge]

• common sparsity of Ak’s in each model defines its Granger causality structure

• our goal is to learn a common Granger causality structure among all models


Page 13:

Estimation Formulation

jointly estimate K AR models to have a common Granger-causality structure

reorder the variables

$$B^{(k)}_{ij} = \begin{bmatrix} (A^{(k)}_1)_{ij} & (A^{(k)}_2)_{ij} & \cdots & (A^{(k)}_p)_{ij} \end{bmatrix}^T, \qquad C_{ij} = \left( B^{(1)}_{ij}, B^{(2)}_{ij}, \ldots, B^{(K)}_{ij} \right)$$

optimization problem:

$$\underset{A^{(1)}, \ldots, A^{(K)}}{\text{minimize}} \;\; (1/2) \sum_{k=1}^{K} \|Y^{(k)} - A^{(k)} H^{(k)}\|_F^2 + \lambda \sum_{i \neq j} \|C_{ij}\|_2$$

• the superscript (k) denotes the kth model

• 1st term: least-squares error of K models

• 2nd term: promotes a common sparsity across all models
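The earlier sketch extends directly to this joint problem: the group for edge (i, j) now collects the lag coefficients of all K models (the vector C_ij above), so an edge is selected or dropped in every model at once. Names and data layout below are again our assumptions.

```python
# Sketch of the joint formulation (our notation): the 2-norm group for
# edge (i, j) spans all K models, so the models share one zero pattern.
import cvxpy as cp

def fit_common_granger(Ys, Hs, n, p, lam):
    K = len(Ys)
    A = [cp.Variable((n, n * p)) for _ in range(K)]   # A^(k) = [A_1^(k) ... A_p^(k)]
    fit = 0.5 * sum(cp.sum_squares(Ys[k] - A[k] @ Hs[k]) for k in range(K))
    # ||C_ij||_2 with C_ij = (B_ij^(1), ..., B_ij^(K))
    penalty = sum(cp.norm(cp.hstack([A[k][i, m * n + j]
                                     for k in range(K) for m in range(p)]))
                  for i in range(n) for j in range(n) if i != j)
    prob = cp.Problem(cp.Minimize(fit + lam * penalty))
    prob.solve()
    return [a.value for a in A]
```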


Page 14:

Outline

• Brain connectivity

• Granger graphical models

– Learning a single Granger graph
– Learning multiple Granger graphs having a common structure
– Learning multiple Granger graphs having a partially common structure
– Numerical examples

• Structural Equation Modeling

Page 15:

Application on classifying brain conditions

[figure: brain under condition A vs. brain under condition B]

• a brain under two conditions may share some similar connectivity patterns due to normal functioning of the brain

• different conditions of the brain may lead to some different edges in the brain connectivity


Page 16:

Granger causality of multiple AR models

[figure: AR model #1 and AR model #2, each with a common sparsity pattern in its Ak's defining one vector per edge, giving a causality pattern for each model]

• common sparsity of Ak’s in each model defines its Granger causality structure

• our goal is to learn similar Granger causality structures among all models


Page 17:

Estimation Formulation (J. Songsiri 2015)

jointly estimate K AR models to have similar Granger-causality structures

$$\underset{A^{(1)}, \ldots, A^{(K)}}{\text{minimize}} \;\; \sum_{k=1}^{K} \frac{1}{2} \|Y^{(k)} - A^{(k)} H^{(k)}\|_F^2 + \lambda_1 \sum_{i \neq j} \sum_{k=1}^{K} \left\| B^{(k)}_{ij} \right\|_2 + \lambda_2 \sum_{i \neq j} \sum_{k=1}^{K-1} \left\| B^{(k+1)}_{ij} - B^{(k)}_{ij} \right\|_2$$

• the superscript (k) denotes the kth model

• $B^{(k)}_{ij} = \begin{bmatrix} (A^{(k)}_1)_{ij} & (A^{(k)}_2)_{ij} & \cdots & (A^{(k)}_p)_{ij} \end{bmatrix}^T \in \mathbf{R}^p$

• 1st term: least-squares error of K models

• 2nd term: promotes sparsity in each model

• 3rd term: promotes similarity between any two consecutive models

• a least-squares problem with sum-of-ℓ2-norm regularization
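A hedged CVXPY sketch of this three-term objective (our names and layout, with Ys[k], Hs[k] holding the data of the kth model): lam1 controls within-model sparsity, while lam2 pulls the lag groups B_ij of consecutive models together.

```python
# Sketch of the group fused lasso objective above (our notation).
import cvxpy as cp

def fit_fused_granger(Ys, Hs, n, p, lam1, lam2):
    K = len(Ys)
    A = [cp.Variable((n, n * p)) for _ in range(K)]
    # B(k, i, j) stacks the p lag coefficients of edge (i, j) in model k
    B = lambda k, i, j: cp.hstack([A[k][i, m * n + j] for m in range(p)])
    fit = 0.5 * sum(cp.sum_squares(Ys[k] - A[k] @ Hs[k]) for k in range(K))
    edges = [(i, j) for i in range(n) for j in range(n) if i != j]
    sparse = sum(cp.norm(B(k, i, j)) for k in range(K) for (i, j) in edges)
    fused = sum(cp.norm(B(k + 1, i, j) - B(k, i, j))
                for k in range(K - 1) for (i, j) in edges)
    prob = cp.Problem(cp.Minimize(fit + lam1 * sparse + lam2 * fused))
    prob.solve()
    return [a.value for a in A]
```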


Page 18:

Group Fused Lasso framework

the estimation problem can be regarded as a Group Fused Lasso problem

$$\underset{x}{\text{minimize}} \;\; (1/2)\|Gx - b\|_2^2 + \lambda_1 \|Px\|_{2,1} + \lambda_2 \|Dx\|_{2,1}$$

with variable $x \in \mathbf{R}^n$

• $G \in \mathbf{R}^{m \times n}$, $b \in \mathbf{R}^m$, $P \in \mathbf{R}^{s \times n}$, $D \in \mathbf{R}^{r \times n}$ are problem parameters

• sum of 2-norms: $\|z\|_{2,1} = \sum_{k=1}^{L} \|z_k\|_2$ for $z = (z_1, \ldots, z_L)$

• D is a Kronecker product of a projection and the forward difference matrix

• if λ2 = 0 and λ1 > 0, it reduces to a group lasso problem

• if λ2 > 0 and λ1 = 0, it becomes a total-variation-regularized problem
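To make the structure of D concrete, here is a small sketch of how such a matrix can be built; the ordering of x (the K model coefficient vectors stacked) and the projection P are our assumptions, not given on the slide.

```python
# Sketch of the structure stated above (sizes are our assumptions): D stacks
# the differences between consecutive models' penalized coefficients, i.e.
# D = Delta (x) P, with Delta the forward difference matrix over K models.
import numpy as np
from scipy.sparse import kron

def fused_difference_matrix(K, P):
    # P: projection (s x q) selecting the penalized entries of one model's
    # coefficient vector; x is the K model vectors stacked, length K*q
    Delta = np.zeros((K - 1, K))
    for k in range(K - 1):
        Delta[k, k], Delta[k, k + 1] = -1.0, 1.0   # row k gives x^(k+1) - x^(k)
    return kron(Delta, P)                          # shape ((K-1)*s, K*q)
```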


Page 19:

Outline

• Brain connectivity

• Granger graphical models

– Learning a single Granger graph
– Learning multiple Granger graphs having a common structure
– Learning multiple Granger graphs having a partially common structure
– Numerical examples

• Structural Equation Modeling

Page 20:

Numerical examples

generate 3 sparse AR models having similar Granger structures

[figure: sum of squared errors vs. number of detected edges for Group Lasso (the 3 models estimated separately) and Group Fused Lasso (the 3 models estimated jointly); the arrow marks increasing λ2]

• model errors increase as the estimated Granger network becomes too dense

• Group Fused Lasso yields a lower model error as λ2 increases


Page 21:

ROC curves of Group Lasso vs. Group Fused Lasso

[figure: ROC curves (TPR vs. FPR) of Group Lasso and Group Fused Lasso; the arrow marks increasing λ2]

• at a fixed FPR (false positive rate), Group Fused Lasso yields a higher TPR (true positive rate) than Group Lasso

• the estimated Granger structure becomes more accurate as λ2 increases


Page 22:

Outline

• Brain connectivity

• Granger graphical models

• Structural Equation Modeling

– path analysis
– confirmatory vs. exploratory SEM
– sparse SEM for exploratory SEM

Page 23:

Structural Equation Modeling (SEM)

path analysis is a special case of SEM that includes only observed variables

path analysis model

Y = AY + ϵ

(multiple linear regression)

• ϵ is the model error, Y is the variable vector

• A is called the path matrix or path coefficient

• each entry aij of the path matrix denotes a causal relation from Yj to Yi


Page 24:

Two important problems in path analysis

confirmatory SEM

• causal relationship is given

• zero pattern in A is given

exploratory SEM

• explore a causal relationship among variables

• explore a zero pattern in A


Page 25:

Problem description

given samples of Y , we can compute the sample covariance S

the covariance of Y derived from Y = AY + ϵ is

$$\Sigma = (I - A)^{-1} \Psi (I - A)^{-T}, \quad \text{where } \Psi = \operatorname{cov}(\epsilon)$$

goal: estimate Σ, Ψ, and A so that Σ is close to S

in the sense that

$$d(S, \Sigma) = \log\det\Sigma + \operatorname{tr}(S\Sigma^{-1}) - \log\det S - n,$$

(Kullback-Leibler divergence function) is minimized
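The fitting measure d(S, Σ) can be transcribed directly; a small numpy sketch (function name is ours):

```python
# A direct transcription of the fitting measure d(S, Sigma) above.
import numpy as np

def kl_fit(S, Sigma):
    n = S.shape[0]
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    return logdet_Sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_S - n
```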


Page 26:

Sparse SEM (A. Pruttiakaravanich and J. Songsiri 2016)

a convex formulation for exploratory SEM

$$\begin{array}{ll}
\underset{X, A, \Psi}{\text{minimize}} & -\log\det X + \operatorname{tr}(SX) + 2\gamma \displaystyle\sum_{(i,j) \notin I_A} |A_{ij}| \\[2mm]
\text{subject to} & \begin{bmatrix} X & (I-A)^T \\ I-A & \Psi \end{bmatrix} \succeq 0, \quad 0 \preceq \Psi \preceq \alpha I, \quad P(A) = 0
\end{array}$$

• α is a parameter representing a bound on covariance error

• P(A) is a linear mapping giving the prior zero constraints in A, indexed by the set IA

• $\sum_{(i,j) \notin I_A} |A_{ij}|$ is the ℓ1-norm regularization that promotes zeros in A

• if the optimal LMI block matrix has low rank, then Σ can be retrieved as $\Sigma = X^{-1}$
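A hedged CVXPY transcription of this semidefinite program (names are ours; the known-zero set IA is passed as a boolean mask):

```python
# Sketch of the convex sparse-SEM formulation above (our transcription;
# zero_mask[i, j] = True marks the prior-zero set I_A).
import cvxpy as cp
import numpy as np

def sparse_sem(S, alpha, gamma, zero_mask):
    n = S.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    A = cp.Variable((n, n))
    Psi = cp.Variable((n, n), symmetric=True)
    free = 1.0 - zero_mask.astype(float)           # entries outside I_A
    obj = (-cp.log_det(X) + cp.trace(S @ X)
           + 2 * gamma * cp.sum(cp.abs(cp.multiply(free, A))))
    I = np.eye(n)
    M = cp.bmat([[X, (I - A).T], [I - A, Psi]])
    constraints = [
        (M + M.T) / 2 >> 0,                        # LMI (symmetrized form)
        Psi >> 0,
        alpha * I - Psi >> 0,                      # 0 <= Psi <= alpha*I
        cp.multiply(zero_mask.astype(float), A) == 0,   # P(A) = 0
    ]
    cp.Problem(cp.Minimize(obj), constraints).solve()
    return X.value, A.value, Psi.value
```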


Page 27:

Effect of the percentage of known zeros in A

to see the effect of the percentage of known zeros in Atrue:

• generate Atrue with n = 20 and sparsity 10%

• generate $S = (I - A_{\text{true}})^{-1} \Psi (I - A_{\text{true}})^{-T}$ with $\Psi = 0.1I$

• solve the sparse SEM assuming that about 0%, 25%, 50%, 65%, 80% of the zero locations in Atrue are known (see the sketch below)
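A small sketch of this synthetic setup as we read it; the magnitude of the nonzero entries is not specified on the slide, so the scaling below is our assumption.

```python
# Sketch of the synthetic setup above (our interpretation).
import numpy as np

rng = np.random.default_rng(0)
n = 20
A_true = rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.10)
np.fill_diagonal(A_true, 0.0)      # assume no self-loops in the path model
Psi = 0.1 * np.eye(n)
M = np.linalg.inv(np.eye(n) - A_true)
S = M @ Psi @ M.T                  # S = (I - A_true)^{-1} Psi (I - A_true)^{-T}
```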

[figure: performance curves on a 0-1 scale for 0%, 25%, 50%, 65%, 80% known zeros in Atrue]

• knowing more of the correct zero structure in Atrue gives better accuracy of our causal-structure learning method

• knowing 0% of the zeros in Atrue (an underdetermined problem) gives poor estimation results since we may not obtain the true solution


Page 28:

Summary

• we have proposed a Group Fused Lasso formulation for jointly estimating multiple sparse AR models

• the formulation uses a sum-of-2-norms penalty on the differences between consecutive AR models

• it finds applications in exploring a common structure of time series belonging to different or common classes

• we also proposed a convex formulation for learning causal patterns in SEM

• it finds applications in exploring causal relationships among static variables

• the problem is a type of ℓ1-regularized estimation and can be solved by a convex solver

• the accuracy of learning the true network depends on the selection of the regularization parameter


Page 29:

(Selected) References

• J. Songsiri, “Learning multiple Granger graphical models via group fused lasso,” in Proceedings of the 10th Asian Control Conference (ASCC), 2015.

• J. Songsiri, “Sparse autoregressive model estimation for learning Granger causality in time series,” in Proceedings of the 38th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 3198–3202.

• C. M. Alaíz, A. Barbero, and J. R. Dorronsoro, “Group fused lasso,” Artificial Neural Networks and Machine Learning, pp. 66–73, 2013.

• A. Pruttiakaravanich and J. Songsiri, “A convex formulation for path analysis in structural equation modeling,” to appear in Proceedings of the SICE Annual Conference, 2016.
