
Canonical Correlation Analysis on Riemannian Manifolds and its Applications

Hyunwoo J. Kim^1, Nagesh Adluru^1, Barbara B. Bendlin^1, Sterling C. Johnson^1, Baba C. Vemuri^2, Vikas Singh^1

http://pages.cs.wisc.edu/~hwkim/projects/riem-cca    ^1 University of Wisconsin-Madison    ^2 University of Florida

THE MOTIVATING PROBLEM

- Canonical correlation analysis (CCA) based statistical analysis of images where voxel-wise measurement is manifold valued.
- CCA on the product Riemannian manifold representing SPD matrix-valued fields
- Correlations between diffusion tensor images (DTI) and Cauchy deformation tensor fields derived from T1-weighted magnetic resonance (MR) images

WHAT’S NEW?

- Extensions of CCA to nonlinear spaces exist, e.g., KCCA, ANN, Deep CCA (ICML 2013)
- Adaptive CCA (ICML 2012) uses matrix manifold optimization, but it is NOT for manifold-valued data
- This paper: a generalization of CCA to Riemannian manifolds
- Main technical highlight: an exact iterative method as well as single path algorithms with approximate projections (inner product and log-Euclidean)
- Relationship between transformations on SPD(n) for more accurate approximation

CCA IN EUCLIDEAN SPACE

Pearson correlation for $x \in \mathbb{R}$ and $y \in \mathbb{R}$:

$$\rho_{x,y} = \frac{\mathrm{COV}(x,y)}{\sigma_x\sigma_y} = \frac{\mathbb{E}[(x-\mu_x)(y-\mu_y)]}{\sigma_x\sigma_y} = \frac{\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)}{\sqrt{\sum_{i=1}^{N}(x_i-\mu_x)^2}\sqrt{\sum_{i=1}^{N}(y_i-\mu_y)^2}} \tag{1}$$
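Equation (1) is the ordinary sample Pearson correlation. As a quick sanity check (a minimal NumPy sketch, not part of the poster), it can be computed directly and compared against `np.corrcoef`:

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson correlation, term by term as in Eq. (1)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (np.sqrt((xc ** 2).sum()) * np.sqrt((yc ** 2).sum()))

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

# Agrees with NumPy's built-in correlation coefficient.
assert np.isclose(pearson(x, y), np.corrcoef(x, y)[0, 1])
```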

Canonical correlation for $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$:

$$\max_{w_x,w_y} \mathrm{corr}(\pi_{w_x}(x), \pi_{w_y}(y)) = \max_{w_x,w_y} \frac{\sum_{i=1}^{N} w_x^T(x_i-\mu_x)\, w_y^T(y_i-\mu_y)}{\sqrt{\sum_{i=1}^{N}\left(w_x^T(x_i-\mu_x)\right)^2}\sqrt{\sum_{i=1}^{N}\left(w_y^T(y_i-\mu_y)\right)^2}} \tag{2}$$

where the projection coefficient is $\pi_{w_x}(x) := \arg\min_{t \in \mathbb{R}} d(t w_x + \mu_x, x)^2$.
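In the Euclidean case, Eq. (2) can be solved in closed form by whitening each block and taking the top singular pair of the whitened cross-covariance. A minimal sketch (the function `cca_first_pair` and the ridge term `reg` are illustrative choices, not the poster's implementation; a production version would be e.g. `sklearn.cross_decomposition.CCA`):

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-8):
    """First canonical pair (w_x, w_y) maximizing Eq. (2).

    X: (N, m), Y: (N, n). Whiten each block, then take the top
    singular pair of the cross-covariance between whitened blocks."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc + reg * np.eye(X.shape[1])  # small ridge for stability
    Cyy = Yc.T @ Yc + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc

    def inv_sqrt(C):
        # C^{-1/2} via eigendecomposition (C symmetric positive definite)
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    # Map the singular vectors back to the original (unwhitened) coordinates.
    return Wx @ U[:, 0], Wy @ Vt[0, :], s[0]  # s[0] = canonical correlation
```

Projecting the centered data onto the returned `w_x`, `w_y` and taking their Pearson correlation recovers the same value as `s[0]`.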

CCA ON MANIFOLDS: BASIC OPERATIONS

Basic operations, Euclidean vs. Riemannian:

- Subtraction: $\vec{x_i x_j} = x_j - x_i$  vs.  $\vec{x_i x_j} = \mathrm{Log}(x_i, x_j)$
- Addition: $x_i + \vec{x_j x_k}$  vs.  $\mathrm{Exp}(x_i, \vec{x_j x_k})$
- Distance: $\|\vec{x_i x_j}\|$  vs.  $\|\mathrm{Log}(x_i, x_j)\|_{x_i}$
- Mean: $\sum_{i=1}^{n} \vec{\bar{x} x_i} = 0$  vs.  $\sum_{i=1}^{n} \mathrm{Log}(\bar{x}, x_i) = 0$
- Covariance: $\mathbb{E}\left[(x_i - \bar{x})(x_i - \bar{x})^T\right]$  vs.  $\mathbb{E}\left[\mathrm{Log}(\bar{x}, x_i)\mathrm{Log}(\bar{x}, x_i)^T\right]$

$$\pi_{w_x}(x) := \arg\min_{t \in \mathbb{R}} d(\mathrm{Exp}(\mu_x, t w_x), x)^2 \tag{3}$$
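For SPD(n) with the affine-invariant metric (the setting used later for the DTI data), the Exp and Log maps appearing in Eq. (3) and in the table above have closed forms. A sketch via eigendecomposition (these are the standard affine-invariant formulas, assumed here rather than taken from the poster):

```python
import numpy as np

def _sym_apply(a, f):
    """Apply a scalar function f to the eigenvalues of symmetric matrix a."""
    w, V = np.linalg.eigh(a)
    return V @ np.diag(f(w)) @ V.T

def spd_exp(p, v):
    """Exp(p, v) = p^{1/2} expm(p^{-1/2} v p^{-1/2}) p^{1/2}: tangent v at SPD p -> manifold."""
    s = _sym_apply(p, np.sqrt)
    si = _sym_apply(p, lambda w: 1.0 / np.sqrt(w))
    return s @ _sym_apply(si @ v @ si, np.exp) @ s

def spd_log(p, x):
    """Log(p, x) = p^{1/2} logm(p^{-1/2} x p^{-1/2}) p^{1/2}: inverse of spd_exp."""
    s = _sym_apply(p, np.sqrt)
    si = _sym_apply(p, lambda w: 1.0 / np.sqrt(w))
    return s @ _sym_apply(si @ x @ si, np.log) @ s
```

By construction `spd_log(p, spd_exp(p, v))` returns `v` for any symmetric tangent vector `v`.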

[Figure: schematic of the projections. Points $x_1, x_2 \in \mathcal{M}_x$ are projected to $\Pi_{S_{w_x}}(x_i)$ on the geodesic submanifold $S_{w_x}$ through $\mu_x$ with direction $w_x \in T_{\mu_x}\mathcal{M}_x$, giving coefficients $t_1, t_2$; similarly $y_1, y_2 \in \mathcal{M}_y$ project onto $S_{w_y}$ with coefficients $u_1, u_2$.]

CCA ON MANIFOLDS

Input: $x_1, \ldots, x_N \in \mathcal{M}_x$, $y_1, \ldots, y_N \in \mathcal{M}_y$
Output: $w_x \in T_{\mu_x}\mathcal{M}$, $w_y \in T_{\mu_y}\mathcal{M}$ (submanifolds), $t_i, u_i \in \mathbb{R}$ (projection coefficients)

$$\rho_{x,y} = \max_{w_x,w_y,t,u} \frac{\sum_{i=1}^{N}(t_i-\bar{t})(u_i-\bar{u})}{\sqrt{\sum_{i=1}^{N}(t_i-\bar{t})^2}\sqrt{\sum_{i=1}^{N}(u_i-\bar{u})^2}} \tag{4}$$

$$\text{s.t.}\quad t_i = \arg\min_{t_i \in (-\epsilon,\epsilon)} \|\mathrm{Log}(\mathrm{Exp}(\mu_x, t_i w_x), x_i)\|^2, \;\forall i \in \{1,\ldots,N\}$$
$$\phantom{\text{s.t.}\quad} u_i = \arg\min_{u_i \in (-\epsilon,\epsilon)} \|\mathrm{Log}(\mathrm{Exp}(\mu_y, u_i w_y), y_i)\|^2, \;\forall i \in \{1,\ldots,N\}$$
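The constraints in Eq. (4) are one-dimensional subproblems, one per data point: given Exp and Log maps, each $t_i$ can be found by bounded scalar minimization. A generic sketch (the `exp_map`/`log_map` callables and the bound `eps` are illustrative stand-ins, not the paper's solver):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def proj_coeff(mu, w, x, exp_map, log_map, eps=5.0):
    """t = argmin_{t in (-eps, eps)} ||Log(Exp(mu, t*w), x)||^2, as in Eq. (4)."""
    def obj(t):
        v = log_map(exp_map(mu, t * w), x)
        return float(np.sum(v * v))
    return minimize_scalar(obj, bounds=(-eps, eps), method="bounded").x

# Sanity check in the Euclidean case, where Exp(mu, t*w) = mu + t*w and the
# minimizer is the ordinary projection coefficient <w, x - mu> / ||w||^2.
exp_e = lambda p, v: p + v
log_e = lambda p, x: x - p
mu = np.zeros(3)
w = np.array([1.0, 0.0, 0.0])
x = np.array([2.0, 1.0, -1.0])
assert np.isclose(proj_coeff(mu, w, x, exp_e, log_e), 2.0, atol=1e-4)
```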

FORMULATION WITH FIRST ORDER OPTIMALITY CONDITIONS

$$\rho(w_x, w_y) = \max_{w_x,w_y,t,u} f(t,u) \quad \text{s.t.} \quad \nabla_{t_i} g(t_i, w_x) = 0,\; \nabla_{u_i} g(u_i, w_y) = 0, \;\forall i \in \{1,\ldots,N\} \tag{5}$$

$$f(t,u) := \frac{\sum_{i=1}^{N}(t_i-\bar{t})(u_i-\bar{u})}{\sqrt{\sum_{i=1}^{N}(t_i-\bar{t})^2}\sqrt{\sum_{i=1}^{N}(u_i-\bar{u})^2}}$$

$$g(t_i, w_x) := \|\mathrm{Log}(\mathrm{Exp}(\mu_x, t_i w_x), x_i)\|^2, \quad g(u_i, w_y) := \|\mathrm{Log}(\mathrm{Exp}(\mu_y, u_i w_y), y_i)\|^2$$

OBJECTIVE FUNCTION FOR CCA ON MANIFOLDS

Given a constrained optimization problem $\max f(x)$ s.t. $c_i(x) = 0, \forall i$, the augmented Lagrangian method (ALM) solves a sequence of the following models with increasing $\nu^k$:

$$\max f(x) + \sum_i \lambda_i c_i(x) - \nu^k \sum_i c_i(x)^2 \tag{6}$$

The augmented Lagrangian formulation of our CCA formulation (5) is

$$\max_{w_x,w_y,t,u} \mathcal{L}_A(w_x, w_y, t, u, \lambda^k; \nu^k) = \max_{w_x,w_y,t,u} f(t,u) + \sum_{i=1}^{N} \lambda^k_{t_i} \nabla_{t_i} g(t_i, w_x) + \sum_{i=1}^{N} \lambda^k_{u_i} \nabla_{u_i} g(u_i, w_y) - \frac{\nu^k}{2} \sum_{i=1}^{N} \left[\nabla_{t_i} g(t_i, w_x)^2 + \nabla_{u_i} g(u_i, w_y)^2\right] \tag{7}$$

ITERATIVE ALGORITHM BY AUGMENTED LAGRANGIAN METHOD

1: $x_1, \ldots, x_N \in \mathcal{M}_x$, $y_1, \ldots, y_N \in \mathcal{M}_y$
2: Given $\nu^0 > 0$, $\tau^0 > 0$, starting points $(w_x^0, w_y^0, t^0, u^0)$ and $\lambda^0$
3: for $k = 0, 1, 2, \ldots$ do
4:   Start at $(w_x^k, w_y^k, t^k, u^k)$
5:   Find an approximate minimizer $(w_x^k, w_y^k, t^k, u^k)$ of $\mathcal{L}_A(\cdot, \lambda^k; \nu^k)$, and terminate when $\|\nabla \mathcal{L}_A(w_x^k, w_y^k, t^k, u^k, \lambda^k; \nu^k)\| \le \tau^k$
6:   if a convergence test is satisfied then
7:     Stop with approximate feasible solution
8:   end if
9:   $\lambda^{k+1}_{t_i} = \lambda^k_{t_i} - \nu^k \nabla_{t_i} g(t_i, w_x), \forall i$  and  $\lambda^{k+1}_{u_i} = \lambda^k_{u_i} - \nu^k \nabla_{u_i} g(u_i, w_y), \forall i$
10:  Choose new penalty parameter $\nu^{k+1} \ge \nu^k$
11:  Set starting point for the next iteration
12:  Select tolerance $\tau^{k+1}$
13: end for
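The update schedule above (inner solve at step 5, multiplier update at step 9, penalty growth at step 10) can be illustrated on a toy Euclidean equality-constrained problem. This is a sketch of the ALM schedule only, not the manifold solver from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def alm_maximize(f, c, x0, nu0=1.0, iters=15):
    """ALM for max f(x) s.t. c(x) = 0, following the boxed algorithm:
    inner maximization, multiplier step lam <- lam - nu*c(x), nu increase."""
    x, lam, nu = np.asarray(x0, float), 0.0, nu0
    for _ in range(iters):
        # Step 5: approximately maximize the augmented Lagrangian (Eq. 6/7 form).
        L = lambda z: -(f(z) + lam * c(z) - 0.5 * nu * c(z) ** 2)
        x = minimize(L, x).x
        lam = lam - nu * c(x)   # step 9: multiplier update
        nu = 2.0 * nu           # step 10: nu_{k+1} >= nu_k
    return x

# max -(x1^2 + x2^2) s.t. x1 + x2 = 1  ->  optimum at (0.5, 0.5)
f = lambda z: -(z[0] ** 2 + z[1] ** 2)
c = lambda z: z[0] + z[1] - 1.0
assert np.allclose(alm_maximize(f, c, [0.0, 0.0]), [0.5, 0.5], atol=1e-4)
```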

SINGLE PATH ALGORITHM WITH APPROXIMATE PROJECTION

$$\Pi_S(x) \approx \mathrm{Exp}\Big(\mu, \sum_{i=1}^{d} v_i \langle v_i, \mathrm{Log}(\mu, x)\rangle_\mu\Big) \tag{8}$$

1: Input $X_1, \ldots, X_N \in \mathcal{M}_x$, $Y_1, \ldots, Y_N \in \mathcal{M}_y$
2: Compute intrinsic means $\mu_x, \mu_y$ of $\{X_i\}, \{Y_i\}$
3: Compute $X_i^o = \mathrm{Log}(\mu_x, X_i)$, $Y_i^o = \mathrm{Log}(\mu_y, Y_i)$
4: Transform (using group action) $\{X_i^o\}, \{Y_i^o\}$ to $T_I\mathcal{M}_x, T_I\mathcal{M}_y$
5: Perform CCA between $T_I\mathcal{M}_x, T_I\mathcal{M}_y$ and get axes $W_a \in T_I\mathcal{M}_x$, $W_b \in T_I\mathcal{M}_y$
6: Transform (using group action) $W_a, W_b$ to $T_{\mu_x}\mathcal{M}_x, T_{\mu_y}\mathcal{M}_y$
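Eq. (8) projects $\mathrm{Log}(\mu, x)$ onto the span of tangent basis vectors $\{v_i\}$ and maps the result back with Exp. A generic sketch (the basis `V` is assumed orthonormal under $\langle\cdot,\cdot\rangle_\mu$, and the Euclidean check at the end is illustrative):

```python
import numpy as np

def approx_project(mu, V, x, exp_map, log_map):
    """Eq. (8): project Log(mu, x) onto span of the columns of V
    (assumed orthonormal in <.,.>_mu), then map back with Exp."""
    v = log_map(mu, x)
    coeffs = V.T @ v          # <v_i, Log(mu, x)>_mu with the Euclidean metric
    return exp_map(mu, V @ coeffs)

# Euclidean sanity check: reduces to orthogonal projection onto mu + span(V).
exp_e = lambda p, v: p + v
log_e = lambda p, x: x - p
mu = np.zeros(3)
V = np.array([[1.0], [0.0], [0.0]])   # a single axis, playing the role of w_x
x = np.array([3.0, -2.0, 1.0])
assert np.allclose(approx_project(mu, V, x, exp_e, log_e), [3.0, 0.0, 0.0])
```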

WHY IS GROUP ACTION NEEDED?

- Transformations to the identity of SPD(n) for more accurate projection
- Group action is equivalent to the parallel transport from $T_p\mathcal{M}$ to $T_I\mathcal{M}$ on SPD(n)
- Group action is computationally more efficient

THEOREM

On the SPD manifold, let $\Gamma_{p \to I}(w)$ denote the parallel transport of $w \in T_p\mathcal{M}$ along the geodesic from $p \in \mathcal{M}$ to $I \in \mathcal{M}$. The parallel transport is equivalent to the group action $p^{-1/2} w p^{-T/2}$, where the inner product is $\langle u, v\rangle_p = \mathrm{tr}(p^{-1/2} u p^{-1} v p^{-1/2})$.
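The theorem can be checked numerically: for symmetric $p$ we have $p^{-T/2} = p^{-1/2}$, so the group action $w \mapsto p^{-1/2} w p^{-1/2}$ carries $\langle\cdot,\cdot\rangle_p$ at $p$ to the plain trace inner product at the identity. A small NumPy verification (a sketch with randomly drawn $p$, $u$, $v$):

```python
import numpy as np

def sym_pow(p, alpha):
    """p^alpha for symmetric positive definite p via eigendecomposition."""
    w, V = np.linalg.eigh(p)
    return V @ np.diag(w ** alpha) @ V.T

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
p = a @ a.T + 4 * np.eye(4)                    # random SPD point
u = rng.normal(size=(4, 4)); u = u + u.T       # tangent vectors are symmetric
v = rng.normal(size=(4, 4)); v = v + v.T

pis = sym_pow(p, -0.5)
ip_p = np.trace(pis @ u @ np.linalg.inv(p) @ v @ pis)  # <u, v>_p from the theorem
gu, gv = pis @ u @ pis, pis @ v @ pis                  # group action to T_I M
assert np.isclose(ip_p, np.trace(gu @ gv))             # isometry at the identity
```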

SYNTHETIC EXPERIMENTS: RIEMANNIAN VS. EUCLIDEAN CCA

Figure: Improvements in $\rho_{x,y}$ using Riemannian CCA. Each column represents a synthetic experiment with a specific set of $\{\mu_{x_j}, \epsilon_{x_j}; \mu_{y_j}, \epsilon_{y_j}\}$. The first column has 100 samples while the other columns have 1000 samples.

NEUROIMAGING EXPERIMENTS: CCA FOR MULTI-MODAL ANALYSIS

Task: Detect gender and age effects on white and gray matter interactions. The white matter is characterized by diffusion tensors ($Y_{\mathrm{DTI}}$) while the gray matter by Cauchy deformation tensors ($X_{\mathrm{T1W}} = \sqrt{J^T J}$).

$$Y_{\mathrm{DTI}} = \beta_0 + \beta_1 \mathrm{Gender} + \beta_2 X_{\mathrm{T1W}} + \beta_3 X_{\mathrm{T1W}} \cdot \mathrm{Gender} + \epsilon,$$
$$Y_{\mathrm{DTI}} = \beta'_0 + \beta'_1 \mathrm{AgeGroup} + \beta'_2 X_{\mathrm{T1W}} + \beta'_3 X_{\mathrm{T1W}} \cdot \mathrm{AgeGroup} + \epsilon,$$

Figure: Regions of interest in which the interaction models are tested. From left to right: cingulum bundle (DTI), hippocampus (T1W), entire white matter (DTI) and entire gray matter (T1W).

Figure: Riemannian CCA weight vectors for DTI ($w_y$) and T1W ($w_x$) when using the cingulum (left) and hippocampal (right) ROIs. Bottom row shows the same when using entire white and gray matter ROIs.


European Conference on Computer Vision (ECCV) 2014. Research supported in part by NIH R01 AG040396, NIH R01 AG037639, NSF CAREER award 1252725, NSF RI 1116584, Waisman Center CIHM, Wisconsin ADRC.