Download - Optimal transport for diffeomorphic registration - DMA/ENSfeydy/Talks/MICCAI_2017/Poster_MICCAI2017.pdf · 15.Vaillant,M.,Glaunès,J.:Surfacematchingviacurrents.In:Biennial ... Jean

Optimal transport for diffeomorphic registrationWe define a fidelity term based on Optimal Transport to compare unlabeled shape data, and couple it we a registration algorithm.

Jean Feydy1,2 Benjamin Charlier3,5 François-Xavier Vialard4,6 Gabriel Peyré1,5

1DMA – École Normale Supérieure, Paris, France 2CMLA – ENS Cachan, Cachan, France3Institut Montpelliérain Alexander Grothendieck, Univ. Montpellier, Montpellier, France

4Univ. Paris-Dauphine - PSL Research, Paris, France 5CNRS, Paris, France 6INRIA Mokaplan, Paris, France

The Measure+Kernel Paradigm

We focus on the registration of a shape A toa shape B through a (rigid, diffeomorphic, etc.)transformation ϕ :

A −→ ϕ(A) ' B.

In a variational setting, one chooses a transforma-tion ϕ minimizing an energy

E(ϕ) = Reg(ϕ)︸︷︷︸Regularization

+ d(ϕ(A)→ B)︸︷︷︸fidelity term

.

Registration toolboxes thus require fidelity rou-tines d between unlabeled shapes ϕ(A) and B .Conveniently, one represents those as measures:

ϕ(A)↔ µ =I∑

i=1

piδxiand B ↔ ν =

J∑j=1

qjδyj.

• If ϕ(A) is a segmented surface, each weighteddirac piδxi

stands for a triangle.

• If ϕ(A) is a segmented density image, eachweighted dirac piδxi

stands for a voxel.

Then, one typically chooses a blurring function Gσ

associated to a kernel k = Gσ ? Gσ and use

d(µ→ ν) = ‖Gσ ? µ−Gσ ? ν ‖2L2 = 〈µ−ν | k?(µ−ν) 〉.

This simple fidelity can be computed at the cost ofa single convolution through the data (µ− ν).

Fig. 1 : Smoothed data Gσ ? (µ − ν) for two dif-ferent scales σ. (a) Fine kernels are not suited tolarge deformations, whereas (b) heavy-tailed ker-nels can be hard to tune.

Our Contribution

An Optimal Transportplan is a rough globalmatching, akin toa spring system.

It may tear shapesapart.

source A

target B

warped gridϕ(Rd)

OT planΓA→B

Use it to drive a smooth registration.

This global gradientallows a smoothregistration toolboxto reach quicklya satisfying overlap.warped source

ϕ(A)

Wait a little bit...

source registeredto the target

At convergence, thesmallest details canbe retrieved withoutrecurring to a coarse--to-fine scheme.

The proposed data attachment term is:

• Global, unlike kernel methods.

• Principled, as it relies on a blooming mathematical field.

• Differentiable, pluggable in any registration toolbox.

• Versatile, as it covers all scales and can be adapted to anyfeature space.

• Affordable, at a cost of 100-1000 gaussian convolutions pertransport plan.

The Math Behind It

In the simplest setting, we assume that

µ =I∑

i=1

piδxiand ν =

J∑j=1

qjδyj

have the same total mass. Then, we define theWasserstein fidelity term through a minimizationon I-by-J matrices – called transport plans Γ:

Wε(µ, ν)︸︷︷︸= d(µ→ν)

= minΓ

∑i,j

γi,j · |xi − yj|2︸︷︷︸transport cost

+ ε∑i,j

γi,j log γi,j︸︷︷︸entropic regularization

under the constraint that Γ = (γi,j) satisfies

∀i, j, γi,j > 0,∑

j

γi,j = pi,∑

i

γi,j = qj. (1)

Optimality conditions show that the OT plan canbe written as a product

γi,j = Γ(xi→ yj) = a(xi) k(xi, yj) b(yj), where:

• The kernel function k is given by

k(xi, yj) = k(xi − yj) = e−|xi−yj|2/ε.

• a and b are nonnegative functions supportedrespectively by {xi} and {yj}.

The Sinkhorn theorem then asserts that a and b areuniquely determined by eq. (1), which now reads

a = p

k ? b, b = q

k ? a.

√ε

√ε

Fig. 2 : Two OT plans computed with different reg-ularization scales

√ε. Increasing this parameter

results in a lower computational cost.

The Algorithm

Sinkhorn Iterative AlgorithmParameter : k : x 7→ e−|x|

2/ε

Input : source µ =∑

i piδxi

target ν =∑

j qjδyj

Output : fidelity Wε(µ, ν)1: a← ones(size(p))2: b← ones(size(q))3: while updates > tol do4: a← p / (k ? b)5: b← q / (k ? a)6: return ε ·

(〈 p, log(a) + 1/2 〉

+〈 q, log(b) + 1/2 〉)

If the data lies on a grid, k ? · is aseparable gaussian convolution.

If the data is sparse, k ? · is the prod-uct with the kernel matrix

(Kij) = k(xi, yj)

and its transpose.

In Practice

it. 10 it. 50

it. 677it. 100

Fig. 3 : Sinkhorn iterations propagatealong both shapes the informationencoded within (Kij).

The practical convergence rate of theSinkhorn algorithm is not well un-derstood yet, but computing an opti-mal transport plan typically requires∼1000 convolutions, depending on ε.Our Matlab and Python imple-mentations are freely available:github.com/jeanfeydy/lddmm-ot

Bonus Features

−−−−−→

Fig. 4 : Use OT plans to registrate ex-otic data types.

• Use Unbalanced Transport,relaxing the constraints of eq. (1)with a soft penalty term.

• Generalize the algorithm toFeatures Spaces such as the“position + orientation” space.

• Compute seamlessly thederivatives of the fidelity.

• Implement the algorithm in thelog-domain with Nesterovacceleration for increasednumerical stability and speed.

Take-Home Points

• The Sinkhorn algorithm, aniterative globalization trick,provides small kernels withlong-distance vision.

• Computed at the cost of a fewhundred convolutions, OptimalTransport plans can be used asspring systems driving adiffeomorphic registrationroutine.

• The resulting framework is morerobust than a kernel-based one,as no target data is “out-of-sight”.

• This new scheme will find its useat the coarsest scales, where itsproperties are worth thecomputational overhead.

References1. Al-Rfou, R., Alain, G., Almahairi, A., Angermüller, C., Bahdanau, D.,Ballas, N., ..., Zhang, Y.: Theano: A python framework for fastcomputation of mathematical expressions. CoRR abs/1605.02688 (2016)2. Avants, B., Epstein, C., Grossman, M., Gee, J.: Symmetric diffeomorphicimage registration with cross-correlation: Evaluating automatedlabeling of elderly and neurodegenerative brain. Medical Image Analysis12(1), 26 – 41 (2008)3. Charon, N., Trouvé, A.: The varifold representation of nonorientedshapes for diffeomorphic registration. SIAM J. Imaging Sciences 6(4),2547–2580 (2013)4. Chizat, L., Peyré, G., Schmitzer, B., Vialard, F.X.: Scaling algorithms forunbal- anced transport problems. Preprint 1607.05816, Arxiv (2016)5. Chizat, L., Schmitzer, B., Peyré, G., Vialard, F.X.: An interpolatingdistance between optimal transport and Fisher–Rao. to appear inFoundations of Computa- tional Mathematics (2016)6. Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal

transportation. In: Proc. NIPS, vol. 26, pp. 2292–2300 (2013)7. Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In:Proceed- ings of the 31st International Conference on Machine Learning(ICML), JMLR WCP. vol. 32 (2014)8. Gee, J.C., Gee, J.C., Reivich, M., Reivich, M., Bajcsy, R., Bajcsy, R.:Elastically deforming a three-dimensional atlas to match anatomicalbrain images. J. Comput. Assist. Tomogr 17, 225–236 (1993)9. Gori, P., Colliot, O., Marrakchi-Kacem, L., Worbe, Y., de Vico Fallani, F.,Chavez, M., Lecomte, S., Poupon, C., Hartmann, A., Ayache, N., Durrleman,S.: A proto- type representation to approximate white matter bundleswith weighted currents. Proceedings of Medical Image Computing andComputer Aided Intervention (MIC- CAI’14) Part III LNCS 8675, 289–296(2014)10. Liero, M., Mielke, A., Savaré, G.: Optimal entropy-transport problemsand a new Hellinger–Kantorovich distance between positive measures.ArXiv e-prints (2015)

11. Montavon, G., Müller, K.R., Cuturi, M.: Wasserstein training ofrestricted Boltz- mann machines. In: Adv. in Neural InformationProcessing Systems (2016)12. Santambrogio, F.: Optimal transport for applied mathematicians.Progress in Non- linear Differential Equations and their applications 87(2015)13. Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K.:Equivalence of distance-based and rkhs-based statistics in hypothesistesting. Ann. Statist. 41(5), 2263–2291 (10 2013)14. Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical imageregistration: A survey. IEEE Transactions on Medical Imaging 32(7),1153–1190 (July 2013)15. Vaillant, M., Glaunès, J.: Surface matching via currents. In: BiennialInternational Conference on Information Processing in Medical Imaging.pp. 381–392. Springer (2005)

Contact Information

Jean Feydy,PhD student with Prof. Alain Trouvé,DMA, ENS Paris + CMLA, ENS Cachan.

Mail : [email protected] : www.math.ens.fr/~feydy

1

www.math.ens.fr/~feydy