Optimal transport for diffeomorphic registrationWe define a fidelity term based on Optimal Transport to compare unlabeled shape data, and couple it we a registration algorithm.
Jean Feydy1,2 Benjamin Charlier3,5 François-Xavier Vialard4,6 Gabriel Peyré1,5
1DMA – École Normale Supérieure, Paris, France 2CMLA – ENS Cachan, Cachan, France3Institut Montpelliérain Alexander Grothendieck, Univ. Montpellier, Montpellier, France
4Univ. Paris-Dauphine - PSL Research, Paris, France 5CNRS, Paris, France 6INRIA Mokaplan, Paris, France
The Measure+Kernel Paradigm
We focus on the registration of a shape A toa shape B through a (rigid, diffeomorphic, etc.)transformation ϕ :
A −→ ϕ(A) ' B.
In a variational setting, one chooses a transforma-tion ϕ minimizing an energy
E(ϕ) = Reg(ϕ)︸ ︷︷ ︸Regularization
+ d(ϕ(A)→ B)︸ ︷︷ ︸fidelity term
.
Registration toolboxes thus require fidelity rou-tines d between unlabeled shapes ϕ(A) and B .Conveniently, one represents those as measures:
ϕ(A)↔ µ =I∑
i=1
piδxiand B ↔ ν =
J∑j=1
qjδyj.
• If ϕ(A) is a segmented surface, each weighteddirac piδxi
stands for a triangle.
• If ϕ(A) is a segmented density image, eachweighted dirac piδxi
stands for a voxel.
Then, one typically chooses a blurring function Gσ
associated to a kernel k = Gσ ? Gσ and use
d(µ→ ν) = ‖Gσ ? µ−Gσ ? ν ‖2L2 = 〈µ−ν | k?(µ−ν) 〉.
This simple fidelity can be computed at the cost ofa single convolution through the data (µ− ν).
Fig. 1 : Smoothed data Gσ ? (µ − ν) for two dif-ferent scales σ. (a) Fine kernels are not suited tolarge deformations, whereas (b) heavy-tailed ker-nels can be hard to tune.
Our Contribution
An Optimal Transportplan is a rough globalmatching, akin toa spring system.
It may tear shapesapart.
source A
target B
warped gridϕ(Rd)
OT planΓA→B
Use it to drive a smooth registration.
This global gradientallows a smoothregistration toolboxto reach quicklya satisfying overlap.warped source
ϕ(A)
Wait a little bit...
source registeredto the target
At convergence, thesmallest details canbe retrieved withoutrecurring to a coarse--to-fine scheme.
The proposed data attachment term is:
• Global, unlike kernel methods.
• Principled, as it relies on a blooming mathematical field.
• Differentiable, pluggable in any registration toolbox.
• Versatile, as it covers all scales and can be adapted to anyfeature space.
• Affordable, at a cost of 100-1000 gaussian convolutions pertransport plan.
The Math Behind It
In the simplest setting, we assume that
µ =I∑
i=1
piδxiand ν =
J∑j=1
qjδyj
have the same total mass. Then, we define theWasserstein fidelity term through a minimizationon I-by-J matrices – called transport plans Γ:
Wε(µ, ν)︸ ︷︷ ︸= d(µ→ν)
= minΓ
∑i,j
γi,j · |xi − yj|2︸ ︷︷ ︸transport cost
+ ε∑i,j
γi,j log γi,j︸ ︷︷ ︸entropic regularization
under the constraint that Γ = (γi,j) satisfies
∀i, j, γi,j > 0,∑
j
γi,j = pi,∑
i
γi,j = qj. (1)
Optimality conditions show that the OT plan canbe written as a product
γi,j = Γ(xi→ yj) = a(xi) k(xi, yj) b(yj), where:
• The kernel function k is given by
k(xi, yj) = k(xi − yj) = e−|xi−yj|2/ε.
• a and b are nonnegative functions supportedrespectively by {xi} and {yj}.
The Sinkhorn theorem then asserts that a and b areuniquely determined by eq. (1), which now reads
a = p
k ? b, b = q
k ? a.
√ε
√ε
Fig. 2 : Two OT plans computed with different reg-ularization scales
√ε. Increasing this parameter
results in a lower computational cost.
The Algorithm
Sinkhorn Iterative AlgorithmParameter : k : x 7→ e−|x|
2/ε
Input : source µ =∑
i piδxi
target ν =∑
j qjδyj
Output : fidelity Wε(µ, ν)1: a← ones(size(p))2: b← ones(size(q))3: while updates > tol do4: a← p / (k ? b)5: b← q / (k ? a)6: return ε ·
(〈 p, log(a) + 1/2 〉
+〈 q, log(b) + 1/2 〉)
If the data lies on a grid, k ? · is aseparable gaussian convolution.
If the data is sparse, k ? · is the prod-uct with the kernel matrix
(Kij) = k(xi, yj)
and its transpose.
In Practice
it. 10 it. 50
it. 677it. 100
Fig. 3 : Sinkhorn iterations propagatealong both shapes the informationencoded within (Kij).
The practical convergence rate of theSinkhorn algorithm is not well un-derstood yet, but computing an opti-mal transport plan typically requires∼1000 convolutions, depending on ε.Our Matlab and Python imple-mentations are freely available:github.com/jeanfeydy/lddmm-ot
Bonus Features
−−−−−→
Fig. 4 : Use OT plans to registrate ex-otic data types.
• Use Unbalanced Transport,relaxing the constraints of eq. (1)with a soft penalty term.
• Generalize the algorithm toFeatures Spaces such as the“position + orientation” space.
• Compute seamlessly thederivatives of the fidelity.
• Implement the algorithm in thelog-domain with Nesterovacceleration for increasednumerical stability and speed.
Take-Home Points
• The Sinkhorn algorithm, aniterative globalization trick,provides small kernels withlong-distance vision.
• Computed at the cost of a fewhundred convolutions, OptimalTransport plans can be used asspring systems driving adiffeomorphic registrationroutine.
• The resulting framework is morerobust than a kernel-based one,as no target data is “out-of-sight”.
• This new scheme will find its useat the coarsest scales, where itsproperties are worth thecomputational overhead.
References1. Al-Rfou, R., Alain, G., Almahairi, A., Angermüller, C., Bahdanau, D.,Ballas, N., ..., Zhang, Y.: Theano: A python framework for fastcomputation of mathematical expressions. CoRR abs/1605.02688 (2016)2. Avants, B., Epstein, C., Grossman, M., Gee, J.: Symmetric diffeomorphicimage registration with cross-correlation: Evaluating automatedlabeling of elderly and neurodegenerative brain. Medical Image Analysis12(1), 26 – 41 (2008)3. Charon, N., Trouvé, A.: The varifold representation of nonorientedshapes for dif- feomorphic registration. SIAM J. Imaging Sciences 6(4),2547–2580 (2013)4. Chizat, L., Peyré, G., Schmitzer, B., Vialard, F.X.: Scaling algorithms forunbal- anced transport problems. Preprint 1607.05816, Arxiv (2016)5. Chizat, L., Schmitzer, B., Peyré, G., Vialard, F.X.: An interpolatingdistance be- tween optimal transport and Fisher–Rao. to appear inFoundations of Computa- tional Mathematics (2016)6. Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal
transportation. In: Proc. NIPS, vol. 26, pp. 2292–2300 (2013)7. Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In:Proceed- ings of the 31st International Conference on Machine Learning(ICML), JMLR WCP. vol. 32 (2014)8. Gee, J.C., Gee, J.C., Reivich, M., Reivich, M., Bajcsy, R., Bajcsy, R.:Elastically deforming a three-dimensional atlas to match anatomicalbrain images. J. Comput. Assist. Tomogr 17, 225–236 (1993)9. Gori, P., Colliot, O., Marrakchi-Kacem, L., Worbe, Y., de Vico Fallani, F.,Chavez, M., Lecomte, S., Poupon, C., Hartmann, A., Ayache, N., Durrleman,S.: A proto- type representation to approximate white matter bundleswith weighted currents. Proceedings of Medical Image Computing andComputer Aided Intervention (MIC- CAI’14) Part III LNCS 8675, 289–296(2014)10. Liero, M., Mielke, A., Savaré, G.: Optimal entropy-transport problemsand a new Hellinger–Kantorovich distance between positive measures.ArXiv e-prints (2015)
11. Montavon, G., Müller, K.R., Cuturi, M.: Wasserstein training ofrestricted Boltz- mann machines. In: Adv. in Neural InformationProcessing Systems (2016)12. Santambrogio, F.: Optimal transport for applied mathematicians.Progress in Non- linear Differential Equations and their applications 87(2015)13. Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K.:Equivalence of distance-based and rkhs-based statistics in hypothesistesting. Ann. Statist. 41(5), 2263–2291 (10 2013)14. Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical imageregistration: A survey. IEEE Transactions on Medical Imaging 32(7),1153–1190 (July 2013)15. Vaillant, M., Glaunès, J.: Surface matching via currents. In: BiennialInternational Conference on Information Processing in Medical Imaging.pp. 381–392. Springer (2005)
Contact Information
Jean Feydy,PhD student with Prof. Alain Trouvé,DMA, ENS Paris + CMLA, ENS Cachan.
Mail : [email protected] : www.math.ens.fr/~feydy
1
Top Related