An Introduction to Optimal Transport

Gabriel Peyré, www.numerical-tours.com

Statistical Image Models
Color distribution: each pixel of the source image (X) is a point in $\mathbb{R}^3$.
Application to color transfer: a sliced Wasserstein projection of X onto the color statistics of the style image (Y) gives the source image after color transfer. [Figures: input image, source image (X), style image (Y), source image after color transfer, modified color transfer; J. Rabin, Wasserstein Regularization.]
Other applications: texture synthesis, texture segmentation.

Discrete Distributions
Discrete measure: $\mu = \sum_{i=0}^{N-1} p_i \delta_{X_i}$, with $X_i \in \mathbb{R}^d$ and $\sum_{i=0}^{N-1} p_i = 1$.
Point cloud: constant weights $p_i = 1/N$ and free positions $X_i$. Since reordering the points does not change $\mu$, point clouds live in the quotient space $\mathbb{R}^{N \times d} / \Sigma_N$, where $\Sigma_N$ is the group of permutations of $N$ elements.
Histogram: fixed positions $X_i$ (e.g. a grid) and free weights in the affine space $\{(p_i)_i : \sum_i p_i = 1\}$.
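The two parametrizations can be illustrated with a minimal pure-Python sketch. The helpers `point_cloud_measure` and `histogram_measure` are hypothetical names chosen for this example, and a measure is stored as a dictionary mapping positions (tuples) to weights. This makes the quotient by $\Sigma_N$ visible: two orderings of the same points produce the same dictionary.

```python
import math
from collections import Counter
from fractions import Fraction


def point_cloud_measure(points):
    """Uniform measure (1/N) sum_i delta_{X_i}, stored as {position: weight}.

    Positions must be hashable (e.g. tuples). Repeated points accumulate
    weight, so the result depends only on the multiset of points.
    """
    n = len(points)
    return {x: Fraction(c, n) for x, c in Counter(points).items()}


def histogram_measure(grid, weights):
    """Histogram: fixed positions (e.g. a grid) carrying weights that sum to 1."""
    if len(grid) != len(weights):
        raise ValueError("expected one weight per grid position")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return dict(zip(grid, weights))


# Reordering the points does not change the point-cloud measure.
X = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]
X_permuted = [X[2], X[0], X[1]]
assert point_cloud_measure(X) == point_cloud_measure(X_permuted)

# A 1-D histogram on the fixed grid {0, 1, 2}.
h = histogram_measure([(0,), (1,), (2,)], [0.25, 0.5, 0.25])
```

Exact `Fraction` weights keep the point-cloud normalization exact; the histogram check uses `math.isclose` so that float weights are accepted too.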
From Images to Statistics
Discretized image $f \in \mathbb{R}^{N \times d}$ with $f_i \in \mathbb{R}^d = \mathbb{R}^3$, where $N$ is the number of pixels and $d$ the number of color channels.
Disclaimer: images are not distributions. One needs an estimator $f \mapsto \mu_f$, and then modifies $f$ by controlling $\mu_f$.
Point cloud discretization: $\mu_f = \frac{1}{N}\sum_i \delta_{f_i}$.
Histogram discretization: $\mu_f = \sum_i p_i \delta_{X_i}$, with Parzen windows $p_i = \frac{1}{Z}\sum_j \varphi(X_i - f_j)$, where $\varphi$ is a smoothing kernel and $Z$ normalizes the weights so that $\sum_i p_i = 1$.
[Figures: source image (X), style image (Y); J. Rabin, Wasserstein Regularization.]

Overview: Discrete Optimal Transport; Continuous Optimal Transport; Displacement Interpolation.

Optimal Assignments
Discrete distributions: $\mu_X = \frac{1}{N}\sum_{i=1}^N \delta_{X_i}$ and $\mu_Y = \frac{1}{N}\sum_{i=1}^N \delta_{Y_i}$.
Optimal assignment: $\sigma^\star \in \operatorname{argmin}_{\sigma \in \Sigma_N} \sum_i \|X_i - Y_{\sigma(i)}\|^p$, where $\Sigma_N$ is the set of permutations of $\{1, \dots, N\}$.
Wasserstein distance: $W_p(\mu_X, \mu_Y)^p = \frac{1}{N}\sum_i \|X_i - Y_{\sigma^\star(i)}\|^p$. It is a metric on the space of distributions.
Projection on statistical constraints: for $\mathcal{C} = \{f : \mu_f = \mu_Y\}$ and $p = 2$, $\mathrm{Proj}_{\mathcal{C}}(f) = (Y_{\sigma^\star(i)})_i$, where $\sigma^\star$ is the optimal assignment between $f$ and $Y$.
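A minimal sketch of the optimal assignment, of $W_p$ between uniform point clouds, and of the projection on $\mathcal{C}$, written in plain Python with no numerical libraries. The function names are illustrative, and the search enumerates all $N!$ permutations, so it only makes the definitions concrete for a handful of points; it is not a practical solver.

```python
import math
from itertools import permutations


def optimal_assignment(X, Y, p=2):
    """Return (sigma, total) minimizing sum_i ||X_i - Y_sigma(i)||^p.

    Brute force over all N! permutations: only for a handful of points.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of points")
    best_sigma, best_total = None, math.inf
    for sigma in permutations(range(len(Y))):
        total = sum(math.dist(X[i], Y[j]) ** p for i, j in enumerate(sigma))
        if total < best_total:
            best_sigma, best_total = sigma, total
    return best_sigma, best_total


def wasserstein(X, Y, p=2):
    """W_p between (1/N) sum_i delta_{X_i} and (1/N) sum_i delta_{Y_i}."""
    _, total = optimal_assignment(X, Y, p)
    return (total / len(X)) ** (1.0 / p)


def projection(f, Y):
    """Proj_C(f) for C = {f : mu_f = mu_Y} and p = 2: Y reordered along sigma*."""
    sigma, _ = optimal_assignment(f, Y, p=2)
    return [Y[j] for j in sigma]


X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(1.0, 1.0), (0.0, 1.0)]
sigma, total = optimal_assignment(X, Y)  # sigma == (1, 0): each point moves straight up
```

On the line, the permutation found by brute force is the one that matches sorted values, which is why sorting suffices in 1-D.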
Computing Transport Distances
Explicit solution for 1-D distributions (e.g. grayscale images): sort the values $X_i$ and $Y_i$ and match them in order, which takes $O(N \log N)$ operations.
Higher dimensions: combinatorial optimization methods (Hungarian algorithm, auction algorithm, etc.) take $O(N^{5/2}\log N)$ operations, which is intractable for imaging problems.
Arbitrary distributions $\mu = \sum_i p_i \delta_{X_i}$ and $\nu = \sum_j q_j \delta_{Y_j}$: $W_p(\mu,\nu)^p$ is the solution of a linear program.

Coupling Matrices
Extension to weighted distributions $\mu = \sum_{i=1}^{N_1} p_i \delta_{X_i}$ and $\nu = \sum_{j=1}^{N_2} q_j \delta_{Y_j}$ with arbitrary numbers of points.
If $N_1 = N_2$, an assignment corresponds to a permutation matrix $P = P_\sigma = (\delta_{\sigma(i),j})_{i,j}$, and $\sum_i \|X_i - Y_{\sigma(i)}\|^p = \sum_{i,j} P_{i,j}\|X_i - Y_j\|^p$.
Probabilistic couplings: $\Pi(\mu,\nu) = \{P \in \mathbb{R}^{N_1 \times N_2} : P \geq 0,\ P\mathbb{1} = p,\ P^\top\mathbb{1} = q\}$. This set is defined even if $N_1 \neq N_2$, it takes the weights into account, and the objective $\sum_{i,j} P_{i,j}\|X_i - Y_j\|^p$ is linear in $P$.
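The correspondence between assignments and couplings can be checked numerically. This plain-Python sketch (hypothetical helper names, matrices stored as nested lists) builds $P_\sigma$, verifies that $\langle P_\sigma, C\rangle$ equals the assignment cost, that $P_\sigma/N$ has uniform marginals, and that the product coupling $p q^\top$ is feasible even when $N_1 \neq N_2$.

```python
import math


def cost_matrix(X, Y, p=2):
    """C_ij = ||X_i - Y_j||^p."""
    return [[math.dist(x, y) ** p for y in Y] for x in X]


def permutation_coupling(sigma):
    """Permutation matrix P_sigma with entries delta_{sigma(i), j}."""
    n = len(sigma)
    return [[1 if sigma[i] == j else 0 for j in range(n)] for i in range(n)]


def coupling_cost(P, C):
    """<P, C> = sum_{i,j} C_ij P_ij."""
    return sum(P[i][j] * C[i][j] for i in range(len(P)) for j in range(len(P[i])))


def is_coupling(P, p, q, tol=1e-9):
    """True if P >= 0, P 1 = p (row sums) and P^T 1 = q (column sums)."""
    if len(P) != len(p) or any(len(row) != len(q) for row in P):
        return False
    if any(x < 0 for row in P for x in row):
        return False
    rows_ok = all(abs(sum(P[i]) - p[i]) <= tol for i in range(len(p)))
    cols_ok = all(abs(sum(P[i][j] for i in range(len(p))) - q[j]) <= tol
                  for j in range(len(q)))
    return rows_ok and cols_ok


# N1 = N2 = 3: the assignment cost equals <P_sigma, C>.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Y = [(2.0, 2.0), (3.0, 0.0), (0.0, 3.0)]
sigma = (1, 0, 2)
C = cost_matrix(X, Y)
assignment_cost = sum(C[i][sigma[i]] for i in range(3))
P_sigma = permutation_coupling(sigma)
assert math.isclose(coupling_cost(P_sigma, C), assignment_cost)

# Dividing by N gives a coupling between uniform weights (1/N, ..., 1/N).
uniform = [1 / 3] * 3
assert is_coupling([[x / 3 for x in row] for row in P_sigma], uniform, uniform)

# N1 != N2: the product coupling p q^T is always feasible.
p, q = [0.5, 0.5], [0.2, 0.3, 0.5]
assert is_coupling([[pi * qj for qj in q] for pi in p], p, q)
```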
Kantorovich Formulation
Linear programming (Kantorovich): $W_p(\mu,\nu)^p = \langle P^\star, C\rangle$, with $P^\star \in \operatorname{argmin}_{P \in \Pi(\mu,\nu)} \langle P, C\rangle = \sum_{i,j} C_{i,j}P_{i,j}$, where $C_{i,j} = \|X_i - Y_j\|^p$ and $\Pi(\mu,\nu)$ is the set of couplings with marginals $p$ and $q$.
An optimal coupling $P^\star$ can be found among the extremal points of the polytope $\Pi(\mu,\nu)$. [Figure: the polytope $\Pi(\mu,\nu)$ and a level set $\{P : \langle P, C\rangle = \text{cst}\}$.]
Theorem: if $p_i = q_i = 1/N$, there exists $\sigma \in \Sigma_N$ such that $P^\star = \frac{1}{N}P_\sigma$ is optimal, where $P_\sigma$ is the permutation matrix of $\sigma$.

Optimization Codes
Discrete optimal transport $P^\star \in \operatorname{argmin}_P \langle P, C\rangle = \sum_{i,j} C_{i,j}P_{i,j}$ is a linear program. Solvers: interior points (slow), network simplex, transportation simplex [Bonneel et al. 2011].
[Figure 6 of Bonneel et al. 2011: log-log plot of the running time in seconds of Network-Simplex (fixed-point), Network-Simplex (double precision) and Transport-Simplex versus problem size (number of bins per histogram), with reference lines $y = x^2$ and $y = x^3$. The network simplex (block search pivoting strategy) behaves as an $O(n^2)$ algorithm in practice, whereas the transport simplex [Kelly and O'Neill 1991] runs in $O(n^3)$.]
Excerpt from the same paper: fixed-point precision further speeds up the computation; the final transport cost is less accurate because of the limited precision, but the particle pairing that produces the actual interpolation remains unchanged. Current EMD image retrieval or color transfer techniques rely on slower solvers [Rubner et al. 2000; Kanters et al. 2003; Morovic and Sun 2003].
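The slides rely on generic linear-programming solvers. For the special case of weighted distributions on the real line, the Kantorovich program has a direct solution: with the cost $|x - y|^p$, $p \geq 1$, the monotone coupling obtained by sorting both supports and filling the coupling with the north-west corner rule is optimal. The sketch below is a swapped-in 1-D illustration, not one of the solvers compared by Bonneel et al.; the function names are hypothetical, and exact `Fraction` weights avoid rounding issues.

```python
from fractions import Fraction


def monotone_coupling_1d(x, p, y, q):
    """Monotone (north-west corner) coupling of sum_i p_i delta_{x_i} and sum_j q_j delta_{y_j}.

    Assumes sum(p) == sum(q). Sorting both supports and moving mass from the
    smallest remaining x to the smallest remaining y gives an optimal coupling
    on the real line for the cost |x - y|^power with power >= 1.
    Returns {(i, j): mass} using the original indices.
    """
    ix = sorted(range(len(x)), key=lambda i: x[i])
    iy = sorted(range(len(y)), key=lambda j: y[j])
    a = [p[i] for i in ix]  # remaining mass of the sorted sources
    b = [q[j] for j in iy]  # remaining capacity of the sorted targets
    P = {}
    i = j = 0
    while i < len(a) and j < len(b):
        m = min(a[i], b[j])
        if m > 0:
            P[(ix[i], iy[j])] = m
        a[i] -= m
        b[j] -= m
        if a[i] == 0:  # source exhausted: move to the next one
            i += 1
        if b[j] == 0:  # target filled: move to the next one
            j += 1
    return P


def kantorovich_cost(P, x, y, power=2):
    """<P, C> with C_ij = |x_i - y_j|^power, for a coupling stored as a dict."""
    return sum(m * abs(x[i] - y[j]) ** power for (i, j), m in P.items())


# Weighted distributions with N1 = 2 and N2 = 3 points.
P = monotone_coupling_1d([0, 1], [Fraction(1, 2)] * 2,
                         [0, 2, 3], [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)])
# Uniform weights: the coupling is P_sigma / N for the sorting permutation.
U = monotone_coupling_1d([3, 1, 2], [Fraction(1, 3)] * 3, [30, 10, 20], [Fraction(1, 3)] * 3)
```

With uniform weights $1/3$, the returned coupling puts mass $1/3$ on the pairs of a single permutation, i.e. it has the form $P_\sigma/N$ described by the theorem.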
Application to Color Transfer: Color Histogram Equalization
Input color images $f_i \in \mathbb{R}^{N \times 3}$, with color distributions $\mu_i = \frac{1}{N}\sum_x \delta_{f_i(x)}$.
Optimal assignment: $\sigma^\star \in \operatorname{argmin}_{\sigma \in \Sigma_N} \|f_0 - f_1 \circ \sigma\|$, over the permutations $\Sigma_N$ of the $N$ pixels.
Transport: $T : f_0(i) \in \mathbb{R}^3 \mapsto f_1(\sigma^\star(i)) \in \mathbb{R}^3$.
Equalization: $\tilde f_0 = T(f_0)$, so that $\mu_{\tilde f_0} = \mu_{f_1}$.
[Figures: source image (X), style image (Y), and source image after color transfer $T(f_0)$; J. Rabin, Wasserstein Regularization.]
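A toy version of color histogram equalization, assuming both images are given as equal-length lists of RGB triples. It is a brute-force sketch (the name `equalize` is hypothetical) that searches all permutations for the $p = 2$ assignment, so it only runs for a few pixels; real images need the solvers or approximations discussed in the slides.

```python
import math
from itertools import permutations


def equalize(f0, f1):
    """Color histogram equalization by optimal assignment (p = 2).

    f0, f1: lists of N RGB triples (flattened pixels of two images).
    Returns T(f0), where pixel i of f0 receives the color f1[sigma*(i)].
    Brute force over the N! permutations: a toy for a few pixels only.
    """
    n = len(f0)
    if len(f1) != n:
        raise ValueError("both images must have the same number of pixels")
    sigma = min(
        permutations(range(n)),
        key=lambda s: sum(math.dist(f0[i], f1[s[i]]) ** 2 for i in range(n)),
    )
    return [f1[j] for j in sigma]


f0 = [(0, 0, 0), (255, 255, 255), (128, 128, 128)]  # black, white, gray
f1 = [(250, 0, 0), (10, 0, 0), (120, 0, 0)]         # three shades of red
print(equalize(f0, f1))  # dark pixels get dark reds, bright pixels bright reds
```

The output is a reordering of the pixels of $f_1$, so its color distribution is exactly $\mu_{f_1}$.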
Measure Preserving Maps
Distributions $\mu_0, \mu_1$ on $\mathbb{R}^k$.
Mass preserving map $T : \mathbb{R}^k \to \mathbb{R}^k$ such that $\mu_1 = T_\sharp\mu_0$, where $(T_\sharp\mu_0)(A) = \mu_0(T^{-1}(A))$.
Smooth distributions $\mu_i = \rho_i(x)\,dx$: $T_\sharp\mu_0 = \mu_1 \iff \rho_1(T(x))\,|\det \partial T(x)| = \rho_0(x)$.

Optimal Transport
$L^p$ optimal transport: $W_p(\mu_0,\mu_1)^p = \min_{T : T_\sharp\mu_0 = \mu_1} \int \|T(x) - x\|^p\,\mu_0(dx)$.
Regularity condition: $\mu_0$ or $\mu_1$ does not give mass to small sets.
Theorem ($p > 1$): there exists a unique optimal $T$.
Theorem ($p = 2$): $T = \nabla\varphi$ with $\varphi$ convex, so $T$ is monotone: $\langle T(x) - T(x'), x - x'\rangle \geq 0$.

From Continuous to Discrete
Vector $X \in \mathbb{R}^{N \times d}$ (image, coefficients, ...), with $\mu_X = \frac{1}{N}\sum_i \delta_{X_i}$. Grayscale: $d = 1$; colors (RGB): $d = 3$.
$\mu_Y = T_\sharp\mu_X$ with $T : X_i \mapsto Y_{\sigma(i)}$ for a permutation $\sigma \in \Sigma_N$.
Replace the optimization over $T$ by an optimization over $\sigma \in \Sigma_N$: $\int \|T(x) - x\|^p\,\mu_X(dx) = \frac{1}{N}\sum_i \|X_i - Y_{\sigma(i)}\|^p$.

Continuous Wasserstein Distance
Couplings: $\pi \in \Pi(\mu,\nu)$ if $\pi(A \times \mathbb{R}^d) = \mu(A)$ for all $A \subset \mathbb{R}^d$ and $\pi(\mathbb{R}^d \times B) = \nu(B)$ for all $B \subset \mathbb{R}^d$.
Transportation cost: $W_p(\mu,\nu)^p = \min_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x,y)\,d\pi(x,y)$, with $c(x,y) = \|x - y\|^p$.
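The monotonicity of the $L^2$ map has a discrete counterpart that is easy to check numerically. If $\sigma^\star$ is an optimal assignment for $p = 2$, swapping the targets of two points cannot lower the cost, and expanding $\|x - y\|^2 + \|x' - y'\|^2 \leq \|x - y'\|^2 + \|x' - y\|^2$ gives exactly $\langle y - y', x - x'\rangle \geq 0$. A minimal pure-Python sketch, with illustrative names and a brute-force assignment on six random 2-D points:

```python
import math
import random
from itertools import permutations


def optimal_map(X, Y):
    """Brute-force optimal assignment for p = 2; returns T as a list, T[i] = Y[sigma*(i)]."""
    n = len(X)
    sigma = min(permutations(range(n)),
                key=lambda s: sum(math.dist(X[i], Y[s[i]]) ** 2 for i in range(n)))
    return [Y[j] for j in sigma]


def is_monotone(X, TX, tol=1e-9):
    """Check <T(x) - T(x'), x - x'> >= 0 for every pair of points (up to tol)."""
    for a in range(len(X)):
        for b in range(a + 1, len(X)):
            dx = [u - v for u, v in zip(X[a], X[b])]
            dt = [u - v for u, v in zip(TX[a], TX[b])]
            if sum(s * t for s, t in zip(dx, dt)) < -tol:
                return False
    return True


rng = random.Random(0)  # fixed seed for reproducibility
X = [(rng.random(), rng.random()) for _ in range(6)]
Y = [(rng.random() + 2, rng.random()) for _ in range(6)]
TX = optimal_map(X, Y)
print(is_monotone(X, TX))
```

Swapping two targets of an optimal assignment, as in the second test below, produces a map that fails the check.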
Continuous Optimal Transport
Let $p > 1$ and let $\mu$ give no mass to small sets. There is a unique $\pi^\star \in \Pi(\mu,\nu)$ such that $W_p(\mu,\nu)^p = \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x,y)\,d\pi^\star(x,y)$, and it is supported on the graph $\{(x, T(x))\}$ of an optimal transport map $T : \mathbb{R}^d \to \mathbb{R}^d$.
For $p = 2$: $T = \nabla\varphi$, where $\varphi$ is the unique (up to an additive constant) convex l.s.c. solution of $(\nabla\varphi)_\sharp\mu = \nu$.

1-D Continuous Wasserstein
Distributions $\mu, \nu$ on $\mathbb{R}$, with cumulative functions $C_\mu(t) = \int_{-\infty}^t d\mu(x)$.
For all $p > 1$: $T = C_\nu^{-1} \circ C_\mu$, and $T$ is non-decreasing (a change of contrast).
Explicit formulas: $W_p(\mu,\nu)^p = \int_0^1 |C_\mu^{-1}(s) - C_\nu^{-1}(s)|^p\,ds$ and $W_1(\mu,\nu) = \int_{\mathbb{R}} |C_\mu(t) - C_\nu(t)|\,dt = \|(\mu - \nu) \star H\|_1$, where $H$ is the Heaviside step function.

Continuous Histogram Transfer
Input images $f_i : [0,1]^2 \to [0,1]$, $i = 0, 1$.
Gray-value distributions $\mu_i$ defined on $[0,1]$ by $\mu_i([a,b]) = \int_{[0,1]^2} \mathbb{1}_{\{a \leq f_i \leq b\}}(x)\,dx$.
Optimal transport: $T = C_1^{-1} \circ C_0$, where $C_i$ is the cumulative function of $\mu_i$. [Figure: $f_0$, $f_1$, $C_0(f_0)$, $T(f_0)$, and the curves $C_0$ and $C_1^{-1}$.]
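In the discrete setting, $T = C_1^{-1} \circ C_0$ becomes a matching of sorted values. The following plain-Python sketch (hypothetical function names; images given as flat lists with the same number of pixels) performs this sort-based histogram transfer and evaluates $W_p$ through the quantile formula, i.e. by comparing sorted samples.

```python
def histogram_transfer(f0, f1):
    """Sort-based histogram transfer: return T(f0) for flat lists of equal length.

    The k-th smallest pixel of f0 receives the k-th smallest value of f1,
    i.e. T: f0(sigma_0(k)) -> f1(sigma_1(k)). Ties in f0 are broken by index.
    """
    if len(f0) != len(f1):
        raise ValueError("images must have the same number of pixels")
    order = sorted(range(len(f0)), key=lambda k: f0[k])  # sigma_0
    out = [None] * len(f0)
    for k, value in zip(order, sorted(f1)):  # sorted(f1) lists f1(sigma_1(k))
        out[k] = value
    return out


def wasserstein_1d(x, y, p=2):
    """W_p between uniform 1-D empirical measures with the same number of samples.

    Discrete version of W_p^p = int_0^1 |C_mu^{-1}(s) - C_nu^{-1}(s)|^p ds:
    the quantiles of each measure are its sorted samples.
    """
    if len(x) != len(y):
        raise ValueError("same number of samples expected")
    xs, ys = sorted(x), sorted(y)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1 / p)


f0 = [0.5, 0.1, 0.9, 0.3]
f1 = [10, 40, 20, 30]
g = histogram_transfer(f0, f1)  # same ranks as f0, same values as f1
```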
Discrete Histogram Transfer
Discretized grayscale images $f_0, f_1 \in \mathbb{R}^N$.
Discrete distributions $\mu_i = \mu_{f_i} = \frac{1}{N}\sum_k \delta_{f_i(k)}$.
Sorting the values: $\sigma_i \in \Sigma_N$ such that $f_i(\sigma_i(k)) \leq f_i(\sigma_i(k+1))$.
Optimal transport: $T : f_0(\sigma_0(k)) \mapsto f_1(\sigma_1(k))$.
Matlab code (f0 and f1 must have the same number of pixels):
[~, I] = sort(f0(:));   % I lists the pixels of f0 by increasing value (sigma_0)
f0(I) = sort(f1(:));    % the k-th darkest pixel of f0 receives the k-th smallest value of f1
[Figure: $f_0$, $f_1$, $T(f_0)$ and their histograms.]

Gaussian Optimal Transport
Input distributions $(\mu_0,\mu_1)$ with $\mu_i = \mathcal{N}(m_i, \Sigma_i)$.
Ellipses: $E_i = \{x \in \mathbb{R}^d : (m_i - x)^\top \Sigma_i^{-1}(m_i - x) \leq c\}$.
Theorem: if $\ker(\Sigma_0) \cap \mathrm{Im}(\Sigma_1) = \{0\}$, then $T(x) = S(x - m_0) + m_1$, where $S = \Sigma_1^{1/2}\Sigma_{0,1}^{+}\Sigma_1^{1/2}$ and $\Sigma_{0,1} = (\Sigma_1^{1/2}\Sigma_0\Sigma_1^{1/2})^{1/2}$ ($^{+}$ denotes the pseudo-inverse). Moreover $W_2(\mu_0,\mu_1)^2 = \mathrm{tr}(\Sigma_0 + \Sigma_1 - 2\Sigma_{0,1}) + \|m_0 - m_1\|^2$.

PDE Formulations
Smooth distributions $\mu_i = \rho_i(x)\,dx$: $T_\sharp\mu_0 = \mu_1 \iff \rho_1(T(x))\,|\det \partial T(x)| = \rho_0(x)$.
$L^2$ optimal transport map $T = \nabla\varphi$: $\rho_1(\nabla\varphi(x))\det(H_\varphi(x)) = \rho_0(x)$ (Monge-Ampère equation), where $H_\varphi$ is the Hessian of $\varphi$.
Fluid dynamics formulation: $W_2(\mu_0,\mu_1)^2 = \min_{\rho, m} \int_0^1 \int_{\mathbb{R}^d} \frac{\|m(x,t)\|^2}{\rho(x,t)}\,dx\,dt$ over densities $\rho(x,t) \geq 0$ and momenta $m(x,t) \in \mathbb{R}^d$, subject to $\partial_t\rho + \mathrm{div}(m) = 0$, $\rho(0,\cdot) = \rho_0$ and $\rho(1,\cdot) = \rho_1$.
Finite element discretization [Benamou-Brenier]. Related works of [Tannenbaum et al.].
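The Gaussian closed form can be evaluated without a linear-algebra library in dimension 2, using the identity $\sqrt{A} = (A + \sqrt{\det A}\,\mathrm{Id}) / \sqrt{\mathrm{tr} A + 2\sqrt{\det A}}$, which holds for $2 \times 2$ symmetric positive definite matrices. This sketch assumes both covariances are invertible, so the pseudo-inverse becomes an ordinary inverse; the function names are illustrative.

```python
import math


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def sqrtm_spd(A):
    """Square root of a 2x2 symmetric positive definite matrix (closed form)."""
    s = math.sqrt(A[0][0] * A[1][1] - A[0][1] * A[1][0])  # sqrt(det A)
    t = math.sqrt(A[0][0] + A[1][1] + 2 * s)               # sqrt(tr A + 2 sqrt(det A))
    return [[(A[0][0] + s) / t, A[0][1] / t], [A[1][0] / t, (A[1][1] + s) / t]]


def inv(A):
    """Inverse of an invertible 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def gaussian_ot(m0, S0, m1, S1):
    """Return (W2^2, S) for N(m0, S0) -> N(m1, S1), assuming S0 and S1 invertible."""
    R1 = sqrtm_spd(S1)                            # Sigma_1^{1/2}
    S01 = sqrtm_spd(matmul(matmul(R1, S0), R1))   # Sigma_{0,1}
    S = matmul(matmul(R1, inv(S01)), R1)          # linear part of T
    trace = S0[0][0] + S0[1][1] + S1[0][0] + S1[1][1] - 2 * (S01[0][0] + S01[1][1])
    w2_sq = trace + sum((a - b) ** 2 for a, b in zip(m0, m1))
    return w2_sq, S


def transport(x, m0, m1, S):
    """T(x) = S (x - m0) + m1."""
    d = [x[0] - m0[0], x[1] - m0[1]]
    return [S[i][0] * d[0] + S[i][1] * d[1] + m1[i] for i in range(2)]


w2_sq, S = gaussian_ot([0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]],
                       [1.0, 1.0], [[9.0, 0.0], [0.0, 1.0]])
```

For diagonal covariances, the formula reduces to $\sum_k (\sigma_{0,k} - \sigma_{1,k})^2 + \|m_0 - m_1\|^2$ with $\sigma_{i,k}$ the standard deviations, which gives a quick sanity check (here $4 + 1 + 2 = 7$).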
Image Registration [ur Rehman et al., 2009]
[Figure: excerpt from the paper describing its numerical scheme. The linearized system is treated as an elliptic system of equations and solved with preconditioned conjugate gradient and an incomplete Cholesky preconditioner [Nocedal and Wright, 1999]; a coarse-grid correction, transferred to fine grids by trilinear interpolation, with four-color Gauss-Seidel relaxation, increases robustness and efficiency and is suited to a GPU implementation. Fig. 6 of the paper: OMT results viewed on an axial slice; the top row shows corresponding slices from pre-op (left) and post-op (right) MRI data, and the deformation is clearly visible in the anterior part of the brain.]
Wasserstein Geodesics
Geodesics between $(\mu_0,\mu_1)$: $t \in [0,1] \mapsto \mu_t$.
Variational characterization ($W_2$ is a geodesic distance): $\mu_t = \operatorname{argmin}_\mu\ (1-t)W_2(\mu_0,\mu)^2 + t\,W_2(\mu_1,\mu)^2$.
Optimal transport characterization: $\mu_t = ((1-t)\mathrm{Id} + tT)_\sharp\mu_0$, where $T$ is the optimal map from $\mu_0$ to $\mu_1$.
Gaussian case: $\mu_t = (T_t)_\sharp\mu_0 = \mathcal{N}(m_t, \Sigma_t)$ with $T_t = (1-t)\mathrm{Id} + tT$, $m_t = (1-t)m_0 + tm_1$ and $\Sigma_t = [(1-t)\mathrm{Id} + tS]\,\Sigma_0\,[(1-t)\mathrm{Id} + tS]$, where $S$ is the linear part of the Gaussian optimal map $T(x) = S(x - m_0) + m_1$. Hence the set of Gaussians is geodesically convex.

Texture Model Interpolation
Analysis: from an exemplar $f_0$, estimate a probability distribution $\mu = \mathcal{N}(m, \Sigma)$. Synthesis: draw outputs $f$ from $\mu$. [Figure: outputs $f$ interpolating between the models of two exemplars, from $f^{[0]}$ at $t = 0$ to $f^{[1]}$ at $t = 1$.]

Conclusion
Statistical modeling of images.
Applications of OT:
Metric between descriptors: use the distance $W_2(\mu_0,\mu_1)$.
Projection on statistical constraints: use the transport $T$.
Mixing statistical models.
[Figures: color transfer (J. Rabin, Wasserstein Regularization) and a segmentation example based on the distributions inside and outside a region.]
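For uniform point clouds, the displacement interpolation of the Wasserstein geodesics above amounts to moving each point along a straight line to its optimal partner: $X_i(t) = (1-t)X_i + tY_{\sigma^\star(i)}$. The plain-Python sketch below (hypothetical names, brute-force assignment, so only a few points) builds $\mu_t$ this way and checks the constant-speed property $W_2(\mu_0, \mu_t) = t\,W_2(\mu_0, \mu_1)$ of a geodesic.

```python
import math
from itertools import permutations


def optimal_sigma(X, Y):
    """Brute-force optimal assignment for the squared Euclidean cost."""
    n = len(X)
    return min(permutations(range(n)),
               key=lambda s: sum(math.dist(X[i], Y[s[i]]) ** 2 for i in range(n)))


def w2(X, Y):
    """W_2 between uniform point clouds of equal size."""
    s = optimal_sigma(X, Y)
    return math.sqrt(sum(math.dist(X[i], Y[s[i]]) ** 2 for i in range(len(X))) / len(X))


def displacement_interpolation(X, Y, t):
    """Support of mu_t = ((1 - t) Id + t T)_# mu_0, with T the optimal assignment."""
    s = optimal_sigma(X, Y)
    return [tuple((1 - t) * a + t * b for a, b in zip(X[i], Y[s[i]]))
            for i in range(len(X))]


X = [(0, 0), (1, 0), (0, 1)]
Y = [(3, 1), (4, 2), (3, 3)]
mid = displacement_interpolation(X, Y, 0.5)  # halfway point cloud mu_{1/2}
```

Because each point moves on a straight line at constant speed, the distance from $\mu_0$ grows linearly in $t$, which is what the checks below verify.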
