Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively...
Transcript of Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively...
![Page 1: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/1.jpg)
Massively scalable Sinkhorn distances via the Nyström method
J. Altschuler, F. Bach, J. Niles-Weed, A. Rudi
![Page 2: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/2.jpg)
Optimal Transport
Increasingly popular tool in machine learning
Kusner et al. (2015)
Day 18
Day 0low probability high probabilitytrajectory trajectory
Neural
Day 0MEF
Stromal
IPSC
Trophoblast
Schiebinger et al. (2018)
Feydy et al. (2017)
Gulrajani et al. (2017)
![Page 3: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/3.jpg)
Definitions
p, q ∈ Δn
pq
xixj
two probability distributions on points in .
nℝd
x1, …, xn ∈ ℝd
∥xi∥ ≤ R ∀i
Wasserstein distance between and p qW(p, q) := minP∈ℳ(p,q) ∑
i,j
Pij∥xi − xj∥2
set of couplings between and p qℳ(p, q) := P :P ∈ ℝn×n
+P1 = pP⊤1 = q
Slow to compute
![Page 4: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/4.jpg)
Sinkhorn distance
Wasserstein distance between and p qW(p, q) := minP∈ℳ(p,q) ∑
i,j
Pij∥xi − xj∥2
[Cuturi ’13] : Sinkhorn “distance”Wη(p, q) := minP∈ℳ(p,q) ∑
i,j
Pij∥xi − xj∥2 − η−1H(P)
H(P ) =X
ij
Pij log1
Pij
Slow to compute — timeO(n3)
Faster to compute — timeO(n2)
Can we do better?
![Page 5: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/5.jpg)
Our contribution
New algorithm (NYS-SINK) approximates Sinkhorn distance in time and spaceO(n)
Guarantees automatically adaptive to low-dimensional structure
Outperforms standard approaches by orders of magnitude
![Page 6: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/6.jpg)
Minimizer: unique matrix in of formℳp,q
Why Sinkhorn distance?
· ·
positive diagonal matrices
entrywise exponential
∈ ℳp,qe−η∥xi−xj∥2
![Page 7: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/7.jpg)
Sinkhorn scalingEasy iterative algorithm to find minimizer
1. Rescale rows to match 2. Rescale columns to match 3. Repeat
pq
Converges! [Sinkhorn ’67]
Goal: find (unique) Pη = D1e−η∥xi−xj∥2D2 ∈ ℳp,q
… in time! [Altschuler, Weed, Rigollet ’17]O(n2ϵ−2)
![Page 8: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/8.jpg)
Prior analysis [AWR ’17]
O(1)iterations
x O(n2)per iteration
(matrix-vector product)
= O(n2)time
· ·e−η∥xi−xj∥2
![Page 9: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/9.jpg)
Idea #1
O(1)iterations
x O(n)per iteration
(matrix-vector product)
= O(n)time
· · low rank approximation
(via Nyström method)
![Page 10: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/10.jpg)
Idea #1
O(1)iterations
x O(n)per iteration
(matrix-vector product)
= O(n)time
· ·
Rigorous analysis needs new stability guarantees for Sinkhorn scaling
low rank approximation
(via Nyström method)
![Page 11: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/11.jpg)
Idea #1
· · low rank approximation
(via Nyström method)
Works well, but error guarantee depends on ambient dimension (could be large)
![Page 12: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/12.jpg)
Idea #2
· · low rank approximation
(via Nyström method)
Works well, but error guarantee depends on ambient dimension (could be large)
Choose rank adaptively and pay only for intrinsic dimension
![Page 13: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/13.jpg)
Idea #2
· · low rank approximation
(via Nyström method)
Works well, but error guarantee depends on ambient dimension (could be large)
Choose rank adaptively and pay only for intrinsic dimension
Rigorous analysis needs new interpolation guarantees for Gaussian kernel
![Page 14: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/14.jpg)
Experiments
Comparison with existing approaches ( approximation rank, iterations)r : T :
![Page 15: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/15.jpg)
Experiments
comparable results
![Page 16: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/16.jpg)
Experiments
orders-of-magnitude speedup
![Page 17: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/17.jpg)
Experiments
Comparison with Sinkhorn scaling ( desired accuracy)ε :
![Page 18: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/18.jpg)
Experiments
linear vs. quadratic runtime
![Page 19: Massively scalable Sinkhorn distances via the Nyström method · 2019. 10. 25. · Massively scalable Sinkhorn distances via the Nyström method J. Altschuler, F. Bach, J. Niles-Weed,](https://reader035.fdocuments.us/reader035/viewer/2022071409/61029b6cd549296b210d31ae/html5/thumbnails/19.jpg)
Experimentsmuch larger problems solvable
in memory