Transfer Learning and Optimal Transport
Ievgen Redko
UMR CNRS 5516, Saint-Étienne
Ievgen Redko SciDoLySE 1 / 79
Credits and acknowledgments
Documents used for this talk:
D. Xu, K. Saenko, I. Tsang. Tutorial on Domain Transfer Learning for Vision Applications, CVPR'12.
S. Pan, Q. Yang and W. Fan. Tutorial: Transfer Learning with Applications, IJCAI'13.
S. Ben-David. Towards Theoretical Understanding of Domain Adaptation Learning, workshop LNIID at ECML'09.
F. Sha and B. Kingsbury. Domain Adaptation in Machine Learning and Speech Recognition, Tutorial - Interspeech 2012.
K. Grauman. Adaptation for Objects and Attributes, workshop VisDA at ICCV'13.
J. Blitzer and H. Daumé III. Domain Adaptation, Tutorial - ICML 2010.
Acknowledgments: A. Habrard, Rémi Flamary, Nicolas Courty, Devis Tuia, Tien Nam Li, Marc Sebban
Outline
Introduction
Optimal transport for domain adaptation
  - Problem formulation
  - Regularization framework for domain adaptation
  - Numerical experiments
Mapping estimation for discrete optimal transport
  - Problem formulation
  - Application to domain adaptation
  - Application to seamless copy in images
Optimal transport for target shift
  - Motivation
  - Proposed model
  - Experimental results
Optimal transport for joint distribution adaptation
Other contributions
  - Differentially private OT
Conclusions
Introduction
Artificial Intelligence
Ultimate goal: build systems that can learn by exploring the world.
- Unfortunately, this is not easy (and, for now, nearly impossible)
Goals in AI
Intermediate goal: build systems that can classify and recognize well.
Solution: use machine learning (ML) methods, which achieve near-human performance.
Issues of Traditional ML
Issues:
- Near-human performance is achieved using lots of labeled data
- Some tasks do not have that much labeled data (biology, physics, etc.)
- Some data/tasks evolve with time
- There exist too many tasks!
Solution: transfer learning
+ Use systems built for different but related applications
Transfer Learning
Definition [Pan, TL-IJCAI'13 tutorial]: Ability of a system to recognize and apply knowledge and skills learned in previous domains/tasks to novel domains/tasks.
Example
We have labeled images from a Web image corpus. Is there a Person in unlabeled images from a Video corpus?
[Figure: Web images labeled "Person" / "no Person" → unlabeled video frame: is there a Person?]
Settings
Supervised learning vs Transfer learning
Supervised learning: training and test data come from the same domain. Transfer learning: they come from different domains.
Domains are modeled as probability distributions over an instance space
Tasks are associated with a domain (classification, regression, clustering, ...)
Goal
Improve a target predictive function in the target domain using knowledge from the source domain
A Taxonomy of Transfer Learning
“A survey on Transfer Learning” [Pan and Yang, TKDE 2010]
In this tutorial
We focus on domain adaptation for classification
How can we learn, using labeled data from a source distribution, a low-error classifier for another related target distribution?
Why?
“Hot topic” - tutorials at ICML 2010, CVPR 2012, Interspeech 2012; workshops at ICCV 2013, NIPS 2013, ECML 2014
Many real-world motivating examples
A toy problem: Inter-twinning moons
[Figure: inter-twinning moons for target rotations of 10°, 20°, 30°, 40°, 50°, 70°]
Intuition and motivation: computer vision
“Can we train classifiers with Flickr photos, as they have already been collected and annotated, and hope the classifiers still work well on mobile camera images?” [Gong et al., CVPR'12]
“object classifiers optimized on benchmark dataset often exhibit significant degradation in recognition accuracy when evaluated on another one” [Gong et al., ICML'13; Torralba et al., CVPR'11; Perronnin et al., CVPR'10]
“Hot topic” - Visual domain adaptation [Tutorial CVPR’12, ICCV’13]
Problems with data representations
[Xu,Saenko,Tsang, Domain Transfer Tutorial - CVPR’12]
Hard to predict what will change in the new domain
[Xu,Saenko,Tsang, Domain Transfer Tutorial - CVPR’12]
Natural Language Processing
Part-of-Speech Tagging: adapt a tagger learned from medical papers to a journal.
Texts are represented by “words” (bag of words)
Spam detection
Adapt a classifier from a mailbox of an office worker to that of a hippie musician
Sentiment analysis
Adapt a classifier predicting the preferences for books to those of DVDs
Electronics to video games [Pan-IJCAI’13 tutorial]
| Electronics | Video games |
| --- | --- |
| (1) Compact; easy to operate; very good picture quality; looks sharp! | (2) A very good game! It is action-packed and full of excitement. I am very much hooked on this game. |
| (3) I purchased this unit from Circuit City and I was very excited about the quality of the picture. It is really nice and sharp. | (4) Very realistic shooting action and good plots. We played this and were hooked. |
| (5) It is also quite blurry in very dark settings. I will never_buy HP again. | (6) It is so boring. I am extremely unhappy and will probably never_buy UbiSoft again. |
Source-specific: compact, sharp, blurry.
Target-specific: hooked, realistic, boring.
Domain-independent: good, excited, nice, never_buy, unhappy.
Other applications
Speech recognition [Tutorial at Interspeech’12]
Medicine
Biology
Time series
Wifi localization
Why transfer learning?
Why transfer learning? Smart people talk
A bit of vocabulary
Unsupervised Transfer Learning
= No labels in source and target domains
Unsupervised DA
= Presence of source labels, no target labels
Semi-supervised DA
= Presence of source labels, few target labels, and a lot of unlabeled data
Semi-supervised learning
= No distribution shift, few labeled data and a lot of unlabeled data from the same domain
Several key questions
1. How to estimate the distribution shift?
2. What are the generalization guarantees?

   $R_{P_T}(h) \;\leq\; R_{P_S}(h) \;+\; ?$
3. When is adaptation possible?
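Question 2 is classically answered with a bound of the Ben-David et al. form, recalled here as background (this is the standard statement, not a result derived in this talk):

```latex
R_{P_T}(h) \;\leq\; R_{P_S}(h)
  \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(P_S, P_T)
  \;+\; \lambda^*,
\qquad
\lambda^* \;=\; \min_{h' \in \mathcal{H}} \big( R_{P_S}(h') + R_{P_T}(h') \big)
```

The target risk is thus controlled by the source risk, a divergence between the two domains, and the error of the best joint hypothesis; when $\lambda^*$ is large, no adaptation method can help, which is precisely question 3.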
4. How to design new algorithms?
3 main classes of algorithms
1. Instance-based methods
= Correct a sample bias by reweighting source labeled data: source instances close to target instances are more important
2. Feature-based methods
= Find a common space where source and target are close
3. Adjustment/Iterative methods
= Modify the model by incorporating pseudo-labeled information
Optimal transport for domain adaptation
The following slides are courtesy of R. Flamary(https://remi.flamary.com/biblio/presvannes2016.pdf)
Problem setup
[Figure: source domain (Amazon images, with labels) vs. target domain (DSLR images, no labels); after feature extraction, the decision function learned on the source is not working on the target]
Problems
Labels only in the source domain, and classification is in the target domain.
Classifier trained on the source data performs badly in the target domain
Optimal transport for domain adaptation
Assumptions: There exists a transport T between the source and the target domain.
The transport preserves the conditional distributions:
$P_s(y \mid x^s) = P_t(y \mid T(x^s))$.
3-step strategy:
1. Estimate the optimal transport between distributions.
2. Transport the training samples onto the target distribution.
3. Learn a classifier on the transported training samples.
[Figure: class 1 / class 2 dataset; classifier learned on the source samples, optimal transport of the samples, and classification on the transported samples]
Objective function
Optimization problem
$$\min_{\gamma \in \mathcal{P}} \; \langle \gamma, C \rangle_F + \lambda\,\Omega_s(\gamma) + \eta\,\Omega_c(\gamma),$$

where

$\Omega_s(\gamma)$ is the entropic regularization [Cuturi, 2013].

$\eta \geq 0$ and $\Omega_c(\cdot)$ is a DA regularization term.

Regularization helps avoid overfitting in high dimension and encodes additional information.
Entropic regularization
[Figure: optimal coupling matrix γ]

$$\Omega_s(\gamma) = \sum_{i,j} \gamma(i,j) \log \gamma(i,j)$$
Extremely efficient optimization scheme (Sinkhorn-Knopp).
The solution is no longer sparse.
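The Sinkhorn-Knopp scheme alternates two diagonal scalings of the Gibbs kernel $e^{-C/\lambda}$. A minimal NumPy sketch (the function name and the fixed iteration count are illustrative, not from the talk):

```python
import numpy as np

def sinkhorn(a, b, C, reg, n_iter=1000):
    """Entropic OT: min <gamma, C> + reg * sum(gamma * log(gamma)),
    solved by Sinkhorn-Knopp alternating diagonal scaling."""
    K = np.exp(-C / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # scale columns to match the target marginal
        u = a / (K @ v)               # scale rows to match the source marginal
    return u[:, None] * K * v[None, :]

# Uniform marginals over 5 source and 7 target points, random cost matrix
rng = np.random.default_rng(0)
gamma = sinkhorn(np.full(5, 0.2), np.full(7, 1 / 7), rng.random((5, 7)), reg=0.1)
```

Each iteration costs only two matrix-vector products, which is what makes the entropic solver so much cheaper than exact linear-programming OT.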
Class-based regularization [Courty et al., 2016]
[Figure: optimal coupling matrix γ]

$$\Omega_c(\gamma) = \sum_j \sum_c \big\| \gamma(\mathcal{I}_c, j) \big\|_q^p,$$

where $\mathcal{I}_c$ indexes the source samples of class $c$.
Groups the components of γ using the source labels.
Target samples receive mass only from “same class” source samples.
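As a sketch, this penalty can be evaluated directly from a coupling and the source labels; the function name and the defaults for p and q are illustrative (Courty et al. use a non-convex p < 1, which favors couplings whose columns draw mass from a single class):

```python
import numpy as np

def class_group_penalty(gamma, labels, p=0.5, q=1.0):
    """Omega_c(gamma) = sum_j sum_c || gamma(I_c, j) ||_q^p,
    where I_c are the rows (source samples) with label c."""
    total = 0.0
    for c in np.unique(labels):
        block = gamma[labels == c, :]                            # rows of class c
        norms = np.sum(np.abs(block) ** q, axis=0) ** (1.0 / q)  # ||.||_q per target j
        total += np.sum(norms ** p)
    return total

labels = np.array([0, 0, 1, 1])
# Each target column served by a single class vs. mass spread over both classes
conc = np.array([[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]])
spread = np.full((4, 2), 0.125)
```

With p < 1 the concentrated coupling is cheaper than the spread one, which is exactly the group-sparsity effect described above.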
Laplacian regularization for sample displacement
[Figure: similarity graph with $S^s_{i,j} > 0$; effect of small vs. large λ]

$$\Omega_c(\gamma) = \frac{1}{N_s^2} \sum_{i,j} S^s_{i,j} \,\big\| (\hat{x}^s_i - x^s_i) - (\hat{x}^s_j - x^s_j) \big\|^2,$$

where $\hat{x}^s_i$ is the transported position of source sample $x^s_i$.
Proposed in [Ferradans et al., 2013] for color transfer in images.
Similar samples, as defined by $S^s$, have similar displacements.
The similarity graph $S^s$ can be built using the source labels.
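A direct NumPy evaluation of this penalty, assuming the transported source positions are available (all names are illustrative):

```python
import numpy as np

def laplacian_penalty(S, X_src, X_mapped):
    """Omega_c = (1 / Ns^2) * sum_{i,j} S_ij * || d_i - d_j ||^2,
    where d_i = X_mapped[i] - X_src[i] is the displacement of sample i."""
    ns = X_src.shape[0]
    D = X_mapped - X_src                       # per-sample displacements
    diff = D[:, None, :] - D[None, :, :]       # pairwise displacement gaps
    return float((S * (diff ** 2).sum(axis=2)).sum()) / ns ** 2

rng = np.random.default_rng(0)
X = rng.random((4, 2))
S = np.ones((4, 4))                            # fully connected similarity graph
rigid = laplacian_penalty(S, X, X + np.array([1.0, -2.0]))   # identical shift
```

If every sample moves by the same vector the penalty is zero; it grows as the displacements of similar samples diverge, which is the smoothness the regularizer enforces.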
Optimization problem
$$\min_{\gamma \in \mathcal{P}} \; \langle \gamma, C \rangle_F + \lambda\,\Omega_s(\gamma) + \eta\,\Omega_c(\gamma)$$
Special cases
η = 0: Sinkhorn-Knopp [Cuturi, 2013].
λ = 0 and Laplacian regularization: large quadratic program solved with conditional gradient [Ferradans et al., 2013].
Non-convex group lasso $\ell_p$-$\ell_1$: majorization-minimization with Sinkhorn-Knopp [Courty et al., 2014].
General framework with convex regularization $\Omega_c(\gamma)$
Can we use the efficient Sinkhorn-Knopp scaling to solve the global problem?
Yes, using generalized conditional gradient [Bredies et al., 2009].
Linearization of the second regularization term but not of the entropic regularization.
Barycentric mapping
How to transport the samples using the obtained coupling matrix?
Use barycentric mapping
$$\hat{x}^s_i = \operatorname*{arg\,min}_{x} \; \sum_j \gamma^*(i,j)\, c(x, x^t_j)$$

If $c(x, x')$ is the squared Euclidean distance, then

$$\hat{X}_S = \mathrm{diag}(\gamma^* \mathbf{1})^{-1}\, \gamma^* X_T$$
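For the squared Euclidean cost the mapping is therefore just a γ-weighted average of the target samples, e.g. (a sketch with illustrative names):

```python
import numpy as np

def barycentric_map(gamma, X_t):
    """Map each source sample to the gamma-weighted mean of the
    target samples (barycentric projection, squared Euclidean cost)."""
    w = gamma.sum(axis=1, keepdims=True)   # row marginals of the coupling
    return (gamma @ X_t) / w

X_t = np.array([[0.0, 0.0], [2.0, 2.0]])
gamma = np.array([[0.5, 0.0],             # source 0 -> target 0 only
                  [0.25, 0.25]])          # source 1 -> halfway between both
mapped = barycentric_map(gamma, X_t)
```

The transported samples always lie in the convex hull of the target samples, which is why the mapped source cloud sits on top of the target cloud.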
Simulated problem with controllable complexity
[Figure: two moons for target rotations of 10°, 30°, 50°, 70°]
Two moons problem [Germain et al., 2013]
Two entangled moons with a rotation between domains.
The rotation angle allows controlling the adaptation difficulty.
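A possible generator for this benchmark (an illustrative reconstruction; the exact moon parameterization and noise level used in the original experiments may differ):

```python
import numpy as np

def two_moons(n, noise=0.05, angle_deg=0.0, seed=None):
    """Two entangled half-moons; setting angle_deg rotates the whole
    dataset, which is how the target domain is produced."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, np.pi, n)
    upper = np.c_[np.cos(t), np.sin(t)]               # class 0
    lower = np.c_[1.0 - np.cos(t), 0.5 - np.sin(t)]   # class 1
    X = np.vstack([upper, lower]) + rng.normal(0.0, noise, (2 * n, 2))
    y = np.r_[np.zeros(n), np.ones(n)]
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return X @ R.T, y

Xs, ys = two_moons(150, seed=0)                # source domain
Xt, yt = two_moons(150, angle_deg=40, seed=1)  # harder target: 40° rotation
```

Sweeping `angle_deg` from 10° to 90° reproduces an increasing-difficulty protocol of this kind.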
Results on the two moons dataset
| Method | 10° | 20° | 30° | 40° | 50° | 70° | 90° |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SVM (no adapt.) | 0 | 0.104 | 0.24 | 0.312 | 0.4 | 0.764 | 0.828 |
| DASVM | 0 | 0 | 0.259 | 0.284 | 0.334 | 0.747 | 0.820 |
| PBDA | 0 | 0.094 | 0.103 | 0.225 | 0.412 | 0.626 | 0.687 |
| OT-exact | 0 | 0.028 | 0.065 | 0.109 | 0.206 | 0.394 | 0.507 |
| OT-IT | 0 | 0.007 | 0.054 | 0.102 | 0.221 | 0.398 | 0.508 |
| OT-GL | 0 | 0 | 0 | 0.013 | 0.196 | 0.378 | 0.508 |
| OT-Lap | 0 | 0 | 0.004 | 0.062 | 0.201 | 0.402 | 0.524 |
Average prediction error for rotation angles from 10° to 90°.
Clear advantage of the optimal transport techniques.
Regularization helps (a lot) up to 40°.
Visual adaptation datasets
Digit recognition: MNIST vs USPS (10 classes, d=256, 2 domains).
Face recognition: PIE dataset (68 classes, d=1024, 4 domains).
Object recognition: Caltech-Office dataset (10 classes, d=800/4096, 4 domains).
Comparison on vision datasets
| Method | Digits ACC | Digits Nb best | Faces ACC | Faces Nb best | Objects ACC | Objects Nb best |
|---|---|---|---|---|---|---|
| 1NN | 48.66 | 0 | 26.22 | 0 | 28.47 | 0 |
| PCA | 42.94 | 0 | 34.55 | 0 | 37.98 | 0 |
| GFK | 52.56 | 0 | 26.15 | 0 | 39.21 | 0 |
| TSL | 47.22 | 0 | 36.10 | 0 | 42.97 | 1 |
| JDA | 57.30 | 0 | 56.69 | 7 | 44.34 | 1 |
| OT-exact | 49.96 | 0 | 50.47 | 0 | 36.69 | 0 |
| OT-IT | 59.20 | 0 | 54.89 | 0 | 42.30 | 0 |
| OT-Lap | 61.07 | 0 | 56.10 | 3 | 43.20 | 0 |
| OT-LpLq | 64.11 | 1 | 55.45 | 0 | 46.42 | 1 |
| OT-GL | 63.90 | 1 | 55.88 | 2 | 47.70 | 9 |
OT works very well on digit and object recognition.
Good but not best on face recognition (−0.5% w.r.t. JDA).
Ievgen Redko SciDoLySE 41 / 79
![Page 56: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/56.jpg)
Next step
Limits
Scales at least quadratically with the dataset size.
What about domains with different class proportions ? [Tuia et al., 2015]
Out-of-sample extension ?
Ievgen Redko SciDoLySE 42 / 79
![Page 57: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/57.jpg)
Mapping estimation for discreteoptimal transport
Ievgen Redko SciDoLySE 43 / 79
![Page 58: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/58.jpg)
Mapping estimation for discrete optimal transport
Why estimate the mapping ?
The out-of-sample problem.
Solving the optimization problem every time the dataset changes.
Transporting a very large number of samples.
Interpretability (depending on the mapping model).
How to estimate the mapping ?
Go back to the Monge formulation ? No !
Fit the barycentric mapping but also introduce smoothness.
Ievgen Redko SciDoLySE 44 / 79
![Page 64: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/64.jpg)
Mapping estimation
Problem formulation [Perrot et al., 2016]
argmin_{T ∈ H, γ ∈ P} f(γ, T) = λ_γ ⟨γ, C⟩_F + ‖T(X_s) − n_s γ X_t‖²_F + λ_T R(T)

(the three terms are, respectively, the OT loss, the mapping data-fitting term, and the mapping regularization), where

X_s and X_t are the source and target datasets,
T(·) is applied to each element (row) of the above matrices,
n_s γ X_t is the barycentric mapping of the source samples with uniform weights,
H is the space of transformations (more details later),
R(·) is a regularization term controlling the complexity of T.
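One alternating step of this objective can be sketched as follows (a numpy-only illustration with assumed names, not the reference implementation of [Perrot et al., 2016]: a basic Sinkhorn solver stands in for the γ-step, and only the linear family with R(T) = ‖L − I‖²_F is shown for the T-step):

```python
import numpy as np

def sinkhorn_plan(a, b, C, reg=0.1, n_iter=200):
    # Entropic OT plan between histograms a and b for cost matrix C.
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def fit_linear_map(Xs, Xt, gamma, lam=1e-2):
    # T-step: minimize ||Xs L - ns * gamma Xt||_F^2 + lam * ||L - I||_F^2,
    # i.e. map each source point onto its barycentric image under the plan.
    ns, d = Xs.shape
    target = ns * gamma @ Xt   # barycentric image of each source sample
    L = np.linalg.solve(Xs.T @ Xs + lam * np.eye(d),
                        Xs.T @ target + lam * np.eye(d))
    return L
```

The ridge term pulls L toward the identity, so with little data the estimated map stays close to "do nothing", which is the role of λ_T R(T) in the slide's objective.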
Ievgen Redko SciDoLySE 45 / 79
![Page 65: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/65.jpg)
Mapping family H
Linear transformations
H = { T : ∀x ∈ Ω, T(x) = xᵀL }.
L is a d × d real matrix.
R(T) = ‖L − I‖²_F, where I is the identity matrix.
The update is a classical linear least-squares regression.
Ievgen Redko SciDoLySE 46 / 79
![Page 66: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/66.jpg)
Mapping family H
Nonlinear transformations
H = { T : ∀x ∈ Ω, T(x) = k_{X_s}(xᵀ)L }, with k_{X_s}(xᵀ) = (k(x, x^s_1), k(x, x^s_2), …, k(x, x^s_{n_s})).
k(·, ·) is a positive definite kernel.
L is an n_s × d real matrix.
The update is a classical kernel least-squares regression.
Ievgen Redko SciDoLySE 46 / 79
![Page 67: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/67.jpg)
Illustrative example
[Figure: 2D dataset under a linear mapping (top) and a nonlinear mapping (bottom); panels show source and target samples, the barycentric displacement, the T displacement, and the out-of-sample T]
Clearly a nonlinear mapping.
The mapping model controls the barycentric mapping.
Ievgen Redko SciDoLySE 47 / 79
![Page 68: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/68.jpg)
Domain adaptation : Caltech-Office dataset
| Task | 1NN | GFK | SA | OT | L1L2 | OTE | OTLin T | OTLin γ | OTLinB T | OTLinB γ | OTKer T | OTKer γ | OTKerB T | OTKerB γ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D → W | 89.5 | 93.3 | 95.6 | 77.0 | 95.7 | 95.7 | 97.3 | 97.3 | 97.3 | 97.3 | 98.4 | 98.5 | 98.5 | 98.5 |
| D → A | 62.5 | 77.2 | 88.5 | 70.8 | 74.9 | 74.8 | 85.7 | 85.7 | 85.8 | 85.8 | 89.9 | 89.9 | 89.5 | 89.5 |
| D → C | 51.8 | 69.7 | 79.0 | 68.1 | 67.8 | 68.0 | 77.2 | 77.2 | 77.4 | 77.4 | 69.1 | 69.2 | 69.3 | 69.3 |
| W → D | 99.2 | 99.8 | 99.6 | 74.1 | 94.4 | 94.4 | 99.4 | 99.4 | 99.8 | 99.8 | 97.2 | 97.2 | 96.9 | 96.9 |
| W → A | 62.5 | 72.4 | 79.2 | 67.6 | 71.3 | 71.3 | 81.5 | 81.5 | 81.4 | 81.4 | 78.5 | 78.3 | 78.5 | 78.8 |
| W → C | 59.5 | 63.7 | 55.0 | 63.1 | 67.8 | 67.8 | 75.9 | 75.9 | 75.4 | 75.4 | 72.7 | 72.7 | 65.1 | 63.3 |
| A → D | 65.2 | 75.9 | 83.8 | 64.6 | 70.1 | 70.5 | 80.6 | 80.6 | 80.4 | 80.5 | 65.6 | 65.5 | 71.9 | 71.5 |
| A → W | 56.8 | 68.0 | 74.6 | 66.8 | 67.2 | 67.3 | 74.6 | 74.6 | 74.4 | 74.4 | 66.4 | 64.8 | 70.0 | 68.9 |
| A → C | 70.1 | 75.7 | 79.2 | 70.4 | 74.1 | 74.3 | 81.8 | 81.8 | 81.6 | 81.6 | 84.4 | 84.4 | 84.5 | 84.5 |
| C → D | 75.9 | 79.5 | 85.0 | 66.0 | 69.8 | 70.2 | 87.1 | 87.1 | 87.2 | 87.2 | 70.1 | 70.0 | 78.6 | 78.6 |
| C → W | 65.2 | 70.7 | 74.4 | 59.2 | 63.8 | 63.8 | 78.3 | 78.3 | 78.5 | 78.5 | 80.0 | 80.4 | 73.5 | 73.4 |
| C → A | 85.8 | 87.1 | 89.3 | 75.2 | 76.6 | 76.7 | 89.9 | 89.9 | 89.7 | 89.7 | 82.4 | 82.2 | 83.6 | 83.5 |
| Mean | 70.3 | 77.8 | 81.9 | 68.6 | 74.5 | 74.6 | 84.1 | 84.1 | 84.1 | 84.1 | 79.6 | 79.4 | 80.0 | 79.7 |
Clear advantage of the mapping estimation methods.
Ievgen Redko SciDoLySE 48 / 79
![Page 69: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/69.jpg)
Seamless copy with gradient adaptation
Poisson image editing with gradient adaptation
Adapt the gradients from the source to the target domain :
Δf = div T_{s→t}(v) over Ω, with f|_{∂Ω} = f_t|_{∂Ω}.    (1)

T_{s→t} : R⁶ → R⁶ is the mapping between the gradients of the source and target images in the domain.
Ievgen Redko SciDoLySE 49 / 79
![Page 74: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/74.jpg)
Optimal transport for target shift
Ievgen Redko SciDoLySE 50 / 79
![Page 75: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/75.jpg)
Different reasons to adapt
Covariate shift
adapt when only the marginal distributions of inputs change
+ most popular scenario
– conditional distributions may differ in practice
[Figure: source vs. target input distributions]
Ievgen Redko SciDoLySE 51 / 79
![Page 77: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/77.jpg)
Covariate shift and optimal transport (Courty et al., 2014)
Ievgen Redko SciDoLySE 52 / 79
![Page 78: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/78.jpg)
Different reasons to adapt
Target shift
adapt when only the marginal distributions of outputs change
+ occurs in many real-world applications (imbalanced data)
+ few contributions in the literature
– assumes the same distribution of inputs
[Figure: source vs. target label distributions]
Ievgen Redko SciDoLySE 53 / 79
![Page 80: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/80.jpg)
Different reasons to adapt
Sample-selection bias, source component shift, domain shift etc.
rarely studied (and very difficult) cases
[Figure: source vs. target distributions under sample-selection bias]
Ievgen Redko SciDoLySE 54 / 79
![Page 81: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/81.jpg)
Wish list
General multi-source setting (more than 2 source domains)
Solving covariate shift between each source-target pair
Tackling target shift
Ievgen Redko SciDoLySE 55 / 79
![Page 82: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/82.jpg)
First insight
How to tackle target shift ? Take 1
Assume that the target and source distributions are defined as

P_T = Σ_{i=1}^{C} π_i^T P_i,    P_S^π = Σ_i π_i P_i,

with P_i being the distribution of class i ∈ {1, …, C}. We want to solve Problem 1 :

π⋆ = argmin_{π ∈ Δ_C} W(P_S^π, P_T)
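A minimal sketch of Problem 1 for two classes (assumed names, not the paper's algorithm: a grid search over the simplex with an entropic Sinkhorn approximation of W in place of a proper solver):

```python
import numpy as np

def sinkhorn_cost(a, b, C, reg=0.1, n_iter=300):
    # Entropic approximation of the OT cost <gamma, C> between histograms a, b.
    C = C / C.max()                       # normalize cost for numerical stability
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    G = u[:, None] * K * v[None, :]
    return float((G * C).sum())

def estimate_proportions(Xs, ys, Xt, grid=None):
    # Binary-class sketch: search pi on the simplex minimizing W(P_S^pi, P_T),
    # reweighting source samples class-wise so class 1 carries total mass pi.
    grid = np.linspace(0.05, 0.95, 19) if grid is None else grid
    C = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    b = np.full(len(Xt), 1.0 / len(Xt))
    costs = []
    for pi in grid:
        a = np.where(ys == 1, pi / (ys == 1).sum(), (1 - pi) / (ys == 0).sum())
        costs.append(sinkhorn_cost(a, b, C))
    return grid[int(np.argmin(costs))]
```

When the class-conditional distributions are well separated, the transport cost is dominated by the mass forced to cross between classes, so the minimizer sits at the target proportions, which is exactly what the uniqueness result on the next slide formalizes.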
Ievgen Redko SciDoLySE 56 / 79
![Page 84: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/84.jpg)
First insight
How to tackle target shift ? Take 1
We can prove the following :
Assume that ∀i, there is no α ∈ Δ_C with α_i = 0 such that P_i = Σ_j α_j P_j. Then, for any distribution P_T, the unique solution π∗ to Problem 1 is given by π^T.
Wasserstein distance is a good candidate for proportion estimation !
Ievgen Redko SciDoLySE 57 / 79
![Page 86: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/86.jpg)
Second insight
How to tackle target shift ? Take 2
Consider a multi-source setting with a weighted source distribution
We can show that, in the case of binary classification,

(error in target domain) ≤ (overall source error) + (distance between classes) × (distance between source and target class proportions).

More formally, for class distributions P_0 and P_1,

(distance between classes) × (distance between source and target class proportions) = dist(P_0, P_1) · |π_T − Σ_{j=1}^{N} α_j π_S^j|.
Reweight source samples to match target proportions !
Ievgen Redko SciDoLySE 58 / 79
![Page 90: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/90.jpg)
Some notations
Data matrices X^(k) ∼ μ^(k) from K source domains.
Data matrix X ∼ μ from the target domain.
Source class proportions h_c^(k) = ∫ μ_c^(k), with μ^(k) = Σ_{c=1}^{C} μ_c^(k).
Linear operators D_1^(k) and D_2^(k) that transform m^(k) to h_c^(k) and back.
Ievgen Redko SciDoLySE 59 / 79
![Page 91: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/91.jpg)
Objective function
Putting it all together
h⋆ = argmin_{h ∈ Δ_C, Γ} Σ_{k=1}^{K} λ_k KL(γ^(k) | ζ^(k))

s.t. ∀k, D_1^(k) γ^(k) 1_n = h.
A constrained Wasserstein barycenter problem
h⋆ reweights source instances to match source and target distributions
Efficient optimization with Bregman projections
Ievgen Redko SciDoLySE 60 / 79
![Page 92: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/92.jpg)
Classification in the target domain
How would we obtain target labels ?
Before : barycentric mapping
+ accurate alignment of samples
– computationally costly : aligning + learning a classifier

Alternative : label propagation

L = Σ_{k=1}^{K} λ_k D_1^(k) γ^(k)

+ directly obtains target labels
+ majority vote by source domains = can be seen as boosting
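The idea can be sketched for a single source domain as follows (assumed names; the multi-source L above simply sums such terms over k with weights λ_k, and the operators D_1^(k) are omitted here):

```python
import numpy as np

def propagate_labels(gamma, ys, n_classes):
    # Push one-hot source labels through the transport plan: each target
    # sample receives the class mass transported onto it, then takes argmax.
    Ys = np.eye(n_classes)[ys]      # (ns x C) one-hot source labels
    scores = gamma.T @ Ys           # (nt x C) transported class mass
    return scores.argmax(axis=1)
```

No classifier is trained in the target domain: the plan itself carries the labels across, which is why this route is cheaper than the barycentric-mapping-plus-classifier pipeline.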
Ievgen Redko SciDoLySE 61 / 79
![Page 93: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/93.jpg)
Illustration
Covariate shift DA mixes instances from different classes !
Ievgen Redko SciDoLySE 62 / 79
![Page 94: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/94.jpg)
Illustration
Proposed method handles target shift efficiently !
Ievgen Redko SciDoLySE 63 / 79
![Page 95: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/95.jpg)
Real-world data
Zurich Summer’ data set composed of 20 satellite images
4 classes : Roads, Buildings, Trees and Grass
17 source and 1 target domain
Average class proportions [0.25± 0.07, 0.4± 0.13, 0.22± 0.11, 0.13± 0.11]
Ievgen Redko SciDoLySE 64 / 79
![Page 96: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/96.jpg)
Examples of images
Input satellite images
Ievgen Redko SciDoLySE 65 / 79
![Page 97: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/97.jpg)
Examples of images
Input satellite images
Satellite images with 4 classes
Ievgen Redko SciDoLySE 65 / 79
![Page 98: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/98.jpg)
Classification results
| # of source domains | Average class proportions | # of source instances | No adaptation | OTDA-PT | OTDA-LP | MDA-Causal | JCPOT-LP | Target only |
|---|---|---|---|---|---|---|---|---|
| 2 | [0.17, 0.4, 0.16, 0.27] | 2,936 | 0.61 | 0.52 | 0.57 | 0.65 | 0.66 | 0.65 |
| 5 | [0.22, 0.39, 0.18, 0.21] | 6,716 | 0.62 | 0.55 | 0.6 | 0.66 | 0.68 | 0.64 |
| 8 | [0.25, 0.46, 0.17, 0.12] | 16,448 | 0.63 | 0.54 | 0.59 | 0.67 | 0.71 | 0.65 |
| 11 | [0.26, 0.48, 0.16, 0.1] | 21,223 | 0.63 | 0.54 | 0.58 | 0.67 | 0.72 | 0.673 |
| 14 | [0.26, 0.45, 0.19, 0.1] | 27,875 | 0.63 | 0.52 | 0.58 | 0.67 | 0.72 | 0.65 |
| 17 | [0.25, 0.42, 0.20, 0.13] | 32,660 | 0.63 | 0.5 | 0.59 | 0.67 | 0.73 | 0.61 |
Ievgen Redko SciDoLySE 66 / 79
![Page 99: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/99.jpg)
Optimal transport for jointdistribution adaptation
The following slides are courtesy of R. Flamary (OTML workshop, NIPS'17) :
http://otml17.marcocuturi.net/wp-content/uploads/2018/01/OTML_NIPS_2017.pdf
Ievgen Redko SciDoLySE 67 / 79
![Page 100: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/100.jpg)
Joint distribution and classifier estimation
Objectives of JDOT
• Model the transformation of labels (allowing changes of proportion/value).
• Learn an optimal target predictor with no labels on the target samples.
• The approach is theoretically justified.
Joint distributions and dataset
• We work with the joint feature/label distributions.
• Let Ω ∈ R^d be a compact measurable input space of dimension d and C the set of labels.
• Let P_s(X, Y) ∈ P(Ω × C) and P_t(X, Y) ∈ P(Ω × C) be the source and target joint distributions.
• We have access to an empirical sampling P̂_s = (1/N_s) Σ_{i=1}^{N_s} δ_{x_i^s, y_i^s} of the source distribution, defined by X_s = {x_i^s}_{i=1}^{N_s} and label information Y_s = {y_i^s}_{i=1}^{N_s}.
• But the target domain is defined only by an empirical distribution in the feature space, with samples X_t = {x_i^t}_{i=1}^{N_t}.
14 / 29
![Page 101: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/101.jpg)
Joint distribution OT (JDOT)
Proxy joint distribution
• Let f : Ω → C be a function from a given hypothesis class H.
• We define the following joint distribution that uses f as a proxy for y :

P_t^f = (x, f(x))_{x ∼ μ_t}    (5)

and its empirical counterpart P̂_t^f = (1/N_t) Σ_{i=1}^{N_t} δ_{x_i^t, f(x_i^t)}.

Learning with JDOT
We propose to learn the predictor f that minimizes :

min_f W_1(P̂_s, P̂_t^f) = inf_{γ ∈ Δ} Σ_{ij} D(x_i^s, y_i^s ; x_j^t, f(x_j^t)) γ_ij    (6)

• Δ is the transport polytope.
• D(x_i^s, y_i^s ; x_j^t, f(x_j^t)) = α‖x_i^s − x_j^t‖² + L(y_i^s, f(x_j^t)) with α > 0.
• We search for the predictor f that best aligns the joint distributions.
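The ground cost D in (6) can be sketched as follows (a numpy illustration with assumed names, for the regression case where the squared loss plays the role of L):

```python
import numpy as np

def jdot_cost(Xs, ys, Xt, f, alpha=1.0):
    # Pairwise JDOT ground cost D(x_i^s, y_i^s; x_j^t, f(x_j^t)):
    # alpha-weighted squared feature distance plus the loss between the
    # source label and the current prediction on the target sample.
    feat = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)    # ||xs_i - xt_j||^2
    loss = (ys[:, None] - f(Xt)[None, :]) ** 2                 # L(ys_i, f(xt_j))
    return alpha * feat + loss
```

The resulting N_s × N_t matrix is what the OT solver receives: α trades off matching in feature space against agreement between source labels and target predictions.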
15 / 29
![Page 102: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/102.jpg)
Optimization problem
min_{f ∈ H, γ ∈ Δ} Σ_{i,j} γ_{i,j} (α d(x_i^s, x_j^t) + L(y_i^s, f(x_j^t))) + λΩ(f)    (7)

Optimization procedure
• Ω(f) is a regularizer for the predictor f.
• We propose to use block coordinate descent (BCD) / Gauss-Seidel.
• Provably converges to a stationary point of the problem.

γ update for a fixed f
• Classical OT problem.
• Solved by the network simplex.
• Regularized OT can be used (add a term to problem (7)).

f update for a fixed γ

min_{f ∈ H} Σ_{i,j} γ_{i,j} L(y_i^s, f(x_j^t)) + λΩ(f)    (8)

• Weighted loss from all source labels.
• γ performs label propagation.
19 / 29
![Page 103: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/103.jpg)
Regression with JDOT
[Figure: toy regression. Panels: distributions, source/target models with samples, joint OT matrices (JDOT vs. OT links), and the model estimated with JDOT]
Least-squares regression with quadratic regularization
For a fixed γ the optimization problem is equivalent to

min_{f ∈ H} Σ_j (1/n_t) ‖ŷ_j − f(x_j^t)‖² + λ‖f‖²    (9)

• ŷ_j = n_t Σ_i γ_{i,j} y_i^s is a weighted average of the source target values.
• Note that this problem is linear instead of quadratic.
• Any solver can be used (linear, kernel ridge, neural network).
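For the linear case, one f-step of the BCD can be sketched as follows (assumed names; plain ridge regression stands in for whichever solver is plugged in):

```python
import numpy as np

def jdot_f_update(gamma, ys, Xt, lam=1e-3):
    # f-step of JDOT's BCD for the squared loss: build the propagated labels
    # y_hat_j = n_t * sum_i gamma_ij * y_i^s, then solve a ridge regression
    # of y_hat on the target features.
    nt, d = Xt.shape
    y_hat = nt * (gamma.T @ ys)
    w = np.linalg.solve(Xt.T @ Xt + lam * np.eye(d), Xt.T @ y_hat)
    return w, y_hat
```

Because the plan γ fixes ŷ, this step is an ordinary weighted regression on the target samples, which is exactly why problem (9) is easy once γ is frozen.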
20 / 29
![Page 104: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/104.jpg)
Classification with JDOT
(Figure: "Accuracy along BCD iterations" for α ∈ {0.1, 0.5, 1, 10, 50, 100}.)
Multiclass classification with hinge loss

For a fixed γ the optimization problem is equivalent to

min_{f_k∈H} Σ_{j,k} P_{j,k} L(1, f_k(x_j^t)) + (1 − P_{j,k}) L(−1, f_k(x_j^t)) + λ Σ_k ‖f_k‖²    (10)

• P is the class proportion matrix P = n_t γ^⊤ P^s.
• P^s and Y^s are defined from the source data with a one-vs-all strategy as

Y^s_{i,k} = 1 if y_i^s = k, −1 otherwise;    P^s_{i,k} = 1 if y_i^s = k, 0 otherwise,

with k ∈ {1, …, K} and K the number of classes.
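The one-vs-all matrices and the propagated class proportions can be built as follows (a small sketch of the construction above; labels are assumed to be encoded as 0, …, K−1 rather than 1, …, K):

```python
import numpy as np

def one_vs_all(ys, K):
    """Build P^s (one-hot) and Y^s (+1/-1 targets) from source labels."""
    Ps = np.zeros((len(ys), K))
    Ps[np.arange(len(ys)), ys] = 1.0
    Ys = 2.0 * Ps - 1.0
    return Ps, Ys

def class_proportions(G, Ps):
    """P = n_t * gamma^T P^s: row j gives class proportions for x_j^t."""
    nt = G.shape[1]
    return nt * G.T @ Ps
```

When the columns of γ sum to 1/n_t, each row of P sums to 1 and can be read as a soft label for the target point.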
![Page 105: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/105.jpg)
Caltech-Office classification dataset
| Domains | Base | SurK | SA | OT-IT | OT-MM | JDOT |
|---|---|---|---|---|---|---|
| caltech→amazon | 92.07 | 91.65 | 90.50 | 89.98 | 92.59 | 91.54 |
| caltech→webcam | 76.27 | 77.97 | 81.02 | 80.34 | 78.98 | 88.81 |
| caltech→dslr | 84.08 | 82.80 | 85.99 | 78.34 | 76.43 | 89.81 |
| amazon→caltech | 84.77 | 84.95 | 85.13 | 85.93 | 87.36 | 85.22 |
| amazon→webcam | 79.32 | 81.36 | 85.42 | 74.24 | 85.08 | 84.75 |
| amazon→dslr | 86.62 | 87.26 | 89.17 | 77.71 | 79.62 | 87.90 |
| webcam→caltech | 71.77 | 71.86 | 75.78 | 84.06 | 82.99 | 82.64 |
| webcam→amazon | 79.44 | 78.18 | 81.42 | 89.56 | 90.50 | 90.71 |
| webcam→dslr | 96.18 | 95.54 | 94.90 | 99.36 | 99.36 | 98.09 |
| dslr→caltech | 77.03 | 76.94 | 81.75 | 85.57 | 83.35 | 84.33 |
| dslr→amazon | 83.19 | 82.15 | 83.19 | 90.50 | 90.50 | 88.10 |
| dslr→webcam | 96.27 | 92.88 | 88.47 | 96.61 | 96.61 | 96.61 |
| Mean | 83.92 | 83.63 | 85.23 | 86.02 | 86.95 | 89.04 |
| Avg. rank | 4.50 | 4.75 | 3.58 | 3.00 | 2.42 | 2.25 |
• Classical dataset [Saenko et al., 2010] dedicated to visual adaptation.
• Feature extraction by a convolutional neural network [Donahue et al., 2014].
• Comparison with Surrogate Kernel [Zhang et al., 2013], Subspace Alignment [Fernando et al., 2013] and OT Domain Adaptation [Courty et al., 2016b].
• Parameters selected via reverse cross-validation [Zhong et al., 2010].
• SVM (Hinge loss) classifiers with a linear kernel.
• Best-ranked method, with a 2% average accuracy gain.
![Page 106: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/106.jpg)
Amazon Review Classification dataset
| Domains | NN | DANN | JDOT (mse) | JDOT (Hinge) |
|---|---|---|---|---|
| books→dvd | 0.805 | 0.806 | 0.794 | 0.795 |
| books→kitchen | 0.768 | 0.767 | 0.791 | 0.794 |
| books→electronics | 0.746 | 0.747 | 0.778 | 0.781 |
| dvd→books | 0.725 | 0.747 | 0.761 | 0.763 |
| dvd→kitchen | 0.760 | 0.765 | 0.811 | 0.821 |
| dvd→electronics | 0.732 | 0.738 | 0.778 | 0.788 |
| kitchen→books | 0.704 | 0.718 | 0.732 | 0.728 |
| kitchen→dvd | 0.723 | 0.730 | 0.764 | 0.765 |
| kitchen→electronics | 0.847 | 0.846 | 0.844 | 0.845 |
| electronics→books | 0.713 | 0.718 | 0.740 | 0.749 |
| electronics→dvd | 0.726 | 0.726 | 0.738 | 0.737 |
| electronics→kitchen | 0.855 | 0.850 | 0.868 | 0.872 |
| Mean | 0.759 | 0.763 | 0.783 | 0.787 |
• The dataset aims at predicting review sentiment across domains [Blitzer et al., 2006].
• Comparison with the Domain Adversarial Neural Network [Ganin et al., 2016a].
• The classifier f is a neural network with the same architecture as DANN.
• JDOT reaches better accuracy; the classification (Hinge) loss outperforms the mean squared error.
![Page 107: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/107.jpg)
Wifi localization regression dataset
| Domains | KRR | SurK | DIP | DIP-CC | GeTarS | CTC | CTC-TIP | JDOT |
|---|---|---|---|---|---|---|---|---|
| t1→t2 | 80.84±1.14 | 90.36±1.22 | 87.98±2.33 | 91.30±3.24 | 86.76±1.91 | 89.36±1.78 | 89.22±1.66 | 93.03±1.24 |
| t1→t3 | 76.44±2.66 | 94.97±1.29 | 84.20±4.29 | 84.32±4.57 | 90.62±2.25 | 94.80±0.87 | 92.60±4.50 | 90.06±2.01 |
| t2→t3 | 67.12±1.28 | 85.83±1.31 | 80.58±2.10 | 81.22±4.31 | 82.68±3.71 | 87.92±1.87 | 89.52±1.14 | 86.76±1.72 |
| hallway1 | 60.02±2.60 | 76.36±2.44 | 77.48±2.68 | 76.24±5.14 | 84.38±1.98 | 86.98±2.02 | 86.78±2.31 | 98.83±0.58 |
| hallway2 | 49.38±2.30 | 64.69±0.77 | 78.54±1.66 | 77.8±2.70 | 77.38±2.09 | 87.74±1.89 | 87.94±2.07 | 98.45±0.67 |
| hallway3 | 48.42±1.32 | 65.73±1.57 | 75.10±3.39 | 73.40±4.06 | 80.64±1.76 | 82.02±2.34 | 81.72±2.25 | 99.27±0.41 |
• The objective is to predict the position of a device on a discretized grid [Zhang et al., 2013].
• Same experimental protocol as [Zhang et al., 2013, Gong et al., 2016].
• Comparison with domain-invariant projection and its cluster-regularized version ([Baktashmotlagh et al., 2013], DIP and DIP-CC), generalized target shift ([Zhang et al., 2015], GeTarS), and conditional transferable components with its target-information-preservation regularization ([Gong et al., 2016], CTC and CTC-TIP).
• JDOT solves the adaptation problem for transfer across devices (10% accuracy gain on Hallway).
![Page 108: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/108.jpg)
Large scale JDOT Strategy

Large scale JDOT
• JDOT does not scale well to large datasets / deep learning.
• Use minibatches to compute the transport in the primal [Genevay et al., 2017].
• Evaluate batch-local couplings on (sufficiently large) pairs of random batches, sampled without replacement, from the source and target domains.
• Update f from these couplings.

Algorithm : Deep JDOT
Input: Source data Xs, ys; Target data Xt
for BCD iterations do
    for each source/target minibatch do
        Solve OT with the JDOT loss
        Perform label propagation on the minibatch
    end for
    Update the model f for one epoch
end for
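One BCD sweep of this minibatch strategy can be sketched in NumPy. This is an illustrative stand-in, not the DeepJDOT implementation: a real DeepJDOT works on deep features and updates f by SGD, while here the batch-local coupling is computed with a small Sinkhorn solver and only the propagated labels are returned.

```python
import numpy as np

def sinkhorn(a, b, M, reg=0.05, n_iter=300):
    """Entropic regularized OT coupling on a minibatch cost matrix."""
    K = np.exp(-M / (reg * M.max()))
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def minibatch_sweep(Xs, ys, Xt, yt_pred, alpha=1.0, batch=16, seed=0):
    """Couple random source/target batches (sampled without replacement)
    with the JDOT cost and propagate source labels batch by batch."""
    rng = np.random.RandomState(seed)
    si, ti = rng.permutation(len(Xs)), rng.permutation(len(Xt))
    y_prop = np.array(yt_pred, dtype=float)
    marg = np.full(batch, 1.0 / batch)
    for k in range(min(len(si), len(ti)) // batch):
        sb, tb = si[k*batch:(k+1)*batch], ti[k*batch:(k+1)*batch]
        d = ((Xs[sb][:, None, :] - Xt[tb][None, :, :]) ** 2).sum(-1)
        L = (ys[sb][:, None] - yt_pred[tb][None, :]) ** 2
        G = sinkhorn(marg, marg, alpha * d + L)   # batch-local coupling
        y_prop[tb] = batch * G.T @ ys[sb]         # label propagation on batch
    return y_prop   # f is then refit on (Xt, y_prop) for one epoch
```

The model update on (Xt, y_prop) then plays the role of the "Update the model f for one epoch" step.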
![Page 109: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/109.jpg)
Large scale datasets
| Description | MNIST→USPS | USPS→MNIST | SVHN→MNIST | MNIST→MNIST-M |
|---|---|---|---|---|
| Source samples | 60000 | 9298 | 73257 | 60000 |
| Target samples | 9298 | 60000 | 60000 | 60000 |
| height/width | 16×16 | 16×16 | 32×32×3 | 28×28×3 |

• Four cross-domain digit datasets: MNIST, USPS, SVHN, MNIST-M.
• We consider a deep convolutional architecture.
• Dropout is used on the dense layers during training.
• The transport cost is computed in the raw image space.
![Page 110: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/110.jpg)
Experimental Results for large scale JDOT
| Methods | MNIST→USPS | USPS→MNIST | SVHN→MNIST | MNIST→MNIST-M |
|---|---|---|---|---|
| Source only (SO) | 86.18 | 58.73 | 53.15 | 59.52 |
| DeepCoral [Sun and Saenko, 2016] | 88.43 (22.0) | 85.02 (64.6) | 69.61 (35.6) | 62.18 (0.07) |
| MMD [Long and Wang, 2015] | 89.89 (36.3) | 79.19 (50.3) | 53.27 (0.01) | 52.53 (-19.1) |
| DANN [Ganin et al., 2016b] | 89.06 (28.2) | 87.03 (70.0) | 73.85∗ (44.7) | 76.63 (46.6) |
| ADDA [Tzeng et al., 2017] | 91.22 (49.3) | 79.98 (52.2) | 76.0∗ (49.4) | 79.16 (53.5) |
| DeepJDOT | 91.50 (52.01) | 91.21 (79.82) | 83.62 (65.85) | 67.84 (22.67) |
| Train on Target (TO) | 96.41 | 99.42 | 99.42 | 96.21 |
• Accuracy (in %) of the DA methods.
• The values in parentheses give the coverage gap (DA − SO)/(TO − SO) between SO (source only) and TO (the "golden" performance when the model is learnt on labelled target data).
• DeepJDOT is the best in 3 out of 4 DA problems.
• Plots show test performance along the BCD iterations.
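The coverage-gap values in parentheses can be recomputed directly from the table (small rounding differences with the slide are expected):

```python
def coverage(da, so, to):
    """Fraction (in %) of the SO -> TO gap closed by a DA method."""
    return 100.0 * (da - so) / (to - so)

# e.g. DeepJDOT on MNIST -> USPS: SO = 86.18, TO = 96.41, DA = 91.50
gap = coverage(91.50, 86.18, 96.41)   # close to the 52.01 reported in the table
```

A negative value (as for MMD on MNIST→MNIST-M) means the method performs worse than training on the source alone.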
![Page 111: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/111.jpg)
Other contributions :
theoretical guarantees for DA with OT
![Page 113: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/113.jpg)
Why does domain adaptation work ?

Intuition : if two domains are similar, then adaptation should be easy ?

Answer : yes, due to the following result [Ben David et al., 2007] :

error in the target domain (what we want to learn)
≤ error in the source domain (which we can learn well using standard ML)
+ how different the two domains are (a divergence between the two domains)
+ is it possible to adapt at all ? (a non-estimable term)

Can the Wasserstein distance be introduced into these bounds ?
![Page 114: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/114.jpg)
Several learning bounds

With the Wasserstein distance as a divergence term :

between the marginal distributions μS and μT [Redko et al., 2017, Shen et al., 2018] :

R_T(h) ≤ R_S(h) + W(μ_S, μ_T) + O(1/√n) + non-estimable term

between the joint distributions PS and PT [Courty et al., 2017] :

R_T(h) ≤ R_S(h) + W(P_S, P_T) + O(1/√n) + non-estimable term + probabilistic transfer Lipschitzness term
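The divergence term W(μ_S, μ_T) in the first bound can be estimated from unlabeled samples. In 1D the empirical 1-Wasserstein distance between equal-size samples reduces to matching sorted samples (a standard fact, shown here as an illustrative sketch):

```python
import numpy as np

def wasserstein_1d(xs, xt):
    """Empirical W1 between two equal-size 1D samples: in 1D the optimal
    coupling is monotone, so it suffices to match sorted samples."""
    assert len(xs) == len(xt)
    return float(np.mean(np.abs(np.sort(xs) - np.sort(xt))))
```

If this empirical divergence between source and target samples is small, the bound suggests that a low source risk transfers to the target.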
![Page 116: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/116.jpg)
Other contributions :
differentially private OT
![Page 118: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/118.jpg)
Privacy in Optimal Transport and Domain Adaptation
Challenge : performing optimal transport for domain adaptation

One needs access to both the source and the target data.

At least one party (source or target) has to reveal its data.

This raises privacy concerns.

How can we transfer knowledge with OT while protecting the privacy of the users ?
![Page 121: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/121.jpg)
Differential Privacy

ε-differential privacy [Dwork et al., 2006]
A randomized mechanism M : X^n → R^d is ε-differentially private (generalized later on to (ε, δ)-differential privacy) if, for any two datasets X, X′ ∈ X^n differing in a single element, and for any output t of M :

P(M(X) = t) ≤ e^ε P(M(X′) = t).

• Two close datasets correspond to close output distributions.
• M(X) does not leak much information about any individual point of X.
• Example : a Johnson-Lindenstrauss transform with some added noise.
Idea : Use Johnson-Lindenstrauss transform to obtain a new DP cost matrix !
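The defining inequality can be made concrete with the textbook randomized-response mechanism on a single bit (not from the slides; an assumed illustrative example): reporting the true bit with probability e^ε/(1+e^ε) is ε-differentially private, and the worst-case likelihood ratio between the outputs on two neighbouring datasets is exactly e^ε.

```python
import math
import numpy as np

def randomized_response(bit, eps, rng):
    """Report the true bit with prob e^eps / (1 + e^eps), else flip it."""
    keep = rng.rand() < math.exp(eps) / (1.0 + math.exp(eps))
    return bit if keep else 1 - bit

def worst_case_ratio(eps):
    """max_t P(M(X) = t) / P(M(X') = t) for neighbouring single-bit datasets."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    return p_keep / (1.0 - p_keep)
```

The same ε-DP accounting underlies the noisy Johnson-Lindenstrauss idea used next for the cost matrix.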
![Page 122: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/122.jpg)
DPOT Algorithm
Input : Xs, Xt, and σ, ℓ > 0 (σ and ℓ known to both parties)

Step 1. Source generates a N(0, 1/ℓ)^{k×ℓ} matrix M (known to both parties) and a N(0, σ)^{k×ℓ} noise matrix ∆ (known only to the Source).
Step 2. Source sends M and X̃s + ∆, where X̃s = Xs M.
Step 3. Target computes C = c(X̃s + ∆, X̃t) − ℓσ², where X̃t = Xt M (ℓσ² is subtracted from each entry to cancel the bias caused by ∆).
Step 4. Solve OT with cost matrix C and return P and W (Xs, Xt).
(Diagram : the Source projects Xs with M into X̃s and adds the noise ∆ ; the Target projects Xt with M into X̃t ; the coupling matrix is computed from the noisy projected data.)
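Steps 1–3 can be sketched in NumPy. This is an illustrative reading of the protocol with two assumptions on the slide's notation: the noise ∆ has the shape of the projected source data, and σ denotes its standard deviation.

```python
import numpy as np

def dpot_cost(Xs, Xt, ell, sigma, seed=0):
    """Debiased squared-distance cost matrix from JL-projected, noised data."""
    rng = np.random.RandomState(seed)
    k = Xs.shape[1]
    M = rng.normal(0.0, np.sqrt(1.0 / ell), size=(k, ell))  # shared JL matrix
    Delta = rng.normal(0.0, sigma, size=(len(Xs), ell))     # Source-only noise
    Xs_proj = Xs @ M + Delta       # what the Source actually reveals
    Xt_proj = Xt @ M               # Target projects its own data
    C = ((Xs_proj[:, None, :] - Xt_proj[None, :, :]) ** 2).sum(-1)
    return C - ell * sigma**2      # cancel the bias E||Delta_i||^2 = ell*sigma^2
```

Since E‖(x − y)M‖² = ‖x − y‖² for this choice of M, the debiased matrix C estimates the squared-distance cost without the Source ever revealing Xs.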
![Page 126: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/126.jpg)
Conclusions
OT is a very powerful tool for domain adaptation.

A large variety of possible applications/scenarios (and more remain uncovered).

Quite computationally efficient (but costly regularization is needed).

Try it using the POT library !
![Page 127: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/127.jpg)
References I
Bredies, K., Lorenz, D. A., and Maass, P. (2009). A generalized conditional gradient method and its connection to an iterative shrinkage method. Computational Optimization and Applications, 42(2):173–193.

Courty, N., Flamary, R., and Tuia, D. (2014). Domain adaptation with regularized optimal transport. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD).

Courty, N., Flamary, R., Tuia, D., and Rakotomamonjy, A. (2016). Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transportation. In Neural Information Processing Systems (NIPS), pages 2292–2300.

Ferradans, S., Papadakis, N., Rabin, J., Peyré, G., and Aujol, J.-F. (2013). Regularized discrete optimal transport. In Scale Space and Variational Methods in Computer Vision, SSVM, pages 428–439.
![Page 128: Transfer Learning and Optimal Transportscidolyse.ens-lyon.fr/sites/default/files/2019-09/OT...Transfer Learning and Optimal Transport Ievgen Redko ievgen.redko@univ-st-etienne.fr UMR](https://reader034.fdocuments.us/reader034/viewer/2022042401/5f0ff28b7e708231d446ae36/html5/thumbnails/128.jpg)
References II
Germain, P., Habrard, A., Laviolette, F., and Morvant, E. (2013). A PAC-Bayesian Approach for Domain Adaptation with Specialization to Linear Classifiers. In ICML, pages 738–746, Atlanta, USA.

Perrot, M., Courty, N., Flamary, R., and Habrard, A. (2016). Mapping estimation for discrete optimal transport. In Neural Information Processing Systems (NIPS).

Tuia, D., Flamary, R., Rakotomamonjy, A., and Courty, N. (2015). Multitemporal classification without new labels: a solution with optimal transport. In 8th International Workshop on the Analysis of Multitemporal Remote Sensing Images.