Detach and Adapt Learning Cross-Domain Disentangled Deep...

Detach and Adapt:Learning Cross-Domain Disentangled Deep Representation

for Image Synthesis and Classification

*Tzu-Chien Fu1, *Yen-Cheng Liu2, Wei-Chen Chiu1,3, and Y.-C. Frank Wang1,2

1Research Center for IT Innovation, Academia Sinica2Dept. EE, National Taiwan University

3Dept. CS, National Chiao Tung University

(* indicates equal contributions)

(Traditional) Machine Learning vs. Transfer Learning

• Transfer Learning• Collecting/annotating data is typically expensive.• Improved learning & understanding in the target domain by leveraging

knowledge from the source domain

Research Focuses

• Transfer Learning for• Homogeneous/heterogeneous domain adaptation• Multi-label classification / zero-shot learning• Robust face recognition (e.g., cross-resolution, cross-modality, etc.)

Heterogeneous Domain Adaptation

• Deep Transfer Learning for Cross-Domain Data Classification• Learning from source & target-domain data described by distinct types of features

Heterogeneous Domain Adaptation (cont’d)

• Transfer Neural Trees (TNT)• Joint learning of cross-domain mapping FS/FT & cl. layer G (deep neural decision forest)• Propose stochastic pruning for G to avoid overfitting source-domain labeled data• Unique embedding loss for learning target-domain data in a semi-supervised setting

5Y.-C. F. Wang et al., "Transfer Neural Trees for Heterogeneous Domain Adaptation,” ECCV, 2016.

Source-domainlabeled data

Target-domainlabeled data

Target-domainunlabeled data

Multi-Label Classification

• Predicting multiple labels w/o using annotated ground truth info (e.g., bounding box)• Learning across image and label-domain data + exploit label co-occurrences

Labels:PersonTableSofaChairTVLightsCarpet…

Multi-Label Classification (cont’d)

• Canonical Correlated AutoEncoder (C2AE) [AAAI’17]• Unique integration of autoencoder & deep canonical correlation analysis (DCCA)• Autoencoder in C2AE: label embedding + label recovery + label co-occurrence• DCCA in C2AE: joint feature & label embedding

Latent spacelabel space

label space

feature space

CloudsLakeOceanWaterSkySunSunset

Y.-C. F. Wang et al., Learning Deep Latent Spaces for Multi-Label Classification, AAAI 2017

Research Focuses

• Transfer Learning for• Domain adaptation

• Cross-domain image synthesis/translation/classification • Multi-label classification / zero-shot learning• Robust face recognition (e.g., cross resolution, etc.)

FaceApp

• Beyond putting a smile on your face• Over 10M downloads

• Feature Disentanglement: • Learn a latent space which factorizes the representation z into different parts (i.e., attributes)

for describing the corresponding info (e.g., identity, pose, or expression of facial images).

Introduction

Representation

Latent Space

Encoder Decoder

1. Uninterpretable2. Entangled

zAttribute

ExpressionPose

Representation

Other factors

Encoder Decoder

Latent Space

Glassesl

• Unsupervised Learning• Disentangling images without observing attribute info• No guarantee in disentangling particular semantics

• Supervised Learning• With supervision of image labels, disentangle the associated factor from feature representation• Can manipulate the output image with label/attribute of interest accordingly.

• Ours: Cross-Domain Feature Disentanglement• Source-domain training data: existing annotated instances• Target-domain data: no ground truth info, to be adapted/manipulated• Can be viewed as either semi-supervised learning, or unsupervised domain adaptation

Settings for Feature Disentanglement

Rotation angle

Our Goal

Source domainw/ attribute info

Target domainw/o attribute annotation

Feature Disentanglement

Unsupervised Domain Adaptation

• A unified framework for cross-domain feature disentanglement, with only attribute supervision from the source domain.

Related Works

• Feature Disentanglement• Unsupervised: InfoGAN [1]• Supervised: AC-GAN [2]

• Unsupervised Cross-Domain Image Synthesis/Translation• Image synthesis: CoGAN [3]• Image translation: UNIT [4]

[4] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. arXiv, 2017.

[1] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. Advances in Neural Information Processing Systems (NIPS), 2016.

[3] M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. Advances in Neural Information Processing Systems (NIPS), 2016

[2] A. Odena, C. Olah, and J. Shlens. Conditional image synthesis with auxiliary classifier GANs. arXiv, 2016.

No label

InfoGAN & AC-GAN (Unsup/Sup. Feature disentanglement)

InfoGAN (unsupervised)

w/ ground truth attribute labels

AC-GAN (supervised)

[1] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), 2016.[2] A. Odena, C. Olah, and J. Shlens. Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv:1610.09585, 2016.

• Synthesize pairs of corresponding images

• Enforce weight-sharing constraints in high-level layers

CoGAN (Unsupervised Cross-Domain Image Synthesis)

Unsupervised Domain AdaptationUnsupervised Cross-Domain Image Synthesis

15[3] M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. In Advances in Neural Information Processing Systems (NIPS), 2016

UNIT (Unsupervised Cross-domain Image Translation)

UNIT learns translation functions of mapping an image in one domain to another without any corresponding images across domains.

16[4] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. arXiv, 2017.

Figure: Overview of our proposed method

Proposed Method - Cross-Domain Disentanglement (CDD)

Real / Fake�𝑋𝑋𝑧𝑧

𝑋𝑋

Generator Discriminator

Synthesize realistic images

Proposed Method - Cross-Domain Disentanglement (CDD)Generative Adversarial Network (GAN)

Synthesized images �𝑋𝑋 Real images 𝑋𝑋

Proposed MethodAuxiliaryClassifier-GAN (AC-GAN)

Real / Fake�𝑋𝑋

𝑙𝑙

𝑧𝑧

𝑋𝑋

𝑙𝑙 = 1,2, … , 𝐿𝐿Generator Discriminator

Synthesize images conditioned on disentangled factor 𝑙𝑙

Disentangle the specific factor 𝑙𝑙from the representation 𝑧𝑧

𝑙𝑙 = 1(𝑤𝑤𝑤𝑤𝑤𝑤𝑤 𝑔𝑔𝑙𝑙𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔)

𝑙𝑙 = 0(𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤 𝑔𝑔𝑙𝑙𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔)

𝑧𝑧

supervision

Synthesized images �𝑋𝑋

Real / Fake𝑋𝑋 �𝑋𝑋

𝑙𝑙

𝑧𝑧Encoder

𝑋𝑋

𝑙𝑙 = 1,2, … , 𝐿𝐿Generator Discriminator

Proposed MethodVAE + AC-GAN

𝑧𝑧

𝑙𝑙 = 0

𝑙𝑙 = 1

Translate the images conditioned on disentangled factor 𝑙𝑙

Synthesized images �𝑋𝑋Input images 𝑋𝑋

supervision

Source DomainReal / Fake

𝑋𝑋𝑆𝑆 �𝑋𝑋𝑆𝑆

𝑙𝑙𝑧𝑧

Encoder Generator Discriminator

ES EC GC GS DS DC

𝑋𝑋𝑆𝑆

𝑙𝑙 = 1,2, … , 𝐿𝐿

Divide the network into low-level and high-level layers.

Source DomainReal / Fake

𝑙𝑙𝑧𝑧

ES EC GC GS DS DC

𝑋𝑋𝑆𝑆

𝑙𝑙 = 1,2, … , 𝐿𝐿

Divide the network into low-level and high-level layers.

Joint Space

Source Domain

Target Domain

Real / Fake

𝑋𝑋𝑇𝑇 �𝑋𝑋𝑇𝑇

𝑙𝑙𝑧𝑧

𝑋𝑋𝑆𝑆

𝑋𝑋𝑇𝑇

𝑙𝑙 = 1,2, … , 𝐿𝐿

Proposed MethodVAE + AC-GAN for cross-domain images

Share the high-level layers of Encoder, Generator, and Discriminator23

Joint Space

Source Domain

Target Domain

Real / Fake

𝑋𝑋𝑇𝑇 �𝑋𝑋𝑇𝑇

𝑙𝑙𝑧𝑧

𝑋𝑋𝑆𝑆

𝑋𝑋𝑇𝑇

𝑙𝑙 = 1,2, … , 𝐿𝐿

No attribute supervision in the target domain. We only urge the synthesized data in target domain �𝑋𝑋𝑇𝑇 to be disentangled.

Joint Space

Source Domain

Target Domain

Real / Fake

𝑋𝑋𝑆𝑆

𝑋𝑋𝑇𝑇

𝑙𝑙𝑧𝑧

𝑋𝑋𝑆𝑆

𝑋𝑋𝑇𝑇

𝑙𝑙 = 1,2, … , 𝐿𝐿

Translate the images �𝑋𝑋𝑇𝑇→𝑆𝑆 and �𝑋𝑋𝑆𝑆→𝑇𝑇 across different domains.

�𝑋𝑋𝑇𝑇→𝑆𝑆

�𝑋𝑋𝑆𝑆→𝑇𝑇

Joint Space

Source Domain

Target Domain

Real / Fake

𝑋𝑋𝑆𝑆

𝑋𝑋𝑇𝑇

𝑙𝑙𝑧𝑧

𝑋𝑋𝑆𝑆

𝑋𝑋𝑇𝑇

𝑙𝑙 = 1,2, … , 𝐿𝐿

Tie the disentangled factor 𝒍𝒍 across domains with

�𝑋𝑋𝑇𝑇→𝑆𝑆

�𝑋𝑋𝑆𝑆→𝑇𝑇

Overall objective function can be defined as:

Experiments• Qualitative Evaluation:

• Conditional image synthesis and translation

• Quantitative Evaluation:• Cross-domain attribute classification

• Dataset• CelebFaces Attributes dataset (CelebA)• A large-scale face dataset with 200K+ celebrity images with 40 facial annotated attributes

ResultsS : faces w/o eyeglasses; T : faces w/ eyeglasses; l : attribute of smiling

ResultsS : real photo of faces; T : simulated sketch of faces; l : attribute of smiling

ResultsS : real photo of faces; T : simulated sketch of faces; l : attribute of eyeglasses

Summary

• Transfer Learning for• Homogeneous/heterogeneous domain adaptation• Multi-label classification / zero-shot learning• Robust face recognition (e.g., cross-resolution, cross-modality, etc.)

• Feature Disentanglement for• Cross-domain image synthesis/translation/classification• Only label supervision from a single (source) domain is needed

Detach and Adapt Learning Cross-Domain Disentangled Deep...

Documents

Transcript of Detach and Adapt Learning Cross-Domain Disentangled Deep...

Disentangled Sequential Autoencoder · Disentangled Sequential Autoencoder (a) generator (b) encoder (factorised q) (c) encoder (full q) Figure 1. A graphical model visualisation

Disentangled Representations in Neural Models

LEGAN: Disentangled Manipulation of Directional Lighting ...

Discounting disentangled: an expert survey on the ...

On Disentangled Representations Learned from Correlated Data

Animating Face using Disentangled Audio Representations · Animating Face using Disentangled Audio Representations Gaurav Mittal Baoyuan Wang Microsoft Research fgamit, baoyuanwg@microsoft.com

Are Disentangled Representations Helpful for Abstract Visual … · 2020-02-13 · Are Disentangled Representations Helpful for Abstract Visual Reasoning? Sjoerd van Steenkiste IDSIA,

Disentangled Makeup Transfer with Generative Adversarial Network · 2019-07-03 · Disentangled Makeup Transfer with Generative Adversarial Network Honglun Zhang 1;2, Jidong Tian

Disentangled Contour Learning for Quadrilateral Text Detection

Domain Agnostic Learning with Disentangled Representationsproceedings.mlr.press/v97/peng19b/peng19b.pdf · Domain Agnostic Learning with Disentangled Representations Xingchao Peng1

InfoGAN: Interpretable Representation Learning by ...papers.nips.cc/paper/6399-infogan-interpretable...In addition, prior research attempted to learn disentangled representations using

Disentangled Representation Learning GAN for Pose-Invariant Face …cvlab.cse.msu.edu/pdfs/Tran_Yin_Liu_CVPR2017.pdf · 2019-07-17 · Disentangled Representation Learning GAN for

Disentangled Non-local Neural Networks - ECVA

1 InterFaceGAN: Interpreting the Disentangled Face Representation Learned … · 2020-05-21 · 1 InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs Yujun

Gait Recognition via Disentangled Representation Learningcvlab.cse.msu.edu/pdfs/Zhang_Tran_Yin_Atoum_Liu_Wan_Wang... · 2019-07-17 · Gait Recognition via Disentangled Representation

Reinforcement Learning with a Disentangled Universal Value ...

Gait Recognition via Disentangled Representation …openaccess.thecvf.com/content_CVPR_2019/papers/Zhang...Gait Recognition via Disentangled Representation Learning Ziyuan Zhang, Luan

Multi-Level Variational Autoencoder: Learning Disentangled ......Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations Diane Bouchacourt

InfoGAN: Interpretable Representation Learning by ... · InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets Xi Chen yz, Yan Duan

An Efﬁcient Integration of Disentangled Attended ... · STUDENT, PROF, COLLABORATOR: BMVC AUTHOR GUIDELINES 1 An Efﬁcient Integration of Disentangled Attended Expression and Identity