Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: –...

35
Lars Mescheder 1 , Sebastian Nowozin 2 , Andreas Geiger 1 1 MPI-IS Tübingen 2 University of Tübingen 3 Microsoft Research Cambridge Which Training Methods for GANs do actually Converge?

Transcript of Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: –...

Page 1: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

Lars Mescheder1, Sebastian Nowozin2, Andreas Geiger1

1MPI-IS Tübingen 2University of Tübingen 3Microsoft Research Cambridge

Which Training Methods for GANs do actually Converge?

Page 2: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

2

Introduction

Generative neural networks:

Key challenge:have to learn high dimensional probability distribution

noise

Page 3: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

3

Introduction

Generative Adversarial networks (GANs):

real or fake?Generator

Discriminator

Page 4: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

4

Generative Adversarial Networks

Alternating gradient descent

Page 5: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

5

Generative Adversarial Networks

Simultaneous gradient descent

Page 6: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

6

Generative Adversarial Networks

● Does a (pure) Nash-equilibrium exist?– Yes, if there is with (Goodfellow et al., 2014)

● Does it solve the min-max problem?– Yes, if (Goodfellow et al., 2014)

● Do simultaneous and / or alternating gradient descent converge to the Nash-equilbrium?

Page 7: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

7

Ist GAN training locally asymptotically stable?

Mescheder et al. (2017): No, if Jacobian of gradientvector field has purely imaginary eigenvalues

Is GAN training locally asymptotically stable in the general case?

Nagarajan and Kolter (2017): Yes, if generatorand data distributions locally have the same support

Heusel et al. (2017): Yes, if optimal discriminatorparameters are continuous function of generator parametersand two-timescale annealing scheme is adopted

Page 8: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

8

Our Contributions

● Dirac-GAN:– Unregularized GAN-training is not always stable

when distributions do not have the same support● Analysis of common regularizers:

– WGAN and WGAN-GP not always stable– Instance noise & zero-centered gradient penalties are stable

● Simplified gradient penalties– Convergence proof for realizable case

● Empirical results:– High resolution (1024x1024) generative models

without progressively growing architectures

Page 9: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

9

Convergence theory

“Simple experiments, simple theorems are the building blocks that help us understand more complicated systems.”

Ali Rahimi – Test of Time Award speech, NIPS 2017

Page 10: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

10

Convergence theory

The Dirac-GAN:

Page 11: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

11

Convergence theory

Unregularized GAN training:

Page 12: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

12

Convergence theory

Understanding the gradient vector field:

Local convergence of simultaneous and alternating gradient descent determined by eigenvalues of Jacobian

Page 13: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

13

Convergence theory

Continuous system:

Page 14: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

14

Convergence theory

Continuous system:

Page 15: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

15

Convergence theory

Discretized system:

Page 16: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

16

Convergence theory

Discretized system:

Page 17: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

17

Which training methods converge?

Unregularized GAN training:

Eigenvalues:

Page 18: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

18

Which training methods converge?

Eigenvalues:

Wasserstein-GAN1 training:

1Arjovsky et al. - Wasserstein GAN (2017)

Page 19: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

19

Which training methods converge?

Eigenvalues:

Wasserstein-GAN1 training(5 discriminator updates / generator update):

1Arjovsky et al. - Wasserstein GAN (2017)

Page 20: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

20

Which training methods converge?

Zero-centered Gradient Penalties

Eigenvalues:

Page 21: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

21

Which training methods converge?

Zero-centered Gradient Penalties (critical)

Eigenvalues:

Page 22: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

22

Which training methods converge?

Zero-centered Gradient Penalties

no convergence

slow convergence

fast convergence

Page 23: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

23

General convergence results

● Regularizers for discriminator

● Regularized gradient vector field

Page 24: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

24

Assumption IV: the generator and data distributionhave locally the same support (Nagarajan & Kolter)

General convergence results

Assumption I: the generator can represent the true data distribution

Assumption II: and

Assumption III: the discriminator can detect when the generator deviates from equilibrium

Page 25: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

25

Assumption IV: the generator and data distributionhave locally the same support (Nagarajan & Kolter)

General convergence results

Assumption I: the generator can represent the true data distribution

Assumption II: and

Assumption III: the discriminator can detect when the generator deviates from equilibrium

For the Dirac-GAN:

Page 26: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

26

Assumption IV: the generator and data distributionhave locally the same support (Nagarajan & Kolter)

General convergence results

Assumption I: the generator can represent the true data distribution

Assumption II: and

Assumption III: the discriminator can detect when the generator deviates from equilibrium

For GANs in the wild:

Page 27: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

27

General convergence results

Theorem: under Assumption I, II, III and some mild technical assumptions the GAN training dynamics for the regularized training objective are locally asymptotically stable near the equilibrium point

Page 28: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

28

General convergence results

Proof (idea):

1Nagarajan & Kolter - Gradient descent GAN optimization is locally stable (2017)

(Extends prior work by Nagarajan & Kolter1)

Negative definiteFull column rank

(orthogonal to )

All eigenvalues of have negative real part

Page 29: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

29

Experiments

Imagenet (128 x 128, 1k classes)

Page 30: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

30

Experiments

LSUN bedrooms (256 x 256)

Page 31: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

31

Experiments

LSUN churches (256 x 256)

Page 32: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

32

Experiments

LSUN towers (256 x 256)

Page 33: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

33

Experiments

celebA-HQ (1024 x 1024)

Page 34: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

34

● use alternating instead of simultaneous gradient descent● don’t use momentum● use regularization to stabilize the training● simple zero-centered gradient penalties for the discriminator

yield excellent results ● progressively growing architectures might be not all that

important when using a good regularizer

Practical recommendations

Page 35: Which Training Methods for GANs do actually Converge? · 8 Our Contributions Dirac-GAN: – Unregularized GAN-training is not always stable when distributions do not have the same

35

Poster #77