Deep Learning Insights on GANs
• Adversarial: trained in an adversarial setting
• Networks: use deep neural networks
Why Generative Models?
• We’ve only seen discriminative models so far
  • Given an image X, predict a label Y
  • i.e., they estimate P(Y|X), not the data distribution P(X)
Lotter, William, Gabriel Kreiman, and David Cox. "Unsupervised learning of visual structure using predictive generative networks." arXiv preprint arXiv:1511.06380 (2015).
Magic of GANs…
Which one is computer generated?
Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." arXiv preprint arXiv:1609.04802 (2016).
Magic of GANs…
http://people.eecs.berkeley.edu/~junyanz/projects/gvm/
Adversarial Training
• In the last lecture, we saw:
  • We can generate adversarial samples to fool a discriminative model
  • We can use those adversarial samples to make the model robust
  • Generating new adversarial samples then requires more effort
• Repeating this process yields a better discriminative model
[Figure: the GAN setup. The Discriminator D scores a real sample x as D(x); the Generator G maps noise z to a sample G(z), which D scores as D(G(z)).]
https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
Training Generator
https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
Generator in action
https://openai.com/blog/generative-models/
GAN’s formulation
• The two players optimize a single minimax objective:
  min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]
• Generator updates: minimize log(1 − D(G(z)))
• Vanishing gradient strikes back again: early in training D(G(z)) ≈ 0, so log(1 − D(G(z))) saturates; in practice the Generator instead maximizes log D(G(z)) (see the training-step sketch after the citation below).
Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
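To make the objective concrete, here is a minimal, hedged sketch of one training step in PyTorch. The setup is assumed (toy 1-D data, small MLPs); all names and sizes are illustrative, not the paper’s code.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

x_real = torch.randn(64, 1) * 0.5 + 2.0   # stand-in for x ~ p_data(x)
z = torch.randn(64, 8)                    # noise z ~ p_z(z)

# Discriminator step: ascend log D(x) + log(1 - D(G(z)))
d_loss = bce(D(x_real), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
opt_D.zero_grad(); d_loss.backward(); opt_D.step()

# Generator step: the non-saturating variant, ascend log D(G(z))
# (avoids the vanishing gradient of log(1 - D(G(z))) when D rejects G's samples)
g_loss = bce(D(G(z)), torch.ones(64, 1))
opt_G.zero_grad(); g_loss.backward(); opt_G.step()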
CIFAR-10 samples
Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
DCGAN: Bedroom images
Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Deep Convolutional GANs (DCGANs)
Key ideas:
• Replace FC hidden layers with convolutions
  • Generator: fractional-strided (transposed) convolutions
• Inside the Generator:
  • Use ReLU for hidden layers
  • Use Tanh for the output layer
[Figure: Generator architecture; a generator sketch follows the citation below.]
Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
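A hedged PyTorch sketch of a generator following this recipe. The 100-d noise input, 64x64 RGB output, and BatchNorm layers follow the paper’s common configuration, but the exact module names and sizes here are assumptions.

import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0), nn.BatchNorm2d(ch * 8), nn.ReLU(True),   # 4x4
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1), nn.BatchNorm2d(ch * 4), nn.ReLU(True),  # 8x8
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.BatchNorm2d(ch * 2), nn.ReLU(True),  # 16x16
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.BatchNorm2d(ch), nn.ReLU(True),          # 32x32
            nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Tanh(),                                       # 64x64
        )

    def forward(self, z):                  # z: (N, z_dim, 1, 1)
        return self.net(z)

img = DCGANGenerator()(torch.randn(2, 100, 1, 1))   # -> (2, 3, 64, 64)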
Latent vectors capture interesting patterns…
Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Part 2
• Advantages of GANs
• Training Challenges
• Non-Convergence
• Mode-Collapse
• Proposed Solutions
• Supervision with Labels
• Mini-Batch GANs
• Modification of GAN’s losses
• Discriminator (EB-GAN)
• Generator (InfoGAN)
Advantages of GANs
• Plenty of existing work on Deep Generative Models
• Boltzmann Machine
• Deep Belief Nets
• Variational AutoEncoders (VAE)
• Why GANs?
• Sampling (or generation) is straightforward.
• Training doesn't involve Maximum Likelihood estimation.
• Robust to overfitting, since the Generator never sees the training data directly.
• Empirically, GANs are good at capturing the modes of the distribution.
Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
Problems with GANs
• Probability Distribution is Implicit
• Not straightforward to compute P(X).
• Thus Vanilla GANs are only good for Sampling/Generation.
• Training is Hard
• Non-Convergence
• Mode-Collapse
Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
Training Problems
• Non-Convergence
• Mode-Collapse
• Deep Learning models (in general) involve a single player
  • The player tries to maximize its reward (minimize its loss).
  • SGD (with backpropagation) is used to find the optimal parameters.
  • SGD has convergence guarantees (under certain conditions); with non-convexity, it might still converge only to a local optimum.
• GANs instead involve two players
  • The Discriminator tries to maximize its reward; the Generator tries to minimize the Discriminator’s reward.
  • SGD was not designed to find the Nash equilibrium of a game, so training may not converge at all.
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Non-Convergence
• Let V(x, y) = x·y, where one player chooses x to minimize V and the other chooses y to maximize V.
• ∂V/∂x = y and ∂V/∂y = x, so simultaneous gradient updates are
  x ← x − η·y,  y ← y + η·x
• The unique equilibrium is (0, 0), yet these updates circle it (and with any finite step size spiral outward) instead of converging; the sketch after the citation below simulates this.
Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
Problems with GANs
• Non-Convergence
• Mode-Collapse
Mode-Collapse
• Generator fails to output diverse samples
[Figure from Metz et al.: the target is a 2-D mixture of Gaussians arranged in a ring; the expected output would cover all modes, but the generator’s actual output jumps between single modes over training.]
Metz, Luke, et al. "Unrolled Generative Adversarial Networks." arXiv preprint arXiv:1611.02163 (2016).
Some real examples
Reed, S., et al. Generating interpretable images with controllable structure. Technical report, 2016.
Some Solutions
• Mini-Batch GANs
• Supervision with labels

More formally,
• Let the Discriminator look at an entire batch instead of single examples
• If there is a lack of diversity, it will mark the examples as fake
• Thus, the Generator will be forced to produce diverse samples
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Mini-Batch GANs
• Extract features that capture diversity in the mini-batch
  • e.g., the L2 norm of the difference between all pairs in the batch
• This, in turn,
  • forces the Generator to match those feature statistics on real batches
  • and hence to generate diverse batches (a minimal feature sketch follows the citation below)
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
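A hedged sketch in the spirit of this idea; it is a simplification, not the paper’s exact minibatch-discrimination layer, and all names and sizes are illustrative.

import torch

def minibatch_diversity_feature(h):
    """h: (N, F) per-example discriminator features."""
    d = torch.cdist(h, h, p=2)                  # (N, N) pairwise L2 distances
    # mean distance to the other N - 1 examples; near zero for a collapsed batch
    div = d.sum(dim=1, keepdim=True) / (h.shape[0] - 1)
    return torch.cat([h, div], dim=1)           # (N, F + 1), fed to D's head

h = torch.randn(64, 128)
print(minibatch_diversity_feature(h).shape)     # torch.Size([64, 129])

Because the statistic is computed on real and fake batches alike, matching it on real data pushes the Generator toward diverse batches.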
Basic (Heuristic) Solutions
• Mini-Batch GANs
• Supervision with labels
Supervision with Labels
• Label information of the real data might help
[Figure: a plain Discriminator D outputs Real vs. Fake; a label-aware D instead outputs the real classes (Car, Dog, Human, ...) plus an extra Fake class.]
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Alternate view of GANs
• The standard GAN objectives can be written as cross-entropy losses:
  L_D = −E_{x∼p_data(x)}[log D(x)] − E_{z∼p_z(z)}[log(1 − D(G(z)))]
  L_G = −E_{z∼p_z(z)}[log D(G(z))]
• Alternatively, we can flip the binary classification labels, i.e. Fake = 1, Real = 0:
  L_D = E_{x∼p_data(x)}[ℓ(D(x), 0)] + E_{z∼p_z(z)}[ℓ(D(G(z)), 1)], with ℓ the cross-entropy
• Now, we can replace the cross-entropy with any loss function, e.g. a hinge loss with margin m:
  L_D = E_{x∼p_data(x)}[D(x)] + E_{z∼p_z(z)}[max(0, m − D(G(z)))]
  L_G = E_{z∼p_z(z)}[D(G(z))]
Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Energy-Based GANs
• Modified game plan:
  • The Generator tries to generate samples with low energy values
  • The Discriminator tries to assign high energy values to fake samples
(a minimal loss sketch follows the citation below)
Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
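A minimal sketch of these objectives. Here D(x) is a non-negative energy (in the paper, an autoencoder’s reconstruction error); the margin value and the tensors below are placeholders.

import torch

def ebgan_d_loss(energy_real, energy_fake, m=1.0):
    # D: assign low energy to real data, push fake energy above the margin m
    return energy_real.mean() + torch.clamp(m - energy_fake, min=0).mean()

def ebgan_g_loss(energy_fake):
    # G: produce samples to which D assigns low energy
    return energy_fake.mean()

e_real, e_fake = torch.rand(64), torch.rand(64)
print(ebgan_d_loss(e_real, e_fake).item(), ebgan_g_loss(e_fake).item())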
More Bedrooms…
Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Celebs…
The Cool Stuff…
3D Faces
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
Cool Stuff (contd.)
3D Chairs
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
How to reward Disentanglement?
• Disentanglement means individual latent dimensions independently capture key attributes of the image
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative Adversarial Nets
Recap: Mutual Information
• Mutual Information captures the mutual dependence between two variables:
  I(X; Y) = Σ_{x,y} p(x, y) · log [ p(x, y) / (p(x) · p(y)) ]
(a small numeric example follows)
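A tiny worked example of the formula; the 2x2 joint distribution below is made up purely for illustration.

import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}   # marginals of p_xy
p_y = {0: 0.5, 1: 0.5}

I = sum(p * math.log(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())
print(I)   # ~0.19 nats > 0: X and Y are dependent; I = 0 iff independent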
InfoGAN
• We want to maximize the mutual information between the latent code c and the generated sample G(z, c)
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
InfoGAN
Mutual Information’s variational lower bound:
  I(c; G(z, c)) ≥ E_{c∼P(c), x∼G(z,c)}[log Q(c | x)] + H(c) = L_I(G, Q)
where Q(c | x) is an auxiliary distribution approximating the intractable posterior P(c | x); L_I is added (with a weight λ) to the usual GAN objective (a sketch of the auxiliary loss follows the citation below).
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
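A hedged sketch of the auxiliary term for a categorical code c. In the paper the Q-network shares the discriminator’s trunk; here its output is just a placeholder tensor.

import torch
import torch.nn.functional as F

def info_loss(q_logits, c_true):
    """q_logits: (N, K) Q-network outputs on G(z, c); c_true: (N,) sampled codes."""
    # -E[log Q(c | x)]; minimizing this maximizes the lower bound on I(c; G(z, c))
    return F.cross_entropy(q_logits, c_true)

c = torch.randint(0, 10, (64,))    # c ~ Cat(K=10), the codes fed to G
q_logits = torch.randn(64, 10)     # stand-in for Q's output on G(z, c)
print(info_loss(q_logits, c).item())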
Part 3
• Conditional GANs
• Applications
• Image-to-Image Translation
• Text-to-Image Synthesis
• Face Aging
• Advanced GAN Extensions
• Coupled GAN
• LAPGAN – Laplacian Pyramid of Adversarial Networks
• Adversarially Learned Inference
• Summary
Conditional GANs
MNIST digits generated conditioned on their class label.
Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
Conditional GANs
• A simple modification of the original GAN framework that conditions the model on additional information for better multi-modal learning (a minimal conditioning sketch follows the citations below).
Image Credit: Figure 2 in Odena, A., Olah, C. and Shlens, J., 2016. Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv:1610.09585.
Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets”. arXiv preprint arXiv:1411.1784 (2014).
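One common way to realize the conditioning, shown as a hedged PyTorch sketch: an embedded label is concatenated with the noise at the generator’s input (and symmetrically at the discriminator’s input). Layer sizes are illustrative.

import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=100, n_classes=10, emb_dim=10, out_dim=784):
        super().__init__()
        self.embed = nn.Embedding(n_classes, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(z_dim + emb_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),
        )

    def forward(self, z, y):
        # condition on the label y by concatenating its embedding with z
        return self.net(torch.cat([z, self.embed(y)], dim=1))

G = ConditionalGenerator()
x = G(torch.randn(16, 100), torch.randint(0, 10, (16,)))   # (16, 784) digits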
Part 3
• Conditional GANs
• Applications
• Image-to-Image Translation
• Text-to-Image Synthesis
• Face Aging
• Advanced GAN Extensions
• Coupled GAN
• LAPGAN – Laplacian Pyramid of Adversarial Networks
• Adversarially Learned Inference
• Summary
Image-to-Image Translation
Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. “Image-to-image translation with conditional adversarial networks”. arXiv preprint arXiv:1611.07004. (2016).
Image-to-Image Translation
• Architecture: DCGAN-based
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. “Generative adversarial text to image synthesis”. ICML (2016).
Text-to-Image Synthesis
Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). “Face Aging With Conditional Generative Adversarial Networks”. arXiv preprint arXiv:1702.01983.
Face Aging with Conditional GANs
Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). “Face Aging With Conditional Generative Adversarial Networks”. arXiv preprint arXiv:1702.01983.
Conditional GANs
Conditional Mode-Collapse
• Observed when the Conditional GAN starts ignoring either the code (c) or the noise variables (z).
• This limits the diversity of the generated images.
Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
Part 3
• Conditional GANs
• Applications
• Image-to-Image Translation
• Text-to-Image Synthesis
• Face Aging
• Advanced GAN Extensions
• Coupled GAN
• LAPGAN – Laplacian Pyramid of Adversarial Networks
• Adversarially Learned Inference
• Summary
Coupled GAN
Weight sharing constrains the networks to learn a joint distribution across domains without corresponding (paired) supervision.
Liu, Ming-Yu, and Oncel Tuzel. “Coupled generative adversarial networks”. NIPS (2016).
Coupled GANs
• Some examples of generating facial images across different attribute domains: hair color, facial expression, sunglasses.
• Corresponding images in a column are generated from the same latent code.
Liu, Ming-Yu, and Oncel Tuzel. “Coupled generative adversarial networks”. NIPS (2016).
Laplacian Pyramid of Adversarial Networks
Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
Laplacian Pyramid of Adversarial Networks
Training Procedure:
Models at each level are trained independently to learn the required representation.
Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
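A hedged numpy sketch of the Laplacian-pyramid decomposition the per-level models are trained on; the blur/downsample operators here are naive simplifications of the paper’s.

import numpy as np

def downsample(img):                 # crude 2x "blur + subsample" (average pooling)
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(img):                   # nearest-neighbor 2x upsampling
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels=3):
    bands = []
    for _ in range(levels):
        small = downsample(img)
        bands.append(img - upsample(small))   # high-frequency residual band
        img = small
    bands.append(img)                # coarsest low-pass image
    return bands

pyr = laplacian_pyramid(np.random.rand(64, 64))
print([b.shape for b in pyr])        # [(64, 64), (32, 32), (16, 16), (8, 8)]
# LAPGAN trains one (conditional) generative model per band, independently.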
Adversarially Learned Inference
• The basic idea is to learn an encoder (inference network) along with the generator network.
[Figure: the encoder’s joint distribution q(x, z) and the generator’s joint distribution p(x, z).]
Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).
Adversarially Learned Inference
• The Discriminator network receives joint pairs (x, z) and must distinguish encoder pairs (x, E(x)) from generator pairs (G(z), z).
Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).
Adversarially Learned Inference
• The Nash equilibrium yields matching encoder and generator distributions:
  • Joint: q(x, z) = p(x, z)
  • Marginals: q(x) = p(x) and q(z) = p(z)
  • Conditionals: q(z | x) = p(z | x) and q(x | z) = p(x | z)
Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).
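A minimal sketch of the setup with placeholder linear modules; in the paper all three components are deep networks and the encoder is stochastic.

import torch
import torch.nn as nn

x_dim, z_dim = 784, 64
E = nn.Linear(x_dim, z_dim)                    # encoder / inference network
G = nn.Linear(z_dim, x_dim)                    # generator / decoder network
D = nn.Sequential(nn.Linear(x_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, 1))

x = torch.randn(32, x_dim)                     # x ~ q(x), i.e. real data
z = torch.randn(32, z_dim)                     # z ~ p(z), the prior
score_enc = D(torch.cat([x, E(x)], dim=1))     # score for encoder pairs (x, E(x))
score_gen = D(torch.cat([G(z), z], dim=1))     # score for generator pairs (G(z), z)
# Training D to tell the two apart drives q(x, z) and p(x, z) together.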
Summary
• GANs are generative models implemented with two stochastic neural network modules: a Generator and a Discriminator.
• The Generator tries to generate samples from random noise given as input.
• The Discriminator tries to distinguish the Generator’s samples from samples drawn from the real data distribution.
• The two networks are trained adversarially (in tandem), each trying to fool the other; in the process, both become better at their respective tasks.
Why use GANs for Generation?
• Can be trained using back-propagation for Neural Network based
Generator/Discriminator functions.
• Sharper images can be generated (compared to, e.g., VAEs).
• Faster to sample from the model distribution: single forward pass
generates a single sample.
Reading List
• Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. Generative adversarial nets, NIPS (2014).
• Goodfellow, Ian. NIPS 2016 Tutorial: Generative Adversarial Networks. arXiv preprint arXiv:1701.00160 (2016).
• Radford, A., Metz, L. and Chintala, S., Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint
arXiv:1511.06434. (2015).
• Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. Improved techniques for training GANs. NIPS (2016).
• Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization
Generative Adversarial Nets, NIPS (2016).
• Zhao, Junbo, Michael Mathieu, and Yann LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126 (2016).
• Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
• Liu, Ming-Yu, and Oncel Tuzel. Coupled generative adversarial networks. NIPS (2016).
• Denton, E.L., Chintala, S. and Fergus, R., 2015. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. NIPS (2015)
• Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., & Courville, A. Adversarially learned inference. arXiv preprint
arXiv:1606.00704 (2016).
Applications:
• Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004. (2016).
• Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. Generative adversarial text to image synthesis. ICML (2016).
• Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). Face Aging With Conditional Generative Adversarial Networks. arXiv preprint arXiv:1702.01983.
THANK YOU