Lecture 3: Generative Models
Kian Katanforoosh
Today’s outline

I. Introduction to Generative Models
II. Generative Adversarial Networks (GANs)
III. Diffusion models
I. Introduction to Generative Models
Examples: Text-to-Image Synthesis, Image in-painting, Super-resolution.

[Rombach et al. (2022): High-Resolution Image Synthesis with Latent Diffusion Models]
I. Introduction to Generative Models

Goal: match the “generated distribution” to the “real data distribution”.

[Figure: two probability distributions over image space, samples from the “real data distribution” and samples from the “generated distribution”, which the generative model must bring into alignment.]

[Zhang et al. (2017): StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks]
II. Generative Adversarial Networks (GANs)
A. G/D Game
B. Training GANs
C. Nice results
[Goodfellow, Shlens & Szegedy (2015): Explaining and Harnessing Adversarial Examples]
II.A - G/D Game

A 100-d random code z, e.g. z = (0.47, …, 0.19)ᵀ, is fed to the Generator “G” (Neural Network), which outputs a (64, 64, 3) generated image. Initially the generated image looks nothing like a real one (generated ≠ real).

[Zhang et al. (2017): StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks]
II.A - G/D Game

The Generator “G” (Neural Network) maps the 100-d random code z to a (64, 64, 3) generated image. The Discriminator “D” (Neural Network) receives an image x, either generated or drawn from a database of real images, and classifies it:

$$y = \begin{cases} 0 & \text{if } x = G(z) \\ 1 & \text{otherwise} \end{cases}$$

Run Gradient Descent simultaneously on two minibatches (true data / generated data).

[Figure: probability distributions of the real images (database) and the generated images over image space.]
II.A - G/D Game

D performs binary classification:

$$y = \begin{cases} 0 & \text{if } x = G(z) \\ 1 & \text{otherwise} \end{cases}$$

Gradients flow from D’s output back through the generated image to z and to the weights of G.

End goal: G is outputting images that are indistinguishable from real images for D.

[Figure: probability distribution of the real images (database) over image space.]
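As a concrete reference, here is a minimal PyTorch sketch of the G/D pair with the slide’s shapes (100-d code, (64, 64, 3) images); the fully-connected layers and their sizes are illustrative assumptions, not the lecture’s architecture:

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, img_dim=64 * 64 * 3):
        super().__init__()
        # Map the random code to a flattened image in [-1, 1].
        self.net = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, img_dim), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(-1, 64, 64, 3)

class Discriminator(nn.Module):
    def __init__(self, img_dim=64 * 64 * 3):
        super().__init__()
        # Output D(x) in (0, 1): the probability that x is a real image.
        self.net = nn.Sequential(
            nn.Linear(img_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x.view(x.size(0), -1))

G, D = Generator(), Discriminator()
z = torch.randn(8, 100)   # 100-d random codes
print(D(G(z)).shape)      # (8, 1): D's verdict on 8 generated images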
II.B - Training GANs

Training procedure: we want to minimize the cost of the discriminator. Labels:

$$y_{\text{real}} \text{ is always } 1, \qquad y_{\text{gen}} \text{ is always } 0$$

$$J^{(D)} = -\frac{1}{m_{\text{real}}} \sum_{i=1}^{m_{\text{real}}} y_{\text{real}}^{(i)} \log\!\left(D(x^{(i)})\right) \; - \; \frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \left(1 - y_{\text{gen}}^{(i)}\right) \log\!\left(1 - D(G(z^{(i)}))\right)$$

Cross-entropy 1: “D should correctly predict real data as 1.” Cross-entropy 2: “D should correctly predict generated data as 0.”
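Since y_real is always 1 and y_gen is always 0, the two sums reduce to a standard binary cross-entropy. A minimal sketch, assuming PyTorch (discriminator_cost and the example probabilities are ours, not from the lecture):

import torch

def discriminator_cost(d_real, d_gen, eps=1e-8):
    """d_real = D(x) on real images, d_gen = D(G(z)) on generated ones."""
    ce_real = -torch.log(d_real + eps).mean()       # "predict real data as 1"
    ce_gen = -torch.log(1.0 - d_gen + eps).mean()   # "predict generated data as 0"
    return ce_real + ce_gen

d_real = torch.tensor([0.9, 0.8])   # D is fairly sure these are real
d_gen = torch.tensor([0.2, 0.1])    # D is fairly sure these are fake
print(discriminator_cost(d_real, d_gen))   # small loss: D is doing well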
II.B - Training GANs

Non-saturating cost:

$$J^{(G)} = -\frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \log\!\left(D(G(z^{(i)}))\right)$$

Saturating cost:

$$J^{(G)} = \frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \log\!\left(1 - D(G(z^{(i)}))\right)$$

[Plot: J^(G) versus D(G(z)) on [0, 1] (vertical axis roughly -20 to 5): the saturating cost flattens out near D(G(z)) = 0, where early-training generated samples live, while the non-saturating cost keeps a large gradient there.]
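A small autograd check of the saturation effect, assuming PyTorch: early in training D confidently rejects fakes, so D(G(z)) is near 0; there the saturating cost yields almost no gradient for G, while the non-saturating cost still does:

import torch

d_gz = torch.tensor(0.01, requires_grad=True)   # D(G(z)) early in training
saturating = torch.log(1.0 - d_gz)              # saturating cost (per sample)
saturating.backward()
print(d_gz.grad)    # ~ -1.01: tiny gradient signal for G

d_gz = torch.tensor(0.01, requires_grad=True)
non_saturating = -torch.log(d_gz)               # non-saturating cost (per sample)
non_saturating.backward()
print(d_gz.grad)    # ~ -100: strong gradient signal for G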
II.B - Training GANs

Note that:

$$\min_G \left[ \frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \log\!\left(1 - D(G(z^{(i)}))\right) \right] \;\Leftrightarrow\; \max_G \left[ \frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \log\!\left(D(G(z^{(i)}))\right) \right] \;\Leftrightarrow\; \min_G \left[ -\frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \log\!\left(D(G(z^{(i)}))\right) \right]$$

All three push D(G(z)) toward 1; the non-saturating form simply gives stronger gradients when D confidently rejects G’s samples.

$$J^{(G)} = -\frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \log\!\left(D(G(z^{(i)}))\right) \qquad \text{“G should try to fool D by minimizing this.”}$$
[Lucic, Kurach et al. (2018): Are GANs Created Equal? A Large-Scale Study]
II.B - Training GANs

Training loop: alternate k discriminator updates with one generator update.

for num_iterations:
    for k iterations:
        update D
    update G

[Plot (as above): the non-saturating cost J^(G) = −(1/m_g) Σ log(D(G(z^(i)))) keeps a useful gradient near D(G(z)) = 0, unlike the saturating cost J^(G) = (1/m_g) Σ log(1 − D(G(z^(i)))).]
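A hedged sketch of this alternating loop, assuming PyTorch and reusing the G/D sketch from II.A; num_iterations, k, batch_size, and real_minibatch are illustrative placeholders for the real hyperparameters and data pipeline:

import torch
import torch.nn as nn

opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()
num_iterations, k, batch_size = 1000, 1, 64   # illustrative hyperparameters

def real_minibatch():
    # Stand-in for a loader over the database of real images.
    return torch.rand(batch_size, 64, 64, 3)

for it in range(num_iterations):
    for _ in range(k):                                   # k discriminator updates
        x_real, z = real_minibatch(), torch.randn(batch_size, 100)
        d_real, d_gen = D(x_real), D(G(z).detach())      # detach: don't backprop into G
        loss_d = bce(d_real, torch.ones_like(d_real)) + \
                 bce(d_gen, torch.zeros_like(d_gen))     # J^(D) from above
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    z = torch.randn(batch_size, 100)                     # then one generator update
    d_gen = D(G(z))
    loss_g = bce(d_gen, torch.ones_like(d_gen))          # non-saturating -log D(G(z))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()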
II.C - Nice results

Operation on codes: the Generator “G” (Neural Network) maps each 100-d code to a (64, 64, 3) generated image:

- Code 1, e.g. (0.12, …, 0.92)ᵀ → generated image 1
- Code 2, e.g. (0.47, …, 0.19)ᵀ → generated image 2
- Code 3, e.g. (0.42, …, 0.07)ᵀ → generated image 3

Arithmetic in code space carries over to image space: feeding Code 1 − Code 2 + Code 3 to G yields, for example,

man with glasses − man + woman = woman with glasses

[Radford et al. (2015): Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks]
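In code, the arithmetic is literally on the input vectors. A tiny sketch reusing the generator sketch from II.A; the random codes below are stand-ins, since codes with these semantics have to be found empirically:

import torch

z1 = torch.randn(1, 100)   # e.g. a code that generates "man with glasses"
z2 = torch.randn(1, 100)   # "man"
z3 = torch.randn(1, 100)   # "woman"
image = G(z1 - z2 + z3)    # ideally: "woman with glasses"
print(image.shape)         # (1, 64, 64, 3)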
II.C - Nice results

Face Generation:

[Karras et al. (2018): A Style-Based Generator Architecture for Generative Adversarial Networks]
https://www.youtube.com/watch?v=kSLJriaOumA&feature=youtu.be
II.C - Nice results

Image Generation: samples from the “generated distribution”.

[Zhu, Park et al. (2017): Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks]
II.C - Nice results

Goal: convert horses to zebras in images, and vice versa.

Unpaired images: a set of horse images and a set of zebra images, with no pairing between them.

[Zhu, Park et al. (2017): Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks]
II.C - Nice results

Architecture? Two generator/discriminator pairs:

- Generator1 “G1” (H2Z) maps horse images H to zebra-like images G1(H); Discriminator1 “D1” classifies zebra-space images: y = 0 if x = G1(H), y = 1 otherwise (x = z, a real zebra image).
- Generator2 “G2” (Z2H) maps zebra images Z to horse-like images G2(Z); Discriminator2 “D2” classifies horse-space images: y = 0 if x = G2(Z), y = 1 otherwise (x = h, a real horse image).

Cycle consistency: a horse image should survive the round trip H → G1(H) → G2(G1(H)) ≈ H, and likewise Z → G2(Z) → G1(G2(Z)) ≈ Z.

[Zhu, Park et al. (2017): Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks]
II.C - Nice results

Loss to minimize? With the labels above (D1 outputs 1 on real zebra images z, D2 outputs 1 on real horse images h):

$$J^{(D1)} = -\frac{1}{m_{\text{real}}} \sum_{i=1}^{m_{\text{real}}} \log\!\left(D1(z^{(i)})\right) - \frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \log\!\left(1 - D1(G1(H^{(i)}))\right)$$

$$J^{(G1)} = -\frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \log\!\left(D1(G1(H^{(i)}))\right)$$

$$J^{(D2)} = -\frac{1}{m_{\text{real}}} \sum_{i=1}^{m_{\text{real}}} \log\!\left(D2(h^{(i)})\right) - \frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \log\!\left(1 - D2(G2(Z^{(i)}))\right)$$

$$J^{(G2)} = -\frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \log\!\left(D2(G2(Z^{(i)}))\right)$$

$$J_{\text{cycle}} = \frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \left\| G2(G1(H^{(i)})) - H^{(i)} \right\|_1 + \frac{1}{m_{\text{gen}}} \sum_{i=1}^{m_{\text{gen}}} \left\| G1(G2(Z^{(i)})) - Z^{(i)} \right\|_1$$

The cycle term is an L1 reconstruction penalty: both round trips should return the original image. Total loss:

$$J = J^{(D1)} + J^{(G1)} + J^{(D2)} + J^{(G2)} + \lambda J_{\text{cycle}}$$
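A sketch of the combined objective, assuming PyTorch and callables G1 (H2Z), G2 (Z2H), D1, D2 that return probabilities; batch means stand in for the 1/m sums, and lam plays the role of λ:

import torch

def cyclegan_loss(H, Z, G1, G2, D1, D2, lam=10.0, eps=1e-8):
    fake_Z, fake_H = G1(H), G2(Z)
    # Adversarial terms (one J^(D)/J^(G) pair per direction).
    J_D1 = -torch.log(D1(Z) + eps).mean() - torch.log(1 - D1(fake_Z) + eps).mean()
    J_G1 = -torch.log(D1(fake_Z) + eps).mean()
    J_D2 = -torch.log(D2(H) + eps).mean() - torch.log(1 - D2(fake_H) + eps).mean()
    J_G2 = -torch.log(D2(fake_H) + eps).mean()
    # Cycle consistency: horse -> zebra -> horse and zebra -> horse -> zebra.
    J_cycle = (G2(fake_Z) - H).abs().mean() + (G1(fake_H) - Z).abs().mean()
    return J_D1 + J_G1 + J_D2 + J_G2 + lam * J_cycle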
II.C - Nice results

CycleGANs: Face2ramen (combined with face detection).

Pix2Pix: [Isola et al. (2017): Image-to-Image Translation with Conditional Adversarial Networks]
II.C - Nice results

Super-resolution:

[Ledig et al. (2016): Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network]
II.C - Nice results

• Many more…
III. Diffusion models
A. Basic principles
B. Loss function and training
C. Sampling
D. Latent diffusion
E. Results and applications
III. A. Basic Principles

Limitations of GANs:
- Mode collapse: G learns to "cheat" by focusing on a narrow set of outputs that consistently fool D, rather than exploring the full range of the data distribution.
- Adversarial training leads to instability, such as vanishing gradients, and can be difficult to converge.

Diffusion models are generative models that progressively add noise to data and learn to reverse this process to generate new samples.

[Dhariwal & Nichol (2021): Diffusion Models Beat GANs on Image Synthesis]
III. A. Basic Principles

Forward process: start from a clean image x_0 and repeatedly add noise: x_0 → … → x_t → x_{t+1} → … → x_T.

$$x_{t+1} = x_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, 1)$$

At each step, pixels are retained from the previous image and pixels of random Gaussian noise are added (randomly sampled at each step).

[(Simplified from:) Ho et al. (2020): Denoising Diffusion Probabilistic Models]
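The simplified forward process is only a few lines of code. A sketch assuming PyTorch (forward_diffusion is our name, not the paper’s):

import torch

def forward_diffusion(x0, T):
    """Return the trajectory x_0, x_1, ..., x_T under x_{t+1} = x_t + eps_t."""
    xs = [x0]
    for t in range(T):
        eps_t = torch.randn_like(x0)   # eps_t ~ N(0, 1), resampled at each step
        xs.append(xs[-1] + eps_t)
    return xs

x0 = torch.rand(64, 64, 3)              # a "clean" image
trajectory = forward_diffusion(x0, T=1000)
print(trajectory[-1].std())             # variance grows with t: near-pure noise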
III. A. Basic Principles

Reverse process: given a noisy image x_t, the Diffusion Model outputs ε̂, a prediction of the cumulative noise added from x_0 to x_t:

- ε̂ is the model’s prediction of the noise added to the clean image after t steps.
- ε is the ground-truth noise, representing the difference between the clean and noisy image at time step t.

Reconstruction loss: ℒ = ∥ε − ε̂∥²₂

Subtracting the prediction approximately recovers the clean image: x_t − ε̂ ≈ x_0.

[(Simplified from:) Ho et al. (2020): Denoising Diffusion Probabilistic Models]
III. B. Loss function and training

The index t is also provided to the model to condition its noise prediction.

Training process:
1. Sample triples of (noisy image x_t, index of the time step, cumulative noise added), e.g. (…, t = 5, ε), (…, t = 45, ε), (…, t = 3, ε), (…, t = 19, ε), from a database of images, where ε is the actual noise that was added to the original image to generate x_t. Note that all these epsilons are different.
2. Stochastic Gradient Descent on the reconstruction loss ℒ = ∥ε − ε̂∥²₂.

In practice the noising step shrinks the image as it adds noise ("erase/shrink pixels of the previous image", then "add random Gaussian noise to certain pixels"):

$$x_{t+1} = x_t + \epsilon_t \;\;\text{(simplified)} \qquad\longrightarrow\qquad x_{t+1} = \sqrt{1-\beta_t}\, x_t + \sqrt{\beta_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)$$

The cumulative form lets us jump from x_0 to x_t in one shot:

$$x_t = x_0 + \epsilon \;\text{ where } \epsilon = \epsilon_0 + \dots + \epsilon_{t-1} \;\;\text{(simplified)} \qquad\longrightarrow\qquad x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon, \;\text{ where } \bar\alpha_t = \prod_{i=1}^{t} (1-\beta_i)$$

[(Simplified from:) Ho et al. (2020): Denoising Diffusion Probabilistic Models]
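A sketch of one training step using the closed-form jump from x_0 to x_t, assuming PyTorch and a model(x_t, t) that predicts the cumulative noise; the linear beta schedule and shapes are illustrative assumptions:

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule (assumed)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # \bar{alpha}_t

def training_step(model, x0):
    """x0: a batch of clean images, shape (B, C, H, W)."""
    t = torch.randint(0, T, (x0.size(0),))        # 1. sample a time step...
    eps = torch.randn_like(x0)                    # ...and the cumulative noise
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps  # closed-form noisy image
    eps_hat = model(x_t, t)                       # prediction conditioned on t
    return ((eps - eps_hat) ** 2).mean()          # reconstruction loss, then SGD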
III. C. Sampling

Inference process:
1. Start from pure noise: sample x_T ~ 𝒩(0, 1).
2. Progressive denoising: for each step from t = T (pure noise) back to t = 0, repeat: run the Diffusion Model on the current image to predict the noise ε̂, then subtract it to obtain a less noisy image (x_T → … → x_t → x_{t−1} → … → x_0).

[(Simplified from:) Ho et al. (2020): Denoising Diffusion Probabilistic Models]
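A sketch of this loop, assuming PyTorch and the same model(x_t, t); how much of the predicted cumulative noise to subtract per step is our assumption here, whereas the exact DDPM update rescales x_t and re-injects fresh noise at every step:

import torch

def sample(model, shape=(1, 3, 64, 64), T=1000):
    x_t = torch.randn(shape)                      # 1. start from pure noise x_T
    for t in reversed(range(T)):                  # 2. progressive denoising
        eps_hat = model(x_t, torch.tensor([t]))   # predicted cumulative noise
        x_t = x_t - eps_hat / (t + 1)             # subtract a slice of it (assumed rule)
    return x_t                                    # approximately the clean image x_0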
III. D. Latent Diffusion

Instead of running the diffusion x_0 → … → x_t → … → x_T directly in pixel space, an encoder model (VAE) first compresses the image into a latent representation, discarding low-level details. This compressed representation still retains enough information about the original image, but in a more compact form. The noise steps +ε_0, +ε_1, …, +ε_t are applied in latent space, and the Diffusion Model takes z_t (plus a conditioning input y, when present) and predicts ε̂, the cumulative noise added from z_0 to z_t.

Why does it help? By working in latent space, the model deals with a much smaller vector or tensor compared to the high-dimensional pixel space of the full image. This makes the learning process faster and less computationally expensive, while still preserving the critical features of the data.

Note: During sampling (test time), you want to use a decoder to go from z_0 to a clean image x_0.

[(Simplified from:) Rombach et al. (2022): High-Resolution Image Synthesis with Latent Diffusion Models]
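A pipeline-level sketch, assuming placeholder callables vae_encoder, vae_decoder, and denoise (the III.C sampling loop run in latent space); none of these names are a real library API:

import torch

def encode_for_training(vae_encoder, x0):
    z0 = vae_encoder(x0)            # compress the image before any noise is added
    return z0                       # then train the diffusion model on z0 as in pixel space

def generate(denoise, vae_decoder, latent_shape=(1, 4, 8, 8)):
    z0 = denoise(shape=latent_shape)   # run the whole reverse diffusion in latent space
    return vae_decoder(z0)             # decoder: z0 -> clean full-resolution image x_0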
III. E. Results & Applications

Completed modules:
• C4M1: Foundations of Convolutional Neural Networks (slides)
• C4M2: Deep Convolutional Models (slides)

Quizzes (due by 11:00 a.m. PST, 30 minutes prior to the start of lecture time, unless otherwise noted):
• The basics of ConvNets
• Deep convolutional models

Programming Assignments (due by 11:00 a.m. PST, 30 minutes prior to the start of lecture time, unless otherwise noted):
• Convolutional Model: step by step
• Convolutional Model: application
• Keras Tutorial: This assignment is optional.
• Residual Networks