
CS230: Lecture 5

Generative Models
Kian Katanforoosh

Today's outline

I. Introduction to Generative Models
II. Generative Adversarial Networks
III. Diffusion Models

I. Introduction to Generative Models

What are examples of use-cases for generative modeling?

I. Introduction to Generative Models

From “discriminative” to “generative” models:


In traditional machine learning, models are trained to discriminate (e.g., classify an
image as a cat or not a cat). Generative models, on the other hand, learn the underlying
distribution of the data and can create new data points (e.g., generate new cat
images). This shift from recognition to creation is crucial for tasks like simulation,
prediction, and creativity.

Generative modeling is promising for human & AI collaborations:


AI systems that can generate creative works—art, music, writing—are becoming co-creators with humans in fields where creativity is key.

I. Introduction to Generative Models
(Figure) Example use-cases: text-to-image synthesis, image in-painting, super-resolution.

[Rombach et al. (2022): High-Resolution Image Synthesis with Latent Diffusion Models]
I. Introduction to Generative Models

The approach is unsupervised:


Collect a lot of data, use it to train a model to generate similar data from scratch.
Intuitively, why does it work? The number of parameters of the model is much smaller than the amount of data (parameters << data), so the model cannot simply memorize examples; it has to capture the underlying structure of the data distribution.

I. Introduction to Generative Models

Probability distributions (figure): samples from the "real data distribution" and samples from the "generated distribution", each shown as points in image space. Goal: match the generated distribution to the real data distribution.

[Zhang et al. (2017): StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks]
Today’s outline

I. Introduction to Generative Models
II. Generative Adversarial Networks
III. Diffusion Models

II. Generative Adversarial Networks (GANs)

A. G/D Game

B. Training GANs

C. Nice results

[Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy (2015): Explaining and harnessing adversarial examples]

II.A - G/D Game
(Figure) A 100-d random code z (a vector with entries such as 0.47, ..., 0.19) is fed to the Generator "G" (a neural network), which outputs a (64, 64, 3) generated image.

How can we train G to generate images from the true data distribution?

[Zhang et al. (2017): StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks]
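Below is a minimal sketch of such a generator in PyTorch, mapping a 100-d code to a (64, 64, 3) image. The DCGAN-style transposed-convolution stack and layer sizes are assumptions for illustration, not the lecture's exact architecture.

import torch
import torch.nn as nn

class Generator(nn.Module):
    # Maps a 100-d random code z to a (3, 64, 64) image with pixel values in [-1, 1].
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0, bias=False),   # (z_dim,1,1) -> (512,4,4)
            nn.BatchNorm2d(ch * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1, bias=False),  # -> (256,8,8)
            nn.BatchNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1, bias=False),  # -> (128,16,16)
            nn.BatchNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1, bias=False),      # -> (64,32,32)
            nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1, bias=False),           # -> (3,64,64)
            nn.Tanh(),
        )

    def forward(self, z):
        # z has shape (batch, 100); reshape to (batch, 100, 1, 1) for the conv stack
        return self.net(z.view(z.size(0), -1, 1, 1))

G = Generator()
z = torch.randn(1, 100)            # 100-d random code
fake_image = G(z)                  # shape (1, 3, 64, 64)
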
II.A - G/D Game
(Figure) The 100-d random code z goes through the Generator "G" (neural network) to give a (64, 64, 3) generated image. The Discriminator "D" (neural network) takes an image x and performs binary classification:

y = 0 if x = G(z) (generated image)
y = 1 otherwise (real image from the database)

Run Gradient Descent simultaneously on two minibatches (true data / generated data); the gradients from D's classification flow back to update both networks.

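And a matching sketch of the discriminator, a binary classifier that outputs the probability that its input image is real (again an assumed DCGAN-style architecture, not the lecture's exact one).

import torch.nn as nn

class Discriminator(nn.Module):
    # Binary classifier: (3, 64, 64) image -> probability that the image is real (y = 1).
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1, bias=False),             # (3,64,64) -> (64,32,32)
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1, bias=False),        # -> (128,16,16)
            nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1, bias=False),    # -> (256,8,8)
            nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 4, ch * 8, 4, 2, 1, bias=False),    # -> (512,4,4)
            nn.BatchNorm2d(ch * 8), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 8, 1, 4, 1, 0, bias=False),         # -> (1,1,1)
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1)   # shape (batch,), values in (0, 1)
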
II.A - G/D Game
(Figure, same setup as above) End goal: G is outputting images that are indistinguishable from real images for D, i.e., the distribution of generated images matches the distribution of real images in image space.

II. Generative Adversarial Networks (GANs)

A. G/D Game

B. Training GANs

C. Nice results

[Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy (2015): Explaining and harnessing adversarial examples] Kian Katanforoosh
II.B - Training GANs

Training procedure, we want to minimize:

Labels: y_real is always 1, y_gen is always 0.

• The cost of the discriminator:

J^(D) = -(1/m_real) Σ_{i=1..m_real} y_real^(i) · log(D(x^(i)))  -  (1/m_gen) Σ_{i=1..m_gen} (1 - y_gen^(i)) · log(1 - D(G(z^(i))))

cross-entropy 1: "D should correctly predict real data as 1"
cross-entropy 2: "D should correctly predict generated data as 0"

• The cost of the generator:

J^(G) = -J^(D) = (1/m_gen) Σ_{i=1..m_gen} log(1 - D(G(z^(i))))
(the real-data term of -J^(D) does not depend on G, so only the generated-data term is kept)

"G should try to fool D: by minimizing the opposite of what D is trying to minimize"

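A sketch of these two costs in code (PyTorch), assuming the Generator and Discriminator sketches above, where D outputs a probability. Binary cross-entropy implements both cross-entropy terms of J^(D).

import torch
import torch.nn.functional as F

def discriminator_cost(D, G, x_real, z):
    # J(D): D should predict real data as 1 and generated data as 0.
    d_real = D(x_real)                       # D(x^(i))
    d_fake = D(G(z).detach())                # D(G(z^(i))); detach: this step does not update G
    j_real = F.binary_cross_entropy(d_real, torch.ones_like(d_real))    # cross-entropy 1
    j_fake = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))   # cross-entropy 2
    return j_real + j_fake

def generator_cost_saturating(D, G, z):
    # J(G) = (1/m_gen) * sum_i log(1 - D(G(z^(i)))): the opposite of what D minimizes.
    d_fake = D(G(z))
    return torch.log(1.0 - d_fake + 1e-8).mean()
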
II.B - Training GANs

Saturating cost for the generator:

min_G (1/m_gen) Σ_{i=1..m_gen} log(1 - D(G(z^(i))))   ⇔   max_G (1/m_gen) Σ_{i=1..m_gen} log(D(G(z^(i))))   ⇔   min_G -(1/m_gen) Σ_{i=1..m_gen} log(D(G(z^(i))))

Non-saturating cost:   J^(G) = -(1/m_gen) Σ_{i=1..m_gen} log(D(G(z^(i))))

Saturating cost:       J^(G) = (1/m_gen) Σ_{i=1..m_gen} log(1 - D(G(z^(i))))

(Figure: both generator costs plotted against D(G(z)) ∈ [0, 1]. Near D(G(z)) = 0, i.e., early in training when D easily rejects generated samples, the saturating cost is almost flat, so the generator receives little gradient; the non-saturating cost is steep there and provides a much stronger learning signal.)

[Ian Goodfellow (2014): NIPS Tutorial: GANs]

II.B - Training GANs

Note that:

min_G (1/m_gen) Σ_{i=1..m_gen} log(1 - D(G(z^(i))))   ⇔   max_G (1/m_gen) Σ_{i=1..m_gen} log(D(G(z^(i))))   ⇔   min_G -(1/m_gen) Σ_{i=1..m_gen} log(D(G(z^(i))))

New training procedure, we want to minimize:

J^(D) = -(1/m_real) Σ_{i=1..m_real} y_real^(i) · log(D(x^(i)))  -  (1/m_gen) Σ_{i=1..m_gen} (1 - y_gen^(i)) · log(1 - D(G(z^(i))))

cross-entropy 1: "D should correctly label real data as 1"
cross-entropy 2: "D should correctly label generated data as 0"

J^(G) = -(1/m_gen) Σ_{i=1..m_gen} log(D(G(z^(i))))      "G should try to fool D: by minimizing this"

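The non-saturating generator cost, in the same sketch style as before (equivalently, binary cross-entropy of D(G(z)) against a target of 1):

import torch

def generator_cost_non_saturating(D, G, z):
    # J(G) = -(1/m_gen) * sum_i log(D(G(z^(i))))
    d_fake = D(G(z))
    return -torch.log(d_fake + 1e-8).mean()
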
[Lucic, Kurach et al. (2018): Are GANs Created Equal? A Large-Scale Study]

II.B - Training GANs

Simultaneously training G/D?

for num_iterations:
    for k iterations:
        update D
    update G

(Figure: the same saturating vs. non-saturating generator cost curves as above, plotted against D(G(z)).)

[Ian Goodfellow (2014): NIPS Tutorial: GANs]
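A sketch of this alternating loop (PyTorch), assuming the cost functions sketched above and a data_loader that yields batches of real images; the optimizer choice, k, and the other hyperparameters are illustrative, not prescribed by the lecture.

import torch

def train_gan(G, D, data_loader, num_iterations, k=1, z_dim=100, lr=2e-4):
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
    real_batches = iter(data_loader)

    for _ in range(num_iterations):
        # Update D for k iterations on (true data / generated data) minibatches
        for _ in range(k):
            try:
                x_real = next(real_batches)
            except StopIteration:
                real_batches = iter(data_loader)
                x_real = next(real_batches)
            z = torch.randn(x_real.size(0), z_dim)
            opt_d.zero_grad()
            discriminator_cost(D, G, x_real, z).backward()
            opt_d.step()

        # Then update G once, using the non-saturating cost
        z = torch.randn(x_real.size(0), z_dim)
        opt_g.zero_grad()
        generator_cost_non_saturating(D, G, z).backward()
        opt_g.step()
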

II. Generative Adversarial Networks (GANs)

A. G/D Game

B. Training GANs

C. Nice results

[Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy (2015): Explaining and harnessing adversarial examples] Kian Katanforoosh
II.C - Nice results

Operation on codes (figure): each 100-d code is fed to the Generator "G" (neural network) to produce a (64, 64, 3) generated image; codes can also be combined with vector arithmetic (Code 1 - Code 2 + Code 3) and the result fed to G.

Man with glasses - man + woman = woman with glasses

[Radford et al. (2015): Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks]
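A sketch of that operation on codes, assuming a trained generator G like the one above. In practice each code would be obtained (or averaged) from samples showing the attribute; the random codes here are only placeholders.

import torch

# Placeholder codes; in practice: averages of codes whose generated images show each attribute.
z_man_glasses = torch.randn(1, 100)   # Code 1: "man with glasses"
z_man = torch.randn(1, 100)           # Code 2: "man"
z_woman = torch.randn(1, 100)         # Code 3: "woman"

z_new = z_man_glasses - z_man + z_woman   # Code 1 - Code 2 + Code 3
image = G(z_new)                          # expected: an image of a woman with glasses
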
II.C - Nice results

Face Generation:

[Karras et al. (2018): A Style-Based Generator Architecture for Generative Adversarial Networks]

https://www.youtube.com/watch?v=kSLJriaOumA&feature=youtu.be
II.C - Nice results

Image Generation:
Samples from the “generated distribution”

[Zhang et al. (2017): StackGAN++]


II.C - Nice results

[Liu et al. (2017): Unsupervised Image-to-Image Translation Networks] Kian Katanforoosh


II.C - Nice results

[Zhu, Park et al. (2017): Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks] Kian Katanforoosh
II.C - Nice results
Goal: Convert horses to zebras on images, and vice-versa.

Data? Architecture? Cost function?

(Figure) Data: unpaired images, i.e., a set of horse images and a set of zebra images.

[Zhu, Park et al. (2017): Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks] Kian Katanforoosh
II.C - Nice results
Architecture (figure): two generator/discriminator pairs.

- Generator1 (H2Z) maps a horse image H to a zebra-style image G1(H). Discriminator1 classifies: y = 0 if x = G1(H), y = 1 otherwise (x = z, a real zebra image).
- Generator2 (Z2H) maps a zebra image Z to a horse-style image G2(Z). Discriminator2 classifies: y = 0 if x = G2(Z), y = 1 otherwise (x = h, a real horse image).
- Cycle: G2(G1(H)) should reconstruct H, and G1(G2(Z)) should reconstruct Z.

[Zhu, Park et al. (2017): Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks]
II.C - Nice results
Loss to minimize?

J^(D1) = -(1/m_real) Σ_{i=1..m_real} log(D1(z^(i)))  -  (1/m_gen) Σ_{i=1..m_gen} log(1 - D1(G1(H^(i))))
(D1: y = 0 if x = G1(H), y = 1 otherwise, i.e., x is a real zebra z)

J^(G1) = -(1/m_gen) Σ_{i=1..m_gen} log(D1(G1(H^(i))))

J^(D2) = -(1/m_real) Σ_{i=1..m_real} log(D2(h^(i)))  -  (1/m_gen) Σ_{i=1..m_gen} log(1 - D2(G2(Z^(i))))
(D2: y = 0 if x = G2(Z), y = 1 otherwise, i.e., x is a real horse h)

J^(G2) = -(1/m_gen) Σ_{i=1..m_gen} log(D2(G2(Z^(i))))

J_cycle = (1/m_gen) Σ_{i=1..m_gen} ||G2(G1(H^(i))) - H^(i)||_1  +  (1/m_gen) Σ_{i=1..m_gen} ||G1(G2(Z^(i))) - Z^(i)||_1

Total loss:  J = J^(D1) + J^(G1) + J^(D2) + J^(G2) + λ·J_cycle

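A compact sketch of these terms (PyTorch), assuming G1 and G2 are image-to-image generators and D1, D2 are discriminators with sigmoid outputs as in the earlier sketches; λ = 10 is illustrative. In practice the D terms and G terms are minimized in alternating steps rather than as one sum.

import torch
import torch.nn.functional as F

def cyclegan_total_loss(G1, G2, D1, D2, H, Z, lam=10.0):
    # H: minibatch of real horse images, Z: minibatch of real zebra images (same batch size).
    fake_Z = G1(H)   # horse -> zebra
    fake_H = G2(Z)   # zebra -> horse

    # J(D1): real zebras labeled 1, generated zebras labeled 0
    d1_real, d1_fake = D1(Z), D1(fake_Z.detach())
    j_d1 = F.binary_cross_entropy(d1_real, torch.ones_like(d1_real)) \
         + F.binary_cross_entropy(d1_fake, torch.zeros_like(d1_fake))

    # J(D2): real horses labeled 1, generated horses labeled 0
    d2_real, d2_fake = D2(H), D2(fake_H.detach())
    j_d2 = F.binary_cross_entropy(d2_real, torch.ones_like(d2_real)) \
         + F.binary_cross_entropy(d2_fake, torch.zeros_like(d2_fake))

    # J(G1), J(G2): each generator tries to fool its discriminator (non-saturating cost)
    j_g1 = -torch.log(D1(fake_Z) + 1e-8).mean()
    j_g2 = -torch.log(D2(fake_H) + 1e-8).mean()

    # J_cycle: L1 reconstruction after the round trips H -> Z -> H and Z -> H -> Z
    j_cycle = (G2(fake_Z) - H).abs().mean() + (G1(fake_H) - Z).abs().mean()

    return j_d1 + j_g1 + j_d2 + j_g2 + lam * j_cycle
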
II.C - Nice results

CycleGANs:

Face2ramen

+ Face detection

[Shu Naritomi et al.: Face2Ramen]


[Takuya Tako: Face2Ramen using CycleGAN]
[Zhu, Park et al. (2017): Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks] Kian Katanforoosh
II.C - Nice results

Pix2Pix:

https://affinelayer.com/pixsrv/ by Christopher Hesse.

[Isola et al. (2017): Image-to-Image Translation with Conditional Adversarial Networks]
II.C - Nice results

Super-resolution

[Ledig et al. (2016): Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network] Kian Katanforoosh
II.C - Nice results

Motion retargeting of video subjects: https://www.youtube.com/watch?

[Chan et al. (2018): Everybody Dance Now] Kian Katanforoosh


II.C - Nice results

Other applications of GANs:

• Beaulieu-Jones et al., Privacy-preserving generative deep neural


networks support clinical data sharing.

• Hwang et al., Learning Beyond Human Expertise with Generative Models


for Dental Restorations.

• Gomez et al., Unsupervised cipher cracking using discrete GANs.

• Many more…

Today’s outline

I. Introduction to Generative Models
II. Generative Adversarial Networks
III. Diffusion Models

III. Diffusion models

A. Basic principles
B. Loss function and training
C. Sampling
D. Latent diffusion
E. Results and applications

III. A. Basic Principles

Limitations of GANs:
- Mode collapse: G learns to "cheat" by focusing on a narrow set of outputs that
consistently fool D, rather than exploring the full range of the data distribution.
- Adversarial training leads to instability, such as vanishing gradients, making it difficult to converge.

Ideally, we want to:


- avoid mode collapse by modeling the entire data distribution in a gradual,
probabilistic manner.
- have more stable gradients with a well-defined, non-adversarial task (and loss).

Diffusion models are generative models that progressively add noise to data and learn to reverse this process to generate new samples.

[Dhariwal & Nichol (2021): Diffusion Models Beat GANs on Image Synthesis]
III. A. Basic Principles

Forward process (diffusion): starting from a clean image x_0, repeatedly add noise to obtain x_1, ..., x_t, x_{t+1}, ..., x_T.

x_{t+1} = x_t + ε_t,  where ε_t ~ 𝒩(0, 1) is random Gaussian noise sampled at each step
(pixels retained from the previous image, plus pixels of random Gaussian noise)

Expanding this formula for t steps:

x_t = x_0 + ε,  where ε = ε_0 + . . . + ε_{t−1}

[(Simplified from:) Ho et al. (2020): Denoising Diffusion Probabilistic Models]
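A toy sketch of this simplified forward process (the "image" here is just a placeholder tensor):

import torch

def simple_forward(x0, T):
    # Simplified diffusion: x_{t+1} = x_t + eps_t, with eps_t ~ N(0, 1) sampled at each step.
    xs = [x0]
    for _ in range(T):
        xs.append(xs[-1] + torch.randn_like(x0))
    return xs                          # [x_0, x_1, ..., x_T]

x0 = torch.zeros(3, 64, 64)            # stand-in for a clean image
trajectory = simple_forward(x0, T=50)
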
III. A. Basic Principles

Reverse process (denoising)

(Figure) The diffusion model takes the noisy image x_t and predicts ε̂, the cumulative noise added from x_0 to x_t.

Reconstruction loss:  ℒ = ∥ε − ε̂∥₂²

where ε is the ground-truth noise representing the difference between the clean and noisy image at time step t, and ε̂ is the model's prediction of the noise added to the clean image after t steps. Subtracting ε̂ from x_t gives approximately x_0.

[(Simplified from:) Ho et al. (2020): Denoising Diffusion Probabilistic Models]
III. B. Loss function and training
The index t is also provided to the model to condition its noise prediction.

Training process:

1. Sample training tuples from a database of images: (noisy image x_t, index of time step t, cumulative noise ε added), e.g., (..., t = 5, ε), (..., t = 45, ε), (..., t = 3, ε), ..., (..., t = 19, ε). Note that all these epsilons are different.

2. Stochastic Gradient Descent on the reconstruction loss:  ℒ = ∥ε − ε̂∥₂²

[Ho et al. (2020): Denoising Diffusion Probabilistic Models]
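A sketch of one training step under the simplified formulation x_t = x_0 + ε (PyTorch). Here model stands for any network that takes (x_t, t) and predicts ε̂; the architecture and the value of T are assumptions. (The actual noising formula, with the β/ᾱ schedule, is given on the next slide.)

import torch
import torch.nn.functional as F

def training_step(model, optimizer, x0_batch, T=50):
    # One SGD step on L = ||eps - eps_hat||^2, simplified forward process x_t = x_0 + eps.
    m = x0_batch.size(0)
    t = torch.randint(1, T + 1, (m,))                           # a different time step per image
    # eps = eps_0 + ... + eps_{t-1} is a sum of t unit Gaussians, i.e., sqrt(t) * N(0, 1)
    eps = torch.randn_like(x0_batch) * t.float().view(-1, 1, 1, 1).sqrt()
    x_t = x0_batch + eps                                         # noisy image
    eps_hat = model(x_t, t)                                      # model also receives the index t
    loss = F.mse_loss(eps_hat, eps)                              # L = ||eps - eps_hat||_2^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
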


III. B. Loss function and training

In practice, the math is more complicated than this:

β_t is a variance schedule controlling the noise scale at each time step, and ε ~ 𝒩(0, 1).

Simplified: x_{t+1} = x_t + ε_t
Actual:     x_{t+1} = √(1 − β_t) · x_t + √β_t · ε
            ("erase/shrink pixels of the previous image" + "add scaled random Gaussian noise")

Simplified: x_t = x_0 + ε, where ε = ε_0 + . . . + ε_{t−1}
Actual:     x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, where ᾱ_t = ∏_{i=1}^{t} (1 − β_i)

[(Simplified from:) Ho et al. (2020): Denoising Diffusion Probabilistic Models]
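A sketch of the closed-form noising step x_t = √ᾱ_t·x_0 + √(1−ᾱ_t)·ε (PyTorch); the linear β schedule values are illustrative.

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # variance schedule beta_t
alpha_bars = torch.cumprod(1.0 - betas, dim=0)      # alpha_bar_t = prod_{i=1..t} (1 - beta_i)

def q_sample(x0, t):
    # Jump directly to x_t: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    eps = torch.randn_like(x0)                       # eps ~ N(0, 1)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)          # broadcast over the image dimensions
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps, eps

x0 = torch.randn(8, 3, 64, 64)                       # batch of (normalized) clean images
t = torch.randint(0, T, (8,))                        # one time step index per image
x_t, eps = q_sample(x0, t)
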
III. C. Sampling

Inference process:

1. Initialization: start with a random image x_T sampled from a Gaussian.

2. Progressive denoising: for each step from t = T (pure noise) back to t = 0, repeat:
- Predict the cumulative noise that would have been added to x_0 to get x_t.
- Subtract the predicted cumulative noise from x_t.

(Figure: at each step, the diffusion model's prediction is subtracted from the current image.)

[Ho et al. (2020): Denoising Diffusion Probabilistic Models]
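A sketch of this denoising loop at the level of the slide's simplified formulation (predict the cumulative noise, then step part of the way back toward the estimated x_0). Real DDPM sampling uses the β/ᾱ schedule and injects fresh noise at each step, so this is intuition only; model is the same assumed network as before.

import torch

@torch.no_grad()
def sample(model, shape=(1, 3, 64, 64), T=50):
    x_t = torch.randn(shape)                          # 1. Initialization: x_T ~ Gaussian
    for t in reversed(range(1, T + 1)):               # 2. Progressive denoising, t = T ... 1
        t_batch = torch.full((shape[0],), t)
        eps_hat = model(x_t, t_batch)                 # predicted cumulative noise from x_0 to x_t
        x0_hat = x_t - eps_hat                        # estimate of the clean image
        x_t = x0_hat + (t - 1) / t * (x_t - x0_hat)   # remove 1/t of the predicted noise
    return x_t                                        # approximately x_0
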


III. D. Latent Diffusion

(Figure) The image x_0 is first mapped by an encoder model (VAE) to a latent code z_0; the forward diffusion (+ε_0, +ε_1, ..., +ε_t) is then applied in latent space, producing z_1, z_2, ..., z_t, ..., z_T. The diffusion model takes z_t and predicts ε̂, the cumulative noise added from z_0 to z_t.

Latent space is a lower-dimensional representation of the original data. In this space, the encoder captures the most important features or patterns of the image while ignoring irrelevant details. This compressed representation still retains enough information about the original image, but in a more compact form.

Why does it help? By working in latent space, the model deals with a much smaller vector or tensor compared to the high-dimensional pixel space of the full image. This makes the learning process faster and less computationally expensive, while still preserving the critical features of the data.

Note: During sampling (test time), you want to use a decoder to go from z_0 to a clean image x_0.

[(Simplified from:) Rombach et al. (2022): High-Resolution Image Synthesis with Latent Diffusion Models]
III. D. Latent Diffusion

(Figure, same setup as above) The only addition: the diffusion model also receives y, an embedded text prompt, so that generation can be conditioned on a prompt (as in DALL·E).

[(Simplified from:) Rombach et al. (2022): High-Resolution Image Synthesis with Latent Diffusion Models]
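Putting the pieces together as a sketch: denoise in latent space (optionally conditioned on an embedded text prompt y), then decode. The decoder, denoiser, and latent shape are assumptions, not a specific library's API; the update step is the same simplified one as before.

import torch

@torch.no_grad()
def latent_diffusion_sample(denoiser, decoder, y=None, latent_shape=(1, 4, 8, 8), T=50):
    z_t = torch.randn(latent_shape)                    # z_T ~ Gaussian, in latent space
    for t in reversed(range(1, T + 1)):
        t_batch = torch.full((latent_shape[0],), t)
        eps_hat = denoiser(z_t, t_batch, y)            # y: embedded text prompt (or None)
        z0_hat = z_t - eps_hat
        z_t = z0_hat + (t - 1) / t * (z_t - z0_hat)    # same simplified denoising step as before
    return decoder(z_t)                                # decode z_0 back to an image x_0
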
III. E. Results & Applications

[Google (2022): Imagen]


[Meta (2022): Make-A-Video]
[OpenAI (2022): Dall-E] Kian Katanforoosh
Announcements

For next week:

Completed modules:
• C4M1: Foundations of Convolutional Neural Network (slides)
• C4M2: Deep Convolutional Models (slides)

Quizzes (due by 11:00 a.m. PST, 30 minutes prior to the start of lecture time, unless otherwise noted):
• The basics of ConvNets
• Deep convolutional models
Programming Assignments (due by 11:00 a.m. PST, 30 minutes prior to the start of lecture time,
unless otherwise noted):
• Convolutional Model: step by step
• Convolutional Model: application
• Keras Tutorial: This assignment is optional.
• Residual Networks

