Diffusion
By Aryan Jain
Agenda
1. Theory of diffusion
2. Diffusion for image generation
3. Tricks to improve image synthesis models
4. Latent Diffusion Models
5. Examples of recent diffusion models
Diffusion
Density Modeling for Data Synthesis
VAE: learns a density p(x) by maximizing a variational lower bound (the ELBO) on the log-likelihood
GAN: learns a generator that maps noise to samples by playing an adversarial minimax game against a discriminator
Sampling from Noise
Diffusion
● Another kind of generative modeling technique that takes inspiration from physics (non-equilibrium statistical physics and stochastic differential equations, to be more exact)!
● Main idea: convert a well-known and simple base distribution (like a Gaussian) to the target (data) distribution iteratively, with small step sizes, via a Markov chain:
○ Treat the output of the Markov chain as the model’s approximation of the learned distribution
○ Inspiration? Estimating and analyzing small steps is more tractable than describing a single, non-normalizable jump from random noise to the learned distribution (which is what VAEs/GANs are doing)
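In symbols (standard diffusion notation, not written out on the slide): the model generates data by running a learned Markov chain backwards from pure noise,

p_\theta(x_0) = \int p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t) \, dx_{1:T}, \qquad p(x_T) = \mathcal{N}(0, I)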
Anatomy of a Diffusion Model
1. Forward Process
2. Reverse Process
Forward Process
● Take a datapoint x_0 and keep gradually adding very small amounts of Gaussian noise to it
○ Vary the parameters of the Gaussian according to a noise schedule controlled by beta_t
● Repeat this process for T steps: as the timesteps increase, more and more features of the original input are destroyed
● You can prove with some math that as T approaches infinity, you eventually end up with an isotropic Gaussian (i.e. pure random noise)
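For concreteness, the standard DDPM forward step is

q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)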
A neat (reparametrization) trick!
Define \alpha_t = 1 - \beta_t and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s.
Then x_t can be sampled directly from x_0 in a single step:

q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right), \quad \text{i.e.} \quad x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \;\; \epsilon \sim \mathcal{N}(0, I)
Reverse Process
● The goal of a diffusion model is to learn the reverse denoising process to iteratively undo the
forward process
● In this way, the reverse process appears as if it is generating new data from random noise!
Finding the exact distribution is hard
By Bayes’ rule, the true reverse conditional depends on the marginal distribution of every timestep:

q(x_{t-1} \mid x_t) = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1})}{q(x_t)}

● Computing the marginals q(x_t) requires integrating over the entire data distribution, which is computationally intractable (where else have we seen this dilemma?)
● However, we still need these conditionals to describe the reverse process. Can we approximate them somehow?
What should the distribution look like?
If each beta_t is small enough, every reverse step can be well approximated by a Gaussian too (take a course on stochastic differential equations if you want to learn more)! So we learn a Gaussian

p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)

such that p_\theta(x_{t-1} \mid x_t) \approx q(x_{t-1} \mid x_t).
A preliminary objective
The VAE (ELBO) loss is a bound on the true log-likelihood (also called the variational lower bound).
Expanding out,

L_{\text{VLB}} = \mathbb{E}_q\Big[ \underbrace{D_{KL}\big(q(x_T \mid x_0) \,\|\, p(x_T)\big)}_{L_T} + \sum_{t=2}^{T} \underbrace{D_{KL}\big(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}} - \underbrace{\log p_\theta(x_0 \mid x_1)}_{L_0} \Big]
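A useful fact that makes the KL terms above computable in closed form (standard DDPM algebra, not shown on the slide): once we also condition on x_0, the reverse step becomes a tractable Gaussian,

q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t I\right), \quad \tilde{\mu}_t(x_t, x_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t, \quad \tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t

(this \tilde{\beta}_t is the same quantity that reappears in the covariance discussion later).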
A simplified objective
Instead of predicting mu directly, Ho et al. say that we should predict the noise epsilon instead, reparametrizing the mean as

\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t)\right)

The authors of DDPM say that it’s fine to drop all the weighting terms in front and instead just use

L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\left[ \left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2 \right]

Note that this is not a variational lower bound on the log-likelihood anymore: in fact, you can view it as a reweighted version of the ELBO that emphasizes reconstruction quality!
Training
Sampling
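A minimal sketch of what these two algorithms look like in code, assuming a noise-prediction network eps_model(x_t, t); the function and variable names are illustrative, not from the slides:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t

def training_step(eps_model, x0):
    """One DDPM training step: add noise at a random timestep, predict it back."""
    t = torch.randint(0, T, (x0.shape[0],))               # t ~ Uniform({0..T-1})
    eps = torch.randn_like(x0)                            # eps ~ N(0, I)
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over image dims
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps          # sample from q(x_t | x_0)
    return ((eps - eps_model(x_t, t)) ** 2).mean()        # L_simple

@torch.no_grad()
def sample(eps_model, shape):
    """DDPM ancestral sampling: start from pure noise, denoise for T steps."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        eps = eps_model(x, torch.full((shape[0],), t))
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * z  # fixed variance sigma_t^2 = beta_t
    return x
```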
Diffusion for Image Generation
Forward process: converting the image distribution to pure noise
● A linear noise schedule converts the initial data to noise really quickly, making the reverse process harder for the model to learn.
● Researchers hypothesized that a cosine-like function that changes relatively gradually near the endpoints might work better (sketched below)
○ Note: it did end up working better, but this particular choice of cosine was fairly arbitrary
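A sketch of the cosine schedule from Improved DDPM (Nichol and Dhariwal); the offset s and the 0.999 clipping follow the paper, the function names are illustrative:

```python
import math

def cosine_alpha_bars(T, s=0.008):
    # \bar{alpha}_t = f(t) / f(0), with f(t) = cos^2(((t/T) + s) / (1 + s) * pi/2)
    f = lambda t: math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(T + 1)]

def cosine_betas(T, s=0.008, max_beta=0.999):
    # Recover per-step betas: beta_t = 1 - \bar{alpha}_t / \bar{alpha}_{t-1},
    # clipped away from 1 to avoid numerical blow-ups near t = T
    ab = cosine_alpha_bars(T, s)
    return [min(1 - ab[t + 1] / ab[t], max_beta) for t in range(T)]
```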
● DDPM authors said that it’s better to use a fixed covariance matrix

\Sigma_\theta(x_t, t) = \sigma_t^2 I, \quad \text{where } \sigma_t^2 = \beta_t \text{ or } \sigma_t^2 = \tilde{\beta}_t

○ The intuition is that covariance does not contribute as significantly as the mean does to the learned conditional distributions during the reverse process
○ However, learning it can still help us improve log-likelihood!
● So, Nichol and Dhariwal propose learning the covariance as an interpolation between the two extremes (in log space):

\Sigma_\theta(x_t, t) = \exp\!\left(v \log \beta_t + (1 - v) \log \tilde{\beta}_t\right)

This modification leads to better likelihood estimates while maintaining image quality!
Architecture Improvements
Nichol and Dhariwal proposed several architectural changes that seem to help diffusion training:
1. Increasing model width rather than depth: both help, but increasing width is computationally cheaper while providing similar gains to increased depth
2. Increasing the number of attention heads and applying attention at multiple resolutions
3. Stealing BigGAN residual blocks for upsampling and downsampling
4. Adaptive Group Normalization, which aims to better incorporate timestep (and potentially class) information during the training/reverse process (see the sketch after this list)
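A rough sketch of point 4. The module layout and the (1 + scale) residual formulation are my assumptions; the paper describes it as AdaGN(h, y) = y_s · GroupNorm(h) + y_b, where y comes from the timestep/class embedding:

```python
import torch.nn as nn

class AdaGN(nn.Module):
    """Adaptive Group Normalization (sketch): a timestep/class embedding
    produces a per-channel scale and shift applied after GroupNorm."""
    def __init__(self, channels, emb_dim, groups=32):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels)
        self.proj = nn.Linear(emb_dim, 2 * channels)  # embedding -> (scale, shift)

    def forward(self, h, emb):
        scale, shift = self.proj(emb).chunk(2, dim=-1)
        # broadcast (B, C) -> (B, C, 1, 1) to match image feature maps
        return self.norm(h) * (1 + scale[..., None, None]) + shift[..., None, None]
```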
Classifier Guidance
Recall conditional GANs from the previous lectures: they can be conditioned on class labels to
synthesize specific kinds of images. We can apply the same idea to diffusion!
At a high level: train a classifier on noisy images x_t, then use its gradients to nudge each reverse diffusion step toward the desired class label.
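Written out (reconstructed from the Diffusion Models Beat GANs paper; s is the guidance scale), each reverse step samples around a shifted mean:

\tilde{\mu} = \mu_\theta(x_t, t) + s\, \Sigma_\theta(x_t, t)\, \nabla_{x_t} \log p_\phi(y \mid x_t)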
Evaluating the samples:
● FID and sFID capture image quality (lower FID/sFID is better)
● Precision measures image fidelity (“resemblance to training images”)
● Recall measures image diversity/distribution coverage
Higher Precision and Recall is better
Diffusion Models Beat GANs
Classifier-free guidance vs. classifier guidance:
1. Classifier-free guidance is pretty simple to implement, while classifier guidance requires training an external classifier on noisy data
2. Classifier-free guidance does not employ any kind of classifier gradients and so cannot be interpreted as an adversarial attack: it is more in line with traditional generative models
3. Classifier-free guidance is slower than classifier guidance by virtue of requiring two model evaluations (conditional and unconditional) per reverse diffusion step
4. Both lower sample diversity to increase sample fidelity/quality: is this really acceptable?
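For reference, the classifier-free guidance rule (from Ho and Salimans; w is the guidance weight and \varnothing denotes the null/unconditional input):

\tilde{\epsilon}_\theta(x_t, c) = (1 + w)\, \epsilon_\theta(x_t, c) - w\, \epsilon_\theta(x_t, \varnothing)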
Diversity (unguided samples) vs. Fidelity (guided samples)
Latent Diffusion Models
Latent Diffusion Models Motivation
● Training models in the pixel space is excessively computationally expensive (training can easily take multiple days on a V100 GPU)
○ Even image synthesis is very slow compared to GANs
○ Images are high dimensional → more things to model
● Researchers observed that most “bits” of an image contribute to its perceptual characteristics, since aggressively compressing an image usually maintains its semantic and conceptual composition
○ In layman’s terms, more bits go toward describing pixel-level details and fewer bits toward describing “the meaning” within an image
○ Generative models should learn the latter
● Can we separate these two components?
Latent Diffusion Models
The idea boils down to two stages:
1. Training an autoencoder that compresses images into a lower-dimensional latent space while preserving their semantic content
2. Performing the diffusion process in this latent space. There are several benefits to this (see the sketch after this list):
a. The diffusion process only focuses on the relevant semantic bits of the data
b. Performing diffusion in a low-dimensional space is significantly more efficient
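A minimal sketch of that two-stage recipe, reusing the hypothetical training_step and sample functions from the DDPM sketch earlier; encoder and decoder stand in for a pretrained, frozen stage-1 autoencoder:

```python
import torch

def ldm_training_step(encoder, eps_model, x0):
    # Stage 1 (frozen): compress the image into a compact latent
    with torch.no_grad():
        z0 = encoder(x0)
    # Stage 2: ordinary DDPM training, but on latents instead of pixels
    return training_step(eps_model, z0)

@torch.no_grad()
def ldm_sample(decoder, eps_model, latent_shape):
    z = sample(eps_model, latent_shape)  # reverse diffusion in latent space
    return decoder(z)                    # decode the latent back into an image
```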
U-Net: the denoising network epsilon_theta is typically a U-Net, an encoder-decoder with skip connections between matching resolutions.
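A toy illustration of the U-Net shape; this is not the paper’s architecture, and it omits the timestep embedding, attention, and residual blocks that real diffusion U-Nets use:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Downsample, bottleneck, upsample, with a skip connection
    joining matching resolutions."""
    def __init__(self, ch=64):
        super().__init__()
        self.down1 = nn.Conv2d(3, ch, 3, stride=2, padding=1)       # H -> H/2
        self.down2 = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)  # H/2 -> H/4
        self.mid = nn.Conv2d(ch * 2, ch * 2, 3, padding=1)
        self.up1 = nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1)  # H/4 -> H/2
        self.up2 = nn.ConvTranspose2d(ch * 2, 3, 4, stride=2, padding=1)   # H/2 -> H

    def forward(self, x):
        d1 = torch.relu(self.down1(x))
        d2 = torch.relu(self.down2(d1))
        m = torch.relu(self.mid(d2))
        u1 = torch.relu(self.up1(m))
        u1 = torch.cat([u1, d1], dim=1)  # skip connection: reuse encoder features
        return self.up2(u1)
```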
Examples of Recent
Diffusion Models
DALLE 2 (Text-to-Image)
Teddy bears mixing sparkling chemicals as mad scientists · An astronaut riding a horse in a photorealistic style · A bowl of soup as a planet in the universe
Imagen (Text-to-Image)
A cute corgi lives in a house made of sushi · A majestic oil painting of a raccoon Queen wearing red French royal gown · A robot couple fine-dining with the Eiffel Tower in the background
Video Diffusion (Text-to-Video)
Make-A-Video (Text-to-Video)
An artist’s brush painting on a canvas, close up · A young couple walking in heavy rain · Horse drinking water
Make-A-Video (Text-to-Video)
A confused grizzly bear in a calculus class · A golden retriever eating ice cream on a beautiful tropical beach at sunset, high resolution · A panda playing on a swing set
Imagen Video (Text-to-Video)
DreamFusion (Text-to-3D)
a corgi wearing a beret and holding a baguette, standing up on two hind legs · a human skeleton drinking a glass of red wine
Diffuser (Trajectory Planning)
Diffusion-QL (Offline RL)
We went over
● A quick tour of generative modeling and how image synthesis can be viewed as sampling from
a density
● Preliminary theory of diffusion (don’t worry if this is confusing; this is a very theory-rich subject and even I don’t know all the details!)
● Some tricks that modern diffusion models employ for image generation:
○ A U-Net architecture equipped with all kinds of modifications
○ Other architecture improvements
○ Several implementation tricks (different noise schedules, covariance parametrizations)
○ Classifier and classifier-free guidance
● Latent diffusion models for improving diffusion quality and efficiency
Main papers referenced here!
● Deep Unsupervised Learning using Nonequilibrium Thermodynamics: https://arxiv.org/pdf/1503.03585.pdf
● Denoising Diffusion Probabilistic Models: https://arxiv.org/pdf/2006.11239.pdf
● Improved Denoising Diffusion Probabilistic Models: https://arxiv.org/pdf/2102.09672.pdf
● Diffusion Models Beat GANs on Image Synthesis: https://arxiv.org/pdf/2105.05233.pdf
● Classifier-free Diffusion Guidance: https://arxiv.org/pdf/2207.12598.pdf
● High-Resolution Image Synthesis with Latent Diffusion Models: https://arxiv.org/pdf/2112.10752.pdf

Disclaimer: some of the foundational work done on diffusion is relatively math and notation heavy!

Other cool papers to check out!
● Denoising Diffusion Implicit Models: https://arxiv.org/pdf/2010.02502.pdf
● Generative Modeling by Estimating Gradients of the Data Distribution: https://yang-song.net/blog/2021/score/
● Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions: https://arxiv.org/pdf/2209.11215.pdf