Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion

Hoogeboom, Emiel; Mensink, Thomas; Heek, Jonathan; Lamerigts, Kay; Gao, Ruiqi; Salimans, Tim


arXiv:2410.19324 (cs)

[Submitted on 25 Oct 2024 (v1), last revised 22 Mar 2025 (this version, v2)]

Abstract: Latent diffusion models have become the popular choice for scaling up diffusion models for high resolution image synthesis. Compared to pixel-space models that are trained end-to-end, latent models are perceived to be more efficient and to produce higher image quality at high resolution. Here we challenge these notions, and show that pixel-space models can be very competitive with latent models in both quality and efficiency, achieving 1.5 FID on ImageNet512 and new SOTA results on ImageNet128, ImageNet256 and Kinetics600.
We present a simple recipe for scaling end-to-end pixel-space diffusion models to high resolutions. 1: Use the sigmoid loss-weighting (Kingma & Gao, 2023) with our prescribed hyper-parameters. 2: Use our simplified memory-efficient architecture with fewer skip-connections. 3: Scale the model to favor processing the image at a high resolution with fewer parameters, rather than using more parameters at a lower resolution. Combining these with guidance intervals, we obtain a family of pixel-space diffusion models we call Simpler Diffusion (SiD2).
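
To make two ingredients of the recipe concrete, here is a minimal sketch in plain Python, not the authors' implementation. It assumes the sigmoid loss weighting takes the form w(lambda) = sigmoid(bias - lambda), where lambda is the log signal-to-noise ratio, and that a guidance interval means applying classifier-free guidance only while lambda lies inside a chosen range. The function names, the default `bias`, and the interval bounds are hypothetical placeholders; the abstract does not state the prescribed hyper-parameters.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_loss_weight(log_snr: float, bias: float = 0.0) -> float:
    # Assumed form: w(lambda) = sigmoid(bias - lambda).
    # The weight shrinks as log-SNR grows, so nearly clean inputs count less.
    # `bias` is a hypothetical placeholder, not the paper's prescribed value.
    return sigmoid(bias - log_snr)


def guided_prediction(cond, uncond, log_snr, scale, interval):
    # Classifier-free guidance restricted to a log-SNR interval (assumption).
    # Outside the interval the conditional prediction is returned unchanged.
    lo, hi = interval
    if not (lo <= log_snr <= hi):
        return list(cond)
    # Inside the interval, extrapolate away from the unconditional prediction.
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    for lam in (-4.0, 0.0, 4.0):
        print(lam, sigmoid_loss_weight(lam))
    print(guided_prediction([1.0], [0.0], log_snr=0.0, scale=2.0, interval=(-2.0, 2.0)))
```

In a training loop, the weight would multiply the per-example denoising loss at the sampled noise level; at sampling time, the interval limits guidance to the noise levels where it is switched on.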

Comments:	Accepted to CVPR 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2410.19324 [cs.CV]
	(or arXiv:2410.19324v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.19324

Submission history

From: Emiel Hoogeboom
[v1] Fri, 25 Oct 2024 06:20:06 UTC (490 KB)
[v2] Sat, 22 Mar 2025 19:42:20 UTC (4,988 KB)
