Post-Training Quantization for Diffusion Transformer via Hierarchical Timestep Grouping

Ding, Ning; Han, Jing; Tian, Yuchuan; Xu, Chao; Han, Kai; Tang, Yehui

arXiv:2503.06930 (cs)

[Submitted on 10 Mar 2025 (v1), last revised 29 Mar 2025 (this version, v2)]

Abstract: Diffusion Transformer (DiT) has become the preferred choice for building image generation models due to its strong generation capability. Unlike previous convolution-based UNet models, DiT is composed purely of a stack of transformer blocks, which gives it scalability comparable to that of large language models. However, the growing model size and the multi-step sampling paradigm put considerable pressure on deployment and inference. In this work, we propose a post-training quantization (PTQ) framework tailored for Diffusion Transformers to tackle these challenges. First, we find that the quantization difficulty of DiT mainly originates from time-dependent, channel-specific outliers, and we propose a timestep-aware shift-and-scale strategy that smooths the activation distribution to reduce the quantization error. Second, based on the observation that activations of adjacent timesteps have similar distributions, we use a hierarchical clustering scheme to divide the denoising timesteps into multiple groups. We further design a re-parameterization scheme that absorbs the quantization parameters into nearby modules to avoid redundant computation. Comprehensive experiments demonstrate that our PTQ method successfully quantizes the Diffusion Transformer to 8-bit weights and 8-bit activations (W8A8) with a state-of-the-art FID score. Our method can further quantize the DiT model to 4-bit weights and 8-bit activations (W4A8) without sacrificing generation quality.
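
The timestep grouping idea in the abstract can be illustrated with a small sketch. The abstract does not state the clustering criterion, so everything below is an assumption made for illustration: the function names (`group_timesteps`, `mean_vector`, `group_distance`), the use of a per-channel activation statistic as the feature for each timestep, the Euclidean distance between group centroids, and the restriction to merging only neighbouring groups (motivated by the observation that adjacent timesteps have similar distributions). It is not the authors' implementation.

```python
from typing import List, Tuple


def group_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two per-channel statistic vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_vector(rows: List[List[float]]) -> List[float]:
    """Channel-wise mean over a list of statistic vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def group_timesteps(stats: List[List[float]], num_groups: int) -> List[Tuple[int, int]]:
    """Agglomeratively merge adjacent timesteps until num_groups remain.

    stats[t] is a hypothetical per-channel activation statistic (for example
    the maximum absolute value per channel) collected at timestep t during
    calibration. Returns inclusive (start, end) timestep ranges.
    """
    if not 1 <= num_groups <= len(stats):
        raise ValueError("num_groups must be between 1 and len(stats)")

    # Start with every timestep in its own group.
    groups = [(t, t) for t in range(len(stats))]

    while len(groups) > num_groups:
        # Centroid of each current group (assumed similarity feature).
        centroids = [mean_vector(stats[s:e + 1]) for s, e in groups]
        # Pick the closest pair of neighbouring groups and merge them.
        best = min(
            range(len(groups) - 1),
            key=lambda i: group_distance(centroids[i], centroids[i + 1]),
        )
        groups[best:best + 2] = [(groups[best][0], groups[best + 1][1])]

    return groups


if __name__ == "__main__":
    # Toy statistics for 6 timesteps and 2 channels (made-up numbers).
    toy_stats = [[1.0, 8.0], [1.1, 8.2], [1.0, 7.9],
                 [4.0, 2.0], [4.2, 2.1], [9.0, 0.5]]
    print(group_timesteps(toy_stats, 3))
```

Under these assumptions, each resulting group would share one set of quantization parameters (for instance a per-channel shift and scale), so the number of parameter sets grows with the number of groups rather than with the number of denoising steps.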

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.06930 [cs.CV]
	(or arXiv:2503.06930v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.06930

Submission history

From: Ning Ding
[v1] Mon, 10 Mar 2025 05:21:04 UTC (839 KB)
[v2] Sat, 29 Mar 2025 06:37:07 UTC (842 KB)
