Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Ma, Xuran; Liu, Yexin; Liu, Yaofu; Wu, Xianfeng; Zheng, Mingzhe; Wang, Zihao; Lim, Ser-Nam; Yang, Harry

Computer Science > Computer Vision and Pattern Recognition

arXiv:2504.03140 (cs)

[Submitted on 4 Apr 2025]



Abstract: Recent advances in diffusion models have demonstrated remarkable capabilities in video generation, but their computational cost remains a significant obstacle to practical use. Feature caching has been proposed to reduce this cost, yet existing methods typically overlook the heterogeneous significance of individual blocks, resulting in suboptimal reuse and degraded output quality. We address this gap with ProfilingDiT, a novel adaptive caching strategy that explicitly disentangles foreground-focused and background-focused blocks. Through a systematic analysis of attention distributions in diffusion models, we make two key observations: 1) most layers exhibit a consistent preference for either foreground or background regions; 2) predicted noise shows low inter-step similarity initially, which stabilizes as denoising progresses. These findings inspire a selective caching strategy that preserves full computation for dynamic foreground elements while efficiently caching static background features. Extensive experiments demonstrate that our framework achieves significant acceleration (e.g., a 2.01× speedup for Wan2.1) while maintaining visual fidelity across comprehensive quality metrics, establishing a viable method for efficient video generation.
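
The selective caching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `denoise_with_selective_cache`, the parameters `warmup_steps` and `reuse_interval`, and the scalar "blocks" are all hypothetical, chosen only to show the control flow. The sketch assumes each block has already been labeled as background-focused or foreground-focused (the abstract derives this from profiling attention distributions, which is not modeled here), and it assumes a short warm-up with no reuse to reflect the low inter-step similarity early in denoising.

```python
from typing import Callable, Dict, List, Tuple

# A toy "block": maps a scalar hidden feature to a scalar residual update.
# Real video diffusion blocks operate on tensors; scalars keep the sketch small.
Block = Callable[[float], float]


def denoise_with_selective_cache(
    blocks: List[Block],
    is_background: List[bool],  # assumed to come from an offline profiling pass
    num_steps: int,
    warmup_steps: int,          # hypothetical: early steps with no reuse
    reuse_interval: int,        # hypothetical: refresh the cache every N steps
    x0: float = 1.0,
) -> Tuple[float, int]:
    """Run a toy denoising loop and return (final feature, block evaluations)."""
    if len(blocks) != len(is_background):
        raise ValueError("blocks and is_background must have the same length")
    if reuse_interval < 1:
        raise ValueError("reuse_interval must be at least 1")

    cache: Dict[int, float] = {}  # block index -> last computed residual
    block_calls = 0
    x = x0

    for step in range(num_steps):
        # Recompute everything during warm-up and on periodic refresh steps.
        refresh = step < warmup_steps or step % reuse_interval == 0
        for i, block in enumerate(blocks):
            if is_background[i] and not refresh and i in cache:
                # Background-focused block: reuse the cached residual.
                delta = cache[i]
            else:
                # Foreground-focused block, or a refresh step: compute fully.
                delta = block(x)
                block_calls += 1
                if is_background[i]:
                    cache[i] = delta
            x = x + delta

    return x, block_calls


if __name__ == "__main__":
    toy_blocks = [lambda h: 0.01 * h for _ in range(4)]
    labels = [True, False, True, False]
    _, cached_calls = denoise_with_selective_cache(toy_blocks, labels, 10, 2, 4)
    _, full_calls = denoise_with_selective_cache(toy_blocks, [False] * 4, 10, 2, 4)
    print(f"block evaluations with caching: {cached_calls}, without: {full_calls}")
```

In this sketch the saving comes only from skipping background-labeled blocks on non-refresh steps, so the achievable speedup depends on how many blocks are labeled background and how often the cache is refreshed; the 2.01× figure reported in the abstract is the authors' measured result and is not reproduced by this toy.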

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.03140 [cs.CV]
	(or arXiv:2504.03140v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.03140

Submission history

From: Xuran Ma
[v1] Fri, 4 Apr 2025 03:30:15 UTC (3,818 KB)
