Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization

Chang, Yuanyuan; Yao, Yinghua; Qin, Tao; Wang, Mengmeng; Tsang, Ivor; Dai, Guang

Computer Science > Computer Vision and Pattern Recognition

arXiv:2505.14254 (cs)

[Submitted on 20 May 2025]

Abstract: Text-to-image diffusion models have emerged as powerful tools for high-quality image generation and editing. Many existing approaches rely on text prompts as editing guidance. However, these methods are constrained by the need for manual prompt crafting, which can be time-consuming, can introduce irrelevant details, and can significantly limit editing performance. In this work, we propose optimizing semantic embeddings guided by attribute classifiers to steer text-to-image models toward desired edits, without relying on text prompts and without any training or fine-tuning of the diffusion model. We use classifiers to learn precise semantic embeddings at the dataset level. The learned embeddings are theoretically justified as the optimal representation of attribute semantics, enabling disentangled and accurate edits. Experiments further demonstrate that our method achieves a high degree of disentanglement and generalizes well across data domains.
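The abstract does not give implementation details, so the following is only a minimal toy sketch of the general idea it describes: learning one semantic embedding offset shared across a whole dataset, guided by a frozen attribute classifier, while the generator itself is never trained. Everything in it is a hypothetical stand-in and not the authors' method: a fixed linear map `GEN_W` plays the role of the frozen text-to-image diffusion model, a logistic classifier (`CLS_W`, `CLS_B`) plays the role of the attribute classifier, and the L2 penalty `lam` is an assumed regularizer. It makes no claim about the paper's actual loss or its optimality result.

```python
import math
import random

# HYPOTHETICAL toy stand-ins (assumptions, not the paper's models):
# GEN_W acts as a frozen "generator" mapping a semantic embedding to image features;
# (CLS_W, CLS_B) is a frozen logistic "attribute classifier" on those features.
GEN_W = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.1, 0.0, 0.7, 0.2],
    [0.0, 0.3, 0.0, 0.9],
]
CLS_W = [1.0, -0.5, 0.8, 0.3]
CLS_B = -2.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def generate(e):
    """Frozen toy generator: features = GEN_W @ e."""
    return [sum(w * x for w, x in zip(row, e)) for row in GEN_W]


def attribute_prob(e):
    """Frozen classifier's probability that the generated sample has the target attribute."""
    feats = generate(e)
    return sigmoid(sum(w * f for w, f in zip(CLS_W, feats)) + CLS_B)


def learn_semantic_direction(embeddings, lam=0.01, lr=1.0, steps=300):
    """Learn ONE offset `delta` shared by the whole dataset by gradient descent on
    mean_i(-log p(attr | e_i + delta)) + lam * ||delta||^2. Only delta is updated;
    the toy generator and classifier stay frozen."""
    dim = len(embeddings[0])
    # Gradient of the classifier logit w.r.t. the embedding: GEN_W^T CLS_W
    # (constant here only because the toy generator is linear).
    v = [sum(GEN_W[r][c] * CLS_W[r] for r in range(len(GEN_W))) for c in range(dim)]
    delta = [0.0] * dim
    for _ in range(steps):
        # d/dz of -log sigmoid(z) is -(1 - sigmoid(z)); average it over the dataset.
        coeff = -sum(1.0 - attribute_prob([x + d for x, d in zip(e, delta)])
                     for e in embeddings) / len(embeddings)
        grad = [coeff * vc + 2.0 * lam * dc for vc, dc in zip(v, delta)]
        delta = [dc - lr * gc for dc, gc in zip(delta, grad)]
    return delta


def mean_prob(embeddings, delta):
    """Average classifier probability after adding the shared offset to every embedding."""
    return sum(attribute_prob([x + d for x, d in zip(e, delta)])
               for e in embeddings) / len(embeddings)


rng = random.Random(0)
dataset = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(32)]
held_out = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(32)]

delta = learn_semantic_direction(dataset)
before = mean_prob(dataset, [0.0] * 4)
after = mean_prob(dataset, delta)
held_out_after = mean_prob(held_out, delta)
print(f"mean attribute probability: before={before:.2f}, after={after:.2f}, "
      f"held-out after={held_out_after:.2f}")
```

Because the offset is fit to the average over the dataset rather than to a single sample, the same `delta` can be reused unchanged on embeddings it never saw (`held_out` above), which is the toy analogue of a dataset-level edit direction.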

Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2505.14254 [cs.CV]
(or arXiv:2505.14254v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2505.14254

Submission history

From: Yuanyuan Chang
[v1] Tue, 20 May 2025 12:07:01 UTC (33,001 KB)