Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models

Motamed, Saman; Paudel, Danda Pani; Van Gool, Luc

Computer Science > Computer Vision and Pattern Recognition

arXiv:2311.13833 (cs)

[Submitted on 23 Nov 2023 (v1), last revised 27 Sep 2024 (this version, v2)]

Title:Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models

Authors:Saman Motamed, Danda Pani Paudel, Luc Van Gool

View PDF

Abstract:Text-to-Image (T2I) models excel at synthesizing concepts such as nouns, appearances, and styles. To enable customized content creation based on a few example images of a concept, methods such as Textual Inversion and DreamBooth invert the desired concept and enable synthesizing it in new scenes. However, inverting personalized concepts that go beyond object appearance and style (adjectives and verbs) through natural language remains a challenge. Two key characteristics of these concepts contribute to the limitations of current inversion methods. 1) Adjectives and verbs are entangled with nouns (subject) and can hinder appearance-based inversion methods, where the subject appearance leaks into the concept embedding, and 2) describing such concepts often extends beyond single word embeddings.
In this study, we introduce Lego, a textual inversion method designed to invert subject-entangled concepts from a few example images. Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step and employs a Context Loss that guides the inversion of single/multi-embedding concepts. In a thorough user study, Lego-generated concepts were preferred over 70% of the time when compared to the baseline in terms of authentically generating concepts according to a reference. Additionally, visual question answering using an LLM suggested Lego-generated concepts are better aligned with the text description of the concept.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2311.13833 [cs.CV]
	(or arXiv:2311.13833v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.13833

Submission history

From: Saman Motamed [view email]
[v1] Thu, 23 Nov 2023 07:33:38 UTC (35,506 KB)
[v2] Fri, 27 Sep 2024 14:04:35 UTC (46,959 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators