Skip to main content
Cornell University
We gratefully acknowledge support from the Simons Foundation, member institutions, and all contributors. Donate
arxiv logo > cs > arXiv:2311.13833

Help | Advanced Search

arXiv logo
Cornell University Logo

quick links

  • Login
  • Help Pages
  • About

Computer Science > Computer Vision and Pattern Recognition

arXiv:2311.13833 (cs)
[Submitted on 23 Nov 2023 (v1), last revised 27 Sep 2024 (this version, v2)]

Title:Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models

Authors:Saman Motamed, Danda Pani Paudel, Luc Van Gool
View a PDF of the paper titled Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models, by Saman Motamed and Danda Pani Paudel and Luc Van Gool
View PDF
Abstract:Text-to-Image (T2I) models excel at synthesizing concepts such as nouns, appearances, and styles. To enable customized content creation based on a few example images of a concept, methods such as Textual Inversion and DreamBooth invert the desired concept and enable synthesizing it in new scenes. However, inverting personalized concepts that go beyond object appearance and style (adjectives and verbs) through natural language remains a challenge. Two key characteristics of these concepts contribute to the limitations of current inversion methods. 1) Adjectives and verbs are entangled with nouns (subject) and can hinder appearance-based inversion methods, where the subject appearance leaks into the concept embedding, and 2) describing such concepts often extends beyond single word embeddings.
In this study, we introduce Lego, a textual inversion method designed to invert subject-entangled concepts from a few example images. Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step and employs a Context Loss that guides the inversion of single/multi-embedding concepts. In a thorough user study, Lego-generated concepts were preferred over 70% of the time when compared to the baseline in terms of authentically generating concepts according to a reference. Additionally, visual question answering using an LLM suggested Lego-generated concepts are better aligned with the text description of the concept.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2311.13833 [cs.CV]
  (or arXiv:2311.13833v2 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2311.13833
arXiv-issued DOI via DataCite

Submission history

From: Saman Motamed [view email]
[v1] Thu, 23 Nov 2023 07:33:38 UTC (35,506 KB)
[v2] Fri, 27 Sep 2024 14:04:35 UTC (46,959 KB)
Full-text links:

Access Paper:

    View a PDF of the paper titled Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models, by Saman Motamed and Danda Pani Paudel and Luc Van Gool
  • View PDF
  • TeX Source
  • Other Formats
license icon view license
Current browse context:
cs.CV
< prev   |   next >
new | recent | 2023-11
Change to browse by:
cs
cs.CL
cs.LG

References & Citations

  • NASA ADS
  • Google Scholar
  • Semantic Scholar
a export BibTeX citation Loading...

BibTeX formatted citation

×
Data provided by:

Bookmark

BibSonomy logo Reddit logo

Bibliographic and Citation Tools

Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)

Code, Data and Media Associated with this Article

alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)

Demos

Replicate (What is Replicate?)
Hugging Face Spaces (What is Spaces?)
TXYZ.AI (What is TXYZ.AI?)

Recommenders and Search Tools

Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
  • Author
  • Venue
  • Institution
  • Topic

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)
  • About
  • Help
  • contact arXivClick here to contact arXiv Contact
  • subscribe to arXiv mailingsClick here to subscribe Subscribe
  • Copyright
  • Privacy Policy
  • Web Accessibility Assistance
  • arXiv Operational Status
    Get status notifications via email or slack