ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval

Xing, Eric; Kolouju, Pranavi; Pless, Robert; Stylianou, Abby; Jacobs, Nathan

arXiv:2505.20764 (cs)

[Submitted on 27 May 2025]

Abstract: Composed image retrieval (CIR) is the task of retrieving a target image specified by a query image and a relative text that describes a semantic modification to the query image. Existing methods in CIR struggle to accurately represent the image and the text modification, resulting in subpar performance. To address this limitation, we introduce a CIR framework, ConText-CIR, trained with a Text Concept-Consistency loss that encourages the representations of noun phrases in the text modification to better attend to the relevant parts of the query image. To support training with this loss function, we also propose a synthetic data generation pipeline that creates training data from existing CIR datasets or unlabeled images. We show that these components together enable stronger performance on CIR tasks, setting a new state-of-the-art in composed image retrieval in both the supervised and zero-shot settings on multiple benchmark datasets, including CIRR and CIRCO. Source code, model checkpoints, and our new datasets are available at this https URL.
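To make the task definition concrete, here is a minimal sketch of the retrieval step in CIR. It is not the ConText-CIR method: the element-wise sum used to compose the query, the toy 3-dimensional embeddings, and the names `compose`, `retrieve`, and `gallery` are all assumptions made for illustration. In practice the embeddings would come from trained image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compose(image_emb, text_emb):
    """Hypothetical composition: element-wise sum of the two embeddings.

    This is a stand-in for a learned fusion model, not the paper's approach.
    """
    return [a + b for a, b in zip(image_emb, text_emb)]


def retrieve(image_emb, text_emb, gallery, k=1):
    """Return the names of the k gallery images closest to the composed query."""
    query = compose(image_emb, text_emb)
    ranked = sorted(
        gallery,
        key=lambda name: cosine(query, gallery[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (assumed): axis 0 ~ "dog", axis 1 ~ "beach", axis 2 ~ "cat".
query_image = [1.0, 0.0, 0.0]    # a dog
modification = [0.0, 1.0, 0.0]   # "put it on a beach"
gallery = {
    "dog_on_grass": [1.0, 0.0, 0.0],
    "dog_on_beach": [1.0, 1.0, 0.0],
    "cat_on_beach": [0.0, 1.0, 1.0],
}

top = retrieve(query_image, modification, gallery, k=2)
print(top)
```

In this toy gallery the image that matches both the query image and the text modification ranks first, ahead of images that match only one of the two.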

Comments:	15 pages, 8 figures, 6 tables. CVPR 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2505.20764 [cs.CV]
	(or arXiv:2505.20764v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2505.20764

Submission history

From: Eric Xing
[v1] Tue, 27 May 2025 06:09:57 UTC (14,155 KB)
