Probing and Inducing Combinational Creativity in Vision-Language Models

Peng, Yongqian; Ma, Yuxi; Wang, Mengmeng; Wang, Yuxuan; Wang, Yizhou; Zhang, Chi; Zhu, Yixin; Zheng, Zilong

arXiv:2504.13120 (cs)

[Submitted on 17 Apr 2025 (v1), last revised 29 Apr 2025 (this version, v2)]

Abstract: The ability to combine existing concepts into novel ideas is a fundamental hallmark of human intelligence. Recent advances in Vision-Language Models (VLMs) such as GPT-4V and DALL-E 3 have sparked debate about whether their outputs reflect combinational creativity, defined by M. A. Boden (1998) as synthesizing novel ideas by combining existing concepts, or sophisticated pattern matching of training data. Drawing inspiration from cognitive science, we investigate the combinational creativity of VLMs through the lens of concept blending. We propose the Identification-Explanation-Implication (IEI) framework, which decomposes creative processes into three levels: identifying input spaces, extracting shared attributes, and deriving novel semantic implications. To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework. Through extensive experiments, we demonstrate that in comprehension tasks, the best VLMs surpass average human performance but fall short of expert-level understanding; in generation tasks, incorporating our IEI framework into the generation pipeline significantly enhances the creative quality of VLMs' outputs. Our findings establish both a theoretical foundation for evaluating artificial creativity and practical guidelines for improving creative generation in VLMs.
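
The three IEI levels named in the abstract can be pictured as a small record plus a staged prompt. The following is only a minimal sketch under stated assumptions: the class `IEIAnnotation`, its field names, the helper `build_generation_prompt`, and the example values are all hypothetical illustrations, not the paper's annotation schema, dataset contents, or generation pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IEIAnnotation:
    """Hypothetical record for one visual mashup, one field per IEI level."""

    # Identification: the two input spaces (concepts) being blended.
    input_spaces: tuple[str, str]
    # Explanation: attributes the two input spaces share.
    shared_attributes: list[str] = field(default_factory=list)
    # Implication: the novel meaning the blend is meant to convey.
    implication: str = ""

    def is_complete(self) -> bool:
        """Return True when all three levels carry non-empty content."""
        return (
            all(s.strip() for s in self.input_spaces)
            and len(self.shared_attributes) > 0
            and bool(self.implication.strip())
        )


def build_generation_prompt(ann: IEIAnnotation) -> str:
    """Assemble a staged text prompt that walks through the three levels.

    The wording of the prompt is an assumption made for illustration only.
    """
    first, second = ann.input_spaces
    attrs = ", ".join(ann.shared_attributes)
    return (
        f"Identification: blend '{first}' with '{second}'.\n"
        f"Explanation: they share these attributes: {attrs}.\n"
        f"Implication: the blend should convey: {ann.implication}"
    )


# Invented example values; not taken from the CreativeMashup dataset.
example = IEIAnnotation(
    input_spaces=("light bulb", "pear"),
    shared_attributes=["rounded shape", "narrow neck"],
    implication="ideas can ripen like fruit",
)
prompt = build_generation_prompt(example)
```

Keeping each level in its own field lets an incomplete annotation be detected with `is_complete()` before any prompt is built from it.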

Comments:	Project page: this https URL. The first two authors contributed equally.
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2504.13120 [cs.CV]
	(or arXiv:2504.13120v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.13120

Submission history

From: Yongqian Peng
[v1] Thu, 17 Apr 2025 17:38:18 UTC (38,950 KB)
[v2] Tue, 29 Apr 2025 14:51:47 UTC (38,950 KB)
