Gamified crowd-sourcing of high-quality data for visual fine-tuning

Yadav, Shashank; Tomar, Rohan; Jain, Garvit; Ahooja, Chirag; Chaudhary, Shubham; Elkan, Charles

arXiv:2410.04038 (cs)

[Submitted on 5 Oct 2024 (v1), last revised 8 Oct 2024 (this version, v2)]

Abstract: This paper introduces Gamified Adversarial Prompting (GAP), a framework that crowd-sources high-quality data for visual instruction tuning of large multimodal models. GAP transforms the data collection process into an engaging game, incentivizing players to provide fine-grained, challenging questions and answers that target gaps in the model's knowledge. Our contributions include (1) an approach to capture question-answer pairs from humans that directly address weaknesses in a model's knowledge, (2) a method for evaluating and rewarding players that successfully incentivizes them to provide high-quality submissions, and (3) a scalable, gamified platform that succeeded in collecting this data from over 50,000 participants in just a few weeks. Our implementation of GAP has significantly improved the accuracy of a small multimodal model, namely MiniCPM-Llama3-V-2.5-8B, increasing its GPT score from 0.147 to 0.477 on our dataset, approaching the benchmark set by the much larger GPT-4V. Moreover, we demonstrate that the data generated using MiniCPM-Llama3-V-2.5-8B also enhances its performance across other benchmarks and exhibits cross-model benefits. Specifically, the same data improves the performance of QWEN2-VL-2B and QWEN2-VL-7B on those same benchmarks.

Subjects:	Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2410.04038 [cs.AI]
	(or arXiv:2410.04038v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2410.04038

Submission history

From: Shashank Yadav
[v1] Sat, 5 Oct 2024 05:10:29 UTC (8,675 KB)
[v2] Tue, 8 Oct 2024 02:37:41 UTC (8,675 KB)
