DePlot: One-shot visual language reasoning by plot-to-table translation

Liu, Fangyu; Eisenschlos, Julian Martin; Piccinno, Francesco; Krichene, Syrine; Pang, Chenxi; Lee, Kenton; Joshi, Mandar; Chen, Wenhu; Collier, Nigel; Altun, Yasemin

arXiv:2212.10505 (cs)

[Submitted on 20 Dec 2022 (v1), last revised 23 May 2023 (this version, v2)]

Abstract: Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities are still quite limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key to this method is a modality conversion module, named DePlot, which translates the image of a plot or chart into a linearized table. The output of DePlot can then be used directly to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over the finetuned SOTA on human-written queries from the chart QA task.
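The two-step decomposition described in the abstract (translate the plot into a linearized table, then let an LLM reason over that text) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helpers `linearize_table` and `build_one_shot_prompt` are hypothetical, the " | " cell separator with newline row separators is an assumed serialization, and the instruction wording in the prompt is invented. In a real pipeline the table string would come from a DePlot checkpoint rather than being written by hand, and the finished prompt would be sent to a pretrained LLM.

```python
from typing import List


def linearize_table(header: List[str], rows: List[List[str]], title: str = "") -> str:
    """Flatten a table into a single string (hypothetical format).

    Cells are joined with " | " and rows with newlines; an optional
    title line is prepended. DePlot's actual serialization may differ.
    """
    lines = []
    if title:
        lines.append(f"TITLE | {title}")
    lines.append(" | ".join(header))
    for row in rows:
        if len(row) != len(header):
            raise ValueError("every row must have one cell per header column")
        lines.append(" | ".join(row))
    return "\n".join(lines)


def build_one_shot_prompt(
    demo_table: str, demo_question: str, demo_answer: str,
    table: str, question: str,
) -> str:
    """Compose a one-shot prompt: one worked demonstration, then the query.

    The instruction wording is an illustrative assumption, not the paper's template.
    """
    instruction = "Read the table below to answer the question."
    demo = f"{instruction}\n\n{demo_table}\n\nQ: {demo_question}\nA: {demo_answer}"
    query = f"{instruction}\n\n{table}\n\nQ: {question}\nA:"
    return f"{demo}\n\n{query}"


if __name__ == "__main__":
    # Toy tables standing in for DePlot output (hypothetical data).
    demo = linearize_table(["Year", "Sales"], [["2020", "10"], ["2021", "12"]])
    target = linearize_table(["City", "Rain (mm)"], [["A", "30"], ["B", "45"]], title="Rainfall")
    prompt = build_one_shot_prompt(
        demo, "How much did sales grow from 2020 to 2021?", "2",
        target, "Which city had more rain?",
    )
    # The prompt string would then be sent to a pretrained LLM of your choice.
    print(prompt)
```

Keeping the intermediate representation as plain text is what makes the combination plug-and-play in this sketch: any LLM that accepts a text prompt can consume the table, and changing the single demonstration adapts the prompt to a new question type without retraining.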

Comments:	ACL 2023 (Findings)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2212.10505 [cs.CL]
	(or arXiv:2212.10505v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.10505

Submission history

From: Fangyu Liu
[v1] Tue, 20 Dec 2022 18:20:50 UTC (7,307 KB)
[v2] Tue, 23 May 2023 18:28:39 UTC (8,875 KB)