Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models

Polverini, Giulia; Gregorcic, Bor

doi:10.3389/feduc.2024.1452414

arXiv:2406.14685 (physics)

[Submitted on 20 Jun 2024 (v1), last revised 23 Oct 2024 (this version, v2)]

Abstract: This study investigates the performance of eight large multimodal model (LMM)-based chatbots on the Test of Understanding Graphs in Kinematics (TUG-K), a research-based concept inventory. Graphs are a widely used representation in STEM and medical fields, making them a relevant topic for exploring LMM-based chatbots' visual interpretation abilities. We evaluated both freely available chatbots (Gemini 1.0 Pro, Claude 3 Sonnet, Microsoft Copilot, and ChatGPT-4o) and subscription-based ones (Gemini 1.0 Ultra, Gemini 1.5 Pro API, Claude 3 Opus, and ChatGPT-4). We found that OpenAI's chatbots outperform all the others, with ChatGPT-4o showing the overall best performance. Contrary to expectations, we found no notable differences in the overall performance between freely available and subscription-based versions of Gemini and Claude 3 chatbots, with the exception of Gemini 1.5 Pro, available via API. In addition, we found that tasks relying more heavily on linguistic input were generally easier for chatbots than those requiring visual interpretation. The study provides a basis for considerations of LMM-based chatbot applications in STEM and medical education, and suggests directions for future research.

Subjects:	Physics Education (physics.ed-ph)
Cite as:	arXiv:2406.14685 [physics.ed-ph]
	(or arXiv:2406.14685v2 [physics.ed-ph] for this version)
	https://doi.org/10.48550/arXiv.2406.14685
Related DOI:	https://doi.org/10.3389/feduc.2024.1452414

Submission history

From: Giulia Polverini
[v1] Thu, 20 Jun 2024 19:17:59 UTC (4,542 KB)
[v2] Wed, 23 Oct 2024 07:41:07 UTC (1,478 KB)
