Predicting Implicit Arguments in Procedural Video Instructions

Batra, Anil; Sevilla-Lara, Laura; Rohrbach, Marcus; Keller, Frank

Computer Science > Computation and Language

arXiv:2505.21068 (cs)

[Submitted on 27 May 2025]

Title: Predicting Implicit Arguments in Procedural Video Instructions

Authors: Anil Batra, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller


Abstract: Procedural texts help AI systems reason about context and action sequences. Converting them into Semantic Role Labeling (SRL) representations improves the understanding of individual steps by identifying predicate-argument structures such as {verb, what, where/with}. Procedural instructions are highly elliptical. For instance, given the steps (i) add cucumber to the bowl and (ii) add sliced tomatoes, the where argument of the second step must be inferred from context: it refers to where the cucumber was placed. Prior SRL benchmarks often miss such implicit arguments, which leads to incomplete understanding. To address this, we introduce Implicit-VidSRL, a dataset that requires inferring both implicit and explicit arguments from contextual information in multimodal cooking procedures. The dataset benchmarks the contextual reasoning of multimodal models, which must track entities through visual changes across recipe steps. We study recent multimodal LLMs and find that, given the verb, they struggle to predict the implicit what and where/with arguments from multimodal procedural data. Lastly, we propose iSRL-Qwen2-VL, which achieves a 17% relative improvement in F1-score for what-implicit and 14.7% for where/with-implicit semantic roles over GPT-4o.
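To make the {verb, what, where/with} frames and the cucumber/tomato ellipsis concrete, here is a minimal Python sketch. The `Frame` class, the `fill_implicit` helper, and its carry-forward rule (copy the most recent explicit value of a role into a later step that omits it) are hypothetical illustrations, not the Implicit-VidSRL data format or the iSRL-Qwen2-VL method. A text-only heuristic like this ignores the visual entity tracking the dataset is designed to require. `relative_improvement` shows the usual reading of a relative gain, (new - baseline) / baseline.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional, Set


@dataclass(frozen=True)
class Frame:
    """One recipe step as a hypothetical {verb, what, where/with} frame."""
    verb: str
    what: Optional[str] = None        # object acted upon
    where_with: Optional[str] = None  # location or instrument
    implicit: Set[str] = field(default_factory=set)  # roles filled from context


def fill_implicit(frames: List[Frame], role: str) -> List[Frame]:
    """Naive, hypothetical baseline: carry the last explicit value of `role`
    forward into later steps that omit it. Returns new frames; inputs are
    left unchanged."""
    last_value: Optional[str] = None
    filled: List[Frame] = []
    for frame in frames:
        value = getattr(frame, role)
        if value is None and last_value is not None:
            # The role is elided in this step: copy it from context, mark it implicit.
            frame = replace(frame, **{role: last_value}, implicit=frame.implicit | {role})
        elif value is not None:
            # An explicit mention updates the context for later steps.
            last_value = value
        filled.append(frame)
    return filled


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, e.g. 0.17 for a 17% improvement."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    steps = [Frame("add", "cucumber", "bowl"), Frame("add", "sliced tomatoes")]
    for step in fill_implicit(steps, "where_with"):
        print(step)
```

Running `fill_implicit` with the role `"where_with"` on the two steps from the abstract fills the second step's where/with argument with `"bowl"` and records it as implicit. Under this definition, a 17% relative improvement means the new F1 is 1.17 times the baseline F1, not 17 points higher.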

Comments:	ACL 2025 Main
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2505.21068 [cs.CL]
	(or arXiv:2505.21068v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2505.21068

Submission history

From: Anil Batra
[v1] Tue, 27 May 2025 11:53:06 UTC (10,642 KB)