AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

Zhaxizhuoma, Zhaxizhuoma; Chen, Pengan; Wu, Ziniu; Sun, Jiawei; Wang, Dong; Zhou, Peng; Cao, Nieqing; Ding, Yan; Zhao, Bin; Li, Xuelong

Computer Science > Robotics

arXiv:2409.11905 (cs)

[Submitted on 18 Sep 2024 (v1), last revised 21 Mar 2025 (this version, v2)]

Title:AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

Authors:Zhaxizhuoma Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li

View PDF HTML (experimental)

Abstract:This paper presents AlignBot, a novel framework designed to optimize VLM-powered customized task planning for household robots by effectively aligning with user reminders. In domestic settings, aligning task planning with user reminders poses significant challenges due to the limited quantity, diversity, and multimodal nature of the reminders. To address these challenges, AlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for GPT-4o. This adapter model internalizes diverse forms of user reminders-such as personalized preferences, corrective guidance, and contextual assistance-into structured instruction-formatted cues that prompt GPT-4o in generating customized task plans. Additionally, AlignBot integrates a dynamic retrieval mechanism that selects task-relevant historical successes as prompts for GPT-4o, further enhancing task planning accuracy. To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings. A multimodal dataset with over 1,500 entries derived from volunteer reminders is used for training and evaluation. The results demonstrate that AlignBot significantly improves customized task planning, outperforming existing LLM- and VLM-powered planners by interpreting and aligning with user reminders, achieving 86.8% success rate compared to the vanilla GPT-4o baseline at 21.6%, reflecting a 65% improvement and over four times greater effectiveness. Supplementary materials are available at: this https URL

Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2409.11905 [cs.RO]
	(or arXiv:2409.11905v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2409.11905

Submission history

From: Pengan Chen [view email]
[v1] Wed, 18 Sep 2024 12:05:30 UTC (32,318 KB)
[v2] Fri, 21 Mar 2025 04:40:24 UTC (32,324 KB)

Computer Science > Robotics

Title:AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Robotics

Title:AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators