RoboGPT: an LLM-based Long-term Decision-making Embodied Agent for Instruction Following Tasks

Yaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Jinrui Liu, Haoran Li, Dongbin Zhao*, He Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Robotic agents must master common sense and make long-term sequential decisions to execute daily tasks based on natural language instructions. Recent advancements in Large Language Models (LLMs) have catalyzed efforts toward complex robotic planning. However, despite the superior generalization and comprehension capabilities of LLMs, the task plans they produce sometimes suffer from issues of accuracy and feasibility. To address these challenges, we propose RoboGPT (for more details, please refer to our project page: https://github.com/Cwb0106/RoboGPT), an embodied agent specifically designed to make long-term decisions for instruction following tasks. RoboGPT integrates three key modules: 1) RoboPlanner, an LLM-based planning module trained on 67K embodied planning data, which breaks down tasks into logical subgoals. We compile a new robotic dataset using a template feedback-based self-instruction method to fine-tune the Llama model. With its strong generalization, RoboPlanner can plan hundreds of instruction following tasks; 2) RoboSkill, customized for each subgoal to improve navigation and manipulation capabilities; 3) Re-Plan, a module that dynamically adjusts the subgoals based on real-time environmental feedback. Using the precise semantic map generated by RoboSkill, Re-Plan can replace target objects by calculating the similarity between subgoals and the objects present in the environment. Experimental results demonstrate that RoboGPT exceeds the performance of other state-of-the-art (SOTA) methods, particularly LLM-based methods, in terms of task planning rationality for hundreds of unseen daily tasks and even tasks from other domains.
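The object-replacement idea behind Re-Plan can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how similarity is computed, so the cosine measure, the hand-made `TOY_EMBEDDINGS` table, the `threshold` value, and the function names `replan_target` and `replan` are all assumptions chosen for illustration. A real system would presumably take object names from the semantic map and use a learned text encoder.

```python
import math

# Hypothetical toy embeddings (assumption); stand-ins for a learned encoder.
TOY_EMBEDDINGS = {
    "mug": (0.9, 0.1, 0.0),
    "cup": (0.85, 0.15, 0.05),
    "knife": (0.0, 0.9, 0.2),
    "sofa": (0.05, 0.1, 0.95),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def replan_target(target, observed, threshold=0.8):
    """Pick the object to act on for a subgoal.

    Returns the target itself if it was observed, otherwise the most
    similar observed object whose similarity reaches the (assumed)
    threshold, otherwise None.
    """
    if target in observed:
        return target
    best, best_score = None, threshold
    for obj in sorted(observed):  # sorted for deterministic tie-breaking
        score = cosine(TOY_EMBEDDINGS[target], TOY_EMBEDDINGS[obj])
        if score >= best_score:
            best, best_score = obj, score
    return best


def replan(subgoals, observed):
    """Rewrite (action, object) subgoals against the observed objects.

    Subgoals with no sufficiently similar substitute are kept unchanged.
    """
    result = []
    for action, obj in subgoals:
        substitute = replan_target(obj, observed)
        result.append((action, substitute if substitute is not None else obj))
    return result


if __name__ == "__main__":
    plan = [("find", "mug"), ("pick up", "mug"), ("find", "knife")]
    print(replan(plan, {"cup", "sofa"}))
```

In this sketch, a plan that asks for a mug in a scene containing only a cup and a sofa is rewritten to target the cup, while the knife subgoal is left as-is because nothing observed is similar enough.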

Original language: English
Journal: IEEE Transactions on Cognitive and Developmental Systems
Publication status: Accepted/In press - 2025

Keywords

  • Daily instruction following tasks
  • Embodied AI
  • Embodied planning
  • Large language model
  • Self-instruction data generation
