RoboGPT: an LLM-based Long-term Decision-making Embodied Agent for Instruction Following Tasks

Yaran Chen; Wenbo Cui; Yuanwen Chen; Mining Tan; Xinyao Zhang; Jinrui Liu; Haoran Li; Dongbin Zhao; He Wang

doi:10.1109/TCDS.2025.3543364

Yaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Jinrui Liu, Haoran Li, Dongbin Zhao^*, He Wang

^*Corresponding author for this work

Department of Intelligent Science

Research output: Contribution to journal › Article › peer-review

Abstract

Robotic agents are tasked with mastering common sense and making long-term sequential decisions to execute daily tasks based on natural language instructions. Recent advancements in Large Language Models (LLMs) have catalyzed efforts for complex robotic planning. However, despite their superior generalization and comprehension capabilities, LLM task plans sometimes suffer from issues of accuracy and feasibility. To address these challenges, we propose RoboGPT (project page: https://github.com/Cwb0106/RoboGPT), an embodied agent specifically designed to make long-term decisions for instruction following tasks. RoboGPT integrates three key modules: 1) RoboPlanner, an LLM-based planning module equipped with 67K embodied planning data, breaks down tasks into logical subgoals. We compile a new robotic dataset using a template feedback-based self-instruction method to fine-tune the Llama model. RoboPlanner with strong generalization can plan hundreds of instruction following tasks; 2) RoboSkill, customized for each subgoal to improve navigation and manipulation capabilities; 3) Re-Plan, a module that dynamically adjusts the subgoals based on real-time environmental feedback. By utilizing the precise semantic map generated by RoboSkill, the target objects can be replaced by calculating the similarity between subgoals and the objects present in the environment. Experimental results demonstrate that RoboGPT exceeds the performance of other state-of-the-art (SOTA) methods, particularly LLM-based methods, in terms of task planning rationality for hundreds of unseen daily tasks and even tasks from other domains.
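The Re-Plan step described in the abstract (replacing a target object with a similar one found in the semantic map) can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so the character-level ratio from Python's `difflib` is only a stand-in here, and the names `similarity`, `replan_target`, and `threshold` are hypothetical rather than taken from the RoboGPT code base.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def similarity(a: str, b: str) -> float:
    """Stand-in similarity: case-insensitive character-level ratio in [0, 1].

    Assumption: the real system likely compares semantic representations;
    plain string similarity is used here only to keep the sketch runnable.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def replan_target(
    target: str, objects_in_map: List[str], threshold: float = 0.5
) -> Optional[str]:
    """Pick the object to use for a subgoal (hypothetical helper).

    Returns the target itself if it is in the semantic map, otherwise the
    most similar mapped object if its score reaches `threshold`, else None.
    """
    if target in objects_in_map:
        return target
    if not objects_in_map:
        return None
    # Choose the mapped object that scores highest against the target name.
    best = max(objects_in_map, key=lambda name: similarity(target, name))
    return best if similarity(target, best) >= threshold else None
```

Calling `replan_target("DeskLamp", ["FloorLamp", "Sofa"], threshold=0.4)` would consider both mapped objects and keep the closer one only if it clears the threshold; a stricter threshold makes the sketch give up instead of substituting a poor match.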

Original language	English
Journal	IEEE Transactions on Cognitive and Developmental Systems
DOIs	https://doi.org/10.1109/TCDS.2025.3543364
Publication status	Accepted/In press - 2025

Keywords

Daily instruction following tasks
Embodied AI
Embodied planning
Large language model
Self-instruction data generation

Access to Document

10.1109/TCDS.2025.3543364

Cite this

@article{0f54fa3db6be4aa0a19b3b8f3d50e109,

title = "RoboGPT: an LLM-based Long-term Decision-making Embodied Agent for Instruction Following Tasks",

abstract = "Robotic agents are tasked with mastering common sense and making long-term sequential decisions to execute daily tasks based on natural language instructions. Recent advancements in Large Language Models (LLMs) have catalyzed efforts for complex robotic planning. However, despite their superior generalization and comprehension capabilities, LLM task plans sometimes suffer from issues of accuracy and feasibility. To address these challenges, we propose RoboGPT (project page: https://github.com/Cwb0106/RoboGPT), an embodied agent specifically designed to make long-term decisions for instruction following tasks. RoboGPT integrates three key modules: 1) RoboPlanner, an LLM-based planning module equipped with 67K embodied planning data, breaks down tasks into logical subgoals. We compile a new robotic dataset using a template feedback-based self-instruction method to fine-tune the Llama model. RoboPlanner with strong generalization can plan hundreds of instruction following tasks; 2) RoboSkill, customized for each subgoal to improve navigation and manipulation capabilities; 3) Re-Plan, a module that dynamically adjusts the subgoals based on real-time environmental feedback. By utilizing the precise semantic map generated by RoboSkill, the target objects can be replaced by calculating the similarity between subgoals and the objects present in the environment. Experimental results demonstrate that RoboGPT exceeds the performance of other state-of-the-art (SOTA) methods, particularly LLM-based methods, in terms of task planning rationality for hundreds of unseen daily tasks and even tasks from other domains.",

keywords = "Daily instruction following tasks, Embodied AI, Embodied planning, Large language model, Self-instruction data generation",

author = "Yaran Chen and Wenbo Cui and Yuanwen Chen and Mining Tan and Xinyao Zhang and Jinrui Liu and Haoran Li and Dongbin Zhao and He Wang",

note = "Publisher Copyright: {\textcopyright} 2016 IEEE.",

year = "2025",

doi = "10.1109/TCDS.2025.3543364",

language = "English",

journal = "IEEE Transactions on Cognitive and Developmental Systems",

issn = "2379-8920",

}

TY - JOUR

T1 - RoboGPT

T2 - an LLM-based Long-term Decision-making Embodied Agent for Instruction Following Tasks

AU - Chen, Yaran

AU - Cui, Wenbo

AU - Chen, Yuanwen

AU - Tan, Mining

AU - Zhang, Xinyao

AU - Liu, Jinrui

AU - Li, Haoran

AU - Zhao, Dongbin

AU - Wang, He

PY - 2025

Y1 - 2025

N2 - Robotic agents are tasked with mastering common sense and making long-term sequential decisions to execute daily tasks based on natural language instructions. Recent advancements in Large Language Models (LLMs) have catalyzed efforts for complex robotic planning. However, despite their superior generalization and comprehension capabilities, LLM task plans sometimes suffer from issues of accuracy and feasibility. To address these challenges, we propose RoboGPT (project page: https://github.com/Cwb0106/RoboGPT), an embodied agent specifically designed to make long-term decisions for instruction following tasks. RoboGPT integrates three key modules: 1) RoboPlanner, an LLM-based planning module equipped with 67K embodied planning data, breaks down tasks into logical subgoals. We compile a new robotic dataset using a template feedback-based self-instruction method to fine-tune the Llama model. RoboPlanner with strong generalization can plan hundreds of instruction following tasks; 2) RoboSkill, customized for each subgoal to improve navigation and manipulation capabilities; 3) Re-Plan, a module that dynamically adjusts the subgoals based on real-time environmental feedback. By utilizing the precise semantic map generated by RoboSkill, the target objects can be replaced by calculating the similarity between subgoals and the objects present in the environment. Experimental results demonstrate that RoboGPT exceeds the performance of other state-of-the-art (SOTA) methods, particularly LLM-based methods, in terms of task planning rationality for hundreds of unseen daily tasks and even tasks from other domains.

AB - Robotic agents are tasked with mastering common sense and making long-term sequential decisions to execute daily tasks based on natural language instructions. Recent advancements in Large Language Models (LLMs) have catalyzed efforts for complex robotic planning. However, despite their superior generalization and comprehension capabilities, LLM task plans sometimes suffer from issues of accuracy and feasibility. To address these challenges, we propose RoboGPT (project page: https://github.com/Cwb0106/RoboGPT), an embodied agent specifically designed to make long-term decisions for instruction following tasks. RoboGPT integrates three key modules: 1) RoboPlanner, an LLM-based planning module equipped with 67K embodied planning data, breaks down tasks into logical subgoals. We compile a new robotic dataset using a template feedback-based self-instruction method to fine-tune the Llama model. RoboPlanner with strong generalization can plan hundreds of instruction following tasks; 2) RoboSkill, customized for each subgoal to improve navigation and manipulation capabilities; 3) Re-Plan, a module that dynamically adjusts the subgoals based on real-time environmental feedback. By utilizing the precise semantic map generated by RoboSkill, the target objects can be replaced by calculating the similarity between subgoals and the objects present in the environment. Experimental results demonstrate that RoboGPT exceeds the performance of other state-of-the-art (SOTA) methods, particularly LLM-based methods, in terms of task planning rationality for hundreds of unseen daily tasks and even tasks from other domains.

KW - Daily instruction following tasks

KW - Embodied AI

KW - Embodied planning

KW - Large language model

KW - Self-instruction data generation

UR - http://www.scopus.com/inward/record.url?scp=85218796371&partnerID=8YFLogxK

U2 - 10.1109/TCDS.2025.3543364

DO - 10.1109/TCDS.2025.3543364

M3 - Article

AN - SCOPUS:85218796371

SN - 2379-8920

JO - IEEE Transactions on Cognitive and Developmental Systems

JF - IEEE Transactions on Cognitive and Developmental Systems

ER -