TY - JOUR
T1 - RoboGPT
T2 - an LLM-based Long-term Decision-making Embodied Agent for Instruction Following Tasks
AU - Chen, Yaran
AU - Cui, Wenbo
AU - Chen, Yuanwen
AU - Tan, Mining
AU - Zhang, Xinyao
AU - Liu, Jinrui
AU - Li, Haoran
AU - Zhao, Dongbin
AU - Wang, He
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2025
Y1 - 2025
N2 - Robotic agents are tasked with mastering common sense and making long-term sequential decisions to execute daily tasks based on natural language instructions. Recent advancements in Large Language Models (LLMs) have catalyzed efforts for complex robotic planning. However, despite their superior generalization and comprehension capabilities, LLM task plans sometimes suffer from issues of accuracy and feasibility. To address these challenges, we propose RoboGPT (for more details, please refer to our project page: https://github.com/Cwb0106/RoboGPT), an embodied agent specifically designed to make long-term decisions for instruction following tasks. RoboGPT integrates three key modules: 1) RoboPlanner, an LLM-based planning module equipped with 67K embodied planning data, breaks down tasks into logical subgoals. We compile a new robotic dataset using a template feedback-based self-instruction method to fine-tune the Llama model. RoboPlanner with strong generalization can plan hundreds of instruction following tasks; 2) RoboSkill, customized for each subgoal to improve navigation and manipulation capabilities; 3) Re-Plan, a module that dynamically adjusts the subgoals based on real-time environmental feedback. By utilizing the precise semantic map generated by RoboSkill, the target objects can be replaced by calculating the similarity between subgoals and the objects present in the environment. Experimental results demonstrate that RoboGPT exceeds the performance of other state-of-the-art (SOTA) methods, particularly LLM-based methods, in terms of task planning rationality for hundreds of unseen daily tasks and even tasks from other domains.
AB - Robotic agents are tasked with mastering common sense and making long-term sequential decisions to execute daily tasks based on natural language instructions. Recent advancements in Large Language Models (LLMs) have catalyzed efforts for complex robotic planning. However, despite their superior generalization and comprehension capabilities, LLM task plans sometimes suffer from issues of accuracy and feasibility. To address these challenges, we propose RoboGPT (for more details, please refer to our project page: https://github.com/Cwb0106/RoboGPT), an embodied agent specifically designed to make long-term decisions for instruction following tasks. RoboGPT integrates three key modules: 1) RoboPlanner, an LLM-based planning module equipped with 67K embodied planning data, breaks down tasks into logical subgoals. We compile a new robotic dataset using a template feedback-based self-instruction method to fine-tune the Llama model. RoboPlanner with strong generalization can plan hundreds of instruction following tasks; 2) RoboSkill, customized for each subgoal to improve navigation and manipulation capabilities; 3) Re-Plan, a module that dynamically adjusts the subgoals based on real-time environmental feedback. By utilizing the precise semantic map generated by RoboSkill, the target objects can be replaced by calculating the similarity between subgoals and the objects present in the environment. Experimental results demonstrate that RoboGPT exceeds the performance of other state-of-the-art (SOTA) methods, particularly LLM-based methods, in terms of task planning rationality for hundreds of unseen daily tasks and even tasks from other domains.
KW - Daily instruction following tasks
KW - Embodied AI
KW - Embodied planning
KW - Large language model
KW - Self-instruction data generation
UR - http://www.scopus.com/inward/record.url?scp=85218796371&partnerID=8YFLogxK
U2 - 10.1109/TCDS.2025.3543364
DO - 10.1109/TCDS.2025.3543364
M3 - Article
AN - SCOPUS:85218796371
SN - 2379-8920
JO - IEEE Transactions on Cognitive and Developmental Systems
JF - IEEE Transactions on Cognitive and Developmental Systems
ER -