TY - GEN
T1 - Reinforcement Learning Algorithm for Two-Leg Robot with DDPG and TD3
AU - Li, Dexuan
AU - Jin, Nanlin
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Reinforcement Learning (RL) is becoming popular for two-legged robots, which learn and improve through trial and error, adjusting their actions based on feedback. Deep RL, which combines RL with deep learning, handles the high-dimensional state and action spaces found in robotics. Two important Deep RL algorithms are Deep Deterministic Policy Gradient (DDPG) and Twin Delayed DDPG (TD3). Our proposed algorithm, which improves on TD3, maximizes cumulative rewards by interacting with the environment. The robot learns a policy through exploration and exploitation. This paper investigates the scenario of robots walking continuously without falling. Our experiments show that the two-legged robot can autonomously adapt to the environment by itself, without humans solving the problems for it. DDPG shows promise but suffers from instability and hyperparameter sensitivity. Our improved TD3 mitigates DDPG’s overestimation bias, improving stability and performance. This study also evaluates the stability, convergence, and computational efficiency of DDPG and TD3.
AB - Reinforcement Learning (RL) is becoming popular for two-legged robots, which learn and improve through trial and error, adjusting their actions based on feedback. Deep RL, which combines RL with deep learning, handles the high-dimensional state and action spaces found in robotics. Two important Deep RL algorithms are Deep Deterministic Policy Gradient (DDPG) and Twin Delayed DDPG (TD3). Our proposed algorithm, which improves on TD3, maximizes cumulative rewards by interacting with the environment. The robot learns a policy through exploration and exploitation. This paper investigates the scenario of robots walking continuously without falling. Our experiments show that the two-legged robot can autonomously adapt to the environment by itself, without humans solving the problems for it. DDPG shows promise but suffers from instability and hyperparameter sensitivity. Our improved TD3 mitigates DDPG’s overestimation bias, improving stability and performance. This study also evaluates the stability, convergence, and computational efficiency of DDPG and TD3.
KW - Continuous Control
KW - DDPG algorithm
KW - Robotics
KW - TD3 algorithm
UR - http://www.scopus.com/inward/record.url?scp=105002720965&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-3949-6_2
DO - 10.1007/978-981-96-3949-6_2
M3 - Conference Proceeding
AN - SCOPUS:105002720965
SN - 9789819639489
T3 - Lecture Notes in Networks and Systems
SP - 10
EP - 23
BT - Selected Proceedings from the 2nd International Conference on Intelligent Manufacturing and Robotics, ICIMR 2024 - Advances in Intelligent Manufacturing and Robotics
A2 - Chen, Wei
A2 - Ping Tan, Andrew Huey
A2 - Luo, Yang
A2 - Huang, Long
A2 - Zhu, Yuyi
A2 - PP Abdul Majeed, Anwar
A2 - Zhang, Fan
A2 - Yan, Yuyao
A2 - Liu, Chenguang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 2nd International Conference on Intelligent Manufacturing and Robotics, ICIMR 2024
Y2 - 22 August 2024 through 23 August 2024
ER -