TY - JOUR
T1 - Learning future representation with synthetic observations for sample-efficient reinforcement learning
AU - Liu, Xin
AU - Chen, Yaran
AU - Li, Haoran
AU - Zhao, Dongbin
N1 - Publisher Copyright:
© Science China Press 2025.
PY - 2025/5
Y1 - 2025/5
N2 - Image-based reinforcement learning (RL) has proven effective for continuous visual control of embodied agents, where upstream representation learning largely determines the effectiveness of policy learning. Employing self-supervised auxiliary tasks allows the agent to enhance its visual representation in a targeted manner, thereby improving policy performance and RL sample efficiency. Prior advanced self-supervised RL methods all seek to design better auxiliary objectives that extract more information from agent experience, while ignoring the training-data constraint imposed by the limited experience available during RL training. In this article, we make a first attempt to break through this auxiliary training-data constraint, proposing a novel RL auxiliary task named learning future representation with synthetic observations (LFS), which improves self-supervised RL by enriching the auxiliary training data. First, a novel training-free method, named frame mask, is proposed to synthesize novel observations that may contain future information. Next, the latent nearest-neighbor clip (LNC) is proposed to alleviate the impact of unqualified, noisy synthetic observations. The remaining synthetic observations then serve, together with real observations, as auxiliary training data for a clustering-based temporal association task that advances representation learning. LFS allows the agent to access and learn from observations that are not present in its current experience but will appear in future training, enabling comprehensive visual understanding and an efficient RL process. In addition, LFS does not rely on rewards or actions, giving it a wider scope of application (e.g., learning from video) than recent advanced RL auxiliary tasks. We conduct extensive experiments on challenging continuous visual control of complex embodied agents, including robot locomotion and manipulation. The results demonstrate that LFS achieves state-of-the-art sample efficiency on end-to-end RL tasks (leading on 12 of 13 tasks) and enables advanced RL visual pre-training on action-free video demonstrations (outperforming the next best method by 1.51×).
AB - Image-based reinforcement learning (RL) has proven effective for continuous visual control of embodied agents, where upstream representation learning largely determines the effectiveness of policy learning. Employing self-supervised auxiliary tasks allows the agent to enhance its visual representation in a targeted manner, thereby improving policy performance and RL sample efficiency. Prior advanced self-supervised RL methods all seek to design better auxiliary objectives that extract more information from agent experience, while ignoring the training-data constraint imposed by the limited experience available during RL training. In this article, we make a first attempt to break through this auxiliary training-data constraint, proposing a novel RL auxiliary task named learning future representation with synthetic observations (LFS), which improves self-supervised RL by enriching the auxiliary training data. First, a novel training-free method, named frame mask, is proposed to synthesize novel observations that may contain future information. Next, the latent nearest-neighbor clip (LNC) is proposed to alleviate the impact of unqualified, noisy synthetic observations. The remaining synthetic observations then serve, together with real observations, as auxiliary training data for a clustering-based temporal association task that advances representation learning. LFS allows the agent to access and learn from observations that are not present in its current experience but will appear in future training, enabling comprehensive visual understanding and an efficient RL process. In addition, LFS does not rely on rewards or actions, giving it a wider scope of application (e.g., learning from video) than recent advanced RL auxiliary tasks. We conduct extensive experiments on challenging continuous visual control of complex embodied agents, including robot locomotion and manipulation. The results demonstrate that LFS achieves state-of-the-art sample efficiency on end-to-end RL tasks (leading on 12 of 13 tasks) and enables advanced RL visual pre-training on action-free video demonstrations (outperforming the next best method by 1.51×).
KW - deep reinforcement learning (DRL)
KW - image-based RL
KW - RL for continuous control
KW - RL for embodied agents
KW - RL visual pre-training
KW - self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=105003858974&partnerID=8YFLogxK
U2 - 10.1007/s11432-024-4380-4
DO - 10.1007/s11432-024-4380-4
M3 - Article
AN - SCOPUS:105003858974
SN - 1674-733X
VL - 68
JO - Science China Information Sciences
JF - Science China Information Sciences
IS - 5
M1 - 150202
ER -