Learning future representation with synthetic observations for sample-efficient reinforcement learning

Xin Liu, Yaran Chen*, Haoran Li, Dongbin Zhao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Image-based reinforcement learning (RL) has proven effective for the continuous visual control of embodied agents, where upstream representation learning largely determines the quality of policy learning. Self-supervised auxiliary tasks allow the agent to enhance its visual representation in a targeted manner, improving both policy performance and RL sample efficiency. Prior advanced self-supervised RL methods focus on designing better auxiliary objectives to extract more information from agent experience, while ignoring that the auxiliary training data are constrained to the limited experience collected during RL training. In this article, we break through this auxiliary training data constraint by proposing a novel RL auxiliary task named learning future representation with synthetic observations (LFS), which improves self-supervised RL by enriching the auxiliary training data. First, a novel training-free method, named frame mask, is proposed to synthesize novel observations that may contain future information. Second, the latent nearest-neighbor clip (LNC) is proposed to alleviate the impact of unqualified noise in the synthetic observations. The remaining synthetic observations and the real observations together serve as auxiliary training data for a clustering-based temporal association task that learns advanced representations. LFS allows the agent to access and learn observations that are absent from the current experience but will appear in future training, enabling comprehensive visual understanding and an efficient RL process. In addition, LFS relies on neither rewards nor actions, so it has a wider scope of application (e.g., learning from video) than recent advanced RL auxiliary tasks. We conduct extensive experiments on challenging continuous visual control of complex embodied agents, including robot locomotion and manipulation. The results demonstrate that LFS achieves state-of-the-art sample efficiency on end-to-end RL tasks (leading on 12 of 13 tasks) and enables advanced RL visual pre-training on action-free video demonstrations (outperforming the next best method by 1.51×).
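
The abstract outlines the LFS pipeline without implementation details. Purely as an illustration, the following minimal NumPy sketch shows one plausible form of the two data-side steps: synthesizing observations with a frame mask and filtering them with the latent nearest-neighbor clip (LNC). The masking scheme, the latent dimensionality, and the clip radius here are assumptions made for this sketch, not the paper's specification.

import numpy as np


def frame_mask(obs_stack, rng):
    """Hypothetical frame-mask synthesis: randomly zero out some of the
    k stacked frames so the surviving frames compose an observation that
    never appears verbatim in the replay buffer.
    obs_stack: (k, H, W, C) array of k consecutive frames."""
    k = obs_stack.shape[0]
    mask = rng.random(k) < 0.5           # mask each frame with prob. 0.5
    if mask.all():                       # always keep at least one frame
        mask[rng.integers(k)] = False
    synthetic = obs_stack.copy()
    synthetic[mask] = 0.0
    return synthetic


def latent_nearest_neighbor_clip(synthetic_z, real_z, radius):
    """Hypothetical LNC filter: keep a synthetic latent only if its
    nearest real latent lies within `radius`; the rest are discarded
    as unqualified noise.
    synthetic_z: (M, d) latents of synthetic observations.
    real_z:      (N, d) latents of real observations."""
    dists = np.linalg.norm(synthetic_z[:, None, :] - real_z[None, :, :], axis=-1)
    keep = dists.min(axis=1) <= radius   # nearest-neighbor distance test
    return synthetic_z[keep]


# Toy usage with random arrays standing in for real frames and encodings.
rng = np.random.default_rng(0)
stack = rng.random((3, 84, 84, 3))       # k = 3 stacked RGB frames
synthetic_obs = frame_mask(stack, rng)

z_syn = rng.normal(size=(32, 50))        # encoder outputs (assumed d = 50)
z_real = rng.normal(size=(256, 50))
z_kept = latent_nearest_neighbor_clip(z_syn, z_real, radius=12.0)
print(synthetic_obs.shape, z_kept.shape)

In this reading, frame mask is training-free because it only recombines frames the agent has already seen, and LNC acts as a quality gate that admits a synthetic observation only when it lies close to at least one real observation in latent space.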

Original language: English
Article number: 150202
Journal: Science China Information Sciences
Volume: 68
Issue number: 5
DOIs
Publication status: Published - May 2025
Externally published: Yes

Keywords

  • deep reinforcement learning (DRL)
  • image-based RL
  • RL for continuous control
  • RL for embodied agents
  • RL visual pre-training
  • self-supervised learning
