TY - JOUR
T1 - Learning future representation with synthetic observations for sample-efficient reinforcement learning
AU - Liu, Xin
AU - Chen, Yaran
AU - Li, Haoran
AU - Zhao, Dongbin
N1 - Publisher Copyright:
© Science China Press 2025.
PY - 2025/5
Y1 - 2025/5
N2 - Image-based reinforcement learning (RL) has proven effective for continuous visual control of embodied agents, where upstream representation learning largely determines the effectiveness of policy learning. Employing self-supervised auxiliary tasks allows the agent to enhance its visual representation in a targeted manner, thereby improving policy performance and RL sample efficiency. Prior advanced self-supervised RL methods all seek to design better auxiliary objectives that extract more information from agent experience, while ignoring the training-data constraint imposed by the limited experience available during RL training. In this article, we make a first attempt to break through this auxiliary training-data constraint, proposing a novel RL auxiliary task named learning future representation with synthetic observations (LFS), which improves self-supervised RL by enriching the auxiliary training data. First, a novel training-free method, named frame mask, is proposed to synthesize novel observations that may contain future information. Next, the latent nearest-neighbor clip (LNC) is proposed to alleviate the impact of unqualified, noisy synthetic observations. The remaining synthetic observations then serve, together with real observations, as auxiliary training data for a clustering-based temporal association task that advances representation learning. LFS allows the agent to access and learn from observations that are not present in its current experience but will appear in future training, enabling comprehensive visual understanding and an efficient RL process. In addition, LFS does not rely on rewards or actions, giving it a wider scope of application (e.g., learning from video) than recent advanced RL auxiliary tasks. We conduct extensive experiments on challenging continuous visual control of complex embodied agents, including robot locomotion and manipulation. The results demonstrate that LFS achieves state-of-the-art sample efficiency on end-to-end RL tasks (leading on 12 of 13 tasks) and enables advanced RL visual pre-training on action-free video demonstrations (outperforming the next best method by 1.51×).
AB - Image-based reinforcement learning (RL) has proven effective for continuous visual control of embodied agents, where upstream representation learning largely determines the effectiveness of policy learning. Employing self-supervised auxiliary tasks allows the agent to enhance its visual representation in a targeted manner, thereby improving policy performance and RL sample efficiency. Prior advanced self-supervised RL methods all seek to design better auxiliary objectives that extract more information from agent experience, while ignoring the training-data constraint imposed by the limited experience available during RL training. In this article, we make a first attempt to break through this auxiliary training-data constraint, proposing a novel RL auxiliary task named learning future representation with synthetic observations (LFS), which improves self-supervised RL by enriching the auxiliary training data. First, a novel training-free method, named frame mask, is proposed to synthesize novel observations that may contain future information. Next, the latent nearest-neighbor clip (LNC) is proposed to alleviate the impact of unqualified, noisy synthetic observations. The remaining synthetic observations then serve, together with real observations, as auxiliary training data for a clustering-based temporal association task that advances representation learning. LFS allows the agent to access and learn from observations that are not present in its current experience but will appear in future training, enabling comprehensive visual understanding and an efficient RL process. In addition, LFS does not rely on rewards or actions, giving it a wider scope of application (e.g., learning from video) than recent advanced RL auxiliary tasks. We conduct extensive experiments on challenging continuous visual control of complex embodied agents, including robot locomotion and manipulation. The results demonstrate that LFS achieves state-of-the-art sample efficiency on end-to-end RL tasks (leading on 12 of 13 tasks) and enables advanced RL visual pre-training on action-free video demonstrations (outperforming the next best method by 1.51×).
KW - deep reinforcement learning (DRL)
KW - image-based RL
KW - RL for continuous control
KW - RL for embodied agents
KW - RL visual pre-training
KW - self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=105003858974&partnerID=8YFLogxK
U2 - 10.1007/s11432-024-4380-4
DO - 10.1007/s11432-024-4380-4
M3 - Article
AN - SCOPUS:105003858974
SN - 1674-733X
VL - 68
JO - Science China Information Sciences
JF - Science China Information Sciences
IS - 5
M1 - 150202
ER -