Enhancing Reinforcement Learning via Transformer-Based State Predictive Representations

Minsong Liu, Yuanheng Zhu*, Yaran Chen, Dongbin Zhao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Enhancing state representations can effectively mitigate the issue of low sample efficiency in reinforcement learning (RL) within high-dimensional input environments. Existing methods attempt to improve sample efficiency by learning predictive state representations from sequence data. However, significant challenges remain in comprehensively understanding and learning the information contained in long sequences. Motivated by this, we introduce a transformer-based state predictive representations (TSPR) auxiliary task (our code will be released at https://github.com/gourmet-liu/TSPR) that promotes better representation learning through self-supervised goals. Specifically, we design a transformer-based predictive model to establish unidirectional and bidirectional prediction tasks for predicting state representations within the latent space. TSPR effectively exploits contextual information within sequences to learn more informative state representations, thereby contributing to the enhancement of policy training in RL. Extensive experiments demonstrate that combining TSPR with off-policy RL algorithms leads to a substantial improvement in the sample efficiency of RL. Furthermore, TSPR outperforms state-of-the-art sample-efficient RL methods on multiple continuous control (DMControl) and discrete control (Atari) tasks.
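The abstract does not include implementation details, so the following is only a minimal PyTorch sketch of the general idea it describes: a transformer that predicts state representations in latent space both unidirectionally (causal, next-step prediction) and bidirectionally (masked-position prediction), used as a self-supervised auxiliary loss alongside an RL objective. The class name `TSPRAuxiliary`, the MLP encoder over low-dimensional states, the MSE losses, and the `mask_ratio` parameter are illustrative assumptions, not the authors' implementation (which targets pixel inputs for DMControl and Atari).

```python
# Sketch of a TSPR-style auxiliary task (illustrative, not the authors' code):
# a transformer predicts latent state representations unidirectionally
# (next step, causal mask) and bidirectionally (masked positions), and the
# resulting self-supervised loss is added to the RL loss.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TSPRAuxiliary(nn.Module):
    def __init__(self, state_dim, latent_dim=64, n_heads=4, n_layers=2):
        super().__init__()
        # Online encoder maps raw states to latent representations
        # (an MLP here for simplicity; the paper uses CNN encoders for pixels).
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        layer = nn.TransformerEncoderLayer(
            d_model=latent_dim, nhead=n_heads, dim_feedforward=256, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.predictor = nn.Linear(latent_dim, latent_dim)
        # Learned embedding that replaces masked positions in the bidirectional task.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, latent_dim))

    def _targets(self, states):
        # Stop-gradient targets; a momentum (EMA) target encoder is another common choice.
        with torch.no_grad():
            return self.encoder(states)

    def unidirectional_loss(self, states):
        # Predict the latent of step t+1 from latents of steps <= t (causal mask).
        z = self.encoder(states)                          # (B, T, D)
        T = z.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=z.device), 1)
        ctx = self.transformer(z, mask=causal)
        pred = self.predictor(ctx[:, :-1])                # predictions for steps 1..T-1
        tgt = self._targets(states)[:, 1:]
        return F.mse_loss(pred, tgt)

    def bidirectional_loss(self, states, mask_ratio=0.25):
        # Randomly mask positions and reconstruct their latents from full context.
        z = self.encoder(states)                          # (B, T, D)
        B, T, D = z.shape
        masked = torch.rand(B, T, device=z.device) < mask_ratio
        z_in = torch.where(masked.unsqueeze(-1), self.mask_token.expand(B, T, D), z)
        ctx = self.transformer(z_in)                      # no causal mask: both directions
        pred = self.predictor(ctx)
        tgt = self._targets(states)
        return F.mse_loss(pred[masked], tgt[masked])

    def forward(self, states):
        # Combined self-supervised auxiliary loss, added to the RL objective.
        return self.unidirectional_loss(states) + self.bidirectional_loss(states)


if __name__ == "__main__":
    aux = TSPRAuxiliary(state_dim=17)                     # e.g. a proprioceptive state vector
    seq = torch.randn(8, 10, 17)                          # batch of length-10 state sequences
    print(aux(seq).item())
```

In an actual training loop this auxiliary loss would be computed on sequences sampled from the replay buffer and added (with a weighting coefficient) to the off-policy RL loss, so the encoder is shaped jointly by the policy/value objective and the predictive representation objective.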

Original language: English
Pages (from-to): 4364-4375
Number of pages: 12
Journal: IEEE Transactions on Artificial Intelligence
Volume: 5
Issue number: 9
DOIs
Publication status: Published - 2024

Keywords

  • Reinforcement learning (RL)
  • representation learning
  • self-supervised learning
  • transformer
