TY - JOUR
T1 - Enhancing Reinforcement Learning via Transformer-Based State Predictive Representations
AU - Liu, Minsong
AU - Zhu, Yuanheng
AU - Chen, Yaran
AU - Zhao, Dongbin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Enhancing state representations can effectively mitigate the issue of low sample efficiency in reinforcement learning (RL) within high-dimensional input environments. Existing methods attempt to improve sample efficiency by learning predictive state representations from sequence data. However, significant challenges remain in comprehensively understanding and learning the information within long sequences. Motivated by this, we introduce a transformer-based state predictive representations (TSPR) auxiliary task that promotes better representation learning through self-supervised objectives. (Our code will be released at https://github.com/gourmet-liu/TSPR.) Specifically, we design a transformer-based predictive model that establishes unidirectional and bidirectional prediction tasks for predicting state representations within the latent space. TSPR effectively exploits contextual information within sequences to learn more informative state representations, thereby enhancing policy training in RL. Extensive experiments demonstrate that combining TSPR with off-policy RL algorithms substantially improves the sample efficiency of RL. Furthermore, TSPR outperforms state-of-the-art sample-efficient RL methods on both continuous control (DMControl) and discrete control (Atari) tasks.
KW - Reinforcement learning (RL)
KW - representation learning
KW - self-supervised learning
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85188947156&partnerID=8YFLogxK
U2 - 10.1109/TAI.2024.3379969
DO - 10.1109/TAI.2024.3379969
M3 - Article
AN - SCOPUS:85188947156
SN - 2691-4581
VL - 5
SP - 4364
EP - 4375
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
IS - 9
ER -