TY - GEN
T1 - GEMs-LLM
T2 - 21st International Conference on Intelligent Computing, ICIC 2025
AU - Wang, Yining
AU - Lu, Zhixiang
AU - Qian, Pin
AU - Su, Jionglong
AU - Zhou, Mian
AU - Li, Chong
AU - Jiang, Zhengyong
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - We introduce GEMs-LLM, a novel reinforcement learning framework for portfolio optimization that integrates Goal-aware Exploration and Multi-level Supervision (GEMs) with a large language model (DeepSeek-V3). Existing reinforcement learning approaches often suffer from high-dimensional state spaces, sparse rewards, and instability in financial environments. GEMs-LLM addresses these issues via a hierarchical structure: a high-level controller generates portfolio-level goals using both historical and synthetic future market data, while a low-level agent learns to execute these goals under multi-level policy supervision. To further align the agent's behavior with human trading intuition, DeepSeek-V3 is employed to simulate expert-like reasoning and refine decision outputs. GEMs-LLM supports off-policy training and removes the need for handcrafted goals, enhancing adaptability across markets. Empirical results on both the U.S. and Chinese stock markets show that GEMs-LLM significantly outperforms strong baselines, including Deep Deterministic Policy Gradient (DDPG), Oracle Policy Distillation (OPD), and pure GEMs variants. In particular, GEMs-LLM achieves the best performance in terms of annualized Sharpe ratio (ASR) and downside deviation ratio (DDR), highlighting its robustness and potential for real-world deployment.
AB - We introduce GEMs-LLM, a novel reinforcement learning framework for portfolio optimization that integrates Goal-aware Exploration and Multi-level Supervision (GEMs) with a large language model (DeepSeek-V3). Existing reinforcement learning approaches often suffer from high-dimensional state spaces, sparse rewards, and instability in financial environments. GEMs-LLM addresses these issues via a hierarchical structure: a high-level controller generates portfolio-level goals using both historical and synthetic future market data, while a low-level agent learns to execute these goals under multi-level policy supervision. To further align the agent's behavior with human trading intuition, DeepSeek-V3 is employed to simulate expert-like reasoning and refine decision outputs. GEMs-LLM supports off-policy training and removes the need for handcrafted goals, enhancing adaptability across markets. Empirical results on both the U.S. and Chinese stock markets show that GEMs-LLM significantly outperforms strong baselines, including Deep Deterministic Policy Gradient (DDPG), Oracle Policy Distillation (OPD), and pure GEMs variants. In particular, GEMs-LLM achieves the best performance in terms of annualized Sharpe ratio (ASR) and downside deviation ratio (DDR), highlighting its robustness and potential for real-world deployment.
KW - Large Language Model
KW - Oracle Policy Distillation
KW - Portfolio Optimization
KW - Reinforcement Learning
UR - https://www.scopus.com/pages/publications/105011351700
U2 - 10.1007/978-981-96-9949-0_43
DO - 10.1007/978-981-96-9949-0_43
M3 - Conference Proceeding
AN - SCOPUS:105011351700
SN - 9789819699483
T3 - Communications in Computer and Information Science
SP - 516
EP - 527
BT - Advanced Intelligent Computing Technology and Applications - 21st International Conference, ICIC 2025, Proceedings
A2 - Huang, De-Shuang
A2 - Pan, Yijie
A2 - Chen, Wei
A2 - Li, Bo
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 26 July 2025 through 29 July 2025
ER -