Abstract
We introduce GEMs-LLM, a novel reinforcement learning framework for portfolio optimization that integrates Goal-aware Exploration and Multi-level Supervision (GEMs) with a large language model (DeepSeek-V3). Existing reinforcement learning approaches often suffer from high-dimensional state spaces, sparse rewards, and instability in financial environments. GEMs-LLM addresses these issues via a hierarchical structure: a high-level controller generates portfolio-level goals using both historical and synthetic future market data, while a low-level agent learns to execute these goals via multi-level policy supervision. To further align the agent's behavior with human trading intuition, DeepSeek-V3 is employed to simulate expert-like reasoning and refine decision outputs. GEMs-LLM supports off-policy training and removes the need for handcrafted goals, enhancing adaptability across markets. Empirical results on both U.S. and Chinese stock markets show that GEMs-LLM significantly outperforms strong baselines including Deep Deterministic Policy Gradient (DDPG), Oracle Policy Distillation (OPD), and pure GEMs variants. In particular, GEMs-LLM achieves the best performance in annualized Sharpe ratio (ASR) and downside deviation ratio (DDR), highlighting its robustness and potential for real-world deployment.
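The abstract describes a two-level design in which a high-level controller emits portfolio-level goals and a goal-conditioned low-level agent turns them into allocations. The following is a minimal PyTorch sketch of that goal-conditioning interface only; the class names `HighLevelController` and `LowLevelPolicy`, the network sizes, and the softmax allocation head are illustrative assumptions, not the paper's implementation, and the multi-level supervision, off-policy training, and DeepSeek-V3 refinement steps are omitted.

```python
# Minimal sketch of a hierarchical goal-conditioned portfolio agent.
# All class and method names are illustrative assumptions; this is not
# the paper's GEMs-LLM implementation.
import torch
import torch.nn as nn


class HighLevelController(nn.Module):
    """Maps an encoded market state to a portfolio-level goal vector."""

    def __init__(self, state_dim: int, goal_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, goal_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class LowLevelPolicy(nn.Module):
    """Maps (state, goal) to portfolio weights; softmax keeps them on the simplex."""

    def __init__(self, state_dim: int, goal_dim: int, n_assets: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_assets),
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        logits = self.net(torch.cat([state, goal], dim=-1))
        return torch.softmax(logits, dim=-1)


if __name__ == "__main__":
    state_dim, goal_dim, n_assets = 64, 8, 10
    controller = HighLevelController(state_dim, goal_dim)
    policy = LowLevelPolicy(state_dim, goal_dim, n_assets)

    state = torch.randn(1, state_dim)   # stand-in for an encoded market state
    goal = controller(state)            # high-level portfolio goal
    weights = policy(state, goal)       # low-level allocation decision
    print(weights.sum().item())         # ~1.0: weights form a valid portfolio
```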
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | 2025 21st International Conference on Intelligent Computing |
| Publisher | Springer Nature Singapore |
| Pages | 516-527 |
| Publication status | Published - Jul 2025 |