GEMs-LLM: Integrating Large Language Models with Goal-Aware Exploration for RL-Based Portfolio Optimization

Yining Wang, Zhixiang Lu, Pin Qian, Jionglong Su, Mian Zhou, Chong Li, Zhengyong Jiang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

We introduce GEMs-LLM, a novel reinforcement learning framework for portfolio optimization that integrates Goal-aware Exploration and Multi-level Supervision (GEMs) with a large language model (DeepSeek-V3). Existing reinforcement learning approaches often suffer from high-dimensional state spaces, sparse rewards, and instability in financial environments. GEMs-LLM addresses these issues via a hierarchical structure: a high-level controller generates portfolio-level goals using both historical and synthetic future market data, while a low-level agent learns to execute these goals via multi-level policy supervision. To further align the agent's behavior with human trading intuition, DeepSeek-V3 is employed to simulate expert-like reasoning and refine decision outputs. GEMs-LLM supports off-policy training and removes the need for handcrafted goals, enhancing adaptability across markets. Empirical results on both U.S. and Chinese stock markets show that GEMs-LLM significantly outperforms strong baselines including Deep Deterministic Policy Gradient (DDPG), Oracle Policy Distillation (OPD), and pure GEMs variants. In particular, GEMs-LLM achieves the best performance in annualized Sharpe ratio (ASR) and downside deviation ratio (DDR), highlighting its robustness and potential for real-world deployment.
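The abstract reports the annualized Sharpe ratio (ASR) and downside deviation ratio (DDR) as the headline evaluation metrics. As a minimal illustrative sketch (not the paper's code — the exact definitions, risk-free rate, and annualization factor used in the paper may differ), these metrics can be computed from a daily return series as follows:

```python
import numpy as np

def annualized_sharpe(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return over its standard
    deviation, scaled by the square root of periods per year."""
    excess = np.asarray(returns, dtype=float) - risk_free
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

def downside_deviation_ratio(returns, periods_per_year=252, mar=0.0):
    """Sortino-style downside deviation ratio: mean excess return over
    the root-mean-square of returns below the minimum acceptable
    return (MAR), annualized the same way."""
    excess = np.asarray(returns, dtype=float) - mar
    downside = np.minimum(excess, 0.0)          # keep only sub-MAR returns
    dd = np.sqrt(np.mean(downside ** 2))        # downside deviation
    return np.sqrt(periods_per_year) * excess.mean() / dd
```

Because the DDR penalizes only sub-target returns, a strategy with rare, shallow drawdowns scores markedly higher on DDR than on ASR for the same mean return.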

Original language: English
Title of host publication: Advanced Intelligent Computing Technology and Applications - 21st International Conference, ICIC 2025, Proceedings
Editors: De-Shuang Huang, Yijie Pan, Wei Chen, Bo Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 516-527
Number of pages: 12
ISBN (Print): 9789819699483
DOIs
Publication status: Published - 2025
Event: 21st International Conference on Intelligent Computing, ICIC 2025 - Ningbo, China
Duration: 26 Jul 2025 - 29 Jul 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2566 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 21st International Conference on Intelligent Computing, ICIC 2025
Country/Territory: China
City: Ningbo
Period: 26/07/25 - 29/07/25

Keywords

  • Large Language Model
  • Oracle Policy Distillation
  • Portfolio Optimization
  • Reinforcement Learning

