TY - JOUR
T1 - A novel multi-agent dynamic portfolio optimization learning system based on hierarchical deep reinforcement learning
AU - Sun, Ruoyu
AU - Xi, Yue
AU - Stefanidis, Angelos
AU - Jiang, Zhengyong
AU - Su, Jionglong
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/7
Y1 - 2025/7
N2 - Deep reinforcement learning (DRL) has been extensively used to address portfolio optimization problems. DRL agents acquire knowledge and make decisions through unsupervised interactions with their environment without requiring explicit knowledge of the joint dynamics of portfolio assets. Among DRL algorithms, the combination of actor-critic methods and deep function approximators is the most widely used. Here, we find that training a DRL agent with actor-critic algorithms and deep function approximators may lead to scenarios in which the improvement in the agent's risk-adjusted profitability is insignificant. We argue that such situations primarily arise from two problems: sparse positive rewards and the curse of dimensionality. These limitations prevent DRL agents from comprehensively learning asset price change patterns in the training environment. As a result, the agents cannot effectively explore dynamic portfolio optimization policies that improve risk-adjusted profitability during training. To address these problems, we propose a novel multi-agent learning system based on a hierarchical deep reinforcement learning (HDRL) algorithmic framework. Under this framework, the agents work together as a learning system for portfolio optimization. Specifically, by designing an auxiliary agent that works with the executive agent for optimal policy exploration, the learning system can focus on exploring policies with higher risk-adjusted returns in the region of the action space with positive returns and low variance. The performance of the proposed learning system is evaluated on a portfolio of 29 stocks from the Dow Jones index in four different experiments. During training, the objective functions of both the actor and the critic ultimately achieve stable convergence, and the risk-adjusted profitability of our learning system in the training environment is significantly improved. Hence, we demonstrate that the policies executed by our learning system in out-of-sample experiments originate from the DRL agents' comprehensive learning of asset price change patterns in the training environment. Furthermore, we find that adopting the auxiliary agent and the HDRL training algorithm can efficiently overcome the curse of dimensionality and improve training efficiency in a sparse positive-reward environment. In each back-test experiment, the proposed learning system is compared with sixteen traditional strategies and ten machine-learning-based strategies in terms of profitability and risk control. The empirical results in the four evaluation experiments demonstrate the efficacy of our learning system, which outperforms all other strategies by at least 8.2% in terms of the Sharpe ratio, Sortino ratio, and Calmar ratio. This indicates that the policies learned in the training environment exhibit excellent generalization ability in the back-testing experiments.
AB - Deep reinforcement learning (DRL) has been extensively used to address portfolio optimization problems. DRL agents acquire knowledge and make decisions through unsupervised interactions with their environment without requiring explicit knowledge of the joint dynamics of portfolio assets. Among DRL algorithms, the combination of actor-critic methods and deep function approximators is the most widely used. Here, we find that training a DRL agent with actor-critic algorithms and deep function approximators may lead to scenarios in which the improvement in the agent's risk-adjusted profitability is insignificant. We argue that such situations primarily arise from two problems: sparse positive rewards and the curse of dimensionality. These limitations prevent DRL agents from comprehensively learning asset price change patterns in the training environment. As a result, the agents cannot effectively explore dynamic portfolio optimization policies that improve risk-adjusted profitability during training. To address these problems, we propose a novel multi-agent learning system based on a hierarchical deep reinforcement learning (HDRL) algorithmic framework. Under this framework, the agents work together as a learning system for portfolio optimization. Specifically, by designing an auxiliary agent that works with the executive agent for optimal policy exploration, the learning system can focus on exploring policies with higher risk-adjusted returns in the region of the action space with positive returns and low variance. The performance of the proposed learning system is evaluated on a portfolio of 29 stocks from the Dow Jones index in four different experiments. During training, the objective functions of both the actor and the critic ultimately achieve stable convergence, and the risk-adjusted profitability of our learning system in the training environment is significantly improved. Hence, we demonstrate that the policies executed by our learning system in out-of-sample experiments originate from the DRL agents' comprehensive learning of asset price change patterns in the training environment. Furthermore, we find that adopting the auxiliary agent and the HDRL training algorithm can efficiently overcome the curse of dimensionality and improve training efficiency in a sparse positive-reward environment. In each back-test experiment, the proposed learning system is compared with sixteen traditional strategies and ten machine-learning-based strategies in terms of profitability and risk control. The empirical results in the four evaluation experiments demonstrate the efficacy of our learning system, which outperforms all other strategies by at least 8.2% in terms of the Sharpe ratio, Sortino ratio, and Calmar ratio. This indicates that the policies learned in the training environment exhibit excellent generalization ability in the back-testing experiments.
KW - Hierarchical deep reinforcement learning
KW - Learning system
KW - Multi-agent
KW - Portfolio optimization
UR - http://www.scopus.com/inward/record.url?scp=105006910484&partnerID=8YFLogxK
U2 - 10.1007/s40747-025-01884-y
DO - 10.1007/s40747-025-01884-y
M3 - Article
AN - SCOPUS:105006910484
SN - 2199-4536
VL - 11
JO - Complex and Intelligent Systems
JF - Complex and Intelligent Systems
IS - 7
M1 - 311
ER -