Deep Reinforcement Learning-based Portfolio Optimization with Black-Litterman Model under Elliptical Distributions

Daniil Mikriukov, Ruoyu Sun, Zhengyong Jiang^*

^*Corresponding author for this work

School of AI and Advanced Computing

Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

This paper introduces a novel hybrid portfolio optimization framework that integrates deep reinforcement learning (DRL) with the Black-Litterman (BL) model under elliptical distributions. Traditional portfolio optimization methods often fail to adapt to non-stationary environments with high volatility and heavy-tailed asset return distributions, while standard DRL implementations struggle to effectively model complex statistical properties of financial returns. Our framework, BLED (Black-Litterman under Elliptical Distributions), utilizes the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with transformer-based architectures for view generation and CNN-based risk aversion estimation to address these limitations. We incorporate elliptical distributions to more accurately model heavy-tailed asset returns, substantially improving risk assessment and portfolio allocation decisions. Empirical evaluation on Dow Jones Industrial Average constituent stocks demonstrates that our approach significantly outperforms both traditional portfolio optimization strategies and state-of-the-art DRL methods, achieving 76.1% accumulated returns compared to around 44% for the best baseline models. Risk-adjusted performance metrics show even more pronounced advantages, with Sharpe and Sortino ratios approximately 46% higher than top-performing baselines. Statistical analysis using the non-parametric Kruskal-Wallis test across multiple time periods confirms the significance of our performance improvements (p=0.030 for returns), demonstrating that BLED's advantages are robust across varying market conditions.
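For readers unfamiliar with the Black-Litterman step that the abstract builds on, the following is a minimal sketch of the classical (Gaussian) single-view update, not the elliptical extension or the BLED implementation itself. The function names, the two-asset covariance matrix, and the values of `delta`, `tau`, and `omega` are illustrative assumptions. In BLED, according to the abstract, the views would come from a transformer-based model and the risk aversion from a CNN-based estimator; here both are set by hand.

```python
# Classical Black-Litterman posterior mean for a single view, standard library only.
# All inputs below are hypothetical; this is not the paper's elliptical variant.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def implied_returns(sigma, w_mkt, delta):
    """Equilibrium returns pi = delta * Sigma * w_mkt."""
    return [delta * x for x in matvec(sigma, w_mkt)]

def bl_posterior_single_view(sigma, pi, p, q, omega, tau):
    """Posterior mean for one view p.r = q with view variance omega.

    mu = pi + tau*Sigma*p * (q - p.pi) / (p.(tau*Sigma)*p + omega)
    Sigma is assumed symmetric.
    """
    ts_p = [tau * x for x in matvec(sigma, p)]        # tau * Sigma * p
    gain = (q - dot(p, pi)) / (dot(p, ts_p) + omega)  # scalar because there is one view
    return [m + g * gain for m, g in zip(pi, ts_p)]

# Hypothetical two-asset example.
sigma = [[0.04, 0.01], [0.01, 0.09]]  # assumed covariance of returns
w_mkt = [0.6, 0.4]                    # assumed market-capitalization weights
delta, tau = 2.5, 0.05                # assumed risk aversion and prior scaling

pi = implied_returns(sigma, w_mkt, delta)
# View: asset 0 outperforms asset 1 by 2%, held with variance 0.0055.
mu = bl_posterior_single_view(sigma, pi, [1, -1], 0.02, 0.0055, tau)
print("implied:", pi)
print("posterior:", mu)
```

With a single view the matrix inverse in the general formula reduces to a scalar division, which keeps the sketch dependency-free. When the view variance `omega` equals the prior variance of the view portfolio, as in this example, the posterior return spread lands halfway between the equilibrium spread and the stated view.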

Original language	English
Title of host publication	2025 21st International Conference on Intelligent Computing
Publication status	Accepted/In press - 2025

Cite this

@inproceedings{5a5762ac11774338a0b94d5da2f33124,

title = "Deep Reinforcement Learning-based Portfolio Optimization with Black-Litterman Model under Elliptical Distributions",

abstract = "This paper introduces a novel hybrid portfolio optimization framework that integrates deep reinforcement learning (DRL) with the Black-Litterman (BL) model under elliptical distributions. Traditional portfolio optimization methods often fail to adapt to non-stationary environments with high volatility and heavy-tailed asset return distributions, while standard DRL implementations struggle to effectively model complex statistical properties of financial returns. Our framework, BLED (Black-Litterman under Elliptical Distributions), utilizes the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with transformer-based architectures for view generation and CNN-based risk aversion estimation to address these limitations. We incorporate elliptical distributions to more accurately model heavy-tailed asset returns, substantially improving risk assessment and portfolio allocation decisions. Empirical evaluation on Dow Jones Industrial Average constituent stocks demonstrates that our approach significantly outperforms both traditional portfolio optimization strategies and state-of-the-art DRL methods, achieving 76.1\% accumulated returns compared to around 44\% for the best baseline models. Risk-adjusted performance metrics show even more pronounced advantages, with Sharpe and Sortino ratios approximately 46\% higher than top-performing baselines. Statistical analysis using the non-parametric Kruskal-Wallis test across multiple time periods confirms the significance of our performance improvements (p=0.030 for returns), demonstrating that BLED's advantages are robust across varying market conditions.",

author = "Daniil Mikriukov and Ruoyu Sun and Zhengyong Jiang",

year = "2025",

language = "English",

booktitle = "2025 21st International Conference on Intelligent Computing",

}

TY - GEN

T1 - Deep Reinforcement Learning-based Portfolio Optimization with Black-Litterman Model under Elliptical Distributions

AU - Mikriukov, Daniil

AU - Sun, Ruoyu

AU - Jiang, Zhengyong

PY - 2025

Y1 - 2025

N2 - This paper introduces a novel hybrid portfolio optimization framework that integrates deep reinforcement learning (DRL) with the Black-Litterman (BL) model under elliptical distributions. Traditional portfolio optimization methods often fail to adapt to non-stationary environments with high volatility and heavy-tailed asset return distributions, while standard DRL implementations struggle to effectively model complex statistical properties of financial returns. Our framework, BLED (Black-Litterman under Elliptical Distributions), utilizes the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with transformer-based architectures for view generation and CNN-based risk aversion estimation to address these limitations. We incorporate elliptical distributions to more accurately model heavy-tailed asset returns, substantially improving risk assessment and portfolio allocation decisions. Empirical evaluation on Dow Jones Industrial Average constituent stocks demonstrates that our approach significantly outperforms both traditional portfolio optimization strategies and state-of-the-art DRL methods, achieving 76.1% accumulated returns compared to around 44% for the best baseline models. Risk-adjusted performance metrics show even more pronounced advantages, with Sharpe and Sortino ratios approximately 46% higher than top-performing baselines. Statistical analysis using the non-parametric Kruskal-Wallis test across multiple time periods confirms the significance of our performance improvements (p=0.030 for returns), demonstrating that BLED's advantages are robust across varying market conditions.

M3 - Conference Proceeding

BT - 2025 21st International Conference on Intelligent Computing

ER -
