Structured products dynamic hedging based on reinforcement learning

Hao Xu, Cheng Xu, He Yan, Yanqi Sun*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


In the Black–Scholes model proposed in 1973, an investor can hedge the risk of an option with a continuously rebalanced dynamic strategy, assuming that the underlying asset’s price follows geometric Brownian motion (a continuous-time stochastic process in which the logarithm of the variable follows a Brownian motion with drift) and that the market is complete and frictionless. These assumptions are unrealistic: asset prices change continuously, while in practice a hedge can only be rebalanced at discrete times. Reinforcement learning (RL) has been applied in finance to a variety of decision-making problems, such as hedging, optimal execution, and portfolio optimization. Compared with other decision-making approaches used in finance, such as stochastic control theory, RL can make full use of historical data or generate additional data. By balancing exploration and exploitation, it requires fewer assumptions and can perform better. In this article, we propose an RL-based model that helps investors dynamically hedge financial products in discrete time, using a complex structured product as an example: the Phoenix option (a note that pays a coupon only if the price of the underlying asset is above a certain barrier and redeems early if the price breaches an autocall barrier). The model is highly extensible: its objective function can be set according to the investor’s preferences, for example the Sharpe ratio (a measure of risk-adjusted return that compares the return of an investment with its risk). It is also very lightweight because we do not assume the existence of an optimal hedging strategy.
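The three ingredients the abstract defines in words (geometric Brownian motion, the Phoenix payoff, and the Sharpe ratio) can be illustrated with a minimal Python sketch. This is not the authors' model or code. The function names, parameter values, and the simplified payoff are assumptions made for illustration: every step of the path is treated as an observation date, the notional is repaid at autocall or at maturity, and any knock-in or principal-loss feature of a real Phoenix note is ignored.

```python
import math
import random
import statistics


def simulate_gbm_path(s0, mu, sigma, dt, n_steps, rng):
    """Sample one GBM path on a discrete grid using the exact log-normal step."""
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        # Log-price increment: drift corrected by -sigma^2/2, plus a Gaussian shock.
        log_step = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(log_step))
    return path


def phoenix_cash_flows(path, coupon_barrier, autocall_barrier, coupon, notional=1.0):
    """Cash flows of a simplified Phoenix note (hypothetical terms, see lead-in).

    path[0] is the initial price; path[1:] are the observation dates.
    Returns a list of (observation index, cash paid) pairs.
    """
    flows = []
    last = len(path) - 1
    for t in range(1, len(path)):
        price = path[t]
        # The coupon is paid only when the price is at or above the coupon barrier.
        cash = coupon * notional if price >= coupon_barrier else 0.0
        # Early redemption at the autocall barrier, otherwise redemption at maturity.
        if price >= autocall_barrier or t == last:
            flows.append((t, cash + notional))
            break
        flows.append((t, cash))
    return flows


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0.0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return statistics.mean(excess) / spread


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is reproducible
    path = simulate_gbm_path(s0=100.0, mu=0.05, sigma=0.2, dt=1 / 12, n_steps=12, rng=rng)
    flows = phoenix_cash_flows(path, coupon_barrier=90.0, autocall_barrier=103.0, coupon=0.01)
    step_returns = [b / a - 1.0 for a, b in zip(path, path[1:])]
    print(flows)
    print(sharpe_ratio(step_returns))
```

In an RL setting of the kind the abstract describes, a quantity such as `sharpe_ratio` applied to the hedged portfolio's per-period returns could serve as the objective. How the article actually constructs its state, action, and reward is not stated in the abstract and is not implied by this sketch.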

Original language: English
Pages (from-to): 12285-12295
Number of pages: 11
Journal: Journal of Ambient Intelligence and Humanized Computing
Issue number: 9
Publication status: Published - 2023


  • Dynamic hedging
  • Reinforcement learning
  • Structured products

