An online algorithm for the risk-aware restless bandit

Jianyu Xu, Lujie Chen*, Ou Tang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)


The multi-armed bandit (MAB) is a classical model for the exploration vs. exploitation trade-off. Among existing MAB models, the restless bandit model is of increasing interest because of its dynamic nature, which makes it highly applicable in practice. Like other MAB models, the traditional (risk-neutral) restless bandit model searches for the arm with the lowest mean cost and does not account for risk-aversion, which is critical in settings such as clinical trials and financial investment. This limitation hinders the application of the traditional restless bandit. Motivated by these concerns, we introduce a general risk measure satisfying a mild restriction to formulate a risk-aware restless model; in particular, we set a risk measure, instead of the expectation as in the traditional case, as the criterion for the performance of each arm. Compared with classical MAB models, our model settings accommodate risk-aware researchers and decision makers. We present an index-based online algorithm for the problem and derive an upper bound on its regret. We then show that our algorithm retains an instance-based regret of order O(log T / T), consistent with the classical MAB model. Finally, specific risk measures, namely mean-deviation, shortfall, and the discrete Kusuoka risk measure, are used to demonstrate the details of our framework.
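The abstract describes an index-based policy that ranks arms by an empirical risk measure rather than the sample mean. The sketch below is not the paper's algorithm; it is a minimal illustration of the idea for the stationary (non-restless) case, using the mean-deviation risk measure mentioned in the abstract with a lower-confidence-bound exploration bonus. The function names, the trade-off weight `lam`, and the bonus form are all illustrative assumptions.

```python
import math
import random

def mean_deviation(samples, lam=0.5):
    # Empirical mean-deviation risk of a cost sample:
    # mean + lam * mean absolute deviation (lam is an assumed weight).
    m = sum(samples) / len(samples)
    mad = sum(abs(x - m) for x in samples) / len(samples)
    return m + lam * mad

def risk_aware_index_policy(arms, horizon, lam=0.5, seed=0):
    # Index policy for cost minimization: pull the arm whose empirical
    # risk, minus a UCB-style exploration bonus, is smallest.
    rng = random.Random(seed)
    samples = [[arm(rng)] for arm in arms]  # one initial pull per arm
    pulls = [1] * len(arms)
    for t in range(len(arms), horizon):
        index = [
            mean_deviation(samples[i], lam)
            - math.sqrt(2 * math.log(t + 1) / pulls[i])
            for i in range(len(arms))
        ]
        i = min(range(len(arms)), key=lambda k: index[k])
        samples[i].append(arms[i](rng))
        pulls[i] += 1
    return pulls

# Two cost distributions with the same mean but different dispersion:
# a risk-neutral learner is indifferent; a risk-aware one prefers arm 0.
low_risk = lambda rng: rng.gauss(1.0, 0.1)
high_risk = lambda rng: rng.gauss(1.0, 2.0)

counts = risk_aware_index_policy([low_risk, high_risk], horizon=2000)
```

Note that this toy version omits the restless structure entirely: in the paper's setting each arm's cost evolves with an underlying Markov state whether or not the arm is pulled, which is what makes the index construction and regret analysis substantially harder than in the stationary case.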

Original language: English
Pages (from-to): 622-639
Number of pages: 18
Journal: European Journal of Operational Research
Issue number: 2
Publication status: Published - 16 Apr 2021


  • Markov process
  • Multi-armed bandit
  • Online optimization
  • Risk measure
  • Risk-aware

