TY - JOUR
T1 - An online algorithm for the risk-aware restless bandit
AU - Xu, Jianyu
AU - Chen, Lujie
AU - Tang, Ou
N1 - Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2021/4/16
Y1 - 2021/4/16
N2 - The multi-armed bandit (MAB) is a classical model for the exploration vs. exploitation trade-off. Among existing MAB models, the restless bandit model is of increasing interest because of its dynamic nature, which makes it highly applicable in practice. Like other MAB models, the traditional (risk-neutral) restless bandit model searches for the arm with the lowest mean cost and does not account for risk aversion, which is critical in settings such as clinical trials and financial investment. This limitation hinders the application of the traditional restless bandit. Motivated by these concerns, we introduce a general risk measure satisfying a mild restriction to formulate a risk-aware restless bandit model; in particular, we adopt a risk measure, rather than the expectation used in the traditional case, as the performance criterion for each arm. Compared with classical MAB models, this setting accommodates risk-aware researchers and decision makers. We present an index-based online algorithm for the problem and derive an upper bound on its regret. This bound shows that the algorithm attains an instance-based regret of order O(log T/T), consistent with the classical MAB model. Finally, specific risk measures, namely mean-deviation, shortfall, and the discrete Kusuoka risk measure, are used to demonstrate the details of our framework.
KW - Markov process
KW - Multi-armed bandit
KW - Online optimization
KW - Risk measure
KW - Risk-aware
UR - http://www.scopus.com/inward/record.url?scp=85091004398&partnerID=8YFLogxK
U2 - 10.1016/j.ejor.2020.08.028
DO - 10.1016/j.ejor.2020.08.028
M3 - Article
AN - SCOPUS:85091004398
SN - 0377-2217
VL - 290
SP - 622
EP - 639
JO - European Journal of Operational Research
JF - European Journal of Operational Research
IS - 2
ER -