TY - JOUR
T1 - An online algorithm for the risk-aware restless bandit
AU - Xu, Jianyu
AU - Chen, Lujie
AU - Tang, Ou
N1 - Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2021/4/16
Y1 - 2021/4/16
N2 - The multi-armed bandit (MAB) is a classical model for the exploration vs. exploitation trade-off. Among existing MAB models, the restless bandit model is of increasing interest because of its dynamic nature, which makes it highly applicable in practice. Like other MAB models, the traditional (risk-neutral) restless bandit model searches for the arm with the lowest mean cost and does not account for risk aversion, which is critical in settings such as clinical trials and financial investment. This limitation hinders the application of the traditional restless bandit. Motivated by these concerns, we introduce a general risk measure satisfying a mild restriction to formulate a risk-aware restless bandit model; in particular, we adopt a risk measure, rather than the expectation used in the traditional case, as the performance criterion for each arm. Compared with classical MAB models, this setting accommodates risk-aware researchers and decision makers. We present an index-based online algorithm for the problem and derive an upper bound on its regret. This bound shows that the algorithm attains an instance-based regret of order O(log T/T), consistent with the classical MAB model. Finally, specific risk measures, namely mean-deviation, shortfall, and the discrete Kusuoka risk measure, are used to demonstrate the details of our framework.
KW - Markov process
KW - Multi-armed bandit
KW - Online optimization
KW - Risk measure
KW - Risk-aware
UR - http://www.scopus.com/inward/record.url?scp=85091004398&partnerID=8YFLogxK
U2 - 10.1016/j.ejor.2020.08.028
DO - 10.1016/j.ejor.2020.08.028
M3 - Article
AN - SCOPUS:85091004398
SN - 0377-2217
VL - 290
SP - 622
EP - 639
JO - European Journal of Operational Research
JF - European Journal of Operational Research
IS - 2
ER -