Whittle index for restless bandits with expanding state spaces

Research output: Contribution to journalArticle

Abstract

We investigate Whittle index policy for a restless bandit model whose state
space may be enlarged under passive actions. This model arises in many important applications and extends the classical model introduced by Whittle in 1988. In the classical
model, one chooses a subset of arms to play at each time and accrue certain reward determined by the states of all arms and the subset of chosen arms. The state of each arm
evolves according to a Markov process whose parameters (transition matrix) may depend
on whether or not the arm is selected. The objective is to maximize the time-average reward over long-term. Weber and Weiss in 1990 proved the asymptotic optimality of Whittle index under a sufficient condition for the classical model, where Whittle’s indexability
was required. In this paper, we extend Whittle index to the general model as considered
here. Our extension is based on policy continuation and tie-breaking ordering of Whittle
index when new states join the system. By requiring a positive recurrent sub-state-space
and boundedness of immediate rewards, we show that randomization can achieve optimality under Whittle’s relaxed constraint. We further analyze the fluid dynamics of our model
and show that the asymptotic optimality of Whittle index under the strict constraint can
also be extended.
Original languageEnglish
Article number19090
Pages (from-to)372-384
Number of pages13
JournalNumerical Mathematics: A Journal of Chinese Universities
Volume42
Issue number4
Publication statusPublished - 2020

Keywords

  • restless multi-armed bandits
  • Whittle index
  • state expansion
  • policy continuation
  • optimality under relaxed constraints
  • fluid model
  • asymptotic optimality

Fingerprint

Dive into the research topics of 'Whittle index for restless bandits with expanding state spaces'. Together they form a unique fingerprint.

Cite this