Deterministic sequencing of exploration and exploitation for multi-armed bandit problems

Sattar Vakili, Keqin Liu, Qing Zhao

Research output: Contribution to journalArticlepeer-review

78 Citations (Scopus)

Abstract

In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length $T$. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves O(T1/p) regret when the moments of the reward distributions exist up to the pth order for 1 < p 2 and O (T1/(1+p/2)) for p > 2. With the knowledge of an upperbound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, with a clearly defined tunable parameter-the cardinality of the exploration sequence, the DSEE approach is easily extendable to variations of MAB, including MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, restless MAB with unknown dynamics, and combinatorial MAB with dependent arms that often arise in network optimization problems such as the shortest path, the minimum sp anning tree, and the dominating set problems under unknown random weights.

Original languageEnglish
Article number6516952
Pages (from-to)759-767
Number of pages9
JournalIEEE Journal on Selected Topics in Signal Processing
Volume7
Issue number5
DOIs
Publication statusPublished - 2013

Keywords

  • combinatorial multi-armed bandit
  • decentralized multi-armed bandit
  • deterministic sequencing of exploration and exploitation
  • Multi-armed bandit
  • regret
  • restless multi-armed bandit

Fingerprint

Dive into the research topics of 'Deterministic sequencing of exploration and exploitation for multi-armed bandit problems'. Together they form a unique fingerprint.

Cite this