Multi-armed bandit problems with heavy-tailed reward distributions

Keqin Liu*, Qing Zhao

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

20 Citations (Scopus)

Abstract

In the Multi-Armed Bandit (MAB) problem, a player selects one arm out of a set of arms to play at each time without knowing the arm reward statistics. The essence of the problem is the tradeoff between exploration and exploitation: playing a less explored arm to learn its reward statistics, or playing the arm that appears to be the best. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that when the moment-generating functions of the arm reward distributions are properly bounded, the optimal logarithmic order of the regret can be achieved by DSEE. The condition on the reward distributions can be gradually relaxed at the cost of a higher (nevertheless sublinear) regret order: for any positive integer p, O(T^{1/p}) regret can be achieved by DSEE when the moments of the reward distributions exist (only) up to the p-th order. The proposed DSEE approach complements existing work on MAB by providing corresponding results under a set of relaxed conditions on the reward distributions. Furthermore, with a clearly defined tunable parameter - the cardinality of the exploration sequence - the DSEE approach is easily extendable to variations of MAB, including decentralized MAB with partial reward observations and restless MAB with unknown Markov dynamics. Potential applications include dynamic spectrum access, multi-agent systems, Internet advertising, and Web search.
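The abstract describes the DSEE structure only at a high level. The following minimal Python sketch illustrates the general idea under stated assumptions: a deterministic exploration sequence (here, round-robin over the arms, with cardinality growing roughly as a·log t in the light-tailed case or a·t^{1/p} in the heavy-tailed case) is interleaved with exploitation of the arm with the best sample mean. The function name dsee_run, the constant a, and the particular exploration schedule are illustrative assumptions and not the paper's exact construction.

```python
import math
import random


def dsee_run(arms, horizon, a=1.0, p=None):
    """Minimal DSEE-style policy sketch (not the paper's exact schedule).

    arms    : list of callables, each returning a random reward when played
    horizon : number of time slots T
    a       : tunable constant scaling the exploration-sequence cardinality
    p       : None for the light-tailed case (log(t) exploration budget);
              a positive integer for the heavy-tailed case (t**(1/p) budget)
    """
    n_arms = len(arms)
    counts = [0] * n_arms      # exploration plays per arm
    means = [0.0] * n_arms     # sample means built from exploration plays only
    explored = 0               # exploration slots used so far
    total_reward = 0.0

    for t in range(1, horizon + 1):
        # Target cardinality of the exploration sequence up to time t
        # (illustrative schedule; the paper's constants and epochs differ).
        if p is None:
            budget = math.ceil(a * math.log(t + 1))
        else:
            budget = math.ceil(a * t ** (1.0 / p))

        if explored < budget:
            # Exploration slot: cycle through the arms in a fixed order.
            arm = explored % n_arms
            reward = arms[arm]()
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]
            explored += 1
        else:
            # Exploitation slot: play the arm with the largest sample mean.
            arm = max(range(n_arms), key=lambda i: means[i])
            reward = arms[arm]()

        total_reward += reward

    return total_reward, means


if __name__ == "__main__":
    rng = random.Random(1)
    # Two Gaussian arms with means 0.4 and 0.6 (light-tailed example).
    arms = [lambda: rng.gauss(0.4, 1.0), lambda: rng.gauss(0.6, 1.0)]
    total, est = dsee_run(arms, horizon=10_000, a=5.0)
    print("total reward:", round(total, 1), "estimated means:", est)
```

The single tunable quantity here is the exploration budget, mirroring the abstract's point that the cardinality of the exploration sequence is the parameter that trades off regret order against the moment conditions on the rewards.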

Original language: English
Title of host publication: 2011 49th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2011
Pages: 485-492
Number of pages: 8
DOIs
Publication status: Published - 2011
Event: 2011 49th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2011 - Monticello, IL, United States
Duration: 28 Sept 2011 → 30 Sept 2011

Publication series

Name: 2011 49th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2011

Conference

Conference: 2011 49th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2011
Country/Territory: United States
City: Monticello, IL
Period: 28/09/11 → 30/09/11
