Decentralized multi-armed bandit with imperfect observations

Keqin Liu*, Qing Zhao, Bhaskar Krishnamachari

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceedingConference Proceedingpeer-review

12 Citations (Scopus)

Abstract

We consider decentralized multi-armed bandit problems with multiple distributed players. At each time, each player chooses one of the N independent arms with unknown reward statistics to play. Players do not exchange information regarding their observations or actions. A collision occurs when multiple players choose the same arm. In this case, the reward obtained by a player involved in the collision deviates from the actual reward offered by this arm in an arbitrary and unknown way, thus making it harder to learn the underlying reward statistic of this arm. The objective is to design a decentralized arm selection policy to minimize the system regret defined as the total reward loss with respect to the ideal scenario of known reward model and centralized scheduling among players. We propose a decentralized policy that achieves O(√T) system regret where T is the length of the time horizon. The policy thus achieves the same maximum average reward as in the ideal scenario. Furthermore, the policy ensures fairness among players, i.e., players achieve the same local average reward at the same rate. These problems find applications in cognitive radio networks, multi-agent systems, Internet advertising and web search.

Original languageEnglish
Title of host publication2010 48th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2010
Pages1669-1674
Number of pages6
DOIs
Publication statusPublished - 2010
Event48th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2010 - Monticello, IL, United States
Duration: 29 Sept 20101 Oct 2010

Publication series

Name2010 48th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2010

Conference

Conference48th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2010
Country/TerritoryUnited States
CityMonticello, IL
Period29/09/101/10/10

Fingerprint

Dive into the research topics of 'Decentralized multi-armed bandit with imperfect observations'. Together they form a unique fingerprint.

Cite this