Decentralized multi-armed bandit with multiple distributed players

Keqin Liu*, Qing Zhao

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceedingConference Proceedingpeer-review

31 Citations (Scopus)

Abstract

We formulate and study a decentralized multi-armed bandit (MAB) problem, where M distributed players compete for N independent arms with unknown reward statistics. At each time, each player chooses one arm to play without exchanging information with other players. Players choosing the same arm collide, and, depending on the collision model, either no one receives reward or the colliding players share the reward in an arbitrary way. We show that the minimum system regret of the decentralized MAB grows with time at the same logarithmic order as in the centralized counterpart where players act collectively as a single entity by exchanging observations and making decisions jointly. A general framework of constructing fair and order-optimal decentralized policies is established based on a Time Division Fair Sharing (TDFS) of the M best arms. A lower bound on the system regret growth rate is established for a general class of decentralized polices, to which all TDFS policies belong.We further develop several fair and order-optimal decentralized polices within the TDFS framework and study their performance in different applications including cognitive radio networks, multi-channel communications in unknown fading environment, target collecting in multi-agent systems, and web search and advertising.

Original languageEnglish
Title of host publication2010 Information Theory and Applications Workshop, ITA 2010 - Conference Proceedings
Pages568-577
Number of pages10
DOIs
Publication statusPublished - 2010
Event2010 Information Theory and Applications Workshop, ITA 2010 - San Diego, CA, United States
Duration: 31 Jan 20105 Feb 2010

Publication series

Name2010 Information Theory and Applications Workshop, ITA 2010 - Conference Proceedings

Conference

Conference2010 Information Theory and Applications Workshop, ITA 2010
Country/TerritoryUnited States
CitySan Diego, CA
Period31/01/105/02/10

Fingerprint

Dive into the research topics of 'Decentralized multi-armed bandit with multiple distributed players'. Together they form a unique fingerprint.

Cite this