Learning and sharing in a changing world: Non-Bayesian restless bandit with multiple players

Haoyang Liu*, Keqin Liu, Qing Zhao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

19 Citations (Scopus)

Abstract

We consider decentralized restless multi-armed bandit problems with unknown dynamics and multiple players. The reward state of each arm transitions according to an unknown Markovian rule when the arm is played and evolves according to an arbitrary unknown random process when it is passive. Players activating the same arm at the same time collide and suffer reward loss. The objective is to maximize the long-term reward by designing a decentralized arm selection policy that handles both the unknown reward models and collisions among players. A decentralized policy is constructed that achieves regret of logarithmic order. The result finds applications in communication networks, financial investment, and industrial engineering.
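
To make the problem setting concrete, the following minimal Python sketch simulates the model described in the abstract: multiple players select among restless arms whose states evolve by a Markovian rule when played and by an arbitrary process when passive, and simultaneous plays of the same arm cause reward loss. All parameters (arm count, transition probabilities, the small passive-flip probability) and the simple time-shifted round-robin selection rule are illustrative assumptions; this is not the authors' learning policy, which additionally learns the unknown reward models and achieves logarithmic regret.

```python
import numpy as np

# Illustrative simulation of the decentralized restless multi-armed bandit
# setting: N_ARMS two-state arms, N_PLAYERS players, collisions cause reward
# loss. Parameters and the round-robin rule are assumptions for illustration.

rng = np.random.default_rng(0)
N_ARMS, N_PLAYERS, HORIZON = 5, 2, 10_000

# When an arm is played, its state transitions by a Markovian rule:
# P_active[i, s] = Pr(next state is 1 | current state s) for arm i.
P_active = rng.uniform(0.2, 0.8, size=(N_ARMS, 2))
states = rng.integers(0, 2, size=N_ARMS)   # state 1 pays reward 1, state 0 pays 0

total_reward = 0.0
for t in range(HORIZON):
    # Hypothetical decentralized rule: time-shifted round-robin, which lets
    # players share the arms without communication and without collisions.
    choices = [(t + k * (N_ARMS // N_PLAYERS)) % N_ARMS for k in range(N_PLAYERS)]

    for arm in set(choices):
        if choices.count(arm) == 1:            # unique player: reward collected
            total_reward += states[arm]
        # else: collision, all colliding players lose the reward
        # played arm transitions according to its Markovian rule
        states[arm] = int(rng.random() < P_active[arm, states[arm]])

    # passive arms evolve by an arbitrary unknown process (illustrative flip)
    for arm in range(N_ARMS):
        if arm not in choices and rng.random() < 0.05:
            states[arm] = 1 - states[arm]

print(f"average per-slot reward: {total_reward / HORIZON:.3f}")
```

A learning policy for this setting would replace the fixed round-robin rule with index values estimated from observed rewards, which is the part of the problem the paper addresses.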

Original language: English
Title of host publication: 2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings
Pages: 240-246
Number of pages: 7
DOIs
Publication status: Published - 2011
Event: 2011 Information Theory and Applications Workshop, ITA 2011 - San Diego, CA, United States
Duration: 6 Feb 2011 - 11 Feb 2011

Publication series

Name: 2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings

Conference

Conference: 2011 Information Theory and Applications Workshop, ITA 2011
Country/Territory: United States
City: San Diego, CA
Period: 6/02/11 - 11/02/11
