Learning and sharing in a changing world: Non-Bayesian restless bandit with multiple players

Haoyang Liu; Keqin Liu; Qing Zhao

doi:10.1109/ITA.2011.5743588

Haoyang Liu*, Keqin Liu, Qing Zhao

*Corresponding author for this work

Department of Financial and Actuarial Mathematics

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

19 Citations (Scopus)

Abstract

We consider decentralized restless multi-armed bandit problems with unknown dynamics and multiple players. The reward state of each arm transits according to an unknown Markovian rule when it is played and evolves according to an arbitrary unknown random process when it is passive. Players activating the same arm at the same time collide and suffer from reward loss. The objective is to maximize the long-term reward by designing a decentralized arm selection policy to address unknown reward models and collisions among players. A decentralized policy is constructed that achieves a regret with logarithmic order. The result finds applications in communication networks, financial investment, and industrial engineering.
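The setting in the abstract can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not the policy constructed in the paper: each arm is a two-state Markov chain whose state is its reward, passive arms follow the same chain as played ones (the paper allows arbitrary passive dynamics), and every player in a collision receives zero. The names `step_chain`, `slot_rewards`, `simulate`, and `choose` are hypothetical.

```python
import random
from collections import Counter


def step_chain(state, p01, p10, rng):
    """Advance a two-state (0 or 1) Markov chain by one slot."""
    if state == 0:
        return 1 if rng.random() < p01 else 0
    return 0 if rng.random() < p10 else 1


def slot_rewards(states, choices):
    """Reward of each player in one slot.

    Assumption: the reward of an arm equals its state, and every player
    involved in a collision receives zero.
    """
    load = Counter(choices)
    return [states[arm] if load[arm] == 1 else 0 for arm in choices]


def simulate(choose, n_arms=3, n_players=2, horizon=1000, seed=0):
    """Total reward under a hypothetical selection rule `choose`.

    `choose(player, t)` returns an arm index. All arms evolve in every
    slot, played or not, which is what makes the bandit restless.
    Assumption: passive arms use the same transition probabilities.
    """
    rng = random.Random(seed)
    states = [0] * n_arms
    total = 0
    for t in range(horizon):
        choices = [choose(p, t) for p in range(n_players)]
        total += sum(slot_rewards(states, choices))
        states = [step_chain(s, 0.3, 0.2, rng) for s in states]
    return total
```

For example, `simulate(lambda p, t: 0)` sends both players to arm 0 in every slot, so they always collide and earn nothing, whereas `simulate(lambda p, t: (p + t) % 3)` rotates the players over distinct arms and avoids collisions. Neither rule learns the unknown dynamics; they only illustrate why a decentralized policy has to handle collisions.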

Original language	English
Title of host publication	2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings
Pages	240-246
Number of pages	7
DOIs	https://doi.org/10.1109/ITA.2011.5743588
Publication status	Published - 2011
Event	2011 Information Theory and Applications Workshop, ITA 2011 - San Diego, CA, United States; Duration: 6 Feb 2011 → 11 Feb 2011

Publication series

Name	2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings

Conference

Conference	2011 Information Theory and Applications Workshop, ITA 2011
Country/Territory	United States
City	San Diego, CA
Period	6/02/11 → 11/02/11

Access to Document

10.1109/ITA.2011.5743588

Cite this

Liu, H., Liu, K., & Zhao, Q. (2011). Learning and sharing in a changing world: Non-Bayesian restless bandit with multiple players. In 2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings (pp. 240-246). Article 5743588 (2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings). https://doi.org/10.1109/ITA.2011.5743588

@inproceedings{7c1282b4db3f431ba3f0d6b10dc9b89b,
title = "Learning and sharing in a changing world: Non-Bayesian restless bandit with multiple players",
abstract = "We consider decentralized restless multi-armed bandit problems with unknown dynamics and multiple players. The reward state of each arm transits according to an unknown Markovian rule when it is played and evolves according to an arbitrary unknown random process when it is passive. Players activating the same arm at the same time collide and suffer from reward loss. The objective is to maximize the long-term reward by designing a decentralized arm selection policy to address unknown reward models and collisions among players. A decentralized policy is constructed that achieves a regret with logarithmic order. The result finds applications in communication networks, financial investment, and industrial engineering.",
author = "Haoyang Liu and Keqin Liu and Qing Zhao",
year = "2011",
doi = "10.1109/ITA.2011.5743588",
language = "English",
isbn = "9781457703614",
series = "2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings",
pages = "240--246",
booktitle = "2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings",
note = "2011 Information Theory and Applications Workshop, ITA 2011 ; Conference date: 06-02-2011 Through 11-02-2011",
}

Liu, H, Liu, K & Zhao, Q 2011, Learning and sharing in a changing world: Non-Bayesian restless bandit with multiple players. in 2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings., 5743588, 2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings, pp. 240-246, 2011 Information Theory and Applications Workshop, ITA 2011, San Diego, CA, United States, 6/02/11. https://doi.org/10.1109/ITA.2011.5743588

Learning and sharing in a changing world: Non-Bayesian restless bandit with multiple players. / Liu, Haoyang; Liu, Keqin; Zhao, Qing.
2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings. 2011. p. 240-246 5743588 (2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings).

TY - GEN
T1 - Learning and sharing in a changing world
T2 - 2011 Information Theory and Applications Workshop, ITA 2011
AU - Liu, Haoyang
AU - Liu, Keqin
AU - Zhao, Qing
PY - 2011
Y1 - 2011
N2 - We consider decentralized restless multi-armed bandit problems with unknown dynamics and multiple players. The reward state of each arm transits according to an unknown Markovian rule when it is played and evolves according to an arbitrary unknown random process when it is passive. Players activating the same arm at the same time collide and suffer from reward loss. The objective is to maximize the long-term reward by designing a decentralized arm selection policy to address unknown reward models and collisions among players. A decentralized policy is constructed that achieves a regret with logarithmic order. The result finds applications in communication networks, financial investment, and industrial engineering.
UR - http://www.scopus.com/inward/record.url?scp=79955764815&partnerID=8YFLogxK
U2 - 10.1109/ITA.2011.5743588
DO - 10.1109/ITA.2011.5743588
M3 - Conference Proceeding
AN - SCOPUS:79955764815
SN - 9781457703614
T3 - 2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings
SP - 240
EP - 246
BT - 2011 Information Theory and Applications Workshop, ITA 2011 - Conference Proceedings
Y2 - 6 February 2011 through 11 February 2011
ER -
