Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players

Keqin Liu; Qing Zhao

doi:10.1109/ICASSP.2010.5496131

Keqin Liu^*, Qing Zhao

^*Corresponding author for this work

Department of Financial and Actuarial Mathematics

Cornell University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

43 Citations (Scopus)

Abstract

We consider a cognitive radio network with distributed multiple secondary users, where each user independently searches for spectrum opportunities in multiple channels without exchanging information with others. The occupancy of each channel is modeled as an i.i.d. Bernoulli process with unknown mean. Users choosing the same channel collide, and none or only one receives reward depending on the collision model. This problem can be formulated as a decentralized multi-armed bandit problem. We measure the performance of a decentralized policy by the system regret, defined as the total reward loss with respect to the optimal performance under the perfect scenario where all channel parameters are known to all users and collisions among secondary users are eliminated through perfect scheduling. We show that the minimum system regret grows with time at the same logarithmic order as in the centralized counterpart, where users exchange observations and make decisions jointly. We propose a basic policy structure that ensures a Time Division Fair Sharing (TDFS) of the channels. Based on this basic TDFS structure, decentralized policies can be constructed to achieve this optimal order while ensuring fairness among users. Furthermore, we show that the proposed TDFS policy belongs to a general class of decentralized policies, for which a uniform performance benchmark is established. All results hold for general stochastic processes beyond Bernoulli and thus find a wide area of potential applications including multi-channel communication systems, multi-agent systems, web search and advertising, and social networks.
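
The regret benchmark described in the abstract can be illustrated with a small sketch. It is not the paper's TDFS policy: it assumes the channel means are already known (so no learning takes place), that each user has a pre-agreed offset, and that expected rather than sampled rewards are compared. All function names and the `collision_model` labels `"none"` and `"one"` are hypothetical, chosen only for this illustration.

```python
from typing import List, Sequence


def best_channels(means: Sequence[float], num_users: int) -> List[int]:
    """Indices of the num_users channels with the largest means, best first."""
    ranked = sorted(range(len(means)), key=lambda k: means[k], reverse=True)
    return ranked[:num_users]


def oracle_reward(means: Sequence[float], num_users: int, horizon: int) -> float:
    """Expected total reward when all means are known and users are
    perfectly scheduled onto the num_users best channels."""
    return horizon * sum(means[k] for k in best_channels(means, num_users))


def round_robin_choices(means: Sequence[float], num_users: int,
                        horizon: int) -> List[List[int]]:
    """Assumed time-division schedule: in slot t, user i targets the
    ((t + i) mod M)-th best channel, so users never collide and each
    user visits every one of the M best channels equally often."""
    top = best_channels(means, num_users)
    return [[top[(t + user) % num_users] for user in range(num_users)]
            for t in range(horizon)]


def expected_system_regret(means: Sequence[float], choices: List[List[int]],
                           collision_model: str = "none") -> float:
    """Oracle reward minus the expected reward of the given choices.

    choices[t][i] is the channel picked by user i in slot t.
    collision_model "none": colliding users all receive nothing.
    collision_model "one": exactly one colliding user receives the reward.
    """
    num_users = len(choices[0])
    reward = 0.0
    for slot in choices:
        for channel in set(slot):
            contenders = slot.count(channel)
            if contenders == 1 or collision_model == "one":
                reward += means[channel]
    return oracle_reward(means, num_users, len(choices)) - reward


if __name__ == "__main__":
    means = [0.9, 0.5, 0.8, 0.2]          # illustrative values only
    fair = round_robin_choices(means, num_users=2, horizon=10)
    crowded = [[0, 0] for _ in range(10)]  # both users always pick channel 0
    print(expected_system_regret(means, fair))
    print(expected_system_regret(means, crowded, "none"))
    print(expected_system_regret(means, crowded, "one"))
```

With known means the round-robin schedule matches the benchmark, so its expected regret is zero, while two users who both insist on the single best channel lose reward in every slot under either collision model. The setting in the abstract is harder because the means are unknown and must be learned without communication, which is where the logarithmic regret order arises.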

Original language	English
Title of host publication	2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	3010-3013
Number of pages	4
ISBN (Print)	9781424442966
DOIs	https://doi.org/10.1109/ICASSP.2010.5496131
Publication status	Published - 2010
Event	2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Dallas, TX, United States; Duration: 14 Mar 2010 → 19 Mar 2010

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)	1520-6149

Conference

Conference	2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Country/Territory	United States
City	Dallas, TX
Period	14/03/10 → 19/03/10

Keywords

Cognitive radios
Decentralized multi-armed bandit
Opportunistic spectrum access
Order-optimal policy

Access to Document

10.1109/ICASSP.2010.5496131

Cite this

Liu, K., & Zhao, Q. (2010). Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players. In 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings (pp. 3010-3013). Article 5496131 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2010.5496131

Liu, Keqin; Zhao, Qing. / Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players. 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2010. pp. 3010-3013 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{a703a6c198a048cca3e9d5aaf3f1b120,

title = "Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players",

abstract = "We consider a cognitive radio network with distributed multiple secondary users, where each user independently searches for spectrum opportunities in multiple channels without exchanging information with others. The occupancy of each channel is modeled as an i.i.d. Bernoulli process with unknown mean. Users choosing the same channel collide, and none or only one receives reward depending on the collision model. This problem can be formulated as a decentralized multi-armed bandit problem. We measure the performance of a decentralized policy by the system regret, defined as the total reward loss with respect to the optimal performance under the perfect scenario where all channel parameters are known to all users and collisions among secondary users are eliminated through perfect scheduling. We show that the minimum system regret grows with time at the same logarithmic order as in the centralized counterpart, where users exchange observations and make decisions jointly. We propose a basic policy structure that ensures a Time Division Fair Sharing (TDFS) of the channels. Based on this basic TDFS structure, decentralized policies can be constructed to achieve this optimal order while ensuring fairness among users. Furthermore, we show that the proposed TDFS policy belongs to a general class of decentralized policies, for which a uniform performance benchmark is established. All results hold for general stochastic processes beyond Bernoulli and thus find a wide area of potential applications including multi-channel communication systems, multi-agent systems, web search and advertising, and social networks.",

keywords = "Cognitive radios, Decentralized multi-armed bandit, Opportunistic spectrum access, Order-optimal policy",

author = "Keqin Liu and Qing Zhao",

year = "2010",

doi = "10.1109/ICASSP.2010.5496131",

language = "English",

isbn = "9781424442966",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "3010--3013",

booktitle = "2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings",

note = "2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 ; Conference date: 14-03-2010 Through 19-03-2010",

}

Liu, K & Zhao, Q 2010, Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players. in 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings, 5496131, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 3010-3013, 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, Dallas, TX, United States, 14/03/10. https://doi.org/10.1109/ICASSP.2010.5496131

Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players. / Liu, Keqin; Zhao, Qing.
2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2010. p. 3010-3013 5496131 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

TY - GEN

T1 - Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players

T2 - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010

AU - Liu, Keqin

AU - Zhao, Qing

PY - 2010

Y1 - 2010

AB - We consider a cognitive radio network with distributed multiple secondary users, where each user independently searches for spectrum opportunities in multiple channels without exchanging information with others. The occupancy of each channel is modeled as an i.i.d. Bernoulli process with unknown mean. Users choosing the same channel collide, and none or only one receives reward depending on the collision model. This problem can be formulated as a decentralized multi-armed bandit problem. We measure the performance of a decentralized policy by the system regret, defined as the total reward loss with respect to the optimal performance under the perfect scenario where all channel parameters are known to all users and collisions among secondary users are eliminated through perfect scheduling. We show that the minimum system regret grows with time at the same logarithmic order as in the centralized counterpart, where users exchange observations and make decisions jointly. We propose a basic policy structure that ensures a Time Division Fair Sharing (TDFS) of the channels. Based on this basic TDFS structure, decentralized policies can be constructed to achieve this optimal order while ensuring fairness among users. Furthermore, we show that the proposed TDFS policy belongs to a general class of decentralized policies, for which a uniform performance benchmark is established. All results hold for general stochastic processes beyond Bernoulli and thus find a wide area of potential applications including multi-channel communication systems, multi-agent systems, web search and advertising, and social networks.

KW - Cognitive radios

KW - Decentralized multi-armed bandit

KW - Opportunistic spectrum access

KW - Order-optimal policy

UR - http://www.scopus.com/inward/record.url?scp=78049402595&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2010.5496131

DO - 10.1109/ICASSP.2010.5496131

M3 - Conference Proceeding

AN - SCOPUS:78049402595

SN - 9781424442966

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 3010

EP - 3013

BT - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 14 March 2010 through 19 March 2010

ER -

Liu K, Zhao Q. Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players. In 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2010. p. 3010-3013. 5496131. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2010.5496131
