TY - GEN
T1 - Learning from collisions in cognitive radio networks
T2 - 2010 IEEE Military Communications Conference, MILCOM 2010
AU - Liu, Keqin
AU - Zhao, Qing
PY - 2010
Y1 - 2010
N2 - We consider a decentralized multi-armed bandit problem arisen in the application of cognitive radio networks, where M distributed secondary users independently search for spectrum opportunities in N channels without information sharing. The channel occupancy is modeled as an i.i.d. Bernoulli process with unknown mean. A collision happens when multiple users choose the same channel, and none or only one receives reward depending on the collision model. Under a non-Bayesian formulation, the performance measure of a policy is given by the system regret, defied as the total reward loss with respect to the optimal performance in the ideal scenario where all channel parameters are known to all users and collisions among secondary users are eliminated through centralized scheduling. In our previous work, a Time Division Fair Sharing (TDFS) framework was proposed for constructing fair decentralized policies that achieve the same logarithmic order of the system regret growth rate as in the centralized counterpart where users exchange observations and make decisions jointly. This TDFS framework, however, requires pre-agreement among users regarding the offset in their time sharing schedule. In this work, we generalize the TDFS framework by eliminating the pre-agreement thus achieve a complete decentralization among users. We show that by learning from collisions, users can settle at orthogonal time-sharing offsets and the TDFS framework can maintain its logarithmic regret order and the fairness among users without pre-agreement. The result applies to general stochastic processes beyond Bernoulli and thus finds a wide range of applications including multi-channel communication systems, multi-agent systems, web search and advertising, and social networks.
AB - We consider a decentralized multi-armed bandit problem arisen in the application of cognitive radio networks, where M distributed secondary users independently search for spectrum opportunities in N channels without information sharing. The channel occupancy is modeled as an i.i.d. Bernoulli process with unknown mean. A collision happens when multiple users choose the same channel, and none or only one receives reward depending on the collision model. Under a non-Bayesian formulation, the performance measure of a policy is given by the system regret, defied as the total reward loss with respect to the optimal performance in the ideal scenario where all channel parameters are known to all users and collisions among secondary users are eliminated through centralized scheduling. In our previous work, a Time Division Fair Sharing (TDFS) framework was proposed for constructing fair decentralized policies that achieve the same logarithmic order of the system regret growth rate as in the centralized counterpart where users exchange observations and make decisions jointly. This TDFS framework, however, requires pre-agreement among users regarding the offset in their time sharing schedule. In this work, we generalize the TDFS framework by eliminating the pre-agreement thus achieve a complete decentralization among users. We show that by learning from collisions, users can settle at orthogonal time-sharing offsets and the TDFS framework can maintain its logarithmic regret order and the fairness among users without pre-agreement. The result applies to general stochastic processes beyond Bernoulli and thus finds a wide range of applications including multi-channel communication systems, multi-agent systems, web search and advertising, and social networks.
UR - http://www.scopus.com/inward/record.url?scp=79951645996&partnerID=8YFLogxK
U2 - 10.1109/MILCOM.2010.5680378
DO - 10.1109/MILCOM.2010.5680378
M3 - Conference Proceeding
AN - SCOPUS:79951645996
SN - 9781424481804
T3 - Proceedings - IEEE Military Communications Conference MILCOM
SP - 2262
EP - 2267
BT - 2010 IEEE Military Communications Conference, MILCOM 2010
Y2 - 31 October 2010 through 3 November 2010
ER -