Abstract
We consider dynamic spectrum access where distributed secondary users search for spectrum opportunities without knowing the primary traffic statistics. In each slot, a secondary transmitter chooses one channel to sense and subsequently transmits if the channel is sensed as idle. Sensing is imperfect, i.e., an idle channel may be sensed as busy and vice versa. Without centralized control, each secondary user needs to independently identify the channels that offer the most opportunities while avoiding collisions with both primary and other secondary users. We address the problem within a cooperative game framework, where the objective is to maximize the throughput of the secondary network under a constraint on collisions with the primary system. The performance of a decentralized channel access policy is measured by the system regret, defined as the expected total performance loss with respect to the optimal performance in the ideal scenario where the traffic load of the primary system on each channel is known to all secondary users and collisions among secondary users are eliminated through centralized scheduling. By exploring the rich communication structure of the problem, we show that the optimal system regret has the same logarithmic order as in the centralized counterpart with perfect sensing. A decentralized policy is constructed to achieve the logarithmic order of the system regret. In a broader context, this work addresses imperfect reward observation in decentralized multi-armed bandit problems.
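The notion of regret under imperfect sensing can be illustrated with a minimal single-user sketch. It is not the decentralized policy constructed in the paper, which the abstract does not specify; it substitutes the standard UCB1 index as a stand-in learner. The function name `simulate_regret`, the parameters `false_alarm` and `miss_detect`, and the reward model (one unit of throughput when the channel is idle and also sensed as idle) are assumptions made for illustration only.

```python
import math
import random


def simulate_regret(idle_probs, false_alarm, miss_detect, horizon, seed=0):
    """Single secondary user, UCB1 channel selection, imperfect sensing.

    Hypothetical model (not taken from the paper):
      idle_probs[k] : probability that channel k is idle in a slot
      false_alarm   : probability an idle channel is sensed as busy
      miss_detect   : probability a busy channel is sensed as idle
    Returns (pseudo-regret, number of collisions with the primary user).
    """
    rng = random.Random(seed)
    n = len(idle_probs)
    # Expected per-slot throughput: channel idle AND sensed as idle.
    means = [p * (1.0 - false_alarm) for p in idle_probs]
    best = max(means)  # genie that knows the primary traffic statistics
    counts = [0] * n
    sums = [0.0] * n
    regret = 0.0
    collisions = 0
    for t in range(1, horizon + 1):
        if t <= n:
            k = t - 1  # initialization: sense every channel once
        else:
            # UCB1 index: sample mean plus exploration bonus.
            k = max(
                range(n),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        idle = rng.random() < idle_probs[k]
        if idle:
            sensed_idle = rng.random() >= false_alarm
        else:
            sensed_idle = rng.random() < miss_detect
        # The user transmits only when the channel is sensed as idle.
        if sensed_idle and not idle:
            collisions += 1  # transmission overlaps the primary user
        reward = 1.0 if (idle and sensed_idle) else 0.0
        counts[k] += 1
        sums[k] += reward
        regret += best - means[k]  # expected loss versus the genie
    return regret, collisions


if __name__ == "__main__":
    print(simulate_regret([0.2, 0.5, 0.8], 0.1, 0.1, 5000, seed=1))
```

In this toy setting the accumulated regret grows slowly with the horizon because the learner concentrates on the channel with the most opportunities. Collisions among multiple secondary users and the explicit collision constraint, both central to the paper, are outside the scope of the sketch.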
Field | Value |
---|---|
Original language | English |
Article number | 6151769 |
Pages (from-to) | 1596-1604 |
Number of pages | 9 |
Journal | IEEE Transactions on Wireless Communications |
Volume | 11 |
Issue number | 4 |
Publication status | Published - Apr 2012 |
Keywords
- cognitive radio
- cooperative game
- decentralized multi-armed bandit
- distributed learning
- dynamic spectrum access
- imperfect sensing
- system regret