TY - GEN
T1 - Adaptive shortest-path routing under unknown and stochastically varying link states
AU - Liu, Keqin
AU - Zhao, Qing
PY - 2012
Y1 - 2012
AB - We consider adaptive shortest-path routing in wireless networks, where the aim is to optimize the quality of communication between a source and a destination through adaptive path selection. Due to randomness and uncertainty in the network dynamics, the state of each communication link varies over time according to a stochastic process with an unknown distribution. The link states are not directly observable; instead, the aggregated end-to-end cost of a path from the source to the destination is revealed after the path is chosen for communication. The objective is an adaptive path selection algorithm that minimizes regret, defined as the additional cost relative to the ideal scenario in which the best path is known a priori. This problem can be cast as a variation of the classic multi-armed bandit (MAB) problem, with each path as an arm and arms dependent through common links. We show that, by exploiting arm dependencies, a regret that is polynomial in the network size can be achieved while maintaining the optimal logarithmic order in time. This is in sharp contrast with the regret that is exponential in the network size offered by a direct application of classic MAB policies that ignore arm dependencies. Furthermore, our results are obtained under a general model of link state distributions (including heavy-tailed distributions). These results find applications in cognitive radio and ad hoc networks with unknown and dynamic communication environments.
UR - http://www.scopus.com/inward/record.url?scp=84866935969&partnerID=8YFLogxK
M3 - Conference Proceeding
AN - SCOPUS:84866935969
SN - 9783901882456
T3 - 2012 10th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, WiOpt 2012
SP - 232
EP - 237
BT - 2012 10th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, WiOpt 2012
T2 - 2012 10th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, WiOpt 2012
Y2 - 14 May 2012 through 18 May 2012
ER -