TY - GEN
T1 - Online learning for stochastic linear optimization problems
AU - Liu, Keqin
AU - Zhao, Qing
PY - 2012
Y1 - 2012
N2 - We consider the stochastic online linear optimization problems under unknown cost models. At each time, an action is chosen from a compact subset in R d and a random cost with an unknown distribution (depending on the action) is incurred. The expected value of the random cost is assumed to be a (unknown) linear function over the action space. The objective is to minimize the growth rate of regret (i.e., the increase in expected total cost compared to the ideal scenario with known cost models) in terms of both problem size and time horizon length. We propose a policy that achieves a regret polynomial in the problem size and sublinear in time. This policy also allows a performance tradeoff in terms of the problem size and the time horizon length: it can offer a linear regret order in problem size with an arbitrarily small sacrifice in the regret order with time. We further consider a special class of the problem when the action space is a polytope or finite and show that a regret logarithmic with time and polynomial with the problem size can be achieved. This special class includes the adaptive shortest-path routing problem in networks with unknown and stochastically varying link sates.
AB - We consider the stochastic online linear optimization problems under unknown cost models. At each time, an action is chosen from a compact subset in R d and a random cost with an unknown distribution (depending on the action) is incurred. The expected value of the random cost is assumed to be a (unknown) linear function over the action space. The objective is to minimize the growth rate of regret (i.e., the increase in expected total cost compared to the ideal scenario with known cost models) in terms of both problem size and time horizon length. We propose a policy that achieves a regret polynomial in the problem size and sublinear in time. This policy also allows a performance tradeoff in terms of the problem size and the time horizon length: it can offer a linear regret order in problem size with an arbitrarily small sacrifice in the regret order with time. We further consider a special class of the problem when the action space is a polytope or finite and show that a regret logarithmic with time and polynomial with the problem size can be achieved. This special class includes the adaptive shortest-path routing problem in networks with unknown and stochastically varying link sates.
UR - http://www.scopus.com/inward/record.url?scp=84860460204&partnerID=8YFLogxK
U2 - 10.1109/ITA.2012.6181775
DO - 10.1109/ITA.2012.6181775
M3 - Conference Proceeding
AN - SCOPUS:84860460204
SN - 9781467314725
T3 - 2012 Information Theory and Applications Workshop, ITA 2012 - Conference Proceedings
SP - 363
EP - 367
BT - 2012 Information Theory and Applications Workshop, ITA 2012 - Conference Proceedings
T2 - 2012 Information Theory and Applications Workshop, ITA 2012
Y2 - 5 February 2012 through 10 February 2012
ER -