The shortest path problem in the bandit setting

Cited by: 1
Authors
Gyorgy, Andras [1]
Linder, Tamas [2]
Lugosi, Gabor [3]
Affiliations
[1] Hungarian Acad Sci, Comp & Automat Res Inst, Informat Lab, Lagymanyosi U 11, H-1111 Budapest, Hungary
[2] Queens Univ, Dept Math & Stat, Kingston, ON K7L 3N6, Canada
[3] Pompeu Fabra Univ, Dept Econ, Barcelona, Spain
Funding
Natural Sciences and Engineering Research Council of Canada
DOI
10.1109/ITW.2006.1633787
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The on-line shortest path problem is considered in the bandit setting. Given a weighted directed acyclic graph whose edge weights can change in an arbitrary way, a decision maker has to pick in each round a path between two distinguished vertices, such that the weight of this path, given as the sum of the weights of its composing edges, is as small as possible. The decision maker has only limited information on how the weights of the edges are generated. In particular, the edge weights in the current round are unknown to the decision maker when it chooses a path, and after choosing a path, it learns only the weights of those edges that belong to the chosen path. An algorithm is given whose average cumulative loss in $n$ rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to $1/\sqrt{n}$ and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with linear complexity in the number of rounds $n$ and in the number of edges. This result improves earlier algorithms which have performance bounds that either depend exponentially on the number of edges or converge to zero at a slower rate than $O(1/\sqrt{n})$.
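The feedback protocol described in the abstract can be illustrated with a minimal sketch. This is not the algorithm of the paper: it is a plain exponentially weighted sampler that treats every path as a separate arm, which is the kind of approach whose cost and bounds grow with the number of paths. The toy graph, the names `EDGES`, `PATHS`, `run_exp3_over_paths` and `average_regret`, the parameters `eta` and `gamma`, and the assumption that edge weights lie in [0, 1] are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy DAG with source "s" and sink "t" (not from the paper).
EDGES = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]
# All s-t paths, enumerated explicitly (feasible only for tiny graphs).
PATHS = [
    [("s", "a"), ("a", "t")],
    [("s", "b"), ("b", "t")],
    [("s", "a"), ("a", "b"), ("b", "t")],
]


def path_loss(path, weights):
    """Weight of a path: the sum of the weights of its edges."""
    return sum(weights[edge] for edge in path)


def run_exp3_over_paths(weight_sequence, eta, gamma, rng):
    """Pick one path per round and return the cumulative loss.

    Assumes every edge weight lies in [0, 1]. Each path is one arm;
    the learner sees only the weights of edges on the path it chose.
    """
    max_len = max(len(p) for p in PATHS)
    log_w = [0.0] * len(PATHS)  # log-weights, for numerical stability
    total = 0.0
    for weights in weight_sequence:
        top = max(log_w)
        exp_w = [math.exp(v - top) for v in log_w]
        norm = sum(exp_w)
        # Mix with the uniform distribution so every path keeps
        # probability at least gamma / len(PATHS).
        probs = [(1 - gamma) * x / norm + gamma / len(PATHS) for x in exp_w]
        chosen = rng.choices(range(len(PATHS)), weights=probs)[0]
        # Bandit feedback: only the chosen path's edge weights are revealed.
        observed = {edge: weights[edge] for edge in PATHS[chosen]}
        loss = sum(observed.values())
        total += loss
        # Importance-weighted loss estimate, scaled to [0, 1].
        log_w[chosen] -= eta * (loss / max_len) / probs[chosen]
    return total


def average_regret(weight_sequence, learner_total):
    """Average excess loss over the best fixed path chosen in hindsight."""
    n = len(weight_sequence)
    best = min(
        sum(path_loss(p, w) for w in weight_sequence) for p in PATHS
    )
    return (learner_total - best) / n
```

A possible usage, again under the assumptions above: build a list of per-round dictionaries mapping each edge in `EDGES` to a weight, call `run_exp3_over_paths(seq, eta=0.1, gamma=0.1, rng=random.Random(0))`, and pass the result to `average_regret`. Because the sketch enumerates paths, it only shows the protocol and the regret notion, not the polynomial dependence on the number of edges claimed in the abstract.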
Pages: 87+
Page count: 2