The Irrevocable Multiarmed Bandit Problem

Cited by: 15
Authors
Farias, Vivek F. [1]
Madan, Ritesh [2]
Affiliations
[1] MIT, Ctr Operat Res, Cambridge, MA 02139 USA
[2] Qualcomm New Jersey Res Ctr NJRC, Bridgewater, NJ 08807 USA
Keywords
EFFICIENT ALLOCATION RULES; RESTLESS BANDITS; MULTIPLE PLAYS; REWARDS
DOI
10.1287/opre.1100.0891
Chinese Library Classification (CLC) number
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
This paper considers the multiarmed bandit problem with multiple simultaneous arm pulls and the additional restriction that we do not allow recourse to arms that were pulled at some point in the past but then discarded. This additional restriction is highly desirable from an operational perspective, and we refer to this problem as the "irrevocable multiarmed bandit" problem. We observe that natural modifications to well-known heuristics for multiarmed bandit problems that satisfy this irrevocability constraint have unsatisfactory performance and, thus motivated, introduce a new heuristic: the "packing" heuristic. We establish through numerical experiments that the packing heuristic offers excellent performance, even relative to heuristics that are not constrained to be irrevocable. We also provide a theoretical analysis that studies the "price" of irrevocability, i.e., the performance loss incurred in imposing the constraint we propose on the multiarmed bandit model. We show that this performance loss is uniformly bounded for a general class of multiarmed bandit problems and indicate its dependence on various problem parameters. Finally, we obtain a computationally fast algorithm to implement the packing heuristic; the algorithm renders the packing heuristic computationally cheaper than methods that rely on the computation of Gittins indices.
Pages: 383-399
Number of pages: 17