The Irrevocable Multiarmed Bandit Problem

Cited by: 15
Authors
Farias, Vivek F. [1 ]
Madan, Ritesh [2 ]
Affiliations
[1] MIT, Ctr Operat Res, Cambridge, MA 02139 USA
[2] Qualcomm New Jersey Res Ctr NJRC, Bridgewater, NJ 08807 USA
Keywords
EFFICIENT ALLOCATION RULES; RESTLESS BANDITS; MULTIPLE PLAYS; REWARDS
DOI
10.1287/opre.1100.0891
Chinese Library Classification (CLC)
C93 [Management]
Discipline Classification Codes
12; 1201; 1202; 120202
Abstract
This paper considers the multiarmed bandit problem with multiple simultaneous arm pulls and the additional restriction that we do not allow recourse to arms that were pulled at some point in the past but then discarded. This additional restriction is highly desirable from an operational perspective, and we refer to this problem as the "irrevocable multiarmed bandit" problem. We observe that natural modifications to well-known heuristics for multiarmed bandit problems that satisfy this irrevocability constraint have unsatisfactory performance and, thus motivated, introduce a new heuristic: the "packing" heuristic. We establish through numerical experiments that the packing heuristic offers excellent performance, even relative to heuristics that are not constrained to be irrevocable. We also provide a theoretical analysis that studies the "price" of irrevocability, i.e., the performance loss incurred in imposing the constraint we propose on the multiarmed bandit model. We show that this performance loss is uniformly bounded for a general class of multiarmed bandit problems and indicate its dependence on various problem parameters. Finally, we obtain a computationally fast algorithm to implement the packing heuristic; the algorithm renders the packing heuristic computationally cheaper than methods that rely on the computation of Gittins indices.
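The abstract describes the irrevocable-bandit setting but, naturally, not the packing heuristic in implementable detail. As a minimal sketch of the irrevocability constraint only (not the authors' packing heuristic), the following Python snippet simulates a naive irrevocable policy on a Bayesian Bernoulli bandit with multiple simultaneous arm pulls; the Bernoulli reward model, the Beta(1, 1) priors, the 0.4 discard threshold, and the name irrevocable_greedy are all illustrative assumptions, not anything taken from the paper.

```python
import random

def irrevocable_greedy(arms, n_active, horizon, seed=0):
    """Simulate an irrevocable policy on a Bayesian Bernoulli bandit.

    arms     : list of true (unknown) success probabilities
    n_active : number of arms pulled simultaneously each round
    horizon  : total number of rounds

    Irrevocability: once an arm is discarded it can never be
    reactivated, so the policy commits to arms as it goes.
    """
    rng = random.Random(seed)
    # Beta(1, 1) priors; stats[i] holds the posterior (alpha, beta) counts.
    stats = {i: [1, 1] for i in range(len(arms))}
    untried = list(range(len(arms)))            # arms never activated yet
    active = [untried.pop() for _ in range(n_active)]
    total_reward = 0

    for _ in range(horizon):
        for i in list(active):
            reward = rng.random() < arms[i]     # Bernoulli pull
            total_reward += reward
            stats[i][0 if reward else 1] += 1   # update posterior counts
            s, f = stats[i]
            # Discard rule (illustrative assumption): abandon an arm whose
            # posterior mean drops below 0.4, provided a fresh arm exists.
            if untried and s / (s + f) < 0.4:
                active.remove(i)                # irrevocable: never returns
                active.append(untried.pop())    # replace with a fresh arm
    return total_reward

print(irrevocable_greedy([0.2, 0.5, 0.8, 0.3, 0.9], n_active=2, horizon=200))
```

The defining property is that an arm removed from the active set never re-enters it; an unconstrained (revocable) policy would instead be free to return to any previously discarded arm.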
Pages: 383-399
Page count: 17
Related Papers
50 records in total (items [41]-[50] shown)
  • [41] Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
    Krishnasamy, Subhashini
    Sen, Rajat
    Johari, Ramesh
    Shakkottai, Sanjay
    OPERATIONS RESEARCH, 2021, 69 (01) : 315 - 330
  • [42] Combining Multiple Strategies for Multiarmed Bandit Problems and Asymptotic Optimality
    Chang, Hyeong Soo
    Choe, Sanghee
    JOURNAL OF CONTROL SCIENCE AND ENGINEERING, 2015, 2015
  • [43] On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems
    Wei, Lai
    Srivastava, Vaibhav
    2018 ANNUAL AMERICAN CONTROL CONFERENCE (ACC), 2018, : 6291 - 6296
  • [44] Learning in a Changing World: Restless Multiarmed Bandit With Unknown Dynamics
    Liu, Haoyang
    Liu, Keqin
    Zhao, Qing
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (03) : 1902 - 1916
  • [46] Smoking and the Bandit: A Preliminary Study of Smoker and Nonsmoker Differences in Exploratory Behavior Measured With a Multiarmed Bandit Task
    Addicott, Merideth A.
    Pearson, John M.
    Wilson, Jessica
    Platt, Michael L.
    McClernon, F. Joseph
    EXPERIMENTAL AND CLINICAL PSYCHOPHARMACOLOGY, 2013, 21 (01) : 66 - 73
  • [47] Distributed Consensus Algorithm for Decision-Making in Multiagent Multiarmed Bandit
    Cheng, Xiaotong
    Maghsudi, Setareh
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2024, 11 (04): : 2187 - 2199
  • [48] Dynamic Online Pricing with Incomplete Information Using Multiarmed Bandit Experiments
    Misra, Kanishka
    Schwartz, Eric M.
    Abernethy, Jacob
    MARKETING SCIENCE, 2019, 38 (02) : 226 - 252
  • [49] Optimal learning dynamics of multiagent system in restless multiarmed bandit game
    Nakayama, Kazuaki
    Nakamura, Ryuzo
    Hisakado, Masato
    Mori, Shintaro
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2020, 549
  • [50] Satisficing in Multiarmed Bandit Problems (vol 62, pg 3788, 2017)
    Reverdy, Paul
    Srivastava, Vaibhav
    Leonard, Naomi Ehrich
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (01) : 476 - 478