- [31] An ε-Greedy Multiarmed Bandit Approach to Markov Decision Processes. STATS, 2023, 6(1): 99-112
- [33] Optimality of Myopic Policy for Restless Multiarmed Bandit with Imperfect Observation. 2016 IEEE Global Communications Conference (GLOBECOM), 2016