The Irrevocable Multiarmed Bandit Problem

Cited by: 15
Authors
Farias, Vivek F. [1 ]
Madan, Ritesh [2 ]
Affiliations
[1] MIT, Ctr Operat Res, Cambridge, MA 02139 USA
[2] Qualcomm New Jersey Res Ctr NJRC, Bridgewater, NJ 08807 USA
Keywords
EFFICIENT ALLOCATION RULES; RESTLESS BANDITS; MULTIPLE PLAYS; REWARDS
DOI
10.1287/opre.1100.0891
Chinese Library Classification (CLC)
C93 [Management]
Discipline Classification Codes
12; 1201; 1202; 120202
Abstract
This paper considers the multiarmed bandit problem with multiple simultaneous arm pulls and the additional restriction that we do not allow recourse to arms that were pulled at some point in the past but then discarded. This additional restriction is highly desirable from an operational perspective, and we refer to this problem as the "irrevocable multiarmed bandit" problem. We observe that natural modifications to well-known heuristics for multiarmed bandit problems that satisfy this irrevocability constraint have unsatisfactory performance and, thus motivated, introduce a new heuristic: the "packing" heuristic. We establish through numerical experiments that the packing heuristic offers excellent performance, even relative to heuristics that are not constrained to be irrevocable. We also provide a theoretical analysis that studies the "price" of irrevocability, i.e., the performance loss incurred in imposing the constraint we propose on the multiarmed bandit model. We show that this performance loss is uniformly bounded for a general class of multiarmed bandit problems and indicate its dependence on various problem parameters. Finally, we obtain a computationally fast algorithm to implement the packing heuristic; the algorithm renders the packing heuristic computationally cheaper than methods that rely on the computation of Gittins indices.
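The abstract describes the irrevocable-bandit setting but, naturally, not the packing heuristic in implementable detail. As a minimal sketch of the irrevocability constraint only (not the authors' packing heuristic), the following Python snippet simulates a naive irrevocable policy on a Bayesian Bernoulli bandit with multiple simultaneous arm pulls; the Bernoulli reward model, the Beta(1, 1) priors, the 0.4 discard threshold, and the name irrevocable_greedy are all illustrative assumptions, not anything taken from the paper.

```python
import random

def irrevocable_greedy(arms, n_active, horizon, seed=0):
    """Simulate an irrevocable policy on a Bayesian Bernoulli bandit.

    arms     : list of true (unknown) success probabilities
    n_active : number of arms pulled simultaneously each round
    horizon  : total number of rounds

    Irrevocability: once an arm is discarded it can never be
    reactivated, so the policy commits to arms as it goes.
    """
    rng = random.Random(seed)
    # Beta(1, 1) priors; stats[i] holds the posterior (alpha, beta) counts.
    stats = {i: [1, 1] for i in range(len(arms))}
    untried = list(range(len(arms)))            # arms never activated yet
    active = [untried.pop() for _ in range(n_active)]
    total_reward = 0

    for _ in range(horizon):
        for i in list(active):
            reward = rng.random() < arms[i]     # Bernoulli pull
            total_reward += reward
            stats[i][0 if reward else 1] += 1   # update posterior counts
            s, f = stats[i]
            # Discard rule (illustrative assumption): abandon an arm whose
            # posterior mean drops below 0.4, provided a fresh arm exists.
            if untried and s / (s + f) < 0.4:
                active.remove(i)                # irrevocable: never returns
                active.append(untried.pop())    # replace with a fresh arm
    return total_reward

print(irrevocable_greedy([0.2, 0.5, 0.8, 0.3, 0.9], n_active=2, horizon=200))
```

The defining property is that an arm removed from the active set never re-enters it; an unconstrained (revocable) policy would instead be free to return to any previously discarded arm.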
Pages: 383-399
Page count: 17
Related Papers
50 records in total (items [41]-[50] shown)
  • [41] Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
    Krishnasamy, Subhashini
    Sen, Rajat
    Johari, Ramesh
    Shakkottai, Sanjay
    OPERATIONS RESEARCH, 2021, 69 (01) : 315 - 330
  • [42] Combining Multiple Strategies for Multiarmed Bandit Problems and Asymptotic Optimality
    Chang, Hyeong Soo
    Choe, Sanghee
    JOURNAL OF CONTROL SCIENCE AND ENGINEERING, 2015, 2015
  • [43] On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems
    Wei, Lai
    Srivastava, Vaibhav
    2018 ANNUAL AMERICAN CONTROL CONFERENCE (ACC), 2018, : 6291 - 6296
  • [44] Learning in a Changing World: Restless Multiarmed Bandit With Unknown Dynamics
    Liu, Haoyang
    Liu, Keqin
    Zhao, Qing
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (03) : 1902 - 1916
  • [46] Smoking and the Bandit: A Preliminary Study of Smoker and Nonsmoker Differences in Exploratory Behavior Measured With a Multiarmed Bandit Task
    Addicott, Merideth A.
    Pearson, John M.
    Wilson, Jessica
    Platt, Michael L.
    McClernon, F. Joseph
    EXPERIMENTAL AND CLINICAL PSYCHOPHARMACOLOGY, 2013, 21 (01) : 66 - 73
  • [47] Distributed Consensus Algorithm for Decision-Making in Multiagent Multiarmed Bandit
    Cheng, Xiaotong
    Maghsudi, Setareh
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2024, 11 (04): : 2187 - 2199
  • [48] Dynamic Online Pricing with Incomplete Information Using Multiarmed Bandit Experiments
    Misra, Kanishka
    Schwartz, Eric M.
    Abernethy, Jacob
    MARKETING SCIENCE, 2019, 38 (02) : 226 - 252
  • [49] Optimal learning dynamics of multiagent system in restless multiarmed bandit game
    Nakayama, Kazuaki
    Nakamura, Ryuzo
    Hisakado, Masato
    Mori, Shintaro
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2020, 549
  • [50] Satisficing in Multiarmed Bandit Problems (vol 62, pg 3788, 2017)
    Reverdy, Paul
    Srivastava, Vaibhav
    Leonard, Naomi Ehrich
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (01) : 476 - 478