Robust control of the multi-armed bandit problem

Cited by: 3
Authors
Caro, Felipe [1 ]
Das Gupta, Aparupa [1 ]
Affiliation
[1] Univ Calif Los Angeles, Anderson Sch Management, Los Angeles, CA 90095 USA
Keywords
Multiarmed bandit; Index policies; Bellman equation; Robust Markov decision processes; Uncertain transition matrix; Project selection; MARKOV DECISION-PROCESSES; OPTIMAL ADAPTIVE POLICIES; ALLOCATION;
DOI
10.1007/s10479-015-1965-7
Chinese Library Classification (CLC)
C93 [Management]; O22 [Operations Research]
Subject classification codes
070105; 12; 1201; 1202; 120202
Abstract
We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that, for each arm, there exists a robust counterpart of the Gittins index that is the solution to a robust optimal stopping-time problem and can be computed effectively via an equivalent restart problem. We then characterize the optimal policy of the robust MAB as a project-by-project retirement policy, but we show that the arms become dependent, so the policy based on the robust Gittins index is not optimal. For a project selection problem, we show that the robust Gittins index policy is near-optimal, but its implementation requires more computational effort than solving a non-robust MAB problem. Hence, we propose a Lagrangian index policy that requires the same computational effort as evaluating the indices of a non-robust MAB and is within 1% of the optimum in the robust project selection problem.
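The abstract states that each arm's robust Gittins index can be computed through an equivalent restart problem. The sketch below is an illustrative approximation only, not the authors' formulation: it runs a robust value iteration for a single arm in the restart-in-state form, replacing the nominal next-state expectation by its worst case over an assumed L1-ball ambiguity set around each nominal transition row. The ambiguity set, the greedy inner minimization, and all function and parameter names are assumptions made for this example.

```python
import numpy as np

def worst_case_expectation(p_nominal, values, budget):
    """Assumed inner problem: min_p p.values over the L1 ball
    ||p - p_nominal||_1 <= budget intersected with the simplex.
    Solved greedily by moving mass from high-value states to the
    lowest-value state (optimal for this simple LP)."""
    p = p_nominal.copy()
    worst = np.argmin(values)
    room = budget / 2.0  # moving mass d changes the L1 norm by 2d
    for s in np.argsort(values)[::-1]:
        if s == worst or room <= 0:
            continue
        shift = min(p[s], room)
        p[s] -= shift
        p[worst] += shift
        room -= shift
    return float(p @ values)

def robust_gittins_index(state, rewards, P_nominal, beta, budget, n_iter=500):
    """Robust restart-in-`state` value iteration (illustrative sketch):
    V(x) = max(restart payoff, continue payoff), with the expectation
    over the next state replaced by its worst case over the ambiguity
    set. The index is (1 - beta) * V(state)."""
    n = len(rewards)
    V = np.zeros(n)
    for _ in range(n_iter):
        restart = rewards[state] + beta * worst_case_expectation(P_nominal[state], V, budget)
        V_new = np.empty(n)
        for x in range(n):
            cont = rewards[x] + beta * worst_case_expectation(P_nominal[x], V, budget)
            V_new[x] = max(restart, cont)
        V = V_new
    return (1 - beta) * V[state]

# Toy two-state arm with a nominal transition matrix (made-up numbers).
rewards = np.array([1.0, 0.2])
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
print(robust_gittins_index(0, rewards, P, beta=0.9, budget=0.1))
```

Shrinking `budget` to 0 recovers the ordinary (non-robust) restart computation of the Gittins index; a larger `budget` makes the index more pessimistic, which is the qualitative behavior the robust model is meant to capture.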
Pages: 461-480
Number of pages: 20