A Multi-Armed Bandit Problem with the Optimal Arm Depending on a Hidden Markov Model

Cited: 0
Author: Gulcu, Talha Cihad
DOI: 10.1109/ITW48936.2021.9611510
CLC classification: TP [Automation Technology; Computer Technology]
Subject classification code: 0812
Abstract
We consider a novel multi-armed bandit setup in which the reward distribution of each arm depends on a single discrete Markov process. This setup involves correlation among arms, as well as correlation across the time instants at which arms are pulled. For this problem we show that the cumulative regret must grow linearly with the number of time instants at which the outcome of the previous arm pull cannot be determined uniquely. We propose an algorithm relying on the empirical transition matrix and analyze its performance. The algorithm is shown to minimize the contribution to the regret of the time instants at which the outcome of the previous arm pull can be identified uniquely. This implies that the algorithm performs order-wise optimally. We show experimentally that our algorithm can outperform both the correlated-UCB algorithm introduced by Gupta et al. in 2018 and the classical UCB algorithm.
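The setup described in the abstract can be sketched in a few lines. Everything below is an illustrative assumption, not taken from the paper: a two-state hidden Markov chain, two Bernoulli arms whose means depend on the hidden state, and a transition matrix `P` with made-up values. The state sequence is treated as fully observed here purely to show what the empirical transition-matrix estimate the algorithm relies on looks like; in the paper's setting the state must instead be inferred from reward outcomes, and is sometimes ambiguous.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the paper).
P = np.array([[0.9, 0.1],          # hidden-state transition matrix
              [0.2, 0.8]])
arm_means = np.array([[0.8, 0.2],  # arm 0: mean reward in state 0 / state 1
                      [0.3, 0.7]]) # arm 1: mean reward in state 0 / state 1

T = 10_000
state = 0
states = np.empty(T, dtype=int)
rewards = np.empty(T)
for t in range(T):
    states[t] = state
    arm = rng.integers(2)                        # uniform exploration, for illustration
    rewards[t] = rng.random() < arm_means[arm, state]  # Bernoulli reward
    state = rng.choice(2, p=P[state])            # hidden chain evolves

# Empirical transition matrix from the (here fully observed) state sequence:
# count transitions s -> s' and normalize each row.
counts = np.zeros((2, 2))
for s, s_next in zip(states[:-1], states[1:]):
    counts[s, s_next] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)
print(np.round(P_hat, 2))  # close to P for large T
```

With 10,000 steps the row-normalized counts `P_hat` concentrate around the true `P`, which is the quantity the proposed algorithm estimates; the paper's contribution lies in handling the steps where the previous pull's outcome does not identify the state uniquely.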
Pages: 6
Related papers (50 items in total):
  • [32] Optimal Clustering with Noisy Queries via Multi-Armed Bandit
    Xia, Jinghui
    Huang, Zengfeng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [34] Multi-armed Bandit processes with optimal selection of the operating times
    Pilar Ibarrola
    Ricardo Vélez
    Test, 2005, 14 : 239 - 255
  • [35] An asymptotically optimal strategy for constrained multi-armed bandit problems
    Hyeong Soo Chang
    Mathematical Methods of Operations Research, 2020, 91 : 545 - 557
  • [36] Optimal Exploitation of Clustering and History Information in Multi-Armed Bandit
    Bouneffouf, Djallel
    Parthasarathy, Srinivasan
    Samulowitz, Horst
    Wistuba, Martin
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019: 2016 - 2022
  • [38] The non-stationary stochastic multi-armed bandit problem
    Allesiardo, Robin
    Féraud, Raphaël
    Maillard, Odalric-Ambrym
    International Journal of Data Science and Analytics, 2017, 3 (03): 267 - 283
  • [39] On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models
    Kaufmann, Emilie
    Cappé, Olivier
    Garivier, Aurélien
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [40] Multi-armed bandit for the cyclic minimum sitting arrangement problem
    Robles, Marcos
    Cavero, Sergio
    Pardo, Eduardo G.
    Cordon, Oscar
    COMPUTERS & OPERATIONS RESEARCH, 2025, 179