A Multi-Armed Bandit Problem with the Optimal Arm Depending on a Hidden Markov Model

Cited by: 0

Author:

Gulcu, Talha Cihad

Source:

2021 IEEE INFORMATION THEORY WORKSHOP (ITW) | 2021

DOI:

10.1109/ITW48936.2021.9611510

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

We consider a novel multi-armed bandit setup in which the reward distribution of each arm depends on a single discrete Markov process. This setup involves correlation among the arms, as well as correlation across the time instants at which an arm is pulled. For this problem, we show that the cumulative regret must grow linearly with the number of instances where the outcome of the previous arm pull cannot be determined uniquely. We propose an algorithm that relies on the empirical transition matrix and analyze its performance. The algorithm is shown to minimize the regret contributed by the time instants at which the outcome of the previous arm pull can be identified uniquely. This implies that the algorithm is order-wise optimal. We show experimentally that our algorithm can outperform both the correlated-UCB algorithm introduced by Gupta et al. in 2018 and the classical UCB algorithm.
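
The abstract mentions an algorithm that relies on the empirical transition matrix but gives no details. As a rough illustration only, and not the paper's algorithm, the sketch below assumes the Markov state is fully observed after each pull and that a table of mean rewards per arm and state is known. The function names `empirical_transition_matrix` and `greedy_arm`, the table `mean_reward`, and the toy state sequence are all hypothetical.

```python
def empirical_transition_matrix(states, n_states):
    """Estimate transition probabilities by row-normalizing transition counts.

    Assumption: `states` is a fully observed state sequence (integers in
    range(n_states)). Rows with no observed departures fall back to uniform.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            matrix.append([1.0 / n_states] * n_states)
        else:
            matrix.append([c / total for c in row])
    return matrix


def greedy_arm(last_state, matrix, mean_reward):
    """Pick the arm with the highest expected reward for the next step.

    Assumption: mean_reward[arm][state] is a known (hypothetical) table of
    mean rewards; the next-state distribution is the estimated row for
    `last_state`.
    """
    next_dist = matrix[last_state]
    expected = [
        sum(p * r for p, r in zip(next_dist, arm_means))
        for arm_means in mean_reward
    ]
    return max(range(len(expected)), key=expected.__getitem__)


# Toy data (made up for illustration): two states, two arms.
states = [0, 1, 0, 1, 0, 0, 1]
P_hat = empirical_transition_matrix(states, 2)

# Arm 0 pays well in state 0, arm 1 pays well in state 1.
mean_reward = [[0.9, 0.1], [0.2, 0.8]]
arm_after_state_1 = greedy_arm(1, P_hat, mean_reward)
arm_after_state_0 = greedy_arm(0, P_hat, mean_reward)
```

In the setting the abstract describes, the previous outcome cannot always be identified uniquely, so this fully observed version covers only the easier case; the abstract states that regret must grow linearly with the number of the ambiguous instances.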

Pages: 6
