A Multi-Armed Bandit Problem with the Optimal Arm Depending on a Hidden Markov Model

Author
Gulcu, Talha Cihad
DOI
10.1109/ITW48936.2021.9611510
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We consider a novel multi-armed bandit setup in which the reward distribution of each arm depends on a single discrete Markov process. This setup involves correlation among the arms, as well as correlation across the time instants at which arms are pulled. For this problem, we show that the cumulative regret must grow linearly with the number of time instances at which the outcome of the previous arm pull cannot be determined uniquely. We propose an algorithm that relies on the empirical transition matrix and analyze its performance. The algorithm is shown to minimize the regret contributed by the time instances at which the outcome of the previous arm pull can be identified uniquely, which implies that the algorithm is order-wise optimal. We show experimentally that our algorithm can outperform both the correlated-UCB algorithm introduced by Gupta et al. in 2018 and the classical UCB algorithm.
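The abstract does not spell out the algorithm, so the Python sketch below only illustrates the general idea of acting on an empirical transition matrix; it is not the paper's method. It assumes, purely for illustration, that the expected reward of every arm in every hidden state is known (`mean_rewards`, hypothetical) and that after each pull the learner is told the hidden state when the outcome identifies it uniquely, or `None` otherwise. The class name `EmpiricalTransitionPolicy` and its methods are likewise hypothetical. The real problem also requires learning the reward distributions, which this sketch omits.

```python
from typing import List, Optional, Sequence


class EmpiricalTransitionPolicy:
    """Hypothetical sketch, not the paper's algorithm: pick the arm that is best
    for the hidden state predicted by an empirical transition matrix.

    Assumptions made only for this example:
      * mean_rewards[s][a] is the known expected reward of arm a in hidden state s;
      * after a pull the learner observes the hidden state when the outcome
        identifies it uniquely, and None otherwise.
    """

    def __init__(self, mean_rewards: Sequence[Sequence[float]]) -> None:
        self.mean_rewards = [list(row) for row in mean_rewards]
        self.n_states = len(self.mean_rewards)
        self.n_arms = len(self.mean_rewards[0])
        # Transition counts start at 1 (add-one smoothing) so each row is a valid distribution.
        self.counts = [[1.0] * self.n_states for _ in range(self.n_states)]
        self.prev_state: Optional[int] = None

    def empirical_transition_matrix(self) -> List[List[float]]:
        # Normalise each row of counts into estimated transition probabilities.
        return [[c / sum(row) for c in row] for row in self.counts]

    def _best_arm(self, state_dist: Sequence[float]) -> int:
        # Expected reward of each arm under the given distribution over hidden states.
        expected = [
            sum(p * self.mean_rewards[s][a] for s, p in enumerate(state_dist))
            for a in range(self.n_arms)
        ]
        # max() returns the first maximiser, so ties go to the lowest arm index.
        return max(range(self.n_arms), key=lambda a: expected[a])

    def select_arm(self) -> int:
        if self.prev_state is None:
            # Previous outcome not identifiable: fall back to a uniform belief over states.
            return self._best_arm([1.0 / self.n_states] * self.n_states)
        # Previous state known: predict the next state from its empirical transition row.
        return self._best_arm(self.empirical_transition_matrix()[self.prev_state])

    def update(self, observed_state: Optional[int]) -> None:
        # Count a transition only when both consecutive states were identified.
        if self.prev_state is not None and observed_state is not None:
            self.counts[self.prev_state][observed_state] += 1
        self.prev_state = observed_state
```

For example, with two states and `mean_rewards = [[1.0, 0.0], [0.0, 1.0]]`, feeding the identified states `0, 0, 0, 0, 1, 1, 1, 1` through `update` makes the policy expect the chain to stay in state 1, so `select_arm()` returns arm 1. The split between identified and unidentified previous states mirrors the distinction the abstract draws when attributing regret.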
Pages: 6