A Multi-Armed Bandit Problem with the Optimal Arm Depending on a Hidden Markov Model

Author
Gulcu, Talha Cihad
DOI
10.1109/ITW48936.2021.9611510
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We consider a novel multi-armed bandit setup in which the reward distribution of each arm depends on a single discrete Markov process. This setup involves correlation among the arms, as well as correlation across the time instants at which arms are pulled. For this problem, we show that the cumulative regret must grow linearly with the number of time instants at which the outcome of the previous arm pull cannot be determined uniquely. We propose an algorithm that relies on the empirical transition matrix and analyze its performance. The algorithm is shown to minimize the regret contributed by the time instants at which the outcome of the previous arm pull can be identified uniquely, which implies that it is order-wise optimal. Experimentally, we show that our algorithm can outperform both the correlated-UCB algorithm introduced by Gupta et al. in 2018 and the classical UCB algorithm.
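The abstract mentions an algorithm built on the empirical transition matrix but gives no details. The following is only a minimal sketch of that general idea, not the paper's algorithm. It assumes, hypothetically, that the Markov state behind each previous pull is known and that the per-state mean rewards of each arm (`mean_reward`) are given. The function names are illustrative.

```python
def empirical_transition_matrix(states, n_states):
    """Estimate a transition matrix from an observed state sequence.

    Rows with no observed transitions fall back to a uniform distribution.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            matrix.append([1.0 / n_states] * n_states)
        else:
            matrix.append([c / total for c in row])
    return matrix


def greedy_arm(prev_state, matrix, mean_reward):
    """Pick the arm with the highest expected reward for the next step.

    mean_reward[arm][state] is a hypothetical table of per-state mean
    rewards, assumed known here purely for illustration.
    """
    predicted = matrix[prev_state]  # estimated next-state distribution
    scores = [
        sum(p * r for p, r in zip(predicted, arm_means))
        for arm_means in mean_reward
    ]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    observed = [0, 1, 0, 1, 0, 0]
    estimate = empirical_transition_matrix(observed, n_states=2)
    # Arm 0 pays off in state 0, arm 1 pays off in state 1 (assumed values).
    rewards = [[1.0, 0.0], [0.0, 1.0]]
    print(estimate)
    print(greedy_arm(0, estimate, rewards))
```

When the previous state cannot be identified, `prev_state` is not available and this greedy step does not apply directly. Those are the time instants the abstract identifies as the source of linear regret.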
Pages: 6