A Multi-Armed Bandit Problem with the Optimal Arm Depending on a Hidden Markov Model

Cited by: 0
Authors: Gulcu, Talha Cihad
DOI: 10.1109/ITW48936.2021.9611510
CLC Number: TP [Automation Technology, Computer Technology]
Subject Classification Code: 0812
Abstract
We consider a novel multi-armed bandit setup in which the reward distribution of each arm depends on a single discrete Markov process. This setup induces correlation among the arms, as well as correlation across the time instants at which arms are pulled. For this problem we show that the cumulative regret must grow linearly in the number of time instants at which the outcome of the previous arm pull cannot be identified uniquely. We propose an algorithm that relies on the empirical transition matrix and analyze its performance. The algorithm is shown to minimize the regret contribution of the time instants at which the outcome of the previous arm pull can be identified uniquely, which implies that the algorithm is order-optimal. We show experimentally that our algorithm can outperform both the correlated-UCB algorithm introduced by Gupta et al. in 2018 and the classical UCB algorithm.
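The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea of an empirical-transition-matrix strategy, under simplifying assumptions that are not from the paper: the hidden Markov state is revealed after every pull, rewards are Bernoulli, and a standard UCB index is kept per (state, arm) pair. The problem instance, variable names, and parameters below are illustrative, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: hidden Markov chain over S states; each arm's
# Bernoulli reward mean depends on the current hidden state.
S, K, T = 3, 4, 5000
P_true = np.array([[0.8, 0.1, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8]])          # true (unknown) transition matrix
means = rng.uniform(0.1, 0.9, size=(S, K))    # reward mean of arm k in state s

def pull(state, arm):
    """Draw a Bernoulli reward for `arm` when the hidden chain is in `state`."""
    return float(rng.random() < means[state, arm])

# Sketch of the strategy: learn the transition matrix from observed states,
# predict the most likely next state, and play a UCB index for that state.
counts = np.ones((S, S))            # Laplace-smoothed transition counts
reward_sum = np.zeros((S, K))       # per-(state, arm) reward totals
pull_cnt = np.ones((S, K))          # per-(state, arm) pull counts (>=1 to avoid /0)

state = 0
total_reward = 0.0
for t in range(T):
    P_hat = counts / counts.sum(axis=1, keepdims=True)   # empirical transition matrix
    next_state = int(np.argmax(P_hat[state]))            # most likely next state
    ucb = reward_sum[next_state] / pull_cnt[next_state] \
          + np.sqrt(2.0 * np.log(t + 2) / pull_cnt[next_state])
    arm = int(np.argmax(ucb))

    new_state = rng.choice(S, p=P_true[state])            # hidden chain moves
    r = pull(new_state, arm)
    total_reward += r

    counts[state, new_state] += 1                          # update empirical matrix
    reward_sum[new_state, arm] += r
    pull_cnt[new_state, arm] += 1
    state = new_state

print(f"average reward over {T} rounds: {total_reward / T:.3f}")
```

In this toy version the state is observed after each pull, which corresponds to the favorable case where the outcome of the previous arm pull is identified uniquely; the paper's linear-regret lower bound concerns the instants where that identification fails.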
Pages: 6
Related Papers (50 in total)
  • [41] Multi-armed bandit problem with online clustering as side information
    Dzhoha, Andrii
    Rozora, Iryna
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2023, 427
  • [42] SOM-based Algorithm for Multi-armed Bandit Problem
    Manome, Nobuhito
    Shinohara, Shuji
    Suzuki, Kouta
    Tomonaga, Kosuke
    Mitsuyoshi, Shunji
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [43] DYNAMIC ALLOCATION INDEX FOR THE DISCOUNTED MULTI-ARMED BANDIT PROBLEM
    GITTINS, JC
    JONES, DM
    BIOMETRIKA, 1979, 66 (03) : 561 - 565
  • [44] Revisiting the multi-armed bandit model for the optimal design of clinical trials: benefits and drawbacks
    Sofia S Villar
    Jack Bowden
    James Wason
TRIALS, 14 (Suppl 1)
  • [45] Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
    Komiyama, Junpei
    Honda, Junya
    Nakagawa, Hiroshi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1152 - 1161
  • [46] Dynamic Multi-Armed Bandit with Covariates
    Pavlidis, Nicos G.
    Tasoulis, Dimitris K.
    Adams, Niall M.
    Hand, David J.
ECAI 2008, PROCEEDINGS, 2008, 178: 777+
  • [47] Scaling Multi-Armed Bandit Algorithms
    Fouche, Edouard
    Komiyama, Junpei
    Boehm, Klemens
KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 1449 - 1459
  • [48] The Multi-Armed Bandit With Stochastic Plays
    Lesage-Landry, Antoine
    Taylor, Joshua A.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (07) : 2280 - 2286
  • [49] Satisficing in Multi-Armed Bandit Problems
    Reverdy, Paul
    Srivastava, Vaibhav
    Leonard, Naomi Ehrich
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, 62 (08) : 3788 - 3803
  • [50] MULTI-ARMED BANDIT ALLOCATION INDEXES
    JONES, PW
    JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY, 1989, 40 (12) : 1158 - 1159