Multi-armed Bandits with Generalized Temporally-Partitioned Rewards

Cited by: 0
Authors
van den Broek, Ronald C. [1]
Litjens, Rik [1]
Sagis, Tobias [1]
Verbeeke, Nina [1]
Gajane, Pratik [1]
Affiliations
[1] Eindhoven University of Technology, Eindhoven, Netherlands
Funding
Dutch Research Council
Keywords
Multi-armed bandits; Delayed rewards; Temporally-partitioned rewards
DOI
10.1007/978-3-031-58547-0_4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sequential decision-making problems, in which past decisions may affect the future, model many practically important applications. In some real-world applications, feedback about a decision is delayed and may arrive as partial rewards that are observed with different delays. Motivated by such scenarios, we propose a novel problem formulation called multi-armed bandits with generalized temporally-partitioned rewards. To formalize how feedback about a decision is partitioned across several time steps, we introduce the β-spread property. We derive a lower bound on the performance of any uniformly efficient algorithm for the considered problem. Moreover, we provide an algorithm called TP-UCB-FR-G and prove an upper bound on its performance measure. In some scenarios, our upper bound improves upon the state of the art. We provide experimental results validating the proposed algorithm and our theoretical results.
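The feedback model described in the abstract, where the reward of one decision arrives as several partial rewards with different delays, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's TP-UCB-FR-G algorithm and not an implementation of the β-spread property, neither of which is specified in this record. The names `partition_reward` and `simulate`, the Bernoulli rewards, the random split, and the uniformly random arm choice are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def partition_reward(total, n_parts, rng):
    """Split `total` into `n_parts` non-negative partial rewards summing to `total`.

    Assumption: the split is drawn at random; the paper's own partition
    model is not given in this record.
    """
    # 1.0 - rng.random() lies in (0, 1], so the weights never sum to zero.
    weights = [1.0 - rng.random() for _ in range(n_parts)]
    scale = sum(weights)
    return [total * w / scale for w in weights]


def simulate(means, horizon, n_parts, seed=0):
    """Pull arms uniformly at random under temporally-partitioned feedback.

    Returns (observed, generated): per-arm reward actually seen by the
    learner before the horizon, and per-arm reward produced in total.
    """
    rng = random.Random(seed)
    pending = defaultdict(list)  # arrival time -> [(arm, partial reward)]
    observed = [0.0] * len(means)
    generated = [0.0] * len(means)
    for t in range(horizon):
        arm = rng.randrange(len(means))  # placeholder policy, not UCB
        total = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        generated[arm] += total
        # Part i of this pull's reward becomes visible i steps later.
        for delay, part in enumerate(partition_reward(total, n_parts, rng)):
            pending[t + delay].append((arm, part))
        # Collect every partial reward that arrives at the current step.
        for a, part in pending.pop(t, []):
            observed[a] += part
    return observed, generated


if __name__ == "__main__":
    obs, gen = simulate(means=[0.2, 0.8], horizon=1000, n_parts=5)
    for arm, (o, g) in enumerate(zip(obs, gen)):
        print(f"arm {arm}: observed {o:.2f} of {g:.2f} generated")
```

With `n_parts=1` every reward is seen immediately, which recovers the standard bandit setting. With more parts, some reward from pulls near the end of the horizon is still pending, so the observed total for an arm never exceeds what was generated for it.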
Pages: 41-52
Number of pages: 12