Multi-armed Bandits with Generalized Temporally-Partitioned Rewards

Cited by: 0

Authors:

van den Broek, Ronald C. [1]

Litjens, Rik [1]

Sagis, Tobias [1]

Verbeeke, Nina [1]

Gajane, Pratik [1]

Affiliations:

[1] Eindhoven University of Technology, Eindhoven, Netherlands

Source:

ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT I, IDA 2024 | 2024 / Vol. 14641

Funding:

Dutch Research Council;

Keywords:

Multi-armed bandits; Delayed rewards; Temporally-partitioned rewards;

DOI:

10.1007/978-3-031-58547-0_4

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Sequential decision-making problems, in which past decisions can affect the future, model many practically important applications. In some real-world applications, feedback about a decision is delayed and may arrive as partial rewards observed with different delays. Motivated by such scenarios, we propose a novel problem formulation called multi-armed bandits with generalized temporally-partitioned rewards. To formalize how feedback about a decision is partitioned across several time steps, we introduce the beta-spread property. We derive a lower bound on the performance of any uniformly efficient algorithm for the considered problem. Moreover, we provide an algorithm called TP-UCB-FR-G and prove an upper bound on its performance measure. In some scenarios, our upper bound improves upon the state of the art. We provide experimental results validating the proposed algorithm and our theoretical results.

Citation

Pages: 41-52

Page count: 12


