Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards

Cited: 0
Authors
Besbes, Omar [1 ]
Gur, Yonatan [2 ]
Zeevi, Assaf [1 ]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Stanford Univ, Stanford, CA 94305 USA
Keywords
REGRET
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler's objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about arms (exploration) while simultaneously optimizing immediate rewards (exploitation); the price paid due to this trade-off is often referred to as the regret, and the main question is how small this price can be as a function of the horizon length T. This problem has been studied extensively when the reward distributions do not change over time, an assumption that supports a sharp characterization of the regret, yet is often violated in practical settings. In this paper, we focus on a MAB formulation which allows for a broad range of temporal uncertainties in the rewards, while still maintaining mathematical tractability. We fully characterize the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable regret, and by establishing a connection between the adversarial and the stochastic MAB frameworks.
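The connection to the adversarial framework mentioned in the abstract rests on a restarting idea: run an adversarial forecaster such as Exp3, but reset its weights periodically so that stale reward estimates are forgotten when the distributions drift. The sketch below is illustrative only; the function name, the choice `gamma=0.1`, and the fixed batch size are assumptions for demonstration, not the paper's tuned parameterization (which ties the batch length to the allowed variation budget).

```python
import math
import random

def exp3_with_restarts(reward_fns, T, batch_size, gamma=0.1, seed=0):
    """Play T rounds of a K-armed bandit whose mean rewards may drift.

    Resetting the Exp3 weights every `batch_size` rounds discards
    outdated observations, the restarting mechanism used to handle
    non-stationary rewards.
    """
    rng = random.Random(seed)
    K = len(reward_fns)
    total = 0.0
    weights = [1.0] * K
    for t in range(T):
        if t % batch_size == 0:              # restart: forget old weights
            weights = [1.0] * K
        wsum = sum(weights)
        # mix the exponential-weights distribution with uniform exploration
        probs = [(1 - gamma) * w / wsum + gamma / K for w in weights]
        arm = rng.choices(range(K), weights=probs)[0]
        r = reward_fns[arm](t, rng)          # bandit feedback: pulled arm only
        total += r
        # importance-weighted exponential update for the pulled arm
        weights[arm] *= math.exp(gamma * r / (probs[arm] * K))
    return total

# Toy non-stationary instance: two Bernoulli arms whose success
# probabilities swap at t = 500.
arms = [
    lambda t, rng: float(rng.random() < (0.8 if t < 500 else 0.2)),
    lambda t, rng: float(rng.random() < (0.2 if t < 500 else 0.8)),
]
reward = exp3_with_restarts(arms, T=1000, batch_size=200)
```

Without the restarts, weight accumulated on the first arm before the change point would keep the policy pulling it long after its mean has dropped; the periodic reset bounds how long such stale information can persist.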
Pages: 9
Related Papers
50 items in total
  • [21] Thompson Sampling Based Multi-Armed-Bandit Mechanism Using Neural Networks
    Manisha, Padala
    Gujar, Sujit
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 2111 - 2113
  • [22] PPAR: A Privacy-Preserving Adaptive Ranking Algorithm for Multi-Armed-Bandit Crowdsourcing
    Chen, Shuzhen
    Yu, Dongxiao
    Li, Feng
    Zou, Zongrui
    Liang, Weifa
    Cheng, Xiuzhen
    2022 IEEE/ACM 30TH INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS), 2022,
  • [23] Multi-armed bandit with sub-exponential rewards
    Jia, Huiwen
    Shi, Cong
    Shen, Siqian
    OPERATIONS RESEARCH LETTERS, 2021, 49 (05) : 728 - 733
  • [24] Budget-limited multi-armed bandit problem with dynamic rewards and proposed algorithms
    Niimi, Makoto
    Ito, Takayuki
    2015 IIAI 4TH INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS (IIAI-AAI), 2015, : 540 - 545
  • [25] The Multi-Armed Bandit Problem under Delayed Rewards Conditions in Digital Campaign Management
    Martin, M.
    Jimenez-Martin, A.
    Mateos, A.
    2019 6TH INTERNATIONAL CONFERENCE ON CONTROL, DECISION AND INFORMATION TECHNOLOGIES (CODIT 2019), 2019, : 952 - 957
  • [26] The Multi-Armed Bandit With Stochastic Plays
    Lesage-Landry, Antoine
    Taylor, Joshua A.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (07) : 2280 - 2286
  • [27] Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem
    Madhushani, Udari
    Leonard, Naomi Ehrich
    2019 18TH EUROPEAN CONTROL CONFERENCE (ECC), 2019, : 3502 - 3507
  • [28] Inventory Routing Problem with Non-stationary Stochastic Demands
    Yadollahi, Ehsan
    Aghezzaf, El-Houssaine
    Walraevens, Joris
    Raa, Birger
    ICINCO: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS, VOL 2, 2019, : 658 - 665
  • [29] Bandit Convex Optimization in Non-stationary Environments
    Zhao, Peng
    Wang, Guanghui
    Zhang, Lijun
    Zhou, Zhi-Hua
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 1508 - 1517
  • [30] Bandit Convex Optimization in Non-stationary Environments
    Zhao, Peng
    Wang, Guanghui
    Zhang, Lijun
    Zhou, Zhi-Hua
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22