Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards

Cited: 0
Authors
Besbes, Omar [1 ]
Gur, Yonatan [2 ]
Zeevi, Assaf [1 ]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Stanford Univ, Stanford, CA 94305 USA
Keywords
REGRET
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler's objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about arms (exploration) while simultaneously optimizing immediate rewards (exploitation); the price paid due to this trade-off is often referred to as the regret, and the main question is how small this price can be as a function of the horizon length T. This problem has been studied extensively when the reward distributions do not change over time, an assumption that supports a sharp characterization of the regret, yet is often violated in practical settings. In this paper, we focus on a MAB formulation which allows for a broad range of temporal uncertainties in the rewards, while still maintaining mathematical tractability. We fully characterize the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable regret, and by establishing a connection between the adversarial and the stochastic MAB frameworks.
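The connection to the adversarial framework mentioned in the abstract rests on a restarting idea: run an adversarial forecaster such as Exp3, but reset its weights periodically so that stale reward estimates are forgotten when the distributions drift. The sketch below is illustrative only; the function name, the choice `gamma=0.1`, and the fixed batch size are assumptions for demonstration, not the paper's tuned parameterization (which ties the batch length to the allowed variation budget).

```python
import math
import random

def exp3_with_restarts(reward_fns, T, batch_size, gamma=0.1, seed=0):
    """Play T rounds of a K-armed bandit whose mean rewards may drift.

    Resetting the Exp3 weights every `batch_size` rounds discards
    outdated observations, the restarting mechanism used to handle
    non-stationary rewards.
    """
    rng = random.Random(seed)
    K = len(reward_fns)
    total = 0.0
    weights = [1.0] * K
    for t in range(T):
        if t % batch_size == 0:              # restart: forget old weights
            weights = [1.0] * K
        wsum = sum(weights)
        # mix the exponential-weights distribution with uniform exploration
        probs = [(1 - gamma) * w / wsum + gamma / K for w in weights]
        arm = rng.choices(range(K), weights=probs)[0]
        r = reward_fns[arm](t, rng)          # bandit feedback: pulled arm only
        total += r
        # importance-weighted exponential update for the pulled arm
        weights[arm] *= math.exp(gamma * r / (probs[arm] * K))
    return total

# Toy non-stationary instance: two Bernoulli arms whose success
# probabilities swap at t = 500.
arms = [
    lambda t, rng: float(rng.random() < (0.8 if t < 500 else 0.2)),
    lambda t, rng: float(rng.random() < (0.2 if t < 500 else 0.8)),
]
reward = exp3_with_restarts(arms, T=1000, batch_size=200)
```

Without the restarts, weight accumulated on the first arm before the change point would keep the policy pulling it long after its mean has dropped; the periodic reset bounds how long such stale information can persist.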
Pages: 9
Related Papers
50 items in total
  • [21] Thompson Sampling Based Multi-Armed-Bandit Mechanism Using Neural Networks
    Manisha, Padala
    Gujar, Sujit
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 2111 - 2113
  • [22] PPAR: A Privacy-Preserving Adaptive Ranking Algorithm for Multi-Armed-Bandit Crowdsourcing
    Chen, Shuzhen
    Yu, Dongxiao
    Li, Feng
    Zou, Zongrui
    Liang, Weifa
    Cheng, Xiuzhen
    2022 IEEE/ACM 30TH INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS), 2022,
  • [23] Multi-armed bandit with sub-exponential rewards
    Jia, Huiwen
    Shi, Cong
    Shen, Siqian
    OPERATIONS RESEARCH LETTERS, 2021, 49 (05) : 728 - 733
  • [24] Budget-limited multi-armed bandit problem with dynamic rewards and proposed algorithms
    Niimi, Makoto
    Ito, Takayuki
    2015 IIAI 4TH INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS (IIAI-AAI), 2015, : 540 - 545
  • [25] The Multi-Armed Bandit Problem under Delayed Rewards Conditions in Digital Campaign Management
    Martin, M.
    Jimenez-Martin, A.
    Mateos, A.
    2019 6TH INTERNATIONAL CONFERENCE ON CONTROL, DECISION AND INFORMATION TECHNOLOGIES (CODIT 2019), 2019, : 952 - 957
  • [26] The Multi-Armed Bandit With Stochastic Plays
    Lesage-Landry, Antoine
    Taylor, Joshua A.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (07) : 2280 - 2286
  • [27] Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem
    Madhushani, Udari
    Leonard, Naomi Ehrich
    2019 18TH EUROPEAN CONTROL CONFERENCE (ECC), 2019, : 3502 - 3507
  • [28] Inventory Routing Problem with Non-stationary Stochastic Demands
    Yadollahi, Ehsan
    Aghezzaf, El-Houssaine
    Walraevens, Joris
    Raa, Birger
    ICINCO: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS, VOL 2, 2019, : 658 - 665
  • [29] Bandit Convex Optimization in Non-stationary Environments
    Zhao, Peng
    Wang, Guanghui
    Zhang, Lijun
    Zhou, Zhi-Hua
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 1508 - 1517
  • [30] Bandit Convex Optimization in Non-stationary Environments
    Zhao, Peng
    Wang, Guanghui
    Zhang, Lijun
    Zhou, Zhi-Hua
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22