Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards

Cited by: 0
Authors
Besbes, Omar [1 ]
Gur, Yonatan [2 ]
Zeevi, Assaf [1 ]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Stanford Univ, Stanford, CA 94305 USA
Keywords
REGRET;
DOI: none listed
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
In a multi-armed bandit (MAB) problem, a gambler must choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are observed only when an arm is selected, and the gambler's objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about the arms (exploration) while simultaneously optimizing immediate rewards (exploitation); the price paid for this trade-off is often referred to as the regret, and the main question is how small this price can be as a function of the horizon length T. This problem has been studied extensively when the reward distributions do not change over time, an assumption that supports a sharp characterization of the regret yet is often violated in practical settings. In this paper, we focus on an MAB formulation that allows for a broad range of temporal uncertainties in the rewards while still maintaining mathematical tractability. We fully characterize the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable regret, and by establishing a connection between the adversarial and the stochastic MAB frameworks.
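The setup the abstract describes (K arms, unknown and time-varying reward distributions, regret measured against the best achievable expected reward at each round) can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it runs a plain epsilon-greedy learner against arms whose Bernoulli means drift linearly over the horizon, with all parameter values invented for illustration, and reports the regret relative to a dynamic oracle that knows the best arm's mean at every round.

```python
import random

def simulate_bandit(K=3, T=1000, epsilon=0.1, seed=0):
    """Illustrative non-stationary bandit run (NOT the paper's algorithm).

    An epsilon-greedy learner faces K Bernoulli arms whose means drift
    linearly over the horizon T; regret is measured against a dynamic
    oracle that plays the best expected mean at every round.
    """
    rng = random.Random(seed)

    def mean(k, t):
        # Arm k's expected reward at round t: even arms drift up,
        # odd arms drift down; all means stay inside [0.1, 0.9].
        return 0.5 + 0.4 * ((-1) ** k) * (t / T)

    counts = [0] * K          # pulls of each arm
    estimates = [0.0] * K     # running sample mean of each arm
    total_reward = 0.0
    oracle_reward = 0.0

    for t in range(T):
        # Explore with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            arm = rng.randrange(K)
        else:
            arm = max(range(K), key=lambda k: estimates[k])

        # Bernoulli reward drawn from the arm's current mean.
        r = 1.0 if rng.random() < mean(arm, t) else 0.0
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]

        total_reward += r
        oracle_reward += max(mean(k, t) for k in range(K))

    return oracle_reward - total_reward

regret = simulate_bandit()
```

Because the sample means average over the whole history, the learner reacts slowly when the identity of the best arm changes; handling exactly this lag under a budget on total reward "variation" is what the paper's analysis addresses.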
Pages: 9