Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems

Cited by: 51
Authors
Koulouriotis, D. E. [1 ]
Xanthopoulos, A. [1 ]
Affiliation
[1] Democritus Univ Thrace, Sch Engn, Dept Prod & Management Engn, Dragana, Greece
Keywords
decision-making agents; action selection; exploration-exploitation; multi-armed bandit; genetic algorithms; reinforcement learning;
DOI
10.1016/j.amc.2007.07.043
CLC number
O29 [Applied Mathematics];
Subject classification code
070104 ;
Abstract
Multi-armed bandit tasks have been used extensively to model the problem of balancing exploitation and exploration. A particularly challenging variant of the MABP is the non-stationary bandit problem, where the agent faces the added complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number of arms and Gaussian rewards. An important family of ad hoc methods exists that is suitable for non-stationary bandit tasks. These learning algorithms, which offer intuition-based solutions to the exploitation-exploration trade-off, have the advantage of not relying on strong theoretical assumptions while at the same time being tunable to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem comes in the form of evolutionary algorithms. We present an evolutionary algorithm implemented to solve the non-stationary bandit problem, alongside ad hoc solution algorithms, namely action-value methods with ε-greedy and softmax action-selection rules, the probability matching method and, finally, the adaptive pursuit method. A number of simulation-based experiments were conducted, and based on the numerical results obtained we discuss the methods' performances. (C) 2007 Elsevier Inc. All rights reserved.
Pages: 913 - 922
Page count: 10
Related papers
50 records
  • [21] Percentile optimization in multi-armed bandit problems
    Ghatrani, Zahra
    Ghate, Archis
    ANNALS OF OPERATIONS RESEARCH, 2024, 340 (2-3) : 837 - 862
  • [22] Ambiguity aversion in multi-armed bandit problems
    Anderson, Christopher M.
    THEORY AND DECISION, 2012, 72 (01) : 15 - 33
  • [23] Multi-armed Bandit Problems with Strategic Arms
    Braverman, Mark
    Mao, Jieming
    Schneider, Jon
    Weinberg, S. Matthew
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [25] Smart topology detection using multi-armed bandit reinforcement learning method
    Sonmez, Ferda Ozdemir
    Hankin, Chris
    Malacaria, Pasquale
    INFORMATION SECURITY JOURNAL, 2024,
  • [26] Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
    Besbes, Omar
    Gur, Yonatan
    Zeevi, Assaf
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [27] MABFuzz: Multi-Armed Bandit Algorithms for Fuzzing Processors
    Gohil, Vasudev
    Kande, Rahul
    Chen, Chen
    Sadeghi, Ahmad-Reza
    Rajendran, Jeyavijayan
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,
  • [28] Fair Link Prediction with Multi-Armed Bandit Algorithms
    Wang, Weixiang
    Soundarajan, Sucheta
    PROCEEDINGS OF THE 15TH ACM WEB SCIENCE CONFERENCE, WEBSCI 2023, 2023, : 219 - 228
  • [29] Online Optimization Algorithms for Multi-Armed Bandit Problem
    Kamalov, Mikhail
    Dobrynin, Vladimir
    Balykina, Yulia
    2017 CONSTRUCTIVE NONSMOOTH ANALYSIS AND RELATED TOPICS (DEDICATED TO THE MEMORY OF V.F. DEMYANOV) (CNSA), 2017, : 141 - 143
  • [30] Gaussian multi-armed bandit problems with multiple objectives
    Reverdy, Paul
    2016 AMERICAN CONTROL CONFERENCE (ACC), 2016, : 5263 - 5269