Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems

Cited by: 51
Authors:
Koulouriotis, D. E. [1]
Xanthopoulos, A. [1]
Affiliations:
[1] Democritus Univ Thrace, Sch Engn, Dept Prod & Management Engn, Dragana, Greece
Keywords:
decision-making agents; action selection; exploration-exploitation; multi-armed bandit; genetic algorithms; reinforcement learning;
DOI:
10.1016/j.amc.2007.07.043
Chinese Library Classification: O29 [Applied Mathematics]
Subject Classification Code: 070104
Abstract:
Multi-armed bandit tasks have been used extensively to model the problem of balancing exploitation and exploration. One of the most challenging variants of the multi-armed bandit problem (MABP) is the non-stationary bandit problem, where the agent faces the additional complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number of arms and Gaussian rewards. A family of important ad hoc methods exists that is suitable for non-stationary bandit tasks. These learning algorithms, which offer intuition-based solutions to the exploitation-exploration trade-off, have the advantage of not relying on strong theoretical assumptions while at the same time being tunable to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem presents itself in the form of evolutionary algorithms. We present an evolutionary algorithm implemented to solve the non-stationary bandit problem, along with ad hoc solution algorithms, namely action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and the adaptive pursuit method. A number of simulation-based experiments were conducted, and based on the numerical results obtained we discuss the methods' performances. (C) 2007 Elsevier Inc. All rights reserved.
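As a rough illustration of the ad hoc approach named in the abstract, the sketch below runs an action-value agent with ε-greedy selection and a constant step size on a drifting Gaussian bandit. This is a minimal sketch, not the authors' implementation: the arm count, step size, exploration rate, and the random-walk model of non-stationarity are illustrative assumptions, since the abstract does not specify the experimental parameters.

```python
# Minimal sketch (assumptions, not the paper's setup): K arms, unit-variance
# Gaussian rewards, and arm means that drift by a small random walk.
import random

K = 10          # number of arms (assumed)
EPSILON = 0.1   # exploration probability (assumed)
ALPHA = 0.1     # constant step size; weights recent rewards more heavily,
                # which lets the estimate track a drifting mean
DRIFT = 0.05    # std. dev. of the random walk applied to each arm's mean

means = [random.gauss(0.0, 1.0) for _ in range(K)]  # true (hidden) arm means
Q = [0.0] * K                                       # estimated action values

for t in range(10_000):
    # epsilon-greedy: explore uniformly with prob. EPSILON, else exploit
    if random.random() < EPSILON:
        a = random.randrange(K)
    else:
        a = max(range(K), key=lambda i: Q[i])

    reward = random.gauss(means[a], 1.0)   # Gaussian reward, unit variance
    Q[a] += ALPHA * (reward - Q[a])        # exponential recency-weighted update

    # non-stationarity: every arm's mean drifts by a small random walk
    for i in range(K):
        means[i] += random.gauss(0.0, DRIFT)

print("final value estimates:", [round(q, 2) for q in Q])
```

A softmax selection rule would replace the greedy branch with sampling proportional to exp(Q[i]/τ) for a temperature τ, while adaptive pursuit would instead maintain explicit selection probabilities, pushing the current greedy arm's probability toward a maximum and decaying the others.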
Pages: 913 - 922
Page count: 10
Related papers
50 records in total
  • [31] Contextual Multi-Armed Bandits for Non-Stationary Wireless Network Selection
    Martinez, Lluis
    Vidal, Josep
    Cabrera-Bean, Margarita
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023: 285 - 290
  • [32] Finite budget analysis of multi-armed bandit problems
    Xia, Yingce
    Qin, Tao
    Ding, Wenkui
    Li, Haifang
    Zhang, Xudong
    Yu, Nenghai
    Liu, Tie-Yan
    NEUROCOMPUTING, 2017, 258 : 13 - 29
  • [33] A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems
    Koulouriotis, D. E.
    Xanthopoulos, A.
    OPERATIONAL RESEARCH, 2008, 8 (2): 105 - 122
  • [34] The multi-armed bandit, with constraints
    Denardo, Eric V.
    Feinberg, Eugene A.
    Rothblum, Uriel G.
    ANNALS OF OPERATIONS RESEARCH, 2013, 208 (01): 37 - 62
  • [36] The Assistive Multi-Armed Bandit
    Chan, Lawrence
    Hadfield-Menell, Dylan
    Srinivasa, Siddhartha
    Dragan, Anca
    HRI '19: 2019 14TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2019: 354 - 363
  • [37] Multi-armed bandit games
    Gursoy, Kemal
    ANNALS OF OPERATIONS RESEARCH, 2024
  • [38] Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback
    Wang, Siwei
    Wang, Haoyun
    Huang, Longbo
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10210 - 10217
  • [39] Adaptive Active Learning as a Multi-armed Bandit Problem
    Czarnecki, Wojciech M.
    Podolak, Igor T.
    21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 989 - 990
  • [40] Improving multi-armed bandit algorithms in online pricing settings
    Trovo, Francesco
    Paladino, Stefano
    Restelli, Marcello
    Gatti, Nicola
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2018, 98 : 196 - 235