Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems

Cited by: 51
Authors
Koulouriotis, D. E. [1 ]
Xanthopoulos, A. [1 ]
Affiliation
[1] Democritus Univ Thrace, Sch Engn, Dept Prod & Management Engn, Dragana, Greece
Keywords
decision-making agents; action selection; exploration-exploitation; multi-armed bandit; genetic algorithms; reinforcement learning;
DOI
10.1016/j.amc.2007.07.043
CLC number
O29 [Applied Mathematics];
Discipline code
070104 ;
Abstract
Multi-armed bandit tasks have been extensively used to model the problem of balancing exploitation and exploration. A particularly challenging variant of the MABP is the non-stationary bandit problem, where the agent faces the added complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number of arms and Gaussian rewards. An important family of ad hoc methods exists that is suitable for non-stationary bandit tasks. These learning algorithms, which offer intuition-based solutions to the exploitation-exploration trade-off, have the advantage of not relying on strong theoretical assumptions while at the same time being tunable to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem is offered by evolutionary algorithms. We present an evolutionary algorithm implemented to solve the non-stationary bandit problem, along with ad hoc solution algorithms, namely action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and the adaptive pursuit method. A number of simulation-based experiments were conducted, and based on the numerical results obtained we discuss the methods' performances. (C) 2007 Elsevier Inc. All rights reserved.
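The action-value methods the abstract mentions can be illustrated with a minimal sketch: an ε-greedy agent using a constant step size α, which weights recent rewards more heavily than old ones and can therefore track non-stationary (drifting or switching) arm means. This is a generic textbook construction, not the paper's own implementation; the function name, parameters, and reward interface below are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(reward_fns, steps=1000, epsilon=0.1, alpha=0.1, seed=0):
    """Constant-step-size epsilon-greedy on a (possibly non-stationary) bandit.

    reward_fns: one callable per arm; reward_fns[i](t) returns the reward
    for pulling arm i at time step t (means may change over time).
    Returns the final action-value estimates and the cumulative reward.
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    q = [0.0] * k                 # incremental action-value estimates
    total = 0.0
    for t in range(steps):
        if rng.random() < epsilon:             # explore: random arm
            a = rng.randrange(k)
        else:                                   # exploit: current best estimate
            a = max(range(k), key=lambda i: q[i])
        r = reward_fns[a](t)
        # Constant alpha gives exponentially more weight to recent rewards,
        # so the estimate can follow a change in the arm's mean.
        q[a] += alpha * (r - q[a])
        total += r
    return q, total
```

For example, with two Gaussian arms whose means swap halfway through the horizon, the agent's estimates drift toward the newly best arm within a few dozen pulls, which is the tracking behavior that sample-average methods (α = 1/n) lack.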
Pages: 913-922
Page count: 10
Related papers
50 items total
  • [41] Distributed Learning in Multi-Armed Bandit With Multiple Players
    Liu, Keqin
    Zhao, Qing
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2010, 58 (11) : 5667 - 5681
  • [42] Transfer Learning in Multi-Armed Bandit: A Causal Approach
    Zhang, Junzhe
    Bareinboim, Elias
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 1778 - 1780
  • [43] Multi-armed bandit algorithms over DASH for multihomed client
    Hodroj, Ali
    Ibrahim, Marc
    Hadjadj-Aoul, Yassine
    Sericola, Bruno
    INTERNATIONAL JOURNAL OF SENSOR NETWORKS, 2021, 37 (04) : 244 - 253
  • [44] Reconfigurable and Computationally Efficient Architecture for Multi-armed Bandit Algorithms
    Santosh, S. V. Sai
    Darak, S. J.
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020,
  • [45] A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits
    Alami, Reda
    Mahfoud, Mohammed
    Achab, Mastane
    2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 272 - 280
  • [46] Contextual Multi-Armed Bandits for Non-Stationary Heterogeneous Mobile Edge Computing
    Wirth, Maximilian
    Ortiz, Andrea
    Klein, Anja
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 5599 - 5604
  • [47] A Satisficing Strategy with Variable Reference in the Multi-armed Bandit Problems
    Kohno, Yu
    Takahashi, Tatsuji
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE OF NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2014 (ICNAAM-2014), 2015, 1648
  • [48] GAUSSIAN PROCESS MODELLING OF DEPENDENCIES IN MULTI-ARMED BANDIT PROBLEMS
    Dorard, Louis
    Glowacka, Dorota
    Shawe-Taylor, John
    PROCEEDINGS OF THE 10TH INTERNATIONAL SYMPOSIUM ON OPERATIONAL RESEARCH SOR 09, 2009, : 77 - 84
  • [49] Time-Varying Stochastic Multi-Armed Bandit Problems
    Vakili, Sattar
    Zhao, Qing
    Zhou, Yuan
    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2014, : 2103 - 2107