Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems

Cited by: 51
Authors
Koulouriotis, D. E. [1 ]
Xanthopoulos, A. [1 ]
Affiliation
[1] Democritus Univ Thrace, Sch Engn, Dept Prod & Management Engn, Dragana, Greece
Keywords
decision-making agents; action selection; exploration-exploitation; multi-armed bandit; genetic algorithms; reinforcement learning;
DOI
10.1016/j.amc.2007.07.043
CLC number
O29 [Applied Mathematics];
Subject classification code
070104 ;
Abstract
Multi-armed bandit tasks have been used extensively to model the problem of balancing exploitation and exploration. A particularly challenging variant of the MABP is the non-stationary bandit problem, where the agent faces the added complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number of arms and Gaussian rewards. An important family of ad hoc methods exists that is suitable for non-stationary bandit tasks. These learning algorithms, which offer intuition-based solutions to the exploitation-exploration trade-off, have the advantage of not relying on strong theoretical assumptions while at the same time being tunable to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem comes in the form of evolutionary algorithms. We present an evolutionary algorithm implemented to solve the non-stationary bandit problem, alongside ad hoc solution algorithms, namely action-value methods with ε-greedy and softmax action-selection rules, the probability matching method and, finally, the adaptive pursuit method. A number of simulation-based experiments were conducted, and based on the numerical results obtained we discuss the methods' performances. (C) 2007 Elsevier Inc. All rights reserved.
Pages: 913 - 922
Page count: 10
Related papers
50 records
  • [21] Percentile optimization in multi-armed bandit problems
    Ghatrani, Zahra
    Ghate, Archis
    ANNALS OF OPERATIONS RESEARCH, 2024, 340 (2-3) : 837 - 862
  • [22] Ambiguity aversion in multi-armed bandit problems
    Anderson, Christopher M.
    THEORY AND DECISION, 2012, 72 (01) : 15 - 33
  • [23] Multi-armed Bandit Problems with Strategic Arms
    Braverman, Mark
    Mao, Jieming
    Schneider, Jon
    Weinberg, S. Matthew
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [25] Smart topology detection using multi-armed bandit reinforcement learning method
    Sonmez, Ferda Ozdemir
    Hankin, Chris
    Malacaria, Pasquale
    INFORMATION SECURITY JOURNAL, 2024,
  • [26] Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
    Besbes, Omar
    Gur, Yonatan
    Zeevi, Assaf
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [27] MABFuzz: Multi-Armed Bandit Algorithms for Fuzzing Processors
    Gohil, Vasudev
    Kande, Rahul
    Chen, Chen
    Sadeghi, Ahmad-Reza
    Rajendran, Jeyavijayan
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,
  • [28] Fair Link Prediction with Multi-Armed Bandit Algorithms
    Wang, Weixiang
    Soundarajan, Sucheta
    PROCEEDINGS OF THE 15TH ACM WEB SCIENCE CONFERENCE, WEBSCI 2023, 2023, : 219 - 228
  • [29] Online Optimization Algorithms for Multi-Armed Bandit Problem
    Kamalov, Mikhail
    Dobrynin, Vladimir
    Balykina, Yulia
    2017 CONSTRUCTIVE NONSMOOTH ANALYSIS AND RELATED TOPICS (DEDICATED TO THE MEMORY OF V.F. DEMYANOV) (CNSA), 2017, : 141 - 143
  • [30] Gaussian multi-armed bandit problems with multiple objectives
    Reverdy, Paul
    2016 AMERICAN CONTROL CONFERENCE (ACC), 2016, : 5263 - 5269