Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems

Cited by: 51
Authors:
Koulouriotis, D. E. [1]
Xanthopoulos, A. [1]
Affiliations:
[1] Democritus Univ Thrace, Sch Engn, Dept Prod & Management Engn, Dragana, Greece
Keywords:
decision-making agents; action selection; exploration-exploitation; multi-armed bandit; genetic algorithms; reinforcement learning;
DOI:
10.1016/j.amc.2007.07.043
Chinese Library Classification: O29 [Applied Mathematics]
Subject Classification Code: 070104
Abstract:
Multi-armed bandit tasks have been used extensively to model the problem of balancing exploitation and exploration. One of the most challenging variants of the multi-armed bandit problem (MABP) is the non-stationary bandit problem, where the agent faces the additional complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number of arms and Gaussian rewards. A family of important ad hoc methods exists that is suitable for non-stationary bandit tasks. These learning algorithms, which offer intuition-based solutions to the exploitation-exploration trade-off, have the advantage of not relying on strong theoretical assumptions while at the same time being tunable to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem presents itself in the form of evolutionary algorithms. We present an evolutionary algorithm implemented to solve the non-stationary bandit problem, along with ad hoc solution algorithms, namely action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and the adaptive pursuit method. A number of simulation-based experiments were conducted, and based on the numerical results obtained we discuss the methods' performances. (C) 2007 Elsevier Inc. All rights reserved.
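As a rough illustration of the ad hoc approach named in the abstract, the sketch below runs an action-value agent with ε-greedy selection and a constant step size on a drifting Gaussian bandit. This is a minimal sketch, not the authors' implementation: the arm count, step size, exploration rate, and the random-walk model of non-stationarity are illustrative assumptions, since the abstract does not specify the experimental parameters.

```python
# Minimal sketch (assumptions, not the paper's setup): K arms, unit-variance
# Gaussian rewards, and arm means that drift by a small random walk.
import random

K = 10          # number of arms (assumed)
EPSILON = 0.1   # exploration probability (assumed)
ALPHA = 0.1     # constant step size; weights recent rewards more heavily,
                # which lets the estimate track a drifting mean
DRIFT = 0.05    # std. dev. of the random walk applied to each arm's mean

means = [random.gauss(0.0, 1.0) for _ in range(K)]  # true (hidden) arm means
Q = [0.0] * K                                       # estimated action values

for t in range(10_000):
    # epsilon-greedy: explore uniformly with prob. EPSILON, else exploit
    if random.random() < EPSILON:
        a = random.randrange(K)
    else:
        a = max(range(K), key=lambda i: Q[i])

    reward = random.gauss(means[a], 1.0)   # Gaussian reward, unit variance
    Q[a] += ALPHA * (reward - Q[a])        # exponential recency-weighted update

    # non-stationarity: every arm's mean drifts by a small random walk
    for i in range(K):
        means[i] += random.gauss(0.0, DRIFT)

print("final value estimates:", [round(q, 2) for q in Q])
```

A softmax selection rule would replace the greedy branch with sampling proportional to exp(Q[i]/τ) for a temperature τ, while adaptive pursuit would instead maintain explicit selection probabilities, pushing the current greedy arm's probability toward a maximum and decaying the others.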
Pages: 913 - 922
Page count: 10
Related papers
50 records in total
  • [31] Contextual Multi-Armed Bandits for Non-Stationary Wireless Network Selection
    Martinez, Lluis
    Vidal, Josep
    Cabrera-Bean, Margarita
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023: 285 - 290
  • [32] Finite budget analysis of multi-armed bandit problems
    Xia, Yingce
    Qin, Tao
    Ding, Wenkui
    Li, Haifang
    Zhang, Xudong
    Yu, Nenghai
    Liu, Tie-Yan
    NEUROCOMPUTING, 2017, 258 : 13 - 29
  • [33] A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems
    Koulouriotis, D. E.
    Xanthopoulos, A.
    OPERATIONAL RESEARCH, 2008, 8 (2): 105 - 122
  • [34] The multi-armed bandit, with constraints
    Denardo, Eric V.
    Feinberg, Eugene A.
    Rothblum, Uriel G.
    ANNALS OF OPERATIONS RESEARCH, 2013, 208 (01): 37 - 62
  • [36] The Assistive Multi-Armed Bandit
    Chan, Lawrence
    Hadfield-Menell, Dylan
    Srinivasa, Siddhartha
    Dragan, Anca
    HRI '19: 2019 14TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2019: 354 - 363
  • [37] Multi-armed bandit games
    Gursoy, Kemal
    ANNALS OF OPERATIONS RESEARCH, 2024
  • [38] Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback
    Wang, Siwei
    Wang, Haoyun
    Huang, Longbo
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10210 - 10217
  • [39] Adaptive Active Learning as a Multi-armed Bandit Problem
    Czarnecki, Wojciech M.
    Podolak, Igor T.
    21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 989 - 990
  • [40] Improving multi-armed bandit algorithms in online pricing settings
    Trovo, Francesco
    Paladino, Stefano
    Restelli, Marcello
    Gatti, Nicola
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2018, 98 : 196 - 235