Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems

Cited by: 51
Authors
Koulouriotis, D. E. [1 ]
Xanthopoulos, A. [1 ]
Affiliation
[1] Democritus Univ Thrace, Sch Engn, Dept Prod & Management Engn, Dragana, Greece
Keywords
decision-making agents; action selection; exploration-exploitation; multi-armed bandit; genetic algorithms; reinforcement learning;
DOI
10.1016/j.amc.2007.07.043
CLC number
O29 [Applied Mathematics];
Discipline code
070104 ;
Abstract
Multi-armed bandit tasks have been extensively used to model the problem of balancing exploitation and exploration. A particularly challenging variant of the MABP is the non-stationary bandit problem, where the agent faces the added complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number of arms and Gaussian rewards. An important family of ad hoc methods exists that is suitable for non-stationary bandit tasks. These learning algorithms, which offer intuition-based solutions to the exploitation-exploration trade-off, have the advantage of not relying on strong theoretical assumptions while at the same time being tunable to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem is offered by evolutionary algorithms. We present an evolutionary algorithm implemented to solve the non-stationary bandit problem, along with ad hoc solution algorithms, namely action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and the adaptive pursuit method. A number of simulation-based experiments were conducted, and based on the numerical results obtained we discuss the methods' performances. (C) 2007 Elsevier Inc. All rights reserved.
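The action-value methods the abstract mentions can be illustrated with a minimal sketch: an ε-greedy agent using a constant step size α, which weights recent rewards more heavily than old ones and can therefore track non-stationary (drifting or switching) arm means. This is a generic textbook construction, not the paper's own implementation; the function name, parameters, and reward interface below are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(reward_fns, steps=1000, epsilon=0.1, alpha=0.1, seed=0):
    """Constant-step-size epsilon-greedy on a (possibly non-stationary) bandit.

    reward_fns: one callable per arm; reward_fns[i](t) returns the reward
    for pulling arm i at time step t (means may change over time).
    Returns the final action-value estimates and the cumulative reward.
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    q = [0.0] * k                 # incremental action-value estimates
    total = 0.0
    for t in range(steps):
        if rng.random() < epsilon:             # explore: random arm
            a = rng.randrange(k)
        else:                                   # exploit: current best estimate
            a = max(range(k), key=lambda i: q[i])
        r = reward_fns[a](t)
        # Constant alpha gives exponentially more weight to recent rewards,
        # so the estimate can follow a change in the arm's mean.
        q[a] += alpha * (r - q[a])
        total += r
    return q, total
```

For example, with two Gaussian arms whose means swap halfway through the horizon, the agent's estimates drift toward the newly best arm within a few dozen pulls, which is the tracking behavior that sample-average methods (α = 1/n) lack.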
Pages: 913-922
Page count: 10
Related papers
50 items total
  • [41] Distributed Learning in Multi-Armed Bandit With Multiple Players
    Liu, Keqin
    Zhao, Qing
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2010, 58 (11) : 5667 - 5681
  • [42] Transfer Learning in Multi-Armed Bandit: A Causal Approach
    Zhang, Junzhe
    Bareinboim, Elias
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 1778 - 1780
  • [43] Multi-armed bandit algorithms over DASH for multihomed client
    Hodroj, Ali
    Ibrahim, Marc
    Hadjadj-Aoul, Yassine
    Sericola, Bruno
    INTERNATIONAL JOURNAL OF SENSOR NETWORKS, 2021, 37 (04) : 244 - 253
  • [44] Reconfigurable and Computationally Efficient Architecture for Multi-armed Bandit Algorithms
    Santosh, S. V. Sai
    Darak, S. J.
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020,
  • [45] A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits
    Alami, Reda
    Mahfoud, Mohammed
    Achab, Mastane
    2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 272 - 280
  • [46] Contextual Multi-Armed Bandits for Non-Stationary Heterogeneous Mobile Edge Computing
    Wirth, Maximilian
    Ortiz, Andrea
    Klein, Anja
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 5599 - 5604
  • [47] A Satisficing Strategy with Variable Reference in the Multi-armed Bandit Problems
    Kohno, Yu
    Takahashi, Tatsuji
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE OF NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2014 (ICNAAM-2014), 2015, 1648
  • [48] GAUSSIAN PROCESS MODELLING OF DEPENDENCIES IN MULTI-ARMED BANDIT PROBLEMS
    Dorard, Louis
    Glowacka, Dorota
    Shawe-Taylor, John
    PROCEEDINGS OF THE 10TH INTERNATIONAL SYMPOSIUM ON OPERATIONAL RESEARCH SOR 09, 2009, : 77 - 84
  • [49] Time-Varying Stochastic Multi-Armed Bandit Problems
    Vakili, Sattar
    Zhao, Qing
    Zhou, Yuan
    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2014, : 2103 - 2107