A Multi-armed Bandit Algorithm Available in Stationary or Non-stationary Environments Using Self-organizing Maps

Cited by: 1
Authors
Manome, Nobuhito [1 ,2 ]
Shinohara, Shuji [2 ]
Suzuki, Kouta [1 ,2 ]
Tomonaga, Kosuke [1 ,2 ]
Mitsuyoshi, Shunji [2 ]
Affiliations
[1] SoftBank Robot Grp Corp, Tokyo, Japan
[2] Univ Tokyo, Tokyo, Japan
Keywords
Multi-armed bandit problem; Self-organizing maps; Sequential decision making
DOI
10.1007/978-3-030-30487-4_41
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Communication robots designed to satisfy the users facing them must choose an appropriate action quickly from a multitude of potential courses of action. In practice, however, user requests often change while a robot is still determining the most appropriate action, making it difficult for the robot to settle on a suitable course of action. This issue has been formalized as the "multi-armed bandit (MAB) problem." The MAB problem describes an environment with multiple levers (arms), where pulling an arm yields a reward with a certain probability; the task is to decide which arms to pull so as to maximize the cumulative reward. To address this problem, we propose a new MAB algorithm based on self-organizing maps that is adaptable to both stationary and non-stationary environments. In this paper, numerous experiments were conducted on a stochastic MAB problem in both stationary and non-stationary environments. The results show that, compared with the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms, the proposed algorithm performs comparably or better in stationary environments with many arms and is consistently effective in a non-stationary environment.
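The stochastic MAB setting and the UCB1 baseline named in the abstract can be sketched as follows. This is a minimal illustration assuming Bernoulli-reward arms; the function name and parameters are ours for illustration and are not taken from the paper, which instead proposes a self-organizing-map-based algorithm not shown here.

```python
import math
import random

def ucb1(arm_probs, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit; return the total reward collected.

    arm_probs: true (hidden) reward probability of each arm.
    horizon:   total number of pulls.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # accumulated reward per arm
    total = 0.0
    for t in range(horizon):
        if t < k:
            arm = t     # initialization: play each arm once
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_a)
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total
```

Because the confidence bonus shrinks as an arm is pulled, UCB1 gradually concentrates on the best arm in a stationary environment; its fixed indices are also why it can lag in non-stationary settings, where the reward probabilities drift over time.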
Pages: 529-540
Number of pages: 12
Related Papers
(50 items total)
  • [1] The non-stationary stochastic multi-armed bandit problem
    Allesiardo R.
    Féraud R.
    Maillard O.-A.
    Springer Science and Business Media Deutschland GmbH (03): 267-283
  • [2] Contextual Multi-Armed Bandit With Costly Feature Observation in Non-Stationary Environments
    Ghoorchian, Saeed
    Kortukov, Evgenii
    Maghsudi, Setareh
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 820 - 830
  • [3] LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments
    de Curto, J.
    de Zarza, I.
    Roig, Gemma
    Cano, Juan Carlos
    Manzoni, Pietro
    Calafate, Carlos T.
    ELECTRONICS, 2023, 12 (13)
  • [4] DYNAMIC SPECTRUM ACCESS WITH NON-STATIONARY MULTI-ARMED BANDIT
    Alaya-Feki, Afef Ben Hadj
    Moulines, Eric
    LeCornec, Alain
    2008 IEEE 9TH WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS, VOLS 1 AND 2, 2008, : 416 - 420
  • [5] A robust and flexible model of hierarchical self-organizing maps for non-stationary environments
    Salas, R.
    Moreno, S.
    Allende, H.
    Moraga, C.
    NEUROCOMPUTING, 2007, 70 (16-18) : 2744 - 2757
  • [6] On self-organizing maps learning with high adaptability under non-stationary environments
    Isokawa, Teijiro
    Iwatani, Kenji
    Ohtsuka, Akitsugu
    Kamiura, Naotake
    Matsui, Nobuyuki
    2006 SICE-ICASE INTERNATIONAL JOINT CONFERENCE, VOLS 1-13, 2006, : 2218 - +
  • [7] Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
    Koulouriotis, D. E.
    Xanthopoulos, A.
    APPLIED MATHEMATICS AND COMPUTATION, 2008, 196 (02) : 913 - 922
  • [9] Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings
    Bonnefoi, Remi
    Besson, Lilian
    Moy, Christophe
    Kaufmann, Emilie
    Palicot, Jacques
    COGNITIVE RADIO ORIENTED WIRELESS NETWORKS, 2018, 228 : 173 - 185
  • [10] Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
    Besbes, Omar
    Gur, Yonatan
    Zeevi, Assaf
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27