A Multi-armed Bandit Algorithm Available in Stationary or Non-stationary Environments Using Self-organizing Maps

Cited by: 1

Authors:

Manome, Nobuhito [1, 2]

Shinohara, Shuji [2]

Suzuki, Kouta [1, 2]

Tomonaga, Kosuke [1, 2]

Mitsuyoshi, Shunji [2]

Affiliations:

[1] SoftBank Robot Grp Corp, Tokyo, Japan

[2] Univ Tokyo, Tokyo, Japan

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I | 2019 / Vol. 11727

Keywords:

Multi-armed bandit problem; Self-organizing maps; Sequential decision making

DOI:

10.1007/978-3-030-30487-4_41

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Due to the multitude of potential courses of action, communication robots designed to satisfy the users facing them must take appropriate action more rapidly. In practice, however, user requests often change while these robots are determining the most appropriate actions for these users. Therefore, it is difficult for robots to derive an appropriate course of action. This issue has been formalized as the "multi-armed bandit (MAB) problem." The MAB problem refers to an environment featuring multiple levers (arms), where pulling an arm has a certain probability of yielding a reward; the issue is to determine how to select the levers to pull to maximize the rewards gained. To solve this problem, we considered a new MAB algorithm using self-organizing maps that is adaptable to stationary and non-stationary environments. For this paper, numerous experiments were conducted considering a stochastic MAB problem in both stationary and non-stationary environments. As a result, we determined that the proposed algorithm demonstrated equivalent or improved capability in stationary environments with numerous arms and consistently strong effectiveness in a non-stationary environment compared to the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms.
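
The problem setting in the abstract can be illustrated with a minimal sketch. It is not the proposed self-organizing-map algorithm, whose details are not given in this record; it runs the standard UCB1 baseline named in the abstract on a Bernoulli bandit. The reward probabilities, the switch step, and the names `ucb1_run`, `stationary`, `switching`, and `SWITCH_STEP` are assumptions chosen only for illustration.

```python
import math
import random

SWITCH_STEP = 5000  # hypothetical step at which the environment changes


def stationary(t):
    """Reward probabilities that never change (illustrative values)."""
    return [0.2, 0.5, 0.8]


def switching(t):
    """Hypothetical non-stationary setting: best and worst arms swap once."""
    return [0.2, 0.5, 0.8] if t < SWITCH_STEP else [0.8, 0.5, 0.2]


def ucb1_run(probs_at, n_arms, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit.

    probs_at(t) returns the list of per-arm reward probabilities at step t.
    Returns the total reward and the number of pulls per arm.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # accumulated reward per arm
    total_reward = 0
    for t in range(horizon):
        if t < n_arms:
            arm = t  # initialization: pull every arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus,
            # where t is the total number of pulls made so far.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1 if rng.random() < probs_at(t)[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward, counts


if __name__ == "__main__":
    for name, env in [("stationary", stationary), ("switching", switching)]:
        total, counts = ucb1_run(env, n_arms=3, horizon=10000, seed=1)
        print(name, total, counts)
```

In the stationary case UCB1 concentrates its pulls on the best arm. In the switching case its estimates average over the whole history, so it tends to adapt slowly after the change, which is the kind of difficulty that motivates algorithms designed for non-stationary environments.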


Pages: 529-540

Page count: 12

