A Multi-armed Bandit Algorithm Available in Stationary or Non-stationary Environments Using Self-organizing Maps

Cited by: 1
Authors
Manome, Nobuhito [1 ,2 ]
Shinohara, Shuji [2 ]
Suzuki, Kouta [1 ,2 ]
Tomonaga, Kosuke [1 ,2 ]
Mitsuyoshi, Shunji [2 ]
Affiliations
[1] SoftBank Robot Grp Corp, Tokyo, Japan
[2] Univ Tokyo, Tokyo, Japan
Keywords
Multi-armed bandit problem; Self-organizing maps; Sequential decision making
DOI
10.1007/978-3-030-30487-4_41
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Communication robots designed to satisfy the users facing them must choose an appropriate action quickly from a multitude of potential courses of action. In practice, however, user requests often change while a robot is still determining the most appropriate action, making it difficult for the robot to settle on a suitable course of action. This issue has been formalized as the "multi-armed bandit (MAB) problem." The MAB problem describes an environment with multiple levers (arms), where pulling an arm yields a reward with a certain probability; the task is to decide which arms to pull so as to maximize the cumulative reward. To address this problem, we propose a new MAB algorithm based on self-organizing maps that is adaptable to both stationary and non-stationary environments. In this paper, numerous experiments were conducted on a stochastic MAB problem in both stationary and non-stationary environments. The results show that, compared with the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms, the proposed algorithm performs comparably or better in stationary environments with many arms and is consistently effective in a non-stationary environment.
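The stochastic MAB setting and the UCB1 baseline named in the abstract can be sketched as follows. This is a minimal illustration assuming Bernoulli-reward arms; the function name and parameters are ours for illustration and are not taken from the paper, which instead proposes a self-organizing-map-based algorithm not shown here.

```python
import math
import random

def ucb1(arm_probs, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit; return the total reward collected.

    arm_probs: true (hidden) reward probability of each arm.
    horizon:   total number of pulls.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # accumulated reward per arm
    total = 0.0
    for t in range(horizon):
        if t < k:
            arm = t     # initialization: play each arm once
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_a)
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total
```

Because the confidence bonus shrinks as an arm is pulled, UCB1 gradually concentrates on the best arm in a stationary environment; its fixed indices are also why it can lag in non-stationary settings, where the reward probabilities drift over time.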
Pages: 529-540
Number of pages: 12
Related Papers
(50 items total)
  • [1] The non-stationary stochastic multi-armed bandit problem
    Allesiardo R.
    Féraud R.
    Maillard O.-A.
    Springer Science and Business Media Deutschland GmbH (03): 267-283
  • [2] Contextual Multi-Armed Bandit With Costly Feature Observation in Non-Stationary Environments
    Ghoorchian, Saeed
    Kortukov, Evgenii
    Maghsudi, Setareh
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 820 - 830
  • [3] LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments
    de Curto, J.
    de Zarza, I.
    Roig, Gemma
    Cano, Juan Carlos
    Manzoni, Pietro
    Calafate, Carlos T.
    ELECTRONICS, 2023, 12 (13)
  • [4] DYNAMIC SPECTRUM ACCESS WITH NON-STATIONARY MULTI-ARMED BANDIT
    Alaya-Feki, Afef Ben Hadj
    Moulines, Eric
    LeCornec, Alain
    2008 IEEE 9TH WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS, VOLS 1 AND 2, 2008, : 416 - 420
  • [5] A robust and flexible model of hierarchical self-organizing maps for non-stationary environments
    Salas, R.
    Moreno, S.
    Allende, H.
    Moraga, C.
    NEUROCOMPUTING, 2007, 70 (16-18) : 2744 - 2757
  • [6] On self-organizing maps learning with high adaptability under non-stationary environments
    Isokawa, Teijiro
    Iwatani, Kenji
    Ohtsuka, Akitsugu
    Kamiura, Naotake
    Matsui, Nobuyuki
    2006 SICE-ICASE INTERNATIONAL JOINT CONFERENCE, VOLS 1-13, 2006, : 2218 - +
  • [7] Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
    Koulouriotis, D. E.
    Xanthopoulos, A.
    APPLIED MATHEMATICS AND COMPUTATION, 2008, 196 (02) : 913 - 922
  • [9] Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings
    Bonnefoi, Remi
    Besson, Lilian
    Moy, Christophe
    Kaufmann, Emilie
    Palicot, Jacques
    COGNITIVE RADIO ORIENTED WIRELESS NETWORKS, 2018, 228 : 173 - 185
  • [10] Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
    Besbes, Omar
    Gur, Yonatan
    Zeevi, Assaf
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27