A Multi-armed Bandit Algorithm Available in Stationary or Non-stationary Environments Using Self-organizing Maps

Times cited: 1
Authors
Manome, Nobuhito [1, 2]
Shinohara, Shuji [2]
Suzuki, Kouta [1, 2]
Tomonaga, Kosuke [1, 2]
Mitsuyoshi, Shunji [2]
Affiliations
[1] SoftBank Robotics Group Corp., Tokyo, Japan
[2] University of Tokyo, Tokyo, Japan
Keywords
Multi-armed bandit problem; Self-organizing maps; Sequential decision making
DOI
10.1007/978-3-030-30487-4_41
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Communication robots designed to satisfy the users in front of them have many potential courses of action and must choose an appropriate one quickly. In practice, however, user requests often change while a robot is still determining the most appropriate action, which makes it difficult for the robot to settle on an appropriate course of action. This issue has been formalized as the "multi-armed bandit (MAB) problem." The MAB problem concerns an environment with multiple levers (arms), each of which yields a reward with a certain probability when pulled; the task is to decide which arms to pull so as to maximize the total reward. To address this problem, we propose a new MAB algorithm based on self-organizing maps that can adapt to both stationary and non-stationary environments. We conducted numerous experiments on stochastic MAB problems in both stationary and non-stationary environments. Compared with the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms, the proposed algorithm performed equivalently or better in stationary environments with many arms and was consistently effective in a non-stationary environment.
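To make the experimental setting concrete, here is a minimal Python sketch of a stochastic MAB problem with two of the baselines named in the abstract, UCB1 and Thompson Sampling. It is not the authors' self-organizing-map algorithm, which is not described in this record. The Bernoulli reward model, the arm probabilities, the reshuffle-every-500-pulls schedule used to mimic a non-stationary environment, and all class and function names (BernoulliBandit, UCB1, BernoulliThompson, cumulative_regret) are hypothetical choices made for illustration.

```python
import math
import random


class BernoulliBandit:
    """Stochastic MAB environment (hypothetical): arm i pays 1 with probability probs[i], else 0.

    If change_every is given, the arm probabilities are reshuffled every
    change_every pulls, a simple assumed model of a non-stationary environment.
    """

    def __init__(self, probs, change_every=None, seed=0):
        self.probs = list(probs)  # copy so the caller's list is never shuffled
        self.change_every = change_every
        self.rng = random.Random(seed)
        self.t = 0

    def pull(self, arm):
        reward = 1 if self.rng.random() < self.probs[arm] else 0
        self.t += 1
        if self.change_every and self.t % self.change_every == 0:
            self.rng.shuffle(self.probs)  # the identity of the best arm may change
        return reward


class UCB1:
    """UCB1: pull each arm once, then the arm maximizing mean + sqrt(2 ln t / n_i)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm  # initialization round: try every arm once
        t = sum(self.counts)  # total number of pulls so far
        return max(
            range(len(self.counts)),
            key=lambda i: self.sums[i] / self.counts[i]
            + math.sqrt(2 * math.log(t) / self.counts[i]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


class BernoulliThompson:
    """Thompson Sampling with a Beta(1, 1) prior on each arm's success probability."""

    def __init__(self, n_arms, seed=1):
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms
        self.rng = random.Random(seed)

    def select(self):
        # Draw one plausible success probability per arm from its Beta posterior.
        samples = [
            self.rng.betavariate(1 + s, 1 + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def cumulative_regret(policy, env, steps):
    """Sum over steps of (best arm's probability - chosen arm's probability)."""
    regret = 0.0
    for _ in range(steps):
        arm = policy.select()
        regret += max(env.probs) - env.probs[arm]
        policy.update(arm, env.pull(arm))
    return regret


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # hypothetical arm probabilities
    for label, change_every in [("stationary", None), ("non-stationary", 500)]:
        for policy in (UCB1(len(probs)), BernoulliThompson(len(probs))):
            env = BernoulliBandit(probs, change_every=change_every, seed=0)
            r = cumulative_regret(policy, env, steps=2000)
            print(f"{label:>14} {type(policy).__name__:>17}: regret {r:.1f}")
```

Both baselines in this sketch keep statistics over the entire history, so after the probabilities are reshuffled they tend to react slowly to the new best arm; this is the kind of difficulty that algorithms aimed at non-stationary environments try to address.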
Pages: 529-540
Number of pages: 12