A Satisficing Strategy with Variable Reference in the Multi-armed Bandit Problems

Authors
Kohno, Yu [1]
Takahashi, Tatsuji [2]
Affiliations
[1] Tokyo Denki Univ, Grad Sch Adv Sci & Technol, Hiki, Saitama 3500394, Japan
[2] Tokyo Denki Univ, Hiki, Saitama 3500394, Japan
Keywords
Symmetric reasoning; decision-making; N-armed bandit problem; speed-accuracy trade-off
DOI
10.1063/1.4912815
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
The loosely symmetric model (LS) is a subjective probability model derived from human cognitive characteristics. To put these characteristics to practical use, we developed a value function, the "loosely symmetric model with variable reference" (LS-VR), which extends LS to decision-making. An algorithm using LS-VR decides whether to explore by comparing against a reference value, so how the agent obtains that reference value from the environment is important. In this study, we propose acquiring the reference value online using statistical knowledge. As a result, the new method outperformed a strong existing model in the multi-armed bandit problem, which is a kind of decision-making problem.
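The general idea in the abstract, deciding whether to explore by comparing against a reference value that is itself estimated online, can be illustrated with a minimal Python sketch. This is not the authors' value function, whose formula is not given in this record. Everything below is an assumption made for illustration: the function name `run_satisficing_bandit`, the choice of reference (the midpoint of the two highest empirical means), the confidence-bound test, and the least-pulled exploration rule.

```python
import math
import random


def run_satisficing_bandit(probs, steps, seed=0):
    """Hypothetical satisficing rule on a Bernoulli bandit (needs >= 2 arms).

    NOT the value function from the abstract; all rules here are assumed.
    Returns (pulls, wins) per arm.
    """
    rng = random.Random(seed)
    n = len(probs)
    pulls = [0] * n
    wins = [0] * n
    for t in range(steps):
        if t < n:
            arm = t  # initialisation: pull every arm once
        else:
            means = [wins[i] / pulls[i] for i in range(n)]
            best = max(range(n), key=lambda i: means[i])
            # Assumed online reference: midpoint of the two highest
            # empirical means, recomputed at every step.
            top = sorted(means, reverse=True)
            reference = (top[0] + top[1]) / 2
            # Assumed satisficing test: the greedy arm is "good enough"
            # if its lower confidence bound clears the reference.
            width = math.sqrt(math.log(t + 1) / (2 * pulls[best]))
            if means[best] - width >= reference:
                arm = best  # satisfied: exploit
            else:
                # not satisfied: explore the least-pulled arm
                arm = min(range(n), key=lambda i: pulls[i])
        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
    return pulls, wins


if __name__ == "__main__":
    pulls, wins = run_satisficing_bandit([0.2, 0.5, 0.8], steps=300, seed=1)
    print("pulls per arm:", pulls)
    print("wins per arm:", wins)
```

In this sketch the agent keeps exploring until the gap between the two best empirical means is large relative to the uncertainty of the best arm, and exploits after that. Other reference values or exploration rules would fit the same skeleton.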
Pages: 4