Arm Space Decomposition as a Strategy for Tackling Large Scale Multi-Armed Bandit Problems

Authors
Gupta, Neha [1]
Granmo, Ole-Christoffer [2]
Agrawala, Ashok [1]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Univ Agder, Grimstad, Norway
Keywords
GAMES;
DOI
10.1109/ICMLA.2013.51
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent multi-armed bandit based optimization schemes provide near-optimal balancing of arm exploration against arm exploitation, allowing the optimal arm to be identified with probability arbitrarily close to unity. However, the convergence speed drops dramatically as the number of bandit arms grows large, simply because singling out the optimal arm requires experimentation with all of the available arms. Furthermore, effective exploration and exploitation typically demand computational resources that grow linearly with the number of arms. Although the former problem can be remedied to some degree when prior knowledge about arm correlation is available, the latter problem persists. In this paper we propose a Thompson Sampling (TS) based scheme for exploring an arm space of size K by decomposing it into two separate arm spaces, each of size √K, thus achieving sub-linear scalability. In brief, two dedicated Thompson Samplers each explore one of the two arm spaces. At each iteration, however, arm selection feedback is obtained by jointly considering the arms selected by the two Thompson Samplers and mapping them into the original arm space. This kind of decentralized decision-making can be modeled as a game-theoretic problem in which two independent decision makers interact through a common pay-off game. Our scheme requires no communication between the decision makers, who have complete autonomy over their actions. It is therefore well suited to coordinating autonomous agents in a multi-agent system. Extensive experiments, including instances possessing multiple Nash equilibria, demonstrate remarkable performance benefits. Although TS based schemes are already among the top-performing bandit players, our proposed arm space decomposition scheme provides drastic improvements for large arm spaces, not only in processing speed and memory usage but also in the ability to identify the optimal arm, an advantage that increases with the number of bandit arms.
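The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Bernoulli reward model, the Beta(1, 1) priors, the row-major mapping `arm = i * m + j`, and the names `BetaThompsonSampler` and `run_decomposed_ts` are all assumptions made for illustration. The sketch also assumes K is a perfect square. Each of the two samplers keeps only √K posteriors and receives the same common pay-off, without exchanging any information with the other.

```python
import math
import numpy as np


class BetaThompsonSampler:
    """Thompson Sampling over n Bernoulli arms (assumed Beta(1, 1) priors)."""

    def __init__(self, n_arms, rng):
        self.alpha = np.ones(n_arms)  # 1 + observed successes per arm
        self.beta = np.ones(n_arms)   # 1 + observed failures per arm
        self.rng = rng

    def select(self):
        # Draw one sample per arm from its posterior and play the largest.
        return int(np.argmax(self.rng.beta(self.alpha, self.beta)))

    def update(self, arm, reward):
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


def run_decomposed_ts(success_probs, n_rounds, seed=0):
    """Play K = m * m Bernoulli arms with two samplers of m arms each.

    Returns the number of times each original arm was pulled.
    """
    k = len(success_probs)
    m = math.isqrt(k)
    if m * m != k:
        raise ValueError("this sketch assumes K is a perfect square")

    rng = np.random.default_rng(seed)
    row_player = BetaThompsonSampler(m, rng)
    col_player = BetaThompsonSampler(m, rng)
    pulls = np.zeros(k, dtype=int)

    for _ in range(n_rounds):
        i = row_player.select()
        j = col_player.select()
        arm = i * m + j  # assumed mapping back into the original arm space
        reward = int(rng.random() < success_probs[arm])
        # Both players see the same common pay-off and update independently.
        row_player.update(i, reward)
        col_player.update(j, reward)
        pulls[arm] += 1
    return pulls
```

With K = 10,000 arms, for example, this layout maintains 2 × 100 Beta posteriors instead of 10,000. Because each player only sees the joint pay-off, a sketch like this can settle on a sub-optimal equilibrium of the common pay-off game, so it should be read as an illustration of the mechanics rather than as a performance claim.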
Pages: 252-257
Number of pages: 6