Arm Space Decomposition as a Strategy for Tackling Large Scale Multi-Armed Bandit Problems

Cited: 0
Authors
Gupta, Neha [1 ]
Granmo, Ole-Christoffer [2 ]
Agrawala, Ashok [1 ]
Affiliations
[1] University of Maryland, College Park, MD 20742, USA
[2] University of Agder, Grimstad, Norway
Keywords
GAMES
DOI
10.1109/ICMLA.2013.51
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent multi-armed bandit-based optimization schemes provide near-optimal balancing of arm exploration against arm exploitation, allowing the optimal arm to be identified with probability arbitrarily close to unity. However, the convergence speed drops dramatically as the number of bandit arms grows large, simply because singling out the optimal arm requires experimentation with all of the available arms. Furthermore, effective exploration and exploitation typically demands computational resources that grow linearly with the number of arms. Although the former problem can be remedied to some degree when prior knowledge about arm correlation is available, the latter problem persists. In this paper we propose a Thompson Sampling (TS) based scheme for exploring an arm space of size K by decomposing it into two separate arm spaces, each of size √K, thus achieving sub-linear scalability. In brief, two dedicated Thompson Samplers explore each arm space separately. However, at each iteration, arm selection feedback is obtained by jointly considering the arms selected by each of the Thompson Samplers, mapping them into the original arm space. This kind of decentralized decision-making can be modeled as a game-theoretic problem, where two independent decision makers interact through a common pay-off game. Our scheme requires no communication between the decision makers, who have complete autonomy over their actions. Thus it is ideal for coordinating autonomous agents in a multi-agent system. Extensive experiments, including instances possessing multiple Nash equilibria, demonstrate remarkable performance benefits. Although TS-based schemes are already among the top-performing bandit players, our proposed arm space decomposition scheme provides drastic improvements for large arm spaces, not only in processing speed and memory usage, but also in the ability to identify the optimal arm, an advantage that grows with the number of bandit arms.
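As a rough illustration of the decomposition the abstract describes, the sketch below factorizes a K = m × m arm space into two √K-sized spaces, explored by two independent Beta-Bernoulli Thompson Samplers (one over rows, one over columns); the pair of selections maps back to a single original arm, and both samplers update on the same shared pay-off, with no communication between them. This is a minimal reading of the scheme under stated assumptions, not the authors' implementation: the grid layout, Bernoulli rewards, Beta posteriors, and all names (BetaSampler, row_player, col_player) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth (assumed for illustration): K = m * m Bernoulli arms
# laid out on an m x m grid, so each arm is a (row, column) pair.
m = 16                                        # sqrt(K) arms per decomposed space
true_means = rng.uniform(0.0, 1.0, size=(m, m))

class BetaSampler:
    """Beta-Bernoulli Thompson Sampler over one decomposed arm space of size m."""
    def __init__(self, m):
        # Uniform Beta(1, 1) prior on each arm's success probability.
        self.alpha = np.ones(m)
        self.beta = np.ones(m)

    def select(self, rng):
        # Draw one posterior sample per arm; play the arm with the largest draw.
        return int(np.argmax(rng.beta(self.alpha, self.beta)))

    def update(self, arm, reward):
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward

row_player, col_player = BetaSampler(m), BetaSampler(m)

for t in range(20000):
    # Each sampler acts autonomously on its own sqrt(K)-sized space ...
    i, j = row_player.select(rng), col_player.select(rng)
    # ... but the joint choice (i, j) maps back into the original K arms,
    # and both samplers receive the same pay-off (a common pay-off game).
    reward = float(rng.random() < true_means[i, j])
    row_player.update(i, reward)
    col_player.update(j, reward)

# Report each sampler's posterior-mean favorite against the true optimum.
est = (int(np.argmax(row_player.alpha / (row_player.alpha + row_player.beta))),
       int(np.argmax(col_player.alpha / (col_player.alpha + col_player.beta))))
best = np.unravel_index(np.argmax(true_means), true_means.shape)
print("estimated best arm:", est, "true best arm:", tuple(map(int, best)))
```

Note the sub-linear resource use the abstract claims: each sampler maintains only 2√K posterior parameters and draws √K samples per round, versus K for a single flat Thompson Sampler over the original arm space.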
Pages: 252-257
Page count: 6