Arm Space Decomposition as a Strategy for Tackling Large Scale Multi-Armed Bandit Problems

Cited by: 0
Authors
Gupta, Neha [1 ]
Granmo, Ole-Christoffer [2 ]
Agrawala, Ashok [1 ]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Univ Agder, Grimstad, Norway
Keywords
Games
DOI
10.1109/ICMLA.2013.51
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recent multi-armed bandit-based optimization schemes provide near-optimal balancing of arm exploration against arm exploitation, allowing the optimal arm to be identified with probability arbitrarily close to unity. However, the convergence speed drops dramatically as the number of bandit arms grows large, simply because singling out the optimal arm requires experimentation with all of the available arms. Furthermore, effective exploration and exploitation typically demand computational resources that grow linearly with the number of arms. Although the former problem can be remedied to some degree when prior knowledge about arm correlation is available, the latter problem persists. In this paper we propose a Thompson Sampling (TS) based scheme for exploring an arm space of size K by decomposing it into two separate arm spaces, each of size √K, thus achieving sub-linear scalability. In brief, a dedicated Thompson Sampler explores each arm space separately. However, at each iteration, arm selection feedback is obtained by jointly considering the arms selected by the two Thompson Samplers, mapping them into the original arm space. This kind of decentralized decision-making can be modeled as a game-theoretic problem, where two independent decision makers interact through a common pay-off game. Our scheme requires no communication between the decision makers, who have complete autonomy over their actions; it is therefore well suited for coordinating autonomous agents in a multi-agent system. Extensive experiments, including instances possessing multiple Nash equilibria, demonstrate remarkable performance benefits. Although TS-based schemes are already among the top-performing bandit players, our proposed arm space decomposition scheme provides drastic improvements for large arm spaces, not only in processing speed and memory usage, but also in the ability to identify the optimal arm, with the benefit growing as the number of bandit arms increases.
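The decomposition described in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Bernoulli rewards, arranges the K = m × m joint arms as a grid, and lets two Beta-Bernoulli Thompson Samplers choose a row and a column independently; both samplers are then updated with the single reward observed for the jointly selected arm, mirroring the common pay-off game described above. The function name decomposed_thompson_sampling, the grid mapping, and the reward model are illustrative assumptions.

import numpy as np

def decomposed_thompson_sampling(means, horizon=20000, seed=0):
    # Illustrative sketch (not the paper's code): two Beta-Bernoulli Thompson
    # Samplers each explore one decomposed arm space of size m = sqrt(K).
    # means[i, j] is the assumed Bernoulli reward probability of joint arm (i, j).
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    m = means.shape[0]                 # size of each decomposed arm space
    a = np.ones((2, m))                # Beta(1, 1) priors: row sampler (0), column sampler (1)
    b = np.ones((2, m))
    pulls = np.zeros((m, m), dtype=int)
    for _ in range(horizon):
        i = int(np.argmax(rng.beta(a[0], b[0])))   # row sampler picks its arm independently
        j = int(np.argmax(rng.beta(a[1], b[1])))   # column sampler picks its arm independently
        r = float(rng.random() < means[i, j])      # shared feedback from the joint arm (i, j)
        a[0, i] += r; b[0, i] += 1.0 - r           # both samplers update on the same reward
        a[1, j] += r; b[1, j] += 1.0 - r
        pulls[i, j] += 1
    return pulls

# Example: K = 25 arms decomposed into two spaces of 5 arms each;
# the best joint arm is (2, 3).
if __name__ == "__main__":
    means = np.full((5, 5), 0.3)
    means[2, 3] = 0.9
    pulls = decomposed_thompson_sampling(means)
    print("Most pulled joint arm:", np.unravel_index(pulls.argmax(), pulls.shape))

In this sketch each sampler maintains only √K posteriors and draws √K samples per iteration, so memory and per-step computation scale with √K rather than K, which is the sub-linear scalability referred to in the abstract.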
Pages: 252 - 257
Number of pages: 6