Thresholding Bandits with Augmented UCB

Cited by: 0

Authors:

Mukherjee, Subhojyoti [1]

Purushothama, Naveen Kolar [2]

Sudarsanam, Nandan [3]

Ravindran, Balaraman [1]

Affiliations:

[1] Indian Inst Technol Madras, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India

[2] Indian Inst Technol Madras, Dept Elect Engn, Chennai, Tamil Nadu, India

[3] Indian Inst Technol Madras, Dept Management Studies, Chennai, Tamil Nadu, India

Source:

PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2017

Keywords:

MULTIARMED BANDIT;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB is that it uses both mean and variance estimates to eliminate arms that have been sufficiently explored; to the best of our knowledge, this is the first algorithm to employ such an approach for the considered TBP. Theoretically, we obtain an upper bound on the loss (probability of misclassification) incurred by AugUCB. Although UCBEV in the literature provides a better guarantee, it is important to emphasize that UCBEV has access to the problem complexity (whose computation requires the arms' means and variances), and hence is not realistic in practice; this is in contrast to AugUCB, whose implementation does not require any such complexity inputs. We conduct extensive simulation experiments to validate the performance of AugUCB. Through our simulation work, we establish that AugUCB, owing to its utilization of variance estimates, performs significantly better than the state-of-the-art APT, CSAR and other non-variance-based algorithms.
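The abstract's central idea, using both mean and variance estimates to stop sampling arms whose side of the threshold is already clear, can be illustrated with a minimal sketch. This is not the AugUCB algorithm itself: the record does not give its elimination schedule or constants. The function name `classify_arms`, the `pull` callback, the `delta` parameter, the round-robin sampling, and the empirical-Bernstein-style confidence radius are all assumptions chosen for illustration, and rewards are assumed to lie in [0, 1].

```python
import math


def classify_arms(pull, n_arms, threshold, budget, delta=0.05):
    """Illustrative fixed-budget thresholding sketch (NOT the paper's AugUCB).

    pull(arm) is a hypothetical callback returning a reward in [0, 1].
    Returns the set of arms whose empirical mean is at or above `threshold`.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    sq_sums = [0.0] * n_arms
    active = set(range(n_arms))  # arms still being sampled
    used = 0
    log_term = math.log(3.0 / delta)  # assumed fixed confidence level

    while used < budget and active:
        # Sample every active arm once, stopping if the budget runs out.
        for arm in sorted(active):
            if used >= budget:
                break
            x = pull(arm)
            counts[arm] += 1
            sums[arm] += x
            sq_sums[arm] += x * x
            used += 1

        # Drop arms whose variance-aware interval excludes the threshold.
        for arm in list(active):
            n = counts[arm]
            if n == 0:
                continue
            mean = sums[arm] / n
            var = max(sq_sums[arm] / n - mean * mean, 0.0)
            # Empirical-Bernstein-style radius: low variance shrinks it faster.
            radius = math.sqrt(2.0 * var * log_term / n) + 3.0 * log_term / n
            if abs(mean - threshold) > radius:
                active.discard(arm)

    # Classify by empirical mean; arms never pulled are left out.
    return {
        a for a in range(n_arms)
        if counts[a] > 0 and sums[a] / counts[a] >= threshold
    }
```

Calling `classify_arms(pull, n_arms=5, threshold=0.5, budget=2000)` with any reward function bounded in [0, 1] returns the indices judged to be above the threshold. Because the radius depends on the empirical variance, low-variance arms leave the active set sooner, which frees budget for arms that are harder to classify.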

Citation

Pages: 2515-2521

Page count: 7
