Minimax Policy for Heavy-Tailed Bandits

Cited by: 3

Authors:

Wei, Lai [1]

Srivastava, Vaibhav [1]

Affiliations:

[1] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48823 USA

Source:

IEEE CONTROL SYSTEMS LETTERS | 2021 / Vol. 5 / No. 4

Keywords:

Heavy-tailed distribution; stochastic MAB; worst-case regret; minimax policy

DOI:

10.1109/LCSYS.2020.3035767

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We study the stochastic Multi-Armed Bandit (MAB) problem under worst-case regret and heavy-tailed reward distributions. We modify the minimax policy MOSS, designed for sub-Gaussian reward distributions, by using a saturated empirical mean to design a new algorithm called Robust MOSS. We show that if the moment of order 1 + epsilon of the reward distribution exists, then the refined strategy has a worst-case regret matching the lower bound while maintaining a distribution-dependent logarithmic regret.
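The idea of a saturated empirical mean combined with a MOSS-style index can be illustrated with a minimal Python sketch. This is not the paper's Robust MOSS: the function names, the fixed saturation `limit`, and the exploration bonus (the classic sub-Gaussian MOSS bonus) are assumptions chosen for illustration, and the abstract does not give the paper's actual thresholds or constants.

```python
import math


def saturate(x, limit):
    """Clip x to the interval [-limit, limit]."""
    return max(-limit, min(limit, x))


def saturated_mean(rewards, limit):
    """Average of the rewards after clipping each one to [-limit, limit].

    Clipping bounds the influence of a single extreme sample, which is
    the motivation for using it with heavy-tailed rewards.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    return sum(saturate(r, limit) for r in rewards) / len(rewards)


def moss_style_index(rewards, horizon, n_arms, limit):
    """Saturated mean plus a MOSS-style exploration bonus.

    Assumption: the bonus below is the classic sub-Gaussian MOSS bonus,
    sqrt(max(log(T / (K * n)), 0) / n), used here only as a stand-in.
    """
    n = len(rewards)
    bonus = math.sqrt(max(math.log(horizon / (n_arms * n)), 0.0) / n)
    return saturated_mean(rewards, limit) + bonus


if __name__ == "__main__":
    samples = [1.0, 2.0, 100.0]  # one outlier
    print(saturated_mean(samples, limit=5.0))
    print(moss_style_index(samples, horizon=100, n_arms=2, limit=5.0))
```

With the sample above, the outlier 100.0 is clipped to 5.0 before averaging, so a single extreme reward cannot dominate the arm's estimate. A policy built on this index would pull the arm with the largest index at each round.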


Pages: 1423-1428

Page count: 6

50 records in total

[1] Minimax Policy for Heavy-tailed Bandits
Wei, Lai
Srivastava, Vaibhav
2021 AMERICAN CONTROL CONFERENCE (ACC), 2021: 1155 - 1160
[2] Robust Heavy-Tailed Linear Bandits Algorithm
Ma L.
Zhao P.
Zhou Z.
Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (06): 1385 - 1395
[3] Stochastic Graphical Bandits with Heavy-Tailed Rewards
Gou, Yutian
Yi, Jinfeng
Zhang, Lijun
UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 734 - 744
[4] No-Regret Algorithms for Heavy-Tailed Linear Bandits
Medina, Andres Munoz
Yang, Scott
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
[5] Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards
Lu, Shiyin
Wang, Guanghui
Hu, Yao
Zhang, Lijun
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
[6] Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards
Xue, Bo
Wang, Yimu
Wan, Yuanyu
Yi, Jinfeng
Zhang, Lijun
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
[7] Low-rank Matrix Bandits with Heavy-tailed Rewards
Kang, Yue
Hsieh, Cho-Jui
Lee, Thomas C. M.
UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2024, 244 : 1863 - 1889
[8] Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs
Xue, Bo
Wang, Guanghui
Wang, Yimu
Zhang, Lijun
PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020: 2936 - 2942
[9] Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs
Shao, Han
Yu, Xiaotian
King, Irwin
Lyu, Michael R.
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[10] Pure Exploration of Multi-Armed Bandits with Heavy-Tailed Payoffs
Yu, Xiaotian
Shao, Han
Lyu, Michael R.
King, Irwin
UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2018: 937 - 946
