Minimax Policy for Heavy-Tailed Bandits

Cited by: 3
Authors
Wei, Lai [1]
Srivastava, Vaibhav [1]
Affiliations
[1] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48823 USA
Source
IEEE CONTROL SYSTEMS LETTERS | 2021 / Vol. 5 / No. 4
Keywords
Heavy-tailed distribution; stochastic MAB; worst-case regret; minimax policy
DOI
10.1109/LCSYS.2020.3035767
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We study the stochastic Multi-Armed Bandit (MAB) problem under worst-case regret and heavy-tailed reward distributions. We modify MOSS, the minimax policy for sub-Gaussian reward distributions, by using a saturated empirical mean to design a new algorithm called Robust MOSS. We show that if the moment of order 1 + epsilon of the reward distribution exists, then the refined strategy has a worst-case regret matching the lower bound while maintaining a distribution-dependent logarithmic regret.
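The idea of pairing a saturated (clipped) empirical mean with a MOSS-style index can be illustrated with a minimal sketch. This is not the authors' Robust MOSS: the abstract does not state the saturation rule or the confidence term, so the fixed `threshold` parameter, the helper names, and the classical sub-Gaussian MOSS exploration bonus used here are all assumptions made for illustration only.

```python
import math
import random


def saturated_mean(rewards, threshold):
    """Average of the rewards after clipping each to [-threshold, threshold].

    A fixed threshold is a hypothetical simplification, not the paper's rule.
    """
    if not rewards:
        return 0.0
    clipped = [max(-threshold, min(threshold, r)) for r in rewards]
    return sum(clipped) / len(clipped)


def moss_style_index(rewards, horizon, n_arms, threshold):
    """Saturated mean plus the classical MOSS bonus (assumed, sub-Gaussian form)."""
    n = len(rewards)
    if n == 0:
        return math.inf  # an unpulled arm is always tried first
    bonus = math.sqrt(max(math.log(horizon / (n_arms * n)), 0.0) / n)
    return saturated_mean(rewards, threshold) + bonus


def run_bandit(sample_fns, horizon, threshold, seed=0):
    """Pull the arm with the largest index at each step; return pull counts."""
    rng = random.Random(seed)
    history = [[] for _ in sample_fns]
    for _ in range(horizon):
        indices = [
            moss_style_index(h, horizon, len(sample_fns), threshold)
            for h in history
        ]
        arm = indices.index(max(indices))
        history[arm].append(sample_fns[arm](rng))
    return [len(h) for h in history]
```

For a heavy-tailed test, each entry of `sample_fns` could be something like `lambda rng: rng.paretovariate(1.5)`, which has a finite mean but infinite variance. Clipping keeps a single extreme draw from dominating an arm's estimate, at the cost of some bias that a carefully chosen saturation rule would have to control.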
Pages: 1423-1428
Number of pages: 6
Related papers
50 in total
  • [1] Minimax Policy for Heavy-tailed Bandits
    Wei, Lai
    Srivastava, Vaibhav
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 1155 - 1160
  • [2] Robust Heavy-Tailed Linear Bandits Algorithm
    Ma L.
    Zhao P.
    Zhou Z.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (06): : 1385 - 1395
  • [3] Stochastic Graphical Bandits with Heavy-Tailed Rewards
    Gou, Yutian
    Yi, Jinfeng
    Zhang, Lijun
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 734 - 744
  • [4] No-Regret Algorithms for Heavy-Tailed Linear Bandits
    Medina, Andres Munoz
    Yang, Scott
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [5] Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards
    Lu, Shiyin
    Wang, Guanghui
    Hu, Yao
    Zhang, Lijun
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [6] Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards
    Xue, Bo
    Wang, Yimu
    Wan, Yuanyu
    Yi, Jinfeng
    Zhang, Lijun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [7] Low-rank Matrix Bandits with Heavy-tailed Rewards
    Kang, Yue
    Hsieh, Cho-Jui
    Lee, Thomas C. M.
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2024, 244 : 1863 - 1889
  • [8] Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs
    Xue, Bo
    Wang, Guanghui
    Wang, Yimu
    Zhang, Lijun
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2936 - 2942
  • [9] Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs
    Shao, Han
    Yu, Xiaotian
    King, Irwin
    Lyu, Michael R.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [10] Pure Exploration of Multi-Armed Bandits with Heavy-Tailed Payoffs
    Yu, Xiaotian
    Shao, Han
    Lyu, Michael R.
    King, Irwin
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2018, : 937 - 946