Best-of-Three-Worlds Linear Bandit Algorithm with Variance-Adaptive Regret Bounds

Cited by: 0
Authors
Ito, Shinji [1 ]
Takemura, Kei [1 ]
Affiliations
[1] NEC Corp Ltd, Tokyo, Japan
Keywords
BEST-OF-BOTH-WORLDS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a linear bandit algorithm that is adaptive to environments at two different levels of hierarchy. At the higher level, the proposed algorithm adapts to a variety of types of environments. More precisely, it achieves best-of-three-worlds regret bounds, i.e., of O(√(T log T)) for adversarial environments and of O(log T/Δ_min + √(C log T/Δ_min)) for stochastic environments with adversarial corruptions, where T, Δ_min, and C denote, respectively, the time horizon, the minimum sub-optimality gap, and the total amount of corruption. Note that polynomial factors in the dimensionality are omitted here. At the lower level, in each of the adversarial and stochastic regimes, the proposed algorithm adapts to certain environmental characteristics, thereby performing better. The proposed algorithm has data-dependent regret bounds that depend on the cumulative loss of the optimal action, the total quadratic variation, and the path-length of the loss vector sequence. In addition, for stochastic environments, the proposed algorithm has a variance-adaptive regret bound of O(σ² log T/Δ_min), where σ² denotes the maximum variance of the feedback loss. The proposed algorithm is based on the SCRiBLe algorithm (Abernethy et al., 2012). By incorporating into it a new technique we call scaled-up sampling, we obtain the high-level adaptability, and by incorporating the technique of optimistic online learning, we obtain the low-level adaptability.
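As a reading aid for the quantities appearing in the regret bounds above, the following is a minimal sketch (not taken from the paper) that computes the minimum sub-optimality gap Δ_min and the maximum feedback variance σ² for a toy stochastic linear bandit instance; the arm set, loss vector `theta`, and per-arm noise variances are invented for illustration.

```python
import numpy as np

# Toy stochastic linear bandit instance (illustrative values only).
# Finite arm set in R^2; the expected loss of arm a is <theta, a>.
theta = np.array([0.5, -0.2])            # unknown loss vector
arms = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.6, 0.8]])            # three feasible actions

expected_losses = arms @ theta           # <theta, a> for each arm
best = expected_losses.min()             # optimal expected loss

# Delta_min: smallest positive gap between the expected loss of a
# sub-optimal arm and that of the optimal arm.
gaps = expected_losses - best
delta_min = gaps[gaps > 0].min()

# sigma^2: maximum variance of the loss feedback over the arms
# (per-arm noise variances, invented for this demo).
noise_var = np.array([0.01, 0.04, 0.02])
sigma2 = noise_var.max()

print(delta_min, sigma2)
```

With these toy values the optimal arm is the second one (expected loss -0.2), so Δ_min is the gap of the third arm and σ² is the largest of the three noise variances; the stochastic-regime bound O(σ² log T/Δ_min) then scales favorably when the feedback noise is small relative to the gap.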
Pages: 25