Best-of-Three-Worlds Linear Bandit Algorithm with Variance-Adaptive Regret Bounds

Cited by: 0
Authors
Ito, Shinji [1 ]
Takemura, Kei [1 ]
Affiliations
[1] NEC Corp Ltd, Tokyo, Japan
Keywords
BEST-OF-BOTH-WORLDS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a linear bandit algorithm that is adaptive to environments at two different levels of hierarchy. At the higher level, the proposed algorithm adapts to a variety of types of environments. More precisely, it achieves best-of-three-worlds regret bounds, i.e., of O(√(T log T)) for adversarial environments and of O(log T/Δ_min + √(C log T/Δ_min)) for stochastic environments with adversarial corruptions, where T, Δ_min, and C denote, respectively, the time horizon, the minimum sub-optimality gap, and the total amount of corruption. Note that polynomial factors in the dimensionality are omitted here. At the lower level, in each of the adversarial and stochastic regimes, the proposed algorithm adapts to certain environmental characteristics, thereby performing better. The proposed algorithm has data-dependent regret bounds that depend on the cumulative loss of the optimal action, the total quadratic variation, and the path-length of the loss vector sequence. In addition, for stochastic environments, the proposed algorithm has a variance-adaptive regret bound of O(σ² log T/Δ_min), where σ² denotes the maximum variance of the feedback loss. The proposed algorithm is based on the SCRiBLe algorithm (Abernethy et al., 2012). By incorporating into it a new technique we call scaled-up sampling, we obtain the high-level adaptability, and by incorporating the technique of optimistic online learning, we obtain the low-level adaptability.
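As a reading aid for the quantities appearing in the regret bounds above, the following is a minimal sketch (not taken from the paper) that computes the minimum sub-optimality gap Δ_min and the maximum feedback variance σ² for a toy stochastic linear bandit instance; the arm set, loss vector `theta`, and per-arm noise variances are invented for illustration.

```python
import numpy as np

# Toy stochastic linear bandit instance (illustrative values only).
# Finite arm set in R^2; the expected loss of arm a is <theta, a>.
theta = np.array([0.5, -0.2])            # unknown loss vector
arms = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.6, 0.8]])            # three feasible actions

expected_losses = arms @ theta           # <theta, a> for each arm
best = expected_losses.min()             # optimal expected loss

# Delta_min: smallest positive gap between the expected loss of a
# sub-optimal arm and that of the optimal arm.
gaps = expected_losses - best
delta_min = gaps[gaps > 0].min()

# sigma^2: maximum variance of the loss feedback over the arms
# (per-arm noise variances, invented for this demo).
noise_var = np.array([0.01, 0.04, 0.02])
sigma2 = noise_var.max()

print(delta_min, sigma2)
```

With these toy values the optimal arm is the second one (expected loss -0.2), so Δ_min is the gap of the third arm and σ² is the largest of the three noise variances; the stochastic-regime bound O(σ² log T/Δ_min) then scales favorably when the feedback noise is small relative to the gap.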
Pages: 25