STOCHASTIC CONVEX OPTIMIZATION WITH BANDIT FEEDBACK

Cited by: 39

Authors:

Agarwal, Alekh [1]

Foster, Dean P. [2]

Hsu, Daniel [3]

Kakade, Sham M. [3]

Rakhlin, Alexander [2]

Affiliations:

[1] Microsoft Res, New York, NY 10016 USA

[2] Univ Penn, Dept Stat, Philadelphia, PA 19104 USA

[3] Microsoft Res, Cambridge, MA 02142 USA

Source:

SIAM JOURNAL ON OPTIMIZATION | 2013 / Vol. 23 / No. 01

Funding:

U.S. National Science Foundation;

Keywords:

derivative-free optimization; bandit optimization; ellipsoid method;

DOI:

10.1137/110850827

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Subject classification code:

070104 ;

Abstract:

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $X$ under a stochastic bandit (i.e., noisy zeroth-order) feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in X$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
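The regret described in the abstract can be written out in symbols. The following is a sketch only: the query points $x_t$, the minimizer $x^\star$, the observations $y_t$, and the noise terms $\varepsilon_t$ are notation assumed here for illustration and do not appear in this record, and the zero-mean condition on the noise is likewise an assumption rather than a statement of the paper's exact noise model.

```latex
% Assumed notation: T rounds, query points x_1, ..., x_T in X,
% and x^\star any minimizer of f over X.
\[
  y_t = f(x_t) + \varepsilon_t, \qquad \mathbb{E}[\varepsilon_t] = 0
  \quad \text{(noisy zeroth-order feedback at round } t\text{)}
\]
\[
  R_T \;=\; \sum_{t=1}^{T} \bigl( f(x_t) - f(x^\star) \bigr),
  \qquad x^\star \in \arg\min_{x \in X} f(x)
\]
% The abstract's bounds then read:
%   upper bound for the proposed algorithm: R_T = \tilde{O}(\mathrm{poly}(d)\sqrt{T})
%   lower bound for any algorithm:          R_T = \Omega(\sqrt{T})
```

Under this reading, every query counts toward the regret, so an algorithm pays for exploratory queries far from the minimizer as well as for its final answer.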

Citation:

Pages: 213 - 240

Page count: 28

