STOCHASTIC CONVEX OPTIMIZATION WITH BANDIT FEEDBACK

Cited by: 39
Authors
Agarwal, Alekh [1]
Foster, Dean P. [2]
Hsu, Daniel [3]
Kakade, Sham M. [3]
Rakhlin, Alexander [2]
Affiliations
[1] Microsoft Res, New York, NY 10016 USA
[2] Univ Penn, Dept Stat, Philadelphia, PA 19104 USA
[3] Microsoft Res, Cambridge, MA 02142 USA
Funding
National Science Foundation (USA)
Keywords
derivative-free optimization; bandit optimization; ellipsoid method
DOI
10.1137/110850827
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $X$ under a stochastic bandit (i.e., noisy zeroth-order) feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in X$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
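To make the feedback model and the regret definition concrete, here is a minimal Python sketch. It is not the paper's ellipsoid-based algorithm. The objective `f`, the interval X = [0, 1], the Gaussian noise level `sigma`, and the query sequence are all hypothetical choices made for illustration, and regret is read here as the sum over queries of f(x_t) minus the optimal value.

```python
import random

def f(x):
    # Hypothetical convex, 1-Lipschitz objective on X = [0, 1]; its minimum value is 0 at x = 0.3.
    return abs(x - 0.3)

def noisy_value(x, rng, sigma=0.1):
    # Bandit (noisy zeroth-order) feedback: the learner sees f(x) plus zero-mean noise.
    # Gaussian noise with standard deviation sigma is an assumption of this sketch.
    return f(x) + rng.gauss(0.0, sigma)

def regret(queries, f_opt):
    # Sum over all query points of the gap between f(x_t) and the optimal value.
    return sum(f(x) - f_opt for x in queries)

rng = random.Random(0)
queries = [0.0, 0.5, 0.3, 0.3]          # an arbitrary query sequence, not produced by any algorithm
observations = [noisy_value(x, rng) for x in queries]
total_regret = regret(queries, f_opt=0.0)
```

Note that the regret is computed from the true values f(x_t), while the learner only ever sees the entries of `observations`; every query, including exploratory ones, is charged.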
Pages: 213-240
Number of pages: 28