Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

Cited by: 0
Authors
Lin, Tianyi [1]
Pacchiano, Aldo [2]
Yu, Yaodong [1]
Jordan, Michael I. [1,3]
Affiliations
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
[2] Microsoft Res, New York, NY USA
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA USA
Keywords
ENERGY MINIMIZATION; ALGORITHMS; APPROXIMATION; OPTIMIZATION; GREEDY
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both the full-information and bandit feedback settings. In contrast to previous work on online unconstrained submodular minimization, we focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms in static and delayed scenarios. We derive bounds on the agent's regret in both settings, even when the delay between choosing a decision and receiving the incurred cost is unbounded. Key to our approach are the notion of (alpha, beta)-regret and an extension of the generic convex relaxation model of El Halabi & Jegelka (2020), the analysis of which is of independent interest. We present several simulation studies that demonstrate the efficacy of our algorithms.
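To make the delayed-feedback mechanism concrete, the following is a minimal illustrative sketch of online gradient descent where the cost gradient incurred at round t only arrives at round t + delay. This is a generic sketch of the delayed-feedback template, not the paper's algorithm: the function `delayed_ogd`, its parameters, and the quadratic toy losses are all assumptions introduced here for illustration.

```python
import numpy as np

def delayed_ogd(grad_fn, T, dim, delay, eta):
    """Online gradient descent with delayed feedback (illustrative sketch).

    The gradient of the cost chosen at round t becomes available only at
    round t + delay; until then it sits in a pending queue.
    """
    x = np.zeros(dim)
    pending = []  # list of (arrival_round, gradient) pairs
    for t in range(T):
        # The cost incurred at round t is observed only at round t + delay.
        pending.append((t + delay, grad_fn(x, t)))
        # Apply every gradient whose feedback has arrived by round t.
        arrived = [g for (s, g) in pending if s <= t]
        pending = [(s, g) for (s, g) in pending if s > t]
        for g in arrived:
            x = x - eta * g
    return x

# Toy usage: every round's loss is f_t(x) = ||x - 1||^2, with delay 3.
target = np.ones(2)
x_final = delayed_ogd(lambda x, t: 2 * (x - target),
                      T=200, dim=2, delay=3, eta=0.05)
```

With a small enough step size relative to the delay, the stale gradients still drive the iterate toward the minimizer; the paper's setting layers nonsubmodular structure and bandit (gradient-free) feedback on top of this basic template.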
Pages: 27