Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

Cited by: 0

Authors:

Lin, Tianyi ^{[1]}

Pacchiano, Aldo ^{[2]}

Yu, Yaodong ^{[1]}

Jordan, Michael I. ^{[1,3]}

Affiliations:

[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA

[2] Microsoft Res, New York, NY USA

[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022

Keywords:

ENERGY MINIMIZATION; ALGORITHMS; APPROXIMATION; OPTIMIZATION; GREEDY;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings. In contrast to previous works on online unconstrained submodular minimization, we focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms in static and delayed scenarios. We derive bounds for the agent's regret in the full information and bandit feedback settings, even if the delay between choosing a decision and receiving the incurred cost is unbounded. Key to our approach is the notion of (α, β)-regret and the extension of the generic convex relaxation model from El Halabi & Jegelka (2020), the analysis of which is of independent interest. We conduct and showcase several simulation studies to demonstrate the efficacy of our algorithms.

Pages: 27

13 items in total

[1] Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback
Zhang, Mingrui
Chen, Lin
Hassani, Hamed
Karbasi, Amin
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[2] Online Learning for Non-monotone DR-Submodular Maximization: From Full Information to Bandit Feedback
Zhang, Qixin
Deng, Zengde
Chen, Zaiyi
Zhou, Kuangqi
Hu, Haoyuan
Yang, Yu
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
[3] Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
Swaminathan, Adith
Joachims, Thorsten
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 814 - 823
[4] Event-triggered distributed online convex optimization with delayed bandit feedback
Xiong, Menghui
Zhang, Baoyong
Yuan, Deming
Zhang, Yijun
Chen, Jun
APPLIED MATHEMATICS AND COMPUTATION, 2023, 445
[5] Interior-Point Methods for Full-Information and Bandit Online Learning
Abernethy, Jacob D.
Hazan, Elad
Rakhlin, Alexander
IEEE TRANSACTIONS ON INFORMATION THEORY, 2012, 58 (07) : 4164 - 4175
[6] Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization
Swaminathan, Adith
Joachims, Thorsten
JOURNAL OF MACHINE LEARNING RESEARCH, 2015, 16 : 1731 - 1755
[7] Learning-augmented Online Minimization of Age of Information and Transmission Costs
Liu, Zhongdong
Zhang, Keyuan
Li, Bin
Sun, Yin
Hou, Y. Thomas
Ji, Bo
IEEE INFOCOM 2024-IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS, INFOCOM WKSHPS 2024, 2024
[8] Learning from Delayed Semi-Bandit Feedback under Strong Fairness Guarantees
Steiger, Juaren
Li, Bin
Lu, Ning
IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2022), 2022, : 1379 - 1388
[9] A Markov Game of Age of Information From Strategic Sources With Full Online Information
Pagin, Matteo
Badia, Leonardo
Zorzi, Michele
ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 76 - 81
[10] Regret minimization in online Bayesian persuasion: Handling adversarial receiver's types under full and partial feedback models
Castiglioni, Matteo
Celli, Andrea
Marchesi, Alberto
Gatti, Nicola
ARTIFICIAL INTELLIGENCE, 2023, 314
