Learning from Delayed Semi-Bandit Feedback under Strong Fairness Guarantees

Cited by: 5
Authors
Steiger, Juaren [1]
Li, Bin [2]
Lu, Ning [1]
Affiliations
[1] Queens Univ, Dept Elect & Comp Engn, Kingston, ON, Canada
[2] Penn State Univ, Sch Elect Engn & Comp Sci, State Coll, PA USA
DOI
10.1109/INFOCOM48880.2022.9796683
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Multi-armed bandit frameworks, including combinatorial semi-bandits and sleeping bandits, are commonly employed to model problems in communication networks and other engineering domains. In such problems, feedback to the learning agent is often delayed (e.g., communication delays in a wireless network or conversion delays in online advertising). Moreover, arms in a bandit problem often represent entities required to be treated fairly, i.e., each arm should be played at least a required fraction of the time. In contrast to the previously studied asymptotic fairness, many real-time systems require such fairness guarantees to hold even in the short term (e.g., ensuring the credibility of information flows in an industrial Internet of Things (IoT) system). To that end, we develop the Learning with Delays under Fairness (LDF) algorithm to solve combinatorial semi-bandit problems with sleeping arms and delayed feedback, which we prove guarantees strong (short-term) fairness. While previous theoretical work on bandit problems with delayed feedback typically derives instance-dependent regret bounds, this approach proves to be challenging when simultaneously considering fairness. We instead derive a novel instance-independent regret bound in this setting which agrees with state-of-the-art bounds. We verify our theoretical results with extensive simulations using both synthetic and real-world datasets.
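The setting the abstract describes — choose k of n arms per round, receive per-arm (semi-bandit) rewards only after a delay, and play each arm at least a required fraction of the time — can be illustrated with a small sketch. To be clear, this is not the paper's LDF algorithm: the virtual-queue fairness device, the UCB index, the trade-off weight `V`, and all parameter values below are illustrative assumptions added here, not details taken from the paper.

```python
import heapq
import math
import random

def fair_semi_bandit_sketch(mu, min_frac, horizon, delay, k, V=10.0, seed=0):
    """Toy fairness-aware semi-bandit loop with delayed feedback.

    mu       : true Bernoulli reward means (unknown to the learner).
    min_frac : required minimum play fraction per arm (short-term fairness target).
    delay    : rounds before a play's reward becomes observable.
    k        : number of arms played per round.
    V        : assumed knob trading off reward (UCB index) against fairness debt.
    """
    rng = random.Random(seed)
    n = len(mu)
    counts = [0] * n      # feedback observed so far, per arm
    sums = [0.0] * n      # observed reward totals, per arm
    queues = [0.0] * n    # virtual queues tracking unmet fairness "debt"
    pending = []          # min-heap of (arrival_round, arm, reward)
    plays = [0] * n

    for t in range(horizon):
        # Deliver feedback whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, i, r = heapq.heappop(pending)
            counts[i] += 1
            sums[i] += r

        # Rank arms by fairness debt plus a UCB estimate from delayed data.
        def index(i):
            if counts[i] == 0:
                ucb = 1.0  # optimistic default before any feedback
            else:
                ucb = sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
            return queues[i] + V * ucb

        chosen = set(sorted(range(n), key=index, reverse=True)[:k])
        for i in range(n):
            played = 1.0 if i in chosen else 0.0
            if played:
                plays[i] += 1
                r = 1.0 if rng.random() < mu[i] else 0.0
                heapq.heappush(pending, (t + delay, i, r))
            # Debt grows by the required rate and shrinks when the arm is played.
            queues[i] = max(queues[i] + min_frac[i] - played, 0.0)
    return plays
```

Bounded virtual queues force each arm's long-run play rate up to its `min_frac`, while the UCB term steers the remaining capacity toward high-reward arms; the actual LDF algorithm and its short-term fairness guarantee are developed in the paper itself.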
Pages: 1379-1388
Page count: 10