Learning from Delayed Semi-Bandit Feedback under Strong Fairness Guarantees

Cited by: 5

Authors:

Steiger, Juaren [1]

Li, Bin [2]

Lu, Ning [1]

Affiliations:

[1] Queens Univ, Dept Elect & Comp Engn, Kingston, ON, Canada

[2] Penn State Univ, Sch Elect Engn & Comp Sci, State Coll, PA USA

Source:

IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2022) | 2022

Keywords:

DOI:

10.1109/INFOCOM48880.2022.9796683

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Multi-armed bandit frameworks, including combinatorial semi-bandits and sleeping bandits, are commonly employed to model problems in communication networks and other engineering domains. In such problems, feedback to the learning agent is often delayed (e.g., communication delays in a wireless network or conversion delays in online advertising). Moreover, arms in a bandit problem often represent entities required to be treated fairly, i.e., the arms should be played at least a required fraction of the time. In contrast to the previously studied asymptotic fairness, many real-time systems require such fairness guarantees to hold even in the short term (e.g., ensuring the credibility of information flows in an industrial Internet of Things (IoT) system). To that end, we develop the Learning with Delays under Fairness (LDF) algorithm to solve combinatorial semi-bandit problems with sleeping arms and delayed feedback, which we prove guarantees strong (short-term) fairness. While previous theoretical work on bandit problems with delayed feedback typically derives instance-dependent regret bounds, this approach proves to be challenging when simultaneously considering fairness. We instead derive a novel instance-independent regret bound in this setting that agrees with state-of-the-art bounds. We verify our theoretical results with extensive simulations using both synthetic and real-world datasets.

Citation:

Pages: 1379-1388

Page count: 10

