Learning from Delayed Semi-Bandit Feedback under Strong Fairness Guarantees

Cited by: 5
Authors
Steiger, Juaren [1]
Li, Bin [2]
Lu, Ning [1]
Affiliations
[1] Queens Univ, Dept Elect & Comp Engn, Kingston, ON, Canada
[2] Penn State Univ, Sch Elect Engn & Comp Sci, State Coll, PA USA
DOI
10.1109/INFOCOM48880.2022.9796683
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Multi-armed bandit frameworks, including combinatorial semi-bandits and sleeping bandits, are commonly employed to model problems in communication networks and other engineering domains. In such problems, feedback to the learning agent is often delayed (e.g., communication delays in a wireless network or conversion delays in online advertising). Moreover, arms in a bandit problem often represent entities required to be treated fairly, i.e., each arm should be played at least a required fraction of the time. In contrast to the previously studied asymptotic fairness, many real-time systems require such fairness guarantees to hold even in the short term (e.g., ensuring the credibility of information flows in an industrial Internet of Things (IoT) system). To that end, we develop the Learning with Delays under Fairness (LDF) algorithm to solve combinatorial semi-bandit problems with sleeping arms and delayed feedback, which we prove guarantees strong (short-term) fairness. While previous theoretical work on bandit problems with delayed feedback typically derives instance-dependent regret bounds, this approach proves to be challenging when simultaneously considering fairness. We instead derive a novel instance-independent regret bound in this setting which agrees with state-of-the-art bounds. We verify our theoretical results with extensive simulations using both synthetic and real-world datasets.
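The setting the abstract describes — choose k of n arms per round, receive per-arm (semi-bandit) rewards only after a delay, and play each arm at least a required fraction of the time — can be illustrated with a small sketch. To be clear, this is not the paper's LDF algorithm: the virtual-queue fairness device, the UCB index, the trade-off weight `V`, and all parameter values below are illustrative assumptions added here, not details taken from the paper.

```python
import heapq
import math
import random

def fair_semi_bandit_sketch(mu, min_frac, horizon, delay, k, V=10.0, seed=0):
    """Toy fairness-aware semi-bandit loop with delayed feedback.

    mu       : true Bernoulli reward means (unknown to the learner).
    min_frac : required minimum play fraction per arm (short-term fairness target).
    delay    : rounds before a play's reward becomes observable.
    k        : number of arms played per round.
    V        : assumed knob trading off reward (UCB index) against fairness debt.
    """
    rng = random.Random(seed)
    n = len(mu)
    counts = [0] * n      # feedback observed so far, per arm
    sums = [0.0] * n      # observed reward totals, per arm
    queues = [0.0] * n    # virtual queues tracking unmet fairness "debt"
    pending = []          # min-heap of (arrival_round, arm, reward)
    plays = [0] * n

    for t in range(horizon):
        # Deliver feedback whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, i, r = heapq.heappop(pending)
            counts[i] += 1
            sums[i] += r

        # Rank arms by fairness debt plus a UCB estimate from delayed data.
        def index(i):
            if counts[i] == 0:
                ucb = 1.0  # optimistic default before any feedback
            else:
                ucb = sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
            return queues[i] + V * ucb

        chosen = set(sorted(range(n), key=index, reverse=True)[:k])
        for i in range(n):
            played = 1.0 if i in chosen else 0.0
            if played:
                plays[i] += 1
                r = 1.0 if rng.random() < mu[i] else 0.0
                heapq.heappush(pending, (t + delay, i, r))
            # Debt grows by the required rate and shrinks when the arm is played.
            queues[i] = max(queues[i] + min_frac[i] - played, 0.0)
    return plays
```

Bounded virtual queues force each arm's long-run play rate up to its `min_frac`, while the UCB term steers the remaining capacity toward high-reward arms; the actual LDF algorithm and its short-term fairness guarantee are developed in the paper itself.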
Pages: 1379-1388
Page count: 10