Counterfactual contextual bandit for recommendation under delayed feedback

Cited by: 0
Authors
Cai R. [1, 2]
Lu R. [1]
Chen W. [1]
Hao Z. [1, 3]
Affiliations
[1] School of Computer Science, Guangdong University of Technology, Guangzhou
[2] Peng Cheng Laboratory, Shenzhen
[3] College of Mathematics and Computer Science, Shantou University, Shantou
Funding
National Natural Science Foundation of China
Keywords
Causal inference; Contextual bandit; Delayed feedback; Recommendation system
DOI
10.1007/s00521-024-09800-0
Abstract
Recommendation systems have great practical value because they ease the burden of choosing from a huge amount of information. Existing recommendation systems often suffer from selection bias because they ignore samples whose feedback is delayed. To alleviate this problem, we model recommendation as a batch contextual bandit problem and propose a counterfactual reward estimation approach. First, we formalize the counterfactual question as “would the user be interested in the recommended item if the delay time were before the collection time point?” This counterfactual reward is estimated in a survival analysis framework that exploits the causal generation process of user feedback on batch data. Second, based on the estimated counterfactual rewards, the policy of the batch contextual bandit is updated for online recommendation in the next episode. Third, new batch data are generated during online recommendation for further counterfactual reward estimation. These three steps are repeated until the optimal policy is learned. We also prove a sub-linear regret bound for the learned bandit policy. In experiments on synthetic and Criteo datasets, our method achieved a 4% improvement in average reward over the baseline methods, demonstrating the efficacy of our approach. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
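The three-step loop in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes an exponential delay distribution with a known rate and a known prior probability of positive feedback, and it swaps in a greedy tabular policy over discrete contexts in place of a contextual bandit with exploration. The names `counterfactual_reward`, `BatchTabularBandit`, `p_convert`, and `rate` are hypothetical and do not appear in the paper.

```python
import math
from collections import defaultdict


def counterfactual_reward(observed, elapsed, p_convert, rate):
    """Expected eventual reward of one logged recommendation.

    observed  : True if positive feedback arrived before the collection time
    elapsed   : time between the recommendation and the collection time
    p_convert : ASSUMED prior probability of eventual positive feedback
    rate      : ASSUMED rate of an exponential delay distribution
    """
    if observed:
        return 1.0
    # Survival function: probability that the delay exceeds the elapsed time.
    survival = math.exp(-rate * elapsed)
    # Bayes' rule: P(eventual positive | no feedback seen so far).
    return p_convert * survival / (1.0 - p_convert + p_convert * survival)


class BatchTabularBandit:
    """Greedy policy over discrete contexts, updated once per batch.

    Hypothetical stand-in for a batch contextual bandit; exploration is omitted.
    """

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.total = defaultdict(float)  # summed estimated rewards
        self.count = defaultdict(int)    # number of logged pulls

    def update_batch(self, logs, p_convert, rate):
        # Steps 1 and 2: estimate counterfactual rewards, then update the policy.
        for context, arm, observed, elapsed in logs:
            reward = counterfactual_reward(observed, elapsed, p_convert, rate)
            self.total[(context, arm)] += reward
            self.count[(context, arm)] += 1

    def select(self, context):
        # Step 3 would call this online to generate the next batch of logs.
        def mean(arm):
            n = self.count[(context, arm)]
            return self.total[(context, arm)] / n if n else 0.0

        return max(range(self.n_arms), key=mean)


# Each log entry: (context, arm, feedback observed?, elapsed time).
logs = [
    ("sports", 0, True, 3.0),
    ("sports", 1, False, 0.1),
    ("sports", 1, False, 50.0),
]
policy = BatchTabularBandit(n_arms=2)
policy.update_batch(logs, p_convert=0.5, rate=1.0)
chosen_arm = policy.select("sports")
```

Under these assumptions, a sample with no feedback yet is not treated as a hard negative: a recently shown item keeps a reward close to the prior, while one that has waited a long time decays toward zero.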
Pages: 14599-14613
Page count: 14
Related papers
50 in total
  • [1] Counterfactual Reward Modification for Streaming Recommendation with Delayed Feedback
    Zhang, Xiao
    Jia, Haonan
    Su, Hanjing
    Wang, Wenhan
    Xu, Jun
    Wen, Ji-Rong
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 41 - 50
  • [2] Task Replication for Vehicular Cloud: Contextual Combinatorial Bandit with Delayed Feedback
    Chen, Lixing
    Xu, Jie
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2019), 2019, : 748 - 756
  • [3] Efficient Counterfactual Learning from Bandit Feedback
    Narita, Yusuke
    Yasui, Shota
    Yata, Kohei
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 4634 - 4641
  • [4] Contextual Dependent Click Bandit Algorithm for Web Recommendation
    Liu, Weiwen
    Li, Shuai
    Zhang, Shengyu
    COMPUTING AND COMBINATORICS (COCOON 2018), 2018, 10976 : 39 - 50
  • [5] Transferable Contextual Bandit for Cross-Domain Recommendation
    Liu, Bo
    Wei, Ying
    Zhang, Yu
    Yan, Zhixian
    Yang, Qiang
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 3619 - 3626
  • [6] Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
    Swaminathan, Adith
    Joachims, Thorsten
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 814 - 823
  • [7] Contextual Multi-Armed Bandit for Email Layout Recommendation
    Chen, Yan
    Vankov, Emilian
    Baltrunas, Linas
    Donovan, Preston
    Mehta, Akash
    Schroeder, Benjamin
    Herman, Matthew
    PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023, : 400 - 402
  • [8] Expert Features for a Student Support Recommendation Contextual Bandit Algorithm
    Lee, Morgan P.
    Siedahmed, Abubakir
    Heffernan, Neil T.
    FOURTEENTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE, LAK 2024, 2024, : 864 - 870
  • [9] Budgeted Recommendation with Delayed Feedback
    Liu, Kweiguu
    Maghsudi, Setareh
    Yokoo, Makoto
    GOOD PRACTICES AND NEW PERSPECTIVES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 3, WORLDCIST 2024, 2024, 987 : 202 - 213
  • [10] Constrained contextual bandit algorithm for limited-budget recommendation system
    Zhao, Yafei
    Yang, Long
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 128