Counterfactual contextual bandit for recommendation under delayed feedback

Cited by: 0

Authors:

Cai R. [1,2]

Lu R. [1]

Chen W. [1]

Hao Z. [1,3]

Affiliations:

[1] School of Computer Science, Guangdong University of Technology, Guangzhou

[2] Peng Cheng Laboratory, Shenzhen

[3] College of Mathematics and Computer Science, Shantou University, Shantou

Source:

Neural Computing and Applications | 2024 / Vol. 36 / Issue 23

Funding:

National Natural Science Foundation of China;

Keywords:

Causal inference; Contextual bandit; Delayed feedback; Recommendation system;

DOI:

10.1007/s00521-024-09800-0

Abstract:

Recommendation systems have great practical value because they ease the burden of choosing from a huge amount of information. Existing recommendation systems usually suffer from selection bias because samples with delayed feedback are ignored. To alleviate this problem, we model recommendation as a batch contextual bandit problem and propose a counterfactual reward estimation approach. First, we formalize the counterfactual question as “would the user be interested in the recommended item if the delayed time were before the collection time point?” This counterfactual reward is estimated in a survival analysis framework by fully exploring the causal generation process of user feedback on batch data. Second, based on the estimated counterfactual rewards, the batch contextual bandit policy is updated for online recommendation in the next episode. Third, new batch data are generated during online recommendation for further counterfactual reward estimation. These three steps are repeated until the optimal policy is learned. We also prove theoretically that the learned bandit policy has a sub-linear regret bound. In experiments on synthetic and Criteo datasets, our method achieved a 4% improvement in average reward over the baseline methods, demonstrating the efficacy of our approach. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
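
The three-step loop described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' method: it swaps in an epsilon-greedy policy over discrete contexts and a simple exponential-delay model in place of the paper's survival analysis framework. The names `soft_reward` and `run`, the known delay `rate`, and the fixed per-sample observation window `elapsed` are all assumptions made for illustration.

```python
import math
import random


def soft_reward(observed, elapsed, p_hat, rate):
    """Expected eventual reward for one logged sample (illustrative only).

    observed: True if positive feedback arrived before the collection time.
    elapsed:  time between the recommendation and the collection time.
    p_hat:    current estimate of the eventual positive-feedback probability.
    rate:     exponential delay rate, assumed known here (hypothetical).
    """
    if observed:
        return 1.0
    # Probability mass of "will respond, but the response is still pending".
    still_pending = p_hat * math.exp(-rate * elapsed)
    # Posterior probability of an eventual response given nothing seen so far.
    return still_pending / (1.0 - p_hat + still_pending)


def run(true_p, rate=0.5, batch_size=200, episodes=30, eps=0.1, seed=0):
    """Toy batch loop: collect a batch, re-estimate rewards, update the policy."""
    rng = random.Random(seed)
    n_ctx, n_arms = len(true_p), len(true_p[0])
    p_hat = [[0.5] * n_arms for _ in range(n_ctx)]
    logs = []  # entries: (context, arm, observed, elapsed)
    for _ in range(episodes):
        # Collect a new batch online with the current epsilon-greedy policy.
        for _ in range(batch_size):
            x = rng.randrange(n_ctx)
            if rng.random() < eps:
                a = rng.randrange(n_arms)
            else:
                a = max(range(n_arms), key=lambda k: p_hat[x][k])
            elapsed = rng.uniform(0.0, 5.0)
            responds = rng.random() < true_p[x][a]
            delay = rng.expovariate(rate) if responds else math.inf
            logs.append((x, a, delay <= elapsed, elapsed))
        # Re-estimate rewards for all logged samples and refresh the
        # per-(context, arm) estimates with a few EM-style sweeps.
        for _ in range(5):
            sums = [[0.0] * n_arms for _ in range(n_ctx)]
            counts = [[0] * n_arms for _ in range(n_ctx)]
            for x, a, obs, el in logs:
                sums[x][a] += soft_reward(obs, el, p_hat[x][a], rate)
                counts[x][a] += 1
            for x in range(n_ctx):
                for a in range(n_arms):
                    if counts[x][a]:
                        p_hat[x][a] = sums[x][a] / counts[x][a]
    return p_hat
```

Calling `run([[0.2, 0.6], [0.7, 0.1]])` returns a table of estimated reward probabilities, one row per context and one column per arm. Treating every not-yet-observed sample as a negative would bias these estimates downward; the soft label is one simple way to offset that bias when the delay distribution is assumed known.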

Pages: 14599-14613

Page count: 14

50 items in total

[1] Counterfactual Reward Modification for Streaming Recommendation with Delayed Feedback
Zhang, Xiao
Jia, Haonan
Su, Hanjing
Wang, Wenhan
Xu, Jun
Wen, Ji-Rong
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 41 - 50
[2] Task Replication for Vehicular Cloud: Contextual Combinatorial Bandit with Delayed Feedback
Chen, Lixing
Xu, Jie
IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2019), 2019, : 748 - 756
[3] Efficient Counterfactual Learning from Bandit Feedback
Narita, Yusuke
Yasui, Shota
Yata, Kohei
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 4634 - 4641
[4] Contextual Dependent Click Bandit Algorithm for Web Recommendation
Liu, Weiwen
Li, Shuai
Zhang, Shengyu
COMPUTING AND COMBINATORICS (COCOON 2018), 2018, 10976 : 39 - 50
[5] Transferable Contextual Bandit for Cross-Domain Recommendation
Liu, Bo
Wei, Ying
Zhang, Yu
Yan, Zhixian
Yang, Qiang
THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 3619 - 3626
[6] Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
Swaminathan, Adith
Joachims, Thorsten
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 814 - 823
[7] Contextual Multi-Armed Bandit for Email Layout Recommendation
Chen, Yan
Vankov, Emilian
Baltrunas, Linas
Donovan, Preston
Mehta, Akash
Schroeder, Benjamin
Herman, Matthew
PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023, : 400 - 402
[8] Expert Features for a Student Support Recommendation Contextual Bandit Algorithm
Lee, Morgan P.
Siedahmed, Abubakir
Heffernan, Neil T.
FOURTEENTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE, LAK 2024, 2024, : 864 - 870
[9] Budgeted Recommendation with Delayed Feedback
Liu, Kweiguu
Maghsudi, Setareh
Yokoo, Makoto
GOOD PRACTICES AND NEW PERSPECTIVES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 3, WORLDCIST 2024, 2024, 987 : 202 - 213
[10] Constrained contextual bandit algorithm for limited-budget recommendation system
Zhao, Yafei
Yang, Long
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 128
