Budgeted Recommendation with Delayed Feedback

Cited by: 0

Authors:

Liu, Kweiguu [1]

Maghsudi, Setareh [2]

Yokoo, Makoto [1]

Affiliations:

[1] Kyushu Univ, Fac Informat Sci & Elect Engn, Fukuoka 8190395, Japan

[2] Ruhr Univ Bochum, Fac Elect Engn & Informat Technol, D-44801 Bochum, Germany

Source:

GOOD PRACTICES AND NEW PERSPECTIVES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 3, WORLDCIST 2024 | 2024 / Vol. 987

Keywords:

Budget Constraints; Delayed Feedback; Online Learning; Resource Allocation

DOI:

10.1007/978-3-031-60221-4_20

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In a conventional contextual multi-armed bandit problem, the feedback (or reward) is immediately observable after an action. Nevertheless, delayed feedback arises in numerous real-life situations and is particularly crucial in time-sensitive applications. The exploration-exploitation dilemma becomes harder under such conditions, as it couples with the interplay between delays and limited resources. Besides, a limited budget often aggravates the problem by restricting the exploration potential. A motivating example is the distribution of medical supplies at the early stage of COVID-19: delayed testing results left insufficient information for learning, which degraded the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy, delay-oriented resource allocation with learning (DORAL), to optimize the resource expenditure in a contextual multi-armed bandit problem with arm-dependent delayed feedback.

Pages: 202 - 213

Page count: 12

50 items in total

[1] Counterfactual contextual bandit for recommendation under delayed feedback
Cai R.
Lu R.
Chen W.
Hao Z.
Neural Computing and Applications, 2024, 36 (23) : 14599 - 14613
[2] Counterfactual Reward Modification for Streaming Recommendation with Delayed Feedback
Zhang, Xiao
Jia, Haonan
Su, Hanjing
Wang, Wenhan
Xu, Jun
Wen, Ji-Rong
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021 : 41 - 50
[3] Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments
Wang, Dairui
Cao, Junyu
Zhang, Yan
Qi, Wei
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
[4] Delayed feedback
Gorelik, G
JOURNAL OF PHYSICS-USSR, 1939, 1 : 465 - 470
[5] Implicit Feedback Mining for Recommendation
Song, Yan
Yang, Ping
Zhang, Chunhong
Ji, Yang
BIG DATA COMPUTING AND COMMUNICATIONS, 2015, 9196 : 373 - 385
[6] Denoising Implicit Feedback for Recommendation
Wang, Wenjie
Feng, Fuli
He, Xiangnan
Nie, Liqiang
Chua, Tat-Seng
WSDM '21: PROCEEDINGS OF THE 14TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2021 : 373 - 381
[7] Deep Feedback Network for Recommendation
Xie, Ruobing
Ling, Cheng
Wang, Yalong
Wang, Rui
Xia, Feng
Lin, Leyu
PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020 : 2519 - 2525
[8] EFFECTS OF IMMEDIATE INFORMATION FEEDBACK AND DELAYED INFORMATION FEEDBACK ON DELAYED RETENTION
BECK, FW
LINDSEY, JD
JOURNAL OF EDUCATIONAL RESEARCH, 1979, 72 (05) : 283 - 284
[9] EFFECTS OF DELAYED INFORMATION FEEDBACK AND FEEDBACK CUES IN LEARNING ON DELAYED RETENTION
SASSENRA.JM
YONGE, GD
JOURNAL OF EDUCATIONAL PSYCHOLOGY, 1969, 60 (03) : 174 - &
[10] Learning with Delayed Feedback
Pranavan, Theivendiram
Sim, Terence
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 4895 - 4902