Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems

Cited: 0
Authors
Cai, Tianchi [1 ]
Bao, Shenliao [1 ]
Jiang, Jiyan [2 ]
Zhou, Shiji [2 ]
Zhang, Wenpeng [1 ]
Gu, Lihong [1 ]
Gu, Jinjie [1 ]
Zhang, Guannan [1 ]
Affiliations
[1] Ant Group, Hangzhou, People's Republic of China
[2] Tsinghua University, Beijing, People's Republic of China
Keywords
Recommender System; Reinforcement Learning
DOI
10.1145/3539618.3592022
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Model-free RL-based recommender systems have recently received increasing research attention due to their capability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: a user's feedback on the same item at different times is random. This stochastic reward property differs essentially from the deterministic rewards of classic RL scenarios, which makes RL-based recommendation much more challenging. In this paper, we first demonstrate in a simulator environment that training on the direct stochastic feedback leads to a significant drop in performance. To handle the stochastic feedback more efficiently, we then design two stochastic reward stabilization frameworks that replace the direct stochastic feedback with a reward learned by a supervised model. Both frameworks are model-agnostic, i.e., they can effectively utilize various supervised models. We demonstrate the superiority of the proposed frameworks over several RL-based recommendation baselines through extensive experiments on a recommendation simulator as well as an industrial-scale recommender system.
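
The record carries no code, but the mechanism the abstract describes, handing the RL learner a supervised estimate of the reward in place of the raw stochastic feedback, can be sketched in a few lines. Below is a minimal toy sketch, assuming a tabular setting with a running-mean reward model and a Bernoulli-click environment; these names and modeling choices (RewardModel, user_feedback, the Q-learning loop) are illustrative assumptions, not the paper's actual frameworks or models.

    # NOTE: illustrative sketch only -- the paper's two frameworks and its
    # supervised models are not specified in this record. The tabular toy
    # setting, names, and hyperparameters here are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    N_STATES, N_ITEMS = 20, 10          # toy user-state and item spaces
    GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1   # discount, step size, exploration

    class RewardModel:
        """Supervised reward model: running mean of the logged feedback for
        each (state, item) pair. Any regressor could take its place."""
        def __init__(self):
            self.total = np.zeros((N_STATES, N_ITEMS))
            self.count = np.zeros((N_STATES, N_ITEMS))

        def update(self, s, a, r):
            # fit on the raw, stochastic feedback
            self.total[s, a] += r
            self.count[s, a] += 1

        def predict(self, s, a):
            # stabilized reward estimate r_hat(s, a)
            return self.total[s, a] / self.count[s, a] if self.count[s, a] else 0.0

    # Environment stub: the same (state, item) pair yields a *random* reward,
    # mimicking a user whose feedback on one item varies over time.
    true_ctr = rng.uniform(size=(N_STATES, N_ITEMS))

    def user_feedback(s, a):
        return float(rng.random() < true_ctr[s, a])  # Bernoulli click

    Q = np.zeros((N_STATES, N_ITEMS))
    reward_model = RewardModel()
    s = int(rng.integers(N_STATES))

    for _ in range(50_000):
        # epsilon-greedy recommendation
        a = int(rng.integers(N_ITEMS)) if rng.random() < EPS else int(Q[s].argmax())
        r = user_feedback(s, a)              # noisy observed reward
        reward_model.update(s, a, r)         # supervised fit on raw feedback
        r_hat = reward_model.predict(s, a)   # stabilized reward for the RL step
        s_next = int(rng.integers(N_STATES)) # toy state transition
        # Q-learning update driven by the predicted reward, not the sample
        Q[s, a] += ALPHA * (r_hat + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next

    print(f"greedy policy picks the truly best item in "
          f"{(Q.argmax(1) == true_ctr.argmax(1)).mean():.0%} of states")

Because the abstract states the frameworks are model-agnostic, the running-mean estimator above is only the simplest stand-in; any supervised model of user feedback could supply the stabilized reward.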
Pages: 2179-2183
Number of pages: 5
Related Papers (50 total)
  • [1] Reinforcement Learning-Based Model-Free Controller for Feedback Stabilization of Robotic Systems
    Singh, Rupam
    Bhushan, Bharat
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (10) : 7059 - 7073
  • [2] Model-free average reward multi-step reinforcement learning
    Hu, Guanghua
    Wu, Cangpu
    Kongzhi Lilun Yu Yinyong/Control Theory and Applications, 2000, 17 (05) : 660 - 664
  • [3] Model-Free Reinforcement Learning of Impedance Control in Stochastic Environments
    Stulp, Freek
    Buchli, Jonas
    Ellmer, Alice
    Mistry, Michael
    Theodorou, Evangelos A.
    Schaal, Stefan
    IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, 2012, 4 (04) : 330 - 341
  • [4] Model-free Reinforcement Learning for Stochastic Stackelberg Security Games
    Mishra, Rajesh K.
    Vasal, Deepanshu
    Vishwanath, Sriram
    2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020 : 348 - 353
  • [5] Model-free Reinforcement Learning of Semantic Communication by Stochastic Policy Gradient
    Beck, Edgar
    Bockelmann, Carsten
    Dekorsy, Armin
    2024 IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING FOR COMMUNICATION AND NETWORKING, ICMLCN 2024, 2024 : 367 - 373
  • [6] Research on Improvement of Model-Free Average Reward Reinforcement Learning and Its Simulation Experiment
    Chen, Wei
    Zhai, Zhenkun
    Li, Xiong
    Guo, Jing
    Wang, Jie
    CCDC 2009: 21ST CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-6, PROCEEDINGS, 2009 : 4933 - 4936
  • [7] Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
    Zhang, Zihan
    Xie, Qiaomin
    THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023
  • [8] Model-free Policy Learning with Reward Gradients
    Lan, Qingfeng
    Tosatto, Samuele
    Farrahi, Homayoon
    Mahmood, A. Rupam
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022
  • [9] Model-Free Reinforcement Learning for Stochastic Games with Linear Temporal Logic Objectives
    Bozkurt, Alper Kamil
    Wang, Yu
    Zavlanos, Michael M.
    Pajic, Miroslav
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021 : 10649 - 10655
  • [10] From deterministic to stochastic: an interpretable stochastic model-free reinforcement learning framework for portfolio optimization
    Song, Zitao
    Wang, Yining
    Qian, Pin
    Song, Sifan
    Coenen, Frans
    Jiang, Zhengyong
    Su, Jionglong
    APPLIED INTELLIGENCE, 2023, 53 (12) : 15188 - 15203