Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems

Cited by: 0
Authors
Cai, Tianchi [1 ]
Bao, Shenliao [1 ]
Jiang, Jiyan [2 ]
Zhou, Shiji [2 ]
Zhang, Wenpeng [1 ]
Gu, Lihong [1 ]
Gu, Jinjie [1 ]
Zhang, Guannan [1 ]
Affiliations
[1] Ant Grp, Hangzhou, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023
Keywords
Recommender System; Reinforcement Learning;
DOI
10.1145/3539618.3592022
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Model-free RL-based recommender systems have recently received increasing research attention owing to their ability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: a user's feedback on the same item at different times is random. This stochastic reward property differs fundamentally from the deterministic rewards assumed in classic RL scenarios, which makes RL-based recommendation considerably more challenging. In this paper, we first demonstrate in a simulated environment that training directly on stochastic feedback leads to a significant drop in performance. To handle stochastic feedback more efficiently, we then design two stochastic reward stabilization frameworks that replace the raw stochastic feedback with the reward estimated by a supervised model. Both frameworks are model-agnostic, i.e., they can work with a variety of supervised models. Extensive experiments on a recommendation simulator and an industrial-scale recommender system demonstrate the superiority of the proposed frameworks over a range of RL-based recommendation baselines.
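
Below is a minimal, self-contained Python sketch of the stabilization idea summarized in the abstract, written as an illustration rather than the authors' implementation: feedback on a (user, item) pair is a noisy Bernoulli draw, so instead of passing that raw draw to the RL agent, a supervised reward model is fit to the logged feedback and its prediction replaces the raw reward during training. The simulated click probabilities, the RewardModel running-mean estimator, and the one-step Q-update are all illustrative assumptions; the paper's frameworks are model-agnostic and can plug in arbitrary supervised models and RL agents.

# Minimal sketch (not the authors' code) of stochastic reward stabilization:
# an RL-style recommender is trained on the prediction of a supervised reward
# model instead of the raw, stochastic user feedback.
import numpy as np

rng = np.random.default_rng(0)
N_USERS, N_ITEMS = 20, 10

# Hypothetical ground truth: feedback on the same (user, item) pair is a
# Bernoulli draw, i.e. stochastic across repeated interactions.
true_ctr = rng.uniform(0.05, 0.6, size=(N_USERS, N_ITEMS))


class RewardModel:
    """Supervised estimator of expected feedback (here a simple running mean)."""

    def __init__(self):
        self.total = np.zeros((N_USERS, N_ITEMS))
        self.count = np.zeros((N_USERS, N_ITEMS))

    def update(self, user, item, feedback):
        self.total[user, item] += feedback
        self.count[user, item] += 1

    def predict(self, user, item):
        if self.count[user, item] == 0:
            return 0.5  # neutral prior before any observation
        return self.total[user, item] / self.count[user, item]


Q = np.zeros((N_USERS, N_ITEMS))   # one-step action values, kept tabular for brevity
reward_model = RewardModel()
alpha, eps = 0.1, 0.2              # learning rate, exploration rate

for _ in range(5000):
    user = int(rng.integers(N_USERS))
    # epsilon-greedy recommendation
    item = int(rng.integers(N_ITEMS)) if rng.random() < eps else int(np.argmax(Q[user]))

    raw_feedback = float(rng.random() < true_ctr[user, item])   # stochastic reward
    reward_model.update(user, item, raw_feedback)

    # Stabilization: the supervised estimate replaces the raw stochastic draw.
    stabilized_reward = reward_model.predict(user, item)
    Q[user, item] += alpha * (stabilized_reward - Q[user, item])

greedy = Q.argmax(axis=1)
print("mean true CTR of greedy recommendations:",
      round(float(true_ctr[np.arange(N_USERS), greedy].mean()), 3))
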
Pages: 2179-2183
Page count: 5
Related Papers
50 records in total
  • [31] Counterfactual Credit Assignment in Model-Free Reinforcement Learning
    Mesnard, Thomas
    Weber, Theophane
    Viola, Fabio
    Thakoor, Shantanu
    Saade, Alaa
    Harutyunyan, Anna
    Dabney, Will
    Stepleton, Tom
    Heess, Nicolas
    Guez, Arthur
    Moulines, Eric
    Hutter, Marcus
    Buesing, Lars
    Munos, Remi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [32] Driving in Dense Traffic with Model-Free Reinforcement Learning
    Saxena, Dhruv Mauria
    Bae, Sangjae
    Nakhaei, Alireza
    Fujimura, Kikuo
    Likhachev, Maxim
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 5385 - 5392
  • [33] Model-Free Reinforcement Learning with Continuous Action in Practice
    Degris, Thomas
    Pilarski, Patrick M.
    Sutton, Richard S.
    2012 AMERICAN CONTROL CONFERENCE (ACC), 2012, : 2177 - 2182
  • [34] Covariance matrix adaptation for model-free reinforcement learning
    (Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct)
    2013, Lavoisier, France (27)
  • [35] Robotic Table Tennis with Model-Free Reinforcement Learning
    Gao, Wenbo
    Graesser, Laura
    Choromanski, Krzysztof
    Song, Xingyou
    Lazic, Nevena
    Sanketi, Pannag
    Sindhwani, Vikas
    Jaitly, Navdeep
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 5556 - 5563
  • [36] MODEL-FREE ONLINE REINFORCEMENT LEARNING OF A ROBOTIC MANIPULATOR
    Sweafford, Jerry, Jr.
    Fahimi, Farbod
    MECHATRONIC SYSTEMS AND CONTROL, 2019, 47 (03): : 136 - 143
  • [37] Model-free H∞ control of Itô stochastic system via off-policy reinforcement learning
    Zhang, Weihai
    Guo, Jing
    Jiang, Xiushan
    AUTOMATICA, 2025, 174
  • [38] Model-Free Learning for Massive MIMO Systems: Stochastic Approximation Adjoint Iterative Learning Control
    Aarnoudse, Leontine
    Oomen, Tom
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 2181 - 2186
  • [39] Model-free stochastic learning in adaptive wireless networks
    Chandramouli, R.
    2007 IEEE SARNOFF SYMPOSIUM, 2007, : 462 - 466
  • [40] Reinforcement Learning with Stochastic Reward Machines
    Corazza, Jan
    Gavran, Ivan
    Neider, Daniel
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6429 - 6436