Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems

Cited: 0
Authors
Cai, Tianchi [1 ]
Bao, Shenliao [1 ]
Jiang, Jiyan [2 ]
Zhou, Shiji [2 ]
Zhang, Wenpeng [1 ]
Gu, Lihong [1 ]
Gu, Jinjie [1 ]
Zhang, Guannan [1 ]
Affiliations
[1] Ant Group, Hangzhou, People's Republic of China
[2] Tsinghua University, Beijing, People's Republic of China
Keywords
Recommender System; Reinforcement Learning
DOI
10.1145/3539618.3592022
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Model-free RL-based recommender systems have recently received increasing research attention due to their capability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: a user's feedback on the same item at different times is random. This stochastic reward property differs essentially from the deterministic rewards of classic RL scenarios, which makes RL-based recommendation much more challenging. In this paper, we first demonstrate in a simulator environment that training on the direct stochastic feedback leads to a significant drop in performance. To handle the stochastic feedback more efficiently, we then design two stochastic reward stabilization frameworks that replace the direct stochastic feedback with a reward learned by a supervised model. Both frameworks are model-agnostic, i.e., they can effectively utilize various supervised models. We demonstrate the superiority of the proposed frameworks over several RL-based recommendation baselines through extensive experiments on a recommendation simulator as well as an industrial-scale recommender system.
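
The record carries no code, but the mechanism the abstract describes, handing the RL learner a supervised estimate of the reward in place of the raw stochastic feedback, can be sketched in a few lines. Below is a minimal toy sketch, assuming a tabular setting with a running-mean reward model and a Bernoulli-click environment; these names and modeling choices (RewardModel, user_feedback, the Q-learning loop) are illustrative assumptions, not the paper's actual frameworks or models.

    # NOTE: illustrative sketch only -- the paper's two frameworks and its
    # supervised models are not specified in this record. The tabular toy
    # setting, names, and hyperparameters here are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    N_STATES, N_ITEMS = 20, 10          # toy user-state and item spaces
    GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1   # discount, step size, exploration

    class RewardModel:
        """Supervised reward model: running mean of the logged feedback for
        each (state, item) pair. Any regressor could take its place."""
        def __init__(self):
            self.total = np.zeros((N_STATES, N_ITEMS))
            self.count = np.zeros((N_STATES, N_ITEMS))

        def update(self, s, a, r):
            # fit on the raw, stochastic feedback
            self.total[s, a] += r
            self.count[s, a] += 1

        def predict(self, s, a):
            # stabilized reward estimate r_hat(s, a)
            return self.total[s, a] / self.count[s, a] if self.count[s, a] else 0.0

    # Environment stub: the same (state, item) pair yields a *random* reward,
    # mimicking a user whose feedback on one item varies over time.
    true_ctr = rng.uniform(size=(N_STATES, N_ITEMS))

    def user_feedback(s, a):
        return float(rng.random() < true_ctr[s, a])  # Bernoulli click

    Q = np.zeros((N_STATES, N_ITEMS))
    reward_model = RewardModel()
    s = int(rng.integers(N_STATES))

    for _ in range(50_000):
        # epsilon-greedy recommendation
        a = int(rng.integers(N_ITEMS)) if rng.random() < EPS else int(Q[s].argmax())
        r = user_feedback(s, a)              # noisy observed reward
        reward_model.update(s, a, r)         # supervised fit on raw feedback
        r_hat = reward_model.predict(s, a)   # stabilized reward for the RL step
        s_next = int(rng.integers(N_STATES)) # toy state transition
        # Q-learning update driven by the predicted reward, not the sample
        Q[s, a] += ALPHA * (r_hat + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next

    print(f"greedy policy picks the truly best item in "
          f"{(Q.argmax(1) == true_ctr.argmax(1)).mean():.0%} of states")

Because the abstract states the frameworks are model-agnostic, the running-mean estimator above is only the simplest stand-in; any supervised model of user feedback could supply the stabilized reward.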
Pages: 2179-2183
Number of pages: 5
Related Papers (50 total)
  • [1] Reinforcement Learning-Based Model-Free Controller for Feedback Stabilization of Robotic Systems
    Singh, Rupam
    Bhushan, Bharat
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (10) : 7059 - 7073
  • [2] Model-free average reward multi-step reinforcement learning
    Hu, Guanghua
    Wu, Cangpu
    Kongzhi Lilun Yu Yinyong/Control Theory and Applications, 2000, 17 (05) : 660 - 664
  • [3] Model-Free Reinforcement Learning of Impedance Control in Stochastic Environments
    Stulp, Freek
    Buchli, Jonas
    Ellmer, Alice
    Mistry, Michael
    Theodorou, Evangelos A.
    Schaal, Stefan
    IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, 2012, 4 (04) : 330 - 341
  • [4] Model-free Reinforcement Learning for Stochastic Stackelberg Security Games
    Mishra, Rajesh K.
    Vasal, Deepanshu
    Vishwanath, Sriram
    2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020 : 348 - 353
  • [5] Model-free Reinforcement Learning of Semantic Communication by Stochastic Policy Gradient
    Beck, Edgar
    Bockelmann, Carsten
    Dekorsy, Armin
    2024 IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING FOR COMMUNICATION AND NETWORKING, ICMLCN 2024, 2024 : 367 - 373
  • [6] Research on Improvement of Model-Free Average Reward Reinforcement Learning and Its Simulation Experiment
    Chen, Wei
    Zhai, Zhenkun
    Li, Xiong
    Guo, Jing
    Wang, Jie
    CCDC 2009: 21ST CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-6, PROCEEDINGS, 2009 : 4933 - 4936
  • [7] Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
    Zhang, Zihan
    Xie, Qiaomin
    THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023
  • [8] Model-free Policy Learning with Reward Gradients
    Lan, Qingfeng
    Tosatto, Samuele
    Farrahi, Homayoon
    Mahmood, A. Rupam
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022
  • [9] Model-Free Reinforcement Learning for Stochastic Games with Linear Temporal Logic Objectives
    Bozkurt, Alper Kamil
    Wang, Yu
    Zavlanos, Michael M.
    Pajic, Miroslav
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021 : 10649 - 10655
  • [10] From deterministic to stochastic: an interpretable stochastic model-free reinforcement learning framework for portfolio optimization
    Song, Zitao
    Wang, Yining
    Qian, Pin
    Song, Sifan
    Coenen, Frans
    Jiang, Zhengyong
    Su, Jionglong
    APPLIED INTELLIGENCE, 2023, 53 (12) : 15188 - 15203