Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems

Cited by: 0
Authors
Cai, Tianchi [1]
Bao, Shenliao [1]
Jiang, Jiyan [2]
Zhou, Shiji [2]
Zhang, Wenpeng [1]
Gu, Lihong [1]
Gu, Jinjie [1]
Zhang, Guannan [1]
Affiliations
[1] Ant Grp, Hangzhou, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023
Keywords
Recommender System; Reinforcement Learning
DOI
10.1145/3539618.3592022
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Model-free RL-based recommender systems have recently received increasing research attention due to their capability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: one user's feedback on the same item at different times is random. This stochastic-reward property differs fundamentally from classic RL scenarios with deterministic rewards, which makes RL-based recommender systems much more challenging. In this paper, we first demonstrate in a simulator environment that using direct stochastic feedback results in a significant drop in performance. Then, to handle the stochastic feedback more efficiently, we design two stochastic reward stabilization frameworks that replace the direct stochastic feedback with feedback learned by a supervised model. Both frameworks are model-agnostic, i.e., they can effectively utilize various supervised models. We demonstrate the superiority of the proposed frameworks over different RL-based recommendation baselines with extensive experiments on a recommendation simulator as well as an industrial-level recommender system.
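The core idea in the abstract, replacing a raw stochastic feedback signal with a learned estimate before it is used as an RL reward, can be illustrated with a minimal sketch. This is not the authors' framework, whose details are not given in this record. The names `MeanRewardModel` and `stabilized_reward` are hypothetical, and a running mean per (user, item) pair stands in for whatever supervised model would be used in practice.

```python
import random
from collections import defaultdict


class MeanRewardModel:
    """Hypothetical stand-in for a supervised model:
    a running mean of observed feedback per (user, item) pair."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, user, item, feedback):
        # Accumulate one observed feedback sample.
        self.total[(user, item)] += feedback
        self.count[(user, item)] += 1

    def predict(self, user, item):
        # Estimated expected feedback; 0.0 if the pair was never seen.
        n = self.count[(user, item)]
        return self.total[(user, item)] / n if n else 0.0


def stabilized_reward(model, user, item, feedback):
    """Update the model with the raw feedback, then return the
    model's prediction to be used as the reward instead."""
    model.update(user, item, feedback)
    return model.predict(user, item)


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Assumed toy setting: one user, one item, Bernoulli click feedback.
rng = random.Random(0)
true_click_prob = 0.3
model = MeanRewardModel()
raw, stabilized = [], []
for _ in range(2000):
    feedback = 1.0 if rng.random() < true_click_prob else 0.0
    raw.append(feedback)
    stabilized.append(stabilized_reward(model, "u1", "i1", feedback))

# Compare reward variance after a short warm-up period.
raw_var = variance(raw[100:])
stabilized_var = variance(stabilized[100:])
```

In this toy setting the stabilized reward fluctuates far less than the raw click signal, which is the property an RL learner would benefit from. Any supervised predictor exposing comparable `update` and `predict` methods could be substituted for the running mean.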
Pages: 2179-2183
Page count: 5
Related papers
50 in total
  • [21] Model-free Based Reinforcement Learning Control Strategy of Aircraft Attitude Systems
    Huang, Dingcui
    Hu, Jiangping
    Peng, Zhinan
    Chen, Bo
    Hao, Mingrui
    Ghosh, Bijoy Kumar
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 743 - 748
  • [22] On Model-free Reinforcement Learning for Switched Linear Systems: A Subspace Clustering Approach
    Li, Hao
    Chen, Hua
    Zhang, Wei
    2018 56TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2018, : 123 - 130
  • [23] Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey
    Liu, Yongshuai
    Halev, Avishai
    Liu, Xin
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4508 - 4515
  • [24] Reward-Mediated, Model-Free Reinforcement-Learning Mechanisms in Pavlovian and Instrumental Tasks Are Related
    Afshar, Neema Moin
    Cinotti, Francois
    Martin, David
    Khamassi, Mehdi
    Calu, Donna J.
    Taylor, Jane R.
    Groman, Stephanie M.
    JOURNAL OF NEUROSCIENCE, 2023, 43 (03): : 458 - 471
  • [25] Improving Optimistic Exploration in Model-Free Reinforcement Learning
    Grzes, Marek
    Kudenko, Daniel
    ADAPTIVE AND NATURAL COMPUTING ALGORITHMS, 2009, 5495 : 360 - 369
  • [26] Model-Free Preference-Based Reinforcement Learning
    Wirth, Christian
    Fuernkranz, Johannes
    Neumann, Gerhard
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2222 - 2228
  • [27] Constrained model-free reinforcement learning for process optimization
    Pan, Elton
    Petsagkourakis, Panagiotis
    Mowbray, Max
    Zhang, Dongda
    del Rio-Chanona, Ehecatl Antonio
    COMPUTERS & CHEMICAL ENGINEERING, 2021, 154
  • [28] Model-Free μ Synthesis via Adversarial Reinforcement Learning
    Keivan, Darioush
    Havens, Aaron
    Seiler, Peter
    Dullerud, Geir
    Hu, Bin
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 3335 - 3341
  • [29] An adaptive clustering method for model-free reinforcement learning
    Matt, A
    Regensburger, G
    INMIC 2004: 8TH INTERNATIONAL MULTITOPIC CONFERENCE, PROCEEDINGS, 2004, : 362 - 367
  • [30] Model-Free Reinforcement Learning for Mean Field Games
    Mishra, Rajesh
    Vasal, Deepanshu
    Vishwanath, Sriram
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2023, 10 (04): : 2141 - 2151