Supervised Advantage Actor-Critic for Recommender Systems

Cited: 14
Authors
Xin, Xin [1 ]
Karatzoglou, Alexandros [2 ]
Arapakis, Ioannis [3 ]
Jose, Joemon M. [4 ]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Google Res, London, England
[3] Telefonica Res, Barcelona, Spain
[4] Univ Glasgow, Glasgow, Lanark, Scotland
Funding
National Key Research and Development Program of China;
Keywords
Recommendation; Reinforcement Learning; Actor-Critic; Q-learning; Advantage Actor-Critic; Negative Sampling;
DOI
10.1145/3488560.3498494
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Code
081104; 0812; 0835; 1405;
Abstract
Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges like off-policy training, huge action spaces and the lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence. To address the above problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.
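To make the SA2C weighting idea concrete, below is a minimal sketch in PyTorch, assuming hypothetical names (sa2c_weighted_loss, q_values, sup_logits) that are not from the authors' released code: the "advantage" of the observed positive item over the average of the sampled actions re-weights the supervised next-item loss. The Q-learning (SNQN) TD objective is omitted, and the sigmoid normalization is an illustrative assumption, not necessarily the paper's exact formulation.

```python
# Sketch only: advantage-weighted supervised loss in the spirit of SA2C.
# Hypothetical helper names; the TD/Q-learning part of SNQN is not shown.
import torch
import torch.nn.functional as F


def sa2c_weighted_loss(q_values, sup_logits, pos_items, neg_items):
    """q_values:   (batch, n_items) Q-head output for the current state
    sup_logits:    (batch, n_items) supervised next-item logits
    pos_items:     (batch,)         observed positive item ids
    neg_items:     (batch, n_neg)   sampled negative item ids (SNQN-style)"""
    q_pos = q_values.gather(1, pos_items.unsqueeze(1))   # Q(s, a+)
    q_neg = q_values.gather(1, neg_items)                 # Q(s, a-) for sampled negatives

    # "Advantage" of the positive action over the average of the sampled actions.
    q_avg = torch.cat([q_pos, q_neg], dim=1).mean(dim=1, keepdim=True)
    advantage = (q_pos - q_avg).squeeze(1).detach()       # used only as a fixed weight

    # Map the advantage to a positive weight (one simple normalization choice).
    weight = torch.sigmoid(advantage)

    # Supervised sequential (next-item) loss, re-weighted by the advantage.
    ce = F.cross_entropy(sup_logits, pos_items, reduction="none")
    return (weight * ce).mean()


# Toy usage: batch of 4 states over 100 items, 10 sampled negatives each.
q_values = torch.randn(4, 100)
sup_logits = torch.randn(4, 100)
pos_items = torch.randint(0, 100, (4,))
neg_items = torch.randint(0, 100, (4, 10))
loss = sa2c_weighted_loss(q_values, sup_logits, pos_items, neg_items)
```

In a full implementation the Q-head would additionally be trained with one-step TD targets, where the observed positive action receives the reward and the sampled negatives receive none, which is what drives the Q-values used to form the advantage.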
Pages: 1186 - 1196
Number of pages: 11
Related Papers
50 results in total
  • [1] Off-Policy Actor-critic for Recommender Systems
    Chen, Minmin
    Xu, Can
    Gatto, Vince
    Jain, Devanshu
    Kumar, Aviral
    Chi, Ed
    PROCEEDINGS OF THE 16TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2022, 2022, : 338 - 349
  • [2] Advantage Actor-Critic for Autonomous Intersection Management
    Ayeelyan, John
    Lee, Guan-Hung
    Hsu, Hsiu-Chun
    Hsiung, Pao-Ann
    VEHICLES, 2022, 4 (04): : 1391 - 1412
  • [3] Adaptive Advantage Estimation for Actor-Critic Algorithms
    Chen, Yurou
    Zhang, Fengyi
    Liu, Zhiyong
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [4] SOAC: Supervised Off-Policy Actor-Critic for Recommender Systems
    Wu, Shiqing
    Xu, Guandong
    Wang, Xianzhi
    23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023, 2023, : 14121 - 14626
  • [5] Asynchronous Advantage Actor-Critic with Double Attention Mechanisms
    Ling X.-H.
    Li J.
    Zhu F.
    Liu Q.
    Fu Y.-C.
    2020, Science Press (43): 93 - 106
  • [6] An improved scheduling with advantage actor-critic for Storm workloads
    Dong, Gaoqiang
    Wang, Jia
    Wang, Mingjing
    Su, Tingting
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (10): 13421 - 13433
  • [7] A supervised Actor-Critic approach for adaptive cruise control
    Zhao, Dongbin
    Wang, Bin
    Liu, Derong
    SOFT COMPUTING, 2013, 17 (11) : 2089 - 2099
  • [8] Variational value learning in advantage actor-critic reinforcement learning
    Zhang, Yaozhong
    Han, Jiaqi
    Hu, Xiaofang
    Dan, Shihao
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 1955 - 1960
  • [9] An accelerated asynchronous advantage actor-critic algorithm applied in papermaking
    Wang, Xuechun
    Zhuang, Zhiwei
    Zou, Luobao
    Zhang, Weidong
    PROCEEDINGS OF THE 38TH CHINESE CONTROL CONFERENCE (CCC), 2019, : 8637 - 8642
  • [10] SMONAC: Supervised Multiobjective Negative Actor-Critic for Sequential Recommendation
    Zhou, Fei
    Luo, Biao
    Wu, Zhengke
    Huang, Tingwen
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 13