Supervised Advantage Actor-Critic for Recommender Systems

Cited by: 14
Authors
Xin, Xin [1]
Karatzoglou, Alexandros [2]
Arapakis, Ioannis [3]
Jose, Joemon M. [4]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Google Res, London, England
[3] Tel Res, Barcelona, Spain
[4] Univ Glasgow, Glasgow, Lanark, Scotland
Source
WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2022
Funding
National Key Research and Development Program of China;
Keywords
Recommendation; Reinforcement Learning; Actor-Critic; Q-learning; Advantage Actor-Critic; Negative Sampling;
DOI
10.1145/3488560.3498494
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges such as off-policy training, huge action spaces, and a lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL with (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence. To address these problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.
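The advantage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform negative sampler, the choice to include the positive item in the "average case", and the use of a softmax cross-entropy as the supervised loss are all assumptions made for illustration, and the normalization of the weight mentioned in the abstract is not modeled here.

```python
import math
import random


def sample_negatives(num_items, positive, k, rng):
    """Uniformly sample k distinct item ids other than the positive item.

    Hypothetical helper: the abstract does not specify the sampler.
    """
    candidates = [i for i in range(num_items) if i != positive]
    return rng.sample(candidates, k)


def advantage(q_values, positive, negatives):
    """Q-value of the positive item minus a baseline.

    Assumption: the baseline ("average case") is the mean Q-value over
    the positive item together with the sampled negative items.
    """
    pool = [positive] + list(negatives)
    baseline = sum(q_values[i] for i in pool) / len(pool)
    return q_values[positive] - baseline


def weighted_cross_entropy(logits, positive, weight):
    """Softmax cross-entropy for the positive item, scaled by weight."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return weight * (log_z - logits[positive])


# Toy values for a catalogue of five items (illustrative only).
rng = random.Random(0)
q = [0.2, 1.5, -0.3, 0.4, 0.1]        # critic outputs for one state
logits = [0.0, 2.0, 0.5, -1.0, 0.3]   # supervised head outputs
pos = 1                               # item the user actually interacted with
negs = sample_negatives(len(q), pos, 2, rng)
adv = advantage(q, pos, negs)
loss = weighted_cross_entropy(logits, pos, adv)
```

In this sketch, a positive item whose Q-value is well above the sampled baseline contributes more to the supervised loss, which is the weighting role the abstract assigns to the advantage.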
Pages: 1186-1196
Number of pages: 11