Supervised Advantage Actor-Critic for Recommender Systems

Cited by: 14
Authors
Xin, Xin [1]
Karatzoglou, Alexandros [2]
Arapakis, Ioannis [3]
Jose, Joemon M. [4]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Google Res, London, England
[3] Tel Res, Barcelona, Spain
[4] Univ Glasgow, Glasgow, Lanark, Scotland
Source
WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2022
Funding
National Key Research and Development Program of China;
Keywords
Recommendation; Reinforcement Learning; Actor-Critic; Q-learning; Advantage Actor-Critic; Negative Sampling;
DOI
10.1145/3488560.3498494
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, directly applying RL algorithms in the RS setting is impractical due to challenges such as off-policy training, huge action spaces, and a lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL with (self-)supervised sequential learning, but they still suffer from certain limitations. For example, Q-value estimates tend to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values depend heavily on the specific timestamp of a sequence. To address these problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on the sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can further be used as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.
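To make the two mechanisms described in the abstract more concrete, here is a minimal sketch in plain Python for a single state. It is not the authors' implementation. The following are all illustrative assumptions rather than details confirmed by this record: a Q-head that scores every item in the current state (`q_s`), uniform negative sampling, a fixed reward of 0 for negative actions that share one bootstrap value, `gamma=0.5`, and an "average case" baseline taken as the mean Q-value over the positive action plus the sampled negatives. The function names are likewise hypothetical. Under this reading, SNQN adds a negative-sampled TD loss to the supervised cross-entropy, and SA2C additionally weights that cross-entropy by the advantage.

```python
import math
import random


def sample_negatives(num_items, positive, k, rng):
    """Draw k distinct item ids uniformly at random, excluding the positive item."""
    candidates = [i for i in range(num_items) if i != positive]
    return rng.sample(candidates, k)


def td_loss(q_s, positive, negatives, reward, next_max_q, gamma=0.5, neg_reward=0.0):
    """Squared one-step TD errors for the positive action and the sampled negatives.

    Simplifying assumption (not from the paper): every negative action receives
    the fixed reward `neg_reward` and shares the bootstrap value `next_max_q`.
    """
    loss = (reward + gamma * next_max_q - q_s[positive]) ** 2
    for a in negatives:
        loss += (neg_reward + gamma * next_max_q - q_s[a]) ** 2
    return loss


def advantage(q_s, positive, negatives):
    """Q-value of the positive action minus the mean Q-value over the positive
    action and the sampled negatives (assumed "average case" baseline)."""
    actions = [positive] + list(negatives)
    baseline = sum(q_s[a] for a in actions) / len(actions)
    return q_s[positive] - baseline


def cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def snqn_loss(logits, q_s, positive, negatives, reward, next_max_q, gamma=0.5):
    """Supervised cross-entropy plus the negative-sampled TD loss."""
    return cross_entropy(logits, positive) + td_loss(
        q_s, positive, negatives, reward, next_max_q, gamma)


def sa2c_loss(logits, q_s, positive, negatives, reward, next_max_q, gamma=0.5):
    """Advantage-weighted cross-entropy plus the same TD loss.

    In an autodiff framework the advantage weight would be treated as a
    constant, so no gradient reaches the critic through the weight.
    """
    weight = advantage(q_s, positive, negatives)
    return weight * cross_entropy(logits, positive) + td_loss(
        q_s, positive, negatives, reward, next_max_q, gamma)


if __name__ == "__main__":
    rng = random.Random(0)
    q_s = [1.0, 2.0, 0.0, 3.0]     # toy Q-values for 4 items in one state
    logits = [0.0, 0.0, 0.0, 0.0]  # toy supervised scores for the same items
    negs = sample_negatives(len(q_s), positive=1, k=2, rng=rng)
    print(negs, advantage(q_s, 1, negs))
    print(sa2c_loss(logits, q_s, 1, [0, 2], reward=1.0, next_max_q=2.0))
```

Because the advantage can be negative, the weighted cross-entropy in this sketch pushes probability away from a positive item whose Q-value falls below the sampled average. Whether to warm up, clip, or otherwise rescale that weight is a design choice this sketch leaves open.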
Pages: 1186-1196
Number of pages: 11