Supervised Advantage Actor-Critic for Recommender Systems

Cited by: 14

Authors:

Xin, Xin [1]

Karatzoglou, Alexandros [2]

Arapakis, Ioannis [3]

Jose, Joemon M. [4]

Affiliations:

[1] Shandong Univ, Jinan, Peoples R China

[2] Google Res, London, England

[3] Tel Res, Barcelona, Spain

[4] Univ Glasgow, Glasgow, Lanark, Scotland

Source:

WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2022

Funding:

National Key Research and Development Program of China;

Keywords:

Recommendation; Reinforcement Learning; Actor-Critic; Q-learning; Advantage Actor-Critic; Negative Sampling;

DOI:

10.1145/3488560.3498494

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges like off-policy training, huge action spaces and lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence. To address the above problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.
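
As an illustration only, the advantage-weighting idea described in the abstract might be sketched as below. This is a rough reading of the abstract, not the authors' implementation: the function names, the uniform negative sampler, the toy Q-values and logits, and the choice of baseline (the mean Q-value over the positive item plus the sampled negatives) are all assumptions introduced here. The abstract does not spell out how the weight is normalized, so no normalization step is shown.

```python
import math
import random


def sample_negatives(num_items, positive, k, rng):
    """Draw k distinct item ids uniformly from the catalog, excluding the positive item."""
    candidates = [i for i in range(num_items) if i != positive]
    return rng.sample(candidates, k)


def advantage(q_values, positive, negatives):
    """Q-value of the positive item minus the mean Q-value over positive + sampled negatives.

    Assumption: this mean stands in for "the average case" mentioned in the abstract.
    """
    pool = [positive] + list(negatives)
    baseline = sum(q_values[i] for i in pool) / len(pool)
    return q_values[positive] - baseline


def cross_entropy(logits, target):
    """Softmax cross-entropy of one example, computed with the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def weighted_supervised_loss(logits, q_values, positive, negatives):
    """Supervised loss scaled by the advantage, which is treated as a constant weight."""
    return advantage(q_values, positive, negatives) * cross_entropy(logits, positive)


rng = random.Random(0)
q_values = [0.2, 1.5, 0.1, 0.4, 0.3]  # hypothetical critic outputs, one per item
logits = [0.0, 2.0, 0.0, 0.0, 0.0]    # hypothetical actor scores, one per item
positive = 1                          # the item the user actually interacted with
negatives = sample_negatives(len(q_values), positive, k=2, rng=rng)
loss = weighted_supervised_loss(logits, q_values, positive, negatives)
```

In this toy setup the positive item has the highest Q-value, so the advantage is positive and the supervised loss for that interaction is kept with a positive weight; a positive item whose Q-value falls below the sampled average would receive a negative weight instead.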

Pages: 1186-1196

Page count: 11

50 entries in total

[21] Supervised actor-critic reinforcement learning with action feedback for algorithmic trading
Sun, Qizhou
Si, Yain-Whar
APPLIED INTELLIGENCE, 2023, 53 (13) : 16875 - 16892
[22] Supervised actor-critic reinforcement learning with action feedback for algorithmic trading
Qizhou Sun
Yain-Whar Si
Applied Intelligence, 2023, 53 : 16875 - 16892
[23] An actor-critic based recommender system with context-aware user modeling
Bukhari, Maryam
Maqsood, Muazzam
Adil, Farhan
ARTIFICIAL INTELLIGENCE REVIEW, 2025, 58 (05)
[24] Variational actor-critic algorithms
Zhu, Yuhua
Ying, Lexing
ESAIM-CONTROL OPTIMISATION AND CALCULUS OF VARIATIONS, 2023, 29
[25] Error controlled actor-critic
Gao, Xingen
Chao, Fei
Zhou, Changle
Ge, Zhen
Yang, Longzhi
Chang, Xiang
Shang, Changjing
Shen, Qiang
INFORMATION SCIENCES, 2022, 612 : 62 - 74
[26] A Hessian Actor-Critic Algorithm
Wang, Jing
Paschalidis, Ioannis Ch
2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 1131 - 1136
[27] Optimization of Robot Environment Interaction Based on Asynchronous Advantage Actor-Critic Algorithm
Xu, Jitang
Chen, Qiang
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (06) : 1350 - 1359
[28] Natural actor-critic algorithms
Bhatnagar, Shalabh
Sutton, Richard S.
Ghavamzadeh, Mohammad
Lee, Mark
AUTOMATICA, 2009, 45 (11) : 2471 - 2482
[29] Actor-Critic Instance Segmentation
Araslanov, Nikita
Rothkopf, Constantin A.
Roth, Stefan
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 8229 - 8238
[30] Neural Architecture Search with Synchronous Advantage Actor-Critic Methods and Partial Training
Kyriakides, George
Margaritis, Konstantinos G.
10TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2018), 2018,
