Supervised Advantage Actor-Critic for Recommender Systems

Cited: 14
Authors
Xin, Xin [1 ]
Karatzoglou, Alexandros [2 ]
Arapakis, Ioannis [3 ]
Jose, Joemon M. [4 ]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Google Res, London, England
[3] Telefonica Res, Barcelona, Spain
[4] Univ Glasgow, Glasgow, Lanark, Scotland
Funding
National Key Research and Development Program of China;
Keywords
Recommendation; Reinforcement Learning; Actor-Critic; Q-learning; Advantage Actor-Critic; Negative Sampling;
DOI
10.1145/3488560.3498494
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Code
081104; 0812; 0835; 1405;
Abstract
Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges like off-policy training, huge action spaces and the lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence. To address the above problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.
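To make the SA2C weighting idea concrete, below is a minimal sketch in PyTorch, assuming hypothetical names (sa2c_weighted_loss, q_values, sup_logits) that are not from the authors' released code: the "advantage" of the observed positive item over the average of the sampled actions re-weights the supervised next-item loss. The Q-learning (SNQN) TD objective is omitted, and the sigmoid normalization is an illustrative assumption, not necessarily the paper's exact formulation.

```python
# Sketch only: advantage-weighted supervised loss in the spirit of SA2C.
# Hypothetical helper names; the TD/Q-learning part of SNQN is not shown.
import torch
import torch.nn.functional as F


def sa2c_weighted_loss(q_values, sup_logits, pos_items, neg_items):
    """q_values:   (batch, n_items) Q-head output for the current state
    sup_logits:    (batch, n_items) supervised next-item logits
    pos_items:     (batch,)         observed positive item ids
    neg_items:     (batch, n_neg)   sampled negative item ids (SNQN-style)"""
    q_pos = q_values.gather(1, pos_items.unsqueeze(1))   # Q(s, a+)
    q_neg = q_values.gather(1, neg_items)                 # Q(s, a-) for sampled negatives

    # "Advantage" of the positive action over the average of the sampled actions.
    q_avg = torch.cat([q_pos, q_neg], dim=1).mean(dim=1, keepdim=True)
    advantage = (q_pos - q_avg).squeeze(1).detach()       # used only as a fixed weight

    # Map the advantage to a positive weight (one simple normalization choice).
    weight = torch.sigmoid(advantage)

    # Supervised sequential (next-item) loss, re-weighted by the advantage.
    ce = F.cross_entropy(sup_logits, pos_items, reduction="none")
    return (weight * ce).mean()


# Toy usage: batch of 4 states over 100 items, 10 sampled negatives each.
q_values = torch.randn(4, 100)
sup_logits = torch.randn(4, 100)
pos_items = torch.randint(0, 100, (4,))
neg_items = torch.randint(0, 100, (4, 10))
loss = sa2c_weighted_loss(q_values, sup_logits, pos_items, neg_items)
```

In a full implementation the Q-head would additionally be trained with one-step TD targets, where the observed positive action receives the reward and the sampled negatives receive none, which is what drives the Q-values used to form the advantage.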
Pages: 1186 - 1196
Number of pages: 11
Related Papers
50 results in total
  • [1] Off-Policy Actor-critic for Recommender Systems
    Chen, Minmin
    Xu, Can
    Gatto, Vince
    Jain, Devanshu
    Kumar, Aviral
    Chi, Ed
    PROCEEDINGS OF THE 16TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2022, 2022, : 338 - 349
  • [2] Advantage Actor-Critic for Autonomous Intersection Management
    Ayeelyan, John
    Lee, Guan-Hung
    Hsu, Hsiu-Chun
    Hsiung, Pao-Ann
    VEHICLES, 2022, 4 (04): : 1391 - 1412
  • [3] Adaptive Advantage Estimation for Actor-Critic Algorithms
    Chen, Yurou
    Zhang, Fengyi
    Liu, Zhiyong
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [4] SOAC: Supervised Off-Policy Actor-Critic for Recommender Systems
    Wu, Shiqing
    Xu, Guandong
    Wang, Xianzhi
    23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023, 2023, : 14121 - 14626
  • [5] Asynchronous Advantage Actor-Critic with Double Attention Mechanisms
    Ling X.-H.
    Li J.
    Zhu F.
    Liu Q.
    Fu Y.-C.
    2020, Science Press (43): 93 - 106
  • [6] An improved scheduling with advantage actor-critic for Storm workloads
    Dong, Gaoqiang
    Wang, Jia
    Wang, Mingjing
    Su, Tingting
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (10): 13421 - 13433
  • [7] A supervised Actor-Critic approach for adaptive cruise control
    Zhao, Dongbin
    Wang, Bin
    Liu, Derong
    SOFT COMPUTING, 2013, 17 (11) : 2089 - 2099
  • [8] Variational value learning in advantage actor-critic reinforcement learning
    Zhang, Yaozhong
    Han, Jiaqi
    Hu, Xiaofang
    Dan, Shihao
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 1955 - 1960
  • [9] An accelerated asynchronous advantage actor-critic algorithm applied in papermaking
    Wang, Xuechun
    Zhuang, Zhiwei
    Zou, Luobao
    Zhang, Weidong
    PROCEEDINGS OF THE 38TH CHINESE CONTROL CONFERENCE (CCC), 2019, : 8637 - 8642
  • [10] SMONAC: Supervised Multiobjective Negative Actor-Critic for Sequential Recommendation
    Zhou, Fei
    Luo, Biao
    Wu, Zhengke
    Huang, Tingwen
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 13