Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems

Cited by: 0
Authors
Cai, Tianchi [1 ]
Bao, Shenliao [1 ]
Jiang, Jiyan [2 ]
Zhou, Shiji [2 ]
Zhang, Wenpeng [1 ]
Gu, Lihong [1 ]
Gu, Jinjie [1 ]
Zhang, Guannan [1 ]
Affiliations
[1] Ant Grp, Hangzhou, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023
Keywords
Recommender System; Reinforcement Learning;
DOI
10.1145/3539618.3592022
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Model-free RL-based recommender systems have recently received increasing research attention owing to their ability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: a user's feedback on the same item at different times is random. This stochastic reward property differs fundamentally from the deterministic rewards assumed in classic RL scenarios, which makes RL-based recommendation considerably more challenging. In this paper, we first demonstrate in a simulated environment that training directly on stochastic feedback leads to a significant drop in performance. To handle stochastic feedback more efficiently, we then design two stochastic reward stabilization frameworks that replace the raw stochastic feedback with the reward estimated by a supervised model. Both frameworks are model-agnostic, i.e., they can work with a variety of supervised models. Extensive experiments on a recommendation simulator and an industrial-scale recommender system demonstrate the superiority of the proposed frameworks over a range of RL-based recommendation baselines.
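
Below is a minimal, self-contained Python sketch of the stabilization idea summarized in the abstract, written as an illustration rather than the authors' implementation: feedback on a (user, item) pair is a noisy Bernoulli draw, so instead of passing that raw draw to the RL agent, a supervised reward model is fit to the logged feedback and its prediction replaces the raw reward during training. The simulated click probabilities, the RewardModel running-mean estimator, and the one-step Q-update are all illustrative assumptions; the paper's frameworks are model-agnostic and can plug in arbitrary supervised models and RL agents.

# Minimal sketch (not the authors' code) of stochastic reward stabilization:
# an RL-style recommender is trained on the prediction of a supervised reward
# model instead of the raw, stochastic user feedback.
import numpy as np

rng = np.random.default_rng(0)
N_USERS, N_ITEMS = 20, 10

# Hypothetical ground truth: feedback on the same (user, item) pair is a
# Bernoulli draw, i.e. stochastic across repeated interactions.
true_ctr = rng.uniform(0.05, 0.6, size=(N_USERS, N_ITEMS))


class RewardModel:
    """Supervised estimator of expected feedback (here a simple running mean)."""

    def __init__(self):
        self.total = np.zeros((N_USERS, N_ITEMS))
        self.count = np.zeros((N_USERS, N_ITEMS))

    def update(self, user, item, feedback):
        self.total[user, item] += feedback
        self.count[user, item] += 1

    def predict(self, user, item):
        if self.count[user, item] == 0:
            return 0.5  # neutral prior before any observation
        return self.total[user, item] / self.count[user, item]


Q = np.zeros((N_USERS, N_ITEMS))   # one-step action values, kept tabular for brevity
reward_model = RewardModel()
alpha, eps = 0.1, 0.2              # learning rate, exploration rate

for _ in range(5000):
    user = int(rng.integers(N_USERS))
    # epsilon-greedy recommendation
    item = int(rng.integers(N_ITEMS)) if rng.random() < eps else int(np.argmax(Q[user]))

    raw_feedback = float(rng.random() < true_ctr[user, item])   # stochastic reward
    reward_model.update(user, item, raw_feedback)

    # Stabilization: the supervised estimate replaces the raw stochastic draw.
    stabilized_reward = reward_model.predict(user, item)
    Q[user, item] += alpha * (stabilized_reward - Q[user, item])

greedy = Q.argmax(axis=1)
print("mean true CTR of greedy recommendations:",
      round(float(true_ctr[np.arange(N_USERS), greedy].mean()), 3))
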
Pages: 2179-2183
Page count: 5
Related Papers
50 records in total
  • [31] Counterfactual Credit Assignment in Model-Free Reinforcement Learning
    Mesnard, Thomas
    Weber, Theophane
    Viola, Fabio
    Thakoor, Shantanu
    Saade, Alaa
    Harutyunyan, Anna
    Dabney, Will
    Stepleton, Tom
    Heess, Nicolas
    Guez, Arthur
    Moulines, Eric
    Hutter, Marcus
    Buesing, Lars
    Munos, Remi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [32] Driving in Dense Traffic with Model-Free Reinforcement Learning
    Saxena, Dhruv Mauria
    Bae, Sangjae
    Nakhaei, Alireza
    Fujimura, Kikuo
    Likhachev, Maxim
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 5385 - 5392
  • [33] Model-Free Reinforcement Learning with Continuous Action in Practice
    Degris, Thomas
    Pilarski, Patrick M.
    Sutton, Richard S.
    2012 AMERICAN CONTROL CONFERENCE (ACC), 2012, : 2177 - 2182
  • [34] Covariance matrix adaptation for model-free reinforcement learning
    (Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct)
    2013, Lavoisier, France (27)
  • [35] Robotic Table Tennis with Model-Free Reinforcement Learning
    Gao, Wenbo
    Graesser, Laura
    Choromanski, Krzysztof
    Song, Xingyou
    Lazic, Nevena
    Sanketi, Pannag
    Sindhwani, Vikas
    Jaitly, Navdeep
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 5556 - 5563
  • [36] MODEL-FREE ONLINE REINFORCEMENT LEARNING OF A ROBOTIC MANIPULATOR
    Sweafford, Jerry, Jr.
    Fahimi, Farbod
    MECHATRONIC SYSTEMS AND CONTROL, 2019, 47 (03): : 136 - 143
  • [37] Model-free H∞ control of Itô stochastic system via off-policy reinforcement learning
    Zhang, Weihai
    Guo, Jing
    Jiang, Xiushan
    AUTOMATICA, 2025, 174
  • [38] Model-Free Learning for Massive MIMO Systems: Stochastic Approximation Adjoint Iterative Learning Control
    Aarnoudse, Leontine
    Oomen, Tom
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 2181 - 2186
  • [39] Model-free stochastic learning in adaptive wireless networks
    Chandramouli, R.
    2007 IEEE SARNOFF SYMPOSIUM, 2007, : 462 - 466
  • [40] Reinforcement Learning with Stochastic Reward Machines
    Corazza, Jan
    Gavran, Ivan
    Neider, Daniel
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6429 - 6436