DEN-DQL: Quick Convergent Deep Q-Learning with Double Exploration Networks for News Recommendation

Cited by: 0
Authors
Song, Zhanghan [1 ]
Zhang, Dian [1 ]
Shi, Xiaochuan [1 ]
Li, Wei [2 ]
Ma, Chao [1 ]
Wu, Libing [1 ]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
[2] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi, Jiangsu, Peoples R China
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Keywords
Reinforcement Learning; Deep Q-Learning; News Recommendation; Double Exploration Networks;
DOI
10.1109/IJCNN52387.2021.9533818
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Due to the dynamic nature of news content and user preferences, personalized news recommendation is a challenging problem. Traditional recommendation methods focus only on the current reward, recommending items so as to maximize the number of immediate clicks, which may reduce users' interest in similar items. Although the previously proposed news recommendation framework based on deep reinforcement learning (i.e., DRL, built on deep Q-learning) has the advantages of optimizing the total future reward and of dynamic interactive recommendation, it has two issues. First, its exploration method converges slowly, which may give new users a poor experience. Second, it is hard to train on an offline data set because the reward is difficult to determine. To address these issues, we propose DEN-DQL, a news recommendation framework based on deep Q-learning with double exploration networks. We also develop a new method to calculate rewards, and we use an offline data set to simulate the online news-clicking environment to train DEN-DQL. The well-trained DEN-DQL is then tested in the online environment of the same data set, where the proposed DEN-DQL demonstrates an improvement of at least 10%.
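The abstract leaves the mechanics of the double exploration networks and the reward calculation unspecified. Purely as an illustration, the sketch below shows one plausible reading in Python/PyTorch, assuming a DRN-style perturbation scheme in which two noisy copies of the Q-network contribute exploratory scores alongside the main network; every class name, hyperparameter, and the score-averaging rule is an assumption for illustration, not the paper's actual design.

    # Minimal sketch: deep Q-learning with two exploration networks.
    # All names and hyperparameters below are illustrative assumptions,
    # not the architecture or reward scheme used in DEN-DQL.
    import copy

    import torch
    import torch.nn as nn


    class QNet(nn.Module):
        """Small MLP that scores a (state, candidate-news) feature vector."""

        def __init__(self, dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x).squeeze(-1)


    def perturbed(q: QNet, scale: float = 0.05) -> QNet:
        """Exploration network: a copy of q with noise added to each weight."""
        expl = copy.deepcopy(q)
        with torch.no_grad():
            for p in expl.parameters():
                p.add_(scale * torch.randn_like(p))
        return expl


    def select_action(q: QNet, candidates: torch.Tensor) -> int:
        """Recommend the candidate with the highest blended score.

        Averaging the main network with two freshly perturbed copies is an
        assumed stand-in for the paper's double-exploration scheme.
        """
        with torch.no_grad():
            scores = (q(candidates)
                      + perturbed(q)(candidates)
                      + perturbed(q)(candidates)) / 3.0
        return int(scores.argmax().item())


    def td_update(q, target, opt, s, r, s_next, gamma=0.9):
        """One standard deep Q-learning step on a single transition."""
        with torch.no_grad():
            y = r + gamma * target(s_next).max()   # bootstrap from target net
        loss = (q(s) - y).pow(2)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()


    if __name__ == "__main__":
        dim = 16                                   # toy feature size
        q, target = QNet(dim), QNet(dim)
        target.load_state_dict(q.state_dict())
        opt = torch.optim.Adam(q.parameters(), lr=1e-3)
        candidates = torch.randn(10, dim)          # 10 candidate news items
        a = select_action(q, candidates)
        reward = torch.tensor(1.0)                 # e.g. an observed click
        next_candidates = torch.randn(10, dim)
        td_update(q, target, opt, candidates[a], reward, next_candidates)

Regenerating the perturbed copies at each selection step keeps exploration cheap relative to maintaining separately trained networks; whether DEN-DQL shares weights this way is not stated in the abstract.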
Pages: 8
Related Papers
50 records in total
  • [21] Double-deep Q-learning to increase the efficiency of metasurface holograms
    Sajedian, Iman
    Lee, Heon
    Rho, Junsuk
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [22] Erosion-safe operation using double deep Q-learning
    Visbech, Jens
    Gocmen, Tuhfe
    Rethore, Pierre-Elouan
    Hasager, Charlotte Bay
    SCIENCE OF MAKING TORQUE FROM WIND, TORQUE 2024, 2024, 2767
  • [23] Double-deep Q-learning to increase the efficiency of metasurface holograms
    Iman Sajedian
    Heon Lee
    Junsuk Rho
    Scientific Reports, 9
  • [24] Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks
    L. A. Prashanth
    Abhranil Chatterjee
    Shalabh Bhatnagar
    Wireless Networks, 2014, 20 : 2589 - 2604
  • [25] Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks
    Prashanth, L. A.
    Chatterjee, Abhranil
    Bhatnagar, Shalabh
    WIRELESS NETWORKS, 2014, 20 (08) : 2589 - 2604
  • [26] Improved Exploration Strategy for Q-Learning Based Multipath Routing in SDN Networks
    Hassen, Houda
    Meherzi, Soumaya
    Jemaa, Zouhair Ben
    JOURNAL OF NETWORK AND SYSTEMS MANAGEMENT, 2024, 32 (02)
  • [27] Improved Exploration Strategy for Q-Learning Based Multipath Routing in SDN Networks
    Houda Hassen
    Soumaya Meherzi
    Zouhair Ben Jemaa
    Journal of Network and Systems Management, 2024, 32
  • [28] Multi-Agent Exploration for Faster and Reliable Deep Q-Learning Convergence in Reinforcement Learning
    Majumdar, Abhijit
    Benavidez, Patrick
    Jamshidi, Mo
    2018 WORLD AUTOMATION CONGRESS (WAC), 2018, : 222 - 227
  • [29] A Double Deep Q-Learning Model for Energy-Efficient Edge Scheduling
    Zhang, Qingchen
    Lin, Man
    Yang, Laurence T.
    Chen, Zhikui
    Khan, Samee U.
    Li, Peng
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2019, 12 (05) : 739 - 749
  • [30] Inverse design of the MMI power splitter by asynchronous double deep Q-learning
    Xu, Xiaopeng
    Li, Yu
    Huang, Weiping
    OPTICS EXPRESS, 2021, 29 (22) : 35951 - 35964