Learning Automata Based Q-Learning for Content Placement in Cooperative Caching

Cited by: 43
Authors
Yang, Zhong [1 ]
Liu, Yuanwei [1 ]
Chen, Yue [1 ]
Jiao, Lei [2 ]
Affiliations
[1] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England
[2] Univ Agder, Dept Informat & Commun Technol, N-4879 Grimstad, Norway
Keywords
Cooperative caching; Wireless communication; Prediction algorithms; Recurrent neural networks; Learning automata; Optimization; Quality of experience; Learning automata based Q-learning; quality of experience (QoE); wireless cooperative caching; user mobility prediction; content popularity prediction; NONORTHOGONAL MULTIPLE-ACCESS; WIRELESS; OPTIMIZATION; NETWORKS; DELIVERY;
DOI
10.1109/TCOMM.2020.2982136
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808 ; 0809 ;
Abstract
An optimization problem of content placement in cooperative caching is formulated, with the aim of maximizing the sum mean opinion score (MOS) of mobile users. Firstly, as user mobility and content popularity have significant impacts on the user experience, a recurrent neural network (RNN) is invoked for user mobility prediction and content popularity prediction. More particularly, practical data collected from a GPS-tracker app on smartphones are used to test the accuracy of user mobility prediction. Then, based on the predicted mobile users' positions and content popularity, a learning automata based Q-learning (LAQL) algorithm for cooperative caching is proposed, in which learning automata (LA) are invoked for Q-learning to obtain optimal action selection in a random and stationary environment. It is proven that the LA based action selection scheme is capable of enabling every state to select the optimal action with arbitrarily high probability, provided that Q-learning converges to the optimal Q value eventually. In the LAQL algorithm, a central processor acts as the intelligent agent, which iteratively allocates contents to base stations (BSs) according to the reward or penalty fed back from the BSs and users. To characterize the performance of the proposed LAQL algorithm, the sum MOS of users is applied to define the reward function. Extensive simulation results reveal that: 1) the prediction error of the RNN-based algorithm decreases as the number of iterations and nodes increases; 2) the proposed LAQL achieves significant performance improvement over the traditional Q-learning algorithm; and 3) the cooperative caching scheme is capable of outperforming non-cooperative caching and random caching by 3% and 4%, respectively.
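The core mechanism the abstract describes, replacing Q-learning's usual epsilon-greedy exploration with a learning automaton that maintains a per-state action-probability vector and reinforces rewarded actions, can be sketched roughly as follows. The class name, hyperparameters, and the linear reward-inaction (L_R-I) update rule used here are illustrative assumptions, not the paper's exact formulation.

```python
import random

class LAQLAgent:
    """Minimal sketch of learning-automata-based Q-learning (LAQL).

    Each state keeps an action-probability vector (the automaton) that is
    updated with a linear reward-inaction (L_R-I) scheme; the Q-table is
    updated with the standard temporal-difference rule. Hypothetical
    simplification of the paper's algorithm, not its exact specification.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, lam=0.05):
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        # Start with uniform action probabilities in every state.
        self.P = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def select_action(self, s):
        # Sample an action from the automaton's probability vector.
        r, acc = random.random(), 0.0
        for a, p in enumerate(self.P[s]):
            acc += p
            if r <= acc:
                return a
        return len(self.P[s]) - 1

    def update(self, s, a, reward, s_next):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.Q[s_next])
        self.Q[s][a] += self.alpha * (reward + self.gamma * best_next - self.Q[s][a])
        # L_R-I update: reinforce action a only when it is currently greedy
        # for state s; on "penalty" the probabilities are left unchanged.
        if self.Q[s][a] >= max(self.Q[s]):
            for j in range(len(self.P[s])):
                if j == a:
                    self.P[s][j] += self.lam * (1.0 - self.P[s][j])
                else:
                    self.P[s][j] *= 1.0 - self.lam
```

Because the L_R-I scheme only increases the probability of rewarded (greedy) actions and never decreases it on penalty, each state's distribution gradually concentrates on one action, which matches the abstract's claim that every state selects the optimal action with arbitrarily high probability once the Q-values have converged.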
Pages: 3667 - 3680
Page count: 14
Related Papers
50 records in total
  • [21] Cooperative Q-learning based channel selection for cognitive radio networks
    Slimeni, Feten
    Chtourou, Zied
    Scheers, Bart
    Le Nir, Vincent
    Attia, Rabah
    WIRELESS NETWORKS, 2019, 25 (07) : 4161 - 4171
  • [22] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [23] Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games
    Amhraoui, Elmehdi
    Masrour, Tawfik
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (07) : 2781 - 2797
  • [24] Distributed Caching based on Decentralized Learning Automata
    Marini, Loris
    Li, Jun
    Li, Yonghui
    2015 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2015, : 3807 - 3812
  • [25] Reinforcement distribution in a team of cooperative Q-learning agents
    Abbasi, Zahra
    Abbasi, Mohammad Ali
    PROCEEDINGS OF NINTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2008, : 154 - +
  • [26] Multi-goal Q-learning of cooperative teams
    Li, Jing
    Sheng, Zhaohan
    Ng, KwanChew
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (03) : 1565 - 1574
  • [27] Distributed lazy Q-learning for cooperative mobile robots
    Touzet, Claude F.
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2004, 1 (01) : 5 - 13
  • [28] Q-LEARNING
    WATKINS, CJCH
    DAYAN, P
    MACHINE LEARNING, 1992, 8 (3-4) : 279 - 292
  • [29] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
    COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111 : 589 - 604
  • [30] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5 : 1 - 25