Learning Automata Based Q-Learning for Content Placement in Cooperative Caching

Cited by: 43
Authors
Yang, Zhong [1 ]
Liu, Yuanwei [1 ]
Chen, Yue [1 ]
Jiao, Lei [2 ]
Affiliations
[1] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England
[2] Univ Agder, Dept Informat & Commun Technol, N-4879 Grimstad, Norway
Keywords
Cooperative caching; Wireless communication; Prediction algorithms; Recurrent neural networks; Learning automata; Optimization; Quality of experience; Learning automata based Q-learning; quality of experience (QoE); wireless cooperative caching; user mobility prediction; content popularity prediction; NONORTHOGONAL MULTIPLE-ACCESS; WIRELESS; OPTIMIZATION; NETWORKS; DELIVERY;
DOI
10.1109/TCOMM.2020.2982136
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808 ; 0809 ;
Abstract
An optimization problem of content placement in cooperative caching is formulated, with the aim of maximizing the sum mean opinion score (MOS) of mobile users. Firstly, as user mobility and content popularity have significant impacts on the user experience, a recurrent neural network (RNN) is invoked for user mobility prediction and content popularity prediction. More particularly, practical data collected from a GPS-tracker app on smartphones are used to test the accuracy of user mobility prediction. Then, based on the predicted mobile users' positions and content popularity, a learning automata based Q-learning (LAQL) algorithm for cooperative caching is proposed, in which learning automata (LA) are invoked for Q-learning to obtain optimal action selection in a random and stationary environment. It is proven that the LA based action selection scheme is capable of enabling every state to select the optimal action with arbitrarily high probability, provided that Q-learning converges to the optimal Q value eventually. In the LAQL algorithm, a central processor acts as the intelligent agent, which iteratively allocates contents to base stations (BSs) according to the reward or penalty fed back from the BSs and users. To characterize the performance of the proposed LAQL algorithm, the sum MOS of users is applied to define the reward function. Extensive simulation results reveal that: 1) the prediction error of the RNN-based algorithm decreases as the number of iterations and nodes increases; 2) the proposed LAQL achieves significant performance improvement over the traditional Q-learning algorithm; and 3) the cooperative caching scheme is capable of outperforming non-cooperative caching and random caching by 3% and 4%, respectively.
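The core mechanism the abstract describes, replacing Q-learning's usual epsilon-greedy exploration with a learning automaton that maintains a per-state action-probability vector and reinforces rewarded actions, can be sketched roughly as follows. The class name, hyperparameters, and the linear reward-inaction (L_R-I) update rule used here are illustrative assumptions, not the paper's exact formulation.

```python
import random

class LAQLAgent:
    """Minimal sketch of learning-automata-based Q-learning (LAQL).

    Each state keeps an action-probability vector (the automaton) that is
    updated with a linear reward-inaction (L_R-I) scheme; the Q-table is
    updated with the standard temporal-difference rule. Hypothetical
    simplification of the paper's algorithm, not its exact specification.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, lam=0.05):
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        # Start with uniform action probabilities in every state.
        self.P = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def select_action(self, s):
        # Sample an action from the automaton's probability vector.
        r, acc = random.random(), 0.0
        for a, p in enumerate(self.P[s]):
            acc += p
            if r <= acc:
                return a
        return len(self.P[s]) - 1

    def update(self, s, a, reward, s_next):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.Q[s_next])
        self.Q[s][a] += self.alpha * (reward + self.gamma * best_next - self.Q[s][a])
        # L_R-I update: reinforce action a only when it is currently greedy
        # for state s; on "penalty" the probabilities are left unchanged.
        if self.Q[s][a] >= max(self.Q[s]):
            for j in range(len(self.P[s])):
                if j == a:
                    self.P[s][j] += self.lam * (1.0 - self.P[s][j])
                else:
                    self.P[s][j] *= 1.0 - self.lam
```

Because the L_R-I scheme only increases the probability of rewarded (greedy) actions and never decreases it on penalty, each state's distribution gradually concentrates on one action, which matches the abstract's claim that every state selects the optimal action with arbitrarily high probability once the Q-values have converged.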
Pages: 3667 - 3680
Page count: 14
Related Papers
50 records in total
  • [21] Cooperative Q-learning based channel selection for cognitive radio networks
    Slimeni, Feten
    Chtourou, Zied
    Scheers, Bart
    Le Nir, Vincent
    Attia, Rabah
    WIRELESS NETWORKS, 2019, 25 (07) : 4161 - 4171
  • [22] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [23] Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games
    Amhraoui, Elmehdi
    Masrour, Tawfik
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (07) : 2781 - 2797
  • [24] Distributed Caching based on Decentralized Learning Automata
    Marini, Loris
    Li, Jun
    Li, Yonghui
    2015 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2015, : 3807 - 3812
  • [25] Reinforcement distribution in a team of cooperative Q-learning agents
    Abbasi, Zahra
    Abbasi, Mohammad Ali
    PROCEEDINGS OF NINTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2008, : 154 - +
  • [26] Multi-goal Q-learning of cooperative teams
    Li, Jing
    Sheng, Zhaohan
    Ng, KwanChew
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (03) : 1565 - 1574
  • [27] Distributed lazy Q-learning for cooperative mobile robots
    Touzet, Claude F.
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2004, 1 (01) : 5 - 13
  • [28] Q-LEARNING
    WATKINS, CJCH
    DAYAN, P
    MACHINE LEARNING, 1992, 8 (3-4) : 279 - 292
  • [29] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
    COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111 : 589 - 604
  • [30] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5 : 1 - 25