Accelerating Model-Free Reinforcement Learning With Imperfect Model Knowledge in Dynamic Spectrum Access

Cited by: 14
|
Authors
Li, Lianjun [1 ]
Liu, Lingjia [1 ]
Bai, Jianan [1 ]
Chang, Hao-Hsuan [1 ]
Chen, Hao [2 ]
Ashdown, Jonathan D. [3 ]
Zhang, Jianzhong [2 ]
Yi, Yang [1 ]
Affiliations
[1] Virginia Tech, Elect & Comp Engn Dept, Blacksburg, VA 24061 USA
[2] Samsung Res Amer, Stand & Mobil Innovat Lab, Plano, TX 75023 USA
[3] Air Force Res Lab, Informat Directorate, Rome, NY 13441 USA
Source
IEEE INTERNET OF THINGS JOURNAL | 2020, Vol. 7, No. 8
Funding
U.S. National Science Foundation;
Keywords
Computational modeling; Learning (artificial intelligence); Sensors; Wireless communication; Acceleration; Complexity theory; Internet of Things; Dynamic spectrum access (DSA); imperfect model; reinforcement learning (RL); training acceleration; wireless communications systems; NETWORKS;
DOI
10.1109/JIOT.2020.2988268
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Current studies that apply reinforcement learning (RL) to dynamic spectrum access (DSA) problems in wireless communications systems mainly focus on model-free RL (MFRL). In practice, however, MFRL requires a large number of samples to achieve good performance, making it impractical in real-time applications such as DSA. Combining model-free and model-based RL can potentially reduce the sample complexity while achieving a level of performance similar to MFRL, as long as the learned model is accurate enough. However, in a complex environment the learned model is never perfect. In this article, we combine model-free and model-based RL and introduce an algorithm that can work with an imperfectly learned model to accelerate MFRL. Results show that our algorithm achieves higher sample efficiency than the standard MFRL algorithm and the Dyna algorithm (a standard algorithm integrating model-based RL and MFRL), with much lower computational complexity than the Dyna algorithm. In the extreme case where the learned model is highly inaccurate, the Dyna algorithm performs even worse than the MFRL algorithm, while our algorithm can still outperform the MFRL algorithm.
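The abstract benchmarks against the Dyna algorithm, which interleaves model-free Q-updates from real transitions with planning updates replayed from a learned model. The sketch below is a minimal tabular Dyna-Q illustration of that baseline only, not the authors' proposed algorithm; the environment interface `env_step(s, a) -> (reward, next_state, done)`, the hyperparameter names, and the start-state assumption are all illustrative assumptions.

```python
# Hedged sketch: tabular Dyna-Q, i.e. model-free Q-learning accelerated by
# planning updates drawn from a learned (possibly imperfect) model. This is
# the Dyna baseline the abstract compares against, not the paper's algorithm.
import random
import numpy as np


def dyna_q(env_step, n_states, n_actions, episodes=200, horizon=50,
           alpha=0.1, gamma=0.95, epsilon=0.1, n_planning=5):
    """Real transitions update Q directly (model-free) and also populate a
    learned model, which is then replayed for extra planning updates."""
    Q = np.zeros((n_states, n_actions))
    model = {}  # learned model: (state, action) -> (reward, next_state)

    for _ in range(episodes):
        s = 0  # assumption: every episode starts in state 0
        for _ in range(horizon):
            # epsilon-greedy behavior policy
            a = random.randrange(n_actions) if random.random() < epsilon \
                else int(np.argmax(Q[s]))
            r, s_next, done = env_step(s, a)

            # direct model-free Q-learning update from the real sample
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])

            # record the transition in the learned model; in a stochastic or
            # drifting environment this model is imperfect, which is the
            # regime the article is concerned with
            model[(s, a)] = (r, s_next)

            # planning: replay simulated transitions from the learned model
            for _ in range(n_planning):
                (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
                Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps_next]) - Q[ps, pa])

            s = s_next
            if done:
                break
    return Q
```

With `n_planning = 0` this reduces to plain model-free Q-learning; larger values trade extra computation for sample efficiency, and when the learned model is badly wrong the planning updates can hurt, which is the failure mode of Dyna highlighted in the abstract.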
Pages: 7517-7528
Page count: 12
Related Papers
50 records in total
  • [41] Model-free Reinforcement Learning for Stochastic Stackelberg Security Games
    Mishra, Rajesh K.
    Vasal, Deepanshu
    Vishwanath, Sriram
    2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020, : 348 - 353
  • [42] Mastering the game of Stratego with model-free multiagent reinforcement learning
    Perolat, Julien
    De Vylder, Bart
    Hennes, Daniel
    Tarassov, Eugene
    Strub, Florian
    de Boer, Vincent
    Muller, Paul
    Connor, Jerome T.
    Burch, Neil
    Anthony, Thomas
    McAleer, Stephen
    Elie, Romuald
    Cen, Sarah H.
    Wang, Zhe
    Gruslys, Audrunas
    Malysheva, Aleksandra
    Khan, Mina
    Ozair, Sherjil
    Timbers, Finbarr
    Pohlen, Toby
    Eccles, Tom
    Rowland, Mark
    Lanctot, Marc
    Lespiau, Jean-Baptiste
    Piot, Bilal
    Omidshafiei, Shayegan
    Lockhart, Edward
    Sifre, Laurent
    Beauguerlange, Nathalie
    Munos, Remi
    Silver, David
    Singh, Satinder
    Hassabis, Demis
    Tuyls, Karl
    SCIENCE, 2022, 378 (6623) : 990 - +
  • [43] Model-free reinforcement learning from expert demonstrations: a survey
    Ramirez, Jorge
    Yu, Wen
    Perrusquia, Adolfo
    ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (04) : 3213 - 3241
  • [44] Model-Free Emergency Frequency Control Based on Reinforcement Learning
    Chen, Chunyu
    Cui, Mingjian
    Li, Fangxing
    Yin, Shengfei
    Wang, Xinan
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (04) : 2336 - 2346
  • [45] Model-Free Reinforcement Learning for Branching Markov Decision Processes
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    COMPUTER AIDED VERIFICATION, PT II, CAV 2021, 2021, 12760 : 651 - 673
  • [46] Plume Tracing via Model-Free Reinforcement Learning Method
    Hu, Hangkai
    Song, Shiji
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (08) : 2515 - 2527
  • [47] Model-free reinforcement learning from expert demonstrations: a survey
    Jorge Ramírez
    Wen Yu
    Adolfo Perrusquía
    Artificial Intelligence Review, 2022, 55 : 3213 - 3241
  • [48] Model-Free Deep Inverse Reinforcement Learning by Logistic Regression
    Eiji Uchibe
    Neural Processing Letters, 2018, 47 : 891 - 905
  • [49] Safe Reinforcement Learning via a Model-Free Safety Certifier
    Modares, Amir
    Sadati, Nasser
    Esmaeili, Babak
    Yaghmaie, Farnaz Adib
    Modares, Hamidreza
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3302 - 3311
  • [50] On Distributed Model-Free Reinforcement Learning Control with Stability Guarantee
    Mukherjee, Sayak
    Vu, Thanh Long
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 2175 - 2180