Accelerating Model-Free Reinforcement Learning With Imperfect Model Knowledge in Dynamic Spectrum Access

Cited by: 14
|
Authors
Li, Lianjun [1 ]
Liu, Lingjia [1 ]
Bai, Jianan [1 ]
Chang, Hao-Hsuan [1 ]
Chen, Hao [2 ]
Ashdown, Jonathan D. [3 ]
Zhang, Jianzhong [2 ]
Yi, Yang [1 ]
Affiliations
[1] Virginia Tech, Elect & Comp Engn Dept, Blacksburg, VA 24061 USA
[2] Samsung Res Amer, Stand & Mobil Innovat Lab, Plano, TX 75023 USA
[3] Air Force Res Lab, Informat Directorate, Rome, NY 13441 USA
Source
IEEE INTERNET OF THINGS JOURNAL | 2020, Vol. 7, No. 8
Funding
U.S. National Science Foundation;
Keywords
Computational modeling; Learning (artificial intelligence); Sensors; Wireless communication; Acceleration; Complexity theory; Internet of Things; Dynamic spectrum access (DSA); imperfect model; reinforcement learning (RL); training acceleration; wireless communications systems; NETWORKS;
DOI
10.1109/JIOT.2020.2988268
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Current studies that apply reinforcement learning (RL) to dynamic spectrum access (DSA) problems in wireless communications systems mainly focus on model-free RL (MFRL). In practice, however, MFRL requires a large number of samples to achieve good performance, making it impractical in real-time applications such as DSA. Combining model-free and model-based RL can potentially reduce the sample complexity while achieving a level of performance similar to MFRL, as long as the learned model is accurate enough. However, in a complex environment the learned model is never perfect. In this article, we combine model-free and model-based RL and introduce an algorithm that can work with an imperfectly learned model to accelerate MFRL. Results show that our algorithm achieves higher sample efficiency than the standard MFRL algorithm and the Dyna algorithm (a standard algorithm integrating model-based RL and MFRL), with much lower computational complexity than the Dyna algorithm. In the extreme case where the learned model is highly inaccurate, the Dyna algorithm performs even worse than the MFRL algorithm, while our algorithm can still outperform the MFRL algorithm.
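The abstract benchmarks against the Dyna algorithm, which interleaves model-free Q-updates from real transitions with planning updates replayed from a learned model. The sketch below is a minimal tabular Dyna-Q illustration of that baseline only, not the authors' proposed algorithm; the environment interface `env_step(s, a) -> (reward, next_state, done)`, the hyperparameter names, and the start-state assumption are all illustrative assumptions.

```python
# Hedged sketch: tabular Dyna-Q, i.e. model-free Q-learning accelerated by
# planning updates drawn from a learned (possibly imperfect) model. This is
# the Dyna baseline the abstract compares against, not the paper's algorithm.
import random
import numpy as np


def dyna_q(env_step, n_states, n_actions, episodes=200, horizon=50,
           alpha=0.1, gamma=0.95, epsilon=0.1, n_planning=5):
    """Real transitions update Q directly (model-free) and also populate a
    learned model, which is then replayed for extra planning updates."""
    Q = np.zeros((n_states, n_actions))
    model = {}  # learned model: (state, action) -> (reward, next_state)

    for _ in range(episodes):
        s = 0  # assumption: every episode starts in state 0
        for _ in range(horizon):
            # epsilon-greedy behavior policy
            a = random.randrange(n_actions) if random.random() < epsilon \
                else int(np.argmax(Q[s]))
            r, s_next, done = env_step(s, a)

            # direct model-free Q-learning update from the real sample
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])

            # record the transition in the learned model; in a stochastic or
            # drifting environment this model is imperfect, which is the
            # regime the article is concerned with
            model[(s, a)] = (r, s_next)

            # planning: replay simulated transitions from the learned model
            for _ in range(n_planning):
                (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
                Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps_next]) - Q[ps, pa])

            s = s_next
            if done:
                break
    return Q
```

With `n_planning = 0` this reduces to plain model-free Q-learning; larger values trade extra computation for sample efficiency, and when the learned model is badly wrong the planning updates can hurt, which is the failure mode of Dyna highlighted in the abstract.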
Pages: 7517-7528
Page count: 12
Related Papers
50 records in total
  • [41] Model-free Reinforcement Learning for Stochastic Stackelberg Security Games
    Mishra, Rajesh K.
    Vasal, Deepanshu
    Vishwanath, Sriram
    2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020, : 348 - 353
  • [42] Mastering the game of Stratego with model-free multiagent reinforcement learning
    Perolat, Julien
    De Vylder, Bart
    Hennes, Daniel
    Tarassov, Eugene
    Strub, Florian
    de Boer, Vincent
    Muller, Paul
    Connor, Jerome T.
    Burch, Neil
    Anthony, Thomas
    McAleer, Stephen
    Elie, Romuald
    Cen, Sarah H.
    Wang, Zhe
    Gruslys, Audrunas
    Malysheva, Aleksandra
    Khan, Mina
    Ozair, Sherjil
    Timbers, Finbarr
    Pohlen, Toby
    Eccles, Tom
    Rowland, Mark
    Lanctot, Marc
    Lespiau, Jean-Baptiste
    Piot, Bilal
    Omidshafiei, Shayegan
    Lockhart, Edward
    Sifre, Laurent
    Beauguerlange, Nathalie
    Munos, Remi
    Silver, David
    Singh, Satinder
    Hassabis, Demis
    Tuyls, Karl
    SCIENCE, 2022, 378 (6623) : 990 - +
  • [43] Model-free reinforcement learning from expert demonstrations: a survey
    Ramirez, Jorge
    Yu, Wen
    Perrusquia, Adolfo
    ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (04) : 3213 - 3241
  • [44] Model-Free Emergency Frequency Control Based on Reinforcement Learning
    Chen, Chunyu
    Cui, Mingjian
    Li, Fangxing
    Yin, Shengfei
    Wang, Xinan
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (04) : 2336 - 2346
  • [45] Model-Free Reinforcement Learning for Branching Markov Decision Processes
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    COMPUTER AIDED VERIFICATION, PT II, CAV 2021, 2021, 12760 : 651 - 673
  • [46] Plume Tracing via Model-Free Reinforcement Learning Method
    Hu, Hangkai
    Song, Shiji
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (08) : 2515 - 2527
  • [47] Model-free reinforcement learning from expert demonstrations: a survey
    Jorge Ramírez
    Wen Yu
    Adolfo Perrusquía
    Artificial Intelligence Review, 2022, 55 : 3213 - 3241
  • [48] Model-Free Deep Inverse Reinforcement Learning by Logistic Regression
    Eiji Uchibe
    Neural Processing Letters, 2018, 47 : 891 - 905
  • [49] Safe Reinforcement Learning via a Model-Free Safety Certifier
    Modares, Amir
    Sadati, Nasser
    Esmaeili, Babak
    Yaghmaie, Farnaz Adib
    Modares, Hamidreza
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3302 - 3311
  • [50] On Distributed Model-Free Reinforcement Learning Control with Stability Guarantee
    Mukherjee, Sayak
    Vu, Thanh Long
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 2175 - 2180