Accelerating Model-Free Reinforcement Learning With Imperfect Model Knowledge in Dynamic Spectrum Access

Times Cited: 14
Authors
Li, Lianjun [1 ]
Liu, Lingjia [1 ]
Bai, Jianan [1 ]
Chang, Hao-Hsuan [1 ]
Chen, Hao [2 ]
Ashdown, Jonathan D. [3 ]
Zhang, Jianzhong [2 ]
Yi, Yang [1 ]
Affiliations
[1] Virginia Tech, Elect & Comp Engn Dept, Blacksburg, VA 24061 USA
[2] Samsung Res Amer, Stand & Mobil Innovat Lab, Plano, TX 75023 USA
[3] Air Force Res Lab, Informat Directorate, Rome, NY 13441 USA
Source
IEEE INTERNET OF THINGS JOURNAL | 2020, Vol. 7, Issue 8
Funding
U.S. National Science Foundation;
Keywords
Computational modeling; Learning (artificial intelligence); Sensors; Wireless communication; Acceleration; Complexity theory; Internet of Things; Dynamic spectrum access (DSA); imperfect model; reinforcement learning (RL); training acceleration; wireless communications systems; NETWORKS;
DOI
10.1109/JIOT.2020.2988268
Chinese Library Classification (CLC)
TP [Automation technology; computer technology];
Discipline Code
0812;
Abstract
Current studies that apply reinforcement learning (RL) to dynamic spectrum access (DSA) problems in wireless communications systems mainly focus on model-free RL (MFRL). However, in practice, MFRL requires a large number of samples to achieve good performance, making it impractical in real-time applications such as DSA. Combining model-free and model-based RL can potentially reduce the sample complexity while achieving a similar level of performance to MFRL, as long as the learned model is accurate enough. However, in a complex environment, the learned model is never perfect. In this article, we combine model-free and model-based RL, and introduce an algorithm that can work with an imperfectly learned model to accelerate MFRL. Results show that our algorithm achieves higher sample efficiency than the standard MFRL algorithm and the Dyna algorithm (a standard algorithm integrating model-based RL and MFRL), with much lower computational complexity than the Dyna algorithm. In the extreme case where the learned model is highly inaccurate, the Dyna algorithm performs even worse than the MFRL algorithm, while our algorithm can still outperform the MFRL algorithm.
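For orientation, the sketch below shows the Dyna-style integration of model-free Q-learning with model-based planning that the abstract uses as a baseline; it is not the authors' accelerated algorithm. The toy environment (ToyChannelEnv), the function name dyna_q, and all hyperparameter values are hypothetical placeholders, assumed only for illustration of how simulated updates from a learned (and possibly imperfect) transition model supplement real-experience updates.

import random
from collections import defaultdict

# Minimal tabular Dyna-Q sketch, for illustration only.
# The environment and the channel-selection framing are hypothetical,
# not the paper's DSA setup.

class ToyChannelEnv:
    """Toy task: pick one of n_channels per step; one channel is 'good'."""
    def __init__(self, n_channels=4, good=2):
        self.n_channels = n_channels
        self.good = good
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.good else 0.0
        self.state = (self.state + 1) % 2          # trivial two-state dynamics
        return self.state, reward

def dyna_q(env, episodes=200, steps=20, planning_steps=5,
           alpha=0.1, gamma=0.95, eps=0.1):
    q = defaultdict(float)                         # Q[(state, action)]
    model = {}                                     # learned model: (s, a) -> (s', r)
    for _ in range(episodes):
        s = env.reset()
        for _ in range(steps):
            # epsilon-greedy action selection (model-free part)
            if random.random() < eps:
                a = random.randrange(env.n_channels)
            else:
                a = max(range(env.n_channels), key=lambda x: q[(s, x)])
            s2, r = env.step(a)
            # direct RL update from the real transition
            best_next = max(q[(s2, x)] for x in range(env.n_channels))
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            # record the observed transition in the (possibly imperfect) model
            model[(s, a)] = (s2, r)
            # planning: extra Q updates from simulated experience drawn from the model
            for _ in range(planning_steps):
                (ps, pa), (ps2, pr) = random.choice(list(model.items()))
                best_sim = max(q[(ps2, x)] for x in range(env.n_channels))
                q[(ps, pa)] += alpha * (pr + gamma * best_sim - q[(ps, pa)])
            s = s2
    return q

if __name__ == "__main__":
    q = dyna_q(ToyChannelEnv())
    print({a: round(q[(0, a)], 3) for a in range(4)})  # the 'good' channel should score highest

The sketch makes the abstract's trade-off concrete: each real sample triggers planning_steps additional model-based updates, which raises sample efficiency but adds computation, and if the stored model is inaccurate those extra updates can push the Q-values in the wrong direction.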
Pages: 7517-7528
Number of Pages: 12
Related Papers
50 records in total
  • [31] Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning
    Swazinna, Phillip
    Udluft, Steffen
    Hein, Daniel
    Runkler, Thomas
    IFAC PAPERSONLINE, 2022, 55 (15): : 19 - 26
  • [32] Hybrid control for combining model-based and model-free reinforcement learning
    Pinosky, Allison
    Abraham, Ian
    Broad, Alexander
    Argall, Brenna
    Murphey, Todd D.
    INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2023, 42 (06): : 337 - 355
  • [33] Linear Quadratic Control Using Model-Free Reinforcement Learning
    Yaghmaie, Farnaz Adib
    Gustafsson, Fredrik
    Ljung, Lennart
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (02) : 737 - 752
  • [34] On Distributed Model-Free Reinforcement Learning Control With Stability Guarantee
    Mukherjee, Sayak
    Vu, Thanh Long
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (05): : 1615 - 1620
  • [35] Model-Free Reinforcement Learning of Impedance Control in Stochastic Environments
    Stulp, Freek
    Buchli, Jonas
    Ellmer, Alice
    Mistry, Michael
    Theodorou, Evangelos A.
    Schaal, Stefan
    IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, 2012, 4 (04) : 330 - 341
  • [36] Model-Free Recurrent Reinforcement Learning for AUV Horizontal Control
    Huo, Yujia
    Li, Yiping
    Feng, Xisheng
    3RD INTERNATIONAL CONFERENCE ON AUTOMATION, CONTROL AND ROBOTICS ENGINEERING (CACRE 2018), 2018, 428
  • [37] Limit Reachability for Model-Free Reinforcement Learning of ω-Regular Objectives
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    PROCEEDINGS OF THE 5TH INTERNATIONAL WORKSHOP ON SYMBOLIC-NUMERIC METHODS FOR REASONING ABOUT CPS AND IOT (SNR 2019), 2019, : 16 - 18
  • [38] Model-Free Control for Soft Manipulators based on Reinforcement Learning
    You, Xuanke
    Zhang, Yixiao
    Chen, Xiaotong
    Liu, Xinghua
    Wang, Zhanchi
    Jiang, Hao
    Chen, Xiaoping
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 2909 - 2915
  • [39] Model-Free Reinforcement Learning with the Decision-Estimation Coefficient
    Foster, Dylan J.
    Golowich, Noah
    Qian, Jian
    Rakhlin, Alexander
    Sekhari, Ayush
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [40] On the importance of hyperparameters tuning for model-free reinforcement learning algorithms
    Tejer, Mateusz
    Szczepanski, Rafal
    2024 12TH INTERNATIONAL CONFERENCE ON CONTROL, MECHATRONICS AND AUTOMATION, ICCMA, 2024, : 78 - 82