Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Cited: 0
|
Authors
Wang, Guojian [1 ,4 ]
Wu, Faguo [1 ,2 ,3 ,4 ,5 ]
Zhang, Xiao [1 ,3 ,4 ,5 ]
Guo, Ning [1 ,4 ]
Zheng, Zhiming [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Sch Math Sci, Beijing 100191, Peoples R China
[2] Beihang Univ, Inst Artificial Intelligence, Beijing 100191, Peoples R China
[3] Zhongguancun Lab, Beijing 100194, Peoples R China
[4] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[5] Beihang Univ, Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Hard-exploration problem; Policy gradient; Offline suboptimal demonstrations;
DOI
10.1016/j.knosys.2023.111334
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning (DRL) faces significant challenges in hard-exploration tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the agent's policy away from suboptimal solutions by treating previous offline demonstrations as references. Specifically, this approach gradually expands the exploration scope of the agent and strives for optimality in a constrained optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that utilizes adaptive clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a derivation of worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid-world mazes and several MuJoCo tasks. The extensive experimental results demonstrate the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. The reported quantitative metrics further support these findings. The code used in the study is available at https://github.com/buaawgj/TACE.
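The abstract's core mechanism, an adaptive clipped trajectory-distance reward that pushes the policy away from offline suboptimal demonstrations, can be illustrated with a minimal sketch. This is not the paper's implementation (see the linked repository for that); the function name, the nearest-point distance measure, and the fixed clipping threshold are all illustrative assumptions:

```python
import numpy as np

def clipped_trajectory_distance_bonus(state, demos, clip_max=1.0):
    """Hypothetical sketch of a clipped trajectory-distance bonus.

    Rewards the agent for visiting states far from previously collected
    suboptimal demonstration trajectories, discouraging the policy from
    collapsing back onto those known-suboptimal solutions.

    state    : (d,) current state vector
    demos    : list of (T_i, d) arrays, offline demonstration trajectories
    clip_max : upper bound on the bonus, keeping the intrinsic reward
               on a stable scale relative to the extrinsic reward
    """
    # Distance from a state to a trajectory = distance to its nearest point.
    dists = [np.linalg.norm(traj - state, axis=1).min() for traj in demos]
    # The closest demonstration determines the exploration signal.
    d_min = min(dists)
    # Clipping prevents very distant states from dominating the reward.
    return min(d_min, clip_max)
```

In a training loop, such a bonus would typically be added to the environment reward at each step, with the clipping threshold adapted over training so the exploration scope expands gradually, as the abstract describes.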
Pages: 20
Related papers
50 records total
  • [41] Digital twin-enabled adaptive scheduling strategy based on deep reinforcement learning
    Gan, XueMei
    Zuo, Ying
    Zhang, AnSi
    Li, ShaoBo
    Tao, Fei
    SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2023, 66 (07) : 1937 - 1951
  • [44] Constrained attractor selection using deep reinforcement learning
    Wang, Xue-She
    Turner, James D.
    Mann, Brian P.
    JOURNAL OF VIBRATION AND CONTROL, 2021, 27 (5-6) : 502 - 514
  • [45] Constrained deep reinforcement learning for maritime platform defense
    Markowitz, Jared
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS VI, 2024, 13051
  • [46] Constrained Deep Reinforcement Learning for Fronthaul Compression Optimization
    Gronland, Axel
    Russo, Alessio
    Jedra, Yassir
    Klaiqi, Bleron
    Gelabert, Xavier
    2024 IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING FOR COMMUNICATION AND NETWORKING, ICMLCN 2024, 2024, : 498 - 504
  • [47] RA-TSC: Learning Adaptive Traffic Signal Control Strategy via Deep Reinforcement Learning
    Du, Yu
    ShangGuan, Wei
    Rong, Dingchao
    Chai, Linguo
    2019 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2019, : 3275 - 3280
  • [48] #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
    Tang, Haoran
    Houthooft, Rein
    Foote, Davis
    Stooke, Adam
    Chen, Xi
    Duan, Yan
    Schulman, John
    De Turck, Filip
    Abbeel, Pieter
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [49] Deep reinforcement learning for adaptive mesh refinement
    Foucart, Corbin
    Charous, Aaron
    Lermusiaux, Pierre F. J.
    JOURNAL OF COMPUTATIONAL PHYSICS, 2023, 491
  • [50] Adaptive Slope Locomotion with Deep Reinforcement Learning
    Jones, William
    Blum, Tamir
    Yoshida, Kazuya
    2020 IEEE/SICE INTERNATIONAL SYMPOSIUM ON SYSTEM INTEGRATION (SII), 2020, : 546 - 550