Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Cited by: 0
Authors
Wang, Guojian [1 ,4 ]
Wu, Faguo [1 ,2 ,3 ,4 ,5 ]
Zhang, Xiao [1 ,3 ,4 ,5 ]
Guo, Ning [1 ,4 ]
Zheng, Zhiming [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Sch Math Sci, Beijing 100191, Peoples R China
[2] Beihang Univ, Inst Artificial Intelligence, Beijing 100191, Peoples R China
[3] Zhongguancun Lab, Beijing 100194, Peoples R China
[4] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[5] Beihang Univ, Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Hard-exploration problem; Policy gradient; Offline suboptimal demonstrations;
DOI
10.1016/j.knosys.2023.111334
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Deep reinforcement learning (DRL) faces significant challenges in hard-exploration tasks with sparse or deceptive rewards and large state spaces, which severely limit its practical application. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the agent's policy away from suboptimal solutions by treating previous offline demonstrations as references. Specifically, the approach gradually expands the agent's exploration scope and strives for optimality in a constrained-optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that uses adaptive clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a derivation of the worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid-world mazes and several MuJoCo tasks. The extensive experimental results, supported by quantitative metrics, demonstrate the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. The code used in the study is available at https://github.com/buaawgj/TACE.
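The abstract describes rewarding the agent for keeping its trajectories away from stored suboptimal demonstrations, with the distance-based bonus clipped. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch under assumed design choices: the function name is hypothetical, the trajectory distance is taken as the Euclidean distance from the current state to the nearest demonstration state, and the clipping is a simple upper bound.

```python
import math

def clipped_trajectory_distance_reward(state, demo_trajectories,
                                       scale=1.0, clip_max=1.0):
    """Hypothetical intrinsic bonus: distance from the current state to the
    nearest state in the stored suboptimal demonstrations, clipped above."""
    # Distance to the closest demonstration state across all trajectories.
    d_min = min(
        math.dist(state, s)
        for traj in demo_trajectories
        for s in traj
    )
    # Clipping bounds the bonus so it cannot overwhelm the extrinsic task reward.
    return min(scale * d_min, clip_max)
```

In this sketch, a state lying on a demonstration receives no bonus, while states far from all demonstrations receive at most `clip_max`, encouraging exploration away from known suboptimal behavior without letting the bonus dominate.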
Pages: 20
Related Papers
50 records
  • [31] Real-time adaptive entry trajectory generation with modular policy and deep reinforcement learning
    Peng, Gaoxiang
    Wang, Bo
    Liu, Lei
    Fan, Huijin
    Cheng, Zhongtao
    AEROSPACE SCIENCE AND TECHNOLOGY, 2023, 142
  • [32] Adaptive Nonlinear Model Predictive Horizon Using Deep Reinforcement Learning for Optimal Trajectory Planning
    Al Younes, Younes
    Barczyk, Martin
    DRONES, 2022, 6 (11)
  • [33] Reentry trajectory optimization based on Deep Reinforcement Learning
    Gao, Jiashi
    Shi, Xinming
    Cheng, Zhongtao
    Xiong, Jizhang
    Liu, Lei
    Wang, Yongji
    Yang, Ye
    PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019), 2019: 2588-2592
  • [34] Sub-trajectory clustering with deep reinforcement learning
    Liang, Anqi
    Yao, Bin
    Wang, Bo
    Liu, Yinpei
    Chen, Zhida
    Xie, Jiong
    Li, Feifei
    VLDB JOURNAL, 2024, 33 (03): 685-702
  • [36] Deep Reinforcement Learning for Trajectory Generation and Optimisation of UAVs
    Akhtar, Mishma
    Maqsood, Adnan
    Verbeke, Mathias
    2023 10TH INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN AIR AND SPACE TECHNOLOGIES, RAST, 2023,
  • [37] Robust trajectory-constrained frequency control for microgrids considering model linearization error
    Zhang, Yichen
    Chen, Chen
    Hong, Tianqi
    Cui, Bai
    Xu, Zhe
    Chen, Bo
    Qiu, Feng
    APPLIED ENERGY, 2023, 333
  • [38] When to Replan? An Adaptive Replanning Strategy for Autonomous Navigation using Deep Reinforcement Learning
    Honda, Kohei
    Yonetani, Ryo
    Nishimura, Mai
    Kozuno, Tadashi
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024: 6650-6656
  • [39] Novel adaptive stability enhancement strategy for power systems based on deep reinforcement learning
    Zhao, Yincheng
    Hu, Weihao
    Zhang, Guozhou
    Huang, Qi
    Chen, Zhe
    Blaabjerg, Frede
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2023, 152
  • [40] Digital twin-enabled adaptive scheduling strategy based on deep reinforcement learning
    Gan, Xuemei
    Zuo, Ying
    Zhang, Ansi
    Li, Shaobo
    Tao, Fei
    Science China (Technological Sciences), 2023, (07): 1937-1951