Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Cited: 0
|
Authors
Wang, Guojian [1 ,4 ]
Wu, Faguo [1 ,2 ,3 ,4 ,5 ]
Zhang, Xiao [1 ,3 ,4 ,5 ]
Guo, Ning [1 ,4 ]
Zheng, Zhiming [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Sch Math Sci, Beijing 100191, Peoples R China
[2] Beihang Univ, Inst Artificial Intelligence, Beijing 100191, Peoples R China
[3] Zhongguancun Lab, Beijing 100194, Peoples R China
[4] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[5] Beihang Univ, Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Hard-exploration problem; Policy gradient; Offline suboptimal demonstrations;
DOI
10.1016/j.knosys.2023.111334
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning (DRL) faces significant challenges in addressing hard-exploration tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the agent's policy away from suboptimal solutions by treating previous offline demonstrations as references. Specifically, this approach gradually expands the agent's exploration scope and strives for optimality in a constrained-optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that utilizes adaptive clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a derivation of worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid-world mazes and several MuJoCo tasks. The extensive experimental results demonstrate the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings; quantitative results on these benchmarks further support these findings. The code used in the study is available at https://github.com/buaawgj/TACE.
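
To make the abstract's core mechanism concrete, below is a minimal sketch of how a clipped trajectory-distance bonus of the kind it describes could be computed: the agent's rollout earns an extra reward for staying far from previously collected suboptimal demonstrations, and the bonus is clipped so it stays bounded. The function names, the Euclidean nearest-neighbour distance, and the clip/scale parameters are illustrative assumptions for this sketch, not the authors' TACE implementation (see the linked repository for that).

    import numpy as np

    def trajectory_distance(traj, demo):
        # Mean distance from each state in traj to its nearest state in demo.
        # traj, demo: arrays of shape (T, state_dim); Euclidean metric assumed.
        diffs = traj[:, None, :] - demo[None, :, :]   # (T_traj, T_demo, state_dim)
        dists = np.linalg.norm(diffs, axis=-1)        # (T_traj, T_demo)
        return dists.min(axis=1).mean()               # nearest-neighbour average

    def clipped_exploration_bonus(traj, demos, clip=1.0, scale=0.1):
        # Bonus grows with the distance to the closest demonstration, then is
        # clipped at `clip` to keep the shaping signal bounded and stable.
        d_min = min(trajectory_distance(traj, d) for d in demos)
        return scale * min(d_min, clip)

    # Usage: shape a rollout's return before a policy-gradient update.
    rng = np.random.default_rng(0)
    demos = [rng.normal(size=(100, 4)) for _ in range(5)]  # offline demonstrations
    rollout = rng.normal(size=(100, 4))                    # current agent trajectory
    extrinsic_return = 3.2                                 # environment return (example)
    shaped_return = extrinsic_return + clipped_exploration_bonus(rollout, demos)
    print(f"shaped return: {shaped_return:.3f}")

In a full training loop, the shaped return would feed a standard policy-gradient update; the "adaptive" clipping mentioned in the abstract presumably adjusts the threshold during training, a detail this sketch omits.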
Pages: 20
Related papers
50 records in total
  • [1] A Novel Adaptive Sampling Strategy for Deep Reinforcement Learning
    Liang, Xingxing
    Chen, Li
    Feng, Yanghe
    Liu, Zhong
    Ma, Yang
    Huang, Kuihua
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2021, 20 (02)
  • [2] Diversity-Driven Exploration Strategy for Deep Reinforcement Learning
    Hong, Zhang-Wei
    Shann, Tzu-Yun
    Su, Shih-Yang
    Chang, Yi-Hsiang
    Fu, Tsu-Jui
    Lee, Chun-Yi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [3] Exploration Strategy based on Validity of Actions in Deep Reinforcement Learning
    Yoon, Hyung-Suk
    Lee, Sang-Hyun
    Seo, Seung-Woo
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 6134 - 6139
  • [4] A Robust Integration of GPS and MEMS-INS Through Trajectory-constrained Adaptive Kalman Filtering
    Zhou, Zebo
    Li, Yong
    Rizos, Chris
    Shen, Yunzhong
    PROCEEDINGS OF THE 22ND INTERNATIONAL TECHNICAL MEETING OF THE SATELLITE DIVISION OF THE INSTITUTE OF NAVIGATION (ION GNSS 2009), 2009, : 995 - 1003
  • [5] Reward-Based Exploration: Adaptive Control for Deep Reinforcement Learning
    Xu, Zhi-xiong
    Cao, Lei
    Chen, Xi-liang
    Li, Chen-xi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (09): 2409 - 2412
  • [6] Trajectory-Constrained Deep Latent Visual Attention for Improved Local Planning in Presence of Heterogeneous Terrain
    Wapnick, Stefan
    Manderson, Travis
    Meger, David
    Dudek, Gregory
    2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 460 - 467
  • [7] Trajectory-Constrained Collective Circular Motion With Different Phase Arrangements
    Jain, Anoop
    Ghose, Debasish
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2020, 65 (05): 2237 - 2244
  • [8] Constrained Reinforcement Learning in Hard Exploration Problems
    Pankayaraj, Pathmanathan
    Varakantham, Pradeep
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023, : 15055 - 15063
  • [9] Deep Reinforcement Factorization Machines: A Deep Reinforcement Learning Model with Random Exploration Strategy and High Deployment Efficiency
    Yu, Huaidong
    Yin, Jian
    APPLIED SCIENCES-BASEL, 2022, 12 (11)
  • [10] Adaptive Exploration Strategies for Reinforcement Learning
    Hwang, Kao-Shing
    Li, Chih-Wen
    Jiang, Wei-Cheng
    2017 INTERNATIONAL CONFERENCE ON SYSTEM SCIENCE AND ENGINEERING (ICSSE), 2017, : 16 - 19