Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Times cited: 0

Authors:

Wang, Guojian [1, 4]

Wu, Faguo [1, 2, 3, 4, 5]

Zhang, Xiao [1, 3, 4, 5]

Guo, Ning [1, 4]

Zheng, Zhiming [2, 3, 4, 5]

Affiliations:

[1] Beihang Univ, Sch Math Sci, Beijing 100191, Peoples R China

[2] Beihang Univ, Inst Artificial Intelligence, Beijing 100191, Peoples R China

[3] Zhongguancun Lab, Beijing 100194, Peoples R China

[4] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China

[5] Beihang Univ, Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 285

Funding:

National Natural Science Foundation of China

Keywords:

Deep reinforcement learning; Hard-exploration problem; Policy gradient; Offline suboptimal demonstrations

DOI:

10.1016/j.knosys.2023.111334

Chinese Library Classification (CLC):

TP18 [Theory of artificial intelligence]

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

Deep reinforcement learning (DRL) faces significant challenges in addressing hard-exploration tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the agent's policy away from suboptimal solutions by regarding previous offline demonstrations as references. Specifically, this approach gradually expands the exploration scope of the agent and strives for optimality in a constrained optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that utilizes adaptive clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a deduction of the worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid world mazes and several MuJoCo tasks. The extensive experimental results demonstrated the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. Notably, the specific metrics and quantifiable results further support these findings. The code used in the study is available at https://github.com/buaawgj/TACE.
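The abstract names the method's ingredients without giving formulas: distance from previous offline demonstrations, a clipped trajectory-distance reward, and an adaptively weighted constraint. The Python sketch below is a rough illustration of one way these pieces could fit together. It is not the TACE implementation. Several choices are hypothetical and made only for this example: the distance measure (mean nearest-state Euclidean distance), the clip threshold `delta`, the multiplier `lam` updated by a projected dual step, and every function name. The authors' actual code is in the repository linked above.

```python
import math
from typing import Sequence, Tuple

# Hypothetical types: a state is a point in R^n, a trajectory is a sequence of states.
State = Tuple[float, ...]
Trajectory = Sequence[State]


def trajectory_distance(traj: Trajectory, demo: Trajectory) -> float:
    """Assumed distance (not the paper's definition): mean, over states in
    `traj`, of the Euclidean distance to the nearest state in `demo`."""
    total = sum(min(math.dist(s, d) for d in demo) for s in traj)
    return total / len(traj)


def distance_to_demos(traj: Trajectory, demos: Sequence[Trajectory]) -> float:
    """Distance to the closest stored (suboptimal) demonstration."""
    return min(trajectory_distance(traj, demo) for demo in demos)


def clipped_distance_bonus(traj: Trajectory, demos: Sequence[Trajectory],
                           delta: float) -> float:
    """Hypothetical clipped bonus: capped at `delta`, so there is no extra
    reward once the trajectory is at least `delta` away from every demo."""
    return min(distance_to_demos(traj, demos), delta)


def update_multiplier(lam: float, distance: float, delta: float, lr: float) -> float:
    """Hypothetical adaptive weight: projected dual step for the constraint
    distance >= delta. `lam` grows while the constraint is violated and
    shrinks toward 0 (never below) once it is satisfied."""
    return max(0.0, lam + lr * (delta - distance))


def shaped_return(ext_return: float, traj: Trajectory,
                  demos: Sequence[Trajectory], lam: float, delta: float) -> float:
    """Extrinsic return plus the multiplier-weighted clipped distance bonus."""
    return ext_return + lam * clipped_distance_bonus(traj, demos, delta)


if __name__ == "__main__":
    demos = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]]  # one toy demonstration
    traj = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]    # parallel path, 1 unit away
    print(distance_to_demos(traj, demos))              # 1.0
    print(clipped_distance_bonus(traj, demos, 0.5))    # 0.5
    print(shaped_return(10.0, traj, demos, 2.0, 0.5))  # 11.0
```

Suppose `lam` is held fixed. In this sketch the bonus then stops growing once the trajectory is `delta` away from every stored demonstration. The agent is pushed off previously visited paths without being rewarded for wandering arbitrarily far. The dual step lowers the weight of the bonus whenever the distance constraint is already met.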

Pages: 20

Related records (items 21-30 of 50):

[21] Adaptive Bitrate Algorithms via Deep Reinforcement Learning With Digital Twins Assisted Trajectory
Ye, Jin
Qin, Shaowen
Xiao, Qingyu
Jiang, Wenchao
Tang, Xin
Li, Xiaohuan
IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2024, 11 (04): 3522-3535
[22] Deep Reinforcement Learning for Adaptive Learning Systems
Li, Xiao
Xu, Hanchen
Zhang, Jinming
Chang, Hua-hua
JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS, 2023, 48 (02): 220-243
[23] Adaptive Exploration Strategy With Multi-Attribute Decision-Making for Reinforcement Learning
Hu, Chunyang
Xu, Meng
IEEE ACCESS, 2020, 8: 32353-32364
[24] An adaptive testing item selection strategy via a deep reinforcement learning approach
Wang, Pujue
Liu, Hongyun
Xu, Mingqi
BEHAVIOR RESEARCH METHODS, 2024, 56 (08): 8695-8714
[25] A stochastic exploration strategy for satisficing reinforcement learning
Katayama, S
Kobayashi, S
INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998: 296-303
[26] Deep Adaptive Control: Deep Reinforcement Learning-Based Adaptive Vehicle Trajectory Control Algorithms for Different Risk Levels
He, Yixu
Liu, Yang
Yang, Lan
Qu, Xiaobo
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01): 1654-1666
[27] An improved frontier-based robot exploration strategy combined with deep reinforcement learning
Wang, Rui
Zhang, Jie
Lyu, Ming
Yan, Cheng
Chen, Yaowei
ROBOTICS AND AUTONOMOUS SYSTEMS, 2024, 181
[28] Deep Reinforcement Learning for Trajectory Path Planning and Distributed Inference in Resource-Constrained UAV Swarms
Dhuheir, Marwan
Baccour, Emna
Erbad, Aiman
Al-Obaidi, Sinan Sabeeh
Hamdi, Mounir
IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (09): 8185-8201
[29] Trajectory Optimization for Unknown Constrained Systems using Reinforcement Learning
Ota, Kei
Jha, Devesh K.
Oiki, Tomoaki
Miura, Mamoru
Nammoto, Takashi
Nikovski, Daniel
Mariyama, Toshisada
2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019: 3487-3494
[30] Autonomous exploration through deep reinforcement learning
Yan, Xiangda
Huang, Jie
He, Keyan
Hong, Huajie
Xu, Dasheng
INDUSTRIAL ROBOT-THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH AND APPLICATION, 2023, 50 (05): 793-803
