Reward Space Noise for Exploration in Deep Reinforcement Learning

Cited: 1
Authors
Sun, Chuxiong [1 ]
Wang, Rui [1 ]
Li, Qian [2 ]
Hu, Xiaohui [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Software, 4 South Fourth St, Beijing, Peoples R China
[2] Univ Technol Sydney, Coll Comp Sci & Technol, Sydney, NSW 2007, Australia
[3] Chinese Acad Sci, Inst Software, 4 South Fourth St, Beijing, Peoples R China
Keywords
Reinforcement learning; exploration-exploitation; deep learning
DOI
10.1142/S0218001421520133
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
A fundamental challenge for reinforcement learning (RL) is how to explore efficiently in initially unknown environments. Most state-of-the-art RL algorithms rely on action space noise to drive exploration. These classical strategies are computationally efficient and straightforward to implement, but they can fail in complex environments. To address this issue, we propose a novel strategy named reward space noise (RSN) for farsighted and consistent exploration in RL. By injecting stochasticity into the reward space, we change the agent's understanding of the environment and thereby perturb its behavior. We find that this simple mechanism achieves consistent exploration and scales to complex domains without intensive computational cost. To demonstrate the effectiveness and scalability of the proposed method, we implement a deep Q-learning agent with reward noise and evaluate its exploratory performance on a set of Atari games that are challenging for the naive epsilon-greedy strategy. The results show that reward noise outperforms action noise in most games and performs comparably in the others. Concretely, in early training the best exploratory performance of reward noise is clearly better than that of action noise, which demonstrates that reward noise quickly explores valuable states and aids in finding the optimal policy. Moreover, the average scores and learning efficiency of reward noise remain higher than those of action noise throughout training, indicating that reward noise yields more stable and consistent performance.
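
The abstract describes the mechanism only at a high level: stochasticity is injected into the reward signal rather than into action selection, so exploration emerges from perturbed value estimates. The tabular Q-learning sketch below illustrates one plausible reading of that idea on a toy chain MDP; the zero-mean Gaussian noise, the annealing schedule, and all parameter values are assumptions made for illustration, not the authors' configuration (the paper itself uses deep Q-learning on Atari).

```python
import numpy as np

# Minimal sketch of reward space noise (RSN) in tabular Q-learning.
# The Gaussian noise and annealing schedule are illustrative assumptions,
# not the authors' exact configuration.

rng = np.random.default_rng(0)

n_states, n_actions = 10, 2          # toy chain MDP: actions = move left / move right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99             # learning rate, discount factor
sigma = 0.5                          # reward-noise scale (assumed)

def step(s, a):
    """Deterministic chain: reward 1 only for reaching the rightmost state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, 1.0 if s_next == n_states - 1 else 0.0

for episode in range(500):
    s = 0
    for t in range(50):
        a = int(Q[s].argmax())                 # purely greedy: no action-space noise
        s_next, r = step(s, a)
        r_noisy = r + rng.normal(0.0, sigma)   # perturb the reward, not the action
        td_target = r_noisy + gamma * Q[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next
    sigma *= 0.995                             # anneal the noise as learning progresses

# The noisy targets perturb the value estimates, which in turn varies the
# greedy policy early on: exploration emerges from the reward noise alone.
print("greedy policy:", Q.argmax(axis=1))      # should converge to "move right"
```

Unlike epsilon-greedy, which randomizes each action independently, the noise here enters through the learned Q-values, so the behavior it induces stays consistent across consecutive steps, matching the paper's claim of consistent exploration.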
Pages: 21
Related Papers
50 records in total
  • [1] Reward-Based Exploration: Adaptive Control for Deep Reinforcement Learning
    Xu, Zhi-xiong
    Cao, Lei
    Chen, Xi-liang
    Li, Chen-xi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (09): 2409-2412
  • [2] Reward poisoning attacks in deep reinforcement learning based on exploration strategies
    Cai, Kanting
    Zhu, Xiangbin
    Hu, Zhaolong
    NEUROCOMPUTING, 2023, 553
  • [3] Reward-Free Exploration for Reinforcement Learning
    Jin, Chi
    Krishnamurthy, Akshay
    Simchowitz, Max
    Yu, Tiancheng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [4] A Novel State Space Exploration Method for the Sparse-Reward Reinforcement Learning Environment
    Liu, Xi
    Ma, Long
    Chen, Zhen
    Zheng, Changgang
    Chen, Ren
    Liao, Yong
    Yang, Shufan
    ARTIFICIAL INTELLIGENCE XL, AI 2023, 2023, 14381: 216-221
  • [5] Occupancy Reward-Driven Exploration with Deep Reinforcement Learning for Mobile Robot System
    Kamalova, Albina
    Lee, Suk Gyu
    Kwon, Soon Hak
    APPLIED SCIENCES-BASEL, 2022, 12 (18)
  • [6] Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes
    Yang, Yulong
    Cao, Weihua
    Guo, Linwei
    Gan, Chao
    Wu, Min
    2023 IEEE 6TH INTERNATIONAL CONFERENCE ON INDUSTRIAL CYBER-PHYSICAL SYSTEMS, ICPS, 2023
  • [7] Skill Reward for Safe Deep Reinforcement Learning
    Cheng, Jiangchang
    Yu, Fumin
    Zhang, Hongliang
    Dai, Yinglong
    UBIQUITOUS SECURITY, 2022, 1557: 203-213
  • [8] Hindsight Reward Shaping in Deep Reinforcement Learning
    de Villiers, Byron
    Sabatta, Deon
    2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020: 653-659
  • [9] Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios
    Fang, Baofu
    Yu, Tingting
    Wang, Hao
    Wang, Zaijun
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 37 (05): 435-446
  • [10] Long-Term Visitation Value for Deep Exploration in Sparse-Reward Reinforcement Learning
    Parisi, Simone
    Tateo, Davide
    Hensel, Maximilian
    D'Eramo, Carlo
    Peters, Jan
    Pajarinen, Joni
    ALGORITHMS, 2022, 15 (03)