Reward Space Noise for Exploration in Deep Reinforcement Learning

Cited: 1
Authors
Sun, Chuxiong [1 ]
Wang, Rui [1 ]
Li, Qian [2 ]
Hu, Xiaohui [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Software, 4 South Fourth St, Beijing, Peoples R China
[2] Univ Technol Sydney, Coll Comp Sci & Technol, Sydney, NSW 2007, Australia
[3] Chinese Acad Sci, Inst Software, 4 South Fourth St, Beijing, Peoples R China
Keywords
Reinforcement learning; exploration-exploitation; deep learning
DOI
10.1142/S0218001421520133
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
A fundamental challenge for reinforcement learning (RL) is how to explore efficiently in initially unknown environments. Most state-of-the-art RL algorithms rely on action space noise to drive exploration. These classical strategies are computationally efficient and straightforward to implement, but they can fail in complex environments. To address this issue, we propose a novel strategy named reward space noise (RSN) for farsighted and consistent exploration in RL. By injecting stochasticity into the reward space, we change the agent's understanding of the environment and thereby perturb its behavior. We find that this simple mechanism achieves consistent exploration and scales to complex domains without intensive computational cost. To demonstrate the effectiveness and scalability of the proposed method, we implement a deep Q-learning agent with reward noise and evaluate its exploratory performance on a set of Atari games that are challenging for the naive epsilon-greedy strategy. The results show that reward noise outperforms action noise in most games and performs comparably in the others. Concretely, in early training the best exploratory performance of reward noise is clearly better than that of action noise, which demonstrates that reward noise quickly explores valuable states and aids in finding the optimal policy. Moreover, the average scores and learning efficiency of reward noise remain higher than those of action noise throughout training, indicating that reward noise yields more stable and consistent performance.
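
The abstract describes the mechanism only at a high level: stochasticity is injected into the reward signal rather than into action selection, so exploration emerges from perturbed value estimates. The tabular Q-learning sketch below illustrates one plausible reading of that idea on a toy chain MDP; the zero-mean Gaussian noise, the annealing schedule, and all parameter values are assumptions made for illustration, not the authors' configuration (the paper itself uses deep Q-learning on Atari).

```python
import numpy as np

# Minimal sketch of reward space noise (RSN) in tabular Q-learning.
# The Gaussian noise and annealing schedule are illustrative assumptions,
# not the authors' exact configuration.

rng = np.random.default_rng(0)

n_states, n_actions = 10, 2          # toy chain MDP: actions = move left / move right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99             # learning rate, discount factor
sigma = 0.5                          # reward-noise scale (assumed)

def step(s, a):
    """Deterministic chain: reward 1 only for reaching the rightmost state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, 1.0 if s_next == n_states - 1 else 0.0

for episode in range(500):
    s = 0
    for t in range(50):
        a = int(Q[s].argmax())                 # purely greedy: no action-space noise
        s_next, r = step(s, a)
        r_noisy = r + rng.normal(0.0, sigma)   # perturb the reward, not the action
        td_target = r_noisy + gamma * Q[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next
    sigma *= 0.995                             # anneal the noise as learning progresses

# The noisy targets perturb the value estimates, which in turn varies the
# greedy policy early on: exploration emerges from the reward noise alone.
print("greedy policy:", Q.argmax(axis=1))      # should converge to "move right"
```

Unlike epsilon-greedy, which randomizes each action independently, the noise here enters through the learned Q-values, so the behavior it induces stays consistent across consecutive steps, matching the paper's claim of consistent exploration.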
Pages: 21
Related Papers
50 records in total
  • [1] Reward-Based Exploration: Adaptive Control for Deep Reinforcement Learning
    Xu, Zhi-xiong
    Cao, Lei
    Chen, Xi-liang
    Li, Chen-xi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (09): 2409-2412
  • [2] Reward poisoning attacks in deep reinforcement learning based on exploration strategies
    Cai, Kanting
    Zhu, Xiangbin
    Hu, Zhaolong
    NEUROCOMPUTING, 2023, 553
  • [3] Reward-Free Exploration for Reinforcement Learning
    Jin, Chi
    Krishnamurthy, Akshay
    Simchowitz, Max
    Yu, Tiancheng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [4] A Novel State Space Exploration Method for the Sparse-Reward Reinforcement Learning Environment
    Liu, Xi
    Ma, Long
    Chen, Zhen
    Zheng, Changgang
    Chen, Ren
    Liao, Yong
    Yang, Shufan
    ARTIFICIAL INTELLIGENCE XL, AI 2023, 2023, 14381: 216-221
  • [5] Occupancy Reward-Driven Exploration with Deep Reinforcement Learning for Mobile Robot System
    Kamalova, Albina
    Lee, Suk Gyu
    Kwon, Soon Hak
    APPLIED SCIENCES-BASEL, 2022, 12 (18)
  • [6] Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes
    Yang, Yulong
    Cao, Weihua
    Guo, Linwei
    Gan, Chao
    Wu, Min
    2023 IEEE 6TH INTERNATIONAL CONFERENCE ON INDUSTRIAL CYBER-PHYSICAL SYSTEMS, ICPS, 2023
  • [7] Skill Reward for Safe Deep Reinforcement Learning
    Cheng, Jiangchang
    Yu, Fumin
    Zhang, Hongliang
    Dai, Yinglong
    UBIQUITOUS SECURITY, 2022, 1557: 203-213
  • [8] Hindsight Reward Shaping in Deep Reinforcement Learning
    de Villiers, Byron
    Sabatta, Deon
    2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020: 653-659
  • [9] Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios
    Fang, Baofu
    Yu, Tingting
    Wang, Hao
    Wang, Zaijun
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 37 (05): 435-446
  • [10] Long-Term Visitation Value for Deep Exploration in Sparse-Reward Reinforcement Learning
    Parisi, Simone
    Tateo, Davide
    Hensel, Maximilian
    D'Eramo, Carlo
    Peters, Jan
    Pajarinen, Joni
    ALGORITHMS, 2022, 15 (03)