Reward Space Noise for Exploration in Deep Reinforcement Learning

Cited by: 1
Authors
Sun, Chuxiong [1 ]
Wang, Rui [1 ]
Li, Qian [2 ]
Hu, Xiaohui [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Software, 4 South Fourth St, Beijing, Peoples R China
[2] Univ Technol Sydney, Coll Comp Sci & Technol, Sydney, NSW 2007, Australia
[3] Chinese Acad Sci, Inst Software, 4 South Fourth St, Beijing, Peoples R China
Keywords
Reinforcement learning; exploration-exploitation; deep learning;
DOI
10.1142/S0218001421520133
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
A fundamental challenge for reinforcement learning (RL) is achieving efficient exploration in initially unknown environments. Most state-of-the-art RL algorithms drive exploration with action space noise. These classical strategies are computationally efficient and straightforward to implement, but they can fail to explore effectively in complex environments. To address this issue, we propose a novel strategy named reward space noise (RSN) for farsighted and consistent exploration in RL. By injecting stochasticity into the reward space, we change the agent's understanding of the environment and thereby perturb its behavior. We find that simple RSN achieves consistent exploration and scales to complex domains without intensive computational cost. To demonstrate the effectiveness and scalability of the proposed method, we implement a deep Q-learning agent with reward noise and evaluate its exploratory performance on a set of Atari games that are challenging for the naive epsilon-greedy strategy. The results show that reward noise outperforms action noise in most games and performs comparably in the others. Concretely, early in training the best exploratory performance of reward noise is clearly better than that of action noise, which demonstrates that reward noise can quickly explore valuable states and aid in finding the optimal policy. Moreover, the average scores and learning efficiency of reward noise remain higher than those of action noise throughout training, indicating that reward noise yields more stable and consistent performance.
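The abstract gives no implementation details, so the following is only an illustrative sketch of the core idea: perturb the reward before the temporal-difference update so that exploration arises from noise in reward space rather than from epsilon-greedy noise in action space. A minimal tabular Q-learning update is assumed here (the paper's agent is a deep Q-network on Atari), and Gaussian reward noise with standard deviation `reward_noise_std` is an assumed noise form.

```python
import numpy as np

def q_update(Q, s, a, r, s_next,
             alpha=0.1, gamma=0.99, reward_noise_std=0.5, rng=None):
    """One tabular Q-learning step with reward-space noise.

    The reward is perturbed before computing the TD target, so the
    agent's value estimates (and hence its greedy behavior) are
    stochastically perturbed, instead of randomizing the action itself.
    """
    rng = rng if rng is not None else np.random.default_rng()
    r_noisy = r + rng.normal(0.0, reward_noise_std)  # noise injected in reward space
    td_target = r_noisy + gamma * Q[s_next].max()    # standard Q-learning target
    Q[s, a] += alpha * (td_target - Q[s, a])         # TD update
    return Q
```

With `reward_noise_std=0` this reduces to ordinary Q-learning; the greedy policy over the perturbed Q-table can then be followed without any epsilon-greedy randomization, which is what distinguishes this scheme from action-space noise.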
Pages: 21