Reward Space Noise for Exploration in Deep Reinforcement Learning

Cited by: 1
Authors
Sun, Chuxiong [1 ]
Wang, Rui [1 ]
Li, Qian [2 ]
Hu, Xiaohui [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Software, 4,South Fourth St, Beijing, Peoples R China
[2] Univ Technol Sydney, Coll Comp Sci & Technol, Sydney, NSW 2007, Australia
[3] Chinese Acad Sci, Inst Software, 4 South Fourth St, Beijing, Peoples R China
Keywords
Reinforcement learning; exploration-exploitation; deep learning
DOI
10.1142/S0218001421520133
CLC number (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A fundamental challenge for reinforcement learning (RL) is how to achieve efficient exploration in initially unknown environments. Most state-of-the-art RL algorithms drive exploration with action space noise. These classical strategies are computationally efficient and straightforward to implement, but they may fail to explore effectively in complex environments. To address this issue, we propose a novel strategy named reward space noise (RSN) for farsighted and consistent exploration in RL. By introducing stochasticity from the reward space, we change the agent's understanding of the environment and thereby perturb its behavior. We find that this simple mechanism achieves consistent exploration and scales to complex domains without intensive computational cost. To demonstrate the effectiveness and scalability of the proposed method, we implement a deep Q-learning agent with reward noise and evaluate its exploratory performance on a set of Atari games that are challenging for the naive epsilon-greedy strategy. The results show that reward noise outperforms action noise in most games and performs comparably in the others. Concretely, in early training the best exploratory performance of reward noise is clearly better than that of action noise, which demonstrates that reward noise quickly explores valuable states and aids in finding the optimal policy. Moreover, the average scores and learning efficiency of reward noise remain higher than those of action noise throughout training, which indicates that reward noise yields more stable and consistent performance.
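The abstract describes perturbing the reward signal, rather than the action, so that the agent's value estimates and hence its greedy behavior are stochastically shifted. The sketch below illustrates the general idea in a DQN-style TD target; the noise distribution, its scale, and the function name are illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np

def td_target_with_reward_noise(reward, next_q_values, done,
                                gamma=0.99, noise_sigma=0.1,
                                rng=np.random.default_rng(0)):
    """Compute a DQN TD target with noise injected into the reward.

    Unlike epsilon-greedy (action space noise), the perturbation enters
    through the reward, so it propagates into the learned Q-values and
    can change behavior consistently across many steps.
    Gaussian noise with std `noise_sigma` is an assumption for this sketch.
    """
    noisy_reward = reward + rng.normal(0.0, noise_sigma)
    # Standard one-step bootstrap: no bootstrapping on terminal states.
    return noisy_reward + gamma * (1.0 - done) * np.max(next_q_values)
```

With `noise_sigma=0` this reduces to the ordinary TD target, which makes the strategy easy to anneal toward pure exploitation as training progresses.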
Pages: 21