Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States

Cited: 0
Authors
Banerjee, Chayan [1 ]
Chen, Zhiyong [1 ]
Noman, Nasimul [2 ]
Affiliations
[1] Univ Newcastle, Sch Engn, Callaghan, NSW 2308, Australia
[2] Univ Newcastle, Sch Informat & Phys Sci, Callaghan, NSW 2308, Australia
Keywords
DOI
10.1109/CDC49753.2023.10383350
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Improving exploration and exploitation through more efficient use of samples is a critical issue in reinforcement learning algorithms. A basic strategy for a learning algorithm is to facilitate broad exploration of the entire environment state space while encouraging exploration of rarely visited states over frequently visited ones. Under this strategy, we propose a new method to boost exploration through an intrinsic reward based on a measurement of a state's novelty and the associated benefit of exploring that state, collectively called plausible novelty. By incentivizing the exploration of plausibly novel states, an actor-critic (AC) algorithm can improve its sample efficiency and, consequently, its training performance. The new method is verified through extensive simulations of continuous control tasks in MuJoCo environments, using a variety of prominent off-policy AC algorithms.
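The abstract describes shaping the extrinsic reward with an intrinsic bonus that combines a state's novelty with the expected benefit of visiting it. Below is a minimal Python sketch of that general idea, not the paper's actual formulation: the count-based novelty estimate over a hashed discretization, the critic-derived sigmoid "plausibility" weight, and the scale `beta` are all illustrative assumptions.

```python
# Illustrative sketch only: the paper's exact definition of "plausible
# novelty" is not reproduced here. Novelty is approximated by a visit
# count over a coarse hash of the state, and "plausibility" by a squashed
# critic value estimate; both choices are assumptions for illustration.
from collections import defaultdict

import numpy as np


class PlausibleNoveltyBonus:
    """Adds an intrinsic bonus for states that are both novel and promising."""

    def __init__(self, critic_value_fn, beta=0.1, cell_size=0.5):
        self.critic_value_fn = critic_value_fn  # maps state -> scalar V(s)
        self.beta = beta                        # intrinsic bonus scale
        self.cell_size = cell_size              # state discretization width
        self.counts = defaultdict(int)          # visit counts per cell

    def _cell(self, state):
        # Coarse hash of a continuous state for count-based novelty.
        return tuple(np.floor(np.asarray(state) / self.cell_size).astype(int))

    def shaped_reward(self, state, extrinsic_reward):
        cell = self._cell(state)
        self.counts[cell] += 1
        novelty = 1.0 / np.sqrt(self.counts[cell])      # rarely visited -> high
        value = self.critic_value_fn(state)
        plausibility = 1.0 / (1.0 + np.exp(-value))     # promising -> high
        return extrinsic_reward + self.beta * novelty * plausibility


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy critic: treats the sum of state coordinates as the value estimate.
    shaper = PlausibleNoveltyBonus(critic_value_fn=lambda s: float(np.sum(s)))
    for _ in range(5):
        state = rng.normal(size=3)
        print(shaper.shaped_reward(state, extrinsic_reward=0.0))
```

In a full off-policy AC pipeline (e.g., SAC or TD3), such a shaped reward would typically replace or augment the environment reward stored in the replay buffer; the multiplicative combination above is one plausible design choice among several, not the paper's.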
Pages: 7009 - 7014
Page count: 6
Related Papers
50 items in total
  • [31] Developing adaptive traffic signal control by actor-critic and direct exploration methods
    Aslani, Mohammad
    Mesgari, Mohammad Saadi
    Seipel, Stefan
    Wiering, Marco
    PROCEEDINGS OF THE INSTITUTION OF CIVIL ENGINEERS-TRANSPORT, 2019, 172 (05) : 289 - 298
  • [32] Variance-constrained actor-critic algorithms for discounted and average reward MDPs
    Prashanth, L. A.
    Ghavamzadeh, Mohammad
    MACHINE LEARNING, 2016, 105 (03) : 367 - 417
  • [33] Distributed Actor-Critic Algorithms for Multiagent Reinforcement Learning Over Directed Graphs
    Dai, Pengcheng
    Yu, Wenwu
    Wang, He
    Baldi, Simone
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (10) : 7210 - 7221
  • [35] Natural Gradient Actor-Critic Algorithms using Random Rectangular Coarse Coding
    Kimura, Hajime
    2008 PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-7, 2008, : 1945 - 1952
  • [36] Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms
    Laroche, Romain
    Tachet des Combes, Remi
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151 : 5658 - 5688
  • [37] Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
    Jia, Yanwei
    Zhou, Xun Yu
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [38] A Novel Heterogeneous Actor-critic Algorithm with Recent Emphasizing Replay Memory
    Xi, Bao
    Wang, Rui
    Cai, Ying-Hao
    Lu, Tao
    Wang, Shuo
    INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2021, 18 : 619 - 631
  • [39] Model Learning Actor-Critic Algorithms: Performance Evaluation in a Motion Control Task
    Grondman, Ivo
    Busoniu, Lucian
    Babuska, Robert
    2012 IEEE 51ST ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2012, : 5272 - 5277
  • [40] Algorithms for Variance Reduction in a Policy-Gradient Based Actor-Critic Framework
    Awate, Yogesh P.
    ADPRL: 2009 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2009, : 130 - 136