Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States

Authors
Banerjee, Chayan [1]
Chen, Zhiyong [1]
Noman, Nasimul [2]
Affiliations
[1] University of Newcastle, School of Engineering, Callaghan, NSW 2308, Australia
[2] University of Newcastle, School of Information and Physical Sciences, Callaghan, NSW 2308, Australia
DOI
10.1109/CDC49753.2023.10383350
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Improving exploration and exploitation through more efficient use of samples is a critical issue in reinforcement learning. A basic strategy for a learning algorithm is to explore the entire state space of the environment indiscriminately, while favoring rarely visited states over frequently visited ones. Following this strategy, we propose a new method that boosts exploration through an intrinsic reward based on two measures: a state's novelty and the benefit of exploring that state, collectively called plausible novelty. By incentivizing the exploration of plausible novel states, an actor-critic (AC) algorithm can improve its sample efficiency and, consequently, its training performance. The new method is validated through extensive simulations of continuous control tasks in MuJoCo environments, using several prominent off-policy AC algorithms.
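The abstract does not say how novelty or benefit is measured. As a rough illustration only, the sketch below shows one way an intrinsic reward could combine the two. It pairs a count-based novelty bonus, a standard technique swapped in here, with a hypothetical benefit weight derived from a critic's value estimate. The product of the two is then added to the extrinsic reward. The class name `PlausibleNoveltyBonus`, the grid discretization, and the parameters `beta`, `bin_size` and `temperature` are all assumptions, not the authors' formulation.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Sequence, Tuple


def _stable_sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class PlausibleNoveltyBonus:
    """Hypothetical intrinsic-reward shaper; NOT the paper's formulation.

    novelty(s): count-based bonus 1/sqrt(N(s)) over a discretized state.
    benefit(s): assumed weight in (0, 1) from a critic's value estimate V(s).
    """

    def __init__(
        self,
        value_fn: Callable[[Sequence[float]], float],
        beta: float = 0.1,
        bin_size: float = 0.5,
        temperature: float = 1.0,
    ) -> None:
        self.value_fn = value_fn        # assumed critic estimate V(s)
        self.beta = beta                # assumed scale of the intrinsic term
        self.bin_size = bin_size        # assumed grid width for counting visits
        self.temperature = temperature  # assumed softness of the benefit weight
        self.counts: Dict[Tuple[int, ...], int] = defaultdict(int)

    def _cell(self, state: Sequence[float]) -> Tuple[int, ...]:
        # Map a continuous state to a grid cell so visits can be counted.
        return tuple(math.floor(x / self.bin_size) for x in state)

    def novelty(self, state: Sequence[float]) -> float:
        # Rarely visited cells get larger bonuses; max(..., 1) avoids 1/0.
        count = self.counts.get(self._cell(state), 0)
        return 1.0 / math.sqrt(max(count, 1))

    def benefit(self, state: Sequence[float]) -> float:
        # Higher critic value -> weight closer to 1 ("more plausible").
        return _stable_sigmoid(self.value_fn(state) / self.temperature)

    def shaped_reward(self, state: Sequence[float], extrinsic: float) -> float:
        # Record the visit, then add beta * novelty * benefit to the task reward.
        self.counts[self._cell(state)] += 1
        intrinsic = self.novelty(state) * self.benefit(state)
        return extrinsic + self.beta * intrinsic


# Usage with a dummy critic that rates every state at 0 (benefit = 0.5).
bonus = PlausibleNoveltyBonus(value_fn=lambda s: 0.0)
print(bonus.shaped_reward([0.1, 0.2], 1.0))  # first visit: 1.0 + 0.1 * 1.0 * 0.5
print(bonus.shaped_reward([0.1, 0.2], 1.0))  # repeat visit: smaller bonus
```

In an off-policy AC loop, the shaped reward would take the place of the environment reward when computing the critic's target. Because the terms are multiplied, a novel state that the critic rates poorly still earns little bonus. This is one possible reading of "plausible" and not necessarily the authors'.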
Pages: 7009-7014
Page count: 6