Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States

Cited by: 0
Authors
Banerjee, Chayan [1 ]
Chen, Zhiyong [1 ]
Noman, Nasimul [2 ]
Affiliations
[1] Univ Newcastle, Sch Engn, Callaghan, NSW 2308, Australia
[2] Univ Newcastle, Sch Informat & Phys Sci, Callaghan, NSW 2308, Australia
DOI
10.1109/CDC49753.2023.10383350
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Improving exploration and exploitation through more efficient use of samples is a critical issue in reinforcement learning. A basic strategy for a learning algorithm is to facilitate exploration across the entire environment state space while favoring rarely visited states over frequently visited ones. Following this strategy, we propose a new method that boosts exploration through an intrinsic reward based on a measure of a state's novelty together with the associated benefit of exploring that state, collectively called plausible novelty. By incentivizing the exploration of plausibly novel states, an actor-critic (AC) algorithm can improve its sample efficiency and, consequently, its training performance. The method is verified through extensive simulations of continuous control tasks in MuJoCo environments, using a variety of prominent off-policy AC algorithms.
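The record does not include the paper's exact formulation, but the abstract's core idea (shaping the extrinsic reward with an intrinsic bonus that combines a state's novelty with the benefit of exploring it) can be illustrated with a minimal sketch. Everything below is an assumption chosen for illustration, not the authors' method: the hash-based visit counts standing in for the novelty measure, the caller-supplied `plausibility` weight standing in for the benefit term (which in the paper could, for instance, come from the critic), and the scale `beta`.

```python
import numpy as np

class PlausibleNoveltyBonus:
    """Illustrative intrinsic-reward shaper: bonus = beta * novelty * plausibility."""

    def __init__(self, beta=0.1, bins=20, low=-1.0, high=1.0):
        self.beta = beta              # scale of the intrinsic bonus (assumed hyperparameter)
        self.bins = bins              # discretization resolution for visit counting
        self.low, self.high = low, high
        self.counts = {}              # visit counts over discretized states

    def _key(self, state):
        # Coarse binning so visit counts are well-defined in a
        # continuous (MuJoCo-style) state space.
        frac = (np.asarray(state) - self.low) / (self.high - self.low)
        idx = np.clip((frac * self.bins).astype(int), 0, self.bins - 1)
        return idx.tobytes()

    def novelty(self, state):
        # Count-based novelty: rarely visited states score closer to 1.
        return 1.0 / np.sqrt(self.counts.get(self._key(state), 0) + 1.0)

    def shape(self, state, ext_reward, plausibility):
        # `plausibility` in [0, 1] stands in for the benefit of exploring
        # the state; here it is simply an input supplied by the caller.
        bonus = self.beta * self.novelty(state) * float(np.clip(plausibility, 0.0, 1.0))
        key = self._key(state)
        self.counts[key] = self.counts.get(key, 0) + 1
        return ext_reward + bonus

# Hypothetical usage inside an off-policy AC training loop: the shaped
# reward replaces the environment reward in the stored transition, so the
# critic is trained against the novelty-augmented signal.
#   shaper = PlausibleNoveltyBonus(beta=0.1)
#   r_total = shaper.shape(next_state, r_env, plausibility=0.8)
#   replay_buffer.add(state, action, r_total, next_state, done)
```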
Pages: 7009-7014
Number of pages: 6
Related Papers
50 records in total
  • [21] A Critical Point Analysis of Actor-Critic Algorithms with Neural Networks
    Gottwald, Martin
    Shen, Hao
    Diepold, Klaus
    IFAC PAPERSONLINE, 2022, 55 (15): 27-32
  • [22] Parametrized actor-critic algorithms for finite-horizon MDPs
    Abdulla, Mohammed Shahid
    Bhatnagar, Shalabh
    2007 AMERICAN CONTROL CONFERENCE, VOLS 1-13, 2007: 2701-2706
  • [23] Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
    Xu, Tengyu
    Wang, Zhe
    Liang, Yingbin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [24] An Advantage Actor-Critic Algorithm with Confidence Exploration for Open Information Extraction
    Liu, Guiliang
    Li, Xu
    Sun, Mingming
    Li, Ping
    PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM), 2020: 217-225
  • [25] SafeTAC: Safe Tsallis Actor-Critic Reinforcement Learning for Safer Exploration
    Kim, Dohyeong
    Heo, Jaeseok
    Oh, Songhwai
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022: 4070-4075
  • [26] A constrained optimization perspective on actor-critic algorithms and application to network routing
    Prashanth, L. A.
    Prasad, H. L.
    Bhatnagar, Shalabh
    Chandra, Prakash
    SYSTEMS & CONTROL LETTERS, 2016, 92: 46-51
  • [27] Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
    Diddigi, Raghuram Bharadwaj
    Reddy, D. Sai Koti
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019: 1931-1933
  • [29] Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms
    Zheng, Liyuan
    Fiez, Tanner
    Alumbaugh, Zane
    Chasnov, Benjamin
    Ratliff, Lillian J.
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 9217-9224
  • [30] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Trivedi, Prashant
    Hemachandra, Nandyala
    DYNAMIC GAMES AND APPLICATIONS, 2023, 13 (01): 25-55