Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States

Cited by: 0

Authors:

Banerjee, Chayan [1]

Chen, Zhiyong [1]

Noman, Nasimul [2]

Affiliations:

[1] Univ Newcastle, Sch Engn, Callaghan, NSW 2308, Australia

[2] Univ Newcastle, Sch Informat & Phys Sci, Callaghan, NSW 2308, Australia

Source:

2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC | 2023

DOI:

10.1109/CDC49753.2023.10383350

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Improvement of exploration and exploitation using more efficient samples is a critical issue in reinforcement learning algorithms. A basic strategy of a learning algorithm is to facilitate indiscriminate exploration of the entire environment state space, as well as to encourage exploration of rarely visited states rather than frequently visited ones. Under this strategy, we propose a new method to boost exploration through an intrinsic reward, based on the measurement of a state's novelty and the associated benefit of exploring the state, collectively called plausible novelty. By incentivizing exploration of plausible novel states, an actor-critic (AC) algorithm can improve its sample efficiency and, consequently, its training performance. The new method is verified through extensive simulations of continuous control tasks in MuJoCo environments, using a variety of prominent off-policy AC algorithms.
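
The abstract does not give the formulas behind plausible novelty, so the following is only a minimal sketch of the general idea: an intrinsic bonus that is large when a state is both rarely visited and estimated to be worth exploring. Every name and formula here is an assumption made for illustration and is not taken from the paper: the class `PlausibleNoveltyBonus`, the grid-based visit counts used as the novelty measure, the sigmoid of a critic value estimate used as the benefit measure, and the parameters `bin_width` and `scale`.

```python
import math
from collections import defaultdict


class PlausibleNoveltyBonus:
    """Illustrative intrinsic reward: novelty score times benefit score.

    Both scores are assumptions for this sketch, not the paper's definitions.
    """

    def __init__(self, bin_width=0.5, scale=1.0):
        self.bin_width = bin_width  # hypothetical grid width for counting visits
        self.scale = scale          # hypothetical weight of the intrinsic bonus
        self.counts = defaultdict(int)

    def _key(self, state):
        # Discretise a continuous state into a grid cell.
        return tuple(math.floor(x / self.bin_width) for x in state)

    def novelty(self, state):
        # Count-based novelty: 1 for an unseen cell, decaying with visits.
        return 1.0 / math.sqrt(self.counts[self._key(state)] + 1)

    @staticmethod
    def benefit(value_estimate):
        # Numerically stable sigmoid of a critic value estimate, in [0, 1].
        if value_estimate >= 0:
            return 1.0 / (1.0 + math.exp(-value_estimate))
        e = math.exp(value_estimate)
        return e / (1.0 + e)

    def reward(self, state, extrinsic_reward, value_estimate):
        # Augment the environment reward, then record the visit.
        bonus = self.scale * self.novelty(state) * self.benefit(value_estimate)
        self.counts[self._key(state)] += 1
        return extrinsic_reward + bonus


bonus = PlausibleNoveltyBonus()
first = bonus.reward((0.1, 0.2), extrinsic_reward=0.0, value_estimate=0.0)
second = bonus.reward((0.1, 0.2), extrinsic_reward=0.0, value_estimate=0.0)
print(first > second)  # the bonus shrinks once the cell has been visited
```

In an off-policy AC training loop, the augmented reward would presumably replace the environment reward when transitions are stored in the replay buffer; how the paper actually integrates the bonus is not stated in this record.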

Pages: 7009 - 7014

Page count: 6

50 items in total

[1] Actor-critic algorithms
Konda, VR
Tsitsiklis, JN
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1008 - 1014
[2] On actor-critic algorithms
Konda, VR
Tsitsiklis, JN
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2003, 42 (04) : 1143 - 1166
[3] Variational actor-critic algorithms
Zhu, Yuhua
Ying, Lexing
ESAIM-CONTROL OPTIMISATION AND CALCULUS OF VARIATIONS, 2023, 29
[4] Natural actor-critic algorithms
Bhatnagar, Shalabh
Sutton, Richard S.
Ghavamzadeh, Mohammad
Lee, Mark
AUTOMATICA, 2009, 45 (11) : 2471 - 2482
[5] Better Exploration with Optimistic Actor-Critic
Ciosek, Kamil
Quan Vuong
Loftin, Robert
Hofmann, Katja
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[6] Importance sampling actor-critic algorithms
Williams, Jason L.
Fisher, John W., III
Willsky, Alan S.
2006 AMERICAN CONTROL CONFERENCE, VOLS 1-12, 2006, 1-12 : 1625 - +
[7] Actor-Critic Algorithms for Variance Minimization
Awate, Yogesh P.
TECHNOLOGICAL DEVELOPMENTS IN EDUCATION AND AUTOMATION, 2010, : 455 - 460
[8] Bias in Natural Actor-Critic Algorithms
Thomas, Philip S.
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
[9] Boosting On-Policy Actor-Critic With Shallow Updates in Critic
Li, Luntong
Zhu, Yuanheng
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 10
[10] Actor-Critic Algorithms with Online Feature Adaptation
Prabuchandran, K. J.
Bhatnagar, Shalabh
Borkar, Vivek S.
ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION, 2016, 26 (04)