Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States

Cited: 0
Authors
Banerjee, Chayan [1 ]
Chen, Zhiyong [1 ]
Noman, Nasimul [2 ]
Affiliations
[1] Univ Newcastle, Sch Engn, Callaghan, NSW 2308, Australia
[2] Univ Newcastle, Sch Informat & Phys Sci, Callaghan, NSW 2308, Australia
Keywords
DOI
10.1109/CDC49753.2023.10383350
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Improving exploration and exploitation through more efficient use of samples is a critical issue in reinforcement learning algorithms. A basic strategy for a learning algorithm is to facilitate broad exploration of the entire environment state space while encouraging exploration of rarely visited states over frequently visited ones. Under this strategy, we propose a new method to boost exploration through an intrinsic reward based on a measurement of a state's novelty and the associated benefit of exploring that state, collectively called plausible novelty. By incentivizing the exploration of plausibly novel states, an actor-critic (AC) algorithm can improve its sample efficiency and, consequently, its training performance. The new method is verified through extensive simulations of continuous control tasks in MuJoCo environments, using a variety of prominent off-policy AC algorithms.
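The abstract describes shaping the extrinsic reward with an intrinsic bonus that combines a state's novelty with the expected benefit of visiting it. Below is a minimal Python sketch of that general idea, not the paper's actual formulation: the count-based novelty estimate over a hashed discretization, the critic-derived sigmoid "plausibility" weight, and the scale `beta` are all illustrative assumptions.

```python
# Illustrative sketch only: the paper's exact definition of "plausible
# novelty" is not reproduced here. Novelty is approximated by a visit
# count over a coarse hash of the state, and "plausibility" by a squashed
# critic value estimate; both choices are assumptions for illustration.
from collections import defaultdict

import numpy as np


class PlausibleNoveltyBonus:
    """Adds an intrinsic bonus for states that are both novel and promising."""

    def __init__(self, critic_value_fn, beta=0.1, cell_size=0.5):
        self.critic_value_fn = critic_value_fn  # maps state -> scalar V(s)
        self.beta = beta                        # intrinsic bonus scale
        self.cell_size = cell_size              # state discretization width
        self.counts = defaultdict(int)          # visit counts per cell

    def _cell(self, state):
        # Coarse hash of a continuous state for count-based novelty.
        return tuple(np.floor(np.asarray(state) / self.cell_size).astype(int))

    def shaped_reward(self, state, extrinsic_reward):
        cell = self._cell(state)
        self.counts[cell] += 1
        novelty = 1.0 / np.sqrt(self.counts[cell])      # rarely visited -> high
        value = self.critic_value_fn(state)
        plausibility = 1.0 / (1.0 + np.exp(-value))     # promising -> high
        return extrinsic_reward + self.beta * novelty * plausibility


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy critic: treats the sum of state coordinates as the value estimate.
    shaper = PlausibleNoveltyBonus(critic_value_fn=lambda s: float(np.sum(s)))
    for _ in range(5):
        state = rng.normal(size=3)
        print(shaper.shaped_reward(state, extrinsic_reward=0.0))
```

In a full off-policy AC pipeline (e.g., SAC or TD3), such a shaped reward would typically replace or augment the environment reward stored in the replay buffer; the multiplicative combination above is one plausible design choice among several, not the paper's.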
Pages: 7009 - 7014
Page count: 6
Related Papers
50 items in total
  • [31] Developing adaptive traffic signal control by actor-critic and direct exploration methods
    Aslani, Mohammad
    Mesgari, Mohammad Saadi
    Seipel, Stefan
    Wiering, Marco
    PROCEEDINGS OF THE INSTITUTION OF CIVIL ENGINEERS-TRANSPORT, 2019, 172 (05) : 289 - 298
  • [32] Variance-constrained actor-critic algorithms for discounted and average reward MDPs
    Prashanth, L. A.
    Ghavamzadeh, Mohammad
    MACHINE LEARNING, 2016, 105 (03) : 367 - 417
  • [33] Distributed Actor-Critic Algorithms for Multiagent Reinforcement Learning Over Directed Graphs
    Dai, Pengcheng
    Yu, Wenwu
    Wang, He
    Baldi, Simone
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (10) : 7210 - 7221
  • [35] Natural Gradient Actor-Critic Algorithms using Random Rectangular Coarse Coding
    Kimura, Hajime
    2008 PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-7, 2008, : 1945 - 1952
  • [36] Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms
    Laroche, Romain
    Tachet des Combes, Remi
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151 : 5658 - 5688
  • [37] Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
    Jia, Yanwei
    Zhou, Xun Yu
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [38] A Novel Heterogeneous Actor-critic Algorithm with Recent Emphasizing Replay Memory
    Xi, Bao
    Wang, Rui
    Cai, Ying-Hao
    Lu, Tao
    Wang, Shuo
    INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2021, 18 : 619 - 631
  • [39] Model Learning Actor-Critic Algorithms: Performance Evaluation in a Motion Control Task
    Grondman, Ivo
    Busoniu, Lucian
    Babuska, Robert
    2012 IEEE 51ST ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2012, : 5272 - 5277
  • [40] Algorithms for Variance Reduction in a Policy-Gradient Based Actor-Critic Framework
    Awate, Yogesh P.
    ADPRL: 2009 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2009, : 130 - 136