Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States

Cited by: 0
Authors
Banerjee, Chayan [1 ]
Chen, Zhiyong [1 ]
Noman, Nasimul [2 ]
Affiliations
[1] Univ Newcastle, Sch Engn, Callaghan, NSW 2308, Australia
[2] Univ Newcastle, Sch Informat & Phys Sci, Callaghan, NSW 2308, Australia
DOI
10.1109/CDC49753.2023.10383350
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Improving exploration and exploitation through more efficient use of samples is a critical issue in reinforcement learning. A basic strategy for a learning algorithm is to facilitate exploration across the entire environment state space while favoring rarely visited states over frequently visited ones. Following this strategy, we propose a new method that boosts exploration through an intrinsic reward based on a measure of a state's novelty together with the associated benefit of exploring that state, collectively called plausible novelty. By incentivizing the exploration of plausibly novel states, an actor-critic (AC) algorithm can improve its sample efficiency and, consequently, its training performance. The method is verified through extensive simulations of continuous control tasks in MuJoCo environments, using a variety of prominent off-policy AC algorithms.
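The record does not include the paper's exact formulation, but the abstract's core idea (shaping the extrinsic reward with an intrinsic bonus that combines a state's novelty with the benefit of exploring it) can be illustrated with a minimal sketch. Everything below is an assumption chosen for illustration, not the authors' method: the hash-based visit counts standing in for the novelty measure, the caller-supplied `plausibility` weight standing in for the benefit term (which in the paper could, for instance, come from the critic), and the scale `beta`.

```python
import numpy as np

class PlausibleNoveltyBonus:
    """Illustrative intrinsic-reward shaper: bonus = beta * novelty * plausibility."""

    def __init__(self, beta=0.1, bins=20, low=-1.0, high=1.0):
        self.beta = beta              # scale of the intrinsic bonus (assumed hyperparameter)
        self.bins = bins              # discretization resolution for visit counting
        self.low, self.high = low, high
        self.counts = {}              # visit counts over discretized states

    def _key(self, state):
        # Coarse binning so visit counts are well-defined in a
        # continuous (MuJoCo-style) state space.
        frac = (np.asarray(state) - self.low) / (self.high - self.low)
        idx = np.clip((frac * self.bins).astype(int), 0, self.bins - 1)
        return idx.tobytes()

    def novelty(self, state):
        # Count-based novelty: rarely visited states score closer to 1.
        return 1.0 / np.sqrt(self.counts.get(self._key(state), 0) + 1.0)

    def shape(self, state, ext_reward, plausibility):
        # `plausibility` in [0, 1] stands in for the benefit of exploring
        # the state; here it is simply an input supplied by the caller.
        bonus = self.beta * self.novelty(state) * float(np.clip(plausibility, 0.0, 1.0))
        key = self._key(state)
        self.counts[key] = self.counts.get(key, 0) + 1
        return ext_reward + bonus

# Hypothetical usage inside an off-policy AC training loop: the shaped
# reward replaces the environment reward in the stored transition, so the
# critic is trained against the novelty-augmented signal.
#   shaper = PlausibleNoveltyBonus(beta=0.1)
#   r_total = shaper.shape(next_state, r_env, plausibility=0.8)
#   replay_buffer.add(state, action, r_total, next_state, done)
```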
Pages: 7009-7014
Number of pages: 6
Related Papers
50 records in total
  • [21] A Critical Point Analysis of Actor-Critic Algorithms with Neural Networks
    Gottwald, Martin
    Shen, Hao
    Diepold, Klaus
    IFAC PAPERSONLINE, 2022, 55 (15): 27-32
  • [22] Parametrized actor-critic algorithms for finite-horizon MDPs
    Abdulla, Mohammed Shahid
    Bhatnagar, Shalabh
    2007 AMERICAN CONTROL CONFERENCE, VOLS 1-13, 2007: 2701-2706
  • [23] Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
    Xu, Tengyu
    Wang, Zhe
    Liang, Yingbin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [24] An Advantage Actor-Critic Algorithm with Confidence Exploration for Open Information Extraction
    Liu, Guiliang
    Li, Xu
    Sun, Mingming
    Li, Ping
    PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM), 2020: 217-225
  • [25] SafeTAC: Safe Tsallis Actor-Critic Reinforcement Learning for Safer Exploration
    Kim, Dohyeong
    Heo, Jaeseok
    Oh, Songhwai
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022: 4070-4075
  • [26] A constrained optimization perspective on actor-critic algorithms and application to network routing
    Prashanth, L. A.
    Prasad, H. L.
    Bhatnagar, Shalabh
    Chandra, Prakash
    SYSTEMS & CONTROL LETTERS, 2016, 92: 46-51
  • [27] Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
    Diddigi, Raghuram Bharadwaj
    Reddy, D. Sai Koti
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019: 1931-1933
  • [29] Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms
    Zheng, Liyuan
    Fiez, Tanner
    Alumbaugh, Zane
    Chasnov, Benjamin
    Ratliff, Lillian J.
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 9217-9224
  • [30] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Trivedi, Prashant
    Hemachandra, Nandyala
    DYNAMIC GAMES AND APPLICATIONS, 2023, 13 (01): 25-55