Entropy regularization methods for parameter space exploration

Cited by: 5

Authors:

Han, Shuai [1,2,3]

Zhou, Wenbo [1,4,5]

Lu, Shuai [1,2,6]

Zhu, Sheng [1,6]

Gong, Xiaoyu [1,2]

Affiliations:

[1] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China

[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China

[3] Univ Utrecht, Dept Informat & Comp Sci, NL-3584 CC Utrecht, Netherlands

[4] Northeast Normal Univ, Sch Informat Sci & Technol, Changchun 130117, Peoples R China

[5] Northeast Normal Univ, Minist Educ, Key Lab Appl Stat, Changchun 130024, Peoples R China

[6] Jilin Univ, Coll Software, Changchun 130012, Peoples R China

Source:

INFORMATION SCIENCES | 2023 / Vol. 622 / pp. 476-489

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Reinforcement learning; Entropy regularization; Exploration; Parameter spaces; Deterministic policy gradients;

DOI:

10.1016/j.ins.2022.11.099

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Entropy regularization is an important approach to improving exploration and enhancing policy stability in reinforcement learning. However, in previous studies, entropy regularization has been applied only to action spaces. In this paper, we apply entropy regularization to parameter spaces. We use learnable noisy layers to parameterize the policy network to obtain a learnable entropy. We also derive the expression for the entropy of the noisy parameter and an upper bound on the joint entropy. Based on these, we propose a model-free method named deep pseudo deterministic policy gradients based on entropy regularization (DPGER). This method maximizes the entropy of each noisy parameter in the early learning process to promote exploration, and minimizes the joint entropy of the noisy parameters in the later learning process to facilitate the formation of stable policies. We test our method on four MuJoCo environments with five random seeds. The results show that our method achieves better performance than previous methods. © 2022 Elsevier Inc. All rights reserved.
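
As a rough illustration of the idea of a per-parameter entropy and a bound on the joint entropy, the sketch below assumes each noisy parameter is Gaussian, w = mu + sigma * eps with eps ~ N(0, 1). This record does not state the paper's actual noise model or its derived expressions, so that form is an assumption, and the function names `gaussian_entropy` and `joint_entropy_upper_bound` are hypothetical. The bound used here is the general subadditivity property (joint entropy is at most the sum of the marginal entropies), not necessarily the bound derived in the paper.

```python
import math


def gaussian_entropy(sigma: float) -> float:
    """Differential entropy (in nats) of N(mu, sigma^2).

    Assumption: the noisy parameter is Gaussian. The value depends only
    on sigma, so a learnable sigma gives a learnable entropy.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def joint_entropy_upper_bound(sigmas) -> float:
    """Sum of marginal entropies of the noisy parameters.

    By subadditivity of entropy this upper-bounds their joint entropy,
    with equality when the parameters are independent.
    """
    return sum(gaussian_entropy(s) for s in sigmas)


if __name__ == "__main__":
    sigmas = [0.5, 1.0, 2.0]  # hypothetical noise scales
    for s in sigmas:
        print(f"sigma={s}: entropy={gaussian_entropy(s):.4f} nats")
    print(f"joint bound: {joint_entropy_upper_bound(sigmas):.4f} nats")
```

Under this assumption, increasing a noise scale raises the corresponding entropy (more exploration), while shrinking all noise scales lowers the bound on the joint entropy (a more stable policy).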

Pages: 476-489

Page count: 14

Related records (50 in total):

[1] Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space
Fan, Yingying
Lv, Jinchi
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2013, 108 (503) : 1044 - 1061
[2] A componentwise iterated relative entropy regularization method with updated prior and regularization parameter
Rullgard, H.
Oktem, O.
Skoglund, U.
INVERSE PROBLEMS, 2007, 23 (05) : 2121 - 2139
[3] Visual Parameter Space Exploration in Time and Space
Piccolotto, Nikolaus
Boegl, Markus
Miksch, Silvia
COMPUTER GRAPHICS FORUM, 2023, 42 (06)
[4] Convergence of regularization methods with filter functions for a regularization parameter chosen with GSURE
Sixou, B.
9TH INTERNATIONAL CONFERENCE ON NEW COMPUTATIONAL METHODS FOR INVERSE PROBLEMS, NCMIP 2019, 2020, 1476
[5] Visual Exploration of Algorithm Parameter Space
Franken, Nelis
2009 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-5, 2009, : 389 - 398
[6] Optimal numerical methods for choosing an optimal regularization parameter
Okamoto, K.
Li, B. Q.
NUMERICAL HEAT TRANSFER PART B-FUNDAMENTALS, 2007, 51 (06) : 515 - 533
[7] Scale-space properties of regularization methods
Radmoser, E
Scherzer, O
Weickert, J
SCALE-SPACE THEORIES IN COMPUTER VISION, 1999, 1682 : 211 - 222
[8] DIAGRAMMATIC METHODS IN PHASE-SPACE REGULARIZATION
BERN, Z
HALPERN, MB
ZEITSCHRIFT FUR PHYSIK C-PARTICLES AND FIELDS, 1988, 39 (03) : 381 - 391
[9] Entropy considerations in constraining the mSUGRA parameter space
Nunez, Dario
Sussman, Roberto A.
Zavala, Jesus
Nellen, Lukas
Cabral-Rosetti, Luis G.
Mondragon, Myriam
PARTICLES AND FIELDS, PT A, 2006, 857 : 321 - +
[10] Symplectomorphic registration with phase space regularization by entropy spectrum pathways
Galinsky, Vitaly L.
Frank, Lawrence R.
MAGNETIC RESONANCE IN MEDICINE, 2019, 81 (02) : 1335 - 1352