Entropy regularization methods for parameter space exploration

Cited by: 5
Authors
Han, Shuai [1, 2, 3]
Zhou, Wenbo [1, 4, 5]
Lu, Shuai [1, 2, 6]
Zhu, Sheng [1, 6]
Gong, Xiaoyu [1, 2]
Affiliations
[1] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Univ Utrecht, Dept Informat & Comp Sci, NL-3584 CC Utrecht, Netherlands
[4] Northeast Normal Univ, Sch Informat Sci & Technol, Changchun 130117, Peoples R China
[5] Northeast Normal Univ, Minist Educ, Key Lab Appl Stat, Changchun 130024, Peoples R China
[6] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Reinforcement learning; Entropy regularization; Exploration; Parameter spaces; Deterministic policy gradients;
DOI
10.1016/j.ins.2022.11.099
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Entropy regularization is an important approach to improving exploration and enhancing policy stability in reinforcement learning. However, in previous studies, entropy regularization has been applied only to action spaces. In this paper, we apply entropy regularization to parameter spaces. We use learnable noisy layers to parameterize the policy network and thereby obtain a learnable entropy. We also derive the expression for the entropy of a noisy parameter and an upper bound on the joint entropy. Based on these, we propose a model-free method named deep pseudo deterministic policy gradients based on entropy regularization (DPGER). This method maximizes the entropy of each noisy parameter early in learning to promote exploration, and minimizes the joint entropy of the noisy parameters later in learning to facilitate the formation of stable policies. We test our method on four MuJoCo environments with five random seeds. The results show that our method performs better than previous methods. (c) 2022 Elsevier Inc. All rights reserved.
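This record does not reproduce the paper's formulas, so the following is only a minimal sketch of the kind of quantity the abstract describes. It assumes each noisy parameter has the form theta = mu + sigma * eps with standard Gaussian noise eps (an assumption, not confirmed by the abstract), uses the standard differential entropy of a Gaussian, and bounds the joint entropy by the sum of marginal entropies (subadditivity of entropy). The function names, the `switch_step` threshold, and the `scale` value are hypothetical and are not taken from DPGER.

```python
import math


def gaussian_param_entropy(sigma: float) -> float:
    """Differential entropy (in nats) of theta = mu + sigma * eps, eps ~ N(0, 1).

    Assumes Gaussian noise; the entropy depends only on |sigma|, not on mu.
    """
    return 0.5 * math.log(2.0 * math.pi * math.e) + math.log(abs(sigma))


def joint_entropy_upper_bound(sigmas) -> float:
    """Sum of marginal entropies, which upper-bounds the joint entropy."""
    return sum(gaussian_param_entropy(s) for s in sigmas)


def entropy_coefficient(step: int, switch_step: int, scale: float = 1e-3) -> float:
    """Hypothetical schedule: reward entropy early, penalize it later."""
    return scale if step < switch_step else -scale


def regularized_objective(policy_objective: float, sigmas, step: int,
                          switch_step: int) -> float:
    """Policy objective plus a signed entropy term (illustrative only)."""
    coef = entropy_coefficient(step, switch_step)
    return policy_objective + coef * joint_entropy_upper_bound(sigmas)
```

With a positive coefficient, increasing any sigma raises the objective, which encourages noisier parameters and hence exploration; once the coefficient turns negative, shrinking the sigmas is favored, which pushes the policy toward a near-deterministic one.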
Pages: 476-489
Page count: 14