Automatic Temperature Parameter Tuning for Reinforcement Learning Using Path Integral Policy Improvement

Cited by: 0
Authors
Nakano, Hiroyasu [1 ]
Ariizumi, Ryo [2 ]
Asai, Toru [1 ]
Azuma, Shun-Ichi [3 ]
Affiliations
[1] Nagoya Univ, Dept Mech Syst Engn, Nagoya, Aichi 4648603, Japan
[2] Tokyo Univ Agr & Technol, Dept Mech Syst Engn, Tokyo 1848588, Japan
[3] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Funding
Japan Society for the Promotion of Science (JSPS); Japan Science and Technology Agency (JST);
Keywords
Robots; Tuning; Task analysis; Covariance matrices; Trajectory; Legged locomotion; Learning systems; Legged robot; policy improvement; reinforcement learning (RL); robotics; snake robot;
DOI
10.1109/TNNLS.2023.3312857
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this article, we propose a novel variant of path integral policy improvement with covariance matrix adaptation (PI2-CMA), a reinforcement learning (RL) algorithm that optimizes a parameterized policy for the continuous behavior of robots. PI2-CMA has a hyperparameter called the temperature parameter, whose value is critical for performance; however, little research has addressed how to set it, and the existing tuning method still contains its own performance-critical tunable parameter, so tuning by trial and error remains necessary. Moreover, we show that there is a problem setting that the existing method cannot learn. The proposed method solves both problems by automatically adjusting the temperature parameter at each update. We confirmed the effectiveness of the proposed method using numerical tests.
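To illustrate the role the temperature parameter plays in PI2-style updates, the following is a minimal sketch, assuming the commonly used exponential (softmax) weighting of min-max normalized rollout costs followed by reward-weighted averaging of sampled policy parameters. The normalization step and the function names (`pi2_weights`, `parameter_update`) are illustrative assumptions; this sketch does not reproduce the automatic tuning method proposed in the article.

```python
import numpy as np

def pi2_weights(costs, temperature):
    """PI2-style exponential weighting of rollout costs.

    Lower-cost rollouts receive larger weights; `temperature` controls how
    sharply the weighting concentrates on the best rollouts (small value ->
    near-greedy, large value -> near-uniform).
    """
    costs = np.asarray(costs, dtype=float)
    # Min-max normalization so the temperature acts on a scale-independent
    # cost range (a common implementation choice, assumed here).
    span = costs.max() - costs.min()
    if span < 1e-12:
        return np.full_like(costs, 1.0 / costs.size)
    z = (costs - costs.min()) / span
    w = np.exp(-z / temperature)
    return w / w.sum()

def parameter_update(theta_samples, costs, temperature):
    """Reward-weighted averaging of sampled policy parameter vectors."""
    w = pi2_weights(costs, temperature)
    theta_samples = np.asarray(theta_samples, dtype=float)
    return (w[:, None] * theta_samples).sum(axis=0)

# Example: 10 rollouts of a 3-dimensional policy parameter vector.
rng = np.random.default_rng(0)
theta_samples = rng.normal(size=(10, 3))
costs = rng.uniform(1.0, 5.0, size=10)
for temp in (0.05, 0.5, 5.0):
    print(temp, parameter_update(theta_samples, costs, temp))
```

Running the loop shows the sensitivity the article targets: with a small temperature the update concentrates almost entirely on the lowest-cost rollouts, while a large temperature approaches a plain average over all rollouts.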
Pages: 18200 - 18211
Number of pages: 12