Faster MIL-based Subgoal Identification for Reinforcement Learning by Tuning Fewer Hyperparameters

Cited by: 0

Authors:

Sunel, Saim [1]

Cilden, Erkin [2]

Polat, Faruk [1]

Affiliations:

[1] Middle East Tech Univ, Dept Comp Engn, TR-06800 Ankara, Turkiye

[2] STM Def Technol Engn & Trade Inc, RF & Simulat Syst Directorate, Ankara, Turkiye

Source:

ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS | 2024 / Vol. 19 / No. 02

Keywords:

Subgoal identification; expectation-maximization; diverse density; hyper-parameter search; multiple instance learning; reinforcement learning; DISCOVERY; FRAMEWORK;

DOI:

10.1145/3643852

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Various methods have been proposed in the literature for identifying subgoals in discrete reinforcement learning (RL) tasks. Once subgoals are discovered, task decomposition methods can be employed to improve the learning performance of agents. In this study, we classify prominent subgoal identification methods for discrete RL tasks in the literature into the following three categories: graph-based, statistics-based, and multi-instance learning (MIL)-based. As contributions, first, we introduce a new MIL-based subgoal identification algorithm called EMDD-RL and experimentally compare it with a previous MIL-based method. The previous approach adapts MIL's Diverse Density (DD) algorithm, whereas our method builds on Expectation-Maximization Diverse Density (EMDD). The advantage of EMDD over DD is that it can yield more accurate results with lower computational demand thanks to the expectation-maximization algorithm. EMDD-RL modifies some of the algorithmic steps of EMDD to identify subgoals in discrete RL problems. Second, we evaluate the methods in several RL tasks for the hyperparameter tuning overhead they incur. Third, we propose a new RL problem called key-room and compare the methods for their subgoal identification performance in this new task. Experimental results show that MIL-based subgoal identification methods could be preferred to the algorithms of the other two categories in practice.


Pages: 29
