Efficient exploration by switching agents according to degree of convergence of learning on Heterogeneous Multi-Agent Reinforcement Learning in Single Robot

Cited by: 1

Authors:

Narita, Riku [1]

Matsushima, Tatsufumi [2]

Kurashige, Kentarou [1]

Affiliations:

[1] Muroran Inst Technol, Div Informat & Elect Engn, Muroran, Hokkaido, Japan

[2] Panasonic Its Co Ltd, Dev Ctr 1, Sect 1, Yokohama, Kanagawa, Japan

Source:

2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021) | 2021

Keywords:

Reinforcement Learning; MARL; Explore;

DOI:

10.1109/SSCI50451.2021.9659982

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent years, robots have been required to act autonomously in complex environments. Some researchers use reinforcement learning, which learns actions autonomously according to the environment. Reinforcement learning requires exploratory actions, but in conventional reinforcement learning these are chosen at random. Random exploratory actions are inefficient, so learning takes a long time. To avoid inefficient exploratory actions, we proposed in previous research a method that uses a Heterogeneous Multi-Agent Reinforcement Learning (HMARL) system. HMARL enables efficient exploratory actions by using multiple agents with heterogeneous learning spaces; that is, it performs exploratory actions using the learning of multiple agents. HMARL also needs an index for autonomously selecting one agent from among the agents with heterogeneous learning spaces. We propose a method that selects an agent using the degree of convergence of each agent's learning in HMARL, based on the TD errors. As a result, efficient exploratory actions by multiple agents with different learning spaces were achieved. An experiment was then conducted to compare the proposed method with the method of the previous research. The experimental results demonstrate the usefulness of the proposed method.
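
The abstract does not say how the degree of convergence is computed from the TD errors, or which agent is preferred once it is known. The following Python sketch is therefore only an illustration under stated assumptions, not the authors' method: the class `AgentStats`, the score 1 / (1 + mean |TD error|) over a sliding window, and the rule "pick the agent with the highest score" are all hypothetical. Only the one-step TD error formula is standard.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentStats:
    """Hypothetical per-agent record of recent absolute TD errors."""
    window: int = 20
    errors: deque = field(default_factory=deque)

    def record(self, td_error: float) -> None:
        # Keep only the most recent `window` absolute TD errors.
        self.errors.append(abs(td_error))
        if len(self.errors) > self.window:
            self.errors.popleft()

    def convergence(self) -> float:
        # Assumed measure: 1 / (1 + mean |TD error|); 0.0 when no data exists.
        if not self.errors:
            return 0.0
        mean_abs = sum(self.errors) / len(self.errors)
        return 1.0 / (1.0 + mean_abs)


def td_error(reward: float, gamma: float, q_next_max: float, q_current: float) -> float:
    """Standard one-step Q-learning TD error."""
    return reward + gamma * q_next_max - q_current


def select_agent(stats: list) -> int:
    """Return the index of the agent with the highest assumed convergence score."""
    return max(range(len(stats)), key=lambda i: stats[i].convergence())


# Usage: agent 1 has much smaller recent TD errors, so it is selected here.
stats = [AgentStats(), AgentStats()]
for e in (0.9, 0.8, 0.7):
    stats[0].record(e)
for e in (0.1, -0.05, 0.02):
    stats[1].record(e)
chosen = select_agent(stats)
```

Whether the most converged or the least converged agent should drive exploration is a design choice the abstract leaves open; swapping `max` for `min` in `select_agent` would give the opposite policy.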

Pages: 6
