Decentralized graph-based multi-agent reinforcement learning using reward machines

Cited by: 4

Authors:

Hu, Jueming [1]

Xu, Zhe [1]

Wang, Weichang [2]

Qu, Guannan [3]

Pang, Yutian [1]

Liu, Yongming [1]

Affiliations:

[1] Arizona State Univ, Sch Engn Matter Transport & Energy, Tempe, AZ 85287 USA

[2] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ USA

[3] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA

Source:

NEUROCOMPUTING | 2024 / Vol. 564

Funding:

U.S. National Science Foundation;

Keywords:

Decentralized; Multi-agent; Reinforcement learning; Reward machine; Efficiency; SUBGOAL AUTOMATA; ALGORITHMS; INDUCTION;

DOI:

10.1016/j.neucom.2023.126974

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and how to learn the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP), where the dynamics of neighboring agents are coupled. To learn complex temporally extended tasks, we use a reward machine (RM) to encode each agent's task and expose reward function internal structures. RM has the capacity to describe high-level knowledge and encode non-Markovian reward functions. We propose a decentralized learning algorithm to tackle computational complexity, called decentralized graph-based reinforcement learning using reward machines (DGRM), that equips each agent with a localized policy, allowing agents to make decisions independently based on the information available to the agents. DGRM uses the actor-critic structure, and we introduce the tabular Q-function for discrete state problems. We show that the dependency of the Q-function on other agents decreases exponentially as the distance between them increases. To further improve efficiency, we also propose the deep DGRM algorithm, using deep neural networks to approximate the Q-function and policy function to solve large-scale or continuous state problems. The effectiveness of the proposed DGRM algorithm is evaluated by three case studies, two wireless communication case studies with independent and dependent reward functions, respectively, and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and agents can accomplish complex tasks with the help of RM. DGRM improves the global accumulated reward by 119% compared to the baseline in the case of COVID-19 pandemic mitigation.
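As an illustration of how a reward machine can encode a non-Markovian reward, the following is a minimal sketch and not the paper's DGRM implementation. It assumes a single agent, a hypothetical task "observe event a, then event b", and made-up state names (u0, u1, u2); the class and its fields are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine: transitions fire on high-level events and emit rewards."""
    initial_state: str
    # Hypothetical encoding: (state, event) -> (next_state, reward)
    transitions: dict = field(default_factory=dict)

    def __post_init__(self):
        # Current RM state; it summarizes the task-relevant history.
        self.state = self.initial_state

    def step(self, event: str) -> float:
        # Unlisted (state, event) pairs keep the state and give zero reward.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


# Illustrative task (an assumption): reward 1.0 only when "b" follows an earlier "a".
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "a"): ("u1", 0.0),
        ("u1", "b"): ("u2", 1.0),
    },
)

rewards = [rm.step(event) for event in ["b", "a", "b", "b"]]
print(rewards)
```

In this sketch the same event "b" earns a reward only in RM state u1, so the reward depends on the event history rather than on the current observation alone; tracking the RM state alongside the environment state is what makes such a reward Markovian again.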

Pages: 11

