Decentralized graph-based multi-agent reinforcement learning using reward machines

Cited by: 4
Authors
Hu, Jueming [1]
Xu, Zhe [1]
Wang, Weichang [2]
Qu, Guannan [3]
Pang, Yutian [1]
Liu, Yongming [1]
Affiliations
[1] Arizona State Univ, Sch Engn Matter Transport & Energy, Tempe, AZ 85287 USA
[2] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ USA
[3] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
Funding
U.S. National Science Foundation
Keywords
Decentralized; Multi-agent; Reinforcement learning; Reward machine; Efficiency; Subgoal automata; Algorithms; Induction
DOI
10.1016/j.neucom.2023.126974
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and in learning the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP), where the dynamics of neighboring agents are coupled. To learn complex temporally extended tasks, we use a reward machine (RM) to encode each agent's task and expose the internal structure of its reward function. An RM can describe high-level knowledge and encode non-Markovian reward functions. To tackle computational complexity, we propose a decentralized learning algorithm, called decentralized graph-based reinforcement learning using reward machines (DGRM), that equips each agent with a localized policy, allowing agents to make decisions independently based on the information available to them. DGRM uses the actor-critic structure, and we introduce the tabular Q-function for discrete state problems. We show that the dependency of the Q-function on other agents decreases exponentially as the distance between them increases. To further improve efficiency, we also propose the deep DGRM algorithm, which uses deep neural networks to approximate the Q-function and policy function to solve large-scale or continuous state problems. The effectiveness of the proposed DGRM algorithm is evaluated in three case studies: two wireless communication case studies, with independent and dependent reward functions respectively, and a COVID-19 pandemic mitigation study. Experimental results show that local information is sufficient for DGRM and that agents can accomplish complex tasks with the help of RM. DGRM improves the global accumulated reward by 119% compared to the baseline in the case of COVID-19 pandemic mitigation.
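The abstract's claim that a reward machine can encode a non-Markovian reward can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `RewardMachine` class, the `run_events` helper, the state names `u0`, `u1`, `u2`, and the "key then door" task are all hypothetical, chosen only to show a reward that depends on the history of events rather than on the latest event alone.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Hypothetical minimal reward machine: a finite automaton over events.

    transitions maps (rm_state, event) -> (next_rm_state, reward).
    Pairs that are not listed leave the state unchanged and give reward 0.
    """
    initial_state: str
    transitions: dict
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, event):
        # Look up the transition; default to a zero-reward self-loop.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


def run_events(rm, events):
    """Reset the machine, feed it a sequence of events, and collect rewards."""
    rm.reset()
    return [rm.step(e) for e in events]


# Illustrative task (an assumption, not from the paper): reward 1 is given
# for "door" only if "key" was observed earlier.
key_then_door = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "key"): ("u1", 0.0),
        ("u1", "door"): ("u2", 1.0),
    },
)
```

In this sketch the same `door` event yields a different reward depending on whether `key` came first, which is what makes the reward non-Markovian in the event alone; tracking the RM state alongside the environment state restores the Markov property.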
Pages: 11
Related papers
50 in total
  • [1] Integration of Decentralized Graph-Based Multi-Agent Reinforcement Learning with Digital Twin for Traffic Signal Optimization
    Kumarasamy, Vijayalakshmi K.
    Saroj, Abhilasha Jairam
    Liang, Yu
    Wu, Dalei
    Hunter, Michael P.
    Guin, Angshuman
    Sartipi, Mina
    SYMMETRY-BASEL, 2024, 16 (04):
  • [2] Learning Heterogeneous Strategies via Graph-based Multi-agent Reinforcement Learning
    Li, Yang
    Luo, Xiangfeng
    Xie, Shaorong
    2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 709 - 713
  • [3] Collaborative Information Dissemination with Graph-Based Multi-Agent Reinforcement Learning
    Galliera, Raffaele
    Venable, Kristen Brent
    Bassani, Matteo
    Suri, Niranjan
    ALGORITHMIC DECISION THEORY, ADT 2024, 2025, 15248 : 160 - 173
  • [4] Multi-Agent Collaborative Exploration through Graph-based Deep Reinforcement Learning
    Luo, Tianze
    Subagdja, Budhitama
    Wang, Di
    Tan, Ah-Hwee
    2019 IEEE INTERNATIONAL CONFERENCE ON AGENTS (ICA), 2019, : 2 - 7
  • [5] A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning
    Pan, Wei
    Liu, Cheng
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2023, 18 (01)
  • [6] Learning Decentralized Traffic Signal Controllers With Multi-Agent Graph Reinforcement Learning
    Zhang, Yao
    Yu, Zhiwen
    Zhang, Jun
    Wang, Liang
    Luan, Tom H.
    Guo, Bin
    Yuen, Chau
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (06) : 7180 - 7195
  • [7] Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs
    Duc Thien Nguyen
    Yeoh, William
    Lau, Hoong Chuin
    Zilberstein, Shlomo
    Zhang, Chongjie
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 1447 - 1455
  • [8] Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs
    Duc Thien Nguyen
    Yeoh, William
    Hoong Chuin Lau
    Zilberstein, Shlomo
    Zhang, Chongjie
    AAMAS'14: PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2014, : 1341 - 1342
  • [9] Graph-based multi-agent reinforcement learning for collaborative search and tracking of multiple UAVs
    Zhao, Bocheng
    Huo, Mingying
    Li, Zheng
    Feng, Wenyu
    Yu, Ze
    Qi, Naiming
    Wang, Shaohai
    CHINESE JOURNAL OF AERONAUTICS, 2025, 38 (03)
  • [10] Graph-based Selection-Activation Reinforcement Learning for Heterogenous Multi-agent Collaboration
    Chen, Hao-Xiang
    Zhang, Xi-Wen
    Shen, Jun-Nan
    Chinese Control Conference, CCC, 2024, : 5835 - 5840