Multi-Agent Reinforcement Learning in Non-Cooperative Stochastic Games Using Large Language Models

Cited by: 0

Authors:

Alsadat, Shayan Meshkat [1]

Xu, Zhe [1]

Affiliations:

[1] Arizona State Univ, Fac Mech Engn, Tempe, AZ 85281 USA

Source:

IEEE CONTROL SYSTEMS LETTERS | 2024 / Vol. 8

Keywords:

Games; Nash equilibrium; Stochastic processes; Q-learning; Convergence; Learning automata; Large language models; Trajectory; Robustness; Probabilistic logic; Reinforcement learning; large language models; stochastic games; reward machines;

DOI:

10.1109/LCSYS.2024.3515879

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We study the use of large language models (LLMs) to integrate high-level knowledge into stochastic games, using reinforcement learning with reward machines to encode non-Markovian and Markovian reward functions. In non-cooperative games, one challenge is to provide agents with knowledge about the task efficiently, so as to speed up convergence to an optimal policy. We aim to provide this knowledge in the form of deterministic finite automata (DFA) generated by LLMs (LLM-generated DFA). Additionally, we use reward machines (RMs) to encode the temporal structure of the game and the non-Markovian or Markovian reward functions. Our proposed algorithm, LLM-generated DFA for Multi-agent Reinforcement Learning with Reward Machines for Stochastic Games (StochQ-RM), can use the LLM-generated DFA to learn a reward machine equivalent to the ground-truth reward machine (the specified task) in the environment. Additionally, we propose DFA-based Q-learning with reward machines (DBQRM) to find the best responses for each agent using Nash equilibrium in stochastic games. Although LLMs are known to hallucinate, we show that our method is robust and guaranteed to converge to an optimal policy. Furthermore, we study the performance of our proposed method in three case studies.

Pages: 2757-2762

Page count: 6
