Multi-Agent Reinforcement Learning with Prospect Theory

Cited: 0
Authors
Danis, Dominic [1 ]
Parmacek, Parker [1 ]
Dunajsky, David [1 ]
Ramasubramanian, Bhaskar [1 ]
Institutions
[1] Western Washington Univ, Elect & Comp Engn, Bellingham, WA 98225 USA
Funding
U.S. National Science Foundation
Keywords
DECISION; RISK;
DOI
Not available
CLC classification
TP [automation technology, computer technology]
Subject classification
0812
Abstract
Recent advances in cyber and cyber-physical systems have informed the development of scalable and efficient algorithms for these systems to learn behaviors when operating in uncertain and unknown environments. When such systems share their operating environments with human users, such as in autonomous driving, it is important to be able to learn behaviors for each entity in the environment that will (i) recognize the presence of other entities, and (ii) align with the preferences of one or more human users in the environment. While multi-agent reinforcement learning (MARL) provides a modeling, design, and analysis paradigm for (i), there remains a gap in the development of strategies to solve (ii). In this paper, we aim to bridge this gap through the design, analysis, and evaluation of MARL algorithms that recognize preferences of human users. We use cumulative prospect theory (CPT) to model multiple human traits, such as a tendency to view gains and losses differently and to evaluate outcomes relative to a reference point. We define a CPT-based value function, and learn agent policies as a consequence of optimizing this value function. To this end, we develop MA-CPT-Q, a multi-agent CPT-based Q-learning algorithm, and establish its convergence. We adapt this algorithm to a setting where any agent can call upon 'more experienced' agents to aid its own learning process, and propose MA-CPT-Q-WS, a multi-agent CPT-based Q-learning algorithm with weight sharing. We evaluate both algorithms in an environment where agents have to reach a target state while avoiding collisions with obstacles and with other agents. Our results show that agent behaviors after learning policies when following MA-CPT-Q and MA-CPT-Q-WS are better aligned with those of human users who might be placed in the same environment.
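The abstract does not give the paper's exact CPT formulation, but the traits it names (gains and losses evaluated asymmetrically relative to a reference point, plus distorted probability perception) are commonly modeled with the standard Tversky-Kahneman utility and probability-weighting functions. The sketch below illustrates those standard forms only; the parameter values (alpha, beta, lambda, gamma) are the classic empirical estimates, not values from this paper.

```python
def cpt_utility(x, ref=0.0, alpha=0.88, beta=0.88, lam=2.25):
    """Tversky-Kahneman value function: outcomes are judged as gains
    or losses relative to a reference point, and losses are weighted
    more heavily than equal-sized gains (loss aversion, lam > 1)."""
    d = x - ref
    if d >= 0:
        return d ** alpha
    return -lam * ((-d) ** beta)

def cpt_weight(p, gamma=0.61):
    """Inverse-S probability weighting: small probabilities are
    overweighted and large probabilities underweighted."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)
```

A CPT-based value function of the kind the paper optimizes would aggregate such distorted utilities and weights over outcomes, in place of the plain expected return used in standard Q-learning.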
Pages: 9-16 (8 pages)