Multi-Agent Reinforcement Learning with Prospect Theory

Cited by: 0

Authors:

Danis, Dominic [1]

Parmacek, Parker [1]

Dunajsky, David [1]

Ramasubramanian, Bhaskar [1]

Affiliation:

[1] Western Washington Univ, Elect & Comp Engn, Bellingham, WA 98225 USA

Source:

2023 PROCEEDINGS OF THE CONFERENCE ON CONTROL AND ITS APPLICATIONS, CT | 2023

Funding:

U.S. National Science Foundation

Keywords:

DECISION; RISK;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Recent advances in cyber and cyber-physical systems have informed the development of scalable and efficient algorithms for these systems to learn behaviors when operating in uncertain and unknown environments. When such systems share their operating environments with human users, such as in autonomous driving, it is important to be able to learn behaviors for each entity in the environment that will (i) recognize the presence of other entities, and (ii) be aligned with the preferences of one or more human users in the environment. While multi-agent reinforcement learning (MARL) provides a modeling, design, and analysis paradigm for (i), there remains a gap in the development of strategies to solve (ii). In this paper, we aim to bridge this gap through the design, analysis, and evaluation of MARL algorithms that recognize the preferences of human users. We use cumulative prospect theory (CPT) to model multiple human traits, such as a tendency to view gains and losses differently and to evaluate outcomes relative to a reference point. We define a CPT-based value function, and learn agent policies by optimizing this value function. To this end, we develop MA-CPT-Q, a multi-agent CPT-based Q-learning algorithm, and establish its convergence. We adapt this algorithm to a setting where any agent can call upon 'more experienced' agents to aid its own learning process, and propose MA-CPT-Q-WS, a multi-agent CPT-based Q-learning algorithm with weight sharing. We evaluate both algorithms in an environment where agents have to reach a target state while avoiding collisions with obstacles and with other agents. Our results show that the agent behaviors learned with MA-CPT-Q and MA-CPT-Q-WS are better aligned with those of human users who might be placed in the same environment.
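The abstract describes CPT as treating gains and losses differently and as evaluating outcomes relative to a reference point. The sketch below illustrates that idea for a single discrete random outcome. It is not the paper's MA-CPT-Q algorithm: the abstract does not state which utility or probability-weighting functions the authors use, so the functional forms and the default parameters (0.88, 2.25, 0.61, 0.69) are assumptions taken from Tversky and Kahneman's 1992 formulation, and the names `cpt_utility`, `weight`, and `cpt_value` are hypothetical.

```python
import numpy as np

def cpt_utility(x, ref=0.0, alpha=0.88, beta=0.88, lam=2.25):
    """Assumed Tversky-Kahneman utility: concave for gains, convex and
    scaled by the loss-aversion factor lam for losses, relative to ref."""
    d = np.asarray(x, dtype=float) - ref
    gains = np.maximum(d, 0.0) ** alpha
    losses = -lam * np.maximum(-d, 0.0) ** beta
    return gains + losses  # exactly one of the two terms is nonzero

def weight(p, gamma):
    """Assumed inverse-S probability weighting function."""
    p = np.clip(np.asarray(p, dtype=float), 0.0, 1.0)
    return p**gamma / (p**gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

def cpt_value(outcomes, probs, ref=0.0, gamma_gain=0.61, gamma_loss=0.69):
    """CPT value of a discrete random outcome using rank-dependent weights."""
    outcomes = np.asarray(outcomes, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(outcomes)          # sort outcomes from worst to best
    x, p = outcomes[order], probs[order]

    tail = np.cumsum(p[::-1])[::-1]       # P(X >= x_i), used for gains
    head = np.cumsum(p)                   # P(X <= x_i), used for losses
    tail_next = np.append(tail[1:], 0.0)  # P(X > x_i)
    head_prev = np.append(0.0, head[:-1]) # P(X < x_i)

    pi_gain = weight(tail, gamma_gain) - weight(tail_next, gamma_gain)
    pi_loss = weight(head, gamma_loss) - weight(head_prev, gamma_loss)
    pi = np.where(x >= ref, pi_gain, pi_loss)
    return float(np.sum(pi * cpt_utility(x, ref)))

# A fair +/-10 gamble has expected value 0 but a negative CPT value,
# because the loss is weighted more heavily than the equal-sized gain.
fair_gamble = cpt_value([-10.0, 10.0], [0.5, 0.5])
```

Under these assumed parameters, raising `ref` moves outcomes from the gain side to the loss side, which is how a reference point changes the evaluation of the same raw outcomes.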


Pages: 9-16

Page count: 8
