I2Q: A Fully Decentralized Q-Learning Algorithm

Cited by: 0
Authors
Jiang, Jiechuan [1 ]
Lu, Zongqing [1 ]
Affiliations
[1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
Keywords: (none listed)
DOI: (not available)
CLC classification: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Fully decentralized multi-agent reinforcement learning has shown great potential for many real-world cooperative tasks, where global information, e.g., the actions of other agents, is not accessible. Although independent Q-learning is widely used for decentralized training, the transition probabilities are non-stationary because the other agents are updating their policies simultaneously, so convergence of independent Q-learning is not guaranteed. To deal with this non-stationarity, we first introduce stationary ideal transition probabilities, on which independent Q-learning can converge to the global optimum. We then propose a fully decentralized method, I2Q, which performs independent Q-learning on a modeled ideal transition function to reach the global optimum. The modeling of the ideal transition function in I2Q is fully decentralized and independent of the learned policies of other agents, making I2Q free from non-stationarity and able to learn the optimal policy. Empirically, we show that I2Q achieves remarkable improvement in a variety of cooperative multi-agent tasks.
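To make the non-stationarity issue in the abstract concrete, the following is a minimal sketch of *standard* independent Q-learning (not the authors' I2Q method) in a one-state, two-agent cooperative matrix game. Each agent updates its own Q-table as if the other agent were part of the environment; because the other agent's policy keeps changing, each agent's effective reward distribution drifts, and coordination on the optimal joint action (1, 1) is not guaranteed. The payoff matrix, learning rates, and game setup here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared payoff for a 2-agent coordination game: both agents must pick
# action 1 to obtain the high reward 5; miscoordination yields 0.
payoff = np.array([[1.0, 0.0],
                   [0.0, 5.0]])

n_actions = 2
q = [np.zeros(n_actions), np.zeros(n_actions)]  # one independent Q-table per agent
alpha, eps = 0.1, 0.2                           # learning rate, exploration rate

for step in range(5000):
    # Each agent acts epsilon-greedily on its OWN Q-values only; it never
    # observes the other agent's action, matching the decentralized setting.
    acts = [
        rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q[i]))
        for i in range(2)
    ]
    r = payoff[acts[0], acts[1]]
    # Independent update: from agent i's viewpoint the reward for its action
    # depends on the other agent's (changing) policy, which is exactly the
    # non-stationarity discussed in the abstract.
    for i in range(2):
        q[i][acts[i]] += alpha * (r - q[i][acts[i]])

greedy = [int(np.argmax(qi)) for qi in q]
print("greedy joint action:", greedy)
```

Because the expected reward of action 1 is low while the other agent mostly plays action 0, independent learners frequently settle on the suboptimal joint action (0, 0); I2Q's stationary ideal transition modeling is designed to avoid this failure mode.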
Pages: 13
Related papers (50 in total)
  • [31] Pinto, Tiago; Vale, Zita. Contextual Q-Learning. ECAI 2020: 24th European Conference on Artificial Intelligence, 2020, 325: 2927-2928.
  • [32] Horiuchi, T.; Fujino, A.; Katai, O.; Sawaragi, T. Q-PSP learning: An exploitation-oriented Q-learning algorithm and its applications. Proceedings of the 1996 IEEE International Conference on Evolutionary Computation (ICEC '96), 1996: 76-81.
  • [33] Stanko, Silvestr; Macek, Karel. CVaR Q-Learning. Computational Intelligence: 11th International Joint Conference, IJCCI 2019, Vienna, Austria, September 17-19, 2019, Revised Selected Papers, 2021, 922: 333-358.
  • [34] Dearden, R.; Friedman, N.; Russell, S. Bayesian Q-learning. Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98) and Tenth Conference on Innovative Applications of Artificial Intelligence (IAAI-98), 1998: 761-768.
  • [35] Devraj, Adithya M.; Meyn, Sean P. Zap Q-Learning. Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017, 30.
  • [36] Glorennec, P. Y.; Jouffe, L. Fuzzy Q-learning. Proceedings of the Sixth IEEE International Conference on Fuzzy Systems, Vols I-III, 1997: 659-662.
  • [37] Lu, Fan; Mehta, Prashant G.; Meyn, Sean P.; Neu, Gergely. Convex Q-Learning. 2021 American Control Conference (ACC), 2021: 4749-4756.
  • [38] Touzet, C. F.; Santos, J. M. Q-learning and robotics. Simulation in Industry 2001, 2001: 685-689.
  • [39] ten Hagen, Stephan; Kröse, Ben. Neural Q-learning. Neural Computing & Applications, 2003, 12: 81-88.
  • [40] Ertefaie, Ashkan; McKay, James R.; Oslin, David; Strawderman, Robert L. Robust Q-Learning. Journal of the American Statistical Association, 2021, 116 (533): 368-381.