Discrete-time dynamic graphical games: model-free reinforcement learning solution

Cited by: 4

Authors:

Mohammed I. ABOUHEAF [1]

Frank L. LEWIS [2,3]

Magdi S. MAHMOUD [1]

Dariusz G. MIKULSKI [4]

Affiliations:

[1] Systems Engineering Department, King Fahd University of Petroleum & Mineral

[2] UTA Research Institute, University of Texas at Arlington, Fort Worth, Texas, U.S.A.

[3] State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University

Source:

Control Theory and Technology | 2015 / Vol. 13 / No. 01

Funding:

U.S. National Science Foundation; National Natural Science Foundation of China;

Keywords:

Dynamic graphical games; Nash equilibrium; discrete mechanics; optimal control; model-free reinforcement learning; policy iteration;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper introduces a model-free reinforcement learning technique for solving a class of dynamic games known as dynamic graphical games. The graphical game arises from multi-agent dynamical systems in which pinning control is used to make all the agents synchronize to the state of a command generator, or leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for the dynamic graphical games. Hamiltonian mechanics is used to derive the necessary conditions for optimality. The solution of the dynamic graphical game is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman equations developed herein. The Nash equilibrium solution of the graphical game is likewise given in terms of the solution to the underlying coupled Hamilton-Jacobi-Bellman equations. An online model-free policy iteration algorithm is developed to learn the Nash solution of the dynamic graphical game. This algorithm does not require any knowledge of the agents' dynamics. A proof of convergence for this multi-agent learning algorithm is given under a mild assumption about the inter-connectivity properties of the graph. A gradient descent technique with critic network structures is used to implement the policy iteration algorithm and solve the graphical game online in real time.
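The synchronization objective mentioned in the abstract can be illustrated with a small sketch. The abstract does not state an error formula, so the code below assumes a form commonly used in the pinning-control literature: each agent's local error is the weighted sum of its differences to its neighbors plus a pinning term toward the leader. The function name `neighborhood_errors`, the scalar states, and the example graph are all hypothetical choices made for illustration, not the paper's implementation.

```python
def neighborhood_errors(states, leader_state, edge_weights, pinning_gains):
    """Return one local tracking error per agent (assumed formulation).

    states        -- list of scalar agent states x_i
    leader_state  -- scalar leader (command generator) state x_0
    edge_weights  -- edge_weights[i][j] weights the edge from agent j to agent i
                     (0 when agent i does not observe agent j)
    pinning_gains -- pinning_gains[i] > 0 only if agent i observes the leader
    """
    errors = []
    for i, x_i in enumerate(states):
        # Disagreement with the neighbors that agent i can observe.
        coupling = sum(w * (x_j - x_i) for w, x_j in zip(edge_weights[i], states))
        # Disagreement with the leader, nonzero only for pinned agents.
        pinning = pinning_gains[i] * (leader_state - x_i)
        errors.append(coupling + pinning)
    return errors


# Hypothetical example: a directed chain leader -> agent 0 -> agent 1 -> agent 2,
# where only agent 0 is pinned to the leader.
edge_weights = [[0, 0, 0],
                [1, 0, 0],
                [0, 1, 0]]
pinning_gains = [1.0, 0.0, 0.0]

errors = neighborhood_errors([1.0, 2.0, 4.0], 0.0, edge_weights, pinning_gains)
synced = neighborhood_errors([3.0, 3.0, 3.0], 3.0, edge_weights, pinning_gains)
```

Under this assumed formulation, every local error vanishes once all agents match the leader, which is the synchronized condition the abstract describes. Each agent needs only its neighbors' states, which is why the game is described as graphical.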


Pages: 55-69

Page count: 15
