Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method

Cited by: 25
Authors
Wang, Wei [1 ,2 ]
Chen, Xin [1 ,2 ]
Fu, Hao [1 ,2 ]
Wu, Min [1 ,2 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming; nonzero-sum games; partially observable; Q-learning; MULTIAGENT SYSTEMS; STABILITY; ALGORITHM; NETWORKS;
DOI
10.1080/00207721.2019.1599463
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
This paper is concerned with a class of discrete-time linear nonzero-sum games in which the system state is only partially observable. As is well known, the optimal control policy for nonzero-sum games relies on full state measurement, which is difficult to obtain in a partially observable environment. Moreover, achieving optimal control requires an accurate system model. To overcome these deficiencies, this paper develops a data-driven adaptive dynamic programming method based on Q-learning that uses only measurable input/output data and requires no knowledge of the system dynamics. First, a representation of the unmeasurable internal system state is constructed from historical input/output data. Then, based on this representation state, a Q-function-based policy iteration approach with convergence analysis is introduced to approximate the optimal control policy iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples are provided to demonstrate the effectiveness of the developed approach.
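The abstract outlines three ingredients: a representation state built from a window of past inputs and outputs, Q-function-based policy iteration, and an NN actor-critic implementation. The sketch below illustrates only the first two ingredients in a deliberately simplified single-controller linear-quadratic setting; it is not the paper's algorithm, and the plant matrices, cost weights, window length, and all function names are hypothetical choices made for this illustration.

```python
# Illustrative sketch only (not the paper's algorithm): output-feedback
# Q-learning for a single discrete-time linear-quadratic controller, where
# the unmeasured state is replaced by a window of past inputs and outputs.
# All matrices, weights, and the window length N are hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical partially observable plant: x_{k+1} = A x_k + B u_k, y_k = C x_k
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Qy, Ru = 1.0, 1.0      # stage cost r_k = Qy*||y_k||^2 + Ru*||u_k||^2
N = 2                  # history window length (>= observability index)
nz = N * (C.shape[0] + B.shape[1])   # dimension of the I/O representation z_k
nu = B.shape[1]

def stack(y_hist, u_hist):
    """Representation 'state' z_k built from the last N outputs and inputs."""
    return np.concatenate([y_hist.ravel(), u_hist.ravel()])

def quad_features(z, u):
    """Upper-triangular features of the quadratic Q-function Q(z,u) = [z;u]^T H [z;u]."""
    v = np.concatenate([z, u])
    return np.outer(v, v)[np.triu_indices(v.size)]

def collect(K, steps, noise=0.3):
    """Run the plant with u = -K z + exploration noise; return (z, u, r, z_next) tuples."""
    x = np.zeros(2)
    y_hist, u_hist = np.zeros((N, 1)), np.zeros((N, 1))
    z, data = stack(y_hist, u_hist), []
    for _ in range(steps):
        u = -K @ z + noise * rng.standard_normal(nu)
        y = C @ x
        r = Qy * float(y @ y) + Ru * float(u @ u)
        x = A @ x + B @ u
        y_hist = np.vstack([y.reshape(1, -1), y_hist[:-1]])
        u_hist = np.vstack([u.reshape(1, -1), u_hist[:-1]])
        z_next = stack(y_hist, u_hist)
        data.append((z, u, r, z_next))
        z = z_next
    return data

K = np.zeros((nu, nz))   # initial zero policy; admissible here because A is stable
for it in range(10):
    data = collect(K, steps=400)
    # Policy evaluation: least squares on Q(z_k,u_k) - Q(z_{k+1}, -K z_{k+1}) = r_k
    Phi = np.array([quad_features(z, u) - quad_features(zn, -K @ zn) for z, u, r, zn in data])
    rhs = np.array([r for _, _, r, _ in data])
    w, *_ = np.linalg.lstsq(Phi, rhs, rcond=None)
    # Recover the symmetric H matrix from the upper-triangular feature weights
    n = nz + nu
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = w
    H = (H + H.T) / 2.0
    Hzu, Huu = H[:nz, nz:], H[nz:, nz:]
    K = np.linalg.solve(Huu, Hzu.T)   # policy improvement: u = -Huu^{-1} Hzu^T z
    print(f"iteration {it}: ||K|| = {np.linalg.norm(K):.4f}")
```

In the paper's setting there are two players with coupled cost functions and the Q-functions are approximated by critic networks, so the least-squares evaluation step above would be replaced by coupled actor-critic NN updates; the sketch is meant only to convey the input/output representation and policy-iteration structure described in the abstract.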
Pages: 1338-1352
Number of pages: 15
Related Papers
50 records in total
  • [21] Lu, Jingwei; Wei, Qinglai; Wang, Fei-Yue. Parallel Control for Nonzero-Sum Games With Completely Unknown Nonlinear Dynamics via Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025.
  • [22] Zhang, Kun; Zhang, Zhi-Xuan; Xie, Xiang Peng; Rubio, Jose de Jesus. An Unknown Multiplayer Nonzero-Sum Game: Prescribed-Time Dynamic Event-Triggered Control via Adaptive Dynamic Programming. IEEE Transactions on Automation Science and Engineering, 2024.
  • [23] Zhi, Gan; Su, Hanguang; Wang, Rui; Ma, Dazhong. Adaptive Dynamic Programming-based Self-triggered Optimal Control for Nonzero-sum Games of Nonlinear Systems with Constrained State. 2024 3rd Conference on Fully Actuated System Theory and Applications (FASTA 2024), 2024: 1158-1163.
  • [24] Yang, Yongliang; Zhang, Sen; Dong, Jie; Yin, Yixin. Data-Driven Nonzero-Sum Game for Discrete-Time Systems Using Off-Policy Reinforcement Learning. IEEE Access, 2020, 8: 14074-14088.
  • [25] Lewis, F. L.; Vamvoudakis, Kyriakos G. Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2011, 41(1): 14-25.
  • [26] Li, Haoqian; Ye, Honghan; Cheng, Jing-Ru C.; Liu, Kaibo. Online Monitoring of Heterogeneous Partially Observable Data Streams Based on Q-Learning. IEEE Transactions on Automation Science and Engineering, 2024: 1-16.
  • [27] Guo, Siyu; Pan, Yingnan; Li, Hongyi; Cao, Liang. Dynamic Event-Driven ADP for N-Player Nonzero-Sum Games of Constrained Nonlinear Systems. IEEE Transactions on Automation Science and Engineering, 2024.
  • [28] Xin, Liang; Jiang, Hongwei; Wen, Tao; Long, Zhiqiang. Data-Driven Optimal Controller Design for Maglev Train: Q-Learning Method. 2022 34th Chinese Control and Decision Conference (CCDC), 2022: 1289-1294.
  • [29] Li, Xinxing; Peng, Zhihong; Jiao, Lei; Xi, Lele; Cai, Junqi. Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games. Science China Information Sciences, 2019, 62.
  • [30] Li, Xinxing; Peng, Zhihong; Jiao, Lei; Xi, Lele; Cai, Junqi. Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games. Science China Information Sciences, 2019, 62(12): 164-177.