Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method

Cited by: 25
|
Authors
Wang, Wei [1 ,2 ]
Chen, Xin [1 ,2 ]
Fu, Hao [1 ,2 ]
Wu, Min [1 ,2 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
Adaptive dynamic programming; nonzero-sum games; partially observable; Q-learning; MULTIAGENT SYSTEMS; STABILITY; ALGORITHM; NETWORKS;
DOI
10.1080/00207721.2019.1599463
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
This paper is concerned with a class of discrete-time linear nonzero-sum games in which the system state is only partially observable. The optimal control policy for such games is known to rely on full state measurement, which is difficult to obtain in a partially observable environment; moreover, achieving optimal control ordinarily requires an accurate system model. To overcome these limitations, this paper develops a data-driven adaptive dynamic programming method based on Q-learning that uses only measurable input/output data and requires no knowledge of the system dynamics. First, a representation of the unmeasurable internal system state is constructed from historical input/output data. Then, based on this representation state, a Q-function-based policy iteration approach with convergence analysis is introduced to approximate the optimal control policy iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples demonstrate the effectiveness of the developed approach.
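The policy-iteration idea underlying the abstract can be illustrated with a minimal sketch. Note this is *not* the paper's data-driven method: the paper recovers the state from input/output data and learns the Q-function without a model, whereas the sketch below assumes a known two-player discrete-time linear-quadratic game (all matrices `A`, `B1`, `B2`, `Q1`, `Q2`, `R11`, `R12`, `R21`, `R22` and the `dlyap` helper are illustrative assumptions) and alternates policy evaluation (coupled Lyapunov equations) with policy improvement (each player minimizes its own Q-function over its own input):

```python
import numpy as np

def dlyap(Ac, Qb):
    """Solve P = Ac.T @ P @ Ac + Qb via the Kronecker/vec identity."""
    n = Ac.shape[0]
    M = np.kron(Ac.T, Ac.T)
    p = np.linalg.solve(np.eye(n * n) - M, Qb.flatten())
    return p.reshape(n, n)

# Two-player discrete-time LQ nonzero-sum game (illustrative numbers):
#   x_{k+1} = A x_k + B1 u_k + B2 w_k,
#   player i minimizes J_i = sum_k x'Q_i x + u'R_{i1} u + w'R_{i2} w.
A  = np.array([[0.9, 0.1], [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q1 = Q2 = np.eye(2)
R11 = R22 = np.array([[1.0]])
R12 = R21 = np.array([[0.5]])

K1 = np.zeros((1, 2))  # initial admissible feedback gains (A itself is stable)
K2 = np.zeros((1, 2))
for _ in range(200):
    Ac = A - B1 @ K1 - B2 @ K2
    # Policy evaluation: each player's cost under the current joint policy
    P1 = dlyap(Ac, Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2)
    P2 = dlyap(Ac, Q2 + K1.T @ R21 @ K1 + K2.T @ R22 @ K2)
    # Policy improvement: argmin of each player's Q-function over its own input
    K1_new = np.linalg.solve(R11 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    K2_new = np.linalg.solve(R22 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))
    done = np.linalg.norm(K1_new - K1) + np.linalg.norm(K2_new - K2) < 1e-10
    K1, K2 = K1_new, K2_new
    if done:
        break

print("K1 =", K1)
print("K2 =", K2)
print("closed-loop spectral radius:",
      max(abs(np.linalg.eigvals(A - B1 @ K1 - B2 @ K2))))
```

At convergence the gain pair is stationary for the coupled equations, i.e. an approximate feedback Nash equilibrium for this toy game; the paper's contribution is carrying out the analogous iteration from measurable input/output data alone, with NN actor-critic approximators in place of the exact matrix solves above.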
Pages: 1338 - 1352 (15 pages)
Related Papers (50 in total)
  • [1] Data-Driven Adaptive Dynamic Programming for Two-Player Nonzero-Sum Game
    Zhang, Qichao
    Zhao, Dongbin
    Zhou, Yafei
    2017 29TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2017, : 3445 - 3450
  • [2] Data-Driven Partially Observable Dynamic Processes Using Adaptive Dynamic Programming
    Zhong, Xiangnan
    Ni, Zhen
    Tang, Yufei
    He, Haibo
    2014 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL), 2014, : 156 - 163
  • [3] Policy iteration based Q-learning for linear nonzero-sum quadratic differential games
    Li, Xinxing
    Peng, Zhihong
    Liang, Li
    Zha, Wenzhong
    SCIENCE CHINA-INFORMATION SCIENCES, 2019, 62 (05) : 195 - 213
  • [4] Q-Learning for Feedback Nash Strategy of Finite-Horizon Nonzero-Sum Difference Games
    Zhang, Zhaorong
    Xu, Juanjuan
    Fu, Minyue
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (09) : 9170 - 9178
  • [5] Optimal Regulation Strategy for Nonzero-Sum Games of the Immune System Using Adaptive Dynamic Programming
    Sun, Jiayue
    Zhang, Huaguang
    Yan, Ying
    Xu, Shun
    Fan, Xiaoxi
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (03) : 1475 - 1484
  • [6] An Online Q-Learning Method for Linear-Quadratic Nonzero-Sum Stochastic Differential Games with Completely Unknown Dynamics
    Zhang, Bao-Qiang
    Wang, Bing-Chang
    Cao, Ying
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2024, 37 (05) : 1907 - 1922
  • [7] Off-policy based adaptive dynamic programming method for nonzero-sum games on discrete-time system
    Wen, Yinlei
    Zhang, Huaguang
    Ren, He
    Zhang, Kun
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2020, 357 (12) : 8059 - 8081