Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method

Cited by: 25
Authors
Wang, Wei [1 ,2 ]
Chen, Xin [1 ,2 ]
Fu, Hao [1 ,2 ]
Wu, Min [1 ,2 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming; nonzero-sum games; partially observable; Q-learning; MULTIAGENT SYSTEMS; STABILITY; ALGORITHM; NETWORKS;
DOI
10.1080/00207721.2019.1599463
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
This paper concerns a class of discrete-time linear nonzero-sum games with a partially observable system state. As is known, the optimal control policy for nonzero-sum games relies on full state measurement, which is hard to obtain in a partially observable environment. Moreover, achieving optimal control requires an accurate system model. To overcome these deficiencies, this paper develops a data-driven adaptive dynamic programming method via Q-learning, using measurable input/output data without any knowledge of the system. First, a representation of the unmeasurable internal system state is built from historical input/output data. Then, based on this representation state, a Q-function-based policy iteration approach with convergence analysis is introduced to approximate the optimal control policy iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples are provided to demonstrate the effectiveness of the developed approach.
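To illustrate the Q-function-based policy iteration the abstract describes, the following is a minimal sketch for the single-player LQR special case with a fully measurable state (the paper's actual algorithm handles nonzero-sum games and reconstructs the state from input/output data; the names `quad_basis`, `K`, and `H` here are illustrative assumptions, not the paper's notation). The system matrices `A`, `B` are used only to generate data, never by the learner.

```python
import numpy as np

# Model-free Q-learning policy iteration sketch (LQR special case).
# A, B generate data only; the learner sees costs and state/input samples.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.eye(1)        # quadratic stage-cost weights
n, m = 2, 1

def quad_basis(z):
    """Quadratic basis so that z' H z is linear in H's upper-triangular part."""
    i, j = np.triu_indices(len(z))
    scale = np.where(i == j, 1.0, 2.0)   # off-diagonal products occur twice
    return scale * np.outer(z, z)[i, j]

rng = np.random.default_rng(0)
K = np.zeros((m, n))                     # initial policy u = -K x (A is stable)
for _ in range(20):                      # policy iteration
    Phi, y = [], []
    for _ in range(60):                  # gather excited input/state data
        x = rng.standard_normal(n)
        u = -K @ x + 0.5 * rng.standard_normal(m)   # exploration noise
        xn = A @ x + B @ u
        un = -K @ xn                     # on-policy successor action
        y.append(x @ Qc @ x + u @ Rc @ u)
        # Bellman residual: Q(z_k) - Q(z_{k+1}) = stage cost
        Phi.append(quad_basis(np.concatenate([x, u]))
                   - quad_basis(np.concatenate([xn, un])))
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = np.zeros((n + m, n + m))         # rebuild the symmetric Q-kernel
    H[np.triu_indices(n + m)] = h
    H = H + H.T - np.diag(np.diag(H))
    K = np.linalg.solve(H[n:, n:], H[n:, :n])   # greedy policy improvement

print(K)   # learned feedback gain
```

Policy evaluation is a least-squares fit of the Q-function kernel from data, and policy improvement minimizes the fitted quadratic over the input, so no model identification step is needed; the paper extends this idea to multiple players and to a state representation built from historical input/output data.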
Pages: 1338-1352 (15 pages)
Related Papers (50 total)
  • [31] Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games
    Li, Xinxing
    Peng, Zhihong
    Jiao, Lei
    Xi, Lele
    Cai, Junqi
    SCIENCE CHINA-INFORMATION SCIENCES, 2019, 62 (12)
  • [32] Online finite-horizon optimal learning algorithm for nonzero-sum games with partially unknown dynamics and constrained inputs
    Cui, Xiaohong
    Zhang, Huaguang
    Luo, Yanhong
    Zu, Peifu
    NEUROCOMPUTING, 2016, 185 : 37 - 44
  • [33] Development of dynamic scheduling in semiconductor manufacturing using a Q-learning approach
    Shiue, Yeou-Ren
    Lee, Ken-Chuan
    Su, Chao-Ton
    INTERNATIONAL JOURNAL OF COMPUTER INTEGRATED MANUFACTURING, 2022, 35 (10-11) : 1188 - 1204
  • [34] Event-Triggered Data-Driven Control of Nonlinear Systems via Q-Learning
    Shen, Mouquan
    Wang, Xianming
    Zhu, Song
    Huang, Tingwen
    Wang, Qing-Guo
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2025, 55 (02): 1069 - 1077
  • [35] Fuzzy adaptive Q-learning method with dynamic learning parameters
    Maeda, Y
    JOINT 9TH IFSA WORLD CONGRESS AND 20TH NAFIPS INTERNATIONAL CONFERENCE, PROCEEDINGS, VOLS. 1-5, 2001, : 2778 - 2780
  • [36] A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems
    Lin, Mingduo
    Liu, Derong
    Zhao, Bo
    Dai, Qionghai
    Dong, Yi
    2019 9TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST2019), 2019, : 6 - 10
  • [37] Data-driven adaptive dynamic programming schemes for non-zero-sum games of unknown discrete-time nonlinear systems
    Jiang, He
    Zhang, Huaguang
    Zhang, Kun
    Cui, Xiaohong
    NEUROCOMPUTING, 2018, 275 : 649 - 658
  • [38] Particle Swarm optimization-Based Neuro-Dynamic Programming for Nonzero-Sum Games of Multi-Player Nonlinear Systems
    Wu, Qiuye
    Zhao, Bo
    Liu, Derong
    2022 IEEE International Conference on Real-Time Computing and Robotics, RCAR 2022, 2022, : 733 - 737
  • [39] Distributed adaptive dynamic programming for data-driven optimal control
    Tang, Wentao
    Daoutidis, Prodromos
    SYSTEMS & CONTROL LETTERS, 2018, 120 : 36 - 43
  • [40] Online event-based adaptive critic design with experience replay to solve partially unknown multi-player nonzero-sum games
    Liu, Pengda
    Zhang, Huaguang
    Su, Hanguang
    Ren, He
    NEUROCOMPUTING, 2021, 458 : 219 - 231