Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method

Cited by: 25
Authors
Wang, Wei [1 ,2 ]
Chen, Xin [1 ,2 ]
Fu, Hao [1 ,2 ]
Wu, Min [1 ,2 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming; nonzero-sum games; partially observable; Q-learning; MULTIAGENT SYSTEMS; STABILITY; ALGORITHM; NETWORKS;
DOI
10.1080/00207721.2019.1599463
Chinese Library Classification (CLC)
TP [Automation and Computer Technology];
Discipline Classification Code
0812;
Abstract
This paper is concerned with a class of discrete-time linear nonzero-sum games in which the system state is only partially observable. As is well known, the optimal control policy for nonzero-sum games relies on full state measurement, which is difficult to obtain in a partially observable environment. Moreover, achieving optimal control requires an accurate system model. To overcome these deficiencies, this paper develops a data-driven adaptive dynamic programming method based on Q-learning that uses only measurable input/output data and requires no knowledge of the system dynamics. First, a representation of the unmeasurable internal system state is constructed from historical input/output data. Then, based on this representation state, a Q-function-based policy iteration approach with convergence analysis is introduced to approximate the optimal control policies iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples are provided to demonstrate the effectiveness of the developed approach.
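To make the Q-learning idea in the abstract concrete, the sketch below is a minimal, hypothetical single-controller version of model-free Q-function policy iteration for a discrete-time linear-quadratic problem: the quadratic Q-kernel H is fitted by least squares from measured trajectories, and the feedback gain is improved from H alone, with no use of the system matrices in the learning update. The plant matrices, cost weights, noise level, and iteration counts are illustrative assumptions only; the paper itself goes further by treating two-player nonzero-sum games, replacing the state with an input/output-history representation, and implementing the scheme with an NN actor-critic.

```python
import numpy as np

def quad_features(z):
    """Upper-triangular quadratic basis so that w @ quad_features(z) == z @ H @ z
    when w stacks H[i, j] for i <= j (off-diagonal terms carry a factor of 2)."""
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def unpack_H(w, n):
    """Rebuild the symmetric Q-kernel H from its stacked upper triangle."""
    H = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = w[idx]
            idx += 1
    return H

# Hypothetical stable second-order plant and quadratic cost (illustrative numbers only).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [0.5]])
Qc = np.eye(2)
Rc = np.array([[1.0]])
nx, nu = 2, 1

rng = np.random.default_rng(0)
K = np.zeros((nu, nx))              # initial admissible policy u = -K x

for it in range(8):                 # policy-iteration loop
    Phi, targets = [], []
    x = rng.standard_normal(nx)
    for k in range(400):            # collect excited closed-loop data
        u = -K @ x + 0.5 * rng.standard_normal(nu)      # probing noise for identifiability
        x_next = A @ x + B @ u                           # measured transition (data only)
        cost = x @ Qc @ x + u @ Rc @ u                   # measured stage cost
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])   # next action follows current policy
        Phi.append(quad_features(z) - quad_features(z_next))  # Bellman-equation regressor
        targets.append(cost)
        x = x_next
    # Policy evaluation: fit Q_K(x, u) = [x; u]' H [x; u] by least squares.
    w, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
    H = unpack_H(w, nx + nu)
    Hux, Huu = H[nx:, :nx], H[nx:, nx:]
    K = np.linalg.solve(Huu, Hux)   # policy improvement using the learned kernel only

print("learned feedback gain K:", K)
```

In this simplified setting the fitted gain approaches the LQR solution after a few iterations; the paper's method applies the same evaluate-then-improve pattern per player, but over a history-based representation state and with neural-network function approximation.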
Pages: 1338-1352
Page count: 15
Related papers
50 in total (items 41-50 shown below)
  • [41] Q-Learning: A Data Analysis Method for Constructing Adaptive Interventions
    Nahum-Shani, Inbal
    Qian, Min
    Almirall, Daniel
    Pelham, William E.
    Gnagy, Beth
    Fabiano, Gregory A.
    Waxmonsky, James G.
    Yu, Jihnhee
    Murphy, Susan A.
    PSYCHOLOGICAL METHODS, 2012, 17 (04) : 478 - 494
  • [42] Data-driven approximate Q-learning stabilization with optimality error bound analysis
    Li, Yongqiang
    Yang, Chengzan
    Hou, Zhongsheng
    Feng, Yuanjing
    Yin, Chenkun
AUTOMATICA, 2019, 103 : 435 - 442
  • [43] Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games
    Song, Ruizhuo
    Lewis, Frank L.
    Wei, Qinglai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (03) : 704 - 713
  • [44] Adaptive Dynamic Programming and Data-Driven Cooperative Optimal Output Regulation with Adaptive Observers
    Qasem, Omar
    Jebari, Khalid
    Gao, Weinan
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 2538 - 2543
  • [45] Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design
    Bian, Tao
    Jiang, Zhong-Ping
    AUTOMATICA, 2016, 71 : 348 - 360
  • [46] Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator
    Ren, He
    Zhang, Huaguang
    Wen, Yinlei
    Liu, Chong
    NEUROCOMPUTING, 2019, 335 : 96 - 104
  • [47] Anti-lock Braking Systems Data-Driven Control Using Q-Learning
    Radac, Mircea-Bogdan
    Precup, Radu-Emil
    Roman, Raul-Cristian
    2017 IEEE 26TH INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS (ISIE), 2017, : 418 - 423
  • [48] Safe Q-Learning for Data-Driven Nonlinear Optimal Control with Asymmetric State Constraints
    Zhao, Mingming
    Wang, Ding
    Song, Shijie
    Qiao, Junfei
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (12) : 2408 - 2422
  • [50] An adaptive subspace data-driven method for nonlinear dynamic systems
    Sun, Chengyuan
    Kang, Haobo
    Ma, Hongjun
    Bai, Hua
JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2023, 360 (17) : 13596 - 13623