Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method

Cited by: 25
Authors
Wang, Wei [1 ,2 ]
Chen, Xin [1 ,2 ]
Fu, Hao [1 ,2 ]
Wu, Min [1 ,2 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming; nonzero-sum games; partially observable; Q-learning; MULTIAGENT SYSTEMS; STABILITY; ALGORITHM; NETWORKS;
DOI
10.1080/00207721.2019.1599463
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
This paper is concerned with a class of discrete-time linear nonzero-sum games in which the system state is only partially observable. As is well known, the optimal control policy for nonzero-sum games relies on full state measurement, which is difficult to obtain in a partially observable environment. Moreover, achieving optimal control requires an accurate system model. To overcome these deficiencies, this paper develops a data-driven adaptive dynamic programming method based on Q-learning that uses only measurable input/output data and requires no knowledge of the system dynamics. First, a representation of the unmeasurable internal system state is constructed from historical input/output data. Then, based on this representation state, a Q-function-based policy iteration approach with convergence analysis is introduced to approximate the optimal control policy iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples are provided to demonstrate the effectiveness of the developed approach.
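The abstract outlines three ingredients: a representation state built from a window of past inputs and outputs, Q-function-based policy iteration, and an NN actor-critic implementation. The sketch below illustrates only the first two ingredients in a deliberately simplified single-controller linear-quadratic setting; it is not the paper's algorithm, and the plant matrices, cost weights, window length, and all function names are hypothetical choices made for this illustration.

```python
# Illustrative sketch only (not the paper's algorithm): output-feedback
# Q-learning for a single discrete-time linear-quadratic controller, where
# the unmeasured state is replaced by a window of past inputs and outputs.
# All matrices, weights, and the window length N are hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical partially observable plant: x_{k+1} = A x_k + B u_k, y_k = C x_k
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Qy, Ru = 1.0, 1.0      # stage cost r_k = Qy*||y_k||^2 + Ru*||u_k||^2
N = 2                  # history window length (>= observability index)
nz = N * (C.shape[0] + B.shape[1])   # dimension of the I/O representation z_k
nu = B.shape[1]

def stack(y_hist, u_hist):
    """Representation 'state' z_k built from the last N outputs and inputs."""
    return np.concatenate([y_hist.ravel(), u_hist.ravel()])

def quad_features(z, u):
    """Upper-triangular features of the quadratic Q-function Q(z,u) = [z;u]^T H [z;u]."""
    v = np.concatenate([z, u])
    return np.outer(v, v)[np.triu_indices(v.size)]

def collect(K, steps, noise=0.3):
    """Run the plant with u = -K z + exploration noise; return (z, u, r, z_next) tuples."""
    x = np.zeros(2)
    y_hist, u_hist = np.zeros((N, 1)), np.zeros((N, 1))
    z, data = stack(y_hist, u_hist), []
    for _ in range(steps):
        u = -K @ z + noise * rng.standard_normal(nu)
        y = C @ x
        r = Qy * float(y @ y) + Ru * float(u @ u)
        x = A @ x + B @ u
        y_hist = np.vstack([y.reshape(1, -1), y_hist[:-1]])
        u_hist = np.vstack([u.reshape(1, -1), u_hist[:-1]])
        z_next = stack(y_hist, u_hist)
        data.append((z, u, r, z_next))
        z = z_next
    return data

K = np.zeros((nu, nz))   # initial zero policy; admissible here because A is stable
for it in range(10):
    data = collect(K, steps=400)
    # Policy evaluation: least squares on Q(z_k,u_k) - Q(z_{k+1}, -K z_{k+1}) = r_k
    Phi = np.array([quad_features(z, u) - quad_features(zn, -K @ zn) for z, u, r, zn in data])
    rhs = np.array([r for _, _, r, _ in data])
    w, *_ = np.linalg.lstsq(Phi, rhs, rcond=None)
    # Recover the symmetric H matrix from the upper-triangular feature weights
    n = nz + nu
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = w
    H = (H + H.T) / 2.0
    Hzu, Huu = H[:nz, nz:], H[nz:, nz:]
    K = np.linalg.solve(Huu, Hzu.T)   # policy improvement: u = -Huu^{-1} Hzu^T z
    print(f"iteration {it}: ||K|| = {np.linalg.norm(K):.4f}")
```

In the paper's setting there are two players with coupled cost functions and the Q-functions are approximated by critic networks, so the least-squares evaluation step above would be replaced by coupled actor-critic NN updates; the sketch is meant only to convey the input/output representation and policy-iteration structure described in the abstract.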
Pages: 1338-1352
Number of pages: 15
Related Papers
50 records in total
  • [21] Lu, Jingwei; Wei, Qinglai; Wang, Fei-Yue. Parallel Control for Nonzero-Sum Games With Completely Unknown Nonlinear Dynamics via Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025.
  • [22] Zhang, Kun; Zhang, Zhi-Xuan; Xie, Xiang Peng; Rubio, Jose de Jesus. An Unknown Multiplayer Nonzero-Sum Game: Prescribed-Time Dynamic Event-Triggered Control via Adaptive Dynamic Programming. IEEE Transactions on Automation Science and Engineering, 2024.
  • [23] Zhi, Gan; Su, Hanguang; Wang, Rui; Ma, Dazhong. Adaptive Dynamic Programming-based Self-triggered Optimal Control for Nonzero-sum Games of Nonlinear Systems with Constrained State. 2024 3rd Conference on Fully Actuated System Theory and Applications (FASTA 2024), 2024: 1158-1163.
  • [24] Yang, Yongliang; Zhang, Sen; Dong, Jie; Yin, Yixin. Data-Driven Nonzero-Sum Game for Discrete-Time Systems Using Off-Policy Reinforcement Learning. IEEE Access, 2020, 8: 14074-14088.
  • [25] Lewis, F. L.; Vamvoudakis, Kyriakos G. Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2011, 41(1): 14-25.
  • [26] Li, Haoqian; Ye, Honghan; Cheng, Jing-Ru C.; Liu, Kaibo. Online Monitoring of Heterogeneous Partially Observable Data Streams Based on Q-Learning. IEEE Transactions on Automation Science and Engineering, 2024: 1-16.
  • [27] Guo, Siyu; Pan, Yingnan; Li, Hongyi; Cao, Liang. Dynamic Event-Driven ADP for N-Player Nonzero-Sum Games of Constrained Nonlinear Systems. IEEE Transactions on Automation Science and Engineering, 2024.
  • [28] Xin, Liang; Jiang, Hongwei; Wen, Tao; Long, Zhiqiang. Data-Driven Optimal Controller Design for Maglev Train: Q-Learning Method. 2022 34th Chinese Control and Decision Conference (CCDC), 2022: 1289-1294.
  • [29] Li, Xinxing; Peng, Zhihong; Jiao, Lei; Xi, Lele; Cai, Junqi. Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games. Science China Information Sciences, 2019, 62.
  • [30] Li, Xinxing; Peng, Zhihong; Jiao, Lei; Xi, Lele; Cai, Junqi. Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games. Science China Information Sciences, 2019, 62(12): 164-177.