Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method

Cited by: 25
Authors
Wang, Wei [1 ,2 ]
Chen, Xin [1 ,2 ]
Fu, Hao [1 ,2 ]
Wu, Min [1 ,2 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming; nonzero-sum games; partially observable; Q-learning; MULTIAGENT SYSTEMS; STABILITY; ALGORITHM; NETWORKS;
DOI
10.1080/00207721.2019.1599463
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
This paper concerns a class of discrete-time linear nonzero-sum games with a partially observable system state. As is known, the optimal control policy for nonzero-sum games relies on full state measurement, which is hard to obtain in a partially observable environment. Moreover, achieving optimal control requires an accurate system model. To overcome these deficiencies, this paper develops a data-driven adaptive dynamic programming method via Q-learning, using measurable input/output data without any knowledge of the system. First, a representation of the unmeasurable internal system state is built from historical input/output data. Then, based on this representation state, a Q-function-based policy iteration approach with convergence analysis is introduced to approximate the optimal control policy iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples are provided to demonstrate the effectiveness of the developed approach.
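To illustrate the Q-function-based policy iteration the abstract describes, the following is a minimal sketch for the single-player LQR special case with a fully measurable state (the paper's actual algorithm handles nonzero-sum games and reconstructs the state from input/output data; the names `quad_basis`, `K`, and `H` here are illustrative assumptions, not the paper's notation). The system matrices `A`, `B` are used only to generate data, never by the learner.

```python
import numpy as np

# Model-free Q-learning policy iteration sketch (LQR special case).
# A, B generate data only; the learner sees costs and state/input samples.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.eye(1)        # quadratic stage-cost weights
n, m = 2, 1

def quad_basis(z):
    """Quadratic basis so that z' H z is linear in H's upper-triangular part."""
    i, j = np.triu_indices(len(z))
    scale = np.where(i == j, 1.0, 2.0)   # off-diagonal products occur twice
    return scale * np.outer(z, z)[i, j]

rng = np.random.default_rng(0)
K = np.zeros((m, n))                     # initial policy u = -K x (A is stable)
for _ in range(20):                      # policy iteration
    Phi, y = [], []
    for _ in range(60):                  # gather excited input/state data
        x = rng.standard_normal(n)
        u = -K @ x + 0.5 * rng.standard_normal(m)   # exploration noise
        xn = A @ x + B @ u
        un = -K @ xn                     # on-policy successor action
        y.append(x @ Qc @ x + u @ Rc @ u)
        # Bellman residual: Q(z_k) - Q(z_{k+1}) = stage cost
        Phi.append(quad_basis(np.concatenate([x, u]))
                   - quad_basis(np.concatenate([xn, un])))
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = np.zeros((n + m, n + m))         # rebuild the symmetric Q-kernel
    H[np.triu_indices(n + m)] = h
    H = H + H.T - np.diag(np.diag(H))
    K = np.linalg.solve(H[n:, n:], H[n:, :n])   # greedy policy improvement

print(K)   # learned feedback gain
```

Policy evaluation is a least-squares fit of the Q-function kernel from data, and policy improvement minimizes the fitted quadratic over the input, so no model identification step is needed; the paper extends this idea to multiple players and to a state representation built from historical input/output data.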
Pages: 1338-1352 (15 pages)
Related Papers (50 total)
  • [31] Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games
    Li, Xinxing
    Peng, Zhihong
    Jiao, Lei
    Xi, Lele
    Cai, Junqi
    SCIENCE CHINA-INFORMATION SCIENCES, 2019, 62 (12)
  • [32] Online finite-horizon optimal learning algorithm for nonzero-sum games with partially unknown dynamics and constrained inputs
    Cui, Xiaohong
    Zhang, Huaguang
    Luo, Yanhong
    Zu, Peifu
    NEUROCOMPUTING, 2016, 185 : 37 - 44
  • [33] Development of dynamic scheduling in semiconductor manufacturing using a Q-learning approach
    Shiue, Yeou-Ren
    Lee, Ken-Chuan
    Su, Chao-Ton
    INTERNATIONAL JOURNAL OF COMPUTER INTEGRATED MANUFACTURING, 2022, 35 (10-11) : 1188 - 1204
  • [34] Event-Triggered Data-Driven Control of Nonlinear Systems via Q-Learning
    Shen, Mouquan
    Wang, Xianming
    Zhu, Song
    Huang, Tingwen
    Wang, Qing-Guo
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2025, 55 (02): 1069 - 1077
  • [35] Fuzzy adaptive Q-learning method with dynamic learning parameters
    Maeda, Y
    JOINT 9TH IFSA WORLD CONGRESS AND 20TH NAFIPS INTERNATIONAL CONFERENCE, PROCEEDINGS, VOLS. 1-5, 2001, : 2778 - 2780
  • [36] A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems
    Lin, Mingduo
    Liu, Derong
    Zhao, Bo
    Dai, Qionghai
    Dong, Yi
    2019 9TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST2019), 2019, : 6 - 10
  • [37] Data-driven adaptive dynamic programming schemes for non-zero-sum games of unknown discrete-time nonlinear systems
    Jiang, He
    Zhang, Huaguang
    Zhang, Kun
    Cui, Xiaohong
    NEUROCOMPUTING, 2018, 275 : 649 - 658
  • [38] Particle Swarm optimization-Based Neuro-Dynamic Programming for Nonzero-Sum Games of Multi-Player Nonlinear Systems
    Wu, Qiuye
    Zhao, Bo
    Liu, Derong
    2022 IEEE International Conference on Real-Time Computing and Robotics, RCAR 2022, 2022, : 733 - 737
  • [39] Distributed adaptive dynamic programming for data-driven optimal control
    Tang, Wentao
    Daoutidis, Prodromos
    SYSTEMS & CONTROL LETTERS, 2018, 120 : 36 - 43
  • [40] Online event-based adaptive critic design with experience replay to solve partially unknown multi-player nonzero-sum games
    Liu, Pengda
    Zhang, Huaguang
    Su, Hanguang
    Ren, He
    NEUROCOMPUTING, 2021, 458 : 219 - 231