Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method

Cited by: 25
Authors
Wang, Wei [1 ,2 ]
Chen, Xin [1 ,2 ]
Fu, Hao [1 ,2 ]
Wu, Min [1 ,2 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming; nonzero-sum games; partially observable; Q-learning; MULTIAGENT SYSTEMS; STABILITY; ALGORITHM; NETWORKS;
DOI
10.1080/00207721.2019.1599463
Chinese Library Classification (CLC)
TP [Automation and Computer Technology];
Discipline Classification Code
0812;
Abstract
This paper is concerned with a class of discrete-time linear nonzero-sum games in which the system state is only partially observable. As is well known, the optimal control policy for nonzero-sum games relies on full state measurement, which is difficult to obtain in a partially observable environment. Moreover, achieving optimal control requires an accurate system model. To overcome these deficiencies, this paper develops a data-driven adaptive dynamic programming method based on Q-learning that uses only measurable input/output data and requires no knowledge of the system dynamics. First, a representation of the unmeasurable internal system state is constructed from historical input/output data. Then, based on this representation state, a Q-function-based policy iteration approach with convergence analysis is introduced to approximate the optimal control policies iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples are provided to demonstrate the effectiveness of the developed approach.
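To make the Q-learning idea in the abstract concrete, the sketch below is a minimal, hypothetical single-controller version of model-free Q-function policy iteration for a discrete-time linear-quadratic problem: the quadratic Q-kernel H is fitted by least squares from measured trajectories, and the feedback gain is improved from H alone, with no use of the system matrices in the learning update. The plant matrices, cost weights, noise level, and iteration counts are illustrative assumptions only; the paper itself goes further by treating two-player nonzero-sum games, replacing the state with an input/output-history representation, and implementing the scheme with an NN actor-critic.

```python
import numpy as np

def quad_features(z):
    """Upper-triangular quadratic basis so that w @ quad_features(z) == z @ H @ z
    when w stacks H[i, j] for i <= j (off-diagonal terms carry a factor of 2)."""
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def unpack_H(w, n):
    """Rebuild the symmetric Q-kernel H from its stacked upper triangle."""
    H = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = w[idx]
            idx += 1
    return H

# Hypothetical stable second-order plant and quadratic cost (illustrative numbers only).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [0.5]])
Qc = np.eye(2)
Rc = np.array([[1.0]])
nx, nu = 2, 1

rng = np.random.default_rng(0)
K = np.zeros((nu, nx))              # initial admissible policy u = -K x

for it in range(8):                 # policy-iteration loop
    Phi, targets = [], []
    x = rng.standard_normal(nx)
    for k in range(400):            # collect excited closed-loop data
        u = -K @ x + 0.5 * rng.standard_normal(nu)      # probing noise for identifiability
        x_next = A @ x + B @ u                           # measured transition (data only)
        cost = x @ Qc @ x + u @ Rc @ u                   # measured stage cost
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])   # next action follows current policy
        Phi.append(quad_features(z) - quad_features(z_next))  # Bellman-equation regressor
        targets.append(cost)
        x = x_next
    # Policy evaluation: fit Q_K(x, u) = [x; u]' H [x; u] by least squares.
    w, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
    H = unpack_H(w, nx + nu)
    Hux, Huu = H[nx:, :nx], H[nx:, nx:]
    K = np.linalg.solve(Huu, Hux)   # policy improvement using the learned kernel only

print("learned feedback gain K:", K)
```

In this simplified setting the fitted gain approaches the LQR solution after a few iterations; the paper's method applies the same evaluate-then-improve pattern per player, but over a history-based representation state and with neural-network function approximation.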
Pages: 1338-1352
Page count: 15
Related papers
50 in total (items 41-50 shown below)
  • [41] Q-Learning: A Data Analysis Method for Constructing Adaptive Interventions
    Nahum-Shani, Inbal
    Qian, Min
    Almirall, Daniel
    Pelham, William E.
    Gnagy, Beth
    Fabiano, Gregory A.
    Waxmonsky, James G.
    Yu, Jihnhee
    Murphy, Susan A.
    PSYCHOLOGICAL METHODS, 2012, 17 (04) : 478 - 494
  • [42] Data-driven approximate Q-learning stabilization with optimality error bound analysis
    Li, Yongqiang
    Yang, Chengzan
    Hou, Zhongsheng
    Feng, Yuanjing
    Yin, Chenkun
AUTOMATICA, 2019, 103 : 435 - 442
  • [43] Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games
    Song, Ruizhuo
    Lewis, Frank L.
    Wei, Qinglai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (03) : 704 - 713
  • [44] Adaptive Dynamic Programming and Data-Driven Cooperative Optimal Output Regulation with Adaptive Observers
    Qasem, Omar
    Jebari, Khalid
    Gao, Weinan
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 2538 - 2543
  • [45] Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design
    Bian, Tao
    Jiang, Zhong-Ping
    AUTOMATICA, 2016, 71 : 348 - 360
  • [46] Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator
    Ren, He
    Zhang, Huaguang
    Wen, Yinlei
    Liu, Chong
    NEUROCOMPUTING, 2019, 335 : 96 - 104
  • [47] Anti-lock Braking Systems Data-Driven Control Using Q-Learning
    Radac, Mircea-Bogdan
    Precup, Radu-Emil
    Roman, Raul-Cristian
    2017 IEEE 26TH INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS (ISIE), 2017, : 418 - 423
  • [48] Safe Q-Learning for Data-Driven Nonlinear Optimal Control with Asymmetric State Constraints
    Zhao, Mingming
    Wang, Ding
    Song, Shijie
    Qiao, Junfei
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (12) : 2408 - 2422
  • [50] An adaptive subspace data-driven method for nonlinear dynamic systems
    Sun, Chengyuan
    Kang, Haobo
    Ma, Hongjun
    Bai, Hua
JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2023, 360 (17) : 13596 - 13623