Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method

Cited by: 25
|
Authors
Wang, Wei [1 ,2 ]
Chen, Xin [1 ,2 ]
Fu, Hao [1 ,2 ]
Wu, Min [1 ,2 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
Adaptive dynamic programming; nonzero-sum games; partially observable; Q-learning; MULTIAGENT SYSTEMS; STABILITY; ALGORITHM; NETWORKS;
DOI
10.1080/00207721.2019.1599463
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
This paper is concerned with a class of discrete-time linear nonzero-sum games in which the system state is only partially observable. The optimal control policy for such games is known to rely on full state measurement, which is difficult to obtain in a partially observable environment; moreover, achieving optimal control ordinarily requires an accurate system model. To overcome these limitations, this paper develops a data-driven adaptive dynamic programming method based on Q-learning that uses only measurable input/output data and requires no knowledge of the system dynamics. First, a representation of the unmeasurable internal system state is constructed from historical input/output data. Then, based on this representation state, a Q-function-based policy iteration approach with convergence analysis is introduced to approximate the optimal control policy iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples demonstrate the effectiveness of the developed approach.
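The policy-iteration idea underlying the abstract can be illustrated with a minimal sketch. Note this is *not* the paper's data-driven method: the paper recovers the state from input/output data and learns the Q-function without a model, whereas the sketch below assumes a known two-player discrete-time linear-quadratic game (all matrices `A`, `B1`, `B2`, `Q1`, `Q2`, `R11`, `R12`, `R21`, `R22` and the `dlyap` helper are illustrative assumptions) and alternates policy evaluation (coupled Lyapunov equations) with policy improvement (each player minimizes its own Q-function over its own input):

```python
import numpy as np

def dlyap(Ac, Qb):
    """Solve P = Ac.T @ P @ Ac + Qb via the Kronecker/vec identity."""
    n = Ac.shape[0]
    M = np.kron(Ac.T, Ac.T)
    p = np.linalg.solve(np.eye(n * n) - M, Qb.flatten())
    return p.reshape(n, n)

# Two-player discrete-time LQ nonzero-sum game (illustrative numbers):
#   x_{k+1} = A x_k + B1 u_k + B2 w_k,
#   player i minimizes J_i = sum_k x'Q_i x + u'R_{i1} u + w'R_{i2} w.
A  = np.array([[0.9, 0.1], [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q1 = Q2 = np.eye(2)
R11 = R22 = np.array([[1.0]])
R12 = R21 = np.array([[0.5]])

K1 = np.zeros((1, 2))  # initial admissible feedback gains (A itself is stable)
K2 = np.zeros((1, 2))
for _ in range(200):
    Ac = A - B1 @ K1 - B2 @ K2
    # Policy evaluation: each player's cost under the current joint policy
    P1 = dlyap(Ac, Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2)
    P2 = dlyap(Ac, Q2 + K1.T @ R21 @ K1 + K2.T @ R22 @ K2)
    # Policy improvement: argmin of each player's Q-function over its own input
    K1_new = np.linalg.solve(R11 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    K2_new = np.linalg.solve(R22 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))
    done = np.linalg.norm(K1_new - K1) + np.linalg.norm(K2_new - K2) < 1e-10
    K1, K2 = K1_new, K2_new
    if done:
        break

print("K1 =", K1)
print("K2 =", K2)
print("closed-loop spectral radius:",
      max(abs(np.linalg.eigvals(A - B1 @ K1 - B2 @ K2))))
```

At convergence the gain pair is stationary for the coupled equations, i.e. an approximate feedback Nash equilibrium for this toy game; the paper's contribution is carrying out the analogous iteration from measurable input/output data alone, with NN actor-critic approximators in place of the exact matrix solves above.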
Pages: 1338 - 1352 (15 pages)
Related Papers (50 in total)
  • [1] Data-Driven Adaptive Dynamic Programming for Two-Player Nonzero-Sum Game
    Zhang, Qichao
    Zhao, Dongbin
    Zhou, Yafei
    2017 29TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2017, : 3445 - 3450
  • [2] Data-Driven Partially Observable Dynamic Processes Using Adaptive Dynamic Programming
    Zhong, Xiangnan
    Ni, Zhen
    Tang, Yufei
    He, Haibo
    2014 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL), 2014, : 156 - 163
  • [3] Policy iteration based Q-learning for linear nonzero-sum quadratic differential games
    Li, Xinxing
    Peng, Zhihong
    Liang, Li
    Zha, Wenzhong
    SCIENCE CHINA-INFORMATION SCIENCES, 2019, 62 (05) : 195 - 213
  • [4] Q-Learning for Feedback Nash Strategy of Finite-Horizon Nonzero-Sum Difference Games
    Zhang, Zhaorong
    Xu, Juanjuan
    Fu, Minyue
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (09) : 9170 - 9178
  • [5] Optimal Regulation Strategy for Nonzero-Sum Games of the Immune System Using Adaptive Dynamic Programming
    Sun, Jiayue
    Zhang, Huaguang
    Yan, Ying
    Xu, Shun
    Fan, Xiaoxi
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (03) : 1475 - 1484
  • [6] An Online Q-Learning Method for Linear-Quadratic Nonzero-Sum Stochastic Differential Games with Completely Unknown Dynamics
    Zhang, Bao-Qiang
    Wang, Bing-Chang
    Cao, Ying
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2024, 37 (05) : 1907 - 1922
  • [7] Off-policy based adaptive dynamic programming method for nonzero-sum games on discrete-time system
    Wen, Yinlei
    Zhang, Huaguang
    Ren, He
    Zhang, Kun
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2020, 357 (12) : 8059 - 8081