Policy iteration based Q-learning for linear nonzero-sum quadratic differential games

Authors
Xinxing Li
Zhihong Peng
Li Liang
Wenzhong Zha
Affiliations
[1] Beijing Institute of Technology, School of Automation
[2] State Key Laboratory of Intelligent Control and Decision of Complex System, Information Science Academy
[3] China Electronics Technology Group Corporation
Keywords
adaptive dynamic programming; ADP; Q-learning; reinforcement learning; RL; linear nonzero-sum quadratic differential games; policy iteration; PI; off-policy
Abstract
This paper proposes a policy iteration (PI)-based Q-learning algorithm for solving infinite-horizon linear nonzero-sum quadratic differential games with completely unknown dynamics. The algorithm employs off-policy reinforcement learning (RL) and learns the Nash equilibrium and the corresponding value functions online from data sets generated by behavior policies. First, we prove that the proposed off-policy Q-learning algorithm is equivalent to an offline PI algorithm when specific initial admissible policies, which can themselves be learned online, are selected. Then, we prove convergence of the off-policy Q-learning algorithm under a mild rank condition that can be easily met by injecting appropriate probing noise into the behavior policies. The generated data sets can be reused throughout the learning process, which makes the algorithm computationally efficient. Simulation results demonstrate the effectiveness of the proposed Q-learning algorithm.
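The offline PI algorithm mentioned in the abstract can be illustrated on a toy problem. The following is a minimal sketch only, not the paper's Q-learning method: it is model-based (it uses the dynamics directly), it is restricted to a scalar two-player game, and every name and numeric value in it (`policy_iteration`, `riccati_residuals`, `a`, `b`, `q`, `r`) is a hypothetical choice made for illustration. Each iteration evaluates the current linear feedback gains through a scalar Lyapunov equation and then improves each player's gain.

```python
# Scalar two-player game: dx/dt = a*x + b1*u1 + b2*u2, with u_i = -k_i * x.
# Player i minimizes the integral of q_i*x^2 + r_i1*u1^2 + r_i2*u2^2.
# All numeric values below are hypothetical, chosen only for illustration.

def policy_iteration(a, b, q, r, k, iterations=50):
    """Model-based (offline) PI for a scalar two-player nonzero-sum LQ game.

    b, q, k are 2-element lists; r[i][j] weights u_j in player i's cost.
    Returns the value coefficients p (V_i(x) = p_i * x^2) and the gains k.
    """
    p = [0.0, 0.0]
    for _ in range(iterations):
        a_cl = a - b[0] * k[0] - b[1] * k[1]  # closed-loop pole
        if a_cl >= 0:
            raise ValueError("gains are not admissible (closed loop unstable)")
        # Policy evaluation: scalar Lyapunov equation 2*a_cl*p_i + cost_i = 0.
        p = [
            (q[i] + r[i][0] * k[0] ** 2 + r[i][1] * k[1] ** 2) / (-2.0 * a_cl)
            for i in range(2)
        ]
        # Policy improvement: k_i = b_i * p_i / r_ii.
        k = [b[i] * p[i] / r[i][i] for i in range(2)]
    return p, k


def riccati_residuals(a, b, q, r, p):
    """Residuals of the coupled scalar Riccati equations evaluated at p."""
    k = [b[i] * p[i] / r[i][i] for i in range(2)]
    a_cl = a - b[0] * k[0] - b[1] * k[1]
    return [
        2.0 * a_cl * p[i] + q[i] + r[i][0] * k[0] ** 2 + r[i][1] * k[1] ** 2
        for i in range(2)
    ]


a, b = -1.0, [1.0, 1.0]          # open-loop stable, so k = [0, 0] is admissible
q = [1.0, 2.0]
r = [[1.0, 0.5], [0.5, 1.0]]
p, k = policy_iteration(a, b, q, r, k=[0.0, 0.0])
residuals = riccati_residuals(a, b, q, r, p)
```

For these particular values the iteration settles at a point where both coupled Riccati residuals vanish, which characterizes a feedback Nash equilibrium of the toy game. Convergence of this kind of iteration is not guaranteed for arbitrary parameters, and the off-policy algorithm described in the abstract would replace the model-based evaluation step with estimates from measured data.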
Related papers
  • [1] Policy iteration based Q-learning for linear nonzero-sum quadratic differential games
    Xinxing LI
    Zhihong PENG
    Li LIANG
    Wenzhong ZHA
    Science China (Information Sciences), 2019, 62 (05) : 195 - 213
  • [2] Policy iteration based Q-learning for linear nonzero-sum quadratic differential games
    Li, Xinxing
    Peng, Zhihong
    Liang, Li
    Zha, Wenzhong
    SCIENCE CHINA-INFORMATION SCIENCES, 2019, 62 (05)
  • [3] An Online Q-Learning Method for Linear-Quadratic Nonzero-Sum Stochastic Differential Games with Completely Unknown Dynamics
    Zhang, Bao-Qiang
    Wang, Bing-Chang
    Cao, Ying
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2024, 37 (05) : 1907 - 1922
  • [4] An Online Q-Learning Method for Linear-Quadratic Nonzero-Sum Stochastic Differential Games with Completely Unknown Dynamics
    ZHANG Bao-Qiang
    WANG Bing-Chang
    CAO Ying
    Journal of Systems Science & Complexity, 2024, 37 (05) : 1907 - 1922
  • [5] Stochastic adaptive linear quadratic nonzero-sum differential games
    Tian, Xiu-Qin
    Liu, Shu-Jun
    Yang, Xue
    APPLIED MATHEMATICS AND COMPUTATION, 2024, 477
  • [6] Linear quadratic nonzero-sum differential games with random jumps
    Wu, Z
    Yu, ZY
    APPLIED MATHEMATICS AND MECHANICS-ENGLISH EDITION, 2005, 26 (08) : 1034 - 1039
  • [7] Linear quadratic nonzero-sum differential games with random jumps
    Wu Zhen
    Yu Zhi-yong
    Applied Mathematics and Mechanics, 2005, 26 (8) : 1034 - 1039
  • [8] LINEAR QUADRATIC NONZERO-SUM DIFFERENTIAL GAMES WITH RANDOM JUMPS
    吴臻
    于志勇
    Applied Mathematics and Mechanics (English Edition), 2005, (08) : 1034 - 1039
  • [9] Infinite Time Nonzero-Sum Linear Quadratic Stochastic Differential Games
    Sun Huiying
    Li Meng
    Zhang Weihai
    PROCEEDINGS OF THE 29TH CHINESE CONTROL CONFERENCE, 2010, : 1081 - 1084
  • [10] Policy iteration algorithm for nonzero-sum games with unknown models
    School of Information Science & Engineering, Northeastern University, Shenyang 110819, China
    Unknown, 110000, China
    Dongbei Daxue Xuebao, 3 (318-321 and 326):