Integral Reinforcement Learning for Linear Continuous-Time Zero-Sum Games With Completely Unknown Dynamics

Cited by: 169
Authors
Li, Hongliang [1 ]
Liu, Derong [1 ]
Wang, Ding [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Author keywords: Adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; reinforcement learning; policy iteration; zero-sum games
Keywords Plus: ADAPTIVE OPTIMAL-CONTROL; NONLINEAR-SYSTEMS; FEEDBACK-CONTROL; CONTROL SCHEME; ARCHITECTURE; MANAGEMENT; ALGORITHM; EQUATION; DESIGNS
DOI
10.1109/TASE.2014.2300532
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
In this paper, we develop an integral reinforcement learning algorithm based on policy iteration to learn online the Nash equilibrium solution of a two-player zero-sum differential game with completely unknown linear continuous-time dynamics. The algorithm is fully model-free and solves the game algebraic Riccati equation forward in time. It updates the value function, the control policy, and the disturbance policy simultaneously, and its convergence is established by showing that the iteration is equivalent to Newton's method. For implementation, one critic network and two action networks approximate the game value function, the control policy, and the disturbance policy, respectively, and the least-squares method estimates the unknown parameters. The effectiveness of the developed scheme is demonstrated in simulation by designing an H-infinity state-feedback controller for a power system.

Note to Practitioners: The noncooperative zero-sum differential game provides an ideal tool for studying multiplayer optimal decision and control problems. Existing approaches usually compute the Nash equilibrium solution by offline iteration and require exact knowledge of the system dynamics, which is difficult to obtain for many real-world industrial systems. The algorithm developed in this paper is fully model-free: it solves the zero-sum differential game forward in time from online measured data, so it is unaffected by mismatch between an identification model and the real system and responds quickly to changes in the dynamics. Exploration signals satisfying a persistence-of-excitation condition are required to update the value function and the policies, but these signals do not affect the convergence of the learning process. The least-squares method yields the approximate solution of the zero-sum game with unknown dynamics. The algorithm is applied to load-frequency controller design for a power system whose parameters are not known a priori. In future research, we will extend the results to zero-sum and nonzero-sum differential games with completely unknown nonlinear continuous-time dynamics.
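For orientation, the following is a minimal Python sketch of the policy-iteration structure the abstract describes. It is not the paper's model-free algorithm: the published method replaces the Lyapunov solves below with least-squares fits to online measured data, whereas this sketch assumes the system matrices A, B, and D are known purely to make the Newton-type iteration on the game algebraic Riccati equation visible. The numerical values (a hypothetical second-order system, Q = I, R = I, gamma = 2) are illustrative and are not taken from the paper's power-system example.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical system: dx/dt = A x + B u + D w, with u the control and
# w the disturbance. A is open-loop stable, so zero initial gains suffice.
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])   # control input matrix
D = np.array([[0.5], [0.0]])   # disturbance input matrix
Q = np.eye(2)
R = np.eye(1)
gamma = 2.0  # H-infinity attenuation level, assumed above the attainable optimum

K = np.zeros((1, 2))  # initial control gain (stabilizing, since A is stable)
L = np.zeros((1, 2))  # initial disturbance gain

for i in range(50):
    # Policy evaluation: with u = -K x and w = L x, solve the Lyapunov equation
    #   (A - BK + DL)^T P + P (A - BK + DL) + Q + K^T R K - gamma^2 L^T L = 0.
    # This assumes the closed loop stays stable at every iteration.
    Acl = A - B @ K + D @ L
    Qi = Q + K.T @ R @ K - gamma**2 * (L.T @ L)
    P = solve_continuous_lyapunov(Acl.T, -Qi)

    # Policy improvement: simultaneous update of both players' gains,
    # mirroring the paper's simultaneous value/control/disturbance update.
    K_new = np.linalg.solve(R, B.T @ P)
    L_new = (D.T @ P) / gamma**2
    if np.linalg.norm(K_new - K) + np.linalg.norm(L_new - L) < 1e-10:
        K, L = K_new, L_new
        break
    K, L = K_new, L_new

# Residual of the game algebraic Riccati equation (GARE):
#   A^T P + P A + Q - P B R^{-1} B^T P + gamma^{-2} P D D^T P = 0
res = (A.T @ P + P @ A + Q
       - P @ B @ np.linalg.solve(R, B.T) @ P
       + P @ D @ D.T @ P / gamma**2)
print("iterations:", i + 1, "GARE residual norm:", np.linalg.norm(res))

Each pass of the loop is one Newton-type step toward the GARE solution. In the paper's model-free version, P is instead identified by least squares from integral reinforcement data measured along the system trajectory, so A, B, and D never appear explicitly.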
Pages: 706-714
Number of pages: 9
Related Papers (50 records in total)
  • [1] Online Solution of Two-Player Zero-Sum Games for Continuous-Time Nonlinear Systems With Completely Unknown Dynamics
    Fu, Yue
    Chai, Tianyou
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (12) : 2577 - 2587
  • [2] Continuous-time zero-sum games with probability criterion
    Bhabak, Arnab
    Saha, Subhamay
    STOCHASTIC ANALYSIS AND APPLICATIONS, 2021, 39 (06) : 1130 - 1143
  • [3] Data-Driven Integral Reinforcement Learning for Continuous-Time Non-Zero-Sum Games
    Yang, Yongliang
    Wang, Liming
    Modares, Hamidreza
    Ding, Dawei
    Yin, Yixin
    Wunsch, Donald
    IEEE ACCESS, 2019, 7 : 82901 - 82912
  • [4] Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems
    Yasini, Sholeh
    Karimpour, Ali
    Sistani, Mohammad-Bagher Naghibi
    Modares, Hamidreza
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2015, 29 (04) : 473 - 493
  • [5] Bias and overtaking equilibria for zero-sum continuous-time Markov games
    Prieto-Rumeau, Tomás
    Hernández-Lerma, Onésimo
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2005, 61 (03) : 437 - 454
  • [6] Extremal Shift Rule for Continuous-Time Zero-Sum Markov Games
    Averboukh, Yurii
    DYNAMIC GAMES AND APPLICATIONS, 2017, 7 (01) : 1 - 20
  • [7] Event-Triggered Single-Network Control for Nonlinear Continuous-Time Zero-Sum Games with Partially Unknown Dynamics
    Peng, Binbin
    Cui, Xiaohong
    Wan, Zimeng
    Yu, Haijiao
    2022 1st International Conference on Cyber-Energy Systems and Intelligent Energy (ICCSIE 2022), 2023
  • [8] Q-learning for continuous-time graphical games on large networks with completely unknown linear system dynamics
    Vamvoudakis, Kyriakos G.
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2017, 27 (16) : 2900 - 2920