Data-Driven Inverse Reinforcement Learning Control for Linear Multiplayer Games

Cited by: 30
Authors
Lian, Bosen [1]
Donge, Vrushabh S. [1]
Lewis, Frank L. [1]
Chai, Tianyou [2, 3]
Davoudi, Ali [1]
Affiliations
[1] Univ Texas Arlington, Dept Elect Engn, Arlington, TX 76019 USA
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
[3] Northeastern Univ, Int Joint Res Lab Integrated Automat, Shenyang 110819, Peoples R China
Keywords
Games; Cost function; Optimal control; Heuristic algorithms; Trajectory; System dynamics; Costs; Inverse optimal control (IOC); inverse RL; nonzero-sum Nash games; off-policy; optimal control; CONTINUOUS-TIME; IDENTIFICATION;
DOI
10.1109/TNNLS.2022.3186229
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article proposes a data-driven inverse reinforcement learning (RL) control algorithm for nonzero-sum multiplayer games in linear continuous-time differential dynamical systems. The inverse RL problem in the games is solved by a learner that reconstructs the unknown cost functions of the expert players from the expert's demonstrated optimal state and control input trajectories. The learner thus obtains the same control feedback gains and trajectories as the expert, using only data along system trajectories and without knowing the system dynamics. This article first proposes a model-based inverse RL policy iteration framework that has: 1) a policy evaluation step for reconstructing cost matrices using Lyapunov functions; 2) a state-reward weight improvement step using inverse optimal control (IOC); and 3) a policy improvement step using optimal control. Based on the model-based policy iteration algorithm, this article further develops an online data-driven off-policy inverse RL algorithm that requires no knowledge of the system dynamics or the expert control gains. Rigorous convergence and stability analyses of the algorithms are provided. The analysis shows that the off-policy inverse RL algorithm guarantees unbiased solutions even though probing noises are added to satisfy the persistence of excitation (PE) condition. Finally, two simulation examples validate the effectiveness of the proposed algorithms.
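To make the idea of "reconstructing a cost function that reproduces the expert's feedback gain" concrete, the following is a minimal sketch for a deliberately simplified case: one player, a scalar system x' = a x + b u, and a quadratic cost with state weight q and control weight r. It is not the article's multiplayer or data-driven algorithm; it is model-based (it uses a and b), and the function names `lqr_gain` and `recover_state_weight` are hypothetical names chosen for illustration. The sketch only relies on the scalar Riccati equation, which links a cost to its optimal gain in both directions.

```python
import math


def lqr_gain(a, b, q, r):
    """Forward problem (assumed scalar LQR): optimal gain k for u = -k x.

    System: x' = a x + b u, cost: integral of (q x^2 + r u^2) dt.
    """
    # Scalar Riccati equation: 2 a p + q - (b^2 / r) p^2 = 0.
    # The positive root is the stabilizing solution.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return k, p


def recover_state_weight(a, b, r, k_expert):
    """Inverse problem: state weight q for which k_expert is optimal, given r."""
    # Invert k = b p / r to get the value-function coefficient p.
    p = r * k_expert / b
    # Rearranged Riccati equation gives the matching state weight.
    q = (b * b / r) * p * p - 2.0 * a * p
    return q, p


if __name__ == "__main__":
    a, b = 1.0, 2.0                       # illustrative dynamics (assumed values)
    k_expert, _ = lqr_gain(a, b, q=3.0, r=0.5)

    # Learner assumes the same r as the expert and recovers q.
    q_same_r, _ = recover_state_weight(a, b, 0.5, k_expert)

    # Learner assumes a different r: a different q reproduces the same gain.
    q_other_r, _ = recover_state_weight(a, b, 1.0, k_expert)
    k_learner, _ = lqr_gain(a, b, q_other_r, 1.0)

    print("expert gain:", k_expert)
    print("q recovered with expert's r:", q_same_r)
    print("q recovered with r = 1.0:", q_other_r)
    print("learner gain under (q, r = 1.0):", k_learner)
```

The second recovery illustrates a general point about inverse optimal control: the cost is not unique, since scaling the weights leaves the optimal gain unchanged, so the learner's goal is a cost that yields the expert's gain rather than the expert's exact weights.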
Pages: 2028-2041
Page count: 14
Related papers
50 in total
  • [21] Underactuated MIMO Airship Control Based on Online Data-Driven Reinforcement Learning
    Boase, Derek
    Gueaieb, Wail
    Miah, Md Suruz
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 9464 - 9471
  • [22] Model-free Data-driven Predictive Control Using Reinforcement Learning
    Sawant, Shambhuraj
    Reinhardt, Dirk
    Kordabad, Arash Bahari
    Gros, Sebastien
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 4046 - 4052
  • [23] Data-driven constrained reinforcement learning for optimal control of a multistage evaporation process
    Yao, Yao
    Ding, Jinliang
    Zhao, Chunhui
    Wang, Yonggang
    Chai, Tianyou
    CONTROL ENGINEERING PRACTICE, 2022, 129
  • [24] Reinforcement Learning based Data-driven Optimal Control Strategy for Systems with Disturbance
    Fan, Zhong-Xin
    Li, Shihua
    Liu, Rongjie
    2023 IEEE 12TH DATA DRIVEN CONTROL AND LEARNING SYSTEMS CONFERENCE, DDCLS, 2023, : 567 - 572
  • [25] Learning Convex Piecewise Linear Machine for Data-driven Optimal Control
    Zhou, Yuxun
    Jin, Baihong
    Spanos, Costas J.
    2015 IEEE 14TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2015, : 966 - 972
  • [26] Control analysis and synthesis of data-driven learning for uncertain linear systems
    Meng, Deyuan
    AUTOMATICA, 2023, 148
  • [28] Poisoning Attacks on Data-Driven Utility Learning in Games
    Jia, Ruoxi
    Konstantakopoulos, Ioannis C.
    Li, Bo
    Spanos, Costas
    2018 ANNUAL AMERICAN CONTROL CONFERENCE (ACC), 2018, : 5774 - 5780
  • [29] Inverse Reinforcement Learning for Identification in Linear-Quadratic Dynamic Games
    Koepf, Florian
    Inga, Jairo
    Rothfuss, Simon
    Flad, Michael
    Hohmann, Soeren
    IFAC PAPERSONLINE, 2017, 50 (01): : 14902 - 14908
  • [30] Data-Driven Economic NMPC Using Reinforcement Learning
    Gros, Sebastien
    Zanon, Mario
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2020, 65 (02) : 636 - 648