Data-Driven Inverse Reinforcement Learning Control for Linear Multiplayer Games

Cited by: 30
Authors
Lian, Bosen [1]
Donge, Vrushabh S. [1]
Lewis, Frank L. [1]
Chai, Tianyou [2,3]
Davoudi, Ali [1]
Affiliations
[1] Univ Texas Arlington, Dept Elect Engn, Arlington, TX 76019 USA
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
[3] Northeastern Univ, Int Joint Res Lab Integrated Automat, Shenyang 110819, Peoples R China
Keywords
Games; Cost function; Optimal control; Heuristic algorithms; Trajectory; System dynamics; Costs; Inverse optimal control (IOC); inverse RL; nonzero-sum Nash games; off-policy; optimal control; CONTINUOUS-TIME; IDENTIFICATION;
DOI
10.1109/TNNLS.2022.3186229
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This article proposes a data-driven inverse reinforcement learning (RL) control algorithm for nonzero-sum multiplayer games in linear continuous-time differential dynamical systems. The inverse RL problem in the games is solved by a learner that reconstructs the unknown expert players' cost functions from the expert's demonstrated optimal state and control input trajectories. The learner thus obtains the same control feedback gains and trajectories as the expert, using only data along system trajectories and without knowing the system dynamics. This article first proposes a model-based inverse RL policy iteration framework that has: 1) a policy evaluation step for reconstructing cost matrices using Lyapunov functions; 2) a state-reward weight improvement step using inverse optimal control (IOC); and 3) a policy improvement step using optimal control. Based on the model-based policy iteration algorithm, this article further develops an online data-driven off-policy inverse RL algorithm that requires no knowledge of the system dynamics or the expert control gains. Rigorous convergence and stability analyses of the algorithms are provided. The analysis shows that the off-policy inverse RL algorithm guarantees unbiased solutions even though probing noises are added to satisfy the persistence of excitation (PE) condition. Finally, two different simulation examples validate the effectiveness of the proposed algorithms.
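The policy evaluation and policy improvement steps named in the abstract, and the idea of recovering a state weight from an expert gain, can be illustrated on a far simpler case than the one the article treats. The sketch below is not the article's multiplayer or data-driven algorithm. It assumes a single player, a scalar system dx/dt = a*x + b*u with a known model, a cost of the form integral of (q*x^2 + r*u^2), and a known input weight r. The forward part is the classical Kleinman policy iteration, and the function names are hypothetical.

```python
def lqr_policy_iteration(a, b, q, r, k0, iters=50):
    """Kleinman-style policy iteration for the scalar system dx/dt = a*x + b*u.

    Assumed cost: integral of (q*x^2 + r*u^2) dt, with control u = -k*x.
    k0 must be stabilizing, i.e. a - b*k0 < 0.
    """
    k = k0
    p = None
    for _ in range(iters):
        closed_loop = a - b * k
        if closed_loop >= 0:
            raise ValueError("gain must be stabilizing")
        # Policy evaluation: scalar Lyapunov equation
        # 2*(a - b*k)*p + q + r*k^2 = 0, solved for p.
        p = (q + r * k * k) / (-2.0 * closed_loop)
        # Policy improvement: gain update from the evaluated p.
        k = b * p / r
    return k, p


def recover_state_weight(a, b, r, k_expert):
    """Inverse step (scalar case only): the state weight q for which
    k_expert is the optimal gain, given a, b and the input weight r.

    Uses k = b*p/r and the scalar Riccati equation
    2*a*p - (b*p)^2/r + q = 0.
    """
    p = r * k_expert / b
    return (b * p) ** 2 / r - 2.0 * a * p


# Forward problem: an "expert" with q = 3 on an unstable scalar plant.
k_expert, p_expert = lqr_policy_iteration(a=1.0, b=1.0, q=3.0, r=1.0, k0=2.0)

# Inverse problem: recover q from the expert gain alone.
q_recovered = recover_state_weight(a=1.0, b=1.0, r=1.0, k_expert=k_expert)
```

In this scalar setting the state weight is determined uniquely once r is fixed. In the matrix and multiplayer settings the abstract describes, that is generally not the case, which is one reason an iterative reconstruction is used there instead of a closed-form inversion.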
Pages: 2028-2041
Page count: 14
Related papers
50 in total
  • [1] Inverse Reinforcement Learning Control for Linear Multiplayer Games
    Lian, Bosen
    Donge, Vrushabh S.
    Lewis, Frank L.
    Chai, Tianyou
    Davoudi, Ali
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 2839 - 2844
  • [2] Data-Driven Wind Farm Control via Multiplayer Deep Reinforcement Learning
    Dong, Hongyang
    Zhao, Xiaowei
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2023, 31 (03) : 1468 - 1475
  • [3] Online Data-Driven Inverse Reinforcement Learning for Deterministic Systems
    Asl, Hamed Jabbari
    Uchibe, Eiji
    2022 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2022, : 884 - 889
  • [4] Formulations for Data-Driven Control Design and Reinforcement Learning
    Lee, Donghwan
    Kim, Do Wan
    2022 IEEE 17TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION, ICCA, 2022, : 207 - 212
  • [5] Data-Driven Robust Control Using Reinforcement Learning
    Ngo, Phuong D.
    Tejedor, Miguel
    Godtliebsen, Fred
    APPLIED SCIENCES-BASEL, 2022, 12 (04):
  • [6] Data-Driven Reinforcement Learning Control for Quadrotor Systems
    Dang, Ngoc Trung
    Dao, Phuong Nam
    INTERNATIONAL JOURNAL OF MECHANICAL ENGINEERING AND ROBOTICS RESEARCH, 2024, 13 (05): : 495 - 501
  • [7] Data-Driven Control of Hydraulic Manipulators by Reinforcement Learning
    Yao, Zhikai
    Xu, Fengyu
    Jiang, Guo-Ping
    Yao, Jianyong
    IEEE-ASME TRANSACTIONS ON MECHATRONICS, 2024, 29 (04) : 2673 - 2684
  • [8] Certified data-driven inverse reinforcement learning of Markov jump systems
    Xue, Wenqian
    Lewis, Frank L.
    Lian, Bosen
    AUTOMATICA, 2025, 176
  • [9] On the Performance of Data-Driven Reinforcement Learning for Commercial HVAC Control
    Faddel, Samy
    Tian, Guanyu
    Zhou, Qun
    Aburub, Haneen
    2020 IEEE INDUSTRY APPLICATIONS SOCIETY ANNUAL MEETING, 2020,
  • [10] Safe Reinforcement Learning using Data-Driven Predictive Control
    Selim, Mahmoud
    Alanwar, Amr
    El-Kharashi, M. Watheq
    Abbas, Hazem M.
    Johansson, Karl H.
    2022 5TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS, SIGNAL PROCESSING, AND THEIR APPLICATIONS (ICCSPA), 2022,