Data-Driven Inverse Reinforcement Learning Control for Linear Multiplayer Games

Cited by: 30
Authors
Lian, Bosen [1]
Donge, Vrushabh S. [1]
Lewis, Frank L. [1]
Chai, Tianyou [2, 3]
Davoudi, Ali [1]
Affiliations
[1] Univ Texas Arlington, Dept Elect Engn, Arlington, TX 76019 USA
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
[3] Northeastern Univ, Int Joint Res Lab Integrated Automat, Shenyang 110819, Peoples R China
Keywords
Games; Cost function; Optimal control; Heuristic algorithms; Trajectory; System dynamics; Costs; Inverse optimal control (IOC); inverse RL; nonzero-sum Nash games; off-policy; optimal control; CONTINUOUS-TIME; IDENTIFICATION
DOI
10.1109/TNNLS.2022.3186229
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
This article proposes a data-driven inverse reinforcement learning (RL) control algorithm for nonzero-sum multiplayer games in linear continuous-time differential dynamical systems. In these games, the inverse RL problem is solved by a learner that reconstructs the unknown cost functions of the expert players from the experts' demonstrated optimal state and control input trajectories. The learner thus obtains the same control feedback gains and trajectories as the expert, using only data along system trajectories and no knowledge of the system dynamics. The article first proposes a model-based inverse RL policy iteration framework that has: 1) a policy evaluation step for reconstructing cost matrices using Lyapunov functions; 2) a state-reward weight improvement step using inverse optimal control (IOC); and 3) a policy improvement step using optimal control. Building on this model-based policy iteration algorithm, the article then develops an online data-driven off-policy inverse RL algorithm that requires no knowledge of the system dynamics or the expert control gains. Rigorous convergence and stability analyses of the algorithms are provided. The analysis shows that the off-policy inverse RL algorithm guarantees unbiased solutions when probing noises are added to satisfy the persistence of excitation (PE) condition. Finally, two simulation examples validate the effectiveness of the proposed algorithms.
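For orientation, the setting named in the abstract can be written in the standard textbook form of an N-player nonzero-sum linear-quadratic differential game. The symbols below (A, B_j, Q_i, R_ij, P_i, K_i, A_c) are assumed notation for illustration and may differ from the notation used in the article itself; this is a sketch of the usual problem statement, not a reproduction of the article's algorithm.

```latex
% Linear continuous-time dynamics driven by N players (assumed notation)
\dot{x} = A x + \sum_{j=1}^{N} B_j u_j

% Quadratic cost of player i, with state weight Q_i and input weights R_{ij}
J_i = \int_{0}^{\infty} \Big( x^{\top} Q_i x + \sum_{j=1}^{N} u_j^{\top} R_{ij} u_j \Big)\, dt

% Feedback Nash policies and gains
u_i = -K_i x, \qquad K_i = R_{ii}^{-1} B_i^{\top} P_i

% Coupled Lyapunov-type equations satisfied by P_i at the Nash equilibrium,
% with closed-loop matrix A_c = A - \sum_{j=1}^{N} B_j K_j
A_c^{\top} P_i + P_i A_c + Q_i + \sum_{j=1}^{N} K_j^{\top} R_{ij} K_j = 0,
\qquad i = 1, \dots, N
```

In this notation, the forward problem computes the gains K_i from known weights Q_i and R_ij, while the inverse problem described in the abstract goes the other way: the learner looks for cost weights under which its own Nash gains reproduce the expert's demonstrated behavior.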
Pages: 2028-2041
Page count: 14