Adaptive Optimal Control of Unknown Constrained-Input Systems Using Policy Iteration and Neural Networks

Cited by: 361
Authors
Modares, Hamidreza [1]
Lewis, Frank L. [2]
Naghibi-Sistani, Mohammad-Bagher [1]
Affiliations
[1] Ferdowsi Univ Mashhad, Dept Elect Engn, Mashhad, Iran
[2] Univ Texas Arlington, Res Inst, Ft Worth, TX 76118 USA
Funding
National Science Foundation (USA)
Keywords
Input constraints; neural networks; optimal control; reinforcement learning; unknown dynamics; CONTINUOUS-TIME;
DOI
10.1109/TNNLS.2013.2276571
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems. The proposed PI algorithm is implemented on an actor-critic structure where two neural networks (NNs) are tuned online and simultaneously to generate the optimal bounded control policy. The requirement of complete knowledge of the system dynamics is obviated by employing a novel NN identifier in conjunction with the actor and critic NNs. It is shown how the identifier weights estimation error affects the convergence of the critic NN. A novel learning rule is developed to guarantee that the identifier weights converge to small neighborhoods of their ideal values exponentially fast. To provide an easy-to-check persistence of excitation condition, the experience replay technique is used. That is, recorded past experiences are used simultaneously with current data for the adaptation of the identifier weights. Stability of the whole system consisting of the actor, critic, system state, and system identifier is guaranteed while all three networks undergo adaptation. Convergence to a near-optimal control law is also shown. The effectiveness of the proposed method is illustrated with a simulation example.
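The experience replay idea in the abstract, where recorded past data are used together with current data to adapt the identifier weights, can be illustrated with a minimal sketch. This is not the paper's update law: it assumes a hypothetical linear-in-parameters model y = W^T phi, a plain gradient step, and a rank test on the recorded regressors as the "easy-to-check" richness condition. All names (`replay_update`, `memory_is_rich`, `phi_fixed`) are illustrative.

```python
import numpy as np

def replay_update(W, phi_now, y_now, memory, lr=0.2):
    """One gradient step using the current sample plus every recorded sample."""
    grad = np.outer(phi_now, W.T @ phi_now - y_now)
    for phi_k, y_k in memory:
        grad += np.outer(phi_k, W.T @ phi_k - y_k)
    return W - lr * grad

def memory_is_rich(memory, p):
    """Assumed richness test: recorded regressors span all p directions."""
    if not memory:
        return False
    Phi = np.stack([phi for phi, _ in memory], axis=1)  # shape (p, m)
    return np.linalg.matrix_rank(Phi) == p

rng = np.random.default_rng(0)
p, n = 3, 2                              # regressor size, output size
W_true = rng.normal(size=(p, n))         # hypothetical ideal weights

# Record a few past samples with unit-norm regressors.
memory = []
for _ in range(5):
    phi = rng.normal(size=p)
    phi /= np.linalg.norm(phi)
    memory.append((phi, W_true.T @ phi))

# The current signal is deliberately not exciting: it never changes.
phi_fixed = np.array([1.0, 0.0, 0.0])
y_fixed = W_true.T @ phi_fixed

W_replay = np.zeros((p, n))
W_plain = np.zeros((p, n))
for _ in range(2000):
    W_replay = replay_update(W_replay, phi_fixed, y_fixed, memory)
    W_plain = replay_update(W_plain, phi_fixed, y_fixed, [])

error_with_replay = np.linalg.norm(W_replay - W_true)
error_without_replay = np.linalg.norm(W_plain - W_true)
print(memory_is_rich(memory, p), error_with_replay, error_without_replay)
```

In this toy setting the current data alone only identify the weights along one direction, while the recorded samples supply the remaining directions, so the rank of the stored data can be checked once instead of requiring the live signal to stay exciting.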
Pages: 1513 - 1525
Page count: 13
Related papers
50 in total
  • [31] Neural Networks-Based Adaptive Control for Nonlinear State Constrained Systems With Input Delay
    Li, Da-Peng
    Liu, Yan-Jun
    Tong, Shaocheng
    Chen, C. L. Philip
    Li, Dong-Juan
    IEEE TRANSACTIONS ON CYBERNETICS, 2019, 49 (04) : 1249 - 1258
  • [32] Adaptive Output Optimal Control Algorithm for Unknown System Dynamics Based on Policy Iteration
    Ohtake, Susumu
    Yamakita, Masaki
    2010 AMERICAN CONTROL CONFERENCE, 2010, : 1671 - 1676
  • [33] Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks
    Zhu, Liao
    Wei, Qinglai
    Guo, Ping
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024, 54 (08) : 5112 - 5122
  • [34] Discrete-Time Nonlinear Generalized Policy Iteration for Optimal Control Using Neural Networks
    Wei, Qinglai
    Liu, Derong
    Yang, Xiong
    NEURAL INFORMATION PROCESSING (ICONIP 2014), PT I, 2014, 8834 : 389 - 396
  • [35] Adaptive Neural Network Control for Missile Systems With Unknown Hysteresis Input
    Cai, Jian-Ping
    Xing, Lantao
    Zhang, Meng
    Shen, Lujuan
    IEEE ACCESS, 2017, 5 : 15839 - 15847
  • [36] Model-free Nearly Optimal Control of Constrained-Input Nonlinear Systems Based on Synchronous Reinforcement Learning
    Zhao, Han
    Guo, Lei
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 2162 - 2167
  • [37] Policy iteration optimal tracking control for chaotic systems by using an adaptive dynamic programming approach
    魏庆来
    刘德荣
    徐延才
    Chinese Physics B, 2015, 24 (03) : 91 - 98
  • [38] Policy iteration optimal tracking control for chaotic systems by using an adaptive dynamic programming approach
    Wei Qing-Lai
    Liu De-Rong
    Xu Yan-Cai
    CHINESE PHYSICS B, 2015, 24 (03)
  • [39] Policy Iteration-based Indirect Adaptive Optimal Control for Completely Unknown Continuous-Time LTI Systems
    Jha, Sumit Kumar
    Roy, Sayan Basu
    Bhasin, Shubhendu
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 448 - 454
  • [40] Policy Iteration for Optimal Control of Weakly Coupled Nonlinear Systems with Completely Unknown Dynamics
    Li, Chao
    Wang, Ding
    Liu, Derong
    He, Haibo
    2016 AMERICAN CONTROL CONFERENCE (ACC), 2016, : 5722 - 5727