Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances

Cited by: 197
Authors
Song, Ruizhuo [1 ]
Lewis, Frank L. [2 ,3 ]
Wei, Qinglai [4 ]
Zhang, Huaguang [5 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
[2] Univ Texas Arlington, UTA Res Inst, Ft Worth, TX 76118 USA
[3] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[5] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China; US National Science Foundation;
Keywords
Adaptive critic designs; adaptive/approximate dynamic programming (ADP); dynamic programming; off-policy; optimal control; unknown system; OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; TIME NONLINEAR-SYSTEMS; OPTIMAL-CONTROL SCHEME; FEEDBACK-CONTROL; ALGORITHM; ITERATION; DESIGN;
DOI
10.1109/TCYB.2015.2421338
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Code
0812 ;
Abstract
An optimal control method is developed in this paper for unknown continuous-time systems with unknown disturbances. An integral reinforcement learning (IRL) algorithm is presented to obtain the iterative control law, and off-policy learning allows the system dynamics to be completely unknown. Neural networks are used to construct the critic and action networks. It is shown that, in the presence of unknown disturbances, off-policy IRL may fail to converge or may produce biased solutions. To reduce the influence of the unknown disturbances, a disturbance compensation controller is added. Based on Lyapunov techniques, the weight errors are proven to be uniformly ultimately bounded, and convergence of the Hamiltonian function is also proven. A simulation study demonstrates the effectiveness of the proposed optimal control method for unknown systems with disturbances.
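To make the off-policy IRL idea in the abstract concrete, below is a minimal sketch for the linear-quadratic special case, where a quadratic critic and a linear actor can be identified by least squares from trajectory data generated by an arbitrary exploratory behavior input. Everything in the sketch (the 2-state example system, step sizes, basis functions, gains) is an illustrative assumption and not taken from the paper; in particular, the paper's method uses neural-network critic and action networks and adds a disturbance compensation controller, both of which this sketch omits.

```python
# Minimal sketch of off-policy integral reinforcement learning (IRL) for the
# linear-quadratic special case. ASSUMPTIONS: the example system, horizon T,
# and basis are illustrative only; this is not the paper's implementation.
import numpy as np

np.random.seed(0)

# Example system, used ONLY to generate trajectory data. The learner below
# never reads A or B; it works purely from the recorded (x, u) samples.
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)            # state penalty
R = np.array([[1.0]])    # control penalty
n, m = 2, 1

dt = 1e-3                # simulation step
T = 0.05                 # IRL integration interval
steps = int(T / dt)      # samples per interval
num_intervals = 200

def integrate(y, dx):
    """Trapezoidal rule along axis 0."""
    y = np.asarray(y)
    return 0.5 * dx * (y[:-1] + y[1:]).sum(axis=0)

# --- Collect data under an exploratory behavior input (off-policy data) ---
x = np.array([1.0, -1.0])
xs, us = [x.copy()], []
for k in range(num_intervals * steps):
    u = np.atleast_1d(0.5 * np.sin(50 * k * dt) + 0.2 * np.random.randn())
    us.append(u)
    x = x + dt * (A @ x + B @ u)   # Euler step of xdot = A x + B u
    xs.append(x.copy())
us.append(us[-1])                  # pad so xs and us have equal length
xs, us = np.array(xs), np.array(us)

def phi(x):
    """Quadratic critic basis: phi(x) @ p == x' P x for p = [p11, p12, p22]."""
    return np.array([x[0] ** 2, 2 * x[0] * x[1], x[1] ** 2])

# --- Off-policy IRL policy iteration, solved by least squares ---
# Each interval yields one linear equation in the critic weights p and the
# improved gain K_next, from the off-policy Bellman relation
#   x(t+T)'P x(t+T) - x(t)'P x(t)
#     = -int x'(Q + K'RK)x dtau + 2 int (u + Kx)' R K_next x dtau.
K = np.zeros((m, n))               # initial stabilizing gain (A is Hurwitz)
for it in range(8):
    rows, rhs = [], []
    for i in range(num_intervals):
        s, e = i * steps, (i + 1) * steps
        seg_x, seg_u = xs[s:e + 1], us[s:e + 1]
        cost = [xk @ (Q + K.T @ R @ K) @ xk for xk in seg_x]
        a = integrate(cost, dt)                              # running cost
        cross = [np.outer(uk + K @ xk, xk) for xk, uk in zip(seg_x, seg_u)]
        c = integrate(cross, dt)                             # (m, n) integral
        rows.append(np.concatenate([phi(seg_x[-1]) - phi(seg_x[0]),
                                    -2.0 * (R @ c).ravel()]))
        rhs.append(-a)
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    p, K = theta[:3], theta[3:].reshape(m, n)                # critic / actor
P = np.array([[p[0], p[1]], [p[1], p[2]]])
print("critic P =\n", P)
print("actor  K =", K, "(should approach the LQR gain of the example system)")
```

Because the Bellman relation above holds for any applied input u, the same recorded trajectory can be reused at every policy-iteration step, which is what makes the scheme off-policy; the abstract's point is that unmodeled disturbances enter this relation as an unaccounted input term, biasing the least-squares solution unless they are compensated.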
Pages: 1041-1050
Number of pages: 10
Related Papers
50 records in total
  • [31] Online Actor-critic Reinforcement Learning Control for Uncertain Surface Vessel Systems with External Disturbances
    Vu, Van Tu
    Tran, Quang Huy
    Pham, Thanh Loc
    Dao, Phuong Nam
    INTERNATIONAL JOURNAL OF CONTROL, AUTOMATION AND SYSTEMS, 2022, 20 : 1029 - 1040
  • [32] Actor-Critic Optimal Control for Semi-Markovian Jump Systems With Time Delay
    Zhang, Lulu
    Zhang, Huaguang
    Yue, Xiaohui
    Wang, Tianbiao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (04) : 2164 - 2168
  • [33] Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning
    Shi, Daming
    Guo, Xudong
    Liu, Yi
    Fan, Wenhui
    ENTROPY, 2022, 24 (06)
  • [34] Adaptive Inverse Optimal Control for Rehabilitation Robot Systems Using Actor-Critic Algorithm
    Meng, Fancheng
    Dai, Yaping
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014
  • [35] Adaptive Actor-Critic Design-Based Integral Sliding-Mode Control for Partially Unknown Nonlinear Systems With Input Disturbances
    Fan, Quan-Yong
    Yang, Guang-Hong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (01) : 165 - 177
  • [36] Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks
    Zhu, Liao
    Wei, Qinglai
    Guo, Ping
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024, 54 (08): : 5112 - 5122
  • [37] Actor-Critic Model Predictive Control
    Romero, Angel
    Song, Yunlong
    Scaramuzza, Davide
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024, : 14777 - 14784
  • [38] Mild Policy Evaluation for Offline Actor-Critic
    Huang, Longyang
    Dong, Botao
    Lu, Jinhui
    Zhang, Weidong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (12) : 17950 - 17964
  • [39] Bayesian Policy Gradient and Actor-Critic Algorithms
    Ghavamzadeh, Mohammad
    Engel, Yaakov
    Valko, Michal
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [40] Off-policy neuro-optimal control for unknown complex-valued nonlinear systems based on policy iteration
    Song, Ruizhuo
    Wei, Qinglai
    Xiao, Wendong
    NEURAL COMPUTING & APPLICATIONS, 2017, 28 (06): : 1435 - 1441