Infinite Horizon Self-Learning Optimal Control of Nonaffine Discrete-Time Nonlinear Systems

Cited by: 127
Authors
Wei, Qinglai [1]
Liu, Derong [1]
Yang, Xiong [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; generalized policy iteration; neural networks (NNs); neurodynamic programming; nonlinear systems; optimal control; reinforcement learning; OPTIMAL TRACKING CONTROL; DYNAMIC-PROGRAMMING ALGORITHM; ADAPTIVE OPTIMAL-CONTROL; ZERO-SUM GAMES; UNKNOWN DYNAMICS; CONTROL SCHEME; APPROXIMATION ERRORS; POLICY ITERATION; LINEAR-SYSTEMS; CRITIC DESIGNS;
DOI
10.1109/TNNLS.2015.2401334
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, a novel iterative adaptive dynamic programming (ADP)-based infinite horizon self-learning optimal control algorithm, called generalized policy iteration algorithm, is developed for nonaffine discrete-time (DT) nonlinear systems. Generalized policy iteration algorithm is a general idea of interacting policy and value iteration algorithms of ADP. The developed generalized policy iteration algorithm permits an arbitrary positive semidefinite function to initialize the algorithm, where two iteration indices are used for policy improvement and policy evaluation, respectively. It is the first time that the convergence, admissibility, and optimality properties of the generalized policy iteration algorithm for DT nonlinear systems are analyzed. Neural networks are used to implement the developed algorithm. Finally, numerical examples are presented to illustrate the performance of the developed algorithm.
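The two-index structure described in the abstract can be sketched on a toy problem. The following is a minimal tabular illustration of the general idea only, not the paper's neural-network implementation: the integer-line system, the utility `x^2 + u^2`, and the names `step`, `utility`, `psi`, `n_outer`, and `n_eval` are all assumptions made for this example. The outer index `i` performs policy improvement and the inner index `j` performs a fixed number of policy evaluation sweeps, starting from a positive semidefinite initial function `psi`.

```python
# Toy system (hypothetical, not from the paper): integer states on a line,
# x_{k+1} = clip(x_k + u_k, -5, 5), utility U(x, u) = x^2 + u^2.
STATES = list(range(-5, 6))
CONTROLS = (-1, 0, 1)


def step(x, u):
    """Next state, saturated at the ends of the grid."""
    return max(-5, min(5, x + u))


def utility(x, u):
    """Stage cost; zero only at x = 0, u = 0."""
    return x * x + u * u


def generalized_policy_iteration(psi, n_outer=10, n_eval=3):
    """Alternate policy improvement (index i) and n_eval evaluation sweeps (index j)."""
    V = {x: float(psi(x)) for x in STATES}  # initial value function from psi
    policy = {x: 0 for x in STATES}
    for i in range(n_outer):
        # Policy improvement: greedy control with respect to the current V.
        for x in STATES:
            policy[x] = min(CONTROLS, key=lambda u: utility(x, u) + V[step(x, u)])
        # Policy evaluation: n_eval synchronous sweeps under the fixed policy.
        for j in range(n_eval):
            V = {x: utility(x, policy[x]) + V[step(x, policy[x])] for x in STATES}
    return V, policy


# Initialize with the positive semidefinite function psi(x) = x^2.
V, policy = generalized_policy_iteration(psi=lambda x: x * x)
print(V[3], policy[3])
```

In this sketch, `n_eval=1` makes each outer pass a single value-iteration-style update, while a large `n_eval` approximates full evaluation of each policy as in policy iteration. That is one way to read the abstract's description of the algorithm as an interaction of the two.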
Pages: 866-879
Number of pages: 14
Related papers
50 in total
  • [1] Relaxed Optimal Control With Self-Learning Horizon for Discrete-Time Stochastic Dynamics
    Wang, Ding
    Wang, Jiangyu
    Liu, Ao
    Liu, Derong
    Qiao, Junfei
    IEEE TRANSACTIONS ON CYBERNETICS, 2025, 55 (03) : 1183 - 1196
  • [2] Optimal Self-Learning Control Scheme for Discrete-Time Nonlinear Systems Using Local Value Iteration
    Wei, Qinglai
    Liu, Derong
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 3544 - 3549
  • [3] Infinite horizon LQ optimal control for discrete-time stochastic systems
    Huang, Yulin
    Zhang, Weihai
    Zhang, Huanshui
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 252 - 256
  • [4] An Adaptive Terminal Iterative Learning Control for Nonaffine Nonlinear Discrete-Time Systems
    Chien, Chiang-Ju
    Wang, Ying-Chung
    Chi, Ronghu
    Shen, Dong
    2015 27TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2015, : 1090 - 1094
  • [5] A new self-learning optimal control laws for a class of discrete-time nonlinear systems based on ESN architecture
    RuiZhuo Song
    WenDong Xiao
    ChangYin Sun
    Science China Information Sciences, 2014, 57 : 1 - 10
  • [6] A new self-learning optimal control laws for a class of discrete-time nonlinear systems based on ESN architecture
    SONG RuiZhuo
    XIAO WenDong
    SUN ChangYin
    Science China (Information Sciences), 2014, 57 (06) : 284 - 293
  • [7] A new self-learning optimal control laws for a class of discrete-time nonlinear systems based on ESN architecture
    Song RuiZhuo
    Xiao WenDong
    Sun ChangYin
    SCIENCE CHINA-INFORMATION SCIENCES, 2014, 57 (06) : 1 - 10
  • [8] INFINITE HORIZON LINEAR QUADRATIC OPTIMAL CONTROL FOR DISCRETE-TIME STOCHASTIC SYSTEMS
    Huang, Yulin
    Zhang, Weihai
    Zhang, Huanshui
    ASIAN JOURNAL OF CONTROL, 2008, 10 (05) : 608 - 615
  • [9] Discrete-Time Self-Learning Parallel Control
    Wei, Qinglai
    Wang, Lingxiao
    Lu, Jingwei
    Wang, Fei-Yue
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (01) : 192 - 204
  • [10] Data-Based Optimal Tracking Control of Nonaffine Nonlinear Discrete-Time Systems
    Luo, Biao
    Liu, Derong
    Huang, Tingwen
    Li, Chao
    NEURAL INFORMATION PROCESSING, ICONIP 2016, PT IV, 2016, 9950 : 573 - 581