Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis

Cited: 0
|
Authors
Lin, Mingduo [1 ]
Zhao, Bo [1 ,2 ]
Liu, Derong [3 ,4 ]
Affiliations
[1] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Key Lab Ind Internet Things & Networked Control, Minist Educ, Chongqing 400065, Peoples R China
[3] Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China
[4] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming (ADP); data-based control; optimal control; output tracking control; policy optimization (PO); reinforcement learning (RL); GRADIENT METHODS; LINEAR-SYSTEMS; TIME-SYSTEMS;
DOI
10.1109/TNNLS.2024.3379207
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Optimal learning output tracking control (OLOTC) in a model-free manner has received increasing attention in both the intelligent control and the reinforcement learning (RL) communities. Although model-free tracking control has been achieved via off-policy learning and $Q$-learning, another popular RL approach, direct policy learning, which is easy to implement, has rarely been considered. To fill this gap, this article develops a novel model-free policy optimization (PO) algorithm to achieve the OLOTC for unknown linear discrete-time (DT) systems. The iterative control policy is parameterized to directly improve the discounted value function of the augmented system via a gradient-based method. To implement this algorithm in a model-free manner, a model-free two-point policy gradient (PG) algorithm is designed to approximate the gradient of the discounted value function using the sampled states and the reference trajectories. The global convergence of the model-free PO algorithm to the optimal value function is demonstrated given a sufficient number of samples and proper conditions. Finally, numerical simulation results are provided to validate the effectiveness of the presented method.
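The two-point policy gradient idea summarized in the abstract can be sketched as follows: the discounted cost of a linear state-feedback policy is probed at two symmetric perturbations of the gain along a random direction, and the cost difference yields a zeroth-order gradient estimate that never uses the system matrices. The plant, horizon, perturbation radius, and step size below are illustrative assumptions, not the paper's settings, and the sketch regulates the state to zero rather than tracking a reference (the paper augments the plant with the reference dynamics, which is omitted here for brevity).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) stable linear DT plant; in the model-free setting the
# algorithm only queries rollout costs and never reads A or B directly.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
gamma = 0.9                      # discount factor
Q, R = np.eye(2), np.eye(1)      # state and input cost weights

def rollout_cost(K, x0, T=60):
    """Discounted cost of the linear policy u = -K x from initial state x0."""
    x, cost = x0.astype(float), 0.0
    for k in range(T):
        u = -K @ x
        cost += gamma**k * (x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost

def two_point_gradient(K, r=0.05, n_samples=200):
    """Zeroth-order two-point estimate of the gradient of the discounted cost."""
    d = K.size
    g = np.zeros_like(K)
    for _ in range(n_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)           # random direction on the unit sphere
        x0 = rng.standard_normal(2)      # sampled initial state
        delta = rollout_cost(K + r * U, x0) - rollout_cost(K - r * U, x0)
        g += (d / (2.0 * r)) * delta * U
    return g / n_samples

# Plain gradient descent on the feedback gain (the policy optimization step)
K = np.zeros((1, 2))
for _ in range(50):
    K -= 1e-3 * two_point_gradient(K)
```

The estimator uses the same sampled initial state for both perturbed rollouts, which correlates the two cost evaluations and keeps the variance of the difference small relative to a one-point estimate.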
Pages: 1-12
Number of pages: 12
Related Papers
50 records
  • [1] Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis
    Lin, Mingduo
    Zhao, Bo
    Liu, Derong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (03) : 5574 - 5585
  • [2] Model-Free Optimal Control for Affine Nonlinear Systems With Convergence Analysis
    Zhao, Dongbin
    Xia, Zhongpu
    Wang, Ding
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2015, 12 (04) : 1461 - 1468
  • [3] A Model-Free Optimal Control Method
    Zhou, Mi
    Verriest, Erik
    Abdallah, Chaouki
    SOUTHEASTCON 2024, 2024, : 948 - 954
  • [4] Model-Free Imitation Learning with Policy Optimization
    Ho, Jonathan
    Gupta, Jayesh K.
    Ermon, Stefano
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [5] Optimal Output Tracking for Switched Systems Under DoS Attacks: A Model-Free Adaptive Predictive Control Method
    Qi, Yiwen
    Guo, Shitong
    Tang, Yiwen
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (01) : 266 - 270
  • [6] Iterative Q-Learning for Model-Free Optimal Control With Adjustable Convergence Rate
    Wang, Ding
    Wang, Yuan
    Zhao, Mingming
    Qiao, Junfei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (04) : 2224 - 2228
  • [7] Model-free output feedback optimal tracking control for two-dimensional batch processes
    Shi, Huiyuan
    Ma, Jiayue
    Liu, Qiang
    Li, Jinna
    Jiang, Xueying
    Li, Ping
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 143
  • [8] Adjustable Iterative Q-Learning Schemes for Model-Free Optimal Tracking Control
    Qiao, Junfei
    Zhao, Mingming
    Wang, Ding
    Ha, Mingming
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024, 54 (02): : 1202 - 1213
  • [9] Policy Gradient Adaptive Critic Designs for Model-Free Optimal Tracking Control With Experience Replay
    Lin, Mingduo
    Zhao, Bo
    Liu, Derong
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (06): : 3692 - 3703
  • [10] Optimal Online Learning Procedures for Model-Free Policy Evaluation
    Ueno, Tsuyoshi
    Maeda, Shin-ichi
    Kawanabe, Motoaki
    Ishii, Shin
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 473 - +