Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis

Citations: 0
Authors
Lin, Mingduo [1 ]
Zhao, Bo [1 ,2 ]
Liu, Derong [3 ,4 ]
Affiliations
[1] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Key Lab Ind Internet Things & Networked Control, Minist Educ, Chongqing 400065, Peoples R China
[3] Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China
[4] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China
Keywords
Adaptive dynamic programming (ADP); data-based control; optimal control; output tracking control; policy optimization (PO); reinforcement learning (RL); GRADIENT METHODS; LINEAR-SYSTEMS; TIME-SYSTEMS;
DOI
10.1109/TNNLS.2024.3379207
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Optimal learning output tracking control (OLOTC) in a model-free manner has received increasing attention in both the intelligent control and reinforcement learning (RL) communities. Although model-free tracking control has been achieved via off-policy learning and Q-learning, direct policy learning, another popular RL approach that is easy to implement, has rarely been considered. To fill this gap, this article develops a novel model-free policy optimization (PO) algorithm to achieve OLOTC for unknown linear discrete-time (DT) systems. The iterative control policy is parameterized to directly improve the discounted value function of the augmented system via a gradient-based method. To implement this algorithm in a model-free manner, a two-point policy gradient (PG) algorithm is designed to approximate the gradient of the discounted value function from sampled states and reference trajectories. The global convergence of the model-free PO algorithm to the optimal value function is established given a sufficient number of samples and appropriate conditions. Finally, numerical simulation results validate the effectiveness of the proposed method.
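The two-point policy gradient estimator mentioned in the abstract belongs to the family of zeroth-order methods: the gradient of the cost with respect to the policy gain is approximated from two perturbed rollouts rather than from a model. The sketch below illustrates this idea on a toy linear DT tracking problem; the system matrices, cost weights, rollout horizon, and gain parameterization are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy linear DT system x_{t+1} = A x_t + B u_t with a static tracking
# policy u = -K (x - ref); all values below are illustrative.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R, gamma = np.eye(2), np.eye(1), 0.9

def rollout_cost(K, x0, ref, T=200):
    """Estimate the discounted quadratic tracking cost under gain K
    by simulating a finite-horizon rollout."""
    x, cost = x0.copy(), 0.0
    for t in range(T):
        e = x - ref                    # tracking error w.r.t. the reference
        u = -K @ e
        cost += gamma**t * (e @ Q @ e + u @ R @ u)
        x = A @ x + B @ u
    return cost

def two_point_gradient(K, x0, ref, r=1e-2):
    """Two-point zeroth-order estimate of the policy gradient:
    grad ~ d * [J(K + rU) - J(K - rU)] / (2r) * U,
    where U is a random direction on the unit sphere and d = K.size."""
    U = np.random.randn(*K.shape)
    U /= np.linalg.norm(U)             # random unit direction
    delta = rollout_cost(K + r * U, x0, ref) - rollout_cost(K - r * U, x0, ref)
    return K.size * delta / (2 * r) * U
```

In practice such estimates are averaged over many random directions and used in a gradient-descent update on the gain, which mirrors the model-free PO iteration the abstract describes: only sampled trajectories, never the system matrices, enter the gradient computation.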
Pages: 1-12
Related Papers
50 records in total
  • [21] Q-learning based model-free input-output feedback linearization control method
    Sun, Yipu
    Chen, Xin
    He, Wenpeng
    Zhang, Ziying
    Fukushima, Edwardo F.
    She, Jinhua
    IFAC PAPERSONLINE, 2023, 56 (02): : 9534 - 9539
  • [22] Model-Free Control of Indoor Temperatures in Residential Buildings: Convergence Analysis
    Wu, Tumin
    Olama, Mohammed M.
    Djouadi, Seddik M.
    2024 IEEE POWER AND ENERGY CONFERENCE AT ILLINOIS, PECI, 2024,
  • [23] Model-Free H∞ Optimal Tracking Control of Constrained Nonlinear Systems via an Iterative Adaptive Learning Algorithm
    Hou, Jiaxu
    Wang, Ding
    Liu, Derong
    Zhang, Yun
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (11): : 4097 - 4108
  • [24] Optimal Output Regulation for Model-Free Quanser Helicopter With Multistep Q-Learning
    Luo, Biao
    Wu, Huai-Ning
    Huang, Tingwen
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2018, 65 (06) : 4953 - 4961
  • [25] Deterministic policy gradient adaptive dynamic programming for model-free optimal control
    Zhang, Yongwei
    Zhao, Bo
    Liu, Derong
    NEUROCOMPUTING, 2020, 387 : 40 - 50
  • [26] A model-free robust policy iteration algorithm for optimal control of nonlinear systems
    Bhasin, S.
    Johnson, M.
    Dixon, W. E.
    49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, : 3060 - 3065
  • [27] Model-Free Optimized Tracking Control Heuristic
    Wang, Ning
    Abouheaf, Mohammed
    Gueaieb, Wail
    Nahas, Nabil
    ROBOTICS, 2020, 9 (03)
  • [28] Model-free Policy Learning with Reward Gradients
    Lan, Qingfeng
    Tosatto, Samuele
    Farrahi, Homayoon
    Mahmood, A. Rupam
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [29] Observer-Based Human-in-the-Loop Optimal Output Cluster Synchronization Control for Multiagent Systems: A Model-Free Reinforcement Learning Method
    Huang, Zongsheng
    Li, Tieshan
    Long, Yue
    Liang, Hongjing
    IEEE TRANSACTIONS ON CYBERNETICS, 2025, 55 (02) : 649 - 660
  • [30] Learning model-free motor control
    Agostini, A
    Celaya, E
    ECAI 2004: 16TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 110 : 947 - 948