Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis

Cited by: 0

Authors:

Lin, Mingduo [1]

Zhao, Bo [1,2]

Liu, Derong [3,4]

Affiliations:

[1] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China

[2] Chongqing Univ Posts & Telecommun, Key Lab Ind Internet Things & Networked Control, Minist Educ, Chongqing 400065, Peoples R China

[3] Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China

[4] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

Adaptive dynamic programming (ADP); data-based control; optimal control; output tracking control; policy optimization (PO); reinforcement learning (RL); GRADIENT METHODS; LINEAR-SYSTEMS; TIME-SYSTEMS;

DOI:

10.1109/TNNLS.2024.3379207

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Optimal learning output tracking control (OLOTC) in a model-free manner has received increasing attention in both the intelligent control and reinforcement learning (RL) communities. Although model-free tracking control has been achieved via off-policy learning and $Q$-learning, direct policy learning, another popular RL idea that is easy to implement, is still rarely considered. To fill this gap, this article develops a novel model-free policy optimization (PO) algorithm to achieve OLOTC for unknown linear discrete-time (DT) systems. The iterative control policy is parameterized so that the discounted value function of the augmented system is improved directly via a gradient-based method. To implement this algorithm in a model-free manner, a two-point policy gradient (PG) algorithm is designed to approximate the gradient of the discounted value function from the sampled states and reference trajectories. Global convergence of the model-free PO algorithm to the optimal value function is demonstrated given a sufficient number of samples and proper conditions. Finally, numerical simulation results are provided to validate the effectiveness of the proposed method.
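
The two-point policy gradient idea mentioned in the abstract can be illustrated with a generic zeroth-order estimator. The sketch below is not the authors' algorithm. It is a minimal example under stated assumptions: the plant matrices `A`, `B`, `C`, the reference dynamics `F`, the weights `Q` and `R`, the discount factor, smoothing radius, step size, and sample counts are all made-up values, and the simulator stands in for the unknown system. The learner only queries rollout costs and never reads the model.

```python
import numpy as np

def rollout_cost(K, A, B, C, F, x0, r0, Q, R, gamma, horizon):
    """Discounted tracking cost of u = -K [x; r] over one finite rollout.

    A, B, C, F are used only to simulate data (hypothetical plant and
    reference generator); the learner below sees nothing but the cost.
    """
    x, r, cost = x0.copy(), r0.copy(), 0.0
    for t in range(horizon):
        z = np.concatenate([x, r])      # augmented state [x; r]
        u = -K @ z                      # linear state-feedback policy
        e = C @ x - r                   # output tracking error
        cost += gamma**t * (e @ Q @ e + u @ R @ u)
        x = A @ x + B @ u               # plant step
        r = F @ r                       # reference step
    return cost

def two_point_gradient(K, cost_fn, radius, num_samples, rng):
    """Average of two-point estimates (d / 2r) * (J(K+rU) - J(K-rU)) * U,
    with U drawn uniformly from the unit sphere in parameter space."""
    d = K.size
    grad = np.zeros_like(K)
    for _ in range(num_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        delta = cost_fn(K + radius * U) - cost_fn(K - radius * U)
        grad += (d / (2.0 * radius)) * delta * U
    return grad / num_samples

# Illustrative (assumed) problem data: a stable second-order plant
# tracking a constant reference.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[0.1]])
x0 = np.array([0.0, 0.0])
r0 = np.array([1.0])

def cost_fn(K):
    return rollout_cost(K, A, B, C, F, x0, r0, Q, R, gamma=0.9, horizon=40)

K = np.zeros((1, 3))                    # initial policy gain
initial_cost = cost_fn(K)
for _ in range(150):                    # plain gradient descent on the gain
    K = K - 0.002 * two_point_gradient(K, cost_fn, 0.05, 10, rng)
final_cost = cost_fn(K)
print(f"cost before: {initial_cost:.3f}, after: {final_cost:.3f}")
```

The symmetric difference `J(K + rU) - J(K - rU)` cancels the even-order terms of the cost expansion, which is why two-point estimators typically have lower variance than one-point variants. The step size and radius here were chosen conservatively for this toy problem and would need retuning for any other system.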

Page count: 12

50 entries in total

[1] Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis
Lin, Mingduo
Zhao, Bo
Liu, Derong
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (03) : 5574 - 5585
[2] Model-Free Optimal Control for Affine Nonlinear Systems With Convergence Analysis
Zhao, Dongbin
Xia, Zhongpu
Wang, Ding
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2015, 12 (04) : 1461 - 1468
[3] A Model-Free Optimal Control Method
Zhou, Mi
Verriest, Erik
Abdallah, Chaouki
SOUTHEASTCON 2024, 2024, : 948 - 954
[4] Model-Free Imitation Learning with Policy Optimization
Ho, Jonathan
Gupta, Jayesh K.
Ermon, Stefano
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
[5] Optimal Output Tracking for Switched Systems Under DoS Attacks: A Model-Free Adaptive Predictive Control Method
Qi, Yiwen
Guo, Shitong
Tang, Yiwen
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (01) : 266 - 270
[6] Iterative Q-Learning for Model-Free Optimal Control With Adjustable Convergence Rate
Wang, Ding
Wang, Yuan
Zhao, Mingming
Qiao, Junfei
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (04) : 2224 - 2228
[7] Model-free output feedback optimal tracking control for two-dimensional batch processes
Shi, Huiyuan
Ma, Jiayue
Liu, Qiang
Li, Jinna
Jiang, Xueying
Li, Ping
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 143
[8] Adjustable Iterative Q-Learning Schemes for Model-Free Optimal Tracking Control
Qiao, Junfei
Zhao, Mingming
Wang, Ding
Ha, Mingming
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024, 54 (02) : 1202 - 1213
[9] Policy Gradient Adaptive Critic Designs for Model-Free Optimal Tracking Control With Experience Replay
Lin, Mingduo
Zhao, Bo
Liu, Derong
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (06) : 3692 - 3703
[10] Optimal Online Learning Procedures for Model-Free Policy Evaluation
Ueno, Tsuyoshi
Maeda, Shin-ichi
Kawanabe, Motoaki
Ishii, Shin
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 473 - +