Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis

Citations: 0
Authors
Lin, Mingduo [1 ]
Zhao, Bo [1 ,2 ]
Liu, Derong [3 ,4 ]
Affiliations
[1] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Key Lab Ind Internet Things & Networked Control, Minist Educ, Chongqing 400065, Peoples R China
[3] Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China
[4] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China
Keywords
Adaptive dynamic programming (ADP); data-based control; optimal control; output tracking control; policy optimization (PO); reinforcement learning (RL); GRADIENT METHODS; LINEAR-SYSTEMS; TIME-SYSTEMS;
DOI
10.1109/TNNLS.2024.3379207
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Optimal learning output tracking control (OLOTC) in a model-free manner has received increasing attention in both the intelligent control and reinforcement learning (RL) communities. Although model-free tracking control has been achieved via off-policy learning and Q-learning, direct policy learning, another popular RL approach that is easy to implement, has rarely been considered. To fill this gap, this article develops a novel model-free policy optimization (PO) algorithm to achieve OLOTC for unknown linear discrete-time (DT) systems. The iterative control policy is parameterized to directly improve the discounted value function of the augmented system via a gradient-based method. To implement this algorithm in a model-free manner, a two-point policy gradient (PG) algorithm is designed to approximate the gradient of the discounted value function from sampled states and reference trajectories. The global convergence of the model-free PO algorithm to the optimal value function is established given a sufficient number of samples and appropriate conditions. Finally, numerical simulation results validate the effectiveness of the proposed method.
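The two-point policy gradient estimator mentioned in the abstract belongs to the family of zeroth-order methods: the gradient of the cost with respect to the policy gain is approximated from two perturbed rollouts rather than from a model. The sketch below illustrates this idea on a toy linear DT tracking problem; the system matrices, cost weights, rollout horizon, and gain parameterization are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy linear DT system x_{t+1} = A x_t + B u_t with a static tracking
# policy u = -K (x - ref); all values below are illustrative.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R, gamma = np.eye(2), np.eye(1), 0.9

def rollout_cost(K, x0, ref, T=200):
    """Estimate the discounted quadratic tracking cost under gain K
    by simulating a finite-horizon rollout."""
    x, cost = x0.copy(), 0.0
    for t in range(T):
        e = x - ref                    # tracking error w.r.t. the reference
        u = -K @ e
        cost += gamma**t * (e @ Q @ e + u @ R @ u)
        x = A @ x + B @ u
    return cost

def two_point_gradient(K, x0, ref, r=1e-2):
    """Two-point zeroth-order estimate of the policy gradient:
    grad ~ d * [J(K + rU) - J(K - rU)] / (2r) * U,
    where U is a random direction on the unit sphere and d = K.size."""
    U = np.random.randn(*K.shape)
    U /= np.linalg.norm(U)             # random unit direction
    delta = rollout_cost(K + r * U, x0, ref) - rollout_cost(K - r * U, x0, ref)
    return K.size * delta / (2 * r) * U
```

In practice such estimates are averaged over many random directions and used in a gradient-descent update on the gain, which mirrors the model-free PO iteration the abstract describes: only sampled trajectories, never the system matrices, enter the gradient computation.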
Pages: 1-12
Related Papers
50 records in total
  • [21] Q-learning based model-free input-output feedback linearization control method
    Sun, Yipu
    Chen, Xin
    He, Wenpeng
    Zhang, Ziying
    Fukushima, Edwardo F.
    She, Jinhua
    IFAC PAPERSONLINE, 2023, 56 (02): : 9534 - 9539
  • [22] Model-Free Control of Indoor Temperatures in Residential Buildings: Convergence Analysis
    Wu, Tumin
    Olama, Mohammed M.
    Djouadi, Seddik M.
    2024 IEEE POWER AND ENERGY CONFERENCE AT ILLINOIS, PECI, 2024,
  • [23] Model-Free H∞ Optimal Tracking Control of Constrained Nonlinear Systems via an Iterative Adaptive Learning Algorithm
    Hou, Jiaxu
    Wang, Ding
    Liu, Derong
    Zhang, Yun
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (11): : 4097 - 4108
  • [24] Optimal Output Regulation for Model-Free Quanser Helicopter With Multistep Q-Learning
    Luo, Biao
    Wu, Huai-Ning
    Huang, Tingwen
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2018, 65 (06) : 4953 - 4961
  • [25] Deterministic policy gradient adaptive dynamic programming for model-free optimal control
    Zhang, Yongwei
    Zhao, Bo
    Liu, Derong
    NEUROCOMPUTING, 2020, 387 : 40 - 50
  • [26] A model-free robust policy iteration algorithm for optimal control of nonlinear systems
    Bhasin, S.
    Johnson, M.
    Dixon, W. E.
    49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, : 3060 - 3065
  • [27] Model-Free Optimized Tracking Control Heuristic
    Wang, Ning
    Abouheaf, Mohammed
    Gueaieb, Wail
    Nahas, Nabil
    ROBOTICS, 2020, 9 (03)
  • [28] Model-free Policy Learning with Reward Gradients
    Lan, Qingfeng
    Tosatto, Samuele
    Farrahi, Homayoon
    Mahmood, A. Rupam
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [29] Observer-Based Human-in-the-Loop Optimal Output Cluster Synchronization Control for Multiagent Systems: A Model-Free Reinforcement Learning Method
    Huang, Zongsheng
    Li, Tieshan
    Long, Yue
    Liang, Hongjing
    IEEE TRANSACTIONS ON CYBERNETICS, 2025, 55 (02) : 649 - 660
  • [30] Learning model-free motor control
    Agostini, A
    Celaya, E
    ECAI 2004: 16TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 110 : 947 - 948