Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances

Cited by: 197

Authors:

Song, Ruizhuo [1]

Lewis, Frank L. [2, 3]

Wei, Qinglai [4]

Zhang, Huaguang [5]

Affiliations:

[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China

[2] Univ Texas Arlington, UTA Res Inst, Ft Worth, TX 76118 USA

[3] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Peoples R China

[4] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

[5] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Peoples R China

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2016, Vol. 46, No. 5

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China; U.S. National Science Foundation

Keywords:

Adaptive critic designs; adaptive/approximate dynamic programming (ADP); dynamic programming; off-policy; optimal control; unknown system; OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; TIME NONLINEAR-SYSTEMS; OPTIMAL-CONTROL SCHEME; FEEDBACK-CONTROL; ALGORITHM; ITERATION; DESIGN;

DOI:

10.1109/TCYB.2015.2421338

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

An optimal control method is developed for unknown continuous-time systems with unknown disturbances in this paper. The integral reinforcement learning (IRL) algorithm is presented to obtain the iterative control. Off-policy learning is used to allow the dynamics to be completely unknown. Neural networks are used to construct critic and action networks. It is shown that if there are unknown disturbances, off-policy IRL may not converge or may be biased. For reducing the influence of unknown disturbances, a disturbances compensation controller is added. It is proven that the weight errors are uniformly ultimately bounded based on Lyapunov techniques. Convergence of the Hamiltonian function is also proven. The simulation study demonstrates the effectiveness of the proposed optimal control method for unknown systems with disturbances.
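
For orientation, the relations that off-policy integral reinforcement learning is commonly built on can be sketched as follows. This is the standard textbook form and not necessarily the notation or the exact formulation of the paper: the affine dynamics $\dot{x} = f(x) + g(x)u$, the cost terms $Q(x)$ and $R$, the interval length $T$, and the additive disturbance $d$ are all assumptions introduced here for illustration.

```latex
% Assumed setup (hypothetical, for illustration only):
%   dynamics   \dot{x} = f(x) + g(x) u
%   cost rate  Q(x) + u^\top R u,  with R symmetric positive definite
%   u          behavior input actually applied to the system
%   u^{(i)}    target policy at iteration i
\begin{align}
V^{(i)}\big(x(t+T)\big) - V^{(i)}\big(x(t)\big)
  &= -\int_{t}^{t+T} \Big( Q(x) + u^{(i)\top} R\, u^{(i)} \Big)\, d\tau
     \;-\; 2 \int_{t}^{t+T} u^{(i+1)\top} R\, \big( u - u^{(i)} \big)\, d\tau, \\
u^{(i+1)}(x)
  &= -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{(i)}(x).
\end{align}
% With an assumed additive disturbance, \dot{x} = f(x) + g(x) u + d,
% the right-hand side of the first relation gains the extra term
%   \int_{t}^{t+T} \nabla V^{(i)\top} d \, d\tau .
```

The first relation contains neither $f$ nor $g$, so $V^{(i)}$ and $u^{(i+1)}$ can in principle be fitted from measured trajectories alone, for example with a critic and an action network. Under the assumed additive disturbance, the extra integral term depends on the unmeasured $d$, which is consistent with the abstract's remark that off-policy IRL may be biased or fail to converge when disturbances are present.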


Pages: 1041-1050

Page count: 10

