Adaptive Optimal Control of Unknown Constrained-Input Systems Using Policy Iteration and Neural Networks

Cited: 361
Authors
Modares, Hamidreza [1 ]
Lewis, Frank L. [2 ]
Naghibi-Sistani, Mohammad-Bagher [1 ]
Affiliations
[1] Ferdowsi Univ Mashhad, Dept Elect Engn, Mashhad, Iran
[2] Univ Texas Arlington, Res Inst, Ft Worth, TX 76118 USA
Funding
National Science Foundation (NSF);
关键词
Input constraints; neural networks; optimal control; reinforcement learning; unknown dynamics; CONTINUOUS-TIME;
D O I
10.1109/TNNLS.2013.2276571
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems. The proposed PI algorithm is implemented on an actor-critic structure in which two neural networks (NNs) are tuned online and simultaneously to generate the optimal bounded control policy. The requirement of complete knowledge of the system dynamics is obviated by employing a novel NN identifier in conjunction with the actor and critic NNs. It is shown how the identifier weight estimation error affects the convergence of the critic NN. A novel learning rule is developed to guarantee that the identifier weights converge to small neighborhoods of their ideal values exponentially fast. To provide an easy-to-check persistence of excitation condition, the experience replay technique is used: recorded past experiences are used simultaneously with current data to adapt the identifier weights. Stability of the whole system consisting of the actor, critic, system state, and system identifier is guaranteed while all three networks undergo adaptation. Convergence to a near-optimal control law is also shown. The effectiveness of the proposed method is illustrated with a simulation example.
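To illustrate the experience-replay idea mentioned in the abstract, the following is a minimal, hypothetical sketch (not the paper's actual update law): a gradient identifier estimates an unknown scalar parameter theta in x_dot = theta*x + u, reusing recorded (regressor, measurement) pairs alongside current data. This is the mechanism that relaxes the persistence-of-excitation requirement; the paper itself uses NN basis functions and simultaneous actor-critic tuning.

```python
import numpy as np

# Hypothetical scalar illustration of experience replay for identification.
# True dynamics: x_dot = theta_true*x + u; the identifier learns theta_hat
# from current data plus a memory of recorded past experiences.

theta_true = -2.0          # unknown drift parameter (assumed for this demo)
theta_hat = 0.0            # identifier estimate
gamma = 0.5                # adaptation gain
dt = 0.01                  # integration step
memory = []                # recorded past experiences (regressor, target)

x = 1.0
for k in range(2000):
    u = np.sin(0.05 * k)                   # probing input
    x_dot = theta_true * x + u             # measured state derivative
    phi, y = x, x_dot - u                  # regressor and target: y = theta*phi
    if k % 50 == 0:
        memory.append((phi, y))            # record an experience
    # gradient step on current data plus all stored experiences
    grad = phi * (theta_hat * phi - y)
    for p, t in memory:
        grad += p * (theta_hat * p - t)
    theta_hat -= gamma * dt * grad
    x += dt * x_dot                        # forward-Euler state update
```

Because the stored regressors keep the combined excitation bounded away from zero, theta_hat converges to theta_true even after the probing signal's effect on the state has decayed, which is the easy-to-check replacement for a classical persistence-of-excitation condition.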
Pages: 1513-1525 (13 pages)
Related Papers
50 records in total
  • [1] Optimal bounded policy for nonlinear tracking control of unknown constrained-input systems
    Sabahi, Farnaz
    TRANSACTIONS OF THE INSTITUTE OF MEASUREMENT AND CONTROL, 2025, 47 (03) : 585 - 598
  • [2] A policy iteration approach to online optimal control of continuous-time constrained-input systems
    Modares, Hamidreza
    Sistani, Mohammad-Bagher Naghibi
    Lewis, Frank L.
    ISA TRANSACTIONS, 2013, 52 (05) : 611 - 621
  • [3] Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming
    Zhu, Yuanheng
    Zhao, Dongbin
    He, Haibo
    Ji, Junhong
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2017, 64 (05) : 4101 - 4109
  • [4] Online adaptive data-driven control for unknown nonlinear systems with constrained-input
    Xi'an University of Architecture and Technology, Xi'an, China
    Int. Conf. Cyber-Energy Syst. Intell. Energy, ICCSIE, 1600,
  • [5] Reinforcement learning-based optimal control of unknown constrained-input nonlinear systems using simulated experience
    Asl, Hamed Jabbari
    Uchibe, Eiji
    NONLINEAR DYNAMICS, 2023, 111 (17) : 16093 - 16110
  • [6] H∞ Control of Constrained-Input Nonlinear Systems with Unknown Model Based on Adaptive Dynamic Programming
    Pu, Jun
    Ma, Qingliang
    Gu, Fan
    Yu, Zexiang
    PROCEEDINGS OF THE 30TH CHINESE CONTROL AND DECISION CONFERENCE (2018 CCDC), 2018, : 2265 - 2270
  • [8] Reinforcement Learning-Based Nearly Optimal Control for Constrained-Input Partially Unknown Systems Using Differentiator
    Guo, Xinxin
    Yan, Weisheng
    Cui, Rongxin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (11) : 4713 - 4725
  • [9] Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning
    Modares, Hamidreza
    Lewis, Frank L.
    AUTOMATICA, 2014, 50 (07) : 1780 - 1792
  • [10] Optimal Output Feedback Control of Nonlinear Partially-Unknown Constrained-Input Systems Using Integral Reinforcement Learning
    Ren, Ling
    Zhang, Guoshan
    Mu, Chaoxu
    NEURAL PROCESSING LETTERS, 2019, 50 (03) : 2963 - 2989