Convergence and stability analysis of value iteration Q-learning under non-discounted cost for discrete-time optimal control

Cited by: 0

Authors:

Song, Shijie [1]

Zhao, Mingming [2]

Gong, Dawei [1]

Zhu, Minglei [3]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu 611731, Peoples R China

[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China

[3] Southwest Jiaotong Univ, Sch City & Intelligent Transportat, Chengdu 611756, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Vol. 606

Keywords:

Adaptive dynamic programming; Adaptive critic control; Nonlinear system; Optimal control; Neural network; OPTIMAL TRACKING CONTROL; OPTIMAL ADAPTIVE-CONTROL; SYSTEMS; DESIGN;

DOI:

10.1016/j.neucom.2024.128370

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a theoretical analysis of value iteration Q-learning with non-discounted costs. The analysis focuses on two main aspects: the convergence of the iterative Q-function and the stability of the system under the final iterative control policy. Unlike previous theoretical results on Q-learning, our analysis takes into account the effect of approximation errors, leading to a more comprehensive investigation. We first discuss the effect of approximation errors on the iterative Q-function update. Then, considering the presence of approximation errors in each iteration, we analyze the convergence of the iterative Q-function. Furthermore, we establish a sufficient condition, also accounting for the approximation errors, to ensure the stability of the system under the final iterative control policy. Finally, two simulation cases are presented to validate the convergence and stability results.
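
For readers unfamiliar with the setting, the following is a minimal sketch of the undiscounted value-iteration Q-function update, Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(f(x, u), u'), on a textbook scalar linear-quadratic problem. It is not the paper's algorithm or one of its simulation cases: the system x_{k+1} = a*x + b*u, the cost U = q*x^2 + r*u^2, the initialization Q_0 = 0, and the function name `q_value_iteration` are all assumptions chosen for illustration, and the update is exact, so it contains none of the approximation errors the paper analyzes.

```python
def q_value_iteration(a, b, q, r, iterations=200):
    """Undiscounted value-iteration Q-function updates for x_{k+1} = a*x + b*u.

    Illustrative assumption: Q_i(x, u) is the quadratic form
    h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2, which is exact for this
    linear-quadratic problem (no function approximator, no approximation error).
    """
    h_xx, h_xu, h_uu = 0.0, 0.0, 0.0  # Q_0 = 0
    history = []
    for _ in range(iterations):
        # V_i(x) = min_u Q_i(x, u) = p * x^2 (p = 0 while Q_i is still zero)
        p = h_xx - h_xu ** 2 / h_uu if h_uu > 0 else 0.0
        # Q_{i+1}(x, u) = q*x^2 + r*u^2 + V_i(a*x + b*u)
        h_xx = q + p * a * a
        h_xu = p * a * b
        h_uu = r + p * b * b
        history.append((h_xx, h_xu, h_uu))
    # Greedy policy of the final iterate: u = gain * x
    gain = -h_xu / h_uu
    return (h_xx, h_xu, h_uu), gain, history


if __name__ == "__main__":
    # Open-loop unstable example (|a| > 1), chosen arbitrarily.
    h, gain, _ = q_value_iteration(a=1.2, b=1.0, q=1.0, r=1.0)
    print("final Q-function coefficients:", h)
    print("closed-loop pole a + b*gain:", 1.2 + 1.0 * gain)
```

In this sketch, convergence of the iterative Q-function can be checked against the fixed point of the scalar Riccati equation, and stability under the final policy reduces to checking that |a + b*gain| < 1.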

Pages: 11

50 items in total

[21] Optimal control for unknown mean-field discrete-time system based on Q-Learning
Ge, Yingying
Liu, Xikui
Li, Yan
INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2021, 52 (15) : 3335 - 3349
[22] Online Value Iteration for Discrete-Time Nonlinear Optimal Regulation with Stability Guarantee
Wang, Yuan
Wang, Ding
Wu, Junlong
Zhao, Mingming
2022 4TH INTERNATIONAL CONFERENCE ON CONTROL AND ROBOTICS, ICCR, 2022 : 262 - 268
[23] Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Convergence Analysis
Wei, Qinglai
Lewis, Frank L.
Liu, Derong
Song, Ruizhuo
Lin, Hanquan
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (06) : 875 - 891
[24] Optimal tracking control for discrete-time modal persistent dwell time switched systems based on Q-learning
Zhang, Xuewen
Wang, Yun
Xia, Jianwei
Li, Feng
Shen, Hao
OPTIMAL CONTROL APPLICATIONS & METHODS, 2023, 44 (06) : 3327 - 3341
[25] Exploiting homogeneity for the optimal control of discrete-time systems: application to value iteration
Granzotto, Mathieu
Postoyan, Romain
Busoniu, Lucian
Nesic, Dragan
Daafouz, Jamal
2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021 : 6006 - 6011
[26] Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems
Li, Jinna
Chai, Tianyou
Lewis, Frank L.
Ding, Zhengtao
Jiang, Yi
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (05) : 1308 - 1320
[27] Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems
Zhu, Yuanheng
Zhao, Dongbin
He, Haibo
Ji, Junhong
COGNITIVE COMPUTATION, 2015, 7 (06) : 763 - 771
[28] Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems
Yuanheng Zhu
Dongbin Zhao
Haibo He
Junhong Ji
Cognitive Computation, 2015, 7 : 763 - 771
[29] System Stability of Learning-Based Linear Optimal Control With General Discounted Value Iteration
Wang, Ding
Ren, Jin
Ha, Mingming
Qiao, Junfei
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09) : 6504 - 6514
[30] Optimal control for discrete-time affine non-linear systems using general value iteration
Li, H.
Liu, D.
IET CONTROL THEORY AND APPLICATIONS, 2012, 6 (18) : 2725 - 2736