Convergence and stability analysis of value iteration Q-learning under non-discounted cost for discrete-time optimal control

Cited by: 0

Authors:

Song, Shijie [1]

Zhao, Mingming [2]

Gong, Dawei [1]

Zhu, Minglei [3]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu 611731, Peoples R China

[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China

[3] Southwest Jiaotong Univ, Sch City & Intelligent Transportat, Chengdu 611756, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Volume 606

Keywords:

Adaptive dynamic programming; Adaptive critic control; Nonlinear system; Optimal control; Neural network; OPTIMAL TRACKING CONTROL; OPTIMAL ADAPTIVE-CONTROL; SYSTEMS; DESIGN;

DOI:

10.1016/j.neucom.2024.128370

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a theoretical analysis of the value iteration Q-learning with non-discounted costs. The analysis focuses on two main aspects: the convergence of the iterative Q-function and the stability of the system under the final iterative control policy. Unlike previous theoretical results on Q-learning, our analysis takes into account the effect of approximation errors, leading to a more comprehensive investigation. We first discuss the effect of approximation errors on the iterative Q-function update. Then, considering the presence of approximation errors in each iteration, we analyze the convergence of the iterative Q-function. Furthermore, we establish a sufficient condition, also accounting for the approximation errors, to ensure the stability of the system under the final iterative control policy. Finally, two simulation cases are conducted to validate the presented convergence and stability results.
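As a rough illustration of the kind of iteration the abstract refers to, the sketch below runs a textbook value iteration Q-learning update with a non-discounted cost on a tiny finite system. It is not the authors' algorithm or simulation setup: the three-state system, the names `next_state`, `cost`, and `value_iteration_q`, the zero initial Q-function, and the assumption of exact updates (no approximation error) are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical toy system (not from the paper): 3 states, 2 actions.
# next_state[x, u] is the successor F(x, u); cost[x, u] is the stage cost U(x, u).
# State 0 acts as the equilibrium: action 0 keeps it there at zero cost.
next_state = np.array([[0, 0], [1, 0], [2, 1]])
cost = np.array([[0.0, 1.0], [1.0, 2.0], [1.0, 2.0]])


def value_iteration_q(next_state, cost, n_iters=20):
    """Exact undiscounted value iteration on the Q-function, starting from Q_0 = 0."""
    q = np.zeros_like(cost)
    for _ in range(n_iters):
        # Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(F(x, u), u')
        q = cost + q.min(axis=1)[next_state]
    return q


q_final = value_iteration_q(next_state, cost)
policy = q_final.argmin(axis=1)  # greedy control for each state
print(q_final)
print(policy)
```

On this toy system the iterate stops changing after a handful of sweeps because the zero-cost equilibrium is reachable from every state; without such a state, non-discounted sums can grow without bound. The paper's analysis additionally accounts for approximation errors in each iteration, which this sketch omits.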

Pages: 11
