Convergence and stability analysis of value iteration Q-learning under non-discounted cost for discrete-time optimal control

Times Cited: 0
Authors
Song, Shijie [1 ]
Zhao, Mingming [2 ]
Gong, Dawei [1 ]
Zhu, Minglei [3 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu 611731, Peoples R China
[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligent Syst, Beijing 100124, Peoples R China
[3] Southwest Jiaotong Univ, Sch City & Intelligent Transportat, Chengdu 611756, Peoples R China
Keywords
Adaptive dynamic programming; Adaptive critic control; Nonlinear system; Optimal control; Neural network; OPTIMAL TRACKING CONTROL; OPTIMAL ADAPTIVE-CONTROL; SYSTEMS; DESIGN
DOI
10.1016/j.neucom.2024.128370
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper presents a theoretical analysis of value iteration Q-learning with non-discounted costs. The analysis focuses on two main aspects: the convergence of the iterative Q-function and the stability of the system under the final iterative control policy. Unlike previous theoretical results on Q-learning, our analysis accounts for the effect of approximation errors, yielding a more comprehensive treatment. We first discuss how approximation errors affect the iterative Q-function update. Then, considering the presence of approximation errors in each iteration, we analyze the convergence of the iterative Q-function. Furthermore, we establish a sufficient condition, also accounting for the approximation errors, that ensures the stability of the system under the final iterative control policy. Finally, two simulation cases validate the presented convergence and stability results.
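To make the scheme the abstract refers to concrete: the undiscounted value-iteration Q-recursion is Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(F(x, u), u'), executed in practice with a bounded approximation error in each iteration. Below is a minimal tabular sketch of that update in Python; the linear dynamics, quadratic utility, grids, and error bound are illustrative assumptions, not values from the paper.

# Minimal sketch (not the authors' implementation): tabular value-iteration
# Q-learning under an undiscounted cost, with a bounded per-iteration
# disturbance standing in for the approximation error analyzed in the paper.
# The system, grids, and error bound below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Discrete-time system x_{k+1} = F(x_k, u_k) and utility U(x_k, u_k).
a, b = 0.9, 0.5                      # assumed controllable linear dynamics
F = lambda x, u: a * x + b * u
U = lambda x, u: x**2 + u**2         # undiscounted (no factor gamma < 1)

xs = np.linspace(-2.0, 2.0, 81)      # state grid
us = np.linspace(-2.0, 2.0, 41)      # action grid
Q = np.zeros((xs.size, us.size))     # Q_0 = 0 initialization

eps_bound = 1e-3                     # bound on the per-iteration error

for i in range(200):
    # Greedy cost-to-go: min over actions at the successor state,
    # using nearest-neighbor lookup on the state grid.
    V = Q.min(axis=1)
    Xn = F(xs[:, None], us[None, :])             # successor states
    idx = np.abs(Xn[..., None] - xs).argmin(-1)  # nearest grid point
    target = U(xs[:, None], us[None, :]) + V[idx]
    # Approximation error: Q_{i+1} = target + eps, with |eps| <= eps_bound.
    eps = rng.uniform(-eps_bound, eps_bound, Q.shape)
    Q_next = target + eps
    diff = np.abs(Q_next - Q).max()
    Q = Q_next
    if i % 20 == 0:
        print(f"iter {i:3d}  sup-norm change {diff:.2e}")

# Final iterative control policy: u(x) = argmin_u Q(x, u).
policy = us[Q.argmin(axis=1)]

In this sketch the sup-norm change between successive iterates decays until it plateaus near the injected error level, matching the intuition behind error-aware convergence results of this kind: the iterates reach a neighborhood of the optimal Q-function whose size is governed by the error bound, rather than the optimum exactly.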
Pages: 11
Related papers
50 records in total
  • [31] Minimax Q-learning design for H∞ control of linear discrete-time systems
    Li, Xinxing
    Xi, Lele
    Zha, Wenzhong
    Peng, Zhihong
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2022, 23 (03) : 438 - 451
  • [32] Model-free optimal tracking control for discrete-time system with delays using reinforcement Q-learning
    Liu, Yang
    Yu, Rui
    ELECTRONICS LETTERS, 2018, 54 (12) : 750 - 751
  • [33] Policy Iteration Algorithm for Constrained Cost Optimal Control of Discrete-Time Nonlinear System
    Li, Tao
    Wei, Qinglai
    Li, Hongyang
    Song, Ruizhuo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [34] Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm
    Tan, Xufeng
    Li, Yuan
    Liu, Yang
    AIMS MATHEMATICS, 2023, 8 (05) : 10249 - 10265
  • [35] Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems
    Wei, Qinglai
    Liu, Derong
    Lin, Hanquan
    IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (03) : 840 - 853
  • [36] Optimal Self-Learning Control Scheme for Discrete-Time Nonlinear Systems Using Local Value Iteration
    Wei, Qinglai
    Liu, Derong
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 3544 - 3549
  • [37] Optimal State Tracking Control for Linear Discrete-time Systems Via Value Iteration
    Liu, Yingying
    Shi, Zhan
    Wang, Zhanshan
    PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019), 2019, : 836 - 841
  • [38] A Q-learning algorithm for discrete-time linear-quadratic control with random parameters of unknown distribution: Convergence and stabilization
    Du, Kai
    Meng, Qingxin
    Zhang, Fu
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2022, 60 (04) : 1991 - 2015
  • [39] Stable approximate Q-learning under discounted cost for data-based adaptive tracking control
    Liang, Zhantao
    Ha, Mingming
    Liu, Derong
    Wang, Yonghua
    NEUROCOMPUTING, 2024, 568
  • [40] Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem
    Rizvi, Syed Ali Asad
    Lin, Zongli
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (05) : 1523 - 1536