Convergence and stability analysis of value iteration Q-learning under non-discounted cost for discrete-time optimal control

Times Cited: 0
Authors
Song, Shijie [1 ]
Zhao, Mingming [2 ]
Gong, Dawei [1 ]
Zhu, Minglei [3 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu 611731, Peoples R China
[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligent Syst, Beijing 100124, Peoples R China
[3] Southwest Jiaotong Univ, Sch City & Intelligent Transportat, Chengdu 611756, Peoples R China
Keywords
Adaptive dynamic programming; Adaptive critic control; Nonlinear system; Optimal control; Neural network; OPTIMAL TRACKING CONTROL; OPTIMAL ADAPTIVE-CONTROL; SYSTEMS; DESIGN
DOI
10.1016/j.neucom.2024.128370
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper presents a theoretical analysis of value iteration Q-learning with non-discounted costs. The analysis focuses on two main aspects: the convergence of the iterative Q-function and the stability of the system under the final iterative control policy. Unlike previous theoretical results on Q-learning, our analysis accounts for the effect of approximation errors, yielding a more comprehensive treatment. We first discuss how approximation errors affect the iterative Q-function update. Then, considering the presence of approximation errors in each iteration, we analyze the convergence of the iterative Q-function. Furthermore, we establish a sufficient condition, also accounting for the approximation errors, that ensures the stability of the system under the final iterative control policy. Finally, two simulation cases validate the presented convergence and stability results.
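To make the scheme the abstract refers to concrete: the undiscounted value-iteration Q-recursion is Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(F(x, u), u'), executed in practice with a bounded approximation error in each iteration. Below is a minimal tabular sketch of that update in Python; the linear dynamics, quadratic utility, grids, and error bound are illustrative assumptions, not values from the paper.

# Minimal sketch (not the authors' implementation): tabular value-iteration
# Q-learning under an undiscounted cost, with a bounded per-iteration
# disturbance standing in for the approximation error analyzed in the paper.
# The system, grids, and error bound below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Discrete-time system x_{k+1} = F(x_k, u_k) and utility U(x_k, u_k).
a, b = 0.9, 0.5                      # assumed controllable linear dynamics
F = lambda x, u: a * x + b * u
U = lambda x, u: x**2 + u**2         # undiscounted (no factor gamma < 1)

xs = np.linspace(-2.0, 2.0, 81)      # state grid
us = np.linspace(-2.0, 2.0, 41)      # action grid
Q = np.zeros((xs.size, us.size))     # Q_0 = 0 initialization

eps_bound = 1e-3                     # bound on the per-iteration error

for i in range(200):
    # Greedy cost-to-go: min over actions at the successor state,
    # using nearest-neighbor lookup on the state grid.
    V = Q.min(axis=1)
    Xn = F(xs[:, None], us[None, :])             # successor states
    idx = np.abs(Xn[..., None] - xs).argmin(-1)  # nearest grid point
    target = U(xs[:, None], us[None, :]) + V[idx]
    # Approximation error: Q_{i+1} = target + eps, with |eps| <= eps_bound.
    eps = rng.uniform(-eps_bound, eps_bound, Q.shape)
    Q_next = target + eps
    diff = np.abs(Q_next - Q).max()
    Q = Q_next
    if i % 20 == 0:
        print(f"iter {i:3d}  sup-norm change {diff:.2e}")

# Final iterative control policy: u(x) = argmin_u Q(x, u).
policy = us[Q.argmin(axis=1)]

In this sketch the sup-norm change between successive iterates decays until it plateaus near the injected error level, matching the intuition behind error-aware convergence results of this kind: the iterates reach a neighborhood of the optimal Q-function whose size is governed by the error bound, rather than the optimum exactly.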
Pages: 11
Related papers
50 records in total
  • [31] Minimax Q-learning design for H∞ control of linear discrete-time systems
    Li, Xinxing
    Xi, Lele
    Zha, Wenzhong
    Peng, Zhihong
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2022, 23 (03) : 438 - 451
  • [32] Model-free optimal tracking control for discrete-time system with delays using reinforcement Q-learning
    Liu, Yang
    Yu, Rui
    ELECTRONICS LETTERS, 2018, 54 (12) : 750 - 751
  • [33] Policy Iteration Algorithm for Constrained Cost Optimal Control of Discrete-Time Nonlinear System
    Li, Tao
    Wei, Qinglai
    Li, Hongyang
    Song, Ruizhuo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [34] Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm
    Tan, Xufeng
    Li, Yuan
    Liu, Yang
    AIMS MATHEMATICS, 2023, 8 (05) : 10249 - 10265
  • [35] Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems
    Wei, Qinglai
    Liu, Derong
    Lin, Hanquan
    IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (03) : 840 - 853
  • [36] Optimal Self-Learning Control Scheme for Discrete-Time Nonlinear Systems Using Local Value Iteration
    Wei, Qinglai
    Liu, Derong
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 3544 - 3549
  • [37] Optimal State Tracking Control for Linear Discrete-time Systems Via Value Iteration
    Liu, Yingying
    Shi, Zhan
    Wang, Zhanshan
    PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019), 2019, : 836 - 841
  • [38] A Q-learning algorithm for discrete-time linear-quadratic control with random parameters of unknown distribution: Convergence and stabilization
    Du, Kai
    Meng, Qingxin
    Zhang, Fu
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2022, 60 (04) : 1991 - 2015
  • [39] Stable approximate Q-learning under discounted cost for data-based adaptive tracking control
    Liang, Zhantao
    Ha, Mingming
    Liu, Derong
    Wang, Yonghua
    NEUROCOMPUTING, 2024, 568
  • [40] Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem
    Rizvi, Syed Ali Asad
    Lin, Zongli
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (05) : 1523 - 1536