Convergence and stability analysis of value iteration Q-learning under non-discounted cost for discrete-time optimal control

Cited by: 0
Authors
Song, Shijie [1]
Zhao, Mingming [2]
Gong, Dawei [1]
Zhu, Minglei [3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu 611731, Peoples R China
[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
[3] Southwest Jiaotong Univ, Sch City & Intelligent Transportat, Chengdu 611756, Peoples R China
Keywords
Adaptive dynamic programming; Adaptive critic control; Nonlinear system; Optimal control; Neural network; OPTIMAL TRACKING CONTROL; OPTIMAL ADAPTIVE-CONTROL; SYSTEMS; DESIGN
DOI
10.1016/j.neucom.2024.128370
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a theoretical analysis of value iteration Q-learning with non-discounted costs. The analysis focuses on two main aspects: the convergence of the iterative Q-function and the stability of the system under the final iterative control policy. Unlike previous theoretical results on Q-learning, our analysis accounts for approximation errors, which yields a more comprehensive investigation. We first discuss how approximation errors affect the iterative Q-function update. Then, with approximation errors present in each iteration, we analyze the convergence of the iterative Q-function. Furthermore, we establish a sufficient condition, also accounting for the approximation errors, that ensures the stability of the system under the final iterative control policy. Finally, two simulation cases validate the presented convergence and stability results.
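The abstract does not state the update rule, so the following is only an illustrative sketch and not the paper's algorithm or simulation setup. It assumes the commonly used value iteration Q-learning update for a deterministic discrete-time system, Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(f(x, u), u'), with no discount factor and Q_0 = 0. The five-state toy system, the quadratic stage cost, and the names `step`, `stage_cost`, `value_iteration_q`, `greedy_rollout`, and `rel_err` are all hypothetical choices made for this example. The `rel_err` argument injects a bounded relative perturbation into each update as a crude stand-in for the approximation errors the abstract mentions.

```python
import numpy as np

# Hypothetical toy problem: states 0..4, state 0 is the equilibrium.
STATES = np.arange(5)
ACTIONS = np.array([-1, 0, 1])


def step(x, u):
    """Deterministic dynamics f(x, u): move by u, saturating at 0 and 4."""
    return int(np.clip(x + u, 0, 4))


def stage_cost(x, u):
    """Non-discounted stage cost U(x, u); zero only at x = 0, u = 0."""
    return float(x**2 + u**2)


def value_iteration_q(n_iter=20, rel_err=0.0, seed=0):
    """Iterate Q_{i+1}(x,u) = U(x,u) + min_u' Q_i(f(x,u),u') from Q_0 = 0.

    rel_err > 0 multiplies each update by (1 + e) with |e| <= rel_err,
    an assumed model of per-iteration approximation error.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((len(STATES), len(ACTIONS)))
    for _ in range(n_iter):
        v = q.min(axis=1)  # V_i(x) = min_u Q_i(x, u)
        target = np.array(
            [[stage_cost(x, u) + v[step(x, u)] for u in ACTIONS] for x in STATES]
        )
        noise = rng.uniform(-rel_err, rel_err, size=target.shape)
        q = (1.0 + noise) * target
    return q


def greedy_rollout(q, x0, max_steps=20):
    """Simulate the greedy policy u = argmin_u Q(x, u) starting from x0."""
    traj, x = [x0], x0
    for _ in range(max_steps):
        u = ACTIONS[int(np.argmin(q[x]))]
        x = step(x, u)
        traj.append(x)
        if x == 0:
            break
    return traj


if __name__ == "__main__":
    q = value_iteration_q()
    print("V(x) =", q.min(axis=1))
    print("closed-loop trajectory from x0=4:", greedy_rollout(q, 4))
```

With `rel_err=0.0` the iteration settles on a fixed point on this toy problem, and the greedy policy drives the state to 0, which is the kind of convergence and closed-loop stability the abstract refers to. Setting `rel_err` to a small positive value lets one observe how perturbed updates affect both properties, though any threshold seen this way is specific to the toy system and says nothing about the paper's sufficient condition.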
Pages: 11