Convergence and stability analysis of value iteration Q-learning under non-discounted cost for discrete-time optimal control

Cited by: 0
Authors
Song, Shijie [1]
Zhao, Mingming [2]
Gong, Dawei [1]
Zhu, Minglei [3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu 611731, Peoples R China
[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
[3] Southwest Jiaotong Univ, Sch City & Intelligent Transportat, Chengdu 611756, Peoples R China
Keywords
Adaptive dynamic programming; Adaptive critic control; Nonlinear system; Optimal control; Neural network; OPTIMAL TRACKING CONTROL; OPTIMAL ADAPTIVE-CONTROL; SYSTEMS; DESIGN
DOI
10.1016/j.neucom.2024.128370
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a theoretical analysis of value iteration Q-learning with non-discounted costs. The analysis focuses on two aspects: the convergence of the iterative Q-function and the stability of the system under the final iterative control policy. Unlike previous theoretical results on Q-learning, our analysis accounts for approximation errors, which makes the investigation more comprehensive. We first discuss how approximation errors affect the iterative Q-function update. Then, with approximation errors present in each iteration, we analyze the convergence of the iterative Q-function. Furthermore, we establish a sufficient condition, also accounting for the approximation errors, that ensures the stability of the system under the final iterative control policy. Finally, two simulation cases validate the presented convergence and stability results.
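For orientation, the kind of update the abstract refers to can be sketched on a small finite problem. The sketch below assumes the standard undiscounted value-iteration form Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(F(x, u), u') with Q_0 = 0; the record does not state the paper's exact formulation. It is exact and tabular, so it omits the approximation errors that the paper analyzes. The function name `value_iteration_q` and the 4-state chain are hypothetical and chosen only for illustration.

```python
import numpy as np

def value_iteration_q(next_state, cost, n_iters=50):
    """Undiscounted value-iteration Q-learning on a finite deterministic system.

    next_state[x, u] is the index of the successor state F(x, u);
    cost[x, u] >= 0 is the stage cost U(x, u). Both are illustrative inputs.
    """
    q = np.zeros_like(cost, dtype=float)  # Q_0 = 0 (assumed initialization)
    for _ in range(n_iters):
        v = q.min(axis=1)         # V_i(x) = min_u Q_i(x, u)
        q = cost + v[next_state]  # Q_{i+1}(x, u) = U(x, u) + V_i(F(x, u))
    return q

# Hypothetical 4-state chain: state 0 is the origin (zero cost, absorbing).
# Action 0 stays in place; action 1 moves one step toward the origin.
next_state = np.array([[0, 0], [1, 0], [2, 1], [3, 2]])
cost = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])

q = value_iteration_q(next_state, cost)
policy = q.argmin(axis=1)  # greedy policy from the final iterate
```

Without a discount factor, the iterate stays bounded here only because the origin is reachable at finite total cost; under the greedy policy from the final iterate, every nonzero state steps toward the origin.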
Pages: 11
Related papers
50 in total
  • [21] Optimal control for unknown mean-field discrete-time system based on Q-Learning
    Ge, Yingying
    Liu, Xikui
    Li, Yan
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2021, 52 (15): 3335-3349
  • [22] Online Value Iteration for Discrete-Time Nonlinear Optimal Regulation with Stability Guarantee
    Wang, Yuan
    Wang, Ding
    Wu, Junlong
    Zhao, Mingming
    2022 4TH INTERNATIONAL CONFERENCE ON CONTROL AND ROBOTICS, ICCR, 2022: 262-268
  • [23] Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Convergence Analysis
    Wei, Qinglai
    Lewis, Frank L.
    Liu, Derong
    Song, Ruizhuo
    Lin, Hanquan
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (06): 875-891
  • [24] Optimal tracking control for discrete-time modal persistent dwell time switched systems based on Q-learning
    Zhang, Xuewen
    Wang, Yun
    Xia, Jianwei
    Li, Feng
    Shen, Hao
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2023, 44 (06): 3327-3341
  • [25] Exploiting homogeneity for the optimal control of discrete-time systems: application to value iteration
    Granzotto, Mathieu
    Postoyan, Romain
    Busoniu, Lucian
    Nesic, Dragan
    Daafouz, Jamal
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021: 6006-6011
  • [26] Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems
    Li, Jinna
    Chai, Tianyou
    Lewis, Frank L.
    Ding, Zhengtao
    Jiang, Yi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (05): 1308-1320
  • [27] Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems
    Zhu, Yuanheng
    Zhao, Dongbin
    He, Haibo
    Ji, Junhong
    COGNITIVE COMPUTATION, 2015, 7 (06): 763-771
  • [28] Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems
    Yuanheng Zhu
    Dongbin Zhao
    Haibo He
    Junhong Ji
    Cognitive Computation, 2015, 7: 763-771
  • [29] System Stability of Learning-Based Linear Optimal Control With General Discounted Value Iteration
    Wang, Ding
    Ren, Jin
    Ha, Mingming
    Qiao, Junfei
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09): 6504-6514
  • [30] Optimal control for discrete-time affine non-linear systems using general value iteration
    Li, H.
    Liu, D.
    IET CONTROL THEORY AND APPLICATIONS, 2012, 6 (18): 2725-2736