Convergence and stability analysis of value iteration Q-learning under non-discounted cost for discrete-time optimal control

Cited by: 0
Authors
Song, Shijie [1 ]
Zhao, Mingming [2 ]
Gong, Dawei [1 ]
Zhu, Minglei [3 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu 611731, Peoples R China
[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligent Syst, Beijing 100124, Peoples R China
[3] Southwest Jiaotong Univ, Sch City & Intelligent Transportat, Chengdu 611756, Peoples R China
Keywords
Adaptive dynamic programming; Adaptive critic control; Nonlinear system; Optimal control; Neural network; OPTIMAL TRACKING CONTROL; OPTIMAL ADAPTIVE-CONTROL; SYSTEMS; DESIGN;
DOI
10.1016/j.neucom.2024.128370
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
This paper presents a theoretical analysis of value iteration Q-learning with non-discounted costs. The analysis focuses on two main aspects: the convergence of the iterative Q-function and the stability of the system under the final iterative control policy. Unlike previous theoretical results on Q-learning, our analysis accounts for the effect of approximation errors, yielding a more comprehensive investigation. We first discuss how approximation errors affect the iterative Q-function update. Then, considering the presence of approximation errors at each iteration, we analyze the convergence of the iterative Q-function. Furthermore, we establish a sufficient condition, also accounting for approximation errors, that ensures the stability of the system under the final iterative control policy. Finally, two simulation cases validate the presented convergence and stability results.
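As a rough illustration of the iterative scheme the abstract describes, the following minimal Python sketch runs tabular value iteration Q-learning with an undiscounted stage cost and injects a bounded perturbation at each iteration to stand in for the approximation error the paper analyzes. The scalar system, the grids, and the error bound eps_bound are illustrative assumptions for this sketch, not the paper's simulation cases or its actual neural-network implementation.

```python
import numpy as np

# Hypothetical scalar discrete-time system (illustrative assumption):
#   x_{k+1} = 0.8 * sin(x_k) + u_k
# Undiscounted stage cost: U(x, u) = x^2 + u^2.
X_grid = np.linspace(-1.0, 1.0, 41)   # discretized state space
U_grid = np.linspace(-1.0, 1.0, 21)   # discretized action space

def dynamics(x, u):
    return 0.8 * np.sin(x) + u

def stage_cost(x, u):
    return x**2 + u**2

def nearest(grid, v):
    return np.abs(grid - v).argmin()

Q = np.zeros((X_grid.size, U_grid.size))  # Q_0 = 0 (common VI initialization)
eps_bound = 1e-3                          # assumed bound on per-step approximation error
rng = np.random.default_rng(0)

for i in range(200):
    V = Q.min(axis=1)                     # V_i(x) = min_u Q_i(x, u)
    Q_next = np.empty_like(Q)
    for ix, x in enumerate(X_grid):
        for iu, u in enumerate(U_grid):
            x_next = np.clip(dynamics(x, u), X_grid[0], X_grid[-1])
            # Exact VI Q-learning update: Q_{i+1}(x,u) = U(x,u) + V_i(F(x,u))
            Q_next[ix, iu] = stage_cost(x, u) + V[nearest(X_grid, x_next)]
    # Bounded perturbation mimicking function-approximation error at each iteration
    Q_next += rng.uniform(-eps_bound, eps_bound, size=Q.shape)
    if np.max(np.abs(Q_next - Q)) < 10 * eps_bound:
        break                             # converged to a neighborhood of Q*
    Q = Q_next

# Final iterative control policy: greedy with respect to the last Q-function
policy = U_grid[Q.argmin(axis=1)]
```

With eps_bound = 0 this reduces to exact undiscounted value iteration, where the zero-initialized Q-function increases monotonically toward the optimum; with a nonzero bound, the iterates can only be shown to enter a neighborhood of the optimal Q-function, which is the situation the paper's convergence and stability conditions address.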
Pages: 11
Related Papers
50 records in total
  • [41] Q-Learning Methods for LQR Control of Completely Unknown Discrete-Time Linear Systems
    Fan, Wenwu
    Xiong, Junlin
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 5933 - 5943
  • [42] An ADDHP-based Q-learning algorithm for optimal tracking control of linear discrete-time systems with unknown dynamics
    Mu, Chaoxu
    Zhao, Qian
    Sun, Changyin
    Gao, Zhongke
    APPLIED SOFT COMPUTING, 2019, 82
  • [43] Optimal tracking control for discrete-time systems by model-free off-policy Q-learning approach
    Li, Jinna
    Yuan, Decheng
    Ding, Zhengtao
    2017 11TH ASIAN CONTROL CONFERENCE (ASCC), 2017, : 7 - 12
  • [44] Stochastic linear quadratic optimal control for model-free discrete-time systems based on Q-learning algorithm
    Wang, Tao
    Zhang, Huaguang
    Luo, Yanhong
    NEUROCOMPUTING, 2018, 312 : 1 - 8
  • [45] FINITE-HORIZON OPTIMAL CONTROL OF DISCRETE-TIME LINEAR SYSTEMS WITH COMPLETELY UNKNOWN DYNAMICS USING Q-LEARNING
    Zhao, Jingang
    Zhang, Chi
    JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION, 2021, 17 (03) : 1471 - 1483
  • [46] Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate
    Wang, Yuan
    Wang, Ding
    Zhao, Mingming
    Liu, Nan
    Qiao, Junfei
    NEURAL NETWORKS, 2024, 175
  • [47] Reinforcement Q-Learning and Non-Zero-Sum Games Optimal Tracking Control for Discrete-Time Linear Multi-Input Systems
    Zhao, Jin-Gang
    2023 IEEE 12TH DATA DRIVEN CONTROL AND LEARNING SYSTEMS CONFERENCE, DDCLS, 2023, : 277 - 282
  • [48] Comparisons of Continuous-time and Discrete-time Q-learning Schemes for Adaptive Linear Quadratic Control
    Chun, Tae Yoon
    Lee, Jae Young
    Park, Jin Bae
    Choi, Yoon Ho
    2012 PROCEEDINGS OF SICE ANNUAL CONFERENCE (SICE), 2012, : 1228 - 1233
  • [49] Optimal trajectory tracking for uncertain linear discrete-time systems using time-varying Q-learning
    Geiger, Maxwell
    Narayanan, Vignesh
    Jagannathan, Sarangapani
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2024, 38 (07) : 2340 - 2368
  • [50] Reinforcement Q-learning algorithm for H∞ tracking control of discrete-time Markov jump systems
    Shi, Jiahui
    He, Dakuo
    Zhang, Qiang
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2025, 56 (03) : 502 - 523