Error Bound Analysis of Q-Function for Discounted Optimal Control Problems With Policy Iteration

Cited: 25
Authors
Yan, Pengfei [1 ]
Wang, Ding [1 ]
Li, Hongliang [2 ]
Liu, Derong [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[2] IBM Res China, Beijing 100193, Peoples R China
[3] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Adaptive dynamic programming (ADP); error analysis; nonlinear systems; policy iteration; Q-function
Keywords Plus
UNCERTAIN NONLINEAR-SYSTEMS; UNKNOWN INTERNAL DYNAMICS; ADAPTIVE OPTIMAL-CONTROL; DISCRETE-TIME-SYSTEMS; H-INFINITY CONTROL; ZERO-SUM GAMES; HJB SOLUTION; PERFORMANCE; ALGORITHM; DESIGN
DOI
10.1109/TSMC.2016.2563982
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
In this paper, we present an error bound analysis of the Q-function for action-dependent adaptive dynamic programming applied to discounted optimal control problems of unknown discrete-time nonlinear systems. We first establish the convergence of the Q-functions generated by a policy iteration algorithm under ideal conditions. Accounting for the approximation errors of the Q-function and the control policy in the policy evaluation and policy improvement steps, we then derive error bounds for the approximate Q-function at each iteration. Under the given boundedness conditions, the approximate Q-function converges to a finite neighborhood of the optimal Q-function. To implement the presented algorithm, two three-layer neural networks are employed to approximate the Q-function and the control policy, respectively. Finally, a simulation example verifies the validity of the presented algorithm.
Pages: 1207-1216
Page count: 10
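
For concreteness, the following is a minimal sketch of the Q-function policy iteration described in the abstract, written for a discretized scalar system. The dynamics f, utility U, grids, discount factor, and sweep counts are illustrative assumptions rather than the paper's setting; the paper itself treats unknown dynamics and uses two three-layer neural networks in place of the table below.

    import numpy as np

    # Illustrative problem data (assumptions, not taken from the paper):
    gamma = 0.95                           # discount factor
    xs = np.linspace(-1.0, 1.0, 21)        # discretized state grid
    us = np.linspace(-1.0, 1.0, 21)        # discretized action grid

    def f(x, u):
        # hypothetical scalar nonlinear dynamics, standing in for the unknown system
        return 0.8 * np.sin(x) + 0.5 * u

    def U(x, u):
        # quadratic utility, a common choice in the ADP literature
        return x ** 2 + u ** 2

    def nearest(grid, value):
        # index of the grid point closest to value
        return int(np.abs(grid - value).argmin())

    Q = np.zeros((len(xs), len(us)))       # tabular Q-function
    policy = np.zeros(len(xs), dtype=int)  # greedy policy: state index -> action index

    for _outer in range(30):
        # Policy evaluation: sweep the fixed-point relation
        #   Q(x, u) = U(x, u) + gamma * Q(x', v(x')),  with x' = f(x, u),
        # for the current policy v. Truncating to finitely many sweeps leaves an
        # evaluation error of the kind the paper's bounds account for.
        for _ in range(60):
            for i, x in enumerate(xs):
                for j, u in enumerate(us):
                    k = nearest(xs, f(x, u))
                    Q[i, j] = U(x, u) + gamma * Q[k, policy[k]]
        # Policy improvement: greedy (cost-minimizing) update from the evaluated Q.
        new_policy = Q.argmin(axis=1)
        if np.array_equal(new_policy, policy):
            break                          # policy is stable; stop iterating
        policy = new_policy

    print("greedy action at x = 0:", us[policy[nearest(xs, 0.0)]])

Because the evaluation step is truncated rather than exact, each iterate carries a bounded approximation error, which mirrors the setting in which the paper's error bounds are derived.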