Least squares solutions of the HJB equation with neural network value-function approximators

Cited by: 47
Authors
Tassa, Yuval [1]
Erez, Tom [2]
Affiliations
[1] Hebrew Univ Jerusalem, Interdisciplinary Ctr Neural Computat, IL-91904 Jerusalem, Israel
[2] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2007, Vol. 18, No. 4
Keywords
differential neural networks (NNs); dynamic programming; feedforward neural networks; Hamilton-Jacobi-Bellman (HJB) equation; optimal control; viscosity solution;
DOI
10.1109/TNN.2007.899249
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present an empirical study of iterative least squares minimization of the Hamilton-Jacobi-Bellman (HJB) residual with a neural network (NN) approximation of the value function. Although the nonlinearities in the optimal control problem and the NN approximator preclude theoretical guarantees and raise concerns of numerical instability, we present two simple methods for promoting convergence, whose effectiveness is demonstrated in a series of experiments. The first method involves a gradual increase of the horizon time scale, with a corresponding gradual increase in value-function complexity. The second method involves the assumption of stochastic dynamics, which introduces a regularizing second-derivative term into the HJB equation; a gradual reduction of this term provides further stabilization of the convergence. We demonstrate the solution of several problems, including the 4-D inverted-pendulum system with bounded control. Our approach requires no initial stabilizing policy or any restrictive assumptions on the plant or cost function, only knowledge of the plant dynamics. In the Appendix, we provide the equations for first- and second-order differential backpropagation.
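The residual-minimization idea the abstract describes can be illustrated on a toy problem. The sketch below is an illustration under simplifying assumptions, not the paper's implementation: for the scalar plant x' = u with cost l(x, u) = x^2 + u^2, the exact value function is V(x) = x^2, and we fit a one-parameter approximator V_theta(x) = theta * x^2 (standing in for the paper's NN) by driving the pointwise HJB residual to zero in the least-squares sense over a grid of collocation states.

```python
import numpy as np

# Toy least-squares HJB residual minimization (illustrative sketch only).
# Plant: x' = u, cost l(x, u) = x^2 + u^2, infinite horizon, no discount.
# HJB: 0 = min_u [ l(x, u) + V'(x) * u ]; the minimizing control is
# u* = -V'(x) / 2, giving the residual r(x) = x^2 - V'(x)^2 / 4.
# Approximator (hypothetical, stands in for the NN): V_theta(x) = theta * x^2.

xs = np.linspace(-2.0, 2.0, 41)   # collocation states
theta = 0.2                       # initial value-function parameter
lr = 0.05                         # gradient-descent step size

for _ in range(200):
    Vx = 2.0 * theta * xs         # V'(x) for V_theta(x) = theta * x^2
    r = xs**2 - Vx**2 / 4.0       # pointwise HJB residual
    # Gradient of sum of squared residuals: dr/dtheta = -2 * theta * x^2
    grad = np.sum(2.0 * r * (-2.0 * theta * xs**2))
    theta -= lr * grad / len(xs)

print(round(theta, 3))            # approaches 1.0, i.e., V(x) = x^2
```

The paper's two stabilization devices map onto this sketch naturally: scaling the horizon corresponds to annealing the effective cost magnitude, and assuming stochastic dynamics would add a term proportional to V''(x) to the residual, smoothing the least-squares landscape before being annealed away.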
Pages: 1031-1041
Page count: 11