Least squares solutions of the HJB equation with neural network value-function approximators

Cited by: 47
Authors
Tassa, Yuval [1]
Erez, Tom [2]
Affiliations
[1] Hebrew Univ Jerusalem, Interdisciplinary Ctr Neural Computat, IL-91904 Jerusalem, Israel
[2] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2007, Vol. 18, No. 4
Keywords
differential neural networks (NNs); dynamic programming; feedforward neural networks; Hamilton-Jacobi-Bellman (HJB) equation; optimal control; viscosity solution;
DOI
10.1109/TNN.2007.899249
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present an empirical study of iterative least squares minimization of the Hamilton-Jacobi-Bellman (HJB) residual with a neural network (NN) approximation of the value function. Although the nonlinearities in the optimal control problem and the NN approximator preclude theoretical guarantees and raise concerns of numerical instability, we present two simple methods for promoting convergence, whose effectiveness is demonstrated in a series of experiments. The first method involves a gradual increase of the horizon time scale, with a corresponding gradual increase in value-function complexity. The second method involves the assumption of stochastic dynamics, which introduces a regularizing second-derivative term into the HJB equation; a gradual reduction of this term provides further stabilization of the convergence. We demonstrate the solution of several problems, including the 4-D inverted-pendulum system with bounded control. Our approach requires no initial stabilizing policy or any restrictive assumptions on the plant or cost function, only knowledge of the plant dynamics. In the Appendix, we provide the equations for first- and second-order differential backpropagation.
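The residual-minimization idea the abstract describes can be illustrated on a toy problem. The sketch below is an illustration under simplifying assumptions, not the paper's implementation: for the scalar plant x' = u with cost l(x, u) = x^2 + u^2, the exact value function is V(x) = x^2, and we fit a one-parameter approximator V_theta(x) = theta * x^2 (standing in for the paper's NN) by driving the pointwise HJB residual to zero in the least-squares sense over a grid of collocation states.

```python
import numpy as np

# Toy least-squares HJB residual minimization (illustrative sketch only).
# Plant: x' = u, cost l(x, u) = x^2 + u^2, infinite horizon, no discount.
# HJB: 0 = min_u [ l(x, u) + V'(x) * u ]; the minimizing control is
# u* = -V'(x) / 2, giving the residual r(x) = x^2 - V'(x)^2 / 4.
# Approximator (hypothetical, stands in for the NN): V_theta(x) = theta * x^2.

xs = np.linspace(-2.0, 2.0, 41)   # collocation states
theta = 0.2                       # initial value-function parameter
lr = 0.05                         # gradient-descent step size

for _ in range(200):
    Vx = 2.0 * theta * xs         # V'(x) for V_theta(x) = theta * x^2
    r = xs**2 - Vx**2 / 4.0       # pointwise HJB residual
    # Gradient of sum of squared residuals: dr/dtheta = -2 * theta * x^2
    grad = np.sum(2.0 * r * (-2.0 * theta * xs**2))
    theta -= lr * grad / len(xs)

print(round(theta, 3))            # approaches 1.0, i.e., V(x) = x^2
```

The paper's two stabilization devices map onto this sketch naturally: scaling the horizon corresponds to annealing the effective cost magnitude, and assuming stochastic dynamics would add a term proportional to V''(x) to the residual, smoothing the least-squares landscape before being annealed away.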
Pages: 1031-1041
Page count: 11