Approximate Dynamic Programming via a Smoothed Linear Program

Cited by: 34

Authors:

Desai, Vijay V. [1]

Farias, Vivek F. [2]

Moallemi, Ciamac C. [3]

Affiliations:

[1] Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027 USA

[2] MIT, Sloan Sch Management, Cambridge, MA 02139 USA

[3] Columbia Univ, Grad Sch Business, New York, NY 10027 USA

Source:

OPERATIONS RESEARCH | 2012, Vol. 60, No. 3

Keywords:

CONVERGENCE; POLICIES;

DOI:

10.1287/opre.1120.1044

Chinese Library Classification (CLC) number:

C93 [Management];

Discipline classification code:

12 ; 1201 ; 1202 ; 120202 ;

Abstract:

We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural "projection" of a well-studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program, the "smoothed approximate linear program," is distinct from such approaches and relaxes the restriction to lower bounding approximations in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: First, we demonstrate bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. These bounds are, in general, no worse than those available for extant LP approaches and for specific problem instances can be shown to be arbitrarily stronger. Second, experiments with our approach on a pair of challenging problems (the game of Tetris and a queueing network control problem) show that the approach outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by a substantial margin.
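
For orientation, the programs the abstract contrasts can be sketched as follows. The notation is an assumption for illustration and does not appear in this record: a discounted-cost problem with discount factor \alpha, stage cost g, transition kernel P_a, state-relevance weights \nu, basis matrix \Phi with weight vector r, slack vector s, a distribution \pi over states, and a violation budget \theta \ge 0. The first program is the usual "projected" approximate LP, whose feasible approximations are lower bounds on the optimal cost-to-go. The second shows one way to relax that restriction with a budgeted slack, in the spirit of what the abstract calls a smoothed approximate linear program.

```latex
% Bellman operator (assumed notation)
(TJ)(x) \;=\; \min_{a} \Big[\, g(x,a) + \alpha \sum_{x'} P_a(x,x')\, J(x') \Big]

% Approximate LP: restricts J = \Phi r to lower bounds on the optimal cost-to-go
\begin{aligned}
\max_{r} \quad & \nu^{\top} \Phi r \\
\text{s.t.} \quad & \Phi r \;\le\; T \Phi r
\end{aligned}

% Smoothed variant: Bellman constraints may be violated,
% but the \pi-weighted total violation is capped by \theta
\begin{aligned}
\max_{r,\,s} \quad & \nu^{\top} \Phi r \\
\text{s.t.} \quad & \Phi r \;\le\; T \Phi r + s, \\
                  & \pi^{\top} s \;\le\; \theta, \qquad s \;\ge\; 0
\end{aligned}
```

Both are linear programs, because each constraint \Phi r \le T \Phi r expands into one linear inequality per state-action pair. Setting \theta = 0 forces s = 0 on the support of \pi, so the second program recovers the first when \pi has full support.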


Pages: 655-674

Number of pages: 20

50 items in total

[21] Safe Approximate Dynamic Programming via Kernelized Lipschitz Estimation
Chakrabarty, Ankush
Jha, Devesh K.
Buzzard, Gregery T.
Wang, Yebin
Vamvoudakis, Kyriakos G.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (01) : 405 - 419
[22] Mitigation of Coincident Peak Charges via Approximate Dynamic Programming
Dowling, Chase P.
Zhang, Baosen
2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4202 - 4207
[23] Adaptive Optimal Observer Design via Approximate Dynamic Programming
Na, Jing
Herrmann, Guido
Vamvoudakis, Kyriakos G.
2017 AMERICAN CONTROL CONFERENCE (ACC), 2017, : 3288 - 3293
[24] Program risk definition via linear programming techniques
Pighin, M
Podgorelec, V
Kokol, P
EIGHTH IEEE SYMPOSIUM ON SOFTWARE METRICS, PROCEEDINGS, 2002, : 197 - 202
[25] Model-free approximate dynamic programming schemes for linear systems
Al-Tamimi, Asma
Vrabie, Draguna
Abu-Khalaf, Murad
Lewis, Frank L.
2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 371 - +
[26] On constraint sampling in the linear programming approach to approximate linear programming
de Farias, DP
Van Roy, B
42ND IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-6, PROCEEDINGS, 2003, : 2441 - 2446
[27] The discrete Radon transform and its approximate inversion via linear programming
Fishburn, P
Schwander, P
Shepp, L
Vanderbei, RJ
DISCRETE APPLIED MATHEMATICS, 1997, 75 (01) : 39 - 61
[28] Smoothed analysis of termination of linear programming algorithms
Daniel A. Spielman
Shang-Hua Teng
Mathematical Programming, 2003, 97 : 375 - 404
[29] Smoothed analysis of termination of linear programming algorithms
Spielman, DA
Teng, SH
MATHEMATICAL PROGRAMMING, 2003, 97 (1-2) : 375 - 404
[30] Smoothed analysis of the perceptron algorithm for linear programming
Blum, A
Dunagan, J
PROCEEDINGS OF THE THIRTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2002, : 905 - 914
