Approximate Dynamic Programming via a Smoothed Linear Program

Cited by: 34
Authors
Desai, Vijay V. [1]
Farias, Vivek F. [2]
Moallemi, Ciamac C. [3]
Affiliations
[1] Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027 USA
[2] MIT, Sloan Sch Management, Cambridge, MA 02139 USA
[3] Columbia Univ, Grad Sch Business, New York, NY 10027 USA
Keywords
CONVERGENCE; POLICIES;
DOI
10.1287/opre.1120.1044
Chinese Library Classification (CLC) number
C93 [Management Science];
Discipline classification code
12 ; 1201 ; 1202 ; 120202 ;
Abstract
We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural "projection" of a well-studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program, the "smoothed approximate linear program," is distinct from such approaches and relaxes the restriction to lower bounding approximations in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: First, we demonstrate bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. These bounds are, in general, no worse than those available for extant LP approaches and for specific problem instances can be shown to be arbitrarily stronger. Second, experiments with our approach on a pair of challenging problems (the game of Tetris and a queueing network control problem) show that the approach outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by a substantial margin.
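The contrast drawn in the abstract can be sketched in symbols. The notation below is an assumption, since the abstract itself fixes none: $\Phi$ is a matrix of basis functions, $r$ the weight vector being optimized, $\nu$ a state-relevance distribution, $T$ the Bellman operator of a discounted cost-minimization problem, $s$ a vector of per-state slack variables, $\pi$ a distribution used to weight constraint violations, and $\theta \ge 0$ a violation budget. This is a rough statement of the two programs in that assumed notation, not a reproduction of the paper's exact formulation.

```latex
% Notation is assumed for illustration; see the lead-in above.
% (TJ)(x) = \min_a \big[ g(x,a) + \alpha \sum_{x'} p(x' \mid x, a)\, J(x') \big]
\begin{aligned}
\text{(ALP)}\quad  & \max_{r}\ \nu^{\top}\Phi r
  && \text{s.t. } \Phi r \le T\Phi r, \\
\text{(SALP)}\quad & \max_{r,\,s}\ \nu^{\top}\Phi r
  && \text{s.t. } \Phi r \le T\Phi r + s,\quad \pi^{\top} s \le \theta,\quad s \ge 0.
\end{aligned}
```

In the first program, the constraint $\Phi r \le T\Phi r$ forces $\Phi r$ to be a lower bound on the optimal cost-to-go function. In the second, the slack $s$ allows the Bellman inequality to be violated at some states, with the total weighted violation capped by $\theta$; with $\theta = 0$ and $\pi$ strictly positive, the second program reduces to the first. Both remain linear programs, because each componentwise inequality involving $T$ expands into one linear constraint per state-action pair.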
Pages: 655-674
Number of pages: 20
Related papers
50 in total
  • [21] Safe Approximate Dynamic Programming via Kernelized Lipschitz Estimation
    Chakrabarty, Ankush
    Jha, Devesh K.
    Buzzard, Gregery T.
    Wang, Yebin
    Vamvoudakis, Kyriakos G.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (01) : 405 - 419
  • [22] Mitigation of Coincident Peak Charges via Approximate Dynamic Programming
    Dowling, Chase P.
    Zhang, Baosen
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4202 - 4207
  • [23] Adaptive Optimal Observer Design via Approximate Dynamic Programming
    Na, Jing
    Herrmann, Guido
    Vamvoudakis, Kyriakos G.
    2017 AMERICAN CONTROL CONFERENCE (ACC), 2017, : 3288 - 3293
  • [24] Program risk definition via linear programming techniques
    Pighin, M
    Podgorelec, V
    Kokol, P
    EIGHTH IEEE SYMPOSIUM ON SOFTWARE METRICS, PROCEEDINGS, 2002, : 197 - 202
  • [25] Model-free approximate dynamic programming schemes for linear systems
    Al-Tamimi, Asma
    Vrabie, Draguna
    Abu-Khalaf, Murad
    Lewis, Frank L.
    2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 371 - +
  • [26] On constraint sampling in the linear programming approach to approximate linear programming
    de Farias, DP
    Van Roy, B
    42ND IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-6, PROCEEDINGS, 2003, : 2441 - 2446
  • [27] The discrete Radon transform and its approximate inversion via linear programming
    Fishburn, P
    Schwander, P
    Shepp, L
    Vanderbei, RJ
    DISCRETE APPLIED MATHEMATICS, 1997, 75 (01) : 39 - 61
  • [28] Smoothed analysis of termination of linear programming algorithms
    Daniel A. Spielman
    Shang-Hua Teng
    Mathematical Programming, 2003, 97 : 375 - 404
  • [29] Smoothed analysis of termination of linear programming algorithms
    Spielman, DA
    Teng, SH
    MATHEMATICAL PROGRAMMING, 2003, 97 (1-2) : 375 - 404
  • [30] Smoothed analysis of the perceptron algorithm for linear programming
    Blum, A
    Dunagan, J
    PROCEEDINGS OF THE THIRTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2002, : 905 - 914