Approximate dynamic programming via direct search in the space of value function approximations

Cited by: 10
Authors
Arruda, E. F. [1]
Fragoso, M. D. [2]
do Val, J. B. R. [3]
Affiliations
[1] FENG PUCRS, BR-90619900 Porto Alegre, RS, Brazil
[2] CSC LNCC, BR-25651075 Petropolis, RJ, Brazil
[3] DT FEEC UNICAMP, BR-13083852 Campinas, SP, Brazil
Keywords
Dynamic programming; Markov decision processes; Convex optimization; Direct search methods; Convergence
DOI
10.1016/j.ejor.2010.11.019
Chinese Library Classification (CLC) number
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks to minimize the span semi-norm of the Bellman residual in a convex value function approximation space. The novelty here is that the optimality of a point in the approximation architecture is characterized by means of convex optimization concepts, and necessary and sufficient conditions for local optimality are derived. The procedure combines the classical AVI direction (the Bellman residual) with a set of independent search directions to improve the convergence rate. It has guaranteed convergence and satisfies at least the necessary optimality conditions over a prescribed set of directions. To illustrate the method, examples are presented that deal with a class of problems from the literature and with a queueing problem with a large state space. © 2010 Elsevier B.V. All rights reserved.
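To make the objective in the abstract concrete, here is a minimal Python sketch of minimizing the span semi-norm of the Bellman residual for a fixed policy over a linear approximation space. It is not the authors' procedure: a plain compass (coordinate) search stands in for their combination of the Bellman residual direction with independent search directions. The three-state chain (`P`, `C`, `ALPHA`), the feature matrix `PHI`, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical 3-state Markov chain under a fixed policy.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]          # transition probabilities, each row sums to 1
C = [0.0, 1.0, 4.0]            # one-step costs
ALPHA = 0.9                    # discount factor
PHI = [[0.0, 0.0],
       [1.0, 1.0],
       [2.0, 4.0]]             # assumed features per state: s and s**2


def value(r):
    """Approximate value function V = PHI r."""
    return [sum(f * w for f, w in zip(row, r)) for row in PHI]


def bellman_residual(v):
    """Residual T v - v, where T v = C + ALPHA * P v for the fixed policy."""
    n = len(v)
    return [C[i] + ALPHA * sum(P[i][j] * v[j] for j in range(n)) - v[i]
            for i in range(n)]


def span(x):
    """Span semi-norm: max(x) - min(x)."""
    return max(x) - min(x)


def objective(r):
    """Span of the Bellman residual at V = PHI r. This is convex in r,
    since the residual is affine in r and the span is a semi-norm."""
    return span(bellman_residual(value(r)))


def compass_search(r0, step=1.0, tol=1e-8, max_iter=10_000):
    """Derivative-free search along +/- coordinate directions.
    The step is halved whenever no direction improves the objective."""
    r = list(r0)
    best = objective(r)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(r)):
            for sign in (1.0, -1.0):
                cand = list(r)
                cand[i] += sign * step
                val = objective(cand)
                if val < best:
                    r, best, improved = cand, val, True
        if not improved:
            step /= 2.0
    return r, best


if __name__ == "__main__":
    r_best, f_best = compass_search([0.0, 0.0])
    print("weights:", r_best, "span of residual:", f_best)
```

Because the objective is convex but not smooth, a coordinate search of this kind can stall at a point that is optimal only along the directions it tries, which matches the abstract's statement that optimality is guaranteed over a prescribed set of directions.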
Pages: 343-351
Number of pages: 9
Related papers
50 in total
  • [21] Approximate Dynamic Programming using Fluid and Diffusion Approximations with Applications to Power Management
    Chen, Wei
    Huang, Dayu
    Kulkarni, Ankur A.
    Unnikrishnan, Jayakrishnan
    Zhu, Quanyan
    Mehta, Prashant
    Meyn, Sean
    Wierman, Adam
    PROCEEDINGS OF THE 48TH IEEE CONFERENCE ON DECISION AND CONTROL, 2009 HELD JOINTLY WITH THE 2009 28TH CHINESE CONTROL CONFERENCE (CDC/CCC 2009), 2009, : 3575 - 3580
  • [22] Deploying Strategy of Tethered Space Robot with Approximate Dynamic Programming
    Ma, Zhiqiang
    Tiu, Zhengxiong
    Ge, Chengxu
    2020 IEEE INTERNATIONAL CONFERENCE ON REAL-TIME COMPUTING AND ROBOTICS (IEEE-RCAR 2020), 2020, : 222 - 226
  • [23] Accelerating value function approximations for dynamic dial-a-ride problems via dimensionality reductions
    Heitmann, R. -Julius O.
    Soeffker, Ninja
    Klawonn, Frank
    Ulmer, Marlin W.
    Mattfeld, Dirk C.
    COMPUTERS & OPERATIONS RESEARCH, 2024, 167
  • [24] Post-Decision States and Separable Approximations Are Powerful Tools of Approximate Dynamic Programming
    Ruszczynski, Andrzej
    INFORMS JOURNAL ON COMPUTING, 2010, 22 (01) : 20 - 22
  • [25] Single Agent Indirect Herding via Approximate Dynamic Programming
    Deptula, Patryk
    Bell, Zachary I.
    Zegers, Federico M.
    Licitra, Ryan A.
    Dixon, Warren E.
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 7136 - 7141
  • [26] Safe Approximate Dynamic Programming via Kernelized Lipschitz Estimation
    Chakrabarty, Ankush
    Jha, Devesh K.
    Buzzard, Gregery T.
    Wang, Yebin
    Vamvoudakis, Kyriakos G.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (01) : 405 - 419
  • [27] Mitigation of Coincident Peak Charges via Approximate Dynamic Programming
    Dowling, Chase P.
    Zhang, Baosen
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4202 - 4207
  • [28] Adaptive Optimal Observer Design via Approximate Dynamic Programming
    Na, Jing
    Herrmann, Guido
    Vamvoudakis, Kyriakos G.
    2017 AMERICAN CONTROL CONFERENCE (ACC), 2017, : 3288 - 3293
  • [29] VALUE FUNCTION FOR REGIONAL CONTROL PROBLEMS VIA DYNAMIC PROGRAMMING AND PONTRYAGIN MAXIMUM PRINCIPLE
    Barles, Guy
    Briani, Ariela
    Trelat, Emmanuel
    MATHEMATICAL CONTROL AND RELATED FIELDS, 2018, 8 (3-4) : 509 - 533
  • [30] Truncated Approximate Dynamic Programming with Task-Dependent Terminal Value
    Farahmand, Amir-massoud
    Nikovski, Daniel N.
    Igarashi, Yuji
    Konaka, Hiroki
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 3123 - 3129