Approximate dynamic programming via direct search in the space of value function approximations

Cited by: 10

Authors:

Arruda, E. F. [1]

Fragoso, M. D. [2]

do Val, J. B. R. [3]

Affiliations:

[1] FENG PUCRS, BR-90619900 Porto Alegre, RS, Brazil

[2] CSC LNCC, BR-25651075 Petropolis, RJ, Brazil

[3] DT FEEC UNICAMP, BR-13083852 Campinas, SP, Brazil

Source:

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH | 2011 / Vol. 211 / Issue 02

Keywords:

Dynamic programming; Markov decision processes; Convex optimization; Direct search methods; CONVERGENCE;

DOI:

10.1016/j.ejor.2010.11.019

Chinese Library Classification (CLC) number:

C93 [Management Science];

Discipline classification code:

12 ; 1201 ; 1202 ; 120202 ;

Abstract:

This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks to minimize the span semi-norm of the Bellman residual in a convex value function approximation space. The novelty here is that the optimality of a point in the approximation architecture is characterized by means of convex optimization concepts, and necessary and sufficient conditions for local optimality are derived. The procedure employs the classical AVI algorithm direction (Bellman residual) combined with a set of independent search directions to improve the convergence rate. It has guaranteed convergence and satisfies, at least, the necessary optimality conditions over a prescribed set of directions. To illustrate the method, examples are presented that deal with a class of problems from the literature and a large state space queueing problem setting. (C) 2010 Elsevier B.V. All rights reserved.
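
The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a simple pattern search written under several assumptions of our own: a linear architecture `v = Phi @ w`, a fixed policy given by a transition matrix `P` and reward vector `r`, a least-squares projection of the Bellman residual onto the weights as the "AVI direction", and plus/minus coordinate directions as the independent search set. All function and variable names are hypothetical.

```python
import numpy as np

def bellman_residual(v, P, r, gamma):
    """Residual T_pi v - v for a fixed policy (transition matrix P, reward r)."""
    return r + gamma * (P @ v) - v

def span(x):
    """Span semi-norm: max(x) - min(x); zero for constant vectors."""
    return float(np.max(x) - np.min(x))

def direct_search(Phi, P, r, gamma, w0, step=1.0, tol=1e-8, max_iter=10_000):
    """Pattern search on the weights w to reduce span(T_pi v - v), v = Phi @ w.

    Assumption: the residual direction is mapped to weight space by
    least squares; the paper may define this direction differently.
    """
    w = np.asarray(w0, dtype=float).copy()
    objective = lambda u: span(bellman_residual(Phi @ u, P, r, gamma))
    best = objective(w)
    n = w.size
    coords = list(np.eye(n)) + list(-np.eye(n))
    it = 0
    while step > tol and it < max_iter:
        res = bellman_residual(Phi @ w, P, r, gamma)
        # Residual direction in weight space (tried first), then coordinates.
        d_res = np.linalg.lstsq(Phi, res, rcond=None)[0]
        improved = False
        for d in [d_res] + coords:
            cand = w + step * d
            val = objective(cand)
            if val < best - 1e-15:  # accept only strict decreases
                w, best, improved = cand, val, True
                break
        if not improved:
            step *= 0.5  # no direction helped: shrink the step
        it += 1
    return w, best

# Toy 3-state chain with a tabular architecture (Phi = identity).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9
Phi = np.eye(3)
w0 = np.zeros(3)
w, best = direct_search(Phi, P, r, gamma, w0)
```

In the tabular case with unit step, the residual direction reproduces a value iteration update, so the span of the residual shrinks at every accepted move. Because only strict decreases are accepted, the objective is non-increasing by construction; this sketch does not reproduce the optimality conditions or the convergence guarantee stated in the abstract.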

Pages: 343-351

Number of pages: 9

50 records in total

[21] Approximate Dynamic Programming using Fluid and Diffusion Approximations with Applications to Power Management
Chen, Wei
Huang, Dayu
Kulkarni, Ankur A.
Unnikrishnan, Jayakrishnan
Zhu, Quanyan
Mehta, Prashant
Meyn, Sean
Wierman, Adam
PROCEEDINGS OF THE 48TH IEEE CONFERENCE ON DECISION AND CONTROL, 2009 HELD JOINTLY WITH THE 2009 28TH CHINESE CONTROL CONFERENCE (CDC/CCC 2009), 2009, : 3575 - 3580
[22] Deploying Strategy of Tethered Space Robot with Approximate Dynamic Programming
Ma, Zhiqiang
Tiu, Zhengxiong
Ge, Chengxu
2020 IEEE INTERNATIONAL CONFERENCE ON REAL-TIME COMPUTING AND ROBOTICS (IEEE-RCAR 2020), 2020, : 222 - 226
[23] Accelerating value function approximations for dynamic dial-a-ride problems via dimensionality reductions
Heitmann, R. -Julius O.
Soeffker, Ninja
Klawonn, Frank
Ulmer, Marlin W.
Mattfeld, Dirk C.
COMPUTERS & OPERATIONS RESEARCH, 2024, 167
[24] Post-Decision States and Separable Approximations Are Powerful Tools of Approximate Dynamic Programming
Ruszczynski, Andrzej
INFORMS JOURNAL ON COMPUTING, 2010, 22 (01) : 20 - 22
[25] Single Agent Indirect Herding via Approximate Dynamic Programming
Deptula, Patryk
Bell, Zachary I.
Zegers, Federico M.
Licitra, Ryan A.
Dixon, Warren E.
2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 7136 - 7141
[26] Safe Approximate Dynamic Programming via Kernelized Lipschitz Estimation
Chakrabarty, Ankush
Jha, Devesh K.
Buzzard, Gregery T.
Wang, Yebin
Vamvoudakis, Kyriakos G.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (01) : 405 - 419
[27] Mitigation of Coincident Peak Charges via Approximate Dynamic Programming
Dowling, Chase P.
Zhang, Baosen
2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4202 - 4207
[28] Adaptive Optimal Observer Design via Approximate Dynamic Programming
Na, Jing
Herrmann, Guido
Vamvoudakis, Kyriakos G.
2017 AMERICAN CONTROL CONFERENCE (ACC), 2017, : 3288 - 3293
[29] VALUE FUNCTION FOR REGIONAL CONTROL PROBLEMS VIA DYNAMIC PROGRAMMING AND PONTRYAGIN MAXIMUM PRINCIPLE
Barles, Guy
Briani, Ariela
Trelat, Emmanuel
MATHEMATICAL CONTROL AND RELATED FIELDS, 2018, 8 (3-4) : 509 - 533
[30] Truncated Approximate Dynamic Programming with Task-Dependent Terminal Value
Farahmand, Amir-massoud
Nikovski, Daniel N.
Igarashi, Yuji
Konaka, Hiroki
THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 3123 - 3129
