A Least-Squares Temporal Difference based method for solving resource allocation problems

Cited by: 4
Authors
Forootani, Ali [1]
Tipaldi, Massimo [1]
Zarch, Majid Ghaniee [2]
Liuzza, Davide [3]
Glielmo, Luigi [1]
Affiliations
[1] Univ Sannio, Dept Engn, Piazza Roma, I-82100 Benevento, Italy
[2] Bu Ali Sina Univ, Dept Elect Engn, Hamadan, Hamadan, Iran
[3] ENEA, Fus & Nucl Safety Dept, Rome, Italy
Keywords
Least-squares temporal difference; Approximate dynamic programming; Markov decision process; Birth-death process; Monte Carlo simulations; DYNAMIC-PROGRAMMING APPROACH; POLICY EVALUATION;
DOI
10.1016/j.ifacsc.2020.100106
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Value function approximation plays a central role in Approximate Dynamic Programming (ADP) in overcoming the so-called curse of dimensionality associated with real stochastic processes. In this regard, we propose a novel Least-Squares Temporal Difference (LSTD) based method: the "Multi-trajectory Greedy LSTD" (MG-LSTD). It is an exploration-enhanced recursive LSTD algorithm with the policy improvement embedded within the LSTD iterations. It makes use of multi-trajectory Monte Carlo simulations in order to enhance the exploration of the system state space. This method is applied to resource allocation problems modeled via a constrained Stochastic Dynamic Programming (SDP) based framework. In particular, such problems are formulated as a set of parallel Birth-Death Processes (BDPs). Some operational scenarios are defined and solved to show the effectiveness of the proposed approach. Finally, we provide some experimental evidence on the convergence properties of the MG-LSTD algorithm as a function of its key parameters. (C) 2020 Elsevier Ltd. All rights reserved.
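
To make the LSTD idea in the abstract concrete, the following is a minimal sketch of plain batch LSTD policy evaluation on a single birth-death chain, using several simulated trajectories with different start states. It is not the MG-LSTD algorithm of the paper: there is no recursive update, no embedded policy improvement, and no resource constraint. The chain size, birth and death probabilities, per-step cost, discount factor, and the function names `simulate_bdp` and `lstd` are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_bdp(n_states, p_birth, p_death, start, steps, rng):
    """Simulate one trajectory of a birth-death chain on {0, ..., n_states - 1}."""
    s = start
    traj = [s]
    for _ in range(steps):
        u = rng.random()
        if u < p_birth and s < n_states - 1:
            s += 1          # birth: move one state up
        elif u >= 1.0 - p_death and s > 0:
            s -= 1          # death: move one state down
        traj.append(s)      # otherwise the chain stays where it is
    return traj

def lstd(trajectories, features, cost, gamma, ridge=1e-6):
    """Batch LSTD(0): solve A theta = b accumulated over all sampled transitions."""
    k = features.shape[1]
    A = ridge * np.eye(k)   # small ridge term keeps A invertible
    b = np.zeros(k)
    for traj in trajectories:
        for s, s_next in zip(traj[:-1], traj[1:]):
            phi, phi_next = features[s], features[s_next]
            A += np.outer(phi, phi - gamma * phi_next)
            b += phi * cost[s]
    return np.linalg.solve(A, b)

# Illustrative (assumed) problem data.
n_states, p_birth, p_death, gamma = 5, 0.3, 0.4, 0.9
cost = np.arange(n_states, dtype=float)   # assumed cost: the state index
features = np.eye(n_states)               # tabular features, one per state

# Several trajectories with varied start states to spread the exploration.
rng = np.random.default_rng(0)
trajectories = [
    simulate_bdp(n_states, p_birth, p_death, start=i % n_states, steps=2000, rng=rng)
    for i in range(20)
]
theta = lstd(trajectories, features, cost, gamma)
v_lstd = features @ theta

# Exact value function of the same chain, for comparison.
P = np.zeros((n_states, n_states))
for s in range(n_states):
    if s < n_states - 1:
        P[s, s + 1] = p_birth
    if s > 0:
        P[s, s - 1] = p_death
    P[s, s] = 1.0 - P[s].sum()
v_exact = np.linalg.solve(np.eye(n_states) - gamma * P, cost)

print("LSTD estimate:", np.round(v_lstd, 2))
print("Exact values: ", np.round(v_exact, 2))
```

With tabular features the LSTD solution coincides with the value function of the empirical transition model, so the two printed vectors should be close when the trajectories visit every state often enough. Replacing `features` with a lower-dimensional basis gives the approximate setting the abstract refers to.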
Pages: 15