A study of value iteration and policy iteration for Markov decision processes in deterministic systems

Times Cited: 0
|
Authors
Zheng, Haifeng [1 ]
Wang, Dan [1 ]
Affiliations
[1] Jinan Univ, Sch Econ, Guangzhou 510632, Guangdong, Peoples R China
Source
AIMS MATHEMATICS | 2024, Vol. 9, Issue 12
Keywords
Markov decision processes; deterministic system; value iteration; policy iteration; average cost criterion
DOI
10.3934/math.20241613
Chinese Library Classification
O29 [Applied Mathematics]
Discipline Code
070104
Abstract
In the context of deterministic discrete-time control systems, we examine the implementation of the value iteration (VI) and policy iteration (PI) algorithms for Markov decision processes (MDPs) on Borel spaces. The deterministic nature of the system's transition function plays a pivotal role, since the convergence criteria of these algorithms are closely tied to the structure of the law governing state transitions. For VI, convergence hinges on verifying that the difference between successive cost functions stabilizes to a constant k, uniformly across iterations. In contrast, PI converges when the value function remains unchanged over successive iterations. Finally, a detailed example demonstrates the conditions under which convergence of each algorithm is achieved, underscoring the practicality of these methods in deterministic settings.
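The record contains no algorithmic details beyond the abstract, so the following is a minimal illustrative sketch, not the authors' implementation: average-cost value iteration on a small, invented finite deterministic MDP. The cost matrix and transition map are hypothetical, and the stopping rule mirrors the criterion described for VI in the abstract (the difference v_{n+1} - v_n stabilizing to a constant k, which then estimates the optimal average cost); the analogous PI check would be that the value function repeats across successive iterations.

```python
import numpy as np

# Hypothetical finite deterministic MDP with states {0, 1, 2} and actions {0, 1}.
# cost[x, a] is the one-stage cost; nxt[x, a] is the deterministic successor f(x, a).
cost = np.array([[2.0, 5.0],
                 [1.0, 3.0],
                 [4.0, 0.5]])
nxt = np.array([[1, 2],
                [2, 0],
                [0, 2]])   # action 1 in state 2 is a cheap self-loop (the optimal cycle)

def value_iteration(cost, nxt, tol=1e-8, max_iter=10_000):
    """Average-cost VI: iterate v_{n+1}(x) = min_a [ c(x, a) + v_n(f(x, a)) ].
    Convergence is declared when v_{n+1} - v_n is (numerically) the same constant k
    for every state; k then approximates the optimal average cost."""
    v = np.zeros(cost.shape[0])
    for _ in range(max_iter):
        q = cost + v[nxt]                    # q[x, a] = c(x, a) + v(f(x, a))
        v_new = q.min(axis=1)
        diff = v_new - v
        if diff.max() - diff.min() < tol:    # difference has stabilized to a constant
            k = diff.mean()                  # approximate optimal average cost
            policy = q.argmin(axis=1)        # greedy (stationary) policy
            return k, policy, v_new
        v = v_new
    raise RuntimeError("VI difference did not stabilize within max_iter")

k, policy, v = value_iteration(cost, nxt)
print("average cost k ~", k, "greedy policy:", policy)
```

On this toy instance the difference of successive iterates becomes the constant 0.5 after a few steps, which is the per-stage cost of the self-loop cycle that the greedy policy eventually enters; this is the kind of behavior the abstract's VI criterion is meant to detect.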
Pages: 33818-33842
Page count: 25
Related Papers
50 records in total
  • [21] The Policy Iteration Algorithm for Average Continuous Control of Piecewise Deterministic Markov Processes
    Costa, O. L. V.
    Dufour, F.
    PROCEEDINGS OF THE 48TH IEEE CONFERENCE ON DECISION AND CONTROL, 2009 HELD JOINTLY WITH THE 2009 28TH CHINESE CONTROL CONFERENCE (CDC/CCC 2009), 2009, : 506 - 511
  • [22] The Policy Iteration Algorithm for Average Continuous Control of Piecewise Deterministic Markov Processes
    O. L. V. Costa
    F. Dufour
    Applied Mathematics & Optimization, 2010, 62 : 185 - 204
  • [23] Policy iteration type algorithms for recurrent state Markov decision processes
    Patek, SD
    COMPUTERS & OPERATIONS RESEARCH, 2004, 31 (14) : 2333 - 2347
  • [24] ON CONVERGENCE OF VALUE ITERATION FOR A CLASS OF TOTAL COST MARKOV DECISION PROCESSES
    Yu, Huizhen
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2015, 53 (04) : 1982 - 2016
  • [25] Advantage Based Value Iteration for Markov Decision Processes with Unknown Rewards
    Alizadeh, Pegah
    Chevaleyre, Yann
    Levy, Francois
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 3837 - 3844
  • [26] Uniform convergence of value iteration policies for discounted Markov decision processes
    Cruz-Suarez, Daniel
    Montes-De-Oca, Raul
    BOLETIN DE LA SOCIEDAD MATEMATICA MEXICANA, 2006, 12 (01): : 133 - 148
  • [27] Value Iteration for Average Cost Markov Decision Processes in Borel Spaces
    Zhu, Quanxin
    Guo, Xianping
    APPLIED MATHEMATICS RESEARCH EXPRESS, 2005, (02) : 61 - 76
  • [28] Approximate Value Iteration for Risk-Aware Markov Decision Processes
    Yu, Pengqian
    Haskell, William B.
    Xu, Huan
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (09) : 3135 - 3142
  • [29] THE CONVERGENCE OF VALUE-ITERATION IN DISCOUNTED MARKOV DECISION-PROCESSES
    WHITE, DJ
    SCHERER, WT
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 1994, 182 (02) : 348 - 360
  • [30] Generalized Second-Order Value Iteration in Markov Decision Processes
    Kamanchi, Chandramouli
    Diddigi, Raghuram Bharadwaj
    Bhatnagar, Shalabh
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (08) : 4241 - 4247