ASYNCHRONOUS VALUE ITERATION FOR MARKOV DECISION PROCESSES WITH CONTINUOUS STATE SPACES

Cited by: 2
Authors
Yang, Xiangyu [1]
Hu, Jian-Qiang [1]
Hu, Jiaqiao [2]
Peng, Yijie [3]
Affiliations
[1] Fudan Univ, Sch Management, Shanghai, Peoples R China
[2] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[3] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
SIMULATION;
DOI
10.1109/WSC48552.2020.9384120
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
We propose a simulation-based value iteration algorithm for approximately solving infinite horizon discounted MDPs with continuous state spaces and finite actions. At each time step, the algorithm employs the shrinking ball method to estimate the value function at sampled states and uses historical estimates in an interpolation-based fitting strategy to build an approximator of the optimal value function. Under moderate conditions, we prove that the sequence of approximators generated by the algorithm converges uniformly to the optimal value function with probability one. Simple numerical examples are provided to compare our algorithm with two other existing methods.
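The abstract names three ingredients: asynchronous updates at sampled states, shrinking-ball estimation, and an interpolation-based approximator built from historical estimates. The following is a minimal Python sketch of how such pieces might fit together on a toy problem. It is not the authors' algorithm: the one-dimensional dynamics in `step`, the reward, the radius schedule `k ** -0.25`, and the use of piecewise-linear interpolation over all past estimates are assumptions made purely for illustration.

```python
import numpy as np

GAMMA = 0.9            # discount factor (assumed)
ACTIONS = (-0.1, 0.1)  # finite action set (assumed)

def step(x, a, rng):
    """Hypothetical toy dynamics on [0, 1]; reward peaks at x = 0.5."""
    nxt = float(np.clip(x + a + 0.05 * rng.standard_normal(), 0.0, 1.0))
    reward = 1.0 - abs(x - 0.5)
    return nxt, reward

def async_value_iteration(n_iters=300, seed=0):
    rng = np.random.default_rng(seed)
    # Per-action history of simulated transitions: states, rewards, next states.
    hist = {a: ([], [], []) for a in ACTIONS}
    nodes, values = [], []  # historical value estimates used for interpolation

    def v_hat(x):
        """Piecewise-linear interpolation through all past estimates."""
        if not nodes:
            return np.zeros_like(np.asarray(x, dtype=float))
        order = np.argsort(nodes)
        return np.interp(x, np.asarray(nodes)[order], np.asarray(values)[order])

    for k in range(1, n_iters + 1):
        x = rng.uniform(0.0, 1.0)  # only this sampled state is updated
        radius = k ** -0.25        # ball radius shrinks with k (assumed schedule)
        q = []
        for a in ACTIONS:
            xs, rs, nxts = hist[a]
            nxt, r = step(x, a, rng)
            xs.append(x); rs.append(r); nxts.append(nxt)
            # Shrinking-ball estimate: average one-step targets over all past
            # transitions whose start state lies within `radius` of x.
            near = np.abs(np.asarray(xs) - x) <= radius
            targets = np.asarray(rs)[near] + GAMMA * v_hat(np.asarray(nxts)[near])
            q.append(float(targets.mean()))
        nodes.append(x)
        values.append(max(q))  # Bellman optimality backup at the sampled state
    return v_hat

v = async_value_iteration()
print(float(v(0.5)))
```

In this sketch older estimates are never refreshed, so the interpolant mixes values computed at different iterations; a more careful fitting strategy would handle stale estimates explicitly.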
Pages: 2856-2866
Page count: 11