Multigrid methods for policy evaluation and reinforcement learning

Cited by: 0
Authors
Ziv, O [1]
Shimkin, N [1]
Affiliations
[1] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel
Keywords
FUNCTION APPROXIMATION
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We introduce a new class of multigrid temporal-difference learning algorithms for speeding up the estimation of the value function related to a stationary policy, within the context of discounted cost Markov Decision Processes with linear functional approximation. The proposed scheme builds on the multigrid framework which is used in numerical analysis to enhance the iterative solution of linear equations. We first apply the multigrid approach to policy evaluation in the known model case. We then extend this approach to the learning case, and propose a scheme in which the basic TD(lambda) learning algorithm is applied at various resolution scales. The efficacy of the proposed algorithms is demonstrated through simulation experiments.
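As a rough illustration of the known-model case mentioned in the abstract, the sketch below applies a two-grid correction to the policy evaluation equation v = r + gamma P v on a small random-walk chain. It is not the authors' algorithm: the chain, the pairwise state aggregation used as the coarse grid, the damping factor `omega`, and all function names are assumptions chosen only to show how a coarse-level solve can speed up a slowly converging iteration.

```python
import numpy as np

def random_walk(n):
    """Transition matrix of a reflecting random walk on n states (assumed toy model)."""
    P = np.zeros((n, n))
    for i in range(n):
        P[i, max(i - 1, 0)] += 0.5
        P[i, min(i + 1, n - 1)] += 0.5
    return P

def aggregation(n):
    """Piecewise-constant interpolation: fine states 2k and 2k+1 share coarse state k (n even)."""
    Phi = np.zeros((n, n // 2))
    Phi[np.arange(n), np.arange(n) // 2] = 1.0
    return Phi

def smooth(v, P, r, gamma, omega=2.0 / 3.0):
    """One damped value-iteration sweep: move v a fraction omega toward r + gamma * P v."""
    return v + omega * (r + gamma * P @ v - v)

def two_grid_cycle(v, P, r, gamma, Phi):
    """Pre-smooth, solve the Galerkin coarse problem for the residual, correct, post-smooth."""
    A = np.eye(len(r)) - gamma * P
    v = smooth(v, P, r, gamma)
    residual = r - A @ v
    coarse = np.linalg.solve(Phi.T @ A @ Phi, Phi.T @ residual)
    v = v + Phi @ coarse
    return smooth(v, P, r, gamma)

def compare(n=64, gamma=0.99, cycles=10):
    """Max-norm errors of two-grid cycles vs. plain smoothing with the same number of sweeps."""
    P, Phi = random_walk(n), aggregation(n)
    r = np.linspace(0.0, 1.0, n)
    exact = np.linalg.solve(np.eye(n) - gamma * P, r)
    v_mg = np.zeros(n)
    v_plain = np.zeros(n)
    for _ in range(cycles):
        v_mg = two_grid_cycle(v_mg, P, r, gamma, Phi)
        # Two plain sweeps per cycle, matching the two sweeps inside each two-grid cycle.
        v_plain = smooth(smooth(v_plain, P, r, gamma), P, r, gamma)
    return np.max(np.abs(v_mg - exact)), np.max(np.abs(v_plain - exact))

if __name__ == "__main__":
    err_mg, err_plain = compare()
    print("two-grid error:", err_mg)
    print("plain smoothing error:", err_plain)
```

With a discount factor close to 1, the plain sweeps reduce the smooth part of the error very slowly, whereas the coarse solve removes most of it directly. The learning case described in the abstract, where TD(lambda) is run at several resolution scales, is not covered by this sketch.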
Pages: 1391-1396
Page count: 6