Multigrid methods for policy evaluation and reinforcement learning

Citations: 0
Authors
Ziv, O [1]
Shimkin, N [1]
Affiliations
[1] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel
Keywords
FUNCTION APPROXIMATION;
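DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract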
We introduce a new class of multigrid temporal-difference learning algorithms for speeding up the estimation of the value function related to a stationary policy, within the context of discounted-cost Markov Decision Processes with linear functional approximation. The proposed scheme builds on the multigrid framework, which is used in numerical analysis to accelerate the iterative solution of linear equations. We first apply the multigrid approach to policy evaluation in the known-model case. We then extend this approach to the learning case, and propose a scheme in which the basic TD(lambda) learning algorithm is applied at various resolution scales. The efficacy of the proposed algorithms is demonstrated through simulation experiments.
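
To illustrate the known-model case mentioned in the abstract, the following is a minimal two-grid sketch for policy evaluation, i.e. for solving V = c + gamma * P * V. It is not the authors' algorithm: it is a generic textbook two-grid correction. The 16-state lazy random walk, the cost vector, the pairwise state aggregation, and all function names are assumptions made purely for illustration. Value-iteration sweeps act as the smoother, and a small aggregated problem supplies the coarse-level correction.

```python
def lazy_walk(n):
    """Lazy random walk on n states (hypothetical chain, chosen for illustration)."""
    P = [[0.0] * n for _ in range(n)]
    for s in range(n):
        P[s][s] += 0.5
        P[s][max(s - 1, 0)] += 0.25      # a blocked move at the boundary stays put
        P[s][min(s + 1, n - 1)] += 0.25
    return P

def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def residual(P, cost, gamma, V):
    """Bellman residual c + gamma * P V - V; zero at the exact value function."""
    return [c + gamma * pv - v for c, pv, v in zip(cost, matvec(P, V), V)]

def sweep(P, cost, gamma, V):
    """One value-iteration sweep V <- c + gamma * P V (used as the smoother)."""
    return [c + gamma * pv for c, pv in zip(cost, matvec(P, V))]

def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (coarse-level solver)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def coarse_matrix(P, gamma):
    """Coarse operator I - gamma * P_c obtained by aggregating states (2i, 2i+1)."""
    m = len(P) // 2
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            p = 0.5 * sum(P[a][b] for a in (2 * i, 2 * i + 1)
                          for b in (2 * j, 2 * j + 1))
            A[i][j] = (1.0 if i == j else 0.0) - gamma * p
    return A

def two_grid_cycle(P, cost, gamma, V, A_c):
    V = sweep(P, cost, gamma, V)                       # pre-smoothing
    res = residual(P, cost, gamma, V)
    res_c = [0.5 * (res[2 * i] + res[2 * i + 1])       # restrict by averaging pairs
             for i in range(len(res) // 2)]
    e_c = solve(A_c, res_c)                            # coarse error equation
    V = [v + e_c[s // 2] for s, v in enumerate(V)]     # piecewise-constant prolongation
    return sweep(P, cost, gamma, V)                    # post-smoothing

n, gamma = 16, 0.95
P = lazy_walk(n)
cost = [0.0] * (n - 1) + [1.0]   # unit cost in the last state only (assumption)
A_c = coarse_matrix(P, gamma)

V_mg = [0.0] * n
for _ in range(20):              # 20 cycles = 40 fine-level sweeps in total
    V_mg = two_grid_cycle(P, cost, gamma, V_mg, A_c)

V_vi = [0.0] * n
for _ in range(40):              # plain value iteration with the same fine-level work
    V_vi = sweep(P, cost, gamma, V_vi)

err_mg = max(abs(x) for x in residual(P, cost, gamma, V_mg))
err_vi = max(abs(x) for x in residual(P, cost, gamma, V_vi))
print("two-grid residual:", err_mg)
print("value-iteration residual:", err_vi)
```

On this particular chain the coarse correction removes the slowly converging smooth error components that plain sweeps reduce only by a factor of about gamma per sweep. The learning case described in the abstract, where TD(lambda) is applied at several resolution scales, is not covered by this sketch.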
Pages: 1391-1396
Page count: 6
Related papers
50 in total
  • [21] Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation
    Winnicki, Anna
    Srikant, R.
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 801 - 806
  • [22] Autonomous helicopter control using reinforcement learning policy search methods
    Bagnell, JA
    Schneider, JG
    2001 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS I-IV, PROCEEDINGS, 2001, : 1615 - 1620
  • [23] Comparison of Different Domain Randomization Methods for Policy Transfer in Reinforcement Learning
    Ma, Mingjun
    Li, Haoran
    Hu, Guangzheng
    Liu, Shasha
    Zhao, Dongbin
    2023 IEEE 12TH DATA DRIVEN CONTROL AND LEARNING SYSTEMS CONFERENCE, DDCLS, 2023, : 1818 - 1823
  • [24] Policy derivation methods for critic-only reinforcement learning in continuous spaces
    Alibekov, Eduard
    Kubalik, Jiri
    Babuska, Robert
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2018, 69 : 178 - 187
  • [25] Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
    Zhong, Rujie
    Zhang, Duohan
    Schafer, Lukas
    Albrecht, Stefano V.
    Hanna, Josiah P.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [26] Distributed Policy Evaluation with Fractional Order Dynamics in Multiagent Reinforcement Learning
    Dai, Wei
    Wang, Wei
    Mao, Zhongtian
    Jiang, Ruwen
    Nian, Fudong
    Li, Teng
    SECURITY AND COMMUNICATION NETWORKS, 2021, 2021
  • [27] Off-policy evaluation for tabular reinforcement learning with synthetic trajectories
    Weiwei Wang
    Yuqiang Li
    Xianyi Wu
    Statistics and Computing, 2024, 34
  • [28] Policy Learning with Human Reinforcement
    Kao-Shing Hwang
    Jin-Ling Lin
    Haobin Shi
    Yu-Ying Chen
    International Journal of Fuzzy Systems, 2016, 18 : 618 - 629
  • [29] Fully asynchronous policy evaluation in distributed reinforcement learning over networks
    Sha, Xingyu
    Zhang, Jiaqi
    You, Keyou
    Zhang, Kaiqing
    Basar, Tamer
    AUTOMATICA, 2022, 136
  • [30] Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
    Jiang, Nan
    Li, Lihong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48