Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction

Cited by: 1
Authors
Feng, Jie [1 ]
Wei, Ke [1 ]
Chen, Jinchi [2 ]
Affiliations
[1] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[2] East China Univ Sci & Technol, Sch Math, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Natural policy gradient; Reinforcement learning; Sample complexity; Variance reduction;
DOI
10.1007/s10915-024-02688-x
CLC Number
O29 [Applied Mathematics]
Discipline Code
070104
Abstract
Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, a new NPG variant coined NPG-HM is developed in this paper, which utilizes the Hessian-aided momentum technique for variance reduction, while the sub-problem is solved via the stochastic gradient descent method. It is shown that NPG-HM can achieve global last-iterate ε-optimality with a sample complexity of O(ε⁻²), which is the best known result for natural policy gradient type methods under generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
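The two ingredients the abstract describes can be sketched in a few lines: a momentum gradient estimate that is corrected with a Hessian-vector product at each step, followed by a natural-gradient update obtained by solving a Fisher-preconditioned linear system. The sketch below is illustrative only, not the paper's algorithm: the function names are hypothetical, exact gradients and Hessian-vector products on a toy quadratic objective stand in for the stochastic Monte Carlo estimates used in NPG-HM, the exact Hessian replaces the policy-gradient Hessian estimator, and a direct linear solve replaces the SGD sub-problem solver analyzed in the paper.

```python
import numpy as np

def npg_hm_sketch(grad, hvp, fisher, theta0, lr=0.1, beta=0.5, iters=200):
    """Schematic NPG with a Hessian-aided momentum gradient estimate.

    Momentum recursion (assumed form, with momentum weight beta):
        d_t = beta * grad(theta_t)
              + (1 - beta) * (d_{t-1} + hvp(theta_t, theta_t - theta_{t-1}))
    Natural-gradient step: solve fisher(theta) w = d_t, then theta += lr * w.
    """
    theta = theta0.astype(float).copy()
    d = grad(theta)  # initial gradient estimate
    for _ in range(iters):
        # Sub-problem: Fisher-preconditioned direction (direct solve here;
        # the paper solves this sub-problem with stochastic gradient descent).
        w = np.linalg.solve(fisher(theta), d)
        theta_prev, theta = theta, theta + lr * w
        # Hessian-vector correction keeps the momentum estimate tracking
        # the current gradient as theta moves.
        d = beta * grad(theta) + (1.0 - beta) * (d + hvp(theta, theta - theta_prev))
    return theta
```

On a concave quadratic J(θ) = bᵀθ − ½θᵀAθ, the correction term hvp(θ, Δ) = −AΔ equals the exact gradient difference, so the momentum estimate tracks the true gradient and the iteration contracts toward the maximizer A⁻¹b; in the stochastic setting, the same correction is what drives the variance reduction.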
Pages: 28
Related Papers
38 in total
  • [31] On the Convergence of Natural Policy Gradient and Mirror Descent-Like Policy Methods for Average-Reward MDPs
    Murthy, Yashaswini
    Srikant, R.
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 1979 - 1984
  • [32] Convergence of Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction
    Suzuki, Taiji
    Wu, Denny
    Nitanda, Atsushi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [33] Global Convergence of Policy Gradient Primal-Dual Methods for Risk-Constrained LQRs
    Zhao, Feiran
    You, Keyou
    Basar, Tamer
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (05) : 2934 - 2949
  • [34] Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods
    Fathi, Vida
    Arabneydi, Jalal
    Aghdam, Amir G.
    2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020, : 4927 - 4932
  • [35] Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods
    Guo, Xin
    Hu, Anran
    Zhang, Junzi
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6774 - 6782
  • [36] Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time
    Wang, Weichen
    Han, Jiequn
    Yang, Zhuoran
    Wang, Zhaoran
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7780 - 7791
  • [37] Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Nonconvex Stochastic Optimization: Nonasymptotic Performance Bounds and Momentum-Based Acceleration
    Gao, Xuefeng
    Gürbüzbalaban, Mert
    Zhu, Lingjiong
    OPERATIONS RESEARCH, 2022, 70 (05) : 2931 - 2947