Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction

Cited by: 1
Authors
Feng, Jie [1 ]
Wei, Ke [1 ]
Chen, Jinchi [2 ]
Affiliations
[1] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[2] East China Univ Sci & Technol, Sch Math, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Natural policy gradient; Reinforcement learning; Sample complexity; Variance reduction;
DOI
10.1007/s10915-024-02688-x
Chinese Library Classification
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, a new NPG variant coined NPG-HM is developed in this paper, which utilizes the Hessian-aided momentum technique for variance reduction, while the sub-problem is solved via the stochastic gradient descent method. It is shown that NPG-HM can achieve global last-iterate $\varepsilon$-optimality with a sample complexity of $\mathcal{O}(\varepsilon^{-2})$, which is the best known result for natural policy gradient type methods under generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
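The abstract only names the core estimator, so a small illustration may help. The following is a minimal NumPy sketch of the Hessian-aided momentum idea on a generic smooth stochastic objective, not the paper's NPG-HM update (which applies the estimator to policy gradients and solves a natural-gradient sub-problem by SGD). The toy least-squares objective and the parameters lr, eta, and T are illustrative assumptions; the sketch only demonstrates the identity that a stochastic Hessian-vector product, taken at a uniformly sampled point on the segment between consecutive iterates, gives an unbiased estimate of the gradient difference used for variance reduction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares objective f(x) = (1/2n) sum_i (a_i @ x - b_i)^2,
# chosen only so the sketch is self-contained (assumption, not from the paper).
n, d = 200, 10
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star  # zero residual at x_star, so the optimum is known exactly

def stoch_grad(x, i):
    # Gradient of the i-th component loss (1/2)(a_i @ x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

def stoch_hvp(x, i, v):
    # Hessian-vector product of the i-th component loss; its Hessian is a_i a_i^T.
    return A[i] * (A[i] @ v)

def hessian_aided_momentum(T=500, lr=0.05, eta=0.1):
    x_prev = rng.normal(size=d)
    g = stoch_grad(x_prev, rng.integers(n))  # initial gradient estimate
    x = x_prev - lr * g
    for _ in range(T):
        i = rng.integers(n)
        # Key identity: for u ~ Uniform[0, 1],
        #   E[H(x_prev + u*(x - x_prev)) @ (x - x_prev)] = grad f(x) - grad f(x_prev),
        # so this HVP is an unbiased correction of the running estimate.
        u = rng.uniform()
        delta = stoch_hvp(x_prev + u * (x - x_prev), i, x - x_prev)
        # Momentum-style recursion: mix the corrected old estimate with a
        # fresh stochastic gradient to reduce variance.
        g = (1.0 - eta) * (g + delta) + eta * stoch_grad(x, i)
        x_prev, x = x, x - lr * g
    return x

x_hat = hessian_aided_momentum()
print("distance to optimum:", np.linalg.norm(x_hat - x_star))
```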
Pages: 28
Related Papers
38 records in total
  • [11] Stochastic zeroth-order gradient and Hessian estimators: variance reduction and refined bias bounds
    Feng, Yasong
    Wang, Tianyu
    INFORMATION AND INFERENCE-A JOURNAL OF THE IMA, 2023, 12 (03)
  • [12] Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
    Fazel, Maryam
    Ge, Rong
    Kakade, Sham M.
    Mesbahi, Mehran
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [13] A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC
    Chen, Changyou
    Wang, Wenlin
    Zhang, Yizhe
    Su, Qinliang
    Carin, Lawrence
    SCIENCE CHINA-INFORMATION SCIENCES, 2019, 62 (01) : 67 - 79
  • [16] Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization
    Cen, Shicong
    Chen, Fan
    Chi, Yuejie
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 2833 - 2838
  • [17] Ordering-based Conditions for Global Convergence of Policy Gradient Methods
    Mei, Jincheng
    Dai, Bo
    Agarwal, Alekh
    Ghavamzadeh, Mohammad
    Szepesvari, Csaba
    Schuurmans, Dale
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [18] GLOBAL CONVERGENCE OF POLICY GRADIENT METHODS TO (ALMOST) LOCALLY OPTIMAL POLICIES
    Zhang, Kaiqing
    Koppel, Alec
    Zhu, Hao
    Basar, Tamer
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2020, 58 (06) : 3586 - 3612
  • [19] Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems
    Yang, Junchi
    Kiyavash, Negar
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [20] On linear and super-linear convergence of Natural Policy Gradient algorithm
    Khodadadian, Sajad
    Jhunjhunwala, Prakirt Raj
    Varma, Sushil Mahavir
    Maguluri, Siva Theja
    SYSTEMS & CONTROL LETTERS, 2022, 164