Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction

Cited by: 1
Authors
Feng, Jie [1 ]
Wei, Ke [1 ]
Chen, Jinchi [2 ]
Affiliations
[1] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[2] East China Univ Sci & Technol, Sch Math, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Natural policy gradient; Reinforcement learning; Sample complexity; Variance reduction;
DOI
10.1007/s10915-024-02688-x
CLC number
O29 [Applied Mathematics];
Discipline code
070104 ;
Abstract
Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, a new NPG variant, termed NPG-HM, is developed in this paper, which utilizes the Hessian-aided momentum technique for variance reduction, while the sub-problem is solved via stochastic gradient descent. It is shown that NPG-HM achieves global last-iterate $\varepsilon$-optimality with a sample complexity of $\mathcal{O}(\varepsilon^{-2})$, which is the best known result for natural policy gradient type methods under generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored to NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
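To make the variance-reduction idea in the abstract concrete, the following is a minimal sketch of a Hessian-aided momentum gradient estimator in the STORM/HAPG style: the previous estimator is transported along the parameter step via a Hessian-vector product before being blended with a fresh stochastic gradient. This is a generic illustration on a noisy quadratic objective, not the paper's actual MDP objective or NPG-HM update; all names (`stoch_grad`, `hvp`, `beta`, `lr`) and the problem setup are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([1.0, 5.0])  # toy objective f(x) = 0.5 * x^T A x, minimizer x* = 0

def stoch_grad(x):
    # Noisy gradient oracle: exact gradient A x plus Gaussian noise.
    return A @ x + 0.05 * rng.standard_normal(2)

def hvp(x, v):
    # Hessian-vector product; exact for the quadratic (Hessian is A).
    return A @ v

x = np.array([3.0, -2.0])
d = stoch_grad(x)        # initial gradient estimator
beta, lr = 0.3, 0.05     # momentum weight and step size (assumed values)

for t in range(400):
    x_new = x - lr * d
    # Hessian-aided momentum: transport the old estimator along the step
    # x_new - x via the HVP, then mix in a fresh stochastic gradient.
    d = beta * stoch_grad(x_new) + (1 - beta) * (d + hvp(x_new, x_new - x))
    x = x_new

print(np.linalg.norm(x))  # small: iterates approach the optimum x* = 0
```

Compared with plain stochastic gradient descent, the `(1 - beta)` correction term reuses past gradient information, which is what yields the reduced variance exploited in the paper's $\mathcal{O}(\varepsilon^{-2})$ sample-complexity analysis.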
Pages: 28