Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction

Cited by: 1
Authors
Feng, Jie [1]
Wei, Ke [1]
Chen, Jinchi [2]
Affiliations
[1] Fudan University, School of Data Science, Shanghai, People's Republic of China
[2] East China University of Science and Technology, School of Mathematics, Shanghai, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Natural policy gradient; Reinforcement learning; Sample complexity; Variance reduction;
DOI
10.1007/s10915-024-02688-x
Chinese Library Classification
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, a new NPG variant coined NPG-HM is developed in this paper, which utilizes the Hessian-aided momentum technique for variance reduction, while the sub-problem is solved via the stochastic gradient descent method. It is shown that NPG-HM can achieve global last-iterate $\varepsilon$-optimality with a sample complexity of $\mathcal{O}(\varepsilon^{-2})$, which is the best known result for natural policy gradient type methods under generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on Mujoco-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
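As a rough illustration of the two ingredients the abstract describes, the following minimal Python sketch shows (i) a Hessian-aided momentum gradient estimator and (ii) an SGD solver for the natural-gradient sub-problem under compatible function approximation. This is not the authors' implementation: the oracle functions sample_grad and sample_hvp, the score_samples input, and all step sizes are hypothetical placeholders.

```python
import numpy as np

def hessian_momentum_gradient(v_prev, theta, theta_prev, beta,
                              sample_grad, sample_hvp):
    """Hessian-aided momentum estimator (sketch):
        v_t = beta * g_t + (1 - beta) * (v_{t-1} + H(theta_bar) @ (theta_t - theta_{t-1})),
    where g_t is a fresh stochastic policy gradient and the Hessian-vector
    product is sampled at a random point on the segment between iterates."""
    g_t = sample_grad(theta)                         # stochastic gradient at theta_t
    tau = np.random.rand()                           # random interpolation weight
    theta_bar = tau * theta + (1.0 - tau) * theta_prev
    hvp = sample_hvp(theta_bar, theta - theta_prev)  # stochastic H(theta_bar) @ direction
    return beta * g_t + (1.0 - beta) * (v_prev + hvp)

def npg_direction_sgd(v, score_samples, lr=1e-2, iters=200):
    """Approximate the natural-gradient direction w ~ F(theta)^{-1} v by SGD
    on the least-squares sub-problem min_w 0.5 * w^T F w - v^T w, where
    F = E[phi phi^T] and phi = grad log pi(a|s) is a sampled score vector."""
    w = np.zeros_like(v)
    for _ in range(iters):
        phi = score_samples[np.random.randint(len(score_samples))]
        w -= lr * (phi * (phi @ w) - v)              # stochastic gradient step
    return w
```

In an outer loop, one would form v = hessian_momentum_gradient(...), compute w = npg_direction_sgd(v, ...), and update theta <- theta + step * w; the precise estimators, batch sizes, and step-size schedule are specified in the paper, not here.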
Pages: 28
Related Papers
38 in total; the first 10 are listed below.
  • [1] Compressed Gradient Methods With Hessian-Aided Error Compensation
    Khirirat, Sarit
    Magnusson, Sindri
    Johansson, Mikael
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 998 - 1011
  • [2] Hessian Aided Policy Gradient
    Shen, Zebang
    Hassani, Hamed
    Mi, Chao
    Qian, Hui
    Ribeiro, Alejandro
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [3] On the Global Optimum Convergence of Momentum-based Policy Gradient
    Ding, Yuhao
    Zhang, Junzi
    Lavaei, Javad
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [4] Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
    Cen, Shicong
    Cheng, Chen
    Chen, Yuxin
    Wei, Yuting
    Chi, Yuejie
OPERATIONS RESEARCH, 2022, 70 (04) : 2563 - 2578
  • [5] The Importance of Variance Reduction in Policy Gradient Method
    Lau, Tak Kit
    Liu, Yun-hui
2012 AMERICAN CONTROL CONFERENCE (ACC), 2012 : 1376 - 1381
  • [6] Geometry and convergence of natural policy gradient methods
Müller, J.
Montúfar, G.
INFORMATION GEOMETRY, 2024, 7 (Suppl 1) : 485 - 523
  • [7] On the Linear Convergence of Natural Policy Gradient Algorithm
    Khodadadian, Sajad
    Jhunjhunwala, Prakirt Raj
    Varma, Sushil Mahavir
    Maguluri, Siva Theja
2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021 : 3794 - 3799
  • [8] Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning
    Chen, Jinchi
    Feng, Jie
    Gao, Weiguo
    Wei, Ke
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [9] An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
    Xu, Pan
    Gao, Felicia
    Gu, Quanquan
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 541 - 551
  • [10] On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
    Zhang, Junyu
    Ni, Chengzhuo
    Yu, Zheng
    Szepesvari, Csaba
    Wang, Mengdi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34