Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction

Cited by: 1

Authors:

Feng, Jie [1]

Wei, Ke [1]

Chen, Jinchi [2]

Affiliations:

[1] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China

[2] East China Univ Sci & Technol, Sch Math, Shanghai, Peoples R China

Source:

JOURNAL OF SCIENTIFIC COMPUTING | 2024 / Vol. 101 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Natural policy gradient; Reinforcement learning; Sample complexity; Variance reduction;

DOI:

10.1007/s10915-024-02688-x

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Discipline classification code:

070104 ;

Abstract:

Natural policy gradient (NPG) and its variants are widely-used policy search methods in reinforcement learning. Inspired by prior work, a new NPG variant coined NPG-HM is developed in this paper, which utilizes the Hessian-aided momentum technique for variance reduction, while the sub-problem is solved via the stochastic gradient descent method. It is shown that NPG-HM can achieve the global last iterate $\varepsilon$-optimality with a sample complexity of $\mathcal{O}(\varepsilon^{-2})$, which is the best known result for natural policy gradient type methods under the generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.


Pages: 28

