Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction

Cited by: 1

Authors:

Feng, Jie [1]

Wei, Ke [1]

Chen, Jinchi [2]

Affiliations:

[1] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China

[2] East China Univ Sci & Technol, Sch Math, Shanghai, Peoples R China

Source:

JOURNAL OF SCIENTIFIC COMPUTING | 2024 / Vol. 101 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Natural policy gradient; Reinforcement learning; Sample complexity; Variance reduction;

DOI:

10.1007/s10915-024-02688-x

Chinese Library Classification:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, this paper develops a new NPG variant, coined NPG-HM, which uses the Hessian-aided momentum technique for variance reduction and solves the sub-problem via stochastic gradient descent. It is shown that NPG-HM achieves global last-iterate $\varepsilon$-optimality with a sample complexity of $\mathcal{O}(\varepsilon^{-2})$, which is the best known result for natural policy gradient type methods under generic Fisher non-degenerate policy parameterizations. The convergence analysis builds on a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, together with a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
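
Illustrative note: the abstract mentions that the NPG sub-problem is solved by stochastic gradient descent under the compatible function approximation framework. The sketch below is not the paper's NPG-HM algorithm and omits the Hessian-aided momentum step entirely. It only shows, under simplifying assumptions, how SGD can approximate the least-squares solution $w^\star = \arg\min_w \tfrac{1}{2}\,\mathbb{E}[(\phi^\top w - A)^2]$, where $\phi$ stands for a score vector and $A$ for an advantage estimate. The function name `npg_direction_sgd`, the step size, and the synthetic data are all hypothetical.

```python
import numpy as np


def npg_direction_sgd(scores, advantages, steps=5000, lr=0.01, seed=0):
    """SGD on the least-squares sub-problem min_w E[(phi^T w - A)^2] / 2.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = scores.shape
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)                  # draw one sample index uniformly
        phi = scores[i]                      # score vector of that sample
        residual = phi @ w - advantages[i]   # prediction error on the sample
        w -= lr * residual * phi             # stochastic gradient step
    return w


# Synthetic, noiseless data (assumed for illustration only).
rng = np.random.default_rng(1)
scores = rng.standard_normal((200, 3))
w_true = np.array([0.5, -1.0, 2.0])
advantages = scores @ w_true

w_sgd = npg_direction_sgd(scores, advantages)
# Closed-form least-squares solution for comparison.
w_exact = np.linalg.lstsq(scores, advantages, rcond=None)[0]
```

With noiseless targets the SGD iterate should approach the closed-form solution `w_exact`; with noisy advantage estimates it would only reach a neighborhood of it unless the step size is decayed.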

Pages: 28
