Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction

Cited by: 1
Authors
Feng, Jie [1]
Wei, Ke [1]
Chen, Jinchi [2]
Affiliations
[1] Fudan University, School of Data Science, Shanghai, People's Republic of China
[2] East China University of Science and Technology, School of Mathematics, Shanghai, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Natural policy gradient; Reinforcement learning; Sample complexity; Variance reduction;
DOI
10.1007/s10915-024-02688-x
Chinese Library Classification
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, a new NPG variant coined NPG-HM is developed in this paper, which utilizes the Hessian-aided momentum technique for variance reduction, while the sub-problem is solved via the stochastic gradient descent method. It is shown that NPG-HM can achieve global last-iterate $\varepsilon$-optimality with a sample complexity of $\mathcal{O}(\varepsilon^{-2})$, which is the best known result for natural policy gradient type methods under generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on Mujoco-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
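As a rough illustration of the two ingredients the abstract describes, the following minimal Python sketch shows (i) a Hessian-aided momentum gradient estimator and (ii) an SGD solver for the natural-gradient sub-problem under compatible function approximation. This is not the authors' implementation: the oracle functions sample_grad and sample_hvp, the score_samples input, and all step sizes are hypothetical placeholders.

```python
import numpy as np

def hessian_momentum_gradient(v_prev, theta, theta_prev, beta,
                              sample_grad, sample_hvp):
    """Hessian-aided momentum estimator (sketch):
        v_t = beta * g_t + (1 - beta) * (v_{t-1} + H(theta_bar) @ (theta_t - theta_{t-1})),
    where g_t is a fresh stochastic policy gradient and the Hessian-vector
    product is sampled at a random point on the segment between iterates."""
    g_t = sample_grad(theta)                         # stochastic gradient at theta_t
    tau = np.random.rand()                           # random interpolation weight
    theta_bar = tau * theta + (1.0 - tau) * theta_prev
    hvp = sample_hvp(theta_bar, theta - theta_prev)  # stochastic H(theta_bar) @ direction
    return beta * g_t + (1.0 - beta) * (v_prev + hvp)

def npg_direction_sgd(v, score_samples, lr=1e-2, iters=200):
    """Approximate the natural-gradient direction w ~ F(theta)^{-1} v by SGD
    on the least-squares sub-problem min_w 0.5 * w^T F w - v^T w, where
    F = E[phi phi^T] and phi = grad log pi(a|s) is a sampled score vector."""
    w = np.zeros_like(v)
    for _ in range(iters):
        phi = score_samples[np.random.randint(len(score_samples))]
        w -= lr * (phi * (phi @ w) - v)              # stochastic gradient step
    return w
```

In an outer loop, one would form v = hessian_momentum_gradient(...), compute w = npg_direction_sgd(v, ...), and update theta <- theta + step * w; the precise estimators, batch sizes, and step-size schedule are specified in the paper, not here.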
Pages: 28
Related Papers
38 in total; the first 10 are listed below.
  • [1] Compressed Gradient Methods With Hessian-Aided Error Compensation
    Khirirat, Sarit
    Magnusson, Sindri
    Johansson, Mikael
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 998 - 1011
  • [2] Hessian Aided Policy Gradient
    Shen, Zebang
    Hassani, Hamed
    Mi, Chao
    Qian, Hui
    Ribeiro, Alejandro
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [3] On the Global Optimum Convergence of Momentum-based Policy Gradient
    Ding, Yuhao
    Zhang, Junzi
    Lavaei, Javad
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [4] Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
    Cen, Shicong
    Cheng, Chen
    Chen, Yuxin
    Wei, Yuting
    Chi, Yuejie
OPERATIONS RESEARCH, 2022, 70 (04) : 2563 - 2578
  • [5] The Importance of Variance Reduction in Policy Gradient Method
    Lau, Tak Kit
    Liu, Yun-hui
2012 AMERICAN CONTROL CONFERENCE (ACC), 2012 : 1376 - 1381
  • [6] Geometry and convergence of natural policy gradient methods
Müller, J.
Montúfar, G.
INFORMATION GEOMETRY, 2024, 7 (Suppl 1) : 485 - 523
  • [7] On the Linear Convergence of Natural Policy Gradient Algorithm
    Khodadadian, Sajad
    Jhunjhunwala, Prakirt Raj
    Varma, Sushil Mahavir
    Maguluri, Siva Theja
2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021 : 3794 - 3799
  • [8] Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning
    Chen, Jinchi
    Feng, Jie
    Gao, Weiguo
    Wei, Ke
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [9] An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
    Xu, Pan
    Gao, Felicia
    Gu, Quanquan
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 541 - 551
  • [10] On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
    Zhang, Junyu
    Ni, Chengzhuo
    Yu, Zheng
    Szepesvari, Csaba
    Wang, Mengdi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34