On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Cited by: 0
Authors
Zhang, Junyu [1 ]
Ni, Chengzhuo [2 ]
Yu, Zheng [2 ]
Szepesvari, Csaba [3 ]
Wang, Mengdi [2 ]
Affiliations
[1] Natl Univ Singapore, Dept Ind Syst Engn & Management, Singapore 119077, Singapore
[2] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
[3] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords
Algorithms
DOI
none available
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Policy gradient (PG) methods give rise to a rich class of reinforcement learning (RL) algorithms. Recently, there has been an emerging trend of accelerating existing PG methods, such as REINFORCE, with variance reduction techniques. However, all existing variance-reduced PG methods rely heavily on an uncheckable importance weight assumption made for every single iteration of the algorithms. In this paper, a simple gradient truncation mechanism is proposed to address this issue. Moreover, we design a Truncated Stochastic Incremental Variance-Reduced Policy Gradient (TSIVR-PG) method, which can maximize not only a cumulative sum of rewards but also a general utility function of a policy's long-term visiting distribution. We show an Õ(ε⁻³) sample complexity for TSIVR-PG to find an ε-stationary policy. By assuming overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a globally ε-optimal policy with Õ(ε⁻²) samples.
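The gradient truncation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's TSIVR-PG algorithm; the names (`truncated_weight`, `vr_gradient`) and the clipping threshold `w_max` are hypothetical. The sketch shows the two ingredients the abstract mentions: a trajectory importance weight clipped to a bounded range (so no uncheckable bounded-weight assumption is needed), and a recursive variance-reduced gradient estimator of the SARAH/SPIDER type into which the clipped weight enters.

```python
import math

def truncated_weight(logp_new, logp_old, w_max=10.0):
    """Importance weight of a trajectory under the new vs. old policy,
    clipped to [0, w_max] so its variance stays bounded."""
    return min(math.exp(logp_new - logp_old), w_max)

def vr_gradient(grad_new, grad_old, v_prev, w):
    """Recursive variance-reduced estimator (SARAH/SPIDER-style):
    v_t = g(theta_t) - w * g(theta_{t-1}) + v_{t-1},
    where w is the (truncated) importance weight correcting for the
    fact that the trajectory was sampled under the previous policy."""
    return [gn - w * go + vp for gn, go, vp in zip(grad_new, grad_old, v_prev)]

# Example: a weight that would explode (exp(5) ~ 148) is capped at w_max.
w = truncated_weight(logp_new=0.0, logp_old=-5.0, w_max=10.0)
v = vr_gradient(grad_new=[1.0, 2.0], grad_old=[1.0, 2.0], v_prev=[0.5, 0.5], w=1.0)
```

When the two gradients coincide and w = 1, the estimator simply carries the previous estimate forward, which is what keeps its incremental variance small between reference updates.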
Pages: 13
Related Papers (50 total)
  • [41] Riemannian Stochastic Variance-Reduced Cubic Regularized Newton Method for Submanifold Optimization
    Zhang, Dewei
    Tajbakhsh, Sam Davanloo
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2023, 196 (01) : 324 - 361
  • [43] Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization
    Wang, Zhe
    Zhou, Yi
    Liang, Yingbin
    Lan, Guanghui
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [44] MURANA: A Generic Framework for Stochastic Variance-Reduced Optimization
    Condat, Laurent
    Richtarik, Peter
    MATHEMATICAL AND SCIENTIFIC MACHINE LEARNING, VOL 190, 2022, 190
  • [45] Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient
    Lin, Tianyi
    Fan, Chengyou
    Wang, Mengdi
    Jordan, Michael I.
    2020 AMERICAN CONTROL CONFERENCE (ACC), 2020, : 126 - 131
  • [46] Stochastic Variance-Reduced Cubic Regularized Newton Methods
    Zhou, Dongruo
    Xu, Pan
    Gu, Quanquan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [47] Variance-reduced particle methods for solving the Boltzmann equation
    Baker, Lowell L.
    Hadjiconstantinou, Nicolas G.
    JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2008, 5 (02) : 165 - 174
  • [48] Stochastic Recursive Variance-Reduced Cubic Regularization Methods
    Zhou, Dongruo
    Gu, Quanquan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 3980 - 3989
  • [49] Momentum-based variance-reduced stochastic Bregman proximal gradient methods for nonconvex nonsmooth optimization
    Liao, Shichen
    Liu, Yan
    Han, Congying
    Guo, Tiande
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 266
  • [50] VARIANCE-REDUCED SIMULATION OF MULTISCALE TUMOR GROWTH MODELING
    Lejon, Annelies
    Mortier, Bert
    Samaey, Giovanni
    MULTISCALE MODELING & SIMULATION, 2017, 15 (01): : 388 - 409