On the Global Optimum Convergence of Momentum-based Policy Gradient

Cited by: 0

Authors:

Ding, Yuhao [1]

Zhang, Junzi [2]

Lavaei, Javad [1]

Affiliations:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

[2] Amazon Advertising, San Francisco, CA USA

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151 | 2022 / Volume 151

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by establishing the first set of global convergence results of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum term improves the global optimality sample complexities of vanilla PG methods by $\tilde{O}(\epsilon^{-1.5})$ and $\tilde{O}(\epsilon^{-1})$, respectively, where $\epsilon > 0$ is the target tolerance. Our results for the generic Fisher-non-degenerate policy parametrizations also provide the first single-loop and finite-batch PG algorithm achieving an $\tilde{O}(\epsilon^{-3})$ global optimality sample complexity. Finally, as a byproduct, our analyses provide general tools for deriving the global convergence rates of stochastic PG methods, which can be readily applied and extended to other PG estimators under the two parametrizations.
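To illustrate what "a stochastic PG method with a momentum term" under a soft-max parametrization can look like, the following is a minimal sketch, not the algorithm analyzed in the paper. It assumes a toy multi-armed bandit, a plain REINFORCE gradient estimate, and an exponential-moving-average momentum buffer; the function name `momentum_pg_bandit` and the parameters `lr` and `beta` are hypothetical choices made for this example.

```python
import math
import random


def softmax(theta):
    """Numerically stable soft-max over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def momentum_pg_bandit(mean_rewards, steps=2000, lr=0.1, beta=0.9, seed=0):
    """Illustrative stochastic policy gradient with momentum on a bandit.

    Assumptions (not taken from the paper): Gaussian reward noise,
    a REINFORCE estimator without baseline, and a moving-average
    momentum buffer d <- beta * d + (1 - beta) * g.
    """
    rng = random.Random(seed)
    k = len(mean_rewards)
    theta = [0.0] * k  # soft-max logits, one per arm
    d = [0.0] * k      # momentum buffer
    for _ in range(steps):
        pi = softmax(theta)
        # Sample one action from the current policy.
        a = rng.choices(range(k), weights=pi)[0]
        r = mean_rewards[a] + rng.gauss(0.0, 0.1)
        # REINFORCE estimate: r * grad_theta log pi(a) = r * (e_a - pi).
        g = [r * ((1.0 if i == a else 0.0) - pi[i]) for i in range(k)]
        # Momentum: average the noisy estimates before taking a step.
        d = [beta * di + (1.0 - beta) * gi for di, gi in zip(d, g)]
        # Gradient ascent on the expected reward.
        theta = [t + lr * di for t, di in zip(theta, d)]
    return softmax(theta)


if __name__ == "__main__":
    print(momentum_pg_bandit([0.2, 0.5, 1.0]))
```

The momentum buffer averages successive noisy gradient estimates, which is the general variance-smoothing idea behind momentum-based PG; the specific estimators and step-size schedules studied in the paper may differ from this sketch.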


Pages: 25
