A Hessian Actor-Critic Algorithm

Cited by: 0

Authors:

Wang, Jing [1]

Paschalidis, Ioannis Ch [1, 2]

Affiliations:

[1] Boston Univ, Div Syst Engn, 8 St Marys St, Boston, MA 02215 USA

[2] Boston Univ, Dept Elect & Comp Engn, Boston, MA 02215 USA

Source:

2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC) | 2014

Keywords:

Actor-critic algorithms; Newton's method; Markov decision processes; Autonomous robots; SENSITIVITY-ANALYSIS; POTENTIALS;

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

We consider Markov Decision Processes (MDPs) following a policy parametrized by a parsimonious set of parameters and seek to optimize the policy over these parameters. In this setting, optimization can be done using a gradient ascent method. If designed well, the parameterized policy can significantly reduce the problem complexity. Existing algorithms usually suffer from slow convergence because they search along the gradient direction in a steepest ascent way. In this paper, we first propose an estimate for the Hessian of the overall reward the decision maker receives. Based on this estimate, we then introduce a new Newton-like method of the actor-critic type. We compare the new algorithm with several existing algorithms in a robotics application and demonstrate that our method exhibits faster convergence.
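The contrast the abstract draws between steepest ascent and a Newton-like update can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a hypothetical deterministic, concave quadratic objective with a known exact Hessian, whereas the paper estimates the Hessian of the overall reward from an actor-critic scheme. All function names and constants are illustrative assumptions.

```python
# Toy objective (an assumption for illustration, not from the paper):
#   J(theta) = -0.5 * (10 * t0**2 + t1**2) + 10 * t0 + t1
# It is concave, poorly scaled, and maximized at theta = [1, 1].

def grad(theta):
    """Exact gradient of the toy objective J."""
    return [-10.0 * theta[0] + 10.0, -theta[1] + 1.0]

# The exact Hessian of J is diagonal and constant; store its diagonal.
HESSIAN_DIAG = [-10.0, -1.0]

def steepest_ascent_step(theta, step=0.1):
    """Move along the gradient with a fixed step size."""
    g = grad(theta)
    return [t + step * gi for t, gi in zip(theta, g)]

def newton_step(theta):
    """Newton update theta - H^{-1} g, using the diagonal Hessian."""
    g = grad(theta)
    return [t - gi / h for t, gi, h in zip(theta, g, HESSIAN_DIAG)]

def iterations_to_converge(step_fn, theta0, tol=1e-6, max_iter=10_000):
    """Count updates until the largest gradient component is below tol."""
    theta = list(theta0)
    for k in range(max_iter):
        if max(abs(gi) for gi in grad(theta)) < tol:
            return k, theta
        theta = step_fn(theta)
    return max_iter, theta

n_sa, theta_sa = iterations_to_converge(steepest_ascent_step, [0.0, 0.0])
n_newton, theta_newton = iterations_to_converge(newton_step, [0.0, 0.0])
print("steepest ascent iterations:", n_sa)
print("Newton iterations:", n_newton)
```

On a quadratic objective a single exact Newton step reaches the maximizer, while the fixed-step gradient method is slowed by the poorly scaled coordinate. With noisy Hessian estimates, as in the setting the abstract describes, the gain would be less clear-cut than in this idealized case.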


Pages: 1131 - 1136

Page count: 6
