Offline-Online Actor-Critic

Cited by: 1
Authors
Wang X. [1 ]
Hou D. [1 ]
Huang L. [1 ]
Cheng Y. [1 ]
Affiliations
[1] China University of Mining and Technology, Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education and School of Information and Control Engineering, Xuzhou
Keywords
Actor-critic; behavior clone (BC) constraint; distribution shift; offline-online reinforcement learning (RL); policy performance degradation
DOI
10.1109/TAI.2022.3225251
Abstract
Offline-online reinforcement learning (RL) can effectively address the problem of missing data (commonly known as transitions) in offline RL. However, due to distribution shift, policy performance may degrade when an agent moves from the offline to the online training phase. In this article, we first analyze the problems of distribution shift and policy performance degradation in offline-online RL. To alleviate these problems, we then propose a novel RL algorithm, offline-online actor-critic (O2AC). In O2AC, a behavior clone constraint term is introduced into the policy objective function to address distribution shift in the offline training phase. In the online training phase, the influence of the behavior clone constraint term is gradually reduced, which alleviates policy performance degradation. Experiments show that O2AC outperforms existing offline-online RL algorithms. © 2020 IEEE.
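The abstract's core idea (a BC constraint whose weight is held fixed offline and gradually reduced online) can be sketched as a policy loss. This is an illustrative reconstruction, not the paper's actual implementation; the names `bc_weight`, `actor_loss`, and the constants `alpha0`, `decay`, and `online_start` are all assumptions, and the annealing schedule shown (exponential decay) is one plausible choice.

```python
def bc_weight(step: int, online_start: int, alpha0: float = 1.0,
              decay: float = 0.999) -> float:
    """Weight of the BC constraint term: constant during offline
    training, then exponentially annealed once online fine-tuning
    begins (a hypothetical schedule, not taken from the paper)."""
    if step < online_start:
        return alpha0
    return alpha0 * decay ** (step - online_start)


def actor_loss(q_value: float, bc_mse: float, step: int,
               online_start: int) -> float:
    """Policy objective: maximize the critic's Q-value while
    penalizing deviation from the behavior policy (bc_mse is the
    mean-squared error between the policy action and the dataset
    action). Minimizing this loss maximizes Q minus the penalty."""
    alpha = bc_weight(step, online_start)
    return -q_value + alpha * bc_mse
```

Early in training the BC term dominates, keeping the policy close to the data distribution; as `alpha` decays during the online phase, the loss smoothly transitions toward a standard actor-critic objective.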
Pages: 61-69 (8 pages)
Related Papers (50 records)
  • [21] A Hessian Actor-Critic Algorithm
    Wang, Jing
    Paschalidis, Ioannis Ch
    2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 1131 - 1136
  • [22] Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
    Zhou, Wei
    Li, Yiying
    Yang, Yongxin
    Wang, Huaimin
    Hospedales, Timothy M.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [23] Natural actor-critic algorithms
    Bhatnagar, Shalabh
    Sutton, Richard S.
    Ghavamzadeh, Mohammad
    Lee, Mark
    AUTOMATICA, 2009, 45 (11) : 2471 - 2482
  • [24] Actor-Critic Instance Segmentation
    Araslanov, Nikita
    Rothkopf, Constantin A.
    Roth, Stefan
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 8229 - 8238
  • [25] Actor-Critic or Critic-Actor? A Tale of Two Time Scales
    Bhatnagar, Shalabh
    Borkar, Vivek S.
    Guin, Soumyajit
    IEEE CONTROL SYSTEMS LETTERS, 2023, 7 : 2671 - 2676
  • [26] Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay
    Tasfi, Norman
    Capretz, Miriam
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [27] An Online Actor-Critic Learning Approach with Levenberg-Marquardt Algorithm
    Ni, Zhen
    He, Haibo
    Prokhorov, Danil V.
    Fu, Jian
    2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011, : 2333 - 2340
  • [28] Applying Online Expert Supervision in Deep Actor-Critic Reinforcement Learning
    Zhang, Jin
    Chen, Jiansheng
    Huang, Yiqing
    Wan, Weitao
    Li, Tianpeng
    PATTERN RECOGNITION AND COMPUTER VISION, PT II, 2018, 11257 : 469 - 478
  • [29] Speed Tracking Control via Online Continuous Actor-Critic learning
    Huang, Zhenhua
    Xu, Xin
    Sun, Zhenping
    Tan, Jun
    Qian, Lilin
    PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 3172 - 3177
  • [30] Importance sampling actor-critic algorithms
    Williams, Jason L.
    Fisher, John W., III
    Willsky, Alan S.
    2006 AMERICAN CONTROL CONFERENCE, VOLS 1-12, 2006, 1-12 : 1625 - +