Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble

Cited by: 1
Authors
Li, Chao [1 ]
Wu, Fengge [1 ]
Zhao, Junsuo [1 ]
Affiliations
[1] Univ Chinese Acad Sci, Chinese Acad Sci, Inst Software, Beijing, Peoples R China
Keywords
Deep Reinforcement Learning; Learning from Demonstrations; Self-Imitation Learning; Sample Efficiency;
DOI
10.1109/IJCNN54540.2023.10191691
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning (DRL) provides a new way to generate robot control policies. However, training a control policy requires lengthy exploration, so reinforcement learning (RL) has low sample efficiency in real-world tasks. Both imitation learning (IL) and learning from demonstrations (LfD) improve the training process by using expert demonstrations, but imperfect expert demonstrations can mislead policy improvement. Offline-to-online reinforcement learning requires a large amount of offline data to initialize the policy, and distribution shift can easily degrade performance during online fine-tuning. To address these problems, we propose a learning-from-demonstrations method named Accelerating Self-Imitation Learning from Demonstrations (A-SILfD), which treats expert demonstrations as the agent's successful experiences and uses these experiences to constrain policy improvement. Furthermore, we use an ensemble of Q-functions to prevent the performance degradation caused by large estimation errors in a single Q-function. Our experiments show that A-SILfD can significantly improve sample efficiency using a small number of expert demonstrations of varying quality. In four MuJoCo continuous control tasks, A-SILfD significantly outperforms baseline methods after 150,000 steps of online training and is not misled by imperfect expert demonstrations. In addition, our ablation experiments demonstrate the effectiveness of each component of the method.
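As a rough illustration of the two ideas the abstract names, not the authors' implementation: demonstrations can share the agent's transition format so they act as ordinary "successful experiences", and an ensemble of Q-estimates can be combined pessimistically (here a REDQ-style min over a random subset, an assumed variant) to damp overestimation. All function names below are hypothetical.

```python
import random

def conservative_q_target(q_values, subset_size=2, rng=None):
    """Min over a random subset of ensemble Q-estimates.

    q_values: per-member estimates of Q(s', a') for one transition.
    Taking the minimum over a sampled subset damps the overestimation
    bias that a single Q-network accumulates during bootstrapping.
    """
    rng = rng or random.Random(0)
    return min(rng.sample(q_values, subset_size))

def td_target(reward, done, next_q_ensemble, gamma=0.99):
    """One-step TD target built on the conservative ensemble estimate."""
    if done:
        return reward
    # Using the full ensemble here makes the target deterministic.
    return reward + gamma * conservative_q_target(
        next_q_ensemble, subset_size=len(next_q_ensemble))

# Demonstrations are stored in the same (s, a, r, s', done) format as the
# agent's own transitions, so expert successes and the agent's own
# successes can live in one shared replay buffer.
demo_buffer = [
    ((0.0,), (1.0,), 1.0, (0.1,), False),  # illustrative placeholder
]
```

The subset-min rule is one common way to trade off pessimism (larger subsets give more conservative targets) against underestimation; the paper's exact aggregation over its Q-ensemble may differ.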
Pages: 8