Improving Offline Reinforcement Learning with Inaccurate Simulators

Cited by: 0
Authors
Hou, Yiwen [1 ]
Sun, Haoyuan [1 ]
Ma, Jinming [1 ]
Wu, Feng [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICRA57147.2024.10610833
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Offline reinforcement learning (RL) provides a promising way to avoid costly online interaction with the real environment. However, the performance of offline RL depends heavily on the quality of the dataset, and limited or low-quality data can cause extrapolation errors during learning. In many robotic applications, an inaccurate simulator is often available, but data collected from it cannot be used directly in offline RL because of the well-known exploration-exploitation dilemma and the dynamics gap between the inaccurate simulation and the real environment. To address these issues, we propose a novel approach that combines the offline dataset and the inaccurate simulation data more effectively. Specifically, we pre-train a generative adversarial network (GAN) to fit the state distribution of the offline dataset. We then collect data from the inaccurate simulator, starting from states sampled from the generator, and reweight the simulated data using the discriminator. Experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method exploits both the inaccurate simulator and the limited offline dataset better than state-of-the-art methods and achieves superior performance.
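The abstract outlines a GAN-based recipe: fit the offline state distribution with a generator/discriminator pair, roll out the inaccurate simulator from generator-sampled start states, and weight the resulting transitions by the discriminator score. The following minimal PyTorch sketch illustrates that general idea only; the state/noise dimensions, toy offline data, placeholder policy, and simulator_step function are all assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch (not the paper's code): GAN over offline states, then
# simulator rollouts from generated start states, reweighted by the discriminator.
import torch
import torch.nn as nn

state_dim, noise_dim = 17, 32            # assumed dimensions (e.g., a MuJoCo-like task)

generator = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(), nn.Linear(128, state_dim))
discriminator = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

offline_states = torch.randn(10_000, state_dim)   # stand-in for the real offline dataset

# Pre-train the GAN to fit the offline state distribution.
for step in range(1_000):
    real = offline_states[torch.randint(0, len(offline_states), (256,))]
    fake = generator(torch.randn(256, noise_dim))

    d_loss = bce(discriminator(real), torch.ones(256, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(256, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    g_loss = bce(discriminator(fake), torch.ones(256, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

def simulator_step(state, action):
    # Placeholder for the inaccurate simulator's dynamics (hypothetical).
    return state + 0.01 * action

# Start rollouts from states sampled by the generator, then weight each
# simulated transition by how "in-distribution" the discriminator judges it.
start_states = generator(torch.randn(64, noise_dim)).detach()
actions = torch.randn(64, state_dim)               # placeholder policy actions
next_states = simulator_step(start_states, actions)
weights = torch.sigmoid(discriminator(next_states)).detach()
```

In this sketch the weights would be attached to the simulated transitions before mixing them with the offline dataset for downstream offline RL training; the exact weighting scheme and training pipeline are described in the paper itself.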
Pages: 5162-5168
Page count: 7