Efficient Diffusion Policies for Offline Reinforcement Learning

Cited by: 0
Authors:
Kang, Bingyi [1 ]
Ma, Xiao [1 ]
Du, Chao [1 ]
Pang, Tianyu [1 ]
Yan, Shuicheng [1 ]
Affiliation:
[1] Sea AI Lab, Singapore, Singapore
Keywords:
DOI: N/A
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL [37] significantly boosted the performance of offline RL by representing a policy with a diffusion model, whose success relies on a parametrized Markov chain with hundreds of steps for sampling. However, Diffusion-QL suffers from two critical limitations: 1) it is computationally inefficient to forward and backward through the whole Markov chain during training, and 2) it is incompatible with maximum-likelihood-based RL algorithms (e.g., policy gradient methods), because the likelihood of diffusion models is intractable. We therefore propose the efficient diffusion policy (EDP) to overcome these two challenges. During training, EDP approximately reconstructs actions from corrupted ones, avoiding running the sampling chain. We conduct extensive experiments on the D4RL benchmark. The results show that EDP reduces diffusion-policy training time from 5 days to 5 hours on the gym-locomotion tasks. Moreover, we show that EDP is compatible with various offline RL algorithms (TD3, CRR, and IQL) and sets a new state of the art on D4RL, outperforming previous methods by large margins. Our code is available at https://github.com/sail-sg/edp.
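The key idea in the abstract, reconstructing actions from corrupted ones instead of running the full reverse chain, can be written down directly from the standard DDPM forward-process identity. The snippet below is a minimal PyTorch sketch, not the authors' implementation: the names approx_action, noise_pred_net, and alpha_bar are illustrative assumptions, with alpha_bar denoting the cumulative products of the noise schedule.

```python
import torch

def approx_action(noise_pred_net, state, action, alpha_bar, K):
    """One-step action approximation (sketch only, not the official EDP code).

    Rather than sampling an action by running the K-step reverse chain,
    corrupt the dataset action at a random diffusion step k and invert the
    forward process in a single step via the standard DDPM identity:
        a0_hat = (a_k - sqrt(1 - abar_k) * eps_theta(a_k, k, s)) / sqrt(abar_k)
    """
    batch = action.shape[0]
    # Random diffusion timestep per sample; alpha_bar is a length-K tensor
    # of cumulative noise-schedule products.
    k = torch.randint(0, K, (batch,), device=action.device)
    abar_k = alpha_bar[k].unsqueeze(-1)
    # Forward (noising) process: a_k = sqrt(abar_k) * a_0 + sqrt(1 - abar_k) * eps.
    eps = torch.randn_like(action)
    a_k = abar_k.sqrt() * action + (1.0 - abar_k).sqrt() * eps
    # Single network call predicting the injected noise.
    eps_hat = noise_pred_net(a_k, k, state)
    # One-step inversion: an estimate of the clean action a_0.
    return (a_k - (1.0 - abar_k).sqrt() * eps_hat) / abar_k.sqrt()
```

The returned estimate can then be plugged into a critic-based policy objective (e.g., maximizing Q(s, a0_hat) in TD3-style learning), so each update costs one forward/backward pass through the network instead of one per diffusion step, which is the source of the reported speedup.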
Pages: 18
Related papers (50 entries in total):
  • [1] Ada, Suzan Ece; Oztop, Erhan; Ugur, Emre. Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning. IEEE Robotics and Automation Letters, 2024, 9(4): 3116-3123.
  • [2] Jia, Qing-Shan. On Efficient Sampling in Offline Reinforcement Learning. 14th Asian Control Conference (ASCC 2024), 2024: 1-6.
  • [3] Liu, Shaofan; Sun, Shiliang. Safe Offline Reinforcement Learning Through Hierarchical Policies. Advances in Knowledge Discovery and Data Mining (PAKDD 2022), Part II, 2022, 13281: 380-391.
  • [4] Huang, Longyang; Dong, Botao; Zhang, Weidong. Efficient Offline Reinforcement Learning With Relaxed Conservatism. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(8): 5260-5272.
  • [5] Jin, Ying; Yang, Zhuoran; Wang, Zhaoran. Is Pessimism Provably Efficient for Offline Reinforcement Learning? Mathematics of Operations Research, 2024.
  • [6] Ball, Philip J.; Smith, Laura; Kostrikov, Ilya; Levine, Sergey. Efficient Online Reinforcement Learning with Offline Data. International Conference on Machine Learning, 2023, vol. 202.
  • [7] Govindarajan, Prashant; Miret, Santiago; Rector-Brooks, Jarrid; Phielipp, Mariano; Rajendran, Janarthanan; Chandar, Sarath. Learning Conditional Policies for Crystal Design Using Offline Reinforcement Learning. Digital Discovery, 2024, 3(4): 769-785.
  • [8] Zhang, Jiazhi; Cheng, Yuhu; Cao, Shuo; Wang, Xuesong. Offline Reinforcement Learning With Reverse Diffusion Guide Policy. IEEE Transactions on Industrial Informatics, 2024, 20(10): 11785-11793.
  • [9] Guo, Siyuan; Zou, Lixin; Chen, Hechang; Qu, Bohao; Chi, Haotian; Yu, Philip S.; Chang, Yi. Sample Efficient Offline-to-Online Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering, 2024, 36(3): 1299-1310.
  • [10] Mao, Keming; Chen, Chen; Zhang, Jinkai; Li, Yiyang. ORLEP: An Efficient Offline Reinforcement Learning Evaluation Platform. Multimedia Tools and Applications, 2023, 83(12): 37073-37087.