Efficient Diffusion Policies for Offline Reinforcement Learning

Cited by: 0
Authors:
Kang, Bingyi [1 ]
Ma, Xiao [1 ]
Du, Chao [1 ]
Pang, Tianyu [1 ]
Yan, Shuicheng [1 ]
Affiliation:
[1] Sea AI Lab, Singapore, Singapore
Keywords:
DOI: N/A
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL [37] significantly boosted the performance of offline RL by representing a policy with a diffusion model, whose success relies on a parametrized Markov chain with hundreds of steps for sampling. However, Diffusion-QL suffers from two critical limitations: 1) it is computationally inefficient to forward and backward through the whole Markov chain during training, and 2) it is incompatible with maximum-likelihood-based RL algorithms (e.g., policy gradient methods), because the likelihood of diffusion models is intractable. We therefore propose the efficient diffusion policy (EDP) to overcome these two challenges. During training, EDP approximately reconstructs actions from corrupted ones, avoiding running the sampling chain. We conduct extensive experiments on the D4RL benchmark. The results show that EDP reduces diffusion-policy training time from 5 days to 5 hours on the gym-locomotion tasks. Moreover, we show that EDP is compatible with various offline RL algorithms (TD3, CRR, and IQL) and sets a new state of the art on D4RL, outperforming previous methods by large margins. Our code is available at https://github.com/sail-sg/edp.
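The key idea in the abstract, reconstructing actions from corrupted ones instead of running the full reverse chain, can be written down directly from the standard DDPM forward-process identity. The snippet below is a minimal PyTorch sketch, not the authors' implementation: the names approx_action, noise_pred_net, and alpha_bar are illustrative assumptions, with alpha_bar denoting the cumulative products of the noise schedule.

```python
import torch

def approx_action(noise_pred_net, state, action, alpha_bar, K):
    """One-step action approximation (sketch only, not the official EDP code).

    Rather than sampling an action by running the K-step reverse chain,
    corrupt the dataset action at a random diffusion step k and invert the
    forward process in a single step via the standard DDPM identity:
        a0_hat = (a_k - sqrt(1 - abar_k) * eps_theta(a_k, k, s)) / sqrt(abar_k)
    """
    batch = action.shape[0]
    # Random diffusion timestep per sample; alpha_bar is a length-K tensor
    # of cumulative noise-schedule products.
    k = torch.randint(0, K, (batch,), device=action.device)
    abar_k = alpha_bar[k].unsqueeze(-1)
    # Forward (noising) process: a_k = sqrt(abar_k) * a_0 + sqrt(1 - abar_k) * eps.
    eps = torch.randn_like(action)
    a_k = abar_k.sqrt() * action + (1.0 - abar_k).sqrt() * eps
    # Single network call predicting the injected noise.
    eps_hat = noise_pred_net(a_k, k, state)
    # One-step inversion: an estimate of the clean action a_0.
    return (a_k - (1.0 - abar_k).sqrt() * eps_hat) / abar_k.sqrt()
```

The returned estimate can then be plugged into a critic-based policy objective (e.g., maximizing Q(s, a0_hat) in TD3-style learning), so each update costs one forward/backward pass through the network instead of one per diffusion step, which is the source of the reported speedup.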
Pages: 18
Related papers (50 entries in total):
  • [1] Ada, Suzan Ece; Oztop, Erhan; Ugur, Emre. Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning. IEEE Robotics and Automation Letters, 2024, 9(4): 3116-3123.
  • [2] Jia, Qing-Shan. On Efficient Sampling in Offline Reinforcement Learning. 14th Asian Control Conference (ASCC 2024), 2024: 1-6.
  • [3] Liu, Shaofan; Sun, Shiliang. Safe Offline Reinforcement Learning Through Hierarchical Policies. Advances in Knowledge Discovery and Data Mining (PAKDD 2022), Part II, 2022, 13281: 380-391.
  • [4] Huang, Longyang; Dong, Botao; Zhang, Weidong. Efficient Offline Reinforcement Learning With Relaxed Conservatism. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(8): 5260-5272.
  • [5] Jin, Ying; Yang, Zhuoran; Wang, Zhaoran. Is Pessimism Provably Efficient for Offline Reinforcement Learning? Mathematics of Operations Research, 2024.
  • [6] Ball, Philip J.; Smith, Laura; Kostrikov, Ilya; Levine, Sergey. Efficient Online Reinforcement Learning with Offline Data. International Conference on Machine Learning, 2023, vol. 202.
  • [7] Govindarajan, Prashant; Miret, Santiago; Rector-Brooks, Jarrid; Phielipp, Mariano; Rajendran, Janarthanan; Chandar, Sarath. Learning Conditional Policies for Crystal Design Using Offline Reinforcement Learning. Digital Discovery, 2024, 3(4): 769-785.
  • [8] Zhang, Jiazhi; Cheng, Yuhu; Cao, Shuo; Wang, Xuesong. Offline Reinforcement Learning With Reverse Diffusion Guide Policy. IEEE Transactions on Industrial Informatics, 2024, 20(10): 11785-11793.
  • [9] Guo, Siyuan; Zou, Lixin; Chen, Hechang; Qu, Bohao; Chi, Haotian; Yu, Philip S.; Chang, Yi. Sample Efficient Offline-to-Online Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering, 2024, 36(3): 1299-1310.
  • [10] Mao, Keming; Chen, Chen; Zhang, Jinkai; Li, Yiyang. ORLEP: An Efficient Offline Reinforcement Learning Evaluation Platform. Multimedia Tools and Applications, 2023, 83(12): 37073-37087.