Efficient Diffusion Policies for Offline Reinforcement Learning

Cited by: 0
Authors
Kang, Bingyi [1]
Ma, Xiao [1]
Du, Chao [1]
Pang, Tianyu [1]
Yan, Shuicheng [1]
Affiliations
[1] Sea AI Lab, Singapore, Singapore
Source
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
Keywords: (none listed)
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL [37] significantly boosted the performance of offline RL by representing a policy with a diffusion model, whose success relies on a parameterized Markov chain with hundreds of steps for sampling. However, Diffusion-QL suffers from two critical limitations. 1) It is computationally inefficient to forward and backward through the whole Markov chain during training. 2) It is incompatible with maximum-likelihood-based RL algorithms (e.g., policy gradient methods), as the likelihood of diffusion models is intractable. Therefore, we propose the efficient diffusion policy (EDP) to overcome these two challenges. EDP approximately constructs actions from corrupted ones at training time to avoid running the sampling chain. We conduct extensive experiments on the D4RL benchmark. The results show that EDP reduces diffusion-policy training time from 5 days to 5 hours on gym-locomotion tasks. Moreover, we show that EDP is compatible with various offline RL algorithms (TD3, CRR, and IQL) and achieves a new state of the art on D4RL, outperforming previous methods by large margins. Our code is available at https://github.com/sail-sg/edp.
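The core trick described in the abstract (constructing approximate actions from corrupted ones rather than running the full sampling chain) can be sketched in a few lines. The following is a minimal PyTorch illustration under standard DDPM conventions; the names approx_action and eps_model are assumptions made for this sketch, not the authors' implementation (see the linked repository for that).

```python
import torch

def approx_action(eps_model, state, action0, alphas_cumprod):
    """One-step action approximation (sketch of the idea in the abstract).

    Instead of running the K-step reverse sampling chain, corrupt a dataset
    action to a random diffusion step k, predict the injected noise, and
    invert the forward process in closed form to recover an approximate
    clean action. `eps_model` is a hypothetical noise-prediction network.
    """
    K = alphas_cumprod.shape[0]
    b = action0.shape[0]
    k = torch.randint(0, K, (b,), device=action0.device)  # random step per sample
    abar = alphas_cumprod[k].unsqueeze(-1)                # cumulative alpha at step k
    noise = torch.randn_like(action0)
    # Forward (corruption) process: a_k = sqrt(abar)*a_0 + sqrt(1-abar)*eps
    ak = abar.sqrt() * action0 + (1.0 - abar).sqrt() * noise
    eps_hat = eps_model(state, ak, k)                     # predict the injected noise
    # Closed-form inversion of the forward process, replacing the sampling chain
    a0_hat = (ak - (1.0 - abar).sqrt() * eps_hat) / abar.sqrt()
    return a0_hat.clamp(-1.0, 1.0)

# Toy usage with a stand-in network (illustrative only)
eps_model = lambda s, a, k: torch.zeros_like(a)
states = torch.randn(4, 17)
actions = torch.rand(4, 6) * 2 - 1
betas = torch.linspace(1e-4, 0.02, 100)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
a_hat = approx_action(eps_model, states, actions, alphas_cumprod)
```

Because the reconstructed action is differentiable with respect to the noise-prediction network, it can feed a critic-maximization or weighted-regression objective directly, without back-propagating through hundreds of denoising steps; this is what enables the training-time reduction reported in the abstract.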
Pages: 18