Learning conditional policies for crystal design using offline reinforcement learning

Cited: 0
Authors
Govindarajan, Prashant [1 ]
Miret, Santiago [2 ]
Rector-Brooks, Jarrid [3 ]
Phielipp, Mariano [2 ]
Rajendran, Janarthanan [3 ]
Chandar, Sarath [1 ]
Affiliations
[1] Mila Quebec AI Inst, Polytech, Montreal, PQ, Canada
[2] Intel Labs, Hillsboro, OR USA
[3] Univ Montreal, Mila Quebec AI Inst, Montreal, PQ, Canada
Source
DIGITAL DISCOVERY | 2024, Vol. 3, Issue 4
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
DOI
10.1039/d4dd00024b
Chinese Library Classification
O6 [Chemistry]
Discipline code
0703
Abstract
Navigating the exponentially large chemical space in search of desirable materials is an extremely challenging task in materials discovery. Recent developments in generative and geometric deep learning have shown promising results in molecule and material discovery, but they often lack evaluation with high-accuracy computational methods. This work aims to design novel and stable crystalline materials conditioned on a desired band gap. To achieve conditional generation, we (1) formulate crystal design as a sequential decision-making problem, create relevant trajectories from high-quality materials data, and use conservative Q-learning to learn a conditional policy from these trajectories, with a reward function that incorporates constraints on energetic and electronic properties obtained directly from density functional theory (DFT) calculations; (2) evaluate the materials generated by the policy using DFT calculations of both energy and band gap; and (3) compare our results against relevant baselines, including behavioral cloning and unconditioned policy learning. Our experiments show that the conditioned policies achieve targeted crystal design and demonstrate the capability to perform crystal discovery evaluated with accurate but computationally expensive DFT calculations. In summary: conservative Q-learning for band-gap-conditioned crystal design with DFT evaluations, where the model is trained on trajectories constructed from crystals in the Materials Project; results indicate promising performance for lower band-gap targets.
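The conservative Q-learning (CQL) regularizer mentioned in the abstract can be illustrated in a tabular toy setting. This is a minimal sketch, not the paper's implementation: the state/action sizes, the synthetic offline dataset, and the coefficients `alpha`, `gamma`, and `lr` are all hypothetical placeholders standing in for trajectories built from Materials Project crystals and DFT-derived rewards.

```python
import numpy as np

# Toy tabular sketch of conservative Q-learning (CQL). All quantities
# below are illustrative; the real work uses crystal-design trajectories
# and DFT-based rewards rather than a random synthetic dataset.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))

# Offline dataset of (state, action, reward, next_state) transitions.
dataset = [
    (rng.integers(n_states), rng.integers(n_actions),
     rng.random(), rng.integers(n_states))
    for _ in range(200)
]

alpha, gamma, lr = 1.0, 0.9, 0.1
for _ in range(100):
    for s, a, r, s2 in dataset:
        # Standard TD error toward the greedy bootstrap target.
        td_target = r + gamma * Q[s2].max()
        bellman_grad = Q[s, a] - td_target
        # CQL penalty: alpha * (logsumexp_a Q(s, a) - Q(s, a_data)).
        # Its gradient pushes all Q(s, .) down by the softmax weights
        # and pushes the in-dataset action's Q-value back up.
        pi = np.exp(Q[s] - Q[s].max())
        pi /= pi.sum()
        Q[s] -= lr * alpha * pi
        Q[s, a] += lr * alpha
        Q[s, a] -= lr * bellman_grad
```

The penalty keeps Q-values for actions unseen in the offline data conservative, which is the property that makes purely offline training from a fixed materials dataset feasible.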
Pages: 769-785 (17 pages)