Learning conditional policies for crystal design using offline reinforcement learning

Cited: 0
Authors
Govindarajan, Prashant [1 ]
Miret, Santiago [2 ]
Rector-Brooks, Jarrid [3 ]
Phielipp, Mariano [2 ]
Rajendran, Janarthanan [3 ]
Chandar, Sarath [1 ]
Affiliations
[1] Mila - Quebec AI Institute, Polytechnique Montreal, Montreal, QC, Canada
[2] Intel Labs, Hillsboro, OR, USA
[3] Universite de Montreal, Mila - Quebec AI Institute, Montreal, QC, Canada
Source
DIGITAL DISCOVERY | 2024, Vol. 3, Issue 4
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
DOI
10.1039/d4dd00024b
Chinese Library Classification
O6 [Chemistry];
Discipline Code
0703;
Abstract
Navigating the exponentially large chemical space in search of desirable materials is an extremely challenging task in materials discovery. Recent developments in generative and geometric deep learning have shown promising results in molecule and material discovery but often lack evaluation with high-accuracy computational methods. This work aims to design novel and stable crystalline materials conditioned on a desired band gap. To achieve conditional generation, we (1) formulate crystal design as a sequential decision-making problem, create relevant trajectories from high-quality materials data, and use conservative Q-learning to learn a conditional policy from these trajectories, with a reward function that incorporates constraints on energetic and electronic properties obtained directly from density functional theory (DFT) calculations; (2) evaluate the materials generated by the policy using DFT calculations for both energy and band gap; and (3) compare our results to relevant baselines, including behavioral cloning and unconditioned policy learning. Our experiments show that conditioned policies achieve targeted crystal design and demonstrate the capability to perform crystal discovery evaluated with accurate but computationally expensive DFT calculations.
Conservative Q-learning for band-gap-conditioned crystal design with DFT evaluations: the model is trained on trajectories constructed from crystals in the Materials Project. Results indicate promising performance for lower band-gap targets.
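The method described in the abstract combines two concrete ingredients: a reward assembled from DFT-derived energetic and electronic constraints, and conservative Q-learning (CQL) over trajectories of discrete crystal-building actions. Below is a minimal, hypothetical Python sketch of both pieces for a discrete action space; the reward form, weights, network architecture, and all names (band_gap_reward, QNet, cql_loss) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the two ingredients named in the abstract:
# a band-gap/stability reward and a discrete-action conservative
# Q-learning (CQL) loss. All names and weights are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def band_gap_reward(band_gap, target_gap, energy_above_hull,
                    gap_weight=1.0, stability_weight=1.0):
    # Penalise deviation from the target band gap and reward energetic
    # stability (low energy above hull); in the paper's setting both
    # quantities come from DFT calculations.
    return (-gap_weight * abs(band_gap - target_gap)
            - stability_weight * max(energy_above_hull, 0.0))

class QNet(nn.Module):
    # Q-network over a state that would encode the partial crystal
    # plus the target band gap (the conditioning signal).
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, state):
        return self.net(state)

def cql_loss(qnet, target_qnet, batch, gamma=0.99, alpha=1.0):
    # Standard TD error plus the CQL penalty, which pushes down
    # Q-values on actions not taken in the offline dataset.
    state, action, reward, next_state, done = batch
    q = qnet(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_qnet(next_state).max(dim=1).values
        target = reward + gamma * (1.0 - done) * next_q
    td_error = F.mse_loss(q, target)
    conservative = (torch.logsumexp(qnet(state), dim=1) - q).mean()
    return td_error + alpha * conservative
```

The conservative term (logsumexp of all Q-values minus the Q-value of the dataset action) is what distinguishes CQL from plain offline Q-learning: it keeps the learned policy from exploiting Q-value overestimates on state-action pairs unseen in the offline trajectories.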
Pages: 769-785
Page count: 17