Learning conditional policies for crystal design using offline reinforcement learning

Cited: 0
Authors
Govindarajan, Prashant [1 ]
Miret, Santiago [2 ]
Rector-Brooks, Jarrid [3 ]
Phielipp, Mariano [2 ]
Rajendran, Janarthanan [3 ]
Chandar, Sarath [1 ]
Affiliations
[1] Mila - Quebec AI Institute, Polytechnique Montreal, Montreal, QC, Canada
[2] Intel Labs, Hillsboro, OR, USA
[3] Universite de Montreal, Mila - Quebec AI Institute, Montreal, QC, Canada
Source
DIGITAL DISCOVERY | 2024, Vol. 3, Issue 4
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
DOI
10.1039/d4dd00024b
Chinese Library Classification
O6 [Chemistry];
Discipline Code
0703;
Abstract
Navigating the exponentially large chemical space in search of desirable materials is an extremely challenging task in materials discovery. Recent developments in generative and geometric deep learning have shown promising results in molecule and material discovery but often lack evaluation with high-accuracy computational methods. This work aims to design novel and stable crystalline materials conditioned on a desired band gap. To achieve conditional generation, we (1) formulate crystal design as a sequential decision-making problem, create relevant trajectories from high-quality materials data, and use conservative Q-learning to learn a conditional policy from these trajectories, with a reward function that incorporates constraints on energetic and electronic properties obtained directly from density functional theory (DFT) calculations; (2) evaluate the materials generated by the policy using DFT calculations for both energy and band gap; and (3) compare our results to relevant baselines, including behavioral cloning and unconditioned policy learning. Our experiments show that conditioned policies achieve targeted crystal design and demonstrate the capability to perform crystal discovery evaluated with accurate but computationally expensive DFT calculations.
Conservative Q-learning for band-gap-conditioned crystal design with DFT evaluations: the model is trained on trajectories constructed from crystals in the Materials Project. Results indicate promising performance for lower band-gap targets.
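The method described in the abstract combines two concrete ingredients: a reward assembled from DFT-derived energetic and electronic constraints, and conservative Q-learning (CQL) over trajectories of discrete crystal-building actions. Below is a minimal, hypothetical Python sketch of both pieces for a discrete action space; the reward form, weights, network architecture, and all names (band_gap_reward, QNet, cql_loss) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the two ingredients named in the abstract:
# a band-gap/stability reward and a discrete-action conservative
# Q-learning (CQL) loss. All names and weights are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def band_gap_reward(band_gap, target_gap, energy_above_hull,
                    gap_weight=1.0, stability_weight=1.0):
    # Penalise deviation from the target band gap and reward energetic
    # stability (low energy above hull); in the paper's setting both
    # quantities come from DFT calculations.
    return (-gap_weight * abs(band_gap - target_gap)
            - stability_weight * max(energy_above_hull, 0.0))

class QNet(nn.Module):
    # Q-network over a state that would encode the partial crystal
    # plus the target band gap (the conditioning signal).
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, state):
        return self.net(state)

def cql_loss(qnet, target_qnet, batch, gamma=0.99, alpha=1.0):
    # Standard TD error plus the CQL penalty, which pushes down
    # Q-values on actions not taken in the offline dataset.
    state, action, reward, next_state, done = batch
    q = qnet(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_qnet(next_state).max(dim=1).values
        target = reward + gamma * (1.0 - done) * next_q
    td_error = F.mse_loss(q, target)
    conservative = (torch.logsumexp(qnet(state), dim=1) - q).mean()
    return td_error + alpha * conservative
```

The conservative term (logsumexp of all Q-values minus the Q-value of the dataset action) is what distinguishes CQL from plain offline Q-learning: it keeps the learned policy from exploiting Q-value overestimates on state-action pairs unseen in the offline trajectories.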
Pages: 769-785
Page count: 17